Last updated: 2020-09-08

Checks: 2 passed, 0 failed

Knit directory: smash-gen/

This reproducible R Markdown analysis was created with workflowr (version 1.5.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Repository version: c239d87

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/.DS_Store

Untracked files:
    Untracked:  analysis/pln_smooth.Rmd
    Untracked:  analysis/smashadditive.Rmd
    Untracked:  analysis/talk1011.Rmd
    Untracked:  talk.Rmd
    Untracked:  talk.html
    Untracked:  talk.pdf

Unstaged changes:
    Modified:   analysis/binomial.Rmd
    Modified:   analysis/chipexo.Rmd
    Deleted:    analysis/chipexocut.Rmd
    Modified:   analysis/chipseqref.Rmd
    Modified:   analysis/fda.Rmd
    Modified:   analysis/protein.Rmd
    Modified:   analysis/r2.Rmd
    Modified:   analysis/sigma.Rmd
    Modified:   analysis/vstlikcompare.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File	Version	Author	Date	Message
Rmd	c239d87	Dongyue Xie	2020-09-08	wflow_publish(“analysis/index.Rmd”)
html	949873c	Dongyue Xie	2020-09-08	Build site.
Rmd	3cf84d5	Dongyue Xie	2020-09-08	wflow_publish(“analysis/index.Rmd”)
html	c11d203	Dongyue Xie	2019-11-19	Build site.
Rmd	e3dc37c	Dongyue Xie	2019-11-19	wflow_publish(c(“analysis/initialinvest.Rmd”, “analysis/estimatenugget.Rmd”,
html	b40056f	Dongyue Xie	2019-02-14	Build site.
Rmd	3a91d12	Dongyue Xie	2019-02-14	wflow_publish(“analysis/index.Rmd”)
html	29125f9	Dongyue Xie	2019-01-06	Build site.
Rmd	9b7cd1f	Dongyue Xie	2019-01-06	wflow_publish(c(“analysis/index.Rmd”, “analysis/binomial2.Rmd”))
html	be0680c	Dongyue Xie	2018-11-11	Build site.
Rmd	4db2338	Dongyue Xie	2018-11-11	wflow_publish(c(“analysis/index.Rmd”, “analysis/vstiter.Rmd”, “analysis/fda.Rmd”))
html	e55f4a7	Dongyue Xie	2018-10-18	Build site.
Rmd	83d5406	Dongyue Xie	2018-10-18	wflow_publish(“analysis/index.Rmd”)
html	9916bf6	Dongyue Xie	2018-10-07	Build site.
Rmd	86d1b4b	Dongyue Xie	2018-10-07	add
html	3ce9535	Dongyue Xie	2018-10-05	Build site.
Rmd	2f295fb	Dongyue Xie	2018-10-05	add files
html	ba619d6	Dongyue Xie	2018-10-02	Build site.
Rmd	d7c4a01	Dongyue Xie	2018-10-02	revise
html	eaee67b	Dongyue Xie	2018-10-02	Build site.
Rmd	eddcacf	Dongyue Xie	2018-10-02	revise
html	b3c65ef	Dongyue	2018-06-04	chip seq data analysis
Rmd	7095e13	Dongyue	2018-06-04	chip seq data analysis
html	54238d8	Dongyue	2018-06-03	missing data
Rmd	3aa5ca5	Dongyue	2018-06-03	missing data
html	549140f	Dongyue	2018-05-30	covariate iterative
Rmd	38e8063	Dongyue	2018-05-30	covariate iterative
html	4b0e6c4	Dongyue	2018-05-26	edit
Rmd	d810c3e	Dongyue	2018-05-26	edit
html	568cf1a	Dongyue	2018-05-24	edit
Rmd	35cc4d6	Dongyue	2018-05-24	edit
Rmd	ec87323	Dongyue	2018-05-24	edit
html	511adb2	Dongyue	2018-05-20	wave basis
Rmd	0f77e70	Dongyue	2018-05-20	wave basis
html	4b2e5d1	Dongyue	2018-05-17	add known version
Rmd	bdddcf6	Dongyue	2018-05-14	edit
Rmd	7ee9791	Dongyue	2018-05-14	edit
html	5f9b5c6	Dongyue	2018-05-09	edit
Rmd	fda2411	Dongyue	2018-05-09	one iteration ash poisson
html	3a42238	Dongyue	2018-05-08	correction
Rmd	910bc07	Dongyue	2018-05-08	correction
html	cb91cb1	Dongyue	2018-05-08	add robust
Rmd	2e73919	Dongyue	2018-05-08	add robust
html	44d3413	Dongyue	2018-05-07	correct mu_t+E(u_t)
Rmd	b082c1b	Dongyue	2018-05-07	correct mu_t+E(u_t)
Rmd	ee44935	Dongyue	2018-05-06	unknown sigma version
html	767077f	Dongyue	2018-05-06	unknown sigma version
html	341e471	Dongyue	2018-05-06	edit
html	c72247b	Dongyue	2018-05-06	edit
Rmd	3bd6a61	Dongyue	2018-05-06	first commit
html	ca322e8	Dongyue	2018-05-06	first commit
Rmd	b7e89a3	DongyueXie	2018-05-01	Start workflowr project.

Project Overview

We generalize smash (Xing and Stephens, 2016), a flexible empirical Bayes method for signal denoising, to handle non-Gaussian data and to account for additional unknown variance.

This R package contains the functions for this project; the main one is smash_gen_poiss.R.
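To make wavelet-based empirical Bayes denoising concrete, here is a minimal Python sketch. It is not smash: smash shrinks wavelet coefficients with adaptive shrinkage (ash) under a mixture prior, whereas this sketch assumes a single normal prior per Haar resolution level (variance estimated by the method of moments) and a known, constant noise standard deviation `sigma`. All function names are hypothetical.

```python
import math


def haar_forward(x):
    """Orthonormal Haar DWT. Returns the final smooth coefficient and the
    detail coefficients ordered from coarsest to finest level."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of 2"
    details, a = [], list(x)
    while len(a) > 1:
        half = len(a) // 2
        s = [(a[2 * i] + a[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        d = [(a[2 * i] - a[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details.append(d)  # appended finest first
        a = s
    return a[0], details[::-1]


def haar_inverse(c, details):
    """Invert haar_forward (details ordered coarse -> fine)."""
    a = [c]
    for d in details:
        nxt = []
        for ai, di in zip(a, d):
            nxt.append((ai + di) / math.sqrt(2))
            nxt.append((ai - di) / math.sqrt(2))
        a = nxt
    return a


def eb_shrink(d, s2):
    """Posterior mean of theta_j when d_j ~ N(theta_j, s2) and
    theta_j ~ N(0, tau2); tau2 is estimated as mean(d^2) - s2, floored at 0."""
    tau2 = max(0.0, sum(x * x for x in d) / len(d) - s2)
    w = tau2 / (tau2 + s2) if tau2 > 0 else 0.0
    return [w * x for x in d]


def denoise(y, sigma):
    """Shrink every detail level, keep the smooth coefficient, transform back.
    With an orthonormal transform each coefficient has noise variance sigma^2."""
    c, details = haar_forward(y)
    shrunk = [eb_shrink(d, sigma ** 2) for d in details]
    return haar_inverse(c, shrunk)
```

Levels that contain mostly noise get an estimated prior variance near zero and are shrunk hard, while levels carrying signal are left nearly untouched; ash replaces the single normal prior with a flexible mixture so that sparse, large coefficients survive better.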

Analysis

A list of analyses related to the project.

Method

This is a review and summary I wrote in Sept 2018.

Introduction and method

Initial investigation

Early stage analysis

Estimate nugget effect

Methods for estimating the nugget effect. The method I finally chose is the maximum likelihood estimate (MLE) of \(\sigma^2\) in \(y\sim N(\mu,\sigma^2+s^2)\).
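A minimal sketch of this MLE, assuming the mean \(\mu_i\) (for example a current smoothed estimate) and the standard errors \(s_i\) are known. The log-likelihood in \(\tau=\sigma^2\) is \(\sum_i\left[-\tfrac12\log(\tau+s_i^2)-\tfrac{(y_i-\mu_i)^2}{2(\tau+s_i^2)}\right]\); the sketch maximizes it over \(\tau\ge 0\) by golden-section search, which assumes the likelihood is unimodal on the search interval. The helper names are hypothetical. When all \(s_i\) are equal the answer has the closed form \(\max\{0,\ \overline{(y-\mu)^2}-s^2\}\), which the check below uses.

```python
import math


def nugget_loglik(tau, y, mu, s):
    """Log-likelihood (up to a constant) of y_i ~ N(mu_i, tau + s_i^2)."""
    ll = 0.0
    for yi, mi, si in zip(y, mu, s):
        v = tau + si * si
        ll += -0.5 * math.log(v) - (yi - mi) ** 2 / (2.0 * v)
    return ll


def estimate_nugget_mle(y, mu, s, upper=None, tol=1e-10):
    """Maximize nugget_loglik over tau = sigma^2 in [0, upper]."""
    if upper is None:
        # Once tau exceeds every squared residual, each term decreases in tau,
        # so the maximizer lies below this bound.
        upper = max((yi - mi) ** 2 for yi, mi in zip(y, mu)) + 1.0
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 0.0, upper
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = nugget_loglik(c, y, mu, s), nugget_loglik(d, y, mu, s)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = nugget_loglik(c, y, mu, s)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = nugget_loglik(d, y, mu, s)
    return (a + b) / 2.0


# Equal s_i: closed form is mean((y - mu)^2) - s^2 = 3.5625 - 0.25
print(estimate_nugget_mle([1.0, -2.0, 3.0, 0.5], [0.0] * 4, [0.5] * 4))
```

A bounded one-dimensional optimizer is enough here because only the scalar \(\sigma^2\) is unknown; the constraint \(\sigma^2\ge 0\) is handled by the lower end of the search interval.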

Poisson data with unknown nugget effect

Poisson nugget simulation (unknown \(\sigma\)): simulations in which the nugget standard deviation \(\sigma\) must be estimated.
Fix spike issues: updated version, 10/07/2018
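One way to turn Poisson counts into Gaussian pseudo-data is the standard first-order (IRLS-style) expansion of the log link around a current positive estimate \(\hat\lambda_t\): \(z_t=\log\hat\lambda_t+(y_t-\hat\lambda_t)/\hat\lambda_t\), with approximate variance \(s_t^2=1/\hat\lambda_t\). This is a sketch under assumptions: how smashgen chooses the expansion point is not stated here, and `poisson_working_data` is a hypothetical helper. If the nugget enters on the log scale (also an assumption), the pseudo-data then follow the form \(z\sim N(\mu,\sigma^2+s^2)\) used for the nugget MLE.

```python
import math


def poisson_working_data(y, lam):
    """First-order expansion of log(lambda) at lam:
    z_t = log(lam_t) + (y_t - lam_t) / lam_t, Var(z_t) ~= 1 / lam_t."""
    z, var = [], []
    for yi, li in zip(y, lam):
        if li <= 0:
            raise ValueError("expansion point must be positive")
        z.append(math.log(li) + (yi - li) / li)
        var.append(1.0 / li)
    return z, var


z, s2 = poisson_working_data([3, 0, 5], [3.0, 1.0, 2.0])
print(z, s2)
```

Zero counts are not a problem because the logarithm is taken of the expansion point, not of the data.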

Other analysis includes different wavelet basis.

Binomial data with unknown nugget effect

A summary of binomial sequence smoothing

Other analyses include using a Poisson approximation.
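For binomial counts \(y_t\) out of \(n_t\) trials, the analogous first-order expansion of the logit link around a current estimate \(\hat p_t\in(0,1)\) gives \(z_t=\operatorname{logit}(\hat p_t)+\frac{y_t/n_t-\hat p_t}{\hat p_t(1-\hat p_t)}\) with approximate variance \(1/\{n_t\hat p_t(1-\hat p_t)\}\). This is a sketch of the generic expansion, assuming a logit link; it is not necessarily the exact procedure in the binomial summary, and `binomial_working_data` is a hypothetical name.

```python
import math


def binomial_working_data(y, n, p):
    """Logit-link working response and its approximate variance,
    expanded at current success probabilities p (each strictly in (0, 1))."""
    z, var = [], []
    for yi, ni, pi in zip(y, n, p):
        w = pi * (1.0 - pi)  # derivative of the inverse logit at logit(pi)
        z.append(math.log(pi / (1.0 - pi)) + (yi / ni - pi) / w)
        var.append(1.0 / (ni * w))
    return z, var


print(binomial_working_data([2, 5], [10, 10], [0.2, 0.5]))
```

The variance follows from \(\operatorname{Var}(y_t/n_t)=p_t(1-p_t)/n_t\) multiplied by the squared derivative of the logit, \(1/\{p_t(1-p_t)\}^2\).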

Smoothing with covariates

Now suppose that at each \(t\), \(Y_t=X_t\beta+\mu_t+\epsilon_t\), where \(\mu\) is smooth and \(\epsilon_t\sim N(0,\sigma^2_t)\). Ordinary least squares regression of \(Y\) on \(X\) cannot explain the structure in \(\mu\), so that structure remains in the OLS residual \(e\); thus \(e\) consists of \(\mu\) plus noise. Applying smash.gaus to \(e\) recovers \(\mu\) and estimates \(\sigma^2\).
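A minimal sketch of this idea, assuming a single covariate with no intercept. Because smash.gaus is an R function, the sketch uses a plain moving average as a clearly labelled stand-in smoother, and it does not estimate \(\sigma^2\). It also alternates the two steps (backfitting), which goes beyond the text; with `n_iter=1` it reduces to the single pass described: fit OLS, take the residual \(e\), smooth it. Function names are hypothetical.

```python
import math


def moving_average(v, half_width=3):
    """Stand-in smoother (NOT smash.gaus): centred moving average,
    with the window truncated at the ends of the series."""
    n = len(v)
    out = []
    for t in range(n):
        lo, hi = max(0, t - half_width), min(n, t + half_width + 1)
        out.append(sum(v[lo:hi]) / (hi - lo))
    return out


def backfit(y, x, smoother=moving_average, n_iter=50):
    """Alternate OLS for beta (one covariate, no intercept) on y - mu
    with smoothing of the residual y - x * beta to update mu."""
    mu = [0.0] * len(y)
    beta = 0.0
    for _ in range(n_iter):
        r = [yi - mi for yi, mi in zip(y, mu)]
        beta = sum(xi * ri for xi, ri in zip(x, r)) / sum(xi * xi for xi in x)
        e = [yi - beta * xi for yi, xi in zip(y, x)]
        mu = smoother(e)
    return beta, mu


# Noise-free toy example: rough covariate, smooth mu, true beta = 2
n = 128
x = [(-1) ** t for t in range(n)]
mu_true = [math.sin(2 * math.pi * t / n) for t in range(n)]
y = [2.0 * xi + mi for xi, mi in zip(x, mu_true)]
beta_hat, mu_hat = backfit(y, x)
print(round(beta_hat, 3))
```

Iterating matters mainly when \(X\) is correlated with \(\mu\): a single OLS pass then absorbs part of \(\mu\) into \(\hat\beta\), and re-estimating \(\beta\) on \(Y-\hat\mu\) corrects that.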

Unevenly spaced data

We treat the gaps in unevenly spaced data as missing observations, set them to 0, and give them a standard error of \(10^6\). The idea is that when the standard error is very large, the value of y becomes irrelevant. This approach did not work.
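The padding step described above can be sketched as follows. This assumes integer observation positions and, as is common for Haar-type wavelet transforms, a series length that must be a power of 2; `pad_uneven` and its `big_se` parameter are hypothetical names, with the default taken from the \(10^6\) in the text.

```python
def pad_uneven(positions, values, se, big_se=1e6):
    """Place observations on an integer grid whose length is the next power
    of 2; unobserved grid points get y = 0 and standard error big_se."""
    length = 1
    while length < max(positions) + 1:
        length *= 2
    y = [0.0] * length
    s = [big_se] * length
    for pos, val, sd in zip(positions, values, se):
        y[pos] = val
        s[pos] = sd
    return y, s


y, s = pad_uneven([0, 1, 3, 4], [1.0, 2.0, 3.0, 4.0], [0.5] * 4)
print(y)
print(s)
```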

Missing data?

Variance stabilizing transformation(vst)

In addition to likelihood expansion, a VST is another way to make the data approximately normally distributed.
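As an example of a VST (not necessarily the one used in this project), the Anscombe transform \(2\sqrt{y+3/8}\) maps Poisson counts to values whose variance is close to 1 once the mean is not too small. The sketch applies the transform and checks the stabilized variance exactly by summing the Poisson probability mass function; all function names are hypothetical.

```python
import math


def anscombe(y):
    """Anscombe variance-stabilizing transform for Poisson counts."""
    return [2.0 * math.sqrt(yi + 3.0 / 8.0) for yi in y]


def inverse_anscombe(z):
    """Simple algebraic inverse (biased as an estimate of the mean for small counts)."""
    return [(zi / 2.0) ** 2 - 3.0 / 8.0 for zi in z]


def transformed_variance(lam, kmax=None):
    """Exact variance of 2*sqrt(Y + 3/8) for Y ~ Poisson(lam), lam > 0,
    computed by summing the pmf up to a generous cutoff kmax."""
    if kmax is None:
        kmax = int(lam + 20 * math.sqrt(lam) + 20)
    mean = second = 0.0
    for k in range(kmax + 1):
        p = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        z = 2.0 * math.sqrt(k + 3.0 / 8.0)
        mean += p * z
        second += p * z * z
    return second - mean * mean


print(transformed_variance(50.0))  # close to 1
```

After smoothing on the transformed scale with a Gaussian method, estimates are mapped back with an inverse transform; the naive inverse above is the simplest choice.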

Chip-Exo and Chip-seq data smoothing

Some real data applications of smashgen.

The primary role of CTCF is thought to be in regulating the 3D structure of chromatin. CTCF binds together strands of DNA, thus forming chromatin loops, and anchors DNA to cellular structures like the nuclear lamina. It also defines the boundaries between active and heterochromatic DNA.

DNA methylation data smoothing

BS Cancer data: DNA methylation data from normal and cancer samples.

Wavelet-based functional data analysis

Apply smash-gen to functional data analysis

Literature on smoothing

I’m focusing on reading about: 1. additive models (gam, gamm, spam, gspam); 2. functional data analysis (wavelet-based functional mixed models, etc.); 3. more on exponential-family signal denoising (vst, tf).

Miscellaneous

Not relevant to this project. Just for convenience.

Smash-gen