Last updated: 2020-09-08


Knit directory: smash-gen/

This reproducible R Markdown analysis was created with workflowr (version 1.5.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/.DS_Store

Untracked files:
    Untracked:  analysis/pln_smooth.Rmd
    Untracked:  analysis/smashadditive.Rmd
    Untracked:  analysis/talk1011.Rmd
    Untracked:  talk.Rmd
    Untracked:  talk.html
    Untracked:  talk.pdf

Unstaged changes:
    Modified:   analysis/binomial.Rmd
    Modified:   analysis/chipexo.Rmd
    Deleted:    analysis/chipexocut.Rmd
    Modified:   analysis/chipseqref.Rmd
    Modified:   analysis/fda.Rmd
    Modified:   analysis/protein.Rmd
    Modified:   analysis/r2.Rmd
    Modified:   analysis/sigma.Rmd
    Modified:   analysis/vstlikcompare.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd c239d87 Dongyue Xie 2020-09-08 wflow_publish(“analysis/index.Rmd”)
html 949873c Dongyue Xie 2020-09-08 Build site.
Rmd 3cf84d5 Dongyue Xie 2020-09-08 wflow_publish(“analysis/index.Rmd”)
html c11d203 Dongyue Xie 2019-11-19 Build site.
Rmd e3dc37c Dongyue Xie 2019-11-19 wflow_publish(c(“analysis/initialinvest.Rmd”, “analysis/estimatenugget.Rmd”,
html b40056f Dongyue Xie 2019-02-14 Build site.
Rmd 3a91d12 Dongyue Xie 2019-02-14 wflow_publish(“analysis/index.Rmd”)
html 29125f9 Dongyue Xie 2019-01-06 Build site.
Rmd 9b7cd1f Dongyue Xie 2019-01-06 wflow_publish(c(“analysis/index.Rmd”, “analysis/binomial2.Rmd”))
html be0680c Dongyue Xie 2018-11-11 Build site.
Rmd 4db2338 Dongyue Xie 2018-11-11 wflow_publish(c(“analysis/index.Rmd”, “analysis/vstiter.Rmd”, “analysis/fda.Rmd”))
html e55f4a7 Dongyue Xie 2018-10-18 Build site.
Rmd 83d5406 Dongyue Xie 2018-10-18 wflow_publish(“analysis/index.Rmd”)
html 9916bf6 Dongyue Xie 2018-10-07 Build site.
Rmd 86d1b4b Dongyue Xie 2018-10-07 add
html 3ce9535 Dongyue Xie 2018-10-05 Build site.
Rmd 2f295fb Dongyue Xie 2018-10-05 add files
html ba619d6 Dongyue Xie 2018-10-02 Build site.
Rmd d7c4a01 Dongyue Xie 2018-10-02 revise
html eaee67b Dongyue Xie 2018-10-02 Build site.
Rmd eddcacf Dongyue Xie 2018-10-02 revise
html b3c65ef Dongyue 2018-06-04 chip seq data analysis
Rmd 7095e13 Dongyue 2018-06-04 chip seq data analysis
html 54238d8 Dongyue 2018-06-03 missing data
Rmd 3aa5ca5 Dongyue 2018-06-03 missing data
html 549140f Dongyue 2018-05-30 covariate iterative
Rmd 38e8063 Dongyue 2018-05-30 covariate iterative
html 4b0e6c4 Dongyue 2018-05-26 edit
Rmd d810c3e Dongyue 2018-05-26 edit
html 568cf1a Dongyue 2018-05-24 edit
Rmd 35cc4d6 Dongyue 2018-05-24 edit
Rmd ec87323 Dongyue 2018-05-24 edit
html 511adb2 Dongyue 2018-05-20 wave basis
Rmd 0f77e70 Dongyue 2018-05-20 wave basis
html 4b2e5d1 Dongyue 2018-05-17 add known version
Rmd bdddcf6 Dongyue 2018-05-14 edit
Rmd 7ee9791 Dongyue 2018-05-14 edit
html 5f9b5c6 Dongyue 2018-05-09 edit
Rmd fda2411 Dongyue 2018-05-09 one iteration ash poisson
html 3a42238 Dongyue 2018-05-08 correction
Rmd 910bc07 Dongyue 2018-05-08 correction
html cb91cb1 Dongyue 2018-05-08 add robust
Rmd 2e73919 Dongyue 2018-05-08 add robust
html 44d3413 Dongyue 2018-05-07 correct mu_t+E(u_t)
Rmd b082c1b Dongyue 2018-05-07 correct mu_t+E(u_t)
Rmd ee44935 Dongyue 2018-05-06 unknown sigma version
html 767077f Dongyue 2018-05-06 unknown sigma version
html 341e471 Dongyue 2018-05-06 edit
html c72247b Dongyue 2018-05-06 edit
Rmd 3bd6a61 Dongyue 2018-05-06 first commit
html ca322e8 Dongyue 2018-05-06 first commit
Rmd b7e89a3 DongyueXie 2018-05-01 Start workflowr project.

Project Overview

We generalize smash (Xing and Stephens, 2016), a flexible empirical Bayes method for signal denoising, to handle non-Gaussian data and to account for additional unknown variance.

This R package contains the functions for this project; the main function is smash_gen_poiss.R.

Analysis

A list of analyses related to the project.

Method

This is a review and summary I wrote in Sept 2018.

Introduction and method

Initial investigation

Early-stage analysis

Estimate nugget effect

Methods for estimating the nugget effect. The final method I chose is the maximum likelihood estimate (MLE) of \(\sigma^2\) in \(y\sim N(\mu,\sigma^2+s^2)\).
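A minimal sketch of this kind of estimate, under assumptions that are not stated on this page: \(\mu_t\) and \(s_t^2\) are treated as known (or plugged in from an earlier step), every \(s_t^2\) is positive, and the one-dimensional likelihood is maximized by golden-section search. The function name `nugget_mle` is hypothetical and is not part of the package.

```python
import math

def nugget_mle(y, mu, s2, tol=1e-8):
    """Maximize the likelihood of y_t ~ N(mu_t, sigma2 + s2_t) over sigma2 >= 0.

    Assumes mu_t and s2_t are known and s2_t > 0 (illustrative assumptions).
    """
    resid2 = [(yi - mi) ** 2 for yi, mi in zip(y, mu)]

    def nll(v):
        # Negative log-likelihood in sigma2 = v, dropping additive constants.
        return 0.5 * sum(math.log(v + s) + r / (v + s) for r, s in zip(resid2, s2))

    # Each term of nll is nondecreasing once v exceeds its squared residual,
    # so a minimizer lies in [0, max squared residual].
    a, b = 0.0, max(resid2) + 1e-12
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = nll(c), nll(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = nll(c)
        else:
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = nll(d)
    return 0.5 * (a + b)

# Toy check: with equal s2 the MLE has the closed form
# max(0, mean((y - mu)^2) - s2), which is 2.5 - 0.5 here.
sigma2_hat = nugget_mle([1.0, -1.0, 2.0, -2.0], [0.0] * 4, [0.5] * 4)
```

Golden-section search assumes the objective has a single minimum on the interval. That holds in the equal-variance toy case, but with very uneven \(s_t^2\) a grid search would be a safer first pass.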

Poisson data with unknown nugget effect

  1. Poisson nugget simulation (unknown \(\sigma\)): the nugget standard deviation \(\sigma\) is treated as unknown.
  2. Fix spike issues: updated version, 10/07/2018

Other analyses include different wavelet bases.

Binomial data with unknown nugget effect

A summary of binomial sequence smoothing

Other analyses include using a Poisson approximation.

Smoothing with covariates

Now suppose that at each \(t\), \(Y_t=X_t\beta+\mu_t+\epsilon_t\), where \(\mu\) has smooth structure and \(\epsilon_t\sim N(0,\sigma^2_t)\). Ordinary least squares (OLS) cannot explain the structure of \(\mu\), so that structure remains in the residual \(e\). Thus \(e\) consists of \(\mu\) plus noise, and applying smash.gaus to \(e\) recovers \(\mu\) and estimates \(\sigma^2\).

  1. Smoothing with covariates: Gaussian
  2. Smoothing with covariates: Gaussian, iterative version
  3. Smoothing with covariates: glm
  4. Smoothing with covariates: VST version
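The two-step idea described in this section (fit the covariate by OLS, then smooth the residual) can be sketched as follows. This is only an illustration under loud assumptions: a single covariate with no intercept, and a centered moving average standing in for smash.gaus, which is not reimplemented here. All function names are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope for a single covariate, no intercept."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def moving_average(e, half_width=1):
    """Stand-in smoother (NOT smash.gaus): centered mean, truncated at the edges."""
    n = len(e)
    out = []
    for t in range(n):
        window = e[max(0, t - half_width): min(n, t + half_width + 1)]
        out.append(sum(window) / len(window))
    return out

def smooth_with_covariate(x, y, half_width=1):
    """Step 1: OLS on the covariate. Step 2: smooth the residual e = y - x * beta_hat."""
    beta_hat = ols_slope(x, y)
    resid = [yi - beta_hat * xi for xi, yi in zip(x, y)]
    mu_hat = moving_average(resid, half_width)
    return beta_hat, resid, mu_hat

# Noise-free toy data: the covariate alternates in sign and is orthogonal
# to the step-shaped mu, so OLS recovers beta and the residual equals mu.
x = [1.0, -1.0] * 4
mu = [0.0] * 4 + [1.0] * 4
y = [2.0 * xi + mi for xi, mi in zip(x, mu)]
beta_hat, resid, mu_hat = smooth_with_covariate(x, y)
```

In this toy case the residual equals \(\mu\) exactly only because the covariate is orthogonal to \(\mu\). When it is not, part of the smooth structure is absorbed into the OLS estimate, which is presumably one motivation for the iterative version listed above.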

Unevenly spaced data

We treat the unobserved positions in unevenly spaced data as missing, setting them to 0 with a corresponding standard error of \(10^6\). The idea is that when the standard error is very large, the value of \(y\) becomes irrelevant. This approach does not work.

  1. Missing data?
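For concreteness, the padding scheme described in this section can be sketched as below. The helper `pad_to_grid` and its arguments are hypothetical, and the sketch assumes the observed positions are integer indices on a regular grid of length `n`. It shows only how the inputs are built, not the smoothing step.

```python
def pad_to_grid(positions, values, std_errors, n, big_se=1e6):
    """Place observations on a regular grid of length n.

    Grid points without an observation get value 0 and a very large
    standard error, so that they should carry almost no weight.
    """
    y = [0.0] * n
    se = [big_se] * n
    for pos, val, s in zip(positions, values, std_errors):
        y[pos] = val
        se[pos] = s
    return y, se

# Observations at grid indices 0, 1 and 4 of a length-6 grid.
y, se = pad_to_grid([0, 1, 4], [3.0, 2.5, 4.0], [1.0, 1.0, 1.0], n=6)
```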

Variance stabilizing transformation (VST)

In addition to likelihood expansion, a VST is another way to make the data approximately normally distributed.

  1. VST smoothing
  2. Compare log and Anscombe transforms
  3. VST for nugget effect
  4. More on the Anscombe transformation
  5. Compare VST and likelihood expansion
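As a reminder of what the Anscombe transform in the list above does, here is a small sketch. For Poisson counts \(x\), \(2\sqrt{x+3/8}\) has variance close to 1 when the mean is not too small. The function names are illustrative, and the inverse shown is the plain algebraic one, which is known to be biased for small counts.

```python
import math

def anscombe(x):
    """Anscombe transform for a Poisson count x."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)

def anscombe_inverse_algebraic(z):
    """Plain algebraic inverse of the transform (biased as an estimate of the mean)."""
    return (z / 2.0) ** 2 - 3.0 / 8.0

counts = [0, 1, 5, 20]
z = [anscombe(c) for c in counts]
back = [anscombe_inverse_algebraic(v) for v in z]
```

After transforming, the data can be passed to a Gaussian smoother with variance treated as roughly 1, and the smoothed curve is then mapped back to the count scale.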

ChIP-exo and ChIP-seq data smoothing

Some real-data applications of smash-gen.

The primary role of CTCF is thought to be in regulating the 3D structure of chromatin. CTCF binds together strands of DNA, thus forming chromatin loops, and anchors DNA to cellular structures like the nuclear lamina. It also defines the boundaries between active and heterochromatic DNA.

  1. CTCF ChIP-exo data
  2. CTCF ChIP-seq data

DNA methylation data smoothing

  1. BS Cancer data: DNA methylation data from normal and cancer samples.

Wavelet-based functional data analysis

  1. Apply smash-gen to functional data analysis

Literature on smoothing

I’m focusing on reading about: 1. additive models (gam, gamm, spam, gspam); 2. functional data analysis (wavelet-based functional mixed models, etc.); 3. exponential family signal denoising (vst, tf).

  1. A collection of literature
  2. Additive models
  3. Functional data analysis
  4. Signal denoising for exponential families

Miscellaneous

Not relevant to this project. Just for convenience.

  1. Shrink R squared using fash
  2. More examples of shrinking \(R^2\), with a comparison to CorShrink