Last updated: 2017-07-18

Code version: 49915de

Dirichlet adaptive shrinkage (dash) performs Bayesian adaptive shrinkage to produce refined probability estimates for compositional data. Assume the counts of A, C, G, T at one position follow a multinomial distribution: \((n_A,n_C,n_G,n_T)\sim \text{Multinomial}(n;p_A,p_C,p_G,p_T)\), where \(n=n_A+n_C+n_G+n_T\) and \(p_A+p_C+p_G+p_T=1\), and that the probability vector follows a mixture of symmetric Dirichlet distributions: \(p=(p_A,p_C,p_G,p_T)\sim \sum_{k=1}^K\pi_k\,\text{Dirichlet}(\alpha_k,...,\alpha_k)\).

The component Dirichlet distributions in the mixture have varying degrees of concentration, from infinity to less than 1. A concentration of Inf corresponds to a point mass at the mode \((\frac{1}{4},\frac{1}{4},\frac{1}{4},\frac{1}{4})\). Here, the \(\alpha_k,k=1,...,K\) are chosen to be \(Inf, 100, 50, 20, 10, 5, 2, 1, 0.5, 0.1\). To induce shrinkage, the prior on \(\pi_k,k=1,...,K\) is proportional to \((10,1,1,1,1,1,1,1,1,1)\), which places the most weight on the point-mass component.

The estimator is the posterior mean of \(p\). For example, \(\hat p_A=\sum_{k=1}^K\hat\pi_k\,\hat p_{Ak}\), where \(\hat p_{Ak}=\frac{n_A+\alpha_k}{n+4\alpha_k}\) is the posterior mean of \(p_A\) under the \(k\)-th Dirichlet component. Hence, \[\hat p_A=\sum_{k=1}^K\hat\pi_k\left[\frac{n_A}{n}\cdot\frac{n}{n+4\alpha_k}+\frac{\alpha_k}{4\alpha_k}\cdot\frac{4\alpha_k}{n+4\alpha_k}\right]\] \[=\frac{n_A}{n}\sum_{k=1}^K\hat\pi_k\frac{n}{n+4\alpha_k}+\frac{1}{4}\sum_{k=1}^K\hat\pi_k\frac{4\alpha_k}{n+4\alpha_k} .\]

Thus \(\hat p_A\) is a weighted average of the observed frequency \(\frac{n_A}{n}\) and the prior mean \(\frac{1}{4}\). As \(n\to \infty\), the weight \(\frac{n}{n+4\alpha_k}\) of every component with finite \(\alpha_k\) tends to 1, so \(\hat p_A\to \frac{n_A}{n}\) provided the point-mass component receives negligible posterior weight; as \(n\to 0\), \(\hat p_A\to \frac{1}{4}\). When the posterior weight concentrates on a single component (\(\hat\pi_k\to1\)) with \(\alpha_k\to\infty\), \(\hat p_A\to \frac{1}{4}\); when it concentrates on a component with \(\alpha_k\to0\), \(\hat p_A\to \frac{n_A}{n}\).
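The weighted-average form of the estimator can be checked numerically for a single position. The following is a minimal sketch in base R, not the dash implementation: the function names `log_marginal` and `shrink_counts` are hypothetical, and it assumes the mixture weights are held fixed at the prior proportions \((10,1,...,1)\) and updated only through the Dirichlet-multinomial marginal likelihood of that one position, whereas a full fit would presumably estimate \(\pi\) from many positions jointly.

```r
# Concentration grid from the text; Inf is a point mass at (1/4, 1/4, 1/4, 1/4).
alpha <- c(Inf, 100, 50, 20, 10, 5, 2, 1, 0.5, 0.1)

# ASSUMPTION: mixture weights fixed at the prior proportions (10, 1, ..., 1),
# rather than estimated from many positions.
prior_wt <- c(10, rep(1, 9))
prior_wt <- prior_wt / sum(prior_wt)

# Log marginal likelihood of the counts under one symmetric Dirichlet component.
# The multinomial coefficient is dropped because it is identical for every k.
log_marginal <- function(counts, a) {
  n <- sum(counts)
  m <- length(counts)
  if (is.infinite(a)) {
    return(n * log(1 / m))  # point mass at the uniform composition
  }
  lgamma(m * a) - lgamma(n + m * a) + sum(lgamma(counts + a) - lgamma(a))
}

# Hypothetical helper: posterior mean of p for one position.
shrink_counts <- function(counts, alpha, prior_wt) {
  n <- sum(counts)
  m <- length(counts)
  # Posterior component weights (pi-hat), computed stably on the log scale.
  log_post <- log(prior_wt) +
    vapply(alpha, function(a) log_marginal(counts, a), numeric(1))
  post_wt <- exp(log_post - max(log_post))
  post_wt <- post_wt / sum(post_wt)
  # Total weight on the observed frequency: sum_k pi-hat_k * n / (n + m * alpha_k).
  shrink <- sum(post_wt * n / (n + m * alpha))
  obs <- if (n > 0) counts / n else rep(1 / m, m)
  list(p_hat = shrink * obs + (1 - shrink) / m,
       post_wt = post_wt,
       shrink = shrink)
}

res <- shrink_counts(c(A = 8, C = 1, G = 1, T = 0), alpha, prior_wt)
res$p_hat
```

Each entry of `res$p_hat` lies between the observed frequency and \(\frac{1}{4}\), and `res$shrink` reports how much weight the observed frequencies receive. Working on the log scale with `lgamma` avoids overflow for large counts; the Inf component is handled separately because `lgamma(Inf)` is not finite.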

Session information

sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 15063)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.4.0  backports_1.0.5 magrittr_1.5    rprojroot_1.2  
 [5] tools_3.4.0     htmltools_0.3.5 yaml_2.1.14     Rcpp_0.12.11   
 [9] stringi_1.1.5   rmarkdown_1.6   knitr_1.15.1    git2r_0.18.0   
[13] stringr_1.2.0   digest_0.6.12   evaluate_0.10  

This R Markdown site was created with workflowr