Last updated: 2023-09-23
Checks: 7 0
Knit directory: gsmash/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20220606)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 9333155. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Untracked files:
Untracked: analysis/GO_ORA_montoro.Rmd
Untracked: analysis/GO_ORA_pbmc_purified.Rmd
Untracked: analysis/fit_ebpmf_sla_2000.Rmd
Untracked: chipexo_rep1_reverse.rds
Untracked: data/Citation.RData
Untracked: data/SLA/
Untracked: data/abstract.txt
Untracked: data/abstract.vocab.txt
Untracked: data/ap.txt
Untracked: data/ap.vocab.txt
Untracked: data/sla_2000.rds
Untracked: data/sla_full.rds
Untracked: data/text.R
Untracked: data/tpm3.rds
Untracked: output/driving_gene_pbmc.rds
Untracked: output/pbmc_gsea.rds
Untracked: output/plots/
Untracked: output/tpm3_fit_fasttopics.rds
Untracked: output/tpm3_fit_stm.rds
Untracked: output/tpm3_fit_stm_slow.rds
Untracked: sla.rds
Unstaged changes:
Modified: analysis/PMF_splitting.Rmd
Modified: analysis/fit_ebpmf_sla.Rmd
Modified: analysis/index.Rmd
Modified: code/poisson_STM/structure_plot.R
Modified: code/poisson_mean/pois_log_normal_mle.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/sla_flash_init.Rmd
) and
HTML (docs/sla_flash_init.html
) files. If you’ve configured
a remote Git repository (see ?wflow_git_remote
), click on
the hyperlinks in the table below to view the files as they were in that
past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 9333155 | DongyueXie | 2023-09-23 | wflow_publish("analysis/sla_flash_init.Rmd") |
The default initialization of EBPMF of mu (the ``latent’’ data) and \(\sigma^2\) is using VGA with mean 0 (after init of intercepts). However this initialization possibly leads to under-fitting, as demonstrated here.
In this analysis, we use another init strategy - we run flash (and let flash estiamte \(\sigma^2\)) on the mu (from VGA).
library(ebpmf)
library(fastTopics)
library(ggplot2)
sla_full <- readRDS("data/sla_full.rds")
dim(sla_full$data)
[1] 3207 10104
sum(sla_full$data==0)/prod(dim(sla_full$data))
[1] 0.9948157
doc_to_use = order(rowSums(sla_full$data),decreasing = T)[1:round(nrow(sla_full$data)*0.6)]
mat = sla_full$data[doc_to_use,]
samples = sla_full$samples
samples = lapply(samples, function(z){z[doc_to_use]})
word_to_use = which(colSums(mat>0)>=5)
mat = mat[,word_to_use]
set.seed(1)
# fit_ebpmf_K1 = ebpmf_log(mat,
# flash_control=list(backfit_extrapolate=T,backfit_warmstart=T,
# ebnm.fn = c(ebnm::ebnm_point_exponential, ebnm::ebnm_point_exponential),
# loadings_sign = 1,factors_sign=1,Kmax=1),
# init_control = list(n_cores=5,flash_est_sigma2=F,log_init_for_non0y=T),
# general_control = list(maxiter=100,save_init_val=T,save_latent_M=T),
# sigma2_control = list(return_sigma2_trace=T))
#plot(fit_ebpmf_K1$fit_flash$F_pm[,2])
#resid = flashier:::residuals.flash(fit_ebpmf_K1$fit_flash)
# saveRDS(fit_ebpmf_K1,file='/project2/mstephens/dongyue/poisson_mf/sla/sla_Kmax1_vgainit.rds')
fit_ebpmf_K1 = readRDS('/project2/mstephens/dongyue/poisson_mf/sla/sla_Kmax1_vgainit.rds')
fit_ebpmf_K1$elbo
[1] -520720.6
# fit_ebpmf_Kmax100 = ebpmf_log(mat,
# flash_control=list(backfit_extrapolate=T,backfit_warmstart=T,
# ebnm.fn = c(ebnm::ebnm_point_exponential, ebnm::ebnm_point_exponential),
# loadings_sign = 1,factors_sign=1,Kmax=100),
# init_control = list(n_cores=5,flash_est_sigma2=T,log_init_for_non0y=T),
# general_control = list(maxiter=100,save_init_val=T,save_latent_M=T),
# sigma2_control = list(return_sigma2_trace=T))
# saveRDS(fit_ebpmf_Kmax100,file='/project2/mstephens/dongyue/poisson_mf/sla/sla_Kmax100_flashinit.rds')
fit_ebpmf_Kmax100 = readRDS('/project2/mstephens/dongyue/poisson_mf/sla/sla_Kmax100_flashinit.rds')
fit_ebpmf_Kmax100$elbo
[1] -529834.3
# remove the first two columns (row and col intercepts)
L= fit_ebpmf_Kmax100$fit_flash$L_pm[,-c(1,2)]
F_pm = fit_ebpmf_Kmax100$fit_flash$F_pm[,-c(1,2)]
rownames(L)<-1:nrow(L)
Lnorm = t(t(L)/apply(L,2,max))
Fnorm = t(t(F_pm)*apply(L,2,max))
khat = apply(Lnorm,1,which.max)
Lmax = apply(Lnorm,1,max)
plot(Lmax)
khat[Lmax<0.1] = 0
keyw.nn =list()
for(k in 1:ncol(Fnorm)){
key = Fnorm[,k]>log(2)
keyw.nn[[k]] = (colnames(mat)[key])[order(Fnorm[key,k],decreasing = T)]
}
print(keyw.nn)
[[1]]
[1] "treatment" "depress" "adher" "placebo" "complianc"
[6] "noncompli" "causal" "estimand" "elder" "drug"
[11] "meet" "assign" "strata" "receiv" "guidelin"
[16] "arm" "protocol" "stratif" "children" "doserespons"
[21] "intervent" "efron" "plausibl" "outcom" "trial"
[26] "particip" "patient" "encourag" "dose" "sever"
[31] "subject" "princip" "imperfect" "fisher" "rubin"
[36] "treat" "random" "debat" "physician" "acknowledg"
[41] "primari" "care" "activ" "clinic" "prescrib"
[46] "effect" "educ" "assumpt" "logic" "instrument"
[51] "contrast" "doubleblind" "latent" "blind" "collabor"
[56] "import" "benefit" "opposit" "analys" "improv"
[61] "emphas" "childhood" "damag"
[[2]]
[1] "virus" "immunodefici" "hiv" "viral" "human"
[6] "resist" "pressur" "therapi" "mutat" "drug"
[11] "evolutionari" "transmiss" "pathway" "syndrom" "kinet"
[16] "infect" "respiratori" "immun" "riemannian"
[[3]]
[1] "fdr" "fals" "discoveri" "control" "stepdown"
[6] "stepup" "reject" "kfwer" "hochberg" "pvalu"
[11] "fwer" "fdp" "benjamini" "familywis" "hypothes"
[16] "singlestep" "sime" "soc" "roy" "bonferroni"
[21] "holm" "conserv" "ser" "divid" "intersect"
[26] "multipl" "procedur" "test" "null" "toler"
[31] "configur" "rate" "abil" "alpha" "power"
[36] "stringent" "proport" "stat" "gamma" "implicit"
[41] "simultan" "attent" "fix" "total" "number"
[46] "individu" "error" "ann" "depend" "appl"
[51] "detect" "defin" "nondecreas" "deriv" "der"
[56] "proc"
[[4]]
[1] "pacif" "forecast" "northwest" "probabilist" "energi"
[6] "calibr" "north" "ensembl" "weather" "geostatist"
[11] "american" "matern" "speed" "sharp" "wind"
[16] "predict" "resourc" "regim" "proper" "meteorolog"
[21] "hour" "score" "rule" "event" "crossvalid"
[26] "centr" "atmospher" "futur" "merg" "parsimoni"
[31] "climatolog" "safeti" "agricultur"
[[5]]
[1] "hazard" "surviv" "failur" "censor" "event"
[6] "recurr" "cure" "cox" "frailti" "rightcensor"
[11] "lengthbias" "cancer" "cohort" "transplant" "incid"
[16] "cumul" "breast" "proport" "baselin" "prostat"
[21] "bivari" "termin" "time" "life" "risk"
[26] "lifetim" "followup" "timedepend" "semiparametr" "compet"
[31] "death" "preval" "acceler" "casecohort" "patient"
[36] "timevari" "logrank" "diseas" "survivor" "progress"
[41] "occurr" "odd" "associ" "covari" "kaplanmei"
[46] "onset" "age" "copula" "medic" "subject"
[51] "equat" "clinic" "timetoev" "registri" "joint"
[56] "longterm" "trial" "dementia" "epidemiolog" "regress"
[61] "nonparametr" "studi" "depend" "estim" "data"
[66] "analysi"
[[6]]
[1] "markov" "chain" "mont" "carlo"
[5] "mcmc" "hidden" "revers" "sampler"
[9] "posterior" "jump" "updat" "algorithm"
[13] "bayesian" "parallel" "gibb" "transit"
[17] "hierarch" "state" "prior" "ergod"
[21] "mixtur" "walk" "augment" "space"
[25] "metropoli" "reversiblejump" "infer" "prohibit"
[29] "transdimension" "liu"
[[7]]
[1] "climat" "greenhous" "temperatur" "climatolog" "mitig"
[6] "atmospher" "northern" "earth" "proxi" "chang"
[11] "ozon" "ecolog" "futur" "trend" "opposit"
[16] "weather" "reconstruct" "global" "environment" "pollut"
[21] "longterm" "expert" "air" "tempor" "uncertainti"
[26] "quantifi" "centuri"
[[8]]
[1] "elect" "vote" "poll" "presidenti" "polit"
[6] "station" "quick" "invalid" "candid" "forecast"
[11] "evid" "york" "counti" "incom" "scientist"
[16] "percentag" "nonrespond" "nonignor" "nonrespons"
[[9]]
[1] "polici" "statistician" "maker" "promot" "disciplin"
[6] "decis" "today" "scienc" "technolog" "student"
[11] "organ" "live" "communic" "american" "polit"
[16] "children" "action" "govern" "effort" "intern"
[21] "bring" "role" "countri" "engin" "program"
[26] "social" "human" "foundat" "face" "nation"
[31] "confidenti" "way" "mathemat" "stronger" "industri"
[36] "advanc" "chang" "scientif" "encourag" "inform"
[41] "understand" "evolv" "broader" "place" "ingredi"
[46] "scientist" "play" "elementari" "closer" "protect"
[51] "access" "individu" "devic" "pressur" "option"
[56] "secondari" "spread" "public" "imposs" "knowledg"
[61] "futur" "forc" "excel" "modern" "statist"
[[10]]
[1] "loci" "allel" "locus" "map"
[5] "phenotyp" "pedigre" "retrospect" "trait"
[9] "linkag" "quantit" "geneenviron" "genet"
[13] "marker" "populationbas" "migrat" "household"
[17] "casecontrol" "domin" "chromosom" "popul"
[21] "genom" "gene" "polymorph"
[[11]]
[1] "motif" "bind" "transcript" "nucleotid" "width"
[6] "protein" "sequenc" "regul" "align" "regulatori"
[11] "delet" "pattern" "conserv" "short" "quick"
[16] "priori" "similar" "substant" "allevi" "site"
[21] "live" "switch" "dictionari" "core" "adjac"
[26] "dna" "yeast" "call" "discoveri" "twostag"
[31] "wish" "gene"
[[12]]
[1] "gene" "microarray" "express" "biolog" "array"
[6] "differenti" "cancer" "chromosom" "cdna" "hybrid"
[11] "evolutionari" "probe" "organ" "discoveri" "thousand"
[16] "diseas" "pathway" "breast" "shrinkag" "infect"
[21] "genom" "profil" "cell" "fals" "molecular"
[26] "technolog" "tissu" "dna" "experi" "multipl"
[31] "regul" "detect" "genet" "identifi" "challeng"
[36] "simultan" "throughput" "yeast" "analysi" "data"
[[13]]
[1] "toxic" "dosefind" "escal" "dose"
[5] "ethic" "reassess" "phase" "prespecifi"
[9] "coher" "trial" "clinic" "target"
[13] "patient" "elicit" "assign" "competit"
[17] "closest" "aforement" "human" "design"
[21] "enhanc" "ask" "parallel" "qualit"
[25] "lose" "probabl" "durat" "percentag"
[29] "satisfactorili" "drug" "guidelin" "grade"
[33] "soft" "continu" "physician" "variant"
[37] "virtual" "continuum"
[[14]]
[1] "lasso" "oracl" "penalti" "sparsiti"
[5] "spars" "penal" "nonzero" "dantzig"
[9] "selector" "select" "norm" "nonasymptot"
[13] "fan" "scad" "coeffici" "regular"
[17] "tune" "recoveri" "entri" "highdimension"
[21] "absolut" "matrix" "shrinkag" "convex"
[25] "adapt" "path" "larger" "threshold"
[29] "variabl" "graph" "clip" "element"
[33] "nois" "vector" "logarithm" "squar"
[37] "solv" "lnorm" "grow" "nonconvex"
[41] "true" "frobenius" "deviat" "logp"
[45] "pattern" "regress" "dimens" "bound"
[49] "properti" "size" "achiev" "number"
[53] "linear" "corrupt" "perform" "lregular"
[[15]]
[1] "morbid" "outbreak" "air" "pollut"
[5] "cardiovascular" "timedepend" "agenc" "public"
[9] "instrument" "person" "period" "matter"
[13] "environment" "deliveri" "mortal" "epidemiolog"
[17] "hope" "servic" "health"
[[16]]
[1] "besov" "wavelet" "ball" "phi" "decay"
[6] "threshold" "white" "nearoptim" "waveletbas" "rang"
[11] "wide" "nois" "dens" "convolut" "signal"
[16] "adapt" "noisi" "shape" "view" "minimax"
[21] "deconvolut"
[[17]]
[1] "null" "test" "hypothesi" "hypothes" "versus" "altern"
[7] "power" "ratio" "reject" "signific" "chisquar" "discoveri"
[13] "fals" "statist" "equal" "pvalu" "distribut" "expect"
[19] "asymptot" "nonnul" "true" "independ" "control" "procedur"
[[18]]
[1] "memori" "differenc" "longmemori" "taper"
[5] "periodogram" "whittl" "long" "slowli"
[9] "fraction" "nonstationari" "distinct" "move"
[13] "angl" "lambda" "frequenc" "seri"
[17] "element" "paramet"
[[19]]
[1] "onlin" "materi" "supplementari" "supplement"
[5] "proof" "detail" "technic" "articl"
[[20]]
[1] "highfrequ" "volatil" "asset" "lowfrequ" "financi" "price"
[7] "vast" "market" "exchang" "avail" "longrun" "nois"
[13] "wavelet" "noisi" "realiz" "pool" "period" "daili"
[19] "stock"
[[21]]
[1] "van" "der" "meng" "liu" "minor"
[6] "survivor" "biometrika" "bernoulli" "ergod" "augment"
[11] "ann"
[[22]]
[1] "garch" "volatil" "rescal" "innov"
[5] "fourth" "stationari" "heteroscedast" "moment"
[9] "autoregress" "reparameter" "financi" "move"
[13] "iid" "arma" "seri" "ergod"
[17] "residu" "root" "capabl" "process"
[21] "return" "satisfi" "sequenc" "local"
[25] "paper" "asymmetr"
[[23]]
[1] "propens" "score" "treatment" "confound" "rubin" "school"
[7] "prognost" "assign" "adjust" "unmeasur" "nonrandom" "doubli"
[13] "summar" "balanc" "outcom" "stratif" "return" "causal"
[19] "pretreat" "american" "miss" "potenti" "match" "averag"
[25] "correct" "collaps" "covari" "weight" "bias" "research"
[31] "reduc" "ubiquit" "observ"
[[24]]
[1] "earthquak" "explos" "discrimin" "extract" "occurr"
[6] "rescal" "spectra" "amplitud" "event" "wavelet"
[11] "seri" "thin" "featur" "diverg" "insid"
[16] "bear" "process" "background" "anisotrop" "depart"
[21] "california" "time" "bartlett" "parent"
[[25]]
[1] "bureau" "census" "race" "feder" "labor"
[6] "unemploy" "smallarea" "confidenti" "benchmark" "bridg"
[11] "imput" "add" "protect" "respond" "multiscal"
[16] "suppress" "employ" "implic" "aggreg" "collect"
[21] "survey" "nation" "statespac" "extrapol" "gender"
[26] "proxi" "nonrespons"
[[26]]
[1] "magnet" "reson" "fmri" "imag" "tissu" "voxel" "brain" "field"
[9] "activ" "volum" "signal" "visual" "motion"
[[27]]
[1] "nonrespons" "imput" "nonignor" "nonrespond" "survey"
[6] "panel" "preliminari" "benchmark" "miss" "incom"
[11] "agenc" "respond" "respons" "item" "plan"
[16] "agricultur" "calibr" "domain" "mechan" "counti"
[21] "valu" "compens" "handl" "household" "interview"
[26] "varianc" "race" "nation"
[[28]]
[1] "vaccin" "infecti" "infect" "suscept" "estimand" "transmiss"
[7] "household" "outbreak" "attack" "causal" "posttreat" "secondari"
[13] "protect" "individu" "communiti" "diseas" "syndrom" "efficaci"
[19] "interfer" "prevent" "defici" "outcom" "immun" "unit"
[25] "assign" "contact" "coverag" "reduc" "narrow" "placebo"
[31] "relat" "trial"
[[29]]
[1] "mother" "infant" "closur" "respiratori"
[5] "birth" "insensit" "advers" "assay"
[9] "convolut" "abrupt" "air" "code"
[13] "unmeasur" "citi" "hospit" "sure"
[17] "allevi" "tempor" "invalid" "happen"
[21] "notabl" "complementari" "cell" "action"
[25] "mixedeffect" "qualiti" "bioassay"
[[30]]
[1] "prior" "dirichlet" "posterior" "bayesian" "bay"
[6] "gibb" "mixtur" "frequentist" "hierarch" "intrins"
[11] "mode" "densiti" "conjug" "factor" "induc"
[16] "specif" "distribut" "process" "paradox" "probabl"
[21] "jeffrey" "polya" "model"
[[31]]
[1] "autocovari" "stream" "autocorrel" "seri" "meansquar"
[6] "white" "day" "autoregress" "longmemori" "time"
[[32]]
[1] "spacetim" "site" "asymmetr" "meteorolog" "ozon"
[6] "thin" "spatial" "smoother" "california" "tempor"
[11] "wind" "season" "daili" "monitor" "background"
[16] "intens" "year" "hour" "symmetr" "trend"
[21] "separ" "elabor" "emphas" "environment" "fit"
[26] "threat"
[[33]]
[1] "manifold" "riemannian" "geodes" "metric"
[5] "gender" "planar" "intrins" "tensor"
[9] "euclidean" "sphere" "embed" "orient"
[13] "space" "map" "perturb" "threedimension"
[17] "vision" "tangent" "shape" "intuit"
[21] "matric" "landmark" "diagnost" "cubic"
[[34]]
[1] "treatment" "trial" "clinic" "interim"
[5] "responseadapt" "patient" "endpoint" "alloc"
[9] "arm" "stage" "logrank" "efficaci"
[13] "design" "sequenti" "decis" "twostag"
[17] "rule" "prevent" "outcom" "assign"
[21] "criteria" "earli" "random" "event"
[25] "followup" "coin" "cancer" "medic"
[29] "control" "termin" "experienc" "firststag"
[33] "placebo" "test" "effect"
[[35]]
[1] "releas" "agenc" "registri" "selfconsist" "alter"
[6] "vehicl" "public" "databas" "census" "diagnosi"
[11] "report" "output" "protect" "threat" "delay"
[16] "data" "geograph"
[[36]]
[1] "spatial" "spatiotempor" "lattic" "krige"
[5] "scan" "surfac" "brain" "locat"
[9] "nonstationari" "map" "intens" "geostatist"
[13] "neighbor" "process" "stationari" "field"
[17] "block" "precipit" "voxel" "region"
[21] "inhomogen" "pattern" "hierarch" "correl"
[25] "geograph" "burden" "dataset" "structur"
[29] "point" "neuroimag" "weather" "comput"
[33] "satellit" "data"
[[37]]
[1] "minimax" "sobolev" "densiti" "sharp" "rate" "attain"
[7] "element" "inequ" "adapt" "optim" "risk" "loss"
[13] "unknown" "converg" "rateoptim" "gaussian" "sens" "prove"
[19] "smooth" "uniform" "vector" "lower" "bound" "class"
[25] "set" "estim" "function" "problem"
[[38]]
[1] "elicit" "question" "histor" "art" "uncertain"
[6] "psycholog" "answer" "hope" "prone" "inform"
[11] "respond" "judg" "hyperparamet" "believ" "colleagu"
[16] "expert" "closer" "person" "statistician" "particip"
[21] "pool" "lowdimension" "heurist" "thought" "bring"
[26] "kind" "indirect" "reach" "task" "encourag"
[31] "lack" "peopl" "logarithm" "modern" "review"
[36] "simpli" "conjug"
[[39]]
[1] "eigenvector" "eigenvalu" "tensor" "axe" "matric"
[6] "matrix" "invari" "scatter" "subspac" "diffus"
[11] "symmetr" "orthogon" "princip" "popul" "track"
[16] "covari" "spectral"
[[40]]
[1] "alcohol" "disord" "trait" "mental"
[5] "haplotyp" "dichotom" "ordin" "environment"
[9] "genet" "associ" "phenotyp" "believ"
[13] "quantit" "transmiss" "geneenviron" "ill"
[17] "routin" "zhang" "today" "topic"
[21] "wellestablish" "aggreg" "multist"
[[41]]
[1] "men" "sex" "women" "twosid"
[5] "parent" "opposit" "percentil" "race"
[9] "educ" "prefer" "record" "member"
[13] "household" "children" "onesid" "nation"
[17] "elder" "age" "complementari" "potenti"
[21] "conjunct" "stronger" "earlier" "crosssect"
[25] "detail" "characterist" "simpler" "affect"
[29] "wave" "rough" "stabil" "retrospect"
[[42]]
[1] "simex" "simulationextrapol" "errorpron"
[4] "longer" "withinclust" "frailti"
[7] "wang" "undersmooth"
[[43]]
[1] "undersmooth" "reml" "star" "selector"
[[44]]
[1] "bandwidth" "kernel" "selector" "crossvalid" "nonmonoton"
[6] "polynomi" "plugin" "local" "densiti" "smooth"
[11] "datadriven" "choos" "select" "hall" "chosen"
[16] "nearbi" "choic" "estim"
[[45]]
[1] "spectral" "periodogram" "densiti" "frequenc" "domain"
[6] "stationari" "whittl" "spectra" "fourier" "seri"
[11] "tail" "time" "calcul" "norm" "ldistanc"
[[46]]
[1] "filter" "particl" "statespac" "sequenti" "recurs"
[6] "state" "outofsampl" "resampl" "algorithm" "frequenc"
[11] "dynam" "iter" "mont" "carlo" "signaltonois"
[[47]]
[1] "earn" "encourag" "educ" "person" "train"
[6] "interview" "document" "instrument" "employ" "census"
[11] "incom" "accept" "categori" "manipul" "slight"
[16] "report" "feder" "prototyp" "file" "insensit"
[21] "subsequ" "total" "peopl" "resembl" "compens"
[[48]]
[1] "disabl" "live" "emphas" "debat" "foundat"
[6] "healthi" "consecut" "month" "progress" "life"
[11] "translat" "stationar" "feder" "daili" "longstand"
[16] "psycholog" "labor" "capac" "tabl" "preval"
[21] "framingham" "costeffect" "employ" "manipul" "report"
[26] "status" "health"
[[49]]
[1] "slice" "sir" "save" "invers" "central" "contour"
[7] "subspac" "exhaust" "reduct" "dimens" "nconsist" "goal"
[13] "direct" "averag" "regress" "method"
[[50]]
[1] "eigenfunct" "princip" "trajectori" "compon" "curv"
[6] "noisi" "compos" "elucid" "spars" "function"
[11] "eigenvalu" "span" "expans" "smooth" "random"
[16] "analysi" "deriv" "data"
[[51]]
[1] "custom" "retail" "compani" "deliveri"
[5] "arriv" "servic" "obvious" "center"
[9] "consum" "forecast" "bank" "irrelev"
[13] "capac" "week" "costeffect" "commerci"
[17] "contact" "tradit" "durat" "total"
[21] "tie" "lognorm" "satisfactorili" "pay"
[25] "style" "suppress" "histori" "today"
[29] "drop" "compon" "sex"
[[52]]
[1] "electr" "wind" "forecast" "load"
[5] "speed" "renew" "hour" "market"
[9] "daili" "energi" "power" "shortterm"
[13] "price" "distort" "heteroscedast" "bivari"
[17] "nuclear" "diagon" "serial" "width"
[21] "superpopul" "consumpt"
[[53]]
[1] "ovarian" "colorect" "geneenviron" "environment"
[5] "diseas" "exposur" "fine" "firststag"
[9] "ascertain" "smallsampl" "onset" "cancer"
[13] "lung" "genet" "casecontrol" "registri"
[17] "logist" "gather" "prostat" "complementari"
[21] "subject" "tumour" "proc" "studi"
[[54]]
[1] "racial" "stop" "depart" "race" "traffic"
[6] "york" "citi" "item" "enforc" "benchmark"
[11] "hour" "drive" "suspect" "person" "descent"
[16] "minor" "survey" "know" "california" "research"
[21] "polit"
[[55]]
[1] "immun" "overlap" "northwest" "vectorvalu" "contact"
[6] "stratum" "transmiss" "environment" "epidem" "appropri"
[11] "birth" "death" "problemat" "debat" "ecolog"
[16] "led" "delet" "role" "engin" "driven"
[21] "abund" "possibl" "proceed" "financ" "drawn"
[26] "transport"
[[56]]
[1] "classif" "classifi" "machin" "discrimin"
[5] "multicategori" "soft" "rule" "boost"
[9] "misclassif" "train" "support" "learn"
[13] "binari" "poor" "deliv" "loss"
[17] "diverg" "vector" "accumul" "featur"
[21] "centroid" "perform" "supervis"
[[57]]
[1] "biomark" "ftest" "birth" "devic" "exemplifi"
[6] "healthi" "anova" "cohort" "nutrit" "deliveri"
[11] "pathway" "exposur" "longterm" "initi" "expens"
[16] "prevent" "trajectori" "earli" "epidemiolog" "molecular"
[21] "status"
[[58]]
[1] "equivari" "scatter" "depth" "affin" "hyperplan"
[6] "locationscal" "breakdown" "tukey" "concept" "median"
[11] "project" "rootn" "ellipt" "plane" "bodi"
[16] "matrix" "multivari" "introduc" "translat"
[[59]]
[1] "surveil" "cancer" "counti" "incid" "institut"
[6] "epidemiolog" "mortal" "lung" "detect" "program"
[11] "unusu" "unit" "nation" "delay" "geograph"
[16] "epidem" "chang" "interview" "diseas" "prostat"
[21] "preliminari" "scan" "rate" "genuin"
[[60]]
[1] "suppli" "water" "beta" "kinet"
[5] "regulatori" "contamin" "tild" "tissu"
[9] "parallel" "regul" "flow" "lowerdimension"
[13] "column" "sigma" "epsilon" "perfect"
[17] "lregular" "speak" "superposit" "program"
[21] "tomographi" "semin" "uncertainti" "quantit"
[25] "suspect" "ideal" "emiss" "residu"
[29] "logarithm" "nonasymptot"
[[61]]
[1] "trade" "forecast" "datagener" "cubic"
[5] "day" "econom" "load" "stock"
[9] "autoregress" "feder" "lowerdimension" "daili"
[[62]]
[1] "usag" "content" "track" "document" "traffic" "capac"
[7] "wish" "copula" "mainten" "northern" "histor" "languag"
[[63]]
[1] "snp" "polymorph" "haplotyp" "genotyp" "nucleotid" "million"
[7] "genomewid" "parent" "gather" "genom" "scope" "genet"
[13] "singl" "diseas" "dna" "variant" "pedigre" "variat"
[19] "uncertain" "soft"
[[64]]
[1] "wishart" "cone" "graph" "conjug" "enrich"
[6] "decompos" "famili" "sigma" "zero" "graphic"
[11] "matric" "homogen" "shape" "tangent" "scalabl"
[16] "ann" "matrix" "invers" "definit" "correspond"
[21] "gaussian" "edg" "eigenvalu" "covari" "paramet"
[[65]]
[1] "nuisanc" "psi" "profil"
[4] "mestim" "theta" "lambda"
[7] "frequentist" "paramet" "elimin"
[10] "semiparametr" "likelihood" "infinitedimension"
[13] "ancillari"
[[66]]
[1] "registr" "curv" "amplitud" "landmark" "align" "crosssect"
[7] "phase" "radius" "curvatur" "shape" "metric" "geometri"
[13] "convex" "demograph" "variat" "tempor" "transform" "exhibit"
[19] "twostep" "closur"
[[67]]
[1] "health" "care" "servic" "hospit" "qualiti"
[6] "patient" "exposur" "physician" "status" "monitor"
[11] "visit" "survey" "age" "multilevel" "outcom"
[16] "nutrit" "year" "organ" "largest" "pattern"
[21] "prevent" "diseas" "report" "mortal" "nation"
[26] "state" "longitudin" "polici" "account" "popul"
[31] "administr" "analys"
[[68]]
[1] "mle" "mles" "siev" "brownian" "gap"
[6] "motion" "naiv" "status" "proof" "uniqu"
[11] "drift" "maximum" "likelihood" "main" "current"
[16] "prove"
[[69]]
[1] "confid" "interv" "coverag" "bootstrap" "band" "region"
[7] "invert" "nomin" "construct" "onesid" "limit" "asymptot"
[[70]]
[1] "factori" "design" "aberr" "doubl"
[5] "twolevel" "minimum" "complementari" "resolut"
[9] "run" "project" "factor" "fraction"
[13] "quantit" "engin" "construct" "agricultur"
[[71]]
[1] "cook" "influenti" "curvatur" "delet" "resolv" "discrep"
[7] "influenc" "perturb" "distanc" "rigor" "fundament" "address"
[13] "issu" "aim" "degre" "crosssect" "subset" "sir"
[[72]]
[1] "undirect" "graph" "boolean" "kinet" "chemic"
[6] "edg" "intern" "uniqu" "encod" "blood"
[11] "disjoint" "unidentifi" "correspond" "acycl" "graphic"
[16] "markov" "independ" "mitig" "transcript" "bind"
[21] "characteris" "centr" "read"
[[73]]
[1] "carrol" "liu" "lin" "backfit"
[5] "singleindex" "rootn" "amer" "assoc"
[9] "ball" "withinclust" "prototyp" "withinsubject"
[13] "stochast"
[[74]]
[1] "varyingcoeffici" "conduct"
[[75]]
[1] "predictor" "respons" "predict" "reduct" "regress"
[6] "scalar" "dimens" "distort" "trajectori" "squar"
[11] "unbias" "linear" "flexibl" "function"
[[76]]
[1] "upper" "bound" "lower" "risk" "tail" "radius" "deriv"
[[77]]
[1] "rankbas" "cam" "rank" "symmetri" "contour" "irrespect"
[7] "ellipt" "sign" "onestep" "acceler" "rootn" "uniform"
[13] "effici" "asymptot"
[[78]]
[1] "transport" "ozon" "format" "precipit"
[5] "atmospher" "satellit" "maxima" "pressur"
[9] "protein" "temperatur" "ensembl" "cycl"
[13] "weather" "forecast" "splinebas" "piecewiselinear"
[17] "mainten" "current" "asymmetr" "synthes"
[21] "retrospect"
[[79]]
[1] "subfamili" "asymmetr" "pivot" "reparameter" "mise"
[6] "famili" "skew" "symmetr" "frequentist" "subclass"
[11] "urn" "pursu" "disjoint" "add" "withingroup"
[[80]]
[1] "pca" "twoway" "embed" "succeed" "princip"
[6] "logp" "spike" "reduct" "perturb" "tree"
[11] "transit" "compon" "evolutionari" "eigenvalu" "lose"
[16] "sudden" "crossov" "geodes" "anim" "topolog"
[21] "diagon"
[[81]]
[1] "loglikelihood" "edgeworth" "quadrat" "ratio"
[5] "gaussian" "likelihood" "maximum"
[[82]]
[1] "kendal" "tau" "truncat" "copula" "shape"
[6] "angl" "sphere" "symmetr" "tomographi" "speak"
[11] "emb" "geodes"
[[83]]
[1] "evolut" "cycl" "mutat" "human" "dynam"
[6] "histor" "reconstruct" "period" "sequenc" "understand"
[11] "recognit" "time" "wind"
[[84]]
[1] "pseudo" "manifest" "collaps" "nconsist" "categori"
[6] "cross" "reweight" "hoc" "ail" "degener"
[11] "jackknif" "nonrespond"
[[85]]
[1] "slower" "converg" "rate"
[[86]]
[1] "entropi" "metric" "bracket" "theorem"
[[87]]
[1] "trim" "depth" "discard" "radius" "ellipt" "breakdown"
[7] "subsampl" "robust" "contamin" "outlier" "remov" "serv"
[13] "induc" "feasibl"
[[88]]
[1] "aic" "bic" "akaik" "criterion" "switch" "ail"
[7] "select" "criteria"
[[89]]
[1] "rotat" "axe" "motion" "sphere" "translat"
[6] "isotrop" "mise" "omega" "transform" "splinebas"
[11] "spheric" "map" "domain" "autocorrel" "anatom"
[[90]]
[1] "centuri" "neyman" "speak" "scientist" "tstatist"
[6] "million" "opportun" "statistician" "mortal" "earli"
[11] "deliv" "compris" "practition" "bring" "environ"
[16] "scientif" "thousand" "presidenti" "address" "histor"
[21] "fisher" "polit" "manipul" "merg" "age"
[26] "ago" "tdistribut"
[[91]]
[1] "peopl" "name" "widespread" "network" "interfer"
[6] "social" "environ" "sourc" "welldefin" "read"
[11] "organ" "engin" "psycholog"
[[92]]
[1] "compound" "nondiscoveri" "multipletest" "doserespons"
[5] "oracl" "voxel" "chemic" "datadriven"
[9] "realdata" "fals" "ineffici" "hypothes"
[13] "fdr" "subject" "minim" "constraint"
[17] "convent" "simultan" "tau" "optim"
[21] "landmark" "procedur" "responseadapt" "ethic"
[[93]]
[1] "jin" "tukey" "subtl" "distancebas" "succeed"
[6] "nonzero" "boundari" "fraction" "amplitud" "higher"
[11] "critic" "detect" "neighbour" "mention" "complianc"
[16] "nearest" "leav" "phenomena" "supremum" "lower"
[21] "qualit"
[[94]]
[1] "homoscedast" "heteroscedast" "multiscal" "transform"
[5] "regress"
[[95]]
[1] "bar" "vertic" "cap" "lambda" "theta" "beta" "psi" "hasti"
[[96]]
[1] "deconvolut" "blur" "fourier" "errorsinvari" "radiat"
[6] "polynomi" "argument" "viewpoint" "place" "discret"
[11] "recov" "kernel" "densiti" "physic" "unrealist"
[16] "error" "problem" "function"
[[97]]
[1] "languag" "abil" "underpin" "recognit" "expert"
[6] "introduct" "happen" "diagnosi" "encod" "logic"
[11] "uncertainti" "proven" "make" "kind" "learn"
[16] "mathemat" "possibl" "rich" "theori" "laplac"
[21] "deal" "notion" "machin" "argument" "attempt"
[26] "purpos" "benefit"
[[98]]
[1] "unspecifi" "wilk" "postul" "submodel"
[5] "distributionfre" "interact" "follow" "hold"
[9] "margin" "semiparametr" "baselin" "leav"
[13] "prone"
[[99]]
[1] "theorem" "central" "limit"
structure_plot_general = function(Lhat,Fhat,grouping,title=NULL,
loadings_order = 'embed',
print_plot=FALSE,
seed=12345,
n_samples = NULL,
gap=40,
std_L_method = 'sum_to_1',
show_legend=TRUE,
K = NULL
){
set.seed(seed)
#s <- apply(Lhat,2,max)
#Lhat <- t(t(Lhat) / s)
if(is.null(n_samples)&all(loadings_order == "embed")){
n_samples = 2000
}
if(std_L_method=='sum_to_1'){
Lhat = Lhat/rowSums(Lhat)
}
if(std_L_method=='row_max_1'){
Lhat = Lhat/c(apply(Lhat,1,max))
}
if(std_L_method=='col_max_1'){
Lhat = apply(Lhat,2,function(z){z/max(z)})
}
if(std_L_method=='col_norm_1'){
Lhat = apply(Lhat,2,function(z){z/norm(z,'2')})
}
if(!is.null(K)){
Lhat = Lhat[,1:K]
Fhat = Fhat[,1:K]
}
Fhat = matrix(1,nrow=3,ncol=ncol(Lhat))
if(is.null(colnames(Lhat))){
colnames(Lhat) <- paste0("k",1:ncol(Lhat))
}
fit_list <- list(L = Lhat,F = Fhat)
class(fit_list) <- c("multinom_topic_model_fit", "list")
p <- structure_plot(fit_list,grouping = grouping,
loadings_order = loadings_order,
n = n_samples,gap = gap,verbose=F) +
labs(y = "loading",color = "dim",fill = "dim") + ggtitle(title)
if(!show_legend){
p <- p + theme(legend.position="none")
}
if(print_plot){
print(p)
}
return(p)
}
structure_plot_general(Lnorm,Fnorm,grouping = samples$journal,std_L_method = 'col_max_1')
Running tsne on 508 x 99 matrix.
Running tsne on 280 x 99 matrix.
Running tsne on 885 x 99 matrix.
Running tsne on 251 x 99 matrix.
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /software/R-4.1.0-no-openblas-el7-x86_64/lib64/R/lib/libRblas.so
LAPACK: /software/R-4.1.0-no-openblas-el7-x86_64/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C
[4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_3.4.1 fastTopics_0.6-158 ebpmf_2.3.3 workflowr_1.6.2
loaded via a namespace (and not attached):
[1] mcmc_0.9-7 bitops_1.0-7 matrixStats_0.59.0
[4] fs_1.5.0 progress_1.2.2 httr_1.4.5
[7] rprojroot_2.0.2 tools_4.1.0 bslib_0.4.2
[10] utf8_1.2.3 R6_2.5.1 irlba_2.3.5.1
[13] uwot_0.1.14 lazyeval_0.2.2 colorspace_2.1-0
[16] withr_2.5.0 wavethresh_4.7.2 prettyunits_1.1.1
[19] tidyselect_1.2.0 ebpm_0.0.1.3 compiler_4.1.0
[22] git2r_0.28.0 glmnet_4.1-2 cli_3.6.1
[25] quantreg_5.94 SparseM_1.81 plotly_4.10.1
[28] labeling_0.4.2 horseshoe_0.2.0 sass_0.4.0
[31] smashrgen_1.2.5 caTools_1.18.2 flashier_0.2.51
[34] scales_1.2.1 mvtnorm_1.1-2 SQUAREM_2021.1
[37] quadprog_1.5-8 pbapply_1.7-0 mixsqp_0.3-48
[40] stringr_1.5.0 digest_0.6.31 rmarkdown_2.9
[43] MCMCpack_1.6-3 deconvolveR_1.2-1 vebpm_0.4.9
[46] pkgconfig_2.0.3 htmltools_0.5.4 highr_0.9
[49] fastmap_1.1.0 invgamma_1.1 htmlwidgets_1.6.1
[52] rlang_1.1.1 rstudioapi_0.13 farver_2.1.1
[55] shape_1.4.6 jquerylib_0.1.4 generics_0.1.3
[58] jsonlite_1.8.4 dplyr_1.1.0 magrittr_2.0.3
[61] smashr_1.3-6 Matrix_1.5-3 Rcpp_1.0.10
[64] munsell_0.5.0 fansi_1.0.4 lifecycle_1.0.3
[67] RcppZiggurat_0.1.6 stringi_1.6.2 whisker_0.4
[70] yaml_2.3.7 MASS_7.3-54 Rtsne_0.16
[73] grid_4.1.0 parallel_4.1.0 promises_1.2.0.1
[76] ggrepel_0.9.3 crayon_1.5.2 lattice_0.20-44
[79] cowplot_1.1.1 splines_4.1.0 hms_1.1.2
[82] knitr_1.33 pillar_1.8.1 softImpute_1.4-1
[85] codetools_0.2-18 glue_1.6.2 evaluate_0.14
[88] trust_0.1-8 data.table_1.14.8 RcppParallel_5.1.7
[91] foreach_1.5.1 vctrs_0.6.2 nloptr_2.0.3
[94] httpuv_1.6.1 MatrixModels_0.5-1 gtable_0.3.1
[97] purrr_1.0.1 ebnm_1.0-54 tidyr_1.3.0
[100] ashr_2.2-54 cachem_1.0.5 xfun_0.24
[103] Rfast_2.0.7 coda_0.19-4 later_1.3.0
[106] mr.ash_0.1-87 survival_3.2-11 viridisLite_0.4.1
[109] truncnorm_1.0-8 tibble_3.2.1 iterators_1.0.13
[112] ellipsis_0.3.2