Last updated: 2017-09-09

Code version: bddb594

Here we list the data sources for the sake of reproducibility.

TFBS data

JASPAR

The JASPAR database contains a curated, non-redundant set of profiles derived from published collections of experimentally defined transcription factor binding sites for eukaryotes.

The 2014 release of the database is distributed as the R package JASPAR2014. To search it, use the package TFBSTools.

Install packages:

# Install both packages from Bioconductor in one call
source("https://bioconductor.org/biocLite.R")
biocLite(c("JASPAR2014", "TFBSTools"))

library(TFBSTools)
library(JASPAR2014)

For a tutorial on how to search the database, please refer to this webpage.
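As a quick illustration of such a search, the sketch below fetches one profile by ID and all human profiles, then converts a count matrix to a PWM. It assumes both packages are installed; the ID "MA0004.1", the species filter 9606 (human), and the pseudocount of 0.8 are illustrative choices, not values taken from this analysis.

```r
library(TFBSTools)
library(JASPAR2014)

# Fetch a single position frequency matrix by its JASPAR ID
# ("MA0004.1" is only an example ID).
pfm <- getMatrixByID(JASPAR2014, ID = "MA0004.1")

# Fetch every profile matching a set of filters; here, human (taxonomy ID 9606),
# latest version of each profile only.
opts <- list()
opts[["species"]] <- 9606
opts[["all_versions"]] <- FALSE
human_pfms <- getMatrixSet(JASPAR2014, opts)

# Convert the count matrix to a position weight matrix.
# The pseudocount value is an assumption chosen for illustration.
pwm <- toPWM(pfm, pseudocounts = 0.8)
```

The result of getMatrixSet is a list-like object, so individual profiles can be pulled out by index or by ID.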

HOCOMOCO

HOCOMOCO provides transcription factor (TF) binding models for 600 human and 395 mouse TFs. The data can be found on the download page.

To read the data from HOCOMOCO, one can use the LoadMotifLibrary function from the R package atSNP.

library(devtools)
install_github("chandlerzuo/atSNP")
library(atSNP)

PWM <- LoadMotifLibrary(
  "http://hocomoco10.autosome.ru/final_bundle/HUMAN/mono/HOCOMOCOv10_HUMAN_mono_jaspar_format.txt",
  tag = ">", transpose = FALSE, field = 1,
  sep = c("\t", " ", ">"), skipcols = 1, skiprows = 1, pseudocount = 0
)

Manolis Kellis webpage

Manolis Kellis's webpage contains known and discovered motifs for the ENCODE TF ChIP-seq datasets. To read the data file motifs.txt, one can use the function LoadMotifLibrary.

PWM <- LoadMotifLibrary(
  "http://compbio.mit.edu/encode-motifs/motifs.txt",
  tag = ">", transpose = FALSE, field = 1,
  sep = c("\t", " ", ">"), skipcols = 1, skiprows = 1, pseudocount = 0
)
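To make the role of these arguments more concrete, here is a minimal base-R sketch of reading a motif file of this general shape: a header line starting with ">", followed by one row per position whose first column is a label and whose remaining four columns are the A, C, G, T values. This is not atSNP's implementation; the function parse_motifs, the inline example lines, and the assumed column order are all hypothetical and only illustrate the idea of a header tag, a name field, and a skipped first column.

```r
# Hypothetical helper: parse ">"-tagged motif blocks into a named list of matrices.
parse_motifs <- function(lines, tag = ">") {
  lines <- lines[nzchar(trimws(lines))]   # drop blank lines
  is_header <- startsWith(lines, tag)
  block <- cumsum(is_header)              # block id for every line
  motifs <- list()
  for (b in unique(block[block > 0])) {
    chunk <- lines[block == b]
    if (length(chunk) < 2) next           # header without any rows
    # The motif name is the first field after the tag (cf. field = 1).
    header <- trimws(substring(chunk[1], nchar(tag) + 1))
    name <- strsplit(header, "[ \t]+")[[1]][1]
    # Drop the first column of each row (cf. skipcols = 1), keep the numbers.
    rows <- strsplit(trimws(chunk[-1]), "[ \t]+")
    mat <- do.call(rbind, lapply(rows, function(r) as.numeric(r[-1])))
    colnames(mat) <- c("A", "C", "G", "T") # assumed column order
    motifs[[name]] <- mat
  }
  motifs
}

# Made-up input in the assumed format (not real ENCODE motifs).
example_lines <- c(
  ">motifA hypothetical",
  "A 0.7 0.1 0.1 0.1",
  "T 0.1 0.1 0.1 0.7",
  ">motifB hypothetical",
  "G 0.0 0.0 1.0 0.0",
  "N 0.25 0.25 0.25 0.25",
  "C 0.0 1.0 0.0 0.0"
)

pwm_list <- parse_motifs(example_lines)
names(pwm_list)
dim(pwm_list$motifB)
```

Each list element is a matrix with one row per motif position, which makes it easy to check that every row sums to one before using the motifs downstream.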

PlantTFDB

PlantTFDB is a database of transcription factors for more than 160 species, covering the main lineages of green plants. The data can be found on the download page, under the item “Sets of TF binding motifs for 156 species (157 organisms)”.

Protein Motif

Protein sequence motifs can be found in the 3PFDB database. The database provides multiple alignment data, PSSMs, PWMs (weighted observed percentages), etc.

Mutation Signature

The raw data for plotting Fig. 4 in Shiraishi et al. (2015) are in the data folder. The other cancer mutation data are available on the GitHub page of the paper.

Alexandrov’s results (Nature, 2013) can be found here; they are defined on trinucleotides.

Session information

sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 15063)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.4.0  backports_1.0.5 magrittr_1.5    rprojroot_1.2  
 [5] tools_3.4.0     htmltools_0.3.5 yaml_2.1.14     Rcpp_0.12.12   
 [9] stringi_1.1.5   rmarkdown_1.6   knitr_1.15.1    git2r_0.18.0   
[13] stringr_1.2.0   digest_0.6.12   evaluate_0.10  

This R Markdown site was created with workflowr