Last updated: 2017-09-09
Code version: bddb594
Here we list the data sources for the sake of reproducibility.
The JASPAR database contains a curated, non-redundant set of profiles derived from published collections of experimentally defined transcription factor binding sites for eukaryotes.
The JASPAR2014 release has been packaged as the Bioconductor data package JASPAR2014. To search this database, use the package TFBSTools.
Install and load the packages:
source("https://bioconductor.org/biocLite.R")
biocLite(c("JASPAR2014", "TFBSTools"))
library(TFBSTools)
library(JASPAR2014)
For a tutorial on searching the database, please refer to this webpage.
HOCOMOCO provides transcription factor (TF) binding models for 600 human and 395 mouse TFs. The data can be found on the download page.
To read the HOCOMOCO data, one can use the LoadMotifLibrary function from the R package atSNP:
library(devtools)
install_github("chandlerzuo/atSNP")  # atSNP is installed from GitHub
library(atSNP)
PWM <- LoadMotifLibrary('http://hocomoco10.autosome.ru/final_bundle/HUMAN/mono/HOCOMOCOv10_HUMAN_mono_jaspar_format.txt', tag = ">", transpose = FALSE, field = 1, sep = c("\t", " ", ">"), skipcols = 1, skiprows = 1, pseudocount = 0)
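The `pseudocount` argument (set to 0 in the call above) matters because a motif position with a zero count gives that base a probability of exactly 0, so any sequence carrying it gets a log-likelihood of minus infinity. The sketch below shows the usual normalization, assuming the pseudocount is added to every cell of a positions × 4 count matrix (columns A, C, G, T) before each position is rescaled to sum to 1. It is an illustration only, not atSNP's internal code; `counts_to_pwm` and the toy matrix are hypothetical.

```r
# Hypothetical helper: turn a count matrix (rows = motif positions,
# columns = A, C, G, T) into per-position base probabilities.
counts_to_pwm <- function(counts, pseudocount = 0) {
  counts <- counts + pseudocount   # shift every cell, including zeros
  counts / rowSums(counts)         # rescale each position to sum to 1
}

# Toy 3-position motif; base T is never observed at position 1.
toy <- matrix(c(8, 1, 1, 0,
                0, 10, 0, 0,
                2, 2, 3, 3),
              nrow = 3, byrow = TRUE,
              dimnames = list(NULL, c("A", "C", "G", "T")))

pwm0 <- counts_to_pwm(toy, pseudocount = 0)
pwm1 <- counts_to_pwm(toy, pseudocount = 1)
log(pwm0[1, "T"])   # -Inf: one zero count rules the base out entirely
log(pwm1[1, "T"])   # finite once a pseudocount is added
```

A small pseudocount keeps every base possible at every position, at the cost of slightly flattening strongly conserved positions.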
Manolis Kellis's webpage provides known and discovered motifs for the ENCODE TF ChIP-seq datasets. To read the motif file motifs.txt, one can again use the function LoadMotifLibrary:
PWM <- LoadMotifLibrary('http://compbio.mit.edu/encode-motifs/motifs.txt', tag = ">", transpose = FALSE, field = 1, sep = c("\t", " ", ">"), skipcols = 1, skiprows = 1, pseudocount = 0)
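Whichever source the motifs come from, the result is a set of PWMs, and a common next step is scoring DNA sequence against them. The sketch below is a generic log-odds scan against a uniform background (0.25 per base); it is not atSNP's own scoring method. `scan_pwm` and the toy motif are hypothetical, and the code assumes a positions × 4 probability matrix with columns A, C, G, T and a sequence containing only A, C, G and T.

```r
# Hypothetical helper: slide a PWM (rows = positions, columns = A, C, G, T)
# along a DNA string and return the log-odds score of every window
# relative to a background base distribution (uniform by default).
scan_pwm <- function(dna, pwm, background = rep(0.25, 4)) {
  idx <- match(strsplit(toupper(dna), "")[[1]], c("A", "C", "G", "T"))
  w <- nrow(pwm)
  n <- length(idx) - w + 1
  if (n < 1) return(numeric(0))
  # log(P(base | motif position) / P(base | background))
  logodds <- log(pwm) - log(matrix(background, w, 4, byrow = TRUE))
  vapply(seq_len(n), function(s) {
    # pick the log-odds of the observed base at each motif position
    sum(logodds[cbind(seq_len(w), idx[s:(s + w - 1)])])
  }, numeric(1))
}

# Toy motif that prefers "ATC".
pwm <- matrix(c(0.7, 0.1, 0.1, 0.1,
                0.1, 0.1, 0.1, 0.7,
                0.1, 0.7, 0.1, 0.1),
              nrow = 3, byrow = TRUE,
              dimnames = list(NULL, c("A", "C", "G", "T")))

scores <- scan_pwm("GGATCGG", pwm)
which.max(scores)   # 3: the window "ATC" starts at position 3
```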
Protein sequence motifs can be found in the 3PFDB database, which provides multiple alignment data, position-specific scoring matrices (PSSMs), PWMs (weighted observed percentages), and more.
The raw data for plotting Fig. 4 in Shiraishi et al. (2015) are in the data folder. The other cancer mutation data are available on the paper's GitHub page.
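The results of Alexandrov et al. (Nature, 2013) can be found here; their mutational signatures are defined over trinucleotide contexts, i.e. each substitution together with its 5' and 3' neighbouring bases.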
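In the trinucleotide classification, the substituted base is written as the pyrimidine (C or T) of its base pair, giving 6 substitution classes, each combined with 4 possible 5' bases and 4 possible 3' bases: 6 × 4 × 4 = 96 mutation types. The short sketch below enumerates them; the label format `A[C>A]A` is a common convention used here for illustration.

```r
# The six pyrimidine-centred substitution classes.
subs  <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
bases <- c("A", "C", "G", "T")

# Every combination of 5' base, substitution class and 3' base.
grid <- expand.grid(five = bases, sub = subs, three = bases,
                    stringsAsFactors = FALSE)
types <- sprintf("%s[%s]%s", grid$five, grid$sub, grid$three)

length(types)   # 96
```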
sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 15063)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.0 backports_1.0.5 magrittr_1.5 rprojroot_1.2
[5] tools_3.4.0 htmltools_0.3.5 yaml_2.1.14 Rcpp_0.12.12
[9] stringi_1.1.5 rmarkdown_1.6 knitr_1.15.1 git2r_0.18.0
[13] stringr_1.2.0 digest_0.6.12 evaluate_0.10
This R Markdown site was created with workflowr