Last updated: 2017-09-08

Code version: 8513a08

A sequence logo is a graphical representation of the patterns in a set of aligned sequences. Sequence logos were first introduced by Schneider & Stephens (1990). They show at a glance the residue preferences and the degree of conservation at each position. Such patterns often point to key functional sites, for example the binding site of a transcription factor or the conserved residues of a protein family.
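
For reference, the letter heights in a classic logo follow the information-content construction of Schneider & Stephens (1990). The sketch below assumes a DNA alphabet of 4 symbols and omits their small-sample correction term. Here $f_{b,l}$ denotes the frequency of symbol $b$ at position $l$. This is the standard construction, not necessarily the exact scaling Logolas uses.

$$
\begin{aligned}
H_l &= -\sum_{b \in \{A,C,G,T\}} f_{b,l}\,\log_2 f_{b,l}, \\
R_l &= \log_2 4 - H_l = 2 - H_l, \\
\text{height}(b, l) &= f_{b,l}\, R_l .
\end{aligned}
$$

For example, a position with $f_{A,l} = f_{C,l} = 0.5$ has $H_l = 1$ bit and $R_l = 1$ bit, so A and C are each drawn 0.5 bits tall. A fully conserved position has $R_l = 2$ bits, and a position with uniform frequencies has $R_l = 0$.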

There are many existing packages and tools for generating sequence logos, for example the R package seqLogo for displaying DNA sequence motifs, the web-based tool WebLogo, and Seq2Logo for constructing and visualizing amino acid binding motifs.

However, these packages and tools have certain limitations. In particular, they support only a limited library of symbols for making the plot, which greatly constrains the range of applications. For example, seqLogo can only plot logos made of alphabet letters.

To address these constraints and make new applications possible, we introduce Logolas, a new R package for plotting logos built from both standard and user-specified symbols. These symbols can be letters, numbers, punctuation marks and so on. More importantly, Logolas lets the user plot alphanumeric strings as logos, which greatly extends the scope of the visualization beyond transcription factor (TF) binding sites and protein sequences. We show examples of how this string representation is effective for visualizing mutation signature patterns, ecological species abundance patterns and more.

In addition, standard sequence logos show only the enrichment of residues, while their depletion is usually overlooked. Logolas provides a negative logo plot that displays both enrichment and depletion at the same time, which gives a more complete picture of each position.
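
One simple way to quantify enrichment and depletion, given here only as an illustration and not necessarily the score Logolas computes, is the log-ratio of the positional probability $p_{b,l}$ of symbol $b$ at position $l$ to an assumed background probability $q_b$:

$$
s_{b,l} = \log_2 \frac{p_{b,l}}{q_b}
$$

A positive $s_{b,l}$ marks an enriched symbol (drawn above the axis in a negative logo) and a negative $s_{b,l}$ a depleted one (drawn below it). With a uniform DNA background $q_b = 0.25$, a symbol with $p_{b,l} = 0.5$ gets $s_{b,l} = +1$, while one with $p_{b,l} = 0.0625$ gets $s_{b,l} = \log_2 0.25 = -2$.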

Furthermore, building on enrichment and depletion, we propose a new way of naming (calling) the symbols at each position. This Logolas-based nomenclature is a more general alternative to the IUPAC and PROSITE nomenclatures, which are used for nucleotides and amino acids, respectively.

Most existing logo plotting tools take a position weight matrix as input. This approach ignores the number of observations underlying each position weight. For example, a position weight computed for TF data from just 10 fragments mapping to that position is less reliable than one based on 100 fragments. In such a case, the user would want to shrink the composition probabilities toward a pre-defined background or prior much more strongly in the first case than in the second.

Logolas therefore provides a Dirichlet adaptive shrinkage method (dash), along the lines of the adaptive shrinkage (ash) approach of Stephens (2016), which adaptively shrinks the positional weights according to the number of observations at each position.
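
As a simplified illustration of why the counts matter, consider a fixed conjugate Dirichlet prior. This is not the dash procedure itself; the adaptive part, estimating the prior from the data, is not shown. Assume counts $n_{b,l}$ at position $l$ with total $N_l = \sum_b n_{b,l}$, observed frequencies $\hat p_{b,l} = n_{b,l}/N_l$, and a Dirichlet prior with parameters $\alpha_b = A\,q_b$, where $q_b$ is the background and $A$ a hypothetical concentration. The posterior mean is

$$
\tilde p_{b,l} = \frac{n_{b,l} + A\,q_b}{N_l + A} = w_l\,\hat p_{b,l} + (1 - w_l)\,q_b,
\qquad w_l = \frac{N_l}{N_l + A}.
$$

Taking $q_b = 0.25$, $A = 10$ and an observed frequency $\hat p_{b,l} = 0.8$: with $N_l = 10$ fragments, $w_l = 0.5$ and $\tilde p_{b,l} = 0.525$; with $N_l = 100$ fragments, $w_l \approx 0.909$ and $\tilde p_{b,l} = 0.75$. The same observed frequency is shrunk much more strongly toward the background when it rests on fewer observations.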

Beyond these features, Logolas supports new and flexible styles, textures and color patterns for the logos, lets users create their own logos and add them to the symbol library, and provides an easy interface for combining Logolas plots with R base graphics and ggplot2 graphs in multi-panel visualizations.

References

Schneider, T. D., & Stephens, R. M. (1990). Sequence logos: a new way to display consensus sequences. Nucleic Acids Research, 18(20), 6097-6100.

Stephens, M. (2016). False discovery rates: a new deal. Biostatistics, 18(2), 275-294.

Session information

sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 15063)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.4.0  backports_1.0.5 magrittr_1.5    rprojroot_1.2  
 [5] tools_3.4.0     htmltools_0.3.5 yaml_2.1.14     Rcpp_0.12.12   
 [9] stringi_1.1.5   rmarkdown_1.6   knitr_1.15.1    git2r_0.18.0   
[13] stringr_1.2.0   digest_0.6.12   evaluate_0.10  

This R Markdown site was created with workflowr