Project Overview

This project studies nucleotide depletion in DNA sequences and how to visualize it. Transcription factor binding sites are characterized by DNA sequence motifs, and sequence logos are a powerful tool for visualizing those motifs. Standard logos highlight enrichment, while depletion is usually overlooked. Here we introduce the negative logo plot, in which depletion is drawn below zero on the y-axis, so that both strong positive and strong negative effects are visible.
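To make the idea of a signed score concrete, here is a minimal sketch in Python. It assumes a simple log2 ratio of observed base frequency to a uniform background, with a small pseudocount. This is an illustrative stand-in, not necessarily the scoring that Logolas uses, and the function name `signed_scores`, its parameters, and the example counts are all hypothetical.

```python
import math

def signed_scores(counts, background=None, pseudocount=0.5):
    """Return a signed score per base for one motif position.

    counts: dict mapping base -> observed count at this position.
    background: dict mapping base -> background probability
                (assumed uniform when not given).
    pseudocount: added to every count so that a zero count still
                 yields a finite (strongly negative) score.
    """
    bases = sorted(counts)
    if background is None:
        background = {b: 1.0 / len(bases) for b in bases}
    total = sum(counts.values()) + pseudocount * len(bases)
    scores = {}
    for b in bases:
        freq = (counts[b] + pseudocount) / total
        # Positive: enriched relative to background; negative: depleted.
        scores[b] = math.log2(freq / background[b])
    return scores

# Hypothetical counts at one position: A dominates, T is never observed.
column = {"A": 90, "C": 5, "G": 5, "T": 0}
scores = signed_scores(column)
for base, score in scores.items():
    side = "above zero (enriched)" if score > 0 else "below zero (depleted)"
    print(f"{base}: {score:+.2f} {side}")
```

In a plot built from such scores, A would be stacked above the axis, while T, the most depleted base, would extend furthest below it.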

The plots are generated with the R package Logolas. See GitHub or Bioconductor for details.

Analysis

The results of the analysis are listed below.

  1. Negative logo plot: a new visualization.
  2. EDA: insights into the data.
  3. Depletion: the opposite of enrichment.
  4. Dimer: an interesting finding.
  5. Dirichlet adaptive shrinkage: let the data decide.
  6. The power of dash: illustrative examples.
  7. Cancer Mutation Signature: a new view.
  8. Consensus sequence: an informative way.
  9. Plant Motif: dealing with background probability.
  10. Protein Motif: more logos, more challenges.

Data

All the results here are reproducible. See here for the list of data sources. The code can be found in the R Markdown files under the analysis folder, and the data are in the data folder of the GitHub repository.

Credits

Thanks to Kushal K Dey, Matthew Stephens, John Blischak, and Hussein Al-Asadi for their great help.


This R Markdown site was created with workflowr