MEME suite


The MEME suite is a collection of tools for the discovery and analysis of sequence motifs.

Motif discovery

MEME

Main article: Multiple EM for Motif Elicitation

Multiple EM for Motif Elicitation (MEME), where EM stands for expectation maximization, is a tool for discovering motifs in a group of related DNA or protein sequences. MEME takes such a group as input and outputs as many motifs as requested, up to a user-specified statistical confidence threshold. It uses statistical modeling techniques to automatically choose the best width, number of occurrences, and description for each motif.

GLAM2

Gapped Local Alignment of Motifs (GLAM2) is a tool for discovering gapped motifs in a group of DNA or protein sequences. Unlike MEME, GLAM2 does not try to find several different motifs in a single run. Instead, it performs replicates: it tries to find the best possible motif multiple times.

DREME

Discriminative Regular Expression Motif Elicitation (DREME) is a tool for discovering motifs in large collections of sequences. DREME is computationally efficient and therefore is suitable for motif search on large data sets derived from ChIP-seq (Chromatin immunoprecipitation followed by sequencing) experiments. In the interest of computational efficiency, DREME finds only motifs that can be expressed in the IUPAC alphabet, which contains the standard DNA alphabet ACGT as well as eleven 'wildcard' characters (for example, R indicates either A or G).
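To make the IUPAC representation concrete, the following minimal Python sketch translates an IUPAC motif into an ordinary regular expression and scans a sequence with it. It illustrates the alphabet only and is not DREME's discovery algorithm; the function names `iupac_to_regex` and `find_sites` are hypothetical and chosen for this example.

```python
import re

# IUPAC nucleotide codes: the 4 standard bases plus 11 ambiguity
# ("wildcard") codes, each mapped to the bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif string into an equivalent regular expression."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        # Single bases stay literal; wildcards become character classes.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_sites(motif, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # The lookahead group lets the scan report overlapping occurrences.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]
```

For example, the motif `RGATAA` becomes the pattern `[AG]GATAA`, which matches both `AGATAA` and `GGATAA`.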

MEME-ChIP

MEME-ChIP is a tool for discovering motifs in large data sets derived from ChIP-seq experiments.

Motif search

FIMO

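Find Individual Motif Occurrences (FIMO) is a tool for finding instances of motifs in a sequence database. FIMO searches the database for each of the provided motifs and reports a p-value and a q-value for each match. The q-value corrects for the large number of positions tested by estimating the false discovery rate at which the match would be considered significant.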
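As an illustration of how q-values relate to per-match p-values, the sketch below applies the Benjamini-Hochberg procedure, a standard way of converting p-values into false-discovery-rate estimates. This is an assumption made for illustration: the article does not state which procedure FIMO uses, and the function name `benjamini_hochberg_qvalues` is hypothetical.

```python
def benjamini_hochberg_qvalues(pvalues):
    """Return one q-value per p-value, in the original order.

    Assumption: Benjamini-Hochberg adjustment; FIMO's own procedure may differ.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

A match is then typically reported as significant when its q-value falls below a chosen false discovery rate, such as 0.05.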

GLAM2SCAN

GLAM2SCAN is a tool for finding occurrences of a GLAM2 motif in a sequence database.

MAST

Motif Alignment & Search Tool (MAST) is a tool for searching biological sequence databases for sequences that contain an occurrence of each motif in a given set of motifs. MAST scores the matches and reports four significance statistics (three p-values and an E-value):

  • Position p-value: The p-value of a match of a given position within a sequence to a motif is defined as the probability of a randomly selected position in a randomly generated sequence having a match score at least as large as that of the given position. Note: If MAST is combining reverse complement DNA strands, the position p-value is not corrected for multiple tests.
  • Sequence p-value: The p-value of a match of a sequence to a motif is defined as the probability of a randomly generated sequence of the same length having a match score at least as large as the largest match score of any position in the sequence.
  • Combined p-value: The p-value of a match of a sequence to a group of motifs is defined as the probability of a randomly generated sequence of the same length having sequence p-values whose product is at least as small as the product of the sequence p-values of the matches of the motifs to the given sequence.
  • E-value: The E-value of the match of a sequence in a database to a group of motifs is defined as the expected number of sequences in a random database of the same size that would match the motifs as well as the sequence does and is equal to the combined p-value of the sequence times the number of sequences in the database.
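The combined p-value and the E-value defined above can be sketched in a few lines of Python. The sketch assumes that, under the random-sequence model, the sequence p-values of the individual motifs are independent and uniformly distributed on (0, 1). In that case the probability that a product of n such values is at most p equals p multiplied by the sum of (-ln p)^i / i! for i from 0 to n - 1. The article does not say how MAST computes this quantity internally, so the function names `product_pvalue` and `e_value` are hypothetical.

```python
import math


def product_pvalue(pvalues):
    """P(product of n independent Uniform(0,1) values <= observed product).

    Assumption: the per-motif sequence p-values are independent and uniform
    under the null model.
    """
    n = len(pvalues)
    product = math.prod(pvalues)
    if product <= 0.0:
        return 0.0
    neg_log = -math.log(product)
    # Accumulate sum_{i=0}^{n-1} (-ln p)^i / i! term by term.
    term, total = 1.0, 1.0
    for i in range(1, n):
        term *= neg_log / i
        total += term
    return min(1.0, product * total)


def e_value(combined_p, n_sequences):
    """Expected number of equally good matches in a random database."""
    return combined_p * n_sequences
```

With a single motif the combined p-value reduces to the sequence p-value itself. With two motifs that each have a sequence p-value of 0.1, the product is 0.01, but the combined p-value is about 0.056, because many different pairs of p-values yield a product that small.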

Motif enrichment analysis

SpaMo

Spaced Motif Analysis Tool (SpaMo) is a tool for inferring interactions between transcription factors. SpaMo takes a set of sequences (typically sequences surrounding ChIP-seq peaks), a motif represented in these sequences, and a database of known motifs. SpaMo searches the database for instances of database motifs enriched in sites neighboring the given motif. These enrichments suggest physical interaction between the factors that bind each motif.

CentriMo

Central Motif Enrichment Analysis (CentriMo) is a tool for inferring direct DNA binding from ChIP-seq data. CentriMo is based on the observation that the positional distribution of binding sites matching the direct-binding motif tends to be unimodal, well centered and maximal in the precise center of the ChIP-seq peak regions. CentriMo takes a set of sequences and plots the occurrence of motifs relative to the ChIP-seq peak. Motifs that occur exclusively at the peak provide good evidence of direct binding, while motifs that do not occur in a consistent position relative to the peak may not bind directly.
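The idea of central enrichment can be illustrated with a simplified binomial test: if motif sites were placed uniformly at random along each sequence, the chance that a site falls within a central window would be roughly the window width divided by the sequence length. The Python sketch below is a simplification under that assumption and is not CentriMo's actual procedure. It ignores motif width and assumes one best site per sequence, and the function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def central_enrichment(site_offsets, seq_len, window):
    """Test whether best motif sites cluster around the sequence centre.

    site_offsets: signed distance (bp) of each sequence's best site from
                  the centre of its sequence (the presumed ChIP-seq peak).
    seq_len:      common length of the sequences.
    window:       width of the central window being tested.
    Returns (number of sites inside the window, binomial p-value).
    """
    n = len(site_offsets)
    k = sum(1 for d in site_offsets if abs(d) <= window / 2)
    # Null model (assumption): a site lands at a uniformly random position.
    p_null = min(1.0, window / seq_len)
    return k, binomial_upper_tail(k, n, p_null)
```

A small p-value indicates that more sites fall near the centre than uniform placement would predict, which is the pattern expected of a directly bound motif.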

Motif cluster search

MCAST

Motif Cluster Alignment and Search Tool (MCAST) is a tool for searching a sequence database for statistically significant clusters of non-overlapping occurrences of a set of motifs. Such clusters may represent regulatory modules.

Motif comparison

TOMTOM

Tomtom is a tool for comparing a DNA motif to a database of known motifs. It searches the database for motifs that are statistically significantly similar to the query motif, which helps determine whether a discovered motif is novel or a variation of a known motif.
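As a rough illustration of what comparing two motifs involves, the sketch below slides a query motif along a target motif and scores each ungapped alignment by the mean Euclidean distance between aligned columns. It is an assumption-laden toy: the article does not describe Tomtom's scoring function or how it assigns statistical significance, and the sketch computes no p-values. Motifs are represented as lists of columns, each holding probabilities for A, C, G and T, and the function names are hypothetical.

```python
import math


def column_distance(col_a, col_b):
    """Euclidean distance between two motif columns (base probabilities)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(col_a, col_b)))


def best_alignment(query, target, min_overlap=2):
    """Return (mean column distance, offset) of the best ungapped alignment.

    offset is the target column aligned with query column 0; it may be
    negative when the query overhangs the start of the target.
    Returns None if no alignment has at least min_overlap columns.
    """
    best = None
    for offset in range(-(len(query) - min_overlap), len(target) - min_overlap + 1):
        # Collect the column pairs that overlap at this offset.
        pairs = [
            (query[i], target[i + offset])
            for i in range(len(query))
            if 0 <= i + offset < len(target)
        ]
        if len(pairs) < min_overlap:
            continue
        score = sum(column_distance(a, b) for a, b in pairs) / len(pairs)
        if best is None or score < best[0]:
            best = (score, offset)
    return best
```

A lower score means more similar motifs. A real comparison tool must also convert such scores into significance estimates, since short overlaps can score well by chance.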

Motif function analysis

GOMO

Gene Ontology for MOtifs (GOMO) is a tool for identifying possible roles for DNA-binding motifs. It does so by comparing the genes that have the motif in their upstream regions against a Gene Ontology database. If the motif is found upstream of genes related to a particular function (for example, lactose digestion) significantly more often than expected by chance, this suggests that the transcription factor that binds the motif may regulate that function (for example, by promoting transcription of genes encoding proteins that digest lactose).

References

  1. Bailey T.L., Elkan C. Unsupervised Learning of Multiple Motifs In Biopolymers Using EM. Mach. Learn. 1995;21:51–80.
  2. Timothy L. Bailey, "DREME: Motif discovery in transcription factor ChIP-seq data", Bioinformatics, 27(12):1653-1659, 2011.
  3. MC Frith, NFW Saunders, B Kobe, TL Bailey, "Discovering sequence motifs with arbitrary insertions and deletions", PLoS Computational Biology, 4(5):e1000071, 2008
  4. Philip Machanick and Timothy L. Bailey, "MEME-ChIP: motif analysis of large DNA datasets", Bioinformatics, 27(12):1696-1697, 2011
  5. Charles E. Grant, Timothy L. Bailey, and William Stafford Noble, "FIMO: Scanning for occurrences of a given motif", Bioinformatics, 27(7):1017-1018, 2011
  6. MC Frith, NFW Saunders, B Kobe, TL Bailey (2008) Discovering sequence motifs with arbitrary insertions and deletions, PLoS Computational Biology, 4(5), e1000071, 2008
  7. Whitington, T., Frith, M. C., Johnson, J., & Bailey, T. L. (2011). Inferring transcription factor complexes from ChIP-seq data. Nucleic Acids Research, 39(15), e98-e98.
  8. Bailey, T. L., & Machanick, P. (2012). Inferring direct DNA binding from ChIP-seq. Nucleic Acids Research, 40(17), e128-e128
