For each motif that it discovers in the training set,
MEME prints the following information:
MEME motifs are represented by position-specific probability matrices
that specify the probability of each possible letter appearing at each
possible position in an occurrence of the motif. These are displayed
as "sequence LOGOS", containing stacks of letters at each position
in the motif. The total height of the stack is the "information
content" of that position in the motif in bits. The height of the
individual letters in a stack is the probability of the letter at that
position multiplied by the total information content of the stack.
Note: The MEME LOGO differs from those produced by the WebLogo program
because a small-sample correction is NOT applied.
However, MEME LOGOs in PNG and encapsulated postscript (EPS) formats
with small-sample correction (SSC) are available by clicking
on the download button with "SSC" set to "on" under
Download LOGO.
The MEME LOGOs without small-sample correction are similarly available.
Error bars are included in the LOGOs with small-sample correction.
Modern web browsers that support the canvas element and its text manipulation functions, as described in the
HTML5 standard, can render the sequence LOGOs without needing the images. The browsers known to work with this
feature are:
- Firefox 3.5 and above
- Safari 4 and above
- Google Chrome 4 and above
Unfortunately, Internet Explorer 8 does not support the HTML5 canvas element.
The information content of each motif position is computed as described in the paper by Schneider and Stephens,
"Sequence Logos: A New Way to Display Consensus Sequences", except that
the small-sample correction, e(n), is set to zero for the LOGO displayed in the MEME output.
The corrected information content of position i is given by
R(i) for amino acids = log2(20) - (H(i) + e(n)) (1a)
R(i) for nucleic acids = 2 - (H(i) + e(n)) (1b)
where H(i) is the entropy of position i,
H(i) = - (Sum over a of f(a,i) * log2[ f(a,i) ]). (2)
Here, f(a,i) is the frequency of base or amino acid a at position i, and e(n) is the small-sample correction
for an alignment of n letters. The height of letter a in column i is given by
height = f(a,i) * R(i) (3)
The approximation for the small-sample correction, e(n), is given by:
e(n) = (s-1) / (2 * ln(2) * n), (4)
where s is 4 for nucleotides and 20 for amino acids, and n is the number of sequences in the alignment.
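
As an illustration of equations (1) through (4), the following is a minimal Python sketch for a single motif column. The function name `column_heights` and its parameters are hypothetical and are not part of MEME; the sketch assumes the column frequencies sum to 1 and treats 0 * log2(0) as 0.

```python
import math

def column_heights(freqs, n, alphabet_size=4, small_sample_correction=False):
    """Return (R, heights) for one motif column.

    freqs maps each letter a to its frequency f(a,i) in the column.
    n is the number of sequences in the alignment.
    alphabet_size is s: 4 for nucleotides, 20 for amino acids.
    """
    # Equation (2): entropy of the column, skipping zero frequencies.
    entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)

    # Equation (4): small-sample correction; zero when the correction is off,
    # as in the LOGO displayed in the MEME output.
    e_n = 0.0
    if small_sample_correction:
        e_n = (alphabet_size - 1) / (2 * math.log(2) * n)

    # Equations (1a)/(1b): log2(4) = 2 for nucleic acids, log2(20) for proteins.
    # Assumption: a negative result is returned as-is, not clipped to zero.
    info = math.log2(alphabet_size) - (entropy + e_n)

    # Equation (3): each letter's height is its frequency times R(i).
    heights = {a: f * info for a, f in freqs.items()}
    return info, heights

column = {"A": 0.5, "C": 0.25, "G": 0.25, "T": 0.0}
info, heights = column_heights(column, n=10)
print(info)          # 0.5 bits: entropy is 1.5, so R(i) = 2 - 1.5
print(heights["A"])  # 0.25
```

Calling the same function with `small_sample_correction=True` subtracts e(n) from R(i), so every letter in the stack becomes shorter for small n.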
The letters in the logos are colored as follows.
For DNA sequences, the letter categories contain one letter each.
NUCLEIC ACIDS | COLOR
A | RED
C | BLUE
G | ORANGE
T | GREEN
For proteins, the categories are based on the biochemical properties of the various amino acids.
AMINO ACIDS | COLOR | PROPERTIES
A, C, F, I, L, V, W and M | BLUE | Most hydrophobic [Kyte and Doolittle, 1982]
NQST | GREEN | Polar, non-charged, non-aliphatic residues
DE | MAGENTA | Acidic
KR | RED | Positively charged
H | PINK |
G | ORANGE |
P | YELLOW |
Y | TURQUOISE |
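
The two color tables above can be expressed as a simple lookup. The sketch below is illustrative only; the names `DNA_COLORS`, `PROTEIN_COLOR_GROUPS`, `PROTEIN_COLORS` and `letter_color` are hypothetical and do not come from MEME.

```python
# Colors for DNA letters, one letter per category.
DNA_COLORS = {"A": "RED", "C": "BLUE", "G": "ORANGE", "T": "GREEN"}

# Colors for amino acids, grouped as in the table above.
PROTEIN_COLOR_GROUPS = {
    "ACFILVWM": "BLUE",   # most hydrophobic
    "NQST": "GREEN",      # polar, non-charged, non-aliphatic
    "DE": "MAGENTA",      # acidic
    "KR": "RED",          # positively charged
    "H": "PINK",
    "G": "ORANGE",
    "P": "YELLOW",
    "Y": "TURQUOISE",
}

# Expand the groups into a per-residue lookup table.
PROTEIN_COLORS = {
    residue: color
    for group, color in PROTEIN_COLOR_GROUPS.items()
    for residue in group
}

def letter_color(letter, is_protein):
    """Return the color name for a letter, or None if it is not in the table."""
    table = PROTEIN_COLORS if is_protein else DNA_COLORS
    return table.get(letter.upper())

print(letter_color("G", is_protein=False))  # ORANGE
print(letter_color("K", is_protein=True))   # RED
```

The `is_protein` flag is needed because some letters, such as A, C, G and T, appear in both alphabets with different colors.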
J. Kyte and R. Doolittle, 1982. "A Simple Method for Displaying the Hydropathic Character of a Protein",
J. Mol. Biol. 157, 105-132.
Note: The "text" output format of MEME preserves the historical MEME format, in which LOGOs are replaced by a simplified probability
matrix, a relative entropy plot, and a multi-level consensus sequence.