2. Calculating mutual information values:
We know that
Intrinsic entropy $H(A) = -\sum_{a} p(a) \log p(a)$ ......equation 1
Relative/Joint entropy $H(A,B) = -\sum_{a,b} p(a,b) \log p(a,b)$ ......equation 2
$MI(A,B) = H(A) + H(B) - H(A,B)$ ......equation 3
Consider four different profiles, each with a different distribution of values.
Values close to 1 indicate absence, and values close to 0 indicate presence. The distribution is plotted in Figure C1.
Protein A 0.1 0.2 0.2 0.1 0.3 0.7
Protein B 0.0 0.0 0.0 0.0 0.0 0.0
Protein C 1.0 1.0 1.0 1.0 1.0 1.0
Protein D 1.0 1.0 1.0 0.0 0.0 0.0
Distributions of pij values for the four example profiles.
If the values in the profiles are binned in intervals of 0.1, we get the following bin counts.
----------------------------------
Bins A B C D
----------------------------------
0 0 6 0 3
0.1 2 0 0 0
0.2 2 0 0 0
0.3 1 0 0 0
0.4 0 0 0 0
0.5 0 0 0 0
0.6 0 0 0 0
0.7 1 0 0 0
0.8 0 0 0 0
0.9 0 0 0 0
1 0 0 6 3
----------------------------------
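The binning step above can be sketched in a few lines of Python. This is only an illustration: the rule used here (rounding each value to the nearest 0.1) is an assumption, since the text says only that values are binned in intervals of 0.1, and the names PROFILES and bin_counts are hypothetical.

```python
# Profile values copied from the four example proteins above.
PROFILES = {
    "A": [0.1, 0.2, 0.2, 0.1, 0.3, 0.7],
    "B": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "C": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "D": [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
}

def bin_counts(profile, n_bins=11):
    """Count how many values fall into each of the bins 0, 0.1, ..., 1."""
    counts = [0] * n_bins
    for value in profile:
        # Assumed binning rule: round to the nearest 0.1.
        counts[round(value * 10)] += 1
    return counts

table = {name: bin_counts(values) for name, values in PROFILES.items()}

# Print one row per bin, with columns A B C D as in the table above.
for i in range(11):
    print(f"{i / 10:.1f}", *(table[name][i] for name in "ABCD"))
```

Working with integer bin indices (0 to 10) rather than the floating-point bin labels avoids comparing floats for equality.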
Let's calculate p(a) x log p(a) for the first non-zero bin in profile of protein A - the 0.1 bin.
Total number of elements in the profile = 6.
p(a) = Total number of elements in bin / Total number of elements in profile
p(a) = 2/6 = 1/3 = 0.3333
log p(a) = ln (0.3333) = -1.0986
p(a) x log p(a) = -0.3661
Similarly,
p(a) x log p(a) for 0.2 bin is -0.3661
p(a) x log p(a) for 0.3 bin is -0.2986
p(a) x log p(a) for 0.7 bin is -0.2986
Substituting these values in eqn. 1, we get
H(A) = -[-0.3661 + (-0.3661) + (-0.2986) + (-0.2986)]
= -[-1.3294]
= 1.3294
Therefore, the intrinsic entropy of protein A is 1.3294.
Similarly, the entropies of the other protein profiles are
H(B) = 0
H(C) = 0
H(D) = 0.6931
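The entropy values above can be checked with a short Python sketch of equation 1, using natural logarithms as in the worked example. The bin counts are copied from the table of bins; the names BIN_COUNTS and entropy are hypothetical. Because the worked example rounds each term before summing, the value computed here for protein A (about 1.3297) differs from 1.3294 in the fourth decimal place.

```python
import math

# Bin counts for bins 0, 0.1, ..., 1, copied from the table of bins.
BIN_COUNTS = {
    "A": [0, 2, 2, 1, 0, 0, 0, 1, 0, 0, 0],
    "B": [6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "C": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6],
    "D": [3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3],
}

def entropy(counts):
    """Equation 1: H = -sum p(a) * ln p(a), taken over non-empty bins only."""
    total = sum(counts)
    return sum(-(n / total) * math.log(n / total) for n in counts if n > 0)

H = {name: entropy(counts) for name, counts in BIN_COUNTS.items()}

for name, value in H.items():
    print(f"H({name}) = {value:.4f}")
```

Empty bins are skipped because the term p(a) x log p(a) is taken to be zero when p(a) = 0.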
To calculate the joint entropy H(A,B) in equation 2, we perform similar calculations on pairs of values.
The count for a joint bin (x, y) is incremented each time the value x in the first profile and the value y in the second profile are observed at the same position. For example:
Protein X 0.1 0.2 0.2 0.1 0.3 0.7
Protein Y 0.1 0.3 0.3 0.1 0.5 0.7
Bin (0.1,0.1) = 2
Bin (0.2,0.3) = 2
Bin (0.3,0.5) = 1
Bin (0.7,0.7) = 1
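The joint bin counts for Protein X and Protein Y can be reproduced with the following sketch. As before, rounding to the nearest 0.1 is an assumed binning rule, and the names to_bin and joint_counts are hypothetical.

```python
from collections import Counter

# Example profiles copied from the text above.
X = [0.1, 0.2, 0.2, 0.1, 0.3, 0.7]
Y = [0.1, 0.3, 0.3, 0.1, 0.5, 0.7]

def to_bin(value):
    """Assumed binning rule: index of the nearest 0.1 interval (0 to 10)."""
    return round(value * 10)

# Pair up the values found at the same position in both profiles
# and count how often each (bin of X, bin of Y) pair occurs.
joint_counts = Counter((to_bin(x), to_bin(y)) for x, y in zip(X, Y))

for (bx, by), n in sorted(joint_counts.items()):
    print(f"Bin ({bx / 10:.1f},{by / 10:.1f}) = {n}")
```

The joint entropy in equation 2 is then computed from these pair counts in the same way that equation 1 is computed from the single-profile bin counts.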
If the profile of protein A is compared with itself, every joint bin (x, x) has the same count as bin x of the single profile, so the joint/relative
entropy is the same as the intrinsic entropy.
H(A,A) = 1.3294
Substituting the entropy values in equation 3, we get
MI (A,A) = 1.3294 + 1.3294 - 1.3294 = 1.3294
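Equation 3 can be put together end to end as in the sketch below, which uses natural logarithms and the assumed round-to-nearest-0.1 binning rule. The function names are hypothetical. Besides MI(A,A), it also evaluates MI(A,B) for the constant profile of protein B; because H(B) = 0 and H(A,B) = H(A), that value is 0.

```python
import math
from collections import Counter

# Profiles of proteins A and B, copied from the example above.
A = [0.1, 0.2, 0.2, 0.1, 0.3, 0.7]
B = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

def to_bin(value):
    """Assumed binning rule: index of the nearest 0.1 interval."""
    return round(value * 10)

def entropy(observations):
    """Entropy (natural log) of a list of hashable observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    return sum(-(n / total) * math.log(n / total) for n in counts.values())

def mutual_information(x, y):
    """Equation 3: MI = H(X) + H(Y) - H(X,Y)."""
    bx = [to_bin(v) for v in x]
    by = [to_bin(v) for v in y]
    joint = list(zip(bx, by))  # value pairs observed at the same position
    return entropy(bx) + entropy(by) - entropy(joint)

mi_aa = mutual_information(A, A)
mi_ab = mutual_information(A, B)
print(f"MI(A,A) = {mi_aa:.4f}")
print(f"MI(A,B) = {mi_ab:.4f}")
```

A constant profile carries no information, so it cannot share any with another profile, whereas a profile compared with itself shares all of its entropy.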
-----------------------------------------------------
NOTE THAT THE WORKED EXAMPLE ABOVE USES NATURAL LOGARITHMS; IN ACTUAL CALCULATIONS, LOG BASE 2 IS USED.
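Changing the base of the logarithm only rescales the entropies: a value in base 2 equals the natural-log value divided by ln 2. The sketch below illustrates this for the example profiles of proteins A and D; the function names and the round-to-nearest-0.1 binning rule are assumptions made for illustration.

```python
import math
from collections import Counter

# Profiles of proteins A and D, copied from the example above.
A = [0.1, 0.2, 0.2, 0.1, 0.3, 0.7]
D = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

def bin_probabilities(values):
    """Probability of each occupied bin (assumed rule: nearest 0.1)."""
    counts = Counter(round(v * 10) for v in values)
    return [n / len(values) for n in counts.values()]

def entropy_nats(values):
    """Entropy using the natural logarithm, as in the worked example."""
    return sum(-p * math.log(p) for p in bin_probabilities(values))

def entropy_bits(values):
    """Entropy using log base 2."""
    return sum(-p * math.log2(p) for p in bin_probabilities(values))

for name, profile in (("A", A), ("D", D)):
    nats = entropy_nats(profile)
    bits = entropy_bits(profile)
    print(f"H({name}): {nats:.4f} (ln), {bits:.4f} (log2), ratio {nats / bits:.4f}")
```

The ratio printed for each profile is ln 2, about 0.6931, so values computed with the two bases differ only by this constant factor.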
| Organism | Mutual Information Value Files | Filtered Files* | Highest MI | Lowest MI |
| C. crescentus | [ 86 MB, compressed, .bz2 ] | ccrescentus-pairs-above-0.7.gz | 1.19895127672827 | 5.55111512312578e-17 |
| E. coli K12 | [ 123 MB, compressed, .bz2 ] | ecoli-K12-pairs-above-0.7.gz | 1.28267007200588 | 9.89124478767422e-07 |
| E. coli O157H7 | [ 165 MB, compressed, .bz2 ] | ecoli-O157-pairs-above-0.7.gz | 1.27509781507474 | 1.11022302462516e-16 |
| P. aeruginosa | [ 205 MB, compressed, .bz2 ] | paeruginosa-pairs-above-0.7.gz | 1.26654892510335 | 5.55111512312578e-17 |
| S. aureus | [ 45 MB, compressed, .bz2 ] | saureus-pairs-above-0.7.gz | 1.20788005888389 | 1.11022302462516e-16 |
| V. cholerae | [ 91 MB, compressed, .bz2 ] | vcholerae-pairs-above-0.7.gz | 1.34176921052772 | 5.55111512312578e-17 |
| S. cerevisiae | [ 187 MB, compressed, .bz2 ] | scerevisiae-pairs-above-0.7.gz | 1.32867461118848 | 5.55111512312578e-17 |
| Protein | Mutual information | Protein function |
| FlgB | 0.82 | Flagellar biosynthesis, cell-proximal portion of basal-body rod |
| FlgK | 0.80 | Flagellar biosynthesis, hook-filament junction protein 1 |
| FlgL | 0.78 | Flagellar biosynthesis; hook-filament junction protein |
| FliF | 0.75 | Flagellar biosynthesis; basal-body MS(membrane and supramembrane)-ring and collar protein |
| FlgE | 0.75 | Flagellar biosynthesis, hook protein |
| FliN | 0.75 | Flagellar biosynthesis, component of motor switch and energizing, enabling rotation and determining its direction |
| FlgF | 0.75 | Flagellar biosynthesis, cell-proximal portion of basal-body rod |
| FliG | 0.75 | Flagellar biosynthesis, component of motor switching and energizing, enabling rotation and determining its direction |
| FlgG | 0.75 | Flagellar biosynthesis, cell-distal portion of basal-body rod |
| FlgC | 0.75 | Flagellar biosynthesis, cell-proximal portion of basal-body rod |
| MotA | 0.69 | Proton conductor component of motor; no effect on switching |
| FliQ | 0.69 | Flagellar biosynthesis |
| FliS | 0.68 | Flagellar biosynthesis; repressor of class 3a and 3b operons (RflA activity) |
| FliR | 0.68 | Flagellar biosynthesis |
| FliC | 0.67 | Flagellar biosynthesis; flagellin, filament structural protein |
| Rnk | 0.67 | Regulator of nucleoside diphosphate kinase |
| FliM | 0.64 | Flagellar biosynthesis, component of motor switch and energizing, enabling rotation and determining its direction |
| YedA | 0.63 | Putative transmembrane subunit |
| FliD | 0.62 | Flagellar biosynthesis; filament capping protein; enables filament assembly |
| CsrA | 0.60 | Carbon storage regulator; controls glycogen synthesis, gluconeogenesis, cell size and surface properties |