PLEX Help/FAQs
Welcome to PLEX. We hope you find PLEX easy to use. We have compiled a short list of the questions most
frequently asked by users of PLEX. If you need more information, please write to
marcotte at icmb.utexas.edu
The old help file is located here.
General
Profiles
Profile construction
Profile Comparison
Gene neighbors
Rosetta stone protein links
Submission questions:
Citing, Availability and Copyright
Author contact information
What is PLEX? What can PLEX be used for?
PLEX stands for Protein Link Explorer.
Given an amino acid sequence, PLEX allows a phylogenetic profile to be constructed,
which can subsequently be compared to the profiles of up to 350,111 other proteins from 89
fully sequenced genomes that are stored in our database. Functional information from proteins
whose profiles match the query profile can then be superimposed on the query protein. In
addition, PLEX also allows searches for gene neighbors and Rosetta stone links of proteins
whose profiles match the query profile.
What is a phylogenetic profile?
Phylogenetic profiles were first described by
Pellegrini et al., 1999.
A phylogenetic profile is simply a description of the presence or absence of a given protein in a
set of genomes.
|                      | Genome1 | Genome2 | Genome3 | Genome4 |
| Profile of protein A | Present | Absent  | Absent  | Absent  |
| Profile of protein B | Present | Present | Absent  | Present |
| Profile of protein C | Present | Present | Absent  | Present |
In the example given above, the profiles of proteins B and C are similar, whereas the profile of protein A
is different. In practice, a profile is constructed using the BLAST similarity score (E-value) of
the highest-scoring sequence from each genome.
|                      | Genome1 | Genome2 | Genome3 | Genome4 |
| Profile of protein A | 1e-30   | 1e-298  | 1e-6    | 0.04    |
| Profile of protein B | 1e-57   | 1e-84   | 1e-9    | 1e-9    |
| Profile of protein C | 1e-57   | 1e-84   | 1e-9    | 1e-9    |
The E-values are subsequently converted to scores using the formula -1/log(E-value), and then included
in the profile. Profiles of all known proteins from 89 completely sequenced genomes are pre-calculated
and stored in our database. The following is an example of a profile that is stored in our database:
>afulgidus|gi|11497625|ref|NP_068845.1| 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.03 0.02 0.02 1.00 0.02 0.02 0.02 1.00 0.02 1.00 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
0.03 0.02 0.02 0.02 0.03 0.03 0.03 0.03 0.02 0.02 0.02 1.00 1.00 0.02 1.00 0.03 0.03 0.03 0.02 0.02 0.03
0.03 0.03 0.03 1.00 0.02 0.02 0.02 0.02 0.03 0.02 0.03 0.03 0.01 0.01 0.02 0.02 0.02 0.03 1.00 0.02 1.00
0.02 0.02 0.02 1.00 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
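As a rough illustration, a stored profile like the one above could be read into an identifier and a vector of scores. This sketch assumes that the record is whitespace-separated with the identifier as its first token, which is inferred only from the example shown. `parse_profile` is a hypothetical helper and not part of PLEX.

```python
def parse_profile(record):
    """Split a stored profile record into (identifier, list of scores).

    Assumption: the first whitespace-separated token is the identifier
    (with a leading ">"), and every remaining token is a numeric score.
    """
    tokens = record.split()
    if not tokens or not tokens[0].startswith(">"):
        raise ValueError("record does not start with a '>' identifier")
    identifier = tokens[0][1:]
    scores = [float(token) for token in tokens[1:]]
    return identifier, scores


# Shortened excerpt laid out like the example above (line wrap included).
record = ">afulgidus|gi|11497625|ref|NP_068845.1| 0.00 0.00 0.03\n0.02 1.00 0.02"
identifier, scores = parse_profile(record)
print(identifier, len(scores))
```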
How is a phylogenetic profile constructed in PLEX?
A phylogenetic profile is constructed by BLASTing the query amino acid sequence against a locally
maintained database of completely sequenced genomes. For each genome, the BLAST E-value of the amino
acid sequence most similar to the query sequence is then used to construct the profile (referred to
mathematically as the profile vector). Each E-value is converted to a score using the
formula -1/log(E-value) before it is used in profile construction.
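The conversion can be sketched as follows. The formula itself is the one given above; the rest is assumption: the logarithm is taken to be base 10, an E-value of 0 is mapped to a score of 0, and scores are capped at 1.0 (the stored example contains only values between 0.00 and 1.00, which is consistent with such a cap but does not confirm it). `evalue_to_score` is a hypothetical name.

```python
import math


def evalue_to_score(evalue, max_score=1.0):
    """Convert a BLAST E-value to a profile score using -1/log(E-value).

    Assumptions (not stated in the FAQ): base-10 logarithm, E-value 0
    maps to 0.0, and the result is capped at max_score.
    """
    if evalue <= 0.0:
        return 0.0                      # very strong hit reported as 0.0
    log_e = math.log10(evalue)
    if log_e >= 0.0:
        return max_score                # E-value >= 1: formula undefined or negative
    return min(-1.0 / log_e, max_score)


# E-values taken from the "Profile of protein A" row shown earlier.
for e in (1e-30, 1e-298, 1e-6, 0.04):
    print(e, round(evalue_to_score(e), 3))
```

Under these assumptions, strong hits (very small E-values) give scores near 0 and weak or missing hits give scores near 1.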
What is the FASTA format?
In the FASTA format, a line beginning with a greater-than sign (">") represents a comment, and is
ignored by programs that use files in the FASTA format as input. Lines that do not begin with this sign are
treated as sequence lines. Blank lines are ignored.
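A minimal reader that follows the rules just described might look like this. It is only a sketch: `read_fasta` is a hypothetical helper, and treating sequence lines that appear before any ">" line as a record with an empty comment is an assumption.

```python
def read_fasta(text):
    """Return a list of (comment, sequence) pairs from FASTA-formatted text."""
    records = []
    comment, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                        # blank lines are ignored
        if line.startswith(">"):
            if comment is not None:
                records.append((comment, "".join(chunks)))
            comment, chunks = line[1:].strip(), []
        elif comment is None:
            comment, chunks = "", [line]    # sequence with no comment line
        else:
            chunks.append(line)             # sequence line
    if comment is not None:
        records.append((comment, "".join(chunks)))
    return records


example = ">query1 hypothetical protein\nMKVLAA\n\nGHTRLL\n"
print(read_fasta(example))
```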
What are Raw and Formatted BLAST results?
A raw BLAST result is simply the result of comparing two sequences as reported by BLAST. A formatted
BLAST result is the same result rearranged so that all the information relevant to profile construction
is reported on one line.
What does profile comparison mean?
When comparing profiles, PLEX calculates the similarity between the query profile and the profiles
of all proteins from the selected genome by measuring the mutual information between them.
Mutual information, an information-theoretic measure, is at its maximum when there is complete covariation
between the occurrences of a pair of genes, and tends to zero as variation decreases or as the gene occurrences
vary independently.
For details, see:
Date, S. V. & Marcotte, E. M.
Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages,
Nature Biotechnology, 21, 9, 1055-1062 (2003).
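To make the idea concrete, mutual information between two profile vectors can be sketched as H(A) + H(B) - H(A,B) over discretized scores. The binning below (ten equal-width bins over scores in the range 0 to 1) and the use of bits are assumptions made for illustration, so the resulting numbers need not be on the same scale as the scores PLEX reports. See the paper cited above for the procedure actually used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(profile_a, profile_b, bins=10):
    """Mutual information between two equal-length profiles of scores in [0, 1].

    Assumption: scores are discretized into `bins` equal-width bins.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")

    def discretize(profile):
        return [min(int(score * bins), bins - 1) for score in profile]

    a, b = discretize(profile_a), discretize(profile_b)
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


covarying = mutual_information([0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0])
independent = mutual_information([0.0, 0.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0])
print(covarying, independent)
```

In this toy case the fully covarying pair scores higher than the independent pair, and a profile with no variation at all scores zero against anything, which matches the behavior described above.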
How do I choose a cut-off value?
The choice of a mutual information cut-off value is left entirely to the user. In studies using normal and
shuffled profiles, shuffled profiles rarely scored above a mutual information value of 0.7 (this does
not mean that values below 0.7 are uninformative). Mutual information values of 0.75 represent ~35-50% functional
similarity, whereas values above 0.95 indicate an almost 100% chance of two proteins being functionally linked,
in the E. coli K12 and S. cerevisiae genomes.
For details, see:
Date, S. V. & Marcotte, E. M.
Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages,
Nature Biotechnology, 21, 9, 1055-1062 (2003).
What genomes can I select? Can I select more than one genome for comparison?
You can choose from any of the 89 completely sequenced prokaryotic and eukaryotic genomes that are currently
publicly available. Newer versions of PLEX will have more genomes, as they become available. Currently, only one
genome can be selected for analysis at one time.
Why should I omit homologs from my comparisons?
Homologs typically have the same function, and are expected to have the same phylogenetic profile. No new
information is obtained by comparing homologs.
What do the profile comparison results mean? Why should I select a candidate for further comparison?
After comparing the query profile with the profiles of proteins from the selected genome, the best
matches are ranked by their mutual information scores. The higher the mutual information score, the more likely
it is that the protein with a matching profile is similar in function to the query protein. By selecting a protein
whose profile matches the profile of the query, you can investigate the protein's gene neighbors, as well as find
any Rosetta stone links.
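The ranking step can be pictured with a short sketch. The protein names and scores are made up, `rank_matches` is a hypothetical helper, and the default cut-off of 0.7 is only an illustrative choice.

```python
def rank_matches(mi_scores, cutoff=0.7, top_n=None):
    """Keep matches scoring at or above `cutoff`, highest score first.

    mi_scores maps a protein identifier to its mutual information score
    against the query profile.
    """
    kept = [(name, score) for name, score in mi_scores.items() if score >= cutoff]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept if top_n is None else kept[:top_n]


# Hypothetical scores for three candidate proteins.
scores = {"proteinX": 0.78, "proteinY": 0.52, "proteinZ": 0.96}
for name, score in rank_matches(scores):
    print(name, score)
```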
What are patterns of gene distribution?
Patterns of gene distribution, for the sake of analysis carried out on the PLEX website, are the
observed distributions of genes in a set of reference genomes, as revealed during profile construction. When
such patterns are collectively viewed, they are helpful in discerning any trends, which can be subsequently used
in designating genes as belonging to a particular group.
What are gene neighbors? How are they calculated?
Genes placed next to each other on a chromosome (prokaryotic or eukaryotic) are designated as
gene neighbors. Gene neighbors are calculated based on the user-specified distance between the predicted or
observed start and stop sites associated with each designated gene. Gene neighbors can have similar or
dissimilar orientations.
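A distance-based neighbor search of this kind could be sketched as below. The `Gene` record, its field names, and the exact way the gap between two genes is measured (the distance between the nearest ends, with overlapping genes counted as a gap of 0) are assumptions for illustration and are not taken from PLEX.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chromosome: str
    start: int      # start site (predicted or observed)
    stop: int       # stop site
    strand: str     # "+" or "-"


def gene_neighbors(query, genes, max_distance):
    """Return (name, gap, same_orientation) for genes near `query`."""
    q_lo, q_hi = sorted((query.start, query.stop))
    neighbors = []
    for gene in genes:
        if gene.name == query.name or gene.chromosome != query.chromosome:
            continue
        g_lo, g_hi = sorted((gene.start, gene.stop))
        gap = max(g_lo - q_hi, q_lo - g_hi, 0)   # 0 when the genes overlap
        if gap <= max_distance:
            neighbors.append((gene.name, gap, gene.strand == query.strand))
    return neighbors


# Hypothetical genes; coordinates are made up.
query = Gene("geneA", "chr1", 100, 500, "+")
genes = [
    query,
    Gene("geneB", "chr1", 600, 900, "-"),
    Gene("geneC", "chr1", 5000, 6000, "+"),
    Gene("geneD", "chr2", 550, 800, "+"),
]
print(gene_neighbors(query, genes, max_distance=300))
```

Orientation is reported rather than used as a filter, since neighbors may have similar or dissimilar orientations.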
What are Rosetta stone proteins?
Fully functional proteins that are a fusion of two or more independent proteins, either in the same
or a different organism, are designated as Rosetta stone proteins. The presence of a Rosetta stone protein
implies that the two independent proteins may be functionally linked with each other.
For details, see:
Marcotte, E. M., Pellegrini, M., Ng, H.-L., Rice, D. W., Yeates, T. O. &
Eisenberg, D. Detecting Protein Function & Protein-Protein Interactions from Genome Sequences.
Science 285, 751-753 (1999).
How are Rosetta stone links calculated?
Rosetta stone links are precalculated and stored in our database. For a given query protein, all homologs are
considered potential Rosetta stone proteins. If these potential Rosetta stone proteins have other homologs
whose regions of sequence similarity differ from the region where the query protein was similar, the candidates
are designated true Rosetta stone proteins.
For details, see:
Marcotte, E. M., Pellegrini, M., Ng, H.-L., Rice, D. W., Yeates, T. O. &
Eisenberg, D. Detecting Protein Function & Protein-Protein Interactions from Genome Sequences.
Science 285, 751-753 (1999).
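The logic described above can be sketched roughly as follows. This is not the published procedure: the input layout is hypothetical, and "different regions" is interpreted here as residue ranges on the candidate protein that do not overlap, which is an assumption.

```python
def regions_overlap(region_a, region_b):
    """True if two inclusive (start, end) residue ranges share any position."""
    return region_a[0] <= region_b[1] and region_b[0] <= region_a[1]


def rosetta_links(query, hits):
    """Return (query, linked_protein, rosetta_stone_candidate) triples.

    hits maps each candidate protein to a dict of
    {homolog name: (start, end) of the similar region on the candidate}.
    """
    links = []
    for candidate, regions in hits.items():
        if query not in regions:
            continue                     # candidate is not a homolog of the query
        query_region = regions[query]
        for other, region in regions.items():
            if other != query and not regions_overlap(query_region, region):
                links.append((query, other, candidate))
    return links


# Hypothetical data: "fusionAB" matches protA and protB in separate regions.
hits = {
    "fusionAB": {"protA": (1, 200), "protB": (250, 480), "protA2": (5, 190)},
    "other": {"protB": (10, 90)},
}
print(rosetta_links("protA", hits))
```

In this made-up example, protA2 is not linked to protA because it matches the same region of the candidate, whereas protB matches a separate region and is reported as a link.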
What is joint probability associated with the Rosetta stone link results?
Joint probability is the probability of observing protein fusions by random chance. The lower
the probability, the more likely it is that the observed fusion reflects a true link between the proteins.
For details, see:
Verjovsky Marcotte, C. J. & Marcotte E. M:
Predicting functional linkages from gene fusions with confidence, App. Bioinform, 2002;1(2):1-8
What is a JobID?
A JobID is the unique identifier assigned to the user each time a query is submitted to PLEX. The
identifier is required if results are to be retrieved from PLEX at a later time.
For how long will my results be available?
Results are available for only 24 hours from the time the data was submitted.
Can I obtain my results after 24 hours have passed?
Unfortunately, data cannot be retrieved after 24 hours have passed. It is automatically deleted. Future versions
of PLEX may allow longer storage times.
I want to use the results I obtained from PLEX in my paper. Whom should I cite?
Please cite:
Date, S.V. & Marcotte, E.M. Protein function prediction using the Protein Link Explorer
(to appear in Bioinformatics)
Can everybody use PLEX?
PLEX is free for academic and non-profit use. Non-academic and 'for-profit' users should first write to:
marcotte at icmb.utexas.edu
Are PLEX and the data contained in PLEX copyrighted?
Yes, the data contained in PLEX are copyrighted. The copyright notice is displayed at the bottom
of this page.
Author contact information:
Shailesh Date,
1423 Blockley Hall, Center for Bioinformatics
423 Guardian Drive, Univ. of Pennsylvania,
Philadelphia, PA 19104
Phone: 215-746-7020 | FAX: 215-573-3111
|
Edward Marcotte
Dept. of Chemistry and Biochemistry
MBB 3.232 ICMB,
1 University Station, A4800
University of Texas, Austin, TX 78712-0159
Phone: (512) 471-5435 | Fax: (512) 232-3432
Email: marcotte at icmb dot utexas dot edu
Copyright © 2002 - 2005, Shailesh Date and Edward Marcotte. The protein link explorer data and
server is the property of the University of Texas, and cannot be used for commercial purposes without
written permission of Edward Marcotte and the University of Texas. It is forbidden to redistribute,
derivatize, or encapsulate the protein link explorer data or server in another database without permission.
Sale of information derived from it, whether directly or in revised form, is forbidden except by permission
of the University of Texas and Edward Marcotte. All copies or mirrors of the protein link explorer must
carry this notice.