PLEX Help/FAQs

Welcome to PLEX. We hope you find PLEX easy to use. We have compiled a short list of the questions most frequently asked by PLEX users, and we hope it is sufficient. If you need more information, please write to marcotte at icmb.utexas.edu.


What is PLEX? What can PLEX be used for?
    PLEX stands for Protein Link Explorer. Given an amino acid sequence, PLEX constructs a phylogenetic profile, which can then be compared to the profiles of up to 350,111 other proteins from 89 fully sequenced genomes stored in our database. Functional information from proteins whose profiles match the query profile can then be superimposed on the query protein. In addition, PLEX allows searches for gene neighbors and Rosetta stone links of proteins whose profiles match the query profile.

What is a phylogenetic profile?
    Phylogenetic profiles were first described by Pellegrini et al., 1999. A phylogenetic profile is simply a description of the presence or absence of a given protein in a set of genomes.

     
                            Genome1    Genome2    Genome3    Genome4
    Profile of protein A    Present    Absent     Absent     Absent
    Profile of protein B    Present    Present    Absent     Present
    Profile of protein C    Present    Present    Absent     Present

    In the example given above, the profiles of proteins B and C are similar, whereas the profile of protein A is different. In practice, a profile is constructed using the BLAST similarity (E-value) score of the highest scoring sequence from each genome.

     
                            Genome1    Genome2    Genome3    Genome4
    Profile of protein A    1e-30      1e-298     1e-6       0.04
    Profile of protein B    1e-57      1e-84      1e-9       1e-9
    Profile of protein C    1e-57      1e-84      1e-9       1e-9

    The E-values are subsequently converted to scores using the formula -1/log(E-value), and then included in the profile. Profiles of all known proteins from the 89 completely sequenced genomes are pre-calculated and stored in our database. The following is an example of a profile stored in our database:
    >afulgidus|gi|11497625|ref|NP_068845.1|  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
    0.00 0.00 0.00 0.03 0.02 0.02 1.00 0.02 0.02 0.02 1.00 0.02 1.00 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 
    0.03 0.02 0.02 0.02 0.03 0.03 0.03 0.03 0.02 0.02 0.02 1.00 1.00 0.02 1.00 0.03 0.03 0.03 0.02 0.02 0.03 
    0.03 0.03 0.03 1.00 0.02 0.02 0.02 0.02 0.03 0.02 0.03 0.03 0.01 0.01 0.02 0.02 0.02 0.03 1.00 0.02 1.00 
    0.02 0.02 0.02 1.00 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
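
    A stored profile record like the one above can be read with a few lines of Python. This is only a sketch: the function name is hypothetical, the record is shortened for illustration, and the reading that a score of 1.00 marks a genome with no significant hit is an assumption based on the -1/log(E-value) formula (smaller E-values give scores closer to 0), not something stated by PLEX.

```python
def parse_profile_record(text):
    """Split a '>identifier v1 v2 ...' record into (identifier, [floats]).

    Values may wrap over several lines, as in the stored example.
    """
    tokens = text.split()
    if not tokens or not tokens[0].startswith(">"):
        raise ValueError("record must start with '>'")
    identifier = tokens[0][1:]          # drop the leading '>'
    values = [float(tok) for tok in tokens[1:]]
    return identifier, values


# Shortened, illustrative record (a real one holds one value per genome).
record = """>afulgidus|gi|11497625|ref|NP_068845.1|  0.00 0.00 0.03
0.02 1.00 0.02"""

identifier, values = parse_profile_record(record)

# Assumption: a score of 1.00 means "no significant hit in that genome".
genomes_with_hit = sum(1 for v in values if v < 1.0)
```

    Keeping the profile as a plain list of floats makes it easy to pass to any later comparison step.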
    

How is a phylogenetic profile constructed in PLEX?
    A phylogenetic profile is constructed by BLASTing the query amino acid sequence against a locally maintained database of completely sequenced genomes. For each genome, the BLAST E-value of the amino acid sequence most similar to the query sequence is then used to construct the profile (referred to mathematically as the profile vector). Each E-value is converted to a score using the formula -1/log(E-value) before it is used in profile construction.
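
    The construction described above can be sketched as follows. Several details are assumptions made for illustration and are not confirmed by this FAQ: the logarithm is taken to be base 10, scores are capped at 1.0, genomes with no hit (or with an E-value of 1 or more, where the formula is undefined or negative) receive the cap, and an E-value reported as 0 is given a score of 0. All function names are hypothetical.

```python
import math


def evalue_to_score(evalue, cap=1.0):
    """Convert a BLAST E-value to a profile score via -1/log10(E-value)."""
    if evalue <= 0.0:
        return 0.0      # assumption: E-value reported as 0 (very strong hit)
    if evalue >= 1.0:
        return cap      # assumption: formula undefined/negative, treat as no hit
    return min(cap, -1.0 / math.log10(evalue))


def best_evalue_per_genome(hits):
    """Keep the smallest E-value seen for each genome.

    `hits` is an iterable of (genome_name, evalue) pairs.
    """
    best = {}
    for genome, evalue in hits:
        if genome not in best or evalue < best[genome]:
            best[genome] = evalue
    return best


def build_profile(hits, genomes, cap=1.0):
    """Return one score per genome, in the order given by `genomes`."""
    best = best_evalue_per_genome(hits)
    return [
        evalue_to_score(best[g], cap) if g in best else cap
        for g in genomes
    ]


genomes = ["Genome1", "Genome2", "Genome3", "Genome4", "Genome5"]
hits = [
    ("Genome1", 1e-30), ("Genome1", 1e-5),   # only the best hit is used
    ("Genome2", 1e-298),
    ("Genome3", 1e-6),
    ("Genome4", 0.04),
]                                            # Genome5 has no hit at all
profile = build_profile(hits, genomes)
```

    With base-10 logarithms, an E-value of 1e-30 maps to roughly 0.03, which is in line with the magnitudes seen in the stored profile example.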

What is the FASTA format?
    In the FASTA format, a line beginning with a greater-than sign (">") is a description line; it is treated as a comment and is not read as sequence by programs that take FASTA files as input. Lines that do not begin with this sign are treated as sequence lines. Blank lines are ignored.
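
    The rules above translate into a very small reader. This is a sketch for a single-sequence input, with a hypothetical function name; it is not the parser PLEX itself uses.

```python
def read_fasta_sequence(text):
    """Return the sequence from FASTA text holding a single record.

    Lines starting with '>' and blank lines are skipped; all other
    lines are joined into one sequence string.
    """
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        residues.append(line)
    return "".join(residues)


example = ">my query protein\nMKTAYIAKQR\n\nQISFVKSHFS\n"
sequence = read_fasta_sequence(example)
```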

What are Raw and Formatted BLAST results?
    A raw BLAST result is simply the result of comparing two sequences as reported by BLAST. Formatted results are these same results condensed so that all the information relevant to profile construction is reported on one line.

What does profile comparison mean?
    When comparing profiles, PLEX calculates the similarity between the query profile and the profiles of all proteins from the selected genome by measuring the mutual information between them. Mutual information, an information-theoretic measure, is at its maximum when the occurrences of a pair of genes covary completely, and tends to zero as variation decreases or as the gene occurrences vary independently.
    For details, see:
    Date, S. V. & Marcotte, E. M. Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages. Nature Biotechnology 21(9), 1055-1062 (2003).
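
    As an illustration of the measure, the sketch below computes mutual information as H(A) + H(B) - H(A,B) for the presence/absence example given earlier in this FAQ (1 = present, 0 = absent). It is a simplified sketch, not PLEX's implementation: real profiles hold continuous scores that would first have to be binned into discrete values, and the binning scheme and logarithm base used by PLEX are not described here. Base 2 is assumed, so the numbers are not directly comparable to the cut-off values quoted in this FAQ.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(profile_a, profile_b):
    """MI(A, B) = H(A) + H(B) - H(A, B) for two equal-length profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    joint = list(zip(profile_a, profile_b))
    return entropy(profile_a) + entropy(profile_b) - entropy(joint)


# Presence/absence profiles from the four-genome example table.
protein_a = [1, 0, 0, 0]
protein_b = [1, 1, 0, 1]
protein_c = [1, 1, 0, 1]

mi_bc = mutual_information(protein_b, protein_c)   # identical profiles
mi_ab = mutual_information(protein_a, protein_b)   # dissimilar profiles
```

    Proteins B and C covary completely, so their mutual information equals the entropy of either profile, while the A/B pair scores much lower.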

How do I choose a cut-off value?
    The choice of a mutual information cut-off value is left entirely to the user. In studies using normal and shuffled profiles, shuffled profiles rarely scored above a mutual information value of 0.7 (this does not mean that values below 0.7 are uninformative). In the E. coli K12 and S. cerevisiae genomes, mutual information values of 0.75 correspond to ~35-50% functional similarity, whereas values above 0.95 indicate an almost 100% chance that two proteins are functionally linked.
    For details, see:
    Date, S. V. & Marcotte, E. M. Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages. Nature Biotechnology 21(9), 1055-1062 (2003).

What genomes can I select? Can I select more than one genome for comparison?
    You can choose from any of the 89 completely sequenced prokaryotic and eukaryotic genomes that are currently publicly available. Newer versions of PLEX will have more genomes, as they become available. Currently, only one genome can be selected for analysis at one time.

Why should I omit homologs from my comparisons?
    Homologs are expected to have the same function and the same phylogenetic profile, so no new information is obtained by comparing them.

What do the profile comparison results mean? Why should I select a candidate for further comparison?
    After the query profile is compared with the profiles of proteins from the selected genome, the best matches are rank ordered by their mutual information scores. The higher the mutual information score, the more likely it is that the protein with a matching profile is similar in function to the query protein. By selecting a protein whose profile matches the profile of the query, you can investigate the protein's gene neighbors, as well as find any Rosetta stone links.
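
    The ranking step can be pictured with the short sketch below. The protein names and scores are made up for illustration, the function name is hypothetical, and the default cut-off of 0.7 is simply the value discussed in this FAQ, not a PLEX default. The optional `exclude` argument stands in for omitting homologs of the query.

```python
def rank_matches(scores, cutoff=0.7, exclude=()):
    """Return (protein, score) pairs at or above `cutoff`, best first.

    `scores` maps protein names to mutual information scores;
    `exclude` lists proteins to drop (for example, query homologs).
    """
    kept = [
        (name, mi) for name, mi in scores.items()
        if mi >= cutoff and name not in exclude
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical comparison results.
scores = {"protX": 0.96, "protY": 0.72, "protZ": 0.41, "protH": 0.99}
ranked = rank_matches(scores, cutoff=0.7, exclude={"protH"})
```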

What are patterns of gene distribution?
    Patterns of gene distribution, for the sake of analysis carried out on the PLEX website, are the observed distributions of genes in a set of reference genomes, as revealed during profile construction. When such patterns are collectively viewed, they are helpful in discerning any trends, which can be subsequently used in designating genes as belonging to a particular group.

What are gene neighbors? How are they calculated?
    Genes placed next to each other on a chromosome (prokaryotic or eukaryotic) are designated as gene neighbors. Gene neighbors are calculated based on the user-specified distance between the predicted or observed start and stop sites associated with each designated gene. Gene neighbors can have similar or dissimilar orientations.
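
    A distance-based neighbor search of this kind might look like the sketch below. The gene records, coordinates, and function names are hypothetical, and measuring the distance as the gap between the nearest ends of two genes is an assumption; PLEX may define the distance differently. Orientation is stored but deliberately not used, since neighbors may have either orientation.

```python
from collections import namedtuple

Gene = namedtuple("Gene", "name chromosome start stop strand")


def gap(a, b):
    """Bases separating two genes on one chromosome (0 if they overlap)."""
    a_lo, a_hi = sorted((a.start, a.stop))   # start > stop on the minus strand
    b_lo, b_hi = sorted((b.start, b.stop))
    return max(0, max(a_lo, b_lo) - min(a_hi, b_hi))


def gene_neighbors(query, genes, max_distance):
    """Names of genes on the query's chromosome within `max_distance`."""
    return [
        g.name for g in genes
        if g.name != query.name
        and g.chromosome == query.chromosome
        and gap(query, g) <= max_distance
    ]


genes = [
    Gene("geneA", "chr1", 100, 900, "+"),
    Gene("geneB", "chr1", 1800, 950, "-"),   # opposite orientation
    Gene("geneC", "chr1", 5000, 6000, "+"),
    Gene("geneD", "chr2", 920, 1500, "+"),   # different chromosome
]
neighbors = gene_neighbors(genes[0], genes, max_distance=300)
```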

What are Rosetta stone proteins?
    Fully functional proteins that are a fusion of two or more independent proteins, either in the same or a different organism, are designated as Rosetta stone proteins. The presence of a Rosetta stone protein implies that the two independent proteins may be functionally linked with each other.
    For details, see:
    Marcotte, E. M., Pellegrini, M., Ng, H.-L., Rice, D. W., Yeates, T. O. & Eisenberg, D. Detecting Protein Function & Protein-Protein Interactions from Genome Sequences. Science 285, 751-753 (1999).

How are Rosetta stone links calculated?
    Rosetta stone links are precalculated and stored in our database. For a given query protein, all homologs are considered potential Rosetta stone proteins. If a potential Rosetta stone protein has other homologs whose regions of sequence similarity differ from the region where the query protein was similar, it is designated a true Rosetta stone protein.
    For details, see:
    Marcotte, E. M., Pellegrini, M., Ng, H.-L., Rice, D. W., Yeates, T. O. & Eisenberg, D. Detecting Protein Function & Protein-Protein Interactions from Genome Sequences. Science 285, 751-753 (1999).
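
    The core of the test described above, two proteins aligning to different regions of the same candidate, can be sketched as follows. The data layout, names, and coordinates are hypothetical, and the sketch leaves out everything else a real pipeline would need (similarity thresholds, checking that the two linked proteins are not homologous to each other, and so on). It illustrates the idea and is not the precalculation used by PLEX.

```python
def overlaps(region_a, region_b):
    """True if two (start, end) intervals share at least one position."""
    return region_a[0] <= region_b[1] and region_b[0] <= region_a[1]


def rosetta_links(query, alignments):
    """Find (rosetta_stone, linked_protein) pairs for `query`.

    `alignments` maps each candidate protein to a dict of
    {homolog_name: (start, end)} giving the region of the candidate
    that each homolog aligns to.
    """
    links = []
    for candidate, regions in alignments.items():
        if query not in regions:
            continue                      # candidate is not a homolog of the query
        query_region = regions[query]
        for partner, region in regions.items():
            if partner != query and not overlaps(query_region, region):
                links.append((candidate, partner))
    return links


# Hypothetical alignment regions on two candidate proteins.
alignments = {
    "fusionR": {
        "queryQ": (1, 200),
        "partnerP": (250, 480),           # different region: supports a link
        "paralogQ2": (10, 190),           # same region as the query: ignored
    },
    "plainS": {"queryQ": (1, 210)},       # no second region: not a fusion
}
links = rosetta_links("queryQ", alignments)
```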

What is joint probability associated with the Rosetta stone link results?
    Joint probability is the probability of observing protein fusions by random chance. The lower the probability, the more likely it is that the observed fusion is true (the more accurate the result).
    For details, see:
    Verjovsky Marcotte, C. J. & Marcotte, E. M. Predicting functional linkages from gene fusions with confidence. App. Bioinform. 1(2), 1-8 (2002).

What is a JobID?
    JobID is the unique identifier assigned to the user every time a query is submitted to PLEX. The identifier is required if results are to be retrieved at a later time from PLEX.

For how long will my results be available?
    Results are available for only 24 hours from the time the data were submitted.

Can I obtain my results after 24 hours have passed?
    Unfortunately, data cannot be retrieved after 24 hours have passed; they are deleted automatically. Future versions of PLEX may allow longer storage times.

I want to use the results I obtained from PLEX in my paper. Whom should I cite?
    Please cite:
    Date, S.V. & Marcotte, E.M. Protein function prediction using the Protein Link Explorer (to appear in Bioinformatics)

Can everybody use PLEX?
    PLEX is free for academic and non-profit use. Non-academic and 'for-profit' users should first write to:
    marcotte at icmb.utexas.edu

Are PLEX and the data contained in PLEX copyrighted?
    Yes, the data contained in PLEX are copyrighted. The copyright notice is displayed at the bottom of this page.

Author contact information:
    Shailesh Date,
    1423 Blockley Hall, Center for Bioinformatics
    423 Guardian Drive, Univ. of Pennsylvania,
    Philadelphia, PA 19104
    Phone: 215-746-7020 | FAX: 215-573-3111

    Edward Marcotte
    Dept. of Chemistry and Biochemistry
    MBB 3.232 ICMB,
    1 University Station, A4800
    University of Texas, Austin, TX 78712-0159
    Phone: (512) 471-5435 | Fax: (512) 232-3432
    Email: marcotte at icmb dot utexas dot edu


Copyright © 2002 - 2005, Shailesh Date and Edward Marcotte. The protein link explorer data and server is the property of the University of Texas, and cannot be used for commercial purposes without written permission of Edward Marcotte and the University of Texas. It is forbidden to redistribute, derivatize, or encapsulate the protein link explorer data or server in another database without permission. Sale of information derived from it, whether directly or in revised form, is forbidden except by permission of the University of Texas and Edward Marcotte. All copies or mirrors of the protein link explorer must carry this notice.