PLEX - Tutorial / Guided-tour
Welcome to the Protein Link Explorer (PLEX). Given an amino acid sequence or its GenBank sequence
identifier, PLEX allows a phylogenetic profile to be constructed, which can subsequently be compared to
profiles of up to 350,111 other proteins from 89 fully sequenced genomes that are stored in our database. In
addition, PLEX allows direct phylogenetic profile input, and also allows creation of individual organism
or group-based profiles, which can then be compared with other profiles. Gene neighbors and rosetta stone links
of all proteins that match the query profile can also be investigated.
To start using PLEX, click on the 'Submit a new job' button (PLEX treats each individual query as a new
job). The user is assigned a job ID, which can be used to retrieve results for a period of 24 hours from
the time the job was submitted. If you have submitted a job previously, you can retrieve it by clicking
on the 'Retrieve a previously submitted job' button.
Submitting a new job
Figure 1. To start, click on the 'Submit a new job' button
There are 3 different types of inputs that PLEX can handle: GenBank Identifiers (GIs) of amino acid
sequences, amino acid sequences in FASTA format and phylogenetic profiles themselves.
A. Using GenBank Identifiers (GIs) or amino acid sequences in FASTA format as input:
- If a user submits a GI number, PLEX pulls the associated amino acid sequence from a local copy of the NCBI
non-redundant (nr) database. Once the sequence is obtained, PLEX proceeds to compare this query sequence
with sequences of more than 350,000 proteins from 89 completely sequenced genomes, using NCBI BLAST under
default settings. The top hit from each genome is used, and the raw E-values are converted to inverse-log
scores. Thus, a profile is constructed out of 89 such converted scores.
Figure 2. List of input options and the GI input option are indicated.
Figure 3. Sequence input option is indicated.
Figure 4A. When an amino acid sequence or a GI number is input, PLEX
compares it with more than 350,000 amino acid sequences from 89 genomes, using
BLAST. Here, the job ID for the query and the formatted BLAST results
are indicated.
If an amino acid sequence is submitted, PLEX treats it as the query sequence, and uses it to construct a
phylogenetic profile.
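The profile construction described above can be sketched in a few lines. This is a minimal illustration, not the PLEX implementation: the exact form of the 'inverse-log score' is not given here, so the sketch assumes score = -1 / log10(E), capped at 1.0, with genomes that have no hit scored as 1.0. The function names and the dictionary layout are hypothetical.

```python
import math

def inverse_log_score(evalue):
    """Convert a BLAST E-value to a score in [0, 1].

    Assumed convention (not confirmed by this tutorial):
    score = -1 / log10(E), capped at 1.0. Low scores mean strong hits.
    """
    if evalue is None or evalue >= 1.0:
        return 1.0   # no hit, or an insignificant one
    if evalue <= 0.0:
        return 0.0   # BLAST reports extremely strong hits as 0.0
    return min(1.0, -1.0 / math.log10(evalue))

def build_profile(best_evalues, genomes):
    """Build a profile: one score per genome, in a fixed genome order.

    best_evalues maps genome name -> E-value of the top hit in that genome;
    genomes without any hit are simply missing from the mapping.
    """
    return [inverse_log_score(best_evalues.get(g)) for g in genomes]
```

With 89 genomes in the `genomes` list, `build_profile` would return a vector of 89 converted scores, one per genome.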
- In both cases, users have an option to view either detailed or formatted BLAST results for their query. The
profile is displayed as a series of colored boxes, where blue indicates an absence of the query from the given
genome, and shades of red indicate varying degrees of confidence in its presence. The user has a choice of
viewing more detailed results of hits to individual genomes, by clicking on the 'show profile details' button.
- Once a profile is created, the user can then compare it with profiles of all known proteins from any of the
89 completely sequenced genomes. The user is required to select a cut-off value for the lowest 'mutual
information score' to be displayed. Mutual information scores are used to indicate similarity between two
phylogenetic profiles. In general, we have observed that scores above 0.7 are obtained by random chance
less than 1 in 1,000,000 times (see Date and Marcotte), while testing both prokaryotic and eukaryotic genomes.
The user is also required to select the minimum number of matching profiles to be displayed (default = 20
profiles), and whether to include or exclude homologs from the analysis (default = exclude; this is
recommended, as homologs are expected to have the same phylogenetic profile, and will not generally provide
any new information). The user can then proceed to select a genome against which the profile will be compared.
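As a rough illustration of how a mutual information score between two profiles can be computed and used as a cut-off, consider the sketch below. It assumes profile values lie in [0, 1], are discretized into 10 equal-width bins, and that information is measured in bits; these choices, and the function names, are assumptions for illustration, so the 0.7 threshold quoted above does not necessarily carry over to this scale.

```python
import math
from collections import Counter

def discretize(profile, bins=10):
    """Map each score in [0, 1] to an integer bin index (bin count is assumed)."""
    return [min(bins - 1, int(s * bins)) for s in profile]

def mutual_information(profile_a, profile_b, bins=10):
    """Mutual information (in bits) between two equal-length profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    a, b = discretize(profile_a, bins), discretize(profile_b, bins)
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
        mi += (c / n) * math.log2(c * n / (count_a[x] * count_b[y]))
    return mi

def rank_matches(query, candidates, cutoff, max_results=20):
    """Return (name, score) pairs with score >= cutoff, best first."""
    scored = [(name, mutual_information(query, prof))
              for name, prof in candidates.items()]
    kept = [item for item in scored if item[1] >= cutoff]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_results]
```

Here `cutoff` plays the role of the user-selected lowest mutual information score, and `max_results` the role of the number of matching profiles to display.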
Figure 4B. Phylogenetic profile of the query and comparison options
Figure 5. Results of the phylogenetic profile comparison are displayed. An individual candidate can
be selected for further comparison.
- When reporting the results of phylogenetic profile comparisons, PLEX will also display the user-selected
parameters for reference. The results of the comparison include information such as organism name, GI of the
matching protein sequence, the mutual information score and its function, as obtained from NCBI annotation.
B. Using phylogenetic profiles as input:
- A user can also directly submit a phylogenetic profile to PLEX for comparison against profiles of all
known proteins from a given genome. There are three different ways in which a user can input a phylogenetic
profile:
- A profile of any length, composed of raw E-values can be directly pasted in as the input. PLEX will
automatically convert the E-values to inverse-log scores, and use them for comparison. When a profile is
pasted in, each position is treated as corresponding to a defined genome. Information about genomes and
their positions is available here.
It is not necessary to paste in a profile with 89 values; PLEX will nullify
positions that are left empty. For instance, if a user is interested in archaeal proteins, only the first 16
positions need values. Positions that are empty, have E-values > 1.00, or contain an 'N'
are treated as negative. Once a profile is pasted in, the user can select a mutual information cut-off value
(see section A3).
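The handling of a pasted profile can be illustrated as follows. The input format accepted by the text box is not specified here, so this sketch assumes comma-separated values, which lets an empty position be written as two adjacent commas; the function name and the use of None for 'negative' are likewise hypothetical.

```python
NUM_GENOMES = 89  # number of genomes (positions) in a full profile

def parse_pasted_profile(text, n_positions=NUM_GENOMES):
    """Parse a pasted profile into one entry per genome position.

    Returns a list of E-values, with None marking positions treated as
    negative: empty positions, 'N', and E-values greater than 1.00.
    Assumes comma-separated input (an assumption, not the PLEX format).
    """
    tokens = [t.strip() for t in text.split(",")]
    # Keep at most n_positions tokens, then pad short profiles with empties.
    tokens = tokens[:n_positions]
    tokens += [""] * (n_positions - len(tokens))

    profile = []
    for tok in tokens:
        if tok == "" or tok.upper() == "N":
            profile.append(None)
            continue
        value = float(tok)
        profile.append(value if value <= 1.0 else None)
    return profile
```

For example, `parse_pasted_profile("1e-20,,N,2.0,1e-5", 6)` keeps the first and fifth values and marks the other four positions as negative.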
Figure 6. Example of a profile that is directly pasted into the text
box. Here, genes present in all Archaea, except for 3 Pyrococcus species
(indicated by 'N'), are queried for.
Figure 7. The results show the query profile, and a single profile from
the A. pernix genome that matched the query above a score of 0.5 mutual
information.
- A user can construct a profile by specifying groups of organisms and mathematical operators for
comparison with other profiles. The mathematical operators are listed above the input area. For instance,
a user interested in proteins that are only found in Archaea can use the following conditions:
Use operator = '<'; E-values = 1e-5; for all = Archaea; and ignore (operator = '*' and E-value = irrelevant)
the rest; then compare against genome = Halobacterium sp. NRC-1 - Whole genome; and display maximum = 10
results. In this case, PLEX will not try to match any profile, but will try to retrieve any profile that
matches the description. Also note that in the previous example, empty values were treated as negatives;
in this case, however, values from all other genomes are simply ignored.
- Similarly, a user can also specify E-values and comparison operators for individual organisms.
Figure 8. Group-based and individual genome-based profile queries.
Figure 9. Profiles that match the query description are returned.
- Some key points (where it is easy to get confused at times):
- E-values become more relevant as they decrease, so in most cases, you would end up using the '< (less than)'
operator
- Using the '* (any)' operator will cause E-values to be ignored
- The 'trickiness' factor for operators increases as follows: *, =, >, <, !. Remember, when '! (is not)' is
used, only the specified E-value is excluded, and all the rest are allowed.
- No, you cannot combine the operators yet, maybe in later versions ...
- The results obtained after comparison are displayed in the same format as the results that are obtained
if an amino acid sequence or GI number is input.
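The operator-based queries described in this section can be sketched as a simple filter over stored E-values. This is an illustration under assumptions: profiles are represented as dictionaries of raw E-values keyed by genome name, a genome with no hit is treated as having an infinitely large E-value, and the genome labels and function name are made up for the example.

```python
import operator

# The five query operators, mapped to comparisons on E-values.
# '*' (any) is handled separately because it ignores the E-value.
OPERATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
    "!": operator.ne,   # 'is not': only the specified E-value is excluded
}

def matches_query(evalues, conditions):
    """Check one protein's E-values against a set of per-genome conditions.

    evalues:    genome name -> E-value of the best hit in that genome
    conditions: genome name -> (operator symbol, E-value threshold)
    Genomes without a condition, or with '*', are ignored.
    """
    for genome, (op, threshold) in conditions.items():
        if op == "*":
            continue
        observed = evalues.get(genome, float("inf"))  # assumed: no hit = inf
        if not OPERATORS[op](observed, threshold):
            return False
    return True

# Example: require E < 1e-5 in two (hypothetical) archaeal genomes,
# and ignore a third genome entirely.
query = {"archaeon_1": ("<", 1e-5), "archaeon_2": ("<", 1e-5),
         "bacterium_1": ("*", None)}
protein = {"archaeon_1": 1e-30, "archaeon_2": 1e-12, "bacterium_1": 0.5}
print(matches_query(protein, query))
```

Because each genome carries exactly one operator, the sketch mirrors the restriction that operators cannot be combined.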
C. Selecting individual results for further evaluation:
- Once results for a particular query are obtained, an individual result (technically, the most promising)
can be evaluated further by selecting it, and clicking the 'Analyze further' button. This takes
the user to a new window, where the user can see the profile of the selected candidate, and options for an
iterative search, i.e., the selected profile can be further compared with other profiles from the database. PLEX
treats this as iterations of a search, and will display the iteration step with the results of the comparison.
Figure 10A. Profile of the candidate selected for an iterative search is displayed.
Figure 10B. Options for investigating gene neighbors and rosetta stone linkages for the selected candidate.
- Besides doing an iterative search, gene neighbors and rosetta stone links of the selected candidate can also
be investigated. When looking at gene neighbors, a user can change the intergenic distances, exclude certain
orientations of neighboring genes, and even allow for a specific overlap. The results returned include
information about the candidate, and a linear representation of the neighbors, including their directions
and intergenic distances.
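To make the gene-neighbor options concrete, the sketch below filters candidate neighbors by intergenic distance, permitted overlap, and orientation. It is only an illustration: the coordinate convention (1-based, inclusive), the default values, and all names are assumptions, and it checks each gene directly against the query rather than following chains of neighbors.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive (assumed convention)
    end: int
    strand: str  # "+" or "-"

def intergenic_distance(a, b):
    """Bases between two genes; a negative value means they overlap."""
    first, second = (a, b) if a.start <= b.start else (b, a)
    return second.start - first.end - 1

def find_neighbors(query, genes, max_distance=300, max_overlap=0,
                   same_strand_only=True):
    """Return (name, distance) for genes close enough to the query gene."""
    hits = []
    for gene in genes:
        if gene.name == query.name:
            continue
        if same_strand_only and gene.strand != query.strand:
            continue  # exclude neighbors in the opposite orientation
        distance = intergenic_distance(query, gene)
        if -max_overlap <= distance <= max_distance:
            hits.append((gene.name, distance))
    return sorted(hits, key=lambda hit: hit[1])
```

Relaxing `max_overlap` or setting `same_strand_only=False` corresponds to allowing a specific overlap or additional orientations in the search.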
Figure 11. A search for gene neighbors within given boundaries reveals several interesting genes in an operon.
- Results from searches for Rosetta stone fusion links are displayed as series of individual proteins linked
together by a fusion protein in the same or different organism. In addition, a probability score of the fusion
is also displayed, which can be used to estimate the accuracy of the prediction. In this case, the value is the
probability of the event happening by random chance (therefore, the lower, the better).
Figure 12. A search for Rosetta stone links turns up negative
- When results of profile comparison are displayed, the user may see the option for visualizing all the profiles
('Draw profiles') together. This will display profiles of all the results on the page, and can be quite useful
for detecting patterns and trends within the gene distributions. This feature, however, is currently not
available for all results of all inputs.
Figure 13. Visualizing all profiles together can help reveal patterns and trends within the results
D. Collating results:
PLEX provides a unique opportunity for an individual user to evaluate possible functional interactions between
proteins via a variety of methods. The results provide glimpses of an interaction network, which can be
exploited to investigate various aspects of the query, such as its position in the network (for
instance, a protein might connect two groups of proteins with different functions, thus indicating that the
protein is likely a fusion protein that connects two pathways together), its inclusion within a certain
functional group, or even its discriminatory nature. We hope PLEX will prove as useful to you as it is to us.
Figure 14. The isoprenoid pathway as constructed using information from PLEX
Thanks!!
For questions or comments, please send an email to: marcotte at icmb dot utexas dot edu
Copyright © 2002 - 2005, Shailesh Date and Edward Marcotte. The protein link explorer data and
server is the property of the University of Texas, and cannot be used for commercial purposes without
written permission of Edward Marcotte and the University of Texas. It is forbidden to redistribute,
derivatize, or encapsulate the protein link explorer data or server in another database without permission.
Sale of information derived from it, whether directly or in revised form, is forbidden except by permission
of the University of Texas and Edward Marcotte. All copies or mirrors of the protein link explorer must
carry this notice.