Supplementary Information: Lu, Nakorchevskiy, & Marcotte, Proc. Natl. Acad. Sci. USA in press, 2003.
Within a mixed population of cells, distinct cell types have distinct programs of transcription. Likewise, cells from distinct phases of the cell cycle exhibit phase-specific transcriptional patterns. When transcription levels are measured from a population of cells in a typical experiment, such as by using DNA microarrays, the measured transcription actually represents the weighted average of these many independent transcriptional programs.
Expression deconvolution is a method we have developed to estimate the proportions of different cells or cell types in a cell population by deconvoluting or deconstructing the DNA microarray data of the entire cell population as the weighted combination of expression from the distinct cell types. We are essentially treating specific transcriptional patterns in DNA microarray data as cell-type specific markers, then looking for the relative proportions of these markers.
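To make the weighted-average picture concrete, the following is a minimal sketch of the forward model only: given expression profiles for pure cell types and their proportions, it computes the profile that would be measured for the mixed population. The function name `mix_profiles` and the expression values are hypothetical and chosen for illustration; they are not taken from the Deconvolute program.

```python
def mix_profiles(basis, proportions):
    """Return the population-level profile as the weighted average of
    per-cell-type profiles (each profile is a list of per-gene values)."""
    if len(basis) != len(proportions):
        raise ValueError("need one proportion per basis profile")
    total = sum(proportions)
    n_genes = len(basis[0])
    return [
        sum(w * profile[g] for w, profile in zip(proportions, basis)) / total
        for g in range(n_genes)
    ]


# Hypothetical expression values for three genes in two cell types.
type_a = [10.0, 2.0, 5.0]
type_b = [2.0, 8.0, 5.0]

# A population made of 75% type A cells and 25% type B cells.
population = mix_profiles([type_a, type_b], [0.75, 0.25])
print(population)
```

Deconvolution is the inverse problem: given `population` and the basis profiles, recover the proportions.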
The program Deconvolute is a Java application that performs these calculations. The program operates on two files of microarray data:
A first set of microarray data are read into the program to act as basis experiments, representing the cell-type-specific expression patterns. For example:
(1) To deconvolute yeast microarray data into contributions from cells in different phases of the cell cycle, the basis experiments we selected were expression data from synchronized yeast cells in the G1, S, G2, M, and M/G1 phases of the cell cycle.
(2) To deconvolute expression data from a tissue biopsy in order to find proportions of distinct cell types, the basis experiments would be expression data from homogeneous cell populations representing the distinct cell types.
A second set of microarray data, in the same format, are read into the program to be analyzed. Each column in this second file, representing one microarray experiment, is fit to find the linear combination of the basis experiments that best models the cell population data. Specifically, mixtures of normalized basis vectors are evaluated with a simulated annealing algorithm to find the mixture giving the maximum Pearson correlation coefficient with the normalized target vector. The optimal weights of the basis vectors are interpreted as the proportions of the corresponding cells in the population.
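The fitting step can be sketched as follows. This is an illustrative Python sketch of simulated annealing over mixture weights, scored by the Pearson correlation coefficient; it is not the Deconvolute source code. The move set (perturbing one weight and renormalizing), the geometric cooling schedule, and all parameter names and default values (`steps`, `t_start`, `t_end`, `step_size`) are assumptions, and the input vectors are assumed to be normalized already. The example data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mixture(basis, weights):
    """Weighted sum of the basis vectors, gene by gene."""
    return [sum(w * b[g] for w, b in zip(weights, basis))
            for g in range(len(basis[0]))]


def anneal(basis, target, steps=20000, t_start=0.1, t_end=1e-4,
           step_size=0.05, seed=None):
    """Search for non-negative weights summing to 1 that maximize the
    Pearson correlation between the mixture and the target."""
    rng = random.Random(seed)
    k = len(basis)
    weights = [1.0 / k] * k
    score = pearson(mixture(basis, weights), target)
    best_w, best_score = weights[:], score
    for i in range(steps):
        # Assumed geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (i / steps)
        # Propose a move: nudge one weight, clamp at zero, renormalize.
        cand = weights[:]
        j = rng.randrange(k)
        cand[j] = max(0.0, cand[j] + rng.uniform(-step_size, step_size))
        total = sum(cand)
        if total == 0:
            continue
        cand = [w / total for w in cand]
        cand_score = pearson(mixture(basis, cand), target)
        delta = cand_score - score
        # Metropolis rule: always accept improvements, sometimes accept
        # worse mixtures, less often as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            weights, score = cand, cand_score
            if score > best_score:
                best_w, best_score = weights[:], score
    return best_w, best_score


# Hypothetical normalized basis vectors (six genes, two cell types).
basis = [
    [1.0, 0.2, -0.5, -1.2, 0.8, -0.3],
    [-0.7, 1.1, 0.9, -0.4, -1.0, 0.1],
]
# Synthetic target built from a known 70/30 mixture.
target = mixture(basis, [0.7, 0.3])

weights, final_score = anneal(basis, target, seed=1)
print(weights, final_score)
```

Because the target here is constructed from a known mixture, the recovered weights can be checked against the 70/30 proportions; with real data only the correlation score is available as a measure of fit.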
Note that simulated annealing is a stochastic algorithm and is not guaranteed to find the optimal value every time it is run (although it works quite well). We typically perform the deconvolution several times to ensure that the results are consistent across runs and represent the optimal mixture of basis sets. The outcome of different runs can be compared via the Final Score, which is the Pearson correlation coefficient between the mixture of basis sets and the actual population data. This correlation coefficient ranges from -1 to 1, where 1 represents the maximum positive correlation (and a perfect fit to the data). The other statistics returned with each run refer to the progress of the simulated annealing algorithm, such as the convergence to a solution and the rate of improvement of the fit over the course of the algorithm's search of different cell mixtures.
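The practice of repeating the run and comparing scores can be sketched as below. The helper `repeat_runs` is hypothetical and not part of Deconvolute; it assumes each run is a function that takes a random number generator and returns a pair of (weights, score). The `fake_run` function is only a stand-in for one stochastic deconvolution run so that the example is self-contained.

```python
import random


def repeat_runs(run_once, n_runs=5, seed=0):
    """Call run_once(rng) n_runs times with different seeds and return
    the best weights, the best score, and the spread of scores."""
    results = [run_once(random.Random(seed + i)) for i in range(n_runs)]
    results.sort(key=lambda r: r[1], reverse=True)
    scores = [score for _, score in results]
    best_weights, best_score = results[0]
    # A small spread suggests the runs agree with one another.
    spread = scores[0] - scores[-1]
    return best_weights, best_score, spread


def fake_run(rng):
    """Stand-in for one stochastic run (hypothetical): returns weights
    near a 70/30 mixture and a score that falls off with the error."""
    w = rng.uniform(0.6, 0.8)
    return [w, 1.0 - w], 1.0 - abs(w - 0.7)


best_weights, best_score, spread = repeat_runs(fake_run, n_runs=5)
print(best_weights, best_score, spread)
```

A large spread between the best and worst scores would suggest that some runs stopped at poorer mixtures and that more runs are warranted.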
The program Deconvolute is available for download here. Deconvolute is programmed in Java 2 (v. 1.4) and should run on any system with Java installed. We have specifically tested it on PCs, where it is run by double-clicking on the icon. Two data sets are provided for testing the algorithm. The first is the set of basis experiments for the yeast cell cycle, derived from the data of Spellman et al. (1998. Mol Biol Cell 9:3273-97). The second is a set of microarray experiments from yeast grown at constant temperature, collected by Gasch et al. (2000. Mol Biol Cell 11:4241-57).
Copyright © 2002, Edward Marcotte. This site is not intended for commercial use.
The program Deconvolute is the property of the Regents of the University of Texas, and cannot be used for commercial purposes without written permission of Edward Marcotte and the Regents of UT. It is forbidden to redistribute, derivatize, or encapsulate the program or algorithm in another database without permission. Sale of information derived from it, whether directly or in revised form, is forbidden except by permission of UT and Edward Marcotte. All copies or mirrors of the Deconvolute web server must carry this notice.