Supplementary Information: Lu, Nakorchevskiy, & Marcotte, Proc. Natl. Acad. Sci. USA in press, 2003.

Expression deconvolution: A reinterpretation of DNA microarray data reveals dynamic changes in cell populations

Peng Lu, Aleksey Nakorchevskiy, & Edward M Marcotte

Institute for Cellular and Molecular Biology, and
Department of Chemistry and Biochemistry
1 University Station A4800, University of Texas, Austin, Texas 78712.
Address all correspondence to EMM: marcotte@icmb.utexas.edu

Abstract

Cells grow in dynamically evolving populations, yet this aspect of experiments often goes unmeasured. A method is proposed for measuring the population dynamics of cells on the basis of their mRNA expression patterns. The population's expression pattern is modeled as the linear combination of mRNA expression from pure samples of cells, allowing reconstruction of the relative proportions of pure cell types in the population. Application of the method, termed expression deconvolution, to yeast grown under varying conditions reveals the population dynamics of the cells during the cell cycle, during the arrest of cells induced by DNA damage and the release of arrest in a cell cycle checkpoint mutant, during sporulation, and following environmental stress. Using expression deconvolution, cell cycle defects are detected and temporally ordered in 146 yeast deletion mutants; six of these defects are independently experimentally validated. Expression deconvolution allows a reinterpretation of the cell cycle dynamics underlying all previous microarray experiments and can be more generally applied to study most forms of cell population dynamics.


Within a mixed population of cells, distinct cell types have distinct programs of transcription. Likewise, cells from distinct phases of the cell cycle exhibit phase-specific transcriptional patterns. When transcription levels are measured from a population of cells in a typical experiment, such as by using DNA microarrays, the measured transcription actually represents the weighted average of these many independent transcriptional programs.
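The weighted-average picture described above can be illustrated with a small sketch. All names and numbers here (`pure_profiles`, `proportions`, the three-gene profiles) are made up for illustration and are not taken from any data set.

```python
# Hypothetical expression levels of three genes in three pure cell types.
# These values are invented for illustration only.
pure_profiles = {
    "G1": [8.0, 1.0, 2.0],
    "S": [1.0, 9.0, 2.0],
    "M": [2.0, 1.0, 7.0],
}

# Assumed fraction of each cell type in the population (sums to 1).
proportions = {"G1": 0.5, "S": 0.3, "M": 0.2}


def mix(profiles, fractions):
    """Return the population-level profile as a weighted average of pure profiles."""
    n_genes = len(next(iter(profiles.values())))
    mixed = [0.0] * n_genes
    for cell_type, fraction in fractions.items():
        for gene, level in enumerate(profiles[cell_type]):
            mixed[gene] += fraction * level
    return mixed


# What a population-level measurement would report under this model.
population = mix(pure_profiles, proportions)
print(population)
```

Each measured value is the sum over cell types of (fraction of that cell type) times (expression level in that cell type), so no single cell needs to express the measured pattern.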

Expression deconvolution is a method we have developed to estimate the proportions of different cells or cell types in a cell population. It deconvolutes the DNA microarray data of the entire cell population into a weighted combination of the expression of the distinct cell types. In essence, we treat specific transcriptional patterns in DNA microarray data as cell-type-specific markers and then estimate the relative proportions of these markers.

The program Deconvolute is a Java application that performs these calculations. The program operates on two files of microarray data:

A first set of microarray data are read in to the program to act as basis experiments, representing the cell-type specific expression patterns. For example:
(1) To deconvolute yeast microarray data into contributions from cells in different phases of the cell cycle, the basis experiments we selected were expression data from synchronized yeast cells in the G1, S, G2, M, and M/G1 phases of the cell cycle.
(2) To deconvolute expression data from a tissue biopsy in order to find proportions of distinct cell types, the basis experiments would be expression data from homogeneous cell populations representing the distinct cell types.

A second set of microarray data, in the same format, are read in to the program to be analyzed. Each column in this second file, representing one microarray experiment, is fit to find the linear combination of the basis experiments that best models the cell population data. Specifically, mixtures of normalized basis vectors are evaluated with a simulated annealing algorithm to find the mixture giving the maximum Pearson correlation coefficient with the normalized target vector. The optimal weights of the basis vectors are interpreted as the proportions of the corresponding cells in the population.
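The fitting step can be sketched as follows. This is a minimal illustration in Python, not the code of the Deconvolute program: the function names, the geometric cooling schedule, the Gaussian move size, the step count, and the constraint that weights are non-negative and sum to 1 are all assumptions made for the example. The sketch also skips the normalization of the basis and target vectors, whose exact form is not specified here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(vx * vy)


def mixture(basis, weights):
    """Weighted sum of basis vectors (one vector per cell type)."""
    n_genes = len(basis[0])
    return [sum(w * b[g] for w, b in zip(weights, basis)) for g in range(n_genes)]


def deconvolve(basis, target, steps=20000, t_start=0.1, t_end=1e-5, seed=0):
    """Search for mixture weights maximizing the Pearson correlation with target.

    Assumed details: weights are kept non-negative and rescaled to sum to 1,
    and the temperature falls geometrically from t_start to t_end.
    """
    rng = random.Random(seed)
    k = len(basis)
    current = [1.0 / k] * k  # start from an even mixture
    current_score = pearson(mixture(basis, current), target)
    best, best_score = current, current_score
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)
        # Propose a move: perturb one weight, clamp at zero, renormalize.
        cand = list(current)
        i = rng.randrange(k)
        cand[i] = max(0.0, cand[i] + rng.gauss(0.0, 0.05))
        total = sum(cand)
        if total == 0.0:
            continue
        cand = [w / total for w in cand]
        score = pearson(mixture(basis, cand), target)
        delta = score - current_score
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = cand, score
            if score > best_score:
                best, best_score = cand, score
    return best, best_score


# Made-up basis vectors (three cell types, six genes) and a synthetic target.
basis = [
    [9.0, 1.0, 1.0, 2.0, 1.0, 3.0],
    [1.0, 8.0, 2.0, 1.0, 2.0, 1.0],
    [2.0, 1.0, 1.0, 9.0, 7.0, 1.0],
]
true_weights = [0.6, 0.3, 0.1]
target = mixture(basis, true_weights)

weights, final_score = deconvolve(basis, target)
print([round(w, 2) for w in weights], final_score)
```

Because the synthetic target is an exact mixture of the basis vectors, the recovered weights should land close to `true_weights` with a score near 1. Real data would not fit perfectly, and the score then indicates how well the best mixture explains the measurement.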

Note that simulated annealing is a stochastic algorithm and is not guaranteed to find the optimal value every time it is run (although it works quite well). We typically perform the deconvolution several times to ensure that the results are consistent across runs and represent the optimal mixture of basis sets. The outcome of different runs can be compared via the Final Score, which is the Pearson correlation coefficient between the mixture of basis sets and the actual population data. This correlation coefficient ranges from -1 to 1, where 1 represents the maximum positive correlation (and a perfect fit to the data). The other statistics returned with each run describe the progress of the simulated annealing algorithm, such as the convergence to a solution and the rate of improvement of the fit over the course of the algorithm's search of different cell mixtures.
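The behavior of the score and the comparison of repeated runs can be illustrated with a short sketch. The `runs` list below holds invented (weights, score) pairs standing in for the output of several runs. It is not real program output, and the variable names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


# Invented population measurement for four genes.
observed = [4.7, 3.4, 3.0, 1.2]

# A model with the same shape scores close to 1 even if its scale and
# offset differ; a mirrored model scores close to -1.
same_shape = [2.0 * v + 1.0 for v in observed]
mirrored = [-v for v in observed]
score_same = pearson(same_shape, observed)
score_mirrored = pearson(mirrored, observed)

# Invented results of three repeated runs: (weights, final score).
runs = [
    ([0.58, 0.31, 0.11], 0.9991),
    ([0.60, 0.30, 0.10], 0.9998),
    ([0.61, 0.29, 0.10], 0.9996),
]

# Keep the run with the highest score, and measure how far each
# weight varies across runs as a simple consistency check.
best_weights, best_score = max(runs, key=lambda run: run[1])
n_weights = len(runs[0][0])
spread = [
    max(w[i] for w, _ in runs) - min(w[i] for w, _ in runs)
    for i in range(n_weights)
]
print(best_weights, best_score, spread)
```

A small spread across runs suggests the search is settling on the same mixture each time. A large spread, or scores that differ noticeably between runs, would be a reason to run the deconvolution again.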

The program Deconvolute is available for download here. Deconvolute is programmed in Java 2 (v. 1.4) and should run on any system with Java installed. We have specifically tested it on PCs, where it is run by double-clicking on the icon. Two data sets are provided for testing the algorithm. The first is the set of basis experiments for the yeast cell cycle, derived from the data of Spellman et al. (1998. Mol Biol Cell. 9:3273-97). The second data set is that of microarray experiments from yeast grown at constant temperature, collected by Gasch et al. (2000. Mol Biol Cell. 11:4241-57).

Last modified: Mon Aug 4 19:50:08 CDT 2003

Copyright © 2002, Edward Marcotte. This site is not intended for commercial use. The program Deconvolute is the property of the Regents of the University of Texas, and cannot be used for commercial purposes without written permission of Edward Marcotte and the Regents of UT. It is forbidden to redistribute, derivatize, or encapsulate the program or algorithm in another database without permission. Sale of information derived from it, whether directly or in revised form, is forbidden except by permission of UT and Edward Marcotte. All copies or mirrors of the Deconvolute web server must carry this notice.