BioTechniques, Vol. 38, No. 3 (2005), p. 463
INTRODUCTION

Biological processes involve complex interactions among many genes and gene products. Genome-wide microarray profiling technology has been recognized as a breakthrough for studying such complex gene regulation and interactions simultaneously in biology and medicine (1–4).

A cDNA microarray slide consists of thousands of cDNA clones spotted on a high-density glass slide. Each slide is competitively hybridized with two independent mRNA samples, labeled with red (Cy™5) and green (Cy3) fluorescent dyes. Each cDNA clone's (or gene's) expression level can then be measured by reading the two fluorescence intensities in the green (G) and red (R) channels for the two RNA samples. The ratio of these two fluorescence intensities at each spot represents the relative abundance of the corresponding cDNA probe (5). However, in cDNA microarray experiments, different sources of systematic and random error can arise. These may significantly affect the inference on the measured gene expression patterns. A normalization procedure and a variance-stabilizing transformation are commonly employed to remove (or minimize) the artifacts due to such error variation. Several normalization methods have been proposed using parametric and nonparametric statistical models (6–8). Those normalization methods mainly focus on adjusting location parameters such as means or medians.

Larger variability is often observed at low log-transformed intensity regions, because at low intensity levels, the background noise is a larger proportion of the observed expression intensity (i.e., lower signal-to-noise ratio), while at high levels of expression intensity, this background noise is dominated by the expression intensity. To obtain homogeneous variability across different intensity regions and genes, variance-stabilizing and other transformation approaches have been suggested, including the generalized log transformation (9–14).

While these normalization approaches and variance-stabilizing transformations are useful for adjusting the bias of each individual slide, they do not provide a rigorous statistical criterion to detect outlying slides that have unusual expression patterns or show larger variability than other slides. At an early stage of analysis, each microarray slide is often examined graphically, using the scatter plot between the two intensity channels to assess the overall patterns and variability. However, such examination is based on subjective human pattern recognition, and outlying slides can frequently enter the subsequent analysis, resulting in unreliable inference on the whole microarray study. Therefore, the main focus of this study is to identify the outlying slides that have unusual nonlinear expression patterns and/or larger variability than other slides in a microarray data set.
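
The intensity-dependent variability described in this Introduction, and the effect of a generalized log transformation on it, can be illustrated with a small simulation. The following Python sketch is only an illustration under stated assumptions and is not the method of this study: the additive-plus-multiplicative error model, the noise levels, the tuning constant `c`, and all function names are hypothetical choices made for the example. The generalized log is written in one common form, log2((x + sqrt(x^2 + c^2)) / 2).

```python
import math
import random
import statistics


def log_ratio_and_mean(red, green):
    """Return (M, A) for one spot: M = log2(R/G), A = average log2 intensity."""
    log_r, log_g = math.log2(red), math.log2(green)
    return log_r - log_g, 0.5 * (log_r + log_g)


def glog2(x, c):
    """Generalized log, base 2; close to log2(x) when x is much larger than c."""
    return math.log2((x + math.sqrt(x * x + c * c)) / 2.0)


def simulate_spots(true_level, n, rng, sd_background=30.0, sd_multiplicative=0.1):
    """Assumed error model (illustrative): observed = true * exp(eta) + eps."""
    return [
        true_level * math.exp(rng.gauss(0.0, sd_multiplicative))
        + rng.gauss(0.0, sd_background)
        for _ in range(n)
    ]


def spread_comparison(seed=0, n=2000):
    """Standard deviations of log2 and glog2 values at a low and a high intensity."""
    rng = random.Random(seed)
    # Hypothetical tuning constant: background SD divided by multiplicative SD.
    # In practice this constant would have to be estimated from the data.
    c = 30.0 / 0.1
    result = {}
    for label, level in (("low", 100.0), ("high", 10000.0)):
        y = simulate_spots(level, n, rng)
        # The plain log is undefined for non-positive intensities, so drop them.
        positive = [v for v in y if v > 0]
        result[label] = {
            "sd_log2": statistics.stdev([math.log2(v) for v in positive]),
            "sd_glog2": statistics.stdev([glog2(v, c) for v in y]),
        }
    return result


if __name__ == "__main__":
    print(log_ratio_and_mean(400.0, 100.0))
    for label, sds in spread_comparison().items():
        print(label, sds)
```

Under these assumptions, the spread of the plain log-transformed values is several times larger at the low intensity level than at the high one, whereas the spread of the generalized-log values is roughly the same at both levels. Neither transformation, however, indicates whether a slide as a whole is outlying.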

We propose the diagnostic plot (DP) approach that succinctly summarizes and detects outlying slides from a cDNA microarray study. The proposed DP is motivated by the observation that adjustment of nonlinear trends between the two intensity channels often results in different degrees of correlation between them. Figure 1 shows the log-scatter plots based on Lowess normalization for the slides from the rat neuronal
Diagnostic plots for detecting outlying slides in a cDNA microarray experiment
Taesung Park1, Sung-Gon Yi1, SeungYeoun Lee2, and Jae K. Lee3
BioTechniques 38:463-471 (March 2005)
1Seoul National University, 2Sejong University, Seoul, Korea, and 3University of Virginia, Charlottesville, VA, USA
Different sources of systematic and random error variation are often observed in cDNA microarray experiments. A simple scatter plot is commonly used to examine outlying slides that have unusual expression patterns or larger variability than other slides. These outlying slides tend to have a large impact on subsequent analyses, such as identification of differentially expressed genes and clustering analysis. However, it is difficult to select outlying slides rigorously and consistently based on subjective human pattern recognition on their scatter plots. A graphical method and a rigorous diagnostic measure are proposed to detect outlying slides. The proposed graphical method is easy to implement and is shown to be quite effective in detecting outlying slides in real microarray data sets. The diagnostic measure is also informative for comparing variability among slides. Two cDNA microarray data sets are carefully examined to illustrate the proposed approach: a 3840-gene microarray experiment for neuronal differentiation of cortical stem cells and a 2076-gene microarray experiment for anticancer compound time-course expression of the NCI-60 cancer cell lines.