Kasper Daniel Hansen
<khansen@jhsph.edu | www.hansenlab.org>
McKusick-Nathans Institute of Genetic Medicine, Department of Biostatistics, Johns Hopkins University
Lessons from Gene Expression
Genomic Data Science Specialization
[Figure: centered expression (y-axis) against sample index (x-axis) for the genes COX4NB and RASGRP1, measured by sequencing and by array, with the standard deviation (s.d.) shown for each platform. Panel annotations: cor: 0.592, n: 5,003; cor: 0.492, n: 2,463.]
[Figure 3 plot: fraction of sign-reversed correlations (y-axis) by batch pair (x-axis: 2002/2003, 2002/2004, 2002/2005, 2003/2004, 2003/2005, 2004/2005), shown for labels not shuffled and labels shuffled.]
Figure 3 | Batch effects also change the correlations between genes. We normalized every gene in the second gene expression data set² to mean 0, variance 1 within each batch. (The 2006 batch was omitted owing to small sample size.) We identified all significant correlations (p < 0.05) between pairs of genes within each batch using a linear model. We then looked at gene pairs that showed a significant correlation in two batches and counted the fraction of times that the sign of the correlation changed between the two batches. A large percentage of significant correlations reversed sign across batches, suggesting that the correlation structure between genes changes substantially across batches. To confirm that this phenomenon is due to batch, we repeated the process (looking for significant correlations that changed sign across batches) with the batch labels randomly permuted. With random batches, a much smaller fraction of significant correlations changed sign. This suggests that correlation patterns differ by batch, which would affect rank-based prediction methods as well as systems biology approaches that rely on between-gene correlation to estimate pathways.
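The procedure in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the figure: the function names, the `expr` layout (gene name mapped to a list of values, one per sample), and the toy data are all hypothetical. The caption tests correlations with a linear model; the sketch swaps in the Fisher z approximation so that it needs only the standard library. Standardizing within a batch does not change a Pearson correlation, but the step is kept to mirror the caption.

```python
import math
import random
from itertools import combinations


def standardize(values):
    """Scale one gene's values within a batch to mean 0, variance 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]  # fails if the gene is constant


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_value(r, n):
    """Two-sided p-value from the Fisher z approximation.

    Assumption: stands in for the linear-model test named in the caption.
    Needs n > 3 samples.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def significant_signs(expr, samples, alpha=0.05):
    """Map each gene pair to the sign of its correlation when p < alpha."""
    scaled = {g: standardize([vals[i] for i in samples])
              for g, vals in expr.items()}
    signs = {}
    for g1, g2 in combinations(sorted(scaled), 2):
        r = pearson(scaled[g1], scaled[g2])
        if p_value(r, len(samples)) < alpha:
            signs[(g1, g2)] = 1 if r > 0 else -1
    return signs


def sign_reversal_fraction(expr, batch_a, batch_b, alpha=0.05):
    """Fraction of pairs significant in both batches whose sign differs."""
    a = significant_signs(expr, batch_a, alpha)
    b = significant_signs(expr, batch_b, alpha)
    shared = set(a) & set(b)
    if not shared:
        return float("nan")  # no pair is significant in both batches
    return sum(a[p] != b[p] for p in shared) / len(shared)


def shuffled_fraction(expr, batch_a, batch_b, rng, alpha=0.05):
    """Same statistic after randomly permuting the batch labels."""
    pooled = list(batch_a) + list(batch_b)
    rng.shuffle(pooled)
    k = len(batch_a)
    return sign_reversal_fraction(expr, pooled[:k], pooled[k:], alpha)


# Toy data (hypothetical): two genes that agree in batch A and oppose in batch B.
toy = {
    "g1": list(range(10)) + list(range(10)),
    "g2": list(range(10)) + [9 - v for v in range(10)],
}
batch_a = list(range(10))        # sample indices 0..9
batch_b = list(range(10, 20))    # sample indices 10..19
observed = sign_reversal_fraction(toy, batch_a, batch_b)
permuted = shuffled_fraction(toy, batch_a, batch_b, random.Random(0))
```

On the toy data the single gene pair is significant in both batches with opposite signs, so `observed` is 1.0. In a real analysis the shuffled statistic would be averaged over many permutations and compared with the observed fraction for each batch pair.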
GENETICS 737