Lessons from Gene Expression, Kasper Daniel Hansen


SLIDE 1

Lessons from Gene Expression

Kasper Daniel Hansen
<khansen@jhsph.edu | www.hansenlab.org>
McKusick-Nathans Institute of Genetic Medicine
Department of Biostatistics
Johns Hopkins University

SLIDE 2

Genomic Data Science Specialization @Coursera by JHU

Liliana Florea
Jeff Leek
Steven Salzberg
Kasper D. Hansen
Mihaela Pertea
James Taylor

6 classes
4 weeks per class
Continuous rollout

SLIDE 3

SLIDE 4

RNA-seq Microarrays scRNA-seq

SLIDE 5

Replication / Reproducibility

“It is difficult to get a man to understand something, when his salary depends upon his not understanding it!” (Upton Sinclair)

  • Replicate samples
  • Replicate the experiment
  • Replicate the conclusion
  • Computational replication / reproduction

SLIDE 6

Science

“Proof” “Crap”

SLIDE 7

Science

“Proof” “Crap”

SLIDE 8

Science

Most of biology

SLIDE 9

Different sub-fields have different standards

“without knowing anything - if this was your plot, what do you think about that little guy top right”

http://drbecca.scientopia.org/2015/08/18/whose-problem-is-the-reproducibility-crisis-anyway/

SLIDE 10

Technical variation

Focus

SLIDE 11

Controls

SLIDE 12

  • Sequencing technology does not remove biological variability

[Figure, panels a-c: per-gene sequencing s.d. and array s.d. (cor: 0.592, n: 5,003; cor: 0.492, n: 2,463), and centered expression against sample index for COX4NB and RASGRP1, sequencing and array.]

Hansen (2011) Nat. Biotech
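
A minimal simulation sketch of the point on this slide, not taken from Hansen (2011). If observed expression is modeled as a biological signal plus independent technical noise, the standard deviation across samples cannot drop below the biological standard deviation, however small the technical noise becomes. The additive Gaussian model, the function name `observed_sd`, and all numeric values are illustrative assumptions.

```python
import random
import statistics

def observed_sd(bio_sd, tech_sd, n_samples=2000, seed=0):
    """SD across samples of (biological value + technical noise).

    ASSUMPTION: additive model with independent Gaussian components;
    this is for illustration only, not the model used in the paper.
    """
    rng = random.Random(seed)
    values = [
        rng.gauss(0.0, bio_sd) + rng.gauss(0.0, tech_sd)
        for _ in range(n_samples)
    ]
    return statistics.stdev(values)

bio_sd = 1.0  # hypothetical biological SD for one gene
for tech_sd in (1.0, 0.5, 0.1, 0.0):  # progressively less technical noise
    sd = observed_sd(bio_sd, tech_sd)
    # Under the additive model the variances add.
    expected = (bio_sd ** 2 + tech_sd ** 2) ** 0.5
    print(f"tech_sd={tech_sd:.1f}  observed_sd={sd:.2f}  expected={expected:.2f}")
```

Under these assumptions the observed s.d. approaches the biological s.d. as the technical noise shrinks, which is why a better measurement technology does not reduce the number of biological replicates needed.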

SLIDE 13

Number of replicates

GWAS Cell Biology

SLIDE 14

Number of replicates

GWAS Cell Biology

SLIDE 15

Number of replicates

“We applied MixupMapper to five publicly available human genetical genomics datasets. On average, 3% of all analyzed samples had been assigned incorrect expression phenotypes: in one of the datasets 23% of the samples had incorrect expression phenotypes.”

Westra (2011) Bioinformatics

Studies with a huge number of samples have challenges as well

SLIDE 16

Batch effects

Experimental design solutions

[Figure: fraction of sign-reversed correlations for each batch pair (2002/2003, 2002/2004, 2002/2005, 2003/2004, 2003/2005, 2004/2005), with batch labels not shuffled and with labels shuffled.]

Figure 3 | Batch effects also change the correlations between genes. We normalized every gene in the second gene expression data set (ref. 2) to mean 0, variance 1 within each batch. (The 2006 batch was omitted owing to small sample size.) We identified all significant correlations (p < 0.05) between pairs of genes within each batch using a linear model. We looked at genes that showed a significant correlation in two batches and counted the fraction of times that the sign of the correlation changed between the two batches. A large percentage of significant correlations reversed signs across batches, suggesting that the correlation structure between genes changes substantially across batches. To confirm this phenomenon is due to batch, we repeated the process (looking for significant correlations that changed sign across batches) but with the batch labels randomly permuted. With random batches, a much smaller fraction of significant correlations change signs. This suggests that correlation patterns differ by batch, which would affect rank-based prediction methods as well as systems biology approaches that rely on between-gene correlation to estimate pathways.

Leek (2010) Nat Rev Genet

Tackling the widespread and critical impact of batch effects in high-throughput data
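
The procedure in the Figure 3 caption (standardize within batch, find gene pairs correlated in both batches, count sign reversals, then repeat with permuted batch labels) can be sketched roughly as follows. This is a toy illustration on simulated data, not the analysis from Leek (2010): the data, the helper names, and the `|r| > 1.96 / sqrt(n)` cutoff used in place of the p < 0.05 linear-model test are all assumptions.

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardize_within_batch(gene, labels):
    """Scale one gene to mean 0, variance 1 separately within each batch."""
    out = list(gene)
    for b in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == b]
        vals = [gene[i] for i in idx]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (len(vals) - 1))
        for i in idx:
            out[i] = (gene[i] - mean) / sd
    return out

def sign_reversal_fraction(genes, labels, batch_a, batch_b):
    """Among gene pairs 'significant' in both batches, fraction with opposite signs."""
    idx_a = [i for i, lab in enumerate(labels) if lab == batch_a]
    idx_b = [i for i, lab in enumerate(labels) if lab == batch_b]
    # ASSUMPTION: crude large-sample cutoff standing in for the p < 0.05
    # linear-model test described in the caption.
    cut_a = 1.96 / math.sqrt(len(idx_a))
    cut_b = 1.96 / math.sqrt(len(idx_b))
    both = reversed_count = 0
    for g, h in combinations(range(len(genes)), 2):
        r_a = pearson([genes[g][i] for i in idx_a], [genes[h][i] for i in idx_a])
        r_b = pearson([genes[g][i] for i in idx_b], [genes[h][i] for i in idx_b])
        if abs(r_a) > cut_a and abs(r_b) > cut_b:
            both += 1
            reversed_count += (r_a * r_b < 0)
    return reversed_count / both if both else 0.0

# Toy data (hypothetical): 10 genes driven by one latent factor. In batch "B"
# the first 3 genes respond with the opposite sign and every gene is shifted.
rng = random.Random(1)
n_per_batch, n_genes, n_flipped = 60, 10, 3
labels = ["A"] * n_per_batch + ["B"] * n_per_batch
latent = [rng.gauss(0, 1) for _ in labels]
genes = []
for g in range(n_genes):
    row = []
    for lab, f in zip(labels, latent):
        sign = -1.0 if (lab == "B" and g < n_flipped) else 1.0
        shift = 2.0 if lab == "B" else 0.0
        row.append(sign * f + shift + rng.gauss(0, 0.5))
    genes.append(row)

# Standardize within the true batches, then compare true vs. permuted labels.
genes = [standardize_within_batch(row, labels) for row in genes]
real = sign_reversal_fraction(genes, labels, "A", "B")

n_shuffles = 20
shuffled_fracs = []
for _ in range(n_shuffles):
    perm = labels[:]
    rng.shuffle(perm)
    shuffled_fracs.append(sign_reversal_fraction(genes, perm, "A", "B"))
shuffled = sum(shuffled_fracs) / n_shuffles

print(f"labels not shuffled: {real:.3f}")
print(f"labels shuffled (mean of {n_shuffles}): {shuffled:.3f}")
```

On this toy data the unshuffled fraction is driven entirely by the genes constructed to flip sign in batch "B", so the numbers illustrate the mechanics of the comparison and say nothing about real data sets.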

SLIDE 17

Combining experiments - Gene Expression Barcode

Latest: McCall (2014) NAR

Biggest barrier is metadata

SLIDE 18

Speed of light measured by different groups

WJ Youden (1972) Technometrics

SLIDE 19

Analysis

One-of-a-kind As-a-utility

SLIDE 20

How do we know whether something works?

  • Fake data (simulations)
  • Real data
  • Well designed, well executed reference experiments

SLIDE 21

Lessons from gene expression

  • Huge advantage in a common software platform / common formats
  • Designed reference experiments
  • Technological standardization
  • Physical models do not help (not clear this is general)
  • All data is publicly available