Statistical analysis of RNASeq Data: Introduction to RNA-seq data analysis


  1. Statistical analysis of RNASeq Data: Introduction to RNA-seq data analysis. dominique-laurent.couturier@cruk.cam.ac.uk, Bioinformatics core (Source: O. Rueda, CRUK-CI; G. Marot, INRIA)

  2. Introduction

  3. Grand picture of statistics
     ◮ Idea: EGF is differentially expressed (DE) in luminal (L) and basal (B) cells.
     ◮ Statistical hypotheses: $H_0: \mu_B = \mu_L$ against $H_1: \mu_B \neq \mu_L$.
     ◮ Sample (data: RNA-seq counts): $(x_{B,1}, x_{B,2}, \ldots, x_{B,n_B})$ and $(x_{L,1}, x_{L,2}, \ldots, x_{L,n_L})$.
     ◮ Inference: the point estimate of $\mu_B - \mu_L$ is $\hat\mu_B - \hat\mu_L$, and the test statistic is
       $T_{obs} = \frac{\hat\mu_B - \hat\mu_L}{s_p\sqrt{\frac{1}{n_B} + \frac{1}{n_L}}} \sim St_{n_B + n_L - 2}$ under $H_0$.

  4. Outline
     ◮ 1/ Analysis of gene expression measured with microarrays
       ⊲ 1a/ Normal distribution
       ⊲ 1b/ Test of equality of means for two samples: t-test
       ⊲ 1c/ Test of equality of means for more than two samples: ANOVA
       ⊲ 1d/ Test of equality of means for two categorical predictors: ANOVA
       ⊲ 1e/ Test of equality of means for more than two predictors: linear model
       ⊲ 1f/ Confounding
     ◮ 2/ Analysis of gene expression measured by RNA-seq
       ⊲ Generalisation of the linear model: negative binomial regression
         ◮ 2a/ Negative binomial distribution
         ◮ 2b/ Nuisance parameter estimation: shrinkage estimator
         ◮ 2c/ Controlling for library size: offset
     ◮ 3/ Controlling for multiple testing
       ⊲ 3a/ Family-wise error rate
       ⊲ 3b/ False discovery rate

  5. Part I: Analysis of gene expression measured with microarrays

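  6. 1a/ Normal distribution
     $Y \sim N(\mu, \sigma^2)$, with density $f_Y(y) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y - \mu)^2}{2\sigma^2}}$, $E[Y] = \mu$ and $Var[Y] = \sigma^2$.
     [Figure: probability density function $f_Y(y \mid \mu, \sigma)$ from $\mu - 3\sigma$ to $\mu + 3\sigma$; the intervals $\mu \pm \sigma$, $\mu \pm 2\sigma$ and $\mu \pm 3\sigma$ contain 68.27%, 95.45% and 99.73% of the probability mass, respectively.]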
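As a numerical check of the 68.27%, 95.45% and 99.73% figures, here is a minimal Python sketch using only the standard library (the function names are illustrative, not part of the course material). It relies on the identity $P(|Y - \mu| < k\sigma) = \mathrm{erf}(k/\sqrt{2})$, which holds for every normal $Y$ whatever its $\mu$ and $\sigma$.

```python
import math

def normal_pdf(y: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density f_Y(y) of Y ~ N(mu, sigma^2)."""
    return math.exp(-((y - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

def prob_within_k_sigma(k: float) -> float:
    """P(mu - k*sigma < Y < mu + k*sigma); identical for every mu and sigma."""
    return math.erf(k / math.sqrt(2.0))

if __name__ == "__main__":
    for k in (1, 2, 3):
        # prints 68.27%, 95.45% and 99.73% for k = 1, 2, 3
        print(f"mu +/- {k} sigma: {100 * prob_within_k_sigma(k):.2f}%")
```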

  7. 1a/ Normal distribution
     ◮ $Y \sim N(\mu, \sigma^2)$, with $E[Y] = \mu$ and $Var[Y] = \sigma^2$, is a suitable model for many variables, for example gene expression values.
     [Figure: gene expression values of gene 'X' in basal cells of 33 mice, on a scale from 0 to 6.]

  10. 1b/ Test of equality of means for two samples
     [Figure: expression intensity of gene 'X' in basal (n = 33) and luminal (n = 43) cells, on a scale from −1.0 to 5.5.]
     We test $H_0: \mu_B - \mu_L = 0$ against $H_1: \mu_B - \mu_L \neq 0$. We know:
     ◮ Student's t-test [assuming $\sigma^2_B = \sigma^2_L$]: $\frac{\hat\mu_B - \hat\mu_L}{s_p\sqrt{\frac{1}{n_B} + \frac{1}{n_L}}} \sim t_{n_B + n_L - 2}$ under $H_0$,
     ◮ where $s_p = \sqrt{\frac{s^2_B (n_B - 1) + s^2_L (n_L - 1)}{n_B + n_L - 2}}$.
     [Figure: densities of $T \sim St_2$, $St_{10}$, $St_{50}$ and $St_{100}$, with the bounds of their central 95% regions at ±4.303, ±2.228, ±2.009 and ±1.984, respectively.]
     Output of R's two-sample t-test:

         Two Sample t-test

         data:  Basal and Luminal
         t = 6.6751, df = 74, p-value = 3.941e-09
         alternative hypothesis: true difference in means is not equal to 0
         95 percent confidence interval:
          1.048457 1.940748
         sample estimates:
         mean of x mean of y
          2.923908  1.429305
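The statistic on this slide can be reproduced from summary values alone. The sketch below (Python, standard library only; the function names are hypothetical) implements the equal-variance Student's t-test and plugs in the reported group means 2.923908 and 1.429305, $n_B = 33$, $n_L = 43$ and $s_p = 0.9675$. That value of $s_p$ is taken from the residual standard error of the `lm()` fits in this section, which for a model with a single two-level factor equals the pooled standard deviation. Because 0.9675 is rounded, the result matches t = 6.6751 only up to rounding. The p-value step is left out, since it would need the Student's t CDF (for example from SciPy).

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 denominator

def pooled_sd(x, y):
    """Pooled standard deviation s_p, assuming equal group variances."""
    n_x, n_y = len(x), len(y)
    return math.sqrt(((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / (n_x + n_y - 2))

def t_from_summary(mean_x, mean_y, s_p, n_x, n_y):
    """Student's two-sample t statistic computed from summary statistics."""
    return (mean_x - mean_y) / (s_p * math.sqrt(1.0 / n_x + 1.0 / n_y))

def t_test(x, y):
    """Equal-variance two-sample t-test on raw samples: returns (t, df)."""
    t = t_from_summary(mean(x), mean(y), pooled_sd(x, y), len(x), len(y))
    return t, len(x) + len(y) - 2

if __name__ == "__main__":
    # Values printed on the slides: group means from the t-test output,
    # s_p = residual standard error of the lm() fits, n_B = 33, n_L = 43.
    t_obs = t_from_summary(2.923908, 1.429305, 0.9675, 33, 43)
    print(f"t = {t_obs:.3f}, df = {33 + 43 - 2}")  # close to t = 6.6751, df = 74
```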

  12. 1b/ Test of equality of means for two samples
     [Figure: expression intensity of gene 'X' in basal (n = 33) and luminal (n = 43) cells.]
     ◮ Modelling 1: $Y_i(B) = \mu_B + \epsilon_i$, $Y_i(L) = \mu_L + \epsilon_i$.
     ◮ Modelling 2: $Y_i = \mu_B + \delta_L I(i \in L) + \epsilon_i = \beta_0 + \beta_1 X_{1,i} + \epsilon_i$, with $\beta_0 = \mu_B$, $\beta_1 = \delta_L = \mu_L - \mu_B$ and $X_{1,i} = I(i \in L)$,
     where $i = 1, \ldots, n$ and $\epsilon_i \sim N(0, \sigma^2)$.
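To see why Modelling 2 gives $\beta_0 = \mu_B$ and $\beta_1 = \mu_L - \mu_B$, here is a minimal Python sketch that fits it by ordinary least squares with a single 0/1 predictor. The function name and the toy data are hypothetical, chosen only for illustration. With one binary regressor, the OLS slope $S_{xy}/S_{xx}$ reduces to the difference of the two group means and the intercept to the mean of the reference group (basal, which is also R's default reference level here, since factor levels are sorted alphabetically).

```python
from statistics import mean

def fit_treatment_coding(y_basal, y_luminal):
    """OLS fit of Y_i = beta0 + beta1 * X_1i + eps_i, where X_1i = I(i in L).

    Basal is the reference group, as in lm(expression ~ celltype).
    """
    y = [float(v) for v in list(y_basal) + list(y_luminal)]
    x = [0.0] * len(y_basal) + [1.0] * len(y_luminal)  # indicator I(i in L)
    x_bar, y_bar = mean(x), mean(y)
    # Simple-regression OLS: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

if __name__ == "__main__":
    basal = [2.0, 4.0]          # hypothetical toy values, mean 3
    luminal = [1.0, 1.0, 4.0]   # hypothetical toy values, mean 2
    b0, b1 = fit_treatment_coding(basal, luminal)
    # intercept = mean(basal); slope = mean(luminal) - mean(basal)
    print(b0, b1)
```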

  14. 1b/ Test of equality of means for two samples
     [Figure: expression intensity of gene 'X' in basal (n = 33) and luminal (n = 43) cells.]
     ◮ Modelling 1: $Y_i = \mu_B I(i \in B) + \mu_L I(i \in L) + \epsilon_i$, i.e. $Y = X\beta + \epsilon$,
     where $i = 1, \ldots, n$ and $\epsilon_i \sim N(0, \sigma^2)$.

         Call:
         lm(formula = expression ~ celltype - 1, data = microarrays)

         Residuals:
              Min       1Q   Median       3Q      Max
         -2.64401 -0.58586  0.01473  0.65051  2.47771

         Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
         celltypeBasal     2.9239     0.1684  17.361  < 2e-16 ***
         celltypeLuminal   1.4293     0.1475   9.687 8.47e-15 ***
         ---
         Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

         Residual standard error: 0.9675 on 74 degrees of freedom
         Multiple R-squared:  0.8423, Adjusted R-squared:  0.838
         F-statistic: 197.6 on 2 and 74 DF,  p-value: < 2.2e-16
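Modelling 1 can also be fitted directly in the matrix form $Y = X\beta + \epsilon$. The following Python sketch (standard library only; the helper names and toy data are hypothetical) builds the cell-means design matrix with one indicator column per group and no intercept, the analogue of `expression ~ celltype - 1`, and then solves the normal equations $X^\top X \beta = X^\top y$. Because the indicator columns are orthogonal, $X^\top X$ is diagonal and each coefficient is simply a group mean, which is why `lm()` reports the basal and luminal means (2.9239 and 1.4293) as coefficients here. R's `lm()` itself uses a QR decomposition rather than the normal equations; they are used here only for transparency.

```python
def design_cell_means(groups):
    """Cell-means coding: one 0/1 indicator column per group, no intercept."""
    labels = list(groups)
    X, y = [], []
    for label in labels:
        for value in groups[label]:
            X.append([1.0 if label == other else 0.0 for other in labels])
            y.append(float(value))
    return labels, X, y

def solve(A, b):
    """Solve the square system A beta = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

if __name__ == "__main__":
    toy = {"Basal": [1.0, 2.0, 3.0], "Luminal": [4.0, 5.0, 6.0]}  # hypothetical values
    labels, X, y = design_cell_means(toy)
    print(dict(zip(labels, ols(X, y))))  # each coefficient is that group's mean
```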

  16. 1b/ Test of equality of means for two samples
     [Figure: expression intensity of gene 'X' in basal (n = 33) and luminal (n = 43) cells.]
     ◮ Modelling 2: $Y_i = \mu_B + \delta_L I(i \in L) + \epsilon_i = \beta_0 + \beta_1 X_{1,i} + \epsilon_i$, i.e. $Y = X\beta + \epsilon$,
     where $i = 1, \ldots, n$ and $\epsilon_i \sim N(0, \sigma^2)$.

         Call:
         lm(formula = expression ~ celltype, data = microarrays)

         Residuals:
              Min       1Q   Median       3Q      Max
         -2.64401 -0.58586  0.01473  0.65051  2.47771

         Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
         (Intercept)       2.9239     0.1684  17.361  < 2e-16 ***
         celltypeLuminal  -1.4946     0.2239  -6.675 3.94e-09 ***
         ---
         Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

         Residual standard error: 0.9675 on 74 degrees of freedom
         Multiple R-squared:  0.3758, Adjusted R-squared:  0.3674
         F-statistic: 44.56 on 1 and 74 DF,  p-value: 3.941e-09
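Using only numbers printed in this section (the group means, $s_p = 0.9675$, $n_B = 33$ and $n_L = 43$), a worked check of how the treatment-coded fit relates to the two-sample t-test:

```latex
\begin{aligned}
\hat\beta_0 &= \hat\mu_B = 2.9239 \\
\hat\beta_1 &= \hat\delta_L = \hat\mu_L - \hat\mu_B = 1.4293 - 2.9239 = -1.4946 \\
\widehat{\mathrm{SE}}(\hat\beta_1) &= s_p\sqrt{\tfrac{1}{n_B} + \tfrac{1}{n_L}}
  = 0.9675\sqrt{\tfrac{1}{33} + \tfrac{1}{43}} \approx 0.2239 \\
t &= \frac{\hat\beta_1}{\widehat{\mathrm{SE}}(\hat\beta_1)} = \frac{-1.4946}{0.2239} \approx -6.675 \\
F_{1,74} &= t^2 \approx 6.6751^2 \approx 44.56
\end{aligned}
```

So the t-test of $\beta_1 = 0$ is the two-sample t-test with the sign reversed (it compares luminal with basal rather than basal with luminal), and with a single binary predictor the model F-test is equivalent to it, sharing the p-value 3.941e-09.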

  18. 1c/ Test of equality of means for more than two samples
     ◮ One-way ANOVA hypotheses:
       ⊲ $H_0: \mu_L = \mu_P = \mu_V$,
       ⊲ $H_1: \mu_k \neq \mu_l$ for at least one pair $(k, l)$.
     [Figure: expression intensity of gene 'X' in the virgin, pregnant and lactating groups.]
     ◮ Modelling 1: $Y_i(L) = \mu_L + \epsilon_i$, $Y_i(P) = \mu_P + \epsilon_i$, $Y_i(V) = \mu_V + \epsilon_i$, or equivalently
       $Y_i = \mu_L I(i \in L) + \mu_P I(i \in P) + \mu_V I(i \in V) + \epsilon_i$, i.e. $Y = X\beta + \epsilon$.
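The one-way ANOVA hypotheses above are usually tested with the F statistic $F = \frac{SS_{between}/(k-1)}{SS_{within}/(n-k)}$, which follows an $F_{k-1,\,n-k}$ distribution under $H_0$ and the normal model. Below is a minimal Python sketch using the standard library only; the function name and the toy groups are illustrative, not taken from the slides. The p-value step is omitted because it needs the F CDF (available, for example, in SciPy).

```python
from statistics import mean

def one_way_anova_f(groups):
    """One-way ANOVA F statistic for H0: all group means are equal.

    Returns (F, df_between, df_within); under H0 and the normal model,
    F follows an F distribution with (k - 1, n - k) degrees of freedom.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    group_means = [mean(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

if __name__ == "__main__":
    # Hypothetical toy values for virgin, pregnant and lactating groups
    print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

For these toy groups, $SS_{between} = 54$ and $SS_{within} = 6$, so $F = (54/2)/(6/6) = 27$ on 2 and 6 degrees of freedom.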
