
Statistical Models for sequencing data: from Experimental Design to Generalized Linear Models



  1. Best practices in the analysis of RNA-Seq data, 28th-29th March 2018, University of Cambridge, Cambridge, UK. Statistical Models for sequencing data: from Experimental Design to Generalized Linear Models. Oscar M. Rueda, Breast Cancer Functional Genomics Group, CRUK Cambridge Research Institute (a.k.a. Li Ka Shing Centre). Oscar.Rueda@cruk.cam.ac.uk

  2. Outline • Experimental Design • Design and Contrast matrices • Generalized linear models • Models for counting data

  3. To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of. Sir Ronald Fisher (1890-1962) [evolutionary biologist, geneticist and statistician]

  4. An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem. John Tukey (1915-2000) [Statistician]

  5. An unsophisticated forecaster uses statistics as a drunken man uses lamp-posts - for support rather than for illumination. Andrew Lang (1844-1912) [Poet, novelist and literary critic]

  6. Experimental Design

  7. Design of an experiment • Select the biological questions of interest • Identify an appropriate measure to answer each question • Select additional variables or factors that can influence the result of the experiment • Select a sample size and the sample units • Assign samples to lanes/flow cells.

  8. Principles of Statistical Design of Experiments • R. A. Fisher: – Replication – Blocking – Randomization. • They have been used in microarray studies from the beginning. • Barcoding makes it easy to adapt them to NGS studies.

  9. Unreplicated Data. Inferences at the RNA and fragment level can be obtained through Fisher’s test, but they don’t reflect biological variability. Auer and Doerge. Genetics 185:405-416 (2010)
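
As a rough illustration of the test mentioned on this slide, the sketch below computes a two-sided Fisher's exact test p-value for a 2x2 table of read counts (one gene versus all other genes, in two treatments). It assumes the exact (hypergeometric) form of Fisher's test; the helper name `fisher_exact_two_sided` and the counts are made up for illustration and do not come from the cited paper.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Hypothetical reading of the table for unreplicated RNA-Seq data:
    rows are treatments, columns are (reads for the gene, reads elsewhere).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Add up every table that is as likely as, or less likely than, the observed one.
    # The small tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Made-up counts, small enough to check by hand.
p = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p, 6))
```

A small p-value here only says the two libraries differ; with one library per treatment there is no estimate of biological variability, which is the limitation the slide points out.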

  10. Replicated Data. Inferences for the treatment effect using generalized linear models. Is this a good design? (More on this later.) We should randomize within block! Auer and Doerge. Genetics 185:405-416 (2010)
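
Randomizing within blocks can be sketched as follows. This is a minimal example under assumptions: the blocks are flow cells, the units are lanes, and every block holds the same number of units per treatment. The names `randomize_within_blocks`, `flow_cell_1` and so on are hypothetical.

```python
import random
from collections import Counter


def randomize_within_blocks(blocks, treatments, seed=None):
    """Assign treatments to the units of each block in a random order.

    blocks: dict mapping a block name to its list of units.
    Every treatment appears equally often inside every block.
    """
    rng = random.Random(seed)
    layout = {}
    for block, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError(f"{block}: units cannot be split evenly across treatments")
        replicates = len(units) // len(treatments)
        assigned = list(treatments) * replicates
        rng.shuffle(assigned)  # the randomization step, done separately per block
        layout[block] = dict(zip(units, assigned))
    return layout


# Hypothetical blocks: two flow cells with four lanes each.
blocks = {
    "flow_cell_1": ["lane_1", "lane_2", "lane_3", "lane_4"],
    "flow_cell_2": ["lane_1", "lane_2", "lane_3", "lane_4"],
}
layout = randomize_within_blocks(blocks, ["Treatment A", "Control"], seed=2018)
for block, assignment in layout.items():
    print(block, assignment, Counter(assignment.values()))
```

Fixing the seed only makes the layout reproducible; the balance within each block holds for any seed.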

  11. Balanced Block Designs • Avoid confounding effects: – Lane effects (any errors from the point where the sample is input to the flow cell until the data output). Examples: systematically bad sequencing cycles, errors in base calling… – Batch effects (any errors after random fragmentation of the RNA until it is input to the flow cell). Examples: PCR amplification, reverse transcription artifacts… – Other effects unrelated to treatment. Auer and Doerge. Genetics 185:405-416 (2010)

  12. Balanced blocks by multiplexing. Auer and Doerge. Genetics 185:405-416 (2010)
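
One way to picture a multiplexed balanced block layout is sketched below. It assumes that every sample gets its own barcode and that an equal portion of every barcoded sample is loaded on every lane, so treatment and lane are not confounded. The barcode labels, lane names and the function `multiplexed_balanced_layout` are hypothetical.

```python
from collections import Counter


def multiplexed_balanced_layout(sample_treatments, lanes):
    """Return one record per (lane, sample): each lane receives every barcoded sample."""
    barcodes = {
        sample: f"BC{index + 1:02d}"
        for index, sample in enumerate(sample_treatments)
    }
    return [
        {"lane": lane, "sample": sample, "barcode": barcodes[sample], "treatment": treatment}
        for lane in lanes
        for sample, treatment in sample_treatments.items()
    ]


# Hypothetical experiment: three treated and three control samples, three lanes.
samples = {
    "Sample 1": "Treatment A", "Sample 2": "Control",
    "Sample 3": "Treatment A", "Sample 4": "Control",
    "Sample 5": "Treatment A", "Sample 6": "Control",
}
records = multiplexed_balanced_layout(samples, ["lane_1", "lane_2", "lane_3"])

# Balance check: each lane carries the same number of libraries per treatment.
per_lane = Counter((r["lane"], r["treatment"]) for r in records)
print(per_lane)
```

Because each lane contains every treatment equally often, a lane effect shifts all treatments together and can be separated from the treatment effect in the model.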

  13. Benefits of a proper design • NGS benefits from design principles • Technical replicates cannot replace biological replicates • It is possible to avoid multiplexing with enough biological replicates and sequencing lanes • The advantages of multiplexing outweigh the disadvantages (cost, loss of sequencing depth, bar-code bias…)

  14. Design and contrast matrices

  15. Statistical models – We want to model the expected result of an outcome (dependent variable) under given values of other variables (independent variables):
      $$E(Y) = f(X)$$
      where $E(Y)$ is the expected value of variable $Y$, $f$ is an arbitrary function (any shape) and $X$ is a set of $k$ independent variables (also called factors).
      $$Y = f(X) + \varepsilon$$
      where $\varepsilon$ is the variability around the expected mean of $Y$.

  16. Design matrix – Represents the independent variables that have an influence on the response variable, but also the way we have coded the information and the design of the experiment. – For now, let’s restrict ourselves to models of the form $Y = X\beta + \varepsilon$, where $Y$ is the response variable, $X$ is the design matrix, $\beta$ is the parameter vector and $\varepsilon$ is the stochastic error.

  17. Types of designs considered • Models with 1 factor – Models with two treatments – Models with several treatments • Models with 2 factors – Interactions • Paired designs • Models with categorical and continuous factors • Time-course experiments • Multifactorial models.

  18. Strategy • Define our set of samples • Define the factors, type of factors (continuous, categorical), number of levels… • Define the set of parameters: the effects we want to estimate • Build the design matrix, which relates each sample to the parameters it carries information about. • Estimate the parameters of the model: testing • Further estimation (and testing): contrast matrices.
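
The "build the design matrix" step can be sketched for a single categorical factor as follows. This is an illustrative helper, not the author's code: `design_matrix` is a hypothetical function, and the two codings it produces (one indicator column per level, or an intercept plus indicators for the non-baseline levels) are the usual ones for this kind of model.

```python
def design_matrix(labels, levels, baseline=None):
    """Build a design matrix (list of rows) for one categorical factor.

    labels:   the level observed for each sample, in sample order.
    levels:   all levels of the factor, in the desired column order.
    baseline: if None, use one indicator column per level (group means);
              otherwise use an intercept column plus one indicator column
              for every level other than the baseline.
    """
    unknown = set(labels) - set(levels)
    if unknown:
        raise ValueError(f"labels not listed in levels: {sorted(unknown)}")
    if baseline is None:
        columns = list(levels)
        rows = [[1 if label == level else 0 for level in levels] for label in labels]
    else:
        others = [level for level in levels if level != baseline]
        columns = ["Intercept"] + others
        rows = [[1] + [1 if label == level else 0 for level in others] for label in labels]
    return columns, rows


# Hypothetical factor with two levels, four samples.
labels = ["treated", "control", "treated", "control"]
print(design_matrix(labels, ["treated", "control"]))
print(design_matrix(labels, ["treated", "control"], baseline="control"))
```

Each row describes one sample and each column one parameter, so choosing the coding is the same as choosing which effects the coefficients will represent.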

  19. Models with 1 factor, 2 levels. Samples: Sample 1, Treatment A; Sample 2, Control; Sample 3, Treatment A; Sample 4, Control; Sample 5, Treatment A; Sample 6, Control. Number of samples: 6. Number of factors: 1 (Treatment). Number of levels: 2. Possible parameters (what differences are important?): - Effect of Treatment A - Effect of Control

  20. Design matrix for models with 1 factor, 2 levels. Samples 1, 3 and 5 receive Treatment A; Samples 2, 4 and 6 are Controls. The columns of the design matrix correspond to the parameters (coefficients, levels of the variable) Treat. A and Control:
      $$\begin{pmatrix} S_1 \\ S_2 \\ S_3 \\ S_4 \\ S_5 \\ S_6 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} T \\ C \end{pmatrix}$$
      C is the mean expression of the control and T is the mean expression of the treatment. Equivalent to a t-test.

  21. Design matrix for models with 1 factor, 2 levels. Samples 1, 3 and 5 receive Treatment A; Samples 2, 4 and 6 are Controls. Parameters (coefficients, levels of the variable): Treat. A and Control.
      $$\begin{pmatrix} S_1 \\ S_2 \\ S_3 \\ S_4 \\ S_5 \\ S_6 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} T \\ C \end{pmatrix}$$
      Equivalent to a t-test.
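
To see that the coefficients of this design matrix are the group means, the sketch below fits the model by ordinary least squares with NumPy. The expression values in `y` are invented for illustration, and plain least squares is an assumption made here for simplicity rather than the estimation method of any particular RNA-Seq package.

```python
import numpy as np

# Made-up expression values for one gene in Samples 1 to 6.
y = np.array([7.0, 5.0, 8.0, 6.0, 9.0, 4.0])

# Design matrix from the slide: columns are (Treat. A, Control),
# rows follow the sample order A, Control, A, Control, A, Control.
X = np.array(
    [[1, 0],
     [0, 1],
     [1, 0],
     [0, 1],
     [1, 0],
     [0, 1]],
    dtype=float,
)

# Least-squares estimate of the parameter vector (T, C).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
T_hat, C_hat = coef

print("T_hat:", T_hat, "mean of treated samples:", y[[0, 2, 4]].mean())
print("C_hat:", C_hat, "mean of control samples:", y[[1, 3, 5]].mean())
```

With one indicator column per level, each coefficient only "sees" the samples of its own group, which is why the estimates coincide with the two group means.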

  22. Intercepts. Different parameterization: using an intercept. With the same samples (Samples 1, 3 and 5: Treatment A; Samples 2, 4 and 6: Control), let’s now consider this parameterization: C = baseline expression; T_A = baseline expression + effect of treatment. So the set of parameters is: C = Control (mean expression of the control); a = T_A – Control (mean change in expression under treatment).

  23. Intercept. Different parameterization: using an intercept. The columns of the design matrix correspond to the parameters (coefficients, levels of the variable) Intercept and Treatment A:
      $$\begin{pmatrix} S_1 \\ S_2 \\ S_3 \\ S_4 \\ S_5 \\ S_6 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \beta_0 \\ a \end{pmatrix}$$
      The intercept $\beta_0$ measures the baseline expression. $a$ now measures the differential expression between Treatment A and Control.

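  24. Contrast matrices. Are the two parameterizations equivalent?
      $$\begin{pmatrix} 1 & -1 \end{pmatrix} \begin{pmatrix} \hat{T} \\ \hat{C} \end{pmatrix} = \hat{T} - \hat{C}$$
      Here $\begin{pmatrix} 1 & -1 \end{pmatrix}$ is the contrast matrix. Contrast matrices allow us to estimate (and test) linear combinations of our coefficients.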
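
The equivalence of the two parameterizations can be checked numerically. The sketch below fits both models by ordinary least squares, applies the contrast (1, -1) to the group-means coefficients and compares the result with the coefficient `a` of the intercept model. The data are invented, and the t statistic uses the standard linear-model formula with the pooled residual variance.

```python
import numpy as np

# Made-up expression values for Samples 1 to 6 (A, Control, A, Control, A, Control).
y = np.array([7.0, 5.0, 8.0, 6.0, 9.0, 4.0])
is_treated = np.array([1, 0, 1, 0, 1, 0], dtype=float)

X_means = np.column_stack([is_treated, 1 - is_treated])   # columns: T, C
X_int = np.column_stack([np.ones_like(y), is_treated])    # columns: beta_0, a

coef_means, *_ = np.linalg.lstsq(X_means, y, rcond=None)
coef_int, *_ = np.linalg.lstsq(X_int, y, rcond=None)

# Contrast on the group-means model: T_hat - C_hat.
contrast = np.array([1.0, -1.0])
difference = contrast @ coef_means

# Standard error of the contrast: sqrt(sigma^2 * c' (X'X)^-1 c).
residuals = y - X_means @ coef_means
dof = len(y) - X_means.shape[1]
sigma2 = (residuals @ residuals) / dof
variance = sigma2 * (contrast @ np.linalg.inv(X_means.T @ X_means) @ contrast)
t_stat = difference / np.sqrt(variance)

print("T_hat - C_hat:", difference)
print("a (intercept model):", coef_int[1])
print("t statistic:", t_stat)
```

Both models describe the same fitted values; the contrast simply re-expresses the group-means coefficients as the treatment-minus-control difference that the intercept model estimates directly.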

  25. Models with 1 factor, more than 2 levels (ANOVA models). Samples: Sample 1, Treatment A; Sample 2, Treatment B; Sample 3, Control; Sample 4, Treatment A; Sample 5, Treatment B; Sample 6, Control. Number of samples: 6. Number of factors: 1 (Treatment). Number of levels: 3. Possible parameters (what differences are important?): - Effect of Treatment A - Effect of Treatment B - Effect of Control - Differences between treatments?

  26. Design matrix for ANOVA models. Samples 1 and 4 receive Treatment A, Samples 2 and 5 receive Treatment B, and Samples 3 and 6 are Controls. With one parameter per level:
      $$\begin{pmatrix} S_1 \\ S_2 \\ S_3 \\ S_4 \\ S_5 \\ S_6 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} T_A \\ T_B \\ C \end{pmatrix}$$
      With an intercept:
      $$\begin{pmatrix} S_1 \\ S_2 \\ S_3 \\ S_4 \\ S_5 \\ S_6 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} \beta_0 \\ a \\ b \end{pmatrix}$$
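
The two ANOVA parameterizations can be compared with a short least-squares sketch. The expression values are invented, and the contrast vectors for "Treatment A versus Treatment B" are an illustration of how a difference between treatments could be estimated in each coding; they are not taken from the slides.

```python
import numpy as np

# Made-up expression values for Samples 1 to 6 (A, B, Control, A, B, Control).
y = np.array([8.0, 6.0, 5.0, 10.0, 7.0, 3.0])

# One parameter per level: columns (T_A, T_B, C).
X_means = np.array(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1],
     [1, 0, 0], [0, 1, 0], [0, 0, 1]],
    dtype=float,
)
# Intercept coding: columns (beta_0, a, b), with Control as the baseline.
X_int = np.array(
    [[1, 1, 0], [1, 0, 1], [1, 0, 0],
     [1, 1, 0], [1, 0, 1], [1, 0, 0]],
    dtype=float,
)

coef_means, *_ = np.linalg.lstsq(X_means, y, rcond=None)
coef_int, *_ = np.linalg.lstsq(X_int, y, rcond=None)

# Treatment A versus Treatment B, expressed in each parameterization.
a_vs_b_means = np.array([1.0, -1.0, 0.0]) @ coef_means   # T_A - T_B
a_vs_b_int = np.array([0.0, 1.0, -1.0]) @ coef_int       # a - b

print("group means (T_A, T_B, C):", coef_means)
print("intercept coding (beta_0, a, b):", coef_int)
print("A vs B:", a_vs_b_means, a_vs_b_int)
```

In the intercept coding, `a` and `b` are each differences from the control, so their difference cancels the baseline and recovers the same A-versus-B estimate as the group-means coding.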
