PH296, Section 36
February 25, 2002

Discussion of:
• K. Kerr, M. Martin, and G. Churchill. (2000). Analysis of variance for gene expression microarray data. Journal of Computational Biology 7 (6): 819-837.
• S. Dudoit, Y.H. Yang, M. Callow, and T. P. Speed. (2002). Statistical methods for identifying differentially expressed genes in replicated DNA microarray experiments. Statistica Sinica 12 (1).
• R. Wolfinger, G. Gibson, E. Wolfinger, L. Bennett, H. Hamadeh, P. Bushel, C. Afshari, and R. Paules. (2001). Assessing gene significance from cDNA microarray expression via mixed models. Journal of Computational Biology 8 (6): 625-637.

Issues
• Identification of differentially expressed genes.
• Magnitude of difference for the spotted genes given the sources of variation.
• What level of observation is statistically significant?
• Methods for analyzing data.
• Experimental design, number of replications.

Sources of variation
1. Interesting variation
   • variation in the expression profile for a given gene
   • variation in the expression profile among genes
   • variation in the expression profile due to different treatments
2. Obscuring variation due to
   • sample preparation
   • manufacture of the array
   • hybridization of the sample
   • optical measurements

ANOVA Model
Kerr and Churchill (2000)

$$\log(y_{ijkg}) = \mu + A_i + D_j + T_k + G_g + (AG)_{ig} + (TG)_{kg} + \varepsilon_{ijkg}$$

• $\mu$ - overall average signal (normalization term)
• $A$ - array (normalization term)
• $D$ - dye (normalization term)
• $T$ - treatment (normalization term)
• $G$ - overall gene effect
• $(AG)$ - a particular spot on the array
• $(TG)$ - gene expression attributable to treatments!!!
• $\varepsilon_{ijkg}$ - independent, identically distributed

ANOVA Model - Bootstrap
Kerr and Churchill (2000)

Estimated differences (Latin square design):

$$(\widehat{TG})_{1g_0} - (\widehat{TG})_{2g_0} = \frac{1}{2}\log\frac{y_{111g_0}\,y_{221g_0}}{y_{122g_0}\,y_{212g_0}} - \frac{1}{2N}\sum_{g}\log\frac{y_{111g}\,y_{221g}}{y_{122g}\,y_{212g}}$$

• variety (treatment) × gene interactions are averages of just two observations (no CLT)
• fitted residuals appear heavy-tailed
• Bootstrap: simulated data sets

$$\log(y_{ijkg})^{*} = \hat\mu + \hat A_i + \hat D_j + \hat T_k + \hat G_g + (\widehat{AG})_{ig} + (\widehat{TG})_{kg} + \varepsilon^{*}_{ijkg},$$

where $\varepsilon^{*}_{ijkg} \sim \sqrt{4N/(N-4)}\,\hat F$ (independently drawn) and $\hat F$ is the empirical distribution of the original residuals.

• percentile method to obtain 99% confidence intervals for the differences $(\widehat{TG})_{1g_0} - (\widehat{TG})_{2g_0}$. Width = 1.61, i.e. an estimated fold change of $e^{1.61/2} = 2.24$ is significant at the 0.01 level (normal confidence interval width = 1.29).

Checking assumptions:
• residuals are identically distributed,
• constant error variance,
• log scale seems appropriate.

• Multiple testing not taken into account.
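The residual bootstrap and percentile interval described above can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes a two-array, two-dye Latin square stored in a hypothetical array `t[i, j, g]` of log intensities (array i, dye j, gene g, with treatment 1 when i equals j), simulated data, and NumPy. The rescaling factor is the one quoted on the slide.

```python
import numpy as np

SIGN = np.array([1.0, -1.0])  # +1 for level 1, -1 for level 2

def fit(t):
    """Fitted values and estimated differences (TG)_1g - (TG)_2g for t[i, j, g]."""
    s_i = SIGN[:, None, None]
    s_j = SIGN[None, :, None]
    mean = t.mean(axis=(0, 1))               # mu + G_g
    arr = (s_i * t).mean(axis=(0, 1))        # A_i + (AG)_ig half-contrast
    dye = (s_j * t).mean(axis=(0, 1))        # per-gene dye half-contrast
    trt = (s_i * s_j * t).mean(axis=(0, 1))  # T_k + (TG)_kg half-contrast
    # Only the global dye effect is in the model, so the dye term is averaged over genes.
    fitted = mean + s_i * arr + s_j * dye.mean() + s_i * s_j * trt
    diff = 2.0 * (trt - trt.mean())
    return fitted, diff

def bootstrap_ci(t, n_boot=2000, level=0.99, seed=0):
    """Percentile intervals for the differences from a residual bootstrap."""
    rng = np.random.default_rng(seed)
    n = t.shape[2]
    fitted, diff = fit(t)
    # Rescaled residuals; the factor sqrt(4N / (N - 4)) is taken from the slide.
    resid = np.sqrt(4 * n / (n - 4)) * (t - fitted).ravel()
    boot = np.empty((n_boot, n))
    for b in range(n_boot):
        eps = rng.choice(resid, size=t.shape, replace=True)
        boot[b] = fit(fitted + eps)[1]
    alpha = 1.0 - level
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2], axis=0)
    return diff, lo, hi

# Simulated example: 50 genes, gene 0 is up-regulated under treatment 1.
rng = np.random.default_rng(1)
t = rng.normal(8.0, 0.2, size=(2, 2, 50))
t[0, 0, 0] += 2.0
t[1, 1, 0] += 2.0
diff, lo, hi = bootstrap_ci(t, n_boot=500)
```

A gene is flagged when its interval `[lo[g], hi[g]]` excludes zero. On the natural-log scale, a difference `d` corresponds to a fold change of `exp(d)`.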

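ANOVA Model - Least squares estimators
Kerr and Churchill (2000)

Objective: minimize the residual sum of squares, RSS. With $t_{ijkg} = \log(y_{ijkg})$,

$$\mathrm{RSS} = \sum_{ijkg}\bigl(t_{ijkg} - \mu - A_i - D_j - T_k - G_g - (AG)_{ig} - (TG)_{kg}\bigr)^2$$

Partial derivatives and the constraints lead to

$$(\widehat{TG})_{kg} = \bar t_{\cdot\cdot kg} - \bar t_{\cdot\cdot k\cdot} - \bar t_{\cdot\cdot\cdot g} + \bar t_{\cdot\cdot\cdot\cdot}$$

where a dot denotes an average over that index.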
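As a minimal sketch of this closed-form estimator, assume balanced data stored in a hypothetical array `t[k, g, r]` (treatment k, gene g, and r indexing the observations of that treatment-gene pair). The array layout, the simulated values, and the use of NumPy are assumptions for illustration.

```python
import numpy as np

def interaction_effects(t):
    """Return the K x G matrix of estimated (TG)_kg effects.

    t has shape (K, G, R): treatment, gene, replicate observation.
    """
    cell = t.mean(axis=2)                    # t-bar_{..kg}
    trt = cell.mean(axis=1, keepdims=True)   # t-bar_{..k.}
    gene = cell.mean(axis=0, keepdims=True)  # t-bar_{...g}
    grand = cell.mean()                      # t-bar_{....}
    return cell - trt - gene + grand

rng = np.random.default_rng(0)
t = rng.normal(loc=8.0, scale=1.0, size=(2, 5, 2))  # simulated log intensities
tg = interaction_effects(t)
```

The estimates sum to zero over treatments and over genes, which matches the usual sum-to-zero constraints on the interaction term.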

ANOVA Model - Comments
Kerr and Churchill (2000)
• Early analyses of microarray data used fold changes of the standardized log ratios of the fluorescence intensities to identify genes.
• “Global” normalization procedures may not be able to remove undesirable experimental effects.
• ANOVA: estimate sources of variation for large data sets.
• A, D, T terms normalize data without preliminary data manipulation.
• no computation of log ratios
• accounts for effects of dyes or variation between samples (experimental design).

• residual distribution nonnormal, but constant error variance: bootstrap approach.
• large number of similar quantities → estimates of highest and lowest effects too extreme.
• multiple testing not taken into account.

Multiple testing
• false positives: genes declared to be differentially expressed which in reality are not
• false negatives: genes truly differentially expressed but not declared as such

Normalization and multiple testing
Dudoit et al. (2002)

Matrix $X$ of log intensity ratios $\log_2 R/G$ with $k$ rows (genes) and $n = n_1 + n_2$ columns (control, treatment hybridizations).

1. Normalization: $\log_2 R/G \to \log_2 R/G - c_j(A)$, where $c_j(A)$ is the lowess fit to the M vs. A plot for the $j$th print-tip group.
2. Test statistic for gene $j$:
   $$t_j = \frac{\bar x_{2j} - \bar x_{1j}}{\sqrt{s_{1j}^2/n_1 + s_{2j}^2/n_2}}$$
3. Permutation test statistics $t_1^{(b)}, \ldots, t_k^{(b)}$
4. Adjusted p-values to account for multiple hypotheses testing (Westfall and Young)
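Steps 2 to 4 can be sketched as below. This is a sketch under stated assumptions rather than the procedure of Dudoit et al.: it uses the simpler single-step maxT adjustment in place of a step-down Westfall and Young procedure, random rather than exhaustive permutations, and simulated data. The function names are hypothetical.

```python
import numpy as np

def welch_t(x, labels):
    """Two-sample t-statistic per gene (row of x); labels are 0/1 per column."""
    x1, x2 = x[:, labels == 0], x[:, labels == 1]
    n1, n2 = x1.shape[1], x2.shape[1]
    num = x2.mean(axis=1) - x1.mean(axis=1)
    den = np.sqrt(x1.var(axis=1, ddof=1) / n1 + x2.var(axis=1, ddof=1) / n2)
    return num / den

def maxt_adjusted_pvalues(x, labels, n_perm=1000, seed=0):
    """Single-step maxT adjusted p-values from random label permutations."""
    rng = np.random.default_rng(seed)
    t_obs = np.abs(welch_t(x, labels))
    exceed = np.zeros_like(t_obs)
    for _ in range(n_perm):
        t_perm = np.abs(welch_t(x, rng.permutation(labels)))
        # Compare every observed statistic with the largest permuted one.
        exceed += t_perm.max() >= t_obs
    return t_obs, exceed / n_perm

# Simulated example: 100 genes, 8 control and 8 treatment hybridizations.
rng = np.random.default_rng(2)
labels = np.array([0] * 8 + [1] * 8)
x = rng.normal(0.0, 1.0, size=(100, 16))
x[0, labels == 1] += 5.0   # gene 0 is truly differentially expressed
t_obs, adj_p = maxt_adjusted_pvalues(x, labels)
```

Comparing each gene with the permutation distribution of the maximum statistic controls the chance of any false positive across all genes, at the cost of fewer discoveries than unadjusted p-values would give.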

Normalization - Comments
Dudoit et al. (2002)
• “Global” methods of normalization miss some experimental features
• multiple testing
• ANOVA model by Kerr et al.: one main effect for normalization, one error term for all genes
• strong model assumptions? (parametric models (gamma, Gaussian), functional relationships)
• which effects should be included?
• replication, experimental design questions

Effects
• fixed effects: attributable to a finite set of factor levels that occur in the data
• random effects: attributable to an (infinite) set of factor levels, of which a random sample occurs in the data

Mixed models: fixed effects and random effects
Benefits: recovery of interblock information

Mixed Models
Wolfinger et al. (2001)

$y_{gki}$ = $\log_2$ of the background-corrected measurement from gene $g$, treatment $k$, and array $i$.

1. Normalization model

$$y_{gki} = \mu + T_k + A_i + (TA)_{ki} + \varepsilon_{gki}$$

• $\mu$ - overall mean value
• $T$ - main effect for treatments
• $A$ - main effect for arrays
• $(TA)$ - interaction effect of arrays and treatments
• $\varepsilon$ - stochastic error

Random effects: $A_i$, $(TA)_{ki}$, $\varepsilon_{gki}$ are normally distributed random variables with zero means and variance components $\sigma^2_A$, $\sigma^2_{TA}$, $\sigma^2_\varepsilon$.
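The residuals passed on to the next stage can be illustrated with a simplified sketch. It treats the normalization effects as fixed, so the fitted value for each treatment-array combination is just the mean over genes for that cell; a mixed-model fit with random array effects would estimate these terms differently. The array layout `y[g, k, i]`, the simulated data, and the function name are assumptions.

```python
import numpy as np

def normalization_residuals(y):
    """Residuals r[g, k, i] after removing treatment-by-array cell means.

    y[g, k, i] is the log2 background-corrected measurement for gene g,
    treatment k, array i (a complete array with no missing cells is assumed).
    """
    cell_means = y.mean(axis=0, keepdims=True)  # stands in for mu + T_k + A_i + (TA)_ki
    return y - cell_means

rng = np.random.default_rng(0)
y = rng.normal(10.0, 1.0, size=(200, 2, 4))  # simulated: 200 genes, 2 treatments, 4 arrays
y[:, :, 1] += 0.7                            # array 1 is brighter overall
r = normalization_residuals(y)
```

After this step every treatment-array cell has mean zero across genes, so the overall brightness difference of array 1 no longer appears in `r`.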

2. Gene model

$$r_{gki} = G_g + (GT)_{gk} + (GA)_{gi} + \gamma_{gki}$$

• $r_{gki}$ - residuals of the normalization model
• $(GA)$ - spot effects

Random effects: $(GA)_{gi}$, $\gamma_{gki}$ are normally distributed random variables with zero means and variance components $\sigma^2_{(GA)_g}$, $\sigma^2_{\gamma_g}$, independent across their indices and with each other.

Restricted Maximum Likelihood (REML)

REML: maximize the part of the likelihood which is invariant to the location parameters of the model (i.e. to the fixed effects).

REML takes account of implicit degrees of freedom associated with the fixed effects (ML does not).

For balanced data: solutions to REML equations = ANOVA estimators
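The degrees-of-freedom point can be seen in the simplest possible case, a single normal sample whose only fixed effect is the mean. There the ML variance estimate divides the sum of squares by n, while the REML estimate divides by n - 1. The data values below are made up for illustration.

```python
import statistics

sample = [4.1, 5.3, 4.8, 6.0, 5.2]  # hypothetical measurements

# ML estimate: sum of squared deviations divided by n.
ml_variance = statistics.pvariance(sample)

# REML estimate: divided by n - 1, one degree of freedom spent on the mean.
reml_variance = statistics.variance(sample)

print(ml_variance, reml_variance)
```

The REML value is larger by the factor n / (n - 1), and it coincides with the usual unbiased sample variance, a small instance of REML agreeing with the ANOVA estimator for balanced data.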

Mixed Models - Comments
Wolfinger et al. (2001)
• replication within and between arrays necessary
• experimental design
• global distributional assumptions too strong
• effects to be included depend on the research question
• heterogeneity in the gene models
• false positive rates: cutoff at the Bonferroni value $0.05/(6917 \times 10) = 10^{-6.14}$ for an experimentwise false positive rate of 0.05.
• missing values, background correction, various designs
• correlation of the residuals: little difference in practice?
• normality on the log scale “usually reasonable.”
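The Bonferroni cutoff quoted above can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those on the slide.

```python
import math

alpha = 0.05          # experimentwise false positive rate
n_tests = 6917 * 10   # number of tests, as given on the slide

threshold = alpha / n_tests           # per-test p-value cutoff
neg_log10 = -math.log10(threshold)    # the same cutoff on the -log10(p) scale

print(f"{threshold:.3e}  {neg_log10:.2f}")
```

The cutoff is about 7.2e-07, that is 6.14 on the negative log10 scale.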

Power analysis
Wolfinger et al. (2001)

Power - probability of declaring statistical significance when a true difference exists.

power = 1 − P(false negative)

Inputs:
• experimental design
• model assumptions
• approximate values for the model parameters
• hypotheses to be tested
• desired false positive rate
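To make the ingredients concrete, here is a rough sketch of a power calculation for a single two-sided test. It uses a normal approximation, which is an assumption of this sketch and not the calculation in Wolfinger et al.; the effect size `delta` and standard error `se` are hypothetical values standing in for the design and model parameters.

```python
from statistics import NormalDist

def z_test_power(delta, se, alpha):
    """Approximate power of a two-sided z-test.

    delta: true difference on the log2 scale (hypothetical)
    se:    standard error of the estimated difference (depends on the design)
    alpha: per-test false positive rate
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)   # critical value for the chosen alpha
    shift = delta / se                # standardized true difference
    return z.cdf(shift - crit) + z.cdf(-shift - crit)

bonferroni_alpha = 0.05 / (6917 * 10)
power = z_test_power(delta=1.0, se=0.15, alpha=bonferroni_alpha)
print(f"power = {power:.3f}")
```

Lowering `se`, for example through more replication, or relaxing `alpha` raises the power. When `delta` is zero the function returns `alpha`, the false positive rate.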
