Comparison of Normalization Methods for cDNA Microarrays

Liling Warren, Ben Hui Liu
Bioinformatics Program, NCSU, Raleigh, NC
Bio-informatics Group, Inc., Cary, NC

Topics of Discussion
- Data flow in a microarray experiment
- Describe different normalization methods
- Evaluate different normalization methods
- To normalize or not to normalize
- Data quality
- Experimental design
- Conclusions

Data flow in a microarray experiment
Arrays, Samples -> Hybridization -> Scanned data -> Normalized results -> Analysis results

Purpose of Data Normalization
- To remove systematic errors introduced at various stages of a microarray experiment.
- Systematic effects include:
  - Array effect
  - Pin/block effect
  - Dye effect (Cy3/Cy5)
  - mRNA extraction effect
  - Dye labeling effect

Systematic Errors – Array Effect

[Figure: two box plots of log2(s) by array. One panel shows arrays 1, 2, 5, 6, 9, 10, 13, 14, 17, 18, 21, 22; the other shows arrays 3, 4, 7, 8, 11, 12, 15, 16, 19, 20, 23, 24.]

Analysis of Variance
Source     DF      Sum of Squares   Mean Square   F Ratio    Prob > F
array      11      2977.951         270.723       288.6899   0.0000
Error      32757   30718.314        0.938
C. Total   32768   33696.265

Analysis of Variance
Source     DF      Sum of Squares   Mean Square   F Ratio    Prob > F
array      11      2067.326         187.939       181.4893   0.0000
Error      32754   33917.943        1.036
C. Total   32765   35985.269

Box plots and ANOVA tests show that between-array variation is highly significant (P < 0.0001) using either log ratios or log signal intensity.

Systematic Errors – Block Effect

[Figure: two box plots by block (blocks 1 to 16). One panel shows log2(s); the other shows M.]

Analysis of Variance
Source     DF     Sum of Squares   Mean Square   F Ratio   Prob > F
block      15     98.3035          6.55357       6.1835    <.0001
Error      2715   2877.4692        1.05984
C. Total   2730   2975.7727

Analysis of Variance
Source     DF     Sum of Squares   Mean Square   F Ratio   Prob > F
block      15     77.2856          5.15237       2.8905    0.0002
Error      2715   4839.5028        1.78251
C. Total   2730   4916.7885

Box plots and ANOVA tests show that between-block variation is highly significant (P ≤ 0.0002 in both tables) using either log ratios or log signal intensity.

Systematic Errors – Dye Effect
- M vs. A plots for blocks 1, 2, 5, 6 in array 1 of Kidney data
- Log ratio: $M = \log_2(R) - \log_2(G)$
- Average log signal intensity: $A = \frac{1}{2}\left(\log_2(R) + \log_2(G)\right)$
- M vs. A plots reveal the dependency of log ratios on average signal intensity

[Figure: M vs. A plots. x-axis: average log signal intensity; y-axis: log ratio.]
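As a small illustration of the two definitions on this slide, the sketch below computes M and A for a few spots. The intensity values are made up for the example, and the function name `ma_values` is hypothetical.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot from its red and green channel intensities."""
    log_r = math.log2(red)
    log_g = math.log2(green)
    m = log_r - log_g          # log ratio
    a = 0.5 * (log_r + log_g)  # average log signal intensity
    return m, a


# Made-up intensities, for illustration only.
spots = [(1024.0, 256.0), (500.0, 500.0), (64.0, 256.0)]
for red, green in spots:
    m, a = ma_values(red, green)
    print(f"R={red:g} G={green:g} M={m:.2f} A={a:.2f}")
```

A spot with equal red and green intensity has M = 0 regardless of A, so any trend of M against A across many spots points to an intensity-dependent effect.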

Comparing Normalization Methods
- Method #1: log ratio based, local smoothing method using loess function

[Figure: two panels plotting Y against average log intensity (x-axes a1 and a7). Red: M values; green: predicted values; blue: residual values.]
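The idea behind Method #1 can be sketched as follows: fit M as a smooth function of A, then keep the residuals as the normalized log ratios. The code is a simplified stand-in (tricube-weighted local straight lines in plain Python), not the loess function used for the slides; the function names and the `span` default are assumptions made for this example.

```python
def tricube(u):
    """Tricube kernel: weight for a scaled distance u, zero at or beyond 1."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_smooth(a_values, m_values, span=0.5):
    """Predict M at each A from a weighted straight line through nearby points."""
    n = len(a_values)
    k = min(n, max(3, round(span * n)))  # number of neighbours per local fit
    fitted = []
    for a0 in a_values:
        nearest = sorted(zip(a_values, m_values), key=lambda p: abs(p[0] - a0))[:k]
        # Bandwidth slightly beyond the farthest neighbour so its weight stays positive.
        h = abs(nearest[-1][0] - a0) * 1.001 or 1.0
        sw = swx = swy = swxx = swxy = 0.0
        for a, m in nearest:
            w = tricube(abs(a - a0) / h)
            sw += w
            swx += w * a
            swy += w * m
            swxx += w * a * a
            swxy += w * a * m
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate case: fall back to the weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * a0)
    return fitted


def normalize_log_ratios(a_values, m_values, span=0.5):
    """Return the residuals M - predicted(A), used here as normalized log ratios."""
    fitted = local_linear_smooth(a_values, m_values, span)
    return [m - f for m, f in zip(m_values, fitted)]


# Made-up data with a purely intensity-dependent trend: M = 0.5 * A - 3.
a_demo = [6 + 0.5 * i for i in range(11)]
m_demo = [0.5 * a - 3 for a in a_demo]
print([round(r, 6) for r in normalize_log_ratios(a_demo, m_demo)])
```

In this toy case the trend is exactly linear, so the residuals are essentially zero; on real data the residuals would retain whatever variation the smooth curve does not explain.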

Comparing Normalization Methods
- Method #2: log ratio based, block-specific global normalization

$\tilde{y}_{ijk} = (y_{ijk} - \bar{y}_{ij}) / s_{ij}$

where $i = 1, \dots, 24$; $j = 1, \dots, 16$; $k = 1, \dots, n_{ij}$, and
$\bar{y}_{ij}$: block-specific mean
$s_{ij}$: block-specific standard deviation
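A minimal sketch of the block-specific standardization in Method #2 follows. It assumes each record is an (array, block, log ratio) triple and uses the sample standard deviation (n - 1 denominator); the slide does not say which denominator was used, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def block_standardize(records):
    """Standardize log ratios within each (array, block) group.

    records: list of (array, block, log_ratio) tuples.
    Returns the standardized values in the same order as the input.
    """
    groups = defaultdict(list)
    for array, block, y in records:
        groups[(array, block)].append(y)
    # Block-specific mean and standard deviation for every (array, block) pair.
    stats = {key: (mean(vals), stdev(vals)) for key, vals in groups.items()}
    return [(y - stats[(a, b)][0]) / stats[(a, b)][1] for a, b, y in records]


# Made-up log ratios for two blocks on one array.
demo = [(1, 1, 1.0), (1, 1, 2.0), (1, 1, 3.0), (1, 2, 10.0), (1, 2, 14.0)]
print(block_standardize(demo))
```

Each block needs at least two spots with non-identical values, otherwise the standard deviation is undefined or zero.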

Comparing Normalization Methods
- Method #3: log ratio based, ANOVA normalization

$y_{ijklm} = \mu + A_i + B_j + M_k + D_l + (AB)_{ij} + \varepsilon_{ijklm}$

- Random effects: A, B, AB
- Fixed effects: M, D
- Residuals are subsequently used as input for gene-based ANOVA model

Methods Omitting Normalization
- Method #4: gene-based ANOVA, omitting normalization, using log ratios

$y_{ijk} = \mu + m_i + d_j + (md)_{ij} + \varepsilon_{ijk}$

- Method #5: gene-based Analysis of Covariance, omitting normalization, using log signal intensity

$y_{ijk} = \mu + m_i + d_j + (md)_{ij} + \beta (x_{ijk} - \bar{x}_{\cdot\cdot\cdot}) + \varepsilon_{ijk}$

$y_{ijk}$: log signal intensity from test sample
$x_{ijk}$: log signal intensity from reference sample

“project normal” Data Analysis
- Gene based ANOVA model:

$y_{ijk} = \mu + m_i + d_j + (md)_{ij} + \varepsilon_{ijk}$, i = 1 to 6, j = 1, 2 and k = 1 to 4.

Source      df   MS       EMS
Mouse       5    MSM      $\sigma^2 + 4\tau_m^2$
Dye         1    MSD      $\sigma^2 + 20\tau_d^2$
Mouse*Dye   5    MS(MD)   $\sigma^2 + 4\tau_{md}^2$
Error       12   MSE      $\sigma^2$

The null hypothesis of no mouse effect is tested with $F_0 = MSM / MSE$, with df1 = 5 and df2 = 12.
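To make the test statistic concrete, the sketch below computes F0 = MSM / MSE for one gene from a balanced mouse-by-dye layout with replicates. The nested-list data layout, the function name, and the toy numbers are assumptions for illustration; no p-value is computed.

```python
from statistics import mean


def mouse_f_statistic(data):
    """Return (F0, df_mouse, df_error) for a balanced mouse x dye layout.

    data[i][j] is the list of replicate values for mouse i and dye j.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    grand = mean(y for mouse in data for cell in mouse for y in cell)
    # Sum of squares for the mouse main effect.
    ss_mouse = b * n * sum(
        (mean(y for cell in mouse for y in cell) - grand) ** 2 for mouse in data
    )
    # Error sum of squares: variation of replicates around their cell mean.
    ss_error = sum(
        (y - mean(cell)) ** 2 for mouse in data for cell in mouse for y in cell
    )
    df_mouse, df_error = a - 1, a * b * (n - 1)
    f0 = (ss_mouse / df_mouse) / (ss_error / df_error)
    return f0, df_mouse, df_error


# Toy example: 2 mice x 2 dyes x 2 replicates (not data from the study).
toy = [
    [[1.0, 3.0], [2.0, 4.0]],
    [[5.0, 7.0], [6.0, 8.0]],
]
print(mouse_f_statistic(toy))
```

The returned F0 would then be compared with an F distribution on (df_mouse, df_error) degrees of freedom.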

Comparison Results

           Method 1   Method 2   Method 3   Method 4   Method 5
Method 1   129        315        451        243        182
Method 2   89         275        522        362        318
Method 3   80         155        402        409        410
Method 4   51         78         158        165        174
Method 5   32         42         77         76         85

On the diagonal: number of genes detected by the specific method. Upper triangle: genes detected by either of the two corresponding methods. Lower triangle: genes detected by both methods (Kidney data).
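A table with this layout can be built from one set of detected gene IDs per method, as in the sketch below. The gene sets are toy values and the function name is hypothetical.

```python
def overlap_matrix(detected):
    """Build a comparison table from a list of sets of detected gene IDs.

    Diagonal: size of each set. Upper triangle: size of the union of the two
    sets. Lower triangle: size of their intersection.
    """
    n = len(detected)
    table = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                table[i][j] = len(detected[i])
            elif i < j:
                table[i][j] = len(detected[i] | detected[j])
            else:
                table[i][j] = len(detected[i] & detected[j])
    return table


# Toy gene sets for two methods.
print(overlap_matrix([{1, 2, 3}, {2, 3, 4, 5}]))
```

The three kinds of entries are linked by inclusion-exclusion. For Methods 1 and 2 in the table above, 129 + 275 - 89 = 315.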

Power Comparison
- Power rank, from highest to lowest power: Method 3, Method 2, Method 4, Method 1, Method 5
- Pair-wise power comparison: McNemar’s Test

McNemar’s Test

                Second Method
First Method    Reject     Accept     Total
Reject          $n_{11}$   $n_{12}$   $n_{1\cdot}$
Accept          $n_{21}$   $n_{22}$   $n_{2\cdot}$
Total           $n_{\cdot 1}$   $n_{\cdot 2}$   $N$

Under $H_0$: $\pi_{1+} = \pi_{+1}$
Test statistic: $\chi^2_1 = (n_{12} - n_{21})^2 / (n_{12} + n_{21})$
Reject $H_0$ if $\chi^2_1 > 3.84$ at $\alpha = 0.05$
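As a worked example of the test statistic, the sketch below derives the discordant counts for Methods 1 and 2 from the Kidney counts in the comparison table (129 and 275 genes detected, 89 by both). The function names are hypothetical.

```python
def discordant_counts(n_first, n_second, n_both):
    """Return (n12, n21): genes rejected by only the first or only the second method."""
    return n_first - n_both, n_second - n_both


def mcnemar_statistic(n12, n21):
    """McNemar chi-square statistic with 1 degree of freedom."""
    return (n12 - n21) ** 2 / (n12 + n21)


n12, n21 = discordant_counts(129, 275, 89)  # Method 1 vs. Method 2, Kidney data
stat = mcnemar_statistic(n12, n21)
print(n12, n21, round(stat, 1))  # 40 186 94.3
print("reject H0" if stat > 3.84 else "do not reject H0")
```

Only the discordant cells enter the statistic; genes on which both methods agree do not affect it.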

McNemar’s Test Results

           Method 1   Method 2   Method 3   Method 4
Method 2   94.3
Method 3   12         22.4
Method 4   6.8        42.6       223.8
Method 5   12.91      130.8      301.77     65.31

Pair-wise power comparisons show that all pairs of methods have significantly different power in detecting the mouse effect.

Why Do They Differ

[Diagram: Data -> Normalization, with two groups of genes marked:]
- Genes significant before, not significant after
- Genes not significant before, significant after

Data Quality Issues

Assessing Data Quality

[Diagram: mice M1 to M6; tissues Kidney, Liver, Testis; Reference Sample.]

Assessing Data Quality

[Diagram: mice M1 to M6, with replicates r1 to r4 for each tissue. Hybridizations: Kidney Test Sample + Reference Sample; Liver Test Sample + Reference Sample; Testis Test Sample + Reference Sample.]

Assessing Data Quality
On a gene-by-gene basis:
- $x_{ijk}$: test samples (72)
- $y_{ijk}$: reference samples (72)

$\sum_{i=1}^{3} \sum_{j=1}^{6} \sum_{k=1}^{4} x_{ijk} = \sum_{i=1}^{3} \sum_{j=1}^{6} \sum_{k=1}^{4} y_{ijk}$

Assessing Data Quality
- Let $r = \sum_{i=1}^{3} \sum_{j=1}^{6} \sum_{k=1}^{4} x_{ijk} \Big/ \sum_{i=1}^{3} \sum_{j=1}^{6} \sum_{k=1}^{4} y_{ijk}$
- Examine normalization effect within the set of genes where 1) r < 0.5 or 2) r > 2
- 388 genes in Kidney are significant by at least one method, among which 156 genes have r < 0.5 or r > 2.

[Figure: histogram of r for all genes.]
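The ratio check on this slide could be sketched as below for a single gene. The slide does not say whether x and y are raw or log-scale intensities, so the code simply sums whatever values it is given; the signal values and function names are made up for the example.

```python
def intensity_ratio(test_signals, ref_signals):
    """Ratio r of the summed test signals to the summed reference signals for one gene."""
    return sum(test_signals) / sum(ref_signals)


def is_extreme(r, low=0.5, high=2.0):
    """True when r falls outside the range [low, high]."""
    return r < low or r > high


# Made-up signals for two genes (a real gene would have 72 values per channel).
r_a = intensity_ratio([100.0, 200.0], [400.0, 400.0])
r_b = intensity_ratio([500.0, 500.0], [400.0, 600.0])
print(r_a, is_extreme(r_a))
print(r_b, is_extreme(r_b))
```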

Normalization Effect

[Figure: P-values before and after normalization method 1. x-axis: P-value before; y-axis: P-value after (method 1). Annotated regions: genes significant before, not significant after normalization; genes not significant before, significant after normalization.]

Assessing Data Quality
- When (foreground – background) < 0: no hybridization
- How about (foreground – background) > 0, but < 100?
- 388 genes in Kidney are significant by at least one method, among which 124 genes have (foreground – background) < 100.
- Effect of normalization is examined in these genes.
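One way to flag such spots is sketched below. The category labels and the function name are hypothetical; the 100 cutoff is the value discussed on the slide, and a net signal of exactly 0 is grouped with the low-signal spots as an assumption.

```python
def classify_spot(foreground, background, low_cutoff=100):
    """Label a spot by its background-subtracted signal."""
    net = foreground - background
    if net < 0:
        return "no hybridization"
    if net < low_cutoff:
        return "low signal"
    return "ok"


# Made-up foreground/background pairs.
for fg, bg in [(80, 120), (150, 90), (900, 100)]:
    print(fg, bg, classify_spot(fg, bg))
```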

Assessing Data Quality

Rep #   Signal(test)   Signal(ref)
1       115            152
2       1077           1476
3       58             60
4       409            425

Rep #   Signal(test)   Signal(ref)
1       965            3453
2       865            2243
3       94             1471
4       407            2194

- Low signal intensity
- Low mRNA copy number?
- Failed hybridization?
  - Due to spotting (if both numbers are small)
  - Due to labeling (one number is small)

Assessing Data Quality

Rep #   Signal(test)   Signal(ref)
1       610            151
2       575            145
3       10             6
4       365            364

Rep #   Signal(test)   Signal(ref)
1       525            426
2       84             77
3       596            753
4       572            753

- These genes are affected by normalization
- Affecting other genes in the same normalization group

Normalization Effect

[Figure: P-values before and after normalization method 1. x-axis: P-value before; y-axis: P-value after (method 1). Annotated regions: genes significant before, not significant after normalization; genes not significant before, significant after normalization.]

Examine Normalization Effect

[Figure: Log (P-value before / P-value after) plotted against STD within block / STD among mice. Labeled cDNAs: 388019, 580674, 463951, 514599, 400713. Annotated regions: genes more significant after normalization; genes significant before and after normalization; genes more significant before normalization.]

Examine Normalization Effect

cDNA     P-value (before)   P-value (after)   STD among mice   STD within block   STD within block / STD among mice
514599   <0.000001          0.03125           1.40             1.28               0.91
400713   <0.000001          0.06612           1.17             1.11               0.95
463951   0.7182             0.00005           0.07             1.28               18.29
580674   0.6563             0.000008          0.08             1.26               15.75
388019   0.5597             0.000001          0.08             1.26               15.75
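The last column of the table is the ratio of the two standard deviations, which the short sketch below reproduces from the tabulated values. The function name is hypothetical.

```python
def std_ratio(std_among_mice, std_within_block):
    """Ratio of within-block to among-mice standard deviation."""
    return std_within_block / std_among_mice


# (cDNA, STD among mice, STD within block), copied from the table.
rows = [
    ("514599", 1.40, 1.28),
    ("400713", 1.17, 1.11),
    ("463951", 0.07, 1.28),
    ("580674", 0.08, 1.26),
    ("388019", 0.08, 1.26),
]
for cdna, among, within in rows:
    print(cdna, round(std_ratio(among, within), 2))
```

The three cDNAs that become significant only after normalization are the ones whose within-block variation is many times larger than their among-mice variation.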

Examine Normalization Effect

[Diagram: two axes, Systematic Variation (small to large) and Treatment Variation (small to large), with regions labeled "Create false positives, false negatives" and "Remove systematic errors".]
