Micro-Array, Golub et al. Data: Resampling-Based Testing

S. Stanley Young (Glaxo Wellcome Inc.), Peter H. Westfall (Texas Tech University), Michael Emptage (Glaxo Wellcome Inc.), Dmitri Zaykin (Glaxo Wellcome Inc.)

Duke00

Outline
1. Micro array data
2. Problem statement
3. Statistical methods
4. Results
5. Questions

Why micro array data? Business and Science Drivers
• Drug target selection
• Bio network understanding
• Reduce drug development costs

Why micro array data? Knowledge and technology converge
• Human Genome Project(s)
• Bio chip technology
• Informatics

Three Statistical Analysis Problems
1. Correlated genes (guilt by association).
2. General genetic structure.
3. Biology / gene associations.

Goal: Understand gene-phenotype relationships
- Level gene correlations
- Level k gene associations
- Level one gene/bio associations
- Level k gene/bio associations
Method: Resampling-based testing!

What are the problems?
1. Few statistical experimental units
2. Very many genes
3. Non-normal distributions
4. Phenotype and data quality
5. Statistical methods

Data Formulation
Standard Formulation: Phenotype = f(Genotype)
[Diagram labels: Phenotype, Genes]

Problems with Standard Formulation
Standard Formulation: Phenotype = f(Genotype)
1. Gene expression measured with error.
2. Genotype relatively error free.
3. Enormous number of genes.

Solution: switch x and y
[Diagram labels: Genes, Trt]
Statistical Plan: permute Trt at random, and compute Max t over all genes.

Statistical Testing Strategy
1. Treat micro array data as Y vector.
2. Use t-test as score for each gene.
3. Use resampling to evaluate Max t.
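
The three-step strategy above can be sketched in plain Python. This is a minimal single-step max-t illustration under stated assumptions, not the step-down procedure requested by `stepperm` in the SAS code later in the deck. The names `expr`, `labels`, `gene_t_stats`, and `max_t_adjusted_p` are hypothetical, and the two class labels are assumed to be coded 0 and 1. Permuting the labels rather than the expression values keeps each sample's gene vector intact, which is how the correlation structure among genes is preserved.

```python
import math
import random
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    if sp2 == 0:
        # Degenerate gene: no within-group variation at all.
        return 0.0 if mean(x) == mean(y) else math.inf
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def gene_t_stats(expr, labels):
    """|t| per gene; expr[g][i] is gene g in sample i, labels[i] is 0 or 1."""
    stats = []
    for row in expr:
        a = [v for v, lab in zip(row, labels) if lab == 0]
        b = [v for v, lab in zip(row, labels) if lab == 1]
        stats.append(abs(pooled_t(a, b)))
    return stats


def max_t_adjusted_p(expr, labels, n_perm=1000, seed=0):
    """Single-step max-t adjusted p-values, estimated by permuting labels."""
    rng = random.Random(seed)
    observed = gene_t_stats(expr, labels)
    exceed = [0] * len(expr)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # reassign treatment labels at random
        max_t = max(gene_t_stats(expr, perm))  # Max t over all genes
        for g, t_obs in enumerate(observed):
            if max_t >= t_obs:
                exceed[g] += 1
    # Count the observed labelling as one of the permutations.
    return [(c + 1) / (n_perm + 1) for c in exceed]
```

Each adjusted p-value estimates the chance that the largest |t| across all genes under random labelling reaches that gene's observed |t|, so the multiplicity adjustment comes from the maximum itself.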

Characteristics of method?
1. Identifies individual genes.
2. Adjusts for multiple testing.
3. Preserves correlation structure.
4. Exact p-values, modulo simulation.

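Gene Scores

$$T = \frac{\bar{X}_{\mathrm{ALL}} - \bar{X}_{\mathrm{AML}}}{S_p \sqrt{1/11 + 1/27}}$$

$$S_{\mathrm{Golub}} = \frac{\bar{X}_{\mathrm{ALL}} - \bar{X}_{\mathrm{AML}}}{SD_{\mathrm{ALL}} + SD_{\mathrm{AML}}}$$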
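
The two gene scores on this slide can be computed side by side as a rough sketch. It assumes that S p denotes the pooled standard deviation and that SD denotes the sample standard deviation within each class. The function names `t_score` and `golub_score` are hypothetical, and the group sizes are taken from the data rather than fixed at the slide's 11 and 27.

```python
import math
from statistics import mean, stdev, variance


def t_score(all_vals, aml_vals):
    """Pooled-variance two-sample t statistic, ALL minus AML."""
    n1, n2 = len(all_vals), len(aml_vals)
    pooled_var = ((n1 - 1) * variance(all_vals)
                  + (n2 - 1) * variance(aml_vals)) / (n1 + n2 - 2)
    s_p = math.sqrt(pooled_var)
    return (mean(all_vals) - mean(aml_vals)) / (s_p * math.sqrt(1 / n1 + 1 / n2))


def golub_score(all_vals, aml_vals):
    """Mean difference divided by the sum of the two class standard deviations."""
    return (mean(all_vals) - mean(aml_vals)) / (stdev(all_vals) + stdev(aml_vals))


# Toy expression values for one gene (made up for illustration only).
all_vals = [1.0, 2.0, 3.0]
aml_vals = [4.0, 5.0, 6.0]
print(t_score(all_vals, aml_vals), golub_score(all_vals, aml_vals))
```

Both scores share the same numerator. They differ only in how the mean difference is scaled.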

SAS proc multtest code
    /* Step-down permutation (stepperm) and Holm adjustments, 10000 resamples */
    proc multtest data=gene.espress out=adjp stepperm holm n=10000 noprint;
      classes disease;
      test mean(gene1-gene7129);
      contrast "AML vs ALL" -1 1;
    run;

SAS code (2)
    /* Keep genes with permutation-adjusted p <= .05, ordered by raw p-value */
    proc sort data=adjp (where=(stppermp le .05));
      by raw_p;
    proc print data=adjp (where=(stppermp le .05)) noobs label;
      var _var_ raw_p stpbon_p stppermp;
    run;
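
The `stpbon_p` column printed above holds the Holm (step-down Bonferroni) adjusted p-values. As a hedged illustration of that adjustment only, and not of how SAS computes it internally, a minimal Python version might look like the following. The function name `holm_adjust` is hypothetical.

```python
def holm_adjust(raw_p):
    """Holm step-down Bonferroni adjusted p-values, in the input order."""
    m = len(raw_p)
    order = sorted(range(m), key=lambda i: raw_p[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * raw_p[i])
        # Enforce monotonicity: adjusted values never decrease down the list.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted
```

With 7129 genes, the smallest raw p-value of 1.38e-10 is multiplied by 7129, giving about 0.000001, which agrees with the first Holm entry on the Results slides.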

Results (1)
Gene       RawP       Holm       CMinP
GENE3320   1.38e-10   0.000001   0.0001
GENE4847   2.44e-10   0.000002   0.0001
GENE2020   6.58e-10   0.000005   0.0001
GENE1745   1e-8       0.000070   0.0004
GENE5039   1e-8       0.000072   0.0004
GENE1834   1.5e-8     0.000108   0.0005
GENE461    3.6e-8     0.000257   0.0005
GENE4196   6.2e-8     0.000438   0.0009
GENE3847   7.2e-8     0.000510   0.0010

Results (2)
Gene       RawP       Holm       CMinP
GENE2288   8.90e-8    0.000635   0.0011
GENE1249   1.74e-7    0.001239   0.0017
GENE6201   1.76e-7    0.001250   0.0017
GENE2242   1.95e-7    0.001386   0.0020
GENE3258   2.11e-7    0.001500   0.0021
GENE1882   3.19e-7    0.002267   0.0024
GENE2111   3.66e-7    0.002606   0.0027
GENE2121   5.78e-7    0.004115   0.0041
GENE6200   6.23e-7    0.004428   0.0042
GENE6373   8.19e-7    0.005823   0.0058

Results (3)
Gene       RawP       Holm       CMinP
GENE6677   0.000003   0.024412   0.0196
GENE4052   0.000004   0.026268   0.0220
GENE1394   0.000005   0.034948   0.0282
GENE6405   0.000005   0.037980   0.0300
GENE248    0.000006   0.045267   0.0346
GENE2267   0.000006   0.046019   0.0352
GENE6041   0.000008   0.055335   0.0421
GENE6005   0.000008   0.056861   0.0428
GENE5772   0.000009   0.063771   0.0471
GENE6378   0.000010   0.067993   0.0500

Scatterplot Matrix
[Figure: scatterplot matrix of GENE3320, GENE4847, GENE2020, and GENE1745]

Current Research
Try some sort of linear combination of genes: canonical correlation-like? PLS? RP?
Q: Which Ys differentiate cancer type?
Q: How many real cancer types?
Find a single gene, then the correlates to that gene. Then find a second, orthogonal gene that helps the prediction.
