
Statistical Genetics
Matthew Stephens
Statistics Retreat, October 26th 2012

Two stories
◮ The two most influential statistical ideas in analysis of genetic association studies.¹
◮ Sequence, sequence, everywhere.
¹ With apologies to Steve Stigler.

Story I: Genetic Association Studies
Genetic association studies aim to identify genetic variants that modify the risk of common diseases or affect other phenotypes (e.g. type 1 diabetes, height, LDL cholesterol). The idea is absurdly simple: measure genetic variants (usually SNPs) and phenotypes in randomly sampled individuals, and see which SNPs are correlated with the phenotypes.

◮ Typical recent genome-wide studies have genotyped 500K to 1M SNPs in thousands of (unrelated) phenotyped individuals.
◮ Basic analysis: test each SNP, one by one, for statistical association with each phenotype.
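The basic analysis can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the method of any particular study: it assumes a quantitative phenotype, genotypes coded as 0/1/2 allele counts, simple linear regression at each SNP, and a normal approximation for the p-values. The function name `single_snp_scan` and the simulated data are hypothetical. The threshold 5 × 10⁻⁸ is the conventional genome-wide significance level, which equals a Bonferroni correction of 0.05 for one million tests.

```python
import math
import numpy as np

def single_snp_scan(genotypes, phenotype):
    """Test each SNP, one by one, for linear association with a quantitative phenotype.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts
    phenotype: (n_individuals,) array of trait values
    Returns per-SNP effect estimates, standard errors and two-sided p-values.
    Assumes every SNP is polymorphic (a constant column would divide by zero).
    """
    G = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n = len(y)
    yc = y - y.mean()
    Gc = G - G.mean(axis=0)
    sxx = (Gc ** 2).sum(axis=0)                 # per-SNP centred sum of squares
    beta = Gc.T @ yc / sxx                      # least-squares slope for each SNP
    rss = (yc ** 2).sum() - beta ** 2 * sxx     # residual sum of squares per SNP
    se = np.sqrt(rss / (n - 2) / sxx)
    z = beta / se
    # Normal approximation to the t distribution; adequate with thousands of individuals
    pvals = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in z])
    return beta, se, pvals

# Hypothetical simulated data: 2,000 unrelated individuals, 500 independent SNPs,
# with only SNP 10 affecting the phenotype.
rng = np.random.default_rng(1)
n, p = 2000, 500
freqs = rng.uniform(0.05, 0.5, size=p)
freqs[10] = 0.3
G = rng.binomial(2, freqs, size=(n, p))
y = 0.5 * G[:, 10] + rng.normal(size=n)

GENOME_WIDE = 5e-8      # conventional genome-wide significance threshold
beta, se, pvals = single_snp_scan(G, y)
hits = np.flatnonzero(pvals < GENOME_WIDE)
print(hits)
```

With these simulation settings the causal SNP 10 is detected far beyond the genome-wide threshold; the loop-free matrix form keeps a scan over many SNPs fast.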

Progress identifying variants underlying common disease
[Figure: Published genome-wide associations through 09/2011: 1,617 published GWA at p ≤ 5 × 10⁻⁸ for 249 traits. Credit: NHGRI GWA Catalog, www.genome.gov/GWAStudies]

The two most influential statistical ideas in GWAS
◮ Correction for unmeasured confounding (population structure).
◮ Imputation to combine studies.

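Population Structure and Unmeasured Confounding
The problem in a nutshell: what would happen if you conducted a genetic association study for “chopstick use” in San Francisco? Any SNP whose allele frequency differs between people of East Asian and non-East Asian ancestry would appear associated with chopstick use, even though it has no effect on the trait: ancestry confounds the association.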

If you know the “genetic background” of the individuals in your study (e.g. which continent they inherited their genes from), then you can correct for it, for example by including it as a covariate in the association test. What if you don’t know it?
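A small simulation shows why the chopstick study goes wrong and how a known background fixes it. This is a hypothetical sketch: it assumes a single SNP whose allele frequency differs between two ancestral populations (0.2 vs 0.8), a trait that depends only on ancestry, and ancestry known exactly, so it can be added as a covariate. The helper `ols_coef_z` and all numbers are illustrative.

```python
import numpy as np

def ols_coef_z(y, X, col):
    """Least-squares fit of y on design matrix X; return (estimate, z-score) for column col."""
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return coef[col], coef[col] / np.sqrt(sigma2 * XtX_inv[col, col])

# Hypothetical "chopstick" simulation with two ancestral populations.
rng = np.random.default_rng(0)
n = 4000
pop = rng.integers(0, 2, size=n)                    # known ancestry label (0 or 1)
x = rng.binomial(2, np.where(pop == 1, 0.8, 0.2))   # frequency differs by ancestry; NO effect on y
y = 1.0 * pop + rng.normal(size=n)                  # trait driven by ancestry only

ones = np.ones(n)
_, z_naive = ols_coef_z(y, np.column_stack([ones, x]), col=1)
_, z_adjusted = ols_coef_z(y, np.column_stack([ones, pop, x]), col=2)
print(f"z ignoring ancestry: {z_naive:.1f}; z adjusting for ancestry: {z_adjusted:.1f}")
```

Ignoring ancestry, the null SNP shows a very large z-score; adding the ancestry label as a covariate brings it back into the range expected by chance.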

Principal Components Analysis to the rescue!
[Figure from Novembre et al., Nature, 2008]

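Test for significance of the genetic effect $\beta$, controlling for the effects of genetic background ($\alpha$):
$$y = v\alpha + x\beta + \varepsilon,$$
where $y$ is the phenotype, $x$ is the genotype at the SNP being tested, and $v$ holds the top principal components of the genotype matrix, which capture genetic background (Price et al., Nature Genetics, 2006).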
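When ancestry is not recorded, principal components of the genotype matrix can play the role of $v$ in the model $y = v\alpha + x\beta + \varepsilon$. The sketch below, a hypothetical simulation rather than the EIGENSTRAT software of Price et al., computes the top PCs with an SVD of the standardized background genotypes and includes them as covariates when testing $\beta$. The helpers `top_pcs` and `snp_test`, the number of PCs, and the simulation settings are assumptions for illustration.

```python
import numpy as np

def top_pcs(G, k):
    """Scores of the top-k principal components of an individuals x SNPs genotype matrix."""
    G = np.asarray(G, dtype=float)
    sd = G.std(axis=0)
    keep = sd > 0                                   # drop monomorphic SNPs
    Z = (G[:, keep] - G[:, keep].mean(axis=0)) / sd[keep]
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k]

def snp_test(y, x, v=None):
    """Least-squares fit of y = mu + v alpha + x beta + eps; returns (beta_hat, z-score)."""
    n = len(y)
    cols = [np.ones(n)] + ([v] if v is not None else []) + [x]
    X = np.column_stack(cols)
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    return coef[-1], coef[-1] / np.sqrt(sigma2 * XtX_inv[-1, -1])

# Hypothetical simulation: ancestry is NOT recorded, but 1,000 background SNPs
# have allele frequencies that differ between two ancestral populations.
rng = np.random.default_rng(3)
n, m = 800, 1000
pop = rng.integers(0, 2, size=n)                    # hidden ancestry label
f0, f1 = rng.uniform(0.1, 0.9, size=(2, m))         # per-population allele frequencies
background = rng.binomial(2, np.where(pop[:, None] == 1, f1, f0))
x = rng.binomial(2, np.where(pop == 1, 0.8, 0.2))   # test SNP with no causal effect
y = 1.0 * pop + rng.normal(size=n)                  # trait depends only on ancestry

v = top_pcs(background, k=2)
_, z_naive = snp_test(y, x)
_, z_pcs = snp_test(y, x, v)
print(f"z ignoring structure: {z_naive:.1f}; z with PCs as covariates: {z_pcs:.1f}")
```

The first PC separates the two hidden populations almost perfectly here, so adjusting for it removes nearly all of the spurious signal, without the ancestry label ever being used.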

Imputation to combine studies
Credit: Bryan Howie

Genotype imputation background
[Figure: reference haplotypes (alleles coded 0/1) and phenotyped GWAS samples (genotypes coded 0/1/2) at the SNPs genotyped on an array; a few genotypes are missing (?).]

[Figure, continued: the GWAS samples have no genotypes (?) at the untyped SNPs, which are present in the reference haplotypes.]

[Figure, continued: the untyped genotypes in the GWAS samples are filled in (imputed) using the reference haplotypes, revealing an association signal.]
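The core idea of imputation can be illustrated with a deliberately crude stand-in: for each GWAS sample, find the pair of reference haplotypes whose sum best matches the genotypes at the typed SNPs, then read the untyped genotypes off that pair. This is an assumption-laden toy, not how practical methods work; those use hidden Markov models that treat each sample's haplotypes as mosaics of reference haplotypes and return genotype probabilities rather than hard calls. The function `impute_best_pair` and the tiny panel are hypothetical.

```python
import itertools
import numpy as np

def impute_best_pair(ref_haps, geno, typed):
    """Toy imputation by the best-matching pair of reference haplotypes.

    ref_haps: (n_haps, n_snps) array of 0/1 alleles (the reference panel)
    geno:     (n_snps,) array of 0/1/2 genotypes; values at untyped SNPs are ignored
    typed:    (n_snps,) boolean mask, True where the SNP was genotyped on the array
    Returns a copy of geno with the untyped SNPs filled in.
    """
    ref_haps = np.asarray(ref_haps)
    geno = np.asarray(geno)
    typed = np.asarray(typed, dtype=bool)
    best_pair, best_mismatch = None, None
    # Try every unordered pair of haplotypes, allowing the same haplotype twice
    for i, j in itertools.combinations_with_replacement(range(len(ref_haps)), 2):
        diplotype = ref_haps[i] + ref_haps[j]
        mismatch = np.abs(diplotype[typed] - geno[typed]).sum()
        if best_mismatch is None or mismatch < best_mismatch:
            best_pair, best_mismatch = (i, j), mismatch
    filled = geno.copy()
    i, j = best_pair
    filled[~typed] = (ref_haps[i] + ref_haps[j])[~typed]
    return filled

# Hypothetical reference panel (4 haplotypes, 6 SNPs) and one GWAS sample
ref = np.array([
    [0, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 1],
])
typed = np.array([True, False, True, False, True, False])
sample = np.array([0, -1, 1, -1, 2, -1])    # -1 marks untyped SNPs ("?" on the slide)
filled = impute_best_pair(ref, sample, typed)
print(filled)
```

Here the typed genotypes match haplotypes 0 and 3 exactly, so the untyped SNPs are filled in to give `[0 1 1 1 2 1]`.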

Imputation facilitates meta-analysis
[Figure: reference haplotypes and two studies, GWAS 1 and GWAS 2, genotyped at different sets of SNPs.]

[Figure, continued: after imputation, both studies have genotypes at the same SNPs, and the association signal can be assessed across them.]

Examples: type 1 diabetes (Cooper et al., Nature Genetics, Nov 2008); type 2 diabetes (Zeggini et al., Nature Genetics, May 2008); Crohn’s disease (Barrett et al., Nature Genetics, Aug 2008).
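Once imputation puts both studies on the same SNPs, their per-SNP results can be combined. One standard choice, assumed here (the slides do not say which method the cited studies used), is an inverse-variance-weighted fixed-effect meta-analysis of the per-study effect estimates. The function name and the numbers in the example are hypothetical.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis of per-study estimates at one SNP."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal p-value
    return beta, se, z, p

# Hypothetical per-study estimates at one imputed SNP (GWAS 1, GWAS 2)
beta, se, z, p = fixed_effect_meta([0.20, 0.30], [0.10, 0.10])
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.2g}")
```

With two equally precise studies the combined estimate is their average, 0.25, and the standard error shrinks from 0.10 to about 0.071, which is the gain in power that motivates combining studies.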

Story II: Sequence, Sequence, Everywhere

Sequencing Assays and Statistical Challenges
Although DNA sequencing is best known for obtaining “genome sequences”, it is now routinely used to measure cellular processes, to try to understand how cells operate. For example:
◮ Gene expression (RNA-seq).
◮ Chromatin openness (DNase-seq).
◮ Transcription factor binding (ChIP-seq).
◮ Histone modifications (ChIP-seq).
A key question is how and why cells differ from one another (they share the same DNA!).

Chromatin and DNA structure
[Figure from Felsenfeld and Groudine, Nature, 2003]

The Data
The basic structure of these assays is the same:
◮ Do something clever to get the bits of DNA that you want (e.g. the bits that contact a modified histone, or the bits that are bound by a particular transcription factor).
◮ Sequence these bits (producing millions of short sequences, or reads).
◮ Work out where in the genome each sequence came from.
◮ The number of sequences coming from each location (usually 0 or 1) is a measure of the “intensity” of the process at that location.
◮ Basic model: an inhomogeneous Poisson process, $x_{ib} \sim \mathrm{Poi}(\lambda_{ib})$, where $x_{ib}$ is the count for sample $i$ at location $b$ and $\lambda_{ib}$ is the corresponding intensity.
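For a single sample $i$, writing $\lambda_b$ for $\lambda_{ib}$, the Poisson model can be sketched as follows. Under the extra assumption, made here only for illustration, that the intensity is constant within fixed windows, the maximum-likelihood estimate of each window's intensity is simply the mean count per base in that window. The window width, intensities, and function name are hypothetical.

```python
import numpy as np

def window_intensity(counts, width):
    """Per-window MLE of a piecewise-constant Poisson intensity (mean count per base).

    Assumes x_b ~ Poi(lambda_b) independently, with lambda_b constant inside each window
    of `width` bases; a trailing partial window is dropped.
    """
    counts = np.asarray(counts, dtype=float)
    n_win = len(counts) // width
    return counts[: n_win * width].reshape(n_win, width).mean(axis=1)

rng = np.random.default_rng(2)
lam = np.full(10_000, 0.02)     # hypothetical background intensity per base
lam[4_000:5_000] = 0.5          # hypothetical enriched region (e.g. a bound site)
x = rng.poisson(lam)            # simulated counts: mostly 0, occasionally 1 or more
est = window_intensity(x, width=500)
print(np.round(est, 3))
```

The two windows covering the enriched region stand out clearly; choosing the window width trades resolution against noise, which is one motivation for multiscale methods.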

Example: Histone Modification H3K4me1
Can you spot the difference?
[Figure: H3K4me1 signal in left ventricle (top panel) and right ventricle (bottom panel) over genomic positions 32,230,000 to 32,290,000. Data from Scott Smemo, Nobrega lab.]
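One simple way to make “spot the difference” quantitative for a single region is a standard conditional test for two Poisson counts, sketched here as an illustration rather than the analysis behind the figure. If the region's counts in the two samples are $x_A \sim \mathrm{Poi}(c_A\lambda_A)$ and $x_B \sim \mathrm{Poi}(c_B\lambda_B)$, with $c$ the sequencing depths, then under $\lambda_A = \lambda_B$ the count $x_A$ given $x_A + x_B = N$ is $\mathrm{Binomial}(N, c_A/(c_A + c_B))$. The region, depths, counts, and function names are hypothetical, and the direct pmf sum is only suitable for moderate counts.

```python
import math

def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value: total probability of outcomes no more likely than k."""
    pmf = [math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)        # small tolerance so ties are counted
    return min(1.0, sum(q for q in pmf if q <= cutoff))

def compare_region(count_a, count_b, depth_a, depth_b):
    """Test equal intensity in one region for two samples sequenced to different depths."""
    n = count_a + count_b
    p0 = depth_a / (depth_a + depth_b)  # expected share of reads from sample A under H0
    return binom_two_sided_p(count_a, n, p0)

# Hypothetical read counts in one region for left (A) and right (B) ventricle
print(compare_region(60, 20, depth_a=1e7, depth_b=1e7))
print(compare_region(60, 20, depth_a=3e7, depth_b=1e7))
```

Conditioning on the total removes the unknown common intensity, and the depth ratio accounts for one sample simply having been sequenced more: the same 60 vs 20 split is striking at equal depths but expected when sample A has three times the reads.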

Advertisement: STAT 45800
We have preliminary ideas and methods for dealing with these data, based on wavelets for count data (work with H. Shim). In STAT 45800 we will try “crowd-sourcing” these ideas, to see how much further progress we can make. Aim: to combine expertise in bioinformatics, computing, biology and statistics, and make more progress together than any of us could alone!
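The talk does not detail the wavelet methods, so the sketch below shows only a standard building block for multiscale analysis of counts, assumed here for illustration: a Haar-like decomposition that repeatedly sums neighboring counts. Under the Poisson model, each left-child count given its parent total is binomial, so the signal is re-expressed as an overall total plus binomial proportions at each scale, which a smoothing method can then shrink. The function names are hypothetical.

```python
import numpy as np

def haar_count_decomposition(counts):
    """Haar-like multiscale decomposition of a count vector whose length is a power of 2.

    Returns the overall total and, for each scale from coarsest to finest, a pair
    (left_counts, parent_totals). Under x_b ~ Poi(lambda_b), each left count given its
    parent total is Binomial(parent_total, lambda_left / (lambda_left + lambda_right)).
    """
    x = np.asarray(counts, dtype=int)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    scales = []
    level = x
    while len(level) > 1:
        left = level[0::2]
        parent = level[0::2] + level[1::2]    # sum neighboring pairs
        scales.append((left, parent))
        level = parent
    scales.reverse()                          # coarsest scale first
    return int(level[0]), scales

def reconstruct(total, scales):
    """Invert the decomposition by splitting each parent total into left and right children."""
    level = np.array([total])
    for left, parent in scales:
        if not np.array_equal(parent, level):
            raise ValueError("inconsistent decomposition")
        level = np.column_stack([left, parent - left]).ravel()
    return level

x = np.array([0, 1, 0, 0, 3, 2, 0, 1])       # hypothetical counts at 8 locations
total, scales = haar_count_decomposition(x)
for left, parent in scales:
    # Estimated left-child proportions; NaN where the parent total is zero
    with np.errstate(invalid="ignore", divide="ignore"):
        print(np.where(parent > 0, left / parent, np.nan))
```

The decomposition is exactly invertible, so any smoothing is done on the proportions and then mapped back to an intensity estimate.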

Acknowledgements
◮ Bryan Howie, Heejung Shim.
◮ Funding: NHGRI, NIH GTEx project, and NIH ENDGAME consortium.
