Lecture 3: Introduction to Association Analysis
02-715 Advanced Topics in Computational Genomics

Genome Polymorphisms

Types of Polymorphisms
• Each variant is called an “allele”
• Almost always bi-allelic
• Account for most of the genetic diversity among different (normal) individuals, e.g. drug response, disease susceptibility

A Human Genealogy
[Figure: a genealogy descending from the ancestral chromosome TCGAGGTATTAAC]

From SNPs …
TCGAGGTATTAAC
TCTAGGTATTAAC
TCGAGGCATTAAC
TCTAGGTGTTAAC
TCGAGGTATTAGC
TCTAGGTATCAAC
  *   ** * *
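The asterisks under the alignment flag the columns where the chromosomes disagree, i.e. the SNP positions. A minimal Python sketch that recovers them from aligned sequences; the helper names `snp_positions` and `marker_line` are illustrative, not taken from the lecture.

```python
def snp_positions(sequences):
    """Return the 1-based columns where aligned sequences do not all agree."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i in range(length)
            if len({s[i] for s in sequences}) > 1]


def marker_line(sequences):
    """Build a line with '*' under every polymorphic column."""
    polymorphic = set(snp_positions(sequences))
    return "".join("*" if i + 1 in polymorphic else " "
                   for i in range(len(sequences[0])))


chromosomes = [
    "TCGAGGTATTAAC",  # ancestral chromosome
    "TCTAGGTATTAAC",
    "TCGAGGCATTAAC",
    "TCTAGGTGTTAAC",
    "TCGAGGTATTAGC",
    "TCTAGGTATCAAC",
]
print(snp_positions(chromosomes))  # [3, 7, 8, 10, 12]
print(marker_line(chromosomes))
```

Each descendant chromosome differs from the ancestor at one or two of these five positions, which is what the asterisk line summarizes.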

… To Haplotypes
[Figure: haplotypes, one of which carries a disease mutation]

Population-Based Association Study
• Case/control data are collected from unrelated individuals
  – All individuals are related if we go back far enough in the ancestry
Balding, Nature Reviews Genetics, 2006

Advantages of SNPs in Genetic Analysis of Complex Traits
• Abundance: high frequency in the genome
• Position: throughout the genome
  – coding regions, introns, promoter sites
• Ease of genotyping
• Less mutable than other forms of polymorphisms
• SNPs account for around 90% of human genomic variation
• About 10 million SNPs exist in human populations
• Most SNPs are outside the protein-coding regions
• 1 SNP every 600 base pairs
• More than 5 million common SNPs, each with frequency 10-50%, account for the bulk of human DNA sequence differences
• It is estimated that ~60,000 SNPs occur within exons; 85% of exons are within 5 kb of the nearest SNP

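Causal Mutations and Genetic Markers
[Figure: a causal mutation and nearby SNP markers (X) connected through linkage disequilibrium]
• An associated SNP marker may only be in linkage disequilibrium with the causal mutation, so fine mapping is required to locate the causal variant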
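Linkage disequilibrium between two bi-allelic SNPs is commonly summarized by D = p_AB - p_A·p_B and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)). A minimal sketch, assuming phased haplotypes with alleles coded 0/1; `ld_stats` is a hypothetical helper name.

```python
def ld_stats(haplotypes):
    """Return (D, r^2) for two bi-allelic SNPs.

    haplotypes: one (allele_at_snp1, allele_at_snp2) pair per phased
    chromosome, alleles coded 0/1 (assumed input format).
    """
    n = len(haplotypes)
    p1 = sum(h[0] for h in haplotypes) / n                        # freq of allele 1 at SNP 1
    q1 = sum(h[1] for h in haplotypes) / n                        # freq of allele 1 at SNP 2
    p11 = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n  # freq of haplotype 1-1
    d = p11 - p1 * q1                                             # deviation from independence
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d, d * d / denom


# Perfect LD: the two SNPs always travel together.
print(ld_stats([(0, 0)] * 5 + [(1, 1)] * 5))
# No LD: all four haplotypes equally common.
print(ld_stats([(0, 0), (0, 1), (1, 0), (1, 1)]))
```

A marker with r² close to 1 with the causal variant carries almost the same association signal, which is why a significant marker alone does not pinpoint the causal mutation.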

Linkage Analysis vs. Association Analysis
[Figure: Strachan & Read, Human Molecular Genetics, 2001]

Overview
• Single SNP association test
  – Discrete-valued phenotype: case/control study
  – Continuous-valued phenotype: quantitative traits
• Correcting for multiple testing
• Leveraging linkage disequilibrium
  – Multimarker association test
  – Genotype imputation method

Single SNP Association Analysis: Case/Control Study
• For each marker locus, find the 3x2 contingency table containing the counts of the three genotypes

Genotype | Case | Control
AA | N_case,AA | N_control,AA
Aa | N_case,Aa | N_control,Aa
aa | N_case,aa | N_control,aa
Total | N_case | N_control

Genotype score = the number of minor alleles
• χ² test with 2 df, or Fisher's exact test, under the null hypothesis of no association
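A minimal sketch of the 2-df genotypic χ² test in plain Python. It assumes counts are given in the order (AA, Aa, aa) and that every row and column of the table has a non-zero total. The p-value uses the asymptotic χ² distribution, whose upper tail for 2 df is exactly exp(-x/2); Fisher's exact test, preferable for small counts, is not shown. Function names are illustrative.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("every row and column needs a non-zero total")
            stat += (observed - expected) ** 2 / expected
    return stat


def genotypic_test(case_counts, control_counts):
    """2-df test on the 3x2 genotype table; counts ordered (AA, Aa, aa)."""
    table = [[c, k] for c, k in zip(case_counts, control_counts)]
    stat = chi_square_statistic(table)
    # Upper tail of the chi-square distribution with 2 df: exp(-x / 2).
    return stat, math.exp(-stat / 2)


stat, p = genotypic_test((10, 40, 50), (40, 40, 20))
print(stat, p)
```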

Single SNP Association Analysis: Case/Control Study
• Alternatively, assume an additive model, where the heterozygote risk lies approximately midway between those of the two homozygotes
• Form a 2x2 contingency table of allele counts. Each individual contributes twice, once from each of its two chromosomes.

Allele | Case | Control
A | G_case,A | G_control,A
a | G_case,a | G_control,a
Total | 2 × N_case | 2 × N_control

• χ² test with 1 df
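The allelic test first converts genotype counts into allele counts (AA contributes two A alleles, Aa one of each, aa two a alleles) and then tests the resulting 2x2 table. A minimal stdlib-only sketch, assuming genotype counts ordered (AA, Aa, aa) and no continuity correction; for 1 df the χ² upper tail equals erfc(√(x/2)). The function name is illustrative.

```python
import math


def allelic_test(case_counts, control_counts):
    """1-df allelic chi-square test; genotype counts ordered (AA, Aa, aa)."""
    def allele_counts(hom_major, het, hom_minor):
        # Each individual contributes two alleles, one per chromosome.
        return 2 * hom_major + het, het + 2 * hom_minor   # (A count, a count)

    case_A, case_a = allele_counts(*case_counts)
    ctrl_A, ctrl_a = allele_counts(*control_counts)
    # 2x2 table: rows are alleles (A, a), columns are case / control.
    a, b, c, d = case_A, ctrl_A, case_a, ctrl_a
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("table has an empty row or column")
    stat = n * (a * d - b * c) ** 2 / denom
    # Upper tail of the chi-square distribution with 1 df.
    return stat, math.erfc(math.sqrt(stat / 2))


print(allelic_test((10, 40, 50), (40, 40, 20)))
```

Because each person is counted twice, this test is only valid when the two alleles of an individual are independent (Hardy-Weinberg equilibrium in the combined sample); that is an assumption of the sketch.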

Single SNP Association Analysis: Continuous-valued Traits
• Continuous-valued traits
  – Also called quantitative traits
  – Cholesterol level, blood pressure, etc.
• For each locus, fit a linear regression using the number of minor alleles at the given locus of the individual as covariate
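A minimal ordinary-least-squares sketch of the per-locus regression, trait = b0 + b1 × (minor-allele count). It returns the slope's t statistic; under the null hypothesis b1 = 0 this follows a t distribution with n - 2 df, whose p-value is left out here to keep the example stdlib-only. `snp_regression` is a hypothetical name, and the data in the example are made up.

```python
import math


def snp_regression(dosages, traits):
    """OLS fit of trait = b0 + b1 * dosage; returns (b0, b1, t statistic of b1)."""
    n = len(dosages)
    if n < 3:
        raise ValueError("need at least 3 individuals")
    mean_x = sum(dosages) / n
    mean_y = sum(traits) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, traits))
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    b1 = sxy / sxx                     # effect per additional minor allele
    b0 = mean_y - b1 * mean_x
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(dosages, traits))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    if se_b1 == 0:
        t = 0.0 if b1 == 0 else math.copysign(math.inf, b1)
    else:
        t = b1 / se_b1
    return b0, b1, t


# Minor-allele counts (0, 1, 2) and a hypothetical quantitative trait.
print(snp_regression([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]))
```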

Genetic Model for Association
• Additive effect
  – Major allele homozygote: 0
  – Heterozygote: a + a × k
  – Minor allele homozygote: 2a
• k = 1: dominant effect of the minor allele
• k = 0: no dominance
• k = -1: dominant effect of the major allele (the minor allele is recessive)
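The three genotypic values can be written as a small function of the additive effect a and the dominance parameter k. With k = 0 the value is linear in the minor-allele count, which is exactly what the single-SNP regression on allele counts assumes; with k = 1 or k = -1 the heterozygote matches one of the homozygotes. The function name is illustrative.

```python
def genotypic_value(minor_allele_count, a, k):
    """Expected trait shift under the additive effect a with dominance parameter k."""
    if minor_allele_count == 0:
        return 0.0            # major allele homozygote
    if minor_allele_count == 1:
        return a + a * k      # heterozygote: a(1 + k)
    if minor_allele_count == 2:
        return 2 * a          # minor allele homozygote
    raise ValueError("minor_allele_count must be 0, 1 or 2")


for k in (1, 0, -1):
    print(k, [genotypic_value(g, 1.0, k) for g in (0, 1, 2)])
```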

Penetrance
• Proportion of individuals carrying a particular allele who express the associated trait
• Alleles with high penetrance are easier to detect in association analysis
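As a conditional probability, penetrance is P(trait | carrier). A minimal sketch that estimates it as a simple proportion, assuming a population-based sample; in a case/control sample individuals are selected on phenotype, so this proportion would be biased. The function name and example data are hypothetical.

```python
def penetrance(carrier_status, affected):
    """Fraction of allele carriers who have the trait, i.e. P(affected | carrier)."""
    carrier_outcomes = [aff for carrier, aff in zip(carrier_status, affected) if carrier]
    if not carrier_outcomes:
        raise ValueError("no carriers in sample")
    return sum(carrier_outcomes) / len(carrier_outcomes)


# 1 = carries the allele / has the trait, 0 = does not.
print(penetrance([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 1, 0]))  # 0.75
```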

Correcting for Multiple Testing
• What happens when we scan 1 million markers across the genome for association with α = 0.05?
  – 50,000 (= 1 million × 0.05) SNPs are expected to be found significant just by chance
  – We need to be more conservative when we decide that a given marker is significantly associated with the trait.
• Correction methods
  – Bonferroni correction
  – Permutation test
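The 50,000 figure follows from the fact that, for a valid test with a continuous statistic, p-values under the null hypothesis are uniformly distributed on (0, 1). A minimal simulation sketch, assuming every marker is truly null; the function name is illustrative.

```python
import random


def count_false_positives(n_markers, alpha, seed=0):
    """Count null p-values below alpha; null p-values are drawn as Uniform(0, 1)."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_markers) if rng.random() < alpha)


hits = count_false_positives(1_000_000, 0.05)
print(hits, "false positives; expected about", int(1_000_000 * 0.05))
```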

Bonferroni Correction
• If N markers are tested, we correct the significance level as α' = α/N
  – Does not account for correlation between tests, even though markers are not independent because of linkage disequilibrium
  – Overly conservative for tightly linked markers
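A minimal sketch of the Bonferroni rule. With α = 0.05 and N = 1 million markers it gives α' = 5 × 10⁻⁸, the threshold conventionally used as genome-wide significance. Function names are illustrative.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level alpha' = alpha / N."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Indices of markers whose p-value passes the Bonferroni threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p <= threshold]


print(bonferroni_alpha(0.05, 1_000_000))
print(bonferroni_significant([1e-9, 0.01, 0.5, 2e-3]))  # threshold 0.0125
```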

Permutation Procedure
• Step 1: Compute the test statistic T using the original dataset
• Step 2: Set N_sig = 0
• Step 3: Repeat N_perm times:
  – Step 3a: Randomly permute the individuals in the phenotype data to generate datasets with no association (retain the original genotypes)
  – Step 3b: Find the test statistic T_perm of the SNPs using the permuted dataset
  – Step 3c: If T > T_perm, set N_sig = N_sig + 1
• Step 4: Compute the p-value as 1 - N_sig/N_perm
This approach is computationally demanding because a large N_perm is often required.
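The steps above can be sketched in Python as follows. Two choices here are assumptions not fixed by the slide: the per-SNP statistic is a simple stand-in (absolute difference in mean minor-allele count between cases and controls), and T_perm is taken as the maximum statistic over all SNPs in each permuted dataset, a common way to make the resulting p-values account for testing many markers. Function names are illustrative.

```python
import random


def snp_statistic(scores, phenotypes):
    """|mean minor-allele count in cases - mean in controls| (stand-in statistic)."""
    cases = [s for s, y in zip(scores, phenotypes) if y == 1]
    controls = [s for s, y in zip(scores, phenotypes) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_pvalues(genotypes, phenotypes, n_perm=1000, seed=0):
    """genotypes[j] holds the minor-allele counts of SNP j for every individual.

    Returns p_j = 1 - N_sig[j] / N_perm for each SNP, where T_perm is the
    maximum statistic over all SNPs in a permuted dataset (assumption).
    """
    rng = random.Random(seed)
    observed = [snp_statistic(g, phenotypes) for g in genotypes]   # Step 1
    n_sig = [0] * len(genotypes)                                     # Step 2
    shuffled = list(phenotypes)
    for _ in range(n_perm):                                          # Step 3
        rng.shuffle(shuffled)                # 3a: permute phenotypes, keep genotypes
        t_perm = max(snp_statistic(g, shuffled) for g in genotypes)  # 3b
        for j, t in enumerate(observed):                             # 3c
            if t > t_perm:
                n_sig[j] += 1
    return [1 - s / n_perm for s in n_sig]                           # Step 4


phenotypes = [1] * 20 + [0] * 20
snp_causal = [2] * 20 + [0] * 20           # perfectly separates cases and controls
snp_null = [i % 3 for i in range(40)]      # similar distribution in both groups
print(permutation_pvalues([snp_causal, snp_null], phenotypes))
```

The cost is N_perm × (number of SNPs) statistic evaluations, which is why the procedure becomes expensive genome-wide.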

Multi-marker Association Test
• Idea: a haplotype of multiple SNPs is a better proxy for a true causal SNP than a single SNP
  – Exploit the linkage disequilibrium structure in the genome
• Form a new allele by combining multiple SNPs into a haplotype

SNP A | SNP B | Auxiliary markers for haplotypes
0 | 0 | 1 0 0 0
0 | 1 | 0 1 0 0
1 | 0 | 0 0 1 0
1 | 1 | 0 0 0 1

• Test the haplotype allele for association
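The auxiliary markers are a one-hot encoding: one indicator per possible haplotype, set to 1 for the haplotype actually observed. A minimal sketch for K bi-allelic SNPs coded 0/1; the function name is illustrative. With K = 2 it reproduces the four rows of the table above.

```python
from itertools import product


def haplotype_indicators(haplotype, k):
    """One-hot auxiliary markers for a haplotype of k bi-allelic SNPs (alleles 0/1)."""
    # All 2**k haplotypes in lexicographic order: (0,...,0), (0,...,1), ..., (1,...,1).
    all_haplotypes = list(product((0, 1), repeat=k))
    return [1 if tuple(haplotype) == h else 0 for h in all_haplotypes]


for hap in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(hap, haplotype_indicators(hap, 2))
```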

Multi-marker Association Test
• A multi-marker approach can capture dependencies across multiple markers
  – SNPs in LD form a haplotype that can be tested as a single allele
  – Can achieve the same power with data collected from fewer samples
• Challenge as the size of the haplotype increases
  – A haplotype of K SNPs results in 2^K different haplotypes, but the number of samples corresponding to each haplotype decreases quickly as we increase K
  – Large K requires a large sample size
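A quick arithmetic sketch of why large K needs large samples: with 2^K possible haplotypes, the average number of chromosomes per haplotype class shrinks exponentially. The example assumes a hypothetical study of 1,000 individuals (2,000 chromosomes) and equally frequent haplotypes; real haplotype frequencies are uneven, so rare haplotypes fare even worse than this average.

```python
def mean_copies_per_haplotype(n_chromosomes, k):
    """Average chromosomes per possible haplotype when K SNPs give 2**K classes."""
    return n_chromosomes / 2 ** k


for k in (1, 5, 10, 15):
    print(f"K={k:2d}  haplotypes={2 ** k:6d}  "
          f"chromosomes per haplotype={mean_copies_per_haplotype(2000, k):.3f}")
```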

Imputation-Based Methods (Servin & Stephens, 2007)
[Figure: tag SNPs and non-tag SNPs]

Yeast Genomic Datasets
• Yeast genomic datasets
  – Genotypes from 112 segregants from a yeast cross between BY and RM strains
  – Microarray gene-expression data
  – Transcription factor binding site data
  – Protein-protein interaction data
