Structural Variations
02-715 Advanced Topics in Computational Genomics

Challenges
• Insert sizes of each mate-pair are unknown and can vary
• Only the approximate distribution of insert sizes is available

Indels (Insertions/Deletions)
• Small-size indels with size < 10 bp: readily identifiable with the Smith-Waterman algorithm
• Large-size indels with size > 50 bp: relatively easy to identify
• Medium-size indels?
  – Difficult to distinguish between true indels and insert-size variations
  – Needs methods for better resolution

Paired-End Smith-Waterman Alignment Algorithm
• Detects short indels
  – First, align the reads without gaps
  – For those read pairs where only one read is aligned and the other is not, apply gapped alignment for the unaligned read
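The gapped-alignment step can be illustrated with a minimal Smith-Waterman sketch. This is a plain local aligner with a linear gap penalty, not the paired-end variant itself: the function name `smith_waterman` and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration, and a real pipeline would restrict the search to the reference window implied by the aligned mate.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty.

    Returns (best_score, aligned_read, aligned_ref). Scoring values are
    illustrative assumptions, not taken from the slides.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # match or mismatch
                          H[i - 1][j] + gap,      # gap in the reference
                          H[i][j - 1] + gap)      # gap in the read
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_read, out_ref = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_read.append(read[i - 1])
            out_ref.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_read.append(read[i - 1])
            out_ref.append("-")
            i -= 1
        else:
            out_read.append("-")
            out_ref.append(ref[j - 1])
            j -= 1
    return best, "".join(reversed(out_read)), "".join(reversed(out_ref))


# Toy example: the read lacks the two bases "CC" present in the reference.
ref = "ACGTACGTCCTTGGCCAA"
read = "ACGTACGTTTGGCCAA"
score, aligned_read, aligned_ref = smith_waterman(read, ref)
print(score)
print(aligned_read)
print(aligned_ref)
```

The gap characters in the aligned read mark the candidate deletion; an ungapped aligner would have matched only one half of this read.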

Indels Detected by Paired-End Smith-Waterman Alignment Algorithm

Other Factors
• Accuracies can depend on
  – Coverage
  – Insert sizes
  – Read length
  – Read alignment accuracy

Depth of Coverage and Physical Coverage
• Single-end sequencing
• Paired-end sequencing
• Paired-end sequencing
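The two notions of coverage can be made concrete with a short calculation. The sketch below uses the common definitions (depth of coverage counts sequenced bases per reference position; physical coverage counts whole fragments, including the unsequenced middle, spanning a position). The function names and the library numbers are hypothetical, and "insert size" is taken here to mean the full fragment length.

```python
def depth_of_coverage(n_reads, read_len, genome_len):
    """Average number of sequenced bases covering each reference position."""
    return n_reads * read_len / genome_len


def physical_coverage(n_pairs, insert_size, genome_len):
    """Average number of fragments (mate-pair spans) covering each position."""
    return n_pairs * insert_size / genome_len


# Hypothetical paired-end library: 10 million pairs of 100 bp reads,
# 400 bp fragments, 100 Mbp genome.
n_pairs, read_len, insert_size, genome_len = 10_000_000, 100, 400, 100_000_000
depth = depth_of_coverage(2 * n_pairs, read_len, genome_len)
physical = physical_coverage(n_pairs, insert_size, genome_len)
print(depth, physical)
```

Under these assumed numbers the physical coverage is twice the depth of coverage, which is why paired-end data can support a structural variant with more spanning fragments than the read depth alone suggests.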

Statistical Methods for Detecting Structural Variants
• MODIL: small indels
• MOGUL: small indels, low coverage, many individuals
• BreakDancer
  – BreakDancerMax: detects different types of structural variations
  – BreakDancerMini: small indels
• All methods are based on mixture modeling

Mate-pair Clusters
• Many methods are applied to mate-pair clusters after mapping the mate-pairs to the reference genome
• Clustering of mate pairs

MODIL
• Mixture of distributions indel locator
• Model insert-size distributions at each locus i
  – Mixture component 1, P(Y): mapped distances for no indels
  – Mixture component 2, P(C_i): mapped distances for indels

MODIL
• Blue: insert-size distribution with no deletions; red: insert-size distribution with deletions
• Homozygous deletions, heterozygous deletions

MODIL Algorithm
• Map the mate-pairs to the reference genome
• Estimate P(Y) from mapped distances across the whole genome
• For each locus i, estimate P(C_i) from the mate-pairs that span the locus
  – Location-shifted distribution of P(Y)
  – Expected size of indels D_1, D_2 for each of two haplotypes
  – EM algorithm
• Expected indel size: [equation not preserved in the slide text]
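The per-locus estimation step can be sketched as EM over a two-component, location-shifted mixture. This is a simplified illustration rather than the MODIL implementation: it assumes P(Y) is Gaussian with known mean `mu` and standard deviation `sigma` (MODIL estimates P(Y) from the data), it assumes each mate-pair comes from either haplotype with probability 1/2, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_haplotypes(distances, mu, sigma, n_iter=50):
    """Estimate shifts (d1, d2) of the two haplotypes relative to P(Y).

    Assumes P(Y) ~ Normal(mu, sigma) and equal mixing weights (1/2 each).
    Needs at least two mapped distances.
    """
    xs = sorted(distances)
    half = len(xs) // 2
    # Initialise from the lower and upper halves of the sorted distances.
    d1 = sum(xs[:half]) / half - mu
    d2 = sum(xs[half:]) / (len(xs) - half) - mu

    for _ in range(n_iter):
        # E-step: probability that each mate-pair came from haplotype 1.
        resp = []
        for x in xs:
            p1 = normal_pdf(x, mu + d1, sigma)
            p2 = normal_pdf(x, mu + d2, sigma)
            resp.append(p1 / (p1 + p2) if p1 + p2 > 0 else 0.5)
        # M-step: each shift is the weighted mean distance minus mu.
        w1 = sum(resp)
        w2 = len(xs) - w1
        if w1 > 0:
            d1 = sum(r * x for r, x in zip(resp, xs)) / w1 - mu
        if w2 > 0:
            d2 = sum((1 - r) * x for r, x in zip(resp, xs)) / w2 - mu
    return d1, d2


# Toy heterozygous deletion of about 60 bp: half the mate-pairs map at the
# usual distance, half map about 60 bp further apart.
normal_pairs = [198, 203, 208, 213, 218] * 2
shifted_pairs = [258, 263, 268, 273, 278] * 2
d1, d2 = em_two_haplotypes(normal_pairs + shifted_pairs, mu=208, sigma=13)
print(round(d1, 2), round(d2, 2))
```

In this sketch a positive shift corresponds to a deletion (mates map further apart than the library insert size) and a negative shift to an insertion; two roughly equal shifts would suggest a homozygous event.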

Applying MODIL to Illumina Dataset
• 40-fold read coverage
• Observed insert size: mean 208, standard deviation 13
• To determine whether there is an insertion/deletion at each locus, find a cluster of mate pairs spanning that locus. Each cluster is required to have 20 mate pairs
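A back-of-the-envelope calculation shows why a cluster of 20 mate pairs helps with a standard deviation of 13. The sketch below is not part of MODIL; it assumes independent mate pairs and a homozygous indel that shifts every mapped distance by the indel size, and the function name is hypothetical.

```python
import math


def shift_z_score(indel_size, sigma, n_pairs):
    """Shift of the cluster mean in units of its standard error."""
    return indel_size / (sigma / math.sqrt(n_pairs))


# A 20 bp indel with the insert-size standard deviation quoted above (13).
z_single = shift_z_score(20, 13, 1)    # one mate pair
z_cluster = shift_z_score(20, 13, 20)  # cluster of 20 mate pairs
print(round(z_single, 2), round(z_cluster, 2))
```

A single mate pair shifted by 20 bp sits about 1.5 standard deviations from the mean and is hard to tell apart from ordinary insert-size variation, while the mean of 20 such pairs is shifted by almost 7 standard errors.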

MODIL: Performance
• Number of insertions/deletions discovered by MODIL

Mixture of Genotypes Variant Locator (MOGUL): Motivation
• Higher coverage leads to more accurate results
  – MODIL works for more than 20 mate-pairs covering each locus
  – 1000 Genomes Project: fewer than 4 mate-pairs covering each locus for each individual
• What if we have many individuals, each with lower coverage?
  – How can we combine the statistical strength across multiple individuals?
  – Can we estimate variant locations/sizes and allele frequencies?
  – MOGUL (Mixture of Genotypes Variant Locator) (Lee et al., 2010)
  – Allele frequency, coverage, and number of individuals can influence the performance

1000 Genomes Project (The 1000 Genomes Project Consortium, Nature 2010)
• The goal is to characterize over 95% of variants that are in genomic regions accessible to current high-throughput sequencing technologies and that have allele frequency of 1% or higher (the classical definition of polymorphism) in each of five major population groups (populations in or with ancestry from Europe, East Asia, South Asia, West Africa and the Americas)
• Pilot project:
  – 179 individuals from four populations (low coverage: 2-6x)
  – 6 individuals in two trios (deep sequencing: average 42x)
  – 697 individuals from seven populations (exon sequencing of 8,140 exons: average 50x)
• Main project: sequence 2500 genomes at 4x coverage

MoGUL
• A Bayesian approach for discovering indels from a large number of individuals sequenced at low coverage
  – Explicitly models each individual as homozygous or heterozygous at each locus
  – Computes expected minor allele frequency (MAF) at each locus
  – Allows identification of indels > 30 bases for MAF > 0.04

Mate-pair Clusters
• Clustering of mate pairs
• Blue/red for each of two individuals

Insert-size Distributions
• The insert-size distribution varies across libraries and individuals
• Insert sizes for each individual need to be modeled as separate random variables

MOGUL
• For a given locus
  – X_lm: insert size for individual l, mate pair m
  – D_lm: mapped distance for the l-th individual, m-th mate pair
  – μ_Yi: mean of the insert size in the case of no indels

MOGUL
• For a given locus
  – L: number of individuals
  – M_l: number of mate-pairs for individual l
  – Z_l: 0/1 for no indels/indels
  – X_lm: insert size for individual l, mate pair m
  – Q_lm: two copies of chromosomes

MoGUL
• Prior distributions
• Find a MAP (maximum a posteriori) estimate of the unknown parameters
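The idea of combining a prior with a handful of mate-pairs per individual can be illustrated with a much-simplified genotype posterior. This is not the MoGUL model: it assumes a known indel size and allele frequency, a Hardy-Weinberg genotype prior, Gaussian insert sizes, and that each mate-pair is sampled from either chromosome copy with equal probability. All names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def genotype_posterior(distances, mu, sigma, indel_size, allele_freq):
    """Posterior over 0, 1 or 2 indel-carrying copies for one individual.

    Requires 0 < allele_freq < 1. The prior is Hardy-Weinberg (an
    assumption of this sketch).
    """
    p = allele_freq
    prior = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}
    log_post = {}
    for g, pr in prior.items():
        frac = g / 2.0  # chance a mate-pair comes from an indel-carrying copy
        total = math.log(pr)
        for x in distances:
            lik = ((1 - frac) * normal_pdf(x, mu, sigma)
                   + frac * normal_pdf(x, mu + indel_size, sigma))
            total += math.log(max(lik, 1e-300))  # guard against underflow
        log_post[g] = total
    # Normalise in log space for numerical stability.
    m = max(log_post.values())
    weights = {g: math.exp(v - m) for g, v in log_post.items()}
    norm = sum(weights.values())
    return {g: w / norm for g, w in weights.items()}


# One low-coverage individual with four mate-pairs: two near the expected
# distance (208) and two about 60 bp longer.
post = genotype_posterior([208, 270, 265, 210], mu=208, sigma=13,
                          indel_size=60, allele_freq=0.1)
print(max(post, key=post.get))
```

With only four mate-pairs the heterozygous genotype already dominates in this toy case, because neither homozygous genotype explains both groups of distances. A MAP estimate across many individuals would additionally treat the indel size and allele frequency as unknowns.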

MOGUL: Simulation Study
• Heatmap for average error rates of 20 MOGUL simulations

BreakDancer
• BreakDancerMax
  – Detects deletions, insertions, inversions, intrachromosomal and interchromosomal translocations
• BreakDancerMini
  – Focuses on small indels (10-100 bp) that are often missed by BreakDancerMax

BreakDancer

BreakDancerMax
• Detects normal, deletion, insertion, inversion, intrachromosomal translocation and interchromosomal translocation
• Focuses on relatively large insertions/deletions
• Poisson mixture model with a mixture component for each type of structural variant

BreakDancerMax Algorithm
• Align mate-pairs to the reference genome
• Assign each mate-pair to one of the categories normal/deletion/insertion/inversion/translocation
• Select those regions spanned by two or more anomalous read pairs as candidate structural variants
• Confidence score based on the Poisson mixture model is assigned to each candidate structural variant
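The categorisation and scoring steps can be sketched as follows. This is a simplified illustration, not BreakDancerMax's actual rules or score: the classification uses only chromosome, strand, and mapped distance with a hypothetical threshold of 3 standard deviations, it omits intrachromosomal translocations, and the score is a single Poisson tail probability rather than the full mixture model. Function names and the background rate are assumptions.

```python
import math


def classify_pair(chrom1, strand1, chrom2, strand2, distance,
                  mean, sd, n_sd=3.0):
    """Assign a mate-pair to a category (simplified, hypothetical rules)."""
    if chrom1 != chrom2:
        return "interchromosomal translocation"
    if strand1 == strand2:
        return "inversion"
    if distance > mean + n_sd * sd:
        return "deletion"   # mates map further apart than expected
    if distance < mean - n_sd * sd:
        return "insertion"  # mates map closer together than expected
    return "normal"


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


# Library statistics from the Illumina example: mean 208, sd 13.
labels = [
    classify_pair("chr1", "+", "chr1", "-", 400, 208, 13),
    classify_pair("chr1", "+", "chr1", "-", 150, 208, 13),
    classify_pair("chr1", "+", "chr1", "-", 210, 208, 13),
    classify_pair("chr1", "+", "chr1", "+", 210, 208, 13),
    classify_pair("chr1", "+", "chr2", "-", 0, 208, 13),
]
print(labels)

# Chance of seeing two or more anomalous pairs in a region if only 0.1
# were expected by chance (0.1 is a made-up background rate).
p_value = poisson_tail(2, 0.1)
print(round(p_value, 6))
```

A small tail probability means the cluster of anomalous pairs is unlikely to arise from background mapping noise, which is the intuition behind scoring candidates that are supported by two or more anomalous read pairs.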
