viaDBG: Inference of viral quasispecies with a paired de Bruijn graph
Borja Freire (1), Susana Ladra (1), Jose Paramá (1), and Leena Salmela (2)
(1) Universidade da Coruña, (2) University of Helsinki
February 2020

Contents
1. Introduction/Motivation
2. Methods
3. viaDBG
4. Results
5. Conclusion

Introduction: Viral quasispecies problem motivation
Viral quasispecies are populations of closely related strains that emerge from RNA viruses with high mutation rates.
The higher the mutation rate, the larger the number of closely related strains.
Each mutation produces its own haplotype.
It is important to capture the whole set of strains because different strains might respond differently to the available drugs and treatments.

Introduction: Viral quasispecies problem
The viral quasispecies assembly problem asks to characterize the quasispecies present in a sample from high-throughput sequencing data.
Two base hypotheses relax the problem:
- All the genomes are fully covered in the sample.
- The coverage of the genomes is expected to be larger than in common assembly problems.
There are two major challenges:
- The presence of similar haplotypes in the data makes it difficult to assign the reads to different haplotype sequences.
- Viral samples are typically sequenced to a much deeper coverage than, e.g., samples for genomic or metagenomic sequencing.

Methods: Reference-based and de novo methods
Current methods for assembling viral quasispecies are either reference-based or de novo.
Reference-based methods:
- They use one or several known strains to guide the assembly.
- Some examples: HaploClique, ViQuaS or PredictHaplo.
- Their main problem is that the reference used might be obsolete due to the high mutation rate.
De novo methods:
- They are reference-free.
- Some examples: SAVAGE, PeHaplo or MLEHaplo.

Methods: Overlap and de Bruijn graphs
De Bruijn graphs:
- Faster.
- Less accurate.
- SOAPdenovo2, SGA and metaSPAdes (designed for metagenomes but also useful on viral quasispecies).
Overlap graphs:
- Slower.
- More accurate.
- SAVAGE, PeHaplo and HaploClique.
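To make the de Bruijn graph side of this comparison concrete, here is a minimal sketch of how such a graph can be built from reads. It assumes a node-centric graph in which every k-mer is a node and two k-mers are linked when they are consecutive in some read; the function name `build_dbg` and the toy reads are illustrative and are not taken from viaDBG.

```python
def build_dbg(reads, k):
    """Node-centric de Bruijn graph: {k-mer: set of k-mers that follow it in a read}."""
    graph = {}
    for read in reads:
        for i in range(len(read) - k):
            left = read[i:i + k]
            right = read[i + 1:i + k + 1]
            graph.setdefault(left, set()).add(right)
            graph.setdefault(right, set())  # keep nodes without successors
    return graph


# Two overlapping toy reads (hypothetical data).
dbg = build_dbg(["ACGTAC", "CGTACG"], k=3)
for kmer in sorted(dbg):
    print(kmer, "->", sorted(dbg[kmer]))
```

Building the graph takes a single pass over the reads, with no pairwise read comparison, which is one reason de Bruijn approaches tend to be faster than overlap graphs.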

viaDBG - Overview: Pipeline
Error Correction
- Obtain solid k-mers
- Apply LoRDEC
Haplotype Inference
- Obtain unitigs and representative k-mers
- Add paired-end information to k-mers
- Polish paired-end information
- Obtain the haplotypes
  - Build DBG
  - For each pair of adjacent nodes in DBG:
    - Build CPBG
    - Find cliques
    - For each clique, create new nodes in the modified DBG'
  - Obtain unitigs in DBG'

viaDBG - Error Correction: Obtain solid k-mers
What is a solid k-mer?
Solid k-mers commonly refer to k-mers that are likely to be part of the real genomic information.
There are several methods to obtain these k-mers, such as:
- Parametric statistical methods, based on a mixture of distributions such as Gaussian or Poisson.
- Non-parametric statistical methods, based on features provided by the sample, such as k-mer frequency, gradient information and so on.

viaDBG - Error Correction: viaDBG solid k-mers
viaDBG uses the k-mer frequency histogram of the sample (a non-parametric statistical method).
The idea behind the selection is to find a point t where the frequencies reach a stable state.
The stability is measured using a window. Surprisingly, several tests showed that the window size does not have a high impact on the final result.
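The slide does not spell out how stability is measured, so the following is only a rough sketch under an assumed criterion: t is the first frequency from which `window` consecutive histogram steps change by at most a relative `tolerance`. The names `kmer_counts` and `stable_threshold`, the tolerance value and the toy histogram are all hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def stable_threshold(histogram, window=3, tolerance=0.1):
    """First frequency t whose next `window` histogram steps are nearly flat.

    `histogram` maps a frequency to the number of distinct k-mers having it.
    The flatness criterion is an assumption made for illustration.
    """
    max_freq = max(histogram)
    for t in range(1, max_freq + 1):
        flat = True
        for f in range(t, t + window):
            a, b = histogram.get(f, 0), histogram.get(f + 1, 0)
            if abs(a - b) > tolerance * max(a, b, 1):
                flat = False
                break
        if flat:
            return t
    return max_freq


# Toy histogram: a steep error peak at low frequencies, then a plateau.
demo_hist = {1: 1000, 2: 400, 3: 100, 4: 20, 5: 19, 6: 20, 7: 21, 8: 20}
t = stable_threshold(demo_hist, window=3)

# On real data the histogram would come from the reads themselves.
counts = kmer_counts(["ACGT", "ACGA"], k=3)
hist = Counter(counts.values())
solid = {kmer for kmer, c in counts.items() if c >= 2}
```

With a criterion of this kind, k-mers whose frequency is at least t are kept as solid, and the rest are treated as likely sequencing errors.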

viaDBG - Error Correction: Apply LoRDEC
LoRDEC is a well-known hybrid read corrector for third-generation sequencing (TGS) reads.
Steps (simplified version):
- Classify the k-mers of the TGS reads as solid or non-solid based on their frequency.
- Build a de Bruijn graph from the short reads.
- Between solid k-mers separated by a non-solid gap, look for a path in the de Bruijn graph.
- Complete the reads using these paths.
- Repeat iteratively, selecting a larger k-mer size for each iteration.
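The gap-bridging step can be illustrated with a bounded breadth-first search between two solid k-mers. This is a simplified sketch of the idea only and is not LoRDEC's actual implementation; `find_path`, `spell` and the toy graph are hypothetical names and data.

```python
from collections import deque


def find_path(graph, source, target, max_len):
    """Shortest path source -> target in a DBG using at most max_len edges, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 >= max_len:
            continue  # path budget exhausted, do not extend further
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def spell(path):
    """Sequence spelled by a path of overlapping k-mers."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Toy graph built from trusted short reads (hypothetical data).
toy_graph = {"ACG": {"CGT"}, "CGT": {"GTA"}, "GTA": {"TAC"}, "TAC": set()}
bridge = find_path(toy_graph, "ACG", "TAC", max_len=5)
corrected = spell(bridge)
too_short = find_path(toy_graph, "ACG", "TAC", max_len=2)
```

Bounding the path length keeps the search cheap and avoids replacing a short erroneous region with an implausibly long detour.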

viaDBG - Haplotype inference: Obtain representative k-mers
What is a representative k-mer? In our case, it is the k-mer in the middle of a unitig.
The use of representative k-mers addresses two main concerns:
- Efficiency: by working only with representatives, we create a more succinct graph representation (the same idea that underlies the succinct de Bruijn graph).
- Effectiveness: by using representatives, we reduce the impact of ±∆, the variability of the paired-end distance.
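As a rough sketch of this step, the code below extracts unitigs (maximal non-branching paths) from a graph given as `{node: set(successors)}` and picks the middle node of each one as its representative. Single letters stand in for k-mers, isolated cycles are ignored for brevity, and the function names are illustrative rather than viaDBG's own.

```python
def unitigs(graph):
    """Maximal non-branching paths of a graph given as {node: set(successors)}."""
    preds = {}
    for node, succs in graph.items():
        preds.setdefault(node, set())
        for succ in succs:
            preds.setdefault(succ, set()).add(node)

    def mergeable(u, v):
        # u -> v stays inside one unitig if u has one way out and v one way in.
        return len(graph.get(u, ())) == 1 and len(preds[v]) == 1

    paths = []
    for node in sorted(preds):
        pred = next(iter(preds[node])) if len(preds[node]) == 1 else None
        if pred is not None and mergeable(pred, node):
            continue  # node lies inside a unitig that starts elsewhere
        path = [node]
        while True:
            succs = graph.get(path[-1], ())
            if len(succs) != 1:
                break
            nxt = next(iter(succs))
            if not mergeable(path[-1], nxt) or nxt in path:
                break
            path.append(nxt)
        paths.append(path)
    return paths


def representative(path):
    """Middle node of a unitig."""
    return path[len(path) // 2]


# Toy graph: A-B-C-D branches into G-H-I-J and M-N-O-P.
toy = {"A": {"B"}, "B": {"C"}, "C": {"D"}, "D": {"G", "M"},
       "G": {"H"}, "H": {"I"}, "I": {"J"},
       "M": {"N"}, "N": {"O"}, "O": {"P"}}
found = unitigs(toy)
reps = [representative(p) for p in found]
```

Working with one representative per unitig instead of every k-mer shrinks the set of k-mers that later steps need to pair and compare.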

viaDBG - Haplotype inference: Obtain representative k-mers
[Figure: three unitigs with their first and last k-mers marked: A B C D (First = A, Last = D), G H I J (First = G, Last = J) and M N O P (First = M, Last = P).]

viaDBG - Haplotype inference: Add paired-end information to k-mers
[Figure: two paired-end reads r_x and r_y. The k-mer A occupies positions j..j+k of the left read L(r_x) and the k-mer M occupies the same positions of the right read R(r_x). A also occupies positions u..u+k of L(r_y), with the k-mer L at the same positions of R(r_y). Hence P(A) = (M, L).]
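The association P(A) shown on this slide can be sketched as follows, assuming that a k-mer at offset j of the left read is paired with the k-mer at the same offset of the right read and that both reads are already in the same orientation. The function name `pair_kmers` and the toy reads are hypothetical.

```python
from collections import Counter, defaultdict


def pair_kmers(read_pairs, k):
    """freq[A][M] = number of times left k-mer A was paired with right k-mer M."""
    freq = defaultdict(Counter)
    for left, right in read_pairs:
        usable = min(len(left), len(right))
        for j in range(usable - k + 1):
            freq[left[j:j + k]][right[j:j + k]] += 1
    return freq


# Two toy read pairs sharing the left k-mer ACG (hypothetical data).
pairs = [("ACGT", "TTGA"), ("ACGA", "CCGA")]
freq = pair_kmers(pairs, k=3)
paired_with_acg = set(freq["ACG"])  # plays the role of P(ACG)
```

The counts kept in `freq` correspond to the f(A, M) values used when the paired-end information is polished.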

viaDBG - Haplotype inference: Polish paired-end information
The polishing method removes outliers with large variance in the insert size.
Challenge: remove outliers without removing low-abundance strains.
The idea behind the polishing can be summarised as:
$$f'(A, M) = \min\bigl\{\, f(A, M) + \lvert \{ S \mid f(A, S) \ge 1 \text{ and } d(M, S) < \text{max-path-len} \} \rvert,\ \text{max-threshold} \,\bigr\}$$
where f(A, M) is the number of times A and M have been associated as left and right k-mers, and d(M, S) is the distance between M and S.
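A literal reading of the polishing formula can be sketched as below. The distance function is passed in as a parameter because the slide does not say how d(M, S) is computed; the toy positions, the parameter values and the function name `polished_frequency` are assumptions made for illustration.

```python
def polished_frequency(freq_a, m, distance, max_path_len, max_threshold):
    """f'(A, M) for one left k-mer A, given freq_a[S] = f(A, S)."""
    # Number of k-mers S paired with A that lie close to M (M itself included).
    support = sum(1 for s, count in freq_a.items()
                  if count >= 1 and distance(m, s) < max_path_len)
    return min(freq_a.get(m, 0) + support, max_threshold)


# Toy data: M, S1 and S2 lie close together, X is far away (hypothetical).
freq_a = {"M": 3, "S1": 1, "S2": 2, "X": 1}
position = {"M": 100, "S1": 102, "S2": 105, "X": 900}


def toy_distance(u, v):
    return abs(position[u] - position[v])


score_m = polished_frequency(freq_a, "M", toy_distance, max_path_len=10, max_threshold=5)
score_x = polished_frequency(freq_a, "X", toy_distance, max_path_len=10, max_threshold=5)
```

In this toy case the isolated pairing X keeps a low score while M is boosted by its neighbours, which illustrates how a low-frequency but well-supported pairing can survive while an isolated outlier does not.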

viaDBG - Obtain the haplotypes: Cliques Paired de Bruijn Graph
For each pair of adjacent nodes of the DBG, viaDBG builds one Cliques Paired de Bruijn Graph, henceforth CPBG.
What is a CPBG?
The nodes of the CPBG are the paired k-mers of the two considered nodes, and edges connect paired k-mers if they are connected in the DBG by a short path.
Furthermore, each node is labelled with the number of times its k-mer has been associated with the left k-mer.

viaDBG - Obtain the haplotypes: Cliques Paired de Bruijn Graph
The next step is to find the maximal cliques in the CPBG.
Conceptually, cliques in the graph are sets of k-mers that belong to the same haplotype sequence.
The obtained cliques must be polished because some of them come from erroneous k-mers, wrong relations (from regions shared between strains) and/or repetitive sections.
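The construction and the clique search can be sketched as follows. The closeness test is supplied by the caller as a stand-in for "connected in the DBG by a short path", and the cliques are enumerated with the basic Bron-Kerbosch algorithm, which is a generic choice and not necessarily what viaDBG uses. All names and the toy positions are hypothetical.

```python
from itertools import combinations


def build_cpbg(paired_kmers, close):
    """Undirected graph over the paired k-mers; edge when close(u, v) holds."""
    nodes = sorted(paired_kmers)
    adj = {n: set() for n in nodes}
    for u, v in combinations(nodes, 2):
        if close(u, v):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Toy example: D, E, F sit on one haplotype and G, H, I on another.
position = {"D": 10, "E": 11, "F": 12, "G": 50, "H": 51, "I": 52}
cpbg = build_cpbg(position, lambda u, v: abs(position[u] - position[v]) < 5)
found_cliques = set(maximal_cliques(cpbg))
```

Each resulting clique groups paired k-mers that are mutually reachable, which matches the intuition that they come from the same haplotype.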

viaDBG - Obtain the haplotypes: Cliques Paired de Bruijn Graph (easy example)
[Figure: four panels, (a) CPBG(A,B), (b) CPBG(A,C), (c) CPBG(B,K) and (d) CPBG(C,K), each showing two cliques c0 and c1 over the paired k-mers.]

viaDBG - Obtain the haplotypes: Cliques Paired de Bruijn Graph (complete example)
[Figure: four panels (a) to (d) showing CPBGs whose nodes are labelled with the number of times each paired k-mer has been associated with the left k-mer.]

viaDBG - Obtain the haplotypes: Building the new de Bruijn graph
Given A and B, two nodes of the de Bruijn graph, and C, a set of maximal cliques from the CPBG of A and B, for each clique c_x ∈ C:
If c_x has nodes of P(A) and P(B), where P(X) is the paired-end information for node X, then the nodes A_{P(A) ∩ c_x} and B_{P(B) ∩ c_x} are added to the new de Bruijn graph, henceforth DBG'.
When should we not create new nodes? If A_{P(A) ∩ c_x} or B_{P(B) ∩ c_x} already belongs to DBG'.
Finally, contigs are obtained as unitigs in this new graph.
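The node-splitting rule can be sketched as below. Each new node is represented as a tuple of the original node and the part of its paired-end information that falls inside the clique, and an edge is added from the copy of A to the copy of B. This representation, the function name `split_nodes` and the toy data are assumptions made for illustration.

```python
def split_nodes(a, b, paired, cliques, new_graph):
    """Add to new_graph the copies of adjacent DBG nodes a -> b induced by each clique."""
    for clique in cliques:
        pa = frozenset(paired[a] & clique)
        pb = frozenset(paired[b] & clique)
        if not pa or not pb:
            continue  # the clique must contain paired k-mers of both nodes
        node_a, node_b = (a, pa), (b, pb)
        # setdefault reuses a node that already belongs to DBG' instead of recreating it.
        new_graph.setdefault(node_a, set()).add(node_b)
        new_graph.setdefault(node_b, set())
    return new_graph


# Toy data: two haplotypes pass through the adjacent nodes A and B.
paired = {"A": {"D", "E", "G", "H"}, "B": {"E", "F", "H", "I"}}
cliques = [frozenset("DEF"), frozenset("GHI")]
dbg_prime = split_nodes("A", "B", paired, cliques, {})
```

In the toy case A and B are each split into two copies, one per clique, so the two haplotypes no longer share nodes and can be read off as separate unitigs.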
