Enhanced de Bruijn Graphs, Pierre MORISSE


slide-1
SLIDE 1

Enhanced de Bruijn Graphs

Pierre MORISSE

pierre.morisse2@univ-rouen.fr

Supervisors: Thierry LECROQ and Arnaud LEFEBVRE

Laboratoire d’Informatique, de Traitement de l’Information et des Systèmes

September 14, 2017

slide-2
SLIDE 2

Introduction Classical graph structures Enhanced de Bruijn graph PgSA HG-CoLoR Conclusion

Plan

  1. Introduction
  2. Classical graph structures
  3. Enhanced de Bruijn graph
  4. PgSA
  5. HG-CoLoR
  6. Conclusion

  • P. Morisse

Enhanced de Bruijn Graphs 2/35

slide-4
SLIDE 4

Next Generation Sequencing

  • NGS technologies produce millions of short sequences (100-300 bases), called reads
  • These reads contain sequencing errors (∼ 1%)
  • Efficient algorithms and data structures are required to process these reads
  • Main focus: error correction and assembly

slide-8
SLIDE 8

Next Generation Sequencing

  • Recently, Third Generation Sequencing technologies started to develop
  • Two main technologies: Pacific Biosciences and Oxford Nanopore
  • They allow the sequencing of longer reads (several thousand bases)
  • Very useful to resolve assembly problems for large and complex genomes
  • Much higher error rate: around 15% for Pacific Biosciences and up to 30% for Oxford Nanopore

slide-14
SLIDE 14

Overlap graph

An overlap graph is a graph structure for computing overlaps of variable length between the reads of a given set.

Formal definition

For a set of reads $R = \{r_1, r_2, \ldots, r_n\}$, $OG(R) = (V, E)$ such that:

  • $V = \{ r_i : i = 1, \ldots, n \}$
  • $E = \{ (s, l, d) : s, d \in V \text{ and } \mathrm{suff}_l(s) = \mathrm{pref}_l(d) \}$

Here $\mathrm{suff}_l(s)$ is the suffix of length $l$ of $s$ and $\mathrm{pref}_l(d)$ is the prefix of length $l$ of $d$.

slide-18
SLIDE 18

Overlap graph

Example

With the set of reads S = {AGCTTACA, CTTACGTA, GTATACTG}, we obtain the following overlap graph:

[Figure: overlap graph with nodes AGCTTACA, CTTACGTA and GTATACTG; edge labels 1, 1, 3, 1]

Drawback

Faces difficulties with sequencing errors.
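
The overlap graph example can be reproduced with a short Python sketch. This is a brute-force illustration of the formal definition, not an efficient construction. The function name `overlap_graph` is hypothetical, and the sketch assumes that overlaps are proper (shorter than both reads) and that a read may overlap itself, which is what yields four edges labelled 1, 1, 3, 1.

```python
def overlap_graph(reads):
    """Return the edges (s, l, d) such that the suffix of length l of s
    equals the prefix of length l of d.

    Assumptions (not stated on the slide): overlaps are proper,
    i.e. 1 <= l < min(|s|, |d|), and self-overlaps are kept.
    """
    edges = []
    for s in reads:
        for d in reads:
            for l in range(1, min(len(s), len(d))):
                if s[-l:] == d[:l]:
                    edges.append((s, l, d))
    return edges


reads = ["AGCTTACA", "CTTACGTA", "GTATACTG"]
edges = overlap_graph(reads)
for s, l, d in edges:
    print(f"{s} -[{l}]-> {d}")
```

No edge links AGCTTACA to CTTACGTA, although the last six bases of the first read (CTTACA) and the first six bases of the second (CTTACG) differ in a single position. A mismatch of this kind, as a sequencing error would produce, is enough to hide the overlap.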

slide-20
SLIDE 20

de Bruijn graph

A de Bruijn graph of order k is a graph structure for computing overlaps of constant length k − 1 between the k-mers of the reads of a given set.

Formal definition

For a set of reads $R = \{r_1, r_2, \ldots, r_n\}$, $DBG_k(R) = (V, E)$ such that:

  • $V = \{ w : |w| = k \text{ and } \exists i,\ w \in \mathrm{Fact}(r_i) \}$
  • $E = \{ (s, d) : s, d \in V \text{ and } \mathrm{suff}_{k-1}(s) = \mathrm{pref}_{k-1}(d) \}$

Here $\mathrm{Fact}(r_i)$ is the set of factors (substrings) of the read $r_i$.

slide-24
SLIDE 24

de Bruijn graph

Example

With the set of reads S = {AGCTTACA, CTTACGTA, GTATACTG}, we obtain the following de Bruijn graph of order 6:

[Figure: de Bruijn graph of order 6 with nodes AGCTTA, GCTTAC, CTTACA, CTTACG, TTACGT, TACGTA, GTATAC, TATACT, ATACTG]

Drawback

Faces difficulties with locally insufficient coverage.
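
As a minimal sketch of the de Bruijn graph definition applied to this example, the following Python code extracts the 6-mers and links those that overlap on 5 characters. The helper names `kmers` and `de_bruijn_edges` are hypothetical, and the quadratic pairwise comparison is chosen for readability only.

```python
def kmers(reads, k):
    """Distinct factors of length k of the reads, in sorted order."""
    return sorted({r[i:i + k] for r in reads for i in range(len(r) - k + 1)})


def de_bruijn_edges(nodes, k):
    """Edges (s, d) such that suff_{k-1}(s) == pref_{k-1}(d). Assumes k >= 2."""
    return [(s, d) for s in nodes for d in nodes if s[-(k - 1):] == d[:k - 1]]


reads = ["AGCTTACA", "CTTACGTA", "GTATACTG"]
nodes = kmers(reads, 6)
dbg_edges = de_bruijn_edges(nodes, 6)

# Successor lists make dead ends easy to spot.
successors = {s: [d for (x, d) in dbg_edges if x == s] for s in nodes}
for s in nodes:
    print(s, "->", successors[s])
```

TACGTA has no successor: its longest overlap with GTATAC is GTA, of length 3, which is below k − 1 = 5. The 6-mers of the third read therefore end up disconnected from the rest, which illustrates the coverage drawback.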

slide-27
SLIDE 27

Multiple de Bruijn graphs

  • Usually, multiple de Bruijn graphs of different orders are built
  • This requires a different graph for each order
  • This consumes large amounts of memory

slide-30
SLIDE 30

Enhanced de Bruijn graph

Idea

Enhance the de Bruijn graph with the capability of computing overlaps of variable lengths between the k-mers, in an overlap graph fashion, in order to avoid building multiple de Bruijn graphs of different orders.

Formal definition

For a set of reads $R = \{r_1, r_2, \ldots, r_n\}$, $eDBG_{k,m}(R) = (V, E)$ such that:

  • $V = \{ w : |w| = k \text{ and } \exists i,\ w \in \mathrm{Fact}(r_i) \}$
  • $E = \{ (s, l, d) : s, d \in V,\ m \le l \le k - 1 \text{ and } \mathrm{suff}_l(s) = \mathrm{pref}_l(d) \}$

slide-34
SLIDE 34

Enhanced de Bruijn graph

Example

With the set of reads S = {AGCTTACA, CTTACGTA, GTATACTG}, we obtain the following enhanced de Bruijn graph of order (6, 3), that is, k = 6 and m = 3:

[Figure: enhanced de Bruijn graph with nodes AGCTTA, GCTTAC, CTTACA, CTTACG, TTACGT, TACGTA, GTATAC, TATACT, ATACTG; seven edges labelled 5, five edges labelled 4, four edges labelled 3]
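
The edge counts of this example can be checked with a small Python sketch of the formal definition. It builds the graph explicitly, which is only reasonable at toy scale; the function name `enhanced_de_bruijn_edges` is hypothetical, and the code assumes 1 ≤ m ≤ k − 1.

```python
from collections import Counter


def enhanced_de_bruijn_edges(reads, k, m):
    """Edges (s, l, d) between k-mers with m <= l <= k - 1 and
    suff_l(s) == pref_l(d). Assumes 1 <= m <= k - 1."""
    nodes = sorted({r[i:i + k] for r in reads for i in range(len(r) - k + 1)})
    return [
        (s, l, d)
        for s in nodes
        for d in nodes
        for l in range(m, k)
        if s[-l:] == d[:l]
    ]


reads = ["AGCTTACA", "CTTACGTA", "GTATACTG"]
edbg_edges = enhanced_de_bruijn_edges(reads, k=6, m=3)
edges_per_length = Counter(l for _, l, _ in edbg_edges)
print(dict(edges_per_length))
```

The edges of length 5 are exactly those of the de Bruijn graph of order 6. The shorter overlaps add, among others, the edge from TACGTA to GTATAC with overlap 3, which reconnects the 6-mers of the third read.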

slide-37
SLIDE 37

Construction

  • The enhanced de Bruijn graph does not need to be explicitly built
  • It can be traversed with the help of an index structure:
      • All the k-mers from the reads are stored in the index
      • The index is queried to retrieve the edges
  • Makes backwards traversal easy
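
To illustrate why an index makes backward traversal easy, here is a hedged Python sketch that retrieves the predecessors of a k-mer by querying its prefixes. The index is replaced by a naive scan over the k-mer set; `occurrence_positions` and `predecessors` are hypothetical names, not the API of any real index.

```python
def occurrence_positions(kmer_set, f):
    """Naive stand-in for an index query: all (k-mer, position) pairs
    at which f occurs. A real index would answer without scanning."""
    hits = []
    for kmer in kmer_set:
        pos = kmer.find(f)
        while pos != -1:
            hits.append((kmer, pos))
            pos = kmer.find(f, pos + 1)
    return hits


def predecessors(kmer_set, kmer, m):
    """k-mers whose suffix of length l (m <= l <= k - 1) is a prefix of kmer."""
    k = len(kmer)
    result = []
    for l in range(k - 1, m - 1, -1):
        prefix = kmer[:l]
        for source, pos in occurrence_positions(kmer_set, prefix):
            # An occurrence starting at k - l ends exactly at the end of source.
            if pos == k - l:
                result.append((source, l))
    return result


kmer_set = ["AGCTTA", "ATACTG", "CTTACA", "CTTACG", "GCTTAC",
            "GTATAC", "TACGTA", "TATACT", "TTACGT"]
print(predecessors(kmer_set, "TTACGT", 3))
```

Forward and backward traversal use the same query; only the queried factor (suffix or prefix) and the expected position change.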

slide-43
SLIDE 43

Definition

PgSA [Kowalski et al., 2015] is a data structure that indexes a set of reads in order to answer the following queries on the reads, for a given string f:

  1. In which reads does f occur?
  2. In how many reads does f occur?
  3. What are the occurrence positions of f?
  4. What is the number of occurrences of f?
  5. In which reads does f occur only once?
  6. In how many reads does f occur only once?
  7. What are the occurrence positions of f in the reads where it occurs only once?
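
To make the meaning of the seven queries concrete, the following Python sketch answers them by scanning the reads directly. It is a naive reference, not PgSA and not its API: the function names and the dictionary keys are hypothetical, and reads are identified by their 0-based rank in the input list.

```python
from collections import Counter


def occurrences(reads, f):
    """All (read id, position) pairs at which f occurs, overlaps included."""
    hits = []
    for read_id, read in enumerate(reads):
        pos = read.find(f)
        while pos != -1:
            hits.append((read_id, pos))
            pos = read.find(f, pos + 1)
    return hits


def answer_queries(reads, f):
    """Answers to queries 1 to 7, in that order, under hypothetical keys."""
    hits = occurrences(reads, f)
    per_read = Counter(read_id for read_id, _ in hits)
    once = sorted(r for r, count in per_read.items() if count == 1)
    return {
        "reads": sorted(per_read),                    # query 1
        "read_count": len(per_read),                  # query 2
        "positions": hits,                            # query 3
        "occurrence_count": len(hits),                # query 4
        "reads_once": once,                           # query 5
        "reads_once_count": len(once),                # query 6
        "positions_once": [(r, p) for r, p in hits if per_read[r] == 1],  # query 7
    }


answers = answer_queries(["ACGT", "GTGG"], "G")
print(answers)
```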

slide-51
SLIDE 51

Index construction

  • Concatenation of the reads, with respect to their overlaps (e.g. ACGT + GTGG ⇒ ACGTGG)
  • Construction of the sparse suffix array of the obtained pseudogenome
  • Construction of an auxiliary array
  • Queries are handled by a binary search over the suffix array, with the help of the auxiliary array
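
The construction steps can be sketched in Python under strong simplifications: a plain (not sparse) suffix array is built by sorting suffixes, the auxiliary array that maps positions back to reads is left out, and all function names are hypothetical. The sketch only shows how overlapping reads are merged into a pseudogenome and how a binary search over the suffix array locates a pattern.

```python
def merge_with_overlap(left, right):
    """Concatenate two reads, sharing their longest suffix/prefix overlap."""
    for l in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:l]):
            return left + right[l:]
    return left + right


def build_suffix_array(text):
    """Start positions of the suffixes of text, in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Positions of pattern in text, found by two binary searches on sa."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


pseudogenome = merge_with_overlap("ACGT", "GTGG")
sa = build_suffix_array(pseudogenome)
print(pseudogenome, sa, find_occurrences(pseudogenome, sa, "G"))
```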

slide-57
SLIDE 57

Traversal of the enhanced de Bruijn graph

  • Extract the k-mers of the reads
  • Build the index of the k-mers
  • Query the index, looping over the third query (what are the occurrence positions of f?), to retrieve the edges
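
The three traversal steps can be sketched in Python as follows. The index is a naive dictionary standing in for PgSA, with k-mers numbered from 1 in sorted order (an assumption of this sketch); `build_index`, `occurrence_positions` and `successors` are hypothetical names. For each overlap length l from k − 1 down to m, the suffix of length l is queried, and every occurrence at position 0 of a k-mer gives an edge.

```python
def build_index(reads, k):
    """Steps 1 and 2: extract the distinct k-mers and number them from 1."""
    kmers = sorted({r[i:i + k] for r in reads for i in range(len(r) - k + 1)})
    return {i + 1: kmer for i, kmer in enumerate(kmers)}


def occurrence_positions(index, f):
    """Third query, answered naively: (k-mer id, position) pairs for f."""
    hits = []
    for kmer_id, kmer in index.items():
        pos = kmer.find(f)
        while pos != -1:
            hits.append((kmer_id, pos))
            pos = kmer.find(f, pos + 1)
    return hits


def successors(index, kmer, m):
    """Step 3: loop over the overlap lengths and keep prefix occurrences."""
    k = len(kmer)
    edges = []
    for l in range(k - 1, m - 1, -1):
        for kmer_id, pos in occurrence_positions(index, kmer[-l:]):
            if pos == 0:  # f is a prefix of that k-mer: edge with overlap l
                edges.append((index[kmer_id], l))
    return edges


reads = ["AGCTTACA", "CTTACGTA", "GTATACTG"]
index = build_index(reads, 6)
print(occurrence_positions(index, "GCTTA"))
print(successors(index, "AGCTTA", 3))
```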

slide-60
SLIDE 60

Traversal of the enhanced de Bruijn graph

Example

Traversing the previous enhanced de Bruijn graph:

[Figure: enhanced de Bruijn graph with nodes AGCTTA, GCTTAC, CTTACA, CTTACG, TTACGT, TACGTA, GTATAC, TATACT, ATACTG and edges labelled 5, 4 or 3]

slide-61
SLIDE 61

Traversal of the enhanced de Bruijn graph

Example

k-mers set, stored in the PgSA index:

  1: AGCTTA
  2: ATACTG
  3: CTTACA
  4: CTTACG
  5: GCTTAC
  6: GTATAC
  7: TACGTA
  8: TATACT
  9: TTACGT

[Figure: edges leaving AGCTTA, to GCTTAC (overlap 5), to CTTACA and CTTACG (overlap 4), and to TTACGT (overlap 3)]

slide-64
SLIDE 64

Traversal of the enhanced de Bruijn graph

Example (overlap 5)

Query to the PgSA index: what are the occurrence positions of GCTTA, the suffix of length 5 of AGCTTA?

Answer, as (k-mer, position) pairs: {(1,1) ; (5,0)}

GCTTA occurs at position 0 of k-mer 5, which gives the edge AGCTTA → GCTTAC with overlap 5.

slide-68
SLIDE 68

Traversal of the enhanced de Bruijn graph

Example (overlap 4)

Query to the PgSA index: what are the occurrence positions of CTTA, the suffix of length 4 of AGCTTA?

Answer, as (k-mer, position) pairs: {(1,2) ; (3,0) ; (4,0) ; (5,1)}

CTTA occurs at position 0 of k-mers 3 and 4, which gives the edges AGCTTA → CTTACA and AGCTTA → CTTACG with overlap 4.

slide-74
SLIDE 74

Introduction Classical graph structures Enhanced de Bruijn graph PgSA HG-CoLoR Conclusion

Traversal of the enhanced de Bruijn graph

Example k-mers set 1: AGCTTA 2: ATACTG 3: CTTACA 4: CTTACG 5: GCTTAC 6: GTATAC 7: TACGTA 8: TATACT 9: TTACGT PgSA Index Occurrences positions?

{(1,3) ; (3,1) ; (4,1) ;

(5,2) ; (9,0)}

AGCTTA GCTTAC CTTACA CTTACG TTACGT

3 5 4 4

  • P. Morisse

Enhanced de Bruin Graphs 21/35

slide-75
SLIDE 75

Introduction Classical graph structures Enhanced de Bruijn graph PgSA HG-CoLoR Conclusion

Traversal of the enhanced de Bruijn graph

Example k-mers set 1: AGCTTA 2: ATACTG 3: CTTACA 4: CTTACG 5: GCTTAC 6: GTATAC 7: TACGTA 8: TATACT 9: TTACGT PgSA Index

{(1,3) ; (3,1) ; (4,1) ;

(5,2) ; (9,0)}

AGCTTA GCTTAC CTTACA CTTACG TTACGT

3 5 4 4

  • P. Morisse

Enhanced de Bruin Graphs 21/35

slide-76
SLIDE 76

Introduction Classical graph structures Enhanced de Bruijn graph PgSA HG-CoLoR Conclusion

Traversal of the enhanced de Bruijn graph

Example k-mers set 1: AGCTTA 2: ATACTG 3: CTTACA 4: CTTACG 5: GCTTAC 6: GTATAC 7: TACGTA 8: TATACT 9: TTACGT PgSA Index

{(1,3) ; (3,1) ; (4,1) ;

(5,2) ; (9,0)}

AGCTTA GCTTAC CTTACA CTTACG TTACGT

3 5 4 4

  • P. Morisse

Enhanced de Bruin Graphs 21/35

slide-77
SLIDE 77

Introduction Classical graph structures Enhanced de Bruijn graph PgSA HG-CoLoR Conclusion

Traversal of the enhanced de Bruijn graph

Example k-mers set 1: AGCTTA 2: ATACTG 3: CTTACA 4: CTTACG 5: GCTTAC 6: GTATAC 7: TACGTA 8: TATACT 9: TTACGT PgSA Index

{(1,3) ; (3,1) ; (4,1) ;

(5,2) ; (9,0)}

AGCTTA GCTTAC CTTACA CTTACG TTACGT

3 5 4 4

  • P. Morisse

Enhanced de Bruin Graphs 21/35

slide-78
SLIDE 78

Introduction Classical graph structures Enhanced de Bruijn graph PgSA HG-CoLoR Conclusion

Traversal of the enhanced de Bruijn graph

Example k-mers set 1: AGCTTA 2: ATACTG 3: CTTACA 4: CTTACG 5: GCTTAC 6: GTATAC 7: TACGTA 8: TATACT 9: TTACGT PgSA Index

{(1,3) ; (3,1) ; (4,1) ;

(5,2) ; (9,0)}

AGCTTA GCTTAC CTTACA CTTACG TTACGT

3 5 4 4

  • P. Morisse

Enhanced de Bruin Graphs 21/35

slide-79
SLIDE 79

Introduction Classical graph structures Enhanced de Bruijn graph PgSA HG-CoLoR Conclusion

Traversal of the enhanced de Bruijn graph

Example k-mers set 1: AGCTTA 2: ATACTG 3: CTTACA 4: CTTACG 5: GCTTAC 6: GTATAC 7: TACGTA 8: TATACT 9: TTACGT PgSA Index

{(1,3) ; (3,1) ; (4,1) ;

(5,2) ; (9,0)}

AGCTTA GCTTAC CTTACA CTTACG TTACGT

5 4 4 3

  • P. Morisse

Enhanced de Bruin Graphs 21/35
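The traversal example above can be reproduced with a short Python sketch. It is only an illustration: `occurrences` is a naive linear scan standing in for a PgSA occurrence-position query (it is not the PgSA API), and the function names and the `min_overlap` parameter are assumptions introduced here.

```python
def occurrences(kmers, pattern):
    """Return (k-mer number, position) pairs for every occurrence of pattern.

    Naive stand-in for an index query; k-mers are numbered from 1.
    """
    hits = []
    for number, kmer in enumerate(kmers, start=1):
        pos = kmer.find(pattern)
        while pos != -1:
            hits.append((number, pos))
            pos = kmer.find(pattern, pos + 1)
    return hits


def successors(kmers, node, min_overlap):
    """List the (successor, overlap) edges leaving node, longest overlap first."""
    edges = []
    for overlap in range(len(node) - 1, min_overlap - 1, -1):
        suffix = node[-overlap:]
        for number, pos in occurrences(kmers, suffix):
            candidate = kmers[number - 1]
            # A k-mer that starts with the suffix overlaps node by `overlap` bases.
            if pos == 0 and candidate != node:
                edges.append((candidate, overlap))
    return edges


KMERS = ["AGCTTA", "ATACTG", "CTTACA", "CTTACG", "GCTTAC",
         "GTATAC", "TACGTA", "TATACT", "TTACGT"]

print(occurrences(KMERS, "GCTTA"))
# [(1, 1), (5, 0)]
print(successors(KMERS, "AGCTTA", 3))
# [('GCTTAC', 5), ('CTTACA', 4), ('CTTACG', 4), ('TTACGT', 3)]
```

The graph is never stored: each call derives the outgoing edges of a single node from index queries, which is the point of traversing the enhanced de Bruijn graph without explicitly building it.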

slide-81
SLIDE 81

Context

  • Due to their high error rate, error correction of long reads is mandatory
  • Various methods already exist for the correction of short reads, but they are not applicable to long reads
  • This forces the development of new error correction methods
  • Two main categories: self-correction and hybrid correction

slide-85
SLIDE 85

Workflow

5 steps:

1. Correct the short reads
2. Align the short reads on the long reads, to find seeds
3. Merge the overlapping seeds
4. Link the seeds, by traversing the enhanced de Bruijn graph
5. Extend the obtained corrected long read, on the left (resp. right) of the leftmost (resp. rightmost) seed
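As an illustration of step 3, here is a minimal Python sketch of merging overlapping seeds. It assumes each seed is reduced to a hypothetical (start, end) half-open interval on the long read, and the coordinates are made up for the example. It only tracks positions; reconciling the sequences of the merged seeds is left out, and HG-CoLoR's actual merging procedure may differ.

```python
def merge_overlapping_seeds(seeds):
    """Merge seeds whose [start, end) intervals on the long read overlap."""
    merged = []
    for start, end in sorted(seeds):
        if merged and start < merged[-1][1]:
            # Overlaps the previous seed: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical seed coordinates on one long read.
seeds = [(120, 370), (300, 550), (900, 1150), (1100, 1350), (2000, 2250)]
print(merge_overlapping_seeds(seeds))
# [(120, 550), (900, 1350), (2000, 2250)]
```

The merged seeds are the ones that step 4 then tries to link pairwise through the graph.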

slide-90
SLIDE 90

Step 4: Seeds linking

  • Seeds are used as anchor points on the enhanced de Bruijn graph
  • The graph is traversed to link the seeds together and assemble the k-mers

slide-92
SLIDE 92

Step 4: Seeds linking

[Figure: a long read carrying seed1, seed2 and seed3, above a portion of the enhanced de Bruijn graph whose edges are labelled with the overlap lengths k − 1, k − 2 and k − 3. seed1 and seed2 are first taken as source (src) and destination (dst), and the graph is traversed from src until dst is reached. The linked seeds then become the new src, with seed3 as dst, and the process repeats until the corrected long read is obtained.]
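The linking of a source to a destination can be sketched in Python as a depth-first search over the toy 6-mer set used earlier in the talk. This is an illustration only: the search strategy (longest overlaps first, backtracking, a `max_steps` bound) and all function and parameter names are assumptions, not HG-CoLoR's actual implementation, and the successor lookup is a plain scan rather than a PgSA query.

```python
def successors(kmers, node, min_overlap):
    """Yield (successor, overlap) edges leaving node, longest overlap first."""
    for overlap in range(len(node) - 1, min_overlap - 1, -1):
        suffix = node[-overlap:]
        for kmer in kmers:
            if kmer != node and kmer.startswith(suffix):
                yield kmer, overlap


def link_seeds(kmers, src, dst, min_overlap, max_steps):
    """Search a path from src to dst; return the sequence it spells, or None."""
    def visit(node, sequence, on_path, steps_left):
        if node == dst:
            return sequence
        if steps_left == 0:
            return None
        for nxt, overlap in successors(kmers, node, min_overlap):
            if nxt in on_path:
                continue  # do not walk through the same node twice
            found = visit(nxt, sequence + nxt[overlap:],
                          on_path | {nxt}, steps_left - 1)
            if found is not None:
                return found
        return None

    return visit(src, src, {src}, max_steps)


KMERS = ["AGCTTA", "ATACTG", "CTTACA", "CTTACG", "GCTTAC",
         "GTATAC", "TACGTA", "TATACT", "TTACGT"]

print(link_seeds(KMERS, "AGCTTA", "TACGTA", 5, 10))
# AGCTTACGTA

# Without TTACGT, overlaps of k - 1 = 5 alone no longer connect the two k-mers,
# but allowing overlaps down to 4 does.
reduced = [kmer for kmer in KMERS if kmer != "TTACGT"]
print(link_seeds(reduced, "AGCTTA", "TACGTA", 5, 10))
# None
print(link_seeds(reduced, "AGCTTA", "TACGTA", 4, 10))
# AGCTTACGTA
```

The second pair of calls shows why lower-order edges matter: when a k-mer is missing, a shorter overlap can still bridge the gap between the source and the destination.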

slide-114
SLIDE 114

Step 5: Tips extension

  • Seeds do not always map right at the beginning or up to the end of the long read
  • Once all the seeds have been linked, HG-CoLoR keeps on traversing the graph
  • The traversal stops when the borders of the long read or a branching path are reached
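A minimal Python sketch of the right-hand tip extension follows, under stated assumptions: a "branching path" is taken to mean several successors at the largest overlap that has any successor, `max_extension` stands for the number of bases left before the border of the long read, and all names are hypothetical. HG-CoLoR's own stopping rules may be more elaborate.

```python
def best_successors(kmers, node, min_overlap):
    """Return (overlap, successors) for the largest overlap that has a successor."""
    for overlap in range(len(node) - 1, min_overlap - 1, -1):
        suffix = node[-overlap:]
        found = [kmer for kmer in kmers
                 if kmer != node and kmer.startswith(suffix)]
        if found:
            return overlap, found
    return 0, []


def extend_right(kmers, sequence, k, min_overlap, max_extension):
    """Extend sequence rightwards until a branch, a dead end or the length limit."""
    limit = len(sequence) + max_extension
    while len(sequence) < limit:
        overlap, found = best_successors(kmers, sequence[-k:], min_overlap)
        if len(found) != 1:
            break  # dead end (no successor) or branching path (several)
        sequence += found[0][overlap:]
    return sequence[:limit]  # never run past the border of the long read


KMERS = ["AGCTTA", "ATACTG", "CTTACA", "CTTACG", "GCTTAC",
         "GTATAC", "TACGTA", "TATACT", "TTACGT"]

print(extend_right(KMERS, "GTATAC", 6, 3, 10))
# GTATACTG  (stops at a dead end)
print(extend_right(KMERS, "AGCTTA", 6, 3, 10))
# AGCTTAC   (stops at the CTTACA / CTTACG branch)
```

The left-hand extension is symmetric: it looks for k-mers whose suffix matches a prefix of the leftmost k-mer.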

slide-117
SLIDE 117

Remark

Some seeds might be impossible to link together

⇒ Production of a corrected long read fragmented in multiple parts

slide-119
SLIDE 119

Datasets

We replaced the enhanced de Bruijn graph in the HG-CoLoR implementation with an overlap graph and with a classical de Bruijn graph, in order to compare the obtained results. Experiments were run on the following datasets:

| Dataset | Reference genome | Genome size | Oxford Nanopore: # reads | Oxford Nanopore: average length (bp) | Oxford Nanopore: coverage | Illumina: # reads | Illumina: read length (bp) | Illumina: coverage |
|---|---|---|---|---|---|---|---|---|
| E. coli | E. coli | 4.6 Mbp | 22,270 | 5,999 | 28x | 465,000 | 300 | 30x |
| Yeast | S. cerevisiae | 12.4 Mbp | 118,763 | 5,512 | 34x | 2,500,000 | 250 | 50x |

slide-120
SLIDE 120

Alignment-based comparison

| Dataset | Graph | # Reads | # Fragmented reads | Average length (bp) | Average identity | Runtime |
|---|---|---|---|---|---|---|
| E. coli | Raw reads | 22,270 | N/A | 5,999 | 79.46% | N/A |
| E. coli | Overlap graph | 19,592 | 1,319 | 5,979 | 99.91% | 40min |
| E. coli | de Bruijn graph (k = 100) | 21,782 | 132 | 6,144 | 99.75% | 1h53 |
| E. coli | Enhanced de Bruijn graph (k = 100, m = 50) | 21,786 | 40 | 6,174 | 99.72% | 1h46 |
| Yeast | Raw reads | 118,763 | N/A | 5,512 | 68.63% | N/A |
| Yeast | Overlap graph | 60,649 | 14,095 | 4,694 | 99.42% | 6h10 |
| Yeast | de Bruijn graph (k = 100) | 69,610 | 11,763 | 6,060 | 98.61% | 18h20 |
| Yeast | Enhanced de Bruijn graph (k = 100, m = 50) | 69,784 | 11,567 | 6,078 | 99.03% | 17h58 |

slide-121
SLIDE 121

Assembly-based comparison

| Dataset | Graph | # Expected contigs | # Obtained contigs |
|---|---|---|---|
| E. coli | Overlap graph | 1 | 20 |
| E. coli | de Bruijn graph (k = 100) | 1 | 4 |
| E. coli | Enhanced de Bruijn graph (k = 100, m = 50) | 1 | 1 |
| Yeast | Overlap graph | 16 | 197 |
| Yeast | de Bruijn graph (k = 100) | 16 | 124 |
| Yeast | Enhanced de Bruijn graph (k = 100, m = 50) | 16 | 103 |

slide-123
SLIDE 123

Conclusion

  • We showed that multiple de Bruijn graphs of different orders can be combined into a single enhanced de Bruijn graph
  • We showed how to traverse an enhanced de Bruijn graph without explicitly building it
  • We introduced a new hybrid error correction method for long reads, relying on an enhanced de Bruijn graph
  • We demonstrated the usefulness of enhanced de Bruijn graphs by comparing them with overlap graphs and classical de Bruijn graphs within the HG-CoLoR implementation
  • HG-CoLoR is available from: https://github.com/pierre-morisse/HG-CoLoR

slide-128
SLIDE 128

Future work

  • Use a greedy selection at branching paths
  • Run HG-CoLoR on larger genomes
  • Build a proper assembly tool relying on enhanced de Bruijn graphs
  • Compare it with existing assemblers that use multiple de Bruijn graphs of different orders

slide-132
SLIDE 132

References I

  • Kowalski, T., Grabowski, S., and Deorowicz, S. (2015). Indexing arbitrary-length k-mers in sequencing reads. PLoS ONE, 10(7):1–14.
  • Morisse, P., Lecroq, T., and Lefebvre, A. (2017). HG-CoLoR: Hybrid Graph for the error Correction of Long Reads. In Proceedings of the Journées Ouvertes en Biologie, Informatique et Mathématiques.