Relaxations of the Seriation Problem and Applications to de novo Genome Assembly
PhD thesis defense
Antoine Recanati
Supervised by Alexandre d’Aspremont, 29 November 2018
Introduction
Genome sequencing
...ATGGCGTGCAATG...
...TACCGCACGTTAC...
DNA sequencing
Image: Nik Spencer/Nature
The genome is cut into fragments (reads).
Ex: ATGGCGTGCAATG
Reads: CGTGCAA ATGGCGT TGCAATG GGCGTGC
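To make the notion of a read concrete, here is a minimal sketch that draws fixed-length substrings from the example sequence at random positions. The function name `sample_reads`, the fixed read length, and the uniform error-free sampling are illustrative assumptions, not a model of a real sequencer.

```python
import random

def sample_reads(genome, read_len, n_reads, seed=0):
    """Draw n_reads substrings of length read_len at uniformly random positions.

    Assumption for illustration only: reads are error-free and all the same length.
    """
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    starts = [rng.randrange(last_start + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]

genome = "ATGGCGTGCAATG"
reads = sample_reads(genome, read_len=7, n_reads=4)
```

Each element of `reads` is a 7-letter substring of the genome; its position is discarded, which is exactly the information assembly has to recover.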
Assembly
Goal: assemble the reads to reconstruct the full sequence. The position and ordering of the reads are unknown.
Reads: CGTGCAA ATGGCGT TGCAATG GGCGTGC
Full sequence: ATGGCGTGCAATG
Genome assembly: mapping
If a reference genome is available: map the fragments to it, then derive a consensus sequence.
Reads: CGTGCAA ATGGCGT TGCAATG GGCGTGC
Reference (proxy): AAGGCGTGCATTG
Assembly: ATGGCGTGCAATG
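The mapping approach can be sketched on the slide's example as follows. This is a toy version under simplifying assumptions: reads are placed at the reference offset with the smallest Hamming distance (no insertions or deletions), and the consensus is a per-column majority vote. The helper names `map_read` and `consensus` are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def map_read(read, ref):
    """Offset in ref where the read fits with the fewest mismatches (assumes no indels)."""
    offsets = range(len(ref) - len(read) + 1)
    return min(offsets, key=lambda p: hamming(read, ref[p:p + len(read)]))

def consensus(reads, ref):
    """Majority vote over the bases that mapped reads place on each reference column."""
    columns = [Counter() for _ in ref]
    for read in reads:
        start = map_read(read, ref)
        for offset, base in enumerate(read):
            columns[start + offset][base] += 1
    # Columns not covered by any read fall back to the reference base.
    return "".join(col.most_common(1)[0][0] if col else ref_base
                   for col, ref_base in zip(columns, ref))

reference = "AAGGCGTGCATTG"  # proxy reference from the slide
reads = ["CGTGCAA", "ATGGCGT", "TGCAATG", "GGCGTGC"]
assembly = consensus(reads, reference)
```

On this example the reads map to offsets 4, 0, 6 and 2, and the vote corrects the two positions where the proxy reference differs from the sequenced genome.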
Genome assembly: de novo
No reference available. Greedy assembly: take one read, “add” the [...]
Reads: CGTGCAA ATGGCGT TGCAATG GGCGTGC
Assembly: ATGGCGTGCAATG
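The slide's description of the greedy step is cut off, so the sketch below assumes one common greedy variant: repeatedly merge the two sequences with the largest suffix-prefix overlap until no overlap remains. The function names are hypothetical, and exact (error-free) overlaps are assumed.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads):
    """Merge the best-overlapping pair until one contig is left or nothing overlaps."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining contigs share no overlap
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs

reads = ["CGTGCAA", "ATGGCGT", "TGCAATG", "GGCGTGC"]
contigs = greedy_assemble(reads)
```

On the four example reads this returns the single contig `ATGGCGTGCAATG`. Greedy merging is not guaranteed to succeed in general, for instance when repeats create misleading overlaps.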
De novo assembly paradigms
Overlap-Layout-Consensus
Reads: ATGGCGT CGTGCAA TGCAATG GGCGTGC
Overlaps of length 5: GGCGT CGTGC TGCAA
Overlaps of length 3: CGT TGC
Consensus: ATGGCGTGCAATG
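The three OLC stages can be sketched on the same reads. This is an illustration under strong assumptions: overlaps are exact suffix-prefix matches, the layout is found by brute force over all read orders (only feasible for a handful of reads), and the consensus simply concatenates the non-overlapping parts. The names `best_layout` and `consensus` are hypothetical.

```python
from itertools import permutations

def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def best_layout(reads):
    """Layout: the read order with the largest total overlap between neighbours."""
    def total_overlap(order):
        return sum(overlap(a, b) for a, b in zip(order, order[1:]))
    return max(permutations(reads), key=total_overlap)

def consensus(order):
    """Consensus: glue consecutive reads, skipping the overlapping prefix."""
    seq = order[0]
    for prev, read in zip(order, order[1:]):
        seq += read[overlap(prev, read):]
    return seq

reads = ["ATGGCGT", "CGTGCAA", "TGCAATG", "GGCGTGC"]
# Overlap: keep pairs sharing at least 3 letters.
overlaps = {(a, b): overlap(a, b)
            for a in reads for b in reads
            if a != b and overlap(a, b) >= 3}
layout = best_layout(reads)
assembly = consensus(layout)
```

Besides the five overlaps listed on the slide, the overlap step also reports `ATG` between `TGCAATG` and `ATGGCGT`. That pair is not adjacent in the genome, which illustrates why the layout step cannot simply trust every overlap.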
Modern sequencing technologies
[...] (Illumina/Solexa), with pairing information. De Bruijn graph methods (based on a k-mer graph) preferred.
[...] (Pacific Biosciences [PacBio], Oxford Nanopore Technologies [ONT]). Comeback of OLC methods.
[...] methods)
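For readers unfamiliar with the k-mer graph mentioned above, here is a minimal sketch of a de Bruijn graph construction: every k-mer in a read becomes an edge from its first k-1 letters to its last k-1 letters. The function name `de_bruijn` and the choice k = 4 are illustrative assumptions; real assemblers add error correction and graph simplification on top of this.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

reads = ["CGTGCAA", "ATGGCGT", "TGCAATG", "GGCGTGC"]
graph = de_bruijn(reads, k=4)
n_edges = sum(len(successors) for successors in graph.values())
```

With these reads the graph has ten edges, one per distinct 4-mer of the example sequence.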
De novo assembly methods with ONT reads
State of the art: Canu (formerly Celera Assembler). Heavy pre-processing, many heuristics
[...] low-coverage/high-error regions
[...] a priori model of errors)
[...] incremental scaffolding
De novo assembly methods with ONT reads
2015-now
Introduction
De novo Genome Assembly
Seriation
Application of the Spectral Method to Genome Assembly
Robust Seriation
Multi-dimensional spectral ordering
Conclusion