Safe and complete genome assembly via omnitigs, Alexandru Tomescu



SLIDE 1

Safe and complete genome assembly via omnitigs

Alexandru Tomescu

Department of Computer Science, University of Helsinki, Finland

1st Summer School on Bioinformatics Data Structures, August 9, 2016

This lecture was part of the 1st Summer School on Bioinformatics Data Structures, funded by the BIRDS project (www.birdsproject.eu). This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 690941.

SLIDE 2

CENTRAL DOGMA OF BIOLOGY

[Figure: DNA with regulatory elements (promoter, enhancer, silencer, TFBS) and two genes; transcription of gene 1 into pre-mRNA (exons and introns); alternative splicing into mature mRNA transcripts; translation into proteins; binding and protein-protein interaction.]

Image taken from Genome-Scale Algorithm Design, Cambridge University Press, 2015


SLIDE 3

SEQUENCING ATLAS

[Figure: sequencing protocols and the layer each one targets: de novo sequencing / whole genome resequencing (DNA), targeted resequencing (regions between primers), bisulfite sequencing (methylation), ChIP sequencing (binding at TFBS), RNA sequencing (mature mRNA transcripts).]

Image taken from Genome-Scale Algorithm Design, Cambridge University Press, 2015


SLIDE 4

BLACKBOARD

The “safe and complete” framework is described in:

◮ A. I. Tomescu, P. Medvedev. Safe and complete contig assembly via omnitigs. RECOMB 2016, 20th Annual International Conference on Research in Computational Molecular Biology, LNCS 9649, pp. 152-163, 2016. Extended version available at http://arxiv.org/abs/1601.02932


SLIDE 5

EXPERIMENTAL RESULTS

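Table: Results for DB_k^ec(R), where R is the set of all (k + 1)-mers of the genome.

E.coli (k = 31)

| strings  | # strings | avg len | E-size | time (s) |
|----------|-----------|---------|--------|----------|
| unitigs  | 1,743     | 2,654   | 33,309 | < 1      |
| Y-to-V   | 1,004     | 4,682   | 33,632 | < 1      |
| omnitigs | 983       | 4,832   | 34,557 | < 1      |

chr10 (k = 55)

| strings  | # strings | avg len | E-size | time (s) |
|----------|-----------|---------|--------|----------|
| unitigs  | 259,845   | 546     | 8,344  | 1        |
| Y-to-V   | 159,101   | 878     | 8,376  | 2        |
| omnitigs | 158,236   | 887     | 8,401  | 1,046    |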
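To make the unitig row of the results table concrete, here is a minimal Python sketch (not the authors' implementation). It builds the graph whose nodes are k-mers and whose edges are the (k + 1)-mers of the genome, computes unitigs as maximal non-branching paths, and reports # strings, average length and E-size. Assumptions: reverse complements are ignored; E-size is computed GAGE-style as the sum of squared string lengths divided by the genome length, which may differ from the exact definition used in the paper; the function names are hypothetical. Y-to-V contigs and omnitigs are not computed here.

```python
from collections import Counter


def kplus1_mers(genome, k, circular=False):
    """All distinct (k+1)-mers of `genome`; wraps around the end if circular."""
    m = k + 1
    text = genome + genome[:m - 1] if circular else genome
    return {text[i:i + m] for i in range(len(text) - m + 1)}


def unitigs(edges):
    """Maximal non-branching paths of the graph with k-mer nodes and (k+1)-mer edges.

    Each (k+1)-mer e is an edge e[:-1] -> e[1:]. Reverse complements are
    ignored (single-stranded sketch, an assumption).
    """
    succ, indeg = {}, Counter()
    for e in sorted(set(edges)):  # sorted only to make the output order deterministic
        u, v = e[:-1], e[1:]
        succ.setdefault(u, []).append(v)
        succ.setdefault(v, [])
        indeg[v] += 1

    def one_in_one_out(x):
        return indeg[x] == 1 and len(succ[x]) == 1

    used = set()  # edges already placed in some unitig

    def spell(u, v):
        # Spell the walk starting with edge u -> v, extending it while the
        # current node has in-degree 1 and out-degree 1.
        s = u + v[-1]
        used.add((u, v))
        while one_in_one_out(v):
            w = succ[v][0]
            if (v, w) in used:  # back at the start of an isolated cycle
                break
            used.add((v, w))
            s += w[-1]
            v = w
        return s

    result = []
    # Unitigs starting at a node that is not 1-in-1-out (branching node, source, ...).
    for u in succ:
        if not one_in_one_out(u):
            result.extend(spell(u, v) for v in succ[u])
    # Any edge still unused lies on a cycle made only of 1-in-1-out nodes.
    for u in succ:
        for v in succ[u]:
            if (u, v) not in used:
                result.append(spell(u, v))
    return result


def contig_stats(strings, genome_length):
    """(# strings, average length, E-size), with E-size = sum(len^2) / genome_length (assumed)."""
    lengths = [len(s) for s in strings]
    n = len(lengths)
    return n, sum(lengths) / n, sum(L * L for L in lengths) / genome_length


if __name__ == "__main__":
    genome = "ACGTTGCA"
    tigs = unitigs(kplus1_mers(genome, k=3))
    print(tigs, contig_stats(tigs, len(genome)))  # ['ACGTTGCA'] (1, 8.0, 8.0)
```

On a genome whose k-mers are all distinct, the graph is a single path and the only unitig is the genome itself; repeats create branching nodes, which split the genome into several shorter unitigs. This is the fragmentation that Y-to-V contigs and omnitigs reduce in the table.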


SLIDE 6

EXPERIMENTAL RESULTS

Compared to unitigs, #SNPs whose block size

◮ increased: ∼1.7 million (out of ∼5.9 million)
◮ increased by more than 10: ∼137,000

Compared to Y-to-V contigs, #SNPs whose block size

◮ increased: ∼266,000 (out of ∼5.9 million)
