short read genome assembly
Sorin Istrail CSCI1820 Short-read genome assembly algorithms 3/6/2014
Genomathica Assembler
Mathematica notebook for genome assembly simulation
Assembler can be found at:
– Change the input genome to your FASTA file's location
– Evaluate each cell initially; after that, you only need to evaluate the last two cells, which re-run the assembly and display the results, respectively
– Mathematica can be downloaded here: http://www.brown.edu/information-technology/software/
Figure labels: Raw Sequence Reads; Sample prep
Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP). Available at: www.genome.gov/sequencingcosts. Accessed April 2013.
http://www.ncbi.nlm.nih.gov/Traces/sra/
Compeau et al. (2011) How to apply de Bruijn graphs to genome assembly
Graph figure, built up over three slides.
4-mers: GACG, ACGT, CGTA, GTAC, TACG, CGTT
3-mers: GAC, ACG, CGT, GTA, TAC, GTT
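The 4-mers and 3-mers above fit the de Bruijn construction described by Compeau et al. (2011), in which each k-mer is an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. The following is a minimal Python sketch under that reading; the function name `de_bruijn_graph` is hypothetical, and treating the 4-mers as edges (rather than nodes) is an assumption about what the figure shows.

```python
from collections import defaultdict

def de_bruijn_graph(kmer_list):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes it points to.

    Every k-mer contributes one directed edge: prefix -> suffix.
    """
    graph = defaultdict(list)
    for kmer in kmer_list:
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

# The six 4-mers listed on the slides.
four_mers = ["GACG", "ACGT", "CGTA", "GTAC", "TACG", "CGTT"]
graph = de_bruijn_graph(four_mers)

for prefix, suffixes in graph.items():
    print(prefix, "->", suffixes)
```

In this sketch the node CGT has two outgoing edges (to GTA and to GTT), and GTT has none, so it appears only as a target.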
Graph figure, built up over three slides.
2-mers: GA, AC, CG, GT, TA, TT
3-mers: GAC, ACG, CGT, GTA, TAC, GTT
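As an illustration of how such a graph yields an assembly, the sketch below treats each 3-mer as an edge between 2-mer nodes and walks an Eulerian path (every edge used exactly once) with Hierholzer's algorithm. The genome string `GACGTACGTT` is an assumption, chosen because its 3-mers match those listed above; the function names are hypothetical, and repeated k-mers are kept as repeated edges so that an Eulerian path exists.

```python
from collections import defaultdict

def kmers(seq, k):
    """All overlapping k-mers of seq, in order, duplicates included."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def eulerian_path(edges):
    """Hierholzer's algorithm on a directed multigraph given as (u, v) pairs."""
    graph = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree
    for u, v in edges:
        graph[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    # Start at the node with one extra outgoing edge, if there is one.
    start = next((n for n, b in balance.items() if b == 1), edges[0][0])
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())         # dead end: emit node
    return path[::-1]

def assemble(kmer_list):
    """Spell the sequence along an Eulerian path of the k-mer graph."""
    edges = [(km[:-1], km[1:]) for km in kmer_list]
    path = eulerian_path(edges)
    return path[0] + "".join(node[-1] for node in path[1:])

genome = "GACGTACGTT"  # assumed example sequence
print(assemble(kmers(genome, 3)))
```

For this input the path visits GA, AC, CG, GT, TA, AC, CG, GT, TT and the program prints GACGTACGTT.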
An error in the middle of a read is propagated to many k-mers.
An error at the end of a read creates an erroneous ending k-mer.
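To make the two cases concrete, the sketch below substitutes one base in a read and lists the k-mers that change. A base in the interior of a read lies inside up to k overlapping k-mers, while the last base lies inside only one. The read `GACGTACGTT`, the substituted bases, and the function names are all assumptions made for illustration.

```python
def kmers(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def corrupted_kmers(read, pos, base, k):
    """K-mers that differ after replacing read[pos] with base."""
    bad = read[:pos] + base + read[pos + 1:]
    return [b for a, b in zip(kmers(read, k), kmers(bad, k)) if a != b]

read = "GACGTACGTT"  # assumed error-free read

# Substitution in the middle of the read (A -> C at index 5).
middle = corrupted_kmers(read, 5, "C", 3)
# Substitution at the last base (T -> C at index 9).
end = corrupted_kmers(read, 9, "C", 3)

print("middle error:", middle)
print("end error:", end)
```

With k = 3, the middle substitution changes three k-mers (GTC, TCC, CCG), whereas the end substitution changes only the final one (GTC).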
Errors connect two nodes in the graph that do not correspond to a valid extension in the genome sequence.
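A small sketch of this effect: given an error-free reference and a read, it reports the read's k-mers that are absent from the reference yet join two (k-1)-mer nodes that both occur in the reference graph. The genome `GACGTACGTT`, the erroneous read `ACTTAC`, and the function names are assumptions for illustration; in a real assembly the true genome is not known, so this check only demonstrates the phenomenon.

```python
def kmers(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def spurious_edges(genome, read, k):
    """Read k-mers that link two existing nodes but never occur in genome."""
    true_kmers = set(kmers(genome, k))
    # Nodes of the error-free graph: every (k-1)-mer prefix and suffix.
    nodes = {km[:-1] for km in true_kmers} | {km[1:] for km in true_kmers}
    return [
        km for km in kmers(read, k)
        if km not in true_kmers and km[:-1] in nodes and km[1:] in nodes
    ]

genome = "GACGTACGTT"  # assumed reference sequence
bad_read = "ACTTAC"    # assumed read: ACGTAC with G -> T at index 2

print(spurious_edges(genome, bad_read, 3))
```

Here the erroneous k-mer TTA adds an edge from TT to TA. Both nodes exist in the error-free graph, but the genome never extends TT with A.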