Repetitive DNA and next-generation sequencing: computational challenges and solutions
Todd J. Treangen, Steven L. Salzberg Nature Reviews Genetics 13, 36-46 (January 2012) doi:10.1038/nrg3117 Speaker: 黃建龍, 黃元鴻 Date: 2012.06.04
Box 1 | Repetitive DNA in the human genome
Figure 1 | Ambiguities in read mapping.
1. Ignore multi-reads
2. The best match approach (if several alignments are equally good, choose one at random or report all of them)
3. Report all alignments up to a maximum number, d (multi-reads that align to > d locations are discarded)
Figure 2 | Three strategies for mapping multi-reads.
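The three strategies above can be sketched in a few lines of Python. This is only an illustrative sketch, not the behaviour of any particular aligner: the function name `map_multiread`, the `(position, score)` tuple layout, and the `report_ties` flag are assumptions made for this example, and a higher score is assumed to mean a better alignment.

```python
import random

def map_multiread(alignments, strategy, d=10, report_ties=False, rng=random):
    """Pick which alignments to report for one read.

    alignments: list of (position, score) tuples (hypothetical layout);
                a higher score is assumed to be a better alignment.
    strategy:   "ignore", "best", or "all".
    d:          maximum number of locations allowed by the "all" strategy.
    """
    # A read with zero or one alignment is not a multi-read: report as-is.
    if len(alignments) <= 1:
        return list(alignments)

    if strategy == "ignore":
        # Strategy 1: discard every multi-read.
        return []

    if strategy == "best":
        # Strategy 2: keep the best-scoring alignment; on a tie,
        # either report all tied alignments or pick one at random.
        top = max(score for _, score in alignments)
        tied = [a for a in alignments if a[1] == top]
        return tied if report_ties else [rng.choice(tied)]

    if strategy == "all":
        # Strategy 3: report everything, unless the read aligns
        # to more than d locations, in which case discard it.
        return list(alignments) if len(alignments) <= d else []

    raise ValueError(f"unknown strategy: {strategy!r}")


hits = [(100, 50), (2000, 50), (5000, 42)]
print(map_multiread(hits, "ignore"))
print(map_multiread(hits, "best", report_ties=True))
print(map_multiread(hits, "all", d=2))
```

With the example `hits`, the read has three candidate locations, so it is dropped under "ignore", reduced to the two tied best locations under "best" with `report_ties=True`, and dropped again under "all" when d = 2.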
         NGS          Sanger
Length   50~150 bp    800~900 bp
Depth    High         Lower
Hard!
http://www.data2bio.com/images/assembly_bg.png
Repeats vs. reads: human repeats 250~500 bp; NGS reads 50~150 bp
1. False joins
2. Accurate but fragmented assembly (short contigs)
Figure 3 | Assembly errors caused by repeats (B, C)
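A toy example may help show why a repeat forces the assembler into one of these two outcomes. The sketch below builds a small de Bruijn-style graph from reads; this graph model is brought in here purely for illustration and is not taken from the slides, and the helper names `debruijn_edges` and `branch_nodes`, the toy genome, and the values of k and read length are all assumptions. A node with more than one successor is a point where the assembler must either guess (risking a false join) or stop the contig (giving a fragmented assembly).

```python
from collections import defaultdict

def debruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def branch_nodes(graph):
    """Nodes with more than one successor: ambiguous extension points."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)

def sliding_reads(genome, read_len):
    """Error-free reads starting at every position (idealised coverage)."""
    return [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]


# Toy genome: the repeat CATG occurs twice with different flanking sequence.
repeat_genome = "AA" + "CATG" + "TT" + "CATG" + "GG"
unique_genome = "AACATGTTGG"  # same start, but CATG occurs only once

for name, genome in [("with repeat", repeat_genome), ("no repeat", unique_genome)]:
    graph = debruijn_edges(sliding_reads(genome, read_len=6), k=4)
    print(name, branch_nodes(graph))
```

In the genome with the repeat, the node ATG can be followed by either TGT or TGG, so the path out of the repeat is ambiguous; in the genome without the repeat, every node has a single successor and the sequence can be walked end to end.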
separate locations on the genome.
read so that only 5 bp of that read span the splice site, then there may be many equally good locations to align the short 5 bp fragment.
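The scale of the short-fragment problem can be estimated with simple arithmetic. The sketch below assumes a uniform random genome (each base equally likely and independent), which real genomes are not, so the numbers are rough orders of magnitude only; the function names `expected_random_hits` and `count_hits` are hypothetical helpers for this example.

```python
def expected_random_hits(genome_length, fragment_length):
    """Expected exact matches of one fragment on one strand,
    assuming a uniform random genome (a simplifying assumption)."""
    positions = max(genome_length - fragment_length + 1, 0)
    return positions / 4 ** fragment_length

def count_hits(genome, fragment):
    """Count exact occurrences of fragment in genome, overlaps included."""
    count = 0
    start = genome.find(fragment)
    while start != -1:
        count += 1
        start = genome.find(fragment, start + 1)
    return count


# A 5 bp fragment versus a 25 bp fragment in a genome of about 3 billion bp.
for length in (5, 25):
    print(length, expected_random_hits(3_000_000_000, length))

# Counting on a tiny concrete sequence.
print(count_hits("ACGTACGTAC", "ACGTA"))
```

Under this simplified model a 5 bp fragment is expected to match millions of positions in a genome of roughly 3 billion bp, while a 25 bp fragment is expected to match essentially none by chance, which is why a 5 bp overhang across a splice site cannot be placed on its own.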
http://en.wikipedia.org/wiki/File:RNA-Seq-alignment.png
Figure labels: Gene A, Gene B, Paralogue A/B; biased downwards, biased upwards