Discovery of Genomic Structural Variations with Next-Generation Sequencing Data
Marcel H. Schulz Advanced Topics in Computational Genomics Oct 2011
with slides from Tobias Rausch (EMBL) and Kai Ye (Leiden University)
Genomic Rearrangements/
courtesy of Tobias Rausch (EMBL)
– Deletion – Duplication
– Disrupting genes – Creating fusion genes – Copy number changes of dosage-sensitive genes
Perry et al. (2007)
[Figure: reads aligned against the Reference]
– Unmapped or single-anchored reads
– Split-read alignments
– Read depth signals
– Mate-pair or paired-end mapping abnormalities
– Local assembly
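To make the paired-end signal listed above concrete, here is a minimal sketch that labels one mapped read pair by comparing its span on the reference with the expected insert size. The function name, the three-standard-deviation cutoff, and the strand encoding are illustrative assumptions and are not taken from the slides or from any particular tool.

```python
def classify_pair(left_pos, right_end, left_strand, right_strand,
                  mean_insert, sd_insert, n_sd=3.0):
    """Label a read pair whose mates map to the same chromosome.

    left_pos / right_end: leftmost and rightmost mapped coordinates of the pair.
    Strands are '+' or '-'. Assumption: a concordant pair is '+' (left mate)
    followed by '-' (right mate), as in standard paired-end libraries.
    """
    if left_strand == right_strand:
        return "inversion-like"            # both mates on the same strand
    if left_strand == "-" and right_strand == "+":
        return "tandem-duplication-like"   # mates point away from each other
    span = right_end - left_pos
    if span > mean_insert + n_sd * sd_insert:
        return "deletion-like"             # mates too far apart on the reference
    if span < mean_insert - n_sd * sd_insert:
        return "insertion-like"            # mates too close on the reference
    return "concordant"
```

A single abnormal pair is weak evidence; in practice several pairs supporting the same event would be clustered before a call is made.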
Insertions Deletions
Korbel et al. (2007); Lee et al. (2009)
Chiang et al. (2009)
Xie et al. (2009)
Chiang et al. (2009): human cancer cell lines compared to normal cell lines (SegSeq algorithm, no fixed window size, multiple change points method)
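The read-depth comparison can be illustrated with a short sketch. It computes a log2 tumor/normal ratio in windows that each span a fixed number of normal reads, which is one possible reading of "no fixed window size". It does not implement the change-point statistics of SegSeq; the function name, the window definition, and the 0.5 pseudocount are assumptions made for illustration.

```python
import math
from bisect import bisect_left, bisect_right


def log2_ratios(tumor_pos, normal_pos, reads_per_window=4):
    """Return a list of (start, end, log2 ratio), one entry per window.

    Each window covers `reads_per_window` consecutive normal reads, so its
    genomic width adapts to local coverage (assumed window definition).
    """
    if not tumor_pos or not normal_pos:
        raise ValueError("both read position lists must be non-empty")
    tumor = sorted(tumor_pos)
    normal = sorted(normal_pos)
    scale = len(normal) / len(tumor)          # library-size normalization
    out = []
    for i in range(0, len(normal) - reads_per_window + 1, reads_per_window):
        start = normal[i]
        end = normal[i + reads_per_window - 1]
        # tumor reads whose start position falls inside [start, end]
        n_tumor = bisect_right(tumor, end) - bisect_left(tumor, start)
        ratio = (n_tumor * scale + 0.5) / (reads_per_window + 0.5)
        out.append((start, end, math.log2(ratio)))
    return out
```

Runs of adjacent windows with consistently positive or negative ratios suggest copy number gains or losses; deciding where such runs start and end is the change-point problem that the sketch leaves out.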
Donor Reference Medvedev et al. (2009)
Ye et al. (2009) How to do that?
Ye et al. (2009)
① Use 3′ end of left read as anchor point
② Use pattern growth to search for minimum and maximum unique substrings from the 3′ end of the unmapped read (<= 2x insert size)
String matching by pattern growth
[Figure: minimum unique substring and maximum unique substring found by pattern growth]
courtesy of Kai Ye (Leiden U.)
Ye et al. (2009), deletions
① Use 3′ end of left read as anchor point
② Use pattern growth to search for minimum and maximum unique substrings from the 3′ end of the unmapped read (<= 2x insert size)
③ Use pattern growth to search for minimum and maximum unique substrings from the 5′ end of the unmapped read (read length + Max_D) starting from mapped end in step 2
④ Check whether the complete unmapped read can be combined from the 3′ and 5′ end substring matches
Ye et al. (2009), insertions
① Use 3′ end of left read as anchor point
② Use pattern growth to search for minimum and maximum unique substrings from the 3′ end of the unmapped read (<= 2x insert size)
③ Use pattern growth to search for minimum and maximum unique substrings from the 5′ end of the unmapped read (read length - 1) starting from mapped end in step 2
④ Check whether the complete unmapped read can be combined from the 3′ and 5′ end substring matches
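The four steps can be sketched for the deletion case (search range read length + Max_D). This is a simplified illustration and not Pindel's implementation: it uses plain substring search instead of an index, allows no mismatches, and assumes the unmapped read has already been oriented along the reference, so the end located first is its left end. All function and parameter names are hypothetical.

```python
def find_all(text, pattern, lo, hi):
    """Start positions of `pattern` lying entirely inside text[lo:hi]."""
    hits = []
    i = text.find(pattern, lo, hi)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1, hi)
    return hits


def grow_unique(ref, read, lo, hi, from_left):
    """Grow a fragment from one end of `read`, one base at a time.

    Returns (min_len, max_len, start): the shortest fragment with exactly one
    hit in ref[lo:hi], the longest fragment that still has one, and the start
    of that longest hit. Returns None if no fragment is ever unique.
    """
    min_len = max_len = start = None
    for k in range(1, len(read)):
        frag = read[:k] if from_left else read[-k:]
        hits = find_all(ref, frag, lo, hi)
        if not hits:
            break
        if len(hits) == 1:
            if min_len is None:
                min_len = k
            max_len, start = k, hits[0]
    return None if min_len is None else (min_len, max_len, start)


def detect_deletion(ref, read, anchor, insert_size, max_d):
    """Return (deletion_start, deletion_length), 0-based, or None."""
    # Step 2: place one end of the read within 2x insert size of the anchor.
    near = grow_unique(ref, read, anchor, anchor + 2 * insert_size, True)
    if near is None:
        return None
    near_min, near_max, near_pos = near
    # Step 3: place the other end within read length + max_d of that match.
    far = grow_unique(ref, read, near_pos, near_pos + len(read) + max_d, False)
    if far is None:
        return None
    far_min, far_max, far_pos = far
    far_end = far_pos + far_max          # end coordinate is the same for all lengths
    # Step 4: do a left and a right fragment together cover the whole read?
    for k in range(near_min, near_max + 1):
        s = len(read) - k
        if far_min <= s <= far_max:
            del_start = near_pos + k
            del_len = (far_end - s) - del_start
            if del_len > 0:
                return del_start, del_len
    return None
```

When the bases next to the breakpoint are identical on both sides, several split points explain the read equally well; the sketch reports the leftmost one, and the deletion length is the same for all of them.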
Ye et al. (2009)
Ye et al., Pindel manual
a) large deletion
b) tandem duplication
c) inversion
d-f) same as a-c with non-template sequence (yellow part)
Emde et al. (submitted)
① SplazerS detects any possible prefix-suffix decomposition of the unmapped read in the search region
② Allows an arbitrary number of mismatches and even small indels in the unmapped read
③ Delays the decision to the indel calling step
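The idea of a prefix-suffix decomposition that tolerates mismatches can be sketched as follows. The sketch is not SplazerS itself: it assumes the reference position of the read prefix (`p`) and the reference end of the read suffix (`q`) are already known, counts mismatches only (no small indels), and simply tries every split point. All names are hypothetical.

```python
def best_split(read, ref, p, q):
    """Try every prefix-suffix decomposition of `read`.

    The prefix read[:i] is compared with ref[p:p+i]; the suffix read[i:] is
    compared with the reference segment that ends at q. Returns
    (i, mismatches, gap) for the split with the fewest mismatches, where
    gap = q - p - len(read) is the implied deletion length.
    """
    n = len(read)
    if n < 2 or p < 0 or q > len(ref) or q - p < n:
        raise ValueError("prefix and suffix placements are inconsistent")
    # pre[i]: mismatches of read[:i] against ref[p:p+i]
    pre = [0]
    for j in range(n):
        pre.append(pre[-1] + (read[j] != ref[p + j]))
    # suf[i]: mismatches of read[i:] against ref[q-(n-i):q]
    suf = [0] * (n + 1)
    for j in range(n - 1, -1, -1):
        suf[j] = suf[j + 1] + (read[j] != ref[q - n + j])
    # leftmost split point with the smallest total mismatch count
    i = min(range(1, n), key=lambda k: pre[k] + suf[k])
    return i, pre[i] + suf[i], q - p - n
```

For example, with `ref = "ACGTTGCAGGGGGGTCAGCTAG"` and `read = "GTTGCATCAGCT"`, calling `best_split(read, ref, 2, 20)` splits the read after 6 bases with 0 mismatches and an implied 6 bp deletion. Returning the mismatch count instead of rejecting imperfect splits leaves the final decision to a later calling step.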
Kai Ye, Marcel H. Schulz, Quan Long, Rolf Apweiler, and Zemin Ning. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics (2009) 25(21): 2865-2871
Pindel homepage: https://trac.nbic.nl/pindel/
SplazerS homepage: http://www.seqan.de/projects/splazers.html