
slide-1
SLIDE 1

Discovery of Genomic Structural Variations with Next-Generation Sequencing Data

Advanced Topics in Computational Genomics

Slides from Marcel H. Schulz, Tobias Rausch (EMBL), and Kai Ye (Leiden University)

slide-2
SLIDE 2

Computational Methods

slide-3
SLIDE 3

Detecting Genomic Rearrangements

Signals used, relative to the reference:

  • Split-read alignments
  • Read depth signals
  • Mate-pair or paired-end mapping abnormalities

courtesy of Tobias Rausch (EMBL)

slide-4
SLIDE 4

Detecting Genomic Rearrangements

Signals used, relative to the reference:

  • Unmapped or single-anchored reads
  • Split-read alignments
  • Read depth signals
  • Mate-pair or paired-end mapping abnormalities
  • Local assembly

courtesy of Tobias Rausch (EMBL)

slide-5
SLIDE 5

courtesy of Tobias Rausch (EMBL)

slide-6
SLIDE 6

courtesy of Tobias Rausch (EMBL)

slide-7
SLIDE 7

Insertions   Deletions

courtesy of Tobias Rausch (EMBL)

slide-8
SLIDE 8

Korbel et al. (2007); Lee et al. (2009)

courtesy of Tobias Rausch (EMBL)

slide-9
SLIDE 9

courtesy of Tobias Rausch (EMBL)

slide-10
SLIDE 10

courtesy of Tobias Rausch (EMBL)

slide-11
SLIDE 11

courtesy of Tobias Rausch (EMBL)

slide-12
SLIDE 12

courtesy of Tobias Rausch (EMBL)

slide-13
SLIDE 13

[Figure: segments labeled with copy numbers 1, 1, 0, 2, 2]

Chiang et al. (2009)

courtesy of Tobias Rausch (EMBL)

slide-14
SLIDE 14
  • Down syndrome

– Partial trisomy 21

Xie et al. (2009)

courtesy of Tobias Rausch (EMBL)

slide-15
SLIDE 15

Chiang et al. (2009): human cancer cell lines compared to normal cell lines (SegSeq algorithm: no fixed window size, multiple change-point method)

slide-16
SLIDE 16

With reads of length 40-100 bp, can we find the exact breakpoint of a structural variation?

slide-17
SLIDE 17

With reads of length 40-100 bp, can we find the exact breakpoint of a structural variation?

Yes, using split-read mapping.

Example for a read of length 40: how many random matches are expected for a 12 bp read prefix in the human genome?

Donor Reference

slide-18
SLIDE 18

With reads of length 40-100 bp, can we find the exact breakpoint of a structural variation?

Yes, using split-read mapping.

Example for a read of length 40: how many random matches are expected for a 12 bp read prefix in the human genome?

Donor Reference

Expected random matches: $\frac{3 \cdot 10^{9}}{4^{12}} \approx 179$
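
The ≈179 figure on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes a genome of roughly 3·10^9 bp and uniformly random, independent bases (real genomes are repetitive, so the true number of chance hits is usually higher). The function name `expected_random_matches` is illustrative and not taken from the slides.

```python
def expected_random_matches(genome_size: float, k: int) -> float:
    """Expected chance occurrences of one fixed k-mer in a random sequence.

    Assumes the 4 bases are equally likely and positions are independent.
    """
    return genome_size / 4 ** k


# Assumed human genome size of roughly 3 * 10^9 bp; 12 bp read prefix.
matches = expected_random_matches(3e9, 12)
print(round(matches))
```

Under the same assumptions, every additional base in the prefix divides the expected number of chance hits by four, which is why longer uniquely matching substrings are needed before a split-read placement can be trusted.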

slide-19
SLIDE 19

With reads of length 40-100 bp, can we find the exact breakpoint of a structural variation?

Yes, using anchored split-read mapping: the mappable mate of the read provides an anchor that narrows down the search space.

Donor Reference

Medvedev et al. (2009)
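
To illustrate how an anchor narrows the search space, the sketch below looks for a short seed first across a whole toy reference and then only inside a window downstream of the anchor. The window of 2× insert size follows step ② of the Pindel slides later in this deck. The toy sequence, the values of `anchor_end` and `insert_size`, and the function name `find_in_window` are assumptions made for illustration only.

```python
def find_in_window(reference: str, seed: str, start: int, end: int) -> list[int]:
    """Return all start positions of seed lying fully inside reference[start:end]."""
    hits = []
    i = reference.find(seed, start, end)
    while i != -1:
        hits.append(i)
        i = reference.find(seed, i + 1, end)  # allow overlapping hits
    return hits


reference = "ACGTACGTTTGACGTAAC"  # toy reference, not real data
seed = "ACGT"                     # start of the unmapped read

# Without an anchor the seed is ambiguous: it occurs three times.
genome_wide = find_in_window(reference, seed, 0, len(reference))

# With an anchor, only a window of 2x insert size after the mapped mate is searched.
anchor_end, insert_size = 8, 5    # hypothetical values
window_end = min(len(reference), anchor_end + 2 * insert_size)
anchored = find_in_window(reference, seed, anchor_end, window_end)

print(genome_wide, anchored)
```

In this toy case the seed is ambiguous across the whole reference but has a single placement inside the anchored window.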

slide-20
SLIDE 20

The Pindel algorithm (Deletions)

Ye et al. (2009)

How is this done?

slide-21
SLIDE 21

The Pindel algorithm (Deletions)

Ye et al. (2009)

① Use the 3′ end of the left read as the anchor point
② Use pattern growth to search for minimum and maximum unique substrings from the 3′ end of the unmapped read (within ≤ 2× insert size)

slide-22
SLIDE 22

String matching by pattern growth

ATGCA ATCAAGTATGCTTAGC

courtesy of Kai Ye (Leiden U.)

slide-23
SLIDE 23

String matching by pattern growth

ATGCA ATCAAGTATGCTTAGC

courtesy of Kai Ye (Leiden U.)

slide-24
SLIDE 24

String matching by pattern growth

ATGCA ATCAAGTATGCTTAGC

courtesy of Kai Ye (Leiden U.)

slide-25
SLIDE 25

String matching by pattern growth

ATGCA ATCAAGTATGCTTAGC

courtesy of Kai Ye (Leiden U.)

slide-26
SLIDE 26

String matching by pattern growth

ATGCA ATCAAGTATGCTTAGC

Minimum unique substring: ATG
Maximum unique substring: ATGC

courtesy of Kai Ye (Leiden U.)
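
The pattern-growth idea on these slides can be sketched as follows, using the pattern ATGCA and the text ATCAAGTATGCTTAGC from the slide. The sketch keeps a list of candidate start positions and filters it as the pattern grows by one base at a time. It is a simplified illustration with exact matching on one strand, not Pindel's actual implementation, and the function name `pattern_growth` is an assumption.

```python
def pattern_growth(pattern: str, text: str):
    """Grow the pattern from its first base and track where it still matches.

    Returns (minimum_unique, maximum_unique): the shortest and longest
    prefixes of the pattern that occur exactly once in the text, or
    (None, None) if no prefix is ever unique.
    """
    candidates = list(range(len(text)))  # every position matches the empty prefix
    min_unique = max_unique = None
    for k, base in enumerate(pattern):
        # Keep only the candidates whose next base also matches.
        candidates = [p for p in candidates
                      if p + k < len(text) and text[p + k] == base]
        if not candidates:
            break  # the pattern cannot be extended any further
        if len(candidates) == 1:
            prefix = pattern[:k + 1]
            if min_unique is None:
                min_unique = prefix
            max_unique = prefix
    return min_unique, max_unique


print(pattern_growth("ATGCA", "ATCAAGTATGCTTAGC"))  # ('ATG', 'ATGC')
```

Here "A" and "AT" match at several positions, "ATG" is the first prefix with a single match, and growth stops after "ATGC" because "ATGCA" does not occur in the text.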

slide-27
SLIDE 27

The Pindel algorithm (Deletions)

Ye et al. (2009)

① Use the 3′ end of the left read as the anchor point
② Use pattern growth to search for minimum and maximum unique substrings from the 3′ end of the unmapped read (within ≤ 2× insert size)
③ Use pattern growth to search for minimum and maximum unique substrings from the 5′ end of the unmapped read (within read length + Max_D), starting from the end mapped in step ②

slide-28
SLIDE 28

The Pindel algorithm (Deletions)

Ye et al. (2009)

① Use the 3′ end of the left read as the anchor point
② Use pattern growth to search for minimum and maximum unique substrings from the 3′ end of the unmapped read (within ≤ 2× insert size)
③ Use pattern growth to search for minimum and maximum unique substrings from the 5′ end of the unmapped read (within read length + Max_D), starting from the end mapped in step ②
④ Check whether the complete unmapped read can be reconstructed from the 3′ and 5′ end substring matches
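
The four steps can be sketched on a toy example. This is a simplified illustration under stated assumptions, not the Pindel implementation: the unmapped read is given in reference orientation (so its left part plays the role of the 3′ end nearest the anchor), only exact matches on one strand are considered, and the names `find_all`, `unique_matches`, and `detect_deletion` as well as the toy reference are hypothetical. The parameter `max_d` stands for Max_D from the slide.

```python
def find_all(text, pattern, start, end):
    """Start positions of pattern lying fully inside text[start:end]."""
    hits = []
    i = text.find(pattern, start, end)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1, end)
    return hits


def unique_matches(read, ref, start, end, from_left):
    """Grow a prefix (from_left=True) or a suffix of the read base by base.

    Returns {length: position} for each length whose substring occurs
    exactly once in ref[start:end]; growth stops at the first miss.
    """
    unique = {}
    for k in range(1, len(read)):
        piece = read[:k] if from_left else read[-k:]
        hits = find_all(ref, piece, start, end)
        if not hits:
            break
        if len(hits) == 1:
            unique[k] = hits[0]
    return unique


def detect_deletion(read, ref, anchor, insert_size, max_d):
    """Return (deletion_start, deletion_length), or None if nothing is found."""
    # Step 2: place the read end nearest the anchor within 2x insert size.
    near_end = min(len(ref), anchor + 2 * insert_size)
    near = unique_matches(read, ref, anchor, near_end, from_left=True)
    if not near:
        return None
    near_pos = next(iter(near.values()))  # all unique prefixes share one start
    # Step 3: place the other end within read length + max_d of the mapped end.
    far_end = min(len(ref), near_pos + len(read) + max_d)
    far = unique_matches(read, ref, near_pos, far_end, from_left=False)
    # Step 4: both pieces must cover the read and leave a gap in the reference.
    for k, pos in near.items():
        far_pos = far.get(len(read) - k)
        if far_pos is not None and far_pos > pos + k:
            return pos + k, far_pos - (pos + k)
    return None


ref = "GATTACAGGCCTTAAGCGTACCATGGTCTAGACTGCAA"  # toy reference
read = ref[10:16] + ref[24:30]  # donor read spanning a deletion of ref[16:24]
print(detect_deletion(read, ref, anchor=0, insert_size=30, max_d=20))  # (16, 8)
```

When the sequence around a breakpoint is repeated, several splits of the read can satisfy step ④; this sketch simply reports the first one it finds.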

slide-29
SLIDE 29

The Pindel algorithm (Insertions)

Ye et al. (2009)

① Use the 3′ end of the left read as the anchor point
② Use pattern growth to search for minimum and maximum unique substrings from the 3′ end of the unmapped read (within ≤ 2× insert size)
③ Use pattern growth to search for minimum and maximum unique substrings from the 5′ end of the unmapped read (within read length - 1), starting from the end mapped in step ②
④ Check whether the complete unmapped read can be reconstructed from the 3′ and 5′ end substring matches

slide-30
SLIDE 30

The Pindel algorithm (Insertions)

Ye et al. (2009)

① Use the 3′ end of the left read as the anchor point
② Use pattern growth to search for minimum and maximum unique substrings from the 3′ end of the unmapped read (within ≤ 2× insert size)
③ Use pattern growth to search for minimum and maximum unique substrings from the 5′ end of the unmapped read (within read length - 1), starting from the end mapped in step ②
④ Check whether the complete unmapped read can be reconstructed from the 3′ and 5′ end substring matches

  • In the initial Pindel version, exact matches to the reference were required
slide-31
SLIDE 31

The Pindel algorithm (Real Data)

Ye et al. (2009)

slide-32
SLIDE 32

The Pindel algorithm (Real Data)

Ye et al. (2009)

slide-33
SLIDE 33

The Pindel algorithm for complex variants

Ye et al., Pindel manual

a) large deletion
b) tandem duplication
c) inversion
d-f) same as a-c, with non-template sequence (yellow part)

slide-34
SLIDE 34

Acknowledgements

  • Tobias Rausch (EMBL)
  • Kai Ye (Leiden University Medical Center)
  • Anne-Katrin Emde (Freie Universität Berlin)

References

Kai Ye, Marcel H. Schulz, Quan Long, Rolf Apweiler, and Zemin Ning. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics (2009) 25(21): 2865-2871

Pindel homepage: https://trac.nbic.nl/pindel/

SplazerS homepage: http://www.seqan.de/projects/splazers.html