SLIDE 1

Discovery of Genomic Structural Variations with Next-Generation Sequencing Data

Marcel H. Schulz
Advanced Topics in Computational Genomics, Oct 2011

with slides from Tobias Rausch (EMBL) and Kai Ye (Leiden University)

SLIDE 2

Genomic Rearrangements/ Structural Variations (SVs)

1 Kb to several Mb in size

courtesy of Tobias Rausch (EMBL)

SLIDE 3

1 Kb to several Mb in size
Copy number variants

(CNVs)

– Deletion – Duplication

Genomic Rearrangements/ Structural Variations (SVs)

courtesy of Tobias Rausch (EMBL)

SLIDE 4

Genomic Rearrangements/ Structural Variations (SVs)

1 Kb to several Mb in size
Copy number variants

(CNVs)

– Deletion – Duplication

Insertion

courtesy of Tobias Rausch (EMBL)

SLIDE 5

Genomic Rearrangements/ Structural Variations (SVs)

1 Kb to several Mb in size
Copy number variants

(CNVs)

– Deletion – Duplication

Insertion, Inversion

courtesy of Tobias Rausch (EMBL)

SLIDE 6

1 Kb to several Mb in size
Copy number variants

(CNVs)

– Deletion – Duplication

Insertion, Inversion, Translocation

Genomic Rearrangements/ Structural Variations

courtesy of Tobias Rausch (EMBL)

SLIDE 7

1 Kb to several Mb in size
Copy number variants

– Deletion – Duplication

Insertion, Inversion, Translocation
More abundant than SNPs

Genomic Rearrangements/ Structural Variations

…ACGATACG… …ACGAGACG…

courtesy of Tobias Rausch (EMBL)

SLIDE 8

1 Kb to several Mb in size
Copy number variants

– Deletion – Duplication

Insertion, Inversion, Translocation
More abundant than SNPs
Either neutral or non-neutral in function
Non-neutral mechanisms

– Disrupting genes
– Creating fusion genes
– Copy number changes of dosage-sensitive genes

Genomic Rearrangements/ Structural Variations

courtesy of Tobias Rausch (EMBL)

SLIDE 9

Why Structural Variation Discovery

Find disease-causing genes
Trace evolutionary genome history
Analyze the mechanisms of SV occurrence
Understand the spread of repetitive elements

(LINEs, Alus, etc.)

SLIDE 10

Technologies to Discover Structural Variations

SLIDE 11

Technologies

Fluorescent in situ hybridization (FISH)

– Fluorescent probes (≈100 kb) detect and localize the presence or absence of specific DNA sequences

Perry et al. (2007)

courtesy of Tobias Rausch (EMBL)

SLIDE 12

Technologies

Fluorescent in situ hybridization (FISH)
Comparative Genomic Hybridization (CGH)

– Test vs. reference sample – 2.1 million probes – Different types

Whole-Genome Tiling Arrays
Whole-Genome Exon-Focused Arrays
CNV Arrays

courtesy of Tobias Rausch (EMBL)

SLIDE 13

Technologies

Fluorescent in situ hybridization (FISH)
Comparative Genomic Hybridization (CGH)
Genome-Wide Human SNP Array 6.0

– 1.8 million genetic markers

906,600 SNPs
946,000 probes for CNVs

courtesy of Tobias Rausch (EMBL)

SLIDE 14

Technologies

Fluorescent in situ hybridization (FISH)
Comparative Genomic Hybridization (CGH)
Genome-Wide Human SNP Array 6.0
Human 1M-Duo DNA Analysis BeadChip

– 1.2 million genetic markers

Markers for SNPs and CNV regions

– Targeted studies

60,800 additional custom SNPs
60,000 custom CNV-targets

courtesy of Tobias Rausch (EMBL)

SLIDE 15

Technologies

Fluorescent in situ hybridization (FISH)
Comparative Genomic Hybridization (CGH)
Genome-Wide Human SNP Array 6.0
Human 1M-Duo DNA Analysis BeadChip
Next-Generation Sequencing (NGS)

– Whole-genome sequencing – Targeted, e.g. RNA-Seq

courtesy of Tobias Rausch (EMBL)

SLIDE 16

Focus on NGS

Limitations of Arrays

– Lower resolution for genomic rearrangements
– Balanced events (e.g., inversions) cannot be detected using signal intensity differences
– No breakpoint information

courtesy of Tobias Rausch (EMBL)

SLIDE 17

Paired-end data

Two protocols for paired-end data

– mate-pair sequencing by circularization (traditional Sanger sequencing)
– paired-end NGS

Overview protocol

SLIDE 18

Paired-end data

– paired-end NGS (insert distribution known due to fragment size selection)
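
Because the insert-size distribution is known, a pair whose mapped span deviates strongly from it can hint at a rearrangement. The following is a minimal sketch of that idea, not a published caller: the function name `classify_pairs`, the parameters `library_mean` and `library_sd`, and the cutoff of 3 standard deviations are all assumptions made for illustration.

```python
def classify_pairs(spans, library_mean, library_sd, n_sd=3.0):
    """Label each mapped pair span as concordant or as hinting at an SV.

    spans: distances between the outer ends of the two mapped mates (bp).
    library_mean, library_sd: assumed insert-size mean and standard deviation.
    n_sd: assumed cutoff in standard deviations (hypothetical default).
    """
    upper = library_mean + n_sd * library_sd
    lower = library_mean - n_sd * library_sd
    labels = []
    for span in spans:
        if span > upper:
            # Mates map farther apart than expected: donor lacks sequence
            labels.append("deletion")
        elif span < lower:
            # Mates map closer together than expected: donor has extra sequence
            labels.append("insertion")
        else:
            labels.append("concordant")
    return labels


# Toy library with an assumed mean insert size of 500 bp and sd of 25 bp
spans = [480, 510, 495, 1900, 120, 505]
labels = classify_pairs(spans, library_mean=500, library_sd=25)
print(labels)
```

In practice a single deviating pair is weak evidence; callers usually require several pairs supporting the same event.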

SLIDE 19

Computational Methods

SLIDE 20

Experiment

SLIDE 21

Reference
Split-read alignments
Read depth signals
Mate-pair or paired-end mapping abnormalities

Detecting Genomic Rearrangements

courtesy of Tobias Rausch (EMBL)

SLIDE 22

Detecting Genomic Rearrangements

Unmapped or single-anchored reads
Reference
Split-read alignments
Read depth signals
Mate-pair or paired-end mapping abnormalities
Local assembly

courtesy of Tobias Rausch (EMBL)

SLIDE 23

courtesy of Tobias Rausch (EMBL)

SLIDE 24

courtesy of Tobias Rausch (EMBL)

SLIDE 25

Insertions   Deletions

courtesy of Tobias Rausch (EMBL)

SLIDE 26

courtesy of Tobias Rausch (EMBL)

SLIDE 27

Korbel et al. (2007); Lee et al. (2009)

courtesy of Tobias Rausch (EMBL)

SLIDE 28

courtesy of Tobias Rausch (EMBL)

SLIDE 29

courtesy of Tobias Rausch (EMBL)

SLIDE 30

courtesy of Tobias Rausch (EMBL)

SLIDE 31

courtesy of Tobias Rausch (EMBL)

SLIDE 32

courtesy of Tobias Rausch (EMBL)

SLIDE 33

1 Copy 1 Copy 0 Copy 2 Copy 2 Copy

Chiang et al. (2009)

courtesy of Tobias Rausch (EMBL)
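
The copy-number labels on this slide can be related to read depth with a small calculation. This is a simplified sketch with fixed windows, assuming the sample and the reference sample were sequenced to the same total depth and that the reference sample carries two copies everywhere; the function names and the toy counts are hypothetical.

```python
def window_counts(read_starts, region_length, window):
    """Count read start positions per fixed-size window."""
    counts = [0] * ((region_length + window - 1) // window)
    for pos in read_starts:
        counts[pos // window] += 1
    return counts


def copy_number_estimates(sample_counts, reference_counts, reference_copies=2):
    """Estimate copies per window from the sample/reference depth ratio.

    Assumes equal total sequencing depth in both samples (no normalization).
    Windows without reference reads are reported as None.
    """
    estimates = []
    for sample, reference in zip(sample_counts, reference_counts):
        if reference == 0:
            estimates.append(None)
        else:
            estimates.append(round(reference_copies * sample / reference))
    return estimates


# Toy per-window counts for five windows
sample = [50, 48, 0, 101, 99]
reference = [100, 100, 100, 100, 100]
print(copy_number_estimates(sample, reference))  # [1, 1, 0, 2, 2]
```

Real data would additionally need normalization for total depth and for biases such as GC content and mappability.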

SLIDE 34

Down syndrome

– Partial trisomy 21

Xie et al. (2009)

courtesy of Tobias Rausch (EMBL)

SLIDE 35

Chiang et al. (2009)

Human cancer cell lines compared to normal cell lines (SegSeq algorithm: no fixed window size, multiple change-point method)

SLIDE 36

With reads of 40-100 bp, can we find the exact breakpoint of a structural variation?

SLIDE 37

With reads of 40-100 bp, can we find the exact breakpoint of a structural variation?

Yes, using split-read mapping

Example for a read of length 40: how many random matches do we expect for a 12 bp read prefix in the human genome?

Donor Reference

SLIDE 38

With reads of 40-100 bp, can we find the exact breakpoint of a structural variation?

Yes, using split-read mapping

Example for a read of length 40: how many random matches do we expect for a 12 bp read prefix in the human genome?

Donor Reference

(3 ⋅ 10^9) / 4^12 ≈ 179
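
The estimate can be reproduced in a few lines. This sketch assumes a genome length of about 3 ⋅ 10^9 bp and uniformly random, independent bases, which real genomes violate (repeats make the true number larger); the function name is hypothetical.

```python
def expected_random_matches(genome_length, k, alphabet_size=4):
    """Expected number of chance occurrences of one fixed k-mer.

    Roughly genome_length start positions, each matching with
    probability alphabet_size ** -k under a uniform random model.
    """
    return genome_length / alphabet_size ** k


print(round(expected_random_matches(3e9, 12)))  # 179
```

Under the same model the expectation only drops below one match at about k = 16, which is why a short prefix alone cannot be placed without narrowing the search space.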

SLIDE 39

With reads of 40-100 bp, can we find the exact breakpoint of a structural variation?

Yes, using anchored split-read mapping: the mappable read mate provides an anchor to narrow down the search space

Donor Reference

Medvedev et al. (2009)

SLIDE 40

The Pindel algorithm (Deletions)

Ye et al. (2009)

SLIDE 41

The Pindel algorithm (Deletions)

Ye et al. (2009)

How to do that?

SLIDE 42

The Pindel algorithm (Deletions)

Ye et al. (2009)

① Use the 3′ end of the left read as anchor point
② Use pattern growth to search for minimum and maximum unique substrings from the 3′ end of the unmapped read (≤ 2× insert size)

SLIDE 43

String matching by pattern growth

ATGCA ATCAAGTATGCTTAGC

courtesy of Kai Ye (Leiden U.)

SLIDE 44

String matching by pattern growth

ATGCA ATCAAGTATGCTTAGC

courtesy of Kai Ye (Leiden U.)

SLIDE 45

String matching by pattern growth

ATGCA ATCAAGTATGCTTAGC

courtesy of Kai Ye (Leiden U.)

SLIDE 46

String matching by pattern growth

ATGCA ATCAAGTATGCTTAGC

courtesy of Kai Ye (Leiden U.)

SLIDE 47

String matching by pattern growth

ATGCA ATCAAGTATGCTTAGC

Minimum unique substring: ATG
Maximum unique substring: ATGC

courtesy of Kai Ye (Leiden U.)
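
For the example sequences on this slide (read ATGCA, reference ATCAAGTATGCTTAGC), the pattern-growth idea can be sketched as follows. This is a naive illustration that rescans the reference for every prefix length and grows from the left end of the read; it is not Pindel's implementation, and the function names are hypothetical.

```python
def count_occurrences(reference, pattern):
    """Count exact occurrences of pattern in reference, overlaps included."""
    count, start = 0, reference.find(pattern)
    while start != -1:
        count += 1
        start = reference.find(pattern, start + 1)
    return count


def min_max_unique_prefix(read, reference):
    """Grow the read prefix base by base.

    Returns (minimum, maximum) unique prefix: the shortest and the longest
    prefix that occur exactly once in the reference, or (None, None).
    """
    minimum = maximum = None
    for length in range(1, len(read) + 1):
        hits = count_occurrences(reference, read[:length])
        if hits == 0:
            break                      # growing further cannot match again
        if hits == 1:
            if minimum is None:
                minimum = read[:length]
            maximum = read[:length]
    return minimum, maximum


print(min_max_unique_prefix("ATGCA", "ATCAAGTATGCTTAGC"))  # ('ATG', 'ATGC')
```

"A" and "AT" occur several times, "ATG" and "ATGC" occur once, and "ATGCA" does not occur, so growth stops there.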

SLIDE 48

The Pindel algorithm (Deletions)

Ye et al. (2009)

① Use the 3′ end of the left read as anchor point
② Use pattern growth to search for minimum and maximum unique substrings from the 3′ end of the unmapped read (≤ 2× insert size)
③ Use pattern growth to search for minimum and maximum unique substrings from the 5′ end of the unmapped read (read length + Max_D), starting from the mapped end in step 2

SLIDE 49

The Pindel algorithm (Deletions)

Ye et al. (2009)

① Use the 3′ end of the left read as anchor point
② Use pattern growth to search for minimum and maximum unique substrings from the 3′ end of the unmapped read (≤ 2× insert size)
③ Use pattern growth to search for minimum and maximum unique substrings from the 5′ end of the unmapped read (read length + Max_D), starting from the mapped end in step 2
④ Check whether the complete unmapped read can be reconstructed from the 3′ and 5′ end substring matches
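
The four steps can be illustrated with a strongly simplified sketch. It uses exact matching on a single strand, treats `anchor` as the reference coordinate of the mapped mate, and tries every split point instead of using pattern growth; all names and the toy sequences are assumptions, not Pindel's actual code.

```python
def find_all(text, pattern):
    """Return start positions of all exact matches, overlaps included."""
    positions, start = [], text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def call_deletion(read, reference, anchor, insert_size, max_deletion):
    """Return (start, end) of a deleted reference interval (end exclusive), or None."""
    # Steps 1-2: search one read end within 2x insert size of the anchor
    window_end = min(len(reference), anchor + 2 * insert_size)
    window = reference[anchor:window_end]
    for split in range(1, len(read)):
        left, right = read[:split], read[split:]
        left_hits = find_all(window, left)
        if len(left_hits) != 1:
            continue                                  # left part not unique
        left_end = anchor + left_hits[0] + len(left)
        # Step 3: search the other end within read length + max_deletion
        search_end = min(len(reference), left_end + len(read) + max_deletion)
        right_hits = find_all(reference[left_end:search_end], right)
        # Step 4: both parts together cover the read; a gap means a deletion
        if len(right_hits) == 1 and right_hits[0] > 0:
            return left_end, left_end + right_hits[0]
    return None


# Toy reference; the donor lacks the middle block GGGGGCCCCC
reference = "ACGTTGCAAGGCTTACCGAT" + "GGGGGCCCCC" + "TCAGATCGGATTACAGCTAG"
read = "ACCGAT" + "TCAGAT"                            # spans the breakpoint
result = call_deletion(read, reference, anchor=0, insert_size=30, max_deletion=20)
print(result, reference[result[0]:result[1]])
```

A read that matches the reference contiguously yields no gap between the two parts and is therefore not reported.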

SLIDE 50

The Pindel algorithm (Insertions)

Ye et al. (2009)

① Use the 3′ end of the left read as anchor point
② Use pattern growth to search for minimum and maximum unique substrings from the 3′ end of the unmapped read (≤ 2× insert size)
③ Use pattern growth to search for minimum and maximum unique substrings from the 5′ end of the unmapped read (read length - 1), starting from the mapped end in step 2
④ Check whether the complete unmapped read can be reconstructed from the 3′ and 5′ end substring matches

SLIDE 51

The Pindel algorithm (Insertions)

Ye et al. (2009)

① Use the 3′ end of the left read as anchor point
② Use pattern growth to search for minimum and maximum unique substrings from the 3′ end of the unmapped read (≤ 2× insert size)
③ Use pattern growth to search for minimum and maximum unique substrings from the 5′ end of the unmapped read (read length - 1), starting from the mapped end in step 2
④ Check whether the complete unmapped read can be reconstructed from the 3′ and 5′ end substring matches

In the initial Pindel version, exact matches to the reference were required
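
For insertions, both read ends map next to each other in the reference and the bases in between are novel. A minimal sketch under simplifying assumptions: exact matching, one strand, both flanks at least `min_flank` bases long, and the right flank required to start directly at the junction. The names are hypothetical and this is not Pindel's code.

```python
def find_all(text, pattern):
    """Return start positions of all exact matches, overlaps included."""
    positions, start = [], text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def call_insertion(read, reference, min_flank=4):
    """Return (junction, inserted_bases) if the read ends flank novel bases, else None."""
    n = len(read)
    for left_len in range(n - min_flank, min_flank - 1, -1):   # longest left part first
        hits = find_all(reference, read[:left_len])
        if len(hits) != 1:
            continue                                           # absent or not unique
        junction = hits[0] + left_len
        # The right part must continue in the reference directly after the left part
        for right_len in range(n - left_len - 1, min_flank - 1, -1):
            if reference.startswith(read[n - right_len:], junction):
                return junction, read[left_len:n - right_len]
    return None


# Toy reference; the donor carries GGGCC inserted after position 20
reference = "ACGTTGCAAGGCTTACCGAT" + "TCAGATCGGATTACAGCTAG"
read = "ACCGAT" + "GGGCC" + "TCAGAT"
result = call_insertion(read, reference)
print(result)  # (20, 'GGGCC')
```

Only insertions shorter than the read can be found this way, since both flanks have to fit into the same read.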

SLIDE 52

The Pindel algorithm (Simulations)

Ye et al. (2009)

SLIDE 53

The Pindel algorithm (Real Data)

Ye et al. (2009)

SLIDE 54

The Pindel algorithm (Real Data)

Ye et al. (2009)

SLIDE 55

The Pindel algorithm for complex variants

Ye et al., Pindel manual

a) large deletion
b) tandem duplication
c) inversion
d-f) same as a-c with non-template sequence (yellow part)

SLIDE 56

Comparison to SplazerS

Emde et al. (submitted)

① SplazerS detects any possible prefix-suffix decomposition of the unmapped read in the search region
② Allows an arbitrary number of mismatches and even small indels in the unmapped read
③ Delays the decision to the indel-calling step

SLIDE 57

Computational Methods for De Novo Genomic Rearrangement Detection

courtesy of Tobias Rausch (EMBL)

SLIDE 58

Computational Methods for De Novo Genomic Rearrangement Detection

courtesy of Tobias Rausch (EMBL)

SLIDE 59

Computational Methods for De Novo Genomic Rearrangement Detection

courtesy of Tobias Rausch (EMBL)

SLIDE 60

Acknowledgements

Tobias Rausch (EMBL)
Kai Ye (Leiden University Medical Center)
Anne-Katrin Emde (Freie Universität Berlin)

References

Kai Ye, Marcel H. Schulz, Quan Long, Rolf Apweiler, and Zemin Ning. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics (2009) 25(21): 2865-2871

Pindel homepage: https://trac.nbic.nl/pindel/
SplazerS homepage: http://www.seqan.de/projects/splazers.html