Population-based detection of Structural Variants in normal and aberrant genomes



SLIDE 1

Population-based detection of Structural Variants in normal and aberrant genomes.

Jean Monlong, PhD2

Guillaume Bourque’s group

Research Day - June 5, 2014 Human Genetics Dept.

SLIDE 2

What is structural variation?

Genetic variation involving more than 500bp.

Baker 2012, Nature Methods. Raphael Lab, Brown University.

Structural Variant: SV; Copy Number Variation: CNV.

SLIDE 3

Why is it important?

◮ Major role in evolution.
◮ Population Genetics: widespread variation across humans.
◮ Association with diseases and cancer.

SV detection using High-Throughput Sequencing

◮ Sample is sequenced.
◮ Reads are mapped to the reference genome.
◮ Unexpected patterns could be explained by the presence of SVs.

SLIDE 4

SV detection using High-Throughput Sequencing

Baker 2012, Nature Methods.

SLIDE 5

Limitation

Low mappability

◮ Noisy or reduced signal in repeat-rich regions, centromeres, telomeres.
◮ Unpredictable segmentation → reduced sensitivity/specificity.
◮ Filtering problematic regions reduces the genome range tested.

[Figure: two panels, each plotting number of reads mapped against genomic window.]

SLIDE 6

Objective

Test the entire genome, including low-mappability regions, and detect subtle abnormal coverage.

PopSV : Population-based approach

Use a set of reference experiments to detect abnormal patterns.

[Figure: number of reads mapped against genomic window. Legend (sample): reference, tested.]

SLIDE 7

PopSV : Population-based approach

[Figure: number of reads mapped against genomic window. Legend (sample): reference, tested.]

Workflow

  1. Genome is fragmented in bins.
  2. Reads in each bin are counted, for each sample.
  3. Normalization of the bin counts.
  4. Each sample and each bin is tested for divergence from reference samples (Z-score).
  5. P-value estimation and multiple test correction.
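
Steps 3 to 5 of the workflow can be illustrated with a minimal Python sketch. The slides do not say how the normalization, the P-value estimation or the multiple test correction are done, so the choices below are assumptions made for illustration only: bin counts are scaled by the sample's total count, P-values come from a standard normal null, and the correction is Benjamini-Hochberg. All function names and the toy counts are hypothetical and are not taken from PopSV.

```python
import math
from statistics import mean, stdev


def normalize(counts):
    """Scale one sample's bin counts to an average of 1 per bin (assumed scheme)."""
    total = sum(counts)
    return [c * len(counts) / total for c in counts]


def z_scores(tested, references):
    """Z-score of each bin of the tested sample against the reference samples."""
    scores = []
    for b, value in enumerate(tested):
        ref = [sample[b] for sample in references]
        sd = stdev(ref)
        # A bin with no variation across references cannot be tested here.
        scores.append((value - mean(ref)) / sd if sd > 0 else 0.0)
    return scores


def two_sided_p(z):
    """Two-sided P-value of a Z-score under a standard normal null (assumption)."""
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (assumed correction method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data: 4 reference samples and 1 tested sample over 6 bins.
reference_counts = [
    [100, 102, 98, 101, 99, 100],
    [98, 100, 101, 99, 102, 100],
    [101, 99, 100, 100, 98, 102],
    [99, 101, 102, 98, 100, 100],
]
tested_counts = [100, 101, 99, 50, 100, 100]  # coverage drop in bin 3

references = [normalize(sample) for sample in reference_counts]
tested = normalize(tested_counts)
z = z_scores(tested, references)
p = [two_sided_p(score) for score in z]
q = benjamini_hochberg(p)
most_divergent_bin = min(range(len(z)), key=lambda b: z[b])
```

With so few bins, total-count scaling shifts every bin of the tested sample when one bin drops, which is one reason a real analysis would need many more bins and a more careful normalization than this sketch.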

SLIDE 8

CageKid : Renal Cell Cancer

Whole-Genome Sequencing of 100 individuals, ∼ 40X coverage, Illumina paired-end 100bp, normal and tumor paired samples.

◮ Normal samples → reference samples.
◮ 10kb bins.
◮ Only properly paired and mapped read pairs.
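
The counting setup above (10kb bins, only properly paired and mapped reads) could look roughly like the following Python sketch. The `Read` record is a hypothetical stand-in for an alignment record, and assigning a read to a bin by its start position is a simplification; the slides do not describe how reads spanning two bins are handled or whether reads or pairs are the counted unit.

```python
from collections import Counter
from dataclasses import dataclass

BIN_SIZE = 10_000  # 10kb bins, as stated on the slide


@dataclass
class Read:
    """Hypothetical minimal alignment record, for illustration only."""
    chrom: str
    start: int          # 0-based leftmost mapped position
    proper_pair: bool   # aligner flagged the pair as properly paired
    mapped: bool = True


def count_reads_per_bin(reads, bin_size=BIN_SIZE):
    """Count mapped, properly paired reads per (chromosome, bin index)."""
    counts = Counter()
    for read in reads:
        if read.mapped and read.proper_pair:
            counts[(read.chrom, read.start // bin_size)] += 1
    return counts


reads = [
    Read("chr1", 5, True),
    Read("chr1", 9_999, True),
    Read("chr1", 10_000, True),
    Read("chr1", 15_000, False),   # not properly paired: skipped
    Read("chr2", 123_456, True),
]
bin_counts = count_reads_per_bin(reads)
```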

Validation and benchmark

◮ Germline events detected in tumor samples?
◮ Concordant with SNP-array calls?
◮ Twin dataset: concordant with the pedigree?
◮ Concordant when using different bin sizes?

PopSV detected more concordant calls than other methods.
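
One simple way to quantify concordance between two call sets is the fraction of calls in one set that overlap at least one call in the other. The slides do not define the concordance measure used in the benchmark, so the Python sketch below is an assumption: calls are hypothetical `(chromosome, start, end)` tuples with half-open coordinates, and any overlap counts as a match.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def concordance(calls, other):
    """Fraction of `calls` that overlap at least one call in `other`."""
    if not calls:
        return 0.0
    hits = sum(1 for c in calls if any(overlaps(c, o) for o in other))
    return hits / len(calls)


# Toy call sets, e.g. from two bin sizes or from a tumor and its paired normal.
calls_a = [("chr1", 0, 10_000), ("chr1", 50_000, 60_000), ("chr2", 0, 10_000)]
calls_b = [("chr1", 5_000, 8_000), ("chr2", 20_000, 30_000)]
fraction = concordance(calls_a, calls_b)
```

The measure is not symmetric: `concordance(calls_a, calls_b)` and `concordance(calls_b, calls_a)` generally differ, so both directions are worth reporting.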

SLIDE 9

Example: Partial tumoral event

[Figure: read coverage against position (Mb), about 100.75 to 101.00 Mb. Tumor sample: D000GMU. Legend: abnormal, normal, normal samples.]

Chr.1, overlapping the CDC14A gene (cell division cycle), not detected by other approaches.

SLIDE 10

Example: Telomeric region

[Figure: read coverage against position (Mb), about 135.11 to 135.15 Mb. Normal sample: D000GQ9. Legend: abnormal, normal, normal samples.]

Chr.10, overlapping genes (PRAP1, CALY), not detected by other approaches.

SLIDE 11

PopSV flexibility

Custom binning: repeat annotation

◮ Increased resolution in regions of interest.
◮ Promising results: enrichment in centromere/telomere.

Counting discordant reads

◮ Detect excess of discordant reads.
◮ Promising results, including on repeats.

SLIDE 12

Conclusion

Robust and sensitive approach

◮ Detection in low mappability regions and partial tumoral signal.
◮ Superior to other Read-Depth methods.
◮ Wider range of the genome tested.

Work in progress

◮ Explore results and application to other projects (e.g. Pan-Cancer Analysis of Whole Genomes).
◮ Custom binning: repeat annotation, Whole-Exome Sequencing.
◮ More than a CNV caller.
  ◮ Excess of discordant read pairs.
  ◮ Combination with orthogonal approaches (PEM, Assembly).

SLIDE 13

Acknowledgment

◮ Guillaume Bourque
◮ Mathieu Bourgey
◮ Louis Letourneau
◮ Francois Lefebvre
◮ Eric Audemard
◮ Toby Hocking
◮ Simon Gravel
◮ Mathieu Blanchette
◮ Mehran Karimzadeh Reghbati
