Global patterns of copy number variation in humans from a - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Global patterns of copy number variation in humans from a population-based analysis.

ICHG Kyoto

Jean Monlong, April 5, 2016

Bourque Lab, McGill University, Human Genetics Dept.

slide-2
SLIDE 2

Disclosure Information

I have no financial relationships to disclose

slide-3
SLIDE 3

Copy-Number Variation

slide-4
SLIDE 4

Copy Number Variation (CNV)

Imbalanced genetic variation involving more than 500bp.

slide-5
SLIDE 5

CNV detection from High-Throughput Sequencing

Baker 2012, Nature Methods.

slide-8
SLIDE 8

Low-mappability regions

Repeat-rich regions, centromeres, telomeres. ∼13% of the human genome.

More prone to CNV.

  • Enriched in Segmental Duplications (Sharp Annual Review 2006).
  • Short Tandem Repeats highly polymorphic (Warburton BMC Genomics 2008).
  • Transposons involved in CNV formation (Sen AJHG 2006).

Involved in phenotype and disease.

  • Short Tandem Repeats and gene expression (Gymrek Nat. Genetics 2016).
  • Repeats CNV involved in ∼30 genetic disorders (Mirkin Nature 2007).
  • Retrotransposition in cancer (Lee Science 2012).

slide-9
SLIDE 9

PopSV approach

slide-10
SLIDE 10

PopSV approach

Objective

Test the entire genome, including low-mappability regions, and detect subtle abnormal coverage.

PopSV: Population-based approach

Use a set of reference experiments to detect abnormal patterns.

[Figure: number of reads mapped in each genomic window, for the reference samples and the tested sample.]
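The idea of testing one sample against a set of references can be illustrated with a minimal sketch. It assumes read counts per window have already been normalized across samples, and it uses a plain z-score with a hypothetical threshold of 3; the function names are illustrative, and PopSV's actual normalization and significance testing are not described on this slide.

```python
from statistics import mean, stdev

def bin_z_scores(reference_counts, tested_counts):
    """Z-score of the tested sample's read count in each genomic window.

    reference_counts: one list of per-window counts per reference sample.
    tested_counts: per-window counts for the tested sample.
    Counts are assumed to be normalized already (an assumption of this sketch).
    """
    z_scores = []
    for i, observed in enumerate(tested_counts):
        ref = [sample[i] for sample in reference_counts]
        mu, sd = mean(ref), stdev(ref)
        # A window with no variation among the references cannot be tested.
        z_scores.append((observed - mu) / sd if sd > 0 else float("nan"))
    return z_scores

def abnormal_windows(z_scores, threshold=3.0):
    """Indices of windows whose coverage deviates from the references.

    The threshold is a hypothetical choice for illustration.
    """
    return [i for i, z in enumerate(z_scores) if abs(z) > threshold]

# Toy data: 4 reference samples, 3 windows. The tested sample has about
# half the expected coverage in the last window.
reference = [[100, 50, 200], [104, 48, 210], [96, 52, 190], [100, 50, 200]]
tested = [101, 49, 100]
z = bin_z_scores(reference, tested)
print(abnormal_windows(z))  # prints [2]
```

Because each window is compared with the same window in the references, a window with systematically low coverage is not penalized as long as the tested sample behaves like the references there.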

slide-11
SLIDE 11

Benchmark and validation

Existing methods

  • FREEC: LASSO-based segmentation; GC and mappability correction.
  • cn.MOPS: multi-sample Bayesian-based segmentation.

Whole-Genome Sequencing data

  • 45 samples, including 10 twin families (i.e. 2 twins + 2 parents).
  • 95 pairs of normal/tumor samples from Renal Cell Carcinoma (CageKid).

slide-12
SLIDE 12

Benchmark and validation

  • Replication in the twins.
  • Concordance with pedigree.
  • Replication in the paired tumor.
  • Concordance of different bin sizes.
  • PCR validation.
  • Overall performance and in different repeat contexts.
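A replication measure such as the one used for the twins can be sketched as the fraction of calls in one sample that overlap a call in the other. This is only an illustration: the overlap criterion (any shared base) and the function names are assumptions, not the benchmark's exact definition.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def replication_rate(calls, other_calls):
    """Fraction of `calls` overlapping at least one call in `other_calls`."""
    if not calls:
        return float("nan")
    replicated = sum(any(overlaps(c, o) for o in other_calls) for c in calls)
    return replicated / len(calls)

# Toy calls for a pair of twins: 2 of the 3 calls in twin1 are found in twin2.
twin1 = [("chr1", 1000, 6000), ("chr1", 20000, 25000), ("chr2", 500, 5500)]
twin2 = [("chr1", 2000, 7000), ("chr2", 0, 1000)]
rate = replication_rate(twin1, twin2)
print(rate)
```

The measure is not symmetric: swapping the two samples gives the fraction of the second twin's calls that are replicated in the first.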

slide-13
SLIDE 13

Validation conclusions

  • PopSV detects 3-5x more variants.
  • Wider genomic range.
  • Robust across challenging regions: low-coverage, segmental duplications, DNA satellites, short tandem repeats, GC-rich/poor.
  • Resolution down to half the bin size.

slide-14
SLIDE 14

CNV patterns in normal genomes

slide-17
SLIDE 17

CNV in normal genomes

640 normal genomes

  • 45 samples from the Twin study (∼40X).
  • 95 normal samples from Renal Cell Carcinoma (∼54X).
  • 500 unrelated samples from GoNL (∼14X).

Where are CNVs located?

In centromeres? Telomeres? Segmental duplications? DNA satellites? Short tandem repeats? Transposable elements? Exons? Promoters?

Control regions

Same size distribution. Randomly distributed.
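Control regions with the same size distribution can be drawn by placing one random region per CNV, keeping that CNV's size. The sketch below is an assumption about how this could be done, with hypothetical names and toy chromosome sizes; it is not taken from the PopSV code.

```python
import random

def sample_control_regions(cnvs, chrom_sizes, seed=0):
    """Draw one random region per CNV, with the same size as that CNV.

    cnvs: list of (chrom, start, end) half-open intervals.
    chrom_sizes: dict mapping chromosome name to its length in bp.
    """
    rng = random.Random(seed)
    controls = []
    for _, start, end in cnvs:
        size = end - start
        # Only chromosomes long enough to hold the region are eligible.
        # Weighting by the number of valid start positions makes every
        # placement in the genome equally likely.
        eligible = [c for c in chrom_sizes if chrom_sizes[c] >= size]
        weights = [chrom_sizes[c] - size + 1 for c in eligible]
        chrom = rng.choices(eligible, weights=weights, k=1)[0]
        new_start = rng.randint(0, chrom_sizes[chrom] - size)
        controls.append((chrom, new_start, new_start + size))
    return controls

# Toy example with two CNVs and two small chromosomes.
cnvs = [("chr1", 1000, 6000), ("chr2", 200, 700)]
chrom_sizes = {"chr1": 100000, "chr2": 50000}
controls = sample_control_regions(cnvs, chrom_sizes)
print(controls)
```

Since each control inherits the size of one CNV, the two sets have identical size distributions by construction, and any difference in overlap with a genomic feature reflects location rather than size.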

slide-18
SLIDE 18

Enriched close to Centromere/Telomere/Gap (CTG)

[Figure: cumulative proportion of regions by distance to centromere/telomere/gap (bp), from 0 to 6e+07, for CNV and control regions.]
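The quantity on this plot can be sketched as follows: compute, for each region, the distance to the nearest centromere/telomere/gap (CTG) annotation, then the proportion of regions within a given distance. The function names and toy coordinates are illustrative assumptions.

```python
def distance_to_nearest(region, features):
    """Distance in bp from a region to the nearest feature on the same
    chromosome (0 if they overlap or touch). Returns None when the
    chromosome has no feature. Intervals are half-open (chrom, start, end)."""
    chrom, start, end = region
    best = None
    for f_chrom, f_start, f_end in features:
        if f_chrom != chrom:
            continue
        gap = max(0, f_start - end, start - f_end)
        best = gap if best is None else min(best, gap)
    return best

def cumulative_proportion(distances, cutoff):
    """Proportion of regions lying within `cutoff` bp of a feature.
    Regions without a distance (None) are ignored."""
    valid = [d for d in distances if d is not None]
    return sum(d <= cutoff for d in valid) / len(valid)

# Toy annotation: one telomere-like and one centromere-like block on chr1.
ctg = [("chr1", 0, 10000), ("chr1", 120000, 125000)]
cnvs = [("chr1", 12000, 15000), ("chr1", 110000, 118000), ("chr1", 60000, 62000)]
distances = [distance_to_nearest(r, ctg) for r in cnvs]
print(distances)  # prints [2000, 2000, 50000]
print(cumulative_proportion(distances, 5000))
```

Evaluating `cumulative_proportion` over a range of cutoffs for both the CNVs and the control regions gives the two curves being compared.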

slide-19
SLIDE 19

Enriched in SD and low-coverage regions

slide-21
SLIDE 21

Going further

  • 1. Control for the SD and CTG patterns.
  • 2. Look at other repeat classes.

Control regions

  • Randomly distributed.
  • Same size distribution.
  • Same proportion overlapping a segmental duplication.
  • Similar distance to CTG.
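One way to obtain controls with these properties is rejection sampling: draw a random region of the same size and keep it only if it matches the CNV's segmental duplication (SD) status and has a similar distance to CTG. The sketch below works on a single chromosome for brevity; the relative tolerance, the retry limit, and all names are assumptions for illustration, not the procedure used in the study.

```python
import random

def overlaps_any(start, end, features):
    """True if [start, end) overlaps any (start, end) feature."""
    return any(start < f_end and f_start < end for f_start, f_end in features)

def distance_to_nearest(start, end, features):
    """Distance in bp to the nearest feature (0 when overlapping)."""
    return min(max(0, f_start - end, start - f_end) for f_start, f_end in features)

def matched_control(cnv, chrom_length, seg_dups, ctg, rng,
                    tolerance=0.5, max_tries=10000):
    """Random region matching a CNV's size, SD overlap status, and
    (within a relative tolerance) its distance to the nearest CTG.
    Returns None if no candidate is accepted within `max_tries` draws."""
    start, end = cnv
    size = end - start
    in_sd = overlaps_any(start, end, seg_dups)
    dist = distance_to_nearest(start, end, ctg)
    for _ in range(max_tries):
        c_start = rng.randint(0, chrom_length - size)
        c_end = c_start + size
        # Reject candidates whose SD status differs from the CNV's.
        if overlaps_any(c_start, c_end, seg_dups) != in_sd:
            continue
        c_dist = distance_to_nearest(c_start, c_end, ctg)
        if abs(c_dist - dist) <= tolerance * max(dist, 1):
            return (c_start, c_end)
    return None

# Toy chromosome of 1 Mbp with one SD and three CTG blocks.
seg_dups = [(100000, 150000)]
ctg = [(0, 10000), (500000, 510000), (990000, 1000000)]
control = matched_control((120000, 125000), 1000000, seg_dups, ctg, random.Random(1))
print(control)
```

Matching SD status region by region guarantees that the control set has the same proportion of regions overlapping a segmental duplication as the CNV set.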

slide-22
SLIDE 22

Controlling for SD and distance to CTG

slide-24
SLIDE 24

Controlling for SD and distance to CTG

Satellites

  • Enrichment driven by ALR/Alpha, (GAATG)n/(CATTC)n families.

Short Tandem Repeats

  • Enrichment distributed across families...
  • ... but stronger for larger STR.

Transposable elements (TE)

  • SVA class enriched.
  • Expected: L1HS, L1PA2 to L1PA5.
  • Surprises: HERVH, LTR38, LTR4.
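The slides do not state which statistical test produced the enrichment results. As a stand-in only, the sketch below compares how many CNVs and how many control regions overlap a given repeat family with a one-sided Fisher's exact test, written with the standard library. The test choice, the function name, and the toy counts are all assumptions.

```python
from math import comb, log10

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: CNVs overlapping the family, b: CNVs not overlapping,
    c: controls overlapping,       d: controls not overlapping.
    Returns P(X >= a) under the hypergeometric null (enrichment in CNVs).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # comb() returns 0 when the second argument exceeds the first,
    # so impossible tables contribute nothing to the sum.
    favorable = sum(comb(row1, k) * comb(n - row1, col1 - k)
                    for k in range(a, upper + 1))
    return favorable / comb(n, col1)

# Toy counts: 30 of 100 CNVs vs 10 of 100 controls overlap the family.
p_value = fisher_exact_greater(30, 70, 10, 90)
print(-log10(p_value))
```

With matched control regions, a small P-value suggests enrichment beyond what size, SD overlap, and distance to CTG explain; depletion could be tested with the opposite tail.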

slide-25
SLIDE 25

Repeat CNVs and protein-coding genes

Set                       CNVs     Genes with CNVs
                                   Exon     + Promoter   + Intron
All CNVs                  91733    7206     11341        13259
Low coverage              26888    682      1151         1977
Extremely low coverage    10010    347      465          521
STR                       4286     45       286          748
Satellite                 1822     2        21           33
TE                        20491    164      1747         3998
STR/Satellite/TE          22313    166      1760         4014

Repeat CNV: more than 90% of the CNV is annotated as repeat.
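The 90% criterion can be checked by measuring how much of a CNV is covered by repeat annotations, counting overlapping annotations only once. This is a minimal sketch with hypothetical names; it assumes the CNV and the repeats are on the same chromosome and use half-open (start, end) coordinates.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def repeat_fraction(cnv, repeats):
    """Fraction of the CNV's bases covered by at least one repeat."""
    start, end = cnv
    covered = 0
    for r_start, r_end in merge_intervals(repeats):
        covered += max(0, min(end, r_end) - max(start, r_start))
    return covered / (end - start)

def is_repeat_cnv(cnv, repeats, min_fraction=0.9):
    """True if more than `min_fraction` of the CNV is annotated as repeat."""
    return repeat_fraction(cnv, repeats) > min_fraction

# Toy example: a 1 kbp CNV with two overlapping repeats and a small third one.
cnv = (1000, 2000)
repeats = [(900, 1500), (1400, 1950), (1990, 2100)]
print(repeat_fraction(cnv, repeats))  # prints 0.96
print(is_repeat_cnv(cnv, repeats))    # prints True
```

Merging first matters: summing the raw overlaps of the first two repeats would count positions 1400 to 1500 twice and overstate the covered fraction.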

slide-26
SLIDE 26

Conclusion

slide-28
SLIDE 28

Summary

PopSV

  • uses reference samples.
  • detects more CNVs.
  • is robust across the entire genome.

In normal genomes

  • CNVs enriched in low coverage regions.
  • Specific enrichment in satellites, simple repeats, TEs.
  • Not due to segmental duplication enrichment.
  • Replicated across datasets but different from somatic patterns.
  • Some CNVs in low coverage regions or repeats hit exonic sequence.

slide-29
SLIDE 29

Guillaume Bourque, Mathieu Bourgey, Louis Letourneau, Francois Lefebvre, Eric Audemard, Toby Hocking, Simon Girard, Patrick Cossette, Guy Rouleau, Caroline Meloche, Simon Gravel, Mathieu Blanchette

slide-31
SLIDE 31

Workflow

slide-32
SLIDE 32

Replication in twins

slide-33
SLIDE 33

Robust across challenging regions

[Figure: four panels showing the proportion of regions with concordant samples for PopSV, calls vs. null set, by coverage class (low, expected, high), GC content, segmental duplication proportion, and simple repeat proportion (each in bins from [0,0.2] to (0.8,1]).]

slide-34
SLIDE 34

Robust across challenging regions

[Figure: PopSV results for the twin-study samples (scale from 0.0 to 0.8), with samples labeled by role (Father, Mother, Twin) and by family (1121, 1207, 1286, 1301, 1323, 1389, 1443, 1480, 1490, 1652), plus five samples labeled Other1 to Other5.]

Using only CNVs in extremely low coverage regions !

slide-35
SLIDE 35

Resolution: 500 bp bins vs. 5 kbp bins

[Figure: proportion of 500 bp-bin calls overlapping 5 kbp-bin calls, by size of the 500 bp-bin call (2,500 to 20,000 bp).]

slide-36
SLIDE 36

Control regions

[Figure: proportion of CNV and control regions overlapping each feature (lowmap, segdup).]

QC − SD, low−coverage and CTG distance control

slide-37
SLIDE 37

Control regions

[Figure: cumulative proportion of regions by distance to centromere/telomere/gap (bp), from 0 to 6e+07, for CNV and control regions.]

QC − SD, low−coverage and CTG distance control

slide-38
SLIDE 38

Control regions

[Figure: selection of a random region of size S from a random base (shown in green), with flanks of S/2.]

slide-39
SLIDE 39

Controlling for SD and distance to CTG

[Figure: enrichment significance (−log10 P-value, depleted vs. enriched) per cohort (Twins, CK Normal, GoNL, CK Somatic) for TE classes (DNA, LINE, LTR, SVA, SINE) and top TE families (AluY, HERVH-int, L1HS, L1PA2, L1PA3, L1PA4, L1PA5, LTR38-int, LTR4, MER65A, SVA_D, SVA_E, SVA_F).]

slide-40
SLIDE 40

Controlling for SD and distance to CTG

[Figure: enrichment significance (−log10 P-value, depleted vs. enriched) per cohort (Twins, CK Normal, GoNL, CK Somatic) for satellite families ((CATTC)n, (GAATG)n, ACRO1, ALR/Alpha, BSR/Beta, CER, D20S16, GSAT, GSATII, GSATX, HSAT4, HSAT5, HSAT6, HSATI, HSATII, LSAU, MSR1, REP522, SAR, SATR1, SATR2, SST1, SUBTEL_sa, TAR1) and STR motifs (AAAG, AAGA, AAGG, AGAA, AGAT, AGGA, AT, ATAG, ATCT, CAG, CCTT, CGC, CTG, CTTC, CTTT, GAAA, GAAG, GATA, GCG, GGAA, TA, TAGA, TATC, TCCT, TCTA, TCTT, TTCC, TTCT, TTTC, Other).]

slide-41
SLIDE 41

Distance to CTG for somatic CNVs

[Figure: cumulative proportion of regions by distance to centromere/telomere/gap (bp), from 0 to 6e+07, for somatic CNV and control regions.]