Analysis of structural genome variation in whole genome and exome sequencing data



SLIDE 1

Analysis of structural genome variation in whole genome and exome sequencing data

Victor Guryev

November 15, 2017
14th SNPs and human diseases course
Erasmus MC, Rotterdam

SLIDE 2

Our genomes: base and structural variants

/a /g

SLIDE 3

NGS: how do we get our genomes?

SLIDE 4

1000 genomes project (1kG)

Low coverage whole genome and deep exome sequencing of 2,500 individuals to discover 95% of variants at 1% frequency

Small variants: The 1000 Genomes Project Consortium, 2015. Nature 526:68-74
Structural variants: Sudmant et al, 2015. Nature 526:75-81

SLIDE 5

Genome of the Netherlands (GoNL)

500 bp 90 bp
Median base coverage: 12x

Position paper: Boomsma et al, 2013
Small variants: Francioli et al, 2014
Structural variants: Hehir-Kwa et al, 2016

                   1000 G                 GoNL
DNA source         Cell lines             Blood
Coverage           3-4x                   >12x
Data generation    Multiple platforms     BGI/Illumina
Population         Multiple, unrelated    Dutch only, trios, twins
Phenotype info     None                   Multiple

SLIDE 6

SV classes and detection methods

Structural Genome Variations (SVs)

Reference: ABCD

Copy-number variants
  • Duplication: ABCCCD
  • Deletion: ABD

Copy-balanced variants
  • Inversion: ADCB
  • Translocation: AB CD

Detection methods
  • aCGH
  • Di-tag fosmid and NGS sequencing
  • Fibre-FISH

SLIDE 7

Single end vs paired-end sequencing

chromosome chromosome

SLIDE 8

Advantages of paired-end sequencing

1) Twice as many bases per slide!
2) Structural information!
  • Molecular haplotyping (phasing)
  • Structural variants
  • Genome assembly
  • Better repeat coverage
  • Profiling of transcript isoforms

[Figure labels: G A / A T; CONTIG 1, CONTIG 2; SINE, SINE]

SLIDE 9

Paired-end vs mate-pair sequencing

Fragmentation, size selection

Paired-end library: adaptor ligation. Insert size 200-400 bp
Mate-pair library: internal adaptor ligation, circularization, adaptor ligation. Insert size 0.6-25 kb

SLIDE 10

Method 1: Read depth analysis (RD)

Average coverage: 5 WGS/site

[Figure: expected distribution of tags vs distribution over a duplicated site; panel labels 5, 5 and 10 WGS/site]

Scope: copy-number changes

Tool examples:
  • CNV-Seq (Xie & Tammi, 2009)
  • CNVnator (Abyzov et al, 2011)
  • SegSeq (Chiang et al, 2009)
  • DWAC-Seq (our tool)
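
The read-depth idea can be sketched in a few lines: count reads per fixed-size window, compare each window with the median window depth, and flag windows whose ratio suggests a copy-number change. This is a toy illustration only, not the algorithm used by CNV-Seq, CNVnator, SegSeq or DWAC-Seq; the function names, window size and thresholds are assumptions made for the example.

```python
from statistics import median


def window_counts(read_starts, genome_length, window):
    """Count read start positions in consecutive fixed-size windows."""
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window] += 1
    return counts


def call_copy_number(counts, gain=1.5, loss=0.5):
    """Label each window by its depth ratio to the median window depth.

    The gain/loss thresholds are illustrative assumptions.
    """
    baseline = median(counts)
    calls = []
    for c in counts:
        ratio = c / baseline if baseline else 0.0
        if ratio >= gain:
            calls.append("gain")
        elif ratio <= loss:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls


# Toy data: 5 reads per window, except window 2 (duplicated site, 10 reads)
# and window 4 (deleted site, 0 reads).
reads = []
for w, depth in enumerate([5, 5, 10, 5, 0, 5]):
    reads += [w * 100 + i for i in range(depth)]

counts = window_counts(reads, genome_length=600, window=100)
calls = call_copy_number(counts)
```

Real tools additionally correct for GC content and mappability and merge adjacent windows into segments; none of that is attempted here.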

SLIDE 11

Method 2: Discordant pairs (DP)

[Figure: read pairs as sequenced vs as mapped to the reference, for: Normal, Inversion, Tandem duplication, Insertion, Deletion, Translocation (Chr 7, Chr 5)]

Scope: copy-number and copy-neutral SV at resolution close to base-pair

Tool examples: Breakdancer (Chen et al, 2009); 123SV (our tool)
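
A minimal sketch of how a single read pair could be classified from its mapping, assuming an Illumina-style paired-end library in which a concordant pair maps forward/reverse on one chromosome with an insert of 200-400 bp. The function name and the use of leftmost mapping positions as a proxy for insert size are simplifications for illustration; this is not the logic of Breakdancer or 123SV, which cluster many pairs before calling a variant.

```python
def classify_pair(chrom1, pos1, strand1, chrom2, pos2, strand2,
                  min_insert=200, max_insert=400):
    """Classify one read pair by chromosome, orientation and spacing."""
    if chrom1 != chrom2:
        return "translocation"
    # Order the two reads by position along the reference.
    (lpos, lstrand), (rpos, rstrand) = sorted([(pos1, strand1), (pos2, strand2)])
    if lstrand == rstrand:
        return "inversion"            # both reads on the same strand
    if lstrand == "-" and rstrand == "+":
        return "tandem duplication"   # everted (outward-facing) pair
    span = rpos - lpos                # crude proxy for the insert size
    if span > max_insert:
        return "deletion"             # reads map further apart than expected
    if span < min_insert:
        return "insertion"            # reads map closer than expected
    return "normal"
```

For example, `classify_pair("chr7", 1000, "+", "chr7", 5000, "-")` is labelled a deletion because the reads map much further apart on the reference than the library insert size allows.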

SLIDE 12

Method 3: Split-read mapping (SR)

Scope: prediction of copy-number and copy-neutral SV at nucleotide resolution

Tool examples: Pindel (Ye et al, 2009); SRiC (Zhang et al, 2011)

  • Evidence from multiple reads
  • Advantage of paired reads
  • Unmapped reads are good candidates for split-mapping

[Figure labels: Anchor, Split read]
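
The split-read principle can be illustrated with exact string matching: a read that does not map end to end is cut in two, and each piece is placed on the reference separately. The sketch below is a toy under strong assumptions (error-free reads, exact matches, deletions only, first hit wins); `find_deletion` and the sequences are made up for the example and do not reflect how Pindel or SRiC work internally.

```python
def find_deletion(reference, read, min_anchor=4):
    """Return (start, end) of a deletion that explains the read, or None.

    The read is split at every position that leaves at least min_anchor
    bases on each side; prefix and suffix are then located by exact match.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        i = reference.find(prefix)
        if i < 0:
            continue
        prefix_end = i + split
        j = reference.find(suffix, prefix_end)
        if j > prefix_end:
            # reference[prefix_end:j] is absent from the read
            return prefix_end, j
    return None


reference = "GATTACAGCA" + "TTTGGGTTTG" + "CCATCGAGTC"
read = "TTACAGCA" + "CCATCG"   # spans a deletion of the middle block
breakpoints = find_deletion(reference, read)
```

Here `breakpoints` gives the reference interval missing from the read, at single-nucleotide resolution. A read that maps contiguously returns `None`.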

SLIDE 13

Method 4: Genome assembly (AS)

Scope: various types of SVs including large inserts

Tool examples:
  • De novo assemblers: SOAPdenovo, ABySS, ALLPATHS-LG
  • BLAST/BLAT/BWA-SW search for comparison of contigs and genome reference

[Figure: imperfect alignment between Ref and Contig]

SLIDE 14

Method applicability: base and physical coverage

Base coverage: ~1x; physical coverage: ~4x

[Figure label: chromosome]

Approach             Base coverage    Physical coverage
Depth of coverage    ✓
Discordant pairs                      ✓
Split-mapping        ✓
De novo assembly     ✓                ✓
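
The difference between the two coverage measures is simple arithmetic. The sketch below assumes that base coverage counts only sequenced bases and that physical coverage counts the whole fragment spanned by a read pair; the genome size, read length, fragment length and number of pairs are illustrative values chosen to reproduce the ~1x / ~4x example, not figures from the slides.

```python
def base_coverage(n_pairs, read_length, genome_size):
    """Average number of sequenced bases covering each genome position."""
    return n_pairs * 2 * read_length / genome_size


def physical_coverage(n_pairs, fragment_length, genome_size):
    """Average number of fragments (read pair plus unsequenced middle)
    spanning each genome position."""
    return n_pairs * fragment_length / genome_size


genome_size = 3_000_000_000   # assumed genome size in bp
n_pairs = 15_000_000          # assumed number of read pairs

base = base_coverage(n_pairs, read_length=100, genome_size=genome_size)
physical = physical_coverage(n_pairs, fragment_length=800, genome_size=genome_size)
```

With these numbers `base` is 1x while `physical` is 4x, which is why methods that rely on pair spacing can work at a sequencing depth where read-depth and split-read signals are still sparse.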

SLIDE 15

GoNL pipeline for SV discovery

SLIDE 16

GoNL SV detection

[Hehir-Kwa et al., 2016]

SLIDE 17

Popular SV mining tools

PINDEL (http://gmt.genome.wustl.edu/packages/pindel/)
Split-read mapping (very specific for short and mid-size variants)

DELLY (https://github.com/dellytools/delly)
Discordant read and split-read methods

LUMPY-SV (https://github.com/arq5x/lumpy-sv)
Multi-method tool

SLIDE 18

Genome sequencing: what do we get?

Per individual genome (compared to reference genome)
  • 3.7M SNPs
  • 360k short indels (1-20 bp)
  • 5.2k medium deletions (20-100 bp)
  • 3.3k large deletions (100+ bp)

GoNL variant list
SNPs                         20.4M
Short indels (1-20 bp)       1.7M
Deletions (20-99 bp)         31.5k
Deletions (100+ bp)          20k
Mobile element insertions    13k
Insertions                   2.2k
Duplications                 1.8k
Inversions                   90
Interchromosomal events      60

SLIDE 19

Impact of Structural Variants

SNVs

Indels

Structural variants

GoNL: bases affected

Variant type    Megabases
SNVs            20.4
Indels          4.3
SVs             75.3

SLIDE 20

AluYa4 insertion in PRAMEF4 gene

  • PRAME family member 4
  • In constitutive exon
  • Observed in 21 samples
  • Mutations in gene are associated with melanoma

AluYa4 AluYa4

[Hehir-Kwa et al., 2016]

SLIDE 21

Complex variants: gene retrotransposition insertion polymorphism (GRIP)

[Figure: read pairs linking Chr15: 40.85 Mb and Chr7: 26.24 Mb; labels "to chr7", "to chr15", "deletion"]

Mechanism: (retro)transposition
Prevalence: about 40 cases in GoNL
Tools: discordant pairs (1-2-3-SV)

[Hehir-Kwa et al., 2016]

SLIDE 22

Genomes full of ‘knock-outs’?

SKA3 cDNA

SLIDE 23

Complex variants: MNPs, complex indels

Mechanism: polymerase errors
Prevalence: ~3% of all indels are non-simple
Tool example: GATK HaplotypeCaller

[Hehir-Kwa et al., 2016]

SLIDE 24

Complex variants: Non-allelic conversion

[Figure tracks: Father, Mother, Child]

Mechanism: gene conversion
Prevalence: only a few cases so far
Tool examples: assembly, discordant pairs

SLIDE 25

Complex variants

KRAB box domain containing 4, aka ZNF673, transcription regulator

[Hehir-Kwa et al., 2016]

SLIDE 26

New genomic segments

SLIDE 27

New segments

SLIDE 28

New segments: example

Allele frequency in GoNL: 28%. 50% of Dutch population have it as

SLIDE 29

Change in expression level

SLIDE 30

Change in transcript structure

SLIDE 31

Complex variants: Chromothripsis

[Figure: chromothripsis breakpoint junctions between segments of chr1, chr4 and chr10; markers = DNA double-strand breaks]

SLIDE 32

How dynamic are our genomes?

x 250

1,169 de novo candidate indels (sized 1-20 bp; 99 children)
601 de novo candidate SVs (sized 20+ bp; 250 families, 258 non-identical children)

291 de novo indels

  • 203 small deletions
  • 74 insertions
  • 14 complex indels

41 de novo SVs

  • 27 deletions
  • 8 duplications
  • 5 Alu insertions
  • 1 complex event

Validation by PCR, sequencing
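
As a rough worked example, the counts on this slide can be turned into confirmation fractions and per-child rates. The sketch assumes that the confirmed events are simply divided by the number of children screened for each variant class (99 for indels, 258 for SVs); the variable names are made up for the illustration.

```python
# Counts taken from the slide.
candidate_indels, confirmed_indels, indel_children = 1169, 291, 99
candidate_svs, confirmed_svs, sv_children = 601, 41, 258

# Fraction of candidates that survived validation.
indel_confirmed_fraction = confirmed_indels / candidate_indels
sv_confirmed_fraction = confirmed_svs / candidate_svs

# Confirmed de novo events per child (assumed denominator: children screened).
indels_per_child = confirmed_indels / indel_children
svs_per_child = confirmed_svs / sv_children

print(f"indels: {indel_confirmed_fraction:.1%} confirmed, {indels_per_child:.2f} per child")
print(f"SVs:    {sv_confirmed_fraction:.1%} confirmed, {svs_per_child:.2f} per child")
```

Under these assumptions roughly a quarter of the indel candidates and about 7% of the SV candidates were confirmed, corresponding to about 2.94 de novo indels and 0.16 de novo SVs per child.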

Genome Res (2015) 25:792–801

SLIDE 33

De novo SVs: size distribution

SLIDE 34

De novo mutations: parental and familial bias

Indels SVs

Non-uniform distribution of SVs, p = 0.0074

SLIDE 35

What about targeted re-sequencing?

WGS Father WGS Child WGS Mother WES Father WES Mother WES Child

  • Same methodologies are applicable for WES
  • RD analysis: need additional correction to account for variation in enrichment
  • Very limited sensitivity if SV breakpoint is outside of enriched area

Tool examples: GATK HaplotypeCaller, CoNIFER, ExomeCNV
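
The extra correction needed for read-depth analysis in WES can be sketched as follows: because enrichment efficiency differs strongly between targets, each target's depth is compared with the same target in a panel of reference samples rather than with a genome-wide average. This is a minimal sketch assuming non-zero depths and a small reference panel; `log2_ratios` and the numbers are hypothetical, and CoNIFER and ExomeCNV use more elaborate normalisation.

```python
from math import log2
from statistics import median


def log2_ratios(sample, references):
    """Per-target log2 copy ratio of one sample against a reference panel.

    sample:     list of per-target depths for the test sample
    references: list of such lists, one per reference sample
    Depths are assumed to be greater than zero.
    """
    # Per-target baseline absorbs target-specific enrichment efficiency.
    baseline = [median(column) for column in zip(*references)]
    raw = [s / b for s, b in zip(sample, baseline)]
    # Dividing by the median ratio corrects for overall sequencing depth.
    scale = median(raw)
    return [log2(r / scale) for r in raw]


# Five targets with very different enrichment efficiency.
panel = [
    [100, 20, 300, 50, 80],
    [110, 22, 330, 55, 88],
    [90, 18, 270, 45, 72],
]
# Sample sequenced twice as deep, with one copy of target 2 deleted.
sample = [200, 40, 300, 100, 160]
ratios = log2_ratios(sample, panel)
```

In this toy case every target gives a log2 ratio of 0 except target 2, which drops to -1, the value expected for a heterozygous deletion.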

SLIDE 36

Catching SVs from targeted sequencing

Father, WGS Mother, WGS Child, WGS Father, WES Mother, WES Child, WES Gene annotation

Del

SLIDE 37

Not catching SVs with targeted sequencing

Heterozygous deletion in Father, inherited by Child

Father, WGS Mother, WGS Child, WGS Father, WES Mother, WES Child, WES Gene annotation

SLIDE 38

SV imputation

SLIDE 39

SV imputation (2)

SLIDE 40

SV imputation (3)

SLIDE 41

PacBio and OxNano: true long reads

SLIDE 42

Moleculo, 10x Genomics: synthetic long reads

SLIDE 43

Take home message: importance of SVs

Variant type                       Single base changes    Structural       SNV:CNV ratio
Human vs chimp                     1.23% of genome        3% of genome     1 : 2
Common variants (AF > 5%)          5.948 Mb               10.916 Mb        1 : 2
Rare variants                      6.625 Mb               28.507 Mb        1 : 4
Individual/family-specific         6.989 Mb               43.317 Mb        1 : 6
De novo variants (avg per kid)     45 bp                  4,084 bp         1 : 91
Somatic, ageing-related            ?                      ?                1 : ?

[Chimp genome consortium, 2005] [Hehir-Kwa, Guryev, 2016] [Kloosterman, Guryev, 2015]
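
The SNV:CNV ratios in the table follow directly from the two rows of values. A small sketch of that arithmetic is given below; the dictionary layout and names are assumptions for the example, and the individual/family-specific values are entered as 6.989 and 43.317 (the ratio is the same whichever way the separator in those figures is read, since both values share the same unit).

```python
# (single base changes, structural) per category, in the units given on the slide.
affected = {
    "Human vs chimp": (1.23, 3.0),                  # % of genome
    "Common variants (AF > 5%)": (5.948, 10.916),   # Mb
    "Rare variants": (6.625, 28.507),               # Mb
    "Individual/family-specific": (6.989, 43.317),  # Mb
    "De novo variants (avg per kid)": (45, 4084),   # bp
}

# Structural bases affected per single-base change, rounded as on the slide.
ratios = {name: round(sv / snv) for name, (snv, sv) in affected.items()}

for name, ratio in ratios.items():
    print(f"{name}: 1 : {ratio}")
```

The rounded results reproduce the 1 : 2, 1 : 2, 1 : 4, 1 : 6 and 1 : 91 shown on the slide, with the share of structural variation growing as variants become rarer.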

SLIDE 44

Acknowledgements

GoNL SV Team
Wigard Kloosterman (UMCU), Laurent C. Francioli (UMCU), Jayne Y. Hehir-Kwa (UMCN), Djie Tjwan Thung (UMCN), Tobias Marschall (CWI/MPI), Alexander Schoenhuth (CWI), Matthijs Moed (LUMC), Eric-Wubbo Lameijer (LUMC), Abdel Abdellaoui (VU), Slavik Koval (EMC/LUMC), Joep de Ligt (UMCN), Najaf Amin (EMC), Freerk van Dijk (UMCG), Lennart Karssen (EM/Polyomica), Leon Mei (LUMC), Kai Ye (LUMC/WASHU)

University of Washington
Fereydoun Hormozdiari, Evan E. Eichler

GoNL steering committee
Paul de Bakker (UMCU), Dorret Boomsma (VU), Cornelia van Duijn (EMC), Gert-Jan van Ommen (LUMC), Eline Slagboom (LUMC), Morris Swertz (UMCG), Cisca Wijmenga (UMCG)

BGI Shenzhen
Jun Wang

ERIBA, RuG, UMC Groningen
Diana Spierings, Marianna Bevova, Rene Wardenaar, Tristan de Jong, Peter Lansdorp