Analysis of structural genome variation in whole genome and exome sequencing data




slide-1
SLIDE 1

Analysis of structural genome variation in whole genome and exome sequencing data

Victor Guryev

November 13, 2018
15th SNPs and human diseases course
Erasmus MC, Rotterdam

slide-2
SLIDE 2

Our genomes: base and structural variants

/a /g

slide-3
SLIDE 3

NGS: how do we get our genomes?

slide-4
SLIDE 4

1000 Genomes Project (1kG)

Low coverage whole genome and deep exome sequencing of 2,500 individuals to discover 95% of variants at 1% frequency

Small variants: The 1000 Genomes Project Consortium, 2015. Nature 526:68-74
Structural variants: Sudmant et al., 2015. Nature 526:75-81

slide-5
SLIDE 5

Genome of the Netherlands (GoNL)

[Figure labels: 500 bp, 90 bp]

Median base coverage: 12x

Position paper: Boomsma et al., 2013
Small variants: Francioli et al., 2014
Structural variants: Hehir-Kwa et al., 2016

                   1000G                  GoNL
DNA source         Cell lines             Blood
Coverage           3-4x                   >12x
Data generation    Mult. platforms        BGI/Illumina
Population         Multiple, unrelated    Dutch only, trios, twins
Phenotype info     None                   Multiple

slide-6
SLIDE 6

SV classes and detection methods

Structural Genome Variations (SVs)

Reference: ABCD

Copy-number variants
  • Duplication: ABCCCD
  • Deletion: ABD

Copy-balanced variants
  • Inversion: ADCB
  • Translocation: AB CD

Detection methods: aCGH; di-tag fosmid and NGS sequencing; Fibre-FISH
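The letter notation above can be reproduced with a toy sketch that treats the reference as a list of segments. The function names are hypothetical and serve only to illustrate the four classes; segment orientation is ignored, and the translocation is shown simply as a split into two pieces, as on the slide.

```python
def duplicate(segments, i, copies=3):
    """Replace segment i with `copies` tandem copies of itself."""
    return segments[:i] + [segments[i]] * copies + segments[i + 1:]


def delete(segments, i):
    """Remove segment i."""
    return segments[:i] + segments[i + 1:]


def invert(segments, start, end):
    """Reverse the order of segments[start:end] (orientation is ignored)."""
    return segments[:start] + segments[start:end][::-1] + segments[end:]


def translocate(segments, cut):
    """Split at `cut`; the two pieces end up on different chromosomes."""
    return segments[:cut], segments[cut:]


ref = list("ABCD")
print("".join(duplicate(ref, 2)))                    # ABCCCD
print("".join(delete(ref, 2)))                       # ABD
print("".join(invert(ref, 1, 4)))                    # ADCB
print(["".join(p) for p in translocate(ref, 2)])     # ['AB', 'CD']
```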

slide-7
SLIDE 7

Method 1: Read depth analysis (RD)

[Figure: expected distribution of tags (average coverage: 5 WGS/site) compared with the distribution over a duplicated site (panel labels: 5 WGS/site, 5 WGS/site, 10 WGS/site)]

Scope: copy-number changes

Tool examples:
  • CNV-Seq (Xie & Tammi, 2009)
  • CNVnator (Abyzov et al., 2011)
  • SegSeq (Chiang et al., 2009)
  • DWAC-Seq (our tool)
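The read-depth idea can be sketched in a few lines: count reads per window and compare each window with the typical depth. This is a minimal illustration, not the algorithm of any tool listed above; the window size, the thresholds (1.5 and 0.5) and the toy data are assumptions, and real tools add steps such as GC correction and segmentation.

```python
from statistics import median


def window_depths(read_starts, genome_length, window):
    """Count read start positions in fixed, non-overlapping windows."""
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window] += 1
    return counts


def call_copy_number(counts, gain=1.5, loss=0.5):
    """Label each window by its depth relative to the median window."""
    baseline = median(counts)
    calls = []
    for c in counts:
        ratio = c / baseline if baseline else 0.0
        if ratio >= gain:
            calls.append("gain")
        elif ratio <= loss:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls


# Toy data: 5 reads per 100 bp window, 10 over a duplicated window,
# 0 over a homozygously deleted window.
starts = []
for w, n in enumerate([5, 5, 10, 5, 0, 5]):
    starts += [w * 100 + 10 * k for k in range(n)]

counts = window_depths(starts, genome_length=600, window=100)
print(counts)
print(call_copy_number(counts))
```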

slide-8
SLIDE 8

Method 2: Discordant pairs (DP)

[Figure: sequenced vs. reference/mapped read-pair configurations for normal, inversion, tandem duplication, insertion, deletion and translocation (Chr 7, Chr 5)]

Scope: copy-number and copy-neutral SVs at resolution close to base-pair

Tool examples: BreakDancer (Chen et al., 2009); 123SV (our tool)
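A minimal sketch of how a single read pair could be assigned to one of the classes above, assuming a standard forward-reverse paired-end library. The function name, the argument layout and the 300-700 bp range are illustrative assumptions; actual tools cluster many pairs before making a call.

```python
def classify_pair(chrom1, pos1, strand1, chrom2, pos2, strand2,
                  min_insert=300, max_insert=700):
    """Classify one read pair by mate chromosome, orientation and distance.

    Positions are leftmost mapping coordinates; strands are "+" or "-".
    A concordant pair is "+" then "-" within the expected insert range.
    """
    if chrom1 != chrom2:
        return "translocation"
    # Order the mates by position so mate 1 is the leftmost one.
    if pos2 < pos1:
        pos1, strand1, pos2, strand2 = pos2, strand2, pos1, strand1
    if strand1 == strand2:
        return "inversion"            # both mates on the same strand
    if strand1 == "-" and strand2 == "+":
        return "tandem duplication"   # everted (outward-facing) pair
    span = pos2 - pos1
    if span > max_insert:
        return "deletion"             # mates map too far apart
    if span < min_insert:
        return "insertion"            # mates map too close together
    return "normal"


print(classify_pair("chr7", 1000, "+", "chr7", 1500, "-"))
print(classify_pair("chr7", 1000, "+", "chr7", 9000, "-"))
print(classify_pair("chr7", 1000, "+", "chr5", 4000, "-"))
```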

slide-9
SLIDE 9

Method 3: Split-read mapping (SR)

Scope: prediction of copy-number and copy-neutral SVs at nucleotide resolution

Tool examples: Pindel (Ye et al., 2009); SRiC (Zhang et al., 2011)

  • Evidence from multiple reads
  • Advantage of paired reads
  • Unmapped reads are good candidates for split-mapping

[Figure labels: anchor, split read]
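The split-read principle can be illustrated with exact string matching: a read that does not map end to end is cut into a prefix and a suffix, and a gap between the two mapped parts marks a deletion. This is a toy sketch under strong assumptions (exact matches only, no anchoring by a mate, an arbitrary test sequence); it is not how Pindel or SRiC are implemented.

```python
def split_read_deletion(reference, read, min_part=5):
    """Find a prefix/suffix split of `read` that maps with a gap in between.

    Returns (deletion_start, deletion_end) as 0-based, half-open reference
    coordinates, or None if no such split exists. The longest mappable
    prefix is tried first.
    """
    for cut in range(len(read) - min_part, min_part - 1, -1):
        prefix, suffix = read[:cut], read[cut:]
        p = reference.find(prefix)
        if p == -1:
            continue
        s = reference.find(suffix, p + cut)
        if s > p + cut:               # suffix maps downstream, leaving a gap
            return (p + cut, s)
    return None


reference = "AAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCGGGTACCGAGCTC"
# Read from a sample that lacks reference[19:30].
read = reference[6:19] + reference[30:42]
print(split_read_deletion(reference, read))
# A read that maps end to end gives no split.
print(split_read_deletion(reference, reference[6:31]))
```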

slide-10
SLIDE 10

Method 4: Genome assembly (AS)

Scope: various types of SVs, including large inserts

Tool examples:
  • De novo assemblers: SOAPdenovo, ABySS, ALLPATHS-LG
  • BLAST/BLAT/BWA-SW search for comparison of contigs and genome reference

[Figure: imperfect alignment between reference (Ref) and contig]
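To show what an imperfect contig-to-reference alignment reveals, the sketch below compares two short sequences with Python's standard `difflib` module and reports contig segments that are absent from the reference. `difflib` is a stand-in chosen for brevity, not one of the aligners named above, and the sequences are made up.

```python
from difflib import SequenceMatcher


def find_insertions(reference, contig):
    """Return (reference_position, inserted_sequence) for contig-only segments."""
    matcher = SequenceMatcher(None, reference, contig, autojunk=False)
    insertions = []
    for tag, r1, r2, c1, c2 in matcher.get_opcodes():
        if tag == "insert":
            insertions.append((r1, contig[c1:c2]))
    return insertions


reference = "AAGCTTGCATGCCTGCAG"
# Assembled contig carrying 8 extra bases after reference position 9.
contig = reference[:9] + "CCCCGGGG" + reference[9:]
print(find_insertions(reference, contig))
```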

slide-11
SLIDE 11

Method applicability: base and physical coverage

[Figure: read pairs along a chromosome; base coverage ~1x, physical coverage ~4x]

Approach             Base coverage    Physical coverage
Depth of coverage    !
Discordant pairs                      !
Split-mapping        !
De novo assembly     !                !
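The difference between the two coverage measures is simple arithmetic: base coverage counts sequenced bases, physical coverage counts the full span of each fragment. The numbers below (50 bp reads, 400 bp fragments, 30 million pairs, a 3 Gb genome) are assumptions chosen to reproduce the ~1x and ~4x of the slide.

```python
def base_coverage(n_pairs, read_length, genome_size):
    """Average number of sequenced bases covering each genome position."""
    return n_pairs * 2 * read_length / genome_size


def physical_coverage(n_pairs, fragment_size, genome_size):
    """Average number of fragments (read 1 to read 2) spanning each position."""
    return n_pairs * fragment_size / genome_size


genome_size = 3_000_000_000   # assumed, roughly human
n_pairs = 30_000_000          # assumed
print(base_coverage(n_pairs, 50, genome_size))       # 1.0
print(physical_coverage(n_pairs, 400, genome_size))  # 4.0
```

Longer fragments therefore raise physical coverage without any extra sequencing, which helps pair-based approaches but not approaches that depend on bases actually read.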

slide-12
SLIDE 12

Multi-method approaches to SV discovery

  • PINDEL (http://gmt.genome.wustl.edu/packages/pindel/): split-read mapping (very specific for short and mid-size variants)
  • DELLY (https://github.com/dellytools/delly): discordant read and split-read methods
  • LUMPY-SV (https://github.com/arq5x/lumpy-sv): multi-method tool
  • SURVIVOR, MetaSV, Parliament: create a consensus or multi-sample callset
  • Parliament2: runs multiple tools (Breakdancer, BreakSeq, CNVnator, Delly, Lumpy, Manta) and creates a consensus callset; also available as a Docker container
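The consensus idea can be sketched as follows: calls of the same type whose breakpoints lie close together are grouped, and a group is kept when enough different callers support it. This is an illustrative simplification, not the merging logic of SURVIVOR, MetaSV or Parliament2; the tuple layout, the 100 bp tolerance and the example calls are assumptions.

```python
def merge_calls(calls, max_dist=100, min_callers=2):
    """Group SV calls by chromosome, type and breakpoint proximity.

    `calls` is a list of (caller, chrom, start, end, svtype) tuples.
    Returns the groups supported by at least `min_callers` distinct callers.
    """
    clusters = []
    for caller, chrom, start, end, svtype in sorted(
            calls, key=lambda c: (c[1], c[4], c[2])):
        for cl in clusters:
            if (cl["chrom"] == chrom and cl["svtype"] == svtype
                    and abs(cl["start"] - start) <= max_dist
                    and abs(cl["end"] - end) <= max_dist):
                cl["callers"].add(caller)
                break
        else:
            # No existing group matched: start a new one.
            clusters.append({"chrom": chrom, "start": start, "end": end,
                             "svtype": svtype, "callers": {caller}})
    return [cl for cl in clusters if len(cl["callers"]) >= min_callers]


calls = [
    ("delly", "chr1", 10_000, 12_000, "DEL"),
    ("lumpy", "chr1", 10_020, 11_990, "DEL"),
    ("manta", "chr1", 50_000, 50_800, "DUP"),
]
for sv in merge_calls(calls):
    print(sv["chrom"], sv["start"], sv["end"], sv["svtype"], sorted(sv["callers"]))
```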

slide-13
SLIDE 13

GoNL pipeline for SV discovery

slide-14
SLIDE 14

GoNL SV detection

[Hehir-Kwa et al., 2016]

slide-15
SLIDE 15

Genome sequencing: what do we get?

Per individual genome (compared to reference genome)
  • 3.7M SNPs
  • 360k short indels (1-20 bp)
  • 5.2k medium deletions (20-100 bp)
  • 3.3k large deletions (100+ bp)

GoNL variant list

Variant type                  Count
SNPs                          20.4M
Short indels (1-20 bp)        1.7M
Deletions (20-99 bp)          31.5k
Deletions (100+ bp)           20k
Mobile element insertions     13k
Insertions                    2.2k
Duplications                  1.8k
Inversions                    90
Interchromosomal events       60

slide-16
SLIDE 16

Impact of Structural Variants

[Figure: bases affected by SNVs, indels and structural variants]

GoNL: bases affected

Variant type    Megabases
SNVs            20.4
Indels          4.3
SVs             75.3

slide-17
SLIDE 17

AluYa4 insertion in PRAMEF4 gene

PRAME family member 4
  • In constitutive exon
  • Observed in 21 samples
  • Mutations in gene are associated with melanoma

[Figure: AluYa4 insertion]

[Hehir-Kwa et al., 2016]

slide-18
SLIDE 18

Complex variants: gene retrotransposition insertion polymorphism (GRIP)

[Figure: discordant pairs linking Chr15:40.85 Mb and Chr7:26.24 Mb (labels "to chr7" and "to chr15"), with a region marked as deletion]

Mechanism: (retro)transposition
Prevalence: about 40 cases in GoNL
Tools: discordant pairs (1-2-3-SV)

[Hehir-Kwa et al., 2016]

slide-19
SLIDE 19

“Knock-outs” in our genome

[Figure: Chr13 → transcription → splicing → reverse transcription → integration → Chr11]

slide-20
SLIDE 20

Complex variants: MNPs, complex indels

Mechanism: polymerase errors
Prevalence: ~3% of all indels are non-simple
Tool example: GATK HaplotypeCaller

[Hehir-Kwa et al., 2016]

slide-21
SLIDE 21

Complex variants: Non-allelic conversion

[Figure: father, mother, child]

Mechanism: gene conversion
Prevalence: only a few cases so far
Tool example: assembly, discordant pairs

slide-22
SLIDE 22

Complex variants

KRAB box domain containing 4, aka ZNF673, transcription regulator

[Hehir-Kwa et al., 2016]

slide-23
SLIDE 23

New genomic segments

slide-24
SLIDE 24

New segments

slide-25
SLIDE 25

New segments: example

Allele frequency in GoNL: 28%. 50% of Dutch population have it as

slide-26
SLIDE 26

Change in expression level

slide-27
SLIDE 27

Change in transcript structure

slide-28
SLIDE 28

Complex variants: Chromothripsis

[Figure: chromothripsis breakpoint junctions between chr1, chr4 and chr10, labelled with genomic coordinates; marked sites = DNA double-strand breaks]

slide-29
SLIDE 29

How dynamic are our genomes?

1,169 de novo candidate indels (1-20 bp; 99 children)
601 de novo candidate SVs (20+ bp; 250 families, 258 non-identical children)

291 de novo indels

  • 203 small deletions
  • 74 insertions
  • 14 complex indels

41 de novo SVs

  • 27 deletions
  • 8 duplications
  • 5 Alu insertions
  • 1 complex event

Validation by PCR, sequencing

Genome Res (2015) 25:792–801
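The counts above translate into per-child rates with simple arithmetic. The sketch assumes that the 291 indels and 41 SVs are the validated subsets of the candidate lists and that the denominators are the 99 and 258 children named on the slide.

```python
indel_classes = {"small deletions": 203, "insertions": 74, "complex indels": 14}
sv_classes = {"deletions": 27, "duplications": 8, "Alu insertions": 5,
              "complex event": 1}

n_indels = sum(indel_classes.values())   # 291
n_svs = sum(sv_classes.values())         # 41


def rate(events, children):
    """Average number of de novo events per child."""
    return events / children


print(f"indels per child: {rate(n_indels, 99):.2f}")    # 2.94
print(f"SVs per child:    {rate(n_svs, 258):.2f}")      # 0.16
print(f"candidate indels retained: {n_indels / 1169:.1%}")
print(f"candidate SVs retained:    {n_svs / 601:.1%}")
```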

slide-30
SLIDE 30

De novo SVs: size distribution

slide-31
SLIDE 31

De novo mutations: parental and familial bias

Indels SVs

Non-uniform distribution of SVs, p = 0.0074

slide-32
SLIDE 32

What about targeted re-sequencing?

[Figure: WGS and WES tracks for father, mother and child]

  • Same methodologies are applicable for WES
  • RD analysis: need additional correction to account for variation in enrichment
  • Very limited sensitivity if SV breakpoint is outside of enriched area

Tool examples: GATK HaplotypeCaller, CONIFER, ExomeCNV
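The extra correction needed for read-depth analysis in WES can be illustrated by comparing each exon with the same exon in other samples, so that exon-specific enrichment efficiency cancels out. This is a minimal sketch and not the method of any tool named above; the depths, the reference panel and the log2 thresholds are made-up assumptions.

```python
from math import log2
from statistics import median


def normalise(depths):
    """Express each exon's depth as a fraction of the sample total."""
    total = sum(depths)
    return [d / total for d in depths]


def exon_calls(sample, references, loss=-0.5, gain=0.4):
    """Call per-exon copy-number state from log2 ratios against a panel."""
    s = normalise(sample)
    refs = [normalise(r) for r in references]
    calls = []
    for i, value in enumerate(s):
        baseline = median(r[i] for r in refs)
        if value == 0 or baseline == 0:
            calls.append("loss" if value == 0 else "gain")
            continue
        ratio = log2(value / baseline)
        if ratio <= loss:
            calls.append("loss")
        elif ratio >= gain:
            calls.append("gain")
        else:
            calls.append("normal")
    return calls


# Five exons with very different enrichment efficiency.
panel = [[100, 20, 300, 50, 80], [110, 22, 290, 55, 85], [90, 18, 310, 45, 75]]
sample = [100, 10, 300, 50, 80]   # exon 2 has half the expected depth
print(exon_calls(sample, panel))
```

Raw depth alone would flag exon 2 as low in every sample; only the comparison with the panel shows that this sample is different.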

slide-33
SLIDE 33

Catching SVs from targeted sequencing

[Figure: WGS and WES tracks for father, mother and child with gene annotation; deletion (Del) marked]

slide-34
SLIDE 34

Not catching SVs with targeted sequencing

Heterozygous deletion in father, inherited by child

[Figure: WGS and WES tracks for father, mother and child with gene annotation]

slide-35
SLIDE 35

SV imputation

slide-36
SLIDE 36

SV imputation (2)

slide-37
SLIDE 37

SV imputation (3)

slide-38
SLIDE 38

PacBio and OxNano: true long reads

slide-39
SLIDE 39

“Synthetic” long reads: 10x Chromium linked-reads

slide-40
SLIDE 40

Take home message: importance of SVs

Variant category                   Single base changes    Structural       SNV:CNV ratio
Human vs chimp                     1.23% of genome        3% of genome     1 : 2
Common variants (AF > 5%)          5.948 Mb               10.916 Mb        1 : 2
Rare variants                      6.625 Mb               28.507 Mb        1 : 4
Individual/family-specific         6.989 Mb               43.317 Mb        1 : 6
De novo variants (avg per kid)     45 bp                  4,084 bp         1 : 91
Somatic, ageing-related            ?                      ?                1 : ?

[Chimp genome consortium, 2005] [Hehir-Kwa, Guryev, 2016] [Kloosterman, Guryev, 2015]
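The ratio row can be recomputed from the two rows above it. The sketch reads the individual/family-specific values as 6.989 Mb and 43.317 Mb, which assumes the commas there are decimal separators; 4,084 bp is read as 4084 bp.

```python
# (single base changes, structural) per category, in the units of the table
affected = {
    "Human vs chimp (% of genome)":       (1.23, 3.0),
    "Common variants, AF > 5% (Mb)":      (5.948, 10.916),
    "Rare variants (Mb)":                 (6.625, 28.507),
    "Individual/family-specific (Mb)":    (6.989, 43.317),   # assumed decimals
    "De novo variants, avg per kid (bp)": (45, 4084),
}


def snv_to_sv_ratio(single_base, structural):
    """Structural bases affected per single-base change, rounded."""
    return round(structural / single_base)


for category, (snv, sv) in affected.items():
    print(f"{category}: 1 : {snv_to_sv_ratio(snv, sv)}")
```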

slide-41
SLIDE 41

Acknowledgements

GoNL SV Team
  • Wigard Kloosterman (UMCU)
  • Laurent C. Francioli (UMCU)
  • Jayne Y. Hehir-Kwa (UMCN)
  • Djie Tjwan Thung (UMCN)
  • Tobias Marschall (CWI/MPI)
  • Alexander Schoenhuth (CWI)
  • Matthijs Moed (LUMC)
  • Eric-Wubbo Lameijer (LUMC)
  • Abdel Abdellaoui (VU)
  • Slavik Koval (EMC/LUMC)
  • Joep de Ligt (UMCN)
  • Najaf Amin (EMC)
  • Freerk van Dijk (UMCG)
  • Lennart Karssen (EMC/Polyomica)
  • Leon Mei (LUMC)
  • Kai Ye (LUMC/WASHU)

University of Washington
  • Fereydoun Hormozdiari
  • Evan E. Eichler

GoNL steering committee
  • Paul de Bakker (UMCU)
  • Dorret Boomsma (VU)
  • Cornelia van Duijn (EMC)
  • Gert-Jan van Ommen (LUMC)
  • Eline Slagboom (LUMC)
  • Morris Swertz (UMCG)
  • Cisca Wijmenga (UMCG)

BGI Shenzhen
  • Jun Wang

ERIBA, RuG, UMC Groningen
  • Diana Spierings
  • Marianna Bevova
  • Rene Wardenaar
  • Tristan de Jong
  • Peter Lansdorp

Positions open:

  • PhD student,
  • Scientific programmer