Analysis of structural genome variation in whole genome and exome sequencing data
Victor Guryev
November 13, 2018, 15th SNPs and human diseases course, Erasmus MC, Rotterdam
Our genomes: base and structural variants
Small variants: The 1000 Genomes Project Consortium, 2015. Nature 526:68-74
Structural variants: Sudmant et al, 2015. Nature 526:75-81
NGS: paired-end 90 bp reads from ~500 bp fragments
Median base coverage: 12x
Position paper: Boomsma et al, 2013
Small variants: Francioli et al, 2014
Structural variants: Hehir-Kwa et al, 2016
Feature | 1000 G | GoNL
DNA source | Cell lines | Blood
Coverage | 3-4x | >12x
Data generation | | BGI/Illumina
Population | Multiple, unrelated | Dutch only, trios, twins
Phenotype info | None | Multiple
Structural Genome Variations (SVs), shown on a reference segment ABCD:
Copy-number variants: duplication (ABCCCD), deletion (ABD)
Copy-balanced variants: inversion (ADCB), translocation (AB CD)
Detection methods: aCGH, di-tag (fosmid and NGS) sequencing, fibre-FISH
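The SV classes above can be mimicked with the same ABCD segment notation. The Python sketch below is a toy model with hypothetical function names: a chromosome is a list of segment labels, and an inversion only reverses segment order (a real inversion also reverse-complements the bases). Comparing segment counts separates copy-number variants from copy-balanced ones.

```python
from collections import Counter

# Toy model: a chromosome is a list of segment labels such as ["A", "B", "C", "D"].
# Simplification: inversion only reverses segment order; a real inversion also
# reverse-complements the bases inside the inverted segments.

def deletion(segs, i, j):
    """Copy-number loss: drop segments i..j-1."""
    return segs[:i] + segs[j:]

def tandem_duplication(segs, i, j, extra=1):
    """Copy-number gain: repeat segments i..j-1 `extra` more times, in tandem."""
    return segs[:j] + segs[i:j] * extra + segs[j:]

def inversion(segs, i, j):
    """Copy-balanced: reverse the order of segments i..j-1."""
    return segs[:i] + segs[i:j][::-1] + segs[j:]

def translocation(segs, i):
    """Copy-balanced: break the chromosome at i; each piece joins a different partner."""
    return segs[:i], segs[i:]

def is_copy_balanced(ref, alt):
    """True if every segment occurs as often in `alt` as in `ref`."""
    return Counter(ref) == Counter(alt)

ref = list("ABCD")
print("".join(deletion(ref, 2, 3)))                     # ABD
print("".join(tandem_duplication(ref, 2, 3, extra=2)))  # ABCCCD
print("".join(inversion(ref, 1, 4)))                    # ADCB
left, right = translocation(ref, 2)
print("".join(left), "".join(right))                    # AB CD
print(is_copy_balanced(ref, inversion(ref, 1, 4)))      # True
print(is_copy_balanced(ref, deletion(ref, 2, 3)))       # False
```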
[Figure: expected distribution of tags vs. distribution over a duplicated site; average coverage 5 WGS reads/site, rising to 10 WGS reads/site over the duplication]
Scope: copy-number changes
Tool examples: CNV-Seq (Xie & Tammi, 2009), CNVnator (Abyzov et al, 2011), SegSeq (Chiang et al, 2009), DWAC-Seq (our tool)
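The depth-of-coverage idea can be sketched in a few lines: bin reads into windows, compare each window with the typical depth, and merge consecutive deviating windows. This is a deliberately naive illustration, not the algorithm of CNV-Seq, CNVnator, SegSeq or DWAC-Seq; the function names, the median normalization and the 1.4/0.6 ratio thresholds are assumptions. A duplicated region like the one on the slide (~5 reads per site rising to ~10) gives a ratio near 2.

```python
from itertools import groupby
from statistics import median

def window_counts(read_starts, genome_len, window):
    """Count read start positions ("tags") in fixed, non-overlapping windows."""
    counts = [0] * ((genome_len + window - 1) // window)  # ceiling division
    for pos in read_starts:
        counts[pos // window] += 1
    return counts

def call_segments(counts, window, gain=1.4, loss=0.6):
    """Label each window by its count relative to the median window count and
    merge runs of gain/loss windows into (start, end, label) segments.
    The thresholds are illustrative assumptions, not values from any tool."""
    med = median(counts)
    labels = []
    for c in counts:
        ratio = c / med
        labels.append("gain" if ratio >= gain else "loss" if ratio <= loss else "normal")
    segments, start = [], 0
    for label, run in groupby(labels):
        n = len(list(run))
        if label != "normal":
            segments.append((start * window, (start + n) * window, label))
        start += n
    return segments

counts = [5, 6, 4, 5, 10, 11, 9, 5, 4, 2, 2, 5]
print(call_segments(counts, window=1000))
# [(4000, 7000, 'gain'), (9000, 11000, 'loss')]
```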
[Figure: read-pair signatures of sequenced vs. reference-mapped pairs for normal, inversion, tandem duplication, insertion, deletion and translocation (chr7/chr5)]
Scope: copy-number and copy-neutral SV at close to base-pair resolution
Tool examples: Breakdancer (Chen et al, 2009); 1-2-3-SV (our tool)
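Read-pair signatures can be summarized as a small decision rule on mate chromosome, orientation and distance. The sketch assumes a standard forward-reverse (FR) paired-end library, a 500 bp mean insert with a hypothetical 50 bp standard deviation, and approximates the insert size by the distance between mate positions. It classifies a single pair only; it is not Breakdancer's or 1-2-3-SV's model, which cluster many discordant pairs and score them statistically.

```python
from dataclasses import dataclass

@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str

def classify_pair(p, mean_insert=500, sd_insert=50, n_sd=3):
    """Classify one read pair by its mapping signature (FR library assumed)."""
    if p.chrom1 != p.chrom2:
        return "translocation"
    # Order the mates by position so that "left" is the leftmost mate.
    (lpos, lstr), (rpos, rstr) = sorted([(p.pos1, p.strand1), (p.pos2, p.strand2)])
    if lstr == rstr:
        return "inversion"            # both mates on the same strand
    if lstr == "-" and rstr == "+":
        return "tandem duplication"   # everted (RF) pair
    span = rpos - lpos                # simplification: distance between mate positions
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"             # mates map too far apart on the reference
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"            # mates map too close together
    return "concordant"

print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 4000, "-")))  # deletion
print(classify_pair(ReadPair("chr7", 100, "+", "chr5", 200, "-")))    # translocation
```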
Scope: prediction of copy-number and copy-neutral SV at nucleotide resolution
Tool examples: Pindel (Ye et al, 2009), SRiC (Zhang et al, 2011)
Evidence from multiple reads; advantage of paired reads: the mapped mate (anchor) localizes the search for its split-read partner
Unmapped reads are good candidates for split-mapping
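Split-read mapping can be illustrated for the simplest case, a deletion: cut the read into a left and a right piece and look for a split where both pieces match the reference exactly, in order, with a gap in between. This toy sketch (hypothetical function name) uses exact string matching, takes the first match of the left piece, and searches the whole reference, whereas tools such as Pindel use the mapped mate as an anchor to restrict the search window.

```python
def split_read_deletion(ref, read, min_len=5):
    """Try every split of `read` into a left and right piece (each >= min_len bases).
    If the left piece matches `ref` exactly and the right piece matches further
    downstream with a gap, return (deletion_start, deletion_length) in ref
    coordinates. Return None when no such split exists."""
    for k in range(min_len, len(read) - min_len + 1):
        left, right = read[:k], read[k:]
        a = ref.find(left)                  # first exact match of the left piece
        if a == -1:
            continue
        left_end = a + k                    # first reference base after the left piece
        b = ref.find(right, left_end + 1)   # require a gap of at least 1 base
        if b != -1:
            return left_end, b - left_end
    return None

ref = "GATTACACTG" + "TTTTTTTTTT" + "CAGGCATGCA"
read = "TTACACTG" + "CAGGCA"                 # spans the junction of a 10 bp deletion
print(split_read_deletion(ref, read))          # (10, 10)
print(split_read_deletion(ref, "ACACTGTTTT"))  # None (read maps contiguously)
```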
Scope: various types of SVs, including large insertions
Tool examples: de novo assemblers SOAPdenovo, ABySS, ALLPATHS-LG; BLAST/BLAT/BWA-SW search to compare contigs with the genome reference
[Figure: imperfect alignment of an assembled contig to the reference]
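Short-read assemblers such as SOAPdenovo, ABySS and ALLPATHS-LG are built on de Bruijn graphs of k-mers. The toy sketch below (hypothetical function names) shows only the core idea: split reads into k-mers, link overlapping (k-1)-mers, and spell a contig by walking the graph until the first branch or revisited node. Error correction, reverse complements, repeat resolution and scaffolding are left out; in an SV workflow the resulting contigs would then be aligned to the reference.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def walk_unitig(graph, start):
    """Extend from `start` while the node has exactly one successor; stop at a
    branch, a dead end or a node already visited. Return the spelled sequence."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

reads = ["GATTACA", "TACAGGC", "AGGCTT"]  # overlapping reads from GATTACAGGCTT
print(walk_unitig(de_bruijn(reads, k=4), "GAT"))  # GATTACAGGCTT
```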
Base coverage: ~1x; physical coverage: ~4x
Approaches compared by base and physical coverage: depth of coverage, discordant pairs, split-mapping, de novo assembly
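The two coverage measures differ only in what is counted along the genome: sequenced bases, or whole fragments including the unsequenced stretch between the mates. A minimal sketch with illustrative numbers (not taken from the slides) chosen to reproduce the ~1x base versus ~4x physical contrast; physical coverage is defined here over the full fragment length, which is one common convention.

```python
def base_coverage(n_pairs, read_len, genome_len):
    """Mean number of sequenced bases over each position (two reads per pair)."""
    return n_pairs * 2 * read_len / genome_len

def physical_coverage(n_pairs, fragment_len, genome_len):
    """Mean number of fragments spanning each position, sequenced or not."""
    return n_pairs * fragment_len / genome_len

# Illustrative numbers: 5,000 pairs of 100 bp reads from 800 bp fragments, 1 Mb genome.
print(base_coverage(5_000, 100, 1_000_000))      # 1.0
print(physical_coverage(5_000, 800, 1_000_000))  # 4.0
```

If the 90 bp and ~500 bp figures from the sequencing slide are read and fragment lengths, physical coverage would be roughly 500 / (2 * 90), or about 2.8 times the base coverage, which is why discordant-pair methods see more of the genome than the base coverage suggests.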
PINDEL (http://gmt.genome.wustl.edu/packages/pindel/): split-read mapping (very specific for short and mid-size variants)
DELLY (https://github.com/dellytools/delly): discordant read-pair and split-read methods
LUMPY-SV (https://github.com/arq5x/lumpy-sv): multi-method tool
SURVIVOR, MetaSV, Parliament: create consensus or multi-sample callsets
Parliament2: runs multiple tools (Breakdancer, BreakSeq, CNVnator, Delly, Lumpy, Manta) and creates a consensus callset; also available as a Docker container
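Consensus tools combine callsets from several callers. The sketch below illustrates only the basic idea under simple assumptions and is not the algorithm of SURVIVOR, MetaSV or Parliament2: calls of the same type on the same chromosome are grouped when both breakpoints lie within a hypothetical 1 kb of the group's first call, and a group is kept when at least two different callers support it. The caller names in the example are plain labels; no tool is run.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SVCall:
    caller: str
    svtype: str
    chrom: str
    start: int
    end: int

def consensus(calls, max_dist=1000, min_callers=2):
    """Group matching calls and keep groups supported by >= min_callers callers.
    The first (leftmost) call of each group provides the reported coordinates."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start)):
        for cluster in clusters:
            rep = cluster[0]
            if (rep.chrom == call.chrom and rep.svtype == call.svtype
                    and abs(rep.start - call.start) <= max_dist
                    and abs(rep.end - call.end) <= max_dist):
                cluster.append(call)
                break
        else:  # no matching cluster: start a new one
            clusters.append([call])
    result = []
    for cluster in clusters:
        callers = {c.caller for c in cluster}
        if len(callers) >= min_callers:
            rep = cluster[0]
            result.append((rep.chrom, rep.svtype, rep.start, rep.end, sorted(callers)))
    return result

calls = [
    SVCall("Delly", "DEL", "chr1", 10000, 15000),
    SVCall("Lumpy", "DEL", "chr1", 10050, 14980),
    SVCall("Manta", "DEL", "chr1", 10020, 15010),
    SVCall("CNVnator", "DUP", "chr1", 50000, 60000),
    SVCall("Delly", "DEL", "chr2", 500, 900),
]
print(consensus(calls))  # [('chr1', 'DEL', 10000, 15000, ['Delly', 'Lumpy', 'Manta'])]
```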
[Hehir-Kwa et al., 2016]
Per individual genome (compared to the reference genome):
3.7M SNPs
360k short indels (1-20 bp)
5.2k medium deletions (20-100 bp)
3.3k large deletions (100+ bp)
GoNL variant list:
SNPs: 20.4M
Short indels (1-20 bp): 1.7M
Deletions (20-99 bp): 31.5k
Deletions (100+ bp): 20k
Mobile element insertions: 13k
Insertions: 2.2k
Duplications: 1.8k
Inversions: 90
Interchromosomal events: 60
GoNL: bases affected
Variant type | Megabases
SNVs | 20.4
Indels | 4.3
SVs | 75.3
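A quick calculation on the GoNL numbers shows why SVs matter despite their low counts. Each SNV changes one base, so 20.4M SNVs correspond to 20.4 Mb, while SVs affect roughly 3.7 times as many bases.

```python
# Bases affected per variant type in GoNL (megabases, from the table above).
bases_mb = {"SNVs": 20.4, "Indels": 4.3, "SVs": 75.3}
total = sum(bases_mb.values())
for kind, mb in bases_mb.items():
    print(f"{kind}: {mb} Mb ({100 * mb / total:.1f}% of affected bases)")
print(f"SV/SNV bases: {bases_mb['SVs'] / bases_mb['SNVs']:.1f}")
# SNVs: 20.4 Mb (20.4% of affected bases)
# Indels: 4.3 Mb (4.3% of affected bases)
# SVs: 75.3 Mb (75.3% of affected bases)
# SV/SNV bases: 3.7
```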
[Figure: AluYa4 element]
[Hehir-Kwa et al., 2016]
[Figure: read pairs linking chr15:40.85 Mb and chr7:26.24 Mb in both directions (to chr7 and to chr15)]
Mechanism: (retro)transposition
Prevalence: about 40 cases in GoNL
Tools: discordant pairs (1-2-3-SV)
[Hehir-Kwa et al., 2016]
[Figure: a gene on chr13 is transcribed and spliced; the mRNA is reverse-transcribed and integrated into chr11]
Mechanism: polymerase errors
Prevalence: ~3% of all indels are non-simple
Tool example: GATK HaplotypeCaller
[Hehir-Kwa et al., 2016]
[Figure: trio (father, mother, child)]
Mechanism: gene conversion
Prevalence: only a few cases found so far
Tool examples: assembly, discordant pairs
KRAB box domain containing 4 (also known as ZNF673), a transcription regulator
[Hehir-Kwa et al., 2016]
Allele frequency in GoNL: 28%. 50% of the Dutch population have it as
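The 28% allele frequency and a carrier fraction of about one half are consistent under Hardy-Weinberg equilibrium, which is an assumption here (the slide does not state it). A short check:

```python
def genotype_frequencies(p):
    """Hardy-Weinberg genotype frequencies for an allele with frequency p."""
    q = 1 - p
    return {"homozygous": p * p, "heterozygous": 2 * p * q, "non-carrier": q * q}

freqs = genotype_frequencies(0.28)
carriers = freqs["homozygous"] + freqs["heterozygous"]
print(f"heterozygous: {freqs['heterozygous']:.1%}")  # 40.3%
print(f"homozygous:   {freqs['homozygous']:.1%}")    # 7.8%
print(f"carriers:     {carriers:.1%}")               # 48.2%
```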
[Figure: breakpoint coordinates on chr1, chr4 and chr10; marked positions = DNA double-strand breaks]
De novo candidate indels (1-20 bp; 99 children): 1,169
De novo candidate SVs (20+ bp; 250 families, 258 non-identical children): 601
Result: 291 de novo indels and 41 de novo SVs (validation by PCR and sequencing)
Genome Res (2015) 25:792–801
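Candidate de novo variants come from comparing each child with both parents. A minimal genotype-level sketch, assuming genotypes coded as alternative-allele counts (0, 1, 2) and a hypothetical minimum read depth; real pipelines also use genotype likelihoods and artefact filters, and candidates still need experimental validation.

```python
def transmissible(genotype):
    """Alleles a parent can pass on; genotype = number of alt alleles (0, 1 or 2)."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]

def mendelian_consistent(child, father, mother):
    """True if the child's genotype can be built from one allele of each parent."""
    return any(f + m == child for f in transmissible(father) for m in transmissible(mother))

def is_de_novo_candidate(child, father, mother, depths, min_depth=10):
    """Child carries the alt allele, neither parent does, and all three samples
    have at least `min_depth` reads at the site (hypothetical threshold)."""
    if min(depths) < min_depth:
        return False  # too little data to trust the absence in a parent
    return child >= 1 and father == 0 and mother == 0

print(is_de_novo_candidate(1, 0, 0, depths=(30, 25, 28)))  # True
print(is_de_novo_candidate(1, 1, 0, depths=(30, 25, 28)))  # False (inherited from father)
print(mendelian_consistent(1, 0, 0))                       # False
```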
De novo indels and SVs: non-uniform distribution
[Figure: WGS and WES data for father, mother and child]
Tool examples: GATK HaplotypeCaller, CoNIFER, ExomeCNV
[Figure: WGS and WES tracks for father, mother and child with gene annotation; deletion (Del) marked]
Heterozygous deletion in the father, inherited by the child
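Exome read depth is dominated by exon-to-exon differences in capture efficiency, so a sample's depth is more informative when compared exon by exon with other samples captured the same way. The sketch below (hypothetical names) normalizes each sample by its total read count and divides by the panel median per exon; it is neither CoNIFER's SVD-based method nor ExomeCNV's statistical model. A heterozygous deletion should give a ratio near 0.5.

```python
from statistics import median

def normalize(counts):
    """Scale per-exon read counts by the sample's total (library-size normalization)."""
    total = sum(counts)
    return [c / total for c in counts]

def exon_ratios(sample, panel):
    """Ratio of the sample's normalized depth to the panel median, per exon."""
    s = normalize(sample)
    p = [normalize(other) for other in panel]
    ratios = []
    for i, value in enumerate(s):
        ref = median(other[i] for other in p)
        ratios.append(value / ref if ref > 0 else float("nan"))
    return ratios

panel = [[100, 300, 50, 200], [110, 290, 55, 195], [95, 310, 45, 210]]
sample = [100, 300, 25, 200]  # third exon at roughly half the expected depth
print([round(r, 2) for r in exon_ratios(sample, panel)])  # [1.04, 1.04, 0.52, 1.04]
```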
Variant type | Human vs. chimp | Common variants (AF > 5%) | Rare variants | Individual/family-specific | De novo variants (avg per kid) | Somatic, ageing-related
Single base changes | 1.23% of genome | 5.948 Mb | 6.625 Mb | 6.989 Mb | 45 bp | ?
Structural | 3% of genome | 10.916 Mb | 28.507 Mb | 43.317 Mb | 4,084 bp | ?
SNV:CNV ratio
[Chimp genome consortium, 2005] [Hehir-Kwa, Guryev, 2016] [Kloosterman, Guryev, 2015]
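The SNV:CNV ratio can be derived from the table by dividing, per column, the bases in structural variants by the bases in single-base changes. A short sketch; units cancel within each column, and the individual/family-specific values are taken as 6.989 Mb and 43.317 Mb, reading the comma in those cells as a decimal separator (an assumption: thousands of megabases would exceed the genome size). The ratios are computed here, not copied from the slides.

```python
columns = ["human vs chimp", "common", "rare", "individual/family-specific", "de novo"]
single_base = [1.23, 5.948, 6.625, 6.989, 45]     # % genome, Mb, Mb, Mb, bp per child
structural = [3.0, 10.916, 28.507, 43.317, 4084]  # same units as single_base, per column

def sv_to_snv_ratios(single_base, structural):
    """Bases affected by structural variants per base affected by single-base changes."""
    return [sv / snv for snv, sv in zip(single_base, structural)]

for name, ratio in zip(columns, sv_to_snv_ratios(single_base, structural)):
    print(f"{name}: 1 : {ratio:.1f}")
# human vs chimp: 1 : 2.4
# common: 1 : 1.8
# rare: 1 : 4.3
# individual/family-specific: 1 : 6.2
# de novo: 1 : 90.8
```

In these numbers the ratio grows from common to rare to de novo variants.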
GoNL SV Team: Wigard Kloosterman (UMCU), Laurent C. Francioli (UMCU), Jayne Y. Hehir-Kwa (UMCN), Djie Tjwan Thung (UMCN), Tobias Marschall (CWI/MPI), Alexander Schoenhuth (CWI), Matthijs Moed (LUMC), Eric-Wubbo Lameijer (LUMC), Abdel Abdellaoui (VU), Slavik Koval (EMC/LUMC), Joep de Ligt (UMCN), Najaf Amin (EMC), Freerk van Dijk (UMCG), Lennart Karssen (EM/Polyomica), Leon Mei (LUMC), Kai Ye (LUMC/WASHU)
University of Washington: Fereydoun Hormozdiari, Evan E. Eichler
GoNL steering committee: Paul de Bakker (UMCU), Dorret Boomsma (VU), Cornelia van Duijn (EMC), Gert-Jan van Ommen (LUMC), Eline Slagboom (LUMC), Morris Swertz (UMCG), Cisca Wijmenga (UMCG)
BGI Shenzhen: Jun Wang
ERIBA, RuG, UMC Groningen: Diana Spierings, Marianna Bevova, Rene Wardenaar, Tristan de Jong, Peter Lansdorp
Positions open: