Introduction to Variant Detection
Johnson et al., Blood, 2013, 122(19)
Evolutionary analysis
http://insects.eugenes.org/DroSpeGe/
http://www.nature.com/ncomms/2015/151019/ncomms9609/fig_tab/ncomms9609_F6.html
https://bacpathgenomics.wordpress.com/tag/snp/
Nature 491, 56–65 (01 November 2012)
analyses, and contained in databases like dbSNP, HapMap
confer a reproductive advantage
typically affecting traits like height, facial features, hair or skin color
alleles are affected
disappear, or have effects that minimally impact reproductive fitness
For SNPs, many different methods have been used:
For CNVs, the main methods are hybridization-based.
For SVs, the most reliable methods used partial sequencing of large clones (e.g. fosmids).
NGS can detect all types of variants (paired-end data preferred!).
(for SNPs/Indels, CNVs and SVs)
A visualization of an analysis using a panel of known cancer genes
For WGS
For Exome Sequencing
coverage
For Gene Panels
variants
Adapted from GATK best practices guidelines (2012)
Experimental design
Biological samples/Library preparation
Sequence reads (FASTQ)
Quality control (FASTQ)
Alignment to Genome (SAM/BAM)
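As a rough illustration of what the quality control step looks at in FASTQ data, the sketch below parses four-line FASTQ records and computes a mean Phred score per read. It assumes Phred+33 encoding and unwrapped four-line records; the function names and the toy reads are made up for illustration and are not part of the original slides.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def mean_phred(quality_string, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    scores = [ord(char) - offset for char in quality_string]
    return sum(scores) / len(scores)


# Two toy reads: one uniformly high quality, one with two very poor bases.
toy = StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!II\n")
records = list(read_fastq(toy))
for name, seq, qual in records:
    print(name, len(seq), mean_phred(qual))
```

Dedicated quality control tools report far more than a per-read mean (per-cycle quality, adapter content and so on); this only shows how the quality string maps to numbers.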
Alignment to Genome (SAM/BAM)
Alignment Cleanup (SAM/BAM): Sort alignment file; Deduplicate; Add read groups and merge alignment files (optional); Indel realignment (optional); Base Recalibration (optional); Reduce Reads (optional)
BAM ready for variant calling
Variant Calling (VCF)
Variant call filtering (VCF)
VCF ready for functional analysis
Annotating variant calls
Functional Analysis
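To make the "variant call filtering" step of the workflow a little more concrete, here is a minimal sketch that keeps only VCF records whose QUAL value meets a threshold. The threshold of 30, the function name and the toy records are assumptions chosen for illustration; real filtering strategies usually combine several annotations rather than QUAL alone.

```python
def filter_vcf_lines(lines, min_qual=30.0):
    """Keep header lines and records whose QUAL is at least min_qual."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # QUAL is the sixth VCF column
        if qual != "." and float(qual) >= min_qual:
            kept.append(line)
    return kept


toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\t.",
    "chr1\t200\t.\tC\tT\t10\t.\t.",
]
passed = filter_vcf_lines(toy_vcf)
print(len(passed))
```

Records with a missing QUAL (".") are dropped here, which is a design choice of this sketch and not a rule from the slides.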
Sort alignment file
https://www.broadinstitute.org/gatk/guide/presentations?id=3391
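Coordinate sorting can be pictured with a toy example: alignments are ordered first by the position of their chromosome in the reference, then by start coordinate. In practice this is done by dedicated tools (for example `samtools sort`) on SAM/BAM files; the tuples, the chromosome order and the function name below are hypothetical stand-ins.

```python
# Toy alignments: (read_name, chromosome, 1-based start position)
alignments = [
    ("r3", "chr2", 50),
    ("r1", "chr1", 300),
    ("r2", "chr1", 120),
]

# Order of sequences in a hypothetical reference header
chrom_order = {"chr1": 0, "chr2": 1}


def coordinate_sort(records, chrom_order):
    """Sort by reference sequence order, then by start position."""
    return sorted(records, key=lambda rec: (chrom_order[rec[1]], rec[2]))


sorted_alignments = coordinate_sort(alignments, chrom_order)
for record in sorted_alignments:
    print(record)
```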
Deduplicate
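The idea behind deduplication can be sketched as follows: reads that map to the same chromosome, start position and strand are treated as copies of one original fragment, and all but the best-scoring copy are flagged. This is a simplified single-end illustration under assumed field names (`qual_sum` and the others are hypothetical); real duplicate-marking tools also consider mate positions and other details.

```python
from collections import defaultdict


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    reads: list of dicts with keys name, chrom, pos, strand, qual_sum.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read["chrom"], read["pos"], read["strand"])].append(read)

    duplicates = set()
    for group in groups.values():
        # Keep the read with the highest summed base quality in each group.
        best = max(group, key=lambda r: r["qual_sum"])
        duplicates.update(r["name"] for r in group if r is not best)
    return duplicates


toy_reads = [
    {"name": "r1", "chrom": "chr1", "pos": 100, "strand": "+", "qual_sum": 300},
    {"name": "r2", "chrom": "chr1", "pos": 100, "strand": "+", "qual_sum": 350},
    {"name": "r3", "chrom": "chr1", "pos": 100, "strand": "-", "qual_sum": 320},
    {"name": "r4", "chrom": "chr1", "pos": 250, "strand": "+", "qual_sum": 310},
]
flagged = mark_duplicates(toy_reads)
print(sorted(flagged))
```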
Add read groups and merge alignment files (optional)
Sort, De-duplicate, Add read group information
BAM with multiple samples used for joint variant calling
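Read groups are what allow reads from several samples to share one BAM file and still be told apart. The sketch below builds SAM `@RG` header lines (using the standard `ID`, `SM`, `LB` and `PL` tags) and then assigns toy reads to samples through their read group ID. The sample names, lane names and helper functions are hypothetical.

```python
def rg_header_line(rg_id, sample, library, platform="ILLUMINA"):
    """Build a tab-separated SAM @RG header line."""
    return "\t".join(
        ["@RG", f"ID:{rg_id}", f"SM:{sample}", f"LB:{library}", f"PL:{platform}"]
    )


def sample_by_read_group(header_lines):
    """Map each read group ID to its sample name (SM tag)."""
    mapping = {}
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        mapping[tags["ID"]] = tags["SM"]
    return mapping


header = [
    rg_header_line("lane1", "sampleA", "libA"),
    rg_header_line("lane2", "sampleB", "libB"),
]
rg_to_sample = sample_by_read_group(header)

# Toy reads as (read_name, read_group_id), as carried by the RG tag of each record
reads = [("r1", "lane1"), ("r2", "lane2"), ("r3", "lane1")]
per_sample = {}
for name, rg_id in reads:
    per_sample.setdefault(rg_to_sample[rg_id], []).append(name)
print(per_sample)
```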
Indel realignment (optional)
Base Recalibration (optional)
This step removes systematic biases that creep in during sequencing
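One way to picture recalibration: group bases by the quality score the sequencer reported, count how often they actually mismatch the reference, and convert that observed error rate back to a Phred score. The sketch below is a deliberately simplified illustration, not the GATK model; the add-one smoothing, the function name and the toy counts are assumptions, and real tools exclude known variant sites and use additional covariates.

```python
import math
from collections import defaultdict


def empirical_qualities(observations):
    """Map each reported quality to an empirical Phred score.

    observations: iterable of (reported_quality, is_mismatch) pairs.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for reported_q, is_mismatch in observations:
        totals[reported_q] += 1
        errors[reported_q] += int(is_mismatch)

    table = {}
    for q in totals:
        # Add-one smoothing keeps the error rate above zero.
        error_rate = (errors[q] + 1) / (totals[q] + 2)
        table[q] = -10 * math.log10(error_rate)
    return table


# Toy data: 998 bases reported as Q30, of which 9 mismatch the reference.
toy_observations = [(30, True)] * 9 + [(30, False)] * 989
recalibrated = empirical_qualities(toy_observations)
print(round(recalibrated[30], 2))
```

In this toy case the bases were reported as Q30 (an expected error rate of 1 in 1000) but mismatch roughly 1 time in 100, so the empirical score comes out close to Q20.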
Reduce Reads (optional)
Probability of real variant “A” given observed variants “B” in alignments
From Erik Garrison’s CSHL Advanced Sequencing Tutorial, http://clavius.bc.edu/~erik/CSHL-advanced-sequencing/freebayes-tutorial.html
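The statement above is Bayes' theorem, P(A|B) = P(B|A) P(A) / P(B), applied to variant calling. A minimal sketch of the idea for a single diploid site follows: the observations are counts of reference and alternate reads, the likelihood is binomial, and the priors are illustrative numbers. The error rate, the priors and the function name are assumptions; this is not the model used by FreeBayes or any other specific caller.

```python
from math import comb


def genotype_posteriors(ref_count, alt_count, error_rate=0.01, priors=None):
    """Posterior probability of genotypes RR, RA and AA at one site."""
    if priors is None:
        # Illustrative priors only: most sites match the reference.
        priors = {"RR": 0.998, "RA": 0.001, "AA": 0.001}

    # Probability that a single read shows the alternate allele, per genotype
    alt_prob = {"RR": error_rate, "RA": 0.5, "AA": 1 - error_rate}

    n = ref_count + alt_count
    unnormalized = {}
    for genotype, p in alt_prob.items():
        likelihood = comb(n, alt_count) * p**alt_count * (1 - p) ** ref_count
        unnormalized[genotype] = likelihood * priors[genotype]

    total = sum(unnormalized.values())  # P(B), the normalizing constant
    return {g: value / total for g, value in unnormalized.items()}


posteriors = genotype_posteriors(ref_count=6, alt_count=4)
best_genotype = max(posteriors, key=posteriors.get)
print(best_genotype)
```

With 4 of 10 reads supporting the alternate allele, the heterozygous genotype comes out as most probable despite its small prior, because sequencing error alone is a poor explanation for that many alternate reads.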
In theory
In practice: probability of real variant “A” given observed variants “B” in alignments
[Figure labels: Reference, Reads, Variant observations]
[Figure label: Reference]
Haplotype information is lost.
Direct detection of haplotypes (FreeBayes)
[Figure labels: Detection window, Reference, Reads]
Direct detection of haplotypes from reads resolves differentially-represented alleles (as the sequence is compared, not the alignment). Allele detection is still alignment-based.
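The point about comparing sequence within a detection window can be illustrated with a toy example: for every read that fully spans the window, take the bases it shows across the window and count each distinct sequence as a candidate haplotype. The sketch assumes gapless alignments and 0-based coordinates, and the function name and reads are hypothetical; it is not the FreeBayes implementation.

```python
from collections import Counter


def haplotypes_in_window(reads, window_start, window_end):
    """Count sequences observed across [window_start, window_end).

    reads: list of (start, sequence) pairs, assumed to align without gaps.
    """
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq)
        if start <= window_start and end >= window_end:
            counts[seq[window_start - start:window_end - start]] += 1
    return counts


reference = "ACGTACGTAC"
reads = [
    (0, "ACGTACGTAC"),  # matches the reference
    (2, "GTTGGTAC"),    # carries two adjacent changes
    (1, "CGTTGGTA"),    # carries the same two changes
    (5, "CGTAC"),       # does not span the window, so it is skipped
]
window_counts = haplotypes_in_window(reads, 3, 6)
print(reference[3:6], dict(window_counts))
```

Because the two adjacent changes are read off together as one sequence, they are counted as a single alternate haplotype and not as two independent variant observations.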
[Figure labels: Cases, Controls, Genome]
Monya Baker, Nature Methods 9, 133–137 (2012)
These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.