Introduction to Variant Detection



SLIDE 1

Introduction to Variant Detection

Johnson et al., Blood, 2013, 122(19)

SLIDE 2

http://insects.eugenes.org/DroSpeGe/

Evolutionary analysis

SLIDE 3

Medicine and Agriculture

http://www.nature.com/ncomms/2015/151019/ncomms9609/fig_tab/ncomms9609_F6.html

https://bacpathgenomics.wordpress.com/tag/snp/

SLIDE 4

Genomic medicine

SLIDE 5

Overview

» Human variations

  • Germline
  • Somatic

» Types of Variations
» Sequencing strategies to identify variants
» Generalized analysis workflow (GATK best practice guidelines)

SLIDE 7

Germline vs Somatic mutations

Nature 491, 56–65 (01 November 2012)

Any heritable “mutation” is considered a germline variant.

  • found in populations, discovered by large-scale population analyses, and contained in databases like dbSNP and HapMap
  • most are not deleterious

A somatic variant is any mutation that arises in a single cell of an individual and is present only in the descendants of that cell, not in all the cells of that individual.

  • commonly found in rapidly growing cancer cells
  • can be silent or pathogenic

SLIDE 9

Phenotypic impacts

Most human genomic variants have no phenotypic impact

  • Ones that do have an impact are either positively selected, i.e. they confer a reproductive advantage
  • Or they are selectively neutral. These are often associated with ethnic origin, typically affecting traits like height, facial features, hair or skin color

Some genomic variants have deleterious effects

  • Most of these are recessive: their effect is observed only if both alleles are affected
  • Those that are dominant will either be selected against and disappear, or have effects that minimally impact reproductive fitness

SLIDE 10

Types of variations

  • Single Nucleotide Polymorphisms (SNPs)
  • Small Insertions/Deletions (Indels)
  • Copy Number Variations (CNVs)
  • Structural Variations (SVs)

SLIDE 15

How to assess genomic diversity?

For SNPs, many different methods have been used:

  • Hybridization based, primarily SNP arrays
  • Enzyme-based methods, primarily oligonucleotide ligation and RFLP
  • Methods measuring physical properties of DNA

For CNVs, the main methods are hybridization based

For SVs, the most reliable methods used partial sequencing of large clones (e.g. fosmids)

NGS can detect all types of variants (paired-end data preferred!)

SLIDE 16

Sequencing strategies

  • Whole Genome Sequencing (WGS) (for SNPs/Indels, CNVs and SVs)
  • Exome Sequencing (for SNPs/Indels)
  • Gene Panels (for SNPs/Indels)

SLIDE 17

Whole genome sequencing

SLIDE 18

Exome sequencing

SLIDE 20

Gene panel sequencing for diagnostics

[Figure: a visualization of an analysis using a panel of known cancer genes; axes: patients and cancer genes]

SLIDE 21

Gene panels or ES or WGS: Which one is “better”?

  • Targeted gene panels are most commonly used for diagnostics/clinical work
  • Coverage: cost considerations for the various methods, based on the number of samples
  • Variants in un-targeted or non-exonic regions will be missed by gene panels and exome sequencing

SLIDE 22

Sequencing depth and cost

SLIDE 28

Sequencing depth?

For WGS

  • Haploid genome size => 3.2 Giga base pairs (3.2 billion)
  • Minimum 30x for WGS

For Exome Sequencing

  • Exome size => 33 Mega base pairs (33 million bases)
  • About 100 times smaller than WGS
  • 70x-100x for ES, with additional considerations for unevenness of coverage

For Gene Panels

  • 10x-20x coverage for gene panels for heterozygous germline variants
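
The depth figures above translate into sequencing yield with simple arithmetic: bases needed = target size x depth. The sketch below is only an illustration. The 150 bp read length and the function names are assumptions, not values from the slides, and uneven coverage is ignored.

```python
def bases_required(target_size_bp, depth):
    """Total bases to sequence for a given average depth."""
    return target_size_bp * depth


def reads_required(target_size_bp, depth, read_length_bp):
    """Number of reads needed, rounded up (integer ceiling division)."""
    total = bases_required(target_size_bp, depth)
    return -(-total // read_length_bp)


GENOME_BP = 3_200_000_000   # haploid genome size from the slide
EXOME_BP = 33_000_000       # exome size from the slide
READ_LENGTH_BP = 150        # assumption: a typical short-read length

wgs_reads = reads_required(GENOME_BP, 30, READ_LENGTH_BP)
exome_reads = reads_required(EXOME_BP, 100, READ_LENGTH_BP)

print(f"WGS at 30x:    {bases_required(GENOME_BP, 30):,} bases, {wgs_reads:,} reads")
print(f"Exome at 100x: {bases_required(EXOME_BP, 100):,} bases, {exome_reads:,} reads")
print(f"Genome/exome size ratio: {GENOME_BP / EXOME_BP:.0f}")
```

Even at the higher depth, the exome needs far fewer bases than the genome because the target is roughly 100 times smaller.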

SLIDE 29

Sequencing depth and cost

Adapted from GATK best practices guidelines (2012)

SLIDE 30

Generalized Variant Calling Workflow

  • Experimental design
  • Biological samples/Library preparation
  • Sequence reads (FASTQ)
  • Quality control (FASTQ)
  • Alignment to Genome (SAM/BAM)

SLIDE 35

Generalized Variant Calling Workflow

  • Alignment to Genome (SAM/BAM)
  • Alignment Cleanup (SAM/BAM): BAM ready for variant calling
  • Variant Calling (VCF)
  • Variant call filtering (VCF): VCF ready for functional analysis
  • Annotating variant calls
  • Functional Analysis

Alignment Cleanup steps:

  • Sort alignment file
  • Deduplicate
  • Add read groups and merge alignment files (optional)
  • Indel realignment (optional)
  • Base Recalibration (optional)
  • Reduce Reads (optional)
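
One way to read the workflow above is as a chain of stages in which each stage consumes the file format the previous one produced. The sketch below encodes that reading. The stage tuples and the `check_chain` helper are hypothetical, and the VCF output of the annotation stage is an assumption rather than something stated on the slide.

```python
# Each stage: (name, input format, output format).
# Formats follow the labels on the workflow slide, except the annotation
# output, which is assumed here to be an annotated VCF.
STAGES = [
    ("Alignment to genome", "FASTQ", "SAM/BAM"),
    ("Alignment cleanup", "SAM/BAM", "SAM/BAM"),
    ("Variant calling", "SAM/BAM", "VCF"),
    ("Variant call filtering", "VCF", "VCF"),
    ("Annotating variant calls", "VCF", "VCF"),
]


def check_chain(stages):
    """Return True if every stage consumes what the previous stage produced."""
    return all(prev[2] == nxt[1] for prev, nxt in zip(stages, stages[1:]))


for name, fmt_in, fmt_out in STAGES:
    print(f"{name}: {fmt_in} -> {fmt_out}")
print("Formats chain correctly:", check_chain(STAGES))
```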

SLIDE 36

Generalized Variant Calling Workflow

Alignment Cleanup (SAM/BAM in, SAM/BAM out): BAM ready for variant calling

  • Sort alignment file
  • Deduplicate
  • Add read groups and merge alignment files (optional)
  • Indel realignment (optional)
  • Base Recalibration (optional)
  • Reduce Reads (optional)

SLIDE 37

Generalized Variant Calling Workflow

Sort alignment file

https://www.broadinstitute.org/gatk/guide/presentations?id=3391

SLIDE 38

Generalized Variant Calling Workflow

https://www.broadinstitute.org/gatk/guide/presentations?id=3391

Deduplicate

SLIDE 40

Generalized Variant Calling Workflow

Add read groups and merge alignment files (optional)

  • Sort, de-duplicate, add read group information
  • BAM with multiple samples used for joint variant calling

https://www.broadinstitute.org/gatk/guide/presentations?id=3391

SLIDE 41

Generalized Variant Calling Workflow

https://www.broadinstitute.org/gatk/guide/presentations?id=3391

Indel realignment (optional)

SLIDE 43

Generalized Variant Calling Workflow

Base Recalibration (optional)

This step corrects systematic biases in the reported base quality scores that creep in during sequencing

https://www.broadinstitute.org/gatk/guide/presentations?id=3391
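
The idea behind recalibration can be illustrated by comparing the quality a sequencer reports with the error rate actually observed. The sketch below is a simplified toy, not GATK's implementation: it groups bases by reported Phred score only, applies add-one smoothing, and uses made-up observations. The function name and the data are assumptions.

```python
import math
from collections import defaultdict


def empirical_quality(observations):
    """Map each reported Phred score to an empirical Phred score.

    observations: iterable of (reported_q, is_mismatch) pairs, taken at
    positions assumed not to be real variants.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for reported_q, is_mismatch in observations:
        totals[reported_q] += 1
        errors[reported_q] += int(is_mismatch)

    result = {}
    for q in totals:
        # Add-one smoothing avoids an infinite quality when no errors are seen.
        error_rate = (errors[q] + 1) / (totals[q] + 2)
        result[q] = round(-10 * math.log10(error_rate), 1)
    return result


# Hypothetical data: 998 bases reported as Q30, 9 of them mismatches.
observations = [(30, True)] * 9 + [(30, False)] * 989
print(empirical_quality(observations))
```

In this made-up example the bases reported as Q30 behave more like Q20, which is the kind of systematic bias that recalibration is meant to adjust.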

SLIDE 44

Generalized Variant Calling Workflow

Reduce Reads (optional)

https://www.broadinstitute.org/gatk/guide/presentations?id=3391

SLIDE 46

Variant detection: in theory

Probability of real variant “A” given observed variants “B” in alignments

[Figure: Venn diagram of sets A and B within the universe]

From Erik Garrison’s CSHL Advanced Sequencing Tutorial, http://clavius.bc.edu/~erik/CSHL-advanced-sequencing/freebayes-tutorial.html

SLIDE 47

Variant detection: in practice

Probability of real variant “A” given observed variants “B” in alignments

[Figure: Venn diagram of sets A and B within the universe]

From Erik Garrison’s CSHL Advanced Sequencing Tutorial, http://clavius.bc.edu/~erik/CSHL-advanced-sequencing/freebayes-tutorial.html
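
The probability of a real variant A given an observation B follows Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B). The sketch below works through one example. The function name and all three input values are assumptions chosen for illustration, not figures from the tutorial.

```python
def posterior_real_variant(prior, sensitivity, false_positive_rate):
    """P(A|B): probability a variant is real given that it was observed.

    prior               = P(A), chance a site truly carries a variant
    sensitivity         = P(B|A), chance a real variant is observed
    false_positive_rate = P(B|not A), chance of an observation with no variant
    """
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Hypothetical numbers: variants are rare, observations are fairly reliable.
p = posterior_real_variant(prior=0.001, sensitivity=0.99, false_positive_rate=0.01)
print(f"P(real variant | observed) = {p:.3f}")
```

With a low prior, even a small false-positive rate leaves most single observations unconfirmed, which is why callers combine evidence from many reads.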

SLIDE 48

Finding variants one at a time

One position at a time

[Figure: reads aligned to the reference, with variant observations marked at a single position]

From Erik Garrison’s CSHL Advanced Sequencing Tutorial, http://clavius.bc.edu/~erik/CSHL-advanced-sequencing/freebayes-tutorial.html

SLIDE 49

Finding variants one at a time

One position at a time: haplotype information is lost.

[Figure: reads aligned to the reference, with variants at neighboring positions reported separately]

From Erik Garrison’s CSHL Advanced Sequencing Tutorial, http://clavius.bc.edu/~erik/CSHL-advanced-sequencing/freebayes-tutorial.html
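
A toy pileup makes the position-by-position approach concrete. The sketch below assumes gapless alignments given as (start, sequence) pairs and a hypothetical minimum alternate-allele fraction. It is not how any particular caller is implemented.

```python
from collections import Counter, defaultdict


def pileup_calls(reference, reads, min_alt_fraction=0.2):
    """Report non-reference bases one position at a time.

    reads: list of (start, sequence) gapless alignments (0-based start).
    Returns tuples of (position, ref_base, alt_base, alt_count, depth).
    """
    counts = defaultdict(Counter)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            counts[start + offset][base] += 1

    calls = []
    for pos in sorted(counts):
        depth = sum(counts[pos].values())
        for base, n in sorted(counts[pos].items()):
            if base != reference[pos] and n / depth >= min_alt_fraction:
                calls.append((pos, reference[pos], base, n, depth))
    return calls


reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTTCGG"), (2, "GTTCGG"), (3, "TACGTA")]
for call in pileup_calls(reference, reads):
    print(call)
```

The two alternate alleles come from the same reads, but the output lists them as independent sites, so the fact that they travel together on one haplotype is lost.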

SLIDE 50

Finding variants with FreeBayes (or GATK)

Direct detection of haplotypes (FreeBayes)

[Figure: reads aligned to the reference, with a detection window spanning several positions]

Direct detection of haplotypes from reads resolves differentially-represented alleles (as the sequence is compared, not the alignment). Allele detection is still alignment-based.

From Erik Garrison’s CSHL Advanced Sequencing Tutorial, http://clavius.bc.edu/~erik/CSHL-advanced-sequencing/freebayes-tutorial.html
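
The detection-window idea can be sketched by comparing whole read sequences across a window instead of single positions. The example below is a simplified illustration under assumed inputs (gapless reads as (start, sequence) pairs, a hand-picked window). It does not reproduce the actual FreeBayes algorithm.

```python
from collections import Counter


def window_haplotypes(reads, window_start, window_end):
    """Count the distinct sequences seen across [window_start, window_end).

    Only reads that span the whole window contribute a haplotype.
    """
    haplotypes = Counter()
    for start, seq in reads:
        end = start + len(seq)
        if start <= window_start and end >= window_end:
            haplotypes[seq[window_start - start:window_end - start]] += 1
    return haplotypes


reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTTCGG"), (2, "GTTCGG"), (3, "TACGTA")]

observed = window_haplotypes(reads, 4, 8)
print("Reference haplotype:", reference[4:8])
print("Observed haplotypes:", dict(observed))
```

Here both alternate bases appear together in a single observed haplotype, so their linkage is kept rather than reported as two unrelated sites.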

SLIDE 51

How many samples does one need for a given population study?

[Figure: variants across the genome in cases and controls, Scenario 1 and Scenario 2]

SLIDE 52

CNV and SV detection is more complex

CNVs and SVs

Monya Baker, Nature Methods 9, 133–137 (2012)

SLIDE 53

These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.