NGS II Illumina Sequencing Robert Kraaij Department of Internal - - PowerPoint PPT Presentation



slide-1
SLIDE 1

DepthOfCoverage

Genetics for Dummies 2017 NGS II – Illumina Sequencing

Robert Kraaij Department of Internal Medicine r.kraaij@erasmusmc.nl

slide-2
SLIDE 2
  • Data Analysis
  • Applications
  • Example: Exome Sequencing

Overview

slide-3
SLIDE 3

Things to be addressed

  • NGS: many short reads that might contain errors
  • Data analysis will handle these reads and errors

slide-4
SLIDE 4
  • Data Analysis
  • Applications
  • Example: Exome Sequencing

Overview

slide-5
SLIDE 5

cBot, flow cell, bridge PCR, HiSeq 2000

Illumina Sequencing

slide-6
SLIDE 6

Per Cycle Imaging

slide-7
SLIDE 7

G A T C

Per Cycle Imaging

slide-8
SLIDE 8

G good quality G poor quality

Per Cycle Base Calling

slide-9
SLIDE 9

Phred Score    Incorrect base    Accuracy
10             1 in 10           90 %
20             1 in 100          99 %
30             1 in 1000         99.9 %
40             1 in 10000        99.99 %
50             1 in 100000       99.999 %

0 to 93 → ASCII 33 to 126 = single character

Quality Scoring
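
The table follows from the Phred definition Q = -10 * log10(P), where P is the probability that the base call is wrong. A minimal sketch of the conversion and of the single-character encoding (ASCII offset 33); the function names are illustrative and not part of any library:

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is incorrect."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def phred_to_char(q, offset=33):
    """Encode a Phred score (0 to 93) as one ASCII character (33 to 126)."""
    return chr(q + offset)


def char_to_phred(c, offset=33):
    """Decode one quality character back to its Phred score."""
    return ord(c) - offset


# Reproduce the rows of the table: score, error probability, accuracy.
for q in (10, 20, 30, 40, 50):
    p = phred_to_error_prob(q)
    print(q, p, 1 - p)
```

The offset of 33 is what maps the score range 0 to 93 onto the printable characters '!' to '~'.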

slide-10
SLIDE 10

@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTC
+SEQ_ID
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>

FASTQ File
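
A FASTQ record is four lines: header, sequence, separator, and one quality character per base. A small sketch of reading one record, assuming the Phred+33 encoding from the Quality Scoring slide; `parse_fastq_record` is a hypothetical helper, not a library function:

```python
def parse_fastq_record(lines):
    """Parse the four lines of a single FASTQ record into a dict."""
    header, sequence, separator, quality = [line.rstrip("\n") for line in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return {
        "id": header[1:],
        "sequence": sequence,
        # One Phred score per base, decoded with offset 33.
        "phred": [ord(c) - 33 for c in quality],
    }


record = parse_fastq_record([
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTC",
    "+SEQ_ID",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>",
])
print(record["id"], len(record["sequence"]), min(record["phred"]), max(record["phred"]))
```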

slide-11
SLIDE 11

Alignment or Mapping of Reads

[Figure: a short read aligned to the reference genome (hg19)]

chromosome + position + strand

sample.bam

slide-12
SLIDE 12

Run QC and filtering

sample.bam

slide-13
SLIDE 13

sample.bam

  • both reads
  • quality scores
  • chromosome
  • position
  • quality flag
  • duplicate flag
  • off target flag

sorted BAM file
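
Several of the items listed above are stored as bits in the FLAG field of each BAM record. A sketch of decoding a few of those bits, using the values from the SAM specification; "off target" is not one of the standard flag bits, so it is assumed here to be derived separately by comparing read positions with the capture target:

```python
# Bit values as defined in the SAM specification.
FLAG_PAIRED = 0x1       # read is part of a pair
FLAG_UNMAPPED = 0x4     # read could not be aligned
FLAG_REVERSE = 0x10     # read aligned to the reverse strand
FLAG_QC_FAIL = 0x200    # read failed quality checks
FLAG_DUPLICATE = 0x400  # PCR or optical duplicate


def describe_flag(flag):
    """Return the state of a few FLAG bits as a dict of booleans."""
    return {
        "paired": bool(flag & FLAG_PAIRED),
        "unmapped": bool(flag & FLAG_UNMAPPED),
        "reverse_strand": bool(flag & FLAG_REVERSE),
        "qc_fail": bool(flag & FLAG_QC_FAIL),
        "duplicate": bool(flag & FLAG_DUPLICATE),
    }


# Hypothetical flag value: paired + reverse strand + duplicate.
print(describe_flag(0x1 | 0x10 | 0x400))
```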

slide-14
SLIDE 14

Coverage

[Figure: reads stacked over the reference sequence]

5x coverage

slide-15
SLIDE 15

Mean Coverage

Mean coverage = bases on target / size of target
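
Mean coverage is the number of sequenced bases that land on the target divided by the size of the target. A minimal sketch; the input numbers below are made up for illustration and are not results from the study:

```python
def mean_coverage(bases_on_target, target_size):
    """Mean depth: sequenced bases on target divided by target size (in bases)."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return bases_on_target / target_size


# Hypothetical sample: 2,381,400,000 on-target bases over a 44.1 Mb target.
print(mean_coverage(2_381_400_000, 44_100_000))  # 54.0
```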

slide-16
SLIDE 16

% of Bases Above a Certain Threshold

[Figure: reads stacked over the reference sequence]

5x 5x 4x 1x
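
Mean coverage hides unevenness, so a second metric is the share of target bases covered at or above some depth. A sketch, assuming the per-position depths are already available as a list; treating the four labels on this slide as four positions is only an illustration:

```python
def fraction_at_or_above(depths, threshold):
    """Fraction of positions whose depth is at least `threshold`."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for depth in depths if depth >= threshold)
    return covered / len(depths)


depths = [5, 5, 4, 1]  # illustrative per-position depths
print(fraction_at_or_above(depths, 4))  # 0.75
```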

slide-17
SLIDE 17

Variant Calling

[Figure: reads stacked over the reference sequence]

G = homozygous alternative

slide-18
SLIDE 18

Variant Calling

[Figure: reads stacked over the reference sequence]

A/G = heterozygous

slide-19
SLIDE 19

Variant Calling

[Figure: reads stacked over the reference sequence]

A/G = heterozygous?

slide-20
SLIDE 20

Variant Calling

[Figure: reads stacked over the reference sequence]

G

sequencing quality: good / poor
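
The last few slides suggest a simple rule of thumb: look at the bases that the reads show at one position, ignore the poor-quality ones, and judge the genotype from the share of non-reference bases. The sketch below is a deliberately naive illustration of that idea. The thresholds are arbitrary assumptions, and production callers such as GATK HaplotypeCaller use probabilistic models instead:

```python
def call_genotype(bases, qualities, ref, min_quality=20, het_low=0.2, het_high=0.8):
    """Naive genotype call at one position from read bases and Phred scores."""
    # Keep only bases whose quality passes the (assumed) cutoff.
    kept = [b for b, q in zip(bases, qualities) if q >= min_quality]
    if not kept:
        return "no call"
    alt_fraction = sum(1 for b in kept if b != ref) / len(kept)
    if alt_fraction < het_low:
        return "homozygous reference"
    if alt_fraction > het_high:
        return "homozygous alternative"
    return "heterozygous"


print(call_genotype("GGGGG", [30] * 5, ref="A"))       # all reads show G
print(call_genotype("GGAGA", [30] * 5, ref="A"))       # mix of A and G
print(call_genotype("GGA", [10, 10, 30], ref="A"))     # both G calls are poor quality
```

With few reads, as on the low-coverage slide, a single sequencing error can flip the call, which is why both coverage and base quality matter.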

slide-21
SLIDE 21

sample.vcf

  • chromosome
  • position
  • quality
  • annotations

VCF File
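
Each variant in a VCF file is one tab-separated line whose first eight columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. A sketch of reading those columns; the example line and its INFO keys are hypothetical, and `parse_vcf_line` is an illustrative helper rather than a library call:

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line has at least 8 columns")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]
    annotations = {}
    for entry in info.split(";"):
        if entry == ".":
            continue  # "." means no annotations
        key, _, value = entry.partition("=")
        annotations[key] = value if value else True  # bare keys are flags
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": annotations,
    }


# Hypothetical variant line, for illustration only.
variant = parse_vcf_line("chr1\t12345\t.\tA\tG\t50.0\tPASS\tDP=5;AF=0.5")
print(variant["chrom"], variant["pos"], variant["ref"], variant["alt"], variant["info"])
```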

slide-22
SLIDE 22

Variant Calling

[Figure: reads stacked over the reference sequence]

deletion = heterozygous

slide-23
SLIDE 23

Paired-End Sequencing

2 x 100 bp

slide-24
SLIDE 24

Variant Calling: Mate Pairs

normal

400 bp

deletion

800 bp

insertion

200 bp
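
The logic of this slide: the two reads of a pair are expected to map about 400 bp apart. If they map farther apart on the reference, the sample is probably missing sequence in between (deletion); if they map closer together, the sample probably carries extra sequence (insertion). A sketch of that comparison; the tolerance is an arbitrary assumption, and real tools model the full insert-size distribution:

```python
def classify_pair(observed_distance, expected_insert=400, tolerance=100):
    """Classify a read pair by its mapped distance on the reference (in bp)."""
    if observed_distance > expected_insert + tolerance:
        return "possible deletion"
    if observed_distance < expected_insert - tolerance:
        return "possible insertion"
    return "normal"


for distance in (400, 800, 200):
    print(distance, classify_pair(distance))
```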

slide-25
SLIDE 25

Variant Calling: Mate Pairs

normal

400 bp

translocation

slide-26
SLIDE 26

Variant Calling: Split Reads

genome

800 bp

mRNA (cDNA)

slide-27
SLIDE 27
  • Data Analysis
  • Applications
  • Example: Exome Sequencing

Overview

slide-28
SLIDE 28

Applications

  • Re-sequencing → full genome → SNPs and indels
  • Re-sequencing → mate pairs → structural variations
  • Re-sequencing → regional → SNPs and indels
  • Sequencing → de novo assembly
  • RNAseq
  • ChIPseq
  • …seq
slide-29
SLIDE 29

www.illumina.com

slide-30
SLIDE 30

Example: Exome Sequencing

slide-31
SLIDE 31

  • funding by NGI-NCHA, NWO, BBMRI
  • n > 3,000 samples of random set from RS-I
  • start May 2011; Nimblegen
  • part of “CHARGE-S” effort: >5,000 exomes across 4 cohorts (Framingham, CHS, ARIC, Rotterdam Study)
  • Expand with exome variants array?

CHARGE

Exome Sequencing

slide-32
SLIDE 32

Exome vs Full Genome

[Figure: exons along the genome]

genome: 3 Gb
exome: ~30 Mb

slide-33
SLIDE 33

Exome Sequencing Workflow

DNA isolation → Library preparation → Exome capture → Sequencing → Data analysis

slide-34
SLIDE 34


Exome capture

slide-35
SLIDE 35

Nimblegen SeqCap EZ v2 Capture

  • CCDS (Sept 2009)
  • miRBase (v14, Sept 2009)
  • RefSeq (Jan 2010)
  • 2,100,000 probes
  • 30,246 coding genes
  • 329,028 exons
  • 710 miRNAs
  • 36.5 Mb primary target
  • 44.1 Mb capture target
slide-36
SLIDE 36

Illumina TruSeq V3 2x100 PE Sequencing

slide-37
SLIDE 37

Data analysis: BWA-GATK pipeline

Demultiplexing
  • BclToFastQ (CASAVA)
  • Chastity Filter

Alignment
  • BWA (paired)
  • SortSam, MarkDuplicates (Picard)

Processing
  • Base Quality Score Recalibration, IndelRealignment (GATK)

Variant-Calling
  • HaplotypeCaller
  • VQSR
  • VarEval

Analysis
  • ANNOVAR, VCFtools
  • PlinkSeq, SKAT, R
  • Spotfire

slide-38
SLIDE 38

Sample QC and Variant QC

slide-39
SLIDE 39

RSX-2 Samples were sequenced to ~54x Mean Coverage

[Plot axes: Average Mean Depth of Coverage across the 44 Mb SeqCap Exome; Percentage of 44 Mb covered 10x or better]

slide-40
SLIDE 40

Mean Depth of Coverage by Flowcell

[Plot axes: Mean Depth of Coverage; Flowcell Number (Roughly Chronological Order)]

slide-41
SLIDE 41

Freemix Values by Flowcell

[Plot axes: Estimated Freemix Values; Flowcell Number (Roughly Chronological Order)]

slide-42
SLIDE 42

Determining Heterozygous Concordance versus 550k genotyping arrays

[Plot axes: Heterozygous Concordance; Flowcell Number (Roughly Chronological Order)]

slide-43
SLIDE 43

Comparing Concordance versus Freemix reveals cutoff around 13% correction

[Plot axes: Heterozygous Concordance; Estimated Freemix Values]

slide-44
SLIDE 44

Sample QC and Variant QC

slide-45
SLIDE 45

Number of Detected SNPs per Samples by Flowcell

Flowcell Number (Roughly Chronological Order)

slide-46
SLIDE 46

Heterozygous to Homozygous ratio per Sample by Flowcell

Flowcell Number (Roughly Chronological Order)
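
The heterozygous to homozygous ratio is a per-sample sanity check on the called genotypes. A sketch, assuming genotypes are given as pairs of allele indices (0 = reference) and that the ratio counts heterozygous calls against homozygous non-reference calls; the slide does not spell out the exact definition used, so that choice is an assumption:

```python
def het_hom_ratio(genotypes):
    """Ratio of heterozygous calls to homozygous non-reference calls."""
    het = sum(1 for a, b in genotypes if a != b)
    hom_alt = sum(1 for a, b in genotypes if a == b and a != 0)
    if hom_alt == 0:
        raise ValueError("no homozygous non-reference calls")
    return het / hom_alt


# Hypothetical genotype calls for one sample.
calls = [(0, 1), (1, 1), (0, 1), (0, 0), (1, 1)]
print(het_hom_ratio(calls))  # 1.0
```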

slide-47
SLIDE 47

purines

Transition to Transversion Ratio

pyrimidines

transversion transition
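
A transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T); every other single-base change is a transversion. A sketch of computing the ratio from a list of (reference, alternative) base pairs, assuming each pair is a true single-base substitution; the example SNPs are made up:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True if both bases are purines or both are pyrimidines."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snps):
    """Number of transitions divided by number of transversions."""
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    if transversions == 0:
        raise ValueError("no transversions in input")
    return transitions / transversions


# Hypothetical SNPs: three transitions and one transversion.
snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")]
print(ti_tv_ratio(snps))  # 3.0
```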

slide-48
SLIDE 48

Transition to Transversion Ratio per Sample by Flowcell

Flowcell Number (Roughly Chronological Order)

slide-49
SLIDE 49

QC and filtering results

slide-50
SLIDE 50
slide-51
SLIDE 51

Things to Remember

  • NGS: many short reads that might contain errors
  • coverage indicates the number of independent reads that cover a base → needed to analyse a genome
  • FASTQ file → sequence + quality scores
  • BAM file → aligned reads
  • VCF file → called variants + annotation