SLIDE 1

Genotyping in Thousands by sequencing (GT-seq):

A low cost, high-throughput, targeted SNP genotyping method

Nathan Campbell, Stephanie Harmon, Shawn Narum
Columbia River Inter-Tribal Fish Commission

Introduction Pilot Study Implementation Markers New Tools Summary

SLIDE 2

What is GT-seq?

  • Next Gen Sequencing of multiplex PCR amplicons containing SNPs
  • Genotyping in Thousands by sequencing (GT-seq)
  • A method of Genotyping by Sequencing for thousands of individuals
  • Hundreds of loci (specific panels of target loci)
  • Currently set up for Illumina sequencing
  • SNP loci are genotyped using the ratio of allele 1 to allele 2 read counts at each locus (similar to RAD genotyping)
  • Alternative to TaqMan assays
  • Our lab formerly ran panels of 96 – 192 TaqMan assays for genotyping various fish species

  • GT-seq can produce the same genotypes generated using TaqMan assays
  • GT-seq greatly reduces the cost of lab reagents/supplies for genotyping

SLIDE 3

SLIDE 4

SLIDE 5

2,068 samples in one lane

  • 22 plates of 94 steelhead samples
  • 157M raw reads
  • 5.4 – 8.6M reads per plate
  • 96.1% of the samples (1,987) genotyped at ≥ 90% of target loci

[Figure: Percentage of Genotypes Collected vs. Individual Raw Reads]

[Figure: Percentage of Genotypes Collected vs. Individual On-Target Reads]

[Figure: Percentage of Genotypes Collected vs. On-Target Fraction, per 96-well plate]
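The headline numbers on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated on the slide and treats "157M" as exactly 157,000,000, which is an assumption. The per-sample and per-plate means are derived values, not figures reported by the authors.

```python
# Figures taken from the slide
plates = 22
samples_per_plate = 94
raw_reads = 157_000_000      # "157M", assumed to be exact for this sketch
samples_passing = 1_987      # samples genotyped at >= 90% of target loci

total_samples = plates * samples_per_plate
pass_rate = 100 * samples_passing / total_samples

# Derived values (not stated on the slide)
mean_reads_per_sample = raw_reads / total_samples
mean_reads_per_plate = raw_reads / plates

print(total_samples)                         # samples in the lane
print(round(pass_rate, 1))                   # percent of samples passing
print(round(mean_reads_per_sample))          # mean raw reads per sample
print(round(mean_reads_per_plate / 1e6, 1))  # mean raw reads per plate, millions
```

The mean per plate falls inside the 5.4 – 8.6M range quoted on the slide.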

SLIDE 6

Read distribution among loci

[Figure: % Genotypes Collected and Percentage of On-Target reads across the 192 target loci]

SLIDE 7

GT-seq has very low background signal and excellent heterozygote ratios across loci

The graphs below plot allele 1 vs. allele 2 for all GT-seq loci. The orange line shows the 1:1 ratio of allele 1 to allele 2, which should hold for all heterozygotes (y = x). The other lines show the cutoff values used in the genotyping script. [Below Red = A1 homozygote; Left of Blue = A2 homozygote; Between Blue or Red & Black = NA; Between Black = Heterozygote; Any data point below read depth of 10 = NA]. Genotypes are color coded; yellow triangles are “No Calls”. *Both graphs are the same plot zoomed to different scales.

GT-seq Plots

[Figure: GT-seq Genotyping. Allele 2 Counts vs. Allele 1 Counts, axes 100 to 800]

[Figure: GT-seq Genotyping (zoom). Allele 2 Counts vs. Allele 1 Counts, axes 5 to 50]
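The cutoff scheme described on this slide can be sketched as a small function. The slide gives the minimum read depth of 10 but not the slopes of the cutoff lines. The 10:1 homozygote cutoff, the 5:1 heterozygote cutoff, and the handling of a zero allele 2 count are illustrative assumptions, and `call_genotype` is a hypothetical name rather than the authors' script.

```python
def call_genotype(a1_count, a2_count, min_depth=10, hom_ratio=10.0, het_ratio=5.0):
    """Classify one locus from allele read counts.

    hom_ratio and het_ratio are assumed cutoffs (not stated on the slide).
    Returns 'A1HOM', 'A2HOM', 'HET', or 'NA' (no call).
    """
    if a1_count + a2_count < min_depth:
        return "NA"                      # below read depth of 10 (from the slide)
    if a2_count == 0:
        return "A1HOM"                   # assumption: no allele 2 reads at sufficient depth
    ratio = a1_count / a2_count
    if ratio >= hom_ratio:
        return "A1HOM"                   # "below red" region of the plot
    if ratio <= 1 / hom_ratio:
        return "A2HOM"                   # "left of blue" region of the plot
    if 1 / het_ratio <= ratio <= het_ratio:
        return "HET"                     # between the two black lines
    return "NA"                          # between a homozygote line and a black line


for counts in [(1192, 1), (0, 318), (562, 590), (70, 10), (3, 4)]:
    print(counts, call_genotype(*counts))
```

The first three count pairs are taken from the expanded genotyping output shown later in the deck; the last two are made up to exercise the two no-call regions.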

SLIDE 8

Comparison to the same TaqMan assays…

GT-seq Plots

[Figure: GT-seq Genotyping. Allele 2 Counts vs. Allele 1 Counts]

[Figure: TaqMan Genotyping. Allele 2 (6FAM) fluorescence vs. Allele 1 (VIC) fluorescence]

SLIDE 9

Genotype Accuracy Compared to TaqMan

                              A.       B.
#Genotyped by both methods    18607    18824
#TaqMan Genotypes             19039    19039
#GTseq Genotypes              18768    18988
#Concordant                   18474    18813
#Discordant                   133      10
%Concordant                   99.3%    99.9%

  • A. Unmodified TaqMan Probe sequences as search strings
  • B. 15 search string modifications based on observed variations in sequence data
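The concordance percentages follow from dividing the concordant genotypes by the genotypes called by both methods. The sketch below reproduces them from the counts on the slide; `concordance` is a hypothetical helper name. In column B the concordant and discordant counts sum to 18,823, one short of the 18,824 genotyped by both methods, so the numbers are taken at face value here.

```python
def concordance(concordant, genotyped_by_both):
    """Percent of genotypes that agree, among those called by both methods."""
    return 100 * concordant / genotyped_by_both


# A: unmodified TaqMan probe sequences as search strings
pct_a = concordance(18474, 18607)
# B: after 15 search string modifications
pct_b = concordance(18813, 18824)

print(round(pct_a, 1), round(pct_b, 1))
```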

SLIDE 10

Genotyping costs for GT-seq

[Figure: Total supplies cost of genotyping ($0 to $160,000) and sequencing cost per sample for one HiSeq SR100 lane ($0.00 to $10.00) vs. number of individual samples (1,000 to 10,000). Series: Sequencing cost per sample: GT-seq; Total cost: GT-seq genotyping; Total cost: 5' exonuclease]

GT-seq – $3.98/sample
TaqMan – $16.50/sample
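A short calculation makes the cost comparison concrete. The two per-sample prices come from the slide. Treating them as flat rates that do not change with project size is a simplifying assumption, since the chart indicates that sequencing cost per sample depends on how many samples share a lane. The 2,068-sample example reuses the pilot lane size, and `supplies_cost` is a hypothetical helper.

```python
GTSEQ_PER_SAMPLE = 3.98     # USD, from the slide
TAQMAN_PER_SAMPLE = 16.50   # USD, from the slide


def supplies_cost(n_samples, per_sample):
    """Total supplies cost under an assumed flat per-sample rate."""
    return n_samples * per_sample


cost_ratio = GTSEQ_PER_SAMPLE / TAQMAN_PER_SAMPLE
saving_per_sample = TAQMAN_PER_SAMPLE - GTSEQ_PER_SAMPLE

n = 2068  # size of the pilot lane, used only as an example
print(round(cost_ratio, 3))
print(round(saving_per_sample, 2))
print(round(supplies_cost(n, GTSEQ_PER_SAMPLE), 2))
print(round(supplies_cost(n, TAQMAN_PER_SAMPLE), 2))
```

Under these assumptions GT-seq costs just under a quarter of the TaqMan price per sample.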

SLIDE 11

Benefits of GT-seq

  • Open source method
  • Not a kit; Not proprietary
  • Fast/simple library preparation
  • Requires only a 96-well thermal cycler
  • Simple genotyping pipeline
  • New faster scripts (raw data to genotypes in less than 1 hour)
  • Clean data
  • Low background, high accuracy, high throughput
  • Generates the same genotypes as TaqMan (99.9% concordant)
  • Less than 1/4th the cost ($3.98 / sample)

SLIDE 12

Applications for GT-seq

  • Stock Improvement (marker assisted selection)
  • Create panel of trait related SNPs
  • Quickly generate GEBVs for large numbers of potential broodstock
  • Genetic Monitoring
  • Use large numbers of neutral SNP loci to monitor abundance and dispersal of various stocks within a target species

  • Large-Scale Parentage
  • Assign juvenile samples to potential parents in genotype database

SLIDE 13

GT-seq Panels

  • O. mykiss (Steelhead and Rainbow trout): 192 loci
  • Recently expanded to 287 loci
  • Campbell et al. 2015
  • O. tshawytscha (Chinook Salmon): 299 loci
  • Includes SNPs from TaqMan assays and RAD markers
  • O. kisutch (Coho Salmon): 258 loci
  • Includes SNPs from TaqMan assays and RAD markers
  • O. nerka (Sockeye Salmon): 93 loci
  • All SNPs were converted from previously developed TaqMan assays

  • E. tridentatus (Pacific Lamprey): 316 loci
  • All SNP targets are from RAD markers

SAMPLES GENOTYPED BY GT-SEQ

  • O. tshawytscha, 73725
  • O. mykiss, 15969
  • O. kisutch, 2839
  • O. nerka, 6996
  • E. tridentatus, 6037

105,566 Total samples genotyped at end of 2015

SLIDE 14

Lessons Learned

  • GT-seq is somewhat sensitive to DNA concentration and quality
  • Low concentration DNA samples (<5 ng/uL) genotype poorly
  • Dirty DNA extracts okay
  • Higher concentration is more important than purity

[Figure: Genotyping Percentage vs. DNA concentration (ng/uL, log scale from 0.01 to 1000), for Qiagen and Chelex extractions]

SLIDE 15

GT-seq target loci

  • Few limitations for target SNPs
  • Avoid repetitive sequence (messy amplification)
  • Avoid duplicated loci
  • Enough flanking sequence for primer design
  • Diploid Organisms
  • Tetraploid genotyping is possible but hasn’t been explored
  • Most SNPs are viable targets

SLIDE 16

GT-seq targets from RAD loci

  • Advantages
  • Thousands of SNPs to choose from
  • Summary statistics available (Fst, MAF, etc…)
  • Samples with known genotypes
  • Caveats
  • SNP site must be at position 25 or beyond in RAD sequence
  • Must have enough flanking sequence to design primers surrounding SNP
  • Strategy
  • Gather R1 and R2 sequences from specified RAD loci and create scaffolds
  • Mask any base ambiguities and design primers flanking target SNP

SLIDE 17

Designing GT-seq primers for RAD loci

  • Sample RAD specific sequences from raw fastq data (100)
  • https://github.com/GTseq
  • Collect coordinates from R1 sequences and gather corresponding R2 sequences for scaffolding
  • Export masked consensus sequences for primer design (Primer3)
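The masking step can be illustrated with a minimal sketch. It assumes the sampled reads for a locus are already aligned and of equal length, which the slides do not state. It writes `N` at every column where the reads disagree so that primer design can avoid variable sites. `masked_consensus` and `snp_is_usable` are hypothetical helpers, not the scripts from the GTseq repository.

```python
def masked_consensus(aligned_reads):
    """Return a consensus string with 'N' wherever the reads disagree.

    Assumes all reads are pre-aligned and the same length.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be the same length")

    consensus = []
    for column in zip(*aligned_reads):
        bases = {base.upper() for base in column} - {"N"}
        # One distinct base: keep it. Zero or several: mask the column.
        consensus.append(bases.pop() if len(bases) == 1 else "N")
    return "".join(consensus)


def snp_is_usable(snp_position, min_position=25):
    """Caveat from the deck: the SNP must sit at position 25 or beyond (1-based)."""
    return snp_position >= min_position


reads = ["ACGTACGT", "ACGTACGT", "ACGAACGT"]
print(masked_consensus(reads))
print(snp_is_usable(12), snp_is_usable(40))
```

The target SNP itself is masked along with any other variable site, which is consistent with designing primers that flank the SNP rather than overlap it.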

SLIDE 18

New GT-seq tools

  • Faster barcode splitting script
  • Python script using multiple processors
  • Fewer compute resources for faster barcode splitting (20 min)
  • Faster/expanded genotyping script
  • Perl script (100x faster than original; 6 min)
  • Generates summary statistics for each individual sample
  • Allows for allele corrections
  • GTseq_SummaryFigures
  • Python script using the Matplotlib module to generate summary figures for any GT-seq library

  • Outputs scatter plots for each SNP locus
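Barcode splitting amounts to binning each read by the index sequence that identifies its sample. The sketch below is a single-process illustration, not the multi-processor script described above. It assumes the index pair appears after the last colon of each FASTQ header line, as in Illumina's `...:i7+i5` header layout, and that a lookup from index pair to sample name is available. `split_by_barcode`, the sample name, and the read data are hypothetical.

```python
from collections import defaultdict
from itertools import islice


def split_by_barcode(fastq_lines, barcode_to_sample):
    """Group 4-line FASTQ records by the index string ending each header line.

    Reads whose index is not in barcode_to_sample are dropped.
    """
    bins = defaultdict(list)
    lines = iter(fastq_lines)
    while True:
        record = list(islice(lines, 4))   # header, sequence, '+', qualities
        if len(record) < 4:
            break
        index = record[0].rstrip().rsplit(":", 1)[-1]
        sample = barcode_to_sample.get(index)
        if sample is not None:
            bins[sample].append(record)
    return dict(bins)


reads = [
    "@read1 1:N:0:ACGTAC+TTGCAA", "ACGTACGT", "+", "IIIIIIII",
    "@read2 1:N:0:GGGGGG+CCCCCC", "TTTTACGT", "+", "IIIIIIII",
]
binned = split_by_barcode(reads, {"ACGTAC+TTGCAA": "plate1_A01"})
print({sample: len(records) for sample, records in binned.items()})
```

Because each record is binned independently, the input could be cut into chunks and handed to separate worker processes, which is one way a multi-processor version might be organized.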

SLIDE 19

Expanded genotyping output

i035_11_P3252_OtsUIBonn13-1276.fastq,Raw-Reads:428003,On-Target reads:267825,%On-Target:62.6,IFI_score:0.42
Ots_100884-287,T=1192,C=1,1192.000,TT,A1HOM,0,0,1193,95.7,0.445
Ots_101119-381,T=0,C=318,0.000,CC,A2HOM,0,0,318,83.2,0.119
Ots_101554-407,C=562,G=590,0.953,CG,HET,0,0,1152,68.9,0.430
Ots_101704-143,T=3,G=921,0.003,GG,A2HOM,0,0,924,93.5,0.345
Ots_101770-82,G=988,T=1,988.000,GG,A1HOM,0,0,989,97.2,0.369
Ots_102213-210,A=0,G=213,0.000,GG,A2HOM,0,0,213,96.4,0.080
Ots_102414-395,A=108,G=95,1.137,AG,HET,0,0,203,95.8,0.076
Ots_102457-132,A=365,G=359,1.017,AG,HET,0,0,724,80.7,0.270
Ots_102801-308,C=267,A=281,0.950,CA,HET,0,0,548,90.3,0.205
Ots_102867-609,A=1329,G=6,221.500,AA,A1HOM,0,0,1335,95.8,0.498
Ots_103041-52,A=1582,G=4,395.500,AA,A1HOM,0,0,1586,94.1,0.592
Ots_103122-180,T=3,C=1253,0.002,CC,A2HOM,0,0,1256,6.4,0.469
Ots_104048-194,C=356,T=348,1.023,CT,HET,0,0,704,43.6,0.263
Ots_104063-132,C=531,T=2,265.500,CC,A1HOM,0,0,533,96.2,0.199
Ots_104415-88,C=406,T=2,203.000,CC,A1HOM,0,0,408,96.7,0.152
Ots_105105-613,C=1,G=369,0.003,GG,A2HOM,0,0,370,94.6,0.138
Ots_105132-200,G=1,T=1228,0.001,TT,A2HOM,0,0,1229,80.3,0.459
…
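The output above is comma-separated and can be loaded with a short parser. The slide does not label the columns, so the field names below are inferred from the values and should be read as assumptions. Fields 7 and 8 are taken to be the allele correction values. Field 9 equals the sum of the two allele counts, and field 11 matches field 9 as a percentage of the sample's on-target reads. Field 10 is kept under a neutral name because its meaning is not stated. `parse_header` and `parse_locus` are hypothetical helpers.

```python
def parse_header(line):
    """Split the first line into the file name and its key:value summary fields."""
    name, *pairs = line.strip().split(",")
    summary = dict(pair.split(":", 1) for pair in pairs)
    return name, summary


def parse_locus(line):
    """Parse one per-locus line; field names are inferred, not documented."""
    f = line.strip().split(",")
    allele1, count1 = f[1].split("=")
    allele2, count2 = f[2].split("=")
    return {
        "locus": f[0],
        "allele1": allele1, "a1_count": int(count1),
        "allele2": allele2, "a2_count": int(count2),
        "ratio": float(f[3]),
        "genotype": f[4],
        "call": f[5],                       # A1HOM, A2HOM, or HET in the sample shown
        "a1_correction": float(f[6]),       # assumed meaning
        "a2_correction": float(f[7]),       # assumed meaning
        "locus_reads": int(f[8]),           # equals a1_count + a2_count in the sample
        "unlabeled_pct": float(f[9]),       # meaning not stated on the slide
        "pct_of_on_target": float(f[10]),   # locus_reads / on-target reads * 100
    }


header = ("i035_11_P3252_OtsUIBonn13-1276.fastq,Raw-Reads:428003,"
          "On-Target reads:267825,%On-Target:62.6,IFI_score:0.42")
name, summary = parse_header(header)
row = parse_locus("Ots_101554-407,C=562,G=590,0.953,CG,HET,0,0,1152,68.9,0.430")

on_target = int(summary["On-Target reads"])
print(name, summary["%On-Target"])
print(row["genotype"], row["call"], round(100 * row["locus_reads"] / on_target, 3))
```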

SLIDE 20

GT-seq Summary Figures

SLIDE 21

GT-seq Summary Figures

SLIDE 22

GT-seq Summary Figures

SLIDE 23

GT-seq Summary Figures

SLIDE 24

GT-seq Summary Figures

SLIDE 25

GT-seq Summary Figures

SLIDE 26

GT-seq Summary Figures

SLIDE 27

GT-seq Summary Figures

SLIDE 28

Allele corrections

SLIDE 29

Allele corrections

SLIDE 30

Individual Samples

SLIDE 31

Individual Samples

SLIDE 32

Summary

  • Open Source
  • Same instrument for SNP discovery and HT genotyping
  • Better/Cleaner data
  • Fast library prep
  • Fast genotyping
  • Summary figures
  • Cheaper genotyping costs
