Library Building & Bioinformatic Pipeline DAY 1 Building a - - PDF document

library building bioinformatic pipeline day 1
SMART_READER_LITE
LIVE PREVIEW

Library Building & Bioinformatic Pipeline DAY 1 Building a - - PDF document

3/29/2017 Library Building & Bioinformatic Pipeline DAY 1 Building a Library Step one, build a plate with your samples to the proper concentrations Excel Spreadsheet logs the library FL_148_NC15, total DNA 200ng, in 26 uL 1 2


slide-1
SLIDE 1

3/29/2017 1

Library Building & Bioinformatic Pipeline DAY 1

Building a Library

  • Step one, build a plate

with your samples to the proper concentrations

  • Excel Spreadsheet logs

the library

A B C D E F G H

1 2 3 4 5 6 7 8 9 10 11 12 FL_148_NC15, total DNA 200ng, in 26 uL

slide-2
SLIDE 2

3/29/2017 2

Extract high quality genomic DNA from individuals (Qiagen Kits), spec DNA concentrations Digest equal amounts

  • f DNA from each

individual (ideally >200ng) with SphI/MluCI on 96-well plate Ligate unique barcode and general Y-adapter to each sample. Pool all individual samples into one tube, size-select on Pippin Prep for 300- 500bp Then PCR 8x (PCR indices in at this step)

Purify Purify BioAnalyze

Burford Reiskind et al. 2016 Mol. Ecol. Resources

Extract high quality genomic DNA from individuals (Qiagen Kits), spec DNA concentrations Digest equal amounts

  • f DNA from each

individual (ideally >200ng) with SphI/MluCI on 96-well plate Ligate unique barcode and general Y-adapter to each sample. Pool all individual samples into one tube, size-select on Pippin Prep for 300- 500bp Then PCR 8x (PCR indices in at this step)

Purify Purify BioAnalyze

Burford Reiskind et al. 2016 Mol. Ecol. Resources

slide-3
SLIDE 3

3/29/2017 3

  • 96 well plate, each well holds 200ng of 1 individual’s DNA
  • First digest for 3 hours – SphI & MlucI, cutter buffer, template
  • Ligation attach P1 and P2 adapters to the sticky ends, P1 has the barcode (48 in

total), and illumina platform binding sequemce

well A1: FL_148_NC15; barcode ATGGT

A B C D E F G H

1 2 3 4 5 6 7 8 9 10 11 12 DIGESTION LIGATION

slide-4
SLIDE 4

3/29/2017 4

Extract high quality genomic DNA from individuals (Qiagen Kits), spec DNA concentrations Digest equal amounts

  • f DNA from each

individual (ideally >200ng) with SphI/MluCI on 96-well plate Ligate unique barcode and general Y-adapter to each sample. Pool all individual samples into one tube, size-select on Pippin Prep for 300- 500bp Then PCR 8x (PCR indices in at this step)

Purify Purify BioAnalyze

  • Pool 48 individuals into one tube
  • Size select for the size range of the platform, hiseq target template of 200bps,

miseq target template of 400 bps. A B C D E F G H

1 2 3 4 5 6 7 8 9 10 11 12

slide-5
SLIDE 5

3/29/2017 5

Extract high quality genomic DNA from individuals (Qiagen Kits), spec DNA concentrations Digest equal amounts

  • f DNA from each

individual (ideally >200ng) with SphI/MluCI on 96-well plate Ligate unique barcode and general Y-adapter to each sample. Pool all individual samples into one tube, size-select on Pippin Prep for 300- 500bp Then PCR 8x (PCR indices in at this step)

Purify Purify BioAnalyze

Illumina Hi-seq

  • https://www.youtube.com/watch?v=9YxExTS

wgPM&t=30s

slide-6
SLIDE 6

3/29/2017 6

QC, demultiplex from Illumina & filter in STACKS (process_radtags) Align to reference genome with Bowtie2 or DeNovo if there is no genome Run denovo or ref pipeline Bining reads and SNP detection with STACKS Population Measures (GenePop, Structure, GeneLand) Detect outlier loci (Bayescan, Lositan, DAPC)

Bioinformatic Pipeline

What happens next?!?

First quality check (QC) and processing sequences before STACKS run

  • If more than 1 library is pooled?

– Illumina platform will de-multiplex the libraries based on the index that is in the reverse PCR primer

  • Run a fastqc analysis to check phred scores on sequences in general

(helps determine how the library ran, and whether you need to trim sequences – phred scores = 30 and above)

  • Need fastq , fasta, or BAM files to run through STACKS
  • Use Unix terminal commands to interface with STACKS and mysql to

create STAOKS output databases

  • Step 1: process radtags
  • Step 2: rename folders barcode to individual’s name
  • Step 3: STACKS pipeline (either denovo or with a reference genome)
  • Step 4: Population pipeline
slide-7
SLIDE 7

3/29/2017 7

Quality Control

  • FastqX Toolkit -

http://hannonlab.cshl.edu /fastx_toolkit/

  • Can determine if there are

problem areas within the sequence

  • Toolkit gives you different

scripts to create

– The fastqc file below, and to trim your sequences, etc. etc.

process_radtag

  • All the sequences for all 48

individuals of Library 4!

  • Need to de-multiplex them

into individual folders based

  • n their individual barcode in

you P1 adapter

  • STACKS website:-f, -o, -q, -r, -b,
  • -inline_null, --renz_1, -i, -c, -t
  • Result in output folders like

this :

FL_LIB4_S1.fastq.qz sample_AGTCT.fq.qz sample_AGTGT.fq.qz …to the last barcode used

slide-8
SLIDE 8

3/29/2017 8

Renaming your files

  • From barcode to the

associated individual name

  • At this step you can combine

several libraries

sample_AGTCT.fq.qz sample_AGTGT.fq.qz …to the last barcode used F148_NC_15.fq.gz F34_Tx_15.fq.gz …to the last individual

STACKS pipeline

  • IF you have a reference genome, align the reads post

process_radtag to the reference genome using Bowtie2

  • r another program
  • This generates BAM files that are used in the ref_map.pl
  • Step 1: Create a mysql data base:
  • Step 2: Create an output folder for STACKS
  • Step 3: Run the denovo_map.pl
slide-9
SLIDE 9

3/29/2017 9

STACKS output

  • Step 1: open the stacks database to confirm that you

created loci and have SNP data.

  • Step 2: setup the population pipeline

F148_NC_15.alleles.tsv F148_NC_15.tags.tsv F148_NC_15.matches.tsv F148_NC_15.snps.tsv

STACKS

Population pipeline

  • For our purposes, this is another filtering step,

but it can also be used to gain population summaries

  • Create a population map text file
  • Run the population pipeline using unix

command line:

  • Parameters & output files
slide-10
SLIDE 10

3/29/2017 10

Library Specs

Recruitment season of 2014

  • Number of populations = (5 NC 2 TX)
  • Number of loci = ?
  • Number of loci for the whitelist =?
  • Number post population pipeline = ?

Recruitment season of 2015

  • Number of populations (3 NC 1 TX) with N = 136 individuals
  • Number of loci with 1 to 2 SNPs = 104,252
  • Number of loci in the whitelist (20 to 140 individuals stack

depth of 10) = 51,090

  • Number post-population pipeline = 12,294 loci (85% of

individuals within a population share the loci) – PLINK, genepop and structure files created