UHT Sequencing Course
Large-scale genotyping
Christian Iseli, January 2009
Jun 04 Jun 04
Overview
- Introduction
- Examples
- Base calling method and parameters
- Reads filtering
- Reads classification
- Detailed alignment
- Alignments analysis
- Output generation
Introduction
Basic problem: distinguish polymorphism from sequencing error
- Use quality “measures”
- Use redundancy
- Use knowledge about data source
Examples
- Retinitis pigmentosa
- Hypertrophic cardiomyopathy
- HSA 21q genotyping
Retinitis pigmentosa
- Inherited eye disease
- Linkage analysis
- PRPF31 mutation
- Incomplete penetrance
- Attempt sequencing
PRPF31 example
[Figure labels: 13, 14; variant c.1374+654C>G]
PRPF31 example
PRPF31 example, zoom
PRPF31 example, MFA
Hypertrophic cardiomyopathy
- Small collection of known genes
- PCR amplify gene pieces
- Sequence
Small deletion
“Exome” sequencing
- Extract selected genomic parts
- Sequence collected pieces
Coverage on HSA 21q
Coverage detail, HSA 21q
HSA 21q, HapMap NA12782
Base calling
- Rolexa
- FastQ
- ...
Reads filtering
- Entropy
- Quality values
- (Position)
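The entropy and quality criteria above could be sketched as follows. This is only an illustration: the slides do not give the formulas or thresholds actually used, so the Shannon entropy of the base composition, the Phred offset of 33, and both cut-off values are assumptions. The position criterion is not covered.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits per base) of the base composition of a read."""
    if not seq:
        return 0.0
    counts = Counter(seq.upper())
    total = len(seq)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string. The offset of 33 is an assumption."""
    return [ord(ch) - offset for ch in quality]


def keep_read(seq, quality, min_entropy=1.0, min_mean_quality=20.0):
    """Keep a read if it is complex enough and its mean quality is high enough.

    Both thresholds are hypothetical defaults, not values from the course.
    """
    scores = phred_scores(quality)
    mean_quality = sum(scores) / len(scores) if scores else 0.0
    return shannon_entropy(seq) >= min_entropy and mean_quality >= min_mean_quality
```

A homopolymer read such as `AAAAAAAA` has entropy 0 and is rejected, while a read using all four bases equally reaches the maximum of 2 bits per base.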
Filtering example
- Rolexa base calling
- Filter reads for length and ambiguity
  - ACGTU -> 1
  - KMRSWY -> 2
  - BDHV -> 3
  - N -> 4
  - Minimum length 20
  - Maximum ambiguity 81
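A minimal sketch of this filter, using the per-code weights and the two thresholds listed above. The slide does not say how the weights are combined into a read-level ambiguity; the code assumes they are multiplied (so that the score is the number of plain sequences compatible with the read, and 81 = 3^4). The function names are hypothetical.

```python
from math import prod

# Weight per IUPAC code, as listed on the slide: the number of bases it can stand for.
AMBIGUITY_WEIGHT = {}
for codes, weight in (("ACGTU", 1), ("KMRSWY", 2), ("BDHV", 3), ("N", 4)):
    for code in codes:
        AMBIGUITY_WEIGHT[code] = weight


def read_ambiguity(seq):
    """Product of the per-base weights (assumption: weights are multiplied)."""
    return prod(AMBIGUITY_WEIGHT[base] for base in seq.upper())


def passes_filter(seq, min_length=20, max_ambiguity=81):
    """Apply the length and ambiguity thresholds from the slide."""
    return len(seq) >= min_length and read_ambiguity(seq) <= max_ambiguity
```

Under this reading, a 20-base read with four three-way ambiguous positions scores exactly 81 and is kept, while replacing one of them with `N` raises the score to 108 and the read is dropped.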
Read classification
- Use fetchGWI against whole genome
  - Single exact match -> U (unique)
  - Multiple exact matches -> R (repeat)
  - No exact match -> M (missed)
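The U/R/M rule above can be illustrated on a toy sequence. This sketch does not call fetchGWI and does not reproduce its interface; it uses a naive forward-strand string search purely to show how the number of exact matches maps to a class. The function names are hypothetical.

```python
def count_exact_matches(read, genome):
    """Count exact (possibly overlapping) occurrences of read in genome.

    Forward strand only; a real search would also consider the reverse strand.
    """
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


def classify_read(match_count):
    """Map the number of exact matches to the U / R / M classes."""
    if match_count == 1:
        return "U"  # unique
    if match_count > 1:
        return "R"  # repeat
    return "M"      # missed


genome = "ACGTACGTTTGA"
classes = {read: classify_read(count_exact_matches(read, genome))
           for read in ("TTGA", "ACGT", "GGGG")}
```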
Detailed alignment
- Use M reads
- Split region of interest into chunks (e.g. 300 bp + 40 bp overlap)
- Find reads with an identical 12-mer
- Global alignment of reads vs chunks
- Filter alignments, retain “good” set
  - E.g. maximum 3 mismatches
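The steps above can be sketched roughly as follows. Several points are assumptions: "300 bp + 40 bp overlap" is read as chunks that start every 300 bp and extend 40 bp into the next one, and the global alignment step is replaced here by a simpler ungapped mismatch count at the offsets suggested by shared 12-mers. All function names are hypothetical.

```python
from collections import defaultdict


def split_into_chunks(region, chunk_len=300, overlap=40):
    """Return (start, sequence) pairs; each chunk spans chunk_len + overlap bases."""
    return [(start, region[start:start + chunk_len + overlap])
            for start in range(0, len(region), chunk_len)]


def index_kmers(chunk, k=12):
    """Map every k-mer of the chunk to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(chunk) - k + 1):
        index[chunk[i:i + k]].append(i)
    return index


def candidate_offsets(read, index, k=12):
    """Chunk offsets at which the read would start, implied by shared k-mers."""
    offsets = set()
    for j in range(len(read) - k + 1):
        for i in index.get(read[j:j + k], ()):
            offsets.add(i - j)
    return offsets


def mismatches(read, chunk, offset):
    """Ungapped mismatch count, or None if the read does not fit in the chunk."""
    if offset < 0 or offset + len(read) > len(chunk):
        return None
    return sum(a != b for a, b in zip(read, chunk[offset:offset + len(read)]))


def align_read(read, chunk, index, k=12, max_mismatches=3):
    """Best (offset, mismatches) placement within the limit, or None."""
    best = None
    for offset in candidate_offsets(read, index, k):
        mm = mismatches(read, chunk, offset)
        if mm is not None and mm <= max_mismatches:
            if best is None or mm < best[1]:
                best = (offset, mm)
    return best
```

Because this version is ungapped, it cannot place reads that span a small insertion or deletion; a gapped global alignment would be needed for those.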
Alignment analysis
- Map retained reads to full genome
- Remove set with better maps outside region of interest
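One possible reading of this filter, as a sketch: for each retained read, compare the mismatch count of its best alignment inside the region of interest with that of its best alignment elsewhere in the genome, and drop the read only when the outside alignment is strictly better. Using mismatch counts as the measure of "better", and keeping ties, are assumptions; the data layout is hypothetical.

```python
def retain_region_reads(best_in_region, best_outside):
    """Return the ids of reads that have no strictly better map outside the region.

    best_in_region: read id -> mismatches of the best alignment inside the region
    best_outside:   read id -> mismatches of the best alignment elsewhere
                    (reads with no outside alignment are simply absent)
    """
    kept = []
    for read_id, mm_inside in best_in_region.items():
        mm_outside = best_outside.get(read_id)
        if mm_outside is None or mm_outside >= mm_inside:
            kept.append(read_id)
    return sorted(kept)
```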
Practical alignment analysis 1
[Figure labels: U, R, M, 12-mers]
Practical alignment analysis 2
[Figure labels: U, R, M, 12-mers]
Output generation
- Create multiple sequence alignment
- Prepare text output in column format
- Call SNPs (alleles, coverage, etc.)
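A minimal sketch of column-wise SNP calling from a multiple alignment. It assumes the reads are already padded to the reference length with `-` where they do not cover a position. The minimum coverage and minimum allele fraction are hypothetical thresholds, not the values used in the course, and the output field names are invented for illustration.

```python
from collections import Counter


def call_column(ref_base, read_bases, min_coverage=4, min_fraction=0.2):
    """Summarise one alignment column: coverage, supported alleles, SNP flag."""
    counts = Counter(b for b in read_bases if b in "ACGT")
    coverage = sum(counts.values())
    alleles = sorted(
        base for base, n in counts.items()
        if coverage >= min_coverage and n / coverage >= min_fraction
    )
    return {
        "ref": ref_base,
        "coverage": coverage,
        "alleles": "".join(alleles),
        "snp": any(base != ref_base for base in alleles),
    }


def call_snps(reference, aligned_reads):
    """One output row per reference position (1-based), in column order."""
    rows = []
    for pos, ref_base in enumerate(reference, start=1):
        row = call_column(ref_base, [read[pos - 1] for read in aligned_reads])
        row["pos"] = pos
        rows.append(row)
    return rows
```

Each row could then be written out as one line of a CSV file, with a heterozygous position showing two alleles (for example `CT`).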
Results in CSV files
Detailed view in UCSC
Results in MFA
Script srMap
- Needs fetch.conf, input chunk and genomic coordinates
- Produces MFA and CSV output
Script prepareJobs
- Needs genomic coordinates
- Prepares scripts to process each chunk using srMap
Script local2genomic
- Needs CSV file produced by srMap
- Adds genomic coordinates
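The coordinate conversion itself amounts to adding the chunk's start position to each chunk-local position. The sketch below is not the local2genomic script: the column names `pos` and `genomic_pos` are hypothetical, and both coordinate systems are assumed to be 1-based.

```python
def local_to_genomic(rows, chunk_start):
    """Add a 'genomic_pos' field to each row.

    rows:        dicts with a 1-based chunk-local position under 'pos'
    chunk_start: 1-based genomic coordinate of the first base of the chunk
    """
    converted = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        new_row["genomic_pos"] = chunk_start + int(row["pos"]) - 1
        converted.append(new_row)
    return converted
```

With a chunk starting at genomic position 1000, local position 1 maps to 1000 and local position 25 maps to 1024.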
Script collateCsv
- Needs CSV file produced by local2genomic
- Merges chunks back together
Script matchGenotype
- Needs CSV file produced by srMap, local2genomic, or collateCsv
- Needs genotype file, e.g.
  - genotypes_chrMT_YRI_r24_nr.b36_fwd.txt.gz
- Compares detected SNPs with reference and produces CSV output
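The comparison step could look roughly like this. It is an illustration, not the matchGenotype script: both inputs are assumed to be already parsed into dictionaries keyed by position, reference genotypes are assumed to be two-letter strings such as `AG`, and the status labels are invented.

```python
def compare_genotypes(detected, reference):
    """Compare detected alleles with reference genotypes, position by position.

    detected:  position -> string of detected alleles, e.g. "G" or "CT"
    reference: position -> reference genotype, e.g. "GG" or "CT"
    Returns (position, expected, observed, status) tuples sorted by position.
    """
    rows = []
    for pos in sorted(reference):
        expected = reference[pos]
        observed = detected.get(pos)
        if observed is None:
            status = "not_covered"
        elif set(observed) == set(expected):
            status = "match"  # same set of alleles, regardless of order
        else:
            status = "mismatch"
        rows.append((pos, expected, observed or "", status))
    return rows
```

Comparing allele sets means a homozygous reference genotype `GG` matches a call reporting the single allele `G`, while a heterozygous `CT` is only matched when both alleles are detected.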
Exercise data source
- ftp://ftp.ncbi.nih.gov:21/pub/TraceDB/ShortRead/SRA000271/fastq
- http://www.illumina.com/HumanGenome/
- http://ftp.hapmap.org/genotypes/latest/fwd_strand/non-redundant/
- Locally in UHTS_SNP subdirectory of student accounts
Exercise 1
- Analyze Illumina reads from NA18507
- Confirm HapMap genotype for the mitochondrial genome
- Choose subsets of the reads and see how coverage and SNPs are affected
- (confirm other genomic regions of interest)
Exercise 2
- Analyze paired Illumina reads from NA18507
- Look at the mitochondrial DNA and explain the apparent gap near coordinates 1-120
Exercise 3
- Analyze paired Illumina reads from NA18507
- Can you confirm the homozygous 1 kb deletion on chromosome 20 at 61 Mb?
Exercise 4
- Analyze paired Illumina reads from NA18507
- Can you confirm a complex rearrangement on chromosome 5?