SLIDE 1

UHT Sequencing Course Large-scale genotyping

Christian Iseli January 2009

SLIDE 2

Jun 04 Jun 04

Overview

  • Introduction
  • Examples
  • Base calling method and parameters
  • Read filtering
  • Read classification
  • Detailed alignment
  • Alignment analysis
  • Output generation

SLIDE 3

Introduction

Basic problem: distinguish polymorphism from sequencing error.

  • Use quality “measures”
  • Use redundancy
  • Use knowledge about the data source

SLIDE 4

Examples

  • Retinitis pigmentosa
  • Hypertrophic cardiomyopathy
  • HSA 21q genotyping

SLIDE 5

Retinitis pigmentosa

  • Inherited eye disease
  • Linkage analysis
  • PRPF31 mutation
  • Incomplete penetrance
  • Attempt sequencing

SLIDE 6

PRPF31 example

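[Figure: PRPF31 region; panel labels 13 and 14]

c.1374+654C>G (HGVS notation: a C>G substitution 654 nt into the intron following coding position c.1374)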

SLIDE 7

PRPF31 example

SLIDE 8

PRPF31 example, zoom

SLIDE 9

PRPF31 example, MFA

SLIDE 10

Examples

  • Retinitis pigmentosa
  • Hypertrophic cardiomyopathy
  • HSA 21q genotyping

SLIDE 11

Hypertrophic cardiomyopathy

  • Small collection of known genes
  • PCR-amplify gene pieces
  • Sequence

SLIDE 12

Small deletion

SLIDE 13

Examples

  • Retinitis pigmentosa
  • Hypertrophic cardiomyopathy
  • HSA 21q genotyping

SLIDE 14

“Exome” sequencing

  • Extract selected genomic parts
  • Sequence the collected pieces

SLIDE 15

Coverage on HSA 21q

SLIDE 16

Coverage detail, HSA 21q

SLIDE 17

HSA 21q, HapMap NA12782

SLIDE 18

Base calling

  • Rolexa
  • FastQ
  • ...
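Base calls end up as FastQ records. A minimal Python sketch of reading them, assuming plain four-line records (no line wrapping) and a caller-supplied ASCII quality offset: 33 for Phred+33 files, 64 for Solexa/early Illumina files. Solexa-scaled scores are not identical to Phred scores at low quality, so check which scale a given file uses. The function name `read_fastq` is illustrative and not part of Rolexa.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (name, sequence, qualities) from unwrapped four-line FASTQ records.

    `offset` is the ASCII offset of the quality string (an assumption to check
    per file: 33 for Phred+33, 64 for Solexa/early Illumina output).
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, [ord(c) - offset for c in qual]


demo = io.StringIO("@read1\nACGTN\n+\nIIII#\n")
for name, seq, quals in read_fastq(demo):
    print(name, seq, quals)  # read1 ACGTN [40, 40, 40, 40, 2]
```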

SLIDE 19

Read filtering

  • Entropy
  • Quality values (position)
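The slide names the criteria but not the formulas or cut-offs. One possible reading, sketched in Python: Shannon entropy of a read's base composition as a low-complexity measure, mean Phred quality per read, and mean quality per read position to see where quality drops along the read. The thresholds `min_entropy=1.0` and `min_mean_quality=20.0` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def shannon_entropy(seq: str) -> float:
    """Entropy in bits of the base composition: 0 for a homopolymer, 2 for equal A/C/G/T."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq.upper())
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def keep_read(seq: str, quals: Sequence[int],
              min_entropy: float = 1.0, min_mean_quality: float = 20.0) -> bool:
    """Reject low-complexity reads and reads with low mean quality (hypothetical thresholds)."""
    if not quals:
        return False
    return (shannon_entropy(seq) >= min_entropy
            and sum(quals) / len(quals) >= min_mean_quality)


def mean_quality_by_position(all_quals: List[Sequence[int]]) -> List[float]:
    """Mean quality at each position, for equal-length reads."""
    return [sum(col) / len(col) for col in zip(*all_quals)]
```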

SLIDE 20

Filtering example

Rolexa base calling, then filter reads for length and ambiguity. Each IUPAC code is weighted by the number of nucleotides it stands for:

  • ACGTU -> 1
  • KMRSWY -> 2
  • BDHV -> 3
  • N -> 4

– Minimum length: 20
– Maximum ambiguity: 81
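The slide gives the per-code weights but not how they combine. Because each weight is the number of bases a code can stand for, a natural reading is that their product counts the concrete sequences a read could represent; the Python sketch below assumes that (under it, the limit of 81 allows, for example, four three-way codes, 3^4 = 81, or three Ns, 4^3 = 64). Function names are hypothetical.

```python
from math import prod

# Weight of each IUPAC code = number of nucleotides it can stand for.
AMBIGUITY = {**dict.fromkeys("ACGTU", 1), **dict.fromkeys("KMRSWY", 2),
             **dict.fromkeys("BDHV", 3), "N": 4}


def ambiguity(seq: str) -> int:
    """Product of per-base weights (assumption: weights are multiplied, not summed)."""
    try:
        return prod(AMBIGUITY[b] for b in seq.upper())
    except KeyError as err:
        raise ValueError(f"unexpected character {err.args[0]!r} in read") from None


def passes_filter(seq: str, min_length: int = 20, max_ambiguity: int = 81) -> bool:
    """Apply the slide's example limits: minimum length 20, maximum ambiguity 81."""
    return len(seq) >= min_length and ambiguity(seq) <= max_ambiguity
```

For example, a 20-base read ending in `BBBB` scores exactly 81 and passes, while one ending in `NNNN` scores 256 and is dropped.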

SLIDE 21

Read classification

Map reads against the whole genome with fetchGWI:

– Single exact match -> U (unique)
– Multiple exact matches -> R (repeat)
– No exact match -> M (missed)
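The U/R/M rule itself is simple to state in code. This toy Python version stands in for fetchGWI (it is not fetchGWI's interface): it scans an in-memory reference string with `str.find`, which is only practical for tiny references. Searching both strands, and counting a read that equals its own reverse complement once, are assumptions.

```python
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def count_exact_hits(read: str, genome: str) -> int:
    """Count exact (possibly overlapping) occurrences of the read on both strands."""
    hits = 0
    # A set, so a read equal to its own reverse complement is not counted twice.
    for query in {read, revcomp(read)}:
        start = genome.find(query)
        while start != -1:
            hits += 1
            start = genome.find(query, start + 1)
    return hits


def classify(read: str, genome: str) -> str:
    """U = single exact match, R = multiple exact matches, M = no exact match."""
    hits = count_exact_hits(read, genome)
    return "M" if hits == 0 else "U" if hits == 1 else "R"


def classify_all(reads: List[str], genome: str) -> Dict[str, str]:
    return {r: classify(r, genome) for r in reads}
```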

SLIDE 22

Detailed alignment

  • Use M reads
  • Split the region of interest into chunks (e.g. 300 bp + 40 bp overlap)
  • Find reads sharing an identical 12-mer with a chunk
  • Global alignment of reads vs chunks
  • Filter alignments, retain a “good” set

E.g.: at most 3 mismatches
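A Python sketch of these steps under stated assumptions. "300 bp + 40 bp overlap" is read as 340-bp chunks starting every 300 bp. A shared 12-mer acts as the seed. For the alignment, a short read against a longer chunk is aligned semi-globally (the whole read, but free chunk ends), which is a named substitute for the slide's "global alignment". Mismatches and single-base indels both count toward the limit of 3. Reverse-strand reads are not handled; they would need reverse-complementing first. All function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def make_chunks(region: str, step: int = 300, overlap: int = 40) -> List[Tuple[int, str]]:
    """Cut a region into (offset, sequence) chunks of step + overlap bp.

    With this layout any read of at most `overlap` bp lies entirely inside one chunk.
    """
    chunks = []
    for start in range(0, len(region), step):
        chunks.append((start, region[start:start + step + overlap]))
        if start + step + overlap >= len(region):
            break
    return chunks


def kmers(seq: str, k: int = 12) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def semiglobal_distance(read: str, chunk: str) -> int:
    """Fewest edits (mismatches + single-base indels) aligning the whole read
    to some stretch of the chunk; chunk ends are free."""
    prev = [0] * (len(chunk) + 1)  # the read may start anywhere in the chunk
    for i, rb in enumerate(read, start=1):
        cur = [i] + [0] * len(chunk)
        for j, cb in enumerate(chunk, start=1):
            cur[j] = min(prev[j - 1] + (rb != cb),  # match or mismatch
                         prev[j] + 1,               # extra base in the read
                         cur[j - 1] + 1)            # base missing from the read
        prev = cur
    return min(prev)  # the read may end anywhere in the chunk


def align_to_region(region: str, reads: List[str], k: int = 12,
                    max_diffs: int = 3) -> Dict[str, Tuple[int, int]]:
    """Return {read: (chunk offset, edits)} for reads passing the seed and edit limit."""
    best: Dict[str, Tuple[int, int]] = {}
    for offset, chunk in make_chunks(region):
        chunk_kmers = kmers(chunk, k)
        for read in reads:
            if not kmers(read, k) & chunk_kmers:
                continue  # no shared 12-mer: skip the costly alignment
            d = semiglobal_distance(read, chunk)
            if d <= max_diffs and (read not in best or d < best[read][1]):
                best[read] = (offset, d)
    return best
```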

SLIDE 23

Alignment analysis

  • Map retained reads to the full genome
  • Remove reads that map better outside the region of interest
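Once each retained read has a best in-region score and a best genome-wide score elsewhere, the removal step is a simple comparison. In this Python sketch, scores are mismatch counts (fewer is better), and reads with no acceptable hit outside the region are simply absent from the second mapping. Keeping ties is an assumption, since the slide only mentions *better* maps outside.

```python
from typing import Dict


def keep_region_specific(in_region: Dict[str, int],
                         best_outside: Dict[str, int]) -> Dict[str, int]:
    """Drop reads that align with fewer mismatches somewhere outside the region.

    in_region:    read id -> mismatches of its best alignment in the region of interest
    best_outside: read id -> mismatches of its best alignment elsewhere in the genome
    Ties are kept here; dropping them instead is an equally defensible choice.
    """
    return {rid: mm for rid, mm in in_region.items()
            if best_outside.get(rid, mm) >= mm}
```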

SLIDE 24

Practical alignment analysis 1

[Figure: U, R and M reads; 12-mers]

SLIDE 25

Practical alignment analysis 2

[Figure: U, R and M reads; 12-mers]

SLIDE 26

Output generation

  • Create a multiple sequence alignment
  • Prepare text output in column format
  • Call SNPs (alleles, coverage, etc.)
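A Python sketch of column-wise SNP calling from an alignment. It assumes MSA rows have the reference's length, with `.` wherever a read does not cover. Qualities are ignored, and an allele is reported when it reaches a hypothetical fraction of the column's coverage; this is plain counting, not a probabilistic genotype caller. The output keys echo the slide (alleles, coverage), but srMap's actual CSV layout is not shown here.

```python
from collections import Counter
from typing import Dict, List, Optional


def pileup(rows: List[str]) -> List[List[str]]:
    """Per-column bases from equal-length MSA rows ('.' = read does not cover)."""
    return [[b for b in col if b != "."] for col in zip(*rows)]


def call_column(ref: str, bases: List[str], min_coverage: int = 4,
                min_fraction: float = 0.2) -> Optional[Dict[str, object]]:
    """Return a SNP call for one column, or None if poorly covered or same as reference."""
    counts = Counter(b for b in bases if b in "ACGT")
    coverage = sum(counts.values())
    if coverage < min_coverage:
        return None
    alleles = sorted(b for b, c in counts.items() if c / coverage >= min_fraction)
    if alleles == [ref]:
        return None
    return {"ref": ref, "alleles": "/".join(alleles),
            "coverage": coverage, "counts": dict(sorted(counts.items()))}


def call_snps(reference: str, rows: List[str], **limits) -> List[Dict[str, object]]:
    """Call SNPs at every reference position (1-based) of the alignment."""
    calls = []
    for pos, (ref, bases) in enumerate(zip(reference, pileup(rows)), start=1):
        call = call_column(ref, bases, **limits)
        if call is not None:
            calls.append({"pos": pos, **call})
    return calls
```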

SLIDE 27

Results in CSV files

SLIDE 28

Detailed view in UCSC

SLIDE 29

Results in MFA

SLIDE 30

Script srMap

  • Needs fetch.conf, an input chunk, and its genomic coordinates
  • Produces MFA and CSV output

SLIDE 31

Script prepareJobs

  • Needs genomic coordinates
  • Prepares scripts to process each chunk using srMap

SLIDE 32

Script local2genomic

  • Needs the CSV file produced by srMap
  • Adds genomic coordinates
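The core of a local-to-genomic conversion is an offset. This Python sketch assumes 1-based positions on both sides and a CSV with a `pos` column. The column names `pos`, `chrom` and `genomic_pos` are hypothetical, not srMap's actual layout.

```python
import csv
import io


def local_to_genomic(local_pos: int, chunk_start: int) -> int:
    """1-based position inside a chunk -> 1-based genomic coordinate.

    chunk_start is the genomic coordinate of the chunk's first base.
    """
    return chunk_start + local_pos - 1


def add_genomic_column(csv_text: str, chrom: str, chunk_start: int,
                       pos_field: str = "pos") -> str:
    """Copy a CSV table, appending chrom and genomic_pos columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    fields = list(reader.fieldnames or []) + ["chrom", "genomic_pos"]
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["chrom"] = chrom
        row["genomic_pos"] = local_to_genomic(int(row[pos_field]), chunk_start)
        writer.writerow(row)
    return out.getvalue()
```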

SLIDE 33

Script collateCsv

  • Needs the CSV file produced by local2genomic
  • Merges chunks back together
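Overlapping chunks report some positions twice, so merging needs a rule for duplicates. This Python sketch keeps the row with the higher coverage, which is a hypothetical tie-break rule, and sorts rows by chromosome and genomic position. Rows are dicts as produced by `csv.DictReader`; the column names are assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def collate(chunk_tables: Iterable[List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Merge per-chunk rows into one table sorted by (chrom, genomic_pos).

    Positions in chunk overlaps appear more than once; keep the row with the
    highest coverage (a hypothetical rule).
    """
    best: Dict[Tuple[str, int], Dict[str, str]] = {}
    for rows in chunk_tables:
        for row in rows:
            key = (row["chrom"], int(row["genomic_pos"]))
            if key not in best or int(row["coverage"]) > int(best[key]["coverage"]):
                best[key] = row
    return [best[key] for key in sorted(best)]
```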

SLIDE 34

Script matchGenotype

  • Needs a CSV file produced by srMap, local2genomic, or collateCsv
  • Needs a genotype file, e.g. genotypes_chrMT_YRI_r24_nr.b36_fwd.txt.gz
  • Compares detected SNPs with the reference genotypes and produces CSV output
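A Python sketch of the comparison step. It assumes the HapMap table is whitespace-separated, with a header naming the columns `chrom`, `pos` and the sample id (e.g. NA18507), and that genotypes are two letters with `NN` for no call; check this against the actual file. The shape of `calls` (position to set of detected alleles) is hypothetical. Only positions with a detected SNP are compared, so HapMap sites with no call in the reads are not reported.

```python
from typing import Dict, List, Set, TextIO, Tuple

Key = Tuple[str, int]


def load_genotypes(handle: TextIO, sample: str) -> Dict[Key, str]:
    """Read {(chrom, pos): genotype} for one sample from a HapMap-style table.

    For the gzipped download, open it with gzip.open(path, "rt").
    """
    header = handle.readline().split()
    i_chrom, i_pos, i_sample = header.index("chrom"), header.index("pos"), header.index(sample)
    genotypes: Dict[Key, str] = {}
    for line in handle:
        fields = line.split()
        if fields:
            genotypes[(fields[i_chrom], int(fields[i_pos]))] = fields[i_sample]
    return genotypes


def compare(calls: Dict[Key, Set[str]],
            reference: Dict[Key, str]) -> List[Tuple[str, int, str, str, str]]:
    """Label each detected SNP as match, mismatch, no_call (NN) or not_in_reference."""
    rows = []
    for key in sorted(calls):
        called = "/".join(sorted(calls[key]))
        expected = reference.get(key)
        if expected is None:
            status = "not_in_reference"
        elif "N" in expected:
            status = "no_call"
        elif set(expected) == calls[key]:
            status = "match"
        else:
            status = "mismatch"
        rows.append((key[0], key[1], called, expected or "", status))
    return rows
```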

SLIDE 35

Exercise data source

  • ftp://ftp.ncbi.nih.gov:21/pub/TraceDB/ShortRead/SRA000271/fastq
  • http://www.illumina.com/HumanGenome/
  • http://ftp.hapmap.org/genotypes/latest/fwd_strand/non-redundant/
  • Local copies in the UHTS_SNP subdirectory of student accounts

SLIDE 36

Exercise 1

Analyze Illumina reads from NA18507:

  • Confirm the HapMap genotype for the mitochondrial genome
  • Choose subsets of the reads and see how coverage and SNP calls are affected
  • (Confirm other genomic regions of interest)
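One simple way to draw read subsets for this exercise is independent random sampling with a fixed seed, so a subset can be regenerated exactly. The Python sketch below keeps each record with probability `fraction`; the function name is hypothetical. For paired reads, the same keep/drop decision would need to apply to both mates.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample(records: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """Keep each record independently with probability `fraction` (reproducible via `seed`)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]
```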

SLIDE 37

Exercise 2

Analyze paired Illumina reads from NA18507:

  • Look at the mitochondrial DNA and explain the apparent gap near coordinates 1-120

SLIDE 38

Exercise 3

Analyze paired Illumina reads from NA18507:

  • Can you confirm the homozygous 1 kb deletion on chromosome 20 at 61 Mb?

SLIDE 39

Exercise 4

Analyze paired Illumina reads from NA18507:

  • Can you confirm a complex rearrangement on chromosome 5?

What do you expect to see in the pairs?