Next Generation Sequencing:
Applications
Anna De Grassi
European Institute of Oncology, Milan
F. Ciccarelli group
BITS - March 20, 2009 - Genoa
Several Flavours of Throughput
Genome sequencing
Kevin Chen and Lior Pachter (University of California, Berkeley)
Turnbaugh PJ, Nature 444, 1027-1031 (2006)
Shotgun sequencing:
454 sequencing:
beads and emulsion PCR
+/+
Obese: Lean:
lean1 lean2 lean3
3runs 2runs
Draft genome of the most common bacterium (E. rectale):
Metagenomics Analyses:
EGTs = environmental gene tags
454 Pros:
Capillary Pros:
[Figure: piles of aligned reads, mostly ATCGT with a few variant reads (AT, ATA, AGT, GT); two panels are labelled "Only consensus sequence: ATCGT"]
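The read pile shown above can be summarised in a few lines of Python. This is a minimal sketch that is not part of the original slides: the read list is copied from the figure, and `summarise_reads` is a hypothetical helper name. It illustrates how counting individual reads exposes minority variants that a consensus-only readout would hide.

```python
from collections import Counter

# Reads as shown in the figure: ten copies of ATCGT plus four variant reads.
READS = ("ATCGT " * 6 + "AT ATA AGT GT " + "ATCGT " * 4).split()


def summarise_reads(reads):
    """Return the most frequent read and the frequency of every other read."""
    counts = Counter(reads)
    consensus, _ = counts.most_common(1)[0]
    total = len(reads)
    variants = {seq: n / total for seq, n in counts.items() if seq != consensus}
    return consensus, variants


consensus, variants = summarise_reads(READS)
print("consensus:", consensus)
for seq, freq in sorted(variants.items()):
    print(f"variant {seq}: {freq:.1%} of reads")
```

With deeper sequencing the same count-based view keeps resolving rarer variants, as long as they stay distinguishable from sequencing errors.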
Campbell PJ, PNAS 105, 13081-13086 (2008)
Samples:
(chronic lymphocytic leukemia)
385,000 reads, ~250bp per read (>95% aligned to the reference)
~300bp F R e.g. ACT
Error processing:
Analysis of the control locus: all variations from the reference sequence are artifacts
Sequencing errors:
DNA polymerase errors:
e.g. G:C->A:T (most common)
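The control-locus idea can be sketched in Python: since every deviation from the reference at the control locus is an artifact, its frequency gives a background error rate, and a candidate variant is only interesting if it is seen more often than that background would explain. This is an illustrative sketch under a simple binomial error model, which is an assumption and not necessarily the procedure used in the cited study. The function names, the `alpha` threshold and the example counts are all hypothetical.

```python
import math


def background_error_rate(artifact_reads, total_reads):
    """Error rate estimated from a control locus, where every variant is an artifact."""
    return artifact_reads / total_reads


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X < k)."""
    below = sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - below)


def exceeds_background(variant_reads, depth, error_rate, alpha=1e-3):
    """True if the variant count is unlikely to arise from errors alone."""
    return binomial_tail(variant_reads, depth, error_rate) < alpha


# Hypothetical numbers: 3 artifact reads out of 10,000 at the control locus.
rate = background_error_rate(3, 10_000)
print(exceeds_background(2, 5_000, rate))   # a count close to the background
print(exceeds_background(20, 5_000, rate))  # a count far above the background
```

In practice the background rate would be estimated separately for each substitution type, since the slide notes that some changes (such as G:C->A:T) are more common than others.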
Filter to detect “real” rare variants in 24 samples by excluding:
Sub-clonal mutations can be detected down to a frequency of 1/5000 reads
Phylogenetic analysis:
Fields S, Science 316, 1441-1442 (2007)
1946 peaks
Johnson DS, Science 316, 1497-1502 (2007)
Protein: NRSF (neuron-restrictive silencer factor)
DNA samples:
Sequencing and Mapping:
Detection of binding sites:
Benchmark:
Variation of DNA motifs at the binding site:
Canonical Non canonical
Morin RD, Genome Research 18, 610-621 (2008)
Samples:
RNA preparation and sequencing:
Filter and Mapping on the genome:
~4M (70%) reads and ~0.75M unique sequences
Overlap with DBs of known sequences: 5% sequences
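The step from millions of reads to a smaller set of unique sequences, and the overlap of those sequences with databases of known sequences, can be illustrated with a short Python sketch. The reads and the `known` set below are toy data invented for the example, and `collapse_reads` and `known_fraction` are hypothetical helper names. The actual pipeline of the cited study is not described on the slide.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse identical reads into unique sequences with their read counts."""
    return Counter(reads)


def known_fraction(unique_sequences, known):
    """Fraction of unique sequences that exactly match a known sequence."""
    if not unique_sequences:
        return 0.0
    hits = sum(1 for seq in unique_sequences if seq in known)
    return hits / len(unique_sequences)


# Toy data (not from the study): six reads, four unique sequences.
reads = ["UGAGGUAG", "UGAGGUAG", "UGAGGUAG", "ACUGGCCU", "CCCGUAGA", "GGAUCCAA"]
known = {"UGAGGUAG"}  # stand-in for a database of annotated small RNAs

unique = collapse_reads(reads)
print(len(reads), "reads ->", len(unique), "unique sequences")
print(f"{known_fraction(unique, known):.0%} of unique sequences are known")
```

Keeping the per-sequence counts matters: a small share of unique sequences can still account for a large share of the reads.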
Qualitative analysis (known microRNAs):
cleavage positions and post-transcriptional modifications
Quantitative analysis:
100 microRNAs
Cloonan N, Nature Methods 5(7), 613-619 (2008)
Samples:
Reads mapping on the genome:
Multiple mapping is accepted (if less than 100 positions)
Filter and mapping strategy: 7 steps!!
unique tags
the genome (<=2 mismatches)
Good quality reads: ~155M reads per sample
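Two of the mapping rules mentioned above, a mismatch tolerance of at most 2 and acceptance of multi-mapping tags only below 100 positions, can be sketched as follows. The sketch works in plain nucleotide space with a naive scan, which is an assumption for illustration and not the aligner used in the cited study. The function names, the toy genome and the toy tag are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_tag(tag, genome, max_mismatches=2):
    """Return every genome offset where the tag aligns with <= max_mismatches."""
    width = len(tag)
    return [
        pos
        for pos in range(len(genome) - width + 1)
        if hamming(tag, genome[pos:pos + width]) <= max_mismatches
    ]


def accept_tag(positions, max_positions=100):
    """Keep a tag if it maps at least once and to fewer than max_positions loci."""
    return 0 < len(positions) < max_positions


# Toy reference and tag (invented for illustration).
genome = "ACGTACGTTTGACCAGTACGAACGT"
hits = map_tag("ACGTAC", genome)
print(hits, accept_tag(hits))
```

A real aligner would use an index rather than scanning every offset, but the acceptance logic stays the same.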
Gene expression (tag count):
Custom track on UCSC:
Differential expression between samples:
(35/50 ES markers were confirmed): 70% sensitivity
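The sensitivity figure follows directly from the marker counts. A minimal sketch of the arithmetic, with `sensitivity` as a hypothetical helper name:

```python
def sensitivity(confirmed, total_known):
    """Fraction of known positives that were recovered: TP / (TP + FN)."""
    return confirmed / total_known


# 35 of the 50 ES markers were confirmed, so 15 were missed.
print(f"{sensitivity(35, 50):.0%}")
```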
Transcriptome discovery:
Alternative splicing isoforms:
longer consensus (>50nt)
Discovery of expressed SNPs: Extensive filtering!
Filter by proportion: 75% of tags are mutated (heterozygous mutations are systematically discarded)
PCR: specificity = 80%
Only full-length tags (35 nt) and high quality
Mapping to the genome (multi-mapping tags are excluded)
Filter by colour-space errors
Filter by error profile of tags: first 6 nt, last 5 nt and 26
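The proportion filter explains why heterozygous mutations are lost: a heterozygous site is expected to show the variant in roughly half of the tags, which falls below a 75% cut-off. A minimal sketch, assuming the threshold means "at least 75% of the tags covering the position carry the variant" (the exact comparison is an assumption); the function name and the tag counts are hypothetical.

```python
def passes_proportion_filter(mutant_tags, total_tags, min_fraction=0.75):
    """True if at least min_fraction of the tags covering a position carry the variant."""
    if total_tags == 0:
        return False
    return mutant_tags / total_tags >= min_fraction


# Hypothetical tag counts at three positions: (mutant tags, total tags).
candidates = {
    "homozygous-like": (19, 20),
    "heterozygous-like": (10, 20),
    "noise-like": (1, 20),
}
for name, (mutant, total) in candidates.items():
    print(name, passes_proportion_filter(mutant, total))
```

The trade-off is deliberate: the filter gives up heterozygous calls in exchange for fewer false positives.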
Application                 454              Illumina                      SOLiD
Genome sequencing           Small genomes    Small genomes                 No
Genome re-sequencing        Yes              Yes                           Small genome
Structural variations       Yes              Yes                           Yes
Nucleosome positioning      Yes              Yes                           Yes
SNPs and point mutations    Yes              Yes                           Yes
ChIP-Seq                    Yes              Yes                           Yes
Metagenomics                Yes              Only virus                    No
Ultra-deep sequencing       Yes              Tested only for 100s reads    Tested only for 100s reads
Amplicon sequencing         Yes              No                            No
Transcriptome analysis      Yes              Yes                           Yes
Axis labels on the slide: Read length, Number of reads