The 1000 genomes project
- Genetic variation with a frequency > 1%
- 1000 → 2500 individuals
- China, Germany, the UK, the USA
- 28 populations from Europe, East Asia, West Africa, America, South Asia
The 1000 genomes project
Pilot | Purpose | Coverage | Strategy | Status
1 - low coverage | Assess strategy of sharing data across samples | 2-4X | Whole-genome sequencing of 180 samples | Sequencing completed October 2008
2 - trios | Assess coverage and platforms and centers | 20-60X | Whole-genome sequencing of 2 mother-father-adult child trios | Sequencing completed October 2008
3 - gene regions | Assess methods for gene-region capture | 50X | 1000 gene regions in 900 samples | Sequencing completed June 2009
The 1001 Genomes Project
Arabidopsis thaliana
The 1001 Genomes Project
- First plant with a known genome sequence
- 125 – 150 Mb, 5 chromosomes, 30000 genes
- Self-fertilizing
- High genetic and phenotypic diversity
- Few known alleles responsible for phenotypic variation
The 1001 Genomes Project
- 10 x 10 x 10 + 1 samples
- The seeds are available in Arabidopsis stock centers
- Includes morphological analysis
SHORE
- Mapping and analysis pipeline
- Short DNA sequences
- Mapping to a reference sequence
- Weighted and gapped alignments
- SHOREmap
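
The idea of mapping short reads to a reference sequence can be illustrated with a minimal sketch. It is not SHORE's algorithm: SHORE uses weighted and gapped alignments, while this toy version only scans for ungapped placements within a mismatch limit. The function names and the example sequences are hypothetical.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, max_mismatches=4):
    """Return (position, mismatches) for every ungapped placement of
    `read` on `reference` that stays within `max_mismatches`.

    Simplification: no gaps and no base-quality weighting.
    """
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        mismatches = hamming(read, reference[pos:pos + len(read)])
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


# Hypothetical toy data, for illustration only.
reference = "ACGTACGTTAGCCGATAGGCTA"
read = "TAGCCGTTAG"
print(map_read(read, reference, max_mismatches=1))  # prints [(8, 1)]
```

A real mapper would index the reference instead of scanning every position, and would also allow insertions and deletions.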
Sequencing Arabidopsis thaliana
- Two naturally inbred accessions (Bur-0, Tsu-1)
- Reference genome sequence (Col-0)
- 120 – 173 million sequencing-by-synthesis (SBS) reads
- Aligned to Col-0 (up to 4 mismatches, indels of up to 3 bp)
- Minimum read support for base calls
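
A minimum read support rule for base calls might look like the sketch below. The thresholds (`min_support`, `min_fraction`) and the function name are assumptions chosen for illustration; the slides do not state the values that were actually used.

```python
from collections import Counter


def call_base(pileup, min_support=3, min_fraction=0.8):
    """Call the majority base at one reference position.

    `pileup` is the string of bases that aligned reads show at the position.
    Returns 'N' when too few reads agree. Both thresholds are hypothetical.
    """
    if not pileup:
        return "N"
    base, count = Counter(pileup).most_common(1)[0]
    if count >= min_support and count / len(pileup) >= min_fraction:
        return base
    return "N"


print(call_base("AAAA"))   # enough concordant reads
print(call_base("AA"))     # too few reads
print(call_base("AAACC"))  # support too mixed
```

Positions that return 'N' would be left uncalled rather than reported as matching or differing from the reference.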
Identifying polymorphic regions
- 4.3 Mb of non-repetitive or moderately repetitive regions not covered
- GC-poor regions
- 8 non-repetitive or moderately repetitive positions
- Col-0: 28 kb
- Bur-0: 3.25 Mb, Tsu-1: 3.13 Mb
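
One way to find uncovered stretches from a per-position coverage track is sketched below. It assumes that the "8 positions" above is a minimum run length of uncovered positions; the slides do not spell this out, so treat `min_length=8` and the function name as hypothetical.

```python
def uncovered_runs(coverage, min_length=8):
    """Return half-open (start, end) intervals where read depth is zero
    for at least `min_length` consecutive positions.

    `min_length=8` is an assumed reading of the slide, not a confirmed value.
    """
    runs = []
    start = None
    for i, depth in enumerate(coverage):
        if depth == 0:
            if start is None:
                start = i  # a zero-coverage run begins here
        elif start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the track.
    if start is not None and len(coverage) - start >= min_length:
        runs.append((start, len(coverage)))
    return runs


# Hypothetical coverage track: a short gap, an 8-position gap, a 9-position gap.
coverage = [5, 0, 0, 3] + [0] * 8 + [2] + [0] * 9
print(uncovered_runs(coverage))  # prints [(4, 12), (13, 22)]
```

Restricting the scan to non-repetitive or moderately repetitive positions would need an extra mask, which is omitted here.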
De novo assembly of dissimilar sequences
- Unmapped reads of high quality
- Retain high-confidence reads
- Alignment to the homologous target in the reference genome
- Bur-0: 7396 contigs
- Tsu-1: 3525 contigs
- Col-0: 20 contigs
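
The first two steps, selecting unmapped reads of high quality, can be sketched as a simple filter. The mean-quality cutoff, the Phred+33 quality encoding, and all names below are assumptions for illustration; the slides do not give the actual criteria.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def high_quality_unmapped(reads, mapped_ids, min_mean_quality=30):
    """Keep reads that did not map and whose mean quality passes the cutoff.

    `reads` is an iterable of (read_id, sequence, quality_string) tuples.
    `min_mean_quality` is a hypothetical threshold.
    """
    kept = []
    for read_id, sequence, quality in reads:
        if read_id in mapped_ids:
            continue  # already placed on the reference
        scores = phred_scores(quality)
        if scores and sum(scores) / len(scores) >= min_mean_quality:
            kept.append((read_id, sequence))
    return kept


# Hypothetical toy reads.
reads = [
    ("r1", "ACGT", "IIII"),  # unmapped, high quality
    ("r2", "ACGT", "!!!!"),  # unmapped, low quality
    ("r3", "GGCC", "IIII"),  # mapped
]
print(high_quality_unmapped(reads, mapped_ids={"r3"}))  # prints [('r1', 'ACGT')]
```

The retained reads would then go to an assembler, and the resulting contigs would be aligned back to the reference.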
Detection of duplications
- Higher than expected coverage
- Several reads support more than one base
- Segmentation into regions of 250 bp
- Search for “heterozygous” positions
- Bur-0: 332 kb
- Tsu-1: 364 kb
- Col-0: 11 kb
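
The duplication criteria above can be combined in a small sketch: split the genome into 250 bp windows and flag those with higher than expected coverage that also contain "heterozygous" positions. The depth factor, the minimum number of such positions, and the function name are assumptions, not values from the slides.

```python
def flag_duplicated_windows(coverage, het_positions, expected_depth,
                            window=250, depth_factor=1.5, min_het=1):
    """Return half-open (start, end) windows that look duplicated.

    A window is flagged when its mean depth exceeds
    `depth_factor * expected_depth` and it holds at least `min_het`
    positions where reads support more than one base.
    `depth_factor` and `min_het` are hypothetical thresholds.
    """
    het = set(het_positions)
    flagged = []
    for start in range(0, len(coverage), window):
        end = min(start + window, len(coverage))
        mean_depth = sum(coverage[start:end]) / (end - start)
        n_het = sum(1 for pos in range(start, end) if pos in het)
        if mean_depth > depth_factor * expected_depth and n_het >= min_het:
            flagged.append((start, end))
    return flagged


# Hypothetical track: normal depth, then two high-depth windows,
# only the first of which has positions supporting more than one base.
coverage = [10] * 250 + [25] * 250 + [25] * 250
print(flag_duplicated_windows(coverage, het_positions=[300, 310],
                              expected_depth=10))  # prints [(250, 500)]
```

In an inbred, self-fertilizing accession true heterozygosity is rare, so positions that look heterozygous are more plausibly reads from two copies of a duplicated region collapsing onto one reference locus.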