The 1000 genomes project


Sep 15, 2023


SLIDE 1

The 1000 genomes project

SLIDE 2

The 1000 genomes project

Genetic variants with frequency > 1%
1000 → 2500 individuals
China, Germany, the UK, the USA
28 populations from Europe, East Asia, West Africa, the Americas, South Asia

SLIDE 3

The 1000 genomes project

SLIDE 4

The 1000 genomes project

Pilot 1 - low coverage
Purpose: Assess strategy of sharing data across samples
Coverage: 2-4X
Strategy: Whole-genome sequencing of 180 samples
Status: Sequencing completed October 2008

Pilot 2 - trios
Purpose: Assess coverage and platforms and centers
Coverage: 20-60X
Strategy: Whole-genome sequencing of 2 mother-father-adult child trios
Status: Sequencing completed October 2008

Pilot 3 - gene regions
Purpose: Assess methods for gene-region capture
Coverage: 50X
Strategy: 1000 gene regions in 900 samples
Status: Sequencing completed June 2009

SLIDE 5

The 1001 Genomes Project

Arabidopsis thaliana

SLIDE 6

The 1001 Genomes Project

First plant with a known genome sequence
125 – 150 Mb, 5 chromosomes, 30000 genes
Self-fertilizing
Big genetic and phenotypic diversity
Few known alleles responsible for phenotypic variations

SLIDE 7

The 1001 Genomes Project

10 x 10 x 10 + 1 samples
The seeds are available in Arabidopsis stock centers
Includes morphological analysis

SLIDE 8

SHORE

Mapping and analysis pipeline
Short DNA sequences
Mapping to a reference sequence
Weighted and gapped alignments
SHOREmap

SLIDE 9

Sequencing Arabidopsis thaliana

Two naturally inbred accessions (Bur-0, Tsu-1)
Reference genome sequence (Col-0)
120 – 173 million SBS reads
Aligned to Col-0 (up to 4 mismatches, indels up to 3 bp)
Minimum read support for base calls
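
The "minimum read support" idea can be sketched as a simple pileup filter: a base is called at a position only if enough aligned reads agree on it. The function name call_base and both thresholds (min_support, min_fraction) are assumptions made for this illustration, not values from the slides or from SHORE.

```python
from collections import Counter


def call_base(pileup, min_support=3, min_fraction=0.8):
    """Call a consensus base from the read bases covering one position.

    pileup       -- string or list of bases observed at the position
    min_support  -- hypothetical minimum number of reads backing the call
    min_fraction -- hypothetical minimum share of reads backing the call

    Returns the called base, or None if the support is too weak.
    """
    if not pileup:
        return None  # no reads cover this position
    counts = Counter(pileup)
    base, support = counts.most_common(1)[0]
    if support < min_support:
        return None  # too few reads agree
    if support / len(pileup) < min_fraction:
        return None  # reads disagree too much
    return base


if __name__ == "__main__":
    for column in ["AAAAAA", "AA", "AAATTT", "GGGGGT"]:
        print(column, "->", call_base(column))
```

Positions that return None would stay uncalled rather than being assigned a possibly wrong base.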

SLIDE 10

Identifying polymorphic regions

4.3 Mb non-repetitive or moderately repetitive regions not covered

GC poor regions
8 non-repetitive or moderately repetitive positions
Col-0: 28kb
Bur-0: 3.25 Mb, Tsu-1: 3.13 Mb
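
One way to picture how uncovered regions are found: scan the per-position read depth along the reference and report every stretch with zero coverage. This is a minimal sketch, not the pipeline's actual method; the function name uncovered_regions and the min_length parameter are hypothetical.

```python
def uncovered_regions(coverage, min_length=1):
    """Return half-open (start, end) intervals where read depth is zero.

    coverage   -- list of per-position read depths along the reference
    min_length -- hypothetical minimum run length to report
    """
    regions = []
    start = None  # start of the current zero-coverage run, if any
    for pos, depth in enumerate(coverage):
        if depth == 0:
            if start is None:
                start = pos
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(coverage) - start >= min_length:
        regions.append((start, len(coverage)))
    return regions


if __name__ == "__main__":
    depths = [3, 0, 0, 5, 0, 0, 0, 0]
    print(uncovered_regions(depths))                # [(1, 3), (4, 8)]
    print(uncovered_regions(depths, min_length=3))  # [(4, 8)]
```

Summing the interval lengths per accession would give totals comparable to the figures listed above.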

SLIDE 11

De novo assembly of dissimilar sequences

Unmapped reads of high quality
Retain high-confidence reads
Alignment to the homologous target in the reference genome

Bur-0: 7396 contigs
Tsu-1: 3525 contigs
Col-0: 20 contigs

SLIDE 12

Detection of duplications

Higher than expected coverage
Several reads support more than one base
Segmentation into regions of 250bp
Search for “heterozygous” positions
Bur-0: 332 kb
Tsu-1: 364 kb
Col-0: 11 kb
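
The steps on this slide can be sketched as a window scan: split the genome into 250 bp segments and flag those that combine higher-than-expected coverage with apparently "heterozygous" positions, which are unexpected in an inbred accession. The function name and the thresholds depth_ratio and min_het are assumptions for illustration only; the slides do not state the cutoffs used.

```python
def flag_duplicated_windows(depths, het_flags, expected_depth,
                            window=250, depth_ratio=1.5, min_het=2):
    """Flag windows that look like collapsed duplications.

    depths         -- per-position read depth
    het_flags      -- per-position booleans, True where reads support
                      more than one base
    expected_depth -- genome-wide expected read depth
    window         -- segment size in bp (250 on the slide)
    depth_ratio    -- hypothetical fold-excess over expected depth
    min_het        -- hypothetical minimum "heterozygous" positions

    Returns a list of half-open (start, end) windows.
    """
    if len(depths) != len(het_flags):
        raise ValueError("depths and het_flags must have equal length")
    flagged = []
    for start in range(0, len(depths), window):
        end = min(start + window, len(depths))
        mean_depth = sum(depths[start:end]) / (end - start)
        het_count = sum(1 for flag in het_flags[start:end] if flag)
        if mean_depth > depth_ratio * expected_depth and het_count >= min_het:
            flagged.append((start, end))
    return flagged


if __name__ == "__main__":
    depths = [10] * 250 + [30] * 250 + [30] * 250
    het = [False] * 750
    het[300] = het[310] = True  # two mixed-base positions in window 2
    print(flag_duplicated_windows(depths, het, expected_depth=10))
```

Requiring both signals keeps a window with high depth but no mixed-base positions from being reported.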