Uncovering the wealth of grapevine genetic diversity through whole genome sequencing and assembly
Dario Cantu, Associate Professor and Louis P. Martini Endowed Chair
Grape Genomics
Outline
- 1. Why do we need more genome references?
- 2. What do we need to generate and study more (high quality) grape genomes?
- 3. Where are we now and where are we going?
- 4. What do we still need?
The grape genome
- Genome size (assembly size): 500 ± 62.9 Mbp
- Repeat content: 50 ± 2.4%
- Number of genes: 36,283 ± 3,029
[Figure panels: genome size, repeats, genes, chromosome]
Sources: PN40024 12X V2 (Canaguier et al., 2017); Pierozzi and Moura, 2016
The first grapevine genomes (2007)
- Vitis vinifera var. PN40024 (487 Mb; 29,791 protein-coding genes)
- Vitis vinifera cv. Pinot noir ENTAV 115 (504.6 Mb; 29,585 protein-coding genes)
- 1. Why do we need more grape genome references?
Tannat, Thompson Seedless, Corvina
Da Silva C et al. Plant Cell 2013;25:4777-4788
V. vinifera cv. Tannat
- Categorization of the 1,873 genes not shared with PN40024
- Number of genes found in common among Tannat, Corvina, and Pinot Noir (ENTAV 115)
[Figure label: Flavonoid biosynthesis]
[Figure labels: varietal aroma compounds and traits]
- Terpenes: Muscat
- Methoxypyrazines: Cabernet Sauvignon, Sauvignon blanc, Cabernet Franc
- TDN: Riesling
- Rotundone: Syrah
- Maturity, water use efficiency (WUE)
Wolkovich et al., 2017 Nature Climate Change
- 2. What do we need to generate and study more (high quality) grape genomes?
Minio et al., 2017
The challenge of heterozygosity
Wild grapes are dioecious
- Obligate out-crossers, so highly heterozygous
- High recessive load and strong inbreeding depression
- Hermaphrodites are rare in the wild since selfing expresses deleterious recessive alleles
[Photos: female and male, credit MA Walker]
Carmona et al., 2008
Examples of “famous crosses”
- Cabernet Franc, Merlot, Cabernet Sauvignon, Sauvignon Blanc, Carménère (Bowers and Meredith, 1997 Nat Genetics)
- Gouais blanc, Pinot, Chardonnay, Riesling, Sémillon, Aligoté (Bowers et al., 1999 Science)
Cultivated varieties are hermaphroditic, but suffer from inbreeding depression
[Figure: number of heterozygous (HET) deleterious mutations per accession. Forward simulations under a model of recessive selection for three demographic scenarios and two mating systems.]
Timeline: 30 kya (demographic shift); 8 kya (onset of clonal propagation)
Zhou et al., 2017 PNAS
The challenge of heterozygosity: first attempts to assemble Cabernet Sauvignon (~2012)
Seq technology | Assembler | Assembly size | N. contigs | NG50 (bp) | LG50
Illumina PE | SOAPdenovo2 | 631 Mb | 994,414 | 2,325 | 49,584
Illumina PE | SPAdes | 482 Mb | 245,348 | 7,719 | 15,644
Illumina PE + PacBio (5x) | Celera | 574 Mb | 154,787 | 24,598 | 5,591
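For readers less familiar with the contiguity metrics in this table: NG50 is the length of the contig at which the cumulative size of contigs, sorted longest first, reaches half of the estimated genome size, and LG50 is the number of contigs needed to get there. The sketch below illustrates that definition on made-up contig lengths. The function name and the toy numbers are illustrative assumptions, not taken from the slides.

```python
def ng50_lg50(contig_lengths, genome_size):
    """Return (NG50, LG50) for a list of contig lengths and an estimated genome size."""
    half_genome = genome_size / 2
    running_total = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running_total += length
        if running_total >= half_genome:
            return length, count
    # The assembly covers less than half of the estimated genome size.
    return None, None


# Toy example with made-up lengths (not grape data).
toy_contigs = [50, 40, 30, 20, 10]
print(ng50_lg50(toy_contigs, genome_size=200))
```

Unlike N50, which uses the assembly size as the denominator, NG50 uses a fixed genome size, so assemblies of different total size can be compared on the same scale.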
- 2. What do we need to generate and study more (high quality) grape genomes?
a) Better sequencing technologies: longer reads (single molecule real time, nanopore), optical maps, and long-range scaffolding/phasing
b) Methods to extract pure and high-molecular-weight DNA from grape tissue
c) Assembly algorithms that enable the assembly of highly contiguous diploid genomes
- Sample: Cabernet Sauvignon FPS Clone 08
- Libraries: 20 kb & 30 kb
- SMRT sequencing: PacBio RSII, P6-C4 chemistry, 74 cells; mean read length 10.7 kbp; ~140X coverage
- Assembly: FALCON-unzip, Quiver
- Output: diploid contigs (primary + haplotigs)
Chin et al., 2016 Nature Methods
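To give a sense of scale for the ~140X figure, sequencing depth is total sequenced bases divided by genome size. The sketch below inverts that relationship. It assumes a 480 Mb genome size (the value used for NG calculations elsewhere in this deck) and treats 10.7 kbp as the mean read length. The slide does not say which genome size was used for the coverage estimate, so both inputs are assumptions.

```python
def total_bases_for_coverage(coverage, genome_size_bp):
    """Bases that must be sequenced to reach a given average depth."""
    return coverage * genome_size_bp


def reads_for_coverage(coverage, genome_size_bp, mean_read_length_bp):
    """Approximate number of reads needed at a given mean read length."""
    return total_bases_for_coverage(coverage, genome_size_bp) / mean_read_length_bp


genome_size = 480_000_000  # assumed genome size in bp
bases = total_bases_for_coverage(140, genome_size)
reads = reads_for_coverage(140, genome_size, 10_700)
print(f"{bases / 1e9:.1f} Gbp, about {reads / 1e6:.1f} million reads")
```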
Common assemblers
- Haploid consensus
- Contigs break at loci of heterozygous structural variation
[Figure: FALCON and FALCON-unzip; labels include SNPs]
Chin et al., 2016 Nature Methods
Cabernet Sauvignon = Cabernet Franc x Sauvignon Blanc
Bowers and Meredith, Nat Genetics 1997
Cumulative contig size relative to genome size (480 Mb)
[Figure: contig size (Mbp) vs. NG values (on 480 Mb genome size), for primary contigs and haplotigs]

Metric | Primary contigs | Haplotigs
# of contigs | 718 | 2,609
Assembly size (Mb) | 591 | 372
N50 size (Mb) | 2.17 | 0.768

Seq technology | Assembler | Assembly size (bp) | N. contigs | NG50 (bp) | LG50
Illumina PE | SOAPdenovo2 | 631,320,289 | 994,414 | 2,325 | 49,584
Illumina PE | SPAdes | 481,817,163 | 245,348 | 7,719 | 15,644
Illumina PE + PacBio (5x) | Celera | 573,589,710 | 154,787 | 24,598 | 5,591
PacBio (20 + 30 kb lib) | FALCON-unzip | 590,964,935 | 718 | 2,767,687 | 53
Optical maps (phasing and scaffolding); alignment to PN40024 and pseudomolecules
Assembly | Size (Mb) | N. seqs | N50 (L50) | N90 (L90)
Contigs | 591 | 718 | 2.17 Mb (72) | 0.42 Mb (300)
Scaffolds (i) | 592 | 330 | 9.4 Mb (21) | 0.96 Mb (87)
Scaffolds (ii) | 592 | 246 | 11.3 Mb (19) | 1.7 Mb (66)
Scaffolds (iv) | 559 | 182 | 11.8 Mb (18) | 2.8 Mb (52)
Hyperscaffolds ALT1 | 443 | 56 | 16.5 Mb (11) | 6.9 Mb (26)
Hyperscaffolds ALT2 | 330 | 33 | 14 Mb (10) | 6.1 Mb (23)
[Figure: Cabernet Sauvignon chromosomes Chr_01 to Chr_19 and their ALT1 counterparts, shown with PN40024 V2]
68% of the whole genome is phased in two haplotypes
Haplotype | Assembly size (Mb) | N. chr | N. contigs | Gaps (Mb)
ALT_1 | 455.7 | 19 | 525 | 14.2 (3.1%)
ALT_2 | 310.0 | 19 | 422 | 13.5 (4.3%)
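The 68% figure is consistent with the ratio of the two haplotype assembly sizes reported above. The check below assumes that the phased fraction was computed as ALT_2 size over ALT_1 size. The slide does not state how the number was derived, so this is only one plausible reading.

```python
def phased_fraction(primary_mb, alternate_mb):
    """Fraction of the primary haplotype that has an assembled alternate haplotype."""
    return alternate_mb / primary_mb


# Sizes in Mb as reported for ALT_1 and ALT_2.
print(f"{phased_fraction(455.7, 310.0):.0%}")
```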
Protein coding sequences
Metric | ALT1 | ALT2
Number of CDS | 29,294 | 16,806
Mean CDS length (kb) | 1.2 | 1.2
Mean number of exons / CDS | 5 | 5
Structural comparison between homologous chromosomes
- Chromosome 08: CS Chr_08 ALT1 vs. CS Chr_08 ALT2
- Chromosome 09: CS Chr_09 ALT1 vs. CS Chr_09 ALT2
- 3. Where are we and where are we going?
Myles et al., 2011 PNAS
Cabernet Sauvignon, Carménère
Optimization of SMRT sequencing and FALCON assembly
- 616 assemblies
- Different coverage (5x - 115x) and assembly parameters
[Figure: contig length (Mbp) vs. NG values (relative to 480 Mb genome size)]
Expanding the gene space
- 1 genotype: PN40024.gV2.aV3
- 2 genotypes: + Corvina
- 3 genotypes: + Tannat
- 4 genotypes: + Cab. Sauv
- 5 genotypes: + Chardonnay
- 6 genotypes: + Zinfandel
[Figure: number of alleles (axis from 30,000 to 480,000) and composition (%) (axis from 90 to 100)]
Categories:
- Core (shared by all)
- Variable (shared by at least 2 genotypes but not all)
- Unique (only in one genotype)
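The three categories above can be expressed as a simple presence/absence rule. The sketch below applies that rule to a toy table. The function names, the family identifiers, and the presence data are hypothetical, and the slides do not describe how genes or alleles were actually clustered across genotypes.

```python
from collections import Counter


def classify(presence, genotypes):
    """Label each gene family as core, variable, or unique.

    presence: dict mapping a family id to the set of genotypes carrying it.
    genotypes: set of all genotypes included in the comparison.
    """
    labels = {}
    for family, found_in in presence.items():
        if found_in == genotypes:
            labels[family] = "core"  # shared by all genotypes
        elif len(found_in) >= 2:
            labels[family] = "variable"  # shared by at least 2 but not all
        else:
            labels[family] = "unique"  # only in one genotype
    return labels


def composition(labels):
    """Percentage of families in each category."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


# Toy presence table (made-up families, genotype names from the slide).
genotypes = {"PN40024", "Corvina", "Tannat"}
presence = {
    "fam1": {"PN40024", "Corvina", "Tannat"},
    "fam2": {"PN40024", "Corvina"},
    "fam3": {"Tannat"},
    "fam4": {"PN40024", "Corvina", "Tannat"},
}
labels = classify(presence, genotypes)
print(labels)
print(composition(labels))
```

Re-running the classification each time a genotype is added gives the kind of cumulative composition curve summarized on this slide.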
Genomic structural variation
Metric | SNPs + indels* | SVs*
N. variants in V. vinifera ssp. sylvestris | 5.36 M | 0.21 M
N. variants in V. vinifera ssp. vinifera | 4.91 M | 0.19 M
Total size | 7.44 Mbp | 14.28 Mbp
* relative to Chardonnay
What’s Next? North American Vitis
- V. rotundifolia
- V. popenoei
- V. nesbittiana
- V. mustangensis
- V. shuttleworthii
- V. palmata
- V. biformis
- V. cinerea
- V. aestivalis
- V. vulpina
- V. labrusca
- V. monticola
- V. californica
- V. blancoi
- V. bloodworthiana
- V. arizonica
- V. acerifolia
- V. riparia
- V. rupestris
- V. girdiana
- V. flexuosa
- V. treleasei
18 my Wan et al. BMC Evol. Biol., 2013
Southwest Vitis
RESEARCH-PGR #1741627
- Ren1: V. sylvestris/vinifera
- Ren2: V. cinerea x V. rupestris
- Ren3: American sp. (`Regent`)
- Ren4: V. romanetii
- Ren5, Run1, Run2: M. rotundifolia
- Ren6, Ren7: V. piasezkii
Species | Disease | Isolates sequenced | Assembly size (Mb) | Number of contigs | N50 (Kb) | Number of genes | Citation
Eutypa lata | Eutypa dieback | 11 | 54.5 | 10 | 6,542.31 | 15,313 | Blanco-Ulate et al., 2013a; Morales-Cruz et al., 2015
Neofusicoccum parvum | Botryosphaeria dieback | 16 | 43.7 | 27 | 2,555.42 | 13,124 | Blanco-Ulate et al., 2013b; Morales-Cruz et al., 2015; Massonnet et al., 2016
Phaeoacremonium minimum | Esca complex | 5 | 47.3 | 24 | 5,520.70 | 14,790 | Blanco-Ulate et al., 2013c; Morales-Cruz et al., 2015; Massonnet et al., 2018
Phaeomoniella chlamydospora | Esca complex | 2 | 27.5 | 702 | 178.60 | 6,986 | Morales-Cruz et al., 2015
Diplodia seriata | Botryosphaeria dieback | 1 | 37.1 | 695 | 304.20 | 9,398 | Morales-Cruz et al., 2015
Diaporthe ampelina | Phomopsis dieback | 1 | 47.4 | 2,392 | 132.30 | 10,801 | Morales-Cruz et al., 2015
Erysiphe necator | Powdery mildew | 5 | 52.5 | 5,936 | 21.4 | 6,533 | Jones et al., 2014
The vineyard metagenome
- Multi-species reference for RNA-seq read mapping
- meta-RNA-seq, meta-DNA-seq
Morales-Cruz et al., 2017 Mol Plant Pathology
Dr. Abraham Morales-Cruz
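One common way to build a multi-species mapping reference is to concatenate the individual FASTA files after tagging each sequence name with its species, so that every mapped read can be traced back to its organism. The sketch below shows that idea on in-memory strings. The tag names, the `|` separator, and the helper functions are illustrative assumptions and are not taken from the cited study.

```python
def tag_fasta(fasta_text, species_tag):
    """Prefix every FASTA header with a species tag; sequence lines are unchanged."""
    tagged = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tagged.append(f">{species_tag}|{line[1:]}")
        else:
            tagged.append(line)
    return "\n".join(tagged) + "\n"


def build_multi_species_reference(references):
    """Combine FASTA texts (dict of species tag -> FASTA text) into one reference."""
    return "".join(tag_fasta(text, tag) for tag, text in references.items())


def species_of(reference_name):
    """Recover the species tag from a tagged sequence name."""
    return reference_name.split("|", 1)[0]


# Toy sequences (made up); the tags are hypothetical labels.
references = {
    "Vvinifera": ">chr1\nACGT\n",
    "Elata": ">contig1\nGGCC\n",
}
combined = build_multi_species_reference(references)
print(combined, end="")
print(species_of("Elata|contig1"))
```

After mapping against the combined file, counts can be grouped by the tag returned from `species_of` to separate host reads from those of each pathogen.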
Sampling genetic diversity
- Different virulence strengths
- Multiple hosts
- Multiple geographical locations
Gene cluster conservation across isolates
[Figure: number of isolates vs. shared orthologous genes]
- Core genome (65.4%)
- Variable genome (24.0%)
- Isolate specific genes (2.48%)
Morales-Cruz et al., in preparation
The take home message
We are going toward one (diploid) genome reference per grape species/cultivar/accession.
What do we still need?
- 1. Data sharing, organization, and retrieval
- 2. How do we ensure that results are comparable across experiments?
- 3. Bioinformatic tools with diploid references
- 4. Data visualization: from “genome-browsers” to “multiple haplotype-browsers”
Our team
- Dr. Abraham Morales-Cruz
- Dr. Mélanie Massonnet
- Dr. Amanda Vondras
- Dr. Rosa Figueroa
- Dr. Andrea Minio
- Yang He, Jadran Garcia, Daniela Quiroz, Lucero Espinoza, Jerry Lin, Shahin S. Ali, Dingren Yang
@cantulab http://cantulab.github.io/