Uncovering the wealth of grapevine genetic diversity through whole genome sequencing and assembly



SLIDE 1

Dario Cantù Associate Professor and Louis P. Martini Endowed Chair

Uncovering the wealth of grapevine genetic diversity through whole genome sequencing and assembly

SLIDE 2

Grape Genomics

Outline

  • 1. Why do we need more genome references?
  • 2. What do we need to generate and study more (high quality) grape genomes?
  • 3. Where are we now and where are we going?
  • 4. What do we still need?
SLIDE 3

The grape genome

Reference shown: PN40024 12X V2 (Canaguier et al., 2017)

[Figure: three summary panels]

  • Genome size (assembly size): 500 ± 62.9 Mbp
  • Repeats (content): 50 ± 2.4%
  • Genes (number of genes): 36,283 ± 3,029

Pierozzi and Moura, 2016

SLIDE 4

The first grapevine genomes (2007)

  • Vitis vinifera var. PN40024 (487 Mb; 29,791 protein-coding genes)
  • Vitis vinifera cv. Pinot noir ENTAV 115 (504.6 Mb; 29,585 protein-coding genes)

SLIDE 5

  • 1. Why do we need more grape genome references?

SLIDE 6

Tannat, Thompson Seedless, Corvina

SLIDE 7

V. vinifera cv. Tannat

Categorization of the 1,873 genes not shared with PN40024. Number of genes found in common among Tannat, Corvina, and Pinot Noir (ENTAV 115).

Flavonoid biosynthesis

Da Silva C et al. Plant Cell 2013;25:4777-4788

SLIDE 8

  • Varieties: Muscat, Cabernet Sauvignon, Sauvignon blanc, Cabernet Franc, Riesling, Syrah
  • Aroma compounds: methoxypyrazines, terpenes, TDN, rotundone
  • Other traits: maturity, WUE

Wolkovich et al., 2017 Nature Climate Change

SLIDE 9

  • 2. What do we need to generate and study more (high quality) grape genomes?

SLIDE 10

The challenge of heterozygosity

Minio et al., 2017

SLIDE 11

The challenge of heterozygosity

Wild grapes are dioecious

  • Obligate out-crossers, so highly heterozygous
  • High recessive load and strong inbreeding depression
  • Hermaphrodites are rare in the wild since selfing expresses deleterious recessive alleles

[Images: female and male flowers. Photos: MA Walker]

Carmona et al., 2008

SLIDE 12

Examples of “famous crosses”

  • Cabernet Franc, Merlot, Cabernet Sauvignon, Sauvignon Blanc, Carménère (Bowers and Meredith, 1997 Nat Genetics)
  • Gouais blanc, Pinot, Chardonnay, Riesling, Sémillon, Aligoté (Bowers et al., 1999 Science)

Cultivated varieties are hermaphroditic, but suffer from inbreeding depression

SLIDE 13

Cultivated varieties are hermaphroditic, but suffer from inbreeding depression

[Figure: number of heterozygous (HET) deleterious mutations per accession]

Forward simulations under a model of recessive selection for three demographic scenarios and two mating systems.

  • 30 kya (demographic shift)
  • 8 kya (onset of clonal propagation)

Clonal propagation

Zhou et al., 2017 PNAS

SLIDE 14

The challenge of heterozygosity

First attempts to assemble Cabernet Sauvignon (~2012)

Seq technology             Assembler    Assembly size  N. contigs  NG50 (bp)  LG50
Illumina PE                SOAPdenovo2  631 Mb         994,414     2,325      49,584
Illumina PE                SPAdes       482 Mb         245,348     7,719      15,644
Illumina PE + PacBio (5x)  Celera       574 Mb         154,787     24,598     5,591

SLIDE 15

  • 2. What do we need to generate and study more (high quality) grape genomes?

a) Better sequencing technologies: longer reads (single molecule real time, nanopore), optical maps, and long-range scaffolding/phasing
b) Methods to extract pure, high-molecular-weight DNA from grape tissue
c) Assembly algorithms that enable the assembly of highly contiguous diploid genomes

SLIDE 16

Cabernet Sauvignon FPS Clone 08

SMRT sequencing

  • Libraries: 20 kb & 30 kb
  • Platform: PacBio RSII, P6-C4 chemistry
  • 74 cells; mean read length: 10.7 kbp; ~140X
  • Assembly: FALCON-unzip, Quiver
  • Result: diploid contigs (primary + haplotigs)

Chin et al., 2016 Nature Methods
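To put the sequencing figures on this slide in proportion, here is a back-of-the-envelope sketch in Python. It assumes the ~140X coverage is expressed relative to the 480 Mb genome size quoted on the NG-value slides; the talk does not state this, and the function name `sequencing_yield` is made up for illustration.

```python
def sequencing_yield(genome_size_bp, coverage, mean_read_bp, n_cells):
    """Rough yield implied by a target coverage (illustrative arithmetic only)."""
    total_bases = genome_size_bp * coverage      # bases needed for the coverage
    n_reads = total_bases / mean_read_bp         # reads at the mean read length
    bases_per_cell = total_bases / n_cells       # average output per SMRT cell
    return total_bases, n_reads, bases_per_cell


total, reads, per_cell = sequencing_yield(
    genome_size_bp=480e6,  # assumption: genome size used for the NG values
    coverage=140,          # ~140X, from the slide
    mean_read_bp=10_700,   # mean of 10.7 kbp, from the slide
    n_cells=74,            # 74 cells, from the slide
)
print(f"{total / 1e9:.1f} Gb total, {reads / 1e6:.1f} M reads, "
      f"{per_cell / 1e9:.2f} Gb per cell")
```

Under these assumptions the run corresponds to roughly 67 Gb of sequence, or a little under 1 Gb per cell.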

SLIDE 17

Common assemblers: haploid consensus

Contigs break at loci of heterozygous structural variation

[Figure labels: SNPs; Falcon; Falcon-unzip]

Chin et al., 2016 Nature Methods

SLIDE 18

Cabernet Sauvignon: Sauvignon Blanc x Cabernet Franc

Bowers and Meredith, Nat Genetics 1997

SLIDE 19

[Figure: cumulative contig size relative to genome size (480 Mb); contig size (Mbp) against NG values (on 480 Mb genome size), for primary contigs and haplotigs]

                    Primary contigs  Haplotigs
# of contigs        718              2,609
Assembly size (Mb)  591              372
N50 size (Mb)       2.17             0.768

Seq technology             Assembler     Assembly size (bp)  N. contigs  NG50 (bp)  LG50
Illumina PE                SOAPdenovo2   631,320,289         994,414     2,325      49,584
Illumina PE                SPAdes        481,817,163         245,348     7,719      15,644
Illumina PE + PacBio (5x)  Celera        573,589,710         154,787     24,598     5,591
PacBio (20 + 30 kb lib)    FALCON-unzip  590,964,935         718         2,767,687  53
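As a reading aid for the N50, NG50 and LG50 values on this slide, here is a minimal Python sketch of how such statistics are conventionally computed. The function name `nx_stats` and the toy contig lengths are illustrative assumptions, not taken from the talk.

```python
def nx_stats(contig_lengths, x=50, genome_size=None):
    """Return (Nx, Lx). If genome_size is given, the result is (NGx, LGx)."""
    # Nx uses the assembly size as the reference; NGx uses an external genome size.
    reference = genome_size if genome_size is not None else sum(contig_lengths)
    target = reference * x / 100
    running = 0
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if running >= target:
            # length: shortest contig needed to reach the target (Nx / NGx)
            # count: number of contigs needed to reach it (Lx / LGx)
            return length, count
    return None, None  # the assembly never reaches x% of the genome size


toy = [50, 40, 30, 20, 10]              # hypothetical contig lengths
print(nx_stats(toy))                    # N50 and L50 on the assembly size (150)
print(nx_stats(toy, genome_size=100))   # NG50 and LG50 on an assumed genome size of 100
```

Because NG values are anchored to a fixed genome size (480 Mb on this slide), they stay comparable between assemblies whose total sizes differ, which is not true of plain N50.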

SLIDE 20

Optical maps (phasing and scaffolding)

Alignment to PN40024 and pseudomolecules

Assembly             Size (Mb)  N. seqs  N50 (L50)     N90 (L90)
Contigs              591        718      2.17 Mb (72)  0.42 Mb (300)
Scaffolds (i)        592        330      9.4 Mb (21)   0.96 Mb (87)
Scaffolds (ii)       592        246      11.3 Mb (19)  1.7 Mb (66)
Scaffolds (iv)       559        182      11.8 Mb (18)  2.8 Mb (52)
Hyperscaffolds ALT1  443        56       16.5 Mb (11)  6.9 Mb (26)
Hyperscaffolds ALT2  330        33       14 Mb (10)    6.1 Mb (23)

SLIDE 21

[Figure: Cabernet Sauvignon chromosomes Chr_01 to Chr_19 and their ALT1 counterparts, shown with PN40024 V2]

SLIDE 22

68% of the whole genome is phased in two haplotypes

Assembly  Size (Mb)  N. chr  N. contigs  Gaps (Mb)
ALT_1     455.7      19      525         14.2 (3.1%)
ALT_2     310.0      19      422         13.5 (4.3%)

Protein coding sequences

                            ALT1    ALT2
Number of CDS               29,294  16,806
Mean CDS length (kb)        1.2     1.2
Mean number of exons / CDS  5       5
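One reading of the 68% figure that is consistent with the assembly sizes on this slide is the ratio of ALT_2 to ALT_1. The talk does not say how the percentage was derived, so the short Python check below is a sketch under that assumption.

```python
alt1_mb = 455.7  # ALT_1 assembly size (Mb), from the slide
alt2_mb = 310.0  # ALT_2 assembly size (Mb), from the slide

# Assumption: "phased" means the share of ALT_1 that has an ALT_2 counterpart.
phased_fraction = alt2_mb / alt1_mb
print(f"{phased_fraction:.0%}")
```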

SLIDE 23

Structural comparison between homologous chromosomes

  • Chromosome 08: CS Chr_08 ALT1 vs CS Chr_08 ALT2
  • Chromosome 09: CS Chr_09 ALT1 vs CS Chr_09 ALT2

SLIDE 24

  • 3. Where are we now and where are we going?
SLIDE 25

Cabernet Sauvignon

Myles et al., 2011 PNAS

SLIDE 26

Optimization of SMRT sequencing and FALCON assembly

Carménère: 616 assemblies with different coverage (5x - 115x) and assembly parameters

[Figure: contig length (Mbp) against NG values (relative to 480 Mb genome size)]

SLIDE 27

Expanding the gene space

  • 1 genotype: PN40024.gV2.aV3
  • 2 genotypes: + Corvina
  • 3 genotypes: + Tannat
  • 4 genotypes: + Cab. Sauv
  • 5 genotypes: + Chardonnay
  • 6 genotypes: + Zinfandel

[Figure: number of alleles (30,000 to 480,000) and composition (%, 90 to 100) as genotypes are added]

  • Core (shared by all)
  • Variable (shared by at least 2 genotypes but not all)
  • Unique (only in one genotype)
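The three composition classes on this slide can be sketched as a simple counting rule. The Python below is a minimal illustration that assumes each genotype is represented as a set of gene (or allele) family identifiers. The function name, the data layout and the toy identifiers are hypothetical and do not reflect the pipeline used in the talk.

```python
from collections import Counter


def classify_pangenome(presence):
    """presence maps genotype name -> set of gene family identifiers."""
    n_genotypes = len(presence)
    # Count in how many genotypes each identifier occurs.
    counts = Counter(gene for genes in presence.values() for gene in genes)
    classes = {"core": set(), "variable": set(), "unique": set()}
    for gene, n in counts.items():
        if n == n_genotypes:
            classes["core"].add(gene)        # shared by all
        elif n >= 2:
            classes["variable"].add(gene)    # shared by at least 2 but not all
        else:
            classes["unique"].add(gene)      # only in one genotype
    return classes


# Hypothetical toy data; identifiers are made up for illustration.
toy = {
    "PN40024": {"g1", "g2", "g3"},
    "Corvina": {"g1", "g2", "g4"},
    "Tannat": {"g1", "g5"},
}
result = classify_pangenome(toy)
print({name: sorted(members) for name, members in result.items()})
```

With a single genotype every family counts as core, so the core share can only shrink as genotypes are added.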

SLIDE 28

Genomic structural variation

                                            SNPs + indels*  SVs*
N. variants in V. vinifera ssp. sylvestris  5.36 M          0.21 M
N. variants in V. vinifera ssp. vinifera    4.91 M          0.19 M
Total size                                  7.44 Mbp        14.28 Mbp

* relative to Chardonnay

SLIDE 29

What’s Next? North American Vitis

  • V. rotundifolia
  • V. popenoei
  • V. nesbittiana
  • V. mustangensis
  • V. shuttleworthii
  • V. palmata
  • V. biformis
  • V. cinerea
  • V. aestivalis
  • V. vulpina
  • V. labrusca
  • V. monticola
  • V. californica
  • V. blancoi
  • V. bloodworthiana
  • V. arizonica
  • V. acerifolia
  • V. riparia
  • V. rupestris
  • V. girdiana
  • V. flexuosa
  • V. treleasei

18 my (Wan et al., BMC Evol. Biol., 2013)

Southwest Vitis

RESEARCH-PGR #1741627

SLIDE 30

  • Ren1: V. sylvestris/vinifera
  • Ren2: V. cinerea x V. rupestris
  • Ren3: American sp. ('Regent')
  • Ren5, Run1, Run2: M. rotundifolia
  • Ren6, Ren7: V. piasezkii
  • Ren4: V. romanetii
SLIDE 31

SLIDE 32

Species                      Disease                 Isolates sequenced  Assembly size (Mb)  Number of contigs  N50 (Kb)  Number of genes  Citation
Eutypa lata                  Eutypa dieback          11                  54.5                10                 6,542.31  15,313           Blanco-Ulate et al., 2013a; Morales-Cruz et al., 2015
Neofusicoccum parvum         Botryosphaeria dieback  16                  43.7                27                 2,555.42  13,124           Blanco-Ulate et al., 2013b; Morales-Cruz et al., 2015; Massonnet et al., 2016
Phaeoacremonium minimum      Esca complex            5                   47.3                24                 5,520.70  14,790           Blanco-Ulate et al., 2013c; Morales-Cruz et al., 2015; Massonnet et al., 2018
Phaeomoniella chlamydospora  Esca complex            2                   27.5                702                178.60    6,986            Morales-Cruz et al., 2015
Diplodia seriata             Botryosphaeria dieback  1                   37.1                695                304.20    9,398            Morales-Cruz et al., 2015
Diaporthe ampelina           Phomopsis dieback       1                   47.4                2,392              132.30    10,801           Morales-Cruz et al., 2015
Erysiphe necator             Powdery mildew          5                   52.5                5,936              21.4      6,533            Jones et al., 2014

SLIDE 33

The vineyard metagenome

  • Dr. Abraham Morales-Cruz

Multi-species reference for RNA-seq read mapping

meta-RNA-seq, meta-DNA-seq

Morales-Cruz et al., 2017 Mol Plant Pathology
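The slide does not describe how the multi-species reference was assembled. One common way to build such a reference is to concatenate the per-species FASTA files and tag every sequence name with its species, so that each mapped read can later be assigned back to an organism. The Python sketch below illustrates that idea only. The function names, the `|` separator and the toy sequences are assumptions for illustration.

```python
def build_multispecies_reference(fasta_by_species):
    """Concatenate FASTA texts, prefixing each sequence name with its species label."""
    out_lines = []
    for species, fasta_text in fasta_by_species.items():
        for line in fasta_text.splitlines():
            if line.startswith(">"):
                # Header line: keep the original name, add the species prefix.
                out_lines.append(f">{species}|{line[1:]}")
            elif line.strip():
                out_lines.append(line)  # sequence line, copied unchanged
    return "\n".join(out_lines) + "\n"


def species_of(reference_name):
    """Recover the species label from a prefixed reference name."""
    return reference_name.split("|", 1)[0]


# Hypothetical toy input; real references would be read from FASTA files.
toy = {
    "grape": ">chr1\nACGT\n",
    "fungus": ">contig_1\nGGCC\n",
}
combined = build_multispecies_reference(toy)
print(combined, end="")
```

After mapping against the combined file, `species_of` applied to the reference name of each alignment gives per-species read counts.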

SLIDE 34

Sampling genetic diversity

  • Different virulence strengths
  • Multiple hosts
  • Multiple geographical locations

Gene cluster conservation across isolates

[Figure: shared orthologous genes against number of isolates]

  • Core genome (65.4%)
  • Variable genome (24.0%)
  • Isolate specific genes (2.48%)

Morales-Cruz et al., in preparation

SLIDE 35

The take-home message

We are moving toward one (diploid) genome reference per grape species/cultivar/accession.

SLIDE 36

What do we still need?

  • 1. Data sharing, organization, and retrieval
  • 2. How do we ensure that results are comparable across experiments?
  • 3. Bioinformatic tools with diploid references
  • 4. Data visualization: from “genome-browsers” to “multiple haplotype-browsers”

SLIDE 37

Our team

  • Dr. Abraham Morales-Cruz
  • Dr. Mélanie Massonnet
  • Dr. Amanda Vondras
  • Dr. Rosa Figueroa
  • Dr. Andrea Minio

Yang He, Jadran Garcia, Daniela Quiroz, Lucero Espinoza, Jerry Lin, Shahin S. Ali, Dingren Yang

@cantulab http://cantulab.github.io/

SLIDE 38

Acknowledgments

Funding

Collaborators

  • Andy Walker (UCD)
  • Brandon Gaut (UCI)
  • Grant Cramer (UNR)
  • Kendra Baumgartner (USDA-ANR)
  • Philippe Rolshausen (USDA-ANR)
  • Massimo Delledonne (Univ. Verona)
  • Lance Cadle-Davidson (USDA-ANR)
  • Jason Londo (USDA-ANR)