Improving genome assemblies, assessing structural variation and trait association using chromosome genomics and Illumina skim genotyping by sequencing



slide-1
SLIDE 1

Improving genome assemblies, assessing structural variation and trait association using chromosome genomics and Illumina skim genotyping by sequencing

David Edwards University of Queensland, Australia Dave.Edwards@uq.edu.au

slide-2
SLIDE 2

Outline

  • Chromosome sequencing
  • SNP discovery
  • Genotyping by sequencing (skim method)
  • Validating genome structure
slide-3
SLIDE 3

Technology: next-generation sequencing

The challenge of genome sequencing

slide-4
SLIDE 4

Technology: next-generation sequencing

The challenge of genome sequencing

Thanks to Roger Hellens, Plant and Food New Zealand

slide-5
SLIDE 5

Hexaploid wheat genome

http://www.jic.ac.uk/staff/graham-moore/wheat_meiosis.htm

17 billion bases

slide-6
SLIDE 6

Chromosome sequencing

  • Isolate individual chromosomes or groups of chromosomes using flow cytometry
  • Generate NGS libraries and paired-end (PE) Illumina data
  • Assemble the reads or map them to a reference genome
slide-7
SLIDE 7

Mapping reads to reference genomes

[Figure: reference segments labelled in the order 1 2 3 4 5 6 11 10 9 8 7 12]

slide-8
SLIDE 8

Sequencing wheat chromosome arms

[Figure labels: Ta 7DS, Bd 1, Bd 3]

www.wheatgenome.info

Berkman et al. (2011) Plant Biotechnology Journal

slide-9
SLIDE 9

7BS/4AL translocation

7DS and 7BL sequence similarity with Brachypodium

slide-10
SLIDE 10

7BS/4AL translocation

  • Translocation between Bradi1g49500 and Bradi1g49550
  • The 4 intervening genes are missing from all assemblies
  • ~13% of genes moved from 7BS to 4AL
  • 13 genes moved from 4AL to 7BS

Berkman et al. (2012) Theoretical and Applied Genetics 124 (3): 423-432

slide-11
SLIDE 11

Wheat genome evolution

[Figure: wheat polyploidisation. Labels: AA, BB, AW; AABB (50,000 years ago); AABB and DD; AABBDD (10,000 years ago); chromosomes 7A, 7B, 7D]

slide-12
SLIDE 12

GBrowse http://wheatgenome.info/

Lai et al. (2012) Plant and Cell Physiology 53, 1-7

slide-13
SLIDE 13

Genome sequencing in chickpea

Two draft genomes published in 2013

slide-14
SLIDE 14

Chickpea reference (Kabuli)

slide-15
SLIDE 15

Chickpea reference (Kabuli)

slide-16
SLIDE 16

Chickpea reference (Kabuli)

[Figure labels: K8, D8; K3, D3; K5, D5. K = Kabuli, D = Desi]

slide-17
SLIDE 17

Chickpea reference (Kabuli)

[Figure labels: K8, D8; K3, D3; K5, D5]

slide-18
SLIDE 18

Chickpea reference (Desi)

[Figure labels: A, 5, 3, 8]

slide-19
SLIDE 19

Chromosome sequencing

  • Sequencing isolated chromosomes identifies misassemblies and rearrangements at base-pair resolution

slide-20
SLIDE 20

SGSautoSNP

  • Generate a reference
  • Map variety specific reads to the reference
  • Call differences between the varieties
  • At least two reads defining the difference
  • No conflict within a variety (homozygous genomes)

Accuracy: >95% for canola, >93% for wheat
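
The calling rules listed on this slide can be sketched for a single reference position as follows. This is a minimal illustration and not the SGSautoSNP implementation: the function name `call_snp`, the input layout (a dict of variety name to the bases its reads show at that position) and the reading of "at least two reads" as "each allele is backed by at least two reads in total" are all assumptions.

```python
from collections import Counter

def call_snp(pileup, min_reads=2):
    """Return {variety: allele} if the position differs between varieties, else None.

    pileup: dict of variety name -> list of bases seen in that variety's reads
    at one reference position (hypothetical input format).
    """
    allele_of = {}
    support = Counter()
    for variety, bases in pileup.items():
        if not bases:
            continue  # variety has no reads here, so it gives no evidence
        if len(set(bases)) > 1:
            return None  # conflict within a variety: reject the position
        allele_of[variety] = bases[0]
        support[bases[0]] += len(bases)
    if len(support) < 2:
        return None  # every covered variety shows the same base
    if any(count < min_reads for count in support.values()):
        return None  # an allele is backed by too few reads (assumed rule)
    return allele_of
```

For example, `call_snp({"v1": ["A", "A"], "v2": ["C", "C", "C"]})` returns the allele carried by each variety, while a position where one variety shows both `A` and `C` is rejected.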

slide-21
SLIDE 21

Brassica SNP matrix

Lower-triangular matrix of SNP counts between pairs of varieties. Each row starts with its label; values follow the column order A, Bn, E, I, J, M1, M51, M52, M91, M2, Mu, N, No, S, Sr, T, Tf, Tr.

A
Bn   55,716
E    57,492  67,676
I    27,487  33,874  26,406
J    100,933  108,457  86,807  52,377
M1   52,541  61,657  43,746  20,655  93,148
M51  53,627  69,495  54,071  30,968  93,966  56,190
M52  64,088  68,533  63,092  34,656  51,013  63,219  60,793
M91  70,214  80,230  57,023  38,612  89,294  67,496  60,932  58,091
M2   34,535  38,248  27,954  18,731  41,866  34,073  29,306  27,318  11,944
Mu   106,182  121,584  87,536  46,824  192,343  72,205  114,260  130,317  131,155  66,838
N    159,608  208,373  146,700  73,345  270,623  139,082  178,653  205,985  215,689  113,928  258,980
No   81,073  97,160  86,610  39,263  164,813  81,265  93,250  98,393  97,109  46,546  174,630  252,923
S    40,857  42,661  53,786  28,431  92,840  51,584  55,260  60,118  64,493  31,424  101,900  160,234  81,474
Sr   65,657  85,317  63,305  38,484  113,199  68,078  3,798  73,578  73,825  35,584  137,597  215,422  115,212  68,231
T    124,971  149,974  100,000  51,304  212,272  61,611  132,415  153,887  153,504  82,307  175,304  296,891  213,237  119,697  157,308
Tf   57,190  76,556  78,239  39,240  140,978  68,383  59,394  78,257  90,655  41,702  157,441  262,784  125,298  65,430  74,385  194,683
Tr   11,193  14,028  12,553  6,760  21,972  12,045  6,624  13,849  16,149  7,794  25,791  39,920  20,127  12,249  8,314  30,468  12,331

slide-22
SLIDE 22

Skim GBS

  • Determine SNPs by sequencing the parents and running SGSautoSNP
  • Skim sequence the segregating population at low coverage
  • Map reads to the reference genome
  • Call a genotype wherever reads cover a previously defined SNP
  • Impute and clean to define haplotype blocks
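
The genotype-calling step can be sketched as below. It is an illustration under stated assumptions rather than the pipeline used in the talk: `call_genotypes`, the `parent_snps` layout (position to the alleles of parent A and parent B) and the choice to report conflicting or uncovered positions as missing are all hypothetical.

```python
def call_genotypes(parent_snps, observed, missing="-"):
    """Call the parental origin of one individual at each known SNP.

    parent_snps: dict of position -> (allele of parent A, allele of parent B).
    observed:    dict of position -> list of bases from the individual's skim reads.
    Returns a dict of position -> "A", "B" or the missing symbol.
    """
    calls = {}
    for pos, (allele_a, allele_b) in sorted(parent_snps.items()):
        bases = set(observed.get(pos, []))
        if bases == {allele_a}:
            calls[pos] = "A"
        elif bases == {allele_b}:
            calls[pos] = "B"
        else:
            # no read, conflicting reads, or an unexpected base (assumed handling)
            calls[pos] = missing
    return calls
```

With low-coverage data most positions in any one individual come back as missing, which is why the imputation step that follows is needed.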
slide-23
SLIDE 23

Genotype calling

Call genotype of previously predicted SNPs

[Figure labels at SNP positions: A, C/A, T/C, A]

slide-24
SLIDE 24

Pre-imputation

slide-25
SLIDE 25

After imputation and cleaning
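
One simple way to picture imputation and cleaning is shown below for a single individual along one chromosome. This is a deliberately naive sketch, not the method used to produce the figures: the function name, the `-` symbol for missing calls and both rules (overwrite a lone call that disagrees with matching neighbours on each side, then fill gaps flanked by the same parent) are assumptions.

```python
def impute_and_clean(calls, missing="-"):
    """Clean isolated calls, then fill gaps flanked by identical calls.

    calls: string of per-marker calls in genome order, e.g. "A-A--ABAA-B-B".
    """
    calls = list(calls)

    # Cleaning: in a left-to-right pass over the called markers, a call that
    # differs from matching calls on both sides is treated as an error.
    called = [i for i, c in enumerate(calls) if c != missing]
    for k in range(1, len(called) - 1):
        left, mid, right = calls[called[k - 1]], calls[called[k]], calls[called[k + 1]]
        if left == right and mid != left:
            calls[called[k]] = left

    # Imputation: fill a run of missing calls only when both flanks agree.
    last = None  # index of the previous called marker
    for i, c in enumerate(calls):
        if c == missing:
            continue
        if last is not None and calls[last] == c:
            for j in range(last + 1, i):
                calls[j] = c
        last = i
    return "".join(calls)
```

Gaps that span a switch from one parent to the other are left missing, since the crossover could sit anywhere inside them.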

slide-26
SLIDE 26

Misplaced contigs in assembly?

slide-27
SLIDE 27

Misplaced contig or rearrangement

[Figure: genotype calls for markers M20809 to M20873 (M20830 not listed) across the population, coded T/t, N/n and E/e]
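
A block of markers whose genotype pattern across the population does not match its neighbours is what suggests a misplaced contig or a rearrangement. The sketch below flags such boundaries by comparing adjacent markers; the function names, the marker names in the usage note, the `-` missing symbol and the 0.2 threshold are all assumptions chosen for illustration.

```python
def discordance(col_a, col_b, missing="-"):
    """Fraction of individuals with different calls at two markers.

    Individuals with a missing call at either marker are ignored.
    Returns None when no individual is called at both markers.
    """
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != missing and b != missing]
    if not pairs:
        return None
    return sum(a != b for a, b in pairs) / len(pairs)

def flag_breaks(markers, max_discordance=0.2):
    """List adjacent marker pairs whose genotype patterns disagree strongly.

    markers: list of (name, calls) in assembly order, where calls is a string
    with one genotype call per individual.
    """
    breaks = []
    for (name_a, col_a), (name_b, col_b) in zip(markers, markers[1:]):
        d = discordance(col_a, col_b)
        if d is not None and d > max_discordance:
            breaks.append((name_a, name_b))
    return breaks
```

A misplaced block then appears as two flagged boundaries, one on each side of it: for markers `m1` to `m5` where only `m3` and `m4` carry the opposite pattern, the function returns `[("m2", "m3"), ("m4", "m5")]`.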
slide-28
SLIDE 28

Blackleg

  • Blackleg infection (caused by Leptosphaeria maculans)

Source: http://www.agf.gov.bc.ca/cropprot/blackleg.htm

Tollenaere et al. (2012) Identification and characterisation of candidate Rlm4 blackleg resistance genes in Brassica napus using next generation sequencing. Plant Biotechnology Journal. 10 (6): 709-715

slide-29
SLIDE 29

Conclusions

  • Sequencing isolated chromosomes identifies genes, misassemblies and rearrangements
  • High-resolution GBS
      • highlights differences in genome structure
      • identifies gene conversions
      • can be used to validate and fix assemblies
      • can identify candidate genes for traits
slide-30
SLIDE 30

Acknowledgements

Pradeep Ruperao, Philipp Bayer, Kenneth Chan, Michal Lorenc, Kaitao Lai, Agnieszka Golic, Paul Visendi, Paula Martinez, Jacqueline Batley, Alice Hayward, Emma Campbell, Jessica Dalton-Morgan, Satomi Hayashi, Hana Šimková, Marie Kubaláková, Jaroslav Doležel, Harsh Raman, Yan Long, Jinling Meng, Isobel Parkin, Rachit K Saxena, Deepa Jaganathan, Pooran M Gaur, Rajeev Varshney, Bart Lambert, Benjamin Laga

Contact: Dave.Edwards@uq.edu.au