PLANT DNA BARCODING: HISTORY, CASE STUDIES, FUTURE PERSPECTIVES
MARIA KUZMINA UNIVERSITY OF GUELPH, CANADA
Building the DNA barcode library for the flora of Canada using herbarium specimens
Encouraging start ...
COI fails in plants for several reasons:
Ideal barcode should be:
species Resources:
Decision:
psbA-trnH matK rbcL
2009 Chloroplast DNA markers
Topological correspondence of the DNA barcode phylogeny and the Angiosperm Phylogeny Group (APG)
The central role
in our overall understanding
(Soltis et al., 2005) (Kress et al., 2009) (Stevens, 2001 onward) Angiosperm phylogeny website
Nuclear Ribosomal DNA
(Chen et al., 2010)
18S 5.8S 26S
ITS1 ITS2
rejected
diverse sample of plants
species
Adding a nuclear marker ...
Reported species resolution (%) columns: rbcL, matK, ITS2, rbcL+matK, rbcL+ITS2, All 3

Publication | Geographic area | No. species | Reported species resolution (%) | Note
A. Fazekas et al., 2008 | North America | 92 | 48, 56 |
K. Burgess et al., 2011 | Koffler Scientific Reserve (KSR), Ontario | 436 | 80, 89, 93 |
M. Kuzmina et al., 2012 | Churchill, Manitoba | 312 | 54, 63, 69 |
D. Percy et al., 2014 | North America | 71 | | Incomplete lineage sorting OR plastid capture with selective sweep
M. Zarrei et al., 2015 | North America | 83 | | Polyploidy and hybridization
T. Elliott et al., 2015 | Mont St. Hilaire, Quebec | 582 | | Focusing on quality control of collected material and data
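The species resolution percentages above are values reported by the cited studies; the slides do not say how each study computed them. As a rough illustration only, the sketch below assumes one simple definition: a species counts as resolved when none of its barcode sequences is identical to a sequence from another species. The function name and the toy records are hypothetical.

```python
from collections import defaultdict

def species_resolution(records):
    """Percent of species whose sequences are not shared with any other species.

    records: list of (species_name, sequence) pairs.
    Assumption: "resolved" means no identical sequence in another species.
    """
    species_by_seq = defaultdict(set)
    for species, seq in records:
        species_by_seq[seq].add(species)

    all_species = {species for species, _ in records}
    unresolved = set()
    for owners in species_by_seq.values():
        if len(owners) > 1:  # the same sequence occurs in two or more species
            unresolved |= owners
    resolved = all_species - unresolved
    return 100.0 * len(resolved) / len(all_species)

# Toy data (hypothetical): species C and D share a sequence, A and B do not.
toy_records = [
    ("Species A", "ACGTAC"),
    ("Species A", "ACGTAC"),
    ("Species B", "ACGTTC"),
    ("Species C", "ACGAAC"),
    ("Species D", "ACGAAC"),
]
resolution = species_resolution(toy_records)
print(resolution)
```

Published studies often use other criteria (for example BLAST best match or tree-based monophyly), so percentages from different definitions are not directly comparable.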
Gene | Pros | Cons
rbcL (550 bp) | Easily amplified; good length for NGS | Poor taxonomic resolution
matK (800 bp) | Good taxonomic resolution | Often difficult to amplify; too long for most NGS platforms
ITS2 (350 bp) | Good taxonomic resolution; good length for NGS | Paralogous copies; not easy to align across a diverse set of taxa
(CAN)
Resources
[Figure: timeline chart, 2008-2016]
~18,000 specimens
[Figure: species resolution (%) by gene (rbcL, matK, ITS2); panels: BLAST, mothur]
Plant checklists from 28 national parks and reserves
[Figure: species resolution (%) by region (Arctic, Boreal, Pacific, Prairie, Woodland, Atlantic); panels: matK, rbcL, ITS2]
Plant checklists from 28 national parks and reserves combined in 6 biogeographic regions
The DNA barcode reference library for mosses: rbcL and trnL-F for 775 species of Canadian Bryophyta
Canadian Museum of Nature Center for Biodiversity Genomics
Maria Kuzmina, Jennifer Doubt, Catherine La Farge, Juan Carlos Villarreal & Paul Hebert
Step one: sampling, imaging, databasing
Source location of the specimens included in the DNA barcode reference library for Canadian mosses
~2000 specimens, 775 species, ~3 records per species
Number of moss specimens analyzed by province
The most often used phylogenetic markers for mosses
(Stech & Quandt, 2010) rbcL trnL-F ITS
Time
Relationship between specimen age and sequence recovery
Overall sequencing success:
Marker | Specimens | Species
rbcL | 84% | 94%
trnL-F | 85% | 98%
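The table gives success both per specimen and per species. A plausible reading, which is an assumption here and not stated on the slide, is that a species counts as recovered when at least one of its specimens yields a sequence. That would explain why the species-level figure is higher than the specimen-level one. A minimal sketch with hypothetical toy data:

```python
from collections import defaultdict

def sequencing_success(results):
    """results: list of (species, sequenced_ok) pairs, one per specimen.

    Returns (specimen_success_pct, species_success_pct).
    Assumption: a species is recovered if any of its specimens was sequenced.
    """
    if not results:
        raise ValueError("no specimens given")
    ok_by_species = defaultdict(bool)
    n_ok = 0
    for species, ok in results:
        n_ok += bool(ok)
        ok_by_species[species] = ok_by_species[species] or bool(ok)
    specimen_pct = 100.0 * n_ok / len(results)
    species_pct = 100.0 * sum(ok_by_species.values()) / len(ok_by_species)
    return specimen_pct, species_pct

# Toy data (hypothetical): 6 specimens from 3 species.
toy_results = [
    ("moss A", True), ("moss A", False), ("moss A", True),
    ("moss B", False), ("moss B", True),
    ("moss C", False),
]
specimen_pct, species_pct = sequencing_success(toy_results)
print(specimen_pct, round(species_pct, 1))
```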
The Maximum Likelihood best rbcL tree
Monophyletic, Polyphyletic
1665 specimens
UNUSUAL: Many orders are polyphyletic!
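To make the monophyletic versus polyphyletic labelling concrete: on a rooted tree, a group is monophyletic when some clade contains all of its members and nothing else. The sketch below checks this on a tree written as nested tuples. The representation, the function names and the toy tree are illustrative assumptions, not the analysis used for the slide. The check only separates monophyletic from non-monophyletic groups, and it does not distinguish polyphyly from paraphyly.

```python
def leaf_set(tree):
    """Return the set of leaf names under a nested-tuple tree node."""
    if isinstance(tree, str):
        return {tree}
    leaves = set()
    for child in tree:
        leaves |= leaf_set(child)
    return leaves

def is_monophyletic(tree, members):
    """True if some clade of the rooted tree has exactly `members` as its leaves."""
    members = set(members)
    if leaf_set(tree) == members:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, members) for child in tree)

# Toy rooted tree (hypothetical): a1 and a2 form a clade; b1 and b2 do not.
toy_tree = ((("a1", "a2"), "b1"), ("b2", "c1"))
print(is_monophyletic(toy_tree, {"a1", "a2"}))
print(is_monophyletic(toy_tree, {"b1", "b2"}))
```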
Bootstrap consensus rbcL tree
Bootstrap >80%
1665 specimens
SURPRISINGLY: rbcL poorly supports beta taxonomy but is good at resolving genera!
Species resolution with rbcL and trnL-F for species-rich orders of mosses
Reexamination of taxonomy (red bars) provoked by rbcL results
Acknowledgements
Reference Library for Targeted SNP-based Identification of Cibotium barometz Using NGS
Natalia Ivanova Maria Kuzmina Evgeny Zakharov
Plant growing in the Botanischer Garten München-Nymphenburg, Munich, Germany Photograph by: Daderot, Public domain
http://tropical.theferns.info/image.php?id=Cibotium+barometz
The golden brown hairs at the base of the frond Photograph by: Mokkie Creative Commons Attribution-Share Alike 4.0
Anti-inflammatory Anti-rheumatic Anti-
Tonic Styptic Antibacterial Antioxidant
Geiger JMO, Korall P, Ranker TA, Kleist AC, Nelson CL (2013) Molecular Phylogenetic Relationships of Cibotium and Origin of the Hawaiian Endemics. Am Fern J, 103: 141–152, doi:10.1640/0002-8444-103.3.141
rps4 (ribosomal protein S4) – 94 bp
atpA (ATP synthase alpha chain) – 86 bp
trnG-trnR intergenic spacer – 95 and 102 bp
rps4-trnS intergenic spacer – 84 and 87 bp
atpB-rbcL intergenic spacer – 79 and 110 bp
Average age of UBC Cibotium herbarium material – 58 years
rps4
atpA
rps4-trnS 87-200
rps4-trnS 163-300
trnG-trnR 565-705
trnG-trnR 681-830
atpB-rbcL
Confirmed C. barometz SNPs
Shared C. barometz/cumingii SNPs
Signature SNPs
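The legend separates SNPs shared by C. barometz and C. cumingii from signature SNPs, but the slides do not define the terms. The sketch below assumes that a signature SNP is an alignment column where every target-species sequence carries the same base and no sequence from any other species carries it. The function name, the toy aligned sequences and the "other_species" label are hypothetical.

```python
def signature_positions(alignment, target):
    """Return 0-based alignment columns where all `target` sequences share one
    base and no sequence from any other species has that base.

    alignment: dict mapping species name -> list of equal-length aligned sequences.
    """
    target_seqs = alignment[target]
    other_seqs = [s for sp, seqs in alignment.items() if sp != target for s in seqs]
    hits = []
    for i in range(len(target_seqs[0])):
        target_bases = {s[i] for s in target_seqs}
        if len(target_bases) != 1:
            continue  # the target species is polymorphic at this column
        base = next(iter(target_bases))
        if all(s[i] != base for s in other_seqs):
            hits.append(i)
    return hits

# Toy alignment (hypothetical sequences, not real Cibotium data).
toy_alignment = {
    "C. barometz": ["ACGTAC", "ACGTAC"],
    "C. cumingii": ["ACGTTC", "ACGTTC"],
    "other_species": ["ATGTTC"],
}
signature = signature_positions(toy_alignment, "C. barometz")
print(signature)
```

In this toy alignment, column 4 separates C. barometz from everything else, while column 1 carries a base shared by C. barometz and C. cumingii and so does not qualify as a signature.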
Assembled reference library with voucher specimens
Confirmed available GenBank data for C. barometz and C. cumingii
Increased coverage for the regions of interest
Resulting reference library can be used for regulatory purposes
David L. Erickson
DNA4 Technologies LLC
Example: plant chloroplast genomes as our reference, ~150,000 bases in size
A A T C G A T C G G A T C T A G A T C T C G A T A T A
DECOMPOSE EACH SEQUENCE INTO LIST OF OVERLAPPING “WORDS”
A A T C G A
A T C G A T
T C G A T C
C G A T C G
G A T C G G
A T C G G A
T C G G A T
. .
G A T A T A
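The decomposition into overlapping words can be sketched in a few lines. The word length of 6 is read off the slide's example, and the function name is hypothetical.

```python
def decompose(sequence, k=6):
    """Return all overlapping words of length k, in order of position."""
    sequence = sequence.replace(" ", "").upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

# The example sequence from the slide.
words = decompose("A A T C G A T C G G A T C T A G A T C T C G A T A T A")
print(words[:3], words[-1], len(words))
```

A sequence of length n yields n - k + 1 words, so a chloroplast genome of roughly 150,000 bases produces about as many words as it has bases.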
[Figure: word index. Each word (A A T C G A, A T C G A T, T C G A T C, C G A T C G, G A T C G G, A T C G G A, T C G G A T, ..., G A T A T A, G A T A T T, G A T A T C) is tagged with the reference genomes that contain it: E. purpurea_NC1234, E. angustifolia_NC1235, Hypericum perforatum_NC22871]
Echinacea purpurea, Echinacea angustifolia, Hypericum perforatum
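One way to picture the word index is a dictionary from each word to the set of reference labels that contain it. This is a minimal sketch under that assumption, not the implementation behind the slide. The labels are taken from the slide, while the function name and all three toy sequences are illustrative stand-ins, far shorter than real chloroplast genomes.

```python
from collections import defaultdict

def build_word_index(references, k=6):
    """Map each word of length k to the set of reference labels containing it."""
    index = defaultdict(set)
    for label, seq in references.items():
        seq = seq.replace(" ", "").upper()
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(label)
    return index

# Toy references: sequences are hypothetical stand-ins for the labelled genomes.
toy_references = {
    "E. purpurea_NC1234": "AATCGATCGGATCTAGATCTCGATATA",
    "E. angustifolia_NC1235": "AATCGATCGGATCTAGATCTCGATATT",
    "Hypericum perforatum_NC22871": "TTGCAATCGGATCTCGATATC",
}
index = build_word_index(toy_references)
print(sorted(index["ATCGGA"]))  # word present in all three toy references
print(sorted(index["GATATT"]))  # word present in only one toy reference
```

Words found in a single reference are the informative ones for telling species apart; words shared by every reference carry no discriminating signal.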
DNA from Sample
G G A T A C T A G C T C G C C T A C T T C A T A G C C T T A G T G T T T A C A T A C A T A C G C T T A
Sequence from Sample (WGS)
G G A T A C
G A T A C T
A T A C T A
T A C T A G
A C T A G C
T A G C T C
A G C T C G
C C T A C T
. . . . .
C G C T T A
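Once the sample sequence is broken into words, those words can be compared with the words of each reference. The sketch below scores each reference by the number of distinct sample words it contains. This counting rule is an assumption made for illustration, since the slides do not state how matches are scored. The function names, the reference labels and the sequences are hypothetical.

```python
from collections import Counter

def kmers(seq, k=6):
    """Return the set of distinct overlapping words of length k in seq."""
    seq = seq.replace(" ", "").upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def score_sample(sample_seq, references, k=6):
    """Count, per reference label, how many distinct sample words it contains."""
    sample_words = kmers(sample_seq, k)
    scores = Counter()
    for label, ref_seq in references.items():
        scores[label] = len(sample_words & kmers(ref_seq, k))
    return scores

# Toy data (hypothetical): the sample is a fragment of ref_A.
toy_refs = {
    "ref_A": "GGATACTAGCTCGCCTACTTCATAGCC",
    "ref_B": "GGATACTAGGTCGACTACTTGATAGCC",
}
sample = "ATACTAGCTCGCCTAC"
scores = score_sample(sample, toy_refs)
best = max(scores, key=scores.get)
print(scores, best)
```

With real whole-genome shotgun data the same idea would be applied across many reads, and sequencing errors would make some words fail to match, so a practical scheme would need a threshold or a statistical model on top of raw counts.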
[Figure: matrix of check marks (✓); axes labelled "Input Data" (1 to N) and "Words" (ATCATCATA C, ATCATCATA G, ATCATCTTA C, ATCATTTTA C, ATCCCTTTA C, ATCCCATTA C, ...)]