Supertree Analysis of the Plant Family Fabaceae
Tiffany Morris Advisor: Martin Wojciechowski June 2004-December 2004
Project Goal
To obtain a Supertree for the plant family Fabaceae utilizing phylogenetic trees found in previously
– 750 genera
– 18,000 species
– 3rd largest family, cosmopolitan in distribution
– Many of these species are agriculturally and economically important
Given the basic difficulties with inferring trees of relatively few taxa, how do we infer BIG phylogenies, with hundreds or thousands of taxa?
Two basic philosophical approaches:
“total evidence” approach requires combined data to be compatible
“taxonomic congruence” requires that studies possess same set of taxa
Some existing options
advantage: information retained in individual characters is useful
disadvantages:
– gathering data to fill in gaps between taxa requires significant expense
– some kinds of data cannot be included
sequence databases
estimates (source trees) sharing some taxa but not necessarily all by combining trees rather than the data (Bininda-Emonds, 2004)
[Figure: data matrix with species 1 to m as rows and clusters (“genes” or other homologs) A, B, ..., n as columns.]
Supertree construction
Sequence concatenation
The sparse matrix of sequence and phylogenetic databases (i.e., what we have NOW in databases)
Genbank release 127.0 (June 2003): 108,813 proteins from 11,5587 taxa (plants)
# taxa x sequence clusters:
– 62 genes by 6 species
– 3 genes by 65 species
Data from Sanderson et al. (2003)
Supertree terminology
[Figure: source tree 1 (taxa A, D, E, F), source tree 2 (taxa D, B, C, E), and two strict supertrees on taxa A to F (strict supertree 1, strict supertree 2).]
*From Sanderson et al. (1998)
Two compatible source trees, together with two strict supertrees that are consistent with them despite disagreeing with each other. Taxa found on only one source tree are unique; taxa found on two or more are
found among the source trees is a supertree.
+
conflict among source trees (esp. w/ large numbers)
data matrix representation, analysis using parsimony
than MRP
Taxa  Characters
      1 2 3 4
A     0 0 0 0
B     1 0 0 1
C     1 1 1 1
D     1 1 1 0
[Figure: three trees for taxa A, B, C, and D with the character changes mapped on the branches; tree lengths are 5 steps, 7 steps, and 6 steps.]
This data matrix contains character conflict. For example, character 4 suggests {B,C} is a monophyletic group, but characters 2 and 3 suggest {C,D} is monophyletic. They cannot both be true. How do we reconstruct phylogeny when the characters do not all agree? Phylogenetic analysis using parsimony is a procedure by which individual hypotheses of synapomorphy (shared, derived characters) are “tested” against one another for their
state changes (sum of # of changes or length=5) is considered the most parsimonious of the three possible solutions.
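The tree lengths for the four-taxon matrix can be checked with a short sketch. It assumes Fitch's counting method for unordered characters and writes each of the three possible four-taxon topologies as a nested tuple with an arbitrary root (rooting does not change the count for unordered characters). The function names and tree labels are illustrative and do not come from the slides.

```python
def fitch(tree, states):
    """Return (possible state set, number of changes) for one character."""
    if isinstance(tree, str):                      # a leaf: its observed state
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch(left, states)
    rset, rsteps = fitch(right, states)
    common = lset & rset
    if common:                                     # children agree: no change
        return common, lsteps + rsteps
    return lset | rset, lsteps + rsteps + 1        # children disagree: one change


def tree_length(tree, matrix):
    """Sum the Fitch change counts over all characters in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch(tree, column)[1]
    return total


# The four-taxon, four-character matrix from the slide.
matrix = {"A": "0000", "B": "1001", "C": "1111", "D": "1110"}

# The three possible unrooted topologies, named by their internal split.
trees = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}

lengths = {name: tree_length(tree, matrix) for name, tree in trees.items()}
print(lengths)  # {'AB|CD': 5, 'AC|BD': 7, 'AD|BC': 6}
```

The tree that groups C with D needs the fewest changes (5), because characters 2 and 3 both support that grouping while only character 4 supports grouping B with C.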
Parsimony
[Figure: source trees with leaf labels A to G.]
A  1 1 1 . . . 0 0
B  1 1 1 . . . 1 0
C  0 1 1 . . . ? ?
D  0 0 1 . . . ? ?
E  0 0 0 . . . ? ?
F  ? ? ? . . . 1 1
G  ? ? ? . . . 1 1
In MRP a new matrix is constructed whose characters refer to the topologies of the source
have been proposed for determining which taxa are scored as ‘0’, ‘1’, or ‘?’. Baum and Ragan scheme shown below: Score ‘1’ for each taxon in clade, a ‘0’ for each taxon not in a clade, and a ‘?’ for taxa not present in that source tree. The characters from all source trees are then combined into one matrix and analyzed with
rooted with hypothetical ancestor having states with all ‘0’s.
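A minimal sketch of the Baum and Ragan coding described above, assuming rooted source trees written as nested tuples. The two example trees, the function names, and the row label "ancestor" are hypothetical. The clade at the root of each source tree is skipped here because it would score '1' for every taxon in that tree.

```python
def leaves(tree):
    """Set of taxon names under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree, is_root=True):
    """Yield the leaf set of every internal node below the root."""
    if isinstance(tree, str):
        return
    if not is_root:
        yield leaves(tree)
    for child in tree:
        yield from clades(child, is_root=False)


def mrp_matrix(source_trees):
    """One character per clade: '1' in clade, '0' not in clade, '?' not in tree."""
    all_taxa = sorted(set().union(*(leaves(t) for t in source_trees)))
    rows = {taxon: "" for taxon in all_taxa}
    for tree in source_trees:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon not in present:
                    rows[taxon] += "?"
                elif taxon in clade:
                    rows[taxon] += "1"
                else:
                    rows[taxon] += "0"
    n_chars = len(next(iter(rows.values())))
    rows["ancestor"] = "0" * n_chars       # all-zero hypothetical ancestor for rooting
    return rows


# Two hypothetical source trees that share taxa B and C.
tree_1 = ((("A", "B"), "C"), "D")
tree_2 = ((("B", "E"), "C"), "F")

for taxon, row in mrp_matrix([tree_1, tree_2]).items():
    print(f"{taxon:10s}{row}")
```

Taxa missing from a source tree (A and D in the second tree, E and F in the first) receive '?' for every character derived from that tree, which is what makes the combined matrix sparse.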
– Keywords: legumes, Fabaceae, systematics
– Also searched for authors that have published in this field before
– Gene sequences, non-coding DNA sequences, Morphology, binary characters (loss of chloroplast IR)
(from Sanderson 2002)
Example of a ‘tree-graph’ of phylogenies, showing taxonomic overlap among source trees.
– Citation
– Main Taxon
– Number of Taxa
– Outgroup
– Character (sequence, morphological)
– Phylogenetic Method (parsimony)
– Support Value
– Genbank/Treebase
– Trees Presented
– Independence
– PDF file of paper
– Misspellings, accession numbers
– Multiple accessions for the same species
– Multiple names for the same organism – Have not dealt with this issue yet
http://darwin.zoology.gla.ac.uk/cgi-bin/supertree.pl
http://genome.cs.iastate.edu/supertree/userdata_analysis/userdata_analysis.html
[Figure: plastid matK gene phylogeny, Bayesian analysis, 330 taxa; scale bar 0.005 changes. Labeled groups: Leguminosae, Caesalpinioids, Mimosoids, Papilionoids, Genistoids s.l., Dalbergioids s.l., Millettioids, Robinioids, IRLC, Hologalegina, Canavanine. Labeled genera include Cercis, Ceratonia, Dinizia, Amherstia, Pentaclethra, Calliandra, Albizia, Acacia, Swartzia, Lupinus, Arachis, Dalbergia, Glycine, Phaseolus, Vigna, Robinia, Lotus japonicus, Astragalus, Pisum, Vicia faba, Medicago, Trifolium, and Wisteria.]
Albizia julibrissin Durazz.
Wojciechowski M.F. 34/330 taxa
Hughes C.E 72 taxa
Miller J.T 60 taxa
Clarke H.D 26 taxa
Cercidium floridum Torr.
Wojciechowski M.F. 33/330 taxa
Haston E.M. 28 taxa
Herendeen P.S. 220 taxa
Schnabel A. 13 taxa
Simpson B.B 81 taxa
Davis C.C 7 taxa
Brouat C. 13 taxa
Schnabel A. 13 taxa
Erythrina L.
Wojciechowski M.F. 262/330 taxa
Allan G.J 52 taxa
McMahon M. 240 taxa
Pardo C. 78 taxa
Ree R. 15 taxa
Ainoche A. 34 taxa
Crisp M.D. 66 taxa
Dong T.X.X 10 taxa
Kang Y. 56 taxa
Lavin M. 12 taxa
109 taxa
84 taxa
Badr A. 37 taxa
57 taxa
23 taxa
42 taxa
12 taxa
50 taxa
77 taxa
57 taxa
61 taxa
95 taxa
Pennington R.T. 122 taxa
Allan G.J. 42 taxa
Crisp M.D. 99 taxa
Murphy D.J. 19 taxa
Ainoche A-K 49 taxa
Delgado-Salinas A. 132 taxa
Wagstaff S.J. 39 taxa
Wojciechowski M.F. 115 taxa
Asmussen C.B. 42 taxa
Bena G. 13 taxa
Downie S.R. 62 taxa
Fennel S.R. 10 taxa
Lavin M. 34 taxa
van Oss H. 8 taxa
Sanderson M.J. 41 taxa
Pennington R.T 27 taxa
Liston A. 51 taxa
Bruneau A. 66 taxa
Doyle J.J. 53 taxa
Sanderson M.J. 33 taxa
Liston A. 64 taxa
Papilionoid Supermatrix 1502 taxa, 1683 characters
Vauquelinia_californica 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Polygala_californica 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Suriana_maritima 1100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Quillaja_saponaria 1100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Bauhinia_tomentosa 1011100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Cercis_gigantea 1011110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Cercis_occidentalis 1011110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Cercis_canadensis 1011110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Ceratonia_siliqua 1010000000000011100000000000000000000000000000000000000000000000000000000000000000000000000000000000
Gymnocladus_chinensis 1010000000000011100000000000000000000000000000000010000000000000000000000000000000000000000000000000
Gleditsia_sinensis 1010000000000011100000000000000000000000000000000011000000000000000000000000000000000000000000000000
Gleditsia_triacanthos 1010000000000011100000000000000000000000000000000011000000000000000000000000000000000000000000000000
Arcoa_gonavensis 1010000000000011100000000000000000000000000000000000000000000000000000000000000000000000000000000000
Colophospermum_mopane 1011001100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Prioria_copaifera 1011001100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Hymenaea_courbaril 1011001011000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Tessmannia_lescrauwaetii 1011001011000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Brownea_sp 1011001010110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Oddoniodendron_micranthum 1011001010111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Berlinia_congolensis 1011001010111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Brachystegia_spiciformis 1011001010111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Cynometra_mannii 1011001010110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Amherstia_nobilis 1011001010100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Petalostylis_labicheoides 1010000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Dialium_guianensis 1010000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Erythrostemon_gilliesii 1010000000000011111000000000000000000000000000000000000000000000000000000000000000000000000000000000
Caesalpinia_andamanica 1010000000000011111100000000000000000000000000000000000000000000000000000000000000000000000000000000
Caesalpinia_pulcherrima ????????????????????????????????????????????????????????????????????????????????????????????????????
Haematoxylum_brasiletto 1010000000000011111100000000000000000000000000000000000000000000000000000000000000000000000000000000
Chamaecrista_fasciculata 1010000000000011110010000000000000000000000000000000000000000000000000000000000000000000000000000000
Senna_candolleana 1010000000000011110011000000000000000000000000000000000000000000000000000000000000000000000000000000
Senna_covesii 1010000000000011110011000000000000000000000000000000000000000000000000000000000000000000000000000000
Peltophorum_dubium 1010000000000011110000111000000000000000000000000000000000000000000000000000000000000000000000000000
Cercidium_floridum 1010000000000011110000111110000000000000000000000000000000000000000000000000000000000000000000000000
Parkinsonia_aculeata 1010000000000011110000111110000000000000000000000000000000000000000000000000000000000000000000000000
Conzattia_multiflora 1010000000000011110000111100000000000000000000000000000000000000000000000000000000000000000000000000
Poeppigia_procera 1010000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Dinizia_excelsa 1010000000000011110000110000000000000000000000000000000000000000000000000000000000000000000000000000
Inga_punctata 1010000000000011110000100001111111111000000000000000000000000000000000000000000000000000000000000000
Samanea_saman 1010000000000011110000100001111111110000000000000000000000000000000000000000000000000000000000000000
Enterolobium_cyclocarpum 1010000000000011110000100001111111110000000000000000000000000000000000000000000000000000000000000000
Enterolobium_contortisiliquum ????????????????????????????????????????????????????????????????????????????????????????????????????
Lysiloma_watsonii 1010000000000011110000100001111111100100000000000000000000000000000000000000000000000000000000000000
Lysiloma_acapulcensis ????????????????????????????????????????????????????????????????????????????????????????????????????
Lysiloma_tergemina ????????????????????????????????????????????????????????????????????????????????????????????????????
Havardia_pallens ????????????????????????????????????????????????????????????????????????????????????????????????????
Havardia_albicans ????????????????????????????????????????????????????????????????????????????????????????????????????
– Phylogenetic Analysis Using Parsimony
– storing 5000 trees maximum
– holding five trees at each step
– using the TBR (tree bisection-reconnection) branch-swapping algorithm
[Figure: trees on taxa A to E built up over three steps (Step 1, Step 2, Step 3).]
Heuristic methods: step 1, making initial tree, taxon addition sequence
Taxa are always added sequentially to make a tree in this phase. The simplest order of addition is known as “ASIS” addition; here taxa are added in the order in which they appear in the matrix. The first three taxa are joined into an unrooted three-taxon tree, then the fourth taxon in the matrix is added. It can be added in one of three places, so the length of the tree is determined for each possibility and the placement that is optimal at that point is selected. Next, the fifth taxon is added, and so on, until a complete tree is built. Other addition sequences implemented in software such as PAUP* include RANDOM (random order addition) and CLOSEST (which chooses the next taxon to be added by finding the one that would add the fewest steps to the new tree).
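The ASIS procedure can be sketched in a few lines. This is an illustration under stated assumptions rather than PAUP*'s implementation: trees are nested tuples with an arbitrary root, tree length is counted with Fitch's method, ties go to the first placement tried, and only one tree is held at each step. All function names are hypothetical.

```python
def fitch(tree, states):
    """Return (possible state set, number of changes) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (lset, lsteps), (rset, rsteps) = fitch(tree[0], states), fitch(tree[1], states)
    if lset & rset:
        return lset & rset, lsteps + rsteps
    return lset | rset, lsteps + rsteps + 1


def tree_length(tree, matrix):
    """Total number of changes over all characters (taxa not yet in the tree are ignored)."""
    n_chars = len(next(iter(matrix.values())))
    return sum(fitch(tree, {t: row[i] for t, row in matrix.items()})[1]
               for i in range(n_chars))


def below(subtree, taxon):
    """Yield copies of subtree with taxon inserted on each branch beneath its top node."""
    if isinstance(subtree, str):
        return
    left, right = subtree
    yield ((taxon, left), right)
    for new_left in below(left, taxon):
        yield (new_left, right)
    yield (left, (taxon, right))
    for new_right in below(right, taxon):
        yield (left, new_right)


def insertions(tree, taxon):
    """Yield one tree per branch of the unrooted tree (2n - 3 placements for n leaves)."""
    left, right = tree
    yield (taxon, tree)                    # the branch that carries the arbitrary root
    for new_left in below(left, taxon):
        yield (new_left, right)
    for new_right in below(right, taxon):
        yield (left, new_right)


def stepwise_addition(matrix, order=None):
    """Add taxa one at a time, keeping the shortest placement at each step."""
    taxa = list(order or matrix)           # ASIS: the order of rows in the matrix
    tree = (taxa[0], (taxa[1], taxa[2]))   # the only three-taxon tree
    for taxon in taxa[3:]:
        tree = min(insertions(tree, taxon), key=lambda t: tree_length(t, matrix))
    return tree


matrix = {"A": "0000", "B": "1001", "C": "1111", "D": "1110"}
best = stepwise_addition(matrix)
print(best, tree_length(best, matrix))     # ('A', ('B', ('D', 'C'))) 5
```

With the four-taxon matrix shown earlier, the fourth taxon D has three possible placements, of lengths 6, 7, and 5; the placement next to C is kept. A RANDOM addition sequence could be imitated by passing a shuffled list as `order`.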
Heuristic methods: step 2, branch swapping
[Figure: a tree on taxa A to G bisected into two subtrees and then reconnected.]
Branch swapping by tree bisection and reconnection (TBR). The tree is first bisected along a branch, yielding two disjoint subtrees. The subtrees are then reconnected by joining a pair of branches, one from each subtree, and all possible bisections and reconnections are evaluated. If a shorter tree is found, it is saved and branch swapping starts again from that tree, continuing until no shorter tree is found.
(after Swofford et al. 1996)
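The bisection and reconnection step can be sketched as an enumeration of every TBR rearrangement of an unrooted binary tree. This is a sketch under stated assumptions, not the routine used in PAUP*: the tree is stored as an adjacency map, leaves are strings, each resulting topology is summarized by its set of non-trivial leaf bipartitions, and all function names and the node names n1 to n3 are hypothetical. Scoring each neighbor by parsimony is left out.

```python
from itertools import product


def make_adj(edge_list):
    """Build an adjacency map {node: set of neighbours} from (u, v) pairs."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def copy_adj(adj):
    return {node: set(nbrs) for node, nbrs in adj.items()}


def component(adj, start):
    """All nodes reachable from start."""
    seen, stack = {start}, [start]
    while stack:
        for nbr in adj[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return seen


def splits(adj, leaves):
    """Topology as a set of non-trivial leaf bipartitions (internal names ignored)."""
    ref, found = min(leaves), set()
    for u in adj:
        for v in adj[u]:
            cut = copy_adj(adj)
            cut[u].discard(v)
            cut[v].discard(u)
            side = component(cut, u) & leaves
            if ref in side:                         # always record the side without ref
                side = leaves - side
            if 1 < len(side) < len(leaves) - 1:
                found.add(frozenset(side))
    return frozenset(found)


def attach(adj, place):
    """Return the reconnection point: a lone leaf, or a new node splitting edge `place`."""
    if len(place) == 1:
        return next(iter(place))
    a, b = place
    new = object()                                  # fresh, unnamed internal node
    adj[a].discard(b)
    adj[b].discard(a)
    adj[new] = {a, b}
    adj[a].add(new)
    adj[b].add(new)
    return new


def tbr_neighbors(adj):
    """Set of distinct topologies one TBR rearrangement away from adj."""
    leaves = frozenset(n for n in adj if len(adj[n]) == 1)
    neighbors = set()
    for edge in {frozenset((u, v)) for u in adj for v in adj[u]}:
        u, v = edge
        cut = copy_adj(adj)                         # bisect along the branch u-v
        cut[u].discard(v)
        cut[v].discard(u)
        parts = [component(cut, u), component(cut, v)]
        for end in (u, v):
            if len(cut[end]) == 2:                  # internal endpoint: suppress it
                a, b = cut.pop(end)
                cut[a].discard(end)
                cut[b].discard(end)
                cut[a].add(b)
                cut[b].add(a)
        places = []
        for part in parts:
            part &= set(cut)                        # drop a suppressed endpoint
            if len(part) == 1:
                places.append([frozenset(part)])    # subtree is a single leaf
            else:
                places.append({frozenset((a, b)) for a in part for b in cut[a]})
        for p1, p2 in product(*places):             # every pair of branches
            new = copy_adj(cut)
            x, y = attach(new, p1), attach(new, p2)
            new[x].add(y)
            new[y].add(x)
            neighbors.add(splits(new, leaves))
    neighbors.discard(splits(adj, leaves))          # the starting tree is not a neighbor
    return neighbors


four_taxa = make_adj([("A", "n1"), ("B", "n1"), ("n1", "n2"), ("C", "n2"), ("D", "n2")])
print(len(tbr_neighbors(four_taxa)))                # 2: the other two four-taxon trees
```

In a full heuristic search each neighbor would be scored, and swapping would restart from any shorter tree found. Comparing topologies by their bipartitions, rather than by node names, keeps duplicate rearrangements from being counted twice.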
Optimization methods
On a landscape of trees, random addition sequences (tree-building) are used to find multiple optima, or ‘tree islands’. Branch swapping moves the search nearer to the top of a local optimum. New random addition sequences may find additional local
[Figure: landscape of trees with axis labels “Shortest trees” and “Trees (solutions)”; annotations mark the end of each of three random addition sequences and a branch-swapping step.]
[Figure: tree labeled “90% Majority Rule TBR/5”. Leaf labels include Vauquelinia californica, Polygala californica, Suriana maritima, Quillaja saponaria, and species of Cercis, Leucaena, Desmanthus, Neptunia, Dichrostachys, Gagnebina, Alantsilodendron, Prosopis, Calliandra, and Acacia.]
Ecology and Evolution 19:315-322.
supertree construction. Chapter 12 in Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life. Computational Biology 3:267-280.
sequence databases. Molecular Biology and Evolution 20: 1036-1042.
Phylogenetic Inference. In Molecular systematics, 2nd edition, chap. 5, pp. 407-514. Sinauer and Associates, Sunderland, Massachusetts