Methodological Challenges in the Pursuit of the Tree of Life
Christophe Dessimoz
Reviews in Computational Biology
February 13th, 2013
Outline
Introduction
Mature methods: supermatrix, supertree
Emerging methods: species-tree
Wikipedia
16S rRNA was used by Woese (1987) to group early life forms into three kingdoms
Snel et al. Genome trees and the nature of genome evolution. Annu Rev Microbiol (2005) vol. 59 pp. 191-209
Figure: evolutionary events in a gene family (duplication, gene loss, speciation) and the paralog pairs they define
Altenhoff and Dessimoz, Methods in Molecular Biology 2012
Figure: published phylogenies plotted by # of species against # of marker genes (from a few genes up to the full genome). Values legible in the plot: 13 genes; 578 species, 31 genes; 2684 species, 8 genes; also 73,060, 191, 150, 77 and >1000. Annotations: "i.e. 50% bootstrap support!", "i.e. 95% bootstrap support!"
Studies shown: Goloboff et al. 2009; Edwards et al. 2010; Ciccarelli 2006; Wu & Eisen 2008; Dunn et al. 2008; Pisani 2007; Hejnol et al. 2009; Smith et al. 2011; Jeffroy et al. 2006; McInerney et al. 2008; Edwards 2009; Philippe et al. 2011
All photos from Wikipedia
Dunn et al. Nature 2008
Taxa (in figure order): Sponges (Porifera); Bilateria; Cnidaria (corals, jellyfish); Comb jellies (Ctenophora). Support values: 80-90%, 0-70%
Schierwater et al. PLoS Biol 2009
Taxa (in figure order): Cnidaria (corals, jellyfish); Comb jellies (Ctenophora); Sponges (Porifera); Bilateria. Support values: 53%, 27%
Philippe et al. Current Biol 2009
Taxa (in figure order): Sponges (Porifera); Bilateria; Cnidaria (corals, jellyfish); Comb jellies (Ctenophora). Support values: 62-96%, 78-99%
Same argument in Philippe et al. 2011
Most relevant review: Anderson et al. Methods in Molecular Biology 2012
Yang 1993, Yang 1994
Lartillot & Philippe 2004
Galtier 2001, Penny 2001
2008
Dufayard et al., Bioinformatics, 2005
G (gene tree): G1 Homo sapiens, G4 Pan troglodytes, G3 Rattus norvegicus, G2 Mus musculus
S (species tree): Homo sapiens, Pan troglodytes, Rattus norvegicus, Mus musculus
R (reconciled tree), below the duplication node:
  copy 1: G1 Homo sapiens, Loss Pan troglodytes, G2 Mus musculus, Loss Rattus norvegicus
  copy 2: Loss Homo sapiens, G4 Pan troglodytes, G3 Rattus norvegicus, Loss Mus musculus
Reviewed in Altenhoff & Dessimoz, Methods in Molecular Biology 2012
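The reconciliation in the figure can be reproduced with the classic LCA-mapping rule: a gene-tree node is labelled a duplication when it maps to the same species-tree node as one of its children. The sketch below is a minimal illustration, not the method of any tool cited in these slides. The gene-tree topology ((G1, G2), (G3, G4)) is an assumption read off the figure labels, and the helper names `leaves`, `lca`, and `reconcile` are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple binary tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def lca(species_tree, species_set):
    """Smallest clade of species_tree containing species_set, as a leaf set."""
    node = species_tree
    while not isinstance(node, str):
        for child in node:
            if species_set <= leaves(child):
                node = child  # descend into the child that still covers the set
                break
        else:
            break  # no single child covers the set: node is the LCA
    return leaves(node)


def reconcile(gene_tree, species_tree, gene_to_species):
    """Label each internal gene-tree node as 'duplication' or 'speciation'."""
    events = []

    def visit(node):
        if isinstance(node, str):
            return frozenset([gene_to_species[node]])
        mapped_left, mapped_right = visit(node[0]), visit(node[1])
        mapped = lca(species_tree, mapped_left | mapped_right)
        # LCA rule: duplication if the node maps where one of its children maps
        kind = "duplication" if mapped in (mapped_left, mapped_right) else "speciation"
        events.append((sorted(leaves(node)), kind))
        return mapped

    visit(gene_tree)
    return events


# Assumed gene-tree topology; the species tree groups primates and rodents.
gene_tree = (("G1", "G2"), ("G3", "G4"))
species_tree = (("Homo sapiens", "Pan troglodytes"),
                ("Mus musculus", "Rattus norvegicus"))
gene_to_species = {"G1": "Homo sapiens", "G2": "Mus musculus",
                   "G3": "Rattus norvegicus", "G4": "Pan troglodytes"}

events = reconcile(gene_tree, species_tree, gene_to_species)
for genes, kind in events:
    print(genes, kind)
```

Under these assumptions the root is labelled a duplication and both subtrees are speciations, which matches the single duplication node and four losses drawn in R. Counting the losses explicitly is left out to keep the sketch short.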
Likelihood: pick the reconciliation(s) that maximise the probability of the observed data (i.e. gene/species trees) under a particular model
Rannala & Yang, Annu Rev Genomics Hum Genet 2008
Sequence alignments Model Parameters Gene Trees
time of speciation
time to most recent common ancestor
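The gap between the time of speciation and the time to the most recent common ancestor is what produces incomplete lineage sorting. As a hedged illustration (a standard three-species coalescent result, not taken from these slides): for a species tree ((A, B), C) whose internal branch has length T in coalescent units, the gene tree matches the species tree with probability 1 - (2/3)·exp(-T). The function names and parameter values below are hypothetical.

```python
import math
import random


def p_concordant(T):
    """Probability that a gene tree matches species tree ((A,B),C).

    T is the internal branch length in coalescent units (assumed model:
    neutral multispecies coalescent, one sampled lineage per species).
    """
    return 1.0 - (2.0 / 3.0) * math.exp(-T)


def simulate_concordance(T, n_loci=100_000, seed=0):
    """Monte Carlo check of p_concordant over independent loci."""
    rng = random.Random(seed)
    concordant = 0
    for _ in range(n_loci):
        # The A and B lineages coalesce at rate 1 in the ancestral AB population.
        if rng.expovariate(1.0) < T:
            concordant += 1
        # Otherwise three lineages reach the root population, and the first
        # pair to coalesce is (A, B) with probability 1/3.
        elif rng.randrange(3) == 0:
            concordant += 1
    return concordant / n_loci


if __name__ == "__main__":
    for T in (0.1, 1.0, 3.0):
        print(T, round(p_concordant(T), 3), round(simulate_concordance(T), 3))
```

Short internal branches (small T) push the concordance probability towards 1/3, which is why individual gene trees can disagree with the species tree even without any inference error.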
IDEA: instead of fixing species tree, treat as parameter!
locus
also see review of Liu et al 2009
(summary statistics) (parsimony) (parsimony)
inference for each gene (relatively efficient!)
trees modeled as Dirichlet process
Tree of gene i All Sequence alignments Gene-to-tree map assumption
among genes
http://www.cs.princeton.edu/courses/archive/fall07/cos597C/scribe/20070921.pdf
Leaché & Rannala, Syst Biol 2010
population size * mutation rate; tree length
Figure: difference between gene and species tree (baseline), with the favourable direction marked "Better". Conditions: incomplete lineage sorting (ILS) only; horizontal gene transfers + ILS. Approaches: mechanistic (ILS); empirical
“… much more conservative than the posterior probabilities in the topology estimated from the concatenated alignment”
“… does not drastically change our overall view of rice phylogeny, but it does give a more varied picture of the support across the tree.”
“… incongruence (the α parameter)”
“… reached stationarity after 1.6 billion iterations.” (2 months on 96 CPU cores)
histories
results solely from first principles
(“The largest data set yet tested with these species tree methods is yeast, with 106 loci in 8 species” Cranston 2009)