 
The Average Consensus Procedure: Combination of Weighted Trees Containing Identical or Overlapping Sets of Taxa
François-Joseph Lapointe, Université de Montréal, Montreal, Canada
Guy Cucumel, Université du Québec à Montréal, Montreal, Canada
Consensus Trees
The Problem: Suppose we have multiple phylogenies on overlapping leaf sets. How can we combine them effectively?
*Modified to show the overlapping leaf sets
Criteria
To understand the problem, we first need to decide which criteria matter:
• Tree Topology
• Branch Lengths
Intuitively, we want the best possible tree, i.e. the one that is closest to every input tree in both topology and branch lengths.
Notion of Tree Distance: Identical Leaf Set
Let's suppose two trees are defined on the same leaf set. How "close" are they to one another?
Let $S = \{1, 2, \ldots, n\}$ be the taxa, and let $T_1$ and $T_2$ be two trees on $S$. Define
\[
\Delta(T_1, T_2) := \sum_{i=1}^{n} \sum_{j=1}^{n} \bigl( d_1(i,j) - d_2(i,j) \bigr)^2 \tag{1}
\]
where $d_\ell(i,j)$ is the distance from $i$ to $j$ in $T_\ell$.
Note. It is important to normalize the branch lengths for this to make sense. Assume all branch lengths are in $[0, 1]$.
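The tree distance defined on this slide can be computed directly once each tree is represented by its matrix of leaf-to-leaf path lengths. The following is a minimal sketch under that assumption; the function name `tree_distance` and the two 3-taxon matrices are hypothetical and are not taken from the talk.

```python
def tree_distance(d1, d2):
    """Sum of squared differences between two path-length matrices.

    d1[i][j] and d2[i][j] hold the distance from taxon i to taxon j in each
    tree. The sum runs over all ordered pairs (i, j), as in the definition.
    """
    n = len(d1)
    if len(d2) != n or any(len(row) != n for row in d1 + d2):
        raise ValueError("both matrices must be n x n on the same leaf set")
    return sum((d1[i][j] - d2[i][j]) ** 2 for i in range(n) for j in range(n))


# Hypothetical path-length matrices, with branch lengths already in [0, 1].
d1 = [[0.0, 0.5, 1.0],
      [0.5, 0.0, 1.0],
      [1.0, 1.0, 0.0]]
d2 = [[0.0, 1.0, 1.0],
      [1.0, 0.0, 0.5],
      [1.0, 0.5, 0.0]]

delta = tree_distance(d1, d2)
print(delta)
```

Because the double sum covers ordered pairs, each unordered pair of taxa contributes twice; this only rescales the distance by a constant factor.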
Optimal Consensus Tree: Identical Leaf Set
What is the best tree? Given $k$ trees $T_1, T_2, \ldots, T_k$ on $S$, we want the tree $T_c$ that minimizes
\[
\sum_{\ell=1}^{k} \Delta(T_c, T_\ell) = \sum_{\ell=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{n} \bigl( d_c(i,j) - d_\ell(i,j) \bigr)^2 \tag{2}
\]
But how do we compute this?
Claim. Let $d(i,j)$ be the average of $d_\ell(i,j)$ over all trees $T_\ell$. Then to optimize $\sum_{\ell=1}^{k} \Delta(T_c, T_\ell)$ it suffices to optimize
\[
\sum_{i=1}^{n} \sum_{j=1}^{n} \bigl( d_c(i,j) - d(i,j) \bigr)^2
\]
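Proof of the claim. Expand the square and use $\sum_{\ell=1}^{k} d_\ell(i,j) = k \cdot d(i,j)$:
\[
\begin{aligned}
\sum_{\ell=1}^{k} \Delta(T_c, T_\ell)
&= \sum_{\ell=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{n} \bigl( d_c(i,j) - d_\ell(i,j) \bigr)^2 \\
&= \sum_{\ell=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{n} \Bigl[ d_c(i,j)^2 - 2\, d_c(i,j)\, d_\ell(i,j) + d_\ell(i,j)^2 \Bigr] \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} \Bigl[ k \cdot d_c(i,j)^2 - 2\, d_c(i,j) \sum_{\ell} d_\ell(i,j) + \sum_{\ell} d_\ell(i,j)^2 \Bigr] \\
&= k \cdot \sum_{i=1}^{n} \sum_{j=1}^{n} \Bigl[ d_c(i,j)^2 - 2\, d_c(i,j)\, d(i,j) + d(i,j)^2 - d(i,j)^2 + \frac{1}{k} \sum_{\ell} d_\ell(i,j)^2 \Bigr] \\
&= k \cdot \sum_{i=1}^{n} \sum_{j=1}^{n} \Bigl[ \bigl( d_c(i,j) - d(i,j) \bigr)^2 - g_{ij}(T_1, \ldots, T_k) \Bigr] \\
&= k \cdot \sum_{i=1}^{n} \sum_{j=1}^{n} \bigl( d_c(i,j) - d(i,j) \bigr)^2 - h(T_1, \ldots, T_k)
\end{aligned}
\]
Here $g_{ij}(T_1, \ldots, T_k) = d(i,j)^2 - \frac{1}{k} \sum_{\ell} d_\ell(i,j)^2$ and $h(T_1, \ldots, T_k) = k \sum_{i=1}^{n} \sum_{j=1}^{n} g_{ij}(T_1, \ldots, T_k)$ depend only on the input trees, not on $T_c$, and $k$ is a positive constant.
So optimizing $\sum_{\ell=1}^{k} \Delta(T_c, T_\ell)$ is the same as optimizing $\sum_{i=1}^{n} \sum_{j=1}^{n} \bigl( d_c(i,j) - d(i,j) \bigr)^2$.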
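The claim can also be checked numerically: the total squared distance from a candidate to the $k$ inputs equals $k$ times its squared distance to the average, plus a term that does not involve the candidate. The sketch below is an illustration only. It uses random symmetric matrices rather than true tree metrics (the identity is purely algebraic, so it holds either way), and all function names are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Sum over all ordered pairs (i, j) of (a[i][j] - b[i][j])^2."""
    n = len(a)
    return sum((a[i][j] - b[i][j]) ** 2 for i in range(n) for j in range(n))


def average_matrix(mats):
    """Entrywise average of k matrices of equal size."""
    k, n = len(mats), len(mats[0])
    return [[sum(m[i][j] for m in mats) / k for j in range(n)] for i in range(n)]


def random_symmetric(n, rng):
    """Random symmetric matrix with zero diagonal and entries in [0, 1)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = rng.random()
    return m


rng = random.Random(0)
inputs = [random_symmetric(5, rng) for _ in range(4)]   # stand-ins for d_1..d_k
candidate = random_symmetric(5, rng)                     # stand-in for d_c
avg = average_matrix(inputs)
k = len(inputs)

total = sum(sq_dist(candidate, m) for m in inputs)
spread = sum(sq_dist(m, avg) for m in inputs)  # does not depend on the candidate

assert math.isclose(total, k * sq_dist(candidate, avg) + spread)
```

Since `spread` is fixed by the inputs, ranking candidates by `total` and by `sq_dist(candidate, avg)` gives the same order.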
So we just want the tree whose leaf distances are closest to $d(i,j)$ in the least-squares metric. This can be done via a heuristic search using e.g. PHYLIP.

Nonidentical Leaf Sets
What if there is missing data for $d(i,j)$? We take the weighted average
\[
d^*(i,j) = \frac{1}{N(i,j)} \sum_{\ell=1}^{k} d_\ell(i,j)
\]
where $N(i,j)$ is the number of times $i$ and $j$ appear in the same tree, and the sum runs only over those trees.
What if $N(i,j) = 0$? We can fill in the entries, assuming additivity, using the four-point condition [Landry, Lapointe, Kirsch, '96].
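The averaging step for overlapping leaf sets can be sketched as follows. This is an illustration under stated assumptions: each tree is given as a hypothetical dictionary mapping an unordered pair of taxa to its path-length distance, and pairs that never co-occur (where $N(i,j) = 0$) are simply left as `None`. The four-point estimation of those entries is not implemented here.

```python
from itertools import combinations


def average_overlapping(taxa, trees):
    """Average pairwise distances over the trees that contain both taxa.

    trees: list of dicts mapping frozenset({i, j}) -> distance in that tree.
    Returns (avg, counts), where counts[pair] is N(i, j) and avg[pair] is
    None whenever N(i, j) == 0.
    """
    avg, counts = {}, {}
    for i, j in combinations(taxa, 2):
        pair = frozenset((i, j))
        values = [tree[pair] for tree in trees if pair in tree]
        counts[pair] = len(values)
        avg[pair] = sum(values) / len(values) if values else None
    return avg, counts


# Two hypothetical trees on overlapping leaf sets {a, b, c} and {b, c, d}.
tree_a = {frozenset("ab"): 0.25, frozenset("ac"): 0.5, frozenset("bc"): 0.25}
tree_b = {frozenset("bc"): 0.75, frozenset("bd"): 0.5, frozenset("cd"): 1.0}

avg, counts = average_overlapping("abcd", [tree_a, tree_b])
missing = [sorted(pair) for pair, n in counts.items() if n == 0]
print(missing)
```

In this example only the pair (a, d) never appears in the same tree, so it is the one entry that would still need to be estimated before a least-squares tree search.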
Analysis
The authors ran their method on the two trees from earlier to get the average consensus supertree. The resulting tree is not too different from the input trees, and it correctly predicts a clade that was conjectured by the authors who produced the original trees.
Questions?