
Balance indices for phylogenetic trees under well-known probability models

Universitat de les Illes Balears, Tomás M. Coronado


  2. Balance indices and probability models. Tomás M. Coronado. November 10, 2020.
     The Yule model. The Yule model explicitly assumes that, at each speciation event, all the current species are equally likely to speciate. The Yule model is
     • Markovian with $q(k, n-k) = \frac{1}{n-1}$ [Semple and Steel 2003].
     • Shape invariant by construction.
     • Sampling consistent [Ford 2005].

  7. The Uniform model. Recursive model of tree growth for bifurcating trees:
     1. Start with a single node.
     2. At every step m, add a new leaf to an arc chosen uniformly at random among all the arcs of the current tree.
     3. Repeat until the number of leaves n is reached.
     4. Label the tree uniformly.

  11. The Uniform model. Equivalently: choose a tree uniformly at random from the set of all bifurcating phylogenetic trees with n leaves. It therefore assumes that all the joint evolutionary histories are equally likely.
     • There are $(2n-3)!!$ trees with n leaves [Schröder 1870].
     • Therefore, each tree has probability $\frac{1}{(2n-3)!!}$.
     As a result, the Uniform model is
     • Markovian with $q(k, n-k) = C_{k,n-k} = \frac{1}{2}\binom{n}{k}\frac{(2k-3)!!\,(2(n-k)-3)!!}{(2n-3)!!}$ [Semple and Steel 2003], where $n!! = n(n-2)(n-4)\cdots 1$ if n is odd and $n!! = n(n-2)(n-4)\cdots 2$ if n is even, with the convention $(-1)!! = 1$.
     • Shape invariant by construction.
     • Sampling consistent [Ford 2005].
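
The count of trees and the two split probabilities seen so far can be checked numerically. The following Python sketch evaluates them exactly with rational arithmetic. The function names (`double_factorial`, `q_yule`, `q_uniform`) are illustrative and do not come from the slides, and the convention (-1)!! = 1 is an assumption needed for the splits k = 1 and k = n - 1.

```python
from fractions import Fraction
from math import comb

def double_factorial(n):
    """n!! = n(n-2)(n-4)..., with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def q_yule(k, n):
    """Yule model: probability that the root splits n leaves into (k, n-k)."""
    return Fraction(1, n - 1)

def q_uniform(k, n):
    """Uniform model: C_{k,n-k} = (1/2) binom(n,k) (2k-3)!! (2(n-k)-3)!! / (2n-3)!!."""
    numerator = comb(n, k) * double_factorial(2 * k - 3) * double_factorial(2 * (n - k) - 3)
    return Fraction(numerator, 2 * double_factorial(2 * n - 3))

n = 6
print(double_factorial(2 * n - 3))                 # number of bifurcating trees with 6 leaves
print(sum(q_uniform(k, n) for k in range(1, n)))   # the split probabilities add up to 1
```

Using `Fraction` rather than floats keeps the check exact, so the sum over k = 1, ..., n - 1 can be compared with 1 directly.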

  18. The α-model. Recursive and parametric model of tree growth for bifurcating trees, with 0 ≤ α ≤ 1:
     1. Start with a single labelled node.
     2. At every step m, add a new leaf to an arc of the current tree, chosen at random. If the current tree has n leaves:
        – each pending arc is chosen with probability $\frac{1-\alpha}{n-\alpha}$;
        – each internal arc (including a new root) is chosen with probability $\frac{\alpha}{n-\alpha}$.
     3. Repeat until the target number of leaves is reached.
     4. Label the tree uniformly.

  19. The α-model is
     • Markovian [Ford 2005].
     • Shape invariant by construction.
     • Sampling consistent [Ford 2005].

  20. The α-model is
     • Equal to the Yule model if α = 0 [Ford 2005].
     • Equal to the Uniform model if α = 1/2 [Ford 2005].
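
The growth process of the α-model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class and the function names are hypothetical, the arc above each node is identified with the node itself (so the root stands for the "new root" arc), the final uniform relabelling is omitted, and the sketch requires α < 1 because with a single node the only available weight is 1 - α.

```python
import random

class Node:
    """Tree node; the arc above a node is identified with the node itself."""
    def __init__(self):
        self.parent = None
        self.children = []

def grow_alpha_tree(n, alpha, rng=None):
    """Grow a bifurcating tree shape with n leaves (sketch; needs 0 <= alpha < 1)."""
    rng = rng or random.Random()
    root = Node()
    nodes = [root]
    for _ in range(n - 1):
        # Weight 1 - alpha per pending arc, alpha per internal arc (root arc included).
        weights = [alpha if v.children else 1 - alpha for v in nodes]
        v = rng.choices(nodes, weights=weights)[0]
        # Subdivide the arc above v and hang a new leaf from the new node.
        inner, leaf = Node(), Node()
        inner.parent, inner.children = v.parent, [v, leaf]
        if v.parent is None:
            root = inner
        else:
            siblings = v.parent.children
            siblings[siblings.index(v)] = inner
        v.parent = leaf.parent = inner
        nodes += [inner, leaf]
    return root

def count_leaves(node):
    """Number of leaves below (and including) node."""
    if not node.children:
        return 1
    return sum(count_leaves(child) for child in node.children)

tree = grow_alpha_tree(10, 0.5, random.Random(0))
print(count_leaves(tree))
```

`random.choices` normalises the weights itself, which matches dividing by n - α: a tree with n leaves has n pending arcs and n - 1 internal arcs once the root arc is counted, so the weights add up to n(1 - α) + (n - 1)α = n - α.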

  28. The α-γ-model. Recursive and parametric model of tree growth for multifurcating trees, with 0 ≤ γ ≤ α ≤ 1:
     1. Start with a single node labelled 1.
     2. At every step m, add a new leaf, and label it m, at a position of the current tree chosen at random. If the current tree has n leaves:
        – each pending arc is chosen with probability $\frac{1-\alpha}{n-\alpha}$;
        – each internal node v is chosen with probability $\frac{(\deg(v)-1)\alpha-\gamma}{n-\alpha}$;
        – each internal arc (including a new root) is chosen with probability $\frac{\gamma}{n-\alpha}$.
     3. Repeat until the target number of leaves is reached.

  29. The α-γ-model is the only probabilistic model of multifurcating trees presented here. It is
     • Markovian [Chen, Ford, and Winkel 2009].
     • Not shape invariant in general.
     • Sampling consistent [Chen, Ford, and Winkel 2009].

  30. The α-γ-model is equal to the α-model when α = γ, if we relabel each leaf uniformly [Chen, Ford, and Winkel 2009].

  37. The β-model.
     1. Start with n dots uniformly distributed over the interval [0, 1].
     2. Choose a point in [0, 1] with beta density $f(x) = \frac{\Gamma(2\beta+2)}{\Gamma^2(\beta+1)}\, x^\beta (1-x)^\beta$, $0 < x < 1$, and split the interval at it; then do the same within each resulting subinterval.
     3. Repeat until each pair of dots (the leaves) is separated by at least one split point.
     4. Construct the tree accordingly.
     5. Label the tree uniformly.
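
The interval-splitting construction can be sketched in Python as follows. This is a rough illustration, not the author's implementation. It assumes β > -1, so that the density is a proper Beta(β + 1, β + 1) law that `random.betavariate` can sample. It also assumes that a split leaving all the dots on one side is simply discarded and the piece containing the dots is split again. The nested-tuple representation and the function names are hypothetical, and the final uniform relabelling is omitted.

```python
import random

def beta_model_tree(n, beta, rng=None):
    """Nested-tuple tree shape with n leaves (sketch; needs beta > -1)."""
    rng = rng or random.Random()
    points = sorted(rng.random() for _ in range(n))
    return _split(points, 0.0, 1.0, beta, rng)

def _split(points, lo, hi, beta, rng):
    if len(points) == 1:
        return "leaf"
    while True:
        # Split point with the symmetric beta density, rescaled to [lo, hi].
        x = lo + (hi - lo) * rng.betavariate(beta + 1, beta + 1)
        left = [p for p in points if p < x]
        right = [p for p in points if p >= x]
        if left and right:
            break
        # All dots fell on one side: drop the empty piece and split again.
        if left:
            hi = x
        else:
            lo = x
    return (_split(left, lo, x, beta, rng), _split(right, x, hi, beta, rng))

def count_leaves(tree):
    """Number of leaves of a nested-tuple tree."""
    return 1 if tree == "leaf" else sum(count_leaves(t) for t in tree)

print(count_leaves(beta_model_tree(8, 0.0, random.Random(0))))
```

With β = 0 the split point is uniform on the current interval, which is the case that corresponds to the Yule model.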

  38. The β-model is
     • Markovian [Aldous 1996].
     • Shape invariant by construction.
     • Sampling consistent [Aldous 1996].

  41. The β-model is
     • Equal to the Yule model if β = 0 [Aldous 1996].
     • Equal to the Uniform model if β = −3/2 [Aldous 1996].
     • Therefore, the α and β models intersect at these two points,
     • and these are the only points at which they intersect (Theorem 43 in [Ford 2005]).

  42. 1 What is a phylogenetic tree?
        Balance
     2 Probabilistic models for phylogenetic trees
        The Yule model
        The Uniform model
        The α and α-γ models
        The β-model
     3 Balance indices
        The Colless index
        The Sackin index
        The Cophenetic index
        The Quadratic Colless index
        The rooted Quartet index
     4 Conclusions
     5 References

  44. Balance indices: What do we know?
     • Most balance indices have only been studied under the Yule and Uniform models.
     • Of the indices presented here, the rooted Quartet index is the only one whose first and second moments are known under every probabilistic model presented.

  46. The Colless index
     • Introduced in [Colless 1982].
     • Only sound for bifurcating trees.
     • Let $u \in \mathring{V}(T)$ be an internal node, and call $u_1, u_2$ its two children. Let $\kappa(u_i)$ be the number of leaves of T under $u_i$.
     • Then, $C(T) = \sum_{u \in \mathring{V}(T)} |\kappa(u_1) - \kappa(u_2)|$.
     In other words, it is the sum, over all internal nodes, of the absolute difference between the numbers of leaves of the two subtrees rooted at the children of that node.

  48. The Colless index has the undeniable quality of being intuitive, as it sums up all the “local imbalances” of a tree.
     • Its maximum value for a tree with n leaves is $\binom{n-1}{2}$, and it is attained exactly by the caterpillars [Mir, Rotger, and Rosselló 2013].
     • Its minimum value is $\sum_{i=0}^{\ell-1} 2^{m_i}\,(m_\ell - m_i - 2(\ell - i - 1))$, where $n = \sum_{i=0}^{\ell} 2^{m_i}$, with $m_i < m_{i+1}$, is the binary decomposition of n. It is attained by the maximally balanced trees, among other trees [Coronado, Fischer, et al. 2020].
     • By far, the most popular balance index in the literature.
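
The definition and the two extremal values can be checked on small trees. The sketch below is only an illustration: it represents a bifurcating tree as nested pairs (any non-tuple is a leaf), and the helper names `caterpillar`, `balanced` and `min_colless` are hypothetical. It assumes that a maximally balanced tree is obtained by splitting the leaves as evenly as possible at every node.

```python
from math import comb

def leaves(tree):
    """Number of leaves; a tree is a leaf (any non-tuple) or a pair of subtrees."""
    return sum(leaves(t) for t in tree) if isinstance(tree, tuple) else 1

def colless(tree):
    """Sum over internal nodes of |kappa(u_1) - kappa(u_2)|."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    return abs(leaves(left) - leaves(right)) + colless(left) + colless(right)

def caterpillar(n):
    """Caterpillar with n leaves: every internal node has a leaf child."""
    tree = "x"
    for _ in range(n - 1):
        tree = (tree, "x")
    return tree

def balanced(n):
    """Maximally balanced tree: split the leaves as evenly as possible."""
    return "x" if n == 1 else (balanced((n + 1) // 2), balanced(n // 2))

def min_colless(n):
    """Minimum Colless index from the binary decomposition of n."""
    ms = [i for i in range(n.bit_length()) if n >> i & 1]  # m_0 < ... < m_l
    l = len(ms) - 1
    return sum(2 ** ms[i] * (ms[l] - ms[i] - 2 * (l - i - 1)) for i in range(l))

for n in range(2, 9):
    print(n, colless(caterpillar(n)), comb(n - 1, 2), colless(balanced(n)), min_colless(n))
```

In each printed row the second and third numbers (caterpillar value and maximum) coincide, as do the fourth and fifth (balanced tree value and minimum).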

  50. The Colless index: what do we know?

     index      E_Yule   σ²_Yule   E_unif   σ²_unif   E_α   σ²_α   E_β   σ²_β
     Colless    ✓ [1]    ✓ [2]     O [3]    ×         ×     ×      ×     ×

     [1] Heard 1992 [2] Cardona, Mir, and Rosselló 2013 [3] Blum, François, and Janson 1996
     • If we knew the expected value or the variance under the α or β model, we would know it under the Uniform model.

  52. The Sackin index
     • Introduced in [Sokal 1983].
     • Can be defined for all trees, but we usually study it only for bifurcating trees.
     • Defined as $S(T) = \sum_{x \in L(T)} \delta(x)$, where $\delta(x)$ is the depth of x; i.e., the length of the path from the root to x.
     In other words, the sum of the depths of all the leaves of T.

  54. The Sackin index is also intuitive: the caterpillar has more different depths than the maximally balanced tree does.
     • Its maximum value for a tree with n leaves is $\frac{(n-1)(n+2)}{2}$, and it is attained exactly by the caterpillars [Fischer 2018].
     • Its minimum value is $(2^m - s)\,m + 2s(m+1) = 2^m m + s(m+2)$, where $n = 2^m + s$, with $0 \le s < 2^m$. It is attained exactly by the maximally balanced trees and the trees depth-equivalent to them [Fischer 2018].
     • The second most popular balance index in the literature.
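
As a small check of the definition and of the extremal values, the sketch below computes the Sackin index of nested-tuple trees. The helper names are hypothetical, and the minimum is computed as (2^m - s)·m + 2s(m + 1): in a maximally balanced tree with n = 2^m + s leaves, 2^m - s leaves sit at depth m and 2s leaves at depth m + 1.

```python
def sackin(tree, depth=0):
    """Sum of leaf depths; a tree is a leaf (any non-tuple) or a tuple of subtrees."""
    if not isinstance(tree, tuple):
        return depth
    return sum(sackin(child, depth + 1) for child in tree)

def caterpillar(n):
    """Caterpillar with n leaves: every internal node has a leaf child."""
    tree = "x"
    for _ in range(n - 1):
        tree = (tree, "x")
    return tree

def balanced(n):
    """Maximally balanced tree: split the leaves as evenly as possible."""
    return "x" if n == 1 else (balanced((n + 1) // 2), balanced(n // 2))

def max_sackin(n):
    return (n - 1) * (n + 2) // 2

def min_sackin(n):
    m = n.bit_length() - 1      # n = 2^m + s with 0 <= s < 2^m
    s = n - 2 ** m
    return (2 ** m - s) * m + 2 * s * (m + 1)

for n in range(2, 9):
    print(n, sackin(caterpillar(n)), max_sackin(n), sackin(balanced(n)), min_sackin(n))
```

For n = 5, for instance, the maximally balanced tree has leaf depths 3, 3, 2, 2, 2, which add up to 12.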

  55. The Sackin index: what do we know?

     index      E_Yule   σ²_Yule   E_unif   σ²_unif   E_α   σ²_α   E_β   σ²_β
     Colless    ✓        ✓         O        ×         ×     ×      ×     ×
     Sackin     ✓ [1]    ✓ [2]     ✓ [3]    ✓ [4]     ×     ×      ×     ×

     [1] Kirkpatrick and Slatkin 1993 [2] Cardona, Mir, and Rosselló 2013 [3] Mir, Rotger, and Rosselló 2013 [4] Coronado, Mir, Rosselló, and Rotger 2020

  57. The Sackin index. This last result is known thanks to the proof, in the Supplementary Material of [Coronado, Mir, Rosselló, and Rotger 2020], of Proposition 6 thereof: the solution of the family of recurrences
     $X_n = 2\sum_{k=1}^{n-1} C_{k,n-k}\, X_k + \sum_{l=1}^{r} a_l \binom{n}{l} + \frac{(2n-2)!!}{(2n-3)!!} \sum_{l=1}^{s} b_l \binom{n}{l}$,
     with initial condition $X_1$ and $a_l$, $b_l$ real numbers.
     As a further note, the term $\frac{(2n-2)!!}{(2n-3)!!}$ appears when dealing with the expected value or the variance of recursive shape indices under the Uniform model.

  58. The Colless and Sackin indices. In [Blum, François, and Janson 1996], we find the following results:
     • Under the Yule model, the Pearson correlation of the Sackin and Colless indices satisfies $\mathrm{cor}_{Yule}(C_n, S_n) \sim \frac{27 - 2\pi^2 - 6\log 2}{\sqrt{2\,(18 - \pi^2 - 6\log 2)(21 - 2\pi^2)}} \approx 0.98$ as n goes to ∞.
     • Under the Uniform model, $\frac{S_n - C_n}{n^{3/2}} \to 0$ in probability as n tends to ∞.
     • Let A be the Airy distribution [Flajolet and Louchard 2001]. Under the Uniform model, $\frac{S_n}{n^{3/2}} \to A$ in distribution as n tends to ∞.
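
The limiting correlation under the Yule model is easy to evaluate numerically. The short Python sketch below does so, reading log as the natural logarithm (an assumption, although it is the reading that reproduces the quoted value); the variable names are illustrative.

```python
from math import log, pi, sqrt

# Limit of cor_Yule(C_n, S_n): (27 - 2 pi^2 - 6 log 2) / sqrt(2 (18 - pi^2 - 6 log 2)(21 - 2 pi^2))
numerator = 27 - 2 * pi ** 2 - 6 * log(2)
denominator = sqrt(2 * (18 - pi ** 2 - 6 * log(2)) * (21 - 2 * pi ** 2))
cor_yule_limit = numerator / denominator

print(round(cor_yule_limit, 2))
```

The result rounds to 0.98, in agreement with the value quoted on the slide.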

  60. The Cophenetic index
     • Introduced in [Mir, Rotger, and Rosselló 2013].
     • Can be defined for all trees, but we usually study it only for bifurcating trees.
     • Defined as $\Phi(T) = \sum_{x, y \in L(T)} \varphi(x, y)$, where $\varphi(x, y)$ is the cophenetic value of x and y; i.e., the depth of the lowest common ancestor of x and y.
     In other words, the sum, over all pairs of leaves, of the length of their shared evolutionary history.

  61. The Cophenetic index
     • Its maximum value for a tree with n leaves is $\binom{n}{3}$, and it is attained exactly by the caterpillars [Mir, Rotger, and Rosselló 2013].
     • Its minimum value for a multifurcating tree with n leaves is 0, and it is attained exactly at the stars, where the lowest common ancestor of every pair of leaves is the root.
     • Its minimum value for a bifurcating tree with n leaves is $\binom{n}{2} - \sum_{j=1}^{s_n} 2^{m_j(n)-1}\,(m_j(n) + 2(s_n - j))$, where $n = \sum_{j=1}^{s_n} 2^{m_j(n)}$, with $m_j(n) < m_{j+1}(n)$, is the binary decomposition of n [to be submitted]. It is attained exactly by the maximally balanced trees [Mir, Rotger, and Rosselló 2013].
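
The extremal values for bifurcating trees can be checked on small examples. The sketch below is only an illustration with hypothetical helper names. It takes the root to have depth 0 and reads the minimum as binom(n, 2) minus the sum over j of 2^(m_j - 1)·(m_j + 2(s_n - j)), with the exponents of the binary decomposition of n in increasing order. That reading is an assumption, although it agrees with the values computed directly from maximally balanced trees.

```python
from math import comb

def leaves(tree):
    """Number of leaves; a tree is a leaf (any non-tuple) or a pair of subtrees."""
    return sum(leaves(t) for t in tree) if isinstance(tree, tuple) else 1

def cophenetic(tree, depth=0):
    """Sum over pairs of leaves of the depth of their lowest common ancestor."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    # Pairs with one leaf on each side have this node as lowest common ancestor.
    here = depth * leaves(left) * leaves(right)
    return here + cophenetic(left, depth + 1) + cophenetic(right, depth + 1)

def caterpillar(n):
    """Caterpillar with n leaves: every internal node has a leaf child."""
    tree = "x"
    for _ in range(n - 1):
        tree = (tree, "x")
    return tree

def balanced(n):
    """Maximally balanced tree: split the leaves as evenly as possible."""
    return "x" if n == 1 else (balanced((n + 1) // 2), balanced(n // 2))

def min_cophenetic(n):
    """binom(n,2) - sum_j 2^(m_j - 1) (m_j + 2 (s - j)), exponents increasing."""
    ms = [i for i in range(n.bit_length()) if n >> i & 1]  # m_1 < ... < m_s
    s = len(ms)
    # 2^m (m + 2(s - j)) is always even, so the integer division is exact.
    total = sum(2 ** m * (m + 2 * (s - j)) // 2 for j, m in enumerate(ms, start=1))
    return comb(n, 2) - total

for n in range(2, 9):
    print(n, cophenetic(caterpillar(n)), comb(n, 3), cophenetic(balanced(n)), min_cophenetic(n))
```

For n = 5 the maximally balanced tree gives 2 + 1 + 1 + 1 = 5, which is also the value returned by `min_cophenetic(5)`.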

  62. The Cophenetic index: what do we know?

     index        E_Yule   σ²_Yule   E_unif   σ²_unif   E_α   σ²_α   E_β   σ²_β
     Colless      ✓        ✓         O        ×         ×     ×      ×     ×
     Sackin       ✓        ✓         ✓        ✓         ×     ×      ×     ×
     Cophenetic   ✓ [1]    ✓ [2]     ✓ [1]    ✓ [3]     ×     ×      ×     ×

     [1] Mir, Rotger, and Rosselló 2013 [2] Cardona, Mir, and Rosselló 2013 [3] Coronado, Mir, Rosselló, and Rotger 2020

  63. The Cophenetic index: limit behaviour under the Yule model. We can extend the definition of the Cophenetic index continuously, taking edge lengths into account [Bartoszek 2018a]; call the result $\hat{\Phi}$.
     • For the continuous Cophenetic index, $\binom{n}{2}^{-1}\hat{\Phi}_n$ is a positive submartingale that, under the Yule model, converges almost surely and in $L^2$ to a random variable with finite first and second moments [Bartoszek 2018a].
     • For the (discrete) Cophenetic index, it can be shown that $\binom{n}{2}^{-1}\Phi_n$ is an almost surely and $L^2$ convergent submartingale under the Yule model [Bartoszek 2018a].

  65. The Sackin and Cophenetic indices
     • The covariance of the Sackin and Cophenetic indices under the Uniform model is known [Coronado, Mir, Rosselló, and Rotger 2020]:
       $\mathrm{cov}_{unif}(S_n, \Phi_n) = \binom{n}{2}\frac{26n^2 - 5n - 4}{15} - \frac{3n+2}{8}\binom{n}{2}\frac{(2n-2)!!}{(2n-3)!!} - \frac{n}{2}\binom{n}{2}\left(\frac{(2n-2)!!}{(2n-3)!!}\right)^2$.
     • The Pearson correlation of the Sackin and Cophenetic indices under the Uniform model is estimated [Coronado, Mir, Rosselló, and Rotger 2020]:
       $\mathrm{cor}_{unif}(S_n, \Phi_n) \sim \frac{\frac{52 - 15\pi}{60}}{\sqrt{\frac{10 - 3\pi}{3}\cdot\frac{56 - 15\pi}{240}}} \approx 0.965$.
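
Both expressions can be sanity-checked with a few lines of Python. The sketch below evaluates the covariance exactly with rational arithmetic, reading the formula as binom(n, 2) times [(26n² - 5n - 4)/15 - ((3n + 2)/8)·r - (n/2)·r²], with r = (2n-2)!!/(2n-3)!!. The function names are illustrative. For n = 2 and n = 3 every bifurcating tree has the same shape, so both indices are constant and the covariance must vanish. For n = 4, enumerating the 15 trees by hand (12 caterpillars with S = 9, Φ = 4 and 3 balanced trees with S = 8, Φ = 2) gives 8/25.

```python
from fractions import Fraction
from math import comb, pi, sqrt

def double_factorial(n):
    """n!! = n(n-2)(n-4)..., with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def cov_unif_sackin_cophenetic(n):
    """Covariance of the Sackin and Cophenetic indices under the Uniform model."""
    r = Fraction(double_factorial(2 * n - 2), double_factorial(2 * n - 3))
    bracket = Fraction(26 * n * n - 5 * n - 4, 15) - Fraction(3 * n + 2, 8) * r - Fraction(n, 2) * r ** 2
    return comb(n, 2) * bracket

# Limiting Pearson correlation, built from the leading-order constants.
cor_unif_limit = ((52 - 15 * pi) / 60) / sqrt((10 - 3 * pi) / 3 * (56 - 15 * pi) / 240)

print([cov_unif_sackin_cophenetic(n) for n in (2, 3, 4)])
print(round(cor_unif_limit, 3))
```

The constant (52 - 15π)/60 in the correlation is the leading coefficient of the covariance: since (2n-2)!!/(2n-3)!! grows like the square root of πn, the covariance behaves like n⁴(26/30 - π/4).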

  67. The Quadratic Colless index
     • Introduced in [Bartoszek et al. 2020].
     • Only sound for bifurcating trees.
     • Let $u \in \mathring{V}(T)$ be an internal node, and call $u_1, u_2$ its two children. Let $\kappa(u_i)$ be the number of leaves of T under $u_i$.
     • Then, $C^{(2)}(T) = \sum_{u \in \mathring{V}(T)} (\kappa(u_1) - \kappa(u_2))^2$.
     In other words, it has the same intuitive justification as the Colless index, but using the square instead of the absolute value makes it much easier to manipulate.

  69. The Quadratic Colless index has the undeniable quality of being intuitive, as it sums up all the “local imbalances” of a tree.
     • Its maximum value for a tree with n leaves is $\frac{(n-1)(n-2)(2n-3)}{6}$, and it is attained exactly by the caterpillars [Bartoszek et al. 2020].
     • Its minimum value is the same as that of the Colless index. It is attained exactly by the maximally balanced trees [Bartoszek et al. 2020], in contrast with the difficult characterization of the trees attaining the minimum Colless index.
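
A short Python sketch, with hypothetical helper names, makes both extremal values concrete. On a caterpillar the local imbalances are 0, 1, ..., n - 2, so the index is the sum of their squares, (n - 1)(n - 2)(2n - 3)/6. On a maximally balanced tree every local imbalance is 0 or 1, so squaring changes nothing and the value coincides with the Colless index of that tree.

```python
def leaves(tree):
    """Number of leaves; a tree is a leaf (any non-tuple) or a pair of subtrees."""
    return sum(leaves(t) for t in tree) if isinstance(tree, tuple) else 1

def colless(tree, power=1):
    """Sum over internal nodes of |kappa(u_1) - kappa(u_2)| ** power."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    here = abs(leaves(left) - leaves(right)) ** power
    return here + colless(left, power) + colless(right, power)

def quadratic_colless(tree):
    return colless(tree, power=2)

def caterpillar(n):
    """Caterpillar with n leaves: every internal node has a leaf child."""
    tree = "x"
    for _ in range(n - 1):
        tree = (tree, "x")
    return tree

def balanced(n):
    """Maximally balanced tree: split the leaves as evenly as possible."""
    return "x" if n == 1 else (balanced((n + 1) // 2), balanced(n // 2))

for n in range(2, 9):
    print(n, quadratic_colless(caterpillar(n)), (n - 1) * (n - 2) * (2 * n - 3) // 6,
          quadratic_colless(balanced(n)), colless(balanced(n)))
```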

  70. The Quadratic Colless index: what do we know?

     index        E_Yule   σ²_Yule   E_unif   σ²_unif   E_α   σ²_α   E_β   σ²_β
     Colless      ✓        ✓         O        ×         ×     ×      ×     ×
     Sackin       ✓        ✓         ✓        ✓         ×     ×      ×     ×
     Cophenetic   ✓        ✓         ✓        ✓         ×     ×      ×     ×
     Q. Colless   ✓ [1]    ✓ [1]     ✓ [1]    ✓ [4]     ×     ×      ×     ×

     [1] Bartoszek et al. 2020

  71. The Quadratic Colless index: limit behaviour under the Yule model. Set $Y_n := \frac{C^{(2)}_n - E_{Yule}(C^{(2)}_n)}{n^2}$. As $n \to \infty$, the distribution of $Y_n$ under the Yule model is such that
     $Y_n \to \tau^2 Y' + (1-\tau)^2 Y'' + (1 + 6\tau^2 - 6\tau)$ in distribution,
     where $\tau \sim \mathrm{Unif}[0, 1]$ and $Y'$, $Y''$ are independent and distributed according to the same law as the limit of $Y_n$ [Bartoszek et al. 2020].

  73. The rooted Quartet index. There are five different tree shapes with four leaves: $Q_0$, $Q_1$, $Q_2$, $Q_3$, $Q_4$ (Figure: the five tree shapes in $T_4$). They are ordered according to their number of automorphisms, and each shape $Q_i$ is assigned a number $q_i$ that increases with it.

  74. The rooted Quartet index
     • Introduced in [Coronado, Mir, Rosselló, and Valiente 2019].
     • Can be defined (and makes sense) for all trees.
     • Defined as $QI(T) = \sum_{i=0}^{4} |\{Q \in \mathrm{Part}_4(L(T)) : T(Q) = Q_i\}| \cdot q_i$.
