Semi-algebraic descriptions of the general Markov model, Phylomania

  1. Semi-algebraic descriptions of the general Markov model. Phylomania 2010, John A. Rhodes, Hobart, Tas., November 4-5

  2. Thanks to my collaborators: Elizabeth Allman (Mathematics and Statistics, UAF) and Amelia Taylor (Mathematics and Computer Science, Colorado College)

  3. GM($k$) model on $T$: $k$ is the size of some alphabet (state space); e.g., $k = 2$ (0 = R, 1 = Y), $k = 4$ (A, C, T, G). $T$ is a rooted tree with $n$ leaves. Pick a vector $\pi = (\pi_1, \ldots, \pi_k) \in [0,1]^k$ with $\sum_i \pi_i = 1$ to specify a distribution of states at the root of $T$. For each edge $e$ of $T$, directed away from the root, pick a $k \times k$ stochastic matrix $M_e$ of conditional probabilities of state changes between its endpoints. These choices determine a joint probability distribution $P \in \mathbb{R}^{k^n}$ of states at the leaves (the pattern distribution). Semialgebraic Slide 1
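A minimal sketch of how the parameters determine $P$. It assumes, as the later 3-leaf slides do, that the root sits at the internal node of a 3-leaf tree. It also assumes that row $r$ of each $M_e$ is the distribution of the child's state given parent state $r$. The helper names and the random parameter values are illustrative only and do not come from the talk.

```python
import numpy as np

def pattern_distribution_3leaf(pi, M1, M2, M3):
    """Joint distribution of leaf states for a 3-leaf tree rooted at its centre.

    P[i, j, l] = sum_r pi[r] * M1[r, i] * M2[r, j] * M3[r, l]
    """
    return np.einsum("r,ri,rj,rl->ijl", pi, M1, M2, M3)

def random_stochastic(k, rng):
    """Illustrative helper: a k x k matrix with positive rows summing to 1."""
    M = rng.random((k, k)) + 0.1
    return M / M.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
k = 4                                   # e.g. A, C, T, G
pi = rng.random(k) + 0.1
pi /= pi.sum()                          # root distribution
Ms = [random_stochastic(k, rng) for _ in range(3)]
P = pattern_distribution_3leaf(pi, *Ms)

print(P.shape)                   # (4, 4, 4): k**n entries for n = 3 leaves
print(np.isclose(P.sum(), 1.0))  # True
```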

  4. The General Markov (GM) model on $T$ is the collection of such $P$ for all choices of $\pi$ and $M_e$. Variants: require $\pi$ and the $M_e$ to have positive entries (to statisticians, this is a very important difference); require the $M_e$ to be non-singular (the standard assumption for GTR and its submodels).

  5. For fixed $T$ with $n$ leaves, the GM model is the image of a polynomial map $\varphi_T : \Theta_T \to \mathbb{R}^{k^n}$, with domain $\Theta_T \subseteq \mathbb{R}^L$ defined by equalities (e.g., $\sum_i \pi_i = 1$, $M_e \mathbf{1} = \mathbf{1}$) and inequalities (e.g., $\pi_i \geq 0$, $\det(M_e) \neq 0$).
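The slide lists only example constraints. The sketch below checks the ones it names for stochastic GM parameters. Three things in it are our own assumptions: the function name, the numerical tolerance, and the `require_nonsingular` switch, which covers the non-singular variant from the previous slide.

```python
import numpy as np

def in_parameter_domain(pi, Ms, tol=1e-12, require_nonsingular=True):
    """Check the defining equalities/inequalities of Theta_T named on the slide,
    for a root distribution pi and a list Ms of edge matrices (one per edge)."""
    pi = np.asarray(pi, dtype=float)
    if abs(pi.sum() - 1) > tol or (pi < -tol).any():        # sum pi_i = 1, pi_i >= 0
        return False
    for M in Ms:
        M = np.asarray(M, dtype=float)
        if not np.allclose(M.sum(axis=1), 1, atol=tol):      # M_e 1 = 1
            return False
        if (M < -tol).any():                                 # entries are probabilities
            return False
        if require_nonsingular and abs(np.linalg.det(M)) <= tol:  # det(M_e) != 0
            return False
    return True

print(in_parameter_domain([0.5, 0.5], [[[0.9, 0.1], [0.2, 0.8]]]))  # True
print(in_parameter_domain([0.5, 0.5], [[[0.5, 0.5], [0.5, 0.5]]]))  # False (singular)
```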

  6. Definition. A subset of $\mathbb{R}^m$ defined by finitely many polynomial equalities and inequalities (or a finite union of such sets) is called a semi-algebraic set. Theorem (Tarski-Seidenberg). The image of a semi-algebraic set under a polynomial map is semi-algebraic. Thus the GM model on $T$ is a semi-algebraic set.

  7. Problem: give an explicit semialgebraic description of the $k$-state GM model on a tree $T$. • Equalities (called phylogenetic invariants) have been much studied. • Inequalities are more elusive.

  8. Example (Cavender-Felsenstein): let $T$ be a 4-leaf tree with split $12|34$. The 4-point condition with the log-det distance can be exponentiated and expressed as $f_1(P) = f_2(P) > f_3(P)$, where the $f_i$ are polynomials. Thus semi-algebraic considerations underlie much theory, as well as practical algorithms such as neighbor joining (NJ).
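The slide does not spell out the polynomials $f_i$, so this sketch does not guess them. Instead it checks the additive (logarithmic) form of the four-point condition on a simulated distribution. It makes three assumptions: the root sits at the internal node joining leaves 1 and 2; the parameter values are arbitrary; and $d_{ab} = -\log|\det F_{ab}|$ stands in for the full log-det distance, where $F_{ab}$ is the 2-leaf marginal. The leaf-specific marginal terms of the full distance cancel in every pair sum $d_{ab} + d_{cd}$, so the substitution does not affect the check.

```python
import numpy as np

def four_leaf_distribution(pi, M1, M2, M3, M4, M5):
    """Leaf distribution on 12|34, rooted at the internal node joining leaves 1 and 2;
    M5 is on the internal edge leading to the node joining leaves 3 and 4."""
    return np.einsum("r,ri,rj,rs,sk,sl->ijkl", pi, M1, M2, M5, M3, M4)

def logdet_distances(P):
    """d[a, b] = -log|det F_ab|, F_ab the 2-leaf marginal (marginal terms omitted)."""
    n = P.ndim
    d = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            others = tuple(c for c in range(n) if c not in (a, b))
            F = P.sum(axis=others)              # 2 x 2 array indexed by (leaf a, leaf b)
            d[a, b] = d[b, a] = -np.log(abs(np.linalg.det(F)))
    return d

pi = np.array([0.6, 0.4])
M1 = np.array([[0.9, 0.1], [0.2, 0.8]])
M2 = np.array([[0.85, 0.15], [0.1, 0.9]])
M3 = np.array([[0.8, 0.2], [0.25, 0.75]])
M4 = np.array([[0.95, 0.05], [0.3, 0.7]])
M5 = np.array([[0.7, 0.3], [0.2, 0.8]])          # internal edge
d = logdet_distances(four_leaf_distribution(pi, M1, M2, M3, M4, M5))

s12_34 = d[0, 1] + d[2, 3]
s13_24 = d[0, 2] + d[1, 3]
s14_23 = d[0, 3] + d[1, 2]
print(s12_34 < s13_24, np.isclose(s13_24, s14_23))  # True True
```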

  9. Precursors to our work: S. Klaere: $k = 2$, trees with 3 and 4 leaves (preprint soon!?!); uses natural coordinates and the parameterization, with clever, detailed arguments. P. Zwiernik and J. Smith: $k = 2$, any number of leaves, preprints on arXiv; uses different coordinates and the parameterization; the approach seems to require $k = 2$.

  10. Goal: understand the $k = 2$ semialgebraic description in a way that generalizes (at least partially) to $k > 2$. Approach: trees with 3 leaves, then 4 leaves, then more leaves; parameters in $\mathbb{C}$, then in $\mathbb{R}$, then stochastic.

  11. Background: the $2 \times 2 \times 2$ hyperdeterminant (tangle) $\Delta$. For $P = (p_{ijk})$ a $2 \times 2 \times 2$ array (such as a distribution from a 3-leaf tree),
$$
\begin{aligned}
\Delta(P) ={}& (p_{000}^2 p_{111}^2 + p_{001}^2 p_{110}^2 + p_{010}^2 p_{101}^2 + p_{011}^2 p_{100}^2) \\
&- 2(p_{000} p_{001} p_{110} p_{111} + p_{000} p_{010} p_{101} p_{111} + p_{000} p_{011} p_{100} p_{111} \\
&\qquad + p_{001} p_{010} p_{101} p_{110} + p_{001} p_{011} p_{110} p_{100} + p_{010} p_{011} p_{101} p_{100}) \\
&+ 4(p_{000} p_{011} p_{101} p_{110} + p_{001} p_{010} p_{100} p_{111}).
\end{aligned}
$$
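A direct transcription of the formula, which is useful for checking the identities on later slides. It comes with two sanity checks. A product (rank-1) tensor should give $\Delta = 0$. The tensor with $p_{000} = p_{111} = 1/2$ and all other entries 0 keeps only the term $p_{000}^2 p_{111}^2$, so it should give $\Delta = 1/16$.

```python
import numpy as np

def hyperdet(P):
    """Cayley's 2x2x2 hyperdeterminant Delta(P), transcribed term by term."""
    (p000, p001), (p010, p011) = P[0]
    (p100, p101), (p110, p111) = P[1]
    squares = (p000**2 * p111**2 + p001**2 * p110**2
               + p010**2 * p101**2 + p011**2 * p100**2)
    pairs = (p000 * p001 * p110 * p111 + p000 * p010 * p101 * p111
             + p000 * p011 * p100 * p111 + p001 * p010 * p101 * p110
             + p001 * p011 * p110 * p100 + p010 * p011 * p101 * p100)
    quads = p000 * p011 * p101 * p110 + p001 * p010 * p100 * p111
    return squares - 2 * pairs + 4 * quads

u = np.array([0.3, 0.7])
print(np.isclose(hyperdet(np.einsum("i,j,k->ijk", u, u, u)), 0.0))  # True

G = np.zeros((2, 2, 2))
G[0, 0, 0] = G[1, 1, 1] = 0.5
print(hyperdet(G))  # 0.0625
```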

  12. One way to think of $\Delta$ (Schläfli): for a column vector $v = (x, y)$ of indeterminates, • $P *_3 v$ is the sum of the matrix slices of $P$ weighted by $x$ and $y$, • ... so $\det(P *_3 v)$ is a homogeneous quadratic polynomial in $x, y$ of the form $ax^2 + bxy + cy^2$, with $a, b, c$ quadratic in the entries of $P$, • ... so the discriminant $b^2 - 4ac$ is a quartic in the entries of $P$, and is in fact $\Delta(P)$. So $\Delta(P) \neq 0 \iff$ there are exactly two non-zero $v \in \mathbb{C}^2$ (up to scaling) for which $P *_3 v$ has rank $\leq 1$.
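A sketch of Schläfli's construction. It reads off $a, b, c$ from the two slices $A = P *_3 (1,0)$ and $B = P *_3 (0,1)$, forms the discriminant, and recovers the directions $v$ for which $P *_3 v$ is singular. The helper names, the test tensor `R`, and the small threshold for treating $a$ as zero are illustrative assumptions. The returned directions are only meaningful when $\Delta(P) \neq 0$.

```python
import numpy as np

def mode3(P, v):
    """P *_3 v: contract the third index of P with the vector v."""
    return np.tensordot(P, v, axes=([2], [0]))

def schlafli(P):
    """Return (b^2 - 4ac, unit directions v) for det(P *_3 (x, y)) = a x^2 + b x y + c y^2.

    Assumes Delta(P) != 0, so there are two distinct directions up to scale."""
    A, B = mode3(P, np.array([1.0, 0.0])), mode3(P, np.array([0.0, 1.0]))
    a, c = np.linalg.det(A), np.linalg.det(B)
    b = A[0, 0] * B[1, 1] + A[1, 1] * B[0, 0] - A[0, 1] * B[1, 0] - A[1, 0] * B[0, 1]
    if abs(a) > 1e-14:
        vs = [np.array([t, 1.0]) for t in np.roots([a, b, c])]   # t = x / y
    else:
        vs = [np.array([1.0, 0.0]), np.array([-c, b])]           # det = y (b x + c y)
    return b * b - 4 * a * c, [v / np.linalg.norm(v) for v in vs]

G = np.zeros((2, 2, 2))
G[0, 0, 0] = G[1, 1, 1] = 0.5
R = np.array([[[0.20, 0.05], [0.10, 0.15]],
              [[0.05, 0.15], [0.10, 0.20]]])
disc_G, vs_G = schlafli(G)
disc_R, vs_R = schlafli(R)
print(disc_G)                                                      # 0.0625
print(all(abs(np.linalg.det(mode3(R, v))) < 1e-12 for v in vs_R))  # True
```

For `G`, the discriminant agrees with the value of the explicit formula above (1/16). Each returned $v$ makes $P *_3 v$ a rank-1 matrix.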

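  13. Application of $\Delta$ to GM(2) on a 3-leaf tree with leaves 1, 2, 3: with parameters $\pi, M_1, M_2, M_3$, we have $P = ((\mathrm{Diag}(\pi) *_1 M_1) *_2 M_2) *_3 M_3$, where $\mathrm{Diag}(\pi)$ is the $2 \times 2 \times 2$ diagonal tensor with diagonal $\pi$. Let $v$ be the first column of $M_3^{-1}$. Then
$$
\begin{aligned}
P *_3 v &= \big(((\mathrm{Diag}(\pi) *_1 M_1) *_2 M_2) *_3 M_3\big) *_3 v \\
&= ((\mathrm{Diag}(\pi) *_1 M_1) *_2 M_2) *_3 (1, 0) \\
&= ((\mathrm{Diag}(\pi) *_3 (1, 0)) *_1 M_1) *_2 M_2 \\
&= M_1^T \operatorname{diag}(\pi_0, 0)\, M_2,
\end{aligned}
$$
which has rank at most 1.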
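A numerical check of this computation. The parameter values are arbitrary non-singular choices of ours. The sketch uses the convention $(T *_3 w)_{ij} = \sum_l T_{ijl} w_l$ and writes $P$ entrywise instead of as repeated mode products.

```python
import numpy as np

def tree3(pi, M1, M2, M3):
    # P = ((Diag(pi) *_1 M1) *_2 M2) *_3 M3, written entrywise
    return np.einsum("r,ri,rj,rk->ijk", pi, M1, M2, M3)

def mode3(P, v):
    # P *_3 v: contract the third index of P with the vector v
    return np.tensordot(P, v, axes=([2], [0]))

pi = np.array([0.6, 0.4])
M1 = np.array([[0.9, 0.1], [0.2, 0.8]])
M2 = np.array([[0.7, 0.3], [0.25, 0.75]])
M3 = np.array([[0.85, 0.15], [0.4, 0.6]])
P = tree3(pi, M1, M2, M3)

v = np.linalg.inv(M3)[:, 0]        # first column of M3^{-1}, so M3 @ v = (1, 0)
S = mode3(P, v)
print(np.allclose(S, M1.T @ np.diag([pi[0], 0.0]) @ M2))  # True
print(np.linalg.matrix_rank(S))                            # 1
```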

  14. Proposition 1. A tensor $P$ is in the image of the complex parameterization map for the GM(2) model on the 3-leaf tree iff its entries sum to 1 and either (a) $\Delta(P) \neq 0$ and $\det(P *_i (1,1)) \neq 0$ for $i = 1, 2, 3$, or (b) $\Delta(P) = 0$ and all $2 \times 2$ minors of at least one of the flattenings $P_{1,23}, P_{2,13}, P_{3,12}$ are zero. In case (a), $P$ is the image of a unique (up to label swapping) choice of non-singular parameters; in case (b), $P$'s preimage is larger. Note: the only invariant for GM(2) on the 3-leaf tree is the trivial one.
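A sketch of Proposition 1 as a numerical membership test. Exact equalities are replaced by tolerances; that is an assumption of this sketch, not part of the proposition. $\Delta$ is computed as Schläfli's discriminant. The test should accept two tensors: one built from parameters, which falls under case (a), and a product tensor, which falls under case (b). It should reject the tensor with $p_{001} = p_{010} = p_{100} = 1/3$. That tensor has $\Delta = 0$, but none of its flattenings has rank 1.

```python
import numpy as np

def hyperdet(P):
    """Delta(P) as the discriminant of det(P *_3 (x, y))."""
    A, B = P[:, :, 0], P[:, :, 1]
    b = A[0, 0] * B[1, 1] + A[1, 1] * B[0, 0] - A[0, 1] * B[1, 0] - A[1, 0] * B[0, 1]
    return b * b - 4 * np.linalg.det(A) * np.linalg.det(B)

def flattenings(P):
    """The 2 x 4 flattenings P_{1,23}, P_{2,13}, P_{3,12}."""
    return [np.moveaxis(P, ax, 0).reshape(2, 4) for ax in range(3)]

def in_complex_gm2_3leaf(P, tol=1e-12):
    """Proposition 1, with |x| <= tol standing in for x = 0 (an assumption)."""
    if abs(P.sum() - 1) > tol:
        return False
    if abs(hyperdet(P)) > tol:                                  # case (a)
        return all(abs(np.linalg.det(P.sum(axis=i))) > tol for i in range(3))
    # case (b): all 2 x 2 minors of some flattening vanish, i.e. its rank is <= 1
    return any(np.linalg.matrix_rank(F, tol=1e-9) <= 1 for F in flattenings(P))

pi = np.array([0.6, 0.4])
M1 = np.array([[0.9, 0.1], [0.2, 0.8]])
M2 = np.array([[0.7, 0.3], [0.25, 0.75]])
M3 = np.array([[0.85, 0.15], [0.4, 0.6]])
P_tree = np.einsum("r,ri,rj,rk->ijk", pi, M1, M2, M3)

u = np.array([0.3, 0.7])
P_product = np.einsum("i,j,k->ijk", u, u, u)

W = np.zeros((2, 2, 2))
W[0, 0, 1] = W[0, 1, 0] = W[1, 0, 0] = 1 / 3

print(in_complex_gm2_3leaf(P_tree), in_complex_gm2_3leaf(P_product), in_complex_gm2_3leaf(W))
# True True False
```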

  15. Moreover, since the sign of the discriminant of a real quadratic determines whether its roots are real or complex, the connection between $\Delta$ and the discriminant yields...

  16. Proposition 2. A tensor $P$ is in the image of the real parameterization map for the GM(2) model on the 3-leaf tree if, and only if, it is real, its entries sum to 1, and either (a) $\Delta(P) > 0$ and $\det(P *_i (1,1)) \neq 0$ for $i = 1, 2, 3$, or (b) $\Delta(P) = 0$ and all $2 \times 2$ minors of at least one of the flattenings $P_{1,23}, P_{2,13}, P_{3,12}$ are zero. In case (a), $P$ is the image of a unique (up to label swapping) choice of non-singular parameters; in case (b), $P$'s preimage is larger.
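To see why the sign condition matters, this sketch uses a one-parameter family of our own choosing: $\pi = (1/2, 1/2)$ and $M_1 = M_2 = M_3 = \begin{pmatrix} 1/2+z & 1/2-z \\ 1/2-z & 1/2+z \end{pmatrix}$.

- For real $z = s$ the parameters are real, and a short calculation (ours) gives $\Delta(P) = 4s^6 > 0$.
- For $z = is$ the matrices are complex, but their two rows are complex conjugates, so $P$ is real. For $s = 0.2$ it is even entrywise positive, yet $\Delta(P) = -4s^6 < 0$.

So the second $P$ lies in the image of the complex parameterization but, by Proposition 2, not in the image of the real one.

```python
import numpy as np

def hyperdet(P):
    # Delta(P) as the discriminant of det(P *_3 (x, y)), following Schläfli
    A, B = P[:, :, 0], P[:, :, 1]
    b = A[0, 0] * B[1, 1] + A[1, 1] * B[0, 0] - A[0, 1] * B[1, 0] - A[1, 0] * B[0, 1]
    return b * b - 4 * np.linalg.det(A) * np.linalg.det(B)

def symmetric_example(z):
    """GM(2) tensor with pi = (1/2, 1/2) and M1 = M2 = M3 = [[1/2+z, 1/2-z], [1/2-z, 1/2+z]].
    For z = i*s the rows of M are complex conjugates, so P comes out real."""
    M = np.array([[0.5 + z, 0.5 - z], [0.5 - z, 0.5 + z]])
    pi = np.array([0.5, 0.5])
    P = np.einsum("r,ri,rj,rk->ijk", pi, M, M, M)
    return np.real_if_close(P)          # drop imaginary parts that are zero up to rounding

s = 0.2
P_real = symmetric_example(s)           # real parameters
P_cplx = symmetric_example(1j * s)      # complex parameters, real (and positive) tensor
print(np.isrealobj(P_cplx), bool((P_cplx > 0).all()))  # True True
print(np.isclose(hyperdet(P_real), 4 * s**6))          # True
print(np.isclose(hyperdet(P_cplx), -4 * s**6))         # True
```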

  17. Note: it is not (yet) clear how to generalize the preceding to $k > 2$. But what follows holds for $k \geq 2$.

  18. Positivity of parameters (3-leaf tree with leaves 1, 2, 3): note that the marginalizations of $P$ from 3 to 2 taxa are
$$
\begin{aligned}
P_{\cdot\cdot+} &= P *_3 (1,1) = M_1^T \operatorname{diag}(\pi) M_2, \\
P_{\cdot+\cdot} &= P *_2 (1,1) = M_1^T \operatorname{diag}(\pi) M_3, \\
P_{+\cdot\cdot} &= P *_1 (1,1) = M_2^T \operatorname{diag}(\pi) M_3,
\end{aligned}
$$
so $P_{+\cdot\cdot} (P_{\cdot+\cdot})^{-1} P_{\cdot\cdot+} = M_2^T \operatorname{diag}(\pi) M_2$ is a symmetric matrix. (This was a construction of invariants given in Allman-Rhodes 2003.)
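A numerical check of this identity, using arbitrary parameter values of our choosing. The three 2-taxon marginals come from summing $P$ over its third, second and first index respectively.

```python
import numpy as np

pi = np.array([0.6, 0.4])
M1 = np.array([[0.9, 0.1], [0.2, 0.8]])
M2 = np.array([[0.7, 0.3], [0.25, 0.75]])
M3 = np.array([[0.85, 0.15], [0.4, 0.6]])
P = np.einsum("r,ri,rj,rk->ijk", pi, M1, M2, M3)

P_ij = P.sum(axis=2)   # P_{..+} = M1^T diag(pi) M2
P_ik = P.sum(axis=1)   # P_{.+.} = M1^T diag(pi) M3
P_jk = P.sum(axis=0)   # P_{+..} = M2^T diag(pi) M3

S = P_jk @ np.linalg.inv(P_ik) @ P_ij
print(np.allclose(S, S.T))                        # True
print(np.allclose(S, M2.T @ np.diag(pi) @ M2))    # True
```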

  19. But $P_{+\cdot\cdot} (P_{\cdot+\cdot})^{-1} P_{\cdot\cdot+} = M_2^T \operatorname{diag}(\pi) M_2$ (with $M_2$ real and non-singular) is the matrix of a positive definite quadratic form if, and only if, $\pi_0, \pi_1 > 0$. There are known semialgebraic descriptions of the matrices of such forms (and also of positive semidefinite ones).

  20. Theorem (Sylvester). A real symmetric matrix defines a positive definite quadratic form if, and only if, all its leading principal minors are positive. (The $l$-th leading principal minor is the $l \times l$ subdeterminant in the upper left corner.)
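A sketch of Sylvester's criterion applied to the matrix $M_2^T \operatorname{diag}(\pi) M_2$ from the preceding slides, cross-checked against eigenvalues. The second "root distribution", $(1.2, -0.2)$, is a real but non-stochastic choice of ours, so the criterion should fail for it.

```python
import numpy as np

def leading_principal_minors(S):
    """det of the l x l upper-left block, for l = 1, ..., n."""
    return [np.linalg.det(S[:l, :l]) for l in range(1, S.shape[0] + 1)]

def is_positive_definite_sylvester(S, tol=0.0):
    """Sylvester's criterion for a real symmetric matrix S."""
    return all(m > tol for m in leading_principal_minors(S))

M2 = np.array([[0.7, 0.3], [0.25, 0.75]])
for pi in ([0.6, 0.4], [1.2, -0.2]):
    S = M2.T @ np.diag(pi) @ M2
    print(pi, is_positive_definite_sylvester(S), bool(np.all(np.linalg.eigvalsh(S) > 0)))
# [0.6, 0.4] True True
# [1.2, -0.2] False False
```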

  21. Theorem. A tensor $P$ is in the image of the positive parameterization map for the GM(2) model on the 3-leaf tree if, and only if, its entries are positive, its entries sum to 1, and either (a) $\Delta(P) > 0$, $\det(P *_i (1,1)) \neq 0$ for $i = 1, 2, 3$, and the (1,1)-entries and the determinants of the following seven matrices are positive: $\det(P_{\cdot\cdot+})\, P_{+\cdot\cdot}\, \mathrm{Cof}(P_{\cdot\cdot+})^T P_{\cdot+\cdot}$, $P_{i\cdot\cdot}\, \mathrm{Cof}(P_{\cdot\cdot+})^T P_{\cdot+\cdot}$, $\det(P_{\cdot\cdot+})\, P_{+\cdot\cdot}^T\, \mathrm{Cof}(P_{\cdot\cdot+})^T P_{\cdot i\cdot}$, $\det(P_{\cdot\cdot+})\, P_{\cdot+\cdot}^T\, \mathrm{Cof}(P_{+\cdot\cdot})^T P^T \det(P_{+\cdot\cdot})\, P_{\cdot\cdot i}^T$; or (b) $\Delta(P) = 0$, and all $2 \times 2$ minors of at least one of the flattenings $P_{1,23}, P_{2,13}, P_{3,12}$ are zero. (Here $\mathrm{Cof}(X)$ is the cofactor matrix of $X$, so $\mathrm{Cof}(X)^T = \det(X)\, X^{-1}$ for invertible $X$.)

  22. 4-leaf tree (split $12|34$): $P$ is $2 \times 2 \times 2 \times 2$. First, marginalizing out any taxon $i$ gives a $2 \times 2 \times 2$ array $P *_i (1,1)$, which arises from a 3-leaf tree, and hence the earlier theorems apply.
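A sketch of the marginalization step. Summing a simulated 4-leaf distribution over one index gives $P *_i (1,1)$. For real non-singular parameters, each of the four 3-taxon marginals should then have $\Delta > 0$, as in Proposition 2(a). The root placement and the parameter values are our own assumptions.

```python
import numpy as np

def hyperdet(Q):
    """Delta(Q) as the discriminant of det(Q *_3 (x, y)), following Schläfli."""
    A, B = Q[:, :, 0], Q[:, :, 1]
    b = A[0, 0] * B[1, 1] + A[1, 1] * B[0, 0] - A[0, 1] * B[1, 0] - A[1, 0] * B[0, 1]
    return b * b - 4 * np.linalg.det(A) * np.linalg.det(B)

# Tree 12|34 rooted at the internal node joining leaves 1 and 2 (an assumption);
# M5 sits on the internal edge. Parameter values are arbitrary.
pi = np.array([0.6, 0.4])
M1 = np.array([[0.9, 0.1], [0.2, 0.8]])
M2 = np.array([[0.85, 0.15], [0.1, 0.9]])
M3 = np.array([[0.8, 0.2], [0.25, 0.75]])
M4 = np.array([[0.95, 0.05], [0.3, 0.7]])
M5 = np.array([[0.7, 0.3], [0.2, 0.8]])
P = np.einsum("r,ri,rj,rs,sk,sl->ijkl", pi, M1, M2, M5, M3, M4)

# P *_i (1, 1): sum out taxon i + 1, leaving a 2 x 2 x 2 array on the other three taxa
deltas = [hyperdet(P.sum(axis=i)) for i in range(4)]
print(all(d > 0 for d in deltas))  # True
```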

  23. Second, all non-trivial invariants for GM(2) on trees with 4 or more leaves are known (Allman-Rhodes, 2007). The key ones are the edge invariants: if $T$ has split $12|34$, the $4 \times 4$ flattening $P_{12,34}$ has rank (at most) 2, so all its $3 \times 3$ subdeterminants are 0.
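A sketch of the edge invariants on a simulated distribution for $12|34$, with parameter values of our choosing. The flattening $P_{12,34}$ should have rank 2, so all its $3 \times 3$ minors vanish. By contrast, the flattening for the split $13|24$, which is not in the tree, should have full rank 4. Numerical rank is computed with an explicit tolerance, which is an assumption of this sketch.

```python
import numpy as np
from itertools import combinations

pi = np.array([0.6, 0.4])
M1 = np.array([[0.9, 0.1], [0.2, 0.8]])
M2 = np.array([[0.85, 0.15], [0.1, 0.9]])
M3 = np.array([[0.8, 0.2], [0.25, 0.75]])
M4 = np.array([[0.95, 0.05], [0.3, 0.7]])
M5 = np.array([[0.7, 0.3], [0.2, 0.8]])
P = np.einsum("r,ri,rj,rs,sk,sl->ijkl", pi, M1, M2, M5, M3, M4)   # tree 12|34

F_12_34 = P.reshape(4, 4)                        # rows (i, j), columns (k, l)
F_13_24 = P.transpose(0, 2, 1, 3).reshape(4, 4)  # rows (i, k), columns (j, l)

def max_abs_3x3_minor(F):
    """Largest |3 x 3 subdeterminant| of a 4 x 4 matrix (all 16 of them)."""
    return max(abs(np.linalg.det(F[np.ix_(r, c)]))
               for r in combinations(range(4), 3)
               for c in combinations(range(4), 3))

print(np.linalg.matrix_rank(F_12_34, tol=1e-10),
      np.linalg.matrix_rank(F_13_24, tol=1e-10))  # 2 4
print(max_abs_3x3_minor(F_12_34) < 1e-12)         # True
```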

  24. Theorem. Let $P$ be a complex $2 \times 2 \times 2 \times 2$ tensor with entries summing to 1. Then $P$ arises from complex non-singular parameters on a 4-leaf tree $T$ if, and only if, 1. all marginalizations of $P$ to 3-taxon sets arise from complex non-singular parameters on 3-leaf trees, and 2. $P$ satisfies the edge invariants for $T$. For real parameters, replace every occurrence of 'complex' with 'real'.

  25. Positivity of parameters: for the root distribution $\pi$ and the stochastic matrices $M_e$ on pendant edges, this follows from the 3-leaf case. Matrices on internal edges require more...

  26. We first 'adjust' $P$. Suppose $T = 12|34$ and $P$ arises from matrices $M_1, M_2, M_3, M_4$ on the pendant edges and $M_5$ on the internal edge, with the root at the internal node adjacent to leaves 1 and 2. Let $N_{32} = P_{+\cdot\cdot+}^T = M_3^T M_5^T \operatorname{diag}(\pi) M_2$ and $N_{31} = P_{\cdot+\cdot+}^T = M_3^T M_5^T \operatorname{diag}(\pi) M_1$, so $N_{32}^{-1} N_{31} = M_2^{-1} M_1$. Then $\hat{P} = P *_2 N_{32}^{-1} N_{31}$ arises from the same parameters, but with $M_1$ replacing $M_2$.
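A numerical check of the adjustment. It uses arbitrary parameter values of ours and the rooting described above. $N_{32}^{-1} N_{31}$ is obtained with a linear solve rather than an explicit inverse, and $P *_2 X$ contracts the second index of $P$ with the rows of $X$.

```python
import numpy as np

def tree_12_34(pi, M1, M2, M3, M4, M5):
    """Leaf distribution on 12|34, rooted at the internal node joining leaves 1 and 2."""
    return np.einsum("r,ri,rj,rs,sk,sl->ijkl", pi, M1, M2, M5, M3, M4)

pi = np.array([0.6, 0.4])
M1 = np.array([[0.9, 0.1], [0.2, 0.8]])
M2 = np.array([[0.85, 0.15], [0.1, 0.9]])
M3 = np.array([[0.8, 0.2], [0.25, 0.75]])
M4 = np.array([[0.95, 0.05], [0.3, 0.7]])
M5 = np.array([[0.7, 0.3], [0.2, 0.8]])
P = tree_12_34(pi, M1, M2, M3, M4, M5)

N32 = P.sum(axis=(0, 3)).T                # P_{+..+}^T = M3^T M5^T diag(pi) M2
N31 = P.sum(axis=(1, 3)).T                # P_{.+.+}^T = M3^T M5^T diag(pi) M1
X = np.linalg.solve(N32, N31)             # N32^{-1} N31
P_hat = np.einsum("ijkl,jm->imkl", P, X)  # P *_2 X

print(np.allclose(X, np.linalg.inv(M2) @ M1))                  # True
print(np.allclose(P_hat, tree_12_34(pi, M1, M1, M3, M4, M5)))  # True
```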

  27. A similar trick produces $\hat{\hat{P}}$, which arises from parameters with $M_1 = M_2$ and $M_3 = M_4$. Now flatten $\hat{\hat{P}}$ to a $4 \times 4$ matrix the 'wrong' way, according to $13|24$. Then $\hat{\hat{P}}_{13,24} = A^T D A$, where $A$ depends on $M_1, M_3$, and $D$ is the $4 \times 4$ diagonal matrix whose entries are those of $\operatorname{diag}(\pi) M_5$.
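A sketch of both tricks followed by the wrong-way flattening. Two choices in it are our own. First, the "similar trick" is carried out with the marginals $F_{14}^{-1} F_{13} = M_4^{-1} M_3$; the slide does not say which marginals it uses. Second, $A$ is taken to be the Kronecker product $M_1 \otimes M_3$, which is one concrete reading of "$A$ depends on $M_1, M_3$". The rooting and parameter values are the same assumptions as in the previous sketches.

```python
import numpy as np

def tree_12_34(pi, M1, M2, M3, M4, M5):
    """Leaf distribution on 12|34, rooted at the internal node joining leaves 1 and 2."""
    return np.einsum("r,ri,rj,rs,sk,sl->ijkl", pi, M1, M2, M5, M3, M4)

pi = np.array([0.6, 0.4])
M1 = np.array([[0.9, 0.1], [0.2, 0.8]])
M2 = np.array([[0.85, 0.15], [0.1, 0.9]])
M3 = np.array([[0.8, 0.2], [0.25, 0.75]])
M4 = np.array([[0.95, 0.05], [0.3, 0.7]])
M5 = np.array([[0.7, 0.3], [0.2, 0.8]])
P = tree_12_34(pi, M1, M2, M3, M4, M5)

# Step 1: replace M2 by M1, as in the adjustment of P described above.
X1 = np.linalg.solve(P.sum(axis=(0, 3)).T, P.sum(axis=(1, 3)).T)
P_hat = np.einsum("ijkl,jm->imkl", P, X1)
# Step 2 (our analogue): F14^{-1} F13 = M4^{-1} M3, so contracting index 4 replaces M4 by M3.
F14 = P_hat.sum(axis=(1, 2))    # M1^T diag(pi) M5 M4
F13 = P_hat.sum(axis=(1, 3))    # M1^T diag(pi) M5 M3
P_hathat = np.einsum("ijkl,lm->ijkm", P_hat, np.linalg.solve(F14, F13))

F = P_hathat.transpose(0, 2, 1, 3).reshape(4, 4)  # 13|24 flattening: rows (i, k), cols (j, l)
A = np.kron(M1, M3)                              # A[(r, s), (i, k)] = M1[r, i] * M3[s, k]
D = np.diag((np.diag(pi) @ M5).ravel())          # diagonal entries pi_r * M5[r, s]

print(np.allclose(P_hathat, tree_12_34(pi, M1, M1, M3, M3, M5)))  # True
print(np.allclose(F, A.T @ D @ A))                                 # True
```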

  28. So (given positive $\pi$ and non-singular $M_1$, $M_3$) the entries of $M_5$ are positive iff $\hat{\hat{P}}_{13,24}$ has positive leading principal minors. A bit more work extends this to 5 or more taxa.
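A sketch of the resulting positivity test. For brevity it builds $\hat{\hat{P}}$ directly from parameters with $M_2 = M_1$ and $M_4 = M_3$, instead of going through the adjustment steps. It then applies Sylvester's criterion to the $13|24$ flattening. The second internal-edge matrix has rows summing to 1 but one negative entry; it is our own illustrative choice, and the test should reject it.

```python
import numpy as np

def wrong_flattening(pi, M1, M3, M5):
    """13|24 flattening of P-hat-hat, built directly with M2 = M1 and M4 = M3."""
    P = np.einsum("r,ri,rj,rs,sk,sl->ijkl", pi, M1, M1, M5, M3, M3)
    return P.transpose(0, 2, 1, 3).reshape(4, 4)

def leading_minors_positive(F):
    """Sylvester's criterion: all leading principal minors strictly positive."""
    return all(np.linalg.det(F[:l, :l]) > 0 for l in range(1, F.shape[0] + 1))

pi = np.array([0.6, 0.4])
M1 = np.array([[0.9, 0.1], [0.2, 0.8]])
M3 = np.array([[0.8, 0.2], [0.25, 0.75]])
M5_pos = np.array([[0.7, 0.3], [0.2, 0.8]])    # stochastic, positive entries
M5_neg = np.array([[1.1, -0.1], [0.2, 0.8]])   # rows sum to 1, one negative entry

print(leading_minors_positive(wrong_flattening(pi, M1, M3, M5_pos)))  # True
print(leading_minors_positive(wrong_flattening(pi, M1, M3, M5_neg)))  # False
```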