Towards Characterization of Identifiability of Profile HMMs


  1. Towards Characterization of Identifiability of Profile HMMs
     Srilakshmi Pattabiraman, University of Illinois, Urbana-Champaign
     April 26, 2018
     Joint work with Prof. Tandy Warnow.

  2. Introduction
     ◮ A statistically consistent (asymptotic) estimator $\hat{\theta}_0$ of a parameter $\theta_0$ is one that recovers the true parameter $\theta_0$ as the amount of available data grows arbitrarily large.
     ◮ A necessary condition for any estimator to be statistically consistent is that the evolutionary model be identifiable.
     ◮ Identifiability: given the set of sequence profiles generated on a model tree, together with the probabilities of their occurrence, can the underlying evolutionary model be recovered uniquely?
     ◮ Trivially, if two different models generate the same sequence profiles with the same probabilities, the models are not identifiable!
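
Identifiability can be phrased as injectivity of the map from parameters to the distribution over observable outcomes. The Python sketch below illustrates that definition on two hypothetical toy models (they are not profile HMMs and do not come from the talk): `toy_product_model`, in which only the product of the two parameters is observable, and `toy_chain_model`, whose parameters can be read back from the outcome probabilities. The helper `find_collision` is also hypothetical; it searches a finite parameter grid for two distinct parameter values that induce the same distribution.

```python
import itertools
import math


def toy_product_model(a, b):
    # Hypothetical model: reach an emitting state with prob a, then emit 'A' with prob b.
    # Only the product a*b is observable, so (a, b) and (b, a) cannot be told apart.
    return {"A": a * b, "C": 1 - a * b}


def toy_chain_model(a, b):
    # Hypothetical model: emit 'A' with prob a; otherwise emit 'C' with prob b, else 'G'.
    # a = P(A) and b = P(C) / (1 - a), so the parameters are recoverable when a < 1.
    return {"A": a, "C": (1 - a) * b, "G": (1 - a) * (1 - b)}


def find_collision(model, grid, tol=1e-12):
    """Return two distinct parameter pairs on `grid` that induce the same
    outcome distribution, or None if every pair is distinguishable."""
    params = list(itertools.product(grid, repeat=2))
    for p, q in itertools.combinations(params, 2):
        dp, dq = model(*p), model(*q)
        if all(math.isclose(dp[k], dq[k], rel_tol=0.0, abs_tol=tol) for k in dp):
            return p, q
    return None


if __name__ == "__main__":
    grid = [0.2, 0.4, 0.5, 0.8]
    print(find_collision(toy_product_model, grid))  # a non-identifiable pair
    print(find_collision(toy_chain_model, grid))    # None
```

On this grid the first model has colliding pairs such as (0.2, 0.4) and (0.4, 0.2), which is exactly the situation described in the last bullet; the second model has none.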

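  3. Central Question
     Are all profile HMMs identifiable?
     Figure 1: The standard profile HMM.
     ◮ Number of state paths that can emit a given string, for a standard profile HMM with $n$ match states: $\varphi$ (the empty string): 1 path; A: $2n+1$ paths; AA: $n(n-1) + (2n+1)(n+1)$ paths.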
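
Path counts like these can be checked by brute force. The Python sketch below assumes an architecture in which every state of column $k$ (Begin as column 0, or $M_k$, $D_k$, $I_k$) may move to $M_{k+1}$, $D_{k+1}$ or $I_k$, insertion states may self-loop, and column $n$ may exit to End. This transition set is an assumption, since Figure 1 is not reproduced here, and `count_paths` is a hypothetical name. Because every emitting state can emit any letter, the count depends only on the length of the string.

```python
from functools import lru_cache


def count_paths(n, length):
    """Count Begin->End state paths that emit exactly `length` symbols in a
    standard profile HMM with n match states.

    Assumed architecture (hypothetical reading of Figure 1): from any state in
    column k (Begin = column 0, or M_k, D_k, I_k) the path may move to M_{k+1},
    D_{k+1} or I_k; insertion states may self-loop; column n may exit to End.
    Under this assumption the outgoing moves depend only on the column k.
    """

    @lru_cache(maxsize=None)
    def paths(k, remaining):
        if remaining < 0:
            return 0
        total = 0
        if k == n:
            total += 1 if remaining == 0 else 0  # exit to End
        else:
            total += paths(k + 1, remaining - 1)  # enter M_{k+1}, emits one symbol
            total += paths(k + 1, remaining)      # enter D_{k+1}, silent
        total += paths(k, remaining - 1)          # enter (or stay in) I_k, emits one symbol
        return total

    return paths(0, length)


if __name__ == "__main__":
    for n in range(1, 4):
        print(n, [count_paths(n, length) for length in range(3)])
```

Under this assumption the enumeration reproduces 1 path for $\varphi$ and $2n+1$ paths for a single character; with $n = 1$ it gives 1, 3 and 5 paths for lengths 0, 1 and 2. Counts for longer strings are sensitive to exactly which transitions the architecture allows (for example, into and out of deletion states), so the function is best treated as a way to check such counts for a chosen architecture.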

  4. Profile HMMs without deletion nodes
     Figure 2: Profile HMM with no deletion nodes.
     Theorem: The model is identifiable if and only if no match state has the same emission distribution as the insertion states.
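
To make this model concrete, here is a minimal Python sampler for a profile HMM without deletion states. The parameterisation is an assumption read off the proof slides: $x_i$ is the probability of moving directly into the next match state rather than an insertion state, $y_i$ is the probability of leaving an insertion state, $z_i$ is the emission distribution of match state $i$, and insertion states emit uniformly (probability 1/4 per letter). The function name `sample_sequence` is hypothetical.

```python
import random

ALPHABET = "ACGT"


def sample_sequence(x, y, z, rng):
    """Draw one sequence from a profile HMM without deletion states.

    Hypothetical parameterisation (read off the proof slides):
      z[i] : emission distribution (dict letter -> prob) of match state M_{i+1}
      x[i] : prob. of moving from M_i (M_0 = Begin) straight to M_{i+1}
             (to End when i == n) instead of entering insertion state I_i
      y[i] : prob. of leaving I_i for M_{i+1} (or End) instead of self-looping;
             must be > 0
    Insertion states emit uniformly over ACGT. x[i], y[i] here correspond to
    x_{i+1}, y_{i+1} on the slides.
    """
    n = len(z)
    seq = []
    for i in range(n + 1):
        if rng.random() >= x[i]:  # with prob 1 - x[i], visit I_i
            seq.append(rng.choice(ALPHABET))
            while rng.random() >= y[i]:  # stay in I_i with prob 1 - y[i]
                seq.append(rng.choice(ALPHABET))
        if i < n:  # every match state emits exactly once: there are no deletions
            letters, weights = zip(*z[i].items())
            seq.append(rng.choices(letters, weights=weights)[0])
    return "".join(seq)


if __name__ == "__main__":
    rng = random.Random(0)
    z = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}] * 3
    seqs = [sample_sequence([0.8] * 4, [0.5] * 4, z, rng) for _ in range(2000)]
    print(min(len(s) for s in seqs))
```

Every sampled sequence has length at least $n = 3$, and the shortest ones come from the all-match path (probability $0.8^4 \approx 0.41$ with these values), which is why, without deletions, the minimum length reveals the number of match states.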

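  5. Proof
     Theorem: The model is identifiable if and only if no match state has the same emission distribution as the insertion states.
     ◮ The sequences of minimum length define the topology: with no deletion states, every path visits all $n$ match states, so the minimum sequence length is $n$ and the minimum-length sequences come from the all-match path. Here ? matches any single character and $?[k]$ denotes $k$ such characters.
     ◮ $z_i^A = \dfrac{p_{?[i-1]\,A\,?[n-i]}}{\sum_{X \in \{A,T,G,C\}} p_{?[i-1]\,X\,?[n-i]}}$ is therefore the emission probability of A at match state $i$.
     ◮ $p_{A*} = x_1 z_1^A + (1-x_1)\tfrac{1}{4}$, where $p_{A*}$ is the probability that a sequence starts with A; hence $x_1 = \left(p_{A*} - \tfrac{1}{4}\right) / \left(z_1^A - \tfrac{1}{4}\right)$ whenever $z_1^A \neq \tfrac{1}{4}$.
     Figure 3: Finding $x_1$.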
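
The first step of the proof can be written as two small functions. This Python sketch assumes, as the slide does, that insertion states emit uniformly with probability 1/4. `match_emission` normalises the position-$i$ marginals of the minimum-length sequences to obtain $z_i$, and `recover_x1` inverts $p_{X*} = x_1 z_1^X + (1-x_1)\tfrac14$. Both names are hypothetical, and picking the letter on which $z_1$ differs most from 1/4 is a design choice of this sketch (the slide uses A).

```python
def match_emission(marginals):
    """z_i^X from the length-n marginals p(?[i-1] X ?[n-i]), X in ACGT.

    All minimum-length sequences share the same all-match path, so the
    marginals are z_i^X times a common factor; normalising removes it.
    """
    total = sum(marginals.values())
    return {X: p / total for X, p in marginals.items()}


def recover_x1(p_first, z1, q=0.25, eps=1e-12):
    """Invert p_{X*} = x_1 z_1^X + (1 - x_1) q for x_1.

    p_first[X]: probability that a sequence starts with letter X.
    z1[X]     : emission probability of X at match state 1.
    q         : insertion-state emission probability (uniform, 1/4).
    """
    # Use the letter on which z_1 departs most from the insertion distribution.
    X = max(z1, key=lambda c: abs(z1[c] - q))
    if abs(z1[X] - q) < eps:
        # z_1 is uniform, i.e. identical to the insertion distribution:
        # this equation then carries no information about x_1.
        raise ValueError("match state 1 emits like an insertion state")
    return (p_first[X] - q) / (z1[X] - q)


if __name__ == "__main__":
    x1_true = 0.7
    z1_true = {"A": 0.6, "C": 0.2, "G": 0.1, "T": 0.1}
    z1 = match_emission({X: 0.3 * p for X, p in z1_true.items()})  # 0.3: arbitrary scale
    p_first = {X: x1_true * z1_true[X] + (1 - x1_true) * 0.25 for X in z1_true}
    print(recover_x1(p_first, z1))
```

The `ValueError` branch is where the theorem's condition shows up: if match state 1 emits uniformly, $p_{X*} = \tfrac14$ for every $X$ regardless of $x_1$.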

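  6. Proof
     ◮ $p_{?A*} = x_1\left[x_2 z_2^A + (1-x_2)\tfrac{1}{4}\right] + (1-x_1)\left[y_1 z_1^A + (1-y_1)\tfrac{1}{4}\right]$
     ◮ $p_{?T*} = x_1\left[x_2 z_2^T + (1-x_2)\tfrac{1}{4}\right] + (1-x_1)\left[y_1 z_1^T + (1-y_1)\tfrac{1}{4}\right]$
     Figure 4: Finding $x_2, y_1$.
     ◮ In general, at step $m$:
       $p_{?[m-1]A*} = p(m : m-1)\left[x_m z_m^A + (1-x_m)\tfrac{1}{4}\right] + p(i : m-2)\left[y_{m-1} z_{m-1}^A + (1-y_{m-1})\tfrac{1}{4}\right] + f_{m,1}$
     ◮ $p_{?[m-1]T*} = p(m : m-1)\left[x_m z_m^T + (1-x_m)\tfrac{1}{4}\right] + p(i : m-2)\left[y_{m-1} z_{m-1}^T + (1-y_{m-1})\tfrac{1}{4}\right] + f_{m,2}$
       Here $p(m : m-1)$ and $p(i : m-2)$ are the probabilities that, after $m-1$ emissions, the path is in match state $m-1$ and in insertion state $m-2$ respectively, and $f_{m,1}, f_{m,2}$ depend only on the already-recovered $x_1, \dots, x_{m-1}$ and $y_1, \dots, y_{m-2}$. For $m = 2$ this gives $p(m : 1) = x_1$, $p(i : 0) = 1 - x_1$ and $f_{2,1} = f_{2,2} = 0$, recovering the two equations above. Each step is thus a pair of linear equations in the unknowns $x_m$ and $y_{m-1}$.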
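
Each step of this recursion is a 2×2 linear system. The Python sketch below moves the known terms to the right-hand side and applies Cramer's rule. `solve_step` and its argument names are illustrative assumptions. The letters A and T are the defaults, following the slide. The sketch treats $p(m : m-1)$, $p(i : m-2)$ and the remainder terms $f_{m,1}, f_{m,2}$ as already computed from earlier steps. It reports failure when the system is singular, which happens for instance when $z_m - \tfrac14$ and $z_{m-1} - \tfrac14$ are parallel (e.g. $z_m = z_{m-1}$); it does not attempt to handle that case.

```python
def solve_step(p, f, p_match, p_insert, z_m, z_prev, letters=("A", "T"), q=0.25):
    """Solve the two linear equations of step m for (x_m, y_{m-1}).

    For each letter X in `letters`:
      p[X] = p_match  * (x_m z_m^X + (1 - x_m) q)
           + p_insert * (y_{m-1} z_{m-1}^X + (1 - y_{m-1}) q) + f[X]
    p[X]        : p_{?[m-1] X *}, probability of letter X at position m
    f[X]        : remainder term fixed by earlier steps (f_{m,1}, f_{m,2})
    p_match     : p(m : m-1);  p_insert : p(i : m-2)
    z_m, z_prev : emission distributions of match states m and m-1
    """
    rows = []
    for X in letters:
        a = p_match * (z_m[X] - q)       # coefficient of x_m
        b = p_insert * (z_prev[X] - q)   # coefficient of y_{m-1}
        rhs = p[X] - f[X] - (p_match + p_insert) * q  # known constants moved right
        rows.append((a, b, rhs))
    (a1, b1, r1), (a2, b2, r2) = rows
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("singular system for this pair of letters")
    x_m = (r1 * b2 - b1 * r2) / det      # Cramer's rule
    y_prev = (a1 * r2 - a2 * r1) / det
    return x_m, y_prev


if __name__ == "__main__":
    # Step m = 2: p_match = x_1, p_insert = 1 - x_1, f = 0.
    x1, x2_true, y1_true = 0.7, 0.8, 0.3
    z1 = {"A": 0.6, "C": 0.2, "G": 0.1, "T": 0.1}
    z2 = {"A": 0.1, "C": 0.1, "G": 0.2, "T": 0.6}
    p = {X: x1 * (x2_true * z2[X] + (1 - x2_true) * 0.25)
         + (1 - x1) * (y1_true * z1[X] + (1 - y1_true) * 0.25) for X in "ACGT"}
    f = {X: 0.0 for X in "ACGT"}
    print(solve_step(p, f, x1, 1 - x1, z2, z1))
```

With these illustrative values the function recovers $x_2 = 0.8$ and $y_1 = 0.3$ up to floating-point rounding.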

  7. Proof
     Figure 5: Two models that produce the same sequence profiles.
     ◮ $p_A = x_1 \tfrac{1}{4}\, x_2$
     ◮ $p_{AA} = (1-x_1)\tfrac{1}{4}\, y_1 \tfrac{1}{4}\, x_2 + x_1 \tfrac{1}{4} (1-x_2)\tfrac{1}{4}\, y_2$
     ◮ $p_{A[n]} = x_1 \tfrac{1}{4} (1-x_2)\tfrac{1}{4} \left[(1-y_2)\tfrac{1}{4}\right]^{n-2} y_2 + (1-x_1)\tfrac{1}{4} \left[(1-y_1)\tfrac{1}{4}\right]^{n-2} y_1 \tfrac{1}{4}\, x_2 + \sum_{n_1+n_2=n-3} (1-x_1)\tfrac{1}{4} \left[(1-y_1)\tfrac{1}{4}\right]^{n_1} y_1 \tfrac{1}{4} (1-x_2)\tfrac{1}{4} \left[(1-y_2)\tfrac{1}{4}\right]^{n_2} y_2$
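
The closed form for $p_{A[n]}$ can be checked numerically against a forward recursion over the one-match-state model. The Python sketch below assumes the transition structure implied by $p_A$ and $p_{AA}$: Begin→M1 with probability $x_1$, Begin→I0 with $1-x_1$, I0→M1 with $y_1$, M1→End with $x_2$, M1→I1 with $1-x_2$, I1→End with $y_2$, and self-loops $1-y_1$, $1-y_2$. It also assumes a match state that emits uniformly, like the insertion states. Under that assumption every sequence of length $n$ has probability $p_{A[n]}$. Swapping $(x_1, y_1)$ with $(x_2, y_2)$ leaves all of these probabilities unchanged, because the numbers of symbols inserted before and after the match state simply trade places. This is one simple pair of indistinguishable models, not necessarily the pair drawn in Figure 5. The function names are hypothetical.

```python
import math

Q = 0.25  # uniform emission probability of insertion states and, here, the match state


def forward_prob(n, x1, y1, x2, y2):
    """P(sequence = A^n) in the one-match-state model, via the forward recursion."""
    trans = {
        "B": {"M1": x1, "I0": 1 - x1},
        "I0": {"M1": y1, "I0": 1 - y1},
        "M1": {"E": x2, "I1": 1 - x2},
        "I1": {"E": y2, "I1": 1 - y2},
    }
    # alpha[s]: prob. of having emitted the first t symbols and being in state s
    alpha = {s: p * Q for s, p in trans["B"].items()}
    for _ in range(n - 1):
        nxt = {}
        for s, a in alpha.items():
            for t, p in trans[s].items():
                if t != "E":
                    nxt[t] = nxt.get(t, 0.0) + a * p * Q
        alpha = nxt
    # After the n-th emission the path must move to End.
    return sum(a * trans[s].get("E", 0.0) for s, a in alpha.items())


def closed_form(n, x1, y1, x2, y2):
    """The slide's p_{A[n]}, valid for n >= 2 (the sum is empty when n == 2)."""
    first = x1 * Q * (1 - x2) * Q * ((1 - y2) * Q) ** (n - 2) * y2
    last = (1 - x1) * Q * ((1 - y1) * Q) ** (n - 2) * y1 * Q * x2
    middle = sum(
        (1 - x1) * Q * ((1 - y1) * Q) ** n1 * y1 * Q
        * (1 - x2) * Q * ((1 - y2) * Q) ** (n - 3 - n1) * y2
        for n1 in range(n - 2)  # n1 + n2 = n - 3 with n2 = n - 3 - n1
    )
    return first + last + middle


if __name__ == "__main__":
    params = (0.7, 0.4, 0.2, 0.9)   # (x1, y1, x2, y2): arbitrary test values
    swapped = (0.2, 0.9, 0.7, 0.4)  # (x2, y2, x1, y1)
    for n in range(2, 8):
        print(n, forward_prob(n, *params), closed_form(n, *params),
              forward_prob(n, *swapped))
```

The three columns printed for each $n$ should agree up to rounding; the third column is the swapped model.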

  8. Proof
     Figure 6: Two models that produce the same sequence profiles.

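  9. What about standard profile HMMs?
     ◮ Unfortunately, these proof techniques don't extend to them.
     ◮ Even finding the number of match states is non-trivial: the all-deletion path emits the empty sequence, so the minimum sequence length no longer reveals the number of match states.
     ◮ Standard ML tricks may not work!
     ◮ Maybe these models are unidentifiable?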

  10. Bad news!
     Figure 7: Standard profile HMM with one match state.
     ◮ If we knew that the profile HMM had only one match state, the model could be completely characterized.

  11. Thank you!
