Towards Characterization of Identifiability of Profile HMMs




SLIDE 1

Towards Characterization of Identifiability of Profile HMMs

Srilakshmi Pattabiraman

University of Illinois, Urbana-Champaign

April 26, 2018

Joint work with Prof. Tandy Warnow.

SLIDE 2

Introduction

◮ A statistically consistent estimator $\hat{\theta}_0$ (asymptotic estimator) of a parameter $\theta_0$ is one that identifies the correct parameter $\theta_0$ when the available data is arbitrarily large.

◮ A necessary condition for any estimator's asymptotic consistency is that the evolutionary model be identifiable.

◮ Identifiability: given the set of sequence profiles that are generated on a model tree, and the probabilities of their occurrences, can the underlying evolutionary model be identified correctly?

◮ Trivially, if two models generate the same sequence profiles with matching probabilities, the models are not identifiable!
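
The last bullet can be illustrated with a toy check. This is a minimal sketch, not a profile HMM: `output_distribution` is a hypothetical two-parameter model invented for illustration, in which only the product of the parameters affects the observable distribution, so distinct parameter pairs can be indistinguishable.

```python
from fractions import Fraction


def output_distribution(a, b):
    """Hypothetical toy model: emits "A" with probability a*b, otherwise "T"."""
    p = a * b
    return {"A": p, "T": 1 - p}


def indistinguishable(dist1, dist2):
    """True when both models assign the same probability to every observation."""
    keys = set(dist1) | set(dist2)
    return all(dist1.get(k, 0) == dist2.get(k, 0) for k in keys)


# Two different parameter settings with the same product a*b = 1/6.
model_1 = output_distribution(Fraction(1, 2), Fraction(1, 3))
model_2 = output_distribution(Fraction(1, 3), Fraction(1, 2))
# A third setting with a different product, a*b = 1/4.
model_3 = output_distribution(Fraction(1, 2), Fraction(1, 2))

print(indistinguishable(model_1, model_2))  # True: parameters cannot be recovered
print(indistinguishable(model_1, model_3))  # False
```

No amount of data separates `model_1` from `model_2`, so no estimator of `(a, b)` can be consistent for this toy model.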

SLIDE 3

Central Question

Are all profile HMMs identifiable?

Figure 1: The standard profile HMM.

◮ φ: 1 path; A: $2n + 1$ paths; AA: $\frac{n(n-1)}{2} + (2n + 1)(n + 1)$ paths

SLIDE 4

Profile HMMs without deletion nodes

Figure 2: Profile HMM with no deletion nodes.

Theorem

The model is identifiable iff no match state has the same distribution as the insertion states.
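
The condition in the theorem is easy to test once the emission distributions are written down. The sketch below assumes, as the 1/4 factors in the proof suggest but the slide does not state outright, that all insertion states share a single emission distribution over {A, C, G, T}. The function name and the dictionary representation are illustrative choices, not part of the talk.

```python
from fractions import Fraction

ALPHABET = "ACGT"


def match_states_equal_to_insertion(match_emissions, insert_emission):
    """Return the 1-based indices of match states whose emission
    distribution coincides with the insertion distribution.

    By the theorem, the no-deletion model is identifiable iff this list is empty.
    """
    return [
        i
        for i, dist in enumerate(match_emissions, start=1)
        if all(dist[c] == insert_emission[c] for c in ALPHABET)
    ]


uniform = {c: Fraction(1, 4) for c in ALPHABET}
skewed = {"A": Fraction(1, 2), "C": Fraction(1, 4),
          "G": Fraction(1, 8), "T": Fraction(1, 8)}

print(match_states_equal_to_insertion([skewed, uniform], uniform))  # [2]
print(match_states_equal_to_insertion([skewed], uniform))           # []
```

Exact fractions are used so that equality of distributions is not blurred by floating-point rounding; with estimated probabilities a tolerance would be needed instead.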

SLIDE 5

Proof

Theorem

The model is identifiable iff no match state has the same distribution as the insertion states.

◮ The sequence with the minimum length defines the topology.

◮ $z^i_A = \dfrac{p_{?[i-1]A?[n-i]}}{\sum_{X \in \{A,T,G,C\}} p_{?[i-1]X?[n-i]}}$

◮ $p_{A*} = x_1 z^1_A + (1 - x_1)\frac{1}{4}$

Figure 3: Finding x1.
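
The step of finding x1 can be sketched numerically. The code assumes the reading that p_{X*} is the probability that a sequence starts with letter X, that x1 is the probability of entering the first match state directly, and that insertions emit uniformly with probability 1/4. The function names are hypothetical.

```python
from fractions import Fraction

UNIFORM = Fraction(1, 4)


def first_letter_probs(x1, z1):
    """Forward direction: p_{X*} = x1 * z1[X] + (1 - x1) * 1/4 for each letter X."""
    return {letter: x1 * z + (1 - x1) * UNIFORM for letter, z in z1.items()}


def solve_x1(prefix_prob, z1):
    """Invert the equation above using any letter X with z1[X] != 1/4."""
    for letter, z in z1.items():
        if z != UNIFORM:
            return (prefix_prob[letter] - UNIFORM) / (z - UNIFORM)
    # Match state 1 emits exactly like an insertion state: x1 cancels out.
    raise ValueError("z1 is uniform, so x1 is not determined by p_{X*}")


z1 = {"A": Fraction(1, 2), "C": Fraction(1, 4),
      "G": Fraction(1, 8), "T": Fraction(1, 8)}
observed = first_letter_probs(Fraction(3, 5), z1)
print(solve_x1(observed, z1))  # 3/5
```

The `ValueError` branch is where the theorem's condition enters: when the match state has the insertion distribution, the right-hand side equals 1/4 for every value of x1.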

SLIDE 6

Proof

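◮ $p_{?A*} = x_1\left[x_2 z^2_A + (1 - x_2)\frac{1}{4}\right] + (1 - x_1)\left[y_1 z^1_A + (1 - y_1)\frac{1}{4}\right]$

◮ $p_{?T*} = x_1\left[x_2 z^2_T + (1 - x_2)\frac{1}{4}\right] + (1 - x_1)\left[y_1 z^1_T + (1 - y_1)\frac{1}{4}\right]$

Figure 4: Finding x2, y1.

◮ $p_{?[m-1]A*} = f_{m,1}\left(x_1^{m-1}, y_1^{m-2}\right) + p_{(m:m-1)}\left[x_m z^m_A + (1 - x_m)\frac{1}{4}\right] + p_{(i:m-2)}\left[y_{m-1} z^{m-1}_A + (1 - y_{m-1})\frac{1}{4}\right]$

◮ $p_{?[m-1]T*} = f_{m,2}\left(x_1^{m-1}, y_1^{m-2}\right) + p_{(m:m-1)}\left[x_m z^m_T + (1 - x_m)\frac{1}{4}\right] + p_{(i:m-2)}\left[y_{m-1} z^{m-1}_T + (1 - y_{m-1})\frac{1}{4}\right]$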
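
With x1 and the match emissions z1, z2 already known, the two equations for the second position (one for A, one for T) are linear in the unknowns x2 and y1. The sketch below solves them with Cramer's rule. It is an illustration under the same assumptions as the formulas (uniform insertion emissions of 1/4), and the function names are hypothetical.

```python
from fractions import Fraction

Q = Fraction(1, 4)  # insertion emission probability per letter


def second_position_prob(letter, x1, x2, y1, z1, z2):
    """Forward direction: p_{?X*} for X = letter."""
    return (x1 * (x2 * z2[letter] + (1 - x2) * Q)
            + (1 - x1) * (y1 * z1[letter] + (1 - y1) * Q))


def solve_x2_y1(p_a, p_t, x1, z1, z2):
    """Solve the A and T equations for (x2, y1).

    Each equation rearranges to
        p - 1/4 = x1 * (z2[X] - 1/4) * x2 + (1 - x1) * (z1[X] - 1/4) * y1.
    """
    a11, a12 = x1 * (z2["A"] - Q), (1 - x1) * (z1["A"] - Q)
    a21, a22 = x1 * (z2["T"] - Q), (1 - x1) * (z1["T"] - Q)
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the A and T equations do not determine x2 and y1")
    b1, b2 = p_a - Q, p_t - Q
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


z1 = {"A": Fraction(1, 2), "C": Fraction(1, 4), "G": Fraction(1, 8), "T": Fraction(1, 8)}
z2 = {"A": Fraction(1, 8), "C": Fraction(1, 4), "G": Fraction(1, 8), "T": Fraction(1, 2)}
x1, x2, y1 = Fraction(3, 5), Fraction(2, 3), Fraction(1, 2)

p_a = second_position_prob("A", x1, x2, y1, z1, z2)
p_t = second_position_prob("T", x1, x2, y1, z1, z2)
print(solve_x2_y1(p_a, p_t, x1, z1, z2))  # recovers (x2, y1)
```

If the determinant vanishes for the pair (A, T), a different pair of letters may still give an invertible system; the sketch does not search for one.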

SLIDE 7

Proof

Figure 5: Two models that produce the same sequence profiles.

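◮ $p_A = x_1 \frac{1}{4} x_2$

◮ $p_{AA} = (1 - x_1)\frac{1}{4} y_1 \frac{1}{4} x_2 + x_1 \frac{1}{4}(1 - x_2)\frac{1}{4} y_2$

◮ $p_{A[n]} = x_1 \frac{1}{4}(1 - x_2)\frac{1}{4}(1 - y_2)^{n-2}\left(\frac{1}{4}\right)^{n-2} y_2 + (1 - x_1)\frac{1}{4}(1 - y_1)^{n-2}\left(\frac{1}{4}\right)^{n-2} y_1 \frac{1}{4} x_2 + \sum_{n_1 + n_2 = n - 3} (1 - x_1)\frac{1}{4}(1 - y_1)^{n_1}\left(\frac{1}{4}\right)^{n_1} y_1 \frac{1}{4}(1 - x_2)\frac{1}{4}(1 - y_2)^{n_2}\left(\frac{1}{4}\right)^{n_2} y_2$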
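
These probabilities can be computed directly. The sketch assumes the model that the formulas appear to describe: a single match state emitting uniformly (1/4 per letter), an insertion state before it (skipped with probability x1, left with probability y1 after each emission) and one after it (skipped with probability x2, left with probability y2). Whether the swapped pair below is the pair drawn in Figure 5 is an assumption; it is simply one pair of distinct models with identical sequence probabilities.

```python
from fractions import Fraction


def insert_count_prob(skip, leave, k):
    """P(exactly k insertions) for an insertion state that is skipped with
    probability `skip` and exited with probability `leave` after each emission."""
    if k == 0:
        return skip
    return (1 - skip) * (1 - leave) ** (k - 1) * leave


def prob_sequence(length, x1, y1, x2, y2, q=Fraction(1, 4)):
    """Probability of one specific sequence of the given length.

    All emissions are uniform, so only the split of the length - 1 insertions
    between the two insertion states matters.
    """
    if length < 1:
        return Fraction(0)
    extra = length - 1
    paths = sum(
        insert_count_prob(x1, y1, k) * insert_count_prob(x2, y2, extra - k)
        for k in range(extra + 1)
    )
    return paths * q ** length


model_a = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 5), Fraction(1, 7))
model_b = (Fraction(1, 5), Fraction(1, 7), Fraction(1, 2), Fraction(1, 3))  # (x1, y1) and (x2, y2) swapped

same = all(prob_sequence(n, *model_a) == prob_sequence(n, *model_b) for n in range(1, 12))
print(same)  # True
```

Because the sequence probability depends only on the total number of insertions, exchanging the roles of the two insertion states leaves every probability unchanged, which is why the parameters cannot be recovered.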

SLIDE 8

Proof

Figure 6: Two models that produce the same sequence profiles.

SLIDE 9

What about the standard profile HMMs?

◮ Unfortunately, these methods don't extend.

◮ Finding the number of match states itself is non-trivial.

◮ Standard ML tricks may not work!

◮ Maybe they are unidentifiable?

SLIDE 10

Bad news!

Figure 7: Standard profile HMM with one match state.

◮ If we knew that the profile HMM had only one match state, then the model could be completely characterized.

SLIDE 11

Thank you!