Algebraic Identification of Binary-Valued Hidden Markov Processes (PowerPoint PPT presentation)

SLIDE 1

Guideline Introduction Algebraic Statistics The Main Theorem Computations Outlook

Algebraic Identification of Binary-Valued Hidden Markov Processes

Alexander Schönhuth

CWI, Amsterdam

SIAM AG 11, 8 Oct 2011

SLIDE 2

Guideline

  • Introduction
    – Hidden Markov Chains
    – Complete Identification
  • Algebraic Statistics
    – Introduction
    – The Hidden Markov Model
  • The Main Theorem
    – The Hankel Matrix
    – Invariants
  • Computations
  • Outlook

SLIDE 3

Hidden Markov Processes

Alternative Formulation

[Figure: two hidden states 1 and 2 with START probabilities 0.8 and 0.2; transition probabilities 0.3, 0.7, 0.5, 0.5; each state emits a, b, c, with emission probabilities (0.25, 0.5, 0.25) for state 1 and (0.25, 0.3, 0.45) for state 2.]

  • Initial probabilities π = (0.8, 0.2)
  • Transition probabilities

    M = (m_ij := P(i → j))_{i,j=1,2} = [ 0.3 0.7 ; 0.5 0.5 ]

  • Emission probabilities e_ix, e.g. e_1b = 0.5, e_2c = 0.45. Let O_x := diag(e_1x, e_2x) and T_x := O_x · M for x = a, b, c.

Then, writing v := a_1...a_n and T_v := T_{a_1} · ... · T_{a_n},

    P(X_1 = a_1, ..., X_n = a_n) = π′ T_v 1.
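As a sanity check, the transfer-matrix formula can be evaluated directly. The sketch below (Python/NumPy, not part of the deck) uses the parameters of this slide; the figure's emission table is partly garbled, so the rows (0.25, 0.5, 0.25) and (0.25, 0.3, 0.45) are an assumption consistent with e_1b = 0.5 and e_2c = 0.45.

```python
import numpy as np
from itertools import product

# Example parameters from the slide (2 hidden states, alphabet {a, b, c})
pi = np.array([0.8, 0.2])                   # initial probabilities
M = np.array([[0.3, 0.7], [0.5, 0.5]])      # transition matrix (rows sum to 1)
E = {'a': np.array([0.25, 0.25]),           # emission columns (e_1x, e_2x),
     'b': np.array([0.5, 0.3]),             # assumed as read off the figure
     'c': np.array([0.25, 0.45])}

# T_x := O_x * M with O_x = diag(e_1x, e_2x)
T = {x: np.diag(E[x]) @ M for x in E}

def string_prob(v):
    """P(X_1 = v_1, ..., X_n = v_n) = pi' T_{v_1} ... T_{v_n} 1."""
    Tv = np.eye(2)
    for x in v:
        Tv = Tv @ T[x]
    return pi @ Tv @ np.ones(2)

# Since sum_x O_x = I and M 1 = 1, probabilities over all strings
# of a fixed length sum to 1:
total = sum(string_prob(v) for v in product('abc', repeat=3))
```

The last line is a useful consistency check on any hand-entered parameter set: if it deviates from 1, the rows of M or of the emission table do not sum to 1.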

SLIDE 4

Identification of HMPs

[Blackwell and Koopmans (1957), Gilbert (1959)]

Complete Identification Problem: Let (Xt) be a stochastic process which takes values in a finite set Σ.

  • [Decision] Decide whether (Xt) is a hidden Markov process.
  • [Inference] If yes, infer its parameters.
SLIDE 6

Identification of HMPs

Related Work:

  • Generic solutions for stationary processes available [e.g. Finesso et al. (2010), Vidyasagar (2011)]

  • Cone-based reformulation for arbitrary processes [Heller (1965)]
  • Identifiability [e.g. Allman et al. (2009)]
  • Efficient equivalence test [Faigle, AS (2011)]
SLIDE 7

Pitfall

Lemma:

  • Every probability distribution P : Σ^n → [0, 1] is due to a hidden Markov process (on |Σ|^n hidden states).
  • In case of a "stationary" P, the process can be chosen stationary.

Therefore:

  • The answer to the question whether a finite amount of data is due to a hidden Markov process is always 'Yes'.
  • Restricting to stationary processes promises no improvement in that respect.

SLIDE 8

Finite Identification

Let P : Σ^n → [0, 1] be a probability distribution over Σ-strings of finite length n.

Finite Identification:

  • [Decision] Decide whether P is induced by a hidden Markov process on at most ⌊(n+1)/2⌋ hidden states.
  • [Inference] If yes, infer the possibly unique parametrization.

Remark: A hidden Markov process on d hidden states is uniquely determined by its distribution on strings of length 2d − 1. Algebraic statistics can give answers!

SLIDE 9

Generic Identification

Result: Let H_{d,+} be the set of (stochastic) d-hidden-state HMP parametrizations.

Theorem: Let |Σ| = 2, d ≤ (n+1)/2, and let P : Σ^n → [0, 1] be a probability distribution. There is an algebraic variety N_d with dim N_d < dim H_{d,+} and an algorithm A such that

    A(P) = 'HMP on d hidden states'      if P ∈ f_{n,d}(H_{d,+} \ N_d),
           'Cannot decide'               if P ∈ f_{n,d}(N_d),
           'No HMP on d hidden states'   otherwise.

In the first case, A also outputs the parametrization, which is unique up to permutation of hidden states.

SLIDE 11

Algebraic Statistical Models

The Hidden Markov Model

Definition: Hidden Markov Model for d hidden states and strings of length n over Σ:

    f_{H_d,n} : H_d → C^{|Σ|^n}
    (M, E, π) = ((T_a = O_a M)_{a∈Σ}, π) ↦ (π′ T_v 1)_{v∈Σ^n}

where

    M ∈ C^{d×d} with M 1_d = 1_d,
    E ∈ C^{d×|Σ|} with E 1_{|Σ|} = 1_d,
    π ∈ C^d,

so H_d ≅ C^{d² + d(|Σ|−1)}, and

    O_a := diag(e_1a, ..., e_da),  T_a := O_a · M,  T_{v=v_1...v_n} := T_{v_1} · ... · T_{v_n}.

Wanted: Equations describing im(f_{H_d,n}).

SLIDE 15

Algebraic Statistical Models

The Finitary Model

Definition: Finitary Model of rank d for strings of length n:

    f_{M_d,n} : M_d → C^{|Σ|^n}
    ((T_a)_{a∈Σ}, π) ↦ (π′ T_v 1)_{v∈Σ^n}

where

    M_d = {((T_a)_{a∈Σ}, π) ∈ C^{|Σ|d² + d} | (Σ_a T_a) 1 = 1} ≅ C^{|Σ|d²}.

Remark: Finitary models reflect finitary processes.

Reminder: the Hidden Markov Model on d hidden states for strings of length n is

    f_{H_d,n} : C^{d² + (|Σ|−1)d} → C^{|Σ|^n}
    ((T_a = O_a M)_{a∈Σ}, π) ↦ (π′ T_v 1)_{v∈Σ^n}.

Hence im(f_{H_d,n}) ⊂ im(f_{M_d,n}).

SLIDE 17

Dimension

Theorem: Let n ≥ 2d − 1. Then

    dim im f_{M_d,n} = (|Σ| − 1)d² + d,
    dim im f_{H_d,n} = (|Σ| − 1)d + d².

Corollary: In case of |Σ| = 2,

    dim im f_{M_d,n} = d² + d = dim im f_{H_d,n},

which implies im f_{M_d,n} = im f_{H_d,n}, since both are irreducible. The finitary model and the hidden Markov model are algebraic-statistically equivalent for binary-valued alphabets!

SLIDE 19

Main Theorem

The Hankel Matrix

  • For a stochastic process (X_t) and v ∈ Σ^n, write
    p(v = v_1...v_n) := P(X_1 = v_1, ..., X_n = v_n).
  • Write uv = u_1...u_m v_1...v_n ∈ Σ^{m+n} for the concatenation of u = u_1...u_m ∈ Σ^m and v = v_1...v_n ∈ Σ^n.
  • Consider the (infinite-dimensional) Hankel matrix
    P_p := [p(uv)]_{u,v∈Σ*} ∈ R^{Σ*×Σ*}.
  • Example (Σ = {0, 1}):

        P_p = [ p()    p(0)    p(1)    ...
                p(0)   p(00)   p(10)   ...
                p(1)   p(01)   p(11)   ...
                p(00)  p(000)  p(100)  ...
                p(01)  p(001)  p(101)  ...
                ...    ...     ...     ... ]
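To make this concrete, the sketch below (an illustration; the 2-state parameters are assumed, not from the slides) tabulates the upper-left block of P_p over all binary strings of length ≤ 2. Since p(uv) = (π′T_u)(T_v 1), every such block factors through C², so its rank is bounded by the number of hidden states.

```python
import numpy as np
from itertools import product

# A 2-state transfer-matrix representation (assumed example parameters)
pi = np.array([0.8, 0.2])
Mt = np.array([[0.3, 0.7], [0.5, 0.5]])
e0 = np.array([0.6, 0.1])                      # P(emit 0 | state i), assumed
T = {0: np.diag(e0) @ Mt, 1: np.diag(1 - e0) @ Mt}

def Tprod(v):
    """T_v = T_{v1} ... T_{vn} (identity for the empty string)."""
    P = np.eye(2)
    for x in v:
        P = P @ T[x]
    return P

# all binary strings of length <= 2, shortest first; () is the empty string
strings = [()] + [v for k in (1, 2) for v in product((0, 1), repeat=k)]

# Hankel block: entry at (u, v) is p(uv) = pi' T_u T_v 1
H = np.array([[pi @ Tprod(u) @ Tprod(v) @ np.ones(2) for v in strings]
              for u in strings])
rank = np.linalg.matrix_rank(H, tol=1e-10)
```

The entry depends on u and v only through the concatenation uv (the Hankel property), and the rank here is 2, matching the number of hidden states.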

SLIDE 21

Finitary Processes

Theorem: Let p : Σ* → R. The following conditions are equivalent:

(i) rk P_p ≤ d.
(ii) p is finitary, that is, for all v ∈ Σ*,
    p(v = v_1...v_n) = π′ T_{v_1} · ... · T_{v_n} 1
for suitable π ∈ R^d and T_a ∈ R^{d×d}, a ∈ Σ.

Generic solution of the decision problem for |Σ| = 2:

    (X_t) is an HMP ⇔ all (d + 1)-minors of P_p are zero.

Caveat: P_p is infinite-dimensional, while the data only provides expressions in strings of length n.

Shorter strings are no problem though:

    p(u) = Σ_{w∈Σ^{n−m}} p(uw) for u ∈ Σ^m, m < n.

SLIDE 25

Main Theorem

Set Theoretic Lemma

Let

    P_{p,m,n} := [p(wv)]_{|v|≤n, |w|≤m} ∈ C^{((|Σ|^{n+1}−1)/(|Σ|−1)) × ((|Σ|^{m+1}−1)/(|Σ|−1))},

whose indices refer to suffixes of length at most n and prefixes of length at most m, respectively.

Lemma: Let n ≥ 2d − 1. Then the following statements are equivalent:

(i) (p(v))_{v∈Σ^n} ∈ C^{|Σ|^n} lies in im f_{M_d,n} \ im f_{M_{d−1},n}.
(ii) rk P_{p,d−1,d−1} = rk P_{p,⌊n/2⌋,⌈n/2⌉} = rk P_{p,⌈n/2⌉,⌊n/2⌋} = d.
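The lemma translates into a concrete numerical test: marginalize the length-n distribution down to shorter strings, assemble the finite Hankel blocks, and compare ranks. A sketch for n = 4, d = 2, where the generating HMP parameters are assumed for illustration and the rank tolerance is ad hoc:

```python
import numpy as np
from itertools import product

n, d = 4, 2

# Generate a length-n distribution from a 2-state HMP (assumed parameters)
pi = np.array([0.7, 0.3])
Mt = np.array([[0.4, 0.6], [0.2, 0.8]])
e0 = np.array([0.9, 0.3])                      # P(emit 0 | state i)
T = {0: np.diag(e0) @ Mt, 1: np.diag(1 - e0) @ Mt}

p = {}
for v in product((0, 1), repeat=n):            # p on strings of length n
    P = np.eye(2)
    for x in v:
        P = P @ T[x]
    p[v] = pi @ P @ np.ones(2)

# Marginalize to shorter strings: p(u) = sum_w p(uw)
for m in range(n - 1, -1, -1):
    for u in product((0, 1), repeat=m):
        p[u] = p[u + (0,)] + p[u + (1,)]

def hankel_block(mm, nn):
    """Up to transpose, the block P_{p,mm,nn} = [p(wv)]:
    rows are prefixes w (|w| <= mm), columns suffixes v (|v| <= nn)."""
    rows = [w for k in range(mm + 1) for w in product((0, 1), repeat=k)]
    cols = [v for k in range(nn + 1) for v in product((0, 1), repeat=k)]
    return np.array([[p[w + v] for v in cols] for w in rows])

r_small = np.linalg.matrix_rank(hankel_block(d - 1, d - 1), tol=1e-10)
r_big = np.linalg.matrix_rank(hankel_block(n // 2, (n + 1) // 2), tol=1e-10)
```

With both ranks equal to d = 2, the lemma places (p(v))_{v∈Σ⁴} in im f_{M_2,4} \ im f_{M_1,4}, as expected for a genuinely 2-state process.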

SLIDE 26

Main Theorem

Set Theoretic Lemma

Example: n = 3, d = 2 (so ⌈n/2⌉ = 2, ⌊n/2⌋ = 1). The entry in row v, column w is p(wv); entries with |wv| > n are left blank:

          ε       0       1       00      01      10      11
    ε     p()     p(0)    p(1)    p(00)   p(01)   p(10)   p(11)
    0     p(0)    p(00)   p(10)   p(000)  p(010)  p(100)  p(110)
    1     p(1)    p(01)   p(11)   p(001)  p(011)  p(101)  p(111)
    00    p(00)   p(000)  p(100)
    01    p(01)   p(001)  p(101)
    10    p(10)   p(010)  p(110)
    11    p(11)   p(011)  p(111)

SLIDE 27

Main Theorem

Set Theoretic Lemma

Example: n = 4, d = 2 (so ⌈n/2⌉ = ⌊n/2⌋ = 2). Here the matrix is completely filled:

    P_{p,2,2} =

          ε       0       1       00      01      10      11
    ε     p()     p(0)    p(1)    p(00)   p(01)   p(10)   p(11)
    0     p(0)    p(00)   p(10)   p(000)  p(010)  p(100)  p(110)
    1     p(1)    p(01)   p(11)   p(001)  p(011)  p(101)  p(111)
    00    p(00)   p(000)  p(100)  p(0000) p(0100) p(1000) p(1100)
    01    p(01)   p(001)  p(101)  p(0001) p(0101) p(1001) p(1101)
    10    p(10)   p(010)  p(110)  p(0010) p(0110) p(1010) p(1110)
    11    p(11)   p(011)  p(111)  p(0011) p(0111) p(1011) p(1111)

SLIDE 28

Main Theorem

Ideal Theoretic Characterization

    I := ⟨ det (p(u_i v_j))_{1≤i,j≤d+1} with 0 ≤ |u_i|, |v_j| ≤ ⌈n/2⌉ and |u_i v_j| ≤ n ⟩
    J := ⟨ det (p(u_i v_j))_{1≤i,j≤d} with 0 ≤ |u_i|, |v_j| ≤ d − 1 ⟩

Example: n = 4, d = 2, with minors taken from the matrix P_{p,2,2} of the previous slide (note that, for example, p(000) = p(0000) + p(0001)).

Theorem: Let Σ = {0, 1} and n ≥ 2d − 1. Then

    I(im f_{H_d,n}) = I(im f_{M_d,n}) = rad I : J.

SLIDE 30

Macaulay

d = 2, n = 3. Macaulay pseudocode:

    I3,3 = minors(3, matrix{{p(00), p(000), p(100)},
                            {p(01), p(001), p(101)},
                            {p(10), p(010), p(110)},
                            {p(11), p(011), p(111)}})

    J2 = minors(2, matrix{{p(0), p(00), p(10)},
                          {p(1), p(01), p(11)}})

Computations reveal that rad I3,3 : J2 = I3,3, which implies im f3,2 = V(I3,3).
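The inclusion im f3,2 ⊆ V(I3,3) can at least be spot-checked numerically: each entry of the 4×3 matrix has the form π′T_u T_w 1, so the matrix has rank at most 2 at any model point and every 3-minor vanishes. A sketch of that check, with assumed example parameters:

```python
import numpy as np
from itertools import combinations

# Any point of the 2-state hidden Markov model (assumed example parameters)
pi = np.array([0.6, 0.4])
Mt = np.array([[0.25, 0.75], [0.55, 0.45]])
e0 = np.array([0.8, 0.35])
T = {0: np.diag(e0) @ Mt, 1: np.diag(1 - e0) @ Mt}

def p(v):
    """p(v) = pi' T_{v1} ... T_{vn} 1."""
    P = np.eye(2)
    for x in v:
        P = P @ T[x]
    return pi @ P @ np.ones(2)

# the 4x3 matrix of the Macaulay computation: rows 00, 01, 10, 11,
# columns (empty), 0, 1; the column label is prepended, as on the slide
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
cols = [(), (0,), (1,)]
A = np.array([[p(c + r) for c in cols] for r in rows])

# every 3-minor of A vanishes on the model (A has rank <= 2)
max_minor = max(abs(np.linalg.det(A[list(idx), :]))
                for idx in combinations(range(4), 3))
```

This only tests one direction of the equality im f3,2 = V(I3,3); the converse inclusion is what the Gröbner-basis computation establishes.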

SLIDE 32

Bertini

d = 2, n = 4. J2 as before and

    I3,4 := minors(3, matrix{{p(00), p(000), p(100), p(0000), p(0100), p(1000), p(1100)},
                             {p(01), p(001), p(101), p(0001), p(0101), p(1001), p(1101)},
                             {p(10), p(010), p(110), p(0010), p(0110), p(1010), p(1110)},
                             {p(11), p(011), p(111), p(0011), p(0111), p(1011), p(1111)}})

Here rad I3,4 : J2 ⊃ I3,4 implies V(rad I3,4 : rad J2) ⊂ V(I3,4).

Bertini:

  • V(I3,4) is not irreducible
  • it consists of 4 components
  • two of dimension 8, one of dimension 7, one of dimension 6
  • the dimension-6 component corresponds to V(rad I3,4 : rad J2), which agrees with dim V(rad I3,4 : rad J2) = 2² + 2 = 6.

SLIDE 35

Outlook

  • Lemma: The following statements are equivalent:
    (i) f_{M_d,n}((T_a)_{a∈Σ}, π) ∈ im f_{H_d,n}
    (ii) [Strassen Condition] ∀ a, b, c ∈ Σ: (T_a)^{−1} T_c (T_b)^{−1} = (T_b)^{−1} T_c (T_a)^{−1}.
    ⇒ larger alphabets Σ
  • Transporter theorem: computation of generators.
  • Generic identifiability.
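For transfer matrices that do come from an HMP, T_a = O_a M, the Strassen condition holds automatically: (T_a)⁻¹ T_c (T_b)⁻¹ = M⁻¹ O_a⁻¹ O_c O_b⁻¹, which is symmetric in a and b because diagonal matrices commute. A quick numerical confirmation over a three-letter alphabet, with assumed example parameters:

```python
import numpy as np
from itertools import product

# HMP transfer matrices T_x = O_x M over {a, b, c} (assumed parameters)
Mt = np.array([[0.3, 0.7], [0.5, 0.5]])
E = {'a': np.array([0.25, 0.25]),
     'b': np.array([0.5, 0.3]),
     'c': np.array([0.25, 0.45])}
T = {x: np.diag(E[x]) @ Mt for x in E}
Tinv = {x: np.linalg.inv(T[x]) for x in E}

# Strassen condition: (Ta)^-1 Tc (Tb)^-1 == (Tb)^-1 Tc (Ta)^-1 for all a, b, c
max_dev = max(np.abs(Tinv[a] @ T[c] @ Tinv[b] - Tinv[b] @ T[c] @ Tinv[a]).max()
              for a, b, c in product('abc', repeat=3))
```

A generic finitary tuple (T_a) with no such O_a M factorization would violate the condition, which is what makes it a usable membership test for im f_{H_d,n}.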
slide-36
SLIDE 36

Guideline Introduction Algebraic Statistics The Main Theorem Computations Outlook

Thanks for the attention!

Preprint: arxiv.org/abs/1101.3712

BLACKWELL, D. and KOOPMANS, L. (1957). On the identifiability problem for functions of finite markov chains. Annals of Mathematical Statistics 28 1011–1015. GILBERT, E.J. (1959). On the identifiability problem for functions of finite Markov chains. Annals of Mathematical Statistics 30 688–697. HELLER, A. (1965). On stochastic processes derived from Markov chains. Annals of Mathematical Statistics, 36(4) 1286–1291. FINESSO, L., GRASSI, A. and SPREIJ, P. (2010). Approximation of stationary processes by hidden Markov models. Mathematics of Control, Signals and Systems 22 1–22. VIDYASAGAR, M. (2011). The complete realization problem for hidden Markov models: A survey and some new results. Mathematics of Control, Signals and Systems, to appear. ALLMAN, E.S., MATIAS, C. and RHODES, J.A. (2009). Identifiability of parameters in latent structure models with many observed

  • variables. Annals of Statistics 37 3099–3132.

FAIGLE, U. and SCHÖNHUTH, A. (2011). Efficient tests for equivalence of hidden Markov processes and quantum random walks. IEEE Transactions on Information Theory, 57(3) 1746–1753.