On Computing the Total Variation Distance of Hidden Markov Models
SLIDE 1

On Computing the Total Variation Distance of Hidden Markov Models

Stefan Kiefer

University of Oxford, UK

ICALP 2018, Prague, 10 July 2018


SLIDE 2

Hidden Markov Models = Labelled Markov Chains

[Figure: two LMCs over Σ = {a, b, $}, reconstructed from the probability computations below. Left chain: a single state q1 with self-loops a (1/2) and b (1/4) and termination $ (1/4). Right chain: states q2 and q3; q2 has self-loops a (1/3) and b (1/3) and an a-edge (1/3) to q3; q3 has a self-loop a (1/2) and termination $ (1/2).]

Pr1(aa) = 1/2 · 1/2 · 1/4 = 1/16

Pr2(aa) = 1/3 · 1/3 · 1/2 + 1/3 · 1/2 · 1/2 = 5/36

Each Labelled Markov Chain (LMC) generates a probability distribution over Σ∗.
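
As an illustration, here is a minimal sketch (my own encoding, not from the talk) of how an LMC assigns probabilities to finite words: represent the chain by an initial distribution, one transition matrix per letter, and a termination vector, so that Pr(w) is a product of matrices.

```python
import numpy as np

# Matrix encoding (an assumption of this sketch): initial distribution pi,
# one transition matrix per letter, and a termination vector eta holding
# each state's probability of emitting the end marker $.
# Then Pr(w1...wk) = pi @ M[w1] @ ... @ M[wk] @ eta.

# Left chain: q1 with self-loops a:1/2 and b:1/4, termination $:1/4.
pi1, eta1 = np.array([1.0]), np.array([0.25])
M1 = {'a': np.array([[0.5]]), 'b': np.array([[0.25]])}

# Right chain: q2 loops with a:1/3 and b:1/3, moves to q3 with a:1/3;
# q3 loops with a:1/2 and terminates with $:1/2.
# (The b-transitions are a best guess from the garbled diagram.)
pi2, eta2 = np.array([1.0, 0.0]), np.array([0.0, 0.5])
M2 = {'a': np.array([[1/3, 1/3], [0.0, 0.5]]),
      'b': np.array([[1/3, 0.0], [0.0, 0.0]])}

def prob(pi, M, eta, w):
    v = pi
    for c in w:
        v = v @ M[c]
    return float(v @ eta)

print(prob(pi1, M1, eta1, "aa"))  # 1/2 * 1/2 * 1/4 = 0.0625 = 1/16
print(prob(pi2, M2, eta2, "aa"))  # 1/18 + 1/12 = 5/36 ≈ 0.1389
```

The printed values match the products on the slide: 1/16 and 5/36.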


SLIDE 3

Hidden Markov Models = Labelled Markov Chains

Very widely used:

  • speech recognition
  • gesture recognition
  • signal processing
  • climate modelling
  • computational biology: DNA modelling, biological sequence analysis, structure prediction
  • probabilistic model checking: see tools like Prism or Storm


SLIDE 4

Hidden Markov Models = Labelled Markov Chains

[Figure: the same two LMCs and example word probabilities Pr1(aa), Pr2(aa) as on Slide 2.]

Each LMC generates a probability distribution over Σ∗.

Equivalence problem: Are the two distributions equal?

Solvable in time O(|Q|³ · |Σ|) with linear algebra [Schützenberger’61]. Direct applications in the verification of anonymity properties.
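
The linear-algebra approach can be sketched as follows: Pr1(w) − Pr2(w) is a bilinear function of vectors reachable from (π1, −π2) under block-diagonal transition matrices, and a basis of the reachable vector space can be grown by a worklist algorithm (in the style of Tzeng's procedure). The sketch below is my own rendering under the matrix encoding from the previous snippet, not the paper's implementation.

```python
import numpy as np

def equivalent(lmc1, lmc2, alphabet, tol=1e-9):
    """Check Pr1 = Pr2 by exploring the span of reachable 'difference'
    vectors (v1, -v2): the chains are equivalent iff every vector in
    that span is orthogonal to the combined termination vector."""
    (pi1, M1, eta1), (pi2, M2, eta2) = lmc1, lmc2
    n1, n2 = len(pi1), len(pi2)
    start = np.concatenate([pi1, -pi2])
    eta = np.concatenate([eta1, eta2])
    step = {}
    for c in alphabet:  # one block-diagonal transition matrix per letter
        M = np.zeros((n1 + n2, n1 + n2))
        M[:n1, :n1], M[n1:, n1:] = M1[c], M2[c]
        step[c] = M
    basis = np.zeros((0, n1 + n2))
    work = [start]
    while work:  # grow a basis of the reachable vector space
        v = work.pop()
        cand = np.vstack([basis, v])
        if np.linalg.matrix_rank(cand, tol=tol) > basis.shape[0]:
            basis = cand
            work.extend(v @ step[c] for c in alphabet)
    # Pr1(w) - Pr2(w) = (pi1, -pi2) M_w (eta1, eta2)^T for every word w
    return bool(np.all(np.abs(basis @ eta) < tol))

# With the chains from the previous sketch: not equivalent,
# since Pr1(aa) != Pr2(aa).
# equivalent((pi1, M1, eta1), (pi2, M2, eta2), 'ab')  -> False
```

The loop adds at most |Q1| + |Q2| basis vectors, each spawning |Σ| successors, which gives the polynomial bound.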


SLIDE 5

Total Variation Distance in Football

Two forecasters assign probabilities to the four possible outcomes of a football match (the original slide shows the outcomes as pictograms; call them o1, ..., o4):

            o1    o2    o3    o4
PrJames     0.1   0.1   0.8   0.0
PrStefan    0.3   0.4   0.2   0.1

PrStefan({o1}) − PrJames({o1}) = 0.2
PrStefan({o1, o2}) − PrJames({o1, o2}) = 0.5
PrStefan({o1, o2, o4}) − PrJames({o1, o2, o4}) = 0.6
PrStefan({o3}) − PrJames({o3}) = −0.6

The largest difference, 0.6, is attained by collecting exactly the outcomes on which PrStefan exceeds PrJames.
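
For finite outcome spaces the total variation distance can be computed either by brute force over all event sets or as half the L1 distance between the probability vectors; a quick check (my own script, not from the talk) on the numbers above:

```python
from itertools import combinations

pr_james  = [0.1, 0.1, 0.8, 0.0]
pr_stefan = [0.3, 0.4, 0.2, 0.1]

# Brute force: maximise Pr_Stefan(W) - Pr_James(W) over all 2^4 event sets W.
best = max(sum(pr_stefan[i] - pr_james[i] for i in W)
           for r in range(5)
           for W in combinations(range(4), r))

# Closed form: half the L1 distance between the two probability vectors.
half_l1 = 0.5 * sum(abs(s - j) for s, j in zip(pr_stefan, pr_james))

print(best, half_l1)  # both print 0.6 (up to floating point)
```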


SLIDE 6-7

Total Variation Distance for Words

Let Pr1, Pr2 be two probability distributions over Σ∗.

d(Pr1, Pr2) := max_{W ⊆ Σ∗} |Pr1(W) − Pr2(W)|

The maximum is attained by W1 := {w ∈ Σ∗ : Pr1(w) ≥ Pr2(w)}.

As in the football case:

d(Pr1, Pr2) = 1/2 · Σ_{w ∈ Σ∗} |Pr1(w) − Pr2(w)|

By a simple calculation, for W2 := {w ∈ Σ∗ : Pr1(w) < Pr2(w)}:

1 + d(Pr1, Pr2) = Pr1(W1) + Pr2(W2),

since d = Pr1(W1) − Pr2(W1) = Pr1(W1) − (1 − Pr2(W2)) = Pr1(W1) + Pr2(W2) − 1.


SLIDE 8

Verification View

[Figure: the same two LMCs as on Slide 2.]

∀ϕ : Pr2(ϕ) ∈ [Pr1(ϕ) − d, Pr1(ϕ) + d]

This holds because the words satisfying any specification ϕ form an event W_ϕ, and |Pr1(W_ϕ) − Pr2(W_ϕ)| ≤ d by definition. So a small distance saves verification work: any probability established for one model transfers to the other up to d. This helps especially for parameterised models.

SLIDE 9

Irrational Distances

[Figure: two single-state LMCs over Σ = {a, b, $}: q1 with a (1/2), b (1/4), $ (1/4); q2 with a (1/4), b (1/2), $ (1/4).]

d = √2/4 ≈ 0.35

Given two LMCs and a threshold τ ∈ [0, 1]:

  • Is d > τ? (the strict distance-threshold problem)
  • Is d ≥ τ? (the non-strict distance-threshold problem)

NP-hard: [Lyngsø, Pedersen’02], [Cortes, Mohri, Rastogi’07], [Chen, K.’14]
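
As a quick numerical sanity check on the irrational value (my own script, not part of the talk): for these single-state chains, Pr1(w) and Pr2(w) depend only on the letter counts of w, so the series d = 1/2 · Σ_w |Pr1(w) − Pr2(w)| can be summed by counts, weighting each count pair by a binomial coefficient; truncating at length 60 leaves only a negligible tail, since both chains terminate with probability 1/4 at every step.

```python
from math import comb, sqrt

# For the single-state chains above, Pr1(w) = (1/2)^i (1/4)^j / 4 and
# Pr2(w) = (1/4)^i (1/2)^j / 4, where i = #a's and j = #b's in w.
# There are C(i+j, i) words with those counts.
def tv_truncated(max_len):
    total = 0.0
    for i in range(max_len + 1):
        for j in range(max_len + 1 - i):
            p1 = 0.5**i * 0.25**j * 0.25
            p2 = 0.25**i * 0.5**j * 0.25
            total += comb(i + j, i) * abs(p1 - p2)
    return total / 2

print(tv_truncated(60))  # ≈ 0.3535
print(sqrt(2) / 4)       # 0.35355...
```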


SLIDE 10

Decidability of the Distance-Threshold Problem

Theorem (K.’18). The strict distance-threshold problem is undecidable.

Proof by reduction from the emptiness problem for probabilistic automata.

What about the non-strict distance-threshold problem? It is sqrt-sum-hard [Chen, K.’14] and PP-hard [K.’18]. The "strict vs. non-strict" decidability status is similar to the situation for the joint spectral radius of a set of matrices.


SLIDE 11

Acyclic LMCs

[Figure: two small acyclic LMCs over {a, b, $}; the exact transitions are garbled in the extraction.]

Theorem (K.’18). For acyclic LMCs:

  • computing the distance is #P-complete;
  • approximating the distance is #P-complete;
  • the strict and non-strict distance-threshold problems are PP-complete.

Reduction from #NFA: given an NFA A and n ∈ N in unary, compute |L(A) ∩ Σⁿ|. This reduction is probably simpler than the previous NP-hardness reductions.
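
For concreteness, here is what the #NFA quantity measures (a brute-force counter of my own; it is exponential in n, as one expects for a #P-complete problem, and the NFA below is a made-up example):

```python
from itertools import product

def count_accepted(n, alphabet, delta, start, accepting):
    """Brute-force #NFA: compute |L(A) ∩ Σ^n| by simulating the NFA
    on every word of length n. delta maps (state, letter) to the set
    of successor states."""
    count = 0
    for w in product(alphabet, repeat=n):
        states = {start}
        for c in w:
            states = set().union(*(delta.get((q, c), set()) for q in states))
        if states & accepting:
            count += 1
    return count

# Made-up example NFA accepting words over {a, b} that contain 'ab':
delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2},
         (2, 'a'): {2}, (2, 'b'): {2}}
print(count_accepted(3, 'ab', delta, 0, {2}))  # -> 4 (aab, aba, abb, bab)
```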


SLIDE 12-16

Approximation

Theorem (K.’18). Given two LMCs and an error bound ε > 0 in binary, one can compute in PSPACE a number x ∈ [d − ε, d + ε].

Recall: 1 + d(Pr1, Pr2) = Pr1(W1) + Pr2(W2), where
W1 = {w ∈ Σ∗ : Pr1(w) ≥ Pr2(w)} and W2 = {w ∈ Σ∗ : Pr1(w) < Pr2(w)}.

[Figure: two small acyclic LMCs over {a, b, $}, as on Slide 11.]

Proof ingredients:

  • In the cyclic case, one has to sample exponentially long words.
  • Floating-point arithmetic computes Pr1(w) and Pr2(w) up to small relative error.
  • Use Ladner’s result on counting in polynomial space.

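
The identity suggests a naive sampling scheme, sketched below under the matrix encoding from the Slide 2 snippet (my own illustration; the actual PSPACE algorithm replaces sampling with Ladner-style counting, precisely because the relevant words can be exponentially long):

```python
import random
import numpy as np

def prob(pi, M, eta, w):
    """Pr(w) under the matrix encoding: pi @ M[w1] @ ... @ M[wk] @ eta."""
    v = pi
    for c in w:
        v = v @ M[c]
    return float(v @ eta)

def sample_word(pi, M, eta, rng):
    """Sample one finite word; assumes the chain terminates almost surely."""
    n = len(pi)
    q = rng.choices(range(n), weights=pi)[0]
    w = []
    while True:
        # candidate moves from state q: terminate, or take a labelled edge
        moves = [('$', q, eta[q])] + [
            (c, r, M[c][q, r]) for c in M for r in range(n) if M[c][q, r] > 0]
        c, r, _ = rng.choices(moves, weights=[m[2] for m in moves])[0]
        if c == '$':
            return ''.join(w)
        w.append(c)
        q = r

def approx_distance(lmc1, lmc2, samples=100_000, seed=0):
    """Estimate d via 1 + d = Pr1(W1) + Pr2(W2): a word drawn from chain 1
    lands in W1 iff Pr1(w) >= Pr2(w), and symmetrically for chain 2."""
    rng = random.Random(seed)
    in_w1 = sum(prob(*lmc1, w) >= prob(*lmc2, w)
                for w in (sample_word(*lmc1, rng) for _ in range(samples)))
    in_w2 = sum(prob(*lmc1, w) < prob(*lmc2, w)
                for w in (sample_word(*lmc2, rng) for _ in range(samples)))
    return in_w1 / samples + in_w2 / samples - 1
```

On the two single-state chains of Slide 9 this returns roughly 0.35 ≈ √2/4; the standard error of the estimate shrinks as 1/√samples.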

SLIDE 17

Infinite-Word LMCs

[Figure: two infinite-word LMCs over Σ = {a, b} with transition probabilities 1/3 and 2/3; the exact diagram is garbled in the extraction.]

E.g., if W = {aw : w ∈ Σω}, then Pr1(W) = 1/3 and Pr2(W) = 2/3.

d(Pr1, Pr2) := max_{W ⊆ Σω} |Pr1(W) − Pr2(W)| = max_{W ⊆ Σω} (Pr1(W) − Pr2(W)),

where W ranges over measurable sets.

Theorem (Chen, K.’14). One can decide in polynomial time whether d(Pr1, Pr2) = 1. One can also decide in polynomial time whether Pr1 = Pr2.

Finite-word LMCs are a special case of infinite-word LMCs (e.g., pad each finite word with an infinite tail of $’s).


SLIDE 18

Summary

Theorem (main results again).

  • The strict distance-threshold problem is undecidable.
  • Approximating the distance is #P-hard and in PSPACE.

Open problems:

  • decidability of the non-strict distance-threshold problem
  • complexity of approximating the distance of infinite-word LMCs and of non-hidden/deterministic LMCs