On Computing the Total Variation Distance of Hidden Markov Models
SLIDE 1

On Computing the Total Variation Distance of Hidden Markov Models

Stefan Kiefer

University of Oxford, UK

ICALP 2018, Prague, 10 July 2018


SLIDE 2

Hidden Markov Models = Labelled Markov Chains

[Figure: two LMCs over Σ = {a, b, $}, reconstructed from the probability computations below. Left chain: a single state q1 with self-loops a (1/2) and b (1/4) and termination $ (1/4). Right chain: states q2 and q3; q2 has self-loops a (1/3) and b (1/3) and an a-edge (1/3) to q3; q3 has a self-loop a (1/2) and termination $ (1/2).]

Pr1(aa) = 1/2 · 1/2 · 1/4 = 1/16

Pr2(aa) = 1/3 · 1/3 · 1/2 + 1/3 · 1/2 · 1/2 = 5/36

Each Labelled Markov Chain (LMC) generates a probability distribution over Σ∗.
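
As an illustration, here is a minimal sketch (my own encoding, not from the talk) of how an LMC assigns probabilities to finite words: represent the chain by an initial distribution, one transition matrix per letter, and a termination vector, so that Pr(w) is a product of matrices.

```python
import numpy as np

# Matrix encoding (an assumption of this sketch): initial distribution pi,
# one transition matrix per letter, and a termination vector eta holding
# each state's probability of emitting the end marker $.
# Then Pr(w1...wk) = pi @ M[w1] @ ... @ M[wk] @ eta.

# Left chain: q1 with self-loops a:1/2 and b:1/4, termination $:1/4.
pi1, eta1 = np.array([1.0]), np.array([0.25])
M1 = {'a': np.array([[0.5]]), 'b': np.array([[0.25]])}

# Right chain: q2 loops with a:1/3 and b:1/3, moves to q3 with a:1/3;
# q3 loops with a:1/2 and terminates with $:1/2.
# (The b-transitions are a best guess from the garbled diagram.)
pi2, eta2 = np.array([1.0, 0.0]), np.array([0.0, 0.5])
M2 = {'a': np.array([[1/3, 1/3], [0.0, 0.5]]),
      'b': np.array([[1/3, 0.0], [0.0, 0.0]])}

def prob(pi, M, eta, w):
    v = pi
    for c in w:
        v = v @ M[c]
    return float(v @ eta)

print(prob(pi1, M1, eta1, "aa"))  # 1/2 * 1/2 * 1/4 = 0.0625 = 1/16
print(prob(pi2, M2, eta2, "aa"))  # 1/18 + 1/12 = 5/36 ≈ 0.1389
```

The printed values match the products on the slide: 1/16 and 5/36.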


SLIDE 3

Hidden Markov Models = Labelled Markov Chains

Very widely used:

  • speech recognition
  • gesture recognition
  • signal processing
  • climate modelling
  • computational biology: DNA modelling, biological sequence analysis, structure prediction
  • probabilistic model checking: see tools like Prism or Storm


SLIDE 4

Hidden Markov Models = Labelled Markov Chains

[Figure: the same two LMCs and example word probabilities Pr1(aa), Pr2(aa) as on Slide 2.]

Each LMC generates a probability distribution over Σ∗.

Equivalence problem: Are the two distributions equal?

Solvable in time O(|Q|³ · |Σ|) with linear algebra [Schützenberger’61]. Direct applications in the verification of anonymity properties.
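
The linear-algebra approach can be sketched as follows: Pr1(w) − Pr2(w) is a bilinear function of vectors reachable from (π1, −π2) under block-diagonal transition matrices, and a basis of the reachable vector space can be grown by a worklist algorithm (in the style of Tzeng's procedure). The sketch below is my own rendering under the matrix encoding from the previous snippet, not the paper's implementation.

```python
import numpy as np

def equivalent(lmc1, lmc2, alphabet, tol=1e-9):
    """Check Pr1 = Pr2 by exploring the span of reachable 'difference'
    vectors (v1, -v2): the chains are equivalent iff every vector in
    that span is orthogonal to the combined termination vector."""
    (pi1, M1, eta1), (pi2, M2, eta2) = lmc1, lmc2
    n1, n2 = len(pi1), len(pi2)
    start = np.concatenate([pi1, -pi2])
    eta = np.concatenate([eta1, eta2])
    step = {}
    for c in alphabet:  # one block-diagonal transition matrix per letter
        M = np.zeros((n1 + n2, n1 + n2))
        M[:n1, :n1], M[n1:, n1:] = M1[c], M2[c]
        step[c] = M
    basis = np.zeros((0, n1 + n2))
    work = [start]
    while work:  # grow a basis of the reachable vector space
        v = work.pop()
        cand = np.vstack([basis, v])
        if np.linalg.matrix_rank(cand, tol=tol) > basis.shape[0]:
            basis = cand
            work.extend(v @ step[c] for c in alphabet)
    # Pr1(w) - Pr2(w) = (pi1, -pi2) M_w (eta1, eta2)^T for every word w
    return bool(np.all(np.abs(basis @ eta) < tol))

# With the chains from the previous sketch: not equivalent,
# since Pr1(aa) != Pr2(aa).
# equivalent((pi1, M1, eta1), (pi2, M2, eta2), 'ab')  -> False
```

The loop adds at most |Q1| + |Q2| basis vectors, each spawning |Σ| successors, which gives the polynomial bound.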


SLIDE 5

Total Variation Distance in Football

Two forecasters assign probabilities to the four possible outcomes of a football match (the original slide shows the outcomes as pictograms; call them o1, ..., o4):

            o1    o2    o3    o4
PrJames     0.1   0.1   0.8   0.0
PrStefan    0.3   0.4   0.2   0.1

PrStefan({o1}) − PrJames({o1}) = 0.2
PrStefan({o1, o2}) − PrJames({o1, o2}) = 0.5
PrStefan({o1, o2, o4}) − PrJames({o1, o2, o4}) = 0.6
PrStefan({o3}) − PrJames({o3}) = −0.6

The largest difference, 0.6, is attained by collecting exactly the outcomes on which PrStefan exceeds PrJames.
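
For finite outcome spaces the total variation distance can be computed either by brute force over all event sets or as half the L1 distance between the probability vectors; a quick check (my own script, not from the talk) on the numbers above:

```python
from itertools import combinations

pr_james  = [0.1, 0.1, 0.8, 0.0]
pr_stefan = [0.3, 0.4, 0.2, 0.1]

# Brute force: maximise Pr_Stefan(W) - Pr_James(W) over all 2^4 event sets W.
best = max(sum(pr_stefan[i] - pr_james[i] for i in W)
           for r in range(5)
           for W in combinations(range(4), r))

# Closed form: half the L1 distance between the two probability vectors.
half_l1 = 0.5 * sum(abs(s - j) for s, j in zip(pr_stefan, pr_james))

print(best, half_l1)  # both print 0.6 (up to floating point)
```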


SLIDE 6-7

Total Variation Distance for Words

Let Pr1, Pr2 be two probability distributions over Σ∗.

d(Pr1, Pr2) := max_{W ⊆ Σ∗} |Pr1(W) − Pr2(W)|

The maximum is attained by W1 := {w ∈ Σ∗ : Pr1(w) ≥ Pr2(w)}.

As in the football case:

d(Pr1, Pr2) = 1/2 · Σ_{w ∈ Σ∗} |Pr1(w) − Pr2(w)|

By a simple calculation, for W2 := {w ∈ Σ∗ : Pr1(w) < Pr2(w)}:

1 + d(Pr1, Pr2) = Pr1(W1) + Pr2(W2),

since d = Pr1(W1) − Pr2(W1) = Pr1(W1) − (1 − Pr2(W2)) = Pr1(W1) + Pr2(W2) − 1.


SLIDE 8

Verification View

[Figure: the same two LMCs as on Slide 2.]

∀ϕ : Pr2(ϕ) ∈ [Pr1(ϕ) − d, Pr1(ϕ) + d]

This holds because the words satisfying any specification ϕ form an event W_ϕ, and |Pr1(W_ϕ) − Pr2(W_ϕ)| ≤ d by definition. So a small distance saves verification work: any probability established for one model transfers to the other up to d. This helps especially for parameterised models.

SLIDE 9

Irrational Distances

[Figure: two single-state LMCs over Σ = {a, b, $}: q1 with a (1/2), b (1/4), $ (1/4); q2 with a (1/4), b (1/2), $ (1/4).]

d = √2/4 ≈ 0.35

Given two LMCs and a threshold τ ∈ [0, 1]:

  • Is d > τ? (the strict distance-threshold problem)
  • Is d ≥ τ? (the non-strict distance-threshold problem)

NP-hard: [Lyngsø, Pedersen’02], [Cortes, Mohri, Rastogi’07], [Chen, K.’14]
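
As a quick numerical sanity check on the irrational value (my own script, not part of the talk): for these single-state chains, Pr1(w) and Pr2(w) depend only on the letter counts of w, so the series d = 1/2 · Σ_w |Pr1(w) − Pr2(w)| can be summed by counts, weighting each count pair by a binomial coefficient; truncating at length 60 leaves only a negligible tail, since both chains terminate with probability 1/4 at every step.

```python
from math import comb, sqrt

# For the single-state chains above, Pr1(w) = (1/2)^i (1/4)^j / 4 and
# Pr2(w) = (1/4)^i (1/2)^j / 4, where i = #a's and j = #b's in w.
# There are C(i+j, i) words with those counts.
def tv_truncated(max_len):
    total = 0.0
    for i in range(max_len + 1):
        for j in range(max_len + 1 - i):
            p1 = 0.5**i * 0.25**j * 0.25
            p2 = 0.25**i * 0.5**j * 0.25
            total += comb(i + j, i) * abs(p1 - p2)
    return total / 2

print(tv_truncated(60))  # ≈ 0.3535
print(sqrt(2) / 4)       # 0.35355...
```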


SLIDE 10

Decidability of the Distance-Threshold Problem

Theorem (K.’18). The strict distance-threshold problem is undecidable.

Proof by reduction from the emptiness problem for probabilistic automata.

What about the non-strict distance-threshold problem? It is sqrt-sum-hard [Chen, K.’14] and PP-hard [K.’18]. The "strict vs. non-strict" decidability status is similar to the situation for the joint spectral radius of a set of matrices.


SLIDE 11

Acyclic LMCs

[Figure: two small acyclic LMCs over {a, b, $}; the exact transitions are garbled in the extraction.]

Theorem (K.’18). For acyclic LMCs:

  • computing the distance is #P-complete;
  • approximating the distance is #P-complete;
  • the strict and non-strict distance-threshold problems are PP-complete.

Reduction from #NFA: given an NFA A and n ∈ N in unary, compute |L(A) ∩ Σⁿ|. This reduction is probably simpler than the previous NP-hardness reductions.
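
For concreteness, here is what the #NFA quantity measures (a brute-force counter of my own; it is exponential in n, as one expects for a #P-complete problem, and the NFA below is a made-up example):

```python
from itertools import product

def count_accepted(n, alphabet, delta, start, accepting):
    """Brute-force #NFA: compute |L(A) ∩ Σ^n| by simulating the NFA
    on every word of length n. delta maps (state, letter) to the set
    of successor states."""
    count = 0
    for w in product(alphabet, repeat=n):
        states = {start}
        for c in w:
            states = set().union(*(delta.get((q, c), set()) for q in states))
        if states & accepting:
            count += 1
    return count

# Made-up example NFA accepting words over {a, b} that contain 'ab':
delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2},
         (2, 'a'): {2}, (2, 'b'): {2}}
print(count_accepted(3, 'ab', delta, 0, {2}))  # -> 4 (aab, aba, abb, bab)
```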


SLIDE 12-16

Approximation

Theorem (K.’18). Given two LMCs and an error bound ε > 0 in binary, one can compute in PSPACE a number x ∈ [d − ε, d + ε].

Recall: 1 + d(Pr1, Pr2) = Pr1(W1) + Pr2(W2), where
W1 = {w ∈ Σ∗ : Pr1(w) ≥ Pr2(w)} and W2 = {w ∈ Σ∗ : Pr1(w) < Pr2(w)}.

[Figure: two small acyclic LMCs over {a, b, $}, as on Slide 11.]

Proof ingredients:

  • In the cyclic case, one has to sample exponentially long words.
  • Floating-point arithmetic computes Pr1(w) and Pr2(w) up to small relative error.
  • Use Ladner’s result on counting in polynomial space.

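
The identity suggests a naive sampling scheme, sketched below under the matrix encoding from the Slide 2 snippet (my own illustration; the actual PSPACE algorithm replaces sampling with Ladner-style counting, precisely because the relevant words can be exponentially long):

```python
import random
import numpy as np

def prob(pi, M, eta, w):
    """Pr(w) under the matrix encoding: pi @ M[w1] @ ... @ M[wk] @ eta."""
    v = pi
    for c in w:
        v = v @ M[c]
    return float(v @ eta)

def sample_word(pi, M, eta, rng):
    """Sample one finite word; assumes the chain terminates almost surely."""
    n = len(pi)
    q = rng.choices(range(n), weights=pi)[0]
    w = []
    while True:
        # candidate moves from state q: terminate, or take a labelled edge
        moves = [('$', q, eta[q])] + [
            (c, r, M[c][q, r]) for c in M for r in range(n) if M[c][q, r] > 0]
        c, r, _ = rng.choices(moves, weights=[m[2] for m in moves])[0]
        if c == '$':
            return ''.join(w)
        w.append(c)
        q = r

def approx_distance(lmc1, lmc2, samples=100_000, seed=0):
    """Estimate d via 1 + d = Pr1(W1) + Pr2(W2): a word drawn from chain 1
    lands in W1 iff Pr1(w) >= Pr2(w), and symmetrically for chain 2."""
    rng = random.Random(seed)
    in_w1 = sum(prob(*lmc1, w) >= prob(*lmc2, w)
                for w in (sample_word(*lmc1, rng) for _ in range(samples)))
    in_w2 = sum(prob(*lmc1, w) < prob(*lmc2, w)
                for w in (sample_word(*lmc2, rng) for _ in range(samples)))
    return in_w1 / samples + in_w2 / samples - 1
```

On the two single-state chains of Slide 9 this returns roughly 0.35 ≈ √2/4; the standard error of the estimate shrinks as 1/√samples.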

SLIDE 17

Infinite-Word LMCs

[Figure: two infinite-word LMCs over Σ = {a, b} with transition probabilities 1/3 and 2/3; the exact diagram is garbled in the extraction.]

E.g., if W = {aw : w ∈ Σω}, then Pr1(W) = 1/3 and Pr2(W) = 2/3.

d(Pr1, Pr2) := max_{W ⊆ Σω} |Pr1(W) − Pr2(W)| = max_{W ⊆ Σω} (Pr1(W) − Pr2(W)),

where W ranges over measurable sets.

Theorem (Chen, K.’14). One can decide in polynomial time whether d(Pr1, Pr2) = 1. One can also decide in polynomial time whether Pr1 = Pr2.

Finite-word LMCs are a special case of infinite-word LMCs (e.g., pad each finite word with an infinite tail of $’s).


SLIDE 18

Summary

Theorem (main results again).

  • The strict distance-threshold problem is undecidable.
  • Approximating the distance is #P-hard and in PSPACE.

Open problems:

  • decidability of the non-strict distance-threshold problem
  • complexity of approximating the distance of infinite-word LMCs and of non-hidden/deterministic LMCs