Markov Processes in Isabelle/HOL
Johannes Hölzl (Technical University of Munich) CPP 2017
Formalization for Modelling Stochastic Processes
Applications: probabilistic programming, continuous-time Markov chains (queuing theory, biological processes, …), physical processes with errors, …

Example: proc x — a randomised walk on R. The next step is normally distributed with variance 0.1.
proc ∈ R →m Pr(stream(R))
proc x = do { x′ ← Normal (x, 0.1)
              ω ← proc x′
              return x′·ω }

Wanted: a general method to construct such processes:
proc x = do { y ← K x; ω ← proc y; return (y·ω) }
1
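A sampling-level sketch of this example in Haskell rather than Isabelle/HOL (a minimal sketch: normal, walk, and the Box–Muller transform are illustration only, not the formalized definitions; variance 0.1 is read as standard deviation sqrt 0.1):

```haskell
import System.Random (StdGen, mkStdGen, randomR)

-- one Normal(mu, sigma^2) sample via the Box-Muller transform
normal :: Double -> Double -> StdGen -> (Double, StdGen)
normal mu sigma g0 =
  let (u1, g1) = randomR (1e-12, 1.0) g0
      (u2, g2) = randomR (0.0, 1.0) g1
      z        = sqrt (-2 * log u1) * cos (2 * pi * u2)
  in (mu + sigma * z, g2)

-- lazy trajectory of the walk: each step is Normal(current position, 0.1)
walk :: Double -> StdGen -> [Double]
walk x g = let (x', g') = normal x (sqrt 0.1) g in x' : walk x' g'

main :: IO ()
main = print (take 10 (walk 0 (mkStdGen 42)))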
Overview
2
Giry Monad
Monad on probability spaces: Pr(S) for a measurable space S.

Monad combinators:
Map: map ∈ (S →m T) → (Pr(S) →m Pr(T))
Bind: bind ∈ Pr(S) → (S →m Pr(T)) → Pr(T)
Return: return ∈ S →m Pr(S)

Note: the functions are regular HOL functions; the theorems have measurability assumptions. We use do { … } notation.
3
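As a hedged illustration of the monad structure only (the Isabelle/HOL combinators work on arbitrary measure spaces, not just finite ones), a discrete probability monad in Haskell:

```haskell
-- finite distributions as weighted lists; a discrete shadow of the Giry monad
newtype Dist a = Dist { runDist :: [(a, Double)] }

instance Functor Dist where                 -- map
  fmap f (Dist xs) = Dist [(f x, p) | (x, p) <- xs]

instance Applicative Dist where
  pure x = Dist [(x, 1)]                    -- return: the point distribution
  Dist fs <*> Dist xs = Dist [(f x, p * q) | (f, p) <- fs, (x, q) <- xs]

instance Monad Dist where                   -- bind: average over outcomes
  Dist xs >>= k = Dist [(y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x)]

-- example: flip a fair coin, then a biased one depending on the result
coin :: Double -> Dist Bool
coin p = Dist [(True, p), (False, 1 - p)]

example :: Dist Bool
example = do { b <- coin 0.5; coin (if b then 0.9 else 0.1) }

main :: IO ()
main = print (runDist example)
```

Here bind averages the continuation over the outcomes of the first distribution, which is exactly the role bind plays in the Giry monad.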
Markov Kernels (a.k.a. stochastic relations)
Transition functions for Markov chains on state spaces S:
traditional: T : S → S → R
coalgebraic: T : S → D(S)
generalized: K ∈ S →m Pr(S)
4
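The first two views have a direct Haskell reading; a minimal sketch, assuming a finite state space (Dist is the finite-distribution type from the previous sketch, repeated so the snippet stands alone):

```haskell
newtype Dist a = Dist [(a, Double)]      -- finite distribution, weights sum to 1

type Traditional s = s -> s -> Double    -- transition "matrix" entry T x y
type Coalgebraic s = s -> Dist s         -- state to distribution over successors

-- the two views agree on a finite state space
toCoalgebraic :: [s] -> Traditional s -> Coalgebraic s
toCoalgebraic states t x = Dist [(y, t x y) | y <- states]
```

The generalized kernel S →m Pr(S) is the measure-theoretic version of the coalgebraic view and has no direct Haskell analogue.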
Extension Theorem by Ionescu-Tulcea
There exists proc ∈ S →m Pr(stream(S)) where

proc x = do { y ← K x
              ω ← proc y
              return (y·ω) }
       = do { y1 ← K x
              y2 ← K y1
              y3 ← K y2
              y4 ← K y3
              y5 ← K y4
              y6 ← K y5
              . . .
              return (y1·y2·y3·y4·y5· · ·) }
5
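In sampling semantics this fixed-point equation is a lazy stream unfold; a minimal Haskell sketch, assuming kernels thread their randomness explicitly (Kernel, proc, and stepK are illustrative names, not the formalized construction):

```haskell
import System.Random (StdGen, mkStdGen, randomR)

-- a kernel in sampling semantics: state and seed to next state and seed
type Kernel s = s -> StdGen -> (s, StdGen)

-- the process as a lazy stream unfold:
-- proc x = do { y <- K x; omega <- proc y; return (y . omega) }
proc :: Kernel s -> s -> StdGen -> [s]
proc k x g = let (y, g') = k x g in y : proc k y g'

-- example kernel: a biased +/-1 step on the integers
stepK :: Kernel Int
stepK x g = let (u, g') = randomR (0, 1 :: Double) g
            in (if u < 0.7 then x + 1 else x - 1, g')

main :: IO ()
main = print (take 15 (proc stepK 0 (mkStdGen 7)))
```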
Uniqueness
Ionescu-Tulcea proves existence. Is it unique?
Bisimulation relation R : Pr(stream(S)) → Pr(stream(S)) → B
R M N ⟹ there exist K ∈ Pr(S) and M′, N′ ∈ S →m Pr(stream(S)) s.t.
M = do { y ← K; ω ← M′ y; return (y·ω) },
N = do { y ← K; ω ← N′ y; return (y·ω) }, and
R (M′ y) (N′ y) for K-almost every y (i.e. with probability 1).

Bisimulation implies equality (a.k.a. coinduction rule for equality):
R a bisimulation relation: R M N ⟹ M = N.
6
Markov Property

[Diagram: a three-state Markov chain on {a, b, c} with transition probabilities 0.5, 0.5, 0.33, 0.67, 0.5, 0.5, and a sample trajectory a, a, b, c, a along the time axis.]

7
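For concreteness, one plausible reading of the diagram as a transition function (the edge assignment is an assumption, chosen to be consistent with the sample trajectory; only the probability labels survive in the transcript):

```haskell
data St = A | B | C deriving (Eq, Show)

-- assumed edges, consistent with the trajectory a, a, b, c, a
step :: St -> [(St, Double)]
step A = [(A, 0.5), (B, 0.5)]
step B = [(B, 0.33), (C, 0.67)]
step C = [(C, 0.5), (A, 0.5)]
```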
Markov Property
Lemma markov_prop:
proc x = do { ω ← proc x
              ω′ ← proc ω[n]
              return (take n ω)·ω′ }
Proof: induction on n, or bisimulation.
Can we generalize n?
8
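A hedged sampling-level reading of the lemma, assuming n ⩾ 1 and reading ω[n] as the last element of the prefix take n ω, so that the restarted run continues where the prefix ends (split supplies independent randomness for the two phases):

```haskell
import System.Random (StdGen, mkStdGen, split)

type Kernel s = s -> StdGen -> (s, StdGen)

proc :: Kernel s -> s -> StdGen -> [s]
proc k x g = let (y, g') = k x g in y : proc k y g'

-- run for n steps, then restart the process from the state reached:
-- by markov_prop this has the same law as one uninterrupted run of proc
restartAt :: Int -> Kernel s -> s -> StdGen -> [s]
restartAt n k x g =
  let (g1, g2) = split g
      omega    = proc k x g1
  in take n omega ++ proc k (omega !! (n - 1)) g2
```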
Strong Markov Property
Stopping time t ∈ stream(S) →m N ∪ {∞}
If t ω = n, then take n ω′ = take n ω ⟹ t ω′ = n.

[Diagram: two traces ω and ω′ that agree on their first n elements, with t ω = n and t ω′ = n.]

Not a stopping time: the last occurrence of a state is not a stopping time.
9
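For example, a first hitting time is a stopping time: whether it equals n depends only on the prefix read so far (up to the indexing convention). A sketch on list-modelled streams; the cutoff is an artefact that makes the function total, not part of the definition:

```haskell
-- first index at which state s occurs, up to a cutoff;
-- Nothing plays the role of "not hit" (standing in for infinity)
hittingTime :: Eq s => Int -> s -> [s] -> Maybe Int
hittingTime cutoff s omega =
  case [n | (n, x) <- zip [0 .. cutoff] omega, x == s] of
    n : _ -> Just n
    []    -> Nothing
```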
Strong Markov Property
Lemma strong_markov_prop (given: t is a stopping time):
proc x = do { ω ← proc x
              case t ω of
              | ∞ ⇒ return ω
              | n ⇒ ω′ ← proc ω[n]
                    return (take n ω)·ω′ }
Proof: bisimulation.
10
Queuing Example: Client-Server exchange
State n: number of active requests; c: client request rate; s: server response rate.

[Diagram: birth–death chain on states 1, 2, 3, 4, · · · with forward rate c and backward rate s, and a sample path of n over time t.]
11
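A sketch of the corresponding transition-rate function, anticipating the rate matrices R of the following slides (rate is an illustrative name, not from the formalization):

```haskell
-- transition rates for the queue: a new request arrives at rate c,
-- a pending request is answered at rate s
rate :: Double -> Double -> Int -> Int -> Double
rate c s n m
  | m == n + 1          = c   -- client sends a request
  | m == n - 1 && n > 0 = s   -- server answers one
  | otherwise           = 0
```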
Exponential Distribution
X ∼ Exp(r) — exponentially distributed with rate r.

Pr(X > t) = exp(−t · r)

[Plot: the survival function Pr(X > t) against t, marking t, t′, and t′ − t.]

The exponential distribution is memoryless:
t′ ⩾ t ⟹ Pr(X > t′ | X > t) = Pr(X > t′ − t)
12
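Both facts are easy to check empirically with the inverse-CDF sampler −log U / r; a hedged Haskell sketch, illustration only:

```haskell
import System.Random (mkStdGen, randomRs)

-- if U ~ Uniform(0,1] then -log U / r ~ Exp(r)
expSamples :: Double -> Int -> [Double]
expSamples r seed = map (\u -> -log u / r) (randomRs (1e-12, 1.0) (mkStdGen seed))

main :: IO ()
main = do
  let xs        = take 200000 (expSamples 2.0 7)
      (t, t')   = (0.3, 0.8)
      frac p ys = fromIntegral (length (filter p ys))
                  / fromIntegral (length ys) :: Double
      lhs = frac (> t') (filter (> t) xs)   -- Pr(X > t' | X > t)
      rhs = frac (> (t' - t)) xs            -- Pr(X > t' - t)
  print (lhs, rhs)  -- both approach exp(-2.0 * 0.5), about 0.3679
```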
Parallel Choices
[Diagram: state s with successor states s1, s2, s3 reached at rates r1, r2, r3; the first arrival occurs at time t.]

J ∈ D(S), J {i} := ri / ∑i ri

Exp(∑i ri) × J = do { t ← Πi Exp(ri)
                      i := THE i. ∀j ≠ i. ti < tj
                      return (ti, i) }
13
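The identity says that the minimum of independent exponential clocks is Exp(∑i ri) and that clock i rings first with probability ri / ∑i ri; a Monte-Carlo sketch (illustrative names):

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)
import System.Random (StdGen, mkStdGen, randomR)

-- one Exp(r) draw
expS :: Double -> StdGen -> (Double, StdGen)
expS r g = let (u, g') = randomR (1e-12, 1.0) g in (-log u / r, g')

-- sample all clocks independently; index of the first to ring
race :: [Double] -> StdGen -> (Int, StdGen)
race rs g0 =
  let draw (ts, g) r = let (t, g') = expS r g in (ts ++ [t], g')
      (ts, g1)       = foldl draw ([], g0) rs
  in (fst (minimumBy (comparing snd) (zip [0 ..] ts)), g1)

main :: IO ()
main = do
  let rs = [1.0, 2.0, 3.0]
      winners _ 0 = []
      winners g k = let (i, g') = race rs g in i : winners g' (k - 1 :: Int)
      ws = winners (mkStdGen 1) 60000
  print [length (filter (== i) ws) | i <- [0 .. length rs - 1]]
  -- winner frequencies approach r_i / sum rs, i.e. 1/6, 2/6, 3/6
```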
Kernel for CTMCs
Transition rates R : S → S → R. The rate to go from state x to state y is R x y.
Nonnegative: R x y ⩾ 0
Finite and positive: 0 < ∑y R x y < ∞
Zero diagonal: R x x = 0

Markov kernel K: K maps the current state and jump time to the next state and jump time.
K ∈ R × S →m Pr(R × S)
K (t, x) := (map (+t) Exp(∑y R x y)) × Jx

CTMC for rate R:
ctmc ∈ R × S →m Pr(stream(R × S))
ctmc (t, x) := procK (t, x)
14
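In sampling semantics this kernel is one step of the classic Gillespie algorithm: wait Exp(∑y R x y), then jump to y with probability R x y / ∑y R x y. A sketch assuming finitely many successors per state (names are illustrative):

```haskell
import System.Random (StdGen, mkStdGen, randomR)

-- one step of the kernel K: new jump time and next state
step :: (s -> [(s, Double)]) -> (Double, s) -> StdGen -> ((Double, s), StdGen)
step rates (t, x) g0 =
  let succs    = rates x
      total    = sum (map snd succs)         -- assumed finite and positive
      (u1, g1) = randomR (1e-12, 1.0) g0
      dt       = -log u1 / total             -- holding time ~ Exp(total)
      (u2, g2) = randomR (0.0, total) g1
      pick acc ((y, r) : ys)
        | u2 <= acc + r || null ys = y       -- successor y w.p. r / total
        | otherwise                = pick (acc + r) ys
  in ((t + dt, pick 0 succs), g2)

-- ctmc (t, x) = proc_K (t, x): iterate the kernel into a lazy trajectory
ctmc :: (s -> [(s, Double)]) -> (Double, s) -> StdGen -> [(Double, s)]
ctmc rates tx g = let (tx', g') = step rates tx g in tx' : ctmc rates tx' g'

main :: IO ()
main = do
  let rates n = [(n + 1, 1.0)] ++ [(n - 1, 2.0) | n > (0 :: Int)]  -- a small queue
  print (take 8 (ctmc rates (0, 0) (mkStdGen 3)))
```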
Construct CTMCs
Markov Property: t ⩽ t′ ⟹
ctmc (t, x) = do { ω ← ctmc (t, x)
                   ω′ ← ctmc (t′, state_at ω t′)
                   return (merge ω t′ ω′) }

[Diagram: a trajectory ω started at time t, restarted at time t′ with a fresh trajectory ω′.]
15
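A hedged reading of state_at and merge on trajectories represented as lists of (jump time, state) pairs, assuming the trajectory has a jump at or before t′ (illustration only, not the formalized operations):

```haskell
-- state occupied at time t': the state of the last jump no later than t'
stateAt :: [(Double, s)] -> Double -> s
stateAt omega t' = snd (last (takeWhile ((<= t') . fst) omega))

-- keep omega's jumps up to t', then continue with the fresh trajectory
merge :: [(Double, s)] -> Double -> [(Double, s)] -> [(Double, s)]
merge omega t' omega' = takeWhile ((<= t') . fst) omega ++ omega'
```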
Properties of CTMCs
Transition probability p:
p x y t := Pr(CTMC started in x is in y at time t)

Chapman–Kolmogorov equation:
t1, t2 ⩾ 0 ⟹ p x y (t1 + t2) = ∑x′ p x x′ t1 · p x′ y t2

p is the solution of a differential equation:
t > 0 ⟹ p′ x y t = ∑x′ R x x′ · (p x′ y t − p x y t)
16
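For a two-state chain p has a closed form (the standard textbook formula, an assumption here since the slide does not state it), which makes the Chapman–Kolmogorov equation easy to sanity-check numerically:

```haskell
-- closed-form p for the two-state CTMC with rates a (0 -> 1) and b (1 -> 0)
p :: Double -> Double -> Int -> Int -> Double -> Double
p a b 0 1 t = a / (a + b) * (1 - exp (-(a + b) * t))
p a b 1 0 t = b / (a + b) * (1 - exp (-(a + b) * t))
p a b 0 0 t = 1 - p a b 0 1 t
p a b _ _ t = 1 - p a b 1 0 t   -- the remaining case, 1 -> 1

main :: IO ()
main = do
  let (a, b)   = (1.5, 0.5)
      (t1, t2) = (0.4, 1.1)
      lhs = p a b 0 1 (t1 + t2)
      rhs = sum [p a b 0 k t1 * p a b k 1 t2 | k <- [0, 1]]
  print (lhs, rhs)  -- equal up to floating-point error
```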
Difference to Traditional Probability Theory
Traditional statement for the Markov property: Pr(A | ∀t′ ⩽ t. Xt′ = xt′) = Pr(A | Xt = xt).
From this it is not obvious how to derive rules for integrals or for with-probability-1 statements.

Advantage of working on the measure level: equations for the monadic operations.
17
Related Work
(discrete: Audebaud and Paulin-Mohring [MPC 2006], and more)
(Immler, Master's thesis 2012)
(Eberl, Hölzl, and Nipkow [ESOP 2015])
(Backes, Berg, and Unruh [LPAR 2008]), but with missing proofs
Newly developed
18