SLIDE 1
Reactive Probabilistic Programming Semantics with Mixed Nondeterministic/Probabilistic Automata
Albert Benveniste, Jean-Baptiste Raclet
INRIA Rennes and IRIT Toulouse
September 18, 2020
SLIDE 2
SLIDE 3
Outline
◮ Basic issues in probabilistic paradigms
◮ Approaches
  ◮ Statisticians and AI people
  ◮ Reactive Programming: ProbZelus
◮ ReactiveBayes minilanguage
◮ Factor Graphs + constraints = Mixed Systems
◮ Putting dynamics: Mixed Automata
  ◮ Preliminaries to Mixed Automata
  ◮ Mixed Automata
◮ ReactiveBayes and its semantics
◮ Discussion and Comparisons
  ◮ Probabilistic Automata
  ◮ (Non reactive) Probabilistic Programming
  ◮ Reactive Probabilistic Programming
◮ Limitations of Mixed Automata and Fixes
◮ Conclusion
SLIDE 4
SLIDE 5
Basic issues in probabilistic paradigms
◮ Specifying a probabilistic system
  ◮ A distribution (Bernoulli, Gaussian, ...)
  ◮ A probabilistic dynamics (Markov chain, cyber-physical system subject to noise, safety analysis, ...)
◮ Estimating, learning, inferring
  ◮ Model parameters
  ◮ Black-box dynamics (deep learning)
◮ Statistical decision and classification
◮ Issues
  ◮ Blending probabilities and nondeterminism
  ◮ Modularity in the above tasks
SLIDE 6
SLIDE 7
SLIDE 8
Requirements on Probabilistic Programming
◮ Probabilistic programming: offer a high-level language for the
  ◮ specification
  ◮ estimation
  ◮ decision/detection/classification
  of systems involving a mix of probabilities and nondeterminism
◮ Supporting important nontrivial constructions:
  ◮ Conditioning: π(A | B) =def π(A ∩ B) / π(B), provided that π(B) > 0
  ◮ Modularity in specification, estimation, and decision:
    ◮ Factor Graphs & Bayesian Networks (generalizations of Bayes rule P(X, Y) = P(X)P(Y|X))
    ◮ Parallel composition
◮ Hosting libraries of algorithms for estimation and decision
◮ Providing a layered language supporting all of this
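The conditioning construction above can be sketched on a finite probability space; a minimal Python example, where the two-dice space and the particular events A and B are illustrative assumptions:

```python
from fractions import Fraction

def conditional(pi, A, B):
    """pi: dict outcome -> probability; A, B: sets of outcomes.
    Returns pi(A | B) = pi(A ∩ B) / pi(B), defined only when pi(B) > 0."""
    pB = sum(pi[w] for w in B)
    if pB == 0:
        raise ValueError("pi(B) = 0: conditional undefined")
    pAB = sum(pi[w] for w in A & B)
    return pAB / pB

# Two fair dice; A = "sum is 8", B = "first die shows an even number".
pi = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}
A = {w for w in pi if w[0] + w[1] == 8}
B = {w for w in pi if w[0] % 2 == 0}
print(conditional(pi, A, B))   # prints 1/6: outcomes (2,6),(4,4),(6,2) out of 18
```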
SLIDE 9
Requirements on Probabilistic Programming
◮ Factor Graphs: undirected
◮ Bayesian Networks: directed, for causal reasoning
SLIDE 10
Advantages of a layered language
Three layers, each one specifying:
◮ a probabilistic system
  ◮ semantics, equivalence, rewriting rules
◮ a statistical problem (probability of some property, sampling, estimating, detecting, classifying, ...)
  ◮ semantics, equivalence, rewriting rules
◮ algorithms for solving statistical problems
  ∼ operational semantics
SLIDE 11
SLIDE 12
Existing approaches by statisticians and AI people
Pragmatic proposals by statisticians and AI people:
◮ BUGS [Spiegelhalter 1994]: a software package for Bayesian inference using Gibbs sampling. "The software has been instrumental in raising awareness of Bayesian modelling among both academic and commercial communities internationally, and has enjoyed considerable success over its 20-year life span." (2009)
◮ Stan [Carpenter 2017]: "Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm."
◮ As part of the TensorFlow open source platform for machine learning
SLIDE 13
Existing approaches by statisticians and AI people
◮ Programming languages for specifying
  ◮ Factor Graphs: undirected
  ◮ Bayesian Networks: directed, for causal reasoning
◮ Emphasis is on algorithms for performing Bayesian inference; decentralized algorithms with local computations only (Metropolis, MCMC, ...) in order to scale up
SLIDE 14
Reactive Programming: ProbZelus [Baudart 2020]
A conservative extension of the Lucid Synchrone synchronous language with probabilistic primitives:
◮ x = sample d: declares a random variable X with distribution d
◮ observe(d, y): estimates the likelihood of y w.r.t. distribution d
◮ infer(m, obs): infers the distribution of outputs of model m based on observations of obs
SLIDE 15
Basic issues in probabilistic paradigms Approaches Statisticians and AI people Reactive Programming: ProbZelus ReactiveBayes minilanguage Factor Graphs + constraints = Mixed Systems Putting dynamics: Mixed Automata Preliminaries to Mixed Automata Mixed Automata ReactiveBayes and its semantics Discussion and Comparisons Probabilistic Automata (Non reactive) Probabilistic Programming Reactive Probabilistic Programming Limitations of Mixed Automata and Fixes Conclusion
SLIDE 16
Example of programming we would like to support
S1: system System(y0)
    observed u
    and init y = y0
    and v = u + pre y
    and y = if fail then v + noise else v

S2: system Noisegen(var)
    init noise = 0.0
    and noise = 0.9 * pre noise + w
    and w ∼ normal(0, var)

S3: system Failure(p)
    init backup = false
    and fail = rootfail ∧ ¬ pre backup
    and rootfail ∼ Bernoulli(p)

S4: system Sensor
    observed y

with parallel compositions of any of them; probabilistic statements x ∼ · · · are private.
SLIDE 17
Example of programming we would like to support
S1: system System(y0)
    observed u
    and init y = y0
    and v = u + pre y
    and y = if fail then v + noise else v

S4: system Sensor
    observed y

Semantics of S1 and S4, for all n:

  observed(un)
  vn = un + yn−1
  yn = if failn then (vn + noisen) else vn

◮ The semantics is a dynamical system with observed and unobserved signals; traces of observed signals are fixed; equations define relations.
◮ Intuition: signal un can be seen as an input; failn, noisen as nondeterministic inputs (daemons).
SLIDE 18
Example of programming we would like to support
S1: system System(y0)
    observed u
    and init y = y0
    and v = u + pre y
    and y = if fail then v + noise else v

S4: system Sensor
    observed y

Semantics of S1 and S4, for all n:

  observed(un, yn)
  vn = un + yn−1
  yn = if failn then (vn + noisen) else vn

◮ The semantics is a dynamical system with observed and unobserved signals; traces of observed signals are fixed; equations define relations.
◮ Intuition: signal un can be seen as an input; failn, noisen as nondeterministic inputs (daemons). Output yn is measured.
SLIDE 19
Example of programming we would like to support
S2: system Noisegen(var)
    init noise = 0.0
    and noise = 0.9 * pre noise + w
    and w ∼ normal(0, var)

Semantics, for all n:

  observed(none)
  noisen = 0.9 ∗ noisen−1 + wn
  wn ∼ N(0, var), with w i.i.d.

◮ The prior distribution of w is i.i.d. (independent and identically distributed) with the distribution specified;
◮ S2 is a probabilistic model for noise: an AR(1) time series.
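The Noisegen system is an AR(1) recursion, so it can be simulated directly; a minimal Python sketch (the function name noisegen and the use of Python's random.gauss are assumptions of this sketch, not part of ReactiveBayes):

```python
import random

def noisegen(var, steps, seed=0):
    """Sketch of S2: noise_n = 0.9 * noise_{n-1} + w_n, with w_n ~ N(0, var) i.i.d."""
    rng = random.Random(seed)
    noise = 0.0                            # init noise = 0.0
    trace = []
    for _ in range(steps):
        w = rng.gauss(0.0, var ** 0.5)     # w ~ normal(0, var); gauss takes a std dev
        noise = 0.9 * noise + w
        trace.append(noise)
    return trace

# Stationary variance of this AR(1) process is var / (1 - 0.9**2).
trace = noisegen(var=1.0, steps=50)
```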
SLIDE 20
Example of programming we would like to support
Semantics of S1 and S2 and S3 and S4:

S1: system System(y0)
    observed u
    and init y = y0
    and v = u + pre y
    and y = if fail then v + noise else v

S2: system Noisegen(var)
    init noise = 0.0
    and noise = 0.9 * pre noise + w
    and w ∼ normal(0, var)

S3: system Failure(p)
    init backup = false
    and fail = rootfail ∧ ¬ pre backup
    and rootfail ∼ Bernoulli(p)

S4: system Sensor
    observed y
SLIDE 21
Example of programming we would like to support
Semantics of S1 and S2 and S3 and S4:

  observed(u, y)
  init (u, y, backup) = (u0, y0, backup0)
  v = u + pre y
  y = if fail then v + noise else v
  noise = 0.9 ∗ pre noise + w
  fail = rootfail ∧ ¬ pre backup
  (w, rootfail) ∼ N(0, var) ⊗ Bernoulli(p)

◮ y not observed ⟹ a scheduling meeting causality constraints is possible: a Bayesian Network
◮ y observed ⟹ the distribution of (w, rootfail) is conditioned by (u, y) having the values observed: not a Bayesian Network
SLIDE 22
Example of programming we would like to support
The bottom line:
◮ Combining constraints and probabilities defines (posterior) conditional distributions
◮ State variables are visible for interaction
◮ Probabilistic variables are private and made visible only through the constraints relating them to the state variables
◮ Putting all of this into maths. . .
SLIDE 23
SLIDE 24
Factor Graphs + constraints = Mixed Systems
[Benveniste, Levy, Fabre, Le Guernic, TCS 1995]
SLIDE 25
Factor Graphs + constraints = Mixed Systems
Mixed System: S = (Ω, π, X, C), where:
◮ (Ω, π): private probability space (we address only discrete Ω)
◮ X: set of public variables with domain Q = ∏_{x∈X} Qx of states
◮ C ⊆ Ω × Q is a relation; write ωCq iff (ω, q) ∈ C

Intuition (figure): two components, C1(ω1, q1) over (Ω1, π1) and C2(ω2, q2) over (Ω2, π2), sharing variables x1, x2, x with dom(X) = Q; their composition carries C1(ω1, q1) ∧ C2(ω2, q2) over (Ω1×Ω2, π1 ⊗ π2). To sample: draw ω with π(· | ∃q.C).
SLIDE 26
Factor Graphs + constraints = Mixed Systems
Mixed System S = (Ω, π, X, C), with Ω discrete, as above.

Operational Semantics
◮ Let Ωc = {ω ∈ Ω | ∃q : ωCq} be the consistent subset of Ω; if π(Ωc) > 0, say that S is consistent and define

    πc(A) =def π(A | Ωc) = π(A ∩ Ωc) / π(Ωc)

◮ If S is consistent, it has an operational semantics called sampling, written S ⇝ q and defined by:
  1. draw ω at random using πc;
  2. for this ω, select nondeterministically q such that ωCq.
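For a finite Mixed System, this two-step sampling semantics can be sketched in Python; the dictionary/set encoding of (Ω, π) and C is an assumed representation, and the nondeterministic choice of step 2 is played here by a seeded random daemon:

```python
import random

def sample(pi, C, rng):
    """One sampling step S ⇝ q for a finite Mixed System.
    pi: dict omega -> probability; C: set of consistent pairs (omega, q)."""
    consistent = sorted({w for (w, q) in C})          # Omega_c
    mass = sum(pi[w] for w in consistent)
    if mass == 0:
        raise ValueError("inconsistent system: pi(Omega_c) = 0")
    # Step 1: draw omega at random using pi_c = pi(. | Omega_c).
    r, acc = rng.random() * mass, 0.0
    for w in consistent:
        acc += pi[w]
        if r < acc:
            omega = w
            break
    # Step 2: nondeterministically select q such that omega C q.
    return omega, rng.choice(sorted(q for (w, q) in C if w == omega))

# Example: "ok" forces q = 0; "fail" leaves q in {1, 2};
# "dead" is inconsistent (no q at all) and is conditioned away.
pi = {"ok": 0.5, "fail": 0.25, "dead": 0.25}
C = {("ok", 0), ("fail", 1), ("fail", 2)}
rng = random.Random(1)
draws = [sample(pi, C, rng) for _ in range(200)]
```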
SLIDE 27
Factor Graphs + constraints = Mixed Systems
Mixed System S = (Ω, π, X, C), as above:
◮ Ωc = {ω ∈ Ω | ∃q : ωCq}; πc = π(· | Ωc), provided π(Ωc) > 0
◮ S ⇝ q =def draw ω using πc, then pick some q such that ωCq

Compression and Equivalence, see [van de Meent et al. 2018]
◮ Program equivalence is extensively discussed in this reference; Mixed Systems provide a formalization
◮ The private space Ω can be too large ⟹ compress it: define ω ∼ ω′ iff ∀q. ωCq ⇔ ω′Cq; compress to Ω̄ = Ω/∼ with π̄(ω̄) = Σ_{ω∈ω̄} π(ω), and compress the relation C accordingly
◮ Equivalence: S′ ≡ S iff their compressions S̄′ and S̄ are isomorphic
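Compression can be sketched for the same finite encoding; representing each equivalence class ω̄ directly by its set of allowed states is an assumption of this sketch:

```python
from collections import defaultdict

def compress(pi, C):
    """Compress Omega by omega ~ omega' iff {q | omega C q} = {q | omega' C q}.
    Each class keeps the summed probability; the class itself is represented
    by the (frozen) set of states its members allow."""
    classes = defaultdict(list)
    for omega in pi:
        classes[frozenset(q for (w, q) in C if w == omega)].append(omega)
    pi_bar = {qs: sum(pi[w] for w in ws) for qs, ws in classes.items()}
    C_bar = {(qs, q) for qs in pi_bar for q in qs}     # compressed relation
    return pi_bar, C_bar

# w1 and w2 allow exactly the same states, so they compress into one class.
pi = {"w1": 0.25, "w2": 0.25, "w3": 0.5}
C = {("w1", 0), ("w2", 0), ("w3", 1)}
pi_bar, C_bar = compress(pi, C)
```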
SLIDE 28
Factor Graphs + constraints = Mixed Systems
Parallel composition: S1 × S2 = (Ω, π, X, C), where:
◮ (Ω, π) = (Ω1, π1) ⊗ (Ω2, π2) (independent)
◮ X = X1 ∪ X2; C = C1 ∧ C2
◮ S1′ ≡ S1 ⟹ S1′ × S2 ≡ S1 × S2 (≡ is a congruence)

(Figure: the components C1(ω1, q1) over (Ω1, π1) and C2(ω2, q2) over (Ω2, π2), sharing variables x1, x2, x, compose into C1(ω1, q1) ∧ C2(ω2, q2) over (Ω1×Ω2, π1 ⊗ π2).)
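A finite-state sketch of the product, under the assumed encoding of C as a list of (ω, q) pairs with q a dict of public variables; agreement on shared variables is what C1 ∧ C2 means here:

```python
def compose(S1, S2):
    """Parallel composition S1 x S2 of finite Mixed Systems.
    Each S = (pi, C): pi maps omega to its probability, and C is a list of
    pairs (omega, q) with q a dict mapping public variables to values."""
    (pi1, C1), (pi2, C2) = S1, S2
    pi = {(w1, w2): pi1[w1] * pi2[w2] for w1 in pi1 for w2 in pi2}  # pi1 ⊗ pi2
    C = []
    for (w1, q1) in C1:
        for (w2, q2) in C2:
            if all(q1[x] == q2[x] for x in q1.keys() & q2.keys()):  # C1 ∧ C2
                C.append(((w1, w2), {**q1, **q2}))
    return pi, C

# S1 draws x at random; S2 deterministically insists that x = 1 and sets y.
S1 = ({"a": 0.5, "b": 0.5}, [("a", {"x": 0}), ("b", {"x": 1})])
S2 = ({"*": 1.0}, [("*", {"x": 1, "y": 2})])
pi, C = compose(S1, S2)
# Only omega = ("b", "*") is consistent: the composition conditions S1 on x = 1.
```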
SLIDE 29
Factor Graphs + constraints = Mixed Systems
Parallel composition S1 × S2 = (Ω, π, X, C), as above; ≡ is a congruence.

◮ S = S1 and S2 and S3 and S4 (previous values shown in gray on the slide):

  S1: observed(u)
  S4: observed(y)
  S1: v = u + pre y
  S1: y = if fail then v + noise else v
  S2: noise = 0.9 ∗ pre noise + w
  S3: fail = rootfail ∧ ¬ pre backup
  S2: w ∼ N(0, var)
  S3: rootfail ∼ Bernoulli(p)

The semantics of S is the parallel composition of the semantics of S1, S2, S3, and S4.
SLIDE 30
Factor Graphs + constraints = Mixed Systems
Factor graph of S = S1 and S2 and S3 and S4: S3 and S1 interact via fail, and similarly for all interactions:

(Figure: factor graph with S3 —fail— S1, S2 —noise— S1, and S4 —y— S1.)

  S1: observed(u)
  S4: observed(y)
  S1: v = u + pre y
  S1: y = if fail then v + noise else v
  S2: noise = 0.9 ∗ pre noise + w
  S3: fail = rootfail ∧ ¬ pre backup
  S2: w ∼ N(0, var)
  S3: rootfail ∼ Bernoulli(p)

Parallel composition of mixed systems subsumes factor graphs.
SLIDE 31
Bayesian networks from Mixed Systems
Marginal MY(S): let S = (Ω, π, X, C) and Y ⊂ X.
1. Define MY(S) =def (Ω, π, Y, projY(C));
2. Projecting C over Y makes (Ω, π) possibly too large ⟹ compress if needed.
Subsumes marginals of distributions.

Lemma (incremental sampling)
If composing S1 with S2 does not affect S1, i.e., MX1(S1 × S2) ≡ S1, then S1 × S2 can be sampled incrementally, by
1. sampling S1 ⇝ q1, and
2. sampling S2 given q1, formally, by sampling (X1 = q1) × S2.
To indicate incremental sampling, we write S1 ×→ S2.
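A sketch of the marginal under the same assumed finite encoding of C as (ω, q) pairs with q a dict; the compression step of item 2 is deliberately omitted here:

```python
def marginal(S, Y):
    """M_Y(S): keep only the variables in Y, projecting the relation C over Y.
    S = (pi, C), with C a list of (omega, q) pairs and q a dict var -> value."""
    pi, C = S
    return pi, [(omega, {x: v for x, v in q.items() if x in Y}) for omega, q in C]

# Project a two-variable system onto Y = {"x"}; (Omega, pi) may become
# compressible afterwards (here "a" and "b" now allow the same projected q).
S = ({"a": 0.5, "b": 0.5},
     [("a", {"x": 0, "y": 1}), ("b", {"x": 0, "y": 2})])
piY, CY = marginal(S, {"x"})
```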
SLIDE 32
Bayesian networks from Mixed Systems
Bayesian network G = (S, E):
◮ DAG G = (S, E), with S a set of Mixed Systems and E directed edges; let ≺ be the partial order defined by this DAG;
◮ Say that G is a Bayesian network if, for all S ∈ S, the parallel composition (∏_{S′≺S} S′) ×→ S is incremental.

Example: S3 → fail → S1 ← noise ← S2

  S1: observed(u)
  S1: v = u + pre y
  S1: y = if fail then v + noise else v
  S2: noise = 0.9 ∗ pre noise + w
  S2: w ∼ N(0, var)
  S3: fail = rootfail ∧ ¬ pre backup
  S3: rootfail ∼ Bernoulli(p)
SLIDE 33
Bayesian networks from Mixed Systems
Revisiting Bayes’ rule p(x, y) = p(y|x) p(x)
◮ Message Passing algorithms from statistics and AI transform
  ◮ tree-shaped Factor Graphs
  ◮ into Bayesian Networks (⟹ incremental sampling)
◮ Key idea to extend this to Mixed Systems: regard
  ◮ the conditional distribution p(y|x)
  ◮ as a disjunction ⊔_x p(y|x) (nondeterministic choice among the alternatives for x)
SLIDE 34
Bayesian networks from Mixed Systems
Disjunction: ⊔_{i∈I} Si
Nondeterministic choice among the alternatives for i. Disjunction subsumes transition probabilities p(y|x).
SLIDE 35
Bayesian networks from Mixed Systems
Disjunction: ⊔_{i∈I} Si, and define S ×→ (S1 ⊔ S2) =def (S ×→ S1) ⊔ (S ×→ S2)

Conditional CY(S): let S = (Ω, π, X, C) and Y ⊂ X:

  CY(S) =def ⊔_{qY : MY(S) ⇝ qY} (Y = qY) × S    (i.e., S where Y = qY)

Theorem (Bayes formula)
  S ≡ MY(S) ×→ CY(S)
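The Bayes formula can be checked numerically in the purely probabilistic special case, on an ordinary finite joint distribution; the particular distribution p below is an illustrative assumption:

```python
from fractions import Fraction
from collections import defaultdict

# A small joint distribution p(x, y), with Y = {y} the marginalized variable.
p = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
     (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4)}

# Marginal M_Y: p(y) = sum over x of p(x, y).
pY = defaultdict(Fraction)
for (x, y), v in p.items():
    pY[y] += v

# Conditional C_Y: for each value of y, p(x | y) = p(x, y) / p(y).
pXgivenY = {(x, y): v / pY[y] for (x, y), v in p.items()}

# Bayes: sampling the marginal, then the conditional, recovers the joint.
recovered = {(x, y): pY[y] * pXgivenY[(x, y)] for (x, y) in p}
assert recovered == p
```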
SLIDE 36
Bayesian networks from Mixed Systems
Corollary (Message passing step)
Let S = (Ω, π, X, C) and S′ = (Ω′, π′, X′, C′), with Y = X ∩ X′. Then:
  S′ × S ≡ (S′ × MY(S)) ×→ CY(S)

Theorem (Message passing algorithm)
Let S = ∏_{i∈I} Si be a parallel composition of systems whose factor graph is a tree. Select a root node of this tree. Applying the message passing step inward, starting from the leaves toward the root, yields a Bayesian network with incremental sampling.

See [Loeliger 2004] (An introduction to Factor Graphs) for more on Factor Graphs, Bayesian Networks, and Message Passing in statistics and signal processing.
SLIDE 37
Bayesian networks from Mixed Systems
Example: factor graph of S1 and S2 and S3 and S4. The message passing algorithm yields the Bayesian network

                 Cy(S4)
                   ↑ y
  Cfail(S3) ← fail ← S̄1 → noise → Cnoise(S2)

where S̄1 = S1 × Mnoise(S2) × Mfail(S3) × My(S4).
SLIDE 38
SLIDE 39
Preliminaries to Mixed Automata
The idea:
Upgrade: automata ↗ probabilistic automata ↗ mixed automata:
1. automata: q −α→ q′
2. simple probabilistic automata (Segala–Lynch): q −α→ π′ ⇝ q′
3. mixed automata (to be defined / inherited from Mixed Systems):
  ◮ q −α→ S′ ⇝ q′
  ◮ parallel composition, equivalence (via compression)
  ◮ simulation relation
SLIDE 40
Preliminaries to Mixed Automata
To support q −α→ S′ ⇝ q′, extend Mixed Systems to

Mixed System with previous state: S = (Ω, π, X, p, C)
◮ (Ω, π): private probability space (we address only denumerable Ω)
◮ X: set of public variables with domain Q = ∏_{x∈X} Qx of states
◮ •Q: a copy of Q; p ∈ •Q is a parameter; write S(p)
◮ C ⊆ •Q × Ω × Q is a relation; write ωCq iff (p, ω, q) ∈ C
◮ Allows chaining transitions to form runs:

  q0 −→ S1 ⇝ q1 −→ S2 ⇝ q2 · · · qk−1 −→ Sk ⇝ qk
SLIDE 41
Preliminaries to Mixed Automata
To support simulation relations:
Define a simulation relation ≤ on pairs of states.
1. Automata: q1 −α→ q1′ and q1 ≤ q2 imply ∃q2′ : q2 −α→ q2′ and q1′ ≤ q2′
2. Probabilistic Automata: q1 −α→ π1′ and q1 ≤ q2 imply ∃π2′ : q2 −α→ π2′ and π1′ ≤P π2′,
   where ≤P is the lifting of ≤ to pairs of probabilistic states (q1′ ≤ q2′)
3. Mixed Automata: q1 −α→ S1′ and q1 ≤ q2 imply ∃S2′ : q2 −α→ S2′ and S1′ ≤S S2′,
   where ≤S is the lifting of ≤ to pairs of Mixed Systems (q1′ ≤ q2′)
SLIDE 42
Preliminaries to Mixed Automata
Lifting relations, from pairs of states to pairs of systems
Given a relation ρ ⊆ Q1×Q2, ρS ⊆ S(X1)×S(X2) is the lifting of ρ if there exists a weighting function w : Ω1×Ω2 → [0, 1] such that:
1. for all (ω1, ω2, q1): w(ω1, ω2) > 0 and ω1 C1 q1 imply ∃q2 : q1 ρ q2 and ω2 C2 q2;
2. the weighting function w projects to π1 and π2:
   Σ_{ω2} w(ω1, ω2) = π1(ω1) and Σ_{ω1} w(ω1, ω2) = π2(ω2).

Lemma (equivalence preserves lifting)
S1 ρS S2 and S1′ ≡ S1 imply S1′ ρS S2.
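The two weighting-function conditions can be checked mechanically for finite systems; in this sketch, the coin example, the relation rho, and the two candidate couplings are illustrative assumptions:

```python
def is_weighting(w, pi1, pi2, C1, C2, rho):
    """Check that w : Omega1 x Omega2 -> [0, 1] lifts rho (conditions 1 and 2).
    C1, C2: lists of (omega, q) pairs; rho: set of related state pairs (q1, q2)."""
    # Condition 2: w projects onto pi1 and pi2.
    ok1 = all(abs(sum(w.get((w1, w2), 0) for w2 in pi2) - pi1[w1]) < 1e-9
              for w1 in pi1)
    ok2 = all(abs(sum(w.get((w1, w2), 0) for w1 in pi1) - pi2[w2]) < 1e-9
              for w2 in pi2)
    if not (ok1 and ok2):
        return False
    # Condition 1: positive weight and omega1 C1 q1 imply some rho-related q2
    # with omega2 C2 q2.
    for (w1, w2), weight in w.items():
        if weight > 0:
            for (wc1, q1) in C1:
                if wc1 == w1 and not any(
                        wc2 == w2 and (q1, q2) in rho for (wc2, q2) in C2):
                    return False
    return True

# Two identical fair coins, related by the identity relation on states.
pi1 = pi2 = {"h": 0.5, "t": 0.5}
C1 = C2 = [("h", 1), ("t", 0)]
rho = {(1, 1), (0, 0)}
diag = {("h", "h"): 0.5, ("t", "t"): 0.5}          # diagonal coupling: works
indep = {(a, b): 0.25 for a in pi1 for b in pi2}   # independent coupling: fails
```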
SLIDE 43
Mixed Automata
Mixed automaton: M = (Σ, X, q0, →), with transitions q −α→ S′ ⇝ q′
◮ q −α→ S′ is the transition relation, α ∈ Σ; it is deterministic
◮ S′ has previous state q and X as its set of variables
◮ S′ ⇝ q′ is Mixed Systems sampling

Simulation: q1 −α→ S1′ and q1 ≤ q2 imply ∃S2′ : q2 −α→ S2′ and S1′ ≤S S2′ (lifting q1′ ≤ q2′)

Parallel composition: qi −αi→ Si′ (i = 1, 2) with α1 ⊓ α2 defined and α = α1 ⊔ α2 imply

  M1 × M2 : (q1, q2) −α→ S1′ × S2′ ⇝ (q1′, q2′)

Theorem: N1 ≤ M1 ⟹ N1 × M2 ≤ M1 × M2
SLIDE 44
Mixed Automata
Inheriting Bayes calculus from Mixed Systems:
◮ Mixed Automata inherit, from Mixed Systems, Bayes calculus in space; transition relations capture factor graphs and Bayesian networks.
◮ Mixed Automata, however, remain a causal model: the current transition depends on the past, not on the future.
◮ Consequently, Mixed Automata cannot be used to specify smoothing problems in time
  ◮ e.g., estimating zk based on X0, . . . , Xk, . . . , XN.
To overcome this, we must unfold time as space.
SLIDE 45
SLIDE 46
Stateless fragment of ReactiveBayes
Syntax:

  e  ::= c | x | ω | (e, e) | op(e) | f(e)
  ev ::= c | x | (ev, ev) | op(ev) | f(ev)
  S  ::= ω ∼ P(ev) | e = e | observe x | S and S

P(ev): probability distributions parameterized by visible expressions.

Semantics:

  [[observe x]] = (·, ·, {x}, x = c)
  [[ω ∼ P]] = (Ω, P, {xω}, xω = ω)
  [[ω ∼ P(ev(x1, . . . , xp))]] = ⊔_{ev(x1,...,xp)←c} [[ω ∼ P(c)]]
  [[e(x1, . . . , xp) = e′(xp+1, . . . , xp+m)]] = (·, ·, {x1, . . . , xp+m}, e = e′)
  [[S1 and S2]] = [[S1]] × [[S2]]

The factor graph of S follows.
SLIDE 47
Stateless fragment of ReactiveBayes
Compilation to extended Bayesian Networks
◮ The transition relation is compiled to an extended Bayesian Network →G[[S]]:
  ◮ S is observe x:                  →G[[S]] = {is_source(x)}
  ◮ S is ω ∼ P:                      →G[[S]] = {xω}
  ◮ S is ω ∼ P(ev(x1, . . . , xp)):  →G[[S]] = {x1, . . . , xp} → S → xω
  ◮ S is x = e(x1, . . . , xp):      →G[[S]] = {x1, . . . , xp} → S → x
  ◮ S is S1 and S2:                  →G[[S]] = →G[[S1]] ∪ →G[[S2]]
◮ The compilation succeeds if G is circuit-free.
◮ The resulting scheduling encodes incremental sampling.
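Circuit-freedom of the compiled graph G can be checked with a standard topological sort (Kahn's algorithm); the dependency graphs below, for a Noisegen-like statement and for a combinational loop, are illustrative assumptions:

```python
def is_circuitfree(edges):
    """Kahn's algorithm: compilation succeeds iff the dependency graph is
    circuit-free. edges maps each node to the set of its successors."""
    nodes = set(edges) | {m for succ in edges.values() for m in succ}
    indeg = {n: 0 for n in nodes}
    for n in edges:
        for m in edges[n]:
            indeg[m] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while ready:
        n = ready.pop()
        seen += 1                      # n is scheduled
        for m in edges.get(n, ()):
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return seen == len(nodes)          # all nodes scheduled <=> no circuit

# "noise = 0.9 * pre noise + w and w ~ normal(0, var)": pre noise is a source
# for the current reaction, so the graph schedules as pre_noise, w -> noise.
good = {"pre_noise": {"noise"}, "w": {"noise"}}
# A combinational loop x = f(y) and y = g(x) has a circuit: compilation fails.
bad = {"x": {"y"}, "y": {"x"}}
```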
SLIDE 48
Semantics of ReactiveBayes
Syntax (in red on the slide: additional items):

  e  ::= c | x | ω | (e, e) | op(e) | f(e) | pre x
  ev ::= c | x | (ev, ev) | op(ev) | f(ev) | pre x
  S  ::= ω ∼ P(ev) | e = e | observe x | S and S | init x = c
  α  ::= “true” occurrences of ev of type Bool
  A  ::= on α then S else S | init x = c | A and A

Semantics (for the additional items):

  [[pre x]] = ⊔_{pre x←c} (·, ·, {x}, c, ·)
  [[init x = c]] = (∅, {x}, c, ∅)
  [[on α then S1 else S2]] = ({α, ¬α}, X1 ∪ X2, ·) with p −α→ S1 and p −¬α→ S2,
    where S1 and S2 have previous state p
  [[A1 and A2]] = [[A1]] × [[A2]]
SLIDE 49
SLIDE 50
Comparison with Probabilistic Automata
Probabilistic Automata [Segala 94], [Segala & Lynch 03], tutorial [Sokolova & de Vink 04]: P = (Σ, Q, q0, →)

  Simple PA: → ⊆ Q × Σ × P(Q)
  PA:        → ⊆ Q × P(Σ × Q)

Theorem (SPA, PA, and Mixed Automata (MA))
1. There exists a mapping SPA → MA preserving simulation and product; a reverse mapping MA → SPA preserving simulation; and a reverse mapping preserving parallel composition.
2. There exists a mapping PA → MA preserving simulation. Parallel composition, however, is not preserved.
SLIDE 51
Comparison with Probabilistic Automata
What is the problem with parallel composition?
◮ SPA, → ⊆ Q × Σ × P(Q):
  ◮ synchronization and parallel composition interleave ⟹ no difficulty in defining parallel composition;
  ◮ however, it is impossible to describe any interaction among probability spaces ⟹ low expressive power.
◮ PA, → ⊆ Q × P(Σ × Q):
  ◮ more expressive; supports nondeterminism; the action α is probabilistically chosen;
  ◮ conflict between (1) the independent probabilistic choice of actions in each component, and (2) the need for synchronizing actions;
  ◮ various ways of solving this conflict; most authors add a scheduler giving hand to one component, itself randomly selected. Not appropriate for probabilistic programming.
SLIDE 52
(Non reactive) Probabilistic Programming
[J-W. van de Meent, B. Paige, H. Yang, F. Wood: An introduction to probabilistic programming, 2018]
◮ Mapping probabilistic programs to graphical models
  ◮ Bayesian Networks (≈ functions)
  ◮ or Factor Graphs (≈ relations), depending on the source language
◮ Long discussions of program equivalence (Sect. 3.1)
◮ Covers many extensions, including recursion in the language (mainly targeting causal stochastic processes)

Our positioning with Mixed Systems:
++ Our semantics blends Bayesian Networks and Factor Graphs
++ Equivalence through compression
− No recursion; the reactive case is covered (Mixed Automata)
− Supporting continuous probability distributions is technical
SLIDE 53
Reactive Probabilistic Programming
ProbZelus [Baudart et al. 2020]
◮ Reactive probabilistic programs ≈ Bayesian Network + functions
◮ No direct modeling of Factor Graphs; indirect support through the observe primitive, specifying inference problems

Our positioning with Mixed Automata:
+ We support both Bayesian Networks and Factor Graphs
+ Equivalence through compression
+ Mixed Automata ⊂ ProbZelus if we use only ×
SLIDE 54
SLIDE 55
Limitations of Mixed Automata and Fixes
Continuous probability distributions
◮ Doable but technical
◮ Difficulty: interaction between (Ω, π) and C via conditioning: quite often π(Ωc) = 0 for seemingly consistent models, which forbids the naive formula πc(A) = π(A ∩ Ωc) / π(Ωc) used in sampling
◮ Example: (X, Y) ∼ π with C stating that Y is observed ⟹ this fixes the value of Y, which has zero probability

Addressing this difficulty (in progress) by using:
◮ Conditional expectations E(f | G), where f : Ω → R+ and G is a sub-σ-algebra of F, the σ-algebra on Ω (these always exist)
◮ Regular versions of conditional expectations, which exist in restricted cases (covering the usual needs)
◮ Example: transition probability P(y, X)
SLIDE 56
Limitations of Mixed Automata and Fixes
Handling constraints
◮ Sampling: computing {q | ωCq}
◮ Compressions: computing ω ∼ ω′ iff ∀q. ωCq ⇔ ω′Cq
◮ Projections: computing projY(C)

Addressing this difficulty (to be done) by using abstractions:
◮ Sampling: restrict the computation of {q | ωCq} to easy cases (Boolean); otherwise make sure that ω → q is a function and use causality graphs
◮ Compressions: develop sufficient conditions showing S′ ≡ S
◮ Projections: use graph-based algorithms as abstractions
SLIDE 57
SLIDE 58
Conclusion
Mixed Systems
◮ They subsume factor graphs, Bayesian networks, and constraints
◮ They come with equivalence, parallel composition, and the lifting of relations from states to systems

Mixed Automata
◮ Built on top of Mixed Systems
◮ They subsume Probabilistic Automata regarding simulation relations, and Simple PA regarding parallel composition
◮ The parallel composition of PA differs from ours and does not allow describing interactions between probabilistic parts
SLIDE 59