Reactive Probabilistic Programming Semantics with Mixed Nondeterministic/Probabilistic Automata (PowerPoint PPT Presentation)



slide-1
SLIDE 1

Reactive Probabilistic Programming Semantics with Mixed Nondeterministic/Probabilistic Automata

Albert Benveniste Jean-Baptiste Raclet

INRIA Rennes and IRIT Toulouse

September 18, 2020

slide-2
SLIDE 2

What is Probabilistic Programming?

Bringing the inference algorithms and theory from statistics combined with formal semantics, compilers, and other tools from programming languages to build efficient inference evaluators for models and applications from Machine Learning. [...] Probabilistic programming is a tool for statistical modeling. (Fabiana Clemente)

slide-3
SLIDE 3

◮ Basic issues in probabilistic paradigms
◮ Approaches
  ◮ Statisticians and AI people
  ◮ Reactive Programming: ProbZelus
◮ ReactiveBayes minilanguage
◮ Factor Graphs + constraints = Mixed Systems
◮ Putting dynamics: Mixed Automata
  ◮ Preliminaries to Mixed Automata
  ◮ Mixed Automata
◮ ReactiveBayes and its semantics
◮ Discussion and Comparisons
  ◮ Probabilistic Automata
  ◮ (Non reactive) Probabilistic Programming
  ◮ Reactive Probabilistic Programming
◮ Limitations of Mixed Automata and Fixes
◮ Conclusion


slide-5
SLIDE 5

Basic issues in probabilistic paradigms

◮ Specifying a probabilistic system
  ◮ a distribution (Bernoulli, Gaussian, ...)
  ◮ a probabilistic dynamics (Markov chain, cyber-physical system subject to noise, safety analysis, ...)
◮ Estimating, learning, inferring
  ◮ model parameters
  ◮ black-box dynamics (deep learning)
◮ Statistical decision and classification
◮ Issues
  ◮ blending probabilities and nondeterminism
  ◮ modularity in the above tasks


slide-8
SLIDE 8

Requirements on Probabilistic Programming

◮ Probabilistic programming: offer a high-level language for the
  ◮ specification
  ◮ estimation
  ◮ decision/detection/classification
  of systems involving a mix of probabilities and nondeterminism
◮ Supporting important nontrivial constructions:
  ◮ Conditioning: π(A | B) =def π(A ∩ B) / π(B), provided that π(B) > 0
  ◮ Modularity in specification, estimation, and decision:
    ◮ Factor Graphs and Bayesian Networks (generalizations of Bayes' rule P(X, Y) = P(X) P(Y|X))
    ◮ Parallel composition
◮ Hosting libraries of algorithms for estimation and decision
◮ Providing a layered language for supporting all of this
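The conditioning construction can be exercised directly on a small discrete space. A minimal sketch (all names are illustrative), using exact rationals:

```python
from fractions import Fraction

def conditional(pi, A, B):
    """pi(A | B) = pi(A ∩ B) / pi(B), defined only when pi(B) > 0."""
    pB = sum(pi[w] for w in B)
    if pB == 0:
        raise ValueError("pi(B) = 0: conditional undefined")
    return sum(pi[w] for w in A & B) / pB

# Two fair coin flips; condition "first flip is heads" on "at least one heads".
omega = {("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")}
pi = {w: Fraction(1, 4) for w in omega}
A = {w for w in omega if w[0] == "H"}
B = {w for w in omega if "H" in w}
print(conditional(pi, A, B))  # 2/3
```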

slide-9
SLIDE 9

Requirements on Probabilistic Programming

◮ Factor Graphs: undirected
◮ Bayesian Networks: directed, for causal reasoning

slide-10
SLIDE 10

Advantages of a layered language

3 layers, each one specifying:

◮ a probabilistic system
  ◮ semantics, equivalence, rewriting rules
◮ a statistical problem (probability of some property, sampling, estimating, detecting, classifying, ...)
  ◮ semantics, equivalence, rewriting rules
◮ algorithms for solving statistical problems
  ∼ operational semantics


slide-12
SLIDE 12

Existing approaches by statisticians and AI people

Pragmatic proposals by statisticians and AI people:

◮ BUGS [Spiegelhalter 1994]: a software package for Bayesian inference using Gibbs sampling. "The software has been instrumental in raising awareness of Bayesian modelling among both academic and commercial communities internationally, and has enjoyed considerable success over its 20-year life span." (2009)
◮ Stan [Carpenter 2017]: "Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm."
◮ Probabilistic programming as part of the TensorFlow open source platform for machine learning

slide-13
SLIDE 13

Existing approaches by statisticians and AI people

◮ Programming languages for specifying
  ◮ Factor Graphs: undirected
  ◮ Bayesian Networks: directed, for causal reasoning
◮ Emphasis is on algorithms for performing Bayesian inference; decentralized algorithms with local computations only (Metropolis, MCMC, ...) in order to scale up

slide-14
SLIDE 14

Reactive Programming: ProbZelus [Baudart 2020]

A conservative extension of the Lucid Synchrone synchronous language with probabilistic primitives:

◮ x = sample d: declares random variable X with distribution d
◮ observe(d, y): estimates the likelihood of y wrt distribution d
◮ infer(m, obs): infers the distribution of outputs of model m based on observations of obs
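The three primitives can be mimicked outside any synchronous runtime. The sketch below is my own illustration (not ProbZelus code): infer is realized by self-normalized importance sampling, with observe accumulating log-likelihood weights.

```python
import math
import random

# Sketch of sample / observe / infer via self-normalized importance sampling.
def infer(model, obs, n_particles=5000, seed=0):
    rng = random.Random(seed)
    total_w = total_wx = 0.0
    for _ in range(n_particles):
        trace = {"logw": 0.0}
        def sample(draw):                 # x = sample d
            return draw(rng)
        def observe(logpdf, y):           # observe(d, y): weight by likelihood
            trace["logw"] += logpdf(y)
        x = model(sample, observe, obs)
        w = math.exp(trace["logw"])
        total_w += w
        total_wx += w * x
    return total_wx / total_w             # posterior mean of the model's output

# Toy model: latent x ~ N(0, 1); we observe y ~ N(x, 0.5).
def model(sample, observe, y):
    x = sample(lambda rng: rng.gauss(0.0, 1.0))
    observe(lambda v: -0.5 * ((v - x) / 0.5) ** 2, y)  # unnormalized log-density
    return x

post_mean = infer(model, obs=1.0)
# Exact posterior mean for this conjugate pair: y / (1 + 0.25) = 0.8.
```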


slide-16
SLIDE 16

Example of programming we would like to support

S1: system System(y0)
      observed u
      and init y = y0
      and v = u + pre y
      and y = if fail then v + noise else v

S2: system Noisegen(var)
      init noise = 0.0
      and noise = 0.9 * pre noise + w
      and w ~ normal(0, var)

S3: system Failure(p)
      init backup = false
      and fail = rootfail ∧ ¬ pre backup
      and rootfail ~ Bernoulli(p)

S4: system Sensor
      observed y

with parallel compositions of any of them; probabilistic statements x ~ ... are private

slide-17
SLIDE 17

Example of programming we would like to support

S1: system System(y0)
      observed u
      and init y = y0
      and v = u + pre y
      and y = if fail then v + noise else v

S4: system Sensor
      observed y

S1 and S4:

∀n: observed(un)
    vn = un + yn−1
    yn = if failn then (vn + noisen) else vn

◮ The semantics is a dynamical system with observed and unobserved signals; the traces of observed signals are fixed; the equations define relations.
◮ Intuition: the signal un can be seen as an input; failn and noisen as nondeterministic inputs (daemons).

slide-18
SLIDE 18

Example of programming we would like to support

S1: system System(y0)
      observed u
      and init y = y0
      and v = u + pre y
      and y = if fail then v + noise else v

S4: system Sensor
      observed y

S1 and S4:

∀n: observed(un, yn)
    vn = un + yn−1
    yn = if failn then (vn + noisen) else vn

◮ The semantics is a dynamical system with observed and unobserved signals; the traces of observed signals are fixed; the equations define relations.
◮ Intuition: the signal un can be seen as an input; failn and noisen as nondeterministic inputs (daemons). Output yn is measured.

slide-19
SLIDE 19

Example of programming we would like to support

S2: system Noisegen(var)
      init noise = 0.0
      and noise = 0.9 * pre noise + w
      and w ~ normal(0, var)

∀n: observed(none)
    noisen = 0.9 * noisen−1 + wn
    wn ~ N(0, var), with w i.i.d.

◮ The prior distribution of w is i.i.d. (independent, identically distributed), with the distribution as specified
◮ S2 is a probabilistic model for noise: it is an AR(1) time series
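S2 is an ordinary AR(1) recursion and can be simulated directly; the sketch below is a plain transcription of the program, with illustrative names.

```python
import random

def noisegen(var, n, seed=None):
    """noise_n = 0.9 * noise_{n-1} + w_n, w_n i.i.d. N(0, var); init noise = 0.0."""
    rng = random.Random(seed)
    noise, trace = 0.0, []
    for _ in range(n):
        w = rng.gauss(0.0, var ** 0.5)
        noise = 0.9 * noise + w
        trace.append(noise)
    return trace

trace = noisegen(var=1.0, n=1000, seed=42)
# The stationary variance of this AR(1) process is var / (1 - 0.81) ≈ 5.26.
```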

slide-20
SLIDE 20

Example of programming we would like to support

Semantics of S1 and S2 and S3 and S4:

S1: system System(y0)
      observed u
      and init y = y0
      and v = u + pre y
      and y = if fail then v + noise else v

S2: system Noisegen(var)
      init noise = 0.0
      and noise = 0.9 * pre noise + w
      and w ~ normal(0, var)

S3: system Failure(p)
      init backup = false
      and fail = rootfail ∧ ¬ pre backup
      and rootfail ~ Bernoulli(p)

S4: system Sensor
      observed y

slide-21
SLIDE 21

Example of programming we would like to support

Semantics of S1 and S2 and S3 and S4:

observed(u, y)
init (u, y, backup) = (u0, y0, backup0)
v     = u + pre y
y     = if fail then v + noise else v
noise = 0.9 * pre noise + w
fail  = rootfail ∧ ¬ pre backup
(w, rootfail) ~ N(0, var) ⊗ Bernoulli(p)

◮ y not observed ⟹ a scheduling meeting causality constraints is possible: a Bayesian Network
◮ y observed ⟹ the distribution of (w, rootfail) is conditioned by (u, y) having the observed values: not a Bayesian Network
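When y is not observed, the combined system can be sampled forward, every probabilistic variable drawn from its prior. A sketch (backup is kept constant at false, since the deck gives it no update equation):

```python
import random

def simulate(u_trace, var, p, y0=0.0, seed=None):
    rng = random.Random(seed)
    y, noise, backup = y0, 0.0, False  # init y = y0, noise = 0.0, backup = false
    ys = []
    for u in u_trace:
        w = rng.gauss(0.0, var ** 0.5)      # w ~ N(0, var)
        rootfail = rng.random() < p         # rootfail ~ Bernoulli(p)
        fail = rootfail and not backup      # fail = rootfail ∧ ¬ pre backup
        noise = 0.9 * noise + w             # noise = 0.9 * pre noise + w
        v = u + y                           # v = u + pre y
        y = v + noise if fail else v        # y = if fail then v + noise else v
        ys.append(y)
    return ys

ys = simulate([1.0] * 10, var=0.04, p=0.1, seed=1)
```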

slide-22
SLIDE 22

Example of programming we would like to support

The bottom line:
◮ combining constraints and probabilities defines (posterior) conditional distributions
◮ state variables are visible for interaction
◮ probabilistic variables are private, and made visible only through the constraints relating them to the state variables
◮ putting all of this into maths ...


slide-24
SLIDE 24

Factor Graphs + constraints = Mixed Systems

[Benveniste, Levy, Fabre, Le Guernic, TCS 1995]

slide-25
SLIDE 25

Factor Graphs + constraints = Mixed Systems

Mixed System: S = (Ω, π, X, C), where:
◮ (Ω, π): private probability space (we address only discrete Ω)
◮ X: set of public variables, with domain Q = ∏x∈X Qx of states
◮ C ⊆ Ω × Q is a relation; write ω C q iff (ω, q) ∈ C

Intuition:

[Figure: two mixed systems (Ω1, π1) with relation C1(ω1, q1) and (Ω2, π2) with relation C2(ω2, q2), over variables x1, x2, x with dom(X) = Q, composed into (Ω1 × Ω2, π1 ⊗ π2) with relation C1(ω1, q1) ∧ C2(ω2, q2)]

Draw ω with π(· | ∃q. C).

slide-26
SLIDE 26

Factor Graphs + constraints = Mixed Systems

Mixed System: S = (Ω, π, X, C), where:
◮ (Ω, π): private probability space (we address only discrete Ω)
◮ X: set of public variables, with domain Q = ∏x∈X Qx of states
◮ C ⊆ Ω × Q is a relation; write ω C q iff (ω, q) ∈ C

Operational Semantics
◮ Let Ωc = {ω ∈ Ω | ∃q : ω C q} be the consistent subset of Ω; if π(Ωc) > 0, say that S is consistent and define
    πc(A) =def π(A | Ωc) = π(A ∩ Ωc) / π(Ωc)
◮ If S is consistent, it has an operational semantics called sampling, written S ⇝ q and defined by:
  1. draw ω at random using πc
  2. for this ω, select nondeterministically q such that ω C q
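For finite Ω the two-step sampling can be written down directly; in this sketch (the data representation is mine) the leftover nondeterministic choice of q is resolved by an arbitrary pick.

```python
import random

def sample_mixed(pi, C, rng=None):
    """pi: dict omega -> probability; C: set of (omega, q) pairs.
    Step 1: draw omega from pi conditioned on the consistent subset Omega_c.
    Step 2: pick (here arbitrarily, via a random choice) some q with omega C q."""
    rng = rng or random.Random()
    consistent = {w for (w, _) in C}
    z = sum(p for w, p in pi.items() if w in consistent)
    if z == 0:
        raise ValueError("inconsistent mixed system: pi(Omega_c) = 0")
    ws = sorted(consistent)
    w = rng.choices(ws, weights=[pi[x] / z for x in ws])[0]
    qs = sorted(q for (w2, q) in C if w2 == w)
    return w, rng.choice(qs)

# omega is a fair coin, but the relation C rules out omega = 1 entirely:
# conditioning on consistency means sampling always returns omega = 0.
pi = {0: 0.5, 1: 0.5}
C = {(0, "a"), (0, "b")}
w, q = sample_mixed(pi, C, random.Random(7))
```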
slide-27
SLIDE 27

Factor Graphs + constraints = Mixed Systems

Mixed System: S = (Ω, π, X, C), where:
◮ (Ω, π): private probability space (we address only discrete Ω)
◮ X: set of public variables, with domain Q = ∏x∈X Qx of states
◮ C ⊆ Ω × Q is a relation; write ω C q iff (ω, q) ∈ C
◮ Ωc = {ω ∈ Ω | ∃q : ω C q}; πc = π(· | Ωc), provided π(Ωc) > 0
◮ S ⇝ q =def draw ω with πc, then select some q : ω C q

Compression and Equivalence, see [van de Meent et al. 2018]
◮ Program equivalence is extensively discussed in this reference; Mixed Systems provide a formalization
◮ The private space Ω can be too large ⟹ compress it: ω ∼ ω′ iff ∀q. ω C q ⇔ ω′ C q; compress to Ω̄ = Ω/∼ with π̄(ω̄) = Σω∈ω̄ π(ω), and compress the relation C accordingly
◮ Equivalence: S′ ≡ S iff S̄′ and S̄ are isomorphic
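Compression is a finite computation: group the ω's that admit exactly the same set of states and add up their probabilities. The dice example below (my own encoding) shows how much the private space can shrink.

```python
def compress(pi, C):
    """pi: omega -> prob; C: omega -> frozenset of admissible states.
    Merge omegas with identical admissible-state sets; sum their probabilities."""
    classes = {}
    for w, p in pi.items():
        key = C.get(w, frozenset())
        classes[key] = classes.get(key, 0.0) + p
    new_pi = {i: p for i, p in enumerate(classes.values())}
    new_C = {i: key for i, key in enumerate(classes.keys())}
    return new_pi, new_C

# Two hidden dice, but the relation only exposes their sum:
# 36 private outcomes compress to the 11 classes of equal sum.
pi = {(i, j): 1 / 36 for i in range(1, 7) for j in range(1, 7)}
C = {w: frozenset({sum(w)}) for w in pi}
new_pi, new_C = compress(pi, C)
# len(new_pi) == 11, and the class of sum 7 carries probability 6/36.
```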

slide-28
SLIDE 28

Factor Graphs + constraints = Mixed Systems

Parallel composition: S1 × S2 = (Ω, π, X, C), where:
◮ (Ω, π) = (Ω1, π1) ⊗ (Ω2, π2) (independent)
◮ X = X1 ∪ X2; C = C1 ∧ C2
◮ S′1 ≡ S1 ⟹ S′1 × S2 ≡ S1 × S2 (≡ is a congruence)

[Figure: the two systems (Ω1, π1, C1) and (Ω2, π2, C2) over variables x1, x2, x composed into (Ω1 × Ω2, π1 ⊗ π2) with relation C1(ω1, q1) ∧ C2(ω2, q2)]

slide-29
SLIDE 29

Factor Graphs + constraints = Mixed Systems

Parallel composition: S1 × S2 = (Ω, π, X, C), where:
◮ (Ω, π) = (Ω1, π1) ⊗ (Ω2, π2) (independent)
◮ X = X1 ∪ X2; C = C1 ∧ C2
◮ S′1 ≡ S1 ⟹ S′1 × S2 ≡ S1 × S2 (≡ is a congruence)
◮ S = S1 and S2 and S3 and S4 (previous values in gray on the slide):

S1: observed(u)
S4: observed(y)
S1: v = u + pre y
S1: y = if fail then v + noise else v
S2: noise = 0.9 * pre noise + w
S3: fail = rootfail ∧ ¬ pre backup
S2: w ~ N(0, var)
S3: rootfail ~ Bernoulli(p)

The semantics of S is the parallel composition of the semantics of S1, S2, S3, and S4.
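For finite systems the parallel composition is directly computable: product of the private spaces, union of the variables, and conjunction of the relations, which amounts to agreement on shared variables. All encodings below (states as dicts from variable names to values) are illustrative.

```python
def compose(pi1, C1, pi2, C2):
    """C maps each omega to the list of states (dicts var -> value) it admits."""
    pi = {(w1, w2): p1 * p2 for w1, p1 in pi1.items() for w2, p2 in pi2.items()}
    C = {}
    for (w1, w2) in pi:
        joint = [dict(q1, **q2)
                 for q1 in C1.get(w1, [])
                 for q2 in C2.get(w2, [])
                 if all(q1[x] == q2[x] for x in q1.keys() & q2.keys())]
        if joint:
            C[(w1, w2)] = joint
    return pi, C

# S1 is purely nondeterministic: x may be 0 or 1.  S2 is purely probabilistic:
# a fair coin omega with the constraint x = omega.  Composing them pins x.
pi1, C1 = {"*": 1.0}, {"*": [{"x": 0}, {"x": 1}]}
pi2, C2 = {0: 0.5, 1: 0.5}, {0: [{"x": 0}], 1: [{"x": 1}]}
pi, C = compose(pi1, C1, pi2, C2)
# Each joint omega now admits exactly one state: x pinned to the coin's value.
```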

slide-30
SLIDE 30

Factor Graphs + constraints = Mixed Systems

Factor graph of S = S1 and S2 and S3 and S4: S3 and S1 interact via fail, and similarly for all interactions:

[Figure: factor graph with S3 − fail − S1 − noise − S2 and S1 − y − S4]

S1: observed(u)
S4: observed(y)
S1: v = u + pre y
S1: y = if fail then v + noise else v
S2: noise = 0.9 * pre noise + w
S3: fail = rootfail ∧ ¬ pre backup
S2: w ~ N(0, var)
S3: rootfail ~ Bernoulli(p)

Parallel composition of mixed systems subsumes factor graphs.

slide-31
SLIDE 31

Bayesian networks from Mixed Systems

Marginal MY(S): Let S = (Ω, π, X, C) and Y ⊂ X.

1. Define MY(S) =def (Ω, π, Y, projY(C));
2. Projecting C over Y makes (Ω, π) possibly too large ⟹ compress if needed.

This subsumes marginals of distributions.

Lemma (incremental sampling)
If composing S1 with S2 does not affect S1, i.e., MX1(S1 × S2) ≡ S1, then S1 × S2 can be sampled incrementally, by
1. sampling S1 ⇝ q1, and
2. sampling S2 given q1, formally, by sampling (X1 = q1) × S2.
To indicate incremental sampling, we write S1 ⋉ S2.

slide-32
SLIDE 32

Bayesian networks from Mixed Systems

Bayesian network G = (S, E):

◮ a DAG G = (S, E), with S a set of Mixed Systems and E directed edges; let ≺ be the partial order defined by this DAG;
◮ say that G is a Bayesian network if, for all S ∈ S, the parallel composition (×S′≺S S′) ⋉ S is incremental.

Example: S3 → fail → S1 ← noise ← S2

S1: observed(u)
S1: v = u + pre y
S1: y = if fail then v + noise else v
S2: noise = 0.9 * pre noise + w
S2: w ~ N(0, var)
S3: fail = rootfail ∧ ¬ pre backup
S3: rootfail ~ Bernoulli(p)

slide-33
SLIDE 33

Bayesian networks from Mixed Systems

Revisiting Bayes' rule p(x, y) = p(y|x) p(x)
◮ Message Passing algorithms from statistics and AI transform
  ◮ tree-shaped Factor Graphs
  ◮ into Bayesian Networks (⟹ incremental sampling)
◮ Key idea to extend this to Mixed Systems: regard
  ◮ the conditional distribution p(y|x)
  ◮ as a disjunction ⊔x p(y|x) (a nondeterministic choice among the alternatives for x)

slide-34
SLIDE 34

Bayesian networks from Mixed Systems

Disjunction: ⊔i∈I Si

A nondeterministic choice among the alternatives for i. Disjunction subsumes transition probabilities p(y|x).

slide-35
SLIDE 35

Bayesian networks from Mixed Systems

Disjunction: ⊔i∈I Si, and define S ⋉ (S1 ⊔ S2) =def (S ⋉ S1) ⊔ (S ⋉ S2)

Conditional CY(S): Let S = (Ω, π, X, C) and Y ⊂ X:

CY(S) =def ⊔ over all qY such that MY(S) ⇝ qY of (Y = qY) × S, i.e., of "S where Y = qY"

Theorem (Bayes formula)
S ≡ MY(S) ⋉ CY(S)

slide-36
SLIDE 36

Bayesian networks from Mixed Systems

Corollary (Message passing step)
Let S = (Ω, π, X, C) and S′ = (Ω′, π′, X′, C′), with Y = X ∩ X′. Then: S′ × S ≡ (S′ × MY(S)) ⋉ CY(S)

Theorem (Message passing algorithm)
Let S = ×i∈I Si be a parallel composition of systems whose factor graph is a tree. Select a root node of this tree. Applying the message passing step inward, starting from the leaves toward the root, yields a Bayesian network with incremental sampling.

See [Loeliger 2004] (An introduction to Factor Graphs) for more information on Factor Graphs, Bayesian Networks, and Message Passing in statistics and signal processing.

slide-37
SLIDE 37

Bayesian networks from Mixed Systems

Example: factor graph of S1 and S2 and S3 and S4

[Figure: factor graph with S3 − fail − S1 − noise − S2 and S1 − y − S4]

The message passing algorithm yields the Bayesian network

  Cfail(S3) ← fail ← S̄1 → noise → Cnoise(S2), with S̄1 → y → Cy(S4)

where S̄1 = S1 × Mnoise(S2) × Mfail(S3) × My(S4)


slide-39
SLIDE 39

Preliminaries to Mixed Automata

The idea: upgrade automata ↗ probabilistic automata ↗ mixed automata:

1. automata: q −α→ q′
2. simple probabilistic automata (Segala-Lynch): q −α→ π′ ⇝ q′
3. mixed automata (to be defined / inherited from Mixed Systems):
  ◮ q −α→ S′ ⇝ q′
  ◮ parallel composition, equivalence (via compression)
  ◮ simulation relation

slide-40
SLIDE 40

Preliminaries to Mixed Automata

To support q −α→ S′ ⇝ q′, extend Mixed Systems to

Mixed System with previous state: S = (Ω, π, X, p, C)
◮ (Ω, π): private probability space (we address only denumerable Ω)
◮ X: set of public variables, with domain Q = ∏x∈X Qx of states
◮ •Q a copy of Q; p ∈ •Q is a parameter; write S(p)
◮ C ⊆ •Q × Ω × Q is a relation; write ω C q iff (p, ω, q) ∈ C
◮ This allows chaining transitions to form runs:
  q0 −1→ S1 ⇝ q1 −2→ S2 ⇝ q2 ... qk−1 −k→ Sk ⇝ qk

slide-41
SLIDE 41

Preliminaries to Mixed Automata

To support simulation relations:

Define a simulation relation ≤ on pairs of states.

1. Automata: if q1 −α→ q′1 and q1 ≤ q2, then there exists q′2 such that q2 −α→ q′2 and q′1 ≤ q′2.

2. Probabilistic Automata: if q1 −α→ π′1 and q1 ≤ q2, then there exists π′2 such that q2 −α→ π′2 and π′1 ≤P π′2, where ≤P is the lifting of ≤ (on pairs of states q′1 ≤ q′2) to pairs of probabilistic states.

3. Mixed Automata: if q1 −α→ S′1 and q1 ≤ q2, then there exists S′2 such that q2 −α→ S′2 and S′1 ≤S S′2, where ≤S is the lifting of ≤ to pairs of Mixed Systems.

slide-42
SLIDE 42

Preliminaries to Mixed Automata

Lifting relations, from pairs of states to pairs of systems

Given a relation ρ ⊆ Q1 × Q2, ρS ⊆ S(X1) × S(X2) is the lifting of ρ if there exists a weighting function w : Ω1 × Ω2 → [0, 1] such that:

1. ∀(ω1, ω2, q1): if w(ω1, ω2) > 0 and ω1 C1 q1, then ∃q2 : q1 ρ q2 and ω2 C2 q2
2. The weighting function w projects to π1 and π2:
   Σω2 w(ω1, ω2) = π1(ω1) and Σω1 w(ω1, ω2) = π2(ω2)

Lemma (equivalence preserves lifting)
If S1 ρS S2 and S′1 ≡ S1, then S′1 ρS S2.
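Condition 2 says that the weighting function is a coupling of π1 and π2 (its marginals are π1 and π2); that part is mechanical to check, as in this sketch (the encodings are mine). Condition 1 would additionally be checked against C1, C2, and ρ.

```python
def is_coupling(w, pi1, pi2, tol=1e-9):
    """Check that w: Omega1 x Omega2 -> [0, 1] projects onto pi1 and pi2."""
    ok1 = all(abs(sum(w.get((w1, w2), 0.0) for w2 in pi2) - p1) < tol
              for w1, p1 in pi1.items())
    ok2 = all(abs(sum(w.get((w1, w2), 0.0) for w1 in pi1) - p2) < tol
              for w2, p2 in pi2.items())
    return ok1 and ok2

pi1 = {"a": 0.5, "b": 0.5}
pi2 = {0: 0.5, 1: 0.5}
diag = {("a", 0): 0.5, ("b", 1): 0.5}  # the "diagonal" coupling of two fair coins
bad = {("a", 0): 1.0}                  # projects to neither marginal
# is_coupling(diag, pi1, pi2) holds; is_coupling(bad, pi1, pi2) does not.
```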

slide-43
SLIDE 43

Mixed Automata

Mixed automaton: M = (Σ, X, q0, →), with transitions q −α→ S′ ⇝ q′
◮ q −α→ S′ is the transition relation, α ∈ Σ; deterministic
◮ S′ has previous state q and X as its set of variables
◮ S′ ⇝ q′ is Mixed Systems sampling

Simulation: if q1 −α→ S′1 and q1 ≤ q2, then there exists S′2 such that q2 −α→ S′2 and S′1 ≤S S′2 (lifting q′1 ≤ q′2).

Parallel composition: if qi −αi→ S′i for i = 1, 2 and α1 ⊓ α2 is defined, with α = α1 ⊔ α2, then in M1 × M2: (q1, q2) −α→ S′1 × S′2 ⇝ (q′1, q′2).

Theorem: N1 ≤ M1 ⟹ N1 × M2 ≤ M1 × M2

slide-44
SLIDE 44

Mixed Automata

Inheriting Bayes Calculus from Mixed Systems:

◮ Mixed Automata inherit, from Mixed Systems, Bayes Calculus in space; transition relations capture factor graphs and Bayesian networks.
◮ Mixed Automata, however, remain a causal model: the current transition depends on the past, not on the future.
◮ Consequently, Mixed Automata cannot be used to specify smoothing problems in time, e.g., estimating zk based on X0, ..., Xk, ..., XN. To overcome this, we must unfold time as space.


slide-46
SLIDE 46

Stateless fragment of ReactiveBayes

Syntax:

  e  ::= c | x | ω | (e, e) | op(e) | f(e)
  ev ::= c | x | (ev, ev) | op(ev) | f(ev)
  S  ::= ω ∼ P(ev) | e = e | observe x | S and S

P(ev): probability distributions parameterized by visible expressions

Semantics:

  [[observe x]] = (·, ·, {x}, x = c)
  [[ω ∼ P]] = (Ω, P, {xω}, xω = ω)
  [[ω ∼ P(ev(x1, ..., xp))]] = ⊔ over ev(x1,...,xp)←c of [[ω ∼ P(c)]]
  [[e(x1, ..., xp) = e′(xp+1, ..., xp+m)]] = (·, ·, {x1, ..., xp+m}, e = e′)
  [[S1 and S2]] = [[S1]] × [[S2]]

The factor graph of S follows.

slide-47
SLIDE 47

Stateless fragment of ReactiveBayes

Compilation to extended Bayesian Networks

◮ The transition relation is compiled to an extended Bayesian Network G:

  S is observe x                  G[[S]] = {is_source(x)}
  S is ω ∼ P                      G[[S]] = {xω}
  S is ω ∼ P(ev(x1, ..., xp))     G[[S]] = {x1, ..., xp} → S → xω
  S is x = e(x1, ..., xp)         G[[S]] = {x1, ..., xp} → S → x
  S is S1 and S2                  G[[S]] = G[[S1]] ∪ G[[S2]]

◮ The compilation succeeds if G is circuit-free.
◮ The resulting scheduling encodes incremental sampling.
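Circuit-freeness and a sampling order can both be obtained by one depth-first topological sort; the dependency graph below is hand-written for the running example, not produced by an actual compiler.

```python
# Topological sort: returns a sampling order, or None if the graph has a circuit.
def schedule(deps):
    """deps: node -> set of nodes it reads (its parents)."""
    order, state = [], {}           # state: 0 = visiting, 1 = done
    def visit(n):
        if state.get(n) == 1:
            return True
        if state.get(n) == 0:
            return False            # back edge: circuit found
        state[n] = 0
        if not all(visit(m) for m in deps.get(n, ())):
            return False
        state[n] = 1
        order.append(n)
        return True
    return order if all(visit(n) for n in deps) else None

# fail reads rootfail; y reads fail, noise, and v; and so on.
deps = {"v": {"u"}, "y": {"fail", "noise", "v"},
        "noise": {"w"}, "fail": {"rootfail"},
        "u": set(), "w": set(), "rootfail": set()}
order = schedule(deps)
# A valid order samples u, w, rootfail first, then v, noise, fail, then y.
```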

slide-48
SLIDE 48

Semantics of ReactiveBayes

Syntax (additional items relative to the stateless fragment, shown in red on the slide):

  e  ::= c | x | ω | (e, e) | op(e) | f(e) | pre x
  ev ::= c | x | (ev, ev) | op(ev) | f(ev) | pre x
  S  ::= ω ∼ P(ev) | e = e | observe x | S and S | init x = c
  α  ::= "true" occurrences of ev of type Bool
  A  ::= on α then S else S | init x = c | A and A

Semantics (for the additional items):

  [[pre x]] = ⊔ over pre x←c of (·, ·, {x}, c, ·)
  [[init x = c]] = (∅, {x}, c, ∅)
  [[on α then S1 else S2]] = ({α, ¬α}, X1 ∪ X2, ·) with transitions p −α→ S1 and p −¬α→ S2, where S1 and S2 have previous state p
  [[A1 and A2]] = [[A1]] × [[A2]]


slide-50
SLIDE 50

Comparison with Probabilistic Automata

Probabilistic Automata: [Segala 94], [Segala & Lynch 03], tutorial [Sokolova & de Vink 04]

P = (Σ, Q, q0, →)
  Simple PA: → ⊆ Q × Σ × P(Q)
  PA:        → ⊆ Q × P(Σ × Q)

Theorem (SPA, PA, and Mixed Automata (MA))

1. There exists a mapping SPA → MA preserving simulation and product. There exists a reverse mapping MA → SPA preserving simulation, and a reverse mapping preserving parallel composition.
2. There exists a mapping PA → MA preserving simulation. Parallel composition, however, is not preserved.

slide-51
SLIDE 51

Comparison with Probabilistic Automata

What is the problem with parallel composition?

◮ SPA, → ⊆ Q × Σ × P(Q):
  ◮ synchronization and parallel composition interleave ⟹ no difficulty in defining parallel composition;
  ◮ however, it is impossible to describe any interaction among probability spaces ⟹ low expressive power.
◮ PA, → ⊆ Q × P(Σ × Q):
  ◮ more expressive; supports nondeterminism; the action α is probabilistically chosen;
  ◮ conflict between (1) independent probabilistic choice of actions in each component, and (2) the need for synchronizing actions;
  ◮ various ways of solving this conflict; most authors add a scheduler giving hand to one component that is itself randomly selected. Not appropriate for probabilistic programming.
slide-52
SLIDE 52

(Non reactive) Probabilistic Programming

[J-W. van de Meent, B. Paige, H. Yang, F. Wood: An Introduction to Probabilistic Programming, 2018]

◮ Mapping probabilistic programs to graphical models:
  ◮ Bayesian Networks (≈ functions)
  ◮ or Factor Graphs (≈ relations), depending on the source language
◮ Long discussions about program equivalence (Sect. 3.1)
◮ Covers many extensions, including recursion in the language (mainly targeting causal stochastic processes)

Our positioning with Mixed Systems:
++ our semantics blends Bayesian Networks and Factor Graphs
++ equivalence through compression
− no recursion; the reactive case is covered (by Mixed Automata)
− supporting continuous probability distributions is technical

slide-53
SLIDE 53

Reactive Probabilistic Programming

ProbZelus [Baudart et al. 2020]
◮ Reactive probabilistic programs ≈ Bayesian Networks + functions
◮ No direct modeling of Factor Graphs; indirect support through the observe primitive, which specifies inference problems

Our positioning with Mixed Automata:
+ we support both Bayesian Networks and Factor Graphs
+ equivalence through compression
+ Mixed Automata ⊂ ProbZelus if we use only ×


slide-55
SLIDE 55

Limitations of Mixed Automata and Fixes

Continuous probability distributions
◮ Doable but technical
◮ Difficulty: interaction between (Ω, π) and C via conditioning: quite often π(Ωc) = 0 for seemingly consistent models, which forbids the naive definition πc(A) = π(A ∩ Ωc) / π(Ωc) used in sampling
◮ Example: (X, Y) ∼ π with C stating that Y is observed ⟹ this fixes the value of Y, which has zero probability

Addressing this difficulty (in progress) by using:
◮ conditional expectations E(f | G), where f : Ω → R+ and G is a sub-σ-algebra of F, the σ-algebra on Ω (these always exist)
◮ regular versions of conditional expectations, which exist in restricted cases (covering usual needs); example: transition probabilities P(y, X)
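The zero-probability observation and the standard density-based workaround can be seen on a one-line model: for X ~ N(0, 1) and Y = X + N(0, 1), the event Y = y has probability zero, but weighting each draw of X by the density of y at X recovers the regular conditional distribution. The sketch below is an illustration of that workaround, not the paper's construction; the exact answer here is E[X | Y = 1] = 1/2.

```python
import math
import random

def posterior_mean_x_given_y(y, n=20000, seed=0):
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)              # X ~ N(0, 1)
        w = math.exp(-0.5 * (y - x) ** 2)    # density of Y = X + N(0, 1) at y
        num += w * x
        den += w
    return num / den

m = posterior_mean_x_given_y(1.0)
# Estimate is close to the exact regular conditional expectation 1/2.
```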

slide-56
SLIDE 56

Limitations of Mixed Automata and Fixes

Handling constraints
◮ Sampling: computing {q | ω C q}
◮ Compressions: computing ω ∼ ω′ iff ∀q. ω C q ⇔ ω′ C q
◮ Projections: computing projY(C)

Addressing this difficulty (to be done) by using abstractions:
◮ Sampling: restrict computing {q | ω C q} to easy cases (Boolean); otherwise make sure that ω → q is a function and use causality graphs
◮ Compressions: develop sufficient conditions showing S′ ≡ S
◮ Projections: use graph-based algorithms as abstractions


slide-58
SLIDE 58

Conclusion

Mixed Systems
◮ They subsume factor graphs, Bayesian networks, and constraints
◮ They come with equivalence, parallel composition, and the lifting of relations from states to systems

Mixed Automata
◮ Built on top of Mixed Systems
◮ They subsume Probabilistic Automata regarding simulation relations, and Simple PA regarding parallel composition
◮ The parallel composition of PA differs from ours and does not allow describing interactions between probabilistic parts

slide-59
SLIDE 59

Conclusion

Support for Probabilistic Programming?
◮ Mixed Systems provide the concepts needed for basic probabilistic programming (without recursion)
◮ Mixed Automata provide the concepts needed for reactive probabilistic programming

Further work
◮ Define abstractions to make the basic properties and operations of Mixed Systems and Mixed Automata effective, thus ensuring that they can become compilation steps