
slide-1
SLIDE 1

A Denotational Semantics for Low-Level Probabilistic Programs with Nondeterminism

Di Wang¹, Jan Hoffmann¹, Thomas Reps²,³

¹ Carnegie Mellon University   ² University of Wisconsin   ³ GrammaTech, Inc.

slide-2
SLIDE 2

Probabilistic Programs

  • Draw random data from distributions
  • Condition control-flow at random

slide-3
SLIDE 3

Low-Level Probabilistic Programs

High-Level Features:

  • Functional (Borgström et al. 2016)
  • Higher-order (Ehrhard, Pagani, and Tasson 2018)
  • Recursive types (Vákár, Kammar, and Staton 2019)

Formal semantics has been well studied.

    ══ Compiler ══⇒

Low-Level Features:

  • Imperative
  • Unstructured control-flow

Operational semantics: (Ferrer Fioriti and Hermanns 2015)
Denotational semantics: This work

Benefits of a Denotational Semantics

  • Abstraction from details about program executions
  • Compositionality

slide-5
SLIDE 5

Low-Level Probabilistic Programs

Example

The following code implements a variant of geometric distributions.

    n ≔ 0;
    while prob(0.9) do
      n ≔ n + 1;
      if n ≥ 10 then break else continue fi
    od

There are multiple possible executions of the program, e.g., n could end up with 0, 3, or 10.

Principle

Probabilistic programs establish input/output-distribution relations. A probabilistic program can be modeled as a function in X → D(X), where X is a program state space and D(X) consists of probability distributions over X.
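
As a concrete illustration of the X → D(X) view, the following Python sketch computes the exact output distribution of the example program. The function name `denote`, the dictionary encoding of distributions, and the use of `Fraction` are illustrative assumptions, not part of the original slides.

```python
from fractions import Fraction

def denote(n_in, p=Fraction(9, 10), cap=10):
    """Output distribution (final n -> probability) of the example program.

    n_in is the initial value of n; it is ignored because the program
    starts with n := 0. The encoding is an assumption made for this sketch.
    """
    dist = {}
    mass = Fraction(1)   # probability of reaching the loop head with value n
    n = 0
    while True:
        # Loop guard prob(p) fails: the program exits with the current n.
        dist[n] = dist.get(n, Fraction(0)) + mass * (1 - p)
        # Loop guard holds: execute the body.
        mass *= p
        n += 1
        if n >= cap:     # break
            dist[n] = dist.get(n, Fraction(0)) + mass
            return dist

out = denote(n_in=42)
```

Here `out` assigns probability 0.1 × 0.9^k to each k in 0..9 and the remaining mass 0.9^10 to n = 10, so the values sum to 1 regardless of the input state.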

slide-8
SLIDE 8

Nondeterminism

Sources

  • Agents for Markov decision processes (MDPs)
  • Abstraction and refinement on programs

A Common Resolution

A nondeterministic function f from X to Y is a set-valued function that maps an input to a collection of outputs, i.e., f ∈ X → ℘(Y).

Nondeterminism in Probabilistic Programming

A nondeterministic function f from X to D(X) should have the signature f ∈ X → ℘(D(X)), where D(X) consists of probability distributions over X.

slide-11
SLIDE 11

When to Resolve Nondeterminism?

X is a program state space. D(X) consists of probability distributions over X.

The Common Resolution: Input Prior to Nondeterminism

f ∈ X → ℘(D(X))

What about: Nondeterminism Prior to Input?

f ∈ ℘(X → D(X))

Intuition: A nondeterministic program is a specification that models a collection of deterministic refinements.

slide-14
SLIDE 14

Nondeterminism-First: Nondeterminism Prior to Input

Example

Consider the following program P, where ⋆ represents nondeterminism.

    if prob(⋆) then t ≔ t + 1 else t ≔ t − 1 fi

The Common Resolution (⋆ resolved after t is given)

    t = 1  ↦  either { t′ = 2 w.p. 0.5, t′ = 0 w.p. 0.5 }
              or     { t′ = 2 w.p. 0.8, t′ = 0 w.p. 0.2 }

Nondeterminism-First (⋆ resolved before t is given)

    ⋆ resolved as 0.5:  t = 1  ↦  { t′ = 2 w.p. 0.5, t′ = 0 w.p. 0.5 }
    ⋆ resolved as 0.8:  t = 1  ↦  { t′ = 2 w.p. 0.8, t′ = 0 w.p. 0.2 }
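
The difference between the two signatures can be sketched in Python. This is a minimal illustration that assumes ⋆ ranges over just the two values 0.5 and 0.8 shown above; the names `step`, `common_resolution`, and `nondeterminism_first` are hypothetical.

```python
from fractions import Fraction
from itertools import product

CHOICES = [Fraction(1, 2), Fraction(4, 5)]   # illustrative resolutions of ⋆

def step(r, t):
    """Distribution of t' for: if prob(r) then t := t + 1 else t := t - 1 fi."""
    return ((t + 1, r), (t - 1, 1 - r))

def common_resolution(t):
    """Shape X -> P(D(X)): the input is known before ⋆ is resolved."""
    return {step(r, t) for r in CHOICES}

def nondeterminism_first():
    """Shape P(X -> D(X)): ⋆ is resolved once, before any input is seen."""
    return [lambda t, r=r: step(r, t) for r in CHOICES]

# Joint behaviors on the two inputs t = 1 and t = 2.
inputs = (1, 2)
joint_common = set(product(*(common_resolution(t) for t in inputs)))
joint_nd_first = {tuple(k(t) for t in inputs) for k in nondeterminism_first()}
```

Under these assumptions the common resolution admits four joint behaviors on the two inputs, because ⋆ may be resolved differently for each input, whereas nondeterminism-first admits only the two in which the same resolution is used for both.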

slide-17
SLIDE 17

Nondeterminism-First: What’s the Benefit?

Example

Consider the following program P, where ⋆ represents nondeterminism.

    if prob(⋆) then t ≔ t + 1 else t ≔ t − 1 fi

Relational Reasoning about Refinements of a Program

  • For all refinements P′ of P, for all t1, t2, can we prove that
    $\mathbb{E}_{t'_1 \sim P'(t_1),\, t'_2 \sim P'(t_2)}[t'_1 - t'_2] = t_1 - t_2$?
  • For all refinements P′ of P, for all t1, t2, does P′ exhibit similar execution time on t1 and t2?

slide-20
SLIDE 20

Contributions

  • We develop a denotational semantics for low-level probabilistic programs with unstructured control-flow, general recursion, and nondeterminism.
  • We study different resolutions for nondeterminism and propose a new model that involves nondeterminacy among state transformers.
  • We devise an algebraic framework for denotational semantics, which can be instantiated with different resolutions for nondeterminism.

slide-21
SLIDE 21

Outline

  • Motivation
  • Control-Flow Hyper-Graphs
  • Algebraic Denotational Semantics
  • Nondeterminism-First

slide-22
SLIDE 22

Representation of Low-Level Probabilistic Programs

[Figure, left: a standard CFG and an execution path, over nodes v0 to v5; edge labels include the guards [n=1] and [n mod 2=0] and the actions n≔3×n+1, n≔n/2, and i≔i+1.]

[Figure, right: a tree-like hyper-path over nodes v0 to v7, with edges labeled n≔n+1 and prob(0.5).]

Principle

For probabilistic programs, execution paths are not independent. A formal semantics should reason about distributions over paths.

slide-25
SLIDE 25

Paths vs. Hyper-Paths

Example

    if ⋆ then
      if prob(0.5) then t ≔ 0 else t ≔ 1 fi
    else
      if prob(0.8) then t ≔ 0 else t ≔ 1 fi
    fi

Paths Annotated with Probabilities

    path 1:  t′ = 0  (0.5)
    path 2:  t′ = 1  (0.5)
    path 3:  t′ = 0  (0.8)
    path 4:  t′ = 1  (0.2)

Hyper-Paths, each of which stands for a distribution

    hyper-path 1:  { t′ = 0 w.p. 0.5, t′ = 1 w.p. 0.5 }
    hyper-path 2:  { t′ = 0 w.p. 0.8, t′ = 1 w.p. 0.2 }
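
One way to see why individual paths are the wrong unit is to add up their annotations. The Python sketch below uses a hypothetical tuple encoding of the four paths above and groups them by the branch taken at ⋆.

```python
from fractions import Fraction

# (branch taken at ⋆, final value of t, probability annotated on the path)
paths = [
    ("then", 0, Fraction(1, 2)),
    ("then", 1, Fraction(1, 2)),
    ("else", 0, Fraction(4, 5)),
    ("else", 1, Fraction(1, 5)),
]

# Treating the four paths as one collection does not give a distribution.
total_over_paths = sum(p for _, _, p in paths)

# Grouping paths by the nondeterministic branch gives one hyper-path per
# resolution of ⋆, and each hyper-path is a distribution over final states.
hyper_paths = {}
for branch, t, p in paths:
    hyper_paths.setdefault(branch, {})[t] = p
```

The path annotations sum to 2, while each of the two hyper-paths sums to 1, which matches the reading that a hyper-path stands for a distribution.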

slide-28
SLIDE 28

Control-Flow Hyper-Graphs

  • Hyper-graphs are directed graphs with hyper-edges that could have multiple destinations. Hyper-paths are made up of hyper-edges.
  • The following hyper-graph

        v0 ─[n ≔ 0]→ v1
        v1 ─[prob(0.9)]→ (true: v2, false: v4)
        v2 ─[n ≔ n + 1]→ v3
        v3 ─[n ≥ 10]→ (true: v4, false: v1)

    represents the control-flow of the example program

        n ≔ 0; while prob(0.9) do n ≔ n + 1; if n ≥ 10 then break else continue fi od
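
A control-flow hyper-graph of this kind can be stored as a list of hyper-edges. The sketch below is one possible Python encoding; the `HyperEdge` class and helper names are assumptions, and the edge list follows the equation system S(v0), ..., S(v4) given later in the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HyperEdge:
    src: str
    label: str     # action or condition attached to the hyper-edge
    dsts: tuple    # one destination for actions; (true, false) for conditions

EDGES = [
    HyperEdge("v0", "n := 0", ("v1",)),
    HyperEdge("v1", "prob(0.9)", ("v2", "v4")),
    HyperEdge("v2", "n := n + 1", ("v3",)),
    HyperEdge("v3", "n >= 10", ("v4", "v1")),
]

def successors(node):
    """All destinations reachable from node through one hyper-edge."""
    return [d for e in EDGES if e.src == node for d in e.dsts]

def exit_nodes():
    """Nodes that appear as a destination but never as a source."""
    sources = {e.src for e in EDGES}
    return sorted({d for e in EDGES for d in e.dsts} - sources)
```

In this encoding each node is the source of at most one hyper-edge, and v4 is the only exit node.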
slide-31
SLIDE 31

An Algebraic Denotational Semantics

Goal

Develop a denotational semantics that can be instantiated with different resolutions of nondeterminism.

An Algebraic Approach

  • Perform reasoning in some abstract space of program states and state transformers.
  • The state transformers should obey some algebraic laws.
  • For example, the command skip should be interpreted as an identity element for sequencing in the algebra of transformers.

Outcome

The semantics is a good fit for developing static analyses (Wang, Hoffmann, and Reps 2018).

slide-34
SLIDE 34

The Algebra

Actions (skip, x ≔ x + 5, k ∼ Binomial(10, 0.5), ···)  ⟶  State Transformers M   (via the Semantic Function)

The state transformers M are equipped with sequencing ⊗, conditional-choice φ◇, and nondeterministic-choice ⩁.

The algebra ⟨M, ⊑, ⊗, φ◇, ⩁, ⊥, 1⟩:

  • ⟨M, ⊑⟩ forms a directed complete partial order (dcpo) with ⊥ as its least element.
  • ⟨M, ⊗, 1⟩ forms a monoid.
  • Nondeterministic-choice ⩁ is a semilattice operation.

slide-39
SLIDE 39

Fixpoint Semantics for Hyper-Graphs

Principle

The semantics of a node in the control-flow hyper-graph is a summary of computation that continues from that node.

Recall the control-flow hyper-graph of the example program

    n ≔ 0; while prob(0.9) do n ≔ n + 1; if n ≥ 10 then break else continue fi od

[Figure: control-flow hyper-graph over nodes v0 to v4, with hyper-edges labeled n ≔ 0, prob(0.9), n ≔ n + 1, and n ≥ 10.]

Semantics is defined as the least solution to the following equation system

    S(v0) = seq[n ≔ 0](S(v1))
    S(v1) = prob[0.9](S(v2), S(v4))
    S(v2) = seq[n ≔ n + 1](S(v3))
    S(v3) = cond[n ≥ 10](S(v4), S(v1))
    S(v4) = 1

slide-42
SLIDE 42

Fixpoint Semantics for Hyper-Graphs

Semantics is defined as the least solution to the following equation system

    S(v0) = seq[n ≔ 0](S(v1))
    S(v1) = prob[0.9](S(v2), S(v4))
    S(v2) = seq[n ≔ n + 1](S(v3))
    S(v3) = cond[n ≥ 10](S(v4), S(v1))
    S(v4) = 1

Use the algebra to reinterpret the equation system

\[
\begin{aligned}
S(v_0) &= [\![ n := 0 ]\!] \otimes S(v_1) \\
S(v_1) &= S(v_2) \mathbin{{}_{\mathrm{prob}(0.9)}\Diamond} S(v_4) \\
S(v_2) &= [\![ n := n + 1 ]\!] \otimes S(v_3) \\
S(v_3) &= S(v_4) \mathbin{{}_{n \ge 10}\Diamond} S(v_1) \\
S(v_4) &= 1
\end{aligned}
\]

where ⟦·⟧ maps actions into state transformers in M.
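
The least solution can be approximated from below by Kleene iteration, starting from ⊥ at every node. The following Python sketch does this for the example system, assuming the state is just the integer value of n and that state transformers are functions from a state to a dictionary of weighted outcomes; all function names are illustrative.

```python
from fractions import Fraction

def unit(x):
    return {x: Fraction(1)}          # the identity transformer 1

def bottom(x):
    return {}                        # the least element: the zero sub-distribution

def seq(f, g):
    """Sequencing: run f, then g on each intermediate state."""
    def h(x):
        out = {}
        for mid, p in f(x).items():
            for end, q in g(mid).items():
                out[end] = out.get(end, 0) + p * q
        return out
    return h

def cond(phi, f, g):
    """Conditional choice: f with weight phi(x), g with weight 1 - phi(x)."""
    def h(x):
        w, out = phi(x), {}
        if w > 0:
            for y, p in f(x).items():
                out[y] = out.get(y, 0) + w * p
        if w < 1:
            for y, q in g(x).items():
                out[y] = out.get(y, 0) + (1 - w) * q
        return out
    return h

def F(S):
    """One application of the right-hand sides of the equation system."""
    return {
        "v0": seq(lambda n: {0: Fraction(1)}, S["v1"]),
        "v1": cond(lambda n: Fraction(9, 10), S["v2"], S["v4"]),
        "v2": seq(lambda n: {n + 1: Fraction(1)}, S["v3"]),
        "v3": cond(lambda n: Fraction(int(n >= 10)), S["v4"], S["v1"]),
        "v4": unit,
    }

def kleene(rounds):
    S = {v: bottom for v in ("v0", "v1", "v2", "v3", "v4")}
    for _ in range(rounds):
        S = F(S)
    return S

closed_form = {k: Fraction(1, 10) * Fraction(9, 10) ** k for k in range(10)}
closed_form[10] = Fraction(9, 10) ** 10
```

Because the loop in this example is bounded, the iterates stabilize after finitely many rounds: `kleene(40)["v0"]` applied to any input coincides with `closed_form`, whereas early iterates such as `kleene(5)["v0"]` are strict sub-distributions.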

slide-44
SLIDE 44

A Denotational Semantics without Nondeterminism

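  • $X \triangleq \mathit{Var} \rightharpoonup_{\mathrm{fin}} \mathbb{Q}$ and $M \triangleq X \to D(X)$.
  • $D(X)$ stands for sub-probability distributions on $X$, i.e., $\Delta \in D(X)$ iff $\Delta : X \to [0, 1]$ and $\sum_{x \in X} \Delta(x) \le 1$.
  • For actions act, we have $[\![\mathit{act}]\!] \in M$.
  • For conditions φ, we have $[\![\varphi]\!] : X \to [0, 1]$, e.g., $[\![\mathrm{prob}(p)]\!] \triangleq \lambda\_.\,p$.
  • $f \sqsubseteq g \triangleq \forall x \in X : \forall x' \in X : f(x)(x') \le g(x)(x')$.
  • $f \otimes g \triangleq \lambda x.\lambda x''.\ \sum_{x' \in X} f(x)(x') \cdot g(x')(x'')$.
  • $f \mathbin{{}_{\varphi}\Diamond} g \triangleq \lambda x.\lambda x'.\ [\![\varphi]\!](x) \cdot f(x)(x') + (1 - [\![\varphi]\!](x)) \cdot g(x)(x')$.
  • $\bot \triangleq \lambda\_.\lambda\_.\,0$.
  • $1 \triangleq \lambda x.\,\delta(x)$ where the point distribution $\delta(x) \triangleq \lambda x'.\,[x = x']$.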
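
These definitions translate almost directly into code. The Python sketch below is a hedged illustration that assumes integer states (as if Var were a singleton) and sparse dictionaries for sub-probability distributions; the names `seq`, `cond`, `leq`, `inc`, and `coin` are hypothetical.

```python
from fractions import Fraction

def unit(x):
    """1 = λx. δ(x)."""
    return {x: Fraction(1)}

def bottom(x):
    """⊥ = λ_. λ_. 0, stored as an empty sparse dictionary."""
    return {}

def seq(f, g):
    """f ⊗ g: sum over intermediate states x' of f(x)(x') · g(x')(x'')."""
    def h(x):
        out = {}
        for mid, p in f(x).items():
            for end, q in g(mid).items():
                out[end] = out.get(end, 0) + p * q
        return out
    return h

def cond(phi, f, g):
    """Conditional choice: phi(x) · f(x) + (1 - phi(x)) · g(x)."""
    def h(x):
        w, out = phi(x), {}
        if w > 0:
            for y, p in f(x).items():
                out[y] = out.get(y, 0) + w * p
        if w < 1:
            for y, q in g(x).items():
                out[y] = out.get(y, 0) + (1 - w) * q
        return out
    return h

def leq(f, g, states):
    """f ⊑ g, checked pointwise on a finite sample of input states only."""
    return all(f(x).get(y, 0) <= g(x).get(y, 0)
               for x in states for y in set(f(x)) | set(g(x)))

# Example transformers: n := n + 1, and "with probability 0.9 increment, else skip".
inc = lambda n: {n + 1: Fraction(1)}
coin = cond(lambda _: Fraction(9, 10), inc, unit)
```

For example, `seq(coin, coin)(0)` yields the weights 0.81, 0.18, and 0.01 on the states 2, 1, and 0, and `unit` behaves as the identity for `seq` on the sampled states.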

slide-50
SLIDE 50

A Denotational Semantics without Nondeterminism

    n ≔ 0; while prob(0.9) do n ≔ n + 1; if n ≥ 10 then break else continue fi od

[Figure: control-flow hyper-graph of the program over nodes v0 to v4, with hyper-edges labeled n ≔ 0, prob(0.9), n ≔ n + 1, and n ≥ 10.]

Because Var = {n} is a singleton, we present the semantics as if $X \triangleq \mathbb{Z}$.

\[
S(v_0) = \lambda\_.\ \sum_{k=0}^{9} (0.1 \times 0.9^{k}) \cdot \delta(k) + 0.3486784401 \cdot \delta(10)
\]

δ(n0) represents a point distribution at n0.
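
As a sanity check on the constants, the weights in S(v0) form a full probability distribution, which is consistent with the program terminating with probability 1. The coefficient of δ(10) is 0.9^10, the probability that the loop guard succeeds ten times in a row:

```latex
\[
\sum_{k=0}^{9} 0.1 \times 0.9^{k} + 0.9^{10}
  = 0.1 \cdot \frac{1 - 0.9^{10}}{1 - 0.9} + 0.9^{10}
  = (1 - 0.9^{10}) + 0.9^{10}
  = 1,
\qquad 0.9^{10} = 0.3486784401 .
\]
```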

slide-53
SLIDE 53

A Denotational Semantics without Nondeterminism

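    n ≔ 0; while prob(0.9) do n ≔ n + 1; if n ≥ 10 then break else continue fi od

[Figure: control-flow hyper-graph of the program over nodes v0 to v4, with hyper-edges labeled n ≔ 0, prob(0.9), n ≔ n + 1, and n ≥ 10.]

Recall the equation S(v0) = ⟦n ≔ 0⟧ ⊗ S(v1). Obtain S(v0) from S(v1):

\[
\begin{aligned}
S(v_0) &= \lambda\_.\ \sum_{k=0}^{9} (0.1 \times 0.9^{k}) \cdot \delta(k) + 0.3486784401 \cdot \delta(10) \\
[\![ n := 0 ]\!] &= \lambda\_.\ \delta(0) \\
S(v_1) &= \lambda n.\ [n \ge 9] \cdot \bigl(0.1 \cdot \delta(n) + 0.9 \cdot \delta(n + 1)\bigr)
          + [n < 9] \cdot \sum_{k=n}^{\infty} (0.1 \times 0.9^{k-n}) \cdot \delta(\min\{k, 10\})
\end{aligned}
\]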

slide-57
SLIDE 57

Sub-Probability Kernels

Definition

A function κ : X → D(X) is called a sub-probability kernel. The set of kernels is denoted by K(X).

Goal

The common resolution for nondeterminism admits the following signature X → ℘(D(X)), while our nondeterminism-first model should have the following signature ℘(X → D(X)) ≡ ℘(K(X)).

slide-59
SLIDE 59

Reasoning with Nondeterminism-First

Example

Recall the following nondeterministic program P

    if prob(⋆) then t ≔ t + 1 else t ≔ t − 1 fi

Then the common resolution for nondeterminism derives

    λt.{r · δ(t + 1) + (1 − r) · δ(t − 1) | r ∈ [0, 1]},

but the nondeterminism-first model leads to

    {λt.r · δ(t + 1) + (1 − r) · δ(t − 1) | r ∈ [0, 1]}.

With the new model, we can prove that for every refinement P′ with ⋆ resolved as r ∈ [0, 1], for all t1, t2, we have

\[
\begin{aligned}
\mathbb{E}_{t'_1 \sim P'(t_1),\, t'_2 \sim P'(t_2)}[t'_1 - t'_2]
  &= \mathbb{E}_{t'_1 \sim P'(t_1)}[t'_1] - \mathbb{E}_{t'_2 \sim P'(t_2)}[t'_2] \\
  &= \bigl(r(t_1 + 1) + (1 - r)(t_1 - 1)\bigr) - \bigl(r(t_2 + 1) + (1 - r)(t_2 - 1)\bigr) \\
  &= t_1 - t_2 .
\end{aligned}
\]
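
The derivation can be checked numerically. The Python sketch below uses hypothetical helper names and exact rational arithmetic; it also shows what goes wrong if ⋆ is allowed to be resolved differently for the two inputs, as the common resolution permits.

```python
from fractions import Fraction

def expected_output(r, t):
    """E[t'] for: if prob(r) then t := t + 1 else t := t - 1 fi."""
    return r * (t + 1) + (1 - r) * (t - 1)

def expected_gap(r1, t1, r2, t2):
    """E[t1' - t2'] when ⋆ is resolved as r1 on input t1 and as r2 on input t2."""
    return expected_output(r1, t1) - expected_output(r2, t2)

# Nondeterminism-first: one resolution r is shared by both runs.
shared = expected_gap(Fraction(4, 5), 3, Fraction(4, 5), 1)

# Input-dependent resolution: r = 0.5 on the first run, r = 0.8 on the second.
mixed = expected_gap(Fraction(1, 2), 1, Fraction(4, 5), 1)
```

With a shared resolution the gap equals t1 − t2 (here 2), matching the derivation. With the mixed resolution on equal inputs the gap is 2(0.5 − 0.8) = −0.6 rather than 0, so the relational property is not guaranteed when ⋆ may depend on the input.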

slide-61
SLIDE 61

A Powerdomain for Nondeterminism-First

Necessary Conditions

We need to identify a subset 𝒜 of ℘(K(X)) as the collection of admissible semantic objects.

  • 𝒜 admits a semilattice operation ⩁ (used as nondeterministic-choice), s.t. for all A ∈ 𝒜, A ⩁ A = A.
  • 𝒜 is equipped with a conditional-choice operation ϕ◇, where ϕ : X → [0, 1] represents a Boolean-valued random variable.
  • For all A1, A2 ∈ 𝒜 and ϕ : X → [0, 1], if κ1 ∈ A1 and κ2 ∈ A2, then κ1 ϕ◇ κ2 should be in A1 ⩁ A2.

A Convexity-Like Condition

For all A ∈ 𝒜, we have A ⩁ A = A; therefore we should also have ∀ϕ ∈ X → [0, 1]: ∀κ1, κ2 ∈ A: κ1 ϕ◇ κ2 ∈ A.

slide-66
SLIDE 66

Generalized Convexity

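Let $\phi \cdot \kappa \triangleq \lambda x.\lambda x'.\ \phi(x) \cdot \kappa(x)(x')$ and $\kappa_1 + \kappa_2 \triangleq \lambda x.\lambda x'.\ \kappa_1(x)(x') + \kappa_2(x)(x')$. Then $\kappa_1 \mathbin{{}_{\phi}\Diamond} \kappa_2$ can be represented as $\phi \cdot \kappa_1 + (1 - \phi) \cdot \kappa_2$.

Definition

A subset A of K(X) is said to be g-convex if, for all sequences $\{\kappa_i\}_{i \in \mathbb{N}} \subseteq A$ and $\{\phi_i\}_{i \in \mathbb{N}} \subseteq X \to [0, 1]$ such that $\sum_{i=1}^{\infty} \phi_i = 1$, we have $\sum_{i=1}^{\infty} \phi_i \cdot \kappa_i \in A$.

Clearly, g-convexity of a set A implies that for all ϕ : X → [0, 1] and κ1, κ2 ∈ A, we have κ1 ϕ◇ κ2 ∈ A.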

slide-69
SLIDE 69

A G-Convex Powerdomain for Nondeterminism-First

Idea

Construct a Plotkin-style powerdomain on K(X), except that g-convexity replaces standard convexity in the development.

Example

Consider the following nondeterministic program P

    if ⋆ then t ≔ t + 1 else t ≔ t − 1 fi

Let the state space X ≜ ℤ represent the value of t. The common resolution for nondeterminism gives the following semantics

    λt.{r · δ(t + 1) + (1 − r) · δ(t − 1) | r ∈ [0, 1]},

while the nondeterminism-first resolution derives

    {λt.ϕ(t) · δ(t + 1) + (1 − ϕ(t)) · δ(t − 1) | ϕ ∈ ℤ → [0, 1]}.
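
The role of state-dependent weights can be made concrete with a small Python sketch. It implements a finite version of the g-convex combination (the definition allows countable sequences; this sketch assumes finitely many kernels), and the names `g_combine`, `inc`, `dec`, and `constant` are hypothetical.

```python
from fractions import Fraction

def inc(t):
    return {t + 1: Fraction(1)}      # kernel of t := t + 1

def dec(t):
    return {t - 1: Fraction(1)}      # kernel of t := t - 1

def g_combine(weights, kernels):
    """Finite g-convex combination: sum of phi_i · kappa_i, where the
    state-dependent weights phi_i : X -> [0, 1] sum to 1 at every state."""
    def kappa(x):
        ws = [phi(x) for phi in weights]
        assert sum(ws) == 1 and all(0 <= w <= 1 for w in ws)
        out = {}
        for w, k in zip(ws, kernels):
            for y, p in k(x).items():
                if w * p:
                    out[y] = out.get(y, 0) + w * p
        return out
    return kappa

def constant(r):
    return lambda t: r

# A state-dependent weight: increment on even t, decrement on odd t.
phi_even = lambda t: Fraction(int(t % 2 == 0))
phi_odd = lambda t: 1 - phi_even(t)
kappa = g_combine([phi_even, phi_odd], [inc, dec])

# An ordinary convex combination uses the same weight at every state.
half = g_combine([constant(Fraction(1, 2)), constant(Fraction(1, 2))], [inc, dec])
```

Here `kappa` maps 2 to 3 and 3 to 2 with certainty. No constant weight r reproduces both behaviors, since the first requires r = 1 and the second r = 0, which illustrates why the set in the nondeterminism-first semantics above is indexed by functions ϕ rather than by constants.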

slide-71
SLIDE 71

Summary

This Work

We have developed an algebraic framework for denotational semantics of low-level probabilistic programs, which can be instantiated with different models of nondeterminism, including the common resolution for nondeterminism and the new nondeterminism-first model.

Limitations and Future Work

  • The framework does not yet support continuous distributions.
  • We are looking for interesting applications of nondeterminism-first, especially for relational reasoning.
