SLIDE 1

1

Symbolic Bayesian inference by lazy partial evaluation

Chung-chieh Shan (Indiana University) Norman Ramsey (Tufts University) November 2015

This research was supported by DARPA grants FA8750-14-2-0007 and FA8750-14-C-0002, NSF grant CNS-0723054, Lilly Endowment, Inc. (through its support for the Indiana University Pervasive Technology Institute), and the Indiana METACyt Initiative. The Indiana METACyt Initiative at IU is also supported in part by Lilly Endowment, Inc.

SLIDE 2

2

Program transformations galore

[Diagram of program transformations. Labels: Expectation, Total, Computer algebra, Normalization, Disintegration, Simplification, Conditioning, Density, Exact inference, Gibbs sampling, MH sampling]

SLIDE 6

3

Disintegration for medical diagnosis

Diseases A and B are equally prevalent.
Disease A causes one of symptoms 1, 2, 3 with equal probability.
Disease B causes one of symptoms 1, 2 with equal probability.

do {disease ← ⟨A → 1/2, B → 1/2⟩;
    symptom ← case disease of A → ⟨1 → 1/3, 2 → 1/3, 3 → 1/3⟩
                              B → ⟨1 → 1/2, 2 → 1/2⟩;
    return (symptom, disease)}

: M (Symptom × Disease) = ⟨(1, A) → 1/6, (2, A) → 1/6, (3, A) → 1/6, (1, B) → 1/4, (2, B) → 1/4⟩

⇒ λsymptom. case symptom of 1 → ⟨A → 1/6, B → 1/4⟩
                            2 → ⟨A → 1/6, B → 1/4⟩
                            3 → ⟨A → 1/6⟩

: Symptom → M Disease

Joint measure (rows: disease, columns: symptom):

      1     2     3
A    1/6   1/6   1/6
B    1/4   1/4
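The discrete case can be replayed by hand in a few lines. The following is a minimal Python sketch, not the authors' implementation: the names `prior`, `likelihood`, `joint`, `disintegrate`, and `normalize` are chosen here for illustration. It builds the joint measure over (symptom, disease) pairs and then groups it by symptom, which is all that disintegration amounts to when the observation has nonzero probability.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed encoding of the slide's model: a prior over diseases and,
# for each disease, a distribution over symptoms.
prior = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
likelihood = {
    "A": {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)},
    "B": {1: Fraction(1, 2), 2: Fraction(1, 2)},
}


def joint():
    """Joint measure over (symptom, disease) pairs."""
    return {
        (symptom, disease): prior[disease] * p
        for disease in prior
        for symptom, p in likelihood[disease].items()
    }


def disintegrate(joint_measure):
    """Group the joint measure by symptom: Symptom -> (unnormalized) M Disease."""
    by_symptom = defaultdict(dict)
    for (symptom, disease), weight in joint_measure.items():
        by_symptom[symptom][disease] = weight
    return dict(by_symptom)


def normalize(measure):
    """Rescale an unnormalized measure so that its weights sum to 1."""
    total = sum(measure.values())
    return {outcome: weight / total for outcome, weight in measure.items()}


kernel = disintegrate(joint())
print(kernel[1])             # unnormalized measure over diseases for symptom 1
print(normalize(kernel[1]))  # the corresponding posterior distribution
```

The measures returned by `disintegrate` are deliberately left unnormalized, as on the slide; `normalize` is a separate step.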

SLIDE 10

4

Disintegration on a zero-probability observation

Diseases A and B are equally prevalent.
Disease A causes a symptom chosen uniformly from [0, 3] ⊂ R.
Disease B causes a symptom chosen uniformly from [0, 2] ⊂ R.

do {disease ← ⟨A → 1/2, B → 1/2⟩;
    symptom ← case disease of A → uniform 0 3
                              B → uniform 0 2;
    return (symptom, disease)}

: M (Symptom × Disease)

⇒ λsymptom. if symptom ≤ 2 then ⟨A → 1/6, B → 1/4⟩ else ⟨A → 1/6⟩

: Symptom → M Disease

[Figure: the joint measure drawn as one horizontal band per disease (B, A) over the symptom axis, with ticks at 1, 2, 3]
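Here every individual symptom value has probability zero, so the weights come from densities rather than probabilities. The sketch below is a hand-written illustration under that reading, not the output of the automatic disintegrator; `PRIOR`, `SYMPTOM_RANGE`, `uniform_density`, and `disintegrate` are names assumed for this example.

```python
# Assumed encoding of the slide's model.
PRIOR = {"A": 0.5, "B": 0.5}
SYMPTOM_RANGE = {"A": (0.0, 3.0), "B": (0.0, 2.0)}


def uniform_density(lo, hi, x):
    """Density of the uniform distribution on [lo, hi] at the point x."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


def disintegrate(symptom):
    """Unnormalized measure over diseases for an observed symptom value."""
    weights = {}
    for disease, p in PRIOR.items():
        lo, hi = SYMPTOM_RANGE[disease]
        weight = p * uniform_density(lo, hi, symptom)
        if weight > 0.0:
            weights[disease] = weight
    return weights


print(disintegrate(1.0))  # both diseases remain possible
print(disintegrate(2.5))  # only disease A can produce this symptom
```

For symptoms in [0, 2] this gives the weights 1/6 for A and 1/4 for B, and for symptoms in (2, 3] only the weight 1/6 for A, matching the two branches of the conditional on the slide.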

SLIDE 14

5

Disintegration on a zero-probability observation

Choose disease uniformly from [1, 3] ⊂ R.
Choose symptom uniformly from [0, disease] ⊂ R.

do {disease ← uniform 1 3;
    symptom ← uniform 0 disease;
    return (symptom, disease)}

: M (Symptom × Disease)

⇒ λsymptom. do {disease ← uniform 1 3;
                if 0 ≤ symptom ≤ disease then ⟨disease → 1/disease⟩ else ⟨⟩}

: Symptom → M Disease

[Figure: the joint measure plotted on two axes, each with ticks at 1, 2, 3]
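To see what this kernel computes, it can be evaluated numerically. The sketch below assumes a simple midpoint rule (`integrate`) in place of symbolic integration, and the names `kernel` and `posterior_mean` are chosen here for illustration. It applies the measure for a given symptom to a test function and divides by the total mass to obtain a posterior expectation.

```python
import math


def integrate(f, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h


def kernel(symptom):
    """Measure over diseases for a given symptom, as a function of a test function c."""
    def measure(c):
        def integrand(d):
            inside = 0.0 <= symptom <= d
            # uniform 1 3 contributes the factor 1/2; the weight 1/d is the
            # density of uniform 0 d at the observed symptom.
            return (c(d) / d if inside else 0.0) / 2.0
        return integrate(integrand, 1.0, 3.0)
    return measure


def posterior_mean(symptom):
    """Expected disease value given the symptom (normalizing by the total mass)."""
    k = kernel(symptom)
    return k(lambda d: d) / k(lambda d: 1.0)


print(posterior_mean(0.5), 2 / math.log(3))    # closed form for symptom <= 1
print(posterior_mean(2.0), 1 / math.log(1.5))  # closed form for symptom = 2
```

The closed forms follow from integrating 1/(2d) and 1/2 over the admissible range of the disease value.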

SLIDE 17

6

Measure semantics

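M α = (α → R) → R

\[
\begin{aligned}
[\![\langle A \to 1/2,\ B \to 1/2 \rangle]\!] &= \lambda c.\ \frac{c(A)}{2} + \frac{c(B)}{2} \\
[\![\text{return}\ (\mathit{symptom}, \mathit{disease})]\!] &= \lambda c.\ c(\mathit{symptom}, \mathit{disease}) = [\![\langle (\mathit{symptom}, \mathit{disease}) \to 1 \rangle]\!] \\
[\![\text{uniform}\ 1\ 3]\!] &= \lambda c.\ \int_1^3 \frac{c(x)}{2}\, dx \\
[\![\text{lebesgue}]\!] &= \lambda c.\ \int_{-\infty}^{\infty} c(x)\, dx \\
[\![\text{do}\ \{x \leftarrow m;\ k\}]\!] &= \lambda c.\ [\![m]\!](\lambda x.\ [\![k]\!]\, c)
\end{aligned}
\]

\[
[\![\text{do}\ \{d \leftarrow \text{uniform}\ 1\ 3;\ s \leftarrow \text{uniform}\ 0\ d;\ \text{return}\ (s, d)\}]\!]
= \lambda c.\ \int_1^3 \int_0^d \frac{c(s, d)}{2 \cdot d}\, ds\, dd
\]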
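The type M α = (α → R) → R translates directly into code: a measure is a function that takes a test function c and returns its integral. The following Python sketch makes that concrete under two assumptions of this example: the integral for `uniform` is approximated by a midpoint rule, and the names `dirac`, `weighted`, `uniform`, and `bind` are hypothetical stand-ins for return, the discrete measure, uniform, and the do-notation bind.

```python
def dirac(x):
    """return x: the measure that puts mass 1 on x."""
    return lambda c: c(x)


def weighted(table):
    """Discrete measure given as a dictionary from outcomes to weights."""
    return lambda c: sum(w * c(x) for x, w in table.items())


def uniform(lo, hi, n=200):
    """uniform lo hi: integrate c against density 1/(hi - lo), by the midpoint rule."""
    def measure(c):
        h = (hi - lo) / n
        return sum(c(lo + (i + 0.5) * h) for i in range(n)) / n
    return measure


def bind(m, k):
    """do {x <- m; k x}: integrate over m the integral of c over k x."""
    return lambda c: m(lambda x: k(x)(c))


# do {d <- uniform 1 3; s <- uniform 0 d; return (s, d)}
joint = bind(uniform(1.0, 3.0), lambda d:
        bind(uniform(0.0, d), lambda s:
        dirac((s, d))))

total_mass = joint(lambda sd: 1.0)      # integral of the constant function 1
mean_symptom = joint(lambda sd: sd[0])  # expected value of the symptom
print(total_mass, mean_symptom)
```

Passing the constant function 1 recovers the total mass of the measure, and passing a projection recovers an expectation, so the same representation serves for both normalized and unnormalized measures.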

SLIDE 20

7

Disintegration specification

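m = do {s ← lebesgue; d ← k; return (s, d)}

m = do {d ← uniform 1 3; s ← uniform 0 d; return (s, d)}
k = do {d ← uniform 1 3; if 0 ≤ s ≤ d then ⟨d → 1/d⟩ else ⟨⟩}

\[
[\![k]\!] = \lambda c.\ \int_1^3 \frac{\text{if } 0 \le s \le d \text{ then } c(d)/d \text{ else } 0}{2}\, dd
\]

\[
\begin{aligned}
[\![\text{do}\ \{s \leftarrow \text{lebesgue};\ d \leftarrow k;\ \text{return}\ (s, d)\}]\!]
&= \lambda c.\ \int_{-\infty}^{\infty} \int_1^3 \frac{\text{if } 0 \le s \le d \text{ then } c(s, d)/d \text{ else } 0}{2}\, dd\, ds \\
&= \lambda c.\ \int_1^3 \int_0^d \frac{c(s, d)}{2 \cdot d}\, ds\, dd \\
&= [\![m]\!]
\end{aligned}
\]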

SLIDE 21

8

Useful but unspecified and thus unautomated before

Figure 12. Gibbs sampler.

Borel paradox

SLIDE 22

9

Radio Yerevan

Question: Is it correct that Grigori Grigorievich Grigoriev won a luxury car at the All-Union Championship in Moscow? Answer: In principle, yes. But first of all it was not Grigori Grigorievich Grigoriev, but Vassili Vassilievich Vassiliev. Second, it was not at the All-Union Championship in Moscow, but at a Collective Farm Sports Festival in Smolensk. Third, it was not a car, but a bicycle. And fourth he didn’t win it, but rather it was stolen from him.

SLIDE 23

10

Automatic disintegrator

Question: Is it correct that our disintegrator is a lazy evaluator?
Answer: In principle, yes.

evaluate : ⌈α⌉ → H → (α × H)

SLIDE 24

10

Automatic disintegrator

Question: Is it correct that our disintegrator is a lazy evaluator?
Answer: In principle, yes. But first of all it is not an evaluator, but a partial evaluator.

evaluate : ⌈α⌉ → H → (⌊α⌋ × H)

SLIDE 25

10

Automatic disintegrator

Question: Is it correct that our disintegrator is a lazy evaluator?
Answer: In principle, yes. But first of all it is not an evaluator, but a partial evaluator. Second, it not only evaluates terms, but also performs random choices.

evaluate : ⌈α⌉ → H → (⌊α⌋ → H → ⌊M γ⌋) → ⌊M γ⌋
perform : ⌈M α⌉ → H → (⌊α⌋ → H → ⌊M γ⌋) → ⌊M γ⌋

SLIDE 26

10

Automatic disintegrator

Question: Is it correct that our disintegrator is a lazy evaluator?
Answer: In principle, yes. But first of all it is not an evaluator, but a partial evaluator. Second, it not only evaluates terms, but also performs random choices. Third, it not only produces outcomes and values, but also constrains them.

evaluate : ⌈α⌉ → H → (⌊α⌋ → H → ⌊M γ⌋) → ⌊M γ⌋
perform : ⌈M α⌉ → H → (⌊α⌋ → H → ⌊M γ⌋) → ⌊M γ⌋
constrain-value : ⌈α⌉ → ⌊α⌋ → H → (H → ⌊M γ⌋) → ⌊M γ⌋
constrain-outcome : ⌈M α⌉ → ⌊α⌋ → H → (H → ⌊M γ⌋) → ⌊M γ⌋

SLIDE 27

10

Automatic disintegrator

Question: Is it correct that our disintegrator is a lazy evaluator?
Answer: In principle, yes. But first of all it is not an evaluator, but a partial evaluator. Second, it not only evaluates terms, but also performs random choices. Third, it not only produces outcomes and values, but also constrains them. And fourth, it doesn’t produce one term, but searches for a random variable to constrain.

evaluate : ⌈α⌉ → H → (⌊α⌋ → H → {⌊M γ⌋}) → {⌊M γ⌋}
perform : ⌈M α⌉ → H → (⌊α⌋ → H → {⌊M γ⌋}) → {⌊M γ⌋}
constrain-value : ⌈α⌉ → ⌊α⌋ → H → (H → {⌊M γ⌋}) → {⌊M γ⌋}
constrain-outcome : ⌈M α⌉ → ⌊α⌋ → H → (H → {⌊M γ⌋}) → {⌊M γ⌋}

SLIDE 29

11

Automatic disintegrator in action

[]
perform (do {d ← uniform 1 3; s ← uniform 0 d; return (s, d)})

[d′ ← uniform 1 3]
perform (do {s ← uniform 0 d′; return (s, d′)})

[d′ ← uniform 1 3; s′ ← uniform 0 d′]
perform (return (s′, d′))
evaluate (s′, d′) ⇒ (s′, d′)
constrain-value s′ s
constrain-outcome (uniform 0 d′) s
nondeterminism
evaluate 0 ⇒ 0
evaluate d′
perform (uniform 1 3)
do {d′′ ← uniform 1 3; }
⇒ d′′

[let d′ = d′′; s′ ← uniform 0 d′]
⇒ d′′
if 0 ≤ s ≤ d′′ then do {() ← ⟨() → 1/d′′⟩; } else ⟨⟩

[let d′ = d′′; let s′ = s]

SLIDE 33

12

Determinism requires inversion

do {d ← uniform 0 1; s ← return (2 · d); return (s, d)}

do {d1 ← uniform 0 1; d2 ← uniform 0 1; s ← return (d1 + d2); return (s, (d1, d2))}

do {d1 ← uniform 0 1; d2 ← ⟨1 → 1/2, 2 → 1/2⟩; s ← return (d1 ^ d2); return (s, (d1, d2))}

SLIDE 34

13

Summary

◮ Observe symptoms with hidden causes
◮ Infer probabilities by program transformations
◮ Specify disintegration by measure semantics
◮ Automate disintegration by lazy partial evaluation
◮ Future work:
    arrays (symbolically evaluated)
    beyond lebesgue
    prove correctness
    computer algebra (please help!)