Calculating distributions — Chung-chieh Shan, Indiana University (presentation transcript)

SLIDE 1 Calculating distributions — Chung-chieh Shan, Indiana University, 2018-09-21
SLIDE 2 Calculating distributions: meaningful, executable
SLIDE 3 Calculating distributions: meaningful, executable
SLIDE 4


I’d also like to address this concept of being “fake” or “calculating.” If being “fake” means not thinking or feeling the same way in one moment as you thought or felt in a different moment, then lord help us all. If being “calculating” is thinking through your words and actions and modeling the behavior you would like to see in the world, even when it is difficult, then I hope more of you will become calculating. —BenDeLaCreme
SLIDE 5 Creative definitions and reasoning from first principles. Symbolic representations of common definition patterns. Mechanical operations for common reasoning patterns. Virtuous cycle of automation and exploration (Buchberger).
SLIDE 6 Same, with the running example of natural numbers — representations: unary, binary; operations: <, +, ÷; exploration: rationals, reals, polynomials.
SLIDE 7 Same, extended to probability distributions — representations: table, Bayes net, probabilistic program; operations: recognize, integrate, disintegrate; exploration: inference, learning, optimization.
SLIDE 8 An unknown random process yields a stateless coin that can be flipped repeatedly to produce heads (H) or tails (T). We assume that the probability p that the coin produces H each time is distributed uniformly between 0 and 1 by the process. We flip the coin 3 times and observe THH. What is the probability that the next flip produces H versus T? (adapted from Eddy)
SLIDE 9 Same problem, plus the plan of attack: Pr(p) —bind→ Pr(p, x) —disintegrate→ Pr(p | x) —bind→ Pr(p, y | x) —integrate→ Pr(y | x)
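The answer the talk builds toward can be checked directly: a uniform prior on p is Beta(1, 1), observing THH gives posterior Beta(3, 2), and the predictive probability of H is its mean, 3/5. A minimal sketch in plain Python (not the talk's Hakaru code), with a numeric integral of the unnormalized posterior p²(1 − p) as a cross-check:

```python
from fractions import Fraction

# Posterior after observing THH under a uniform (Beta(1,1)) prior on p:
# Beta(1 + #H, 1 + #T) = Beta(3, 2).  Predictive Pr(next = H) is its mean.
heads, tails = 2, 1
pred_h = Fraction(1 + heads, 2 + heads + tails)  # mean of Beta(3, 2)

# Cross-check by Riemann sums over the unnormalized posterior p^2 (1 - p):
# numerator = integral of p * p^2 (1-p), denominator = integral of p^2 (1-p).
n = 100_000
num = sum((k / n) ** 3 * (1 - k / n) for k in range(n)) / n
den = sum((k / n) ** 2 * (1 - k / n) for k in range(n)) / n
assert abs(num / den - float(pred_h)) < 1e-3
print(pred_h)  # 3/5
```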
SLIDES 10-15 (animation) The pipeline drawn as a diagram of probabilistic programs, built up one step at a time: a program for Pr(p); bind extends it to a program for Pr(p, x); disintegrate turns that into a program for Pr(p | x); bind extends again to Pr(p, y | x); integrate collapses to Pr(y | x). Equal signs mark programs that denote the same measure.
SLIDE 16 The full diagram, with simplify added: each program is related to an equivalent simpler one.
SLIDE 17 Approximations calculated exactly: a sampler, too, solves prediction problems. (The slide reproduces excerpts of a machine-learning paper: the hidden Markov model (Rabiner, 1989); its infinite-state extension, the iHMM; a Gibbs sampler building on the HDP direct-assignment scheme (Teh et al., 2006); a beam sampler that introduces auxiliary variables so that only finitely many state trajectories have positive probability; and experiments on artificial datasets.)
SLIDE 18 Same, plus an excerpt of measure-theoretic text: two measures mutually absolutely continuous on a set R and mutually singular on its complement Rᶜ, with a version of the density r(x, y) satisfying r(x, y) < ∞ and r(x, y) = 1/r(y, x) for all x, y, so that the measure is symmetric.
SLIDE 19 The plan of attack: Pr(p) —bind→ Pr(p, x) —disintegrate→ Pr(p | x) —bind→ Pr(p, y | x) —integrate→ Pr(y | x)
SLIDE 20 Same, plus a benchmark of the simplify step: run time in seconds against data size (200-800), compared with PSI.
SLIDE 21 Same, plus plots of run time and accuracy for disintegrate + simplify, comparing the Haskell backend, Hakaru, AugurV2, and JAGS, and a benchmark table:

Inference method                 Run time (msecs): Mean   SD
WebPPL                           1078                     16
Hakaru without simplifications   1321                     93
Hakaru with simplifications      269                      10
Handwritten                      207                      4

Put approximations in the language! (FLOPS 2016, UAI 2017)
SLIDE 22 The diagram again, now highlighting the recognize operation alongside bind, disintegrate, integrate, and simplify.
SLIDES 23-32 (animation) Recognizing a density function. Programs denote measures: the program for Pr(p | x) denotes an element of
  M [0, 1] ⊆ ([0, 1] → R+) → R+
Measures compute expectations: the denotation maps a test function f to
  ∫₀¹ Σ_{x ∈ {H,T}³} p^{#{i : xᵢ=H}} (1 − p)^{#{i : xᵢ=T}} [x = THH] · f(p) dp
    = ∫₀¹ p²(1 − p) · f(p) dp
Need to recognize simplified denotation as Beta distribution …
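The inclusion M [0, 1] ⊆ ([0, 1] → R+) → R+ reads: a measure is (at least) a higher-order function mapping each test function f to its expectation. A hypothetical Python sketch of that reading, with measures as expectation functionals and bind as integrating a kernel (the names are mine, not Hakaru's):

```python
# A measure over [0,1] as an expectation functional: (f: float -> float) -> float.
def uniform(f, n=20_000):
    # Expectation of f under Uniform[0,1], by a midpoint Riemann sum.
    return sum(f((k + 0.5) / n) for k in range(n)) / n

def dirac(x):
    # Point mass at x: the expectation of f is just f(x).
    return lambda f: f(x)

def bind(m, k):
    # Sequencing: integrate the kernel k over the outer measure m.
    return lambda f: m(lambda x: k(x)(f))

# E[p] under Uniform[0,1] is 1/2, and bind(uniform, dirac) denotes the same measure.
assert abs(uniform(lambda p: p) - 0.5) < 1e-6
m = bind(uniform, dirac)
assert abs(m(lambda p: p) - 0.5) < 1e-6
```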
SLIDE 33 Recognizing a density function. Goal: recognize h(p) = p²(1 − p) as the density of beta 3 2. Robustness challenge: many equivalent ways to write p²(1 − p) arise. Modularity challenge: many distribution families (beta, normal, …) known.
SLIDE 34 Same, plus the solution: characterize density functions by their holonomic representation, a homogeneous linear differential equation such as
  p(1 − p) · h′(p) + (p − 2(1 − p)) · h(p) = 0
computed compositionally!
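The point of the holonomic representation is that any way of writing h(p) = p²(1 − p) satisfies the same differential equation, so expressions can be matched by comparing ODEs rather than syntax. A small numeric sanity check of that equation (my own check, not the talk's implementation):

```python
# h(p) = p^2 (1 - p) and an "equivalent way to write it", expanded out.
h1 = lambda p: p**2 * (1 - p)
h2 = lambda p: p**2 - p**3          # same function, different syntax

def dh(h, p, eps=1e-6):
    # Central finite difference approximating h'(p).
    return (h(p + eps) - h(p - eps)) / (2 * eps)

def residual(h, p):
    # Left-hand side of the holonomic equation p(1-p) h' + (p - 2(1-p)) h = 0.
    return p * (1 - p) * dh(h, p) + (p - 2 * (1 - p)) * h(p)

# Both spellings of h satisfy the same equation at every test point.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    assert abs(residual(h1, p)) < 1e-6
    assert abs(residual(h2, p)) < 1e-6
```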
SLIDE 35 The diagram again, now highlighting both recognize and integrate alongside bind, disintegrate, and simplify.
SLIDES 36-45 (animation) Eliminating a random variable. Programs denote measures: the program for Pr(y | x) denotes an element of
  M {H, T} ⊆ ({H, T} → R+) → R+
Measures compute expectations: the denotation maps a test function f to
  ∫₀¹ p²(1 − p) Σ_{y ∈ {H,T}} p^{[y=H]} (1 − p)^{[y=T]} · f(y) dp
    = Σ_{y ∈ {H,T}} ( ∫₀¹ p²(1 − p) p^{[y=H]} (1 − p)^{[y=T]} dp ) · f(y)
    = Σ_{y ∈ {H,T}} (1/20)^{[y=H]} (1/30)^{[y=T]} · f(y)
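The two coefficients can be recomputed: ∫₀¹ p³(1 − p) dp = 1/20 and ∫₀¹ p²(1 − p)² dp = 1/30, so after normalizing, Pr(y = H | x) = (1/20) / (1/20 + 1/30) = 3/5. A quick exact check in rational arithmetic (plain Python, not the talk's computer-algebra pipeline):

```python
from fractions import Fraction
from math import factorial

def beta_integral(a, b):
    # Exact value of the integral of p^a (1-p)^b over [0,1]: a! b! / (a+b+1)!.
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

w_h = beta_integral(3, 1)   # integrand p^2 (1-p) * p      -> 1/20
w_t = beta_integral(2, 2)   # integrand p^2 (1-p) * (1-p)  -> 1/30
assert w_h == Fraction(1, 20) and w_t == Fraction(1, 30)

pr_h = w_h / (w_h + w_t)    # normalize the two weights
print(pr_h)  # 3/5
```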
SLIDE 46 The complete diagram, with recognize and integrate on the computer-algebra side and bind, disintegrate, and simplify on the programming-languages side. Computer algebra meets programming languages. (PADL 2016)
SLIDE 47 The plan of attack, ready for a new problem: Pr(p) —bind→ Pr(p, x) —disintegrate→ Pr(p | x) —bind→ Pr(p, y | x) —integrate→ Pr(y | x)
SLIDE 48 An unknown random process yields a stateless particle whose one-dimensional position can be measured repeatedly to produce a real number. We assume that the position p of the particle is distributed normally with mean 3 and standard deviation 2. We measure the particle 3 times, each time drawing independently from the normal distribution with mean p and standard deviation 1, and observe −1.4, +1.0, −0.2. What is the distribution of the next measurement? The same plan applies: Pr(p) —bind→ Pr(p, x) —disintegrate→ Pr(p | x) —bind→ Pr(p, y | x) —integrate→ Pr(y | x)
SLIDES 49-51 (animation) The diagram for the Gaussian problem, built up the same way: bind to Pr(p, x), disintegrate to Pr(p | x), bind to Pr(p, y | x), integrate to Pr(y | x), with simplify relating equal programs along the way.
SLIDES 52-53 Disintegrating a joint measure: turning a program for Pr(p, x) into a program for Pr(p | x).
SLIDE 54 Same, plus: the program transformation is derived from the semantics. Tricky when x is not just drawn from a primitive distribution:
◮ total momentum
◮ loop over array
◮ clamped measurement
◮ coordinate-wise MCMC
Addressed in recent work. (ICFP 2016, POPL 2017, ICFP 2017)
SLIDE 55 (A calculator keypad: 7 8 9 ÷ / 4 5 6 × / 1 2 3 − / . = +)
SLIDE 56 The keypad relabeled with the language's combinators: 7 plate, 8 lambda, 9 apply, ÷ disintegrate; 4 pair, 5 fst, 6 snd, × bind; 1 dirac, 2 beta, 3 normal, − gradient; mzero, . factor, = simplify, + mplus
SLIDE 57 The keys highlighted: ÷ disintegrate, × bind, − gradient, mzero, . factor, = simplify, + mplus. Thanks! Jacques Carette, Oleg Kiselyov, Wazim Mohammed Ismail, Praveen Narayanan, Norman Ramsey, Wren Romano, Sam Tobin-Hochstadt, Rajan Walia, Robert Zinkov