SLIDE 1

Filtering with Abstract Particles

Jacob Steinhardt Percy Liang

Stanford University

{jsteinhardt,pliang}@cs.stanford.edu

May 1, 2013

  • J. Steinhardt & P. Liang (Stanford)

Filtering with Abstract Particles May 1, 2013 1 / 12

SLIDE 2

Motivation

  • Goal. Given an (unnormalized) target distribution $f^*(x)$, with $p^*(x) = \frac{1}{Z} f^*(x)$, we want to compute the normalization constant $Z = \sum_x f^*(x)$.

  • Issue. This is often computationally intractable, so we use some approximation $\hat f$ to $f^*$: either drop dependencies (variational Bayes, expectation propagation) or use samples (MCMC, sequential Monte Carlo, beam search).

We will show how to combine the advantages of both types of methods.

SLIDE 7

Variational vs. Particle Methods

Goal: infer the missing characters in r e _ _ _ c e.

  • Particle: 0.5 replace, 0.5 retrace
  • Actual: 0.33 replace, 0.33 retrace, 0.33 rejoice, 0.01 . . .
  • Variational: r e [j 0.33 | p 0.33 | t 0.33] [l 0.33 | o 0.33 | r 0.33] [a 0.66 | i 0.33] c e

Particles provide precision but lack coverage (rejoice gets no mass), while the variational approximation covers all three words but lacks precision (it also puts mass on non-words such as repoice).
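The trade-off can be reproduced in a few lines. This is a minimal sketch under stated assumptions: the target puts mass 1/3 on each of replace, retrace and rejoice (the 0.01 tail is dropped), the "particle" approximation keeps replace and retrace and renormalizes, and the "variational" approximation is taken to be a fully factorized product of per-position marginals, which is what the variational column above appears to show. All function names are illustrative.

```python
from itertools import product

# Assumed target: equal mass on three fillers of "re _ _ _ ce" (0.01 tail dropped).
target = {"replace": 1 / 3, "retrace": 1 / 3, "rejoice": 1 / 3}

def middle(word):
    """The three missing characters (positions 2-4) of a word."""
    return word[2:5]

def particle_approx(dist, keep):
    """Keep only the listed words and renormalize their mass."""
    total = sum(dist[w] for w in keep)
    return {w: dist[w] / total for w in keep}

def mean_field_approx(dist):
    """Product of per-position marginals over the missing characters."""
    marginals = [{} for _ in range(3)]
    for word, p in dist.items():
        for i, ch in enumerate(middle(word)):
            marginals[i][ch] = marginals[i].get(ch, 0.0) + p
    q = {}
    for chars in product(*(m.items() for m in marginals)):
        word = "re" + "".join(ch for ch, _ in chars) + "ce"
        prob = 1.0
        for _, p in chars:
            prob *= p
        q[word] = prob
    return q

particles = particle_approx(target, ["replace", "retrace"])
variational = mean_field_approx(target)

print(particles.get("rejoice", 0.0))     # 0.0: the particles never cover rejoice
print(len(variational))                  # 18: strings with mass, only 3 are words
print(round(variational["replace"], 3))  # 0.074 (= 2/27), far below the true 1/3
print(round(variational["repoice"], 3))  # 0.037: mass on a non-word
```

The particle approximation is exact on the words it keeps but assigns zero to rejoice; the factorized one covers all three words but spreads most of its mass over non-words.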

SLIDE 11

Our Proposal

Define approximations over intermediate regions:

  • variational: re⋆⋆⋆ce
  • intermediate: re⋆⋆ace, re⋆⋆ice
  • particle: replace, retrace, rejoice

  • Goal. Stitch together approximations at multiple levels to obtain, simultaneously, precision (from the lower levels) and coverage (from the higher levels).
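One way to represent these regions, assuming all strings have the same length, is as patterns with a wildcard at each unconstrained position. The sketch below uses `*` in place of ⋆ (an illustrative encoding, not necessarily the authors') and checks which regions contain a given word and whether one region lies inside another.

```python
STAR = "*"  # stands in for the wildcard ⋆

def contains(region, word):
    """True if the concrete word lies in the region (pattern with wildcards)."""
    return len(region) == len(word) and all(
        r == STAR or r == w for r, w in zip(region, word))

def is_subregion(inner, outer):
    """True if every word matching `inner` also matches `outer`."""
    return len(inner) == len(outer) and all(
        o == STAR or o == i for i, o in zip(inner, outer))

regions = ["re***ce", "re**ace", "re**ice", "replace", "retrace", "rejoice"]
for word in ["replace", "retrace", "rejoice", "reprice"]:
    print(word, [r for r in regions if contains(r, word)])
```

A word such as reprice, which is not one of the particles, still falls inside the coarser regions re⋆⋆⋆ce and re⋆⋆ice; that is where the coverage of the higher levels comes from.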

SLIDE 13

Stitching Together Models

  • Question. How do we combine the different models?

[Figure: the regions re⋆⋆⋆ce, re⋆⋆ace, re⋆⋆ice, replace, retrace, rejoice]

  • Answer. Use the most precise model available at each point. This relies on nested structure, e.g. the regions forming a hierarchical decomposition.
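The nested structure can be stated as a pairwise condition: any two regions are either disjoint or one contains the other (a laminar family). A sketch of that check for wildcard patterns, assuming equal-length patterns with `*` standing in for ⋆; the helper names are hypothetical.

```python
STAR = "*"  # stands in for the wildcard ⋆

def subregion(inner, outer):
    """Every word matching `inner` also matches `outer`."""
    return all(o == STAR or o == i for i, o in zip(inner, outer))

def disjoint(a, b):
    """Two patterns share no word iff they fix different letters at some position."""
    return any(x != STAR and y != STAR and x != y for x, y in zip(a, b))

def is_hierarchical(regions):
    """Laminar check: every pair of regions is nested or disjoint."""
    for i, a in enumerate(regions):
        for b in regions[i + 1:]:
            if not (subregion(a, b) or subregion(b, a) or disjoint(a, b)):
                return False
    return True

print(is_hierarchical(["re***ce", "re**ace", "re**ice", "replace", "retrace", "rejoice"]))  # True
print(is_hierarchical(["re***ce", "rep**ce", "re**ice"]))  # False: both contain e.g. repoice
```

When the check passes, the regions containing any fixed word form a chain, so "the most precise model available" at that word is unambiguous.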

SLIDE 18

Generalizing the Construction

Let $X$ be some space. Suppose we have a hierarchical decomposition $A \subseteq 2^X$ together with an approximation $\hat f_a$ to $f^*$ defined on each region $a \in A$.

  • If $a = \{x_0\}$ is a singleton set, we can have $\hat f_a(x_0) = f^*(x_0)$.
  • If $a = X$, we will need to drop most of the dependencies.
  • For intermediate regions $a$ (for instance, those obtained by fixing the values of certain variables), we can keep some subset of the dependencies.

[Figure: a chain of nested regions from re⋆⋆⋆ce through re⋆⋆ace down to the singleton replace]

Set $\hat f(x) \stackrel{\text{def}}{=} \hat f_a(x)$, where $a$ is the smallest region of $A$ containing $x$ (well defined because, in a hierarchical decomposition, the regions containing $x$ are nested). We can think of each region $a \in A$ as an abstract particle.
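A minimal sketch of this construction, assuming regions are wildcard patterns (`*` for ⋆), that $X$ itself (here re⋆⋆⋆ce) is one of the regions so every $x$ is covered, and that each $\hat f_a$ is just a caller-supplied function. The target `f_star` and the intermediate approximation are made-up illustrations, not the models used in the talk.

```python
from typing import Callable, Dict, Iterable

STAR = "*"  # stands in for the wildcard ⋆
Approx = Callable[[str], float]

def contains(region: str, x: str) -> bool:
    """True if the concrete string x lies in the region (wildcard pattern)."""
    return len(region) == len(x) and all(r == STAR or r == c for r, c in zip(region, x))

def smallest_region(regions: Iterable[str], x: str) -> str:
    """Smallest region containing x. The containing regions are nested, so the
    one with the fewest wildcards is contained in all the others."""
    containing = [a for a in regions if contains(a, x)]
    if not containing:
        raise ValueError(f"{x!r} is not covered; X itself should be a region")
    return min(containing, key=lambda a: a.count(STAR))

def f_hat(approx: Dict[str, Approx], x: str) -> float:
    """hat f(x) := hat f_a(x), where a is the smallest region containing x."""
    return approx[smallest_region(approx, x)](x)

# Hypothetical unnormalized target on X = re***ce: favours three English words.
def f_star(x: str) -> float:
    return 1.0 if x in {"replace", "retrace", "rejoice"} else 0.01

approx: Dict[str, Approx] = {
    "re***ce": lambda x: 0.05,                           # X: all dependencies dropped
    "re**ace": lambda x: 0.5 if x[2] in "pt" else 0.02,  # intermediate: made-up factor
    "replace": f_star,                                   # singletons: exact
    "retrace": f_star,
}

print(f_hat(approx, "replace"))  # 1.0 (singleton region)
print(f_hat(approx, "rebrace"))  # 0.02 (region re**ace)
print(f_hat(approx, "rejoice"))  # 0.05 (only X contains it)
```

Picking the region with the fewest wildcards is only valid because the decomposition is hierarchical; for overlapping, non-nested regions "smallest" would not be well defined.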

SLIDE 23

Inference

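If $\hat f$ is constructed as in the previous slide, then we can compute its normalization constant $\hat Z = \sum_x \hat f(x)$ (our estimate of $Z$) as long as we can compute $\sum_{x \in b} \hat f_a(x)$ for all regions $b \subseteq a$. Proof by picture:

[Figure: the regions re⋆⋆⋆ce, re⋆⋆ace, re⋆⋆ice, replace, retrace, rejoice, with region sums combined by alternating − and + signs to give the total]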
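One way to write the picture as a formula: let $\mathrm{ch}(a)$ be the maximal regions of $A$ strictly inside $a$. In a hierarchical decomposition these children are pairwise disjoint, and each $x$ is counted once, in its smallest region, so

$$\hat Z = \sum_{a \in A} \Big( \sum_{x \in a} \hat f_a(x) - \sum_{b \in \mathrm{ch}(a)} \sum_{x \in b} \hat f_a(x) \Big),$$

which needs only the sums $\sum_{x \in b} \hat f_a(x)$ for $b \subseteq a$. The sketch below checks this against brute-force enumeration. It assumes a toy space (strings of length 3 over `abc`), wildcard-pattern regions, and hypothetical fully factorized $\hat f_a$ with random weights, chosen because their region sums factorize over positions.

```python
import itertools
import math
import random

STAR = "*"        # stands in for the wildcard ⋆
ALPHABET = "abc"  # toy alphabet (assumption)
N = 3             # toy sequence length (assumption)

def subregion(inner, outer):
    return all(o == STAR or o == i for i, o in zip(inner, outer))

def contains(region, x):
    return subregion(x, region)  # a concrete x is a region with no wildcards

# Hypothetical region approximations: hat f_a(x) = prod_i w[a][i][x_i].
rng = random.Random(0)
regions = ["***", "*c*", "ab*", "abc"]  # a hierarchical decomposition of ALPHABET**N
w = {a: [{c: rng.uniform(0.1, 1.0) for c in ALPHABET} for _ in range(N)] for a in regions}

def f_a(a, x):
    return math.prod(w[a][i][c] for i, c in enumerate(x))

def sum_f_a_over(a, b):
    """sum_{x in b} hat f_a(x) for b ⊆ a; factorizes over positions."""
    return math.prod(w[a][i][ch] if ch != STAR else sum(w[a][i].values())
                     for i, ch in enumerate(b))

def children(a):
    """Maximal regions strictly inside a (pairwise disjoint in a laminar family)."""
    inside = [b for b in regions if b != a and subregion(b, a)]
    return [b for b in inside if not any(c != b and subregion(b, c) for c in inside)]

def z_hat():
    """Sum over regions: own mass minus the mass handed to the children."""
    return sum(sum_f_a_over(a, a) - sum(sum_f_a_over(a, b) for b in children(a))
               for a in regions)

def z_hat_brute_force():
    """Enumerate X and evaluate each x with its smallest containing region."""
    total = 0.0
    for x in map("".join, itertools.product(ALPHABET, repeat=N)):
        a = min((r for r in regions if contains(r, x)), key=lambda r: r.count(STAR))
        total += f_a(a, x)
    return total

print(abs(z_hat() - z_hat_brute_force()) < 1e-9)  # True
```

The region-based sum touches each region and its children once, while the brute-force check enumerates all $3^3$ strings; in realistic problems only the former is feasible.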

SLIDE 24

A Family of Approximations

A hierarchical decomposition $A$ leads to an approximation $\hat f$. We would like to define a family of approximations and choose the best one.

  • Key idea. Every subset $B$ of a hierarchical decomposition $A$ is itself a hierarchical decomposition. We can let $A$ have large cardinality and search for a small subset $B$ that yields a good approximation.

Example: $A$ consists of the regions re⋆⋆⋆ce, re⋆⋆ace, re⋆⋆ice, replace, retrace, rejoice, and $B$ is a subset of them.

SLIDE 29

Search Strategy

Suppose that $A$ has size 1000 and we want a subset of size 100. There are $\binom{1000}{100}$ possibilities; far too many!

  • Solution. “Abstract beam search.” Iteratively refine and prune a candidate decomposition.
  • Refine: split each region into smaller regions (to gain precision).
  • Prune: greedily keep a small set of regions that yield a good approximation (so we can refine again).

[Figure: Refine(⋆⋆⋆) gives ⋆⋆a, ⋆⋆b, ⋆⋆c; Refine(⋆c⋆) gives ⋆ca, ⋆cb, ⋆cc; Refine(ab⋆) gives aba, abb, abc]

This applies naturally to filtering tasks (refine to go to the next time step, prune to save resources).
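The count and the Refine step can be sketched as follows, assuming wildcard-pattern regions over the toy alphabet `abc` and that refining for the next time step means splitting on that step's variable, as in the figure. `refine` and `members` are illustrative helpers; Prune is left out because its greedy objective is not specified here.

```python
import itertools
import math

STAR = "*"        # stands in for the wildcard ⋆
ALPHABET = "abc"  # toy alphabet from the example above

# Number of ways to choose 100 of the 1000 regions in A.
n_subsets = math.comb(1000, 100)
print(len(str(n_subsets)), "decimal digits")  # far too many subsets to enumerate

def refine(region, t):
    """Split `region` on the variable at time step t (illustrative helper)."""
    if region[t] != STAR:
        return [region]  # already fixed at time t: nothing to split
    return [region[:t] + c + region[t + 1:] for c in ALPHABET]

def members(region):
    """All concrete strings in a region, by enumeration (toy sizes only)."""
    return {"".join(x) for x in itertools.product(ALPHABET, repeat=len(region))
            if all(r == STAR or r == c for r, c in zip(region, x))}

for parent in ["***", "*c*", "ab*"]:
    kids = refine(parent, 2)
    print(parent, "->", kids)
    # Refinement loses no coverage: the children partition their parent.
    sets = [members(k) for k in kids]
    assert set().union(*sets) == members(parent)
    assert sum(map(len, sets)) == len(members(parent))  # pairwise disjoint
```

Because each refinement partitions its parent, any precision gained by refining comes without losing coverage; pruning is what keeps the number of regions bounded.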

SLIDE 30

Summary (so far)

  • Interpolate between individual particles and full variational approximations by using region-specific approximations.
  • Stitch together approximations in different regions via a hierarchical decomposition.
  • Prune and refine the decomposition to find a good approximation.
  • Related to split variational inference (Bouchard & Zoeter, 2009), and to a growing family of coarse-to-fine inference methods (Petrov et al., 2006; Weiss & Taskar, 2010; many others).

SLIDE 31

Experiments

[Figure: abstract particles, with their probabilities, inferred for the input below]

Input: ? ? ? n ? w ? t h ? ? ?

SLIDE 32

Experiments

n-gram text reconstruction (n = 8)

[Plot: accuracy vs. language model queries (billions) for abstract (greedy), concrete (smc), and concrete (beam)]

SLIDE 33

Experiments

Factorial HMM (100 states, 15 factors)

[Plot: accuracy vs. runtime (seconds) for abstract (greedy), concrete (smc), and concrete (beam)]

SLIDE 34

Conclusion

Abstract particles combine the advantages of variational and particle inference. They provide a framework for reasoning about the optimal representation for approximate inference.

Thanks!
