Truncated Random Measures. Jonathan Huggins, MIT CSAIL and Dept. of EECS. PowerPoint PPT Presentation

SLIDE 1

Truncated Random Measures

Jonathan Huggins

MIT CSAIL and Dept. of EECS with: T. Campbell, J. How, T. Broderick

SLIDE 13

What leads to a statistical method being used for science?

  • 1. Conceptually clear
  • Bayesian methods are conceptually clear…
  • 2. Easy to use
  • …but often not easy to use…
  • 3. Reliable
  • …which makes them less reliable
  • How to fix this? probabilistic programming
  • Write down the model, but don’t worry about inference
  • v1.0: BUGS/JAGS (Gibbs sampling)
  • v2.0: Stan (HMC or variational inference or MAP estimation)
  • Goal: integrate BNP priors into PPLs like Stan
SLIDE 20

BNP: awesome, but challenging to use

Need models that can extract new, useful information from infinite streams of data, e.g. keep learning new topics from a stream of documents.

Bayesian nonparametrics achieves growing model size via infinite parameters.

Applications: movies [Gopalan 2014], text [Teh 2006], medicine [Huang 2014], robotics [Michini 2015], genetics [Lennox 2010], finance [Prunster 2014], astronomy [Yang 2015], traffic [Yu 2012], agriculture [Ozaki 2008], pathology [Kottas 2008]

Hard work! Automate inference with probabilistic programming.

SLIDE 32
Inference in BNP models

  • Option #1: Integrate out the parameter (CRP, IBP, etc.)
    Issues: we care about the parameters; we are using approximations (HMC/VB); distributed computation
  • Option #2: Use a finite approximation, with e.g. variational inference or HMC [Blei 06; Neal 10]

Problem: a wide variety of BNP priors have no finite approximation.

Contributions:

  • 2 representation forms (7 representations total) that allow finite approximation of (normalized) completely random measures ((N)CRMs)
  • Approximation error analysis
  • Computational complexity analysis (not in this talk)

[Diagram: all BNP priors ⊃ priors with finite approx (new) ⊃ previously studied priors with finite approx (past work)]

SLIDE 36

Past work: finite approximations to BNP priors

            Finite Approximation   Approximation Error Bounds   Computational Complexity
  DP        ✓                      ✓                            ✓
  BP        ✓                      ✓                            ✓
  BPP
  𝚫P        ✓                      ✓                            ✓
  (N)CRM

[Sethuraman 94; Roychowdhury 15; Teh 07; Paisley 12; Thibaux 07; Broderick 14; Bondesson 82; Ishwaran 01; Doshi-Velez 09]

Sparse results for a few priors in BNP. No general theory.

SLIDE 41

Truncation Roadmap

  1. Tractable models in BNP
  2. Two forms for sequential representations
  3. Truncation and error analysis

SLIDE 54

The Standard Model in BNP (By Example)

Θ is a random discrete measure on the topics.

[Figure: observations (Doc 1, 532 words; Doc 2, 210 words; Doc 3, 854 words; Doc 4, 926 words) linked to points in trait space, with topics/traits θ1, θ2, θ3 at rates ψ1, ψ2, ψ3 in rate space; example rates 0.7 (sports), 0.5 (politics), 0.2, …]
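As a toy illustration of this standard-model shape, a random discrete measure over topics plus a per-observation likelihood, here is a hedged sketch; the Gamma rates and the Poisson word-count likelihood are illustrative assumptions, not the model from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

# A random discrete measure Theta = sum_k rate_k * delta_{trait_k}:
# traits (here, topic labels) each carry a positive rate.
traits = np.array(["sports", "politics", "finance"])  # illustrative labels
rates = rng.gamma(shape=1.0, scale=1.0, size=traits.size)

# Each document expresses trait k in proportion to its rate; as a toy
# likelihood, draw Poisson word counts per (document, trait) cell.
n_words = np.array([532, 210, 854, 926])   # document lengths from the slide
topic_props = rates / rates.sum()
counts = rng.poisson(np.outer(n_words, topic_props))
print(counts.shape)  # (4, 3): documents x traits
```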

SLIDE 60

Poisson processes and (N)CRMs

How do we generate infinitely many trait/rate points (𝜔, 𝜄)?

  • Poisson point process with measure 𝜉(d𝜄 × d𝜔) [Kingman 93]
  • Completely random measure (CRM) (e.g. BP, 𝚫P)
  • Normalize rates: normalized CRM (NCRM) (e.g. DP)

This captures a large class of useful priors in BNP. How do we pick a finite subset of the points?
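A minimal sketch of this generative recipe, assuming for illustration a finite-mass rate measure (real BNP priors such as the DP put infinite mass near rate 0, producing infinitely many tiny atoms, which is exactly why truncation is needed):

```python
import numpy as np

rng = np.random.default_rng(2)

# CRM from a Poisson point process with rate measure xi(d_rate x d_trait).
# For this sketch xi has FINITE total mass, so the atom count is Poisson.
total_mass = 10.0
n_atoms = max(rng.poisson(total_mass), 1)        # guard the zero-atom corner
trait_locs = rng.uniform(size=n_atoms)           # traits ~ normalized trait measure
atom_rates = rng.gamma(1.0, 1.0, size=n_atoms)   # rates ~ normalized rate measure

# CRM:  Theta = sum_k atom_rates[k] * delta_{trait_locs[k]}
# NCRM: normalize the rates to sum to 1 (e.g. a DP from a Gamma CRM)
ncrm_weights = atom_rates / atom_rates.sum()
print(np.isclose(ncrm_weights.sum(), 1.0))
```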

SLIDE 74

Sequential representation & truncation

We pick a finite subset of atoms (𝜔, 𝜄) by:

  1. Ordering the atoms (sequential representation)
  2. Removing any atoms beyond the K-th (truncation)

[Figure: atoms of Θ in trait/rate space, labeled 1, 2, 3, 4, …, K]
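The two steps above can be sketched as follows; the shrinking-rate sequence here is a toy stand-in for a real sequential representation, not a specific prior from the talk:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy sequential representation: generate atoms in an order under which
# the rates tend to shrink, so truncating after the K-th atom discards
# little mass.  Rates exp(-Gamma_k), with Gamma_k the arrival times of
# a unit-rate Poisson process, are an inverse-Levy-flavored toy choice.
arrivals = np.cumsum(rng.exponential(size=200))  # Gamma_1 < Gamma_2 < ...
rates = np.exp(-arrivals)                        # atom rates, shrinking in k

K = 10
kept, dropped = rates[:K], rates[K:]             # truncation at level K
print(f"fraction of mass discarded: {dropped.sum() / rates.sum():.2e}")
```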

SLIDE 78

Ordering of (N)CRM atoms

We describe 2 forms for sequential representations:

  • Series representation: a function of a homogeneous Poisson point process (4 versions)
  • Superposition representation: an infinite sum of homogeneous CRMs, each with a finite number of atoms (3 versions)

Theorem (H., Campbell, How, Broderick). (N)CRMs can be generated using all 7 sequential representations.

SLIDE 81

Sequential representation comparison

Why so many representations? They’re all useful in different circumstances.

                      |        Series Reps          |  Superposition Reps
                      | B-Rep  IL-Rep  R-Rep  T-Rep | DB-Rep  PL-Rep  SB-Rep
  Error Bound Decay   | (exp)  (exp)   ✓/✗    ✗     | ✓       (exp)   (exp)
  Ease of Analysis    | ✗      ✗✗      ✗      ✗     | ✓       ✓       ✓
  Generality          | ✓      ✓       ✓      ✓     | ✓       ✓       ✓
  Known # Atoms       | ✓      ✓       ✗      ✗     | ✗       ✗       ✗

SLIDE 88

Sequential representation example

Given a Gamma process:

  Step 1: compute [equation not captured]
  Step 2: compute [equation not captured], an Exponential(𝜇) density!
  Step 3: plug in!
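The slide's equations did not survive extraction, so as an illustration of what such a representation looks like, here is a sketch of a Bondesson-style series representation [Bondesson 82] for a Gamma process with Lévy measure ν(dθ) = γ θ⁻¹ e^(−λθ) dθ; the Exponential draws echo the slide's "Exponential density!" step, but the exact parameterization and the sanity check are my reading, not the talk's derivation:

```python
import numpy as np

def gamma_crm_series(gamma_mass, lam, K, rng):
    """First K atom rates of a Gamma process via a Bondesson-style
    series representation:
        theta_k = (1/lam) * V_k * exp(-Gamma_k / gamma_mass),
    with V_k iid Exponential(1) and Gamma_k the arrival times of a
    unit-rate Poisson process.  The (infinite) total mass should be
    Gamma(gamma_mass, rate=lam) distributed; here we sanity-check
    only its mean, gamma_mass / lam.
    """
    arrivals = np.cumsum(rng.exponential(size=K))
    v = rng.exponential(size=K)
    return v * np.exp(-arrivals / gamma_mass) / lam

rng = np.random.default_rng(4)
# Mean total mass should be near gamma_mass / lam = 2.0
totals = [gamma_crm_series(2.0, 1.0, K=60, rng=rng).sum() for _ in range(4000)]
print(abs(np.mean(totals) - 2.0) < 0.15)
```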

SLIDE 99

Choosing between the seven representations

How close is our finite approximation? Truncation error ε: compare the distribution of the data generated under the full infinite Θ vs. the truncated Θ_K.

  • Depends on the number of observations N and the truncation level K
  • As N gets larger, the error increases
  • As K gets larger, the error decreases
  • Cannot be evaluated exactly, so we develop new upper bounds

SLIDE 102

The truncation error

Lemma (H., Campbell, How, Broderick). Protobound: the truncation error is bounded by the probability that the data depend on an atom beyond the truncation level, i.e. P( whoops! ). This leads to all the other truncation error bounds in this work.

Theorem (HCHB). The series representation error is bounded by

  (1/2) ‖p_{N,∞} − p_{N,K}‖₁ ≤ 1 − exp{ −∫₀^∞ E[ π̄(τ(V, u + G_K))^N ] du }

Theorem (HCHB). The superposition representation error is bounded by

  (1/2) ‖p_{N,∞} − p_{N,K}‖₁ ≤ 1 − exp{ −∫ π̄(θ)^N ν_K^+(dθ) }

SLIDE 108

Error bound example

Given a Gamma-Poisson process:

  Step 1: bound the integral, where G_K ∼ Gamma(K, c) (integration by parts, then a Gamma expectation)
  Step 2: plug in!

  (1/2) ‖p_{N,∞} − p_{N,K}‖₁ ≤ 1 − exp{ −Nγ ( γλ / (1 + γλ) )^K } ∼ Nγ ( γλ / (1 + γλ) )^K as K → ∞
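The bound above is easy to evaluate numerically. This sketch (formula transcribed from the slide; the hyperparameter names γ and λ are assumed to be those of the Gamma-Poisson process) shows the N-versus-K trade-off:

```python
import numpy as np

def gamma_poisson_truncation_bound(N, K, gamma_mass, lam):
    """Truncation error bound for the Gamma-Poisson process:
        (1/2)||p_{N,inf} - p_{N,K}||_1
            <= 1 - exp{-N * gamma * (gamma*lam / (1 + gamma*lam))**K},
    which behaves like N * gamma * (gamma*lam / (1 + gamma*lam))**K
    as K -> infinity.
    """
    r = gamma_mass * lam / (1.0 + gamma_mass * lam)
    return 1.0 - np.exp(-N * gamma_mass * r**K)

b = gamma_poisson_truncation_bound
# The bound grows with the number of observations N...
print(b(100, 10, 1.0, 1.0) > b(10, 10, 1.0, 1.0))
# ...and decays geometrically in the truncation level K.
print(b(10, 40, 1.0, 1.0) < b(10, 10, 1.0, 1.0))
```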

SLIDE 110

Truncation Roadmap

  1. Tractable models in BNP ✓
  2. Two forms for sequential representations ✓
  3. Truncation and error analysis ✓

SLIDE 111

Previous Work

            Finite Approximation   Approximation Error Bounds   Computational Complexity
  DP        ✓                      ✓                            ✓
  BP        ✓                      ✓                            ✓
  BPP
  𝚫P        ✓                      ✓                            ✓
  (N)CRM

SLIDE 113

Our Work

            Finite Approximation   Approximation Error Bounds   Computational Complexity
  DP        ✓                      ✓                            ✓
  BP        ✓                      ✓                            ✓
  BPP       ✓                      ✓                            ✓
  𝚫P        ✓                      ✓                            ✓
  (N)CRM    ✓                      ✓                            ✓

SLIDE 120

Conclusions

The sequential representations and truncation error bounds we develop…

  • Expand the class of BNP priors that admit efficient inference
  • Help automate the use of BNP models (e.g. in PPLs)
  • Facilitate the use of “modern” inference methods (e.g. HMC and VB) with BNP models
  • Trade off the computational efficiency and statistical accuracy of the truncated model

J. Huggins*, T. Campbell*, J. How, T. Broderick. Truncated Random Measures. Submitted, 2016. Available online: https://arxiv.org/abs/1603.00861