Truncated Random Measures
Jonathan Huggins
MIT CSAIL and Dept. of EECS with: T. Campbell, J. How, T. Broderick
What leads to a statistical method being used for science?
Need models that can extract new, useful information from infinite streams of data
e.g. keep learning new topics from a stream of documents
Bayesian nonparametrics: achieves growing model size via infinite parameters
[Gopalan 2014] movies
[Teh 2006] text
[Huang 2014] medicine
[Michini 2015] robotics
[Lennox 2010] genetics
[Prunster 2014] finance
[Yang 2015] astronomy
[Yu 2012] traffic
[Ozaki 2008] agriculture
[Kottas 2008] pathology
hard work! automate inference with probabilistic programming
issues: we care about the parameters; we use approximations (HMC/VB); distributed computation
with e.g. variational inference, HMC [Blei 06; Neal 10]
Problem: Wide variety of priors in BNP with no finite approximation
Contributions:
[Venn diagram: previously studied priors with finite approx (past work) ⊂ priors with finite approx (new) ⊂ all BNP priors]
[Table: for each prior — DP, BP, BPP, 𝚫P, and general (N)CRMs — we provide a finite approximation, approximation error bounds, and computational complexity. Prior work: [Sethuraman 94], [Bondesson 82], [Ishwaran 01], [Teh 07], [Thibaux 07], [Doshi-Velez 09], [Paisley 12], [Broderick 14], [Roychowdhury 15].]
Tractable models in BNP · two forms for sequential representations · truncation and error analysis
[Figure: topic modeling example. Doc 1 (532 words), Doc 2 (210 words), Doc 3 (854 words), Doc 4 (926 words) are each a mix of topics; topic space (e.g. "sports", "politics") is paired with frequency space (e.g. 0.7, 0.5, 0.2, …).]
ϴ is a random discrete measure
[Figure: the general view. Topics become "traits" (ψ1, ψ2, ψ3, … in trait space) and frequencies become "rates" (θ1, θ2, θ3, … in rate space); Obs 1–4 each draw on the traits.]
How do we generate infinitely many trait/rate points (𝜔, 𝜄)?
Poisson point process with measure 𝜉(d𝜄 × d𝜔) [Kingman 93]
→ completely random measure (CRM) (e.g. BP, 𝚫P)
Normalize rates → normalized CRM (NCRM) (e.g. DP)
Captures a large class of useful priors in BNP
How do we pick a finite subset of the points?
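On a bounded window of trait space × rate space, the Poisson point process above is easy to simulate directly (the full CRM has infinitely many atoms only in the limit, and a general measure 𝜉 would replace the uniform draws here). A minimal homogeneous sketch; the window and all parameter names are illustrative assumptions:

```python
import numpy as np

def poisson_point_process(intensity, trait_width=1.0, rate_window=(0.1, 5.0), rng=None):
    # Homogeneous Poisson point process on a bounded window of
    # trait space x rate space (illustrative window, not from the slides).
    rng = np.random.default_rng(rng)
    lo, hi = rate_window
    area = trait_width * (hi - lo)
    n = rng.poisson(intensity * area)                # point count ~ Poisson(intensity * area)
    traits = rng.uniform(0.0, trait_width, size=n)   # given n, points are i.i.d. uniform
    rates = rng.uniform(lo, hi, size=n)
    return traits, rates

traits, rates = poisson_point_process(intensity=10.0, rng=0)
```

The key Poisson-process structure is that the point count is Poisson-distributed and, conditionally on the count, the points are i.i.d. from the (normalized) intensity on the window.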
Tractable models in BNP · two forms for sequential representations · truncation and error analysis
We pick a finite subset of atoms (𝜔,𝜄) by:
1) ordering the atoms (sequential representation)
2) removing any atoms beyond the K-th (truncation)
[Figure: atoms in trait space × rate space, labeled 1, 2, 3, 4, …, K in order; atoms beyond the K-th are dropped.]
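A familiar concrete instance of "order, then truncate" is the stick-breaking representation of the DP [Sethuraman 94]: atom weights are generated in sequence, and truncation keeps the first K. A minimal sketch (variable names are mine):

```python
import numpy as np

def truncated_dp_weights(alpha, K, rng=None):
    # Stick-breaking: v_k ~ Beta(1, alpha); weight_k = v_k * prod_{j<k} (1 - v_j).
    # Generating atoms in this order and stopping after K atoms is exactly
    # "sequential representation + truncation".
    rng = np.random.default_rng(rng)
    v = rng.beta(1.0, alpha, size=K)
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # stick remaining before step k
    return v * leftover

w = truncated_dp_weights(alpha=2.0, K=50, rng=0)
# w sums to less than 1: the missing mass is what truncation discarded
```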
We describe 2 forms for sequential representations
Series representation function of a homogenous Poisson point process (4 versions) We describe 2 forms for sequential representations
Superposition representation infinite sum of homogenous CRMs, each with finite # of atoms (3 versions)
Series representation function of a homogenous Poisson point process (4 versions) We describe 2 forms for sequential representations
Superposition representation infinite sum of homogenous CRMs, each with finite # of atoms (3 versions)
Series representation function of a homogenous Poisson point process (4 versions) We describe 2 forms for sequential representations
Theorem (H., Campbell, How, Broderick). Can generate (N)CRMs using all 7 sequential representations
Why so many representations?
They’re all useful in different circumstances
[Table: comparison of the seven representations — series reps B-Rep, IL-Rep, R-Rep, T-Rep and superposition reps DB-Rep, PL-Rep, SB-Rep — on error bound decay (exponential for four of them), ease of analysis, generality, and known # of atoms.]
Worked example, given a Gamma process:
Step 1: compute [equation on slide]
Step 2: compute [equation on slide] — an Exponential(𝜇) density!
Step 3: plug in!
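A sequential representation of the Gamma process can also be sketched in code. One Bondesson-type series representation (my assumed parameterization, not taken from the slides: rate measure ν(dθ) = γ θ⁻¹ e^{−λθ} dθ) draws weights θ_k = V_k·exp(−Γ_k/γ) with V_k ~ Exp(λ) i.i.d. and Γ_k the arrival times of a unit-rate Poisson process:

```python
import numpy as np

def gamma_process_bondesson(gamma, lam, K, rng=None):
    # First K atom weights of a Gamma process via a Bondesson-type series
    # representation (assumed parameterization; see lead-in above).
    rng = np.random.default_rng(rng)
    arrivals = np.cumsum(rng.exponential(1.0, size=K))  # Gamma_1 < Gamma_2 < ... (Poisson arrivals)
    v = rng.exponential(scale=1.0 / lam, size=K)        # V_k ~ Exp(rate = lam)
    return v * np.exp(-arrivals / gamma)                # weights decay stochastically in k

theta = gamma_process_bondesson(gamma=3.0, lam=1.0, K=2000, rng=0)
# For large K, theta.sum() approximates one draw of the process's total mass
```

Because the arrival times Γ_k grow linearly in k, the weights decay geometrically on average, which is what makes truncating at a moderate K reasonable.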
Tractable models in BNP · two forms for sequential representations · truncation and error analysis
How close is our finite approximation?
Truncation error ε: compare the distribution of the data generated under the full infinite ϴ vs. under the truncated ϴK
Depends on the number of observations N and the truncation level K:
as N gets larger, the error increases; as K gets larger, the error decreases
Cannot evaluate exactly, so we develop new upper bounds
Lemma (H., Campbell, How, Broderick). The truncation error $\tfrac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1$ is bounded by the probability that the truncated model fails to generate the data — i.e. P( whoops! ). This lemma leads to all the other truncation error bounds in this work.
Theorem (HCHB). The series rep error is bounded by
$\tfrac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1 \le 1 - \exp\left(-\int_0^\infty \mathbb{E}\left[\bar\pi(\tau(V, u+G_K))^N\right] du\right)$
Theorem (HCHB). The superposition rep error is bounded by
$\tfrac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1 \le 1 - \exp\left(-\int \bar\pi(\theta)^N \, \nu_K^+(d\theta)\right)$
Worked example, given a Gamma-Poisson process:
Step 1: bound the integral, where $G_K \sim \mathrm{Gamma}(K, c)$ (integration by parts, then a Gamma expectation)
Step 2: plug in!
$\tfrac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1 \le 1 - \exp\left\{-N\gamma \left(\frac{\gamma\lambda}{1+\gamma\lambda}\right)^K\right\} \sim N\gamma \left(\frac{\gamma\lambda}{1+\gamma\lambda}\right)^K, \quad K \to \infty$
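A closed-form bound like this makes choosing K mechanical: since the bound decays geometrically in K, one can scan for the smallest truncation level meeting an error budget. A sketch using the Gamma-Poisson bound 1 − exp{−Nγ (γλ/(1+γλ))^K} (constants as I read them off the slide; treat them as assumptions):

```python
import math

def gp_truncation_bound(N, gamma, lam, K):
    # 1/2 * ||p_{N,inf} - p_{N,K}||_1 <= 1 - exp(-N * gamma * r**K),
    # with r = gamma*lam / (1 + gamma*lam) < 1, so the bound
    # decays geometrically in the truncation level K.
    r = gamma * lam / (1.0 + gamma * lam)
    return 1.0 - math.exp(-N * gamma * r**K)

def smallest_K(N, gamma, lam, eps):
    # Geometric decay guarantees this loop terminates.
    K = 0
    while gp_truncation_bound(N, gamma, lam, K) > eps:
        K += 1
    return K

K = smallest_K(N=100, gamma=2.0, lam=1.0, eps=0.01)
```

This also makes the N-vs-K trade-off from the previous slide concrete: the budgeted K grows (logarithmically) with the number of observations N.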
Tractable models in BNP · two forms for sequential representations · truncation and error analysis
[Table recap: finite approximations, approximation error bounds, and computational complexity — now covered for the DP, BP, BPP, 𝚫P, and general (N)CRMs.]
The sequential representations and truncation error bounds we develop enable approximate inference (e.g. with HMC and VB) in BNP models via the truncated model.
Truncated Random Measures. Submitted, 2016. Available online: https://arxiv.org/abs/1603.00861