Multiresolution Gaussian Processes
Emily Fox, ICERM 2012, Providence, RI
Joint work with David Dunson (Duke)
[Figure: observations vs. time]
Data from Neuronal Recordings
Many time series exhibit:
- Long-range correlations
- Non-Markovian dynamics
In a multivariate setting:
- Time-varying correlations
Goals
Sometimes also…
- Functional data analysis
  → sharing a common global trend
Magnetoencephalography (MEG)
Helmet with 102 sensors
Stimulus word: COW
- Long-range dependencies
- Time-varying correlations
Trial-to-Trial Variability
- Data are noisy (low SNR)
  § Multiple trials recorded for each stimulus
- Each trial records the same process
  § Capture common global trajectory
  § Allow trial-to-trial variability
- Functional data analysis setting
MEG Noise
Build Word-Specific Model
Stimulus: w = HOUSE
yt ∼ N(µ(w)(xt), Σ(w)(xt))
Hierarchy captures trial-to-trial variability
Build Word-Specific Model
Capturing heteroscedasticity is key
[Figure: mean µ(x) and covariances Σ(x1), Σ(x2), Σ(x3) over sensors at times 1–3]
Build Word-Specific Model
Harness k-dim latent space
yt ∼ N(µ^(w)(xt), Σ^(w)(xt)), mapping a k-dimensional latent space R^k up to the sensor space R^102
Low-Rank Covariance Evolution
Σ(x) = Λ(x)Λ(x)′ + Σ_0
- Λ(x): p × k matrix of "dictionary elements" λ_ij(·)
  § E.g., Gaussian process elements
- k ≪ p
Fox and Dunson, “Bayesian Nonparametric Covariance Regression”, under review.
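The low-rank construction Σ(x) = Λ(x)Λ(x)′ + Σ_0 can be sketched numerically. Here each dictionary element λ_ij(·) is an independent GP draw on a grid; the squared-exponential kernel, length scale, and dimensions p, k are illustrative assumptions, not choices from the talk:

```python
import numpy as np

def se_kernel(x, length_scale=0.3):
    """Squared-exponential kernel matrix over 1-D inputs x."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
p, k, n = 5, 2, 100            # observation dim, latent dim (k << p), grid size
x = np.linspace(0, 1, n)
K = se_kernel(x) + 1e-8 * np.eye(n)
C = np.linalg.cholesky(K)

# Each dictionary element lambda_ij(.) is an independent GP draw on the grid.
Lam = np.einsum('nm,pkm->pkn', C, rng.standard_normal((p, k, n)))

Sigma0 = 0.1 * np.eye(p)       # diagonal noise covariance
# Sigma(x_t) = Lambda(x_t) Lambda(x_t)' + Sigma0 evolves smoothly with x.
Sigma = np.array([Lam[:, :, t] @ Lam[:, :, t].T + Sigma0 for t in range(n)])
print(Sigma.shape)  # (100, 5, 5)
```

Because each Λ(x_t) varies smoothly in x, every Σ(x_t) is positive definite by construction while nearby times get similar covariances.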
One Step Further…
Σ(x) = Θξ(x)ξ(x)′Θ′ + Σ_0
Λ(·) = Θξ(·): Θ a constant p × d matrix of weights θ_ij; ξ(·) a d × k matrix of GP dictionary elements ξ_ij(·)
Changing Correlations – MEG
102 sensors: Correlations between sensors change with processing of word “kick”
Mean Hierarchy
µ^(w)(x) → µ^(w,1)(x), …, µ^(w,J)(x)  (Trial 1, …, Trial J)
(Note: defined in a k-dim space and projected up)
Fyshe, Fox, Dunson, and Mitchell, “Hierarchical Latent Dictionary Learning for Word Classification using Brain Activation Patterns”, AISTATS 2012.
Data Collection
- 4 word categories, 5 words per category
- 20 repetitions per word (400 total)
  § 15 train/word (300 total)
  § 5 test/word (100 total)
Animals Tools Food Buildings
Classification Performance
[Figure: MEG observations vs. time]
MEG Data – 1 Sensor
3 trials, 1 sensor
Yes:
- Long-range correlations
- Non-Markovian dynamics
What we missed:
- Abrupt changes
- Locally stationary dynamics
Long-range correlations span changepoints
[Figure: sample correlation matrix over time (20 trials)]
Key features:
- Long-range correlations
- Abrupt changes
- Locally smooth
GPs on Nested Partition
Parent function:
- Smooth global trajectory
- Long-range correlations
- Non-Markovian dynamics
- Stationary
f^0(x) ∼ N(0, K^0)  over inputs x_1, x_2, …, x_n
Fox and Dunson, “Multiresolution Gaussian Processes”, to appear NIPS 2012.
GPs on Nested Partition
Split the input space into level-1 sets A^1_1, A^1_2 (changepoint = break in stationarity) and draw child GPs centered at the parent:
f^1(A^1_1) ∼ GP(f^0(A^1_1), c^1_1)
f^1(A^1_2) ∼ GP(f^0(A^1_2), c^1_2)
f^1(x) | f^0 ∼ N(0, K^1)
Recurse on level-2 sets A^2_1, …, A^2_4, and so on down the tree:
f^ℓ(x) | f^{ℓ−1} ∼ N(0, K^ℓ)
The observed process is the leaf-level function: g = f^L
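The generative recursion above can be sketched as follows. The balanced binary partition, squared-exponential kernel, and the per-level variance decay are illustrative assumptions standing in for the deck's hyperparameters:

```python
import numpy as np

def se_kernel(x, length_scale, var):
    """Squared-exponential kernel over 1-D inputs x."""
    d = x[:, None] - x[None, :]
    return var * np.exp(-0.5 * (d / length_scale) ** 2)

def sample_mgp(x, L=2, length_scale=0.2, var=1.0, decay=0.5, seed=0):
    """Draw g = f^L from a multiresolution GP on a balanced binary partition.

    Level-l functions are GPs centered at their parent, restricted to each
    partition set, with variance shrinking by `decay` per level so the draw
    stays close to the parent despite the added changepoints.
    """
    rng = np.random.default_rng(seed)
    n = len(x)

    def gp_draw(idx, mean, v):
        K = se_kernel(x[idx], length_scale, v) + 1e-8 * np.eye(len(idx))
        return mean + np.linalg.cholesky(K) @ rng.standard_normal(len(idx))

    # Parent function f^0 ~ N(0, K^0): smooth, stationary, global.
    f = gp_draw(np.arange(n), np.zeros(n), var)
    for level in range(1, L + 1):
        v = var * decay ** level
        # Balanced binary tree: 2^level sets A^level_i at this level.
        for idx in np.array_split(np.arange(n), 2 ** level):
            # f^level(A^level_i) ~ GP(f^{level-1}(A^level_i), c^level_i)
            f[idx] = gp_draw(idx, f[idx], v)
    return f

x = np.linspace(0, 1, 200)
g = sample_mgp(x)
print(g.shape)  # (200,)
```

Each level adds breaks in stationarity at the partition boundaries while the shrinking variance keeps the draw near its parent, which is exactly what produces long-range correlations that span changepoints.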
Induced Marginal GP
Conditioned on the partition, marginalize the level-specific GPs.
Equivalent to a GP with a partition-dependent (non-stationary) covariance function.
Correlation Structure
[Figure: for locations x_i, x_j with observations y_i, y_j, covariance contributions c_{A^0}(x_i, x_j) + c_{A^1_1}(x_i, x_j) + … accumulate over shared partition sets]

corr(y_i, y_j | A) = [ Σ_{ℓ=0}^{L_ij} c_ℓ r^ℓ_i(x_i, x_j) ] / √[ (σ² + Σ_{ℓ=0}^{L−1} c_ℓ r^ℓ_i(x_i, x_i)) (σ² + Σ_{ℓ=0}^{L−1} c_ℓ r^ℓ_j(x_j, x_j)) ]

where L_ij = lowest tree level at which x_i and x_j fall in the same partition set
- Correlation spans changepoints
- Higher correlation for pairs sharing more partition sets
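The correlation expression can be evaluated directly once L_ij is known. The sketch below transcribes the displayed formula on a balanced binary partition; for simplicity a single squared-exponential correlation function r is reused at every level, whereas the slides index r^ℓ by level, and the grid, variances c_ℓ, and σ² are illustrative:

```python
import numpy as np

def shared_level(i, j, n, L):
    """Deepest level of a balanced binary partition of n grid points at
    which points i and j fall in the same set (L_ij in the formula)."""
    for level in range(L, -1, -1):
        for s in np.array_split(np.arange(n), 2 ** level):
            if i in s and j in s:
                return level
    return 0

def mgp_corr(xi, xj, i, j, n, L, c, r, sigma2):
    """corr(y_i, y_j | A): covariance contributions accumulate down to
    the deepest shared level L_ij; deeper, unshared levels do not."""
    Lij = shared_level(i, j, n, L)
    num = sum(c[l] * r(xi, xj) for l in range(Lij + 1))
    di = sigma2 + sum(c[l] * r(xi, xi) for l in range(L))
    dj = sigma2 + sum(c[l] * r(xj, xj) for l in range(L))
    return num / np.sqrt(di * dj)

# Toy example: 8 grid points, L = 2, squared-exponential correlation r.
x = np.linspace(0, 1, 8)
c = [1.0, 0.5, 0.25]                     # per-level variances c_l
r = lambda a, b: np.exp(-0.5 * ((a - b) / 0.3) ** 2)
corr_near = mgp_corr(x[0], x[1], 0, 1, 8, 2, c, r, 0.1)  # share a leaf set
corr_far = mgp_corr(x[0], x[7], 0, 7, 8, 2, c, r, 0.1)   # share only the root
```

As the bullets state, corr_far is positive (correlation spans the changepoints via the shared root) but much smaller than corr_near, which accumulates contributions from every level.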
Covariance Function – Length scale
Length-scale hyperparameter:
- Fractal-like smoothness
- Locally as smooth as the parent function
- Lower levels capture more detail
- Only one parameter
Covariance Function – Variance
Variance hyperparameter:
- Decreasing variability from the parent
- Finite variance regardless of tree depth
- Lower levels are less influential
The resulting function is similar to the higher-level function despite adding changepoints.
Balanced Binary Trees
[Figure: a nested partition of A^0 represented as a balanced binary tree over sets A^1_1, A^1_2 and A^2_1, …, A^2_4]
Related Methods
- Treed GPs: Gramacy and Lee 2008; Kim, Mallick, Holmes 2005
- GP changepoint models: Saatci, Turner, Rasmussen 2010
- Mixtures of GP experts: Meeds and Osindero 2006; Rasmussen and Ghahramani 2002
- Phylogenies of GPs (function-valued observations): Jones and Moriarty 2011; Henao and Lucas 2012
- Multiscale Gaussian models: cf. Willsky 2002
Multiple Trials – Example for MEG
Multiresolution GP per trial:
- Shared parent function f^0
- Shared partition
- Trial-specific process, j = 1, …, J
[Figure: shared parent f^0 feeding trial-specific nested partitions, generating observations y^(1), y^(2), …, y^(J)]
Shared parent function and shared partition; trials y^(1), …, y^(J) are conditionally independent given f^0 and A.
Draw from Prior
[Figures: MEG data alongside a simulated draw from the prior, with sample correlation matrices (20 trials and 100 trials)]
Conditioned on the Partition…
- Posterior global trajectory
- Posterior predictive distribution of a new trial
- Marginal (conditional) likelihood
Key to inference of the nested partition!
Independence Chain MCMC
- Likelihood:
- Prior:
  § Define a distribution on changepoints z_i (level-independent): p(A) = ∏_i F(z_i)
  § Easy to define a uniform distribution on trees and elicit prior info
  Throw down 2^L − 1 points according to F; deterministically merge to form partition A
- Proposal: ????
(nested partition = balanced binary tree)
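The changepoint prior can be sampled exactly as described: throw down 2^L − 1 points according to F, then merge deterministically so each coarser level keeps every other changepoint. A minimal sketch, assuming F is uniform on [0, T] (the slides leave F general):

```python
import numpy as np

def sample_partition(L=3, T=1.0, seed=0):
    """Sample a nested partition A of [0, T]: draw 2^L - 1 changepoints
    i.i.d. from F (uniform here, an illustrative choice), then merge
    deterministically so each coarser level keeps every other changepoint."""
    rng = np.random.default_rng(seed)
    z = np.sort(rng.uniform(0, T, 2 ** L - 1))  # finest-level changepoints
    levels = []
    for level in range(L, 0, -1):
        levels.append(z.copy())
        z = z[1::2]  # keep every other changepoint for the parent level
    return levels[::-1]  # levels[l] holds the changepoints at tree level l+1

parts = sample_partition(L=3)
```

The deterministic merge guarantees the result is a balanced binary tree: level 1 has one changepoint, level 2 has three, level 3 has seven, and each level's changepoints are a subset of the next level's.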
Inference of Hierarchical Partition
- Stochastic tree search tends to be inefficient
- Can harness the specific correlation structure
- Want a method to (hierarchically) find drops in correlation
- Think of the problem as graph cutting
  § Node = time step
  § Edge = correlation
[Figure: sample correlation matrix over time with hierarchical cuts]
Normalized Cuts (Shi & Malik 2000)
- Normalized cuts balances:
  § Amount of edge weight cut
  § Connectivity of each component
- Cost matrix = sample correlation matrix: W = abs(corr(Y))
- Cost of cut: ncut(A, B) = W(A, B) · (1/W(A, V) + 1/W(B, V))
Encourages cutting small edge weights; penalizes cutting disconnected components
- Hierarchically perform cuts (recursive minimization) to obtain the normalized-cuts partition
Normalized Cuts Proposal
- Recursive minimization always chooses the same cutpoint
- Instead, use the ncut metric as a proposal distribution over cutpoints
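Turning the ncut metric into a proposal can be sketched as follows. Only contiguous cuts of the time-ordered correlation matrix are considered, and the softmax-with-temperature transform from scores to proposal probabilities is an assumption; the slides do not specify the exact mapping:

```python
import numpy as np

def ncut_scores(W):
    """ncut(A, B) for every cutpoint of a time-ordered affinity matrix W,
    where A = {1..t} and B = {t+1..n} (contiguous cuts only)."""
    n = W.shape[0]
    total = W.sum(axis=1)
    scores = np.empty(n - 1)
    for t in range(1, n):
        cut = W[:t, t:].sum()                    # edge weight crossing the cut
        assoc_a, assoc_b = total[:t].sum(), total[t:].sum()
        scores[t - 1] = cut * (1.0 / assoc_a + 1.0 / assoc_b)
    return scores

def propose_cut(Y, temperature=1.0, seed=0):
    """Sample a cutpoint with probability decreasing in its ncut score,
    rather than always taking the minimizer (illustrative proposal)."""
    rng = np.random.default_rng(seed)
    W = np.abs(np.corrcoef(Y, rowvar=False))     # W = abs(corr(Y))
    s = ncut_scores(W)
    p = np.exp(-(s - s.min()) / temperature)
    p /= p.sum()
    return rng.choice(len(s), p=p) + 1           # cut between t and t+1

# Toy data: 200 trials, 20 time points, correlation drops at t = 10.
rng = np.random.default_rng(1)
f1 = rng.standard_normal((200, 1))
f2 = rng.standard_normal((200, 1))
noise = 0.1 * rng.standard_normal((200, 20))
Y = np.hstack([np.repeat(f1, 10, axis=1), np.repeat(f2, 10, axis=1)]) + noise
cut = propose_cut(Y)
```

On this toy data the ncut score is minimized exactly at the drop in correlation, but the proposal still assigns mass to nearby cutpoints, which is what makes it usable inside an independence-chain sampler.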
Independence Chain MCMC
- Proposal: normalized cuts; complexity O(n³) vs. O(n²(L−1))
- Can also interleave local node repartition proposals instead of global partition proposals
Node Proposals
[Figure: balanced binary tree A^0 → {A^1_1, A^1_2} → {A^2_1, …, A^2_4} → {A^3_1, …, A^3_8}; a single node is repartitioned]
Equivalent to a global repartition proposal!
Simulated Data
[Figures: simulated observations over time; sample correlation matrix (100 trials); true partition; ncuts partition; MAP partition]
MEG Data
- 10 words
- 20 repetitions per word
  § 15 training/word
  § 5 test/word
- Examine one multiresolution GP per word per sensor
Buildings: Apartment, Barn, Church, Igloo, House
Tools: Chisel, Hammer, Pliers, Saw, Screwdriver
MEG Changepoints – Level 1
[Figure: level-1 changepoints aligned with stimulus onset, the n100 response, and semantic processing]
Baselines – Single and Hierarchical GPs
- Multiresolution GP: shared parent f^0 and nested partition, with trial-specific processes g^(j), j = 1, …, J
- Hierarchical GP (cf. Fyshe et al., AISTATS 2012): shared f^0 and trial-specific g^(j), but no partition
- Single GP: shared f^0 only; no partition, no trial-to-trial variability
Decrease in MSE
[Figures: % decrease in MSE vs. conditioning point and per lobe (Visual, Frontal, Parietal, Temporal), for mGP vs. GP and mGP vs. hGP; example conditioned prediction showing the test trial, mGP, and hGP beyond the conditioning point]
Entire Heldout Prediction
[Figure: predictions over an entire heldout trial; MLE, hGP, and mGP]
Wavelet-based Functional Mixed Models
Morris & Carroll 2006 (JRSS B)
- Allows spiky trajectories
- Models related functions
- Notes:
  § Assumes a regular grid of observations
  § Can cope with the multivariate setting (not used here)
[Figure: heldout log likelihood (×10^4) for wfmm vs. mGP]
- Examine each word and sensor independently
- Compute heldout likelihood of 5 entire trials
Summary
Key features:
- Long-range correlations
- Abrupt changes
- Locally smooth
Additionally:
- Functional data analysis
  → sharing a common global trend
- Irregular grid of observations
- Tractability and interpretability
Extensions
- Multivariate settings
  § Input spaces
  § Output spaces
- Hierarchical dependence structures
  § Partial sharing of parents in the tree
  § mGP factor models
- Incorporate the mGP in a functional ANOVA framework
- Theoretical analysis
  § Posterior consistency
- Prior on multivariate partitions
- Partition proposals: