Scalable Posterior Approximation
Galen Reeves
Departments of ECE and Statistical Science, Duke University
August 2015
2
Collaborators at Duke: Willem van den Boom, David B. Dunson

Variable selection / support recovery: identify the locations of the parameters with significant effects on observed behaviors.
3

4
[Figure: bipartite graphical model connecting the p unknown parameters β_1, β_2, β_3, β_4, …, β_p to the n observations y_1, y_2, y_3, …, y_n (the data)]
5
[Figure: the same graphical model of parameters and observations]

Types of questions:
- full posterior distribution p(β|y): a high-dimensional distribution
- posterior mean E[β|y]: a p × 1 vector
- posterior covariance Cov[β|y]: a p × p matrix
- marginal posterior distribution p(β_1|y)
6–8
[Slides 6–8 repeat the graphical model figure, highlighting different subsets of the parameters and observations]
9
Prior distribution:

p(β|θ) = ∏_{j=1}^p p(β_j|θ)

- entries of β are conditionally independent given the hyperparameters θ
- the marginal prior p(β_j|θ) is a mixed discrete-continuous distribution: a point mass (probability of being exactly zero) plus a continuous distribution if nonzero
10
Linear model: y = Xβ + ε, with ε ∼ N(0, σ²I)

- β: p unknown parameters
- y: n observations
- X: n × p matrix
- ε: Gaussian errors

Goal: inference on β.
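A minimal sketch of this data-generating model in NumPy, using a spike-and-slab prior for β. All numeric settings here (n, p, pi0, slab_sd, sigma) are illustrative choices, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 500        # n observations, p unknown parameters
pi0 = 0.9              # prior probability that an entry of beta is exactly zero
slab_sd = 1.0          # standard deviation of the nonzero (slab) component
sigma = 0.5            # noise standard deviation

# Spike-and-slab prior: beta_j = 0 with probability pi0, else N(0, slab_sd^2).
nonzero = rng.random(p) >= pi0
beta = np.where(nonzero, rng.normal(0.0, slab_sd, size=p), 0.0)

# Linear model with Gaussian errors: y = X beta + eps, eps ~ N(0, sigma^2 I).
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(0.0, sigma, size=n)
```

With pi0 = 0.9 and p = 500, roughly 50 of the 500 parameters are nonzero, matching the sparse regime the talk targets.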
11

12
[Figure: accuracy vs. scalability trade-off. Linear methods (least-squares) and LASSO are scalable but less accurate; brute-force numerical integration and MCMC (unbounded time) are accurate but not scalable; methods such as BCR, AMP, and YFA are the focus of recent research]

13
14
[Figure: the prior distribution places probability mass at zero; depending on the data, the posterior distribution has either a large or a small probability mass at zero]

15
[Figure: the same prior and posterior distributions, with a Gaussian approximation overlaid on each posterior]
16
Approximate message passing (AMP)

17
Model: y = Xβ + ε, with ε ∼ N(0, σ²I)

- p unknown parameters drawn independently from a known distribution (e.g. spike & slab)
- n observations
- Gaussian errors
- X: an n × p matrix
Goal: compute the posterior marginal distribution of the first entry,

p(β_1|y) = ∫ p(β|y) dβ_2^p,    where β_2^p = (β_2, …, β_p)

Approach: use the posterior mean and posterior variance of an auxiliary variable to reduce the high-dimensional integration problem to a two-dimensional one and obtain a posterior approximation.

18
19
Rotate the data so that β_1 enters only through the first entry in the first column of the transformed design matrix:

ỹ_1 = x̃_{1,1} β_1 + ∑_{j=2}^p x̃_{1,j} β_j + ε̃_1

where the auxiliary variable φ(β_2^p) = ∑_{j=2}^p x̃_{1,j} β_j captures the influence of the other parameters. In matrix form,

ỹ = [ x̃_{1,1}  x̃_{1,2} ⋯ x̃_{1,p}
      0        x̃_{2,2} ⋯ x̃_{2,p}
      ⋮        ⋮       ⋱  ⋮
      0        x̃_{n,2} ⋯ x̃_{n,p} ] β + ε̃
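The slides do not specify how this rotation is constructed; a Householder reflection is one standard choice that zeroes the first column below its first entry while preserving the N(0, σ²I) noise distribution (since the reflection is orthogonal). A sketch with arbitrary illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 5                     # small illustrative sizes
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + 0.1 * rng.normal(size=n)

# Householder reflection Q mapping the first column of X onto a multiple of e1.
x1 = X[:, 0]
s = 1.0 if x1[0] >= 0 else -1.0
v = x1.copy()
v[0] += s * np.linalg.norm(x1)
v /= np.linalg.norm(v)
Q = np.eye(n) - 2.0 * np.outer(v, v)    # orthogonal and symmetric

# Rotated data: only the first row of X_tilde involves beta_1, and the
# rotated noise Q @ eps is still N(0, sigma^2 I) because Q is orthogonal.
X_tilde = Q @ X
y_tilde = Q @ y
```

After the rotation, `X_tilde[1:, 0]` is zero (up to floating-point error), so ỹ_2, …, ỹ_n carry no direct information about β_1.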
20–23
[Slides 20–23 animate the graphical model: after the rotation, the observations y_1, …, y_n become ỹ_1, …, ỹ_n, and β_1 connects only to ỹ_1]
24
φ(β_2^p) = ∑_{j=2}^p x̃_{1,j} β_j

The auxiliary variable φ(β_2^p) encapsulates the influence of β_2, …, β_p on ỹ_1.

25
p(β_1, ỹ_1|ỹ_2^p) = ∫ p(β_1, ỹ_1|φ(β_2^p)) p(φ(β_2^p)|ỹ_2^p) dφ(β_2^p)
26
Compute the conditional mean and variance of the auxiliary variable:

E[φ(β_2^p)|ỹ_2^p]  and  Var[φ(β_2^p)|ỹ_2^p]
27
p(β_1|y) ∝ ∫ p(ỹ_1|φ(β_2^p), β_1) p(β_1) p(φ(β_2^p)|ỹ_2^p) dφ(β_2^p)

- p(ỹ_1|φ(β_2^p), β_1): Gaussian by assumption on the noise
- p(β_1): prior distribution
- p(φ(β_2^p)|ỹ_2^p): replaced with a Gaussian using the mean and variance from the previous step, even though posteriors are highly non-Gaussian
- the integral yields the posterior approximation

Relative to AMP: avoids AMP's restrictive assumptions and provides accurate approximations in settings where AMP fails.
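Once p(φ|ỹ_2^p) is replaced by a Gaussian, the integral over φ is available in closed form, since a Gaussian likelihood convolved with a Gaussian is Gaussian. A sketch for a spike-and-slab prior on β_1; every number here (y1_t, x11, m_phi, v_phi, the prior settings) is a hypothetical stand-in, not a value from the talk:

```python
import numpy as np

def npdf(x, mean, var):
    """Gaussian density, used in place of a stats library."""
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Hypothetical stand-ins for the quantities on the slide.
y1_t = 1.8                 # first rotated observation ỹ_1
x11 = 2.0                  # x̃_{1,1}
sigma2 = 0.25              # noise variance σ²
m_phi, v_phi = 0.3, 0.2    # E[φ|ỹ_2^p] and Var[φ|ỹ_2^p] from the previous step
pi0, slab_var = 0.9, 1.0   # spike-and-slab prior for β_1

# With p(φ|ỹ_2^p) replaced by N(m_phi, v_phi), the integral over φ is Gaussian:
#   ∫ N(ỹ_1; x̃_{1,1} β_1 + φ, σ²) N(φ; m_phi, v_phi) dφ
#     = N(ỹ_1; x̃_{1,1} β_1 + m_phi, σ² + v_phi)
def likelihood(b1):
    return npdf(y1_t, x11 * b1 + m_phi, sigma2 + v_phi)

# Posterior inclusion probability p(β_1 ≠ 0 | y) under the mixture prior.
# The slab evidence also integrates in closed form: ỹ_1 is Gaussian with
# variance σ² + v_phi + x̃_{1,1}² · slab_var when β_1 ~ N(0, slab_var).
evid_spike = pi0 * likelihood(0.0)
evid_slab = (1 - pi0) * npdf(y1_t, m_phi, sigma2 + v_phi + x11 ** 2 * slab_var)
p_incl = evid_slab / (evid_spike + evid_slab)
```

The closed form for `evid_slab` can be checked against a brute-force numerical integration over β_1, which is what makes the two-dimensional reduction cheap in practice.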
28

29
[Figure: MSE relative to the true posterior inclusion probability p(β_1 ≠ 0|y), plotted against the correlation between columns of the design matrix, comparing the proposed method with approximate message passing (AMP) and Bayesian compressed regression (BCR)]
30
For large problems the ground truth is intractable, so methods are compared using empirical ROC curves.

[Figure: empirical ROC curves (true positive rate vs. false positive rate) for AMP and LASSO; left panel: matrix with iid (incoherent) columns, right panel: matrix with correlated columns]
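An empirical ROC curve of the kind shown here can be computed by sweeping a threshold over the estimated inclusion probabilities. The scores below are synthetic stand-ins, not output of any of the compared methods:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 1000
truth = rng.random(p) < 0.1          # which entries are truly nonzero

# Hypothetical scores standing in for estimated inclusion probabilities
# p(β_j ≠ 0 | y); here truly nonzero entries simply tend to score higher.
scores = 0.6 * truth + 0.5 * rng.random(p)

# Empirical ROC curve: sweep a decision threshold over the scores.
order = np.argsort(-scores)
tp = np.cumsum(truth[order])         # true positives at each cutoff
fp = np.cumsum(~truth[order])        # false positives at each cutoff
tpr = tp / truth.sum()
fpr = fp / (~truth).sum()
auc = np.sum(np.diff(fpr, prepend=0.0) * tpr)   # area under the curve
```

Each point (fpr[k], tpr[k]) corresponds to declaring the k highest-scoring entries nonzero; the curve traces out all operating points without committing to a single threshold.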
31
The framework extends to more general models:

p(β, y) = ∫ p(β|θ) p(y|β, θ) dθ,    p(β|θ) = ∏_{j=1}^p p(β_j|θ)

- entries of β conditionally independent given θ
- observations conditionally Gaussian given (β, θ)
Theoretical goal: provide guarantees for the approximate Gaussianity of the auxiliary variable,

p(φ(β_2^p)|ỹ_2^p) ≈ N( E[φ(β_2^p)|ỹ_2^p], Var[φ(β_2^p)|ỹ_2^p] )

How close is this approximation to the true posterior? This question is studied heavily.

Summary: marginal posterior approximations can be obtained by applying a Gaussian approximation to an auxiliary variable.

32
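A loose Monte Carlo sanity check of the underlying central-limit intuition: under the prior, φ = ∑_{j=2}^p x̃_{1,j} β_j is a sum of many independent terms and so is close to Gaussian for large p. (This only illustrates the unconditional distribution of φ; the stated guarantee concerns the conditional distribution given ỹ_2^p, which is harder.) All settings here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
p, draws = 500, 5000
x_row = rng.normal(size=p - 1)       # stand-ins for x̃_{1,2}, …, x̃_{1,p}

# Draw β_2, …, β_p iid from a spike-and-slab prior (10% nonzero, Gaussian
# slab) and form φ = Σ_j x̃_{1,j} β_j for each of many independent draws.
nonzero = rng.random((draws, p - 1)) < 0.1
betas = np.where(nonzero, rng.normal(size=(draws, p - 1)), 0.0)
phi = betas @ x_row

# Standardize and compare a tail probability to the Gaussian value:
# P(|Z| > 1.96) ≈ 0.05 for a standard normal Z.
z = (phi - phi.mean()) / phi.std()
tail = np.mean(np.abs(z) > 1.96)
```

Even though each β_j has a highly non-Gaussian (mixed discrete-continuous) prior, the standardized sum behaves nearly like a standard normal.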
33

34
95% accuracy in detecting nonzeros [Reeves & Gastpar, '12]

[Figure: phase-transition plot of sampling rate (ratio of observations to unknown parameters, log scale from 10⁻⁴ to 10⁻¹) vs. SNR, the signal-to-noise ratio (dB, −20 to 100), showing regions labeled Not Achievable, Linear MMSE, AMP − Soft Thresholding, AMP − MMSE, and Maximum Likelihood]