Fast Quantification of Uncertainty and Robustness with Variational Bayes
ITT Career Development Assistant Professor, MIT
Tamara Broderick
With: Ryan Giordano, Rachael Meager, Jonathan H. Huggins, Michael I. Jordan
Bayesian inference
$p(\theta \mid x) \propto_\theta p(x \mid \theta)\, p(\theta)$ (Bayes Theorem)
prior beliefs in a distribution
subjective; complex models
Some reasonable priors
Approximating the posterior can be computationally expensive
variational Bayes
[see also Opper, Winther 2003]
posterior $p(\theta \mid x)$
(KL) divergence: $\mathrm{KL}(q \,\|\, p(\cdot \mid x))$
$q^*(\theta)$
[Broderick, Boyd, Wibisono, Wilson, Jordan 2013]
[Figure: $p(\theta \mid x)$, $q(\theta)$, and $q^*(\theta)$]
$q(\theta) = \prod_{j=1}^{J} q(\theta_j)$
$\mathrm{KL}(q \,\|\, p(\cdot \mid x)) = \int_\theta q(\theta) \log \frac{q(\theta)}{p(\theta \mid x)} \, d\theta$
… severely)
[MacKay 2003; Bishop 2006; Wang, Titterington 2004; Turner, Sahani 2011]
[Fosdick 2013; Dunson 2014; Bardenet, Doucet, Holmes 2015]
[Figure: $p(\theta \mid x)$ and $q^*(\theta)$ over axes $\theta_1$, $\theta_2$]
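A minimal sketch of the variance problem that the citations above discuss, assuming the posterior is exactly a bivariate normal. The correlation of 0.9 is a hypothetical value, not one from the talk. For a Gaussian target with precision matrix Λ, the KL(q‖p)-optimal factorized q has marginal variances 1/Λ_jj [Bishop 2006], which can be compared directly with the exact marginal variances:

```python
import numpy as np

# Hypothetical "posterior": zero-mean bivariate normal with strong correlation.
rho = 0.9
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])      # exact posterior covariance
Lam = np.linalg.inv(Sigma)          # posterior precision

# For a Gaussian target, the mean-field optimum q*(theta) = prod_j q(theta_j)
# is Gaussian with variance 1 / Lam[j, j] in each coordinate.
mfvb_var = 1.0 / np.diag(Lam)
exact_var = np.diag(Sigma)

print("exact marginal variances:", exact_var)
print("MFVB marginal variances: ", mfvb_var)
print("ratio MFVB / exact:      ", mfvb_var / exact_var)
```

In this toy case the MFVB variance equals 1 − ρ², so the stronger the posterior correlation, the more the factorized approximation shrinks the marginal variances.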
$C(t) := \log E\, e^{t^T \theta}$, mean $= \frac{d}{dt} C(t)$
$\Sigma := \frac{d^2}{dt^T dt} C_{p(\cdot \mid x)}(t)$
$V := \frac{d^2}{dt^T dt} C_{q^*}(t)$
$\log p_t(\theta) := \log p(\theta \mid x) + t^T \theta - C(t)$, MFVB $q_t^*$
$\Sigma = \frac{d}{dt^T} \frac{d}{dt} C_{p(\cdot \mid x)}(t) = \frac{d}{dt^T} E_{p_t} \theta \approx \frac{d}{dt^T} E_{q_t^*} \theta =: \hat{\Sigma}$
[Bishop 2006] [see also Opper, Winther 2003]
[Figure: $p(\theta \mid x)$ and $q^*(\theta)$]
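The identity $\Sigma = \frac{d}{dt^T} E_{p_t}\theta$ follows from standard properties of the cumulant-generating function. A short derivation, assuming derivatives and expectations can be interchanged and that the final expression is evaluated at $t = 0$ (the slide leaves the evaluation point implicit):

```latex
\begin{align*}
p_t(\theta) &= p(\theta \mid x)\, e^{t^T \theta - C(t)},
  \qquad C(t) = \log E_{p(\cdot \mid x)}\, e^{t^T \theta} \\
\frac{d}{dt} C(t)
  &= \frac{E_{p(\cdot \mid x)}\big[\theta\, e^{t^T \theta}\big]}
          {E_{p(\cdot \mid x)}\big[e^{t^T \theta}\big]}
   = E_{p_t} \theta \\
\frac{d}{dt^T} E_{p_t} \theta
  &= \frac{d^2}{dt^T dt} C(t)
   = \mathrm{Cov}_{p_t}(\theta) \\
\Sigma = \mathrm{Cov}_{p(\cdot \mid x)}(\theta)
  &= \left. \frac{d}{dt^T} E_{p_t} \theta \right|_{t = 0}
\end{align*}
```

The approximation step then replaces the tilted posterior $p_t$ with its MFVB approximation $q_t^*$ inside the expectation.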
$\hat{\Sigma} := \frac{d}{dt^T} E_{q_t^*} \theta = (I - VH)^{-1} V$
$\hat{\Sigma} = \left( \frac{\partial^2 \mathrm{KL}}{\partial m \, \partial m^T} \right)^{-1}$
$q_t$, $m_t$
exact mean (e.g. multivariate normal)
[Bishop 2006]
[Figure: $p(\theta \mid x)$ and $q^*(\theta)$]
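A minimal numerical sketch of the formula $\hat{\Sigma} = (I - VH)^{-1} V$ in the multivariate normal case, where MFVB gets the mean exactly. This is an illustrative simplification, not the authors' implementation: it keeps only the block of mean parameters $m = E_q[\theta]$, which is enough here because, for a Gaussian target, the Hessian blocks involving second moments vanish. The covariance matrix is a hypothetical example.

```python
import numpy as np

# Hypothetical Gaussian "posterior" covariance; values chosen for illustration.
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.0, 0.4],
                  [0.3, 0.4, 1.5]])
Lam = np.linalg.inv(Sigma)                 # posterior precision
D = np.diag(np.diag(Lam))                  # diagonal part of the precision

# MFVB covariance of theta under q*: diagonal, with entries 1 / Lam[j, j].
V = np.linalg.inv(D)

# Hessian of E_q[log p(theta | x)] with respect to the means m = E_q[theta],
# holding the second moments E_q[theta_j^2] fixed: only the off-diagonal
# precision terms couple different means.
H = -(Lam - D)

I = np.eye(Sigma.shape[0])
Sigma_hat = np.linalg.solve(I - V @ H, V)  # (I - V H)^{-1} V

# Equivalent form: inverse Hessian of the KL objective in m, i.e. V^{-1} - H.
Sigma_hat_kl = np.linalg.inv(np.linalg.inv(V) - H)

print("max |Sigma_hat - Sigma| =", np.abs(Sigma_hat - Sigma).max())
```

Under these assumptions the corrected covariance matches the exact posterior covariance, while the uncorrected MFVB variances on the diagonal of `V` are smaller than the true marginal variances.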
… Morocco, Philippines, Ethiopia)
$y_{kn} \overset{\text{indep}}{\sim} N(\mu_k + T_{kn} \tau_k, \sigma_k^2)$, where $y_{kn}$ is profit and $T_{kn} = 1$ if microcredit
$\begin{pmatrix} \mu_k \\ \tau_k \end{pmatrix} \overset{\text{iid}}{\sim} N\left( \begin{pmatrix} \mu \\ \tau \end{pmatrix}, C \right)$
$\begin{pmatrix} \mu \\ \tau \end{pmatrix} \overset{\text{iid}}{\sim} N\left( \begin{pmatrix} \mu_0 \\ \tau_0 \end{pmatrix}, \Lambda^{-1} \right)$
$\sigma_k^{-2} \overset{\text{iid}}{\sim} \Gamma(a, b)$
$C \sim \text{Sep\&LKJ}(\eta, c, d)$
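To make the hierarchy concrete, here is a minimal sketch that simulates data from the model above. All numeric values are hypothetical placeholders, $\Gamma(a, b)$ is assumed to be parameterized by shape and rate, and $C$ is fixed to a stand-in matrix rather than drawn from its Sep&LKJ prior. The number of sites follows the "7 randomised experiments" in the title of the Meager reference.

```python
import numpy as np

rng = np.random.default_rng(0)

K, N = 7, 200                      # K sites; N households per site (hypothetical)
mu0_tau0 = np.zeros(2)             # hypothetical prior mean (mu_0, tau_0)
Lambda = np.eye(2)                 # hypothetical prior precision
C = np.array([[4.0, 1.0],
              [1.0, 2.0]])         # fixed stand-in for a draw from the prior on C
a, b = 2.0, 2.0                    # hypothetical Gamma shape and rate

# (mu, tau) ~ N((mu_0, tau_0), Lambda^{-1})
mu_tau = rng.multivariate_normal(mu0_tau0, np.linalg.inv(Lambda))
# (mu_k, tau_k) ~ iid N((mu, tau), C)
mu_tau_k = rng.multivariate_normal(mu_tau, C, size=K)
# sigma_k^{-2} ~ iid Gamma(a, b); NumPy parameterizes by scale = 1 / rate
prec_k = rng.gamma(shape=a, scale=1.0 / b, size=K)
sigma_k = 1.0 / np.sqrt(prec_k)

# T_kn = 1 if household n at site k has microcredit, else 0
T = rng.integers(0, 2, size=(K, N))
# y_kn ~ N(mu_k + T_kn * tau_k, sigma_k^2)
mean = mu_tau_k[:, [0]] + T * mu_tau_k[:, [1]]
y = rng.normal(mean, sigma_k[:, None])

print(y.shape)
```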
MCMC draws: 45 minutes
uncertainties, all sensitivity measures: 58 seconds
3.08 USD PPP
1.83 USD PPP
dev from 0
[Figure legend: LRVB, MFVB]
10,000 data points each, R bayesm package (function rnmixGibbs; at least 500 effective samples)
$P(z_{nk} = 1) = \pi_k$, $p(x \mid \pi, \mu, \Lambda, z) = \prod_{n=1}^{N} \prod_{k=1}^{K} N(x_n \mid \mu_k, \Lambda_k^{-1})^{z_{nk}}$
$p(\pi) \propto 1$, $p(\mu) \propto 1$, $p(\Lambda) \propto 1$
with conjugate priors on $\pi$, $\mu$, $\Lambda$
[Figure legend: LRVB, MFVB]
model
MCMCglmm package (20,000 samples)
$z_n \mid \beta, \tau \overset{\text{indep}}{\sim} N(\ldots)$, $y_n \mid z_n \overset{\text{indep}}{\sim} \text{Poisson}(y_n \mid \exp(z_n))$,
$\beta \sim N(\beta \mid 0, \sigma_\beta^2)$, $\tau \sim \text{Gamma}(\tau \mid \alpha_\tau, \beta_\tau)$
[Figure legend: LRVB, MFVB]
$\theta = (\alpha^T, z^T)^T$
$\hat{\Sigma} = (I - VH)^{-1} V$
$H = \begin{pmatrix} H_\alpha & H_{\alpha z} \\ H_{z\alpha} & H_z \end{pmatrix}$
$\hat{\Sigma}_\alpha = (I_\alpha - V_\alpha H_\alpha - V_\alpha H_{\alpha z} \ldots)^{-1} V_\alpha$
[Figure: $V$, $H$, $I - VH$]
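The expression for $\hat{\Sigma}_\alpha$ is cut off in this transcript, so the following is only a sketch of one standard way to obtain the $\alpha$ block of $(I - VH)^{-1} V$ without inverting the full matrix: a Schur complement over the $z$ block. It assumes $V$ is block diagonal, as it would be if $q^*$ factorizes between $\alpha$ and $z$. The dimensions and random matrices are hypothetical, and the closed form in the code is not claimed to be the exact expression on the slide.

```python
import numpy as np

rng = np.random.default_rng(1)
d_a, d_z = 2, 5                      # hypothetical sizes of alpha and z

def random_spd(d):
    """Random symmetric positive definite matrix."""
    A = rng.normal(size=(d, d))
    return A @ A.T + d * np.eye(d)

# V is block diagonal, assuming q* factorizes between alpha and z.
V_a, V_z = random_spd(d_a), random_spd(d_z)
V = np.block([[V_a, np.zeros((d_a, d_z))],
              [np.zeros((d_z, d_a)), V_z]])

# Symmetric negative definite H, so that I - V H is invertible.
H = -0.01 * random_spd(d_a + d_z)
H_a, H_az = H[:d_a, :d_a], H[:d_a, d_a:]
H_za, H_z = H[d_a:, :d_a], H[d_a:, d_a:]

# Full computation: Sigma_hat = (I - V H)^{-1} V, then read off the alpha block.
Sigma_hat = np.linalg.solve(np.eye(d_a + d_z) - V @ H, V)
full_block = Sigma_hat[:d_a, :d_a]

# Blockwise computation via the Schur complement of the z block.
inner = np.linalg.solve(np.eye(d_z) - V_z @ H_z, V_z @ H_za)
schur = np.eye(d_a) - V_a @ H_a - V_a @ H_az @ inner
block = np.linalg.solve(schur, V_a)

print("max difference:", np.abs(block - full_block).max())
```

The blockwise route only requires solving systems of size `d_z` and `d_a` separately, which matters when the $z$ block is large and has exploitable structure.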
$p_\alpha(\theta) := p(\theta \mid x, \alpha) \propto_\theta p(x \mid \theta)\, p(\theta \mid \alpha)$ (Bayes Theorem)
Some reasonable priors
$S := \frac{d E_{p_\alpha}[g(\theta)]}{d\alpha} \Delta\alpha \approx \frac{d E_{q_\alpha^*}[g(\theta)]}{d\alpha} \Delta\alpha =: \hat{S}$ (LRVB estimator)
$\hat{S} = A \left( \frac{\partial^2 \mathrm{KL}}{\partial m \, \partial m^T} \right)^{-1} B$
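A minimal sketch of the local sensitivity $S = \frac{d E_{p_\alpha}[g(\theta)]}{d\alpha} \Delta\alpha$ in a toy conjugate model where the posterior expectation is available in closed form, so no variational approximation is needed. The model, data, and all numeric values are hypothetical: $x_i \sim N(\theta, \sigma^2)$ with prior $\theta \sim N(\alpha, 1/\tau_0)$, $g(\theta) = \theta$, and the prior mean $\alpha$ playing the role of the perturbed hyperparameter.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, tau0 = 4.0, 0.5            # hypothetical noise variance and prior precision
x = rng.normal(1.0, np.sqrt(sigma2), size=50)   # synthetic data

def posterior_mean(alpha):
    """E[theta | x, alpha] for x_i ~ N(theta, sigma2), theta ~ N(alpha, 1 / tau0)."""
    n = x.size
    return (tau0 * alpha + x.sum() / sigma2) / (tau0 + n / sigma2)

alpha, delta_alpha = 0.0, 0.1      # hypothetical prior mean and perturbation

# Local sensitivity: derivative of the posterior expectation in alpha,
# multiplied by the size of the perturbation.
dE_dalpha = tau0 / (tau0 + x.size / sigma2)
S = dE_dalpha * delta_alpha

# Brute-force check: recompute the posterior mean under the perturbed prior.
refit_change = posterior_mean(alpha + delta_alpha) - posterior_mean(alpha)

print("S =", S, " refit change =", refit_change)
```

Here the posterior mean is linear in $\alpha$, so the linear approximation is exact. In a non-conjugate model the derivative is not available in closed form, which is where the estimator $\hat{S}$, built from $q_\alpha^*$, comes in.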
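Some reasonable priors
… Morocco, Philippines, Ethiopia)
$y_{kn} \overset{\text{indep}}{\sim} N(\mu_k + T_{kn} \tau_k, \sigma_k^2)$, where $y_{kn}$ is profit and $T_{kn} = 1$ if microcredit
$\begin{pmatrix} \mu_k \\ \tau_k \end{pmatrix} \overset{\text{iid}}{\sim} N\left( \begin{pmatrix} \mu \\ \tau \end{pmatrix}, C \right)$
$\begin{pmatrix} \mu \\ \tau \end{pmatrix} \overset{\text{iid}}{\sim} N\left( \begin{pmatrix} \mu_0 \\ \tau_0 \end{pmatrix}, \Lambda^{-1} \right)$
$\sigma_k^{-2} \overset{\text{iid}}{\sim} \Gamma(a, b)$
$C \sim \text{Sep\&LKJ}(\eta, c, d)$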
0.03 ➔ 0.04
Sensitivity
[Figure legend: MFVB, LRVB]
expected microcredit effect (τ)
scale of τ std devs
3.08 USD PPP
1.83 USD PPP
dev from 0
⇒ Mean > 2 std dev
MFVB for fast & accurate covariance estimate
[Huggins, Campbell, Broderick 2016; Huggins, Adams, Broderick, submitted]
T Broderick, N Boyd, A Wibisono, AC Wilson, and MI Jordan. Streaming variational …
T Campbell*, JH Huggins*, J How, and T Broderick. Truncated random measures.
R Giordano, T Broderick, and MI Jordan. Linear response methods for accurate covariance estimates from mean field variational Bayes. NIPS, 2015.
R Giordano, T Broderick, R Meager, JH Huggins, and MI Jordan. Fast robustness quantification with variational Bayes. ICML Workshop on #Data4Good: Machine Learning in Social Good Applications, 2016. ArXiv:1606.07153.
JH Huggins, T Campbell, and T Broderick. Core sets for scalable Bayesian logistic …
R Meager. Understanding the impact of microcredit expansions: A Bayesian hierarchical analysis of 7 randomised experiments. ArXiv:1506.06669, 2016.
R Bardenet, A Doucet, and C Holmes. On Markov chain Monte Carlo methods for tall data. arXiv, 2015.
CM Bishop. Pattern Recognition and Machine Learning, 2006.
D Dunson. Robust and scalable approach to Bayesian inference. Talk at ISBA 2014.
B Fosdick. Modeling Heterogeneity within and between Matrices and Arrays, Chapter 4.7. PhD Thesis, University of Washington, 2013.
DJC MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
M Opper and O Winther. Variational linear response. NIPS 2003.
RE Turner and M Sahani. Two problems with variational expectation maximisation for time-series models. In D Barber, AT Cemgil, and S Chiappa, editors, Bayesian Time Series Models, 2011.
B Wang and M Titterington. Inadequacy of interval estimates corresponding to variational Bayesian approximations. In AISTATS, 2004.