VCMC: Variational Consensus Monte Carlo
Maxim Rabinovich, Elaine Angelino, Michael I. Jordan
Berkeley Vision and Learning Center, September 22, 2015


SLIDE 1

VCMC: Variational Consensus Monte Carlo

Maxim Rabinovich, Elaine Angelino, Michael I. Jordan Berkeley Vision and Learning Center September 22, 2015

SLIDE 2

Probabilistic models!

◮ Object tracking & recognition (e.g., labeling a scene: sky, fog, bridge, grass, water)
◮ Small molecule discovery
◮ Genomics & phylogenetics
◮ Personalized recommendations

SLIDE 3

Outline

◮ Bayesian inference and Markov chain Monte Carlo
◮ MCMC is hard → New data-parallel algorithms
◮ VCMC: Our approach and theoretical results
◮ Empirical evaluation

SLIDE 4

Bayesian models encode uncertainty using probabilities

A model is a probabilistic description of data:

    y_i ∼ N(αx_i + β, σ²)

A probability distribution over model parameters:

    π(α, β, σ | x, y)

SLIDE 5

Bayesian inference uses Bayes' rule

    π(θ | x)  ∝  π(θ) · π(x | θ)
    posterior     prior   likelihood

◮ Model parameters: θ = (α, β, σ)
◮ Data: x = {(x_1, y_1), (x_2, y_2), …, (x_10, y_10)}
◮ Probabilistic model of data: y_i ∼ N(αx_i + β, σ²)

SLIDE 6

In general, posterior distributions are difficult to work with

Normalizing involves an integral that is often intractable:

    π(θ | x) = π(θ)π(x | θ) / ∫_Θ π(θ)π(x | θ) dθ
SLIDE 7

In general, posterior distributions are difficult to work with

Normalizing involves an integral that is often intractable:

    π(θ | x) = π(θ)π(x | θ) / ∫_Θ π(θ)π(x | θ) dθ

Expectations w.r.t. the posterior = more intractable integrals:

    E_π[f] = ∫_Θ f(θ) π(θ | x) dθ

(These are statistics that distill information about the posterior.)

SLIDE 8

Solution: Monte Carlo integration

Given a finite set of samples θ_1, θ_2, …, θ_T ∼ π(θ | x)

SLIDE 9

Solution: Monte Carlo integration

Given a finite set of samples θ_1, θ_2, …, θ_T ∼ π(θ | x), estimate an intractable expectation as a sum:

    E_π[f] = ∫_Θ f(θ) π(θ | x) dθ ≈ (1/T) Σ_{t=1}^{T} f(θ_t)

SLIDE 10

Solution: Monte Carlo integration

Given a finite set of samples θ_1, θ_2, …, θ_T ∼ π(θ | x), estimate an intractable expectation as a sum:

    E_π[f] = ∫_Θ f(θ) π(θ | x) dθ ≈ (1/T) Σ_{t=1}^{T} f(θ_t)

i.e., replace a distribution with samples from it.
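The estimator on this slide is easy to try out. The sketch below is ours, not from the talk: it stands in for MCMC output with i.i.d. draws from a known distribution, so the true expectation is available for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for MCMC output: T i.i.d. samples from a known "posterior";
# we use N(0, 1) so the true value of the expectation is known.
T = 100_000
samples = rng.standard_normal(T)

def f(theta):
    return theta ** 2  # example test function; E[f] = 1 under N(0, 1)

# Monte Carlo estimate: E_pi[f] ≈ (1/T) * sum_t f(theta_t)
estimate = f(samples).mean()
print(estimate)  # close to 1.0, the second moment of N(0, 1)
```

With 100,000 samples the Monte Carlo error here is on the order of 0.005; real MCMC samples are correlated, so the effective accuracy is lower for the same T.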

SLIDE 11

Markov chain Monte Carlo (MCMC)

◮ Widely used class of sampling algorithms
◮ Sample by simulating a Markov chain (a biased random walk) whose stationary distribution (after convergence) is the posterior:

    θ_1, θ_2, …, θ_T ∼ π(θ | x)

◮ Use the samples for Monte Carlo integration:

    E_π[f] = ∫_Θ f(θ) π(θ | x) dθ ≈ (1/T) Σ_{t=1}^{T} f(θ_t)
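As an illustration of "simulating a Markov chain whose stationary distribution is the posterior", here is a minimal random-walk Metropolis sketch. This is our toy example, not the samplers used in the paper; the target `log_posterior` and the step size are placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_posterior(theta):
    # Unnormalized log posterior; a standard normal, for illustration only.
    return -0.5 * theta ** 2

def random_walk_metropolis(log_post, theta0, T, step=1.0):
    """Simulate a Markov chain whose stationary distribution is exp(log_post)."""
    theta = theta0
    samples = np.empty(T)
    for t in range(T):
        proposal = theta + step * rng.standard_normal()
        # Accept with probability min(1, pi(proposal) / pi(theta)).
        if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
            theta = proposal
        samples[t] = theta  # on rejection, the current state is repeated
    return samples

samples = random_walk_metropolis(log_posterior, 0.0, 50_000)
print(samples.mean(), samples.var())  # sample mean ≈ 0, variance ≈ 1
```

Note the serial structure: each iteration depends on the previous state and (in a real model) on the full dataset, which is exactly the bottleneck the next slides address.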

SLIDE 12

Outline

◮ Bayesian inference and Markov chain Monte Carlo
◮ MCMC is hard → New data-parallel algorithms
◮ VCMC: Our approach and theoretical results
◮ Empirical evaluation

SLIDE 13

Traditional MCMC

◮ Serial, iterative algorithm for generating samples
◮ Slow for two reasons:
   (1) Large number of iterations required to converge
   (2) Each iteration depends on the entire dataset
◮ Most innovation in MCMC has targeted (1)
◮ Recent threads of work target (2)

SLIDE 14

Serial MCMC

[Diagram: Data → single core → samples]

SLIDE 15

Data-parallel MCMC

[Diagram: Data, partitioned across parallel cores, each producing "samples"]

SLIDE 16

Aggregate samples from across partitions — but how?

[Diagram: Data → parallel cores → "samples" → Aggregate]

SLIDE 17

Factorization (⋆) motivates a data-parallel approach

    π(θ | x)  ∝  π(θ) · π(x | θ)  =  ∏_{j=1}^{J} [ π(θ)^{1/J} π(x^{(j)} | θ) ]
    posterior     prior   likelihood              sub-posterior
SLIDE 18

Factorization (⋆) motivates a data-parallel approach

    π(θ | x)  ∝  π(θ) · π(x | θ)  =  ∏_{j=1}^{J} [ π(θ)^{1/J} π(x^{(j)} | θ) ]
    posterior     prior   likelihood              sub-posterior

◮ Partition the data as x^{(1)}, …, x^{(J)} across J cores
◮ The jth core samples from a distribution proportional to the jth sub-posterior (a 'piece' of the full posterior)
◮ Aggregate the sub-posterior samples to form approximate full posterior samples
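The factorization (⋆) can be checked numerically on a toy model. The model, prior, and partition below are our illustrative choices: a normal prior, normal likelihood with known variance, and three equal data shards.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: theta ~ N(0, 1) prior, x_i | theta ~ N(theta, 1) likelihood.
x = rng.normal(1.5, 1.0, size=90)
J = 3
partitions = np.array_split(x, J)  # x^(1), ..., x^(J)

def log_prior(theta):
    return -0.5 * theta ** 2

def log_lik(theta, data):
    return -0.5 * np.sum((data - theta) ** 2)

def log_subposterior(theta, data_j):
    # pi(theta)^(1/J) * pi(x^(j) | theta), in log space.
    return log_prior(theta) / J + log_lik(theta, data_j)

theta = 0.7
full = log_prior(theta) + log_lik(theta, x)
factored = sum(log_subposterior(theta, d) for d in partitions)
print(np.isclose(full, factored))  # True: the sub-posteriors multiply to the posterior
```

The prior is raised to the power 1/J so that multiplying the J sub-posteriors recovers exactly one copy of the prior, not J copies.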

SLIDE 19

Aggregation strategies for sub-posterior samples

    π(θ | x)  ∝  π(θ) · π(x | θ)  =  ∏_{j=1}^{J} [ π(θ)^{1/J} π(x^{(j)} | θ) ]
    posterior     prior   likelihood              sub-posterior

◮ Sub-posterior density estimation (Neiswanger et al., UAI 2014)
◮ Weierstrass samplers (Wang & Dunson, 2013)
◮ Weighted averaging of sub-posterior samples:
   ◮ Consensus Monte Carlo (Scott et al., Bayes 250, 2013)
   ◮ Variational Consensus Monte Carlo (Rabinovich et al., NIPS 2015)

SLIDE 20

Aggregate 'horizontally' (⋆) across partitions

[Diagram: Data → parallel cores → "samples" → Aggregate]

SLIDE 21

Recall that samples are parameter vectors

[Diagram: each sample is a vector of parameter components]

SLIDE 22

Naïve aggregation = Average

    Aggregate(θ^(1), θ^(2)) = 0.5 · θ^(1) + 0.5 · θ^(2)

SLIDE 23

Less naïve aggregation = Weighted average

    Aggregate(θ^(1), θ^(2)) = 0.58 · θ^(1) + 0.42 · θ^(2)

SLIDE 24

Consensus Monte Carlo (Scott et al., 2013)

    Aggregate(θ^(1), θ^(2)) = W_1 θ^(1) + W_2 θ^(2)

◮ Weights are inverse covariance matrices
◮ Motivated by Gaussian assumptions
◮ Designed at Google for the MapReduce framework
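A sketch of this aggregation rule under the stated Gaussian motivation: weight each sub-posterior by its inverse sample covariance and combine samples index by index. The synthetic sample sets and all variable names below are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
J, T, d = 3, 5000, 2

# Stand-ins for sub-posterior sample sets (one per core), with different spreads.
sub_samples = [rng.normal(loc=j, scale=1.0 + j, size=(T, d)) for j in range(J)]

# CMC-style weights: the inverse sample covariance of each sub-posterior.
weights = [np.linalg.inv(np.cov(s, rowvar=False)) for s in sub_samples]
total = np.linalg.inv(sum(weights))  # normalizer (sum_j W_j)^(-1)

# Aggregate sample t as (sum_j W_j)^(-1) sum_j W_j theta_t^(j).
aggregated = np.stack([
    total @ sum(W @ s[t] for W, s in zip(weights, sub_samples))
    for t in range(T)
])
print(aggregated.shape)  # (5000, 2)
```

Because the weights are inverse covariances, tighter (more confident) sub-posteriors pull the aggregate more strongly, which is exact when every sub-posterior is Gaussian.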

SLIDE 25

Outline

◮ Bayesian inference and Markov chain Monte Carlo
◮ MCMC is hard → New data-parallel algorithms
◮ VCMC: Our approach and theoretical results
◮ Empirical evaluation

SLIDE 26

Variational Consensus Monte Carlo

Goal: Choose the aggregation function to best approximate the target distribution
Method: Convex optimization via variational Bayes

SLIDE 27

Variational Consensus Monte Carlo

Goal: Choose the aggregation function to best approximate the target distribution
Method: Convex optimization via variational Bayes

F = aggregation function; q_F = approximate distribution

    L(F)  =  E_{q_F}[log π(X, θ)]  +  H[q_F]
    objective     likelihood           entropy

SLIDE 28

Variational Consensus Monte Carlo

Goal: Choose the aggregation function to best approximate the target distribution
Method: Convex optimization via variational Bayes

F = aggregation function; q_F = approximate distribution

    L̃(F)  =  E_{q_F}[log π(X, θ)]  +  H̃[q_F]
    objective     likelihood           relaxed entropy

SLIDE 29

Variational Consensus Monte Carlo

Goal: Choose the aggregation function to best approximate the target distribution
Method: Convex optimization via variational Bayes

F = aggregation function; q_F = approximate distribution

    L̃(F)  =  E_{q_F}[log π(X, θ)]  +  H̃[q_F]
    objective     likelihood           relaxed entropy

No mean field assumption

SLIDE 30

Variational Consensus Monte Carlo

    Aggregate(θ^(1), θ^(2)) = W_1 θ^(1) + W_2 θ^(2)

◮ Optimize over weight matrices (⋆)
◮ Restrict to valid solutions when parameter vectors are constrained
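To make "optimize over the weights" concrete, here is a deliberately simplified toy of ours: a scalar weight chosen by grid search against a moment-matching objective. This is not the paper's method, which optimizes matrix-valued F against the variational Bayes bound; only the idea of tuning the aggregation weights carries over.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two 1-D sub-posterior sample sets (stand-ins for parallel chains).
s1 = rng.normal(0.5, 1.0, size=4000)
s2 = rng.normal(1.5, 2.0, size=4000)
reference = rng.normal(1.0, 0.8, size=4000)  # stand-in for the full posterior

def objective(w):
    # Toy objective: match the mean and variance of the aggregated samples
    # to the reference (the paper instead maximizes a variational bound).
    agg = w * s1 + (1 - w) * s2
    return (agg.mean() - reference.mean()) ** 2 + (agg.var() - reference.var()) ** 2

# Grid search over the scalar weight (the paper uses convex optimization over F).
grid = np.linspace(0.0, 1.0, 101)
best = min(grid, key=objective)
print(0.0 < best < 1.0)  # True: an interior, non-uniform weight is selected
```

Even in this toy, the optimized weight differs from the uniform 0.5, which is the gap between naïve averaging and optimized aggregation that VCMC exploits.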

SLIDE 31

Variational Consensus Monte Carlo

Theorem (Entropy relaxation)
Under mild structural assumptions, we can choose

    H̃[q_F] = c_0 + (1/K) Σ_{k=1}^{K} h_k(F),

with each h_k a concave function of F, such that

    H[q_F] ≥ H̃[q_F].

We therefore have L(F) ≥ L̃(F).

SLIDE 32

Variational Consensus Monte Carlo

Theorem (Concavity of the variational Bayes objective)
Under mild structural assumptions, the relaxed variational Bayes objective

    L̃(F) = E_{q_F}[log π(X, θ)] + H̃[q_F]

is concave in F.

SLIDE 33

Outline

◮ Bayesian inference and Markov chain Monte Carlo
◮ MCMC is hard → New data-parallel algorithms
◮ VCMC: Our approach and theoretical results
◮ Empirical evaluation

SLIDE 34

Empirical evaluation

◮ Compare 3 aggregation strategies:
   ◮ Uniform average
   ◮ Gaussian-motivated weighted average (CMC)
   ◮ Optimized weighted average (VCMC)
◮ For each algorithm A, report the approximation error of an expectation E_π[f], relative to serial MCMC:

    ε_A(f) = |E_A[f] − E_MCMC[f]| / |E_MCMC[f]|

◮ Preliminary speedup results
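The error metric used throughout the evaluation is straightforward to compute; below is a small helper of ours for the relative-error formula, with made-up example values.

```python
def relative_error(estimate, reference):
    """epsilon_A(f) = |E_A[f] - E_MCMC[f]| / |E_MCMC[f]|"""
    return abs(estimate - reference) / abs(reference)

# Hypothetical values: an aggregated estimate of 1.1 against a serial-MCMC
# reference of 1.0 gives a 10% relative error.
print(relative_error(1.1, 1.0))  # ≈ 0.1
```

Note the metric is undefined when the reference expectation is zero, which is one reason the plots truncate large relative errors.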

SLIDE 35

Example 1: High-dimensional Bayesian probit regression

#data = 100,000, d = 300
First moment estimation error, relative to serial MCMC (error truncated at 2.0)

SLIDE 36

Example 2: High-dimensional covariance estimation

Normal-inverse Wishart model
#data = 100,000, #dim = 100 ⟹ 5,050 parameters
(L) First moment estimation error; (R) eigenvalue estimation error

SLIDE 37

Example 3: Mixture of 8, 8-dim Gaussians

Error relative to serial MCMC, for cluster comembership probabilities of pairs of test data points

SLIDE 38

VCMC error decreases as the optimization runs longer

Initialize VCMC with CMC weights (inverse covariance matrices)

SLIDE 39

VCMC reduces CMC error at the cost of speedup (∼2x)

VCMC speedup is approximately linear

[Plot legend: CMC, VCMC]

SLIDE 40

Concluding thoughts

Contributions
◮ Convex optimization framework for Consensus Monte Carlo
◮ Structured aggregation accounting for constrained parameters
◮ Entropy relaxation
◮ Empirical evaluation

Future work
◮ More structured and complex (latent variable) models
◮ Alternate posterior factorizations and aggregation schemes

We'd love to hear about your Bayesian inference problems!

SLIDE 41

SLIDE 42

Example 1: High-dimensional Bayesian probit regression

[Figure: four panels of moment approximation error vs. number of cores (5, 10, 25, 50, 100), for first and mixed second moments, comparing Uniform, Gaussian, and VCMC on subposteriors and partial posteriors]

Figure: High-dimensional probit regression (d = 300). Moment approximation error for the uniform and Gaussian averaging baselines and VCMC, relative to serial MCMC, for (left) subposteriors and (right) partial posteriors. We assessed three groups of functions: first moments, with f(β) = β_j for 1 ≤ j ≤ d; pure second moments, with f(β) = β_j² for 1 ≤ j ≤ d; and mixed second moments, with f(β) = β_i β_j for 1 ≤ i < j ≤ d. For brevity, results for pure second moments are relegated to the supplement. Relative errors are truncated at 2 in all cases.

SLIDE 43

Example 2: High-dimensional covariance estimation

[Figure: moment approximation error vs. number of cores (25, 50, 100), with panels for first, pure second, and mixed second moments, comparing Uniform, Gaussian, and VCMC]

Figure: High-dimensional normal-inverse Wishart model (d = 100). (Far left, left, right) Moment approximation error for the uniform and Gaussian averaging baselines and VCMC, relative to serial MCMC. Letting ρ_j denote the jth largest eigenvalue of Λ⁻¹, we assessed three groups of functions: first moments, with f(Λ) = ρ_j for 1 ≤ j ≤ d; pure second moments, with f(Λ) = ρ_j² for 1 ≤ j ≤ d; and mixed second moments, with f(Λ) = ρ_i ρ_j for 1 ≤ i < j ≤ d. (Far right) Graph of the error in estimating E[ρ_j] as a function of j (where ρ_1 ≥ ρ_2 ≥ ⋯ ≥ ρ_d).

SLIDE 44

Example 3: Mixture of 8, 8-dim Gaussians

Figure: Mixture of Gaussians (d = 8, L = 8). Expectation approximation error for the uniform and Gaussian baselines and VCMC. We report the median error, relative to serial MCMC, for cluster comembership probabilities of pairs of test data points, for (left) σ = 1 and (right) σ = 2, where we run the VCMC optimization procedure for 50 and 200 iterations, respectively. When σ = 2, some comembership probabilities are estimated poorly by all methods; we therefore only use the 70% of comembership probabilities with the smallest errors across all the methods.

SLIDE 45

Computational efficiency

Figure: Error versus timing and speedup measurements. (Left) VCMC error as a function of the number of seconds of optimization. The cost of optimization is nonnegligible, but still moderate compared to serial MCMC, particularly since our optimization scheme only needs small batches of samples and can therefore operate concurrently with the sampler. (Right) Error versus speedup relative to serial MCMC, for both CMC with Gaussian averaging (small markers) and VCMC (large markers). In this case, the cost of optimization is small enough that a near-linear speedup is achieved.