

  1. Divide and Couple: Using Monte Carlo Variational Objectives for Posterior Approximation. Justin Domke and Daniel Sheldon, University of Massachusetts Amherst. Overview: Variational inference gives both a lower bound on the log-likelihood and an approximate posterior. It is easy to get other lower bounds. Do they also give approximate posteriors? This work: a general theory connecting likelihood bounds to posterior approximations.

  2. [Figure: p(z, x) plotted over z.] Take p(z, x) with x fixed.

  3. Observation: If E R = p(x), then E log R ≤ log p(x) (by Jensen's inequality).

  4. Example: Take R = p(x, z) / q(z) for z ∼ q Gaussian; optimize q.

  5. [Figure: p(z, x) and q(z) (naive) over z; log R = 0.237.]

  6. Decomposition: KL(q(z) ‖ p(z | x)) = log p(x) − E log R. Likelihood bound: ✓ Posterior approximation: ✓
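To make slides 2–6 concrete, here is a minimal runnable sketch (ours, not the authors' code) on a toy model where log p(x) is known in closed form; the model z ∼ N(0, 1), x | z ∼ N(z, 1) and the particular q are illustrative choices:

```python
# Verify: E log R <= log p(x), with gap exactly KL(q(z) || p(z|x)).
# Toy model (our choice): z ~ N(0,1), x | z ~ N(z,1), observed x = 1.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = 1.0

def log_p_joint(z):                           # log p(z, x) = log p(z) + log p(x|z)
    return norm.logpdf(z, 0.0, 1.0) + norm.logpdf(x, z, 1.0)

log_p_x = norm.logpdf(x, 0.0, np.sqrt(2.0))   # exact: marginally x ~ N(0, 2)

# Naive estimator: R = p(z, x) / q(z) with z ~ q; q = N(0.3, 0.8^2) is arbitrary.
mu, sigma = 0.3, 0.8
z = rng.normal(mu, sigma, size=100_000)
log_R = log_p_joint(z) - norm.logpdf(z, mu, sigma)

print(f"E log R  = {log_R.mean():.4f}")
print(f"log p(x) = {log_p_x:.4f}")

# Closed-form KL between the Gaussian q and the exact posterior N(x/2, 1/2).
m, s2 = x / 2.0, 0.5
kl = np.log(np.sqrt(s2) / sigma) + (sigma**2 + (mu - m) ** 2) / (2 * s2) - 0.5
print(f"gap      = {log_p_x - log_R.mean():.4f}  vs  KL = {kl:.4f}")
```

On this model the printed gap matches the closed-form KL, illustrating the decomposition on slide 6.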

  7. Recent work: Better Monte Carlo estimators R.

  8. Antithetic sampling: Let T(z) “flip” z around the mean of q, and take R = [p(z, x) + p(T(z), x)] / (2 q(z)). [Figure: p(z, x) and q(z) (antithetic) over z; log R′ = 0.060.]

  9. Likelihood bound: ✓ Posterior approximation: ✗ ✗ ✗
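A sketch of the antithetic estimator from slide 8 (again ours, on the same toy model); because the Gaussian q is symmetric around its mean, q(T(z)) = q(z), so the estimator reduces to the pairing shown here:

```python
# Antithetic estimator: R' = [p(z,x) + p(T(z),x)] / (2 q(z)), T(z) = 2*mu - z.
# Toy model as above (our choice): z ~ N(0,1), x | z ~ N(z,1), x = 1.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x, mu, sigma = 1.0, 0.3, 0.8

def log_p_joint(z):
    return norm.logpdf(z, 0.0, 1.0) + norm.logpdf(x, z, 1.0)

z = rng.normal(mu, sigma, size=100_000)
T_z = 2 * mu - z                                  # "flip" z around the mean of q

log_q = norm.logpdf(z, mu, sigma)                 # q(T(z)) = q(z) by symmetry
log_R_naive = log_p_joint(z) - log_q
log_R_anti = np.logaddexp(log_p_joint(z), log_p_joint(T_z)) - np.log(2) - log_q

print(f"naive:      E log R  = {log_R_naive.mean():.4f}")
print(f"antithetic: E log R' = {log_R_anti.mean():.4f}")   # typically closer to log p(x)
```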

  10. This paper: Is some other distribution close to p?

  11. Contribution of this paper: Given an estimator with E R = p(x), we show how to construct Q(z) such that KL(Q(z) ‖ p(z | x)) ≤ log p(x) − E log R.

  12. [Figure: p(z, x) with q(z) (left) and with the constructed Q(z) (right), antithetic; log R′ = 0.060.]

  13. [Figure: the same pair for stratified sampling; log R′ = 0.063.]

  14. [Figure: the same pair for antithetic within strata; log R′ = 0.021.]

  15. How?

  16. Unbiased estimator: E_ω R(ω) = p(x). Where is z?

  17. We suggest: we need a coupling, E_ω[R(ω) a(z | ω)] = p(z, x), where a(z | ω) is the coupling.

  18. Then there exist augmented distributions such that KL(Q(z, ω) ‖ p(z, ω | x)) = log p(x) − E log R; a derivation sketch follows below.
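One way to make slide 18 concrete (our reconstruction from the identities on these slides, not notation taken verbatim from the paper): take Q(z, ω) = q(ω) a(z | ω) and define the augmented target p(z, ω | x) = q(ω) R(ω) a(z | ω) / p(x), which is normalized and has z-marginal exactly p(z | x) by the coupling condition on slide 17. Then:

```latex
% Our reconstruction: the augmented KL collapses to the bound gap,
% since the a(z|w) terms cancel and only log[p(x)/R(w)] remains.
\begin{align*}
\mathrm{KL}\bigl(Q(z,\omega)\,\big\|\,p(z,\omega \mid x)\bigr)
  &= \mathbb{E}_{q(\omega)\,a(z \mid \omega)}
     \log \frac{q(\omega)\,a(z \mid \omega)\,p(x)}
               {q(\omega)\,R(\omega)\,a(z \mid \omega)}
   = \log p(x) - \mathbb{E}_{\omega} \log R(\omega).
\end{align*}
```

Marginalizing out ω can only decrease the KL, which gives the inequality KL(Q(z) ‖ p(z | x)) ≤ log p(x) − E log R from slide 11.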

  19. Summary: Tightening the bound log p(x) − E log R is equivalent to VI in the augmented state space (ω, z). To sample from Q(z), draw ω, then z ∼ a(z | ω); a sketch for the antithetic case follows below. The paper gives couplings for:
  ◮ antithetic sampling
  ◮ stratified sampling
  ◮ quasi Monte Carlo
  ◮ Latin hypercube sampling
  ◮ arbitrary recursive combinations of the above
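As a usage example of “draw ω, then z ∼ a(z | ω)”, here is a sketch for the antithetic case. The specific coupling, placing mass on {ω, T(ω)} proportional to p(·, x), is our reconstruction rather than code from the paper; it satisfies E_ω[R(ω) a(z | ω)] = p(z, x) because T is a measure-preserving involution:

```python
# Sampling from Q(z) via a coupling for the antithetic estimator
# (our reconstruction). Toy model and Gaussian q as in the sketches above.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x, mu, sigma = 1.0, 0.3, 0.8

def log_p_joint(z):
    return norm.logpdf(z, 0.0, 1.0) + norm.logpdf(x, z, 1.0)

def sample_Q(n):
    w = rng.normal(mu, sigma, size=n)      # omega ~ q
    cand = np.stack([w, 2 * mu - w])       # the two coupled candidates {w, T(w)}
    lp = log_p_joint(cand)
    prob_flip = 1.0 / (1.0 + np.exp(lp[0] - lp[1]))   # a(T(w) | w), prop. to p(., x)
    return np.where(rng.random(n) < prob_flip, cand[1], cand[0])

z = sample_Q(100_000)
print(f"Q(z):      mean = {z.mean():.3f}, sd = {z.std():.3f}")
print(f"posterior: mean = {x/2:.3f}, sd = {np.sqrt(0.5):.3f}")
```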

  20. Implementation: Different sampling methods with a Gaussian q (a minimal sketch follows below).
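Slide 20's setup can be sketched as follows (ours, not the authors' implementation): fit a Gaussian q by maximizing a sample-average estimate of E log R with fixed base noise; swapping in the antithetic or stratified estimators of R changes only the objective:

```python
# Fit q = N(mu, sigma^2) by maximizing a fixed-noise estimate of E log R
# (sample average approximation; the toy model is the same as above).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = 1.0
eps = rng.normal(size=5_000)              # fixed base noise, z = mu + sigma * eps

def log_p_joint(z):
    return norm.logpdf(z, 0.0, 1.0) + norm.logpdf(x, z, 1.0)

def neg_bound(params):                    # -E log R for the naive estimator
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    z = mu + sigma * eps
    return -(log_p_joint(z) - norm.logpdf(z, mu, sigma)).mean()

res = minimize(neg_bound, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
mu_opt, sigma_opt = res.x[0], np.exp(res.x[1])
print(f"fitted q: N({mu_opt:.3f}, {sigma_opt:.3f}^2)  "
      f"vs exact posterior N({x/2:.3f}, {np.sqrt(0.5):.3f}^2)")
```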

  21. Experiments confirm: better likelihood bounds ⇔ better posteriors. Poster: Tue Dec 10th, 5:30-7:30pm @ East Exhibition Hall B + C, #166.
