SLIDE 1 Reparameterization Gradient for Non-differentiable Models
Wonyeol Lee Hangyeol Yu Hongseok Yang KAIST
Published at NeurIPS 2018
SLIDE 5
Background
SLIDE 6 Posterior inference
- Latent variable z ∈ ℝ^n.
- Observed variable x ∈ ℝ^m.
- Joint density p(x, z).
- Want to infer the posterior p(z|x0) given a particular value x0 of x.
SLIDES 7–10 Variational inference
- 1. Fix a family of variational distributions {qθ(z)}θ
  (each qθ(z): differentiable & easy-to-sample).
- 2. Find the qθ(z) that approximates p(z|x0) well.
- Typically, by solving
  argmaxθ ELBOθ where ELBOθ = E_{qθ(z)}[ log( p(x0,z) / qθ(z) ) ].
SLIDES 11–13 Gradient ascent
θn+1 = θn + η × ∇θ ELBOθ|θ=θn
- Difficult to compute ∇θ ELBOθ exactly.
- Use an estimated gradient instead.
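The update rule above can be sketched in a few lines. This is a toy illustration, not the paper's ELBO: the objective L(θ) = -0.5·(θ - 2)² is made up, and the gradient oracle returns a noisy but unbiased estimate of its true gradient (2 - θ).

```python
import random

def noisy_grad(theta, rng):
    # Unbiased, Monte Carlo-style noisy estimate of the true gradient (2 - theta).
    return (2.0 - theta) + rng.gauss(0.0, 0.1)

def gradient_ascent(theta0, eta, steps, seed=0):
    # theta_{n+1} = theta_n + eta * (estimated gradient)
    rng = random.Random(seed)
    theta = theta0
    for _ in range(steps):
        theta += eta * noisy_grad(theta, rng)
    return theta

print(gradient_ascent(theta0=-3.0, eta=0.1, steps=500))  # ends up near the optimum theta = 2
```

With small enough step size, the noise averages out and the iterates hover around the maximizer; this is the same scheme the talk applies to the ELBO, with the noisy oracle replaced by a gradient estimator.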
SLIDES 14–19 Reparameterization estimator
- Works if p(x0,z) is differentiable wrt. z.
- Need a distribution q(ε) & a smooth function fθ(ε) s.t.
  fθ(ε) for ε ~ q(ε) has the distribution qθ(z).
- Derived from the equation:
  ∇θ ELBOθ = ∇θ E_{qθ(z)}[ log( p(x0,z) / qθ(z) ) ]
           = ∇θ E_{q(ε)}[ log( p(x0,fθ(ε)) / qθ(fθ(ε)) ) ]
           = E_{q(ε)}[ ∇θ log( p(x0,fθ(ε)) / qθ(fθ(ε)) ) ].
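A minimal sketch of the estimator, with an illustrative choice not taken from the slides: qθ(z) = N(θ, 1), reparameterized as fθ(ε) = θ + ε with ε ~ N(0, 1), and test function z² in place of the ELBO integrand. Then E_{qθ}[z²] = θ² + 1, so the true gradient is 2θ, and the estimator averages ∇θ(θ + ε)² = 2(θ + ε) over samples of ε.

```python
import random

def reparam_grad_estimate(theta, num_samples=200_000, seed=0):
    # Monte Carlo estimate of d/dtheta E_{z ~ N(theta,1)}[z^2]
    # via the reparameterization z = theta + eps, eps ~ N(0,1).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)
        total += 2.0 * (theta + eps)  # gradient taken inside the expectation
    return total / num_samples

theta = 1.5
print(reparam_grad_estimate(theta))  # close to the analytic gradient 2*theta = 3.0
```

The key move is the same as on the slide: the θ-dependence is transferred from the distribution to the integrand, after which ∇θ can be pushed inside the expectation, provided the integrand is differentiable.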
SLIDE 20
Non-differentiable models from probabilistic programming
SLIDES 21–25
(let [z (sample (normal 0 1))]
  (if (> z 0)
    (observe (normal 3 1) 0)
    (observe (normal -2 1) 0))
  z)
SLIDES 26–29
(let [z (sample (normal 0 1))]
  (if (> z 0)
    (observe (normal 3 1) 0)
    (observe (normal -2 1) 0))
  z)
r1(z) = N(z|0,1) · N(x=0|3,1)
r2(z) = N(z|0,1) · N(x=0|-2,1)
p(z, x=0) = [z>0]·r1(z) + [z≤0]·r2(z)
SLIDES 30–37
Model:
(let [z (sample (normal 0 1))]
  (if (> z 0)
    (observe (normal 3 1) 0)
    (observe (normal -2 1) 0))
  z)
≈
Variational guide:
(let [ε (sample (normal 0 1))
      z (+ ε θ)]
  z)
q(ε) = N(ε|0,1), z = ε + θ
How to find a good θ? By gradient ascent on ELBOθ:
θn+1 ← θn + η × ∇θ ELBOθ|θ=θn
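The joint density p(z, x=0) defined above can be evaluated directly. A minimal Python sketch in log-space, with the `if` mirroring the program's branch:

```python
import math

def log_normal_pdf(v, mean, std):
    # log of the Normal(mean, std) density at v
    return -0.5 * ((v - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

def log_joint(z):
    # p(z, x=0) = [z>0]*r1(z) + [z<=0]*r2(z), with
    # r1(z) = N(z|0,1)*N(0|3,1) and r2(z) = N(z|0,1)*N(0|-2,1)
    prior = log_normal_pdf(z, 0.0, 1.0)
    if z > 0:
        return prior + log_normal_pdf(0.0, 3.0, 1.0)   # branch: (observe (normal 3 1) 0)
    else:
        return prior + log_normal_pdf(0.0, -2.0, 1.0)  # branch: (observe (normal -2 1) 0)

# The two branches carry different likelihoods, so log_joint jumps at z = 0.
print(log_joint(0.5), log_joint(-0.5))
```

The jump at z = 0 is log N(0|3,1) - log N(0|-2,1) = -4.5 + 2 = -2.5; this discontinuity is exactly what makes the model non-differentiable.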
SLIDES 38–46
How to find a good θ? By gradient ascent on ELBOθ.
∇θ ELBOθ = ∇θ E_{q(ε)}[ [ε>-θ]·log r1(ε+θ) + [ε≤-θ]·log r2(ε+θ) ]
         = E_{q(ε)}[ [ε>-θ]·∇θ log r1(ε+θ) + [ε≤-θ]·∇θ log r2(ε+θ) ]
         = E_{q(ε)}[ -θ - ε ] = -θ
(the second step wishfully exchanges ∇θ and E; this turns out to be invalid here).
SLIDE 47
The correct version keeps an extra term:
∇θ ELBOθ = E_{q(ε)}[ [ε>-θ]·∇θ log r1(ε+θ) + [ε≤-θ]·∇θ log r2(ε+θ) ] + Correction Term
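For this one-dimensional model the correction term has a closed form; the calculation below is ours, derived from the densities on the earlier slides, not code from the paper. The branch boundary z = 0 sits at ε = -θ, and differentiating the two moving half-line integrals (Leibniz rule) leaves a boundary contribution (log r1(0) - log r2(0))·q(-θ) = -2.5·N(θ|0,1). So the naive estimator converges to -θ, while the true gradient is -θ - 2.5·N(θ|0,1):

```python
import math
import random

def normal_pdf(v, mean=0.0, std=1.0):
    return math.exp(-0.5 * ((v - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def naive_grad_estimate(theta, n=200_000, seed=0):
    # E_{q(eps)}[ grad_theta log r_i(eps+theta) ]: in both branches the observe
    # factor is a constant, so the derivative is -(eps + theta) either way.
    rng = random.Random(seed)
    return sum(-(rng.gauss(0.0, 1.0) + theta) for _ in range(n)) / n

def corrected_grad(theta, n=200_000, seed=0):
    # Naive term plus the boundary (surface-integral) correction; for a single
    # boundary point in 1D the "surface integral" is one evaluation at eps = -theta:
    # (log r1(0) - log r2(0)) * q(-theta).
    jump = (-0.5 * 3.0**2) - (-0.5 * 2.0**2)  # log N(0|3,1) - log N(0|-2,1) = -2.5
    return naive_grad_estimate(theta, n, seed) + jump * normal_pdf(-theta)

theta = 0.5
print(naive_grad_estimate(theta))               # near -theta = -0.5 (biased)
print(corrected_grad(theta))                    # near the true gradient below
print(-theta - 2.5 * normal_pdf(theta))         # true gradient in closed form
```

Note the correction pulls the gradient downward near θ = 0, reflecting that moving mass across z = 0 trades the cheap branch for the expensive one.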
SLIDES 48–50 Why doesn’t it work?
- Be careful when exchanging gradient and integration.
- The exchange may fail unexpectedly.
- It may also hold unexpectedly, but only with a correction term.
SLIDE 51
Our results formally
SLIDES 52–55 Non-differentiable models
- Away from a set of boundary points, the density is differentiable.
- The set of non-differentiable points has Lebesgue measure zero.
SLIDES 56–63
Wishful thinking
SLIDES 74–75 Correction
- A surface integral over the boundary of the regions.
- Accounts for the impact of moving the boundaries.
- Can be estimated by manifold sampling.
SLIDES 76–78 Two ingredients
- Differentiation under a moving domain (Leibniz-type rule).
- Divergence theorem.
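The two ingredients are standard results. In textbook form (the paper adapts them to the ELBO setting; the 1-D Leibniz rule shown here is the special case used in the running example) they read:

```latex
% Differentiation under a moving domain (Leibniz rule, 1-D form):
\frac{d}{d\theta}\int_{a(\theta)}^{b(\theta)} g(\theta,\varepsilon)\,d\varepsilon
  = \int_{a(\theta)}^{b(\theta)} \partial_\theta g(\theta,\varepsilon)\,d\varepsilon
    + g(\theta, b(\theta))\, b'(\theta)
    - g(\theta, a(\theta))\, a'(\theta)

% Divergence theorem:
\int_{\Omega} (\nabla \cdot F)\, dV \;=\; \oint_{\partial\Omega} (F \cdot n)\, dS
```

The extra boundary terms of the Leibniz rule are what become the surface-integral correction; the divergence theorem converts the resulting region integrals into integrals over the region boundaries.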
SLIDES 79–82 Surface integral over the boundary
- The correction term requires manifold sampling; hard to estimate in general cases.
- Easy to estimate if each boundary is a hyperplane.
- So assume the branch condition of each if-statement is linear in z.
SLIDES 83–84 Subsampling
- The correction is a sum of surface integrals, one per boundary.
- For computational efficiency, we subsample the surface integrals.
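The subsampling step is generic: replace the sum of per-boundary surface integrals by a rescaled sum over a random subset, which stays unbiased. A sketch with made-up placeholder numbers standing in for the per-boundary integral values:

```python
import random

# Correction term = sum of one surface-integral value per if-boundary.
# These values are placeholders for illustration, not actual surface integrals.
boundary_terms = [0.7, -1.2, 0.3, 2.1, -0.4, 0.9]  # K = 6 boundaries (made up)

def subsampled_correction(terms, k, rng):
    # Unbiased estimate of sum(terms): sample k of the K terms
    # without replacement and rescale by K/k.
    chosen = rng.sample(terms, k)
    return (len(terms) / k) * sum(chosen)

rng = random.Random(0)
full = sum(boundary_terms)
est = sum(subsampled_correction(boundary_terms, 2, rng) for _ in range(50_000)) / 50_000
print(full, est)  # the subsampled estimator averages to the full sum
```

The trade-off is variance for compute: each step touches only k boundaries instead of all K, at the cost of a noisier correction estimate.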
SLIDE 85
Experiments
SLIDE 86 Implementation
- Implemented a black-box variational inference engine for a simple probabilistic programming language.
- Supports sample, observe, if, ...
- Written in Python, using the autograd package.
SLIDE 87 Benchmarks
textmsg
- Models the numbers of per-day SNS messages, where the SNS-usage pattern changes on some day.
- Non-differentiable part: the day of change in the SNS-usage pattern.
- Given the numbers of per-day SNS messages over 2 months, infer the day when the pattern changes.
SLIDE 88 Benchmarks
temperature
- Models the random dynamics of a controller that tries to keep room temperature stable.
- Non-differentiable part: the on/off status of the air conditioner, on which the evolution of the room temperature depends.
- Given noisy observations of the temperature at each step, infer the on/off status of the controller at each step.
SLIDE 89 ELBO
{dotted, dashed, solid} lines: {N = 1, N = 8, N = 16}
SLIDE 91
Computation time
SLIDES 92–94 High-level message
- Be careful when exchanging gradient and integration.
- The exchange may fail unexpectedly.
- It may also hold unexpectedly, but only with a correction term.
SLIDE 95
Any questions?