Discretely Relaxing Continuous Variables for tractable Variational Inference
Trefor W. Evans & Prasanth B. Nair, University of Toronto
Neural Information Processing Systems (NeurIPS), Montréal, Canada, December 2018
Evans & Nair, NeurIPS 2018, Discretely Relaxing Continuous Variables (DIRECT)

Motivation: The Need for Efficient Inference

Memory and energy efficiency are critical for mobile devices performing on-board inference, as well as for large-scale deployed models. We introduce a new technique to perform approximate Bayesian inference with discrete variables, which can dramatically reduce computational requirements.

Overview: Discretely Relaxing Continuous Variables (DIRECT)

Continuous priors are typically used for approximate Bayesian inference because they admit computationally tractable training strategies (e.g. the reparameterization trick). However, discrete priors offer many advantages at inference time, since posterior samples are sparse, low-precision quantized integers. We introduce a variational inference technique that enables extremely fast and efficient training with discrete priors.
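As a rough illustration of what "sparse and low-precision quantized integers" can look like in practice, the sketch below draws one sample from a mean-field discrete distribution over a 4-bit support. The support (the integers -8 to 7), the number of variables, and the probabilities are all hypothetical choices made for this example; they are not taken from the slides or from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-bit support: 16 integer values, including zero.
support = np.arange(-8, 8, dtype=np.int8)

# Hypothetical variational probabilities: 70% of the mass on zero,
# the remaining 30% spread evenly over the other 15 values.
probs = np.full(support.size, 0.3 / 15)
probs[support == 0] = 0.7

# One "posterior sample" for a model with 1000 variables (assumed size).
num_variables = 1000
sample = rng.choice(support, size=num_variables, p=probs)

# Fraction of exactly-zero entries: these need not be stored or multiplied.
sparsity = np.mean(sample == 0)
print(sample.dtype, sample[:10], f"sparsity = {sparsity:.1%}")
```

Because every entry is one of 16 integers, a sample of this kind can be stored in a low-precision integer type, and the zero entries can be skipped entirely at inference time.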

Overview: The Curse of Dimensionality

The size of the discretized hypothesis space increases exponentially with the number of variables in the model. The ELBO (which we maximize during training) requires evaluating the log-likelihood at every point in the hypothesis space, which quickly becomes computationally intractable. For this reason, training has historically resorted to high-variance stochastic gradient estimators.
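To make the exponential growth concrete, here is a small sketch that counts joint configurations. It assumes each variable independently takes one of `num_values` discrete values (16 for a 4-bit variable); the function names and the example sizes are illustrative and do not come from the slides.

```python
import math


def hypothesis_space_size(num_values: int, num_variables: int) -> int:
    """Number of joint configurations when each of `num_variables`
    discrete variables takes one of `num_values` values."""
    return num_values ** num_variables


def log10_hypothesis_space_size(num_values: int, num_variables: int) -> float:
    """Base-10 logarithm of the size, usable when the size itself is huge."""
    return num_variables * math.log10(num_values)


# Three 4-bit variables are still easy to enumerate.
small = hypothesis_space_size(16, 3)

# One hundred 4-bit variables already give more than 10^120 configurations.
large_exponent = log10_hypothesis_space_size(16, 100)

print(small, round(large_exponent, 1))
```

A naive ELBO evaluation would need one log-likelihood evaluation per configuration, so even the 100-variable case is far beyond direct enumeration.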

Overview: How we Compute the ELBO Exactly

Viewing the log-prior, log-likelihood, and variational distributions over the hypothesis space as tensors, we exploit the low-rank structure of these tensors to rewrite the ELBO in a compact form using Kronecker matrix algebra. The cost of evaluating the ELBO in this compact form is independent of the number of training points! This "DIRECT" approach is not practical for all likelihoods; however, we identify a couple that are practical.
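The kind of saving that Kronecker structure makes possible can be seen in a toy setting. The sketch below is not the authors' implementation; it only checks two standard identities for a mean-field distribution q written as a Kronecker product of per-variable probability vectors: an inner product with a rank-1 (Kronecker-product) tensor factorizes into a product of small inner products, and an inner product with a sum of per-variable terms reduces to a sum of small inner products. The sizes `b` and `m` and all variable names are assumptions chosen so the brute-force check stays tiny.

```python
from functools import reduce

import numpy as np

rng = np.random.default_rng(0)
b, m = 4, 3  # hypothetical: 4 variables, 3 support points each


def kron_all(vectors):
    """Kronecker product of a list of 1-D arrays (length m**b for b vectors)."""
    return reduce(np.kron, vectors)


# Mean-field factors q_i: one probability vector of length m per variable.
q_factors = [rng.dirichlet(np.ones(m)) for _ in range(b)]

# Case 1: a rank-1 tensor f = f_1 (x) ... (x) f_b.
f_factors = [rng.normal(size=m) for _ in range(b)]
brute_product = kron_all(q_factors) @ kron_all(f_factors)  # O(m**b)
fast_product = np.prod([q @ f for q, f in zip(q_factors, f_factors)])  # O(b*m)

# Case 2: an additive tensor, a sum over i of 1 (x) ... (x) l_i (x) ... (x) 1,
# such as the log of a prior that factorizes over the variables.
l_factors = [rng.normal(size=m) for _ in range(b)]
l_full = sum(
    kron_all([l_factors[j] if j == i else np.ones(m) for j in range(b)])
    for i in range(b)
)
brute_sum = kron_all(q_factors) @ l_full  # O(b * m**b)
# Each q_j sums to one, so every factor other than the i-th contributes 1.
fast_sum = sum(q @ l for q, l in zip(q_factors, l_factors))  # O(b*m)

print(np.isclose(brute_product, fast_product), np.isclose(brute_sum, fast_sum))
```

The brute-force versions materialize vectors of length m^b, while the structured versions only touch b vectors of length m, which is why expectations over an exponentially large hypothesis space can remain cheap when the tensors involved have low-rank Kronecker structure.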

Experiments

Training with DIRECT greatly outperforms REINFORCE:

[Figure: ELBO plotted against Iterations and against Time Elapsed (s), comparing DIRECT, REINFORCE 1 sample, and REINFORCE 10 sample.]

DIRECT can outperform REPARAM:

                      Continuous Prior          Discrete 4-bit Prior
                      REPARAM Mean-Field        DIRECT Mean-Field         DIRECT 5-Mixture SGD
Dataset    n          RMSE           Sparsity   RMSE            Sparsity  RMSE            Sparsity
auto       159        0.425 ± 0.2    0%         0.129 ± 0.063   51%       0.122 ± 0.056   51%
gas        2.5K       0.27 ± 0.052   0%         0.211 ± 0.058   84%       0.184 ± 0.063   76%
protein    45K        0.642 ± 0.006  0%         0.619 ± 0.007   76%       0.618 ± 0.007   60%
song       515K       0.537 ± 0.002  0%         0.501 ± 0.002   32%       0.498 ± 0.002   28%
electric   2M         9.26 ± 4.47    0%         0.575 ± 0.032   99.6%     0.557 ± 0.055   99.6%

Discussion (Poster #39)

The proposed "DIRECT" approach:
1. can exactly compute ELBO gradients, eliminating variance;
2. has training complexity that is independent of the number of training points;
3. yields posterior samples consisting of sparse and low-precision quantized integers;
4. demonstrates accurate inference using 4-bit quantized integers and an ELBO summing over 10^2352 log-likelihood evaluations.

Code: https://github.com/treforevans/direct
Contact: trefor.evans@mail.utoronto.ca
