

slide-1
SLIDE 1

Variational Inference for GPs: Presenters

Group 1: Stochastic variational inference (slides 2-28). Chaoqi Wang, Sana Tonekaboni, Will Grathwohl.
Group 2: Variational inference for GPs (slides 29-57). Trefor Evans, Kingsley Chang, Shems Saleh, James Lucas.
Group 3: PAC-Bayes (slides 58-68). Wenyuan Zeng, Shengyang Sun.

Variational Inference for GPs October 17, 2017 1 / 68

slide-2
SLIDE 2

Variational Inference for GPs

CSC2541 Presentation October 17, 2017

Variational Inference for GPs October 17, 2017 2 / 68

slide-3
SLIDE 3

Stochastic Variational Inference, by Matt Hoffman, David M. Blei, Chong Wang, John Paisley

Exponential family and Latent Dirichlet Allocation

Variational Inference for GPs October 17, 2017 3 / 68

slide-4
SLIDE 4

Exponential family

The exponential family plays an important role in statistics and has many useful properties.

1. Most commonly used distributions are in the exponential family: Gaussian, multinomial, exponential, Dirichlet, Poisson, Gamma, ...

2. Some are not in the exponential family: Cauchy, uniform, ...

Variational Inference for GPs October 17, 2017 4 / 68

slide-5
SLIDE 5

Exponential family: definition

The exponential family is defined by the following form:

p(x|η) = exp{η^T T(x) − A(η)}

1. η ∈ R^d is the natural parameter.
2. T : X → R^d is the sufficient statistic.
3. A(η) = ln ∫_X exp{η^T T(x)} dµ(x) is the log normalizer (µ is the base measure on a space X).

Sometimes it is convenient to use a base measure function h(x) : X → R+ and define p(x|η) = h(x) exp{η^T T(x) − A(η)}, though h can always be absorbed into µ.

Variational Inference for GPs October 17, 2017 5 / 68

slide-6
SLIDE 6

Exponential family: examples

The categorical distribution is a discrete probability distribution describing a random event that can take on one of k possible outcomes. It is defined by:

1. Parameters: k (the number of categories); µ1, ..., µk (event probabilities, with µi > 0 and Σi µi = 1)
2. Support: x ∈ {1, ..., k}
3. PMF: p(x) = µ1^{x1} · · · µk^{xk} (here we overload x as the one-hot vector ([x = 1], ..., [x = k]))
4. Mode: the i for which µi = max(µ1, ..., µk)

Variational Inference for GPs October 17, 2017 6 / 68

slide-7
SLIDE 7

Exponential family: examples

We can write the pmf in the standard representation:

p(x|µ) = ∏_{i=1}^{k} µi^{xi} = exp{ Σ_{i=1}^{k} xi ln µi }

where x = (x1, ..., xk)^T. It can also be written as:

p(x|µ) = exp{ Σ_{i=1}^{k−1} xi ln µi + (1 − Σ_{i=1}^{k−1} xi) ln(1 − Σ_{i=1}^{k−1} µi) }
       = exp{ Σ_{i=1}^{k−1} xi ln( µi / (1 − Σ_{j=1}^{k−1} µj) ) + ln(1 − Σ_{i=1}^{k−1} µi) }

Now we can identify:

ηi = ln( µi / (1 − Σ_j µj) ),  T(x) = x,  A(η) = ln(1 + Σ_{i=1}^{k−1} exp(ηi)),  h(x) = 1

Then p(x|µ) = p(x|η) = 1 · exp{η^T T(x) − A(η)}

Variational Inference for GPs October 17, 2017 7 / 68
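This mapping is easy to sanity-check numerically. Below is a minimal sketch (not from the slides; the probabilities are arbitrary illustration values) that verifies p(x|µ) = exp{η^T T(x) − A(η)} for a small categorical example.

```python
import numpy as np

# Categorical distribution with k = 3 (illustrative probabilities).
mu = np.array([0.2, 0.5, 0.3])           # event probabilities, sum to 1
eta = np.log(mu[:-1] / mu[-1])           # eta_i = ln( mu_i / (1 - sum_{j<k} mu_j) )
A = np.log1p(np.exp(eta).sum())          # log normalizer A(eta)

def p_exp_family(x_onehot):
    """p(x|eta) = exp{eta^T T(x) - A(eta)}, with T(x) the first k-1 one-hot entries."""
    return np.exp(eta @ x_onehot[:-1] - A)

for i in range(3):
    x = np.eye(3)[i]
    print(mu[i], p_exp_family(x))        # the two numbers agree for every outcome
```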

slide-8
SLIDE 8

Exponential family: property

The exponential family has several useful properties:

1. D_KL(p(x|η1) || p(x|η2)) = (η1 − η2)^T ∇A(η1) − A(η1) + A(η2)
2. A(η) is convex.
3. ∇A(η) = E[T(x)] ≈ (1/N) Σ_i T(x^(i))
4. ∇²A(η) = E[T(x) T(x)^T] − E[T(x)] E[T(x)]^T = Var[T(x)]

Variational Inference for GPs October 17, 2017 8 / 68
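Property 3 can be checked quickly with finite differences. The sketch below (an illustration added here, using the categorical parameterization from the previous slide) compares the numerical gradient of A(η) with E[T(x)] = (µ1, ..., µ_{k−1}).

```python
import numpy as np

mu = np.array([0.2, 0.5, 0.3])
eta = np.log(mu[:-1] / mu[-1])
A = lambda e: np.log1p(np.exp(e).sum())   # categorical log normalizer

eps = 1e-6
grad = np.array([(A(eta + eps * np.eye(2)[i]) - A(eta - eps * np.eye(2)[i])) / (2 * eps)
                 for i in range(2)])

print(grad)      # approximately [0.2, 0.5]
print(mu[:-1])   # E[T(x)] for the one-hot sufficient statistic
```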

slide-9
SLIDE 9

Latent Dirichlet Allocation

Latent Dirichlet Allocation (LDA) is a generative probabilistic model for collections of discrete data such as text corpora.

Variational Inference for GPs October 17, 2017 9 / 68

slide-10
SLIDE 10

LDA: process

The generative process of the LDA model can be summarized as follows (a code sketch appears after this slide):

1. Draw topics βk ~ Dirichlet(η, ..., η) for k ∈ {1, ..., K}
2. For each document d ∈ {1, ..., D}:
   1. Draw topic proportions θd ~ Dirichlet(α, ..., α)
   2. For each word n ∈ {1, ..., N}:
      Draw topic assignment zdn ~ Multinomial(θd)
      Draw word wdn ~ Multinomial(β_{zdn})

Variational Inference for GPs October 17, 2017 10 / 68
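The following is a minimal sketch of that generative process (not from the paper; the corpus sizes and hyperparameter values are purely illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
K, D, N, V = 3, 5, 20, 50                # topics, documents, words per doc, vocabulary size
alpha, eta = 0.1, 0.01                   # Dirichlet hyperparameters

beta = rng.dirichlet(np.full(V, eta), size=K)                 # topics beta_k over the vocabulary
docs = []
for d in range(D):
    theta_d = rng.dirichlet(np.full(K, alpha))                # topic proportions for document d
    z_d = rng.choice(K, size=N, p=theta_d)                    # topic assignments z_dn
    w_d = np.array([rng.choice(V, p=beta[z]) for z in z_d])   # words w_dn
    docs.append(w_d)
```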

slide-11
SLIDE 11

Latent Dirichlet Allocation: notations

Some notation used in the LDA model:

1. wdn is the nth word in the dth document. Each word is an element of a fixed vocabulary of V terms.
2. βk is a V-dimensional vector on the (V − 1)-simplex. The wth entry of topic k is βkw.
3. θd is the topic proportions of the dth document. It is a point on the (K − 1)-simplex.
4. zdn indexes the topic from which wdn is drawn. It is assumed that each word in each document is drawn from a single topic.

Variational Inference for GPs October 17, 2017 11 / 68

slide-12
SLIDE 12

LDA: inference

Graphical model representation of LDA: the boxes are plates representing replicates. The outer plate represents documents, while the inner plate represents the repeated choice of topics and words within a document.

The joint distribution is:

p(θ, z, w|α, β) = p(θ|α) ∏_{n=1}^{N} p(zn|θ) p(wn|zn, β)

1. Blei, David M.; Ng, Andrew Y.; Jordan, Michael I. (2003). "Latent Dirichlet Allocation". Journal of Machine Learning Research 3 (4-5): 993-1022. doi:10.1162/jmlr.2003.3.4-5.993

Variational Inference for GPs October 17, 2017 12 / 68

slide-13
SLIDE 13

LDA: inference

The key inferential problem that we need to solve in order to use LDA is computing the posterior distribution of the hidden variables given a document:

p(θ, z|w, α, β) = p(θ, z, w|α, β) / p(w|α, β)

However, the denominator p(w|α, β) is computationally intractable.

Variational Inference for GPs October 17, 2017 13 / 68

slide-14
SLIDE 14

LDA: inference

One way to approximate the posterior is variational inference. In mean-field variational inference, the variational distribution of each variable is in the same family as its complete conditional. We have:

p(zdn = k | θd, β1:K, wdn) ∝ exp{ ln θdk + ln βk,wdn }
p(θd | zd) = Dirichlet(α + Σ_{n=1}^{N} zdn)
p(βk | z, w) = Dirichlet(η + Σ_{d=1}^{D} Σ_{n=1}^{N} zdn^k wdn)

So the corresponding variational distributions and updates are (a code sketch follows this slide):

q(zdn) = Multinomial(φdn), with update φdn^k ∝ exp{ Ψ(γdk) + Ψ(λk,wdn) − Ψ(Σ_v λkv) } for n ∈ {1, ..., N}
q(θd) = Dirichlet(γd), with update γd = α + Σ_{n=1}^{N} φdn
q(βk) = Dirichlet(λk), with update λk = η + Σ_{d=1}^{D} Σ_{n=1}^{N} φdn^k wdn

Variational Inference for GPs October 17, 2017 14 / 68
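Below is a hedged sketch of these mean-field updates for a single document, holding the topics λ fixed; the sizes and the variable names (gamma, lam, phi) are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
K, V, N = 3, 50, 20
alpha, eta = 0.1, 0.01
words = rng.integers(V, size=N)            # word indices w_dn for one document

lam = rng.gamma(1.0, 1.0, size=(K, V))     # q(beta_k) = Dirichlet(lam_k), held fixed here
gamma = np.full(K, alpha + N / K)          # q(theta_d) = Dirichlet(gamma)

for _ in range(50):
    # phi_dn^k is proportional to exp{ Psi(gamma_k) + Psi(lam_{k,w_dn}) - Psi(sum_v lam_{k,v}) }
    log_phi = digamma(gamma)[:, None] + digamma(lam[:, words]) \
              - digamma(lam.sum(axis=1))[:, None]
    phi = np.exp(log_phi - log_phi.max(axis=0))
    phi /= phi.sum(axis=0)                 # shape (K, N), each column sums to 1
    gamma = alpha + phi.sum(axis=1)        # gamma_d = alpha + sum_n phi_dn

# In the batch algorithm, lam_k = eta + sum over all documents of phi_dn^k w_dn.
```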

slide-15
SLIDE 15

LDA: inference

Before updating the topics λ1:K , we need to compute the local variational parameters for every document. This is particularly wasteful in the beginning of the algorithm when, before completing the first iteration, we must analyze every document with randomly initialized topics.

Variational Inference for GPs October 17, 2017 15 / 68

slide-16
SLIDE 16

Stochastic Variational Inference, by Matt Hoffman, David M. Blei, Chong Wang, John Paisley

Variational Inference

Variational Inference for GPs October 17, 2017 16 / 68

slide-17
SLIDE 17

Variational Inference

Goal: approximate the posterior distribution of a probabilistic model by introducing a distribution over the hidden variables and optimizing the parameters of that distribution. Our class of models involves:

Observations x = x1:N
Global hidden variables β
Local hidden variables z = z1:N
Fixed parameters α (for simplicity we assume they only govern the global hidden variables)

Variational Inference for GPs October 17, 2017 17 / 68

slide-18
SLIDE 18

Global vs. Local Hidden Variables

Global hidden variables β: parameters endowed with a prior p(β).
Local hidden variables z = z1:N: contain the hidden structure that governs each observation.
The difference is determined by conditional dependencies:

p(xn, zn | x−n, z−n, β, α) = p(xn, zn | β, α)

Also, the complete conditionals of the hidden variables are in the exponential family:

p(β | x, z, α) = h(β) exp{ ηg(x, z, α)^T t(β) − ag(ηg(x, z, α)) }
p(znj | xn, zn,−j, β) = h(znj) exp{ ηl(xn, zn,−j, β)^T t(znj) − al(ηl(xn, zn,−j, β)) }

Variational Inference for GPs October 17, 2017 18 / 68

slide-19
SLIDE 19

Mean-field Variational Inference

Mean-field variational inference: a variational family in which each hidden variable is independent and governed by its own variational parameter; λ governs the global variables and φn govern the local variables:

q(z, β) = q(β | λ) ∏_{n=1}^{N} ∏_{j=1}^{J} q(znj | φnj)

Also, we set q(β | λ) and q(znj | φnj) to be in the same exponential families as the complete conditionals p(β | x, z) and p(znj | xn, zn,−j, β):

q(β | λ) = h(β) exp{ λ^T t(β) − ag(λ) }
q(znj | φnj) = h(znj) exp{ φnj^T t(znj) − al(φnj) }

Variational Inference for GPs October 17, 2017 19 / 68

slide-20
SLIDE 20

Batch Variational Bayes

L = E[log p(x, z, β)] − E[log q(z, β)]

Coordinate update for λ: λ = Eq[ηg(x, z, α)]
Coordinate update for φnj: φnj = Eq[ηl(xn, zn,−j, β)]

Therefore, we can optimize our objective function with easy, closed-form coordinate ascent.

Variational Inference for GPs October 17, 2017 20 / 68

slide-21
SLIDE 21

Batch Variational Bayes Algorithm

1. Initialize λ(0) randomly
2. Repeat:
3.   For each local variational parameter φnj do
4.     Update φnj: φnj(t) = Eq(t−1)[ηl,j(xn, zn,−j, β)]
5.   End for
6.   Update the global variational parameters: λ(t) = Eq(t)[ηg(z1:N, x1:N)]

Variational Inference for GPs October 17, 2017 21 / 68

slide-22
SLIDE 22

Stochastic Variational Inference

Solution: use stochastic optimization; repeatedly subsample the data to form noisy estimates of the natural gradient of the ELBO:

∇̂λ L = Eφ[ηg(x, z, α)] − λ
∇̂φnj L = Eλ,φn,−j[ηl(xn, zn,−j, β)] − φnj

Some benefits of natural gradients:
The natural gradient points in the direction of steepest ascent in the Riemannian space
Converges faster
It is cheaper to compute

Variational Inference for GPs October 17, 2017 22 / 68

slide-23
SLIDE 23

Stochastic Variational Inference Algorithm

1. Initialize λ(0) randomly
2. Set a step size ρt appropriately
3. Repeat:
4.   Sample a data point xi uniformly from the dataset
5.   Update the local variational parameter of the sample as if we were doing coordinate ascent: φ = Eλ(t−1)[ηg(xi^(N), zi^(N))]
6.   Update the current estimate of the global variational parameters:

     λ(t) = λ(t−1) + ρt ∇̂λL = (1 − ρt) λ(t−1) + ρt Eφ[ηg(xi^(N), zi^(N))]

(A code sketch of this update follows the slide.)

Variational Inference for GPs October 17, 2017 23 / 68
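The sketch below applies these steps to the LDA topics λ from the earlier mean-field sketch; the step-size schedule, dataset size D, and the stand-in "sampled document" are illustrative assumptions rather than values from the paper.

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
K, V, N, D = 3, 50, 20, 1000               # topics, vocab size, words per doc, corpus size
alpha, eta = 0.1, 0.01
lam = rng.gamma(1.0, 1.0, size=(K, V))     # global variational parameters lambda

def local_step(words, lam, n_iter=20):
    """Coordinate ascent on (phi, gamma) for one document, lambda held fixed."""
    gamma = np.full(K, alpha + len(words) / K)
    for _ in range(n_iter):
        log_phi = digamma(gamma)[:, None] + digamma(lam[:, words]) \
                  - digamma(lam.sum(axis=1))[:, None]
        phi = np.exp(log_phi - log_phi.max(axis=0))
        phi /= phi.sum(axis=0)
        gamma = alpha + phi.sum(axis=1)
    return phi

for t in range(1, 101):
    rho_t = (t + 1.0) ** -0.7                       # Robbins-Monro step size
    words = rng.integers(V, size=N)                 # stand-in for a uniformly sampled document
    phi = local_step(words, lam)
    lam_hat = np.full((K, V), eta)                  # intermediate global estimate from one doc
    np.add.at(lam_hat, (slice(None), words), D * phi)
    lam = (1.0 - rho_t) * lam + rho_t * lam_hat     # lambda(t) = (1 - rho) lambda(t-1) + rho lam_hat
```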

slide-24
SLIDE 24

A Review Of Stochastic Gradient Variational Bayes

The last lecture introduced Stochastic Gradient Variational Bayes (SGVB). In SGVB, the ELBO

L(φ) = Ex∼D[ Eq(z|x,φ)[ log p(x, z) − log q(z|x, φ) ] ]

is optimized via stochastic gradient descent, where we estimate ∂L(φ)/∂φ using Monte Carlo samples. In SGVB our estimator is produced via a 2-step hierarchical sampling procedure:

We draw a minibatch of data xi (or xi, xj, xk, ...)
We draw a minibatch of samples zi ∼ q(zi|xi, φ)
We estimate ∂L(φ)/∂φ ≈ ∂/∂φ ( log p(xi, zi) − log q(zi|xi, φ) )

where we have reparameterized zi = f(xi, ε, φ) with ε ∼ p(ε). Thus:

∂L(φ)/∂φ ≈ ∂/∂φ ( log p(xi, f(xi, ε, φ)) − log q(f(xi, ε, φ)|xi, φ) )

Variational Inference for GPs October 17, 2017 24 / 68
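As a concrete illustration (a sketch added here, not from the slides), the snippet below runs reparameterized gradient ascent on a toy 1-D model with prior p(z) = N(0, 1), likelihood p(x|z) = N(x; z, 1), and q(z|φ) = N(µ, σ²); the Gaussian entropy is handled analytically, and all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = 1.5                                    # a single "observation"
mu, log_sigma = 0.0, 0.0                   # variational parameters phi
lr = 0.02

for step in range(5000):
    eps = rng.standard_normal()
    sigma = np.exp(log_sigma)
    z = mu + sigma * eps                   # z = f(x, eps, phi), the reparameterization
    dlogp_dz = -z + (x - z)                # d/dz [ log N(z; 0, 1) + log N(x; z, 1) ]
    grad_mu = dlogp_dz                     # pathwise gradient w.r.t. mu
    grad_log_sigma = dlogp_dz * sigma * eps + 1.0   # plus derivative of the Gaussian entropy
    mu += lr * grad_mu
    log_sigma += lr * grad_log_sigma

print(mu, np.exp(log_sigma))               # should approach the exact posterior N(0.75, sqrt(0.5))
```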

slide-25
SLIDE 25

Required Properties

Both SVI and SGVB require certain assumptions to hold before they can be applied.

SVI:
There exists an analytic form of ∂L(φ)/∂φ for each model parameter φ.
The approximate posterior q(z|φ) must be in the same exponential family as p(z).

SGVB:
The likelihood p(x, z) must be differentiable w.r.t. z.
The approximate posterior q(z|x, φ) must be differentiable w.r.t. its parameters φ.
There exists a differentiable reparameterization f(x, ε, φ), ε ∼ p(ε), such that z = f(x, ε, φ) is distributed as q(z|x, φ).

Variational Inference for GPs October 17, 2017 25 / 68

slide-26
SLIDE 26

SVI

Benefits: Performs natural gradient descent. Invariant to parameterization. The exponential family covers a rich set of both continuous and discrete distributions. Allows for scalable inference over large datasets.

Downsides: The parameters of the variational approximation q(z|φ) must be exactly the exponential-family parameters, limiting the complexity of the relationship between q(z|φ) and the data x. Analytic forms of the ELBO derivatives are necessary. q(z|φ) must be in the same exponential family as p(z).

Variational Inference for GPs October 17, 2017 26 / 68

slide-27
SLIDE 27

SGVB

Benefits: Weaker modeling assumptions can be made. p(x, z) and q(z|φ) need only be differentiable w.r.t. their parameters. Complex, nonlinear relationships between data and latent variables may be learned. Reparameterization allows for low-variance gradient estimates for all model parameters. Allows for scalable inference over large datasets.

Variational Inference for GPs October 17, 2017 27 / 68

slide-28
SLIDE 28

SGVB Continued

Downsides: Naive natural gradient descent is intractable in nonlinear probabilistic models (although see Dr. Grosse's recent work for exciting progress towards approximate NGD for neural network models). Not invariant to model parameterization, so extra care must be taken to ensure proper results. Reparameterization limits the posterior approximations we can use to continuous distributions (like Gaussian or Laplace). No proof exists showing that reparameterization gradients have lower variance than the score function estimator.

Variational Inference for GPs October 17, 2017 28 / 68

slide-29
SLIDE 29

Variational Inference for GPs

Sparse Gaussian Process

Variational Inference for GPs October 17, 2017 29 / 68

slide-30
SLIDE 30

Gaussian Processes Review

y = {f(xi) + ε}_{i=1}^{n},  X = {xi}_{i=1}^{n},  xi ∈ R^d

A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. It is completely specified by its mean and covariance function.

Variational Inference for GPs October 17, 2017 30 / 68

slide-31
SLIDE 31

Gaussian Processes Review

y = {f(xi) + ε}_{i=1}^{n},  X = {xi}_{i=1}^{n},  xi ∈ R^d

A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. It is completely specified by its mean and covariance function.

Prior: f ∼ N(0, K_X,X),  [K_X,X]_{i,j} = k(xi, xj)

Joint prior:

(y, f∗) ∼ p(y, f∗) = N( 0, [ K_X,X + σ²I  K_X,∗ ; K_∗,X  K_∗,∗ ] )

Conditional distribution:

f∗ | X, y, X∗ ∼ N(E[f∗], cov[f∗])
E[f∗] = K_∗,X (K_X,X + σ²I)^{−1} y
cov[f∗] = K_∗,∗ − K_∗,X (K_X,X + σ²I)^{−1} K_X,∗

Variational Inference for GPs October 17, 2017 30 / 68
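A minimal numerical sketch of these prediction equations (added here for illustration; the data, kernel, and noise level are arbitrary choices, not from the slides):

```python
import numpy as np

def kern(A, B, theta=1.0):
    """Squared exponential kernel k(x, z) = exp(-||x - z||^2 / theta^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / theta ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(30, 1))                  # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)  # noisy observations
Xs = np.linspace(0, 5, 100)[:, None]                 # test inputs X_*
sigma2 = 0.01

Kxx = kern(X, X) + sigma2 * np.eye(len(X))           # K_X,X + sigma^2 I
Ksx = kern(Xs, X)                                    # K_*,X
Kss = kern(Xs, Xs)                                   # K_*,*

L = np.linalg.cholesky(Kxx)                          # the O(n^3) step
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
mean = Ksx @ alpha                                   # E[f*]
V = np.linalg.solve(L, Ksx.T)
cov = Kss - V.T @ V                                  # cov[f*]
```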

slide-32
SLIDE 32

Model Selection

Our kernels are typically parameterized by some hyperparameters θ. For example, the squared exponential kernel

k(x, z) = exp( −||x − z||² / θ² )

Log marginal likelihood:

log p(y|θ, σ², X) = −(1/2) log |K_X,X + σ²I| − (1/2) y^T (K_X,X + σ²I)^{−1} y − (N/2) log(2π)

This requires O(n³) time and O(n²) storage.

Variational Inference for GPs October 17, 2017 31 / 68
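A short self-contained sketch of this computation (illustrative data and hyperparameter values; the Cholesky factorization is where the O(n³) cost lives):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
theta, sigma2, n = 1.0, 0.01, 30

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / theta ** 2) + sigma2 * np.eye(n)     # K_X,X + sigma^2 I

L = np.linalg.cholesky(K)                             # O(n^3)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
log_det = 2.0 * np.log(np.diag(L)).sum()              # log |K_X,X + sigma^2 I|
log_ml = -0.5 * log_det - 0.5 * y @ alpha - 0.5 * n * np.log(2 * np.pi)
```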

slide-33
SLIDE 33

Modifying the Joint Prior

p(f, f∗) = N( 0, [ K_X,X  K_X,∗ ; K_∗,X  K_∗,∗ ] )

We want to modify this joint prior to reduce computational requirements. Assume f∗ and f are conditionally independent given a set of inducing point locations Z = {zi}_{i=1}^{m} and responses u = {ui}_{i=1}^{m}:

p(f, f∗) = ∫ p(f, f∗, u) du
q(f, f∗) = ∫ p(f∗|u) q(f|u) p(u) du

That is, we make an approximation to the training conditional p(f|u) ≈ q(f|u).
Variational Inference for GPs October 17, 2017 32 / 68

slide-34
SLIDE 34

Fully Independent Training Conditional (FITC)

We approximate the training conditional with an independent distribution (diagonal covariance):

p(f|u) = N( K_X,Z K_Z,Z^{−1} u,  K_X,X − Q_X,X )
q(f|u) = N( K_X,Z K_Z,Z^{−1} u,  diag[K_X,X − Q_X,X] )

where Q_X,X = K_X,Z K_Z,Z^{−1} K_Z,X. This gives

p(f, f∗) ≈ N( 0, [ Q_X,X + diag[K_X,X − Q_X,X]  Q_X,∗ ; Q_∗,X  Q_∗,∗ + diag[K_∗,∗ − Q_∗,∗] ] )

From this, the cost of computing our conditional distribution decreases from O(n³) to O(m²n) time and from O(n²) to O(mn) storage.

Variational Inference for GPs October 17, 2017 33 / 68
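A small sketch of the FITC covariance construction on the training points (added for illustration; the kernel, data, and the choice of inducing points as a random subset are all assumptions):

```python
import numpy as np

def kern(A, B, theta=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / theta ** 2)

rng = np.random.default_rng(0)
n, m = 200, 10
X = rng.uniform(0, 5, size=(n, 1))
Z = X[rng.choice(n, m, replace=False)]              # inducing point locations

Kxz = kern(X, Z)
Kzz = kern(Z, Z) + 1e-8 * np.eye(m)                 # jitter for numerical stability
Kxx_diag = np.ones(n)                               # diag of K_X,X equals 1 for this kernel

Qxx = Kxz @ np.linalg.solve(Kzz, Kxz.T)             # Q_X,X = K_X,Z K_Z,Z^{-1} K_Z,X
fitc_cov = Qxx + np.diag(Kxx_diag - np.diag(Qxx))   # Q_X,X + diag[K_X,X - Q_X,X]
```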

slide-35
SLIDE 35

Variational Inference for GPs

Adapted from the presentation "Variational Model Selection for Sparse Gaussian Process Regression" by Christopher P. Ley (2016) 2

2 http://games.cmm.uchile.cl/media/uploads/posts/SGP_presentation.pdf

slide-36
SLIDE 36

Variational learning of inducing variables I

Titsias (2009) proposed a variational lower bound to approximate the true posterior. The ideal inducing variables should serve as sufficient statistics for the observations y:

p(f|fm, y) = p(f|fm)

The augmented true posterior p(f, fm|y) then factorises as p(f, fm|y) = p(f|fm) p(fm|y).

slide-37
SLIDE 37

Variational learning of inducing variables II

The key is that q(f, fm) must satisfy the factorisation that holds for optimal inducing variables:

True: p(f, fm|y) = p(f|fm) p(fm|y)
Approximate: q(f, fm) = p(f|fm) φ(fm)

slide-38
SLIDE 38

Variational Model Selection for Sparse Gaussian Process Regression

Variational learning of inducing variables III

This gives rise to the variational distribution q(f, fm) = p(f|fm) φ(fm), where φ(fm) is an unconstrained variational distribution over fm. We can now use standard variational Bayesian inference, minimising the Kullback-Leibler divergence KL(q(f, fm) || p(f, fm|y)), which is equivalent to maximising a bound on the true log marginal likelihood:

FV(Xm, φ(fm)) = ∫_{f, fm} q(f, fm) log [ p(y|f) p(f|fm) p(fm) / q(f, fm) ] df dfm

slide-39
SLIDE 39

Variational Model Selection for Sparse Gaussian Process Regression

Computation of the variational bound I

FV(Xm, φ(fm)) = ∫_{f, fm} p(f|fm) φ(fm) log [ p(y|f) p(f|fm) p(fm) / (p(f|fm) φ(fm)) ] df dfm
  = ∫_{fm} φ(fm) [ ∫_f p(f|fm) log p(y|f) df + log( p(fm) / φ(fm) ) ] dfm
  = ∫_{fm} φ(fm) [ log G(fm, y) + log( p(fm) / φ(fm) ) ] dfm

where

log G(fm, y) = log[ N(y | E[f|fm], σ²_noise I) ] − (1 / 2σ²_noise) Tr[Cov(f|fm)]
E[f|fm] = K_nm K_mm^{−1} fm
Cov[f|fm] = K_nn − K_nm K_mm^{−1} K_mn

slide-40
SLIDE 40

Bias-Variance Decomposition

∫_f p(f|fm) log p(y|f) df = log[ N(y | E[f|fm], σ²_noise I) ]  (bias term)  −  (1 / 2σ²_noise) Tr[Cov(f|fm)]  (variance term)

Recall the bias-variance decomposition of the L2 loss:

E_{t∼p(t|x)}[(y − t)²] = (y − E_{t∼p(t|x)}[t])²  (bias)  +  Var[t|x]  (variance)

slide-41
SLIDE 41

Variational Model Selection for Sparse Gaussian Process Regression

Computation of the variational bound II

Merge the logs:

FV(Xm, φ(fm)) = ∫_{fm} φ(fm) log [ G(fm, y) p(fm) / φ(fm) ] dfm

Reverse Jensen's inequality to maximize with respect to φ(fm):

FV(Xm) = log ∫_{fm} G(fm, y) p(fm) dfm
  = log ∫_{fm} N(y | αm, σ²_noise I) p(fm) dfm − (1 / 2σ²_noise) Tr[Cov(f|fm)]   (with αm = K_nm K_mm^{−1} fm)
  = log[ N(y | 0, σ²_noise I + K_nm K_mm^{−1} K_mn) ] − (1 / 2σ²_noise) Tr[Cov(f|fm)]

where Cov[f|fm] = K_nn − K_nm K_mm^{−1} K_mn

slide-42
SLIDE 42

Variational Model Selection for Sparse Gaussian Process Regression

Variational bound versus PP log likelihood

The traditional projected process (PP or DTC) log likelihood is

FP = log[ N(y | 0, σ²I + K_nm K_mm^{−1} K_mn) ]

What we obtained is

FV = log[ N(y | 0, σ²I + K_nm K_mm^{−1} K_mn) ] − (1 / 2σ²) Tr[K_nn − K_nm K_mm^{−1} K_mn]

We get an extra trace term (the total variance of p(f|fm)); a numerical sketch of the two objectives follows below.
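The sketch below evaluates both objectives on toy 1-D data (all values are illustrative; C is formed directly for clarity, whereas an O(nm²) implementation would use the matrix inversion lemma).

```python
import numpy as np

def kern(A, B, theta=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / theta ** 2)

rng = np.random.default_rng(0)
n, m, sigma2 = 200, 10, 0.01
X = rng.uniform(0, 5, size=(n, 1))
y = np.sin(X[:, 0]) + np.sqrt(sigma2) * rng.standard_normal(n)
Z = X[rng.choice(n, m, replace=False)]               # inducing inputs X_m

Knm = kern(X, Z)
Kmm = kern(Z, Z) + 1e-8 * np.eye(m)
Qnn = Knm @ np.linalg.solve(Kmm, Knm.T)              # K_nm K_mm^{-1} K_mn

C = Qnn + sigma2 * np.eye(n)                         # covariance of N(y | 0, sigma^2 I + Q_nn)
L = np.linalg.cholesky(C)
a = np.linalg.solve(L.T, np.linalg.solve(L, y))
F_P = -0.5 * (2 * np.log(np.diag(L)).sum() + y @ a + n * np.log(2 * np.pi))

trace_term = (1.0 - np.diag(Qnn)).sum()              # Tr[K_nn - Q_nn]; diag of K_nn is 1 here
F_V = F_P - trace_term / (2 * sigma2)
print(F_P, F_V)                                      # F_V <= F_P by construction
```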

slide-43
SLIDE 43

Variational Model Selection for Sparse Gaussian Process Regression

Variational bound for model selection

Learning the inducing inputs Xm and the hyperparameters (σ², θ) using continuous optimization: maximize the bound with respect to (Xm, σ², θ):

FV = log[ N(y | 0, σ²I + K_nm K_mm^{−1} K_mn) ] − (1 / 2σ²) Tr[K_nn − K_nm K_mm^{−1} K_mn]

The first term encourages fitting the data y.
The second (trace) term says to minimize the total variance of p(f|fm).
The trace Tr[K_nn − K_nm K_mm^{−1} K_mn] can stand on its own as an objective function for sparse GP learning.
slide-44
SLIDE 44

Variational bound for model selection

When the approximation equals the full covariance matrix, i.e. K_nn = K_nm K_mm^{−1} K_mn:

Tr[K_nn − K_nm K_mm^{−1} K_mn] = 0
p(f|fm) becomes a delta function
We can reproduce the exact GP prediction

slide-45
SLIDE 45

Variational Model Selection for Sparse Gaussian Process Regression

Illustrative comparison on Ed Snelson’s toy data

[Figure: Ed Snelson's 1-D toy dataset]

We compare the traditional PP/DTC log likelihood

FP = log[ N(y | 0, σ²I + K_nm K_mm^{−1} K_mn) ]

and the bound

FV = log[ N(y | 0, σ²I + K_nm K_mm^{−1} K_mn) ] − (1 / 2σ²) Tr[K_nn − K_nm K_mm^{−1} K_mn]

We will jointly maximize over (Xm, σ², θ).

slide-46
SLIDE 46

Variational Model Selection for Sparse Gaussian Process Regression

Illustrative comparison

[Figure: panels for 8, 10 and 15 inducing points, comparing VAR and PP.] 200 training points; the red line is the full GP, the blue line the sparse GP. We used 8, 10 and 15 inducing points.

slide-47
SLIDE 47

Variational bound compared to PP likelihood

The variational method (VFE) converges to the full GP model as we increase the number of inducing variables, but PP does not. VFE tends to find a smoother distribution than the full GP when there are not enough inducing variables. PP tends to interpolate the training examples.

slide-48
SLIDE 48

Variational Inference for GPs

FITC and VFE Comparison

Variational Inference for GPs October 17, 2017 40 / 68

slide-49
SLIDE 49

Overview of the two methods

Negative Log Marginal Likelihood:

Data fit: penalizes data outside the covariance ellipse Qff + G. Complexity penalty: characterizes the volume of possible datasets compatible with the data fit term (Occam's razor). Trace term: ensures that the objective function is a true lower bound on the marginal likelihood of the full GP.

Variational Inference for GPs October 17, 2017 41 / 68

slide-50
SLIDE 50

Overview

Points of comparison:

1. Noise variance
2. Number of inducing inputs
3. True GP posterior
4. Local optima

Variational Inference for GPs October 17, 2017 42 / 68

slide-51
SLIDE 51

Noise Variance

FITC (left) underestimates the noise variance while VFE (right) overestimates it.

In the full GP, we assume homoscedastic (input-independent) noise with parameter σ²_n.
FITC uses the diagonal term diag(Kff − Qff) in G_FITC as heteroscedastic (input-dependent) noise.
The trace and data-fit terms in VFE can be reduced by increasing σ²_n, causing a bias towards overestimation of the noise variance.

Variational Inference for GPs October 17, 2017 43 / 68

slide-52
SLIDE 52

Number of Inducing Inputs

VFE improves with additional inducing inputs while FITC may ignore them

Variational Inference for GPs October 17, 2017 44 / 68

slide-53
SLIDE 53

Number of Inducing Inputs

FITC avoids the penalty of added inducing inputs by clumping them. This also means FITC doesn't recover the full GP even when given enough resources.

Variational Inference for GPs October 17, 2017 45 / 68

slide-54
SLIDE 54

Local Optima

FITC relies on local optima: the minimum found by FITC through clumping still exists in the optimization surface even with many inducing points.
Optimizing FITC is easier than optimizing VFE.
Optimizing the VFE objective typically involves initializing the inducing points with k-means and initially fixing the hyperparameters.
VFE recognizes a good solution when we initialize it with the FITC solution.

Variational Inference for GPs October 17, 2017 46 / 68

slide-55
SLIDE 55

Summary

FITC behaviour: over-estimation of the marginal likelihood; severe under-estimation of the noise variance; wasted modelling resources; does not recover the true posterior.

VFE behaviour: a true bound on the marginal likelihood of the full GP; behaves predictably; improves with extra resources; recovers the true posterior when possible.

FITC remains easier to optimise and gives good local optima. The VFE objective is recommended, since its optimization difficulties can be mitigated by careful initialization, random restarts and FITC initialization. In practice, the better choice depends on the dataset.

Variational Inference for GPs October 17, 2017 47 / 68

slide-56
SLIDE 56

Variational Inference for GPs

SVI for GPs

Variational Inference for GPs October 17, 2017 48 / 68

slide-57
SLIDE 57

Are sparse GPs enough?

Standard GPs require O(n3) time complexity and O(n2) storage.

Variational Inference for GPs October 17, 2017 49 / 68

slide-58
SLIDE 58

Are sparse GPs enough?

Standard GPs require O(n3) time complexity and O(n2) storage. Sparse GPs cut this down to O(nm2) time complexity and O(nm) storage.

Variational Inference for GPs October 17, 2017 49 / 68

slide-59
SLIDE 59

Are sparse GPs enough?

Standard GPs require O(n3) time complexity and O(n2) storage. Sparse GPs cut this down to O(nm2) time complexity and O(nm) storage. But we have huge datasets where n is on the order of millions, or billions! How can we hope to fit (even sparse) GPs to datasets of this magnitude?

Variational Inference for GPs October 17, 2017 49 / 68

slide-60
SLIDE 60

Stochastic variational inference!

Variational Inference for GPs October 17, 2017 50 / 68

slide-61
SLIDE 61

The GP Variational Bound

Titsias (2009) showed that

log p(y|X) = log ∫ p(y|u) p(u) du    (1)
           ≥ log ∫ exp(L1) p(u) du := L2    (2)

where L1 := Ep(f|u)[log p(y|f)]. Remember:

f: the function evaluated at X
y: noisy observations of f
u: the function evaluated at the inducing points Z

Variational Inference for GPs October 17, 2017 51 / 68

slide-62
SLIDE 62

The GP Variational Bound

Titsias (2009) showed that

log p(y|X) = log ∫ p(y|u) p(u) du    (1)
           ≥ log ∫ exp(L1) p(u) du := L2    (2)

where L1 := Ep(f|u)[log p(y|f)]. Remember:

f: the function evaluated at X
y: noisy observations of f
u: the function evaluated at the inducing points Z

We can compute this analytically (as shown before). The posterior can be viewed as "collapsed" over the inducing points.

Variational Inference for GPs October 17, 2017 51 / 68

slide-63
SLIDE 63

The GP Variational Bound

Titsias (2009) showed that

log p(y|X) = log ∫ p(y|u) p(u) du    (1)
           ≥ log ∫ exp(L1) p(u) du := L2    (2)

where L1 := Ep(f|u)[log p(y|f)]. Remember:

f: the function evaluated at X
y: noisy observations of f
u: the function evaluated at the inducing points Z

We can compute this analytically (as shown before). The posterior can be viewed as "collapsed" over the inducing points.
We need to be explicit about the inducing points to do SVI.

Variational Inference for GPs October 17, 2017 51 / 68

slide-64
SLIDE 64

Requirements for SVI

Marginalisation of u introduces dependencies in the observations. We need to adjust our VIGP regression model to allow us to use SVI...

Variational Inference for GPs October 17, 2017 52 / 68

slide-65
SLIDE 65

From a collapsed posterior to global latent variables

We instead treat the inducing points as global latent variables, with variational distribution q(u)

Variational Inference for GPs October 17, 2017 53 / 68

slide-66
SLIDE 66

From a collapsed posterior to global latent variables

We instead treat the inducing points as global latent variables, with variational distribution q(u) We then get a new bound which we can use for SVI. log p(y|X) ≥ Eq(u)[L1 + log p(u) − log q(u)] := L3 (3) (Remember, L1 := Ep(f|u)[log p(y|f)])

Variational Inference for GPs October 17, 2017 53 / 68

slide-67
SLIDE 67

From a collapsed posterior to global latent variables

We instead treat the inducing points as global latent variables, with variational distribution q(u).
We then get a new bound which we can use for SVI:

log p(y|X) ≥ Eq(u)[L1 + log p(u) − log q(u)] := L3    (3)

(Remember, L1 := Ep(f|u)[log p(y|f)].)
The optimal q(u) is Gaussian, which leads to

L3 = Σ_{i=1}^{n} [ log N(yi | µ̂i, β^{−1}) + · · · ] − KL(q(u) || p(u))    (4)

omitting some terms for brevity (see paper).

Variational Inference for GPs October 17, 2017 53 / 68

slide-68
SLIDE 68

From a collapsed posterior to global latent variables

We instead treat the inducing points as global latent variables, with variational distribution q(u).
We then get a new bound which we can use for SVI:

log p(y|X) ≥ Eq(u)[L1 + log p(u) − log q(u)] := L3    (3)

(Remember, L1 := Ep(f|u)[log p(y|f)].)
The optimal q(u) is Gaussian, which leads to

L3 = Σ_{i=1}^{n} [ log N(yi | µ̂i, β^{−1}) + · · · ] − KL(q(u) || p(u))    (4)

omitting some terms for brevity (see paper).
We can write this as a sum over data points, allowing SVI!

Variational Inference for GPs October 17, 2017 53 / 68
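A hedged sketch of evaluating this uncollapsed bound on a minibatch for a Gaussian likelihood is below; q(u) = N(m_u, S), the KL term is between Gaussians, and every size and value is an illustrative assumption rather than a setting from the paper.

```python
import numpy as np

def kern(A, B, theta=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / theta ** 2)

rng = np.random.default_rng(0)
M, beta = 10, 100.0                                   # inducing points, noise precision
Z = np.linspace(0, 5, M)[:, None]
m_u = rng.standard_normal(M)                          # variational mean of q(u)
L_S = 0.1 * np.eye(M)
S = L_S @ L_S.T                                       # variational covariance of q(u)
Kmm = kern(Z, Z) + 1e-8 * np.eye(M)
Kmm_inv = np.linalg.inv(Kmm)

def bound_minibatch(Xb, yb, n_total):
    Knm = kern(Xb, Z)
    A = Knm @ Kmm_inv                                 # K_nm K_mm^{-1}
    mu = A @ m_u                                      # E_q[f_i]
    var = 1.0 - np.einsum('ij,ij->i', A, Knm) + np.einsum('ij,jk,ik->i', A, S, A)
    # E_{q(f_i)}[ log N(y_i | f_i, beta^{-1}) ]
    ell = -0.5 * np.log(2 * np.pi / beta) - 0.5 * beta * ((yb - mu) ** 2 + var)
    kl = 0.5 * (np.trace(Kmm_inv @ S) + m_u @ Kmm_inv @ m_u - M
                + np.linalg.slogdet(Kmm)[1] - np.linalg.slogdet(S)[1])
    return (n_total / len(Xb)) * ell.sum() - kl       # minibatch-rescaled sum minus KL(q(u)||p(u))

Xb = rng.uniform(0, 5, size=(32, 1))
yb = np.sin(Xb[:, 0]) + 0.1 * rng.standard_normal(32)
print(bound_minibatch(Xb, yb, n_total=1000))
```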

slide-69
SLIDE 69

Inference with this new bound

We perform SVI using natural gradient updates

Variational Inference for GPs October 17, 2017 54 / 68

slide-70
SLIDE 70

Inference with this new bound

We perform SVI using natural gradient updates Exponential family leads to a nice form of the updates See the paper for the derived update rules

Variational Inference for GPs October 17, 2017 54 / 68

slide-71
SLIDE 71

Inference with this new bound

We perform SVI using natural gradient updates Exponential family leads to a nice form of the updates See the paper for the derived update rules Training updates are now O(m3)!

Variational Inference for GPs October 17, 2017 54 / 68

slide-72
SLIDE 72

Inference with this new bound

We perform SVI using natural gradient updates Exponential family leads to a nice form of the updates See the paper for the derived update rules Training updates are now O(m3)! Can also use non-Gaussian likelihoods because of the L3 factorisation. This normally requires approximations.

Variational Inference for GPs October 17, 2017 54 / 68

slide-73
SLIDE 73

Experiments I

Each panel shows the GP posterior after an update on one batch. The variational distribution q(u) is shown by the error bars.

Variational Inference for GPs October 17, 2017 55 / 68

slide-74
SLIDE 74

Experiments II

Posterior variance of apartment price by postal region.

Variational Inference for GPs October 17, 2017 56 / 68

slide-75
SLIDE 75

Summary

We cannot do SVI on the collapsed VI bound.

Variational Inference for GPs October 17, 2017 57 / 68

slide-76
SLIDE 76

Summary

We cannot do SVI on the collapsed VI bound. But we can refactorize it.

Variational Inference for GPs October 17, 2017 57 / 68

slide-77
SLIDE 77

Summary

We cannot do SVI on the collapsed VI bound. But we can refactorize it. SVI lets us handle big data and can attain the same optimum parameters.

Variational Inference for GPs October 17, 2017 57 / 68

slide-78
SLIDE 78

PAC-Bayes

PAC-Bayes

Variational Inference for GPs October 17, 2017 58 / 68

slide-79
SLIDE 79

Hoeffding’s Inequality

How can you infer the probability of orange balls?

3Thanks to Hsuan-Tien Lin in National Taiwan University for his example. Variational Inference for GPs October 17, 2017 59 / 68

slide-80
SLIDE 80

Hoeffding’s Inequality

How can you infer the probability of orange balls? Sampling!!!

3Thanks to Hsuan-Tien Lin in National Taiwan University for his example. Variational Inference for GPs October 17, 2017 59 / 68

slide-81
SLIDE 81

Hoeffding’s Inequality

How can you infer the probability of orange balls? Sampling!!! Assume the real orange probability is µ, the number of balls sampled is N, the sampling orange probability is ν.

3Thanks to Hsuan-Tien Lin in National Taiwan University for his example. Variational Inference for GPs October 17, 2017 59 / 68

slide-82
SLIDE 82

Hoeffding’s Inequality

How can you infer the probability of orange balls? Sampling!!! Assume the real orange probability is µ, the number of balls sampled is N, the sampling orange probability is ν. How accurate is your estimation by sampling?

3

3Thanks to Hsuan-Tien Lin in National Taiwan University for his example. Variational Inference for GPs October 17, 2017 59 / 68

slide-83
SLIDE 83

Hoeffding’s Inequality

Hoeffding's Inequality:  P[ |ν − µ| ≥ ε ] ≤ 2 exp(−2ε²N)    (5)

Variational Inference for GPs October 17, 2017 60 / 68

slide-84
SLIDE 84

Hoeffding’s Inequality

Hoeffding's Inequality:  P[ |ν − µ| ≥ ε ] ≤ 2 exp(−2ε²N)    (5)
The statement ν = µ is probably approximately correct (PAC).

Variational Inference for GPs October 17, 2017 60 / 68

slide-85
SLIDE 85

Hoeffding’s Inequality

Hoeffding's Inequality:  P[ |ν − µ| ≥ ε ] ≤ 2 exp(−2ε²N)    (5)
The statement ν = µ is probably approximately correct (PAC).
When the number of samples is large, your estimate can be very accurate. Therefore, you can LEARN from the training set! Bigger is better!

Variational Inference for GPs October 17, 2017 60 / 68
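A quick simulation makes the bound concrete (a sketch added here; µ, N, and ε are arbitrary illustration values).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, N, eps, trials = 0.4, 100, 0.1, 100_000

nu = rng.binomial(N, mu, size=trials) / N            # sample proportions from N draws
empirical = np.mean(np.abs(nu - mu) >= eps)          # estimated P[|nu - mu| >= eps]
bound = 2 * np.exp(-2 * eps ** 2 * N)                # Hoeffding bound (5)

print(empirical, bound)                              # the empirical frequency sits below the bound
```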

slide-86
SLIDE 86

Hoeffding’s Inequality

Moving on, suppose we have K hypotheses, with generalization errors {ei}_{i=1}^{K} and empirical errors {êi}_{i=1}^{K}. The probability of a discrepancy ε is

p(∃i, |êi − ei| ≥ ε) ≤ Σ_i P(|êi − ei| ≥ ε) ≤ 2K exp(−ε²N)    (6)

Variational Inference for GPs October 17, 2017 61 / 68

slide-87
SLIDE 87

Hoeffding’s Inequality

Moving on, suppose we have K hypotheses, with generalization errors {ei}_{i=1}^{K} and empirical errors {êi}_{i=1}^{K}. The probability of a discrepancy ε is

p(∃i, |êi − ei| ≥ ε) ≤ Σ_i P(|êi − ei| ≥ ε) ≤ 2K exp(−ε²N)    (6)

Equivalently, we have

p( ∃i, |êi − ei| ≥ sqrt( (log(2K) − log δ) / N ) ) ≤ δ    (7)

Variational Inference for GPs October 17, 2017 61 / 68

slide-88
SLIDE 88

Gibbs Classifier Bayes Classifier

Suppose we’re doing a binary classification task. Let x ∈ X and t ∈ {−1, 1}. The model is composed of a latent function x → y(x), and a classification model P(t|y).

Variational Inference for GPs October 17, 2017 62 / 68

slide-89
SLIDE 89

Gibbs Classifier Bayes Classifier

Suppose we’re doing a binary classification task. Let x ∈ X and t ∈ {−1, 1}. The model is composed of a latent function x → y(x), and a classification model P(t|y). From the Bayesian viewpoint, the latent function could be parametrized as y(x|w), where w ∼ Q(w)

Variational Inference for GPs October 17, 2017 62 / 68

slide-90
SLIDE 90

Gibbs Classifier Bayes Classifier

The Gibbs classifier predicts the output by first sampling w ∼ Q(w), then returning

t∗ = sgn y(x∗|w)    (8)

Variational Inference for GPs October 17, 2017 63 / 68

slide-91
SLIDE 91

Gibbs Classifier Bayes Classifier

The Gibbs classifier predicts the output by first sampling w ∼ Q(w), then returning

t∗ = sgn y(x∗|w)    (8)

The Bayes classifier predicts the output by integrating over the distribution of w:

t∗ = sgn E_{w∼Q}[ y(x∗|w) ]    (9)

Variational Inference for GPs October 17, 2017 63 / 68

slide-92
SLIDE 92

Gibbs Classifier Bayes Classifier

The Gibbs classifier predicts the output by first sampling w ∼ Q(w), then returning

t∗ = sgn y(x∗|w)    (8)

The Bayes classifier predicts the output by integrating over the distribution of w:

t∗ = sgn E_{w∼Q}[ y(x∗|w) ]    (9)

The Bayes voting classifier predicts the output by integrating over the distribution of w and voting:

t∗ = sgn E_{w∼Q}[ sgn y(x∗|w) ]    (10)

Variational Inference for GPs October 17, 2017 63 / 68
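The three predictors are easy to contrast numerically. The sketch below (an added illustration; the linear latent function y(x|w) = w^T x and the Gaussian Q(w) are assumptions) computes (8)-(10) by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)
w_mean = np.array([1.0, -0.5])                  # mean of the posterior Q(w)
w_cov = 0.5 * np.eye(2)                         # covariance of Q(w)
x_star = np.array([0.2, 0.9])                   # test input

w_samples = rng.multivariate_normal(w_mean, w_cov, size=10_000)
y_samples = w_samples @ x_star                  # y(x*|w) = w^T x* for each sample

t_gibbs = np.sign(y_samples[0])                 # (8): one sampled w
t_bayes = np.sign(y_samples.mean())             # (9): sgn E_w[y(x*|w)] (Monte Carlo)
t_vote = np.sign(np.sign(y_samples).mean())     # (10): sgn E_w[sgn y(x*|w)]
print(t_gibbs, t_bayes, t_vote)
```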

slide-93
SLIDE 93

PAC-Bayesian theorem (McAllester, 1999)

For any data distribution over X × {−1, +1} and an arbitrary posterior distribution Q(w), the following bound holds, where the probability is over random i.i.d. samples S = {(xi^S, ti^S) | i = 1, ..., n} of size n drawn from the true data distribution:

p[ gen(Q) > emp(S, Q) + f(KL[Q||P], n, δ, emp(S, Q)) ] ≤ δ    (11)

4 PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification

Variational Inference for GPs October 17, 2017 64 / 68

slide-94
SLIDE 94

PAC-Bayesian theorem (McAllester, 1999)

For any data distribution over X × {−1, +1} and an arbitrary posterior distribution Q(w), the following bound holds, where the probability is over random i.i.d. samples S = {(xi^S, ti^S) | i = 1, ..., n} of size n drawn from the true data distribution:

p[ gen(Q) > emp(S, Q) + f(KL[Q||P], n, δ, emp(S, Q)) ] ≤ δ    (11)

4 PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification

Variational Inference for GPs October 17, 2017 64 / 68

slide-95
SLIDE 95

Variational Inference and PAC-Bayesian

Recall variational inference, which maximizes the Evidence Lower Bound (ELBO) to optimize the variational posterior:

ELBO = E_{Q(z|x)}[ log p(x|z) ] − KL[ Q(z|x) || P(z) ]    (12)

With the reconstruction term, the empirical loss tends to be small. With the regularization term, the KL divergence cannot be very big. Therefore, VI tends to generalize well.

Variational Inference for GPs October 17, 2017 65 / 68

slide-96
SLIDE 96

Gaussian Process Classification

Given a dataset S = {(xi, ti)}_{i=1}^{N}, a Gaussian process models the dependency as

p(y) = N(0, K),    p(t|y) = ∏_i p(ti|yi)    (13)

According to Bayes' rule, the true posterior is

p(y|S) ∝ p(t|y) p(y)    (14)

Unlike in GP regression, the classification posterior is intractable, which motivates using a variational posterior q(y) (a Laplace approximation, for example):

q(y) = N(Kα, Σ)    (15)

Variational Inference for GPs October 17, 2017 66 / 68

slide-97
SLIDE 97

Gaussian Process Classification

Given q(y|S), how do we make predictions for a test point x⋆? Let k = k(x⋆, X) and k⋆ = k(x⋆, x⋆). Then

p(y⋆, y|S) = p(y⋆|y) q(y|S),    p(y⋆|y) = N( k^T K^{−1} y,  k⋆ − k^T K^{−1} k )    (16)

Using the conditional Gaussian distribution, we obtain the predictive distribution

q(y⋆|S) = N( k^T α,  k⋆ − k^T (K^{−1} − K^{−1} Σ K^{−1}) k )    (17)

The KL divergence between q(y) and p(y) gives a PAC bound for GP binary classification.

Variational Inference for GPs October 17, 2017 67 / 68
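Below is a small sketch of the predictive equations (16)-(17) (an added illustration; α and Σ are stand-in values rather than the output of an actual Laplace or variational fit).

```python
import numpy as np

def kern(A, B, theta=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / theta ** 2)

rng = np.random.default_rng(0)
N = 20
X = rng.uniform(0, 5, size=(N, 1))
K = kern(X, X) + 1e-8 * np.eye(N)
alpha = 0.1 * rng.standard_normal(N)          # stand-in for the fitted alpha
Sigma = 0.5 * K                               # stand-in for the posterior covariance

x_star = np.array([[2.5]])
k = kern(X, x_star)[:, 0]                     # k(x*, X)
k_star = kern(x_star, x_star)[0, 0]           # k(x*, x*)

Kinv_k = np.linalg.solve(K, k)
mean_star = k @ alpha                                       # k^T alpha
var_star = k_star - k @ Kinv_k + Kinv_k @ Sigma @ Kinv_k    # equation (17)
print(mean_star, var_star)
```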

slide-98
SLIDE 98

What can PAC do?

Model selection. (The bound is not exact, so use it only as a reference.)

Variational Inference for GPs October 17, 2017 68 / 68