SLIDE 1


On the Properties of Variational Approximations in Statistical Learning.

Pierre Alquier UCD Dublin - Statistics Seminar - 29/10/15

Pierre Alquier Properties of Variational Approximations

slide-2
SLIDE 2


Learning vs. estimation

In many applications one would like to learn from a sample without being able to write the likelihood.


SLIDE 12


Typical machine learning problem

Main ingredients :

  • observations (object, label): (X1, Y1), (X2, Y2), ...
    → given either once and for all (batch learning), one at a time (online learning), upon request... In this talk, (X1, Y1), ..., (Xn, Yn) are i.i.d.
  • a restricted set of predictors (fθ, θ ∈ Θ).
    → fθ(X) is meant to predict Y.
  • a criterion of success, R(θ):
    → for example R(θ) = P(fθ(X) ≠ Y) (classification error). In this talk R(θ) = E[ℓ(Y, fθ(X))]. We want to minimize R(θ), but note that it is unknown in practice.
  • an empirical proxy r(θ) for this criterion of success:
    → here r(θ) = (1/n) Σ_{i=1}^n ℓ(Yi, fθ(Xi)).
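The empirical proxy r(θ) is straightforward to compute. A minimal Python sketch (mine, not from the talk) for the 0/1 loss with a linear classifier; all names (`empirical_risk`, `zero_one`, `theta_true`) are illustrative:

```python
import numpy as np

def empirical_risk(theta, X, Y, loss):
    """r(theta) = (1/n) * sum_i loss(Y_i, f_theta(X_i)),
    here with the linear classifier f_theta(x) = 1(<theta, x> >= 0)."""
    preds = (X @ theta >= 0).astype(int)
    return np.mean([loss(y, p) for y, p in zip(Y, preds)])

# 0/1 loss: classification error
zero_one = lambda y, p: float(y != p)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
theta_true = np.array([1.0, -2.0, 0.5])
Y = (X @ theta_true >= 0).astype(int)  # labels generated by theta_true itself

print(empirical_risk(theta_true, X, Y, zero_one))  # 0.0: perfect separation by construction
```

R(θ) is the same quantity under the unknown distribution P; r(θ) only uses the sample.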

SLIDE 14


Empirical risk minimization (ERM)

θ̂n = arg min_{θ∈Θ} r(θ).

Theorem (Vapnik and Chervonenkis, in the 70's)

Vapnik, V. (1998). Statistical Learning Theory. Springer.

Classification setting. Let dΘ denote the VC-dimension of Θ. Then

P( R(θ̂n) ≤ inf_{θ∈Θ} R(θ) + 4 √[(dΘ log(n + 1) + log 2) / n] + √[log(2/ε) / (2n)] ) ≥ 1 − ε.

SLIDE 18


ERM with linear classifiers

Table: Linear classifiers in R^p: dΘ = p + 1. Source: http://mlpy.sourceforge.net/

Here dΘ = 3, n = 500. With probability at least 90%,

R(θ̂n) ≤ inf_{θ∈Θ} R(θ) + 0.842.

With n = 5000 we would have

R(θ̂n) ≤ inf_{θ∈Θ} R(θ) + 0.301.
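These numbers follow by plugging dΘ, n and ε into the VC bound. A quick check (my sketch; the exact constants 4 √[(dΘ log(n+1) + log 2)/n] + √[log(2/ε)/(2n)] are an assumption about the slide's formula, so the last digits may differ from the slide's rounding):

```python
import math

def vc_bound(d_theta, n, eps):
    """Excess-risk term of the reconstructed VC theorem:
    4*sqrt((d*log(n+1) + log 2)/n) + sqrt(log(2/eps)/(2n))."""
    return (4 * math.sqrt((d_theta * math.log(n + 1) + math.log(2)) / n)
            + math.sqrt(math.log(2 / eps) / (2 * n)))

print(round(vc_bound(3, 500, 0.10), 3))   # ~0.84 for the slide's example
print(round(vc_bound(3, 5000, 0.10), 3))  # ~0.31: shrinks roughly like 1/sqrt(n)
```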

SLIDE 19


The PAC-Bayesian approach: origins

Idea: combine these tools with a prior π on Θ.

Shawe-Taylor, J. & Williamson, R. C. (1997). A PAC Analysis of a Bayesian Estimator. COLT'97.

McAllester, D. A. (1998). Some PAC-Bayesian Theorems. COLT'98.

"A PAC performance guarantee theorem applies to a broad class of experimental settings. A Bayesian correctness theorem applies to only experimental settings consistent with the prior used in the algorithm. However, in this restricted class of settings the Bayesian learning algorithm can be optimal and will generally outperform PAC learning algorithms. (...) The PAC-Bayesian theorems and algorithms (...) attempt to get the best of both PAC and Bayesian approaches by combining the ability to be tuned with an informal prior with PAC guarantees that hold in all i.i.d experimental settings."

SLIDE 21


The PAC-Bayesian approach

EWA / pseudo-posterior / Gibbs estimator / ...

ρ̂λ(dθ) ∝ exp[−λ r(θ)] π(dθ).

Theorem (for a bounded loss ℓ ≤ B)

Catoni, O. (2007). PAC-Bayesian Supervised Classification (The Thermodynamics of Statistical Learning), volume 56 of Lecture Notes-Monograph Series, IMS.

∀λ > 0, P( ∫ R dρ̂λ ≤ inf_ρ [ ∫ R dρ + λB²/n + (2 K(ρ, π) + 2 log(2/ε)) / λ ] ) ≥ 1 − ε.

SLIDE 22


Another point of view

Bissiri, P., Holmes, C. and Walker, S. (2013). Fast learning Rates in Statistical Inference through Aggregation. Preprint.

Provides a decision-theoretic reason to use

ρ̂λ(dθ) ∝ exp[−λ r(θ)] π(dθ)

instead of

π(dθ | (X1, Y1), ..., (Xn, Yn)) ∝ L(θ) π(dθ).

  • The likelihood L(θ) might be too complicated, or not even available;
  • we might think it safer to replace it by a robust loss function (Huber...).

SLIDE 25


Bibliographical remarks

  • PAC-Bayesian bounds: many authors, including Langford, Seeger, Meir, Cesa-Bianchi, Li, Jiang, Tanner, Laviolette (sorry for not being exhaustive; see the papers for more references!).
  • Related to other works on aggregation: Barron, Vovk, Rissanen, Abramovitch, Nemirovski, Yang, Zhang, Rigollet, Lecué, Bellec, Suzuki...
  • Related work on misspecification in Bayesian statistics: the "safe Bayes rule" of

Grünwald, P. D. & van Ommen, T. (2013). Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It. Preprint.

SLIDE 28

Reminder: pseudo-posterior

ρ̂λ(dθ) ∝ exp[−λ r(θ)] π(dθ).

Depending on the setting, we have to:
  • sample from ρ̂λ;
  • compute ∫ θ ρ̂λ(dθ).

How to do it?
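One generic way to approximate both operations, at least in small dimension, is self-normalized importance sampling from the prior: draw θ_j ~ π and weight by exp(−λ r(θ_j)). A toy sketch (mine, not from the talk; the risk `r` and all names are illustrative):

```python
import numpy as np

def gibbs_posterior_mean(r, lam, prior_sample):
    """Self-normalized importance sampling estimate of int theta rho_hat_lam(dtheta):
    weight each prior draw theta_j by exp(-lam * r(theta_j)), then normalize."""
    risks = np.array([r(th) for th in prior_sample])
    logw = -lam * risks
    w = np.exp(logw - logw.max())   # subtract the max for numerical stability
    w /= w.sum()
    return w @ prior_sample

rng = np.random.default_rng(1)
prior_sample = rng.normal(size=(5000, 2))                   # pi = N(0, I) on R^2
r = lambda th: np.sum((th - np.array([1.0, -1.0])) ** 2)    # toy empirical risk

print(gibbs_posterior_mean(r, 0.0, prior_sample))  # lam = 0: just the prior mean, ~ (0, 0)
print(gibbs_posterior_mean(r, 5.0, prior_sample))  # pulled toward the risk minimizer (1, -1)
```

This degrades quickly with dimension (weight degeneracy), which is one reason MCMC and variational methods are discussed next.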

SLIDE 31

A natural idea: MCMC methods for PAC-Bayes

Langevin Monte Carlo:

Dalalyan, A. and Tsybakov, A. (2011). Sparse regression learning by aggregation and Langevin Monte-Carlo. Journal of Computer and System Sciences.

Markov chain Monte Carlo:

Alquier, P. & Biau, G. (2013). Sparse Single-Index Model. Journal of Machine Learning Research.

Guedj, B. & Alquier, P. (2013). PAC-Bayesian Estimation and Prevision in Sparse Additive Models. Electronic Journal of Statistics.

However: it is usually not possible to provide guarantees after a finite number of steps. See however

Dalalyan, A. (2014). Theoretical Guarantees for Approximate Sampling from a Smooth and Log-Concave Density. Preprint.

SLIDE 33


Variational Bayes methods

Idea from Bayesian statistics: approximate the posterior distribution π(θ|x). We fix a convenient family of probability distributions F and approximate the posterior by

π̃ = arg min_{ρ∈F} K(ρ, π(·|x)).

Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer.

F is either parametric or non-parametric. In the parametric case, F = {ρa, a ∈ R^d} and the problem boils down to an optimization problem:

min_{a∈R^d} K(ρa, π(·|x)).

SLIDE 34


Example: Gaussian approximation

Table: The true posterior and the best Gaussian approximation.

SLIDE 36


VB in PAC-Bayesian framework

ρ̂λ(dθ) ∝ exp[−λ r(θ)] π(dθ). Then:

K(ρa, ρ̂λ) = ∫ log[ (dρa/dπ) (dπ/dρ̂λ) ] dρa
           = λ ∫ r(θ) ρa(dθ) + K(ρa, π) + log ∫ exp[−λ r] dπ.

We put

ãλ = arg min_{a∈A} { λ ∫ r(θ) ρa(dθ) + K(ρa, π) }    and    ρ̃λ = ρ_{ãλ}.
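For a Gaussian family and a Gaussian prior, K(ρa, π) is available in closed form, so the minimization above is a finite-dimensional optimization problem. A sketch (my code; `kl_gauss`, `vb_objective` and the toy risk are hypothetical names, not from the paper):

```python
import numpy as np

def kl_gauss(mu, Sigma, vartheta, d):
    """K( N(mu, Sigma) , N(0, vartheta*I) ) in closed form."""
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * (np.trace(Sigma) / vartheta + mu @ mu / vartheta
                  - d + d * np.log(vartheta) - logdet)

def vb_objective(mu, Sigma, lam, expected_risk, vartheta):
    """lam * int r d(rho_a) + K(rho_a, pi): minimizing this over (mu, Sigma)
    is equivalent to minimizing K(rho_a, rho_hat_lam), since the log-partition
    term does not depend on a."""
    return lam * expected_risk(mu, Sigma) + kl_gauss(mu, Sigma, vartheta, len(mu))

d, vartheta = 2, 0.5
mu0, Sigma0 = np.zeros(d), vartheta * np.eye(d)
print(kl_gauss(mu0, Sigma0, vartheta, d))  # 0.0: the prior itself

toy_risk = lambda mu, Sigma: float(mu @ mu)  # stand-in for int r d(rho_a)
print(vb_objective(np.ones(d), Sigma0, 1.0, toy_risk, vartheta))  # 4.0
```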

SLIDE 38


A PAC-Bound for VB Approximation

Theorem

Alquier, P., Ridgway, J. & Chopin, N. (2015). On the Properties of Variational Approximations of Gibbs Posteriors. Preprint.

∀λ > 0, P( ∫ R dρ̃λ ≤ inf_{a∈A} [ ∫ R dρa + λB²/n + (2 K(ρa, π) + 2 log(2/ε)) / λ ] ) ≥ 1 − ε.

⇒ if the infimum on the right-hand side is small enough, the VB approximation comes "at no cost".

SLIDE 45


Application to a linear classification problem

  • (X1, Y1), ..., (Xn, Yn) i.i.d. from P.
  • fθ(x) = 1(⟨θ, x⟩ ≥ 0), x, θ ∈ R^d.
  • R(θ) = P[Y ≠ fθ(X)].
  • rn(θ) = (1/n) Σ_{i=1}^n 1[Yi ≠ fθ(Xi)].
  • Gaussian prior π = N(0, ϑ I).
  • Gaussian approximation of the posterior: F = { N(µ, Σ) : µ ∈ R^d, Σ symmetric positive definite }.

Optimization criterion:

Fλ(µ, Σ) = (λ/n) Σ_{i=1}^n Φ( −Yi ⟨Xi, µ⟩ / √(⟨Xi, Σ Xi⟩) ) + ‖µ‖² / (2ϑ) + (1/2) [ (1/ϑ) tr(Σ) − log |Σ| ].
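The criterion Fλ can be evaluated exactly, since under θ ~ N(µ, Σ) the probability of misclassifying (x, y) is Φ(−y⟨x, µ⟩/√(xᵀΣx)). A minimal sketch (mine, assuming labels in {−1, +1}; `F_lambda` and the data are illustrative):

```python
import numpy as np
from math import erf, sqrt

# standard normal cdf, vectorized by hand (avoids a scipy dependency)
Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0))))

def F_lambda(mu, Sigma, X, Y, lam, vartheta):
    """(lam/n) sum_i Phi(-Y_i <X_i, mu> / sqrt(X_i' Sigma X_i))
       + ||mu||^2/(2*vartheta) + (1/2)(tr(Sigma)/vartheta - log|Sigma|)."""
    scale = np.sqrt(np.einsum('ij,jk,ik->i', X, Sigma, X))  # X_i' Sigma X_i per row
    risk = Phi(-Y * (X @ mu) / scale).mean()
    _, logdet = np.linalg.slogdet(Sigma)
    penalty = mu @ mu / (2 * vartheta) + 0.5 * (np.trace(Sigma) / vartheta - logdet)
    return lam * risk + penalty

rng = np.random.default_rng(5)
X = rng.normal(size=(10, 3))
Y = rng.choice([-1.0, 1.0], size=10)
# at mu = 0, Sigma = I, vartheta = 1: each Phi term is 1/2, penalty is d/2
print(F_lambda(np.zeros(3), np.eye(3), X, Y, lam=2.0, vartheta=1.0))  # 2.5
```

In practice one would feed this (and its gradients) to an optimizer over (µ, Σ), as in the annealing algorithm below on the slides.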

SLIDE 47


Application of the main theorem

Corollary

Assume that, for ‖θ‖ = ‖θ′‖ = 1, P(⟨θ, X⟩ ⟨θ′, X⟩ < 0) ≤ c ‖θ − θ′‖, and take λ = √(nd) and ϑ = 1/√d. Then

P( ∫ R dρ̃λ ≤ inf_θ R(θ) + √(d/n) [log(4ne²) + c] + 2 log(2/ε) / √(nd) ) ≥ 1 − ε.

N.B.: under a margin assumption, it is possible to obtain d/n rates...

SLIDE 48


Implementation: deterministic annealing

Algorithm 1: Deterministic annealing

Input: (λt)t∈[0,T], a sequence of temperatures.

Init: set µ = 0 and Σ = ϑ Id, the values minimizing the KL divergence for λ = 0.

Loop t = 1, ..., T:
  a. (µλt, Σλt) = minimize Fλt(µ, Σ) using some local optimization routine (gradient descent), with initial points (µλt−1, Σλt−1).
  b. Break if the empirical bound increases.
End Loop
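The annealing loop can be sketched on a one-dimensional toy problem; everything below (the data, the quadratic risk, the step sizes, the bound proxy) is illustrative and not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(loc=1.5, scale=0.2, size=100)  # toy data
vartheta = 1.0

def risk(mu):                       # toy empirical risk r(mu)
    return np.mean((mu - z) ** 2)

def grad_F(mu, lam):                # gradient of F_lam(mu) = lam*r(mu) + mu^2/(2*vartheta)
    return lam * 2.0 * (mu - z.mean()) + mu / vartheta

def annealing(lambdas, lr=0.01, steps=500):
    mu, bounds = 0.0, []            # init at the lam = 0 minimizer (the prior mean)
    for lam in lambdas:
        for _ in range(steps):      # step (a): local routine, warm-started at previous mu
            mu -= lr * grad_F(mu, lam)
        b = risk(mu) + (mu ** 2 / (2 * vartheta)) / lam  # proxy for the empirical bound
        if bounds and b > bounds[-1]:
            break                   # step (b): stop when the bound increases
        bounds.append(b)
    return mu, bounds

mu_hat, bounds = annealing([1, 2, 4, 8, 16, 32])
print(mu_hat)  # close to the data mean (~1.5)
```

Warm-starting each minimization at the previous temperature's optimum is the whole point: for small λ the objective is nearly convex (dominated by the KL term), and the solution is tracked as λ grows.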

SLIDE 49


Test on real data

Dataset   Covariates   VB     SMC    SVM
Pima      7            21.3   22.3   30.4
Credit    60           33.6   32.0   32.0
DNA       180          23.6   23.6   20.4
SPECTF    22           6.9    8.5    10.1
Glass     10           19.6   23.3   4.7
Indian    11           25.5   26.2   26.8
Breast    10           1.1    1.1    1.7

Table: Comparison of misclassification rates (%). Last column: kernel SVM with radial kernel. The hyper-parameters λ and ϑ are chosen by cross-validation.

SLIDE 52


Convexification of the loss

We can replace the 0/1 loss by a convex surrogate at "no" cost:

Zhang, T. (2004). Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics.

  • R(θ) = E[(1 − Y fθ(X))₊] (hinge loss).
  • rn(θ) = (1/n) Σ_{i=1}^n (1 − Yi fθ(Xi))₊.
  • Gaussian approximation: F = { N(µ, σ² I) : µ ∈ R^d, σ > 0 }.

This leads to the following criterion (which turns out to be convex!):

(1/n) Σ_{i=1}^n (1 − Yi ⟨µ, Xi⟩) Φ( (1 − Yi ⟨µ, Xi⟩) / (σ ‖Xi‖₂) )
  + (1/n) Σ_{i=1}^n σ ‖Xi‖₂ ϕ( (1 − Yi ⟨µ, Xi⟩) / (σ ‖Xi‖₂) )
  + ‖µ‖₂² / (2ϑ) + (d/2) [ σ²/ϑ − log σ² ].
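The closed form behind this criterion is the Gaussian expectation of a positive part: for Z ~ N(m, s²), E[Z₊] = m Φ(m/s) + s ϕ(m/s), applied with m = 1 − y⟨µ, x⟩ and s = σ‖x‖. A sketch checking it against Monte Carlo (my code, toy values):

```python
import numpy as np
from math import erf, sqrt, pi, exp

Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal cdf
phi = lambda z: exp(-z * z / 2.0) / sqrt(2.0 * pi) # standard normal pdf

def expected_hinge(mu, sigma, x, y):
    """E_{theta ~ N(mu, sigma^2 I)} [(1 - y <theta, x>)_+] in closed form:
    for Z ~ N(m, s^2), E[Z_+] = m*Phi(m/s) + s*phi(m/s)."""
    m = 1.0 - y * (x @ mu)
    s = sigma * np.linalg.norm(x)
    return m * Phi(m / s) + s * phi(m / s)

rng = np.random.default_rng(3)
mu, sigma = np.array([0.5, -0.3]), 0.7
x, y = np.array([1.0, 2.0]), 1.0

# Monte Carlo over draws of theta ~ N(mu, sigma^2 I)
thetas = mu + sigma * rng.normal(size=(200000, 2))
mc = np.maximum(1.0 - y * (thetas @ x), 0.0).mean()
print(expected_hinge(mu, sigma, x, y), mc)  # the two estimates agree
```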

SLIDE 53


Application of the main theorem

Optimization with stochastic gradient descent on a ball of radius M. On this ball, the objective function is L-Lipschitz. After k steps, we obtain an approximation ρ̃λ^(k) of the posterior.

Corollary

Assume ‖X‖ ≤ cx a.s., and take λ = √(nd) and ϑ = 1/√d. Then

P( ∫ R dρ̃λ^(k) ≤ inf_θ R(θ) + LM/√(1 + k) + (cx/2) √(d/n) log(n/d) + [ (cx² + 1)/(2 cx) + 2 cx log(2/ε) ] / √(nd) ) ≥ 1 − ε.
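Projected stochastic gradient descent on the ball of radius M can be sketched as follows. This is a generic illustration on a toy quadratic objective; the step-size choice M/(L√t) and the averaging of iterates are one standard option, not necessarily the paper's exact scheme:

```python
import numpy as np

def projected_sgd(grad_i, n, dim, M, L, k, seed=0):
    """SGD on {||mu|| <= M}: pick a random data index, take a gradient step of
    size M/(L*sqrt(t)), project back on the ball; return the iterate average
    (the classical O(1/sqrt(k)) guarantee holds for the averaged iterate)."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)
    avg = np.zeros(dim)
    for t in range(1, k + 1):
        i = rng.integers(n)
        mu = mu - (M / (L * np.sqrt(t))) * grad_i(mu, i)
        norm = np.linalg.norm(mu)
        if norm > M:                 # projection onto the ball of radius M
            mu *= M / norm
        avg += (mu - avg) / t        # running average of iterates
    return avg

# toy convex objective: f(mu) = (1/n) sum_i ||mu - a_i||^2
rng = np.random.default_rng(4)
a = rng.normal(size=(50, 3))
grad_i = lambda mu, i: 2.0 * (mu - a[i])

mu_bar = projected_sgd(grad_i, n=50, dim=3, M=5.0, L=10.0, k=20000)
print(mu_bar)  # close to the mean of the a_i (the unconstrained minimizer)
```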

SLIDE 54


(One more) test on real data

Dataset   Convex VB   VB     SMC    SVM
Pima      21.8        21.3   22.3   30.4
Credit    27.2        33.6   32.0   32.0
DNA       4.2         23.6   23.6   20.4
SPECTF    19.2        6.9    8.5    10.1
Glass     26.1        19.6   23.3   4.7
Indian    26.2        25.5   26.2   26.8
Breast    0.5         1.1    1.1    1.7

Table: Comparison of misclassification rates (%), including the convexified version of VB.

SLIDE 55


Convergence graphs

[Two panels: the empirical bound (95%) as a function of the number of iterations.]

Figure: Stochastic gradient descent, Pima and Adult datasets.
