

SLIDE 1

A Tutorial on Bayesian Nonparametrics

Fatima Al-Raisi

Carnegie Mellon University fraisi@cs.cmu.edu

October 25, 2016

Fatima Al-Raisi (Carnegie Mellon University) A Tutorial on Bayesian Nonparametrics October 25, 2016 1 / 45

SLIDE 2

1. Introduction

2. Bayesian Nonparametrics: Motivation; Intuitions and Assumptions; Theoretical Motivation; Practical Motivation

3. Dirichlet Process

4. Chinese Restaurant Process; Pitman-Yor Process

5. Discussion and Concluding Remarks

6. List of Tutorials

SLIDE 3

Development of Interest in Topic Over Time

An interesting “interest over time” pattern!

SLIDE 4

Interest Over Time: Deep Learning

SLIDE 5

Interest Over Time: Reinforcement Learning

SLIDE 6

Interest Over Time: Nonparametric Statistics!

SLIDE 7

Interest Over Time: Bayesian Inference!

SLIDE 8

Terminology

What does “Bayesian Nonparametrics” mean?

Bayesian inference: data and parameters, priors and posteriors:
P(parameters | data) ∝ P(parameters) P(data | parameters)
Bayesian inference vs. Bayes' rule (Bayesian inference does not mean simply using Bayes' rule!)
"Non-parametric" (a misnomer): a large or unbounded number of parameters, a growing number of parameters, an infinite parameter space; "the number of parameters grows with the amount of training data"
No (strong) assumption about the underlying distribution of the data
Terminology note: "non-parametric" vs. "nonparametric"
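The proportionality above can be made concrete on a small discrete grid of parameter values. A minimal sketch (the coin-flip model, the grid, and the counts are illustrative assumptions, not from the slides):

```python
# Posterior ∝ prior × likelihood, evaluated on a discrete grid of parameters.
# Illustrative setup: a coin with unknown heads-probability theta.
thetas = [0.1 * k for k in range(1, 10)]          # candidate parameter values
prior = {t: 1.0 / len(thetas) for t in thetas}    # uniform prior P(theta)

def likelihood(theta, heads, tails):
    """P(data | theta) for a sequence with the given counts."""
    return theta ** heads * (1.0 - theta) ** tails

# Observe 7 heads and 3 tails; form the unnormalized posterior and normalize.
unnorm = {t: prior[t] * likelihood(t, 7, 3) for t in thetas}
z = sum(unnorm.values())                          # P(data), the evidence
posterior = {t: p / z for t, p in unnorm.items()}

best = max(posterior, key=posterior.get)
print(round(best, 1))  # posterior mode; mass concentrates near 0.7
```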

SLIDE 9

Terminology

Formal Definition

A statistical model is a collection of distributions {Pθ : θ ∈ Θ} indexed by a parameter θ.
Parametric model: the indexing parameter is a finite-dimensional vector: Θ ⊂ Rk.
Nonparametric model: Θ ⊂ F for some possibly infinite-dimensional space F.
Semiparametric model: the parameter has both a finite-dimensional component and an infinite-dimensional component: Θ ⊂ Rk × F, where F is an infinite-dimensional space.

SLIDE 10

Review

Probabilistic Modeling

Data: x1, x2, . . . , xn
Latent variables: z1, z2, . . . , zn
Parameter: θ
A probabilistic model is a parametrized joint distribution over variables, P(x1, . . . , xn, z1, . . . , zn | θ), typically interpreted as a generative model of the data.
Inference of latent variables given observed data:
P(z1, . . . , zn | x1, . . . , xn, θ) = P(x1, . . . , xn, z1, . . . , zn | θ) / P(x1, . . . , xn | θ)

SLIDE 11

Review

Probabilistic Modeling

Learning (e.g., by maximum likelihood): θ̂ = argmax_θ P(x1, x2, . . . , xn | θ)
Prediction: P(xn+1 | x1, x2, . . . , xn, θ)
Classification: argmax_c P(xn+1 | θc)
Standard algorithms: EM, VI, MCMC, etc.
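A minimal sketch of the learning and prediction steps for a Bernoulli model (the data and the grid are illustrative assumptions):

```python
from math import log

x = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # hypothetical observations x_1..x_n

def log_lik(theta, xs):
    """log P(x_1..n | theta) under a Bernoulli(theta) model."""
    return sum(log(theta) if xi else log(1 - theta) for xi in xs)

# Learning: theta-hat = argmax_theta P(x_1..n | theta), here by grid search...
grid = [k / 100 for k in range(1, 100)]
theta_hat = max(grid, key=lambda t: log_lik(t, x))
assert theta_hat == sum(x) / len(x)  # ...matching the closed-form Bernoulli MLE

# Prediction plugs the estimate back in: P(x_{n+1} = 1 | x_1..n, theta-hat)
print(theta_hat)  # 0.7
```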

SLIDE 12

Review

Bayesian Modeling

Prior distribution: P(θ)
Posterior distribution:
P(z1, . . . , zn, θ | x1, . . . , xn) = P(x1, . . . , xn, z1, . . . , zn | θ) P(θ) / P(x1, . . . , xn)
The above performs both inference and learning.

SLIDE 13

Clustering

Parametric Approach

Think of the data as generated from a number of sources.
Model each cluster using a parametric model.
A data item i is drawn as follows:
zi | π ∼ Discrete(π)
xi | zi, θ⋆ ∼ F(θ⋆_zi), where F is a parametric model (e.g., a Gaussian with parameter vector θ = (µ, σ))
Mixing proportions: π = (π1, . . . , πk) | α ∼ Dirichlet(α/k, . . . , α/k)
More on the Dirichlet distribution later.
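The generative process above can be sketched directly in NumPy; the values k = 3, the cluster means, and the standard deviations below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, alpha = 3, 500, 1.0

# pi | alpha ~ Dirichlet(alpha/k, ..., alpha/k)   (mixing proportions)
pi = rng.dirichlet(np.full(k, alpha / k))
# cluster parameters theta*_k = (mu_k, sigma_k); the values are illustrative
mus, sigmas = np.array([-5.0, 0.0, 5.0]), np.array([1.0, 1.0, 1.0])

# z_i | pi ~ Discrete(pi);  x_i | z_i ~ F(theta*_{z_i}) = N(mu_{z_i}, sigma_{z_i})
z = rng.choice(k, size=n, p=pi)
x = rng.normal(mus[z], sigmas[z])
print(x.shape, np.bincount(z, minlength=k))  # cluster sizes follow pi
```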

SLIDE 14

Motivation

Question: What is the number of sources?

SLIDE 15

Motivation

Question: What is the number of sources? Is it 5?

SLIDE 16

Motivation

Question: What is the number of sources? Or maybe 3?

SLIDE 17

Motivation

Question: What is the number of sources? In practice an ad-hoc approach is followed to decide k. For example:
• guess the number of clusters, run EM for a Gaussian mixture model, inspect the results and goodness of fit, and if needed try again with a different k
• run hierarchical agglomerative clustering, and cut the tree at a “reasonable looking” level
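The guess-and-check loop can be sketched as follows: fit a small 1-D Gaussian mixture by EM for several values of k and score each fit with BIC. The dataset, the quantile initialization, and the BIC criterion are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic 1-D data drawn from three well-separated sources
x = np.concatenate([rng.normal(-4, 1, 100), rng.normal(0, 1, 100),
                    rng.normal(5, 1, 100)])

def gmm_loglik(x, k, iters=200):
    """Tiny EM for a 1-D Gaussian mixture; returns the final log-likelihood."""
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread the initial means
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2 * np.pi * var)           # (n, k) component densities
        r = dens / dens.sum(axis=1, keepdims=True)  # E-step: responsibilities
        nk = r.sum(axis=0)                          # M-step: update parameters
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / len(x)
    dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
           / np.sqrt(2 * np.pi * var)
    return np.log(dens.sum(axis=1)).sum()

# BIC = -2 log L + (#free params) log n; a 1-D k-mixture has 3k - 1 free params
bics = {k: -2 * gmm_loglik(x, k) + (3 * k - 1) * np.log(len(x))
        for k in range(1, 6)}
print(min(bics, key=bics.get))  # the data were drawn from 3 sources
```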

SLIDE 18

Motivation

Question: What is the number of sources? In practice an ad-hoc approach is followed to decide on k. But we want a principled approach for discovering k. After all, it is an essential part of the problem to be solved!

SLIDE 19

Motivation

Intuitive and Theoretical Motivation

Natural phenomena:
◮ Topics: (Wikipedia) dynamic traversal
◮ Clustering
◮ Species discovery
◮ Annotation and labeling
◮ Knowledge-base entity types
◮ . . .
For any fixed k, as we see more data, there is a positive probability that we will encounter a data point that does not fit in the current scheme; i.e., k grows with the data.

SLIDE 20

Motivation

Theoretical Motivation: De Finetti’s Theorem

Infinite Exchangeability
A data sequence is infinitely exchangeable if the distribution of any N data points does not change under permutation:
p(X1, . . . , XN) = p(Xσ(1), . . . , Xσ(N))

Theorem (De Finetti’s Theorem)

A sequence X1, X2, . . . is infinitely exchangeable if and only if, for all N and some distribution P:
p(X1, . . . , XN) = ∫Θ ∏_{n=1}^{N} p(Xn | θ) P(dθ)

SLIDE 21

Motivation

Theoretical Motivation

De Finetti’s Theorem; general proof: Hewitt & Savage 1955; Aldous 1983

Theorem (De Finetti’s Theorem)

A sequence X1, X2, . . . is infinitely exchangeable if and only if, for all N and some distribution P:
p(X1, . . . , XN) = ∫Θ ∏_{n=1}^{N} p(Xn | θ) P(dθ)

Motivates: parameters, likelihood, priors, and non-parametric Bayesian priors.

SLIDE 22

Motivation

Theoretical Motivation

What happens under the parametric regime?

SLIDE 23

Motivation

Theoretical Motivation

What happens under the parametric regime? Let’s take the example of regression

SLIDE 24

Motivation

Theoretical Motivation

What happens under the parametric regime? When fitting/optimizing, we’re finding the best fit within the chosen (parametric) family of functions; i.e., we’re optimizing to get the closest approximation to the true target function.

SLIDE 25

Motivation

Theoretical Motivation

What happens under the parametric regime? When fitting, we’re finding the best fit within the chosen (parametric) family of functions; i.e., we’re optimizing to get the closest approximation to the true target function.

But this may not be good enough.

SLIDE 26

Motivation

Theoretical Motivation: Non-parametric Bayesian Approach

SLIDE 27

Motivation

Practical Problem-solving Motivation

Human intuitions about high-dimensional problems are often misleading! Example: a recent result from random matrix theory proves the proliferation of saddle points in comparison to local minima in high-dimensional problems [Dauphin et al. 2015]. Assumptions often made when attempting to solve different problems are naturally part of the problem to be solved, e.g.:

SLIDE 28

Motivation

Practical Problem-solving Motivation

Assumptions often made when attempting to solve different problems with data are naturally part of the problem to be solved, e.g.:
• the number of clusters in clustering
• the “type” or class of function in regression
• the number of factors in factor analysis
• . . .
The Bayesian non-parametric approach:
• makes no unreasonable assumptions about the data (i.e., that the true model of a complex phenomenon is governed by a small number of parameters)
• provides a model that can adapt its complexity to the data: let the data determine model complexity naturally
• requires no fitting or model selection → no underfitting or overfitting → no regularization required

SLIDE 29

Motivation

Practical Problem-solving Motivation

Learning structures: a Bayesian prior over combinatorial structures
Lack of an intuitive parametric prior over these complex structures
Nonparametric priors sometimes end up simpler than parametric priors

SLIDE 30

Motivation

Practical Problem-solving Motivation: Structure Learning

SLIDE 31

Motivation

Desirable Properties of Non-parametric Models

Exchangeability
Naturally captures power laws
Flexible ways of building complex models (e.g., hierarchical models)
When conjugate priors are used, problems often become computationally tractable

SLIDE 32

Dirichlet Process

A fundamental concept in Bayesian nonparametrics
Formally defined by [Ferguson 1973] as a distribution over measures
Can be derived in different ways, and as a special case of different processes:
◮ the infinite limit of a Gibbs sampler for finite mixture models
◮ the Chinese restaurant process
◮ the stick-breaking construction
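The stick-breaking construction can be sketched in a few lines: draw stick proportions βk ∼ Beta(1, α) and set πk = βk ∏_{j<k}(1 − βj). The truncation level, the value of α, and the base measure H = N(0, 1) below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, K = 2.0, 1000  # concentration and truncation level for the "stick"

# beta_k ~ Beta(1, alpha); pi_k = beta_k * prod_{j<k} (1 - beta_j)   (GEM(alpha))
beta = rng.beta(1.0, alpha, size=K)
pi = beta * np.concatenate(([1.0], np.cumprod(1.0 - beta)[:-1]))

atoms = rng.normal(0.0, 1.0, size=K)  # theta_k ~ H, here H = N(0, 1)
# G = sum_k pi_k * delta_{theta_k} is (a truncated) draw from DP(alpha, H)
print(round(pi.sum(), 6))  # weights sum to ~1 at this truncation level
```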

SLIDE 33

Chinese Restaurant Process

A partition ̺ of a set S is a disjoint family of non-empty subsets of S whose union is S.
Denote the set of all partitions of S by PS.
Random partitions are random variables taking values in PS.
We will consider partitions of S.

SLIDE 34

Chinese Restaurant Process

Each customer comes into the restaurant and sits at a table: the (n+1)-th customer joins an existing table c with probability |c|/(n + α) and starts a new table with probability α/(n + α).
Customers correspond to elements of S, and tables to clusters in ̺.
Rich-gets-richer: large clusters are more likely to attract more items.
Multiplying the conditional probabilities together, the overall probability of ̺, called the exchangeable partition probability function (EPPF), is:
P(̺) = α^{|̺|} ∏_{c ∈ ̺} (|c| − 1)! / [α]^n_1
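The seating process is easy to simulate; the values of α and the number of customers below are illustrative:

```python
import random

def crp(n, alpha, seed=0):
    """Simulate table assignments for n customers under CRP(alpha)."""
    rng = random.Random(seed)
    tables = []      # tables[k] = number of customers seated at table k
    assignment = []  # assignment[i] = table index of customer i
    for i in range(n):
        # customer i+1 joins table k w.p. tables[k]/(i + alpha),
        # and opens a new table w.p. alpha/(i + alpha)  (rich-gets-richer)
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, size in enumerate(tables):
            acc += size
            if r < acc:
                tables[k] += 1
                assignment.append(k)
                break
        else:
            assignment.append(len(tables))
            tables.append(1)
    return assignment, tables

assignment, tables = crp(100, alpha=1.0)
print(len(tables), sorted(tables, reverse=True))  # a few large tables, many small
```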

SLIDE 35

Chinese Restaurant Process

Number of clusters
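Since customer i+1 opens a new table with probability α/(α + i), the expected number of clusters after n customers can be computed exactly and compared with the familiar α log n approximation (the values of α and n below are illustrative):

```python
import math

alpha, n = 2.0, 5000
# Customer i+1 starts a new table with probability alpha / (alpha + i), so
#   E[K_n] = sum_{i=0}^{n-1} alpha / (alpha + i)  ~  alpha * log(n)
expected = sum(alpha / (alpha + i) for i in range(n))
print(round(expected, 2), round(alpha * math.log(n), 2))  # grows logarithmically
```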

SLIDE 36

Nonparametric approach to clustering

Partitions are natural latent objects in clustering.
Given a dataset S, partition it into clusters of similar items.
A cluster c ∈ ̺ is described by a model F(θ⋆_c) parameterized by θ⋆_c.
Bayesian approach: introduce a prior over ̺ and the θ⋆_c, and compute the posterior over both.
CRP mixture model: use a CRP prior over ̺, and an iid prior H over cluster parameters.
Computation becomes efficient when H is the conjugate prior for F.
◮ One of the reasons Gaussians are popular in modeling is their nice mathematical properties, including conjugacy.

SLIDE 37

Nonparametric approach to clustering

Generative model of the data:
̺ ∼ CRP(α)
θ⋆_c | ̺ ∼ H
xi | θ⋆, ̺ ∼ F(θ⋆_c)
The CRP prior is a prior over partitions of the data in which the number of partitions/clusters is unknown a priori and is part of inference.
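A sketch of this generative model with 1-D Gaussian clusters; the base measure H = N(0, 3²), the unit observation noise, and the values of α and n are illustrative assumptions:

```python
import random

def crp_mixture_sample(n, alpha, seed=0):
    """Draw a dataset from a CRP(alpha) mixture of 1-D Gaussians."""
    rng = random.Random(seed)
    sizes, means, data, labels = [], [], [], []
    for i in range(n):
        r = rng.uniform(0, i + alpha)   # rho ~ CRP(alpha): pick a cluster
        acc, k = 0.0, None
        for c, s in enumerate(sizes):
            acc += s
            if r < acc:
                k = c
                sizes[c] += 1
                break
        if k is None:                   # new cluster: theta*_c ~ H = N(0, 3^2)
            k = len(sizes)
            sizes.append(1)
            means.append(rng.gauss(0.0, 3.0))
        labels.append(k)
        data.append(rng.gauss(means[k], 1.0))  # x_i ~ F(theta*_c) = N(theta*_c, 1)
    return data, labels

data, labels = crp_mixture_sample(200, alpha=1.0)
print(len(set(labels)))  # number of clusters generated by the prior
```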

SLIDE 38

Nonparametric approach to clustering

Consider a finite mixture model with K sources. How can we describe the partition of the data into clusters?
The distribution over partitions ̺ is
p(̺ | α, K) = (K! / (K − |̺|)!) ∏_{c ∈ ̺} [α/K]^{|c|}_1 / [α]^n_1
where [x]^a_b = x(x + b) . . . (x + (a − 1)b).
Taking the limit K → ∞, we obtain a distribution over partitions without a limit on the number of sources (K disappears in the limit):
p(̺ | α) = α^{|̺|} ∏_{c ∈ ̺} (|c| − 1)! / [α]^n_1
Note where the exchangeability comes from!

SLIDE 39

Pitman-Yor Process

The Pitman-Yor process is a generalization of the Dirichlet process.
Recall the CRP probabilities: an existing table c is chosen with probability |c|/(n + α), and a new table with probability α/(n + α).
Here the difference is a discount parameter d: an existing table c is chosen with probability (|c| − d)/(n + α), and a new table with probability (α + dK)/(n + α), where K is the current number of tables.
The effect is that as d increases, the model tends to create more tables, each with fewer customers.
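A sketch of the Pitman-Yor seating scheme described above; the values of α, d, and n are illustrative:

```python
import random

def pitman_yor_tables(n, alpha, d, seed=0):
    """Table sizes after n customers under Pitman-Yor(d, alpha) seating.
    An occupied table of size s is chosen w.p. (s - d)/(i + alpha); a new
    table w.p. (alpha + d*K)/(i + alpha), with K the current table count."""
    rng = random.Random(seed)
    tables = []
    for i in range(n):
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, s in enumerate(tables):
            acc += s - d
            if r < acc:
                tables[k] += 1
                break
        else:
            tables.append(1)  # remaining mass is alpha + d * len(tables)
    return tables

print(len(pitman_yor_tables(2000, alpha=1.0, d=0.0)))  # d = 0 recovers the CRP
print(len(pitman_yor_tables(2000, alpha=1.0, d=0.5)))  # larger d -> more tables
```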

SLIDE 40

Pitman-Yor Process

The Pitman-Yor process is better at capturing natural power-law phenomena, especially around the tails and peak of the distribution. Example: English word frequencies and ranks [Wood et al. 2011].

SLIDE 41

Discussion

Notes on the statistical properties of nonparametric models:

◮ Consistency
◮ Efficiency (i.e., statistical efficiency)
◮ Coverage (the Bayesian analogue of confidence intervals)

Computationally expensive (also related to the decoupling of models and algorithms)
How to compare against a parametric counterpart:

◮ Accuracy alone is not a good metric for comparison; it is a function of the model and a specific dataset.
◮ Asymptotic performance as the amount of data increases is better for comparison.

Nonparametric models are extremely popular in settings where the data follow a power law.
They should be considered when we suspect a continuous increase in possible configurations as we see more data.
They should not be used when we know that the distribution of the data is likely to follow a parametric form or is generated by a finite number of sources (no coverage guarantees in this case).

SLIDE 42

Tutorials:
A webpage with a list of tutorials on Bayesian nonparametrics: http://stat.columbia.edu/~porbanz/npb-tutorial.html
A tutorial on Bayesian nonparametric models, by Gershman & Blei: http://gershmanlab.webfactional.com/pubs/GershmanBlei12.pdf
Dirichlet Process. Yee Whye Teh. http://www.stats.ox.ac.uk/~teh/research/npbayes/Teh2010a.pdf
A Tutorial on Gaussian Processes. Mark Ebden. http://www.robots.ox.ac.uk/~mebden/reports/GPtutorial.pdf
Video tutorials:
Bayesian Nonparametrics - Yee Whye Teh - MLSS 2013 Tübingen (Max Planck Institute for Intelligent Systems, Tübingen)
Bayesian Nonparametrics - Tamara Broderick - MLSS 2015 Tübingen
Bayesian Nonparametrics Lectures - Larry Wasserman

SLIDE 43

Tutorials and Further References

Courses on Bayesian Nonparametrics

Nonparametric Modeling. UIC. http://georgek.people.uic.edu/Nonparametric.htm
Bayesian Nonparametric Statistics. UNITO. http://www.master-sds.unito.it/do/corsi.pl/Show?_id=meln
Bayesian Nonparametrics - Foundations and Applications. FSU. http://stat.fsu.edu/~sethu/st718outline.pdf
A Course in Bayesian Statistics. Stanford University. http://statweb.stanford.edu/~sabatti/Stat370/

SLIDE 44

Tutorials and Further References

Bayesian Nonparametrics Textbooks

SLIDE 45

Thank you!
