
Dirichlet Process Mixtures
A gentle tutorial
Graphical Models – 10708
Khalid El-Arini
Carnegie Mellon University
November 6th, 2006

Motivation
- We are given a data set, and are told that it was generated from a mixture of Gaussians.
- Unfortunately, no one has any idea how many Gaussians produced the data.

What to do?
- We can guess the number of clusters, run EM for Gaussian mixture models, look at the results, and then try again…
- We can do hierarchical agglomerative clustering, and cut the tree at a visually appealing level…
- We want to cluster the data in a statistically principled manner, without resorting to hacks.

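Review: Dirichlet Distribution
- Let $\Theta = \{\theta_1, \theta_2, \ldots, \theta_m\}$, with $\theta_k \ge 0$ and $\sum_k \theta_k = 1$.
- We write: $\Theta \sim \mathrm{Dirichlet}(\alpha_1, \alpha_2, \ldots, \alpha_m)$, with density
  $P(\theta_1, \ldots, \theta_m) = \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)} \prod_{k=1}^{m} \theta_k^{\alpha_k - 1}$
- It is a distribution over possible parameter vectors for a multinomial distribution, and is in fact the conjugate prior for the multinomial.
- The Beta distribution is the special case of a Dirichlet for 2 dimensions.
- Samples from the distribution lie in the (m-1)-dimensional simplex.
- Thus, it is in fact a "distribution over distributions."

Dirichlet Process
- A Dirichlet Process is also a distribution over distributions.
- We write: G ~ DP(α, G_0)
- G_0 is a base distribution.
- α is a positive scaling parameter.
- G has the same support as G_0.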
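To make the finite-dimensional Dirichlet distribution reviewed above concrete, here is a minimal sketch that draws samples by normalizing independent Gamma variates, a standard construction. The function name `sample_dirichlet` and the parameter values are illustrative assumptions, not part of the original slides.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalizing Gamma(alpha_k, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)

# A 3-dimensional sample: a valid multinomial parameter vector on the 2-simplex.
theta = sample_dirichlet([2.0, 3.0, 5.0], rng)

# The 2-dimensional case: the first coordinate is distributed as Beta(2, 5).
beta_like = sample_dirichlet([2.0, 5.0], rng)
```

Each sample is non-negative and sums to one, so it can be used directly as the parameter vector of a multinomial.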

Dirichlet Process
- Consider a Gaussian G_0
- G ~ DP(α, G_0)

Dirichlet Process
- G ~ DP(α, G_0)
- G_0 is continuous, so the probability that any two samples from G_0 are equal is precisely zero.
- However, G is a discrete distribution, made up of a countably infinite number of point masses [Blackwell].
- Therefore, there is always a non-zero probability of two samples from G colliding.

Dirichlet Process
- G ~ DP(α_1, G_0)
- G ~ DP(α_2, G_0)
- The α values determine how close G is to G_0.

Sampling from a DP
- G ~ DP(α, G_0)
- X_n | G ~ G for n ∈ {1, …, N} (iid)
- Marginalizing out G introduces dependencies between the X_n variables.
[Graphical model: G → X_n, with X_n in a plate of size N]

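Sampling from a DP
Assume we view these variables in a specific order, and are interested in the behavior of X_n given the previous n - 1 observations:
  $X_n \mid X_1, \ldots, X_{n-1} = X_i$ with probability $\frac{1}{n-1+\alpha}$ (for each $i < n$), or a new draw from $G_0$ with probability $\frac{\alpha}{n-1+\alpha}$
Let there be K unique values for the variables: $x^*_k$ for $k \in \{1, \ldots, K\}$.

Sampling from a DP
By the chain rule:
  $P(x_1, \ldots, x_N) = P(x_1) P(x_2 \mid x_1) \cdots P(x_N \mid x_1, \ldots, x_{N-1}) = \frac{\alpha^K \prod_{k=1}^{K} (\mathrm{num}(x^*_k) - 1)!}{\alpha (1+\alpha) \cdots (N-1+\alpha)} \prod_{k=1}^{K} G_0(x^*_k)$
where $\mathrm{num}(x^*_k)$ is the number of variables equal to $x^*_k$. The first factor is P(partition) and the second is P(draws).
Notice that the above formulation of the joint does not depend on the order in which we consider the variables. We can arrive at a mixture model by assuming exchangeability and applying De Finetti's Theorem (1935).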
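The sequential view above (each X_n either repeats an earlier value or is a fresh draw from G_0) can be sketched as a small urn sampler. This is an illustrative sketch only: the function name `polya_urn_sample`, the standard-normal choice of G_0, and the values of N and α are assumptions.

```python
import random

def polya_urn_sample(n_draws, alpha, base_sampler, rng):
    """Draw X_1..X_N with G marginalized out (Blackwell-MacQueen urn scheme)."""
    draws = []
    for n in range(n_draws):          # n = number of previous observations
        if rng.random() < alpha / (n + alpha):
            # New value from the base distribution G_0.
            draws.append(base_sampler(rng))
        else:
            # Repeat an earlier value; picking a previous draw uniformly makes
            # each unique value's probability proportional to its count.
            draws.append(rng.choice(draws))
    return draws

rng = random.Random(0)
draws = polya_urn_sample(200, alpha=1.0,
                         base_sampler=lambda r: r.gauss(0.0, 1.0), rng=rng)
n_unique = len(set(draws))            # K, the number of unique values
```

Even though G_0 is continuous, the draws contain many repeated values, which is the collision behavior the slides describe.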

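Chinese Restaurant Process
Let there be K unique values for the variables: $x^*_k$ for $k \in \{1, \ldots, K\}$.
Can rewrite as:
  $X_n \mid X_1, \ldots, X_{n-1} = x^*_k$ with probability $\frac{\mathrm{num}_{n-1}(x^*_k)}{n-1+\alpha}$, or a new draw from $G_0$ with probability $\frac{\alpha}{n-1+\alpha}$
where $\mathrm{num}_{n-1}(x^*_k)$ is the number of the first n - 1 variables equal to $x^*_k$.

Chinese Restaurant Process
- Consider a restaurant with infinitely many tables, where the X_n's represent the patrons of the restaurant.
- From the above conditional probability distribution, we can see that a customer is more likely to sit at a table if there are already many people sitting there.
- However, with probability proportional to α, the customer will sit at a new table.
- This is also known as the "clustering effect," and can be seen in the setting of social clubs. [Aldous]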
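The seating rule above can be simulated directly by tracking only the table occupancy counts. The sketch below also compares the number of occupied tables with its expectation, the sum over i = 0, …, N-1 of α / (α + i), which follows from the new-table probability at each step. The names `crp_seating` and `expected_tables` and the parameter values are illustrative assumptions.

```python
import random

def crp_seating(n_customers, alpha, rng):
    """Seat customers one at a time; return the list of table occupancy counts."""
    counts = []
    for n in range(n_customers):      # n customers are already seated
        r = rng.random() * (n + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:               # existing table k, probability c / (n + alpha)
                counts[k] += 1
                break
        else:                         # new table, probability alpha / (n + alpha)
            counts.append(1)
    return counts

def expected_tables(n_customers, alpha):
    """Expected number of occupied tables after n_customers arrivals."""
    return sum(alpha / (alpha + i) for i in range(n_customers))

rng = random.Random(0)
counts = crp_seating(100, alpha=2.0, rng=rng)
```

Larger α makes new tables more likely, so the number of clusters grows with α but only slowly with the number of customers.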

Dirichlet Process Mixture
- G ~ DP(α, G_0): a countably infinite number of point masses.
- Draw N times from G to get the parameters η_n for the different mixture components.
- If the η_n were drawn from e.g. a Gaussian, no two values would be the same, but since they are drawn from a distribution drawn from a Dirichlet Process, we expect a clustering of the η_n.
- # unique values for η_n = # mixture components
[Graphical model: G_0 and α → G → η_n → y_n, with η_n and y_n in a plate of size N]

CRP Mixture
[Figure: CRP Mixture]
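As a rough illustration of the generative story above, the sketch below draws the parameters η_n with G marginalized out and then draws each observation y_n from a Gaussian centered at η_n. The Gaussian form of G_0, the unit-variance Gaussian likelihood, and the names `sample_dp_mixture`, `base_std`, and `noise_std` are assumptions made for the example.

```python
import random

def sample_dp_mixture(n_points, alpha, rng, base_std=5.0, noise_std=1.0):
    """Generate (eta_n, y_n) pairs from a DP mixture of 1-D Gaussians."""
    etas, ys = [], []
    for n in range(n_points):
        if rng.random() < alpha / (n + alpha):
            eta = rng.gauss(0.0, base_std)   # new component mean from G_0 (assumed Gaussian)
        else:
            eta = rng.choice(etas)           # reuse an existing component mean
        etas.append(eta)
        ys.append(rng.gauss(eta, noise_std)) # observation from its component
    return etas, ys

rng = random.Random(0)
etas, ys = sample_dp_mixture(300, alpha=1.0, rng=rng)
n_components = len(set(etas))                # unique eta values = mixture components
```

The number of components is not fixed in advance; it is whatever number of unique η values the process happens to produce for the given N and α.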

Stick Breaking
- So far, we've just mentioned properties of a distribution G drawn from a Dirichlet Process.
- In 1994, Sethuraman developed a constructive way of forming G, known as "stick breaking."

Stick Breaking
1. Draw η_1* from G_0
2. Draw v_1 from Beta(1, α)
3. π_1 = v_1
4. Draw η_2* from G_0
5. Draw v_2 from Beta(1, α)
6. π_2 = v_2 (1 - v_1)
…
In general, $\pi_k = v_k \prod_{j<k} (1 - v_j)$, and $G = \sum_{k=1}^{\infty} \pi_k \delta_{\eta^*_k}$.
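The stick-breaking steps above can be sketched as follows. Because the construction is infinite, this sketch truncates it after a fixed number of atoms and reports the unassigned remainder of the stick; the truncation, the function name `stick_breaking`, and the standard-normal G_0 are assumptions made for illustration.

```python
import random

def stick_breaking(alpha, n_atoms, base_sampler, rng):
    """Truncated stick-breaking construction of G ~ DP(alpha, G_0).

    Returns atom locations eta*_k, their weights pi_k, and the leftover stick mass.
    """
    atoms, weights = [], []
    remaining = 1.0                          # length of stick not yet broken off
    for _ in range(n_atoms):
        eta = base_sampler(rng)              # eta*_k ~ G_0
        v = rng.betavariate(1.0, alpha)      # v_k ~ Beta(1, alpha)
        atoms.append(eta)
        weights.append(v * remaining)        # pi_k = v_k * prod_{j<k} (1 - v_j)
        remaining *= 1.0 - v
    return atoms, weights, remaining

rng = random.Random(0)
atoms, weights, leftover = stick_breaking(
    alpha=1.0, n_atoms=50, base_sampler=lambda r: r.gauss(0.0, 1.0), rng=rng)
```

The weights plus the leftover mass sum to one, and for moderate α the leftover shrinks quickly, which is why a finite truncation is often a reasonable approximation.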

Formal Definition
- Let α be a positive, real-valued scalar.
- Let G_0 be a non-atomic probability distribution over support set A.
- We say G ~ DP(α, G_0) if, for all natural numbers k and k-partitions {A_1, …, A_k} of A,
  $(G(A_1), \ldots, G(A_k)) \sim \mathrm{Dirichlet}(\alpha G_0(A_1), \ldots, \alpha G_0(A_k))$

Inference in a DPM
- EM is generally used for inference in a mixture model, but G is nonparametric, making EM difficult.
- Markov Chain Monte Carlo techniques [Neal 2000]
- Variational Inference [Blei and Jordan 2006]
[Graphical model: G_0 and α → G → η_n → y_n, with η_n and y_n in a plate of size N]

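Gibbs Sampling [Neal 2000]
- Algorithm 1:
- Define H_i to be the single-observation posterior.
- We marginalize out G from our model, and sample each η_n given everything else.
- SLOW TO CONVERGE!
[Graphical model: G_0 and α → G → η_n → y_n, with η_n and y_n in a plate of size N]

Gibbs Sampling [Neal 2000]
- Algorithm 2: [Grenager 2005]
[Graphical models: the model with G (G_0 and α → G → η_n → y_n, plate of size N) alongside a model with cluster indicators (α → c_n → y_n in a plate of size N; G_0 → η_c in a plate of size ∞; η_c → y_n)]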

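Gibbs Sampling [Neal 2000]
Algorithm 2 (cont.):
- We sample from the distribution over an individual cluster assignment c_n given y_n and all the other cluster assignments.
1. Initialize cluster assignments c_1, …, c_N
2. For i = 1, …, N, draw c_i from:
   $P(c_i = c \mid c_{-i}, y_i, \eta) \propto \frac{n_{-i,c}}{N-1+\alpha} F(y_i, \eta_c)$  if c = c_j for some j ≠ i
   $P(c_i \ne c_j \ \forall j \ne i \mid c_{-i}, y_i) \propto \frac{\alpha}{N-1+\alpha} \int F(y_i, \eta) \, dG_0(\eta)$  otherwise
   where $n_{-i,c}$ is the number of $c_j = c$ with j ≠ i, and $F(y_i, \eta)$ is the likelihood of $y_i$ under component parameter η.
3. For all c, draw η_c | y_i (for all i such that c_i = c)

Conclusion
- We now have a statistically principled mechanism for solving our original problem.
- This was intended as a general and fairly shallow overview of Dirichlet Processes.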
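As a closing illustration, the three steps of Algorithm 2 can be sketched for one conjugate special case. Everything specific here is an assumption rather than something the slides fix: 1-D data, a Gaussian likelihood with known variance `sigma_sq`, a Gaussian base distribution G_0 = N(`mu0`, `tau0_sq`), and the hypothetical helper names `normal_pdf`, `posterior_draw`, and `gibbs_dpm`.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posterior_draw(ys, rng, mu0, tau0_sq, sigma_sq):
    """Draw a cluster mean from its conjugate Gaussian posterior given points ys."""
    precision = 1.0 / tau0_sq + len(ys) / sigma_sq
    mean = (mu0 / tau0_sq + sum(ys) / sigma_sq) / precision
    return rng.gauss(mean, math.sqrt(1.0 / precision))

def gibbs_dpm(y, alpha, n_sweeps, rng, mu0=0.0, tau0_sq=25.0, sigma_sq=1.0):
    n = len(y)
    c = [0] * n                                   # step 1: all points in one cluster
    eta = {0: posterior_draw(y, rng, mu0, tau0_sq, sigma_sq)}
    next_label = 1
    for _ in range(n_sweeps):
        for i in range(n):                        # step 2: resample each c_i
            counts = {}
            for j, cj in enumerate(c):
                if j != i:
                    counts[cj] = counts.get(cj, 0) + 1
            if c[i] not in counts:                # point i was alone in its cluster
                del eta[c[i]]
            labels = list(counts)
            weights = [counts[k] * normal_pdf(y[i], eta[k], sigma_sq) for k in labels]
            # New cluster: alpha times the marginal likelihood of y_i under G_0.
            weights.append(alpha * normal_pdf(y[i], mu0, tau0_sq + sigma_sq))
            pick = rng.choices(range(len(weights)), weights=weights)[0]
            if pick == len(labels):
                c[i] = next_label
                eta[next_label] = posterior_draw([y[i]], rng, mu0, tau0_sq, sigma_sq)
                next_label += 1
            else:
                c[i] = labels[pick]
        for k in eta:                             # step 3: resample each eta_c
            members = [y[i] for i in range(n) if c[i] == k]
            eta[k] = posterior_draw(members, rng, mu0, tau0_sq, sigma_sq)
    return c, eta

rng = random.Random(0)
y = [rng.gauss(-5.0, 1.0) for _ in range(30)] + [rng.gauss(5.0, 1.0) for _ in range(30)]
assignments, means = gibbs_dpm(y, alpha=1.0, n_sweeps=50, rng=rng)
```

The common factor 1 / (N - 1 + α) is dropped from the weights because it cancels on normalization. The number of clusters is never specified; it changes from sweep to sweep as clusters are created and emptied.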

Acknowledgments
- Many thanks go to David Blei.
- Some material for this presentation was inspired by slides from Teg Grenager and Zoubin Ghahramani.

References
- Blei, David M. and Michael I. Jordan. "Variational inference for Dirichlet process mixtures." Bayesian Analysis 1(1), 2006.
- Neal, R. M. "Markov chain sampling methods for Dirichlet process mixture models." Journal of Computational and Graphical Statistics 9, 2000, 249-265.
- Ghahramani, Zoubin. "Non-parametric Bayesian Methods." UAI Tutorial, July 2005.
- Grenager, Teg. "Chinese Restaurants and Stick Breaking: An Introduction to the Dirichlet Process."
- Blackwell, David and James B. MacQueen. "Ferguson Distributions via Polya Urn Schemes." The Annals of Statistics 1(2), 1973, 353-355.
- Ferguson, Thomas S. "A Bayesian Analysis of Some Nonparametric Problems." The Annals of Statistics 1(2), 1973, 209-230.
