Hierarchical Dirichlet Processes
Presenters: Micah Hodosh, Yizhou Sun
4/7/2010

Content
• Introduction and Motivation
• Dirichlet Processes
• Hierarchical Dirichlet Processes
  – Definition
  – Three Analogs
• Inference
  – Three Sampling Strategies

Introduction
• Hierarchical approach to model-based clustering of grouped data
• Find an unknown number of clusters that capture the structure of each group, and allow clusters to be shared among the groups
• Example: documents with an arbitrary number of topics, which are shared globally across the set of corpora
• A Dirichlet process (DP) will be used as a prior over the mixture components
• The DP will be extended to a hierarchical Dirichlet process (HDP) to allow clusters to be shared among related clustering problems

Motivation
• Interested in problems with observations organized into groups
• Let $x_{ji}$ be the $i$th observation of group $j$, with $x_j = \{x_{j1}, x_{j2}, \ldots\}$
• $x_{ji}$ is exchangeable with any other element of $x_j$
• For all $j, k$, $x_j$ is exchangeable with $x_k$

Motivation
• Assume each observation is drawn independently from a mixture model
• The factor $\theta_{ji}$ specifies the mixture component associated with $x_{ji}$
• Let $F(\theta_{ji})$ be the distribution of $x_{ji}$ given $\theta_{ji}$
• Let $G_j$ be the prior distribution of $\theta_{j1}, \theta_{j2}, \ldots$, which are conditionally independent given $G_j$

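The Dirichlet Process
• Let $(\Theta, \mathcal{B})$ be a measurable space
• Let $G_0$ be a probability measure on that space
• Let $A = (A_1, A_2, \ldots, A_r)$ be a finite measurable partition of that space
• Let $\alpha_0$ be a positive real number
• $G \sim \mathrm{DP}(\alpha_0, G_0)$ is defined such that, for every such partition $A$:
  $(G(A_1), \ldots, G(A_r)) \sim \mathrm{Dir}(\alpha_0 G_0(A_1), \ldots, \alpha_0 G_0(A_r))$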

Stick Breaking Construction
• The general idea is that the distribution $G$ is a weighted average of point masses placed at an infinite set of random variables
• Two infinite sequences of i.i.d. random variables:
  – $\phi_k \sim G_0$: samples from the base probability measure
  – $\pi'_k \sim \mathrm{Beta}(1, \alpha_0)$: defines the weights of these samples

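Stick Breaking Construction
• $\pi'_k \sim \mathrm{Beta}(1, \alpha_0)$
• Define $\pi_k$ as
  $\pi_k = \pi'_k \prod_{l=1}^{k-1} (1 - \pi'_l)$
[Figure: a unit-length stick from 0 to 1. The first piece has length $\pi'_1$, leaving $1 - \pi'_1$; the second piece has length $(1 - \pi'_1)\pi'_2$, and so on.]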

Stick Breaking Construction
• $\pi = (\pi_k)_{k=1}^{\infty} \sim \mathrm{GEM}(\alpha_0)$
• Each $\pi_k$ is the probability of drawing the value $\phi_k$, so that $G = \sum_{k=1}^{\infty} \pi_k \delta_{\phi_k}$
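The stick-breaking steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the truncation at `num_sticks` pieces, the function name `stick_breaking_weights`, and the choice of a standard normal for $G_0$ are assumptions made here, not part of the slides.

```python
import random


def stick_breaking_weights(alpha0, num_sticks, rng):
    """Return the first `num_sticks` weights pi_k of a GEM(alpha0) draw,
    plus the length of the stick that is still unbroken."""
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(num_sticks):
        fraction = rng.betavariate(1.0, alpha0)  # pi'_k ~ Beta(1, alpha0)
        weights.append(remaining * fraction)     # pi_k = pi'_k * prod_{l<k} (1 - pi'_l)
        remaining *= 1.0 - fraction
    return weights, remaining


rng = random.Random(0)
weights, leftover = stick_breaking_weights(alpha0=2.0, num_sticks=50, rng=rng)
# phi_k ~ G0; a standard normal G0 is assumed here purely for illustration.
atoms = [rng.gauss(0.0, 1.0) for _ in weights]
```

The pairs `(weights[k], atoms[k])` then approximate $G$; the `leftover` mass shrinks quickly as more sticks are broken, and larger values of `alpha0` spread the mass over more atoms.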

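Polya urn scheme / CRP
• Let $\theta_1, \theta_2, \ldots$ be i.i.d. random variables distributed according to $G$
• Consider the distribution of $\theta_i$ given $\theta_1, \ldots, \theta_{i-1}$, integrating out $G$:
  $\theta_i \mid \theta_1, \ldots, \theta_{i-1}, \alpha_0, G_0 \sim \sum_{l=1}^{i-1} \frac{1}{i-1+\alpha_0} \delta_{\theta_l} + \frac{\alpha_0}{i-1+\alpha_0} G_0$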

Polya urn scheme
• Consider a simple urn model representation. Each sample is a ball of a certain color
• Balls are drawn equiprobably, and when a ball of color $x$ is drawn, both that ball and a new ball of color $x$ are returned to the urn
• With probability proportional to $\alpha_0$, a new atom is drawn from $G_0$
  – A new ball of a new color is added to the urn

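Polya urn scheme
• Let $\phi_1, \ldots, \phi_K$ be the distinct values taken on by $\theta_1, \ldots, \theta_{i-1}$
• If $m_k$ is the number of values among $\theta_1, \ldots, \theta_{i-1}$ equal to $\phi_k$:
  $\theta_i \mid \theta_1, \ldots, \theta_{i-1}, \alpha_0, G_0 \sim \sum_{k=1}^{K} \frac{m_k}{i-1+\alpha_0} \delta_{\phi_k} + \frac{\alpha_0}{i-1+\alpha_0} G_0$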

Chinese restaurant process
[Figure: customers $\theta_1, \ldots, \theta_4$ seated at tables serving dishes $\phi_1, \phi_2, \phi_3, \ldots$]
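The seating rule of the Chinese restaurant process can be simulated directly: a customer joins an occupied table with probability proportional to the number of customers already there, and opens a new table with probability proportional to $\alpha_0$. The sketch below is one possible illustration; the function and variable names are chosen here and do not come from the slides.

```python
import random


def chinese_restaurant_process(num_customers, alpha0, rng):
    """Seat customers one at a time; return each customer's table index
    and the list of table sizes m_k."""
    table_sizes = []   # m_k for each occupied table k
    assignments = []   # table index of each customer
    for i in range(num_customers):
        # Existing table k has weight m_k and a new table has weight alpha0.
        # The total weight is i + alpha0 because i customers are already seated.
        threshold = rng.random() * (i + alpha0)
        table = len(table_sizes)  # default: open a new table
        cumulative = 0.0
        for k, m_k in enumerate(table_sizes):
            cumulative += m_k
            if threshold < cumulative:
                table = k
                break
        if table == len(table_sizes):
            table_sizes.append(0)
        table_sizes[table] += 1
        assignments.append(table)
    return assignments, table_sizes


assignments, table_sizes = chinese_restaurant_process(100, alpha0=1.5, rng=random.Random(1))
```

Running this for growing `num_customers` shows the rich-get-richer behaviour of the urn scheme: a few tables become large while new tables keep appearing at a slowing rate.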

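Dirichlet Process Mixture Model
• Dirichlet process as a nonparametric prior on the parameters of a mixture model:
  $G \mid \alpha_0, G_0 \sim \mathrm{DP}(\alpha_0, G_0)$
  $\theta_i \mid G \sim G$
  $x_i \mid \theta_i \sim F(\theta_i)$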

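Dirichlet Process Mixture Model
• From the stick-breaking representation:
  – $\theta_i$ takes the value $\phi_k$ with probability $\pi_k$
  – Let $z_i$ be the indicator variable recording which $\phi_k$ is associated with $\theta_i$:
    $\pi \mid \alpha_0 \sim \mathrm{GEM}(\alpha_0)$, $z_i \mid \pi \sim \pi$
    $\phi_k \mid G_0 \sim G_0$, $x_i \mid z_i, (\phi_k)_{k=1}^{\infty} \sim F(\phi_{z_i})$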

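Infinite Limit of Finite Mixture Model
• Consider a multinomial on $L$ mixture components with parameters $\pi = (\pi_1, \ldots, \pi_L)$
• Let $\pi$ have a symmetric Dirichlet prior with hyperparameters $(\alpha_0/L, \ldots, \alpha_0/L)$
• If $x_i$ is drawn from mixture component $z_i$ according to the defined distribution:
  $\pi \mid \alpha_0 \sim \mathrm{Dir}(\alpha_0/L, \ldots, \alpha_0/L)$, $z_i \mid \pi \sim \pi$
  $\phi_k \mid G_0 \sim G_0$, $x_i \mid z_i, (\phi_k)_{k=1}^{L} \sim F(\phi_{z_i})$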

Infinite Limit of Finite Mixture Model
• If $G^L = \sum_{k=1}^{L} \pi_k \delta_{\phi_k}$, then as $L$ approaches $\infty$:
  – The marginal distribution of $x_1, x_2, \ldots$ approaches that of a Dirichlet process mixture model

HDP Definition
• General idea
  – To model grouped data
    • Each group $j$ <=> a Dirichlet process mixture model
    • Hierarchical prior to link these mixture models <=> hierarchical Dirichlet process
  – A hierarchical Dirichlet process is
    • A distribution over a set of random probability measures over $(\Theta, \mathcal{B})$

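HDP Definition (Cont.)
• Formally, a hierarchical Dirichlet process defines
  – A set of random probability measures $(G_j)_{j=1}^{J}$, one for each group $j$
  – A global random probability measure $G_0$
    • $G_0$ is distributed as a Dirichlet process: $G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H)$. $G_0$ is discrete!
    • The $G_j$ are conditionally independent given $G_0$ and also follow a DP: $G_j \mid \alpha_0, G_0 \sim \mathrm{DP}(\alpha_0, G_0)$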

Hierarchical Dirichlet Process Mixture Model
• Hierarchical Dirichlet process as the prior distribution over the factors for grouped data
• For each group $j$
  – Each observation $x_{ji}$ corresponds to a factor $\theta_{ji}$
  – The factors are i.i.d. random variables distributed as $G_j$:
    $\theta_{ji} \mid G_j \sim G_j$
    $x_{ji} \mid \theta_{ji} \sim F(\theta_{ji})$

Some Notes
• The HDP can be extended to more than two levels
  – The base measure $H$ can itself be drawn from a DP, and so on
  – A tree can be formed
    • Each node is a DP
    • Child nodes are conditionally independent given their parent, which serves as their base measure
    • The atoms at a given node are shared among all its descendant nodes

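Analog I: The stick-breaking construction
• Stick-breaking representation of $G_0$, i.e.,
  $G_0 = \sum_{k=1}^{\infty} \beta_k \delta_{\phi_k}$, with $\phi_k \sim H$ and $\beta = (\beta_k)_{k=1}^{\infty} \sim \mathrm{GEM}(\gamma)$
• Stick-breaking representation of $G_j$, i.e.,
  $G_j = \sum_{k=1}^{\infty} \pi_{jk} \delta_{\phi_k}$, with $\pi_j = (\pi_{jk})_{k=1}^{\infty} \sim \mathrm{DP}(\alpha_0, \beta)$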

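Equivalent representation using conditional distributions
• $\beta \mid \gamma \sim \mathrm{GEM}(\gamma)$
• $\pi_j \mid \alpha_0, \beta \sim \mathrm{DP}(\alpha_0, \beta)$, $z_{ji} \mid \pi_j \sim \pi_j$
• $\phi_k \mid H \sim H$, $x_{ji} \mid z_{ji}, (\phi_k)_{k=1}^{\infty} \sim F(\phi_{z_{ji}})$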
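To make the two-level construction concrete, the sketch below draws global weights by stick breaking and then draws group-level weights that share the same atoms. It relies on a finite truncation, which is an assumption made here and not something the slides specify: with the global weights truncated to `truncation` components, a draw from a DP whose base measure is that finite weight vector reduces to a Dirichlet draw with parameters `alpha0 * beta`. The names `gem_weights` and `group_weights` and the small numerical floor are likewise illustrative.

```python
import random


def gem_weights(concentration, truncation, rng):
    """Truncated stick-breaking weights; the last entry absorbs the leftover stick."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, concentration)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # fold the remaining mass in so the weights sum to 1
    return weights


def group_weights(alpha0, beta, rng):
    """Finite stand-in for a group-level draw: Dirichlet(alpha0 * beta),
    sampled by normalising independent Gamma variables."""
    # The floor keeps the Gamma shape strictly positive (numerical safeguard only).
    gammas = [rng.gammavariate(max(alpha0 * b, 1e-9), 1.0) for b in beta]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(3)
beta = gem_weights(concentration=1.0, truncation=20, rng=rng)   # global weights
pis = [group_weights(alpha0=5.0, beta=beta, rng=rng) for _ in range(3)]  # one per group
```

Every group reuses the same component indices, so clusters are shared, while each group's weights scatter around `beta` more or less tightly depending on `alpha0`.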

Analog II: The Chinese restaurant franchise
• General idea:
  – Allow multiple restaurants to share a common menu, which includes a set of dishes
  – Each restaurant has infinitely many tables, and each table serves only one dish

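Notation
• $\theta_{ji}$ – the factor (dish) corresponding to $x_{ji}$ (customer $i$ in restaurant $j$)
• $\phi_1, \ldots, \phi_K$ – the factors (dishes) drawn from $H$
• $\psi_{jt}$ – the dish chosen by table $t$ in restaurant $j$
• $t_{ji}$: the index of the $\psi_{jt}$ associated with $\theta_{ji}$
• $k_{jt}$: the index of the $\phi_k$ associated with $\psi_{jt}$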

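Conditional distributions
• Integrate out $G_j$ (sampling a table for a customer):
  $\theta_{ji} \mid \theta_{j1}, \ldots, \theta_{j,i-1}, \alpha_0, G_0 \sim \sum_{t=1}^{m_{j\cdot}} \frac{n_{jt\cdot}}{i-1+\alpha_0} \delta_{\psi_{jt}} + \frac{\alpha_0}{i-1+\alpha_0} G_0$
• Integrate out $G_0$ (sampling a dish for a table):
  $\psi_{jt} \mid \psi_{11}, \psi_{12}, \ldots, \psi_{21}, \ldots, \psi_{j,t-1}, \gamma, H \sim \sum_{k=1}^{K} \frac{m_{\cdot k}}{m_{\cdot\cdot}+\gamma} \delta_{\phi_k} + \frac{\gamma}{m_{\cdot\cdot}+\gamma} H$
Count notation:
• $n_{jtk}$: number of customers in restaurant $j$, at table $t$, eating dish $k$
• $m_{jk}$: number of tables in restaurant $j$ serving dish $k$
• A dot in place of an index denotes a sum over that index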
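A forward simulation of the franchise may help fix the two levels of sampling. In this sketch a customer picks an occupied table with weight equal to its customer count or a new table with weight `alpha0`; each new table picks an existing dish with weight equal to the number of tables (across all restaurants) already serving it, or a new dish with weight `gamma`, the concentration of the top-level DP. All function names and the dictionary layout are assumptions for illustration; no data or dish parameters are modelled.

```python
import random


def weighted_choice(weights, rng):
    """Return an index drawn with probability proportional to `weights`."""
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point round-off


def chinese_restaurant_franchise(group_sizes, alpha0, gamma, rng):
    dish_tables = []  # number of tables, over all restaurants, serving each dish
    franchise = []
    for n_j in group_sizes:
        table_counts = []   # customers seated at each table of this restaurant
        table_dish = []     # dish index served at each table
        customer_dish = []  # dish index eaten by each customer
        for _ in range(n_j):
            # Last option (weight alpha0) means "open a new table".
            t = weighted_choice(table_counts + [alpha0], rng)
            if t == len(table_counts):
                # Last option (weight gamma) means "order a new dish".
                k = weighted_choice(dish_tables + [gamma], rng)
                if k == len(dish_tables):
                    dish_tables.append(0)
                dish_tables[k] += 1
                table_counts.append(0)
                table_dish.append(k)
            table_counts[t] += 1
            customer_dish.append(table_dish[t])
        franchise.append({"table_counts": table_counts,
                          "table_dish": table_dish,
                          "customer_dish": customer_dish})
    return franchise, dish_tables


franchise, dish_tables = chinese_restaurant_franchise(
    [30, 40, 50], alpha0=1.0, gamma=1.0, rng=random.Random(2))
```

Because `dish_tables` is shared by all restaurants, a dish that is popular in one restaurant is more likely to be ordered at new tables in the others, which is how clusters are shared among groups.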

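Analog III: The infinite limit of finite mixture models
• Two different finite models both yield the HDP mixture model
  – Global mixing proportions serve as a prior for the group-specific mixing proportions:
    $\beta \mid \gamma \sim \mathrm{Dir}(\gamma/L, \ldots, \gamma/L)$
    $\pi_j \mid \alpha_0, \beta \sim \mathrm{Dir}(\alpha_0 \beta)$, $z_{ji} \mid \pi_j \sim \pi_j$
    $\phi_k \mid H \sim H$, $x_{ji} \mid z_{ji}, (\phi_k)_{k=1}^{L} \sim F(\phi_{z_{ji}})$
    As $L$ goes to infinity, this approaches the HDP mixture model
  – Each group chooses a subset of $T$ mixture components:
    $\beta \mid \gamma \sim \mathrm{Dir}(\gamma/L, \ldots, \gamma/L)$, $k_{jt} \mid \beta \sim \beta$
    $\pi_j \mid \alpha_0 \sim \mathrm{Dir}(\alpha_0/T, \ldots, \alpha_0/T)$, $t_{ji} \mid \pi_j \sim \pi_j$
    $\phi_k \mid H \sim H$, $x_{ji} \mid t_{ji}, (k_{jt})_{t=1}^{T}, (\phi_k)_{k=1}^{L} \sim F(\phi_{k_{jt_{ji}}})$
    As $L$ and $T$ go to infinity, this approaches the HDP mixture model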

Introduction to three MCMC schemes
• Assumption: $H$ is conjugate to $F$
  – Scheme I: a straightforward Gibbs sampler based on the Chinese restaurant franchise
  – Scheme II: an augmented representation involving both the Chinese restaurant franchise and the posterior for $G_0$
  – Scheme III: a variation on Scheme II with streamlined bookkeeping

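Conditional density of data under mixture component k
• For data item $x_{ji}$, the conditional density under component $k$ given all data items except $x_{ji}$ is:
  $f_k^{-x_{ji}}(x_{ji}) = \frac{\int f(x_{ji} \mid \phi_k) \prod_{j'i' \neq ji,\, z_{j'i'} = k} f(x_{j'i'} \mid \phi_k)\, h(\phi_k)\, d\phi_k}{\int \prod_{j'i' \neq ji,\, z_{j'i'} = k} f(x_{j'i'} \mid \phi_k)\, h(\phi_k)\, d\phi_k}$
• For a data set $x_{jt}$, the conditional density $f_k^{-x_{jt}}(x_{jt})$ is defined similarly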
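Under conjugacy this conditional density has a closed form. As one illustrative case, assumed here and not stated on the slide, take $F$ to be a categorical distribution over a vocabulary of size `vocab_size` and $H$ a symmetric Dirichlet with parameter `eta`. The density of a word under component $k$ is then the familiar smoothed relative frequency of that word among the other items assigned to $k$. The function name and arguments are hypothetical.

```python
def predictive_density(word, counts_k, eta, vocab_size):
    """Conditional density of `word` under component k, for a categorical
    likelihood with a symmetric Dirichlet(eta) prior (illustrative assumption).

    `counts_k` maps each word to the number of OTHER data items currently
    assigned to component k that equal that word."""
    total = sum(counts_k.values())
    return (counts_k.get(word, 0) + eta) / (total + vocab_size * eta)


counts = {"topic": 3, "model": 1}
vocab = ["topic", "model", "data"]
probs = [predictive_density(w, counts, eta=0.5, vocab_size=len(vocab)) for w in vocab]
```

For a brand-new component the counts are empty and the density reduces to `1 / vocab_size`, which corresponds to the prior predictive used when a new component is proposed.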

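Scheme I: Posterior sampling in the Chinese restaurant franchise
• Sampling $t$ and $k$
  – Sampling $t$:
    $p(t_{ji} = t \mid t^{-ji}, k) \propto n_{jt\cdot}^{-ji}\, f_{k_{jt}}^{-x_{ji}}(x_{ji})$ if $t$ is previously used
    $p(t_{ji} = t^{new} \mid t^{-ji}, k) \propto \alpha_0\, p(x_{ji} \mid t^{-ji}, t_{ji} = t^{new}, k)$
    • If $t_{ji}$ is a new table $t^{new}$, sample the $k$ corresponding to it by
      $p(k_{jt^{new}} = k \mid t, k^{-jt^{new}}) \propto m_{\cdot k}\, f_k^{-x_{ji}}(x_{ji})$ if $k$ is previously used
      $p(k_{jt^{new}} = k^{new} \mid t, k^{-jt^{new}}) \propto \gamma\, f_{k^{new}}^{-x_{ji}}(x_{ji})$
    • And
      $p(x_{ji} \mid t^{-ji}, t_{ji} = t^{new}, k) = \sum_{k=1}^{K} \frac{m_{\cdot k}}{m_{\cdot\cdot}+\gamma} f_k^{-x_{ji}}(x_{ji}) + \frac{\gamma}{m_{\cdot\cdot}+\gamma} f_{k^{new}}^{-x_{ji}}(x_{ji})$
  – Sampling $k$:
    $p(k_{jt} = k \mid t, k^{-jt}) \propto m_{\cdot k}^{-jt}\, f_k^{-x_{jt}}(x_{jt})$ if $k$ is previously used
    $p(k_{jt} = k^{new} \mid t, k^{-jt}) \propto \gamma\, f_{k^{new}}^{-x_{jt}}(x_{jt})$
    • Where $x_{jt}$ is the set of all observations at table $t$ in restaurant $j$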

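Scheme II: Posterior sampling with an augmented representation
• Posterior of $G_0$ given the table and dish assignments:
  $G_0 \mid \gamma, H, \psi \sim \mathrm{DP}\left(\gamma + m_{\cdot\cdot},\ \frac{\gamma H + \sum_{k=1}^{K} m_{\cdot k} \delta_{\phi_k}}{\gamma + m_{\cdot\cdot}}\right)$
• An explicit construction for $G_0$ is given:
  $\beta = (\beta_1, \ldots, \beta_K, \beta_u) \sim \mathrm{Dir}(m_{\cdot 1}, \ldots, m_{\cdot K}, \gamma)$, $G_u \sim \mathrm{DP}(\gamma, H)$
  $G_0 = \sum_{k=1}^{K} \beta_k \delta_{\phi_k} + \beta_u G_u$
• Given a sample of $G_0$, the posterior for each group factorizes, and sampling in each group can be performed separately
• Sampling $t$ and $k$:
  – Almost the same as in Scheme I
    • Except using $\beta_k$ to replace $m_{\cdot k}$ and $\beta_u$ to replace $\gamma$
    • When a new component $k^{new}$ is instantiated, draw $b \sim \mathrm{Beta}(1, \gamma)$, and set $\beta_{k^{new}} = b \beta_u$ and $\beta_u^{new} = (1 - b) \beta_u$
  – Sampling for $\beta$:
    $(\beta_1, \ldots, \beta_K, \beta_u) \mid t, k \sim \mathrm{Dir}(m_{\cdot 1}, \ldots, m_{\cdot K}, \gamma)$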

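Scheme III: Posterior sampling by direct assignment
• Difference from Schemes I and II:
  – In I and II, data items are first assigned to some table $t$, and the tables are then assigned to some component $k$
  – In III, data items are assigned directly to a component via the variable $z_{ji}$, which is equivalent to $k_{jt_{ji}}$
    • Tables are collapsed to the counts $m_{jk}$
• Sampling $z$:
  $p(z_{ji} = k \mid z^{-ji}, m, \beta) \propto (n_{j\cdot k}^{-ji} + \alpha_0 \beta_k)\, f_k^{-x_{ji}}(x_{ji})$ if $k$ is previously used
  $p(z_{ji} = k^{new} \mid z^{-ji}, m, \beta) \propto \alpha_0 \beta_u\, f_{k^{new}}^{-x_{ji}}(x_{ji})$
• Sampling $m$:
  $p(m_{jk} = m \mid z, m^{-jk}, \beta) = \frac{\Gamma(\alpha_0 \beta_k)}{\Gamma(\alpha_0 \beta_k + n_{j\cdot k})}\, s(n_{j\cdot k}, m)\, (\alpha_0 \beta_k)^m$, where $s(n, m)$ are unsigned Stirling numbers of the first kind
• Sampling $\beta$:
  $(\beta_1, \ldots, \beta_K, \beta_u) \mid m \sim \mathrm{Dir}(m_{\cdot 1}, \ldots, m_{\cdot K}, \gamma)$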
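For the table counts $m$ in direct assignment, one common way to draw the number of tables serving dish $k$ in restaurant $j$ is to simulate the seating of that dish's customers in a Chinese restaurant process with concentration $\alpha_0 \beta_k$ and count how many tables open. This is offered as an equivalent simulation-based sketch, not as the procedure the presenters used; the function name `sample_table_count` and its arguments are hypothetical.

```python
import random


def sample_table_count(n_jk, alpha0, beta_k, rng):
    """Draw the number of tables occupied by n_jk customers seated by a
    Chinese restaurant process with concentration alpha0 * beta_k."""
    concentration = alpha0 * beta_k
    tables = 0
    for seated in range(n_jk):
        # The next customer opens a new table with probability
        # concentration / (concentration + number already seated).
        if rng.random() < concentration / (concentration + seated):
            tables += 1
    return tables


rng = random.Random(4)
m_jk = sample_table_count(n_jk=50, alpha0=1.0, beta_k=0.3, rng=rng)
```

The first customer always opens a table, so the draw lies between 1 and `n_jk` whenever `n_jk` is positive, and it is 0 when no customer in the group uses the component.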

Comparison of Sampling Schemes
• In terms of ease of implementation
  – Direct assignment is better
• In terms of convergence speed
  – Direct assignment changes the component membership of data items one at a time
  – In Schemes I and II, changing the component membership of one table changes the membership of multiple data items at the same time, leading to better performance

Applications
• Hierarchical DP extension of LDA
  – In the CRF representation: dishes are topics, customers are the observed words

Applications
• HDP-HMM

References
• Yee Whye Teh et al., Hierarchical Dirichlet Processes, 2006
