Nested Hierarchical Dirichlet Processes John Paisley, Chong Wang, - PowerPoint PPT Presentation

Nested Hierarchical Dirichlet Processes John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan Review by David Carlson John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by David Carlson 1 / 25

Overview Dirichlet process (DP) Nested Chinese restaurant process topic model (nCRP) Hierarchical Dirichlet process topic model (HDP) Nested Hierarchical Dirichlet process topic model (nHDP) Outline of stochastic variational Bayesian procedure Results John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by David Carlson 2 / 25

Dirichlet Process In general, we can write a that a distribution G drawn from a Dirichlet process can be written as: G ∼ DP ( α G 0 ) (1) ∞ � G = p i δ θ i (2) i = 1 where p i is a probability and each θ i is an atom. We can construct a Dirichlet process mixture model over data W 1 , ..., W N : W n | ϕ n ∼ F W ( ϕ n ) (3) ϕ n | G ∼ G (4) John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by David Carlson 3 / 25

Generating the Dirichlet process There are two common methods for generating the Dirichlet process. The first is the Chinese restaurant process , where we integrate out G to get the a distribution for ϕ n + 1 given the previous values as: n α 1 � ϕ n + 1 | ϕ 1 , ..., ϕ n ∼ α + nG 0 + α + n δ ϕ i (5) i = 1 The second commonly used method is a stick-breaking construction . in this case, one can construct G as:   i − 1 ∞ iid iid � �  δ θ i , G = V i ( 1 − V j ) V i ∼ Beta ( 1 , α ) , θ i ∼ G 0 (6)  i = 1 j = 1 Because the stick-breaking construction maintains the independence among ϕ 1 , ..., ϕ N is has advantages over the CRP during mean-field variational inference. John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by David Carlson 4 / 25

Nested Chinese restaurant processes The CRP (or DP) is a flat model. Often, it is of interest to organize the topics (or atoms) hierarchically to have subcategories of larger categories in a tree-structure. One way to construct such a hierarchical data structure is through the nested Chinese restaurant process (nCRP). John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by David Carlson 5 / 25

Nested Chinese restaurant processes As an analogy, consider an extension of the CRP analogy. Each customer selects a table (parameter) according to the CRP . From that table, the customer chooses a restaurant accessible only from the table, where he/she chooses a table from that restaurant specific CRP .   As shown in the image, each customer (document) that draws from the CRP chooses a single path down the tree. John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by David Carlson 6 / 25

Modeling the nCRP Let i l = ( i 1 , ..., i l ) be a path to a node at level l of the tree. Then we can define the DP at the end of this path as: j − 1 ∞ � � G i l = V ( i l , j ) ( 1 − V ( i l , m ) ) δ θ ( i l , j ) (7) m = 1 j = 1 If the next node is child j, then the nCRP transitions to the DP G i l + 1 , where we define i l + 1 = ( i l , j ) John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by David Carlson 7 / 25

Nested CRP topic models We can use the nCRP to define a path down a shared tree, but we want to use this tree to model the data. One application of the tree-structure is a topic model, where we would define each atom θ i l , j defines a topic. θ i l , j ∼ Dir ( η ) (8) Each document in the nCRP would choose one path down the tree according to a Markov process, and the path provides a sequence of topics ϕ d = ( ϕ d , 1 , ϕ d , 2 , ... ) which we can use to generate the words in the document. The distribution over these topics is provided by a new document-specific stick-breaking process: j − 1 ∞ G ( d ) = iid � � U d , j ( 1 − U d , m ) δ ϕ d , j , U d , j ∼ Beta ( γ 1 , γ 2 ) (9) j = 1 m = 1 John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by David Carlson 8 / 25

Problems with nCRP There are several problems with the nCRP , including: Each document is only allowed to follow one path down the tree, limiting the number of topics for each document to the number of levels (typically ≤ 4), which can force topics to blend (have less specificity) Topics are often repeated on many different parts of the tree if they appear as random effects in documents The tree is shared, but very few topics are shared between a set of documents because they each follow independent single paths down the tree We would like to be able to learn a distribution over the entire shared tree for each document to give a more flexible modeling structure. The solution given to this problem is the nested hierarchical Dirichlet process. John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by David Carlson 9 / 25

Hierarchical Dirichlet processes The HDP is a multi-level version of the Dirichlet process. This is described as the hierarchical process: G d | G ∼ DP ( β G ) , G ∼ DP ( α G 0 ) (10) In this case, we have that each document has it’s own DP ( G d ) which is drawn from a shared DP G . In this way, the weights on each topic (atom) are allowed to vary smoothly from document to document, but still share statistical strength. This can be represented as a stick-breaking process as well: i − 1 ∞ iid iid � V d � ( 1 − V d V d G d = j ) δ φ i , ∼ Beta ( 1 , β ) , φ i ∼ G (11) i i i = 1 j = 1 John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by David Carlson 10 / 25

Nested Hierarchical Dirichlet Processes The nHDP formulation allows ( i ) each word to follow its own path to a topic, and (ii) each topic its own distribution over a shared tree. To formulate the nHDP , let a tree T be a draw from the global nCRP with stick-breaking construction. Instead of drawing a path for each document, we use each Dirichlet process in T as a base for a second level DP drawn independently for each document. In order words, each document d has tree T d , where for each G i l ∈ T we draw: G ( d ) ∼ DP ( β G i l ) (12) i l John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by David Carlson 11 / 25

Nested Hierarchical Dirichlet Processes We can write the second level DP as: j − 1 ∞ iid iid G ( d ) V ( d ) ( 1 − V ( d ) i l , j , V ( d ) ∼ Beta ( 1 , β ) , φ ( d ) � � = i l , m ) δ φ ( d ) ∼ G i l (13) i l i l , j ( i i , j i , j j = 1 m = 1 However, we would like to maintain the same tree structure in T d as in T . To do this, we can map the probabilities, so that the probability of being on node θ ( i l , j ) in document d is: G ( d ) G ( d ) i l ( { φ ( d ) i l , m } ) I ( φ ( d ) � i l ( { θ ( i l , j ) } ) = i l , m = θ ( i l , j ) ) (14) m John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by David Carlson 12 / 25

Generating a Document After generating the tree T d for document d , we draw document-specific beta random variables that act as a stochastic switch. I.E. if a word is at node i l , it determines the probability that the word uses the topic at that node or continues down the tree. So we stop at node i l with probability: idd U d , i l ∼ Beta ( γ 1 , γ 2 ) (15) From the stick-breaking construction, the probability that the topic ϕ d , n = θ i l for word W d , n is:   l − 1 � � G ( d )  � � Pr ( ϕ d , n = θ i l |T d , U d ) = i m ( { θ i m + 1 } ) U d , i l ( 1 − U d , i m )  m = 1 i m ⊂ i l (16) John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by David Carlson 13 / 25

Generative Procedure Algorithm 1 Generating Documents with the Nested Hierarchical Dirichlet Process Step 1. Generate a global tree T by constructing an nCRP as in Section II-B1. Step 2. Generate document tree T d and switching probabilities U ( d ) . For document d , a) For each DP in T , draw a second-level DP with this base distribution (Equation 8). b) For each node in T d (equivalently T ), draw a beta random variable (Equation 10). Step 3. Generate the documents. For word n in document d , a) Sample atom ' n,d = θ i l with probability given in Equation (11). b) Sample W n,d from the discrete distribution with parameter ' d,n . John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by David Carlson 14 / 25

Nested Hierarchical Dirichlet Processes John Paisley, Chong Wang, - PowerPoint PPT Presentation

Nested Hierarchical Dirichlet Processes John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan Review by David Carlson John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by

Hierarchical Dirichlet Processes Presenters: Micah Hodosh, Yizhou Sun 4/7/2010 1 Content

Perspective Hierarchical Dirichlet Process for Perspective Hierarchical Dirichlet Process for

Nested Word Automata Jens Stimpfle 30.6.2014 Nested Words Nested Words Theoretically and

Hierarchical Dirichlet Processes AMS 241, Fall 2010 Vadim von Brzeski vvonbrze@ucsc.edu

The Dirichlet-Bohr radius Manuel Maestre April 13, 2014 Kent State University Content

Hierarchical Dirichlet Processes Sharing Clusters Among Related Groups Dongruo Zhou 1 Difan Zou 2

Nested Transactions Nested Transactions Flat transactions The rules for committing of

Nested and Composite Classes Lecture 14 COP 3252 Summer 2017 May 30, 2017 Nested Classes

Advanced OpenMP Lecture 6: Nested parallelism Nested parallelism Nested parallelism is

6 Subsequences and sequential compactness 6.1 Nested intervals and nested d -cells Recall the

NEVE: Nested Virtualization Extensions for ARM Jin Tack Lim, Christo ff er Dall, Shih-Wei Li, Jason

Reliable Variational Learning for Hierarchical Dirichlet Processes Erik Sudderth Brown University

Probabilistic Grammars and Hierarchical Dirichlet Processes (Liang et. al 2009) Sean Massung &

Boundary Representation of Dirichlet Forms on Canonically Compactifiable Graphs Michael Schwarz

Hierarchical Bounding Volume October 11, 2005 () Hierarchical Bounding Volume October 11, 2005

What is a hierarchical model? Richard Erickson Quantitative Ecologist DataCamp Hierarchical

IOGCC-ECOS Collaboration Opportunities Sam Sankar October 1, 2018 IOGCC Fall Conference What

energy to make things work for you. 1 Getting to know PSE&G Combined electric and gas

Tallinn 26 June 2018 SHERPA in the overall context Visualisation & Interpretation Aim :

Provisional Objective To better support the dynamism of New Zealand aviation, through a regular

Environment and Natural Resources Trust Fund 2012-2013 Request for Proposals (RFP) 015-B ENRTF

Energy Management System National Energy Efficiency Conference September 2012 PCMSB Company

Activities on Assessment of Environmental Policies and Dissemination at AIT Ram M. Shrestha

Pathways to the Middle Class: Balancing Personal and Public Responsibilities Isabel V. Sawhill

Nested Hierarchical Dirichlet Processes John Paisley, Chong Wang, - PowerPoint PPT Presentation

Nested Hierarchical Dirichlet Processes John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan Review by David Carlson John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan () Nested Hierarchical Dirichlet Processes Review by

Hierarchical Dirichlet Processes Presenters: Micah Hodosh, Yizhou Sun 4/7/2010 1 Content

Perspective Hierarchical Dirichlet Process for Perspective Hierarchical Dirichlet Process for

Nested Word Automata Jens Stimpfle 30.6.2014 Nested Words Nested Words Theoretically and

Hierarchical Dirichlet Processes AMS 241, Fall 2010 Vadim von Brzeski vvonbrze@ucsc.edu

The Dirichlet-Bohr radius Manuel Maestre April 13, 2014 Kent State University Content

Hierarchical Dirichlet Processes Sharing Clusters Among Related Groups Dongruo Zhou 1 Difan Zou 2

Nested Transactions Nested Transactions Flat transactions The rules for committing of

Nested and Composite Classes Lecture 14 COP 3252 Summer 2017 May 30, 2017 Nested Classes

Advanced OpenMP Lecture 6: Nested parallelism Nested parallelism Nested parallelism is

6 Subsequences and sequential compactness 6.1 Nested intervals and nested d -cells Recall the

NEVE: Nested Virtualization Extensions for ARM Jin Tack Lim, Christo ff er Dall, Shih-Wei Li, Jason

Reliable Variational Learning for Hierarchical Dirichlet Processes Erik Sudderth Brown University

Probabilistic Grammars and Hierarchical Dirichlet Processes (Liang et. al 2009) Sean Massung &amp;

Boundary Representation of Dirichlet Forms on Canonically Compactifiable Graphs Michael Schwarz

Hierarchical Bounding Volume October 11, 2005 () Hierarchical Bounding Volume October 11, 2005

What is a hierarchical model? Richard Erickson Quantitative Ecologist DataCamp Hierarchical

IOGCC-ECOS Collaboration Opportunities Sam Sankar October 1, 2018 IOGCC Fall Conference What

energy to make things work for you. 1 Getting to know PSE&amp;G Combined electric and gas

Tallinn 26 June 2018 SHERPA in the overall context Visualisation &amp; Interpretation Aim :

Provisional Objective To better support the dynamism of New Zealand aviation, through a regular

Environment and Natural Resources Trust Fund 2012-2013 Request for Proposals (RFP) 015-B ENRTF

Energy Management System National Energy Efficiency Conference September 2012 PCMSB Company

Activities on Assessment of Environmental Policies and Dissemination at AIT Ram M. Shrestha

Pathways to the Middle Class: Balancing Personal and Public Responsibilities Isabel V. Sawhill

Probabilistic Grammars and Hierarchical Dirichlet Processes (Liang et. al 2009) Sean Massung &

energy to make things work for you. 1 Getting to know PSE&G Combined electric and gas

Tallinn 26 June 2018 SHERPA in the overall context Visualisation & Interpretation Aim :