
Hierarchical Dirichlet Processes: Sharing Clusters Among Related Groups



  1. Hierarchical Dirichlet Processes: Sharing Clusters Among Related Groups. Dongruo Zhou, Difan Zou, Yaodong Yu (University of Virginia), Hierarchical Dirichlet Processes, 12/15/2017, 1 / 31

  2. Outline: (1) Model Introduction: General Problem Setting, Dirichlet Process, Hierarchical Dirichlet Process; (2) Inference: Posterior Sampling; (3) Experiments: Document Modeling, Multiple Corpora; (4) Questions.


  4. Mixture Model. We are interested in problems where the observations are organized into groups and are assumed exchangeable both within each group and across groups. Let $j$ index the groups and $i$ index the observations within each group. Then
$$\theta_{ji} \mid G_j \sim G_j, \qquad x_{ji} \mid \theta_{ji} \sim F(\theta_{ji}), \qquad \text{for each } j, i,$$
where $\theta_{ji}$ is the factor (latent parameter) associated with $x_{ji}$, $F(\theta_{ji})$ is the distribution of $x_{ji}$ given $\theta_{ji}$, and $G_j$ is the prior distribution for the factors $\theta_{ji}$ in group $j$.
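To make the grouped generative process concrete, here is a minimal standard-library Python sketch. It assumes, purely for illustration, that each $G_j$ is a known discrete distribution over a few atoms and that $F(\theta)$ is a Gaussian with mean $\theta$ and fixed standard deviation; the function `sample_grouped_mixture` and the example groups are hypothetical.

```python
import random

def sample_grouped_mixture(group_priors, n_per_group, sigma=1.0, seed=0):
    """Draw theta_ji ~ G_j, then x_ji ~ F(theta_ji), for every group j.

    group_priors: one discrete G_j per group, given as a list of (atom, weight) pairs.
    F(theta) is assumed to be Normal(theta, sigma**2) in this sketch.
    Returns, for each group, a list of (theta_ji, x_ji) pairs.
    """
    rng = random.Random(seed)
    data = []
    for G_j, n_j in zip(group_priors, n_per_group):
        atoms = [atom for atom, _ in G_j]
        weights = [w for _, w in G_j]
        # theta_ji | G_j ~ G_j
        thetas = rng.choices(atoms, weights=weights, k=n_j)
        # x_ji | theta_ji ~ F(theta_ji)
        data.append([(theta, rng.gauss(theta, sigma)) for theta in thetas])
    return data

# Two hypothetical groups that share the atom 0.0 but weight it differently.
G = [[(0.0, 0.8), (5.0, 0.2)], [(0.0, 0.3), (-5.0, 0.7)]]
groups = sample_grouped_mixture(G, n_per_group=[4, 6])
for j, group in enumerate(groups):
    print(j, [round(x, 2) for _, x in group])
```

Here the two groups share an atom only because the $G_j$ were chosen by hand; a hierarchical prior is what lets such sharing arise automatically.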


  6. Definition. A Dirichlet process $\mathrm{DP}(\alpha_0, G_0)$ is the distribution of a random probability measure $G$ over a measurable space $(\Theta, \mathcal{B})$. We write $G \sim \mathrm{DP}(\alpha_0, G_0)$ if, for every finite measurable partition $(A_1, \dots, A_r)$ of $\Theta$,
$$(G(A_1), \dots, G(A_r)) \sim \mathrm{Dir}(\alpha_0 G_0(A_1), \dots, \alpha_0 G_0(A_r)),$$
where $y = (y_1, \dots, y_r) \sim \mathrm{Dir}(\beta_1, \dots, \beta_r)$ means that $y$ lies on the probability simplex ($y_i \ge 0$, $\sum_{i=1}^{r} y_i = 1$) with density
$$p(y_1 = x_1, \dots, y_r = x_r) \propto \prod_{i=1}^{r} x_i^{\beta_i - 1}.$$
In short, the Dirichlet process is a distribution over distributions.
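The finite-partition property can be checked numerically. The sketch below (standard-library Python; `dp_partition_marginal` is a hypothetical helper and the partition masses are made up) samples $(G(A_1), \dots, G(A_r))$ directly from $\mathrm{Dir}(\alpha_0 G_0(A_1), \dots, \alpha_0 G_0(A_r))$, using the standard fact that normalised independent Gamma variables are Dirichlet distributed, and checks that the average of $G(A_k)$ is close to $G_0(A_k)$.

```python
import random

def sample_dirichlet(alphas, rng):
    """Sample from Dir(alphas) by normalising independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def dp_partition_marginal(alpha0, base_masses, rng):
    """Draw (G(A_1), ..., G(A_r)) for G ~ DP(alpha0, G0),
    where base_masses = (G0(A_1), ..., G0(A_r)) for a partition of Theta."""
    return sample_dirichlet([alpha0 * m for m in base_masses], rng)

rng = random.Random(1)
base = [0.5, 0.3, 0.2]  # hypothetical G0(A_k) for a 3-cell partition
draws = [dp_partition_marginal(2.0, base, rng) for _ in range(20000)]
means = [sum(d[k] for d in draws) / len(draws) for k in range(3)]
print([round(m, 3) for m in means])  # should be close to base
```

Each $G(A)$ has mean $G_0(A)$ and variance $G_0(A)(1 - G_0(A))/(\alpha_0 + 1)$, so a larger $\alpha_0$ concentrates $G$ more tightly around $G_0$.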

  7. Direct view from the Chinese restaurant process. It is hard to describe $G_j$ directly from the formal definition. Can we describe draws $\theta_i \sim G_j$ directly in terms of $\alpha_0$ and $G_0$, without using $G_j$? Chinese restaurant process: suppose $\theta_1, \theta_2, \dots$ are conditionally independent draws from $G_j$. Integrating out $G_j$ gives
$$\theta_i \mid \theta_1, \dots, \theta_{i-1}, \alpha_0, G_0 \sim \sum_{l=1}^{i-1} \frac{1}{i-1+\alpha_0}\,\delta_{\theta_l} + \frac{\alpha_0}{i-1+\alpha_0}\,G_0.$$
That is, with probability $\frac{i-1}{i-1+\alpha_0}$, $\theta_i$ takes one of the existing values $\theta_1, \dots, \theta_{i-1}$ (each equally likely); with probability $\frac{\alpha_0}{i-1+\alpha_0}$, $\theta_i$ is a new draw from $G_0$.
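The predictive rule translates directly into a sequential sampler. This is a minimal sketch, assuming a standard normal $G_0$ for illustration; `crp_draws` is a hypothetical name.

```python
import random

def crp_draws(n, alpha0, base_sampler, rng):
    """Sequentially draw theta_1..theta_n from the Chinese restaurant
    (Polya urn) predictive rule, with G_j integrated out."""
    thetas = []
    for i in range(1, n + 1):
        # New value from G0 with probability alpha0 / (i - 1 + alpha0); always true for i = 1.
        if rng.random() < alpha0 / (i - 1 + alpha0):
            thetas.append(base_sampler(rng))
        else:
            # Otherwise copy one of the i - 1 earlier values uniformly,
            # so each theta_l receives mass 1 / (i - 1 + alpha0).
            thetas.append(rng.choice(thetas))
    return thetas

rng = random.Random(0)
thetas = crp_draws(50, alpha0=1.0, base_sampler=lambda r: r.gauss(0.0, 1.0), rng=rng)
print(len(set(thetas)), "distinct values among", len(thetas), "draws")
```

The number of distinct values grows only slowly with $n$ (roughly like $\alpha_0 \log n$), which is why draws from a DP cluster.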


  9. Definition. What if $G_0$ is itself random? We place a Dirichlet process prior $\mathrm{DP}(\gamma, H)$ on $G_0$ and let the $G_j$ be conditionally independent given $G_0$, each distributed as $\mathrm{DP}(\alpha_0, G_0)$:
$$G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H), \qquad G_j \mid \alpha_0, G_0 \sim \mathrm{DP}(\alpha_0, G_0).$$
Because $G_0$ is discrete, every $G_j$ places its mass on the same atoms, so clusters are shared among the groups.
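A common way to simulate this two-level prior, not shown in the slides, is a finite truncation of the stick-breaking construction: approximate $G_0$ by $K$ atoms $\phi_k \sim H$ with weights $\beta \sim \mathrm{GEM}(\gamma)$, then give each group weights $\pi_j \sim \mathrm{Dir}(\alpha_0 \beta)$ on the same atoms. The sketch below assumes a normal $H$ and uses hypothetical function names; it is an approximation, not an exact DP sampler.

```python
import random

def stick_breaking(gamma, K, rng):
    """Truncated GEM(gamma) weights: v_k ~ Beta(1, gamma), beta_k = v_k * prod_{l<k}(1 - v_l).
    The K-th weight takes the leftover stick so that the weights sum to 1."""
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, gamma)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights

def sample_dirichlet(alphas, rng):
    """Dir(alphas) via normalised Gamma(alpha_k, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def truncated_hdp(gamma, alpha0, K, J, base_sampler, rng):
    """Approximate G_0 ~ DP(gamma, H) by K atoms phi_k ~ H with weights beta, and each
    G_j ~ DP(alpha0, G_0) by weights pi_j ~ Dir(alpha0 * beta) on the SAME atoms."""
    atoms = [base_sampler(rng) for _ in range(K)]
    beta = stick_breaking(gamma, K, rng)
    # Tiny floor guards against a zero concentration if a stick weight underflows.
    conc = [max(alpha0 * b, 1e-12) for b in beta]
    pis = [sample_dirichlet(conc, rng) for _ in range(J)]
    return atoms, beta, pis

rng = random.Random(0)
atoms, beta, pis = truncated_hdp(gamma=1.0, alpha0=2.0, K=10, J=3,
                                 base_sampler=lambda r: r.gauss(0.0, 3.0), rng=rng)
for j, pi_j in enumerate(pis):
    # Every group puts its mass on the shared atoms; only the weights differ.
    print(j, [round(w, 2) for w in pi_j])
```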

  10. Definition (figure: graphical model of the HDP with nodes $H$, $\gamma$, $G_0$, $\alpha_0$, $G_j$, $\theta_{ji}$, $x_{ji}$, shown beside the single-group DP mixture with nodes $G_0$, $\alpha_0$, $G$, $\theta_i$, $x_i$)

  11. Interpretation of HDP as a Chinese restaurant process (figure: a Chinese restaurant franchise with three restaurants. Customers $\theta_{ji}$ sit at tables $\psi_{jt}$, and each table serves a dish $\phi_k$ that is shared across restaurants. Restaurant 1: $\psi_{11} = \phi_1$, $\psi_{12} = \phi_2$, $\psi_{13} = \phi_1$. Restaurant 2: $\psi_{21} = \phi_3$, $\psi_{22} = \phi_1$, $\psi_{23} = \phi_3$, $\psi_{24} = \phi_1$. Restaurant 3: $\psi_{31} = \phi_1$, $\psi_{32} = \phi_2$.)

  12. Interpretation of HDP as a Chinese restaurant process. Applying the Chinese restaurant process within group $j$,
$$\theta_{ji} \mid \theta_{j1}, \dots, \theta_{j,i-1}, \alpha_0, G_0 \sim \sum_{l=1}^{i-1} \frac{1}{i-1+\alpha_0}\,\delta_{\theta_{jl}} + \frac{\alpha_0}{i-1+\alpha_0}\,G_0,$$
which can also be written as
$$\theta_{ji} \mid \theta_{j1}, \dots, \theta_{j,i-1}, \alpha_0, G_0 \sim \sum_{t=1}^{m_{j\cdot}} \frac{n_{jt\cdot}}{i-1+\alpha_0}\,\delta_{\psi_{jt}} + \frac{\alpha_0}{i-1+\alpha_0}\,G_0,$$
where $\psi_{jt}$ is the value at table $t$ of restaurant $j$ (one draw from $G_0$ that later customers copy), $m_{j\cdot}$ is the number of tables in restaurant $j$, and $n_{jt\cdot}$ is the number of customers among $\theta_{j1}, \dots, \theta_{j,i-1}$ seated at table $t$. Because $G_0$ is discrete, different tables may carry the same value.
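The table form of the predictive rule needs only the occupancy counts. A small sketch (hypothetical helper `table_probabilities`; exact fractions are used so the probabilities can be read off directly) returns the probability that the next customer in restaurant $j$ joins each existing table or opens a new one.

```python
from fractions import Fraction

def table_probabilities(table_counts, alpha0):
    """Seating probabilities for customer i in restaurant j.

    table_counts[t] is n_jt., the number of earlier customers at table t, so
    sum(table_counts) == i - 1. Returns (probabilities of the existing tables,
    probability of a new table, whose value is then drawn from G_0).
    Pass alpha0 as a Fraction to get exact results.
    """
    denom = sum(table_counts) + alpha0  # i - 1 + alpha0
    existing = [Fraction(n) / denom for n in table_counts]
    return existing, Fraction(alpha0) / denom

existing, new = table_probabilities([3, 1, 2], alpha0=Fraction(1))
print([str(p) for p in existing], str(new))  # ['3/7', '1/7', '2/7'] 1/7
```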

  13. Interpretation of HDP as a Chinese restaurant process. Integrating out $G_0$, we finally obtain
$$\psi_{jt} \mid \psi_{11}, \dots, \psi_{21}, \dots, \psi_{j1}, \dots, \psi_{j,t-1}, \gamma, H \sim \sum_{k=1}^{K} \frac{m_{\cdot k}}{m_{\cdot\cdot} + \gamma}\,\delta_{\phi_k} + \frac{\gamma}{m_{\cdot\cdot} + \gamma}\,H,$$
where $\phi_1, \dots, \phi_K$ are the distinct values (dishes) among the tables drawn before $\psi_{jt}$, $K$ is the number of such values, $m_{\cdot k}$ is the number of those earlier tables (across all restaurants) that take the value $\phi_k$, and $m_{\cdot\cdot} = \sum_{k=1}^{K} m_{\cdot k}$.
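As a worked example of the franchise-level rule, the sketch below counts $m_{\cdot k}$ from an illustrative table-to-dish assignment (three restaurants, nine tables) and returns the probability that a new table serves each existing dish or a new dish drawn from $H$. The function name and the assignment are hypothetical.

```python
from collections import Counter

def dish_probabilities(table_dishes, gamma):
    """Predictive distribution of the dish for a new table psi_jt, with G0 integrated out.

    table_dishes maps (j, t) -> k, the dish served at each existing table.
    Returns ({k: m_.k / (m_.. + gamma)}, gamma / (m_.. + gamma)).
    """
    m = Counter(table_dishes.values())  # m_.k: tables serving dish k, over all restaurants
    denom = sum(m.values()) + gamma     # m_.. + gamma
    return {k: m[k] / denom for k in sorted(m)}, gamma / denom

# Illustrative assignment: dish 1 on five tables, dishes 2 and 3 on two tables each.
tables = {(1, 1): 1, (1, 2): 2, (1, 3): 1,
          (2, 1): 3, (2, 2): 1, (2, 3): 3, (2, 4): 1,
          (3, 1): 1, (3, 2): 2}
existing, new_dish = dish_probabilities(tables, gamma=1.0)
print(existing, new_dish)  # {1: 0.5, 2: 0.2, 3: 0.2} 0.1
```

Popular dishes attract new tables in every restaurant, which is how clusters get shared across groups.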


  15. Posterior Sampling. Model summary:
Observations: $x_{ji} \sim F(\theta_{ji})$.
Factors $\theta_{ji} \sim G_j$:
$$\theta_{ji} \mid \theta_{j1}, \dots, \theta_{j,i-1}, \alpha_0, G_0 \sim \sum_{t=1}^{m_{j\cdot}} \frac{n_{jt\cdot}}{i-1+\alpha_0}\,\delta_{\psi_{jt}} + \frac{\alpha_0}{i-1+\alpha_0}\,G_0.$$
Table values $\psi_{jt} \sim G_0$:
$$\psi_{jt} \mid \psi_{11}, \dots, \psi_{21}, \dots, \psi_{j1}, \dots, \psi_{j,t-1}, \gamma, H \sim \sum_{k=1}^{K} \frac{m_{\cdot k}}{m_{\cdot\cdot} + \gamma}\,\delta_{\phi_k} + \frac{\gamma}{m_{\cdot\cdot} + \gamma}\,H.$$
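Putting the two levels together, a forward simulation of the Chinese restaurant franchise needs only the counts $n_{jt\cdot}$ and $m_{\cdot k}$. The sketch below (hypothetical function `simulate_crf`) seats customers restaurant by restaurant and records each customer's table index $t_{ji}$ and that table's dish index $k_{jt}$; the dish values $\phi_k$ would be drawn once per dish from $H$ and are omitted.

```python
import random

def simulate_crf(group_sizes, alpha0, gamma, rng):
    """Forward-simulate a Chinese restaurant franchise.

    Returns seating[j] = [(t_ji, k_{j t_ji}) for each customer i] and
    dish_table_counts[k] = m_.k, the number of tables (in all restaurants) serving dish k.
    """
    dish_table_counts = []
    seating = []
    for n_j in group_sizes:
        table_sizes = []  # n_jt.: customers at each table of restaurant j
        table_dish = []   # k_jt: dish served at each table of restaurant j
        assignments = []
        for _ in range(n_j):
            # Existing table t with weight n_jt., new table with weight alpha0.
            weights = table_sizes + [alpha0]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_sizes):
                # New table: existing dish k with weight m_.k, new dish with weight gamma.
                dish_weights = dish_table_counts + [gamma]
                k = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if k == len(dish_table_counts):
                    dish_table_counts.append(0)
                dish_table_counts[k] += 1
                table_sizes.append(0)
                table_dish.append(k)
            table_sizes[t] += 1
            assignments.append((t, table_dish[t]))
        seating.append(assignments)
    return seating, dish_table_counts

rng = random.Random(0)
seating, m = simulate_crf([8, 8, 6], alpha0=1.0, gamma=1.0, rng=rng)
print("m_.k =", m)
for j, customers in enumerate(seating):
    print("restaurant", j, customers)
```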

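  16. Posterior Sampling in the Chinese Restaurant Franchise. Purpose: sample $\theta_{ji}$ and $\psi_{jt}$ given the observations $\mathbf{x}$. Simplification: we sample the indices $t_{ji}$ (the table of customer $i$ in restaurant $j$) and $k_{jt}$ (the dish at table $t$) rather than $\theta_{ji}$ and $\psi_{jt}$ themselves, since $\theta_{ji} = \psi_{j t_{ji}}$ and $\psi_{jt} = \phi_{k_{jt}}$. We first give the conditional density of $x_{ji}$ under component $k$ (dish $\phi_k$), given all data items except $x_{ji}$:
$$f_k^{-x_{ji}}(x_{ji}) = \frac{\int f(x_{ji} \mid \phi_k) \prod_{j'i' \neq ji,\ z_{j'i'} = k} f(x_{j'i'} \mid \phi_k)\, h(\phi_k)\, d\phi_k}{\int \prod_{j'i' \neq ji,\ z_{j'i'} = k} f(x_{j'i'} \mid \phi_k)\, h(\phi_k)\, d\phi_k},$$
where $h(\phi_k)$ denotes the density of $H$ and $z_{j'i'} = k_{j' t_{j'i'}}$ is the component of item $x_{j'i'}$, so both products run over the other items currently assigned to component $k$.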
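For discrete data such as the bag-of-words abstracts on slide 20, the integrals have a closed form when, as an assumption for this sketch, $F(\phi)$ is a categorical distribution over a vocabulary of size $V$ and $H$ is a symmetric Dirichlet with parameter $\eta$ (a conjugate choice the slides do not fix). Then $f_k^{-x_{ji}}(x_{ji} = w) = (n_{kw} + \eta)/(n_k + V\eta)$, where $n_{kw}$ counts word $w$ among the other items in component $k$ and $n_k$ is their total. The helper name and example words are hypothetical.

```python
from collections import Counter

def predictive_word_density(word, component_words, vocab_size, eta):
    """f_k^{-x_ji}(x_ji = word) for categorical F and symmetric Dirichlet(eta) H.

    component_words: the other words currently assigned to component k (x_ji excluded).
    With an empty list this is the prior predictive f_{k_new}(x_ji) = 1 / vocab_size.
    """
    counts = Counter(component_words)
    return (counts[word] + eta) / (len(component_words) + vocab_size * eta)

others = ["worm", "gene", "worm", "cell"]  # hypothetical words in component k
p_worm = predictive_word_density("worm", others, vocab_size=10, eta=0.5)
print(p_worm)  # (2 + 0.5) / (4 + 10 * 0.5) = 2.5 / 9
```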

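  17. Sampling t. If $t_{ji}$ takes a previously used value $t$, then $p(t_{ji} = t \mid \mathbf{t}^{-ji}, \mathbf{k}) \propto n_{jt\cdot}^{-ji}$, so the posterior satisfies
$$p(t_{ji} = t \mid \mathbf{t}^{-ji}, \mathbf{k}, \mathbf{x}) \propto p(x_{ji} \mid t_{ji} = t, \mathbf{t}^{-ji}, \mathbf{k})\, p(t_{ji} = t \mid \mathbf{t}^{-ji}, \mathbf{k}) = n_{jt\cdot}^{-ji}\, f_{k_{jt}}^{-x_{ji}}(x_{ji}).$$
If $t_{ji}$ takes a new value $t^{\text{new}}$, then $p(t_{ji} = t^{\text{new}} \mid \mathbf{t}^{-ji}, \mathbf{k}) \propto \alpha_0$, so
$$p(t_{ji} = t^{\text{new}} \mid \mathbf{t}^{-ji}, \mathbf{k}, \mathbf{x}) \propto \alpha_0\, p(x_{ji} \mid t_{ji} = t^{\text{new}}, \mathbf{t}^{-ji}, \mathbf{k}),$$
where the dish of the new table is integrated out:
$$p(x_{ji} \mid t_{ji} = t^{\text{new}}, \mathbf{t}^{-ji}, \mathbf{k}) = \sum_{k=1}^{K} \frac{m_{\cdot k}}{m_{\cdot\cdot} + \gamma}\, f_k^{-x_{ji}}(x_{ji}) + \frac{\gamma}{m_{\cdot\cdot} + \gamma}\, f_{k^{\text{new}}}^{-x_{ji}}(x_{ji}),$$
and $f_{k^{\text{new}}}^{-x_{ji}}(x_{ji}) = \int f(x_{ji} \mid \phi)\, h(\phi)\, d\phi$ is the prior density of $x_{ji}$.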
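The two cases combine into one categorical draw over "existing table" and "new table". This sketch assumes the per-component densities $f_k^{-x_{ji}}(x_{ji})$ have already been computed (the values below are made up), and the function names are hypothetical.

```python
import random

def new_table_likelihood(dish_counts, dish_lik, new_dish_lik, gamma):
    """p(x_ji | t_ji = t_new, ...), with the new table's dish integrated out.

    dish_counts[k] = m_.k, dish_lik[k] = f_k^{-x_ji}(x_ji), new_dish_lik = f_{k_new}^{-x_ji}(x_ji).
    """
    denom = sum(dish_counts) + gamma  # m_.. + gamma
    mixed = sum(m * f for m, f in zip(dish_counts, dish_lik))
    return (mixed + gamma * new_dish_lik) / denom

def sample_table(table_sizes, table_dish_lik, new_table_lik, alpha0, rng):
    """One Gibbs draw of t_ji, with x_ji already removed from its current table.

    table_sizes[t] = n_jt.^{-ji}, table_dish_lik[t] = f_{k_jt}^{-x_ji}(x_ji).
    Returns a table index; len(table_sizes) means "open a new table".
    """
    weights = [n * f for n, f in zip(table_sizes, table_dish_lik)]
    weights.append(alpha0 * new_table_lik)
    return rng.choices(range(len(weights)), weights=weights)[0]

rng = random.Random(0)
# Made-up likelihood values for illustration only.
p_new = new_table_likelihood([5, 2, 2], [0.20, 0.10, 0.30], new_dish_lik=0.05, gamma=1.0)
t = sample_table([3, 1], [0.20, 0.30], new_table_lik=p_new, alpha0=1.0, rng=rng)
print(round(p_new, 3), t)
```

If the draw opens a new table, its dish is then sampled as described for $k_{jt^{\text{new}}}$.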

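  18. Sampling k. Continuing from the previous slide, if the sampled value of $t_{ji}$ is $t^{\text{new}}$, the dish of the new table is sampled from
$$p(k_{jt^{\text{new}}} = k \mid \mathbf{t}, \mathbf{k}^{-jt^{\text{new}}}) \propto \begin{cases} m_{\cdot k}\, f_k^{-x_{ji}}(x_{ji}) & \text{if } k \text{ is previously used},\\ \gamma\, f_{k^{\text{new}}}^{-x_{ji}}(x_{ji}) & \text{if } k = k^{\text{new}}.\end{cases}$$
For an existing table $t$, the dish is resampled for all of its customers at once:
$$p(k_{jt} = k \mid \mathbf{t}, \mathbf{k}^{-jt}) \propto \begin{cases} m_{\cdot k}^{-jt}\, f_k^{-\mathbf{x}_{jt}}(\mathbf{x}_{jt}) & \text{if } k \text{ is previously used},\\ \gamma\, f_{k^{\text{new}}}^{-\mathbf{x}_{jt}}(\mathbf{x}_{jt}) & \text{if } k = k^{\text{new}},\end{cases}$$
where $\mathbf{x}_{jt} = (x_{ji} : \text{all } i \text{ with } t_{ji} = t)$ and $m_{\cdot k}^{-jt}$ counts the tables serving dish $k$, excluding table $t$ of restaurant $j$.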
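Under the same categorical/symmetric-Dirichlet assumption used earlier (not fixed by the slides), $f_k^{-\mathbf{x}_{jt}}(\mathbf{x}_{jt})$ is a ratio of Gamma functions, so the table-level dish update can be sketched in log space. Function names are hypothetical; for a single word the batch density reduces to $(n_{kw} + \eta)/(n_k + V\eta)$.

```python
import math
import random
from collections import Counter

def log_predictive_batch(batch_words, component_words, vocab_size, eta):
    """log f_k^{-x_jt}(x_jt): joint probability of all words at table t, given the
    other words in component k, for categorical F and symmetric Dirichlet(eta) H."""
    prior = Counter(component_words)
    batch = Counter(batch_words)
    n_k, n_t, v_eta = len(component_words), len(batch_words), vocab_size * eta
    out = math.lgamma(n_k + v_eta) - math.lgamma(n_k + n_t + v_eta)
    for w, c in batch.items():
        out += math.lgamma(prior[w] + c + eta) - math.lgamma(prior[w] + eta)
    return out

def sample_dish(batch_words, dish_words, dish_table_counts, vocab_size, eta, gamma, rng):
    """Resample k_jt for a table whose words are batch_words.

    dish_words[k]: words of dish k with this table's words removed.
    dish_table_counts[k]: m_.k^{-jt}, tables serving dish k excluding this one.
    Returns an index; len(dish_words) means "create a new dish".
    """
    log_w = []
    for m, words in zip(dish_table_counts, dish_words):
        log_prior = math.log(m) if m > 0 else -math.inf  # an emptied dish gets zero weight
        log_w.append(log_prior + log_predictive_batch(batch_words, words, vocab_size, eta))
    log_w.append(math.log(gamma) + log_predictive_batch(batch_words, [], vocab_size, eta))
    top = max(log_w)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(lw - top) for lw in log_w]
    return rng.choices(range(len(weights)), weights=weights)[0]

rng = random.Random(0)
others = ["worm", "gene", "worm", "cell"]
k = sample_dish(["worm", "gene"], dish_words=[others, ["cell"]], dish_table_counts=[3, 1],
                vocab_size=10, eta=0.5, gamma=1.0, rng=rng)
print(k)
```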


  20. Document Modeling. Dataset: a corpus of nematode biology abstracts¹, 5,838 abstracts in total. Data processing: remove standard stop words and words appearing fewer than 10 times, leaving 476,441 words in total and a vocabulary of 5,699 words. Representation: each document is represented as a "bag of words". ¹ Available at http://elegans.swmed.edu/wli/cgcbib.
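The preprocessing steps can be sketched as follows. The tokenisation regex and the tiny stop-word list are stand-ins chosen for illustration (a real run would use a standard English stop list), so this will not reproduce the exact counts reported above; `bag_of_words` is a hypothetical name, and the threshold of 10 comes from the slide.

```python
import re
from collections import Counter

# Tiny stand-in stop list; a real run would use a standard English stop-word list.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "we", "with"}

def bag_of_words(documents, min_count=10, stop_words=STOP_WORDS):
    """Lower-case and tokenise, drop stop words and words seen fewer than
    min_count times in the whole corpus, then map each document to word-id counts."""
    tokenised = [re.findall(r"[a-z0-9]+", doc.lower()) for doc in documents]
    totals = Counter(w for doc in tokenised for w in doc if w not in stop_words)
    vocab = sorted(w for w, c in totals.items() if c >= min_count)
    index = {w: i for i, w in enumerate(vocab)}
    bags = [Counter(index[w] for w in doc if w in index) for doc in tokenised]
    return vocab, bags

docs = ["The worm gene is expressed in the gut.", "Worm cell lineage and the gut."]
vocab, bags = bag_of_words(docs, min_count=2)
print(vocab)  # ['gut', 'worm']
print(bags)
```

Word order is discarded; each document becomes a multiset of word ids, which is exactly the grouped data $x_{ji}$ (word $i$ of document $j$) that the HDP models.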
