

SLIDE 1

Hierarchical Dirichlet Processes

Sharing Clusters Among Related Groups

Dongruo Zhou, Difan Zou, Yaodong Yu

University of Virginia

12/15/2017

Dongruo Zhou, Difan Zou, Yaodong Yu (University of Virginia) Hierarchical Dirichlet Processes 12/15/2017 1 / 31

SLIDE 2

Outline

1. Model Introduction: General Problem Setting; Dirichlet Process; Hierarchical Dirichlet Process
2. Inference: Posterior Sampling
3. Experiments: Document Modeling; Multiple Corpora
4. Questions

SLIDE 4

Mixture Model

We are interested in problems where the observations are organized into groups, and assumed exchangeable both within each group and across groups. Let j index the groups and i index the observations within each group. Then

θji | Gj ∼ Gj, for each j, i
xji | θji ∼ F(θji), for each j, i

where θji is the factor variable, F(θji) is the distribution of xji given θji, and Gj is the prior distribution for the factors in group j.
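As a toy illustration of this setting (all specific values below are ours, not the paper's), grouped data can be generated with a discrete per-group prior Gj over shared factor values and F(θ) = Normal(θ, 1):

```python
import random

random.seed(0)

# Hypothetical toy instance of the grouped setting: the atoms, the group
# weights and F(theta) = Normal(theta, 1) are all illustrative choices.
atoms = [-4.0, 0.0, 4.0]              # values the factor theta_ji can take

def sample_group(n):
    # Each group j gets its own discrete prior G_j over the shared atoms.
    w = [random.random() for _ in atoms]
    weights = [v / sum(w) for v in w]
    xs = []
    for _ in range(n):
        theta = random.choices(atoms, weights=weights)[0]   # theta_ji ~ G_j
        xs.append(random.gauss(theta, 1.0))                 # x_ji ~ F(theta_ji)
    return xs

groups = [sample_group(50) for j in range(3)]
print([len(g) for g in groups])  # [50, 50, 50]
```

Observations are exchangeable within a group (same weights) but the groups differ, which is exactly the structure the HDP will later exploit.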

SLIDE 6

Definition

A Dirichlet process DP(α0, G0) is defined to be the distribution of a random probability measure G over a measurable space (Θ, B). We write G ∼ DP(α0, G0) if for every finite measurable partition (A1, . . . , Ar) of Θ,

(G(A1), . . . , G(Ar)) ∼ Dir(α0 G0(A1), . . . , α0 G0(Ar)),

where y ∼ Dir(β1, . . . , βr) means that the density of (y1, . . . , yr) on the simplex {yi ≥ 0, Σ_{i=1}^{r} yi = 1} is proportional to Π_{i=1}^{r} yi^{βi − 1}.

A Dirichlet process is thus a distribution over distributions.
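The finite-partition property can be checked empirically with the standard Gamma construction of the Dirichlet distribution; the base measure and partition below are illustrative choices, not from the slides:

```python
import random

random.seed(1)

# Sketch of the defining property. Take G0 = Uniform(0, 1) and the partition
# A1 = [0, 0.2), A2 = [0.2, 0.7), A3 = [0.7, 1]. If G ~ DP(alpha0, G0), then
# (G(A1), G(A2), G(A3)) ~ Dir(alpha0*G0(A1), alpha0*G0(A2), alpha0*G0(A3)).
alpha0 = 5.0
g0_masses = [0.2, 0.5, 0.3]            # G0(A1), G0(A2), G0(A3)

def dirichlet(betas):
    # Standard construction: normalized independent Gamma(beta, 1) draws.
    gammas = [random.gammavariate(b, 1.0) for b in betas]
    total = sum(gammas)
    return [g / total for g in gammas]

draws = [dirichlet([alpha0 * m for m in g0_masses]) for _ in range(10000)]
means = [sum(d[r] for d in draws) / len(draws) for r in range(3)]
print(means)  # close to [0.2, 0.5, 0.3], since E[G(A_r)] = G0(A_r)
```

The concentration α0 only controls the spread around G0: larger α0 makes G(Ar) hug G0(Ar) more tightly.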

SLIDE 7

Direct view from Chinese restaurant process

It is hard to describe G directly from the formal definition! Can we describe draws θi ∼ G directly from α0 and G0, without constructing G? Yes, via the Chinese restaurant process. Suppose θ1, θ2, . . . are conditionally independent given G. Integrating out G,

θi | θ1, . . . , θi−1, α0, G0 ∼ Σ_{l=1}^{i−1} δ_{θl} / (i − 1 + α0) + α0 / (i − 1 + α0) · G0.

With probability (i − 1)/(i − 1 + α0), θi takes one of the existing values θ1, . . . , θi−1; with probability α0/(i − 1 + α0), θi is a fresh draw from G0.
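The predictive rule above translates directly into a sampler. A minimal sketch (G0 = Normal(0, 1) and α0 = 1 are illustrative choices, not from the slides):

```python
import random

random.seed(2)

def crp_draws(n, alpha0, base_draw):
    """Draw theta_1, ..., theta_n by the Chinese restaurant process:
    theta_i is a fresh draw from G0 with probability alpha0/(i-1+alpha0),
    and otherwise repeats a uniformly chosen past draw, giving each
    existing value theta_l probability 1/(i-1+alpha0)."""
    thetas = []
    for i in range(1, n + 1):
        if random.random() < alpha0 / (i - 1 + alpha0):
            thetas.append(base_draw())            # new value from G0
        else:
            thetas.append(random.choice(thetas))  # uniform over past draws
    return thetas

samples = crp_draws(200, alpha0=1.0, base_draw=lambda: random.gauss(0, 1))
print(len(samples), len(set(samples)))  # 200 draws, far fewer distinct values
```

The clustering is automatic: the number of distinct values among n draws grows only logarithmically in n, which is what makes the DP useful as a mixture prior.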

SLIDE 9

Definition

How about putting a distribution on G0 as well? We let G0 itself be drawn from a Dirichlet process DP(γ, H), and let the Gj be conditionally independent given G0, each with distribution DP(α0, G0):

G0 | γ, H ∼ DP(γ, H),
Gj | α0, G0 ∼ DP(α0, G0).

SLIDE 10

Definition

Figure: Graphical models. Left: a DP mixture (G, α, θi, xi). Right: the HDP mixture (H, γ, G0, α, Gj, θji, xji).

SLIDE 11

Interpretation of HDP as Chinese restaurant process

Figure: The Chinese restaurant franchise. In each restaurant j, customers θji sit at tables; table t serves dish ψjt, and each table-level dish is identified with one of the global dishes φk shared across all restaurants.

SLIDE 12

Interpretation of HDP as Chinese restaurant process

From the previous definition of the Chinese restaurant process, integrating out Gj gives

θji | θj1, . . . , θj,i−1, α0, G0 ∼ Σ_{l=1}^{i−1} δ_{θjl} / (i − 1 + α0) + α0 / (i − 1 + α0) · G0,

which can also be written as

θji | θj1, . . . , θj,i−1, α0, G0 ∼ Σ_{t=1}^{mj·} njt· / (i − 1 + α0) · δ_{ψjt} + α0 / (i − 1 + α0) · G0,

where ψj1, . . . , ψj,mj· are the distinct values (tables) appearing in θj1, . . . , θj,i−1, mj· is the number of such distinct values, and njt· is the number of times ψjt appears in θj1, . . . , θj,i−1.
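Read as the seating rule of the restaurant metaphor, the second form can be simulated directly. A minimal sketch (α0 = 2 and 100 customers are arbitrary illustrative values):

```python
import random

random.seed(3)

def seat_customer(counts, alpha0):
    """Return the table for the next customer in restaurant j: existing
    table t is chosen with probability n_jt / (i - 1 + alpha0), a new table
    with probability alpha0 / (i - 1 + alpha0), where i - 1 = sum(counts)."""
    weights = counts + [alpha0]
    return random.choices(range(len(weights)), weights=weights)[0]

counts = []          # counts[t] = n_jt, customers seated at table t
for _ in range(100):
    t = seat_customer(counts, alpha0=2.0)
    if t == len(counts):
        counts.append(1)      # open a new table
    else:
        counts[t] += 1
print(sum(counts), len(counts))  # 100 customers at a handful of tables
```

Popular tables get richer ("rich get richer"), which is what concentrates the θji on a few values ψjt.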

SLIDE 13

Interpretation of HDP as Chinese restaurant process

Integrating out G0, we similarly have

ψjt | ψ11, ψ12, . . . , ψ21, . . . , ψj,t−1, γ, H ∼ Σ_{k=1}^{K} m·k / (m·· + γ) · δ_{φk} + γ / (m·· + γ) · H,

where φ1, . . . , φK are the distinct values (dishes) appearing among the earlier ψ's, K is the number of such distinct values, m·k is the number of tables serving φk, and m·· = Σ_{k=1}^{K} m·k.
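The franchise-level draw has the same form one level up: a newly opened table orders a dish in proportion to franchise-wide table counts m·k. A sketch with an assumed base measure H = Normal(0, 1):

```python
import random

random.seed(4)

def draw_dish(m, gamma, base_draw, dishes):
    """A newly opened table orders existing dish phi_k with probability
    m_k / (m_.. + gamma), or a brand-new dish from H with probability
    gamma / (m_.. + gamma); m[k] counts tables franchise-wide serving phi_k."""
    weights = m + [gamma]
    k = random.choices(range(len(weights)), weights=weights)[0]
    if k == len(m):                    # brand-new dish drawn from H
        dishes.append(base_draw())
        m.append(0)
    m[k] += 1
    return k

m, dishes = [], []                     # H = Normal(0, 1) is illustrative
ks = [draw_dish(m, 1.0, lambda: random.gauss(0, 1), dishes) for _ in range(50)]
print(len(dishes), sum(m))  # distinct dishes vs. 50 tables in total
```

Because tables in different restaurants can order the same dish φk, clusters are shared across groups; this is exactly the sharing the HDP was designed for.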

SLIDE 15

Posterior Sampling

Observations: xji ∼ F(θji).

Factors θji ∼ Gj, with Gj integrated out:

θji | θj1, . . . , θj,i−1, α0, G0 ∼ Σ_{t=1}^{mj·} njt· / (i − 1 + α0) · δ_{ψjt} + α0 / (i − 1 + α0) · G0.

Table values ψjt ∼ G0, with G0 integrated out:

ψjt | ψ11, . . . , ψ21, . . . , ψj,t−1, γ, H ∼ Σ_{k=1}^{K} m·k / (m·· + γ) · δ_{φk} + γ / (m·· + γ) · H.

SLIDE 16

Posterior Sampling in the Chinese Restaurant Franchise

Purpose: sample θji and ψjt given the observations x.
Simplification: we sample the index variables t and k rather than θji and ψjt directly.
We first need the conditional density of xji under mixture component k (i.e. under φk), given all data items except xji:

f_k^{−xji}(xji) = ∫ f(xji | φk) Π_{j′i′ ≠ ji} f(xj′i′ | φk) h(φk) dφk / ∫ Π_{j′i′ ≠ ji} f(xj′i′ | φk) h(φk) dφk,

where h denotes the density of H and the products run over all other data items currently assigned to component k.
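When F and H form a conjugate pair, this density is available in closed form. As an illustration (the Gaussian choice here is our assumption, not the slides'), take H = Normal(0, τ²) and F(φ) = Normal(φ, σ²):

```python
import math

def cond_density(x_new, x_others, tau2=1.0, sigma2=1.0):
    """Closed form of f_k^{-x_ji}(x_ji) for the conjugate pair
    H = Normal(0, tau2), F(phi) = Normal(phi, sigma2): integrating phi_k
    against its posterior given the other members of component k yields a
    Gaussian posterior predictive."""
    n = len(x_others)
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)       # Var[phi_k | x_others]
    post_mean = post_var * sum(x_others) / sigma2    # E[phi_k | x_others]
    pred_var = post_var + sigma2                     # predictive variance
    return math.exp(-(x_new - post_mean) ** 2 / (2 * pred_var)) \
        / math.sqrt(2 * math.pi * pred_var)

# A point near the other component members scores higher than a far one.
near = cond_density(1.0, [0.9, 1.1, 1.0])
far = cond_density(5.0, [0.9, 1.1, 1.0])
print(near > far)  # True
```

With an empty `x_others`, the same function gives the prior predictive, i.e. the density used for a brand-new component k^{new}.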

SLIDE 17

Sampling t

If tji takes on a particular previously used value t, we have

p(tji = t | t^{−ji}, k) ∝ n_{jt·}^{−ji}.

The posterior probability p(tji | t^{−ji}, k, x) then satisfies

p(tji = t | t^{−ji}, k, x) ∝ p(xji | tji = t, t^{−ji}, k) · p(tji = t | t^{−ji}, k) = n_{jt·}^{−ji} f_{kjt}^{−xji}(xji).

If tji takes on a new value t^{new}, we have p(tji = t^{new} | t^{−ji}, k) ∝ α0, thus

p(tji = t^{new} | t^{−ji}, k, x) ∝ α0 · p(xji | tji = t^{new}, t^{−ji}, k),

where

p(xji | tji = t^{new}, t^{−ji}, k) = Σ_{k=1}^{K} m·k / (m·· + γ) · f_k^{−xji}(xji) + γ / (m·· + γ) · f_{k^{new}}^{−xji}(xji).

SLIDE 18

Sampling k

Following the previous slide: if the sampled value of tji is t^{new}, then

p(k_{jt^{new}} = k | t, k^{−jt^{new}}) ∝ m·k f_k^{−xji}(xji), if k is previously used;
p(k_{jt^{new}} = k^{new} | t, k^{−jt^{new}}) ∝ γ f_{k^{new}}^{−xji}(xji).

For an existing table t, we resample its dish kjt via

p(kjt = k | t, k^{−jt}) ∝ m·k f_k^{−xjt}(xjt), if k is previously used;
p(kjt = k^{new} | t, k^{−jt}) ∝ γ f_{k^{new}}^{−xjt}(xjt),

where xjt = (xji : all i with tji = t) is the set of observations at table t, and f_k^{−xjt} is defined analogously with all of xjt removed.
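For intuition, the two sampling steps collapse in the single-group special case to a collapsed Gibbs sampler over per-observation component indicators. This is a simplified sketch under an assumed conjugate Gaussian model, not the full franchise sampler:

```python
import math
import random

random.seed(5)

def predictive(x, members, tau2=1.0, sigma2=1.0):
    # f_k^{-x}(x) in closed form for the conjugate pair H = Normal(0, tau2),
    # F(phi) = Normal(phi, sigma2): a Gaussian posterior predictive.
    post_var = 1.0 / (1.0 / tau2 + len(members) / sigma2)
    post_mean = post_var * sum(members) / sigma2
    var = post_var + sigma2
    return math.exp(-(x - post_mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gibbs_sweep(x, z, alpha0):
    """One collapsed Gibbs sweep over component indicators z_i: an existing
    component gets weight count * f_k^{-x_i}(x_i), a new component gets
    weight alpha0 * f_knew^{-x_i}(x_i)."""
    for i in range(len(x)):
        z[i] = None                                  # remove x_i
        labels = sorted({v for v in z if v is not None})
        weights = []
        for k in labels:
            members = [x[j] for j in range(len(x)) if z[j] == k]
            weights.append(len(members) * predictive(x[i], members))
        weights.append(alpha0 * predictive(x[i], []))
        pick = random.choices(range(len(weights)), weights=weights)[0]
        z[i] = labels[pick] if pick < len(labels) else max(labels, default=-1) + 1
    return z

# Two well-separated hypothetical clusters; start with one big component.
x = [random.gauss(-3, 1) for _ in range(20)] + [random.gauss(3, 1) for _ in range(20)]
z = [0] * len(x)
for _ in range(20):
    z = gibbs_sweep(x, z, alpha0=1.0)
print(len(set(z)))  # a small number of occupied components
```

The franchise sampler of the slides layers the same idea twice: tables within a restaurant, then dishes across restaurants.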

SLIDE 20

Document Modeling

Dataset: corpus of nematode biology abstracts¹; 5,838 abstracts in total.
Data processing: remove standard stop words and words appearing fewer than 10 times, leaving 476,441 words in total and a vocabulary size of 5,699.
Representation: use a "bag of words" to represent each document.

¹Available at http://elegans.swmed.edu/wli/cgcbib.
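The preprocessing pipeline described above can be sketched as follows; the mini-corpus, stop-word list and frequency threshold are illustrative stand-ins for the real abstracts:

```python
from collections import Counter

# Hypothetical mini-corpus standing in for the abstracts; the stop words and
# the count threshold (10 in the slides, 2 here so the toy example works)
# are illustrative choices.
docs = ["the larva develops into an adult worm",
        "the adult worm produces larva",
        "gene expression in the adult"]
stop_words = {"the", "an", "in", "into"}

tokens = [[w for w in d.split() if w not in stop_words] for d in docs]
counts = Counter(w for doc in tokens for w in doc)
vocab = sorted(w for w, c in counts.items() if c >= 2)   # keep frequent words

# Bag of words: each document becomes a vector of counts over the vocabulary,
# discarding word order.
bows = [[doc.count(w) for w in vocab] for doc in tokens]
print(vocab)
print(bows)
```

The HDP mixture then treats each such count vector as one group of exchangeable word observations.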

SLIDE 21

Document Modeling

Figure: The HDP mixture model (H, γ, G0, α, Gj, θji, xji).

Correspondence between model and corpus:
Groups ↔ abstracts
Observations ↔ words
Mixture components ↔ topics

SLIDE 22

Document Modeling

We compare with latent Dirichlet allocation (LDA)². For LDA the number of topics must be fixed in advance, so we vary it between 10 and 120.

Figure: Perplexity on test abstracts of LDA and the HDP mixture. x-axis: number of LDA topics (10–120); y-axis: perplexity (roughly 750–1050).

²D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.
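Perplexity here is the exponentiated negative average per-word held-out log-likelihood; lower is better. A small sanity-check sketch (the uniform-model example is ours, not from the slides):

```python
import math

def perplexity(log_probs, n_words):
    """Perplexity as used to compare LDA and the HDP mixture: exp of the
    negative average per-word log-likelihood on held-out text. A uniform
    model over a V-word vocabulary scores exactly V."""
    return math.exp(-sum(log_probs) / n_words)

# Uniform model over a 5,699-word vocabulary (the corpus's vocabulary size):
V = 5699
uniform = perplexity([math.log(1.0 / V)] * 100, 100)
print(round(uniform))  # 5699
```

The figure's values near 750–1050 therefore correspond to models far sharper than uniform over the 5,699-word vocabulary.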

SLIDE 24

Multiple Corpora

Dataset: articles from the proceedings of the Neural Information Processing Systems (NIPS) conference for the years 1988–1999³; 1,447 articles in total.
Data processing: remove standard stop words, words appearing more than 4,000 times, and words appearing fewer than 50 times.
Representation: use a "bag of words" to represent each document.

³http://www.cs.utoronto.ca/roweis/nips.

SLIDE 25

Multiple Corpora

Nine Sections: algorithms and architectures (AA), applications (AP), cognitive science (CS), control and navigation (CN), implementations (IM), learning theory (LT), neuroscience (NS), signal processing (SP), vision sciences (VS). We treat these sections as “corpora”, and are interested in the pattern of sharing of topics among these corpora. Our test section is always the VS (vision sciences) section, while the additional section is varied across the other eight.

SLIDE 26

Multiple Corpora

Figure: Three models for the NIPS data. From left to right: M1 (an HDP over the VS training and test documents only), M2 (a single HDP over the VS training, additional training, and VS test documents together), and M3 (separate HDPs G1 and G2 for the VS and additional sections, tied together by a common DP prior G0 with base measure H). In all cases, perplexity is evaluated on VS test documents.

SLIDE 27

Multiple Corpora

M1 simply ignores documents from the additional section and uses an HDP to model the VS documents only. M2 uses an HDP mixture model with one group per document, lumping together the training documents from both sections. M3 takes a hierarchical approach: it models each section separately with its own HDP mixture model, and places another DP prior over the common base distributions of the two submodels.

SLIDE 28

Multiple Corpora

The training set consists of 80 documents from the additional section, so that larger sections like AA (algorithms and architectures) do not get an unfair advantage, together with 0 to 80 documents from VS.

SLIDE 29

Multiple Corpora

Figure: Perplexity of test VS documents given training documents from VS and another section, for the three models (M1: additional section ignored; M2: flat, additional section; M3: hierarchical, additional section). x-axis: number of VS training documents (10–80); y-axis: average perplexity over NIPS sections.

Figure: Perplexity of test VS documents given LT, AA and AP documents respectively, using M3.

SLIDE 30

Questions

Any Questions?

SLIDE 31

Questions

Thank you!
