Bayesian Inference for Parameter Estimation + Topic Modeling



  1. 10-418 / 10-618 Machine Learning for Structured Data. Machine Learning Department, School of Computer Science, Carnegie Mellon University. Bayesian Inference for Parameter Estimation + Topic Modeling. Matt Gormley, Lecture 20, Nov. 4, 2019.

  2. Reminders • Homework 3: Structured SVM – Out: Fri, Oct. 24 – Due: Wed, Nov. 6 at 11:59pm • Homework 4: Topic Modeling – Out: Wed, Nov. 6 – Due: Mon, Nov. 18 at 11:59pm

  3. TOPIC MODELING

  4. Topic Modeling Motivation: Suppose you’re given a massive corpus and asked to carry out the following tasks: • Organize the documents into thematic categories • Describe the evolution of those categories over time • Enable a domain expert to analyze and understand the content • Find relationships between the categories • Understand how authorship influences the content

  5. Topic Modeling Motivation: Suppose you’re given a massive corpus and asked to carry out the following tasks: • Organize the documents into thematic categories • Describe the evolution of those categories over time • Enable a domain expert to analyze and understand the content • Find relationships between the categories • Understand how authorship influences the content Topic Modeling: A method of (usually unsupervised) discovery of latent or hidden structure in a corpus • Applied primarily to text corpora, but the techniques are more general • Provides a modeling toolbox • Has prompted the exploration of a variety of new inference methods to accommodate large-scale datasets

  6. Topic Modeling Dirichlet-multinomial regression (DMR) topic model on ICML (Mimno & McCallum, 2008) http://www.cs.umass.edu/~mimno/icml100.html

  7. Topic Modeling • Map of NIH Grants (Talley et al., 2011) https://app.nihmaps.org/

  8. Other Applications of Topic Models • Spatial LDA (Wang & Grimson, 2007) [Figure: segmentation results compared across panels labeled Manual, LDA, and SLDA]

  9. Outline
  • Applications of Topic Modeling
  • Latent Dirichlet Allocation (LDA)
    1. Beta-Bernoulli
    2. Dirichlet-Multinomial
    3. Dirichlet-Multinomial Mixture Model
    4. LDA
  • Bayesian Inference for Parameter Estimation
    – Exact inference
    – EM
    – Monte Carlo EM
    – Gibbs sampler
    – Collapsed Gibbs sampler
  • Extensions of LDA
    – Correlated topic models
    – Dynamic topic models
    – Polylingual topic models
    – Supervised LDA

  10. BAYESIAN INFERENCE FOR NAÏVE BAYES

  11. Beta-Bernoulli Model • Beta Distribution: $f(\phi \mid \alpha, \beta) = \frac{1}{B(\alpha, \beta)} \phi^{\alpha - 1} (1 - \phi)^{\beta - 1}$ [Figure: Beta densities over $\phi \in [0, 1]$ for $(\alpha, \beta) = (0.1, 0.9), (0.5, 0.5), (1.0, 1.0), (5.0, 5.0), (10.0, 5.0)$]

  12. Beta-Bernoulli Model • Generative Process: $\phi \sim \text{Beta}(\alpha, \beta)$ [draw distribution over words]; For each word $n \in \{1, \ldots, N\}$: $x_n \sim \text{Bernoulli}(\phi)$ [draw word] • Example corpus (heads/tails): H T T H H T T H H H = $x_1, \ldots, x_{10}$
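As a concrete illustration, here is a minimal simulation of this generative process; the hyperparameter values, corpus length, and random seed are arbitrary choices for the example, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, beta = 2.0, 2.0   # Beta hyperparameters (toy values)
N = 10                   # number of coin flips / "words"

phi = rng.beta(alpha, beta)       # draw the Bernoulli parameter once
x = rng.binomial(1, phi, size=N)  # draw N heads/tails outcomes i.i.d. given phi

print("phi =", phi)
print("corpus:", " ".join("H" if xi else "T" for xi in x))
```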

  13. Dirichlet-Multinomial Model • Dirichlet Distribution: the Dirichlet generalizes the Beta from 2 outcomes to $K$. Recall the Beta density $f(\phi \mid \alpha, \beta) = \frac{1}{B(\alpha, \beta)} \phi^{\alpha - 1} (1 - \phi)^{\beta - 1}$ [Figure: the Beta densities from slide 11, repeated for comparison]

  14. Dirichlet-Multinomial Model • Dirichlet Distribution: $p(\vec{\phi} \mid \vec{\alpha}) = \frac{1}{B(\vec{\alpha})} \prod_{k=1}^{K} \phi_k^{\alpha_k - 1}$ where $B(\vec{\alpha}) = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k)}{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}$ [Figure: two 3-D surface plots of the Dirichlet density $p(\vec{\phi} \mid \vec{\alpha})$ over the simplex in $(\phi_1, \phi_2)$]

  15. Dirichlet-Multinomial Model • Generative Process: $\phi \sim \text{Dir}(\beta)$ [draw distribution over words]; For each word $n \in \{1, \ldots, N\}$: $x_n \sim \text{Mult}(1, \phi)$ [draw word] • Example corpus: "the he is the and the she she is is" = $x_1, \ldots, x_{10}$
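A minimal sketch of this process, paralleling the Beta-Bernoulli simulation above; the toy vocabulary, symmetric hyperparameter, and corpus length are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "he", "she", "is", "and"]  # toy vocabulary (assumption)
beta = np.ones(len(vocab))                 # symmetric Dirichlet hyperparameter
N = 10

phi = rng.dirichlet(beta)                      # draw a distribution over words
words = rng.choice(len(vocab), size=N, p=phi)  # N draws of Mult(1, phi)

print("phi =", np.round(phi, 3))
print("corpus:", " ".join(vocab[w] for w in words))
```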

  16. Dirichlet-Multinomial Mixture Model • Generative Process: [plate diagram] • Example corpus: Document 1: "the he is" ($x_{11}$–$x_{13}$); Document 2: "the and the" ($x_{21}$–$x_{23}$); Document 3: "she she is is" ($x_{31}$–$x_{34}$)

  17. Dirichlet-Multinomial Mixture Model • Generative Process:
  For each topic $k \in \{1, \ldots, K\}$: $\phi_k \sim \text{Dir}(\beta)$ [draw distribution over words]
  $\theta \sim \text{Dir}(\alpha)$ [draw distribution over topics]
  For each document $m \in \{1, \ldots, M\}$: $z_m \sim \text{Mult}(1, \theta)$ [draw topic assignment]
    For each word $n \in \{1, \ldots, N_m\}$: $x_{mn} \sim \text{Mult}(1, \phi_{z_m})$ [draw word]
  • Example corpus: Document 1: "the he is" ($x_{11}$–$x_{13}$); Document 2: "the and the" ($x_{21}$–$x_{23}$); Document 3: "she she is is" ($x_{31}$–$x_{34}$)
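A minimal simulation of the mixture model's generative process; the vocabulary, $K$, $M$, document lengths, and hyperparameters are toy choices matching the running example, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "he", "she", "is", "and"]  # toy vocabulary (assumption)
K, M = 2, 3                                # number of topics / documents
N_m = [3, 3, 4]                            # words per document, as in the example
alpha = np.ones(K)
beta = np.ones(len(vocab))

phi = rng.dirichlet(beta, size=K)  # one word distribution per topic
theta = rng.dirichlet(alpha)       # a single corpus-wide topic distribution

for m in range(M):
    z_m = rng.choice(K, p=theta)   # ONE topic assignment per document
    words = rng.choice(len(vocab), size=N_m[m], p=phi[z_m])
    print(f"doc {m + 1} (topic {z_m}):", " ".join(vocab[w] for w in words))
```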

  18. Bayesian Inference for Naïve Bayes Whiteboard: – Naïve Bayes is not Bayesian – What if we observed both words and topics? – Dirichlet-Multinomial in the fully observed setting is just Naïve Bayes – Three ways of estimating parameters: 1. MLE for Naïve Bayes 2. MAP estimation for Naïve Bayes 3. Bayesian parameter estimation for Naïve Bayes
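For reference, the three estimates differ only in how they smooth the observed counts. A summary of the standard results, assuming a $\text{Dir}(\beta)$ prior over $\phi$ (introduced on the next slide), count vector $n$, total count $N$, and vocabulary size $T$:

```latex
\hat{\phi}_t^{\,\text{MLE}} = \frac{n_t}{N},
\qquad
\hat{\phi}_t^{\,\text{MAP}} = \frac{n_t + \beta_t - 1}{N + \sum_{t'=1}^{T} \beta_{t'} - T},
\qquad
\mathbb{E}\left[\phi_t \mid X\right] = \frac{n_t + \beta_t}{N + \sum_{t'=1}^{T} \beta_{t'}}
```

The MAP estimate is the mode of the Dirichlet posterior and the Bayesian estimate is its mean; with a uniform prior ($\beta_t = 1$ for all $t$) the MAP estimate reduces to the MLE.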

  19. Dirichlet-Multinomial Model • The Dirichlet is conjugate to the Multinomial: $\phi \sim \text{Dir}(\beta)$ [draw distribution over words]; For each word $n \in \{1, \ldots, N\}$: $x_n \sim \text{Mult}(1, \phi)$ [draw word] • The posterior of $\phi$ is $p(\phi \mid X) = \frac{p(X \mid \phi)\, p(\phi)}{p(X)}$ • Define the count vector $n$ such that $n_t$ denotes the number of times word $t$ appeared • Then the posterior is also a Dirichlet distribution: $\phi \mid X \sim \text{Dir}(\beta + n)$
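Because of conjugacy, the posterior update is just vector addition of hyperparameters and counts. A tiny worked example with made-up numbers (the prior and count values are hypothetical, chosen only to show the arithmetic):

```python
import numpy as np

beta = np.array([2.0, 2.0, 2.0])  # prior hyperparameters Dir(beta) (hypothetical)
n = np.array([5, 0, 3])           # observed word counts (hypothetical)

posterior = beta + n              # posterior is Dir(beta + n) = Dir(7, 2, 5)
posterior_mean = posterior / posterior.sum()

print("posterior hyperparameters:", posterior)
print("posterior mean:", np.round(posterior_mean, 3))
```

Note how the prior pseudo-counts keep the second word's posterior probability nonzero even though it was never observed.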

  20. LATENT DIRICHLET ALLOCATION (LDA)

  21. Mixture vs. Admixture (LDA) [Plate diagrams: mixture model (left) vs. admixture/LDA (right)] Diagrams from Wallach, JHU 2011 slides

  22. Latent Dirichlet Allocation • Generative Process: [plate diagram] • Example corpus: Document 1: "the he is" ($x_{11}$–$x_{13}$); Document 2: "the and the" ($x_{21}$–$x_{23}$); Document 3: "she she is is" ($x_{31}$–$x_{34}$)

  23. Latent Dirichlet Allocation • Generative Process:
  For each topic $k \in \{1, \ldots, K\}$: $\phi_k \sim \text{Dir}(\beta)$ [draw distribution over words]
  For each document $m \in \{1, \ldots, M\}$: $\theta_m \sim \text{Dir}(\alpha)$ [draw distribution over topics]
    For each word $n \in \{1, \ldots, N_m\}$: $z_{mn} \sim \text{Mult}(1, \theta_m)$ [draw topic assignment]; $x_{mn} \sim \text{Mult}(1, \phi_{z_{mn}})$ [draw word]
  • Example corpus: Document 1: "the he is" ($x_{11}$–$x_{13}$); Document 2: "the and the" ($x_{21}$–$x_{23}$); Document 3: "she she is is" ($x_{31}$–$x_{34}$)
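A minimal sketch of LDA's generative process under the same toy assumptions as the mixture-model sketch above (vocabulary, $K$, $M$, document lengths, and hyperparameters are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "he", "she", "is", "and"]  # toy vocabulary (assumption)
K, M = 2, 3
N_m = [3, 3, 4]
alpha, beta = np.ones(K), np.ones(len(vocab))

phi = rng.dirichlet(beta, size=K)       # topic-word distributions, shared globally
for m in range(M):
    theta_m = rng.dirichlet(alpha)      # per-document topic proportions
    z = rng.choice(K, size=N_m[m], p=theta_m)  # one topic assignment PER WORD
    words = [vocab[rng.choice(len(vocab), p=phi[z_n])] for z_n in z]
    print(f"doc {m + 1}: topics={z.tolist()} words={' '.join(words)}")
```

The only change from the mixture model is that the topic assignment moves from the document level ($z_m$) to the word level ($z_{mn}$), so a single document can mix several topics.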

  24. (Blei, Ng, & Jordan, 2003) LDA for Topic Modeling [Figure: bar charts of topic-word distributions drawn from Dirichlet($\beta$)] • The generative story begins with only a Dirichlet prior over the topics. • Each topic is defined as a Multinomial distribution over the vocabulary, parameterized by $\phi_k$

  25. (Blei, Ng, & Jordan, 2003) LDA for Topic Modeling [Figure: six sampled topic-word distributions $\phi_1, \ldots, \phi_6$ drawn from Dirichlet($\beta$)] • The generative story begins with only a Dirichlet prior over the topics. • Each topic is defined as a Multinomial distribution over the vocabulary, parameterized by $\phi_k$

  26. (Blei, Ng, & Jordan, 2003) LDA for Topic Modeling [Figure: the sampled topics $\phi_1, \ldots, \phi_6$, with one topic's high-probability words shown: team, season, hockey, player, penguins, ice, canadiens, puck, montreal, stanley, cup] • A topic is visualized as its high probability words. • A pedagogical label is used to identify the topic.

  27. (Blei, Ng, & Jordan, 2003) LDA for Topic Modeling [Figure: the same topic, now labeled {hockey}: team, season, hockey, player, penguins, ice, canadiens, puck, montreal, stanley, cup] • A topic is visualized as its high probability words. • A pedagogical label is used to identify the topic.

  28. (Blei, Ng, & Jordan, 2003) LDA for Topic Modeling [Figure: all six topics labeled {Canadian gov.}, {government}, {hockey}, {U.S. gov.}, {baseball}, {Japan}] • A topic is visualized as its high probability words. • A pedagogical label is used to identify the topic.

  29. (Blei, Ng, & Jordan, 2003) LDA for Topic Modeling [Figure: the six labeled topics, plus a document-level topic distribution $\theta_1$ drawn from Dirichlet($\alpha$)]

  30. (Blei, Ng, & Jordan, 2003) LDA for Topic Modeling [Figure: the six labeled topics and $\theta_1 \sim$ Dirichlet($\alpha$), generating the document beginning: "The 54/40' boundary dispute is still unresolved, and Canadian and US …"]

