

SLIDE 1

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 20: Topic modeling and variational inference Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

SLIDE 2

Administrivia

  • Poster printing (stay tuned!)
  • HW 5 (final homework) is due next Friday!
  • Midpoint feedback

SLIDE 3

Learning Objectives

  • Learn about latent Dirichlet allocation
  • Understand the intuition behind variational inference

SLIDE 4

Topic models

  • Discrete count data

SLIDE 5

Topic models


  • Suppose you have a huge number of documents
  • Want to know what's going on
  • Can't read them all (e.g., every New York Times article from the 90's)
  • Topic models offer a way to get a corpus-level view of major themes
  • Unsupervised
SLIDE 6

Why should you care?

  • Neat way to explore/understand corpus collections
    • E-discovery
    • Social media
    • Scientific data
  • NLP applications
    • Word sense disambiguation
    • Discourse segmentation
  • Psychology: word meaning, polysemy
  • A general way to model count data, with a general inference algorithm

SLIDE 7

Conceptual approach

  • Input: a text corpus and number of topics K
  • Output:
    • K topics, each topic is a list of words
    • Topic assignment for each document


Corpus (example headlines): Forget the Bootleg, Just Download the Movie Legally; Multiplex Heralded As Linchpin To Growth; The Shape of Cinema, Transformed At the Click of a Mouse; A Peaceful Crew Puts Muppets Where Its Mouth Is; Stock Trades: A Better Deal For Investors Isn't Simple; The three big Internet portals begin to distinguish among themselves as shopping malls; Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens

SLIDE 8

Conceptual approach


TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer

  • K topics, each topic is a list of words
SLIDE 9

Conceptual approach


  • Topic assignment for each document

Forget the Bootleg, Just Download the Movie Legally; Multiplex Heralded As Linchpin To Growth; The Shape of Cinema, Transformed At the Click of a Mouse; A Peaceful Crew Puts Muppets Where Its Mouth Is; Stock Trades: A Better Deal For Investors Isn't Simple; Internet portals begin to distinguish among themselves as shopping malls; Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens

TOPIC 1 "TECHNOLOGY", TOPIC 2 "BUSINESS", TOPIC 3 "ENTERTAINMENT"

SLIDE 10

Topics from Science

SLIDE 11

Topic models

  • Discrete count data
  • Gaussian distributions are not appropriate

SLIDE 12

Generative model: Latent Dirichlet Allocation

  • Generate a document, or a bag of words
  • Blei, Ng, Jordan. Latent Dirichlet Allocation. JMLR, 2003.

SLIDE 13

Generative model: Latent Dirichlet Allocation

  • Generate a document, or a bag of words
  • Multinomial distribution
    • Distribution over discrete outcomes
    • Represented by a non-negative vector that sums to one
  • Picture representation (points on the probability simplex):

(1,0,0) (0,0,1) (1/2,1/2,0) (1/3,1/3,1/3) (1/4,1/4,1/2) (0,1,0)

SLIDE 14

Generative model: Latent Dirichlet Allocation

  • Generate a document, or a bag of words
  • Multinomial distribution
    • Distribution over discrete outcomes
    • Represented by a non-negative vector that sums to one
  • Picture representation (points on the probability simplex)
  • These vectors come from a Dirichlet distribution

(1,0,0) (0,0,1) (1/2,1/2,0) (1/3,1/3,1/3) (1/4,1/4,1/2) (0,1,0)
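Points like these can be drawn programmatically. A minimal sketch (not from the slides) using the standard Gamma-normalization construction of a Dirichlet draw:

```python
import random

def sample_simplex_point(alpha):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma draws."""
    gammas = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

random.seed(0)
theta = sample_simplex_point([1.0, 1.0, 1.0])  # alpha = 1 gives a uniform draw over the simplex
# theta is non-negative and sums to one: a valid multinomial parameter
```

Smaller α concentrates draws near the simplex corners (sparse distributions); larger α concentrates them near the center.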

SLIDE 15

Generative story


TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer

SLIDE 16

Generative story

Forget the Bootleg, Just Download the Movie Legally; Multiplex Heralded As Linchpin To Growth; The Shape of Cinema, Transformed At the Click of a Mouse; A Peaceful Crew Puts Muppets Where Its Mouth Is; Stock Trades: A Better Deal For Investors Isn't Simple; The three big Internet portals begin to distinguish among themselves as shopping malls; Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens

SLIDE 17

Generative story


Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's iTunes music store and other online services ...

TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer

SLIDE 18

Generative story


SLIDE 19

Generative story


SLIDE 20

Generative story


SLIDE 21

Missing component: how to generate a multinomial distribution


SLIDE 22

Missing component: how to generate a multinomial distribution


SLIDE 23

Missing component: how to generate a multinomial distribution


SLIDE 24

Conjugacy of Dirichlet and Multinomial


  • If φ ∼ Dir(α), w ∼ Mult(φ), and nk = |{wi : wi = k}|, then

p(φ | α, w) ∝ p(w | φ) p(φ | α)    (1)

SLIDE 25

Conjugacy of Dirichlet and Multinomial


  • If φ ∼ Dir(α), w ∼ Mult(φ), and nk = |{wi : wi = k}|, then

p(φ | α, w) ∝ p(w | φ) p(φ | α)    (1)
            ∝ ∏k φk^nk · ∏k φk^(αk − 1)    (2)
            ∝ ∏k φk^(αk + nk − 1)    (3)

  • Conjugacy: this posterior has the same form as the prior, namely Dir(α + n)
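Because the posterior is Dir(α + n), updating it amounts to adding observed counts to the prior parameters. A small sketch (names are illustrative, not from the slides):

```python
from collections import Counter

def dirichlet_posterior(alpha, observations):
    """Posterior parameters over phi after observing draws w_i ~ Mult(phi)."""
    counts = Counter(observations)
    return [a + counts[k] for k, a in enumerate(alpha)]

alpha = [1.0, 2.0, 3.0]     # prior Dir(alpha) over 3 outcomes
w = [0, 0, 2, 1, 0]         # observed outcomes, so n = (3, 1, 1)
posterior = dirichlet_posterior(alpha, w)       # [4.0, 3.0, 4.0]
posterior_mean = [a / sum(posterior) for a in posterior]
```

No integration is needed; this closed-form update is exactly why conjugate priors are convenient.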
SLIDE 26

Making the generative story formal

[Plate diagram: λ → βk (plate over K topics); α → θd → zn → wn (plates over N words and M documents)]

  • For each topic k ∈ {1, . . . , K}, draw a multinomial distribution βk from a Dirichlet distribution with parameter λ

SLIDE 27

Making the generative story formal

[Plate diagram: λ → βk (plate over K topics); α → θd → zn → wn (plates over N words and M documents)]

  • For each topic k ∈ {1, . . . , K}, draw a multinomial distribution βk from a Dirichlet distribution with parameter λ
  • For each document d ∈ {1, . . . , M}, draw a multinomial distribution θd from a Dirichlet distribution with parameter α

SLIDE 28

Making the generative story formal

[Plate diagram: λ → βk (plate over K topics); α → θd → zn → wn (plates over N words and M documents)]

  • For each topic k ∈ {1, . . . , K}, draw a multinomial distribution βk from a Dirichlet distribution with parameter λ
  • For each document d ∈ {1, . . . , M}, draw a multinomial distribution θd from a Dirichlet distribution with parameter α
  • For each word position n ∈ {1, . . . , N}, select a hidden topic zn from the multinomial distribution parameterized by θd

SLIDE 29

Making the generative story formal

[Plate diagram: λ → βk (plate over K topics); α → θd → zn → wn (plates over N words and M documents)]

  • For each topic k ∈ {1, . . . , K}, draw a multinomial distribution βk from a Dirichlet distribution with parameter λ
  • For each document d ∈ {1, . . . , M}, draw a multinomial distribution θd from a Dirichlet distribution with parameter α
  • For each word position n ∈ {1, . . . , N}, select a hidden topic zn from the multinomial distribution parameterized by θd
  • Choose the observed word wn from the distribution βzn
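The per-document part of this story can be sketched as code. A hedged illustration (the toy θ, β, and vocabulary below are made up for the example, and θ/β are fixed rather than drawn from their Dirichlet priors):

```python
import random

def generate_document(theta, beta, vocab, num_words):
    """LDA's per-document story: z_n ~ Mult(theta), then w_n ~ Mult(beta[z_n])."""
    words = []
    for _ in range(num_words):
        z = random.choices(range(len(theta)), weights=theta)[0]  # hidden topic
        words.append(random.choices(vocab, weights=beta[z])[0])  # observed word
    return words

random.seed(5622)
vocab = ["movie", "internet", "market"]
beta = [[0.1, 0.8, 0.1],   # topic 0: mostly "internet"
        [0.8, 0.1, 0.1],   # topic 1: mostly "movie"
        [0.1, 0.1, 0.8]]   # topic 2: mostly "market"
theta = [0.7, 0.2, 0.1]    # this document leans toward topic 0
doc = generate_document(theta, beta, vocab, 6)
```

In the full model, θ would itself be drawn from Dir(α) per document and each βk from Dir(λ).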
SLIDE 30

Topic models: What’s important

  • Topic models (latent variables)
    • Topics to word types: multinomial distribution
    • Documents to topics: multinomial distribution
  • Modeling & algorithm
    • Model: story of how your data came to be
    • Latent variables: missing pieces of your story
    • Statistical inference: filling in those missing pieces
  • We use latent Dirichlet allocation (LDA), a fully Bayesian version of pLSI (the probabilistic version of LSA)

SLIDE 31


Which variables are hidden?

SLIDE 32


Size of Variable

SLIDE 33

Joint distribution


SLIDE 34

Joint distribution


SLIDE 35

Posterior distribution


SLIDE 36

Variational inference


SLIDE 37

KL divergence and evidence lower bound


SLIDE 38

KL divergence and evidence lower bound


SLIDE 39

A different way to get ELBO

  • Jensen’s inequality


SLIDE 40

Evidence Lower Bound


SLIDE 41

Evidence Lower Bound


SLIDE 42

Variational inference

  • Propose variational distribution q
  • Find ELBO (evidence lower bound) using q
  • Set derivatives to 0 and update variables

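For LDA these updates have closed forms, worked through later in the deck. A sketch of the resulting per-document coordinate ascent, with a hand-rolled digamma Ψ (the helper, the initialization, and the toy β reused from the worked example are illustrative assumptions, not code from the slides):

```python
import math

def digamma(x):
    """Psi(x) via the recurrence Psi(x) = Psi(x+1) - 1/x plus an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f / 252))

def e_step(doc, vocab, beta, alpha, iters=20):
    """Alternate the phi and gamma updates for one document until roughly converged."""
    K = len(alpha)
    gamma = [a + len(doc) / K for a in alpha]        # common initialization
    for _ in range(iters):
        offset = digamma(sum(gamma))
        topic_w = [math.exp(digamma(g) - offset) for g in gamma]
        phi = []
        for word in doc:
            v = vocab.index(word)
            raw = [beta[k][v] * topic_w[k] for k in range(K)]
            total = sum(raw)
            phi.append([r / total for r in raw])     # normalize per word token
        gamma = [alpha[k] + sum(p[k] for p in phi) for k in range(K)]
    return phi, gamma

vocab = ["cat", "dog", "hamburger", "iron", "pig"]
beta = [[0.26, 0.185, 0.185, 0.185, 0.185],
        [0.185, 0.185, 0.26, 0.185, 0.185],
        [0.185, 0.185, 0.185, 0.26, 0.185]]
phi, gamma = e_step(["dog", "cat", "cat", "pig"], vocab, beta, [0.1, 0.1, 0.1])
```

Note that Σi γi always equals Σi αi plus the document length, since each word's φ sums to one.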

SLIDE 43

Variational distribution for LDA


SLIDE 44

Overall Algorithm


SLIDE 45

Updates to Maximize ELBO


SLIDE 46

Homework 5

  • The original algorithm also updates α
  • Not required for the homework


SLIDE 47

Example

  • Three topics

β (rows are topics; columns are word types):

          cat    dog    hamburger  iron   pig
topic 1   .26    .185   .185       .185   .185
topic 2   .185   .185   .26        .185   .185
topic 3   .185   .185   .185       .26    .185    (4)

  • Assume uniform γ: (2.0, 2.0, 2.0)
  • Compute the update for φ:

φni ∝ βiv exp( Ψ(γi) − Ψ( Σj γj ) )    (5)

  • For the first word (dog) in the document: dog cat cat pig

SLIDE 48

Update φ for dog

β as on the previous slide, with φni ∝ βiv exp( Ψ(γi) − Ψ( Σj γj ) )

  • γ = (2.000, 2.000, 2.000)
  • φ(0) ∝ 0.185 × exp( Ψ(2.000) − Ψ(2.000 + 2.000 + 2.000) ) = 0.051
  • φ(1) ∝ 0.185 × exp( Ψ(2.000) − Ψ(2.000 + 2.000 + 2.000) ) = 0.051
  • φ(2) ∝ 0.185 × exp( Ψ(2.000) − Ψ(2.000 + 2.000 + 2.000) ) = 0.051
  • After normalization: {0.333, 0.333, 0.333}
SLIDE 49

Update φ for pig

β as on slide 47, with φni ∝ βiv exp( Ψ(γi) − Ψ( Σj γj ) )

  • γ = (2.000, 2.000, 2.000)
  • φ(0) ∝ 0.185 × exp( Ψ(2.000) − Ψ(2.000 + 2.000 + 2.000) ) = 0.051
  • φ(1) ∝ 0.185 × exp( Ψ(2.000) − Ψ(2.000 + 2.000 + 2.000) ) = 0.051
  • φ(2) ∝ 0.185 × exp( Ψ(2.000) − Ψ(2.000 + 2.000 + 2.000) ) = 0.051
  • After normalization: {0.333, 0.333, 0.333}
SLIDE 50

Update φ for cat

β as on slide 47, with φni ∝ βiv exp( Ψ(γi) − Ψ( Σj γj ) )

  • γ = (2.000, 2.000, 2.000)
  • φ(0) ∝ 0.260 × exp( Ψ(2.000) − Ψ(2.000 + 2.000 + 2.000) ) = 0.072
  • φ(1) ∝ 0.185 × exp( Ψ(2.000) − Ψ(2.000 + 2.000 + 2.000) ) = 0.051
  • φ(2) ∝ 0.185 × exp( Ψ(2.000) − Ψ(2.000 + 2.000 + 2.000) ) = 0.051
  • After normalization: {0.413, 0.294, 0.294}
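The φ updates on the last three slides can be checked numerically. A sketch with a hand-rolled digamma Ψ (the helper and names are illustrative; accuracy is ample for this example):

```python
import math

def digamma(x):
    """Psi(x) via the recurrence Psi(x) = Psi(x+1) - 1/x plus an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f / 252))

VOCAB = ["cat", "dog", "hamburger", "iron", "pig"]
BETA = [[0.26, 0.185, 0.185, 0.185, 0.185],
        [0.185, 0.185, 0.26, 0.185, 0.185],
        [0.185, 0.185, 0.185, 0.26, 0.185]]

def update_phi(word, gamma):
    """phi_ni ∝ beta_iv * exp(Psi(gamma_i) - Psi(sum_j gamma_j)), then normalize."""
    v = VOCAB.index(word)
    offset = digamma(sum(gamma))
    raw = [row[v] * math.exp(digamma(g) - offset) for row, g in zip(BETA, gamma)]
    total = sum(raw)
    return [r / total for r in raw]

print(update_phi("dog", [2.0, 2.0, 2.0]))  # ≈ [0.333, 0.333, 0.333]
print(update_phi("cat", [2.0, 2.0, 2.0]))  # ≈ [0.413, 0.294, 0.294]
```

The unnormalized values also match the slides: 0.185 × exp(Ψ(2) − Ψ(6)) ≈ 0.051 and 0.260 × exp(Ψ(2) − Ψ(6)) ≈ 0.072.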

SLIDE 51

Update γ

  • Document: dog cat cat pig
  • Update equation:

γi = αi + Σn φni    (6)

  • Assume α = (.1, .1, .1)

        φ0      φ1      φ2
dog     .333    .333    .333
cat     .413    .294    .294   (×2: "cat" appears twice)
pig     .333    .333    .333
α       0.1     0.1     0.1
sum     1.592   1.354   1.354

  • Note: do not normalize!
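The γ update can be reproduced from the rounded φ values above in a few lines; a short sketch (names are illustrative):

```python
# Variational topic weights phi per word type, rounded as on the slides
PHI = {"dog": [0.333, 0.333, 0.333],
       "cat": [0.413, 0.294, 0.294],
       "pig": [0.333, 0.333, 0.333]}

def update_gamma(doc, alpha, phi):
    """gamma_i = alpha_i + sum_n phi_ni; the sum runs over word tokens, not types."""
    return [a + sum(phi[w][i] for w in doc) for i, a in enumerate(alpha)]

gamma = update_gamma(["dog", "cat", "cat", "pig"], [0.1, 0.1, 0.1], PHI)
print([round(g, 3) for g in gamma])  # [1.592, 1.354, 1.354]
```

Since "cat" occurs twice, its φ row is added twice, matching the ×2 in the table.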

SLIDE 52

Update β

  • Count up all of the φ across all documents
  • For each topic, divide by total
  • Corresponds to maximum likelihood of expected counts
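The bullets above amount to accumulating expected counts per (topic, word) pair and normalizing each topic's row; a minimal sketch (function and variable names are illustrative):

```python
def update_beta(docs, phis, vocab, num_topics):
    """M-step: expected (topic, word) counts from phi, normalized within each topic."""
    counts = [[0.0] * len(vocab) for _ in range(num_topics)]
    for doc, phi in zip(docs, phis):        # phi: per-token topic weights
        for word, p in zip(doc, phi):
            v = vocab.index(word)
            for k in range(num_topics):
                counts[k][v] += p[k]
    totals = [sum(row) for row in counts]
    return [[c / totals[k] for c in counts[k]] for k in range(num_topics)]

vocab = ["cat", "dog", "hamburger", "iron", "pig"]
docs = [["dog", "cat", "cat", "pig"]]       # a one-document toy corpus
phis = [[[0.333, 0.333, 0.333],
         [0.413, 0.294, 0.294],
         [0.413, 0.294, 0.294],
         [0.333, 0.333, 0.333]]]
beta = update_beta(docs, phis, vocab, 3)    # each row sums to 1
```

With real data the accumulation runs over every document in the corpus before normalizing.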


SLIDE 53

Recap

  • Topic models: a neat way to model discrete count data
  • Variational inference turns intractable posterior inference into an optimization problem: maximize the ELBO
