SLIDE 1

Department of Computer Science
CSCI 5622: Machine Learning
Chenhao Tan
Lecture 19: EM algorithm, Topic modeling
Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

SLIDE 2

Administrivia

  • HW4 due, HW5 out
  • Remember that we only count the highest 4 homework scores
  • Final project midpoint presentation
  • For the final project, each person will be asked to summarize what everyone in the team did

  • Contact information for printing

SLIDE 3

Second Month Survey


[Chart: second-month survey results shown alongside the first survey's results]

SLIDE 4

Second Month Survey

  • Conflicting opinions
  • wide variety of models, good explanations, good homeworks
  • Clarity of HW grading is the worst I have ever had for a class.
  • Depth of content covered
  • Course is too theory heavy
  • I liked that the instructor not only requested feedback often, but also acted upon the feedback, changing a few things about how the class and slides are presented.

SLIDE 5

Second Month Survey

  • Increase exam duration
  • The professor needs to slow down, and sacrifice some of the math subtleties and complexities in favor of concrete understanding of the topics.

  • Go into the weeds of the math less

SLIDE 6

Learning Objectives

  • Learn about Expectation-Maximization algorithm
  • Learn about latent Dirichlet allocation

SLIDES 7–17

Gaussian Mixture Models

[Figure: data points plotted in the x1–x2 plane, with axes running from −4 to 4]

SLIDE 18

Latent Variables

  • z’s correspond to the latent structure that we try to learn in unsupervised learning
  • From a modeling perspective, they are usually referred to as latent variables
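
One way to make this concrete (standard mixture-model notation, assumed here rather than taken from the extracted slides): in a Gaussian mixture, each point x is generated by first drawing a component z ∈ {1, …, K} with probability πz, then drawing x ∼ N(μz, Σz), so that p(x) = Σk πk N(x | μk, Σk). The z’s are never observed, which is what makes them latent.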

SLIDES 19–22

EM Algorithm

SLIDE 23

EM Algorithm

  • EM stands for Expectation-Maximization
  • A classic algorithm introduced by Dempster, Laird, and Rubin (1977)
  • An iterative method that alternates between an expectation (E) step and a maximization (M) step; a minimal sketch for a Gaussian mixture follows below
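
Below is a minimal sketch of EM for a one-dimensional Gaussian mixture, written to make the E-step/M-step loop concrete. The 1-D setting, variable names, and initialization are my assumptions for illustration, not the lecture's notation:

```python
# A minimal sketch of EM for a 1-D Gaussian mixture (illustrative; the
# 1-D setting and variable names are assumptions, not the lecture's notation).
import numpy as np

def em_gmm(x, K=2, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=K, replace=False)   # initialize means from the data
    var = np.full(K, x.var())                   # shared initial variance
    pi = np.full(K, 1.0 / K)                    # uniform mixing weights
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = p(z_i = k | x_i, current params)
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft counts
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

# Toy data: two well-separated Gaussians
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])
print(em_gmm(x))
```

The E-step computes soft assignments (responsibilities) under the current parameters; the M-step re-estimates the parameters from those soft counts; iterating never decreases the data log-likelihood.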

SLIDES 24–35

EM Algorithm

SLIDE 36

GMM and K-means

SLIDE 37

GMMs and the EM algorithm

  • GMMs with the EM algorithm suffer from some of the same problems as K-Means:
  • Don’t really work with categorical data
  • Usually converge only to a local optimum of the likelihood
  • Have to determine the number of clusters in advance
  • Only generate convex clusters
  • But they also have certain advantages:
  • The clusters are allowed different shapes
  • We get a soft partitioning of the data (see the sketch below)
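
As a concrete illustration of the soft-vs-hard contrast, here is a hypothetical comparison using scikit-learn's KMeans and GaussianMixture (the library choice and toy data are mine, not the lecture's):

```python
# Hypothetical comparison (library choice is mine, not the lecture's):
# KMeans returns hard labels; GaussianMixture returns soft responsibilities.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2.0, 0.0], [1.0, 0.3], size=(200, 2)),  # elongated cluster
               rng.normal([3.0, 3.0], [0.5, 1.5], size=(200, 2))])

hard = KMeans(n_clusters=2, n_init=10).fit_predict(X)   # one label per point
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
soft = gmm.predict_proba(X)                             # responsibilities in [0, 1]
print(hard[:3], soft[:3].round(2))
```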

SLIDE 38

Topic models

  • Discrete count data

SLIDE 39

Topic models

  • Suppose you have a huge number of documents
  • Want to know what’s going on
  • Can’t read them all (e.g., every New York Times article from the 90’s)
  • Topic models offer a way to get a corpus-level view of major themes
  • Unsupervised

SLIDE 40

Conceptual approach

  • Input: a text corpus and number of topics K
  • Output (a code sketch follows the corpus example below):
  • K topics, each topic is a list of words
  • Topic assignment for each document

Corpus (example New York Times headlines):
  • Forget the Bootleg, Just Download the Movie Legally
  • Multiplex Heralded As Linchpin To Growth
  • The Shape of Cinema, Transformed At the Click of a Mouse
  • A Peaceful Crew Puts Muppets Where Its Mouth Is
  • Stock Trades: A Better Deal For Investors Isn't Simple
  • The three big Internet portals begin to distinguish among themselves as shopping malls
  • Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens
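
To make this input/output contract concrete, here is a hypothetical sketch using scikit-learn's CountVectorizer and LatentDirichletAllocation (the library, the toy documents, and K=2 are my assumptions; the lecture does not prescribe an implementation):

```python
# A hypothetical sketch of the input/output contract with scikit-learn
# (the lecture does not prescribe a library; the toy documents are mine).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["download the movie legally", "stock trades for investors",
        "internet portals as shopping malls", "the shape of cinema"]
K = 2                                              # number of topics

vec = CountVectorizer()
X = vec.fit_transform(docs)                        # document-word count matrix
lda = LatentDirichletAllocation(n_components=K, random_state=0).fit(X)

vocab = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):        # each topic: weights over words
    top = topic.argsort()[::-1][:5]
    print(f"TOPIC {k + 1}:", [vocab[i] for i in top])
print(lda.transform(X).round(2))                   # topic mixture per document
```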

SLIDE 41

Conceptual approach

  • K topics, each topic is a list of words

TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer

SLIDE 42

Conceptual approach

  • Topic assignment for each document

[Illustration: the corpus headlines shown earlier, each assigned to a topic; the topics are given human labels "TECHNOLOGY", "BUSINESS", and "ENTERTAINMENT"]

SLIDE 43

Topics from Science

SLIDE 44

Why should you care?

  • Neat way to explore/understand corpus collections
  • E-discovery
  • Social media
  • Scientific data
  • NLP Applications
  • Word sense disambiguation
  • Discourse segmentation
  • Psychology: word meaning, polysemy
  • A general way to model count data and a general inference algorithm

SLIDE 45

Topic models

  • Discrete count data
  • Gaussian distributions are not appropriate
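
Concretely (an illustrative example, not from the slides): a document’s word counts are non-negative integers that sum to the document length, e.g. {movie: 3, internet: 1, stock: 0}; a continuous Gaussian density puts mass on negative and fractional counts, while a multinomial is supported on exactly this kind of data.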

SLIDE 46

Generative model: Latent Dirichlet Allocation

  • Generate a document, or a bag of words
  • Blei, Ng, Jordan. Latent Dirichlet Allocation. JMLR, 2003.

SLIDE 47

Generative model: Latent Dirichlet Allocation

  • Generate a document, or a bag of words
  • Multinomial distribution
  • Distribution over discrete outcomes
  • Represented by a non-negative vector that sums to one
  • Picture representation

[Figure: points on the probability simplex, e.g. (1,0,0), (0,1,0), (0,0,1), (1/2,1/2,0), (1/3,1/3,1/3), (1/4,1/4,1/2)]

SLIDE 48

Generative model: Latent Dirichlet Allocation

  • Generate a document, or a bag of words
  • Multinomial distribution
  • Distribution over discrete outcomes
  • Represented by a non-negative vector that sums to one
  • Picture representation
  • Come from a Dirichlet distribution

[Figure: points on the probability simplex, e.g. (1,0,0), (0,1,0), (0,0,1), (1/2,1/2,0), (1/3,1/3,1/3), (1/4,1/4,1/2)]
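
A minimal numpy sketch of this picture (an assumed illustration, not from the slides): a multinomial parameter vector drawn from a Dirichlet always lands on the simplex shown above.

```python
# A minimal numpy sketch (assumed illustration): a multinomial parameter
# vector drawn from a Dirichlet is non-negative and sums to one.
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 1.0, 1.0])      # Dirichlet parameter over 3 outcomes
phi = rng.dirichlet(alpha)             # a point on the simplex, like (1/4, 1/4, 1/2)
print(phi, phi.sum())                  # phi.sum() == 1.0

counts = rng.multinomial(10, phi)      # a bag of 10 discrete outcomes ("words")
print(counts)
```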

SLIDE 49

Generative story

TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer

SLIDE 50

Generative story

[Illustration: the corpus headlines shown earlier, each labeled with its assigned topic (TOPIC 1, TOPIC 2, or TOPIC 3)]

SLIDE 51

Generative story

Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's iTunes music store and other online services ...

TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer


SLIDE 55

Missing component: how to generate a multinomial distribution

SLIDE 58

Conjugacy of Dirichlet and Multinomial

  • If φ ∼ Dir(α), w ∼ Mult(φ), and nk = |{wi : wi = k}|, then

    p(φ | α, w) ∝ p(w | φ) p(φ | α)            (1)

SLIDE 59

Conjugacy of Dirichlet and Multinomial

  • If φ ∼ Dir(α), w ∼ Mult(φ), and nk = |{wi : wi = k}|, then

    p(φ | α, w) ∝ p(w | φ) p(φ | α)            (1)
               ∝ ∏k φk^nk · ∏k φk^(αk − 1)     (2)
               ∝ ∏k φk^(αk + nk − 1)           (3)

  • Conjugacy: this posterior has the same form as the prior (a Dirichlet with parameter α + n)
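
A quick numerical sanity check of equation (3) (a hypothetical sketch; the numbers and names are mine): after observing counts n, the posterior is Dir(α + n), so its mean should track the true parameter.

```python
# A hypothetical numerical check of the conjugate update in eq. (3):
# with prior Dir(alpha) and observed counts n, the posterior is Dir(alpha + n).
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 1.0, 1.0])           # Dirichlet prior parameter
phi_true = rng.dirichlet(alpha)             # "true" multinomial parameter
n = rng.multinomial(1000, phi_true)         # counts n_k from 1000 draws

posterior = alpha + n                       # posterior is Dir(alpha + n)
phi_mean = posterior / posterior.sum()      # posterior mean of phi
print(phi_true.round(3), phi_mean.round(3)) # the two should be close
```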
SLIDES 60–63

Making the generative story formal

[Plate diagram: hyperparameters λ and α; topic-word distributions βk on a plate over K topics; per-document topic proportions θd, topic assignments zn, and observed words wn on plates over the N word positions within each of M documents]

  • For each topic k ∈ {1, . . . , K}, draw a multinomial distribution βk from a Dirichlet distribution with parameter λ
  • For each document d ∈ {1, . . . , M}, draw a multinomial distribution θd from a Dirichlet distribution with parameter α
  • For each word position n ∈ {1, . . . , N}, select a hidden topic zn from the multinomial distribution parameterized by θd
  • Choose the observed word wn from the distribution βzn
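
Putting the four steps together, here is a minimal numpy sketch of this generative story (an assumed illustration with made-up sizes and hyperparameters, not code from the lecture):

```python
# A minimal numpy sketch of LDA's generative story (an assumed illustration,
# not code from the lecture); vocabulary indices stand in for words.
import numpy as np

rng = np.random.default_rng(0)
K, M, N, V = 3, 5, 20, 50            # topics, documents, words per doc, vocab size
lam, alpha = 0.1, 0.5                # Dirichlet hyperparameters lambda and alpha

beta = rng.dirichlet(np.full(V, lam), size=K)     # topic-word distributions beta_k
docs = []
for d in range(M):
    theta = rng.dirichlet(np.full(K, alpha))      # document's topic proportions theta_d
    z = rng.choice(K, size=N, p=theta)            # hidden topic z_n per word position
    w = [rng.choice(V, p=beta[zn]) for zn in z]   # word w_n drawn from beta[z_n]
    docs.append(w)
print(docs[0])                                    # one synthetic document
```

Inference runs this story in reverse: given only the observed words w, it fills in β, θ, and z (next lecture).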
SLIDE 64

Topic models: What’s important

  • Topic models (latent variables)
  • Topics to word types: a multinomial distribution
  • Documents to topics: a multinomial distribution
  • Modeling & Algorithm
  • Model: a story of how your data came to be
  • Latent variables: the missing pieces of your story
  • Statistical inference: filling in those missing pieces (next lecture)
  • We use latent Dirichlet allocation (LDA), a fully Bayesian version of pLSI, which is itself a probabilistic version of LSA

SLIDE 65

Recap

  • Expectation-Maximization: a general algorithm for mixture models
  • Topic models: a neat way to model discrete count data
