Machine Learning 2, DS 4420 (Spring 2020): Topic Modeling 2, Byron C. Wallace



SLIDE 1

Machine Learning 2

DS 4420 - Spring 2020

Topic Modeling 2

Byron C. Wallace

SLIDE 2

Last time: Topic Modeling!

SLIDE 3

Word Mixtures

Idea: Model text as a mixture over words (ignore order)

Topics, as distributions over words:

  • gene 0.04, dna 0.02, genetic 0.01, ...
  • life 0.02, evolve 0.01, organism 0.01, ...
  • brain 0.04, neuron 0.02, nerve 0.01, ...
  • data 0.02, number 0.02, computer 0.01, ...

SLIDE 4

Topic Modeling

Idea: Model corpus of documents with shared topics

Topics, as distributions over words:

  • gene 0.04, dna 0.02, genetic 0.01, ...
  • life 0.02, evolve 0.01, organism 0.01, ...
  • brain 0.04, neuron 0.02, nerve 0.01, ...
  • data 0.02, number 0.02, computer 0.01, ...

[Figure: topics (shared across the corpus); per-word topic assignments; words in a document (a mixture over topics); topic proportions (document-specific)]

SLIDE 5

Topic Modeling

  • Each topic is a distribution over words
  • Each document is a mixture over topics
  • Each word is drawn from one topic distribution

[Figure: same example topics as before (gene/dna/genetic, life/evolve/organism, brain/neuron/nerve, data/number/computer); topics are shared, topic proportions are document-specific, and each word carries a topic assignment]

SLIDE 6

EM for Word Mixtures (PLSA)

Generative model: E-step updates assignments; M-step updates parameters
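The E-step and M-step can be written out concretely. A minimal numpy sketch of PLSA-style EM on a document-term count matrix (variable names here are illustrative, not from the slides):

```python
import numpy as np

def plsa_em(X, K, n_iter=50, seed=0):
    """EM for PLSA on a document-term count matrix X (D x V).
    E-step: responsibilities r[d, k, v] ∝ theta[d, k] * beta[k, v].
    M-step: re-estimate theta (doc-topic) and beta (topic-word)
    from the expected counts."""
    rng = np.random.default_rng(seed)
    D, V = X.shape
    theta = rng.dirichlet(np.ones(K), size=D)     # doc-topic proportions
    beta = rng.dirichlet(np.ones(V), size=K)      # topic-word distributions
    for _ in range(n_iter):
        # E-step: posterior over topics for each (doc, word) pair
        r = theta[:, :, None] * beta[None, :, :]  # shape (D, K, V)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: expected counts, then renormalize
        c = X[:, None, :] * r                     # shape (D, K, V)
        theta = c.sum(axis=2)
        theta /= theta.sum(axis=1, keepdims=True)
        beta = c.sum(axis=0)
        beta /= beta.sum(axis=1, keepdims=True)
    return theta, beta
```

Each iteration alternates between inferring per-word topic responsibilities and re-fitting the two sets of multinomial parameters from the expected counts.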


SLIDE 9

Today: A Bayesian view — topic modeling with priors (or, LDA)

SLIDE 10

Latent Dirichlet Allocation

(a.k.a. PLSI/PLSA with priors)

[Plate diagram: α → θd → Zd,n → Wd,n, with βk and η; plates over N words, D documents, K topics]

  • α: proportions parameter
  • θd: per-document topic proportions
  • Zd,n: per-word topic assignment
  • Wd,n: observed word
  • βk: topics
  • η: topic parameter
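Read generatively, the plate notation says: draw shared topics βk once, then per document draw proportions θd, and per word a topic assignment and a word. A toy numpy sketch of that story (all sizes and hyperparameters are illustrative):

```python
import numpy as np

def generate_lda_corpus(D=20, N=30, K=3, V=10, alpha=0.5, eta=0.1, seed=0):
    """Draw a toy corpus from the LDA generative story.
    alpha: symmetric Dirichlet prior on per-document topic proportions θd
    eta:   symmetric Dirichlet prior on per-topic word distributions βk"""
    rng = np.random.default_rng(seed)
    beta = rng.dirichlet(np.full(V, eta), size=K)    # shared topics βk
    docs = []
    for _ in range(D):
        theta = rng.dirichlet(np.full(K, alpha))     # θd: topic proportions
        z = rng.choice(K, size=N, p=theta)           # Zd,n: topic assignments
        words = np.array([rng.choice(V, p=beta[k]) for k in z])  # Wd,n
        docs.append(words)
    return docs, beta
```

Inference reverses this process: given only the words, recover plausible topics, proportions, and assignments.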

SLIDE 11

Dirichlet Distribution

SLIDE 12

Dirichlet Distribution

Common choice in LDA: αk = 0.001
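The concentration parameter controls how sparse draws from the Dirichlet are. A quick numpy illustration (using somewhat larger values than the slide's αk = 0.001 to keep the sampler numerically comfortable):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10
# Small symmetric concentration: draws put nearly all mass on a few topics,
# which is what a sparse per-document topic mixture looks like.
sparse_draw = rng.dirichlet(np.full(K, 0.05))
# Large concentration: draws are close to uniform (each entry near 1/K).
dense_draw = rng.dirichlet(np.full(K, 50.0))
```

Small αk therefore encodes the prior belief that each document uses only a few topics.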

SLIDE 13

Estimation via sampling (board)
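One standard sampler for LDA, collapsed Gibbs sampling, integrates out θ and β analytically and resamples each word's topic from its full conditional. A compact sketch (hyperparameters illustrative):

```python
import numpy as np

def gibbs_lda(docs, K, V, alpha=0.1, eta=0.01, n_iter=100, seed=0):
    """Collapsed Gibbs sampling for LDA: theta and beta are integrated out,
    and each word's topic assignment z is resampled conditioned on all
    other assignments."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    ndk = np.zeros((D, K))                       # doc-topic counts
    nkv = np.zeros((K, V))                       # topic-word counts
    nk = np.zeros(K)                             # words per topic
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):               # initialize count tables
        for n, w in enumerate(doc):
            k = z[d][n]
            ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]                      # remove current assignment
                ndk[d, k] -= 1; nkv[k, w] -= 1; nk[k] -= 1
                # full conditional: p(z = k | everything else)
                p = (ndk[d] + alpha) * (nkv[:, w] + eta) / (nk + V * eta)
                k = rng.choice(K, p=p / p.sum())
                z[d][n] = k                      # add back with new topic
                ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1
    return z, ndk, nkv
```

After burn-in, the count tables give point estimates of the topic-word and doc-topic distributions.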

SLIDE 14

Extensions of LDA

  • EM inference (PLSA/PLSI) yields similar results to variational inference or MAP inference (LDA) on most data
  • Reason for popularity of LDA: it can be embedded in more complicated models


SLIDE 16

Extensions: Supervised LDA

[Plate diagram: as in LDA (α, θd, Zd,n, Wd,n over N words, D documents; βk over K topics), plus a per-document response Yd with parameters η, σ²]

1. Draw topic proportions θ | α ∼ Dir(α).
2. For each word:
   • Draw topic assignment zn | θ ∼ Mult(θ).
   • Draw word wn | zn, β1:K ∼ Mult(βzn).
3. Draw response variable y | z1:N, η, σ² ∼ N(η⊤z̄, σ²), where z̄ = (1/N) Σ_{n=1}^{N} zn.
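Steps 1-3 can be sketched directly: the response y is a noisy linear function of the document's empirical topic frequencies z̄. A minimal numpy sketch (names illustrative):

```python
import numpy as np

def slda_generate_doc(beta, alpha, eta_coef, sigma2, N, rng):
    """One document under the sLDA generative process: draw θ, then topic
    assignments and words, then a Gaussian response from the empirical
    topic frequencies z_bar."""
    K, V = beta.shape
    theta = rng.dirichlet(np.full(K, alpha))                 # step 1
    z = rng.choice(K, size=N, p=theta)                       # step 2: topics
    words = np.array([rng.choice(V, p=beta[k]) for k in z])  # step 2: words
    z_bar = np.bincount(z, minlength=K) / N                  # empirical z_bar
    y = rng.normal(eta_coef @ z_bar, np.sqrt(sigma2))        # step 3: response
    return words, y
```

Because y depends only on z̄, fitting sLDA pushes the topics toward partitions of the vocabulary that also predict the response.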


SLIDE 20

Extensions: Supervised LDA

[Figure: sLDA fit to movie reviews; topics placed by their response coefficients (roughly −30 to +20). High-coefficient topics contain words like motion, simple, perfect, fascinating, cinematography, screenplay, performances; low-coefficient topics contain words like bad, watchable, awful, dull, flat, worse, unfortunately]

SLIDE 23

Extensions: Analyzing RateMDs ratings via “Factorial LDA”

SLIDE 25

Factors

SLIDE 26

Factorial LDA

  • We use f-LDA to model topic and sentiment
  • Each (topic, sentiment) pair has a word distribution
  • e.g. (Systems/Staff, Negative): office, time, doctor, appointment, rude, staff, room, didn't, visit, wait

SLIDE 27

Factorial LDA

  • We use f-LDA to model topic and sentiment
  • Each (topic, sentiment) pair has a word distribution
  • e.g. (Systems/Staff, Positive): dr, time, staff, great, helpful, feel, questions, office, really, friendly

SLIDE 28

Factorial LDA

  • We use f-LDA to model topic and sentiment
  • Each (topic, sentiment) pair has a word distribution
  • e.g. (Interpersonal, Positive): dr, doctor, best, years, caring, care, patients, patient, recommend, family

SLIDE 29

  • Why should the word distributions for pairs make any sense?
  • Parameters are tied across the priors of each word distribution
    – The prior for (Systems, Negative) shares parameters with (Systems, Positive), which shares parameters with the prior for (Interpersonal, Positive)
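The tying idea can be sketched loosely: build each pair's Dirichlet prior from a shared base vector plus per-factor weight vectors, so pairs sharing a factor get correlated priors. This is an illustrative parameterization, not the exact one from the Factorial LDA paper; all names below are hypothetical:

```python
import numpy as np

def pair_prior(base, topic_w, sent_w):
    """Hypothetical tying scheme: the Dirichlet prior over words for a
    (topic, sentiment) pair combines a shared base vector with per-topic
    and per-sentiment weight vectors (all length V); exponentiating keeps
    the Dirichlet parameters positive."""
    return np.exp(base + topic_w + sent_w)

V = 5
rng = np.random.default_rng(0)
base = rng.normal(size=V)
topic_weights = {"Systems": rng.normal(size=V)}
sent_weights = {"Positive": rng.normal(size=V), "Negative": rng.normal(size=V)}
# (Systems, Positive) and (Systems, Negative) share the base and topic
# vectors, so their priors are correlated, as the slide describes.
prior_pos = pair_prior(base, topic_weights["Systems"], sent_weights["Positive"])
prior_neg = pair_prior(base, topic_weights["Systems"], sent_weights["Negative"])
```

Sharing the per-topic vector is what makes, say, "office" likely under both the positive and negative Systems/Staff distributions.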

SLIDE 30

Systems Positive

[Figure: word lists for the (Systems, Positive) pair: staff, time, office, questions, wait, helpful, nice, feel, great, appointment, nurse, recommend, wonderful, highly, knowledgeable, professional, kind, dr, best, amazing; and dr, time, staff, great, helpful, feel, doctor, questions, office, friendly, really]

SLIDE 31

Systems Positive

[Figure: two similar word lists for the (Systems, Positive) pair (dr, time, staff, great, helpful, feel, doctor, questions, office, friendly, really), whose multinomial parameters are sampled from a Dirichlet prior]

SLIDE 33

Extensions: Correlated Topic Model

Estimate a covariance matrix Σ that parameterizes correlations between topics in a document

[Plate diagram: µ, Σ → per-document ηd → Zd,n → Wd,n, with topics βk; plates over N words, D documents, K topics]

Nonconjugate prior on topic proportions
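Concretely, the CTM replaces the Dirichlet with a logistic-normal: draw η from a Gaussian with covariance Σ and map it onto the simplex with a softmax. A sketch (the Σ values are illustrative):

```python
import numpy as np

def ctm_topic_proportions(mu, Sigma, rng):
    """Correlated topic model: draw eta ~ N(mu, Sigma), then map to the
    simplex with a softmax (a logistic-normal draw). Unlike a Dirichlet,
    Sigma lets topic proportions co-vary, but the prior is nonconjugate
    to the multinomial."""
    eta = rng.multivariate_normal(mu, Sigma)
    e = np.exp(eta - eta.max())        # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.8, -0.5],    # topics 0 and 1 tend to co-occur,
                  [0.8, 1.0, -0.5],    # topic 2 anti-correlates with both
                  [-0.5, -0.5, 1.0]])
theta = ctm_topic_proportions(mu, Sigma, rng)
```

The price of the richer prior is that conjugate closed-form updates no longer exist, so inference needs approximations.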
SLIDE 34

Extensions: Dynamic Topic Models

Track changes in word distributions 
 associated with a topic over time.

AMONG the vicissitudes incident to life no event could have filled me with greater anxieties than that of which the notification was transmitted by your order...

1789

My fellow citizens: I stand here today humbled by the task before us, grateful for the trust you have bestowed, mindful of the sacrifices borne by our ancestors...

2009

Inaugural addresses
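One way to realize "topics that change over time": let a topic's natural (log-space) parameters drift as a Gaussian random walk across time slices, softmaxing at each slice. This is a sketch of the idea, not the dynamic topic model's exact inference procedure:

```python
import numpy as np

def dtm_topic_chain(V, T, sigma=0.1, seed=0):
    """Dynamic topic model sketch: one topic's natural parameters evolve
    as a Gaussian random walk over T time slices; a softmax at each slice
    yields the word distribution beta_{k,t}."""
    rng = np.random.default_rng(seed)
    b = rng.normal(size=V)                  # initial natural parameters
    betas = []
    for _ in range(T):
        b = b + sigma * rng.normal(size=V)  # beta_{k,t} | beta_{k,t-1}
        e = np.exp(b - b.max())             # stable softmax
        betas.append(e / e.sum())
    return np.array(betas)                  # shape (T, V)
```

Small sigma makes adjacent slices nearly identical, so a topic's vocabulary drifts slowly, as in the 1880-2000 technology topic on the next slide.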

SLIDE 35

Extensions: Dynamic Topic Models

[Plate diagram: one LDA model per time slice (α, θd, Zd,n, Wd,n over D documents and N words), with topic parameters chained over time: βk,1 → βk,2 → ... → βk,T]

SLIDE 36

Extensions: Dynamic Topic Models

1880: electric machine power engine steam two machines iron battery wire
1890: electric power company steam electrical machine two system motor engine
1900: apparatus steam power engine engineering water construction engineer room feet
1910: air water engineering apparatus room laboratory engineer made gas tube
1920: apparatus tube air pressure water glass gas made laboratory mercury
1930: tube apparatus glass air mercury laboratory pressure made gas small
1940: air tube apparatus glass laboratory rubber pressure small mercury gas
1950: tube apparatus glass air chamber instrument small laboratory pressure rubber
1960: tube system temperature air heat chamber power high instrument control
1970: air heat power system temperature chamber high flow tube design
1980: high power design heat system systems devices instruments control large
1990: materials high power current applications technology devices design device heat
2000: devices device materials current gate high light silicon material technology

SLIDE 37

Summing up

  • Latent Dirichlet Allocation (LDA) is a Bayesian topic model that is readily extensible
  • To estimate parameters, we used a sampling-based approach. General idea: draw samples of parameters and keep those that make the observed data likely
  • Gibbs sampling is a particular variant of this approach, and draws individual parameters conditioned on all others