Machine Learning 2
DS 4420 - Spring 2020
Topic Modeling 2
Byron C. Wallace
Last time: Topic Modeling!
Word Mixtures
Idea: Model text as a mixture over words (ignore order)
Topics (distributions over words):
– gene 0.04, dna 0.02, genetic 0.01, ...
– life 0.02, evolve 0.01, ...
– brain 0.04, neuron 0.02, nerve 0.01, ...
– data 0.02, number 0.02, computer 0.01, ...
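Because word order is ignored, a document's likelihood under one word distribution depends only on its word counts. A minimal Python sketch, assuming a hypothetical three-word distribution (the numbers are illustrative, not the slide's):

```python
import math
from collections import Counter

def bag_of_words_log_likelihood(doc_tokens, word_probs):
    """Log-probability of a document under a single word distribution.

    Word order is ignored: only how often each word occurs matters.
    """
    counts = Counter(doc_tokens)
    return sum(n * math.log(word_probs[w]) for w, n in counts.items())

# Hypothetical "genetics" word distribution (illustrative numbers only).
genetics = {"gene": 0.5, "dna": 0.3, "genetic": 0.2}
ll_a = bag_of_words_log_likelihood(["gene", "dna", "gene"], genetics)
ll_b = bag_of_words_log_likelihood(["dna", "gene", "gene"], genetics)
# Same words in a different order give the same likelihood.
```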
Idea: Model a corpus of documents with shared topics
– Topics (shared): gene 0.04, dna 0.02, genetic 0.01, ...; life 0.02, evolve 0.01, ...; brain 0.04, neuron 0.02, nerve 0.01, ...; data 0.02, number 0.02, computer 0.01, ...
– Words in document (mixture over topics), each with a topic assignment
– Topic proportions (document-specific)
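With shared topics, each document's word distribution is the mixture p(w | d) = Σk θd,k βk,w. A small NumPy sketch with made-up toy values for θ and β (both hypothetical):

```python
import numpy as np

# Hypothetical toy setup: K = 2 shared topics over a V = 4 word vocabulary.
vocab = ["gene", "dna", "brain", "neuron"]
beta = np.array([
    [0.6, 0.4, 0.0, 0.0],  # topic 0: genetics words
    [0.0, 0.0, 0.5, 0.5],  # topic 1: neuroscience words
])  # shape (K, V); each row sums to 1

# Document-specific topic proportions for two documents, shape (D, K).
theta = np.array([
    [0.9, 0.1],
    [0.3, 0.7],
])

# p(w | d) = sum_k theta[d, k] * beta[k, w]: each document mixes the shared topics.
p_word_given_doc = theta @ beta  # shape (D, V)
```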
Generative model, fitted by alternating:
– E-step: update assignments
– M-step: update parameters
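The alternating E/M steps can be sketched for a PLSA-style model (no priors). This is a minimal NumPy version under stated assumptions: a dense (D, V) count matrix, every document non-empty, and random initialisation; `plsa_em` is a hypothetical helper name. The E-step computes soft assignments of each (document, word) pair to topics; the M-step re-normalises the expected counts.

```python
import numpy as np

def plsa_em(counts, K, n_iter=50, seed=0):
    """Fit PLSA-style topics by EM (hypothetical helper, no priors).

    counts: (D, V) array of word counts per document (no empty documents).
    Returns theta (D, K), beta (K, V), and the log-likelihood after each iteration.
    """
    rng = np.random.default_rng(seed)
    D, V = counts.shape
    # Random initialisation of document-topic and topic-word distributions.
    theta = rng.dirichlet(np.ones(K), size=D)  # (D, K)
    beta = rng.dirichlet(np.ones(V), size=K)   # (K, V)
    history = []
    for _ in range(n_iter):
        # E-step: responsibilities q[d, v, k] proportional to theta[d, k] * beta[k, v].
        joint = theta[:, None, :] * beta.T[None, :, :]  # (D, V, K)
        q = joint / np.maximum(joint.sum(axis=2, keepdims=True), 1e-300)
        # M-step: re-estimate parameters from expected counts n[d, v] * q[d, v, k].
        expected = counts[:, :, None] * q                # (D, V, K)
        beta = expected.sum(axis=0).T                    # (K, V)
        beta /= beta.sum(axis=1, keepdims=True)
        theta = expected.sum(axis=1)                     # (D, K)
        theta /= theta.sum(axis=1, keepdims=True)
        # Log-likelihood of the observed counts under the updated parameters.
        p = theta @ beta                                 # (D, V)
        history.append(float((counts * np.log(p + 1e-300)).sum()))
    return theta, beta, history
```

Because this is EM, the recorded log-likelihood should not decrease from one iteration to the next, which makes a handy sanity check.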
Graphical model (plates: N words per document, D documents, K topics):
– α: proportions parameter
– θd: per-document topic proportions
– Zd,n: per-word topic assignment
– Wd,n: observed word
– βk: topics
– η: topic parameter
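Reading the graphical model as a sampling procedure gives the LDA generative process. A sketch assuming symmetric Dirichlet priors and fixed-length documents; `sample_lda_corpus` is a hypothetical name, and very small α or η may need a log-space sampler to avoid numerical underflow.

```python
import numpy as np

def sample_lda_corpus(D, N, K, V, alpha, eta, seed=0):
    """Draw a synthetic corpus from the LDA generative process.

    D documents of N words each, K topics over a vocabulary of size V;
    alpha and eta are symmetric Dirichlet parameters.
    """
    rng = np.random.default_rng(seed)
    # Topics: beta_k ~ Dir(eta), one distribution over words per topic.
    beta = rng.dirichlet(np.full(V, eta), size=K)  # (K, V)
    theta = np.empty((D, K))
    z = np.empty((D, N), dtype=int)
    w = np.empty((D, N), dtype=int)
    for d in range(D):
        # Per-document topic proportions theta_d ~ Dir(alpha).
        theta[d] = rng.dirichlet(np.full(K, alpha))
        for n in range(N):
            # Per-word topic assignment z_{d,n} ~ Mult(theta_d).
            z[d, n] = rng.choice(K, p=theta[d])
            # Observed word w_{d,n} ~ Mult(beta_{z_{d,n}}).
            w[d, n] = rng.choice(V, p=beta[z[d, n]])
    return beta, theta, z, w
```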
LDA, a.k.a. PLSI/PLSA with priors
Common choice in LDA: αk = 0.001
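A small α concentrates each document's proportions on very few topics. Sampling Dir(0.001) directly with gamma draws can underflow to all zeros, so this sketch uses the identity Gamma(α) = Gamma(α + 1) · U^(1/α), U ∼ Uniform(0, 1], evaluated in log space (an implementation choice for this sketch, not something the slide prescribes); the helper name is hypothetical.

```python
import numpy as np

def sample_dirichlet_log_space(alpha, K, size, rng):
    """Sample symmetric Dirichlet(alpha) vectors, stable for very small alpha.

    Uses Gamma(alpha) = Gamma(alpha + 1) * U**(1/alpha) computed in log space,
    then normalises the gamma draws with a softmax.
    """
    u = 1.0 - rng.random(size=(size, K))  # U in (0, 1], so log(U) is finite
    log_g = np.log(rng.gamma(alpha + 1.0, size=(size, K))) + np.log(u) / alpha
    log_g -= log_g.max(axis=1, keepdims=True)  # subtract the max for stability
    g = np.exp(log_g)
    return g / g.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
sparse = sample_dirichlet_log_space(0.001, K=10, size=1000, rng=rng)
dense = sample_dirichlet_log_space(10.0, K=10, size=1000, rng=rng)
# Compare the average weight on each document's single largest topic.
sparse_top = sparse.max(axis=1).mean()
dense_top = dense.max(axis=1).mean()
```

With α = 0.001 nearly all of a document's mass lands on one topic, while α = 10 spreads it almost evenly across the 10 topics.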
– Inference: EM (PLSA) → variational inference or MAP inference (LDA)
– LDA can be embedded in more complicated models
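For MAP inference, a Dirichlet prior turns into pseudo-counts in the M-step: the mode of the posterior Dir(n + η) is proportional to n + η − 1. A minimal sketch, assuming a symmetric prior with η ≥ 1 (for η < 1 the mode can sit on the boundary and this formula no longer applies); `map_topic_update` is a hypothetical helper.

```python
import numpy as np

def map_topic_update(expected_counts, eta):
    """MAP M-step for topic-word distributions under a symmetric Dir(eta) prior.

    expected_counts: (K, V) expected word counts per topic (e.g. from an E-step).
    The posterior mode adds (eta - 1) pseudo-counts to every entry; this
    sketch assumes eta >= 1 so that every entry stays non-negative.
    """
    if eta < 1:
        raise ValueError("this sketch assumes eta >= 1")
    smoothed = expected_counts + (eta - 1.0)
    return smoothed / smoothed.sum(axis=1, keepdims=True)
```

With η = 1 the update reduces to the maximum-likelihood (PLSA) M-step.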
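Graphical model with a per-document response (plates: N words, D documents, K topics): α, θd, Zd,n, Wd,n, βk, plus response Yd with parameters η, σ²
Generative process for each document:
1. Draw topic proportions $\theta \mid \alpha \sim \mathrm{Dir}(\alpha)$.
2. For each word:
   (a) Draw topic assignment $z_n \mid \theta \sim \mathrm{Mult}(\theta)$.
   (b) Draw word $w_n \mid z_n, \beta_{1:K} \sim \mathrm{Mult}(\beta_{z_n})$.
3. Draw response variable $y \mid z_{1:N}, \eta, \sigma^2 \sim \mathcal{N}(\eta^\top \bar{z}, \sigma^2)$, where $\bar{z} = \frac{1}{N}\sum_{n=1}^{N} z_n$.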
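Step 3 only needs the empirical topic frequencies z̄ of a document. A sketch assuming integer-coded topic assignments (each z_n treated as a one-hot vector) and a given coefficient vector η; the function name is hypothetical.

```python
import numpy as np

def draw_slda_response(z, K, eta, sigma2, rng):
    """Draw y ~ N(eta^T z_bar, sigma^2) for one document.

    z: length-N array of topic assignments (integers in [0, K)).
    eta: length-K regression coefficients, one per topic.
    z_bar = (1/N) * sum_n z_n with each z_n written as a one-hot vector,
    i.e. the fraction of the document's words assigned to each topic.
    """
    z_bar = np.bincount(z, minlength=K) / len(z)
    mean = eta @ z_bar
    return rng.normal(mean, np.sqrt(sigma2)), z_bar
```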
[Figure: topics learned from movie reviews, positioned along an axis from −30 to 20 by their response coefficient; example topics range from "least problem unfortunately supposed worse flat dull" and "bad guys watchable its not" to "both motion simple perfect fascinating power complex however cinematography screenplay performances ..."]
[Figure: example topics from doctor reviews, e.g. "time doctor appointment rude staff room didn't visit wait", "dr time staff great helpful feel doctor questions friendly really" (Systems, Positive), and "dr doctor best years caring care patients patient recommend family"]
– The prior for (Systems, Negative) shares parameters with (Systems, Positive), which in turn shares parameters with the prior for (Interpersonal, Positive)
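One way to make priors share parameters is to build each (aspect, sentiment) prior from shared components. The sketch below assumes a hypothetical log-additive parameterisation (bias + aspect vector + sentiment vector) with random values; it illustrates the sharing pattern in the bullet, not necessarily the exact model on the slide.

```python
import numpy as np

# Hypothetical factorised parameterisation (an assumption for illustration):
# each (aspect, sentiment) pair gets Dirichlet prior weights over V words whose
# logs are a shared bias plus one vector per aspect and one per sentiment.
V = 5
rng = np.random.default_rng(0)
bias = rng.normal(size=V)
omega = {
    "Systems": rng.normal(size=V),
    "Interpersonal": rng.normal(size=V),
    "Positive": rng.normal(size=V),
    "Negative": rng.normal(size=V),
}

def prior(aspect, sentiment, omega):
    """Dirichlet prior parameters for one (aspect, sentiment) pair."""
    return np.exp(bias + omega[aspect] + omega[sentiment])

pairs = [("Systems", "Negative"), ("Systems", "Positive"), ("Interpersonal", "Positive")]
before = {pair: prior(*pair, omega) for pair in pairs}
# Changing the shared "Systems" vector moves both Systems priors,
# but leaves (Interpersonal, Positive) untouched.
omega_changed = dict(omega, Systems=omega["Systems"] + 1.0)
after = {pair: prior(*pair, omega_changed) for pair in pairs}
```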
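LDA: topic proportions are multinomial parameters sampled from a Dirichlet
Correlated topic model: estimate a covariance matrix Σ that parameterizes correlations between topics in a document
Graphical model (plates: N words, D documents, K topics): µ, Σ, ηd, Zd,n, Wd,n, βk
Nonconjugate prior: $\eta_d \sim \mathcal{N}(\mu, \Sigma)$, mapped to topic proportions by a softmax (logistic normal)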
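The logistic normal replaces the Dirichlet: draw ηd ∼ N(µ, Σ) and map it to proportions with a softmax. A NumPy sketch with a hypothetical 3-topic covariance in which topics 0 and 1 tend to co-occur:

```python
import numpy as np

def sample_ctm_proportions(mu, Sigma, size, rng):
    """Logistic normal draws: eta_d ~ N(mu, Sigma), theta_d = softmax(eta_d)."""
    eta = rng.multivariate_normal(mu, Sigma, size=size)  # (size, K)
    eta = eta - eta.max(axis=1, keepdims=True)           # stabilise the softmax
    theta = np.exp(eta)
    return theta / theta.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
mu = np.zeros(3)
# Hypothetical covariance: topics 0 and 1 strongly correlated, topic 2 independent.
Sigma = np.array([[1.0, 0.9, 0.0],
                  [0.9, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
theta = sample_ctm_proportions(mu, Sigma, size=5000, rng=rng)
```

Because log(θ0/θ1) = η0 − η1, the correlated pair varies far less relative to each other (variance 0.2 in theory) than topic 0 does relative to the independent topic 2 (variance 2).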
Track changes in word distributions associated with a topic over time.
Example: inaugural addresses, from 1789 ("AMONG the vicissitudes incident to life no event could have filled me with greater anxieties than that of which the notification was transmitted by your order...") to 2009 ("My fellow citizens: I stand here today humbled by the task before us, grateful for the trust you have bestowed, mindful...")
Graphical model: one LDA model per time slice (α, θd, Zd,n, Wd,n; plates N, D), with each topic's parameters chained across slices: βk,1 → βk,2 → ... → βk,T
Example topic over time (top words by decade):
– 1880: electric machine power engine steam two machines iron battery wire
– 1890: electric power company steam electrical machine two system motor engine
– 1900: apparatus steam power engine engineering water construction engineer room feet
– 1910: air water engineering apparatus room laboratory engineer made gas tube
– 1920: apparatus tube air pressure water glass gas made laboratory mercury
– 1930: tube apparatus glass air mercury laboratory pressure made gas small
– 1940: air tube apparatus glass laboratory rubber pressure small mercury gas
– 1950: tube apparatus glass air chamber instrument small laboratory pressure rubber
– 1960: tube system temperature air heat chamber power high instrument control
– 1970: air heat power system temperature chamber high flow tube design
– 1980: high power design heat system systems devices instruments control large
– 1990: materials high power current applications technology devices design device heat
– 2000: devices device materials current gate high light silicon material technology
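The chaining βk,1 → ... → βk,T can be sketched as a Gaussian random walk on one topic's natural parameters, with a softmax giving each slice's word distribution. Assumptions here: the initial parameters are standard normal and the drift σ is fixed; `drift_topic` is a hypothetical name.

```python
import numpy as np

def drift_topic(V, T, sigma, rng):
    """Evolve one topic over T time slices with a Gaussian random walk.

    beta_{k,t} ~ N(beta_{k,t-1}, sigma^2 I) on the natural parameters; the word
    distribution at time t is softmax(beta_{k,t}).
    """
    natural = np.empty((T, V))
    natural[0] = rng.normal(size=V)  # assumed standard-normal starting point
    for t in range(1, T):
        natural[t] = natural[t - 1] + sigma * rng.normal(size=V)
    probs = np.exp(natural - natural.max(axis=1, keepdims=True))
    return probs / probs.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# e.g. one row per decade, 1880 to 2000 (13 slices), over a toy 8-word vocabulary
word_dists = drift_topic(V=8, T=13, sigma=0.3, rng=rng)
```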
– A generative model that is readily extensible
– Inference: propose parameter values and keep those that make the observed data likely
– Gibbs sampling: draws individual parameters conditioned on all others
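A standard instance of Gibbs sampling for LDA is the collapsed sampler, which integrates out θ and β and resamples one topic assignment at a time given all the others. A compact, unoptimised sketch (symmetric α and η assumed; the function name is hypothetical):

```python
import numpy as np

def lda_collapsed_gibbs(docs, K, V, alpha, eta, n_iter, seed=0):
    """Collapsed Gibbs sampling for LDA (theta and beta integrated out).

    docs: list of integer arrays of word ids in [0, V).
    Each sweep resamples every topic assignment z_{d,n} from
    p(z = k | other z, w) proportional to
    (n_dk + alpha) * (n_kw + eta) / (n_k + V * eta),
    where all counts exclude the token being resampled.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))  # tokens in document d assigned to topic k
    n_kw = np.zeros((K, V))          # tokens of word w assigned to topic k
    n_k = np.zeros(K)                # total tokens assigned to topic k
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]
                # Remove the current token from the counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional for this token given all other assignments.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + eta) / (n_k + V * eta)
                k = rng.choice(K, p=p / p.sum())
                # Add the token back under its newly sampled topic.
                z[d][n] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw
```

On a toy corpus with two disjoint vocabularies, the documents of each group should end up dominated by a different topic.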