
Machine Learning 2, DS 4420 (Spring 2020): Topic Modeling 2 (Byron C. Wallace)

  1. Machine Learning 2, DS 4420 (Spring 2020): Topic Modeling 2. Byron C. Wallace

  2. Last time: Topic Modeling!

  3. Word Mixtures
 Idea: Model text as a mixture over words (ignore order).
 Example topics, each a distribution over words:
 • gene 0.04, dna 0.02, genetic 0.01, ...
 • life 0.02, evolve 0.01, organism 0.01, ...
 • brain 0.04, neuron 0.02, nerve 0.01, ...
 • data 0.02, number 0.02, computer 0.01, ...

  4. Topic Modeling
 Idea: Model a corpus of documents with shared topics.
 Diagram: shared topics; per-document topic proportions (each document is a mixture over topics); per-word topic assignments; observed words in the document.

  5. Topic Modeling
 • Each topic is a distribution over words
 • Each document is a mixture over topics
 • Each word is drawn from one topic distribution
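The mixture semantics fit in one line: the probability of word w in document d marginalizes over topics, p(w | d) = Σ_k θ_{d,k} β_{k,w}. A tiny numpy illustration (all numbers made up, vocabulary truncated to three words):

```python
import numpy as np

# p(w|d) = sum_k theta[k] * beta[k,w]: the document mixes topic-word
# distributions according to its topic proportions theta.
theta = np.array([0.7, 0.3])            # doc-specific mixture over 2 topics
beta = np.array([[0.04, 0.02, 0.01],    # topic 1 over a truncated 3-word vocab
                 [0.01, 0.02, 0.04]])   # topic 2
p_w_given_d = theta @ beta
print(p_w_given_d)                      # [0.031 0.02  0.019]
```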

  6. EM for Word Mixtures (PLSA)
 Generative model; E-step: update assignments; M-step: update parameters.

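The E- and M-step formulas on the original slides were images and didn't survive extraction. As a stand-in, here is a minimal numpy sketch of EM for PLSA on a document-term count matrix; the factorization into P(z|d) and P(w|z) is standard PLSA, but all names and defaults below are mine:

```python
import numpy as np

def plsa_em(counts, K, n_iters=100, seed=0):
    """EM for PLSA. counts: D x V document-term matrix.
    Learns P(z|d) (doc-topic mixtures) and P(w|z) (topic-word dists)."""
    rng = np.random.default_rng(seed)
    D, V = counts.shape
    p_z_d = rng.dirichlet(np.ones(K), size=D)   # P(z|d), shape D x K
    p_w_z = rng.dirichlet(np.ones(V), size=K)   # P(w|z), shape K x V

    for _ in range(n_iters):
        # E-step: update assignments -- posterior over the topic that
        # generated each (document, word) pair: resp proportional to P(z|d) P(w|z)
        resp = p_z_d[:, None, :] * p_w_z.T[None, :, :]   # D x V x K
        resp /= resp.sum(axis=2, keepdims=True) + 1e-12

        # M-step: update parameters from expected counts
        expected = counts[:, :, None] * resp             # D x V x K
        p_w_z = expected.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = expected.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_z_d, p_w_z
```

The dense D x V x K responsibility tensor keeps the sketch short; a real implementation would iterate over nonzero counts only.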

  9. Today: A Bayesian view — topic modeling with priors (or, LDA)

  10. Latent Dirichlet Allocation (a.k.a. PLSI/PLSA with priors)
 Plate diagram:
 • α: proportions parameter
 • θ_d: per-document topic proportions
 • Z_d,n: per-word topic assignment
 • W_d,n: observed word
 • β_k: topics (distributions over words)
 • η: topic parameter
 Plates: N words per document, D documents, K topics.
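Read as a sampling procedure, the plate diagram says: draw each topic β_k ∼ Dir(η), then for each document draw θ_d ∼ Dir(α) and, per word, a topic assignment and a word. A toy sampler (sizes and hyperparameter values are illustrative):

```python
import numpy as np

def generate_corpus(D=100, K=5, V=1000, N=50, alpha=0.1, eta=0.01, seed=0):
    """Sample a toy corpus from the LDA generative model."""
    rng = np.random.default_rng(seed)
    beta = rng.dirichlet(np.full(V, eta), size=K)     # topics: beta_k ~ Dir(eta)
    docs = []
    for _ in range(D):
        theta = rng.dirichlet(np.full(K, alpha))      # theta_d ~ Dir(alpha)
        z = rng.choice(K, size=N, p=theta)            # z_{d,n} ~ Mult(theta_d)
        w = np.array([rng.choice(V, p=beta[k]) for k in z])  # w_{d,n} ~ Mult(beta_{z_{d,n}})
        docs.append(w)
    return docs, beta
```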

  11. Dirichlet Distribution

  12. Dirichlet Distribution. Common choice in LDA: α_k = 0.001.
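To see what α_k = 0.001 does, draw from symmetric Dirichlets at a few concentrations: tiny α pushes nearly all mass onto a single topic, encoding the prior belief that each document uses few topics. A quick numpy check (K = 10 chosen arbitrarily):

```python
import numpy as np
rng = np.random.default_rng(0)

# Small alpha -> sparse draws (one dominant topic); large alpha -> near-uniform.
for alpha in (0.001, 0.1, 1.0, 10.0):
    theta = rng.dirichlet(np.full(10, alpha))
    print(f"alpha={alpha:>6}: largest component = {theta.max():.3f}")
```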

  13. Estimation via sampling (board)
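The board derivation isn't captured in these slides. As a concrete reference point, below is a minimal collapsed Gibbs sampler for LDA (the sampling approach named in the summary slide), with θ and β integrated out so only the topic assignments z are resampled; all variable names are mine, and docs is assumed to be a list of integer word-id arrays:

```python
import numpy as np

def gibbs_lda(docs, K, V, alpha=0.1, eta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (theta and beta integrated out)."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    ndk = np.zeros((D, K))   # topic counts per document
    nkw = np.zeros((K, V))   # word counts per topic
    nk = np.zeros(K)         # total words assigned to each topic
    z = [rng.integers(K, size=len(doc)) for doc in docs]  # random init
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]
                # remove this word's current assignment from the counts
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # conditional P(z = k | all other assignments, words)
                p = (ndk[d] + alpha) * (nkw[:, w] + eta) / (nk + V * eta)
                k = rng.choice(K, p=p / p.sum())
                z[d][n] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return z, ndk, nkw
```

Each sweep resamples every assignment from its conditional given all the others; after burn-in, the count matrices give posterior estimates of the topics and proportions.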

  14. Extensions of LDA
 • EM inference (PLSA/PLSI) yields results similar to variational inference or MAP inference (LDA) on most data.
 • The reason for LDA's popularity: it can be embedded in more complicated models.


  16. Extensions: Supervised LDA
 Plate diagram: as in LDA, plus a per-document response Y_d with parameters η, σ².
 1. Draw topic proportions θ | α ∼ Dir(α).
 2. For each word:
    • Draw topic assignment z_n | θ ∼ Mult(θ).
    • Draw word w_n | z_n, β_1:K ∼ Mult(β_z_n).
 3. Draw response variable y | z_1:N, η, σ² ∼ N(η^⊤ z̄, σ²), where z̄ = (1/N) Σ_{n=1}^{N} z_n.
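Step 3 is the only change from vanilla LDA: the empirical topic frequencies z̄ feed a linear-Gaussian model for the document's response. A small numpy illustration of that step (η, σ², and sizes are made-up values):

```python
import numpy as np
rng = np.random.default_rng(0)

K, N = 5, 50
theta = rng.dirichlet(np.full(K, 0.1))        # theta | alpha ~ Dir(alpha)
z = rng.choice(K, size=N, p=theta)            # z_n | theta ~ Mult(theta)
z_bar = np.bincount(z, minlength=K) / N       # z_bar = (1/N) * sum_n z_n
eta, sigma2 = rng.normal(size=K), 0.5         # illustrative response params
y = rng.normal(eta @ z_bar, np.sqrt(sigma2))  # y ~ N(eta^T z_bar, sigma2)
```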


  20. Extensions: Supervised LDA
 [Figure: movie-review topics plotted by their estimated regression coefficients (roughly −30 to +20); low-coefficient topics contain words like "least", "bad", "awful", "unfortunately", "worse", "dull"; high-coefficient topics contain words like "perfect", "fascinating", "complex", "effective".]

  21. Extensions: Analyzing RateMDs ratings via “Factorial LDA”

  22. Factors

  23. Factorial LDA
 • We use f-LDA to model topic and sentiment
 • Each (topic, sentiment) pair has a word distribution
 • e.g. (Systems/Staff, Negative): office, time, doctor, appointment, rude, staff, room, didn't, visit, wait

  24. Factorial LDA
 • We use f-LDA to model topic and sentiment
 • Each (topic, sentiment) pair has a word distribution
 • e.g. (Systems/Staff, Positive): dr, time, staff, great, helpful, feel, questions, office, really, friendly

  25. Factorial LDA
 • We use f-LDA to model topic and sentiment
 • Each (topic, sentiment) pair has a word distribution
 • e.g. (Interpersonal, Positive): dr, doctor, best, years, caring, care, patients, patient, recommend, family

  26. • Why should the word distributions for pairs make any sense?
 • Parameters are tied across the priors of each word distribution
  – The prior for (Systems, Negative) shares parameters with (Systems, Positive), which shares parameters with the prior for (Interpersonal, Positive)
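One way to picture the tying (a simplified sketch, not the exact f-LDA parameterization): give every (topic, sentiment) pair a Dirichlet prior whose mean combines a shared background vector, a per-topic vector, and a per-sentiment vector, so pairs that share a factor share parameters:

```python
import numpy as np
rng = np.random.default_rng(0)

V, n_topics, n_sents = 1000, 4, 2
background = rng.normal(size=V)           # shared across all pairs
topic_w = rng.normal(size=(n_topics, V))  # shared across sentiments
sent_w = rng.normal(size=(n_sents, V))    # shared across topics

def pair_prior(t, s, concentration=0.1):
    # Dirichlet parameters for the (topic t, sentiment s) word distribution:
    # (Systems, Negative) and (Systems, Positive) share topic_w[t].
    logits = background + topic_w[t] + sent_w[s]
    mean = np.exp(logits) / np.exp(logits).sum()
    return concentration * V * mean

phi = rng.dirichlet(pair_prior(0, 1))     # draw one pair's word distribution
```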

  27. [Figure: several word lists sampled for the (Systems, Positive) pair; top words include staff, recommend, dr, time, wonderful, office, highly, questions, knowledgeable, great, wait, professional, helpful, kind, feel, nice, doctor, best, appointment, friendly, nurse, amazing, really.]

  28. [Figure: (Systems, Positive) multinomial parameters sampled from the shared Dirichlet prior; top words: dr, time, staff, great, helpful, feel, questions, office, doctor, really, friendly.]

  29. Extensions: Correlated Topic Model
 Plate diagram: per-document topic proportions drawn with mean µ and covariance Σ, then used as in LDA (Z_d,n, W_d,n, β_k; plates N, D, K).
 • Nonconjugate (logistic-normal) prior on topic proportions
 • Estimate a covariance matrix Σ that parameterizes correlations between topics in a document
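The logistic-normal replaces the Dirichlet: draw η_d from a Gaussian with covariance Σ and push it through a softmax, so off-diagonal entries of Σ make topics co-occur. A minimal sketch with a made-up 3-topic covariance:

```python
import numpy as np
rng = np.random.default_rng(0)

mu = np.zeros(3)
Sigma = np.array([[ 1.0,  0.8, -0.5],    # topics 0 and 1 tend to co-occur,
                  [ 0.8,  1.0, -0.3],    # both anti-correlate with topic 2
                  [-0.5, -0.3,  1.0]])
eta_d = rng.multivariate_normal(mu, Sigma)
theta_d = np.exp(eta_d) / np.exp(eta_d).sum()   # softmax onto the simplex
```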

  30. Extensions: Dynamic Topic Models
 Inaugural addresses, 1789–2009:
 1789: "AMONG the vicissitudes incident to life no event could have filled me with greater anxieties than that of which the notification was transmitted by your order..."
 2009: "My fellow citizens: I stand here today humbled by the task before us, grateful for the trust you have bestowed, mindful of the sacrifices borne by our ancestors..."
 Track changes in the word distributions associated with a topic over time.

  31. Extensions: Dynamic Topic Models
 [Plate diagram: a chain of LDA models, one per time slice; topic-word distributions evolve β_k,1 → β_k,2 → … → β_k,T, with the usual α, θ_d, Z_d,n, W_d,n structure repeated in each slice over D documents and K topics.]
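A sketch of the chaining for one topic k (drift scale and sizes illustrative): the topic's natural parameters take a Gaussian random walk across time slices, and each slice is softmaxed into that slice's word distribution β_k,t:

```python
import numpy as np
rng = np.random.default_rng(0)

V, T = 1000, 5
logits = rng.normal(size=V)                 # natural params for beta_{k,1}
betas = []
for t in range(T):
    betas.append(np.exp(logits) / np.exp(logits).sum())  # beta_{k,t}
    logits = logits + rng.normal(scale=0.1, size=V)      # drift to slice t+1
```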

  32. Extensions: Dynamic Topic Models
 [Figure: top words for a single topic tracked by decade, 1880–2000; early decades feature words like electric, machine, steam, engine, apparatus; later decades shift toward devices, materials, silicon, technology, applications.]

  33. Summing up
 • Latent Dirichlet Allocation (LDA) is a Bayesian topic model that is readily extensible
 • To estimate parameters, we used a sampling-based approach. General idea: draw samples of the parameters and keep those that make the observed data likely
 • Gibbs sampling is a particular variant of this approach; it draws each parameter conditioned on all of the others
