Reading Tea Leaves: How Humans Interpret Topic Models
By Jonathan Chang, Jordan Boyd-Graber, Chong Wang, et al. NIPS 2009. Presented by Stephen Mayhew, Feb 2013.
Motivation: How to evaluate topic models? Anecdotally, by looking at each topic's top words.
Crowdsourced approach using Amazon Mechanical Turk. Evaluating three different models: LDA, pLSI, CTM.
“Spot the intruder word” Process: show subjects six randomly ordered words, five drawn from a topic plus one intruder, and ask them to pick the word that does not belong (a sketch of the construction follows below).
If the topic is coherent, the users will agree on the outlier. If the topic is incoherent, the users will choose the outlier at random.
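A rough sketch of how one word-intrusion item can be assembled (per the paper, the intruder is a word that is improbable in the target topic but probable in some other topic; the function and variable names here are illustrative, not the authors' code):

```python
import random

def make_word_intrusion_item(topics, k, n_top=5, rng=random):
    """Build one word-intrusion item for topic k.

    topics: list of dicts mapping word -> probability, one per topic.
    Returns (shuffled six-word list, the intruder word).
    """
    # Five highest-probability words of the target topic.
    top_words = sorted(topics[k], key=topics[k].get, reverse=True)[:n_top]

    # Intruder: a top word of some *other* topic, not already among the
    # target topic's top words (a fuller version would also verify it
    # has low probability under topic k).
    other = rng.choice([j for j in range(len(topics)) if j != k])
    candidates = [w for w in sorted(topics[other], key=topics[other].get,
                                    reverse=True)[:n_top]
                  if w not in top_words]
    intruder = rng.choice(candidates)

    item = top_words + [intruder]
    rng.shuffle(item)
    return item, intruder
```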
“Spot the intruder topic” Process: show subjects a document snippet and four topics (each displayed as its high-probability words), three that fit the document plus one intruder, and ask them to pick the topic that does not belong (sketch below).
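A matching sketch for the topic-intrusion case, assuming the model's per-document topic proportions are available as a plain list (names again illustrative):

```python
import random

def make_topic_intrusion_item(theta_d, n_show=3, rng=random):
    """Build one topic-intrusion item for a document.

    theta_d: list of per-topic probabilities for the document.
    Returns (shuffled topic indices, the intruder topic's index).
    """
    # Topics sorted by decreasing probability in this document.
    order = sorted(range(len(theta_d)), key=lambda j: theta_d[j], reverse=True)
    shown = order[:n_show]                 # the document's top topics
    intruder = rng.choice(order[n_show:])  # a low-probability topic
    item = shown + [intruder]
    rng.shuffle(item)
    return item, intruder
```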
Model Precision (MP):

MP_k^m = \frac{1}{S} \sum_s \mathbb{1}\big( i_{k,s}^m = \omega_k^m \big)

where \omega_k^m is the true intruder planted for topic k of model m, i_{k,s}^m is the word subject s selected, and S is the number of subjects. Which is just a fancy way of saying: (number of people correct) / (total number of people).
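A minimal sketch of the computation, assuming the subjects' picks for one topic are collected as a list of words:

```python
def model_precision(choices, true_intruder):
    """MP for one topic: the fraction of subjects whose chosen word
    matches the intruder the task construction actually planted."""
    return sum(c == true_intruder for c in choices) / len(choices)
```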
[Figure: NYT corpus, 50-topic LDA model]
Topic Log Odds (TLO):

TLO_d^m = \frac{1}{S} \sum_s \Big( \log \hat{\theta}^m_{d,\, j^m_{d,*}} - \log \hat{\theta}^m_{d,\, j^m_{d,s}} \Big)

where j^m_{d,*} is the planted intruder topic for document d, j^m_{d,s} is the topic subject s selected, and \hat{\theta}^m_d is the document's topic distribution under model m. Translation: normalized difference between the probability mass of the actual “intruder” and the selected “intruder”. Upper bound is 0; higher is better.
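A corresponding sketch for TLO, assuming theta_d holds the model's topic proportions for document d:

```python
import math

def topic_log_odds(theta_d, true_intruder, chosen):
    """TLO for one document: mean of log-theta of the planted intruder
    minus log-theta of each subject's pick. Equals 0 when every subject
    finds the planted intruder; increasingly negative as subjects pick
    more probable (better-fitting) topics instead."""
    return sum(math.log(theta_d[true_intruder]) - math.log(theta_d[j])
               for j in chosen) / len(chosen)
```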
[Figure: Wikipedia corpus, 50-topic LDA model]
Measures homogeneity (synonymy), not topic strength (coherence).
Example document: curling.
Possible topics:
Consider syntactic differences: