SLIDE 1
Discovering Coherent Topics Using General Knowledge
Meichun Hsu, Zhiyuan (Brett) Chen, Malu Castellanos, Arjun Mukherjee, Riddhiman Ghosh, Bing Liu
http://www.cs.uic.edu/~zchen/
[Figure: Document 1 and Document 2 pass through a topic model, which outputs Topic 1 and Topic 2]
SLIDE 2
SLIDE 3
Coherent Topics
Price Cheap Expensive Cost Money Pricey Dollar
SLIDE 4
Coherent Topics
Coherent: Price Cheap Expensive Cost Money Pricey Dollar
Not coherent: Price Family Cheap Expensive Politics Cost Size
SLIDE 5
Issues of Unsupervised Topic Models
Objective functions do not correlate well with human judgments (Chang et al., 2009). Many topics are not coherent.
SLIDE 6
Remedy: Knowledge-based Topic Models
SLIDE 7
Knowledge-based Topic Models
DF-LDA (Andrzejewski et al., 2009)
Must-Link: {Picture, Photo}
Cannot-Link: {Picture, Price}
SLIDE 8
Knowledge-based Topic Models
Seeded models (Burns et al., 2012; Jagarlamudi et al., 2012; Lu et al., 2011; Mukherjee and Liu, 2012)
DF-LDA (Andrzejewski et al., 2009)
SLIDE 9
Knowledge Assumptions
Knowledge is correct for a domain.
SLIDE 10
Knowledge Assumptions
Knowledge is correct for a domain.
Knowledge is domain dependent.
SLIDE 11
Existing Model Flow
SLIDE 12
Existing Model Flow
SLIDE 13
Existing Model Flow
SLIDE 14
Existing Model Flow
SLIDE 15
Existing Model Flow
SLIDE 16
Existing Model Flow
SLIDE 17
Our Proposed Model Flow
SLIDE 18
Our Proposed Model Flow
SLIDE 19
Our Proposed Model Flow
General Knowledge
SLIDE 20
General Knowledge
Domain independent
May be wrong for a domain
SLIDE 21
Lexical Semantic Relations
Synonyms: {Expensive, Pricey}
Antonyms: {Expensive, Cheap}
Adjective-Attributes: {Expensive, Price}
SLIDE 22
Lexical Semantic Relations
Synonyms: {Expensive, Pricey}
Antonyms: {Expensive, Cheap}
Adjective-Attributes: {Expensive, Price}
WordNet (Fei et al., 2012)
SLIDE 23
LR-Sets
Example: {Expensive, Pricey, Cheap, Price}
SLIDE 24
LR-Sets (Lexical Relation)
Example: {Expensive, Pricey, Cheap, Price}
Words should be in the same topic
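The LR-set example above can be sketched in a few lines. This is only an illustration: `TOY_LEXICON` and `build_lr_set` are hypothetical names, and the hand-written lexicon stands in for a real resource such as WordNet. It also merges all relations of a word into one set, which may be coarser than how the talk actually groups them.

```python
from itertools import chain

# Hypothetical toy lexicon standing in for a lexical resource such as WordNet.
# Each word maps relation names to the words it is related to.
TOY_LEXICON = {
    "expensive": {
        "synonyms": ["pricey"],
        "antonyms": ["cheap"],
        "attributes": ["price"],
    },
}


def build_lr_set(word, lexicon):
    """Return an LR-set: the word itself plus every lexically related word."""
    relations = lexicon.get(word, {})
    related = chain.from_iterable(relations.values())
    return frozenset([word, *related])


lr_set = build_lr_set("expensive", TOY_LEXICON)
print(sorted(lr_set))
```

A word missing from the lexicon simply yields a singleton set containing itself.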
SLIDE 25
Issues of LR-Sets
Partially wrong knowledge
No correct LR-sets for a word
SLIDE 26
Issues of LR-Sets
No correct LR-sets for a word
Card {Card, Menu} {Card, Bill}
SLIDE 27
Issues of LR-Sets
No correct LR-sets for a word
{Card, Menu} {Card, Bill}
SLIDE 28
Issues of LR-Sets
No correct LR-sets for a word
{Card, Menu} {Card, Bill}
SLIDE 29
Issues of LR-Sets
Partially wrong knowledge
{Picture, Pic, Flick} Picture
SLIDE 30
Issues of LR-Sets
Partially wrong knowledge
{Picture, Pic, Flick}
SLIDE 31
Addressing Issues
Partially wrong knowledge → Word Correlation + GPU
No correct LR-sets for a word → Relaxing wrong sets for a word
SLIDE 32
Addressing Issues
Partially wrong knowledge → Word Correlation + GPU
No correct LR-sets for a word → Relaxing wrong sets for a word
SLIDE 33
Relaxing Wrong LR-sets
{Card, Menu} {Card, Bill}
SLIDE 34
Relaxing Wrong LR-sets
{Card, Menu} {Card, Bill} {Card}
SLIDE 35
Estimate Knowledge
{Picture, Image} {Picture, Painting}
SLIDE 36
Word Distributions From LDA
Word        Prob
Picture     0.20
Image       0.15
Photo       0.12
Quality     0.10
Resolution  0.05
…
Painting    0.0002
SLIDE 37
Estimate Word Correlation
Word        Prob
Picture     0.20
Image       0.15
Photo       0.12
Quality     0.10
Resolution  0.05
…
Painting    0.0002

{Picture, Image}
{Picture, Painting}
SLIDE 38
Word Correlation Matrix C
Word        Prob
Picture     0.20
Image       0.15
Photo       0.12
Quality     0.10
Resolution  0.05
…
Painting    0.0002

{Picture, Image}: 0.15 / 0.20
{Picture, Painting}: 0.0002 / 0.20
SLIDE 39
Quality of LR-set s Towards w
SLIDE 40
Relaxing Wrong LR-sets
{Card, Menu} {Card, Bill}
Q(s1, “Card”) < ɛ
Q(s2, “Card”) < ɛ
SLIDE 41
Relaxing Wrong LR-sets
{Card, Menu} {Card, Bill} {Card}
Q(s1, “Card”) < ɛ
Q(s2, “Card”) < ɛ
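The relaxation step can be sketched as follows. The exact definition of Q(s, w) is not recoverable from the slide text, so `lr_set_quality` below assumes it is the strongest correlation between w and any other word in s. That definition, the function names, and the correlation values are all illustrative assumptions.

```python
def lr_set_quality(lr_set, word, corr):
    """Assumed Q(s, w): strongest correlation between `word` and another word in the set."""
    others = [w for w in lr_set if w != word]
    if not others:
        return 1.0  # a singleton set cannot be wrong for its own word
    return max(corr.get(frozenset((word, w)), 0.0) for w in others)


def relax_lr_sets(word, lr_sets, corr, epsilon):
    """Keep LR-sets whose quality reaches epsilon; fall back to {word} if none do."""
    kept = [s for s in lr_sets if lr_set_quality(s, word, corr) >= epsilon]
    return kept if kept else [frozenset([word])]


# Made-up correlation values, for illustration only.
corr = {
    frozenset(("card", "menu")): 0.01,
    frozenset(("card", "bill")): 0.02,
}
s1 = frozenset(["card", "menu"])
s2 = frozenset(["card", "bill"])

relaxed = relax_lr_sets("card", [s1, s2], corr, epsilon=0.07)
print(relaxed)
```

With both sets below the threshold, "Card" ends up with only the singleton set {Card}, matching the slide.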
SLIDE 42
Addressing Issues
Partially wrong knowledge → Word Correlation + GPU
No correct LR-sets for a word → Relaxing wrong sets for a word
SLIDE 43
Simple Pólya Urn Model (SPU)
SLIDE 44
Simple Pólya Urn Model (SPU)
SLIDE 45
Simple Pólya Urn Model (SPU)
SLIDE 46
Simple Pólya Urn Model (SPU)
SLIDE 47
Simple Pólya Urn Model (SPU)
SLIDE 48
Simple Pólya Urn Model (SPU) The richer get richer!
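A small simulation makes the "richer get richer" effect concrete. This is a minimal sketch of the standard simple Pólya urn: draw a ball at random, then return it together with one more ball of the same color. The function name, the starting urn, and the seed are arbitrary choices for illustration.

```python
import random
from collections import Counter


def simulate_simple_polya_urn(initial, draws, seed=0):
    """Simulate a simple Pólya urn and return the final ball counts per color."""
    rng = random.Random(seed)
    urn = Counter(initial)
    for _ in range(draws):
        colors = list(urn)
        weights = [urn[c] for c in colors]
        # Colors that already have more balls are more likely to be drawn.
        drawn = rng.choices(colors, weights=weights, k=1)[0]
        # Returning the ball plus one extra of its color is a net gain of one.
        urn[drawn] += 1
    return urn


urn = simulate_simple_polya_urn({"red": 1, "blue": 1}, draws=100)
print(dict(urn))
```

Running it with different seeds typically gives very uneven final counts, because an early lead for one color reinforces itself.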
SLIDE 49
Interpreting LDA Under SPU
SLIDE 50
Topic 0 picture
Interpreting LDA Under SPU
SLIDE 51
Topic 0 picture
Interpreting LDA Under SPU
picture
SLIDE 52
Generalized Pólya Urn Model (GPU)
SLIDE 53
Generalized Pólya Urn Model (GPU)
SLIDE 54
Generalized Pólya Urn Model (GPU)
SLIDE 55
Generalized Pólya Urn Model (GPU)
SLIDE 56
Generalized Pólya Urn Model (GPU)
SLIDE 57
Topic 0
Applying GPU
picture
SLIDE 58
Topic 0
painting image picture picture
Applying GPU
SLIDE 59
Topic 0
painting image
Applying GPU
Word Correlation
picture picture
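The generalized urn differs from the simple one in a single step: drawing a word also adds fractional counts for related words. The sketch below shows that update for one topic. The name `gpu_update` and the promotion weights (borrowed from the correlation ratios on the earlier slide) are assumptions for illustration, not the model's actual parameters.

```python
from collections import Counter


def gpu_update(topic_counts, drawn_word, promotion):
    """Apply one generalized Pólya urn update to a topic's word counts, in place."""
    # As in the simple urn, the drawn word itself gains one count.
    topic_counts[drawn_word] += 1.0
    # Related words are promoted too, each by its own weight.
    for related, weight in promotion.get(drawn_word, {}).items():
        topic_counts[related] += weight
    return topic_counts


# Hypothetical promotion weights: strongly correlated words get larger boosts.
promotion = {"picture": {"image": 0.75, "painting": 0.001}}

topic0 = Counter()
gpu_update(topic0, "picture", promotion)
print(dict(topic0))
```

Because "painting" is only weakly correlated with "picture", it is barely promoted, which is how word correlation limits the effect of partially wrong knowledge.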
SLIDE 60
Addressing Issues
Partially wrong knowledge → Word Correlation + GPU
No correct LR-sets for a word → Relaxing wrong sets for a word
SLIDE 61
Evaluation
SLIDE 62
Evaluation
Human Evaluation
Topic Coherence
KL-Divergence
Four domains
SLIDE 63
Model Comparison
LDA (Blei et al., 2003)
DF-LDA (Andrzejewski et al., 2009)
MDK-LDA (Chen et al., 2013)
LDA-GPU (Mimno et al., 2011)
GK-LDA
SLIDE 64
KL-Divergence
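The slide text does not say which distributions are compared. One common use is to measure how distinct two topics are, and the sketch below assumes that setting. It computes KL divergence between two word distributions stored as dictionaries. The function names and the tiny smoothing constant (added to avoid division by zero) are illustrative choices.

```python
import math


def kl_divergence(p, q, smoothing=1e-12):
    """KL(p || q) for two word distributions given as {word: probability} dicts."""
    total = 0.0
    for word, p_w in p.items():
        if p_w > 0.0:
            q_w = q.get(word, 0.0) + smoothing  # smoothing guards against q_w == 0
            total += p_w * math.log(p_w / q_w)
    return total


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p), so the argument order does not matter."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


topic_a = {"price": 0.6, "cheap": 0.4}
topic_b = {"picture": 0.7, "image": 0.3}
print(symmetric_kl(topic_a, topic_a), symmetric_kl(topic_a, topic_b))
```

Identical topics score approximately zero, while topics with no shared words score very high.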
SLIDE 65
Topic Coherence (#T = 15)
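Assuming the metric here is the topic coherence score of Mimno et al. (2011), which the deck cites elsewhere, it sums log((D(v_m, v_l) + 1) / D(v_l)) over pairs of a topic's top words, where D counts the documents containing the given words. The sketch below implements that formula on a toy corpus. The function name and the documents are made up for illustration.

```python
import math


def topic_coherence(top_words, documents):
    """Coherence of a topic's top words, ordered from most to least probable."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every given word.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            denom = doc_freq(top_words[l])
            if denom == 0:
                continue  # skip words absent from the corpus
            co_freq = doc_freq(top_words[m], top_words[l])
            score += math.log((co_freq + 1) / denom)
    return score


docs = [["price", "cheap"], ["price", "cheap"], ["price", "family"]]
coherent = topic_coherence(["price", "cheap"], docs)
incoherent = topic_coherence(["price", "politics"], docs)
print(coherent, incoherent)
```

Scores closer to zero indicate top words that co-occur more often, so higher values mean a more coherent topic.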
SLIDE 66
Human Evaluation
SLIDE 67
Example Topics
love
SLIDE 68
Conclusions
Discovering Coherent Topics Using General Knowledge
SLIDE 69
Conclusions
Discovering Coherent Topics Using General Knowledge
Partially wrong knowledge
No correct LR-sets for a word
SLIDE 70
Conclusions
Discovering Coherent Topics Using General Knowledge
Partially wrong knowledge → Word Correlation + GPU
No correct LR-sets for a word → Relaxing wrong sets for a word
SLIDE 71
Datasets: http://www.cs.uic.edu/~zchen/
SLIDE 72