Discovering Coherent Topics Using General Knowledge



SLIDE 1

Discovering Coherent Topics Using General Knowledge

Zhiyuan (Brett) Chen, Arjun Mukherjee, Bing Liu, Meichun Hsu, Malu Castellanos, Riddhiman Ghosh
http://www.cs.uic.edu/~zchen/

SLIDE 2

Topic Model

[Diagram: Document 1, Document 2 through Document M feed into the topic model, which outputs Topic 1, Topic 2 through Topic T.]

SLIDE 3

Coherent Topics

Price, Cheap, Expensive, Cost, Money, Pricey, Dollar

SLIDE 4

Coherent Topics

Coherent: Price, Cheap, Expensive, Cost, Money, Pricey, Dollar
Not coherent: Price, Family, Cheap, Expensive, Politics, Cost, Size

SLIDE 5

Issues of Unsupervised Topic Models

Objective functions do not correlate well with human judgments (Chang et al., 2009). Many topics are not coherent.

SLIDE 6

Remedy: Knowledge-based Topic Models

SLIDE 7

Knowledge-based Topic Models

DF-LDA (Andrzejewski et al., 2009)

Must-Link: {Picture, Photo}
Cannot-Link: {Picture, Price}

SLIDE 8

Knowledge-based Topic Models

DF-LDA (Andrzejewski et al., 2009)
Seeded models (Burns et al., 2012; Jagarlamudi et al., 2012; Lu et al., 2011; Mukherjee and Liu, 2012)

SLIDE 9

Knowledge Assumptions

Knowledge is correct for a domain.

SLIDE 10

Knowledge Assumptions

Knowledge is correct for a domain.
Knowledge is domain dependent.

SLIDE 11

Existing Model Flow

SLIDE 17

Our Proposed Model Flow

SLIDE 19

Our Proposed Model Flow

General Knowledge

SLIDE 20

General Knowledge

Domain independent
May be wrong for a domain

SLIDE 21

Lexical Semantic Relations

Synonyms: {Expensive, Pricey}
Antonyms: {Expensive, Cheap}
Adjective-Attributes: {Expensive, Price}

SLIDE 22

Lexical Semantic Relations

Synonyms: {Expensive, Pricey}
Antonyms: {Expensive, Cheap}
Adjective-Attributes: {Expensive, Price}

WordNet (Fei et al., 2012)

SLIDE 23

LR-Sets

Example: {Expensive, Pricey, Cheap, Price}

SLIDE 24

LR-Sets (Lexical Relation)

Example: {Expensive, Pricey, Cheap, Price}
Words should be in the same topic
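The LR-set example above can be sketched with a small hand-written lexicon. The three dictionaries below are hypothetical stand-ins for WordNet lookups (they are not a WordNet API), and `build_lr_set` is an illustrative name. The sketch assumes an LR-set is simply the union of a word with its synonyms, antonyms, and attributes, which is what the slide's example suggests.

```python
# Toy lexicon: a hypothetical stand-in for WordNet lookups (assumption, not real WordNet data).
SYNONYMS = {"expensive": ["pricey"]}
ANTONYMS = {"expensive": ["cheap"]}
ATTRIBUTES = {"expensive": ["price"]}  # adjective -> attribute noun


def build_lr_set(word):
    """Union a word with its synonyms, antonyms, and attributes into one LR-set."""
    related = set()
    for relation in (SYNONYMS, ANTONYMS, ATTRIBUTES):
        related.update(relation.get(word, []))
    return frozenset({word} | related)


lr_set = build_lr_set("expensive")
print(sorted(lr_set))
```

A word with no entries in the lexicon yields a singleton set containing only itself.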

SLIDE 25

Issues of LR-Sets

Partially wrong knowledge
No correct LR-sets for a word

SLIDE 26

Issues of LR-Sets

No correct LR-sets for a word

Card: {Card, Menu}, {Card, Bill}

SLIDE 27

Issues of LR-Sets

No correct LR-sets for a word

{Card, Menu} {Card, Bill}

SLIDE 29

Issues of LR-Sets

Partially wrong knowledge

Picture: {Picture, Pic, Flick}

SLIDE 30

Issues of LR-Sets

Partially wrong knowledge

{Picture, Pic, Flick}

SLIDE 31

Addressing Issues

Issue: Partially wrong knowledge. Remedy: Word Correlation + GPU.
Issue: No correct LR-sets for a word. Remedy: Relaxing wrong sets for a word.

SLIDE 33

Relaxing Wrong LR-sets

{Card, Menu} {Card, Bill}

SLIDE 34

Relaxing Wrong LR-sets

{Card, Menu} {Card, Bill}
Relaxed to: {Card}

SLIDE 35

Estimate Knowledge

{Picture, Image} {Picture, Painting}

SLIDE 36

Word Distributions From LDA

Word        Prob
Picture     0.20
Image       0.15
Photo       0.12
Quality     0.10
Resolution  0.05
…
Painting    0.0002

SLIDE 37

Estimate Word Correlation

Word        Prob
Picture     0.20
Image       0.15
Photo       0.12
Quality     0.10
Resolution  0.05
…
Painting    0.0002

{Picture, Image}
{Picture, Painting}

SLIDE 38

Word Correlation Matrix C

Word        Prob
Picture     0.20
Image       0.15
Photo       0.12
Quality     0.10
Resolution  0.05
…
Painting    0.0002

{Picture, Image}: 0.15 / 0.20
{Picture, Painting}: 0.0002 / 0.20
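The two ratios on this slide can be reproduced with a few lines of code. The sketch below assumes the correlation of a word pair under one topic is the smaller probability divided by the larger one; the slide only shows the two worked ratios, so the general rule and the function name `correlation` are assumptions.

```python
# Topic-word probabilities copied from the slide's LDA example.
phi = {
    "picture": 0.20, "image": 0.15, "photo": 0.12,
    "quality": 0.10, "resolution": 0.05, "painting": 0.0002,
}


def correlation(phi, w1, w2):
    """Assumed rule: smaller probability over larger probability, so the value lies in [0, 1]."""
    lo, hi = sorted((phi[w1], phi[w2]))
    return lo / hi if hi > 0 else 0.0


print(correlation(phi, "picture", "image"))     # the slide's 0.15 / 0.20
print(correlation(phi, "picture", "painting"))  # the slide's 0.0002 / 0.20
```

Under this rule {Picture, Image} scores far higher than {Picture, Painting}, matching the slide's point that the second pair is poorly supported in this domain.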

SLIDE 39

Quality of LR-set s Towards w

SLIDE 40

Relaxing Wrong LR-sets

s1 = {Card, Menu}: Q(s1, “Card”) < ε
s2 = {Card, Bill}: Q(s2, “Card”) < ε

SLIDE 41

Relaxing Wrong LR-sets

s1 = {Card, Menu}: Q(s1, “Card”) < ε
s2 = {Card, Bill}: Q(s2, “Card”) < ε

Relaxed to: {Card}
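The relaxation step can be sketched as a threshold test on each LR-set. The slides do not give the formula for Q, so the definition below (the best average pairwise correlation over topics) is an assumption, as are the names `quality` and `relax`, the toy topic probabilities, and the threshold value 0.05. The sketch also assumes that sets passing the threshold are kept as they are.

```python
def pair_correlation(phi, w1, w2):
    """Assumed rule: smaller topic probability over larger one."""
    lo, hi = sorted((phi.get(w1, 0.0), phi.get(w2, 0.0)))
    return lo / hi if hi > 0 else 0.0


def quality(lr_set, word, topics):
    """ASSUMED Q(s, w): best (over topics) average correlation of `word` with the rest of the set."""
    others = [w for w in lr_set if w != word]
    if not others:
        return 0.0
    return max(
        sum(pair_correlation(phi, word, o) for o in others) / len(others)
        for phi in topics
    )


def relax(word, lr_sets, topics, epsilon):
    """Keep LR-sets whose quality reaches epsilon; if none do, fall back to the singleton {word}."""
    kept = [s for s in lr_sets if quality(s, word, topics) >= epsilon]
    return kept if kept else [frozenset({word})]


# Hypothetical topic in which "card" is frequent but "menu" and "bill" are not.
topics = [{"card": 0.10, "memory": 0.08, "menu": 0.0001, "bill": 0.0002}]
card_sets = [frozenset({"card", "menu"}), frozenset({"card", "bill"})]
relaxed = relax("card", card_sets, topics, epsilon=0.05)
print(relaxed)
```

Both sets for "card" fall below the threshold here, so the word ends up with only the singleton set, as on the slide.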

SLIDE 42

Addressing Issues

Issue: Partially wrong knowledge. Remedy: Word Correlation + GPU.
Issue: No correct LR-sets for a word. Remedy: Relaxing wrong sets for a word.

SLIDE 43

Simple Pólya Urn Model (SPU)

SLIDE 48

Simple Pólya Urn Model (SPU)

The rich get richer!
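The rich-get-richer behavior of the simple Pólya urn is easy to see in a short simulation. This is a minimal sketch: the function name, the starting colors, and the number of draws are illustrative choices, not taken from the slides.

```python
import random
from collections import Counter


def simple_polya_urn(initial, draws, seed=0):
    """Repeatedly draw a ball, put it back, and add one more ball of the same color."""
    rng = random.Random(seed)
    urn = Counter(initial)
    for _ in range(draws):
        colors = list(urn)
        # A color is drawn with probability proportional to its current count.
        color = rng.choices(colors, weights=[urn[c] for c in colors])[0]
        urn[color] += 1
    return urn


urn = simple_polya_urn({"red": 1, "blue": 1}, draws=100)
print(urn)
```

Each draw raises the chance of drawing the same color again, so early leads tend to persist; changing `seed` shows how different the final proportions can be.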

SLIDE 49

Interpreting LDA Under SPU

SLIDE 50

Interpreting LDA Under SPU

[Diagram: Topic 0 urn; balls: picture]

SLIDE 51

Interpreting LDA Under SPU

[Diagram: Topic 0 urn; balls: picture, picture]

SLIDE 52

Generalized Pólya Urn Model (GPU)
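In the generalized Pólya urn, drawing a ball of one color can also add balls of other, related colors. The sketch below is one minimal way to simulate that; the `promotion` table, its amounts (chosen to echo the ratios from the word correlation example), and the function name are assumptions for illustration.

```python
import random
from collections import defaultdict


def generalized_polya_urn(initial, promotion, draws, seed=0):
    """Draw a ball, put it back, then add balls according to promotion[drawn color]."""
    rng = random.Random(seed)
    urn = defaultdict(float, initial)
    for _ in range(draws):
        colors = list(urn)
        color = rng.choices(colors, weights=[urn[c] for c in colors])[0]
        # Colors without an entry behave like the simple urn: one more ball of the same color.
        for other, amount in promotion.get(color, {color: 1.0}).items():
            urn[other] += amount
    return dict(urn)


# Hypothetical promotion amounts: drawing "picture" also promotes "image" and "painting".
promotion = {"picture": {"picture": 1.0, "image": 0.75, "painting": 0.001}}
result = generalized_polya_urn({"picture": 1.0}, promotion, draws=1)
print(result)
```

Fractional amounts are used so that weakly related colors receive only a small boost.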

SLIDE 57

Applying GPU

[Diagram: Topic 0 urn; balls: picture]

SLIDE 58

Applying GPU

[Diagram: Topic 0 urn; balls: picture, picture, image, painting]

SLIDE 59

Applying GPU

[Diagram: Topic 0 urn; balls: picture, picture, image, painting; label: Word Correlation]
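Applied to a topic model, the urn picture above becomes a rule for updating topic-word counts. The following sketch assumes that assigning a word to a topic adds one count for the word itself and a fractional count, equal to the word correlation, for each other word in its LR-set. The slides do not spell out the exact update used in the model, so this rule, the function name `assign_word`, and the correlation values are all assumptions.

```python
from collections import defaultdict


def assign_word(topic_word_counts, topic, word, lr_set, correlation_fn):
    """Count the word once and promote its LR-set partners by their correlation with it."""
    counts = topic_word_counts[topic]
    counts[word] += 1.0
    for other in lr_set:
        if other != word:
            counts[other] += correlation_fn(word, other)


# Hypothetical correlations, echoing the 0.15 / 0.20 and 0.0002 / 0.20 ratios.
corr = {("picture", "image"): 0.75, ("picture", "painting"): 0.001}
counts = defaultdict(lambda: defaultdict(float))
assign_word(counts, 0, "picture", {"picture", "image", "painting"},
            lambda a, b: corr[(a, b)])
print(dict(counts[0]))
```

Because "painting" is weakly correlated with "picture" in this domain, it receives almost no promotion, which is how partially wrong knowledge is damped.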

SLIDE 60

Addressing Issues

Issue: Partially wrong knowledge. Remedy: Word Correlation + GPU.
Issue: No correct LR-sets for a word. Remedy: Relaxing wrong sets for a word.

SLIDE 61

Evaluation

SLIDE 62

Evaluation

Human Evaluation
Topic Coherence
KL-Divergence
Four domains

SLIDE 63

Model Comparison

LDA (Blei et al., 2003)
DF-LDA (Andrzejewski et al., 2009)
MDK-LDA (Chen et al., 2013)
LDA-GPU (Mimno et al., 2011)
GK-LDA

SLIDE 64

KL-Divergence
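For reference, KL-divergence between two discrete distributions can be computed as below. The slide does not say which distributions are compared in the evaluation, so this is only a generic sketch of the measure; the function name and the example values are illustrative.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as aligned lists of probabilities.

    Terms with p_i == 0 contribute nothing; q_i must be positive wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


same = kl_divergence([0.5, 0.5], [0.5, 0.5])
different = kl_divergence([0.9, 0.1], [0.5, 0.5])
print(same, different)
```

The value is zero for identical distributions and grows as they diverge; note that the measure is not symmetric in `p` and `q`.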

SLIDE 65

Topic Coherence (#T = 15)
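A topic coherence score can be computed from document co-occurrence counts alone. The sketch below assumes the slide refers to the metric of Mimno et al. (2011), which sums log((D(v_m, v_l) + 1) / D(v_l)) over ordered pairs of a topic's top words, where D counts the documents containing the given words. The toy documents and the function name are illustrative.

```python
import math


def topic_coherence(top_words, documents):
    """Sum of log((D(v_m, v_l) + 1) / D(v_l)) over pairs with l < m.

    Every top word must occur in at least one document, otherwise D(v_l) is zero.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            joint = doc_freq(top_words[m], top_words[l])
            score += math.log((joint + 1) / doc_freq(top_words[l]))
    return score


# Toy corpus (hypothetical): each document is a list of tokens.
docs = [
    ["price", "cheap"],
    ["price", "expensive"],
    ["price", "cheap", "expensive"],
    ["politics", "family"],
]
coherent = topic_coherence(["price", "cheap", "expensive"], docs)
incoherent = topic_coherence(["price", "cheap", "politics"], docs)
print(coherent, incoherent)
```

Higher (less negative) scores indicate top words that tend to appear in the same documents, so the topic mixing in "politics" scores lower here.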

SLIDE 66

Human Evaluation

SLIDE 67

Example Topics

[Table of example topics not recovered; the only surviving word is "love".]

SLIDE 68

Conclusions

Discovering Coherent Topics Using General Knowledge

SLIDE 69

Conclusions

Discovering Coherent Topics Using General Knowledge

Partially wrong knowledge
No correct LR-sets for a word

SLIDE 70

Conclusions

Discovering Coherent Topics Using General Knowledge

Issue: Partially wrong knowledge. Remedy: Word Correlation + GPU.
Issue: No correct LR-sets for a word. Remedy: Relaxing wrong sets for a word.

SLIDE 71

Datasets: http://www.cs.uic.edu/~zchen/
