
Discovering Coherent Topics Using General Knowledge Meichun Hsu - PowerPoint PPT Presentation


  1. Discovering Coherent Topics Using General Knowledge. Meichun Hsu, Zhiyuan (Brett) Chen, Malu Castellanos, Arjun Mukherjee, Riddhiman Ghosh, Bing Liu. http://www.cs.uic.edu/~zchen/

  2. Topic Model: documents (Document 1, Document 2, …, Document M) go into the topic model, which outputs topics (Topic 1, Topic 2, …, Topic T).
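
To make the document-to-topic picture concrete, here is a minimal sketch of the standard LDA generative story (Blei et al., 2003): each topic is a word distribution, each document mixes topics, and each word is drawn by first picking a topic and then a word from it. The function names, toy vocabulary, and hyperparameter values are illustrative assumptions, not taken from the slides.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric Dirichlet(alpha) by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(vocab, num_topics, num_docs, doc_len, alpha=0.5, beta=0.1, seed=0):
    """Generate toy documents following the LDA generative story (hypothetical helper)."""
    rng = random.Random(seed)
    # Each topic t is a word distribution phi[t] over the vocabulary
    phi = [sample_dirichlet(beta, len(vocab), rng) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        theta = sample_dirichlet(alpha, num_topics, rng)  # this document's topic mixture
        words = []
        for _ in range(doc_len):
            z = rng.choices(range(num_topics), weights=theta)[0]  # choose a topic
            words.append(rng.choices(vocab, weights=phi[z])[0])   # choose a word from it
        docs.append(words)
    return docs, phi

vocab = ["price", "cheap", "expensive", "picture", "image", "photo"]
docs, phi = generate_corpus(vocab, num_topics=2, num_docs=3, doc_len=5)
```

Inference (fitting phi and theta from real documents) runs this story in reverse; the sketch only shows the forward direction.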

  3. Coherent Topics: an example coherent topic, {Price, Cheap, Expensive, Cost, Money, Pricey, Dollar}.

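  4. Coherent Topics: the coherent topic {Price, Cheap, Expensive, Cost, Money, Pricey, Dollar} next to an incoherent one, {Price, Family, Cheap, Expensive, Politics, Cost, Size}, which mixes price words with unrelated words.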

  5. Issues of Unsupervised Topic Models: Many topics are not coherent. Objective functions such as held-out likelihood do not correlate well with human judgments of topic quality (Chang et al., 2009).

  6. Remedy: Knowledge-based Topic Models

  7. Knowledge-based Topic Models: DF-LDA (Andrzejewski et al., 2009) uses must-link constraints, e.g., Must-Link(Picture, Photo), and cannot-link constraints, e.g., Cannot-Link(Picture, Price).

  8. Knowledge-based Topic Models: DF-LDA (Andrzejewski et al., 2009); seeded models (Burns et al., 2012; Jagarlamudi et al., 2012; Lu et al., 2011; Mukherjee and Liu, 2012).

  9-10. Knowledge Assumptions: knowledge is correct for a domain; knowledge is domain dependent.

  11-16. Existing Model Flow

  17-19. Our Proposed Model Flow: General Knowledge

  20. General Knowledge: domain independent; may be wrong for a domain.

  21-22. Lexical Semantic Relations: synonyms {Expensive, Pricey} and antonyms {Expensive, Cheap} from WordNet; adjective-attributes {Expensive, Price} (Fei et al., 2012).

  23-24. LR-Sets (Lexical Relation Sets): example {Expensive, Pricey, Cheap, Price}. The words in an LR-set should be in the same topic.
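
One way to picture how an LR-set like {Expensive, Pricey, Cheap, Price} arises is to pool a word with every word related to it by synonymy, antonymy, or the adjective-attribute relation. The sketch below assumes tiny hand-written relation lists standing in for WordNet and the adjective-attribute resource, and treats each relation as symmetric; `build_lr_sets` is a hypothetical helper, and the paper may form LR-sets differently (for instance, one set per word sense).

```python
from collections import defaultdict

# Hypothetical toy stand-ins for WordNet synonyms/antonyms and adjective-attribute pairs
SYNONYMS = [("expensive", "pricey")]
ANTONYMS = [("expensive", "cheap")]
ADJ_ATTRIBUTES = [("expensive", "price")]

def build_lr_sets(*relation_lists):
    """Map each word to an LR-set: the word plus every word lexically related to it."""
    related = defaultdict(set)
    for relations in relation_lists:
        for a, b in relations:
            # Assumption: every relation is treated as symmetric
            related[a].add(b)
            related[b].add(a)
    return {w: frozenset({w} | others) for w, others in related.items()}

lr_sets = build_lr_sets(SYNONYMS, ANTONYMS, ADJ_ATTRIBUTES)
print(sorted(lr_sets["expensive"]))  # ['cheap', 'expensive', 'price', 'pricey']
```

Because general knowledge like this is domain independent, nothing in the construction checks whether a set actually fits the target domain; that is the gap the following slides address.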

  25. Issues of LR-Sets: (1) no correct LR-sets for a word; (2) partially wrong knowledge.

  26-28. Issues of LR-Sets: no correct LR-sets for a word. Example: the word Card has the LR-sets {Card, Menu} and {Card, Bill}, neither of which is correct for the domain.

  29-30. Issues of LR-Sets: partially wrong knowledge. Example: the LR-set {Picture, Pic, Flick} for the word Picture is only partly correct, since not all of its words belong with Picture in a given domain.

  31-32. Addressing Issues: no correct LR-sets for a word, addressed by relaxing wrong LR-sets for a word; partially wrong knowledge, addressed by word correlation + GPU.

  33-34. Relaxing Wrong LR-sets: the wrong LR-sets {Card, Menu} and {Card, Bill} are relaxed to {Card}.

  35. Estimate Knowledge: estimate how good the LR-sets {Picture, Image} and {Picture, Painting} are.

  36. Word Distributions From LDA (one topic): Picture 0.20, Image 0.15, Photo 0.12, Quality 0.10, Resolution 0.05, …, Painting 0.0002.

  37. Estimate Word Correlation: use the same topic's word distribution (Picture 0.20, Image 0.15, Photo 0.12, Quality 0.10, Resolution 0.05, …, Painting 0.0002) to judge the LR-sets {Picture, Image} and {Picture, Painting}.

  38. Word Correlation Matrix C: in this topic (Picture 0.20, Image 0.15, Photo 0.12, Quality 0.10, Resolution 0.05, …, Painting 0.0002), the correlation for {Picture, Image} is 0.15 / 0.20 = 0.75, and for {Picture, Painting} it is 0.0002 / 0.20 = 0.001.
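
The ratios on this slide can be reproduced with a few lines. This sketch computes, within a single topic, C[w][v] = p(v | t) / p(w | t) for every word pair, using the slide's listed probabilities (the elided "…" words are left out). How the paper combines such values across several topics is not shown on the slides, so `correlation_matrix` should be read as a hypothetical single-topic helper.

```python
def correlation_matrix(topic_dist):
    """C[w][v] = p(v | t) / p(w | t) for every pair of words in one topic."""
    return {w: {v: pv / pw for v, pv in topic_dist.items()}
            for w, pw in topic_dist.items()}

# Probabilities from the slide's example topic (the elided words are omitted)
topic = {"picture": 0.20, "image": 0.15, "photo": 0.12, "quality": 0.10,
         "resolution": 0.05, "painting": 0.0002}
C = correlation_matrix(topic)
print(round(C["picture"]["image"], 4))     # 0.75
print(round(C["picture"]["painting"], 4))  # 0.001
```

A high ratio says the topic treats the two words as near peers, which supports {Picture, Image}; a tiny ratio, as for Painting, flags knowledge that does not fit this domain.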

  39. Quality of LR-set s Towards Word w, Q(s, w)

  40-41. Relaxing Wrong LR-sets: for s1 = {Card, Menu} and s2 = {Card, Bill}, Q(s1, “Card”) < ε and Q(s2, “Card”) < ε, so the LR-sets of Card are relaxed to {Card}.
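
The slides give the relaxing rule but not the formula for Q(s, w). The sketch below assumes Q(s, w) is the average correlation from w to the other words of s, keeps LR-sets with Q(s, w) ≥ ε, and falls back to the singleton {w} when none survive. The quality definition, the threshold value, the helper names, and the toy probabilities for Card, Menu, and Bill are all hypothetical.

```python
def lr_set_quality(lr_set, w, corr):
    """Assumed Q(s, w): mean correlation from w to the other words in s."""
    others = [v for v in lr_set if v != w]
    if not others:
        return 1.0  # a singleton {w} carries no outside knowledge to doubt
    return sum(corr(w, v) for v in others) / len(others)

def relax_lr_sets(w, lr_sets, corr, eps):
    """Keep LR-sets with Q(s, w) >= eps; if none remain, relax to {w}."""
    kept = [s for s in lr_sets if lr_set_quality(s, w, corr) >= eps]
    return kept if kept else [frozenset({w})]

# Hypothetical word probabilities in one topic of a domain where Card fits neither set
TOPIC = {"card": 0.10, "menu": 0.0001, "bill": 0.0003}

def corr(w, v):
    """Single-topic correlation ratio p(v | t) / p(w | t)."""
    return TOPIC[v] / TOPIC[w]

s1, s2 = frozenset({"card", "menu"}), frozenset({"card", "bill"})
result = relax_lr_sets("card", [s1, s2], corr, eps=0.1)  # both Q values fall below eps
```

Relaxing to {Card} means the model simply stops promoting Menu or Bill alongside Card, rather than trusting either wrong set.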

  42. Addressing Issues: no correct LR-sets for a word, addressed by relaxing wrong LR-sets for a word; partially wrong knowledge, addressed by word correlation + GPU.

  43-48. Simple Pólya Urn Model (SPU): the rich get richer!
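
A simple Pólya urn can be simulated directly: draw a ball with probability proportional to the color counts, then put it back together with one more ball of the same color, so colors that are already common become more likely. `simple_polya_urn` is a hypothetical helper and the starting urn is a toy example.

```python
import random
from collections import Counter

def simple_polya_urn(initial, draws, seed=0):
    """Simple Pólya urn: draw a ball, return it plus one extra ball of the same color."""
    rng = random.Random(seed)
    urn = Counter(initial)
    for _ in range(draws):
        colors = list(urn)
        color = rng.choices(colors, weights=[urn[c] for c in colors])[0]
        urn[color] += 1  # the ball goes back with a copy: the rich get richer
    return urn

urn = simple_polya_urn({"red": 1, "blue": 1}, draws=10)
```

Every draw adds exactly one ball, so after 10 draws the urn holds 12 balls, but how they split between red and blue depends heavily on the early draws.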

  49-51. Interpreting LDA Under SPU: when the word picture is assigned to Topic 0, one more picture ball is added to Topic 0's urn (picture becomes picture, picture).

  52-56. Generalized Pólya Urn Model (GPU)
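
The generalized Pólya urn changes only the return step: after drawing a ball of color c, it is put back together with A[c][c'] new balls of each color c', so drawing one color can also boost related colors. In this sketch the addition matrix is passed as a nested dict named `addition` (a hypothetical name), amounts may be fractional, and a color missing from the matrix falls back to the simple urn's rule of adding one ball of its own color.

```python
import random

def generalized_polya_urn(initial, addition, draws, seed=0):
    """GPU: draw color c, then add addition[c][other] balls of each listed color."""
    rng = random.Random(seed)
    urn = dict(initial)
    for _ in range(draws):
        colors = list(urn)
        color = rng.choices(colors, weights=[urn[c] for c in colors])[0]
        # Default {color: 1.0} reproduces the simple Pólya urn
        for other, amount in addition.get(color, {color: 1.0}).items():
            urn[other] = urn.get(other, 0.0) + amount
    return urn

# Drawing red also adds half a pink ball (hypothetical addition matrix)
addition = {"red": {"red": 1.0, "pink": 0.5}, "pink": {"pink": 1.0}}
urn = generalized_polya_urn({"red": 1.0}, addition, draws=1)
```

With only red in the starting urn, the single draw must be red, so the result is 2.0 red balls and 0.5 pink balls.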

  57-59. Applying GPU: when picture is assigned to Topic 0, Topic 0's urn receives another picture plus the correlated words image and painting, in amounts given by the word correlation.
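
Inside a topic model, the urn for topic t is its topic-word count table, so applying GPU means that recording word w in topic t also adds pseudo-counts for correlated words. This sketch assumes the promotion amounts are the single-topic correlation values from the Picture example (0.75 for image, 0.001 for painting); the paper's exact promotion scheme and the helper names here are assumptions. The update is reversible so that a Gibbs sampler can remove a word's contribution before resampling its topic.

```python
from collections import defaultdict

def gpu_update(topic_word, t, w, promotions, sign=1.0):
    """Add (sign=+1) or remove (sign=-1) word w from topic t, GPU-style."""
    topic_word[t][w] += sign * 1.0  # the word itself, exactly as in plain LDA
    for other, amount in promotions.get(w, {}).items():
        topic_word[t][other] += sign * amount  # correlated words get pseudo-counts

def word_prob(topic_word, t, w, vocab_size, beta=0.01):
    """Smoothed estimate of p(w | t) from the (possibly fractional) counts."""
    counts = topic_word[t]
    return (counts[w] + beta) / (sum(counts.values()) + vocab_size * beta)

topic_word = defaultdict(lambda: defaultdict(float))
promotions = {"picture": {"image": 0.75, "painting": 0.001}}  # assumed amounts
gpu_update(topic_word, 0, "picture", promotions)
```

After one update, Topic 0 holds 1.0 picture, 0.75 image, and 0.001 painting, so a well-correlated word is promoted strongly while a poorly correlated one barely moves.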

  60. Addressing Issues: no correct LR-sets for a word, addressed by relaxing wrong LR-sets for a word; partially wrong knowledge, addressed by word correlation + GPU.

  61-62. Evaluation: four domains; KL-divergence evaluation, topic coherence, and human evaluation.

  63. Model Comparison: LDA (Blei et al., 2003), LDA-GPU (Mimno et al., 2011), DF-LDA (Andrzejewski et al., 2009), MDK-LDA (Chen et al., 2013), and the proposed GK-LDA.

  64. KL-Divergence
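
The slides show only the heading for the KL-divergence results, not which distributions are compared. Assuming the usual reading that a larger divergence between two topics' word distributions means the topics are more distinct, this sketch computes KL(P || Q) and a symmetrized average for distributions stored as dicts; the function names are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum over w of P(w) * log(P(w) / Q(w))."""
    total = 0.0
    for w, pw in p.items():
        if pw == 0.0:
            continue  # zero-probability words contribute nothing
        qw = q.get(w, 0.0)
        if qw == 0.0:
            return math.inf  # P puts mass where Q has none
        total += pw * math.log(pw / qw)
    return total

def symmetric_kl(p, q):
    """Average of the two directions, since KL itself is asymmetric."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
```

KL is infinite whenever one topic gives zero probability to a word the other uses, so smoothed topic-word estimates are normally compared rather than raw counts.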

  65. Topic Coherence (#T = 15)
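
The slides do not spell out the coherence metric. Assuming the Topic Coherence score of Mimno et al. (2011), a topic's top M words v_1, …, v_M (ordered by probability) score the sum over m = 2..M and l < m of log((D(v_m, v_l) + 1) / D(v_l)), where D counts the documents containing the given words. `topic_coherence` and the toy documents are hypothetical.

```python
import math

def topic_coherence(top_words, documents):
    """Coherence (Mimno et al., 2011) of a topic's top words, highest-probability first."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            # Assumes every top word occurs in at least one document
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1)
                              / doc_freq(top_words[l]))
    return score

docs = [["price", "cheap"], ["price", "cost"], ["family"]]
```

Scores are at most zero; words that often share documents with higher-ranked words push the score toward zero, so higher (less negative) values indicate more coherent topics.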

  66. Human Evaluation

  67. Example Topics love

  68-70. Conclusions: Discovering Coherent Topics Using General Knowledge. No correct LR-sets for a word, addressed by relaxing wrong LR-sets for a word; partially wrong knowledge, addressed by word correlation + GPU.

  71-72. Datasets: http://www.cs.uic.edu/~zchen/
