
How Hierarchical Topics Evolve in Large Text Corpora (PowerPoint presentation)



  1. Intro: Overview
     How Hierarchical Topics Evolve in Large Text Corpora
     Weiwei Cui (Member, IEEE), Shixia Liu (Senior Member, IEEE), Zhuofeng Wu, Hao Wei
     A visualization of how the topics in a text collection change over time
     • Case study: Edward Snowden and the PRISM scandal
     • Idea: make trees out of topics
     • Problem: dense graphs are hard to read and navigate
     • Solution: faceting, details on demand, and alignment
     • Criticism

  2. Intro: Case study
     The PRISM scandal: Edward Snowden leaked documents
     Example topics:
     • “Snowden” vs. “NSA”
     • “Traitor” vs. “Hero”
     Too broad?

  3. Intro: How to read XKCD
     The inspiration for a similar system; this one uses the same ideas

  4. Idea: Topic trees as they evolve
     [Figure: topic trees drawn at successive points in time (x-axis: time); node labels include “Snowden”, “Hero”, “Traitor”, “On the lam”, and “Russia”]
     Problems:
     • Topics are not at the same level
     • Changes are hard to track
     • Users get lost when drilling down

  5. Solution: Overview
     Workflow using iterative analysis
     [Figure: iterative loop of Visualize (algorithm), Analyze (domain knowledge), and Refine (interaction)]

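  6. Solution: Tree cut
     A tree cut is a set of nodes such that every path from the root of the tree to a leaf contains exactly one node from the cut. The cut therefore covers every leaf exactly once, giving one non-overlapping set of topics to display.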
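The tree-cut definition can be checked mechanically. The sketch below assumes a hypothetical `Node` class whose topic names are unique. It tests a candidate cut against every root-to-leaf path. The example tree borrows its labels from slide 4, but its shape is invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical topic node; a node without children is a leaf."""
    name: str
    children: list[Node] = field(default_factory=list)


def root_to_leaf_paths(root: Node) -> list[list[Node]]:
    """Enumerate every path from the root down to a leaf."""
    if not root.children:
        return [[root]]
    paths = []
    for child in root.children:
        for path in root_to_leaf_paths(child):
            paths.append([root] + path)
    return paths


def is_tree_cut(root: Node, cut: set[str]) -> bool:
    """True if every root-to-leaf path contains exactly one node named in `cut`."""
    return all(
        sum(node.name in cut for node in path) == 1
        for path in root_to_leaf_paths(root)
    )


# Example tree: labels from slide 4, structure invented for illustration.
tree = Node("Snowden", [Node("Hero"), Node("Traitor", [Node("Russia"), Node("On the lam")])])
print(is_tree_cut(tree, {"Hero", "Traitor"}))  # True
print(is_tree_cut(tree, {"Hero", "Russia"}))   # False: the "On the lam" path has no cut node
print(is_tree_cut(tree, {"Snowden", "Hero"}))  # False: the "Hero" path has two cut nodes
```

Enumerating every path is fine for slide-sized trees. A large hierarchy would instead check the condition in a single recursive pass.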

  7. Solution: Align
     Align twice:
     • by unit of time
     • by level of the tree
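The slide does not say how topics are matched from one time unit to the next. The sketch below covers only alignment between two adjacent time units. It pairs each topic at time t with its most similar topic at time t + 1, using the Jaccard overlap of keyword sets and greedy one-to-one matching. This is a stand-in technique, not the authors' method. The names `align_topics` and `min_similarity` are hypothetical, as are the keyword sets in the example.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align_topics(
    before: dict[str, set[str]],
    after: dict[str, set[str]],
    min_similarity: float = 0.2,  # hypothetical cut-off for "same topic"
) -> dict[str, str]:
    """Pair topics at time t (before) with topics at time t + 1 (after).

    Greedy one-to-one matching, most similar pair first. Topics left unmatched
    can be read as ending (in before) or newly emerging (in after).
    """
    candidates = sorted(
        (
            (jaccard(words_a, words_b), a, b)
            for a, words_a in before.items()
            for b, words_b in after.items()
        ),
        reverse=True,
    )
    used_before: set[str] = set()
    used_after: set[str] = set()
    pairs: dict[str, str] = {}
    for score, a, b in candidates:
        if score < min_similarity:
            break  # every remaining candidate is even less similar
        if a in used_before or b in used_after:
            continue
        pairs[a] = b
        used_before.add(a)
        used_after.add(b)
    return pairs


# Invented keyword sets for two consecutive time units.
before = {"Hero": {"whistleblower", "hero", "privacy"},
          "Traitor": {"traitor", "espionage", "charges"}}
after = {"Asylum": {"russia", "asylum", "airport"},
         "Charges": {"espionage", "charges", "traitor", "extradition"},
         "Privacy": {"privacy", "hero", "surveillance"}}
print(align_topics(before, after))  # {'Traitor': 'Charges', 'Hero': 'Privacy'}
```

In this example, "Asylum" stays unmatched, which is how a new topic would show up.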

  8. Solution: Details on demand
     Word clouds expose the structure of the visualization
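A word cloud for one topic needs little more than word counts over that topic's documents. The sketch assumes documents are plain strings. It uses a tiny stop-word list that is hypothetical and chosen only for the example. A real system would probably want proper tokenization and a weighting scheme such as TF-IDF.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for"}


def word_cloud_weights(documents: list[str], top_k: int = 10) -> list[tuple[str, int]]:
    """Return the top_k (word, count) pairs for one topic's documents.

    A word cloud would then size each word in proportion to its count.
    """
    counts: Counter[str] = Counter()
    for doc in documents:
        words = re.findall(r"[a-z']+", doc.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_k)


docs = ["Snowden leaks NSA documents",
        "NSA surveillance of phone records",
        "Snowden in Russia"]
print(word_cloud_weights(docs, top_k=2))  # [('snowden', 2), ('nsa', 2)]
```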

  9. Solution: Cut and repeat
     Break a large topic into smaller topics:
     • Large, abstract topics may not be meaningful
     • The algorithm may not choose the cut correctly
     Iterate:
     • More in line with how people actually think
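Splitting a topic can be expressed as an operation on the current tree cut: replace the chosen node with its children. The children's subtrees partition the parent's subtree, so the result is still a valid cut and the user can keep repeating the step. The sketch defines a hypothetical `Node` class and `split_topic` function and uses an invented example tree. It is an illustration, not the system's actual interaction code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical topic node; a node without children is a leaf."""
    name: str
    children: list[Node] = field(default_factory=list)


def find(root: Node, name: str) -> Node | None:
    """Depth-first search for the node called `name`."""
    if root.name == name:
        return root
    for child in root.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def split_topic(root: Node, cut: list[str], name: str) -> list[str]:
    """Replace topic `name` in the cut with its children, keeping the cut's order.

    Leaves, unknown names, and names not in the cut leave the cut unchanged.
    """
    node = find(root, name)
    if node is None or name not in cut or not node.children:
        return list(cut)
    i = cut.index(name)
    return cut[:i] + [child.name for child in node.children] + cut[i + 1:]


tree = Node("Snowden", [Node("Hero"), Node("Traitor", [Node("Russia"), Node("On the lam")])])
cut = ["Snowden"]
cut = split_topic(tree, cut, "Snowden")  # ['Hero', 'Traitor']
cut = split_topic(tree, cut, "Traitor")  # ['Hero', 'Russia', 'On the lam']
print(cut)
```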

  10. Solution: Analysis
      So, how do you glean meaning from this? Four visual patterns:
      (a) a new topic is emerging
      (b) a topic is still active but changes slowly
      (c) a topic is active but changes drastically
      (d) a momentary topic emerges and disappears rapidly
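One way to make the four patterns concrete is a rule-based labeler. It takes a topic's size at each time step and the similarity of its content between consecutive steps. The slide describes the patterns visually and does not define them numerically. The rules below and both thresholds (`short_span`, `stable_similarity`) are hypothetical choices for illustration only.

```python
from __future__ import annotations


def classify_topic(
    sizes: list[int],
    similarities: list[float],
    short_span: int = 2,  # hypothetical: "momentary" = active for at most this many steps
    stable_similarity: float = 0.7,  # hypothetical: mean similarity counted as "slow change"
) -> str:
    """Label a topic's trajectory with one of the patterns (a) to (d).

    sizes[t]         documents assigned to the topic at time step t (0 = absent)
    similarities[t]  content similarity in [0, 1] between steps t and t + 1
    """
    if len(similarities) != max(len(sizes) - 1, 0):
        raise ValueError("need one similarity per pair of consecutive time steps")
    active = [t for t, n in enumerate(sizes) if n > 0]
    if not active:
        return "absent"
    if len(active) <= short_span and sizes[-1] == 0:
        return "(d) momentary"
    if sizes[0] == 0 and sizes[-1] > 0:
        return "(a) emerging"
    if sizes[-1] == 0:
        return "other"  # a long-lived topic that has ended: none of the four patterns
    # Average similarity over consecutive steps where the topic is present in both.
    linked = [
        similarities[t]
        for t in range(len(similarities))
        if sizes[t] > 0 and sizes[t + 1] > 0
    ]
    mean = sum(linked) / len(linked) if linked else 1.0
    if mean >= stable_similarity:
        return "(b) active, slow change"
    return "(c) active, rapid change"


print(classify_topic([0, 12, 0, 0], [0.0, 0.0, 0.0]))  # (d) momentary
```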

  11. Solution: Analysis
      What does this tell us about the news cycle?
      • What part is most important?
      • Which story is most important?
      • What aren't we seeing?

  12. Analysis: Criticism
      Good
      • Lowers cognitive load
      • Manual manipulation makes sense
      • Supports a natural exploration process
      OK, maybe not a problem
      • Not really an algorithmic solution
      • Requires domain knowledge to use
      Bad
      • Screen real estate ≠ importance
      • Absolute y-position means nothing, but it looks as if it should
      • Crossings: do we have to accept bad semantics?
