How Hierarchical Topics Evolve in Large Text Corpora

SLIDE 1

Intro Overview

How Hierarchical Topics Evolve in Large Text Corpora

A visualization for how topics of texts change over time

  • Case study: Edward Snowden and the PRISM scandal
  • Idea: Make trees out of topics
  • Problem: Dense graphs are tough to read and navigate through
  • Solution: Facet, details on demand, and align
  • Criticism

Weiwei Cui, Member, IEEE, Shixia Liu, Senior Member, IEEE, Zhuofeng Wu, Hao Wei

SLIDE 2

Intro Case study

Example topics

  • “Snowden” vs “NSA”
  • “Traitor” vs “Hero”

Too broad?

PRISM scandal: Edward Snowden leaked classified NSA documents

SLIDE 3

Intro How to read

XKCD: inspiration for a similar system; this one uses the same ideas

SLIDE 4

Idea Topic trees as they evolve

[Figure: two topic trees placed along a TIME axis. Earlier tree: Snowden, with Hero, On the lam, Traitor. Later tree: Snowden, with Traitor, Russia, Hero.]

Problems

  • Topics are not at same level
  • Changes are tough to track
  • Users get lost when drilling down
SLIDE 5

Solution Overview Iterative analysis

[Figure: workflow cycling through Algorithm, Visualize, Analyze, and Refine, using domain knowledge and interaction]

SLIDE 6

Solution Tree cut

Tree cut: every path from the root of the tree to a leaf contains exactly one node from the cut
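The tree-cut condition can be checked mechanically. The sketch below is illustrative only: the topic tree is a hypothetical dict mapping each topic to its subtopics (the subtopic labels are invented for the example), and `leaf_paths` and `is_tree_cut` are hypothetical helpers, not code from the paper.

```python
from typing import Dict, List, Set

# Hypothetical topic tree: each topic maps to its subtopics; leaves map to [].
Tree = Dict[str, List[str]]


def leaf_paths(tree: Tree, root: str) -> List[List[str]]:
    """Return every root-to-leaf path as a list of topic names."""
    children = tree.get(root, [])
    if not children:
        return [[root]]
    return [[root] + path for child in children for path in leaf_paths(tree, child)]


def is_tree_cut(tree: Tree, root: str, cut: Set[str]) -> bool:
    """True if every root-to-leaf path contains exactly one node of `cut`."""
    return all(sum(node in cut for node in path) == 1
               for path in leaf_paths(tree, root))


# Invented example tree; only Snowden, Hero, Traitor, and Russia appear on the slides.
topics: Tree = {
    "Snowden": ["Hero", "Traitor"],
    "Hero": ["Whistleblower", "Privacy"],
    "Traitor": ["Espionage", "Russia"],
    "Whistleblower": [], "Privacy": [], "Espionage": [], "Russia": [],
}

print(is_tree_cut(topics, "Snowden", {"Hero", "Traitor"}))                   # True
print(is_tree_cut(topics, "Snowden", {"Hero", "Espionage", "Russia"}))       # True
print(is_tree_cut(topics, "Snowden", {"Hero", "Whistleblower", "Traitor"}))  # False: two nodes on one path
print(is_tree_cut(topics, "Snowden", {"Hero"}))                              # False: Traitor's paths have none
```

The root alone and the full set of leaves are both valid cuts; every cut in between trades detail for overview.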

SLIDE 7

Solution Align

Align twice

  • For a unit of time
  • For a level of the tree
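The slide does not say how topics are matched across time. As an illustration only, one generic approach (an assumption, not necessarily what the paper does) is to represent each topic as keyword weights and link each topic at one time unit to its most similar topic at the previous unit by cosine similarity; a topic with no match above a threshold is treated as new. The names `cosine` and `align`, the keyword data, and the `threshold` value are all hypothetical, and only the time-unit half of the alignment is sketched.

```python
import math
from typing import Dict, Optional

# Hypothetical representation: a topic is a bag of keyword weights.
Topic = Dict[str, float]


def cosine(a: Topic, b: Topic) -> float:
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def align(prev: Dict[str, Topic], curr: Dict[str, Topic],
          threshold: float = 0.3) -> Dict[str, Optional[str]]:
    """Map each topic in `curr` to its most similar topic in `prev`,
    or to None when no similarity reaches `threshold` (a new topic)."""
    mapping: Dict[str, Optional[str]] = {}
    for name, topic in curr.items():
        best, best_sim = None, threshold
        for prev_name, prev_topic in prev.items():
            sim = cosine(topic, prev_topic)
            if sim >= best_sim:
                best, best_sim = prev_name, sim
        mapping[name] = best
    return mapping


# Invented keyword weights for two consecutive time units.
week1 = {"hero": {"hero": 3.0, "whistleblower": 2.0},
         "traitor": {"traitor": 3.0, "leak": 1.0}}
week2 = {"traitor": {"traitor": 2.0, "leak": 2.0},
         "russia": {"russia": 3.0, "asylum": 2.0}}

print(align(week1, week2))  # {'traitor': 'traitor', 'russia': None}
```

A greedy best match keeps the sketch short; it allows two later topics to map to the same earlier one, which is how a split would show up.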
SLIDE 8

Solution Details on demand

Word cloud exposes structure of visualization

SLIDE 9

Solution Cut and repeat

Break large topic into smaller topics

  • Large abstract topics may not be meaningful
  • Algorithm may not choose correctly

Iterate

  • More in line with how people actually think
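Cutting and repeating can be viewed as editing the tree cut: splitting a topic replaces it with its subtopics, and merging replaces a group of subtopics with their parent. Either step keeps every root-to-leaf path covered exactly once. The sketch below assumes the same kind of hypothetical dict-based tree with invented subtopic labels; `expand` and `collapse` are illustrative names, not the paper's interface.

```python
from typing import Dict, List, Set

Tree = Dict[str, List[str]]  # topic -> subtopics; leaves map to []


def expand(tree: Tree, cut: Set[str], node: str) -> Set[str]:
    """Split `node` into its subtopics. No-op if it is not in the cut or is a leaf."""
    children = tree.get(node, [])
    if node not in cut or not children:
        return set(cut)
    return (cut - {node}) | set(children)


def collapse(tree: Tree, cut: Set[str], parent: str) -> Set[str]:
    """Merge every cut node below `parent` back into `parent`.
    Assumes `cut` is already a valid tree cut."""
    below: Set[str] = set()
    stack = list(tree.get(parent, []))
    while stack:  # depth-first walk collecting all descendants of `parent`
        node = stack.pop()
        below.add(node)
        stack.extend(tree.get(node, []))
    if not below & cut:  # nothing under `parent` is shown, so nothing to merge
        return set(cut)
    return (cut - below) | {parent}


# Invented example tree; only Snowden, Hero, Traitor, and Russia appear on the slides.
topics: Tree = {
    "Snowden": ["Hero", "Traitor"],
    "Hero": ["Whistleblower", "Privacy"],
    "Traitor": ["Espionage", "Russia"],
    "Whistleblower": [], "Privacy": [], "Espionage": [], "Russia": [],
}

cut = {"Snowden"}                       # start from the most abstract view
cut = expand(topics, cut, "Snowden")    # split into Hero and Traitor
cut = expand(topics, cut, "Traitor")    # drill into Traitor only
print(sorted(cut))                               # ['Espionage', 'Hero', 'Russia']
print(sorted(collapse(topics, cut, "Traitor")))  # ['Hero', 'Traitor']
```

Because each step is local, a user can repeat them in any order, which matches the iterate-and-refine workflow described above.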
SLIDE 10

Solution Analysis

So, how do you glean meaning from this?

  • (a) A new topic is emerging
  • (b) A topic is still active but changes slowly
  • (c) A topic is active but changes immensely
  • (d) A momentary topic emerges and disappears rapidly
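The four patterns can be read off two per-topic time series: how prominent a topic is at each time step and how much its content changes between steps. The heuristic below is a hypothetical illustration, not the paper's method; the function name `classify`, its inputs, and every threshold are assumptions chosen for the example.

```python
from typing import List


def classify(strength: List[float], change: List[float],
             active: float = 1.0, fast: float = 0.5, short: int = 2) -> str:
    """Label one topic's trajectory with patterns (a)-(d) from the slide.

    strength[t]: prominence of the topic at time step t (e.g. document count).
    change[t]:   content shift between steps t and t+1, from 0 (same) to 1.
    The thresholds `active`, `fast`, and `short` are hypothetical defaults.
    """
    on = [s >= active for s in strength]
    if True not in on:
        return "inactive"
    first = on.index(True)
    last = len(on) - 1 - on[::-1].index(True)
    still_active = last == len(on) - 1

    if not still_active and last - first + 1 <= short:
        return "momentary"   # (d) appears and disappears rapidly
    if first > 0 and still_active:
        return "emerging"    # (a) absent at first, active at the end
    if first == 0 and still_active:
        avg = sum(change) / len(change) if change else 0.0
        return "volatile" if avg >= fast else "stable"  # (c) or (b)
    return "other"           # e.g. a long-lived topic that fades out


print(classify([0, 0, 0, 2, 5], [0.0, 0.0, 0.0, 0.2]))  # emerging
print(classify([4, 5, 4, 5, 4], [0.1, 0.1, 0.2, 0.1]))  # stable
print(classify([4, 5, 4, 5, 4], [0.8, 0.7, 0.9, 0.6]))  # volatile
print(classify([0, 6, 0, 0, 0], [0.0, 0.0, 0.0, 0.0]))  # momentary
```

A topic that appears only at the last step is labeled emerging, since the data cannot yet show whether it will vanish.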

SLIDE 11

Solution Analysis

What does this tell us about the news cycle? What part is most important? Which story is most important? What aren't we seeing?

SLIDE 12

Analysis Criticism

Good

  • Lowers cognitive load
  • Manual manipulation makes sense
  • Supports natural exploration process

OK, maybe not a problem

  • Not really an algorithmic solution
  • Requires domain knowledge to use

Bad

  • Screen real estate ≠ importance
  • Absolute Y-Pos means nothing, but it looks like it should
  • Crossings: do we have to accept bad semantics?