Serendip: Topic Model-Driven Visual Exploration of Text Corpora

SLIDE 1

Serendip: Topic Model-Driven Visual Exploration of Text Corpora

  • Eric Alexander, Department of Computer Science, University of Wisconsin-Madison
  • Joe Kohlmann, Department of Computer Science, University of Wisconsin-Madison
  • Robin Valenza, Department of English, University of Wisconsin-Madison
  • Michael Witmore, Folger Shakespeare Library, Washington, D.C.
  • Michael Gleicher, Member, IEEE, Department of Computer Science, University of Wisconsin-Madison

SLIDE 2

Serendip: Topic Model-Driven Visual Exploration of Text Corpora

SLIDE 3

Serendipity

  • "The Three Princes of Serendip"
  • Happy accidents
  • Fate or extreme cleverness?
  • Research question: how can coincidences be made more likely?
  • "The Bohemian Bookshelf" by Thudt et al.

SLIDE 4

Promoting Serendipity

A. Thudt, U. Hinrichs, and S. Carpendale, "The bohemian bookshelf: supporting serendipitous book discoveries through information visualization," in Proc. ACM Human Factors in Computing Systems (CHI)

  • Providing multiple access points
  • Highlighting adjacencies
  • Offering flexible pathways for exploration
  • Enticing curiosity and playfulness

SLIDE 5

Three Views

  • CorpusViewer: reorderable matrix of the whole corpus
  • TextViewer: examine a single document
  • RankViewer: examine specific words

SLIDE 6

CorpusViewer

Figure: the CorpusViewer matrix (axes: documents, topics)
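The CorpusViewer's reorderable matrix can be pictured as a document-by-topic table of topic proportions whose rows and columns are re-sorted on demand. What follows is a minimal Python sketch, not Serendip's implementation: it assumes the topic model has already produced per-document topic proportions, and the function names and toy data are hypothetical. It shows two simple orderings: rows by their weight in one chosen topic, and columns by total weight across the corpus.

```python
from typing import Dict, List

# Hypothetical input: document name -> topic proportions (one float per topic).
DocTopics = Dict[str, List[float]]


def order_documents_by_topic(matrix: DocTopics, topic: int) -> List[str]:
    """Sort documents (rows) by descending weight in the chosen topic."""
    return sorted(matrix, key=lambda doc: matrix[doc][topic], reverse=True)


def order_topics_by_total(matrix: DocTopics) -> List[int]:
    """Sort topics (columns) by their summed weight over all documents."""
    n_topics = len(next(iter(matrix.values())))
    totals = [sum(row[k] for row in matrix.values()) for k in range(n_topics)]
    return sorted(range(n_topics), key=lambda k: totals[k], reverse=True)


def reorder(matrix: DocTopics, doc_order: List[str],
            topic_order: List[int]) -> List[List[float]]:
    """Build the matrix that would be drawn, in the requested row/column order."""
    return [[matrix[doc][k] for k in topic_order] for doc in doc_order]


if __name__ == "__main__":
    toy = {"doc_a": [0.5, 0.3, 0.2],
           "doc_b": [0.2, 0.7, 0.1],
           "doc_c": [0.1, 0.2, 0.7]}
    rows = order_documents_by_topic(toy, topic=1)
    cols = order_topics_by_total(toy)
    print(rows, cols)  # ['doc_b', 'doc_a', 'doc_c'] [1, 2, 0]
    for row in reorder(toy, rows, cols):
        print(row)
```

Sorting on a single column is only one possible ordering; a similarity- or clustering-based ordering would feed the same `reorder` step.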

SLIDE 7

Features combatting scale

  • Ordering
  • Aggregation
  • Annotation
  • Assigning colors
  • Details on demand
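Of the features listed above, aggregation reduces how many rows the matrix has to show. As a hedged illustration only (the slides do not say how Serendip aggregates; the grouping field and the helper name are hypothetical), documents that share a metadata value such as genre could be collapsed into one row holding their mean topic proportions:

```python
from typing import Dict, List


def aggregate_by_metadata(doc_topics: Dict[str, List[float]],
                          metadata: Dict[str, str]) -> Dict[str, List[float]]:
    """Collapse documents that share a metadata value into one averaged row."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for doc, row in doc_topics.items():
        group = metadata[doc]  # e.g. a genre or decade label (hypothetical field)
        if group not in sums:
            sums[group] = [0.0] * len(row)
            counts[group] = 0
        for k, weight in enumerate(row):
            sums[group][k] += weight
        counts[group] += 1
    # Mean topic proportions per group; groups keep their first-seen order.
    return {g: [s / counts[g] for s in totals] for g, totals in sums.items()}


if __name__ == "__main__":
    topics = {"a": [0.25, 0.75], "b": [0.75, 0.25], "c": [1.0, 0.0]}
    genre = {"a": "comedy", "b": "comedy", "c": "history"}
    print(aggregate_by_metadata(topics, genre))
    # {'comedy': [0.5, 0.5], 'history': [1.0, 0.0]}
```

Averaging keeps each aggregated row a valid topic distribution, so it can be drawn with the same encoding as a single document.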

SLIDE 8

TextViewer
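The TextViewer looks inside a single document. One plausible way to drive a line graph of where a topic occurs along a text (an assumption suggested by the line graph listed under "Encode" on the last slide, not a description of Serendip's code) is to take each token's topic assignment from the model and compute the share of tokens tagged with a chosen topic in a sliding window. The function name and the window size are hypothetical.

```python
from typing import List, Optional


def topic_density(assignments: List[Optional[int]], topic: int,
                  window: int) -> List[float]:
    """Share of tokens assigned to `topic` in each sliding window of tokens.

    assignments[i] is the topic assigned to token i, or None for tokens the
    model skipped (for example stopwords). The window moves one token at a time.
    """
    if not 1 <= window <= len(assignments):
        raise ValueError("window must be between 1 and len(assignments)")
    hits = [1 if t == topic else 0 for t in assignments]
    current = sum(hits[:window])  # count in the first window
    densities = [current / window]
    for i in range(window, len(hits)):
        current += hits[i] - hits[i - window]  # add the new token, drop the oldest
        densities.append(current / window)
    return densities


if __name__ == "__main__":
    tokens = [0, 0, None, 1, 1, 1, 0, None]
    print(topic_density(tokens, topic=1, window=4))
    # [0.25, 0.5, 0.75, 0.75, 0.5]
```

Plotting one such series per selected topic gives a per-document overview; a larger window smooths the curve at the cost of positional detail.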

SLIDE 9

RankViewer
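The RankViewer focuses on specific words. Below is a minimal sketch of one way to answer "where does this word stand in each topic?", assuming each topic is available as a word-to-probability mapping. The function name and toy topics are hypothetical, and Serendip may rank words by a different measure than raw probability.

```python
from typing import Dict, List, Optional


def word_ranks(topics: List[Dict[str, float]], word: str) -> List[Optional[int]]:
    """Return the 1-based rank of `word` in each topic (1 = most probable).

    Topics in which the word does not occur get None.
    """
    ranks: List[Optional[int]] = []
    for dist in topics:
        if word not in dist:
            ranks.append(None)
            continue
        ordered = sorted(dist, key=dist.__getitem__, reverse=True)
        ranks.append(ordered.index(word) + 1)
    return ranks


if __name__ == "__main__":
    toy_topics = [{"love": 0.5, "war": 0.3, "king": 0.2},
                  {"war": 0.6, "king": 0.3, "love": 0.1},
                  {"king": 0.9, "crown": 0.1}]
    print(word_ranks(toy_topics, "love"))  # [1, 3, None]
    print(word_ranks(toy_topics, "king"))  # [3, 2, 1]
```

A bar per topic showing this rank is one way to see at a glance which topics a word matters to.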

SLIDE 10

How are the factors for serendipity implemented?

  • Multiple access points: three views, ordered to the user's liking
  • Highlighting adjacencies: through ordering and visualization
  • Flexible pathways: jumping between views
  • Curiosity and playfulness: interaction and discovery

SLIDE 11

Two use cases

  • Vis Abstracts: 1,127 abstracts from SciVis, InfoVis, VAST, BioVis, and PacificVis papers from 2007-2013, each 30 to 389 words long
  • Early Modern Literature: 1,080 digitized texts of English literature published between 1530 and 1799, each ranging from a few hundred words to a few hundred pages
  • Goal: to confirm serendipitous discoveries across multiple scales of data and abstraction
  • Problem: how can serendipity be evaluated?
  • Long-term user studies are needed

Formal evaluation is still due

SLIDE 12

Discussion

+ strengths

  • Suitable for documents of any length and corpora of varying size
  • Three layers of detail (whole corpus, single document, single words)
  • Simple but effective visualizations
  • Easily accessible (online tool, though the topic modelling component is still pending)

- weaknesses

  • Not suited to quick exploration
  • Scepticism about the claimed serendipitous discoveries

SLIDE 13

System: Serendip

What: Data

  • Document collection (text, metadata, topics)

Why: Tasks

  • Explore document collections
  • Facilitate serendipitous discoveries

How

  • Encode: reorderable matrix, line graph, bar graph
  • Reduce: aggregation, ordering
  • Manipulate: order, color, annotate, details on demand

Scale

  • Up to ~1,000 documents (of various lengths)