SLIDE 1

Word Storms: Multiples of Word Clouds for Visual Comparison of Documents

Quim Castellá, Charles Sutton (WWW-2014) Zoltán Szabó

Gatsby Unit, Tea Talk, December 18, 2014

Zoltán Szabó Words Storms

SLIDE 2

Motivation

Vast number of documents on the web. Need for quick scanning.

Word clouds (Google: 963,000 hits; LDA: 172,000 hits):

  • One of the most popular generators: Wordle.
  • Font size = frequency of the word.

SLIDE 3

Key Problem

Word clouds are difficult to compare visually.

Word storm:

  • made of word clouds,
  • word cloud = subset of documents,
  • allows efficient contrasting, comparison of documents.

Goal: visualize an entire corpus.

SLIDE 4

Cloud Examples

One cloud :=

  • one document: comparing individual docs,
  • one track of a conference: ∼ areas,
  • papers from a given period: ∼ time evolution,
  • one scientific field (+ its subfields): ∼ hierarchical categories.

SLIDE 5

Guiding Principles

1. Each cloud should represent its own document.

2. Clouds should be easy to compare/contrast. ⇒ Co-occurring words: similar font size, color, position, orientation.

SLIDE 6

Creating a Single Cloud: Notations

Word cloud = set of words: $W = \{w_1, \ldots, w_M\}$. Each word $w \in W$ has a

  • position: $p_w = (x_w, y_w)$,
  • font size: $s_w$,
  • color: $c_w$.

Importance of a word (=: its weight): tf.

$W$ = words with the top $M$ weights.
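The selection of W can be sketched in a few lines. This is a minimal illustration, assuming that tf means the raw term count and that the document is already tokenized; the function name `top_m_words` and the alphabetical tie-break are choices made for this example, not part of the talk.

```python
from collections import Counter


def top_m_words(tokens, m):
    """Return the m most frequent tokens with their raw counts (tf weights)."""
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically so ties are deterministic
    # (the tie-break rule is an assumption of this sketch).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:m])


tokens = "cloud word cloud storm word cloud layout".split()
weights = top_m_words(tokens, 2)
print(weights)
```

The resulting dictionary plays the role of W together with the weight of each word.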

SLIDE 7

Creating a Single Cloud

Font size ∝ word weight. Color, orientation: random. Position: spiral algorithm (next slide).
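As a small illustration of the proportionality rule, the sketch below scales the weights so that the heaviest word gets a chosen maximum font size. The constant `max_pt` and the function name `font_sizes` are hypothetical; the slide only states that font size is proportional to the weight.

```python
def font_sizes(weights, max_pt=48.0):
    """Map word weights to font sizes proportional to the weight."""
    top = max(weights.values())
    # The heaviest word gets max_pt; every other word is scaled linearly.
    return {word: max_pt * weight / top for word, weight in weights.items()}


sizes = font_sizes({"cloud": 3, "word": 2, "storm": 1})
print(sizes)
```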

SLIDE 8

Creating a Single Cloud: Spiral Algorithm

Given: word cloud with i − 1 words already placed. Place the new word w at its desired/random location:

If it does not intersect the previous words and lies within the frame, then go to the next word.

Else: w is moved outward along a spiral until a valid position is found.
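The placement step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: words are approximated by axis-aligned boxes `(cx, cy, w, h)`, the outward path is an Archimedean spiral, and `step`, `growth`, and `max_steps` are hypothetical tuning parameters.

```python
import math


def overlaps(a, b):
    """True if two axis-aligned boxes (cx, cy, w, h) intersect."""
    return (abs(a[0] - b[0]) * 2 < a[2] + b[2]) and (abs(a[1] - b[1]) * 2 < a[3] + b[3])


def in_frame(box, frame_w, frame_h):
    """True if the box lies fully inside the frame [0, frame_w] x [0, frame_h]."""
    cx, cy, w, h = box
    return (w / 2 <= cx <= frame_w - w / 2) and (h / 2 <= cy <= frame_h - h / 2)


def place_word(size, start, placed, frame, step=0.1, growth=0.5, max_steps=20000):
    """Walk outward on a spiral from `start` until the box is valid."""
    w, h = size
    x0, y0 = start
    for k in range(max_steps):
        theta = k * step
        radius = growth * theta  # Archimedean spiral: radius grows with the angle
        box = (x0 + radius * math.cos(theta), y0 + radius * math.sin(theta), w, h)
        if in_frame(box, *frame) and not any(overlaps(box, p) for p in placed):
            return box
    return None  # no valid position found within max_steps


placed = []
for size in [(40, 20), (30, 15), (30, 15)]:
    box = place_word(size, (50, 50), placed, (100, 100))
    if box is not None:
        placed.append(box)
print(placed)
```

The first word stays at its desired location; the later ones start from the same point and are pushed outward until they no longer collide.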

SLIDE 9

Spiral Algorithm: Formally

SLIDE 10

Creating a Storm

ith document: $u_i = (u_{iw})$: count of word $w$ in the ith doc.

ith word cloud: $v_i = (W_i, \{p_{iw}\}, \{c_{iw}\}, \{s_{iw}\})$.

Alg-1:

Color: α-channel = idf = $\log\left(\frac{|\text{docs}|}{|\text{docs containing } w|}\right)$ ⇒ transparent: the word appears in many docs.

Locations:

  • Initialization: spiral method.
  • Iterate: desired locations := $\hat{E}_{\text{clouds}}$[previous locations].
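The two ingredients of Alg-1 can be sketched as below. Several points are assumptions of this example: the empirical expectation over clouds is read as the plain average of a word's position over the clouds that contain it, documents are represented as sets of words, and the division by log(number of docs) in `alpha` is a hypothetical normalization to [0, 1] (the slide only states that the α-channel is the idf).

```python
import math


def idf(word, docs):
    """idf = log(|docs| / |docs containing word|); docs are sets of words."""
    containing = sum(1 for d in docs if word in d)
    # Assumes the word occurs in at least one document.
    return math.log(len(docs) / containing)


def alpha(word, docs):
    """Opacity in [0, 1]; the normalization is an assumption of this sketch."""
    return idf(word, docs) / math.log(len(docs))


def desired_locations(clouds):
    """Average each word's (x, y) over the clouds that contain it."""
    sums = {}
    for positions in clouds:
        for word, (x, y) in positions.items():
            sx, sy, n = sums.get(word, (0.0, 0.0, 0))
            sums[word] = (sx + x, sy + y, n + 1)
    return {word: (sx / n, sy / n) for word, (sx, sy, n) in sums.items()}


docs = [{"material", "electron"}, {"material", "metal"}, {"material", "light"}]
clouds = [{"material": (0.0, 0.0), "metal": (4.0, 2.0)}, {"material": (2.0, 4.0)}]
targets = desired_locations(clouds)
print(alpha("material", docs), alpha("metal", docs), targets)
```

A word present in every document gets idf 0 and is fully transparent; the averaged positions would then serve as the desired locations for the next layout pass.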

SLIDE 11

Coordinated Layout: Alg-1

Problem: the iteration tends to move words far away from the center.

SLIDE 12

Coordinated Layout: Alg-2 – Objective

Set of documents: $u_{1:N} = \{u_1, \ldots, u_N\}$. Storm: $v_{1:N} = \{v_1, \ldots, v_N\}$.

Objective (how well the storm fits the corpus):

$$f_{u_{1:N}}(v_{1:N}) = \underbrace{\sum_{i,j=1}^{N} \left[d_u(u_i, u_j) - d_v(v_i, v_j)\right]^2}_{\text{similar docs are mapped to similar clouds}} + \underbrace{\sum_{i=1}^{N} c(u_i, v_i)}_{\text{faithful repr. of the own doc}}.$$

First term: MDS. $d_u$: Euclidean distance. With $\kappa \ge 0$,

$$d_v(v_i, v_j) = \sum_{w \in W_i \cup W_j} (s_{iw} - s_{jw})^2 + \kappa \sum_{w \in W_i \cap W_j} \left\| p_{iw} - p_{jw} \right\|_2^2.$$

Second term:

$$c(u_i, v_i) = \sum_{w \in W_i} (u_{iw} - s_{iw})^2.$$
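To make the terms concrete, the objective can be evaluated directly for a tiny corpus. The sketch below makes several assumptions that the slide does not state: a document is a dictionary of word counts, a cloud is a pair `(sizes, positions)` whose keys define W_i, and a word missing from a cloud contributes size 0 in the union sum.

```python
import math


def d_u(ui, uj):
    """Euclidean distance between two word-count dictionaries."""
    words = set(ui) | set(uj)
    return math.sqrt(sum((ui.get(w, 0) - uj.get(w, 0)) ** 2 for w in words))


def d_v(vi, vj, kappa):
    """Cloud distance: size differences on the union, positions on the intersection."""
    si, pi = vi
    sj, pj = vj
    # Assumption: a word absent from a cloud has size 0 there.
    size_term = sum((si.get(w, 0.0) - sj.get(w, 0.0)) ** 2 for w in set(si) | set(sj))
    pos_term = sum(
        (pi[w][0] - pj[w][0]) ** 2 + (pi[w][1] - pj[w][1]) ** 2
        for w in set(si) & set(sj)
    )
    return size_term + kappa * pos_term


def c(ui, vi):
    """Fidelity of a cloud to its own document."""
    si, _ = vi
    return sum((ui.get(w, 0) - si[w]) ** 2 for w in si)


def objective(docs, clouds, kappa):
    """Sum of the MDS-like stress term and the per-document fidelity term."""
    n = len(docs)
    stress = sum(
        (d_u(docs[i], docs[j]) - d_v(clouds[i], clouds[j], kappa)) ** 2
        for i in range(n)
        for j in range(n)
    )
    fidelity = sum(c(docs[i], clouds[i]) for i in range(n))
    return stress + fidelity


docs = [{"a": 3, "b": 4}, {"a": 3}]
clouds = [
    ({"a": 3.0, "b": 4.0}, {"a": (0.0, 0.0), "b": (1.0, 0.0)}),
    ({"a": 3.0}, {"a": (1.0, 0.0)}),
]
value = objective(docs, clouds, kappa=1.0)
print(value)
```

Here the font sizes equal the counts, so the fidelity term vanishes and the whole value comes from the mismatch between document and cloud distances.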

SLIDE 13

Coordinated Layout: Alg-2 – Objective

Two more penalties ($\lambda > 0$, $\mu > 0$):

$$r(v_{1:N}) = \underbrace{\lambda \sum_{i=1}^{N} \sum_{w, w' \in W_i} O^2_{i:w,w'}}_{\text{words do not overlap}} + \underbrace{\mu \sum_{i=1}^{N} \sum_{w \in W_i} \left\| p_{iw} \right\|_2^2}_{\text{compact configuration}}.$$

$O_{i:w,w'}$: minimum distance required to separate overlapping words $(w, w')$.

Final objective: $f_{u_{1:N}}(v_{1:N}) + r(v_{1:N}) \to \min_{v_{1:N}}$.

Optimization: homotopy scheme in $\lambda$, fixed subtask: gradient descent.
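The two penalties can be illustrated with a small sketch. It rests on assumptions not given in the talk: words are axis-aligned boxes `(cx, cy, w, h)` with positions measured from the cloud center at the origin, the separating distance is taken as the smaller of the two axis overlaps, and each unordered pair of words is counted once.

```python
def separation(a, b):
    """Smallest axis-aligned shift that separates two boxes; 0 if they are disjoint."""
    ox = (a[2] + b[2]) / 2 - abs(a[0] - b[0])
    oy = (a[3] + b[3]) / 2 - abs(a[1] - b[1])
    return min(ox, oy) if ox > 0 and oy > 0 else 0.0


def penalty(clouds, lam, mu):
    """lam * (squared overlaps) + mu * (squared distances from the cloud center)."""
    overlap = 0.0
    compact = 0.0
    for boxes in clouds:
        for i in range(len(boxes)):
            cx, cy, _, _ = boxes[i]
            compact += cx ** 2 + cy ** 2
            # Each unordered pair is counted once (an assumption of this sketch).
            for j in range(i + 1, len(boxes)):
                overlap += separation(boxes[i], boxes[j]) ** 2
    return lam * overlap + mu * compact


storm = [[(0.0, 0.0, 4.0, 2.0), (1.0, 0.0, 4.0, 2.0)]]
print(penalty(storm, lam=10.0, mu=0.5))
```

In this example the two boxes overlap by 3 horizontally and 2 vertically, so the separating distance is 2 and the overlap term dominates the penalty.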

SLIDE 14

Coordinated Layout: Combined Algorithm

Iterative algorithm: fast, but not compact.

Gradient method: compact storm, but slow.

In practice: combination gives decent results.

SLIDE 15

Numerical Illustration

User study: users are better at

  • outlier document detection,
  • the discovery of the two most similar documents.

ICML-2012:

  • visualization of sessions, http://icml.cc/2012/whatson-all/.

Research grant abstract visualization (EPSRC):

  • 1st–5th = material sciences, 6th = maths,
  • independent vs. coordinated layout.

SLIDE 16

EPSRC programmes: independent clouds

SLIDE 17

EPSRC programmes: coordinated storm

SLIDE 18

Coordinated Storm: Interpretation

(a)-(e) similar: 'material', 'applications', 'properties'.

Contrast, absence of words: 'coating' only in (b) and (d), no 'material' in (f).

Informative words (transparency): 'electron' (a), 'metal' (b), 'light' (c), 'crack' (d), 'composite' (e), 'problems' (f).

SLIDE 19

Summary

Independent word clouds are difficult to compare.

Word storm:

  • Similar clouds represent similar documents.
  • Emphasizes the most informative words.
  • Useful in comparing/contrasting documents.

Source code: http://groups.inf.ed.ac.uk/cup/wordstorm/wordstorm.html
