FacetAtlas: Multifaceted Visualization for Rich Text Corpora



SLIDE 1

FacetAtlas: Multifaceted Visualization for Rich Text Corpora

Nan Cao, Jimeng Sun, Yu-Ru Lin, David Gotz, Shixia Liu, Huamin Qu

InfoVis 2010

SLIDE 2

Introduction

SLIDE 3

multiple facets

SLIDE 4

multiple facets

Symptoms, Treatments, Causes, Tests & Diagnosis, Prognosis, Prevention, Complications

SLIDE 5

Diabetes

SLIDE 6

Type2

Metabolic Syndrome

Type1

Gestational Diabetes

Diabetes

SLIDE 7

Type2

Metabolic Syndrome

Type1

Gestational Diabetes

Diabetes

How to visualize the relations of multifaceted document contents?

SLIDE 8

Type2

Metabolic Syndrome

Type1

Gestational Diabetes

Diabetes

(Q1) How to model the document contents into multifaceted relation data?

(Q2) How to intuitively visualize multifaceted document contents and their relations?

(Q3) How to find the insight patterns visually driven by users’ interests?

SLIDE 9

Solution

  • Goal:

– Visualize both the global (clusters) and local (relations) patterns in rich text corpora with multiple facets.

  • Approach:

– Multifaceted entity-relational data model
– Intuitive visual encoding and automatic layout
– Interaction driven by users’ interests for pattern detection

SLIDE 10

Demo

SLIDE 11

(Q1) How to model the document contents into multifaceted relation data?

(Q2) How to intuitively visualize multifaceted document contents and their relations?

(Q3) How to find the insight patterns visually driven by users’ interests?

Key Challenges

SLIDE 12

(Q1) How to model the document contents into multifaceted relational data?

Figure labels: document set; facet segmentation (symptom, disease, treatment); entity extraction; entity set; multifaceted entity-relational data model

Example entities: type 1 diabetes, type 2 diabetes, take medications, blood sugar control, thirst, blurred vision

Relation types: internal relations, external relations
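
A minimal sketch of how such a multifaceted entity-relational model could be held in memory. It assumes, since the slide does not spell this out, that an internal relation links two entities of the same facet and an external relation links entities of different facets. The class names `Entity` and `FacetModel` and the specific entity pairs are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str
    facet: str  # e.g. "disease", "symptom", "treatment"


@dataclass
class FacetModel:
    """Hypothetical container for entities and their two relation kinds."""
    entities: set = field(default_factory=set)
    internal: set = field(default_factory=set)  # pairs inside one facet (assumed)
    external: set = field(default_factory=set)  # pairs across facets (assumed)

    def relate(self, a: Entity, b: Entity) -> str:
        """Store an undirected relation and report which kind it is."""
        self.entities.update((a, b))
        pair = frozenset((a, b))
        if a.facet == b.facet:
            self.internal.add(pair)
            return "internal"
        self.external.add(pair)
        return "external"


model = FacetModel()
t1 = Entity("type 1 diabetes", "disease")
t2 = Entity("type 2 diabetes", "disease")
thirst = Entity("thirst", "symptom")
blurred = Entity("blurred vision", "symptom")

# Illustrative pairs only; the slide does not list which entities are related.
kinds = [
    model.relate(t1, thirst),
    model.relate(t2, thirst),
    model.relate(t2, blurred),
    model.relate(t1, t2),
]
```

Storing each relation as a `frozenset` keeps the pairs undirected, so adding the same pair in either order does not create a duplicate.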

SLIDE 13

(Q1) How to model the document contents into multifaceted relation data?

(Q2) How to intuitively visualize multifaceted document contents and their relations?

(Q3) How to find the insight patterns visually driven by users’ interests?

Key Challenges

SLIDE 14

(Q2) How to visualize multifaceted document contents and their relations?

[Figure: data model, step 1 (encoding), step 2 (layout)]

SLIDE 16

Multifaceted Entity Relational Model

Encoding

SLIDE 17

Type 2 Diabetes

disease

Type 1 Diabetes

Multifaceted Entity Relational Model

Encoding

SLIDE 18

symptoms treatments

Type 2 Diabetes

disease

Type 1 Diabetes

Multifaceted entities External relations Internal relations

Multifaceted Entity Relational Model

Encoding

SLIDE 19

symptoms treatments

disease

Type 2 Diabetes Type 1 Diabetes

Group entities by external relations Multifaceted entities External relations Internal relations

Multifaceted Entity Relational Model

Group internal relations

Encoding

SLIDE 20

Encoding

Type 2 Diabetes, Type 1 Diabetes

  1. Encode external relations by neighborhood
  2. Split overlapping entities into multiple replicas
  3. Group related entities and their replicas into the facet node
  4. Group the related internal linkages in the symptom facet

Figure labels: encoded external relation between the disease facet and the symptom facet; facet node; grouped internal relation; overlapping entities have multiple replicas

Figure data: entity groups 1, 2, 3, 4 and 3, 4, 5, 6; grouped internal relations <1, 5>, <2, 4>, <3, 4>, <3, 5>
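
The replica step can be sketched as follows. This is an illustration under assumptions, not the authors' implementation. The function name `build_facet_nodes` is hypothetical. The entity ids follow one reading of the figure, in which entities 1 to 4 sit next to Type 2 Diabetes and entities 3 to 6 next to Type 1 Diabetes.

```python
from collections import defaultdict

# Assumed external relations: disease -> ids of related entities in another facet.
external = {
    "Type 2 Diabetes": [1, 2, 3, 4],
    "Type 1 Diabetes": [3, 4, 5, 6],
}


def build_facet_nodes(external):
    """Give every owner its own facet node holding one replica per related entity."""
    facet_nodes = {}
    replicas = defaultdict(list)
    for owner, related in external.items():
        # A replica is identified by (owner, entity id), so a shared entity
        # yields one distinct replica in each owner's neighborhood.
        facet_nodes[owner] = [(owner, entity) for entity in related]
        for entity in related:
            replicas[entity].append((owner, entity))
    return facet_nodes, dict(replicas)


facet_nodes, replicas = build_facet_nodes(external)

# Entities with more than one replica are the overlapping ones.
overlapped = sorted(e for e, copies in replicas.items() if len(copies) > 1)
```

With this input the overlapping entities are 3 and 4, each of which ends up with two replicas, one per disease.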

SLIDE 21

Encoding

Figure labels: symptom facet node, treatments

Type 2 Diabetes

disease

Type 1 Diabetes

  1. Similarly, group the treatment entities into the treatment facet node
  2. Then encode the data model into visual form

Figure data: grouped internal relations <1, 5>, <2, 4>, <3, 4>, <3, 5>, <1, 2>, <1, 3>

SLIDE 22

(Q2) How to visualize multifaceted document contents and their relations?

[Figure: data model, step 1 (encoding), step 2 (layout)]

SLIDE 23

Layout

10,000 entities and 30,000 external relations

SLIDE 24

density estimation entity layout sampling link layout

Layout

SLIDE 25

Layout

Sampling by DOI

  • Offline: build indices
  • Online: query related samples

Figure labels: document set; entity extraction; facet segmentation (symptom, disease, treatment)

SLIDE 26

Layout

$$\min_X \sum_{i<j} \frac{1}{d_{ij}^{2}} \left( \lVert X_i - X_j \rVert - d_{ij} \right)^{2} + \sum_i \lVert X_i - \mathrm{pre}(X_i) \rVert^{2}$$

Stabilized Layout

  • Cluster together: based on the hidden internal relations of the primary facet
  • More smoothly: keep users’ mental map while the data changes
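
A minimal sketch of a stabilized layout objective of the kind this slide describes: a stress term that pulls each pair of nodes toward a target distance, weighted by the inverse squared distance, plus a term that penalizes moving away from the previous positions. The names `stress` and `step`, the weight `alpha`, the learning rate and the use of plain gradient descent are all assumptions made for illustration; the slide does not say which solver is used.

```python
import math


def stress(X, D, prev, alpha):
    """Weighted stress plus a penalty for drifting away from `prev`."""
    total = 0.0
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(X[i], X[j])
            total += (dist - D[i][j]) ** 2 / D[i][j] ** 2
        total += alpha * math.dist(X[i], prev[i]) ** 2
    return total


def step(X, D, prev, alpha, lr):
    """One gradient-descent step on `stress` (assumed solver)."""
    n = len(X)
    new = []
    for i in range(n):
        # Gradient of the stability term.
        gx = 2 * alpha * (X[i][0] - prev[i][0])
        gy = 2 * alpha * (X[i][1] - prev[i][1])
        # Gradient of the stress term over every pair that contains node i.
        for j in range(n):
            if j == i:
                continue
            dist = math.dist(X[i], X[j]) or 1e-9  # guard against coincident points
            coef = 2 * (dist - D[i][j]) / (D[i][j] ** 2 * dist)
            gx += coef * (X[i][0] - X[j][0])
            gy += coef * (X[i][1] - X[j][1])
        new.append((X[i][0] - lr * gx, X[i][1] - lr * gy))
    return new


# Hypothetical data: three nodes whose target distances are all 2.
prev = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
D = [[0, 2, 2], [2, 0, 2], [2, 2, 0]]
X = list(prev)
before = stress(X, D, prev, alpha=0.1)
for _ in range(200):
    X = step(X, D, prev, alpha=0.1, lr=0.05)
after = stress(X, D, prev, alpha=0.1)
```

Raising `alpha` keeps nodes closer to where they were in the previous view, which trades layout quality for stability of the mental map.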

SLIDE 27

Layout

Kernel Density Estimation

RNN Cluster Layout
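
For readers unfamiliar with the kernel density estimation named on this slide, here is a minimal sketch with a Gaussian kernel over 2-D node positions. The function name `kde`, the bandwidth and the sample points are hypothetical, and the slide does not state which kernel the system uses.

```python
import math


def kde(points, query, bandwidth):
    """Gaussian kernel density estimate at `query` for 2-D `points`."""
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for p in points:
        sq = (p[0] - query[0]) ** 2 + (p[1] - query[1]) ** 2
        total += math.exp(-sq / (2 * bandwidth ** 2))
    return norm * total


# Hypothetical layout: three nodes close together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
dense = kde(points, (0.0, 0.0), bandwidth=0.5)
sparse = kde(points, (2.5, 2.5), bandwidth=0.5)
```

The estimate is high inside the small cluster and close to zero between clusters, which is the property a density-based cluster outline relies on.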

SLIDE 28

Layout

Link Layout (1)

Layout external relations

Steps: swapping, rotating

SLIDE 29

Layout

Link Layout (2)

Steps: graph partition, edge bundling

SLIDE 30

Fever

SLIDE 31

Diabetes

SLIDE 32

HIV

SLIDE 33

Where are our patterns? What can we find?

HIV

SLIDE 34

(Q1) How to model the document contents into multifaceted relation data?

(Q2) How to visualize multifaceted information to reveal both global and local patterns?

(Q3) How to find the insight patterns visually driven by users’ interests?

Key Challenges

SLIDE 35

(Q3) How to find insights via user interactions?

A set of interactions is designed to address users’ interests: Keyword Query, Context Switch, Filtering, Highlighting

[Figure: context switch and filtering between the disease view and the symptom view]

SLIDE 36

Visual Patterns

  • Global cluster patterns
  • Local multifaceted relational patterns

– Co-occurrence pattern
– Outlier pattern

[Figure: symptoms of HIV (Fever, Headache, Fatigue, Shortness of Breath), annotated with a co-occurrence pattern and an outlier pattern]

SLIDE 37

(Q3) Interview of domain experts

What did domain experts (3 physicians) say?

  • “enhance the current thought process of physicians, and help create the subtle associations between different concepts.”
  • “this will be very helpful for nurses who run the self-care education activities to better engage patients.”
  • “this tool has great potential as an education tool for interns and residents who have just started their medical career”
  • “extremely creative and has great potential for clinical therapeutic usage and diagnosis decision support”

SLIDE 38

Summary

  • Problem: How to visualize relations of multifaceted document contents? Global / local patterns
  • Approach:
  • Result:
SLIDE 39

Nan Cao, Jimeng Sun, Yu-Ru Lin, David Gotz, Shixia Liu, Huamin Qu InfoVis 2010

FacetAtlas: Multifaceted Visualization for Rich Text Corpora

SLIDE 41

Related Work

Categories: Visualizing Local Relational Patterns; Visualizing Global Content Patterns; Search Interface

  • S. Havre, et al., InfoVis 2000
  • H. Strobelt, et al., InfoVis 2009
  • Tag Cloud
  • F. van Ham, et al., InfoVis 2009
  • G. Smith, TVCG 2006
  • Grokker
  • M. W. Christopher, et al., VAST 2009
  • F. van Ham, et al., InfoVis 2009
  • A. Pere, et al., InfoVis 2006

Our Focus: Extract complex relations from document contents by considering different aspects

SLIDE 42

Evaluations

SLIDE 43

User study

  • Participants

– 3 domain experts (2 physicians with 30 years of experience in the healthcare domain, and 1 young medical professional)
– 20 general users without a medical background (2 groups of 10)

  • 6 study tasks based on the Google Health online documents

– T4: identify the facet with the most cross-cluster connections.
– T6: identify the facet with the most overall connections across entities.

  • Baseline

– Enhanced traditional graph visualization
– Based on the same framework, with similar interactions, on the same dataset

SLIDE 44

Evaluation Results from non-experts

Result (based on a two-tailed t-test, with surveys)

  • Significant efficiency improvement in

– Visualizing the clusters
– Showing an overview of multiple connections across clusters
– Representing the details of multifaceted connections between entities

  • Slight improvement in

– Finding the most connected facet within a cluster

[Charts: completion time; task success rate]

SLIDE 45

(Q3) How to find insights via user interactions?

A set of interactions is designed to address users’ interests: Keyword Query, Context Switch, Zooming, Filtering, Highlighting

Query-driven interaction mechanism (between the interactive UI and the data indices):

  • Interaction interpretation: what data content needs to be fetched?
  • Query generator: how to fetch it? (SQL, Lucene query)
  • Result processor: fetch the data and convert the result into visual form

SLIDE 46

(Q3) How to find visual patterns driven by user interests?

Query-driven interaction mechanism, Context Switch example (disease view to symptom view): show all symptoms related to diabetes

  • Interaction interpretation: fetch symptoms of diabetes
  • Query generator (SQL): SELECT symptom FROM table WHERE disease = 'diabetes'
  • Result processor: convert the data into visual form (increased thirst, blurred vision)

data indices
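
The context-switch query can be tried end to end with a small in-memory database. This is only a sketch: the table name `relations`, its two columns, the helper `symptoms_of` and the sample rows are assumptions, and the slide does not describe the real schema or index layout.

```python
import sqlite3

# Hypothetical schema: one row per (disease, symptom) external relation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relations (disease TEXT, symptom TEXT)")
conn.executemany(
    "INSERT INTO relations VALUES (?, ?)",
    [
        ("diabetes", "increased thirst"),
        ("diabetes", "blurred vision"),
        ("HIV", "fever"),
    ],
)


def symptoms_of(conn, disease):
    """Query generator plus result processor for a disease-to-symptom switch."""
    rows = conn.execute(
        "SELECT symptom FROM relations WHERE disease = ? ORDER BY symptom",
        (disease,),
    )
    return [row[0] for row in rows]


# Entities to draw in the symptom view after switching context from "diabetes".
symptom_view = symptoms_of(conn, "diabetes")
```

The `?` placeholder passes the keyword as a bound parameter, which avoids building the SQL string from user input.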

SLIDE 47

Layout

Link Layout (2)

$$\min \sum_{k} \sum_{i} R_{ik}, \qquad R_{ik} = r_{ik}\, f_{ik} \sin\theta_{ik}$$

The rotating step tunes node and linkage orientations by minimizing the global tension based on a force model.

SLIDE 48

Local Patterns

  • Co-occurrence

– Entities have strong connections over multiple entities
– A semantic similarity metric defines what is “strong”

  • Outlier

– Entities have “strong” connections but are “far away” from each other
– Layout closeness defines what is “far away”
– “strong” and “far away”

  • Enhancement by colors

– Automatically adjust the saturation of the node color by pattern metrics

$$sim_{ij} = \sum_{k=1}^{M} sim_{k}(i, j)$$

$$c_{ij} = d_{ij} \cdot sim_{ij}$$

where $d_{ij}$ is the shortest path in the graph.
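
A minimal sketch of how the two pattern metrics could be computed. It assumes that the overall similarity of a pair is the sum of its per-facet similarities, that distance is the hop count of the shortest path, and that the outlier score is their product, so a pair that is both "strong" and "far away" scores high. The function names, the toy graph and the similarity values are hypothetical.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Breadth-first search hop count; None if `target` is unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


def total_similarity(per_facet, i, j):
    """Sum the similarity of the pair (i, j) over all facets."""
    return sum(sim.get(frozenset((i, j)), 0.0) for sim in per_facet)


def outlier_score(graph, per_facet, i, j):
    """Assumed score: graph distance times aggregated similarity."""
    hops = shortest_path_length(graph, i, j)
    if hops is None:
        return None
    return hops * total_similarity(per_facet, i, j)


# Hypothetical chain a - b - c - d with similarities from two facets.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
per_facet = [
    {frozenset(("a", "d")): 0.5, frozenset(("a", "b")): 0.5},
    {frozenset(("a", "d")): 0.25},
]
far_and_strong = outlier_score(graph, per_facet, "a", "d")
near_and_strong = outlier_score(graph, per_facet, "a", "b")
```

A score like this could then drive the node color saturation mentioned above, with higher scores mapped to more saturated colors.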