Visual Analysis of High-Dimensional Event Sequence Data via Dynamic Hierarchical Aggregation

Visual Analysis of High-Dimensional Event Sequence Data via Dynamic Hierarchical Aggregation David Gotz, Jonathan Zhang, Wenyuan Wang, Joshua Shrestha, David Borland University of North Carolina at Chapel Hill IEEE Transactions on Visualization


SLIDE 1

Visual Analysis of High-Dimensional Event Sequence Data via Dynamic Hierarchical Aggregation

David Gotz, Jonathan Zhang, Wenyuan Wang, Joshua Shrestha, David Borland

University of North Carolina at Chapel Hill
IEEE Transactions on Visualization and Computer Graphics, 2019
CPSC 547 | Kevin Chow

SLIDE 2

Event Sequences

  • Time-ordered lists of discrete events
  • Analyze to discover patterns or rare event paths
  • But… real-world datasets are large and complex:
      • Volume and length of event sequences
      • High-dimensional event data

SLIDE 3

Volume and length of event sequences

SLIDE 4

Volume and length of event sequences

Aggregate sequences

SLIDE 5

  • Volume and length of event sequences: aggregate sequences
  • High-dimensional event data

SLIDE 6

  • Volume and length of event sequences: aggregate sequences
  • High-dimensional event data: group events

SLIDE 7

Grouping Events

  • Typically, events are grouped in a pre-processing step
  • Requires foreknowledge and expertise about events

ICD-10 Coding System

  • I50: Heart Failure
      • I50.2: Systolic Heart Failure
          • I50.21: Acute Systolic Heart Failure

……

Event type hierarchy
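
A prefix-based hierarchy like the ICD-10 example can be sketched in a few lines of Python. This is only an illustration: the helper names `icd10_ancestors` and `group_events` and the sample codes are assumptions made here, and truncating a code by prefix approximates the hierarchy rather than reproducing the official ICD-10 structure (which also has chapters and blocks above the three-character categories).

```python
from collections import Counter
from typing import Dict, List


def icd10_ancestors(code: str) -> List[str]:
    """Return the chain from the 3-character category down to the code itself."""
    category, _, detail = code.partition(".")
    levels = [category]
    # Each extra character after the dot adds one more level of detail.
    for i in range(1, len(detail) + 1):
        levels.append(f"{category}.{detail[:i]}")
    return levels


def group_events(codes: List[str], depth: int) -> Dict[str, int]:
    """Count events after rolling each code up to at most `depth` levels."""
    counts: Counter = Counter()
    for code in codes:
        chain = icd10_ancestors(code)
        # Codes shallower than `depth` keep their own (most detailed) level.
        counts[chain[min(depth, len(chain)) - 1]] += 1
    return dict(counts)


# Illustrative event codes only; not data from the presentation.
sample_events = ["I50.21", "I50.22", "I50.9", "I10"]
coarse = group_events(sample_events, depth=1)
finer = group_events(sample_events, depth=2)
```

Changing `depth` regroups the same events at a different level of detail, which is the kind of regrouping a fixed pre-processing step does not allow.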

SLIDE 8

Grouping Events

  • Event groups fixed in pre-processing can't be changed interactively
  • Analysts may want multiple groupings at different levels of detail
  • An ideal grouping may not exist; it is data- and task-dependent

SLIDE 9

Cadence

Visual Analysis for Medical Event Sequences

SLIDE 10

Cadence

Visual Analysis for Medical Event Sequences

SLIDE 11

Cadence

Visual Analysis for Medical Event Sequences

SLIDE 12

Cadence

Visual Analysis for Medical Event Sequences

SLIDE 13

Dynamic Hierarchical Aggregation

SLIDE 14

Dynamic Hierarchical Aggregation

  • 1. Determining an optimal and adjustable level of grouping events based on an informativeness score

SLIDE 15

Dynamic Hierarchical Aggregation

  • 1. Determining an optimal and adjustable level of grouping events based on an informativeness score
  • 2. Supporting navigation of the event type hierarchy with a scatter-plus-focus visualization

SLIDE 16

Dynamic Hierarchical Aggregation

  • 1. Determining an optimal and adjustable level of grouping events based on an informativeness score
  • 2. Supporting navigation of the event type hierarchy with a scatter-plus-focus visualization
  • 3. Scenting to enable discovery of interesting event types

SLIDE 17

Informativeness Score

  • Computed for each event type j in the event type hierarchy
  • Measures the strength of the association between an event type and the outcome
  • If this patient had outcome v, did they also experience event type j?

  • Based on the chi-square test statistic
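
A score of this kind can be sketched as a chi-square statistic on a 2×2 table (event type present or absent, outcome v or not). The slides do not give the exact formula, so the code below is an assumption: it uses the plain Pearson chi-square statistic without continuity correction, and the names `chi_square_2x2` and `informativeness` and the patient representation are hypothetical.

```python
from typing import Hashable, List, Set, Tuple

# Hypothetical patient record: (set of event types experienced, outcome label).
Patient = Tuple[Set[str], Hashable]


def chi_square_2x2(both: int, event_only: int, outcome_only: int, neither: int) -> float:
    """Pearson chi-square statistic for a 2x2 contingency table (no correction)."""
    n = both + event_only + outcome_only + neither
    with_event = both + event_only
    without_event = outcome_only + neither
    with_outcome = both + outcome_only
    without_outcome = event_only + neither
    denominator = with_event * without_event * with_outcome * without_outcome
    if denominator == 0:
        # A zero marginal means no association can be measured.
        return 0.0
    return n * (both * neither - event_only * outcome_only) ** 2 / denominator


def informativeness(patients: List[Patient], event_type: str, outcome: Hashable) -> float:
    """Association between having `event_type` and having `outcome`."""
    both = event_only = outcome_only = neither = 0
    for events, patient_outcome in patients:
        has_event = event_type in events
        has_outcome = patient_outcome == outcome
        if has_event and has_outcome:
            both += 1
        elif has_event:
            event_only += 1
        elif has_outcome:
            outcome_only += 1
        else:
            neither += 1
    return chi_square_2x2(both, event_only, outcome_only, neither)
```

A larger value indicates a stronger association between the event type and the outcome; a table whose rows are proportional (no association) scores 0.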

SLIDE 18

Algorithm: Optimal Grouping Level

  • Goal: Determine the most informative cut through the event type hierarchy
  • Recursively traverse the event type hierarchy
  • Compare the informativeness score of the parent with that of each child

SLIDE 19

Algorithm: Optimal Grouping Level

For each event type j:

$$R_j = \frac{\text{\# of children more informative than parent}}{\text{total \# of children}}$$

Add j to the cut if (else, recurse):

  • 1. No more children (leaf)
  • 2. $R_j \le R$, where $0 \le R \le 1$

R controls level of aggregation (larger = more aggregation)
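
The rule on this slide (add j to the cut if it is a leaf or if R_j ≤ R, otherwise recurse into its children) can be sketched as a short recursive function. The `EventType` class, its field names, and the toy scores below are assumptions made for illustration; they are not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventType:
    """Hypothetical node in the event type hierarchy."""
    name: str
    score: float  # informativeness score of this event type
    children: List["EventType"] = field(default_factory=list)


def informative_cut(node: EventType, R: float) -> List[EventType]:
    """Return the event types on the cut below (and including) `node`."""
    if not node.children:
        return [node]  # condition 1: leaf
    better = sum(1 for child in node.children if child.score > node.score)
    r_j = better / len(node.children)
    if r_j <= R:
        return [node]  # condition 2: too few children beat the parent
    cut: List[EventType] = []
    for child in node.children:
        cut.extend(informative_cut(child, R))
    return cut


# Toy hierarchy with made-up scores.
tree = EventType("root", 5.0, [
    EventType("A", 8.0),
    EventType("B", 3.0, [EventType("B1", 2.0), EventType("B2", 1.0)]),
])
coarse_cut = [n.name for n in informative_cut(tree, R=0.5)]
fine_cut = [n.name for n in informative_cut(tree, R=0.4)]
```

In the toy tree, half of the root's children are more informative than the root, so R = 0.5 keeps the root as a single group while R = 0.4 splits it into A and B. B stays aggregated because neither of its children beats it.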

SLIDE 20

Scatter-plus-Focus

[Figure panels: Scatter plot; Focused dual-view]

SLIDE 21

Scatter-plus-Focus

  • Challenges of overplotting!
  • Grey hexes hint at the density of all possible event types
  • Marks are only the event types that are part of the informative cut
  • Control the cut with the slider for R

SLIDE 22

Scatter-plus-Focus

  • Focuses on the hierarchy of the selected event type
  • X-axis is centred on correlation
  • Y-axis: determined by an optimization-based layout algorithm

SLIDE 23

Algorithm: Optimize Layout

  • Cost function that balances two layout priorities:
      • Y-positions should be close to original in scatter view
      • Marks should not overlap
  • Two constraints:
      • Optimized y-positions must be within y-axis scale
      • Original y-position order of marks must be preserved
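
The slides do not specify the cost function or the optimizer, so the following is only a rough sketch of the trade-off. `layout_cost` is a hypothetical cost combining squared displacement with an overlap penalty, and `spread_marks` is a simple two-pass greedy heuristic swapped in for the optimizer: it treats non-overlap as a hard requirement, respects both constraints, and does not claim to find the minimum-cost layout. All names and the weight value are assumptions.

```python
from typing import List


def layout_cost(original: List[float], placed: List[float],
                mark_height: float, overlap_weight: float = 10.0) -> float:
    """Hypothetical cost: squared displacement plus weighted overlap of neighbours."""
    displacement = sum((p - o) ** 2 for p, o in zip(placed, original))
    overlap = sum(max(0.0, mark_height - (b - a))
                  for a, b in zip(placed, placed[1:]))
    return displacement + overlap_weight * overlap


def spread_marks(original: List[float], mark_height: float,
                 y_min: float, y_max: float) -> List[float]:
    """Greedy layout for y-positions given in ascending order.

    Keeps the original order, keeps every mark within [y_min, y_max],
    and separates neighbours by at least `mark_height`.
    """
    if (len(original) - 1) * mark_height > y_max - y_min:
        raise ValueError("marks cannot fit on the axis without overlapping")
    placed: List[float] = []
    # Forward pass: push each mark up just enough to clear the previous one.
    for y in original:
        lowest_allowed = y_min if not placed else placed[-1] + mark_height
        placed.append(max(y, lowest_allowed))
    # Backward pass: pull marks back down so nothing exceeds y_max.
    ceiling = y_max
    for i in range(len(placed) - 1, -1, -1):
        placed[i] = min(placed[i], ceiling)
        ceiling = placed[i] - mark_height
    return placed


crowded = [1.0, 1.2, 1.3, 9.0]
spread = spread_marks(crowded, mark_height=1.0, y_min=0.0, y_max=10.0)
```

For the crowded example, the three overlapping marks are moved to 1.0, 2.0 and 3.0 while the isolated mark at 9.0 stays put, which lowers `layout_cost` relative to leaving the positions unchanged.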

SLIDE 24

Algorithm: Optimize Layout

[Figure panels: No changes to y-positions; With algorithm]

SLIDE 25

Scenting

  • Shows up when exploring type hierarchy in focused view
  • Scent value: range of correlations to outcome in children
  • Size of glyph indicates magnitude of scent value
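
As a small illustration, the scent value can be read as the spread between the largest and smallest child correlation. That reading of "range" (maximum minus minimum) is an assumption, as are the function names and the linear glyph-size mapping below.

```python
from typing import List


def scent_value(child_correlations: List[float]) -> float:
    """Range of the children's correlations to the outcome (assumed: max - min)."""
    if not child_correlations:
        return 0.0  # a leaf has nothing hidden beneath it
    return max(child_correlations) - min(child_correlations)


def glyph_size(scent: float, max_scent: float, max_size: float = 12.0) -> float:
    """Hypothetical linear mapping from scent value to glyph size."""
    if max_scent <= 0:
        return 0.0
    return max_size * min(scent, max_scent) / max_scent


example_scent = scent_value([-0.2, 0.1, 0.5])
```

A parent whose children range widely in correlation gets a larger glyph, hinting that expanding it may reveal event types that behave differently from the aggregate.
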
SLIDE 26

Evaluation

  • 3 medical experts: health researchers with data analysis experience
  • Hands-on demonstration and semi-structured interviews
  • Results from thematic analysis:
      • Training is required
      • Automated selection of aggregation level useful
      • Navigating through event type hierarchy was intuitive

SLIDE 27

What-Why-How Analysis

What: Data

  • Tree (event type hierarchy)
  • Table (patient data)

What: Derived

  • Optimal event grouping
  • Informativeness score, scent value, optimized y-positions

Why

  • Discover and produce (event type groupings)

Scale: 5,000 patients, 700,000 events, 10,000 unique event types

SLIDE 28

What-Why-How Analysis

How: Encode

  • Scatterplots
  • Color (outcome correlation)

How: Reduce

  • Item aggregation (grouping event types)
  • Scenting (picking event type)

How: Change

  • Select (mark in scatter)

How: Facet

  • Overview+detail view (scatter-plus-focus)
  • Layering (grey hexes in background)

SLIDE 29

Critique

  • Strengths
      • Intuitive, simple algorithms
      • Dealt with challenges of occlusion and distortion
      • Switching between views and parameter control reduces load
      • Generalizable to contexts other than health

SLIDE 30

Critique

  • Weaknesses/Limitations
      • Automated approach to aggregation may hide better custom groupings
      • Adding event type groups can be tedious
      • Reliance on tree-based event type hierarchy

SLIDE 31

Thank You!

Visual Analysis of High-Dimensional Event Sequence Data via Dynamic Hierarchical Aggregation
