CS-5630 / CS-6630 Visualization for Data Science Graphs Alexander - - PowerPoint PPT Presentation


slide-1
SLIDE 1

CS-5630 / CS-6630 Visualization for 
 Data Science Graphs

Alexander Lex alex@sci.utah.edu

[xkcd]

slide-2
SLIDE 2

Graph Exercise

Nodes and Node Attributes

Author (# papers)

Carolina (6), Miriah (42), Alex (36), Sean (8), Marc (40), Nils (51), Silvia (110)

Links and Link Attributes

Co-author, co-author - # joint papers

  • Carolina, Alex - 2
  • Sean, Miriah - 7
  • Miriah, Alex - 2
  • Alex, Sean - 1
  • Alex, Nils - 10
  • Alex, Marc - 24
  • Marc, Silvia - 1
  • Marc, Nils - 8

slide-3
SLIDE 3

[Figure: node-link diagram of the co-author network; nodes labeled author (# papers), links labeled with the number of joint papers]

slide-4
SLIDE 4

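| | Carolina (6) | Miriah (42) | Alex (36) | Sean (8) | Marc (40) | Nils (51) | Silvia (110) |
|---|---|---|---|---|---|---|---|
| Carolina (6) | | | 2 | | | | |
| Miriah (42) | | | 2 | 7 | | | |
| Alex (36) | 2 | 2 | | 1 | 14 | 10 | |
| Sean (8) | | 7 | 1 | | | | |
| Marc (40) | | | 14 | | | 8 | 1 |
| Nils (51) | | | 10 | | 8 | | |
| Silvia (110) | | | | | 1 | | |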
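A minimal Python sketch of how the link list from the exercise becomes an adjacency matrix. It uses the links exactly as listed on slide 2; the names `AUTHORS`, `LINKS` and `adjacency_matrix` are illustrative, and 0 stands for "no link".

```python
# Co-author links from the exercise: (author, author, # joint papers).
AUTHORS = ["Carolina", "Miriah", "Alex", "Sean", "Marc", "Nils", "Silvia"]
LINKS = [
    ("Carolina", "Alex", 2), ("Sean", "Miriah", 7), ("Miriah", "Alex", 2),
    ("Alex", "Sean", 1), ("Alex", "Nils", 10), ("Alex", "Marc", 24),
    ("Marc", "Silvia", 1), ("Marc", "Nils", 8),
]


def adjacency_matrix(nodes, links):
    """Build a symmetric, weighted adjacency matrix (0 means no link)."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    m = [[0] * n for _ in range(n)]
    for a, b, weight in links:
        i, j = index[a], index[b]
        # Undirected graph: store the weight in both cells.
        m[i][j] = weight
        m[j][i] = weight
    return m


if __name__ == "__main__":
    matrix = adjacency_matrix(AUTHORS, LINKS)
    width = max(len(a) for a in AUTHORS)
    print(" " * width, *(a[:4].rjust(4) for a in AUTHORS))
    for name, row in zip(AUTHORS, matrix):
        print(name.ljust(width), *(str(v or ".").rjust(4) for v in row))
```

Because the co-author relation is symmetric, the matrix is mirrored along its diagonal; a directed graph would fill only cell `[i][j]` per edge.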

slide-5
SLIDE 5

Graphs

slide-6
SLIDE 6

Without graphs, there would be none of these:

Applications of Graphs

slide-7
SLIDE 7

www.itechnews.net

slide-8
SLIDE 8

Biological Networks

  • Interaction between genes, proteins and chemical products
  • The brain: connections between neurons
  • Your ancestry: the relations between you and your family
  • Phylogeny: the evolutionary relationships of life

[Beyer 2014]

slide-9
SLIDE 9

Michal 2000

slide-10
SLIDE 10

Graph Analysis Case Study

slide-11
SLIDE 11

Graph Theory Fundamentals

See also “Network Science”, Barabasi
 http://barabasi.com/networksciencebook/chapter/2

Network Tree Bipartite Graph Hypergraph

slide-12
SLIDE 12

Königsberg Bridge Problem (1736)

http://barabasi.com/networksciencebook/chapter/2#bridges

Leonhard Euler: 
 A walk that crosses every bridge exactly once is only possible in a connected graph with at most two nodes with an odd number of links. This graph has four nodes with an odd number of links, so no such walk exists.

Can you take a walk through the city that crosses every bridge exactly once?
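Euler's criterion can be checked mechanically. A minimal sketch, assuming the usual labeling of Königsberg's four land masses as A (the island), B, C and D, with the seven bridges stored as a multigraph edge list; `has_euler_walk` is a hypothetical helper name.

```python
from collections import defaultdict


def has_euler_walk(edges):
    """Euler's criterion: a walk using every edge exactly once exists iff the
    graph (ignoring isolated nodes) is connected and has 0 or 2 odd-degree nodes."""
    degree = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj[u].add(v)
        adj[v].add(u)
    if not degree:
        return True
    # Connectivity check with an iterative depth-first search.
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if len(seen) != len(degree):
        return False
    odd = sum(1 for d in degree.values() if d % 2 == 1)
    return odd in (0, 2)


# Königsberg: A is the island; parallel bridges appear as repeated pairs.
KOENIGSBERG = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
               ("A", "D"), ("B", "D"), ("C", "D")]
```

Here the degrees are A = 5, B = 3, C = 3, D = 3, so all four land masses have odd degree and the function returns `False`. Removing one of the two A–B bridges leaves exactly two odd-degree nodes (C and D), and a walk becomes possible.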

slide-13
SLIDE 13

Graph Terms

A graph G(V,E) consists of a set of vertices V (also called nodes) and a set of edges E (also called links) connecting these vertices. The terms graph and network are often used interchangeably.

slide-14
SLIDE 14

Graph Term: Simple Graph

A simple graph G(V,E) is a graph which contains no multi-edges and no loops

Not a simple graph!
 → A general graph
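The two conditions of a simple graph translate directly into a check over an edge list. A minimal sketch (the function name `is_simple` and the edge-list input are assumptions for illustration); the `directed` flag anticipates the next slide, since in a digraph (A,B) and (B,A) are different edges.

```python
def is_simple(edges, directed=False):
    """Return True if the edge list contains no loops and no multi-edges."""
    seen = set()
    for u, v in edges:
        if u == v:  # loop: an edge from a vertex to itself
            return False
        # In an undirected graph, (u, v) and (v, u) are the same edge.
        key = (u, v) if directed else frozenset((u, v))
        if key in seen:  # multi-edge: the same pair connected twice
            return False
        seen.add(key)
    return True
```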

slide-15
SLIDE 15

Graph Term: Directed Graph

A directed graph (digraph) is a graph that distinguishes between the edges (A,B) and (B,A).

slide-16
SLIDE 16

Graph Terms: Hypergraph

A hypergraph is a graph
 with edges connecting
 any number of vertices.

Hypergraph Example

slide-17
SLIDE 17

Unconnected Graphs, Articulation Points

Unconnected graph
 Starting from some vertex, an edge traversal cannot reach every other vertex.

Articulation point
 A vertex which, if deleted from the graph, would break the graph up into multiple sub-graphs.

Unconnected Graph Articulation Point (red)
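Both definitions can be tested with a plain traversal. A minimal sketch using a brute-force approach for articulation points (delete each vertex and recount components, O(V·(V+E))); a linear-time DFS-based algorithm exists but is not shown. The helper names are illustrative.

```python
def build_adjacency(nodes, edges):
    """Undirected adjacency sets: node -> set of neighbors."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def connected_components(adj, removed=frozenset()):
    """Group the vertices reachable from each other by edge traversal,
    ignoring the vertices in `removed`."""
    seen, components = set(removed), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(comp)
    return components


def articulation_points(adj):
    """A vertex is an articulation point if deleting it increases the
    number of connected components (brute force)."""
    base = len(connected_components(adj))
    return [v for v in adj
            if len(connected_components(adj, frozenset([v]))) > base]
```

A graph is unconnected exactly when `connected_components` returns more than one group.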

slide-18
SLIDE 18

Biconnected, 
 Bipartite Graphs

Biconnected graph
 A graph without articulation points.

Bipartite graph
 The vertices can be partitioned into two independent sets, so that no edge connects two vertices of the same set.

Biconnected Graph Bipartite Graph
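Whether a graph is bipartite can be decided by 2-coloring it with a breadth-first search: if an edge ever joins two vertices of the same color, the graph contains an odd cycle and no such partition exists. A minimal sketch; the name `bipartition` and the adjacency-dict input are assumptions.

```python
from collections import deque


def bipartition(adj):
    """Return (left, right) vertex sets, or None if the graph is not bipartite.
    adj maps each vertex to an iterable of its neighbors."""
    color = {}
    for start in adj:  # loop over starts so unconnected graphs work too
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return None  # edge inside one set: odd cycle
    left = {v for v, c in color.items() if c == 0}
    return left, set(adj) - left
```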

slide-19
SLIDE 19

Tree

A connected graph with no cycles. Or, recursively: a collection of nodes that contains a root node and 0 to n subtrees, where each subtree is connected to the root by an edge.

root

T1 T2 T3 Tn …
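An equivalent test that is easy to implement: a graph on |V| nodes is a tree iff it is connected and has exactly |V| - 1 edges. A minimal sketch, assuming an edge-list input and treating a graph without nodes as not a tree; `is_tree` is a hypothetical name.

```python
def is_tree(nodes, edges):
    """Connected and exactly |V| - 1 edges  <=>  connected and acyclic."""
    nodes = list(nodes)
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Depth-first traversal from an arbitrary node must reach every node.
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)
```

A graph with no cycles that is not connected (a forest) fails the test because it has fewer than |V| - 1 edges.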

slide-20
SLIDE 20

[Figure: two drawings of a tree with nodes A to I whose children appear in different orders]

Ordered Tree

slide-21
SLIDE 21

Different Kinds of Graphs

Network Tree Bipartite Graph Hypergraph

  • A. Brandstädt et al. 1999

Over 1000 different graph classes

slide-22
SLIDE 22

Degree

Node degree deg(x)
 The number of edges incident to this node. For directed graphs, the in-degree (indeg) and out-degree (outdeg) are considered separately.
 Average degree
 Degree distribution
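The degree measures from this slide, sketched over an edge list. Assumptions: undirected edges unless noted, and the degree distribution reported as the fraction of nodes with each degree k; the helper names are illustrative.

```python
from collections import Counter


def degrees(nodes, edges):
    """Undirected degree: the number of edges incident to each node."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def in_out_degrees(nodes, edges):
    """Directed graph: count incoming and outgoing edges separately."""
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for u, v in edges:  # edge u -> v
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg


def average_degree(nodes, edges):
    # Every undirected edge adds to two degrees, so the average is 2|E| / |V|.
    return 2 * len(edges) / len(nodes)


def degree_distribution(nodes, edges):
    """Fraction of nodes that have each degree k."""
    counts = Counter(degrees(nodes, edges).values())
    return {k: c / len(nodes) for k, c in sorted(counts.items())}
```

For a star with one center and three leaves, the center has degree 3, each leaf degree 1, the average degree is 6 / 4 = 1.5, and the distribution is {1: 0.75, 3: 0.25}.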

slide-23
SLIDE 23

Degree Distribution of a real Network

Protein Interaction Network

[Plot axes: percent of nodes vs. degree]

slide-24
SLIDE 24

Degrees

Degree is a measure of local importance

slide-25
SLIDE 25

Betweenness Centrality

A measure of how many shortest paths pass through a node; a good measure for the overall relevance of a node in a graph.
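One common way to compute betweenness centrality is Brandes' algorithm, which runs one breadth-first search per node and back-propagates path counts. A minimal sketch for unweighted, undirected graphs, assuming an adjacency-dict input; scores are unnormalized (each unordered pair of endpoints contributes at most 1).

```python
from collections import deque


def betweenness_centrality(adj):
    """For each node v: sum over pairs {s, t} (s != v != t) of the fraction
    of shortest s-t paths that pass through v."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        stack = []
        pred = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected: each pair {s, t} was counted once from s and once from t.
    return {v: c / 2 for v, c in bc.items()}
```

In a path a–b–c, node b lies on the only shortest path between a and c and scores 1; in a star, the center lies on all pairs of leaves. The contrast with degree is what the next slide compares.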

slide-26
SLIDE 26

Degree vs BC

slide-27
SLIDE 27

Paths & Distances

  • Path: a route along links
  • Path length: the number of links the path contains
  • Shortest path: connects nodes i and j with the smallest number of links
  • Diameter of graph G: the longest shortest path within G

[Figure: a path from 1 to 6; the two shortest paths from 1 to 7]
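In an unweighted graph, breadth-first search yields shortest path lengths, and the diameter is the largest of them over all sources. A minimal sketch, assuming a connected graph given as an adjacency dict.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """BFS: number of links on a shortest path from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter(adj):
    """The longest shortest path (assumes the graph is connected)."""
    return max(max(shortest_path_lengths(adj, s).values()) for s in adj)
```

Running one BFS per node costs O(V·(V+E)) in total, which is fine for the graph sizes in this lecture but becomes expensive for very large networks.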

slide-28
SLIDE 28

Graph and Tree Visualization

slide-29
SLIDE 29

Setting the Stage

[Diagram: Graph Data, Goal / Task, Visualization, Interaction, Graphical Representation]

How to decide which representation to use for which type of graph in order to achieve which kind of goal?

slide-30
SLIDE 30

Different Kinds of Tasks/Goals

Two principal types of tasks: attribute-based (ABT) and topology-based (TBT)
 Localize – find a single or multiple nodes/edges that fulfill a given property

  • ABT: Find the edge(s) with the maximum edge weight.
  • TBT: Find all adjacent nodes of a given node.


Quantify – count or estimate a numerical property of the graph

  • ABT: Give the number of all nodes.
  • TBT: Give the degree of a node.


Sort/Order – enumerate the nodes/edges according to a given criterion

  • ABT: Sort all edges according to their weight.
  • TBT: Traverse the graph starting from a given node.

list adapted from Schulz 2010

slide-31
SLIDE 31

Three Types of Graph Representations

Matrix, Explicit (Node-Link), Implicit

slide-32
SLIDE 32

Explicit Graph Representations

Node-link diagrams: vertex = point, edge = line/arc

[Figure: node-link diagram with nodes A to E]

Layout types: Free, Styled, Fixed

HJ Schulz 2006

slide-33
SLIDE 33

Criteria for Good 
 Node-Link Layout

  • Minimized edge crossings
  • Minimized distance of neighboring nodes
  • Minimized drawing area
  • Uniform edge length
  • Minimized edge bends
  • Maximized angular distance between different edges
  • Aspect ratio about 1 (not too long and not too wide)
  • Symmetry: similar graph structures should look similar

list adapted from Battista et al. 1999

slide-34
SLIDE 34

Conflicting Criteria

Schulz 2004

Minimum number of edge crossings vs. uniform edge length

Space utilization vs. symmetry

slide-35
SLIDE 35

Force Directed Layouts

Physics model: 
 edges = springs,
 vertices = repulsive magnets

Spring coil (pulling nodes together) and expander (pushing nodes apart)

slide-36
SLIDE 36

Algorithm

Place vertices in random locations
While not in equilibrium:
  • calculate the force on each vertex: the sum of the pairwise repulsion of all nodes and the attraction between connected nodes
  • move each vertex by c * (force on vertex)
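The loop above can be sketched as a basic spring embedder. Assumptions for illustration only: repulsion falls off as 1/d², springs follow Hooke's law around a rest length, a fixed number of iterations stands in for the equilibrium test, and each move is capped at `max_step` as a simple form of damping. All parameter names and defaults are hypothetical.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=500, c=0.01,
                          k_repulse=1.0, k_spring=1.0, rest_length=1.0,
                          max_step=0.1, seed=0):
    """Spring embedder: edges act as springs, all node pairs repel.
    Returns a dict node -> (x, y)."""
    nodes = list(nodes)
    rng = random.Random(seed)  # fixed seed only to make runs reproducible
    pos = {v: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for v in nodes}
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        # Pairwise repulsion of all nodes: O(n^2) work per iteration.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k_repulse / (d * d)
                force[u][0] += f * dx / d
                force[u][1] += f * dy / d
                force[v][0] -= f * dx / d
                force[v][1] -= f * dy / d
        # Attraction between connected nodes (Hooke's law around rest_length).
        for u, v in edges:
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            d = math.hypot(dx, dy) or 1e-9
            f = k_spring * (d - rest_length)
            force[u][0] += f * dx / d
            force[u][1] += f * dy / d
            force[v][0] -= f * dx / d
            force[v][1] -= f * dy / d
        # Move each vertex by c * force, capped at max_step (simple damping).
        for v in nodes:
            sx, sy = c * force[v][0], c * force[v][1]
            s = math.hypot(sx, sy)
            if s > max_step:
                sx, sy = sx * max_step / s, sy * max_step / s
            pos[v][0] += sx
            pos[v][1] += sy
    return {v: (p[0], p[1]) for v, p in pos.items()}
```

With a random start, different seeds give different layouts, which is the non-determinism mentioned on the next slide. The nested repulsion loop is the n² term per step.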

slide-37
SLIDE 37

Properties

  • Generally good layout
  • Uniform edge length
  • Clusters commonly visible
  • Not deterministic
  • Computationally expensive: O(n³), since each step costs O(n²) and it takes about n cycles to reach equilibrium
  • Interactive limit: ~1000 nodes
  • In practice: damping, center of gravity

http://bl.ocks.org/steveharoz/8c3e2524079a8c440df60c1ab72b5d03

slide-38
SLIDE 38

[van Ham et al. 2009]

Giant Hairball

slide-39
SLIDE 39

Address Computational Scalability: Multilevel Approaches

[Schulz 2004]

[Figure legend: real vertex, virtual vertex, internal spring, external spring, virtual spring; Metanode A, Metanode B, Metanode C]

slide-40
SLIDE 40

Abstraction/Aggregation

[Figure panels: 750 nodes, 30k nodes, 18 nodes, 90 nodes]

cytoscape.org

slide-41
SLIDE 41

Collapsible Force Layout

Supernodes: aggregates of nodes, formed by manual or algorithmic clustering
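Once each node has been assigned to a cluster, collapsing clusters into supernodes is a simple counting step. A minimal sketch, assuming the node-to-cluster mapping is already given (manually or by some clustering algorithm) and that cluster labels are sortable; `aggregate` is a hypothetical helper.

```python
from collections import Counter


def aggregate(edges, cluster_of):
    """Collapse nodes into supernodes.
    Returns (supernode sizes, meta-edges weighted by the number of original
    edges between two supernodes). Edges inside a cluster are hidden."""
    sizes = Counter(cluster_of.values())
    meta = Counter()
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:
            meta[tuple(sorted((cu, cv)))] += 1
    return dict(sizes), dict(meta)
```

Expanding a supernode in a collapsible layout then means re-inserting its member nodes and the original edges behind each meta-edge.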

slide-42
SLIDE 42

HOLA: Human-like Orthogonal Layout

Study how humans lay out a graph, then try to emulate that layout

Left: human, middle: conventional algo, right new algo

[Kieffer et al, InfoVis 2015]

slide-43
SLIDE 43
slide-44
SLIDE 44

Graphs in 3D

Why, why not visualize graphs in 3D? Why, why not use AR/VR?

https://twitter.com/alexsigaras/status/860560655031685121

slide-45
SLIDE 45

Styled / Restricted Layouts

Circular layout: node ordering and edge clutter

  • ca. 3% of all possible edges
  • ca. 6.3% of all possible edges
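Why node ordering matters on a circle can be made concrete by counting crossings: two straight chords (a, b) and (c, d) with four distinct endpoints cross iff exactly one of c, d lies strictly between a and b along the circle. A minimal sketch; the function names are illustrative.

```python
import math


def circular_positions(order, radius=1.0):
    """Place nodes evenly on a circle in the given order."""
    n = len(order)
    return {v: (radius * math.cos(2 * math.pi * i / n),
                radius * math.sin(2 * math.pi * i / n))
            for i, v in enumerate(order)}


def count_crossings(order, edges):
    """Number of crossing chord pairs for a circular layout with this order."""
    pos = {v: i for i, v in enumerate(order)}
    crossings = 0
    for i in range(len(edges)):
        a, b = sorted((pos[edges[i][0]], pos[edges[i][1]]))
        for j in range(i + 1, len(edges)):
            c, d = pos[edges[j][0]], pos[edges[j][1]]
            if len({a, b, c, d}) < 4:
                continue  # chords sharing an endpoint do not cross
            if (a < c < b) != (a < d < b):
                crossings += 1
    return crossings
```

The same 4-cycle drawn in the order 0, 1, 2, 3 has no crossings, while the order 0, 2, 1, 3 produces one.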
slide-46
SLIDE 46

Reduce Clutter: Edge Bundling

Holten et al. 2006

slide-47
SLIDE 47

Hierarchical Edge Bundling

Bundling Strength

Holten et al. 2006
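In hierarchical edge bundling, an edge between two leaves is drawn as a spline whose control points come from the path between them in the hierarchy; the bundling strength β straightens that control polygon towards the direct line. A minimal sketch of only the straightening step, P'ᵢ = β·Pᵢ + (1 − β)·(P₀ + i/(N−1)·(P_{N−1} − P₀)), assuming the control points are already given; the spline rendering itself is not shown.

```python
def straighten(control_points, beta):
    """Bundling strength beta in [0, 1]: 1 keeps the control points on the
    hierarchy path (tight bundles); 0 moves them onto the straight line
    between the endpoints (no bundling)."""
    n = len(control_points)
    if n < 2:
        return list(control_points)
    (x0, y0), (xn, yn) = control_points[0], control_points[-1]
    result = []
    for i, (x, y) in enumerate(control_points):
        t = i / (n - 1)
        # Point at fraction t along the straight line from first to last point.
        lx, ly = x0 + t * (xn - x0), y0 + t * (yn - y0)
        result.append((beta * x + (1 - beta) * lx,
                       beta * y + (1 - beta) * ly))
    return result
```

The endpoints never move, so only the route of the edge changes as β varies.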

slide-48
SLIDE 48

Bundling Strength

Michael Bostock

mbostock.github.com/d3/talk/20111116/bundle.html

slide-49
SLIDE 49

Fixed Layouts

Node positions cannot be varied, so edge routing becomes important.

slide-50
SLIDE 50

Aggregation

https://www.youtube.com/watch?v=E1PVTitj7h0

slide-51
SLIDE 51

Explicit Tree Visualization

Reingold–Tilford layout

http://billmill.org/pymag-trees/
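For comparison, a much simpler layered tree layout (not Reingold–Tilford): leaves get consecutive x slots in depth-first order, each parent is centered over its first and last child, and y is the depth. Unlike Reingold–Tilford, it does not pack subtrees together, so the drawing is as wide as the number of leaves. The `children` mapping is an assumed input format, and the recursion is limited by Python's recursion depth.

```python
def layered_tree_layout(children, root):
    """children maps a node to its ordered list of children.
    Returns node -> (x, depth)."""
    pos = {}
    next_x = 0

    def place(node, depth):
        nonlocal next_x
        kids = children.get(node, [])
        for kid in kids:
            place(kid, depth + 1)
        if kids:
            # Center the parent over its first and last child.
            x = (pos[kids[0]][0] + pos[kids[-1]][0]) / 2
        else:
            x = next_x  # leaves take the next free slot
            next_x += 1
        pos[node] = (x, depth)

    place(root, 0)
    return pos
```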

slide-52
SLIDE 52

Manipulating Aggregation Levels

First interactive tree manipulation

Douglas Engelbart 1968 - http://www.1968demo.org

(a) Drill-Down (b) Roll-Up (a) Unbalanced Drill-Down

“The mother of all demos”: https://www.youtube.com/watch?v=yJDv-zdhzMY

slide-53
SLIDE 53

Tree Interaction, Tree Comparison

slide-54
SLIDE 54

Explicit Representations

Pros:

  • is able to depict all graph classes
  • can be customized by weighing the layout constraints
  • very well suited for TBTs, if a suitable layout is also chosen

Cons:

  • computing an optimal graph layout is NP-hard (even just minimizing edge crossings is already NP-hard)
  • even heuristics are still slow/complex (e.g., the naïve spring embedder is in O(n³))
  • tends to clutter (edge clutter, “hairball”)

slide-55
SLIDE 55

Design Critique

slide-56
SLIDE 56

Connected China

http://china.fathom.info/ https://goo.gl/YXkWYX

slide-57
SLIDE 57

Multivariate Graphs

slide-58
SLIDE 58

Networks and Attributes


  • Attributes can influence topology: a path can be slow or blocked
  • The best route when driving depends on traffic
  • A biological network depends on many factors

slide-59
SLIDE 59

Challenge: Data Scale & Heterogeneity

  • Large number of values: large datasets have more than 500 experiments
  • Multiple groups/conditions
  • Different types of data

slide-60
SLIDE 60

Challenge: Supporting Multiple Tasks

Two central tasks:

  • Explore the topology of the network
  • Explore the attributes of the nodes (experimental data)

Need to support both!

slide-61
SLIDE 61

Many Node Attributes

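[Figure: Pathway A, with nodes A to G]

| Node | Sample 1 | Sample 2 | Sample 3 | … |
|---|---|---|---|---|
| A | 0.55 | 0.12 | 0.33 | … |
| B | 0.95 | 0.42 | 0.65 | … |
| C | 0.83 | 0.16 | 0.38 | … |
| … | … | … | … | … |

| Node | Sample 1 | Sample 2 | Sample 3 | … |
|---|---|---|---|---|
| A | low | normal | high | … |
| B | low | low | very low | … |
| C | very high | high | normal | … |
| … | … | … | … | … |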

How to visualize attribute data on networks?

slide-62
SLIDE 62

Good Old Color Coding

[Figure: pathway with nodes A to F, color-coded by their numerical attribute values]

[Lindroos2002]

slide-63
SLIDE 63

Node Attributes

Coloring, glyphs: limited in scalability
slide-64
SLIDE 64

Small Multiples

Cerebral [Barsky, 2008]: each dimension in its own window
slide-65
SLIDE 65

Data-driven node positioning

GraphDice: nodes are laid out according to attribute values

[Bezerianos et al, 2010]

slide-66
SLIDE 66

Path Extraction: enRoute

[Figure: Pathway View with nodes A to F, and enRoute View showing the extracted path alongside Group 1 / Dataset 1, Group 2 / Dataset 1, Group 1 / Dataset 2, and a non-genetic dataset]

slide-67
SLIDE 67

enRoute

slide-68
SLIDE 68

Video

slide-69
SLIDE 69
slide-70
SLIDE 70

Case Study: CCLE Data


slide-71
SLIDE 71
slide-72
SLIDE 72

Pathfinder: 
 Visual Analysis of Paths in Graphs

[EuroVis ’16] Honorable Mention Award

slide-73
SLIDE 73

Intelligence Data: How are two suspects connected?

slide-74
SLIDE 74

Intelligence Data: How are two suspects connected?

slide-75
SLIDE 75

Biological Network: How do two genes interact?

slide-76
SLIDE 76

Coauthor Network: How is HP Pfister connected to Ben Shneiderman?

Photo by John Consoli

slide-77
SLIDE 77

Pathfinder

Visual Analysis of Paths 
 in Large Multivariate Graphs

slide-78
SLIDE 78

Pathfinder Approach

Query for paths

slide-79
SLIDE 79

Pathfinder Approach

Show only the query result… as a node-link diagram

slide-80
SLIDE 80

Pathfinder Approach

… and as a ranked list, ordered by path score (1., 2., …)

Update the ranking to identify important paths

slide-81
SLIDE 81

Pathfinder Approach

Ranked list ordered by path score (1., 2., …)

Update the ranking to identify important paths
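The query-then-rank idea can be sketched in a few lines: enumerate the simple paths between two nodes up to a maximum length, then sort them by a score. Pathfinder's own query language and scoring functions are not reproduced here; the score used in the test (mean of a hypothetical numerical node attribute) is an assumption for illustration only.

```python
def simple_paths(adj, source, target, max_length):
    """Depth-first enumeration of simple paths with at most max_length links."""
    paths, path = [], [source]

    def extend(node):
        if node == target:
            paths.append(list(path))
            return
        if len(path) - 1 == max_length:
            return
        for w in adj[node]:
            if w not in path:  # simple path: never revisit a node
                path.append(w)
                extend(w)
                path.pop()

    extend(source)
    return paths


def rank_paths(paths, score):
    """Sort paths by descending score; ties are broken by fewer links."""
    return sorted(paths, key=lambda p: (-score(p), len(p)))
```

The number of simple paths can grow exponentially with `max_length`, which is one reason to show only the query result rather than the whole graph.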

slide-82
SLIDE 82

Query Interface

slide-83
SLIDE 83

Path Representation

Numerical attributes, sets

slide-84
SLIDE 84

Pathways Grouped Copy Number and Gene Expression Data

slide-85
SLIDE 85

Visualizing Edge Attributes

Most common ways to encode edge attributes:
  • Quantitative: width
  • Ordinal: saturation
  • Nominal: style

slide-86
SLIDE 86

Visualizing Edge Attributes

In practice, very limited. Example: Sashimi plots

slide-87
SLIDE 87

What’s the Problem?

[Figure: edges labeled with the values 10, 8, 15 and 7]

slide-88
SLIDE 88

Junction View

slide-89
SLIDE 89

Junction View - Group Comparison

slide-90
SLIDE 90

Junction View - Group Comparison

slide-91
SLIDE 91

Junction View - Group Comparison

slide-92
SLIDE 92

Case Study: Leukemia vs Glioblastoma

[Figure annotations: average expression for exon 4; exon 4, exon 8; (p1), (p2), (p3)]

slide-93
SLIDE 93

Matrix Representations

slide-94
SLIDE 94

Matrix Representations

Instead of a node-link diagram, use an adjacency matrix

[Figure: node-link diagram with nodes A to E and the corresponding adjacency matrix]

slide-95
SLIDE 95

Matrix Representations

Examples:

HJ Schulz 2007

slide-96
SLIDE 96

Matrix Representations

Well suited for 
 neighborhood-related TBTs

van Ham et al. 2009 Shen et al. 2007

Not suited for 
 path-related TBTs

slide-97
SLIDE 97

McGuffin 2012

slide-98
SLIDE 98

Order Critical!
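Matrix ordering can be changed by permuting rows and columns together. One classic heuristic among many is a Cuthill–McKee-style breadth-first ordering, which tends to pull non-zero cells towards the diagonal (small bandwidth). A minimal sketch; the helper names and the choice of heuristic are assumptions, not the method used in the figures.

```python
from collections import deque


def bandwidth(order, edges):
    """Largest distance of a non-zero cell from the matrix diagonal."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u, v in edges), default=0)


def bfs_order(adj):
    """BFS from a minimum-degree node, visiting neighbors by increasing
    degree; repeated for every connected component."""
    order, seen = [], set()
    for start in sorted(adj, key=lambda v: len(adj[v])):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in sorted(adj[u], key=lambda v: len(adj[v])):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return order


def reorder(matrix, nodes, order):
    """Permute rows and columns of an adjacency matrix to the new order."""
    idx = [nodes.index(v) for v in order]
    return [[matrix[i][j] for j in idx] for i in idx]
```

For a path graph listed in a scrambled order, the BFS ordering turns a scattered matrix into a single band next to the diagonal.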

slide-99
SLIDE 99

Matrix Representations

Pros:

  • can represent all graph classes except for hypergraphs
  • puts focus on the edge set, not so much on the node set
  • simple grid -> no elaborate layout or rendering needed
  • well suited for ABTs on edges via coloring of the matrix cells
  • well suited for neighborhood-related TBTs via traversing rows/columns

Cons:

  • quadratic screen space requirement (any possible edge takes up space)
  • not suited for path-related TBTs

slide-100
SLIDE 100

Special Case: Genealogy

slide-101
SLIDE 101

Hybrid Explicit/Matrix

NodeTrix
 [Henry et al. 2007]

slide-102
SLIDE 102

Matrix Representations

Problem #1: the used screen real estate is quadratic in the number of nodes
Solution approach: hierarchization of the representation

[van Ham et al. 2009]

slide-103
SLIDE 103

Trees

Tree-Exercise

slide-104
SLIDE 104

Tree Exercise

Here is part of a directory structure used for the material for this class, together with the file sizes:

datavis-17/ lectures/ Intro.key (110 MB) perception/ Perception.key (113 MB) Blindness.mov (15 MB) Data.key (12 MB) Graphs.key (180 MB) exams/ Exam1-solution.doc (5 MB) Exam1.doc (1 MB) exercise/ Graph.doc (3 MB) Graph-video.doc (210 MB)

Sketch two different visualizations that show both the directory structure and the sizes of the directories and of the files they contain.

slide-105
SLIDE 105

Implicit Layouts for Trees

slide-106
SLIDE 106

Tree Maps

Johnson and Shneiderman 1991
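A minimal sketch of the slice-and-dice idea behind the original treemap: each node's rectangle is split among its children in proportion to their total sizes, alternating vertical and horizontal cuts per level. The nested-dict tree format is an assumption; the sample data in the test uses the exams/ and exercise/ folders from the tree exercise.

```python
def total_size(node):
    """Size of a leaf, or the summed size of all leaves below an inner node."""
    children = node.get("children", [])
    return node["size"] if not children else sum(total_size(c) for c in children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return a list of (name, depth, x, y, width, height) rectangles."""
    if out is None:
        out = []
    out.append((node["name"], depth, x, y, w, h))
    total = total_size(node)
    offset = 0.0
    for child in node.get("children", []):
        share = total_size(child) / total if total else 0.0
        if depth % 2 == 0:  # even levels: cut along x
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:               # odd levels: cut along y
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out
```

Each leaf's area is proportional to its size. With very unequal sizes, this scheme produces the thin slices that squarified treemaps were designed to avoid.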

slide-107
SLIDE 107

Squarified Treemaps

The original algorithm led to thin slices. Squarified treemaps [Bruls, Huizing, Van Wijk 2000]

Before After

slide-108
SLIDE 108

Seeing Tree Structure

Unframed Framed

slide-109
SLIDE 109

Zoomable Treemap

slide-110
SLIDE 110

Software

Mac: GrandPerspective Windows: Sequoia View

slide-111
SLIDE 111

Example: Interactive TreeMap of a Million Items

Fekete et al. 2002

slide-112
SLIDE 112

Sunburst: Radial Layout

[Sunburst by John Stasko, Implementation in Caleydo by Christian Partl]

slide-113
SLIDE 113

Differences? Pros, Cons?

slide-114
SLIDE 114

Implicit Representations

Pros:

  • space-efficient, since edges are not drawn explicitly: scales well, up to very large graphs
  • in most cases well suited for ABTs on the node set
  • depending on the spatial encoding, also useful for TBTs

Cons:

  • can only represent trees
  • since the node positions are used to represent edges, they can no longer be freely arranged (e.g., to reflect geographical positions)
  • useless for any task on the edges
  • spatial relations such as overlap or inclusion lead to occlusion

slide-115
SLIDE 115

Tree Visualization Reference

slide-116
SLIDE 116

Graph Tools & Applications

slide-117
SLIDE 117

Gephi

http://gephi.org

slide-118
SLIDE 118

Cytoscape

Open source platform for complex network analysis

http://www.cytoscape.org/

slide-119
SLIDE 119

Cytoscape Web


http://cytoscapeweb.cytoscape.org/

slide-120
SLIDE 120

NetworkX


https://networkx.github.io/