Network Analysis (Maneesh Agrawala, CS 448B: Visualization, Fall 2020)


Network Analysis

Maneesh Agrawala

CS 448B: Visualization Fall 2020

Last Time: Network Layout

Force-Directed Layout

Interactive Example: Configurable Force Layout

d3.force: 7,922 nodes, 11,881 edges

[Kai Chang]

Use the Force!

http://mbostock.github.io/d3/talk/20110921/

Force-Directed Layout

Nodes = charged particles

$F = q_i q_j / d_{ij}^2$

with air resistance

$F = -b \, v_i$

Edges = springs

$F = k \, (L - d_{ij})$

D3’s force layout uses velocity Verlet integration.
Assume uniform mass m and timestep Δt (taking m = 1 and Δt = 1): $F = ma \rightarrow F = a \rightarrow F = \Delta v / \Delta t \rightarrow F = \Delta v$
Forces simplify to velocity offsets! Repeatedly calculate forces, then update node positions.

Naïve approach: $O(N^2)$. Speed up to $O(N \log N)$ using a quadtree or k-d tree.
Numerical integration of forces at each time step.
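The force model above can be sketched as one simulation step in Python. This is a minimal sketch, not D3's implementation: it assumes unit mass and unit timestep (so each force is added straight to a node's velocity), uses one hypothetical constant `repulsion` in place of $q_i q_j$ for every pair, applies air resistance as a velocity decay of `b`, and uses the naive $O(N^2)$ pairwise loop. The function name `force_tick` and all default parameter values are illustrative assumptions.

```python
import math


def force_tick(pos, vel, edges, repulsion=100.0, k=0.1, rest_len=30.0, b=0.1):
    """One naive O(N^2) step of a force-directed layout (hypothetical parameters).

    pos, vel: lists of [x, y] per node; edges: list of (i, j) index pairs.
    Unit mass and timestep are assumed, so forces are added directly to velocity.
    """
    n = len(pos)
    # Charge repulsion between every pair of nodes: F = q_i * q_j / d^2.
    for i in range(n):
        for j in range(i + 1, n):
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            d = math.hypot(dx, dy) or 1e-6  # avoid dividing by zero
            f = repulsion / (d * d)
            ux, uy = dx / d, dy / d  # unit vector pointing from j to i
            vel[i][0] += f * ux
            vel[i][1] += f * uy
            vel[j][0] -= f * ux
            vel[j][1] -= f * uy
    # Springs along edges: F = k * (L - d); positive pushes apart, negative pulls together.
    for i, j in edges:
        dx = pos[i][0] - pos[j][0]
        dy = pos[i][1] - pos[j][1]
        d = math.hypot(dx, dy) or 1e-6
        f = k * (rest_len - d)
        ux, uy = dx / d, dy / d
        vel[i][0] += f * ux
        vel[i][1] += f * uy
        vel[j][0] -= f * ux
        vel[j][1] -= f * uy
    # Air resistance F = -b * v (a velocity decay), then move each node by its velocity.
    for i in range(n):
        vel[i][0] *= 1 - b
        vel[i][1] *= 1 - b
        pos[i][0] += vel[i][0]
        pos[i][1] += vel[i][1]


# Two nodes 10 units apart, joined by a spring whose rest length is 30.
pos = [[0.0, 0.0], [10.0, 0.0]]
vel = [[0.0, 0.0], [0.0, 0.0]]
force_tick(pos, vel, [(0, 1)])
print(pos)
```

Here both the repulsion and the compressed spring push the two nodes apart, so they separate symmetrically; calling `force_tick` repeatedly until velocities become small gives the iterative layout described on the slide.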

Naive calculation of forces at a point uses sum of forces from all other n-1 points.

For fast approximate calculation, we build a spatial index (here, a quadtree) and use it to compare with distant groups of points instead.

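The Barnes-Hut θ parameter controls when to compare with an aggregate center of charge: a quadtree cell is treated as a single aggregate charge when $w_{\text{quadnode}} / d_{ij} < \theta$, where $w_{\text{quadnode}}$ is the width of the cell and $d_{ij}$ is the distance to it.

θ = 0.5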

θ = 0.9 (default setting)

θ = 1.5

θ = 2.0
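The θ comparison can be sketched with a small quadtree in Python. This is a minimal Barnes-Hut sketch under stated assumptions, not D3's code: every node carries unit charge, the force is the inverse-square repulsion from the force-directed slides, and a cell is replaced by its center of charge only when it is an internal cell, does not contain the query point, and satisfies $w / d < \theta$. Leaves split until they hold one point (or become tiny, so coincident points still terminate). The names `Quad` and `force` are hypothetical.

```python
import math
import random


class Quad:
    """Quadtree cell over the square [x0, x0 + w) x [y0, y0 + w)."""

    def __init__(self, x0, y0, w):
        self.x0, self.y0, self.w = x0, y0, w
        self.q = 0.0               # total charge inside the cell
        self.cx = self.cy = 0.0    # center of charge
        self.points = []           # (x, y, q) tuples, only while the cell is a leaf
        self.kids = None           # four child cells once the cell is split

    def contains(self, x, y):
        return self.x0 <= x < self.x0 + self.w and self.y0 <= y < self.y0 + self.w

    def insert(self, x, y, q=1.0):
        # Update the aggregate charge and the charge-weighted center.
        total = self.q + q
        self.cx = (self.cx * self.q + x * q) / total
        self.cy = (self.cy * self.q + y * q) / total
        self.q = total
        if self.kids is not None:
            self._kid(x, y).insert(x, y, q)
            return
        self.points.append((x, y, q))
        # Split a crowded leaf, but stop at tiny cells so coincident points terminate.
        if len(self.points) > 1 and self.w > 1e-6:
            h = self.w / 2
            self.kids = [Quad(self.x0 + dx, self.y0 + dy, h) for dx in (0, h) for dy in (0, h)]
            for px, py, pq in self.points:
                self._kid(px, py).insert(px, py, pq)
            self.points = []

    def _kid(self, x, y):
        h = self.w / 2
        ix = 1 if x >= self.x0 + h else 0
        iy = 1 if y >= self.y0 + h else 0
        return self.kids[2 * ix + iy]


def force(cell, x, y, theta):
    """Inverse-square repulsion on a unit charge at (x, y) from the charges in cell."""
    if cell.q == 0:
        return 0.0, 0.0
    dx, dy = x - cell.cx, y - cell.cy
    d = math.hypot(dx, dy)
    if cell.kids is not None and not cell.contains(x, y) and d > 0 and cell.w / d < theta:
        # Far enough away: treat the whole cell as one charge at its center of charge.
        f = cell.q / (d * d)
        return f * dx / d, f * dy / d
    fx = fy = 0.0
    if cell.kids is None:
        # Leaf: sum its points exactly, skipping the query point itself.
        for px, py, pq in cell.points:
            ex, ey = x - px, y - py
            e = math.hypot(ex, ey)
            if e > 0:
                fx += pq * ex / e ** 3
                fy += pq * ey / e ** 3
        return fx, fy
    for kid in cell.kids:  # too close: open the cell and recurse into its children
        kx, ky = force(kid, x, y, theta)
        fx += kx
        fy += ky
    return fx, fy


rng = random.Random(1)
pts = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]
root = Quad(0.0, 0.0, 128.0)
for px, py in pts:
    root.insert(px, py)
print(force(root, *pts[0], theta=0.0))  # exact: no cell is ever aggregated
print(force(root, *pts[0], theta=0.9))  # approximate: distant cells are aggregated
```

With θ = 0 the test $w / d < \theta$ never passes, so the result equals the naive pairwise sum; raising θ aggregates more cells, trading accuracy for speed, which is the trade-off the θ = 0.5 to 2.0 panels illustrate.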

Alternative Layouts

Linear node layout; circular arcs show connections. Layout quality is sensitive to node ordering!
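One way to see why ordering matters is to score an ordering by the total span of its arcs, the sum of |position(u) − position(v)| over all edges: shorter spans mean smaller, less overlapping arcs. This scoring function and the example graph are illustrative assumptions, not a metric from the slide.

```python
def total_arc_span(order, edges):
    """Sum of |position(u) - position(v)| over edges for a given linear node order."""
    position = {node: i for i, node in enumerate(order)}
    return sum(abs(position[u] - position[v]) for u, v in edges)


edges = [("a", "b"), ("b", "c"), ("c", "d")]  # hypothetical path graph
print(total_arc_span(["a", "b", "c", "d"], edges))  # 3
print(total_arc_span(["a", "c", "b", "d"], edges))  # 5
```

Swapping just two nodes lengthens the arcs by two thirds for the same graph, which is why arc diagrams are usually paired with an ordering heuristic.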

The Shape of Song [Wattenberg ’01]

Limitations of Node-Link Layout

Edge-crossings and occlusion

Attribute-Driven Layout

Large node-link diagrams get messy!
Is there additional structure we can exploit?
Idea: Use data attributes to perform layout

  • e.g., scatter plot based on node values

Dynamic queries and/or brushing can be used to explore connectivity

Attribute-Driven Layout

The “Skitter” Layout

  • Internet Connectivity
  • Radial Scatterplot

Angle = Longitude

  • Geography

Radius = Degree

  • # of connections
  • (a statistic of the nodes)
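The radial scatterplot above amounts to a polar-to-Cartesian mapping. A minimal sketch, assuming the function name `skitter_xy` and a linear mapping from degree to radius as the slide states; real Skitter plots may use a different (for example logarithmic or inverted) radius mapping.

```python
import math


def skitter_xy(longitude_deg, degree, max_degree, max_radius=1.0):
    """Place a node in polar coordinates: angle = longitude, radius = degree (linear)."""
    angle = math.radians(longitude_deg)  # -180..180 degrees covers the full circle
    r = max_radius * degree / max_degree
    return r * math.cos(angle), r * math.sin(angle)


print(skitter_xy(0, 10, 10))  # highest-degree node at longitude 0: (1.0, 0.0)
print(skitter_xy(90, 5, 10))  # half the maximum degree, a quarter turn around
```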

Semantic Substrates [Shneiderman ’06]

Summary

Tree Layout

  • Indented / Node-Link / Enclosure / Layers
  • How to address issues of scale?
    • Filtering and Focus + Context techniques

Graph Layout

  • Tree layout over spanning tree
  • Hierarchical “Sugiyama” Layout
  • Optimization (Force-Directed Layout)
  • Attribute-Driven Layout

Announcements

Final project

Data analysis/explainer or conduct research

  • Data analysis: Analyze dataset in depth & make a visual explainer
  • Research: Pose problem, implement creative solution

Deliverables

  • Data analysis/explainer: Article with multiple interactive visualizations
  • Research: Implementation of solution and web-based demo if possible
  • Short video (2 min) demoing and explaining the project

Schedule

  • Project proposal: Thu 10/29
  • Design Review and Feedback: Tue 11/17 & Thu 11/19
  • Final code and video: Sat 11/21 11:59pm

Grading

  • Groups of up to 3 people, graded individually
  • Clearly report responsibilities of each member

Network Analysis

*Slides adapted from E. Adar’s / L. Adamic’s Network Theory and Applications course slides.

http://diseasome.eu/

Diseases

http://www.lx97.com/maps/

Transportation

Lombardi, M. ‘George W. Bush, Harken Energy and Jackson Stephens, ca 1979–90’

Actors and movies (bipartite)

Characterizing networks

What does it look like?

www.opte.org

Size? Density? Centralization? Clustering? Components? Cliques? Motifs? Avg. path length?

Topics

Network Analysis

  • Centrality / centralization
  • Community structure
  • Pattern identification
  • Models

Centrality

How far apart are things?

Distance: shortest paths

Shortest path (geodesic path)

  • The shortest sequence of links connecting two nodes
  • Not always unique
    • A and C are connected by 2 shortest paths:
      A – E – B – C
      A – E – D – C
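Shortest-path distances in an unweighted graph can be computed with breadth-first search, which can also count how many distinct shortest paths reach each node. A minimal sketch in Python; the function name `bfs_paths` is hypothetical, and the adjacency lists assume the A–E example's links are exactly those along the two listed paths.

```python
from collections import deque


def bfs_paths(adj, source):
    """Hop distance and number of shortest paths from source to each reachable node."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:           # first time v is reached
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u lies on a shortest path to v
                count[v] += count[u]
    return dist, count


adj = {"A": ["E"], "E": ["A", "B", "D"], "B": ["E", "C"], "D": ["E", "C"], "C": ["B", "D"]}
dist, count = bfs_paths(adj, "A")
print(dist["C"], count["C"])  # 3 2
```

Because BFS finishes every node at distance d − 1 before any node at distance d, each path count is complete by the time that node is dequeued.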

Distance: shortest paths

[Figure: example graph with nodes 1–7]

Shortest path from 2 to 3: 1

Distance: shortest paths

Shortest path from 2 to 3?

Most important node?

Centrality

[Figure: nodes X and Y compared under outdegree, indegree, betweenness, and closeness]

Degree centrality (undirected)

$C_D(i) = d(i) = \sum_{j} A_{ij}$

where $A$ is the adjacency matrix ($A_{ij} = 1$ if nodes i and j are linked, 0 otherwise).

Normalized degree centrality

$C'_D(i) = \frac{d(i)}{N-1}$
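Both degree formulas can be computed directly from an adjacency matrix. A minimal sketch; the function name and the 4-node star used as input are hypothetical.

```python
def degree_centrality(A):
    """Return raw degree d(i) and normalized degree d(i) / (N - 1) for each node.

    A is a symmetric 0/1 adjacency matrix given as a list of lists.
    """
    n = len(A)
    degree = [sum(row) for row in A]  # d(i) = sum_j A_ij
    normalized = [d / (n - 1) for d in degree]
    return degree, normalized


# Hypothetical example: a 4-node star whose center is node 0.
A = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(degree_centrality(A))
# ([3, 1, 1, 1], [1.0, 0.3333333333333333, 0.3333333333333333, 0.3333333333333333])
```

Normalizing by N − 1 makes a node linked to every other node score 1, regardless of network size.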

When is degree not sufficient?

Does not capture:

  • Ability to broker between groups
  • Likelihood that information originating anywhere in the network reaches you

Betweenness

Assuming nodes communicate using the most direct (shortest) route, how many pairs of nodes have to pass information through target node?

Betweenness - examples

non-normalized:

[Figure: example network with nodes A, B, C, D, E]

Betweenness: definition

$C_B(i) = \sum_{j<k;\; j,k \neq i} \frac{g_{jk}(i)}{g_{jk}}$

where $g_{jk}$ = the number of shortest paths connecting j and k, and $g_{jk}(i)$ = the number of those paths that node i is on.

Normalization:

$C'_B(i) = C_B(i) \, / \, [(N-1)(N-2)/2]$

where the denominator is the number of pairs of vertices excluding the vertex itself.
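Summing $g_{jk}(i)/g_{jk}$ over every pair directly is expensive; a standard way to compute it is Brandes' algorithm, which runs one breadth-first search per source node and accumulates the path fractions backwards from the farthest nodes. The sketch below is a minimal Python version for unweighted, undirected graphs; the function name and the dict-of-neighbor-lists input are assumptions. It halves the totals because each unordered pair j < k is counted once from each endpoint, then optionally divides by $(N-1)(N-2)/2$.

```python
from collections import deque


def betweenness(adj, normalized=True):
    """Brandes-style betweenness for an unweighted, undirected graph.

    adj maps each node to an iterable of its neighbors.
    """
    cb = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        order = []
        pred = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Walk back from the farthest nodes, accumulating g_jk(i) / g_jk fractions.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Every pair j < k was counted twice (once from each endpoint).
    result = {v: c / 2 for v, c in cb.items()}
    n = len(adj)
    if normalized and n > 2:
        pairs = (n - 1) * (n - 2) / 2
        result = {v: c / pairs for v, c in result.items()}
    return result


star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(betweenness(star))  # center 1.0, leaves 0.0
ae = {"A": ["E"], "E": ["A", "B", "D"], "B": ["E", "C"], "D": ["E", "C"], "C": ["B", "D"]}
print(betweenness(ae, normalized=False))  # E carries the most shortest paths
```

In the A–E graph, E lies on every shortest path out of A and on one of the two B–D paths, giving it a raw score of 3.5.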

When are $C_D$, $C_B$ not sufficient?

Do not capture:

  • Likelihood that information originating anywhere in the network reaches you

Closeness: definition

Being close to the center of the graph

Closeness centrality:

$C_C(i) = \left[ \sum_{j=1,\, j \neq i}^{N} d(i,j) \right]^{-1}$

Normalized closeness centrality:

$C'_C(i) = C_C(i) \cdot (N-1) = \frac{N-1}{\sum_{j=1,\, j \neq i}^{N} d(i,j)}$
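Closeness needs the distance from a node to every other node, which one breadth-first search provides. A minimal sketch, assuming a connected, unweighted graph given as neighbor lists; the function name and the 5-node star are hypothetical.

```python
from collections import deque


def closeness(adj, i):
    """Closeness and normalized closeness of node i in a connected, unweighted graph."""
    # Breadth-first search gives the hop distance d(i, j) to every node j.
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())  # sum over j != i, since d(i, i) = 0
    n = len(adj)
    return 1 / total, (n - 1) / total


star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(closeness(star, 0))  # (0.25, 1.0): the center is one hop from everyone
print(closeness(star, 1))  # a leaf is 1 hop from the center and 2 from the other leaves
```

The normalized form equals 1 exactly when a node is adjacent to all others, mirroring normalized degree.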

Examples - closeness

Centrality in directed networks

  • Prestige ~ indegree centrality
  • Betweenness ~ consider directed shortest paths
  • Closeness ~ consider nodes from which the target node can be reached
  • Influence range ~ nodes reachable from the target node
  • Straightforward modifications to the equations for undirected graphs

Characterizing nodes

  • Generally, different centrality metrics will be positively correlated
  • When they are not, there is likely something interesting about the network
  • Suggest possible topologies and node positions to fit each cell:

High degree
  • with low closeness: Node embedded in a cluster that is far from the rest of the network
  • with low betweenness: Node's connections are redundant; communication bypasses him/her

High closeness
  • with low degree: Node links to a small number of important/active other nodes
  • with low betweenness: Many paths are likely in the network; node is near many people, but so are many others

High betweenness
  • with low degree: Node’s few ties are crucial for network flow
  • with low closeness: Rare. Node monopolizes the ties from a small number of people to many others

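Centralization – how equal

Freeman’s general formula for centralization:

$C_D = \frac{\sum_{i=1}^{N} \left[ C_D(n^*) - C_D(i) \right]}{(N-1)(N-2)}$

The numerator is the variation in the centrality scores among the nodes, where $C_D(n^*)$ is the maximum value in the network; the denominator is the largest variation possible for (unnormalized) degree centrality, reached by a star network.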

Examples

$C_D = \frac{(5-5) + (5-1) \times 5}{(6-1)(6-2)} = \frac{20}{20} = 1$

(A star with 6 nodes: the center has degree 5 and each of the 5 leaves has degree 1.)

Examples

[Figure: three example networks, with $C_D$ = 0.167, 0.167, and 1.0]
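Freeman's formula for degree can be evaluated directly from raw degrees. A minimal sketch; the function name and the two 6-node example graphs (a star and a ring) are hypothetical.

```python
def degree_centralization(adj):
    """Freeman degree centralization using raw degrees (adj: node -> neighbors)."""
    degrees = [len(adj[v]) for v in adj]
    n = len(degrees)
    c_max = max(degrees)  # C_D(n*), the largest degree in the network
    return sum(c_max - d for d in degrees) / ((n - 1) * (n - 2))


star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(degree_centralization(star))  # 1.0
print(degree_centralization(ring))  # 0.0
```

The star reproduces the worked example (20 / 20 = 1), while a ring, where every node has the same degree, has no centralization at all.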

Community Structure

How dense is it?

  • Max. possible edges:
    • Directed: $e_{\max} = n(n-1)$
    • Undirected: $e_{\max} = n(n-1)/2$

$\text{density} = e / e_{\max}$
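As a quick worked check of the density formula, a minimal sketch with a hypothetical helper and hypothetical counts: 6 links among 4 nodes fill an undirected graph completely but only half of a directed one.

```python
def density(n, e, directed=False):
    """Fraction of possible edges present among n nodes with e edges."""
    e_max = n * (n - 1) if directed else n * (n - 1) / 2
    return e / e_max


print(density(4, 6))                 # 1.0 (complete undirected graph on 4 nodes)
print(density(4, 6, directed=True))  # 0.5
```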

Is everything connected?

Connected Components - Directed

Strongly connected components

  • Each node in the component can be reached from every other node in the component by following directed links
    • B C D E
    • A
    • G H
    • F

Weakly connected components

  • Each node can be reached from every other node by following links in either direction
    • A B C D E
    • G H F
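Both kinds of components can be computed with graph search. The sketch below uses Kosaraju's algorithm for strongly connected components (a DFS for finishing order, then a search on the reversed graph) and a plain search over links treated as undirected for weakly connected components. The function names are assumptions, and because the slide's actual links are not recoverable, the example edge list is hypothetical, chosen only to reproduce the groupings listed above. The recursive DFS is fine for small graphs; very large graphs would need an iterative version.

```python
def strongly_connected_components(adj):
    """Kosaraju's algorithm; adj maps every node to a list of its out-neighbors."""
    seen, order = set(), []

    def visit(u):
        # Depth-first search; u is appended only after all its descendants finish.
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                visit(v)
        order.append(u)

    for u in adj:
        if u not in seen:
            visit(u)

    # Reverse every link, then collect components in decreasing finish order.
    reverse = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            reverse[v].append(u)

    assigned, comps = set(), []
    for u in reversed(order):
        if u in assigned:
            continue
        comp, stack = {u}, [u]
        assigned.add(u)
        while stack:
            x = stack.pop()
            for y in reverse[x]:
                if y not in assigned:
                    assigned.add(y)
                    comp.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps


def weakly_connected_components(adj):
    """Components when every directed link is treated as undirected."""
    undirected = {u: set() for u in adj}
    for u in adj:
        for v in adj[u]:
            undirected[u].add(v)
            undirected[v].add(u)
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for v in undirected[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


# Hypothetical links: a cycle B -> C -> D -> E -> B, A -> B, F -> G, and G <-> H.
g = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"], "E": ["B"],
     "F": ["G"], "G": ["H"], "H": ["G"]}
print([sorted(c) for c in strongly_connected_components(g)])  # {F}, {G, H}, {A}, {B, C, D, E}
print([sorted(c) for c in weakly_connected_components(g)])    # {A, ..., E}, {F, G, H}
```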

Community finding (clustering)

Hierarchical clustering

Process:

  • Calculate affinity weights W for all pairs of vertices
  • Start: N disconnected vertices
  • Add edges (one by one) between pairs of clusters in order of decreasing weight (use closest distance to compare clusters)
  • Result: nested components
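The process above can be sketched with a union-find structure: sort vertex pairs by decreasing affinity and merge the clusters of each pair unless they are already joined, recording every merge. Because two clusters are joined through their single strongest link, this corresponds to single-linkage clustering. The function name and the affinity values are hypothetical; the merge list is the information a dendrogram draws.

```python
def single_linkage(nodes, weights):
    """Merge clusters in order of decreasing affinity; return the merge history.

    weights maps (u, v) vertex pairs to an affinity score (higher = closer).
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent links to the cluster representative (with path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    merges = []
    for (u, v), w in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:  # u and v are still in different clusters
            parent[ru] = rv
            merges.append((u, v, w))
    return merges


W = {("a", "b"): 0.9, ("c", "d"): 0.8, ("b", "c"): 0.3, ("a", "c"): 0.2}
print(single_linkage("abcd", W))
# [('a', 'b', 0.9), ('c', 'd', 0.8), ('b', 'c', 0.3)]
```

The pair (a, c) is skipped because a and c already share a cluster once (b, c) has been added, so the history is nested: {a, b} and {c, d} first, then everything.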

Cluster Dendrograms

Hierarchical clustering (closeness)

Betweenness clustering

Girvan and Newman 2002 iterative algorithm:

  • Compute $C_B$ of all edges
  • Remove edge i where $C_B(i)$ == max($C_B$)
  • Recalculate betweenness, and repeat
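The steps above can be sketched in Python with a Brandes-style edge betweenness. This minimal sketch stops as soon as removing edges splits the graph into more components; running it repeatedly on the pieces yields the full hierarchy. The function names and the example graph (two triangles joined by one bridge) are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Betweenness of every edge (unweighted, undirected), Brandes-style accumulation."""
    eb = {}
    for s in adj:
        order, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in pred[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = frozenset((v, w))
                eb[edge] = eb.get(edge, 0.0) + share
                delta[v] += share
    return {e: c / 2 for e, c in eb.items()}  # each pair was counted from both ends


def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def girvan_newman(adj):
    """Remove the highest-betweenness edge, recalculate, repeat until the graph splits."""
    g = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    start = len(components(g))
    while len(components(g)) == start and any(g.values()):
        eb = edge_betweenness(g)
        u, v = tuple(max(eb, key=eb.get))
        g[u].discard(v)
        g[v].discard(u)
    return components(g)


# Hypothetical graph: two triangles joined by the bridge c-d.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
     "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
print(girvan_newman(g))  # two communities: {a, b, c} and {d, e, f}
```

All 9 shortest paths between the two triangles cross the bridge, so it has the highest betweenness and is removed first.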

Clustering coefficient

Local clustering coefficient:

$C_i = \frac{\text{number of closed triplets centered on } i}{\text{number of connected triplets centered on } i}$

Example: $C_i = 1/3 \approx 0.33$

Global clustering coefficient:

$C_G = \frac{3 \times \text{number of triangles}}{\text{number of connected triplets}}$

Example: $C_G = 3 \times 1 / 5 = 0.6$
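Both coefficients can be computed by enumerating pairs of neighbors. A minimal sketch; the function names are assumptions, and the example graph (a triangle i-a-b plus one extra neighbor c of i) is one graph consistent with the example numbers, since the original figure is not recoverable. Counting closed triplets per center counts each triangle three times, which is where the factor of 3 comes from.

```python
from itertools import combinations


def local_clustering(adj, i):
    """Closed / connected triplets centered on node i."""
    pairs = list(combinations(adj[i], 2))  # connected triplets centered on i
    if not pairs:
        return 0.0
    closed = sum(1 for u, v in pairs if v in adj[u])
    return closed / len(pairs)


def global_clustering(adj):
    """Closed / connected triplets over all centers (each triangle closes 3 triplets)."""
    connected = closed = 0
    for i in adj:
        for u, v in combinations(adj[i], 2):
            connected += 1
            if v in adj[u]:
                closed += 1
    return closed / connected


adj = {"i": {"a", "b", "c"}, "a": {"i", "b"}, "b": {"i", "a"}, "c": {"i"}}
print(local_clustering(adj, "i"))  # 0.3333333333333333
print(global_clustering(adj))      # 0.6
```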

Pattern finding - motifs

Define / search for a particular structure, e.g. complete triads
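A brute-force search for the complete-triad motif checks every node triple. A minimal sketch, $O(N^3)$, with a hypothetical function name and a hypothetical example graph (the complete graph on 4 nodes).

```python
from itertools import combinations


def count_triangles(adj):
    """Count complete triads (triangles) by checking every node triple, O(N^3)."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


# Hypothetical example: the complete graph on 4 nodes.
k4 = {v: {u for u in "wxyz" if u != v} for v in "wxyz"}
print(count_triangles(k4))  # 4
```

Four triangles fit in a graph with only 6 edges because each edge belongs to two of them, a small instance of motif matches overlapping.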

Motifs can overlap in the network

http://mavisto.ipk-gatersleben.de/frequency_concepts.html

[Figure panels: motif to be found; graph; motif matches]

4-node subgraphs

Simulating network models

Small world network

Milgram (1967)

  • Mean path length in US social networks
  • ~6 hops separate any two people

Small world networks

Watts and Strogatz 1998

  • A few random links in an otherwise structured graph make the network a small world

  • Regular lattice: my friend’s friend is always my friend
  • Small world: mostly structured with a few random connections
  • Random graph: all connections random

Defining small world phenomenon

Pattern:

  • High clustering
  • Low mean shortest path

Examples:

  • Neural network of C. elegans
  • Semantic networks of languages
  • Actor collaboration graph
  • Food webs

$l_{\text{network}} \approx \ln(N)$

$C_{\text{network}} \gg C_{\text{random graph}}$
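The small-world pattern can be simulated by starting from a regular ring lattice, rewiring a fraction p of its links, and measuring clustering and mean shortest-path length. This is a simplified variant of the Watts and Strogatz construction, not their exact procedure: each original lattice edge (i, j) with i < j has its j end moved to a uniformly random node with probability p, skipping self-loops and duplicate links. All function names, sizes, and the seed are hypothetical, and since the exact numbers depend on the random draws, no output is shown.

```python
import random
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Regular lattice: each node links to its k nearest neighbors on a ring (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj, p, rng):
    """With probability p, move one end of each original edge to a random new node."""
    n = len(adj)
    edges = [(i, j) for i in adj for j in adj[i] if i < j]
    for i, j in edges:
        if rng.random() < p:
            candidates = [m for m in range(n) if m != i and m not in adj[i]]
            if candidates:
                m = rng.choice(candidates)
                adj[i].discard(j)
                adj[j].discard(i)
                adj[i].add(m)
                adj[m].add(i)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient (nodes with degree < 2 count as 0)."""
    total = 0.0
    for i in adj:
        pairs = list(combinations(adj[i], 2))
        if pairs:
            total += sum(1 for u, v in pairs if v in adj[u]) / len(pairs)
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS from each node)."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(0)
for p in (0.0, 0.1, 1.0):
    g = rewire(ring_lattice(200, 6), p, rng)
    print(p, round(average_clustering(g), 3), round(average_path_length(g), 2))
```

Comparing the rows should show the pattern on the slide: a small p already shortens paths sharply while clustering stays close to the lattice value, whereas p = 1 approaches a random graph with low clustering.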

Summary

Structural analysis

  • Centrality
  • Community structure
  • Pattern finding

Widely applicable across domains