Network Analysis
Maneesh Agrawala
CS 448B: Visualization Winter 2020
Last Time: Network Layout
Interactive Example: Configurable Force Layout
Linear node layout, circular arcs show connections. Layout quality sensitive to node ordering.
Edge-crossings and occlusion
Goal: Ensure similar items are placed near each other, e.g., minimize the sum of distances between adjacent items. This requires combinatorial optimization and is NP-hard! Instead, approximate / heuristic approaches are used:
- Perform hierarchical clustering, then sort the cluster tree
- Apply an approximate traveling salesperson solver
Seriation initially used in archaeology for relative dating of artifacts based on observed properties
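The "approximate traveling salesperson solver" route can be sketched with a greedy nearest-neighbour heuristic. This is a minimal sketch, assuming a precomputed symmetric distance matrix `dist`; the function names are hypothetical, and the greedy ordering is not guaranteed to be optimal.

```python
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]

def path_cost(order: List[int], dist: Matrix) -> float:
    """Sum of distances between items placed next to each other."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))

def nearest_neighbor_order(dist: Matrix, start: int) -> List[int]:
    """Greedy TSP-style path: repeatedly append the closest unvisited item."""
    order = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        last = order[-1]
        # Ties are broken by the smaller index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (dist[last][j], j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def seriate(dist: Matrix) -> Optional[List[int]]:
    """Try every start item and keep the cheapest greedy ordering."""
    best, best_cost = None, float("inf")
    for s in range(len(dist)):
        order = nearest_neighbor_order(dist, s)
        cost = path_cost(order, dist)
        if cost < best_cost:
            best, best_cost = order, cost
    return best

# Hypothetical items with 1-D positions; distance = absolute difference.
positions = [0, 5, 1, 4, 2, 3]
dist = [[abs(a - b) for b in positions] for a in positions]
order = seriate(dist)
print(order, path_cost(order, dist))   # [0, 2, 4, 5, 3, 1] 5
```

Restarting from every item is a cheap way to soften the greedy choice of starting point; the original order here costs 15, the seriated one 5.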
- E.g., scatter plot based on node values
The “Skitter” Layout
Angle = Longitude
Radius = Degree
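The Skitter mapping can be sketched as a conversion to polar coordinates. This minimal version assumes each node has a known longitude in degrees and a degree count, and (an assumption not stated on the slide) scales the radius linearly by the maximum degree. CAIDA's published Skitter graphs reportedly log-scale and invert the radius so that high-degree nodes sit near the center, so treat the radius formula as a placeholder. Function names are hypothetical.

```python
import math
from typing import Dict, Tuple

def skitter_position(longitude_deg: float, degree: int, max_degree: int) -> Tuple[float, float]:
    """Place a node in polar coordinates: angle = longitude, radius = degree.

    The radius is degree / max_degree (assumed here so that all nodes land
    inside the unit circle).
    """
    angle = math.radians(longitude_deg)   # -180..180 degrees -> radians
    radius = degree / max_degree          # linear, normalized to [0, 1]
    return radius * math.cos(angle), radius * math.sin(angle)

def skitter_layout(nodes: Dict[str, Tuple[float, int]]) -> Dict[str, Tuple[float, float]]:
    """nodes maps a node id to (longitude in degrees, degree)."""
    max_degree = max(deg for _, deg in nodes.values())
    return {n: skitter_position(lon, deg, max_degree) for n, (lon, deg) in nodes.items()}

# Hypothetical example nodes: (longitude, degree).
pos = skitter_layout({"hub": (0.0, 10), "leaf": (90.0, 5)})
print(pos)
```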
Semantic Substrates [Shneiderman 06]
- Filtering and Focus + Context techniques
New visualization research or data analysis project
- Research: Pose problem, implement creative solution
- Data analysis: Analyze dataset in depth & make a visual explainer
Deliverables
- Research: Implementation of solution
- Data analysis/explainer: Article with multiple interactive visualizations
- 6-8 page paper
Schedule
- Project proposal: Wed 2/19
- Design review and feedback: 3/9 and 3/11
- Final presentation: 3/16 (7-9pm), Location: TBD
- Final code and writeup: 3/18 11:59pm
Grading
- Groups of up to 3 people, graded individually
- Clearly report responsibilities of each member
*Slides adapted from E. Adar’s / L. Adamic’s Network Theory and Applications course slides.
http://diseasome.eu/
http://www.lx97.com/maps/
Lombardi, M. ‘George W. Bush, Harken Energy and Jackson Stephens, ca 1979–90’
www.opte.org
Size? Density? Centralization? Clustering? Components? Cliques? Motifs?
…
Shortest path (geodesic path)
- The shortest sequence of links connecting two nodes
- Not always unique
  - A and C are connected by 2 shortest paths
    - A – E – B – C
    - A – E – D – C
[Figure: example graph with nodes A, B, C, D, E]
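Because shortest paths are not always unique, a breadth-first search that records every parent at the previous level can enumerate all of them. A minimal sketch for unweighted, undirected graphs stored as adjacency lists; the function names are hypothetical. The example edges (A–E, E–B, B–C, E–D, D–C) are read off the two paths listed above.

```python
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]

def all_shortest_paths(graph: Graph, source: str, target: str) -> List[List[str]]:
    """Return every shortest (geodesic) path from source to target."""
    dist = {source: 0}
    parents: Dict[str, List[str]] = {source: []}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:              # first time v is reached
                dist[v] = dist[u] + 1
                parents[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:   # another shortest route into v
                parents[v].append(u)
    if target not in dist:
        return []

    def expand(node: str) -> List[List[str]]:
        # Walk parent pointers back to the source, branching at each choice.
        if node == source:
            return [[source]]
        return [p + [node] for parent in parents[node] for p in expand(parent)]

    return expand(target)

def undirected(edges: List[Tuple[str, str]]) -> Graph:
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, []).append(b)
        g.setdefault(b, []).append(a)
    return g

g = undirected([("A", "E"), ("E", "B"), ("B", "C"), ("E", "D"), ("D", "C")])
print(all_shortest_paths(g, "A", "C"))   # [['A', 'E', 'B', 'C'], ['A', 'E', 'D', 'C']]
```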
[Figure: example graph with nodes 1–7]
Shortest path from 2 to 3: 1
Shortest path from 2 to 3?
[Figure: example graphs with nodes X and Y; panels: indegree, betweenness, closeness]
Degree centrality: $C_D(i) = \sum_{j} x_{ij}$, where $x_{ij} = 1$ if $i$ and $j$ are linked and 0 otherwise
Normalized degree centrality: $C'_D(i) = C_D(i) / (N-1)$
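Degree centrality is just a neighbour count, so a sketch is short. This assumes an undirected simple graph given as adjacency lists (no self-loops); the function name is hypothetical. Normalizing by N − 1 makes a node linked to every other node score 1.

```python
from typing import Dict, List

def degree_centrality(adj: Dict[str, List[str]], normalized: bool = True) -> Dict[str, float]:
    """Degree centrality for an undirected graph given as adjacency lists.

    C_D(i) is the number of neighbours of i; the normalized version divides
    by N - 1, the largest degree possible in a simple graph with N nodes.
    """
    n = len(adj)
    scale = (n - 1) if normalized else 1
    return {v: len(nbrs) / scale for v, nbrs in adj.items()}

# Hypothetical star: centre "c" linked to four leaves.
star = {"c": ["a", "b", "d", "e"], "a": ["c"], "b": ["c"], "d": ["c"], "e": ["c"]}
print(degree_centrality(star))   # c -> 1.0, each leaf -> 0.25
```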
Ability to broker between groups
Likelihood that information originating anywhere in the network reaches you
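Betweenness centrality: $C_B(i) = \sum_{j<k;\ j,k \neq i} g_{jk}(i) / g_{jk}$
where $g_{jk}$ = the number of shortest paths connecting $j$ and $k$, and $g_{jk}(i)$ = the number of those paths that node $i$ is on.
Normalization: $C'_B(i) = C_B(i) / [(n-1)(n-2)/2]$, where $(n-1)(n-2)/2$ is the number of pairs of vertices excluding the vertex itself.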
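The betweenness sum can be evaluated directly for small graphs. This sketch (not the faster Brandes algorithm) runs one BFS per node to get distances and shortest-path counts, then uses the fact that $i$ lies on a shortest $j$–$k$ path exactly when $d(j,i) + d(i,k) = d(j,k)$, in which case $g_{jk}(i)$ is the product of the path counts $j \to i$ and $i \to k$. It assumes an undirected, unweighted graph; names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]

def bfs_counts(graph: Graph, s: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Distances from s and the number of shortest paths from s to each reachable node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:              # first visit: v is one level below u
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:     # every shortest path to u extends to v
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(graph: Graph, normalized: bool = True) -> Dict[str, float]:
    """C_B(i) = sum over pairs j < k (j, k != i) of g_jk(i) / g_jk."""
    nodes = list(graph)
    n = len(nodes)
    info = {s: bfs_counts(graph, s) for s in nodes}
    result = {}
    for i in nodes:
        dist_i, sigma_i = info[i]
        total = 0.0
        for j, k in combinations([v for v in nodes if v != i], 2):
            dist_j, sigma_j = info[j]
            if k not in dist_j or i not in dist_j:
                continue                   # j-k disconnected, or i unreachable from j
            if dist_j[i] + dist_i[k] == dist_j[k]:
                total += sigma_j[i] * sigma_i[k] / sigma_j[k]
        pairs = (n - 1) * (n - 2) / 2      # pairs of vertices excluding i
        result[i] = total / pairs if normalized and pairs > 0 else total
    return result

path = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(betweenness(path))   # {'A': 0.0, 'B': 1.0, 'C': 0.0}
```

The cost is cubic in the number of nodes, which is fine for slide-sized examples; for larger graphs Brandes' accumulation is the usual choice.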
Likelihood that information originating anywhere in the network reaches you
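Closeness centrality: $C_C(i) = \left[\sum_{j=1, j \neq i}^{N} d(i,j)\right]^{-1}$
Normalized closeness centrality: $C'_C(i) = (N-1)\, C_C(i) = \frac{N-1}{\sum_{j=1, j \neq i}^{N} d(i,j)}$
where $d(i,j)$ is the shortest-path distance between $i$ and $j$.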
Being close to the center of the graph
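Closeness needs only the BFS distances from each node. A minimal sketch, assuming a connected, undirected, unweighted graph (unreachable nodes are simply skipped here); names are hypothetical. Multiplying by N − 1 makes a node adjacent to every other node score 1.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[str, List[str]]

def bfs_distances(graph: Graph, s: str) -> Dict[str, int]:
    """Hop distances d(s, j) to every node reachable from s."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(graph: Graph, normalized: bool = True) -> Dict[str, float]:
    """C_C(i) = 1 / sum_j d(i, j); normalized: (N - 1) * C_C(i)."""
    n = len(graph)
    result = {}
    for i in graph:
        total = sum(bfs_distances(graph, i).values())   # d(i, i) = 0 adds nothing
        c = 1.0 / total if total > 0 else 0.0
        result[i] = (n - 1) * c if normalized else c
    return result

path = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(closeness(path))   # B is closest to the center of this path
```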
- Prestige ~ indegree centrality
- Betweenness ~ consider directed shortest paths
- Closeness ~ consider nodes from which the target node can be reached
- Influence range ~ nodes reachable from the target node
- Straightforward modifications of the equations for undirected graphs
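The directed variants reduce to in-degree counting and reachability. A minimal sketch, assuming a directed graph stored as successor lists with every node present as a key; names are hypothetical. Reversing the links turns "nodes from which the target can be reached" into ordinary reachability.

```python
from collections import deque
from typing import Dict, List, Set

Digraph = Dict[str, List[str]]   # node -> list of successors (every node is a key)

def indegree(g: Digraph) -> Dict[str, int]:
    """Prestige as in-degree: how many links point at each node."""
    counts = {v: 0 for v in g}
    for u in g:
        for v in g[u]:
            counts[v] += 1
    return counts

def reachable(g: Digraph, s: str) -> Set[str]:
    """Influence range: nodes reachable from s by following directed links."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen - {s}

def reverse(g: Digraph) -> Digraph:
    """Flip every link, so reachability in the result means 'can reach s' in g."""
    r: Digraph = {v: [] for v in g}
    for u in g:
        for v in g[u]:
            r[v].append(u)
    return r

g = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"]}
print(indegree(g))                          # {'a': 0, 'b': 1, 'c': 2, 'd': 0}
print(sorted(reachable(g, "a")))            # influence range of a: ['b', 'c']
print(sorted(reachable(reverse(g), "c")))   # nodes that can reach c: ['a', 'b', 'd']
```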
| | Low Degree | Low Closeness | Low Betweenness |
|---|---|---|---|
| High Degree | | Node embedded in cluster that is far from the rest of the network | Node's connections are redundant; communication bypasses him/her |
| High Closeness | Node links to a small number of important/active people | | Many paths likely to be in network; node is near many people, but so are many others |
| High Betweenness | Node's few ties are crucial for network flow | Node monopolizes the ties from a small number of people to many others | |
Freeman’s general formula for centralization:
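$C_D = \frac{\sum_{i=1}^{g} \left[C_D(n^*) - C_D(i)\right]}{(g-1)(g-2)}$
where $C_D(n^*)$ = maximum value in the network and $g$ = number of nodes.
Example (star with 6 nodes): $C_D = \frac{(5-5) + (5-1) \times 5}{(6-1)(6-2)} = 1$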
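Freeman degree centralization sums each node's shortfall from the maximum degree and divides by $(g-1)(g-2)$, the largest possible sum for an undirected graph with $g$ nodes. A minimal sketch, assuming an undirected simple graph with at least 3 nodes given as adjacency lists; the name is hypothetical. The star reproduces the worked example's value of 1.

```python
from typing import Dict, List

def degree_centralization(adj: Dict[str, List[str]]) -> float:
    """Freeman degree centralization for an undirected graph with g >= 3 nodes."""
    g = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    c_star = max(degrees)                          # C_D(n*): the maximum in the network
    shortfall = sum(c_star - d for d in degrees)   # sum_i [C_D(n*) - C_D(i)]
    return shortfall / ((g - 1) * (g - 2))

leaves = ["l1", "l2", "l3", "l4", "l5"]
star = {"hub": leaves, **{leaf: ["hub"] for leaf in leaves}}
print(degree_centralization(star))   # 1.0
```

A ring, where every node has the same degree, gives 0: no node stands out.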
[Figure: three example networks with $C_D$ = 0.167, 0.167, and 1.0]
- Directed: $e_{max} = n(n-1)$
- Undirected: $e_{max} = n(n-1)/2$
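A common use of $e_{max}$ is graph density, the fraction of possible edges that are present. A minimal sketch, assuming simple graphs (no self-loops or multi-edges); the function names are hypothetical.

```python
def max_edges(n: int, directed: bool = False) -> int:
    """e_max for a simple graph on n nodes."""
    return n * (n - 1) if directed else n * (n - 1) // 2

def density(n: int, num_edges: int, directed: bool = False) -> float:
    """Fraction of possible edges that are present."""
    return num_edges / max_edges(n, directed)

print(density(4, 6))                  # complete undirected graph on 4 nodes: 1.0
print(density(4, 6, directed=True))   # same 6 links read as one-way arcs: 0.5
```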
Strongly connected components
- Each node in a component can be reached from every other node in the component by following directed links
  - {B, C, D, E}
  - {A}
  - {G, H}
  - {F}
[Figure: directed example graph with nodes A–H]
Weakly connected components
- Each node can be reached from every other node by following links in either direction
  - {A, B, C, D, E}
  - {G, H, F}
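Both kinds of components can be computed with depth-first search. The slide's actual edges are not recoverable from the text, so this sketch uses hypothetical edges chosen to reproduce the same components: strongly {B, C, D, E}, {A}, {G, H}, {F}; weakly {A, B, C, D, E}, {F, G, H}. Strong components use Kosaraju's algorithm (one possible choice; Tarjan's is another); weak components simply ignore link direction. Names are hypothetical, and the recursion suits small graphs only.

```python
from typing import Dict, List, Set

Digraph = Dict[str, List[str]]   # node -> successors; every node is a key

def strongly_connected_components(g: Digraph) -> List[Set[str]]:
    """Kosaraju's algorithm: DFS finish order on g, then DFS on reversed g."""
    seen: Set[str] = set()
    finish: List[str] = []

    def visit(u: str) -> None:
        seen.add(u)
        for v in g[u]:
            if v not in seen:
                visit(v)
        finish.append(u)              # u finishes after everything it reaches

    for u in g:
        if u not in seen:
            visit(u)

    rev: Digraph = {v: [] for v in g}
    for u in g:
        for v in g[u]:
            rev[v].append(u)

    assigned: Set[str] = set()
    comps: List[Set[str]] = []

    def collect(u: str, comp: Set[str]) -> None:
        assigned.add(u)
        comp.add(u)
        for v in rev[u]:
            if v not in assigned:
                collect(v, comp)

    for u in reversed(finish):        # latest finisher first
        if u not in assigned:
            comp: Set[str] = set()
            collect(u, comp)
            comps.append(comp)
    return comps

def weakly_connected_components(g: Digraph) -> List[Set[str]]:
    """Ignore link direction, then collect components with a DFS stack."""
    und: Dict[str, Set[str]] = {v: set() for v in g}
    for u in g:
        for v in g[u]:
            und[u].add(v)
            und[v].add(u)
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for s in g:
        if s in seen:
            continue
        seen.add(s)
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in und[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps

# Hypothetical edges that yield the components listed on the slide.
g = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"], "E": ["B"],
     "F": ["G"], "G": ["H"], "H": ["G"]}
print(sorted(sorted(c) for c in strongly_connected_components(g)))
print(sorted(sorted(c) for c in weakly_connected_components(g)))
```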
Process:
- Calculate affinity weights W for all pairs of vertices
- Start: N disconnected vertices
- Add edges (one by one) between pairs of clusters in order of decreasing weight (use closest distance to compare clusters)
- Result: nested components
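The process above amounts to single-linkage agglomerative clustering, which can be sketched with a union-find structure. This assumes the affinity weights W are given as a dictionary over vertex pairs (higher means more similar); names and the example weights are hypothetical. Processing pairs in decreasing weight means two clusters first join at the weight of their closest cross pair.

```python
from typing import Dict, FrozenSet, List, Tuple

Merge = Tuple[float, FrozenSet[str]]

def single_linkage(nodes: List[str], weights: Dict[Tuple[str, str], float]) -> List[Merge]:
    """Agglomerative clustering: add edges in order of decreasing affinity.

    Returns the merges in order; each entry is (weight, merged cluster), so
    the list describes the nested components.
    """
    parent = {v: v for v in nodes}               # start: N disconnected vertices
    members = {v: frozenset([v]) for v in nodes}

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]        # path halving
            v = parent[v]
        return v

    merges: List[Merge] = []
    for (a, b), w in sorted(weights.items(), key=lambda item: -item[1]):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue                             # already in the same cluster
        parent[rb] = ra
        members[ra] = members[ra] | members[rb]
        merges.append((w, members[ra]))
    return merges

w = {("a", "b"): 0.9, ("c", "d"): 0.8, ("b", "c"): 0.3, ("a", "c"): 0.2}
for weight, cluster in single_linkage(["a", "b", "c", "d"], w):
    print(weight, sorted(cluster))
```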
- Compute $C_B$ of all edges
- Remove edge $i$ where $C_B(i) = \max(C_B)$
- Recalculate betweenness
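The edge-removal loop can be sketched as follows. Edge betweenness is computed here with a Brandes-style BFS and dependency accumulation, a technique chosen for this sketch since the slide does not say how $C_B$ is computed. It assumes an undirected, unweighted graph and stops once the graph splits into `k` components; names, the stopping rule, and the example graph are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set

Graph = Dict[str, Set[str]]

def edge_betweenness(g: Graph) -> Dict[FrozenSet[str], float]:
    """Edge betweenness via BFS path counts plus dependency accumulation."""
    eb = {frozenset((u, v)): 0.0 for u in g for v in g[u]}
    for s in g:
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        order: List[str] = []
        queue = deque([s])
        while queue:                          # BFS: distances and path counts
            u = queue.popleft()
            order.append(u)
            for v in g[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    sigma[v] = 0
                    preds[v] = []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):             # farthest nodes first
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1 + delta[w])
                eb[frozenset((u, w))] += share
                delta[u] += share
    return {e: b / 2 for e, b in eb.items()}  # each pair counted from both ends

def connected_components(g: Graph) -> List[Set[str]]:
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for s in g:
        if s in seen:
            continue
        seen.add(s)
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in g[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps

def girvan_newman(g: Graph, k: int) -> List[Set[str]]:
    """Remove a highest-betweenness edge, recompute, repeat until k components."""
    g = {u: set(vs) for u, vs in g.items()}   # work on a copy
    while len(connected_components(g)) < k:
        eb = edge_betweenness(g)
        if not eb:
            break                             # no edges left to remove
        top = max(eb.values())
        u, v = next(tuple(e) for e, b in eb.items() if b == top)
        g[u].discard(v)
        g[v].discard(u)
    return connected_components(g)

def from_edges(edges) -> Graph:
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g

# Hypothetical example: two triangles joined by the bridge c-d.
g = from_edges([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
                ("d", "e"), ("e", "f"), ("d", "f")])
print(girvan_newman(g, 2))
```

The bridge carries all 9 shortest paths between the two triangles, so it is the first edge removed.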
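Local clustering coefficient:
$C_i = \frac{\text{number of closed triplets centered on } i}{\text{number of connected triplets centered on } i}$
Example: $C_i = 1/3 \approx 0.33$
Global clustering coefficient:
$C_G = \frac{\text{number of closed triplets}}{\text{number of connected triplets}} = \frac{3 \times \text{number of triangles}}{\text{number of connected triplets}}$
Example: $C_G = 3 \times 1/5 = 0.6$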
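Both coefficients come from counting, for each node, how many pairs of its neighbours exist (connected triplets) and how many of those pairs are linked (closed triplets). The example values ($C_i = 1/3$, $C_G = 0.6$) are consistent with a triangle plus one pendant node attached to $i$; that graph is a reconstruction used for illustration, not necessarily the slide's figure. Names are hypothetical, and returning 0 for nodes of degree below 2 is a convention assumed here.

```python
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]

def closed_triplets_at(g: Graph, i: str) -> int:
    """Pairs of i's neighbours that are themselves linked."""
    return sum(1 for u, v in combinations(sorted(g[i]), 2) if v in g[u])

def connected_triplets_at(g: Graph, i: str) -> int:
    """Pairs of i's neighbours (paths u - i - v), linked or not."""
    k = len(g[i])
    return k * (k - 1) // 2

def local_clustering(g: Graph, i: str) -> float:
    denom = connected_triplets_at(g, i)
    return closed_triplets_at(g, i) / denom if denom else 0.0   # 0 for degree < 2

def global_clustering(g: Graph) -> float:
    # Summing closed triplets over all centres counts each triangle 3 times,
    # so this ratio equals 3 * (#triangles) / (#connected triplets).
    closed = sum(closed_triplets_at(g, i) for i in g)
    connected = sum(connected_triplets_at(g, i) for i in g)
    return closed / connected if connected else 0.0

# Triangle i-b-c plus a pendant node d attached to i (reconstructed example).
g = {"i": {"b", "c", "d"}, "b": {"i", "c"}, "c": {"i", "b"}, "d": {"i"}}
print(local_clustering(g, "i"), global_clustering(g))
```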
http://mavisto.ipk-gatersleben.de/frequency_concepts.html
[Figure: a motif to be found, the graph, and the motif matches]
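Motif finding can be illustrated by brute-force matching of one small pattern. The slide's motif is not recoverable from the text, so this sketch assumes a hypothetical three-node "feed-forward loop" (a→b, b→c, a→c) in a directed graph. It tests every ordered triple of distinct nodes, so it is cubic and only suitable for small graphs; it counts non-induced matches, and a node triple with extra edges can match under more than one role assignment. Names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Set

Digraph = Dict[str, Set[str]]   # node -> set of successors (every node is a key)

def count_feed_forward_loops(g: Digraph) -> int:
    """Count matches of the motif a->b, b->c, a->c by brute force."""
    count = 0
    for a, b, c in permutations(g, 3):
        if b in g[a] and c in g[b] and c in g[a]:
            count += 1
    return count

g = {"w": {"x", "y"}, "x": {"y", "z"}, "y": {"z"}, "z": set()}
print(count_feed_forward_loops(g))   # 2: (x, y, z) and (w, x, y)
```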
Milgram (1967)
- Mean path length in US social networks
- ~6 hops separate any two people
Watts and Strogatz 1998
- A few random links in an otherwise structured graph make the network a small world
  - Regular lattice: my friend's friend is always my friend
  - Small world: mostly structured with a few random connections
  - Random graph: all connections random
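The lattice-to-random spectrum can be explored with a Watts–Strogatz-style sketch: build a ring lattice where each node links to its k nearest neighbours, then move one endpoint of each edge to a random node with probability p. This is a simple variant (it preserves the edge count and avoids self-loops and duplicate edges) and may differ in details from the original model; names and parameters are hypothetical. Mean shortest path is averaged over reachable pairs, in case rewiring disconnects the graph.

```python
import random
from collections import deque
from itertools import combinations
from typing import Dict, Set

Graph = Dict[int, Set[int]]

def ring_lattice(n: int, k: int) -> Graph:
    """Each node linked to its k nearest ring neighbours (k even)."""
    g: Graph = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            g[v].add(u)
            g[u].add(v)
    return g

def rewire(g: Graph, p: float, rng: random.Random) -> Graph:
    """With probability p, move one endpoint of each edge to a random node."""
    g = {v: set(nbrs) for v, nbrs in g.items()}   # work on a copy
    edges = [(u, v) for u in g for v in g[u] if u < v]
    for u, v in edges:
        if rng.random() < p:
            candidates = [w for w in g if w != u and w not in g[u]]
            if candidates:
                w = rng.choice(candidates)
                g[u].discard(v)
                g[v].discard(u)
                g[u].add(w)
                g[w].add(u)
    return g

def avg_clustering(g: Graph) -> float:
    total = 0.0
    for v, nbrs in g.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
            total += links / (k * (k - 1) / 2)
    return total / len(g)

def mean_shortest_path(g: Graph) -> float:
    total = pairs = 0
    for s in g:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in g[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(42)
lattice = ring_lattice(200, 6)
for p in (0.0, 0.01, 0.1, 1.0):
    g = rewire(lattice, p, rng)
    print(p, round(avg_clustering(g), 3), round(mean_shortest_path(g), 2))
```

Small p typically keeps clustering close to the lattice value while the mean shortest path drops sharply, which is the small-world regime.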
- High clustering
- Low mean shortest path
- Neural network of C. elegans
- Semantic networks of languages
- Actor collaboration graph
- Food webs
[Figure: degree distribution; x-axis: number of edges, y-axis: number of nodes]
- Many nodes with few edges
- Fat tail: a few nodes with a very large number of edges
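The distribution in such a plot is a tabulation of how many nodes have each degree. A minimal sketch, assuming an undirected graph as adjacency lists; the name and the star example are hypothetical. In the star, many nodes have one edge and a single hub has many, a tiny version of the fat-tail shape.

```python
from collections import Counter
from typing import Dict, List

def degree_distribution(adj: Dict[str, List[str]]) -> Dict[int, int]:
    """Map each degree (number of edges) to how many nodes have that degree."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return dict(sorted(counts.items()))

leaves = [f"n{i}" for i in range(5)]
star = {"hub": leaves, **{leaf: ["hub"] for leaf in leaves}}
print(degree_distribution(star))   # {1: 5, 5: 1}
```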
- Centrality
- Community structure
- Pattern finding
→ Widely applicable across domains