Big Data Era
https://vimeo.com/102998774
The big problem: Scalability
Hardware Algorithm Visualization
Image sources:
https://upload.wikimedia.org/wikipedia/commons/0/05/Sna_large.png
https://upload.wikimedia.org/wikipedia/commons/9/9b/Social_Network_Analysis_Visualization.png
https://c1.staticflickr.com/5/4033/4520018121_6dd39e8d7e_z.jpg
https://c1.staticflickr.com/1/1/916142_ddc2fd0140.jpg
Graph Sampling
… represents the original unfiltered graph.
Which sampling strategy to use?
Graph Sampling Evaluation
Random Walk (RW) vs. Forest Fire (FF) [Leskovec and Faloutsos, KDD 2006]
Graph Sampling Evaluation in Visualization
Original Graph
Random Walk (RW): power-law degree distribution
Forest Fire (FF): power-law degree distribution
Both samples preserve the power-law degree distribution, yet they give distinct visual results!
Graph Sampling Evaluation in Visualization
Statistical features: hub inclusion, clustering coefficient, discovery quotient, …
Visual factors: ?
Data Mining Visualization
Similarity Measurements
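The clustering coefficient listed among the statistical features can be computed directly from an adjacency structure. The sketch below assumes an undirected graph stored as a Python dict mapping each node to a set of neighbours; the function names are illustrative, and nodes with fewer than two neighbours get a coefficient of 0 (the same convention NetworkX uses by default).

```python
def local_clustering(adj, u):
    """Fraction of pairs of u's neighbours that are themselves connected."""
    nbrs = list(adj[u])
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: no neighbour pairs, coefficient 0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes of the graph."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, u) for u in adj) / len(adj)
```

Comparing `average_clustering(original)` with `average_clustering(sample)` gives one metric-based similarity signal; it says nothing about how the sample looks once drawn as a node-link diagram.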
Goals
G1: Identify the key visual factors that make the sampled graphs representative.
G2: Evaluate the performance of different sampling algorithms on these visual factors.
Procedure
Pilot Study, then Formal Studies
Outline
Node-Based Sampling
Original Graph Random Node Sampling
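Random node sampling, as commonly defined, keeps a uniform random subset of nodes together with the edges induced among them. A minimal sketch, assuming an undirected graph stored as a dict of neighbour sets with sortable node ids; `random_node_sample` and its `fraction` parameter are illustrative names, not the authors' implementation.

```python
import random


def random_node_sample(adj, fraction, seed=0):
    """Random node sampling: keep a uniform subset of nodes and their induced edges.

    adj maps each node to the set of its neighbours (undirected graph).
    fraction is the share of nodes to keep, assumed to lie in (0, 1].
    """
    rng = random.Random(seed)
    nodes = sorted(adj)  # sorted so a fixed seed gives a reproducible sample
    k = max(1, round(fraction * len(nodes)))
    kept = set(rng.sample(nodes, k))
    # Induced subgraph: an edge survives only if both endpoints were sampled.
    return {u: {v for v in adj[u] if v in kept} for u in kept}


graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(random_node_sample(graph, 0.5, seed=1))
```

Because edges are kept only between sampled nodes, low sampling rates tend to leave many isolated nodes in the result.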
Edge-Based Sampling
Original Graph Random Edge Sampling
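Random edge sampling instead draws edges uniformly at random and keeps their endpoints, so high-degree nodes are more likely to be included because they touch more edges. A minimal sketch under the same assumed dict-of-sets representation; node ids must be orderable (e.g. ints), and the function name is hypothetical.

```python
import random


def random_edge_sample(adj, fraction, seed=0):
    """Random edge sampling: keep a uniform subset of edges and their endpoints.

    Assumes an undirected graph whose node ids can be ordered (e.g. ints).
    fraction is the share of edges to keep, assumed to lie in (0, 1].
    """
    rng = random.Random(seed)
    # List each undirected edge once, as (u, v) with u < v.
    edges = sorted((u, v) for u in adj for v in adj[u] if u < v)
    k = max(1, round(fraction * len(edges)))
    sample = {}
    for u, v in rng.sample(edges, k):
        # Add both directions so the result is again a dict of neighbour sets.
        sample.setdefault(u, set()).add(v)
        sample.setdefault(v, set()).add(u)
    return sample
```

Unlike node sampling, only the drawn edges are kept; edges between two included nodes that were not drawn are dropped.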
Traversal-Based Sampling: Random Walk
Original Graph Random Walk
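A common random-walk sampler starts at a random node, follows random edges, and with a small probability flies back to its start node; the sample is the subgraph induced on the visited nodes. In this sketch the restart probability 0.15 and the step limit that triggers a fresh start node (to escape small components) are assumed defaults, not values taken from the study; node ids are assumed sortable.

```python
import random


def random_walk_sample(adj, n_target, restart=0.15, max_steps=None, seed=0):
    """Random walk sampling with fly-back to the start node.

    Visits nodes until n_target distinct nodes are collected, then returns
    the subgraph induced on them. If the walk makes no progress for
    max_steps steps, it restarts from a new random node.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    n_target = min(n_target, len(nodes))
    if max_steps is None:
        max_steps = 100 * len(nodes)
    start = rng.choice(nodes)
    current = start
    visited = {start}
    steps = 0
    while len(visited) < n_target:
        steps += 1
        if steps > max_steps:
            # Walk seems stuck (e.g. in a small component): pick a new start.
            start = rng.choice(nodes)
            current = start
            visited.add(start)
            steps = 0
            continue
        neighbors = sorted(adj[current])
        if not neighbors or rng.random() < restart:
            current = start  # fly back to the start node
        else:
            current = rng.choice(neighbors)
            visited.add(current)
    # Sampled graph: the subgraph induced on the visited nodes.
    return {u: {v for v in adj[u] if v in visited} for u in visited}
```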
Traversal-Based Sampling: Random Jump
Original Graph Random Jump
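Random jump differs from random walk only in where the walker goes on a restart: it teleports to a uniformly random node anywhere in the graph rather than back to its start, so it cannot stay trapped in a small component. A minimal sketch with an assumed jump probability of 0.15 and sortable node ids; the function name is illustrative.

```python
import random


def random_jump_sample(adj, n_target, jump=0.15, seed=0):
    """Random jump sampling: a random walk that teleports to any node with prob. `jump`."""
    if not 0 < jump <= 1:
        raise ValueError("jump must be in (0, 1] so the walk cannot get stuck")
    rng = random.Random(seed)
    nodes = sorted(adj)
    n_target = min(n_target, len(nodes))
    current = rng.choice(nodes)
    visited = {current}
    while len(visited) < n_target:
        neighbors = sorted(adj[current])
        if not neighbors or rng.random() < jump:
            # Teleport to a uniformly random node anywhere in the graph.
            current = rng.choice(nodes)
        else:
            current = rng.choice(neighbors)
        visited.add(current)
    # Sampled graph: the subgraph induced on the visited nodes.
    return {u: {v for v in adj[u] if v in visited} for u in visited}
```

On a graph with two disconnected components, the jumps let the sampler reach both, which a pure fly-back walk cannot do from a single start node.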
Traversal-Based Sampling: Forest Fire
Original Graph Forest Fire
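Forest fire, in the form evaluated by Leskovec and Faloutsos, burns outward from a random seed node: each burning node ignites a geometrically distributed number of its unburned neighbours, and a new seed is chosen whenever the fire dies out. This sketch simplifies to an undirected graph (no separate backward burning) and uses an assumed forward-burning probability of 0.7, which gives a mean of p/(1-p), about 2.33 neighbours per burning node; names are illustrative.

```python
import random
from collections import deque


def forest_fire_sample(adj, n_target, p_forward=0.7, seed=0):
    """Forest fire sampling on an undirected graph; returns the induced subgraph."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    n_target = min(n_target, len(nodes))
    burned = set()
    while len(burned) < n_target:
        # (Re)ignite at a random unburned node whenever the fire has died out.
        seed_node = rng.choice([v for v in nodes if v not in burned])
        burned.add(seed_node)
        queue = deque([seed_node])
        while queue and len(burned) < n_target:
            u = queue.popleft()
            # Geometric draw: P(x = k) = p^k (1 - p), mean p / (1 - p).
            x = 0
            while rng.random() < p_forward:
                x += 1
            candidates = sorted(v for v in adj[u] if v not in burned)
            for v in rng.sample(candidates, min(x, len(candidates))):
                if len(burned) >= n_target:
                    break
                burned.add(v)
                queue.append(v)
    # Sampled graph: the subgraph induced on the burned nodes.
    return {u: {v for v in adj[u] if v in burned} for u in burned}
```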
Pilot Study
Dataset: 5 real-world graphs and their sampled graphs
Visual factor candidates
Results (key visual factors): high degree nodes, cluster quality, coverage area
Formal Study I: High Degree Nodes
Original Graph: 20 high degree nodes
Sampled Graph: 8 high degree nodes?
Formal Study I: High Degree Nodes
Experiment Setting
20 high degree nodes
Data Generation
N: 1024, D: S; N: 2048, D: S; N: 1024, D: L; N: 2048, D: L
Formal Study I: High Degree Nodes Results
Number of high degree nodes perceived (Visualization): +
Number of high degree nodes remaining (Data Mining): *
RW vs. FF: contradiction with metric-based results!
Random Walk (RW): 16 high degree nodes remaining, 6 perceived
Forest Fire (FF): 7 high degree nodes remaining, 3 perceived
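The "remaining" counts are metric-based, while the "perceived" counts come from study participants and cannot be computed. One plausible reading of "remaining", assumed here, is: take the k highest-degree nodes of the original graph and count how many are still present in the sample. `k = 20` mirrors the 20 high degree nodes in the study setup; the function names and the tie-breaking rule are hypothetical.

```python
def top_k_by_degree(adj, k):
    """The k highest-degree nodes; ties broken by node id for determinism."""
    return sorted(adj, key=lambda u: (-len(adj[u]), u))[:k]


def high_degree_nodes_remained(original, sampled, k=20):
    """How many of the original graph's top-k hubs still appear in the sample.

    Both graphs are dicts mapping node -> set of neighbours.
    """
    return sum(1 for u in top_k_by_degree(original, k) if u in sampled)
```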
Formal Study II: Cluster Quality
Experiment Setting
Data Generation
Formal Study II: Cluster Quality Results
The number of clusters that remain in the sample is important for perceiving cluster quality in the visualization!
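When each node carries a ground-truth cluster label, as it would for generated data, the number of remaining clusters can be counted by checking which labels still appear in the sample. The `min_size` threshold is a hypothetical parameter: the slides do not say how many nodes a cluster needs in order to count as remaining.

```python
from collections import Counter


def clusters_remaining(labels, sampled_nodes, min_size=1):
    """Count ground-truth clusters with at least min_size nodes in the sample.

    labels: dict node -> cluster id (assumed known, e.g. from a generator).
    sampled_nodes: any iterable of sampled node ids (a sampled graph dict works).
    """
    counts = Counter(labels[u] for u in sampled_nodes if u in labels)
    return sum(1 for c in counts.values() if c >= min_size)
```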
Formal Study III: Coverage Area
Experiment Setting
Data Generation
N: 1024, D: S; N: 2048, D: S; N: 1024, D: L; N: 2048, D: L
Formal Study III: Coverage Area Results
RW vs. RN: contradiction with metric-based results!
Conclusion
High degree nodes, cluster quality, and coverage area influence the perception of node-link visualizations.
Graph sampling performance in visualization may VARY from previous metric-based results!
Evaluation of Graph Sampling: A Visualization Approach
Yanhong Wu, Nan Cao, Daniel Archambault, Qiaomu Shen, Huamin Qu, and Weiwei Cui
yanhong.wu@ust.hk http://yhwu.me