A Sampling-Based Tool for Scaling Graph Datasets (ICPE 2020)


slide-1
SLIDE 1

A Sampling-Based Tool for Scaling Graph Datasets

ICPE 2020: 11th ACM/SPEC International Conference on Performance Engineering

Ahmed Musaafir, Alexandru Uta, Henk Dreuning, Ana-Lucia Varbanescu
Vrije Universiteit Amsterdam & University of Amsterdam

slide-2
SLIDE 2

Context

  • Graph datasets
    • Used in different domains (e.g., logistics, biology, social networks, infrastructure networks)
  • Graph processing
    • Different graph processing platforms: Giraph, GraphMat, Gunrock, etc.
  • Graph analytics benchmarking
    • Platform, Algorithm, Dataset, Hardware
    • No in-depth evaluation or performance analysis
    • Which properties of the graph dataset affect performance?

slide-3
SLIDE 3

Context

[Figure: correlated datasets vs. uncorrelated datasets]

slide-4
SLIDE 4

Problem

  • Lack of representative graph datasets
  • Synthetic graph generators
    • Generate a graph from scratch
    • Allow control over only specific graph properties
  • Graph archives
    • Few types of graphs
    • Small collections and small graph sizes

slide-5
SLIDE 5

Solution

  • Graph scaling
    • Control certain graph properties
    • Predict and tune the properties of scaled-up graphs based on models and guidelines
  • Tool to quickly generate diverse families of graphs

slide-6
SLIDE 6

Solution

Graph Scaling Tool

Input

  • Graph G
  • Scaling factor s
  • Additional parameters

Output

  • Scaled graph Ge (s times the size of G)

slide-7
SLIDE 7

Scaling Down


slide-8
SLIDE 8

Scaling Down: Graph Sampling

  • Node-based Sampling
    • Node Sampling
  • Edge-based Sampling
    • Random Edge Sampling
    • Totally-Induced Edge Sampling (TIES)
  • Traversal-based Sampling
    • Random Walk
    • Forest Fire
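As an illustration of one of the listed algorithms, the following is a minimal sketch of Totally-Induced Edge Sampling as it is commonly described: sample random edges until enough nodes are collected, then keep every original edge between the collected nodes. The function name `ties_sample`, the edge-list input, and the rounding of the target size are assumptions made for this sketch; it is not the tool's actual implementation.

```python
import random

def ties_sample(edges, sample_fraction, seed=0):
    """Sketch of TIES on an undirected edge list [(u, v), ...].

    Assumption: every node appears in at least one edge.
    """
    rng = random.Random(seed)
    all_nodes = {v for edge in edges for v in edge}
    target = max(1, round(sample_fraction * len(all_nodes)))

    # Edge-sampling step: draw random edges and keep both endpoints.
    # The node set may overshoot the target by one node.
    sampled_nodes = set()
    while len(sampled_nodes) < target:
        u, v = rng.choice(edges)
        sampled_nodes.update((u, v))

    # Induction step: keep every original edge between sampled nodes.
    sampled_edges = [(u, v) for u, v in edges
                     if u in sampled_nodes and v in sampled_nodes]
    return sampled_nodes, sampled_edges

# Toy usage: sample roughly half of a 10-node ring.
ring = [(i, (i + 1) % 10) for i in range(10)]
nodes, kept_edges = ties_sample(ring, 0.5, seed=1)
```

The induction step is what distinguishes TIES from plain random edge sampling: edges that were never drawn are still kept if both of their endpoints were collected.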

slide-9
SLIDE 9

Scaling Down: Graph Sampling

[Table: property preservation quality per sampling algorithm, represented as likelihood from low (--) to high (++)]

slide-10
SLIDE 10

Scaling Down: Results

Com-Orkut                G (original)   Gs 0.8        Gs 0.5       Gs 0.3
#Nodes                   3,072,441      2,457,952     1,536,220    921,733
#Edges                   117,185,083    108,686,099   73,626,482   42,194,208
Avg. degree              76.28          88.44         95.85        91.55
Diameter                 9              9             10           8
Density                  2.48e-05       3.59e-05      6.24e-05     9.93e-05
Components               1              7             17           36
Avg. Clustering Coeff.   0.16           0.15          0.15         0.14
Avg. Shortest path       4.19           4.05          3.97         3.95
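The average degree and density values reported for Com-Orkut are consistent with the standard definitions for an undirected graph, 2|E|/|V| and 2|E|/(|V|(|V|-1)). The short sketch below recomputes them from the node and edge counts on this slide; that the authors used exactly these definitions is an assumption, and the helper names are illustrative only.

```python
def average_degree(num_nodes, num_edges):
    """Average degree of an undirected graph: each edge adds 2 to the degree sum."""
    return 2 * num_edges / num_nodes

def density(num_nodes, num_edges):
    """Fraction of all possible undirected edges that are present."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))

# (#Nodes, #Edges) per column, copied from the Com-Orkut results.
com_orkut = {
    "G (original)": (3_072_441, 117_185_083),
    "Gs 0.8": (2_457_952, 108_686_099),
    "Gs 0.5": (1_536_220, 73_626_482),
    "Gs 0.3": (921_733, 42_194_208),
}

for name, (n, m) in com_orkut.items():
    print(f"{name}: avg. degree {average_degree(n, m):.2f}, "
          f"density {density(n, m):.2e}")
```

The recomputed values agree with the reported rows up to rounding in the last digit. They also show why the average degree rises as the sample shrinks: the sampled graphs keep a larger share of the edges than of the nodes.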

slide-22
SLIDE 22

Scaling Up


slide-23
SLIDE 23

Scaling Up: Existing work

  • Graph generators
    • Datagen, Graph500, R-MAT
  • Graph evolution algorithms
    • Focus on evolving the graph
  • Graph scalers
    • GScaler, ReCoN, Musketeer

slide-24
SLIDE 24

Scaling Up: Method

  • Obtain samples Gi of the original graph G
  • Interconnect the different samples

slide-25
SLIDE 25

Scaling Up: Method

  • Example: scale up a graph 4.5 times
    • Sample size: 0.5
    • Results in 9 different samples

[Figure: example of scaling up a graph; Gs 0...8 are the sampled versions of the graph]
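The arithmetic behind the example is simply the scaling factor divided by the sample size: 4.5 / 0.5 = 9 samples. A minimal sketch follows; the function name `samples_needed` is hypothetical, and rejecting combinations that do not divide evenly is an assumption of this sketch, since the slides do not say how the tool handles that case.

```python
from fractions import Fraction

def samples_needed(scaling_factor, sample_size):
    """Number of samples of the given relative size needed to reach the scaling factor."""
    # Exact rational arithmetic avoids floating-point surprises such as 0.3 / 0.1.
    ratio = Fraction(str(scaling_factor)) / Fraction(str(sample_size))
    if ratio.denominator != 1:
        # Assumption: only combinations that divide evenly are accepted here.
        raise ValueError("scaling factor is not a whole multiple of the sample size")
    return int(ratio)

print(samples_needed(4.5, 0.5))  # the example from the slide
```

The same calculation applied to a 3x scale-up with sample size 0.5 gives 6 samples.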

slide-26
SLIDE 26

Scaling Up: Method

  • Interconnection topologies
    • Star; Chain; Ring; Fully-connected
  • Selecting bridge vertices
    • Random; High-degree
  • Multi-edge interconnections
    • n interconnections
    • Directed; undirected
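The four topologies can be read as rules for which pairs of samples receive bridge edges. The sketch below shows one possible reading, combined with random bridge selection. The function names, the choice of sample 0 as the star hub, and placing n bridges on every linked pair of samples are assumptions of this sketch, not details confirmed by the slides.

```python
import random

def topology_pairs(num_samples, topology):
    """Return the (i, j) pairs of sample indices that get interconnected."""
    if topology == "star":
        # Assumption: sample 0 acts as the hub.
        return [(0, i) for i in range(1, num_samples)]
    chain = [(i, i + 1) for i in range(num_samples - 1)]
    if topology == "chain":
        return chain
    if topology == "ring":
        # A chain closed by one extra link (only a new pair with 3+ samples).
        return chain + ([(num_samples - 1, 0)] if num_samples > 2 else [])
    if topology == "fully-connected":
        return [(i, j) for i in range(num_samples)
                for j in range(i + 1, num_samples)]
    raise ValueError(f"unknown topology: {topology}")

def random_bridges(samples, topology, n=1, seed=0):
    """Pick n random bridge edges for each linked pair of samples.

    `samples` is a list of node-id lists, one list per sample.
    """
    rng = random.Random(seed)
    bridges = []
    for i, j in topology_pairs(len(samples), topology):
        for _ in range(n):
            bridges.append(((i, rng.choice(samples[i])),
                            (j, rng.choice(samples[j]))))
    return bridges
```

A high-degree strategy would replace `rng.choice` with a selection ranked by vertex degree, and a directed variant would treat each pair as ordered.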

slide-27
SLIDE 27

Scaling Up: Impact on properties

  • Different parameters
    • Interconnection topologies
    • Selecting bridge vertices
    • Multi-edge interconnections
    • Sampling algorithm
    • Sample size
    • Scaling factor
    • Dataset

slide-28
SLIDE 28

Scaling Up: Measuring the quality of graph output

  • Given the same parameters, the properties of the expanded graph should be predictable.
  • Models & guidelines
    • "In case you want to have the scaled-up graph with a larger diameter, choose a chain topology with a single random bridge."

slide-29
SLIDE 29

Scaling Up: Measuring the quality of graph output

  • Models & guidelines
    • Maximum diameter: [formula not preserved in this transcript]
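The chain-topology guideline can be checked on a toy case. The sketch below connects four triangle-shaped "samples" with a single bridge per linked pair and measures the diameter with breadth-first search. The triangles, the fixed bridge vertex (used instead of a random one so the result is deterministic), and all function names are assumptions of this illustration. As a general fact, independent of the formula on the slide, connected samples joined in a chain by single bridges cannot have a diameter larger than the sum of the sample diameters plus the number of bridges.

```python
from collections import deque

def eccentricity(adj, source):
    """Largest BFS distance from source (assumes a connected graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    return max(eccentricity(adj, v) for v in adj)

def build(num_samples, pairs):
    """Each sample is a triangle; vertex 0 of each sample is its bridge vertex."""
    adj = {}
    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for s in range(num_samples):
        add_edge((s, 0), (s, 1))
        add_edge((s, 1), (s, 2))
        add_edge((s, 0), (s, 2))
    for a, b in pairs:
        add_edge((a, 0), (b, 0))  # one bridge per linked pair of samples
    return adj

k = 4
chain = [(i, i + 1) for i in range(k - 1)]
star = [(0, i) for i in range(1, k)]
full = [(i, j) for i in range(k) for j in range(i + 1, k)]
for name, pairs in [("chain", chain), ("star", star), ("fully-connected", full)]:
    print(name, diameter(build(k, pairs)))
```

In this toy setup the chain yields the largest diameter and the fully-connected topology the smallest, which matches the direction of the guideline.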

slide-30
SLIDE 30

Scaling Up: Results

FB                       G (original)   G x3      G x3      G x3              G x3      G x3
Sample size              -              0.5       0.5       0.5               0.5       0.5
Topology                 -              Star      Chain     Fully Connected   Star      Star
Bridge                   -              Random    Random    Random            Random    High-degree
#Interconnections        -              1         1         1                 45,000    45,000
#Nodes                   4,039          12,117    12,114    12,114            12,114    12,115
#Edges                   88,234         339,497   340,091   339,777           559,798   560,168
Avg. degree              43.69          56.04     56.15     56.09             92.42     92.48
Diameter                 8              19        31        15                6         6
Density                  1.10e-2        4.62e-3   4.63e-3   4.63e-3           7.62e-3   7.63e-3
Components               1              7         9         7                 2         10
Avg. Clustering Coeff.   0.62           0.63      0.63      0.63              0.31      0.46
Avg. Shortest path       3.69           9.26      11.79     6.35              2.65      2.92

slide-43
SLIDE 43

Scaling Up: Results


slide-44
SLIDE 44

Case studies

  • Generate families of scaled-up graphs
    • Used for performance analysis
    • How do properties of the scaled-up graphs impact performance?
  • Processing time of Breadth-First Search and PageRank
    • Using GraphMat and the Graphalytics benchmark suite
  • Two datasets
    • Available in graph archives: "com-livejournal" and "12month1"

slide-45
SLIDE 45

Case studies: Scale-up Configuration

  • Fixed parameters
    • 0.5 sample size
    • Random bridge interconnections
  • Combining parameters
    • Scaling factor: 2, 4, 8*
    • Topologies: star, chain, ring, fully-connected
    • Number of interconnections: 1, 20,000
    • Different sampling algorithms and graph copies
  • Sampling algorithms
    • Totally-Induced Edge Sampling (TIES)
    • Forest Fire

*only for graph copies
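The size of the resulting configuration space can be sketched by enumerating the combinations. The dictionary keys and names below are illustrative. Reading the footnote as "scaling factor 8 is used only with graph copies" and counting per dataset are assumptions of this sketch.

```python
from itertools import product

TOPOLOGIES = ["star", "chain", "ring", "fully-connected"]
INTERCONNECTIONS = [1, 20_000]
# Scaling factors per method; factor 8 is assumed to apply to graph copies only.
SCALING_FACTORS = {
    "TIES": [2, 4],
    "Forest Fire": [2, 4],
    "graph copy": [2, 4, 8],
}

def configurations():
    """Yield one dict per scale-up configuration."""
    for method, factors in SCALING_FACTORS.items():
        for factor, topology, n in product(factors, TOPOLOGIES, INTERCONNECTIONS):
            yield {
                "method": method,
                "scaling_factor": factor,
                "topology": topology,
                "interconnections": n,
                "bridge": "random",  # fixed parameter from the slide
            }

configs = list(configurations())
print(len(configs), "configurations per dataset")
```

Under these assumptions each sampling algorithm contributes 2 x 4 x 2 = 16 configurations and graph copies contribute 3 x 4 x 2 = 24.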

slide-46
SLIDE 46

Case studies: Results BFS

slide-47
SLIDE 47

Case studies: Results BFS (Forest Fire)

slide-48
SLIDE 48

Case studies: Results BFS (Forest Fire)

slide-49
SLIDE 49

Case studies: Results BFS (TIES)

slide-50
SLIDE 50

Case studies: Results BFS (Graph Copy)

slide-51
SLIDE 51

Case studies: Results PR (Forest Fire)

slide-52
SLIDE 52

Case studies: Results PR (TIES)

slide-53
SLIDE 53

Case studies: Results PR (Graph Copy)

slide-54
SLIDE 54

Tool Implementation

  • Performance
    • Single node/thread
    • Parallel & Distributed Implementation
  • Auto-tuner
  • Tool publicly available: github.com/amusaafir/graph-scaling

slide-55
SLIDE 55

Conclusion

  • Sampling-based method for scaling graph datasets
  • Obtain families of similar graphs
  • Certain properties controlled by user requirements
  • Validated our tool on a set of different graph datasets
  • Diverse graph families used for understanding graph processing behavior