GraphBIG: Understanding Graph Computing in the Context of Industrial Solutions. Lifeng Nai, Hyesoon Kim (Georgia Tech); Yinglong Xia, Ilie Tanase, Ching-Yung Lin (IBM Research).


SLIDE 1

GraphBIG: Understanding Graph Computing in the Context of Industrial Solutions

Lifeng Nai, Hyesoon Kim (Georgia Tech); Yinglong Xia, Ilie Tanase, Ching-Yung Lin (IBM Research)

SLIDE 2

System G

BIG DATA

• This is the Big Data era
• Big Data are linked

SLIDE 3

WHAT IS GRAPH COMPUTING?

• Graph traversal?

This is NOT the FULL picture

SLIDE 4

GRAPH COMPUTING

• The GRAPH can be
  - Big or Small

SLIDE 5

GRAPH COMPUTING

• The GRAPH can be
  - Static or Dynamic

SLIDE 6

GRAPH COMPUTING

• The GRAPH can be
  - Property or Bayesian

SLIDE 7

GRAPH COMPUTING

• Graph computing contains a BIG scope
  - Traversal ≠ Graph Computing

Biased Understanding of Graph Computing

Understand Full-spectrum Graph Computing

SLIDE 8

GRAPHBIG

• Understand full-spectrum graph computing
  - Diverse workloads + Framework
• Propose an open-source benchmark suite: GraphBIG
  - Workloads from real-world use cases
  - Cover major graph computing types and data types
  - Both CPU and GPU implementations
• An open-source graph framework: OpenG
  - Designed from scratch
  - Similar design methodology to the IBM System G commercial toolkits

SLIDE 9

OUTLINE

• Motivation
• GraphBIG: Key factors
• Characterizations
• Conclusion

SLIDE 10

GRAPHBIG

OpenG Framework
Representative Graph Workloads
Graph Datasets
Vertex-centric Data Representation

SLIDE 11

GRAPHBIG

OpenG Framework
Representative Graph Workloads
Graph Datasets
Vertex-centric Data Representation

SLIDE 12

GRAPHBIG: FRAMEWORK

• Graph applications ← Framework primitives
• OpenG: IBM System G-like framework

[Figure: % of execution time spent in the framework for BFS, kCore, CComp, SPath, DCentr, TC, Gibbs, and GUp; average 76%]

SLIDE 13

GRAPHBIG

OpenG Framework
Representative Graph Workloads
Graph Datasets
Vertex-centric Data Representation

SLIDE 14

GRAPHBIG: DATA REPRESENTATION

[Figure: (a) Graph G with vertices 1 to 5; (b) CSR representation of G: a vertices array (offsets 1 2 6 8 10) and an edges array (2 | 1 3 4 5 | 2 5 | 2 5 | 2 3 4), with separate vertex property and edge property arrays; (c) vertex-centric representation of G: each vertex holds its vertex property and an adjacency list of edges with their edge properties (Vertex 1: 2; Vertex 2: 1 3 4 5; Vertex 3: 2 5; Vertex 4: 2 5; Vertex 5: 2 3 4)]
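The two layouts on the data-representation slide can be sketched in a few lines of Python. This is only an illustration, not OpenG's actual C++ data structures: the names `to_csr`, `Vertex`, `Edge`, and `to_vertex_centric` are hypothetical, the adjacency lists are read off the example graph G, and the CSR offsets here are 0-based rather than 1-based as in the figure.

```python
from dataclasses import dataclass, field

# Adjacency lists of the example graph G (1-based vertex ids, read off the slide).
ADJ = {1: [2], 2: [1, 3, 4, 5], 3: [2, 5], 4: [2, 5], 5: [2, 3, 4]}


def to_csr(adj):
    """Pack adjacency lists into two flat arrays (compressed sparse row).

    offsets[i]..offsets[i + 1] delimits the neighbors of the i-th vertex
    (in sorted id order) inside the edges array. Offsets are 0-based here.
    """
    offsets, edges = [0], []
    for v in sorted(adj):
        edges.extend(adj[v])
        offsets.append(len(edges))
    return offsets, edges


@dataclass
class Edge:
    target: int
    prop: dict = field(default_factory=dict)  # edge property stored with the edge


@dataclass
class Vertex:
    vid: int
    prop: dict = field(default_factory=dict)  # vertex property stored with the vertex
    out_edges: list = field(default_factory=list)  # this vertex's adjacency list


def to_vertex_centric(adj):
    """Build one Vertex object per vertex, each owning its own edge list."""
    vertices = {v: Vertex(v) for v in adj}
    for v, nbrs in adj.items():
        for n in nbrs:
            vertices[v].out_edges.append(Edge(n))
    return vertices


offsets, edges = to_csr(ADJ)
vertices = to_vertex_centric(ADJ)

# Adding an out-edge touches only one vertex in the vertex-centric layout;
# in CSR it would shift the edges array and every later offset.
vertices[1].out_edges.append(Edge(3, {"weight": 1.0}))
```

The trade-off this is meant to show: CSR is compact and scan-friendly for a static graph, while the vertex-centric layout keeps properties next to the structure and makes dynamic updates local to one vertex.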

SLIDE 15

GRAPHBIG

OpenG Framework
Representative Graph Workloads
Graph Datasets
Vertex-centric Data Representation

SLIDE 16

GRAPHBIG: WORKLOAD SELECTION

• Coverage
  - Workloads cover all computation types
• Representativeness
  - Workloads are selected from real-world use cases

SLIDE 17

GRAPHBIG: COMPUTATION TYPES

• Computation on graph structure (CompStruct)
  - Example: Breadth-first search
  - Irregular access pattern, heavy read access
• Computation on graph property (CompProp)
  - Example: Belief propagation
  - Heavy numeric operations on graph property
• Computation on dynamic graph (CompDyn)
  - Example: Streaming graph
  - Dynamic graph structure, dynamic memory usage
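A toy Python sketch of the three computation types may make the distinction concrete. It is not GraphBIG code: the function names are hypothetical, the graph is a plain dict of adjacency lists, and neighbor averaging is used as a much simpler stand-in for belief propagation.

```python
from collections import deque


def bfs_levels(adj, source):
    """CompStruct: follows edges only; reads hop around memory irregularly."""
    level = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for n in adj[v]:
            if n not in level:
                level[n] = level[v] + 1
                queue.append(n)
    return level


def smooth_properties(adj, value, rounds=1):
    """CompProp: numeric work on vertex properties (neighbor averaging here)."""
    for _ in range(rounds):
        value = {
            v: (value[v] + sum(value[n] for n in adj[v])) / (1 + len(adj[v]))
            for v in adj
        }
    return value


def apply_stream(adj, new_edges):
    """CompDyn: the structure itself changes as undirected edges stream in."""
    for u, v in new_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


graph = {1: [2], 2: [1, 3], 3: [2]}
levels = bfs_levels(graph, 1)
smoothed = smooth_properties(graph, {1: 3.0, 2: 0.0, 3: 0.0})
grown = apply_stream(graph, [(3, 4)])  # mutates graph: vertex 4 appears
```

Only the first function is dominated by pointer chasing; the second spends its time on arithmetic over properties, and the third allocates memory as the graph grows.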

SLIDE 18

GRAPHBIG: WORKLOAD SELECTION

• Selected from 21 real-world use cases of IBM System G

SLIDE 19

GRAPHBIG: WORKLOADS

Category          Workload                          Computation type  CPU  GPU
Graph traversal   BFS                               CompStruct        ✔    ✔
                  DFS                               CompStruct        ✔
Graph update      Graph construction (GCons)        CompDyn           ✔
                  Graph update (GUp)                CompDyn           ✔
                  Topology morphing (TMorph)        CompDyn           ✔
Graph analytics   Shortest path (SPath)             CompStruct        ✔    ✔
                  kCore                             CompStruct        ✔    ✔
                  Connected component (CComp)       CompStruct        ✔    ✔
                  Graph coloring (GColor)           CompStruct             ✔
                  Triangle counting (TC)            CompProp          ✔    ✔
                  Gibbs inference (GI)              CompProp          ✔
Social analytics  Betweenness centrality (BCentr)   CompStruct        ✔    ✔
                  Degree centrality (DCentr)        CompStruct        ✔    ✔

SLIDE 20

GRAPHBIG

OpenG Framework
Representative Graph Workloads
Graph Datasets
Vertex-centric Data Representation

SLIDE 21

GRAPHBIG: DATA TYPES

[Figure: four graph data types, labeled Type 1, Type 2, Type 3, and Type 4]

SLIDE 22

GRAPHBIG: DATASETS

Data set               Type       Vertex #  Edge #
Twitter Graph          Type 1     120M      1.9B
IBM Knowledge Repo     Type 2     154K      1.72M
IBM Watson Gene Graph  Type 3     2M        12.2M
CA Road Network        Type 4     1.9M      2.8M
LDBC Graph             Synthetic  Any       Any

SLIDE 23

CHARACTERIZATION

• Methodology
  - Real machine + hardware performance counters
  - CPU: tool integrated within benchmarks
  - GPU: CUDA nvprof

SLIDE 24

CHARACTERIZATION

Processor
  Type        Xeon E5-2670
  Frequency   2.6 GHz
  Core #      2 sockets x 8 cores x 2 threads
  Cache       32KB L1, 256KB L2, 20MB L3
  Memory BW   51.2 GB/s (DDR3)
GPU
  Type        Nvidia Tesla K40
  CUDA Core   2880
  Memory      12 GB
  Memory BW   288 GB/s
  Frequency   Core: 745 MHz, memory: 3 GHz
System
  Memory      192 GB
  Disk        2 TB HDD
  OS          RHEL 6

SLIDE 25

CHARACTERIZATION

• Showcase (Data: LDBC graph, 1M vertices)
  - CPU execution time breakdown
  - CPU core analysis
  - CPU cache performance
  - GPU divergence
  - GPU speedup
• More experiment results can be found in the paper
  - More analysis (memory bandwidth, IPC, etc.)
  - Input data sensitivity (all data sets are evaluated)

SLIDE 26

CPU: EXECUTION TIME BREAKDOWN

• Four categories:
  - Frontend, Backend, Bad Speculation, and Retiring
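These four categories match the level-1 buckets of Intel's Top-Down analysis. Assuming that is the method behind the breakdown (the slides do not say which counters the integrated tool reads), the split can be sketched as follows. The function name is hypothetical, the pipeline width of 4 is an assumption for this core generation, and the counter values in the example are made up, not measurements.

```python
PIPELINE_WIDTH = 4  # issue slots per cycle; assumed, not stated in the slides


def top_down_level1(cycles, uops_not_delivered, uops_issued,
                    uops_retired_slots, recovery_cycles):
    """Split all issue slots into the four level-1 Top-Down categories.

    Arguments correspond to the counters CPU_CLK_UNHALTED.THREAD,
    IDQ_UOPS_NOT_DELIVERED.CORE, UOPS_ISSUED.ANY,
    UOPS_RETIRED.RETIRE_SLOTS, and INT_MISC.RECOVERY_CYCLES.
    """
    slots = PIPELINE_WIDTH * cycles
    frontend = uops_not_delivered / slots
    bad_speculation = (uops_issued - uops_retired_slots
                       + PIPELINE_WIDTH * recovery_cycles) / slots
    retiring = uops_retired_slots / slots
    # Whatever is not explained by the other three is a backend stall.
    backend = 1.0 - (frontend + bad_speculation + retiring)
    return {
        "Frontend": frontend,
        "BadSpeculation": bad_speculation,
        "Retiring": retiring,
        "Backend": backend,
    }


# Made-up counter values for illustration only.
breakdown = top_down_level1(cycles=1000, uops_not_delivered=400,
                            uops_issued=1100, uops_retired_slots=1000,
                            recovery_cycles=25)
```

With these illustrative inputs most slots land in the Backend bucket, which is the shape a memory-bound workload would produce.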

SLIDE 27

CPU: EXECUTION TIME BREAKDOWN

• Backend is the bottleneck: a memory sub-system issue
  - CompProp is different (TC: triangle counting; Gibbs: Gibbs inference)

[Figure: Breakdown of execution cycles (0% to 100%) into Backend, Retiring, BadSpeculation, and Frontend, with workloads grouped as CompStruct, CompDyn, and CompProp]

SLIDE 28

CPU: CORE ANALYSIS

• Significantly high DTLB penalty
• ICache and branch prediction: not a major bottleneck

[Figure: DTLB miss cycle % (0% to 20%), ICache MPKI (0 to 0.8), and branch misprediction % (0% to 12%) per workload, grouped as CompStruct, CompDyn, and CompProp]

SLIDE 29

CPU: CACHE PERFORMANCE

• High cache MPKI because of irregular access patterns
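MPKI is the number of misses per thousand retired instructions. The sketch below computes it and uses a tiny direct-mapped cache model to show why an irregular address stream misses far more often than a sequential scan. Everything here is an illustrative assumption: the cache geometry, the figure of 4 instructions per access, and the helper names `mpki` and `count_misses` are hypothetical and unrelated to the measured machine.

```python
import random


def mpki(misses, instructions):
    """Misses per kilo-instruction."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return 1000.0 * misses / instructions


def count_misses(addresses, num_lines=64, line_bytes=64):
    """Count misses of a direct-mapped cache over a byte-address trace."""
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        index = line % num_lines
        if tags[index] != line:
            tags[index] = line
            misses += 1
    return misses


N = 8192               # number of 8-byte elements accessed
INSTR_PER_ACCESS = 4   # assumed instruction count per access

sequential = [i * 8 for i in range(N)]
random.seed(0)
irregular = [random.randrange(N) * 8 for _ in range(N)]

seq_misses = count_misses(sequential)
irr_misses = count_misses(irregular)
seq_mpki = mpki(seq_misses, N * INSTR_PER_ACCESS)
irr_mpki = mpki(irr_misses, N * INSTR_PER_ACCESS)
```

The sequential scan misses once per 64-byte line, while the random stream over the same array misses on most accesses, which is the pattern the slide attributes to graph workloads.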

SLIDE 30

GPU DIVERGENCE

• Branch divergence
  - Branch divergence rate = inactive threads per warp / warp size
• Memory divergence
  - Memory divergence rate = replayed instructions / issued instructions
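The two ratios can be worked through with a small Python sketch. It assumes a warp size of 32 and interprets "inactive threads per warp" as the average over executed warp instructions; the function names and the sample numbers are hypothetical, and no particular profiler metric is implied.

```python
WARP_SIZE = 32  # assumed warp size


def branch_divergence_rate(active_masks, warp_size=WARP_SIZE):
    """Average inactive threads per warp instruction, divided by warp size.

    active_masks holds one bitmask per executed warp instruction;
    bit i is set when thread i of the warp is active.
    """
    inactive = [warp_size - bin(mask).count("1") for mask in active_masks]
    return sum(inactive) / len(inactive) / warp_size


def memory_divergence_rate(replayed_instructions, issued_instructions):
    """Replayed instructions as a fraction of issued instructions."""
    return replayed_instructions / issued_instructions


# One fully active warp instruction and one with only 16 active threads.
branch_rate = branch_divergence_rate([0xFFFFFFFF, 0x0000FFFF])

# Illustrative counts: 30 replays out of 120 issued instructions.
memory_rate = memory_divergence_rate(30, 120)
```

Both rates lie between 0 and 1, with 0 meaning no divergence.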

SLIDE 31

GPU DIVERGENCE

• High branch & memory divergence
• Diverse behaviors across workloads

SLIDE 32

GPU SPEEDUP

• Significant speedup over 16-core CPU

SLIDE 33

GRAPHBIG: TAKE AWAY

• Graph computing has a wide scope, not just BFS
• Multiple factors influence graph computing significantly, not only workload algorithms
  - Framework, data representation, datasets
• Characterization
  - CPU: irregular access pattern -> poor cache performance
  - CPU: a properly designed code hierarchy can avoid ICache issues
  - GPU: memory and branch divergence issues
  - Diversity across workloads on both the CPU and GPU sides

SLIDE 34

CONCLUSION

• Graph computing has a wide scope. To understand it, we have to consider multiple key factors in a holistic way
• We proposed GraphBIG, a suite of CPU/GPU graph benchmarks based on real-world industrial practices, and characterized it comprehensively on real machines
• GraphBIG is open source (BSD license)
  - Check: https://github.com/graphbig
  - Workloads, datasets, and documents

SLIDE 35

THANK YOU!

HPArch Lab: http://comparch.gatech.edu/hparch/
System G: http://systemg.research.ibm.com/
GraphBIG: http://github.com/graphbig

SLIDE 36

BACKUP SLIDES

SLIDE 37

WORKLOAD SELECTION

[Figure: workload selection flow. Boxes: System G Use Cases, Computation Types, Workloads, Graph Data Types, Datasets, Representative Workloads, Representative Datasets, GraphBIG. Arrow labels: Summarize, Select, Reselection.]

SLIDE 38

GRAPHBIG FEATURES

• Design
  - Framework: property graph framework based on industrial practices
  - Representativeness: workloads selected from real-world use cases
  - Coverage: covers the major computation types, much more than just traversal
  - CPU + GPU workloads
• Code
  - C++ code base: requires only C++0x
  - Standalone package: no external package dependencies
  - Integrated profiling tool: profiling via hardware performance counters

SLIDE 39

GRAPHBIG HANDS-ON

• Fetch Code
  - Code: https://github.com/graphbig/graphBIG
  - Doc: https://github.com/graphbig/GraphBIG-Doc

SLIDE 40

GRAPHBIG HANDS-ON

• Compile
  - Requires: gcc/g++ (> 4.3), GNU make
  - Just “make all”

SLIDE 41

GRAPHBIG HANDS-ON

• Test Run
  - Just “make run”
  - Uses the default “small” dataset

SLIDE 42

GRAPHBIG HANDS-ON

• More Datasets
  - Download: https://github.com/graphbig/graphBIG/wiki/GraphBIG-Dataset
  - Untar and pass the correct path via the benchmark argument “--dataset”
  - Other third-party datasets (CSV format) can also be used

SLIDE 43

SCALE UP VS. SCALE OUT

• Scale up before scale out

SLIDE 44

COMPUTATION TYPE BEHAVIOR

• Diverse behaviors across different computation types

[Figure: per computation type (A: CompStruct, B: CompProp, C: CompDyn): cache MPKI (L1D, L2, L3), branch miss %, IPC, and DTLB miss cycle %]

SLIDE 45

CACHE BEHAVIORS

SLIDE 46

GPU ARCH BEHAVIOR

• Cannot fully utilize available memory bandwidth
• Significantly low IPC

SLIDE 47

GPU DATA SENSITIVITY

• Sensitive to input data
• Memory divergence shows higher sensitivity

[Figure: branch divergence and memory divergence (0 to 1) for BFS, CComp, DCentr, GColor, KCore, SPath, and TC across datasets. L: LDBC-1M, C: CA-RoadNet, T: Twitter, W: Watson-Gene, K: Knowledge-Repo]