GraphP: Reducing Communication for PIM-based Graph Processing

SLIDE 1

GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition

Mingxing Zhang, Youwei Zhuo (equal contribution), Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, Xuehai Qian

Tsinghua University, University of Southern California, Stanford University

SLIDE 2

ALCHEM

alchem.usc.edu GraphP: A PIM-based Graph Processing Framework

Outline

  • Motivation
    • Graph applications
    • Processing-In-Memory
    • The drawbacks of the current solution
  • GraphP
  • Evaluation
SLIDE 3

Graph Applications

  • Social network analytics
  • Recommendation system
  • Bioinformatics
SLIDE 5

Challenges

  • High bandwidth requirement
  • Small amount of computation per vertex
  • Data movement overhead

[Figure: data moving from memory (mem) through the L3, L2 and L1 caches to the compute unit (comp)]

SLIDE 9

PIM: Processing-In-Memory

  • Idea: Computation logic inside memory
  • Advantage: High memory bandwidth
  • Example: Hybrid Memory Cubes (HMC)

[Figure: an HMC with compute logic (comp) and memory (mem); 320GB/s intra-cube bandwidth, 4x120GB/s inter-cube links]

SLIDE 15

HMC: Hybrid Memory Cubes

Bandwidth (GB/s):
  • Intra-cube: 320
  • Inter-cube: 120
  • Inter-group: 120

Bottleneck: Inter-cube communication
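A back-of-the-envelope calculation with the peak numbers above shows why inter-cube traffic is the bottleneck. This Python sketch assumes idealized links (no latency, packet headers or contention) and a hypothetical 1 GB payload; the helper `transfer_seconds` is ours, not part of any HMC toolkit.

```python
# Peak bandwidths quoted on the slide, in GB/s.
INTRA_CUBE_GBPS = 320.0
INTER_CUBE_GBPS = 120.0  # one inter-cube link


def transfer_seconds(num_bytes: float, gbps: float) -> float:
    """Idealized time to move num_bytes at gbps GB/s.

    Assumption: no latency, packet headers, or link contention.
    """
    return num_bytes / (gbps * 1e9)


payload = 1e9  # hypothetical: 1 GB of vertex updates
intra = transfer_seconds(payload, INTRA_CUBE_GBPS)
inter = transfer_seconds(payload, INTER_CUBE_GBPS)
print(f"intra-cube {intra * 1e3:.2f} ms, inter-cube {inter * 1e3:.2f} ms")
print(f"inter-cube takes {inter / intra:.2f}x as long")
```

Under these assumptions the same data takes about 2.7x longer over one inter-cube link than inside a cube.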

SLIDE 16

Outline

  • Motivation
    • Graph applications
    • Processing-In-Memory
    • The drawbacks of the current solution
  • GraphP
  • Evaluation
SLIDE 19

Ahn, J., Hong, S., Yoo, S., Mutlu, O., & Choi, K. A scalable processing-in-memory accelerator for parallel graph processing. In 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

Current Solution: Tesseract

  • First PIM-based graph processing architecture
  • Programming model
    • Vertex program
  • Partition
    • Based on vertex program
SLIDE 22

PageRank in Vertex Program

for (v: vertices) {
  update = 0.85 * v.rank / v.out_degree;
  for (w: edges.destination) {
    put(w.id, function{ w.next_rank += update; });
  }
}
barrier();
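To make the communication pattern concrete, here is a sequential Python model of the vertex program above, assuming `put()` can be modeled as a closure queued until `barrier()`. The function name `push_pagerank_step`, the dict-based graph format, and the rule that vertices without out-edges push nothing are our assumptions; like the slide, it omits the (1 - 0.85) teleport term. A put counts as remote when the two endpoints of its edge sit in different cubes.

```python
from collections import defaultdict


def push_pagerank_step(edges, part, rank):
    """One iteration of the slide's push-style vertex program (sequential sketch).

    edges: list of (src, dst); part: vertex -> cube id; rank: vertex -> rank.
    put() is modeled as a closure queued until barrier(); a put counts as
    remote when source and destination sit in different cubes.
    Returns (next_rank, number of remote puts).
    """
    out_edges = defaultdict(list)
    for src, dst in edges:
        out_edges[src].append(dst)

    next_rank = {v: 0.0 for v in rank}
    pending = []  # queued put() bodies, run at the barrier
    remote_puts = 0

    for v in rank:
        if not out_edges[v]:
            continue  # assumption: vertices without out-edges push nothing
        update = 0.85 * rank[v] / len(out_edges[v])
        for w in out_edges[v]:
            if part[v] != part[w]:
                remote_puts += 1  # this put crosses cubes

            def body(w=w, update=update):  # put(w.id, function{ w.next_rank += update; })
                next_rank[w] += update

            pending.append(body)

    for body in pending:  # barrier(): every queued update takes effect
        body()
    return next_rank, remote_puts


next_rank, remote = push_pagerank_step(
    [(1, 2), (1, 3), (2, 3), (3, 1)], {1: 0, 2: 0, 3: 1}, {1: 1.0, 2: 1.0, 3: 1.0}
)
print(next_rank, remote)
```

In this 3-vertex example three of the four edges cross cubes, so three of the four puts are remote.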

SLIDE 26

Graph Partition

[Figure: vertices 1-5 partitioned between hmc0 and hmc1; legend: vertex, intra edge, inter edge, comm]

Communication = # of cross-cube edges

SLIDE 30

Drawback of Tesseract

  • Excessive data communication
  • Why?

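[Figure: in Tesseract, the programming model dictates the graph partition, which in turn determines data communication: Programming Model → Graph Partition → Data Communication]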

SLIDE 31

Outline

  • Motivation
  • GraphP
  • Evaluation
SLIDE 32

GraphP

  • Consider graph partition first.
  • Graph partition
    • Source-cut
  • Programming model
    • Two-phase vertex program
  • Reduces inter-cube communication
SLIDE 37

Source-Cut Partition

[Figure: vertices 1-5 partitioned between hmc0 and hmc1 with source-cut; vertex 2 appears as a replica; legend: vertex, intra edge, inter edge, replica]
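The difference between the two partitions can be counted directly. This Python sketch assumes our reading of source-cut: every vertex is placed with all of its incoming edges, and when an edge's source lives in another cube, that cube gets one replica of the source. The function names and the example graph are hypothetical.

```python
def cross_cube_edges(edges, part):
    """Tesseract-style cost: one put per edge whose endpoints are in different
    cubes (per iteration, when every vertex is active, as in PageRank)."""
    return sum(1 for src, dst in edges if part[src] != part[dst])


def source_cut_replicas(edges, part):
    """Source-cut sketch: edge (src, dst) is stored in cube part[dst]; if src lives
    elsewhere, that cube needs a replica of src. Returns {(vertex, cube)} pairs."""
    return {(src, part[dst]) for src, dst in edges if part[src] != part[dst]}


# Hypothetical 5-vertex graph split across hmc0 (cube 0) and hmc1 (cube 1).
edges = [(1, 2), (2, 4), (2, 5), (3, 4), (5, 1)]
part = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(cross_cube_edges(edges, part))              # 3
print(sorted(source_cut_replicas(edges, part)))   # [(2, 1), (5, 0)]
```

Because all cross-cube edges from the same source into the same cube share a single replica, the replica count can never exceed the cross-cube edge count; here the edges 2→4 and 2→5 need only one replica of vertex 2.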

SLIDE 38

Two-Phase Vertex Program

// apply updates from previous iterations
for (r: replicas) {
  r.rank = 0.85 * r.next_rank / r.out_degree;
}

SLIDE 44

Two-Phase Vertex Program

for (v: vertices) {
  update = 0;
  for (u: edges.sources) {   // u may be a local replica
    update += u.rank;
  }
  for (r: replicas) {        // one put per replica of v
    put(r.id, function { r.next_rank = update; });
  }
}
barrier();
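The following Python sketch runs one iteration of the two-phase program sequentially on a source-cut partition and counts remote puts. It assumes that phase 1 produces the `rank` value that phase 2 reads through `u.rank`, that every copy of a vertex starts the iteration with the same `next_rank`, and that a put to the master copy is local; the function name and data layout are hypothetical, not GraphP's implementation.

```python
from collections import defaultdict


def two_phase_pagerank_step(edges, part, next_rank_in):
    """One iteration of the two-phase vertex program on a source-cut partition.

    Sequential sketch. Each edge (src, dst) is stored in cube part[dst]; a copy
    of v exists in part[v] (master) and in every other cube holding an edge
    out of v (replica). next_rank_in must list every vertex and holds the
    values delivered by the previous iteration's puts.
    Returns (new next_rank of master copies, number of remote puts).
    """
    out_degree = defaultdict(int)
    in_sources = defaultdict(list)  # dst -> its sources, stored in part[dst]
    copies = {v: {part[v]} for v in next_rank_in}  # master copies
    for src, dst in edges:
        out_degree[src] += 1
        in_sources[dst].append(src)
        copies[src].add(part[dst])  # replica of src where the edge lives

    # Phase 1 (local): apply updates from the previous iteration on every copy.
    rank = {}
    for v, cubes in copies.items():
        deg = out_degree[v]
        for c in cubes:
            rank[(v, c)] = 0.85 * next_rank_in[v] / deg if deg else 0.0

    # Phase 2: gather from local copies of the sources, then put to every copy.
    next_rank = {}
    remote_puts = 0
    for v in next_rank_in:
        update = 0.0
        for u in in_sources[v]:
            update += rank[(u, part[v])]  # u's copy lives in v's own cube
        for c in copies[v]:
            next_rank[(v, c)] = update  # put(r.id, function { r.next_rank = update; })
            if c != part[v]:
                remote_puts += 1
    # barrier(): the sequential loop has already delivered every put.
    return {v: next_rank[(v, part[v])] for v in next_rank_in}, remote_puts


edges = [(1, 2), (2, 4), (2, 5), (3, 4), (5, 1)]
part = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
ranks, remote = two_phase_pagerank_step(edges, part, {v: 1.0 for v in part})
print(ranks, remote)
```

On this example one iteration yields the same values as a push-style step, but only 2 puts cross cubes (one per replica, of vertices 2 and 5) instead of one for each of the 3 cross-cube edges 2→4, 2→5 and 5→1.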

SLIDE 45

Benefits

  • Less data communication, never more than Tesseract: one message per replica instead of one per cross-cube edge
  • Enables architecture optimizations
SLIDE 47

Less Communication

[Figure: example with vertex 2 and its neighbors 5 and 4, Tesseract vs. GraphP (GraphP adds a replica of vertex 2)]

SLIDE 48

Broadcast Optimization

for (v: vertices) {
  ...
  for (r: replicas) {   // broadcast
    put(r.id, function { r.next_rank = update; });
  }
}
barrier();

[Figure: the same value of vertex 4 is sent to each of its replicas]

SLIDE 49

Naïve Broadcast

  • 15 point-to-point messages, one from the source cube to each of the other 15 cubes

[Figure: src sends a separate message to each dst]

SLIDE 50

Hierarchical communication

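  • Only 3 inter-group messages: one per remote group, forwarded to the destination cubes inside each group

[Figure: src reaching the dst cubes through per-group forwarding]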
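The message counts on these two slides can be reproduced with a small Python model. It assumes 16 cubes split into 4 groups of 4 (our assumption about the grouping), takes cube 0 as the source, and counts messages rather than bytes; the function names are hypothetical.

```python
def naive_broadcast(src, cubes, group_of):
    """One point-to-point message from src to every other cube.
    Returns (total messages, inter-group messages)."""
    dsts = [c for c in cubes if c != src]
    inter = sum(1 for c in dsts if group_of(c) != group_of(src))
    return len(dsts), inter


def hierarchical_broadcast(cubes, group_of):
    """One inter-group message per remote group, then forwarding inside each
    group (the source's own group included; src is assumed to be in cubes).
    Returns (total messages, inter-group messages)."""
    groups = {}
    for c in cubes:
        groups.setdefault(group_of(c), []).append(c)
    inter = len(groups) - 1  # the source reaches each other group once
    intra = sum(len(members) - 1 for members in groups.values())
    return inter + intra, inter


def group_of(cube):
    return cube // 4  # hypothetical: 4 groups of 4 cubes


cubes = list(range(16))  # 16 HMCs, as in the evaluation setup
print(naive_broadcast(0, cubes, group_of))      # (15, 12)
print(hierarchical_broadcast(cubes, group_of))  # (15, 3)
```

Under this assumption the total number of messages stays at 15, but the expensive inter-group hops drop from 12 to 3.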

SLIDE 51

Other Optimizations

  • Computation/communication overlap
  • Leveraging the low-power state of SerDes links

Please see the paper for more details

SLIDE 52

Outline

  • Motivation
  • GraphP
  • Evaluation
SLIDE 53

Evaluation Methodology

  • Simulation infrastructure
    • zSim with HMC support
    • ORION for NoC energy modeling
  • Configurations
    • Same as Tesseract
    • 16 HMCs
    • Interconnection: Dragonfly and Mesh2D
    • 512 CPUs
      • Single-issue in-order cores
      • Frequency: 1 GHz
SLIDE 55

Workloads

  • 4 graph algorithms
    • Breadth-First Search
    • Single-Source Shortest Path
    • Weakly Connected Components
    • PageRank
  • 5 real-world graphs
    • Wiki-Vote (WV)
    • ego-Twitter (TT)
    • Soc-Slashdot0902 (SD)
    • Amazon0302 (AZ)
    • ljournal-2008 (LJ)
SLIDE 58

Performance

[Figure: speedup of DDR3, SOTA (Tesseract), GraphP-SC and GraphP-SC+BRD; annotations: memory bandwidth, data partition (1.7x), <1.1x]

SLIDE 59

Communication Amount

Normalized to Tesseract:
  • Tesseract: 48.2% intra-group, 51.8% inter-group
  • GraphP-SC: 7.0% intra-group, 7.1% inter-group
  • GraphP-SC+BRD: 1.7% intra-group, 0.4% inter-group
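Summing the two components gives each configuration's total communication relative to Tesseract. This is only arithmetic on the slide's values, pairing them by position (the Tesseract pair summing to 100% supports that pairing):

```latex
\begin{aligned}
\text{Tesseract:}\quad     & 48.2\% + 51.8\% = 100.0\% \\
\text{GraphP-SC:}\quad     & 7.0\% + 7.1\% = 14.1\% \\
\text{GraphP-SC+BRD:}\quad & 1.7\% + 0.4\% = 2.1\%
\end{aligned}
```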

SLIDE 60

Energy consumption

Normalized to Tesseract:
  • Tesseract: 100.0%
  • GraphP-SC: 24.9%
  • GraphP-SC+BRD: 15.9%
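Expressed as savings, the normalized energy values correspond to the following reductions (simple subtraction from the slide's numbers):

```latex
\begin{aligned}
\text{GraphP-SC:}\quad     & 100.0\% - 24.9\% = 75.1\% \text{ less energy than Tesseract} \\
\text{GraphP-SC+BRD:}\quad & 100.0\% - 15.9\% = 84.1\% \text{ less energy than Tesseract}
\end{aligned}
```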

SLIDE 61

Other results

  • Bandwidth utilization
  • Scalability
  • Replication overhead

Please see the paper for more details

SLIDE 67

Conclusions

  • We propose GraphP
    • A new PIM-based graph processing framework
  • Key contributions
    • Data partition as a first-order design consideration
    • Source-cut partition
    • Two-phase vertex program
    • Enable additional architecture optimizations
  • GraphP drastically reduces inter-cube communication and improves energy efficiency.

SLIDE 69

Workload Size & Capacity

  • Capacity: 128 GB (16 × 8 GB)
    • ~16 billion edges
  • Real-world graphs
    • ~400 million edges (SNAP)
    • ~7 billion edges (WebGraph)

https://snap.stanford.edu/data/
http://law.di.unimi.it/datasets.php
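The ~16 billion edge estimate is consistent with the capacity if each edge occupies about 8 bytes (for example, two 32-bit vertex IDs) and GB is read as 10^9 bytes; both are our assumptions, not stated on the slide:

```latex
16 \times 8\,\text{GB} = 128\,\text{GB}, \qquad
\frac{128 \times 10^{9}\,\text{B}}{8\,\text{B/edge}} = 16 \times 10^{9}\ \text{edges}
```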

SLIDE 70

Two-phase vertex program

  • Same expressiveness as vertex programs