Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing (PowerPoint presentation)



SLIDE 1

Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing

Zuhair Khayyat [1], Karim Awara [1], Amani Alonazi [1], Hani Jamjoom [2], Dan Williams [2], Panos Kalnis [1]

[1] King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

[2] IBM Watson Research Center, Yorktown Heights, NY

SLIDE 2

Importance of Graphs

Graphs abstract application-specific algorithms into generic problems represented as interactions using vertices and edges:

  • Max flow in a road network
  • Diameter of the WWW
  • Ranking in social networks
  • Simulating protein interactions

Algorithms vary in their computational requirements.

SLIDE 3

How to Scale Graph Processing

Pregel was introduced by Google as a scalable abstraction for graph processing

  • Overcomes the limitations of processing graphs on MapReduce
  • Based on vertex-centric computation
  • Utilizes bulk synchronous parallel (BSP)

System      Programming Abstraction            Data Exchange        Computations
MapReduce   map(), combine(), reduce()         Key-based grouping   Disk based
Pregel      compute(), combine(), aggregate()  Message passing      In memory

SLIDE 4

Pregel’s BSP Graph Processing

[Diagram: supersteps 1-3, each with Workers 1-3, separated by BSP barriers]

Balanced computation and communication is fundamental to Pregel’s efficiency
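The BSP execution above can be sketched in a few lines of C++. This is a minimal single-process sketch, not Mizan's or Pregel's actual API; the names (Vertex, Inbox, run_bsp) are illustrative. The key property it shows is that messages sent in superstep s are delivered only after the barrier, in superstep s + 1.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

// Minimal single-process sketch of Pregel-style BSP execution.
// All names here are illustrative, not Mizan's actual API.
struct Vertex {
    int id;
    double value = 0.0;
    bool halted = false;
    std::vector<int> out_edges;
};

// Messages produced in superstep s become visible only in superstep s + 1:
// that delivery delay is exactly what the BSP barrier enforces.
using Inbox = std::map<int, std::vector<double>>;

using ComputeFn = std::function<void(Vertex&, const std::vector<double>&,
                                     int /*superstep*/, Inbox& /*outbox*/)>;

// Runs supersteps until every vertex has halted with no pending messages.
// Returns the superstep at which the computation stopped.
int run_bsp(std::vector<Vertex>& graph, const ComputeFn& compute,
            int max_supersteps) {
    Inbox inbox;
    for (int ss = 0; ss < max_supersteps; ++ss) {
        Inbox outbox;
        bool any_active = false;
        for (Vertex& v : graph) {
            const auto it = inbox.find(v.id);
            const bool has_messages = (it != inbox.end());
            if (v.halted && !has_messages) continue;  // vertex stays inactive
            v.halted = false;  // an incoming message reactivates the vertex
            any_active = true;
            static const std::vector<double> kEmpty;
            compute(v, has_messages ? it->second : kEmpty, ss, outbox);
        }
        inbox.swap(outbox);          // the barrier: deliver next superstep's messages
        if (!any_active) return ss;  // all vertices voted to halt
    }
    return max_supersteps;
}
```

In a real Pregel system each worker runs this loop over its own partition and exchanges outboxes at the barrier; if one worker's partition takes much longer, every other worker idles at the barrier, which is why balance matters.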

SLIDE 5

Optimizing Graph Processing

Existing work focuses on optimizing for graph structure (static optimization):

Optimize graph partitioning:

  • Simple graph partitioning schemes (e.g., hash or range): Giraph
  • User-defined partitioning function: Pregel
  • Sophisticated partitioning techniques (e.g., min-cuts): GraphLab, PowerGraph, GoldenOrb and Surfer

Optimize graph data access:

Distributed data stores and graph indexing: GoldenOrb, Hama and Trinity

SLIDE 6

Optimizing Graph Processing

Existing work focuses on optimizing for graph structure (static optimization):

Optimize graph partitioning:

  • Simple graph partitioning schemes (e.g., hash or range): Giraph
  • User-defined partitioning function: Pregel
  • Sophisticated partitioning techniques (e.g., min-cuts): GraphLab, PowerGraph, GoldenOrb and Surfer

Optimize graph data access:

Distributed data stores and graph indexing: GoldenOrb, Hama and Trinity

What about algorithm behavior?

Pregel provides coarse-grained load balancing: is it enough?

SLIDE 7

Types of Algorithms

Algorithms behave differently; we classify them into two categories depending on their behavior:

Algorithm Type    In/Out Messages   Vertices State   Examples
Stationary        Predictable       Fixed            PageRank, Diameter estimation, WCC
Non-stationary    Variable          Variable         Distributed minimum spanning tree, Belief propagation, Graph queries, Ad propagation

SLIDE 8

Types of Algorithms – First Superstep

[Diagram: active vertices V1-V8 in the first superstep for PageRank vs. DMST]

SLIDE 9

Types of Algorithms – Superstep k

[Diagram: active vertices V1-V8 at superstep k for PageRank vs. DMST]

SLIDE 10

Types of Algorithms – Superstep k + m

[Diagram: active vertices V1-V8 at superstep k + m for PageRank vs. DMST]

SLIDE 11

What Causes Computation Imbalance in Non-stationary Algorithms?


  • Vertex response time
  • Time to send out messages
  • Time to receive in messages

[Diagram: supersteps 1-3 with Workers 1-3 separated by BSP barriers, with workers finishing at different times]

SLIDE 12

Why to Optimize for Algorithm Behavior?

Difference between stationary and non-stationary algorithms

[Figure: incoming messages (millions, log scale) over 60 supersteps; series: PageRank total, PageRank max/worker, DMST total, DMST max/worker]

SLIDE 13

What is Mizan?

  • BSP-based graph processing framework
  • Uses runtime fine-grained vertex migrations to balance computation and communication
  • Follows the Pregel programming model
  • Open source, written in C++

SLIDE 14

PageRank’s compute() in Mizan

void compute(messageIterator *messages, userVertexObject *data,
             messageManager *comm) {
    double currVal = data->getVertexValue().getValue();
    double newVal = 0;
    double c = 0.85;  // damping factor
    if (data->getCurrentSS() > 1) {
        // Sum the PageRank contributions received from in-neighbors
        while (messages->hasNext()) {
            newVal = newVal + messages->getNext().getValue();
        }
        newVal = newVal * c + (1.0 - c) / ((double) vertexTotal);
        data->setVertexValue(mDouble(newVal));
    } else {
        newVal = currVal;  // first superstep: no messages yet
    }
    if (data->getCurrentSS() <= maxSuperStep) {
        // Distribute this vertex's rank evenly over its out-edges
        mDouble outVal(newVal / ((double) data->getOutEdgeCount()));
        for (int i = 0; i < data->getOutEdgeCount(); i++) {
            comm->sendMessage(data->getOutEdgeID(i), outVal);
        }
    } else {
        data->voteToHalt();
    }
}

SLIDE 15

Migration Planning Objectives

  • Decentralized
  • Simple
  • Transparent

  • No need to change Pregel's API
  • Does not assume any a priori knowledge of the graph structure or algorithm

SLIDE 16

Mizan’s Migration Barrier

Mizan performs both planning and migrations after all workers reach the BSP barrier

[Diagram: supersteps 1-3 with Workers 1-3; after each BSP barrier, the migration planner runs and a migration barrier follows]

SLIDE 17

Mizan’s Migration Planning Steps

1 Identify the source of imbalance: by comparing each worker's execution time against a normal distribution and flagging outliers

[Diagram: two workers running Mizan; per vertex (V1-V6), Mizan tracks remote incoming messages, remote outgoing messages, local incoming messages, and vertex response time]

SLIDE 18

Mizan’s Migration Planning Steps

1 Identify the source of imbalance: by comparing each worker's execution time against a normal distribution and flagging outliers. Mizan monitors for each vertex:

  • Remote outgoing messages
  • All incoming messages
  • Response time

High-level summaries are broadcast to each worker.

[Diagram: two workers running Mizan; per vertex (V1-V6), Mizan tracks remote incoming messages, remote outgoing messages, local incoming messages, and vertex response time]
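The outlier-flagging step can be sketched as a z-score test over worker runtimes. The slides only say that execution times are compared against a normal distribution; the plain z-score and the z_max threshold below are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of step 1: flag over-utilized workers whose runtime is an outlier
// with respect to a normal distribution fitted to all worker runtimes.
// The z-score test and default threshold are illustrative assumptions.
std::vector<int> flag_overutilized(const std::vector<double>& runtimes,
                                   double z_max = 1.0) {
    const std::size_t n = runtimes.size();
    double mean = 0.0;
    for (double t : runtimes) mean += t;
    mean /= static_cast<double>(n);
    double var = 0.0;
    for (double t : runtimes) var += (t - mean) * (t - mean);
    const double stddev = std::sqrt(var / static_cast<double>(n));
    std::vector<int> flagged;
    for (std::size_t i = 0; i < n; ++i) {
        // Only the slow (over-utilized) tail matters for migration planning.
        if (stddev > 0.0 && (runtimes[i] - mean) / stddev > z_max) {
            flagged.push_back(static_cast<int>(i));
        }
    }
    return flagged;
}
```

For instance, runtimes of {10, 10, 10, 10, 30} minutes would flag only the last worker, while uniform runtimes flag no one and trigger no migration.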

SLIDE 19

Mizan’s Migration Planning Steps

1 Identify the source of imbalance
2 Select the migration objective:

Mizan finds the strongest cause of workload imbalance by comparing statistics for outgoing and incoming messages across all workers with each worker's execution time

SLIDE 20

Mizan’s Migration Planning Steps

1 Identify the source of imbalance
2 Select the migration objective:

Mizan finds the strongest cause of workload imbalance by comparing statistics for outgoing and incoming messages across all workers with each worker's execution time. The migration objective is either:

  • Optimize for outgoing messages, or
  • Optimize for incoming messages, or
  • Optimize for response time
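One plausible way to pick the objective is to correlate each candidate statistic with worker execution time and choose the strongest. Pearson correlation and the min_corr threshold below are assumptions for illustration; the slides only say the strongest cause is selected.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation between two equal-length series.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    double mx = 0, my = 0;
    for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n; my /= n;
    double num = 0, dx = 0, dy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        num += (x[i] - mx) * (y[i] - my);
        dx += (x[i] - mx) * (x[i] - mx);
        dy += (y[i] - my) * (y[i] - my);
    }
    return (dx == 0 || dy == 0) ? 0.0 : num / std::sqrt(dx * dy);
}

enum class Objective { OutgoingMessages, IncomingMessages, ResponseTime };

// Sketch of objective selection: pick the message statistic (one value per
// worker) most correlated with execution time; if neither explains the
// imbalance, fall back to balancing response time.
Objective select_objective(const std::vector<double>& out_msgs,
                           const std::vector<double>& in_msgs,
                           const std::vector<double>& exec_time,
                           double min_corr = 0.5) {
    const double c_out = pearson(out_msgs, exec_time);
    const double c_in = pearson(in_msgs, exec_time);
    if (c_out >= c_in && c_out > min_corr) return Objective::OutgoingMessages;
    if (c_in > c_out && c_in > min_corr) return Objective::IncomingMessages;
    return Objective::ResponseTime;  // no message statistic explains the imbalance
}
```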

SLIDE 21

Mizan’s Migration Planning Steps

1 Identify the source of imbalance
2 Select the migration objective
3 Pair over-utilized workers with under-utilized ones:

  • All workers create and execute the migration plan in parallel without centralized coordination
  • Each worker is paired with one other worker at most

[Diagram: workers W0-W9 sorted by utilization and paired, the most over-utilized with the most under-utilized]
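The pairing step can be sketched by sorting workers by load for the chosen objective and matching extremes: the most over-utilized with the most under-utilized, the second most with the second least, and so on. The sorted-order matching heuristic below is an illustrative assumption, but it has the property the slide requires: each worker appears in at most one pair, so all pairs can migrate in parallel without coordination.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of step 3: pair each over-utilized worker with one under-utilized
// worker by matching opposite ends of the load-sorted order. With an odd
// number of workers, the middle worker stays unpaired.
std::vector<std::pair<int, int>> pair_workers(const std::vector<double>& load) {
    std::vector<int> order(load.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return load[a] > load[b]; });  // heaviest first
    std::vector<std::pair<int, int>> pairs;  // {over-utilized, under-utilized}
    for (std::size_t i = 0, j = order.size(); i + 1 < j; ++i, --j) {
        pairs.emplace_back(order[i], order[j - 1]);
    }
    return pairs;
}
```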

SLIDE 22

Mizan’s Migration Planning Steps

1 Identify the source of imbalance
2 Select the migration objective
3 Pair over-utilized workers with under-utilized ones
4 Select vertices to migrate: depending on the migration objective

SLIDE 23

Mizan’s Migration Planning Steps

1 Identify the source of imbalance
2 Select the migration objective
3 Pair over-utilized workers with under-utilized ones
4 Select vertices to migrate
5 Migrate vertices:

  • How to migrate vertices freely across workers while maintaining vertex ownership and fast updates?
  • How to minimize migration costs for large vertices?

SLIDE 24

Vertex Ownership

Mizan uses a distributed hash table (DHT) to implement a distributed lookup service:

  • V can execute at any worker
  • V's home worker ID = (hash(ID) mod N)
  • Workers ask the home worker of V for its current location
  • The home worker is notified of the new location as V migrates

[Diagram: three workers, each holding a partition of the DHT mapping vertices V1-V9 to their current workers; the compute layer asks the DHT where V5 and V8 are]
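The lookup service above can be sketched in memory as follows. A real deployment answers these lookups over the network and shards the table across workers; the class and method names are illustrative, and the fallback of "never-migrated vertex lives on its home worker" assumes hash-based initial partitioning.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>

// Sketch of the DHT-based lookup service: each vertex has a fixed "home"
// worker, hash(ID) mod N, that always knows the vertex's current location.
class VertexLocator {
public:
    explicit VertexLocator(int num_workers) : n_(num_workers) {}

    // The home worker never changes, even when the vertex migrates.
    int home_worker(int vertex_id) const {
        return static_cast<int>(std::hash<int>{}(vertex_id) % n_);
    }

    // On migration, only the home worker's entry must be updated,
    // so the update cost is one message regardless of cluster size.
    void record_migration(int vertex_id, int new_worker) {
        location_[vertex_id] = new_worker;
    }

    // Lookup: ask the home worker for the current location. A vertex that
    // never migrated still lives on its home worker (hash partitioning).
    int locate(int vertex_id) const {
        auto it = location_.find(vertex_id);
        return it != location_.end() ? it->second : home_worker(vertex_id);
    }

private:
    int n_;                                   // number of workers, N
    std::unordered_map<int, int> location_;   // home worker's view: vertex -> worker
};
```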

SLIDE 25

Migrating Vertices with Large Message Size

Introduce delayed migration for very large vertices:

At SS t: only ownership of vertex v is moved to worker_new

[Diagram: vertices 1-7 over supersteps t to t+2, with vertex 7's ownership moving from Worker_old to Worker_new while its state follows later]

SLIDE 26

Migrating Vertices with Large Message Size

Introduce delayed migration for very large vertices:

At SS t: only ownership of vertex v is moved to worker_new
At SS t + 1:
  • worker_new receives vertex 7's messages
  • worker_old processes vertex 7

[Diagram: vertices 1-7 over supersteps t and t+1; vertex 7's messages arrive at Worker_new while Worker_old still holds its state]

SLIDE 27

Migrating Vertices with Large Message Size

Introduce delayed migration for very large vertices:

At SS t: only ownership of vertex v is moved to worker_new
At SS t + 1:
  • worker_new receives vertex 7's messages
  • worker_old processes vertex 7
After SS t + 1: worker_old moves the state of vertex 7 to worker_new

[Diagram: vertices 1-7 over supersteps t to t+2; after superstep t+1, vertex 7's state moves from Worker_old to Worker_new]
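The timeline above can be sketched as a tiny state machine that tracks where the vertex's messages are delivered versus where its state (and hence its compute()) lives. LargeVertexMigration and its fields are illustrative, not Mizan's data structures; the point is that during superstep t + 1 the two locations differ, which lets the heavy state transfer overlap with normal computation.

```cpp
#include <cassert>

// Minimal sketch of delayed migration for one large vertex.
// Ownership (message delivery) moves at superstep t; the heavy vertex
// state follows only after superstep t + 1.
struct LargeVertexMigration {
    int owner;     // worker that receives the vertex's messages
    int state_at;  // worker that holds the vertex state and runs compute()
    int t;         // superstep at which the migration was planned

    LargeVertexMigration(int worker_old, int superstep)
        : owner(worker_old), state_at(worker_old), t(superstep) {}

    // Advance to the given superstep; worker_new is the migration target.
    void step(int superstep, int worker_new) {
        if (superstep == t) owner = worker_new;         // ownership moves first
        if (superstep == t + 2) state_at = worker_new;  // state arrives after SS t+1
    }

    int computes_on() const { return state_at; }
    int messages_go_to() const { return owner; }
};
```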

SLIDE 28

Experimental Setup

Mizan is implemented in C++ with MPI, with three variations:

  • Static Mizan: emulates Giraph, disables dynamic migration
  • Work stealing (WS): emulates Pregel's coarse-grained dynamic load balancing
  • Mizan: activates dynamic migration

Local cluster of 21 machines:

Mix of i5 and i7 processors, each with 16 GB RAM

IBM Blue Gene/P supercomputer with 1024 compute nodes:

Each node has a 4-core PowerPC 450 CPU at 850 MHz with 4 GB RAM

SLIDE 29

Datasets

Experiments on public datasets:

  • Stanford Network Analysis Project (SNAP)
  • The Laboratory for Web Algorithmics (LAW)
  • Kronecker generator

name                  |nodes|       |edges|
kg1 (synthetic)       1,048,576     5,360,368
kg4m68m (synthetic)   4,194,304     68,671,566
web-Google            875,713       5,105,039
LiveJournal1          4,847,571     68,993,773
hollywood-2011        2,180,759     228,985,632
arabic-2005           22,744,080    639,999,458

SLIDE 30

Giraph vs. Static Mizan

Figure: PageRank runtime (min) of Giraph vs. Static Mizan on social network and random graphs (web-Google, kg1, LiveJournal1, kg4m68m)

Figure: PageRank runtime (min) of Giraph vs. Static Mizan on regular random graphs of 1M to 16M vertices, each with around 17M edges

SLIDE 31

Effectiveness of Dynamic Vertex Migration

PageRank on a social graph (LiveJournal1):
  • Shaded columns: algorithm runtime
  • Unshaded columns: initial partitioning cost

[Figure: runtime (min) of Static, WS, and Mizan under METIS, range, and hash partitioning]

SLIDE 32

Effectiveness of Dynamic Vertex Migration

Comparing performance on algorithms with highly variable messaging patterns (DMST and ad propagation simulation) on a METIS-partitioned social graph (LiveJournal1)

[Figure: runtime (min) of Static, Work Stealing, and Mizan on the advertisement propagation and DMST workloads]

SLIDE 33

Mizan’s Overhead with Scaling

Figure: Mizan's speedup on the Linux cluster (hollywood-2011), 2 to 16 compute nodes

Figure: Mizan's speedup on the IBM Blue Gene/P supercomputer (arabic-2005), 64 to 1024 compute nodes

SLIDE 34

Future Work

  • Work with skewed graphs
  • Improve Pregel's fault tolerance

SLIDE 35

Conclusion

  • Mizan is a Pregel system that uses fine-grained vertex migration to load balance computation and communication across supersteps
  • Mizan is an open-source project developed within the InfoCloud group at KAUST in collaboration with IBM, written in C++ with MPI
  • Mizan scales up to thousands of workers
  • Mizan improves the overall computation cost by between 40% and two orders of magnitude, with less than 10% migration overhead
  • Download it at: http://cloud.kaust.edu.sa
  • Try it on EC2 (ami-52ed743b)

SLIDE 36

Extra: Migration Details and Costs

[Figure: per-superstep runtime (min) over 30 supersteps: superstep runtime, average worker runtime, and migration cost, annotated with the objective balanced in each phase (outgoing messages, incoming messages, response time)]

SLIDE 37

Extra: Migrated Vertices per Superstep

[Figure: migrated vertices (x1000) and migration cost (min) per superstep; series: migration cost, total migrated vertices, max vertices migrated by a single worker]

SLIDE 38

Extra: Mizan’s Migration Planning

[Flowchart: after compute() and the BSP barrier, check whether the superstep is imbalanced; if so, identify the source of imbalance, then each over-utilized worker pairs with an under-utilized one, selects vertices, and migrates them before the migration barrier]

SLIDE 39

Extra: Architecture of Mizan

[Diagram: a Mizan worker contains a BSP processor running vertex compute(), a communicator with DHT, a migration planner, and a storage manager performing IO to HDFS/local disks]
