Aug 19, 2022

SLIDE 1

Optimizing Shuffle in Wide-Area Data Analytics

Shuhao Liu*, Hao Wang, Baochun Li Department of Electrical & Computer Engineering University of Toronto

SLIDE 2

What is:

Wide-Area Data Analytics?
Shuffle?

SLIDE 3

Wide-Area Data Analytics

[Figure: map of datacenters in N. California, Oregon, N. Virginia, Ireland and Singapore, with the data (data 1, data 2, data 3) spread across them]

Large volumes of data are generated, stored and processed across geographically distributed DCs.

SLIDE 4

Existing work focuses on Task Placement

SLIDE 5

Rethink the root cause of inter-DC traffic: Shuffle

SLIDE 6

Fetch-based Shuffle

[Figure: Mappers 1 to 3 and Reducers 1 to 3, with all-to-all communication at the beginning of the reduce tasks]
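To make the all-to-all pattern concrete, here is a minimal Scala sketch (not the authors' code) that counts how much map output crosses datacenter boundaries when every reducer fetches its share from every mapper. The datacenter names, the output sizes, and the assumption that each mapper's output is split evenly across reducers are all hypothetical.

```scala
// Minimal model of fetch-based shuffle traffic (hypothetical, not Spark's implementation).
// Assumption: each mapper's output is split evenly across all reducers.
object FetchShuffle {
  final case class Task(id: String, dc: String)

  /** Bytes that cross datacenter boundaries when every reducer fetches
    * its share from every mapper (all-to-all). */
  def interDcBytes(mappers: Seq[Task], reducers: Seq[Task], mapOutput: Map[String, Double]): Double =
    (for {
      m <- mappers
      r <- reducers
      if m.dc != r.dc // only non-colocated mapper/reducer pairs use the WAN
    } yield mapOutput(m.id) / reducers.size).sum

  def main(args: Array[String]): Unit = {
    // Hypothetical placement over two datacenters, A and B.
    val mappers  = Seq(Task("m1", "A"), Task("m2", "A"), Task("m3", "B"))
    val reducers = Seq(Task("r1", "A"), Task("r2", "B"), Task("r3", "A"))
    val output   = Map("m1" -> 30.0, "m2" -> 30.0, "m3" -> 30.0)
    println(interDcBytes(mappers, reducers, output)) // 40.0
  }
}
```

Every mapper/reducer pair in different datacenters contributes WAN traffic, and all of it is requested at the same moment: when the reduce tasks start.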

SLIDE 7

Problems with Fetch

Under-utilizes the inter-datacenter bandwidth
  Starts late: only at the beginning of the reduce stage
  Starts concurrently: all fetches share the bandwidth
Needs refetch
  When a reduce task fails, its input must be fetched again
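The "starts concurrently: shares bandwidth" point can be illustrated with an idealized fair-sharing model: all fetch flows start together on one inter-DC link of capacity `bandwidth` and split it equally, so even a small fetch finishes late. This is a simplified fluid model assumed for illustration only (no TCP dynamics); the object and parameter names are hypothetical.

```scala
// Idealized model: n fetch flows start at time 0 on one WAN link and share it equally
// (processor sharing). Not a model of real TCP behavior.
object ConcurrentFetch {
  /** Completion times of the flows, in ascending order of flow size. */
  def completionTimes(sizes: Seq[Double], bandwidth: Double): Seq[Double] = {
    val sorted = sizes.sorted.toVector
    val n = sorted.size
    var t = 0.0    // current time
    var sent = 0.0 // bytes sent so far by each still-active flow
    sorted.zipWithIndex.map { case (size, i) =>
      val active = n - i                      // flows still transferring in this phase
      t += (size - sent) * active / bandwidth // time until the smallest active flow finishes
      sent = size
      t
    }
  }

  def main(args: Array[String]): Unit = {
    // Three 10-unit fetches on a 10-unit/s link: each would take 1 s alone.
    println(completionTimes(Seq(10.0, 10.0, 10.0), 10.0)) // Vector(3.0, 3.0, 3.0)
  }
}
```

Each fetch takes three times longer than it would alone, and the link was idle during the whole map stage before that.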

SLIDE 8

Push-based Shuffle

Bandwidth Utilization

[Figure: timelines (time 0 to 16) of worker A and worker B across Stage N and Stage N+1, showing Map, Shuffle Write, Data Transfer, Shuffle Read and Reduce; panels (a) and (b) compare when the data transfer happens relative to the map tasks]
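Under a simple fluid model (our assumption, not from the talk), the difference between the two timelines comes down to whether the transfer waits for the map stage to finish. The function names and parameters below are hypothetical.

```scala
// Fluid model of one mapper's output reaching a remote reducer (hypothetical).
// Assumption: output is produced at a uniform rate over mapTime; the link sends at `bandwidth`.
object ShuffleTimeline {
  /** Fetch: the transfer starts only after the map stage has finished. */
  def fetchDone(mapTime: Double, outputSize: Double, bandwidth: Double): Double =
    mapTime + outputSize / bandwidth

  /** Push: output is sent while it is produced, so the transfer ends when the
    * slower of producing and sending finishes. */
  def pushDone(mapTime: Double, outputSize: Double, bandwidth: Double): Double =
    math.max(mapTime, outputSize / bandwidth)

  def main(args: Array[String]): Unit = {
    println(fetchDone(8.0, 32.0, 4.0)) // 16.0
    println(pushDone(8.0, 32.0, 4.0))  // 8.0
  }
}
```

With push, the inter-DC link is busy while the map tasks run instead of sitting idle until the reduce stage begins.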

SLIDE 9

Push-based Shuffle

Failure Recovery

[Figure: timelines (time 0 to 24) of worker A and worker B across Stage N and Stage N+1, in which a reduce task fails. In (a), the retried reduce must first Refetch its shuffle input (Data Transfer, Shuffle Read); in (b), the reduce is retried without a refetch]
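A back-of-the-envelope sketch of the recovery cost, assuming (our simplification) that a reduce task fails `failAt` seconds after it starts, is retried once from scratch, and that under push the pushed input remains available on the receiving side. All names are hypothetical.

```scala
// Simplified recovery model (hypothetical): a reduce task fails `failAt` seconds after
// it starts and is retried once from scratch.
object FailureRecovery {
  /** Fetch-based: the retry must fetch its shuffle input over the WAN again. */
  def fetchBasedFinish(refetch: Double, reduce: Double, failAt: Double): Double =
    failAt + refetch + reduce

  /** Push-based: the input was already pushed to the reducer's side, so only
    * the reduce computation is repeated. */
  def pushBasedFinish(reduce: Double, failAt: Double): Double =
    failAt + reduce

  def main(args: Array[String]): Unit = {
    println(fetchBasedFinish(refetch = 4.0, reduce = 4.0, failAt = 6.0)) // 14.0
    println(pushBasedFinish(reduce = 4.0, failAt = 6.0))                 // 10.0
  }
}
```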

SLIDE 10

Where to Push?

Option: existing task placement algorithms
  Need to know reducer placement beforehand
  Require prior knowledge, e.g., predictable jobs, available inter-DC bandwidth
Our solution: Push/Aggregate

SLIDE 11

Aggregating Shuffle Input

Send shuffle input to a subset of datacenters that hold a large portion of the shuffle input
  Reduces inter-datacenter traffic in future shuffles
  Likely to reduce inter-datacenter traffic in the current shuffle

[Figure: Datacenter A (the receiver) and Datacenter B, with Reducers 1 to 5 and the inter-DC transfers of shuffle input between them]
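The slide does not say how the subset of datacenters is picked. As one hypothetical rule, a sketch could rank datacenters by the shuffle input they hold and keep adding them until a chosen share `coverage` of the total is covered; both the rule and the parameter are assumptions for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical selection rule for aggregator datacenters: rank datacenters by the amount
// of shuffle input they hold and keep adding them until `coverage` of the total is covered.
object AggregatorChoice {
  def chooseAggregators(inputByDc: Map[String, Double], coverage: Double): Seq[String] = {
    require(coverage > 0.0 && coverage <= 1.0, "coverage must be in (0, 1]")
    val total  = inputByDc.values.sum
    val ranked = inputByDc.toSeq.sortBy { case (_, bytes) => -bytes }
    val chosen = ArrayBuffer.empty[String]
    var covered = 0.0
    for ((dc, bytes) <- ranked) {
      if (covered < coverage * total) { // stop adding once the target share is reached
        chosen += dc
        covered += bytes
      }
    }
    chosen.toList
  }

  def main(args: Array[String]): Unit = {
    val input = Map("A" -> 60.0, "B" -> 25.0, "C" -> 15.0)
    println(chooseAggregators(input, 0.5)) // List(A)
    println(chooseAggregators(input, 0.8)) // List(A, B)
  }
}
```

A greedy rule like this needs no prior knowledge of future reducer placement or of available inter-DC bandwidth, only the current location of the shuffle input.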

SLIDE 12

For any partition of shuffle input, the expected inter-datacenter traffic in the next shuffle is proportional to the number of non-colocated reducers.
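One way to see this, assuming (our assumption, not stated on the slide) that each record of partition $p$ is hashed to any of the $R$ reducers with equal probability:

```latex
% s_p: size of partition p;  d_p: datacenter storing p
% R: number of reducers in the next shuffle;  \mathrm{dc}(j): datacenter of reducer j
\mathbb{E}[T_p]
  = \sum_{j=1}^{R} \frac{s_p}{R}\,\mathbf{1}\!\left[\mathrm{dc}(j) \neq d_p\right]
  = \frac{s_p}{R}\,k_p,
\qquad
k_p = \bigl|\{\, j : \mathrm{dc}(j) \neq d_p \,\}\bigr|
```

For a fixed partition size $s_p$ and reducer count $R$, the expected traffic grows linearly in $k_p$, the number of reducers outside the partition's datacenter.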

SLIDE 13

Aggregating Shuffle Input

Send shuffle input to a subset of datacenters that hold a large portion of the shuffle input
  Reducers are likely to be placed close to the shuffle input
  More aggregated data -> less inter-datacenter traffic, given reasonable task placement
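A small sketch of the expected-traffic formula, under the same uniform-hashing assumption, comparing a shuffle input split over two datacenters with the same input already aggregated into one. The reducer placement ("most reducers near most of the data") and all names are hypothetical; the cost of the aggregating transfer itself is paid once, during the current shuffle, and is not counted here.

```scala
// Expected inter-DC traffic of the next shuffle (hypothetical model).
// Assumption: records of every partition are hashed uniformly across the R reducers,
// so a partition of s bytes in datacenter d sends s * k / R bytes over the WAN,
// where k is the number of reducers outside d.
object AggregationEffect {
  def expectedTraffic(inputByDc: Map[String, Double], reducerDcs: Seq[String]): Double = {
    val r = reducerDcs.size
    inputByDc.toSeq.map { case (dc, bytes) =>
      val nonColocated = reducerDcs.count(_ != dc)
      bytes * nonColocated / r
    }.sum
  }

  def main(args: Array[String]): Unit = {
    val reducers = Seq("A", "A", "A", "B") // most reducers placed near most of the data
    println(expectedTraffic(Map("A" -> 60.0, "B" -> 40.0), reducers)) // 45.0
    println(expectedTraffic(Map("A" -> 100.0), reducers))             // 25.0
  }
}
```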

SLIDE 14

Implementation in Spark

Requirements:
Push before writing to disk
Destined to the aggregator datacenters
transferTo() as an RDD transformation
Allow implicit or explicit usage

SLIDE 15

Implementation in Spark

[Figure: (a) InputRDD.map(…).reduce(…): map runs on partitions A1, A2 and B1 in place, and reduce reads A1, A2 and B1. (b) InputRDD.map(…).transferTo([A]).reduce(…): each map output passes through transferTo (A1, A2, A*), and reduce reads A1, A2 and Ax, all in datacenter A]
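The effect of transferTo([A]) in panel (b) can be mimicked on plain Scala collections. This is a hypothetical stand-in, not the authors' Spark transformation: partitions already in an aggregator datacenter stay put, and the rest are pushed to the aggregators (round-robin is our assumption for the multi-target case).

```scala
// Plain-collection stand-in for the transferTo() transformation (hypothetical; not the
// authors' Spark code): partitions already in an aggregator datacenter stay where they are,
// the others are pushed to the aggregators in round-robin order.
object TransferToSketch {
  final case class Partition(id: String, dc: String, bytes: Double)

  /** Returns the relocated partitions and the number of bytes pushed across datacenters. */
  def transferTo(parts: Seq[Partition], targets: Seq[String]): (Seq[Partition], Double) = {
    require(targets.nonEmpty, "need at least one aggregator datacenter")
    val (local, remote) = parts.partition(p => targets.contains(p.dc))
    val pushed = remote.zipWithIndex.map { case (p, i) =>
      p.copy(dc = targets(i % targets.size))
    }
    (local ++ pushed, remote.map(_.bytes).sum)
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(Partition("A1", "A", 10.0), Partition("A2", "A", 10.0), Partition("B1", "B", 10.0))
    val (placed, pushedBytes) = transferTo(input, Seq("A"))
    println(placed.map(p => s"${p.id}@${p.dc}").mkString(", ")) // A1@A, A2@A, B1@A
    println(pushedBytes) // 10.0
  }
}
```

After the transfer, the following reduce can read all of its input inside datacenter A.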

SLIDE 16

Implementation in Spark

[Figure: screenshots of the Spark 1.6.1 web UI, "ScalaWordCount - Details for Job 0", showing the job's stage DAG built from sequenceFile and map transformations. (a) Original job; (b) the same job with transferTo as an embedded transformation]

SLIDE 17

Implementation in Spark

transferTo() implicit insertion

Original code:
    val InRDD = In1 ++ In2
    InRDD
      .filter(…)
      .groupByKey(…)
      .collect()

Produced code:
    val InRDD = In1 ++ In2
    InRDD
      .filter(…)
      .transferTo(…)
      .groupByKey(…)
      .collect()

[Figure: DAGs across DC1 and DC2. In1 and In2 are each filtered in their own datacenter and become the shuffle input of groupByKey, whose result is collected; in the produced DAG, transferTo is inserted after each filter, before the shuffle input, and is processed by the DAGScheduler]
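The rewrite from the original code to the produced code can be sketched as a rule over a linear chain of operations: insert transferTo right before every shuffle, unless the user already placed one there. This is a hypothetical model; the slide shows the real insertion being processed by the DAGScheduler on a DAG, and the operation types below are illustrative.

```scala
// Hypothetical model of implicit transferTo() insertion over a linear chain of operations.
object ImplicitInsertion {
  sealed trait Op
  final case class Narrow(name: String) extends Op  // e.g. filter, map
  final case class Shuffle(name: String) extends Op // e.g. groupByKey, reduceByKey
  final case class Action(name: String) extends Op  // e.g. collect
  case object TransferTo extends Op

  /** Inserts TransferTo right before every shuffle, unless one is already there. */
  def insertTransferTo(plan: List[Op]): List[Op] = plan match {
    case TransferTo :: (s: Shuffle) :: rest => TransferTo :: s :: insertTransferTo(rest)
    case (s: Shuffle) :: rest               => TransferTo :: s :: insertTransferTo(rest)
    case op :: rest                         => op :: insertTransferTo(rest)
    case Nil                                => Nil
  }

  def main(args: Array[String]): Unit = {
    val original = List(Narrow("filter"), Shuffle("groupByKey"), Action("collect"))
    println(insertTransferTo(original))
    // List(Narrow(filter), TransferTo, Shuffle(groupByKey), Action(collect))
  }
}
```

Skipping an existing TransferTo lets explicit and implicit usage coexist without inserting the transformation twice.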

SLIDE 18

Evaluation

Amazon EC2, m3.large instances
26 nodes in 6 different locations

[Figure: map of the 6 locations (N. Virginia, N. California, São Paulo, Frankfurt, Singapore, Sydney), labeled with node counts 4, 6, 4, 4, 4 and 4]

SLIDE 19

Performance

The lower, the better

SLIDE 20

Take-Away Messages

A push-based shuffle mechanism is beneficial in wide-area data analytics
Aggregating shuffle input to a subset of datacenters is likely to help when you have no a priori knowledge
Implementation in Apache Spark as a data transformation
Performance: reduced shuffle time and its variance

SLIDE 21

Thanks! Q&A