Optimizing Shuffle in Wide-Area Data Analytics. Shuhao Liu*, Hao Wang, Baochun Li. PowerPoint presentation.



SLIDE 1

Optimizing Shuffle in Wide-Area Data Analytics

Shuhao Liu*, Hao Wang, Baochun Li Department of Electrical & Computer Engineering University of Toronto

SLIDE 2

What is:

  • Wide-Area Data Analytics?
  • Shuffle?


SLIDE 3

Wide-Area Data Analytics

[Figure: world map of geo-distributed datacenters: N. Virginia, N. California, Oregon, Ireland, Singapore, each holding part of the data (data 1, data 2, data 3)]

Large volumes of data are generated, stored, and processed across geographically distributed DCs.

SLIDE 4

Existing work focuses on Task Placement


SLIDE 5

Rethink the root cause of inter-DC traffic: Shuffle


SLIDE 6

Fetch-based Shuffle

[Figure: Mappers 1-3 and Reducers 1-3, with all-to-all communication at the beginning of the reduce tasks]
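To make the all-to-all pattern concrete, here is a minimal sketch (not the paper's code) of how map output is hash-partitioned across reducers, so every reducer ends up owning one bucket at every mapper:

```scala
// Hash partitioning, as in a MapReduce-style shuffle: the target
// reducer for a record depends only on its key.
def partitionFor(key: String, numReducers: Int): Int =
  math.floorMod(key.hashCode, numReducers)

// Bucket one mapper's output by destination reducer.
def mapOutput(records: Seq[(String, Int)],
              numReducers: Int): Map[Int, Seq[(String, Int)]] =
  records.groupBy { case (k, _) => partitionFor(k, numReducers) }

// With enough distinct keys, a mapper produces buckets for several of
// the 3 reducers, so each reducer must fetch from every mapper.
val buckets = mapOutput(Seq("a" -> 1, "b" -> 1, "c" -> 1, "d" -> 1), 3)
```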

SLIDE 7

Problems with Fetch

  • Under-utilizes the inter-datacenter bandwidth
    • Starts late: at the beginning of the reduce stage
    • Starts concurrently: all fetch flows share the bandwidth
  • Needs refetching
    • A possible reduce task failure forces the shuffle data to be fetched again


SLIDE 8

Push-based Shuffle

  • Bandwidth Utilization

[Figure: execution timelines of workers A and B over stages N and N+1. (a) Fetch-based: the shuffle read, and hence the inter-DC data transfer, starts only after the map stage completes, so map, transfer, and reduce run back to back. (b) Push-based: the shuffle write pushes map output as it is produced, overlapping the data transfer with the map stage and finishing the job earlier.]
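The gain can be sketched with back-of-the-envelope arithmetic; the durations below are illustrative assumptions, not measurements:

```scala
// Assumed, illustrative durations (arbitrary time units).
val mapTime      = 8.0
val transferTime = 6.0
val reduceTime   = 4.0

// Fetch-based: the inter-DC transfer cannot start until the map stage
// ends, so the three phases run back to back.
val fetchTotal = mapTime + transferTime + reduceTime

// Push-based: map output is pushed as it is produced; idealizing the
// transfer as fully overlapped with the map stage:
val pushTotal = math.max(mapTime, transferTime) + reduceTime
```

Under these assumptions the job drops from 18 to 12 time units; in practice the overlap is partial, so the saving falls between the two extremes.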

SLIDE 9

Push-based Shuffle

  • Failure Recovery

[Figure: failure-recovery timelines over stages N and N+1 on workers A and B. (a) Fetch-based: when a reduce task fails, its shuffle input must be refetched across datacenters before the reduce can rerun. (b) Push-based: the restarted reduce task reuses the already-pushed shuffle input, so no inter-DC refetch is needed.]

SLIDE 10

Where to Push?

  • One option: existing task placement algorithms
    • Know the reducer placement beforehand
    • Require prior knowledge, e.g., predictable jobs and inter-DC available bandwidth
  • Our solution: Push/Aggregate


SLIDE 11
Aggregating Shuffle Input

  • Send the shuffle input to a subset of datacenters that already hold a large portion of it
    • Reduces inter-datacenter traffic in future shuffles
    • Likely to reduce inter-datacenter traffic in the current shuffle as well

[Figure: a receiver in Datacenter A aggregates the shuffle input over inter-DC transfers; Reducers 1 and 3 run in Datacenter A, Reducers 2, 4, and 5 in Datacenter B]
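One way to make "a subset of datacenters with a large portion of shuffle input" concrete is a greedy choice over the per-datacenter input sizes. This is an illustrative sketch, not the paper's algorithm; `chooseAggregators` and the coverage fraction are our assumptions:

```scala
// Greedy sketch: sort datacenters by how much shuffle input they hold,
// then take the largest holders until they cover `fraction` of it.
def chooseAggregators(bytesPerDC: Map[String, Long],
                      fraction: Double): Set[String] = {
  val total = bytesPerDC.values.sum.toDouble
  val sorted = bytesPerDC.toSeq.sortBy { case (_, bytes) => -bytes }
  val (chosen, _) = sorted.foldLeft((Set.empty[String], 0L)) {
    case ((acc, covered), (dc, bytes)) =>
      if (covered / total >= fraction) (acc, covered)
      else (acc + dc, covered + bytes)
  }
  chosen
}

// One datacenter already holds 60% of the input, so it alone covers
// a 0.5 target fraction.
val aggregators = chooseAggregators(
  Map("Virginia" -> 60L, "Ireland" -> 30L, "Singapore" -> 10L), 0.5)
```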

SLIDE 12

For any partition of the shuffle input, the expected inter-datacenter traffic in the next shuffle is proportional to the number of non-colocated reducers.

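The claim can be checked with a one-line expectation (a worked sketch under a simple model; the function name is ours): if each reducer takes an equal share of the partition, only the shares destined for non-colocated reducers cross datacenter links.

```scala
// Each of the totalReducers takes an equal share of the partition; only
// the shares destined for reducers in other datacenters cross DC links.
def expectedInterDCTraffic(partitionBytes: Long,
                           colocatedReducers: Int,
                           totalReducers: Int): Double =
  partitionBytes.toDouble * (totalReducers - colocatedReducers) / totalReducers

// A 100 MB partition with 2 of 5 reducers colocated sends an expected
// 100 * 3/5 = 60 MB across datacenters.
val traffic = expectedInterDCTraffic(100L, 2, 5)
```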

SLIDE 13

Aggregating Shuffle Input

  • Send the shuffle input to a subset of datacenters that already hold a large portion of it
    • A reducer is likely to be placed close to its shuffle input
    • More aggregated data -> less inter-datacenter traffic under any reasonable task placement

SLIDE 14

Implementation in Spark

  • Requirements:
    • Push before writing to disk
    • Destined for the aggregator datacenters
  • transferTo() as an RDD transformation
    • Allows implicit or explicit usage


SLIDE 15

Implementation in Spark

(a) InputRDD.map(…).reduce(…)
(b) InputRDD.map(…).transferTo([A]).reduce(…)

[Figure: execution DAGs for (a) and (b). Map tasks A1 and A2 run in datacenter A and B1 in datacenter B; in (b), each map output passes through a transferTo step so that all reduce input is aggregated in datacenter A.]

SLIDE 16

Implementation in Spark

[Figure: Spark 1.6.1 web UI (5/12/2016), DAG visualization for Job 0 of ScalaWordCount: (a) the original job; (b) the same job with the transferTo transformation embedded in its first stage]

SLIDE 17

Implementation in Spark

  • transferTo() implicit insertion

Origin code:

    val InRDD = In1 + In2
    InRDD.filter(…).groupByKey(…).collect()

Produced code:

    val InRDD = In1 + In2
    InRDD.filter(…).transferTo(…).groupByKey(…).collect()

[Figure: the job's DAG across DC1 and DC2, before and after processing by the DAGScheduler; a transferTo step is inserted after each filter and before the groupByKey shuffle]
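The implicit-insertion rule can be caricatured as a plan rewrite: splice a transferTo step in front of every wide (shuffle-producing) transformation. This is a toy sketch of the idea, not Spark's DAGScheduler internals; the operator names and list-of-strings plan are illustrative:

```scala
// Wide (shuffle-producing) operators, a toy stand-in for Spark's
// shuffle-dependency check.
val wideOps = Set("groupByKey", "reduceByKey", "join")

// Splice a transferTo step in front of every wide operator in a plan.
def insertTransferTo(plan: List[String]): List[String] =
  plan.flatMap { op =>
    if (wideOps.contains(op)) List("transferTo", op) else List(op)
  }

val rewritten = insertTransferTo(List("filter", "groupByKey", "collect"))
// rewritten == List("filter", "transferTo", "groupByKey", "collect")
```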

SLIDE 18

Evaluation

  • Amazon EC2, m3.large instances
  • 26 nodes in 6 different locations

[Figure: world map of the deployment: 26 nodes spread over N. Virginia, N. California, São Paulo, Frankfurt, Singapore, and Sydney, with 4 to 6 nodes per region]

SLIDE 19

Performance

[Figure: performance comparison; the lower, the better]

SLIDE 20

Take-Away Messages

  • A push-based shuffle mechanism is beneficial in wide-area data analytics
  • Aggregating the shuffle input to a subset of datacenters is likely to help when there is no prior knowledge
  • Implemented in Apache Spark as a data transformation
  • Performance: reduced shuffle completion time and its variance


SLIDE 21

Thanks! Q&A
