Optimizing Shuffle in Wide-Area Data Analytics
Shuhao Liu*, Hao Wang, Baochun Li
Department of Electrical & Computer Engineering, University of Toronto
What is:
- Wide-Area Data Analytics?
- Shuffle?
Wide-Area Data Analytics
[Figure: map of datacenter regions (N. California, Oregon, N. Virginia, Ireland, Singapore) holding datasets data 1, data 2 and data 3]
Large volumes of data are generated, stored and processed across geographically distributed DCs.
Existing work focuses on Task Placement
Rethink the root cause of inter-DC traffic: Shuffle
Fetch-based Shuffle
[Figure: Mappers 1 to 3 each send data to Reducers 1 to 3]
All-to-all communication at the beginning of reduce tasks
Problems with Fetch
- Under-utilizes the inter-datacenter bandwidth
  - Transfers start late: at the beginning of reduce
  - Transfers start concurrently: they share bandwidth
- Need for refetch
  - Possible reduce task failure
Push-based Shuffle
- Bandwidth Utilization
[Figure: two timelines, (a) and (b), over time 0 to 16 for worker A and worker B, showing Map and Shuffle Write in Stage N, Data Transfer, and Shuffle Read and Reduce in Stage N+1]
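The bandwidth argument can be illustrated with a toy timing model. This is only a sketch, not the authors' model: it assumes a single inter-datacenter link that carries one map output at a time, and the names `MapTask`, `fetchReady` and `pushReady` as well as all durations are hypothetical.

```scala
object ShuffleTimeline {
  // One map task: when it finishes, and how long its output occupies the link.
  final case class MapTask(finish: Double, transfer: Double)

  // Fetch-based: no transfer starts before the reduce stage begins,
  // i.e. before the last map task has finished.
  def fetchReady(tasks: Seq[MapTask]): Double = {
    require(tasks.nonEmpty, "need at least one map task")
    tasks.map(_.finish).max + tasks.map(_.transfer).sum
  }

  // Push-based: each output is sent as soon as its map task finishes
  // and the (assumed single) link is free.
  def pushReady(tasks: Seq[MapTask]): Double = {
    require(tasks.nonEmpty, "need at least one map task")
    tasks.sortBy(_.finish).foldLeft(0.0) { (linkFree, t) =>
      math.max(linkFree, t.finish) + t.transfer
    }
  }
}
```

With two map tasks finishing at t = 4 and t = 8, each needing 4 time units on the link, this model has all shuffle input available at t = 16 under fetch and at t = 12 under push.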
Push-based Shuffle
- Failure Recovery
[Figure: two timelines, (a) and (b), over time 0 to 24 for worker A and worker B, showing Map and Shuffle Write in Stage N, Data Transfer, a failed Reduce task, Refetch, and Shuffle Read and Reduce in Stage N+1]
Where to Push?
- One option: existing task placement algorithms
  - Need to know reducer placement beforehand
  - Require prior knowledge
    - e.g., predictable jobs, available inter-DC bandwidth
- Our solution: Push/Aggregate
  - Send shuffle input to a subset of datacenters that hold a large portion of the shuffle input
  - Reduces inter-datacenter traffic in future shuffles
  - Likely to reduce inter-datacenter traffic in the current shuffle
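The Push/Aggregate idea of sending shuffle input to the datacenters that already hold most of it can be sketched as a simple selection rule. This is an illustration under assumptions, not the authors' algorithm: the per-datacenter byte counts, the subset size `k`, and the names `chooseAggregators` and `bytesToPush` are all hypothetical.

```scala
object AggregatorSelection {
  // Pick the k datacenters that hold the most shuffle input.
  def chooseAggregators(inputBytes: Map[String, Long], k: Int): Seq[String] =
    inputBytes.toSeq
      .sortBy { case (dc, bytes) => (-bytes, dc) } // largest first, name breaks ties
      .take(k)
      .map(_._1)

  // Shuffle input that sits outside the aggregators and therefore
  // has to cross datacenter boundaries when it is pushed.
  def bytesToPush(inputBytes: Map[String, Long], aggregators: Set[String]): Long =
    inputBytes.collect { case (dc, bytes) if !aggregators(dc) => bytes }.sum
}
```

For shuffle input of 600, 300 and 100 bytes in datacenters A, B and C, `chooseAggregators` with `k = 1` picks A, and 400 bytes have to be pushed to it.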
Aggregating Shuffle Input
[Figure: Datacenter A (receiver of shuffle input; Reducers 1 and 3) and Datacenter B (Reducers 2, 4 and 5), connected by inter-DC transfers]
For any partition of shuffle input, the expected inter-datacenter traffic in next shuffle is proportional to the number of non-colocated reducers.
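A worked version of this proportionality, as a sketch: it assumes that a partition's records are spread evenly over all reducers, so each reducer reads an equal share. That uniformity assumption and the name `expectedInterDcBytes` are illustrative, not taken from the slides.

```scala
object ExpectedTraffic {
  // partitionDc: datacenter storing this partition of shuffle input.
  // reducerDcs:  datacenter of each reducer, one entry per reducer.
  // Assumes every reducer reads an equal share of the partition.
  def expectedInterDcBytes(
      partitionBytes: Double,
      partitionDc: String,
      reducerDcs: Seq[String]): Double = {
    require(reducerDcs.nonEmpty, "need at least one reducer")
    val nonColocated = reducerDcs.count(_ != partitionDc)
    partitionBytes * nonColocated / reducerDcs.size
  }
}
```

For example, a 100-byte partition stored in a datacenter that hosts two of five reducers is expected to send 60 bytes across datacenters; stored next to the other three reducers, it would send 40.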
Aggregating Shuffle Input
- Send shuffle input to a subset of datacenters with a large portion of shuffle input
  - Reducers are likely to be placed close to the shuffle input
  - More aggregated data -> less inter-datacenter traffic with reasonable task placement
Implementation in Spark
- Requirements:
  - Push before writing to disk
  - Destined for the aggregator datacenters
- transferTo() as an RDD transformation
  - Allows implicit or explicit usage
Implementation in Spark
[Figure: (a) InputRDD.map(…).reduce(…) …: partitions A1, A2 and B1 of InputRDD are mapped (map A1, map A2, map B1) and reduced over A1, A2, B1. (b) InputRDD.map(…).transferTo([A]).reduce(…) …: after the map tasks, transferTo (A1, A2, A*) is applied and reduce runs over A1, A2, Ax.]
Implementation in Spark
[Figure: Spark 1.6.1 application UI, details for Job 0 of ScalaWordCount, showing stage DAGs (a) and (b); (b) contains the embedded transferTo transformation]
Implementation in Spark
- transferTo() implicit insertion
Original Code
val InRDD = In1+In2
InRDD
  .filter(…)
  .groupByKey(…)
  .collect()
Produced Code
val InRDD = In1+In2
InRDD
  .filter(…)
  .transferTo(…)
  .groupByKey(…)
  .collect()
[Figure: DAGs for DC1 (In1) and DC2 (In2) with filter, shuffle input, groupByKey and collect; in the produced version, processed by the DAGScheduler, transferTo is inserted in each datacenter]
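The rewrite from original to produced code can be mimicked on a plain list of operation names. The sketch below is not Spark's DAGScheduler and does not use Spark's API; the set of shuffle operations and the name `insertTransferTo` are assumptions made for illustration.

```scala
object ImplicitTransfer {
  // Operations treated as shuffle boundaries in this sketch (assumed, incomplete).
  val shuffleOps: Set[String] = Set("groupByKey", "reduceByKey", "join")

  // Insert "transferTo" in front of every shuffle operation, unless the
  // user already placed one there explicitly.
  def insertTransferTo(ops: List[String]): List[String] =
    ops.foldLeft(List.empty[String]) { (acc, op) =>
      // acc is built in reverse, so its head is the previous operation.
      if (shuffleOps(op) && !acc.headOption.contains("transferTo"))
        op :: "transferTo" :: acc
      else
        op :: acc
    }.reverse
}
```

Applied to `List("filter", "groupByKey", "collect")`, this yields `List("filter", "transferTo", "groupByKey", "collect")`; a pipeline that already calls transferTo explicitly is left unchanged.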
Evaluation
- Amazon EC2, m3.large instances
- 26 nodes in 6 different locations
[Figure: map of the 6 locations (N. Virginia, N. California, São Paulo, Frankfurt, Singapore, Sydney); node counts per location: 4, 6, 4, 4, 4, 4]
Performance
[Figure: performance results; lower is better]
Take-Away Messages
- A push-based shuffle mechanism is beneficial in wide-area data analytics
- Aggregating shuffle input to a subset of datacenters is likely to help when you have no a priori knowledge
- Implementation in Apache Spark as a data transformation
- Performance: reduced shuffle time and reduced variance of shuffle time
Thanks! Q&A