ShuffleWatcher: Shuffle-aware Scheduling in Multi-tenant MapReduce Clusters

Faraz Ahmad†*, Srimat T. Chakradhar‡, Anand Raghunathan†, T. N. Vijaykumar†
ATC 2014, Philadelphia, PA

Multi-tenancy in MapReduce Clusters
[Figure: users submit MapReduce jobs to a shared cluster; labels: Data Sharing, Cost, Utilization]
• Challenges:
  – (1) Cluster throughput
  – (2) Jobs' latency
  – (3) Fairness among users
Our contribution: We achieve high throughput and low latency while maintaining fairness among users

Significance of Shuffle in Multi-tenant Clusters
• Shuffle
  – All-Map-to-All-Reduce communication
• Impact of Shuffle-heavy jobs
  – 60% jobs in Yahoo, 20% jobs in Facebook are Shuffle-heavy
  – Shuffle-heavy jobs run much longer than Shuffle-light → high network traffic volume
• Impact of Multi-tenancy
  – Multiple concurrent jobs' shuffle → high network bisection pressure
[Figure: Input data (HDFS) → map tasks → shuffle → reduce tasks → Output data (HDFS)]
Net impact: Low cluster throughput / high jobs' latency

Related Work: Multi-tenant Scheduling
• Targeting fairness
  – FIFO, Capacity, Fair (Hadoop)
  – Dominant Resource Fairness [NSDI '11]
• Targeting throughput
  – Delay Scheduler [EuroSys '10], Quincy [SOSP '09]
  – Optimizes remote Map traffic

Traffic type          Job type         Traffic volume (% of total)
Remote Map Traffic    Shuffle-heavy    5%
Remote Map Traffic    Shuffle-light    5%
Shuffle               Shuffle-heavy    78%
Shuffle               Shuffle-light    12%

Our contribution: We improve throughput by focusing on Shuffle while maintaining fairness among users

ShuffleWatcher: Contributions
• Achieves high throughput and low job latency by shaping and reducing Shuffle traffic
• Leverages factors unique to multi-tenancy
• Exploits trade-off: intra-job concurrency vs. Shuffle locality
• Employs three mechanisms:
  – Network-Aware Shuffle Scheduling (NASS) (traffic shaping)
  – Shuffle-Aware Map Placement (SAMP) (traffic reduction)
  – Shuffle-Aware Reduce Placement (SARP) (traffic reduction)
• Keeps the underlying fairness policy intact
ShuffleWatcher achieves 46% higher throughput, 32% lower job latency, 48% reduced traffic compared to Delay Scheduler

Outline
• Introduction
• ShuffleWatcher mechanisms
  – Network-aware Shuffle Scheduling (NASS)
  – Shuffle-aware Reduce Placement (SARP)
  – Shuffle-aware Map Placement (SAMP)
• Experimental Evaluation
• Conclusion

Network-Aware Shuffle Scheduling (NASS)
Observation #1:
• Network is not saturated all the time
  – 40% jobs in Yahoo, 80% jobs in Facebook are Shuffle-light
  – Provides an opportunity to shape traffic
[Figure: Shuffle profile in 100-node Amazon EC2 cluster (Fair Scheduler)]
• Simply delaying a job is not an option
  – Fairness is compromised

Network-Aware Shuffle Scheduling (NASS) (contd.)
Observation #2:
• Multi-tenancy offers flexibility in Shuffle schedule
[Figure: two task timelines plotted against time, each marking when reduce is scheduled. Single-tenancy: map phase, shuffle, reduce phase; high intra-job Map-Shuffle concurrency. Multi-tenancy: map phase, shuffle, reduce phase interleaved with tasks from other jobs; low intra-job Map-Shuffle concurrency. Vertical axis: traffic volume]
Shuffle may be delayed in multi-tenancy, if needed, without loss of communication-computation overlap

Network-Aware Shuffle Scheduling (NASS) (contd.)
• Contribution:
  – Delay a job's shuffle at high network loads to shape traffic
  – At high load, schedule other tasks of the same user that do not stress the network
[Figure labels: reduce Shuffle load, increase Shuffle load]
NASS achieves high throughput while maintaining fairness
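The NASS rule above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the `Task` record, the `pick_task` function, and the boolean `network_saturated` flag are hypothetical names, and falling back to a shuffle task when the user has nothing else to run is an assumption the slides do not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    user: str   # owner of the job (hypothetical field)
    job: str    # job identifier (hypothetical field)
    kind: str   # "map", "shuffle", or "reduce"


def pick_task(pending: List[Task], user: str,
              network_saturated: bool) -> Optional[Task]:
    """Pick the next task for `user`, deferring shuffle under high network load."""
    own = [t for t in pending if t.user == user]
    if network_saturated:
        # Prefer the same user's tasks that do not stress the network,
        # so the user's fair share of slots is preserved.
        light = [t for t in own if t.kind != "shuffle"]
        if light:
            return light[0]
        # Assumption: if only shuffle work remains, run it rather than idle.
    return own[0] if own else None


pending = [
    Task("alice", "j1", "shuffle"),
    Task("alice", "j2", "map"),
    Task("bob", "j3", "shuffle"),
]
print(pick_task(pending, "alice", network_saturated=True).job)
print(pick_task(pending, "alice", network_saturated=False).job)
```

Because the substitute task always comes from the same user, the choice stays inside the slot that the underlying fairness policy already granted to that user.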

Shuffle-Aware Reduce Placement (SARP)
[Figure: Map execution and intermediate data distribution across servers and racks, single-tenancy vs. multi-tenancy]

Shuffle-Aware Reduce Placement (SARP) (contd.)
• Observation:
  – A job's intermediate data is likely to be skewed
    • Multiple jobs run concurrently
  – Exploit NASS's delayed Shuffle to improve locality
    • Intermediate data locations are known
• Contribution:
  – Assign Reduce tasks to racks with more intermediate data
    • Improves Shuffle locality → lowers cross-rack Shuffle traffic
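A minimal sketch of the SARP idea, assuming the scheduler knows how much of a job's intermediate data sits in each rack and how many reduce slots each rack has free. The function name `place_reduces`, its parameters, and the greedy fill order are illustrative assumptions; the slides only state that Reduce tasks go to racks with more intermediate data.

```python
from typing import Dict, List


def place_reduces(intermediate_mb: Dict[str, float],
                  free_slots: Dict[str, int],
                  num_reduces: int) -> List[str]:
    """Greedily assign reduce tasks to the racks holding the most intermediate data."""
    slots = dict(free_slots)  # copy so the caller's view is not mutated
    placement: List[str] = []
    # Visit racks from most to least intermediate data.
    for rack in sorted(intermediate_mb, key=intermediate_mb.get, reverse=True):
        while slots.get(rack, 0) > 0 and len(placement) < num_reduces:
            placement.append(rack)
            slots[rack] -= 1
    return placement


# Hypothetical skewed distribution: most intermediate data is in rack r2.
intermediate_mb = {"r1": 10.0, "r2": 70.0, "r3": 20.0}
free_slots = {"r1": 2, "r2": 1, "r3": 1}
print(place_reduces(intermediate_mb, free_slots, num_reduces=3))
```

If fewer slots are free than reduce tasks requested, the function returns a shorter list; a real scheduler would retry the remaining tasks as slots open up.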

Shuffle-Aware Map Placement (SAMP)
[Figure: original Map execution vs. ideal Map placement for Shuffle locality, with the resulting intermediate data distribution across servers and racks]

Shuffle-Aware Map Placement (SAMP) (contd.)
• Observation:
  – Input data redundancy provides flexibility in Map scheduling
• Contribution:
  – Localize Map tasks to fewer racks to improve Shuffle locality
    • Provides further opportunities for SARP to localize Shuffle
  – Lowers the sum (Remote Map Traffic + cross-rack Shuffle)
    • May incur some remote map traffic for larger savings in Shuffle
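The trade-off in the last bullet can be made concrete with a small cost comparison. The sketch below is a simplified model, not the paper's algorithm: `cross_rack_cost`, the per-task sizes, and `reduce_share` (the fraction of the job's reduce input consumed in each rack) are all hypothetical, and every number is made up for illustration.

```python
from typing import Dict, Set


def cross_rack_cost(placement: Dict[str, str],
                    replicas: Dict[str, Set[str]],
                    input_mb: Dict[str, float],
                    shuffle_mb: Dict[str, float],
                    reduce_share: Dict[str, float]) -> float:
    """Sum of remote Map traffic and cross-rack Shuffle traffic, in MB."""
    total = 0.0
    for task, rack in placement.items():
        if rack not in replicas[task]:
            # No input replica in this rack: the Map input crosses racks.
            total += input_mb[task]
        # Map output not consumed by reducers in the same rack crosses racks.
        total += shuffle_mb[task] * (1.0 - reduce_share.get(rack, 0.0))
    return total


replicas = {"m1": {"A", "B"}, "m2": {"C"}}
input_mb = {"m1": 64.0, "m2": 64.0}
shuffle_mb = {"m1": 100.0, "m2": 100.0}

# Data-local Maps spread over racks A and C, reducers split evenly.
spread = cross_rack_cost({"m1": "A", "m2": "C"}, replicas,
                         input_mb, shuffle_mb, {"A": 0.5, "C": 0.5})
# Both Maps packed into rack A (m2 reads remotely), all reducers in A.
packed = cross_rack_cost({"m1": "A", "m2": "A"}, replicas,
                         input_mb, shuffle_mb, {"A": 1.0})
print(spread, packed)
```

In this toy case the packed placement pays 64 MB of remote Map input but avoids 100 MB of cross-rack Shuffle, so the sum drops from 100 MB to 64 MB.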

ShuffleWatcher – Overall Picture
[Figure: users submit MapReduce jobs to ShuffleWatcher. The cluster reports a NetSat signal ("Network saturated?") to Network-Aware Shuffle Scheduling (NASS) and reports free workers; Shuffle-Aware Map Placement (SAMP) and Shuffle-Aware Reduce Placement (SARP) produce the task assignment sent to the cluster]

Outline
• Introduction
• ShuffleWatcher mechanisms
  – Network-aware Shuffle Scheduling (NASS)
  – Shuffle-aware Reduce Placement (SARP)
  – Shuffle-aware Map Placement (SAMP)
• Experimental Evaluation
• Conclusion

Experimental Methodology
• Baseline:
  – Fair Scheduler [Hadoop implementation]
  – DRF Scheduler [developed in-house as per NSDI '11]
  – Delay Scheduler [EuroSys '10, Hadoop implementation]
• Platform:
  – 100-node Amazon EC2 cluster
    • 4 virtual cores per node, 15 GB memory
    • 10 racks of 10 nodes each
    • 50 Mbps cross-rack per-node bisection bandwidth
• Workloads:
  – Job submission: exponential distribution (mean arrival rate: 40 s)
  – 30 users, jobs from 12 MapReductions, run for 4 hours
    • 40% shuffle-heavy, 20% shuffle-medium, 40% shuffle-light
  – Job sizes: mix of small, medium, large sizes vary from 100 MB-1 TB
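For readers who want to reproduce a similar arrival pattern, the sketch below generates job submission times over a 4-hour run. It assumes the "40 s" figure is the mean time between submissions, which is an interpretation of the slide rather than something it states; the function name, the fixed seed, and the sampling approach are illustrative only.

```python
import random
from typing import List


def job_arrival_times(mean_gap_s: float, duration_s: float,
                      seed: int = 0) -> List[float]:
    """Return submission times (seconds) with exponentially distributed gaps."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    times: List[float] = []
    now = rng.expovariate(1.0 / mean_gap_s)
    while now < duration_s:
        times.append(now)
        # expovariate takes a rate, i.e. 1 / mean gap.
        now += rng.expovariate(1.0 / mean_gap_s)
    return times


arrivals = job_arrival_times(mean_gap_s=40.0, duration_s=4 * 3600)
print(len(arrivals))  # on the order of 4 * 3600 / 40 = 360 submissions
```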

Throughput Comparison
[Figure: bar chart of normalized throughput; bar labels 1.66, 1.40, 1.39, 1.14]
ShuffleWatcher is independent of fairness policy and improves throughput (39%, 46%, 40%) over multiple schemes

Jobs Latency (turn-around time) Comparison
[Figure: bar chart of normalized time; bar labels 0.90, 0.73, 0.71, 0.61]
ShuffleWatcher improves latency (27%, 32%, 29%) over multiple schemes
