Elasecutor: Elastic Executor Scheduling in Data Analytics Systems



slide-1
SLIDE 1

Elasecutor: Elastic Executor Scheduling in Data Analytics Systems

Libin Liu, Hong Xu
City University of Hong Kong

ACM Symposium on Cloud Computing 2018

slide-7
SLIDE 7

Data Analytics Systems

  • Various workloads running in data analytics systems concurrently
  • The workflow of an analytics application can be expressed as a DAG

Directed Acyclic Graph (DAG)

  • Stage 1: parallelize, filter, map
  • Stage 2: reduceByKey, map
  • Stage 3: parallelize, filter, map
  • Stage 4: join
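A minimal sketch of how such a stage DAG could be represented and ordered. The operators per stage come from the slide; the dependency edges (`stage_deps`) are an assumption made for illustration, since the figure's arrows are not recoverable from the text.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Operators listed for each stage on the slide.
stage_ops = {
    "stage1": ["parallelize", "filter", "map"],
    "stage2": ["reduceByKey", "map"],
    "stage3": ["parallelize", "filter", "map"],
    "stage4": ["join"],
}

# ASSUMPTION: hypothetical edges (stage -> stages it depends on).
stage_deps = {
    "stage1": set(),
    "stage2": {"stage1"},
    "stage3": set(),
    "stage4": {"stage2", "stage3"},
}

def execution_order(deps):
    """Return one valid order in which the stages can run."""
    return list(TopologicalSorter(deps).static_order())

order = execution_order(stage_deps)
```

Any order returned respects the edges: a stage appears only after every stage it depends on.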

slide-9
SLIDE 9

Resource Scheduling

  • Resource schedulers target various objectives, e.g., fairness, cluster utilization, and application completion time

Efficient resource scheduling is an important and practical issue in data analytics systems

slide-10
SLIDE 10

Current Solutions

  • Static allocation according to peak demands
  • “Task-based” resource schedulers adopted in “executor-based” systems
  • Assign executors to machines randomly

slide-13
SLIDE 13

Need for an Elastic Scheduler


Executor resource usage exhibits significant temporal variations

slide-17
SLIDE 17

Need for an Elastic Scheduler

Workload             Metric        CPU    Memory   Network   Disk
Terasort             Peak/Avg.     1.8    1.7      6.2       1.5
                     Peak/Trough   60     3.3      237       6.1
K-means              Peak/Avg.     1.7    1.2      11.5      5.6
                     Peak/Trough   75     6        53        100
PageRank             Peak/Avg.     3.9    1.3      20.2      9.1
                     Peak/Trough   50     11.5     119       50
Logistic Regression  Peak/Avg.     2.1    1.4      5.5       6.1
                     Peak/Trough   50     12       409.6     42.5

Static allocation using peak demands would cause severe resource wastage and performance issues
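To make the two ratios in the table concrete, here is a small sketch of how they could be computed from one executor's usage time series, plus the share of a peak-sized static reservation that sits idle on average. The sample values and function names are hypothetical, not measurements from the talk.

```python
def peak_ratios(usage):
    """Return (peak/avg, peak/trough) for a non-empty series of positive samples."""
    peak = max(usage)
    avg = sum(usage) / len(usage)
    trough = min(usage)
    return peak / avg, peak / trough

def static_waste_fraction(usage):
    """Fraction of a peak-sized static reservation that is unused on average."""
    peak = max(usage)
    avg = sum(usage) / len(usage)
    return 1 - avg / peak

# Hypothetical network usage samples (MB/s), for illustration only.
net = [1.0, 2.0, 40.0, 5.0, 2.0]
peak_avg, peak_trough = peak_ratios(net)
waste = static_waste_fraction(net)
```

With these samples the peak is 4 times the average and 40 times the trough, so reserving the peak for the whole run would leave 75% of the reservation idle on average.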

SLIDE 20

Our Idea

Dynamically allocate and explicitly size resources to executors over time, and strategically assign executors to machines

Elasecutor, a novel executor scheduler for data analytics systems

slide-21
SLIDE 21

Outline

  • Motivation
  • Elasecutor Design
    ‒ Elastic Executor Scheduling
    ‒ Demand Prediction
    ‒ Dynamic Reprovisioning
  • Implementation
  • Evaluation
  • Conclusion

slide-22
SLIDE 22

Elastic Executor Scheduling

  • Challenge
    − Scheduling executors with their multi-resource demand time-series
    − Multi-dimensional packing
    − APX-hard
    − Analyzed in detail in Section 3.2.1 of the paper
  • Objective
    − Minimizing makespan
    − i.e., avoid resource underutilization and minimize machine-level resource fragmentation

slide-24
SLIDE 24

Elastic Executor Scheduling - DRR

  • Dominant Remaining Resource: “dominant” = “maximum”
  • An example: We select as the time point to calculate DRR for machine 1. and , and its DRR is

DRR is defined as the maximum remaining resource along the time dimension up to time t
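The definition above can be sketched as follows. This is one illustrative reading, not the authors' code: it assumes the remaining amount of each resource is normalized by that resource's capacity, and that time is discretized into steps 0..t. The names `available`, `demand`, and `capacity` are hypothetical.

```python
def drr(available, demand, capacity, t):
    """Dominant remaining resource if the executor is placed on the machine.

    available[r][k]: predicted free amount of resource r at time step k
    demand[r][k]:    executor's predicted demand for resource r at step k
    capacity[r]:     total capacity of resource r (ASSUMPTION: used to normalize)
    Returns None if the executor does not fit at some step in 0..t.
    """
    dominant = 0.0
    for r in capacity:
        for k in range(t + 1):
            remaining = available[r][k] - demand[r][k]
            if remaining < 0:
                return None  # demand exceeds what is free at step k
            dominant = max(dominant, remaining / capacity[r])
    return dominant

# Hypothetical machine and executor profiles over three time steps.
capacity = {"cpu": 16, "mem": 64}
available = {"cpu": [8, 8, 12], "mem": [32, 16, 32]}
demand = {"cpu": [4, 6, 2], "mem": [8, 8, 16]}

drr_up_to_2 = drr(available, demand, capacity, t=2)
drr_up_to_1 = drr(available, demand, capacity, t=1)
```

Here the largest normalized leftover up to step 2 is CPU at step 2 (10 of 16 cores), while up to step 1 it is memory at step 0 (24 of 64 GB).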

slide-25
SLIDE 25

Why DRR

  • Convert multi-dimensional metrics into scalars
  • Better reflect resource utilization
    − “Maximum”, not “Minimum”
  • Better than alternative metric TRC
    − TRC sums up the relative remaining capacity of each resource

Figure: Improvement of DRR over TRC as an alternative metric for executor placement
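A toy comparison of the two scalarizations, under the assumption that each candidate placement is summarized by its relative remaining capacity per resource at a single time point. The candidate values are made up; the point is only that "maximum" and "sum" can rank the same candidates differently.

```python
def dominant_remaining(rel_remaining):
    """DRR-style scalar: the maximum relative remaining capacity."""
    return max(rel_remaining.values())

def total_remaining(rel_remaining):
    """TRC-style scalar: the sum of relative remaining capacities."""
    return sum(rel_remaining.values())

# Hypothetical relative remaining capacity after placing an executor.
candidates = {
    "A": {"cpu": 0.5, "mem": 0.0},  # memory full, half the CPU stranded
    "B": {"cpu": 0.3, "mem": 0.3},  # balanced leftovers
}

best_by_max = min(candidates, key=lambda m: dominant_remaining(candidates[m]))
best_by_sum = min(candidates, key=lambda m: total_remaining(candidates[m]))
```

Minimizing the maximum prefers candidate B, whereas minimizing the sum prefers candidate A, which leaves half the CPU unusable because memory is exhausted.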

slide-32
SLIDE 32

Elastic Executor Scheduling - MinFrag

  • Based on BFD (Best Fit Decreasing)
  • Iteratively assigning the “largest” executor to a machine that yields the minimum DRR

Flow:

  1. Heartbeat received
  2. Search executors in the queue
  3. Calculate DRR for any executor placed on the machine
  4. Choose the one producing minimum DRR to schedule
  5. Update placement results
  6. Repeat the process until termination
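The flow on this slide can be sketched as a loop that runs when a machine reports in. This is a simplified illustration, not the Elasecutor implementation: the data layout, the capacity normalization inside `drr`, and the function names are all assumptions.

```python
def drr(available, demand, capacity):
    """Max normalized leftover over all resources and time steps; None if no fit."""
    dominant = 0.0
    for r, cap in capacity.items():
        for free, need in zip(available[r], demand[r]):
            if need > free:
                return None
            dominant = max(dominant, (free - need) / cap)
    return dominant

def on_heartbeat(available, capacity, queue):
    """Place queued executors on the reporting machine until none fits.

    `available` is updated in place; `queue` maps executor id -> demand series.
    Returns the ids of the placed executors, in placement order.
    """
    placed = []
    while True:
        scores = {e: drr(available, d, capacity) for e, d in queue.items()}
        fits = {e: s for e, s in scores.items() if s is not None}
        if not fits:
            return placed  # termination: nothing in the queue fits
        best = min(fits, key=fits.get)  # executor producing the minimum DRR
        demand = queue.pop(best)
        for r in capacity:  # update placement results
            available[r] = [f - n for f, n in zip(available[r], demand[r])]
        placed.append(best)

# Hypothetical machine and queue, two time steps each.
capacity = {"cpu": 8, "mem": 16}
available = {"cpu": [8, 8], "mem": [16, 16]}
queue = {
    "e1": {"cpu": [6, 4], "mem": [8, 8]},
    "e2": {"cpu": [2, 2], "mem": [4, 4]},
    "e3": {"cpu": [5, 5], "mem": [10, 10]},
}
placed = on_heartbeat(available, capacity, queue)
```

In this example e3 leaves the smallest dominant leftover and goes first, e2 then fills the gap, and e1 stays queued because it no longer fits.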

slide-35
SLIDE 35

Elastic Executor Scheduling - MinFrag

Figure: (a) Available resources of machine (b) Resource demands of executor 1 (c) Resource demands of executor 2

$\mathrm{DRR}(1, j) = \max\{\frac{53}{112}, \frac{165}{448}\} = \frac{53}{112}$

$\mathrm{DRR}(2, j) = \max\{\frac{13}{32}, \frac{43}{128}\} = \frac{13}{32}$
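The arithmetic in this example can be checked with exact fractions. The four fractions are taken from the slide; the final selection step assumes the minimum-DRR rule stated for MinFrag is applied to these two executors.

```python
from fractions import Fraction

# Per-executor candidate values from the slide; DRR is their maximum.
drr_1 = max(Fraction(53, 112), Fraction(165, 448))
drr_2 = max(Fraction(13, 32), Fraction(43, 128))

# Under the minimum-DRR rule, the executor with the smaller DRR is scheduled.
chosen = min((drr_1, 1), (drr_2, 2))[1]
```

Since 13/32 (about 0.406) is smaller than 53/112 (about 0.473), executor 2 would be the one placed on this machine under that rule.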

slide-36
SLIDE 36

Prediction Module

  • Recurring workloads
    − Average resource time series of the latest 3 runs as the prediction result
  • New workloads
    − Support Vector Regression
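For recurring workloads, the averaging step could look like the sketch below. It assumes each run's profile is a list of samples for one resource taken at the same intervals; truncating to the shortest run is an assumption added here to handle runs of unequal length.

```python
def predict_recurring(runs, window=3):
    """Element-wise mean of the latest `window` runs' usage time series."""
    recent = runs[-window:]
    if not recent:
        raise ValueError("no previous runs to average")
    # ASSUMPTION: align runs by truncating to the shortest one.
    length = min(len(r) for r in recent)
    return [sum(r[k] for r in recent) / len(recent) for k in range(length)]

# Hypothetical CPU usage (cores) from four earlier runs, oldest first.
history = [
    [9.0, 9.0, 9.0],
    [1.0, 4.0, 2.0],
    [2.0, 5.0, 2.0],
    [3.0, 6.0, 2.0],
]
prediction = predict_recurring(history)
```

Only the last three runs contribute, so the oldest profile does not affect the prediction.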

slide-37
SLIDE 37

Dynamic Reprovisioning

  • To guard against possible prediction errors and unpredicted issues
  • Mechanism
    − Monitoring stage execution time
    − Once a stage is observed running longer than 1.1x its expected time
    − Allocating all remaining resources to the executor for one monitoring period
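The trigger and the response can be sketched in a few lines. The 1.1x threshold is from the slide; the function names and the dictionary layout of allocations are hypothetical, and how the grant is revoked after one monitoring period is left out.

```python
SLOWDOWN_THRESHOLD = 1.1  # stage time relative to its expected time (from the slide)

def needs_reprovisioning(elapsed, expected, threshold=SLOWDOWN_THRESHOLD):
    """True once a stage has run longer than threshold x its expected time."""
    return elapsed > threshold * expected

def reprovision(allocated, machine_free):
    """New allocation that grants all remaining machine resources to the executor."""
    return {r: allocated[r] + machine_free.get(r, 0) for r in allocated}

# Hypothetical stage: expected 10 s, observed 12 s so far.
triggered = needs_reprovisioning(elapsed=12.0, expected=10.0)
boosted = reprovision({"cpu": 2, "mem": 4}, {"cpu": 3, "mem": 8})
```

A stage at 10.5 s against the same 10 s expectation would not trigger, because it is still within the 1.1x margin.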

slide-38
SLIDE 38

Implementation

  • Spark 2.1.0
  • Allocation Module (Cgroups, modified OpenJDK)
  • Scheduling Module
  • Resource Usage Depository
  • Reprovisioning Module
  • Prediction Module
  • Monitor Surrogate


slide-46
SLIDE 46

Elasecutor System

Figure: Elasecutor system architecture (Master and Workers)

  • Components: Monitor Surrogate, Executor (Tasks; CPU, Mem, Net, Disk), Allocation Module, Prediction Module, Resource Usage Depository, Scheduling Module, Reprovisioning Module, Resource Manager
  • Labeled flows: Report Profiles, Predicted Demands, Available Resources, Scheduling Decision, Trigger, Reprovision, Launch, Adjust

slide-47
SLIDE 47

Testbed Experiments

  • Testbed Setup
    ‒ 35 Dell servers
    ‒ Each server with two CPUs, 64GB RAM, and a quad-port 10GbE NIC
    ‒ A 10GbE switch
  • Methodology
    ‒ 120 recurring applications with different workloads, input data sizes, and resource settings
    ‒ 12 new applications
    ‒ Arriving according to a Poisson process
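Arrivals that follow a Poisson process have exponentially distributed gaps between them, so a workload trace like the one described could be generated as below. The arrival rate is not given in the slides; the 30-second mean gap and the seed are placeholders.

```python
import random

def poisson_arrivals(n, rate, seed=0):
    """Return n arrival times (seconds) of a Poisson process with the given rate."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)  # exponential inter-arrival gap
        times.append(t)
    return times

# 120 recurring + 12 new applications; rate is an ASSUMPTION (one per 30 s on average).
arrivals = poisson_arrivals(n=120 + 12, rate=1 / 30.0)
```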

slide-48
SLIDE 48

Schemes Compared

  • Static
    − Statically allocating CPU and memory for each executor based on peak demands
    − Launching a fixed number of executors
  • Dynamic
    − Scaling the number of executors dynamically
    − Each executor allocated a multiple of <1 core, 2GB RAM>
  • Tetris (SIGCOMM’14)
    − Allocating peak demanded resources to executors
    − BFD-like algorithm for executor placement

slide-50
SLIDE 50

Evaluation - Makespan

Makespan measures the total time used to complete all applications

Figure: (a) Makespan reduction (b) Stability of makespan

High makespan reduction and more stable performance guarantee
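As a sketch of the metric, makespan can be computed from per-application timestamps, and a reduction expressed relative to a baseline scheme. Measuring from the earliest submission is an assumption here, and the timestamps are invented for illustration.

```python
def makespan(apps):
    """apps: list of (submit_time, finish_time); time from first submit to last finish."""
    return max(finish for _, finish in apps) - min(submit for submit, _ in apps)

def reduction(baseline, ours):
    """Relative reduction of `ours` compared with `baseline`."""
    return (baseline - ours) / baseline

# Hypothetical runs of the same two applications under two schedulers.
baseline_run = [(0.0, 100.0), (10.0, 200.0)]
elastic_run = [(0.0, 80.0), (10.0, 150.0)]
gain = reduction(makespan(baseline_run), makespan(elastic_run))
```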

slide-52
SLIDE 52

Evaluation - ACT

Figure: The CDFs of reduction in application completion time (ACT)

Significant ACT improvement and more consistent application-level performance

slide-53
SLIDE 53

Evaluation - Resource Utilization


Elasecutor’s average utilization improvement over other policies

slide-57
SLIDE 57

Evaluation - Microbenchmark

Figure: CDFs of reductions in ACT and AET, comparing Elasecutor with and without the reprovisioning module

Reprovisioning is important for prediction-based resource schedulers to improve application QoS

slide-58
SLIDE 58

Conclusion

  • Elasecutor
    − Elastically allocating resources to avoid overallocation
    − Placing executors strategically to minimize multi-resource fragmentation
  • Experiment results
    − Reducing makespan by more than 42% on average
    − Reducing the median application completion time by up to 40%
    − Improving cluster resource utilization by up to 55%

slide-59
SLIDE 59

Thanks!
Q & A

slide-60
SLIDE 60

Overhead

Figure: Monitor surrogate’s resource consumption

Figure: Resource scheduler’s processing delay

slide-67
SLIDE 67

Predictability

  • Workloads: Sort, WordCount, Terasort, Bayes, K-means, LR, PageRank, NWeight

Figure: Statistical analysis of CoVs; the CDFs of the coefficients of variation

For most recurring workloads, it is accurate enough to use the profiling results from previous runs with the same setting to represent the resource demands
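The coefficient of variation (CoV) behind this analysis is the standard deviation divided by the mean. A small sketch follows, assuming the population standard deviation is used and that each sample is the same measurement taken in a different run; the numbers are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean of the samples."""
    m = mean(samples)
    if m == 0:
        raise ValueError("CoV is undefined when the mean is zero")
    return pstdev(samples) / m

# Hypothetical peak memory (GB) of the same executor across repeated runs.
stable = coefficient_of_variation([10.0, 10.0, 10.0])
noisy = coefficient_of_variation([8.0, 12.0])
```

A CoV near zero means repeated runs behave alike, which is the property that makes profiles from earlier runs usable as predictions.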

slide-69
SLIDE 69

Resource Utilization

Figure: Resource utilization, Tetris vs. Elasecutor

Utilize resources efficiently and save cost for operators