SLIDE 1

ML for Resource Management

Arjun Karuvally, Priyanka Mary Mammen

SLIDE 2

Introduction

  • Big data analytics in the cloud is crucial for industry and is growing rapidly
  • A number of techniques are used for data processing: MapReduce, SQL-like languages, deep learning, and in-memory analytics
  • A cluster of virtual machines is the execution environment for these types of jobs
  • Different analytic jobs have diverse behavior and resource requirements
SLIDE 3

Problem Statement

  • The task of resource management is to find the right cloud configuration for an application
  • This configuration includes the number of VMs, number of CPUs, CPU speed per core, RAM, disk count, disk speed, network capacity, etc.
  • Any technique used for resource management in the cloud needs to build a performance model
  • The performance model indicates which cloud configuration is best for the particular job being run

SLIDE 4

Motivation

  • Choosing the right configuration for an application is essential to service quality and commercial competitiveness
  • Many jobs are recurring, meaning that similar workloads are executed repeatedly
  • Choosing poorly can result in a slowdown of 2-3x on average and 12x in the worst case

SLIDE 5

Challenges

  • Evaluating all possible cloud configurations to find the best one is prohibitively expensive
  • Each workload has its own preferred cloud configuration, so it is difficult to come up with one configuration for all workloads
  • The resource requirements to achieve a certain objective (execution time or running cost) for a specific workload are opaque
  • The running time and cost have a complex relation to the resources of cloud instances

SLIDE 6

CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics

SLIDE 7

Features

  • Uses Bayesian Optimization to build performance models for various applications
  • Models are just accurate enough to find a near-optimal configuration with only a few test runs
  • Bayesian Optimization obtains near-optimal configurations with a minimum number of samples and a good confidence interval

SLIDE 8

Problem Formulation

  • For a given application workload, the objective is to find an optimal or near-optimal cloud configuration that satisfies a performance requirement
  • The problem is formulated mathematically as shown below
  • The cloud configuration is represented by x, and C represents the cost
  • P is the price per unit time for VMs using x, and T is the running time function
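With these definitions, the formulation can be written as follows. This is a reconstruction of the slide's equation from the definitions above; T_max is an assumed symbol denoting the maximum tolerable running time.

```latex
\min_{x} \; C(x) = P(x)\, T(x)
\qquad \text{subject to} \qquad T(x) \le T_{\max}
```

That is, minimize the total running cost of configuration x while keeping the running time within the performance requirement.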
SLIDE 9

Problem Formulation

  • The unknown required to compute the cost is the running time function T for different configurations x
  • Since evaluating T directly is expensive, Bayesian Optimization is used to search for an approximate solution of the equation at a significantly smaller cost

SLIDE 10

Bayesian Optimization

  • Bayesian Optimization is used to solve optimization problems like the previous equation, where the objective function C is unknown but can be observed through experiments
  • The cost C can be modeled as a stochastic process (e.g., a Gaussian process), and a confidence interval can be computed using one or more samples of C
  • Observational noise can be incorporated into the computation of the confidence interval of the objective function
  • By integrating these, CherryPick can learn the objective function quickly and only take samples in the areas that most likely contain the minimum point (see the sketch below)
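A minimal sketch of such a loop over a discrete grid of candidate configurations, assuming scikit-learn and SciPy; `run_benchmark` is a hypothetical function that deploys a configuration and returns the measured cost. This only illustrates the idea, it is not CherryPick's implementation.

```python
# Minimal Bayesian Optimization sketch for cost minimization over a discrete
# set of cloud configurations (an illustration, not CherryPick's code).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(gp, candidates, best_cost):
    """EI for minimization: expected amount by which each candidate beats best_cost."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (best_cost - mu) / sigma
    return (best_cost - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def search(run_benchmark, candidates, n_init=3, max_iters=20, ei_frac=0.1):
    rng = np.random.default_rng(0)
    idx = rng.choice(len(candidates), size=n_init, replace=False)   # starting points
    X = candidates[idx]
    y = np.array([run_benchmark(x) for x in X])                     # observed costs
    kernel = Matern(nu=2.5)                                         # Matern 5/2 prior (see later slide)
    for _ in range(max_iters):
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
        ei = expected_improvement(gp, candidates, y.min())
        best = int(np.argmax(ei))
        if ei[best] < ei_frac * y.min():       # rough analogue of the "EI below 10%" stop rule
            break
        X = np.vstack([X, candidates[best]])
        y = np.append(y, run_benchmark(candidates[best]))
    return X[int(np.argmin(y))], y.min()
```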

SLIDE 11

Working of BO

SLIDE 12

Prior and Acquisition function

  • The prior is given assuming a Gaussian Process
  • The acquisition function is given using Expected Improvement (see below)

Φ and φ are the standard normal cumulative distribution function and the standard normal probability density function, respectively
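A standard closed form of Expected Improvement for minimization, written with the Φ and φ defined above, is:

```latex
\mathrm{EI}(x) = \bigl(C_{\min} - \mu(x)\bigr)\,\Phi(z) + \sigma(x)\,\varphi(z),
\qquad z = \frac{C_{\min} - \mu(x)}{\sigma(x)}
```

where μ(x) and σ(x) are the posterior mean and standard deviation of the Gaussian Process at configuration x, and C_min is the lowest cost observed so far.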

SLIDE 13

Design options and decisions

  • Prior function: a Gaussian Process is chosen as the prior
  • C is described using a mean function and a kernel (covariance) function
  • Matern with parameter 5/2 is chosen as the covariance function between inputs because it does not require strong smoothness (its standard form is given below)
  • Acquisition function: Expected Improvement is chosen as the acquisition function
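For reference, the standard Matern 5/2 covariance between two configurations x and x', with length scale ℓ and signal variance σ², is:

```latex
k_{5/2}(x, x') = \sigma^2 \left(1 + \frac{\sqrt{5}\,r}{\ell} + \frac{5 r^2}{3 \ell^2}\right) \exp\!\left(-\frac{\sqrt{5}\,r}{\ell}\right),
\qquad r = \lVert x - x' \rVert
```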

SLIDE 14

Design options and decisions

  • Stopping condition: stop when the EI is less than a threshold (10%) and at least N cloud configurations have been observed
  • Starting points: a quasi-random sequence is used to generate the starting points
  • Encoding cloud configurations: x is a vector of the number of VMs, number of cores, CPU speed per core, average RAM per core, disk count, disk speed, and network capacity of the VM
  • Most of the features are normalized and discretized (see the sketch below)
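A minimal sketch of such an encoding; the feature names and bounds are illustrative assumptions, not CherryPick's exact choices.

```python
# Encode a cloud configuration as the normalized feature vector x described above.
import numpy as np

FEATURES = ["num_vms", "num_cores", "cpu_speed_per_core_ghz", "ram_per_core_gb",
            "disk_count", "disk_speed_mbps", "network_gbps"]
MAX_VALUES = {"num_vms": 64, "num_cores": 32, "cpu_speed_per_core_ghz": 4.0,
              "ram_per_core_gb": 16.0, "disk_count": 8,
              "disk_speed_mbps": 1000.0, "network_gbps": 25.0}

def encode(config: dict) -> np.ndarray:
    """Normalize each feature to [0, 1] so no single dimension dominates the kernel distance."""
    return np.array([config[f] / MAX_VALUES[f] for f in FEATURES])

# Example: an 8-VM cluster (values are illustrative only)
x = encode({"num_vms": 8, "num_cores": 4, "cpu_speed_per_core_ghz": 2.4,
            "ram_per_core_gb": 4.0, "disk_count": 1,
            "disk_speed_mbps": 500.0, "network_gbps": 10.0})
```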
SLIDE 15

Handling uncertainties in clouds

  • The resources of clouds are shared by multiple users, so different workloads may interfere with one another
  • Failures and resource overloading can impact the completion time of a job

SLIDE 16

Implementation

SLIDE 17

Experimental Setup

  • Benchmark applications on Spark and Hadoop exercise different CPU/Disk/RAM/Network resources
  • TPC-DS: a recent benchmark for big data systems that models a decision support workload
  • TPC-H: another SQL benchmark that contains a number of ad-hoc decision support queries that process large amounts of data
  • TeraSort: a common benchmarking application for big data analytics
  • SparkReg: machine learning workloads on top of Spark
  • SparkKm: a clustering machine learning workload
SLIDE 18

Experimental Setup

  • Cloud configurations: four families in Amazon EC2: M4 (general purpose), C4 (compute optimized), R3 (memory optimized), I2 (disk optimized)
  • EI threshold = 10%, N = 6, and 3 initial samples; the EI threshold is chosen to give a good tradeoff between search cost and accuracy
  • Baselines: exhaustive search, coordinate descent
  • Metrics: running cost, search cost
SLIDE 19

Results

  • CherryPick finds the optimal configuration with low search time
SLIDE 20

Results

  • It reaches better configurations with more stability compared to random search on a similar budget

SLIDE 21

Results

  • CherryPick achieves running costs similar to a linear-predictor-based model, but with lower search cost and search time

SLIDE 22

Results

  • CherryPick can tune the EI threshold to trade off between search cost and accuracy
SLIDE 23

Results

  • Effectiveness of CherryPick is shown in three ways: navigation of the search space, estimation of running time vs. cluster size, and scaling with workload size

SLIDE 24

Discussion

  • Reliance on good representative workloads
  • Larger search space: complexity depends only on the number of samples, not on the number of candidates
  • Choice of prior: by choosing a Gaussian Process as the prior, the assumption is that the final function is a sample from a Gaussian process

SLIDE 25

Shortcomings of CherryPick

  • Model accuracy: tries to accurately model the performance metric, which requires more data
  • Cold start: Bayesian Optimization requires initial data to build the performance space
  • Fragility: overly sensitive to initial parameters (initial points, kernel function, process)

SLIDE 26

Scout: An Experienced Guide to Find the Best Cloud Configuration

SLIDE 27

Exploration and Exploitation

  • Any search-based method has two aspects: exploration and exploitation
  • Exploration: gather new information about the search space by executing a new cloud configuration
  • Exploitation: choose the most promising configuration based on the information gathered so far
  • Additional exploration incurs high cost, while exploitation without exploration leads to suboptimal solutions: the exploration-exploitation dilemma

SLIDE 28

Features

  • Search process efficiency: performance and workload characterization derived from historical data of previous workloads
  • Search process effectiveness: uses comprehensive performance data for prediction, including low-level performance information
  • Search process reliability: uses separate sets for unevaluated and evaluated configurations, and historical data to create a model for the current workload

SLIDE 29

Methodology

  • Low-level information is incorporated into the feature vector of the configuration
  • The set of all possible configurations is split into unevaluated and evaluated sets
  • To search for the next best configuration given a starting configuration, a function f(F(S_i), F(S_j), L_i) is learned
  • This function is a classifier that labels a candidate configuration as "better", "fair", or "worse" (see the sketch below)
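A minimal sketch of learning such a pairwise classifier from historical data; the random forest and the shape of the `history` records are assumptions for illustration, not Scout's actual model.

```python
# Pairwise-classifier sketch: given features F(S_i) of the current configuration,
# features F(S_j) of a candidate, and low-level metrics L_i observed while running
# S_i, predict whether S_j would be "better", "fair", or "worse".
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def make_row(f_si, f_sj, l_i):
    """Concatenate F(S_i), F(S_j), and L_i into one feature row."""
    return np.concatenate([f_si, f_sj, l_i])

def train_pairwise_classifier(history):
    """history: iterable of (F(S_i), F(S_j), L_i, label) tuples from previous workloads."""
    X = np.array([make_row(f_si, f_sj, l_i) for f_si, f_sj, l_i, _ in history])
    y = np.array([label for *_, label in history])   # "better" / "fair" / "worse"
    return RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

def rank_candidates(model, f_si, l_i, candidates):
    """Return unevaluated candidates sorted by predicted probability of being 'better'."""
    rows = np.array([make_row(f_si, f_sj, l_i) for f_sj in candidates])
    better_idx = list(model.classes_).index("better")
    probs = model.predict_proba(rows)[:, better_idx]
    return sorted(zip(candidates, probs), key=lambda pair: -pair[1])
```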

SLIDE 30

Search Strategy

  • Given F(S_i) and L_i, the prediction classes can be obtained for the unevaluated configurations
  • The next configuration is chosen such that the expected performance improves
  • Due to the use of historical data, the search space is reduced and exploitation dominates
  • The search stops when it can no longer find a better configuration
  • It also stops if it repeatedly fails to find better solutions because of an inaccurate performance model (see the search-loop sketch below)
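A search-loop sketch built on rank_candidates() from the previous block; `run_config` is a hypothetical function that deploys a configuration and returns its measured performance and low-level metrics. The probability threshold and misprediction tolerance values mirror the parameters listed in the experimental setup below.

```python
# Scout-style search sketch (an illustration, not Scout's implementation).
import numpy as np

def scout_search(model, start_config, candidates, run_config,
                 prob_threshold=0.5, misprediction_tolerance=3):
    current = start_config
    best_perf, l_i = run_config(current)                # lower is better (time or cost)
    unevaluated = [c for c in candidates if not np.array_equal(c, current)]
    mispredictions = 0
    while unevaluated and mispredictions < misprediction_tolerance:
        ranked = rank_candidates(model, current, l_i, unevaluated)
        nxt, p_better = ranked[0]
        if p_better < prob_threshold:                   # no candidate is likely to be better
            break
        perf, l_next = run_config(nxt)
        unevaluated = [c for c in unevaluated if not np.array_equal(c, nxt)]
        if perf < best_perf:                            # prediction confirmed: move to the new configuration
            best_perf, current, l_i = perf, nxt, l_next
            mispredictions = 0
        else:                                           # model said "better" but it was not
            mispredictions += 1
    return current, best_perf
```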

SLIDE 31

Experimental Setup

  • Workloads: diverse workloads (CPU-intensive, memory-heavy, IO-intensive, and network-intensive) such as PageRank, sorting, recommendation, OLAP, etc., run on Apache Hadoop and Apache Spark
  • Deployment choices: single-node as well as multi-node settings
  • Parameters: 1) labelled classes: "better+", "better", "fair", "worse", and "worse+"; 2) probability threshold: 0.5; 3) misprediction tolerance: 3 and 4 for single and multiple nodes respectively

SLIDE 32

Evaluation: Baselines

  • Random Search: uniformly samples the configuration space; a naive baseline method. Random-4, -6, and -8 represent random samples of 4, 6, and 8 configurations respectively
  • Coordinate Descent: searches one dimension (CPU type, memory size, etc.) at a time; it determines the best choice in that dimension and then continues choosing the best in the other dimensions
  • CherryPick

SLIDE 33

Evaluation: Metrics

1) Normalized performance: execution time or deployment cost
2) Search cost: the number of cloud configurations measured to find the right configuration
3) Reliability across the workloads

SLIDE 34

Performance Comparison

SLIDE 35

Reliability Comparison

  • Although both CherryPick and Scout find near-optimal solutions most of the time, Scout is less fragile

SLIDE 36

Why does Scout work better?

Scout knows the stopping point of the optimizer, which avoids unnecessary search; it uses the "probability threshold" and "misprediction tolerance" as stopping criteria. Scout's convergence speed is better than the other solutions: it finds a better solution, with a 25% improvement in accuracy, at every iteration.

SLIDE 37

Pros and Cons

Pros:

  • No need of accurate performance model.
  • Formulation of search as a classification problem.
  • Incorporation of historical data.

Cons:

  • Collection of low-level information incurs overhead (which must be amortized).
  • Bias of the model due to the incorporation of historical data.
  • Configuration space is not fully explored.
SLIDE 38

Micky: A Cheaper Alternative for Selecting Cloud Instances

slide-39
SLIDE 39

Terminologies

  • Exemplar configuration: a configuration that is near-optimal or satisfactory for the majority of workloads
  • Workload: a combination of application and data
  • Performance metrics: execution time, operational cost

SLIDE 40

Need for Collective Optimization

  • Search performance: searching for the most effective configuration using a single optimizer can lead to an increase in cost
  • Measurement cost: the total cost of running an optimizer
  • Large-scale cloud migration: elaborate optimizers are expensive and time-consuming
  • Limited budgets: the cost of single-workload optimizers pays off only for recurring workloads
  • Expanding cloud portfolio: cloud providers expand their portfolio more than 20 times a year
  • Seeding cloud optimizers: an exemplar configuration can be used to seed a single-workload optimizer

SLIDE 41

Normalized performance of workloads on VMs

SLIDE 42

Problem Formulation

Objective: find the best configuration (vm* ∈ VM) for multiple workloads (W) with fewer measurements (|E_w1 ∪ E_w2 ∪ ... ∪ E_wn|) using a collective optimization method, while the corresponding performance measure (y_w,vm*) is comparable to the one obtained with a single-workload optimizer.

S_w: the set of cloud configuration options for a workload w (|U_w| + |E_w| = |S_w|)
U_w: unevaluated configurations
E_w: evaluated pool

SLIDE 43

Multi-Armed Bandit Problem

Problem: an agent sequentially searches for a slot machine to maximize the total reward collected in the long run. To find the suitable slot machine, the agent needs to balance exploration and exploitation.
Exploration: acquiring information about the arms.
Exploitation: using the acquired information to select the arms.

SLIDE 44

Applying MAB to VM selection!

Objective: find the best VM (slot machine) that maximizes the reward for a group of workloads.
Arm: the choices of VM (slot machines).
Pull: the act of selecting a VM and measuring the performance of the workload on the selected VM.
Reward: the difference in performance between the selected and the optimal choice.
Budget: the number of measurements. The minimum budget is |VM| and the maximum budget is |VM| * |W| (number of VM types * number of workloads).

SLIDE 45

Slot-selection strategies

  • Epsilon-greedy: oscillates between exploration and exploitation
  • Thompson sampling: selects the arm with the highest probability of being the optimal choice
  • Upper Confidence Bound: chooses the arm that has the highest upper confidence bound; it tends to use arms with high expected rewards or high uncertainty (see the sketch below)
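A minimal UCB1 sketch for picking an exemplar VM type across a stream of workloads; `measure` is a hypothetical function returning a normalized reward in [0, 1], and UCB1 itself is an assumption for illustration rather than Micky's exact procedure.

```python
# UCB1 sketch: pull the VM "arm" with the highest upper confidence bound,
# then return the arm with the best empirical mean as the exemplar configuration.
import math

def ucb1_exemplar(vm_types, workloads, measure):
    counts = {vm: 0 for vm in vm_types}    # pulls per arm
    totals = {vm: 0.0 for vm in vm_types}  # cumulative reward per arm
    # Pull every arm once so each confidence bound is defined (minimum budget = |VM|)
    for vm, w in zip(vm_types, workloads):
        counts[vm] += 1
        totals[vm] += measure(w, vm)
    # Remaining workloads: mean reward plus an exploration bonus that shrinks with more pulls
    for t, w in enumerate(workloads[len(vm_types):], start=len(vm_types) + 1):
        vm = max(vm_types, key=lambda v: totals[v] / counts[v]
                 + math.sqrt(2 * math.log(t) / counts[v]))
        counts[vm] += 1
        totals[vm] += measure(w, vm)
    return max(vm_types, key=lambda v: totals[v] / counts[v])
```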
SLIDE 46

Experimental setup

  • Diverse workloads such as data processing, OLAP queries, etc., on Apache Hadoop 2.7, Spark 2.1, and Spark 1.5
  • Evaluated on Amazon EC2 (Elastic Compute Cloud)
  • Used 18 different VM types belonging to three instance families: 1) compute-optimized instances, 2) memory-optimized instances, and 3) general-purpose instances

SLIDE 47

Evaluation

Baselines: 1) Brute force: measures all possible configurations; 2) CherryPick: the state-of-the-art method; 3) Random-4 and Random-8: randomly measure 4 and 8 configurations.
Metrics: normalized performance in terms of execution time and operational cost.

SLIDE 48

Search Performance


SLIDE 49

Measurement Cost Comparison

SLIDE 50

When not to use Micky?

Answer: when a user demands near-optimal solutions for highly recurring workloads. The knee point is calculated as K · f(Δp, Cp) ≥ g(Δm, Cm), where K is the recurrence of a workload (the knee point), f is the opportunity loss due to inferior search performance, g is the reduction in measurement cost with collective optimization, Δp is the delta of normalized search performance, Δm is the delta of measurement cost, and Cp and Cm are costs defined by the user.

SLIDE 51

Performance of MAB algorithms

UCB is found to be more stable

SLIDE 52

Alleviation of sub-optimal choices

Combination of Micky and Scout: Micky has a low measurement cost, while Scout assures a performance guarantee.

SLIDE 53

Learnings

  • Optimizing a batch of workloads is found to be cheaper than single-workload optimization, except in the case of recurring workloads
  • A slight decrease in performance can result in a large reduction in measurement cost (using an exemplar configuration)
  • Workloads for which the exemplar configuration performs poorly can be mitigated by the use of a single-workload optimizer (like Scout)

SLIDE 54

Determining factors for a Cloud Optimizer

Performance delta (Δ): represents the search performance.
Low-level metrics (LLM): runtime information such as CPU utilization, memory usage, and I/O rates.
Historical data: execution records of workloads on cloud configurations.
Budget: the measurement cost a user is willing to pay for an optimizer.

SLIDE 55

How to select a Cloud Optimizer?

SLIDE 56

Questions?