ML for Resource Management
Arjun Karuvally, Priyanka Mary Mammen
Introduction
- Big data analytics on the cloud is crucial for industry and is growing rapidly
- A number of techniques are used for data processing: MapReduce, SQL-like languages, deep learning, and in-memory analytics
- A cluster of virtual machines is the execution environment for these types of jobs
- Different analytic jobs have diverse behavior and resource requirements
Problem Statement
- The task of resource management is to find the right cloud configuration
for an application
- This configuration includes the number of VMs, number of CPUs, CPU speed per core, RAM, disk count, disk speed, network capacity, etc.
- Any technique used for resource management in the cloud needs to create a performance model
- This performance model indicates which cloud configuration is best for the particular job being run
Motivation
- Choosing the right configuration for an application is essential to service
quality and commercial competitiveness.
- Many jobs are recurring, meaning that similar workloads are executed repeatedly
- Choosing poorly can result in a slowdown of 2-3x on average and 12x in
the worst case
Challenges
- Evaluating all possible cloud configurations to find the best one is prohibitively expensive
- Each workload has its own preferred cloud configuration, so it is difficult to come up with one configuration for all workloads
- The resource requirements needed to achieve a certain objective (execution time or running cost) for a specific workload are opaque
- The running time and cost have a complex relation to the resources of cloud instances
CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics
Features
- Uses Bayesian Optimization to build performance model for various
applications
- Models are just accurate enough to find a near-optimal configuration with only a few test runs
- Bayesian optimization makes it possible to reach near-optimal configurations, with a good confidence interval, using a minimal number of samples
Problem Formulation
- For a given application workload, the objective is to find an optimal or near-optimal cloud configuration that satisfies a performance requirement
- The problem is formulated mathematically as: minimize C(x) = P(x) × T(x), subject to T(x) ≤ Tmax
- The cloud configuration is represented by x, and C represents the cost
- P is the price per unit time for the VMs in x, and T is the running-time function
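The cost objective above can be sketched in a few lines; the price table and measured runtimes below are hypothetical examples, not values from the paper.

```python
# Sketch of the cost objective C(x) = P(x) * T(x), where P(x) is the
# price per unit time of configuration x and T(x) its running time.

def cost(price_per_hour, running_time_hours):
    """Total cost C(x) of running the job on a configuration x."""
    return price_per_hour * running_time_hours

# Hypothetical candidates: cluster price ($/hr) and measured runtime (hrs).
candidates = {
    "4x m4.large":  (0.40, 2.0),
    "2x c4.xlarge": (0.50, 1.2),
    "2x r3.xlarge": (0.66, 1.5),
}

best = min(candidates, key=lambda x: cost(*candidates[x]))
```

The hard part, of course, is that T(x) is unknown in advance, which is exactly why an exhaustive evaluation of `candidates` is replaced by Bayesian optimization.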
Problem Formulation
- The unknown required to compute the cost is the function T for different
configurations x
- Since this is expensive, Bayesian optimization is used to directly search for an approximate solution of the equation at a significantly smaller cost
Bayesian Optimization
- Bayesian optimization is used to solve optimization problems like the previous equation, where the objective function C is unknown but can be observed through experiments
- Cost (C) can be modeled as a stochastic process (e.g., a Gaussian process), and a confidence interval can be computed using one or more samples from C
- Observational noise can be incorporated into the computation of the confidence interval of the objective function
- By integrating these, CherryPick can learn the objective function quickly and take samples only in the areas most likely to contain the minimum point
Working of BO
Prior and Acquisition function
- Prior is given assuming a Gaussian Process
- Acquisition function is given using Expected Improvement
Φ and φ are the standard normal cumulative distribution function and the standard normal probability density function, respectively
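The Expected Improvement acquisition can be computed in closed form from the Gaussian posterior. A minimal sketch for minimization, using only the standard library (the standard form is EI = σ·(z·Φ(z) + φ(z)) with z = (y_best − μ)/σ):

```python
from math import erf, exp, pi, sqrt

def norm_cdf(z):
    """Standard normal CDF Φ(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def norm_pdf(z):
    """Standard normal PDF φ(z)."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def expected_improvement(mu, sigma, y_best):
    """EI at a point whose posterior is N(mu, sigma^2), for minimization:
    EI = sigma * (z*Φ(z) + φ(z)),  z = (y_best - mu) / sigma."""
    if sigma == 0.0:
        return 0.0
    z = (y_best - mu) / sigma
    return sigma * (z * norm_cdf(z) + norm_pdf(z))
```

Points with a low posterior mean or a high posterior uncertainty both score well, which is what lets the search sample only promising regions.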
Design options and decisions
- Prior function - Gaussian Process is chosen as the prior function
- C is described using a mean function and a kernel covariance function
- Matérn with parameter 5/2 is chosen as the covariance function between inputs because it does not require strong smoothness
- Acquisition function - Expected Improvement is chosen as the
acquisition function
Design options and decisions
- Stopping condition - the search stops when the EI is less than a threshold (10%) and at least N cloud configurations have been observed
- Starting points - a quasi-random sequence is used to generate the starting points
- Encoding cloud configuration - x is a vector of the number of VMs, number of cores, CPU speed per core, average RAM per core, disk count, disk speed, and network capacity of the VM
- Most of the features are normalized and discretized
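The encoding above can be sketched as follows; the feature names mirror the slide, but the maxima used for normalization are purely illustrative assumptions, not values from the paper:

```python
# Sketch of encoding a cloud configuration as a normalized feature vector.
# Each feature is scaled to [0, 1] by an (assumed) maximum over the search space.

FEATURE_MAX = {  # illustrative maxima, not from the paper
    "num_vms": 48, "cores_per_vm": 16, "cpu_speed_ghz": 3.0,
    "ram_per_core_gb": 8.0, "disk_count": 8, "disk_speed_mbps": 500.0,
    "network_gbps": 10.0,
}

def encode(config):
    """Encode a configuration dict as a normalized feature vector
    (features in sorted-key order for a stable layout)."""
    return [config[k] / FEATURE_MAX[k] for k in sorted(FEATURE_MAX)]

x = encode({"num_vms": 24, "cores_per_vm": 8, "cpu_speed_ghz": 2.4,
            "ram_per_core_gb": 4.0, "disk_count": 2,
            "disk_speed_mbps": 250.0, "network_gbps": 10.0})
```

Normalizing keeps one large-magnitude feature (e.g., disk speed in MB/s) from dominating the kernel's distance computation.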
Handling uncertainties in clouds
- The resources of clouds are shared by multiple users so different
workloads may interfere with one another
- Failures and resource overloading can impact the completion time of a
job.
Implementation
Experimental Setup
- Benchmark applications on Spark and Hadoop to exercise different
CPU/Disk/RAM/Network resources
- TPC-DS - a recent benchmark for big data systems that models a decision support workload
- TPC-H - another SQL benchmark that contains a number of ad-hoc decision support queries processing large amounts of data
- Terasort - a common benchmarking application for big data analytics
- SparkReg - machine learning workloads on top of Spark
- SparkKm - a clustering machine learning workload
Experimental Setup
- Cloud configurations - Four families in Amazon EC2: M4 (general purpose),
C4 (compute optimized), R3 (memory optimized), I2 (disk optimized)
- EI = 10%, N=6 and 3 initial samples. EI is chosen such that it gives a good
tradeoff between search cost and accuracy
- Baselines - Exhaustive Search, Coordinate descent
- Metrics - running cost, search cost
Results
- CherryPick finds the optimal configuration with low search time
Results
- It reaches better configurations with more stability compared to random search on a similar budget
Results
- CherryPick achieves running costs similar to a linear-predictor-based model, but with lower search cost and time
Results
- CherryPick can tune EI to trade-off between search cost and accuracy
Results
- Effectiveness of CherryPick
[Figures: navigation of the search space; estimation of running time vs. cluster size; scaling with workload size]
Discussion
- Reliance on good representative workloads
- Larger search space - Complexity depends only on number of samples
and not the number of candidates.
- Choice of prior - with a Gaussian process as the prior, the assumption is that the objective function is a sample from a Gaussian process
Shortcomings of CherryPick
- Model Accuracy - Tries to accurately model the performance metric which
requires more data
- Cold Start - Bayesian optimization requires initial data to build the performance model
- Fragility - Overly sensitive to initial parameters - initial points, kernel
function, process
Scout: An Experienced Guide to Find the Best Cloud Configuration
Exploration and Exploitation
- Any search based method has two aspects - exploration and exploitation
- Exploration - Gather new information about the search space by
executing a new cloud configuration
- Exploitation - Choose the most promising configuration based on the information gathered so far
- Additional exploration incurs high cost, while exploitation without exploration leads to suboptimal solutions - the exploration-exploitation dilemma
Features
- Search process efficiency - performance and workload characterization derived from historical data of previous workloads
- Search process effectiveness - Uses comprehensive performance data for
prediction, uses low level performance information.
- Search process reliability - Using different sets for unevaluated and
evaluated configurations, historical data to create a model for current workload.
Methodology
- Low level information is incorporated into the feature vector of the
configuration
- The set of all possible configurations are taken and split into unevaluated
and evaluated
- To search for the next best configuration given a starting configuration, a function f(F(S_i), F(S_j), L_i) is learned
- This is a classification function that labels a candidate configuration as “better”, “fair”, or “worse”
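The pairwise formulation can be sketched by showing how training labels are derived from historical runs; the tolerance band and the runtimes below are assumptions for illustration, not Scout's actual thresholds:

```python
# Sketch of Scout's pairwise labeling: for an evaluated configuration S_i
# and a candidate S_j, label the move i -> j from historical runtimes.

FAIR_BAND = 0.05  # within +/-5% counts as "fair" (assumed threshold)

def label_pair(runtime_i, runtime_j):
    """Class label for moving from configuration i to configuration j."""
    if runtime_j < runtime_i * (1 - FAIR_BAND):
        return "better"
    if runtime_j > runtime_i * (1 + FAIR_BAND):
        return "worse"
    return "fair"

def make_training_pairs(history):
    """history: {config_name: (features, runtime)} from past workloads.
    Returns (F(S_i), F(S_j), label) triples for training the classifier."""
    pairs = []
    for ci, (fi, ti) in history.items():
        for cj, (fj, tj) in history.items():
            if ci != cj:
                pairs.append((fi, fj, label_pair(ti, tj)))
    return pairs
```

A standard multi-class classifier (e.g., a random forest) would then be trained on these triples; Scout itself additionally feeds in the low-level performance vector L_i.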
Search Strategy
- Given <F(S_i), L_i> we can obtain the different prediction classes for
unevaluated configurations
- The next best configuration is chosen such that the expected
performance is improved.
- Due to the use of historical data, the search space is reduced and more exploitation is possible
- Search stops when it can no longer find a better configuration
- Also stops if it fails to find better solutions due to an inaccurate
performance model
Experimental Setup
Workloads: diverse workloads (CPU-intensive, memory-heavy, IO-intensive, and network-intensive) such as PageRank, sorting, recommendation, OLAP, etc., run on Apache Hadoop and Apache Spark
Deployment choices: single-node as well as multi-node settings
Parameters:
1) Labelled classes: “better+”, “better”, “fair”, “worse”, and “worse+”
2) Probability threshold: 0.5
3) Misprediction tolerance: 3 and 4 for single and multiple nodes, respectively
Evaluation: Baselines
Random Search - uniformly samples the configuration space; a naive baseline method. Random-4, -6, and -8 represent random samples of 4, 6, and 8 configurations, respectively
Coordinate Descent - searches one dimension (CPU type, memory size, etc.) at a time; it determines the best choice for a dimension and then continues to choose the best along the other dimensions
CherryPick
Evaluation: Metrics
1) Normalized performance - execution time or deployment cost
2) Search cost - number of cloud configurations measured to find the right configuration
3) Reliability across the workloads
Performance Comparison
Reliability Comparison
- Although both CherryPick and Scout find near-optimal solutions most of the time, Scout is less fragile
Why does Scout work better?
- Scout knows the stopping point of the optimizer, which avoids unnecessary search; it uses a “probability threshold” and “misprediction tolerance” as the stopping criteria
- The convergence speed of Scout is better than that of other solutions
- Scout finds a better solution, with a 25% improvement in accuracy at every iteration
Pros and Cons
Pros:
- No need of accurate performance model.
- Formulation of search as a classification problem.
- Incorporation of historical data.
Cons:
- Collection of low-level information incurs overhead (which must be amortized).
- Bias of the model due to the incorporation of historical data.
- Configuration space is not fully explored.
Micky: A Cheaper Alternative for Selecting Cloud Instances
Terminologies
Exemplar configuration: a configuration that is near-optimal or satisfactory for the majority of workloads
Workload: a combination of application and data
Performance metrics: execution time, operational cost
Need for Collective Optimization
- Search Performance: Searching for the most effective configuration
using a single optimizer can lead to an increase in cost.
- Measurement Cost: Total cost of running an optimizer
- Large Scale Cloud Migration: Elaborate optimizers are expensive and
time-consuming.
- Limited budgets - single-workload optimizers are cost-effective only for recurring workloads
- Expanding cloud portfolio: Cloud providers expand their portfolio more
than 20 times a year.
- Seed cloud optimizers: Exemplar configuration can be used to seed
single-optimizer
Normalized performance of workloads on VMs
Problem Formulation
Objective: find the best configuration (vm* ∈ VM) for multiple workloads (W) with fewer measurements (|Ew1∪Ew2∪Ew3...∪Ewn|) using a collective optimization method, while the corresponding performance measure (y_{w,vm*}) is comparable to that of a single optimizer.
SW - set of cloud configuration options for a workload w (|UW| + |Ew| = |SW|)
UW - unevaluated configuration pool
Ew - evaluated pool
Multi-Armed Bandit Problem
Problem: an agent sequentially searches for a slot machine to maximize the total reward collected in the long run. To find the suitable slot machine, the agent needs to balance exploration and exploitation.
Exploration: acquiring information about the arms
Exploitation: using the acquired information to select the arms
Applying MAB to VM selection!
Objective: find the best VM (slot machine) that maximizes the reward for a group of workloads
Arm: choice of VM (slot machine)
Pull: act of selecting a VM and measuring the performance of the workload on the selected VM
Reward: the difference in performance between the selected and the optimal choice
Budget: number of measurements; the minimum budget is |VM| and the maximum budget is |VM| × |W| (number of VM types × number of workloads)
Slot-selection strategies
- Epsilon-greedy: oscillates between exploration and exploitation
- Thompson sampling: selects the arm with the highest probability of being the optimal choice
- Upper Confidence Bound: chooses the arm that has the highest upper confidence bound; it tends to favor arms with high expected rewards or high uncertainty
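Of these, epsilon-greedy is the simplest to sketch. In the toy below the arm rewards are simulated; in Micky a "pull" would run a workload on the chosen VM type and measure its normalized performance:

```python
import random

# Minimal epsilon-greedy sketch for VM selection as a multi-armed bandit.
# `true_rewards` are hypothetical per-VM mean rewards; observations are noisy.

def epsilon_greedy(true_rewards, budget, eps=0.1, seed=0):
    """Return the estimated mean reward per arm after `budget` pulls."""
    rng = random.Random(seed)
    counts = [0] * len(true_rewards)
    means = [0.0] * len(true_rewards)
    for _ in range(budget):
        if 0 in counts:
            arm = counts.index(0)                 # try every arm at least once
        elif rng.random() < eps:
            arm = rng.randrange(len(true_rewards))  # explore
        else:
            arm = means.index(max(means))           # exploit best arm so far
        reward = true_rewards[arm] + rng.gauss(0, 0.01)  # noisy observation
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
    return means

means = epsilon_greedy([0.2, 0.9, 0.5], budget=300)
```

With a budget well above |VM|, the estimates concentrate on the best arm while spending only a small fraction of pulls on the others, which is the measurement-cost saving Micky exploits.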
Experimental setup
- Diverse workloads such as data processing, OLAP queries etc on Apache
Hadoop 2.7, Spark 2.1 and Spark 1.5
- Evaluated on Amazon EC2 (Elastic Compute Cloud)
- Used 18 different VM types belonging to three instance families:
1) compute optimized instances 2) memory optimized instances and 3) general purpose instances.
Evaluation
Baselines:
1) Brute force - measures all possible configurations
2) CherryPick - state-of-the-art method
3) Random-4 and Random-8 - randomly measure 4 and 8 configurations
Metrics: normalized performance in terms of execution time and operational cost
Search Performance
Measurement Cost Comparison
When to not use Micky?
Ans: when a user demands near-optimal solutions for highly recurring workloads. The knee point is calculated as: K · f(Δp, Cp) ≥ g(Δm, Cm)
K is the recurrence count of the workload at the knee point, f is the opportunity loss due to inferior search performance, g is the reduction in measurement cost with collective optimization, Δp is the delta in normalized search performance, Δm is the delta in measurement cost, and Cp and Cm are costs defined by the user
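The knee-point test can be sketched as a simple break-even check. The linear forms of f and g below are assumptions for illustration; the slides do not specify them:

```python
# Hedged sketch of the knee-point decision: collective optimization (Micky)
# pays off while K * f(dp, Cp) < g(dm, Cm). The functional forms of f and g
# are assumed to be simple weightings here.

def prefer_single_optimizer(K, dp, Cp, dm, Cm):
    """True once recurrence K makes the cumulative opportunity loss
    outweigh the one-time measurement-cost saving."""
    f = dp * Cp   # opportunity loss per run (assumed form)
    g = dm * Cm   # measurement-cost reduction (assumed form)
    return K * f >= g
```

Intuitively: the slightly inferior exemplar configuration costs a little extra on every recurrence, so a workload that recurs often enough justifies the up-front expense of a per-workload optimizer.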
Performance of MAB algorithms
UCB is found to be more stable
Alleviation of sub-optimal choices
Combination of Micky and Scout: Micky has a low measurement cost, while Scout provides a performance guarantee
Learnings
- Optimizing a batch of workloads is found to be cheaper than single-workload optimization, except in the case of recurring loads
- A slight decrease in performance can result in a large reduction in measurement cost (using an exemplar configuration)
- The loads on which the exemplar configuration performs poorly can be optimized individually