  1. ML for Resource Management
  Arjun Karuvally, Priyanka Mary Mammen

  2. Introduction
  ● Big data analytics on the cloud is crucial to industry and is growing rapidly
  ● A number of techniques are used for data processing: MapReduce, SQL-like languages, deep learning, and in-memory analytics
  ● A cluster of virtual machines is the execution environment for these types of jobs
  ● Different analytic jobs have diverse behavior and resource requirements

  3. Problem Statement
  ● The task of resource management is to find the right cloud configuration for an application
  ● This configuration includes the number of VMs, number of CPUs, CPU speed per core, RAM, disk count, disk speed, network capacity, etc.
  ● Any technique used for resource management in the cloud needs to create a performance model
  ● This performance model indicates which cloud configuration is best for the particular job being run

  4. Motivation
  ● Choosing the right configuration for an application is essential to service quality and commercial competitiveness
  ● Many jobs are recurring, meaning that similar workloads are executed repeatedly
  ● Choosing poorly can result in a slowdown of 2-3x on average and 12x in the worst case

  5. Challenges
  ● Evaluating all possible cloud configurations to find the best one is prohibitively expensive
  ● Each workload has its own preferred cloud configuration, so it is difficult to come up with one configuration for all workloads
  ● The resource requirements to achieve a given objective (execution time or running cost) for a specific workload are opaque
  ● Running time and cost have a complex relation to the resources of cloud instances

  6. CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics

  7. Features
  ● Uses Bayesian Optimization to build a performance model for various applications
  ● Models are just accurate enough to find a near-optimal configuration with only a few test runs
  ● Bayesian Optimization minimizes the number of samples needed to reach a near-optimal configuration with a good confidence interval

  8. Problem Formulation
  ● For a given application workload, the objective is to find an optimal or near-optimal cloud configuration that satisfies a performance requirement
  ● The problem is formulated mathematically as: minimize C(x) = P(x) × T(x), subject to T(x) ≤ T_max
  ● The cloud configuration is represented by x; C represents the cost
  ● P is the price per unit time for the VMs in x; T is the running-time function; T_max is the performance requirement

  9. Problem Formulation
  ● The unknown required to compute the cost is the running-time function T for different configurations x
  ● Since evaluating T is expensive, Bayesian Optimization is used to directly search for an approximate solution of the equation at significantly smaller cost
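The objective above can be sketched in a few lines of Python. The VM prices and configuration fields below are hypothetical, for illustration only; in practice T is only known by actually running the job.

```python
# Sketch of the cost objective C(x) = P(x) * T(x).
# Prices and configuration fields are hypothetical, not values from the paper.

HOURLY_PRICE = {"general": 0.10, "compute": 0.105, "memory": 0.166}  # $/VM/hour

def price_per_hour(x):
    # P(x): total price per unit time for the VMs in configuration x.
    return x["num_vms"] * HOURLY_PRICE[x["vm_family"]]

def cost(x, running_time_hours):
    # C(x) = P(x) * T(x); T(x) must be measured by running the workload.
    return price_per_hour(x) * running_time_hours

config = {"num_vms": 4, "vm_family": "general"}
print(cost(config, running_time_hours=2.5))  # 4 * 0.10 * 2.5 = 1.0
```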

  10. Bayesian Optimization
  ● Bayesian Optimization solves optimization problems like the previous equation, where the objective function C is unknown but can be observed through experiments
  ● The cost C can be modeled as a stochastic process (e.g., Gaussian); a confidence interval can be computed from one or more samples of C
  ● Observational noise can be incorporated in the computation of the confidence interval of the objective function
  ● By integrating this, CherryPick can learn the objective function quickly and only take samples in the areas most likely to contain the minimum point

  11. Working of BO
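The workflow can be sketched end-to-end on a toy 1-D objective: fit a Gaussian-process surrogate to the samples taken so far, maximize expected improvement to pick the next sample, and repeat. The toy cost function, kernel length scale, and candidate grid below are illustrative choices, not values from the paper.

```python
# A minimal Bayesian-optimization loop on a toy 1-D cost function.
import math
import numpy as np

def toy_cost(x):
    # Stand-in for "run the job on configuration x and measure the cost".
    return (x - 0.3) ** 2 + 0.05

def matern52(a, b, ell=0.3):
    # Matern 5/2 kernel, the covariance function CherryPick uses.
    r = np.abs(a[:, None] - b[None, :])
    s = math.sqrt(5.0) * r / ell
    return (1.0 + s + s * s / 3.0) * np.exp(-s)

def expected_improvement(mu, sigma, c_min):
    # EI for minimization under a N(mu, sigma^2) posterior at each candidate.
    sigma = np.maximum(sigma, 1e-12)
    z = (c_min - mu) / sigma
    pdf = np.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return (c_min - mu) * cdf + sigma * pdf

candidates = np.linspace(0.0, 1.0, 101)
X = np.array([0.0, 0.5, 1.0])          # initial (quasi-random) samples
y = toy_cost(X)
for _ in range(5):                     # a few BO iterations
    K = matern52(X, X) + 1e-6 * np.eye(len(X))   # jitter for stability
    Ks = matern52(candidates, X)
    mu = Ks @ np.linalg.solve(K, y)              # GP posterior mean
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    sigma = np.sqrt(np.clip(var, 0.0, None))     # GP posterior std dev
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.append(X, x_next)
    y = np.append(y, toy_cost(x_next))

best = X[np.argmin(y)]
print(best)  # close to the true minimum at 0.3
```

Only eight evaluations of the "expensive" function are made in total, which is the point of the technique: samples concentrate where the surrogate predicts the minimum may lie.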

  12. Prior and Acquisition function
  ● The prior is given assuming a Gaussian Process
  ● The acquisition function is given by Expected Improvement: EI(x) = (c_min − μ(x)) Φ(Z) + σ(x) φ(Z), where Z = (c_min − μ(x)) / σ(x), μ(x) and σ(x) are the posterior mean and standard deviation, and c_min is the best cost observed so far
  ● Φ and φ are the standard normal cumulative distribution function and standard normal probability density function, respectively
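The closed-form EI above can be computed directly with the standard library; the numbers in the example are arbitrary, chosen only to show a candidate whose predicted cost beats the best cost seen so far.

```python
import math

def norm_pdf(z):
    # Standard normal probability density function (the slide's phi).
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    # Standard normal cumulative distribution function (the slide's Phi).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, c_min):
    # EI(x) = (c_min - mu) * Phi(Z) + sigma * phi(Z),  Z = (c_min - mu) / sigma
    if sigma == 0.0:
        return max(c_min - mu, 0.0)
    z = (c_min - mu) / sigma
    return (c_min - mu) * norm_cdf(z) + sigma * norm_pdf(z)

# Predicted cost 1.0 is below the best observed cost 1.2, so EI is substantial.
print(round(expected_improvement(mu=1.0, sigma=0.5, c_min=1.2), 4))  # 0.3152
```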

  13. Design options and decisions
  ● Prior function: a Gaussian Process is chosen as the prior
  ● C is described using a mean function and a kernel covariance function
  ● Matérn with parameter 5/2 is chosen as the covariance function between inputs because it does not require strong smoothness
  ● Acquisition function: Expected Improvement is chosen as the acquisition function
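The Matérn 5/2 covariance between two scalar inputs has a simple closed form. The length scale and variance are GP hyperparameters; the defaults below are illustrative.

```python
import math

def matern52(x1, x2, length_scale=1.0, variance=1.0):
    # Matern 5/2 covariance: twice differentiable, but without the strong
    # smoothness assumption of a squared-exponential kernel.
    r = abs(x1 - x2)
    s = math.sqrt(5.0) * r / length_scale
    return variance * (1.0 + s + s * s / 3.0) * math.exp(-s)

# Covariance is maximal for identical inputs and decays with distance.
print(matern52(0.2, 0.2))                       # 1.0
print(matern52(0.2, 0.4) > matern52(0.2, 0.9))  # True
```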

  14. Design options and decisions
  ● Stopping condition: stop when the EI is less than a threshold (10%) and at least N cloud configurations have been observed
  ● Starting points: a quasi-random sequence is used to generate the starting points
  ● Encoding cloud configurations: x is a vector of the number of VMs, number of cores, CPU speed per core, average RAM per core, disk count, disk speed, and network capacity of the VM
  ● Most of the features are normalized and discretized
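The encoding step can be sketched as follows. The feature names match the slide, while the value ranges used for min-max normalization are hypothetical stand-ins for the candidate set's actual bounds.

```python
# Encode a cloud configuration as the feature vector x described above,
# min-max normalizing each feature into [0, 1]; the ranges are hypothetical.

FEATURES = ["num_vms", "num_cores", "cpu_speed", "ram_per_core",
            "disk_count", "disk_speed", "network_capacity"]
RANGES = {"num_vms": (1, 32), "num_cores": (1, 64), "cpu_speed": (2.0, 3.5),
          "ram_per_core": (1, 16), "disk_count": (1, 8),
          "disk_speed": (100, 1000), "network_capacity": (1, 20)}

def encode(config):
    # Map each raw feature to (value - min) / (max - min).
    x = []
    for name in FEATURES:
        lo, hi = RANGES[name]
        x.append((config[name] - lo) / (hi - lo))
    return x

config = {"num_vms": 8, "num_cores": 4, "cpu_speed": 2.9, "ram_per_core": 4,
          "disk_count": 2, "disk_speed": 500, "network_capacity": 10}
print(encode(config))
```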

  15. Handling uncertainties in clouds
  ● Cloud resources are shared by multiple users, so different workloads may interfere with one another
  ● Failures and resource overloading can impact the completion time of a job

  16. Implementation

  17. Experimental Setup
  ● Benchmark applications on Spark and Hadoop that exercise different CPU/disk/RAM/network resources
  ● TPC-DS: a recent benchmark for big data systems that models a decision support workload
  ● TPC-H: another SQL benchmark that contains a number of ad-hoc decision support queries processing large amounts of data
  ● TeraSort: a common benchmarking application for big data analytics
  ● SparkReg: machine learning workloads on top of Spark
  ● SparkKm: a clustering machine learning workload

  18. Experimental Setup
  ● Cloud configurations: four families in Amazon EC2: M4 (general purpose), C4 (compute optimized), R3 (memory optimized), I2 (disk optimized)
  ● EI threshold = 10%, N = 6, and 3 initial samples; the EI threshold is chosen to give a good tradeoff between search cost and accuracy
  ● Baselines: exhaustive search, coordinate descent
  ● Metrics: running cost, search cost

  19. Results
  ● CherryPick finds the optimal configuration with low search time

  20. Results
  ● It reaches better configurations with more stability than random search on a similar budget

  21. Results
  ● CherryPick achieves running costs similar to a linear-predictor-based model, but with lower search cost and time

  22. Results
  ● CherryPick can tune the EI threshold to trade off between search cost and accuracy

  23. Results
  Effectiveness of CherryPick:
  ● Scaling with workload size
  ● Navigation of the search space
  ● Estimation of running time vs. cluster size

  24. Discussion
  ● Reliance on good representative workloads
  ● Larger search space: complexity depends only on the number of samples, not the number of candidates
  ● Choice of prior: choosing a Gaussian prior assumes the final function is a sample from a Gaussian distribution

  25. Shortcomings of CherryPick
  ● Model accuracy: tries to accurately model the performance metric, which requires more data
  ● Cold start: Bayesian approximation requires initial data to build the performance space
  ● Fragility: overly sensitive to initial parameters (initial points, kernel function, process)

  26. Scout: An Experienced Guide to Find the Best Cloud Configuration

  27. Exploration and Exploitation
  ● Any search-based method has two aspects: exploration and exploitation
  ● Exploration: gather new information about the search space by executing a new cloud configuration
  ● Exploitation: choose the most promising configuration based on the information collected
  ● Additional exploration incurs high cost, while exploitation without exploration leads to suboptimal solutions: the exploration-exploitation dilemma

  28. Features
  ● Search process efficiency: performance and workload characterization derived from historical data of previous workloads
  ● Search process effectiveness: uses comprehensive performance data for prediction, including low-level performance information
  ● Search process reliability: maintains separate sets of unevaluated and evaluated configurations, and uses historical data to create a model for the current workload

  29. Methodology
  ● Low-level information is incorporated into the feature vector of the configuration
  ● The set of all possible configurations is split into unevaluated and evaluated sets
  ● To search for the next best configuration given a starting configuration, a function f(F(S_i), F(S_j), L_i) is learned
  ● This is a classification function that classifies the transition as "better", "fair", or "worse"
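The interface of that function can be sketched as below: given the features F(S_i) of an evaluated configuration, the features F(S_j) of a candidate, and the low-level metrics L_i observed while running S_i, predict the transition class. The rule-based body is a hypothetical stand-in for the model Scout learns from historical runs; the feature names and thresholds are illustrative only.

```python
# Hypothetical stand-in for Scout's learned pairwise classifier
# f(F(S_i), F(S_j), L_i) -> {"better", "fair", "worse"}.

def predict_transition(f_i, f_j, low_level_i):
    # If the evaluated run was CPU-bound, adding cores should help.
    if low_level_i["cpu_util"] > 0.9 and f_j["cores"] > f_i["cores"]:
        return "better"
    # If memory pressure was high, more RAM per core should help.
    if low_level_i["mem_util"] > 0.9 and f_j["ram_per_core"] > f_i["ram_per_core"]:
        return "better"
    # Paying for cores the run did not use is likely wasteful.
    if f_j["cores"] > f_i["cores"] and low_level_i["cpu_util"] < 0.3:
        return "worse"
    return "fair"

f_i = {"cores": 8, "ram_per_core": 2}
f_j = {"cores": 16, "ram_per_core": 2}
print(predict_transition(f_i, f_j, {"cpu_util": 0.95, "mem_util": 0.4}))  # better
```

The real classifier is trained on historical workload data rather than hand-written rules, but the point stands: low-level metrics from evaluated runs steer which unevaluated configurations look promising.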

  30. Search Strategy
  ● Given <F(S_i), L_i>, we can obtain prediction classes for the unevaluated configurations
  ● The next configuration is chosen so that the expected performance improves
  ● Due to the use of historical data, the search space shrinks and exploitation dominates
  ● The search stops when it can no longer find a better configuration
  ● It also stops if it fails to find better solutions due to an inaccurate performance model

  31. Experimental Setup
  ● Workloads: diverse workloads (CPU-intensive, memory-heavy, IO-intensive, and network-intensive), such as PageRank, sorting, recommendation, and OLAP, run on Apache Hadoop and Apache Spark
  ● Deployment choices: single-node as well as multi-node settings
  ● Parameters: 1) labeled classes: "better+", "better", "fair", "worse", and "worse+"; 2) probability threshold: 0.5; 3) misprediction tolerance: 3 and 4 for single and multiple nodes, respectively
