
Resource Management

Marco Serafini

COMPSCI 532 Lecture 17


What Are the Functions of an OS?

  • Virtualization
    • CPU scheduling
    • Memory management (e.g. virtual memory)
  • Concurrency
    • E.g. allocating processes and threads
  • Persistence
    • Access to I/O

The Era of Clusters

  • “The cluster as a computer”
  • Q: Is there an OS for “the cluster”?
  • Q: What should it do?

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center

Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, Ion Stoica

University of California, Berkeley

Abstract

We present Mesos, a platform for sharing commodity clusters between multiple diverse cluster computing frameworks, such as Hadoop and MPI. Sharing improves cluster utilization and avoids per-framework data replication. Mesos shares resources in a fine-grained manner, allowing frameworks to achieve data locality by taking turns reading data stored on each machine. To support the sophisticated schedulers of today’s frameworks, Mesos introduces a distributed two-level scheduling mechanism called resource offers. Mesos decides how many resources to offer each framework, while frameworks decide which resources to accept and which computations to run on them. Our results show that Mesos can achieve near-optimal data locality when sharing the cluster among diverse frameworks, can scale to 50,000 (emulated) nodes, and is resilient to failures.

1 Introduction

Clusters of commodity servers have become a major computing platform, powering both large Internet services and a growing number of data-intensive scientific applications. […] Two common solutions for sharing a cluster today are either to statically partition the cluster and run one framework per partition, or to allocate a set of VMs to each framework. Unfortunately, these solutions achieve neither high utilization nor efficient data sharing. The main problem is the mismatch between the allocation granularities of these solutions and of existing frameworks. Many frameworks, such as Hadoop and Dryad, employ a fine-grained resource sharing model, where nodes are subdivided into “slots” and jobs are composed of short tasks that are matched to slots [25, 38]. The short duration of tasks and the ability to run multiple tasks per node allow jobs to achieve high data locality, as each job will quickly get a chance to run on nodes storing its input data. Short tasks also allow frameworks to achieve high utilization, as jobs can rapidly scale when new nodes become available. Unfortunately, because these frameworks are developed independently, there is no way to perform fine-grained sharing across frameworks, making it difficult to share clusters and data efficiently between them. In this paper, we propose Mesos, a thin resource sharing layer that enables fine-grained sharing across diverse cluster computing frameworks. […]


Why Resource Management?

  • Many data analytics frameworks
  • No one-size-fits-all solution
  • Need to run multiple frameworks on same cluster
  • Desired: fine-grained sharing across frameworks

Even with Only One Framework

  • Production clusters
    • Run business-critical applications
    • Strict performance and reliability requirements
  • Experimental clusters
    • R&D teams trying to extract new intelligence from data
  • New versions of a framework
    • Rolled out in beta

Challenges

  • Each framework has different scheduling needs
    • Programming model, communication, dependencies
  • High scalability
    • Scale to 10,000s of nodes running 100s of jobs and millions of tasks
  • Fault tolerance

Mesos Approach

  • No one-size-fits-all framework; can we find a one-size-fits-all scheduler?
    • Excessive complexity, unclear semantics
    • New frameworks appear all the time
  • Mesos: separation of concerns
    • Resource scheduling → Mesos
    • Framework scheduling → Framework
  • Q: Examples of these two types of scheduling?

Mesos Architecture

[Figure: a Mesos master (with standby masters coordinated through a ZooKeeper quorum) manages the Mesos slaves; the Hadoop and MPI framework schedulers register with the master, while Hadoop and MPI executors run tasks on the slaves.]

Figure 2: Mesos architecture diagram, showing two running frameworks (Hadoop and MPI).


Components

  • Resource offer
    • List of free resources on multiple slaves
    • Decided based on organizational policies
  • Framework-specific components
    • Scheduler: registers with the master and requests resources
    • Executor: launched on slave nodes to run the framework’s tasks

Resource Offers

[Figure: (1) Slave 1 reports to the Mesos master that <s1, 4cpu, 4gb, …> is free, and the master’s allocation module decides to offer it to Framework 1. (2) The master sends the resource offer <s1, 4cpu, 4gb, …> to Framework 1’s scheduler. (3) The scheduler replies with two tasks to launch: <fw1, task1, 2cpu, 1gb, …> and <fw1, task2, 1cpu, 2gb, …>. (4) The master sends the tasks to Slave 1, which allocates the resources to the framework’s executor, which launches them.]

Figure 3: Resource offer example.
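
This offer/response exchange can be summarized in code. Below is a minimal, self-contained Python sketch of the two-level scheduling idea, not the real Mesos API: a toy master decides which framework receives the free resources, and the framework’s own scheduler decides which of its tasks to launch on them. Names such as `ToyMaster`, `Offer`, and `greedy_scheduler` are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    slave: str
    cpus: float
    mem_gb: float

@dataclass
class Task:
    name: str
    cpus: float
    mem_gb: float

def greedy_scheduler(offer, pending):
    """Framework-level scheduling: accept the pending tasks that fit in the offer."""
    accepted, cpus, mem = [], offer.cpus, offer.mem_gb
    for t in pending:
        if t.cpus <= cpus and t.mem_gb <= mem:
            accepted.append(t)
            cpus -= t.cpus
            mem -= t.mem_gb
    return accepted  # anything not accepted stays with the master

class ToyMaster:
    """Resource-level scheduling: decide which framework gets offered free resources."""
    def __init__(self, frameworks):
        self.frameworks = frameworks  # {name: (scheduler_fn, list of pending tasks)}

    def offer(self, fw_name, offer):
        scheduler, pending = self.frameworks[fw_name]
        launched = scheduler(offer, pending)
        for t in launched:
            pending.remove(t)
        return launched

master = ToyMaster({"fw1": (greedy_scheduler,
                            [Task("task1", 2, 1), Task("task2", 1, 2)])})
print(master.offer("fw1", Offer("s1", cpus=4, mem_gb=4)))
```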


Resource Allocation

  • Rejects: a framework can reject what is offered
    • Does not specify what it needs
    • May lead to starvation
    • Works well in practice
  • Default allocation strategies
    • Priorities
    • Max-min fairness (see the sketch after this list)
      • Frameworks with small demands are satisfied first
      • Frameworks with unmet demands share what is left
  • Can revoke (kill) tasks using application-specific policies
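
As a reminder of how max-min fairness behaves, here is a small Python sketch of one common progressive-filling formulation (an illustration, not code from Mesos): demands below the fair share are granted in full, and the leftover capacity is repeatedly split among the still-unsatisfied frameworks.

```python
def max_min_fair(capacity, demands):
    """Progressive filling: satisfy small demands fully, split the rest evenly."""
    alloc = {user: 0.0 for user in demands}
    active = set(demands)                      # frameworks whose demand is not yet met
    remaining = capacity
    while active and remaining > 1e-9:
        share = remaining / len(active)        # equal share of what is left
        for user in sorted(active, key=lambda u: demands[u]):
            grant = min(share, demands[user] - alloc[user])
            alloc[user] += grant
            remaining -= grant
            if alloc[user] >= demands[user]:
                active.discard(user)
    return alloc

# 10 CPUs shared by three frameworks demanding 2, 4, and 8 CPUs
print(max_min_fair(10, {"fw1": 2, "fw2": 4, "fw3": 8}))
# -> fw1 ≈ 2, fw2 ≈ 4, fw3 ≈ 4 (the leftover)
```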


Performance Isolation

  • Each framework should expect to run in isolation
  • Uses containers (see the example after this list)
    • Equivalent to “lightweight VMs”
    • Managed on top of the OS (not below it, like a VM)
    • Bundle tools, libraries, configuration files, etc.
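
Resource caps are what make containers useful for performance isolation. The snippet below is a hedged illustration assuming Docker is installed; the image name `my-framework-executor`, the command, and the limits are hypothetical placeholders, but `--cpus` and `--memory` are standard `docker run` flags. This is roughly what a containerizer does on each slave before handing resources to an executor.

```python
import subprocess

# Launch a (hypothetical) executor image with hard resource caps.
subprocess.run(
    [
        "docker", "run", "--rm",
        "--cpus", "2",          # at most 2 CPU cores
        "--memory", "1g",       # at most 1 GiB of RAM
        "my-framework-executor",
        "./run-task", "--task-id", "task1",
    ],
    check=True,
)
```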

Fault Tolerance

  • Master process
    • Soft state: can be reconstructed from the slaves
    • Hot-standby masters
    • Only leader election is needed: ZooKeeper (see the sketch after this list)
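
For hot-standby masters, the only external coordination needed is leader election. A minimal sketch using the `kazoo` ZooKeeper client’s election recipe is shown below; the hosts, the ZooKeeper path, and the `serve_as_master` function are illustrative assumptions, and any equivalent election recipe would do.

```python
from kazoo.client import KazooClient

def serve_as_master():
    # Placeholder: rebuild soft state from the slaves, then start serving.
    print("I am the active master")

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

# All candidate masters block here; exactly one at a time runs the callback.
election = zk.Election("/mesos/master-election", identifier="master-1")
election.run(serve_as_master)
```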

Framework Incentives

  • Short tasks
    • Easier to find resources; less work wasted on revocations
  • Scale elastically
    • Making use of new resources enables starting earlier
  • Parsimony
    • Any resource obtained counts toward the framework’s budget

Limitations

  • Fragmentation
    • Decentralized scheduling packs resources worse than centralized bin packing (a small illustration follows this list)
    • Fine when resources are large and tasks are small
    • Minimum offer size needed to accommodate large tasks
  • Framework complexity
  • Q: Is Mesos a bottleneck?
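
To make the fragmentation point concrete, here is a tiny Python sketch with purely illustrative numbers (not from the paper): frameworks that greedily accept offers in arrival order can strand capacity in fragments, while a centralized packer that sees all tasks (here, simple first-fit-decreasing) fits the same work onto fewer nodes.

```python
def first_fit(tasks, node_size):
    """Place each task on the first node with room, opening new nodes as needed."""
    nodes = []                         # free CPUs remaining on each node
    for t in tasks:
        for i, free in enumerate(nodes):
            if t <= free:
                nodes[i] -= t
                break
        else:
            nodes.append(node_size - t)
    return len(nodes)

tasks = [3, 3, 3, 7, 7, 7]                             # CPU demands, in arrival order
print(first_fit(tasks, 10))                            # decentralized, arrival order: 4 nodes
print(first_fit(sorted(tasks, reverse=True), 10))      # centralized packer (largest first): 3 nodes
```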

Elastic Resource Utilization


Resource Sharing Across FWs


CPU Utilization


Apache YARN: Yet Another Resource Negotiator


Apache YARN

  • Generalizes the Hadoop MapReduce job scheduler
  • Allows other services to share
    • the Hadoop Distributed File System (open-source GFS)
    • the Hadoop computing nodes

Hadoop Evolution

[Diagram: in the original Hadoop, MapReduce ran directly on HDFS; in Hadoop 2.0, YARN sits between HDFS and the computation layer, so MapReduce and other frameworks run on top of YARN.]


Differences with Mesos

  • YARN is a monolithic scheduler
    • Receives job requests
    • Directly places the jobs (not the framework)
  • Optimized for scheduling MapReduce jobs
    • Batch jobs with long running times
  • Not optimal for
    • Long-running services
    • Short-lived queries

Large-scale cluster management at Google with Borg

Abhishek Verma† Luis Pedrosa‡ Madhukar Korupolu David Oppenheimer Eric Tune John Wilkes

Google Inc.

Abstract

Google’s Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines. It achieves high utilization by combining admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation. It supports high-availability applications with runtime features that minimize fault-recovery time, and scheduling policies that reduce the probability of correlated failures. Borg simplifies life for its users by offering a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior. We present a summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it.

[Figure: within a cell, an elected BorgMaster (with link shards and a read/UI shard) and a scheduler manage the Borglets running on the worker machines, backed by a Paxos-based persistent store; users submit config files via the borgcfg command-line tools and monitor jobs through web browsers.]

Figure 1: The high-level architecture of Borg. Only a tiny fraction of the thousands of worker nodes are shown.

Borg: Google’s Resource Manager

One of Borg’s primary goals is to make efficient use of Google’s fleet of machines, which represents a significant financial investment: increasing utilization by a few percentage points can save millions of dollars.


Some Takeaways

  • Segregating production and non-production work would need 20–30% more machines in the median cell
  • Production jobs reserve resources to deal with load spikes
    • They rarely use those reserved resources
  • Most Borg cells (clusters) are shared by 1000s of users

Sharing is Vital

[Figure: x-axis “Overhead from segregation [%]”, y-axis “Percentage of cells”.]

(b) CDF of additional machines that would be needed if we segregated the workload of 15 representative cells.


Sizing the Requests

  • Different metrics (a small arithmetic sketch follows this list)
    • Capacity: what a cluster can offer
    • Limit: upper bound on consumption declared by the user
    • Reservation: what Borg actually sets aside
    • Usage: what is actually used
  • Q: What is the ideal case?
  • Q: What happens in practice?
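
As a rough sketch of how these metrics relate: ideally usage ≤ reservation ≤ limit ≤ capacity, with usage close to the reservation; in practice usage is often well below the reservation, and that gap is what reclamation can hand to best-effort work. The numbers below are illustrative only, not Borg data.

```python
# Illustrative numbers only (CPU cores for one task), not from the Borg paper.
limit       = 8.0   # upper bound declared by the user
reservation = 6.0   # what Borg actually sets aside (its estimate of future usage)
usage       = 3.5   # what the task currently consumes

# Resources that reclamation can offer to best-effort work right now.
reclaimable = reservation - usage
print(f"reclaimable: {reclaimable} cores")                 # 2.5 cores

# A more aggressive policy shrinks the reservation toward usage plus a safety
# margin, freeing more capacity at the cost of more OOMs/evictions on spikes.
safety_margin = 0.5
aggressive_reservation = usage + safety_margin
print(f"aggressive reservation: {aggressive_reservation} cores")  # 4.0 cores
```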

Aggressive Reclamation

[Figure from the Borg paper: CPU [%] and memory [%] over four weeks, showing capacity, limit, reservation, and usage, plus out-of-memory events (OOMs); Week 1 baseline, Week 2 aggressive reclamation, Week 3 medium, Week 4 baseline.]


Lessons Learned

  • Cluster management is not just task management
    • Load balancing
    • Naming service
    • Introspection: give users debugging tools so they can find problems themselves
  • The master is the kernel of a distributed system
    • Simple basic functionality
    • Composes multiple microservices
  • Evolved into Kubernetes (open-source)