Auto-scaling deadline-constrained workloads in containers in the cloud

SLIDE 1

Auto-scaling deadline-constrained workloads in containers in the cloud

Jay DesLauriers, Research Associate, University of Westminster

SLIDE 2

Project COLA

June 5th 2019 www.project-cola.eu 2

  • Horizon 2020
  • 33 months
  • Completion September 2019
  • 14 Partners in 6 Countries
  • 10 SME/public-sector partners
  • 4 higher-education/research institutions

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 731574

SLIDE 3

Head in the clouds

On-premise
  • Capital expense
  • High upfront cost
  • High maintenance cost

Off-premise
  • Pay-as-you-go
  • No upfront cost
  • No maintenance cost

SLIDE 4

A match made in ...

Containers

Operating-system virtualisation and application packaging for reusable, portable software

SLIDE 5

The Problem

[Diagram: applications 1 to N and services 1 to 5 running on cloud services; resource requirements combine baseline and variable resource consumption. Dynamic demand is met by a manually adjusted supply of resources, to be replaced by an automatically adjusted supply.]

Some requirements:

  • Dynamic supply (auto-scaling)
  • Vendor-free
  • Modular
  • Flexible
  • Secure
SLIDE 6

Finding a solution...

SLIDE 7

The Solution

MiCADO master node:
  • Submitter: translates the ADT
  • Policy Keeper: enforces scaling
  • Occopus: orchestrates VMs
  • Kubernetes: orchestrates containers
  • Prometheus: monitors VMs and containers
  • Optimiser: ML-based optimisation

MiCADO worker node:
  • Node Exporter and cAdvisor: export VM/container metrics
  • Docker: container runtime

TOSCA Application Description Template (ADT): describes the application, infrastructure, scaling policies and security policies
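As a rough illustration of how a scaling component could read the metrics that Node Exporter and cAdvisor expose through Prometheus, the sketch below builds a request for Prometheus's HTTP instant-query endpoint (/api/v1/query) and parses its JSON "vector" result. It is not MiCADO's Policy Keeper code: the helper names and the Prometheus address are hypothetical, and the network call is left out so the parsing can be checked offline.

```python
import json
from urllib.parse import urlencode


def build_query_url(prometheus_base: str, promql: str) -> str:
    """Return a Prometheus instant-query URL (GET /api/v1/query?query=...)."""
    return f"{prometheus_base.rstrip('/')}/api/v1/query?" + urlencode({"query": promql})


def parse_vector(response_body: str) -> dict:
    """Map each returned series (as a sorted label string) to its float value."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    samples = {}
    for series in data["result"]:
        # Each sample is [unix_timestamp, "value encoded as a string"].
        _, value = series["value"]
        labels = ",".join(f"{k}={v}" for k, v in sorted(series["metric"].items()))
        samples[labels] = float(value)
    return samples


# Hypothetical usage with a canned response instead of a live Prometheus server.
url = build_query_url("http://prometheus.example:9090/",
                      'avg(rate(node_cpu_seconds_total{mode!="idle"}[1m]))')
body = ('{"status":"success","data":{"resultType":"vector","result":'
        '[{"metric":{"instance":"worker-1"},"value":[1559736000.0,"0.42"]}]}}')
print(parse_vector(body))  # {'instance=worker-1': 0.42}
```

Keeping URL construction and parsing separate from the HTTP request makes the decision logic easy to test without a running cluster.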

SLIDE 8

Scaling Use-Case No.1

  • Resource intensive services
  • Typically CPU- or memory-bound applications/services
  • Containers & underlying VMs scale to meet demand
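For CPU- or memory-bound services, one common way to turn a utilisation metric into a replica count is the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current * observed / target), with a tolerance band to avoid flapping. The sketch assumes that rule with illustrative bounds (min_replicas, max_replicas and tolerance are hypothetical defaults). MiCADO's own scaling policies are written by the user in the ADT and may differ.

```python
import math


def desired_replicas(current: int, observed: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Proportional scaling rule: move per-replica utilisation towards the target."""
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    ratio = observed / target
    if abs(ratio - 1.0) <= tolerance:
        wanted = current  # close enough to the target: do not scale
    else:
        wanted = math.ceil(current * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(2, observed=75, target=50))  # 3: scale out
print(desired_replicas(4, observed=20, target=50))  # 2: scale in
```

The same rule applies whether "replicas" are containers or the VMs underneath them; only the metric and the bounds change.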
SLIDE 9

Scaling Use-Case No.2 ... ?

  • Multi-job experiments
  • Typically batch/parameter sweep jobs
  • Containers/VMs scale to complete jobs by deadline
  • Where do we put the jobs?
  • How do we execute them? (In containers!)
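A parameter sweep is one typical source of such multi-job experiments: every combination of input values becomes an independent job that can run in its own container. The sketch below expands a parameter grid with itertools.product. The parameter names and the `simulate` command line are hypothetical placeholders, not the job format that jQueuer or Repast actually uses.

```python
from itertools import product


def parameter_sweep(grid: dict) -> list:
    """Expand {name: [values...]} into one parameter dict per combination."""
    names = sorted(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def job_command(params: dict) -> list:
    """Hypothetical container command: one --name=value flag per parameter."""
    return ["simulate"] + [f"--{name}={value}" for name, value in sorted(params.items())]


jobs = parameter_sweep({"infection_rate": [0.1, 0.2], "seed": [1, 2, 3]})
print(len(jobs))             # 6
print(job_command(jobs[0]))  # ['simulate', '--infection_rate=0.1', '--seed=1']
```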

[Diagram: MiCADO, the container and cloud orchestrator. MiCADO master: ADT (infrastructure and scaling rules), MiCADO Submitter and Policy Keeper (scaling logic), which scale workers up/down. MiCADO workers: cqueue workers and a jQueuer agent executing jobs. Job source: <insert queue here>.]

SLIDE 10

JQueuer

  • Asynchronous distributed task queue
  • Master component
      • Runs externally
      • Queue and monitoring
  • Agent component
      • Runs on worker VMs
      • Fetches and executes jobs
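The master/agent split can be illustrated in a single process: a master fills a shared queue, and agents repeatedly fetch a job, execute it and record success or failure until the queue is empty. This is only a sketch of the pattern using Python threads and queue.Queue; jQueuer itself distributes jobs to agents on separate VMs, and run_experiment and its arguments are hypothetical.

```python
import queue
import threading
from typing import Any, Callable


def run_experiment(jobs: list, execute: Callable[[Any], Any], n_agents: int = 3) -> dict:
    """Master enqueues every job; agents fetch and execute until the queue is empty."""
    job_queue = queue.Queue()
    for job_id, job in enumerate(jobs):
        job_queue.put((job_id, job))  # master: publish jobs before agents start

    completed, failed = {}, {}
    lock = threading.Lock()

    def agent() -> None:
        while True:
            try:
                job_id, job = job_queue.get_nowait()
            except queue.Empty:
                return  # nothing left to fetch: this agent stops
            try:
                result = execute(job)
                with lock:
                    completed[job_id] = result
            except Exception as exc:  # a failed job is recorded, not fatal
                with lock:
                    failed[job_id] = repr(exc)
            finally:
                job_queue.task_done()

    agents = [threading.Thread(target=agent) for _ in range(n_agents)]
    for t in agents:
        t.start()
    for t in agents:
        t.join()
    return {"completed": completed, "failed": failed}


outcome = run_experiment([1, 2, 3, 4, 5], lambda x: 10 // (x - 3), n_agents=2)
print(sorted(outcome["completed"]))  # [0, 1, 3, 4]
print(sorted(outcome["failed"]))     # [2]
```

Because agents pull work rather than being assigned it up front, adding or removing agents needs no change on the master side, which is what lets the worker pool be scaled independently.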

[Diagram: the end user submits experiment.json to the jQueuer master, from which jQueuer agents on MiCADO workers (alongside cqueue workers) fetch jobs. MiCADO, the container and cloud orchestrator, scales the workers up/down using the ADT (infrastructure and scaling rules), the MiCADO Submitter and the Policy Keeper (scaling logic).]

SLIDE 11

JQueuer Metrics

Metrics exported to MiCADO for scaling:
  • Queue length
  • Jobs completed
  • Jobs failed
  • Jobs running
  • Jobs remaining
  • Time elapsed
  • Average job length
  • Time to deadline
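Several of these metrics combine naturally into a deadline check. Assuming jobs are independent, each worker runs one job at a time, every job takes about the average job length, and "jobs remaining" counts all unfinished jobs, the workers needed equal the outstanding jobs divided by how many back-to-back average-length jobs still fit before the deadline. This is an illustrative estimate, not the scaling rule used by MiCADO or jQueuer; the function name is hypothetical.

```python
import math


def workers_needed(jobs_remaining: int, avg_job_seconds: float,
                   seconds_to_deadline: float) -> int:
    """Estimate the smallest worker count that can finish all unfinished jobs
    by the deadline, assuming one job per worker at a time and jobs of
    roughly average length."""
    if avg_job_seconds <= 0:
        raise ValueError("average job length must be positive")
    if jobs_remaining == 0:
        return 0
    # How many average-length jobs one worker can still complete in sequence.
    rounds = math.floor(seconds_to_deadline / avg_job_seconds)
    if rounds <= 0:
        # Not even one job fits: the best available option is full parallelism.
        return jobs_remaining
    return math.ceil(jobs_remaining / rounds)


print(workers_needed(120, avg_job_seconds=600, seconds_to_deadline=3000))  # 24
```

Rounding the number of rounds down (rather than dividing total work by the remaining time) accounts for jobs being indivisible, so the estimate errs on the side of meeting the deadline.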

SLIDE 12

The experiment

Determining the impact of changes in behavior on the spread of a disease across a population

SLIDE 13

Experiment design

  • Agent-based simulation
  • Repast Simphony
  • Three agent types
      • Infected
      • Susceptible
      • Recovered
  • Simulates the movement and interaction of agents in an environment
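The susceptible/infected/recovered dynamics above can be sketched without Repast Simphony. In this minimal version, which rests entirely on assumptions of our own, agents random-walk on a wrapping grid, a susceptible agent sharing a cell with an infected agent becomes infected with probability p_infect, and infected agents recover after a fixed number of steps. All parameter names and defaults are hypothetical; a change in behaviour (for example, fewer close contacts) could be approximated by lowering p_infect.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    x: int
    y: int
    state: str  # "S" susceptible, "I" infected, "R" recovered
    infected_for: int = 0


def step(agents, size, p_infect, recovery_steps, rng):
    """Advance one tick: move, interact, recover. Returns counts per state."""
    # Movement: each agent stays put or moves one cell on a wrapping grid.
    for a in agents:
        dx, dy = rng.choice([(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)])
        a.x, a.y = (a.x + dx) % size, (a.y + dy) % size
    # Interaction: sharing a cell with an infected agent may transmit infection.
    infected_cells = {(a.x, a.y) for a in agents if a.state == "I"}
    newly_infected = [a for a in agents
                      if a.state == "S" and (a.x, a.y) in infected_cells
                      and rng.random() < p_infect]
    # Recovery: infected agents recover after recovery_steps ticks.
    for a in agents:
        if a.state == "I":
            a.infected_for += 1
            if a.infected_for >= recovery_steps:
                a.state = "R"
    for a in newly_infected:
        a.state = "I"
    return {s: sum(a.state == s for a in agents) for s in "SIR"}


def simulate(n_agents=200, n_infected=5, size=20, p_infect=0.5,
             recovery_steps=10, steps=50, seed=0):
    """Run the model and return the S/I/R counts after every step."""
    rng = random.Random(seed)
    agents = [Agent(rng.randrange(size), rng.randrange(size),
                    "I" if i < n_infected else "S")
              for i in range(n_agents)]
    history = []
    for _ in range(steps):
        history.append(step(agents, size, p_infect, recovery_steps, rng))
    return history


history = simulate(p_infect=0.0, steps=12)
print(history[8])  # {'S': 195, 'I': 5, 'R': 0}
print(history[9])  # {'S': 195, 'I': 0, 'R': 5}
```

Each run with a different parameter set or seed is independent, which is what makes this kind of model a good fit for the multi-job, queue-based execution described earlier.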
SLIDE 14

Manual job allocation (baseline)

  • 200 jobs
  • 5 VMs (VM 1 to VM 5), each running Repast (Repast 1 to Repast 5)
  • Equal distribution: 40 jobs per VM
  • 1 hour to complete all jobs
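With a fixed, equal split, each VM's share is decided before any job runs, so the experiment finishes only when the VM with the longest total job time does. The sketch assumes each VM runs its jobs one after another; the job durations in the example are made up for illustration and are not taken from the experiment.

```python
def static_allocation(job_seconds: list, n_vms: int) -> list:
    """Split the job list into n_vms contiguous chunks of (near-)equal size."""
    per_vm, extra = divmod(len(job_seconds), n_vms)
    chunks, start = [], 0
    for i in range(n_vms):
        size = per_vm + (1 if i < extra else 0)
        chunks.append(job_seconds[start:start + size])
        start += size
    return chunks


def makespan(chunks: list) -> float:
    """Each VM runs its chunk sequentially; the slowest VM sets the finish time."""
    return max(sum(chunk) for chunk in chunks)


chunks = static_allocation([60.0] * 200, n_vms=5)
print([len(c) for c in chunks])                                  # [40, 40, 40, 40, 40]
print(makespan(static_allocation([100, 10, 10, 10], n_vms=2)))  # 110
```

When job lengths vary, an equal job count per VM does not mean equal work per VM, so some VMs sit idle while the slowest one finishes.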

SLIDE 15

Automatic job allocation (MiCADO)

  • experiment.json (200 jobs) submitted to the JQueuer Manager
  • MiCADO Master manages MiCADO Workers 1 to n
  • Each worker runs a JQueuer Agent and Repast (Repast 1 to Repast n)
  • 1-hour deadline
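Dynamic allocation can be approximated as greedy list scheduling: whichever worker becomes free first pulls the next job from the queue. The sketch below simulates that with a heap of worker finish times. It illustrates the pull model under a one-job-at-a-time assumption with made-up durations; it is not a reproduction of jQueuer's scheduling.

```python
import heapq


def dynamic_allocation(job_seconds: list, n_workers: int):
    """Simulate workers pulling jobs from a FIFO queue as soon as they are free.
    Returns the per-worker job lists and the time the last job finishes."""
    if n_workers <= 0:
        raise ValueError("need at least one worker")
    free_at = [(0.0, w) for w in range(n_workers)]  # (time worker is free, worker id)
    heapq.heapify(free_at)
    assignment = {w: [] for w in range(n_workers)}
    for duration in job_seconds:
        t, w = heapq.heappop(free_at)  # earliest-free worker takes the next job
        assignment[w].append(duration)
        heapq.heappush(free_at, (t + duration, w))
    finish = max(t for t, _ in free_at)
    return assignment, finish


assignment, finish = dynamic_allocation([100, 10, 10, 10], n_workers=2)
print(assignment, finish)  # {0: [100], 1: [10, 10, 10]} 100.0
```

With these four made-up jobs, the long job no longer shares a worker with another job, so the run ends as soon as it finishes (100 s).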

SLIDE 16

Results

  • Dynamic allocation of variable-length jobs results in better use of cloud resources

[Charts: manually allocated vs. allocated by MiCADO]

Manual allocation (baseline): 5 VMs
Dynamic allocation (MiCADO): 3.86 VMs
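A non-integer figure such as 3.86 VMs suggests a VM count averaged over the run. Assuming that reading, a time-weighted mean over periods with different VM counts produces such a figure. The sketch below computes it; the timelines in the example are invented and are not the experiment's actual scaling history.

```python
def average_vm_count(segments: list) -> float:
    """Time-weighted mean number of VMs from (duration_seconds, vm_count) pairs."""
    total_time = sum(duration for duration, _ in segments)
    if total_time <= 0:
        raise ValueError("segments must cover a positive duration")
    return sum(duration * count for duration, count in segments) / total_time


print(average_vm_count([(3600, 5)]))             # 5.0 (fixed pool for one hour)
print(average_vm_count([(1800, 5), (1800, 3)]))  # 4.0 (hypothetical scale-in halfway)
```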

SLIDE 17

Thanks!

  • github.com/micado-scale/ansible-micado
  • project-cola.eu/
  • T. Kiss, J. DesLauriers, G. Gesmier et al., "A cloud-agnostic queuing system to support the implementation of deadline-based application execution policies", Future Generation Computer Systems (2019), https://doi.org/10.1016/j.future.2019.05.062

Project Director: Dr. Tamas Kiss, University of Westminster, UK