LESSONS LEARNED FROM THE CHAMELEON TESTBED





SLIDE 1
  • www.chameleoncloud.org

Kate Keahey University of Chicago, Argonne National Laboratory Jason Anderson (UC), Zhuo Zhen (UC), Pierre Riteau (StackHPC), Paul Ruth (RENCI), Dan Stanzione (TACC), Mert Cevik (RENCI), Jacob Colleran (UC), Haryadi Gunawi (UC), Cody Hammock (TACC), Joe Mambretti (Northwestern), Alexander Barnes (TACC), François Halbach (TACC), Alex Rocha (TACC), Joe Stubbs (TACC)

LESSONS LEARNED FROM THE CHAMELEON TESTBED

SLIDE 2

CHAMELEON IN A NUTSHELL

  • We like to change: a testbed that adapts itself to your experimental needs
      • Deep reconfigurability (bare metal) and isolation
      • Power on/off, reboot, custom kernel, serial console access, etc.
  • Balance: large-scale versus diverse hardware
      • Large-scale: large homogeneous partition (~15,000 cores), ~6 PB of storage distributed over 2 sites (UC, TACC) connected with a 100G network
      • Diverse: ARMs, Atoms, FPGAs, GPUs, Corsa switches, etc.
  • Cloud++: leveraging mainstream cloud technologies
      • Powered by OpenStack with bare metal reconfiguration (Ironic) + “special sauce”
      • Blazar contribution recognized as official OpenStack component
  • We live to serve: open, production testbed for Computer Science research
      • Started in 10/2014, available since 07/2015, renewed in 10/2017, working on renewal now!
      • Currently 4,000+ users, 600+ projects, 100+ institutions

SLIDE 3

THE MOST EXPERIMENTS FOR THE MOST USERS

[Figure labels: systems; experiments you can run; experimenters; Traditional HPC resources; Virtual cloud resources; Custom testbed; Chameleon]

  • Experiments you can run: hardware; expressiveness; configurability and isolation
  • Experimenters: cost (per user/experiment) and isolation; usability (user tools); familiarity
  • Sharing ecosystem: expressing experiments (cost per experiment); publication and discovery (cost of sharing)

SLIDE 4

EXPERIMENTS: HARDWARE

  • Largest lease: 120
  • 67% single node, 5% exceed 10 nodes (11% on Haswell)

SLIDE 5

EXPERIMENTS: ALLOCATABLE RESOURCES

  • Allocatable: managed in time (advance reservations, extensions) and space
  • Advance reservations are critical to provide access to resources in demand
  • Extensions: 5.4% usage across leases

Also see: “Managing Allocatable Resources”, CLOUD’19
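One way to picture "managed in time and space" is as intervals on a node: an advance reservation is a lease whose start lies in the future, and an extension only succeeds if it does not run into the next reservation on the same node. The sketch below is illustrative only; the `Lease` class, `try_extend`, and the node name are hypothetical and are not Chameleon's or Blazar's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Lease:
    node: str        # space: which node is held (hypothetical identifier)
    start: datetime  # time: when the reservation begins
    end: datetime    # time: when the reservation ends

    def overlaps(self, other: "Lease") -> bool:
        # Two leases conflict only if they hold the same node at the same time.
        return (self.node == other.node
                and self.start < other.end
                and other.start < self.end)


def try_extend(lease: Lease, extra: timedelta, others: list) -> Lease:
    """Return the extended lease, or the unchanged lease if the extension conflicts."""
    candidate = Lease(lease.node, lease.start, lease.end + extra)
    if any(candidate.overlaps(o) for o in others):
        return lease
    return candidate


now = datetime(2020, 1, 1, 9, 0)
current = Lease("node-1", now, now + timedelta(hours=4))
# An advance reservation: same node, starting two hours after `current` ends.
advance = Lease("node-1", now + timedelta(hours=6), now + timedelta(hours=10))

short = try_extend(current, timedelta(hours=1), [advance])    # fits in the gap
blocked = try_extend(current, timedelta(hours=3), [advance])  # would collide
```

Here `short` ends one hour later than `current`, while `blocked` is returned unchanged because the requested extension would overlap the advance reservation.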

SLIDE 6

EXPERIMENTS: EXPRESSIVENESS

  • Resources can be specified at different levels
      • Model/constraint-based: none (9.5%), single (89.24%), multiple (1.26%)
      • Hardware type (single constraint): 90.18%
      • Node UID (single constraint): 3.38% (18.45% for leases made 7 days in advance)
  • Separation of allocation and configuration
      • 20.07% of allocations had more than 1 instance deployed (max of 12)
  • Network stitching (ExoGENI): 22 (8%) projects created 920 stitched links
  • Bring Your Own Controller (BYOC): 11 (4%) projects
  • Orchestration (Heat): 94 (2017), 155 (2018), and 405 (2019) deployments
  • Automated deployment: surprisingly little use
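As a rough illustration of the "different levels" of specification, the sketch below builds reservation requests with no constraint, a hardware-type constraint, and a node-UID constraint. It is modeled loosely on Blazar-style host reservations, where a constraint is passed as a JSON-encoded list; the helper `host_reservation`, the property names `$node_type` and `$uid`, the operator spelling, and the values `compute_haswell` and `example-node-uid` are assumptions for illustration, not confirmed by the slides.

```python
import json


def host_reservation(count, constraint=None):
    """Build a hypothetical bare-metal reservation request for `count` nodes."""
    return {
        "resource_type": "physical:host",
        "min": count,
        "max": count,
        # The constraint travels as a JSON string; empty means "any node".
        "resource_properties": json.dumps(constraint) if constraint else "",
    }


# No constraint: any available node will do.
unconstrained = host_reservation(1)
# Single constraint on hardware type (assumed property name and value).
by_type = host_reservation(2, ["=", "$node_type", "compute_haswell"])
# Single constraint on a specific node (assumed property name, placeholder UID).
by_uid = host_reservation(1, ["=", "$uid", "example-node-uid"])

print(by_type["resource_properties"])
```

Allocation stops here: what gets deployed on the reserved nodes (one instance or several, in sequence) is decided separately, which is the separation of allocation and configuration the slide refers to.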

SLIDE 7

EXPERIMENTERS: COST

  • Support cost
      • Average of 13 help desk tickets per week, less than one ticket per user
      • Heavily leveraging smoke tests, live monitoring, and automated remediation
  • Working with a mainstream open source project (OpenStack)
      • Familiar interfaces: 858 deployments across 441 organizations in 63 countries
      • Transferable skills
      • Working with a large community (~8,400 total contributors, ~6,000 reviewing code)
      • New features: whole disk image boot, support for non-x86, multi-tenant networking
      • Access to existing documentation and support systems
      • Opportunity to contribute (though at a cost): Blazar as OpenStack component

Chameleon expresses capabilities needed for CS research in terms of mainstream cloud functionality (CHI-in-a-Box)

SLIDE 8

EXPERIMENTERS: ACTIVE USERS

SLIDE 9

EXPERIMENTERS: ACTIVE LEASES

SLIDE 10

EXPERIMENTERS: COMMUNITY

  • Institutions: 168 (11 MSI, 19 EPSCoR)
  • Geography (US): 40 states + Puerto Rico
  • Funding source: NSF (also DOE, DARPA, others)
  • Research versus education
      • Education: 45/513 projects use ~9% of total time
      • Research: similar average usage
  • Publications: 275/75 overall/journal
  • Field of science
      • 12% (non CS), 10% (security), 17% (ML), 8% (Edge)
  • Renewals: ~75% of eligible projects sought renewal, 33 renewed > 5 times
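The "similar average usage" remark follows from simple arithmetic on the figures above: if education projects make up roughly the same share of projects as they consume of total time, the average project uses about the same amount in both categories. A quick check, using only the numbers on this slide (the variable names are illustrative, and ~9% is treated as exactly 9%):

```python
education_projects = 45
total_projects = 513
education_time_share = 0.09  # "~9% of total time", taken as 9% for the estimate

# Share of all projects that are education projects.
education_project_share = education_projects / total_projects

# Ratio of average per-project usage, education relative to research.
research_project_share = 1 - education_project_share
research_time_share = 1 - education_time_share
usage_ratio = (education_time_share / education_project_share) / (
    research_time_share / research_project_share
)

print(f"{education_project_share:.1%} of projects are education projects")
print(f"education/research average usage ratio: {usage_ratio:.2f}")
```

A ratio close to 1 means an average education project and an average research project consume about the same amount of testbed time.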

SLIDE 11

SHARING EXPERIMENTS

  • Testbeds/clouds lead to the creation of compatible digital artifacts that package an experiment
      • In Chameleon: ~120,000 images and ~31,000 orchestration templates
  • Elements of reproducibility support in Chameleon
      • Testbed versioning
      • Image versioning
      • Orchestration
      • Experiment Précis (Linux history analogue)
  • How do we tie them all together?

SLIDE 12

SHARING EXPERIMENTS: PACKAGING

  • Repeatability by default: Jupyter notebooks + Chameleon experimental containers
      • JupyterLab for our users: use jupyter.chameleoncloud.org with Chameleon credentials
      • Interface to the testbed in Python/bash + examples (see LCN’18: https://vimeo.com/297210055)
      • Named containers: your experimental process goes here

[Figure: Jupyter Notebooks (experimental storytelling: ideas/text, process/code, results) + complex experimental containers]

Also see: “A Case for Integrating Experimental Containers with Notebooks”, CloudCom 2019

SLIDE 13

SHARING EXPERIMENTS: PUBLICATION

  • Digital publishing with Zenodo: make your experimental artifacts citable via Digital Object Identifiers (DOIs)
  • Integration with Zenodo
      • Export: make your research citable and discoverable
      • Import: access a wealth of digital research artifacts already published
  • Towards making research findable: the digital sharing platform

[Figure: familiar research sharing ecosystem vs. digital research sharing ecosystem (?)]

SLIDE 14

PARTING THOUGHTS

  • Chameleon expresses capabilities needed for CS research in terms of mainstream cloud functionality: OpenStack
      • Our paper discusses the extensions and augmentations to support our use case
      • Practical delivery: CHI-in-a-Box, a packaging of the CHameleon Infrastructure
  • Experimental testbeds: opportunity for sharing
      • The most experiments for the most experimenters
      • Opportunity for the support of efficient sharing of experiments
  • Chasing the research frontier: the functionality of any scientific instrument has to follow the emergent opportunities in the science it serves (development-driven operations)

SLIDE 15

We’re here to change

www.chameleoncloud.org

keahey@anl.gov