SLATE: A new approach for DevOps in distributed scientific computing

SLATE A new approach for DevOps in distributed scientific computing facilities Rob Gardner University of Chicago Middleware and Grid Interagency Coordination (MAGIC) Meeting October 3, 2018 Outline What is SLATE ? The motivation The


SLIDE 1

SLATE

A new approach for DevOps in distributed scientific computing facilities

Middleware and Grid Interagency Coordination (MAGIC) Meeting October 3, 2018

Rob Gardner University of Chicago

SLIDE 2

Outline

  • What is SLATE?
  • The motivation
  • The SLATE Vision
  • Current technology explorations
  • Challenges and open questions
  • Wrap up

SLIDE 3

What is SLATE?

  • NSF DIBBs award, "SLATE and the Mobility of Capability" (NSF 1724821)
  • Equip the ScienceDMZ with service orchestration capabilities, federated to create scalable, multi-campus science platforms
  • Platform for service operators & science gateway developers

SLIDE 4

Motivation: enabling multi-institution collaborative science

SLIDE 5

165 scientists, 25 institutions, 11 countries

Collaboration

XENON - Dark Matter Search in Gran Sasso Laboratory, Italy

SLIDE 6

Global data & processing platform

EU & US processing

EU & US storage

Job management with HTCondor & workflow pipeline tools

Example

SLIDE 7

The Open Science Grid

  • OSG is the nation's shared HTC cyberinfrastructure
  • Serves over 36 science disciplines
  • Used by single PIs to the largest collaborations
  • Consortium of over 70 HTC sites in the US
  • Provides the US part of the Worldwide LHC Computing Grid
  • Produces >1.5B CPU-hours/y
  • Moves >100s of PB/y

Example

SLIDE 8

Facilitator for "data lake" R&D

  • Allow continuous development of caching & delivery services
  • Roll out updates centrally
  • Configure & operate centrally

[Diagram labels: data delivery service; edge or network hosted caching servers]

Example

SLIDE 9

Caching network for IceCube & LIGO

containerized by

Example

SLIDE 10

Deployment is difficult!

  • A broken DevOps cycle!
  • Deployment means:
      ○ Finding a friendly sysadmin at the site
      ○ Having them procure hardware or a virtual machine
      ○ Sending them the deployment instructions and hoping for the best
  • Operations problems too:
      ○ Someone has to make sure it actually keeps running
      ○ Latency in updates across sites makes it extremely difficult to rapidly innovate platform services

SLIDE 11

The SLATE Vision

SLIDE 12

SLIDE 13

Global data & processing platform

EU & US processing

EU & US storage

Job management with HTCondor & workflow pipeline tools

XENON COMPUTING

AUTOMATE DEVOPS

SLIDE 14

The Open Science Grid

  • OSG is the nation's shared HTC cyberinfrastructure
  • Serves over 36 science disciplines
  • Used by single PIs to the largest collaborations
  • Consortium of over 70 HTC sites in the US
  • Provides the US part of the Worldwide LHC Computing Grid
  • Produces >1.5B CPU-hours/y
  • Moves >100s of PB/y

AUTOMATE DEVOPS

SLIDE 15

Caching network deployed for IceCube & LIGO

containerized by

AUTOMATE DEVOPS

SLIDE 16

Services Layer At The Edge

  • A ubiquitous underlayment -- the missing shim
      ○ A generic cyberinfrastructure substrate optimized for hosting edge services
      ○ Programmable
      ○ Easy & natural for HPC and IT professionals
      ○ Tool for creating "hybrid" platforms
  • DevOps friendly
      ○ For both platform and science gateway developers
      ○ Quick patches, release iterations, fast track new capabilities
      ○ Reduced operations burden for site administrators

SLIDE 17

SLATE Concepts & Components

  • Containerized services in managed clusters
  • Widely used open source technologies for growth and sustainability
  • SLATE additions
      ○ Curated services
      ○ Create a "loose federation" of clusters & platforms

http://bit.ly/slate-arch
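
As a rough illustration of the "curated services" and "loose federation" ideas, the sketch below models a catalog of approved applications and a set of independently administered clusters, each of which decides for itself which groups it admits. Every name here (`CURATED_CATALOG`, `Cluster`, `deploy_everywhere`, the cluster and group names) is hypothetical and chosen for the example; this is not the SLATE API.

```python
from dataclasses import dataclass, field

# Hypothetical curated catalog; the entries are illustrative only.
CURATED_CATALOG = {"xcache", "stashcache"}


@dataclass
class Cluster:
    """One federated cluster; its admin controls which groups may deploy."""
    name: str
    allowed_groups: set
    instances: list = field(default_factory=list)

    def install(self, app, group):
        """Install `app` for `group` only if this cluster admits the group."""
        if group not in self.allowed_groups:
            return False
        self.instances.append(f"{group}/{app}")
        return True


def deploy_everywhere(app, group, federation):
    """Offer one curated app to every cluster; each cluster decides locally."""
    if app not in CURATED_CATALOG:
        raise ValueError(f"{app!r} is not in the curated catalog")
    return {cluster.name: cluster.install(app, group) for cluster in federation}


federation = [
    Cluster("campus-a", {"osg", "atlas"}),
    Cluster("campus-b", {"atlas"}),
]
result = deploy_everywhere("stashcache", "osg", federation)
print(result)
```

In this toy model the federation is "loose" in the sense that no central authority can force an install: a cluster that has not admitted the group simply declines, and the other clusters are unaffected.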

SLIDE 18

[Diagram labels: developers (& admins); cluster admins; InCommon signup/login]

SLIDE 19

Policy and Trust

  • SLATE applications curated into a trusted application catalog
  • Applications must define and request all needed network, disk, device, etc. access
      ○ Think application permissions on your phone
  • Site policies must be respected
      ○ Access, privileges, capabilities are controlled and transparent
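
The phone-permissions analogy can be sketched as a simple check: an application declares everything it needs up front, and a site compares that request against its own policy before anything runs. This is a minimal sketch under assumed names (`AppRequest`, `SitePolicy`, `violations`) with arbitrary example ports and paths; it does not reflect how SLATE actually encodes requests or policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppRequest:
    """Everything a (hypothetical) application asks for, declared up front."""
    name: str
    ports: frozenset
    volumes: frozenset
    devices: frozenset


@dataclass(frozen=True)
class SitePolicy:
    """What a (hypothetical) site is willing to grant."""
    allowed_ports: frozenset
    allowed_volumes: frozenset
    allowed_devices: frozenset


def violations(app, policy):
    """Return every requested resource that the site policy does not grant."""
    problems = []
    problems += [f"port {p}" for p in sorted(app.ports - policy.allowed_ports)]
    problems += [f"volume {v}" for v in sorted(app.volumes - policy.allowed_volumes)]
    problems += [f"device {d}" for d in sorted(app.devices - policy.allowed_devices)]
    return problems


# Example values are arbitrary and only illustrate the check.
app = AppRequest(
    name="cache-service",
    ports=frozenset({1094, 8443}),
    volumes=frozenset({"/cache"}),
    devices=frozenset(),
)
policy = SitePolicy(
    allowed_ports=frozenset({1094}),
    allowed_volumes=frozenset({"/cache", "/scratch"}),
    allowed_devices=frozenset(),
)
print(violations(app, policy))
```

Because the result names each ungranted resource rather than returning a bare yes/no, the site's decision stays transparent to the application developer, which is the property the slide asks for.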

SLIDE 20

Deploying an "Application"

  • like
SLIDE 21

Summary

  • Reduce barriers to supporting collaborative science
  • Give science platform developers a ubiquitous "CI substrate"
  • Change distributed cyberinfrastructure operational practice by mobilizing capabilities at the edge
  • Developing the DevOps model, provider concerns and policies, tooling to give developers a consistent environment
  • First k8s-based WAN deployments underway:
      ○ Caching networks for OSG (StashCache) and ATLAS at CERN (XCache)

SLIDE 22

Thank you! slateci.io
