Multi-Cloud Federated Kubernetes at CERN

Clenimar Filemon (@clenimar, clenimar@lsd.ufcg.edu.br) and Ricardo Rocha (@ahcorporto, ricardo.rocha@cern.ch)

Fundamental Science
CERN, founded in 1954, asks questions such as:
● What is 96% of the universe made of?
● What was the state of matter just after the Big Bang?
● Why isn't there anti-matter in the universe?

Huge Data
● Collisions: ~40 MHz, ~1 PB/s
● L1 trigger (hardware filter): ~100 kHz, still big
● High-level (HL) trigger (software filter): ~1 kHz, still big
● Raw data recorded: ~1-10 GB/s
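The rates above show how much each trigger stage has to reject. The short worked check below uses the slide's approximate figures. The constant and function names are our own and are only for illustration.

```python
# Approximate rates quoted on the slide (events per second).
COLLISION_RATE_HZ = 40e6   # ~40 MHz collision rate
L1_OUTPUT_HZ = 100e3       # ~100 kHz after the hardware (L1) trigger
HLT_OUTPUT_HZ = 1e3        # ~1 kHz after the software (high-level) trigger


def reduction_factors(rates):
    """Rejection factor of each stage, given the successive output rates."""
    return [round(a / b) for a, b in zip(rates, rates[1:])]


def mb_per_event(bandwidth_bytes_per_s, event_rate_hz):
    """Average event size (MB) implied by a bandwidth and an event rate."""
    return bandwidth_bytes_per_s / event_rate_hz / 1e6


factors = reduction_factors([COLLISION_RATE_HZ, L1_OUTPUT_HZ, HLT_OUTPUT_HZ])
overall = round(COLLISION_RATE_HZ / HLT_OUTPUT_HZ)
print(factors, overall)  # [400, 100] 40000

# ~1-10 GB/s of raw data at ~1 kHz means roughly 1-10 MB per recorded event.
for bandwidth in (1e9, 10e9):
    print(mb_per_event(bandwidth, HLT_OUTPUT_HZ), "MB/event")
```

The two filters together keep about one collision in 40 000.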

Distributed Computing
(Diagram: CERN at the centre of a tiered grid of Tier-1 (T1) and Tier-2 (T2) sites.)
● 200+ sites
● 700 000 cores
● ~400 000 jobs
● ~30 GiB/s
● Workloads: reconstruction, calibration, simulation, analysis

Motivation for Federation
● Periodic load spikes: international conferences, reconstruction campaigns
● Simplification: monitoring, lifecycle, alarms
● Deployment: uniform API, replication, load balancing

OpenStack Magnum
An OpenStack API service for creating container clusters:
● Use your Keystone credentials
● Choose your cluster type
● Multi-tenancy
● Quickly create new clusters with advanced features such as multi-master

OpenStack Magnum: single-command cluster creation
```shell
$ openstack coe cluster create --cluster-template kubernetes --node-count 100 … mycluster
$ openstack coe cluster list
+------+-----------+------------+--------------+-----------------+
| uuid | name      | node_count | master_count | status          |
+------+-----------+------------+--------------+-----------------+
| .... | mycluster | 100        | 1            | CREATE_COMPLETE |
+------+-----------+------------+--------------+-----------------+
$ $(magnum cluster-config mycluster --dir mycluster)
$ kubectl get pod
$ openstack coe cluster update mycluster replace node_count=200
```
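The table printed by `openstack coe cluster list` is meant to be read by people. A script that needs to wait for CREATE_COMPLETE before it runs `kubectl` can ask the client for JSON instead, using the standard `-f json` output option. The sketch below is minimal and assumes the JSON rows use the same keys as the table columns (uuid, name, node_count, master_count, status). The second cluster in the sample, `other`, is hypothetical.

```python
import json


def ready_clusters(cluster_list_json, wanted_status="CREATE_COMPLETE"):
    """Return the names of clusters whose status equals wanted_status.

    Assumes a JSON list of objects keyed like the table columns above.
    """
    clusters = json.loads(cluster_list_json)
    return [c["name"] for c in clusters if c.get("status") == wanted_status]


sample = """[
  {"uuid": "....", "name": "mycluster", "node_count": 100,
   "master_count": 1, "status": "CREATE_COMPLETE"},
  {"uuid": "....", "name": "other", "node_count": 3,
   "master_count": 1, "status": "CREATE_IN_PROGRESS"}
]"""
print(ready_clusters(sample))  # ['mycluster']
```

In practice the JSON text could come from `subprocess.run(["openstack", "coe", "cluster", "list", "-f", "json"], capture_output=True, text=True, check=True).stdout`.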

Kubernetes

● Multiple types of resources: Pod, Service, Deployment, DaemonSet, Job, ...
● Requests and limits
● Retry policies
● Taints and tolerations
● And much more...
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 100
  template:
    spec:
      containers:
      - name: myjob
        image: python
        command: ["/myjob.py"]
        resources:
          limits:
            cpu: "1"
      restartPolicy: Never
```
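In the Job above, the retry policy comes from two fields: `backoffLimit: 5` and `activeDeadlineSeconds: 100`. The Kubernetes documentation describes two relevant behaviours. First, failed pods are recreated with an exponential back-off delay (10s, 20s, 40s, ...) capped at six minutes. Second, `activeDeadlineSeconds` takes precedence over `backoffLimit`.

The sketch below is a simplified model, not the Job controller itself. It ignores scheduling and start-up time and only counts how many attempts of an always-failing pod would start before either limit stops the Job. The function and parameter names are hypothetical.

```python
def attempts_before_deadline(backoff_limit, active_deadline_s, run_time_s,
                             base_delay_s=10, max_delay_s=360):
    """Count attempts an always-failing Job starts before it is stopped.

    Simplified model: each attempt runs for run_time_s and fails; the next
    one starts after an exponential back-off (10s, 20s, 40s, ... capped at
    max_delay_s). At most backoff_limit retries (backoff_limit + 1 attempts)
    are made, and no attempt starts after the active deadline.
    """
    attempts = 0
    start = 0.0
    delay = base_delay_s
    while attempts <= backoff_limit and start < active_deadline_s:
        attempts += 1
        finish = start + run_time_s
        start = finish + delay
        delay = min(delay * 2, max_delay_s)
    return attempts


# With the manifest's values and a pod that fails after 5 s, attempts start
# at t = 0, 15, 40 and 85 s; the next would start at 170 s, past the deadline.
print(attempts_before_deadline(5, 100, 5))  # 4
```

With a generous deadline, the model instead stops at `backoffLimit + 1 = 6` attempts.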

Use Case: CERN Large-Scale Batch Systems (HTCondor)

(Diagram: HTCondor pool with the Sched, Collector, Negotiator and StartD daemons.)
Matchmaking with ClassAds, for example a job ad and a machine ad:
```
AcctGroup = "ATLAS"            CERNEnvironment = "production"
JobPrio = 0                    Datacenter = "meyrin"
RequestCpus = 2                HasMPI = true
RequestMemory = 4260           TotalCpus = 8
...                            TotalMemory = 22500
                               ...
```
● Extensive experience in HEP
● Fair share
● Running virtualized
● Preemption
● External storage and networking
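Matchmaking pairs job ads with machine ads. The sketch below is a greedy Python model of that step, written under simplifying assumptions. It is not HTCondor's ClassAd language or negotiator. Real matchmaking evaluates `Requirements` and `Rank` expressions on both ads, and fair share across accounting groups such as `AcctGroup` is not modelled here. The classes, the field names, and the `constraints` dictionary are hypothetical. The attribute values come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class JobAd:
    name: str
    acct_group: str          # AcctGroup (not used: no fair share here)
    request_cpus: int        # RequestCpus
    request_memory_mb: int   # RequestMemory, in MB
    job_prio: int = 0        # JobPrio
    # Hypothetical constraints: machine attribute name -> required value.
    constraints: dict = field(default_factory=dict)


@dataclass
class MachineAd:
    name: str
    total_cpus: int          # TotalCpus
    total_memory_mb: int     # TotalMemory, in MB
    attributes: dict = field(default_factory=dict)


def fits(job, machine, free_cpus, free_memory_mb):
    """True if the job's requests fit the free resources and constraints hold."""
    return (job.request_cpus <= free_cpus
            and job.request_memory_mb <= free_memory_mb
            and all(machine.attributes.get(k) == v
                    for k, v in job.constraints.items()))


def negotiate(jobs, machines):
    """Greedy matchmaking: higher JobPrio first, first machine that fits wins."""
    free = {m.name: (m.total_cpus, m.total_memory_mb) for m in machines}
    placement = []
    # sorted() is stable, so jobs with equal JobPrio keep submission order.
    for job in sorted(jobs, key=lambda j: -j.job_prio):
        for machine in machines:
            cpus, memory = free[machine.name]
            if fits(job, machine, cpus, memory):
                free[machine.name] = (cpus - job.request_cpus,
                                      memory - job.request_memory_mb)
                placement.append((job.name, machine.name))
                break
        else:
            placement.append((job.name, None))  # stays idle this cycle
    return placement


node = MachineAd("meyrin-node", 8, 22500,
                 {"CERNEnvironment": "production", "Datacenter": "meyrin",
                  "HasMPI": True})
jobs = [JobAd(f"atlas-{i}", "ATLAS", 2, 4260,
              constraints={"Datacenter": "meyrin"}) for i in range(5)]
# Four 2-CPU jobs fill the 8-CPU machine; the fifth stays idle.
print(negotiate(jobs, [node]))
```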

Host cluster
(Diagram: Sched, Collector and Negotiator running on the host cluster.)
```shell
$ kubefed init cern-condor --host-cluster-context=condor-host …
$ openstack coe federation create --host-cluster condor-host cern-condor
```

Joining member clusters
(Diagram: host cluster with Sched, Collector and Negotiator; member clusters running StartD pods.)
```shell
$ kubefed join --host-cluster-context … --cluster-context … atlas-recast-y
$ openstack coe federation join cern-condor atlas-recast-x atlas-recast-y
```
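With `kubefed`, each member cluster is joined with its own command. The Magnum command, by contrast, accepts several clusters at once. The sketch below only builds the per-cluster argument lists and assumes a hypothetical naming convention: each member is registered under the same name as its kubeconfig context. `condor-host`, `atlas-recast-x` and `atlas-recast-y` are the names used on the slides.

```python
import shlex


def kubefed_join_commands(host_context, member_contexts):
    """Build one `kubefed join` argv per member cluster.

    Assumes each member is registered under the same name as its
    kubeconfig context (hypothetical convention).
    """
    return [
        ["kubefed", "join", ctx,
         f"--host-cluster-context={host_context}",
         f"--cluster-context={ctx}"]
        for ctx in member_contexts
    ]


for argv in kubefed_join_commands("condor-host",
                                  ["atlas-recast-x", "atlas-recast-y"]):
    print(shlex.join(argv))
```

To actually join the clusters, each argv could be passed to `subprocess.run(argv, check=True)`. Keeping the arguments as lists avoids shell-quoting issues.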

The StartD is deployed as a Helm chart: a DaemonSet runs one StartD pod per node of each member cluster.
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: {{ template "condor-startd.fullname" . }}
  ...
spec:
  ...
    spec:
      hostNetwork: true
      containers:
      - name: {{ .Chart.Name }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        securityContext:
          privileged: true
        livenessProbe:
          exec:
            command:
            - condor_who
```
(Diagram: host cluster with Sched, Collector and Negotiator; StartD pods on every member cluster.)
Chart: https://gitlab.cern.ch/helm/charts/tree/master/condor-startd
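The chart's `livenessProbe` runs `condor_who` inside the container. For an exec probe, exit code 0 counts as success and anything else as a failure. The kubelet restarts the container after `failureThreshold` consecutive failures; the default threshold is 3 and the default probe period is 10 s. The sketch below models only that counting rule and assumes, hypothetically, that `condor_who` exits non-zero when the local daemons are unhealthy.

```python
def first_restart(probe_exit_codes, failure_threshold=3):
    """Index of the probe result that triggers a restart, or None.

    Exec liveness probe model: exit code 0 is success, anything else is a
    failure; a restart happens after failure_threshold consecutive failures.
    """
    consecutive = 0
    for i, code in enumerate(probe_exit_codes):
        consecutive = consecutive + 1 if code != 0 else 0
        if consecutive >= failure_threshold:
            return i
    return None


# Hypothetical condor_who exit codes, one per probe period.
print(first_restart([0, 1, 1, 0, 1, 1, 1]))  # 6
```

A single success resets the count. This is why the two failures at positions 1 and 2 do not trigger a restart.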

Storage
● Building on well-established deployments
● Software distribution handled by CVMFS (hierarchical Squid caches)
● Physics data accessed directly
(Diagram: S0; a CVMFS client on each StartD node; Sched, Collector and Negotiator on the host cluster.)
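The value of hierarchical caches is that a file fetched once by one node is then served from a nearby cache to every other node behind the same proxy. The read-through model below illustrates that idea. It is a generic sketch, not CVMFS's actual protocol, which serves content-addressed objects over HTTP. The layer names and the repository path are hypothetical.

```python
class CacheLayer:
    """One level of a read-through cache hierarchy (generic model).

    A miss is forwarded to the parent layer (or, at the top, to an origin
    callable), and the result is stored locally on the way back.
    """

    def __init__(self, name, parent=None, origin=None):
        if (parent is None) == (origin is None):
            raise ValueError("give exactly one of parent or origin")
        self.name = name
        self.parent = parent
        self.origin = origin
        self.store = {}

    def get(self, path, trace):
        trace.append(self.name)  # record which layers the request visited
        if path in self.store:
            return self.store[path]
        if self.parent is not None:
            data = self.parent.get(path, trace)
        else:
            data = self.origin(path)
        self.store[path] = data
        return data


upstream = CacheLayer("upstream-server", origin=lambda p: f"<bytes of {p}>")
site_squid = CacheLayer("site-squid", parent=upstream)
node_a = CacheLayer("node-a-cache", parent=site_squid)
node_b = CacheLayer("node-b-cache", parent=site_squid)

path = "/cvmfs/example.cern.ch/bin/setup.sh"  # hypothetical path
for client in (node_a, node_a, node_b):
    trace = []
    client.get(path, trace)
    print(" -> ".join(trace))
# node-a-cache -> site-squid -> upstream-server
# node-a-cache
# node-b-cache -> site-squid
```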

Magnum federation API spec: https://specs.openstack.org/openstack/magnum-specs/specs/queens/federation-api.html (moved to Rocky)
1. An existing Magnum cluster in an OpenStack environment is to be extended using external resources. An external cluster endpoint (deployed in AWS, Azure, GKE, another OpenStack or cloud) can be added to an existing Magnum federated cluster, including the complex setup and management of cluster credentials.
2. A project has several existing clusters that it would like to expose to a set of users through a single endpoint, without disrupting the existing users of each cluster.
3. A set of Magnum clusters is created, each with different characteristics: node flavor, storage setup, etc. Federating them forms a heterogeneous cluster.
The API and persistence layer are already merged; Kubernetes support is ongoing.

Kubernetes SIG Multi-Cluster
● Home of the Federation work
● Currently working on Federation v2, Cluster Registry and Multi-Cluster Ingress
(Diagram: Federation v2 building blocks: Template, Placement, Overrides, Registry.)
https://github.com/kubernetes/community/tree/master/sig-multicluster
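The four Federation v2 building blocks can be read as a rendering step. A template defines the resource, a placement picks clusters from the registry, and overrides adjust fields per cluster. The sketch below models that idea with plain dictionaries and dotted field paths. Those are our simplifications, not the Federation v2 API objects. The `myapp` Deployment and its replica counts are hypothetical.

```python
import copy


def propagate(template, placement, overrides, registry):
    """Render one copy of template per placed cluster, applying overrides.

    overrides maps cluster -> {"dotted.field.path": value}.
    """
    rendered = {}
    for cluster in placement:
        if cluster not in registry:
            raise KeyError(f"cluster {cluster!r} is not in the registry")
        obj = copy.deepcopy(template)  # never mutate the shared template
        for dotted_path, value in overrides.get(cluster, {}).items():
            *parents, leaf = dotted_path.split(".")
            target = obj
            for key in parents:
                target = target.setdefault(key, {})
            target[leaf] = value
        rendered[cluster] = obj
    return rendered


registry = {"condor-host", "atlas-recast-x", "atlas-recast-y"}
template = {"apiVersion": "apps/v1", "kind": "Deployment",
            "metadata": {"name": "myapp"}, "spec": {"replicas": 2}}
placement = ["atlas-recast-x", "atlas-recast-y"]
overrides = {"atlas-recast-y": {"spec.replicas": 10}}

for cluster, obj in propagate(template, placement, overrides, registry).items():
    print(cluster, obj["spec"]["replicas"])
# atlas-recast-x 2
# atlas-recast-y 10
```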

Demo: Reusable Analysis Workflows (RECAST)
● https://github.com/recast-hep
● https://github.com/diana-hep/yadage
● https://github.com/reanahub

Summary
• Federation support in Kubernetes is ready
• Ongoing development of the v2 API, with significant changes
• OpenStack Magnum support coming in Rocky
• Already in use at CERN
• Started with a legacy application, with limited integration
• Expanded to a cloud-native implementation, with great results
• Great support from the OpenStack and Kubernetes communities

Questions?
Clenimar Filemon: clenimar@lsd.ufcg.edu.br, @clenimar
Ricardo Rocha: ricardo.rocha@cern.ch, @ahcorporto
