JEREMY EDER - RED HAT PERFORMANCE ENGINEERING 1
Enabling GPU-as-a-Service Providers with Red Hat OpenShift
@jeremyeder, Senior Principal Software Engineer, Red Hat
March 2018
Agenda
- OpenShift Cluster Overview
- Infrastructure Abstraction
- High Performance Features
- GPU Overview
Community Powered Innovation
What does an OpenShift Cluster look like?
[Diagram: OpenShift cluster architecture. Labels: Red Hat Enterprise Linux master (API/authentication, data store, scheduler, health/scaling); RHEL nodes running containers (C); service layer; routing layer; persistent storage; registry; infrastructure types: physical, virtual, private, public, hybrid.]
Abstract away any infrastructure
[Diagram: service layer and routing layer over physical, virtual, private, public, and hybrid infrastructure.]
- Bare Metal
- RHV
- OpenStack
- VMware
- GCE
- Azure
- AWS
- BYO nodes...
One Platform to...
OpenShift is the single platform to run any application:
- Old or new
- Monolithic/Microservice
Verticals: Big Data, NFV, FSI, Animation, ISVs, HPC, Machine Learning
High Performance RFEs by Vertical
Feature                                   FSI    NFV    ISV    BD/ML  ANIM   HPC
NUMA (cpuset.cpus and cpuset.mems)        Yes    Yes    Yes    Maybe  Maybe  Yes
Device Passthrough (NIC/Disk/GPU etc...)  Yes    Yes    Yes    Maybe  Maybe  Yes
sysctl Support (non-namespaced too)       Yes    Yes    Yes    Yes    Yes    Yes
Separation of control- and data-plane     Yes    Yes    Yes    Yes    Yes    Yes
Node “fitness” (extended health info)     Yes    Yes    Maybe  Maybe  Maybe  Yes
Multi-homed pods                          Yes    Yes    Maybe  Yes    Yes    Yes
Kernel Modules (DKMS-ish)                 Yes    Yes    Maybe  Maybe  Yes    Maybe
Hugepages                                 Yes    Yes    Yes    Yes    Maybe  Maybe
Why do this?
Enable containerization of Infrastructure Software
- Software-defined Storage and Networking
- Packet switching and routing tiers
- Multiple, very different workloads within a single cluster
○ Layered schedulers (HPC/grid)
- Many more...
Enable containerization of Red Hat’s products
- Gluster/Container Native Storage
- Ceph
- OpenStack
- radanalytics
- KubeVirt
Upstream First: Kubernetes Working Groups
- Resource Management Working Group
○ Features Delivered
■ Device Plugins (GPU/Bypass/FPGA)
■ CPU Manager (exclusive cores)
■ Huge Pages Support
○ Extensive Roadmap
- Intel, IBM, Google, NVIDIA, Red Hat, many more...
Upstream First: Kubernetes Working Groups
- Network Plumbing Working Group
○ Formalized Dec 2017
- Goal: implement an out-of-tree, pseudo-standard collection of CRDs for multiple networks, owned by sig-network
- Separate control and data planes, overlapping IPs, fast data plane
- IBM, Intel, Red Hat, Huawei, Cisco, Tigera, and others
GPU CLUSTER TOPOLOGY
OpenShift Cluster Topology
[Diagram: cluster tiers. Control Plane: three "master and etcd" hosts. Infrastructure: three "registry and router" hosts with a load balancer (LB). Compute Nodes and Storage Tier.]
OpenShift Cluster Topology
Compute Nodes...
- How to enable software to take advantage of “special” hardware
- Create Node Pools
○ Mark them as “special”
○ Taints/Tolerations
○ ExtendedResourceToleration
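As a rough illustration of the node-pool idea, the sketch below mimics, in simplified Python, what an ExtendedResourceToleration-style admission step does: for every extended resource a pod requests, it adds a matching toleration so the pod can be scheduled onto nodes tainted with that resource name. The helper names, the pod name, the image, and the simplified extended-resource check are assumptions made for illustration; this is not the Kubernetes implementation.

```python
import json


def is_extended_resource(name: str) -> bool:
    # Simplified assumption: an extended resource is a domain-qualified name
    # outside kubernetes.io/ (for example nvidia.com/gpu).
    return "/" in name and not name.startswith("kubernetes.io/")


def add_extended_resource_tolerations(pod: dict) -> dict:
    """Add a NoSchedule toleration for each extended resource the pod requests."""
    spec = pod["spec"]
    tolerations = spec.setdefault("tolerations", [])
    seen = {t.get("key") for t in tolerations}
    for container in spec.get("containers", []):
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            for name in resources.get(section, {}):
                if is_extended_resource(name) and name not in seen:
                    tolerations.append(
                        {"key": name, "operator": "Exists", "effect": "NoSchedule"}
                    )
                    seen.add(name)
    return pod


# Hypothetical pod that asks for one GPU; name and image are placeholders.
gpu_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-workload"},
    "spec": {
        "containers": [
            {
                "name": "main",
                "image": "example.com/cuda-app:latest",
                "resources": {"limits": {"nvidia.com/gpu": 1, "cpu": "2"}},
            }
        ]
    },
}

tolerated_pod = add_extended_resource_tolerations(gpu_pod)
print(json.dumps(tolerated_pod["spec"]["tolerations"], indent=2))
```

In this sketch the matching node pool would carry a taint with key nvidia.com/gpu and effect NoSchedule, so pods that do not request GPUs stay off those nodes.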
OpenShift Cluster Topology
Compute Nodes...
- How to enable software to take advantage of “special” hardware
- Tune/Configure the OS
○ Tuned Profiles
○ CPU Isolation
○ sysctls
OpenShift Cluster Topology
In OpenShift, there are three “types” of sysctls
Safe
- Enabled by default
- kernel.shm_rmid_forced
- net.ipv4.ip_local_port_range
- net.ipv4.tcp_syncookies
Unsafe
- Experimental Kubelet Flag
- kernel.sem*
- kernel.shm*
- kernel.msg*
- fs.mqueue.*
- net.*
Node-level
- Can’t set from a pod
- Potentially affects other pods
- Many interesting sysctls
- Use TuneD
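The three sysctl categories can be expressed as a small lookup. The sketch below is only an illustration built from the names and patterns listed on the slide; the function name and the rule that anything else counts as node-level are assumptions, not an OpenShift API.

```python
from fnmatch import fnmatchcase

# Safe sysctls as listed on the slide (enabled by default).
SAFE_SYSCTLS = {
    "kernel.shm_rmid_forced",
    "net.ipv4.ip_local_port_range",
    "net.ipv4.tcp_syncookies",
}

# Unsafe sysctl patterns as listed on the slide (need the kubelet flag).
UNSAFE_PATTERNS = ["kernel.sem*", "kernel.shm*", "kernel.msg*", "fs.mqueue.*", "net.*"]


def classify_sysctl(name: str) -> str:
    """Return 'safe', 'unsafe', or 'node-level' for a sysctl name."""
    if name in SAFE_SYSCTLS:
        return "safe"
    if any(fnmatchcase(name, pattern) for pattern in UNSAFE_PATTERNS):
        return "unsafe"
    # Assumption: everything else cannot be set from a pod and is
    # handled on the node, for example with a TuneD profile.
    return "node-level"


for sysctl in ("net.ipv4.tcp_syncookies", "kernel.shmmax", "vm.swappiness"):
    print(sysctl, "->", classify_sysctl(sysctl))
```

The safe set is checked first because its entries would otherwise match the broader unsafe patterns such as net.*.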
OpenShift Cluster Topology
Compute Nodes...
- How to enable software to take advantage of “special” hardware
- Optimize your workload
○ Dedicate CPU cores
○ Consume hugepages
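A pod that wants dedicated cores and hugepages has to say so in its resource section. The sketch below builds such a manifest as a plain Python dictionary, assuming the node runs the CPU Manager with its static policy (which gives exclusive cores to Guaranteed pods that request whole CPUs) and has 2Mi hugepages pre-allocated. The function name, pod name, image, and sizes are placeholders chosen for illustration.

```python
import json


def build_pinned_pod(name: str, image: str, cpus: int, memory: str, hugepages_2mi: str) -> dict:
    """Build a pod manifest with equal requests and limits (Guaranteed QoS)."""
    if not isinstance(cpus, int) or cpus < 1:
        # Exclusive cores are only handed out for whole-CPU requests.
        raise ValueError("cpus must be a positive whole number")
    resources = {"cpu": str(cpus), "memory": memory, "hugepages-2Mi": hugepages_2mi}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Requests equal limits, so the pod is in the Guaranteed class.
                    "resources": {"requests": dict(resources), "limits": dict(resources)},
                    "volumeMounts": [{"name": "hugepages", "mountPath": "/dev/hugepages"}],
                }
            ],
            # Hugepages are exposed to the container through an emptyDir volume.
            "volumes": [{"name": "hugepages", "emptyDir": {"medium": "HugePages"}}],
        },
    }


# Placeholder values; adjust to the workload and node.
pinned_pod = build_pinned_pod("latency-app", "example.com/latency-app:latest", 4, "8Gi", "1Gi")
print(json.dumps(pinned_pod, indent=2))
```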
OpenShift Cluster Topology
Compute Nodes...
- How to enable software to take advantage of “special” hardware
- Enable the Hardware
○ Install drivers
○ Deploy Device Plugin
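Once the drivers are installed and a device plugin is running, the node advertises its GPUs as an extended resource in its status. One way to confirm this is to read the allocatable count from the node object. The sketch below assumes a node description shaped like the JSON returned by `kubectl get node <name> -o json`; the sample node, its name, and the helper name are made up for illustration.

```python
def allocatable_gpus(node: dict, resource: str = "nvidia.com/gpu") -> int:
    """Return how many devices of the given resource the node advertises."""
    allocatable = node.get("status", {}).get("allocatable", {})
    # Quantities are serialized as strings; a missing key means no devices.
    return int(allocatable.get(resource, "0"))


# Hypothetical, heavily trimmed node object for an 8-GPU host.
sample_gpu_node = {
    "kind": "Node",
    "metadata": {"name": "gpu-node-1"},
    "status": {"allocatable": {"cpu": "80", "memory": "512Gi", "nvidia.com/gpu": "8"}},
}

# Hypothetical node without a device plugin.
sample_plain_node = {
    "kind": "Node",
    "metadata": {"name": "cpu-node-1"},
    "status": {"allocatable": {"cpu": "16", "memory": "64Gi"}},
}

print(allocatable_gpus(sample_gpu_node))
print(allocatable_gpus(sample_plain_node))
```

A count of zero on a GPU host usually points back at the two steps on the slide: the driver is not loaded or the device plugin is not running there.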
OpenShift Cluster Topology
Compute Nodes...
- How to enable software to take advantage of “special” hardware
- Consume the Device
○ KubeFlow Template deployment
Kubernetes Deployment for STAC-A2
- All-in-One Kubernetes Installation (hack/local-up-cluster.sh)
- Node labeled
- Containers:
○ RHEL7+CUDA9
○ RHEL7+CUDA9+DEVICE-PLUGIN
○ RHEL7+CUDA9+STAC-A2
- CUDA 9
- 8 x NVIDIA Tesla V100 (Volta) GPUs
- HPE Apollo 6500 w/XL270d Gen9
- Red Hat Enterprise Linux 7.4
- Kubernetes 1.8 (setup info)
- nvidia-smi --applications-clocks=877,1380
- https://rhelblog.redhat.com/2017/11/21/red-hat-and-partners-deliver-new-performance-records-on-prominent-risk-analytics-benchmark/
- https://news.developer.nvidia.com/a-new-stac-a2-record/
Kubernetes Deployment for STAC-A2
[Diagram: kubectl create submits the Benchmark pod; components shown are the Kube Scheduler, the Kubelet, the Device Plugin (daemonset), and eight Volta GPUs.]
Benchmark (pod):
resources:
  limits:
    nvidia.com/gpu: 8
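The benchmark pod on this slide asks for all eight GPUs through its resource limits. A minimal sketch of generating such a manifest follows; the function name, pod name, and image are hypothetical placeholders, and only the `nvidia.com/gpu: 8` limit comes from the slide.

```python
import json


def build_gpu_pod(name: str, image: str, gpus: int) -> dict:
    """Build a pod manifest that requests a number of NVIDIA GPUs."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # A benchmark runs once, so it should not be restarted on exit.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are requested under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Placeholder name and image; the GPU count matches the slide.
benchmark_pod = build_gpu_pod("benchmark", "example.com/rhel7-cuda9-benchmark:latest", 8)
print(json.dumps(benchmark_pod, indent=2))
```

The printed JSON can be saved to a file and submitted with `kubectl create -f <file>`, after which the scheduler only considers nodes that advertise at least eight free GPUs.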
Recent GPU-related work on OpenShift
- Early KubeFlow involvement
- radanalytics templates for ML-workflow on OpenShift
- Machine-Learning OpenShift Commons
- Demo Repositories
○ https://github.com/zvonkok/nvidia-k8s
○ https://github.com/redhat-performance/openshift-psap
THANK YOU
- plus.google.com/+RedHat
- linkedin.com/company/red-hat
- youtube.com/user/RedHatVideos
- facebook.com/redhatinc
- twitter.com/RedHatNews
Commoditizing GPU-as-a-Service Providers with Red Hat OpenShift
Tuesday, Mar 27, 1:00 PM - 1:25 PM, Room 210E
Red Hat OpenShift Container Platform, with Kubernetes at its core, can play an important role in building flexible hybrid cloud infrastructure. By abstracting infrastructure away from developers, workloads become portable across any cloud. With NVIDIA Volta GPUs now available in every public cloud [1], as well as from every computer maker, an abstraction library like OpenShift becomes even more valuable. Through demonstrations, this session will introduce you to declarative models for consuming GPUs via OpenShift, as well as the two-level scheduling decisions that provide fast placement and stability.