BUILDING A GPU-FOCUSED CI SOLUTION Mike Wendt @mike_wendt


SLIDE 1

Mike Wendt @mike_wendt github.com/nvidia github.com/mike-wendt

BUILDING A GPU-FOCUSED CI SOLUTION

SLIDE 2

AGENDA

  • Need for GPU CI
  • Challenges of GPU CI
  • Methods to Implement GPU CI
  • Improving GPU CI Today
  • Demo
  • Lessons Learned
  • Next Steps
  • Getting Started

SLIDE 3

NEED FOR GPU CI

  • The leading open-source software projects from Apache and others rely on CI
  • External demand
      • Partners are collaborating with us on projects like the GPU Open Analytics Initiative (GoAi) and need GPU CI to ensure stable builds
  • Internal demand
      • Large internal code bases for all kinds of GPU-accelerated applications require testing across different platforms/hardware
      • Performance testing of new drivers and hardware needs repeatable methods to make sure we continue to deliver performance

The number of GPU-accelerated applications is growing

SLIDE 4

CHALLENGES OF GPU CI

  • Need GPUs
      • Cloud or physical
  • Resource management
  • Expose GPU configuration to developers
      • Driver, CUDA, GPU type
  • Many traditional tools like Travis CI, CircleCI, and others do not support GPUs
      • For good reasons: dangers of misuse
  • For tools that offer support, it is often not native
      • Still feels “hacky,” but it gets the job done

GPUs bring a different set of problems than traditional CI
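
One way to expose the GPU configuration of a CI node to developers is to record it at the start of every job. The sketch below is an illustration only, not something shown in the talk: it assumes `nvidia-smi` is on the PATH of the node and uses its `--query-gpu` CSV output. The function names are hypothetical, and the CUDA toolkit version would have to come from elsewhere, for example the test image.

```python
import csv
import io
import subprocess

# nvidia-smi query for GPU type and driver version, one CSV row per GPU.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_report(text):
    """Turn nvidia-smi CSV output into a list of {name, driver} dicts."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Blank or malformed rows are skipped rather than raising.
    return [{"name": r[0].strip(), "driver": r[1].strip()} for r in rows if len(r) >= 2]


def detect_gpus():
    """Query the local node; returns [] when nvidia-smi is missing or fails."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_report(result.stdout)
```

Printing the result of `detect_gpus()` into the job log gives developers the driver and GPU type that a given test run actually used.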

SLIDE 5

METHODS TO IMPLEMENT GPU CI

SLIDE 6

BARE-METAL + GPU

Benefits

  • Reduces complexity with minimal setup
  • Works well for a small set of projects that use the same/similar dependencies

Challenges

  • Managing dependencies can be tricky for multiple projects
  • Limits ability to test multiple platforms; limited to the installed CUDA/OS
  • Resource management is difficult

Fastest to get started with the most limitations

SLIDE 7

BARE-METAL + GPU

Fastest to get started with the most limitations

[Diagram labels: Server, GPUs, CI Environment, Source Code, Tests, Test Results]

SLIDE 8

DOCKER + NVIDIA CONTAINER RUNTIME

  • Docker runtime that allows for GPU pass-through on Linux systems
  • Works with Debian/Ubuntu, RHEL/CentOS, and Amazon Linux
  • Allows for testing multiple CUDA/OS environments on one machine
  • Includes options to set supported driver operations and restrict GPU visibility

github.com/nvidia/nvidia-docker
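
As a rough illustration of the runtime options mentioned above, the sketch below assembles a `docker run` command line. It assumes nvidia-docker 2.x, where the runtime is selected with `--runtime=nvidia` and configured through the `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` environment variables. The helper name and the image tag in the usage note are assumptions, not taken from the slides.

```python
def gpu_docker_command(image, command, gpus="all", capabilities="compute,utility"):
    """Build an argv list that runs `command` in `image` with GPU access.

    gpus:         value for NVIDIA_VISIBLE_DEVICES, e.g. "all", "0" or "0,1"
    capabilities: value for NVIDIA_DRIVER_CAPABILITIES
    """
    return [
        "docker", "run", "--rm",
        "--runtime=nvidia",                                 # use the NVIDIA container runtime
        "-e", f"NVIDIA_VISIBLE_DEVICES={gpus}",             # restrict GPU visibility
        "-e", f"NVIDIA_DRIVER_CAPABILITIES={capabilities}", # limit driver features exposed
        image,
    ] + list(command)
```

For example, `gpu_docker_command("nvidia/cuda:9.0-base", ["nvidia-smi"], gpus="0")` would describe a container that only sees the first GPU; the resulting list can be passed to `subprocess.run`.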

SLIDE 9

DOCKER + GPU

Benefits

  • Ability to test multiple CUDA/OS combinations
  • Handles dependency management for all projects
  • Enables fine-grained resource management
  • Supports the scale needed for larger projects and teams

Challenges

  • Typically requires pre-built Docker images that contain the test environment, with the code under test injected into the container
  • Configuration tends to involve many environment variables and is cumbersome to manage
  • GitLab CI and Jenkins require “runners” for multiple nodes

Easier to use with some hacking still required
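
To make the idea of testing several CUDA/OS combinations on one machine concrete, here is a minimal sketch that expands a matrix into one `docker run` command per combination and injects the source tree through a volume mount. The image naming pattern `nvidia/cuda:<cuda>-devel-<os>`, the `/workspace` mount point, and the function name are assumptions made for illustration; a real setup would likely use its own pre-built test images.

```python
from itertools import product


def build_test_matrix(cuda_versions, os_tags, source_dir, test_command):
    """Return one docker argv list per CUDA/OS combination."""
    jobs = []
    for cuda, os_tag in product(cuda_versions, os_tags):
        image = f"nvidia/cuda:{cuda}-devel-{os_tag}"  # assumed tag pattern
        jobs.append([
            "docker", "run", "--rm", "--runtime=nvidia",
            "-v", f"{source_dir}:/workspace",  # inject the code under test
            "-w", "/workspace",                # run tests from the mounted tree
            image,
        ] + list(test_command))
    return jobs
```

Calling `build_test_matrix(["8.0", "9.0"], ["ubuntu16.04", "centos7"], "/src", ["make", "test"])` yields four commands, one per combination.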

SLIDE 10

DOCKER + GPU

Easier to use with some hacking still required

[Diagram labels: Server, GPUs, CI Environment, Docker + NVIDIA Runtime, Docker Container, Source Code, Tests, Dockerfile or Container, Custom Config, Test Results]

SLIDE 13

KUBERNETES + DOCKER + GPU

Benefits

  • GPU support in Kubernetes v1.8+
  • Takes care of the “runner” challenge with GitLab/Jenkins
  • Resource management and scheduling are handled by Kubernetes

Challenges

  • Can only target GPUs on homogeneous nodes (heterogeneous support coming)
  • Not all tools support GPU CI out of the box
  • Docker containers are required for testing, but building them can be the previous step in a pipeline

Promises to be the easiest to use with minimal hacking
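
With Kubernetes handling scheduling, a GPU test job reduces to a pod that requests GPUs as a resource. The sketch below builds such a pod manifest as a Python dict. It assumes the NVIDIA Kubernetes device plugin is installed on the cluster, which advertises GPUs under the resource name `nvidia.com/gpu`; the function name, pod name, and image are hypothetical.

```python
import json


def gpu_test_pod(name, image, command, gpus=1):
    """Return a pod manifest that runs `command` once with `gpus` GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",  # a test run should not be restarted
            "containers": [{
                "name": "gpu-tests",
                "image": image,
                "command": list(command),
                # GPUs are requested as a limit; the scheduler picks a node
                # that has this many free.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


def to_manifest(pod):
    """Serialize the pod as JSON, which kubectl accepts alongside YAML."""
    return json.dumps(pod, indent=2)
```

The output of `to_manifest(gpu_test_pod("ci-job-1", "example/test-image:latest", ["make", "test"]))` could be saved to a file and submitted with `kubectl apply -f`.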

SLIDE 14

KUBERNETES + DOCKER + GPU

Promises to be the easiest to use with minimal hacking

[Diagram labels: Kubernetes Master, Scheduler, Kubernetes Worker, Server, GPUs, CI Environment, Docker + NVIDIA Runtime, Docker Container, Docker Container Repo, Docker Test Container, Source Code, Tests, Dockerfile or Container, Custom Config, Test Results]

SLIDE 18

HOW CAN WE MAKE THIS BETTER TODAY?

SLIDE 19

JENKINS PLUGIN FOR NVIDIA + DOCKER

  • Simplifies the configuration of Docker containers for GPU CI testing
  • Allows for targeting a Dockerfile within the repo to build and use for testing, or a Docker image in a remote hub
  • Supports side-containers with GPU support
  • Easy to use, and easy to adapt a project for GPU CI

Based on the Jenkins docker-slaves plugin

SLIDE 20

DEMO

SLIDE 21

JENKINS PLUGIN FOR NVIDIA + DOCKER

Simplifying the configuration for GPU CI

[Diagram labels: Server, GPUs, Jenkins, CI Environment, Docker + NVIDIA Runtime, Docker Container, Source Code, Tests, Dockerfile or Container + Plugin Config, Test Results]

SLIDE 22

LESSONS LEARNED

  • CI best practices apply to GPU code as well
      • Pull request testing is one of the best methods to ensure code quality
  • GitLab CI works great if there are only a few GPU-enabled repos to test
      • For scale-out, GitLab on Kubernetes is best
  • Larger organizations and projects need a centralized CI platform like Jenkins
      • Setup of a new repo is easy, and with parameterized builds we can make use of existing pipelines
  • Advanced uses of Jenkins
      • Tagging is key to testing on multiple GPU architectures, and pipelines are key to testing multiple CUDA versions
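
The tagging point can be illustrated with a small sketch: each build node carries a set of labels, a job states the labels it requires, and a parameterized build is expanded across GPU architectures and CUDA versions. This is plain Python written for illustration and is not the Jenkins API; the node names, label names, and parameter names are all hypothetical.

```python
def matching_nodes(nodes, required_labels):
    """Return the sorted names of nodes that carry every required label."""
    required = set(required_labels)
    return sorted(name for name, labels in nodes.items() if required <= set(labels))


def expand_builds(gpu_archs, cuda_versions):
    """Expand one parameterized pipeline into a build per arch/CUDA pair."""
    return [
        {"GPU_ARCH": arch, "CUDA_VERSION": cuda}
        for arch in gpu_archs
        for cuda in cuda_versions
    ]
```

With nodes tagged by architecture, a build that asks for `["gpu", "volta"]` only lands on machines that have both labels, while the same pipeline definition is reused for every CUDA version.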

SLIDE 23

NEXT STEPS

  • Continue plugin development and release as an open source project
  • Internal
      • Continue deployment of GPU CI and migrate performance testing toward full GPU CI
      • Leverage capabilities of Jenkins to go beyond CI with CD and workflow automation
  • External
      • Expand GPU CI testing by testing pull requests of open source projects using Jenkins and the plugin
  • Take advantage of the GPU targeting within Kubernetes and new GPU features in the coming months
  • Look at ways to more closely integrate GPU CI with GitLab CI and Jenkins plugins for Kubernetes

SLIDE 24

GETTING STARTED

github.com/nvidia

  • NVIDIA Docker Runtime: nvidia-docker
  • NVIDIA Kubernetes Device Plugin: k8s-device-plugin

github.com/mike-wendt

  • Jenkins Plugin for NVIDIA: coming soon
  • Docker + NVIDIA Runtime on Ubuntu: nvidia-docker-ubuntu

Links to useful repos

SLIDE 25

Mike Wendt @mike_wendt github.com/nvidia github.com/mike-wendt

THANK YOU