Mike Wendt @mike_wendt github.com/nvidia github.com/mike-wendt
BUILDING A GPU-FOCUSED CI SOLUTION
AGENDA
- Need for GPU CI
- Challenges of GPU CI
- Methods to Implement GPU CI
- Improving GPU CI Today
- Demo
- Lessons Learned
- Next Steps
- Getting Started
NEED FOR GPU CI
- The leading open-source software projects from Apache and others rely on CI
- External demand
  - Partners are collaborating with us on projects like the GPU Open Analytics Initiative (GoAi) and need GPU CI to ensure stable builds
- Internal demand
  - Large internal codebases for all kinds of GPU-accelerated applications require testing across different platforms and hardware
  - Performance testing of new drivers and hardware needs repeatable methods to make sure we continue to deliver performance
The number of GPU-accelerated applications is growing
CHALLENGES OF GPU CI
- Need GPUs
  - Cloud or physical
- Resource management
- Expose GPU configuration to developers
  - Driver, CUDA, GPU type
- Many traditional tools like Travis CI, CircleCI, and others do not support GPUs
  - For good reason: the danger of misuse
- For tools that offer support, it is often not native
  - Still feels “hacky,” but it gets the job done
GPUs bring a different set of problems than traditional CI
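To make "expose GPU configuration to developers" concrete, here is a minimal sketch that records the GPU type and driver version a CI job is running on. It assumes `nvidia-smi` is on the PATH of the CI node; the function names and the choice of query fields are illustrative, not part of the talk.

```python
import csv
import io
import subprocess
from typing import Dict, List

# Fields requested from nvidia-smi; chosen for illustration.
QUERY_FIELDS = ["index", "name", "driver_version"]


def parse_gpu_inventory(csv_text: str) -> List[Dict[str, str]]:
    """Turn `nvidia-smi --format=csv,noheader` output into one dict per GPU."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [
        dict(zip(QUERY_FIELDS, (cell.strip() for cell in row)))
        for row in reader
        if row  # skip blank lines
    ]


def query_gpu_inventory() -> List[Dict[str, str]]:
    """Ask the local driver which GPUs are present (requires nvidia-smi)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader",
    ]
    output = subprocess.check_output(cmd, universal_newlines=True)
    return parse_gpu_inventory(output)
```

A job could call `query_gpu_inventory()` as its first step and attach the result to the build log, so a failing test can be traced to a specific driver and GPU type.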
METHODS TO IMPLEMENT GPU CI
BARE-METAL + GPU
Benefits
- Reduces complexity with minimal setup
- Works well for a small set of projects that use the same/similar dependencies
Challenges
- Managing dependencies can be tricky for multiple projects
- Limits ability to test multiple platforms, limited to installed CUDA/OS
- Resource management is difficult
Fastest to get started with the most limitations
[Diagram: Server, GPUs, CI Environment, Source Code, Tests, Test Results]
DOCKER + NVIDIA CONTAINER RUNTIME
- Docker runtime that allows for GPU pass-thru on Linux systems
- Works with Debian/Ubuntu, RHEL/CentOS, and Amazon Linux
- Allows for testing multiple CUDA/OS environments on one machine
- Includes options to set supported driver operations and restrict GPU visibility
github.com/nvidia/nvidia-docker
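As a rough illustration of the driver and visibility options, the sketch below assembles a `docker run` invocation for the NVIDIA runtime. It assumes nvidia-docker 2 with the `nvidia` runtime registered, and uses the `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` environment variables that the runtime reads. The helper name, default values, and image tag are hypothetical.

```python
from typing import List, Sequence


def gpu_docker_command(
    image: str,
    test_cmd: Sequence[str],
    visible_devices: str = "all",
    driver_capabilities: str = "compute,utility",
) -> List[str]:
    """Build a `docker run` argument list that uses the NVIDIA runtime."""
    return [
        "docker", "run", "--rm",
        "--runtime=nvidia",  # select the NVIDIA container runtime
        # Restrict which GPUs the container can see (e.g. "0", "0,1", "all").
        "-e", "NVIDIA_VISIBLE_DEVICES=" + visible_devices,
        # Restrict which driver features are mounted into the container.
        "-e", "NVIDIA_DRIVER_CAPABILITIES=" + driver_capabilities,
        image,
    ] + list(test_cmd)


if __name__ == "__main__":
    # Image tag is illustrative; pass the list to subprocess.run to execute it.
    print(" ".join(gpu_docker_command("nvidia/cuda:9.0-base", ["nvidia-smi"], "0")))
```

Building the command as a list rather than a string avoids shell-quoting problems when the test command contains spaces.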
DOCKER + GPU
Benefits
- Ability to test multiple CUDA/OS combinations
- Handles dependency management for all projects
- Enables fine-grained resource management
- Supports the scale needed for larger projects and teams
Challenges
- Typically requires pre-built Docker images containing the test environment, with the code under test injected into the container
- Configuration tends to involve many environment variables and is cumbersome to manage
- GitLab CI and Jenkins require “runners” for multiple nodes
Easier to use with some hacking still required
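Testing multiple CUDA/OS combinations usually comes down to enumerating a matrix of images. The sketch below shows one way to generate it. The version lists are placeholders, and the `nvidia/cuda:<cuda>-devel-<os>` tag pattern is an assumption about how the test images are named; substitute whatever pre-built images a project actually uses.

```python
from itertools import product
from typing import Dict, List, Sequence

# Placeholder values; a real project would list the combinations it supports.
CUDA_VERSIONS = ["8.0", "9.0"]
OS_RELEASES = ["ubuntu16.04", "centos7"]


def build_matrix(
    cuda_versions: Sequence[str], os_releases: Sequence[str]
) -> List[Dict[str, str]]:
    """Return one entry per CUDA/OS combination, each with its image tag."""
    return [
        {
            "cuda": cuda,
            "os": os_release,
            # Assumed tag pattern for the test image.
            "image": "nvidia/cuda:{}-devel-{}".format(cuda, os_release),
        }
        for cuda, os_release in product(cuda_versions, os_releases)
    ]


if __name__ == "__main__":
    for entry in build_matrix(CUDA_VERSIONS, OS_RELEASES):
        print(entry["image"])
```

Each entry can then be handed to a separate CI job, so a failure points at one specific CUDA/OS pair.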
[Diagram: Server, GPUs, CI Environment, Docker + NVIDIA Runtime, Docker Container, Source Code, Tests, Dockerfile or Container, Custom Config, Test Results]
KUBERNETES + DOCKER + GPU
Benefits
- GPU support in v1.8+ of Kubernetes
- Takes care of the “runner” challenge with GitLab/Jenkins
- Resource management and scheduling are handled by Kubernetes
Challenges
- Can only target GPUs on homogeneous nodes (heterogeneous support coming)
- Not all tools support GPU CI out of the box
- Docker containers are required for testing, but building them can be a previous step in the pipeline
Promises to be the easiest to use with minimal hacking
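For a sense of what handing scheduling to Kubernetes looks like, the sketch below builds a Pod manifest that requests one GPU for a test container. It assumes the cluster runs the NVIDIA device plugin, which advertises GPUs under the `nvidia.com/gpu` resource name; clusters set up differently may expose another name. The pod name, image, and command are hypothetical.

```python
import json
from typing import Any, Dict, Sequence


def gpu_test_pod(
    name: str, image: str, command: Sequence[str], gpus: int = 1
) -> Dict[str, Any]:
    """Build a Pod manifest whose single container requests `gpus` GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",  # a test run should not be restarted
            "containers": [
                {
                    "name": "tests",
                    "image": image,
                    "command": list(command),
                    # Assumed resource name, as advertised by the NVIDIA device plugin.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical names; the JSON can be saved and applied with kubectl.
    pod = gpu_test_pod("gpu-ci-run", "example/test-image:latest", ["make", "test"])
    print(json.dumps(pod, indent=2))
```

The scheduler then places the pod on a worker with a free GPU, which is what removes the need for per-node runners.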
[Diagram: Repo, Source Code, Tests, Dockerfile or Container, Custom Config, CI Environment, Kubernetes Master, Scheduler, Kubernetes Worker, Server, GPUs, Docker + NVIDIA Runtime, Docker Container, Docker Test Container, Test Results]
HOW CAN WE MAKE THIS BETTER TODAY?
JENKINS PLUGIN FOR NVIDIA + DOCKER
- Simplifies the configuration of Docker containers for GPU CI testing
- Allows for targeting a Dockerfile within the repo to build and use for testing, or a Docker image in a remote hub
- Supports side-containers with GPU support
- Easy to use and adapt a project for GPU CI
Based on Jenkins docker-slaves plugin
DEMO
JENKINS PLUGIN FOR NVIDIA + DOCKER
Simplifying the configuration for GPU CI
[Diagram: Server, GPUs, Jenkins CI Environment, Docker + NVIDIA Runtime, Docker Container, Source Code, Tests, Dockerfile or Container + Plugin Config, Test Results]
LESSONS LEARNED
- CI best practices apply to GPU code as well
  - Pull request testing is one of the best methods to ensure code quality
- GitLab CI works great if there are only a few GPU-enabled repos to test
  - For scale-out, GitLab on Kubernetes is best
- Larger organizations and projects need a centralized CI platform like Jenkins
  - Setup of a new repo is easy, and with parameterized builds we can make use of existing pipelines
- Advanced uses of Jenkins
  - Tagging is key to testing on multiple GPU architectures, and pipelines are key to testing multiple CUDA versions
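The tagging idea can be pictured as label matching: a job lists the labels it needs, and only nodes carrying all of them are eligible. The sketch below is a simplified stand-in for that behavior, not Jenkins' implementation; the node names and label values are made up.

```python
from typing import Dict, Iterable, List


def eligible_nodes(
    nodes: Dict[str, Iterable[str]], required_labels: Iterable[str]
) -> List[str]:
    """Return the names of nodes that carry every required label."""
    required = set(required_labels)
    return sorted(
        name for name, labels in nodes.items() if required <= set(labels)
    )


if __name__ == "__main__":
    # Hypothetical build nodes tagged by GPU architecture and CUDA version.
    nodes = {
        "node-a": ["gpu", "pascal", "cuda9"],
        "node-b": ["gpu", "volta", "cuda9"],
        "node-c": ["gpu", "volta", "cuda8"],
    }
    print(eligible_nodes(nodes, ["volta", "cuda9"]))
```

Running the same pipeline once per architecture label is then enough to cover each GPU generation without duplicating job definitions.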
NEXT STEPS
- Continue plugin development and release as an open source project
- Internal
- Continue deployment of GPU CI and migrate performance testing toward full GPU CI
- Leverage capabilities of Jenkins to go beyond CI with CD and workflow automation
- External
- Expand GPU CI testing by testing pull requests of open source projects using Jenkins and the plugin
- Take advantage of the GPU targeting within Kubernetes and new GPU features in the coming months
- Look at ways to more closely integrate GPU CI with GitLab CI and Jenkins plugins for Kubernetes
GETTING STARTED
github.com/nvidia
- NVIDIA Docker Runtime: nvidia-docker
- NVIDIA Kubernetes Device Plugin: k8s-device-plugin
github.com/mike-wendt
- Jenkins Plugin For NVIDIA: Coming soon
- Docker + NVIDIA Runtime on Ubuntu: nvidia-docker-ubuntu
Links to useful repos