Combining NVIDIA Docker and databases to enhance agile development and optimize resource allocation


SLIDE 1

ORNL is managed by UT-Battelle for the US Department of Energy

Combining NVIDIA Docker and databases to enhance agile development and optimize resource allocation

Chris Davis, Sophie Voisin, Devin White, Andrew Hardin

Scalable and High Performance Geocomputation Team
Geographic Information Science and Technology Group
Oak Ridge National Laboratory

GTC 2017 – May 2017

SLIDE 2

Outline

  • Background
  • Example HPC Application
  • Study Results
  • Lessons Learned / Future Work

SLIDE 3

The Story

  • We are:

– Developing an HPC suite of applications
– Spread across multiple R&D teams
– In an Agile development process
– Delivering to a production environment
– Needing to support multiple systems / multiple capabilities
– Collecting performance metrics for system optimization

SLIDE 4

Why We Use NVIDIA-Docker

[Diagram: comparison of virtual machines, Docker, and NVIDIA-Docker across resource optimization, GPU access, flexibility, and operating systems.]

SLIDE 5

Hardware – Quadro: Compute + Display

Card         M4000    P6000
Capability   5.2      6.1
Block        32       32
SM           13       30
Cores        1664     3840
Memory       8 GB     24 GB

SLIDE 6

Hardware – Tesla: Compute Only

Card         K40      K80
Capability   3.5      3.7
Block        16       16
SM           15       13
Cores        2880     2496
Memory       12 GB    12 GB

SLIDE 7

Hardware – High End

DELL C4130
GPU           4 x K80
RAM           256 GB
Cores         48
SSD Storage   400 GB

SLIDE 8

Constructing Containers

  • Build Container:

– Based off the NVIDIA images at gitlab.com (https://gitlab.com/nvidia/cuda/tree/centos7)
– CentOS 7
– CUDA 8.0 / 7.5
– cuDNN 5.1
– GCC 4.9.2
– Cores: 24
– Mount a local folder with the code

  • Compile against the chosen compute capability (see the capability-check sketch below)
  • Copy the product inside the container
  • “docker commit” the container updates to a new image
  • “docker save” the image to Isilon

[Diagram: the build server mounts code from the Git repo, writes compile statistics to the PostgreSQL Compile Stats database, and saves the per-capability containers to Isilon, from which HPC servers running NVIDIA-Docker pull them.]
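
Since each container is compiled for a single compute capability, a quick host-side check can confirm that a capability-specific binary landed on a matching GPU. A minimal sketch using the CUDA runtime API; the check itself is our illustration, not part of the original build scripts:

```cpp
// Sketch: report the compute capability of the GPU visible in the container,
// so a capability-specific binary can refuse to run on the wrong device.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::fprintf(stderr, "no CUDA device visible inside the container\n");
        return 1;
    }
    std::printf("GPU %s: compute capability %d.%d\n",
                prop.name, prop.major, prop.minor);
    return 0;
}
```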

SLIDE 9

Running Containers

  • For each compute capability:

– “docker load” the image from Isilon storage
– Run the container & profile script
– Send nvprof results to the Profile Stats DB
– Remove the container / image

[Diagram: each HPC server loads containers and data from Isilon and sends profiling statistics to the PostgreSQL Profile Stats database.]

SLIDE 10

Hooking It All Together

[Diagram: one build server and multiple HPC servers share the Isilon container store, the Git repo, and the PostgreSQL Compile Stats and Profile Stats databases.]

  • One server generates containers
  • All servers pull containers from Isilon
  • Data to be processed is pulled from Isilon
  • Container build stats are stored in the Compile Stats DB
  • Container execution stats are stored in the Profile Stats DB

SLIDE 11

Profiling Combinations

  • nvprof

– Output parsed
– Sent to the Profile Stats DB

  • Containers for:

– CUDA version
– Each capability
– All capabilities
– CPU only

  • Data sets: 4
  • Total of 104 profiles

[Diagram: profiling matrix of CUDA versions (7.5, 8.0) x targets (CPU, 3.0, 3.5, 3.7, 5.0, 5.2, 6.0, 6.1, all capabilities) x datasets (D1-D4), run on the M4000, K80, P6000, and K40.]

SLIDE 12

Database

  • Postgres databases (a minimal libpq sketch follows)

– Shared fields: Hostname, Dataset, CUDA Version, Num CPU Threads, Timestamp
– Compile DB: Compile Time, Compute Capability
– Run Time DB: Execution Time, GPU Device
– NVPROF DB: Kernel / API Call, Step Time Percent, Step Time, Num Calls, Ave Time, Min Time, Max Time, Step Name
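
libpq is already one of the application's core libraries (slide 15), so recording a result is one parameterized INSERT. A minimal sketch against the Compile DB, with table and column names assumed for illustration since the deck only lists the fields:

```cpp
// Sketch: log one compile result to the Compile DB via libpq.
// Table and column names here are assumptions, not the real schema.
#include <cstdio>
#include <libpq-fe.h>

int main() {
    PGconn *conn = PQconnectdb("host=dbhost dbname=compile_stats user=profiler");
    if (PQstatus(conn) != CONNECTION_OK) {
        std::fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    // Shared fields (hostname, dataset, CUDA version, threads) plus the
    // Compile DB's own fields (compute capability, compile time in seconds).
    const char *params[6] = {"hpc01", "D1", "8.0", "24", "3.7", "812.4"};
    PGresult *res = PQexecParams(conn,
        "INSERT INTO compile_stats "
        "(hostname, dataset, cuda_version, num_cpu_threads, "
        " compute_capability, compile_time, ts) "
        "VALUES ($1, $2, $3, $4, $5, $6, now())",
        6, nullptr, params, nullptr, nullptr, 0);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        std::fprintf(stderr, "insert failed: %s", PQerrorMessage(conn));

    PQclear(res);
    PQfinish(conn);
    return 0;
}
```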

SLIDE 13

Outline

  • Background
  • Example HPC Application
  • Study Results
  • Lessons Learned / Future Work

SLIDE 14

Example HPC Application

  • Geospatial metadata generator

– Leverages open-source 3rd-party libraries

  • OpenCV, Caffe, GDAL, …

– GPU-enabled computer vision algorithms

  • SURF, ORB, NCC, NMI…

– Automated matching against control data
– Calculates geospatial metadata for input imagery

[Images: satellites, manned aircraft, unmanned aerial systems]

SLIDE 15

Example HPC Application - GTC16

  • Two-step image re-alignment application using NMI

[Diagram: pipeline from Input Image through Preprocessing, Source Selection, Global Localization, Registration, and Resection to Metadata and Output Image; stages marked GPU or CPU.]

Core Libraries:

  • NITRO
  • GDAL
  • Proj.4
  • libpq (Postgres)
  • OpenCV
  • CUDA
  • OpenMP

Normalized Mutual Information: $\mathrm{NMI} = \frac{H_S + H_C}{H_J}$, computed from the source, control, and joint histograms.
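
Written out with the entropy definition given on slide 18:

```latex
\mathrm{NMI}(S, C) = \frac{H_S + H_C}{H_J},
\qquad
H = -\sum_i p_i \log_2 p_i
```

where $p_i$ is estimated from the 256-bin source and control histograms and the 65536-bin joint histogram.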

SLIDE 16

Example HPC Application - GTC16

  • Global Localization

[Pipeline diagram as on slide 15]


[Images: control 382x100, tactical 258x67]

  • Objective

– Re-align the source image with the control image.

  • Method: in-house implementation

– Roughly match source and control images
– Coarse resolution
– Mask for non-valid data
– Exhaustive search

Solutions: 4250

SLIDE 17

Example HPC Application - GTC16

  • Global Localization

SLIDE 18

Example HPC Application - GTC16

  • Similarity Metric

– Normalized Mutual Information
– Histogram with masked area:

  • Missing data
  • Artifact
  • Homogeneous area

Source image and mask: $N_S \times M_S$ pixels. Control image and mask: $N_C \times M_C$ pixels. Solution space: $n \times m$ NMI coefficients.

$$\mathrm{NMI} = \frac{H_S + H_C}{H_J}, \qquad H = -\sum_i p_i \log_2 p_i$$

where $H$ is the entropy and $p_i$ the probability density function, with bins $i \in [0..255]$ for S and C and $i \in [0..65535]$ for the joint histogram J.

SLIDE 19

Example HPC Application - GTC16

Summary

  • Global Localization as coarse re-alignment

– Problem: the joint histogram must be computed for each solution

  • No compromise on the number of bins: 65536
  • Exhaustive search

– Solution: leverage the K80 specifications

  • 12 GB of memory
  • 1 thread per solution
  • Less than 25 seconds for 61K solutions on a 131K-pixel image

Kernel specifications (1 solution per thread; a sketch follows):

Occupancy                100%
Threads / block          128
Stack frame              264192 bytes
Total memory / block     33.81 MB
Total memory / SM        541.06 MB
Total memory / GPU       7.03 GB
Memory %                 61.06%
Spill stores / loads     0 / 0
Registers                27
smem / block, / SM, %    0 / 0 / 0.00%
cmem[0] / cmem[2]        448 / 20
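
A minimal sketch of this one-thread-per-solution scheme, reconstructed from the numbers above rather than taken from the production kernel; signatures, layouts, and mask handling are assumptions. Each thread owns one (dx, dy) offset and keeps its histograms in local memory, which is consistent with the ~264 KB per-thread stack frame reported in the table:

```cuda
#include <math.h>

#define JOINT_BINS 65536   // no compromise on the number of bins

// One thread evaluates one candidate offset (dx, dy) of the exhaustive search.
__global__ void globalLocalizationNMI(const unsigned char *src,
                                      const unsigned char *srcMask,
                                      const unsigned char *ctl,
                                      int srcW, int srcH, int ctlW,
                                      int nSolX, int nSolY,
                                      float *nmiOut)
{
    int sol = blockIdx.x * blockDim.x + threadIdx.x;
    if (sol >= nSolX * nSolY) return;
    int dx = sol % nSolX;
    int dy = sol / nSolX;

    // Per-thread histograms live in local memory; the 65536-bin joint
    // histogram is what drives the large per-thread stack frame.
    unsigned int hs[256] = {0}, hc[256] = {0};
    unsigned int hj[JOINT_BINS];
    for (int i = 0; i < JOINT_BINS; ++i) hj[i] = 0;

    unsigned int n = 0;
    for (int y = 0; y < srcH; ++y)
        for (int x = 0; x < srcW; ++x) {
            if (!srcMask[y * srcW + x]) continue;   // skip non-valid data
            unsigned char s = src[y * srcW + x];
            unsigned char c = ctl[(y + dy) * ctlW + (x + dx)];
            ++hs[s]; ++hc[c]; ++hj[(unsigned int)s * 256u + c]; ++n;
        }
    if (n == 0) { nmiOut[sol] = 0.f; return; }      // fully masked offset

    // NMI = (H_S + H_C) / H_J with H = -sum p log2 p.
    float eS = 0.f, eC = 0.f, eJ = 0.f;
    for (int i = 0; i < 256; ++i) {
        if (hs[i]) { float p = (float)hs[i] / n; eS -= p * log2f(p); }
        if (hc[i]) { float p = (float)hc[i] / n; eC -= p * log2f(p); }
    }
    for (int i = 0; i < JOINT_BINS; ++i)
        if (hj[i]) { float p = (float)hj[i] / n; eJ -= p * log2f(p); }

    nmiOut[sol] = (eS + eC) / eJ;
}
```

With 128 threads per block, as in the occupancy table, a 61K-solution search is about 480 blocks.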

SLIDE 20

Example HPC Application - GTC16

  • Registration

[Images: control 382x100, tactical 258x67]

[Pipeline diagram as on slide 15]

SLIDE 21

Example HPC Application - GTC16

  • Registration

[Images: control 382x100, tactical 258x67, tactical & control 4571x1555]

[Pipeline diagram as on slide 15]

  • Objective

– Refine the localization

  • Method

– Use ~400x higher resolution
– Keypoint matching

SLIDE 22

Example HPC Application - GTC16

  • Registration Workflow

[Diagram: keypoints are detected and described in the source image, search windows are detected in the control image, and matching by the metric produces a tiepoint list. Descriptors: 11x11 intensity values; search windows: 73x73 pixels.]

SLIDE 23

  • Similarity Metric

– Normalized Mutual Information
– Small “images” but numerous keypoints

  • Numerous keypoints

– up to 65536 with GPU SURF detector

  • Image / Descriptor size

– 11 x 11 intensity values to describe

  • Search area

– 73 x 73 control sub-image

  • Solution space

– 63 x 63 = 3969 solutions per keypoint

Descriptors: 11x11 intensity values. Search windows: 73x73 pixels. Solution spaces: 63x63 NMI coefficients.

$$\mathrm{NMI} = \frac{H_S + H_C}{H_J}, \qquad H = -\sum_i p_i \log_2 p_i$$

where $H$ is the entropy and $p_i$ the probability density function, with bins $i \in [0..255]$ for S and C and $i \in [0..65535]$ for the joint histogram J.

SLIDE 24

Example HPC Application - GTC16

Summary

  • Registration refines the re-alignment

– Problem: the joint histogram must be computed for each solution

  • No compromise on the number of bins: 65536
  • Exhaustive search

– Solution: leverage the K80 specifications

  • 12 GB of memory
  • 1 block per solution
  • Leverage the small number of distinct descriptor values: at most 121 << 65536
  • Less than 100 seconds for 65K keypoints (260M NMI coefficients)
  • About 10K keypoints in less than 20 seconds

Kernel: find the best match for all keypoints (a sketch follows)

– 1 block per keypoint

  • 64 threads per block (1 idle); each thread computes a “row” of solutions
  • Optimized for the 63 x 63 search windows

– Sparse joint histogram: 65536 bins but only 121 values, leveraging the 11 x 11 descriptor size

  • Create 2 lists (length 121) of intensity values: indices for the source and for the corresponding control subset
  • Update the joint histogram count from the lists
  • Loop over the lists to retrieve the aggregate count
  • Set the aggregate count to 0 after first retrieval
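
A sketch of the sparse joint-histogram scheme, reconstructed from the bullets above; names, layouts, and the final reduction are assumptions. Because one solution only touches the 121 descriptor pixels, the 65536-bin joint histogram is never materialized: two length-121 intensity lists are scanned, aggregating duplicate (source, control) pairs and consuming each pair after its first retrieval:

```cuda
#include <math.h>

#define DESC 121   // 11 x 11 descriptor values

// NMI of one descriptor against one 11x11 control patch (stride = window width).
__device__ float sparseNMI(const unsigned char *desc,
                           const unsigned char *win, int stride)
{
    unsigned char s[DESC], c[DESC];
    unsigned short hs[256] = {0}, hc[256] = {0};
    for (int i = 0; i < DESC; ++i) {
        s[i] = desc[i];
        c[i] = win[(i / 11) * stride + (i % 11)];
        ++hs[s[i]]; ++hc[c[i]];
    }

    // Sparse joint histogram: count each distinct (s, c) pair by scanning the
    // lists, marking entries consumed ("set aggregate count to 0") afterwards.
    float eJ = 0.f;
    bool used[DESC] = {false};
    for (int i = 0; i < DESC; ++i) {
        if (used[i]) continue;
        int count = 0;
        for (int j = i; j < DESC; ++j)
            if (s[j] == s[i] && c[j] == c[i]) { ++count; used[j] = true; }
        float p = count / (float)DESC;
        eJ -= p * log2f(p);
    }

    float eS = 0.f, eC = 0.f;
    for (int i = 0; i < 256; ++i) {
        if (hs[i]) { float p = hs[i] / (float)DESC; eS -= p * log2f(p); }
        if (hc[i]) { float p = hc[i] / (float)DESC; eC -= p * log2f(p); }
    }
    return eJ > 0.f ? (eS + eC) / eJ : 0.f;
}

// One block per keypoint; 64 threads, each sweeping one row of the
// 63 x 63 solution space (thread 63 idles).
__global__ void registerKeypoints(const unsigned char *descs,   // nKp x 121
                                  const unsigned char *windows, // nKp x 73*73
                                  int *bestSol /* nKp */)
{
    __shared__ float rowBest[64];
    __shared__ int rowArg[64];

    int kp = blockIdx.x, row = threadIdx.x;
    float best = -1.f; int arg = -1;
    if (row < 63) {
        const unsigned char *d = descs + kp * DESC;
        const unsigned char *w = windows + kp * 73 * 73;
        for (int col = 0; col < 63; ++col) {
            float v = sparseNMI(d, w + row * 73 + col, 73);
            if (v > best) { best = v; arg = row * 63 + col; }
        }
    }
    rowBest[row] = best; rowArg[row] = arg;
    __syncthreads();

    if (row == 0) {   // thread 0 reduces the 63 row maxima
        for (int r = 1; r < 63; ++r)
            if (rowBest[r] > best) { best = rowBest[r]; arg = rowArg[r]; }
        bestSol[kp] = arg;
    }
}
```

Scanning the two 121-entry lists costs at most 121 x 121 comparisons per solution, far cheaper than clearing and walking a 65536-bin histogram.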

SLIDE 25

Outline

  • Background
  • Example HPC Application
  • Study Results
  • Lessons Learned / Future Work

SLIDE 26

Compile Time Results

[Charts: compile time (seconds) and binary size (MB) for CUDA 7.5 and CUDA 8.0, by compute-capability target: OFF, 3.0, 3.5, 3.7, 5.0, 5.2, 6.0, 6.1, 3.0-5.2, and 3.0-6.1.]

SLIDE 27

Run Time Results

[Charts: average run time (sec) for datasets D1-D4, comparing CPU, CUDA 7.5, and CUDA 8.]

SLIDE 28

K80 - Kernel Time Results in Seconds with nvprof

[Charts: Step 1 and Step 2 kernel timings (average, min, max, std) vs CUDA version (7.5 and 8) for datasets D1-D4 on the K80.]

SLIDE 29

Run Time Results

[Charts: Step 2 kernel timings (average, min, max, std) for datasets D1-D4 on the K40, K80, M4000, and P6000, for CUDA 7.5 and CUDA 8.]

SLIDE 30

Outline

  • Background
  • Example HPC Application
  • Study Results
  • Lessons Learned / Future Work

SLIDE 31

Lessons Learned

  • GPU isolation: we ran into an issue when swapping out the P6000 and K40.

– nvidia-smi swapped the GPU IDs of the K40 and M4000.
– This caused nvidia-docker to ignore the NV_GPU value.
– UUID vs. index: UUIDs are stable across enumeration changes.
– Our application can set the GPU index in multi-GPU environments (defaults to 0; see the sketch below).
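
A minimal sketch of that index-selection behavior; the command-line handling is our illustration, and only the default of 0 comes from the slide. Logging the PCI location next to the index helps diagnose enumeration swaps, since the PCI address does not move when nvidia-smi reorders device indices:

```cpp
// Sketch: select the GPU by index (defaulting to 0) and log its PCI location.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    int count = 0;
    cudaGetDeviceCount(&count);

    // GPU index from the command line, defaulting to 0 as in the application.
    int idx = (argc > 1) ? std::atoi(argv[1]) : 0;
    if (idx < 0 || idx >= count) {
        std::fprintf(stderr, "GPU index %d out of range (%d devices)\n",
                     idx, count);
        return 1;
    }
    cudaSetDevice(idx);

    // The PCI address identifies the physical board regardless of how the
    // driver ordered the indices on this boot.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, idx);
    std::printf("using GPU %d: %s (PCI %04x:%02x:%02x)\n",
                idx, prop.name, prop.pciDomainID, prop.pciBusID,
                prop.pciDeviceID);
    return 0;
}
```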

SLIDE 32

Future Work

  • Move off desktop machines to a full testing platform with dedicated hardware and multiple GPU types
  • Investigate Docker Registry & Docker Swarm for managing containers
  • Enhance database analysis to autogenerate reports
  • Generalize the process to containerize and profile any GPU application with this architecture

SLIDE 33

Thank you!

SLIDE 34

Customer Resources

DELL C4130
GPU           4 x K80
RAM           256 GB
Cores         48
SSD Storage   400 GB

[Chart: run time with 6 threads (sec) for datasets D1-D4, CPU vs CUDA 7.5.]