S9751: ACCELERATE YOUR CUDA DEVELOPMENT WITH LATEST DEBUGGING AND - - PowerPoint PPT Presentation

s9751
SMART_READER_LITE
LIVE PREVIEW

S9751: ACCELERATE YOUR CUDA DEVELOPMENT WITH LATEST DEBUGGING AND - - PowerPoint PPT Presentation

S9751: ACCELERATE YOUR CUDA DEVELOPMENT WITH LATEST DEBUGGING AND CODE ANALYSIS DEVELOPER TOOLS Aurelien Chartier & Steve Ulrich, March 19 th , 2019 Debugging Tools Nsight Eclipse Edition CUDA GDB Nsight Visual Studio Steve Ulrich CUDA


slide-1
SLIDE 1

Aurelien Chartier & Steve Ulrich, March 19th, 2019

S9751:

ACCELERATE YOUR CUDA DEVELOPMENT WITH LATEST DEBUGGING AND CODE ANALYSIS DEVELOPER TOOLS

slide-2
SLIDE 2

2

AGENDA

Debugging Tools Nsight Eclipse Edition CUDA GDB Nsight Visual Studio CUDA Memcheck Debug API Steve Ulrich Sanitizer API Aurelien Chartier

slide-3
SLIDE 3

Steve Ulrich, March 19th, 2019

DEBUGGING TOOLS

slide-4
SLIDE 4

4

WHO ARE YOU?

Have you …

  • Ever written a CUDA application?
  • Developed in CUDA for…
  • … over a year?
  • … over five years?
  • Windows or Linux?
  • Desktop or Embedded?
slide-5
SLIDE 5

5

WHAT ARE YOUR PREFERENCES?

CLI vs. IDE? How many spaces in a tab? Big Endian vs. Little Endian?

slide-6
SLIDE 6

6

DEBUGGING STYLES

Before using a debugger:

Reboot (“you may have a future in tech support ☺”) Google the error message Ask for help printf() assert()

And eventually…

Break out the debugger

slide-7
SLIDE 7

7

NVIDIA DEBUGGER PORTFOLIO

IDE Tools Nsight Eclipse Edition Nsight Visual Studio CLI Tools CUDA GDB CUDA MEMCHECK Development Libraries CUDA Debug API

slide-8
SLIDE 8

8

NSIGHT ECLIPSE

  • Install Nsight Eclipse plugins in your own Eclipse environment
  • Supported in Eclipse version 4.7/4.8/4.9 for CUDA 10.1 (4.10

is in validation)

  • Full featured IDE to edit, build and debug CUDA applications
  • NVCC build integration to cross compile for various target

platforms(x86/L4T/Drive Linux/Drive QNX).

  • Debugger - Seamless and simultaneous debugging of both CPU

and GPU code using CUDA GDB

  • New with 10.1 - Docker Support

Eclipse IDE for CUDA

slide-9
SLIDE 9

9

NSIGHT ECLIPSE EDITION

  • Use the Drive OS Docker images from NVIDIA GPU Cloud to build the CUDA

projects using Nsight EE.

  • Nsight EE plugins supports mounting the project source files from the host to

docker container and build the projects using CUDA toolkit/toolchains available inside docker container. No CUDA toolkit installation is required on the host.

  • You can choose the docker image to use when creating the project or in

preference page.

  • Remote debug applications from Nsight EE using cuda-gdb inside docker

container..

Docker Support

slide-10
SLIDE 10

10

CUDA-GDB

Command line source and assembly (SASS) level debugger Simultaneous CPU and GPU debugging Inspect and modify memory, register, variable state Control program execution Support for multiple GPUs, multiple contexts, multiple kernels

slide-11
SLIDE 11

11

CUDA-GDB

slide-12
SLIDE 12

12

EVER HAD A CRASH AND WISHED YOU’D BEEN DEBUGGING?

$ CUDA_DEVICE_WAITS_ON_EXCEPTION=1

slide-13
SLIDE 13

13

EVER HAD A CRASH AND WISHED YOU’D BEEN DEBUGGING?

$ CUDA_DEVICE_WAITS_ON_EXCEPTION=1

slide-14
SLIDE 14

14

WHAT ABOUT CORE DUMPS?

$ CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1

slide-15
SLIDE 15

15

WHAT ABOUT CORE DUMPS?

$ CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1

slide-16
SLIDE 16

16

NSIGHT VISUAL STUDIO

Native CUDA C/C++ GPU Debugging Source-correlated assembly debugging (SASS / PTX / SASS+PTX) Data breakpoints for CUDA C/C++ code Expressions in Locals, Watch and Conditionals

Visual Studio IDE for CUDA

slide-17
SLIDE 17

17

NSIGHT VISUAL STUDIO

slide-18
SLIDE 18

18

CUDA-MEMCHECK

Four tools in One:

Memory Checker Race Condition Checker Initialization Checker Synchronization Checker

slide-19
SLIDE 19

19

CUDA DEBUG API

ABI Support Exception Reporting Attach and Detach Runtime control State Inspection

Enable 3rd-party Debuggers

slide-20
SLIDE 20

20

WHERE TO GET THE TOOLS?

https://developer.nvidia.com/tools-overview

slide-21
SLIDE 21

Aurelien Chartier, March 19th, 2019

CODE ANALYSIS – SANITIZER API

slide-22
SLIDE 22

22

EVER…

  • Used cuda-memcheck but needed more control?
  • Needed a tool that let you analyze memory access patterns in your code?
  • Wished for a PIN-like binary instrumentation tool for CUDA?
slide-23
SLIDE 23

23

INTRODUCING THE SANITIZER API

Released in CUDA 10.1

Host Target Memory allocations Kernel launch Device code Stream sync Callback host function Callback device function

Sanitizer callback API Sanitizer patching API

slide-24
SLIDE 24

24

SANITIZER API USAGE

static void ApiTrackerCallback( void* userdata, Sanitizer_CallbackDomain domain, Sanitizer_CallbackId cbid, const void* cbdata) { auto* pCallbackData = (Sanitizer_CallbackData*)cbdata; auto returnValue = *(cudaError_t*)pCallbackData->functionReturnValue; std::cout << "API call to " << pCallbackData->functionName <<" (return code " << returnValue << ")" << std::endl; }

Callback API

  • 1. Write a callback function using Sanitizer_CallbackFunc function signature
slide-25
SLIDE 25

25

SANITIZER API USAGE

  • 2. Use sanitizerSubscribe to create a sanitizer subscriber
  • 3. Use one of the sanitizerEnableCallback functions to enable/disable callbacks

As in the CUPTI API, only one subscriber can be active at any point.

Callback API

sanitizerSubscribe(&handle, ApiTrackerCallback, userdata); sanitizerEnableDomain(1, handle, SANITIZER_CB_DOMAIN_RUNTIME_API);

slide-26
SLIDE 26

26

API TRACKER

Memory allocations Kernel launch Device code Stream sync ApiTrackerCallback Target

slide-27
SLIDE 27

27

API TRACKER

slide-28
SLIDE 28

28

SANITIZER API USAGE

  • 1. Write patches as __device__ functions using the SanitizerCallback function signature

Patching API: device side

extern "C" __device__ __noinline__ SanitizerPatchResult MemoryAccessCallback( void* userdata, uint64_t pc, void* ptr, uint32_t accessSize, uint32_t flags) { auto* pTracker = (MemoryAccessTracker*)userdata; uint32_t old = atomicAdd(&(pTracker->currentEntry), 1); MemoryAccess& access = pTracker->accesses[old]; access.address = (uint64_t)(uintptr_t)ptr; access.accessSize = accessSize; access.flags = flags; return SANITIZER_PATCH_SUCCESS; }

slide-29
SLIDE 29

29

SANITIZER API USAGE

  • 2. Compile patches in a cubin or fatbin using new PTXAS option –-compile-as-tools-

patch

Patching API: device side

$ nvcc --cubin --keep-device-functions –Xptxas –-compile-as-tools-patch MemoryTrackerPatches.cu –o MemoryTrackerPatches.cubin

slide-30
SLIDE 30

30

SANITIZER API USAGE

  • 1. Load patches in a CUDA context using sanitizerAddPatches
  • 2. Mark which instructions should be patched using sanitizerPatchInstructions
  • 3. Once all instructions have been marked, use sanitizerPatchModule to patch a given

module

  • 4. Use sanitizerSetCallbackData to specify user data passed to callbacks

Patching API: host side

sanitizerAddPatchesFromFile("MemoryTrackerPatches.cubin", ctx); sanitizerPatchInstructions(SANITIZER_INSTRUCTION_MEMORY_ACCESS, module, "MemoryAccessCallback"); sanitizerPatchModule(pModuleData->module); sanitizerSetCallbackData(stream, userdata);

slide-31
SLIDE 31

31

MEMORY TRACKER

Module load Kernel launch Device code Stream sync

  • 1. Load patches
  • 2. Mark memory accesses to

be patched

  • 3. Patch module

MemoryAccessCallback Target Host

  • 4. Set userdata

Retrieve userdata

slide-32
SLIDE 32

32

MEMORY TRACKER

slide-33
SLIDE 33

33

WHERE TO GET THE TOOL

  • Released in CUDA 10.1 under extras/Sanitizer folder
  • Support for Windows, Linux (x64 only), OSX
  • Support for Maxwell and newer GPUs
  • Missing features:
  • CDP (partial support)
  • CUDA graphs
  • Atomics on short types
  • Get help with either:
  • devtools-support@nvidia.com
  • https://devtalk.nvidia.com/default/board/373/compute-sanitizer-api/
slide-34
SLIDE 34

34

SANITIZER SAMPLES

Find the full samples code for the ApiTracker and MemoryTracker at: https://github.com/NVIDIA/compute-sanitizer-samples

slide-35
SLIDE 35

35

DEVELOPER TOOLS AT GTC19

Talks: S9751: Accelerate Your CUDA Development with Latest Debugging and Code Analysis Developer Tools, Tue @9am S9866 - Optimizing Facebook AI Workloads for NVIDIA GPUs, Tue @9am S9345: CUDA Kernel Profiling using NVIDIA Nsight Compute, Tue @1pm S9661: Nsight Graphics - DXR/Vulkan Profiling/Vulkan Raytracing, Wed @10am S9503: Using Nsight Tools to Optimize the NAMD Molecular Dynamics Simulation Program, Wed @1pm Hands-on labs: L9102: Jetson Developer Tools Training Lab, Mon @9am, 11:30am L9124: Debugging and optimizing CUDA applications with Nsight products on Linux training lab, Tue @8am, 10am Connect with the Experts (where DevTools will be available): CE9123: CUDA & Graphics Developer Tools, Tue @2pm, Wed @3pm CE9137: Jetson Embedded Platform, Tue @12pm, 5pm, Wed @1pm, 4pm, Thu @12pm Podium: Demos of DevTools products on Linux, DRIVE AGX & Jetson AGX at the showfloor Tue @12pm – 7pm Wed @12pm – 7pm Thu @11am – 2pm

slide-36
SLIDE 36