ROBUST SOFTWARE DEVELOPMENT: BUG PREVENTION AND ISOLATION – Erika Dignam and Ross Cunniff



SLIDE 1

April 4-7, 2016 | Silicon Valley

Erika Dignam and Ross Cunniff 04 April 2016

ROBUST SOFTWARE DEVELOPMENT

BUG PREVENTION AND ISOLATION

SLIDE 2

ABOUT US

Ross Cunniff – Senior Software Engineer and NVIDIA SPEC representative. 15-year NVIDIA employee with over 30 years of computer engineering experience.
Erika Dignam – Technical Program Manager and bug triager. Studied computer arts; at NVIDIA for 9 years.

4/25/2016

SLIDE 3

STRUCTURE

Bug types | Triage and Tools | Recap | Process Details | Bookkeeping | Prevention and Benchmarking

SLIDE 4

BUG TYPES

Crash or TDR | Corruption | Performance | SLI Scaling


SLIDE 5

TOOLS AND TRIAGE

Traces – All bug types

What is a trace? Intercepts calls between application and driver | Records to a file

  • Apitrace (DX and OpenGL) - http://apitrace.github.io/
  • Pass along .trace file – Replay, performance info, and dump API stream
  • Simple to use - copy <API>.dll to executable location
  • Caveats - Long reproductions mean large files | Tracing tools don’t always capture | Some apps are not tracing-friendly out of the box
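The interception idea above can be illustrated with a toy sketch (Python here for brevity; real tracers such as apitrace do this at the DLL level, and `TracingProxy`/`FakeDriver` are invented names for illustration): every call is recorded to the "trace file" before being forwarded to the real implementation.

```python
import io
import json

class TracingProxy:
    """Toy interceptor: log every call on `target` before forwarding it."""
    def __init__(self, target, logfile):
        self._target = target
        self._log = logfile

    def __getattr__(self, name):
        real = getattr(self._target, name)
        def wrapper(*args, **kwargs):
            # Record the call to the "trace file" first, then forward it.
            self._log.write(json.dumps({"call": name, "args": args}) + "\n")
            return real(*args, **kwargs)
        return wrapper

class FakeDriver:
    """Stand-in for the real driver-side API implementation."""
    def draw(self, n):
        return n * 2

log = io.StringIO()            # stands in for file.trace
drv = TracingProxy(FakeDriver(), log)
result = drv.draw(21)          # forwarded to the driver *and* recorded
```

Replaying the log later is what lets a third party reproduce the exact API stream without the original application.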


[Diagram: APP ↔ apitrace ↔ NV Driver, with apitrace writing file.trace]

SLIDE 6

TOOLS AND TRIAGE

Traces

More Tracing tools

  • GLIntercept (OpenGL) - https://github.com/dtrebilco/glintercept
  • Useful for error states and other tracing, a little older than apitrace
  • Copy opengl32.dll and gliConfig.ini to the executable’s folder
  • Swapping in the DebugContext.ini config file can give very helpful information, for example on issues with SLI scaling

EXAMPLE:

  OpenGL: Performance(Medium) 131234: SLI performance warning: SLI AFR copy and synchronization for texture mipmaps (42)


SLIDE 7

TOOLS AND TRIAGE

Crashes/TDR

Dump files

  • Mini dump - Always helpful; you can simply right-click the process in Task Manager or Process Explorer and select “Dump to File”

  • Full dump - Better, but larger
  • https://msdn.microsoft.com/en-us/library/windows/desktop/bb787181(v=vs.85).aspx

TDR – Timeout Detection and Recovery

  • Increase the TDR delay – what are the results then?
  • https://msdn.microsoft.com/en-us/library/windows/hardware/ff569918(v=vs.85).aspx
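Per the MSDN TDR registry keys page linked above, the delay is controlled by the `TdrDelay` value under the GraphicsDrivers key. A sketch of a .reg file raising it (the 10-second value is only an example, not a recommendation; a reboot is required for it to take effect):

```
Windows Registry Editor Version 5.00

; Raise the GPU timeout from the default 2 seconds to 10 seconds
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000000a
```

If the hang disappears with a longer delay, the work is likely legitimate but long-running; if it still TDRs, suspect a genuine GPU hang.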


SLIDE 8

TOOLS AND TRIAGE

CPU Profilers - Performance

Intel VTune

  • In-depth perf analysis, finer-tuned control, filters noise | Needs a license, not free

  • https://software.intel.com/en-us/intel-vtune-amplifier-xe

AMD CodeAnalyst

  • Simple, free, runs on both Intel and AMD CPUs | Less robust than VTune; no longer supported
  • http://developer.amd.com/tools-and-sdks/archive/amd-codeanalyst-performance-analyzer/

App bound? Driver bound? GPU bound? Which performance paths are taken?
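The same "where does the time go?" question can be asked of any CPU profiler. As a stand-in for VTune or CodeAnalyst, here is a minimal sketch using Python's built-in cProfile (the workload function is invented for illustration):

```python
import cProfile
import io
import pstats

def simulate_app_work():
    # Hypothetical hot spot: the profiler should attribute most time here.
    return sum(i * i for i in range(200_000))

def main():
    for _ in range(5):
        simulate_app_work()

prof = cProfile.Profile()
prof.runcall(main)

# Sort by cumulative time so callers and their hot callees both surface.
buf = io.StringIO()
stats = pstats.Stats(prof, stream=buf).sort_stats("cumulative")
stats.print_stats(10)
report = buf.getvalue()
```

If the hot entries are all in your own code, the app is CPU-bound in the application; if time pools in API calls, suspect the driver or GPU side.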


SLIDE 9

TOOLS AND TRIAGE

Performance/Resources

Process Explorer

  • Free quick-overview tool - Check loaded .dlls; see load on resources, memory leaks, GPU or CPU bound

  • https://technet.microsoft.com/en-us/sysinternals/processexplorer.aspx

GPUview

  • Free Windows tool included with the Windows Performance Toolkit (WPT)
  • https://graphics.stanford.edu/~mdfisher/GPUView.html
  • https://developer.nvidia.com/content/are-you-running-out-video-memory-detecting-video-memory-overcommitment-using-gpuview


SLIDE 10

PROCESS EXPLORER

SLIDE 11

TOOLS AND TRIAGE

Tools

gDEBugger

  • http://www.gremedy.com/
  • Free OpenGL debugging tool
  • Useful for data gathering, good for tracking state changes; dynamically look at the stream
  • EXAMPLE: Polygon count information from models – a performance bug was root-caused to one mode of the model sending significantly more polygons into the OpenGL pipeline.


SLIDE 12

NVIDIA TOOLS AND LOGS

NVIDIA OpenGL driver error codes

ExternSwak (Swak = Swiss Army Knife)

  • NVIDIA tool used to capture detailed system information
  • Only available under NDA, on the partners site

WSAppNotifier.exe – Profiles

  • For application profile problems, tells you which profiles are running/applied
  • You may have to launch the app twice
  • NDA only, on partner site


SLIDE 13

WSAPPNOTIFIER.EXE

SLIDE 14

TRIAGE/DEBUGGING

Profiles – Things to Try

Changing Global Profiles

  • Workstation App - Dynamic Streaming | Turns off some optimized driver paths
  • 3D App – Game Development | Simulates a GeForce
  • SLI Aware Application | SLI performance testing
  • Threaded optimization = OFF | In Profile settings

Notebooks

  • Try setting NVIDIA GPU to default | In profiles or SBIOS if available


SLIDE 15

RECAP

What tools for what bugs

Crash or TDR

  • TDR Delay RegKeys | Collect dump files | Trace | GPUView

Corruption

  • Trace | Changing profiles

Performance

  • Changing profiles | apitrace | VTune/CodeAnalyst

SLI Scaling

  • Debug Context from GLIntercept


SLIDE 16

TRIAGE/DEBUGGING

Vulkan

https://www.khronos.org/vulkan/

  • New API that puts the application developer in control – the app developer manages GPU memory and resources

Built-in Validation Layer – catches API violations
SDK - https://vulkan.lunarg.com/signin | Needs an account
Demos

  • https://github.com/SaschaWillems/Vulkan | https://github.com/McNopper/Vulkan

RenderDoc | Graphics debugger - https://github.com/baldurk/renderdoc


SLIDE 17

TRIAGE/DEBUGGING

Vulkan

Vulkan Talks

  • S6818 – Vulkan and NVIDIA: The Essentials
  • S6138 – GPU Driven Rendering in Vulkan and OpenGL
  • S6133 – VKCPP: A C++ Layer on Top of Vulkan
  • Three Hangouts, Monday and Tuesday afternoons

Resources

  • https://github.com/KhronosGroup/Khronosdotorg/blob/master/api/vulkan/resources.md


SLIDE 18

BUG PROCESS

Normal External Bug Flow

  • External Bug -> QA -> Triage -> Engineering

Accounts to file bugs

  • partners.nvidia.com – Needs NDA
  • developer.nvidia.com/join
  • Access to early release drivers and NVIDIA tools, report bugs!


SLIDE 19

BUG PROCESS

Overview

NVBUGS – start by filing as a software issue. It is important to have basic reproduction steps:

  • OS, driver, card, application and version if applicable, system information, frequency
  • Severity and impact for you
  • Type - Performance, Crash, Corruption, TDR

Regression information is very helpful if it can be provided


SLIDE 20

SLIDE 21

TOOLS AND TRIAGE

Overview

Simple app/license

  • A trace would be great, no license/app/model needed
  • Avoids delays, very useful when a third party has a repro others can’t get
  • If not possible, then models/scenes/app/license/demo will be needed – Time sink

What to attach to bugs

  • Logs, traces, performance snapshots, dump files, videos, event logs
  • System information via externSwak (NVTOOL)


SLIDE 22

WHAT HAPPENS TO YOUR BUG

ODE = Optimized Driver for Enterprise

  • Long-lived branch
  • Multiple releases or dot versions per branch
  • For production use and certification

QNF = Quadro New Feature

  • Short-lived branch
  • One release per branch
  • Release driver for testing new features and fixes

Fixes -> Driver | Branches

WHQL = Windows Hardware Quality Labs Testing and Signed

SLIDE 23

PREVENTION

What NVIDIA does

ATP and QA

  • We have QA teams with application experts around the world testing applications, GPUs, OSs, and drivers

  • ATP is our automated test harness for further testing to cover more configurations

DVS

  • Driver Validation System. Automated and run with every single code change – 10 million images/tests per day

German Test Lab and Global Test Lab

  • 24/7 automated testing of professional applications and features


SLIDE 24

PREVENTION

Best Process

We want benchmarks and test suites!

  • Early detection of bugs and issues
  • Early detection of performance regressions
  • Get involved in industry-standard benchmarks, for example SPEC

Over to Ross to discuss Performance Benchmark creation!


SLIDE 25

PERFORMANCE BENCHMARKING

A key to high-quality user experience

SLIDE 26

“WHEN YOU CANNOT MEASURE IT…

…your knowledge is of a meagre and unsatisfactory kind” – Lord Kelvin

Anything a computer can do, a human can do. Given enough time… Computers are accelerators. Without good performance, user experience is bad. Benchmarking is the technique to ensure repeatable performance

SLIDE 27

WHAT MAKES A BENCHMARK?

Originally a surveying mark which provided a repeatable reference for placing a leveling rod. Key attributes: #1: repeatable #2: accurate #3: reportable

SLIDE 28

UNITS ARE NOT BENCHMARKS

Many common units exist: MIPS, FLOPS, FPS, LPM, … Just because you can run a test and get units out, does not make your test a benchmark Quiz: if your test returns a result 60 FPS, what might you be measuring? What about 30, 20, 15, … FPS?
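One likely answer to the quiz is sync-to-vblank: on a 60 Hz display with vsync enabled, frame times are rounded up to whole refresh intervals, so the only observable rates are 60/N FPS. A sketch under that assumption (`vsync_fps` is an invented helper, not a real API):

```python
import math

def vsync_fps(raw_frame_ms, refresh_hz=60.0):
    """Observable FPS when every frame must wait for the next vblank."""
    interval_ms = 1000.0 / refresh_hz
    # A frame taking any fraction of N refresh intervals is displayed
    # only after N of them have passed.
    intervals = max(1, math.ceil(raw_frame_ms / interval_ms))
    return refresh_hz / intervals

# A 17 ms frame (just missing one vblank) collapses straight to 30 FPS:
print(vsync_fps(16.0), vsync_fps(17.0), vsync_fps(40.0))  # 60.0 30.0 20.0
```

Seeing exactly 60, 30, 20, or 15 FPS is therefore a hint that you are measuring the display, not the workload.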

SLIDE 29

REPEATABILITY

First principle: make sure the same operations are benchmarked on all configs. Most benchmarks exhibit some randomness in performance. The causes are many; some examples:

  • Non-deterministic operating system process/thread scheduler
  • Disk I/O – variable times to reach a sector with rotational media; variable wear leveling for solid-state media
  • Build-to-build variation due to cache layout changes
  • Virus scan cycles

Rule of thumb: a variation of up to 5% is generally acceptable (if higher, use multiple runs and rely on regression toward the mean)
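That rule of thumb is easy to automate. A sketch (the run data and helper names are invented for illustration):

```python
import statistics

def variation(samples):
    """Relative spread of benchmark scores: (max - min) / mean."""
    return (max(samples) - min(samples)) / statistics.mean(samples)

def stable_score(samples, threshold=0.05):
    """Accept the mean if runs agree within the threshold; otherwise
    fall back to the median, which resists outlier runs."""
    if variation(samples) <= threshold:
        return statistics.mean(samples)
    return statistics.median(samples)

# Example: five runs of a hypothetical benchmark, in frames per second.
runs = [118.2, 119.0, 118.6, 118.9, 118.4]
score = stable_score(runs)
```

A run set that repeatedly exceeds the threshold is a signal to hunt for one of the noise sources listed above before trusting any number.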

SLIDE 30

ACCURACY

“Do these numbers reflect reality?” Always verify assumptions.

  • Do you expect your benchmark to be GPU-limited? Then verify on GPUs with different performance levels.
  • Faster is not always better – ensure work is actually being done that reflects end-user experience.
  • A good benchmark has a means to verify correct operation.
  • Make sure the key portion of your benchmark runs long enough that you can actually measure its performance, not virtual memory subsystem latency or other irrelevant metrics.
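The "verify correct operation" point can be sketched as a benchmark that checks a golden checksum, so a run that skipped the work (or an over-aggressive optimization) fails instead of reporting a bogus speedup. All names here are illustrative:

```python
import time

def run_workload():
    """The work under test; returns a checksum so correctness is verifiable."""
    acc = 0
    for i in range(1, 100_001):
        acc = (acc + i * i) % 1_000_003
    return acc

# Golden value, captured once on a known-good build.
EXPECTED = run_workload()

def benchmark(repeats=3):
    start = time.perf_counter()
    for _ in range(repeats):
        # A "faster" run that produced the wrong checksum would be a
        # measurement error, not an improvement.
        assert run_workload() == EXPECTED, "benchmark did no valid work"
    return (time.perf_counter() - start) / repeats   # seconds per iteration

per_iter = benchmark()
```

Repeating the workload several times also keeps the measured region long enough to dominate timer resolution and startup noise.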

SLIDE 31

NOTES ON TUNING

If you are not measuring properly, you might not be able to make improvements.

  • 60Hz example – sync-to-vblank (default on NVIDIA)
  • Bottleneck shift: a graphics benchmark may start CPU/API-limited, then after tuning move to being limited by GPU vertex processing – or even change to being limited by pixel processing as window sizes change or as workloads shift.
  • Constantly re-evaluate benchmark assumptions when tuning.

SLIDE 32

REPORTABILITY

Your benchmark should yield a metric – FPS, LPM, etc. – that is easily collected for further processing.

  • Output in standard formats – CSV, JSON, XML – many tools exist to format and compare.
  • If your benchmark is repeatable, accurate, and has good reports, you should be able to track performance over multiple builds/revisions of your application.
  • You will also be able to track performance over other changing variables: OS, CPU, GPU driver, memory size, …
  • Important: select a reference score, and keep it constant if at all possible – avoid normalization of deviance.
  • If weighting multiple subtests, consider the relative importance of each subtest to your user community. Use the geometric mean where appropriate.
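The weighted-subtest advice can be sketched as follows (subtest names and weights are invented; the formula is the standard weighted geometric mean):

```python
import json
import math

def composite_score(subtests, weights):
    """Weighted geometric mean: exp(sum(w_i * ln(s_i)) / sum(w_i)).
    Unlike an arithmetic mean, no single subtest can dominate the total."""
    total_w = sum(weights.values())
    log_sum = sum(weights[name] * math.log(score)
                  for name, score in subtests.items())
    return math.exp(log_sum / total_w)

# Hypothetical subtest scores (higher is better) and importance weights.
subtests = {"viewport": 40.0, "render": 10.0, "simulate": 20.0}
weights  = {"viewport": 0.5,  "render": 0.2,  "simulate": 0.3}

score = composite_score(subtests, weights)
# Emit a machine-readable report for tracking across builds.
report = json.dumps({"composite": round(score, 2), "subtests": subtests})
```

Because the geometric mean is bounded by the best and worst subtests, a regression in any one of them always shows up in the composite.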
SLIDE 33

EXAMPLE BENCHMARKS – SPEC APC

Clockwise from right:

  • 3dsmax 2015
  • PTC Creo 3
  • Maya 2012
  • SNX 8.5
  • Solidworks 2015
SLIDE 34

EXAMPLE BENCHMARKS - SPEC VIEWPERF 12

Clockwise from right: Catia-04 Creo-01 Maya-04 Medical-01 Showcase-01 SNX-02 SW-03

SLIDE 35

SNX-02 RESULTS SNAPSHOT

Generated automatically from XML produced by viewset

SLIDE 36

SNX-02 DETAILS

[Charts: results for Test 1, Test 2, Test 5, Test 6, Test 8, and Test 10]

SLIDE 37

MORE SNX-02 DETAILS

Note the varying weights – the sum of all is 100%
SLIDE 38

MORE INFORMATION

SPEC benchmarking group – http://www.spec.org SPEC Graphics and Workstation Performance Group (GWPG): http://www.spec.org/gwpg/publish/gpcfaqs.html Contribute to SPEC GWPG: http://www.spec.org/gwpg/publish/develop_bench.html

“SPEC's Graphics and Workstation Performance Group (SPEC/GWPG) is seeking ISVs, software user groups, publication editors and testing lab directors to help develop and maintain standardized benchmarks based on professional graphics and workstation applications. Organizations or individuals can submit existing benchmarks for consideration by a project group or help the group develop an entirely new benchmark.”

SLIDE 39

BENCHMARK CALL TO ACTION

Benchmark what matters to your users! Create good benchmarks Help us help you! – share your benchmark with NVIDIA and we will put it in our driver regression automation suite to prevent performance bugs Consider sharing application benchmarks with SPEC

SLIDE 40

CALL TO ACTION

Help us help you! Create good unit tests and benchmarks Make use of available software development and analysis tools Be systematic in development and testing Share your unit tests and benchmarks with us (especially if there is a problem) Be clear and concise in your bug reports

SLIDE 41

CONTACT US

Ross Cunniff – rcunniff@nvidia.com Erika Dignam – edignam@nvidia.com

SLIDE 42

April 4-7, 2016 | Silicon Valley

QUESTIONS?

JOIN THE NVIDIA DEVELOPER PROGRAM AT developer.nvidia.com/join

SLIDE 43

SLIDE 44

TRIAGE/DEBUGGING

Things to Look At

Looking at dumps

  • Use WinDbg (in the Windows SDK) and load the dump file
  • Check the call stack; see who’s on it

Performance

  • Check where time is spent in your perf logs
