ROBUST SOFTWARE DEVELOPMENT: BUG PREVENTION AND ISOLATION
Erika Dignam and Ross Cunniff



  1. April 4-7, 2016 | Silicon Valley
     ROBUST SOFTWARE DEVELOPMENT
     BUG PREVENTION AND ISOLATION
     Erika Dignam and Ross Cunniff
     04 April 2016

  2. ABOUT US
     Ross Cunniff
     • Senior Software Engineer and NVIDIA SPEC representative.
     • 15-year NVIDIA employee.
     • Over 30 years of computer engineering experience.
     Erika Dignam
     • Technical Program Manager and Bug Triager.
     • Studied computer arts.
     • At NVIDIA for 9 years.
     4/25/2016

  3. STRUCTURE
     • Bug types | Triage and Tools | Recap
     • Process Details | Bookkeeping
     • Prevention and Benchmarking

  4. BUG TYPES
     • Crash or TDR
     • Corruption
     • Performance
     • SLI Scaling

  5. TOOLS AND TRIAGE
     Traces - All bug types
     What is a trace?
     • Intercepts calls between application and driver | Records to a file
     • (Diagram: APP -> apitrace -> NV Driver, with the calls recorded to file.trace)
     Apitrace (DX and OpenGL) - http://apitrace.github.io/
     • Pass along the .trace file - Replay, performance info, and dump of the API stream
     • Simple to use - copy <API>.dll to the executable location
     • Caveats - Long reproductions mean large files | Tracing tools don’t always capture | Some apps are not tracing friendly out of the box

  6. TOOLS AND TRIAGE
     Traces - More tracing tools
     GLIntercept (OpenGL) - https://github.com/dtrebilco/glintercept
     • Useful for error states and other tracing; a little older than apitrace
     • Copy opengl32.dll and gliConfig.ini to the executable's folder
     • Swapping in the DebugContext.ini config file can give very helpful information, for example on issues with SLI scaling
     • EXAMPLE: OpenGL: Performance(Medium) 131234: SLI performance warning: SLI AFR copy and synchronization for texture mipmaps (42)

  7. TOOLS AND TRIAGE
     Crashes/TDR
     Dump files
     • Mini dump - Always helpful; simply right-click the process in Task Manager or Process Explorer and select “Dump to File”
     • Full dump - Better, but larger
     • https://msdn.microsoft.com/en-us/library/windows/desktop/bb787181(v=vs.85).aspx
     TDR - Timeout Detection and Recovery
     • Increase the TDR delay; what are the results then?
     • https://msdn.microsoft.com/en-us/library/windows/hardware/ff569918(v=vs.85).aspx
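One way to try the "increase the TDR delay" step is to import a .reg file that sets the `TdrDelay` value described in Microsoft's TDR registry key documentation (a REG_DWORD in seconds, with a documented default of 2). The sketch below only builds the file text; the helper name `tdr_delay_reg_file` and the 10-second value are illustrative assumptions, not something the slides prescribe. Importing the file needs administrator rights, and the change typically takes effect after a reboot.

```python
# Registry key that holds the TDR settings, per Microsoft's documentation.
TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_delay_reg_file(delay_seconds: int) -> str:
    """Return the text of a .reg file that sets TdrDelay to delay_seconds.

    Hypothetical helper for illustration: it writes nothing to the registry,
    it only produces text that can be saved as e.g. tdr_delay.reg.
    """
    if not 0 < delay_seconds <= 0xFFFFFFFF:
        raise ValueError("delay_seconds must be a positive 32-bit value")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{TDR_KEY}]",
        # .reg files store DWORD values as eight hexadecimal digits.
        f'"TdrDelay"=dword:{delay_seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # 10 seconds is an arbitrary example value for a triage experiment.
    print(tdr_delay_reg_file(10))
```

If the hang disappears with a longer delay, that suggests a long-running GPU workload rather than a true hang, which is useful information to include in the bug.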

  8. TOOLS AND TRIAGE
     CPU Profilers - Performance
     Intel VTune
     • In-depth perf analysis, finer-tuned control, filters noise | Needs a license, not free
     • https://software.intel.com/en-us/intel-vtune-amplifier-xe
     AMD CodeAnalyst
     • Simple, free, runs on both Intel and AMD CPUs | Less robust than VTune, no longer supported
     • http://developer.amd.com/tools-and-sdks/archive/amd-codeanalyst-performance-analyzer/
     App bound? Driver bound? GPU bound? Performance paths taken

  9. TOOLS AND TRIAGE
     Performance/Resources
     Process Explorer
     • Free quick-overview tool - Check loaded .dlls; can see load on resources, memory leaks, GPU or CPU bound
     • https://technet.microsoft.com/en-us/sysinternals/processexplorer.aspx
     GPUView
     • Free Windows tool included with the Windows Performance Toolkit (WPT)
     • https://graphics.stanford.edu/~mdfisher/GPUView.html
     • https://developer.nvidia.com/content/are-you-running-out-video-memory-detecting-video-memory-overcommitment-using-gpuview

  10. PROCESS EXPLORER

  11. TOOLS AND TRIAGE
      Tools
      gDEBugger - http://www.gremedy.com/
      • Free OpenGL debugging tool
      • Useful for data gathering, good for tracking state changes, lets you look at the API stream dynamically
      • EXAMPLE: Polygon count information from models. A performance bug was root-caused to one mode of the model sending significantly more polygons into the OpenGL pipeline.

  12. NVIDIA TOOLS AND LOGS
      NVIDIA OpenGL Driver Error codes
      External Swak = Swiss Army Knife
      • NVIDIA tool used to capture detailed system information
      • Only available under NDA, on the partners site
      WSAppNotifier.exe - Profiles
      • For application profile problems; tells you which profiles are running/applied
      • You may have to launch the app twice
      • NDA only, on the partners site

  13. WSAPPNOTIFIER.EXE

  14. TRIAGE/DEBUGGING
      Profiles - Things to Try
      Changing Global Profiles
      • Workstation App - Dynamic Streaming | Turns off some optimized driver paths
      • 3D App - Game Development | Simulates a GeForce
      • SLI Aware Application | SLI performance testing
      Threaded optimization = OFF | In Profile settings
      Notebooks
      • Try setting NVIDIA GPU to default | In profiles or SBIOS if available

  15. RECAP
      What tools for what bugs
      • Crash or TDR: TDR Delay RegKeys | Collect dump files | Trace | GPUView
      • Corruption: Trace | Changing profiles
      • Performance: Changing profiles | apitrace | VTune/CodeAnalyst
      • SLI Scaling: Debug Context from GLIntercept
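The recap is effectively a lookup table from bug type to first triage steps. A minimal sketch of that table follows; the entries are taken from the slide, while the names `TRIAGE_STEPS` and `first_steps` and the handling of "crash" and "tdr" as aliases are assumptions made for illustration.

```python
from typing import Dict, List

# Bug type -> suggested tools/steps, as listed on the recap slide.
TRIAGE_STEPS: Dict[str, List[str]] = {
    "crash or tdr": ["TDR Delay RegKeys", "Collect dump files", "Trace", "GPUView"],
    "corruption": ["Trace", "Changing profiles"],
    "performance": ["Changing profiles", "apitrace", "VTune/CodeAnalyst"],
    "sli scaling": ["Debug Context from GLIntercept"],
}


def first_steps(bug_type: str) -> List[str]:
    """Return the recap slide's suggestions for a bug type (case-insensitive)."""
    key = bug_type.strip().lower()
    # Convenience aliases (an assumption): the slide groups crashes and TDRs.
    if key in ("crash", "tdr"):
        key = "crash or tdr"
    if key not in TRIAGE_STEPS:
        raise ValueError(f"unknown bug type: {bug_type!r}")
    return list(TRIAGE_STEPS[key])


if __name__ == "__main__":
    for step in first_steps("TDR"):
        print(step)
```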

  16. TRIAGE/DEBUGGING
      Vulkan - https://www.khronos.org/vulkan/
      • New API that puts the application developer in control; the app developer manages GPU memory and resources
      • Built-in Validation Layer - API violations
      • SDK - https://vulkan.lunarg.com/signin | Need account
      • Demos - https://github.com/SaschaWillems/Vulkan | https://github.com/McNopper/Vulkan
      • RenderDoc | Graphics Debugger - https://github.com/baldurk/renderdoc

  17. TRIAGE/DEBUGGING
      Vulkan
      Vulkan Talks
      • S6818 - Vulkan and NVIDIA: The Essentials
      • S6138 - GPU Driven Rendering in Vulkan and OpenGL
      • S6133 - VKCPP: A C++ Layer on Top of Vulkan
      • Three Hangouts, Monday and Tuesday afternoons
      Resources
      • https://github.com/KhronosGroup/Khronosdotorg/blob/master/api/vulkan/resources.md

  18. BUG PROCESS
      Normal External Bug Flow
      External Bug -> QA -> Triage -> Engineering
      Accounts to file bugs
      • partners.nvidia.com - Needs NDA
      • developer.nvidia.com\join
      • Access to early release drivers and NVIDIA tools, report bugs!

  19. BUG PROCESS
      Overview
      NVBUGS
      • Start by filing as a software issue
      • Important to have basic reproduction steps
      • OS, driver, card, application and version if applicable, system information, frequency
      • Severity and impact for you
      • Type - Performance, Crash, Corruption, TDR
      • Regression information is very helpful if it can be provided
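The checklist above can be captured as a small record that flags missing information before a bug is filed. This is a sketch only: the `BugReport` class, its field names, and the `problems` method are hypothetical and do not reflect the actual NVBUGS schema; the list of bug types is the one given on the slide.

```python
from dataclasses import dataclass
from typing import List, Optional

# Bug types named on the slide.
BUG_TYPES = ("Performance", "Crash", "Corruption", "TDR")


@dataclass
class BugReport:
    """Hypothetical container for the details the slide asks reporters to give."""
    repro_steps: List[str]
    os: str
    driver: str
    card: str
    frequency: str          # e.g. "always" or "1 in 10 runs"
    severity: str           # severity and impact for the reporter
    bug_type: str           # one of BUG_TYPES
    application: Optional[str] = None       # application and version, if applicable
    regression_info: Optional[str] = None   # e.g. last known good driver

    def problems(self) -> List[str]:
        """Return a list of gaps; an empty list means the basics are covered."""
        issues = []
        if not self.repro_steps:
            issues.append("no reproduction steps")
        for name in ("os", "driver", "card", "frequency", "severity"):
            if not getattr(self, name).strip():
                issues.append(f"missing {name}")
        if self.bug_type not in BUG_TYPES:
            issues.append(f"bug_type must be one of {BUG_TYPES}")
        return issues


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    report = BugReport(
        repro_steps=["Launch the app", "Open the sample scene", "Rotate the model"],
        os="example OS version",
        driver="",
        card="example GPU",
        frequency="always",
        severity="blocks daily work",
        bug_type="Corruption",
    )
    print(report.problems())
```

Regression information is optional in this sketch because the slide describes it as helpful rather than required.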

  20. 20

  21. TOOLS AND TRIAGE
      Overview
      Simple app/license
      • A trace would be great, no license/app/model needed
      • Avoids delays, very useful when a third party has a repro others can’t get
      • If not possible, then models/scenes/app/license/demo will be needed - Time sink
      What to attach to bugs
      • Logs, traces, performance snapshots, dump files, videos, event logs
      • System information via externSwak (NVTOOL)

  22. WHAT HAPPENS TO YOUR BUG
      Fixes -> Driver | Branches
      ODE = Optimized Driver for Enterprise
      • Long-lived branch
      • Multiple releases or dot versions per branch
      • For production use and certification
      QNF = Quadro New Feature
      • Short-lived branch
      • One release per branch
      • Release driver for testing new features and fixes
      WHQL = Windows Hardware Quality Labs | Testing and Signed

  23. PREVENTION
      What NVIDIA does
      ATP and QA
      • We have QA teams with application experts around the world testing applications, GPUs, OSs, and drivers
      • ATP is our automated test harness for further testing to cover more configurations
      DVS
      • Driver Validation System. Automated and run with every single code change. 10 million images/tests per day
      German Test Lab and Global Test Lab
      • 24/7 automated testing of professional applications and features

  24. PREVENTION
      Best Process
      We want benchmarks and test suites!
      • Early detection of bugs and issues
      • Early detection of performance regressions
      • Get involved in industry-standard benchmarks, for example SPEC
      Over to Ross to discuss Performance Benchmark creation!

  25. PERFORMANCE BENCHMARKING
      A key to high-quality user experience

  26. “WHEN YOU CANNOT MEASURE IT…
      …your knowledge is of a meagre and unsatisfactory kind” - Lord Kelvin
      • Anything a computer can do, a human can do. Given enough time …
      • Computers are accelerators.
      • Without good performance, user experience is bad.
      • Benchmarking is the technique to ensure repeatable performance

  27. WHAT MAKES A BENCHMARK?
      Originally a surveying mark which provided a repeatable reference for placing a leveling rod.
      Key attributes:
      • #1: repeatable
      • #2: accurate
      • #3: reportable

  28. UNITS ARE NOT BENCHMARKS
      • Many common units exist: MIPS, FLOPS, FPS, LPM, …
      • Just because you can run a test and get units out does not make your test a benchmark
      • Quiz: if your test returns a result of 60 FPS, what might you be measuring? What about 30, 20, 15, … FPS?
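The slide does not give the quiz answer, but one likely reading is that 60, 30, 20, and 15 FPS are the rates a 60 Hz display produces when vertical sync is on: each frame waits for the next refresh, so the result measures the display's refresh interval rather than the renderer. The sketch below assumes simple double-buffered vsync with no triple buffering or adaptive sync; the function name `vsync_fps` is illustrative.

```python
import math


def vsync_fps(render_ms: float, refresh_hz: float = 60.0) -> float:
    """Observed frame rate when every frame waits for the next vertical refresh.

    Assumes double-buffered vsync: a frame that takes slightly longer than
    one refresh interval is held until the following refresh.
    """
    if render_ms <= 0 or refresh_hz <= 0:
        raise ValueError("render_ms and refresh_hz must be positive")
    refresh_ms = 1000.0 / refresh_hz
    # Number of whole refresh intervals each frame occupies.
    intervals = math.ceil(render_ms / refresh_ms)
    return refresh_hz / intervals


if __name__ == "__main__":
    for ms in (5.0, 17.0, 34.0, 51.0):
        print(f"{ms:5.1f} ms per frame -> {vsync_fps(ms):.0f} FPS")
```

Under these assumptions a 5 ms frame and a 16 ms frame both report 60 FPS, which is why a raw FPS number with vsync enabled says little about rendering cost.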

  29. REPEATABILITY
      First principle: make sure the same operations are benchmarked on all configs
      Most benchmarks exhibit some randomness in performance. The causes are many; some examples:
      • Non-deterministic operating system process / thread scheduler
      • Disk I/O - variable times to reach a sector with rotational media; variable wear leveling for solid state media
      • Build-to-build variation due to cache layout changes
      • Virus scan cycles
      Rule of thumb: a variation of up to 5% is generally acceptable (if higher, use multiple runs and rely on regression toward the mean)
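The 5% rule of thumb can be checked mechanically over repeated runs. The slide does not define "variation"; the sketch below assumes the coefficient of variation (sample standard deviation divided by the mean), and the names `RunSummary` and `summarize_runs` are illustrative.

```python
import statistics
from typing import NamedTuple, Sequence


class RunSummary(NamedTuple):
    average: float      # mean score across runs
    variation: float    # coefficient of variation (stdev / mean), an assumed metric
    acceptable: bool    # True when variation is within the limit


def summarize_runs(scores: Sequence[float], limit: float = 0.05) -> RunSummary:
    """Summarize repeated benchmark runs against a variation limit (default 5%)."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate variation")
    average = statistics.mean(scores)
    if average <= 0:
        raise ValueError("scores must have a positive mean")
    variation = statistics.stdev(scores) / average
    return RunSummary(average, variation, variation <= limit)


if __name__ == "__main__":
    # Made-up scores for illustration only.
    steady = summarize_runs([100.0, 101.0, 99.0])
    noisy = summarize_runs([100.0, 120.0, 80.0])
    print(steady)
    print(noisy)
```

When a configuration fails the check, the slide's advice applies: run it more times and report the mean rather than any single result.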
