April 4-7, 2016 | Silicon Valley
Erika Dignam and Ross Cunniff 04 April 2016
ROBUST SOFTWARE DEVELOPMENT: BUG PREVENTION AND ISOLATION

ABOUT US
Ross Cunniff: Senior Software Engineer and NVIDIA SPEC representative. 15-year NVIDIA employee. Over 30 years of computer engineering experience.
Erika Dignam: Technical Program Manager and Bug Triager. Studied computer arts. At NVIDIA for 9 years.
4/25/2016
AGENDA
Bug types | Triage and Tools | Process Details | Bookkeeping | Prevention and Benchmarking | Recap
WHAT IS A TRACE?
A trace intercepts calls between the application and the driver and records them to a file. Some apps are not tracing-friendly out of the box.
[Diagram: apitrace sits between the APP and the NV driver, recording calls to file.trace]
MORE TRACING TOOLS
Example: issues with SLI scaling
Example: synchronization for texture mipmaps (42)
DUMP FILES
Use Process Explorer and select “Dump to File”.
TDR = Timeout Detection and Recovery
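When a long GPU workload trips the TDR watchdog (about two seconds by default), Windows resets the driver mid-debug. For debugging sessions only, the documented `TdrDelay` registry value under `GraphicsDrivers` can lengthen the timeout; the 60-second value below is purely illustrative, and a reboot is required for it to take effect:

```reg
Windows Registry Editor Version 5.00

; Lengthen the GPU watchdog timeout to 60 seconds (0x3c) for debugging only.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000003c
```

Remember to restore the default afterward, since a long timeout hides real hangs from end users.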
PROFILERS
Intel VTune (free)
AMD CodeAnalyst (performance-analyzer/)
App bound? Driver bound? GPU bound? Which performance paths are taken?
PROCESS EXPLORER AND GPUVIEW
Process Explorer: spot leaks and whether you are GPU or CPU bound
GPUview: detecting-video-memory-overcommitment-using-gpuview
gDEBugger
Stream a greater amount of polys into the OpenGL pipeline.
NVIDIA OPENGL DRIVER ERROR CODES
External Swak (Swak = Swiss Army Knife)
WSAppNotifier.exe – Profiles
CHANGING GLOBAL PROFILES
Notebooks
BUG TYPES
Crash or TDR
Corruption
Performance
SLI Scaling
VULKAN
https://www.khronos.org/vulkan/ – resources
Built-in validation layers catch API violations
SDK: https://vulkan.lunarg.com/signin (account needed) | Demos
RenderDoc | Graphics debugger: https://github.com/baldurk/renderdoc
VULKAN TALKS AND RESOURCES
Normal External Bug Flow
Accounts to file bugs
NVBUGS
Start by filing the bug as a software issue.
It is important to include basic reproduction steps and the frequency of the issue.
Regression information is very helpful if it can be provided.
WHAT TO ATTACH TO BUGS
A simple app and/or license
DRIVER BRANCHES
ODE = Optimized Driver for Enterprise branch (certification)
QNF = Quadro New Feature (features and fixes)
WHQL = Windows Hardware Quality Labs (testing and signed)
ATP AND QA – and drivers
DVS – images/tests per day
German Test Lab and Global Test Lab
We want benchmarks and test suites!
Over to Ross to discuss Performance Benchmark creation!
Anything a computer can do, a human can do, given enough time. Computers are accelerators. Without good performance, the user experience is bad. Benchmarking is the technique that ensures repeatable performance.
Originally a surveying mark which provided a repeatable reference for placing a leveling rod. Key attributes: #1: repeatable #2: accurate #3: reportable
Many common units exist: MIPS, FLOPS, FPS, LPM, … Just because you can run a test and get units out does not make your test a benchmark. Quiz: if your test returns a result of 60 FPS, what might you be measuring? What about 30, 20, 15, … FPS?
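The quiz has a likely answer: those numbers are exact divisors of a 60 Hz refresh rate, so you may be measuring sync-to-vblank rather than your renderer. A small sketch of the quantization effect (the helper name is ours):

```python
import math

# With sync-to-vblank on a 60 Hz display, a frame that misses one refresh
# waits for the next, so measured FPS snaps to 60/n: 60, 30, 20, 15, 12, ...
def vsync_quantized_fps(refresh_hz, frame_time_s):
    """FPS you will actually measure when every frame waits for vblank."""
    # Number of refresh intervals each frame occupies (at least one).
    intervals = max(1, math.ceil(frame_time_s * refresh_hz))
    return refresh_hz / intervals

# A 16 ms frame still reports 60 FPS; an 18 ms frame drops straight to 30.
```

Seeing only these quantized values in results is a strong hint to disable vsync before trusting the benchmark.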
First principle: make sure the same operations are benchmarked on all configs. Most benchmarks exhibit some randomness in performance. The causes are many; some examples:
- Non-deterministic operating system process/thread scheduler
- Disk I/O: variable times to reach a sector on rotational media; variable wear leveling on solid-state media
- Build-to-build variation due to cache layout changes
- Virus scan cycles
Rule of thumb: a variation of up to 5% is generally acceptable (if higher, use multiple runs and rely on regression toward the mean)
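The multiple-runs rule of thumb can be sketched as a quick check; the run numbers below are hypothetical:

```python
import statistics

def variation(scores):
    """Run-to-run variation: sample standard deviation over the mean."""
    return statistics.stdev(scores) / statistics.mean(scores)

# Hypothetical FPS from five runs of the same workload on one config.
runs = [118.2, 121.5, 119.8, 120.4, 122.1]

# Within the 5% rule of thumb, report the median (resists outlier runs).
score = statistics.median(runs) if variation(runs) <= 0.05 else None
```

If variation stays above 5% even over many runs, fix the noise source first rather than averaging it away.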
“Do these numbers reflect reality?” Always verify assumptions. Do you expect your benchmark to be GPU-limited? Then verify on GPUs with different performance levels. Faster is not always better: ensure work is actually being done that reflects end-user experience. A good benchmark has a means to verify correct operation. Make sure the key portion of your benchmark runs long enough that you can actually measure its performance, not virtual memory subsystem latency or other irrelevant metrics.
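One way to satisfy the run-long-enough rule is to time over a minimum window after a warm-up pass; this helper is our illustration of the idea, not a tool from the talk:

```python
import time

def timed_rate(work, min_seconds=1.0):
    """Iterations per second of `work`, measured over a window long enough
    to swamp timer resolution and one-time startup costs."""
    work()                       # warm-up: fault in code, caches, and pages
    iterations = 0
    start = time.perf_counter()
    while time.perf_counter() - start < min_seconds:
        work()
        iterations += 1
    elapsed = time.perf_counter() - start
    return iterations / elapsed

# Short window only for illustration; real runs should last seconds, not ms.
rate = timed_rate(lambda: sum(range(1000)), min_seconds=0.05)
```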
If you are not measuring properly, you might not be able to make improvements. 60 Hz example: sync-to-vblank (on by default with NVIDIA drivers). Beware bottleneck shift: a graphics benchmark may start CPU/API-limited, then after tuning become limited by GPU vertex processing, or even by pixel processing as window sizes change or workloads shift. Constantly re-evaluate benchmark assumptions when tuning.
Your benchmark should yield a metric (FPS, LPM, etc.) that is easily collected for further processing.
- Output in standard formats (CSV, JSON, XML); many tools exist to format and compare them.
- If your benchmark is repeatable, accurate, and has good reports, you can track performance over multiple builds/revisions of your application.
- You can also track performance over other changing variables: OS, CPU, GPU driver, memory size, …
- Important: select a reference score and keep it constant if at all possible; avoid normalization of deviance.
- If weighting multiple subtests, consider the relative importance of each subtest to your users.
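A minimal sketch of CSV reporting against a fixed reference score; the build names and FPS numbers are made up:

```python
import csv
import io

REFERENCE_FPS = 100.0   # pick a reference score once, then hold it constant
results = [("build-101", 112.0), ("build-102", 98.5)]   # hypothetical data

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["build", "fps", "vs_reference"])
for build, fps in results:
    # Normalizing against the fixed reference keeps scores comparable
    # across builds, drivers, and machines.
    writer.writerow([build, fps, round(fps / REFERENCE_FPS, 3)])
```

Writing to a real file instead of `io.StringIO` is a one-line change; the point is the stable schema and the constant reference.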
[Figure, clockwise from right: Catia-04, Creo-01, Maya-04, Medical-01, Showcase-01, SNX-02, SW-03]
[Figure: subtest results for Test 1, Test 2, Test 5, Test 6, Test 8, Test 10]
Note the varying subtest weights and their sum.
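Weighted subtest scores are commonly folded into a single composite with a weighted geometric mean, as SPEC-style composites typically do; the scores and weights below are hypothetical:

```python
import math

def weighted_geomean(scores, weights):
    """Weighted geometric mean: combines subtest scores into one composite
    without letting any single subtest dominate the result."""
    total = sum(weights)
    return math.exp(
        sum(w * math.log(s) for s, w in zip(scores, weights)) / total
    )

# Two subtests, the first weighted three times as heavily as the second.
composite = weighted_geomean([120.0, 80.0], [3, 1])
```

A plain weighted arithmetic mean also works, but the geometric mean is less skewed when subtest scores differ by large ratios.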
SPEC benchmarking group – http://www.spec.org SPEC Graphics and Workstation Performance Group (GWPG): http://www.spec.org/gwpg/publish/gpcfaqs.html Contribute to SPEC GWPG: http://www.spec.org/gwpg/publish/develop_bench.html
“SPEC's Graphics and Workstation Performance Group (SPEC/GWPG) is seeking ISVs, software user groups, publication editors and testing lab directors to help develop and maintain standardized benchmarks based on professional graphics and workstation […] consideration by a project group or help the group develop an entirely new benchmark.”
Benchmark what matters to your users! Create good benchmarks. Help us help you: share your benchmark with NVIDIA and we will put it in our driver regression automation suite to prevent performance bugs. Consider sharing application benchmarks with SPEC.
Help us help you!
- Create good unit tests and benchmarks
- Make use of available software development and analysis tools
- Be systematic in development and testing
- Share your unit tests and benchmarks with us (especially if there is a problem)
- Be clear and concise in your bug reports
Ross Cunniff – rcunniff@nvidia.com Erika Dignam – edignam@nvidia.com
JOIN THE NVIDIA DEVELOPER PROGRAM AT developer.nvidia.com/join
Looking at dumps
Performance