Automated Creation of Tests from CUDA Kernels
Oleg Rasskazov, Andrey Zhezherun, Antti Lamberg (JP Morgan) April 2016
GPUs in JP Morgan
JP Morgan has been using GPUs extensively to speed up risk calculations and reduce computational costs since ...
Monte Carlo and PDEs
GPU code
  Hand-written CUDA kernels
  Thrust
  Auto-generated CUDA kernels
Hardest part in delivering GPUs to production
  Bugs
Putting all of a quant library on the GPU is hard.
Parts of the code change frequently, so they need to be rewritten.
Domain-specific languages (DSLs) could help.
Sources (ways to notice)
How to verify that an issue is not a bug in your own code
Assume the issue can be reproduced by running a standalone kernel, e.g.
Capture
Dump an array from GPU memory.
The restored array can be allocated at a different address.
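The dump-and-restore step can be sketched on the host as follows. This is a minimal illustration, not the authors' tooling: `ArraySnapshot`, `dump_array` and `restore_array` are hypothetical names, and plain `memcpy` stands in for the device-to-host and host-to-device copies (`cudaMemcpy`) that a GPU version would use. The point it shows is that the snapshot holds bytes only, so the restored array has the same contents but lives wherever the new allocation happens to land.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical snapshot of one array: raw bytes only, no addresses.
struct ArraySnapshot {
    std::vector<unsigned char> bytes;
};

// Copy `count` elements out of a live buffer into a snapshot.
// On a GPU this copy would be a device-to-host cudaMemcpy instead of memcpy.
template <typename T>
ArraySnapshot dump_array(const T* src, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only plain data can be dumped byte by byte");
    assert(src != nullptr || count == 0);
    ArraySnapshot snap;
    snap.bytes.resize(count * sizeof(T));
    if (count > 0) std::memcpy(snap.bytes.data(), src, snap.bytes.size());
    return snap;
}

// Rebuild the array in freshly allocated storage.
// The new storage is generally at a different address than the original.
template <typename T>
std::unique_ptr<T[]> restore_array(const ArraySnapshot& snap) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only plain data can be restored byte by byte");
    assert(snap.bytes.size() % sizeof(T) == 0);
    const std::size_t count = snap.bytes.size() / sizeof(T);
    std::unique_ptr<T[]> dst(new T[count]);
    if (count > 0) std::memcpy(dst.get(), snap.bytes.data(), snap.bytes.size());
    return dst;
}
```

Because only the contents survive, any address stored inside the dumped data would be stale after the restore.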
Assume GPU memory fits into 2GB.
Intercept GPU memory allocations in your code and replace them with your custom allocator from ...
Allocate a 3GB block of GPU memory, BB
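A custom allocator that serves every request from one pre-reserved block could look like the sketch below. It is an assumption-laden illustration: `BlockAllocator` and its methods are hypothetical, addresses are modelled as integers, and `base` stands in for the address that a single large GPU allocation (for example one `cudaMalloc` call) would return. It assumes the base itself is suitably aligned and that allocations happen in the same order on every run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator over one pre-reserved block.
// `base` stands in for the address returned by a single large GPU allocation.
class BlockAllocator {
public:
    BlockAllocator(std::uintptr_t base, std::size_t capacity)
        : base_(base), capacity_(capacity), used_(0) {}

    // Returns the address of the next chunk, or 0 if the block is exhausted.
    // `alignment` must be a power of two; it is applied to the offset
    // within the block, so `base` is assumed to be aligned already.
    std::uintptr_t allocate(std::size_t bytes, std::size_t alignment = 256) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::size_t start = (used_ + alignment - 1) & ~(alignment - 1);
        if (start > capacity_ || bytes > capacity_ - start) return 0;
        used_ = start + bytes;
        return base_ + start;
    }

    // Offset of an address handed out by this allocator, relative to the block.
    std::size_t offset_of(std::uintptr_t address) const {
        assert(address >= base_ && address - base_ <= capacity_);
        return address - base_;
    }

    std::size_t used() const { return used_; }

private:
    std::uintptr_t base_;
    std::size_t capacity_;
    std::size_t used_;
};
```

With this design the same sequence of requests yields the same offsets inside the block regardless of where the block sits, so a dump of the block can be described in terms of offsets rather than absolute addresses.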
Assume GPU memory for the kernel fits into 1GB.
Intercept GPU memory allocations in your code and replace them with your custom allocator from ...
Assume we do not store pointers back to CPU memory on the GPU.
Run 1:
Allocate a 2GB block of GPU memory, BB
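The source does not spell out how pointers inside the dumped memory are handled. One assumed approach, consistent with reserving a block larger than the kernel needs, is to run the same deterministic workload twice with the working region placed at two different bases and compare the dumps: a word that moves by exactly the base shift and points into the region in both runs is treated as a pointer. The sketch below is hypothetical throughout (`find_pointer_slots`, `relocate`, 8-byte aligned pointers, dumps modelled as vectors of 64-bit words) and is not presented as the authors' method.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Compare two dumps of the same region taken at base1 and base2.
// A slot is reported when both values point into their own region
// and sit at the same offset from their respective base.
std::vector<std::size_t> find_pointer_slots(
    const std::vector<std::uint64_t>& dump1, std::uint64_t base1,
    const std::vector<std::uint64_t>& dump2, std::uint64_t base2,
    std::uint64_t region_bytes) {
    assert(dump1.size() == dump2.size());
    assert(base1 != base2);
    std::vector<std::size_t> slots;
    for (std::size_t i = 0; i < dump1.size(); ++i) {
        const std::uint64_t v1 = dump1[i];
        const std::uint64_t v2 = dump2[i];
        const bool in1 = v1 >= base1 && v1 - base1 < region_bytes;
        const bool in2 = v2 >= base2 && v2 - base2 < region_bytes;
        if (in1 && in2 && v1 - base1 == v2 - base2) slots.push_back(i);
    }
    return slots;
}

// Rewrite the detected slots so the dump is valid for a region at new_base.
void relocate(std::vector<std::uint64_t>& dump, std::uint64_t old_base,
              std::uint64_t new_base, const std::vector<std::size_t>& slots) {
    for (std::size_t i : slots) dump[i] = dump[i] - old_base + new_base;
}
```

Plain data that is identical in both runs is never reported, because it cannot lie inside two disjoint regions at once. The scheme relies on the assumption stated above that no pointers back to CPU memory are stored on the GPU.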
If many conditions hold, we can automatically create standalone CUDA test cases out of our auto-generated kernels.
Surprisingly, the preconditions hold for us, say, 99% of the time.
100GB worth of standalone tests from a snapshot of our production (uncompressed).