COMPILER-ASSISTED TEST ACCELERATION ON GPUS FOR EMBEDDED SOFTWARE

Vanya Yaneva, Ajitha Rajan, Christophe Dubach. ISSTA 2017, 10 July 2017, Santa Barbara, CA.


  1. COMPILER-ASSISTED TEST ACCELERATION ON GPUS FOR EMBEDDED SOFTWARE
     Vanya Yaneva, Ajitha Rajan, Christophe Dubach
     ISSTA 2017, 10 July 2017, Santa Barbara, CA

  2. EMBEDDED SOFTWARE IS EVERYWHERE
     Its safety and correctness are crucial.
     Functional testing is critical.

  3. FUNCTIONAL TESTING CAN BE EXTREMELY TIME CONSUMING

  4. FUNCTIONAL TESTING CAN BE EXTREMELY TIME CONSUMING
     Test suite, run against the application:
       Test case 1 -> Expected result 1
       Test case 2 -> Expected result 2
       Test case 3 -> Expected result 3
       ...
       Test case n -> Expected result n

  5. FUNCTIONAL TESTING CAN BE EXTREMELY TIME CONSUMING
     Test suite, run against the application:
       Test case 1 -> Expected result 1
       ...
       Test case n -> Expected result n
     TESTING IS AN IDEAL CANDIDATE FOR PARALLELISATION

  6. CPU SERVERS
     - Expensive
     - Do not scale easily as test suites grow
     - Can be extremely underutilised

  7. CPU SERVERS
     - Expensive
     - Do not scale easily as test suites grow
     - Can be extremely underutilised
     GPUS
     - Cheap and widely available
     - Large-scale parallelism, thousands of threads
     - SIMD architecture suited to functional testing

  8. EXECUTE TESTS IN PARALLEL ON THE GPU THREADS
     Test suite: Test case 1 ... Test case n, each with Expected result 1 ... Expected result n
     1. Read test cases: INPUT[] = {test case 1 … test case n}
     2. Transfer INPUT[] to GPU memory
     3. Build and launch tested program on the GPU threads
        For each th_id in 0, 1, 2, 3, ..., n-1: OUTPUT[th_id] = program ( INPUT[th_id] )
     4. Transfer OUTPUT[] to CPU memory
     A. Rajan, S. Sharma, P. Schrammel, D. Kroening. Accelerated test execution using GPUs. In Proceedings of ASE 2014, pages 97-102, Sweden, Nov 2014.

  9. EXECUTE TESTS IN PARALLEL ON THE GPU THREADS
     Workflow and reference as on slide 8, annotated with:
     CHALLENGES
     - Usability ✘
     - Scope ✘
     - Performance ?

  10. INTRODUCING PARTECL
      Unmodified source files + Config file -> ParTeCL CodeGen -> OpenCL
      OpenCL + Test cases (CSV format) -> ParTeCL Runtime -> Execution on the GPU

  11. INPUTS
      Example:
        #include <stdio.h>
        #include <stdlib.h>

        int c;

        int addc(int a, int b){
          return a + b + c;
        }

        int main(int argc, char* argv[]){
          int a = atoi(argv[1]);
          int b = atoi(argv[2]);
          c = 3;

          int sum = addc(a, b);
          printf("%d + %d + %d = %d\n", a, b, c, sum);
        }

      Configuration:
        input: int a 1
        input: int b 2
        result: int sum variable: sum

      Test cases (test number, a, b):
        1 13 7
        2 50 22
        3 1000 0
        4 0 1000
        5 0 0
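The configuration on slide 11 binds `a` and `b` to command-line arguments 1 and 2, and each test-case row appears to hold a test number followed by those two arguments. A minimal sketch of reading one such row in C follows. The `struct test_input` field names are borrowed from the generated kernel on slide 12, but the struct layout, the function `parse_test_case`, and the whitespace-separated row format are assumptions for illustration, not ParTeCL's actual reader (the slides describe the test-case format as CSV, so the real delimiter may differ).

```c
#include <stdio.h>

/* Hypothetical layout: field names follow the generated kernel,
   but the real structs.h is not shown in the slides. */
struct test_input {
    int test_case_num;
    int argc;
    int a;
    int b;
};

/* Parse one row such as "1 13 7" (test number, a, b).
   Returns 1 on success, 0 if the row is malformed. */
int parse_test_case(const char *line, struct test_input *out) {
    int num, a, b;
    if (sscanf(line, "%d %d %d", &num, &a, &b) != 3) {
        return 0;
    }
    out->test_case_num = num;
    out->argc = 3; /* assumed: program name plus two arguments */
    out->a = a;
    out->b = b;
    return 1;
}
```

Calling `parse_test_case` once per row would fill the `INPUT[]` array that is later transferred to GPU memory.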

  12. PARTECL CODEGEN
      Example (original C):
        #include <stdio.h>
        #include <stdlib.h>

        int c;

        int addc(int a, int b){
          return a + b + c;
        }

        int main(int argc, char* argv[]){
          int a = atoi(argv[1]);
          int b = atoi(argv[2]);
          c = 3;

          int sum = addc(a, b);
          printf("%d + %d + %d = %d\n", a, b, c, sum);
        }

      OpenCL (generated):
        #include "structs.h"
        //#include <stdio.h>
        //#include <stdlib.h>

        /*int c;*/

        int addc(int a, int b, int *c){
          return a + b + (*c);
        }

        kernel void main_kernel(
            global struct test_input* inputs,
            global struct test_result* results){
          int idx = get_global_id(0);
          struct test_input input_gen = inputs[idx];
          global struct test_result *result_gen = &results[idx];
          int argc = input_gen.argc;
          result_gen->test_case_num = input_gen.test_case_num;

          int c;
          int a = input_gen.a;
          int b = input_gen.b;
          c = 3;

          int sum = addc(a, b, &c);
          /*printf("%d + %d + %d = %d\n", a, b, c, sum);*/
          result_gen->sum = sum;
        }
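To see what the generated kernel on slide 12 computes for each test case, it can help to emulate it on the CPU. The sketch below is plain C, not OpenCL: a loop index stands in for `get_global_id(0)`, and the tests run sequentially instead of on GPU threads. The struct layouts and the names `run_one_test` and `run_all_tests` are hypothetical, since the contents of `structs.h` are not shown in the slides.

```c
#include <stddef.h>

/* Hypothetical struct layouts; the real structs.h is generated by ParTeCL. */
struct test_input { int test_case_num; int argc; int a; int b; };
struct test_result { int test_case_num; int sum; };

/* Transformed function: the former global `c` is now passed by pointer. */
static int addc(int a, int b, int *c) {
    return a + b + (*c);
}

/* Body of main_kernel for one work-item; idx stands in for get_global_id(0). */
static void run_one_test(const struct test_input *inputs,
                         struct test_result *results, size_t idx) {
    struct test_input input_gen = inputs[idx];
    struct test_result *result_gen = &results[idx];
    result_gen->test_case_num = input_gen.test_case_num;

    int c = 3; /* each test gets its own copy of the former global */
    int sum = addc(input_gen.a, input_gen.b, &c);
    result_gen->sum = sum;
}

/* Sequential stand-in for launching n GPU threads. */
void run_all_tests(const struct test_input *inputs,
                   struct test_result *results, size_t n) {
    for (size_t idx = 0; idx < n; ++idx) {
        run_one_test(inputs, results, idx);
    }
}
```

Because every test owns a private `c`, the iterations do not share state, which is the property that lets the real kernel run one test per GPU thread.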

  13. CODE TRANSFORMATIONS
      - global scope variables
      - command line arguments
      - standard in/out
      - standard library (partial support): clClibc

  14. PARTECL RUNTIME
      1. Read test cases: INPUT[] = {test case 1 … test case n}
      2. Transfer INPUT[] to GPU memory
      3. Build and launch tested program (automatically generated OpenCL) on the GPU threads
         For each th_id in 0, 1, 2, 3, ..., n-1: OUTPUT[th_id] = program ( INPUT[th_id] )
      4. Transfer OUTPUT[] to CPU memory

  15. CHALLENGES
      - Usability ✔
      - Scope ✔
      - Performance ?
      Unmodified source files + Config file -> ParTeCL CodeGen -> OpenCL
      OpenCL + Test cases (CSV format) -> ParTeCL Runtime -> Execution on the GPU

  16. EVALUATION
      1. Speedup against CPU
      2. Data transfer overhead
      3. Comparison to a multi-core CPU
      4. Correctness

  17. EXPERIMENT
      Subjects: EEMBC, an industry-standard benchmark suite for embedded software
      Hardware: GPU: NVIDIA Tesla K40m; CPU: Intel Xeon, 8 cores
      Test suite size: 130K

  18. SPEEDUP AGAINST CPU

  19. DATA TRANSFER OVERHEAD
      [Figure: one panel per benchmark (viterb00, fbital00, a2time01, autcor00, tblook01, fft00, conven00, puwmod01, rspeed01). Each panel plots execution time [ms], broken down into input transfer, output transfer and kernel execution, against the number of tests (log base 2 scale, 2^8 to 2^17).]

  20. DATA TRANSFER OVERHEAD

  21. COMPARISON TO A MULTI-CORE CPU

  22. CHALLENGES
      - Usability ✔
      - Scope ✔
      - Performance ✔

  23. CORRECTNESS
      For all 9 benchmarks, testing results from the GPU are an exact match to the testing results from the CPU.
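The exact-match check on slide 23 can be pictured as an element-wise comparison of the two result arrays. The sketch below is one way to write such a check in C. It is not taken from ParTeCL: the `struct test_result` layout (modelled on the `addc` example from slide 12) and the function `first_mismatch` are assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical result record, modelled on the addc example. */
struct test_result { int test_case_num; int sum; };

/* Compare CPU and GPU results test by test.
   Returns the index of the first mismatch, or n if all n results match. */
size_t first_mismatch(const struct test_result *cpu,
                      const struct test_result *gpu, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (cpu[i].test_case_num != gpu[i].test_case_num ||
            cpu[i].sum != gpu[i].sum) {
            return i;
        }
    }
    return n;
}
```

A return value equal to `n` corresponds to an exact match for that test suite.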

  24. SUMMARY
      - Automatic GPU code generation
      - Automatic test execution on the GPU threads
      - Speedup of up to 53x (avg 16x) on EEMBC benchmarks
      - Correct testing results

  25. SUMMARY
      - Automatic GPU code generation
      - Automatic test execution on the GPU threads
      - Speedup of up to 53x (avg 16x) on EEMBC benchmarks
      - Correct testing results
      FUTURE WORK
      - Extend evaluation & scope
      - Analyse & improve performance

  26. THANKS
      ParTeCL CodeGen: github.com/wyaneva/partecl-codegen
      ParTeCL Runtime: github.com/wyaneva/partecl-runtime
      clClibc: github.com/wyaneva/clClibc

  27. C FEATURES
      Out of the box:
      - pure functions
      - function calls
      - double precision (for OpenCL 1.2)
      With transformations:
      - standard in/out
      - global scope variables
      - standard library calls (partial support)
      Unsupported (yet):
      - dynamic memory allocation
      - file I/O
      - recursion
