COMPILER-ASSISTED TEST ACCELERATION ON GPUS FOR EMBEDDED SOFTWARE
VANYA YANEVA, Ajitha Rajan, Christophe Dubach
ISSTA 2017 10 July 2017 Santa Barbara, CA
EMBEDDED SOFTWARE IS EVERYWHERE
ITS SAFETY AND CORRECTNESS ARE CRUCIAL
FUNCTIONAL TESTING
Test suite
Test case 1 → Application → Expected result 1
Test case 2 → Application → Expected result 2
Test case 3 → Application → Expected result 3
…
Test case n → Application → Expected result n
Expensive
Do not scale easily as test suites grow
Can be extremely underutilised
Cheap and widely available
Large-scale parallelism: thousands of threads
SIMD architecture suited to functional testing
Test suite
Test case 1 … Test case n, with Expected result 1 … Expected result n
Read test cases: INPUT[] = {test case 1 … test case n}
Transfer INPUT[] to GPU memory
Build and launch tested program: each GPU thread th_id (0 … n-1) computes OUTPUT[th_id] = program( INPUT[th_id] )
Transfer OUTPUT[] to CPU memory
102, Sweden, Nov 2014.
CHALLENGES
Usability ✘
Scope ✘
Performance ?
Unmodified source files + Config file → ParTeCL CodeGen → OpenCL code
Test cases (CSV format) → ParTeCL Runtime → OpenCL execution
Original C program (unmodified source file):
#include <stdio.h>
#include <stdlib.h>

int c;

int addc(int a, int b){
  return a + b + c;
}

int main(int argc, char* argv[]){
  int a = atoi(argv[1]);
  int b = atoi(argv[2]);
  c = 3;
  int sum = addc(a, b);
  printf("%d + %d + %d = %d\n", a, b, c, sum);
}

Config file:
input: int a 1
input: int b 2
result: int sum
variable: sum

Test cases (CSV format; columns: test case number, a, b):
1 13 7
2 50 22
3 1000 0
4 0 1000
5 0 0
Transformed OpenCL kernel:
#include "structs.h"
//#include <stdio.h>
//#include <stdlib.h>

/*int c;*/

int addc(int a, int b, int *c){
  return a + b + (*c);
}

kernel void main_kernel(global struct test_input* inputs,
                        global struct test_result* results){
  int idx = get_global_id(0);
  struct test_input input_gen = inputs[idx];
  global struct test_result *result_gen = &results[idx];
  int argc = input_gen.argc;
  result_gen->test_case_num = input_gen.test_case_num;

  int c;
  int a = input_gen.a;
  int b = input_gen.b;
  c = 3;
  int sum = addc(a, b, &c);
  /*printf("%d + %d + %d = %d\n", a, b, c, sum);*/
  result_gen->sum = sum;
}
Automatically generated
Subjects: EEMBC, an industry-standard benchmark suite for embedded software
Hardware: NVIDIA Tesla K40m GPU; 8-core Intel Xeon CPU
Test suite size: 130K tests
Figure: Execution time (ms) versus number of tests (2^8 to 2^17, log base 2 scale), broken down into input transfer, kernel execution and output transfer, for the EEMBC programs a2time01, autcor00, conven00, fbital00, fft00, puwmod01, rspeed01, tblook01 and viterb00.
ParTeCL CodeGen: github.com/wyaneva/partecl-codegen
ParTeCL Runtime: github.com/wyaneva/partecl-runtime
clClibc: github.com/wyaneva/clClibc
Supported out of the box: pure functions, function calls, double precision (for OpenCL 1.2)
Supported with transformations: standard input/output, global-scope variables, standard library calls (partial support)
Not yet supported: dynamic memory allocation, file I/O, recursion