TOWARDS INTERACTIVE VISUAL EXPLORATION OF MASSIVELY PARALLEL PROGRAMS USING A DOMAIN-SPECIFIC LANGUAGE
TOBIAS KLEIN
TOOL FOR DEVELOPMENT AND ANALYSIS OF OPENCL KERNELS
WHY VISUALIZE PARALLEL PROGRAMS?
- INSPIRED BY ALGORITHM VISUALIZATION [MIKE BOSTOCK]
- GENERAL UNDERSTANDING
- DEBUGGING
- PERFORMANCE ANALYSIS
- RAPID PROTOTYPING
VISUAL ENCODING OF PROGRAM BEHAVIOR
VISUAL ENCODING – LOCAL MEMORY ACCESSES
var[id] = 1;                     // write access
var[id];                         // read access
barrier(CLK_LOCAL_MEM_FENCE);    // barrier
...
VISUAL ENCODING - WARP DIVERGENCE
if (condition){ instruction; } else { instruction; }
[Figure: branch outcome per thread, 1 (true) or 0 (false); outcome pairs 11, 10, 00; nr of threads in the warp: 16, 8, 8]
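The divergence encoding above can be illustrated with a small CPU-side sketch. This is not the tool's implementation: the function name `divergence_per_warp`, the warp size of 16 (taken from the figure labels, real hardware often uses 32 or 64), and the example condition are all assumptions made for illustration. It groups per-thread branch outcomes into warps and counts how many threads take each path.

```python
def divergence_per_warp(conditions, warp_size=16):
    """Group per-thread branch outcomes into warps and count each path.

    `conditions` holds one boolean per thread, in thread-id order.
    `warp_size=16` is an assumption read off the slide's figure labels.
    """
    report = []
    for start in range(0, len(conditions), warp_size):
        warp = conditions[start:start + warp_size]
        taken = sum(1 for c in warp if c)
        not_taken = len(warp) - taken
        report.append({
            "threads": len(warp),
            "if_path": taken,
            "else_path": not_taken,
            # A warp diverges when its threads do not all take the same path.
            "divergent": taken > 0 and not_taken > 0,
        })
    return report


# Hypothetical condition: the first half of each 16-thread warp takes the if path.
conditions = [(tid % 16) < 8 for tid in range(32)]
report = divergence_per_warp(conditions)
```

A warp whose threads all agree on the condition yields `divergent: False`; a mixed warp such as the 8/8 split above is the case the visualization highlights.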
VISUAL ENCODING - DEMO
HOW DO WE CREATE PROGRAM VISUALIZATIONS?
DOMAIN-SPECIFIC LANGUAGE
- Device DSL
  - Executed in parallel
  - OpenCL C + Annotations
  - Just-in-time compiled
- Host DSL
  - Device configuration
  - Domain Objects (Data, Images, Visualizer)
  - No compilation
DEVICE DSL - ANNOTATIONS
kernel reduction(global float *g_idata, global float *g_odata, int n) {
    . . .
    if (i < n) { sdata[tid] = g_idata[i]; } else { sdata[tid] = 0; }
    barrier(CLK_LOCAL_MEM_FENCE);
    @watch[start] sdata;
    for (int s = 1; s < get_local_size(0); s *= 2) {
        int index = 2 * s * tid;
        if (index < get_local_size(0)) {
            sdata[index] += sdata[index + s];
        }
        @watch[end] sdata;
        . . .
}
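To see what a watch on sdata might capture, the reduction loop can be emulated for a single work-group on the CPU. This is a rough sketch, not the DSL's instrumentation: the function name `reduce_work_group` is hypothetical, the global index `i` is assumed equal to `tid` (the slide elides how it is computed), the local size is assumed to be a power of two, and a snapshot is assumed to be taken after every loop step.

```python
def reduce_work_group(g_idata, local_size):
    """Emulate one work-group of the interleaved reduction on the CPU.

    Returns the sdata snapshot before the loop and a list of snapshots
    taken after each loop step (assumed watch points).
    """
    # Load phase: pad with 0 where the kernel's `i < n` test would fail.
    # Assumption: the global index i equals tid (single work-group).
    sdata = [float(g_idata[tid]) if tid < len(g_idata) else 0.0
             for tid in range(local_size)]
    start = list(sdata)
    steps = []
    s = 1
    while s < local_size:
        # Each work-item runs the loop body. Within one step, reads and
        # writes touch disjoint elements, so running the work-items in
        # order gives the same result as running them in parallel.
        for tid in range(local_size):
            index = 2 * s * tid
            if index < local_size:
                sdata[index] += sdata[index + s]
        steps.append(list(sdata))
        s *= 2
    return start, steps


start, steps = reduce_work_group([1, 2, 3, 4, 5, 6, 7, 8], local_size=8)
```

After the last step the sum of the work-group sits in element 0 of the final snapshot, and the intermediate snapshots show which elements each step touched.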
HOST DSL
int local_size = 64;
int global_size = 4096;
float[512] in;
float[512] out;
Arrays.fillRandom(in);
Device.setLocalWorkSize(local_size, 1);
Device.setGlobalWorkSize(global_size, 1);
Device.measureExecutionTime(true);
Device.setInstrumentation(true);
Device.reduction(in, out, 512);
Visualizer.plot(out);
- GPU Data Structures
- Domain Objects
- Device Configurations
- Device Kernel Call
- Common language concepts
- Visualize data
USE CASE – IMAGE PROCESSING
BILATERAL FILTER
- EDGE PRESERVING AND SMOOTHING
- GAUSSIAN WEIGHT
- RANGE WEIGHT (COLOR)
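The two weights listed above combine multiplicatively in the standard bilateral filter. The following is a minimal reference sketch, not the kernel from the talk: it assumes a grayscale image stored as a list of rows (the slide mentions color), and the parameter names `radius`, `sigma_space`, and `sigma_range` are chosen here for illustration.

```python
import math


def bilateral_filter(image, radius=1, sigma_space=1.0, sigma_range=0.1):
    """Bilateral filter for a 2D grayscale image given as a list of rows."""
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            weighted_sum = 0.0
            weight_total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue  # skip neighbours outside the image
                    neighbour = image[ny][nx]
                    # Gaussian weight: falls off with spatial distance.
                    w_space = math.exp(
                        -(dx * dx + dy * dy) / (2.0 * sigma_space ** 2))
                    # Range weight: falls off with intensity difference,
                    # which is what preserves edges.
                    w_range = math.exp(
                        -((neighbour - center) ** 2) / (2.0 * sigma_range ** 2))
                    weight = w_space * w_range
                    weighted_sum += weight * neighbour
                    weight_total += weight
            # The centre pixel always contributes weight 1, so the
            # normalisation term is never zero.
            output[y][x] = weighted_sum / weight_total
    return output


# Hypothetical test image: a vertical step edge.
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
filtered = bilateral_filter(edge)
```

With a small `sigma_range`, pixels across the edge receive almost no weight, so the step survives filtering; the per-pixel values of `w_space` and `w_range` are the kind of intermediate values that could be inspected as images or plots.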
ANALYSIS OF INTERMEDIATE VALUES
- REPRESENTED AS IMAGES
- REPRESENTED IN A PLOT
SUMMARY
- Visualizations can help you understand the execution behavior of your GPU program
- They provide a simple way to reveal possible issues in your implementation
- DSLs are able to capture common concepts of a