TOWARDS INTERACTIVE VISUAL EXPLORATION OF MASSIVELY PARALLEL PROGRAMS USING A DOMAIN-SPECIFIC LANGUAGE, TOBIAS KLEIN



SLIDE 1

TOBIAS KLEIN

TOWARDS INTERACTIVE VISUAL EXPLORATION OF MASSIVELY PARALLEL PROGRAMS USING A DOMAIN-SPECIFIC LANGUAGE

SLIDE 2

TOOL FOR DEVELOPMENT AND ANALYSIS OF OPENCL KERNELS

SLIDE 3

WHY VISUALIZE PARALLEL PROGRAMS?

  • INSPIRED BY ALGORITHM VISUALIZATION

[MIKE BOSTOCK]

SLIDE 4

WHY VISUALIZE PARALLEL PROGRAMS?

  • GENERAL UNDERSTANDING
  • DEBUGGING
  • PERFORMANCE ANALYSIS
  • RAPID PROTOTYPING
SLIDE 5

VISUAL ENCODING OF PROGRAM BEHAVIOR

SLIDE 6

VISUAL ENCODING – LOCAL MEMORY ACCESSES

var[id] = 1;
var[id];
barrier(CLK_LOCAL_MEM_FENCE);
...
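One way to picture the data behind such an encoding is an access trace: for every work-item, the sequence of writes, reads and barriers on local memory. The Python sketch below simulates that trace on the host for the three statements above. It is an illustration only, not the tool's implementation; the function name `trace_local_accesses` and the event tuple layout are hypothetical.

```python
def trace_local_accesses(local_size):
    """Simulate `var[id] = 1; var[id]; barrier(...)` for one work-group.

    Returns the final local-memory contents and a list of
    (work_item, operation, index) events. The event layout is a
    hypothetical choice made for this sketch.
    """
    var = [0] * local_size
    trace = []
    # Work-items run one after another here; on a device they run in parallel.
    for wid in range(local_size):
        var[wid] = 1                       # var[id] = 1;
        trace.append((wid, "write", wid))
        _ = var[wid]                       # var[id];
        trace.append((wid, "read", wid))
    # Every work-item reaches the barrier exactly once.
    for wid in range(local_size):
        trace.append((wid, "barrier", None))
    return var, trace


if __name__ == "__main__":
    memory, events = trace_local_accesses(4)
    for event in events:
        print(event)
```

Each event could then be mapped to a visual mark, for example one color per operation type.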

SLIDE 7

VISUAL ENCODING - WARP DIVERGENCE

if (condition){ instruction; } else { instruction; }

[Figure: the threads of a warp pass through repeated if/else branches. Each outcome is labeled 1(true) or 0(false); the resulting path labels include 11, 10 and 00. Nr of threads: 16, 8, 8.]
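The idea of labeling each thread with its branch outcomes can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: a warp of 32 threads, two consecutive conditions, and thresholds picked so that the warp splits into groups of 16, 8 and 8 threads.

```python
from collections import Counter

WARP_SIZE = 32  # assumed warp width for this sketch


def branch_path(tid):
    """Encode the outcomes of two branches as a bit string such as '10'."""
    first = tid < 24    # hypothetical first condition
    second = tid < 16   # hypothetical second condition
    return f"{int(first)}{int(second)}"


def divergence_histogram(warp_size=WARP_SIZE):
    """Count how many threads of the warp follow each branch path."""
    return Counter(branch_path(tid) for tid in range(warp_size))


if __name__ == "__main__":
    histogram = divergence_histogram()
    for path, threads in sorted(histogram.items(), reverse=True):
        print(path, threads)
```

More than one distinct path within a warp means the warp has diverged, and the hardware executes the paths one after another.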

SLIDE 8

VISUAL ENCODING - DEMO

SLIDE 9

HOW DO WE CREATE PROGRAM VISUALIZATIONS?

SLIDE 10

DOMAIN-SPECIFIC LANGUAGE

  • Device DSL
      • Executed in parallel
      • OpenCL C + Annotations
      • Just-in-time compiled
  • Host DSL
      • Device configuration
      • Domain Objects (Data, Images, Visualizer)
      • No compilation
SLIDE 11

DEVICE DSL - ANNOTATIONS

kernel reduction(global float *g_idata, global float *g_odata, int n) {
    ...
    if (i < n) {
        sdata[tid] = g_idata[i];
    } else {
        sdata[tid] = 0;
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    @watch[start] sdata;
    for (int s = 1; s < get_local_size(0); s *= 2) {
        int index = 2 * s * tid;
        if (index < get_local_size(0)) {
            sdata[index] += sdata[index + s];
        }
        // synchronize so the next step reads completed partial sums
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    @watch[end] sdata;
    ...
}
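To see what watching `sdata` between the two annotations could reveal, the reduction loop can be replayed on the host. The Python sketch below simulates one work-group and records a snapshot of the local buffer before the loop and after each step. It is a sequential stand-in, not the DSL's instrumentation; the function name `reduce_work_group` is hypothetical, and the work-group size is assumed to be a power of two.

```python
def reduce_work_group(values):
    """Interleaved-addressing sum reduction over one work-group.

    Returns (total, snapshots). snapshots[0] is the buffer before the
    loop; each later entry is the buffer after one reduction step.
    """
    local_size = len(values)
    if local_size == 0 or local_size & (local_size - 1):
        raise ValueError("work-group size must be a power of two")

    sdata = list(values)
    snapshots = [list(sdata)]
    s = 1
    while s < local_size:
        # Each tid stands for one work-item; within a step no item reads
        # a slot that another item writes, so sequential order is safe.
        for tid in range(local_size):
            index = 2 * s * tid
            if index < local_size:
                sdata[index] += sdata[index + s]
        snapshots.append(list(sdata))  # the point where a barrier would sit
        s *= 2
    return sdata[0], snapshots


if __name__ == "__main__":
    total, steps = reduce_work_group([1, 2, 3, 4, 5, 6, 7, 8])
    for step in steps:
        print(step)
    print("sum:", total)
```

Plotting the snapshots side by side shows which slots are still active at each step, which is the kind of behavior a watched variable exposes.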

SLIDE 12

HOST DSL

int local_size = 64;
int global_size = 4096;
float[512] in;
float[512] out;
Arrays.fillRandom(in);
Device.setLocalWorkSize(local_size, 1);
Device.setGlobalWorkSize(global_size, 1);
Device.measureExecutionTime(true);
Device.setInstrumentation(true);
Device.reduction(in, out, 512);
Visualizer.plot(out);

Callouts on the slide:

  • GPU Data Structures
  • Domain Objects
  • Device Configurations
  • Device Kernel Call
  • Common language concepts
  • Visualize data

SLIDE 13

USE CASE – IMAGE PROCESSING

BILATERAL FILTER

  • EDGE-PRESERVING SMOOTHING
  • GAUSSIAN WEIGHT
  • RANGE WEIGHT (COLOR)

ANALYSIS OF INTERMEDIATE VALUES

  • REPRESENTED AS IMAGES
  • REPRESENTED IN A PLOT
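The two weights and the idea of inspecting intermediate values can be illustrated with a small bilateral filter on a 1D signal. This Python sketch is not the kernel from the talk: the function name `bilateral_1d`, the parameter names, the default values and the choice of a Gaussian for the range weight are all assumptions made for illustration.

```python
import math


def bilateral_1d(signal, radius=2, sigma_s=1.0, sigma_r=0.1):
    """Bilateral filter on a 1D signal.

    Returns (filtered, weight_sums). weight_sums is an intermediate
    value per sample, the kind of data one might show as an image or plot.
    """
    n = len(signal)
    filtered, weight_sums = [], []
    for i in range(n):
        acc = 0.0
        wsum = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Gaussian weight: falls off with spatial distance.
            spatial = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
            # Range weight: falls off with difference in value (color).
            value_diff = signal[i] - signal[j]
            rng = math.exp(-(value_diff ** 2) / (2 * sigma_r ** 2))
            weight = spatial * rng
            acc += weight * signal[j]
            wsum += weight
        filtered.append(acc / wsum)
        weight_sums.append(wsum)
    return filtered, weight_sums


if __name__ == "__main__":
    step_edge = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
    smoothed, sums = bilateral_1d(step_edge)
    print(smoothed)
    print(sums)
```

On the step edge, the range weight suppresses contributions from across the edge, so the edge stays sharp while flat regions are averaged; the per-sample weight sums drop near the edge, which is easy to spot in a plot.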
SLIDE 14

USE CASE – IMAGE PROCESSING

SLIDE 15

SUMMARY

  • Visualizations can help you understand the execution behavior of your GPU program
  • They provide a simple way to reveal possible issues in your implementation
  • DSLs can capture the common concepts of a domain without reducing flexibility

SLIDE 16

THANK YOU

Contact

tobias.da.klein@gmail.com
peter.rautek@kaust.edu.sa