

SLIDE 1

Composable GPU programming

SLIDE 2

GPUs -- what are they?

  • Basic model: SIMD, SPMD, MIMD
  • blocks of PUs with a single PC and local memory (synchronous); warps
  • many blocks (asynchronous); VRAM
  • discontinuities/constraints from the hardware implementation of memory access
  • next-generation hardware likely to mediate this, making programmability more orthogonal
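The two-level model on this slide can be sketched in plain Python (a toy simulation, not GPU code; all names here are illustrative): PUs inside a block advance in lockstep under one program counter, while distinct blocks run independently of each other.

```python
# Toy simulation of the two-level GPU model described above.
# Within a block: one program counter, so every PU performs the SAME
# step before any PU moves on (synchronous). Across blocks: no ordering
# guarantees (asynchronous), so any execution order must give the same result.

def run_block(program, block_id, num_pus, shared):
    """Execute `program` step by step; all PUs in the block run each
    step in lockstep (single PC)."""
    local = [dict() for _ in range(num_pus)]   # per-PU local memory
    for step in program:                       # one shared program counter
        for pu_id in range(num_pus):           # every PU takes this step
            step(block_id, pu_id, local[pu_id], shared)

def launch(program, num_blocks, num_pus):
    """Blocks are independent; running them one after another is just
    one legal interleaving of the asynchronous model."""
    shared = {}
    for block_id in range(num_blocks):
        run_block(program, block_id, num_pus, shared)
    return shared

# Example program: each PU writes its global index into shared memory.
program = [lambda b, p, local, shared: shared.__setitem__((b, p), b * 4 + p)]
result = launch(program, num_blocks=2, num_pus=4)
```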

SLIDE 3

GPUs -- what are they?


Revenge of the PRAM?

SLIDE 4

Programming GPUs

  • CUDA: a C-like language for general-purpose programming, with code generated for GPUs
  • previously: OpenGL for graphics programming
  • coming up: OpenCL (compute language)
  • foo<<<m, n, k>>>(args)
  • executes foo with implicit indices i, j (block, PU) selecting from the arguments
  • care required when accessing memory: out-of-sequence accesses are sequentialized!
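The launch semantics of `foo<<<m, n, k>>>(args)` can be mimicked in Python (a hypothetical sketch; real CUDA is C-like and runs these pairs concurrently): the grid supplies the implicit indices i (block) and j (PU/thread), and the kernel body uses them to select its slice of the bulk arguments.

```python
# Sequential simulation of a CUDA-style grid launch. On real hardware the
# (block, thread) pairs execute in parallel; running them in order here is
# only one legal schedule, so a correct kernel must not depend on the order.

def gpu_launch(kernel, m, n, *args):
    """Run `kernel` once for every (block i, thread j) in an m-by-n grid."""
    for i in range(m):
        for j in range(n):
            kernel(i, j, *args)

def add_kernel(i, j, threads_per_block, xs, ys, out):
    """The implicit indices select one element from the arguments."""
    idx = i * threads_per_block + j
    if idx < len(xs):              # bounds guard, as in real kernels
        out[idx] = xs[idx] + ys[idx]

xs, ys = [1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]
out = [0] * len(xs)
gpu_launch(add_kernel, 2, 3, 3, xs, ys, out)   # 2 blocks of 3 threads
# out is now [11, 22, 33, 44, 55, 66]
```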

SLIDE 5

GPU language projects

  • Data Parallel Haskell:
  • programming the flat PRAM level
  • nested/compositional programming
  • map (map f) xss
  • Obsidian: combinator language for generating CUDA code
  • explicit synchronization
  • choosing threads, mapping to blocks
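What `map (map f) xss` asks for is nested parallelism. A Python sketch (hypothetical helper names) of the flattening idea behind NESL and Data Parallel Haskell: the nested map becomes one flat map over all inner elements plus segment bookkeeping, so even irregular nesting turns into a single bulk operation a GPU can execute.

```python
# Nested data parallelism and its flattened equivalent.

def nested_map(f, xss):
    """The program as written: an outer map of inner maps."""
    return [[f(x) for x in xs] for xs in xss]

def flattened_map(f, xss):
    """The same computation after flattening: one flat map plus a
    segment descriptor recording where each inner list begins and ends."""
    segs = [len(xs) for xs in xss]             # segment descriptor
    flat = [x for xs in xss for x in xs]       # concatenated data
    flat = [f(x) for x in flat]                # single flat, GPU-friendly map
    out, pos = [], 0
    for n in segs:                             # rebuild the nesting
        out.append(flat[pos:pos + n])
        pos += n
    return out

xss = [[1, 2], [3], [4, 5, 6]]                 # irregular nesting is fine
# flattened_map(lambda x: x * x, xss) == [[1, 4], [9], [16, 25, 36]]
```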
SLIDE 6

How to exploit?

  • Performance: if you have a data-parallel problem, formulate it using scan, map, fold, and permute on bulk data (arrays), and have it shipped out to a GPU!
  • If you can't figure out how to do that, do not expect magic from your compiler.
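A minimal sketch of this advice in Python (illustrative names, not a real GPU API): express the problem with map, fold, and scan over whole arrays rather than index-by-index loops, so it lines up directly with bulk GPU primitives. Two toy instances: dot product as map-then-fold, and exclusive prefix sums as a scan.

```python
from functools import reduce
from itertools import accumulate
import operator

def dot(xs, ys):
    # map (*) over both arrays, then fold (+): both are standard
    # data-parallel bulk primitives
    return reduce(operator.add, map(operator.mul, xs, ys), 0)

def exclusive_scan(xs):
    # scan (+): running sums, shifted right, with identity 0 in front
    # (`initial=` needs Python 3.8+)
    return list(accumulate(xs[:-1], operator.add, initial=0))

# dot([1, 2, 3], [4, 5, 6]) == 32
# exclusive_scan([3, 1, 7, 0]) == [0, 3, 4, 11]
```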

SLIDE 7

Qualities

  • Obsidian: a good candidate for capturing the two-level model (synchronous blocks and asynchronous sets of blocks) and for implementing the APRAM model
  • excellent scan implementations
  • Data Parallel Haskell: a good model for programming the APRAM model and for compositional abstraction on top of it
  • NESL with higher-order functions and polymorphism
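The GPU scan implementations alluded to here are typically variants of the Blelloch work-efficient scan. A Python sketch of its two phases (Obsidian's actual generated CUDA is not reproduced): an up-sweep builds partial sums in a tree, and a down-sweep pushes them back down.

```python
def blelloch_exclusive_scan(xs):
    """Exclusive +-scan; input length must be a power of two in this sketch."""
    a = list(xs)
    n = len(a)
    # up-sweep (reduce): combine pairs, doubling the stride each level
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            a[i] += a[i - d]
        d *= 2
    # down-sweep: clear the root, then distribute partial sums downwards
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            a[i - d], a[i] = a[i], a[i] + a[i - d]
        d //= 2
    return a

# blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3])
#   == [0, 3, 4, 11, 11, 15, 16, 22]
```

On a GPU each level of the tree is one synchronous bulk step over a block, which is why the slide's two-level model fits scan so well.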
SLIDE 8

Requirements

  • Need a robust performance model: NESL at the PRAM level, something else lower down
  • Need to stay in the same programming model when engineering/tuning code
  • Need a robust programming model (sw/hw): small changes shouldn't lead to unpredictable, radical changes in performance

SLIDE 9

(End)