Dandelion: Review for R212, 24th November 2014



SLIDE 1

Dandelion

Review for R212: 24th November 2014

SLIDE 2

Motivation

GPU, FPGA, Vector processors becoming increasingly common (data parallel, power requirements, SIMD, etc.)

SLIDE 3

What is Dandelion?

  • Compiler: compiles native .NET-based LINQ code (in C# or F#) for GPU programming
  • Runtime: abstracts scheduling details from the programmer across multiple machines, CPUs, and GPUs
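As a rough illustration of the kind of query the compiler targets, the sketch below writes a filter/map/reduce pipeline in Python. It is an analogue only, not Dandelion code: Dandelion takes LINQ queries written in C# or F#, where the corresponding standard operators are Where, Select, and Aggregate. The function name and input data are hypothetical.

```python
from functools import reduce


def sum_of_squares_of_evens(values):
    """Filter, map, then reduce; each stage is data parallel over its input."""
    evens = filter(lambda v: v % 2 == 0, values)        # analogous to LINQ Where
    squares = map(lambda v: v * v, evens)               # analogous to LINQ Select
    return reduce(lambda acc, v: acc + v, squares, 0)   # analogous to LINQ Aggregate


result = sum_of_squares_of_evens(range(10))
```

Because the query only states what to compute per element, a compiler and runtime are free to decide where each stage runs, which is the property the slide describes.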

SLIDE 4

Compiler

  • Clean interface to CUDA
  • Deal with CUDA complexities

– e.g. dynamic memory allocation

  • Bytecode compilation: benefits
  • Static analysis

SLIDE 5

Runtime

  • Needs to consider three scenarios:

– Machine-machine
– CPU-local
– GPU

SLIDE 7

GPU dataflow

SLIDE 9

Compute cluster

  • Two techniques:

– Dryad: persistent storage, high availability
– Moxie (developed for Dandelion): Spark-like in-memory storage and checkpoints

SLIDE 10

Compute cluster

[Figure: cluster diagram with Master and Container nodes]

SLIDE 11

Evaluation

SLIDE 12

Single machine performance

SLIDE 13

K-means

20x less code
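The slide reports the code-size reduction but does not show the code. As a hedged sketch of why k-means suits a query-style formulation, one iteration can be written as "group points by nearest center, then average each group". The Python below is an illustration under that assumption, not the Dandelion implementation; the names `nearest` and `kmeans_step` are hypothetical.

```python
from collections import defaultdict


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def kmeans_step(points, centers):
    """One iteration: group points by nearest center, then average each group."""
    groups = defaultdict(list)
    for point in points:  # independent per point, so data parallel
        groups[nearest(point, centers)].append(point)

    new_centers = []
    for i, old in enumerate(centers):  # per-group reduction: mean of members
        members = groups.get(i)
        if not members:
            new_centers.append(old)  # keep a center that attracted no points
            continue
        new_centers.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centers
```

The assignment loop touches each point independently and the averaging is a per-group reduction, which are the two shapes of work that map naturally onto data-parallel hardware.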

SLIDE 14

Criticisms

  • No discussion of inter-machine scheduling and associated overheads.
  • Claims to support FPGAs, but there is no evaluation of this (cost reasons perhaps?).
  • Still suffers garbage-collection overheads due to the managed runtime.
  • More evaluation beyond k-means?

SLIDE 15

Summary

  • Data-parallel hardware becoming mainstream; need high-level programming support.
  • Dandelion schedules work onto GPUs (and others) from a high-level C# or F# implementation.
  • Achieves noticeable (30x+) speed improvements through use of GPUs, without the learning overhead of CUDA or similar.