DryadLINQ A System for General-Purpose Distributed Data-Parallel - - PowerPoint PPT Presentation



slide-1
SLIDE 1

DryadLINQ

A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language

slide-2
SLIDE 2

Overview

  • Motivation for DryadLINQ
  • Design
  • Implementation
  • Performance
  • Q & A
slide-3
SLIDE 3

Motivation

  • More machines + more code = more problems
  • Need to simplify!
  • Solution → Higher-level Language
slide-4
SLIDE 4

Design Goals

  • Easy to write
  • General Purpose
  • Efficient
slide-5
SLIDE 5

Existing Solutions

  • SQL

– Difficult to express common programming constructs

  • MapReduce

– Not flexible enough
– Inefficient for some use cases

  • Dryad

– Have to specify the DAG
– Harder to write

slide-6
SLIDE 6

DryadLINQ

  • Dryad

– Execution Engine

  • Language INtegrated Query

– Declarative + Imperative + Object Oriented

slide-7
SLIDE 7

LINQ vs. SQL

  • Expressions can be directly embedded in code
  • Allow direct calls to C#, F#, … functions
  • Evaluated by Dryad
slide-8
SLIDE 8

LINQ expressions

  • Declarative
  • OO

var adjustedScoreTriples =
    from d in scoreTriples
    join r in staticRank on d.docID equals r.key
    select new QueryScoreDocIDTriple(d, r);

var adjustedScoreTriples =
    scoreTriples.Join(staticRank,
                      d => d.docID, r => r.key,
                      (d, r) => new QueryScoreDocIDTriple(d, r));
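The two forms above express the same join. A minimal sketch that runs on plain in-memory LINQ (not on DryadLINQ) can make that concrete. The sample data, the tuple fields (term, score, rank) and the adjusted-score formula are assumptions for illustration, since the slide does not show QueryScoreDocIDTriple.

```csharp
using System;
using System.Linq;

// Hypothetical inputs; tuples stand in for the slide's record types.
var scoreTriples = new[]
{
    (term: "dryad", docID: 1, score: 0.5),
    (term: "dryad", docID: 2, score: 0.9),
    (term: "linq",  docID: 3, score: 0.7),   // no matching rank, dropped by the join
};
var staticRank = new[]
{
    (key: 1, rank: 2.0),
    (key: 2, rank: 1.0),
};

// Declarative query syntax.
var viaQuery =
    from d in scoreTriples
    join r in staticRank on d.docID equals r.key
    select (d.term, d.docID, adjusted: d.score * r.rank);

// Equivalent object-oriented method syntax.
var viaMethods = scoreTriples.Join(
    staticRank,
    d => d.docID,
    r => r.key,
    (d, r) => (d.term, d.docID, adjusted: d.score * r.rank));

foreach (var t in viaQuery)
    Console.WriteLine($"{t.term} doc {t.docID}");

// Both forms yield the same sequence of results.
Console.WriteLine(viaQuery.SequenceEqual(viaMethods));   // True
```

The C# compiler rewrites the query syntax into the method-call form, so the choice between them is a matter of readability.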

slide-9
SLIDE 9

API

  • Compatible with many .NET Languages (e.g. C#)
  • DryadLINQ vs. SPARK

– Language embedded
– Compiler hints
– Functions must have no side effects
– Non-interactive

slide-10
SLIDE 10

Data Model

  • IEnumerable<T> vs. RDDs

– Distributed
– Strongly typed
– Mutable
– Nested generics
– Lazy evaluation
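Lazy evaluation can be seen with an ordinary in-memory IEnumerable<T>. This is a minimal sketch, not DryadLINQ itself. The counter is a deliberate side effect used only to show when the selector runs, which a real DryadLINQ function would have to avoid.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluated = 0;

// Defining the query only describes the work; nothing runs yet.
IEnumerable<int> squares = Enumerable.Range(1, 5).Select(x =>
{
    evaluated++;        // demo-only side effect: counts selector invocations
    return x * x;
});

Console.WriteLine(evaluated);   // 0

// Enumerating the sequence (here through Sum) forces the computation.
int total = squares.Sum();

Console.WriteLine(total);       // 55
Console.WriteLine(evaluated);   // 5
```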

slide-11
SLIDE 11

Execution

  • Similar to SQL query plan
  • Create execution plan graph
  • Static Optimizations
  • Pass to Dryad Job Manager
  • Dynamic Optimizations
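A plan can only be built if the query is available as data rather than as compiled code. The sketch below uses the standard .NET IQueryable and expression-tree types to show a query held as an inspectable tree before it runs. It uses the in-memory provider, and how DryadLINQ turns such a tree into its plan graph is not shown here.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

IQueryable<int> source = new[] { 1, 2, 3, 4 }.AsQueryable();

// Building the query records the operator calls as an expression tree.
IQueryable<int> query = source.Where(x => x % 2 == 0).Select(x => x * 10);

// Walk the tree from the outermost operator inward.
var outer = (MethodCallExpression)query.Expression;
var inner = (MethodCallExpression)outer.Arguments[0];
Console.WriteLine(outer.Method.Name);   // Select
Console.WriteLine(inner.Method.Name);   // Where

// Only now is the query executed (by the in-memory provider).
int[] result = query.ToArray();
Console.WriteLine(string.Join(", ", result));   // 20, 40
```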
slide-12
SLIDE 12

Expression Execution

// Do stuff …
var DT = ToDryadTable(X);
foreach (row in DT) {
    // Do more stuff …
}

slide-13
SLIDE 13

Optimizations

  • Static

– I/O reduction
– Pipelining
– Eager aggregation

  • Dynamic

– Partitioning
– Topology aware aggregation
– Lazy evaluation
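Eager aggregation means reducing data where it lives before moving it. A minimal in-memory sketch, assuming three hypothetical partitions and a mean as the aggregate: each partition is reduced to a (sum, count) pair, and only those pairs are combined.

```csharp
using System;
using System.Linq;

// Hypothetical partitions, standing in for data held on separate machines.
int[][] partitions =
{
    new[] { 3, 1, 4 },
    new[] { 1, 5, 9, 2 },
    new[] { 6, 5 },
};

// Naive plan: gather every record in one place, then aggregate.
int[] all = partitions.SelectMany(p => p).ToArray();
double naiveMean = all.Average();

// Eager plan: reduce each partition locally to a small partial result.
var partials = partitions
    .Select(p => (sum: p.Sum(), count: p.Length))
    .ToArray();

// Combine the partial results; only one pair per partition is moved.
long totalSum = partials.Sum(t => (long)t.sum);
int totalCount = partials.Sum(t => t.count);
double eagerMean = (double)totalSum / totalCount;

Console.WriteLine($"records moved: {all.Length} vs {partials.Length}");
Console.WriteLine(naiveMean == eagerMean);   // True
```

This only works for aggregates that can be split into a local step and a combining step, which is why the mean is carried as a sum and a count rather than as per-partition means.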

slide-14
SLIDE 14

Example: OrderBy
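The figure for this slide did not survive extraction. As an assumption about what a distributed OrderBy involves, the sketch below shows a common sample-based approach on in-memory arrays: sample the data, pick range boundaries, range-partition the records, sort each range independently, and concatenate. The partitions, the sampling rule and the BucketOf helper are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical input partitions.
int[][] partitions =
{
    new[] { 42, 7, 19, 88 },
    new[] { 3, 61, 25 },
    new[] { 70, 14, 55, 9 },
};
int buckets = 3;

// 1. Sample: every second record of each partition (deterministic for the demo).
int[] sample = partitions
    .SelectMany(p => p.Where((_, i) => i % 2 == 0))
    .OrderBy(x => x)
    .ToArray();

// 2. Choose buckets - 1 boundaries at even intervals of the sorted sample.
int[] boundaries = Enumerable.Range(1, buckets - 1)
    .Select(i => sample[i * sample.Length / buckets])
    .ToArray();

// 3. Range-partition: a record's bucket is the number of boundaries it has reached.
int BucketOf(int x) => boundaries.Count(b => x >= b);
var ranges = partitions
    .SelectMany(p => p)
    .GroupBy(x => BucketOf(x))
    .OrderBy(g => g.Key);

// 4. Sort each range on its own and concatenate in bucket order.
int[] sorted = ranges.SelectMany(g => g.OrderBy(x => x)).ToArray();

Console.WriteLine(string.Join(", ", sorted));
```

Because every record in bucket k is smaller than every record in bucket k + 1, the per-bucket sorts can run in parallel and no final merge is needed.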

slide-15
SLIDE 15

Performance

  • TeraSort

  • Skyserver Q18 computation
slide-16
SLIDE 16

TeraSort

~3.87 GB per machine

slide-17
SLIDE 17

Comparison

slide-18
SLIDE 18

Q & A