DryadLINQ
A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
Overview
– Motivation for DryadLINQ
– Design
– Implementation
– Performance
– Q & A
Motivation
– More machines + more code =
– Difficult to express common programming constructs
– Not flexible enough
– Inefficient for some use cases
– Have to specify DAG
– Harder to write
– Execution Engine
– Declarative + Imperative + Object Oriented
var adjustedScoreTriples =
    from d in scoreTriples
    join r in staticRank on d.docID equals r.key
    select new QueryScoreDocIDTriple(d, r);

var adjustedScoreTriples =
    scoreTriples.Join(staticRank,
        d => d.docID, r => r.key,
        (d, r) => new QueryScoreDocIDTriple(d, r));
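The two forms above express the same join. A minimal sketch that runs both against in-memory collections with ordinary LINQ to Objects (not DryadLINQ) is given here. The record shapes, the score adjustment inside the constructor, and the JoinDemo class are assumptions made for illustration; only docID, key, and QueryScoreDocIDTriple come from the slide. Assumes C# 9 or later.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record shapes; only docID and key appear in the slide.
public record ScoreTriple(string query, int docID, double score);
public record StaticRank(int key, double rank);

public class QueryScoreDocIDTriple
{
    public string Query { get; }
    public int DocID { get; }
    public double Score { get; }

    // Assumed adjustment: scale the query score by the static rank.
    public QueryScoreDocIDTriple(ScoreTriple d, StaticRank r)
    {
        Query = d.query;
        DocID = d.docID;
        Score = d.score * r.rank;
    }
}

public static class JoinDemo
{
    // Query-expression form, as on the slide.
    public static List<QueryScoreDocIDTriple> QuerySyntax(
        IEnumerable<ScoreTriple> scoreTriples,
        IEnumerable<StaticRank> staticRank)
    {
        var adjustedScoreTriples =
            from d in scoreTriples
            join r in staticRank on d.docID equals r.key
            select new QueryScoreDocIDTriple(d, r);
        return adjustedScoreTriples.ToList();
    }

    // Method-call form; the compiler translates the query form into this.
    public static List<QueryScoreDocIDTriple> MethodSyntax(
        IEnumerable<ScoreTriple> scoreTriples,
        IEnumerable<StaticRank> staticRank)
    {
        var adjustedScoreTriples =
            scoreTriples.Join(staticRank,
                d => d.docID, r => r.key,
                (d, r) => new QueryScoreDocIDTriple(d, r));
        return adjustedScoreTriples.ToList();
    }
}
```

Both methods return one element per matching docID/key pair; rows without a match on either side are dropped, since Join is an inner join.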
– Language embedded
– Compiler Hints
– Functions must have no side effects
– Non-interactive
– Distributed
– Strongly typed
– Mutable
– Nested generics
– Lazy Evaluation
// Do stuff …
var DT = Table(X);
foreach (var row in DT) {
    // Do more stuff …
}
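The same deferred behaviour can be observed locally with ordinary LINQ to Objects. The sketch below is an analogy, not DryadLINQ itself: LazyDemo, BuildQuery, Run, and the Evaluations counter are hypothetical names, and the counter is a deliberate side effect used only to observe when the selector runs (the slides require side-effect-free functions in real queries).

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Counts selector invocations; for observation only.
    public static int Evaluations;

    // Building the query runs nothing: Select just records the lambda.
    public static IEnumerable<int> BuildQuery(IEnumerable<int> source)
    {
        return source.Select(x => { Evaluations++; return x * x; });
    }

    // Returns the counter before and after enumeration, plus the sum.
    public static (int Before, int After, int Sum) Run()
    {
        Evaluations = 0;
        var DT = BuildQuery(new[] { 1, 2, 3 });
        int before = Evaluations;      // still 0: the query has not executed

        int sum = 0;
        foreach (var row in DT)        // enumeration triggers the work
        {
            sum += row;
        }
        return (before, Evaluations, sum);
    }
}
```

Calling LazyDemo.Run() returns (0, 3, 14): the selector has not run when the query is built, and runs once per element during the foreach.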
– I/O reduction
– Pipelining
– Eager aggregation
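The idea behind eager aggregation can be sketched with local LINQ: each partition is reduced to a small partial result before anything is combined, so only the partials would need to cross the network. This illustrates the general technique under assumed names (EagerAggregationDemo, AverageNaive, AverageEager); it is not DryadLINQ's optimizer, and the nested sequences stand in for partitions on separate machines.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EagerAggregationDemo
{
    // Naive plan: gather every record in one place, then aggregate.
    public static double AverageNaive(IEnumerable<IEnumerable<double>> partitions)
    {
        return partitions.SelectMany(p => p).Average();
    }

    // Eager plan: each partition emits one (Sum, Count) pair,
    // and only those pairs are combined into the final result.
    public static double AverageEager(IEnumerable<IEnumerable<double>> partitions)
    {
        var partials = partitions.Select(p =>
            p.Aggregate((Sum: 0.0, Count: 0L),
                        (acc, x) => (acc.Sum + x, acc.Count + 1)));

        var total = partials.Aggregate((Sum: 0.0, Count: 0L),
            (a, b) => (a.Sum + b.Sum, a.Count + b.Count));

        // With no records at all this is 0.0 / 0, which yields NaN.
        return total.Sum / total.Count;
    }
}
```

For partitions {1, 2, 3}, {4, 5}, {6}, both plans give 3.5, but the eager plan combines three pairs instead of six records. Average cannot be merged from per-partition averages directly, which is why the partial result carries both sum and count.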
– Partitioning
– Lazy evaluation
TeraSort
– ~3.87 GB per machine