Scripting the cloud with Skywriting Derek G. Murray Steven Hand - - PowerPoint PPT Presentation

slide-1
SLIDE 1

Scripting the cloud with Skywriting

Derek G. Murray Steven Hand

University of Cambridge

slide-2
SLIDE 2

A universal model?

MapReduce

slide-3
SLIDE 3

A universal model?

MapReduce

slide-4
SLIDE 4

A universal model!

slide-5
SLIDE 5

Move computation to the data

[Diagram: the driver program calls submitJob(); the code is sent to the data and the results are returned.]

slide-6
SLIDE 6

while (!converged) do work in parallel;

slide-7
SLIDE 7

Iterative algorithm

[Diagram: a driver program running while (…) submitJob(); submits a new job on every iteration, each one sending code out and receiving results back.]

slide-8
SLIDE 8

Iterative algorithm

[Diagram: the driver program running while (…) submitJob(); sends code out and receives results on every iteration.]

slide-9
SLIDE 9

Skywriting

[Diagram: the loop while (…) doStuff(); is itself part of the code sent to the cluster, which returns the results.]

slide-10
SLIDE 10

Skywriting

  • JavaScript-like job specification language

– Supports functional programming
– Data-dependent control flow

  • Distributed execution engine

– Locality-based scheduling
– Fault tolerance
– Thread migration
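Locality-based scheduling means running a task on a worker that already stores its inputs, so that computation moves to the data. The sketch below shows one simple heuristic and is not Skywriting's actual scheduler: the names `pick_worker`, `locations` and `sizes` are hypothetical, and the policy of choosing the worker that holds the most input bytes is an assumption.

```python
from collections import defaultdict


def pick_worker(task_inputs, locations, sizes, workers):
    """Hypothetical locality heuristic, not Skywriting's real scheduler.

    task_inputs: names of the objects the task reads
    locations:   object name -> set of workers holding a replica
    sizes:       object name -> size in bytes
    workers:     candidate workers, in fallback order
    """
    # Sum, per worker, the bytes of this task's inputs it already stores.
    local_bytes = defaultdict(int)
    for obj in task_inputs:
        for worker in locations.get(obj, ()):
            local_bytes[worker] += sizes.get(obj, 0)
    if not local_bytes:
        # No input is stored on any known worker: fall back to the first one.
        return workers[0]
    # The worker holding the most input bytes fetches the least over the network.
    return max(workers, key=lambda w: local_bytes[w])


locations = {"part-0": {"worker-a"}, "part-1": {"worker-b", "worker-c"}}
sizes = {"part-0": 64, "part-1": 256}
print(pick_worker(["part-0", "part-1"], locations, sizes,
                  ["worker-a", "worker-b", "worker-c"]))  # worker-b
```

Ties go to the earliest worker in `workers`, because `max` returns the first maximal element.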

slide-11
SLIDE 11

Spawning a task

function f(x) {
  return x + 1;
}
res1 = spawn(f, [42]);
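As a rough single-machine analogue in Python (the language the implementation is written in), spawn behaves like submitting a function to an executor and getting a future back immediately. This sketch uses the standard `concurrent.futures` module; the `spawn` wrapper and the local thread pool are hypothetical stand-ins for Skywriting's distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a cluster of workers.
pool = ThreadPoolExecutor(max_workers=4)


def spawn(fn, args):
    # Hypothetical analogue of Skywriting's spawn: start fn(*args)
    # asynchronously and return a future for its result straight away.
    return pool.submit(fn, *args)


def f(x):
    return x + 1


res1 = spawn(f, [42])
print(res1.result())  # 43
```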

slide-12
SLIDE 12

Task dependencies

function f(x) {
  return x + 1;
}
function g(y) { … }
res1 = spawn(f, [42]);
res2 = spawn(g, [res1]);

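res1 and res2 are future references: placeholders for task results that may not have been computed yet. Because res1 is passed as an argument to g, the second task depends on the first.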
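The dependency can be mimicked with a hypothetical Python spawn that, inside the new task, waits for any future passed as an argument before calling the function. The body of g is elided on the slide; the one used here is invented purely for illustration.

```python
from concurrent.futures import Future, ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=8)


def spawn(fn, args):
    """Hypothetical spawn: run fn(*args) asynchronously, returning a Future.

    Arguments that are themselves Futures are resolved inside the new task,
    so the task cannot compute until its inputs exist.
    """
    def run():
        values = [a.result() if isinstance(a, Future) else a for a in args]
        return fn(*values)
    return pool.submit(run)


def f(x):
    return x + 1


def g(y):
    # Invented body standing in for the elided g(y) { … }.
    return y * 2


res1 = spawn(f, [42])
res2 = spawn(g, [res1])  # depends on res1
print(res2.result())     # 86
```

Waiting inside a worker thread ties that thread up while the input is computed; that is acceptable for a sketch, but a real dataflow engine would rather hold the task back until its inputs are ready.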

slide-13
SLIDE 13

Logistic regression

points = […];  // List of partitions
w = …;         // Random initial value
for (i in range(0, ITERATIONS)) {
  w_old = w;
  results = [];
  for (part in points) {
    results += spawn(log_reg, [part, w_old]);
  }
  w = spawn(update, [w_old, results]);
}
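The script above can be approximated in Python to show how futures flow from one iteration to the next. Everything here is an assumption: spawn is a hypothetical thread-pool helper that also resolves futures nested in lists, log_reg computes a logistic-loss gradient over one partition, update takes a gradient-descent step with an invented learning rate, and the toy data set stands in for the elided points.

```python
import math
from concurrent.futures import Future, ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=8)


def _resolve(v):
    # Wait for Futures, including Futures nested inside a list.
    if isinstance(v, Future):
        return v.result()
    if isinstance(v, list):
        return [_resolve(x) for x in v]
    return v


def spawn(fn, args):
    # Hypothetical spawn: arguments are resolved inside the new task.
    return pool.submit(lambda: fn(*[_resolve(a) for a in args]))


def log_reg(part, w):
    # Gradient of the logistic loss over one partition of (features, label) pairs.
    grad = [0.0] * len(w)
    for x, y in part:
        p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    return grad


def update(w_old, results, lr=0.5):
    # Gradient-descent step using the summed per-partition gradients.
    return [wj - lr * sum(g[j] for g in results) for j, wj in enumerate(w_old)]


ITERATIONS = 50
points = [  # toy data: first feature is a constant bias term
    [([1.0, 2.0], 1), ([1.0, -1.0], 0)],
    [([1.0, 1.5], 1), ([1.0, -2.0], 0)],
]
w = [0.0, 0.0]  # fixed start instead of a random one, for repeatability
for i in range(ITERATIONS):
    w_old = w
    results = [spawn(log_reg, [part, w_old]) for part in points]
    w = spawn(update, [w_old, results])
w = w.result()  # force the final weights once, after the loop
```

The loop itself never waits: it only builds a chain of futures, and the value is forced once at the end. Because the pool runs tasks in submission order and every task depends only on earlier ones, the blocking waits cannot deadlock.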

slide-14
SLIDE 14

Logistic regression

points = […];  // List of partitions
w = …;         // Random initial value
do {
  w_old = w;
  results = [];
  for (part in points) {
    results += spawn(log_reg, [part, w_old]);
  }
  w = spawn(update, [w_old, results]);
  done = spawn(converged, [w_old, w]);
} while (!*done);

slide-15
SLIDE 15

Logistic regression

The * operator dereferences (forces) a future, waiting until its value is available.
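With a data-dependent loop the script must force done on every iteration, because it cannot decide whether to continue until the convergence test has run. This Python sketch shows that control structure: `.result()` plays the role of *, and since Python has no do-while loop it uses `while True` with `break`. To keep it short, the logistic-regression tasks are replaced by a hypothetical one-dimensional update (a Newton step towards sqrt(2)); converged and its tolerance are likewise invented.

```python
from concurrent.futures import Future, ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)


def spawn(fn, args):
    # Hypothetical spawn: resolve Future arguments inside the task, then call fn.
    def run():
        return fn(*[a.result() if isinstance(a, Future) else a for a in args])
    return pool.submit(run)


def update(w):
    # Stand-in for the real update task: one Newton step towards sqrt(2).
    return 0.5 * (w + 2.0 / w)


def converged(w_old, w, tol=1e-12):
    return abs(w - w_old) < tol


w = 1.0
iterations = 0
while True:                 # do { ... } while (!*done);
    w_old = w
    w = spawn(update, [w_old])
    done = spawn(converged, [w_old, w])
    iterations += 1
    if done.result():       # *done: wait until the test has actually run
        break

print(w.result())
```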

slide-16
SLIDE 16

Implementation status

  • Implemented in 4000 lines of Python

– Also: Java, C and .NET bindings

  • Many additional features

– Native code execution
– Introspection
– Conditional synchronisation

  • Available as open-source

– http://github.com/mrry/skywriting

slide-17
SLIDE 17

Job creation overhead

[Plot: job creation overhead (seconds) against number of workers, for Hadoop and Skywriting.]

slide-18
SLIDE 18

Future directions

  • Multiple-scale parallel computing

– Multiple cores, machines and clouds

  • Streaming computations

– Piping high-bandwidth data between tasks

  • Better language integration

– Hosted Skywriting on CLR or JVM
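Streaming means a downstream task consumes records while the upstream task is still producing them, instead of waiting for a complete output. A minimal single-machine sketch, assuming a bounded in-memory queue stands in for the pipe between two tasks; nothing here reflects how Skywriting implements or plans to implement streaming.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the stream


def producer(out_q, n):
    # Upstream task: emits records one at a time rather than building a list.
    for i in range(n):
        out_q.put(i)  # blocks while the pipe is full (back-pressure)
    out_q.put(_END)


def consumer(in_q, result):
    # Downstream task: starts working as soon as the first record arrives.
    total = 0
    while (item := in_q.get()) is not _END:
        total += item * item
    result.append(total)


pipe = queue.Queue(maxsize=16)  # bounded buffer between the two tasks
result = []
threads = [
    threading.Thread(target=producer, args=(pipe, 1000)),
    threading.Thread(target=consumer, args=(pipe, result)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result[0])  # 332833500
```

The bounded queue keeps memory use constant however much data flows through, because a fast producer is slowed to the consumer's pace.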

slide-19
SLIDE 19

Conclusions

  • Turing-complete programming language for distributed computation

  • Runs real jobs with low overhead
  • Lots more still to do!
slide-20
SLIDE 20

Questions?

  • Email

– Derek.Murray@cl.cam.ac.uk

  • Project website

– http://www.cl.cam.ac.uk/netos/skywriting/