Supercomputing as a Service: Massively-Parallel Jobs on FaaS Platforms
Sadjad Fouladi Stanford University
https://xkcd.com/303/
Compiling clang takes >2 hours.
EDITOR: "MY VIDEO'S ENCODING!"
ENCODING!
Compressing a 15-minute 4K video takes ~7.5 hours.
ANIMATOR: "MY ANIMATION'S RENDERING!"
RENDERING!
Rendering each frame of Monsters University took 29 hours.
Many of these pipelines take hours and hours to finish.
The Problem
Can we achieve interactive speeds in these applications?
The Question
The Answer * well, probably.
How to get thousands of threads?
support such levels of parallelism.
thousands of parallel threads on demand in an efficient and scalable manner.
Classic Approach: VMs
👏 Minute-scale startup time (OS has to boot up, ...)
👏 High minimum cost
Cloud function services have (as yet) unrealized power
Functions, etc.
✔ Thousands of threads
✔ Arbitrary Linux executables
✔ Sub-second startup
✔ Sub-second billing
3,600 threads for one second → 10¢
Supercomputing as a Service
Encoding
Compressing this video will take a long [...] job?
Cancel | Remotely (~5 secs, 50¢) | Locally (~5 hours)
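The two price points above are consistent with simple thread-second arithmetic. The sketch below assumes that cost scales linearly with thread-seconds at the quoted rate (3,600 threads for one second at about 10¢) and that the ~5-second remote job would use the same 3,600 threads; the slides state neither assumption, and the function names are illustrative only.

```python
# Price point quoted on the slide: 3,600 threads for one second cost about 10 cents.
REFERENCE_THREAD_SECONDS = 3600 * 1
REFERENCE_COST_CENTS = 10


def job_cost_cents(threads: int, seconds: float) -> float:
    """Estimate cost, assuming price is linear in thread-seconds (an assumption)."""
    return threads * seconds * REFERENCE_COST_CENTS / REFERENCE_THREAD_SECONDS


def ideal_parallel_seconds(serial_seconds: float, threads: int) -> float:
    """Best-case wall time if the work split perfectly across threads.

    Ignores startup, coordination, and any serial portion of the job.
    """
    return serial_seconds / threads


serial_job = 5 * 3600  # the ~5 hour local job, in seconds
print(ideal_parallel_seconds(serial_job, 3600))  # 5.0
print(job_cost_cents(3600, 5))                   # 50.0
```

Under these assumptions, five hours of single-threaded work is 18,000 thread-seconds, which is the same amount of compute as 3,600 threads running for five seconds.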
Two projects that we did based on this promise:
ExCamera: Low-Latency Video Processing Using Thousands of Tiny Threads
Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. "Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads." In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDIʼ17).
What we currently have
What we would like to have
"Apply this awesome filter to my video."
"Look everywhere for this face in this movie."
"Remake Star Wars Episode I without Jar Jar."
Challenges in low-latency video processing
parallel, with instant startup.
efficiency.
First challenge: thousands of threads
general-purpose parallel computations on a commercial “cloud function” service.
seconds and manages inter-thread communication.
excamera/mu
[Diagram: λ workers, rendezvous server, local machine]
Second challenge: parallelism hurts compression efficiency
massive parallelism.
fine-grained parallelism.
decode(state, frame) → (state′, image)
encode(state, image) → interframe
rebase(state, image, interframe) → interframe′
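The shape of this explicit-state interface can be illustrated with a toy codec. The sketch below is not ExCamera's video codec: it assumes the state is simply the last decoded image and an interframe is a list of per-pixel differences. It also assumes that rebase means "re-express an interframe so that it applies to a different starting state", which the slide does not spell out.

```python
from typing import List, Tuple

State = List[int]       # toy decoder state: the last decoded image
Image = List[int]       # toy image: a flat list of pixel values
Interframe = List[int]  # toy interframe: per-pixel differences from the state


def decode(state: State, frame: Interframe) -> Tuple[State, Image]:
    """Apply the differences to the state; the result is both image and new state."""
    image = [s + d for s, d in zip(state, frame)]
    return list(image), image


def encode(state: State, image: Image) -> Interframe:
    """Describe `image` as differences from an explicitly supplied state."""
    return [p - s for s, p in zip(state, image)]


def rebase(state: State, image: Image, interframe: Interframe) -> Interframe:
    """Produce an interframe for the same image, valid against a different state.

    A real codec would presumably reuse decisions stored in `interframe`;
    this toy format stores none, so it simply re-encodes.
    """
    return encode(state, image)


# A worker encodes against a guessed state, then rebases once the true state is known.
guessed_state = [0, 0, 0]
true_state = [10, 20, 30]
image = [11, 19, 30]

draft = encode(guessed_state, image)
fixed = rebase(true_state, image, draft)
_, decoded = decode(true_state, fixed)
print(decoded == image)  # True
```

Because the state is passed in and returned explicitly, a worker can start encoding before the preceding worker has finished and correct its output afterwards.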
ExCamera
“cloud function” service.
gg: make -j1000 (and other jobs) on function-as-a-service infrastructure
Sadjad Fouladi, Dan Iter, Shuvo Chatterjee, Christos Kozyrakis, Matei Zaharia, Keith Winstein
What is gg?
interdependent software workflows across thousands of short-lived “lambdas”.
[Figure: dependency graph for building hello (stripped). Nodes: hello.c, hello.i, hello.s, hello.o; dirname.c, dirname.i, dirname.s, dirname.o; closeout.c, closeout.i, closeout.s, closeout.o; string.h, stdio.h; libhello.a; libc; hello]
"Thunk" abstraction
{
  "function": {
    "exe": "g++",
    "args": ["-S", "dirname.i", "-o",...],
    "hash": "A5BNh"
  },
  "infiles": [
    { "name": "dirname.i", "order": 1, "hash": "SoYcD" },
    { "name": "g++", "order": 0, "hash": "A5BNh" }
  ],
  "outfile": "dirname.s"
}
representing a morsel of computation in terms of a function and its complete functional footprint.
the local machine, or on a remote VM, or inside a lambda function.
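A minimal sketch of how such a thunk description could be assembled follows. It mirrors the field names in the JSON on the slide, but everything else is an assumption: SHA-256 stands in for whatever hash gg actually uses, the meaning of "order" is guessed (the executable is simply placed first), and make_thunk, thunk_id, and the file names are hypothetical, not part of gg.

```python
import hashlib
import json
from typing import Dict, List


def content_hash(data: bytes) -> str:
    """Name a blob by its contents (SHA-256 is an assumption, not gg's scheme)."""
    return hashlib.sha256(data).hexdigest()


def make_thunk(exe: str, args: List[str], blobs: Dict[str, bytes], outfile: str) -> dict:
    """Describe one step of computation by its function and everything it reads.

    `blobs` maps every file the step needs, including the executable itself,
    to that file's contents.
    """
    names = [exe] + sorted(name for name in blobs if name != exe)
    infiles = [
        {"name": name, "order": index, "hash": content_hash(blobs[name])}
        for index, name in enumerate(names)
    ]
    return {
        "function": {"exe": exe, "args": args, "hash": content_hash(blobs[exe])},
        "infiles": infiles,
        "outfile": outfile,
    }


def thunk_id(thunk: dict) -> str:
    """Hash the canonical JSON form, so identical work gets an identical name."""
    return content_hash(json.dumps(thunk, sort_keys=True).encode())


# Hypothetical inputs, for illustration only.
blobs = {"tool": b"<executable bytes>", "input.txt": b"some input"}
thunk = make_thunk("tool", ["--in", "input.txt", "--out", "output.txt"], blobs, "output.txt")
print(json.dumps(thunk, indent=2))
```

Since every input is named by its content, the description carries no dependence on the machine that created it, which is what allows the same step to run in any of the places listed above.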
{
  "function": {
    "exe": "g++",
    "args": ["-S", "dirname.i", "-o",...],
    "hash": "AsBNh"
  },
  "infiles": [
    { "name": "dirname.i", "order": 1, "hash": "SoYcD" },
    { "name": "g++", "order": 0, "hash": "ts0sB" }
  ],
  "outfile": "dirname.s"
}
Execution
gg-infer make
gg-force --jobs 1000 bin/clang
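Running a dependency graph with a bounded number of parallel jobs can be sketched as below. This is a generic illustration using a local thread pool, not gg's actual scheduler; run_graph, the example targets, and the edges between them are all hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Dict, List


def run_graph(deps: Dict[str, List[str]], run: Callable[[str], None], jobs: int) -> List[str]:
    """Run each target once its dependencies are done, at most `jobs` at a time.

    Dependencies that are not themselves targets (for example source files)
    are treated as already available. Returns targets in completion order.
    """
    remaining = {t: {d for d in ds if d in deps} for t, ds in deps.items()}
    running = {}
    done: List[str] = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        while remaining or running:
            # Submit every target whose dependencies have all completed.
            for target in [t for t, ds in remaining.items() if not ds]:
                del remaining[target]
                running[pool.submit(run, target)] = target
            if not running:
                raise ValueError("dependency cycle detected")
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                target = running.pop(future)
                future.result()  # re-raise any error from the job
                done.append(target)
                for ds in remaining.values():
                    ds.discard(target)
    return done


# Hypothetical build graph; the edges are a guess for illustration only.
example = {
    "hello.o": ["hello.c"],
    "dirname.o": ["dirname.c"],
    "closeout.o": ["closeout.c"],
    "libhello.a": ["dirname.o", "closeout.o"],
    "hello": ["hello.o", "libhello.a"],
}
print(run_graph(example, lambda target: None, jobs=1000))
```

The three object files have no dependencies on each other, so they can run at the same time; the archive and final link have to wait for them.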
Compiling FFmpeg using gg
[Figure: per-worker timeline of the FFmpeg build. Axes: time (s) and worker #. Legend: fetching the dependencies, executing the thunk, uploading the results. Annotated phases: preprocess, compile and assemble; archive, link and strip; job completed. A second panel zooms in on workers 5080 to 5115.]
Evaluation
            single-core    gg (λ)
ffmpeg      9m 45s         35s
inkscape    33m 35s        1m 15s
llvm        1h 16m 18s     1m 11s
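The speedups implied by these build times can be worked out directly. The sketch below only restates the figures above as ratios; the helper name to_seconds is illustrative.

```python
import re

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(text: str) -> int:
    """Convert a duration such as '1h 16m 18s' into seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)\s*([hms])", text))


# (single-core time, gg time) as reported in the evaluation table.
BUILD_TIMES = {
    "ffmpeg": ("9m 45s", "35s"),
    "inkscape": ("33m 35s", "1m 15s"),
    "llvm": ("1h 16m 18s", "1m 11s"),
}

for name, (single_core, gg) in BUILD_TIMES.items():
    speedup = to_seconds(single_core) / to_seconds(gg)
    print(f"{name}: {speedup:.1f}x")
# ffmpeg: 16.7x
# inkscape: 26.9x
# llvm: 64.5x
```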
gg is open-source software
https://github.com/StanfordSNR/gg
Takeaways
jobs.
JUST USE GG!
https://github.com/StanfordSNR/gg