Supercomputing as a Service: Massively-Parallel Jobs on FaaS - PowerPoint PPT Presentation

Supercomputing as a Service: Massively-Parallel Jobs on FaaS Platforms Sadjad Fouladi Stanford University

Compiling clang takes >2 hours. https://xkcd.com/303/

EDITOR: "MY VIDEO'S ENCODING!" Compressing a 15-minute 4K video takes ~7.5 hours.

ANIMATOR: "MY ANIMATION'S RENDERING!" Rendering each frame of Monsters University took 29 hours.

The Problem Many of these pipelines take hours and hours to finish.

The Question Can we achieve interactive speeds in these applications?

The Answer Massive Parallelism* * well, probably.

How to get thousands of threads?
• The largest companies operate massive datacenters that can support this level of parallelism.
• End users and developers, however, cannot efficiently scale their resource footprint to thousands of parallel threads on demand.

Classic Approach: VMs
• Infrastructure-as-a-Service
• Thousands of threads
• Arbitrary Linux executables
✘ Minute-scale startup time (OS has to boot up, ...)
✘ High minimum cost

Cloud function services have (as yet) unrealized power
• AWS Lambda, Google Cloud Functions, IBM Cloud Functions, Azure Functions, etc.
• Intended for event handlers and Web microservices, but...
• Features:
  ✔ Thousands of threads
  ✔ Arbitrary Linux executables
  ✔ Sub-second startup
  ✔ Sub-second billing
3,600 threads for one second → 10¢
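
The pricing figure on this slide can be turned into a small worked calculation. The sketch below assumes cost scales linearly with thread-seconds and takes the slide's "3,600 threads for one second → 10¢" as the only data point. The function name `cost_cents` and the linear model are illustrative assumptions, not a provider's actual price list.

```python
from fractions import Fraction

# Rate implied by the slide: 3,600 thread-seconds cost 10 cents.
# Assumption: billing is linear in thread-seconds.
CENTS_PER_THREAD_SECOND = Fraction(10, 3600)


def cost_cents(threads: int, seconds: int) -> Fraction:
    """Estimated cost in cents for `threads` running `seconds` each."""
    return threads * seconds * CENTS_PER_THREAD_SECOND


print(cost_cents(3600, 1))  # 10
print(cost_cents(3600, 5))  # 50
```

Under this assumed rate, 3,600 threads running for 5 seconds come to 50¢.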

Supercomputing as a Service
[Dialog mock-up] Encoding: Compressing this video will take a long time. How do you want to execute this job?
Locally (~5 hours) | Remotely (~5 secs, 50¢) | Cancel

Two projects that we did based on this promise:
• ExCamera: Low-Latency Video Processing
• gg: make -j1000 (and other jobs) on FaaS infrastructure

ExCamera: Low-Latency Video Processing Using Thousands of Tiny Threads
Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. "Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads." In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI '17).

What we currently have
• People can make changes to a word-processing document
• The changes are instantly visible to the others

What we would like to have for video?
• People can interactively edit and transform a video
• The changes are instantly visible to the others

"Apply this awesome filter to my video."

"Look everywhere for this face in this movie."

"Remake Star Wars Episode I without Jar Jar."

Challenges in low-latency video processing
• Low-latency video processing would need thousands of threads, running in parallel, with instant startup.
• However, the finer-grained the parallelism, the worse the compression efficiency.

First challenge: thousands of threads
• We built mu, a library for designing and deploying general-purpose parallel computations on a commercial “cloud function” service.
• The system starts up thousands of threads in seconds and manages inter-thread communication.
• mu is open-source software: https://github.com/excamera/mu
[Figure labels: local machine, rendezvous server, λ workers]

Second challenge: parallelism hurts compression efficiency
• Existing video codecs only expose a simple interface that's not suitable for massive parallelism.
• We built a video codec in explicit state-passing style, intended for massive fine-grained parallelism.
• Implemented in 11,500 lines of C++11 for Google's VP8 format.
  decode(state, frame) → (state′, image)
  encode(state, image) → interframe
  rebase(state, image, interframe) → interframe′
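
The three signatures above can be illustrated with a toy model. This is a minimal sketch, not the ExCamera codec: it assumes an "image" is a flat list of integers, the decoder "state" is simply the last decoded image, and an "interframe" is a per-pixel difference from that state. The toy `rebase` just recomputes the difference against the new state and accepts the old interframe only to mirror the slide's signature.

```python
from typing import List, Tuple

Image = List[int]        # toy "image": a flat list of pixel values
State = List[int]        # toy decoder state: the last decoded image
Interframe = List[int]   # toy interframe: per-pixel difference from the state


def decode(state: State, frame: Interframe) -> Tuple[State, Image]:
    """Apply an interframe to a state; return the new state and the image."""
    image = [s + d for s, d in zip(state, frame)]
    return list(image), image


def encode(state: State, image: Image) -> Interframe:
    """Describe an image as a difference from the given state."""
    return [p - s for p, s in zip(image, state)]


def rebase(state: State, image: Image, interframe: Interframe) -> Interframe:
    """Re-express an interframe so that it applies to a different state.

    Toy simplification: the old interframe is ignored and the difference
    is recomputed against the new state.
    """
    return encode(state, image)


frames = [[1, 2, 3], [2, 3, 4], [2, 3, 5]]
blank = [0, 0, 0]

# A worker encodes frame 2 in isolation, against a placeholder state.
guess = encode(blank, frames[2])
# Once the true preceding state is known, the interframe is rebased onto it.
rebased = rebase(frames[1], frames[2], guess)
new_state, image = decode(frames[1], rebased)
assert image == frames[2]
```

Because every function takes its state as an explicit argument and returns a new one, independent workers can encode different parts of a video in parallel and have their results stitched together afterwards.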

14.8-minute 4K Video @20dB
vpxenc Single-Threaded   453 mins
vpxenc Multi-Threaded    149 mins
YouTube (H.264)           37 mins
ExCamera                 2.6 mins

ExCamera
• Two major contributions:
  • Framework to run 5,000-way parallel jobs with IPC on a commercial “cloud function” service.
  • Purely functional video codec for massive fine-grained parallelism.
• 56× faster than existing encoder, for <$6.

gg : make -j1000 (and other jobs) on function-as-a-service infrastructure Sadjad Fouladi, Dan Iter, Shuvo Chatterjee, Christos Kozyrakis, Matei Zaharia, Keith Winstein

What is gg?
• gg is a system for executing interdependent software workflows across thousands of short-lived “lambdas”.
[Figure: dependency graph for building hello. Nodes: dirname.c, string.h, closeout.c, stdio.h, hello.c; dirname.i, closeout.i, hello.i; dirname.s, closeout.s, hello.s; dirname.o, closeout.o, hello.o; libc, libhello.a; hello; hello (stripped)]

" Thunk " abstraction dirname.c string.h closeout.c stdio.h hello.c { "function": { "exe": "g++", dirname.i closeout.i hello.i "args": ["-S", "dirname.i", "-o",...], dirname.s closeout.s hello.s "hash": "A5BNh" }, "infiles": [ { "name": "dirname.i", dirname.o closeout.o "order": 1, "hash": "SoYcD" }, libc libhello.a hello.o { "name": "g++", "order": 0, hello "hash": "A5BNh" } ], hello (stripped) "outfile": "dirname.s" } 26

" Thunk " abstraction • Thunk is an abstraction for { "function": { "exe": "g++", "args": ["-S", "dirname.i", representing a morsel of computation "-o",...], in terms of a function and its "hash": "AsBNh" }, "infiles": [ complete functional footprint . { "name": "dirname.i", "order": 1, "hash": "SoYcD" • Thunks can be forced anywhere , on }, { the local machine, or on a remote "name": "g++", "order": 0, VM, or inside a lambda function. "hash": "ts0sB" } ], "outfile": "dirname.s" } 27

Execution
• Generating the dependency graph in terms of thunks:
  gg-infer make
• Forcing the thunk, recursively:
  gg-force --jobs 1000 bin/clang
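
Recursive forcing can be sketched as a depth-first walk over the dependency graph. This is a simplified, sequential illustration and not how gg-force is implemented. The edges in `DEPS` are assumed for one branch of the hello example, since the slides show the graph only as a figure, and appending to `log` stands in for actually executing a thunk.

```python
from typing import Dict, List, Set

# Hypothetical edges for one branch of the hello example; the slides show
# the graph only as a figure, so these dependencies are assumed.
DEPS: Dict[str, List[str]] = {
    "hello.i": ["hello.c", "stdio.h"],
    "hello.s": ["hello.i"],
    "hello.o": ["hello.s"],
    "hello": ["hello.o"],
}


def force(target: str, deps: Dict[str, List[str]],
          done: Set[str], log: List[str]) -> None:
    """Force `target` after recursively forcing everything it depends on."""
    if target in done:
        return
    for dep in deps.get(target, []):
        force(dep, deps, done, log)
    if target in deps:
        log.append(target)  # stand-in for actually executing the thunk
    done.add(target)


log: List[str] = []
force("hello", DEPS, set(), log)
print(log)  # ['hello.i', 'hello.s', 'hello.o', 'hello']
```

Source files such as hello.c have no entry in `DEPS`, so they are treated as already available and nothing is executed for them. A parallel version would dispatch every thunk whose dependencies are done at the same time, which is what an option like `--jobs 1000` suggests.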

Compiling FFmpeg using gg
[Figure: per-worker timeline. x-axis: worker #; y-axis: time (s). Phases per worker: fetching the dependencies, executing the thunk, uploading the results. Annotated stages: preprocess, compile and assemble; archive, link and strip; job completed]

Evaluation
            single-core    gg (λ)
ffmpeg      9m 45s         35s
inkscape    33m 35s        1m 15s
llvm        1h 16m 18s     1m 11s

gg is open-source software https://github.com/StanfordSNR/gg

Takeaways
• The future is granular, interactive and massively parallel.
• Many applications can benefit from this "Laptop Extension" model.
• Better platforms need to be built to support "bursty" massively-parallel jobs.

JUST USE GG! https://github.com/StanfordSNR/gg
