SEUSS: Skip Redundant Paths to Make Serverless Fast

James Cadden, Thomas Unger, Yara Awad, Han Dong, Orran Krieger, Jonathan Appavoo
Department of Computer Science, Boston University
The Proceedings of EuroSys, 2020
April 29th, 2020


SLIDE 1

SEUSS: Skip Redundant Paths to Make Serverless Fast

James Cadden, Thomas Unger, Yara Awad, Han Dong, Orran Krieger, Jonathan Appavoo

Department of Computer Science, Boston University

The Proceedings of EuroSys, 2020
April 29th, 2020

SLIDE 2

Serverless Computing

  • 1. Function-as-a-Service (FaaS): on-demand execution of client code snippets (functions)
  • 2. Applications are deployed and scaled automatically
  • 3. Function start time is dominated by deterministic setup & install paths

[Figure: a platform API dispatches functions f1, f2, f3, … fn (.js, .py, .rb) from an application database into isolated execution environments (E1) on compute resources. Steps shown: 1. event, 2. pull app, 3. setup & install, 4. Run!]

SLIDE 3

Serverless Computing

  • 4. Functions deploy quickly using a pre-initialized environment!

[Figure: the same platform diagram (application database, platform API, isolated execution environments, compute resources). With a pre-initialized environment E1, the steps shown are: 1. event, 2. Run!]

SLIDE 4

Serverless Computing

  • 5. FaaS platform utility becomes a matter of cache efficiency!
  • 6. The system-level mechanism defines the security, cache density, and responsiveness

[Figure: a FaaS API dispatches functions f1, f2, f3, … fn (.js, .py, .rb) from a function database into an execution cache of isolated execution environments (P1) on compute resources.]

SLIDE 5

FaaS Environment Caching

Environment              Machine Density   Start Time
Linux Process            4200              0.3 s
Docker Container         3000              0.5 to 4 s
Container in a MicroVM   450               1 to 7 s

Linux v4.15; Docker v18.09; [Xeon 2.20 GHz; 88GB]

Node.js “launcher” provides a REST API to import and run an arbitrary JavaScript function

[Figure: process (P), container (C), and VM placed on two scales, from less overhead to more overhead and from less secure to more secure. Cache primitive: P holding f1 (.js).]
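The density column can be read as a rough memory budget per cached instance. The short calculation below is only a back-of-the-envelope sketch: it assumes the whole 88GB of the test machine is divided evenly among instances and that 1 GB = 1024 MB, neither of which the slide states, so the results are upper bounds rather than measured footprints.

```python
# Assumption: all 88GB is available to cached instances, with 1 GB = 1024 MB.
TOTAL_MB = 88 * 1024

# Machine density figures taken from the table on this slide.
density = {
    "Linux Process": 4200,
    "Docker Container": 3000,
    "Container in a MicroVM": 450,
}

# Upper bound on the average memory available per cached instance, in MB.
per_instance_mb = {
    name: round(TOTAL_MB / count, 1) for name, count in density.items()
}

for name, mb in per_instance_mb.items():
    print(f"{name}: at most {mb} MB per instance")
```

Under these assumptions the budget per instance grows by roughly an order of magnitude from a bare process to a container inside a MicroVM, which is the density cost of the stronger isolation.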

SLIDE 6

Serverless Execution via Unikernel SnapShots

Is there… a method to better enable reuse across the entire memory footprint of the function?

We want to…

  • 1. Shorten the setup time of new function invocations
  • 2. Improve cache density for fast repeat invocations

Cadden, et al. “Skip Redundant Paths to Make Serverless Fast”, In The Proceedings of EuroSys ‘20

SLIDE 7

Serverless Execution via Unikernel SnapShots

  • 1. Unikernels support strong isolation semantics
  • 2. Enable “black box” capture of an environment’s memory footprint into a snapshot (object)
  • 3. Page-level sharing can be applied ubiquitously across the application and kernel layers

What is a unikernel?

[Figure: unikernels F1, F2, F3, F4 running on a lightweight kernel. Labels inside a unikernel: Function #1 + Language Runtime, shared libs, filesystem, TCP/IP, scheduler, single address space, hardware-level interface, VMM.]

In SEUSS, functions are deployed inside of dedicated unikernels

SLIDE 8

Environment Snapshots

Function invocation times are dominated by deterministic import & initialization procedures

Snapshots captured at strategic points in time can be used as templates for deploying execution

[Figure: SEUSS operating system with a snapshot cache (runtime snapshots, function snapshots, unikernel binaries), an IO core, and cores 1 to 4 serving requests and replies. A timeline compares cold, warm, and hot starts; stage labels: construct environment, initialize runtime, import run arguments, start run, import code, generate bytecode, initialization on boot. Values shown: 7.67 ms, 2.95 ms, 0.82 ms; 12 MB; 9 MB, 4 MB, 211 MB; 7.67 MB; 115 MB. This slide highlights the cold start, with labels P, f1 (.js), functions code & libraries, {`foo`}, and markers A and B.]

SLIDE 9

Environment Snapshots

Immutable snapshot images act as a reusable launch point for new function invocations

Warm start: runtime snapshot used for new invocations

[Figure: SEUSS operating system, snapshot cache, and cold/warm/hot start timeline, repeated from slide 8. This slide highlights the warm start, with labels P, f1 (.js), functions code & libraries, and {`foo`}.]

SLIDE 10

Environment Snapshots

Function-specific snapshots provide near-immediate execution of function bytecode

Hot start: function snapshot used for repeat invocations

[Figure: SEUSS operating system, snapshot cache, and cold/warm/hot start timeline, repeated from slide 8. This slide highlights the hot start, with labels P, f1 (.js), functions code & libraries, and {`foo`}.]
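The three start paths on slides 8 to 10 can be sketched as a lookup against the snapshot cache. This is a toy model, not SEUSS code: the class name, the method name, and the policy of capturing a function snapshot right after the first cold or warm run are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotCache:
    """Toy snapshot cache; every name here is hypothetical."""

    runtime_snapshots: set = field(default_factory=set)   # e.g. {"nodejs"}
    function_snapshots: set = field(default_factory=set)  # e.g. {"f1"}

    def start(self, function: str, runtime: str) -> str:
        """Return the start path for this invocation and update the cache."""
        if function in self.function_snapshots:
            # Function snapshot exists: code is already imported.
            return "hot"
        if runtime in self.runtime_snapshots:
            # Runtime snapshot exists: import the code, then snapshot it.
            self.function_snapshots.add(function)
            return "warm"
        # Nothing cached: boot and initialize the runtime, then capture both.
        self.runtime_snapshots.add(runtime)
        self.function_snapshots.add(function)
        return "cold"


cache = SnapshotCache()
paths = [cache.start(name, "nodejs") for name in ["f1", "f2", "f1"]]
print(paths)
```

In this sketch the first function pays the cold path, a second function on the same runtime starts warm, and a repeat invocation of the first function starts hot.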

SLIDE 11

Snapshot Lineages

Page-level sharing & copy-on-write (COW) can be applied to drastically reduce replicated state

[Figure: SEUSS operating system, snapshot cache, and cold/warm/hot start timeline, repeated from slide 8. Lineage diagram labels: runtime snapshot, function snapshot, Registers, Pages, Page refs, Written page.]

SLIDE 12

Snapshot Lineages

Anticipatory optimization enabled by accumulating state within the origin snapshot

Child snapshots contain only a memory ‘diff’ of written pages

[Figure: SEUSS operating system, snapshot cache, and cold/warm/hot start timeline, repeated from slide 8. Lineage diagram labels: Registers, Pages, Page refs, Written page.]
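The idea of a child snapshot holding only a diff of written pages can be illustrated with a small page-table model. This is a sketch under assumptions: the `Snapshot` class, its `read` and `write` methods, and the dictionary-based page store are hypothetical and say nothing about how SEUSS actually lays out registers, pages, or page references.

```python
from typing import Dict, Optional


class Snapshot:
    """Toy snapshot that stores only pages written since its parent."""

    def __init__(self, parent: Optional["Snapshot"] = None):
        self.parent = parent
        # Page number -> contents written at this level of the lineage.
        self.diff: Dict[int, bytes] = {}

    def write(self, page: int, data: bytes) -> None:
        # Copy-on-write: the parent's copy of the page is never modified.
        self.diff[page] = data

    def read(self, page: int) -> bytes:
        # Walk up the lineage until some ancestor holds the page.
        node: Optional[Snapshot] = self
        while node is not None:
            if page in node.diff:
                return node.diff[page]
            node = node.parent
        raise KeyError(page)


# A runtime snapshot with four pages, and a function snapshot derived from it.
runtime = Snapshot()
for page_no in range(4):
    runtime.write(page_no, b"runtime")

function = Snapshot(parent=runtime)
function.write(3, b"f1 bytecode")

print(len(function.diff), function.read(0), function.read(3))
```

Here the function snapshot stores a single page while still reading the other three from its parent, and the runtime snapshot stays unchanged so other children can share it.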

SLIDE 13

SEUSS OS

OS specialized for FaaS compute plane

  • Foundation event-driven multi-core kernel (x86_64 native)
  • Per-core job scheduler & network (NAT) layer
  • In-memory snapshot cache
  • Unprivileged unikernel guest:
    • POSIX-ish unikernel (Rumprun)
    • Minimal domain interface (Solo5)

[Figure: SEUSS OS architecture on KVM-QEMU. Labels: SEUSS operating system, Ring 0, Ring 3, VCPU, TX/RX queues, EbbRT LibraryOS, virtio, snapshot capture region, Solo5, Rumprun unikernel, Node.js, V8 JavaScript engine, invocation driver.js, <*.js>.]

SLIDE 14

FaaS Performance (single node)

  • 3-node OpenWhisk cluster
  • custom benchmark tool
  • Functions run in Docker containers
  • 12-core, 88GB nodes

[Figure: benchmark setup. Labels: Benchmark (B), Control Plane, API Server, Controller, Internal Databases, Compute Plane.]

SLIDE 15

FaaS Platform Cache

Sequential invocation requests to an Apache OpenWhisk compute node

Report the average start time

[Plot: start time (ms), axis ticks 350 to 1400, against no. of functions, axis ticks 1, 4, 16, 64, 256, 1024, 4096, using Docker containers. Annotations: cache limit, cache thrashing.]

[Diagram: request sequences over time. x = 1: f(). x = 4: f(), g(), h(), i().]

(Linux v4.15; Xeon 2.20 GHz; 88GB)
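The "cache limit" and "cache thrashing" annotations can be reproduced in a small simulation of sequential, round-robin invocations. This is only a sketch: it assumes a least-recently-used eviction policy and a fixed capacity, which the slide does not state, and the function name `hit_rate` is hypothetical.

```python
from collections import OrderedDict


def hit_rate(num_functions: int, capacity: int, rounds: int = 10) -> float:
    """Invoke num_functions functions round-robin against an LRU cache.

    Assumption: cached environments are evicted least-recently-used first.
    """
    cache: OrderedDict = OrderedDict()
    hits = total = 0
    for _ in range(rounds):
        for fn in range(num_functions):
            total += 1
            if fn in cache:
                hits += 1
                cache.move_to_end(fn)          # mark as most recently used
            else:
                cache[fn] = True               # slow path: build a new environment
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used
    return hits / total


below_limit = hit_rate(num_functions=4, capacity=16)
above_limit = hit_rate(num_functions=17, capacity=16)
print(below_limit, above_limit)
```

With these assumptions, a working set that fits in the cache misses only on first use, while a working set just one function larger than the capacity misses on every invocation, because each function is evicted shortly before it is requested again.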

SLIDE 16

FaaS Platform Cache

Sequential invocation requests to an Apache OpenWhisk compute node

Report the average start time

[Plot: start time (ms), axis ticks 350 to 1400, against no. of functions, axis ticks 1, 4, 16, 64, 256, 1024, 4096. Series: using Docker containers vs. unikernel snapshots.]

[Diagram: request sequences over time. x = 1: f(). x = 4: f(), g(), h(), i().]

SLIDE 17

FaaS Platform Throughput

  • 64 concurrent requests
  • NOP (‘hello world’) invocations

[Plot: invocation goodput (req/s) against no. of functions (log scale).]

SLIDE 18

Resiliency to Traffic Bursts

  • blue/purple: Blocking IO requests to an external HTTP host (~250 ms)
  • red: 128 concurrent CPU-bound functions (~150 ms)

[Plots: two panels, 32-second Burst Intervals and 16-second Burst Intervals.]

SLIDE 19

Final Thoughts

  • Unikernel snapshots promote reuse in a safe, simple, and effective way
  • Prototype demonstrates a major advantage for serverless application models
  • In the end, high-performance cloud computing will continue to challenge our infrastructure software in new
  • It will be the operating system (design, mechanisms & techniques) that will address the challenge and enable new workloads

[Figure: SEUSS OS as a lightweight kernel running unikernel contexts F1, F2 A, F2 B, … from read-only snapshots snap0, snap1, snap2.]