Lightweight Preemptible Functions, Sol Boucher, Carnegie Mellon University


SLIDE 1

Lightweight Preemptible Functions

Sol Boucher, Carnegie Mellon University

Joint work with:

  • Anuj Kalia, Microsoft Research
  • David G. Andersen, CMU
  • Michael Kaminsky, BrdgAI/CMU

SLIDE 2

Why?

  • Bound resource use
  • Balance load of different tasks
  • Meet a deadline (e.g., real time)

Light∙weight (adj.): Low overhead, cheap
Pre∙empt∙i∙ble (adj.): Able to be stopped

[Figure: timeline in which the program runs a preemptible function (PF), then does something else important.]

SLIDE 3

Desiderata

  • Retain programmer’s control over the CPU
  • Be able to interrupt arbitrary unmodified code
  • Introduce minimal overhead in the common case
  • Support cancellation
  • Maintain compatibility with the existing systems stack


SLIDE 4

Agenda

  • Why contemporary approaches are insufficient
      ○ Futures
      ○ Threads
      ○ Processes
  • Function calls with timeouts
  • Backwards compatibility
  • Preemptive userland threading

SLIDE 5

Problem: calling a function cedes control

[Figure: timeline of a call to func(): run a preemptible function (PF), then do something else important.]

SLIDE 6

Two approaches to multitasking

cooperative vs. preemptive ≈ lightweightness vs. generality


SLIDE 7

Agenda

  • Why contemporary approaches are insufficient
      ○ Futures
      ○ Threads
      ○ Processes
  • Function calls with timeouts
  • Backwards compatibility
  • Preemptive userland threading

SLIDE 8

Problem: futures are cooperative

future: lightweight userland thread scheduled by the language runtime

One future can depend on another’s result at a yield point

[Figure: a call to func().]
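To make the problem concrete, here is a minimal sketch in C of a cooperative run loop. Every name in it (task, run_cooperative, polite_step, greedy_step) is hypothetical and not taken from the talk. It only illustrates that the loop regains control when a task returns, so a single task that never yields delays all the others.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task: step() does some work and returns true when finished.
 * Returning from step() is the only yield point. */
typedef struct {
    bool (*step)(void *state);
    void *state;
    bool done;
} task;

/* Round-robin over the tasks until all of them finish. Returns the number
 * of step() calls made. The loop gets the CPU back only when a step returns. */
static size_t run_cooperative(task *tasks, size_t n) {
    assert(tasks != NULL);
    size_t steps = 0, remaining = n;
    while (remaining > 0) {
        for (size_t i = 0; i < n; ++i) {
            if (tasks[i].done) continue;
            tasks[i].done = tasks[i].step(tasks[i].state);
            ++steps;
            if (tasks[i].done) --remaining;
        }
    }
    return steps;
}

/* A polite task: one unit of work per step, so it yields often. */
static bool polite_step(void *state) {
    int *left = state;
    return --*left == 0;
}

/* A greedy task: does all of its work in one step and never yields early. */
static bool greedy_step(void *state) {
    int *left = state;
    while (*left > 0) --*left;
    return true;
}
```

If a greedy task is queued ahead of a polite one, the polite task cannot take its first step until the greedy task has finished all of its work, and the run loop has no way to intervene.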

SLIDE 9

Agenda

  • Why contemporary approaches are insufficient
      ○ Futures (cooperative, not preemptive)
      ○ Threads
      ○ Processes
  • Function calls with timeouts
  • Backwards compatibility
  • Preemptive userland threading

SLIDE 10

Alternative: kernel threading

// Problem
buffer = decode(&img);
time_sensitive_task();

// Tempting approach
pthread_create(&tid, NULL, decode, &img);
usleep(TIMEOUT);
time_sensitive_task();
pthread_join(tid, &buffer);

SLIDE 11

Problem: SLAs and graceful degradation

[Figure: timeline with an SLA deadline: run a preemptible function (PF), then do something else important.]

SLIDE 12

Observation: cancellation is hard

[Figure: a process containing two threads, one of them running a PF. Labels: “Call to malloc()” and “CANCELLED”.]

SLIDE 13

Agenda

  • Why contemporary approaches are insufficient
      ○ Futures (cooperative, not preemptive)
      ○ Threads (poor ergonomics, no cancellation)
      ○ Processes
  • Function calls with timeouts
  • Backwards compatibility
  • Preemptive userland threading

SLIDE 14

Problem: object ownership and lifetime

[Figure: two processes, one of them running a PF. Labels: “Shared object”, “Pointer”, and “CANCELLED”.]

SLIDE 15

Agenda

  • Why contemporary approaches are insufficient
      ○ Futures (cooperative, not preemptive)
      ○ Threads (poor ergonomics, no cancellation)
      ○ Processes (poor performance and ergonomics)
      Threads and processes (bracketed together): sacrifice programmer control
  • Function calls with timeouts
  • Backwards compatibility
  • Preemptive userland threading

SLIDE 16

Idea: function calls with timeouts

  • Retain programmer’s control over the CPU
  • Be able to interrupt arbitrary unmodified code
  • Introduce minimal overhead in the common case
  • Support cancellation
  • Maintain compatibility with the existing systems stack


SLIDE 17

Agenda

  • Why contemporary approaches are insufficient
  • Function calls with timeouts
  • Backwards compatibility
  • Preemptive userland threading

SLIDE 18

A new application primitive

lightweight preemptible function: function invoked with a timeout

  • Faster than spawning a process or thread
  • Runs on the caller’s thread

SLIDE 19

A new application primitive

lightweight preemptible function: function invoked with a timeout

  • Interrupts at a granularity of tens to hundreds of microseconds
  • Pauses on timeout for low overhead and flexibility to resume

SLIDE 20

A new application primitive

lightweight preemptible function: function invoked with a timeout

  • Preemptible code is a normal function or closure
  • Invoked via a wrapper like pthread_create(), but synchronous

SLIDE 21

The interface: launch() and resume()

funcstate = launch(func, 400 /*us*/, NULL);
if(!funcstate.is_complete) {
    work_queue.push(funcstate);
}
// ...
funcstate = work_queue.pop();
resume(&funcstate, 200 /*us*/);

SLIDE 22

The interface: cancel()

funcstate = launch(func, 400 /*us*/, NULL);
if(!funcstate.is_complete) {
    work_queue.push(funcstate);
}
// ...
funcstate = work_queue.pop();
cancel(&funcstate);

SLIDE 23

Concurrency: explicit sharing

// counter == ?!
counter = 0;
funcstate = launch(λa. ++counter, 1, NULL);
++counter;
if(!funcstate.is_complete) {
    resume(&funcstate, TO_COMPLETION);
}
assert(counter == 2);

SLIDE 24

Concurrency: existing protections work (e.g., Rust)

error[E0503]: cannot use `counter` because it was mutably borrowed
   |
13 | funcstate = launch(λa. ++counter, 1, NULL);
   |                    --    ------- borrow occurs due to use of `counter` in closure
   |                    |
   |                    borrow of `counter` occurs here
14 | ++counter;
   | ^^^^^^^^^ use of borrowed `counter`

SLIDE 25

libinger: library implementing LPFs, currently supports C and Rust programs


SLIDE 26

Implementation: execution stack

funcstate = launch(func, TO_COMPLETION, NULL);

Caller’s stack: ... launch()

Preemptible function’s stack: [stub] func() [caller]
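Running a function on its own stack can be sketched with the ucontext calls that glibc provides. This is an illustration under assumptions, not libinger's code: run_on_own_stack, stub, pending, example_func, and the 64 KiB stack size are invented for the example, and it covers only the run-to-completion case with no timer involved.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define PF_STACK_SIZE (64 * 1024)   /* arbitrary size chosen for the sketch */

static ucontext_t caller_ctx;       /* where the caller continues afterwards */
static void (*pending)(void);       /* function the stub should call */

/* Bottom frame of the new stack. When it returns, control follows uc_link
 * back to the context saved in caller_ctx. */
static void stub(void) {
    pending();
}

/* Run func to completion on a freshly allocated stack. */
static void run_on_own_stack(void (*func)(void)) {
    ucontext_t pf_ctx;
    void *stack = malloc(PF_STACK_SIZE);
    assert(stack != NULL);

    getcontext(&pf_ctx);
    pf_ctx.uc_stack.ss_sp = stack;
    pf_ctx.uc_stack.ss_size = PF_STACK_SIZE;
    pf_ctx.uc_link = &caller_ctx;
    makecontext(&pf_ctx, stub, 0);

    pending = func;
    swapcontext(&caller_ctx, &pf_ctx);  /* comes back once stub() returns */
    free(stack);
}

/* Example function: counts how many times it has been run. */
static int calls;
static void example_func(void) { ++calls; }
```

Because the function has a stack of its own, pausing it later only requires saving its context; the caller's frames are never unwound.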

SLIDE 27

Implementation: timer signal

funcstate = launch(func, TIMEOUT, NULL);

Caller’s stack: ... launch()

Preemptible function’s stack: [stub] func() [caller] handler() resume()

Timeout?
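A timer-driven pause and resume can be sketched as follows. This is not libinger's implementation: the pf_state layout, the simplified signatures launch(&st, func, timeout_us) and resume(&st, timeout_us), NO_TIMEOUT, and all helper names are assumptions made for the example. It is single-threaded, uses a repeating ITIMER_REAL timer, and relies on calling swapcontext() from a signal handler, which works on Linux with glibc in practice but is not guaranteed to be async-signal-safe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/time.h>
#include <ucontext.h>

#define PF_STACK_SIZE (64 * 1024)
#define NO_TIMEOUT 0L                 /* hypothetical "run to completion" value */

typedef struct {
    ucontext_t ctx;                   /* where the function continues on resume */
    void *stack;
    void (*func)(void);
    bool is_complete;
} pf_state;

static ucontext_t caller_ctx;
static pf_state *running;             /* function that was last switched in */
static volatile sig_atomic_t in_pf;   /* nonzero only while func's code runs */

/* Arm a repeating timer; a value of zero disarms it. */
static void set_timer(long usec) {
    struct itimerval it;
    it.it_value.tv_sec = usec / 1000000;
    it.it_value.tv_usec = usec % 1000000;
    it.it_interval = it.it_value;
    setitimer(ITIMER_REAL, &it, NULL);
}

/* Runs on the preemptible function's stack and pauses it by switching back
 * to the caller. A later resume() continues right after swapcontext(). */
static void on_timeout(int sig) {
    (void)sig;
    if (!in_pf) return;               /* tick arrived outside the function */
    in_pf = 0;
    swapcontext(&running->ctx, &caller_ctx);
    in_pf = 1;
}

static void stub(void) {
    in_pf = 1;
    running->func();
    in_pf = 0;
    running->is_complete = true;
}                                     /* returning follows uc_link to the caller */

static void switch_in(pf_state *st, long timeout_us) {
    running = st;
    set_timer(timeout_us);
    swapcontext(&caller_ctx, &st->ctx);
    set_timer(0);
    if (st->is_complete) { free(st->stack); st->stack = NULL; }
}

static void launch(pf_state *st, void (*func)(void), long timeout_us) {
    struct sigaction sa = {0};
    sa.sa_handler = on_timeout;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    st->func = func;
    st->is_complete = false;
    st->stack = malloc(PF_STACK_SIZE);
    assert(st->stack != NULL);
    getcontext(&st->ctx);
    st->ctx.uc_stack.ss_sp = st->stack;
    st->ctx.uc_stack.ss_size = PF_STACK_SIZE;
    st->ctx.uc_link = &caller_ctx;
    makecontext(&st->ctx, stub, 0);
    switch_in(st, timeout_us);
}

static void resume(pf_state *st, long timeout_us) {
    if (!st->is_complete) switch_in(st, timeout_us);
}

/* Example function: spins until the caller sets stop. */
static volatile sig_atomic_t stop;
static void spin_until_stopped(void) { while (!stop) { } }
```

The pf_state must stay at a fixed address between launch() and the final resume(), because the saved context may contain pointers into itself. The timer repeats so that a tick arriving before the function has actually started is ignored without losing the timeout.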

SLIDE 28

Implementation: cleanup

funcstate = launch(func, TIMEOUT, NULL);
cancel(&funcstate);

Preemptible function’s stack: [stub] func() handler()

SLIDE 29

Preemption mechanism

[Figure: timeline t running from launch() to “timeout!”, with “Timeout?” checks along the way.]

SLIDE 30

libinger microbenchmarks

Operation          Cost (μs)
launch()           ≈ 5
resume()           ≈ 5
cancel()           ≈ 4800*
pthread_create()   ≈ 30
fork()             ≈ 200

* This operation is not typically on the critical path.

SLIDE 31

libinger cancels runaway image decoding quickly

SLIDE 32

Agenda

  • Why contemporary approaches are insufficient
  • Function calls with timeouts
  • Backwards compatibility
  • Preemptive userland threading

SLIDE 33

Problem: non-reentrancy

Signal handlers cannot call non-reentrant code
The rest of the program interrupts a preemptible function
The rest of the program cannot call non-reentrant code?!

[Figure: a program and two preemptible functions, all making calls to strtok().]
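The hazard is easy to reproduce without any preemption at all. The sketch below interleaves two users of strtok() by hand, which is the situation a paused preemptible function would create; the function names and sample strings are made up for the illustration. The second function repeats the interleaving with strtok_r(), whose cursor is owned by the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

/* Tokenize "k:v", but let another strtok() user run between the two calls.
 * Copies the token returned by the second call on "k:v" into out. */
static void second_token_interleaved(char *out, size_t cap) {
    char pair[] = "k:v";
    char other[] = "x,y";
    assert(cap > 0);

    strtok(pair, ":");                /* "k"; hidden cursor is inside pair */
    strtok(other, ",");               /* "x"; hidden cursor moves into other */
    char *tok = strtok(NULL, ":");    /* continues in other, not in pair */
    strncpy(out, tok ? tok : "", cap - 1);
    out[cap - 1] = '\0';
}

/* The same interleaving with strtok_r(): each user keeps its own cursor. */
static void second_token_reentrant(char *out, size_t cap) {
    char pair[] = "k:v";
    char other[] = "x,y";
    char *pair_cur, *other_cur;
    assert(cap > 0);

    strtok_r(pair, ":", &pair_cur);
    strtok_r(other, ",", &other_cur);
    char *tok = strtok_r(NULL, ":", &pair_cur);
    strncpy(out, tok ? tok : "", cap - 1);
    out[cap - 1] = '\0';
}
```

The first function yields "y" where the caller expected "v", because strtok() keeps a single cursor in library-global state. Rewriting every caller to use reentrant variants is not an option when the goal is to interrupt arbitrary unmodified code.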

SLIDE 34

Approach 1: library copying

Can reuse each library copy once function runs to completion

[Figure: a program and two preemptible functions; the preemptible functions call strtok() in their own copies of libc.so, giving three copies of libc.so in total.]

SLIDE 35

Dynamic symbol binding

Executable:

k = strtok("k:v", ":");

Global Offset Table (GOT): ... 0x900dc0de ...

[Figure: the executable’s call to strtok() is resolved through a GOT entry (0x900dc0de) pointing into libc.]

SLIDE 36

libgotcha: runtime implementing selective relinking for linked programs


SLIDE 37

Selective relinking

  1. Copy the library for each LPF
  2. Create an SGOT for each LPF
  3. Point GOT entries at libgotcha

Executable:

k = strtok("k:v", ":");

[Figure: the executable’s Global Offset Table (GOT) entry, previously 0x900dc0de (libc), now holds 0xc00010ff (libgotcha), which consults an SGOT to reach one of several copies of libc.]
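The three steps can be modeled as a toy in plain C. This is only an analogy for the indirection involved, not how libgotcha is written: the real mechanism operates on the dynamic linker's tables, whereas every type and function here (toy_lib, toy_sgot, trampoline_next_id, relink, enter_libset) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIBSETS 4

/* Toy "library" with one piece of hidden state, in the spirit of strtok(). */
typedef struct { int next; } toy_lib;

static int toy_next_id(toy_lib *lib) { return lib->next++; }

/* Step 1: one private copy of the library per preemptible function. */
static toy_lib copies[MAX_LIBSETS];

/* Step 2: one shadow table per copy, recording where its symbols live. */
typedef struct {
    toy_lib *lib;
    int (*next_id)(toy_lib *);
} toy_sgot;
static toy_sgot sgots[MAX_LIBSETS];

/* Which copy the currently running code should see. */
static size_t current_libset;

/* Step 3: the executable's single table entry points at this trampoline,
 * which forwards each call to the copy selected by current_libset. */
static int trampoline_next_id(void) {
    toy_sgot *s = &sgots[current_libset];
    return s->next_id(s->lib);
}

static int (*got_next_id)(void);      /* stands in for one GOT entry */

static void relink(void) {
    for (size_t i = 0; i < MAX_LIBSETS; ++i) {
        copies[i].next = 0;
        sgots[i].lib = &copies[i];
        sgots[i].next_id = toy_next_id;
    }
    got_next_id = trampoline_next_id;
}

static void enter_libset(size_t id) {
    assert(id < MAX_LIBSETS);
    current_libset = id;
}
```

The calling code never changes: it always calls through got_next_id. Switching current_libset when a preemptible function starts or pauses is enough to keep each one's library state separate.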

SLIDE 38

Libsets and cancellation

libset: full set of all of a program’s libraries

[Figure: a program and two preemptible functions making calls to strtok(), with two copies of libc.so.]

SLIDE 39

Approach 2: uncopyable functions

Copying doesn’t work for everything…

void *malloc(size_t size) {
    PREEMPTION_ENABLED = false;
    void *mem = /* Call the real malloc(). */;
    check_for_timeout();
    PREEMPTION_ENABLED = true;
    return mem;
}
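The deferral pattern in this wrapper can be exercised in isolation. In the sketch below, the flag and check_for_timeout() follow the slide, while everything else is an assumption made for the example: preempt_now() merely counts instead of pausing anything, guarded_alloc() stands in for the wrapped malloc(), and raise(SIGALRM) simulates the timer firing in the middle of the allocator.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

static volatile sig_atomic_t preemption_enabled = 1;
static volatile sig_atomic_t timeout_pending;  /* timer fired while disabled */
static volatile sig_atomic_t deferred;         /* timeouts that had to wait */
static volatile sig_atomic_t preempted;        /* simulated pauses performed */

/* Stand-in for actually pausing the preemptible function. */
static void preempt_now(void) { ++preempted; }

static void on_timeout(int sig) {
    (void)sig;
    if (preemption_enabled) {
        preempt_now();
    } else {
        timeout_pending = 1;   /* shared code is mid-update: remember it */
        ++deferred;
    }
}

/* Act on a timeout that arrived while preemption was disabled. */
static void check_for_timeout(void) {
    if (timeout_pending) {
        timeout_pending = 0;
        preempt_now();
    }
}

static void install_timeout_handler(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_timeout;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Wrapper in the style of the slide, with an optional simulated timer tick
 * delivered while the allocator is running. */
static void *guarded_alloc(size_t size, int timer_fires_inside) {
    preemption_enabled = 0;
    void *mem = malloc(size);
    if (timer_fires_inside) raise(SIGALRM);
    check_for_timeout();
    preemption_enabled = 1;
    return mem;
}
```

A timeout that lands inside the shared allocator is postponed until the allocator has returned, so the function is never paused while holding allocator-internal state.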

SLIDE 40

“Approach 3”: blocking syscalls

int open(const char *filename) {
    return syscall(SYS_open, filename);
}

struct sigaction sa = {};
sa.sa_flags = SA_RESTART;

while(errno == EAGAIN)
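A retry wrapper for an interrupted blocking call might look like the sketch below. It is an assumption-laden illustration, not code from libgotcha: read_retrying, on_alarm, and demo_interrupted_read are invented names, and the handler is deliberately installed without SA_RESTART so that the retry path is exercised. POSIX reports an interrupted call with errno set to EINTR, which is what this sketch tests for, whereas the slide's fragment shows EAGAIN.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int wake_fd = -1;                     /* write end of a pipe */
static volatile sig_atomic_t interruptions;  /* EINTR results observed */

/* Installed without SA_RESTART, so a blocked read() fails with EINTR.
 * Writes one byte so that the retried read() finds data. */
static void on_alarm(int sig) {
    (void)sig;
    ssize_t ignored = write(wake_fd, "!", 1);
    (void)ignored;
}

/* Reissue the call for as long as a signal interrupted it. */
static ssize_t read_retrying(int fd, void *buf, size_t len) {
    ssize_t n;
    do {
        n = read(fd, buf, len);
        if (n < 0 && errno == EINTR) ++interruptions;
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Block in read() on an empty pipe until a 10 ms timer interrupts it,
 * then return the byte delivered by the handler. */
static char demo_interrupted_read(void) {
    int fds[2];
    char byte = 0;
    int rc = pipe(fds);
    assert(rc == 0);
    (void)rc;
    wake_fd = fds[1];

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;                /* sa_flags stays 0: no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    struct itimerval it = {{0, 0}, {0, 10000}};
    setitimer(ITIMER_REAL, &it, NULL);

    ssize_t n = read_retrying(fds[0], &byte, 1);
    assert(n == 1);
    (void)n;
    close(fds[0]);
    close(fds[1]);
    return byte;
}
```

Setting SA_RESTART on the handler asks the kernel to restart many interrupted calls automatically; a loop of this kind covers the calls that still return early.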

SLIDE 41

libgotcha microbenchmarks

Symbol access     Time w/o libgotcha   Time w/ libgotcha
Function call     ≈ 2 ns               ≈ 14 ns
Global variable   ≈ 0 ns               ≈ 3500* ns

Baseline          End-to-end time w/o libgotcha
gettimeofday()    ≈ 19 ns (65% overhead)
getpid()          ≈ 44 ns (30% overhead)

* Exported global variables have become rare.

SLIDE 42

Agenda

  • Why contemporary approaches are insufficient
  • Function calls with timeouts
  • Backwards compatibility
  • Preemptive userland threading

SLIDE 43

libturquoise: preemptive version of the Rust Tokio userland thread pool


SLIDE 44

hyper latency benchmark: experimental setup

  • 2 classes of compute-bound request:
      ○ Short: 500 μs
      ○ Long: 50 ms
  • Vary % long in mix
  • Measure short only

[Figure: a compute-bound request and its response.]

SLIDE 45

hyper latency benchmarks: results

[Figure: short-request latency (ms) versus % long requests, in two panels (median and 99% tail), each comparing Preemptive and Cooperative. Annotations: “No code changes!” and “Head-of-line blocking”.]

SLIDE 46

Summary

lightweight preemptible function: function invoked with a timeout

  • Synchronous preemption abstraction
  • Supports resuming and cancellation
  • Interoperable with legacy software
  • Exciting systems applications


SLIDE 47

Thank you!

Reach me at sboucher@cmu.edu
