Provable Multicore Schedulers with Ipanema: Application to Work-Conservation
Baptiste Lepers Redha Gouicem Damien Carver Jean-Pierre Lozi Nicolas Palix Virginia Aponte Willy Zwaenepoel Julien Sopena Julia Lawall Gilles Muller
2/32
“No core should be left idle when a core is overloaded”
[Figure: cores 0 to 3] Non work-conserving situation: core 0 is overloaded while the other cores are idle
3/32
Linux (CFS) suffers from work conservation issues
[Figure: core (y-axis) versus time in seconds (x-axis); each core is shown as mostly idle or mostly overloaded] [Lozi et al. 2016]
4/32
FreeBSD (ULE) suffers from work conservation issues
[Figure: core (y-axis) versus time in seconds (x-axis); each core is shown as idle or overloaded] [Bouron et al. 2018]
5/32
Work conservation bugs are hard to detect
No crash, no deadlock. No obvious symptom.
137x slowdown on HPC applications
23% slowdown on a database.
[Lozi et al. 2016]
6/32
Formally prove work-conservation
7/32
(∃c . O(c)) ⇒ (∀c′ . ¬I(c′))
If a core is overloaded, no core is idle
[Figure: core 0 and core 1]
Does not work for realistic schedulers!
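The snapshot property above can be sketched as a check on frozen run-queue lengths. This is a minimal illustration, not code from Ipanema: the names `overloaded`, `idle` and `work_conserving` are hypothetical, and it assumes that O(c) means "core c has two or more threads" and I(c) means "core c has no thread". Because it inspects a single instant, it says nothing about cores that change their queues concurrently.

```python
def overloaded(nr_threads):
    # Assumption: O(c) holds when core c has at least two threads.
    return nr_threads >= 2


def idle(nr_threads):
    # Assumption: I(c) holds when core c has no thread.
    return nr_threads == 0


def work_conserving(loads):
    """(exists c . O(c)) => (forall c' . not I(c')) on one snapshot.

    `loads` is a list with the number of threads on each core.
    """
    some_overloaded = any(overloaded(n) for n in loads)
    none_idle = all(not idle(n) for n in loads)
    return (not some_overloaded) or none_idle


print(work_conserving([4, 0, 0, 0]))  # False: core 0 overloaded, others idle
print(work_conserving([2, 1, 1, 1]))  # True: overloaded core, but no idle core
print(work_conserving([0, 1, 0, 0]))  # True: idle cores, but nothing to steal
```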
9/32
Concurrent events & optimistic concurrency
Based on possibly outdated observations!
11/32
Runs load balancing
12/32
Observes load (no lock)
13/32
Locks busiest
Ideal scenario: no change since the load was observed
14/32
Locks “busiest”
Possible scenario: busiest might have no thread left! (Concurrent blocks/terminations.)
15/32
(Fail to) Steal from busiest
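The steps above (observe without a lock, lock the core believed to be busiest, then steal or fail) can be sketched as follows. This is a simplified illustration under stated assumptions, not the CFS or Ipanema code: `Core`, `balance` and the `concurrent_event` hook are hypothetical names, a core counts as worth stealing from when it has two or more threads, and the two locks are taken one after the other rather than together. The hook stands in for a block or termination that happens between the observation and the lock.

```python
import threading


class Core:
    """Hypothetical per-core state: a run queue protected by a lock."""

    def __init__(self, cid, threads=()):
        self.cid = cid
        self.runqueue = list(threads)
        self.lock = threading.Lock()


def balance(me, cores, concurrent_event=None):
    """One optimistic load-balancing round; returns True if a thread was stolen."""
    # Observe the load of the other cores without taking any lock.
    others = [c for c in cores if c is not me]
    busiest = max(others, key=lambda c: len(c.runqueue))
    if len(busiest.runqueue) < 2:
        return False  # nobody looked overloaded

    # Test hook: simulate an event on another core after the observation.
    if concurrent_event is not None:
        concurrent_event()

    # Lock the core that looked busiest and re-check the stale observation.
    with busiest.lock:
        if len(busiest.runqueue) < 2:
            return False  # threads blocked or terminated meanwhile: steal fails
        thread = busiest.runqueue.pop()
    with me.lock:
        me.runqueue.append(thread)
    return True


# Ideal scenario: nothing changes between the observation and the lock.
cores = [Core(0), Core(1, ["a", "b", "c"]), Core(2), Core(3)]
print(balance(cores[0], cores))  # True

# Possible scenario: the threads of the busiest core terminate in between.
cores = [Core(0), Core(1, ["a", "b"]), Core(2), Core(3)]
print(balance(cores[0], cores, concurrent_event=cores[1].runqueue.clear))  # False
```

In the second call the balancing core stays idle even though it did observe an overloaded core, which is the situation the slides describe as acting on outdated observations.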
17/32
Definition of overloaded with “failure cases”:
If a core is overloaded (but not because a thread was concurrently created)
∃c . (O(c) ∧ ¬fork(c) ∧ ¬unblock(c) …)
18/32
∃c . (O(c) ∧ ¬fork(c) ∧ ¬unblock(c) …) ⇒ ∀c′ . ¬(I(c′) ∧ …)
19/32
Existing scheduler code is hard to prove
Historically: low-level C code.
22/32
Trade expressiveness for expertise/knowledge:
Robustness: (static) verification of properties
Explicit concurrency: explicit shared variables
Performance: efficient compilation
23/32
[Figure: toolchain. DSL policy → WhyML code → proof; DSL policy → C code → kernel module]
DSL: close to C. Easy to learn and to compile to WhyML and C.
24/32
Proof on all possible interleavings
25/32
Split code in blocks (1 block = 1 read or write to a shared variable)
[Figure: timeline of successive load balancing rounds on core 0]
26/32
Simulate execution of concurrent blocks on N cores
Concurrent WC must hold at the end of the load balancing
[Figure: timeline; core 0 runs load balancing while cores 1 … N concurrently fork and terminate threads]
27/32
Concurrent WC must always hold!
DSL ➔ few shared variables ➔ tractable
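The idea of splitting code into blocks and checking every interleaving can be illustrated with a brute-force enumeration. This is only a toy model under stated assumptions, not the WhyML proof: it uses two cores instead of N, all function names are hypothetical, the locked steal is treated as one block, and only `fork` is modelled as a failure case. It counts the final states that violate the snapshot property and those that violate the weakened, concurrent one.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]


# Blocks of the load balancing on core 0.
def observe(s):
    s["seen"] = s["nr"][1]  # one unlocked read of a shared variable


def steal(s):
    # Assumed atomic (locked) block: re-check, then migrate one thread.
    if s["seen"] >= 2 and s["nr"][1] >= 2:
        s["nr"][1] -= 1
        s["nr"][0] += 1


# Concurrent events on core 1.
def fork(s):
    s["nr"][1] += 1
    s["forked"] = True


def terminate(s):
    if s["nr"][1] > 0:
        s["nr"][1] -= 1


# In this model only core 1 can become overloaded and only core 0 can be
# the idle core that failed to help, so the checks look at that pair.
def naive_wc(s):
    return not (s["nr"][1] >= 2 and s["nr"][0] == 0)


def concurrent_wc(s):
    # Overload caused by a concurrent fork is excused.
    return not (s["nr"][1] >= 2 and not s["forked"] and s["nr"][0] == 0)


def check(threads, events):
    """Return (naive violations, concurrent violations) over all interleavings."""
    naive = concurrent = 0
    for schedule in interleavings([observe, steal], events):
        s = {"nr": list(threads), "seen": 0, "forked": False}
        for block in schedule:
            block(s)
        # Properties are evaluated at the end of the load balancing round.
        naive += not naive_wc(s)
        concurrent += not concurrent_wc(s)
    return naive, concurrent


print(check((0, 1), [fork]))       # (2, 0)
print(check((0, 2), [terminate]))  # (0, 0)
```

With one concurrent fork, two of the three interleavings end with core 0 idle and core 1 overloaded, so the snapshot property fails while the concurrent one holds. The number of interleavings grows quickly with the number of blocks, which is why having few shared variables matters.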
28/32
CFS-CWC (365 LOC): hierarchical CFS-like scheduler
CFS-CWC-FLAT (222 LOC): single-level CFS-like scheduler
ULE-CWC (244 LOC): BSD-like scheduler
29/32
FT.C (NAS benchmark)
30/32
NAS benchmarks (lower is better)
31/32
Sysbench on MySQL (higher is better)
32/32
Work conservation: not straightforward!
… new formalism: concurrent work conservation!
Complex concurrency scheme
… proofs made tractable using a DSL.
Performance: similar or better than CFS.