SLIDE 1

Compiling with Continuations and LLVM

Kavon Farvardin John Reppy

University of Chicago

September 22, 2016

SLIDE 2

Introduction LLVM

Introduction to LLVM

◮ De facto backend for new language implementations
◮ Offers high quality code generation for many architectures
◮ Active industry development
◮ Widely used for research
◮ Includes a multitude of features and tools

September 22, 2016 ML’16 — CwC and LLVM 2

SLIDE 3

Introduction LLVM

The LLVM Landscape

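[Figure: language front ends (Rust via Rustc, C via Clang, SML via MLton, Haskell via GHC, Erlang via ErLLVM, PML via Manticore, …) emit LLVM IR, which the LLVM optimizer and compiler turn into code for ARM64, x86-64, Power, …]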

SLIDE 4

Introduction LLVM

Characteristics of LLVM IR

define i32 @factorial(i32 n) {
    isZero = compare eq i32 n, 0
    if isZero, label base, label recurse
base:
    res1 = add i32 n, 1
    goto label final
recurse:
    minusOne = sub i32 n, 1
    retVal = call i32 @factorial(i32 minusOne)
    res2 = mul i32 n, retVal
    goto label final
final:
    res = phi i32 [ res1, res2 ]
    return i32 res
}

SLIDE 5

Introduction Manticore

Manticore’s Runtime Model

◮ Efficient first-class continuations are used for concurrency, work-stealing parallelism, exceptions, etc.
◮ As in Compiling with Continuations, return continuations are passed as arguments to functions.
◮ Continuations are heap-allocated, making callcc cheap.
◮ Functions return by throwing to an explicit continuation.

Manticore compiler: … → BOM IR → CPS convert → CPS IR → Closure convert → CFG IR → MLRISC or LLVM → x86-64
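The runtime model above can be illustrated with a small sketch in Python (chosen here only for readability; it is not Manticore code). The function `fact_cps` and its continuation parameter `k` are hypothetical names. Each call receives its return continuation as an argument, every continuation is a heap-allocated closure, and the function "returns" only by invoking that continuation.

```python
def fact_cps(n, k):
    """Factorial in continuation-passing style.

    The result is never returned directly to a caller's frame; it is
    handed to the explicit return continuation k.
    """
    if n == 0:
        # "Return" by throwing the result to the continuation.
        return k(1)
    # The new continuation is a closure (heap-allocated in Python) that
    # captures n and the caller's continuation k.
    return fact_cps(n - 1, lambda r: k(n * r))


# The initial continuation just hands the final value back.
print(fact_cps(5, lambda r: r))  # prints 120
```

Python does not eliminate tail calls, so this sketch still consumes interpreter stack for large `n`; a compiler that implements every call as a jump avoids that cost, which is why reliable tail calls matter in this model.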

SLIDE 6

Introduction Manticore

This Model Poses a Challenge for LLVM

We require

◮ Efficient, reliable tail calls
◮ Garbage collection
◮ Preemption and multithreading
◮ First-class continuations

SLIDE 7

Implementation Challenges Tail Calls

Efficient, Reliable Tail Calls

◮ Tail calls are a major correctness and efficiency concern for us.
◮ LLVM’s tail call support is shaky: the issues are numerous and fixes are hard to come by.

SLIDE 8

Implementation Challenges Tail Calls

Anatomy of a Call Stack

foo:
    push r12            ; prologue
    push r13
    push r14
    sub sp, 24
    ; body of foo
    call bar
after:
    ; body of foo
    add sp, 24          ; epilogue
    pop r14
    pop r13
    pop r12
    ret

[Figure: the stack during the call to bar, with labels for the r12, r13, and r14 save slots, foo’s 24-byte spill area, the return address `after`, and SP.]

SLIDE 9

Implementation Challenges Tail Calls

LLVM’s Tail Call Optimization

Before:

foo:
    push r12
    push r13
    push r14
    sub sp, 24
    ; body of foo
    call bar            ; <--
    add sp, 24
    pop r14
    pop r13
    pop r12
    ret                 ; <--

After:

foo:
    push r12
    push r13
    push r14
    sub sp, 24
    ; body of foo
    add sp, 24
    pop r14
    pop r13
    pop r12
    jmp bar             ; <--

SLIDE 10

Implementation Challenges Tail Calls

Avoiding the Tail Call Overhead

◮ MLton uses a trampoline, reducing procedure calls.
◮ GHC’s calling convention removes only callee-save instructions.
◮ We remove all overhead with a new calling convention (JWA) plus the use of naked functions.

Naked functions blindly omit all frame setup, requiring you to handle it yourself!

GOAL →

foo:
    ; body of foo
    jmp bar
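For readers unfamiliar with the trampoline technique mentioned for MLton, here is a generic sketch in Python. It illustrates the general idea only and is not MLton's implementation; `trampoline`, `is_even`, and `is_odd` are hypothetical names. Instead of making a tail call directly, each function returns a thunk describing the next call, and a driver loop invokes thunks one at a time so the stack never grows.

```python
def trampoline(step):
    """Keep invoking zero-argument callables until a plain value appears."""
    while callable(step):
        step = step()
    return step


def is_even(n):
    # Instead of tail-calling is_odd, return a thunk for the driver loop.
    return True if n == 0 else (lambda: is_odd(n - 1))


def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))


# 100,000 mutually recursive "tail calls" run in constant stack space.
print(trampoline(lambda: is_even(100_000)))  # prints True
```

The cost is visible in the sketch: every tail call becomes a return to the driver followed by a fresh call, which is the overhead a direct `jmp bar` avoids.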

SLIDE 11

Implementation Challenges Tail Calls

Using Naked Functions

◮ Runtime system sets up frame
◮ Compiler limits number of spills
◮ All functions reuse same frame
◮ FFI calls are transparent

[Figure: stack layout with labels for the runtime system’s frames, RTS register saves, foreign function space, the reusable spill area (8-byte slots), a 16-byte boundary, and SP.]

SLIDE 12

Implementation Challenges Garbage Collection

Garbage Collection

◮ Cannot use LLVM’s GC support; assumes a stack runtime model.
◮ Manticore’s stack frame is only for temporary register spills.
◮ Thus, no new stack format to parse; our GC remains unchanged.
◮ We insert heap exhaustion checks before LLVM generation.

SLIDE 13

Implementation Challenges Garbage Collection

Example of a Heap Exhaustion Check

declare {i64*, i64*} @invoke-gc(i64*, i64*)

define jwa void @foo(i64 allocPtr_0, ...) naked {
    ...
    if enoughSpace, label continue, label doGC
doGC:
    roots_0 = allocPtr_0
    ; ... save live vals in roots_0 ...
    allocPtr_1 = getelementptr allocPtr_0, 5    ; bump
    fresh = call {i64*, i64*} @invoke-gc(allocPtr_1, roots_0)
    allocPtr_2 = extractvalue fresh, 0
    roots_1 = extractvalue fresh, 1
    ; ... restore live vals ...
    goto label continue
continue:
    allocPtr_3 = phi i64* [ allocPtr_0, allocPtr_2 ]
    liveVal_1 = phi i64* [ ... ]
    ...
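The control flow of the heap exhaustion check can be sketched at a higher level in Python. This is a toy model under stated assumptions: `Heap`, `invoke_gc`, and `alloc` are hypothetical names, sizes are counted in words, and the stand-in collector simply reclaims the whole heap rather than copying live data.

```python
class Heap:
    """Toy bump-allocated heap with a limit pointer."""

    def __init__(self, size_words):
        self.alloc_ptr = 0
        self.limit_ptr = size_words
        self.collections = 0

    def invoke_gc(self, roots):
        # Assumption: a stand-in collector that reclaims everything.
        # A real collector would trace and relocate the roots.
        self.collections += 1
        self.alloc_ptr = 0
        return roots


def alloc(heap, words_needed, roots):
    """Bump-allocate, running the heap check first (the doGC branch)."""
    if heap.limit_ptr - heap.alloc_ptr < words_needed:
        roots = heap.invoke_gc(roots)   # save roots, collect, restore
    addr = heap.alloc_ptr               # the 'continue' block
    heap.alloc_ptr += words_needed
    return addr, roots


heap = Heap(8)
for _ in range(5):
    alloc(heap, 3, roots=[])
print(heap.collections)  # prints 2
```

As in the slide, the fast path is a single comparison; the slow path is the only place where live values have to be saved and restored.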

SLIDE 14

Implementation Challenges Preemption

Preemption and Multithreading

◮ Continuations are a natural representation for suspended threads.
◮ Multithreaded runtimes must asynchronously suspend execution.
◮ When using a precise GC, safe preemption is challenging.

SLIDE 15

Implementation Challenges Preemption

Preemption at Garbage Collection Safe Points

Heap tests can be used for preemption:

◮ Threads keep their heap limit pointer in shared memory.
◮ We preempt by forcing a thread’s next heap test to fail.
◮ Preempted threads reenter runtime system via callcc.
◮ Non-allocating loops are also given a heap test.

fun foo x =
    ...
    if limitPtr - allocPtr >= bytesNeeded
    then foo y
    else (callcc enterRTS ; foo y)
    ...
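The preemption trick can be modeled in a few lines of Python. This is a single-threaded sketch under stated assumptions: `VProc`, `request_preemption`, `enter_rts`, `heap_test`, and `spin` are hypothetical names, `HEAP_BASE` is an arbitrary nonzero address, and the preemption request is made inline where a real runtime would have another thread or a signal handler write to shared memory.

```python
HEAP_BASE = 4096  # assumption: heap addresses are nonzero


class VProc:
    """Toy per-thread state: allocation pointer and heap limit pointer."""

    def __init__(self, heap_words):
        self.alloc_ptr = HEAP_BASE
        self.real_limit = HEAP_BASE + heap_words
        self.limit_ptr = self.real_limit
        self.preemptions = 0


def request_preemption(vp):
    # Zeroing the limit guarantees that the next heap test fails.
    vp.limit_ptr = 0


def enter_rts(vp):
    # Toy runtime entry: count the preemption and restore the limit.
    # A real runtime would also handle genuine heap exhaustion here.
    if vp.limit_ptr == 0:
        vp.preemptions += 1
        vp.limit_ptr = vp.real_limit


def heap_test(vp, words_needed):
    if vp.limit_ptr - vp.alloc_ptr >= words_needed:
        return
    enter_rts(vp)


def spin(vp, iterations, preempt_at):
    """A non-allocating loop that still performs a heap test."""
    for i in range(iterations):
        if i == preempt_at:
            request_preemption(vp)  # stands in for another thread
        heap_test(vp, 0)
    return vp.preemptions


print(spin(VProc(16), iterations=10, preempt_at=4))  # prints 1
```

Because the loop tests the heap limit even though it allocates nothing, the request is noticed on the very next iteration, which is the reason non-allocating loops are given a heap test.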

SLIDE 16

Implementation Challenges First-class Continuations

First-class Continuations in LLVM

◮ Preemptions need to occur in the middle of a function.
◮ In CwC, we allocate a function closure to capture a continuation.

Problem: LLVM does not have first-class labels to create the closure!

SLIDE 17

Implementation Challenges First-class Continuations

First-class Labels in LLVM

Observations:

◮ The return address of a non-tail call is a label generated at runtime.
◮ Return conventions for C structs specify a mix of stack/registers.

Solution: We treat the return address like a first-class label by specifying a return convention for C structs that matches calls.

SLIDE 18

Implementation Challenges First-class Continuations

The Jump-With-Arguments Calling Convention

Arguments Passed    Location of Value    C Struct Returned
Arg 1               rsi                  Field 1
Arg 2               r11                  Field 2
Arg 3               rdi                  Field 3
Arg 4               r8                   Field 4
…                   …                    …

SLIDE 19

Implementation Challenges First-class Continuations

Example of First-class Labels for callcc

define jwa void @foo(...) naked {
    ...
preempted:
    env = ; ... save live vars ...
    closPtr = allocPair(undef, env)
    ret = call jwa {i64*, i64*} @genLabel(closPtr, @enterRTS)
    arg1 = extractvalue ret, 0
    arg2 = extractvalue ret, 1
    ...
}

; call convention:
;   rsi = closPtr, r11 = @enterRTS
genLabel:
    pop rax             ; put return addr in rax
    mov rax, (rsi)      ; finish closure
    jmp r11

SLIDE 20

Implementation Challenges First-class Continuations

Example of First-class Labels for callcc

_foo:
    ...
preempted:
    ; r10 = env, rsi = closPtr (uninitialized)
    mov r10, 8(rsi)
    mov _enterRTS, r11
    call genLabel
    ; return convention:
    ;   rsi = arg1, r11 = arg2
    ...

; call convention:
;   rsi = closPtr, r11 = @enterRTS
genLabel:
    pop rax             ; put return addr in rax
    mov rax, (rsi)      ; finish closure
    jmp r11
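The genLabel idea can be mimicked in Python as a rough analogy, not as a model of the generated machine code. All names here (`gen_label`, `enter_rts`, `foo`, `after_call`) are hypothetical. Python cannot read a return address, so the code that follows the call site is passed explicitly as `return_label`; in the assembly above that value comes from `pop rax`.

```python
def gen_label(clos, target, return_label):
    """Finish the closure with the return label, then jump to target."""
    clos[0] = return_label      # like: pop rax ; mov rax, (rsi)
    return target(clos)         # like: jmp r11


def enter_rts(clos):
    # Toy runtime: resume immediately by throwing to the continuation.
    # The two arguments play the role of the values in rsi and r11.
    code, env = clos
    return code(env, "resumed")


def foo(x):
    env = {"x": x}              # save live variables
    clos = [None, env]          # like: allocPair(undef, env)

    def after_call(arg1, arg2):
        # Code following the call site; it receives the two "struct
        # fields" exactly as a called function would receive arguments.
        return (arg1["x"], arg2)

    return gen_label(clos, enter_rts, after_call)


print(foo(7))  # prints (7, 'resumed')
```

The point of the analogy is that the continuation closure is invoked like any other function, so whoever resumes it does not need to know it began life as a return address.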

SLIDE 21

Evaluation

Performance Comparison

[Bar chart. Y-axis: speedup (normalized), 0.6 to 2.2. Benchmarks: life, nbody, queens, quicksort, takeuchi. Series: No Passes, "Basic" Passes, "Extra" Passes, O1, O2, O3.]

Bar labels in extraction order, in groups of five:

0.86  1     1.12  2.15  1.08
0.86  1     1     2.15  1.09
1.05  1.01  1.08  2.12  1.08
0.87  1     1.07  2.13  1.09
1.08  1     1.02  2.11  1.07
1.07  1     0.99  2     1.08

Figure: Execution time speedups over MLRISC when using LLVM codegen.

SLIDE 22

Conclusion and Future Work

◮ Hope to apply this to SML/NJ in the future.
◮ Plan to upstream JWA convention.
◮ More implementation details in our forthcoming tech report!


http://manticore.cs.uchicago.edu
