Cross-Layer Workload Characterization of Meta-Tracing JIT VMs


slide-1
SLIDE 1

Cross-Layer Workload Characterization of Meta-Tracing JIT VMs

Berkin Ilbeyi (1), Carl Friedrich Bolz-Tereick (2), and Christopher Batten (1)

(1) Cornell University, (2) Heinrich-Heine-Universität Düsseldorf

slide-2
SLIDE 2

Dynamic languages are popular

  • S. Cass. “The 2017 Top Programming Languages.” IEEE Spectrum.

Motivation • Meta-tracing • PyPy >> CPython • PyPy << C

slide-4
SLIDE 4

Dynamic languages are slow

[Figure: program time relative to the fastest program (axis ticks from 1 to 300) for C, C++, Rust, Fortran, Java, Swift, Go, Pascal, Lisp, PHP, Perl, Ruby, and Python.]

  • I. Gouy. “The Computer Language Benchmarks Game.”

slide-11
SLIDE 11

Just-in-time-compiling virtual machines

FooLang VM:
  • Application: FooLang
  • FooLang Interpreter (interpret)
  • FooLang JIT Compiler (compile)
  • GC, VM Utilities

Meta-JIT VM:
  • Application: FooLang, run by the FooLang Interpreter (interpret)
  • Application: Bar, run by the Bar Interpreter (interpret)
  • Meta-JIT Compiler: Generic JIT Compiler (compile), GC, VM Utilities

slide-17
SLIDE 17

Meta-JIT approaches: meta-tracing and partial evaluation

  • RPython Framework. Meta-tracing: meta-interpreter and tracing JIT
  • Truffle/Graal Framework. Partial evaluation: partial evaluator and method JIT

Example function:

    def max(a, b):
        if a > b:
            return a
        else:
            return b

Linear JIT IR (tracing JIT):

    guard_type(a, int)
    guard_type(b, int)
    c = int_gt(a, b)
    guard_true(c)
    return(a)

Bridge (a <= b):

    return(b)

Bridge (float):

    guard_type(a, float)
    guard_type(b, float)
    c = float_gt(a, b)
    guard_true(c)
    return(a)

JIT IR (method JIT):

    guard_type(a, int)
    guard_type(b, int)
    c = int_gt(a, b)
    jump_if_false(c, L1)
    return(a)
    L1: return(b)

Re-optimized JIT IR:

    i = is_type(a, int)
    jump_if_false(i, L2)
    guard_type(b, int)
    c = int_gt(a, b)
    jump_if_false(c, L1)
    return(a)
    L1: return(b)
    L2: guard_type(a, float)
    guard_type(b, float)
    c = float_gt(a, b)
    jump_if_false(c, L3)
    return(a)
    L3: return(b)
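
The relationship between a linear trace, its guards, and a bridge can be sketched in plain Python. This is only an illustration under stated assumptions, not how RPython implements it: the names GuardFailed, guard, bridges, and run_max are hypothetical, and a real tracing JIT emits machine code rather than Python functions.

```python
class GuardFailed(Exception):
    """Raised when an assumption baked into a trace does not hold."""

    def __init__(self, name):
        super().__init__(name)
        self.name = name


def guard(condition, name):
    # A guard checks one assumption and leaves the trace if it fails.
    if not condition:
        raise GuardFailed(name)


def max_interp(a, b):
    """Generic slow path: ordinary interpreter semantics of max."""
    return a if a > b else b


def trace_int_gt(a, b):
    """Linear trace recorded while a and b were ints and a > b held."""
    guard(type(a) is int, "guard_type(a, int)")
    guard(type(b) is int, "guard_type(b, int)")
    c = a > b                          # c = int_gt(a, b)
    guard(c, "guard_true(c)")
    return a                           # return(a)


# Bridges are attached to the guard that failed (hypothetical lookup table).
bridges = {
    "guard_true(c)": lambda a, b: b,   # Bridge (a <= b): return(b)
}


def run_max(a, b):
    try:
        return trace_int_gt(a, b)
    except GuardFailed as failure:
        bridge = bridges.get(failure.name)
        if bridge is not None:
            return bridge(a, b)        # continue in the compiled bridge
        return max_interp(a, b)        # no bridge yet: fall back to the interpreter
```

Here run_max(2, 3) leaves the trace at guard_true(c) and takes the bridge, while run_max(1.5, 2.5) fails the first type guard and falls back to the interpreter because no float bridge has been attached in this sketch.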

slide-21
SLIDE 21

Cross-layer workload characterization of meta-tracing JIT VMs

▪ PyPy >> CPython: How can meta-tracing JITs significantly improve the performance of multiple dynamic languages?
▪ PyPy << C: Why are meta-tracing JITs for dynamic programming languages still slower than C?

slide-29
SLIDE 29

Python-based interpreter

Application: FooLang

    ...
    b += a
    ...

is compiled to Application: Bytecode

    ...
    21 LOAD_FAST 1 (b)
    24 LOAD_FAST 0 (a)
    27 INPLACE_ADD
    28 STORE_FAST 1 (b)
    ...

which is interpreted by Interpreter: Python

    while True:
        bc = bcs[bci]
        bci += bc.length
        if bc.type == INPLACE_ADD:
            v1 = stack.pop()
            v2 = stack.pop()
            if (type(v1) == int and type(v2) == int):
                stack.push(v1 + v2)
            elif ...
        elif bc.type == LOAD_FAST:
            stack.push(local[bc.varnum])
        ...

The interpreter is itself compiled to Interpreter: Bytecode, which is interpreted by the Interpreter Interpreter.
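
The interpreter loop on this slide is pseudocode. A minimal runnable version might look like the sketch below. It assumes bytecodes are (opcode, argument) pairs and that locals live in a list; both are simplifications made for this example, not FooLang's or PyPy's real representation.

```python
LOAD_FAST, STORE_FAST, INPLACE_ADD = "LOAD_FAST", "STORE_FAST", "INPLACE_ADD"


def interpret(bytecodes, local):
    """Run a list of (opcode, argument) pairs against a list of local variables."""
    stack = []
    bci = 0                                   # bytecode index
    while bci < len(bytecodes):
        op, arg = bytecodes[bci]
        bci += 1
        if op == LOAD_FAST:
            stack.append(local[arg])
        elif op == STORE_FAST:
            local[arg] = stack.pop()
        elif op == INPLACE_ADD:
            right = stack.pop()
            left = stack.pop()
            if type(left) is int and type(right) is int:
                stack.append(left + right)    # int fast path, as on the slide
            else:
                # The slide elides the other type cases; this sketch stops here.
                raise TypeError("only ints are handled in this sketch")
        else:
            raise ValueError(f"unknown opcode: {op}")
    return local


# b += a, with a stored in local 0 and b in local 1
program = [
    (LOAD_FAST, 1),
    (LOAD_FAST, 0),
    (INPLACE_ADD, None),
    (STORE_FAST, 1),
]
```

Calling interpret(program, [2, 5]) updates local 1 to 7, which mirrors the four bytecodes shown for b += a.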

slide-35
SLIDE 35

RPython Framework

  • Application: Python is compiled to Application: Bytecode, which PyPy: Binary interprets.
  • Interpreter: RPython and Framework: RPython are translated to Interpreter + Framework: C, which is compiled to PyPy: Binary.
  • Tracing and optimization produce Meta-trace: JIT IR, which is assembled into JIT-ed code: Binary.

slide-48
SLIDE 48

Meta-trace

Application bytecode:

    ...
    21 LOAD_FAST 1 (b)
    24 LOAD_FAST 0 (a)
    27 INPLACE_ADD
    28 STORE_FAST 1 (b)
    ...

Interpreter:

    while True:
        bc = bcs[bci]
        bci += bc.length
        if bc.type == INPLACE_ADD:
            v1 = stack.pop()
            v2 = stack.pop()
            if (type(v1) == int and type(v2) == int):
                stack.push(v1 + v2)
            elif ...
        elif bc.type == LOAD_FAST:
            stack.push(local[bc.varnum])
        ...

Meta-trace (recorded by the meta-interpreter):

    ...
    p1 = getarrayitem(p0, 1)
    p2 = getarrayitem(p0, 0)
    guard_class(p1, int)
    guard_class(p2, int)
    i3 = getfield(p1, intval)
    i4 = getfield(p2, intval)
    i5 = int_add_ovf(i3, i4)
    guard_no_overflow()
    ...

Deoptimization back to interpreter on guard failure
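
To make the trace and the deoptimization step concrete, the sketch below replays the meta-trace for b += a over boxed values. Everything beyond the operation names on the slide is an assumption for illustration: the W_Int and W_Float classes, the 64-bit overflow bounds, the final store back into the locals array (the slide elides what follows guard_no_overflow), and the fallback function standing in for the interpreter.

```python
from dataclasses import dataclass

INT_MAX = 2**63 - 1     # assumed machine-word bounds
INT_MIN = -2**63


@dataclass
class W_Int:
    intval: int


@dataclass
class W_Float:
    floatval: float


class Deoptimize(Exception):
    """A guard failed: leave the compiled trace and resume in the interpreter."""


def guard_class(obj, cls):
    if type(obj) is not cls:
        raise Deoptimize(f"guard_class({cls.__name__}) failed")


def int_add_ovf(x, y):
    # Combines int_add_ovf with the guard_no_overflow() that follows it.
    result = x + y
    if not INT_MIN <= result <= INT_MAX:
        raise Deoptimize("guard_no_overflow failed")
    return result


def compiled_trace(p0):
    p1 = p0[1]                      # p1 = getarrayitem(p0, 1)
    p2 = p0[0]                      # p2 = getarrayitem(p0, 0)
    guard_class(p1, W_Int)
    guard_class(p2, W_Int)
    i3 = p1.intval                  # i3 = getfield(p1, intval)
    i4 = p2.intval                  # i4 = getfield(p2, intval)
    i5 = int_add_ovf(i3, i4)
    p0[1] = W_Int(i5)               # assumed store; not shown on the slide


def unwrap(w_obj):
    return w_obj.intval if isinstance(w_obj, W_Int) else w_obj.floatval


def wrap(value):
    return W_Int(value) if isinstance(value, int) else W_Float(value)


def interpret_inplace_add(p0):
    """Stand-in for the generic interpreter path."""
    p0[1] = wrap(unwrap(p0[1]) + unwrap(p0[0]))


def run(p0):
    try:
        compiled_trace(p0)
        return "trace"
    except Deoptimize:
        interpret_inplace_add(p0)   # deoptimization back to the interpreter
        return "interpreter"
```

All guards run before the store, so a failing guard leaves the locals untouched and the interpreter can safely redo the operation.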

slide-56
SLIDE 56

Cross-layer annotations

  • Application: Python is compiled to Application: Bytecode, which PyPy: Binary interprets.
  • Interpreter: RPython and Framework: RPython are translated to Interpreter + Framework: C, which is compiled to PyPy: Binary.
  • Tracing and optimization produce Meta-trace: JIT IR, which is assembled into JIT-ed code: Binary.

Annotations and measurements attached to these layers:
  • application annotations
  • interpreter annotations
  • framework annotations
  • IR node of interest
  • asm of interest
  • perf counters using PAPI

Dynamic Binary Instrumentation: phase counters, IR node counters
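
One way to picture how annotations feed phase counters: a tool watches the executed instruction stream, switches its current phase whenever it sees an annotation, and charges every other instruction to that phase. The sketch below is a toy model under that assumption. The event tuples and the count_phases function are hypothetical, and the slides do not specify how the annotations are actually encoded or which instrumentation tool reads them.

```python
from collections import Counter


def count_phases(events, initial_phase="interpreter"):
    """Attribute each executed instruction to the most recently annotated phase.

    events is an iterable of (kind, value) pairs where kind is either
    "annotation" (value names the phase being entered) or "insn"
    (value is an ordinary instruction).
    """
    counts = Counter()
    phase = initial_phase
    for kind, value in events:
        if kind == "annotation":
            phase = value           # annotation marks a phase change
        else:
            counts[phase] += 1      # ordinary instruction: charge current phase
    return counts


events = [
    ("insn", "add"),
    ("annotation", "tracing & opt"),
    ("insn", "mov"),
    ("insn", "mov"),
    ("annotation", "JIT"),
    ("insn", "add"),
]
```

With the sample stream above, one instruction is charged to the interpreter, two to tracing and optimization, and one to JIT-compiled code.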

slide-57
SLIDE 57

Cross-layer workload characterization of meta-tracing JIT VMs

▪ PyPy >> CPython: How can meta-tracing JITs significantly improve the performance of multiple dynamic languages?
▪ PyPy << C: Why are meta-tracing JITs for dynamic programming languages still slower than C?

slide-58
SLIDE 58

PyPy with meta-tracing JIT speedup over CPython: Meta-tracing JIT improves performance significantly

[Figure: PyPy speedup over CPython per benchmark (axis from 5 to 30; two off-scale bars annotated 51.2 and 30.2). Benchmarks, in plot order: richards, crypto_pyaes, chaos, telco, spectral-norm, django, twisted_iteration, spitfire_cstringio, raytrace-simple, hexiom2, float, ai, nbody_modified, twisted_pb, fannkuch, genshi_text, pyflate-fast, bm_mako, twisted_names, json_bench, genshi_xml, bm_chameleon, pypy_interp, twisted_tcp, html5lib, meteor-contest, sympy_sum, spitfire, spambayes, rietveld, deltablue, eparse, sympy_expand, slowspitfire, sympy_integrate, pidigits, bm_mdp, sympy_str.]

slide-60
SLIDE 60

PyPy speedup over CPython and Pycket speedup over Racket: Meta-tracing JIT improves performance significantly across multiple languages

[Figure: two bar charts. PyPy speedup (axis from 1 to 12) for binarytrees, chameneosredux, fannkuchredux, fasta, knucleotide, mandelbrot, meteor, nbody, pidigits, regexdna, revcomp, spectralnorm, and threadring. Pycket speedup (axis from 0.5 to 2) for binarytrees, fannkuchredux, fasta, mandelbrot, meteor, nbody, pidigits, revcomp, and spectralnorm.]

slide-62
SLIDE 62

Meta-tracing JIT VM phases

[Figure: two panels, richards and sympy_str. Each plots the active VM phase (calls to AOT funs, JIT, GC, deoptimization, tracing & opt, interpreter) against instructions executed, from 0 to 10B.]

slide-63
SLIDE 63

Meta-tracing JIT VM phases

[Figure: per-benchmark phase breakdown (JIT calls, JIT, GC, deopt, tracing, interp), with benchmarks ordered from fastest on PyPy to slowest on PyPy.]

slide-64
SLIDE 64

The JIT phase: The fastest benchmarks tend to execute JIT-compiled code the most

[Figure: share of each benchmark spent in "JIT + JIT call to AOT" (axis ticks from 0.25 to 1), with benchmarks ordered from fastest on PyPy to slowest on PyPy.]

slide-69
SLIDE 69

Meta-tracing inlines all loops and can hurt performance

Interpreter:

    while True:
        ...
        memcpy(d, s, n)
        ...

    def memcpy(dest, src, n):
        i = 0
        while i < n:
            dest[i] = src[i]
            i += 1

Meta-trace (recorded by the meta-interpreter):

    ...
    guard_gt(i0, 0)
    i3 = getarrayitem(p1, 0)
    setarrayitem(p2, 0, i3)
    guard_gt(i0, 1)
    i4 = getarrayitem(p1, 1)
    setarrayitem(p2, 1, i4)
    guard_gt(i0, 2)
    i5 = getarrayitem(p1, 2)
    setarrayitem(p2, 2, i5)
    ...
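
The unrolling effect can be reproduced with a toy recorder that logs one guard and two array operations per iteration of the interpreter-level memcpy loop. This is a simplified sketch: trace_memcpy and trace_with_aot_call are hypothetical helpers, the operation strings simply mimic the slide, and the single call operation in the second function is an assumed stand-in for how a call to an AOT-compiled function would appear in a trace.

```python
def trace_memcpy(n):
    """Record the operations one traced execution of memcpy(dest, src, n) performs."""
    trace = []
    i = 0
    while i < n:
        # The loop condition held at record time, so it becomes a guard on n (i0).
        trace.append(f"guard_gt(i0, {i})")
        trace.append(f"i{i + 3} = getarrayitem(p1, {i})")
        trace.append(f"setarrayitem(p2, {i}, i{i + 3})")
        i += 1
    return trace


def trace_with_aot_call():
    """If memcpy is left as an AOT-compiled function, the trace holds one call."""
    return ["call(memcpy, p2, p1, i0)"]
```

The inlined trace grows by three operations per copied element and is only valid for the value of n seen during tracing, whereas the AOT call keeps the trace at a single operation regardless of n.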

slide-70
SLIDE 70

Examples of significant AOT-compiled functions

Benchmark        %      Source         Function
ai               19.4   interpreter    setobject.get_storage_from_list
bm_chameleon     17.9   RPython types  rordereddict.ll_call_lookup_function
bm_mako          26.1   RPython lib    runicode.unicode_encode_ucs1_helper
json_bench       18.5   PyPy module    _pypyjson.raw_encode_basestring_ascii
nbody_modified   44.6   external lib   pow

slide-71
SLIDE 71

JIT calls to AOT-compiled functions: AOT-compiled functions can improve performance by avoiding long traces

[Figure: share of each benchmark spent in "JIT" and in "JIT call to AOT functions" (axis ticks from 0.25 to 1), with benchmarks ordered from fastest on PyPy to slowest on PyPy.]

slide-76
SLIDE 76

PyPy bytecode execution rate compared to CPython: Benchmarks that perform the best also warm up the fastest

Breakeven point: the performance of the two VMs at this point is equal

[Figure: two panels, richards (axis ticks 10, 30, 50) and html5lib (axis ticks 1, 2, 3), each plotted against instructions executed from 0 to 10B. Marked on the panels: the PyPy w/o JIT breakeven point and the CPython breakeven point.]

slide-77
SLIDE 77

PyPy bytecode execution rate compared to CPython: Benchmarks that perform the best also warm up the fastest

Breakeven point: the performance of the two VMs at this point is equal

[Figure: three panels, richards (axis ticks 10, 30, 50), html5lib (axis ticks 1, 2, 3), and sympy_str (axis ticks 1, 2), each plotted against instructions executed from 0 to 10B. Marked: the PyPy w/o JIT breakeven point.]
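
A breakeven point could be located from sampled data roughly as follows. This sketch rests on an assumption the slide does not spell out: that each VM is sampled as cumulative bytecodes executed after a given number of machine instructions, and that breakeven is the first sample at which PyPy with the JIT has caught up with the baseline. The function name and the sample numbers are made up for illustration.

```python
def breakeven_point(instructions, pypy_bytecodes, baseline_bytecodes):
    """Return the first instruction count at which PyPy has caught up, or None.

    All three sequences are sampled at the same points; the bytecode
    sequences hold cumulative counts.
    """
    samples = zip(instructions, pypy_bytecodes, baseline_bytecodes)
    for insns, ours, theirs in samples:
        if insns > 0 and ours >= theirs:
            return insns
    return None                      # PyPy never catches up in this window


# Made-up samples: slow warmup, then JIT-compiled code takes over.
insns = [1, 2, 3, 4]
pypy = [1, 3, 8, 20]
cpython = [4, 8, 12, 16]
```

A benchmark that warms up quickly reaches this point after few instructions, while one that never benefits from the JIT returns None within the measured window.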

slide-81
SLIDE 81

Cross-layer workload characterization of meta-tracing JIT VMs

▪ PyPy >> CPython: How can meta-tracing JITs significantly improve the performance of multiple dynamic languages?
    ▪ Meta-tracing JIT compilation significantly improves performance
    ▪ AOT-compiled functions are good for breaking up pathological traces
    ▪ Easier-to-JIT programs perform the best and warm up the fastest
▪ PyPy << C: Why are meta-tracing JITs for dynamic programming languages still slower than C?

slide-83
SLIDE 83

PyPy and Pycket slowdown relative to C/C++: Meta-tracing JITs still have a large performance gap compared with static languages

[Figure: two bar charts. PyPy slowdown (axis from 5 to 30; off-scale bars annotated 1374 and 31) for binarytrees, chameneosredux, fannkuchredux, fasta, knucleotide, mandelbrot, meteor, nbody, pidigits, regexdna, revcomp, spectralnorm, and threadring. Pycket slowdown (axis from 2 to 12) for binarytrees, fannkuchredux, fasta, mandelbrot, meteor, nbody, pidigits, revcomp, and spectralnorm.]

slide-84
SLIDE 84

23

Meta-tracing JIT phases

[Figure: per-benchmark share of execution (axis 0.25 to 1), with benchmarks grouped as fastest on PyPy and slowest on PyPy. Series: JIT; JIT call to AOT functions.]

slide-85
SLIDE 85

24

Meta-tracing JIT IR node breakdown: A big part of JIT-compiled code is likely overhead

[Figure: IR node breakdown per benchmark, with benchmarks grouped as fastest on PyPy and slowest on PyPy.]

slide-88
SLIDE 88

25

Meta-tracing JIT phases

[Figure: per-benchmark share of execution (axis 0.25 to 1), with benchmarks grouped as fastest on PyPy and slowest on PyPy. Series: JIT; JIT call to AOT functions.]

slide-89
SLIDE 89

26

Interpreter phase

[Figure: per-benchmark share of execution (axis 0.25 to 1), with benchmarks grouped as fastest on PyPy and slowest on PyPy. Series: Interpreter.]

slide-90
SLIDE 90

27

Speedup of PyPy without the meta-tracing JIT over CPython: RPython-to-C translation has overheads

[Figure: bar chart of speedup over CPython (axis 0.2 to 1.2). Benchmarks: richards, crypto_pyaes, chaos, telco, spectral-norm, django, twisted_iteration, spitfire_cstringio, raytrace-simple, hexiom2, float, ai, nbody_modified, twisted_pb, fannkuch, genshi_text, pyflate-fast, bm_mako, twisted_names, json_bench, genshi_xml, bm_chameleon, pypy_interp, twisted_tcp, html5lib, meteor-contest, sympy_sum, spitfire, spambayes, rietveld, deltablue, eparse, sympy_expand, slowspitfire, sympy_integrate, pidigits, bm_mdp, sympy_str.]

slide-91
SLIDE 91

28

Tracing and optimization phase

[Figure: per-benchmark share of execution (axis 0.25 to 1), with benchmarks grouped as fastest on PyPy and slowest on PyPy. Series: Tracing & optimization.]

slide-92
SLIDE 92

29

Deoptimization phase

[Figure: per-benchmark share of execution (axis 0.25 to 1), with benchmarks grouped as fastest on PyPy and slowest on PyPy. Series: Deoptimization.]

slide-93
SLIDE 93

30

Garbage collection phase

[Figure: per-benchmark share of execution (axis 0.25 to 1), with benchmarks grouped as fastest on PyPy and slowest on PyPy. Series: Garbage collection.]

slide-94
SLIDE 94

31

Meta-tracing JIT VM overheads: Overheads are diverse and can add up to a significant portion of execution

[Figure: per-benchmark share of execution (axis 0.25 to 1), with benchmarks grouped as fastest on PyPy and slowest on PyPy. Series: Interpreter; Tracing & optimization; Deoptimization; Garbage collection.]
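
As a rough illustration of how a per-phase breakdown of this kind could be computed, the sketch below sums time per VM phase and normalizes it to a fraction of total execution. The sample format, the phase names used as keys, and all numbers are assumptions made for illustration; the slides do not say how measurements were attributed to phases.

```python
from collections import defaultdict


def phase_fractions(samples):
    """Return each phase's share of total time.

    `samples` is an iterable of (phase_name, seconds) pairs. This input
    format is hypothetical, chosen only to keep the example small.
    """
    totals = defaultdict(float)
    for phase, seconds in samples:
        totals[phase] += seconds
    whole = sum(totals.values())
    if whole == 0:
        return {}
    return {phase: t / whole for phase, t in totals.items()}


# Made-up samples, not measured data.
samples = [
    ("jit", 6.0),
    ("interpreter", 1.0),
    ("gc", 2.0),
    ("jit", 1.0),
]
fractions = phase_fractions(samples)
for phase, share in sorted(fractions.items()):
    print(f"{phase}: {share:.2f}")
```

The shares always sum to 1, so several individually small overhead phases can be read off directly as one combined portion of execution.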

slide-95
SLIDE 95

32

Iron law of processor performance: Does meta-tracing VM code execute poorly in addition to executing more instructions?

$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$
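
The iron law named on this slide splits execution time into instruction count, cycles per instruction, and cycle time. A minimal sketch of that decomposition follows, showing how a slowdown can be separated into a "more instructions" factor and a "worse CPI" factor. The `Run` type and every number in it are hypothetical and chosen for round arithmetic; none of this is measured data from the talk.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical hardware-counter totals for one benchmark run."""
    instructions: int   # retired instructions
    cycles: int         # elapsed core cycles
    clock_hz: float     # core clock frequency in Hz


def iron_law_time(run: Run) -> float:
    """Time/program = instructions/program * cycles/instruction * time/cycle."""
    cycles_per_instruction = run.cycles / run.instructions
    seconds_per_cycle = 1.0 / run.clock_hz
    return run.instructions * cycles_per_instruction * seconds_per_cycle


def decompose_slowdown(vm: Run, native: Run) -> dict:
    """Split the VM's slowdown over native code into iron-law factors."""
    instr_factor = vm.instructions / native.instructions
    cpi_factor = (vm.cycles / vm.instructions) / (native.cycles / native.instructions)
    clock_factor = native.clock_hz / vm.clock_hz
    return {
        "instructions": instr_factor,
        "cpi": cpi_factor,
        "clock": clock_factor,
        "total": instr_factor * cpi_factor * clock_factor,
    }


# Made-up counters: the VM run retires 10x the instructions at the same CPI.
native = Run(instructions=1_000_000_000, cycles=800_000_000, clock_hz=2.0e9)
vm = Run(instructions=10_000_000_000, cycles=8_000_000_000, clock_hz=2.0e9)

parts = decompose_slowdown(vm, native)
print(parts)
```

With these invented inputs the whole slowdown comes from the instruction-count factor, which is the situation the slide's question is probing for.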

slide-96
SLIDE 96

33

Comparing meta-tracing JIT IPC to C/C++: Meta-tracing has a similar IPC for most benchmarks

[Figure: bar chart of C/C++ IPC, PyPy IPC, and Pycket IPC (axis 0.375 to 2.25) for binarytrees, chameneosredux, fannkuchredux, fasta, knucleotide, mandelbrot, meteor, nbody, pidigits, regexdna, revcomp, spectralnorm, threadring.]

slide-98
SLIDE 98

34

IPC measurements can be accurately matched against VM phases

[Figure: IPC measurements annotated with VM phases. Phase labels: JIT, GC, deopt, trace, interp.]

slide-101
SLIDE 101

35

Microarchitectural characterization by the VM phase: Meta-tracing-JIT-compiled code has a similar IPC, fewer branches and mispredictions

[Figure: three bar charts by VM phase (Interp, Trace, Deopt, GC, JIT) with C/C++ alongside. Panels: IPC (axis 0.2 to 1.8); Branch per instruction (axis 0.05 to 0.2); Branch MPKI (axis 1 to 11).]
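
The three metrics on this slide (IPC, branches per instruction, and branch MPKI, meaning mispredictions per thousand instructions) all follow from raw hardware counters. A minimal sketch, assuming per-phase counter totals are already available; the `PhaseCounters` name and the counter values are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class PhaseCounters:
    """Hypothetical hardware-counter totals attributed to one VM phase."""
    instructions: int
    cycles: int
    branches: int
    branch_mispredictions: int


def ipc(c: PhaseCounters) -> float:
    """Instructions retired per cycle."""
    return c.instructions / c.cycles


def branches_per_instruction(c: PhaseCounters) -> float:
    """Fraction of retired instructions that are branches."""
    return c.branches / c.instructions


def branch_mpki(c: PhaseCounters) -> float:
    """Branch mispredictions per thousand retired instructions."""
    return 1000 * c.branch_mispredictions / c.instructions


# Made-up counters for two phases, not measured data.
counters = {
    "Interp": PhaseCounters(2_000_000, 1_600_000, 400_000, 16_000),
    "JIT": PhaseCounters(8_000_000, 5_000_000, 800_000, 8_000),
}

for phase, c in counters.items():
    print(f"{phase}: IPC={ipc(c):.2f}, "
          f"branches/instr={branches_per_instruction(c):.3f}, "
          f"MPKI={branch_mpki(c):.1f}")
```

Normalizing branches and mispredictions by instruction count keeps phases of very different lengths comparable on one chart.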

slide-107
SLIDE 107

36

Cross-layer workload characterization of meta-tracing JIT VMs

▪ How can meta-tracing JITs significantly improve the performance of multiple dynamic languages?
  ▪ Meta-tracing JIT compilation significantly improves performance
  ▪ AOT-compiled functions are good for breaking pathological traces
  ▪ Easier-to-JIT programs perform the best and warm up the fastest
▪ Why are meta-tracing JITs for dynamic programming languages still slower than C?
  ▪ Meta-tracing JITs have an order-of-magnitude performance gap vs. C/C++
  ▪ A big part of meta-tracing-JIT-compiled code is likely overhead
  ▪ The meta-tracing JIT VM has a number of other diverse overheads
  ▪ The problem is more instructions, not instructions that execute poorly
  ▪ There is no silver bullet for addressing the performance gap

PyPy >> CPython PyPy << C

slide-108
SLIDE 108

Berkin Ilbeyi1, Carl Friedrich Bolz-Tereick2, and Christopher Batten1

1 Cornell University, 2 Heinrich-Heine-Universität Düsseldorf

Cross-Layer Workload Characterization of Meta-Tracing JIT VMs