Research supported by IBM CAS, NSERC, CITO
Context Threading: A flexible and efficient dispatch technique for virtual machine interpreters
Marc Berndl Benjamin Vitale Mathew Zaleski Angela Demke Brown
Context Threading
yet have JITs
load
[Diagram: a Virtual Program is loaded into a Loaded Program and run by the Virtual Machine Interpreter (Execution Cycle, Bytecode Bodies) on the Real Machine CPU (Execution Cycle, Pipeline, Predictors: Return Address; Wayness (Conditional); Target Address (Indirect)).]
[Diagram: interpreter structure. Internal Representation and Bytecode bodies; Execution Cycle: dispatch, Load Parms.]
void foo(){
  int i=1;
  do{ i+=i; } while(i<64);
}

Javac compiler

 0: iconst_1
 1: istore_1
 2: iload_1
 3: iload_1
 4: iadd
 5: istore_1
 6: iload_1
 7: bipush 64
 9: if_icmplt 2
12: return
while(1){
  switch(opcode){
    case iload_1: .. break;
    case iadd: .. break;
    //and many more..
  }
};
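A minimal sketch of the switch-dispatched loop above, filled in for the foo() example. The opcode numbering, the one-slot absolute branch target, and the name run_switch are assumptions made for this sketch; they are not the JVM encoding and not the authors' interpreter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode numbering for this sketch only. */
enum { ICONST_1, ISTORE_1, ILOAD_1, IADD, BIPUSH, IF_ICMPLT, RETURN };

/* foo(): i = 1; do { i += i; } while (i < 64);
   Simplification: if_icmplt takes an absolute index (2 = first iload_1). */
static const int8_t foo_code[] = {
    ICONST_1, ISTORE_1,
    ILOAD_1, ILOAD_1, IADD, ISTORE_1,
    ILOAD_1, BIPUSH, 64, IF_ICMPLT, 2,
    RETURN
};

/* Runs the program and returns local variable 1 so the result can be checked. */
static int run_switch(const int8_t *code)
{
    int stack[8], *sp = stack;   /* operand stack */
    int local1 = 0;              /* local variable slot 1 */
    const int8_t *vPC = code;    /* virtual program counter */

    while (1) {
        switch (*vPC++) {        /* one shared dispatch point for all bodies */
        case ICONST_1: *sp++ = 1;               break;
        case ISTORE_1: local1 = *--sp;          break;
        case ILOAD_1:  *sp++ = local1;          break;
        case IADD:     sp--; sp[-1] += sp[0];   break;
        case BIPUSH:   *sp++ = *vPC++;          break;
        case IF_ICMPLT: {
            int target = *vPC++;
            sp -= 2;                            /* pop value1, value2 */
            if (sp[0] < sp[1]) vPC = code + target;
            break;
        }
        case RETURN:   return local1;
        default:       assert(!"unknown opcode"); return -1;
        }
    }
}
```

Calling run_switch(foo_code) returns 64, the value of i when the loop exits. Every virtual instruction passes through the same switch, so the hardware sees a single dispatch branch with many targets.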
(as in needle & thread)
iload_1: .. goto *vPC++;
iadd:    .. goto *vPC++;
istore:  .. goto *vPC++;

 0: iconst_1
 1: istore_1
 2: iload_1
 3: iload_1
 4: iadd
 5: istore_1
 6: iload_1
 7: bipush 64
 9: if_icmplt 2
12: return
indirect branch predictor (micro-arch)
Virtual Program:
  … iload_1 iload_1 iadd istore_1 iload_1 bipush 64 if_icmplt 2 …

DTT - Direct Threading Table (indexed by vPC):
  &&iload_1
  &&iload_1
  &&iadd
  &&istore_1
  &&iload_1
  &&bipush
  64
  &&if_icmplt
  …

C implementation:
  iload_1: .. goto *vPC++;
  iadd:    .. goto *vPC++;
  istore:  .. goto *vPC++;
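The DTT and the goto-terminated bodies can be sketched as runnable code using labels-as-values, a GNU C extension available in GCC and Clang. The opcode numbering, the fixed table size, the absolute branch target, and the name run_direct are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode numbering for this sketch only. */
enum { ICONST_1, ISTORE_1, ILOAD_1, IADD, BIPUSH, IF_ICMPLT, RETURN, N_OPS };

/* foo(), with if_icmplt taking an absolute index (2 = first iload_1). */
static const int8_t foo_code[] = {
    ICONST_1, ISTORE_1,
    ILOAD_1, ILOAD_1, IADD, ISTORE_1,
    ILOAD_1, BIPUSH, 64, IF_ICMPLT, 2,
    RETURN
};

#define NEXT goto **vPC++   /* dispatch: jump to the body address stored in the DTT */

static int run_direct(const int8_t *code, int len)
{
    /* Body addresses, in the same order as the opcode enum (GNU C &&label). */
    static void *const body[N_OPS] = {
        &&iconst_1, &&istore_1, &&iload_1, &&iadd, &&bipush, &&if_icmplt, &&ret
    };
    void *dtt[32];               /* direct threading table */
    void **vPC = dtt;
    int stack[8], *sp = stack;
    int local1 = 0;

    assert(len <= 32);
    /* Load time: replace each opcode by its body address; copy operands as-is. */
    for (int i = 0; i < len; i++) {
        int op = code[i];
        dtt[i] = body[op];
        if (op == BIPUSH || op == IF_ICMPLT) {
            i++;
            dtt[i] = (void *)(intptr_t)code[i];
        }
    }

    NEXT;
iconst_1:  *sp++ = 1;                       NEXT;
istore_1:  local1 = *--sp;                  NEXT;
iload_1:   *sp++ = local1;                  NEXT;
iadd:      sp--; sp[-1] += sp[0];           NEXT;
bipush:    *sp++ = (int)(intptr_t)*vPC++;   NEXT;
if_icmplt: {
        int target = (int)(intptr_t)*vPC++;
        sp -= 2;                            /* pop value1, value2 */
        if (sp[0] < sp[1]) vPC = dtt + target;
    }
    NEXT;
ret:       return local1;
}
```

run_direct(foo_code, sizeof foo_code) returns 64. Each body now ends in its own indirect jump, but the target of that jump still depends on the virtual program rather than on the hardware PC.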
[Diagram: several bodies, each ending in GOTO *PC with an unknown target (????).]
Piumarta & Riccardi: Bodies Replicated (Super Instruction)
Ertl & Gregg: Bodies and Dispatch Replicated (replicate iload_1 as copy 1 and copy 2, each ending in its own goto *pc)
Virtual program:
  … iload_1 iload_1 iadd istore_1 iload_1 bipush 64 if_icmplt 2 …

CTT - Context Threading Table (generated code):
  call iload_1
  call iload_1
  call iadd
  call istore_1
  call iload_1
  ..

Bytecode bodies (ret terminated):
  iload_1: .. ret;
  iadd:    .. ret;

Return Branch Predictor Stack
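The shape of the CTT can be imitated in plain C by writing the call sequence by hand. This is only a stand-in: in context threading the calls are generated as native code at load time, whereas here the C compiler emits them (and an optimizing compiler may inline them away). The global interpreter state and the names ctt_fragment and run_fragment are assumptions for this sketch.

```c
#include <assert.h>

/* Interpreter state shared by all bodies (globals keep the sketch short). */
static int stack[8], *sp = stack;
static int local1;

/* Bytecode bodies: each one ends by returning to its caller (ret terminated). */
static void iload_1(void)  { assert(sp < stack + 8); *sp++ = local1; }
static void iadd(void)     { sp--; sp[-1] += sp[0]; }
static void istore_1(void) { local1 = *--sp; }

/* Hand-written stand-in for a CTT fragment: one direct call per virtual
   instruction, covering "iload_1 iload_1 iadd istore_1" (i += i). */
static void ctt_fragment(void)
{
    iload_1();
    iload_1();
    iadd();
    istore_1();
}

/* Runs the fragment once with local variable 1 set to `initial`. */
static int run_fragment(int initial)
{
    sp = stack;
    local1 = initial;
    ctt_fragment();
    return local1;
}
```

run_fragment(1) returns 2 and run_fragment(16) returns 32. Because dispatch is now an ordinary call/return pair, each return goes back to a distinct call site, which is the pattern the return branch predictor stack is built for.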
Virtual program:
  … iload_1 iload_1 iadd istore_1 iload_1 bipush 64 if_icmplt 2 …

CTT (load time generated code):
  call iload_1
  call iload_1
  call iadd
  call istore_1
  call iload_1
  call bipush
  call if_icmplt

Bytecode bodies (ret terminated):
  iload_1: … ret;
  iadd:    … ret;

Branch body (still dispatches through vPC):
  if_icmplt: … goto *vPC++;

DTT (indexed by vPC): contains addresses in CTT, plus operands such as 64
Branch Inlined Into the CTT
Conditional Branch Predictor now mobilized

CTT:
  … …
  target:
    …
    call …
    call iload_1
    if(icmplt) goto target
  …

DTT (vPC): 5, target: … …
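Branch inlining can be sketched in the same hand-written style: the call sequence for foo() with the conditional branch placed directly in the sequence rather than inside a body. As before, this stands in for generated native code; passing the bipush operand as a C argument (instead of fetching it through vPC) and the names icmplt and run_inlined are assumptions for illustration.

```c
#include <assert.h>

/* Interpreter state shared by all bodies (globals keep the sketch short). */
static int stack[8], *sp = stack;
static int local1;

/* Bytecode bodies, each ret terminated. */
static void iconst_1(void) { assert(sp < stack + 8); *sp++ = 1; }
static void iload_1(void)  { assert(sp < stack + 8); *sp++ = local1; }
static void istore_1(void) { local1 = *--sp; }
static void iadd(void)     { sp--; sp[-1] += sp[0]; }
static void bipush(int v)  { assert(sp < stack + 8); *sp++ = v; }

/* Comparison part of if_icmplt: pops value1 and value2, reports value1 < value2. */
static int icmplt(void)    { sp -= 2; return sp[0] < sp[1]; }

/* Hand-written stand-in for the CTT of foo() with the branch inlined. */
static int run_inlined(void)
{
    sp = stack;
    local1 = 0;
    iconst_1();
    istore_1();
target:
    iload_1();
    iload_1();
    iadd();
    istore_1();
    iload_1();
    bipush(64);
    if (icmplt()) goto target;   /* conditional branch lives in the call sequence */
    return local1;
}
```

run_inlined() returns 64. The loop-closing branch is now a native conditional branch at a fixed address, so the conditional branch predictor can learn its behaviour.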
[Figure: SableVM/Java on Pentium 4, Normalized to Direct Threading (axis 0.25 to 1.00). Series: Subroutine, Branch Inlining, Tiny Inlining. Benchmarks: compress, db, jack, javac, jess, mpeg, mtrt, ray, scimark, soot.]
[Figure: Pentium 4, Normalized to Direct Threading (axis 0.25 to 1.00). Series: Subroutine, Branch Inlining, Tiny Inlining. Benchmark labels partly lost in extraction: compress, db, jack, javac, jess, mpeg, mtrt, ray, scimark, …]
[Figure: Normalized to Direct Threading (axis 0.25 to 1.00). Series: Subroutine, Branch Inlining, Tiny Inlining. Groups: java/p4, java/ppc, aml/p4, aml/ppc (the "aml" labels are presumably truncated "ocaml").]
threading to TCL.
bytecode dispatched.
subroutine threading,
[Figure: Cycles per Dispatch, by Tcl or Ocaml Benchmark. Axis: Cycles per virtual instruction, ticks 10^0 to 10^5. Series: Tcl, Ocaml.]