EPFL, 2006
VIRTUAL EXECUTION ENVIRONMENTS
Jan Vitek
with material from Nigel Horspool and Jim Smith
Dagstuhl, June 2005
VEE '05 (c) 2005, J. E. Smith
Abstraction
- Computer systems are built on levels of abstraction
- Higher levels of abstraction hide details at lower levels
- Example: files are an abstraction of a disk
[Figure: the layers of a computer system. Hardware: execution hardware, memory translation, main memory, system interconnect (bus), controllers, I/O devices and networking. Software: operating system (drivers, memory manager, scheduler), libraries, application programs.]
The “Machine”
- Different perspectives on what the Machine is:
- OS developer: the Instruction Set Architecture (ISA), the interface between hardware and software
[Figure: the system layers (execution hardware, memory translation, main memory, system interconnect (bus), I/O devices and networking, operating system, libraries, application programs).]
The “Machine”
- Different perspectives on what the Machine is:
- Compiler developer: the Application Binary Interface (ABI)
[Figure: the system layers (execution hardware, memory translation, main memory, system interconnect (bus), I/O devices and networking, operating system, libraries, application programs).]
The “Machine”
- Different perspectives on what the Machine is:
- Application programmer: the Application Program Interface (API)
[Figure: the system layers (execution hardware, memory translation, main memory, system interconnect (bus), I/O devices and networking, operating system, libraries, application programs).]
System Virtual Machines
- Provide a system environment
- Constructed at ISA level
- Persistent
- Examples: IBM VM/360, VMware, Transmeta Crusoe
[Figure: a host platform running virtual machine monitors (VMMs); each VMM supports a guest OS with its own guest processes, and the guests communicate over a virtual network.]
Process Virtual Machines
- Constructed at ABI level
- Runtime manages guest process
- Not persistent
- Guest processes may intermingle with host processes
- As a practical matter, guest and host OSes are ...
- Dynamic optimizers are a special case
- Examples: IA-32 EL, FX!32, Dynamo
[Figure: guest processes, each with its own runtime, run on the host OS alongside host processes; a guest process can create a host process, and processes share files on disk and communicate over the network.]
High Level Language Virtual Machines
- Raise the level of abstraction
- Process VM (or API VM)
[Figure: Traditional: HLL program -> compiler front-end -> intermediate code -> compiler back-end -> object code (ISA) -> loader -> memory image. HLL VM: HLL program -> compiler -> portable code (virtual ISA) -> VM loader -> VM interpreter/translator -> host instructions.]
Introduction
Usual Programming Language Implementation
[Figure: Source Code -> Compiler Front-End -> Intermediate Code -> Compiler Back-End -> Machine Code; all compile-time actions.]
Another Programming Language Implementation
[Figure: Source Code -> Compiler Front-End -> Intermediate Code -> Interpreter; the interpreter's work consists of run-time actions.]
And Another Implementation
[Figure: Source Code -> Compiler Front-End -> Intermediate Code -> Just-In-Time Compiler -> Machine Code; the JIT compiler's work consists of run-time actions.]
An Overview
Three ways to execute the intermediate representation (IR):
1. compile-time (static) translation to machine code
2. emulation of the IR using an interpreter
3. run-time (dynamic) translation to machine code = JIT (Just-In-Time) compiling
What is IR? IR is code for an idealized computer, a virtual machine.
Examples:

| Language | IR | Implementation(s) |
|---|---|---|
| Java | JVM bytecode | interpreter, JIT |
| C# | MSIL | JIT (but may be pre-compiled) |
| Prolog | WAM code | compiled, interpreted |
| Forth | bytecode | interpreted |
| Smalltalk | bytecode | interpreted |
| Pascal | p-code | interpreted, compiled |
| C, C++ | | compiled |
| Perl 6 | PVM | interpreted |
| Perl 6 | Parrot | interpreted, JIT |
| Python | | interpreted |
| sh, bash, csh | | interpreted |
Toy Bytecode File Format
We need a representation scheme for the bytecode. A simple ...
As well as 0 for STOP, we will use this opcode numbering:

| LDI | LD | ST | ADD | SUB | EQ | NE | GT | JMP | JMPF | READ | WRITE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |

The order of the bytes in the integer operands is important. We will use big-endian order.
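As a sketch of the decoding side of this format, the C below reads 2-byte and 4-byte big-endian operands. The names `fetch2` and `fetch4` match the helpers called in the interpreter that follows, but the signatures used here (code array plus byte offset) and the opcode `enum` are assumptions for illustration.

```c
#include <stdint.h>

/* Opcode numbering from the table above (STOP = 0). */
enum { STOP, LDI, LD, ST, ADD, SUB, EQ, NE, GT, JMP, JMPF, READ, WRITE };

/* Reads a 2-byte unsigned operand, most significant byte first. */
static int fetch2(const uint8_t *code, int pc) {
    return (code[pc] << 8) | code[pc + 1];
}

/* Reads a 4-byte signed operand, most significant byte first. */
static int32_t fetch4(const uint8_t *code, int pc) {
    uint32_t v = ((uint32_t)code[pc] << 24) | ((uint32_t)code[pc + 1] << 16) |
                 ((uint32_t)code[pc + 2] << 8) | (uint32_t)code[pc + 3];
    return (int32_t)v;   /* reinterpret as two's complement */
}
```

For example, the bytes `00 00 01 2C` after an LDI opcode decode to 300, and `00 36` after a JMPF decodes to the target 54.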
The Classic Interpreter Approach
It emulates the fetch/decode/execute stages of a computer.
```c
for ( ; ; ) {
    opcode = code[pc++];                  /* fetch */
    switch (opcode) {                     /* decode and execute */
    case LDI: val = fetch4(pc); pc += 4; push(val); break;
    case LD:  num = fetch2(pc); pc += 2; push( variable[num] ); break;
    ...
    case SUB: right = pop(); left = pop(); push( left-right ); break;
    ...
```
The Classic Interpreter Approach, cont’d
```c
    case JMP:  pc = fetch2(pc); break;
    case JMPF: val = pop(); if (val) pc += 2; else pc = fetch2(pc); break;  /* jump if false */
    ...
    } /* end of switch */
} /* end of for loop */
```
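To see the pieces working together, here is a minimal runnable switch-based interpreter in C. It assumes 2-byte operands for LD, ST, JMP and JMPF and a 4-byte operand for LDI (matching the `fetch2`/`fetch4` calls above); with these sizes, the jump targets 8, 41, 51 and 54 in the GCD program shown on a later slide land exactly on instruction boundaries. The `run` function, its fixed-size arrays, and the use of in-memory arrays for READ and WRITE are hypothetical simplifications, and there are no bounds checks.

```c
#include <stdint.h>

enum { STOP, LDI, LD, ST, ADD, SUB, EQ, NE, GT, JMP, JMPF, READ, WRITE };

static int fetch2(const uint8_t *c, int pc) { return (c[pc] << 8) | c[pc + 1]; }

static int32_t fetch4(const uint8_t *c, int pc) {
    return (int32_t)(((uint32_t)c[pc] << 24) | ((uint32_t)c[pc + 1] << 16) |
                     ((uint32_t)c[pc + 2] << 8) | (uint32_t)c[pc + 3]);
}

/* Interprets code; READ takes values from in[], WRITE appends to out[].
   Returns the number of values written, or -1 on an unknown opcode. */
static int run(const uint8_t *code, const int32_t *in, int32_t *out) {
    int32_t stack[64], variable[256], left, right;
    int sp = 0, pc = 0, nin = 0, nout = 0;
    for (;;) {
        int opcode = code[pc++];                  /* fetch */
        switch (opcode) {                         /* decode and execute */
        case STOP:  return nout;
        case LDI:   stack[sp++] = fetch4(code, pc); pc += 4; break;
        case LD:    stack[sp++] = variable[fetch2(code, pc)]; pc += 2; break;
        case ST:    variable[fetch2(code, pc)] = stack[--sp]; pc += 2; break;
        case ADD:   right = stack[--sp]; left = stack[--sp]; stack[sp++] = left + right;  break;
        case SUB:   right = stack[--sp]; left = stack[--sp]; stack[sp++] = left - right;  break;
        case EQ:    right = stack[--sp]; left = stack[--sp]; stack[sp++] = left == right; break;
        case NE:    right = stack[--sp]; left = stack[--sp]; stack[sp++] = left != right; break;
        case GT:    right = stack[--sp]; left = stack[--sp]; stack[sp++] = left > right;  break;
        case JMP:   pc = fetch2(code, pc); break;
        case JMPF:  if (stack[--sp]) pc += 2; else pc = fetch2(code, pc); break;
        case READ:  stack[sp++] = in[nin++]; break;
        case WRITE: out[nout++] = stack[--sp]; break;
        default:    return -1;
        }
    }
}

/* The GCD program from the slides; each comment gives the byte offset. */
static const uint8_t gcd_code[] = {
    READ, ST, 0, 0, READ, ST, 0, 1,                /*  0: a = read(); b = read() */
    LD, 0, 0, LD, 0, 1, NE, JMPF, 0, 54,           /*  8: while (a != b) */
    LD, 0, 0, LD, 0, 1, GT, JMPF, 0, 41,           /* 18:   if (a > b) */
    LD, 0, 0, LD, 0, 1, SUB, ST, 0, 0, JMP, 0, 51, /* 28:     a = a - b */
    LD, 0, 1, LD, 0, 0, SUB, ST, 0, 1,             /* 41:   else b = b - a */
    JMP, 0, 8,                                     /* 51: end while */
    LD, 0, 0, WRITE, STOP                          /* 54: write(a) */
};
```

Running `gcd_code` with inputs 48 and 18 writes a single value, 6. Note that SUB computes `left - right`, the order the GCD program relies on.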
Critique
- Flexibility: an interpreter makes it easy to add run-time services, such as checking for uninitialized variables, debugging, ... anything.
- ... less than the equivalent compiled program.
- Speed: the interpreted program runs much more slowly than the equivalent compiled program. The slowdown is 1 to 3 orders of magnitude (depending on the language).
What can we do to speed up our interpreter?
Improving the Classic Interpreter
- Verify the bytecode before beginning execution, thus avoiding run-time checks. We should also be able to verify that stacks cannot overflow or underflow.
- Expand the bytecode when it is loaded, mapping opcode numbers to addresses of the opcode implementations ...
[Figure: in the original bytecode, an LDI opcode is followed by a 4-byte unaligned integer operand.]
Classic Interpreter with Operation Addresses
The bytecode file ... as in our example
READ; ST 0; READ; ST 1; LD 0; LD 1; NE; JMPF 54; LD 0; LD 1; GT; JMPF 41; LD 0; LD 1; SUB; ST 0; JMP 51; LD 1; LD 0; SUB; ST 1; JMP 8; LD 0; WRITE; STOP
would be expanded into the following values when loaded into the interpreter’s bytecode array:
&READ &ST 0 &READ &ST 1 &LD 0 ...
and so on. Each value is either a 4-byte address or a 4-byte operand. (Jump targets such as 54 are byte offsets in the file, so the loader must also convert them to indices in the expanded array.)
Classic Interpreter, cont’d
Now the interpreter dispatch loop becomes:
```c
pc = 0;                        /* index of first instruction */
DISPATCH:
    goto *code[pc++];          /* GNU C computed goto; code[] holds void* values */
LDI:
    val = (intptr_t)code[pc++];  push(val);            goto DISPATCH;
LD:
    num = (intptr_t)code[pc++];  push( variable[num] ); goto DISPATCH;
...
```
The C code can be a bit better still ...
Classic Interpreter, cont’d
Recommended C style for accessing arrays is to use a pointer to the array elements, so we get:
```c
pc = &code[0];                 /* pointer to first instruction */
DISPATCH:
    goto **pc++;               /* jump to the address stored at *pc */
LDI:
    val = (intptr_t)*pc++;  push(val);            goto DISPATCH;
LD:
    num = (intptr_t)*pc++;  push( variable[num] ); goto DISPATCH;
...
```
But let’s step back and see a new technique:
(Direct) Threaded Code Interpreters
Reference: James R. Bell, Communications of the ACM, 1973
[Figure: in the classic interpreter, the code for each op returns to a central dispatch, which fetches the next opcode; in a threaded code interpreter, the code for each op jumps directly to the code for the next op.]
Threaded Code Interpreters, cont’d
As before, the bytecode is a sequence of addresses (intermixed with operands needed by the ops) ...
&LDI 99 &LDI 23 &ADD &ST 5 ...
The interpreter code looks like this ...
```c
/* start it going */
pc = 0;
goto *code[pc++];
LDI:
    operand = (intptr_t)code[pc++];
    push(operand);
    goto *code[pc++];
ADD:
    right = pop(); left = pop();
    push(left+right);
    goto *code[pc++];
...
```
Threaded Code Interpreters, cont’d
As before, better C style is to use a pointer to the next element in the code ... This makes the implementation very similar to Bell’s, who programmed for the DEC PDP-11.
```c
/* start it going */
pc = &code[0];
goto *(*pc++);
LDI:
    operand = (intptr_t)*pc++;
    push(operand);
    goto *(*pc++);
ADD:
    right = pop(); left = pop();
    push(left+right);
    goto *(*pc++);
...
```
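A self-contained version of the direct-threaded technique, written with the GNU C labels-as-values extension (`&&label`, `goto *ptr`), which GCC and Clang accept. It is a sketch: the opcode subset, the name `run_threaded`, and the small loader that replaces each opcode by its handler address are assumptions. The handler table lives inside the function because label addresses are only visible there.

```c
#include <stdint.h>

enum { STOP, LDI, ADD, SUB, WRITE };     /* hypothetical opcode subset */

/* Loads a word-coded program (opcodes and operands as intptr_t) into
   threaded code and runs it; returns the last value written. */
static intptr_t run_threaded(const intptr_t *prog, int len) {
    static void *const handler[] = { &&op_STOP, &&op_LDI, &&op_ADD, &&op_SUB, &&op_WRITE };
    void *code[64];                      /* assumes len <= 64 */
    void **pc;
    intptr_t stack[16], left, right, written = 0;
    int sp = 0;

    /* Loader: each opcode becomes the address of its implementation;
       LDI's operand is copied into the following slot. */
    for (int i = 0; i < len; ) {
        int op = (int)prog[i];
        code[i] = handler[op];
        i++;
        if (op == LDI) { code[i] = (void *)prog[i]; i++; }
    }

    pc = &code[0];
    goto *(*pc++);                       /* start it going */
op_LDI:
    stack[sp++] = (intptr_t)*pc++;
    goto *(*pc++);
op_ADD:
    right = stack[--sp]; left = stack[--sp];
    stack[sp++] = left + right;
    goto *(*pc++);
op_SUB:
    right = stack[--sp]; left = stack[--sp];
    stack[sp++] = left - right;
    goto *(*pc++);
op_WRITE:
    written = stack[--sp];
    goto *(*pc++);
op_STOP:
    return written;
}
```

For instance, the program `LDI 99; LDI 23; ADD; WRITE; STOP` returns 122. Each handler ends with its own dispatch, so there is no central loop and no switch.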
Further Improvements to Interpreters ...
A problem still being researched (see the papers in the annual IVME workshop). Speed improvement ideas include:
...
Space improvement ideas (for embedded systems?) include:
...
The Java Virtual Machine
A Main Reference Source
The Java™ Virtual Machine Specification (2nd Ed.) by Tim Lindholm & Frank Yellin, Addison-Wesley, 1999. The book is online and available for download:
http://java.sun.com/docs/books/vmspec/
The Java Classfile
JVM Runtime Behaviour
VM Startup and Exit
Startup
Class Loading
Class object
ClassCircularityError, NoClassDefFoundError
Class Loaders
loader
java -verbose:class Test
(the bootstrap class loader is never unreachable)
Class Linking - 1. Verification
Class Linking - 2. Preparation
Static fields are created and set to their default values (this does not run the static initializers)
Class Linking - 3. Resolution
Symbolic references are replaced by direct references; both eager and lazy resolution are allowed
Class Initialization
Happens once just before first instance creation, or first use
Instance Creation/Finalisation
Class instance creation:
1. Allocate space for all the instance variables (including the inherited ones)
2. Initialize them with their default values
3. Call the appropriate constructor (the parent's constructor runs first)
JVM Architecture
The internal runtime structure of the JVM consists of:
Run-Time Data Areas (Venners Figure 5-1)
[Figure: the class loader subsystem loads class files into the runtime data areas (method area, heap, Java stacks, pc registers, native method stacks); the execution engine runs the code and reaches the native method libraries through the native method interface.]
Datatypes of the JVM (Venners 5-4)
- Primitive types: numeric types, made up of integral types (byte, short, int, long, char) and floating-point types (float, double), plus returnValue
- Reference types: class types, interface types, array types
- long and double take two words
Control Transfer
Switch statement implementation
Comparison operations for long, float & double types
Load and Store Instructions
Transferring values between local variables and operand stack
and special cases of the above: iload_0, iload_1 ...
Pushing constants onto the operand stack
and special cases: iconst_0, iconst_1, ...
Arithmetic Operations
Operands are normally taken from the operand stack and the result pushed back there
Bitwise Operations
Type Conversion Operations
Widening Operations
Narrowing Operations
Operand Stack Management
Object Creation and manipulation
aastore
Method Invocation / Return
but not speed.
argument)
types
either class/interface.
- INVOKEVIRTUAL: instance method
- INVOKEINTERFACE: interface method
- INVOKESPECIAL: constructor/private/super method
- INVOKESTATIC: class method
A method reference names the class, the method and its descriptor:
foo/baz/Myclass/myMethod(Ljava/lang/String;)V
(classname: foo/baz/Myclass; methodname: myMethod; descriptor: (Ljava/lang/String;)V)
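The descriptor also tells the VM how many words of arguments to take from the operand stack. A minimal C sketch, assuming only the standard descriptor characters (B, C, D, F, I, J, S, Z, L...;, [ and V); `arg_words` is a hypothetical helper, not part of any JVM API. It counts long and double as two words and does not include the extra `this` word passed to instance methods.

```c
#include <string.h>

/* Consumes one field type at *p and returns its size in words
   (2 for long and double, 1 otherwise), or -1 if malformed. */
static int field_words(const char **p) {
    const char *s = *p;
    int words = 1;
    switch (*s) {
    case 'B': case 'C': case 'F': case 'I': case 'S': case 'Z':
        s++;
        break;
    case 'J': case 'D':
        words = 2;
        s++;
        break;
    case 'L':                            /* Lclassname; */
        s = strchr(s, ';');
        if (s == NULL) return -1;
        s++;
        break;
    case '[':                            /* array of any element type */
        while (*s == '[') s++;
        if (field_words(&s) < 0) return -1;
        break;                           /* an array is one reference */
    default:
        return -1;
    }
    *p = s;
    return words;
}

/* Returns the number of argument words in a method descriptor such as
   "(Ljava/lang/String;)V", or -1 if it is malformed. */
static int arg_words(const char *desc) {
    const char *p = desc;
    int total = 0;
    if (*p++ != '(') return -1;
    while (*p != ')') {
        int w = field_words(&p);
        if (w < 0) return -1;
        total += w;
    }
    p++;                                 /* skip ')' */
    if (*p == 'V') p++;                  /* void return type */
    else if (field_words(&p) < 0) return -1;
    return *p == '\0' ? total : -1;
}
```

For the descriptor of `myMethod` above, `arg_words("(Ljava/lang/String;)V")` is 1; `(IJ[D)I` needs 4 words (1 + 2 + 1).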
If C is interface, throw IncompatibleClassChangeError.
done faster?
creation, a class must be initialized.
and running the static initializers.
must check the status of the class.
reduce the space requirements of exception handler’s finally clauses.
```java
int bar(int i) {
    try {
        if (i == 3) return this.foo();
    } finally {
        this.ladida();
    }
    return i;
}
```
```
01 iload_1              // Push i
02 iconst_3             // Push 3
03 if_icmpne 10         // Goto 10 if i does not equal 3
                        // Then case of if statement
04 aload_0              // Push this
05 invokevirtual foo    // Call this.foo
06 istore_2             // Save result of this.foo()
07 jsr 13               // Do finally block before returning
08 iload_2              // Recall result from this.foo()
09 ireturn              // Return result of this.foo()
                        // Else case of if statement
10 jsr 13               // Do finally block before continuing
                        // Return statement following try statement
11 iload_1              // Push i
12 ireturn              // Return i
                        // finally block
13 astore_3             // Save return address in variable 3
14 aload_0              // Push this
15 invokevirtual ladida // Call this.ladida()
16 ret 3                // Return to address saved in variable 3
                        // Exception handler for try body
17 astore_2             // Save exception
18 jsr 13               // Do finally block
19 aload_2              // Recall exception
20 athrow               // Rethrow exception
                        // Exception handler for finally body
21 athrow               // Rethrow exception
```

| Region | Target |
|---|---|
| 1–12 | 17 |
| 13–16 | 21 |
2427 bytes [Freund98].
tools.
[Figure: left, number of methods by subroutine size in bytes (JRE packages); right, number of methods by growth in code size in bytes after inlining subroutines (JRE).]
Figure 7. Sizes of subroutines and size increase after inlining.
From Artho, Biere, Bytecode 2005.
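[Pugh99]

| | swingall | javac |
|---|---|---|
| Total size | 3,265 | 516 |
| Total, excluding jar overhead | 3,010 | 485 |
| Field definitions | 36 | 7 |
| Method definitions | 97 | 10 |
| Code | 768 | 114 |
| Other | 72 | 12 |
| Constant pool | 2,037 | 342 |
| Constant pool: Utf8 entries | 1,704 | 295 |
| Utf8 entries, if shared | 372 | 56 |
| Utf8 entries, if shared and factored | 235 | 26 |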
[Bradley,Horspool,Vitek,98]
icebrowserbean.jar

| File Format | Size | % orig. size |
|---|---|---|
| JAR file, uncompressed | 260,178 | 100.0% |
| JAR file, compressed | 132,600 | 51.0% |
| Clazz | 97,341 | 37.4% |
| Gzip | 97,223 | 37.4% |
| Jazz | 59,321 | 22.8% |
Java Virtual Machine, part three
Verification
semantics, and
There are many aspects to verification
Verification, cont’d
Some Checks during Loading
Additional Checks after/during Loading
Additional Checks after/during Loading, cont’d
method signatures etc)
A Final Check (required before method is executed)
This last check is very complicated (so complicated that Sun got it wrong a few times)
Verifying Bytecode
The requirements
a local variable) are in range.
destination which is in range and is the start of an instruction
correct datatypes
the class instance is used
The requirements, cont’d
the beginnings of instructions, and the start must be before the end
instruction
Sun’s Verification Algorithm
A before state is associated with each instruction. The state is:
- the contents of the operand stack (the stack height and the datatype of each element), plus
- the contents of each local variable (uninitialized or unusable, or the datatype).
A datatype is integral, long, float, double or any reference type.
Each instruction has an associated changed bit: ...
Sun’s Verification Algorithm, cont’d
```
do forever {
    find an instruction I whose changed bit is true;
    if no such instruction exists, return SUCCESS;
    set changed bit of I to false;
    state S = before state of I;
    for each operand on stack used by I:
        verify that the stack element in S has the correct datatype,
        and pop the datatype from the stack in S;
    for each local variable used by I:
        verify that the variable is initialized and has the correct datatype in S;
    if I pushes a result on the stack:
        verify that the stack in S does not overflow,
        and push the datatype onto the stack in S;
    if I modifies a local variable:
        record the datatype of the variable in S;
    ... continued
```
Sun’s Verification Algorithm, cont’d
```
    determine SUCC, the set of instructions which can follow I;
        (Note: this includes exception handlers for I)
    for each instruction J in SUCC do
        merge the next state of I (that is, S) with the before state of J,
        and set J’s changed bit if the before state changed;
        (Special case: if J is a destination because of an exception, then a
         special stack state containing a single instance of the exception
         object is created for merging with the before state of J.)
} // end of do forever
```
Verification fails if a datatype does not match what is required by the instruction, if the stack underflows or overflows, or if two stacks being merged have different heights.
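Sun’s Verification Algorithm, cont’d
Merging two states
- Stacks: the two stacks must have the same height; the result is obtained by merging the types of corresponding elements.
- Local variables: the result is obtained by merging the types of corresponding variables.
The result of merging two types:
- if they are identical, the result is that type;
- if they are both references, then the result is the first common superclass (lowest common ancestor in class hierarchy);
- otherwise, the result is T (uninitialized/unusable).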
Example (Leroy, Figure 1):
```java
static int factorial(int n) {
    int res;
    for (res = 1; n > 0; n--) res = res * n;
    return res;
}
```
Corresponding JVM bytecode:
```
method static int factorial(int), 2 variables, 2 stack slots
 0: iconst_1      // push the integer constant 1
 1: istore_1      // store it in variable 1 (res)
 2: iload_0       // push variable 0 (the n parameter)
 3: ifle 14       // if negative or zero, go to PC 14
 6: iload_1       // push variable 1 (res)
 7: iload_0       // push variable 0 (n)
 8: imul          // multiply the two integers at top of stack
 9: istore_1      // pop result and store it in variable 1
10: iinc 0, -1    // decrement variable 0 (n) by 1
11: goto 2        // go to PC 2
14: iload_1       // load variable 1 (res)
15: ireturn       // return its value to caller
```
Sun’s Analysis Algorithm
where I = integral; T = uninitialized/unusable; ? = unknown

| Chng’d | Stack before | Locals before | Instruction | Stack after | Locals after |
|---|---|---|---|---|---|
| X | () | (I,T) | 0: iconst_1 | | |
| | | (?,?) | 1: istore_1 | | |
| | | (?,?) | 2: iload_0 | | |
| | | (?,?) | 3: ifle 14 | | |
| | | (?,?) | 6: iload_1 | | |
| | | (?,?) | 7: iload_0 | | |
| | | (?,?) | 8: imul | | |
| | | (?,?) | 9: istore_1 | | |
| | | (?,?) | 10: iinc 0, -1 | | |
| | | (?,?) | 11: goto 2 | | |
| | | (?,?) | 14: iload_1 | | |
| | | (?,?) | 15: ireturn | | |
Sun’s Analysis Algorithm - after 1 step

| Chng’d | Stack before | Locals before | Instruction | Stack after | Locals after |
|---|---|---|---|---|---|
| | | (I,T) | 0: iconst_1 | (I) | (I,T) |
| X | (I) | (I,T) | 1: istore_1 | | |
| | | (?,?) | 2: iload_0 | | |
| | | (?,?) | 3: ifle 14 | | |
| | | (?,?) | 6: iload_1 | | |
| | | (?,?) | 7: iload_0 | | |
| | | (?,?) | 8: imul | | |
| | | (?,?) | 9: istore_1 | | |
| | | (?,?) | 10: iinc 0, -1 | | |
| | | (?,?) | 11: goto 2 | | |
| | | (?,?) | 14: iload_1 | | |
| | | (?,?) | 15: ireturn | | |
Sun’s Analysis Algorithm - after 4 steps

| Chng’d | Stack before | Locals before | Instruction | Stack after | Locals after |
|---|---|---|---|---|---|
| | | (I,T) | 0: iconst_1 | | |
| | | (I,T) | 1: istore_1 | | |
| | | (I,I) | 2: iload_0 | | |
| | | (I,I) | 3: ifle 14 | () | (I,I) |
| X | () | (I,I) | 6: iload_1 | | |
| | | (?,?) | 7: iload_0 | | |
| | | (?,?) | 8: imul | | |
| | | (?,?) | 9: istore_1 | | |
| | | (?,?) | 10: iinc 0, -1 | | |
| | | (?,?) | 11: goto 2 | | |
| X | () | (I,I) | 14: iload_1 | | |
| | | (?,?) | 15: ireturn | | |
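Analysis Algorithm - after 12 steps

| Chng’d | Stack before | Locals before | Instruction | Stack after | Locals after |
|---|---|---|---|---|---|
| | | (I,T) | 0: iconst_1 | | |
| | | (I,T) | 1: istore_1 | | |
| | | (I,I) | 2: iload_0 | | |
| | | (I,I) | 3: ifle 14 | | |
| | | (I,I) | 6: iload_1 | | |
| | | (I,I) | 7: iload_0 | | |
| | | (I,I) | 8: imul | | |
| | | (I,I) | 9: istore_1 | | |
| | | (I,I) | 10: iinc 0, -1 | | |
| | | (I,I) | 11: goto 2 | | |
| | | (I,I) | 14: iload_1 | | |
| | | (I,I) | 15: ireturn | () | (I,I) |

No changed bits remain, and we have completed the verification without error.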
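The tables above can be reproduced mechanically. The C below is a sketch of Sun's algorithm restricted to the int-only instructions of the factorial example; the `Insn` encoding, the `verify` function and the two-type domain (I and T) are assumptions. Stacks being merged must agree exactly, while locals that disagree become T. Picking the lowest-numbered changed instruction each time, it accepts Leroy's listing (offsets taken as given) after 12 steps, as in the last table.

```c
#include <string.h>

enum Type { UNUSABLE, INT };             /* T and I in the tables */
enum Op { ICONST, ILOAD, ISTORE, IMUL, IINC, IFLE, GOTO, IRETURN };

#define MAXSTACK 2
#define MAXLOCALS 2
#define MAXINSNS 16

struct Insn { int pc, op, arg; };        /* arg: local index or branch target */
struct State { int seen, changed, sp; enum Type stack[MAXSTACK], locals[MAXLOCALS]; };

static const struct Insn factorial[] = {  /* Leroy's factorial */
    {0, ICONST, 1}, {1, ISTORE, 1}, {2, ILOAD, 0}, {3, IFLE, 14},
    {6, ILOAD, 1}, {7, ILOAD, 0}, {8, IMUL, 0}, {9, ISTORE, 1},
    {10, IINC, 0}, {11, GOTO, 2}, {14, ILOAD, 1}, {15, IRETURN, 0}
};

static int index_of(const struct Insn *code, int n, int pc) {
    for (int i = 0; i < n; i++)
        if (code[i].pc == pc) return i;
    return -1;
}

/* Merges state s into before state b; returns 0 if verification fails. */
static int merge(struct State *b, const struct State *s) {
    if (!b->seen) { *b = *s; b->seen = 1; b->changed = 1; return 1; }
    if (b->sp != s->sp) return 0;        /* stacks of different heights */
    for (int k = 0; k < s->sp; k++)
        if (b->stack[k] != s->stack[k]) return 0;
    for (int k = 0; k < MAXLOCALS; k++)
        if (b->locals[k] != s->locals[k] && b->locals[k] != UNUSABLE) {
            b->locals[k] = UNUSABLE;     /* merge(I, T) = T */
            b->changed = 1;
        }
    return 1;
}

/* Returns 1 if code (n <= MAXINSNS) verifies; *steps counts iterations.
   Entry state: empty stack, local 0 is int, local 1 is unusable. */
static int verify(const struct Insn *code, int n, int *steps) {
    struct State before[MAXINSNS];
    memset(before, 0, sizeof before);    /* all locals UNUSABLE, not seen */
    before[0].seen = before[0].changed = 1;
    before[0].locals[0] = INT;
    *steps = 0;
    for (;;) {
        int i = 0, succ[2], nsucc = 0;
        while (i < n && !before[i].changed) i++;    /* lowest index first */
        if (i == n) return 1;                       /* SUCCESS */
        before[i].changed = 0;
        (*steps)++;
        struct State s = before[i];
        int arg = code[i].arg;
        switch (code[i].op) {
        case ICONST:  if (s.sp == MAXSTACK) return 0;
                      s.stack[s.sp++] = INT; break;
        case ILOAD:   if (s.locals[arg] != INT || s.sp == MAXSTACK) return 0;
                      s.stack[s.sp++] = INT; break;
        case ISTORE:  if (s.sp == 0 || s.stack[--s.sp] != INT) return 0;
                      s.locals[arg] = INT; break;
        case IMUL:    if (s.sp < 2 || s.stack[s.sp - 1] != INT || s.stack[s.sp - 2] != INT) return 0;
                      s.sp--; break;     /* pop two ints, push one */
        case IINC:    if (s.locals[arg] != INT) return 0;
                      break;
        case IFLE:    if (s.sp == 0 || s.stack[--s.sp] != INT) return 0;
                      succ[nsucc++] = index_of(code, n, arg); break;
        case GOTO:    succ[nsucc++] = index_of(code, n, arg); break;
        case IRETURN: if (s.sp == 0 || s.stack[--s.sp] != INT) return 0;
                      break;
        }
        if (code[i].op != GOTO && code[i].op != IRETURN) {
            if (i + 1 >= n) return 0;    /* would fall off the end */
            succ[nsucc++] = i + 1;
        }
        for (int k = 0; k < nsucc; k++)
            if (succ[k] < 0 || !merge(&before[succ[k]], &s)) return 0;
    }
}
```

A method whose first instruction is `iload_1` fails immediately, because local 1 is still T; a loop that pushes one more value on each iteration fails at the merge, because the stack heights differ.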
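Some of the Lattice of Types (Leroy, Figure 3)
[Figure: part of the lattice of types, from T at the top down to ⊥ at the bottom. Directly below T are int, float and Object. The reference types below Object include C (with subclasses D and E) and the array types int[], float[], Object[], C[], D[], E[], Object[][], C[][], D[][], E[][]; null lies below all reference types.]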
class C { } class D extends C { } class E extends C { }
not in Leroy’s lattice
Merging Types
lub(t1,t2): the least upper bound of types t1 and t2 in the lattice
... (the well-foundedness property). The step in Sun’s verification algorithm where types are merged is implemented as lub. The finiteness property guarantees that Sun’s algorithm will converge in a finite number of steps.
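A sketch of lub in C for a small part of the lattice: T, int, float, null, and the class tree Object > C > {D, E} from the declarations above. Array types are omitted, and the `T_*` encoding and the `superclass` table are hypothetical. Two class types meet at their lowest common ancestor, null is below every class type, and any other pair of distinct types meets at T.

```c
/* Hypothetical encoding of a few lattice elements. */
enum { T_TOP, T_INT, T_FLOAT, T_NULL, T_OBJECT, T_C, T_D, T_E, T_COUNT };

/* Direct superclass of each class type; -1 for Object and non-classes. */
static const int superclass[T_COUNT] = {
    [T_TOP] = -1, [T_INT] = -1, [T_FLOAT] = -1, [T_NULL] = -1,
    [T_OBJECT] = -1, [T_C] = T_OBJECT, [T_D] = T_C, [T_E] = T_C
};

static int is_ref(int t) { return t == T_NULL || t >= T_OBJECT; }

static int depth(int t) {                /* distance below Object */
    int d = 0;
    while (superclass[t] >= 0) { t = superclass[t]; d++; }
    return d;
}

/* Least upper bound of two types. */
static int lub(int a, int b) {
    if (a == b) return a;
    if (!is_ref(a) || !is_ref(b)) return T_TOP;   /* e.g. int and float */
    if (a == T_NULL) return b;                     /* null is below every class */
    if (b == T_NULL) return a;
    int da = depth(a), db = depth(b);
    while (da > db) { a = superclass[a]; da--; }   /* climb to equal depth */
    while (db > da) { b = superclass[b]; db--; }
    while (a != b) { a = superclass[a]; b = superclass[b]; }  /* then together */
    return a;
}
```

With the classes declared above, `lub(T_D, T_E)` is `T_C` and `lub(T_INT, T_FLOAT)` is `T_TOP`. Because every chain in this finite lattice is finite, repeated merging can only move a type upward a bounded number of times.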
Garbage Collection (based on chapter 2 of Jones and Lins)
Reference Counting
- ... (references to files come from directories)
- Each object holds a count of the number of references to the object; if the count becomes zero, the storage of the object is immediately reclaimed (put into a free list?)
Pseudocode for Reference Counting
```
// rc is the reference count field in the object

// called by program to get a new object instance
function New():
    if freeList == null then report an error;
    newcell = allocate();
    newcell.rc = 1;
    return newcell;

// called by program to overwrite a pointer variable R
// with another pointer value S
procedure Update(var R, S):
    if S != null then S.rc += 1;
    if *R != null then delete(*R);
    *R = S;

// called by New
function allocate():
    newcell = freeList;
    freeList = freeList.next;
    return newcell;

// called by Update
procedure delete(T):
    T.rc -= 1;
    if T.rc == 0 then
        foreach pointer U held inside object T do
            if *U != null then delete(*U);
        free(T);

// called by delete
procedure free(N):
    N.next = freeList;
    freeList = N;
```
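The same algorithm in C, as a sketch under some assumptions: a fixed pool of `NCELLS` cells, each with exactly two pointer fields (`left`, `right`), and NULL returned instead of reporting an error. The function names (`rc_new`, `rc_update`, `rc_delete`, `rc_free`) are hypothetical renderings of New, Update, delete and free, and `free_cells` is a helper added for inspection. Incrementing S before deleting *R makes `rc_update(&p, p)` safe.

```c
#include <stddef.h>

#define NCELLS 8

/* A cell with a reference count and two pointer fields. */
typedef struct Cell { int rc; struct Cell *left, *right, *next; } Cell;

static Cell heap[NCELLS];
static Cell *freeList;

static void rc_init(void) {              /* thread every cell onto the free list */
    freeList = NULL;
    for (int i = NCELLS - 1; i >= 0; i--) { heap[i].next = freeList; freeList = &heap[i]; }
}

static void rc_free(Cell *n) { n->next = freeList; freeList = n; }

static void rc_delete(Cell *t) {
    if (t == NULL) return;
    if (--t->rc == 0) {                  /* last reference gone: reclaim now */
        rc_delete(t->left);
        rc_delete(t->right);
        rc_free(t);
    }
}

/* Returns a new cell with rc = 1, or NULL if the free list is empty. */
static Cell *rc_new(void) {
    if (freeList == NULL) return NULL;
    Cell *c = freeList;
    freeList = freeList->next;
    c->rc = 1;
    c->left = c->right = NULL;
    return c;
}

/* Overwrites the pointer *r with s, adjusting both counts. */
static void rc_update(Cell **r, Cell *s) {
    if (s != NULL) s->rc++;
    rc_delete(*r);
    *r = s;
}

static int free_cells(void) {
    int n = 0;
    for (Cell *c = freeList; c != NULL; c = c->next) n++;
    return n;
}
```

Dropping the last reference to a cell immediately returns it, and everything only it referenced, to the free list. If two cells point to each other, however, dropping every outside reference leaves both counts at 1, so neither is ever reclaimed.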
Benefits of Reference Counting
- The work is spread throughout the program's execution, giving smooth response times in interactive situations. (Contrast with a stop and collect approach.)
- Updating the counts touches only memory locations which were probably going to be touched anyway. (Contrast with a marking phase which walks all over memory.)
- Garbage is reclaimed as soon as it is created: if objects die quickly, reference counting will reclaim them and reuse them quickly. (Contrast with a scheme where the dead objects remain unused for a long period until the next gc and get paged out of memory.)
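Issues with Reference Counting, cont’d
[Figure: local variable P points to a cyclic structure whose nodes have reference counts 2, 1, 1 and 1. Once P is overwritten, every count in the cycle stays above zero, so reference counting never reclaims it.]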
Mark-Sweep (aka Mark-Scan) Algorithm
- When a request for a new cell finds the free pool empty, the program pauses and calls the mark-sweep gc to return inaccessible objects to the free pool, and then resumes
Pseudocode for Mark-Sweep
```
function New():
    if freeList == null then markSweep();
    newcell = allocate();
    return newcell;

// called by New
function allocate():
    newcell = freeList;
    freeList = freeList.next;
    return newcell;

procedure free(P):
    P.next = freeList;
    freeList = P;

procedure markSweep():
    foreach R in RootSet do mark(R);
    sweep();
    if freeList == null then abort "memory exhausted";

// called by markSweep
procedure mark(N):
    if N != null and N.markBit == 0 then
        N.markBit = 1;
        foreach pointer M held inside the object N do mark(*M);

// called by markSweep
procedure sweep():
    K = address of heap bottom;
    while K < heap top do
        if K.markBit == 0 then free(K);
        else K.markBit = 0;
        K += size of object referenced by K;
```
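A C sketch of mark-sweep over a tiny heap of fixed-size cells. It assumes that a global `roots` array stands in for the RootSet, that every cell has two pointer fields, and that returning NULL replaces the abort; all names are hypothetical. Like the pseudocode, `mark` is recursive, so very deep structures could overflow the C stack.

```c
#include <stddef.h>

#define NCELLS 4
#define NROOTS 2

typedef struct Cell { int markBit; struct Cell *left, *right, *next; } Cell;

static Cell heap[NCELLS];
static Cell *freeList;
static Cell *roots[NROOTS];              /* stands in for the RootSet */

static void ms_free(Cell *p) { p->next = freeList; freeList = p; }

static void ms_init(void) {
    freeList = NULL;
    for (int i = NCELLS - 1; i >= 0; i--) { heap[i].markBit = 0; ms_free(&heap[i]); }
    for (int i = 0; i < NROOTS; i++) roots[i] = NULL;
}

static void mark(Cell *n) {
    if (n != NULL && n->markBit == 0) {
        n->markBit = 1;
        mark(n->left);
        mark(n->right);
    }
}

/* Walks the whole heap from bottom to top; only called when the free
   list is empty, so no cell is put on it twice. */
static void sweep(void) {
    for (Cell *k = heap; k < heap + NCELLS; k++) {
        if (k->markBit == 0) ms_free(k); /* unreachable: back to the free list */
        else k->markBit = 0;             /* reachable: clear for the next gc */
    }
}

/* Allocates a cell, running mark-sweep first if the free list is empty;
   returns NULL if memory is still exhausted. */
static Cell *ms_new(void) {
    if (freeList == NULL) {
        for (int i = 0; i < NROOTS; i++) mark(roots[i]);
        sweep();
        if (freeList == NULL) return NULL;
    }
    Cell *c = freeList;
    freeList = freeList->next;
    c->left = c->right = NULL;
    return c;
}
```

Unlike reference counting, a cycle of cells that no root can reach is simply never marked, so the sweep reclaims it.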
Pros and Cons of Mark-Sweep GC
interrupted for about 4.5 seconds every 79 seconds.
across the heap
(causing more frequent gc)
Copying Garbage Collectors
- The heap is divided into two halves, the fromSpace and the toSpace.
- At gc time, live objects are copied from the old space (the fromSpace) into the new space (the toSpace), and the program’s variables are updated to use the new copies.
- The toSpace is compacted by the copying process (no gaps are left).
Example of Copying Collector in Action
[Figures: the fromSpace, the toSpace and the root at each step.]
1. ... the root node is copied, and a forwarding pointer added
2. ... the left child of the first node is copied
3. ... and the right child of the first node is copied
4. ... and when the right child of the right child is copied ...
5. ... and we are almost finished
6. done ... and we carry on allocating new nodes in the toSpace
Pseudocode for a Copying Collector
```
procedure init():
    toSpace = start of heap;
    spaceSize = heap size / 2;
    topOfSpace = toSpace + spaceSize;
    fromSpace = topOfSpace + 1;
    free = toSpace;

// n = size of object to allocate
function New(n):
    if free + n > topOfSpace then flip();
    if free + n > topOfSpace then abort "memory exhausted";
    newcell = free;
    free += n;
    return newcell;

procedure flip():
    fromSpace, toSpace = toSpace, fromSpace;
    topOfSpace = toSpace + spaceSize;
    free = toSpace;
    for R in RootSet do R = copy(R);

// parameter P points to a word, not to an object
function copy(P):
    if P is not a pointer then return P;
    if P[0] is not a pointer into toSpace then
        n = size of object referenced by P;
        PP = free;
        free += n;
        temp = P[0];
        P[0] = PP;                  // forwarding pointer
        PP[0] = copy(temp);
        for i = 1 to n-1 do PP[i] = copy(P[i]);
    return P[0];

// Note: the first word of an object, P[0], serves a dual role
// to hold a forwarding pointer.
```
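A C sketch of the semispace copying collector for fixed-size objects, each with two pointer fields and an int payload. One deliberate deviation from the pseudocode: the forwarding pointer lives in a separate `forward` field instead of overwriting the first word, which avoids having to tell pointers from non-pointers. `SPACE`, `roots` and the function names are assumptions, and live data is assumed to fit in one semispace. Setting the forwarding pointer before copying the children is what lets cycles terminate.

```c
#include <stddef.h>

#define SPACE 8                          /* objects per semispace */

typedef struct Obj {
    struct Obj *forward;                 /* forwarding pointer once copied */
    int value;                           /* non-pointer payload */
    struct Obj *left, *right;            /* pointer fields */
} Obj;

static Obj spaceA[SPACE], spaceB[SPACE];
static Obj *fromSpace, *toSpace;
static int freeIdx;                      /* next free slot in toSpace */
static Obj *roots[2];                    /* stands in for the RootSet */

static void init(void) {
    toSpace = spaceA;
    fromSpace = spaceB;
    freeIdx = 0;
    roots[0] = roots[1] = NULL;
}

/* Copies p into toSpace (at most once) and returns its new address. */
static Obj *copy(Obj *p) {
    if (p == NULL) return NULL;          /* not a pointer */
    if (p->forward == NULL) {            /* not yet copied */
        Obj *pp = &toSpace[freeIdx++];
        p->forward = pp;                 /* set first, so cycles terminate */
        pp->forward = NULL;
        pp->value = p->value;
        pp->left = copy(p->left);
        pp->right = copy(p->right);
    }
    return p->forward;
}

static void flip(void) {
    Obj *t = fromSpace;
    fromSpace = toSpace;
    toSpace = t;
    freeIdx = 0;
    for (int i = 0; i < 2; i++) roots[i] = copy(roots[i]);
}

/* Allocates from toSpace, flipping when it is full; NULL if exhausted. */
static Obj *New(int value) {
    if (freeIdx == SPACE) flip();
    if (freeIdx == SPACE) return NULL;   /* "memory exhausted" */
    Obj *o = &toSpace[freeIdx++];
    o->forward = NULL;
    o->value = value;
    o->left = o->right = NULL;
    return o;
}
```

For example, if a root reaches three objects (one of them pointing back at the root) and five unreachable objects fill the rest of the semispace, the next `New` flips, copies only the three live objects to the start of the other space, and allocates the new object right after them.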
Pros and Cons of Copying Collectors
(may not be a problem with virtual memory systems where we can have big address spaces)
Jason Baker, Antonio Cunei, Chapman Flack, Filip Pizlo, Marek Prochazka, Krista Grothoff, Christian Grothoff, Andrey Madan, Gergana Markova, Jeremy Manson, Krzystof Palacz, Jacques Thomas, Jan Vitek, Hiroshi Yamauchi
Purdue University
David Holmes
DLTeCH
DARPA Program Composition for Embedded Systems (PCES)
NSF/HDCP - Assured Software Composition for Real-Time Systems
January 2006
Our mission: implement a Real-time Specification for Java (RTSJ) compliant VM
- The only other RTSJ VM was an interpreter and proprietary
- Target is avionics software for the Boeing/Insitu ScanEagle UAV
A clean-room implementation
Internal project goal: a Java-in-Java VM
- 150 KLoC of Java, 15 KLoC of C code
- GNU Classpath libraries + our own RTSJ implementation
[Figure: execution time relative to Ovm (axis from 0.0 to 2.0) on compress, jess, db, javac, mpegaudio, mtrt and jack, for Ovm 1.01, Ovm 1.01 RTSJ, GCJ 4.0.2, HotSpot 1.5.0.06 and jTime 1.0; bars beyond the axis range are labelled 5.6, 2.2, 12.2 and 4.4.]
- Bootstrapped under HotSpot
- Configuration and partial evaluation
- Generate an executable image (data + code)
- IR-spec + interpreter generation
Build stages, from JVM-hosted to self-hosted:
- Stage 1: code, metadata and data in standard Java format
- Stage 2: code and metadata in OvmIR format
- Stage 3: data in Ovm-specific format
- Stage 4: complete Ovm configuration
Transformations between stages: rewriting, image serialization, loading
[Figure: a user domain and an executive domain; components shown: Java application, GNU Classpath, library glue, library imports, domain reflection, Core Services Access (CSA), Ovm kernel, runtime exports.]
- CSA: downcalls from Java bytecode; the CSA uses Ovm kernel methods to implement Java bytecode semantics
- Cross-domain calls
Separation is necessary
- Each domain can have its own memory manager, scheduler, class libraries, and even object model
- Cross-domain accesses are reflective, enforced by the type system (this requires Object not to be built in)
- Special handling of exceptions crossing boundaries
Java-in-Java:
- anecdotal evidence of lower bug rates
- same optimizing compiler for VM and user code
- fewer cross-language calls
```java
public Oop updateReference(MovingGC oop) {
    int sz = oop.getBlueprint().getVariableSize(oop);
    if (sz >= blockSize) {
        movedBytes += sz;
        VM_Word off = VM_Address.fromObject(oop).diff(heapBase);
        int idx = off.asInt() >>> blockShift;
        block.pin(idx);
        return oop;
    } else {
        VM_Address newLoc = getHeapMem(sz, false);
        Mem.the().cpy(newLoc, VM_Address.fromObject(oop), sz);
        return newLoc.asOop();
    }
}
```
```java
VM_Address getMem(int size) throws PragmaNoPollcheck, PragmaNoBarriers {
    VM_Address ret = base().add(offset);
    offset += size;
    Mem.the().zero(ret.add(ALIGN), offset == rsize ? size - ALIGN : size);
    return ret;
}
```
This allocator is used by scoped memory areas and ensures allocation times linear in the size of the allocated object (due to zeroing). Notice the use of the VM_Address type to represent native memory locations.
```c
static VM_Address* getMem(TransientArea* area, jint size) {
    jint s1 = area + area->offset;
    area->offset += size;
    jint s2 = s1 + (&SplitRegionManager)->ALIGN;
    jint s3 = (area->offset == area->rsize) ? (size - (&SplitRegionManager)->ALIGN) : size;
    PollingAware_zero(roots->values[57], s2, s3);
    return s1;
}
```
(Casts are omitted, and names shortened for readability.) This method is not virtual and can be inlined by GCC.
In fact, after translation all occurrences of dynamic method invocation have been eliminated.
Configurable components: execution engine, static analysis, fast locks, memcopy, I/O system, threading, transactions
Options shown include:
- aot / jit / interp
- fast / bounded-latency
- java / realtime / profiling
- pip / time preemptive
- I/O: SIGIOSocketsPollingOther (Profiling), SIGIOSockectsStallingFilesPolling, SelectSocketsPollingOther, SelectSocketsStallingFilesPollingOther
- memory managers: AllCopy:B-M-F-H, MostlyCopySplitRegions:B-Mf-F-H, MostlyCopyWB:B-Mf-F-H, MostlyCopyRegions:B-M-F-H, MostlyCopyingRegions-B_Mf_F_H, MostlyCopyingSC-B_M_F_H, MostlyCopyingSplitRegions-, MostlyCopyWB:B-M-F-H, JMTk:B-M-J-H, MostlyCopy:B-M-F-H, SimpleSemiSpace:B-M-F-H, minimalMM-B_M_J_H, minimalMM-B_0M, minimalMM-B_M
Configuration mechanisms
- Interfaces and inheritance are not sufficient (we have 3371 classes and ~450 interfaces)
- AOP should be revisited
- Component systems such as Jiazzi, Scala...
- We rolled our own...
Configuration mechanisms, example: transactions
Implementing a form of transactional memory in Ovm takes about 1200 lines of code. It requires changes to the sources of the VM: about 40 lines in 34 different places, e.g.:
```java
void runThread(OVMThread t) throws PragmaNoPollcheck {
    boolean aborting = Transaction.the().preRunThreadHook(thisThread, t);
    setCurrentThread(t);
    Processor.getCurrentProcessor().run(t.getContext());
    ...
    Transaction.the().postRunThreadHook(aborting);
}
```
```java
boolean aborting = Transaction.the().preRunThreadHook(thisThread, t);
```
Generated C code
```c
jboolean _stack_2 = S3Transaction_preRunThreadHook(e.roots->vals[97], _stack_0, _stack_1);
```
Stitcher specification
```
# Select an implementation of the transactional API described in the
# Preemptible Atomic Region paper. EmptyTransaction gives the
# default behavior. S3Transaction is the real thing.
s3.services.transactions.Transaction \
    s3.services.transactions.S3Transaction
```
GCC as a backend
- Pro: cross-platform portability
- Con: using C++ exceptions is suboptimal
- Con: inlining can lead to bloat and long compile times
- Con: no precise GC ... but working on it.
Cooperative Scheduling
- OS-independent
- Priority inversion avoidance (PIP/PCE) supported in a portable fashion and optimized by the compiler
- but we had to implement our own non-blocking I/O
```java
void someMethod() {
    ...
    while (...) {
        ...
    }
}
```
⇒
```java
void someMethod() {
    POLLCHECK();
    ...
    while (...) {
        ...
        POLLCHECK();
    }
}
```
where POLLCHECK is:
```c
if (pollUnion.pollWord == 0) {
    pollUnion.s.notSignaled = 1;
    pollUnion.s.notEnabled = 1;
    handleEvents();
}
```
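One way the poll word can work, sketched in C under stated assumptions: two 16-bit flags (`notSignaled`, `notEnabled`) are overlaid on one 32-bit `pollWord`, so a single comparison with zero tests "an event is pending and polling is enabled". The union layout, `handleEvents`, `signalEvent` and `enablePolling` are hypothetical; only the field names come from the slide.

```c
#include <stdint.h>

/* Two 16-bit flags share storage with one 32-bit word. */
typedef union {
    uint32_t pollWord;
    struct { uint16_t notSignaled, notEnabled; } s;
} PollUnion;

static volatile PollUnion pollUnion = { .s = { 1, 1 } };  /* no event, polling off */
static int handled;                      /* counts calls to handleEvents */

static void handleEvents(void)  { handled++; }                    /* hypothetical handler */
static void signalEvent(void)   { pollUnion.s.notSignaled = 0; }  /* e.g. from a timer */
static void enablePolling(void) { pollUnion.s.notEnabled = 0; }

/* One load and one compare cover both conditions. */
#define POLLCHECK()                              \
    do {                                         \
        if (pollUnion.pollWord == 0) {           \
            pollUnion.s.notSignaled = 1;         \
            pollUnion.s.notEnabled = 1;          \
            handleEvents();                      \
        }                                        \
    } while (0)
```

A signalled event is ignored until polling is enabled; the next POLLCHECK then runs `handleEvents` once and resets both flags, so later checks fall through until the next event. Keeping the common case to a single load and compare is what makes inserting a check on every loop back-edge affordable.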
[Figure: polling overhead on compress, jess, db, javac, mpegaudio, mtrt and jack; y axis from 0.2% to 2.7%.]
enabling polling did not slow down the benchmark.