Accurate Garbage Collection in Uncooperative Environments with Lazy Pointer Stacks



slide-1
SLIDE 1

Accurate Garbage Collection in Uncooperative Environments with Lazy Pointer Stacks

Jason Baker, Antonio Cunei, Filip Pizlo, Jan Vitek Purdue University

slide-2
SLIDE 2

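[Diagram: compiling a New Programming Language directly to Native Code ("Compile") is marked "Hard!". The alternative path is New Programming Language → "Translate" → Old Programming Language → "Use Old Compiler" → Native Code.]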

slide-3
SLIDE 3

Which Old Language?

  • Lots of systems translate to C or C++.
  • Can use freely available, high-quality compilers like GCC to target many platforms (x86, AMD64, PowerPC, SPARC, ARM, etc.).
  • Portability and speed (almost) for free!
slide-4
SLIDE 4

So what is the Problem?

  • In three words: accurate garbage collection.
  • Most new languages have some form of garbage collection.
  • Accurate garbage collection is often preferred as it reclaims more memory and is more predictable than conservative garbage collection.
  • However, accurate GC requires accurate stack maps.

slide-5
SLIDE 5

    void foo(void) {
        void *ptr = alloc();
        if (ptr == 0) error();
        bar(ptr);
    }

    _foo:
        mflr  r0
        stmw  r30,-8(r1)
        stw   r0,8(r1)
        stwu  r1,-80(r1)
        bl    L_alloc$stub
        mr.   r30,r3
        bne+  cr0,L2
        bl    L_error$stub
    L2:
        mr    r3,r30
        bl    L_bar$stub
        lwz   r0,88(r1)
        addi  r1,r1,80
        mtlr  r0
        lmw   r30,-8(r1)
        blr

slide-6
SLIDE 6
  • Accurate GC requires accurate stack maps.
  • Most C/C++ compilers cannot provide accurate stack maps.
  • We would like to scan the stack accurately while still using a stock C++ compiler as our back-end.

slide-7
SLIDE 7

Old Approaches

  • Pointer Stacks
  • Henderson’s linked lists
slide-8
SLIDE 8

Pointer Stacks

  • Idea: put all pointer local variables into an array in the heap.
  • Make accesses to these locals go to the array.
  • To find pointers, just scan this array.
  • We say that the array is an explicit pointer stack because it mimics the normal C stack but contains only pointers.

slide-9
SLIDE 9

Pointer Stacks

Original:

    void foo() {
        void *ptr = alloc();
        bar(ptr);
    }

Transformed to use a pointer stack:

    extern void **pStackTop;

    void foo() {
        pStackTop++;
        pStackTop[-1] = alloc();
        bar(pStackTop[-1]);
        pStackTop--;
    }
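The transformed foo can be exercised with a small stand-in runtime. The sketch below is illustrative only: the backing array pStackStorage, the toy alloc, the scanRoots helper, and the rootsSeenInBar record are assumptions made for this example, not part of the presentation. It shows that a collector can find every live pointer local by walking the array up to pStackTop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backing store for the explicit pointer stack.
static void *pStackStorage[1024];
void **pStackTop = pStackStorage;

// Toy allocator (assumption): hands out addresses of static cells.
static int cells[16];
static std::size_t nextCell = 0;
void *alloc() { return &cells[nextCell++]; }

// What a collector would do: every slot below pStackTop is a root.
std::vector<void *> scanRoots() {
    return std::vector<void *>(pStackStorage, pStackTop);
}

// Records the roots visible while bar is running.
static std::vector<void *> rootsSeenInBar;

void bar(void *p) {
    (void)p;
    rootsSeenInBar = scanRoots();  // pretend a collection happens here
}

void foo() {
    pStackTop++;                   // prologue: reserve one slot
    pStackTop[-1] = alloc();       // the pointer local lives in the array
    bar(pStackTop[-1]);
    pStackTop--;                   // epilogue: release the slot
}
```

Calling foo() once leaves a single root recorded in rootsSeenInBar, and pStackTop returns to the base of the array afterwards. Every access to the local goes through memory, which is the cost the later slides try to avoid.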

slide-10
SLIDE 10

Henderson’s Linked Lists

  • See Henderson ISMM’02, or our paper, for details.
  • Same basic idea as pointer stacks, but uses a linked list instead of an array.
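As a rough illustration of the linked-list variant, the sketch below keeps each function's pointer locals in a small frame record that is linked to its caller's record. The names ShadowFrame, topFrame, scanRoots, and the toy alloc are assumptions for this example; Henderson's paper and the authors' paper define the actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One record per active function; it lives in that function's own C++ frame.
struct ShadowFrame {
    ShadowFrame *prev;     // record of the caller
    std::size_t numRoots;  // number of pointer locals in this frame
    void **roots;          // the pointer locals themselves
};

ShadowFrame *topFrame = nullptr;  // head of the linked list

// Toy allocator (assumption): hands out addresses of static cells.
static int cells[16];
static std::size_t nextCell = 0;
void *alloc() { return &cells[nextCell++]; }

// What a collector would do: walk the list from the newest frame outward.
std::vector<void *> scanRoots() {
    std::vector<void *> found;
    for (ShadowFrame *f = topFrame; f != nullptr; f = f->prev)
        for (std::size_t i = 0; i < f->numRoots; ++i)
            found.push_back(f->roots[i]);
    return found;
}

static std::vector<void *> rootsSeenInBar;

void bar(void *p) {
    void *locals[1] = {p};
    ShadowFrame frame = {topFrame, 1, locals};
    topFrame = &frame;             // prologue: link this frame in
    rootsSeenInBar = scanRoots();  // pretend a collection happens here
    topFrame = frame.prev;         // epilogue: unlink
}

void foo() {
    void *locals[1] = {nullptr};
    ShadowFrame frame = {topFrame, 1, locals};
    topFrame = &frame;             // prologue
    locals[0] = alloc();
    bar(locals[0]);
    topFrame = frame.prev;         // epilogue
}
```

Because the address of locals escapes into the global list, the compiler has to keep those pointers in memory rather than in registers.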

slide-11
SLIDE 11

Analysis of these approaches

  • Both approaches are legal C (or C++) and so are portable: they will have the desired effect on any standards-compliant compiler.
  • Both approaches make stack scanning very easy.
  • Neither approach allows register allocation of pointer locals.
  • Both approaches add code to the prologue and epilogue.

slide-12
SLIDE 12

Can we do better?

  • The goal is to allow local pointers to be register allocated.
  • Further, we wish to minimize the amount of additional code in the prologue and epilogue.
  • Is this possible?
slide-13
SLIDE 13

The Idea

  • Keep pointers in local variables.
  • Allow the C++ compiler to place pointers anywhere.
  • Have a mechanism for moving the pointers from the C++ local variables to a well-known heap location on demand.

slide-14
SLIDE 14
  • When the collector wishes to scan the stack, it causes every thread to throw an exception.
  • Transform each safe point to catch the exception and save pointers to a pointer stack.
  • After pointers are saved, the exception is rethrown.
  • When this process completes, two things will have happened:
      • First, the collector will have accurate pointer information, and
      • second, all thread stacks will be destroyed!
slide-15
SLIDE 15

    void foo() {
        void *ptr = alloc();
        try {
            bar(ptr);
        } catch (const StackScanException&) {
            lazyPtrStack->pushPtr(ptr);
            throw;
        }
    }
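To see the save phase end to end, the sketch below wires the catch-and-rethrow pattern into a two-level call chain. Everything beyond the pattern itself is an assumption for illustration: the LazyPtrStack class, the baz function that stands in for a safe point at which the collector has requested a scan, the scanThread driver, and the static cells used as fake heap objects.

```cpp
#include <cassert>
#include <vector>

struct StackScanException {};

// Hypothetical lazy pointer stack: filled only while a scan is in progress.
struct LazyPtrStack {
    std::vector<void *> ptrs;
    void pushPtr(void *p) { ptrs.push_back(p); }
};

static LazyPtrStack lazyStackStorage;
LazyPtrStack *lazyPtrStack = &lazyStackStorage;

static int cellA, cellB;  // stand-ins for heap objects

// Stand-in for a safe point where the collector has requested a scan.
void baz() { throw StackScanException(); }

void bar(void *p) {
    (void)p;
    void *q = &cellB;     // pointer local owned by bar
    try {
        baz();
    } catch (const StackScanException &) {
        lazyPtrStack->pushPtr(q);  // save this frame's pointers
        throw;                     // keep unwinding toward the stack base
    }
}

void foo() {
    void *ptr = &cellA;   // pointer local owned by foo
    try {
        bar(ptr);
    } catch (const StackScanException &) {
        lazyPtrStack->pushPtr(ptr);
        throw;
    }
}

// Plays the role of the frame at the base of the thread's stack.
bool scanThread() {
    try {
        foo();
    } catch (const StackScanException &) {
        return true;      // all application frames have been unwound
    }
    return false;
}
```

After scanThread() returns true, the lazy pointer stack holds bar's pointer followed by foo's pointer (innermost frame first), and the application frames are gone, which is why the stack has to be restored afterwards.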

slide-16
SLIDE 16

Two problems remain!

  • First, we must find a way to restore the stacks to their previous state so that the program can execute, and
  • second, we still need a way of allowing the collector to restore the pointers to new values (to support moving collection).

slide-17
SLIDE 17

The solution to the first problem...

slide-18
SLIDE 18

[Diagram: thread stack consisting of a Bootstrap Frame, four App Frames, and a Context Switch frame, with stackBase and stackCur markers.]

(a) Switch to a thread that needs stack walking.

slide-19
SLIDE 19

[Diagram: the same thread stack (Bootstrap Frame, four App Frames, Context Switch frame, with stackBase and stackCur markers) next to a Stack Copy holding the four App Frames and the Context Switch frame.]

(b) Copy the portion of the stack that will be unwound.

slide-20
SLIDE 20

[Diagram: the live stack now holds only the Bootstrap Frame, with stackBase and stackCur both pointing at it; the Stack Copy still holds the four App Frames and the Context Switch frame.]

(c) Stack is unwound, but we still have a copy.

slide-21
SLIDE 21

[Diagram: the live stack again holds the Bootstrap Frame, four App Frames, and the Context Switch frame, with stackBase and stackCur markers, alongside the Stack Copy it was restored from.]

(d) Restore the stack with a second copy, and use a context switch to restore registers. The thread is now back to where it was in (a).

slide-22
SLIDE 22

Problem 2: Moving GC

slide-23
SLIDE 23

What about pointer restoration?

  • We cannot directly modify the stack to update the pointers, because we still have no idea where the C++ compiler has placed pointers!
  • All we can do is generate C++ code that performs pointer replacement in the context of the affected frame.
  • Thus, we wish for some code to run at the safe point, but this time:
      • We want to run the code when the called function actually returns following GC,
      • and we want to take this opportunity to restore pointers.
slide-24
SLIDE 24
  • Assume for a moment that we can magically throw an exception when we return for the first time into a frame after GC.
  • Then we can use the same strategy as before: a catch block that runs restoration code.

slide-25
SLIDE 25

And the code looks like...

slide-26
SLIDE 26

    void *ptr;
    try {
        functionCall();
    } catch (const StackScanException&) {
        if (saving) {
            lazyPtrStack->pushPtr(ptr);
            throw;
        } else if (restoring) {
            ptr = lazyPtrStack->popPtr();
            if (returned normally) {
                restore return value;
            } else {
                throw app exception;
            }
        }
    }
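The restoring branch can be tried in isolation with a simulation. In the sketch below, functionCall is a stand-in that pretends a moving collection has happened and then throws StackScanException in restoring mode, imitating the "magic" exception assumed on the previous slide. The Mode enum, the LazyPtrStack class, the oldCell and newCell objects, and the caller function are all assumptions for this example, and the return-value and application-exception handling from the slide is left out.

```cpp
#include <cassert>
#include <vector>

struct StackScanException {};

enum class Mode { Saving, Restoring };
Mode mode = Mode::Saving;

// Hypothetical lazy pointer stack with both push and pop.
struct LazyPtrStack {
    std::vector<void *> ptrs;
    void pushPtr(void *p) { ptrs.push_back(p); }
    void *popPtr() {
        void *p = ptrs.back();
        ptrs.pop_back();
        return p;
    }
};

static LazyPtrStack lazyStackStorage;
LazyPtrStack *lazyPtrStack = &lazyStackStorage;

static int oldCell = 7;  // object before the simulated move
static int newCell = 0;  // its location after the simulated move

// Stand-in for callee + collector + thunk (assumption): the object is
// "moved", the collector publishes the updated pointer, and the caller
// is notified by an exception instead of a normal return.
void functionCall() {
    newCell = oldCell;
    oldCell = -1;                     // poison the old location
    lazyPtrStack->pushPtr(&newCell);  // collector's updated pointer
    mode = Mode::Restoring;
    throw StackScanException();
}

int caller() {
    void *ptr = &oldCell;
    try {
        functionCall();
    } catch (const StackScanException &) {
        if (mode == Mode::Saving) {
            lazyPtrStack->pushPtr(ptr);
            throw;
        } else {
            ptr = lazyPtrStack->popPtr();  // pick up the relocated pointer
        }
    }
    return *static_cast<int *>(ptr);       // reads through the new pointer
}
```

Here caller() ends up reading the value through the relocated pointer, even though the compiler was free to keep ptr wherever it liked before the call.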

slide-27
SLIDE 27
  • How to run the pointer restoration code at the right time?
  • When the GC runs, it updates pointers in its own pointer stack, and then installs thunks at every frame on the stack.
  • The thunk throws the StackScanException when invoked.

slide-28
SLIDE 28

[Diagram: two Caller frames, each with its Return PC slot; an arrow shows the direction of stack growth.]

(a) Ordinary callstack for C or C++ code.

slide-29
SLIDE 29
The Thunk

  • capture return values
  • catch user exceptions
  • restore proper return PC
  • throw StackScanException
  • or restore backup stack and proceed with GC.

[Diagram: the same two Caller frames, with each Return PC slot now holding a Thunk PC; an arrow shows the direction of stack growth.]

(b) "Thunkified" callstack.

slide-30
SLIDE 30

The Thunk

  • capture return values
  • catch user exceptions
  • restore proper return PC
  • throw StackScanException
  • or restore backup stack and proceed with GC.

[Diagram: one remaining Caller frame with a Thunk PC in its return slot, labelled "Thunk Runs!"; an arrow shows the direction of stack growth.]

(c) If a function completes (either by return or throw), the thunk runs.

slide-31
SLIDE 31

“Safe Point Catch And Thunk”

  • 1. Throw an exception to trigger stack scanning.
  • 2. Keep a backup copy of the original stack to allow the thread to continue as normal after stack scanning.
  • 3. Install thunks that trigger pointer restoration after the GC runs.

slide-32
SLIDE 32
  • We have also experimented with using a counting scheme to emulate the exception and thunk scheme.
  • Put simply, each callsite contains instrumentation that dynamically checks if pointers should be saved or restored, by using counters that keep track of stack height.
  • Collectively, we call this class of mechanisms “lazy pointer stacks.”

slide-33
SLIDE 33

Implementation

  • We have implemented explicit pointer stacks, Henderson’s linked lists, safe point catch and thunk, and pointer frame counting in Ovm and its J2c compiler.
  • Ovm is a real-time Java virtual machine developed at Purdue.
  • J2c is Ovm’s ahead-of-time compiler. It generates C++ code, and GCC is used as the backend.

slide-34
SLIDE 34
  • By default, Ovm+J2c uses mostlyCopying, a Bartlett-style semispace garbage collector that performs conservative stack scanning.
  • We have added the ability to perform accurate stack scanning using the four techniques. The user is allowed to select the stack scanning style at compile time.
  • The mechanism is modular: any of Ovm’s collectors, including our RTGC, can select any of the stack scanning implementations.

slide-35
SLIDE 35
Experimental Evaluation

  • We use the industry-standard SPECjvm98 benchmark suite.
  • Each benchmark was run with the five stack scanning configurations (conservative, ptr stack, henderson, thunking, and counter) under Ovm+J2c+mostlyCopying at various heap sizes.
  • We used a Pentium IV Linux machine with 512 MB of RAM for all runs.
  • Additionally, we compared against the HotSpot JVM and GCJ (see paper).

slide-36
SLIDE 36

Overhead relative to Conservative for Large heap (256MB)

[Bar chart. X-axis: compress, jess, db, javac, mpegaudio, mtrt, jack, Geo. Mean. Y-axis: overhead relative to Conservative, ticks from -2.5% to 20.0%. Series: Ptr Stack, Thunking, Henderson, Counter.]

slide-37
SLIDE 37

Overhead relative to Conservative for Small heap (32MB)

[Bar chart. X-axis: compress, jess, db, javac, mpegaudio, mtrt, jack, Geo. Mean. Y-axis: overhead relative to Conservative, ticks from -20% to 30%. Series: Ptr Stack, Thunking, Henderson, Counter.]

slide-38
SLIDE 38

Code Size for SPECjvm98 in KB

    Configuration    Code size (KB)
    Conservative     3376
    Ptr Stack        3857
    Henderson        4031
    Thunking         11081
    Counter          9320
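To put these sizes in relative terms, the sketch below computes each configuration's code growth over the conservative baseline using only the figures from this slide. The names Entry, kSizes, growthPercent, and printGrowth are made up for the example. By this arithmetic the growth is roughly 14% for Ptr Stack, 19% for Henderson, 228% for Thunking, and 176% for Counter.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

struct Entry {
    const char *name;
    double kb;  // code size in KB, copied from the slide
};

const Entry kSizes[] = {
    {"Conservative", 3376.0},
    {"Ptr Stack", 3857.0},
    {"Henderson", 4031.0},
    {"Thunking", 11081.0},
    {"Counter", 9320.0},
};

// Growth of a configuration over the baseline, in percent.
double growthPercent(double kb, double baselineKb) {
    return (kb / baselineKb - 1.0) * 100.0;
}

// Prints one line per configuration, relative to the first entry.
void printGrowth() {
    const double baseline = kSizes[0].kb;
    for (const Entry &e : kSizes)
        std::printf("%-13s %6.0f KB  %+7.1f%%\n", e.name, e.kb,
                    growthPercent(e.kb, baseline));
}
```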

slide-39
SLIDE 39

See the paper for more algorithmic details and more performance evaluation (different heap sizes, some profiling, etc.).

The End