SLIDE 1

Schism: Fragmentation-Tolerant Real-Time Garbage Collection

Fil Pizlo† Tony Hosking* Luke Ziarek† Ethan Blanton† Petr Maj* Jan Vitek†*

Friday, June 11, 2010

SLIDE 7

Why another Real-Time Garbage Collector?

  • Real-time programmers want hard bounds on both space and time.
  • Previous RTGCs either:
  • fail to bound space, or
  • cause large slow-downs.
  • We propose a new RTGC called Schism, which
  • bounds space while
  • running faster than other RTGCs.


SLIDE 14

What Schism Real-Time GC provides:

  • executes concurrently (preemptible at any time)
  • guarantees progress for heap accesses (wait-free)
  • minimizes heap access overhead (O(1), a few instructions)
  • gives uniformly good throughput (fastest RTGC)
  • minimizes external fragmentation (proven space bounds; see appendix)


SLIDE 22

Real-Time Garbage Collection: state of the art

  • Baseline: HotSpot 1.6 collector: fast, hard space bounds.
  • but: not concurrent, not suitable for hard real-time
  • Java RTS: hard space bounds, concurrent, wait-free.
  • but: 60% slow-down, logarithmic heap access
  • J9 SRT (Metronome): only 30% slow-down, concurrent, wait-free.
  • but: susceptible to fragmentation

We want something as fast as Metronome, but fragmentation-tolerant like Java RTS.

SLIDE 23

Previous Approaches to Minimizing Fragmentation in RTGC


SLIDE 30

On-demand Defragmentation

  • Stop-the-world or incremental: simple, but causes pauses.
  • we don't want pauses.
  • Concurrent: still has drawbacks
  • Custom hardware? [Click et al '05]
  • throughput penalty during defrag is 5x or more. [Pizlo et al '07], [Pizlo et al '08]

[Figure: performance over time; throughput drops sharply when defrag starts and recovers only when defrag ends.]

Worst-case throughput penalty is too large.


SLIDE 38

Replication-based GC

  • See: [Nettles-O'Toole '93], [Cheng-Blelloch '01]
  • Allows concurrent defragmentation
  • Two spaces: one space for reads; writes "replicated" to both spaces
  • Problem: Writes not atomic! Loss of coherence!

[Figure: the application reads from the original object while the collector copies it to a replica; writes go to both copies.]

Works best for immutable objects.
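The coherence problem above can be played out in a few lines. This is a hypothetical, deliberately simplified sketch (a single object, hand-interleaved steps; none of this is code from the cited collectors): the replication write barrier issues two plain stores, and a collector copy that slips between a mutator's write and the collector's own store leaves the replica stale.

```java
// Sketch (hypothetical, simplified to a single object): the replication
// write barrier stores every write into both copies. The two stores are
// not atomic, so an in-flight collector copy can interleave with them
// and leave the replica stale -- the loss-of-coherence problem above.
public class ReplicationDemo {
    static final class ReplicatedObject {
        final int[] original;          // reads always go here
        int[] replica;                 // built concurrently by the collector

        ReplicatedObject(int size) { original = new int[size]; }

        int read(int i) { return original[i]; }

        void write(int i, int v) {     // "replicated" write: two plain stores
            original[i] = v;
            if (replica != null) replica[i] = v;
        }

        void startCopy() { replica = new int[original.length]; }
    }

    public static void main(String[] args) {
        ReplicatedObject o = new ReplicatedObject(2);
        o.startCopy();
        // One bad interleaving, played out by hand:
        int snapshot = o.original[1];  // collector reads field 1 (still 0)
        o.write(1, 99);                // mutator updates both copies to 99
        o.replica[1] = snapshot;       // collector finishes its copy: stomps 99
        System.out.println(o.original[1] + " vs " + o.replica[1]); // 99 vs 0
    }
}
```

For an immutable object no write ever races the copy, which is why the slide concludes that replication works best for immutable data.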


SLIDE 41

Allocate in fragments [Siebert '99]

  • All objects split into small fragments.
  • Fragment size is typically fixed at 32 bytes.
  • Fragments are linked; the application must follow links on object access.

Plain Object: most objects require only two fragments. Access cost is known statically and does not vary.


SLIDE 44

Allocate in fragments [Siebert '99]

  • All objects split into small fragments.
  • Fragment size is typically fixed at 32 bytes.
  • Fragments are linked; the application must follow links on object access.

Array: array accesses will see significant slow-down! Access cost is logarithmic.

Bad idea for large arrays.
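Why logarithmic? With linked fixed-size fragments, a large array cannot be one flat block, so it becomes a tree of fragments and every element access walks the tree. The sketch below is a hypothetical, simplified model of this scheme (not the actual implementation): 8-word fragments, interior fragments holding child pointers, eager allocation of the whole tree.

```java
// Sketch (hypothetical; simplified model of Siebert '99-style fragmented
// arrays): an array stored as a tree of fixed-size fragments. No
// contiguous block larger than one fragment is ever needed, so there is
// no external fragmentation -- but every element access walks the tree,
// costing O(log n) link hops instead of one indexed load.
public class FragmentedArrayDemo {
    static final class FragmentedIntArray {
        private static final int FANOUT = 8;  // 8 words per 32-byte fragment

        private final Object root;  // int[] leaf, or Object[] interior node
        private final int depth;    // interior levels above the leaves

        FragmentedIntArray(int length) {
            int d = 0;
            long capacity = FANOUT;           // capacity of a single leaf
            while (capacity < length) { capacity *= FANOUT; d++; }
            this.depth = d;
            this.root = build(d);             // (a real GC allocates lazily)
        }

        private static Object build(int level) {
            if (level == 0) return new int[FANOUT];
            Object[] node = new Object[FANOUT];
            for (int i = 0; i < FANOUT; i++) node[i] = build(level - 1);
            return node;
        }

        // Each loop iteration follows one fragment-to-fragment link.
        int get(int index) {
            Object node = root;
            for (int level = depth; level > 0; level--)
                node = ((Object[]) node)[(index >>> (3 * level)) & (FANOUT - 1)];
            return ((int[]) node)[index & (FANOUT - 1)];
        }

        void set(int index, int value) {
            Object node = root;
            for (int level = depth; level > 0; level--)
                node = ((Object[]) node)[(index >>> (3 * level)) & (FANOUT - 1)];
            ((int[]) node)[index & (FANOUT - 1)] = value;
        }
    }

    public static void main(String[] args) {
        FragmentedIntArray a = new FragmentedIntArray(1000); // tree of depth 3
        a.set(999, 7);
        System.out.println(a.get(999)); // follows 3 links, then indexes a leaf
    }
}
```

A 1000-element array already needs three link hops per access; the cost grows with the logarithm of the array size, which is the slow-down the slide warns about.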


SLIDE 46

Synopsis

  • Replication-copying Collection:
  • great, but only for immutable objects
  • Fragmented Allocation:
  • great, unless you have large arrays

Can we combine the two?

SLIDE 47

Idea: combine Fragmented Allocation with Replication-Copying using Arraylets


SLIDE 53

A new way of exploiting Arraylets

[Figure: an array laid out as an Arraylet Spine pointing to fixed-size fragments.]

  • Fragments have fixed size: no external fragmentation.
  • The Arraylet Spine has variable size, which can lead to fragmentation!
  • But the spine is immutable ... and replication is ideal for immutable objects.
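To make the arraylet idea concrete, here is a hypothetical, simplified sketch (illustrative sizes, not the real runtime's layout): the spine is a single array of fragment pointers, so element access is O(1), and because the spine's contents never change after allocation, a replica of it stays coherent with the original by construction.

```java
// Sketch (hypothetical, simplified): an arraylet. The spine is one array
// of pointers to fixed-size fragments, so element access is O(1): a single
// indirection plus an index split. The spine never changes after
// allocation (fragments are never moved by the mark-sweep heap), which is
// exactly what makes it safe to replicate.
public class ArrayletDemo {
    static final class ArrayletIntArray {
        private static final int LEAF = 8;  // elements per fixed-size fragment

        private final int[][] spine;        // immutable after construction
        final int length;

        ArrayletIntArray(int length) {
            this.length = length;
            int leaves = (length + LEAF - 1) / LEAF;
            spine = new int[leaves][];
            for (int i = 0; i < leaves; i++) spine[i] = new int[LEAF];
        }

        int get(int i)         { return spine[i / LEAF][i % LEAF]; } // O(1)
        void set(int i, int v) { spine[i / LEAF][i % LEAF] = v; }    // O(1)

        // Replicating the spine is trivially coherent: both copies point
        // at the very same fragments, so every write is seen through either.
        int[][] replicateSpine() { return spine.clone(); }
    }

    public static void main(String[] args) {
        ArrayletIntArray a = new ArrayletIntArray(20);
        a.set(19, 2);
        int[][] replica = a.replicateSpine();        // collector makes a copy
        a.set(19, 3);                                // write after the copy...
        System.out.println(replica[19 / 8][19 % 8]); // ...still visible: 3
    }
}
```

Contrast this with the replicated mutable object earlier: here nothing ever has to be written twice, because the only thing being replicated (the spine) is immutable.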

SLIDE 54

Schism = arraylets + replication + fragments

  • Combination:
  • Concurrent mark-sweep GC for fixed-size fragments
  • Replication copying for variable-size arraylet spines
  • No external fragmentation for either fragments or spines
  • Heap access is O(1), wait-free, and coherent.

SLIDE 61

[Figure: heap diagram. A Concurrent Mark-Sweep Heap for Fragments sits beside a Concurrent Replication Heap for Spines, which has a to-space and a from-space for array spines. A small object is allocated entirely as fragments; a large array gets fragments plus a spine in the replication heap.]
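The heap diagram's routing rule can be sketched in a few lines. This is a hypothetical illustration of the split described on the slides, with made-up sizes (32-byte fragments, 8-byte spine pointers), not figures from the paper.

```java
// Hypothetical sketch of the allocation split in the heap diagram:
// everything is backed by fixed-size fragments in the concurrent
// mark-sweep heap; only arrays additionally get a variable-size spine,
// placed in the replication (to/from-space) heap so it can be copied.
// The sizes below are illustrative assumptions.
public class AllocationDemo {
    static final int FRAGMENT_BYTES = 32;

    static final class Allocation {
        final int fragments;   // allocated in the mark-sweep heap
        final int spineBytes;  // allocated in the replication heap (0 if none)
        Allocation(int fragments, int spineBytes) {
            this.fragments = fragments;
            this.spineBytes = spineBytes;
        }
    }

    static Allocation allocate(int sizeBytes, boolean isArray) {
        int fragments = (sizeBytes + FRAGMENT_BYTES - 1) / FRAGMENT_BYTES;
        // A small object's fragments are reached by fixed links; only an
        // array needs a spine (one pointer per fragment) for O(1) indexing.
        int spineBytes = isArray ? fragments * 8 : 0;
        return new Allocation(fragments, spineBytes);
    }

    public static void main(String[] args) {
        Allocation small = allocate(100, false);  // plain object
        Allocation big   = allocate(4096, true);  // large array
        System.out.println(small.fragments + " fragments, spine " + small.spineBytes);
        System.out.println(big.fragments + " fragments, spine " + big.spineBytes);
    }
}
```

The point of the split: the fragment side never needs defragmentation (all blocks are the same size), and the spine side can be defragmented by replication because spines are immutable.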


SLIDE 74

Related work, or: how to make a complete RTGC

[Figure: landscape of collectors: Cheng & Blelloch '01, Siebert '99, Henriksson '98, Kalibera et al '09, Puffitsch & Schoeberl '08, Doligez, Leroy, Gonthier '93, '94, Blackburn & McKinley '08, Fiji CMR* (* concurrent mark-region), and SCHISM/CMR. Criteria: on-the-fly, concurrent, good throughput, time/space bounds.]


SLIDE 76

Tunable throughput-predictability trade-off.

  • Schism A: completely deterministic:
  • arrays allocated fragmented
  • Schism C: optimize throughput:
  • allocate contiguously if possible
  • Schism CW: simulate worst-case execution of Schism C:
  • poison all fast-paths (array accesses, write barriers, allocations)
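The three configurations differ only in policy, which can be sketched as a single decision function. This is a hypothetical illustration of the A/C/CW distinction described above, not code from the implementation.

```java
// Hypothetical sketch of the three configurations. Only the array
// allocation policy differs: Schism A always fragments (no variance at
// all), Schism C allocates contiguously when it can, and Schism CW
// "poisons" the fast path so every allocation takes Schism C's worst case.
public class VariantDemo {
    enum Variant { A, C, CW }

    static String allocateArray(Variant v, boolean contiguousBlockFree) {
        switch (v) {
            case A:
                return "fragmented";   // deterministic: ignores heap state
            case CW:
                return "fragmented";   // fast path poisoned: always worst case
            case C:
                // opportunistic: contiguous layout when fragmentation permits
                return contiguousBlockFree ? "contiguous" : "fragmented";
            default:
                throw new IllegalStateException();
        }
    }

    public static void main(String[] args) {
        System.out.println(allocateArray(Variant.C, true));   // contiguous
        System.out.println(allocateArray(Variant.C, false));  // fragmented
        System.out.println(allocateArray(Variant.A, true));   // fragmented
    }
}
```

Schism CW exists purely for measurement: by forcing the slow path everywhere, a run of Schism CW exhibits the timing Schism C would show in its worst case.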

SLIDE 77

(very short) Summary of Results

  • Goal: as fast as Metronome
  • Goal: fragmentation tolerant like Java RTS
  • Goal: deterministic


SLIDE 79

SPECjvm98 throughput summary

[Figure: bar chart of throughput relative to HotSpot (100%), axis 0%-70%, comparing Java RTS, Metronome, and Schism.]



SLIDE 85

Fragger Results

  • Amount of free memory successfully allocated under fragmentation:
  • HotSpot: ~100%
  • Java RTS: ~80%
  • Metronome: ~1%, unless using >10KB objects
  • Schism: ~100% (all objects)


SLIDE 87

(very short) Summary of Results

  • Goal: as fast as Metronome ✓
  • Goal: fragmentation tolerant like Java RTS ✓
  • Goal: deterministic


SLIDE 90

Schism predictability: RTEMS* on 40MHz LEON3

The OS/hardware platform used for NASA & ESA space missions.

* Real Time Executive for Missile Systems


SLIDE 92

Performance baseline: C code.

Using both C and Java implementations of the CDx real-time air traffic collision detection benchmark [Kalibera et al '09].

SLIDE 102

Java (CMR, Schism) versus C on CDx real-time benchmark

[Figure: min/max iteration times in milliseconds (40-120 ms axis) for C code, Java Fiji CMR, Java Schism C, Java Schism CW, and Java Schism A; reported values: 70.5, 96.6, 97.2, 98.5, and 112.5 ms.]

CDx performance varies between events due to the varying number of predicted collisions.

Schism CW refines the worst-case of Schism C by accounting for GC.

Schism A is completely deterministic - no further refinement necessary.

Java is 40% worse than C but just as deterministic.

SLIDE 103

Schism Predictability: SPECjbb2000 on Linux Xeon

SLIDE 107

SPECjbb2000 Worst-case Transaction Times

[Figure: log-scale plot of worst-case transaction times (1-1000 ms) against warehouses (1-8), comparing CMR & Schism with Metronome.]

SLIDE 109
  • Additional experiments in the paper:
  • SPECjvm98 in detail
  • Worst-case-time v. memory for CDx on RTEMS/LEON3
  • MMU for CDx on RTEMS/LEON3
  • Detailed fragmentation numbers with Fragger
  • Array access performance under fragmentation
  • Scalability with SPECjbb2000
  • Analytical proof of space bounds
  • Experimental validation of analytical proof of space bounds

Read the paper for the most awesomely epic RTGC evaluation, ever.


SLIDE 110

Conclusion: A good Real-Time GC...

  • executes concurrently with mutator threads
  • guarantees progress for heap accesses
  • wait-free (per-thread progress)
  • minimizes heap access overhead
  • few instructions
  • gives uniformly good throughput
  • is space efficient (minimizes external fragmentation)
