
Schism: Fragmentation-Tolerant Real-Time Garbage Collection

Fil Pizlo, Luke Ziarek, Peta Maj*, Tony Hosking*, Ethan Blanton, Jan Vitek*
Friday, June 11, 2010

Why another Real-Time Garbage Collector?


  1. Replication-based GC
     • See: [Nettles-O’Toole ’93], [Cheng-Blelloch ’01]
     • Allows concurrent defragmentation
     • Two spaces: one space for reads; writes are “replicated” to both spaces
     • Works best for immutable objects.
     • Problem: writes are not atomic! Loss of coherence!
     (Diagram: the application reads and writes an object; the original is being copied to a replica.)
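The coherence problem can be made concrete with a small sketch. This is a toy model, not code from any of the cited collectors: `ReplicatedObject`, `read`, and `write_steps` are hypothetical names, and the write is split into two explicit steps to stand in for two stores that are not atomic as a pair.

```python
class ReplicatedObject:
    """Toy model of an object that a replicating collector is copying."""

    def __init__(self, fields):
        self.original = list(fields)   # the space the application reads from
        self.replica = list(fields)    # the copy the collector is building

    def read(self, index):
        # Reads are served from a single space.
        return self.original[index]

    def write_steps(self, index, value):
        # A write must reach both spaces, but the two stores are separate
        # steps. Yielding between them models another thread running in
        # the gap (hypothetical modelling device, not a real barrier).
        self.original[index] = value
        yield "original updated"
        self.replica[index] = value
        yield "replica updated"


obj = ReplicatedObject([0, 0])
steps = obj.write_steps(0, 42)

next(steps)                                     # first store done
mid_write = (obj.original[0], obj.replica[0])   # the two copies disagree

next(steps)                                     # second store done
after_write = (obj.original[0], obj.replica[0])
```

In this model, anything that switches from the original to the replica between the two stores would observe the stale value. An object that is never written after allocation cannot end up in that state, which is the sense in which replication works best for immutable objects.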

  2-7. Allocate in fragments [Siebert ’99]
     • All objects are split into small fragments.
     • Fragment size is typically fixed at 32 bytes.
     • Fragments are linked; the application must follow links on object access.
     • Plain objects: access cost is known statically and does not vary. Most objects require only two fragments.
     • Arrays: access cost is logarithmic. Array accesses will see a significant slow-down!
     • Bad idea for large arrays.
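A rough calculation shows why plain objects are cheap under this scheme and large arrays are not. Only the 32-byte fragment size comes from the slide. The rest are illustrative assumptions, not values from [Siebert ’99]: each fragment of a plain object reserves 4 bytes for the link to the next fragment, and an array is stored as a tree whose interior fragments each hold eight 4-byte child pointers.

```python
import math

FRAGMENT_BYTES = 32            # from the slide
LINK_BYTES = 4                 # assumption: one link word per fragment
FANOUT = FRAGMENT_BYTES // 4   # assumption: 8 child pointers per interior fragment


def fragments_for_object(payload_bytes):
    """Number of linked fragments needed to hold `payload_bytes` of fields."""
    usable = FRAGMENT_BYTES - LINK_BYTES
    return max(1, math.ceil(payload_bytes / usable))


def tree_depth_for_array(payload_bytes):
    """Number of fragments visited by one element access in a tree-shaped array."""
    leaves = max(1, math.ceil(payload_bytes / FRAGMENT_BYTES))
    depth = 1        # the leaf fragment itself
    capacity = 1     # leaves reachable with the current depth
    while capacity < leaves:
        capacity *= FANOUT
        depth += 1
    return depth
```

Under these assumptions a 40-byte object needs two fragments, while every element access in a 1 MiB array walks six fragments.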

  8-9. Synopsis
     • Replication-copying collection: great, but only for immutable objects
     • Fragmented allocation: great, unless you have large arrays
     Can we combine the two?

  10. Idea: combine fragmented allocation with replication-copying using arraylets

  11-16. A new way of exploiting arraylets
     (Diagram: an arraylet spine and its fixed-size fragments.)
     • Fragments have fixed size: no external fragmentation.
     • The arraylet spine has variable size, which can lead to fragmentation!
     • But the spine is immutable ...
     • ... and replication is ideal for immutable objects.
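Arraylet indexing can be sketched as follows. The spine is modelled as an immutable tuple of references to fixed-size fragments, so an element access is always one spine lookup plus one fragment lookup, whatever the array length. `ELEMS_PER_FRAGMENT` and the class name are illustrative assumptions, not taken from the Schism implementation.

```python
ELEMS_PER_FRAGMENT = 8   # assumption: e.g. eight 4-byte elements per 32-byte fragment


class ArrayletArray:
    def __init__(self, length):
        n_fragments = -(-length // ELEMS_PER_FRAGMENT)   # ceiling division
        fragments = [[0] * ELEMS_PER_FRAGMENT for _ in range(n_fragments)]
        # The spine is fixed at allocation time; only the fragments mutate.
        self.spine = tuple(fragments)
        self.length = length

    def _locate(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # (fragment number, offset inside that fragment)
        return divmod(index, ELEMS_PER_FRAGMENT)

    def get(self, index):
        frag, offset = self._locate(index)
        return self.spine[frag][offset]

    def set(self, index, value):
        frag, offset = self._locate(index)
        self.spine[frag][offset] = value   # writes a fragment, never the spine


arr = ArrayletArray(20)
arr.set(17, 99)
```

Here a 20-element array gets a three-entry spine, and element 17 lives at offset 1 of fragment 2. The spine's size depends on the array length, but its contents are never written after allocation.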

  17. Schism = arraylets + replication + fragments
     • Combination:
       • Concurrent mark-sweep GC for fixed-size fragments
       • Replication copying for variable-size arraylet spines
     • No external fragmentation for either fragments or spines
     • Heap access is O(1), wait-free, and coherent.


  19-24. Heap layout (diagram)
     • Concurrent replication heap for spines: a from-space and a to-space for array spines (the two labels swap places in the last build)
     • Concurrent mark-sweep heap for fragments
     • Diagram labels: Small Object, Large Array?
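The split between the two heaps can be illustrated with a toy model. It assumes, as the arraylet slides state, that a spine is immutable and only holds references to fragments. `replicate_spine` and the variable names are hypothetical, and real collectors operate on raw memory rather than Python objects.

```python
def replicate_spine(spine):
    """Build a to-space copy of a spine.

    Only the references are copied; the fragments themselves stay where
    they are in the non-moving mark-sweep heap.
    """
    return tuple(ref for ref in spine)


# Two fixed-size fragments in the fragment heap.
frag_a = [1, 2, 3, 4]
frag_b = [5, 6, 7, 8]

from_space_spine = (frag_a, frag_b)
to_space_spine = replicate_spine(from_space_spine)

# The application writes an array element while both spines exist.
# The store goes to a fragment, so neither spine needs to be updated.
from_space_spine[1][0] = 50
```

Because element writes never touch the spine, the two copies cannot diverge in this model, so swapping from-space and to-space does not change what the application sees.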


  26-37. Related work, or: how to make a complete RTGC
     (Diagram positioning prior work against three properties: good throughput; on-the-fly, concurrent; time/space bounds.)
     • Cheng & Blelloch ’01
     • Siebert ’99
     • Henriksson ’98
     • Kalibera et al ’09
     • Blackburn & McKinley ’08
     • Doligez, Leroy, Gonthier ’93, ’94
     • Puffitsch & Schoeberl ’08
     • Fiji CMR*
     • Schism
     • Schism / CMR
     * concurrent mark-region

  38-39. Tunable throughput-predictability trade-off
     • Schism A: completely deterministic
       • arrays allocated fragmented
     • Schism C: optimize throughput
       • allocate contiguously if possible
     • Schism CW: simulate worst-case execution of Schism C
       • poison all fast-paths (array accesses, write barriers, allocations)
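One way to picture the three configurations is as an array allocation policy. This is a schematic reading of the slide, not Fiji VM code: the `Mode` enum, `choose_layout`, and the `contiguous_space_available` flag are hypothetical, and Schism CW is modelled simply as a configuration whose fast path is never taken.

```python
from enum import Enum


class Mode(Enum):
    A = "completely deterministic"
    C = "optimize throughput"
    CW = "worst-case simulation of C"


def choose_layout(mode, contiguous_space_available):
    """Return 'contiguous' or 'fragmented' for a newly allocated array."""
    if mode is Mode.C and contiguous_space_available:
        return "contiguous"   # fast path: plain contiguous array
    # Schism A always fragments. Schism CW has its fast paths poisoned,
    # so it behaves as if contiguous allocation never succeeds.
    return "fragmented"
```

For example, `choose_layout(Mode.C, True)` returns `"contiguous"`, while the same call with `Mode.A` or `Mode.CW` returns `"fragmented"`. The sketch covers allocation only; the slide also lists array accesses and write barriers as poisoned fast paths.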

  40-41. (Very short) summary of results
     • Goal: as fast as Metronome
     • Goal: fragmentation tolerant like Java RTS
     • Goal: deterministic

  42. SPECjvm98 throughput summary
     (Bar chart: throughput relative to HotSpot = 100%, y-axis from 0% to 70%, for Java RTS, Metronome, and Schism.)

  43-44. (Very short) summary of results
     • ✓ Goal: as fast as Metronome
     • Goal: fragmentation tolerant like Java RTS
     • Goal: deterministic

  45-48. Fragger results
     • Amount of free memory successfully allocated under fragmentation:
       • HotSpot: ~100%
       • Java RTS: ~80%
       • Metronome: ~1%, unless using >10KB objects
       • Schism: ~100% (all objects)

  49-50. (Very short) summary of results
     • ✓ Goal: as fast as Metronome
     • ✓ Goal: fragmentation tolerant like Java RTS
     • Goal: deterministic

  51-53. Schism predictability: RTEMS* on 40 MHz LEON3
     • The OS/hardware platform used for NASA & ESA space missions.
     * Real Time Executive for Missile Systems

  54-55. Performance baseline: C code
     • Using both C and Java implementations of the CDx real-time air traffic collision detection benchmark [Kalibera et al ’09].

  56-63. Java (CMR, Schism) versus C on the CDx real-time benchmark
     (Bar chart: milliseconds, y-axis from 40 to 120, with Min and Max marked, for C code, Java Fiji CMR, Java Schism C, Java Schism CW, and Java Schism A. Values called out across the builds: 70.5, 96.6, 97.2, 112.5.)
     • CDx performance varies between events due to the varying number of predicted collisions.
     • Schism CW refines the worst-case of Schism C by accounting for GC
