A True Hardware Read Barrier

INSTITUT FÜR NACHRICHTENVERMITTLUNG UND DATENVERARBEITUNG
INSTITUT FÜR KOMMUNIKATIONSNETZE UND RECHNERSYSTEME
Universität Stuttgart
Prof. Dr.-Ing. Dr. h. c. mult. P. J. Kühn
A True Hardware Read Barrier
Matthias Meyer
Institute of Communication Networks and Computer Engineering
University of Stuttgart, Germany
matthias.meyer@ikr.uni-stuttgart.de
International Symposium on Memory Management
June 10–11, 2006
Ottawa, Canada

Outline
❐ Real-time garbage collection: The synchronization problem
❐ A hardware-supported approach
  ✗ Novel processor architecture
  ✗ Garbage collection coprocessor
  ✗ Prototype
❐ The read barrier
  ✗ Effect on mutator progress
  ✗ A closer look at the read barrier fault handler
  ✗ Novel hardware read barrier design
❐ Conclusions and further work
Institut für Kommunikationsnetze und Rechnersysteme, Universität Stuttgart

Real-Time Garbage Collection
The synchronization problem
[Figure: application (“mutator”) and garbage collector (GC), both connected to the root set and the heap memory]
❐ Mutator and GC modify graph of objects ➠ read or write barriers
❐ Mutator and GC access same object ➠ mechanisms for mutual exclusion or atomic processing of objects
❐ Critical regions (root set processing) ➠ unbounded pauses
➠ high synchronization overhead
➠ no hard real-time capabilities

A Novel Processor Architecture (1)
Basic idea
❐ Hide garbage collection at the assembly language level
❐ Efficiently realize garbage collection and synchronization in hardware
Precondition
❐ Knowledge of pointers and objects in hardware
Novel approach
❐ Strictly separate pointers from non-pointer data
  ✗ in the register file
  ✗ in the instruction set
  ✗ in memory
[Figure: object structure. Attributes (π, δ), pointer area (indices 0 to π–1), data area (indices 0 to δ–1)]
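The object structure on this slide can be illustrated with a small model. This is only a sketch under assumptions: the names `ObjectLayout`, `ATTRIBUTE_BYTES`, and `classify` are hypothetical, π and δ are taken to be area sizes in bytes, and the 8-byte attribute size is inferred from the expression "free + 8 + π + δ" on a later slide. It shows how a strict split into pointer area and data area lets any offset be classified without tags.

```python
from dataclasses import dataclass

# Assumption: 8 bytes of attributes per object, inferred from "free + 8 + π + δ".
ATTRIBUTE_BYTES = 8


@dataclass(frozen=True)
class ObjectLayout:
    """Hypothetical model of one object: attributes, pointer area, data area."""
    pi: int     # size of the pointer area in bytes (assumed unit)
    delta: int  # size of the data area in bytes (assumed unit)

    def total_size(self) -> int:
        """Bytes occupied by attributes, pointer area, and data area together."""
        return ATTRIBUTE_BYTES + self.pi + self.delta

    def classify(self, offset: int) -> str:
        """Name the area that a byte offset from the object start falls into."""
        if not 0 <= offset < self.total_size():
            raise IndexError("offset outside the object")
        if offset < ATTRIBUTE_BYTES:
            return "attribute"
        if offset < ATTRIBUTE_BYTES + self.pi:
            return "pointer"
        return "data"


layout = ObjectLayout(pi=8, delta=16)
print(layout.total_size())   # 32
print(layout.classify(12))   # pointer
print(layout.classify(20))   # data
```

Because the pointer area always precedes the data area, a single comparison against π decides whether a field holds a pointer, which is the kind of knowledge the slide asks the hardware to have.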

A Novel Processor Architecture (2)
Extensions to a classical RISC pipeline
❐ Separate data and pointer registers
❐ Extend pointer registers by attributes
❐ Add PGU for operations that generate pointers (allocate, copy pointer)
❐ Add attribute stage for efficient attribute access
[Figure: pipeline stages Fetch, Decode, Execute, Memory, Attribute; blocks: instruction cache, register set (with π and δ attributes), ALU, AGU, PGU, data cache, attribute cache]

A Novel Processor Architecture (3)
Support for concurrent compaction
[Figure: fromspace and tospace, with labels forwarding pointer, scan, π, δ, backlink]
❐ Extend pointer register set by backlink entry
❐ Extend attribute cache by backlink entry
❐ AGU dynamically uses tospace pointer or backlink for address generation
[Figure: pipeline stages Fetch, Decode, Execute, Memory, Attribute; blocks: instruction cache, register set, ALU, AGU, PGU, data cache, attribute cache]

A Novel Processor Architecture (4)
The read barrier
❐ Two comparators check loaded pointers (hardware read barrier)
❐ Read barrier will trigger interrupt if loaded pointer refers to fromspace
❐ Interrupt handled by a dedicated garbage collection coprocessor
[Figure: pipeline (Fetch, Decode, Execute, Memory, Attribute) extended by a read barrier unit and the GC coprocessor]
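The check performed by the two comparators can be sketched in software as a range test. This is an illustration only, assuming that fromspace is one contiguous address range with a lower and an upper bound; the function name `read_barrier_faults` and the example addresses are hypothetical and do not come from the slides.

```python
def read_barrier_faults(pointer: int, from_lo: int, from_hi: int) -> bool:
    """Return True if a loaded pointer lies in fromspace [from_lo, from_hi)."""
    at_or_above_lower = pointer >= from_lo   # models comparator 1
    below_upper = pointer < from_hi          # models comparator 2
    return at_or_above_lower and below_upper


# Hypothetical fromspace bounds, chosen only for the example.
FROM_LO, FROM_HI = 0x1000, 0x2000

print(read_barrier_faults(0x1800, FROM_LO, FROM_HI))  # True: would raise the interrupt
print(read_barrier_faults(0x2400, FROM_LO, FROM_HI))  # False: pointer is outside fromspace
```

In hardware both comparisons would run in parallel on every loaded pointer, so the common no-fault case would add no instructions to the mutator code.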

Garbage Collection Coprocessor
Features
✗ performs garbage collection concurrently with application processing
✗ low-cost device, specialized for garbage collection
Integration
✗ tightly coupled to main processor
✗ realized on same device
✗ separate ports to memory controller
Memory interface
✗ no temporal locality: no cache!
✗ spatial locality: burst registers!
Algorithm
✗ based on Baker’s algorithm
✗ directly implemented in microcode
[Figure: main processor with caches and GC coprocessor, both connected to the memory controller]

Prototype
[Figure: prototype board. Labels: standard SDRAM modules; Serial, Ethernet, PS/2, DVI, Parallel; main processor with on-chip GC coprocessor]

Prototype
Hardware
❐ Main processor: 3-way multiple issue, “in order”
❐ GC coprocessor: 256 × 80 bit microcode memory
❐ Synchronously operated at 25 MHz
Software
❐ Static Java compiler (bytecode to machine code)
❐ Subset of the Java class libraries
Features
❐ Low-cost fine-grained synchronization
  ✗ independent of compiler and runtime system
  ✗ no code size overhead, little runtime overhead
❐ First known system that limits any GC-related pause to max. 500 clock cycles
Question
How are the pauses distributed over time?

Read barrier effect on mutator progress
Experimental results
[Figure: percentage of pause cycles in intervals of 500 clock cycles, benchmark “database”; full run from 0 s to 15 s and an excerpt from 3.04 s to 3.20 s, vertical axis 0 to 100]
Minimum mutator utilization
1 ms intervals: 7.2%
5 ms intervals: 8.3%
25 ms intervals: 11.4%
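Minimum mutator utilization is the smallest fraction of any time window of a given length during which the mutator is able to run. The following sketch shows one way to compute it from a per-cycle trace; the function name, the trace encoding (1 for a pause cycle, 0 for a mutator cycle), and the toy trace are assumptions for illustration and are not the measurement code behind the slide. At the prototype's 25 MHz clock, a 1 ms interval corresponds to 25,000 clock cycles.

```python
def minimum_mutator_utilization(pause_trace, window):
    """Smallest mutator share over all windows of `window` consecutive cycles.

    pause_trace holds one entry per clock cycle: 1 if the mutator is paused,
    0 if it makes progress (hypothetical encoding).
    """
    if window <= 0 or window > len(pause_trace):
        raise ValueError("window must be between 1 and the trace length")
    paused = sum(pause_trace[:window])   # pause cycles in the first window
    worst = paused
    for i in range(window, len(pause_trace)):
        # Slide the window by one cycle: add the new cycle, drop the oldest.
        paused += pause_trace[i] - pause_trace[i - window]
        worst = max(worst, paused)
    return 1.0 - worst / window


trace = [0, 0, 1, 1, 1, 0, 0, 0]             # toy trace, not measured data
print(minimum_mutator_utilization(trace, 4))  # 0.25
print(minimum_mutator_utilization(trace, 8))  # 0.625
```

Short windows are dominated by clusters of pauses, which is why the utilization figures on the slide rise as the interval grows from 1 ms to 25 ms.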

A closer look at the read barrier fault handler
Trigger: Processor reads fromspace pointer
Step 1: Coprocessor reads faulting pointer
Step 2: Coprocessor reads object attributes
Step 3: Coprocessor advances free (free + 8 + π + δ = free new)
Step 4: Coprocessor overwrites fromspace attributes (forwarding pointer, π)
Step 5: Coprocessor initializes tospace attributes (π, δ, backlink)
Step 6: Coprocessor updates fromspace pointer
[Figure, repeated for each step: fromspace and tospace objects with the free pointer, above the main processor with caches, the GC coprocessor, and the memory controller]
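The six handler steps can be traced in a small software model. This is a hedged sketch, not the coprocessor's microcode: `Heap`, `handle_fault`, the dictionary-based attribute store, and the `slots` mapping that stands in for the location of the faulting pointer are all hypothetical. The branch for an object that already carries a forwarding pointer is an assumption that does not appear on the slides, and copying of the object body is left out because the slides show only attribute handling.

```python
HEADER_BYTES = 8  # the constant 8 from "free + 8 + π + δ" on the slide


class Heap:
    """Hypothetical heap model: a free pointer plus per-object attributes."""

    def __init__(self, free: int):
        self.free = free        # next unused tospace address
        self.attributes = {}    # object address -> attribute fields


def handle_fault(heap: Heap, slots: dict, slot: str) -> int:
    """Evacuate the object behind slots[slot] and return its tospace address."""
    from_addr = slots[slot]                 # Step 1: read faulting pointer
    attrs = heap.attributes[from_addr]      # Step 2: read object attributes
    if "forward" in attrs:
        # Assumption (not on the slides): an already evacuated object
        # only needs its pointer redirected.
        to_addr = attrs["forward"]
    else:
        pi, delta = attrs["pi"], attrs["delta"]
        to_addr = heap.free                 # Step 3: advance free
        heap.free = to_addr + HEADER_BYTES + pi + delta
        # Step 4: overwrite fromspace attributes with the forwarding pointer
        heap.attributes[from_addr] = {"forward": to_addr, "pi": pi}
        # Step 5: initialize tospace attributes, including the backlink
        heap.attributes[to_addr] = {"pi": pi, "delta": delta,
                                    "backlink": from_addr}
    slots[slot] = to_addr                   # Step 6: update fromspace pointer
    return to_addr


heap = Heap(free=1000)
heap.attributes[100] = {"pi": 8, "delta": 16}
slots = {"r1": 100}
print(handle_fault(heap, slots, "r1"))  # 1000
print(heap.free)                        # 1032
```

Every step except the third touches memory that both processors share, which gives a sense of why the handler is costly when it runs on a coprocessor that must fetch the faulting pointer first.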

A novel hardware read barrier design
Analysis
❐ Read barrier fault handling expensive despite hardware support
❐ Necessary to sacrifice the tospace invariant to avoid clustering?
Insights
1. Read barrier in hardware ... but read barrier fault handling still in software
2. Processors expensively communicate via main memory ... because faulting pointer local to main processor, not to garbage collector
Novel idea
Live with the clustering, save the tospace invariant
1. Increase efficiency of the handler ➠ Realize fault handling completely in hardware!
2. Resolve the locality issue ➠ Move fault handling to main processor!

A novel hardware read barrier design
Trigger: Processor reads fromspace pointer
Step 1: Advance free (free + 8 + π + δ = free new), write fromspace attributes, update fromspace pointer
Step 2: Initialize tospace attributes
[Figure, repeated for each step: fromspace and tospace objects with the free pointer, above the pipeline (Fetch, Decode, Execute, Memory, Attribute) with its read barrier unit]

A novel hardware read barrier design
Experimental results
[Figure: percentage of stall cycles within intervals of 500 clock cycles, benchmark “database”; full run from 0 s to 15 s and an excerpt from 3.04 s to 3.20 s, vertical axis 0 to 100]
Minimum mutator utilization (values for the coprocessor-based fault handler in parentheses)
1 ms intervals: 56.8% (7.2%)
5 ms intervals: 58.1% (8.3%)
25 ms intervals: 62.1% (11.4%)
