SLIDE 1

Compress Objects, Not Cache Lines: An Object-Based Compressed Memory Hierarchy

Po-An Tsai and Daniel Sanchez

SLIDE 5

Prior memory compression techniques are limited to compressing cache lines

- Data movement limits performance and efficiency
- A memory access takes 100x the latency and 1000x the energy of a floating-point operation
- Applying hardware-based compression to the memory hierarchy to reduce data movement thus becomes beneficial

[Figure: Core -> private L1/L2 (data uncompressed) -> shared LLC (compressed cache) -> main memory (compressed): more capacity & less traffic]

- To support random accesses, the memory hierarchy transfers cache lines between levels
- Prior techniques are thus limited to compressing cache lines

SLIDE 8

Challenges due to compressing at cache-line granularity

1. Locating the compressed cache line (architecture)
   Fixed-size cache lines become variable-size compressed blocks, so hardware needs to translate uncompressed addresses to compressed block locations.

2. Compressing cache lines (algorithm)
   Cache lines are small and decompression latency is on the critical path, so hardware cannot compress more than 64 B at a time and only low-latency algorithms are practical.

SLIDE 13

Prior compressed memory architectures sacrifice compression ratio for low latency

- They aim to quickly translate uncompressed addresses to compressed addresses
- Example: Linearly compressed pages [LCP, Pekhimenko et al., MICRO'13]
- Other techniques make similar tradeoffs, e.g., allowing 4 different sizes for cache lines in a page

[Figure: a 4 KB page of 64 B lines (uncompressed format) becomes a 2 KB page of 32 B lines (compressed format); the original cache-line address is translated to the compressed block address via the VM system before reaching the shared LLC]

- LCP compresses page by page to leverage the VM system for translation: fast and low overhead
- LCP forces cache lines in the same page to compress to the same size: this sacrifices compression ratio

[RMC, Ekman and Stenstrom, HPCA'06] [DMC, Kim et al., PACT'17] [Compresso, Choukse et al., MICRO'18]
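The page-by-page translation LCP relies on can be sketched in a few lines. This is an illustrative model (the constants and function names are ours, not the paper's): because every line in a compressed page shrinks to the same fixed size, the compressed block address is a simple linear function of the line index.

```python
# Hypothetical sketch of LCP-style address translation, not the actual hardware:
# each 64 B line in a page compresses to one fixed size, so the compressed
# block address is base + line_index * compressed_line_size.

PAGE_SIZE = 4096          # uncompressed page: 4 KB
LINE_SIZE = 64            # uncompressed cache line: 64 B

def lcp_translate(uncompressed_addr, comp_page_base, comp_line_size):
    """Map an uncompressed cache-line address to its compressed block address."""
    line_index = (uncompressed_addr % PAGE_SIZE) // LINE_SIZE
    return comp_page_base + line_index * comp_line_size

# Line 3 of a page whose compressed copy starts at 0x10000 with 32 B lines:
addr = lcp_translate(0x40000 + 3 * LINE_SIZE, 0x10000, 32)
assert addr == 0x10000 + 3 * 32
```

This is exactly why the same-size constraint exists: if lines could compress to different sizes, the address computation would no longer be a single multiply-add.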

SLIDE 17

Prior compression algorithms are limited to exploiting redundancy within a cache line to achieve low latency

- Example: Base-Delta-Immediate compression [BDI, Pekhimenko et al., PACT'12]

[Figure: a 64 B cache line of an int array (100, 100, 102, 101, 103, 103, 102, 104, 108, 109, 109, 111) in the uncompressed layout; the compressed layout stores base 100 plus deltas (+2, +1, +3, +3, +2, +4) and base 108 plus deltas (+1, +1, +3); float arrays (1.1, 1.2, 1.3, ...) and reference arrays (0x18, 0x30, 0x48, ...) are handled similarly]

- These algorithms work well on arrays: homogeneous, regular data [FP-H, Arelakis et al., MICRO'15] [BPC, Kim et al., ISCA'16]

[Chart: compression ratio, no compression vs prior work; FFT: 1.00 vs 1.67; SPMV: 1.00 vs 1.55]
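The base-plus-delta idea above can be sketched as follows. This is a simplified illustration of the encoding, not the paper's circuit: real BDI tries several base and delta widths (and an implicit zero base), while this sketch uses a single base and one delta width.

```python
# Simplified base-delta sketch: store one wide base value plus narrow deltas
# when all values in the line are close to the base.

def bdi_compress(values, delta_bits=8):
    """Compress a line of integers as (base, deltas) if every delta fits."""
    base = values[0]
    deltas = [v - base for v in values]
    limit = 1 << (delta_bits - 1)
    if all(-limit <= d < limit for d in deltas):
        return (base, deltas)   # e.g. one 8 B base + 1 B per delta vs 8 B per value
    return None                 # incompressible with this base/delta width

def bdi_decompress(base, deltas):
    return [base + d for d in deltas]

line = [100, 100, 102, 101, 103, 103, 102, 104]
compressed = bdi_compress(line)
assert compressed is not None
assert bdi_decompress(*compressed) == line
```

The decompressor is a row of parallel adders, which is what keeps the latency low enough for the critical path.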

SLIDE 21

Prior compression algorithms work poorly on objects

- Prior algorithms work poorly on objects: heterogeneous, irregular data
- Little redundancy within a 64 B cache line
- Array-heavy apps: 61% compression ratio; object-heavy apps: 14% compression ratio

[Figure: a 64 B cache line holding fields of object A1 (100, 1.1, 0x18) and object A2 (102, 1.3, 0x48), next to objects B and C]

[Chart: compression ratio, no compression vs prior work; FFT: 1.67; SPMV: 1.55; H2: 1.15; SPECJBB: 1.27; PAGERANK: 1.06; COLORING: 1.07; BTREE: 1.10; GUAVACACHE: 1.15]

SLIDE 25

Objects, not cache lines, are the natural unit of compression

- Insight 1: Object-based applications always follow pointers to access objects
- Idea 1: Point directly to the location of compressed objects to avoid uncompressed-to-compressed address translation!

[Figure: objects A1, B1, A2, C, B2 in the uncompressed layout (0x00 to 0xFF) vs the same objects packed in the compressed layout (0x00 to 0xDF), with pointers pointing directly at the compressed locations]

SLIDE 29

Objects, not cache lines, are the natural unit of compression

- Insight 2: There is significant redundancy across objects of the same type
- Idea 2: Compress across objects, not within cache lines, to leverage more redundancy!

[Figure: the compressed layout (objects A1, B1, A2, C, B2; 0x00 to 0xDF) is further compressed to deltas (delta A1, delta B1, delta A2, delta C, delta B2; 0x00 to 0x8F), where each delta holds only the bytes that differ from a shared base object]

SLIDE 32

Compressing objects would be hard to do on cache hierarchies

- Ideally, we want a memory system that
  - Moves objects, rather than cache lines
  - Transparently updates pointers during compression
- Therefore, we realize our ideas on Hotpads [Tsai et al., MICRO'18], a recent object-based memory hierarchy

SLIDE 39

Baseline system: Hotpads overview

[Figure: Core -> L1 pad -> L2 pad -> L3 pad]

- Data array
  - Managed as a circular buffer using simple sequential allocation
  - Stores variable-sized objects compactly (e.g., Obj. A, then Obj. B, then free space)
  - Can store variable-sized compressed objects compactly too!
- C-Tags: a decoupled tag store
- Metadata (per word/object): pointer? valid? dirty? recently-used?
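The data array's sequential allocation amounts to a bump pointer into a circular buffer. The sketch below is our own software analogy of that policy (names and sizes are illustrative, not the Hotpads hardware): objects of any size are packed back to back, and a full pad signals that a bulk eviction is needed.

```python
# Software analogy (assumed, not from the paper) of a pad's data array:
# variable-sized objects are packed with a bump pointer; when the array
# fills, allocation fails and a bulk eviction would reset the free region.

class Pad:
    def __init__(self, size):
        self.size = size
        self.next_free = 0      # bump pointer into the data array
        self.objects = {}       # offset -> object size (stand-in for C-tags)

    def alloc(self, obj_size):
        """Sequentially allocate; return None to signal a needed bulk eviction."""
        if self.next_free + obj_size > self.size:
            return None         # pad full: trigger bulk eviction
        offset = self.next_free
        self.next_free += obj_size
        self.objects[offset] = obj_size
        return offset

pad = Pad(size=256)
a = pad.alloc(24)   # Obj. A at offset 0
b = pad.alloc(40)   # Obj. B packed right after A
assert (a, b) == (0, 24)
```

Because allocation never rounds objects up to a fixed block size, compressed objects of odd sizes pack just as compactly as uncompressed ones.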
SLIDE 45

Hotpads moves objects instead of cache lines

Example object: class ListNode { int value; ListNode next; }

[Figure: RegFile (r0 to r3), L1 pad, L2 pad, and main memory; objects A and B start in main memory, with free space above the allocated objects]

- Initial state: A and B reside in main memory
- 1. Program code: int v = A.value; so A is copied into the L1 pad
- 2. Program code: v = A.next.value; so B is copied into the L1 pad

Hotpads takes control of the memory layout, hides pointers from software, and encodes object information in pointers.

[Pointer format: bits 63-48 hold the object's size; bits 47-0 hold the object address (48 b)]

Fetching "size" words from the starting address yields the entire object; with compression, the same fields hold the compressed size and compressed address, so fetching "compressed size" words from the compressed address yields the entire compressed object.

SLIDE 50

Hotpads updates pointers among objects on evictions

- Bulk eviction amortizes the cost of finding and updating pointers across objects
- Since updating pointers already happens in Hotpads, there is no extra cost to update them to compressed locations!

- 3. The L1 pad becomes full, because of fetched objects (A, B) and newly allocated objects (C, D), triggering a bulk eviction in HW; the L2 pad still holds a stale copy of the modified A
- 4. After an L1 bulk eviction, pointers are updated to point to the new locations: copied objects (A) go back to their old location, new objects (D) are sequentially allocated, and the L1 pad regains free space
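The pointer-update pass during a bulk eviction can be modeled as one sweep over all pointer fields with a relocation map. This is a hypothetical software model (the dictionary-based layout is ours): the key point it illustrates is that redirecting pointers to compressed locations reuses a pass the hardware performs anyway.

```python
# Hypothetical model of a bulk eviction: surviving objects get new addresses,
# and a single pass rewrites every pointer field via a relocation map. The new
# addresses may be compressed locations at no extra cost.

def bulk_evict(objects, relocation):
    """objects: {addr: list of pointer fields}; relocation: {old: new addr}."""
    moved = {}
    for addr, ptr_fields in objects.items():
        new_addr = relocation.get(addr, addr)   # unmoved objects keep their addr
        # Rewrite each pointer to its target's new (possibly compressed) home.
        moved[new_addr] = [relocation.get(p, p) for p in ptr_fields]
    return moved

# A at 0x10 points to B at 0x20; both are evicted to compressed locations.
pad = {0x10: [0x20], 0x20: []}
out = bulk_evict(pad, {0x10: 0x100, 0x20: 0x108})
assert out == {0x100: [0x108], 0x108: []}
```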

SLIDE 54

Zippads: Locating objects without translations

- Zippads leverages Hotpads to
  - Manipulate and compress objects rather than cache lines
  - Avoid translation by pointing directly to compressed objects during evictions

[Figure: Core -> L1 pad (uncompressed) -> L2 pad -> compress/decompress -> L3 pad -> main memory (compressed)]

- Zippads compresses both on-chip and off-chip memories and is neutral to the compression algorithm

SLIDE 59

Zippads compresses objects when they move

- Objects are compressed during bulk object evictions

Case 1: Newly moved objects
- Objects start their lifetime uncompressed in private levels (e.g., the L2 pad)
- When objects are evicted into a compressed level (e.g., the L3 pad), they are compressed by the compression HW and stored compactly in that level
- Zippads piggybacks on the bulk eviction process to find and update all pointers at once, amortizing update costs

SLIDE 64

Zippads compresses objects when they move

- Objects are compressed during bulk object evictions

Case 2: Dirty writebacks
- An updated object is written back uncompressed from the L2 pad and recompressed by the compression HW in the L3 pad
- The updated compressed object overwrites the old compressed object; if it is smaller, some unused space is left behind, and if it no longer fits, it is written elsewhere and a forwarding thunk is left at the old location
- Periodic compaction reclaims those unused spaces (bulk eviction in on-chip pads, GC in main memory)

SLIDE 68

Zippads uses pointers to accelerate decompression

- Every object access starts with a pointer!
- Pointers are updated to the compressed locations, so no translation is needed
- Prior work shows it is beneficial to use different algorithms for different data patterns
- Zippads encodes compression metadata in pointers to decompress objects quickly
- Zippads thus knows where to locate each object and which decompression algorithm to use when accessing compressed objects through pointers

[Pointer format: bits 63-48 hold the compressed size; the top X bits of the address field are compression encoding bits; the remaining 48-X bits hold the compressed object address]
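The pointer layout above can be sketched with plain bit arithmetic. This is an illustrative model with an assumed X = 2 encoding bits (the paper leaves X as a parameter; the field names are ours): one shift-and-mask recovers everything needed to fetch and decompress an object directly.

```python
# Sketch of the Zippads pointer layout with an assumed X = 2 encoding bits:
# bits 63-48 hold the compressed size, the top 2 bits of the low 48 select
# the decompression algorithm, and the rest is the compressed address.

X = 2                         # compression-encoding bits (assumed value)
ADDR_BITS = 48 - X

def pack(size, algo, addr):
    return (size << 48) | (algo << ADDR_BITS) | addr

def unpack(ptr):
    size = ptr >> 48
    algo = (ptr >> ADDR_BITS) & ((1 << X) - 1)
    addr = ptr & ((1 << ADDR_BITS) - 1)
    return size, algo, addr   # enough to fetch and decompress with no lookup

ptr = pack(size=3, algo=1, addr=0xBEEF)
assert unpack(ptr) == (3, 1, 0xBEEF)
```

Since the metadata rides inside the pointer the program already holds, no separate translation or metadata table stands between the access and the data.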

SLIDE 72

COCO: Cross-object-compression algorithm

- COCO exploits similarity across objects with shared base objects, a collection of representative objects

[Figure: the compression HW diffs an uncompressed object against its base object; the compressed object stores a pointer to the base object plus only the bytes that differ]
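The diff-against-a-base idea can be sketched at byte granularity. This is an illustrative model, not the paper's hardware format: field names are ours, and real COCO stores a pointer to the base object rather than the base bytes themselves.

```python
# Byte-level sketch of cross-object compression: keep a reference to the
# shared base object plus only the (offset, byte) pairs that differ.

def coco_compress(obj, base):
    """obj, base: equal-length byte strings for objects of the same type."""
    diffs = [(i, b) for i, (b, bb) in enumerate(zip(obj, base)) if b != bb]
    return {"base": base, "diffs": diffs}   # real HW stores a base *pointer*

def coco_decompress(compressed):
    out = bytearray(compressed["base"])
    for i, b in compressed["diffs"]:
        out[i] = b                           # patch only the differing bytes
    return bytes(out)

# Two same-type objects that differ in just an int field and a pointer field:
base = bytes([0, 0, 100, 0, 0, 0, 0x18, 0])
obj  = bytes([0, 0, 102, 0, 0, 0, 0x48, 0])
c = coco_compress(obj, base)
assert len(c["diffs"]) == 2                  # only 2 of 8 bytes differ
assert coco_decompress(c) == obj
```

The win comes from typical objects of one type sharing headers, zero fields, and similar values, so most bytes match the base and never get stored.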

SLIDE 75

COCO: Cross-object-compression algorithm

- COCO requires accessing base objects for every compression/decompression
- Caching base objects avoids the extra latency and bandwidth to fetch them
- A small (8 KB) base object cache works well, since few types account for most accesses

SLIDE 76

See paper for additional features and details

- Compressing large objects with subobjects and allocate-on-access
- COCO compression/decompression circuit RTL implementation details
- Details on integrating Zippads and COCO
- Discussion on using COCO with conventional memory hierarchies

SLIDE 84

Evaluation

- We simulate Zippads using MaxSim [Rodchenko et al., ISPASS'17], a simulator combining ZSim and the Maxine JVM
- We compare 4 schemes:
  - Uncomp: conventional 3-level cache hierarchy with no compression
  - CMH: compressed memory hierarchy
    - LLC: VSC [Alameldeen and Wood, ISCA'04]
    - Main memory: LCP [Pekhimenko et al., MICRO'13]
    - Algorithm: HyComp-style hybrid [Arelakis et al., MICRO'15] of BDI [Pekhimenko et al., PACT'12] + FPC [Alameldeen and Wood, ISCA'04]
  - Hotpads: the baseline system we build on
  - Zippads: with and without COCO
- Workloads: 8 Java apps with large memory footprints from different domains

SLIDE 95

Zippads improves compression ratio

1. Both Zippads and CMH work well in array-heavy apps
2. Zippads works much better than CMH in object-heavy apps

[Chart: compression ratios; CMH is only 24% better than Uncomp; Zippads with the same algorithm as CMH is 70% better; Zippads with the CMH algorithm + COCO is 2x better]

SLIDE 101

Zippads reduces memory traffic and improves performance

1. CMH reduces traffic by 15% with data compression
2. Hotpads reduces traffic by 66% with object-based data movement
3. Zippads combines the benefits of both, reducing traffic by 2x (70% less traffic than CMH)

[Charts: memory traffic (lower is better) and performance (higher is better)]

- Similar trend in performance: Zippads is 24% faster than CMH and 30% faster than Uncomp

SLIDE 106

Zippads also provides benefits on compiled code

- We study two object-heavy benchmarks written in C/C++
- Zippads again works much better than CMH in compressing memory footprint
- Zippads improves both memory traffic and performance the most

SLIDE 107

See paper for more evaluation results

- Zippads hardware storage overhead analysis
- COCO RTL implementation results
- Comparison against CMH with hardware support for memory management
- Zippads analysis
  - Base object cache size sensitivity study
  - Overflow frequency

SLIDE 111

We propose the first object-based compressed memory hierarchy

- Prior compressed memory hierarchies focus on compressing cache lines
  - They require address translation and work poorly on object-heavy apps
- Object-based apps provide new opportunities for compression
  - They always access objects through pointers
  - They have significant redundancy across objects, not within cache lines
- We present techniques that compress objects, not cache lines
  - Zippads rewrites pointers to avoid uncompressed-to-compressed address translation
  - COCO compresses across objects to leverage more redundancy

SLIDE 112

Thanks! Questions?
