
Efficient Hardware-based Undo+Redo Logging for Persistent Memory



  1. Efficient Hardware-based Undo+Redo Logging for Persistent Memory Systems. Matheus Ogleari, Prof. Ethan Miller, Prof. Jishen Zhao

  2. Overview ● Background/Motivation ● Hardware-Driven Logging Design ● Evaluation

  3. Background ● Persistent Memory - What is it? ○ A “hybridization” of storage and memory ○ Combines the data persistence of disks with the byte-addressability and load/store interface of memory ○ Avoids paging data blocks to and from a storage device and context switching while servicing page faults ○ Uses non-volatile memory (NVM) technology

  4. Motivation ● “There is no such thing as a free lunch” ○ Implementing persistent memory carries overhead and complexity costs ○ There is “a large performance gap between a system with a persistent memory and a ‘native system’ (i.e., with no persistence support)” [Zhao, Micro’13] ○ [Figure: image from the Zhao Micro’13 paper]

  5. Our Work ● What are we doing? ○ Hardware-Driven Logging for Persistent Memory Systems ○ Shifting logging work from software to hardware while improving the performance of persistent memory applications

  6. Baseline ● Baseline Assumptions ■ The system uses 100% NVRAM or hybrid NVRAM/DRAM ■ In a hybrid system, we can distinguish between using DRAM and NVRAM ■ The cache is SRAM ■ [Figure: two block diagrams labeled Processor, Cache (SRAM), Memory (NVRAM), and (DRAM), each divided into Chip and Off-Chip regions]

  7. Baseline ● Baseline Assumptions ■ Logging is used for memory persistence ■ The log has a finite size (it cannot grow forever) and is uncacheable ■ The log is a circular buffer with head and tail pointers that wraps around ■ [Figure: Circular Log Buffer with head and tail pointers]
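The circular log assumption on this slide can be illustrated with a small software model. This is only a sketch: the class name `CircularLog`, the `count` field, and the choice to raise an error when the log is full (rather than overwrite old entries) are assumptions made for illustration, not details from the slides.

```python
class CircularLog:
    """Fixed-size log: head is the oldest live entry, tail is the next free slot."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0      # index of the oldest entry still needed
        self.tail = 0      # index where the next entry will be written
        self.count = 0     # number of live entries (tells full from empty)

    def append(self, entry):
        if self.count == len(self.slots):
            # Assumption: refuse to overwrite; a real design needs its own policy.
            raise BufferError("log full: truncate before appending")
        self.slots[self.tail] = entry
        self.tail = (self.tail + 1) % len(self.slots)   # wrap around
        self.count += 1

    def truncate(self, n):
        """Release the n oldest entries once their data is durable."""
        n = min(n, self.count)
        self.head = (self.head + n) % len(self.slots)
        self.count -= n

    def entries(self):
        """Live entries from oldest to newest."""
        return [self.slots[(self.head + i) % len(self.slots)]
                for i in range(self.count)]


log = CircularLog(4)
for i in range(4):
    log.append(f"e{i}")
log.truncate(2)      # e0 and e1 are no longer needed
log.append("e4")     # tail has wrapped, so this reuses slot 0
print(log.entries(), log.head, log.tail)
```

Keeping a separate entry count is one common way to tell a full buffer from an empty one when head and tail are equal.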

  8. Baseline ● Software-Driven Logging

  9. Our Work ● What we want:

  10. Hardware-Driven Logging ● How does it work? ● Two primary mechanisms: ○ Hardware Redo+Undo Logging (write-ahead logging, WAL) ○ Cache Forced Write-Back (FWB)

  11. Hardware-Driven Logging ● Design ○ Hardware Redo+Undo Logging (WAL) ■ All writes within persistent memory blocks automatically trigger a write to the log ■ Log entries of the form <addr, old_val, new_val, txid> accumulate in an on-chip log buffer ■ When a transaction completes (or the buffer fills), the log entries are written out to NVRAM
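The buffering behavior described on this slide can be modeled in a few lines of software. This is a minimal sketch under stated assumptions: `LogBuffer`, `record`, `commit`, and `drain` are hypothetical names, a Python list stands in for the log region in NVRAM, and a commit simply drains everything pending.

```python
from collections import namedtuple

# Entry layout taken from the slide: <addr, old_val, new_val, txid>
LogEntry = namedtuple("LogEntry", "addr old_val new_val txid")


class LogBuffer:
    """Software stand-in for the on-chip log buffer."""

    def __init__(self, capacity, nvram_log):
        self.capacity = capacity
        self.pending = []            # entries not yet written to NVRAM
        self.nvram_log = nvram_log   # list standing in for the NVRAM log region

    def record(self, addr, old_val, new_val, txid):
        self.pending.append(LogEntry(addr, old_val, new_val, txid))
        if len(self.pending) == self.capacity:
            self.drain()             # buffer full: write entries out

    def commit(self, txid):
        # Assumption: a completing transaction drains all pending entries.
        self.drain()

    def drain(self):
        self.nvram_log.extend(self.pending)
        self.pending.clear()


nvram_log = []
buf = LogBuffer(capacity=2, nvram_log=nvram_log)
buf.record(0x10, 0, 1, txid=7)
buf.record(0x18, 5, 6, txid=7)   # second entry fills the buffer and drains it
buf.record(0x20, 9, 8, txid=7)
pending_before_commit = len(buf.pending)
buf.commit(7)                    # transaction completes: remaining entry drains
print(pending_before_commit, len(nvram_log))
```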

  12. Hardware-Driven Logging ● Design ○ Hardware Redo+Undo Logging (WAL) ■ For a log entry <addr, old_val, new_val, txid>, addr and new_val come from the write request, and old_val is read from the write-allocated cache line
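To make the origin of each field concrete, the sketch below models a logged store and then shows why holding both values is useful after a crash. Everything here is illustrative: `logged_store` and `recover` are hypothetical names, dictionaries stand in for the cache and memory, values are tracked per address rather than per cache line, and the recovery routine is the textbook undo+redo procedure (assuming transactions do not touch the same addresses), not a procedure given in the slides.

```python
def logged_store(cache, memory, log, addr, new_val, txid):
    """Perform a store and emit its undo+redo log entry."""
    if addr not in cache:
        # Write-allocate: a miss first brings the line into the cache.
        cache[addr] = memory.get(addr, 0)
    old_val = cache[addr]                        # undo value from the cached line
    log.append((addr, old_val, new_val, txid))   # addr and new_val from the request
    cache[addr] = new_val


def recover(log, committed, memory):
    """Textbook undo+redo recovery (assumes isolated transactions)."""
    for addr, old_val, new_val, txid in log:             # redo committed, oldest first
        if txid in committed:
            memory[addr] = new_val
    for addr, old_val, new_val, txid in reversed(log):   # undo uncommitted, newest first
        if txid not in committed:
            memory[addr] = old_val
    return memory


memory = {0x10: 1, 0x18: 2}
cache, log = {}, []
logged_store(cache, memory, log, 0x10, 11, txid=1)
logged_store(cache, memory, log, 0x18, 22, txid=2)

# Hypothetical crash state: transaction 1 committed but its data never left the
# cache, while transaction 2 did not commit but its data reached memory.
crashed = {0x10: 1, 0x18: 22}
recovered = recover(log, committed={1}, memory=crashed)
print(recovered)
```

In this example the redo value restores the committed update that was lost, and the undo value rolls back the uncommitted update that leaked to memory.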

  13. Hardware-Driven Logging ● Design ○ Forced Write-Back (FWB) ■ A new cache write-back scheme that extends standard write-back ■ Ensures that no cache line remains in the cache “too long”: a line is written back when the mechanism triggers an FWB ■ Checks cache lines and updates their state periodically (e.g., once per cycle)

  14. Hardware-Driven Logging ● Design ○ Forced Write-Back (FWB) ■ Guarantees persistence through smarter cache-line write-back, unlike the software approach ■ Allows writes to coalesce in cache lines before they are written back to memory ■ Is more efficient than always writing back after persistent updates
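One way to picture the FWB mechanism is a periodic scan that flags a dirty line the first time it sees it and forces the line out the second time. The sketch below models that idea in software. The per-line `fwb` bit, the two-scan policy, and the names `Line`, `Cache`, and `fwb_scan` are assumptions made for illustration; the slides only state that lines are checked periodically and that FWB tag bits exist.

```python
class Line:
    def __init__(self):
        self.value = 0
        self.dirty = False
        self.fwb = False    # hypothetical flag: line was dirty at the previous scan


class Cache:
    def __init__(self, memory):
        self.lines = {}
        self.memory = memory
        self.writebacks = 0

    def store(self, addr, value):
        line = self.lines.setdefault(addr, Line())
        line.value = value
        line.dirty = True   # repeated stores coalesce in the same line

    def fwb_scan(self):
        """Periodic pass over the cache: flag dirty lines, force out flagged ones."""
        for addr, line in self.lines.items():
            if not line.dirty:
                continue
            if line.fwb:
                # Dirty across two scans: force the write-back.
                self.memory[addr] = line.value
                self.writebacks += 1
                line.dirty = False
                line.fwb = False
            else:
                line.fwb = True   # first sighting: only flag the line


memory = {}
cache = Cache(memory)
for v in range(5):
    cache.store(0x40, v)   # five stores to the same line
cache.fwb_scan()           # first scan flags the line
cache.fwb_scan()           # second scan forces it out
print(memory, cache.writebacks)
```

The five stores produce a single write to memory, which is the coalescing benefit the slide describes, while the scan bounds how long a dirty line can stay in the cache.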

  15. Trade-Offs ● Hardware-Driven Logging for Persistent Memory Systems ○ Primary idea: let hardware take care of it, but there are trade-offs ○ PROS: + Provides redo+undo logging at low cost + Removes burden from the programmer + More efficient utilization of underlying hardware + Guarantees persistence + Improves performance ○ CONS: - More complex hardware - Higher chip area - More on-chip power consumption

  16. Evaluation ● Experiments ○ Microbenchmarks (hash, rbtree, sps, ssca2, btree) and real workloads (WHISPER) ● Results ○ On average, performance (measured in IPC) improves by 1.42x over the baseline without CLWB and by 1.51x over the baseline with CLWB. ○ On average, dynamic memory energy consumption is reduced by 1.53x relative to the baseline without CLWB and by 1.72x relative to the baseline with CLWB. ○ On average, operational throughput increases by 1.45x over the baseline without CLWB and by 1.60x over the baseline with CLWB. ○ On average, memory write traffic is reduced by 2.36x relative to the baseline without CLWB and by 3.12x relative to the baseline with CLWB.

  17. Hardware Cost? ● Hardware Overhead ○ Table below shows major new hardware components ○ Relatively low overhead compared to existing cache state ○ Other added logic consists of small- to medium-sized components such as decoders and muxes

      Mechanism                    Logic Type    Size
      Transaction ID register      flip-flops    1 Byte
      Log head pointer register    flip-flops    8 Bytes
      Log tail pointer register    flip-flops    8 Bytes
      Log buffer                   SRAM          964 Bytes
      FWB tag bits                 SRAM          1040 Bytes
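As a quick check on the scale of the overhead, the sizes listed for the major components on this slide can be summed. The snippet is plain arithmetic on the slide's own numbers; the variable names are illustrative, and the total excludes the additional decoder and mux logic, whose size is not given.

```python
# (logic type, size in bytes) for each component listed on the slide
components = {
    "Transaction ID register": ("flip-flops", 1),
    "Log head pointer register": ("flip-flops", 8),
    "Log tail pointer register": ("flip-flops", 8),
    "Log buffer": ("SRAM", 964),
    "FWB tag bits": ("SRAM", 1040),
}

flip_flop_bytes = sum(size for kind, size in components.values() if kind == "flip-flops")
sram_bytes = sum(size for kind, size in components.values() if kind == "SRAM")
total_bytes = flip_flop_bytes + sram_bytes

print(flip_flop_bytes, sram_bytes, total_bytes)   # 17 2004 2021
```

The listed components therefore add up to roughly 2 KB of state.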

  18. Impact ● Why does this matter? ○ Provides redo+undo logging at low cost, with the benefits of both ○ Improves persistent memory performance, closing the gap with non-persistent systems ○ Addresses common issues that arise with current software-based persistent memory models ○ Removes the need for persistent memory-related instructions (mfence, clwb, clflush) ○ Opens a new avenue of research, with hardware-based solutions becoming more appealing

  19. Thank You!

  20. Efficient Hardware-based Undo+Redo Logging for Persistent Memory Systems ● What we want: [Figure: Circular Log Buffer with head and tail pointers]
