Efficient Hardware-based Undo+Redo Logging for Persistent Memory Systems
Matheus Ogleari
- Prof. Ethan Miller
- Prof. Jishen Zhao
Overview
○ Background/Motivation
○ Hardware-Driven Logging Design
○ Evaluation
Background
○ There are overhead and complexity costs associated with implementing persistent memory
○ There is “a large performance gap between a system with a persistent memory and a ‘native system’ (i.e., with no persistence support)” [Zhao, MICRO’13]
Image from Zhao, MICRO’13 paper
■ The system uses either 100% NVRAM or a hybrid NVRAM/DRAM memory
■ In a hybrid system, we can distinguish between DRAM and NVRAM
■ The cache is SRAM
[Figure: two system configurations. Left: processor and cache (SRAM) on-chip, memory (NVRAM) off-chip. Right: processor and cache (SRAM) on-chip, memory (DRAM) and (NVRAM) off-chip.]
■ Use logging to provide memory persistence
■ The log has a finite size: it cannot grow forever, and it is uncacheable
■ The log is a circular buffer with head and tail pointers that wraps around
Circular Log Buffer
[Figure: circular log buffer with head and tail pointers]
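A minimal C sketch of the circular log described above, for illustration only. The capacity `LOG_CAPACITY`, the 64-bit entry payload, and the names `circular_log`, `log_append`, and `log_reclaim` are hypothetical and not taken from the design. Appends advance the tail, reclamation advances the head, both indices wrap modulo the capacity, and a full log rejects new entries instead of growing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; the slides only say the log is finite. */
#define LOG_CAPACITY 4

typedef struct {
    uint64_t slots[LOG_CAPACITY];
    size_t head;  /* index of the oldest live entry */
    size_t tail;  /* index where the next entry will be written */
    size_t count; /* number of live entries */
} circular_log;

static void log_init(circular_log *ring) {
    assert(ring != NULL);
    ring->head = 0;
    ring->tail = 0;
    ring->count = 0;
}

/* Append at the tail. Returns false instead of growing when the log is full. */
static bool log_append(circular_log *ring, uint64_t value) {
    assert(ring != NULL);
    if (ring->count == LOG_CAPACITY) {
        return false;
    }
    ring->slots[ring->tail] = value;
    ring->tail = (ring->tail + 1) % LOG_CAPACITY; /* wrap around */
    ring->count++;
    return true;
}

/* Reclaim the oldest entry by advancing the head. */
static bool log_reclaim(circular_log *ring, uint64_t *out) {
    assert(ring != NULL && out != NULL);
    if (ring->count == 0) {
        return false;
    }
    *out = ring->slots[ring->head];
    ring->head = (ring->head + 1) % LOG_CAPACITY; /* wrap around */
    ring->count--;
    return true;
}
```

Tracking `count` separately is one way to tell a full log from an empty one when `head == tail`; a hardware implementation might instead reserve one slot or keep an extra wrap bit.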
○ Hardware Redo+Undo Logging (WAL)
○ Cache forced write-back (FWB)
Hardware Redo+Undo Logging (WAL)
■ All writes to persistent memory blocks automatically trigger a write to the log
■ These log entries, of the form <addr, old_val, new_val, txid>, accumulate in an on-chip log buffer
■ Once transactions complete (or the buffer is full), the log entries are written out to NVRAM
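The logging flow above can be sketched as a small software model, under stated assumptions rather than as the hardware itself: the buffer capacity `BUF_ENTRIES`, the array standing in for the NVRAM log region, and the names `log_unit`, `log_write`, `log_flush`, and `log_commit` are all hypothetical. Each persistent store appends an `<addr, old_val, new_val, txid>` record to the on-chip buffer, which is drained to NVRAM when it fills or when a transaction completes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One undo+redo record, following the <addr, old_val, new_val, txid> format. */
typedef struct {
    uint64_t addr;
    uint64_t old_val; /* undo data */
    uint64_t new_val; /* redo data */
    uint8_t txid;
} log_entry;

#define BUF_ENTRIES 4    /* hypothetical on-chip buffer capacity */
#define NVRAM_ENTRIES 64 /* hypothetical size of the NVRAM log region */

typedef struct {
    log_entry buf[BUF_ENTRIES];
    size_t buffered;                /* entries waiting in the on-chip buffer */
    log_entry nvram[NVRAM_ENTRIES]; /* stand-in for the log region in NVRAM */
    size_t persisted;               /* entries already written to NVRAM */
} log_unit;

/* Drain every buffered entry to the simulated NVRAM log. */
static void log_flush(log_unit *u) {
    assert(u->persisted + u->buffered <= NVRAM_ENTRIES);
    for (size_t i = 0; i < u->buffered; i++) {
        u->nvram[u->persisted++] = u->buf[i];
    }
    u->buffered = 0;
}

/* Called for every store to a persistent region. */
static void log_write(log_unit *u, uint64_t addr, uint64_t old_val,
                      uint64_t new_val, uint8_t txid) {
    if (u->buffered == BUF_ENTRIES) {
        log_flush(u); /* buffer full: write it out before accepting more */
    }
    u->buf[u->buffered++] = (log_entry){addr, old_val, new_val, txid};
}

/* Transaction completion also drains the buffer. */
static void log_commit(log_unit *u) {
    log_flush(u);
}
```

For simplicity the NVRAM region here is a linear array; in the design it would be the circular log with head and tail pointers.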
Hardware Redo+Undo Logging (WAL)
■ For a log entry <addr, old_val, new_val, txid>, addr and new_val come from the write request, and old_val is read from the cache line brought in by write-allocate
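A hypothetical model of how one log entry might be assembled, assuming a single-line cache, 8-byte words, 64-byte lines, and a tiny word-addressed memory array (`memory`, `cache_line`, and `store_with_log` are illustrative names, not from the design). On a store, write-allocate brings the line into the cache, `old_val` is read from that line, and only then is the new value written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_WORDS 8 /* 64-byte line of 8-byte words (assumed) */
#define MEM_WORDS 64 /* tiny simulated memory, word-addressed */

typedef struct { uint64_t addr, old_val, new_val; uint8_t txid; } log_entry;

typedef struct {
    bool valid;
    uint64_t base; /* word address of the first word in the line */
    uint64_t data[LINE_WORDS];
} cache_line;

static uint64_t memory[MEM_WORDS];

/* Handle a store with write-allocate and return the log entry it produces. */
static log_entry store_with_log(cache_line *line, uint64_t word_addr,
                                uint64_t new_val, uint8_t txid) {
    assert(word_addr < MEM_WORDS);
    uint64_t base = word_addr - word_addr % LINE_WORDS;
    if (line->valid && line->base != base) {
        /* Evict the current line by writing it back before reuse. */
        memcpy(&memory[line->base], line->data, sizeof line->data);
    }
    if (!line->valid || line->base != base) {
        /* Write-allocate: fetch the whole line before modifying it. */
        memcpy(line->data, &memory[base], sizeof line->data);
        line->base = base;
        line->valid = true;
    }
    size_t offset = (size_t)(word_addr - base);
    /* addr and new_val come from the request; old_val from the cached line. */
    log_entry e = {word_addr, line->data[offset], new_val, txid};
    line->data[offset] = new_val; /* the store updates only the cache */
    return e;
}
```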
Forced Write-Back (FWB)
■ A new cache write-back scheme that enhances the standard write-back policy
■ Ensures that no cache line stays in the cache “too long”: lines are written back when the mechanism triggers an FWB
■ Periodically checks cache lines and updates their state (e.g., once per cycle)
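The slides do not spell out the FWB state machine, so the following is only a sketch under an assumed policy: each line carries a `dirty` bit and a hypothetical `fwb` tag bit, and a periodic scan marks a dirty line the first time it sees it, then forces a write-back if the line is still dirty at the next scan. Under this assumption a dirty line stays cached for at most about two scan periods. All names (`line_state`, `fwb_scan`, `write_back`, `cache_store`) are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_LINES 4 /* hypothetical tiny cache */

typedef struct {
    bool dirty; /* line holds data not yet written back */
    bool fwb;   /* hypothetical tag bit: seen dirty on the previous scan */
} line_state;

/* Stand-in for issuing a write-back to NVRAM; counts write-backs. */
static size_t writebacks;
static void write_back(line_state *l) {
    l->dirty = false;
    writebacks++;
}

/* One periodic scan: mark dirty lines on first sight, force them out on the next. */
static void fwb_scan(line_state lines[], size_t n) {
    for (size_t i = 0; i < n; i++) {
        line_state *l = &lines[i];
        if (!l->dirty) {
            l->fwb = false;
        } else if (l->fwb) {
            write_back(l); /* still dirty after a full scan period: force it out */
            l->fwb = false;
        } else {
            l->fwb = true; /* first scan that sees this line dirty */
        }
    }
}

/* A store makes a line dirty, as in a normal write-back cache. */
static void cache_store(line_state *l) {
    assert(l != NULL);
    l->dirty = true;
}
```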
○ Primary idea: let the hardware take care of logging, though there are trade-offs
PROS:
+ Provides redo+undo logging at low cost
+ Removes the burden from the programmer
+ Makes more efficient use of the underlying hardware
+ Guarantees persistence
+ Improves performance
CONS:
○ Microbenchmarks (hash, rbtree, sps, ssca2, btree) and real workloads (WHISPER)
○ On average, performance (measured by IPC) improves by 1.42x over the baseline without CLWB and by 1.51x over the baseline with CLWB.
○ On average, dynamic memory energy consumption is reduced by 1.53x relative to the baseline without CLWB and by 1.72x relative to the baseline with CLWB.
○ On average, operational throughput increases by 1.45x over the baseline without CLWB and by 1.60x over the baseline with CLWB.
○ On average, memory write traffic is reduced by 2.36x relative to the baseline without CLWB and by 3.12x relative to the baseline with CLWB.
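The “reduced by N×” figures are ratios rather than percentages. Assuming they mean the baseline value divided by the new value (a reading the slides do not state explicitly), they translate into percentage reductions as follows:

```latex
\begin{aligned}
\text{Write traffic, vs.\ no CLWB:}\quad & 1/2.36 \approx 0.424 \;\Rightarrow\; \approx 57.6\%\ \text{less} \\
\text{Write traffic, vs.\ CLWB:}\quad & 1/3.12 \approx 0.321 \;\Rightarrow\; \approx 67.9\%\ \text{less} \\
\text{Dynamic energy, vs.\ no CLWB:}\quad & 1/1.53 \approx 0.654 \;\Rightarrow\; \approx 34.6\%\ \text{less} \\
\text{Dynamic energy, vs.\ CLWB:}\quad & 1/1.72 \approx 0.581 \;\Rightarrow\; \approx 41.9\%\ \text{less}
\end{aligned}
```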
○ The table below shows the major new hardware components
○ The overhead is relatively low compared to the existing cache state
○ Other logic is also added, but it consists of small- to medium-sized components such as decoders and muxes
Mechanism                  | Logic Type | Size
Transaction ID register    | flip-flops | 1 byte
Log head pointer register  | flip-flops | 8 bytes
Log tail pointer register  | flip-flops | 8 bytes
Log buffer                 | SRAM       | 964 bytes
FWB tag bits               | SRAM       | 1040 bytes
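Summing the Size column gives a rough sense of the total added storage. This is plain arithmetic on the table values and does not account for the decoders, muxes, and other logic:

```latex
1 + 8 + 8 + 964 + 1040 = 2021\ \text{bytes} \approx 1.97\ \text{KiB}
```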