Efficient Hardware-based Undo+Redo Logging for Persistent Memory

Aug 05, 2023

SLIDE 1

Efficient Hardware-based Undo+Redo Logging for Persistent Memory Systems

Matheus Ogleari

Prof. Ethan Miller
Prof. Jishen Zhao

SLIDE 2

Overview

Background/Motivation
Hardware-Driven Logging Design
Evaluation

SLIDE 3

Background

Persistent Memory - What is it?

○ A “hybridization” between storage and memory
○ Combines the persistence of disk storage with the byte-addressability and load/store interface of memory
○ Avoids paging data blocks to and from a storage device, and avoids context switching while servicing page faults
○ Uses NVM (non-volatile memory) technology

SLIDE 4

Motivation

“There is no such thing as a free lunch”

○ Implementing persistent memory carries overhead and complexity costs
○ There is “a large performance gap between a system with a persistent memory and a ‘native system’ (i.e., with no persistence support)” [Zhao, Micro’13]

Image from the Zhao Micro’13 paper

SLIDE 5

Our Work

What are we doing?

○ Hardware-Driven Logging for Persistent Memory Systems
○ Shifting the balance between hardware and software while improving the performance of persistent memory applications

SLIDE 6

Baseline

Baseline Assumptions

■ System uses 100% NVRAM or hybrid NVRAM/DRAM
■ In a hybrid system, we can distinguish between using DRAM and NVRAM
■ Cache is SRAM

Figure: two system configurations. In both, the processor and cache (SRAM) are on-chip and memory is off-chip; memory is NVRAM only in the first and DRAM plus NVRAM in the second.

SLIDE 7

Baseline

Baseline Assumptions

■ Use logging for memory persistence
■ Log has a finite size (it cannot grow forever) and is uncacheable
■ Log is a circular buffer with head and tail pointers that wraps around

Figure: Circular log buffer with head and tail pointers
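
The circular log described on this slide can be sketched in software as follows. This is only an illustrative model, not the hardware design from the talk: the class name `CircularLog`, its methods, and the choice to reject appends when full are all assumptions made for the example.

```python
class CircularLog:
    """Fixed-capacity circular log with head and tail indices (illustrative only)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.head = 0   # index of the oldest live entry
        self.tail = 0   # index where the next entry is written
        self.count = 0  # number of live entries

    def append(self, entry):
        """Write an entry at the tail; refuse if the log is full."""
        if self.count == self.capacity:
            return False  # assumption: caller must truncate before retrying
        self.slots[self.tail] = entry
        self.tail = (self.tail + 1) % self.capacity  # wrap around
        self.count += 1
        return True

    def truncate(self, n):
        """Retire up to n of the oldest entries by advancing head."""
        n = min(n, self.count)
        self.head = (self.head + n) % self.capacity
        self.count -= n
        return n

    def entries(self):
        """Live entries from oldest to newest."""
        return [self.slots[(self.head + i) % self.capacity]
                for i in range(self.count)]


log = CircularLog(capacity=4)
for i in range(4):
    log.append(f"e{i}")
full = log.append("e4")   # rejected: the log cannot grow past its capacity
log.truncate(2)           # retire e0 and e1
log.append("e4")          # accepted: tail has wrapped around to slot 0
```

Because the log is finite, something has to retire old entries before new ones can be written; here that is the explicit `truncate` call, which stands in for whatever policy a real system uses.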

SLIDE 8

Baseline

Software-Driven Logging

SLIDE 9

Our Work

What we want:

SLIDE 10

Hardware-Driven Logging

How does it work?
Two primary mechanisms:

○ Hardware Redo+Undo Logging (WAL)
○ Cache forced write-back (FWB)

SLIDE 11

Hardware-Driven Logging

Design

○ Hardware Redo+Undo Logging (WAL)
■ All writes within persistent memory blocks automatically trigger a write to the log
■ These log entries, of the form <addr, old_val, new_val, txid>, accumulate in an on-chip log buffer
■ Once a transaction completes (or the buffer is full), the log entries are written out to NVRAM

SLIDE 12

Hardware-Driven Logging

Design

○ Hardware Redo+Undo Logging (WAL)
■ For a log entry <addr, old_val, new_val, txid>, addr and new_val are given in the write request, and old_val is read from the write-allocated cache line
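
A minimal software model of the logging flow on the last two slides might look like the sketch below. It is not the hardware implementation: `LoggingModel`, its dictionaries, the default old value of 0 for untouched addresses, and the tiny buffer capacity are hypothetical choices made to keep the example short.

```python
from collections import namedtuple

# Entry layout taken from the slides: <addr, old_val, new_val, txid>
LogEntry = namedtuple("LogEntry", ["addr", "old_val", "new_val", "txid"])


class LoggingModel:
    """Toy model of hardware undo+redo logging (names are hypothetical)."""

    def __init__(self, buffer_capacity):
        self.cache = {}        # addr -> value, models the write-allocate cache
        self.nvram = {}        # addr -> value, persistent data
        self.nvram_log = []    # log entries already written to NVRAM
        self.log_buffer = []   # on-chip log buffer
        self.buffer_capacity = buffer_capacity

    def write(self, addr, new_val, txid):
        # Write-allocate: on a miss the line is fetched first, so the old
        # value is available in the cache (assumed 0 if never written).
        if addr not in self.cache:
            self.cache[addr] = self.nvram.get(addr, 0)
        old_val = self.cache[addr]
        # Every persistent write automatically produces a log entry.
        self.log_buffer.append(LogEntry(addr, old_val, new_val, txid))
        self.cache[addr] = new_val
        if len(self.log_buffer) >= self.buffer_capacity:
            self.flush_log()   # buffer full: write entries out to NVRAM

    def commit(self, txid):
        self.flush_log()       # transaction complete: write entries out

    def flush_log(self):
        self.nvram_log.extend(self.log_buffer)
        self.log_buffer.clear()


m = LoggingModel(buffer_capacity=2)
m.nvram[0x10] = 7
m.write(0x10, 8, txid=1)
m.write(0x18, 5, txid=1)   # buffer reaches capacity and is flushed
m.write(0x10, 9, txid=1)
m.commit(1)
```

Note that in this model the data itself (`m.nvram[0x10]`) is still the old value after commit; only the log has reached NVRAM. Moving the data is the job of the cache write-back path.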

SLIDE 13

Hardware-Driven Logging

Design

○ Forced Write-Back (FWB)
■ New cache write-back scheme that extends the standard write-back scheme
■ Ensures that no cache line remains “too long” in the cache: lines are written back when the mechanism triggers an FWB
■ Checks cache lines and updates their state periodically (e.g., once per cycle)

SLIDE 14

Hardware-Driven Logging

Design

○ Forced Write-Back (FWB)
■ Guarantees persistence through smarter cache-line write-back, unlike the software approach
■ Allows writes to coalesce in cache lines before they are written back to memory
■ Is more efficient than always writing back after persistent updates
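
One way to picture the FWB idea is the toy model below. The slides do not spell out the exact state machine, so this is an assumption: each line carries a hypothetical `fwb` tag bit, a periodic scan sets the bit the first time it sees a dirty line, and forces a write-back if the line is still dirty on the next scan. The names `CacheLine`, `FwbCache`, and `scan` are invented for the example.

```python
class CacheLine:
    def __init__(self):
        self.value = None
        self.dirty = False
        self.fwb = False   # hypothetical tag bit: line was dirty at the last scan


class FwbCache:
    """Toy cache with a periodic forced-write-back scan (illustrative only)."""

    def __init__(self):
        self.lines = {}      # addr -> CacheLine
        self.memory = {}     # addr -> value, stands in for NVRAM
        self.writebacks = 0

    def write(self, addr, value):
        line = self.lines.setdefault(addr, CacheLine())
        line.value = value
        line.dirty = True    # repeated writes coalesce in the same line

    def scan(self):
        """Periodic check: a line dirty across two scans is written back."""
        for addr, line in self.lines.items():
            if not line.dirty:
                continue
            if line.fwb:
                self.memory[addr] = line.value   # forced write-back
                self.writebacks += 1
                line.dirty = False
                line.fwb = False
            else:
                line.fwb = True                  # mark; decide on next scan


c = FwbCache()
c.write(0x40, 1)
c.write(0x40, 2)
c.write(0x40, 3)
c.scan()            # first scan only marks the dirty line
c.write(0x40, 4)
c.scan()            # second scan forces the write-back
```

Four writes reach memory as a single write-back, which illustrates the coalescing benefit, while the two-scan bound keeps any dirty line from staying in the cache indefinitely.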

SLIDE 15

Trade-Offs

Hardware-Driven Logging for Persistent Memory Systems

○ Primary idea: Let hardware take care of it, but there are trade-offs

PROS:
+ Provides redo+undo logging at low cost
+ Removes burden from the programmer
+ More efficient utilization of underlying hardware
+ Guarantees persistence
+ Improves performance

CONS:
- More complex hardware
- Higher chip area
- More on-chip power consumption

SLIDE 16

Evaluation

Experiments

○ Microbenchmarks (hash, rbtree, sps, ssca2, btree) w/ real workloads (WHISPER)

Results

○ On average, performance (measured by IPC) increases by 1.42x over the baseline without CLWB and by 1.51x over the baseline with CLWB.
○ On average, dynamic memory energy consumption is reduced by 1.53x relative to the baseline without CLWB and by 1.72x relative to the baseline with CLWB.
○ On average, operational throughput increases by 1.45x over the baseline without CLWB and by 1.60x over the baseline with CLWB.
○ On average, memory write traffic is reduced by 2.36x relative to the baseline without CLWB and by 3.12x relative to the baseline with CLWB.
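
To make the reduction factors easier to read, the short calculation below converts them to percentage savings. It assumes that "reduced by Nx" means the baseline quantity divided by N; the slides do not state the convention explicitly, and the helper name `percent_saved` is made up for this example.

```python
# Factors quoted on the slide: (vs. baseline without CLWB, vs. baseline with CLWB)
write_traffic_reduction = (2.36, 3.12)
energy_reduction = (1.53, 1.72)


def percent_saved(factor):
    """Percentage saved if the baseline quantity is divided by `factor`."""
    return round(100.0 * (1.0 - 1.0 / factor), 1)


traffic_saved = [percent_saved(f) for f in write_traffic_reduction]
energy_saved = [percent_saved(f) for f in energy_reduction]
```

Under that reading, write traffic falls by roughly 58% and 68%, and dynamic memory energy by roughly 35% and 42%, for the two baselines respectively.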

SLIDE 17

Hardware Cost?

Hardware Overhead

○ Table below shows the major new hardware components
○ Relatively low overhead compared to existing cache state
○ Other logic is also added, but it consists of small- to medium-sized components such as decoders and muxes

Mechanism                   Logic Type   Size
Transaction ID register     flip-flops   1 Byte
Log head pointer register   flip-flops   8 Bytes
Log tail pointer register   flip-flops   8 Bytes
Log buffer                  SRAM         964 Bytes
FWB tag bits                SRAM         1040 Bytes
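
As a quick check on the scale of the added state, the sizes in the table can be summed. The dictionary below simply restates the table; the variable names are arbitrary.

```python
# Sizes in bytes, copied from the hardware overhead table
components = {
    "Transaction ID register": ("flip-flops", 1),
    "Log head pointer register": ("flip-flops", 8),
    "Log tail pointer register": ("flip-flops", 8),
    "Log buffer": ("SRAM", 964),
    "FWB tag bits": ("SRAM", 1040),
}

total_bytes = sum(size for _, size in components.values())
sram_bytes = sum(size for kind, size in components.values() if kind == "SRAM")
flipflop_bytes = total_bytes - sram_bytes
```

The listed components add up to 2021 bytes, of which 2004 bytes are SRAM and 17 bytes are flip-flop registers. This excludes the decoders and muxes mentioned above, whose sizes the slide does not give.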

SLIDE 18

Impact

Why does this matter?

○ Provides redo+undo logging at low cost, with benefits of both
○ Improves persistent memory performance, closing the gap with non-persistent systems
○ Addresses common issues that arise with current persistent memory software-based models
○ Removes the need for persistent memory-related instructions (mfence, clwb, clflush)
○ Provides a new avenue of research, with hardware-based solutions becoming more appealing

SLIDE 19

Thank You!
