Efficient Hardware-based Undo+Redo Logging for Persistent Memory



SLIDE 1

Efficient Hardware-based Undo+Redo Logging for Persistent Memory Systems

Matheus Ogleari

  • Prof. Ethan Miller
  • Prof. Jishen Zhao

SLIDE 2

Overview

  • Background/Motivation
  • Hardware-Driven Logging Design
  • Evaluation

SLIDE 3

Background

  • Persistent Memory - What is it?

    ○ A “hybridization” between storage and memory
    ○ Combines the persistence of disk storage with the byte-addressability and load/store interface of memory
    ○ Avoids paging data blocks from/to a storage device or context switching while servicing page faults
    ○ Uses NVM (non-volatile memory) technology

SLIDE 4

Motivation

  • “There is no such thing as a free lunch”

    ○ There are overhead and complexity costs associated with implementing persistent memory
    ○ There is “a large performance gap between a system with a persistent memory and a ‘native system’ (i.e., with no persistence support)” [Zhao, Micro’13]

[Image from the Zhao Micro’13 paper]

SLIDE 5

Our Work

  • What are we doing?

    ○ Hardware-Driven Logging for Persistent Memory Systems
    ○ Shifting the balance between hardware and software while improving the performance of persistent memory applications

SLIDE 6

Baseline

  • Baseline Assumptions

    ■ System uses either 100% NVRAM or hybrid NVRAM/DRAM
    ■ In a hybrid system, we can distinguish between using DRAM and NVRAM
    ■ Cache is SRAM

[Figure: two baseline configurations. On chip: processor and cache (SRAM). Off chip: memory, either NVRAM only or DRAM plus NVRAM.]

SLIDE 7

Baseline

  • Baseline Assumptions

    ■ Use logging for memory persistence
    ■ The log has a finite size (it cannot grow forever) and is uncacheable
    ■ The log is a circular buffer with head and tail pointers that wraps around

[Figure: circular log buffer with head and tail pointers]
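
The circular log assumed in the baseline can be sketched in a few lines. This is only an illustrative software model, not the design from the slides: the class name `CircularLog`, the `truncate` method, and the choice that `head` marks the oldest entry while `tail` marks the next free slot are all assumptions.

```python
class CircularLog:
    """Fixed-size circular log with head and tail pointers (illustrative only)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.head = 0   # assumed: index of the oldest live entry
        self.tail = 0   # assumed: index where the next entry is written
        self.count = 0  # number of live entries

    def append(self, entry):
        # The log cannot grow forever: refuse to overwrite live entries.
        if self.count == self.capacity:
            raise BufferError("log full: truncate before appending")
        self.slots[self.tail] = entry
        self.tail = (self.tail + 1) % self.capacity  # wrap around
        self.count += 1

    def truncate(self, n):
        """Discard the n oldest entries, e.g. once their data is persistent."""
        n = min(n, self.count)
        self.head = (self.head + n) % self.capacity
        self.count -= n


log = CircularLog(4)
for i in range(4):
    log.append(i)   # fills slots 0..3, tail wraps back to 0
log.truncate(2)     # frees the two oldest entries, head moves to 2
log.append(4)       # reuses slot 0 after the wrap
```

Tracking `count` separately is one simple way to tell a full log from an empty one when `head == tail`; a real design could use a spare slot or a wrap bit instead.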

SLIDE 8

Baseline

  • Software-Driven Logging

SLIDE 9

Our Work

  • What we want:

SLIDE 10

Hardware-Driven Logging

  • How does it work?
  • Two primary mechanisms:

    ○ Hardware Redo+Undo Logging (WAL)
    ○ Cache forced write-back (FWB)

SLIDE 11

Hardware-Driven Logging

  • Design

    ○ Hardware Redo+Undo Logging (WAL)
        ■ All writes within persistent memory blocks automatically trigger a write to the log
        ■ These log entries, of the form <addr, old_val, new_val, txid>, accumulate in an on-chip log buffer
        ■ Once a transaction completes (or the buffer is full), the log entries are written out to NVRAM

SLIDE 12

Hardware-Driven Logging

  • Design

    ○ Hardware Redo+Undo Logging (WAL)
        ■ For a log entry <addr, old_val, new_val, txid>, addr and new_val are given in the write request, and old_val is read from the write-allocated cache line
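
A rough software model may help show how an undo+redo entry is assembled on each write. Everything here is a sketch under assumptions: the class `LoggingModel`, its dictionaries standing in for the cache and NVRAM, and the policy of flushing the whole buffer on commit are hypothetical, and the real mechanism is hardware rather than code.

```python
from collections import namedtuple

# Entry layout taken from the slides: <addr, old_val, new_val, txid>
LogEntry = namedtuple("LogEntry", "addr old_val new_val txid")


class LoggingModel:
    """Toy model of hardware undo+redo logging (hypothetical names throughout)."""

    def __init__(self, buffer_capacity):
        self.cache = {}       # addr -> value, stands in for the SRAM cache
        self.nvram = {}       # addr -> value, stands in for persistent data
        self.log_buffer = []  # stands in for the on-chip log buffer
        self.nvram_log = []   # stands in for the log region in NVRAM
        self.buffer_capacity = buffer_capacity

    def write(self, addr, new_val, txid):
        # Write-allocate: on a miss, fetch the line so the old value is on hand.
        if addr not in self.cache:
            self.cache[addr] = self.nvram.get(addr, 0)
        old_val = self.cache[addr]
        # addr and new_val come from the write request, old_val from the cache.
        self.log_buffer.append(LogEntry(addr, old_val, new_val, txid))
        self.cache[addr] = new_val
        if len(self.log_buffer) == self.buffer_capacity:
            self.flush_log()  # buffer full

    def commit(self, txid):
        # Simplification: flush every buffered entry, not only those of txid.
        self.flush_log()

    def flush_log(self):
        self.nvram_log.extend(self.log_buffer)
        self.log_buffer.clear()


m = LoggingModel(buffer_capacity=4)
m.nvram[0x10] = 7
m.write(0x10, 9, txid=1)
m.write(0x10, 11, txid=1)
m.commit(1)
```

After the commit, `m.nvram_log` holds two entries for address `0x10`, with old/new pairs (7, 9) and (9, 11), so each entry carries what is needed both to undo and to redo that write.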

SLIDE 13

Hardware-Driven Logging

  • Design

    ○ Forced Write-Back (FWB)
        ■ New cache write-back scheme that enhances the standard write-back policy
        ■ Ensures that no cache line remains “too long” in the cache: a line is written back when the mechanism triggers an FWB
        ■ Checks cache lines and updates their state periodically (e.g., once per cycle)
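
One way to picture the periodic check is a two-pass scheme with a single tag bit per line. The slides do not spell out the state machine, so the `fwb` bit semantics, the `CacheLine` class, and the `fwb_scan` function below are assumptions made for illustration only.

```python
class CacheLine:
    def __init__(self):
        self.dirty = False
        self.fwb = False  # assumed tag bit: line was already dirty at the last scan


def fwb_scan(lines, write_back):
    """One periodic pass over the cache lines; returns how many were written back."""
    written = 0
    for line_id, line in lines.items():
        if not line.dirty:
            line.fwb = False          # clean lines need no tracking
        elif line.fwb:
            write_back(line_id)       # dirty across two scans: force write-back
            line.dirty = False
            line.fwb = False
            written += 1
        else:
            line.fwb = True           # first scan that sees this line dirty
    return written


lines = {0: CacheLine(), 1: CacheLine()}
flushed = []
lines[0].dirty = True                 # any number of writes may hit line 0 here
first_pass = fwb_scan(lines, flushed.append)   # only marks line 0
second_pass = fwb_scan(lines, flushed.append)  # writes line 0 back
```

Under this assumed scheme a dirty line survives at most two scan intervals, and all writes that land on it in the meantime leave the cache in a single write-back.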

SLIDE 14

Hardware-Driven Logging

  • Design

    ○ Forced Write-Back (FWB)
        ■ Provides guaranteed persistence through smarter cache line write-back, unlike the software approach
        ■ Allows writes to coalesce in cache lines before they are written back to memory
        ■ Is more efficient than always writing back after persistent updates

SLIDE 15

Trade-Offs

  • Hardware-Driven Logging for Persistent Memory Systems

    ○ Primary idea: let hardware take care of it, but there are trade-offs

PROS:

  + Provides redo+undo logging at low cost
  + Removes burden from the programmer
  + More efficient utilization of underlying hardware
  + Guarantees persistence
  + Improves performance

CONS:

  • More complex hardware
  • Higher chip area
  • More on-chip power consumption

SLIDE 16

Evaluation

  • Experiments

    ○ Microbenchmarks (hash, rbtree, sps, ssca2, btree) along with real workloads (WHISPER)

  • Results

    ○ On average, performance (measured by IPC) improves by 1.42x over the baseline without CLWB and by 1.51x over the baseline with CLWB.
    ○ On average, dynamic memory energy consumption is reduced by 1.53x relative to the baseline without CLWB and by 1.72x relative to the baseline with CLWB.
    ○ On average, operational throughput improves by 1.45x over the baseline without CLWB and by 1.60x over the baseline with CLWB.
    ○ On average, memory write traffic is reduced by 2.36x relative to the baseline without CLWB and by 3.12x relative to the baseline with CLWB.
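
The "reduced by Nx" figures can be restated as percentages, assuming they mean that the new value equals the baseline divided by N. That reading is an assumption, and the helper name `percent_reduction` is made up for this sketch.

```python
def percent_reduction(factor):
    """Percent saved if 'reduced by Nx' means new = baseline / N (assumed reading)."""
    return (1.0 - 1.0 / factor) * 100.0


# Reduction factors quoted in the results (without CLWB, with CLWB).
reductions = {
    "dynamic memory energy": (1.53, 1.72),
    "memory write traffic": (2.36, 3.12),
}

for metric, (no_clwb, with_clwb) in reductions.items():
    print(
        f"{metric}: {percent_reduction(no_clwb):.1f}% lower vs. no-CLWB baseline, "
        f"{percent_reduction(with_clwb):.1f}% lower vs. CLWB baseline"
    )
```

Under this reading, a 2x reduction corresponds to 50% less of the measured quantity, so the larger factors for write traffic translate into well over half of the baseline writes being avoided.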

SLIDE 17

Hardware Cost?

  • Hardware Overhead

    ○ Table below shows major new hardware components
    ○ Relatively low overhead compared to existing cache state
    ○ Other logic is also added, but it consists of small- to medium-sized components such as decoders and muxes

Mechanism                   Logic Type   Size
Transaction ID register     flip-flops   1 Byte
Log head pointer register   flip-flops   8 Bytes
Log tail pointer register   flip-flops   8 Bytes
Log buffer                  SRAM         964 Bytes
FWB tag bits                SRAM         1040 Bytes
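
As a quick check on the "relatively low overhead" point, the listed components can be added up. The dictionary below only restates the table; the total is simple arithmetic and does not include the unlisted decoder and mux logic.

```python
# Sizes in bytes, copied from the hardware overhead table.
overhead_bytes = {
    "Transaction ID register": 1,
    "Log head pointer register": 8,
    "Log tail pointer register": 8,
    "Log buffer": 964,
    "FWB tag bits": 1040,
}

total = sum(overhead_bytes.values())
flip_flop_total = sum(
    size for name, size in overhead_bytes.items() if name.endswith("register")
)
sram_total = total - flip_flop_total

print(f"total: {total} bytes ({flip_flop_total} in flip-flops, {sram_total} in SRAM)")
# prints "total: 2021 bytes (17 in flip-flops, 2004 in SRAM)"
```

The listed state therefore comes to roughly 2 KB, almost all of it in the log buffer and the FWB tag bits.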

SLIDE 18

Impact

  • Why does this matter?

    ○ Provides redo+undo logging at low cost, with the benefits of both
    ○ Improves persistent memory performance, closing the gap with non-persistent systems
    ○ Addresses common issues that arise with current software-based persistent memory models
    ○ Removes the need for persistent memory-related instructions (mfence, clwb, clflush)
    ○ Provides a new avenue of research, with hardware-based solutions becoming more appealing

SLIDE 19

Thank You!

SLIDE 20

Efficient Hardware-based Undo+Redo Logging for Persistent Memory Systems

  • What we want:


[Figure: circular log buffer with head and tail pointers]