HOOP: Efficient Hardware-Assisted Out-of-Place Update for Non-Volatile Memory


Dec 11, 2023


SLIDE 1

HOOP: Efficient Hardware-Assisted Out-of-Place Update for Non-Volatile Memory

Miao Cai†, Chance Coats, Jian Huang

Systems Platform Research Group

†

SLIDE 2

Non-Volatile Memory is a Revolutionary Technology

New and emerging NVMs offer promising properties and are becoming popular.

Close-to-DRAM Performance

Data Durability

Byte Addressability

SLIDE 3

Memory Persistency Challenge: A Well-Known Problem

Ensuring memory persistency with commodity architecture is challenging!

Performance vs. Persistency

Out-of-Order Execution

Volatile Processor Cache

SLIDE 4

State-of-the-Art Approach: Redo/Undo Logging

Undo Logging

Redo Logging

Undo/Redo logging causes DOUBLE WRITES on the critical path.
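The double-write cost can be illustrated with a little arithmetic. The sketch below is a simplified model, not taken from the slides: it assumes one log-entry write plus one data write per update and ignores commit records and other metadata. The function names are hypothetical.

```python
def nvm_writes_in_place(num_updates: int) -> int:
    """Baseline without persistence guarantees: one NVM write per update."""
    return num_updates


def nvm_writes_with_logging(num_updates: int) -> int:
    """Undo or redo logging (simplified): each update is written twice."""
    log_writes = num_updates   # old value (undo) or new value (redo) goes to the log
    data_writes = num_updates  # the data itself is still written in place
    return log_writes + data_writes
```

Under this model, a transaction with four updates issues eight NVM writes instead of four.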

SLIDE 5

State-of-the-Art Approach: Shadow Paging

Page Copy

Optimized shadow paging still suffers from FREQUENT DATA FLUSHES.

SLIDE 6

State-of-the-Art Approach: Log-structured NVM

Software-based LSNVM suffers from LONG ACCESS LATENCY.

Log Index

SLIDE 7

A Summary of State-of-the-Art Approaches

Logging

Shadow Paging

Log-structured NVM

Memory persistency overheads: double writes, frequent flushes, long critical-path latency

SLIDE 8

Our Approach: Hardware-assisted Out-Of-Place (HOOP) Update

Reduced write traffic with data coalescing and packing

No requirement on persistence ordering

Transparent support of atomic data durability

SLIDE 9

Challenges of Supporting Out-Of-Place Update

Lightweight Indirection Layer

Limited Resource in Memory Controller

Efficient Garbage Collection

SLIDE 10

Address Remapping for Supporting Out-of-Place Update

[Diagram: Processor Cache; Memory Controller with Mapping Table; NVM with Home Region and OOP Region; store and load paths; GC]

Physical-to-physical address mapping

Insert mapping entry: upon a write to the OOP region

Delete mapping entry: upon data migration from the OOP region to the home region (GC); upon a read from the OOP region
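The remapping idea can be sketched as a small lookup table. This is a minimal software model under stated assumptions, not the hardware design: the class name, method names, and the dictionary representation are all hypothetical, and it covers only the write and GC-migration events named on the slide.

```python
from typing import Dict, Tuple


class MappingTable:
    """Toy physical-to-physical mapping (home address -> OOP address)."""

    def __init__(self) -> None:
        self._home_to_oop: Dict[int, int] = {}

    def on_oop_write(self, home_addr: int, oop_addr: int) -> None:
        # Insert (or refresh) an entry when an update lands in the OOP region.
        self._home_to_oop[home_addr] = oop_addr

    def on_migrate_to_home(self, home_addr: int) -> None:
        # Delete the entry once GC has copied the data back to the home region.
        self._home_to_oop.pop(home_addr, None)

    def resolve_load(self, home_addr: int) -> Tuple[str, int]:
        # A load is redirected to the OOP region only while an entry exists.
        oop_addr = self._home_to_oop.get(home_addr)
        if oop_addr is not None:
            return ("oop", oop_addr)
        return ("home", home_addr)
```

A load for an address with no entry goes straight to the home region, so the table only needs to track data that currently lives out of place.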

SLIDE 11

Data Packing in the Memory Controller for Improved Performance

Many applications update data at a fine granularity.

[Diagram: Processor Cache; Memory Controller with Mapping Table and OOP Data Buffer; NVM with Home Region and OOP Region; store and load paths. Labels: Home address; OOP Block Head, OOP Block Head, …]
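Packing fine-grained updates can be sketched as grouping small writes into fixed-capacity units. The sketch assumes each packed entry keeps its home address next to the data, as the "Home address" label suggests; the `Update` type, the function name, and the `slice_capacity` parameter are hypothetical.

```python
from typing import List, Tuple

# (home address, new word value); an illustrative representation only.
Update = Tuple[int, bytes]


def pack_updates(updates: List[Update], slice_capacity: int) -> List[List[Update]]:
    """Group word-sized updates into slices of at most slice_capacity entries."""
    if slice_capacity <= 0:
        raise ValueError("slice_capacity must be positive")
    slices: List[List[Update]] = []
    for start in range(0, len(updates), slice_capacity):
        slices.append(updates[start:start + slice_capacity])
    return slices
```

With a capacity of four, five word updates end up in two slices rather than five separately written cache lines.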

SLIDE 12

Ensuring Persistence Ordering in the Memory Controller

[Diagram: Processor Cache; Memory Controller with Mapping Table and OOP Data Buffer; NVM with Home Region and OOP Region; store and load paths]

When the data packing for a memory slice is done

Upon the end of a transaction (e.g., Tx_end)
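The slide names two events: a memory slice finishing its packing, and the end of a transaction. Assuming these are the points at which the OOP data buffer is written out to the OOP region, a minimal sketch could look like the following. The class, its methods, and the list standing in for the OOP region are hypothetical.

```python
from typing import List, Tuple


class OOPDataBuffer:
    """Toy buffer that persists a slice when it is full or at Tx_end."""

    def __init__(self, slice_capacity: int, oop_region: List[List[Tuple[int, int]]]) -> None:
        self.slice_capacity = slice_capacity
        self.oop_region = oop_region  # plain list standing in for NVM
        self._pending: List[Tuple[int, int]] = []

    def store(self, home_addr: int, value: int) -> None:
        self._pending.append((home_addr, value))
        if len(self._pending) == self.slice_capacity:
            self._persist_slice()  # trigger 1: packing of this slice is done

    def tx_end(self) -> None:
        if self._pending:
            self._persist_slice()  # trigger 2: end of transaction

    def _persist_slice(self) -> None:
        self.oop_region.append(list(self._pending))
        self._pending.clear()
```

In this model the software issues only stores and a transaction end; the buffer decides when data reaches the OOP region.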

SLIDE 13

Efficient Garbage Collection for Improved Memory Utilization

[Diagram: Processor Cache; Memory Controller with Mapping Table, OOP Data Buffer, and Eviction Buffer; NVM with Home Region and OOP Region; store and load paths; GC. Labels: OOP Block Head, OOP Block Head, …; Linked Memory Slices]

Load stale data during GC
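One way garbage collection could combine migration with the data coalescing mentioned earlier in the deck is to scan slices from newest to oldest and write back only the latest value for each home address. This is an illustrative sketch under that assumption, not a description of the actual hardware algorithm; the function name and the list and dictionary stand-ins are hypothetical.

```python
from typing import Dict, List, Tuple


def garbage_collect(
    oop_slices: List[List[Tuple[int, str]]],  # ordered oldest to newest
    home: Dict[int, str],                     # stand-in for the home region
    mapping: Dict[int, int],                  # home address -> OOP address
) -> int:
    """Migrate the newest value of each home address; return the count migrated."""
    migrated = set()
    for slice_ in reversed(oop_slices):
        for home_addr, value in reversed(slice_):
            if home_addr in migrated:
                continue  # an older version of the same address: skip it
            home[home_addr] = value
            mapping.pop(home_addr, None)  # data is back home, drop the entry
            migrated.add(home_addr)
    oop_slices.clear()  # the reclaimed OOP space can be reused
    return len(migrated)
```

If an address was updated three times before GC runs, only one write reaches the home region in this model.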

SLIDE 14

Handling Crash Consistency Upon Failures

[Diagram: Processor Cache; Memory Controller with Mapping Table, OOP Data Buffer, and Eviction Buffer; NVM with Home Region and OOP Region; store and load paths. Labels: OOP Block Head, OOP Block Head, …]

SLIDE 15

Put It All Together

[Diagram: two cores, each with an L1 Cache; misses go to the Last-Level Cache; Memory Controller with Mapping Table, OOP Data Buffer, and Eviction Buffer; NVM with Home Region and OOP Region; store and load paths]

SLIDE 16

HOOP Implementation

Processor Simulator: McSimA+ (OoO cores, 2.5GHz, 32KB L1, 256KB L2, 2MB LLC)

NVM Simulator: Read/Write = 50/150ns, 512GB

Evaluation Benchmarks

Synthetic Workloads: Vector, HashMap, Queue, RB-Tree, B-Tree

Real-world Workloads: YCSB, TPCC

SLIDE 17

Improving Transaction Throughput with HOOP

[Bar chart: normalized speedup for Vector, Queue, RBTree, Btree, HashMap, YCSB, and TPCC. Series: Optimized Redo, Optimized Undo, Optimized Shadow Paging, Log-Structured NVM, Logless Atomic Durability, HOOP, Ideal]

HOOP is close to the performance of a system without any persistence enforcement.

SLIDE 18

Reducing Critical-Path Latency with HOOP

[Bar chart: normalized latency for Vector, Queue, RBTree, Btree, HashMap, YCSB, and TPCC. Series: Ideal, Optimized Redo, Optimized Undo, Optimized Shadow Paging, Log-Structured NVM, Logless Atomic Durability, HOOP]

HOOP achieves the lowest latency, compared to state-of-the-art approaches.

SLIDE 19

Reducing Write Traffic with HOOP

[Bar chart: normalized write traffic for Vector, Queue, RBTree, Btree, HashMap, YCSB, and TPCC. Series: Ideal, Optimized Redo, Optimized Undo, Optimized Shadow Paging, Log-Structured NVM, Logless Atomic Durability, HOOP]

HOOP reduces write traffic by up to 2.1x, compared to logging approaches.

SLIDE 20

HOOP Summary

1.7x Performance Speedup for Data-Intensive Apps

2.1x Reduction of Write Amplification

Thanks!

Miao Cai, Chance Coats, Jian Huang

Systems Platform Research Group