HOOP: Efficient Hardware-Assisted Out-of-Place Update for Non-Volatile Memory
Miao Cai, Chance Coats, Jian Huang
Systems Platform Research Group
2
Non-Volatile Memory is a Revolutionary Technology
New and emerging NVMs offer promising properties and are becoming popular:
Close-to-DRAM Performance, Data Durability, Byte Addressability
3
Memory Persistency Challenge: A Well-Known Problem
Ensuring memory persistency with commodity architecture is challenging!
Performance vs. Persistency; Out-of-Order Execution; Volatile Processor Cache
4
State-of-the-Art Approach: Redo/Undo Logging
Undo Logging, Redo Logging
Undo/Redo logging causes DOUBLE WRITES on the critical path.
Page Copy
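To make the double-write point concrete, the following is a minimal Python sketch that counts NVM writes for a toy undo log and a toy redo log. `ToyNVM`, `undo_tx`, and `redo_tx` are hypothetical names introduced for illustration only; they are not from the slides, and the cache flushes and fences that real logging needs are left out.

```python
class ToyNVM:
    """Toy persistent memory that counts every write it receives (hypothetical)."""

    def __init__(self):
        self.data = {}      # home locations: addr -> value
        self.log = []       # log area
        self.writes = 0     # total writes that reach NVM

    def write_data(self, addr, value):
        self.data[addr] = value
        self.writes += 1

    def append_log(self, record):
        self.log.append(record)
        self.writes += 1


def undo_tx(nvm, updates):
    """Undo logging: persist the OLD value first, then update in place."""
    for addr, new_value in updates.items():
        nvm.append_log((addr, nvm.data.get(addr)))  # write 1: old value to log
        nvm.write_data(addr, new_value)             # write 2: new value in place
    nvm.log.clear()                                 # commit: log no longer needed


def redo_tx(nvm, updates):
    """Redo logging: persist the NEW value to the log, then apply it in place."""
    for addr, new_value in updates.items():
        nvm.append_log((addr, new_value))           # write 1: new value to log
    for addr, new_value in nvm.log:
        nvm.write_data(addr, new_value)             # write 2: apply to home location
    nvm.log.clear()


updates = {0x10: 1, 0x18: 2, 0x20: 3}
undo_nvm, redo_nvm = ToyNVM(), ToyNVM()
undo_tx(undo_nvm, updates)
redo_tx(redo_nvm, updates)
print(undo_nvm.writes, redo_nvm.writes)  # 6 6: two NVM writes per updated location
```

In this toy model both schemes write every updated location twice, which is the overhead the slide calls double writes.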
5
State-of-the-Art Approach: Shadow Paging
Optimized shadow paging still suffers from FREQUENT DATA FLUSHES.
6
State-of-the-Art Approach: Log-structured NVM
Software-based LSNVM suffers from LONG ACCESS LATENCY.
Log Index
7
A Summary of State-of-the-Art Approaches
Memory persistency overheads:
Logging: double writes
Shadow Paging: frequent flushes
Log-structured NVM: long critical-path latency
8
Our Approach: Hardware-assisted Out-Of-Place (HOOP) Update
Reduced write traffic with data coalescing and packing
No requirement on persistence ordering
Transparent support of atomic data durability
9
Challenges of Supporting Out-Of-Place Update
Lightweight Indirection Layer
Limited Resource in Memory Controller
Efficient Garbage Collection
10
Address Remapping for Supporting Out-of-Place Update
[Diagram labels: Processor Cache (store, load); Memory Controller (Mapping Table); NVM (Home Region, OOP Region); GC]
Mapping Table: physical-to-physical address mapping
Insert mapping entry: upon a write to the OOP region
Delete mapping entry: upon data migration from the OOP region to the home region (GC), or upon a read from the OOP region
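The insert and delete rules for the mapping table can be sketched as a small dictionary-backed model. This is an illustrative sketch only: `ToyMappingTable` and its method names are hypothetical, the real table lives in memory-controller hardware, and the read-triggered deletion listed on the slide is deliberately left out because the slide does not give enough detail to model it.

```python
class ToyMappingTable:
    """Hypothetical physical-to-physical map: home address -> OOP address."""

    def __init__(self):
        self.home_to_oop = {}

    def on_oop_write(self, home_addr, oop_addr):
        # Insert (or overwrite) an entry when new data is written out of place.
        self.home_to_oop[home_addr] = oop_addr

    def on_migrated(self, home_addr):
        # Delete the entry once GC has copied the data back to the home region.
        self.home_to_oop.pop(home_addr, None)

    def resolve_load(self, home_addr):
        # A load is redirected to the OOP region only while an entry exists.
        return self.home_to_oop.get(home_addr, home_addr)


table = ToyMappingTable()
table.on_oop_write(0x1000, 0x9000)
print(hex(table.resolve_load(0x1000)))  # 0x9000
table.on_migrated(0x1000)
print(hex(table.resolve_load(0x1000)))  # 0x1000
```

Keeping an entry only while the newest copy lives in the OOP region is one way to keep the table small, which matters given the limited resources in the memory controller.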
11
Data Packing in the Memory Controller for Improved Performance
Many applications update data at a fine granularity
[Diagram labels: Processor Cache (store, load); Memory Controller (Mapping Table, OOP Data Buffer); NVM (Home Region, OOP Region); Home address; OOP Block Head]
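As a rough illustration of why packing helps with fine-grained updates, the sketch below groups word-sized updates into fixed-size slices and keeps each word's home address next to it. The function name `pack_updates` and the capacity of 8 words per slice are assumptions made for this example, not values taken from the slides.

```python
WORDS_PER_SLICE = 8  # assumed capacity; not taken from the slides


def pack_updates(updates, words_per_slice=WORDS_PER_SLICE):
    """Group (home_addr, word) updates into slices, keeping each word's home address."""
    slices = []
    for i in range(0, len(updates), words_per_slice):
        chunk = updates[i:i + words_per_slice]
        slices.append({
            "home_addrs": [addr for addr, _ in chunk],
            "words": [word for _, word in chunk],
        })
    return slices


# Ten word-sized updates scattered over ten different 64-byte cache lines.
updates = [(0x1000 + 64 * i, i) for i in range(10)]
slices = pack_updates(updates)
print(len(updates), len(slices))  # 10 2
```

In this toy setting, writing each dirty cache line separately would cost ten line writes, while the packed form needs two slice writes.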
12
Ensuring Persistence Ordering in the Memory Controller
[Diagram labels: Processor Cache (store, load); Memory Controller (Mapping Table, OOP Data Buffer); NVM (Home Region, OOP Region)]
Once the data packing for a memory slice is done
Upon the end of a transaction (e.g., Tx_end)
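The two events named on this slide can be read as the points at which buffered updates leave the OOP data buffer. The sketch below models that reading; it is an assumption about the slide's intent, and `ToyOOPBuffer`, its methods, and the slice capacity are hypothetical names and values chosen for illustration.

```python
class ToyOOPBuffer:
    """Hypothetical model of a data buffer that persists packed slices."""

    def __init__(self, words_per_slice=8):
        self.words_per_slice = words_per_slice  # assumed slice capacity
        self.pending = []                       # updates not yet persisted
        self.oop_region = []                    # persisted slices, in order

    def _flush(self):
        if self.pending:
            self.oop_region.append(list(self.pending))
            self.pending.clear()

    def store(self, home_addr, word):
        self.pending.append((home_addr, word))
        if len(self.pending) == self.words_per_slice:
            self._flush()   # event 1: a memory slice is fully packed

    def tx_end(self):
        self._flush()       # event 2: the transaction ends


buf = ToyOOPBuffer()
for i in range(10):
    buf.store(0x1000 + 8 * i, i)
buf.tx_end()
print([len(s) for s in buf.oop_region])  # [8, 2]
```

Because the software only signals the end of the transaction, it does not have to order individual stores itself in this model.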
13
Efficient Garbage Collection for Improved Memory Utilization
[Diagram labels: Processor Cache (store, load); Memory Controller (Mapping Table, OOP Data Buffer); NVM (Home Region, OOP Region); GC; OOP Block Head; Eviction Buffer; Linked Memory Slices]
Load stale data during GC
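One way garbage collection can reduce write traffic is by coalescing: if the OOP region holds several versions of the same home address, only the newest one needs to be written back. The sketch below illustrates that idea under stated assumptions; `collect` and the flat `oop_log` list are hypothetical simplifications and do not model block heads, linked memory slices, or the eviction buffer.

```python
def collect(oop_log, home):
    """Write back only the newest version of each home address.

    Returns the number of stale versions that never had to be written back.
    """
    latest = {}
    for home_addr, value in oop_log:      # log is ordered oldest -> newest
        latest[home_addr] = value         # later entries overwrite stale ones
    for home_addr, value in latest.items():
        home[home_addr] = value           # one write-back per distinct address
    return len(oop_log) - len(latest)


home = {0x10: 0, 0x18: 0}
oop_log = [(0x10, 1), (0x18, 5), (0x10, 2), (0x10, 3)]
skipped = collect(oop_log, home)
print(home[0x10], home[0x18], skipped)  # 3 5 2
```

Here four logged updates turn into two home-region writes, since two of the versions of address 0x10 were already stale.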
14
Handling Crash Consistency Upon Failures
[Diagram labels: Processor Cache (store, load); Memory Controller (Mapping Table, OOP Data Buffer); NVM (Home Region, OOP Region); OOP Block Head; Eviction Buffer]
15
Put It All Together
[Diagram labels: core, L1 Cache, miss, Last-Level Cache, store, load, Memory Controller, Mapping Table, OOP Data Buffer, Eviction Buffer, NVM, Home Region, OOP Region]
16
HOOP Implementation
Processor Simulator: McSimA+ (OoO cores, 2.5GHz, 32KB L1, 256KB L2, 2MB LLC)
NVM Simulator: Read/Write = 50/150ns, 512GB
Evaluation Benchmarks
Synthetic Workloads: Vector, HashMap, Queue, RB-Tree, B-Tree
Real-world Workloads: YCSB, TPCC
17
Improving Transaction Throughput with HOOP
[Figure: Normalized Speedup on Vector, Queue, RBTree, Btree, HashMap, YCSB, TPCC; series: Optimized Redo, Optimized Undo, Optimized Shadow Paging, Log-Structured NVM, Logless Atomic Durability, HOOP, Ideal]
HOOP is close to the performance of a system without any persistence enforcement.
18
Reducing Critical-Path Latency with HOOP
[Figure: Normalized Latency on Vector, Queue, RBTree, Btree, HashMap, YCSB, TPCC; series: Ideal, Optimized Redo, Optimized Undo, Optimized Shadow Paging, Log-Structured NVM, Logless Atomic Durability, HOOP]
HOOP achieves the lowest latency, compared to state-of-the-art approaches.
19
Reducing Write Traffic with HOOP
[Figure: Normalized Write Traffic on Vector, Queue, RBTree, Btree, HashMap, YCSB, TPCC; series: Ideal, Optimized Redo, Optimized Undo, Optimized Shadow Paging, Log-Structured NVM, Logless Atomic Durability, HOOP]
HOOP reduces write traffic by up to 2.1x, compared to logging approaches.
20
HOOP Summary
1.7x Performance Speedup for Data-Intensive Apps
2.1x Reduction of Write Amplification
Thanks!
University of Illinois at Urbana-Champaign