SLIDE 1

HOOP: Efficient Hardware-Assisted Out-of-Place Update for Non-Volatile Memory

Miao Cai†, Chance Coats, Jian Huang

Systems Platform Research Group

SLIDE 2

Non-Volatile Memory is a Revolutionary Technology

New and emerging NVMs offer promising properties and are becoming popular

  • Close-to-DRAM performance
  • Data durability
  • Byte addressability

SLIDE 3

Memory Persistency Challenge: A Well-Known Problem

Ensuring memory persistency with commodity architecture is challenging!

  • Performance vs. persistency
  • Out-of-order execution
  • Volatile processor cache

SLIDE 4

State-of-the-Art Approach: Redo/Undo Logging

[Figure: undo logging and redo logging]

Undo/redo logging causes DOUBLE WRITES on the critical path.
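
To make the double-write cost concrete, here is a minimal Python sketch of undo logging over a toy memory that counts writes. The `ToyNVM` class, the `undo_logged_update` helper, and the log layout are illustrative assumptions, not part of the talk; real systems also need cache flushes and fences, which are omitted here.

```python
class ToyNVM:
    """Toy persistent memory that counts every word written to it (hypothetical)."""

    def __init__(self):
        self.cells = {}
        self.writes = 0

    def write(self, addr, value):
        self.cells[addr] = value
        self.writes += 1


def undo_logged_update(nvm, log_base, updates):
    """Apply updates with undo logging: persist the old value, then the new one."""
    for i, (addr, new_value) in enumerate(updates):
        old_value = nvm.cells.get(addr, 0)
        nvm.write(log_base + i, (addr, old_value))  # first write: undo log entry
        nvm.write(addr, new_value)                  # second write: in-place data


nvm = ToyNVM()
undo_logged_update(nvm, log_base=1000, updates=[(0, 7), (1, 8), (2, 9)])
print(nvm.writes)  # 6 writes for 3 updated words
```

Each updated word costs one log write plus one data write, which is the doubling the slide refers to.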

SLIDE 5

State-of-the-Art Approach: Shadow Paging

[Figure: page copy]

Optimized shadow paging still suffers from FREQUENT DATA FLUSHES.

SLIDE 6

State-of-the-Art Approach: Log-structured NVM

[Figure: log, index]

Software-based LSNVM suffers from LONG ACCESS LATENCY.

SLIDE 7

A Summary of State-of-the-Art Approaches

  • Logging
  • Shadow paging
  • Log-structured NVM

Memory persistency overheads: double writes, frequent flushes, long critical-path latency

SLIDE 8

Our Approach: Hardware-assisted Out-Of-Place (HOOP) Update

  • Reduced write traffic with data coalescing and packing
  • No requirement on persistence ordering
  • Transparent support of atomic data durability

SLIDE 9

Challenges of Supporting Out-Of-Place Update

  • Lightweight indirection layer
  • Limited resources in the memory controller
  • Efficient garbage collection

SLIDE 10

Address Remapping for Supporting Out-of-Place Update

[Figure labels: processor cache, store, load, memory controller, mapping table, NVM, home region, OOP region, GC]

Physical-to-physical address mapping

Insert mapping entry

  • Upon a write to the OOP region

Delete mapping entry

  • Data migration from OOP to home (GC)
  • Upon a read from the OOP region
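
The remapping flow on this slide can be sketched as a small Python model. It is only an illustration under stated assumptions: the class name `RemappingController`, the dictionary-based regions, and the sequential OOP address allocation are hypothetical, and hardware details such as table size, entry format, and entry deletion on reads are not modeled.

```python
class RemappingController:
    """Toy model of a physical-to-physical mapping table (hypothetical names)."""

    def __init__(self):
        self.home = {}      # home region: home address -> data
        self.oop = {}       # OOP region: OOP address -> data
        self.mapping = {}   # mapping table: home address -> OOP address
        self.next_oop = 0   # assumed: OOP space is allocated sequentially

    def store(self, home_addr, data):
        """Write out of place and insert (or overwrite) the mapping entry."""
        oop_addr = self.next_oop
        self.next_oop += 1
        self.oop[oop_addr] = data
        self.mapping[home_addr] = oop_addr

    def load(self, home_addr):
        """Read the newest version: OOP region on a table hit, else home region."""
        if home_addr in self.mapping:
            return self.oop[self.mapping[home_addr]]
        return self.home.get(home_addr)

    def gc(self):
        """Migrate the newest data to the home region and delete the entries."""
        for home_addr, oop_addr in list(self.mapping.items()):
            self.home[home_addr] = self.oop[oop_addr]
            del self.mapping[home_addr]
        self.oop.clear()


mc = RemappingController()
mc.store(0x10, "a")
mc.store(0x10, "b")      # the home region is still untouched here
before_gc = mc.load(0x10)
mc.gc()
after_gc = mc.load(0x10)
```

In this sketch a load always returns the most recent store, whether or not GC has already migrated the data home.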

SLIDE 11

Data Packing in the Memory Controller for Improved Performance

Many applications update data at a fine granularity

[Figure labels: processor cache, store, load, memory controller, mapping table, OOP data buffer, home address, NVM, home region, OOP region, OOP block head]
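
A rough Python sketch of the packing idea: several fine-grained updates are grouped together with their home addresses into one unit so they can be written to the OOP region together. The function name `pack_updates`, the slice layout, and the default of four words per slice are assumptions for illustration; the slides do not give the actual slice format.

```python
def pack_updates(updates, words_per_slice=4):
    """Group (home_addr, word) updates into fixed-capacity slices.

    Each slice keeps the home addresses next to the packed words so the
    data can later be migrated back to where it belongs.
    """
    slices = []
    for start in range(0, len(updates), words_per_slice):
        chunk = updates[start:start + words_per_slice]
        slices.append({
            "home_addrs": [addr for addr, _ in chunk],
            "words": [word for _, word in chunk],
        })
    return slices


# Six word-sized updates scattered over different home addresses.
updates = [(0x100, 1), (0x240, 2), (0x108, 3), (0x900, 4), (0x110, 5), (0x248, 6)]
packed = pack_updates(updates)
print(len(packed))  # 2 slices instead of 6 separate writes
```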

SLIDE 12

Ensuring Persistence Ordering in the Memory Controller

[Figure labels: processor cache, store, load, memory controller, mapping table, OOP data buffer, NVM, home region, OOP region]

  • When data packing for a memory slice is done
  • Upon the end of a transaction (e.g., Tx_end)

SLIDE 13

Efficient Garbage Collection for Improved Memory Utilization

[Figure labels: processor cache, store, load, memory controller, mapping table, OOP data buffer, eviction buffer, NVM, home region, OOP region, OOP block head, linked memory slices, GC]

Load stale data during GC
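
One plausible way to picture GC with coalescing, written as a Python sketch: scan the slices from newest to oldest, keep only the latest version of each home address, and write each address back to the home region once. The newest-first scan order, the `collect` function, and the list-based slice representation are assumptions, not details taken from the slides.

```python
def collect(slices, home):
    """Migrate the newest version of each word to the home region.

    slices: list ordered oldest to newest; each slice is a list of
            (home_addr, word) pairs in the order they were written.
    home:   dict standing in for the home region.
    Returns the number of home-region writes performed.
    """
    latest = {}
    for memory_slice in reversed(slices):            # newest slice first
        for addr, word in reversed(memory_slice):    # newest entry first
            latest.setdefault(addr, word)            # older versions are skipped
    for addr, word in latest.items():
        home[addr] = word
    return len(latest)


home_region = {}
slices = [[(0, "a"), (1, "b")], [(0, "c")], [(1, "d"), (0, "e")]]
home_writes = collect(slices, home_region)
print(home_writes)  # 2 home writes for 5 logged updates
```

Five out-of-place updates to two addresses collapse into two home-region writes in this toy example.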

SLIDE 14

Handling Crash Consistency Upon Failures

[Figure labels: processor cache, store, load, memory controller, mapping table, OOP data buffer, eviction buffer, NVM, home region, OOP region, OOP block head]

SLIDE 15

Put It All Together

[Figure labels: core, L1 cache, miss, last-level cache, store, load, memory controller, mapping table, OOP data buffer, eviction buffer, NVM, home region, OOP region]

SLIDE 16

HOOP Implementation

  • Processor simulator: McSimA+, OoO cores, 2.5GHz, 32KB L1, 256KB L2, 2MB LLC
  • NVM simulator: read/write = 50/150ns, 512GB

Evaluation Benchmarks

  • Synthetic workloads: Vector, HashMap, Queue, RB-Tree, B-Tree
  • Real-world workloads: YCSB, TPCC

SLIDE 17

Improving Transaction Throughput with HOOP

[Figure: normalized speedup on Vector, Queue, RBTree, Btree, HashMap, YCSB, and TPCC for Optimized Redo, Optimized Undo, Optimized Shadow Paging, Log-Structured NVM, Logless Atomic Durability, HOOP, and Ideal]

HOOP is close to the performance of a system without any persistence enforcement.

SLIDE 18

Reducing Critical-Path Latency with HOOP

[Figure: normalized latency on Vector, Queue, RBTree, Btree, HashMap, YCSB, and TPCC for Ideal, Optimized Redo, Optimized Undo, Optimized Shadow Paging, Log-Structured NVM, Logless Atomic Durability, and HOOP]

HOOP achieves the lowest latency, compared to state-of-the-art approaches.

SLIDE 19

Reducing Write Traffic with HOOP

[Figure: normalized write traffic on Vector, Queue, RBTree, Btree, HashMap, YCSB, and TPCC for Ideal, Optimized Redo, Optimized Undo, Optimized Shadow Paging, Log-Structured NVM, Logless Atomic Durability, and HOOP]

HOOP reduces write traffic by up to 2.1x, compared to logging approaches.

SLIDE 20

HOOP Summary

  • 1.7x performance speedup for data-intensive apps
  • 2.1x reduction of write amplification

SLIDE 21

Thanks!

University of Illinois at Urbana-Champaign

Miao Cai, Chance Coats, Jian Huang

Systems Platform Research Group