SLIDE 1

Lazy Persistency: a High-Performing and Write-Efficient Software Persistency Technique

Mohammad Alshboul, James Tuck, and Yan Solihin
Email: maalshbo@ncsu.edu
ARPERS Research Group

SLIDE 2

Introduction

  • Future systems will likely include Non-Volatile Main Memory (NVMM)
  • NVMM can host data persistently across crashes and reboots
  • Crash-consistent data requires persistency models, which define when stores reach NVMM (i.e., become durable)
    – E.g., Intel PMEM instructions: CLFLUSH, CLFLUSHOPT, CLWB, SFENCE


SLIDES 3–4

[Figure, animated: a processor (P) with a cache and a disk, annotated with the disk access delay; NVMM is then added to the hierarchy]

SLIDE 5

[Figure: a processor (P) with a cache; NVMM and the disk are shown with their respective delays]

  • CLFLUSHOPT flushes a cache block to NVMM
  • SFENCE orders CLFLUSHOPT with other stores

We refer to this class of persistency models as Eager Persistency.
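As a concrete illustration (a minimal C sketch, not code from the paper or the slides), an eagerly persisted update flushes each written line and then fences; the arrays and the function name are assumptions:

    #include <immintrin.h>   /* _mm_clflushopt, _mm_sfence; compile with -mclflushopt */

    /* Eager Persistency sketch: explicitly flush every updated cache line
     * to NVMM, then fence so the flushes are ordered before later stores.
     * Arrays a and b are assumed to reside in NVMM. */
    void eager_update(double *a, double *b, int i, double va, double vb)
    {
        a[i] = va;
        b[i] = vb;
        _mm_clflushopt(&a[i]);   /* write back the cache line holding a[i] */
        _mm_clflushopt(&b[i]);   /* write back the cache line holding b[i] */
        _mm_sfence();            /* order the flushes before subsequent stores */
    }

Every persisted region pays this flush-and-fence cost on the critical path, which is exactly the overhead Lazy Persistency targets.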


SLIDE 6

Our Solution: Lazy Persistency


  • Principle: Make the Common Case Fast
  • A software technique
  • Code is broken into Lazy Persistency (LP) regions
    – Each LP region is protected by a checksum
    – The checksum enables detecting persistency failures after a crash
    – On recovery, failed regions are re-executed
  • Lazily relies on natural cache evictions
    – No persist barriers (CLFLUSHOPT, SFENCE) needed


SLIDE 7

Lazy Persistency Details

[Figure: the CPU issues stores ST A1, ST B1, …, ST A4, ST B4, grouped into LP regions; each region's values are summed (+) and stored as a checksum (ST CHK1–ST CHK4)]

  • Programs are divided into associative LP regions
  • Programmers choose the LP region granularity
  • A checksum covers the updates in an LP region
    – It is stored at the end of the LP region



SLIDES 9–14

Lazy Persistency Details

[Figure, animated: the code of one LP region, with the checksum operations highlighted step by step]

  ➔ Initialize the checksum at the beginning of the region
  ➔ Update the checksum during each iteration in the region
  ➔ Store the checksum to the corresponding location
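These three steps map directly onto code. Below is a minimal C sketch of LP regions in a loop, assuming one iteration per region, a simple additive checksum, and hypothetical work functions compute_a/compute_b (the paper lets programmers pick the region granularity and checksum function). Note that no CLFLUSHOPT or SFENCE appears:

    double compute_a(int r);   /* hypothetical per-region work, defined elsewhere */
    double compute_b(int r);

    /* Lazy Persistency sketch: a, b, and chk are assumed to reside in NVMM.
     * Persistence happens through natural cache evictions, not flushes. */
    void lp_compute(double *a, double *b, double *chk, int nregions)
    {
        for (int r = 0; r < nregions; r++) {
            double sum = 0.0;        /* 1. initialize the checksum at region start */

            a[r] = compute_a(r);
            sum += a[r];             /* 2. fold each update into the checksum */

            b[r] = compute_b(r);
            sum += b[r];

            chk[r] = sum;            /* 3. store the checksum at region end */
        }
    }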

SLIDES 15–19

[Figure, animated: the CPU executes ST A1, ST B1, ST CHK1, …, ST A4, ST B4, ST CHK4; the written values (A1, B1, CHK1, …) land in the cache and reach NVMM through natural evictions]

SLIDES 20–21

[Figure, continued: individual lines such as B2 are evicted from the cache to NVMM over time, while others remain dirty in the cache]
slide-22
SLIDE 22

Recovering From a Crash


  • On a crash, checksums are validated to detect regions that were not persisted
  • Failed regions are recomputed
  • Finally, the program resumes execution in normal mode
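Continuing the sketch above (same assumed layout and additive checksum), recovery could look like the following; the exact floating-point comparison is safe here only because the recomputation is bit-identical, which is an assumption of this sketch:

    /* Crash-recovery sketch: re-execute any region whose stored checksum
     * does not match a checksum recomputed from the data found in NVMM. */
    void lp_recover(double *a, double *b, double *chk, int nregions)
    {
        for (int r = 0; r < nregions; r++) {
            if (chk[r] != a[r] + b[r]) {  /* mismatch: region did not fully persist */
                a[r] = compute_a(r);      /* re-execute the failed region */
                b[r] = compute_b(r);
                chk[r] = a[r] + b[r];     /* re-store its checksum */
            }
        }
        /* all regions now consistent: resume execution in normal mode */
    }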


SLIDES 23–25

[Figure, animated: the memory hierarchy again (P, cache, NVMM, and disk, each with its delay); under Lazy Persistency the processor simply creates the checksum in the cache]

SLIDE 26

Limitations of Lazy Persistency

  • LP regions need to be associative, i.e., (R1 ∘ R2) ∘ R3 = R1 ∘ (R2 ∘ R3)
    – Most HPC kernels contain loop iterations that satisfy this requirement
    – Can be relaxed in some situations (see the paper)
  • Recovery code is needed for LP regions
    – Solution: prior work can be exploited [PACT'17]
  • The amount of recovery may be unbounded (e.g., due to hot blocks)
    – Solution: Periodic Flushes (next slide)


SLIDE 27

Bounding the Amount of Recovery

  • Cache blocks may stay in the cache for a long time (e.g., hot blocks)
    – This gets worse as caches grow larger
  • Regions containing such blocks may fail to persist
  • An upper bound is needed on how long a block may remain dirty in the cache
  • Such a bound is needed to guarantee forward progress


SLIDE 28


Solution: Periodic Flushes

  • A simple hardware support
  • All dirty blocks in the cache are written back periodically, in the background
  • Modest increase in the number of writes (see the paper for details)
  • The periodic flush interval puts an upper bound on recovery work
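The paper proposes this in hardware. As a rough software analogue only (not the paper's mechanism), one could periodically write back an NVMM-resident range with CLWB, which similarly bounds how long a line stays dirty at the cost of extra writes:

    #include <stddef.h>
    #include <immintrin.h>   /* _mm_clwb, _mm_sfence; compile with -mclwb */

    #define CACHE_LINE 64

    /* Write back every cache line of an NVMM-resident range. Invoking this
     * periodically (e.g., from a timer) bounds how long any block can remain
     * dirty in the cache, and hence bounds recovery work after a crash. */
    void writeback_range(const void *base, size_t len)
    {
        const char *p = (const char *)base;
        for (size_t off = 0; off < len; off += CACHE_LINE)
            _mm_clwb(p + off);   /* write back without invalidating */
        _mm_sfence();
    }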


SLIDE 29


Methodology

  • Simulations on a modified version of gem5 that supports most Intel PMEM instructions (e.g., CLFLUSHOPT)
  • Detailed out-of-order CPU model with the Ruby memory system; 8 threads is the default for all experiments
  • Evaluation was also done on a 32-core DRAM-based real hardware machine



SLIDE 30

Evaluation


Multi-Threaded Benchmarks

  • Tiled Matrix Multiplication (TMM)
  • Cholesky Factorization
  • 2D Convolution
  • Fast Fourier Transform (FFT)
  • Gaussian Elimination


SLIDES 31–32

Evaluation: All Benchmarks

[Figure: (a) execution time overhead and (b) number-of-writes overhead, normalized to the baseline, for Eager Persistency vs. Lazy Persistency on TMM, Cholesky, 2D-conv, Gauss, FFT, and gmean]

On average (gmean), Eager Persistency incurs 9% execution time overhead and 21% write overhead, versus only 1% and 3% for Lazy Persistency.

SLIDE 33

More Evaluations


We performed other interesting evaluations that can be found in the paper:

  • Sensitivity study varying the NVMM read/write latency
  • Sensitivity study varying the number of threads
  • Execution time for all five benchmarks on real hardware
  • Sensitivity study varying the last-level cache size
  • Analysis of the number of writes under the Periodic Flushes hardware support
  • Execution time overhead with different error detection mechanisms


SLIDE 34

Summary


  • Lazy Persistency is a software persistency technique that relies on natural cache evictions (no stalls on SFENCE)
  • It reduces the execution time and write amplification overheads from 9% and 21% to only 1% and 3%, respectively
  • Simple hardware support can provide an upper bound on the recovery work


SLIDE 35

Questions?
