A Framework for Emulating Non-Volatile Memory Systems with Different Performance Characteristics



SLIDE 1

Dipanjan Sengupta1,2, Qi Wang1,3, Haris Volos1, Lucy Cherkasova1, Guilherme Magalhaes1, Jun Li1, Karsten Schwan2

1Hewlett-Packard Labs  2Georgia Institute of Technology  3The George Washington University


A Framework for Emulating Non-Volatile Memory Systems with Different Performance Characteristics

SLIDE 2

New HP Labs Project “The Machine” aims to create a new computing architecture

• Shift to non-volatile, byte-addressable memory (e.g., Phase-Change Memory and Memristor)
• NVM is not commercially available yet
• Future NVM technologies will offer a variety of different performance characteristics (2-10x slower than DRAM)
• It is extremely difficult to predict and optimize the performance of complex applications on future hardware:

Which ranges of latencies and bandwidth are critical for achieving good performance and scalability of different application groups?


SLIDE 3

Future machines will have both DRAM and NVM

There are many open questions:

• Shall we consider DRAM as a caching layer for NVM?
• Shall we build systems with two types of memory: fast DRAM and slower NVM?
• What are the strategies for efficient data placement?

SLIDE 4

Goal: Build a performance emulator for NVM using commodity hardware (DRAM)

Two performance knobs for NVM emulation:

• bandwidth
• latency

Additional challenge and goal:

Validation experiments to check correctness and accuracy of the emulation platform


SLIDE 5

Throttling memory bandwidth using a hardware-based approach

• Thermal Control Registers in recent Intel-based processors
• Separate knobs for controlling memory read and write bandwidth


SLIDE 6

Commodity hardware doesn't provide a hardware mechanism to control memory latency

Challenges for software-based latency emulation:

• Cannot instrument every memory reference from the application due to high overhead
• Not all memory references access main memory

They might be served by the private caches or the shared last-level cache

Memory level parallelism in modern processors


SLIDE 7

[Figure: a stream of memory accesses divided into Epoch1 and Epoch2, with software delays Δ1 and Δ2 injected at the epoch boundaries]

Software-based approach:

• Modeling the average application-perceived memory latency to be close to the target NVM latency
• Injecting software-created delays at epoch granularity
• Performance-counter-based memory model
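The epoch loop above can be sketched as follows. This is a minimal, self-contained sketch, not the deck's implementation: the performance-counter read is simulated by a precomputed list of per-epoch miss counts (a real prototype would read hardware counters), and the delay model is passed in as a callback.

```python
import time

def run_epochs(misses_per_epoch, model, inject=time.sleep):
    """Epoch loop: after each epoch, read the memory-reference
    counter and inject the delay predicted by the memory model."""
    total_ns = 0
    for misses in misses_per_epoch:
        d = model(misses)      # extra delay for this epoch, in ns
        total_ns += d
        inject(d / 1e9)        # delay-injection stub (seconds)
    return total_ns

# Example: three epochs of simulated counter readings, with a
# trivial model charging 200 ns of extra latency per reference
total = run_epochs([1000, 0, 500], model=lambda m: m * 200,
                   inject=lambda s: None)
```

Passing `inject=lambda s: None` in the example skips the actual sleeping so the sketch only tallies the modeled delay.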


SLIDE 8

A very simple memory model computes the additional delay Δi for a given epoch i as:

Δi = Mi × (L_NVM − L_DRAM)

where Mi is the number of memory references in epoch i, and L_NVM and L_DRAM are the target NVM latency and the underlying DRAM latency.
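In code, the simple model is a one-liner; the latency values below are illustrative assumptions, not numbers from the slides:

```python
DRAM_LAT_NS = 100   # underlying DRAM latency (assumed value)
NVM_LAT_NS = 300    # target emulated NVM latency (assumed value)

def epoch_delay_ns(m_i, nvm_ns=NVM_LAT_NS, dram_ns=DRAM_LAT_NS):
    """Delta_i = M_i * (L_NVM - L_DRAM): every main-memory
    reference in epoch i should appear slower by the latency gap."""
    return m_i * (nvm_ns - dram_ns)

# 1000 references in an epoch with a 200 ns gap -> 200 us of delay
```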


SLIDE 9


Modern processors take advantage of memory-level parallelism (MLP)

The latency emulation model needs to be MLP-aware
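One plausible MLP-aware refinement (an assumption of this sketch; the deck does not give its exact formula) is to divide the per-epoch delay by the measured parallelism, on the reasoning that overlapping misses hide each other's latency:

```python
def mlp_aware_delay_ns(m_i, mlp, nvm_ns=300, dram_ns=100):
    """With MLP outstanding misses overlapping, roughly M_i / MLP
    'serialized' misses contribute full latency to the critical path."""
    mlp = max(mlp, 1.0)            # guard against degenerate counts
    return (m_i / mlp) * (nvm_ns - dram_ns)

# An MLP of 4 cuts the injected delay to a quarter of the naive model
```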

SLIDE 10

We have implemented this model and prototyped our emulator for two popular Intel processor families:

• Sandy Bridge
• Ivy Bridge


SLIDE 11

How to evaluate correctness and accuracy of the proposed model and its implementation?

• Exploit the fact that memory access latencies differ across NUMA domains in a multi-socket machine:

  • Access latency to local and remote memory:
  • Ivy Bridge: 87 ns and 176 ns
  • Sandy Bridge: 97 ns and 162 ns

[Figure: two-socket machine; each node has its own DRAM and CPUs, connected over QPI (local node vs. remote node)]
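The validation idea can be made concrete with the numbers from this slide: to make local DRAM behave like remote DRAM, the model must add the local/remote latency gap per serialized access. A small sketch:

```python
# Local and remote access latencies from the slide, in ns
NUMA_LAT_NS = {
    "ivy_bridge":   {"local": 87, "remote": 176},
    "sandy_bridge": {"local": 97, "remote": 162},
}

def per_access_delta_ns(cpu):
    """Extra delay per memory access needed so that local DRAM
    emulates the remote NUMA node's latency."""
    lat = NUMA_LAT_NS[cpu]
    return lat["remote"] - lat["local"]

# Ivy Bridge: 176 - 87 = 89 ns of injected delay per access
```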

SLIDE 12


[Figure: EMULATION — the application runs on the local node's DRAM with injected delays emulating remote-DRAM latency; VALIDATION — the same application executes on remote DRAM across the QPI link. Each node has its own DRAM and CPUs (local node vs. remote node).]

SLIDE 13


LLC misses per 1000 instructions signify the memory intensity of an application

Average emulation error across various memory- and compute-intensive SPEC benchmarks is 1.8%

SLIDE 14
• This work: proof of concept for single-threaded applications
• Combination of hardware- and software-based online approaches for NVM emulation
• Ongoing work: emulation support for multi-threaded applications
• Support for NVM and DRAM using the same platform


SLIDE 15

Thank you!

Questions?
