SLIDE 1

NATIVE OS SUPPORT FOR PERSISTENT MEMORY WITH REGIONS

Mohammad Chowdhury (mchow017@fiu.edu)
Raju Rangaswami (raju@cs.fiu.edu)
Florida International University

SLIDE 2

PERSISTENT MEMORY (PM)

Hybrid characteristics of memory and storage

Memory

  • Volatile
  • Byte-addressable access
  • Fast

Storage

  • Non-volatile/Persistent
  • Block I/O access
  • Slow

Persistent Memory

  • Non-Volatile/Persistent
  • Byte-addressable access
  • Fast

Read/Write latency: 4X-10X of memory
SLIDE 3

PM CHALLENGES

PM is directly accessible by the CPU, BUT …

  • Caches and the memory controller sit between the CPU and PM
  • Caches write back dirty cache lines to DRAM/PM according to the cache eviction policy
  • The memory controller optimizes performance by reordering updates

PM-resident data can be corrupted after a system failure if the ordering of updates is violated.

SLIDE 4

PM CHALLENGES: THE COSTS OF ORDERING

  • Ordering requires cache line flushes, barriers, and ADR (Asynchronous DRAM Refresh); see the sketch below
  • Increased cost of operations
  • More redundant metadata → more ordering required
  • GOAL: reduce ordering requirements
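To make that cost concrete, here is a minimal, hedged sketch (not code from the talk) of how a single PM update is typically ordered on x86: each dirty cache line covering the update is written back with CLWB, and an SFENCE prevents later stores from being reordered ahead of the write-backs; ADR then covers data that the memory controller has already accepted. The persist_range() helper, the 64-byte line size, and the choice of CLWB are illustrative assumptions.

    /* Minimal sketch (not from the talk): making a PM write durable and
     * ordered on x86, assuming CLWB and SFENCE are available.
     * persist_range() and the 64-byte line size are illustrative assumptions. */
    #include <immintrin.h>
    #include <stdint.h>
    #include <stddef.h>

    #define CACHE_LINE 64   /* assumed cache line size */

    /* Write back every cache line covering [addr, addr+len) and fence so
     * that later stores cannot be reordered before these write-backs. */
    static void persist_range(const void *addr, size_t len)
    {
        uintptr_t p   = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
        uintptr_t end = (uintptr_t)addr + len;

        for (; p < end; p += CACHE_LINE)
            _mm_clwb((void *)p);   /* write the dirty line back toward PM (ADR domain) */
        _mm_sfence();              /* order: write-backs complete before later stores */
    }

Every ordering point in a PM data structure pays for at least one such flush-and-fence sequence, which is why reducing the number of ordering points is the stated goal.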
SLIDE 5

PM CHALLENGES: ATOMIC DATA DURABILITY

[Figure: Processes P1, P2, and P3 update a memory-mapped PM page over time (t0-t3) with values 2, 4, and 8; after a failure around P3's msync, PM can hold an intermediate value ("4" shouldn't be here) instead of the final version.]

Requirements:

  • 1. Make data atomically durable (ALL or NONE)
  • 2. Revert back to the initial state in case of failure
SLIDE 6

PM OPPORTUNITIES: SHARED CONSISTENCY

DAX/Regular MMAP with MAP_SHARED (P1 and P2 map PM directly; a small mmap sketch follows below):

  • Updates immediately reflected in all mapping processes' address spaces (cache-coherent visibility)
  • NOT atomically durable

NOVA Atomic MMAP (MAP_ATOMIC → MAP_PRIVATE; each process works on a private copy):

  • Updates only visible to the process
  • Atomically durable
  • Forfeits sharing/cache-coherency support

Requirements:

  • 1. Updates should be visible to all the sharing processes
  • 2. Should support atomic durability of all updates across a shared region
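For context, the MAP_SHARED side of this trade-off looks like the following hedged sketch using the ordinary POSIX mmap/msync interface on a DAX-mounted file; the mount point and file name are assumptions. Stores become visible to every mapping process immediately, but msync() offers no all-or-nothing durability guarantee.

    /* Sketch: MAP_SHARED mapping of a file on a DAX file system.
     * Assumptions: /mnt/pmem is DAX-mounted and shared.dat already exists
     * with at least MAP_LEN bytes. */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MAP_LEN 4096

    int main(void)
    {
        int fd = open("/mnt/pmem/shared.dat", O_RDWR);
        if (fd < 0)
            return 1;

        char *buf = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (buf == MAP_FAILED)
            return 1;

        strcpy(buf, "visible to all mapping processes immediately");  /* cache-coherent sharing */
        msync(buf, MAP_LEN, MS_SYNC);                                  /* durable, but not atomic */

        munmap(buf, MAP_LEN);
        close(fd);
        return 0;
    }

The Region System's requirement is to keep exactly this sharing behavior while also making each sync atomically durable.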

SLIDE 7

PM OPPORTUNITIES: SIMPLE MEMORY-LIKE TRANSACTIONS

Program A (object-granular transactions):

    Allocate persistent Obj1;
    Allocate persistent Obj2;
    Begin Transaction
        Obj1 operations
    End Transaction
    Begin Transaction
        Obj2 operations
    End Transaction

Program B (memory-like transactions):

    A = mmap(PM);
    Allocate objects Obj1, Obj2 from mapped area
    Operations involving Obj1, Obj2
    Sync()
    More operations on both Obj1, Obj2
    Sync()

With Program A, programmers:

  • 1. Must track all updates to persistent objects
  • 2. Must annotate individual transactions

With Program B, programmers simply call Sync() to persist all updates in a mapped area.

SLIDE 8

APPLICATION REQUIREMENTS FOR PM

A PM-based application requires:

  • Persistent Namespace
  • Consistent Sharing Support
  • Mapped Data Consistency
  • Arbitrary & Unordered Allocation
  • Simple Memory-Like Transactions

SLIDE 9

CONTEMPORARY SOLUTIONS

  • DAX File Systems: NOVA, EXT4-DAX, PMFS
  • Regular File Systems: EXT4, BTRFS, etc.
  • Atomic Msync: Failure-Atomic Msync (EXT4-JBD)
  • Persistent Heaps: Mnemosyne, NV-Heaps, LibpmemObj
  • Replication: Mojim (RDMA)
  • Memory Subsystem (OS)

SLIDE 10

CONTEMPORARY SOLUTIONS

[Figure: Requirements coverage matrix comparing DAX file systems, regular file systems, atomic msync, persistent heaps, replication, and the memory subsystem against the application requirements; existing solutions cover only a subset (some offer only partial mapped data consistency), while the proposed Region System targets persistent namespace, arbitrary and unordered allocation, mapped data consistency, consistent sharing support, and simple memory-like transactions.]

SLIDE 11

REGION SYSTEM

We present the "Region System", a kernel subsystem for persistent memory, built to achieve the following goals:

  • Minimize unwanted latency in the persistent memory access path;
  • Provide users with direct and consistent access to shared persistent memory; and
  • Demonstrate modifications of existing applications for optimized usage.

SLIDE 12

REDEFINED OS MEMORY/STORAGE STACK

The Region System (RS) is NOT intended as a replacement for file systems or the memory subsystem. RS should serve as a core "Persistent Memory Support System" usable by applications, file systems, and other kernel subsystems.

SLIDE 13

ARCHITECTURE

Region: a collection of persistent pages
Ppages: 4KB PM pages

SLIDE 14

CONSISTENCY STATES

Current      Snapshot     State
(no ppage)   y            Invalid: there cannot be a snapshot without a current ppage
x            (no ppage)   Un-synced page, mapped into the address space
x            y            x == y: page in synced state
                          x != y: page in un-synced state; "y" is the consistent version
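The table above can be read as a small state machine over the (current, snapshot) pair of each mapped page. The following sketch is only an illustrative model of that state machine, not the Region System's kernel code; the NO_PPAGE sentinel and the struct layout are assumptions.

    /* Illustrative model of the per-page consistency states in the table
     * above. NO_PPAGE and struct pm_entry are assumptions, not kernel code. */
    #include <stdio.h>

    #define NO_PPAGE 0   /* assumed sentinel for "no ppage" */

    struct pm_entry { unsigned long current, snapshot; };

    static const char *state(const struct pm_entry *e)
    {
        if (e->current == NO_PPAGE)
            return "invalid (snapshot without current)";
        if (e->snapshot == NO_PPAGE)
            return "un-synced (mapped, never synced)";
        return e->current == e->snapshot
            ? "synced"
            : "un-synced (snapshot holds the consistent version)";
    }

    int main(void)
    {
        struct pm_entry e = { .current = 7, .snapshot = 7 };
        printf("%s\n", state(&e));   /* prints: synced */
        return 0;
    }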

SLIDE 15

REGION SYSTEM (RS) INTERFACE

Class                   System Call
Namespace               region_d open(char *region_name, flags f)
                        int close(region_d rd)
                        int delete(region_d rd)
Allocation              ppage_no alloc_ppage(region_d rd)
                        int free_ppage(region_d rd, ppage_no ppn)
Mapping & Consistency   vaddr pmmap(vaddr va, region_d rd, ppage_no ppn, int nbytes, flags f)
                        int pmunmap(vaddr va)
                        pmsync(vaddr va)
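Based only on the interface table above, a plausible usage flow might look like the hedged sketch below. The header name and the RS_CREATE/RS_SHARED flag constants are illustrative assumptions, not part of the published API, and error handling is omitted.

    /* Hedged usage sketch built from the interface table above. The header
     * name, the RS_CREATE/RS_SHARED flags, and the omitted error checks are
     * illustrative assumptions. */
    #include <string.h>
    #include <region.h>   /* hypothetical userspace header declaring the RS calls */

    void region_example(void)
    {
        region_d rd  = open("logdata", RS_CREATE);   /* namespace: open or create a region */
        ppage_no ppn = alloc_ppage(rd);              /* allocation: one 4KB persistent page */

        /* mapping & consistency: map the ppage and update it in place */
        char *va = (char *)pmmap(NULL, rd, ppn, 4096, RS_SHARED);
        strcpy(va, "hello, persistent world");
        pmsync(va);                                  /* persist all updates in the mapped area */

        pmunmap(va);
        free_ppage(rd, ppn);
        close(rd);
    }

This mirrors the "Program B" style of slide 7: no per-object tracking or transaction annotations, just updates followed by a sync of the mapped area.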

SLIDE 16

METADATA OPERATIONS

  • Persistent operations: modify persistent metadata
  • Volatile operations: no updates to persistent metadata
  • Persistent operations are designed to achieve atomic durability

SLIDE 17

METADATA OPERATION COMPARISON

[Figure: Latency of persistent vs. volatile metadata operations; persistent operations cost 2.8x, 2.2x, 1.1x, 1.25x, and 2.3x their volatile counterparts across the measured operations.]

SLIDE 18

MAPPED DATA CONSISTENCY CHALLENGES

  • Avoiding unwanted durability
    • Applications want updates to become durable only upon a msync() invocation.
    • With direct CPU access, cache evictions can make updates durable in PM before the msync call.
    • In case of a failure, the mapped PM area will contain uncommitted data.
  • Protecting the sync
    • During a sync operation, no application should be allowed to write to the mapped PM → difficult to achieve due to direct CPU access.

SLIDE 19

ATOMIC DURABILITY WITH PMSYNC

  • 1. Identify the dirty pages
  • 2. Write-protect the pages
  • 3. Flush dirty cache lines
  • 4. Copy-on-write protection for future writes to a sync'ed page (see the sketch below)

SLIDE 20

AVOIDING COW PROPAGATION

[Figure: Copy-on-write of page 9. In a conventional CoW tree the copy propagates up the lookup path (new copies of pages 1, 2, 4, 7, 9), whereas the Region System's per-page current/snapshot (c/s) pointers confine the copy to page 9 itself.]

SLIDE 21

PMSYNC EXAMPLE

[Figure: pmsync of region A. Tasks Y and Z on CPUs 1 and 2 map entries (E2, E3, E6, ...) of region A through their page tables, TLBs/MMUs, and caches (volatile domain); in PM, rs_root links rnodes A and B, each holding current (c) and snapshot (s) pointers for ppages 1-6.]

pmsync on region A (region A locked):

  • 1. Send an IPI to the other CPU
  • 2. Write-protect dirty entry E2; wait for CPU 1
  • 3. IPI returns
  • 4. Flush the cache lines for E2
  • 5. Mark PMSYNC_IN_PROGRESS
  • 6. Update the snapshot (s) pointer
  • 7. Mark PMSYNC_COMPLETE

SLIDE 22

PMSYNC COMPARISON WITH EXT4-DAX

[Figure: Latency (μs) versus file/region size for pmsync compared with EXT4-DAX.]

SLIDE 23

LIBPMEM-REGION

  • Non-transactional pmem flush: the all-or-none policy does not hold, so a portion of the updates can be lost (see the sketch below).

Outcome:

  • 1. Adds an atomic durability guarantee to libpmem
  • 2. Reduces the risk factor for libraries built on top of libpmem
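The sketch below illustrates the problem with plain, non-transactional libpmem flushes. pmem_map_file() and pmem_persist() are real libpmem calls; the file path and the two-field record are assumptions. A crash between the two persists leaves a partially updated record, which is exactly the gap libpmem-region is meant to close.

    /* Sketch of non-transactional libpmem flushes. A crash between the two
     * pmem_persist() calls makes only part of the logically single update
     * durable: there is no all-or-none guarantee. The path and record layout
     * are illustrative assumptions. */
    #include <libpmem.h>
    #include <string.h>

    struct record { char key[32]; char value[32]; };

    int main(void)
    {
        size_t mapped_len;
        int is_pmem;
        struct record *rec = pmem_map_file("/mnt/pmem/rec.dat", sizeof(*rec),
                                           PMEM_FILE_CREATE, 0666,
                                           &mapped_len, &is_pmem);
        if (rec == NULL)
            return 1;
        /* assume is_pmem != 0, i.e. the mapping is real persistent memory */

        strcpy(rec->key, "balance");
        pmem_persist(rec->key, sizeof(rec->key));       /* durable now */

        /* <-- a crash here leaves the new key paired with the old value */

        strcpy(rec->value, "100");
        pmem_persist(rec->value, sizeof(rec->value));   /* durable only if reached */

        pmem_unmap(rec, mapped_len);
        return 0;
    }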
SLIDE 24

LIBPMEM COMPARISONS

SLIDE 25

LIBPMEM COMPARISONS

SLIDE 26

SUMMARY

Region System features:

  • Provides arbitrary and unordered allocation and de-allocation
  • Minimizes ordering requirements by eliminating redundancy
  • Provides transparent sharing and atomic durability of mapped data with competitive performance
  • Usable by file systems, applications, libraries, and other kernel subsystems or modules
  • Source code will be made public soon!
SLIDE 27

QUESTIONS?

Thanks!