Distributed Shared Persistent Memory (SoCC ’17): Yizhou Shan, Yiying Zhang


  1. Distributed Shared Persistent Memory (SoCC ’17) Yizhou Shan, Yiying Zhang

  2. Persistent Memory (PM/NVM) • Byte addressable • Persistent • Low latency • Capacity • Cost effective [Figure: CPU, cache, DRAM, and PM]

  3. Much PM Work, but All on a Single Machine • Local memory models – NV-Heaps [ASPLOS ’11], Mnemosyne [ASPLOS ’11] – Memory Persistency [ISCA ’14], Synchronous Ordering [MICRO ’16] • Local file systems – BPFS [SOSP ’09], PMFS [EuroSys ’14], SCMFS [SC ’11], HiNFS [EuroSys ’16] • Local transaction/logging systems – NVWAL [ASPLOS ’16], SCT/DCT [ASPLOS ’16], Kamino-Tx [EuroSys ’17]

  4. Moving PM into Datacenters • PM fits the datacenter - Applications require a lot of memory - and fast access to persistent data - at low monetary cost • Challenges - Handle node failure - Ensure good performance and scalability - Easy-to-use abstraction

  5. How to Use PM in Distributed Environments? • As distributed memory? • As distributed storage? • Mojim [Zhang et al., ASPLOS ’15] - First PM work in distributed environments - Efficient PM replication - But far from a full-fledged distributed NVM system

  6. Resource Allocation in Datacenters [Figure: VM1, App1, Container1, and App2 mapped onto the cores and main memory of Node 1 and Node 2]

  7. Resource Utilization in Production Clusters • Unused resources + waiting/killed jobs because of physical-node constraints * Source: Google production cluster trace data, https://github.com/google/cluster-data

  8. Q1: How to achieve better resource utilization? Use remote memory

  9. Distributed (Remote) Memory [Figure: VM1, App1, Container1, and App2 mapped onto the cores and main memory of Node 1 and Node 2]

  10. Modern Datacenter Applications Have Significant Memory Sharing [Figures: PowerGraph, TensorFlow] Purdue ECE WukLab

  11. Q2: How to scale out parallel applications? Distributed shared memory

  12. What about persistence? • Data persistence is useful - Many existing data storage systems ➡ Performance - Memory-based, long-running applications ➡ Checkpointing

  13. Q3: How to provide data persistence?

  14. DSM Distributed Shared Persistent Memory (DSPM): a significant step towards using PM in datacenters

  15. DSPM • Native memory load/store interface – Local or remote (transparent) – Pointers and in-memory data structures • Supports memory read/write sharing

  16. DSM Distributed Shared Persistent Memory (DSPM): a significant step towards using PM in datacenters

  17. DSPM: One Layer Approach • (Distributed) Memory – Memory load/store interface, local or remote (transparent) – Pointers and in-memory data structures – Supports memory read/write sharing • (Distributed) Storage – Persistent naming – Data durability and reliability • Benefits of both memory and storage – No redundant layers – No data marshaling/unmarshalling

  18. Hotpot: A Kernel-Level RDMA-Based DSPM System • Easy to use • Native memory interface • Fast, scalable • Flexible consistency levels • Data durability & reliability

  19–24. Hotpot Architecture [Figure: architecture diagram, shown across six slides]

  25. Hotpot Code Example
      /* Open a dataset named 'boilermaker' */
      int fd = open("/mnt/hotpot/boilermaker", O_CREAT|O_RDWR, 0644);
      /* Map it into the application's virtual address space */
      char *base = mmap(0, 40960, PROT_WRITE, MAP_PRIVATE, fd, 0);
      /* First access: Hotpot will fetch the page from a remote node */
      *base = 9;
      /* Later accesses: direct memory load/store */
      memset(base, 0x27, PAGE_SIZE);
      /* Commit data: make it coherent, durable, and replicated.
         sg_addr/sg_len: address and length of the region to commit
         (not defined on the slide). */
      msync(sg_addr, sg_len, MSYNC_HOTPOT);
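
The open/mmap/store/msync sequence on this slide can be tried without a Hotpot installation by pointing the same standard POSIX calls at an ordinary file. The following is a minimal sketch under that assumption, not Hotpot code: it uses MAP_SHARED and the standard MS_SYNC flag in place of MSYNC_HOTPOT, and the function name map_write_commit and the 4096-byte region size are invented for illustration. It gives local durability only, with none of the cross-node coherence or replication the deck describes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_LEN 4096  /* assumed size of the mapped region */

/* Fills a mapping of `path` with `value`, commits it with msync(MS_SYNC),
 * then reads the first byte back through the file descriptor.
 * Returns the byte read, or -1 on any error. */
int map_write_commit(const char *path, unsigned char value)
{
    assert(path != NULL);

    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, REGION_LEN) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *base = mmap(NULL, REGION_LEN, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Plain stores into the mapping: no read()/write() calls involved. */
    memset(base, value, REGION_LEN);

    /* Commit point: block until the dirty pages reach the backing file. */
    int rc = msync(base, REGION_LEN, MS_SYNC);
    munmap(base, REGION_LEN);

    unsigned char first = 0;
    if (rc != 0 || pread(fd, &first, 1, 0) != 1) {
        close(fd);
        return -1;
    }
    close(fd);
    return first;
}
```

Calling map_write_commit("/tmp/example.bin", 0x27) should hand back the stored byte if every step succeeds. The path is arbitrary.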

  26. How to efficiently add P to “DSM”? • Distributed Shared Memory - Cache remote memory on-demand for fast local access - Multiple redundant copies • Distributed Storage Systems - Actively add more redundancy to provide data reliability • One Layer Principle: integrate two forms of redundancy with morphable page states

  27. Morphable Page States • A PM page can serve different purposes, possibly at different times - as a local cached copy to improve performance - as a redundant data page to improve data reliability [Figure: Node 2 accesses page 3; pages 1 to 4 laid out across Node 1 and Node 2] (slide dated 10/9/17)
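
One way to picture a page that changes purpose over time is a per-frame role tag that is rewritten in place. The sketch below is illustrative only: the role names, the struct, and every function are hypothetical, the slides do not list Hotpot's actual page states, and the eviction rule (only performance copies may be dropped) is an assumption. It shows a cached copy being recounted as a replica without a second copy being made, so fewer extra copies are needed to reach a target replication degree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical roles; not Hotpot's real state names. */
enum page_role { ROLE_FREE, ROLE_CACHED, ROLE_REDUNDANT };

struct pm_page {
    int page_no;          /* which logical page this frame holds */
    enum page_role role;  /* why the node currently keeps it */
};

/* A remote access pulls a copy into local PM purely for performance. */
void fetch_on_access(struct pm_page *frame, int page_no)
{
    assert(frame->role == ROLE_FREE);
    frame->page_no = page_no;
    frame->role = ROLE_CACHED;
}

/* The same frame is later counted as a replica; no extra copy is made. */
bool morph_to_redundant(struct pm_page *frame)
{
    if (frame->role != ROLE_CACHED)
        return false;
    frame->role = ROLE_REDUNDANT;
    return true;
}

/* Assumed policy: only copies kept for performance may be evicted. */
bool can_evict(const struct pm_page *frame)
{
    return frame->role == ROLE_CACHED;
}

/* Copies that still have to be created to reach the target degree. */
int extra_copies_needed(int target_degree, int existing_copies)
{
    return existing_copies >= target_degree
               ? 0
               : target_degree - existing_copies;
}
```

With a target degree of 3 and two copies already cached on other nodes, extra_copies_needed(3, 2) asks for a single additional copy instead of three.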

  28. How to efficiently add P to “DSM”? • When to make cached copies coherent? • When to make data durable and reliable? • Observations - Data-store applications have well-defined commit points - Commit points: time to make data persistent - Visible to storage devices => visible to other nodes • Exploit application behavior: make data coherent only at commit points

  29. Commit Point [Figure: CPU caches and PM on Node 1, Node 2, and Node 3, holding data A, B, C and updated versions A’, B’, C’] • durable • coherent • reliable • single-node and distributed consistency • two consistency modes: single/multiple writer
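
The idea of making data coherent only at commit points can be simulated in a single process. The sketch below rests on loud assumptions: three "nodes" are just array slots, there is no networking, persistence, or failure handling, a commit copies the value to every node, and all names (cluster, local_store, local_load, commit) are invented for illustration and are not Hotpot's interfaces. It shows only the visibility rule: a store stays private to the writer until that writer commits.

```c
#include <assert.h>
#include <stdbool.h>

#define NODES 3

struct cluster {
    int committed[NODES];  /* last committed value as stored on each node */
    int working[NODES];    /* each node's uncommitted local value */
    bool dirty[NODES];     /* true if the node has uncommitted stores */
};

void cluster_init(struct cluster *c, int initial)
{
    for (int i = 0; i < NODES; i++) {
        c->committed[i] = initial;
        c->working[i] = initial;
        c->dirty[i] = false;
    }
}

/* A store touches only the writer's own working copy. */
void local_store(struct cluster *c, int node, int value)
{
    assert(node >= 0 && node < NODES);
    c->working[node] = value;
    c->dirty[node] = true;
}

/* A load sees the node's own uncommitted data, else the committed value. */
int local_load(const struct cluster *c, int node)
{
    assert(node >= 0 && node < NODES);
    return c->dirty[node] ? c->working[node] : c->committed[node];
}

/* Commit point: publish the writer's value to every node's committed
 * copy and mark the writer clean. */
void commit(struct cluster *c, int node)
{
    assert(node >= 0 && node < NODES);
    if (!c->dirty[node])
        return;
    for (int i = 0; i < NODES; i++)
        c->committed[i] = c->working[node];
    c->dirty[node] = false;
}
```

After local_store(&c, 0, 9), node 1 still loads the old value. Only once commit(&c, 0) has run do nodes 1 and 2 load 9.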

  30. Flexible Coherence Levels • Multiple Reader Multiple Writer (MRMW) - Allows multiple concurrent dirty copies - Great parallelism, but weaker consistency - Three-phase commit protocol • Multiple Reader Single Writer (MRSW) - Allows only one dirty copy - Trades parallelism for stronger consistency - Single-phase commit protocol
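
The difference between the two modes comes down to how many dirty copies of a page may exist at once. The sketch below tracks that with a bitmask and is an assumption-laden illustration: the slides do not say how Hotpot enforces the single-writer rule or what a refused writer does next, the commit protocols themselves are not modeled, and the names page_writers, try_begin_write, and finish_commit are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum coherence_mode { MODE_MRMW, MODE_MRSW };

struct page_writers {
    enum coherence_mode mode;
    unsigned dirty_mask;  /* bit i set: node i holds a dirty copy */
};

/* Returns true if `node` may create (or keep) a dirty copy of the page. */
bool try_begin_write(struct page_writers *p, int node)
{
    assert(node >= 0 && node < 32);
    unsigned bit = 1u << node;
    if (p->mode == MODE_MRSW && (p->dirty_mask & ~bit) != 0)
        return false;  /* another node already holds the one dirty copy */
    p->dirty_mask |= bit;
    return true;
}

/* Called once the node's commit has finished; its copy is clean again. */
void finish_commit(struct page_writers *p, int node)
{
    assert(node >= 0 && node < 32);
    p->dirty_mask &= ~(1u << node);
}

/* Number of nodes currently holding a dirty copy. */
int dirty_copies(const struct page_writers *p)
{
    int count = 0;
    for (unsigned m = p->dirty_mask; m != 0; m >>= 1)
        count += (int)(m & 1u);
    return count;
}
```

In MRSW mode a second node's try_begin_write is refused until the first writer calls finish_commit. In MRMW mode both succeed and dirty_copies reports 2.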

  31. MongoDB Results • MongoDB modified with ~120 LOC, running in MRMW mode • Compared with tmpfs, PMFS, Mojim, and Octopus using YCSB

  32. Conclusion • One layer approach: challenges and benefits • Hotpot: a kernel-level RDMA-based DSPM system • Hide complexity behind a simple abstraction • Calls for attention to using PM in datacenters • Many open problems in distributed PM!

  33. Thank You Questions? Get Hotpot at: https://github.com/WukLab/Hotpot wuklab.io
