Distributed Shared Persistent Memory
(SoCC ’17)
Yizhou Shan, Yiying Zhang
Persistent Memory (PM/NVM)
– Byte addressable
– Persistent
– Low latency
– Capacity
– Cost effective
[Figure: CPU, cache, DRAM, and PM]

Many PM Work, but All in Single Machine
– NV-Heaps [ASPLOS ’11], Mnemosyne [ASPLOS ’11]
– Memory Persistency [ISCA ’14], Synchronous Ordering [MICRO ’16]
– BPFS [SOSP ’09], PMFS [EuroSys ’14], SCMFS [SC ’11], HiNFS [EuroSys ’16]
– NVWAL [ASPLOS ’16], SCT/DCT [ASPLOS ’16], Kamino-Tx [EuroSys ’17]
How to Use PM in Distributed Environments?
[Figure: Node 1 and Node 2. Labels: 8GB main memory with six cores; VM1, App1; 4GB main memory with four cores; Container1; 3GB memory with one core; App2]
Resource Utilization in Production Clusters
Unused Resource + Waiting/Killed Jobs Because of Physical-Node Constraints
* Google Production Cluster Trace Data. “https://github.com/google/cluster-data”
[Figure: Node 1 and Node 2. Labels: main memory with six cores; VM1, App1; main memory with four cores; Container1; App2; memory with one core; App2]
Purdue ECE WukLab
[Figure labels: PowerGraph, TensorFlow]
Modern Datacenter Applications Have Significant Memory Sharing
➡Performance
➡Checkpointing
– Local or remote (transparent)
– Pointers and in-memory data structures
No redundant layers
No data marshaling/unmarshalling
/* Open a dataset named 'boilermaker' */
int fd = open("/mnt/hotpot/boilermaker", O_CREAT | O_RDWR, 0644);
/* Map it into the application's virtual address space */
void *base = mmap(0, 40960, PROT_WRITE, MAP_PRIVATE, fd, 0);
/* First access: Hotpot will fetch the page from a remote node */
*(char *)base = 9;
/* Later accesses: direct memory load/store */
memset(base, 0x27, PAGE_SIZE);
/* Commit data: make it coherent, durable, and replicated */
msync(sg_addr, sg_len, MSYNC_HOTPOT);
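The open/mmap/msync pattern in the snippet above can be tried on a stock POSIX system without Hotpot. The sketch below is only an analogy under stated assumptions: it uses a regular file instead of a Hotpot dataset, MAP_SHARED instead of MAP_PRIVATE, and the standard MS_SYNC flag instead of MSYNC_HOTPOT, so it gives durability on one machine and none of Hotpot's cross-node coherence or replication. The helper names map_fill_commit and file_is_filled are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `len` bytes of the file at `path`, fill them with `value`, and commit
 * them with msync. Returns 0 on success, -1 on any failure.
 * Assumption: plain POSIX stands in for Hotpot here, so MAP_SHARED + MS_SYNC
 * replace the slide's MAP_PRIVATE + MSYNC_HOTPOT. */
int map_fill_commit(const char *path, size_t len, unsigned char value)
{
    assert(path != NULL);
    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0)
        return -1;
    /* Size the file so that the whole mapping is backed by file data. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    unsigned char *base =
        mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return -1;

    memset(base, value, len);           /* ordinary stores into the mapping */
    int rc = msync(base, len, MS_SYNC); /* commit point: flush to the file */
    munmap(base, len);
    return rc == 0 ? 0 : -1;
}

/* Read the file back with read(2); return 1 if it holds exactly `len`
 * bytes that all equal `value`, otherwise 0. */
int file_is_filled(const char *path, size_t len, unsigned char value)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    unsigned char buf[256];
    size_t seen = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++) {
            if (buf[i] != value) {
                close(fd);
                return 0;
            }
        }
        seen += (size_t)n;
    }
    close(fd);
    return n == 0 && seen == len;
}
```

Calling map_fill_commit on a scratch path (for example a file under /tmp) and then file_is_filled with the same length and value checks that the stores made through the mapping reached the file after the msync call.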
at different times
10/9/17
[Figure: pages 1–4 placed on Node 1 and Node 2; Node 2 accesses page 3]
Exploit application behavior: Make data coherent only at commit points
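The idea of making data coherent only at commit points can be illustrated with a toy, single-process model. This is a sketch under assumptions, not Hotpot's actual protocol: the "nodes" are plain structs in one address space, pages are 64 bytes, and every name here (cluster_reset, node_write, node_read, node_commit) is hypothetical. It shows only the visibility rule: a store changes the writer's local copy, and other nodes see it after the writer commits.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_NODES 2
#define NUM_PAGES 4
#define PAGE_BYTES 64 /* tiny pages keep the toy model small */

/* Each simulated node keeps its own copy of every page plus a dirty bit. */
struct node {
    unsigned char pages[NUM_PAGES][PAGE_BYTES];
    bool dirty[NUM_PAGES];
};

static struct node cluster[NUM_NODES];

/* Reset all nodes to zero-filled pages with nothing dirty. */
void cluster_reset(void)
{
    memset(cluster, 0, sizeof cluster);
}

/* A store touches only the writer's local copy and marks the page dirty. */
void node_write(int id, int page, int offset, unsigned char value)
{
    assert(id >= 0 && id < NUM_NODES);
    assert(page >= 0 && page < NUM_PAGES);
    assert(offset >= 0 && offset < PAGE_BYTES);
    cluster[id].pages[page][offset] = value;
    cluster[id].dirty[page] = true;
}

/* A load reads the reader's local copy, which may be stale between commits. */
unsigned char node_read(int id, int page, int offset)
{
    assert(id >= 0 && id < NUM_NODES);
    assert(page >= 0 && page < NUM_PAGES);
    assert(offset >= 0 && offset < PAGE_BYTES);
    return cluster[id].pages[page][offset];
}

/* Commit point: copy the committer's dirty pages to every other node.
 * Returns the number of pages propagated. */
int node_commit(int id)
{
    assert(id >= 0 && id < NUM_NODES);
    int pushed = 0;
    for (int p = 0; p < NUM_PAGES; p++) {
        if (!cluster[id].dirty[p])
            continue;
        for (int other = 0; other < NUM_NODES; other++) {
            if (other != id)
                memcpy(cluster[other].pages[p], cluster[id].pages[p],
                       PAGE_BYTES);
        }
        cluster[id].dirty[p] = false;
        pushed++;
    }
    return pushed;
}
```

In this model, a write on node 0 is invisible to node 1 until node_commit(0) runs, which is the trade the slide describes: no coherence traffic on ordinary loads and stores, with the cost paid once at each commit.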
[Figure: Node 1, Node 2, and Node 3, each with a CPU cache and PM, holding copies labeled A/A’, B/B’, and C/C’]
Get Hotpot at: https://github.com/WukLab/Hotpot