Persistent Memory Architecture Research at UCSC – Workload Characterization and Hardware Support for Persistence
Jishen Zhao jishen.zhao@ucsc.edu Computer Engineering UC Santa Cruz July 12, 2016
What is persistent memory?
NVRAM
STT-RAM, PCM, ReRAM, NVDIMM, 3D XPoint, etc.
[Condit+ SOSP’09, Volos+ ASPLOS’11, Coburn+ ASPLOS’11, Venkataraman+ FAST’11]
Conventional system: CPU with DRAM (load/store, not persistent) and Disk/Flash (fopen(), fread(), fwrite(), …; persistent)
Persistent memory system: CPU with NVRAM (load/store, persistent)
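The contrast between the file interface and load/store access can be sketched in a few lines. This is a minimal illustration, assuming an ordinary temporary file stands in for a persistent region; the helper names (`update_via_file_api`, `update_via_mapping`, `demo`) are hypothetical and not from the talk. On real persistent memory the mapping would typically come from a DAX-capable file system rather than a regular file.

```python
import mmap
import os
import tempfile


def update_via_file_api(path: str, offset: int, data: bytes) -> None:
    """Storage-style update: explicit seek/write calls, then fsync."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def update_via_mapping(path: str, offset: int, data: bytes) -> None:
    """Memory-style update: map the file, then store bytes by slice assignment."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            m[offset:offset + len(data)] = data  # plain store into the mapping
            m.flush()  # ask the OS to write the mapped range back


def demo() -> bytes:
    """Apply one update of each kind to a 16-byte scratch file and return it."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\x00" * 16)
        os.close(fd)
        update_via_file_api(path, 0, b"file")
        update_via_mapping(path, 8, b"mmap")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

Both paths end up modifying the same bytes. The difference is that the mapped version updates data with ordinary stores instead of going through read/write calls for every access.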
Jeff Moyer, “Persistent memory in Linux,” SNIA NVM Summit, 2016.
mmap()
cases
design
persistence mechanisms
software and hardware
Figure: system stack spanning Applications, System Software (VM, File System, Database System), ISA, CPU, DRAM, NVRAM, and SSD/HDD.
software
system throughput
bottlenecks
memory hardware
insights at micro-architecture level
partitioned on DRAM (memmap)
allocated in DRAM and directly access NVRAM via loads and stores
through DRAM, i.e., directly swaps the pages between DRAM main memory and NVRAM storage.
evaluating file system performance)
getc()
(pmem) region
Figure: Throughput (operations per second) of ext4, ext4-DAX, and NOVA on Fileserver, Webproxy, Webserver, and Varmail.
Figure: Execution time in nanoseconds of NOVA, ext4-DAX, and ext4 for TAR and UNTAR.
Figure: TPC-C transactions per ten seconds for NOVA, ext4-DAX, and ext4.
Figure: Correlation coefficient of dTLB miss rate, iTLB miss rate, LLC load miss rate, LLC store miss rate, and page fault rate for Fileserver, Webproxy, Webserver, Varmail, Zip, Unzip, and FFSB. Highly correlated (standard error within 8%).
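The slide does not say how the correlation coefficient was computed. Assuming the usual Pearson coefficient, a minimal sketch looks like the following. The function name `pearson` is a placeholder, and any inputs passed to it here are synthetic, not measurements from the talk.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, all left unnormalized (the 1/n cancels).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("a constant sample has no defined correlation")
    return cov / (dev_x * dev_y)
```

A value near 1 or -1 indicates a strong linear relationship between the two series, for example a hardware counter and a throughput figure sampled over the same runs.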
Figure (FFSB): Throughput (transactions/s) of ext4, ext4-DAX, and NOVA at read/write mixes R=100%/W=0%, R=90%/W=10%, R=80%/W=20%, R=70%/W=30%, R=60%/W=40%, and R=0%/W=100%.
Figure (Bonnie, read:write = 1:1): Normalized throughput of ext4-DAX, ext4, and NOVA for putc() throughput, block writes, block create/change/rewrite, getc(), efficient block reads, and effective random seek rate.
Figure: Transactions per second of ext4, ext4-DAX, and NOVA under DRAM, the classic NVM model, and the revised NVRAM model with buffer hit rates of 50%, 60%, 70%, 80%, and 90%.
Figure: Transactions per second of ext4, ext4-DAX, and NOVA under DRAM and the revised NVRAM model with buffer sizes of 4KB, 2KB, 1KB, 512B, and 256B.
database systems
wasted)
Figure: updating a tree (Root; nodes A, B, C, D; new versions C’, D’) in NVRAM. Labels: Log, Memory Barrier, Size of one store.
Zhao+, “Kiln: Closing the performance gap between systems with and without persistence support,” MICRO 2013.
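One way to read the figure labels above (new versions C’ and D’, a memory barrier, and a final update the size of one store) is as a copy-on-write update that is published by a single pointer-sized write. The sketch below simulates that ordering under stated assumptions: `SimulatedNVRAM`, the version table `v0`/`v1`, and the `crash_before_switch` flag are all hypothetical, and the barrier is modeled as "everything written so far becomes durable". It is not Kiln's mechanism or any real hardware interface.

```python
class SimulatedNVRAM:
    """Toy model: stores sit in a volatile cache until a barrier persists them."""

    def __init__(self):
        self.persistent = {}
        self.pending = {}

    def store(self, addr, value):
        self.pending[addr] = value

    def load(self, addr):
        return self.pending.get(addr, self.persistent.get(addr))

    def persist_barrier(self):
        # Assumed semantics: all earlier stores become durable here.
        self.persistent.update(self.pending)
        self.pending.clear()

    def crash(self):
        # Anything not yet persisted is lost.
        self.pending.clear()


def build(nv):
    """Create the initial tree: root -> v0 -> nodes A, B, C, D."""
    for name in "ABCD":
        nv.store(name, name.lower() + "0")
    nv.store("v0", ("A", "B", "C", "D"))
    nv.store("root", "v0")
    nv.persist_barrier()


def read_tree(nv):
    return [nv.load(child) for child in nv.load(nv.load("root"))]


def cow_update(nv, crash_before_switch=False):
    # 1. Write the new versions next to the old ones; nothing reachable changes.
    nv.store("C'", "c1")
    nv.store("D'", "d1")
    nv.store("v1", ("A", "B", "C'", "D'"))
    nv.persist_barrier()  # new versions are durable before they become visible
    if crash_before_switch:
        nv.crash()
        return
    # 2. Publish with one small store (stands in for a pointer-sized write).
    nv.store("root", "v1")
    nv.persist_barrier()
```

A crash before the root switch leaves the old tree intact, and a crash after it exposes the complete new version, so a reader never sees a half-applied update in this model.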
flushing all needed for ensuring data persistence
flushing all needed for ensuring data persistence Hardware support for
the log – a software-allocated circular buffer
value, and redo cache line value
Figure: processor with cores, L1 caches (L1$), last-level cache, cache controllers, memory controllers, and a log buffer (labeled with cache line size); DRAM and NVRAM, with the log (circular buffer).
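Figure (b), L1 cache hit: the core's store of A’1 hits in the L1$, which holds the old value A1. Log entries (TxID, addr(A), new and old cache-line values A’1/A1, A’2/A2) enter the log buffer (FIFO), bypass the caches, and are written to the log (circular buffer) in NVRAM (nonvolatile); the processor side is volatile. The sequence ends with Tx_commit.
On an L1 cache hit, we get everything needed for the undo+redo log.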
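The log entry fields named on the slides (TxID, address, undo cache-line value, redo cache-line value) can be illustrated with a small record type and a recovery pass. This is a sketch of textbook undo+redo recovery, not the hardware design from the talk: `LogRecord`, `recover`, and the `committed` set are hypothetical names, and the code assumes that committed and uncommitted transactions do not touch the same address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    tx_id: int
    addr: str
    undo: bytes  # old cache-line value, e.g. A1
    redo: bytes  # new cache-line value, e.g. A'1


def recover(memory: dict, log: list, committed: set) -> dict:
    """Return the memory image after replaying an undo+redo log."""
    recovered = dict(memory)
    # Roll back uncommitted transactions, newest record first.
    for rec in reversed(log):
        if rec.tx_id not in committed:
            recovered[rec.addr] = rec.undo
    # Roll forward committed transactions, oldest record first.
    for rec in log:
        if rec.tx_id in committed:
            recovered[rec.addr] = rec.redo
    return recovered
```

Keeping both values in each record is what lets recovery go in either direction: the undo value repairs data that reached NVRAM too early, and the redo value repairs data that had not reached NVRAM when the transaction committed.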
Figure (c), L1 cache miss: the core's store of A’1 misses in the L1$ and hits in a lower-level cache; the line with the old value A1 is brought in by write-allocate. Log entries (TxID, addr(A), new and old cache-line values A’1/A1, A’2/A2) enter the log buffer (FIFO), bypass the caches, and are written to the log (circular buffer) in NVRAM (nonvolatile). The sequence ends with Tx_commit.
On an L1 cache miss, we get everything needed during “write-allocate”.
Figure: Circular log buffer with head and tail pointers.
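A circular log with head and tail pointers can be sketched as a fixed array with wraparound indices. The slide does not state which pointer marks the oldest entry, so this sketch assumes `head` is the oldest live record and `tail` is the next free slot. The class name `CircularLog` and the `truncate` method are hypothetical.

```python
class CircularLog:
    """Fixed-capacity log; head = oldest live record, tail = next free slot."""

    def __init__(self, capacity: int):
        self.slots = [None] * capacity
        self.head = 0
        self.tail = 0
        self.count = 0

    def append(self, record) -> bool:
        """Add a record at the tail; return False if the log is full."""
        if self.count == len(self.slots):
            return False  # caller must truncate before appending more
        self.slots[self.tail] = record
        self.tail = (self.tail + 1) % len(self.slots)  # wrap around
        self.count += 1
        return True

    def truncate(self, n: int = 1) -> list:
        """Retire up to n of the oldest records and return them."""
        retired = []
        while n > 0 and self.count > 0:
            retired.append(self.slots[self.head])
            self.slots[self.head] = None
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
            n -= 1
        return retired
```

Truncation frees space at the head so that new records can wrap around and reuse it. In a persistence setting, a record would only be retired once the data it protects is known to be durable.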
Email: Jishen.zhao@ucsc.edu https://users.soe.ucsc.edu/~jzhao/