Getting More Performance with Polymorphism from Emerging Memory Technologies
Iyswarya Narayanan, Aishwarya Ganesan, Anirudh Badam, Sriram Govindan, Bikash Sharma, Anand Sivasubramaniam
Slide 2
[Figure: latency PDF with a trimmed tail — high-performance and high-capacity SSD tiers serving both volatile and persistent accesses]
Slide 3
Cloud applications differ in their capacity needs for both volatile reads and persistent writes.
[Figure: fraction (0.2–0.8) of unique pages accessed within a time window, reads vs. writes, for Cloud Storage, MapReduce, Search-Index, and Search-Serve — some workloads are write intensive, some read intensive, and some both]
Slide 4
Applications also differ in the volume of read and write accesses — both across different applications and temporally within the same application (e.g., Search-Serve).
How to effectively provision memory and storage resources for diverse cloud storage applications?
Slide 5
Today's resources occupy fixed points in the latency/cost/capacity space: the volatile tier offers low latency but low capacity, while the persistent tier offers high capacity at high latency.
Slide 6
The volatile and persistent tiers are fixed in function (DRAM is memory, SSD is storage), in latency (only DRAM is fast), and in capacity (set by the server SKU) — low latency comes with low capacity, and high capacity with high latency.
Slide 7
Emerging memory technologies (e.g., 3D XPoint) are non-volatile with lower latencies than SSD, and compression makes their capacity and latency larger and flexible — they fall between the volatile (low-latency, low-capacity) and persistent (high-latency, high-capacity) extremes of the latency/cost/capacity space.
Can we exploit these emerging memory technologies to overcome the drawbacks of existing resources?
Slide 8
Existing ways to use NVM:
- Persistent memory programming: benefits both volatile and persistent accesses, but requires intrusive code changes to applications.
- NVM-based file systems: need no application changes, but require intrusive changes to the OS and file system, and incur a high NVM provisioning cost to cover the entire storage need.
- Transparent caches: low cost and transparent — either a volatile memory extension (direct access via loads/stores) or a persistent write cache above the SSD (via the block interface) — but each benefits reads or writes, not both.
Can we exploit a functional-polymorphism knob?
Slide 10
Partitioning NVM between the memory and storage functions reduces latency. In a MySQL TPC-C experiment, dm-cache uses part of the NVM as a write cache, while the rest is exposed as additional memory accessible via loads/stores.
[Figure: tail latency vs. % of NVM used as write cache (5, 10, 15, 50, 100)]
What if the working set exceeds physical memory/write-cache capacity?
Slide 11
When an application's working set is split between two fixed-latency tiers, tail latency is determined by the slowest tier: the 95th percentile of the access-latency distribution lands on the SSD, not on DRAM.
[Figure: access-latency probability distribution with the 95th percentile in the SSD region]
Slide 12
If the faster tier morphs to hold more of the working set, the 95th percentile shifts toward it and tail latency drops.
[Figure: the same access-latency distribution with more of the working set in the faster tier]
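The way the 95th percentile snaps from the slow tier to the fast tier once the fast tier covers enough of the working set can be sketched with a toy model (all latencies and fractions below are hypothetical, not the talk's measurements):

```python
import random

def p95_latency(fast_fraction, fast_us=1.0, slow_us=100.0, n=100_000, seed=0):
    """Sample access latencies for a working set split between a fast tier
    (serving `fast_fraction` of accesses) and a slow tier, and return the
    95th-percentile latency in microseconds."""
    rng = random.Random(seed)
    samples = sorted(fast_us if rng.random() < fast_fraction else slow_us
                     for _ in range(n))
    return samples[int(0.95 * n)]

# While the fast tier serves less than 95% of accesses, the tail sits at the
# slow tier's latency; once coverage crosses 95%, the 95th percentile
# collapses to the fast tier's latency.
for frac in (0.80, 0.90, 0.96):
    print(frac, p95_latency(frac))
```

The cliff at the percentile of interest is why morphing capacity into the faster tier pays off so sharply for tail latency.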
Slide 13
Representational polymorphism trades latency for capacity: compressed accesses cost roughly 2–12 µs depending on access granularity (512–4096 bytes) and on whether the access is a read or a write — still much lower latency than SSD — while compression yields a 2x to 8x increase in effective NVM capacity (e.g., for MapReduce and Search-Serve).
Our goal: effectively serve diverse cloud applications using a polymorphic emerging-memory-based cache.
[Figures: compressed access latency vs. access granularity; % increase in effective capacity per workload]
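How much effective capacity compression buys depends entirely on how compressible the cached pages are; a minimal sketch using zlib on synthetic pages (the page contents and sizes here are illustrative assumptions, not the workloads measured in the talk):

```python
import os
import zlib

def effective_capacity_gain(pages):
    """Ratio of raw bytes stored to compressed bytes actually used: the
    factor by which compression inflates the cache's effective capacity."""
    raw = sum(len(p) for p in pages)
    compressed = sum(len(zlib.compress(p)) for p in pages)
    return raw / compressed

# Repetitive content compresses well; random bytes do not.
compressible = [b"key=value;" * (4096 // 10) for _ in range(64)]
incompressible = [os.urandom(4096) for _ in range(64)]
print(effective_capacity_gain(compressible))    # large gain
print(effective_capacity_gain(incompressible))  # roughly 1x (no gain)
```

A real design would track the observed ratio online, since a workload whose pages stop compressing erases the capacity benefit.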
Slide 14
Functional polymorphism: memory vs. storage? An unmodified application reaches DRAM (ns) through the memory interface and the SSD (100 µs) through the block interface; NVM (roughly 1–10 µs) can serve either function.
Cloud applications are diverse: one partition size does not fit all!
The NVM can be battery-backed DRAM, 3D XPoint, etc.
Slide 15
Representational polymorphism: capacity vs. latency? Adding a compressed NVM representation to the same architecture introduces a second knob alongside the memory-vs.-storage split.
We need to navigate the performance trade-off across the capacity, latency, and persistence dimensions!
Slide 16
What is the most significant bottleneck for a generic application with a mix of reads and writes?
[Figure: average and 95th-percentile latency (ms) for reads and writes]
Slide 17
The persistent tier is much slower, and SSDs are asymmetric in their read/write latencies.
[Diagram: read misses from DRAM and persistent writes both reach the SSD through the block file system]
Use BB-DRAM as a write cache to the SSD: persistent writes are absorbed by a battery-backed DRAM write cache above the SSD, while read misses from DRAM still go to the SSD.

Slide 18
BB-DRAM can serve as a write cache alone, or as both write cache and memory extension: in the combined configuration, part of the BB-DRAM also absorbs read misses as additional memory.
How do we apportion the NVM capacity between the memory and storage functions? The resource is byte-addressable!
Slide 21
Incrementally repurpose Write-Cache blocks as memory pages to balance read/write performance.
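One way such incremental repurposing could look is a hill-climbing step that grows whichever function currently has the worse tail; this is a hypothetical sketch of the idea, not the talk's actual policy (block counts and latencies are invented):

```python
def rebalance(nvm_blocks, mem_blocks, read_tail_us, write_tail_us, step=1):
    """One repurposing step: if reads (served by the memory extension) show
    the worse tail, grow the memory side by `step` blocks at the write
    cache's expense; otherwise grow the write cache. Returns the new number
    of blocks used as memory; the remainder stays write cache."""
    if read_tail_us > write_tail_us:
        return min(nvm_blocks, mem_blocks + step)
    return max(0, mem_blocks - step)

# Repeated steps drift toward the split that balances the two tails.
mem = 0
for read_us, write_us in [(120, 40), (100, 45), (80, 60), (55, 70)]:
    mem = rebalance(nvm_blocks=1024, mem_blocks=mem,
                    read_tail_us=read_us, write_tail_us=write_us)
print(mem)
```

Because the NVM is byte-addressable, a block can change function without copying its contents anywhere.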
Slide 22
Functional polymorphic cache vs. functional + representational polymorphic cache:
[Diagram: in both, a BB-DRAM write cache above the block file system absorbs persistent writes and BB-DRAM absorbs read misses; the representational variant adds a compressed BB-DRAM region for each function]
There is no latency benefit to separating the memory and storage functions!
Slide 23
Since separating the functions buys no latency, both can share a single compressed representation:
[Diagram: one shared compressed BB-DRAM pool serves read misses (memory extension) and persistent writes (write cache) above the SSD]
A shared compression layer reduces compute requirements too!
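A minimal sketch of the shared-pool idea: one compressed store backs both the memory-extension and write-cache functions, and only the persistent entries are flushed to the SSD. The class, keys, and API below are invented for illustration:

```python
import zlib

class SharedCompressedPool:
    """Hypothetical shared compressed BB-DRAM pool: one compressed store
    serves both read-miss pages (volatile) and persistent writes, so a
    single compression layer covers both functions."""
    def __init__(self):
        self._store = {}     # key -> compressed bytes
        self._dirty = set()  # keys that must eventually reach the SSD

    def put(self, key, data, persistent=False):
        self._store[key] = zlib.compress(data)
        if persistent:
            self._dirty.add(key)  # persistent writes join the flush set

    def get(self, key):
        return zlib.decompress(self._store[key])

    def flush(self, write_to_ssd):
        """Drain dirty (persistent) entries via the `write_to_ssd` callback."""
        for key in sorted(self._dirty):
            write_to_ssd(key, zlib.decompress(self._store[key]))
        self._dirty.clear()

pool = SharedCompressedPool()
pool.put("page:1", b"volatile read-miss page " * 100)                     # memory extension
pool.put("blk:7", b"persistent database record " * 100, persistent=True)  # write cache
ssd = {}
pool.flush(ssd.__setitem__)
print(sorted(ssd))  # only the persistent entry reaches the SSD
```

Sharing one pool also means free space migrates automatically between the two functions instead of sitting idle in a fixed partition.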
Slide 24
PolyEMT's capacity management adjusts the polymorphism both on scheduling a new application and on dynamic phase changes within an application.
Slide 25
Persist data to the SSD in the background.
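Background persistence can be sketched as a producer/consumer pair: writes are acknowledged once they land in the battery-backed cache, and a background thread drains them to the SSD so the cache does not fill up. This is a hypothetical illustration, not the talk's implementation:

```python
import queue
import threading

write_cache = queue.Queue()  # stands in for the battery-backed write cache
ssd = []                     # stands in for the SSD

def background_flusher(stop):
    """Drain cached blocks to the 'SSD' until stopped and the cache is empty."""
    while not stop.is_set() or not write_cache.empty():
        try:
            block = write_cache.get(timeout=0.01)
        except queue.Empty:
            continue
        ssd.append(block)  # stand-in for an SSD block write
        write_cache.task_done()

stop = threading.Event()
flusher = threading.Thread(target=background_flusher, args=(stop,))
flusher.start()

for i in range(100):
    write_cache.put(f"block-{i}")  # foreground: acknowledged after the cache write

write_cache.join()  # wait until every block has been persisted to the SSD
stop.set()
flusher.join()
print(len(ssd))  # 100
```

The foreground path only pays the cache's latency; the SSD's slower, asymmetric write latency is hidden in the flusher.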
Slide 26
[Figure: mean throughput for workloads a–f, normalized to DRAM-Extension: Write-Cache 2.5x, Functional 4.55x, Functional+Representational 5x]
Addressing the most significant bottleneck improves performance by 2.5x; exploiting the polymorphisms further improves it by 70% and 90%.
[Figure: update and read latency for Write-Cache, Functional, and Functional+Representational, normalized to DRAM-Extension]
The EMT-based write cache reduces write and read tail latency by 30% and 40%; functional polymorphism reduces them by 60% and 80%; combining both morphings reduces them by 85% and 78%.
[Figure: EMT allocation (%) split across persistent, compressed, and volatile uses for workloads a–f, under F-only and F+R configurations]
PolyEMT benefits diverse cloud applications via careful apportioning of the polymorphic cache across three dimensions.
To conclude: PolyEMT improves performance per cost.