Flash Storage Disaggregation
Ana Klimovic1, Christos Kozyrakis1,4, Eno Thereska3,5, Binu John2 and Sanjeev Kumar2
Flash is underutilized
Flash provides higher throughput and lower latency than disk
PCIe Flash:
[Figure: architecture with local Flash. Application tier: app servers (clients) send get(k) and put(k,val) requests over TCP/IP to the datastore service. Datastore tier: a key-value store on servers with CPU, RAM, NIC and local Flash.]
[Figures: utilization plots]
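[Figure: disaggregated Flash architecture. Application tier: app servers (clients) send get(k) and put(k,val) requests over TCP/IP. Datastore tier: the key-value store runs on servers with CPU, RAM and NIC but no local Flash. Flash tier: a remote block service on servers with CPU, RAM, NIC and Flash serves read(blk) and write(blk,data) requests from the datastore tier over a network storage protocol (iSCSI).]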
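In the disaggregated design the key-value store sees only a block interface, so where the Flash physically lives is hidden behind read(blk) and write(blk,data). The sketch below is a toy in-memory stand-in for that interface; the class name RemoteBlockService and the 4096-byte block size are hypothetical choices for illustration, not details given in the slides.

```python
from typing import Dict

BLOCK_SIZE = 4096  # bytes; hypothetical, the slides do not specify a block size


class RemoteBlockService:
    """Toy in-memory stand-in for the Flash tier's block interface.

    Only the read(blk) / write(blk, data) operations from the figure are
    modelled; a real Flash tier would serve them over iSCSI from a device.
    """

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self._blocks: Dict[int, bytes] = {}

    def _check(self, blk: int) -> None:
        if not 0 <= blk < self.num_blocks:
            raise IndexError(f"block {blk} out of range")

    def read(self, blk: int) -> bytes:
        self._check(blk)
        # Never-written blocks read back as zeros, like a fresh block device.
        return self._blocks.get(blk, bytes(BLOCK_SIZE))

    def write(self, blk: int, data: bytes) -> None:
        self._check(blk)
        if len(data) != BLOCK_SIZE:
            raise ValueError(f"writes must be exactly {BLOCK_SIZE} bytes")
        self._blocks[blk] = bytes(data)


if __name__ == "__main__":
    svc = RemoteBlockService(num_blocks=1024)
    svc.write(7, b"k" * BLOCK_SIZE)
    print(svc.read(7)[:4], svc.read(8)[:4])
```

Because the datastore tier depends only on these two calls, the same key-value store code can run against local or remote Flash.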
          IOPS/TB       IO size
Read      2K – 10K      10KB – 50KB
Write     100 – 1K      500KB – 2MB
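Multiplying IOPS/TB by IO size gives the bandwidth per TB of Flash that each row implies. The sketch below does that arithmetic, assuming decimal units (1 KB = 1000 bytes), which the table does not state, and pairing the low ends and the high ends of each range, so the result is an envelope rather than a measured figure.

```python
KB = 1000          # assumption: decimal units
MB = 1000 * KB


def bandwidth_per_tb(iops_per_tb: int, io_size_bytes: int) -> int:
    """Bytes per second per TB of Flash: IOPS/TB multiplied by IO size."""
    return iops_per_tb * io_size_bytes


def bandwidth_range(iops_range, size_range):
    """Pair the low ends and the high ends of the two ranges."""
    low = bandwidth_per_tb(iops_range[0], size_range[0])
    high = bandwidth_per_tb(iops_range[1], size_range[1])
    return low, high


read_bw = bandwidth_range((2_000, 10_000), (10 * KB, 50 * KB))
write_bw = bandwidth_range((100, 1_000), (500 * KB, 2 * MB))

if __name__ == "__main__":
    for name, (low, high) in (("read", read_bw), ("write", write_bw)):
        print(f"{name}: {low / MB:.0f} to {high / MB:.0f} MB/s per TB")
```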
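[Figure: experimental setup. A load generator sends requests over TCP/IP to the SSDB server wrapper around RocksDB on the datastore server (CPU, RAM, NIC). RocksDB reaches Flash through a remote block service over a network storage protocol.]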
iSCSI is a standard network storage protocol that transports SCSI block storage commands over TCP/IP.
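To make "block storage commands over TCP/IP" concrete, the sketch below builds the 48-byte Basic Header Segment of an iSCSI SCSI Command PDU carrying a SCSI READ(10) command, following the layout in the iSCSI RFC. It is illustrative only: it omits the login phase, sessions, digests and additional header segments, only encodes single-level LUNs below 256, and the function names and 512-byte default block size are hypothetical choices.

```python
import struct

ISCSI_OP_SCSI_CMD = 0x01  # SCSI Command PDU opcode (initiator to target)
FLAG_FINAL = 0x80         # F bit: no unsolicited data follows
FLAG_READ = 0x40          # R bit: the command expects data from the target
ATTR_SIMPLE = 0x01        # task attribute: simple
SCSI_READ_10 = 0x28       # SCSI READ(10) operation code


def read10_cdb(lba: int, num_blocks: int) -> bytes:
    """10-byte READ(10) CDB: opcode, flags, 32-bit LBA, group, 16-bit length, control."""
    return struct.pack(">BBIBHB", SCSI_READ_10, 0, lba, 0, num_blocks, 0)


def scsi_read_pdu(lun: int, task_tag: int, cmd_sn: int, exp_stat_sn: int,
                  lba: int, num_blocks: int, block_size: int = 512) -> bytes:
    """48-byte Basic Header Segment of an iSCSI SCSI Command PDU for a read."""
    if not 0 <= lun < 256:
        raise ValueError("sketch only encodes single-level LUNs below 256")
    lun_field = bytes([0, lun]) + bytes(6)                 # 8-byte LUN field
    cdb = read10_cdb(lba, num_blocks).ljust(16, b"\x00")   # CDB padded to 16 bytes
    header = struct.pack(
        ">BBH B3s 8s I I I I",
        ISCSI_OP_SCSI_CMD,
        FLAG_FINAL | FLAG_READ | ATTR_SIMPLE,
        0,                        # reserved
        0,                        # TotalAHSLength
        (0).to_bytes(3, "big"),   # DataSegmentLength: no immediate data on a read
        lun_field,
        task_tag,                 # Initiator Task Tag
        num_blocks * block_size,  # Expected Data Transfer Length
        cmd_sn,
        exp_stat_sn,
    )
    return header + cdb


if __name__ == "__main__":
    pdu = scsi_read_pdu(lun=0, task_tag=1, cmd_sn=1, exp_stat_sn=1, lba=2048, num_blocks=8)
    print(len(pdu), pdu.hex())
```

An initiator would write such bytes to a TCP connection, conventionally on port 3260, after logging in to the target; the target answers with Data-In PDUs carrying the blocks.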
√ Transparent to application
√ Runs on commodity network
√ Scales datacenter-wide
[Figure: testbed. A load generator drives the SSDB server wrapper over RocksDB via TCP/IP; RocksDB accesses an Intel P3600 PCIe Flash device through the remote block service via iSCSI over 10Gb/E. Annotated resources: 6 cores, 4GB RAM, 10Gb/E NICs.]
Measure round-trip latency
260µs
[Figure: client latency (ms) vs. QPS (thousands). Series: Local Flash; iSCSI baseline (8 processes). Annotation: 45% drop.]
[Figure: client latency (ms) vs. QPS (thousands). Series: Local Flash; 6 iSCSI processes (optimal); 8 iSCSI processes (default); 1 iSCSI process. Annotation: 12%.]
[Figure: client latency (ms) vs. QPS (thousands). Series: Local Flash; NIC offload; iSCSI with 6 processes; iSCSI baseline (8 processes). Annotation: 8%.]
[Figure: client latency (ms) vs. QPS (thousands). Series: Local Flash; Jumbo frame; NIC offload; iSCSI with 6 processes; iSCSI baseline (8 processes). Annotation: 10%.]
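Jumbo frames raise the Ethernet MTU from the standard 1500 bytes to about 9000, so a large IO crosses the network in fewer frames and the hosts pay less per-packet processing overhead. The back-of-envelope sketch below counts frames per transfer, assuming a 9000-byte jumbo MTU and IPv4/TCP headers without options, and ignoring the iSCSI header; it does not model why the slides report the gain they do.

```python
# Header sizes assume IPv4 and TCP without options; TCP timestamps, VLAN tags
# or IPv6 would shrink the payload per frame slightly.
IPV4_HEADER = 20
TCP_HEADER = 20


def frames_per_transfer(payload_bytes: int, mtu: int) -> int:
    """Minimum number of Ethernet frames needed to carry payload_bytes over TCP."""
    mss = mtu - IPV4_HEADER - TCP_HEADER  # TCP payload carried per frame
    return -(-payload_bytes // mss)       # integer ceiling division


if __name__ == "__main__":
    # IO sizes at the ends of the read and write ranges listed earlier (10KB to 2MB).
    for size in (10_000, 50_000, 500_000, 2_000_000):
        std = frames_per_transfer(size, 1500)
        jumbo = frames_per_transfer(size, 9000)
        print(f"{size:>9,} B: {std:>5} frames at MTU 1500, {jumbo:>4} at MTU 9000")
```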
[Figure: client latency (ms) vs. QPS (thousands). Series: Local Flash; Interrupt affinity; Jumbo frame; NIC offload; iSCSI with 6 processes; iSCSI baseline (8 processes). Annotation: 4%.]
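Interrupt affinity pins the NIC's interrupts to chosen cores. On Linux this is done by writing a hexadecimal CPU bitmask to /proc/irq/<irq>/smp_affinity. The helper below only builds that mask; it assumes you already know the NIC's IRQ numbers (listed in /proc/interrupts) and which cores to dedicate, neither of which the slides state, and the IRQ number in the usage line is hypothetical.

```python
def smp_affinity_mask(cpus) -> str:
    """Return a CPU bitmask string for /proc/irq/<irq>/smp_affinity.

    Bit i set means the interrupt may be delivered to CPU i. Like the kernel's
    own output, masks wider than 32 bits are written as comma-separated
    32-bit hex groups, most significant group first.
    """
    cpus = list(cpus)
    if not cpus:
        raise ValueError("at least one CPU is required")
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid CPU id {cpu}")
        mask |= 1 << cpu
    groups = []
    while mask:
        groups.append(f"{mask & 0xFFFFFFFF:08x}")
        mask >>= 32
    return ",".join(reversed(groups))


if __name__ == "__main__":
    irq = 45  # hypothetical IRQ number of one NIC queue
    # Prints the command rather than running it; writing the file needs root.
    print(f"echo {smp_affinity_mask([2, 3])} > /proc/irq/{irq}/smp_affinity")
```

If the irqbalance daemon is running it may rewrite these masks, so it is usually stopped or configured to leave the pinned IRQs alone.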
[Figure: client latency (ms) vs. QPS (thousands). Series: Local Flash; Interrupt affinity; Jumbo frame; NIC offload; iSCSI with 6 processes; iSCSI baseline (8 processes). Annotation: 42%.]
[Figure: client latency (ms) vs. QPS (thousands). Series: local_avg, remote_avg, local_p95, remote_p95. Annotation: 20% drop.]
[Figure: client latency (ms) vs. QPS (thousands). Series: local_avg, remote_avg, local_p95, remote_p95. Annotation: 10% drop.]
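One way to read a "% drop" annotation on a latency vs. QPS plot is as the loss in throughput the remote configuration can sustain under the same latency target. The sketch below computes that, assuming each curve is a list of measured (QPS, latency) points and a hypothetical latency SLO; the slides do not say exactly how their percentages were derived, and the same function applies to average or p95 curves.

```python
from typing import List, Tuple

Curve = List[Tuple[float, float]]  # (qps, latency_ms) points from a load sweep


def max_qps_under_slo(curve: Curve, slo_ms: float) -> float:
    """Highest measured QPS whose latency stays within slo_ms (no interpolation)."""
    return max((qps for qps, lat in curve if lat <= slo_ms), default=0.0)


def throughput_drop(local: Curve, remote: Curve, slo_ms: float) -> float:
    """Fractional QPS loss of remote vs. local Flash at the same latency SLO."""
    local_qps = max_qps_under_slo(local, slo_ms)
    if local_qps == 0:
        raise ValueError("local curve never meets the SLO")
    return 1 - max_qps_under_slo(remote, slo_ms) / local_qps
```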
[Figure: client latency (ms, 0 to 4) vs. QPS (thousands, up to 140). Series: local_avg, remote_avg, local_p95, remote_p95. Annotation: 20% drop.]
[Figure annotations: 20% drop; 25% drop at the tail.]
[Figure: % cost benefit of disaggregation (0% to 40%) as a function of the storage capacity scaling factor and the compute intensity scaling factor. Annotated regions: balanced CPU & Flash utilization; deploy more Flash servers than compute.]
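The shape of the cost plot can be approached with a toy provisioning model. With local Flash every server carries both CPU and Flash, so whichever demand is larger dictates how many servers to buy; disaggregation sizes compute and Flash separately but pays for Flash-tier servers. The sketch below uses entirely hypothetical relative prices and expresses demands in server-equivalents; it is not the cost model behind the slides.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prices:
    """Hypothetical relative prices; the slides give no cost figures."""
    server: float = 1.0            # CPU, RAM and NIC of one datastore server
    flash_device: float = 0.5      # one Flash device
    flash_tier_share: float = 0.4  # per-device share of a Flash-tier server


def local_cost(cpu_demand: float, flash_demand: float, p: Prices) -> float:
    """Local Flash: each server holds one device, so the larger demand sets the count."""
    n = max(math.ceil(cpu_demand), math.ceil(flash_demand))
    return n * (p.server + p.flash_device)


def disaggregated_cost(cpu_demand: float, flash_demand: float, p: Prices) -> float:
    """Disaggregated: compute servers and Flash-tier capacity are sized independently."""
    compute = math.ceil(cpu_demand) * p.server
    flash = math.ceil(flash_demand) * (p.flash_device + p.flash_tier_share)
    return compute + flash


def cost_benefit(cpu_demand: float, flash_demand: float,
                 p: Optional[Prices] = None) -> float:
    """Fractional saving from disaggregation; negative means it costs more."""
    if p is None:
        p = Prices()
    local = local_cost(cpu_demand, flash_demand, p)
    return 1 - disaggregated_cost(cpu_demand, flash_demand, p) / local


if __name__ == "__main__":
    # Demands in server-equivalents: cores needed / cores per server and
    # TB needed / TB per device. Scale one demand at a time.
    for cpu, flash in ((10, 10), (20, 10), (10, 20)):
        print(f"cpu={cpu:>2} flash={flash:>2}: {cost_benefit(cpu, flash):+.1%}")
```

With these made-up prices a balanced workload comes out negative, because the Flash tier is pure overhead, while scaling either demand alone makes disaggregation cheaper.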
[Figure: IOPS (thousands) with 1, 3 and 6 tenants, compared with local Flash IOPS.]