 
Princeton University
From application requests to Virtual IOPs: Provisioned key-value storage with Libra
David Shue* and Michael J. Freedman (*now at Google)
Shared Cloud. [Diagram: Tenants A, B, and C, each running VMs on shared infrastructure.]
Shared Cloud Storage. [Diagram: the same tenants share key-value storage, block storage, and SQL database storage services.]
Unpredictable Shared Cloud Storage. [Diagram: the same tenants and storage services, annotated with disk IO-bound tenants and SSD-backed storage.]
Provisioned Shared Key-Value Storage. [Diagram: Tenants A, B, and C (VMs) hold Reservations A, B, and C, expressed as application requests (GET/s and PUT/s, 1 KB normalized). Beneath them, the shared key-value storage consumes low-level IO (IOPs and bandwidth) on SSDs.]
Libra Contributions
• Libra IO Scheduler
  - Provisions low-level IO allocations for app-request reservations with high utilization.
  - Supports arbitrary object distributions and workloads.
• 2 key mechanisms
  - Track per-tenant app-request resource profiles.
  - Model IO resources with Virtual IOPs.
Related Work

System    Storage Type  App-requests  Work Conserving  Media
Maestro   Block         N             N                HDD
mClock    Block         N             Y                HDD
FlashFQ   Block         N             Y                SSD
DynamoDB  Key-Value     Y             N                SSD
Provisioned Distributed Key-Value Storage. [Diagram: Tenants A and B (VMs) hold Reservations A and B over shared key-value storage spread across Storage Node 1 ... Node N. The Global Reservation Problem [Pisces OSDI ’12] spans all nodes; the Local Reservation Problem applies at each storage node.]
Provisioned Distributed Key-Value Storage. [Diagram: demand from Tenants A and B (Reservations A and B) falls on data partitions spread across Storage Node 1 ... Node N.]
Provisioned Distributed Key-Value Storage. [Diagram: Reservation A and Reservation B each appear at Storage Node 1 ... Node N.]
Provisioned Distributed Key-Value Storage. [Diagram: each node holds local reservations, Res A1 and Res B1 on Node 1 through Res An and Res Bn on Node N.]
Provisioned Local Key-Value Storage. [Diagram: request path on a single node. Key-Value Protocol (GET 1001100 becomes GET K), Persistence Engine (Retrieve K becomes Read l337), Libra IO Scheduler (issues the IO operation), Physical Disk.]
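Libra Design. [Diagram: PUT and GET requests enter the Persistence Engine, which issues IO through the Libra IO Scheduler (DRR) to the Physical Disk. A Reservation Distribution Policy is also shown.]
- Libra IO Scheduler: how much IO to consume?
- Libra Provisioning Policy: how much IO to provision?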
Provisioning App-request Reservations is Hard
- IO Amplification (1 KB PUT ≥ 1 KB write): track tenant app-request resource profiles.
- IO Interference (variable IO throughput): underestimate provisionable IO.
- Non-linear IO Performance (non-linear cost per KB): model IO with Virtual IOPs.
Workload-dependent IO Amplification. LevelDB (LSM-Tree), PUT K,V. [Figure: IO throughput (kop/s) vs. GET/PUT request size (1 KB to 128 KB), showing PUT write IO, alongside an LSM-tree diagram with FLUSH and COMPACT stages.]
Workload-dependent IO Amplification. LevelDB (LSM-Tree), PUT K,V. [Same figure, adding FLUSH write IO.]
Workload-dependent IO Amplification. LevelDB (LSM-Tree), PUT K,V. [Same figure, adding COMPACT read IO and COMPACT write IO.]
Workload-dependent IO Amplification. GET K. [Same figure, adding GET read IO.]
Libra Tracks App-request IO Consumption to Determine IO Allocations
- Track IO consumption.
- Compute app-request IO profiles.
- Provision IO allocations.
[Figure: worked example for Tenant A (500 IO/s). Tracked IO for GET, PUT, FLUSH, and COMPACT is folded into per-GET and per-PUT IO profiles, which the Libra Provisioning Policy multiplies by the GET and PUT request rates to obtain the IO allocation.]
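The track, profile, and provision steps on this slide can be illustrated with a minimal Python sketch. It is not Libra's implementation: the names `IOProfile`, `build_profile`, and `required_io` are hypothetical, the counter values in the usage note are made up, and charging all FLUSH and COMPACT IO to PUTs is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class IOProfile:
    """Average IO units consumed per app-level request (illustrative)."""
    per_get: float  # direct GET read IO per GET
    per_put: float  # PUT write IO plus amortized FLUSH/COMPACT IO per PUT


def build_profile(gets, puts, get_io, put_io, flush_io, compact_io):
    """Derive per-request IO costs from counters tracked over one interval."""
    per_get = get_io / gets if gets else 0.0
    # Assumption: background FLUSH and COMPACT work is caused by PUTs,
    # so its IO is amortized over the PUTs seen in the interval.
    per_put = (put_io + flush_io + compact_io) / puts if puts else 0.0
    return IOProfile(per_get, per_put)


def required_io(profile, get_rate, put_rate):
    """IO/s needed to sustain get_rate GET/s and put_rate PUT/s."""
    return get_rate * profile.per_get + put_rate * profile.per_put
```

For example, with 100 GETs consuming 100 IO units and 50 PUTs consuming 50 + 100 + 150 units (direct, FLUSH, COMPACT), the profile is 1 unit per GET and 6 per PUT, so a reservation of 100 GET/s and 50 PUT/s would need 400 IO units per second under these assumptions.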
Unpredictable IO Interference. [Figure: pct of ideal throughput (50 to 100) for 4 read and 4 write tenants, 1:1 Pure Read/Pure Write, by Read IOP Size vs. Write IOP Size (1 to 256 KB each).]
- Die-level parallelism, low-latency IOPs
- Shared-controller and bus contention
- Erase-before-write overhead
- FTL and read-modify-write garbage collection
Unpredictable IO Interference. [Figure, four panels of pct of ideal throughput (50 to 100) by Read IOP Size vs. Write IOP Size (1 to 256 KB each): 1:1 Pure Read/Pure Write, 75:25 Read/Write Ratio, 50:50 Read/Write Ratio, 25:75 Read/Write Ratio.]
Libra Underestimates IO Capacity to Ensure Provisionable Throughput. Provisionable IO throughput = floor(workloads) (18 Kop/s). [Figure: fraction of read/write experiments (0 to 1) vs. normalized IO throughput (15 to 45), with the provisionable IO limit marked. Series: 75:25, 50:50, and 25:75 Read/Write; 1:1 Pure Read/Pure Write.]
Libra Underestimates IO Capacity to Ensure Provisionable Throughput. Provisionable IO throughput = floor(workloads) (18 Kop/s). [Same figure, with the 75:25 Read/Write series split into 4K, 32K, and 256K variants.]
Libra Underestimates IO Capacity to Ensure Provisionable Throughput. Provisionable IO throughput = floor(workloads) (18 Kop/s). [Same figure, showing the 256K variants of the 75:25, 50:50, and 25:75 series alongside 1:1 Pure Read/Pure Write.]
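The rule "provisionable IO throughput = floor(workloads)" can be sketched as taking the minimum throughput observed across the measured workloads and admitting reservations only up to that limit. This is a minimal illustration, not Libra's code: the function names and the measurement values in the test are hypothetical. Only the 18 kop/s floor comes from the slide.

```python
def provisionable_limit(measured_kops):
    """Return the floor (minimum) throughput seen across all workloads.

    measured_kops maps a workload label to its normalized IO throughput.
    """
    if not measured_kops:
        raise ValueError("need at least one workload measurement")
    return min(measured_kops.values())


def admits(reservations_kops, limit_kops):
    """True if the summed tenant reservations fit under the limit."""
    return sum(reservations_kops) <= limit_kops
```

Provisioning against the floor trades peak throughput for predictability: any workload mix that performs better than the worst case leaves headroom that a work-conserving scheduler can still hand out.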
Non-linear IO Performance. [Figure, two panels: IO Bandwidth (MB/s, with Max BW marked) and IOP Throughput (kop/s, with Max IOP/s marked), each vs. IOP Size (1 to 256 KB).]
Non-linear IO Performance. [Same figure, with series Read Rand, Read Seq, Write Rand, and Write Seq.]
Libra Uses Virtual IOPs to Model IO Resources

$$\mathrm{VOP}_{\mathrm{CPB}}(\text{IOP-size}) = \frac{\text{Max-IOP}}{\text{Achieved-IOP}(\text{IOP-size}) \times \text{IOP-size}}$$

Libra IO Cost Model:
- Unifies IO cost into a single metric
- Captures non-linear IO performance
- Provides IO insulation
[Figure: Virtual IOP Cost (op/KB, 0 to 3.5) vs. IOP Size (1 to 256 KB), with Read IO cost and Write IO cost curves. IOP throughput at 1/2 Max VOPs for 2 equal-allocation tenants: IO Insulation = 1/2 Max Read/Write.]
Libra Uses Virtual IOPs to Model IO Resources. [Same figure, adding a second panel: IOPs (kop/s, 0 to 40) vs. IOP Size (1 to 256 KB), comparing the Libra Read IO Model and Libra Write IO Model against Max Read and Max Write.]
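To make the VOP cost formula concrete, here is a small Python sketch that evaluates it. The function names are illustrative and the device numbers in the usage note are hypothetical, not measurements from the talk.

```python
def vop_cost_per_kb(max_iops, achieved_iops, iop_size_kb):
    """VOP cost per KB: Max-IOP / (Achieved-IOP(size) * size)."""
    if achieved_iops <= 0 or iop_size_kb <= 0:
        raise ValueError("achieved_iops and iop_size_kb must be positive")
    return max_iops / (achieved_iops * iop_size_kb)


def vop_cost(max_iops, achieved_iops, iop_size_kb):
    """VOPs charged for one IOP of the given size (cost per KB * size)."""
    return vop_cost_per_kb(max_iops, achieved_iops, iop_size_kb) * iop_size_kb
```

Suppose a device peaks at 40,000 IOP/s with 1 KB operations and achieves 4,000 IOP/s at 64 KB. A 1 KB operation then costs 1 VOP, while a 64 KB operation costs 0.15625 VOP per KB, or 10 VOPs in total. The larger operation is cheaper per KB but more expensive per operation, which is the non-linearity the model is meant to capture.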
Libra Design. [Diagram: Persistence Engine, Libra IO Scheduler, Libra Provisioning Policy, Physical Disk.]
- Charge tenant IOPs based on VOP cost.
- Track app-request VOP consumption.
- Provision VOPs within provisionable limit.
- Update tenant VOP allocations.
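The idea of charging tenant IOPs by VOP cost inside a DRR (deficit round robin) scheduler can be sketched as follows. This is a textbook DRR loop under stated assumptions, not Libra's scheduler: the class `VopDrrScheduler`, its methods, and the per-round quantum standing in for a tenant's VOP allocation are all hypothetical.

```python
from collections import deque


class VopDrrScheduler:
    """Deficit round robin where each IO is charged its VOP cost."""

    def __init__(self, quanta):
        # quanta: tenant -> VOPs credited per round (stands in for
        # the tenant's VOP allocation; an assumption of this sketch).
        self.quanta = dict(quanta)
        self.deficit = {t: 0.0 for t in quanta}
        self.queues = {t: deque() for t in quanta}

    def submit(self, tenant, io_name, vop_cost):
        """Queue one IO operation with its precomputed VOP cost."""
        self.queues[tenant].append((io_name, vop_cost))

    def run_round(self):
        """Visit each tenant once; return the IOs dispatched this round."""
        dispatched = []
        for tenant, queue in self.queues.items():
            if not queue:
                self.deficit[tenant] = 0.0  # idle tenants bank no credit
                continue
            self.deficit[tenant] += self.quanta[tenant]
            # Dispatch while the head-of-line IO fits in the deficit.
            while queue and queue[0][1] <= self.deficit[tenant]:
                io_name, cost = queue.popleft()
                self.deficit[tenant] -= cost
                dispatched.append((tenant, io_name))
            if not queue:
                self.deficit[tenant] = 0.0
        return dispatched
```

With equal quanta of 10 VOPs, a tenant issuing three 4-VOP operations gets two of them through in the first round, while a tenant with a single 15-VOP operation waits until its deficit has accumulated over two rounds. Expensive IOs are delayed rather than allowed to crowd out cheaper ones.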
Evaluation
• Does Libra's IO resource model achieve accurate resource allocations?
• Does Libra's IO threshold make an acceptable tradeoff of performance for predictability in a real storage stack?
• Can Libra ensure per-tenant app-request reservations while achieving high utilization?
Libra Achieves Accurate IO Allocations. [Figure: read-write IOP throughput ratio (0 to 1.2) for read tenants and write tenants, with 1 KB reads running against writes of 1 KB to 256 KB, compared with the interference-free ideal.] Throughput Ratio = Actual / Expected (IO Insulation).