HotStorage '20, July 13-14, 2020
SplitKV: Splitting IO Paths for Different Sized Key-Value Items with Advanced Storage Devices
Shukai Han, Dejun Jiang, Jin Xiong
Institute of Computing Technology, Chinese Academy of Sciences
University of Chinese Academy of Sciences

Outline
✓ Background & Motivation
• Design
• Evaluation
• Conclusion

Key-Value Store
• Key-Value (KV) stores are widely deployed in data centers.
• The sizes of KV items vary from a couple of bytes to hundreds of kilobytes.
– Facebook's analysis of Memcached workloads found that more than 80% of requests are less than 500B in size [1].
– The workload data on a typical day in Baidu: over 90% of requests are over 128KB in size [2].
[1] Atikoglu, SIGMETRICS '12
[2] Lai, MSST '15

Conventional Storage Device based KV Store
Log-Structured Merge (LSM) tree based KV store:
1. write: incoming writes go to the write buffer in DRAM
2. flush: the write buffer is flushed to tables on the SSD
3. compaction: tables are compacted from Level 0 through Level 1 ... Level n
Conventional storage devices:
• Block access
• Low random access performance
The Log-Structured Merge tree is widely adopted in KV stores to convert random writes to sequential writes.
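The write, flush and compaction steps on this slide can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions: the class name `TinyLSM`, the `buffer_limit` parameter and the two-level layout are hypothetical, tables are Python lists rather than files, and real LSM stores such as RocksDB are far more involved.

```python
class TinyLSM:
    """Toy LSM tree: a write buffer plus two levels of sorted tables."""

    def __init__(self, buffer_limit=2):
        self.buffer = {}                  # write buffer (DRAM on the slide)
        self.buffer_limit = buffer_limit  # hypothetical flush trigger
        self.level0 = []                  # sorted tables, oldest first
        self.level1 = []                  # one merged sorted table

    def put(self, key, value):
        self.buffer[key] = value          # 1. write
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        # 2. flush: the buffer becomes one sorted table, written sequentially
        if self.buffer:
            self.level0.append(sorted(self.buffer.items()))
            self.buffer = {}

    def compact(self):
        # 3. compaction: merge Level 0 into Level 1, newer values win
        merged = dict(self.level1)
        for table in self.level0:         # oldest to newest
            merged.update(table)
        self.level1 = sorted(merged.items())
        self.level0 = []

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        for table in reversed(self.level0):   # newest table first
            for k, v in table:
                if k == key:
                    return v
        for k, v in self.level1:
            if k == key:
                return v
        return None


db = TinyLSM(buffer_limit=2)
db.put("a", 1)
db.put("b", 2)   # buffer full: flushed to Level 0
db.put("a", 3)
db.put("c", 4)   # second flush
db.compact()
```

Random `put` calls only ever touch the in-memory buffer; data reaches the tables in sorted batches, which is the random-to-sequential conversion the slide refers to.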

Advanced Storage Device based KV Store
• KVell [3] builds a low-CPU-overhead key-value store on modern SSDs.
• Other work [4] exploits the low latency of PM by building a persistent write buffer to reduce logging overhead.
[Figure: writes go to a persistent write buffer in PM and are flushed to the SSD store. Devices shown: Optane SSD, Optane DC PMM.]
Advanced storage devices:
• PM: byte access
• SSD: block access
• High random access performance
[3] Lepers, SOSP'19
[4] Kannan, ATC'18

Motivation

Random Write       64B    256B   1KB    4KB    16KB   64KB   256KB   1MB   4MB    16MB
Optane SSD P3700   14.09  14.09  14.09  14.09  21.44  45.79  145.58  532   2091   8223
Optane DC PMM      0.18   0.20   0.43   1.05   3.90   15.50  61.88   247   1440   6840
Ratio              79.2   70.5   33.0   13.4   5.5    2.9    2.4     2.2   1.45   1.2

• PM is friendly to small KV items
• NVM based SSD is friendly to large KV items without suffering from random access cost
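The Ratio row appears to be the Optane SSD figure divided by the Optane DC PMM figure at each access size. A short sketch that recomputes it from the numbers on the slide follows; the variable names are illustrative, and the recomputed values agree closely with the slide, with small differences at the smallest sizes that are presumably due to rounding of the displayed inputs.

```python
sizes = ["64B", "256B", "1KB", "4KB", "16KB", "64KB", "256KB", "1MB", "4MB", "16MB"]
ssd = [14.09, 14.09, 14.09, 14.09, 21.44, 45.79, 145.58, 532, 2091, 8223]
pmm = [0.18, 0.20, 0.43, 1.05, 3.90, 15.50, 61.88, 247, 1440, 6840]

# Ratio = SSD value / PMM value at the same access size
ratios = [s / p for s, p in zip(ssd, pmm)]

for size, ratio in zip(sizes, ratios):
    print(f"{size:>6}: {ratio:.2f}")
```

The ratio falls steadily as the access size grows, which is the observation behind the two bullets: the advantage of PM is large for small items and nearly gone for multi-megabyte items.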

Outline
• Background & Motivation
✓ Design
• Evaluation
• Conclusion

SplitKV Overview
Key idea: split the IO path for small and large KV items.
[Figure: incoming KV items are split by size. Large KV items are written directly to the NVM based SSD (ust_4KB, ust_16KB, …). Small KV items go to the small KV items store in persistent memory, which also holds the global index, and reach the SSD by batch write.]

SplitKV Overview
Reclaim PM space
[Figure: small KV items in the small KV items store (persistent memory) are selected and sorted, then flushed to sort tables (st) st_1, st_2, st_3, … on the NVM based SSD, next to ust_4KB and ust_16KB. The global index stays in persistent memory.]

SplitKV Overview
Global index [5]: B+Tree (FAST-FAIR)
[Figure: the global index in persistent memory indexes the small KV items store as well as the tables on the NVM based SSD (st_1, st_2, st_3, …, ust_4KB, ust_16KB).]
[5] Hwang, FAST'18

Design challenges
Challenge 1: How to decide the size boundary of KV items?
Challenge 2: How to handle the migration of small items?
[Figure: KV items are split into small KV items (persistent memory) and large KV items (NVMe SSD).]

Size Boundary of KV Items
• IO Path 1: the KV item is written to PM and then migrated to SSD by a background thread.
• IO Path 2: the KV item is written directly to SSD.

Write latencies (µs) of the two IO paths:

Access size   256B   1KB    4KB    16KB
IO Path 1     1.5    4.5    15.7   27.6
IO Path 2     23.4   25.4   14.8   21.3
Ratio         15.8   5.7    0.9    0.8

• When the KV item is large, the data is written directly to the SSD for better performance.
• Any KV pair whose size is equal to or greater than 4 KB is considered a large one.
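The boundary rule can be sketched as a small routing function. This is a minimal illustration, not SplitKV's code: the function name, the returned labels, and the choice to measure size as key length plus value length are assumptions made here; only the 4 KB threshold comes from the slide.

```python
LARGE_KV_BOUNDARY = 4 * 1024  # bytes; the 4 KB boundary stated on the slide


def choose_io_path(key: bytes, value: bytes) -> str:
    """Pick an IO path for one KV pair.

    Assumption: the pair's size is len(key) + len(value).
    """
    size = len(key) + len(value)
    if size >= LARGE_KV_BOUNDARY:
        return "ssd_direct"        # IO Path 2: write directly to SSD
    return "pm_then_migrate"       # IO Path 1: write to PM, migrate later


small = choose_io_path(b"user:1", b"x" * 256)
large = choose_io_path(b"user:2", b"x" * 16384)
```

The latency table shows why the boundary sits at 4 KB: at 256B and 1KB the PM path is several times faster, while at 4KB and 16KB the direct SSD path is already the faster one.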

Hotness-aware KV Migration
Step 1: items in PM, average weight = 3
  Key:2 (Weight:5), Key:4 (Weight:2), Key:5 (Weight:3), Key:3 (Weight:4), Key:6 (Weight:3), Key:1 (Weight:1)
  select and flush as a batch to a sort table (st): Key:1 (Weight:1), Key:4 (Weight:2), Key:5 (Weight:3), Key:6 (Weight:3)
Step 2: remaining items in PM, average weight = 1.5
  Key:2 (Weight:2), Key:3 (Weight:1)
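One selection round of the example on this slide can be reproduced with a few lines of Python. The rules used here are inferred from the slide's numbers and are assumptions, not a statement of SplitKV's actual policy: items whose weight is at most the average are treated as cold and flushed in key order, and the weight of each remaining item is reduced by that average. The function name `select_cold_items` is hypothetical.

```python
def select_cold_items(weights):
    """Run one migration round over {key: weight}.

    Returns (average, flush_batch, remaining). Assumed rules:
    weight <= average means cold; survivors lose `average` weight.
    """
    if not weights:
        return 0, [], {}
    average = sum(weights.values()) / len(weights)
    # Cold items, sorted by key so they can be written as a sort table
    flush_batch = sorted((k, w) for k, w in weights.items() if w <= average)
    # Hot items stay in PM with decayed weights
    remaining = {k: w - average for k, w in weights.items() if w > average}
    return average, flush_batch, remaining


# Weights from step 1 of the slide
pm_items = {2: 5, 4: 2, 5: 3, 3: 4, 6: 3, 1: 1}
average, flush_batch, remaining = select_cold_items(pm_items)
```

With the step 1 weights, the average is 3, keys 1, 4, 5 and 6 are flushed, and keys 2 and 3 remain with weights 2 and 1, matching the average of 1.5 shown for step 2.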

Outline
• Background & Motivation
• Design
✓ Evaluation
• Conclusion

Experiment Setup
• System and hardware configuration
– Server equipped with two Intel Xeon Gold 5215 CPUs (2.5 GHz)
– 64GB memory, one Intel Optane SSD P4800 and one Intel Optane DC PMM
– CentOS Linux release 7.6.1810 with 4.18.8 kernel
• Compared systems
– RocksDB, NoveLSM [4], KVell [3]
• Workload
– YCSB with Zipfian and uniform skew
– Each workload handles 128 GB data set
– 50% of the KV items are 256B/4KB in size

Workload   Description
A          50% reads and 50% updates
B          95% reads and 5% updates
C          100% reads
D          95% reads for latest keys and 5% inserts
E          95% scan and 5% inserts
F          50% reads and 50% read-modify-writes

[3] Lepers, SOSP'19
[4] Kannan, ATC'18

Average Latency with Single Thread (Zipfian)

Zipfian    A      B      C      D      E       F
NoveLSM    48.35  34.89  30.52  32.28  445.83  72.57
RocksDB    17.47  21.82  21.72  21.13  497.02  35.19
KVell      11.76  8.60   8.64   9.20   609.38  14.12
SplitKV    3.81   4.65   4.56   4.56   306.65  5.05

For workloads A and F, SplitKV reduces latency by up to 14.4x, 6.9x, and 3.1x compared to NoveLSM, RocksDB and KVell, respectively, under Zipfian workloads.
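The quoted reduction factors can be checked against the Zipfian latency table for workloads A and F. The sketch below divides each system's latency by SplitKV's; the dictionary layout and function name are illustrative only.

```python
latency = {
    "NoveLSM": {"A": 48.35, "F": 72.57},
    "RocksDB": {"A": 17.47, "F": 35.19},
    "KVell":   {"A": 11.76, "F": 14.12},
    "SplitKV": {"A": 3.81,  "F": 5.05},
}


def speedups(table, target="SplitKV"):
    """Latency of each other system divided by the target's latency."""
    base = table[target]
    return {
        name: {wl: row[wl] / base[wl] for wl in base}
        for name, row in table.items()
        if name != target
    }


per_workload = speedups(latency)
# Largest factor per system across workloads A and F
best = {name: max(factors.values()) for name, factors in per_workload.items()}

for name, factor in best.items():
    print(f"{name}: up to {factor:.2f}x")
```

The maxima come from workload F for NoveLSM and RocksDB and from workload A for KVell, so the slide's three figures are best cases rather than values that hold for both workloads.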

Average Latency with Single Thread (Zipfian)
For read-intensive workloads B, C and D, SplitKV and KVell achieve better performance than NoveLSM and RocksDB due to the adoption of a global B+-Tree index.

Average Latency with Single Thread (Zipfian)
For workload E, KVell does not sort small KV items on the SSD. Serving a scan query therefore requires reading many blocks, which introduces read amplification.

Average Latency with Single Thread (Zipfian vs. Uniform)

Zipfian    A      B      C      D      E       F
NoveLSM    48.35  34.89  30.52  32.28  445.83  72.57
RocksDB    17.47  21.82  21.72  21.13  497.02  35.19
KVell      11.76  8.60   8.64   9.20   609.38  14.12
SplitKV    3.81   4.65   4.56   4.56   306.65  5.05

Uniform    A      B      C      D      E       F
NoveLSM    96.69  69.77  61.04  64.56  476.19  145.14
RocksDB    21.11  26.13  26.08  25.89  529.10  43.27
KVell      17.86  14.02  13.31  13.80  670.69  23.09
SplitKV    8.81   12.78  12.77  9.22   346.02  13.87

Note that it is difficult for the hotness-aware migration policy to identify cold items under uniform workloads.

Throughput in YCSB with Four Threads
[Figure: two bar charts of normalized throughput (Norm. Throughput) for RocksDB, KVell and SplitKV across workloads A-F; the charts are annotated with 3.5X and 7.9X.]

Outline
• Background & Motivation
• Design
• Evaluation
✓ Conclusion

Conclusion
• Modern NVMe SSDs and persistent memory exhibit different access characteristics when serving small and large data.
• We propose SplitKV, which provides different IO paths for different sized KV items when building KV stores on such advanced storage devices.
• The throughput of SplitKV is up to 7.9 times that of other KV stores under Zipfian load skew.

Thank you! Q & A
Author email: hanshukai@ict.ac.cn
