PinK: High-speed In-storage Key-value Store with Bounded Tails




1. PinK: High-speed In-storage Key-value Store with Bounded Tails
   Junsu Im, Jinwook Bae, Chanwoo Chung*, Arvind*, and Sungjin Lee
   Daegu Gyeongbuk Institute of Science & Technology (DGIST)
   *Massachusetts Institute of Technology (MIT)
   DATA-INTENSIVE COMPUTING SYSTEMS LABORATORY
   2020 USENIX Annual Technical Conference (ATC'20, July 15 ~ 17)

2. Key-Value Store is Everywhere!
   - Key-value stores (KVS) have become a necessary infrastructure for web indexing, caching, and storage systems.
   - Algorithms: SILK (ATC'19), Dostoevsky (SIGMOD'18), Monkey (SIGMOD'17), ...
   - Systems: FlashStore (VLDB'10), WiscKey (FAST'16), LOCS (EuroSys'14), ...
   - Architecture: BlueCache (VLDB'16), ...

3. Key-Value (KV) Storage Device
   - A KV-SSD offloads KVS functionality into the device: the host-side KVS engine and block device driver are replaced by a key-value interface exposed directly by the SSD.
   - Benefits: fewer host resources, low latency, high throughput.
   [Figure: host running a KVS engine over a block-SSD vs. a KV-SSD with the KVS engine inside the device]

4. Key-Value (KV) Storage Device (Cont.)
   - Academia: LightStore (ASPLOS'19), KV-SSD (SYSTOR'19), iLSM-SSD (MASCOTS'19), KAML (HPCA'17), NVMKV (ATC'15), BlueCache (VLDB'16), ...
   - Industry: Samsung's KV-SSD

5. Key Challenges of Designing KV-SSD
   - 1. Limited DRAM resource
     - SSDs usually have DRAM equal to only ~0.1% of NAND capacity for indexing.
     - A block device indexes 4KB logical blocks, but KV pairs average only 1KB, so a KV-SSD must index roughly 4x more objects with the same DRAM.
     - DRAM density also scales more slowly than NAND: 1.13x/year vs. 1.43x/year.
       (Technology and Cost Trends at Advanced Nodes, 2020, https://semiwiki.com/wp-content/uploads/2020/03/Lithovision-2020.pdf)

6. Key Challenges of Designing KV-SSD (Cont.)
   - 2. Limited CPU performance
     - SSDs have low-power (ARM-based) CPUs, far slower than host x86 CPUs.
   - Which algorithm is better for a KV-SSD under these limitations: hash or log-structured merge-tree (LSM-tree)?

7. Experiments using Hash-based KV-SSD
   - Samsung KV-SSD prototype (KV-PM983): a hash-based KV-SSD
   - Benchmark
     - KV-SSD: KVBench (Samsung's KV-SSD benchmark tool), 32B-key / 1KB-value read requests
     - Block-SSD: FIO, 1KB read requests
   - Result: the hash-based KV-SSD shows long tail latency and a performance drop relative to the block-SSD. What is the reason?
   [Figure: throughput and latency comparison between the block-SSD and the hash-based KV-SSD]

8. Problem of Hash-based KV-SSD
   - Setup: 4TB SSD with 4GB DRAM; 32B keys, 1KB values (about 4 billion KV pairs).
   - KAML (HPCA'17)-style buckets store the full key (32B) plus a pointer to the value (4B): the index needs 144GB >> 4GB of DRAM.
   - FlashStore (VLDB'10)-style buckets store a signature (2B) plus a pointer to the KV pair (4B): the index still needs 24GB > 4GB of DRAM.
   - Either way, the hash index cannot fit in device DRAM; see the arithmetic sketch below.
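To make the slide's numbers concrete, here is a minimal back-of-envelope check in C. The pair count and per-entry sizes come straight from the slide; nothing here is PinK or vendor code.

```c
#include <stdio.h>

int main(void) {
    // 4 TB of NAND / 1 KB average KV pair ~= 4 billion pairs (slide figures).
    unsigned long long pairs = (4ULL << 40) / 1024;

    // KAML-style entry: full 32 B key + 4 B pointer to the value.
    unsigned long long kaml_bytes       = pairs * (32 + 4);
    // FlashStore-style entry: 2 B signature + 4 B pointer to the KV pair.
    unsigned long long flashstore_bytes = pairs * (2 + 4);

    printf("KAML-style index:       %llu GB\n", kaml_bytes >> 30);       // 144 GB
    printf("FlashStore-style index: %llu GB\n", flashstore_bytes >> 30); //  24 GB
    return 0;
}
```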

9. Problem of Hash-based KV-SSD (Cont.)
   - Since the hash table cannot fit in DRAM, only an LRU cache of hash buckets is kept there; the full buckets live in flash.
   - Get(key 7): on a cache miss, the bucket must first be fetched from flash (performance drop).
   - Signature collisions make it worse: a matching 2B signature may belong to a different key, so probing reads other KV pairs from flash ("key is not 7") before the right one is found (long tail latency).
   [Figure: Get(key 7) hashing to bucket 10; cached buckets in DRAM, in-flash hash buckets, and collision probing across entries sharing signature 1000]
   - A sketch of this lookup path follows.
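A minimal sketch of the signature-probing lookup the slide describes, assuming a 4-slot bucket layout and a hypothetical flash_read_kv helper (neither is from the actual firmware):

```c
#include <stdbool.h>
#include <stdint.h>

#define SLOTS_PER_BUCKET 4

typedef struct {
    uint16_t sig[SLOTS_PER_BUCKET];  // 2 B signature per KV pair
    uint32_t ptr[SLOTS_PER_BUCKET];  // 4 B flash pointer to the full KV pair
} bucket_t;

// Hypothetical helper: one flash access that fetches the KV pair at ptr and
// returns true only if its full key matches `key`.
extern bool flash_read_kv(uint32_t ptr, const char *key, char *value_out);

bool bucket_get(const bucket_t *b, uint16_t sig, const char *key, char *value_out) {
    for (int i = 0; i < SLOTS_PER_BUCKET; i++) {
        if (b->sig[i] != sig)
            continue;                 // cheap in-DRAM filter
        // A signature match may still be a collision, so the full key must
        // be verified against flash. Every false match costs one extra
        // flash read: exactly the tail-latency source on the slide.
        if (flash_read_kv(b->ptr[i], key, value_out))
            return true;
    }
    return false;                     // key not in this bucket
}
```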

10. LSM-tree?
   - Another option: the LSM-tree
     - Low DRAM requirement
     - No collisions
     - Easy to serve range queries
   - Is the LSM-tree really good enough?

11. Problem of LSM-tree-based KV-SSD
   - 1. Long tail latency! In the worst case, h-1 flash accesses for one KV pair (h = height of the LSM-tree).
   - Get(key 7): each level's Bloom filter is checked, and every "pass" costs a flash access to that level's indices, but a pass may be a false positive.
   - In the example, L1 passes (false positive), L2 passes (false positive), and only at a lower level is key 7 finally found: several flash reads for a single GET.
   [Figure: Get(key 7) walking from the L0 memtable through per-level Bloom filters and in-flash indices]
   - The sketch below shows this worst-case path.
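A minimal sketch of the level-by-level GET path, with hypothetical helpers (bloom_may_contain, flash_search_level) standing in for the real per-level machinery; this is illustrative, not the paper's code:

```c
#include <stdbool.h>

typedef struct level level_t;  // opaque per-level state (filter + indices)

// Hypothetical helpers for one level's Bloom filter and in-flash index.
extern bool bloom_may_contain(const level_t *lvl, const char *key);
extern bool flash_search_level(const level_t *lvl, const char *key, char *val);

bool lsm_get(level_t *levels[], int h, const char *key, char *val) {
    // L0 is the in-DRAM memtable, assumed to have been checked already.
    for (int i = 1; i < h; i++) {
        if (!bloom_may_contain(levels[i], key))
            continue;                 // definite miss: no flash I/O
        // A "pass" is either a hit or a false positive; both cost a flash
        // access, so a GET can pay up to h-1 flash reads in the worst case.
        if (flash_search_level(levels[i], key, val))
            return true;
    }
    return false;
}
```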

12. Problem of LSM-tree-based KV-SSD (Cont.)
   - 2. CPU overhead!
     - Merge sort during compaction and building Bloom filters are expensive on a low-power ARM CPU.
   - 3. I/O overhead!
     - Compaction adds extra I/O: level N and level N+1 are read, merged, and written back as a new level N+1 (see the merge sketch below).
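For reference, the merge at the heart of compaction, sketched over plain integer arrays standing in for the real sorted meta entries:

```c
#include <stddef.h>

// Merge two sorted runs (level N and level N+1) into a new level N+1.
size_t merge_runs(const int *a, size_t na, const int *b, size_t nb, int *out) {
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)          // the key-comparison loop is the hot
        out[k++] = (a[i] <= b[j])     // path that PinK later offloads to a
                 ? a[i++] : b[j++];   // hardware comparator (slide 22)
    while (i < na) out[k++] = a[i++]; // drain the leftovers
    while (j < nb) out[k++] = b[j++];
    return k;                         // length of the merged run
}
```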

13. Experiments using LSM-tree-based KV-SSD
   - LightStore (ASPLOS'19): an LSM-tree-based KV-SSD with key-value separation (WiscKey, FAST'16) and Bloom filters (Monkey, SIGMOD'17)
   - Benchmark: YCSB-LOAD and YCSB-C (read only), 32B keys and 1KB values
   - Results: long tail latency on YCSB-C, and the compaction time-breakdown shows where the CPU time goes.
   [Figure: YCSB-C tail latency and compaction time-breakdown]

14. PinK: New LSM-tree-based KV-SSD
   - Long tail latency? Use "level pinning": the top levels of the LSM-tree (L0-L2 in the figure) are pinned in DRAM, leaving only the last level (L3) in flash.
   - CPU overhead? "No Bloom filter" at all, plus a "HW accelerator" for compaction.
   - I/O overhead? Level pinning also reduces compaction I/O, and GC is optimized by reinserting valid data into the LSM-tree.
   [Figure: before/after level placement across DRAM and flash]

15. Outline
   - Introduction
   - PinK
     - Overview of LSM-tree in PinK
     - Bounding tail latency
     - Memory requirement
     - Reducing search overhead
     - Reducing compaction I/O
     - Reducing sorting time
   - Experiments
   - Conclusion

16. Overview of LSM-tree in PinK
   - PinK is based on a key-value-separated LSM-tree.
   - DRAM: a skiplist buffers incoming KV pairs (level 0); each level L1..Lh-1 has a level list, a sorted array of (start key, address pointer) entries.
   - Flash: a meta segment area holds sorted keys with pointers to KV pairs; a data segment area holds the KV pairs themselves.
   [Figure: skiplist and per-level level lists in DRAM pointing to meta segments, which in turn point into data segments in flash]
   - A rough sketch of these structures follows.
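A rough data-layout sketch of the structures named on the slide; the field sizes follow the 32B-key / 4B-pointer figures used elsewhere in the talk, but the exact layout is an assumption, not PinK's definition:

```c
#include <stdint.h>

#define KEY_LEN 32

typedef struct {                 // one entry of an in-DRAM level list:
    uint8_t  start_key[KEY_LEN]; // smallest key covered by the meta segment
    uint32_t meta_addr;          // flash address of that meta segment
} level_entry_t;

typedef struct {                 // one entry inside an in-flash meta segment:
    uint8_t  key[KEY_LEN];       // full key, kept sorted within the segment
    uint32_t kv_addr;            // pointer into the data segment area
} meta_entry_t;
```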

17. Bounding Tail Latency
   - LSM-tree with Bloom filters (5 levels): a GET may pass filters falsely at every level, so the worst case is 4 flash accesses.
   - PinK: binary search over the in-DRAM level lists, with the upper levels pinned, pinpoints the single meta segment to read; the worst case is 1 flash access.
   - But what is the memory usage?
   [Figure: side-by-side GET paths of a Bloom-filter LSM-tree and PinK]
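A sketch of this bounded GET path: levels L1..Lk have their meta segments pinned in DRAM, so only the last, unpinned level can require a flash read. The helper names are assumptions, not PinK's actual API:

```c
#include <stdbool.h>
#include <stdint.h>

extern int  h;        // total number of levels (5 on the slide)
extern int  pinned;   // levels whose meta segments are pinned in DRAM

// Hypothetical per-level lookups: both binary-search the in-DRAM level
// list; only the flash variant pays a flash read for the meta segment.
extern bool dram_level_lookup(int lvl, const uint8_t *key, uint32_t *kv_addr);
extern bool flash_meta_lookup(int lvl, const uint8_t *key, uint32_t *kv_addr);

bool pink_get(const uint8_t *key, uint32_t *kv_addr) {
    // The L0 skiplist is assumed to have been checked already.
    for (int lvl = 1; lvl < h; lvl++) {
        bool hit = (lvl <= pinned)
            ? dram_level_lookup(lvl, key, kv_addr)   // pure DRAM, no I/O
            : flash_meta_lookup(lvl, key, kv_addr);  // the one flash access
        if (hit)
            return true;
    }
    return false;
}
```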

18. Memory Requirement
   - 4TB SSD, 4GB DRAM (32B keys, 1KB values); 5 levels in total.
   - Skiplist (L0): 8MB. Level lists: 432MB.
   - Pinning meta segments, cumulatively: 1 level: 1.47MB; 2 levels: 68MB; 3 levels: 3.1GB; 4 levels: 144GB.
   - Pinning 3 levels fits: 8MB + 432MB + 3.1GB ≈ 3.5GB < 4GB, leaving only the last level in flash, so indexing needs only one flash access.
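A small check of these numbers: the slide's per-level footprints imply that each level is roughly 46x larger than the one above it (68MB / 1.47MB ≈ 46, and so on). That growth factor is an inference from the slide, not a stated parameter:

```c
#include <stdio.h>

int main(void) {
    const double growth = 46.0;  // inferred from the slide's numbers
    double level_mb = 1.47;      // L1 meta-segment footprint from the slide
    double cum_mb = 0.0;
    for (int k = 1; k <= 4; k++) {
        cum_mb += level_mb;
        printf("pin %d level(s): %10.1f MB\n", k, cum_mb);
        level_mb *= growth;      // each level ~46x larger than the one above
    }
    // Prints roughly 1.5 MB, 69 MB, 3.2 GB, 146 GB: the slide's
    // 1.47MB / 68MB / 3.1GB / 144GB within rounding.
    return 0;
}
```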

19. Reducing Search Overhead
   - Naively, a GET binary-searches every level list in full; with level sizes growing by a factor of T over h levels, this costs O(h^2 log T). Burdensome!
   - Fractional cascading: each level-list entry keeps a range pointer into the next level, so each lower level is binary-searched only within the overlapped range of ~T entries, cutting the search complexity to O(h log T).
   - Even so, repeated binary searches over long keys remain costly, which the next slide addresses.
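A sketch of the range-pointer walk under an assumed entry layout (the next_lo/next_hi fields and names are illustrative):

```c
#include <stdint.h>
#include <string.h>

#define KEY_LEN 32

typedef struct {
    uint8_t  start_key[KEY_LEN];
    uint32_t meta_addr;           // flash address of this run's meta segment
    uint32_t next_lo, next_hi;    // range pointer: window in the next level's list
} centry_t;

// Last entry in [lo, hi) whose start_key <= key (assumes one exists).
static uint32_t find_run(const centry_t *list, uint32_t lo, uint32_t hi,
                         const uint8_t *key) {
    uint32_t ans = lo;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (memcmp(list[mid].start_key, key, KEY_LEN) <= 0) { ans = mid; lo = mid + 1; }
        else hi = mid;
    }
    return ans;
}

// Only level 1 is searched in full; each deeper level is searched inside
// the ~T-entry window named by the range pointer, so the total work drops
// from O(h^2 log T) to O(h log T).
uint32_t cascade_find(centry_t *levels[], uint32_t len[], int target,
                      const uint8_t *key) {
    uint32_t idx = find_run(levels[1], 0, len[1], key);
    for (int lvl = 2; lvl <= target; lvl++) {
        const centry_t *e = &levels[lvl - 1][idx];
        idx = find_run(levels[lvl], e->next_lo, e->next_hi, key);
    }
    return idx;   // candidate run position at the target level
}
```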

20. Reducing Search Overhead (Cont.)
   - Prefix optimization: store a 4B prefix of each 32B key next to its pointer (4B) in the level list.
   - Binary search runs over the compact prefixes first (less compare overhead, cache-efficient search); full keys are compared only among entries sharing the same prefix.
   - Memory usage of the prefixes and range pointers: about 10% of the level list.
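A sketch of the two-stage search: a packed array of 4B prefixes is searched first with cheap integer compares, and the 32B key array is touched only near the answer. The layout mirrors the slide's field sizes; the code itself is illustrative:

```c
#include <stdint.h>
#include <string.h>

#define KEY_LEN 32

static uint32_t load_prefix(const uint8_t *key) {
    // Big-endian load so integer order matches memcmp order on the key.
    return ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
           ((uint32_t)key[2] << 8)  |  (uint32_t)key[3];
}

// prefix[i] mirrors keys[i]; both arrays are sorted by key.
long prefix_find(const uint32_t *prefix, const uint8_t (*keys)[KEY_LEN],
                 size_t n, const uint8_t *key) {
    uint32_t p = load_prefix(key);
    size_t lo = 0, hi = n;
    while (lo < hi) {                 // stage 1: search 4 B prefixes only
        size_t mid = lo + (hi - lo) / 2;
        if (prefix[mid] < p) lo = mid + 1; else hi = mid;
    }
    // Stage 2: scan the (typically short) run of equal prefixes with full
    // 32 B key compares.
    for (size_t i = lo; i < n && prefix[i] == p; i++)
        if (memcmp(keys[i], key, KEY_LEN) == 0) return (long)i;
    return -1;                        // not found
}
```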

21. Reducing Compaction I/O
   - PinK without level pinning: when a level becomes full, its meta segments must be read from flash, merged, written back, and the level list updated: 6 reads & 6 writes in the example. Burdensome!
   - PinK with level pinning: both levels' meta segments are already in DRAM, so compaction just merges them and updates the level list: no flash reads or writes.
   - The SSD's power-loss capacitor makes the DRAM-pinned levels safe to keep off flash.
   [Figure: the same compaction shown with and without level pinning]
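The contrast, sketched on top of the merge_runs helper from slide 12; the flash helpers and the calling convention are assumptions for illustration:

```c
#include <stdbool.h>
#include <stddef.h>

// From the merge sketch on slide 12.
extern size_t merge_runs(const int *a, size_t na, const int *b, size_t nb, int *out);

// Hypothetical flash I/O for one level's meta entries.
extern size_t flash_read_meta(int lvl, int *keys);
extern void   flash_write_meta(int lvl, const int *keys, size_t n);

// Compact the upper level's entries (already in DRAM) into level `lvl`.
void compact_into(int lvl, bool lvl_is_pinned,
                  const int *upper, size_t nu,   // level N meta entries
                  int *lower, size_t nl,         // level N+1 meta entries
                  int *out) {
    if (!lvl_is_pinned)
        nl = flash_read_meta(lvl, lower);    // "6 reads" in the slide's example
    size_t n = merge_runs(upper, nu, lower, nl, out);
    if (lvl_is_pinned)
        return;  // merged list stays in capacitor-backed DRAM: no flash I/O
    flash_write_meta(lvl, out, n);           // "6 writes" in the slide's example
}
```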

22. Reducing Sorting Time
   - Compaction's merge sort is offloaded to a HW accelerator: a key comparator (==, >, <) in the FPGA.
   - The accelerator reads the meta segments of L_n and L_n+1 from DRAM or flash, compares keys, and writes out the new L_n+1 meta segment addresses and level list, relieving the ARM CPU.
   [Figure: dataflow of the HW key comparator between DRAM/flash and the new L_n+1 level list]

23. PinK Summary
   - Long tail latency? Using level pinning.
   - CPU overhead? Removing Bloom filters, optimizing binary search, adopting a HW accelerator.
   - I/O overhead? Reducing compaction I/O; optimizing GC by reinserting valid data into the LSM-tree.
   - Please refer to the paper for details!

24. Outline
   - Introduction
   - PinK
   - Experiments
   - Conclusion

25. Custom KV-SSD Prototype and Setup
   - All KV-SSD algorithms were implemented on a Xilinx ZCU102 board.
   - For fast experiments, the device is scaled down: 64GB SSD with 64MB DRAM (keeping DRAM at 0.1% of NAND capacity).
   - Client/server: Xeon E5-2640 (20 cores @ 2.4 GHz), 32GB DRAM, 10GbE NIC.
   - KV-SSD platform: Zynq UltraScale+ SoC (quad-core ARM Cortex-A53 with FPGA), 4GB DRAM, 10GbE, plus a custom flash expansion card (Artix7 FPGA, raw NAND flash chips, 256GB) attached via custom connectors.

26. Benchmark Setup
   - YCSB: 32B keys, 1KB values

     Workload       Load     A        B        C        D        E           F
     R:W ratio      0:100    50:50    95:5     100:0    95:5     95:5        50:50 (RMW)
     Query type     Point    Point    Point    Point    Point    Range read  Point
     Distribution   Uniform  Zipfian  Zipfian  Zipfian  Latest   Zipfian     Zipfian
                                                        (highest locality)

   - Two phases:
     - Load: issue 44M unique KV pairs (44GB, 70% of total SSD capacity).
     - Run: issue 44M KV requests following each workload's description.
