LOG-STRUCTURED MERGE-TRIE PART 1 – Xingbo Wu and Yuehai Xu, Wayne State University – PowerPoint PPT Presentation


  1. LOG-STRUCTURED MERGE-TRIE PART 1 Xingbo Wu and Yuehai Xu, Wayne State University; Zili Shao, The Hong Kong Polytechnic University; Song Jiang, Wayne State University. Presented by: Joel Friberg

  2. LSM-Trie Overview ■ KV items organized in 32MB HTables ■ Almost no index – hash based ■ Fixed-size buckets matched to 4KB disk blocks (see the mapping sketch below) ■ Linear and exponential levels in the trie (112 total) ■ 16-bit Bloom filters (5% false-positive rate achieved) ■ One disk read suffices to fetch the needed Bloom filters (BloomCluster) ■ Optimized for stores of up to 10TB https://www.researchgate.net/profile/Pasi_Fraenti/publication/321323711/figure/fig8/AS:576074708525076@1514358321712/Prefix-tree-example.png
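As a rough illustration of the hash-based, index-free layout, the sketch below maps a key to one of the fixed 4KB buckets inside a 32MB HTable by using a slice of its SHA-1 hashkey. The constant and function names are hypothetical, not taken from the paper's code, and the 16-bit slice is only one plausible choice of prefix.

# A minimal sketch, assuming a flat bucket array per HTable; not LSM-trie's actual code.
import hashlib

HTABLE_SIZE = 32 * 1024 * 1024            # 32MB HTable
BUCKET_SIZE = 4 * 1024                    # 4KB bucket, one disk block
NUM_BUCKETS = HTABLE_SIZE // BUCKET_SIZE  # 8192 buckets per HTable

def bucket_of(user_key: bytes) -> int:
    # Hash the user key first, then use a slice of the hashkey to pick a
    # bucket; no per-item index is needed to locate the 4KB block.
    hashkey = hashlib.sha1(user_key).digest()
    prefix = int.from_bytes(hashkey[:2], "big")   # 16 bits of the hashkey
    return prefix % NUM_BUCKETS                   # a bucket index in [0, 8191]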

  3. Question 1 “In the meantime, for some KV stores, such as SILT [24], major efforts are made to optimize reads by minimizing metadata size, while write performance can be compromised without conducting multi-level incremental compactions” Explain how high write amplifications are produced in SILT. ■ A single on-disk SortedStore holds all data ■ Entries in a HashStore can span a large key range ■ Merging therefore rewrites far more data than is actually being added – a large ratio of merged data to new data (see the sketch below) http://ranger.uta.edu/~sjiang/CSE6350-spring-19/lecture-7.pdf
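A back-of-the-envelope illustration of that ratio (the sizes below are made up, not taken from the SILT paper): because the single SortedStore must be rewritten whenever a HashStore is merged into it, write amplification is roughly everything rewritten divided by the new data added.

# Illustrative arithmetic only; both sizes are hypothetical.
sorted_store_gb = 100.0   # assumed size of the single on-disk SortedStore
hash_store_gb = 1.0       # assumed size of the HashStore batch being merged in
rewritten_gb = sorted_store_gb + hash_store_gb   # the merge rewrites (almost) the whole store
write_amplification = rewritten_gb / hash_store_gb
print(write_amplification)   # 101.0 -> each 1GB of new data costs ~101GB of disk writes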

  4. Question 2 “Note that LSM-trie uses hash functions to organize its data and accordingly does not support range search.” Do FAWN and LevelDB support range search? ■ FAWN is hash based – no range search ■ LevelDB stores KV pairs in sorted order and its indices record per-block key ranges – range search is supported

  5. Question 3 Use Figure 1 to explain the difference between linear and exponential growth patterns.

  6. Question 4 “Because 4KB block is a disk access unit, it is not necessary to maintain a larger index to determine byte offset of each item in a block.” Show how a lookup with a given key is carried out in LevelDB. ■ Binary search the in-memory MemTable ■ At each level, locate the SSTables whose key range covers the key and check their Bloom filters before reading ■ Retrieve the value from the first table that holds the key (flow sketched below) http://ranger.uta.edu/~sjiang/CSE6350-spring-19/lecture-7.pdf
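The self-contained sketch below mimics that lookup flow. The MemTable and SSTables are modelled with plain Python dicts and tuples, which is not LevelDB's real on-disk format or API; it only shows the order in which structures are consulted.

# A minimal sketch of the LevelDB-style lookup path, under the assumptions above.
from typing import Dict, List, Optional, Tuple

SSTable = Tuple[bytes, bytes, Dict[bytes, bytes]]   # (min_key, max_key, items)

def get(key: bytes,
        memtable: Dict[bytes, bytes],
        levels: List[List[SSTable]]) -> Optional[bytes]:
    # 1. Check the in-memory MemTable first (it holds the newest data).
    if key in memtable:
        return memtable[key]
    # 2. Walk the on-disk levels from newest (L0) to oldest.
    for level in levels:
        # Only SSTables whose [min_key, max_key] range covers the key are
        # candidates; several may match on L0, at most one on deeper levels.
        for min_key, max_key, items in level:
            if not (min_key <= key <= max_key):
                continue
            # A real lookup would consult the table's Bloom filter here, then
            # binary search the index block to find the 4KB data block to read.
            if key in items:
                return items[key]
    return None   # key is not present anywhere in the store

# Usage: key b"b" misses the MemTable and Level 0, then is found in Level 1.
store = [
    [(b"m", b"z", {b"x": b"1"})],                 # Level 0
    [(b"a", b"f", {b"b": b"2", b"e": b"3"})],     # Level 1
]
print(get(b"b", memtable={b"q": b"0"}, levels=store))   # b'2'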

  7. Question 5 “Instead, we first apply a cryptographic hash function, such as SHA-1, on the key, and then use the hashed key, or hashkey in short, to make the determination.” Assuming a user-provided key has 160 bits, what’s the issue if LSM-trie used the user keys, instead of hashed keys, in its data structure and operations? ■ A cryptographic hash spreads keys approximately uniformly over the key space ■ Raw user keys may be skewed or clustered, leaving the trie unbalanced (demonstrated below) https://appliedgo.net/balancedtree/
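A small self-contained demonstration of that point (hypothetical code, not from the paper): sequential user keys all share a prefix and would pile into a single trie branch, while their SHA-1 hashkeys spread evenly across the branches.

import hashlib
from collections import Counter

def first_nibble(b: bytes) -> int:
    return b[0] >> 4          # top 4 bits select one of 16 trie children

user_keys = [f"user:{i:08d}".encode() for i in range(10000)]

raw_spread = Counter(first_nibble(k) for k in user_keys)
hashed_spread = Counter(first_nibble(hashlib.sha1(k).digest()) for k in user_keys)

print(raw_spread)      # every key starts with 'u' -> all 10000 land in one branch
print(hashed_spread)   # roughly 625 keys per branch -> balanced fan-out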

  8. Question 6 “Among all compactions moving data from Lk to Lk+1, we must make sure their key ranges are not overlapped to keep any two SSTables at Level Lk+1 from having overlapped key ranges. However, this cannot be achieved with the LevelDB data organization …” Please explain why LevelDB cannot achieve this. ■ An SSTable has a limited capacity, so its boundaries are set by size rather than by fixed key ranges ■ The key range covered by an SSTable is therefore highly variable ■ SSTables at different sublevels cover different, shifting ranges, so compaction key ranges inevitably overlap http://ranger.uta.edu/~sjiang/CSE6350-spring-19/lecture-7.pdf

  9. Question 7 Use Figures 2 and 3 to describe the LSM-trie’s structure and how compaction is performed in the trie.
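The sketch below captures the trie-guided compaction idea in miniature, assuming a fanout of 8 children per node to match the 8x growth mentioned in these slides; the names and bit-slicing are hypothetical, not the paper's code. When a node's HTables are compacted, each KV item is routed to exactly one child by the next bits of its SHA-1 hashkey, so the children's key ranges can never overlap.

import hashlib
from collections import defaultdict

FANOUT_BITS = 3                      # 2**3 = 8 children per trie node (assumed)

def child_index(hashkey: bytes, depth: int) -> int:
    # Pick the 3-bit slice of the hashkey that corresponds to this trie depth.
    bits = int.from_bytes(hashkey[:8], "big")
    shift = 64 - FANOUT_BITS * (depth + 1)
    return (bits >> shift) & ((1 << FANOUT_BITS) - 1)

def compact(parent_items: dict, depth: int) -> dict:
    # Split a parent node's items among its children purely by hashkey prefix.
    children = defaultdict(dict)
    for user_key, value in parent_items.items():
        hashkey = hashlib.sha1(user_key).digest()
        children[child_index(hashkey, depth)][user_key] = value
    return children

# Usage: 32 items at the root (depth 0) scatter across the 8 children.
items = {f"k{i}".encode(): b"v" for i in range(32)}
print({child: len(kv) for child, kv in compact(items, depth=0).items()})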

  10. Conclusion ■ Optimized for many small KV items ■ High read and write performance ■ Hash based, with some indices used only for large items ■ No range search ■ Uses 5 exponential levels with linear sub-levels (8 per level for levels 1-4, 80 at level 5) to store up to 10TB of data
