LOG-STRUCTURED MERGE-TRIE PART 1
Xingbo Wu and Yuehai Xu, Wayne State University; Zili Shao, The Hong Kong Polytechnic University; Song Jiang, Wayne State University Presented by: Joel Friberg
LSM-Trie Overview
■ 32MB HTable KV-item organization
■ Almost no index: hash based
■ Fixed-size buckets to match disk blocks (4KB)
■ Linear and exponential levels in the trie (112 sublevels in total)
■ 16-bit Bloom filters (5% false positive rate achieved)
■ 1 disk read necessary for Bloom filters (BloomCluster)
■ Optimized for stores of up to 10TB
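The 5% figure can be sanity-checked with the standard Bloom filter estimate. The sketch below assumes that "16bit" means 16 bits per stored item and that a lookup may probe one filter in each of the 112 sublevels. The function names are illustrative and do not come from the paper.

```python
import math

def bloom_fpr(bits_per_item, num_hashes=None):
    """Approximate false positive rate of a single Bloom filter."""
    if num_hashes is None:
        # near-optimal number of hash functions: (bits per item) * ln 2
        num_hashes = max(1, round(bits_per_item * math.log(2)))
    return (1.0 - math.exp(-num_hashes / bits_per_item)) ** num_hashes

def expected_false_positives(bits_per_item, sublevels):
    """Expected false positives when one filter per sublevel is checked."""
    return sublevels * bloom_fpr(bits_per_item)

per_filter = bloom_fpr(16)
per_lookup = expected_false_positives(16, 112)
print(f"per filter: {per_filter:.4%}, per lookup: {per_lookup:.2%}")
```

Under these assumptions a single filter errs on the order of 0.05% of the time, and summing over 112 filters gives roughly 5% per lookup, which is consistent with the number on the slide.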
https://www.researchgate.net/profile/Pasi_Fraenti/publication/321323711/figure/fig8/AS:576074708525076@1514358321712/Prefix-tree-example.png
“In the meantime, for some KV stores, such as SILT [24], major efforts are made to [...] compromised without conducting multi-level incremental compactions”
Explain how high write amplification is produced in SILT.
■ Single SortedStore on disk for everything
■ Entries in a HashStore can cover a large key range
■ Large ratio between the data rewritten during a merge and the new data actually written
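A rough way to see the ratio: if merging new entries forces the whole SortedStore to be rewritten, the bytes written per byte of new data grow with the size of the store. The sketch below uses that simplified model with hypothetical sizes; it is not SILT's actual accounting.

```python
def merge_write_amplification(sorted_store_bytes, new_bytes):
    """Bytes written to disk per byte of new data when the whole store is rewritten."""
    if new_bytes <= 0:
        raise ValueError("new_bytes must be positive")
    return (sorted_store_bytes + new_bytes) / new_bytes

GB = 1 << 30
for store_gb in (1, 10, 100):  # hypothetical SortedStore sizes
    wa = merge_write_amplification(store_gb * GB, 1 * GB)
    print(f"SortedStore {store_gb:>3} GB + 1 GB new data -> amplification {wa:.0f}x")
```

In this model the amplification is about 11x for a 10 GB store and about 101x for a 100 GB store, so the cost of each merge grows as the store fills.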
http://ranger.uta.edu/~sjiang/CSE6350-spring-19/lecture-7.pdf
“Note that LSM-trie uses hash functions to organize its data and accordingly does not support range search.”
Do FAWN and LevelDB support range search?
■ FAWN is hash based: no range search
■ LevelDB stores sorted KV pairs, indices are block ranges: can range search
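The difference can be illustrated with two toy layouts. This is a minimal sketch, not FAWN's or LevelDB's code: the sorted layout answers a range query with two binary searches, while the hashed layout scatters neighbouring keys, so every bucket has to be scanned. The bucket count and key format are made up for the example.

```python
import bisect
import hashlib

def range_scan_sorted(sorted_keys, lo, hi):
    """Sorted layout: two binary searches bound one contiguous slice."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]

def bucket_of(key, num_buckets):
    """Hashed layout: the bucket depends on the hash, not on key order."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets

def range_scan_hashed(buckets, lo, hi):
    """Neighbouring keys sit in unrelated buckets, so all buckets are scanned."""
    hits = [k for bucket in buckets for k in bucket if lo <= k <= hi]
    return sorted(hits)

keys = [f"key{i:04d}" for i in range(1000)]  # zero-padded, so already sorted
num_buckets = 16
buckets = [[] for _ in range(num_buckets)]
for k in keys:
    buckets[bucket_of(k, num_buckets)].append(k)

wanted = range_scan_sorted(keys, "key0100", "key0109")
touched = {bucket_of(k, num_buckets) for k in wanted}
print(len(wanted), "keys in range, spread over", len(touched), "hash buckets")
```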
Use Figure 1 to explain the difference between linear and exponential growth patterns.
“Because 4KB block is a disk access unit, it is not necessary to maintain a larger index to determine byte offset of each item in a block.”
Show how a lookup with a given key is carried out in LevelDB.
■ Search the MemTable
■ At each level, binary search the index for SSTables whose key range covers the key, and check their Bloom filters
■ Retrieve the value
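The lookup steps can be sketched with a toy model. This is not LevelDB's API: the `SSTable` class and `lookup` function are hypothetical, a Python set stands in for the Bloom filter, and deletions (tombstones) are left out. It assumes level-0 tables may overlap and are listed newest first, while deeper levels hold non-overlapping tables sorted by key range.

```python
import bisect

class SSTable:
    """Toy SSTable: sorted keys plus a set standing in for the Bloom filter."""
    def __init__(self, items):
        pairs = sorted(items.items())
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]
        self.filter = set(self.keys)  # assumption: a real filter can give false positives

    def covers(self, key):
        return self.keys[0] <= key <= self.keys[-1]

    def get(self, key):
        if key not in self.filter:  # filter says "definitely absent": skip the table
            return None
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

def lookup(key, memtable, levels):
    if key in memtable:  # newest data is checked first
        return memtable[key]
    for depth, tables in enumerate(levels):
        if depth == 0:
            # level 0: overlapping tables, every covering table is a candidate
            candidates = [t for t in tables if t.covers(key)]
        else:
            # deeper levels: binary search over the tables' last keys
            last_keys = [t.keys[-1] for t in tables]
            i = bisect.bisect_left(last_keys, key)
            candidates = [tables[i]] if i < len(tables) and tables[i].covers(key) else []
        for table in candidates:
            value = table.get(key)
            if value is not None:
                return value
    return None

memtable = {"d": "mem-d"}
levels = [
    [SSTable({"a": "L0-a", "m": "L0-m"})],
    [SSTable({"a": "L1-a", "c": "L1-c"}), SSTable({"k": "L1-k", "z": "L1-z"})],
]
print(lookup("d", memtable, levels), lookup("a", memtable, levels), lookup("k", memtable, levels))
```

Note that the key "a" exists at both level 0 and level 1; the search stops at the shallower level, which holds the newer version.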
http://ranger.uta.edu/~sjiang/CSE6350-spring-19/lecture-7.pdf
“Instead, we first apply a cryptographic hash function, such as SHA-1, on the key, and then use the hashed key, or hashkey in short, to make the determination.”
Assuming a user-provided key has 160 bits, what’s the issue if LSM-trie used the user keys, instead [...]
■ Cryptographic hash output is uniformly distributed
■ User keys may be skewed, which leaves the trie unbalanced
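The effect is easy to reproduce. The sketch below picks a child node from the leading bits of a key, first using raw user keys and then using their SHA-1 hashes. The key format and the choice of 3 bits per trie level (8 children) are assumptions made for the illustration.

```python
import hashlib
from collections import Counter

def child_index(key_bytes, level, bits_per_level=3):
    """Pick a child from the bits of the key that belong to this trie level."""
    total_bits = len(key_bytes) * 8
    shift = total_bits - (level + 1) * bits_per_level
    return (int.from_bytes(key_bytes, "big") >> shift) & ((1 << bits_per_level) - 1)

# hypothetical user keys that share a common prefix
user_keys = [f"user{i:06d}".encode() for i in range(8000)]

raw = Counter(child_index(k, 0) for k in user_keys)
hashed = Counter(child_index(hashlib.sha1(k).digest(), 0) for k in user_keys)

print("raw keys:  ", dict(raw))
print("SHA-1 keys:", dict(sorted(hashed.items())))
```

Every raw key begins with the same byte, so all 8000 items fall under a single child, while the hashed keys spread across all 8 children in roughly equal shares.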
https://appliedgo.net/balancedtree/
“Among all compactions moving data from Lk to Lk+1, we must make sure their key ranges are not overlapped to keep any two SSTables at Level Lk+1 from having [...]
■ SSTable has limited capacity
■ Key range size of SSTable highly variable
■ SSTables cover different ranges at each sublevel
http://ranger.uta.edu/~sjiang/CSE6350-spring-19/lecture-7.pdf
Use Figures 2 and 3 to describe the LSM-trie’s structure and how compaction is performed in the trie.
■ Optimized for many small items
■ High-performance reads and writes
■ Hash based, with some indices used for large items
■ No range search
■ Uses 5 exponential levels, each divided into linear sublevels (8 per level for levels 1-4, 80 on level 5), to store up to 10TB of data