

SLIDE 1

LOG-STRUCTURED MERGE-TRIE PART 1

Xingbo Wu and Yuehai Xu, Wayne State University; Zili Shao, The Hong Kong Polytechnic University; Song Jiang, Wayne State University Presented by: Joel Friberg

SLIDE 2

LSM-Trie Overview

■ 32MB HTable KV-item organization
■ Almost no index – hash based
■ Fixed-size buckets to match disk blocks (4KB)
■ Linear and exponential levels in the trie (112 total)
■ 16-bit Bloom filters (5% false positive rate achieved)
■ 1 disk read necessary for Bloom filters (BloomCluster)
■ Optimized for stores of up to 10TB

https://www.researchgate.net/profile/Pasi_Fraenti/publication/321323711/figure/fig8/AS:576074708525076@1514358321712/Prefix-tree-example.png
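The "almost no index" point can be sketched as follows: with 32MB HTables and 4KB buckets there are 8192 buckets per HTable, and a hash of the key selects one directly. This is a minimal sketch, not the authors' implementation. It assumes SHA-1 (the hash named in Question 5), and the choice of which hash bytes select the bucket is hypothetical.

```python
import hashlib

HTABLE_BYTES = 32 * 1024 * 1024   # 32MB HTable (from the slide)
BUCKET_BYTES = 4 * 1024           # 4KB bucket = one disk block (from the slide)
NUM_BUCKETS = HTABLE_BYTES // BUCKET_BYTES  # 8192 buckets per HTable


def bucket_index(user_key: bytes) -> int:
    """Map a key to a bucket using the last 4 bytes of its SHA-1 hashkey.

    Which bytes are used is an assumption made for illustration only.
    """
    hashkey = hashlib.sha1(user_key).digest()
    return int.from_bytes(hashkey[-4:], "big") % NUM_BUCKETS


if __name__ == "__main__":
    print(NUM_BUCKETS)
    print(bucket_index(b"example-key"))
```

Because the bucket is computed rather than looked up, no per-item index is needed to locate the 4KB block to read.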

SLIDE 3

Question 1

“In the meantime, for some KV stores, such as SILT [24], major efforts are made to optimize reads by minimizing metadata size, while write performance can be compromised without conducting multi-level incremental compactions”

Explain how high write amplifications are produced in SILT.

■ Single SortedStore on disk for everything
■ Entries in a HashStore can cover a large key range
■ Large ratio between the data to merge and the actual data to write

http://ranger.uta.edu/~sjiang/CSE6350-spring-19/lecture-7.pdf
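A rough way to see the last bullet: if merging new data forces the whole SortedStore to be rewritten, the bytes written per byte of new data grow with the size of the store. The function below is a back-of-the-envelope sketch under that assumption; the sizes in the example are made up for illustration and are not measurements of SILT.

```python
def merge_write_amplification(sorted_store_bytes: float, new_bytes: float) -> float:
    """Bytes written to disk per byte of new data, assuming the merge
    rewrites the entire sorted store together with the new data."""
    if new_bytes <= 0:
        raise ValueError("new_bytes must be positive")
    return (sorted_store_bytes + new_bytes) / new_bytes


if __name__ == "__main__":
    # Hypothetical sizes: 100 units already sorted on disk, 1 unit of new data.
    print(merge_write_amplification(100, 1))  # 101.0
```

Without intermediate levels to absorb writes incrementally, this ratio keeps growing as the single sorted store grows.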

SLIDE 4

Question 2

“Note that LSM-trie uses hash functions to organize its data and accordingly does not support range search.”

Do FAWN and LevelDB support range search?

■ FAWN is hash based – no range search
■ LevelDB stores sorted KV pairs, and its indices are block ranges – it can range search
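The contrast can be sketched in a few lines. With keys kept in sorted order, a range query is two binary searches and a contiguous slice. With a hash-organized store, key order is lost, so the only general option is to examine every key. Both functions are illustrative stand-ins, not FAWN or LevelDB code.

```python
import bisect


def range_scan_sorted(sorted_keys, lo, hi):
    """Range query over sorted keys: two binary searches, one contiguous slice."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]


def range_scan_hashed(hash_table, lo, hi):
    """Range query over a hash-organized store: every key must be examined."""
    return sorted(k for k in hash_table if lo <= k <= hi)


if __name__ == "__main__":
    keys = ["apple", "banana", "cherry", "date", "fig"]
    table = {k: None for k in keys}
    print(range_scan_sorted(keys, "banana", "date"))
    print(range_scan_hashed(table, "banana", "date"))
```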

SLIDE 5

Question 3

Use Figure 1 to explain the difference between linear and exponential growth patterns.

SLIDE 6

Question 4

“Because 4KB block is a disk access unit, it is not necessary to maintain a larger index to determine byte offset of each item in a block.”

Show how a lookup with a given key is carried out in LevelDB.

■ Binary search the MemTable
■ Level by level, binary search the index for SSTables whose key range covers the key, and check each candidate's Bloom filter
■ Retrieve the value

http://ranger.uta.edu/~sjiang/CSE6350-spring-19/lecture-7.pdf
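The three steps above can be sketched with a toy model. This is not LevelDB's code: `SortedRun` and `lookup` are hypothetical names, a Python set stands in for the Bloom filter (so it has no false positives), the SSTables of a level are checked with a simple loop rather than an index search, and the real MemTable is not a sorted list.

```python
import bisect


class SortedRun:
    """Toy stand-in for a MemTable or an SSTable: sorted keys plus a filter."""

    def __init__(self, items):
        items = sorted(items)
        self.keys = [k for k, _ in items]
        self.values = [v for _, v in items]
        self.filter = set(self.keys)  # stand-in for a Bloom filter

    def covers(self, key):
        """True if key falls inside this run's [min, max] key range."""
        return bool(self.keys) and self.keys[0] <= key <= self.keys[-1]

    def search(self, key):
        """Binary search for key; return its value or None."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


def lookup(key, memtable, levels):
    """Search the MemTable first, then each level from newest to oldest."""
    value = memtable.search(key)
    if value is not None:
        return value
    for tables in levels:
        for table in tables:
            # Skip tables whose range or filter rules the key out.
            if table.covers(key) and key in table.filter:
                value = table.search(key)
                if value is not None:
                    return value
    return None


if __name__ == "__main__":
    memtable = SortedRun([("b", "new")])
    levels = [
        [SortedRun([("a", "1"), ("c", "3")])],
        [SortedRun([("b", "old"), ("d", "4")])],
    ]
    print(lookup("b", memtable, levels))
    print(lookup("d", memtable, levels))
```

Searching newer data first means the most recent value for a key wins, as with key "b" in the usage example.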

SLIDE 7

Question 5

“Instead, we first apply a cryptographic hash function, such as SHA-1, on the key, and then use the hashed key, or hashkey in short, to make the determination.”

Assuming a user-provided key has 160 bits, what’s the issue if LSM-trie used the user keys, instead of hashed keys, in its data structure and operations?

■ Cryptographic hash output is uniformly distributed, so hashkeys spread evenly across the trie
■ User keys may be skewed (for example, sharing long prefixes), leaving the trie unbalanced

https://appliedgo.net/balancedtree/
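A small experiment makes the imbalance concrete. The sketch below routes keys to a trie child by their leading bits, once using raw user keys that share a prefix and once using their SHA-1 hashkeys. The 3-bit prefix width and the "user:NNNN" key format are assumptions chosen for illustration.

```python
import hashlib

PREFIX_BITS = 3  # assumed number of bits consumed per trie level


def child_index(key: bytes, bits: int = PREFIX_BITS) -> int:
    """Child slot chosen by the top `bits` bits of the key's first byte."""
    return key[0] >> (8 - bits)


def children_used(keys):
    """Set of distinct child slots that the given keys fall into."""
    return {child_index(k) for k in keys}


# Hypothetical user keys that all share the prefix "user:".
user_keys = [f"user:{i:04d}".encode() for i in range(1000)]
hash_keys = [hashlib.sha1(k).digest() for k in user_keys]

if __name__ == "__main__":
    print(len(children_used(user_keys)))  # every key starts with "u": one child
    print(len(children_used(hash_keys)))  # hashkeys spread over the children
```

With raw keys, every item lands under the same child, so one subtree takes all the data while the others stay empty. Hashing first removes the dependence on how users happen to name their keys.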

SLIDE 8

Question 6

“Among all compactions moving data from Lk to Lk+1, we must make sure their key ranges are not overlapped to keep any two SSTables at Level Lk+1 from having overlapped key ranges. However, this cannot be achieved with the LevelDB data organization …”

Please explain why LevelDB cannot achieve it.

■ An SSTable has limited capacity
■ The key range size of an SSTable is highly variable
■ SSTables cover different ranges at each sublevel

http://ranger.uta.edu/~sjiang/CSE6350-spring-19/lecture-7.pdf

SLIDE 9

Question 7

Use Figures 2 and 3 to describe the LSM-trie’s structure and how compaction is performed in the trie.

SLIDE 10

Conclusion

■ Optimized for many small items
■ High-performance reads and writes
■ Hash based, with some indices used for large items
■ No range search
■ Utilizes exponential levels (5) and linear levels (8 per exponential level for levels 1-4, 80 on level 5) to store up to 10TB of data
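As a quick check, the level counts in the last bullet add up to the 112 total levels quoted in the overview. The dictionary below just restates the slide's numbers; the variable names are illustrative.

```python
# Linear levels per exponential level, as listed on the slide.
LINEAR_LEVELS = {1: 8, 2: 8, 3: 8, 4: 8, 5: 80}

EXPONENTIAL_LEVELS = len(LINEAR_LEVELS)     # 5
TOTAL_LEVELS = sum(LINEAR_LEVELS.values())  # 8 * 4 + 80 = 112

if __name__ == "__main__":
    print(EXPONENTIAL_LEVELS, TOTAL_LEVELS)
```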