LSM-trie: An LSM-tree-based Ultra-Large Key-Value Store for Small Data
by: Xingbo Wu, Yuehai Xu, Zili Shao, and Song Jiang
Presented by: Daniel Herring

Design Goals and Assumptions
Goals
▫ Efficient KV store
▫ Inserts must support high throughput
[Diagram: LSM-trie structure and write path]
▫ FSK = hash(K): Insert(K, V) is stored as Insert(FSK, [K, V]) in the MemTable
▫ Minor compaction: a full, immutable MemTable is written to Level 0 (*Table 0.0 … *Table 0.N)
▫ Major compaction: tables move from one level into the piles of the next level (*Table 1.00, *Table 1.01, … *Table 1.0i), where N = number of entries in the level and i = pile number within an entry
▫ Each level is divided into piles (Level 1 - Pile 0, Pile 1, … Pile i); *supports using different table structures based on data size
▫ Levels 0 through 3 form the trie
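The write path in the diagram can be summarized in a few lines of code. Below is a minimal sketch, assuming SHA-1 as the key hash; the class and function names are illustrative, not from the paper's implementation.

```python
import hashlib

class MemTable:
    """In-memory buffer; flushed to Level 0 as an immutable table when full."""
    def __init__(self, capacity=4):
        self.items = {}
        self.capacity = capacity

    def full(self):
        return len(self.items) >= self.capacity

def fsk(key: bytes) -> bytes:
    """FSK = hash(K): the fixed-size key used to place items in the trie."""
    return hashlib.sha1(key).digest()

memtable = MemTable()
level0 = []          # immutable tables produced by minor compaction

def insert(key: bytes, value: bytes):
    # Insert(K, V) is stored as Insert(FSK, [K, V])
    memtable.items[fsk(key)] = (key, value)
    if memtable.full():
        # minor compaction: MemTable -> immutable table appended to Level 0
        level0.append(dict(memtable.items))
        memtable.items.clear()

insert(b"user:42", b"hello")
```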
(1) “In the meantime, for some KV stores, such as SILT [24], major efforts are made to optimize reads by minimizing metadata size, while write performance can be compromised without conducting multi-level incremental compactions.” Explain how high write amplification is produced in SILT.
SILT data taken from Section 5 of [24]: LIM, H., FAN, B., ANDERSEN, D. G., AND KAMINSKY, M. SILT: A memory-efficient, high-performance key-value store.
▫ Sorting happens at the next level’s major compaction
▫ Only full piles need to be compacted to the next level
▫ Piles can contain non-full HTables
▫ Containers at the child level hold non-overlapping key ranges
▫ Sorting does not affect other containers in the child level
Key prefix 001 110 does not have to affect containers and tables holding key prefix 001 111
▫ A pile only ever has N tables in it
▫ The sort at each level can discard the prefix bits already consumed by upper levels (see the sketch below)
A 64-bit key at level 4 only has to compare 52 bits to sort keys; a 256-bit key at level 75 only has to compare 31 bits
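A small sketch of how the comparison width shrinks with depth, assuming 3 prefix bits are consumed per trie level (which matches the 52-bit and 31-bit figures above); the helper names are made up for illustration.

```python
BITS_PER_LEVEL = 3  # assumed fan-out of 8 children per trie node

def remaining_bits(key_bits: int, level: int) -> int:
    """Bits that still need to be compared when sorting at a given level,
    after the prefix consumed by the levels above is discarded."""
    return key_bits - level * BITS_PER_LEVEL

def pile_prefix(hashed_key: int, key_bits: int, level: int) -> int:
    """Prefix bits that route a hashed key to its container at this level."""
    used = level * BITS_PER_LEVEL
    return hashed_key >> (key_bits - used) if used else 0

print(remaining_bits(64, 4))                     # 52 bits left to compare
print(remaining_bits(256, 75))                   # 31 bits left to compare
print(bin(pile_prefix(0xC0FFEE << 40, 64, 2)))   # top 6 bits route the key at level 2
```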
(5) Use Figures 2 and 3 to describe the LSM-trie’s structure and how compaction is performed in the trie.
▫ Exponential Growth – each level is 10x larger than the previous, with space for 10x more data
▫ Combined Linear and Exponential Growth – levels are 8x larger than the previous but have space for 64x the data
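A small arithmetic sketch of the two patterns, using the ratios from the bullets above (10x per level for the pure exponential pattern; 8 sub-levels per level, each 8x larger at the next level, for the combined pattern, so a full level holds 8 × 8 = 64x the data of one sub-level above it). This only illustrates the growth shapes, not the paper's exact capacity accounting; the function names are made up.

```python
def exponential_capacities(levels: int, ratio: int = 10, base: int = 1):
    """Pure exponential growth: each level is `ratio`x larger than the previous one."""
    return [base * ratio ** i for i in range(levels)]

def combined_capacities(levels: int, fanout: int = 8, base: int = 1):
    """Combined pattern: a level fills linearly, one sub-level at a time, while
    sub-level size grows exponentially (by `fanout`) from level to level."""
    sub_size = base
    caps = []
    for _ in range(levels):
        caps.append(fanout * sub_size)   # capacity of a full level (8 sub-levels)
        sub_size *= fanout               # the next level's sub-levels are 8x larger
    return caps

print(exponential_capacities(4))  # [1, 10, 100, 1000]
print(combined_capacities(4))     # [8, 64, 512, 4096]
```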
(3) Use Figure 1 to explain the difference between linear and exponential growth patterns.
▫ Minimizes insert time
▫ Minimizes the time spent re-sorting a level
▫ Able to partition effectively
▫ Able to offload the compaction work of lower levels
▫ Fast key lookup – only one pile at each level can contain a given key
▫ Increases read time due to pile searches
▫ Unable to do range searches
▫ Uses different data structures for small data vs. larger KB-sized data
▫ HTable for small data; SSTable-Trie for large data
(2) “Note that LSM-trie uses hash functions to organize its data and accordingly does not support range search.” Does LevelDB support range search?
(4) “Among all compactions moving data from Lk to Lk+1, we must make sure their key ranges are not overlapped to keep any two SSTables at Level Lk+1 from having overlapped key ranges.” Explain why this cannot be achieved with the LevelDB data layout.
(8) What’s the difference between SSTable in LevelDB and HTable in LSM-trie?
▫ HTables may balance their buckets and record the relocation information in the HTable metadata
▫ Each HTable is limited to a 95% fill to allow for balancing of randomly sized items
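A toy sketch of the idea behind bucket balancing, not the paper's exact algorithm: items are hashed into fixed-size buckets, and when a bucket overflows, an item is migrated to a lighter bucket and the move is remembered as a relocation record in the table's metadata. All names and constants here are illustrative.

```python
import hashlib

NUM_BUCKETS = 8
BUCKET_CAPACITY = 64   # bytes per bucket in this toy example

def bucket_of(key: bytes) -> int:
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big") % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]      # lists of (key, value)
relocations = {}                                 # key -> bucket it was moved to

def load(i: int) -> int:
    return sum(len(k) + len(v) for k, v in buckets[i])

def put(key: bytes, value: bytes):
    i = bucket_of(key)
    buckets[i].append((key, value))
    # if the hash-chosen bucket is overloaded, move the new item to the
    # lightest bucket and record the move so reads can still find it
    if load(i) > BUCKET_CAPACITY:
        j = min(range(NUM_BUCKETS), key=load)
        if j != i:
            buckets[i].remove((key, value))
            buckets[j].append((key, value))
            relocations[key] = j

def get(key: bytes):
    i = relocations.get(key, bucket_of(key))
    for k, v in buckets[i]:
        if k == key:
            return v
    return None

put(b"a" * 40, b"x" * 40)   # oversized item may trigger a relocation
put(b"b" * 40, b"y" * 40)
print(get(b"a" * 40) is not None)
```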
▫ Bloom filter size is 16 bits per item
▫ Sized to minimize the false-positive rate
▫ Filters are stored in the HTable
(9) “However, a challenging issue is whether the buckets can be load balanced in terms of aggregate size.” Why may the buckets in an HTable be load unbalanced? How can the problem be corrected?
▫ Bloom filters use the majority of the memory
A store of 32 sub-levels with an average 64B item size has 4.5GB of 16-bit Bloom filters
▫ Relocation records also use memory
The same store uses an estimated 0.5GB of memory for relocation records
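The relationship behind these estimates is easy to check: at 16 bits (2 bytes) of filter per item and 64B items, the Bloom filters cost 2/64 = 1/32 of the data they cover. A small sketch; the 1 TiB input used below is only an assumed example, not a number from the slides.

```python
def bloom_filter_bytes(data_bytes: int, item_size: int = 64, bits_per_item: int = 16) -> float:
    """Memory needed for Bloom filters covering `data_bytes` of KV data."""
    items = data_bytes / item_size
    return items * bits_per_item / 8

one_tib = 1 << 40
print(bloom_filter_bytes(one_tib) / 2**30)   # ~32 GiB of filters per TiB of 64B items
```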
(7) “Therefore, the Bloom filter must be beefed up by using more bits.” Use an example to show why the Bloom filters have to be longer.
(6) “The indices and Bloom filters in a KV store can grow very large.” Use an example to show that this metadata in LevelDB may have to be kept out of core.
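One way to see why the filters must be longer, assuming the standard Bloom-filter false-positive approximation and one filter probed per sub-level: with many sub-levels searched per lookup, the per-filter false-positive rate is multiplied by the number of sub-levels, so more bits per item are needed to keep the expected number of wasted disk reads small. The sub-level count of 32 echoes the bullet above; the rest is illustrative.

```python
import math

def false_positive_rate(bits_per_item: float) -> float:
    """Standard approximation with the optimal number of hash functions."""
    k = max(1, round(bits_per_item * math.log(2)))
    return (1 - math.exp(-k / bits_per_item)) ** k

sublevels = 32
for bits in (10, 16):
    fp = false_positive_rate(bits)
    # expected false positives per lookup when every sub-level's filter is probed
    print(f"{bits} bits/item: per-filter FP ~ {fp:.5f}, "
          f"expected wasted reads per lookup ~ {sublevels * fp:.3f}")
```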
[Diagram: clustered LSM-trie – a front-end client talks to a Level 0 KV store server (MemTable + L0 piles), which forwards data to per-level KV store servers (KV Store L1.000 … L1.111, each with its own MemTable and piles); a Level N thread feeds a Level N+1 thread, starting with N = 0]
▫ When the Level 0 server receives a read request, it sends a message with the key to all level servers, asking whether they have it
▫ Servers holding the key respond with their server ID and server level
▫ If the key is not in its own data store, the Level 0 server determines which responding level is newest and requests the value from that server
▫ Expected lookup time is close to O(1) when the number of server processes is allowed to grow as O(log(n))
[Per-server diagram: MemTable and input tables feed piles; Have key(k)? checks answer lookups; compaction sorts piles out to the next layer’s KV store servers]
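A toy single-process sketch of this lookup flow; the server classes and message passing are simplified stand-ins for the real cluster protocol, and all names are assumed.

```python
class LevelServer:
    def __init__(self, level: int):
        self.level = level
        self.store = {}          # key -> value held at this level

    def have_key(self, key):
        # reply with (level, found); a shallower level holds newer data
        return (self.level, key in self.store)

class FrontEnd:
    def __init__(self, servers):
        self.servers = servers   # index 0 is the Level 0 server

    def lookup(self, key):
        # Level 0 checks its own store first, then asks every level server
        if key in self.servers[0].store:
            return self.servers[0].store[key]
        replies = [s.have_key(key) for s in self.servers[1:]]
        hits = [level for level, found in replies if found]
        if not hits:
            return None
        newest = min(hits)       # the shallowest responding level is the newest
        return self.servers[newest].store[key]

servers = [LevelServer(i) for i in range(4)]
servers[2].store["k"] = "old"
servers[1].store["k"] = "new"
print(FrontEnd(servers).lookup("k"))   # -> "new"
```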
▫ When the Level 0 server receives an insert request, it stores the data in its MemTable
▫ When the MemTable is full, it is converted to an immutable HTable in the local pile
▫ When the local pile fills up, the piles are sorted and the data is sent to lower-level servers based on the trie partitioning
▫ Lower-level servers store the data in their MemTables and make them immutable when appropriate, continuing down the tree
▫ Expected insert time is O(1) when the number of server processes is allowed to grow as O(log(n))
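A matching toy sketch of the insert flow: each server buffers writes in a MemTable, seals it into its local pile when full, and once the pile fills it partitions the data to child servers by the next few bits of the hashed key. Again a simplified illustration, not the paper's protocol; constants and names are assumed.

```python
import hashlib

BITS_PER_LEVEL = 3            # assumed trie fan-out of 8 child servers per level

class KVServer:
    def __init__(self, level, memtable_cap=4, pile_cap=2):
        self.level = level
        self.memtable = {}                    # hashed key -> (key, value)
        self.pile = []                        # sealed, immutable tables
        self.children = {}                    # child index -> KVServer
        self.memtable_cap, self.pile_cap = memtable_cap, pile_cap

    def child_index(self, hashed):
        # the next BITS_PER_LEVEL bits of the 160-bit hash pick the child server
        shift = 160 - (self.level + 1) * BITS_PER_LEVEL
        return (hashed >> shift) & ((1 << BITS_PER_LEVEL) - 1)

    def insert(self, key, value):
        hashed = int.from_bytes(hashlib.sha1(key).digest(), "big")
        self.memtable[hashed] = (key, value)
        if len(self.memtable) >= self.memtable_cap:
            self.pile.append(dict(self.memtable))   # MemTable -> immutable table
            self.memtable.clear()
        if len(self.pile) >= self.pile_cap:
            self._push_down()

    def _push_down(self):
        # sort the local pile out to child servers by trie partitioning
        for table in self.pile:
            for hashed, (k, v) in table.items():
                idx = self.child_index(hashed)
                child = self.children.setdefault(idx, KVServer(self.level + 1))
                child.insert(k, v)
        self.pile.clear()

root = KVServer(level=0)
for i in range(20):
    root.insert(f"key{i}".encode(), b"v")
print(len(root.children))   # > 0: data has been partitioned to child servers
```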