Small Index Large Table (SILT)
Authors: Hyeontaek Lim, Bin Fan, D. G. Andersen, M. Kaminsky
Presenter: Xiaoyu Zhang
Motivation
- To achieve low latency, key-value storage systems make aggressive use of
DRAM-based indexes to avoid the bottleneck caused by disk operations.
- DRAM is 8X more expensive and uses 25X more power per bit than flash.
- DRAM capacity is growing more slowly than disk or flash capacity.
SILT KEY-VALUE STORAGE SYSTEM
- SILT Key-Value Storage System
- Basic Storage Design: LogStore, HashStore, SortedStore
- Extending SILT Functionality
Question
(1) “Figure 1: The memory overhead and lookup performance of SILT and the recent key-value stores. For both axes, smaller is better.” Explain the positions of FAWN-DS, SkimpyStash, BufferHash, and SILT on the graph.
- LogStore handles PUTs and DELETEs.
- HashStore is an on-flash hash table that does not require an in-memory index to locate entries.
- SortedStore holds more than 80% of the total entries.
SILT KEY-VALUE STORAGE SYSTEM
- Keys are first inserted into the LogStore; an in-memory hash table maps each key to its offset in the on-flash log.
- Once full, the LogStore is converted into a more memory-efficient HashStore.
- Finally, SILT merges several HashStores in bulk with the older version of the
SortedStore.
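The three-stage flow above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: plain Python dicts stand in for the on-flash structures, deletes are not modeled, and the class name `SiltSketch` and the limits `logstore_capacity` and `max_hashstores` are hypothetical names chosen for illustration, not part of SILT.

```python
class SiltSketch:
    """Toy model of the LogStore -> HashStore -> SortedStore pipeline."""

    def __init__(self, logstore_capacity=4, max_hashstores=2):
        self.logstore_capacity = logstore_capacity  # hypothetical limit
        self.max_hashstores = max_hashstores        # hypothetical limit
        self.logstore = {}      # newest data, the only writable store
        self.hashstores = []    # frozen stores, oldest first
        self.sortedstore = {}   # oldest data, kept in key order

    def put(self, key, value):
        self.logstore[key] = value
        if len(self.logstore) >= self.logstore_capacity:
            self._convert_logstore()

    def _convert_logstore(self):
        # Freeze the full LogStore as a HashStore and start a new LogStore.
        self.hashstores.append(self.logstore)
        self.logstore = {}
        if len(self.hashstores) >= self.max_hashstores:
            self._merge()

    def _merge(self):
        # Bulk-merge every HashStore into the SortedStore, oldest first,
        # so that newer values overwrite older ones.
        merged = dict(self.sortedstore)
        for store in self.hashstores:
            merged.update(store)
        self.sortedstore = dict(sorted(merged.items()))
        self.hashstores = []

    def get(self, key):
        # Search from the newest store to the oldest.
        if key in self.logstore:
            return self.logstore[key]
        for store in reversed(self.hashstores):
            if key in store:
                return store[key]
        return self.sortedstore.get(key)


silt = SiltSketch()
for i in range(10):
    silt.put(f"k{i}", i)
silt.put("k0", "new")          # newer value shadows the one in SortedStore
print(silt.get("k0"), silt.get("k5"), len(silt.sortedstore))
```

Lookups consult the stores from newest to oldest, which is why a fresh value in the LogStore hides an older copy that has already been merged.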
Questions
(2) Two design goals of SILT are low read amplification and low write amplification. Use any KV store we have studied as an example to show how these amplifications are produced.
Questions
(3) Describe SILT’s structure using Figure 2 (Architecture of SILT). Compared with LevelDB, SILT has only three levels. What is the concern with a multi-level KV store when it has too few levels?
SILT KEY-VALUE STORAGE SYSTEM
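- Partial-key cuckoo hashing is used: each slot stores a short tag that both
reduces flash reads and gives the alternative bucket index.
- To keep the index compact, it stores a tag instead of the actual
key; comparing tags avoids most unnecessary flash reads.
- Without the tag, moving a key to its alternative bucket to
displace another key would be very costly, since the full key would have to be read from flash.
- The hash table is 4-way set associative.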
LogStore
From Bin Fan, 2013
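The bullets above can be made concrete with a minimal sketch of a partial-key cuckoo index. Several things here are assumptions made for illustration: the class name `LogStoreSketch`, the SHA-1-based hash helper, the tiny bucket count, the deterministic victim choice, and the rollback on failure. The stored tag is simply the full index of the entry's alternative bucket, and a Python list stands in for the on-flash log.

```python
import hashlib

SLOTS = 4  # 4-way set associative buckets


def _hash(key: bytes, seed: bytes, nbuckets: int) -> int:
    digest = hashlib.sha1(seed + key).digest()
    return int.from_bytes(digest[:4], "big") % nbuckets


class LogStoreSketch:
    def __init__(self, nbuckets=8, max_displacements=16):
        self.nbuckets = nbuckets
        self.max_displacements = max_displacements
        self.buckets = [[] for _ in range(nbuckets)]  # slots: (tag, offset)
        self.log = []  # stands in for the append-only log on flash

    def _candidates(self, key):
        return _hash(key, b"a", self.nbuckets), _hash(key, b"b", self.nbuckets)

    def _find(self, key):
        h1, h2 = self._candidates(key)
        # An entry in bucket h1 carries tag h2, and vice versa.
        for bucket, tag in ((h1, h2), (h2, h1)):
            for i, (slot_tag, offset) in enumerate(self.buckets[bucket]):
                # Only a tag match triggers a "flash read" of the full key.
                if slot_tag == tag and self.log[offset][0] == key:
                    return bucket, i
        return None

    def get(self, key):
        loc = self._find(key)
        if loc is None:
            return None
        bucket, i = loc
        return self.log[self.buckets[bucket][i][1]][1]

    def put(self, key, value):
        """Return False when the index is full (time to freeze and convert)."""
        offset = len(self.log)
        loc = self._find(key)
        if loc is not None:  # update: point the existing slot at the new entry
            bucket, i = loc
            self.buckets[bucket][i] = (self.buckets[bucket][i][0], offset)
            self.log.append((key, value))
            return True
        h1, h2 = self._candidates(key)
        for bucket, tag in ((h1, h2), (h2, h1)):
            if len(self.buckets[bucket]) < SLOTS:
                self.buckets[bucket].append((tag, offset))
                self.log.append((key, value))
                return True
        # Both candidate buckets are full: displace entries cuckoo-style.
        bucket, entry, moves = h1, (h2, offset), []
        for i in range(self.max_displacements):
            slot = i % SLOTS
            victim = self.buckets[bucket][slot]
            self.buckets[bucket][slot] = entry
            moves.append((bucket, slot, victim))
            # The victim's tag is its alternative bucket: no flash read needed.
            entry, bucket = (bucket, victim[1]), victim[0]
            if len(self.buckets[bucket]) < SLOTS:
                self.buckets[bucket].append(entry)
                self.log.append((key, value))
                return True
        for b, s, prev in reversed(moves):  # undo the chain, leave index intact
            self.buckets[b][s] = prev
        return False
```

A `False` return from `put` corresponds to the moment described in the slides when the insertion algorithm gives up after a maximum number of displacements and the LogStore is frozen.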
Questions
(4) Use Figure 3 (Design of LogStore: an in-memory cuckoo hash table (index and filter)) to describe how a PUT request and a GET request are served in a LogStore. In particular, explain how the tag is used in a LogStore.
SILT KEY-VALUE STORAGE SYSTEM
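- Merging each LogStore directly into the much larger SortedStore would cause
high write amplification; keeping several LogStores instead incurs memory overhead.
- Solution: convert each full LogStore into a HashStore, and then
perform a bulk merge of several HashStores.
- Advantage: the in-memory index is eliminated by
reordering the on-flash (key, value) pairs into hash order, which saves memory.
- Hash filter: an efficient in-memory filter used to
reject queries for keys that are not in the store.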
HashStore (memory efficient)
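One way to picture the conversion is the sketch below. It assumes the in-memory index is a list of buckets holding `(tag, log_offset)` slots, with the tag equal to the entry's alternative bucket index; the function names and the explicit `h1`/`h2` arguments are hypothetical. Entries are rewritten in hash-table order so that a slot's position alone locates its entry, and only the tags remain in memory as a filter.

```python
SLOTS = 4  # slots per bucket


def convert_to_hashstore(buckets, log):
    """buckets: list of lists of (tag, log_offset); log: list of (key, value).

    Returns (filter_tags, flash_table). flash_table stands in for the new
    on-flash hash table; filter_tags is the in-memory hash filter.
    """
    filter_tags, flash_table = [], []
    for bucket in buckets:
        for slot in range(SLOTS):
            if slot < len(bucket):
                tag, offset = bucket[slot]
                filter_tags.append(tag)
                flash_table.append(log[offset])  # entry moves to hash order
            else:
                filter_tags.append(None)         # empty slot
                flash_table.append(None)
    return filter_tags, flash_table


def hashstore_get(filter_tags, flash_table, key, h1, h2):
    """h1, h2: the key's two candidate buckets (computed by the caller)."""
    for bucket, tag in ((h1, h2), (h2, h1)):
        for slot in range(SLOTS):
            pos = bucket * SLOTS + slot
            if filter_tags[pos] == tag:          # filter passes: read flash
                stored_key, value = flash_table[pos]
                if stored_key == key:
                    return value
    return None                                  # rejected by the filter


# Hand-built example: key "a" has candidates {0, 1}, "b" has {1, 0}, "c" {1, 1}.
log = [("b", "B"), ("c", "C"), ("a", "A")]       # insertion order
buckets = [[(1, 2)], [(0, 0), (1, 1)]]
tags, table = convert_to_hashstore(buckets, log)
print(table)
print(hashstore_get(tags, table, "b", 0, 1))
```

Because the slot position now implies the on-flash location, the per-entry offsets kept by the LogStore index are no longer needed.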
Questions
(5) Use Figure 4 to explain how a LogStore is converted into a HashStore.
Question
(6) “Once a LogStore fills up (e.g., the insertion algorithm terminates without finding any vacant slot after a maximum number of displacements in the hash table), SILT freezes the LogStore and converts it into a more memory-efficient data structure.” Compared to LogStore, what’s the advantage of HashStore? Why doesn’t SILT create HashStore at the beginning (without first creating LogStore)?
SILT KEY-VALUE STORAGE SYSTEM
SortedStore (kv entries sorted by key on flash)
- HashStores are sorted and merged into the SortedStore.
- Trie structure: each leaf node represents one key, and
the shortest unique prefix of each key serves as the index.
- The trie guarantees a correct index lookup for stored keys, but says nothing
about whether a given key is present.
Figure: (a) each pair of numbers in an internal node gives the number of leaf nodes in its left and right subtries; (b) a recursive representation of the trie; (c) its entropy-coded representation used by SortedStore.
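The recursive representation in (b) can be sketched as follows. This is an illustrative version under assumptions: keys are distinct, equal-length bit strings; a leaf or empty subtrie is written as the marker `-1`, a choice made for this sketch; and the entropy coding of (c) is omitted. The function names are hypothetical.

```python
def build_repr(keys):
    """Recursive trie representation for sorted, distinct bit-string keys."""
    if len(keys) <= 1:
        return [-1]  # leaf (or empty subtrie) marker, an assumption here
    left = [k[1:] for k in keys if k[0] == "0"]
    right = [k[1:] for k in keys if k[0] == "1"]
    # Store only the number of leaves in the left subtrie, then recurse.
    return [len(left)] + build_repr(left) + build_repr(right)


def skip_subtrie(rep, pos):
    """Return the position just past the subtrie that starts at rep[pos]."""
    if rep[pos] == -1:
        return pos + 1
    pos = skip_subtrie(rep, pos + 1)  # skip the left subtrie
    return skip_subtrie(rep, pos)     # then the right subtrie


def lookup_index(rep, key):
    """Index of `key` in the sorted array, assuming the key is present."""
    pos, index, depth = 0, 0, 0
    while rep[pos] != -1:
        left_count = rep[pos]
        if key[depth] == "0":
            pos += 1                          # descend into the left subtrie
        else:
            index += left_count               # all left leaves sort before us
            pos = skip_subtrie(rep, pos + 1)  # descend into the right subtrie
        depth += 1
    return index


keys = ["0010", "0100", "1001", "1010", "1101"]  # already sorted
rep = build_repr(keys)
print(rep)
print([lookup_index(rep, k) for k in keys])
print(lookup_index(rep, "1011"))  # absent key still yields some index
```

The last line shows why the index alone says nothing about presence: an absent key that shares a prefix with a stored key resolves to that key's position, so the entry read from flash must be compared against the requested key.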
Questions
(7) “When fixed-length key-value entries are sorted by key on flash, a trie for the shortest unique prefixes of the keys serves as an index for these sorted data.” While a SortedStore is fully sorted, could you comment on the cost of merging a HashStore with a SortedStore? Compare this cost to the major compaction cost for LevelDB?
- Trade-offs: we need to balance write amplification, read amplification, and memory overhead. For example, using larger tags reduces read amplification by lowering the false positive rate, as does reducing the number of HashStores. However, the HashStores then consume more DRAM due to the larger tags.
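A rough back-of-the-envelope model of this trade-off is sketched below. It rests on assumptions that are not taken from the slides: tags behave as uniformly random values, every slot is occupied, and a lookup compares against 2 candidate buckets of 4 slots each. The function names are hypothetical.

```python
def expected_false_matches(tag_bits, ways=4, candidate_buckets=2):
    """Expected spurious tag matches for one lookup in one HashStore,
    assuming uniformly random tags and fully occupied buckets."""
    return ways * candidate_buckets / 2 ** tag_bits


def expected_wasted_reads(tag_bits, num_hashstores):
    """Expected wasted flash reads for a key that is in none of the HashStores."""
    return num_hashstores * expected_false_matches(tag_bits)


def filter_bytes_per_slot(tag_bits):
    """DRAM spent on the tag of one slot."""
    return tag_bits / 8


for bits in (8, 12, 15, 16):
    print(bits, expected_wasted_reads(bits, num_hashstores=4),
          filter_bytes_per_slot(bits))
```

Under this model each extra tag bit halves the wasted reads while adding one bit of DRAM per slot, and the wasted reads grow linearly with the number of HashStores.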
SILT KEY-VALUE STORAGE SYSTEM
- The bottom-right graph shows the memory consumed by SILT. The bottom-left configuration omits the intermediate HashStore and needs twice as much memory as SILT. The top-right configuration instead omits the SortedStore and consumes four times as much memory. The top-left configuration uses only the basic LogStore and needs 10X as much memory as SILT.
- LogStore construction: entry-by-entry insertion, using 90% of the write bandwidth.
- LogStore-to-HashStore conversion: involves bulk data reads and writes.
- SortedStore construction: involves an external sort over the entire contents of the HashStores.
Conclusion
- SILT combines multiple stores to balance the use of memory, storage, and
computation, forming a memory-efficient, high-performance storage system.
- The integrated system uses partial-key cuckoo hashing and entropy-coded tries
to drastically reduce the amount of memory needed while providing high write speed and high read throughput.
- On average, SILT uses only 0.7 bytes of memory per stored entry and makes only 1.01
flash reads to serve a lookup, which completes within 400 microseconds.