
SLIDE 1

MatrixKV: Reducing Write Stalls and Write Amplification in LSM-tree Based KV Stores with a Matrix Container in NVM

Ting Yao¹, Yiwen Zhang¹, Jiguang Wan¹, Qiu Cui², Liu Tang², Hong Jiang³, Changsheng Xie¹, and Xubin He⁴

¹Huazhong University of Science and Technology, China; ²PingCAP, China; ³University of Texas at Arlington, USA; ⁴Temple University, USA

SLIDE 2

Outline

  • Background and Motivations
  • MatrixKV
  • Evaluation
  • Conclusion

SLIDE 3

LSM-tree based Key-value stores

  • Log-structured merge tree (LSM-tree)
  • Designed for write-intensive scenarios
  • Applications
  • Properties:
    • Batched sequential writes: high write throughput
    • Fast reads
    • Fast range queries


SLIDE 4

LSM-tree and RocksDB

  • Systems with DRAM-SSD storage
  • Exponentially increasing level sizes (AF: the amplification factor between adjacent levels)
  • Operations (a toy code sketch follows the figure below):
    1. Insert
    2. Flush
    3. Compaction between Li and Li+1
  • Compaction proceeds level by level: L0-L1 compaction, L1-L2 compaction, ……

[Figure: SSD-based RocksDB. Insert goes to the MemTable in DRAM; a full MemTable becomes an immutable MemTable and is flushed to L0 on the SSD; compaction merges data down through levels L0, L1, …, Ln]
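The write path above can be condensed into a few lines. Below is a minimal, illustrative toy model (assumed entry-count budgets and simplified merging; not RocksDB's actual API):

```cpp
#include <map>
#include <string>
#include <vector>

// Toy LSM-tree write path: (1) Insert into the MemTable, (2) Flush a full
// MemTable to L0, (3) merge a full level Li into Li+1 (compaction).
// Sizes are counted in entries for simplicity.
class ToyLSM {
  static constexpr size_t kMemTableLimit = 1000;  // assumed flush threshold
  static constexpr size_t kAF = 10;               // size ratio of adjacent levels

  std::map<std::string, std::string> memtable_;   // sorted in-memory buffer
  std::vector<std::map<std::string, std::string>> levels_;  // L0..Ln on "disk"

 public:
  void Insert(const std::string& k, const std::string& v) {
    memtable_[k] = v;                             // 1. Insert
    if (memtable_.size() >= kMemTableLimit) Flush();
  }

 private:
  void Flush() {                                  // 2. Flush: MemTable -> L0
    if (levels_.empty()) levels_.emplace_back();
    for (const auto& kv : memtable_) levels_[0][kv.first] = kv.second;
    memtable_.clear();
    Compact(0);
  }

  void Compact(size_t i) {                        // 3. Compaction: Li -> Li+1
    size_t budget = kMemTableLimit;
    for (size_t l = 0; l <= i; ++l) budget *= kAF;  // levels grow by AF
    if (levels_[i].size() < budget) return;
    if (levels_.size() == i + 1) levels_.emplace_back();
    for (const auto& kv : levels_[i]) levels_[i + 1][kv.first] = kv.second;
    levels_[i].clear();                           // merged and sorted into Li+1
    Compact(i + 1);                               // compaction may cascade
  }
};
```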

SLIDE 5

Challenge 1: Write stall

Write stall: application throughput periodically drops to nearly zero.

  • Unpredictable performance.
  • Long tail latency.

Randomly writing an 80 GB dataset (20 million KV items, 16 B keys and 4 KB values) to SSD-based RocksDB: each stall coincides with an L0-L1 compaction moving about 3.1 GB of data.

SLIDE 6

Root cause of write stall: L0-L1 compaction

[Figure: L0-L1 compaction reads the SSTables of both L0 and L1 from disk, merge-sorts them in memory, and writes the merged result back to L1]

L0-L1 compaction is an all-to-all, coarse-grained compaction: it merges the entire L0 with the entire L1, consuming SSD bandwidth and CPU cycles for long stretches.

SLIDE 7

Challenge 2: Write amplification

Randomly writing an 80 GB dataset (20 million KV items, 16 B keys and 4 KB values) to SSD-based RocksDB: write amplification makes the average throughput decrease gradually.

  • Decreased performance over time.

Increased LSM-tree depth means more compactions and higher WA.

SLIDE 8

Root cause of increased write amplification

[Figure: level-by-level compactions move each flushed SSTable from memory down through L0, L1, L2, …, Ln on disk]

  • Level-by-level compactions: write amplification increases with the depth of the LSM-tree.
  • WA = AF × N, where AF is the amplification factor between two adjacent levels (AF = 10 here) and N is the number of levels.
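Illustrative arithmetic using the slide's numbers: with AF = 10, an LSM-tree that has grown to N = 5 levels gives WA = AF × N = 10 × 5 = 50, i.e., in the worst case each user byte is rewritten about 50 times on its way to the bottom level.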

SLIDE 9

State-of-the-art solution with NVM

NVM is byte-addressable, persistent, and fast! NoveLSM adopts NVM to store a large mutable MemTable: 1.7× higher random-write performance, but more severe write stalls!

[Figure: NoveLSM adds a large mutable MemTable and immutable MemTable in NVM alongside the DRAM MemTables; levels L0…Ln stay on the SSD]

* Sudarsun Kannan, Nitish Bhat, Ada Gavrilovska, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. Redesigning LSMs for Nonvolatile Memory with NoveLSM. In 2018 USENIX Annual Technical Conference (USENIX ATC '18), 2018.

SLIDE 10

Motivation

The all-to-all L0-L1 compaction causes write stalls and unstable performance; the increased LSM-tree depth causes higher write amplification and decreased performance.

MatrixKV: reducing write stalls and write amplification in LSM-tree based KV stores by exploiting NVM.

SLIDE 11

Outline

  • Background and Motivations
  • MatrixKV
  • Evaluation
  • Conclusion

SLIDE 12

Overall Architecture

  • 1. Matrix container in NVM: manages L0's data in NVM
  • 2. Column compaction: a fine-granularity L0-L1 compaction that reduces write stalls
  • 3. Reducing levels on SSD: reduces the number of LSM-tree levels to decrease WA (on SSD)
  • 4. Cross-row hint search: a hint-search algorithm in the matrix container that improves read performance

[Figure: overall architecture. Put flows into the mem/imm MemTables in DRAM; Flush moves data into the matrix container (L0 of the LSM-tree) in NVM via PMDK, where the receiver holds cross-row hints and the compactor drives column compaction; L1, L2, … form an LSM-tree with reduced levels on the SSD, accessed via POSIX]
SLIDE 13

Matrix Container

The matrix container includes a receiver and a compactor (sketched in code after the figure below).

  • The receiver stores flushed data row by row; each flush is organized as a RowTable.
  • A: once filled with RowTables, a receiver turns into a compactor.
  • The compactor compacts data from L0 to L1 on the SSD, column by column.
  • B: after a column compaction, the NVM pages of that column are freed and become available for the receiver to accept new data.

[Figure: the matrix container in NVM. RowTables fill the receiver row by row (flushed from DRAM); the compactor's columns are compacted into the L1 SSTables covering key ranges a-c, c-e, e-n, n-o, u-z; A marks a receiver turning into a compactor, B marks a freed column]
SLIDE 14

RowTable

[Figure (a): RowTable structure. A data region of sorted KV items (k0 v0 … kn vn), and a metadata region: a sorted array whose entry i holds key ki, page number Pi, and the offset within that page]

  • Consists of a data region and a metadata region.
  • Data region: serialized KV items from the immutable MemTable.
  • Metadata region: a sorted array; each entry holds:
    • the key
    • the page number
    • the offset in the page
    • a forward pointer (i.e., $p_n$)
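In code, the layout might look like this sketch (field and type names are assumed, following the figure):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One metadata entry per KV item, kept sorted by key. The forward pointer
// p_n indexes the first entry with key >= this key in the adjacent RowTable,
// which is what cross-row hint search (slide 19) exploits.
struct RowTableEntry {
  std::string key;      // the sorted array is searched by this key
  uint32_t    page;     // NVM page number of the serialized item
  uint32_t    offset;   // offset of the item within that page
  uint32_t    forward;  // forward pointer (p_n) into the adjacent RowTable
};

struct RowTable {
  std::vector<char>          data;      // serialized, sorted KV items
  std::vector<RowTableEntry> metadata;  // sorted metadata array over `data`
};
```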
SLIDE 15

Fine-grained column compaction

  • The non-overlapped L1 is a key space with multiple contiguous key ranges.
  • Example (a code sketch of this selection loop follows slide 17):
    1. Start with key range [0-3].
    2. Compare the amount of compaction data against the compaction threshold.
    3. Below the threshold: add the next subrange (3-5], giving range [0-5].
    4. Still below: add the next subrange (5-8], giving range [0-8].
    5. The threshold is reached: start column compaction on [0-8].

[Figure: keys in the compactor (NVM) above the SSTables of L1 (SSD); the selected column covers the keys overlapping the current compaction range]

SLIDE 16

Fine-grained column compaction (cont.)

[Figure: the same example one step further; the next subrange has been merged, extending the compaction range]

SLIDE 17

Fine-grained column compaction (cont.)

[Figure: the compaction range has grown to [0-8]; L1's key space is partitioned into Range [0-8], Range (8-30], Range …; column compaction starts on the completed column]
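The selection loop behind this example might look as follows; this is a minimal sketch, and `Subrange`, `PickColumn`, and the threshold parameter are illustrative names rather than MatrixKV's real interfaces:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One contiguous key range of the non-overlapped L1, together with the
// amount of L0 (compactor) data that falls into it.
struct Subrange { int lo, hi; size_t bytes; };

// Greedily grow the compaction column: start from L1's first subrange and
// keep appending the next adjacent subrange until the accumulated L0 data
// reaches the compaction threshold, then compact exactly that key range.
// Assumes l1 is non-empty and ordered by key range.
std::pair<int, int> PickColumn(const std::vector<Subrange>& l1,
                               size_t threshold) {
  size_t acc = 0;
  int lo = l1.front().lo, hi = lo;
  for (const Subrange& s : l1) {
    acc += s.bytes;               // compaction data vs. threshold (step 2)
    hi = s.hi;                    // range grows: [0-3] -> [0-5] -> [0-8]
    if (acc >= threshold) break;  // step 5: start column compaction
  }
  return {lo, hi};                // key range of the selected column
}
```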

SLIDE 18

Reducing LSM-tree depth

  • WA = AF × N
  • Flattening the LSM-tree with wider levels keeps AF unchanged while reducing N (illustrative arithmetic follows the figure below).
  • Cost: a larger, unsorted L0 -> addressed by column compaction.
  • Cost: lower search efficiency in L0 -> addressed by cross-row hint search.

[Figure: conventional LSM-tree on SSD: L0 256 MB, L1 256 MB, L2 2.56 GB, L3 25.6 GB, L4 256 GB, L5 2.56 TB. Flattened LSM-tree in MatrixKV: L0 8 GB in NVM, then L1 8 GB, L2 80 GB, L3 800 GB, L4 8 TB on SSD]
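Illustrative arithmetic from the figure: to hold the 80 GB experimental dataset, the conventional tree needs SSD levels L0 through L4 (N = 5), so WA ≈ AF × N = 10 × 5 = 50, whereas the flattened tree fits the same data in L1 and L2 (N = 2), so WA ≈ 10 × 2 = 20. AF is unchanged at 10; the saving comes entirely from the smaller N.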

SLIDE 19

Cross-row hint search

[Figure: RowTable0 through RowTable3 with forward pointers linking each key to the next RowTable; the search for key 12 follows the hints across rows]

  • Constructing hints with forward pointers:
    • for key x in RowTable i, the forward pointer references key y in RowTable i-1,
    • where y is the first key with y ≥ x.
  • The search process follows the forward pointers.
    • E.g., fetch key = 12.
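A minimal sketch of one way the forward pointers can bound the search, following the slide's description; `Entry` mirrors the RowTable sketch from slide 14, and the exact window arithmetic is an assumption, not the paper's code:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Entry { std::string key; size_t forward; };  // forward: hint index

// Walk RowTables from newest (rows[0]) to oldest. In each row, binary-search
// only inside [lo, hi); the forward pointers of the entries bracketing the
// target then bound its possible position in the next-older row.
const Entry* HintSearch(const std::vector<std::vector<Entry>>& rows,
                        const std::string& target) {
  if (rows.empty()) return nullptr;
  size_t lo = 0, hi = rows[0].size();      // full window in the newest row
  for (size_t r = 0; r < rows.size(); ++r) {
    const std::vector<Entry>& row = rows[r];
    size_t l = lo, h = hi;                 // find first key >= target
    while (l < h) {
      size_t m = l + (h - l) / 2;
      if (row[m].key < target) l = m + 1; else h = m;
    }
    if (l < row.size() && row[l].key == target) return &row[l];
    if (r + 1 == rows.size()) break;
    size_t next_size = rows[r + 1].size();
    lo = (l > 0) ? row[l - 1].forward : 0;
    hi = (l < row.size()) ? std::min(row[l].forward + 1, next_size)
                          : next_size;
    if (lo > hi) { lo = 0; hi = next_size; }  // fall back if hints are stale
  }
  return nullptr;  // not in L0; the search continues on the SSD levels
}
```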
SLIDE 20

Evaluation Setup

Comparisons

  • RocksDB-SSD: SSD-based RocksDB
  • RocksDB-L0-NVM: RocksDB with L0 placed in NVM; a system with DRAM, NVM, and SSD (8 GB NVM)
  • NoveLSM: a heterogeneous system with DRAM, NVM, and SSD (8 GB NVM)
  • MatrixKV: a heterogeneous system with DRAM, NVM, and SSD (8 GB NVM)

Test environment

  • OS: 64-bit Linux 4.13.9
  • CPU: 2 × Intel 2.20 GHz processors
  • Memory: 32 GB
  • NVM: 2 × 128 GB Intel Optane DC PMM; FIO 4 KB (MB/s): random 2346 (R) / 1363 (W), sequential 2567 (R) / 1444 (W)
  • SSD: 800 GB Intel SSDSC2BB800G7; FIO 4 KB (MB/s): random 250 (R) / 68 (W), sequential 445 (R) / 354 (W)

SLIDE 21

Random Write Throughput

  • MatrixKV obtains the best performance across all tested value sizes.
  • E.g., at a 4 KB value size, MatrixKV outperforms RocksDB-L0-NVM and NoveLSM by 3.6× and 2.6×, respectively.

SLIDE 22

Write stalls

  • 1. Better random-write throughput.
  • 2. MatrixKV has more stable throughput: write stalls are reduced!

SLIDE 23

Tail Latency

Latency (us)        avg.    90%     99%   99.9%
RocksDB-SSD          974    566   11055   17983
NoveLSM              450    317    2080    2169
RocksDB-L0-NVM       477    528     786    1112
MatrixKV             263    247     405     663

  • MatrixKV obtains the lowest latency in all cases.
  • E.g., the 99% latency of MatrixKV is 27×, 5×, and 1.9× lower than that of RocksDB-SSD, NoveLSM, and RocksDB-L0-NVM, respectively.

SLIDE 24

Fine-granularity column compaction

Why does MatrixKV reduce write stalls?

  • 467 column compactions in total
  • 0.33 GB each on average
SLIDE 25

Write amplification

The WA of randomly writing an 80 GB dataset, where WA = the amount of data written to SSDs / the amount of data written by users.

  • MatrixKV's WA is 3.43.
  • MatrixKV reduces the number of compactions with its flattened LSM-tree.
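For example, at WA = 3.43, randomly writing the 80 GB dataset causes roughly 3.43 × 80 GB ≈ 274 GB of actual SSD writes.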

SLIDE 26

Summary

Conventional SSD-based KV stores suffer from:

  • unpredictable performance due to write stalls
  • degraded performance due to WA

MatrixKV: an LSM-tree based KV store for systems with DRAM, NVM, and SSD storage

  • Matrix container in NVM
  • Column compaction
  • Cross-row hint search
  • Reducing levels on SSD

MatrixKV reduces write stalls and improves write performance.

SLIDE 27

Thanks!

Open-source code: https://github.com/PDS-Lab/MatrixKV
Email: tingyao@hust.edu.cn
