  1. Scheduling Problems in Write-Optimized Key-Value Stores
     Prashant Pandey (1), Michael A. Bender (1), Rob Johnson (1, 2)
     (1) Stony Brook University, NY; (2) VMware Research

  2. Key-Value Stores are Ubiquitous
     [Figure: example key-value pairs (K1, Rob), (K2, Michael), (K3, Don), (K4, Bill), (K5, Jun), (K6, Yang)]
     ● Can store and retrieve <key, value> pairs.
     ● KV stores are building blocks of databases, file systems, etc.
     ● Examples: B-trees, hash tables, etc.

  3. Write-Optimized Key-Value Stores
     ● State-of-the-art key-value stores are write-optimized.
     ● That is, they move data in batches.
     ● Batching amortizes the I/O cost of moving data.
     ● Write-optimized trees are designed for external memory.
     ● Examples: Bε-trees and log-structured merge trees (LSM-trees).

  4. Main idea of this talk: how should we schedule these batch data moves?

  5. Outline
     ● Bε-trees and their operations
     ● Analysis of operations
     ● Tradeoff between latency and I/O efficiency
     ● Scheduling problem in batch data moves

  6-19. Insert Operation in a Bε-tree
     [Figure: a Bε-tree; each internal node holds $B^{\varepsilon}$ pivots and a message buffer of $B - B^{\varepsilon}$ messages]
     ● When a node's buffer is full, its $B - B^{\varepsilon}$ messages are spread over about $B^{\varepsilon}$ children. So at least one child has $(B - B^{\varepsilon}) / B^{\varepsilon} \approx B^{1-\varepsilon}$ pending messages, and a flush to that child moves at least that many.
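
The insert-and-flush rule above can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not the authors' implementation:

- The names `Node`, `insert` and `flush` are hypothetical.
- Leaves simply apply messages and never split.
- A full buffer flushes to the child with the most pending messages. By the pigeonhole argument, that child receives at least $(B - B^{\varepsilon}) / B^{\varepsilon}$ messages.

```python
from bisect import bisect_right


class Node:
    """Toy B^eps-tree node (hypothetical): pivots route keys, the buffer holds pending messages."""

    def __init__(self, pivots=None, children=None):
        self.pivots = pivots or []      # sorted keys separating the children
        self.children = children or []  # len(pivots) + 1 children for internal nodes
        self.buffer = []                # pending (key, value) insert messages
        self.items = {}                 # applied key-value pairs (leaves only)

    def is_leaf(self):
        return not self.children

    def child_index(self, key):
        return bisect_right(self.pivots, key)


def insert(node, key, value, capacity, flush_log):
    """Add an insert message; when the buffer is full, flush to one child."""
    if node.is_leaf():
        node.items[key] = value  # assumption: leaves apply messages directly, no splits
        return
    node.buffer.append((key, value))
    if len(node.buffer) >= capacity:
        flush(node, capacity, flush_log)


def flush(node, capacity, flush_log):
    """Move every buffered message bound for the most popular child into that child."""
    groups = {}
    for key, value in node.buffer:
        groups.setdefault(node.child_index(key), []).append((key, value))
    # Pigeonhole: the largest group has at least len(buffer) / fanout messages.
    target = max(groups, key=lambda i: len(groups[i]))
    moved = groups[target]
    flush_log.append(len(moved))
    node.buffer = [m for m in node.buffer if node.child_index(m[0]) != target]
    for key, value in moved:  # buffer order is kept, so newer values win
        insert(node.children[target], key, value, capacity, flush_log)


# Hypothetical parameters: B = 64, eps = 1/2.
B, eps = 64, 0.5
fanout = round(B ** eps)   # B^eps = 8 children
capacity = B - fanout      # B - B^eps = 56 buffered messages
leaves = [Node() for _ in range(fanout)]
root = Node(pivots=[100 * i for i in range(1, fanout)], children=leaves)
log = []
for k in range(800):
    insert(root, (37 * k) % 800, f"v{k}", capacity, log)  # 800 distinct keys
print("smallest flush moved", min(log), "messages; bound is", capacity / fanout)
```

Flushing only the largest group keeps every flush at or above the pigeonhole bound. The other messages stay buffered at the root.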

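  20. Query Operation in a Bε-tree
     [Figure: a query follows the root-to-leaf path for its key and returns the Result; each node holds $B^{\varepsilon}$ pivots and a $B - B^{\varepsilon}$ message buffer]
     ● Pending messages for the key may sit in any buffer on that path, so the query checks each node's buffer as well as the leaf.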
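
A query has to look in the message buffers as well as in the leaves. In a Bε-tree, messages only flow downward, so the sketch below assumes that newer messages sit higher in the tree and later within a buffer. It makes these further simplifications, all hypothetical:

- Only insert messages are modeled.
- The dictionary node layout is illustrative.
- The `query` and `leaf` helpers are not from the talk.

```python
from bisect import bisect_right


def query(node, key):
    """Return the newest value for `key`, or None if it is absent."""
    while True:
        # Newer messages sit higher in the tree and later in each buffer,
        # so scan this node's buffer from newest to oldest first.
        for k, v in reversed(node["buffer"]):
            if k == key:
                return v
        if not node["children"]:
            return node["items"].get(key)  # reached the leaf
        node = node["children"][bisect_right(node["pivots"], key)]


def leaf(items):
    return {"pivots": [], "children": [], "buffer": [], "items": items}


# Hypothetical two-level tree: the root buffer still holds unflushed updates.
root = {
    "pivots": [10],
    "children": [leaf({1: "old1", 5: "old5"}), leaf({12: "old12"})],
    "buffer": [(5, "new5"), (12, "mid12"), (12, "new12")],
    "items": {},
}
print(query(root, 5), query(root, 1), query(root, 12), query(root, 7))  # new5 old1 new12 None
```

Each buffer on the path adds work to a query, which is why the query column in the cost table depends on the tree height.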

  21. Bε-tree
     ● $0 < \varepsilon < 1$.
     ● Each internal node has a message buffer of size $B - B^{\varepsilon}$ and $B^{\varepsilon}$ pivots, giving about $B^{\varepsilon}$ children.
     ● The tree has about $N/B$ leaves and height $O(\log_{B^{\varepsilon}} N)$.
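
The height bound on this slide can be rewritten in base $B$, which is the form used by the operation costs on the next slide:

```latex
\text{height} \approx \log_{B^{\varepsilon}} \frac{N}{B}
  \;\le\; \log_{B^{\varepsilon}} N
  = \frac{\ln N}{\ln B^{\varepsilon}}
  = \frac{\ln N}{\varepsilon \ln B}
  = \frac{\log_B N}{\varepsilon},
\qquad \text{e.g. } \varepsilon = \tfrac{1}{2} \;\Rightarrow\; \text{height} \le 2 \log_B N .
```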

  22. Performance Model
     ● How computation works
       ○ Data is transferred in blocks of size B between RAM and disk.
       ○ The number of block transfers dominates the running time.
     ● Goal: minimize the number of block transfers.
       ○ Performance bounds are parameterized by block size B, memory size M, and data size N.
     [Figure: RAM of size M exchanging blocks of size B with disk]
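
To make "number of block transfers" concrete, the sketch below counts transfers for a sequence of item accesses. Its assumptions are illustrative:

- RAM holds M items (M/B blocks), managed with LRU eviction as a stand-in for the model's cache.
- Only reads are counted.
- `count_transfers` and the sizes are hypothetical.

```python
import random
from collections import OrderedDict


def count_transfers(accesses, B, M):
    """Count block transfers when RAM holds M items (M // B blocks), LRU-evicted."""
    cache = OrderedDict()
    capacity = M // B
    transfers = 0
    for i in accesses:
        block = i // B                      # the disk block that holds item i
        if block in cache:
            cache.move_to_end(block)        # hit: block already in RAM, no transfer
        else:
            transfers += 1                  # miss: one block transfer from disk
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used block
    return transfers


N, B, M = 1_000_000, 1_000, 100_000
sequential = count_transfers(range(N), B, M)  # scan: N / B transfers
rng = random.Random(0)
scattered = count_transfers((rng.randrange(N) for _ in range(10_000)), B, M)
print(sequential, scattered)
```

A sequential scan pays one transfer per B items, while scattered accesses pay roughly one transfer each. This gap is why moving data in batches amortizes I/O cost.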

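  23. Operations
     Data structure      Insert                                            Query                         Range query (k results)
     B-tree              $O(\log_B N)$                                     $O(\log_B N)$                 $O(\log_B N + k/B)$
     Bε-tree             $O(\log_B N / (\varepsilon B^{1-\varepsilon}))$   $O(\log_B N / \varepsilon)$   $O(\log_B N / \varepsilon + k/B)$
     Bε-tree (ε = 1/2)   $O(\log_B N / \sqrt{B})$                          $O(\log_B N)$                 $O(\log_B N + k/B)$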
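
Plugging numbers into the table shows how ε trades insert cost against query cost. The sketch sets every hidden constant to 1, so the values are only relative. The sizes $N = 2^{40}$, $B = 2^{12}$ and ε = 1/2 are hypothetical choices.

```python
import math


def io_costs(N, B, eps):
    """Bounds from the table with every constant factor set to 1 (illustration only)."""
    log_b_n = math.log(N, B)
    return {
        "B-tree insert": log_b_n,
        "B-tree query": log_b_n,
        "Be-tree insert": log_b_n / (eps * B ** (1 - eps)),
        "Be-tree query": log_b_n / eps,
    }


# Hypothetical sizes: 2^40 items, 2^12 items per block, eps = 1/2.
costs = io_costs(N=2**40, B=2**12, eps=0.5)
for name, ios in costs.items():
    print(f"{name:15s} {ios:.3f}")
# Inserts get eps * B^(1 - eps) = 32 times cheaper; queries get 1 / eps = 2 times costlier.
```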

  25-27. Moving More than $B^{1-\varepsilon}$ Messages in a Flush
     [Figure: a flush from a node's message buffer to one child in the Bε-tree]
     ● Flushing more than $B^{1-\varepsilon}$ messages to a child in a single flush reduces the I/O cost per insert, because the flush's I/O cost is shared by more messages.
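
The claim on slides 25-27 follows from a simple amortization: each flush costs a roughly fixed number of I/Os, so moving more messages per flush lowers the cost charged to each message. The sketch makes two hypothetical assumptions:

- Each flush costs 2 I/Os.
- Each message is flushed once per level.

With $B^{1-\varepsilon}$ messages per flush, it reproduces the $O(\log_B N / (\varepsilon B^{1-\varepsilon}))$ insert bound up to constants.

```python
import math


def per_insert_ios(N, B, eps, messages_per_flush, ios_per_flush=2):
    """Amortized I/Os per insert (constants illustrative).

    Assumes every message is flushed once per level, the tree has
    log_B(N) / eps levels, and one flush costs `ios_per_flush` I/Os
    (hypothetically: read the child, write it back)."""
    levels = math.log(N, B) / eps
    return levels * ios_per_flush / messages_per_flush


N, B, eps = 2**40, 2**12, 0.5
minimum = B ** (1 - eps)                          # B^(1 - eps) = 64 messages
base = per_insert_ios(N, B, eps, minimum)
bigger = per_insert_ios(N, B, eps, 4 * minimum)   # flush 4x as many messages
print(f"{base:.4f} vs {bigger:.4f} I/Os per insert")
```

Quadrupling the batch cuts the per-insert cost by four. The next slides show the price: larger batches can overfill children.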

  29-33. Avalanche
     [Figure: a sequence of flushes cascading down the Bε-tree]
     ● If a flush overfills a child's buffer, that child must flush in turn, and the cascade can continue down the tree.
     ● An avalanche can increase the latency of an operation.
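
An avalanche can be illustrated with a single root-to-leaf path of buffer loads. This is a hypothetical worst-case model, not a measurement. It assumes that:

- Every message routes down the same path.
- An overfull node flushes everything it holds.
- Leaves never split.

`flush_down` (a hypothetical helper) returns the number of flushes the triggering operation waits on, as a proxy for its latency.

```python
def flush_down(loads, capacity, level, batch):
    """Flush `batch` messages from `level` to `level + 1` along one root-to-leaf path.

    Hypothetical worst case: every message routes down the same path, and an
    overfull node immediately flushes everything it holds to the next level.
    Returns how many flushes the triggering operation had to wait for."""
    flushes = 0
    while level + 1 < len(loads):
        loads[level] -= batch
        loads[level + 1] += batch
        flushes += 1
        level += 1
        if loads[level] <= capacity:
            break                 # the child absorbed the batch: cascade stops
        batch = loads[level]      # the child is overfull and must flush too
    return flushes                # the last entry is a leaf; splits are not modeled


capacity = 56                                             # B - B^eps for B = 64, eps = 1/2
small = flush_down([56, 40, 40, 40, 0], capacity, 0, 7)   # (B - B^eps) / B^eps = 7 messages
large = flush_down([56, 40, 40, 40, 0], capacity, 0, 56)  # the whole root buffer
print(small, large)  # 1 4
```

The small flush stops at the first child. The large one overfills every level on the path, so the operation that triggered it waits for four flushes instead of one.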

  34. Flushing Tradeoff
     ● Flushing too few messages to a child wastes I/O and gives sub-optimal I/O performance.
     ● Flushing too many messages to a child can cause an avalanche.

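  35. Scheduling Problem
     ● We now have a scheduling problem.
     ● Flushes are scheduled every $\varepsilon B^{1-\varepsilon} / \log_B N$ inserts. With O(1) I/Os per flush, this matches the amortized insert cost of $O(\log_B N / (\varepsilon B^{1-\varepsilon}))$.
     ● We can allow nodes to grow larger temporarily.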
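
The flush interval on this slide is the reciprocal of the amortized Bε-tree insert cost, assuming each flush costs O(1) I/Os. The helper name and the sizes $N = 2^{40}$, $B = 2^{12}$, ε = 1/2 are hypothetical.

```python
import math


def inserts_per_flush(N, B, eps):
    """Inserts between scheduled flushes: eps * B^(1 - eps) / log_B N."""
    return eps * B ** (1 - eps) / math.log(N, B)


N, B, eps = 2**40, 2**12, 0.5
interval = inserts_per_flush(N, B, eps)
amortized_insert_cost = math.log(N, B) / (eps * B ** (1 - eps))
print(f"one flush every {interval:.1f} inserts")  # one flush every 9.6 inserts
```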

  36. Is there a flushing schedule (which point to flush from, and which child to flush to) that bounds the maximum size of a node?

  37. Possible Strategies for Picking the Child to Flush To
     ● Pick the child to which the most messages can be flushed.
     ● Pick the largest child, then find the sub-child to which it can flush messages, shrinking that child without causing an avalanche.
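
One way to experiment with the open question is a small simulation. The harness below, `simulate_greedy`, is hypothetical:

- It models a three-level tree with uniformly random keys and tracks only message counts.
- It runs exactly one flush every `interval` inserts.
- It interprets the first strategy as a global greedy choice of the (node, child) pair with the most pending messages.

It reports the largest buffer size reached. A simulation like this cannot prove that any bound exists.

```python
import random


def simulate_greedy(fanout, inserts, interval, seed=0):
    """Toy 3-level tree: root -> `fanout` internal nodes -> leaves.

    pending[n][c] counts messages buffered at node n that are bound for child c
    (node 0 is the root; nodes 1..fanout are its children). Every `interval`
    inserts, exactly one flush runs, chosen greedily (strategy 1): the
    (node, child) pair with the most pending messages. Returns the largest
    buffer size any internal node reached."""
    rng = random.Random(seed)
    pending = [[0] * fanout for _ in range(fanout + 1)]
    largest = 0
    for t in range(1, inserts + 1):
        pending[0][rng.randrange(fanout)] += 1      # new message enters the root
        largest = max(largest, sum(pending[0]))     # the root peaks right before a flush
        if t % interval == 0:
            node, child = max(
                ((n, c) for n in range(fanout + 1) for c in range(fanout)),
                key=lambda nc: pending[nc[0]][nc[1]],
            )
            moved = pending[node][child]
            pending[node][child] = 0
            if node == 0:  # root -> internal node child + 1, keys spread at random
                for _ in range(moved):
                    pending[child + 1][rng.randrange(fanout)] += 1
            # flushes out of internal nodes land in leaves, which absorb them
        largest = max(largest, max(sum(row) for row in pending))
    return largest


for interval in (1, 2, 4):
    print(interval, simulate_greedy(fanout=8, inserts=5_000, interval=interval))
```

You can compare strategies by replacing the `max(...)` choice rule, for example with a version of the second strategy above.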

  39. Thank You!

  40. Abstract
     Write-optimized key-value stores, such as Bε-trees, are the state of the art among key-value stores. Bε-trees move data in batches, thereby amortizing the I/O cost of moving data. In practice, batch data moves create an inherent tension between operation latency and I/O bandwidth utilization in Bε-trees. This talk presents an open problem: how to schedule batch data moves in a Bε-tree.
