
Scaling Log-Structured KV-Stores featuring Monkey and Dostoevsky

SIGMOD17 / SIGMOD18, Niv Dayan


  1. Scaling Log-Structured KV-Stores featuring Monkey and Dostoevsky SIGMOD17 / SIGMOD18 Niv Dayan

  2. Log-Structured KV-Stores

  3. Log-Structured KV-Stores

  4. Why Log-Structured KV-Stores?

  5. Why Log-Structured KV-Stores? fast writes

  6. Why Log-Structured KV-Stores? memory storage

  7. Why Log-Structured KV-Stores?

  8. Why Log-Structured KV-Stores?

  9. Why Log-Structured KV-Stores? byte-addressable, block-addressable

  10. write data

  11. write data

  12. write data

  13. In-Place Writes write data

  14. In-Place Writes B-trees write data

  15. In-Place Writes B-trees write data

  16. Log-Structured Writes

  17. Log-Structured Writes buffer writes

  18. Log-Structured Writes buffer writes

  19. Log-Structured Writes buffer writes

  20. Log-Structured Writes buffer writes

  21. Log-Structured Writes buffer writes

  22. Log-Structured KV-Stores fast writes buffer writes

  23. Log-Structured KV-Stores fast writes fast reads massive data

  24. Background

  25. Background buffer The Log-Structured Merge-Tree

  26. Background buffer LSM-tree

  27. buffer

  28. writes buffer

  29. key value pairs buffer

  30. buffer: key, value; Sherlock: a fictional detective; Waldo: an inconspicuous traveler

  31. buffer gets full

  32. buffer (level 0): sort & flush to level 1

  33. buffer (level 0): sort & flush to level 1 … sorted runs

  34. levels 0 (buffer), 1, 2: sort-merge

  35. levels 0 (buffer), 1, 2, 3: exponentially increasing capacities; one I/O per run
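The buffer, sort-and-flush, and sort-merge steps on the slides above can be sketched as a toy leveled LSM-tree. This is a minimal illustration, not the design of any real store: the class name `TinyLSM`, the default capacities, and the rule of pushing a whole overflowing level down are assumptions made for the example.

```python
from bisect import bisect_left


class TinyLSM:
    """Toy leveled LSM-tree: an in-memory buffer plus one sorted run per level."""

    def __init__(self, buffer_capacity=2, size_ratio=2):
        self.buffer_capacity = buffer_capacity  # entries the buffer holds (assumed)
        self.size_ratio = size_ratio            # R: capacity growth factor per level
        self.buffer = {}                        # level 0, unsorted
        self.levels = []                        # levels[i]: sorted list of (key, value)

    def _capacity(self, index):
        # Capacities grow exponentially: buffer_capacity * R^(index + 1).
        return self.buffer_capacity * self.size_ratio ** (index + 1)

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_capacity:
            self._flush()

    def _flush(self):
        run = sorted(self.buffer.items())       # sort & flush
        self.buffer = {}
        index = 0
        while True:
            if index == len(self.levels):
                self.levels.append([])
            merged = self._merge(newer=run, older=self.levels[index])
            if len(merged) <= self._capacity(index):
                self.levels[index] = merged
                return
            # The level overflowed: empty it and sort-merge into the next one.
            self.levels[index] = []
            run = merged
            index += 1

    @staticmethod
    def _merge(newer, older):
        merged = dict(older)
        merged.update(newer)                    # newer entries win on duplicate keys
        return sorted(merged.items())

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        for run in self.levels:                 # smallest (most recent) level first
            pos = bisect_left(run, (key,))      # binary search within the sorted run
            if pos < len(run) and run[pos][0] == key:
                return run[pos][1]
        return None
```

For example, after inserting `"Sherlock"` and `"Waldo"` plus four more keys with the default settings, level 1 has overflowed and all six entries sit in a single sorted run at level 2, where `get("Waldo")` finds the entry by binary search.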

  36. where’s Waldo? levels 0 (buffer), 1, 2, 3: binary searching

  37. where’s Waldo? levels 0 (buffer), 1, 2, 3: pointers, one I/O per run

  38. where’s Waldo? levels 0 (buffer), 1, 2, 3: Bloom filters, pointers

  39. where’s Waldo? Bloom filters, pointers; level 1: true negative

  40. where’s Waldo? Bloom filters, pointers; level 1: true negative; level 2: false positive

  41. where’s Waldo? Bloom filters, pointers; level 1: true negative; level 2: false positive; level 3: true positive

  42. levels 0 (buffer), 1, 2, 3: Bloom filters, pointers; merging frequency
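The lookup walk-through above (true negative, false positive, true positive) can be sketched with a small Bloom filter per run. This is an illustrative sketch only: the `BloomFilter` class, its SHA-256 based hashing, and the `lookup` helper that models each run as a Python dict are assumptions for the example, and one probed run is counted as one I/O as on the slides.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; hashing via salted SHA-256 is an illustrative choice."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means definitely absent; True means present or a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


def lookup(key, runs):
    """Probe runs from the smallest level down; return (value, number_of_ios).

    `runs` is a list of (bloom_filter, dict) pairs standing in for sorted runs.
    """
    ios = 0
    for bloom, data in runs:
        if not bloom.might_contain(key):
            continue                     # negative: skip the run, no I/O
        ios += 1                         # positive: pay one I/O to read the run
        if key in data:
            return data[key], ios        # true positive
        # otherwise a false positive: the I/O was wasted, keep going
    return None, ios
```

A filter never returns a false negative, so a key that exists is always found; the number of wasted I/Os depends on how many filters above it return false positives.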

  43. merging writes reads

  44. merging writes reads

  45. merging: Leveling (read-optimized), Tiering (write-optimized)

  46. Leveling Tiering read-optimized write-optimized

  47. Leveling Tiering read-optimized write-optimized gather

  48. Leveling Tiering read-optimized write-optimized gather merge & flush

  49. Leveling Tiering read-optimized write-optimized gather

  50. Leveling Tiering read-optimized write-optimized gather merge

  51. Leveling Tiering read-optimized write-optimized gather merge flush

  52. Leveling Tiering read-optimized write-optimized gather merge

  53. Leveling (read-optimized), Tiering (write-optimized): $\log_R(N)$ levels

  54. Leveling (read-optimized): 1 run per level; Tiering (write-optimized): R runs per level; $\log_R(N)$ levels, R = size ratio

  55. Leveling (read-optimized): 1 run per level; Tiering (write-optimized): R runs per level; $\log_R(N)$ levels, R = size ratio

  56. Leveling (read-optimized): 1 run per level; Tiering (write-optimized): R runs per level; size ratio R

  57. Leveling (read-optimized): 1 run per level; Tiering (write-optimized): 1 run per level; size ratio R

  58. Leveling (read-optimized): 1 run per level; Tiering (write-optimized): T runs per level; size ratio R

  59. Leveling (read-optimized): 1 run per level, sorted array; Tiering (write-optimized): $O(N)$ runs per level, log; size ratio R

  60. log, Tiering, Leveling, sorted array

  61. log, Tiering, Leveling, sorted array; size ratio R

  62. log, Tiering, Leveling, sorted array; size ratio R

  63. log (R), Tiering, Leveling, sorted array (R); size ratio R
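The effect of the size ratio R on the two merge policies can be tabulated with a few lines of arithmetic. The sketch below takes the slides' counts at face value (1 run per level for leveling, R runs per level for tiering); the function name `lsm_shape` and the choice to count levels relative to the buffer capacity are assumptions for illustration.

```python
def lsm_shape(num_entries, buffer_entries, size_ratio):
    """Return the number of levels and the worst-case runs a point lookup may probe.

    Levels are counted with integer arithmetic: keep multiplying the capacity
    by the size ratio until it can hold all entries (roughly log_R(N / buffer)).
    """
    levels = 0
    capacity = buffer_entries
    while capacity < num_entries:
        capacity *= size_ratio
        levels += 1
    levels = max(levels, 1)
    return {
        "levels": levels,
        "leveling_runs": levels,              # 1 run per level
        "tiering_runs": levels * size_ratio,  # R runs per level
    }


if __name__ == "__main__":
    for ratio in (2, 4, 32):
        print(ratio, lsm_shape(num_entries=2**20, buffer_entries=2**10, size_ratio=ratio))
```

Raising the size ratio reduces the number of levels for both policies, but under tiering each remaining level holds more runs, which is the read versus write trade-off the slides describe.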

  64. Monkey Dostoevsky

  65. Monkey: Optimal Navigable Key-Value Store SIGMOD17

  66. Monkey: Optimal Navigable Key-Value Store SIGMOD17 Niv Dayan, Manos Athanassoulis, Stratos Idreos

  67. Monkey: Optimal Navigable Key-Value Store SIGMOD17 Bloom filters, data

  68. Bloom filters, data: x bits/entry at each level

  69. Bloom filters, data: x bits/entry at each level

  70. Bloom filters, data; false positive rate: $O(e^{-x})$ at each level

  71. Bloom filters; false positive rate: $O(e^{-x})$ at each level, summing to $O(e^{-x} \cdot \log_R(N))$ I/O

  72. Bloom filters; false positive rate: $O(e^{-x})$ at each level, summing to $O(e^{-x} \cdot \log_R(N))$ I/O

  73. Bloom filters; false positive rate: $O(e^{-x})$ at each level; most memory

  74. Bloom filters; false positive rate: $O(e^{-x})$ at each level; most memory saves at most 1 I/O!

  75. reallocate

  76. reallocate

  77. same memory - fewer false positives reallocate

  78. relax false positive rates: $0 < p_0 < 1$, $0 < p_1 < 1$, $0 < p_2 < 1$

  79. relax, model: false positive rates $0 < p_0 < 1$, $0 < p_1 < 1$, $0 < p_2 < 1$; read cost $= f(p_0, p_1, …)$; memory footprint $= f(p_0, p_1, …)$

  80. relax, model: false positive rates $0 < p_0 < 1$, $0 < p_1 < 1$, $0 < p_2 < 1$; read cost $= \sum_{i=1}^{L} p_i$; memory footprint $= -\frac{N}{\ln(2)^2} \sum_{i=1}^{L} \frac{\ln(p_i)}{T^{L-i}}$

  81. relax, model, optimize: false positive rates $0 < p_0 < 1$, $0 < p_1 < 1$, $0 < p_2 < 1$; read cost $= \sum_{i=1}^{L} p_i$; memory footprint $= -\frac{N}{\ln(2)^2} \sum_{i=1}^{L} \frac{\ln(p_i)}{T^{L-i}}$; in terms of $p_0, p_1, …$

  82. false positive rate: $p_0 \approx O(e^{-x}/R^2)$, $p_1 \approx O(e^{-x}/R^1)$, $p_2 \approx O(e^{-x}/R^0)$

  83. false positive rate: $O(e^{-x}/R^2)$, $O(e^{-x}/R^1)$, $O(e^{-x}/R^0)$; geometric progression $= O(e^{-x})$ I/O

  84. $O(e^{-x} \cdot \log_R(N)) > O(e^{-x})$ I/O

  85. $O(e^{-x} \cdot \log_R(N))$, $O(e^{-x})$ I/O

  86. read latency (ms) against number of entries (log scale): RocksDB $O(e^{-x} \cdot \log_R(N))$, Monkey $O(e^{-x})$ I/O
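The reallocation argument can be checked numerically with the model from the slides: read cost is the sum of the per-level false positive rates, and filter memory is proportional to the sum of ln(p_i) divided by the size ratio raised to (L - i). The sketch below is an assumption-laden illustration rather than Monkey's implementation: the function names are hypothetical, the size ratio (T in the slide formula, R elsewhere) is called `size_ratio`, the uniform rate uses the standard Bloom filter estimate e^(-x ln(2)^2) for x bits per entry, and the geometric rates are scaled so that both allocations use the same memory.

```python
import math


def filter_memory_bits(fprs, num_entries, size_ratio):
    """Memory footprint model: -N / ln(2)^2 * sum_i ln(p_i) / T^(L - i), i = 1..L."""
    num_levels = len(fprs)
    total = sum(
        math.log(p) / size_ratio ** (num_levels - i)
        for i, p in enumerate(fprs, start=1)
    )
    return -num_entries / math.log(2) ** 2 * total


def uniform_fprs(bits_per_entry, num_levels):
    """Same bits per entry at every level, hence the same false positive rate."""
    p = math.exp(-bits_per_entry * math.log(2) ** 2)
    return [p] * num_levels


def geometric_fprs(bits_per_entry, num_levels, size_ratio):
    """Rates p_i = p_L / T^(L - i), scaled to match the uniform memory footprint."""
    p = math.exp(-bits_per_entry * math.log(2) ** 2)
    weights = [size_ratio ** -j for j in range(num_levels)]  # j = L - i
    s = sum(weights)
    w = sum(j * weight for j, weight in enumerate(weights))
    p_largest = p * size_ratio ** (w / s)  # rate at the largest level
    return [p_largest / size_ratio ** (num_levels - i) for i in range(1, num_levels + 1)]


if __name__ == "__main__":
    uniform = uniform_fprs(bits_per_entry=10, num_levels=10)
    geometric = geometric_fprs(bits_per_entry=10, num_levels=10, size_ratio=2)
    print("memory (uniform):  ", filter_memory_bits(uniform, 1e6, 2))
    print("memory (geometric):", filter_memory_bits(geometric, 1e6, 2))
    print("read cost (uniform):  ", sum(uniform))
    print("read cost (geometric):", sum(geometric))
```

With equal memory, the geometric allocation gives a smaller sum of false positive rates than the uniform one, and that sum is dominated by the largest level instead of growing with the number of levels.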

  87. Existing Monkey

  88. Existing Monkey Dostoevsky

  89. tiering Monkey leveling

  90. I/O overheads with leveling: point, long range, short range, writes

  91. point: false positive rates exponentially decreasing: $O(e^{-x}/R^2)$, $O(e^{-x}/R)$, $O(e^{-x})$

  92. point: false positive rates $O(e^{-x}/R^2)$, $O(e^{-x}/R)$, $O(e^{-x})$; largest level

  93. point, long range, short range, writes; largest level: $O(e^{-x})$
