SLIDE 1

Introducing LSM-tree into PostgreSQL, making it as a data gobbler

By Shichao Jin

From https://zhuanlan.zhihu.com/p/56259025
SLIDE 2

About Me

  • 8 years experience in DBMS R&D
  • AWS Redshift and Facebook RocksDB
  • Ex-PhD student at UWaterloo
  • Fan of PostgreSQL
  • Founder of VidarDB


SLIDE 3

Outline


  • Experiment & Analysis
  • LSM-tree Introduction
  • Our Implementation
SLIDE 4

Background

[Diagram: DBMS → storage engine → data structure]

  • The storage engine accounts for a big chunk of DBMS performance
  • What is a "storage engine"? The incarnation of a data structure

SLIDE 5

Background


  • B+tree/B-tree: InnoDB, BerkeleyDB, WiredTiger
  • LSM-tree: LevelDB, RocksDB
SLIDE 6

Experiment Setting

Hardware:
  • CPU: Xeon E3-1265L, 2.5 GHz, 4 cores
  • Disk: 7200 RPM
  • RAM: 16 GB DDR3

Software:
  • TPC-H 10 GB, shuffled!
  • Ubuntu 18.04, PostgreSQL 11.6, RocksDB 6.2, default settings
  • libpqxx
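The load driver itself is not shown in the slides; below is a minimal, hypothetical libpqxx sketch of how shuffled rows could be inserted (connection string, table, and column names are invented for illustration, not the actual benchmark code).

```cpp
// Hypothetical bulk-load sketch via libpqxx: rows arrive in shuffled
// (non-primary-key) order and are inserted with a parameterized statement.
#include <pqxx/pqxx>
#include <iostream>

int main() {
    try {
        pqxx::connection conn("dbname=tpch");   // assumed connection string
        pqxx::work txn(conn);
        // One shuffled row; the real loader would loop over the TPC-H data.
        txn.exec_params(
            "INSERT INTO lineitem_demo (l_orderkey, l_comment) VALUES ($1, $2)",
            42, "shuffled row");
        txn.commit();
    } catch (const std::exception &e) {
        std::cerr << e.what() << std::endl;
        return 1;
    }
    return 0;
}
```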

SLIDE 7

Experiment Results


SLIDE 8

Why huge difference?


Data structure! PostgreSQL uses a B+tree. So what's wrong with the B+tree?

SLIDE 9

B+tree Reading Pattern


  • Rare case: data arrive in order
  • Common case: data arrive in random order
  • Red rectangle = the disk page is cached

SLIDE 10

Problem with B+tree

I only talk about the insertion case here, but the root cause is the reading pattern of the B+tree.

  • Every insertion into a B+tree triggers random reads to locate the exact position in the tree. Note: reads use the cache!

SLIDE 11

Problem with B+tree (Cont.)


The data is large and usually does not arrive in primary-key order, so the previously cached disk blocks are useless! B+tree insertion causes too many disk accesses!

SLIDE 12

Disk vs. SSD vs. RAM in my laptop

[Table: sequential vs. random access speed of magnetic disk, SSD, and RAM; the measured values are not recoverable from the transcript]

SLIDE 13

Solution

Do not touch the disk (except the WAL) on insertion! Tiered design. The LSM-tree (Log-Structured Merge-tree) comes to the rescue!

[Diagram: tiered design — RAM on top of SSD/Disk]

SLIDE 14

LSM-tree

[Diagram: LSM-tree layout — in RAM, a mutable memtable receiving new tuples, an immutable memtable, a tail index, and a cache; on disk, levels L0/L1/L2 of SSTs (Sorted String Tables), each SST covering a key range such as A~G, H~O, P~Z]
SLIDE 15

LSM-tree

[Diagram: the same LSM-tree layout, zooming in on the mutable memtable]

The mutable memtable is a skiplist of sorted key-value pairs, with O(log N) insert and lookup.
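Not from the slides: a toy sketch of the memtable idea, using std::map (any ordered structure works) as a stand-in for RocksDB's skiplist; the class and file names are invented for illustration.

```cpp
// Toy memtable: an ordered in-memory map standing in for the skiplist.
// Inserts and lookups are O(log N); iterating yields keys in sorted order,
// which is exactly what a flush to an SST file needs.
#include <map>
#include <string>
#include <fstream>

class ToyMemtable {
public:
    void put(const std::string &key, const std::string &value) { data_[key] = value; }

    // Write all entries in sorted key order, mimicking a flush to an L0 SST.
    void flush(const std::string &path) const {
        std::ofstream out(path);
        for (const auto &kv : data_)
            out << kv.first << '\t' << kv.second << '\n';
    }

private:
    std::map<std::string, std::string> data_;  // kept sorted by key
};

int main() {
    ToyMemtable mt;
    mt.put("banana", "2019");
    mt.put("apple", "2020");
    mt.flush("L0_sst_000001.txt");  // entries come out as apple, then banana
}
```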

SLIDE 16

Basic Operations


Put(key, value), Delete(key)

[Diagram: a Put is appended to the WAL and inserted into the mutable memtable; key K now has two versions, K | 2020 and K | 2019]

  • Overwrite: a newer Put of the same key simply shadows the older version (K | 2020 supersedes K | 2019)
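For concreteness, a minimal RocksDB sketch of Put, overwrite, and Delete (the database path and keys are made up):

```cpp
// Put / overwrite / Delete against RocksDB. Writes go to the WAL and the
// mutable memtable only; no B+tree-style random page reads are needed.
#include <rocksdb/db.h>
#include <cassert>

int main() {
    rocksdb::DB *db;
    rocksdb::Options options;
    options.create_if_missing = true;
    rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/lsm_demo", &db);
    assert(s.ok());

    db->Put(rocksdb::WriteOptions(), "K", "2019");
    db->Put(rocksdb::WriteOptions(), "K", "2020");  // overwrite: newer version shadows 2019
    db->Delete(rocksdb::WriteOptions(), "K");       // delete is also just a new write

    delete db;
}
```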
SLIDE 17

Basic Operations

Get(key): returns the value

[Diagram: a Get checks the mutable memtable, then the immutable memtable, then the L0/L1/L2 SSTs from newest to oldest]
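The corresponding point lookup, continuing the sketch above:

```cpp
// Point lookup: RocksDB consults the memtables first, then the SST levels.
#include <rocksdb/db.h>
#include <iostream>

int main() {
    rocksdb::DB *db;
    rocksdb::Options options;
    rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/lsm_demo", &db);
    if (!s.ok()) { std::cerr << s.ToString() << std::endl; return 1; }

    std::string value;
    s = db->Get(rocksdb::ReadOptions(), "K", &value);
    if (s.IsNotFound())
        std::cout << "K was deleted or never written" << std::endl;
    else if (s.ok())
        std::cout << "K = " << value << std::endl;

    delete db;
}
```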
SLIDE 18

Basic Operations


Iterator(): seek & next

[Diagram: the iterator merges the memtables and all SST levels into one sorted stream, so keys scattered across components (A, B, C, D, E, F, G) come back in key order]
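A minimal iterator sketch (Seek to a start key, then Next), again on the same made-up database:

```cpp
// Range scan: the iterator merge-sorts memtables and SSTs on the fly,
// so keys come back in order regardless of where they physically live.
#include <rocksdb/db.h>
#include <iostream>

int main() {
    rocksdb::DB *db;
    rocksdb::Options options;
    if (!rocksdb::DB::Open(options, "/tmp/lsm_demo", &db).ok()) return 1;

    rocksdb::Iterator *it = db->NewIterator(rocksdb::ReadOptions());
    for (it->Seek("C"); it->Valid(); it->Next())   // start at "C", walk forward in key order
        std::cout << it->key().ToString() << " = " << it->value().ToString() << std::endl;

    delete it;
    delete db;
}
```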
SLIDE 19

Advanced Operations

Flush(): RAM → L0, usually done by a background thread

[Diagram: the immutable memtable in RAM is written out as a new SST file in L0]
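Flushes are normally triggered in the background, but the API can also be driven manually; a minimal sketch:

```cpp
// Force the memtable to be written to an L0 SST instead of waiting for
// the background flush thread.
#include <rocksdb/db.h>

int main() {
    rocksdb::DB *db;
    rocksdb::Options options;
    options.create_if_missing = true;
    if (!rocksdb::DB::Open(options, "/tmp/lsm_demo", &db).ok()) return 1;

    db->Put(rocksdb::WriteOptions(), "A", "1");
    db->Flush(rocksdb::FlushOptions());   // memtable -> new L0 SST on disk

    delete db;
}
```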
SLIDE 20

Advanced Operations

Compact(k1, k2): L0 → L1, L1 → L2, usually run by a background thread

[Diagram: compaction merges overlapping SSTs from one level into the next]

Trade-offs: RAM size vs. write cost vs. read cost
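Manual compaction of a key range in RocksDB, roughly matching Compact(k1, k2) above (the range A–Z is arbitrary):

```cpp
// Ask RocksDB to compact a key range down through the levels. Normally the
// background compaction threads decide this on their own.
#include <rocksdb/db.h>

int main() {
    rocksdb::DB *db;
    rocksdb::Options options;
    options.create_if_missing = true;
    if (!rocksdb::DB::Open(options, "/tmp/lsm_demo", &db).ok()) return 1;

    rocksdb::Slice begin("A"), end("Z");
    db->CompactRange(rocksdb::CompactRangeOptions(), &begin, &end);  // ~ Compact(k1, k2)

    delete db;
}
```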

SLIDE 21

Advanced Operations

Snapshot(seq): based on a sequence number, which is always increasing; a snapshot reads only versions up to its sequence number

[Diagram: the same LSM-tree; each tuple carries the sequence number as a component, which enables multi-versioning]
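A minimal snapshot sketch showing that a snapshot pins a point in the sequence-number history:

```cpp
// A snapshot fixes a sequence number; reads through it ignore later writes.
#include <rocksdb/db.h>
#include <iostream>

int main() {
    rocksdb::DB *db;
    rocksdb::Options options;
    options.create_if_missing = true;
    if (!rocksdb::DB::Open(options, "/tmp/lsm_demo", &db).ok()) return 1;

    db->Put(rocksdb::WriteOptions(), "K", "2019");
    const rocksdb::Snapshot *snap = db->GetSnapshot();
    db->Put(rocksdb::WriteOptions(), "K", "2020");   // newer version, higher sequence number

    rocksdb::ReadOptions ro;
    ro.snapshot = snap;
    std::string value;
    db->Get(ro, "K", &value);                        // sees "2019", not "2020"
    std::cout << value << std::endl;

    db->ReleaseSnapshot(snap);
    delete db;
}
```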

SLIDE 22

Implementation


  • Foreign data wrapper (FDW)
  • RocksDB is based on a multi-threaded model (PostgreSQL runs one process per connection)
  • Shared memory

[Diagram: PostgreSQL backend processes (Select, Update, ...) communicate through shared memory with the multi-threaded RocksDB engine]
SLIDE 23

More To Do

  • Make the engine recognize PostgreSQL data types
  • Transactions?
  • Migrate to the pluggable storage engine interface (table AM)
SLIDE 24

Motivation for designing a new data structure

  1. Random R/W in RAM is 2000X faster than on magnetic disk, but 120X more expensive
  2. The RAM/disk size ratio is 40~200X larger than in the 1970s, when the B+tree was designed
  3. New storage choices: NVRAM
  4. Choosing among different engines and data systems requires expertise

SLIDE 25


Thanks!

Shichao@vidardb.com Twitter: @jsc0218

github.com/vidardb/vidardb github.com/vidardb/PostgresForeignDataWrapper