DATABASE SYSTEM IMPLEMENTATION
GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ
LECTURE #11: OLTP INDEXES (PART 1)
ADMINISTRIVIA

→ Midterm Topics
→ HW#3 Programming Assignment Due Date
→ Project Topics
TODAY’S AGENDA

→ T-Tree
→ Skip List
→ Bw-Tree
T-TREES

Based on AVL Trees. Instead of storing keys in nodes, store pointers to their original values.
→ Proposed in 1986 at Univ. of Wisconsin.
→ Used in TimesTen and other early in-memory DBMSs during the 1990s.

A STUDY OF INDEX STRUCTURES FOR MAIN MEMORY DATABASE MANAGEMENT SYSTEMS (VLDB 1986)
T-TREES

[Diagram: layout of a T-Tree node. Each node stores a Min-K and a Max-K key, a parent pointer, left and right child pointers, and an array of data pointers into the underlying records. The node's boundaries (Min-K, Max-K) cover a contiguous slice of the key space (low→high); in the example, keys 1–7 fall within one node's bounds and map into its data-pointer array.]
T-TREES

Searching in a T-Tree is similar to searching in a binary tree. The key difference is that comparisons are made against the min and max values of a node rather than against the single key stored in a binary tree node.
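The search described above can be sketched as follows. This is a minimal, single-threaded illustration; the class name, the list-of-pairs layout, and the field names are assumptions for exposition, not the paper's actual structures.

```python
class TTreeNode:
    """A T-Tree node holds pointers to records, kept sorted by key."""
    def __init__(self, records):
        self.records = records   # list of (key, record) pairs, sorted by key
        self.left = None         # subtree containing keys < min_key
        self.right = None        # subtree containing keys > max_key

    @property
    def min_key(self):
        return self.records[0][0]

    @property
    def max_key(self):
        return self.records[-1][0]

def ttree_search(node, key):
    """Descend by comparing against each node's [min, max] bounds
    instead of a single key, as in a plain binary search tree."""
    while node is not None:
        if key < node.min_key:
            node = node.left         # whole node range is too large
        elif key > node.max_key:
            node = node.right        # whole node range is too small
        else:
            # Key falls within this node's bounds: binary search inside it.
            lo, hi = 0, len(node.records) - 1
            while lo <= hi:
                mid = (lo + hi) // 2
                k, rec = node.records[mid]
                if k == key:
                    return rec
                if k < key:
                    lo = mid + 1
                else:
                    hi = mid - 1
            return None              # in bounds but not present
    return None
```

Note that the in-node binary search chases a data pointer per probe in a real T-Tree, which is exactly the disadvantage the next slide calls out.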
T-TREES

Advantages
→ Uses less memory because it does not store keys inside of each node.
→ Inner nodes contain key/value pairs (like a B-Tree).

Disadvantages
→ The traditional assumption that memory references have uniform cost is no longer valid, given the current speed gap between cache access and main-memory access.
→ Difficult to rebalance and to support concurrent access.
→ Have to chase pointers when scanning a range or performing a binary search inside of a node.
OBSERVATION

The easiest way to implement a dynamic order-preserving index is to use a sorted linked list. All operations require a linear search.
→ Average Cost: O(N)

[Diagram: a sorted singly linked list K1→K2→K3→K4→K5→K6→K7; every lookup walks the list from the head.]
SKIP LISTS

Multiple levels of linked lists with extra pointers that skip over intermediate nodes. Maintains keys in sorted order without requiring global rebalancing.
→ Probabilistic data structure.

SKIP LISTS: A PROBABILISTIC ALTERNATIVE TO BALANCED TREES (CACM, Vol. 33, Issue 6, 1990)
SKIP LISTS

A collection of lists at different levels
→ The lowest level is a sorted, singly linked list of all keys.
→ The 2nd level links every other key.
→ The 3rd level links every fourth key.
→ In general, a level has half the keys of the one below it.

To insert a new key, flip a coin to decide how many levels to add the new key into. Provides approximate O(log n) search times.
SKIP LISTS: EXAMPLE

[Diagram: a three-level skip list over key/value pairs K1–K6. The bottom level (P=N) is a sorted singly linked list of all keys; the middle level (P=N/2) links roughly every other key; the top level (P=N/4) links roughly every fourth key. Every level terminates in an end sentinel (∞).]
SKIP LISTS: INSERT

[Diagram: inserting K5. A coin flip chooses the new node's height (here, a tower spanning all three levels); the K5 nodes are then spliced in between K4 and K6 at each chosen level.]
SKIP LISTS: SEARCH

[Diagram: finding K3. Start at the top level: K3<K5, so drop down a level. K3>K2, so move right; K3<K4, so drop down again and land on K3 at the bottom level.]
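The insert and search walks above can be sketched as a minimal single-threaded skip list. The node representation (plain dicts with a `forward` array) and the fixed maximum height are illustrative assumptions, not part of the original design.

```python
import random

class SkipList:
    """Minimal skip-list sketch: a coin flip picks each node's height."""
    MAX_LEVEL = 8

    def __init__(self):
        # The head tower stores no key; forward[i] is the next node at level i.
        self.head = {'forward': [None] * self.MAX_LEVEL}
        self.level = 1                       # number of levels currently in use

    def _random_level(self):
        lvl = 1
        while lvl < self.MAX_LEVEL and random.random() < 0.5:
            lvl += 1                         # each heads flip adds one level
        return lvl

    def insert(self, key, value):
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node['forward'][i] and node['forward'][i]['key'] < key:
                node = node['forward'][i]
            update[i] = node                 # last node before key at level i
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = {'key': key, 'value': value, 'forward': [None] * lvl}
        for i in range(lvl):                 # splice in at each chosen level
            new['forward'][i] = update[i]['forward'][i]
            update[i]['forward'][i] = new

    def search(self, key):
        node = self.head
        for i in range(self.level - 1, -1, -1):
            # Move right while the next key is still too small, then drop down.
            while node['forward'][i] and node['forward'][i]['key'] < key:
                node = node['forward'][i]
        node = node['forward'][0]            # bottom-level candidate
        return node['value'] if node and node['key'] == key else None
```

Because higher levels thin out the keys geometrically, the expected search path visits O(log n) nodes regardless of insertion order.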
SKIP LISTS: ADVANTAGES

→ Uses less memory than a typical B+Tree (only if you do not include reverse pointers).
→ Insertions and deletions do not require rebalancing.
→ It is possible to implement a concurrent skip list using only CAS instructions.
CONCURRENT SKIP LIST

Can implement insert and delete without locks using only CAS operations. The data structure only supports links in one direction because a CAS can only swap one pointer atomically.

CONCURRENT MAINTENANCE OF SKIP LISTS (Univ. of Maryland Tech Report, 1990)
SKIP LISTS: INSERT

[Diagram: latch-free insert of K5. The K5 tower (three levels) is built off to the side, then spliced in with one CAS per level on the predecessor's forward pointer.]
SKIP LISTS: DELETE

First logically remove a key from the index by setting a flag that tells other threads to ignore it. Then physically remove the key once we know that no other thread still holds a reference to it.
→ Perform a CAS to update the predecessor's pointer.

Source: Stephen Tu
SKIP LISTS: DELETE

[Diagram: deleting K5. Every node carries a Del flag, initially false. The K5 nodes' flags are CASed from false to true at each level (logical delete), making them invisible to searches. Afterwards, each predecessor's pointer is CASed past the K5 tower to physically unlink it.]
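The two-phase delete can be sketched on a single linked-list level. This is a single-threaded illustration: the flag update and the pointer swing would each be a CAS in a real latch-free list, and the physical unlink would wait until no reader still holds a reference.

```python
class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.next = None
        self.deleted = False        # the "Del" flag from the slides

def search(head, key):
    node = head.next
    while node and node.key < key:
        node = node.next
    # A logically deleted node is invisible to readers even while
    # it is still physically linked into the list.
    if node and node.key == key and not node.deleted:
        return node.value
    return None

def delete(head, key):
    """Phase 1: flip the delete flag (false -> true) so readers skip it.
    Phase 2: swing the predecessor's pointer past it (a CAS in the
    concurrent version)."""
    prev, node = head, head.next
    while node and node.key < key:
        prev, node = node, node.next
    if not (node and node.key == key):
        return False
    node.deleted = True             # logical removal
    prev.next = node.next           # physical removal
    return True
```

The flag matters in the window between the two phases: a concurrent reader that already walked onto the node sees `deleted == True` and ignores it.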
CONCURRENT SKIP LIST

Be careful about how you order operations. If the DBMS invokes an operation on the index, that operation can never "fail."
→ A txn can only abort due to higher-level conflicts.
→ If a CAS fails, then the index will retry until it succeeds.
SKIP LIST OPTIMIZATIONS

→ Reducing RAND() invocations.
→ Packing multiple keys into a node.
→ Reverse iteration with a stack.
→ Reusing nodes with memory pools.

SKIP LISTS: DONE RIGHT (Ticki Blog, 2016)
SKIP LIST: COMBINE NODES

Store multiple keys in a single node.
→ Insert Key: Find the node where the key should go and look for a free slot. Perform a CAS to store the new key. If no slot is available, insert a new node.
→ Search Key: Perform a linear search over the keys in each node.

[Diagram: one node holds slots for K2/V2 and K3/V3, another holds K6/V6. Inserting K4 claims a free slot in the first node; searching for K6 scans the slots within its node.]

Source: Ticki
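The combined-node idea can be sketched as follows. The fixed capacity, class name, and slot layout are illustrative assumptions; in the concurrent version, claiming a slot would be a CAS on that slot.

```python
NODE_CAPACITY = 4

class FatNode:
    """A skip-list node holding several key slots (None = free slot)."""
    def __init__(self):
        self.slots = [None] * NODE_CAPACITY   # each slot is (key, value) or None
        self.next = None

def node_insert(node, key, value):
    """Claim the first free slot; fail if the node is full so the
    caller can fall back to inserting a brand-new node."""
    for i, slot in enumerate(node.slots):
        if slot is None:
            node.slots[i] = (key, value)      # CAS(None -> (key, value)) when concurrent
            return True
    return False

def node_search(node, key):
    """Slots are not kept sorted, so search is a linear scan."""
    for slot in node.slots:
        if slot is not None and slot[0] == key:
            return slot[1]
    return None
```

Packing several keys per node trades a short linear scan for far fewer pointer chases and better cache-line utilization.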
SKIP LISTS: REVERSE SEARCH

[Diagram: reverse range scan for [K4,K2]. Descend to the low end of the range (K2<K5, so drop down; K2=K2, found), then walk forward along the bottom level pushing K2, K3, K4 onto a stack. Popping the stack yields the keys in reverse order, with no reverse pointers needed.]

Source: Mark Papadakis
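The stack trick above needs only the bottom-level forward pointers. A minimal sketch over a singly linked chain of dict nodes (the node representation is an assumption for illustration):

```python
def reverse_range(head, lo, hi):
    """Scan [lo, hi] forward along the bottom level, pushing matches onto
    a stack; popping then yields the keys in descending order, so no
    reverse pointers (and no extra CAS targets) are required."""
    stack = []
    node = head
    while node is not None:
        if node['key'] > hi:
            break                      # past the high end of the range
        if node['key'] >= lo:
            stack.append(node['key'])  # remember the key for reverse replay
        node = node['next']
    return [stack.pop() for _ in range(len(stack))]
```

This is why reverse iteration costs extra memory proportional to the range size, but keeps the latch-free single-direction structure intact.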
OBSERVATION

Because a CAS only updates a single address at a time, it limits the design of our data structures.
→ We cannot have reverse pointers in a latch-free concurrent skip list.
→ We cannot build a latch-free B+Tree.

What if we had an indirection layer that allowed us to update multiple addresses atomically?
BW-TREE

Latch-free B+Tree index
→ Threads never need to set latches or block.

Key Idea #1: Deltas
→ No updates in place. Reduces cache invalidation.

Key Idea #2: Mapping Table
→ Allows for CAS of the physical locations of pages.

THE BW-TREE: A B-TREE FOR NEW HARDWARE (ICDE 2013)
BW-TREE: MAPPING TABLE

[Diagram: a mapping table translating page IDs (PIDs 101–104) to physical addresses. Index pages reference children by PID (logical pointers); the table resolves each PID to the page's current physical pointer, so a page can be relocated by updating a single table slot.]
BW-TREE: DELTA UPDATES

Each update to a page produces a new delta.
→ The delta physically points to the base page (or to the previous delta).
→ Install the delta's address in the page's physical-address slot of the mapping table using a CAS.

[Diagram: Page 102's mapping-table entry points to a delta chain, ▲Delete 48 → ▲Insert 50 → base page.]

Source: Justin Levandoski
BW-TREE: SEARCH

Traverse the tree like a regular B+Tree.
→ If the mapping table points to a delta chain, stop at the first occurrence of the search key.
→ Otherwise, perform a binary search on the base page.

[Diagram: a lookup on Page 102 first checks the ▲Delete 48 and ▲Insert 50 deltas in order before falling through to the base page.]
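The mapping table, delta installation, and chain-stopping search can be sketched together. This is a toy single-threaded model: the class names are invented for illustration, the mapping table is a plain dict, and the chain-head swap stands in for the real CAS (a failed CAS would force a retry).

```python
class Delta:
    """A delta record prepended to a page's chain (no update in place)."""
    def __init__(self, op, key, value, nxt):
        self.op, self.key, self.value, self.next = op, key, value, nxt

class BasePage:
    def __init__(self, records):
        self.records = dict(records)    # key -> value (sorted array in a real page)

class BwTree:
    def __init__(self):
        self.mapping = {}               # PID -> head of delta chain or base page

    def install_delta(self, pid, op, key, value=None):
        # Real Bw-Tree: CAS(mapping[pid], old_head, new_delta); on failure,
        # another thread won the race and this thread must retry.
        old_head = self.mapping[pid]
        self.mapping[pid] = Delta(op, key, value, old_head)

    def search(self, pid, key):
        """Walk the chain, stopping at the first delta that mentions the
        key; otherwise fall through to the base page."""
        node = self.mapping[pid]
        while isinstance(node, Delta):
            if node.key == key:
                return node.value if node.op == 'insert' else None
            node = node.next
        return node.records.get(key)
```

Because newer deltas sit closer to the chain head, "first occurrence wins" automatically returns the most recent version of each key.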
BW-TREE: CONTENTION UPDATES

Threads may try to install updates to the same state of the page.
→ Both deltas CAS the same mapping-table slot: the winner succeeds, and any losers must retry or abort.

[Diagram: ▲Delete 48 and ▲Insert 16 both point at the current head of Page 102's chain; only one CAS on the mapping-table slot succeeds, and the loser's delta is discarded (X).]
BW-TREE: DELTA TYPES

Record Update Deltas
→ Insert/Delete/Update of a record on a page.

Structure Modification Deltas
→ Split/Merge information.
BW-TREE: CONSOLIDATION

Consolidate updates by creating a new page with the deltas applied.
→ CAS-ing the mapping-table address ensures that no deltas are missed.
→ The old page and its deltas are marked as garbage.

[Diagram: Page 102's chain (▲Insert 55 → ▲Delete 48 → ▲Insert 50 → base page) is folded into "New 102"; a CAS swings the mapping-table entry from the old chain to the new page.]
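The fold itself can be sketched as applying deltas newest-first, so the most recent change to each key wins. This is a toy model: deltas are tuples `(op, key, value, next)` and the base page is a dict, which are assumptions for illustration only.

```python
def consolidate(chain_head):
    """Fold a delta chain into a fresh base page. The chain is walked
    newest-first: the first delta seen for a key wins, and older entries
    for that key (deltas or base records) are ignored."""
    seen = set()
    new_records = {}
    node = chain_head
    while isinstance(node, tuple):       # delta = (op, key, value, next)
        op, key, value, nxt = node
        if key not in seen:
            seen.add(key)
            if op == 'insert':
                new_records[key] = value
            # op == 'delete': the key simply stays absent from the new page
        node = nxt
    for key, value in node.items():      # node is now the base-page dict
        if key not in seen:
            new_records[key] = value
    return new_records
```

After building the new page, a single CAS on the mapping-table slot publishes it; if the CAS fails because another delta arrived meanwhile, consolidation retries, which is how no delta is ever missed.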
BW-TREE: GARBAGE COLLECTION

Operations are tagged with an epoch.
→ Each epoch tracks the threads that are part of it and the objects that can be reclaimed in it.
→ A thread joins an epoch prior to each operation and posts objects for reclamation in the current epoch (not necessarily the one it joined).

Garbage for an epoch is reclaimed only when all threads have exited the epoch.
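The epoch rule above can be sketched as a small bookkeeping class. This is a single-threaded simulation under assumed names (`join`/`leave`/`post_garbage`/`advance`); a real implementation uses per-thread epoch slots and atomic counters rather than dicts.

```python
from collections import defaultdict

class EpochManager:
    """Sketch of epoch-based reclamation: garbage posted in epoch e is
    freed only after every thread inside epoch e (or earlier) has left."""
    def __init__(self):
        self.current = 0
        self.active = defaultdict(int)    # epoch -> number of threads inside
        self.garbage = defaultdict(list)  # epoch -> objects awaiting free

    def advance(self):
        self.current += 1                 # start a new epoch

    def join(self):
        e = self.current
        self.active[e] += 1               # thread enters before its operation
        return e

    def post_garbage(self, obj):
        # Garbage is tagged with the *current* epoch, not the joined one.
        self.garbage[self.current].append(obj)

    def leave(self, e):
        self.active[e] -= 1
        return self._reclaim()

    def _reclaim(self):
        freed = []
        # Epoch e's garbage is safe once no thread remains in e or earlier:
        # any such thread might still hold a reference to the old objects.
        for e in sorted(self.garbage):
            if any(self.active[x] > 0 for x in range(e + 1)):
                break
            freed.extend(self.garbage.pop(e))
        return freed
```

A thread that joined an older epoch therefore blocks reclamation of all newer garbage until it leaves, which is why long-running operations delay memory reuse.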
BW-TREE: GARBAGE COLLECTION

[Diagram: CPU1 and CPU2 register in the epoch table while operating on Page 102's delta chain. Consolidation produces "New 102" and posts the old chain as garbage for the current epoch. Only after both CPUs have left the epoch are the old page and its deltas reclaimed, leaving the mapping table pointing at New 102.]
BW-TREE: STRUCTURE MODIFICATIONS

Split Delta Record
→ Marks that a subset of the base page's key range is now located at another page.
→ Uses a logical pointer to the new page.

Separator Delta Record
→ Provides a shortcut in the modified page's parent indicating which key range maps to the new page.
BW-TREE: STRUCTURE MODIFICATIONS

[Diagram: splitting a leaf under parent 101, whose children 102, 103, and 104 cover [-∞,3), [3,7), and [7,∞) over keys 1–8. Keys 5 and 6 are copied to a new page 105, and a ▲Split delta on the original leaf marks that its upper range has moved, with a logical pointer (PID 105) to the new page. A ▲Separator delta on parent 101 then adds the shortcut [5,7)→105 alongside its existing ranges.]
BW-TREE: PERFORMANCE

[Chart: operations/sec (M) for Bw-Tree, B+Tree, and Skip List on the Xbox, Synthetic, and Deduplication workloads (values shown: 10.4, 3.83, 2.84, 0.56, 0.66, 0.33, 4.23, 1.02, 0.72). Processor: 1 socket, 4 cores w/ 2×HT.]

Source: Justin Levandoski
BW-TREE: PERFORMANCE

[Chart: operations/sec (M) for Open Bw-Tree, B+Tree, and Skip List on Insert-Only, Read-Only, and Read/Update workloads (values shown: 9.94, 15.5, 13.3, 8.09, 29, 1.51, 2.51, 2.78, 25.1). Processor: 1 socket, 10 cores w/ 2×HT. Workload: 50M random 64-bit integer keys.]

Source: Ziqi Wang
BW-TREE: PERFORMANCE

[Chart: the same benchmark extended with Masstree and ART (additional values shown: 17.9, 30.5, 22, 44.9, 51.5, 42.9). Processor: 1 socket, 10 cores w/ 2×HT. Workload: 50M random 64-bit integer keys.]

Source: Ziqi Wang
PARTING THOUGHTS

Managing a concurrent index looks a lot like managing a database. A non-concurrent skip list is easy to implement; a Bw-Tree is hard to implement.
NEXT CLASS

→ Let's add latches back into our OLTP indexes!
→ Other implementation issues.
→ A crash course on performance testing.