cs143 index
play

CS143: Index 1 Topics to Learn Important concepts Dense index vs. - PowerPoint PPT Presentation

CS143: Index 1 Topics to Learn Important concepts Dense index vs. sparse index Primary index vs. secondary index (= clustering index vs. non-clustering index) Tree-based vs. hash-based index Tree-based index Indexed


  1. (b) Coalesce with sibling (leaf) a 20 40 60 Merge e b c d 60 20 30 40 • Delete 50 – Merge c and d . Move everything on the right to the left. 60

  2. (b) Coalesce with sibling (leaf) a 20 40 60 e b c d 60 20 30 40 • Delete 50 – Once everything is moved, delete d 61

  3. (b) Coalesce with sibling (leaf) a 20 40 60 e b c d 60 20 30 40 • Delete 50 – After leaf node merge, • From its parent, delete the pointer and key to the deleted node 62

  4. (b) Coalesce with sibling (leaf) a 20 60 Underflow? e b c 60 20 30 40 • Delete 50 – Check underflow at a. Min 2 ptrs, currently 3 63

  5. (c) Leaf node, redistribute with neighbor 64

  6. (c) Redistribute (leaf) a 20 40 60 b c d e 40 50 60 20 25 30 • Delete 50 65

  7. (c) Redistribute (leaf) a 20 40 60 b c d e 40 60 20 25 30 Underflow? • Delete 50 Can be merged? – Underflow? Min 3 ptrs, currently 2 – Check if d can be merged with its sibling c or e – If not, redistribute the keys in d with a sibling • Say, with c 66

  8. (c) Redistribute (leaf) a 20 40 60 Redistribute b c d e 40 60 20 25 30 • Delete 50 – Redistribute c and d, so that nodes c and d are roughly “half full” • Move the key 30 and its tuple pointer to the d 67

  9. (c) Redistribute (leaf) a 20 40 60 b c d e 40 60 20 25 30 • Delete 50 – Update the key in the parent 68

  10. (c) Redistribute (leaf) 30 Underflow? a 20 40 60 b c d e 60 20 25 30 40 • Delete 50 – No underflow at a . Done. 69

  11. (d) Non-leaf node, coalesce with neighbor 70

  12. (d) Coalesce (non-leaf) a 50 90 b c 70 30 d e f g 30 40 50 60 70 10 20 • Delete 20 – Underflow! Merge d with e. • Move everything in the right to the left 71

  13. (d) Coalesce (non-leaf) a 50 90 b c 70 30 d e f g 50 60 70 10 30 40 • Delete 20 – From the parent node, delete pointer and key to the deleted node 72

  14. (d) Coalesce (non-leaf) a 50 90 underflow! b c 70 Can be merged? d f g 50 60 70 10 30 40 • Delete 20 – Underflow at b ? Min 2 ptrs, currently 1. – Try to merge with its sibling. • Nodes b and c : 3 ptrs in total. Max 4 ptrs. • Merge b and c . 73

  15. (d) Coalesce (non-leaf) a 50 90 merge b c 70 d f g 50 60 70 10 30 40 • Delete 20 – Merge b and c • Pull down the mid-key 50 in the parent node • Move everything in the right node to the left. • Very important: when we merge non-leaf nodes , we always pull down the mid-key in the parent and place it in the merged node. 74

  16. (d) Coalesce (non-leaf) a 50 90 b c 70 d f g 50 60 70 10 30 40 • Delete 20 – Merge b and c • Pull down the mid-key 50 in the parent node • Move everything in the right node to the left. • Very important: when we merge non-leaf nodes , we always pull down the mid-key in the parent and place it in the merged node. 75

  17. (d) Coalesce (non-leaf) a 90 b c 50 70 d f g 50 60 70 10 30 40 • Delete 20 – Delete pointer to the merged node. 76

  18. (d) Coalesce (non-leaf) a 90 b 50 70 d f g 50 60 70 10 30 40 • Delete 20 – Underflow at a ? Min 2 ptrs. Currently 2. Done. 77

  19. (e) Non-leaf node, redistribute with neighbor 78

  20. (e) Redistribute (non-leaf) a 50 99 b c 30 70 90 97 d e f g 50 60 70 30 40 10 20 • Delete 20 – Underflow! Merge d with e. 79

  21. (e) Redistribute (non-leaf) a 50 99 b c 30 70 90 97 d e f g 50 60 70 10 30 40 • Delete 20 – After merge, remove the key and ptr to the deleted node from the parent 80

  22. (e) Redistribute (non-leaf) a 50 99 underflow! b c 70 90 97 Can be merged? d f g 50 60 70 10 30 40 • Delete 20 – Underflow at b ? Min 2 ptrs, currently 1. – Merge b with c ? Max 4 ptrs, 5 ptrs in total. – If cannot be merged, redistribute the keys with a sibling. • Redistribute b and c 81

  23. (e) Redistribute (non-leaf) a 50 99 redistribute b c 70 90 97 d f g 50 60 70 10 30 40 • Delete 20 Redistribution at a non-leaf node is done in two steps. Step 1 : Temporarily, make the left node b “overflow” by pulling down the mid-key and moving everything to the left. 82

  24. (e) Redistribute (non-leaf) a 99 redistribute temporary overflow b c 97 50 70 90 d f g 50 60 70 10 30 40 • Delete 20 Step 2 : Apply the “overflow handling algorithm” (the same algorithm used for B+tree insertion) to the overflowed node – Detailed algorithm in the next slide 83

  25. (e) Redistribute (non-leaf) a 99 redistribute b c 97 50 70 90 d f g 50 60 70 10 30 40 • Delete 20 Step 2 : “overflow handling algorithm” – Pick the mid-key (say 90) in the node and move it to parent. – Move everything to the right of 90 to the empty node c . 84

  26. (e) Redistribute (non-leaf) a 90 99 b c 97 50 70 d f g 50 60 70 10 30 40 • Delete 20 – Underflow at a ? Min 2 ptrs, currently 3. Done 85

  27. Important Points • Remember: – For leaf node merging, we delete the mid-key from the parent – For non-leaf node merging/redistribution, we pull down the mid-key from their parent. • Exact algorithm: Figure 12.17 • In practice – Coalescing is often not implemented • Too hard and not worth it 86

  28. Where does n come from? • n determined by – Size of a node – Size of search key – Size of an index pointer • Q: 1024B node, 10B key, 8B ptr à n ? 87

  29. Question on B+tree • SELECT * FROM Student 70 WHERE sid > 60? 50 80 80 90 70 20 30 50 60 88

  30. Summary on tree index • Issues to consider – Sparse vs. dense – Primary (clustering) vs. secondary (non-clustering) • Indexed sequential file (ISAM) – Simple algorithm. Sequential blocks – Not suitable for dynamic environment • B+trees – Balanced, minimum space guarantee – Insertion, deletion algorithms 89

  31. Index Creation in SQL • CREATE INDEX <indexname> ON <table>(<attr>,<attr>,…) • Example – CREATE INDEX stidx ON Student(sid) • Creates a B+tree on the attributes • Speeds up lookup on sid 90

  32. Primary (Clustering) Index • MySQL: – Primary key becomes the clustering index • DB2: – CREATE INDEX idx ON Student(sid) CLUSTER – Tuples in the table are sequenced by sid • Oracle: Index-Organized Table (IOT) – CREATE TABLE T ( ... ) ORGANIZATION INDEX – B+tree on primary key – Tuples are stored at the leaf nodes of B+tree • Periodic reorganization may still be necessary to improve range scan performance 91

  33. Next topic • Hash index – Static hashing – Extendible hashing 92

  34. What is a Hash Table? • Hash Table – Hash function • h (k): key à integer [0…n] • e.g., h (‘Susan’) = 7 – Array for keys: T[0…n] – Given a key k , store it in T[ h ( k )] 0 1 Neil 2 h(Susan) = 4 h(James) = 3 3 James h(Neil) = 1 4 Susan 5 93

  35. Hashing for DBMS (Static Hashing) Disk blocks (buckets) 0 1 (key, record) search key → h(key) 2 3 . 4 . . 94

  36. Overflow and Chaining • Insert h(a) = 1 d 0 h(b) = 2 h(c) = 1 e a 1 h(d) = 0 c b h(e) = 1 2 3 • Delete h(b) = 2 h(c) = 1 95

  37. Major Problem of Static Hashing • How to cope with growth? – Data tends to grow in size – Overflow blocks unavoidable hash buckets overflow blocks 10 39 20 31 30 35 33 36 40 50 60 32 38 34 70 80 90 96

  38. Extendible Hashing (two ideas) (a) Use i of b bits output by hash function b 00110101 h(K) → use i → grows over time 97

  39. Extendible Hashing (two ideas) (b) Use directory that maintains pointers to hash buckets (indirection) directory hash bucket . . c h(c) . e . . . 98

  40. Example • h(k) is 4 bits; 2 keys/bucket i = 1 Insert 0111 0001 1 i = 0111 0 1 i = 1 1001 1100 99

  41. Example i = 1 Insert 1010 0001 1 0111 i = 0 1 i = 1 1001 1010 1100 overflow! Increase i of the bucket. Split it. 100

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend