CS143: Index 1 Topics to Learn Important concepts Dense index vs. - PowerPoint PPT Presentation

(b) Coalesce with sibling (leaf) a 20 40 60 Merge e b c d 60 20 30 40 • Delete 50 – Merge c and d . Move everything on the right to the left. 60

(b) Coalesce with sibling (leaf) a 20 40 60 e b c d 60 20 30 40 • Delete 50 – Once everything is moved, delete d 61

(b) Coalesce with sibling (leaf) a 20 40 60 e b c d 60 20 30 40 • Delete 50 – After leaf node merge, • From its parent, delete the pointer and key to the deleted node 62

(b) Coalesce with sibling (leaf) a 20 60 Underflow? e b c 60 20 30 40 • Delete 50 – Check underflow at a. Min 2 ptrs, currently 3 63

(c) Leaf node, redistribute with neighbor 64

(c) Redistribute (leaf) a 20 40 60 b c d e 40 50 60 20 25 30 • Delete 50 65

(c) Redistribute (leaf) a 20 40 60 b c d e 40 60 20 25 30 Underflow? • Delete 50 Can be merged? – Underflow? Min 3 ptrs, currently 2 – Check if d can be merged with its sibling c or e – If not, redistribute the keys in d with a sibling • Say, with c 66

(c) Redistribute (leaf) a 20 40 60 Redistribute b c d e 40 60 20 25 30 • Delete 50 – Redistribute c and d, so that nodes c and d are roughly “half full” • Move the key 30 and its tuple pointer to the d 67

(c) Redistribute (leaf) a 20 40 60 b c d e 40 60 20 25 30 • Delete 50 – Update the key in the parent 68

(c) Redistribute (leaf) 30 Underflow? a 20 40 60 b c d e 60 20 25 30 40 • Delete 50 – No underflow at a . Done. 69

(d) Non-leaf node, coalesce with neighbor 70

(d) Coalesce (non-leaf) a 50 90 b c 70 30 d e f g 30 40 50 60 70 10 20 • Delete 20 – Underflow! Merge d with e. • Move everything in the right to the left 71

(d) Coalesce (non-leaf) a 50 90 b c 70 30 d e f g 50 60 70 10 30 40 • Delete 20 – From the parent node, delete pointer and key to the deleted node 72

(d) Coalesce (non-leaf) a 50 90 underflow! b c 70 Can be merged? d f g 50 60 70 10 30 40 • Delete 20 – Underflow at b ? Min 2 ptrs, currently 1. – Try to merge with its sibling. • Nodes b and c : 3 ptrs in total. Max 4 ptrs. • Merge b and c . 73

(d) Coalesce (non-leaf) a 50 90 merge b c 70 d f g 50 60 70 10 30 40 • Delete 20 – Merge b and c • Pull down the mid-key 50 in the parent node • Move everything in the right node to the left. • Very important: when we merge non-leaf nodes , we always pull down the mid-key in the parent and place it in the merged node. 74

(d) Coalesce (non-leaf) a 50 90 b c 70 d f g 50 60 70 10 30 40 • Delete 20 – Merge b and c • Pull down the mid-key 50 in the parent node • Move everything in the right node to the left. • Very important: when we merge non-leaf nodes , we always pull down the mid-key in the parent and place it in the merged node. 75

(d) Coalesce (non-leaf) a 90 b c 50 70 d f g 50 60 70 10 30 40 • Delete 20 – Delete pointer to the merged node. 76

(d) Coalesce (non-leaf) a 90 b 50 70 d f g 50 60 70 10 30 40 • Delete 20 – Underflow at a ? Min 2 ptrs. Currently 2. Done. 77

(e) Non-leaf node, redistribute with neighbor 78

(e) Redistribute (non-leaf) a 50 99 b c 30 70 90 97 d e f g 50 60 70 30 40 10 20 • Delete 20 – Underflow! Merge d with e. 79

(e) Redistribute (non-leaf) a 50 99 b c 30 70 90 97 d e f g 50 60 70 10 30 40 • Delete 20 – After merge, remove the key and ptr to the deleted node from the parent 80

(e) Redistribute (non-leaf) a 50 99 underflow! b c 70 90 97 Can be merged? d f g 50 60 70 10 30 40 • Delete 20 – Underflow at b ? Min 2 ptrs, currently 1. – Merge b with c ? Max 4 ptrs, 5 ptrs in total. – If cannot be merged, redistribute the keys with a sibling. • Redistribute b and c 81

(e) Redistribute (non-leaf) a 50 99 redistribute b c 70 90 97 d f g 50 60 70 10 30 40 • Delete 20 Redistribution at a non-leaf node is done in two steps. Step 1 : Temporarily, make the left node b “overflow” by pulling down the mid-key and moving everything to the left. 82

(e) Redistribute (non-leaf) a 99 redistribute temporary overflow b c 97 50 70 90 d f g 50 60 70 10 30 40 • Delete 20 Step 2 : Apply the “overflow handling algorithm” (the same algorithm used for B+tree insertion) to the overflowed node – Detailed algorithm in the next slide 83

(e) Redistribute (non-leaf) a 99 redistribute b c 97 50 70 90 d f g 50 60 70 10 30 40 • Delete 20 Step 2 : “overflow handling algorithm” – Pick the mid-key (say 90) in the node and move it to parent. – Move everything to the right of 90 to the empty node c . 84

(e) Redistribute (non-leaf) a 90 99 b c 97 50 70 d f g 50 60 70 10 30 40 • Delete 20 – Underflow at a ? Min 2 ptrs, currently 3. Done 85

Important Points • Remember: – For leaf node merging, we delete the mid-key from the parent – For non-leaf node merging/redistribution, we pull down the mid-key from their parent. • Exact algorithm: Figure 12.17 • In practice – Coalescing is often not implemented • Too hard and not worth it 86

Where does n come from? • n determined by – Size of a node – Size of search key – Size of an index pointer • Q: 1024B node, 10B key, 8B ptr à n ? 87

Question on B+tree • SELECT * FROM Student 70 WHERE sid > 60? 50 80 80 90 70 20 30 50 60 88

Summary on tree index • Issues to consider – Sparse vs. dense – Primary (clustering) vs. secondary (non-clustering) • Indexed sequential file (ISAM) – Simple algorithm. Sequential blocks – Not suitable for dynamic environment • B+trees – Balanced, minimum space guarantee – Insertion, deletion algorithms 89

Index Creation in SQL • CREATE INDEX <indexname> ON <table>(<attr>,<attr>,…) • Example – CREATE INDEX stidx ON Student(sid) • Creates a B+tree on the attributes • Speeds up lookup on sid 90

Primary (Clustering) Index • MySQL: – Primary key becomes the clustering index • DB2: – CREATE INDEX idx ON Student(sid) CLUSTER – Tuples in the table are sequenced by sid • Oracle: Index-Organized Table (IOT) – CREATE TABLE T ( ... ) ORGANIZATION INDEX – B+tree on primary key – Tuples are stored at the leaf nodes of B+tree • Periodic reorganization may still be necessary to improve range scan performance 91

Next topic • Hash index – Static hashing – Extendible hashing 92

What is a Hash Table? • Hash Table – Hash function • h (k): key à integer [0…n] • e.g., h (‘Susan’) = 7 – Array for keys: T[0…n] – Given a key k , store it in T[ h ( k )] 0 1 Neil 2 h(Susan) = 4 h(James) = 3 3 James h(Neil) = 1 4 Susan 5 93

Hashing for DBMS (Static Hashing) Disk blocks (buckets) 0 1 (key, record) search key → h(key) 2 3 . 4 . . 94

Overflow and Chaining • Insert h(a) = 1 d 0 h(b) = 2 h(c) = 1 e a 1 h(d) = 0 c b h(e) = 1 2 3 • Delete h(b) = 2 h(c) = 1 95

Major Problem of Static Hashing • How to cope with growth? – Data tends to grow in size – Overflow blocks unavoidable hash buckets overflow blocks 10 39 20 31 30 35 33 36 40 50 60 32 38 34 70 80 90 96

Extendible Hashing (two ideas) (a) Use i of b bits output by hash function b 00110101 h(K) → use i → grows over time 97

Extendible Hashing (two ideas) (b) Use directory that maintains pointers to hash buckets (indirection) directory hash bucket . . c h(c) . e . . . 98

Example • h(k) is 4 bits; 2 keys/bucket i = 1 Insert 0111 0001 1 i = 0111 0 1 i = 1 1001 1100 99

Example i = 1 Insert 1010 0001 1 0111 i = 0 1 i = 1 1001 1010 1100 overflow! Increase i of the bucket. Split it. 100

CS143: Index 1 Topics to Learn Important concepts Dense index vs. - PowerPoint PPT Presentation

CS143: Index 1 Topics to Learn Important concepts Dense index vs. sparse index Primary index vs. secondary index (= clustering index vs. non-clustering index) Tree-based vs. hash-based index Tree-based index Indexed

CS143: Disks and Files 1 System Architecture CPU System Bus Disk Controller ... Word (1B

Index Rules and Methodology Index Name Ticker S-Network US Equity 3000 Index SN3000 S-Network

FAANG+ holdings in S&P 500 & MSCI EM Index S&P 500 Index Weighting 20% MSCI EM Index

THE INDEX OF RETAIL PRICES REVISION OF THE INDEX OF RETAIL PRICES INDEX OF RETAIL PRICES The

Index Rules and Methodology S-Network Europe Equity 500 Index (Ticker: SNE500) S-Network Europe

Fourier transform for nilpotent Lie groups Index sets and representations Granada Index sets

Index Blocking Factors, Views Rose-Hulman Institute of Technology Curt Clifton Index Redux

TCC Index China Telematics Brands Exposure Analysis Report 2013.3-2013.5 1 TCC Index TCC

NVIDIA INDEX IMPLEMENTING ADVANCED DATA VISUALIZATION WITH NVIDIA INDEX Alexander Kuhn and Marc

NPFL103: Information Retrieval (3) Index construction, Distributed and dynamic indexing, Index

Web Information Retrieval Lecture 3 Index Construction Index construction This time:

Multi-Indexed Files : Outline ! Introduction ! Inverted Files ! Multilist Files rasitjutrakul

Results of the 2013 SRI Index review Corli le Roux, Head of SRI Index and Sustainability

New Zealand Consumers Price Index: Retrospective superlative index and impact of alternative

Castlestone FAANG+ UCITS Funds Q2 2020 FAANG+ holdings in S&P 500 & MSCI EM Index

The Geary-Khamis index and the Lehr index: how much do they differ? 15 th Meeting of the Ottawa

Storage and Indexing DBS Database Systems Reading: R&G Chapters 8, 9 & 10.1 Implementing

CS 225 Data Structures Mar March h 11 11 BT BTrees Wade Fa Wa Fagen-Ul Ulmsch schnei

A Datatype for Binary Trees Functions on Trees Amtoft from Hatcliff Binary Trees One of many

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Time: 6:00pm 8:50pm R

Module 6: Planning Indexes Overview Introduction to Indexes Index Architecture How

Virtual banking beyond borders 2 M a r k e t C h a l l e n g e 68 MIL ILLION REFUGEES

Overview of Storage and Indexing Chapter 8 Instructor: Vladimir Zadorozhny vladimir@sis.pitt.edu

R-Tree An R-tree is a depth-balanced tree Each node corresponds to a disk page Leaf

CS143: Index 1 Topics to Learn Important concepts Dense index vs. - PowerPoint PPT Presentation

CS143: Index 1 Topics to Learn Important concepts Dense index vs. sparse index Primary index vs. secondary index (= clustering index vs. non-clustering index) Tree-based vs. hash-based index Tree-based index Indexed

CS143: Disks and Files 1 System Architecture CPU System Bus Disk Controller ... Word (1B

Index Rules and Methodology Index Name Ticker S-Network US Equity 3000 Index SN3000 S-Network

FAANG+ holdings in S&amp;P 500 &amp; MSCI EM Index S&amp;P 500 Index Weighting 20% MSCI EM Index

THE INDEX OF RETAIL PRICES REVISION OF THE INDEX OF RETAIL PRICES INDEX OF RETAIL PRICES The

Index Rules and Methodology S-Network Europe Equity 500 Index (Ticker: SNE500) S-Network Europe

Fourier transform for nilpotent Lie groups Index sets and representations Granada Index sets

Index Blocking Factors, Views Rose-Hulman Institute of Technology Curt Clifton Index Redux

TCC Index China Telematics Brands Exposure Analysis Report 2013.3-2013.5 1 TCC Index TCC

NVIDIA INDEX IMPLEMENTING ADVANCED DATA VISUALIZATION WITH NVIDIA INDEX Alexander Kuhn and Marc

NPFL103: Information Retrieval (3) Index construction, Distributed and dynamic indexing, Index

Web Information Retrieval Lecture 3 Index Construction Index construction This time:

Multi-Indexed Files : Outline ! Introduction ! Inverted Files ! Multilist Files rasitjutrakul

Results of the 2013 SRI Index review Corli le Roux, Head of SRI Index and Sustainability

New Zealand Consumers Price Index: Retrospective superlative index and impact of alternative

Castlestone FAANG+ UCITS Funds Q2 2020 FAANG+ holdings in S&amp;P 500 &amp; MSCI EM Index

The Geary-Khamis index and the Lehr index: how much do they differ? 15 th Meeting of the Ottawa

Storage and Indexing DBS Database Systems Reading: R&amp;G Chapters 8, 9 &amp; 10.1 Implementing

CS 225 Data Structures Mar March h 11 11 BT BTrees Wade Fa Wa Fagen-Ul Ulmsch schnei

A Datatype for Binary Trees Functions on Trees Amtoft from Hatcliff Binary Trees One of many

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Time: 6:00pm 8:50pm R

Module 6: Planning Indexes Overview Introduction to Indexes Index Architecture How

Virtual banking beyond borders 2 M a r k e t C h a l l e n g e 68 MIL ILLION REFUGEES

Overview of Storage and Indexing Chapter 8 Instructor: Vladimir Zadorozhny vladimir@sis.pitt.edu

R-Tree An R-tree is a depth-balanced tree Each node corresponds to a disk page Leaf

FAANG+ holdings in S&P 500 & MSCI EM Index S&P 500 Index Weighting 20% MSCI EM Index

Castlestone FAANG+ UCITS Funds Q2 2020 FAANG+ holdings in S&P 500 & MSCI EM Index

Storage and Indexing DBS Database Systems Reading: R&G Chapters 8, 9 & 10.1 Implementing