tree structured indexes
play

Tree-Structured Indexes < k , list of rids of data records with - PDF document

Introduction As for any index, 3 alternatives for data entries k* : Data record with key value k < k , rid of data record with search key value k > Tree-Structured Indexes < k , list of rids of data records with search key


  1. Introduction � As for any index, 3 alternatives for data entries k* : � Data record with key value k � < k , rid of data record with search key value k > Tree-Structured Indexes � < k , list of rids of data records with search key k > � Choice is orthogonal to the indexing technique used to locate data entries k* . � Tree-structured indexing techniques support both range searches and equality searches . � ISAM : static structure; B+ tree : dynamic, adjusts gracefully under inserts and deletes. Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Database Management Systems, R. Ramakrishnan and J. Gehrke 2 Range Searches Range Searches � `` Find all students with gpa > 3.0 ’’ � Simple idea: Create an `index’ file – If data entries are sorted, do binary search to find – What is search cost if each index page has F entries? first such student, then scan to find others. � Problem? Index File k1 k2 kN Page 3 Page N Page 1 Page 2 Data (Entries) File Data File Page N Page 1 Page 2 Page 3 � Can do binary search on (smaller) index file! Database Management Systems, R. Ramakrishnan and J. Gehrke 3 Database Management Systems, R. Ramakrishnan and J. Gehrke 4 index entry ISAM Example ISAM Tree P0 K 1 P 1 K 2 P 2 K m P m � Each node can hold 2 entries � Index file may still be quite large. But we can – What is search cost if each leaf node can hold L apply the idea repeatedly! entries and each index node can hold F entries? Root 40 Non-leaf Pages 20 33 51 63 Leaf Pages Overflow 46* 55* page 10* 15* 20* 27* 33* 37* 40* 51* 63* 97* Primary pages � Leaf pages contain data entries . Database Management Systems, R. Ramakrishnan and J. Gehrke 5 Database Management Systems, R. Ramakrishnan and J. Gehrke 6

  2. After Inserting 23*, 48*, 41*, 42* ... … Then Deleting 42* Root Root 40 40 Index Index Pages Pages 20 33 51 63 20 33 51 63 Primary Primary Leaf Leaf 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 97* 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 97* 63* 63* Pages Pages 48* 41* 48* 41* Overflow 23* Overflow 23* Pages Pages 42* 42* Database Management Systems, R. Ramakrishnan and J. Gehrke 7 Database Management Systems, R. Ramakrishnan and J. Gehrke 8 … Then Deleting 51* After Deleting 41* and 51* Root Root 40 40 Index Index Pages Pages 20 33 51 63 20 33 51 63 Primary Primary Leaf Leaf 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97* 10* 15* 20* 27* 33* 37* 40* 46* 55* 63* 97* Pages Pages 48* 41* 48* 41* Overflow 23* Overflow 23* Pages Pages � Note 51 appears in Index Page but not in Leaf pages! Database Management Systems, R. Ramakrishnan and J. Gehrke 9 Database Management Systems, R. Ramakrishnan and J. Gehrke 10 B+ Tree: The Most Widely Used Index Example B+ Tree � Insert/delete at log F N cost; keep tree height- � Search begins at root, and key comparisons balanced . (F = fanout, N = # leaf pages) direct it to a leaf (as in ISAM). � Minimum 50% occupancy (except for root). Each � Search for 5*, 15*, all data entries >= 24* ... node contains d <= m <= 2 d entries. The parameter d is called the order of the tree. Root � Supports equality and range-searches efficiently. 30 13 17 24 Index Entries (Direct search) 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* Data Entries � Based on the search for 15*, we know it is not in the tree! ("Sequence set") Database Management Systems, R. Ramakrishnan and J. Gehrke 11 Database Management Systems, R. Ramakrishnan and J. Gehrke 12

  3. B+-tree Search Performance B+ Trees in Practice � Assume leaf pages can hold L data entries � Typical order: 100. Typical fill-factor: 67%. � Assume B+-tree has order d – average fanout = 133 � Typical capacities: � Assume the tree has to index N data entries – Height 4: 133 4 = 312,900,700 records – Height 3: 133 3 = 2,352,637 records � Can often hold top levels in buffer pool: � What is the best-case search performance – Level 1 = 1 page = 8 Kbytes (measured in number of I/Os)? – Level 2 = 133 pages = 1 Mbyte � What is the worst-case search performance – Level 3 = 17,689 pages = 133 MBytes Database Management Systems, R. Ramakrishnan and J. Gehrke 13 Database Management Systems, R. Ramakrishnan and J. Gehrke 14 Inserting 23* Inserting 8* … Root Root 13 17 24 30 13 17 24 30 3* 5* 19* 20* 22* 23* 24* 27* 29* 33* 34* 38* 39* 2* 7* 14* 16* 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* Database Management Systems, R. Ramakrishnan and J. Gehrke 15 Database Management Systems, R. Ramakrishnan and J. Gehrke 16 Entry to be inserted in parent node Inserting 8* … Inserting 8* … (Note that 17 is pushed up and only appears once in the index. Contrast this with leaf split) 17 Entry to be inserted in parent node (Note that 5 is copied up and continues to appear in the leaf) Root Root 13 17 24 30 5 13 24 30 5 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* Database Management Systems, R. Ramakrishnan and J. Gehrke 17 Database Management Systems, R. Ramakrishnan and J. Gehrke 18

  4. After Inserting 8* Inserting 8* … Note how tree grew Root by one level! Root 17 13 17 24 30 5 13 24 30 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* � In this example, could have “redistributed” to sibling instead of splitting 3* 7* 8* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 2* 5* 14* 16* � Not usually done in practice (Why?) Database Management Systems, R. Ramakrishnan and J. Gehrke 19 Database Management Systems, R. Ramakrishnan and J. Gehrke 20 Deleting 19* … Deleting 20* … Root Root 17 17 5 13 24 30 5 13 24 30 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 2* 3* 5* 7* 8* 14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39* Database Management Systems, R. Ramakrishnan and J. Gehrke 21 Database Management Systems, R. Ramakrishnan and J. Gehrke 22 Deleting 20* … After Deleting 20* Root Root 17 17 Redistribution: note how entry is copied up 5 13 24 30 5 13 27 30 2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39* 2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39* Database Management Systems, R. Ramakrishnan and J. Gehrke 23 Database Management Systems, R. Ramakrishnan and J. Gehrke 24

  5. Deleting 24* … Deleting 24* … Root Root 17 17 5 13 27 30 5 13 27 30 3* 7* 8* 22* 24* 27* 29* 33* 34* 38* 39* 3* 7* 8* 22* 27* 29* 33* 34* 38* 39* 2* 5* 14* 16* 2* 5* 14* 16* Database Management Systems, R. Ramakrishnan and J. Gehrke 25 Database Management Systems, R. Ramakrishnan and J. Gehrke 26 Deleting 24* … Deleting 24* … Root Merge: note how entry is pulled down (contrast 17 Merge: note how with merge of leaf node) entry is deleted Root 5 13 17 30 5 13 30 2* 3* 5* 7* 8* 22* 27* 33* 34* 38* 39* 14* 16* 29* 2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39* Database Management Systems, R. Ramakrishnan and J. Gehrke 27 Database Management Systems, R. Ramakrishnan and J. Gehrke 28 Example of Non-leaf Re-distribution After Re-distribution � Entries are re-distributed by ` pushing through ’ the � During deletion of 24* splitting entry in the parent node. � In contrast to previous example, can re-distribute � Suffices to re-distribute index entry with key 20; entry from left child of root to right child. we’ve re-distributed 17 as well for illustration. Root Root 22 17 30 17 20 5 13 5 13 20 22 30 2* 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39* 2* 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39* Database Management Systems, R. Ramakrishnan and J. Gehrke 29 Database Management Systems, R. Ramakrishnan and J. Gehrke 30

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend