modification of records
play

Modification of Records How to handle the following operations on - PowerPoint PPT Presentation

Modification of Records How to handle the following operations on the record level? 1. Insertion 2. Deletion 3. Update 1 1. Insertion Easy case: records not in sequence Insert new record at end of file If records are fixed-length,


  1. Modification of Records How to handle the following operations on the record level? 1. Insertion 2. Deletion 3. Update 1

  2. 1. Insertion  Easy case: records not in sequence  Insert new record at end of file  If records are fixed-length, insert new record in deleted slot  Difficult case: records are sorted  Find position and slide following records  If records are sequenced by linking, insert overflow blocks 2

  3. 2. Deletion a. Immediately reclaim space by shifting other records or removing overflows b. Mark deleted and list as free for re-use  Tradeoffs: How expensive is immediate reclaim?  How much space is wasted?  3

  4. Problem with Deletion  Dangling pointers: R1 ?  When using physical addresses: Never reused May be reused  When using logical addresses: ID LOC Never reuse ID 7788 nor space in the map 7788 4

  5. 3. Update  If records are fixed-length and the order is not affected:  Fetch the record, modify it, write it back  Otherwise:  Delete the old record  Insert the new record overwriting the tombstones from the deletion 5

  6. Pointer Swizzling  Swizzling = replacement of physical addresses by memory addresses when loading blocks into memory  Automatic Swizzling: swizzle all addresses when loading a block (need to swizzle all pointer from and to the block)  Swizzling on Demand: use addresses which are invalid as memory addresses 6

  7. Data Organizaton  There are millions of ways to organize the data on disk  Flexibility Space Utilization Complexity Performance 7

  8. Summary 9 More things you should know:  Memory Hierarchy  Storage on harddisks  Values, Records, Blocks, Files  Storing and modifying records 8

  9. Index Structures 9

  10. Finding Records  How do we find the records for a query?  Example: SELECT * FROM Sells  Need to examine every block in every file  Group blocks into files by relation!  Example: SELECT * FROM Sells WHERE price = 20;  Need to examine every block in the file 10

  11. Finding Records  Use of indexes allows to narrow search to (almost) only the relevant blocks Blocks Value Index Matching records Holding records  Indexes can be dense or sparse 11

  12. Dense Index Dense Index Sequential File 10 10 20 20 30 30 40 40 50 50 60 60 70 80 70 80 90 100 90 110 100 120 12

  13. Sparse Index 2nd level Sparse Index Sequential File 10 10 10 20 90 30 170 50 30 250 70 40 90 330 50 110 410 60 130 490 150 570 70 80 170 190 90 210 100 230 13

  14. Deletion from Sparse Index  Delete 40 10 10 20 30 30 30 30 50 40 40 70 50 90 60 110 130 70 150 80 14

  15. Deletion from Sparse Index  Delete 30 10 10 10 20 40 30 40 30 30 50 50 40 40 40 70 70 50 90 90 60 110 110 130 130 70 150 150 80 15

  16. Deletion from Sparse Index  Delete 30 & 40 10 10 10 20 50 30 30 30 70 50 40 40 70 50 90 90 60 110 110 130 130 70 150 150 80 16

  17. Insertion into Sparse Index  Insert 35 10 10 20 30 30 30 50 35 70 50 90 60 110 130 70 150 80 17

  18. Insertion into Sparse Index  Insert 25 10 10 20 30 30 50 35 70 50 90 60 110 130 70 150 80 25 18

  19. Sparse vs Dense  Sparse uses less index space per record (can keep more of index in memory)  Sparse allows multi-level indexes  Dense can tell if record exists without accessing it  Dense needed for secondary indexes  Primary index = order of records in storage  Secondary index = impose different order 19

  20. Secondary Index 2nd level Secondary Index Sequential File 20 10 10 40 20 10 50 20 10 20 20 20 50 Careful when 30 30 40 Looking for 20 50 10 50 50 60 60 20 20

  21. Secondary Index 2nd level Secondary Index Sequential File 20 10 10 40 50 20 30 10 40 20 50 50 60 30 10 50 60 20 21

  22. Combining Indexes  SELECT * FROM Sells WHERE beer = “Od.Cl.“ AND price = “20“ Beer index Sells Price index OC 20 C.Ch.  Just intersect buckets in memory! 22

  23. Conventional Indexes  Sparse, Dense, Multi-level, ...  Advantages:  Simple  Sequential index is good for scans  Disadvantage:  Inserts expensive  Lose sequentiality and balance 23

  24. Example: Unbalanced Index 10 39 20 31 30 35 33 36 40 50 60 32 38 34 70 80 90 overflow area (not sequential) 24

  25. B+Trees 25

  26. Idea  Conventional indexes are fixed-level  Give up sequentiality of the index in favour of balance  B+Tree = variant of B-Tree  Allows index tree to grow as needed  Ensures that all blocks are between half used and completely full 26

  27. Characteristics  Parameter n determines number of keys and pointers per node  Key size 4 and pointer size 8 allows for maximal n = 340 (4n + 8(n+1) < 4096)  Leafs contain at least n/2 key-pointer pairs to records and a pointer to the next leaf  Interior nodes contain at least (n-1)/2 keys and at least n/2 pointers to other nodes  No restrictions for the root node 27

  28. Example: B+Tree (n=3) 42 11 23 64 3 6 9 11 15 17 23 31 37 42 57 64 85 28

  29. Example: Leaf node 42 57 To next leaf To record To record With key 42 With key 57 29

  30. Example: Interior node 11 23 To keys To keys To keys K < 11 11 ≤ K < 23 23 ≤ K 30

  31. Restrictions Full node min. node 11 23 42 64 Non-leaf 11 15 17 64 85 Leaf Counts even when null 31

  32. Insertion  If there is place in the appropriate leaf, just insert it there  Otherwise:  Split the leaf in two and divide the keys  Insert the smallest value reachable through the right node into the parent node  Recurse until there is enough room  Special case: Splitting the root results in a new root 32

  33. Example: Insertion  Insert 85 11 23 42 3 6 9 11 17 23 31 37 42 57 42 57 85 33

  34. Example: Insertion  Insert 15 11 23 42 3 6 9 11 17 11 15 17 23 31 37 42 57 85 34

  35. Example: Insertion  Insert 64 42 42 11 23 42 11 23 64 3 6 9 11 15 17 23 31 37 42 57 85 42 57 64 85 35

  36. Deletion  If there are enought keys left in the appropriate leaf, just delete the key  Otherwise:  If there is a direct sibling with more than minimum key, steal one!  If not, join the node with a direct sibling and delete the smallest value reachable through the former right sibling from its parent  Special case: If the root contains only one pointer after deletion, delete it 36

  37. Example: Deletion  Delete 9 42 11 23 64 3 6 9 3 6 9 3 6 11 15 17 23 31 37 42 57 64 85 37

  38. Example: Deletion  Delete 3 42 11 23 15 23 64 3 6 3 6 6 6 11 11 15 17 15 17 23 31 37 42 57 64 85 38

  39. Example: Deletion  Delete 11 42 15 23 23 64 6 11 6 11 6 15 17 6 15 17 23 31 37 42 57 64 85 39

  40. Example: Deletion  Delete 17, 37 42 23 64 6 15 17 6 15 23 31 37 23 31 42 57 64 85 40

  41. Example: Deletion  Delete 31 42 23 42 64 64 6 15 6 15 23 6 15 23 23 31 23 42 57 64 85 41

  42. Efficiency  Need to load one block for each level!  With n = 340 and an average fill of 255 pointers, we can index 255^3 = 16.6 million records in only 3 levels  There are at most 342 blocks in the first two levels  First two levels can be kept in memory using less than 1.4 Mbyte  Only need to access one block! 42

  43. Range Queries  Queries often restrict an attribute to a range of values  Example: SELECT * FROM Sells WHERE beer > 20;  Records are found efficiently by searching for value 20 and then traversing the leafs  Can also be used if there is both an upper and a lower limit 43

  44. Summary 10 More things you should know:  Dense Index, Sparse Index  Multi-Level Indexes  Primary vs Secondary Index  Structure of B+Trees  Insertion and Deletion in B+Trees 44

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend