  1. Outline/summary • Conventional Indexes • Sparse vs. dense • Primary vs. secondary • B trees • B+trees vs. indexed sequential • Hashing schemes --> Next

  2. Hashing • A key is mapped by h(key) to one of several buckets • Buckets: typically 1 disk block each

  3. Two alternatives • (1) key → h(key): the buckets hold the records themselves

  4. Two alternatives • (2) key → h(key): the buckets hold ⟨key, pointer-to-record⟩ entries, i.e., an index • Alt (2) for “secondary” search key

  5. Example hash function • Key = ‘x1 x2 … xn’, an n-byte character string • Have b buckets • h: add x1 + x2 + … + xn, compute the sum modulo b
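A minimal Python sketch of this hash function, assuming the key arrives as a byte string; the function name and the sample values are illustrative:

      def h(key: bytes, b: int) -> int:
          # Add up the bytes x1 + x2 + ... + xn, then reduce modulo b buckets.
          return sum(key) % b

      print(h(b"abc", 4))   # (97 + 98 + 99) % 4 = 2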

  6. This may not be best function… Read Knuth Vol. 3 if you really need to select a good function. Good hash function: expected number of keys/bucket is the same for all buckets

  7. Within a bucket: • Do we keep keys sorted? • Yes, if CPU time critical & Inserts/Deletes not too frequent

  8. Next: example to illustrate inserts, overflows, deletes

  9. EXAMPLE: insertion. 2 records/bucket, buckets 0–3. h(a)=1, h(b)=2, h(c)=1, h(d)=0, h(e)=1. INSERT d and e: d goes into bucket 0; e hashes to bucket 1, which already holds a and c, so it spills into an overflow block.

  10. EXAMPLE: deletion. Deleting a record frees a slot in its bucket; if that bucket has an overflow block, maybe move a record (“g” in the figure) up into the primary block.
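A small Python model of the static scheme behind slides 8–10 is sketched below. The class and method names are made up, Python's built-in hash() stands in for the slide's h, and each bucket keeps a primary block of `capacity` keys plus an overflow list:

      class FixedHashTable:
          """Static hashing: a fixed number of buckets, each holding up to
          `capacity` keys in its primary block; extra keys go to an overflow list."""

          def __init__(self, num_buckets: int, capacity: int = 2):
              self.capacity = capacity
              self.primary = [[] for _ in range(num_buckets)]   # primary blocks
              self.overflow = [[] for _ in range(num_buckets)]  # overflow blocks

          def _h(self, key) -> int:
              return hash(key) % len(self.primary)

          def insert(self, key):
              b = self._h(key)
              block = self.primary[b]
              (block if len(block) < self.capacity else self.overflow[b]).append(key)

          def delete(self, key):
              b = self._h(key)
              if key in self.primary[b]:
                  self.primary[b].remove(key)
                  if self.overflow[b]:
                      # "maybe move up" a key from the overflow block (slide 10)
                      self.primary[b].append(self.overflow[b].pop(0))
              elif key in self.overflow[b]:
                  self.overflow[b].remove(key)

          def lookup(self, key) -> bool:
              b = self._h(key)
              return key in self.primary[b] or key in self.overflow[b]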

  11. Rule of thumb: • Try to keep space utilization between 50% and 80%. Utilization = (# keys used) / (total # keys that fit) • If < 50%, wasting space • If > 80%, overflows significant. Depends on how good the hash function is & on # keys/bucket
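As a quick illustration of this ratio in Python (the numbers are made up):

      def utilization(keys_used: int, buckets: int, keys_per_bucket: int) -> float:
          # keys actually stored / total keys that fit in the primary blocks
          return keys_used / (buckets * keys_per_bucket)

      u = utilization(keys_used=120, buckets=100, keys_per_bucket=2)
      print(f"{u:.0%}")   # 60% -- inside the 50%-80% band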

  12. How do we cope with growth? • Overflows and reorganizations • Dynamic hashing • Extensible • Linear

  13. Extensible hashing: two ideas • (a) Use i of the b bits output by the hash function, e.g. h(K) = 00110101, use the leading i bits; i grows over time…

  14. • (b) Use a directory: h(K)[i] → directory entry → bucket

  15. Example: h(k) is 4 bits; 2 keys/bucket. Start with i = 1: bucket 0 holds 0001, bucket 1 holds 1001 and 1100. Insert 1010: bucket 1 overflows, so the directory doubles (new directory, i = 2) and the bucket splits into 10 (1001, 1010) and 11 (1100).

  16. Example continued. Insert 0111 and 0000: 0111 joins 0001 in the 0-bucket; inserting 0000 then overflows it, and the bucket splits (no directory doubling needed) into 00 (0000, 0001) and 01 (0111).

  17. Example continued. Insert another 1001: bucket 10 (1001, 1010) overflows and already uses all i = 2 bits, so the directory doubles to i = 3 and the bucket splits into 100 (1001, 1001) and 101 (1010).
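The directory doubling and bucket splits of slides 13–17 can be sketched in Python as follows. This is a minimal illustration under assumptions, not the slides' exact procedure: the class and field names are made up, keys are passed in as ready-made 4-bit hash values, and each bucket tracks its local depth so the directory only doubles when the overflowing bucket already uses all i bits.

      class ExtensibleHash:
          """Extensible hashing: a directory of 2**i pointers to buckets,
          indexed by the leading i bits of the b-bit hash value."""

          def __init__(self, hash_bits=4, bucket_capacity=2):
              self.b = hash_bits                   # bits produced by h(K)
              self.cap = bucket_capacity
              self.i = 1                           # global depth (bits of h in use)
              self.dir = [{"depth": 1, "keys": []}, {"depth": 1, "keys": []}]

          def _dir_index(self, hv):
              return hv >> (self.b - self.i)       # leading i bits of h(K)

          def insert(self, key):
              bucket = self.dir[self._dir_index(key)]
              if len(bucket["keys"]) < self.cap:
                  bucket["keys"].append(key)
                  return
              # Overflow: double the directory if the bucket already uses all i bits.
              if bucket["depth"] == self.i:
                  self.dir = [b for b in self.dir for _ in (0, 1)]
                  self.i += 1
              # Split on one more leading bit and redistribute the keys.
              bucket["depth"] += 1
              sibling = {"depth": bucket["depth"], "keys": []}
              old = bucket["keys"] + [key]
              bucket["keys"] = []
              for j, b in enumerate(self.dir):
                  # Entries whose distinguishing bit is 1 now point to the sibling.
                  if b is bucket and (j >> (self.i - bucket["depth"])) & 1:
                      self.dir[j] = sibling
              for k in old:
                  self.insert(k)                   # may cascade if all land in one half

      # Replaying the slides' keys (0b prefixes are the 4-bit hash values):
      eh = ExtensibleHash()
      for k in [0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001]:
          eh.insert(k)
      print(eh.i)   # 3 -- the global depth reached at the end of slide 17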

  18. Extensible hashing: deletion • Option 1: no merging of blocks • Option 2: merge blocks and cut the directory if possible (reverse of the insert procedure)

  19. Deletion example: • Run thru insert example in reverse!

  20. Extensible hashing summary • + Can handle growing files, with less wasted space and no full reorganizations • − Indirection (not bad if the directory fits in memory) • − Directory doubles in size (now it fits, now it does not)

  21. Linear hashing • Another dynamic hashing scheme. Two ideas: (a) use the i low-order bits of the hash output, e.g. h(K) = 01110101; i grows over time (b) the number of buckets in use grows linearly • Constraint: 2^(i-1) ≤ n+1 ≤ 2^i (We take n to be the id of the largest bucket in use, starting at 0.)

  22. Example: b = 4 bits, i = 2, 2 keys/bucket. Buckets in use: 00 (0000, 1010) and 01 (0101, 1111); n = 01 (number of the last bucket in use); 10 and 11 are future-growth buckets. Insert 0101: bucket 01 is already full, so it gets an overflow chain — linear hashing can have overflow chains! Rule: if h(k)[i] ≤ n, look at bucket h(k)[i]; else look at bucket h(k)[i] − 2^(i-1).

  23. Example: b = 4 bits, i = 2, 2 keys/bucket. Insert 1110: h(k)[i] = 10 > n = 01, so look at bucket 10 − 2^(i-1) = 00; bucket h(k)[i] − 2^(i-1) is the bucket whose i-th bit is flipped in binary. Rule: if h(k)[i] ≤ n, look at bucket h(k)[i]; else look at bucket h(k)[i] − 2^(i-1).

  24. Example: b = 4 bits, i = 2, 2 keys/bucket. Insert 0101 and grow the file: n moves from 01 to 10 and then 11; bucket 00 splits with new bucket 10 (1010 moves there) and bucket 01 splits with new bucket 11 (1111 moves there). Rule: if h(k)[i] ≤ n, look at bucket h(k)[i]; else look at bucket h(k)[i] − 2^(i-1).

  25. Example continued: how to grow beyond this? With n = 11 and i = 2, adding bucket 100 would violate the constraint 2^(i-1) ≤ n+1 ≤ 2^i, so i grows from 2 to 3; buckets 100, 101, … come into use and keys are redistributed on 3 low-order bits. Rule: if h(k)[i] ≤ n, look at bucket h(k)[i]; else look at bucket h(k)[i] − 2^(i-1).

  26. When do we expand the file? • Keep track of U = (# records) / (# buckets) • If U > threshold, then increase n (and maybe i)
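A Python sketch tying the lookup rule of slides 21–25 to this growth trigger. It is an illustration under assumptions, not the slides' exact procedure: the class name and the 1.6 records-per-bucket threshold are made up, keys are treated as ready-made hash values, and a plain Python list per bucket stands in for a primary block plus its overflow chain.

      class LinearHash:
          """Linear hashing: buckets 0..n are in use; a key's bucket is given by
          the i low-order bits of h(k), with the top of those bits flipped if the
          bucket does not exist yet.  n grows one bucket at a time."""

          def __init__(self, bucket_capacity=2, threshold=1.6):
              self.cap = bucket_capacity
              self.threshold = threshold   # max records per bucket (~80% of 2 slots)
              self.i = 1                   # low-order bits in use
              self.n = 0                   # id of the largest bucket in use
              self.buckets = [[]]          # bucket 0 only; lists also hold overflow
              self.records = 0

          def _bucket_id(self, key):
              m = key & ((1 << self.i) - 1)                       # i low-order bits
              return m if m <= self.n else m - (1 << (self.i - 1))  # flip top bit

          def insert(self, key):
              self.buckets[self._bucket_id(key)].append(key)      # may chain
              self.records += 1
              if self.records / len(self.buckets) > self.threshold:   # U > threshold
                  self._grow()

          def _grow(self):
              self.n += 1
              if self.n + 1 > (1 << self.i):    # keep 2**(i-1) <= n+1 <= 2**i
                  self.i += 1
              self.buckets.append([])           # bucket n comes into use
              source = self.n - (1 << (self.i - 1))   # its keys were living here
              keys = self.buckets[source]
              self.buckets[source] = [k for k in keys if self._bucket_id(k) == source]
              self.buckets[self.n] = [k for k in keys if self._bucket_id(k) != source]

          def lookup(self, key):
              return key in self.buckets[self._bucket_id(key)]

      # Replaying the keys from the example slides:
      lh = LinearHash()
      for k in [0b0000, 0b1010, 0b0101, 0b1111, 0b0101]:
          lh.insert(k)
      print(lh.n, lh.i)   # 3 2 -- four buckets in use, as at the end of slide 24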

  27. Linear hashing summary • + Can handle growing files, with less wasted space and no full reorganizations • + No indirection (unlike extensible hashing) • − Can still have overflow chains

  28. Example: BAD CASE. Some buckets are very full while others are very empty; to split the full ones, n would need to move there, splitting the empty buckets first and wasting space…

  29. Summary Hashing - How it works - Dynamic hashing - Extensible - Linear

  30. B+trees vs Hashing • Hashing good for probes given key e.g., SELECT … FROM R WHERE R.A = 5

  31. B+Trees vs Hashing • INDEXING (including B Trees) good for Range Searches: e.g., SELECT … FROM R WHERE R.A > 5
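As a toy illustration in Python (the table and values are made up): a hash index, modeled here by a dict, answers the equality probe in one step, but the range predicate forces a scan unless the keys are kept in sorted order, which is what a B+tree's leaf level provides.

      import bisect

      rows = {17: "r1", 5: "r2", 42: "r3", 9: "r4"}   # hash index on R.A -> row

      # WHERE R.A = 5  -> one probe of the hash index
      print(rows[5])                                   # 'r2'

      # WHERE R.A > 5  -> the hash index gives no help; scan every key
      print([rid for a, rid in rows.items() if a > 5])

      # A sorted key list (stand-in for a B+tree) finds the range start directly
      keys = sorted(rows)                              # [5, 9, 17, 42]
      start = bisect.bisect_right(keys, 5)
      print([rows[a] for a in keys[start:]])           # rows with A > 5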
