Hash-Based Indexes (From Chapter 11)

Introduction

- As for any index, 3 alternatives for data entries k*:
- Hash-based indexes are best for equality selections.
- Static and dynamic hashing techniques exist.

Static Hashing

- # primary pages fixed, allocated sequentially, never de-allocated; overflow pages if needed.
- h(k) mod N = bucket to which data entry with key k belongs. (N = # of buckets)

[Figure: the key is passed through h; h(key) mod N selects one of the primary bucket pages 0 ... N-1, and a primary bucket page may have overflow pages chained to it.]
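The static scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's implementation: the class name `StaticHashFile`, the page capacity, and the identity hash function for integer keys are all assumptions made for the example. It shows how entries that collide on one bucket spill into an overflow chain, and how the chain length determines the number of pages a search must read.

```python
class StaticHashFile:
    """Toy static hash file: N fixed primary pages plus overflow chains."""

    def __init__(self, num_buckets, page_capacity):
        self.num_buckets = num_buckets        # N, fixed for the file's lifetime
        self.page_capacity = page_capacity    # data entries per page (assumed)
        # Each bucket is a chain of pages; page 0 is the primary page.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def h(self, key):
        # Assumption: integer keys hashed by the identity function.
        return key

    def bucket_of(self, key):
        # h(k) mod N = bucket to which the data entry with key k belongs.
        return self.h(key) % self.num_buckets

    def insert(self, key):
        chain = self.buckets[self.bucket_of(key)]
        if len(chain[-1]) == self.page_capacity:
            chain.append([])                  # allocate an overflow page
        chain[-1].append(key)

    def search(self, key):
        """Return (found, pages_read) for an equality search."""
        chain = self.buckets[self.bucket_of(key)]
        for pages_read, page in enumerate(chain, start=1):
            if key in page:
                return True, pages_read
        return False, len(chain)


index = StaticHashFile(num_buckets=4, page_capacity=2)
for key in [0, 4, 8, 12, 16, 5]:
    index.insert(key)

print(index.search(5))    # (True, 1): found on the primary page of bucket 1
print(index.search(16))   # (True, 3): bucket 0 has grown two overflow pages
```

Because N never changes, every key congruent to 0 mod 4 keeps landing in bucket 0, so the cost of a search there grows with the length of the chain.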

Static Hashing (Contd.)

- Buckets contain data entries.
- Hash fn works on search key field(s) of record r. Must distribute values over range 0 ... N-1.
  - h(key) =
- Long overflow chains can develop and degrade performance.

Extendible Hashing

- Main idea: If bucket (primary page) becomes full, why not re-organize file by doubling # of buckets?
- But reading and writing all buckets is expensive!
- Idea:

Insert h(r)=14

[Figure: directory with GLOBAL DEPTH 2; every bucket has LOCAL DEPTH 2.]

Directory entry   Bucket     Data entries
00                Bucket A   4* 12* 32* 16*
01                Bucket B   1* 5* 21* 13*
10                Bucket C   10*
11                Bucket D   15* 7* 19*
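The directory lookup in the figure for inserting h(r)=14 can be sketched as follows. The sketch assumes that the directory is indexed by the last GLOBAL DEPTH bits of the hash value, which is consistent with the bucket contents shown (every entry in Bucket A ends in 00, every entry in Bucket B in 01, and so on). The names `DIRECTORY`, `BUCKETS`, `directory_slot`, and `lookup_bucket` are hypothetical.

```python
GLOBAL_DEPTH = 2

# Directory and bucket contents copied from the figure.
DIRECTORY = {0b00: "A", 0b01: "B", 0b10: "C", 0b11: "D"}
BUCKETS = {
    "A": [4, 12, 32, 16],
    "B": [1, 5, 21, 13],
    "C": [10],
    "D": [15, 7, 19],
}


def directory_slot(hash_value, global_depth):
    """Keep only the last `global_depth` bits of the hash value."""
    return hash_value & ((1 << global_depth) - 1)


def lookup_bucket(hash_value):
    """Name of the bucket that the directory points to for this hash value."""
    return DIRECTORY[directory_slot(hash_value, GLOBAL_DEPTH)]


# 14 is 1110 in binary, so its last two bits are 10.
print(lookup_bucket(14))   # C
```

Bucket C holds a single entry, so under this reading 14* fits without any split.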

Insert h(r)=20

[Figure: GLOBAL DEPTH 2; every bucket has LOCAL DEPTH 2. Bucket A is full.]

Directory entry   Bucket     Data entries
00                Bucket A   32* 16* 4* 12*
01                Bucket B   1* 5* 21* 13*
10                Bucket C   10*
11                Bucket D   15* 7* 19*

Insert h(r)=20

[Figure: Bucket A has been split; GLOBAL DEPTH and all LOCAL DEPTHs are still shown as 2.]

Bucket      Data entries
Bucket A    32* 16*
Bucket B    1* 5* 21* 13*
Bucket C    10*
Bucket D    15* 7* 19*
Bucket A2   4* 12* 20*   (`split image' of Bucket A)

Insert h(r)=32

[Figure: GLOBAL DEPTH 0; the directory points to a single Bucket A with LOCAL DEPTH 0 holding 1* 10* 4* 12*.]

Insert h(r)=16

[Figure: GLOBAL DEPTH 1.]

Directory entry   Bucket     Local depth   Data entries
0                 Bucket A   1             4* 12* 10* 32*
1                 Bucket B   1             1*

Insert h(r)=20

[Figure: GLOBAL DEPTH 2; directory entries 00, 01, 10, 11.]

Bucket     Local depth   Data entries
Bucket A   2             32* 16* 4* 12*
Bucket B   1             1*
Bucket C   2             10*

Insert h(r)=5, 15, 7, 19

[Figure: GLOBAL DEPTH 3; directory entries 000, 001, 010, 011, 100, 101, 110, 111.]

Bucket      Local depth   Data entries
Bucket A    3             32* 16*
Bucket B    1             1* 5* 15* 7*
Bucket C    2             10*
Bucket A2   3             4* 12* 20*   (`split image' of Bucket A)
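The insertion sequence in these figures can be replayed with a small sketch of extendible hashing. It is an illustration under stated assumptions, not a definitive implementation: buckets hold 4 data entries (as in the figures), the directory is indexed by the last GLOBAL DEPTH bits of the hash value, and the class and method names (`Bucket`, `ExtendibleHash`, `insert`) are hypothetical. Overflow pages for duplicates are not modeled.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.entries = []


class ExtendibleHash:
    def __init__(self, capacity=4):
        self.capacity = capacity          # data entries per bucket (assumed)
        self.global_depth = 0
        self.directory = [Bucket(0)]      # 2**global_depth directory elements

    def _slot(self, h):
        # Directory element = last `global_depth` bits of the hash value.
        return h & ((1 << self.global_depth) - 1)

    def insert(self, h):
        bucket = self.directory[self._slot(h)]
        if len(bucket.entries) < self.capacity:
            bucket.entries.append(h)
            return
        # Bucket is full. Double the directory only if the bucket already
        # uses as many bits as the directory does.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split the bucket: one more bit now separates it from its split image.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        old_entries, bucket.entries = bucket.entries, []
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = image
        for e in old_entries:
            self.directory[self._slot(e)].entries.append(e)
        # Retry. Not handled here: more than `capacity` identical hash values,
        # which would need overflow pages instead of further splits.
        self.insert(h)


eh = ExtendibleHash(capacity=4)
for h in [1, 10, 4, 12, 32, 16, 20, 5, 15, 7, 19]:
    eh.insert(h)

print(eh.global_depth)                   # 3
print(sorted(eh.directory[0b000].entries))   # [16, 32]      (Bucket A)
print(sorted(eh.directory[0b100].entries))   # [4, 12, 20]   (Bucket A2)
print(sorted(eh.directory[0b011].entries))   # [7, 15, 19]
```

Inserting 32 and 16 each force a directory doubling because the full bucket's local depth equals the global depth; inserting 19 splits the bucket holding 1*, 5*, 15*, 7* without doubling, because that bucket's local depth (1) is below the global depth (3).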

Deletions

- Inverse of insertion.
- If removal of data entry makes bucket empty, merge with `split image'.
- If each directory element points to same bucket as its split image, can halve directory.

Comments on Extendible Hashing

- If directory fits in memory, equality search answered with _____ I/O; else _____
- 100MB file, 100 bytes/rec, 4K pages contain 1,000,000 records (as data entries) and 25,000 directory elements; chances are high that directory will fit in memory.
- Directory grows in spurts, and, if the distribution of hash values is ________, directory can grow large.

Linear Hashing

- This is another dynamic hashing scheme, an alternative to Extendible Hashing.
- LH handles the problem of long overflow chains without using a directory, and handles duplicates.
- Main idea:
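The figures quoted in the comments on Extendible Hashing (1,000,000 records and 25,000 directory elements) can be checked with a little arithmetic. The sketch below assumes that 100MB means 10^8 bytes, that a 4K page is 4096 bytes, that each data entry is as large as a record (the slide says "as data entries"), that every page is full, and that there is one directory element per bucket page. The variable names are only for illustration.

```python
file_bytes = 100 * 10**6      # 100MB file (assumed decimal megabytes)
record_bytes = 100            # 100 bytes/rec
page_bytes = 4096             # 4K page (assumed 4096 bytes)

records = file_bytes // record_bytes            # data entries in the file
entries_per_page = page_bytes // record_bytes   # whole entries that fit on a page
bucket_pages = records // entries_per_page      # full pages needed
directory_elements = bucket_pages               # one element per bucket (assumed)

print(records)              # 1000000
print(entries_per_page)     # 40
print(directory_elements)   # 25000
```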

Inserting h(r) = 43

Level=2, N=4, Next=0

h3    h2    Primary pages
000   00    32* 44* 36*
001   01    9* 25* 5*
010   10    14* 18* 10* 30*
011   11    31* 35* 7* 11*

(The h3 and h2 columns are for illustration only; the primary pages are the actual contents of the linear hashed file.)

Example (Inserting h(r) = 43)

Level=2, Next=0

h3    h2    Primary pages      Overflow pages
000   00    32* 44* 36*
001   01    9* 25* 5*
010   10    14* 18* 10* 30*
011   11    31* 35* 7* 11*     43*

Inserting h(r) = 50 (End of a Round)

Level=2, Next=3

h3    h2    Primary pages      Overflow pages
000   00    32*
001   01    9* 25*
010   10    66* 18* 10* 34*
011   11    31* 35* 7* 11*     43*
100   00    44* 36*
101   01    5* 37* 29*
110   10    14* 30* 22*
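The linear hashing examples above can be replayed with a short sketch. Several things here are assumptions rather than statements from the slides: h_i is taken to be the last i bits of the hash value (matching the h2 and h3 columns), a page holds 4 data entries, a split of bucket Next is triggered whenever an insert lands on an overflow page, and the intermediate inserts 37, 29, 22, 66, 34 (with this order) are inferred from the entries that appear in the end-of-round figure. Each bucket is modeled as a single list, so primary and overflow pages are not stored separately. The names `LinearHashFile`, `address`, and `insert` are hypothetical.

```python
class LinearHashFile:
    def __init__(self, level, capacity=4):
        self.level = level            # h_level uses the last `level` bits
        self.next = 0                 # next bucket to be split
        self.capacity = capacity      # entries per primary page (assumed)
        self.buckets = [[] for _ in range(1 << level)]

    def address(self, h):
        b = h % (1 << self.level)             # h_Level
        if b < self.next:                     # bucket already split this round
            b = h % (1 << (self.level + 1))   # so h_Level+1 decides
        return b

    def insert(self, h):
        bucket = self.buckets[self.address(h)]
        needs_overflow = len(bucket) >= self.capacity
        bucket.append(h)
        if needs_overflow:
            self._split()

    def _split(self):
        # Split bucket Next (not necessarily the bucket that overflowed).
        modulus = 1 << (self.level + 1)
        image_index = self.next + (1 << self.level)
        old = self.buckets[self.next]
        self.buckets.append([])               # new bucket lands at image_index
        self.buckets[self.next] = [e for e in old if e % modulus == self.next]
        self.buckets[image_index] = [e for e in old if e % modulus == image_index]
        self.next += 1
        if self.next == (1 << self.level):    # every original bucket was split
            self.level += 1
            self.next = 0


lh = LinearHashFile(level=2)
lh.buckets = [[32, 44, 36], [9, 25, 5], [14, 18, 10, 30], [31, 35, 7, 11]]

lh.insert(43)                         # overflows bucket 11, splits bucket 00
for h in [37, 29, 22, 66, 34]:        # inferred intermediate inserts
    lh.insert(h)
print(lh.level, lh.next)              # 2 3
print(lh.buckets[0b010])              # [18, 10, 66, 34]

lh.insert(50)                         # overflows bucket 010, splits bucket 011
print(lh.level, lh.next)              # 3 0
```

Under these assumptions the state before inserting 50 matches the end-of-round figure (Level=2, Next=3), and inserting 50 splits the last unsplit bucket, which ends the round.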

Overview of LH File

- In the middle of a round.

[Figure: the buckets of the file in the middle of a round, with these labels:]

- Buckets split in this round: if h_Level(search key value) is in this range, must use h_Level+1(search key value) to decide if entry is in `split image' bucket.
- Next: bucket to be split.
- Buckets that existed at the beginning of this round: this is the range of h_Level.
- `split image' buckets: created (through splitting of other buckets) in this round.

Summary

- Hash-based indexes: best for ______ searches, cannot support _____ searches.
- Static Hashing can lead to ________________.
- Extendible Hashing uses directory doubling to avoid ___________
  - Duplicates may require ________________
- Linear hashing avoids directory by splitting in rounds.
  - Naturally handles ______________
  - Uses overflow buckets (but not very long in practice)
