1 1. Need a hashing function, h(k), that To provide a unique set - PDF document

Searching (Chapter 9, sections 9.1-9.4.1)
A systematic method for locating a record with a key value $k_j = K$.
– successful search
– unsuccessful search
– exact match query
– range query

Maps
• A map models a searchable collection of key-value entries
• The main operations of a map are for searching, inserting, and deleting items
• Multiple entries with the same key are not allowed
• Applications:
– address book
– student-record database

A Simple List-Based Map
• We can efficiently implement a map using an unsorted list
– We store the items of the map in a list S (based on a doubly-linked list), in arbitrary order
[Figure: list S with entries keyed 5, 8, 9, 6]

Performance of a List-Based Map
• Performance:
– put takes $O(1)$ time since we can insert the new item at the beginning or at the end of the sequence
– get and remove take $O(n)$ time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key
• The unsorted list implementation is effective only for maps of small size or for maps in which puts are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation)

Hashing
Table Representations of Data
– 1 to 1 mapping
– ex. 5000 employees
– key=
– spaces=
– to provide a unique set of keys
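The list-based map described above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' implementation: the class name `ListMap` is hypothetical, a built-in Python list stands in for the doubly-linked list S, and `put` assumes the caller never inserts a key that is already present (which is what keeps it at $O(1)$).

```python
class ListMap:
    """Map backed by an unsorted list of (key, value) entries.

    Assumption: a Python list replaces the doubly-linked list S of the slides.
    """

    def __init__(self):
        self._items = []  # entries kept in arbitrary (insertion) order

    def put(self, key, value):
        # Append without searching: O(1). The caller is assumed to
        # guarantee that the key is not already stored.
        self._items.append((key, value))

    def get(self, key):
        # Linear scan: O(n) in the worst case (key not found).
        for k, v in self._items:
            if k == key:
                return v
        return None

    def remove(self, key):
        # Linear scan, then delete the matching entry: O(n).
        for i, (k, v) in enumerate(self._items):
            if k == key:
                del self._items[i]
                return v
        return None


m = ListMap()
m.put(5, "c")
m.put(8, "c")
print(m.get(8))     # c
print(m.get(7))     # None
print(m.remove(5))  # c
```

The scan in `get` and `remove` is the cost that hashing is meant to avoid.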

To provide a unique set of keys: YOU MUST HAVE A UNIQUE KEY!
– ex. binary search ≈ $f(n) = 2 \log_2 N$
– $2 \log_2 5000 \approx 24.6$ comparisons
– $O(\log_2 n)$
How would you like $O(1)$?

1. Need a hashing function, $h(k)$, that maps key $K$ onto an address in the table.
2. Must ensure that $h(k_1) \ne h(k_2)$ for $k_1 \ne k_2$.
Simple, naïve example: table size = $M$, $h(k) = k \bmod M$

Example: add 10, 2, 19, 14, 24, 23
Table size = 7
$h(k) = k \bmod 7$
– $10 \bmod 7 = 3$
– $2 \bmod 7 = 2$
– $19 \bmod 7 = 5$
Table after adding 10, 2 and 19:
  0: (empty)
  1: (empty)
  2: 2
  3: 10
  4: (empty)
  5: 19
  6: (empty)
Where do 14, 24 and 23 go?

Choosing a Hash Function
* a good hash function maps keys uniformly and randomly
– a poor hash function maps keys non-uniformly, or maps contiguous clusters of keys into clusters of hash table locations.

Example Hash functions
1. Division Method
– choose a prime number as table size, $M$
– interpret keys as integers
– $h(k) = k \bmod M$

2. Shift Folding
– key is divided into sections
– sections are added together
– ex. 9 digit key $k = 013402122$
  $013 + 402 + 122 = 537 = h(k)$
– can use multiplication, subtraction, addition, (whatever) in some fashion, to combine into a final value
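The division example and the shift-folding example above can be checked with a short Python sketch. The function names `h_division` and `shift_fold` are hypothetical, keys for folding are passed as digit strings so that leading zeros survive, and a colliding key is only recorded, not relocated.

```python
def h_division(k, m):
    """Division method: h(k) = k mod M."""
    return k % m


def shift_fold(key_digits, width=3):
    """Shift folding: split the digit string into sections and add them."""
    sections = [key_digits[i:i + width] for i in range(0, len(key_digits), width)]
    return sum(int(s) for s in sections)


M = 7
table = [None] * M
collisions = []  # (key, slot, key already stored in that slot)
for k in [10, 2, 19, 14, 24, 23]:
    slot = h_division(k, M)
    if table[slot] is None:
        table[slot] = k
    else:
        collisions.append((k, slot, table[slot]))

print(table)                    # [14, None, 2, 10, None, 19, None]
print(collisions)               # [(24, 3, 10), (23, 2, 2)]
print(shift_fold("013402122"))  # 537
```

So 14 lands in slot 0, while 24 and 23 hash to slots that 10 and 2 already occupy; resolving such collisions is a separate policy.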

3. Boundary Folding
– like shift folding
– every other number is reversed before folding
– ex. $k = 013402122$
  $013 + 204 + 122 = 339 = h(k)$
– not much difference
– decide which method gives better scattered result based on experiments

4. Middle Squaring
– take middle portion of key, square it and adjust
– ex. $k = 013402122$
  $h(k) = 402^2 = 161604$
  adjust:
  1) could mod $M$
  2) could take middle 4 digits (6160)
– used for character codes.

5. Truncation
– simply delete part of the key and use remaining digits
– ex. $k = 013402122$, $h(k) = 122$
– easy to compute
– not very random and uniform
– seldom used alone

6. Digit or Character Extraction
– similar to truncation
– when key has a predictable value, extract before applying another hash function.
– ex. a company coding scheme
– commonly used in conjunction with another method

7. Random Number Generator
– use key as the “seed”
– next number computed is hash value
– unlikely 2 different seeds will give the same random number
– can be computationally expensive

Collisions
def. $h(k_1) = h(k_2)$ for $k_1 \ne k_2$
recall we need to map keys in a UNIFORM and RANDOM fashion
(For an arbitrary key, any possible table address, 0 to $M-1$, should be equally likely to be chosen by your hashing function.)
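The worked values for boundary folding, middle squaring and truncation can be reproduced with the following sketch. All function names and default parameters are hypothetical; keys are digit strings, sections are 3 digits wide, and "middle portion" is taken to be digits 4 to 6 of the 9-digit key, as in the examples above.

```python
def boundary_fold(key_digits, width=3):
    """Boundary folding: reverse every other section, then add the sections."""
    sections = [key_digits[i:i + width] for i in range(0, len(key_digits), width)]
    total = 0
    for i, s in enumerate(sections):
        # Sections 2, 4, ... (odd index) are reversed before folding.
        total += int(s[::-1] if i % 2 == 1 else s)
    return total


def middle_square(key_digits, start=3, width=3):
    """Middle squaring: square the middle portion of the key."""
    return int(key_digits[start:start + width]) ** 2


def middle_digits(value, width=4):
    """One possible adjustment: keep the middle `width` digits of the square."""
    s = str(value)
    start = (len(s) - width) // 2
    return s[start:start + width]


def truncate(key_digits, keep=3):
    """Truncation: drop the leading digits and keep the last `keep`."""
    return int(key_digits[-keep:])


key = "013402122"
print(boundary_fold(key))                 # 339
print(middle_square(key))                 # 161604
print(middle_digits(middle_square(key)))  # 6160
print(truncate(key))                      # 122
```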

A BAD EXAMPLE
$M = 2^8 = 256$
$h(k) = k \bmod M = k \bmod 256$
keys: variable names (registers in assembly), up to 3 characters, use 8-bit ASCII chars (one 24-bit integer, divided into 3 equal 8-bit sections)
Problem with policy: has the effect of selecting the low order character as the value of $h(k)$.

$K = C_3 C_2 C_1$, numerically
$K = C_3 \cdot 256^2 + C_2 \cdot 256^1 + C_1 \cdot 256^0$
reduced mod 256 has value $C_1$
Hash 6 keys: RX1, RX2, RX3, RY1, RY2, RY3
– h(RX1) = h(RY1) = ‘1’ = 49
– h(RX2) = h(RY2) = ‘2’ = 50
– h(RX3) = h(RY3) = ‘3’ = 51
1) 6 original keys map into only 3 unique addresses
2) contiguous runs of keys, in general, result in contiguous runs of table space

Hashing
How often do collisions really happen?
Von Mises Birthday Paradox
– 23+ people, 50% probability of a match
– probe is ∅ -1.
– $Q(n)$ is the probability that if you randomly toss $n$ balls into a table with 365 slots, there will be no collision
– $P(n) = 1 - Q(n)$, the probability of a collision
– $Q(1) = 1$ // there will be no collisions
– $Q(2) = Q(1) \times (364/365)$
– $Q(3) = Q(2) \times (363/365)$
– develop a recurrence relation
* As soon as the table is 12.9% full, there is greater than a 95% chance that 2 will collide.
Moral of the story: Even in a sparsely occupied hash table, collisions are relatively common.

Collision Resolution Policies
Open Addressing
Inserting keys into other empty locations in the table
1) linear probing
– go to next open space
– wrap around, if necessary
2) double hashing
– calculate a probe decrement $P(K) = \max(1, K \bmod M)$
3) rehashing
– apply $h(k)$
– if a collision, apply $h_1(k)$
– if still a collision, apply $h_2(k)$
– use entire sequence of hash functions
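Two of the claims above are easy to check numerically. The sketch below is illustrative only: `no_collision_probability` extends the pattern $Q(2)$, $Q(3)$ to the general recurrence $Q(n) = Q(n-1) \times (365 - (n-1))/365$, which is an assumption about the recurrence the slide asks for, and `low_byte_hash` packs a 3-character name into a 24-bit integer before reducing it mod 256. Both function names are hypothetical.

```python
def no_collision_probability(n, slots=365):
    """Q(n): probability that n balls tossed into `slots` slots do not collide."""
    q = 1.0  # Q(1) = 1
    for i in range(1, n):
        # Ball i+1 must avoid the i slots that are already occupied.
        q *= (slots - i) / slots
    return q


def low_byte_hash(name):
    """Bad example: treat the ASCII name as one big integer and take it mod 256."""
    k = int.from_bytes(name.encode("ascii"), "big")
    return k % 256


# Birthday paradox: P(n) = 1 - Q(n)
print(1 - no_collision_probability(23))  # about 0.507
print(1 - no_collision_probability(47))  # above 0.95 (47/365 is about 12.9% full)

# Six register names collapse onto three addresses.
for name in ["RX1", "RX2", "RX3", "RY1", "RY2", "RY3"]:
    print(name, low_byte_hash(name))     # 49, 50, 51, 49, 50, 51
```

Only the last character influences `low_byte_hash`, which is why a power-of-two table size combined with the division method is a poor choice for such keys.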

Collision Resolution Policies
Chaining
– use linked lists
Quadratic Collision Processing
– examines locations whose distance from the initial collision point increases as the square of the distance from the previous location tried.
– ex. $h(k) = A$, we collide; try $A + 1^2$, $A + 2^2$, $A + 3^2$, ... $A + R^2$
– uses wraparound
– leaves increasingly larger gaps between successive relocation positions

Example of Double Hashing
• Consider a hash table storing integer keys that handles collision with double hashing
– $N = 13$
– $h(k) = k \bmod 13$
– $d(k) = 7 - k \bmod 7$
• Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
[Table: $k$, $h(k)$, $d(k)$ for each inserted key; figure: table cells 0 to 12 before and after the insertions]

Clusters
def. contiguous runs of occupied entries

Primary Clustering
* look at linear probing
* causes a small “puddle” of keys to form at the collision location.
* the small puddle grows larger
* the larger it grows, the faster it grows
* small puddles connect to form large puddles
note: linear probing is subject to primary clustering; double hashing is not

Secondary Clustering
– when any 2 keys have a collision at a given location, they both subsequently examine the same sequence of alternative locations until the collision is resolved.
– not as bad as primary clusters
– secondary clusters do not form larger secondary clusters
– quadratic collision processing is subject to secondary clusters

Load Factors
Suppose table $T$ is of size $M$, and $N$ entries are occupied ($M - N$ are empty)
$\alpha = N/M$ is the load factor of $T$
ex. $M = 100$, $N = 75$, $\alpha = 0.75$; we say $T$ is 75% full.

Performance
1) Based on Uniformity and Randomness of $h(k)$
2) Based on Collision Resolution Policy
3) Based on Load Factor (Density of Table)
“Density-Dependent Search Technique” // means you can achieve a highly efficient result if you are willing to waste enough vacant records
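The double hashing example above can be traced with a short Python sketch. The slides give $h(k)$ and $d(k)$ but not the probe formula, so this sketch assumes that the $j$-th probe examines slot $(h(k) + j \cdot d(k)) \bmod N$; the function name `double_hash_insert` is hypothetical.

```python
def double_hash_insert(table, k):
    """Insert k using double hashing; return the slot where it was stored.

    Assumption: the j-th probe looks at (h(k) + j * d(k)) mod N.
    """
    n = len(table)
    h = k % n      # primary hash, h(k) = k mod 13 for a 13-slot table
    d = 7 - k % 7  # secondary hash, always in the range 1..7
    for j in range(n):
        slot = (h + j * d) % n
        if table[slot] is None:
            table[slot] = k
            return slot
    raise RuntimeError("table is full")


table = [None] * 13
for key in [18, 41, 22, 44, 59, 32, 31, 73]:
    print(key, "->", double_hash_insert(table, key))

print(table)
# [31, None, 41, None, None, 18, 32, 59, 73, 22, 44, None, None]

# Load factor alpha = N / M for this table: 8 occupied slots out of 13.
occupied = sum(1 for entry in table if entry is not None)
print(occupied / len(table))  # about 0.615
```

Under this assumption, 44 collides with 18 at slot 5 and moves to slot 10, and 31 collides at slots 5 and 9 before settling in slot 0; the other keys go straight to their home slots.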
