Hash- Tables Introduction Dictionary Dictionary stores key-value - - PowerPoint PPT Presentation

SLIDE 1

Hash Tables

SLIDE 2

Introduction

SLIDE 3

Dictionary

Dictionary

◮ stores key-value pairs

                Find(k)    Insert(k, v)   Delete(k)
List            O(n)       O(1)           O(n)
Sorted Array    O(log n)   O(n)           O(n)
Balanced BST    O(log n)   O(log n)       O(log n)

Dictionary implementations we know.

Goal

◮ All operations in O(1) time.

SLIDE 4

Hash Tables

SLIDE 5

Naive Approach

Direct Access Table

◮ One large array A.
◮ For each key-value pair (k, v): A[k] = v.

[Figure: array A with slots 1, 2, . . . ; value v1 is stored in slot k1 and value v2 in slot k2.]

Problems

  • 1. Keys must be non-negative integers.
  • 2. The key range can be very large, so a huge amount of memory is needed.
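A minimal Python sketch of the direct access table above, assuming integer keys in range(universe_size); the class and method names are hypothetical. It makes both problems visible: only non-negative integers can serve as indices, and memory is allocated for the whole key range no matter how few pairs are stored.

```python
class DirectAccessTable:
    """Direct access table: the key itself is the array index (A[k] = v).

    Assumes keys are integers in range(universe_size). Memory is proportional
    to universe_size, not to the number of stored pairs.
    """

    def __init__(self, universe_size):
        self.A = [None] * universe_size  # one slot per possible key; None = empty

    def _check(self, k):
        # Keys must be integers in range(universe_size); a negative k would
        # otherwise silently index from the end of the Python list.
        if not isinstance(k, int) or not 0 <= k < len(self.A):
            raise KeyError(k)

    def insert(self, k, v):
        self._check(k)
        self.A[k] = v                    # O(1)

    def find(self, k):
        self._check(k)
        return self.A[k]                 # O(1); None if no value is stored

    def delete(self, k):
        self._check(k)
        self.A[k] = None                 # O(1)


table = DirectAccessTable(1000)          # 1000 slots, although only two pairs are stored
table.insert(7, "v1")
table.insert(912, "v2")
print(table.find(912))                   # v2
```

All three operations are O(1), but the price is the list of length universe_size, which is exactly Problem 2.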

SLIDE 6

Problem 1: Getting Integer Keys

Prehashing

◮ Take a key k and map it to a non-negative integer k′.
◮ Easy in theory, because all finite data can be represented as an integer.
◮ k′ should not change when the object changes.
◮ In the ideal case, k′ is unique for the object.
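To illustrate that all finite data can be represented as an integer, a string can be prehashed by reading its UTF-8 bytes as one big base-256 number. This encoding is only one possible choice (an assumption, not taken from the slides). It is unique for all strings that do not start with a NUL character, and it stays fixed because Python strings are immutable. The function name prehash is hypothetical.

```python
def prehash(s: str) -> int:
    """Map a string to a non-negative integer k' (illustrative prehash).

    The UTF-8 bytes of s are read as one big-endian base-256 number, so
    different strings get different integers unless they differ only in
    leading NUL characters.
    """
    return int.from_bytes(s.encode("utf-8"), "big")


print(prehash("a"))    # 97
print(prehash("ab"))   # 24930 = 97 * 256 + 98
```

The resulting k′ grows with the length of the string, which is why the next step (hashing to a small range) is needed. Python's built-in hash() also acts as a prehash for immutable objects, but it can be negative, and for str it is randomized per interpreter process, so it is not stable across runs.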

SLIDE 7

Problem 2: Getting Small Keys

Hashing

◮ U is the huge universe of all possible (non-negative integer) keys.
◮ A hash function hm : U → {0, 1, . . . , m − 1} reduces keys to a small range of integers. Ideally, m ∈ Θ(n) with a small constant c ≥ 1.
◮ Computing hm should require O(1) time with a small constant.

[Figure: hm maps the key space onto the hash table slots 0, 1, . . . , m − 1.]
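The slides treat hm as given. One simple, commonly used choice for non-negative integer keys is the division method, hm(k) = k mod m; it serves here only as an assumed example and is far from the ideal uniform hash function assumed later. The sketch also shows how collisions arise as soon as several keys share a residue modulo m. The name make_division_hash is hypothetical.

```python
def make_division_hash(m: int):
    """Return hm(k) = k mod m, mapping non-negative integers to {0, ..., m - 1}."""
    def hm(k: int) -> int:
        return k % m
    return hm


hm = make_division_hash(7)
print(hm(49), hm(58))           # 0 2
print(hm(3), hm(10), hm(17))    # 3 3 3  (three keys collide in slot 3)
```

Computing hm takes O(1) time, as required, but which keys collide depends entirely on the key distribution, which is why the analysis assumes a uniform hash function instead.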

SLIDE 8

Collisions

Problem with Hashing

◮ Because |U| ≫ m, some distinct keys k1 ≠ k2 have hm(k1) = hm(k2).

This is called a collision.

Questions

  • 1. How do we design hm such that the number of collisions is low?
  • 2. How do we handle collisions?

For question 1:

◮ For this class, assume hm is given and distributes hash values uniformly.
◮ Thus, the expected number of keys that hash to any given slot is n/m.
◮ α = n/m is called the load factor of the table.

SLIDE 9

Chaining

SLIDE 10

Chaining

Idea

◮ Use a list (or another data structure) of colliding items in each slot of the table.

[Figure: hash table whose slots hold lists of colliding keys k0, . . . , k4.]

Find Operation

  • 1. Use the hash to determine the slot in the table.
  • 2. Search the list in that slot for the item.
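A minimal Python sketch of chaining. Assumptions: the prehash is Python's built-in hash(), reduced modulo m (the slides leave the hash function open), and each slot holds a plain Python list of (key, value) pairs. The class and method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one (initially empty) chain per slot

    def _slot(self, k):
        return hash(k) % self.m               # prehash, then reduce to {0, ..., m - 1}

    def find(self, k):
        # Use the hash to determine the slot, then search that slot's list.
        for key, value in self.slots[self._slot(k)]:
            if key == k:
                return value
        return None                           # k is not in the table

    def insert(self, k, v):
        chain = self.slots[self._slot(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:
                chain[i] = (k, v)             # key already present: override value
                return
        chain.append((k, v))

    def delete(self, k):
        chain = self.slots[self._slot(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:
                del chain[i]
                return


t = ChainedHashTable(m=4)
for k in range(10):
    t.insert(k, k * k)
print(t.find(7))      # 49
print(t.slots[1])     # [(1, 1), (5, 25), (9, 81)]: three colliding keys in slot 1
```

Each operation costs O(1) for hashing plus the length of one chain, which is n/m = α in expectation under the uniform hashing assumption.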

SLIDE 11

Open Addressing

SLIDE 12

Open Addressing

Idea

◮ Store all items in the array itself (i.e., one item per slot).

Problem

◮ How do we handle collisions?

[Figure: key k2 hashes to slot hm(k2), which is already occupied by k1.]

Solution: Probing

◮ If the slot is already used, compute a new hash value. Repeat until a free slot is found.

SLIDE 13

Probing

The hash function hm specifies the order in which slots are probed for a key k:

hm : U × {0, 1, . . . , m − 1} → {0, 1, . . . , m − 1}

Resulting order (probe sequence): σ(k) = hm(k, 0), hm(k, 1), . . . , hm(k, m − 1).

In the ideal case, σ(k) is a permutation of {0, 1, . . . , m − 1}.

[Figure: probe sequence hm(k2, 0), hm(k2, 1), hm(k2, 2) for key k2 in a table containing k0 and k1.]
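To make the ideal case concrete, the following sketch computes σ(k) for a given two-argument hash function and checks whether it is a permutation of {0, 1, . . . , m − 1}. The two probe functions are made-up examples of the form (k + 2i) mod m: with m = 7 every slot is reached, with m = 8 only half of them are.

```python
def probe_sequence(hm, k, m):
    """sigma(k) = hm(k, 0), hm(k, 1), ..., hm(k, m - 1)."""
    return [hm(k, i) for i in range(m)]


def is_permutation(seq, m):
    """True if seq visits every slot 0, ..., m - 1 exactly once."""
    return sorted(seq) == list(range(m))


def step2_mod7(k, i):
    return (k + 2 * i) % 7      # hypothetical probe function, m = 7


def step2_mod8(k, i):
    return (k + 2 * i) % 8      # same idea with m = 8


s7 = probe_sequence(step2_mod7, 3, 7)
s8 = probe_sequence(step2_mod8, 3, 8)
print(s7, is_permutation(s7, 7))   # [3, 5, 0, 2, 4, 6, 1] True
print(s8, is_permutation(s8, 8))   # [3, 5, 7, 1, 3, 5, 7, 1] False
```

The failure for m = 8 comes from the step size 2 sharing a common divisor with m; if σ(k) is not a permutation, an insert may report a full table although free slots remain.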

SLIDE 14

Example

Let hm(49, 0) = 4, hm(49, 1) = 6, hm(49, 2) = 1, and hm(49, 3) = 5. Perform

  • 1. Insert(49)
  • 2. Delete(58)
  • 3. Find(49)

Slots: 1 2 3 4 5 6 7
Keys in the table: 58, 13, 20, 48

SLIDE 15

Example

Let hm(49, 0) = 4, hm(49, 1) = 6, hm(49, 2) = 1, and hm(49, 3) = 5. Perform

  • 1. Insert(49)
  • 2. Delete(58)
  • 3. Find(49)

Slots: 1 2 3 4 5 6 7
Keys in the table: 58, 13, 20, 48, 49

SLIDE 16

Example

Let hm(49, 0) = 4, hm(49, 1) = 6, hm(49, 2) = 1, and hm(49, 3) = 5. Perform

  • 1. Insert(49)
  • 2. Delete(58)
  • 3. Find(49)

Slots: 1 2 3 4 5 6 7
Keys in the table: 13, 20, 49, 48 (58 has been deleted)

SLIDE 17

Example

Let hm(49, 0) = 4, hm(49, 1) = 6, hm(49, 2) = 1, and hm(49, 3) = 5. Perform

  • 1. Insert(49)
  • 2. Delete(58)
  • 3. Find(49)

Slots: 1 2 3 4 5 6 7
Keys in the table: 13, 20, 49, 48

Find(49): Not found

SLIDE 18

Open Addressing – Delete

Delete

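◮ Simply emptying the slot of a deleted item can make later Insert/Find operations fail.
◮ Instead, flag the slot of a deleted item as ‘Deleted’.
◮ Use the flag in Insert/Find: Find keeps probing past ‘Deleted’ slots, and Insert may reuse them.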

Question

◮ What if k is already in the table, but Insert encounters a slot flagged as ‘Deleted’ before reaching k?
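A sketch of open addressing with a ‘Deleted’ flag that replays the example from the previous slides. Assumptions, repeated in the code comments: only keys are stored, the table has m = 8 slots, and apart from hm(49, 0..3) = 4, 6, 1, 5 all probe values (e.g., 58 living in slot 1) are hypothetical, chosen to be consistent with the example. Insert runs Find first, which is one possible answer to the question above: a key that is already present is never inserted a second time into an earlier ‘Deleted’ slot. All names are hypothetical.

```python
EMPTY = None
DELETED = object()        # marker for the 'Deleted' flag (a "tombstone")


class OpenAddressingTable:
    """Open addressing (keys only); probe(k, i) plays the role of hm(k, i)."""

    def __init__(self, m, probe):
        self.m = m
        self.probe = probe
        self.slots = [EMPTY] * m

    def find(self, k):
        """Return the slot index of k, or None if k is not in the table."""
        for i in range(self.m):
            j = self.probe(k, i)
            if self.slots[j] is EMPTY:
                return None                  # a never-used slot ends the search
            if self.slots[j] is not DELETED and self.slots[j] == k:
                return j
            # occupied by another key or flagged 'Deleted': keep probing
        return None

    def insert(self, k):
        j = self.find(k)
        if j is not None:
            return j                         # k already present: no duplicate
        for i in range(self.m):
            j = self.probe(k, i)
            if self.slots[j] is EMPTY or self.slots[j] is DELETED:
                self.slots[j] = k            # 'Deleted' slots may be reused
                return j
        raise RuntimeError("table is full")

    def delete(self, k):
        j = self.find(k)
        if j is not None:
            self.slots[j] = DELETED          # flag instead of emptying the slot


# Only hm(49, 0..3) = 4, 6, 1, 5 comes from the example. The table size, the
# other keys' home slots and the rest of 49's sequence are assumptions.
M = 8
HOME = {58: 1, 13: 2, 20: 4, 48: 6}
SEQ_49 = [4, 6, 1, 5, 0, 2, 3, 7]


def example_probe(k, i):
    if k == 49:
        return SEQ_49[i]
    return (HOME[k] + i) % M                 # linear probing from the home slot


t = OpenAddressingTable(M, example_probe)
for key in (58, 13, 20, 48):
    t.insert(key)
print(t.insert(49))   # 5: slots 4, 6 and 1 are occupied
t.delete(58)          # slot 1 is flagged 'Deleted', not emptied
print(t.find(49))     # 5: the search continues past the flagged slot
```

If delete set the slot back to EMPTY instead, the search for 49 would stop at slot 1 and report "Not found", as in the example.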

SLIDE 19

Probing Strategies – Linear Probing

Idea

◮ Increase the index by one in each step:

hm(k, i) = (hm(k) + i) mod m

Good

◮ Gives a permutation (i.e., no index is checked twice).

Problem

◮ Clustering: consecutive groups of occupied slots form.
◮ For 0.01 < α < 0.99, there are clusters of size Θ(log n), even if hm is perfect.
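The following sketch illustrates clustering with linear probing, assuming the toy hash hm(k) = k mod m (not a perfect hash; chosen only so the effect is easy to trace). Keys 0, 10 and 20 all hash to slot 0 and form a run of occupied slots; keys 1 and 2 have their own home slots 1 and 2, but those already lie inside the run, so both keys are appended to it. The function names are hypothetical.

```python
def insert_linear(table, k, hm):
    """Insert k with linear probing: hm(k, i) = (hm(k) + i) mod m."""
    m = len(table)
    for i in range(m):
        j = (hm(k) + i) % m
        if table[j] is None:        # free slot found
            table[j] = k
            return j
    raise RuntimeError("table is full")


def longest_cluster(table):
    """Length of the longest run of consecutive occupied slots (no wrap-around)."""
    best = run = 0
    for slot in table:
        run = run + 1 if slot is not None else 0
        best = max(best, run)
    return best


m = 10
table = [None] * m
for k in (0, 10, 20, 1, 2):
    insert_linear(table, k, lambda key: key % m)
print(table[:6])                 # [0, 10, 20, 1, 2, None]
print(longest_cluster(table))    # 5
```

Any key whose home slot falls inside a cluster extends it, so long clusters tend to grow further, which drives up the number of probes.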

SLIDE 20

Probing Strategies – Double Hashing

h(k, i) = (h1(k) + i · h2(k)) mod m

h1 and h2 should be independent, i.e., for x ≠ y, the probability that both h1(x) = h1(y) and h2(x) = h2(y) is 1/m².

The probe sequence hits all slots if h2(k) and m have no common divisor greater than 1 (e.g., m = 2^r and h2(k) is always odd).

Assuming an ideal hash function h, the expected cost of an operation is at most 1/(1 − α).
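A small sketch of double hashing with m = 2^r (here r = 4), assuming a made-up pair h1, h2 in which h2 always returns an odd number. This pair is not independent in the sense required above; it only demonstrates that an odd step size makes every probe sequence hit all m slots. The last function simply evaluates the bound 1/(1 − α) for a given load factor.

```python
M = 16                                  # m = 2**r with r = 4


def h1(k):
    return k % M                        # hypothetical first hash function


def h2(k):
    # Hypothetical second hash function: always odd, so it shares no
    # common divisor with M = 16 and the probe sequence hits all slots.
    return 2 * ((k // M) % (M // 2)) + 1


def probe(k, i):
    return (h1(k) + i * h2(k)) % M      # h(k, i) = (h1(k) + i * h2(k)) mod m


def expected_cost_bound(alpha):
    return 1 / (1 - alpha)              # bound from the slide, for 0 <= alpha < 1


print(sorted(probe(37, i) for i in range(M)) == list(range(M)))   # True
print(expected_cost_bound(0.5), expected_cost_bound(0.75))        # 2.0 4.0
```

The bound shows why the load factor matters: at α = 0.5 an operation needs at most 2 probes in expectation, at α = 0.75 at most 4, and the bound grows without limit as α approaches 1.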

SLIDE 21

Using Hash Tables

SLIDE 22

Dictionaries

Dictionary

◮ Stores key-value pairs.
◮ The key is an identifier.
◮ The value is the information associated with the key.

Operations

◮ Insert. Inserts a given key-value pair into the dictionary, or overrides the value if the key is already present.
◮ Find. Returns the value associated with the given key (often implemented as two functions: Contains and GetValue).
◮ Delete. Deletes the key-value pair with the given key.
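For comparison, Python's built-in dict (which CPython implements as a hash table) offers all of these operations; the mapping below is only illustrative. Find appears both as a membership test (Contains) and as a lookup (GetValue).

```python
d = {}                    # empty dictionary
d["apple"] = 3            # Insert: adds the pair ("apple", 3)
d["apple"] = 5            # Insert on an existing key overrides the value
print("apple" in d)       # Contains -> True
print(d["apple"])         # GetValue -> 5 (raises KeyError for a missing key)
print(d.get("pear"))      # lookup that returns None instead of raising
del d["apple"]            # Delete
print("apple" in d)       # False
```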

SLIDE 23

Example: Counting

You are given an array A of integers. Determine the most frequent number.

SLIDE 24

Example: Counting

You are given an array A of integers. Determine the most frequent number.

Idea

◮ The numbers in A are the keys.
◮ The value in the dictionary is a counter for the associated key.
◮ The key with the largest associated value is the answer.

SLIDE 25

Example: Counting

Input: An array A of integers.
Output: The most frequent number in A.

Create empty dictionary D.
For i = 0 To |A| − 1
    If D does not contain key A[i] Then
        Insert key-value pair (A[i], 0).
    Increase the value of key A[i] by 1.
Let k be the key in D with the largest associated value.
Return k
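The counting algorithm translates almost line by line into Python, using a built-in dict as D; the function name is hypothetical. Assumption: A is non-empty (max() raises ValueError on an empty dictionary). If several numbers are equally frequent, this version returns the one that occurs first in A, because dicts keep insertion order and max() returns the first maximal key.

```python
def most_frequent(A):
    """Return the most frequent number in the non-empty list A."""
    D = {}                          # create empty dictionary D
    for x in A:                     # for each element of A
        if x not in D:              # key not yet present
            D[x] = 0                # insert (x, 0)
        D[x] += 1                   # increase the counter of x by 1
    return max(D, key=D.get)        # key with the largest associated value


print(most_frequent([3, 1, 3, 2, 1, 3]))   # 3
```

With expected O(1) dictionary operations, the whole algorithm runs in expected O(|A|) time. The standard library's collections.Counter(A).most_common(1)[0][0] performs the same counting in one call.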

SLIDE 26

Exercises

  • 1. You are given an array A of integers. Find the last (i.e., the one with the highest index) non-repeating integer in A in linear time.
  • 2. You are given an array A of integers and an integer k. Determine whether there are two distinct indices i and j such that A[i] = A[j] and |i − j| ≤ k.
  • 3. You are given two strings S and T consisting only of lowercase letters. T is generated by shuffling S and then adding one more letter at a random position. Determine the letter that was added to T.
  • 4. You are given an array A of integers. Determine the longest contiguous subarray (connected subsequence) without repeated elements.
