
Struktur Data & Algoritme (Data Structures & Algorithms)

Denny (denny@cs.ui.ac.id) Suryana Setiawan (setiawan@cs.ui.ac.id)

Fakultas Ilmu Komputer, Universitas Indonesia (Faculty of Computer Science, University of Indonesia), Even Semester 2004/2005

Version 2.0 - Internal Use Only

Hash Table

SDA/ TOPIC/ V2.0/ 2

Review

  • Linked List: insert, find, and delete operations take O(n)
  • Stack & Queue: insert, find, and delete operations take O(1), but access is restricted
  • Binary Search Tree: insert, find, and delete operations take O(log n) in the average case, but O(n) in the worst case
  • AVL Tree, Red-Black Tree: insert, find, and delete operations take O(log n)


Review

Array
  • all operations take O(1) time, because data is accessed using an (integer) index
  • the size must be determined in advance
  • not growable


Objectives

  • Understand hash tables and their operations
  • Understand the advantages and disadvantages of using hash tables


Outline

  • Hashing definition
  • Hash function
  • Collision resolution
    • Open hashing
      • Separate chaining
    • Closed hashing (open addressing)
      • Linear probing
      • Quadratic probing
      • Double hashing
    • Primary clustering, secondary clustering
  • Access: insert, find, delete


Hash Tables

Hashing is used to store a relatively large amount of data in a table called a hash table (an ADT).

A hash table usually has a fixed size, H-size, which is larger than the amount of data we want to store.

We define the load factor (λ) as the ratio of the number of stored items n to the size of the hash table: λ = n / H-size.

A hash function maps an item's key to an index in the range 0 to H-size - 1.

[Figure: key → hash function → index of the item in the hash table (cells 0 to H-1)]
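A minimal sketch of the two definitions on this slide, assuming integer keys and hash(x) = x mod H-size (both assumptions; the class and method names are illustrative). Math.floorMod is used instead of % so that negative keys still land in 0 to H-size - 1.

```java
// Sketch: map an integer key to a cell in [0, H-size - 1] and compute λ.
class HashIndex {
    // floorMod always returns a value in [0, hSize - 1], even for negative keys,
    // whereas Java's % operator can return a negative remainder.
    static int indexFor(int key, int hSize) {
        return Math.floorMod(key, hSize);
    }

    // Load factor λ = number of stored items / table size.
    static double loadFactor(int storedItems, int hSize) {
        return (double) storedItems / hSize;
    }

    public static void main(String[] args) {
        int hSize = 11;
        System.out.println(indexFor(25, hSize)); // 3
        System.out.println(indexFor(-4, hSize)); // 7
        System.out.println(loadFactor(5, hSize));
    }
}
```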


Hash Tables (2)

Hashing is a technique used to perform insertions, deletions, and finds in constant average time.

To insert or find certain data, we assign a key to each element and use a function, called the hash function, to determine the location of the element within the table.

Hash tables are arrays of cells with a fixed size, containing data or keys corresponding to data.

For each key, we use the hash function to map the key to some number in the range 0 to H-size - 1.


Hash Function

A hash function should have the following features:

  • Easy to compute.
  • Two distinct keys map to two different cells in the array. This is not true in general: there are usually far more possible keys than cells, so some distinct keys must share a cell. It can be achieved by using a direct-address table when the universal set of keys is reasonably small.
  • Distributes the keys evenly among the cells.

One simple hash function is to take the key mod a prime number.

Any manipulation of digits with low complexity and good distribution can be used.
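A small sketch of why a prime modulus helps, assuming integer keys that all share a factor with a non-prime table size (here, multiples of 10). The names are illustrative.

```java
import java.util.Arrays;

// Sketch: compare the cells chosen by key mod 10 and key mod 11
// for keys that are all multiples of 10.
class PrimeModulus {
    static int[] indices(int[] keys, int hSize) {
        int[] result = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = Math.floorMod(keys[i], hSize);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50};
        System.out.println(Arrays.toString(indices(keys, 10))); // [0, 0, 0, 0, 0]
        System.out.println(Arrays.toString(indices(keys, 11))); // [10, 9, 8, 7, 6]
    }
}
```

With size 10 every key collides in cell 0; with the prime size 11 the same keys spread over five different cells.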


Hash Function: Truncation

Part of the key is simply ignored, with the remainder truncated or concatenated to form the index.

Phone no.   Index
731-3018    338
539-2309    329
428-1397    217
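A sketch of the truncation example. Reading the table, the index keeps the 2nd, 4th and 7th digits of the phone number and drops the rest; that digit choice is inferred from the three rows, and the names are illustrative.

```java
// Sketch: truncation hash that keeps digits 2, 4 and 7 of a 7-digit phone number.
class TruncationHash {
    static int truncate(String phone) {
        String digits = phone.replace("-", "");  // "731-3018" -> "7313018"
        String kept = "" + digits.charAt(1) + digits.charAt(3) + digits.charAt(6);
        return Integer.parseInt(kept);
    }

    public static void main(String[] args) {
        System.out.println(truncate("731-3018")); // 338
        System.out.println(truncate("539-2309")); // 329
        System.out.println(truncate("428-1397")); // 217
    }
}
```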


Hash Function: Folding

The data can be split up into smaller chunks, which are then folded together in some form.

Phone no.   3-group          Index
7313018     73 + 13 + 018    104
5392309     53 + 92 + 309    454
4281397     42 + 81 + 397    520
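A sketch of the folding example, assuming the split used in the table: two 2-digit groups followed by one 3-digit group, which are then added. The names are illustrative.

```java
// Sketch: folding hash that splits a 7-digit number into 2 + 2 + 3 digits and sums the groups.
class FoldingHash {
    static int fold(String phone) {
        String d = phone.replace("-", "");
        int a = Integer.parseInt(d.substring(0, 2)); // e.g. "73"
        int b = Integer.parseInt(d.substring(2, 4)); // e.g. "13"
        int c = Integer.parseInt(d.substring(4, 7)); // e.g. "018" -> 18
        return a + b + c;
    }

    public static void main(String[] args) {
        System.out.println(fold("7313018")); // 104
        System.out.println(fold("5392309")); // 454
        System.out.println(fold("4281397")); // 520
    }
}
```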


Hash Function: Modular arithmetic

Convert the data into an integer, divide by the size of the hash table, and take the remainder as the index.

Groups        Sum     Index (table size 100)
731 + 3018    3749    3749 % 100 = 49
539 + 2309    2848    2848 % 100 = 48
428 + 1397    1825    1825 % 100 = 25
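A sketch of the modular-arithmetic example, assuming (as the table does) that the 3-digit prefix and 4-digit suffix are added to form the integer and that the table size is 100. The names are illustrative; elsewhere the slides recommend a prime size instead.

```java
// Sketch: add the two parts of the phone number, then take the remainder mod the table size.
class ModularHash {
    static int modHash(String phone, int hSize) {
        String[] parts = phone.split("-");           // "731-3018" -> ["731", "3018"]
        int sum = Integer.parseInt(parts[0]) + Integer.parseInt(parts[1]);
        return sum % hSize;                          // sum is non-negative here
    }

    public static void main(String[] args) {
        System.out.println(modHash("731-3018", 100)); // 49
        System.out.println(modHash("539-2309", 100)); // 48
        System.out.println(modHash("428-1397", 100)); // 25
    }
}
```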


Choosing a hash function

A good hash function should satisfy two criteria:

  1. It should be quick to compute.
  2. It should minimize the number of collisions.

Example of hash function

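Hash function for a string with characters $A_3 A_2 A_1 A_0$, using base $X = 128$ (the right-hand side is Horner's rule):

$$A_3 X^3 + A_2 X^2 + A_1 X^1 + A_0 X^0 = ((A_3 X + A_2) X + A_1) X + A_0$$

The result of the hash function is much larger than the size of the table, so we take the result modulo the size of the hash table.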


Example of hash function

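int hash(String key, int tableSize) {
    int hashVal = 0;
    for (int i = 0; i < key.length(); i++) {
        // Horner's rule with base 128; reducing mod tableSize at every step
        // keeps hashVal small, so it does not overflow for reasonable table sizes
        hashVal = (hashVal * 128 + key.charAt(i)) % tableSize;
    }
    return hashVal;  // already in [0, tableSize - 1]
}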

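Modulo identities (these justify reducing modulo tableSize at every step):

$(A + B) \bmod C = ((A \bmod C) + (B \bmod C)) \bmod C$

$(A \cdot B) \bmod C = ((A \bmod C) \cdot (B \bmod C)) \bmod C$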
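A sketch that checks the modulo identities in practice: it compares the per-step reduction used by the base-128 hash with an exact evaluation of the full polynomial using java.math.BigInteger, reduced only once at the end. The table size 10007 and the sample keys are arbitrary choices for illustration.

```java
import java.math.BigInteger;

// Sketch: stepwise reduction and exact evaluation give the same index.
class ModuloCheck {
    // Same computation as the base-128 hash: reduce after every step.
    static int hashStepwise(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            hashVal = (hashVal * 128 + key.charAt(i)) % tableSize;
        }
        return hashVal;
    }

    // Evaluate A_k X^k + ... + A_0 exactly, then reduce once.
    static int hashExact(String key, int tableSize) {
        BigInteger value = BigInteger.ZERO;
        BigInteger base = BigInteger.valueOf(128);
        for (int i = 0; i < key.length(); i++) {
            value = value.multiply(base).add(BigInteger.valueOf(key.charAt(i)));
        }
        return value.mod(BigInteger.valueOf(tableSize)).intValue();
    }

    public static void main(String[] args) {
        for (String key : new String[] {"hash", "marigold", "flamingo"}) {
            System.out.println(key + ": " + hashStepwise(key, 10007) + " == " + hashExact(key, 10007));
        }
    }
}
```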


Example of hash function

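int hash(String key, int tableSize) {
    int hashVal = 0;
    for (int i = 0; i < key.length(); i++) {
        // Horner's rule with base 37; there is no reduction inside the loop,
        // so hashVal may overflow and become negative
        hashVal = hashVal * 37 + key.charAt(i);
    }
    hashVal %= tableSize;
    if (hashVal < 0) {
        // Java's % keeps the sign of the dividend; shift back into [0, tableSize - 1]
        hashVal += tableSize;
    }
    return hashVal;
}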


Example of hash function

int hash(String key, int tableSize) {
    int hashVal = 0;
    for (int i = 0; i < key.length(); i++) {
        // simply add up the character codes
        hashVal += key.charAt(i);
    }
    return hashVal % tableSize;
}
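A sketch of a weakness of the additive hash: it ignores character order, so anagrams always collide, whatever the table size. The table size 10007 and the sample words are arbitrary illustrative choices.

```java
// Sketch: anagrams collide under the additive hash.
class AdditiveHashDemo {
    static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            hashVal += key.charAt(i);
        }
        return hashVal % tableSize;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"stop", "pots", "tops", "spot"}) {
            System.out.println(key + " -> " + hash(key, 10007)); // all print 454
        }
    }
}
```

The sums are also small: four lowercase letters add up to at most a few hundred, so most cells of a large table are never used.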


Collision resolution

When two keys map to the same cell, we get a collision.

Collisions arise during insertion, so we need a procedure (collision resolution) to resolve them.


Closed Hashing

On a collision, try to find alternative cells within the table. Closed hashing is also known as open addressing. For insertion, we try cells in sequence using an incremented function:

$$h_i(x) = (\mathrm{hash}(x) + f(i)) \bmod \text{H-size}, \qquad f(0) = 0$$

The function f is the collision resolution strategy. The table must be bigger than the number of data items. Different methods for choosing f:

  • Linear probing
  • Quadratic probing
  • Double hashing
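A sketch that lists the first few cells tried by closed hashing for a given home cell hash(x) and a chosen f. The home cell 7, the table size 11, and hash2(x) = 3 in the double-hashing line are arbitrary assumptions for illustration.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Sketch: cells visited by h_i(x) = (hash(x) + f(i)) mod H-size for i = 0 .. count-1.
class ProbeSequence {
    static int[] probes(int home, int hSize, IntUnaryOperator f, int count) {
        int[] cells = new int[count];
        for (int i = 0; i < count; i++) {
            cells[i] = Math.floorMod(home + f.applyAsInt(i), hSize);
        }
        return cells;
    }

    public static void main(String[] args) {
        int hSize = 11;
        int home = 7; // pretend hash(x) = 7
        System.out.println(Arrays.toString(probes(home, hSize, i -> i, 5)));     // linear: [7, 8, 9, 10, 0]
        System.out.println(Arrays.toString(probes(home, hSize, i -> i * i, 5))); // quadratic: [7, 8, 0, 5, 1]
        // Double hashing, assuming hash2(x) = 3 for this key:
        System.out.println(Arrays.toString(probes(home, hSize, i -> i * 3, 5))); // [7, 10, 2, 5, 8]
    }
}
```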


Linear probing

Use a linear function:

f(i) = i

The key is placed in the first free cell at or after its hashed position, that is, the free cell closest to its actual position.

It is the least complex function, but it may result in primary clustering: elements that hash to different locations probe the same alternative cells.

The cost of this probing depends on the value of λ (load factor).

We do not use linear probing when λ > 0.5.
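A minimal sketch of insert and find with linear probing, assuming String keys hashed with Java's String.hashCode and a table that is never allowed to fill up (otherwise findPos would loop forever). Class and method names are illustrative; deletion is left out because in closed hashing it needs extra care.

```java
// Sketch: closed hashing with linear probing, f(i) = i.
class LinearProbingTable {
    private final String[] cells;

    LinearProbingTable(int hSize) {
        cells = new String[hSize];
    }

    private int hash(String key) {
        return Math.floorMod(key.hashCode(), cells.length);
    }

    // Returns the cell holding key, or the first empty cell on its probe path.
    private int findPos(String key) {
        int pos = hash(key);
        while (cells[pos] != null && !cells[pos].equals(key)) {
            pos = (pos + 1) % cells.length; // next cell: (hash(x) + i) mod H-size
        }
        return pos;
    }

    void insert(String key) {
        cells[findPos(key)] = key; // an existing equal key is simply overwritten
    }

    boolean contains(String key) {
        return cells[findPos(key)] != null;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(17);
        for (String k : new String[] {"alpha", "crystal", "dawn"}) {
            t.insert(k);
        }
        System.out.println(t.contains("dawn"));   // true
        System.out.println(t.contains("cobalt")); // false
    }
}
```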


Hashing - insert

[Figure: the keys alpha, crystal, dawn, emerald, flamingo, hallmark, marigold and moon inserted into a table (cells 1 to 15 shown)]


Hashing - lookup

[Figure: lookups for cobalt, marigold and private in a table holding alpha, crystal, dawn, emerald, flamingo, hallmark, moon, marigold and private]


Hashing - delete

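[Figure: after deleting emerald and moon, the table holds alpha, crystal, dawn, flamingo, hallmark, marigold and private]

Lazy deletion: why? In closed hashing, simply emptying a deleted cell would break the probe sequence of any key placed beyond it, so a later find would stop at the empty cell and wrongly report that key as missing. Instead, the cell is only marked as deleted.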
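A sketch of lazy deletion in a linear-probing table, assuming integer keys with hash(x) = x mod H-size and a table that never fills up. Deleted cells keep their entry with an "inactive" flag, so searches continue past them. For simplicity this sketch does not reuse deleted cells for other keys; the names are illustrative.

```java
// Sketch: linear probing with lazy deletion (cells are marked, never emptied).
class LazyDeletionTable {
    private static final class Entry {
        final int key;
        boolean active = true;
        Entry(int key) { this.key = key; }
    }

    private final Entry[] cells;

    LazyDeletionTable(int hSize) {
        cells = new Entry[hSize];
    }

    // Stops only at a never-used cell or at the key itself; marked cells are probed past.
    private int findPos(int key) {
        int pos = Math.floorMod(key, cells.length);
        while (cells[pos] != null && cells[pos].key != key) {
            pos = (pos + 1) % cells.length;
        }
        return pos;
    }

    void insert(int key) {
        int pos = findPos(key);
        if (cells[pos] == null) {
            cells[pos] = new Entry(key);
        } else {
            cells[pos].active = true; // re-insert a previously deleted key
        }
    }

    boolean contains(int key) {
        Entry e = cells[findPos(key)];
        return e != null && e.active;
    }

    void delete(int key) {
        Entry e = cells[findPos(key)];
        if (e != null) {
            e.active = false; // mark only, so the probe chain stays intact
        }
    }

    public static void main(String[] args) {
        LazyDeletionTable t = new LazyDeletionTable(11);
        t.insert(3);  // home cell 3
        t.insert(14); // also hashes to 3, lands in cell 4
        t.delete(3);
        System.out.println(t.contains(14)); // true: the search probes past the marked cell 3
        System.out.println(t.contains(3));  // false
    }
}
```

Had cell 3 been emptied instead, the search for 14 would stop there and miss the key in cell 4.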


Hashing - operation after delete

[Figure: inserting custom and looking up marigold after the deletions; the table holds alpha, crystal, dawn, flamingo, hallmark, marigold and private]


Primary Clustering

[Figure: inserting canary, cobalt and dark into a table holding alpha, crystal, dawn, custom, flamingo, hallmark, marigold and private]

Elements that hash to different locations probe the same alternative cells.
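A sketch that counts the cells inspected by each linear-probing insert, assuming integer keys, hash(x) = x mod 11 and distinct keys (all illustrative choices). Keys 4 and 5 have their own home cells, yet they still collide because the cluster started by keys 3 and 14 has grown over those cells.

```java
// Sketch: probe counts under linear probing grow as a cluster forms.
class PrimaryClusteringDemo {
    // Inserts key with linear probing and returns the number of cells inspected.
    static int insert(Integer[] table, int key) {
        int pos = Math.floorMod(key, table.length);
        int probes = 1;
        while (table[pos] != null) {
            pos = (pos + 1) % table.length;
            probes++;
        }
        table[pos] = key;
        return probes;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[11];
        for (int key : new int[] {3, 14, 4, 5, 25}) {
            int probes = insert(table, key);
            System.out.println("key " + key + " (home " + key % 11 + "): " + probes + " probe(s)");
        }
        // Probe counts: 1, 2, 2, 2, 5
    }
}
```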


Quadratic probing

Eliminates primary clustering by choosing $f(i) = i^2$. Problems arise when the hash table is more than half full.

You have to select an appropriate table size that is not the square of a number.

We can prove that quadratic probing with a prime table size that is at least half empty will always find a location for a new element.

The probe position can be updated incrementally by noting that $f(i) = i^2 = f(i-1) + 2i - 1$.

Elements that hash to the same location will probe the same alternative cells (secondary clustering).
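A sketch of quadratic probing that uses the incremental update, assuming integer keys, hash(x) = x mod H-size, and a prime table size kept at least half empty as the slide requires. The names are illustrative.

```java
// Sketch: quadratic probing, f(i) = i^2, computed as f(i-1) + 2i - 1.
class QuadraticProbingTable {
    private final Integer[] cells;

    QuadraticProbingTable(int primeSize) {
        cells = new Integer[primeSize];
    }

    private int findPos(int key) {
        int pos = Math.floorMod(key, cells.length);
        int i = 0;
        while (cells[pos] != null && cells[pos] != key) {
            i++;
            pos += 2 * i - 1;          // from hash(x) + (i-1)^2 to hash(x) + i^2
            if (pos >= cells.length) { // one subtraction suffices while 2i - 1 < H-size
                pos -= cells.length;
            }
        }
        return pos;
    }

    // Stores the key and returns the cell it ended up in.
    int insert(int key) {
        int pos = findPos(key);
        cells[pos] = key;
        return pos;
    }

    public static void main(String[] args) {
        QuadraticProbingTable t = new QuadraticProbingTable(11);
        // All four keys hash to cell 3 and follow 3, 4, 7, 12 mod 11 = 1.
        System.out.println(t.insert(3));  // 3
        System.out.println(t.insert(14)); // 4
        System.out.println(t.insert(25)); // 7
        System.out.println(t.insert(36)); // 1
    }
}
```

Because every key with home cell 3 follows the same sequence, the example also shows secondary clustering.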


Double hashing

The collision resolution function uses another hash function:

$f(i) = i \cdot \mathrm{hash}_2(x)$

Each probe adds another multiple of $\mathrm{hash}_2(x)$. The second hash function must be chosen carefully so that it never evaluates to zero and its probe sequence visits all the cells.

It is essential to have a prime hash table size.
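A sketch of double hashing, assuming integer keys, hash1(x) = x mod H-size and, as the second function, one commonly used choice: hash2(x) = R - (x mod R) for a prime R smaller than the prime table size. This hash2 is an assumption, not taken from the slides; it always lies in 1 to R, so it is never zero. The names are illustrative, and the table is assumed never to fill up.

```java
// Sketch: double hashing, h_i(x) = (hash1(x) + i * hash2(x)) mod H-size.
class DoubleHashingTable {
    private final Integer[] cells;
    private final int r; // prime smaller than the table size (assumed choice)

    DoubleHashingTable(int primeSize, int smallerPrime) {
        cells = new Integer[primeSize];
        r = smallerPrime;
    }

    private int hash1(int key) {
        return Math.floorMod(key, cells.length);
    }

    // In 1..R: never zero, and coprime to a prime H-size, so every cell is reachable.
    private int hash2(int key) {
        return r - Math.floorMod(key, r);
    }

    // Stores the key and returns the cell it ended up in.
    int insert(int key) {
        int pos = hash1(key);
        int step = hash2(key);
        while (cells[pos] != null && cells[pos] != key) {
            pos = (pos + step) % cells.length;
        }
        cells[pos] = key;
        return pos;
    }

    public static void main(String[] args) {
        DoubleHashingTable t = new DoubleHashingTable(11, 7);
        // All four keys hash to cell 3, but their steps differ (4, 7, 3, 6).
        System.out.println(t.insert(3));  // 3
        System.out.println(t.insert(14)); // 10
        System.out.println(t.insert(25)); // 6
        System.out.println(t.insert(36)); // 9
    }
}
```

Unlike the quadratic-probing case, keys with the same home cell take different paths, which avoids secondary clustering.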


Double Hashing

[Figure: inserting canary, cobalt and dark with double hashing into a table holding alpha, crystal, dawn, custom, flamingo, hallmark, marigold and private]


Open Hashing

Collisions are resolved by inserting all elements that hash to the same bucket into a single collection of values.

Open hashing keeps a linked list of all the elements that hash to the same cell (separate chaining).

Each cell in the hash table contains a pointer to a linked list containing the data.

Functions and analysis of open hashing. To insert a new element into the table, we add it at the beginning or the end of the appropriate linked list.

The choice depends on whether we want to check for duplicates (which means traversing the list anyway).

It also depends on how often we expect to access the most recently added elements (inserting at the front makes them the quickest to find).


Open Hashing

[Figure: separate chaining example]


Open Hashing

For search, we use the hash function to determine which linked list holds the element, and then traverse that list to find the element.

Deletion removes the element from the appropriate linked list after we have found it.

Instead of a linked list, each cell could hold another structure, such as a tree or another hash table, to resolve collisions.

The main advantage of this method is that it can handle any amount of data (the lists expand dynamically).

The main disadvantage of this method is the extra memory used by each cell (the list nodes and their pointers).
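A sketch of separate chaining covering insert, search and delete, assuming String keys hashed with Java's String.hashCode and java.util.LinkedList for the chains. Insertion checks for duplicates and adds at the front of the list. The names are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch: open hashing (separate chaining), one linked list per cell.
class SeparateChainingTable {
    private final List<LinkedList<String>> cells = new ArrayList<>();

    SeparateChainingTable(int hSize) {
        for (int i = 0; i < hSize; i++) {
            cells.add(new LinkedList<>());
        }
    }

    // The hash function picks the list that may hold the key.
    private LinkedList<String> listFor(String key) {
        return cells.get(Math.floorMod(key.hashCode(), cells.size()));
    }

    // Check for duplicates, then add at the front of the list.
    void insert(String key) {
        LinkedList<String> list = listFor(key);
        if (!list.contains(key)) {
            list.addFirst(key);
        }
    }

    // Hash to the right list, then traverse it.
    boolean contains(String key) {
        return listFor(key).contains(key);
    }

    // Find the element in its list and unlink it; no lazy deletion is needed.
    void delete(String key) {
        listFor(key).remove(key);
    }

    public static void main(String[] args) {
        SeparateChainingTable t = new SeparateChainingTable(7);
        for (String k : new String[] {"alpha", "crystal", "dawn", "emerald"}) {
            t.insert(k);
        }
        t.delete("crystal");
        System.out.println(t.contains("dawn"));    // true
        System.out.println(t.contains("crystal")); // false
    }
}
```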


Analysis of Open Hash

In general, the average length of a list is the load factor λ.

The cost of insertion depends on the hash function and on where in the list the element is inserted, but in general it equals the cost of inserting into a linked list plus the time to evaluate the hash function.

For search, the time is the constant time to evaluate the hash function plus the time to traverse the list.

  • Search takes O(n) in the worst case (all keys in one list); the average case depends on λ.
  • The general rule for open hashing is to keep λ ≈ 1.
  • Open hashing is used for data whose size changes dynamically.


Issues

Other issues common to all closed hashing resolutions:

  • Deletion is complicated: deleted cells must be marked (lazy deletion) rather than emptied.
  • Simpler than open hashing.
  • Good if we do not expect too many collisions.
  • If a search is unsuccessful, we may have to search the whole table.
  • Requires a large table compared to the number of data items expected.


Summary

Hash table: an array

Hash function: a function that maps a key to a number in [0, size of hash table)

Collision resolution
  • Open hashing
    • Separate chaining
  • Closed hashing (open addressing)
    • Linear probing
    • Quadratic probing
    • Double hashing
  • Primary clustering, secondary clustering


Summary

Advantage
  • Running time: O(1) + O(collision resolution)

Disadvantages
  • It is difficult (not efficient) to print all elements of the hash table in sorted order
  • It is not efficient to find the minimum or maximum element
  • Not growable (for closed hashing/open addressing)
  • Wastes some space (depending on the load factor)


Further Reading

Chapter 19


What’s Next

Graph