Advanced Algorithms – COMS31900 Hashing part three Cuckoo Hashing

Raphaël Clifford. Slides by Benjamin Sach.

Back to the start (again)

A dynamic dictionary stores (key, value)-pairs and supports:

add(key, value), lookup(key) (which returns value) and delete(key)

n arbitrary operations arrive online, one at a time.

Universe U of u keys. Hash table T of size m ≥ n. A hash function maps a key x to position h(x). Collisions are fixed by chaining. Using weakly universal hashing, for any n operations the expected run-time is O(1) per operation.

A set H of hash functions is weakly universal if for any two keys x, y ∈ U (with x ≠ y),

Pr[h(x) = h(y)] ≤ 1/m

(h is picked uniformly at random from H)

In fact this result can be generalised to any "bucketing" scheme. We require that:

  • Locating the bucket containing a given key takes O(1) time.
  • We can recover any key from its bucket in O(s) time, where s is the number of keys in the bucket.
  • For any two keys x, y ∈ U (with x ≠ y), the probability that x and y are in the same bucket is O(1/m).

If our construction has these properties then, for any n operations, the expected run-time is O(1) per operation.
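The slides leave the weakly universal family abstract. One standard concrete instance (an illustrative sketch, not necessarily the construction used in this unit) is the Carter–Wegman family h(x) = ((a·x + b) mod p) mod m with p prime:

```python
import random

def make_weakly_universal_hash(m, p=(1 << 31) - 1):
    """Sample h uniformly from the Carter-Wegman family
    h(x) = ((a*x + b) mod p) mod m.

    p must be a prime larger than every key in the universe;
    2^31 - 1 is a Mersenne prime, so this works for 31-bit keys.
    For distinct keys x != y, Pr[h(x) = h(y)] <= 1/m over the
    random choice of (a, b) -- exactly weak universality.
    """
    a = random.randrange(1, p)  # a != 0
    b = random.randrange(0, p)
    return lambda x: ((a * x + b) % p) % m
```

Picking a "random hash function" in the schemes below then just means sampling a fresh pair (a, b).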

Dynamic perfect hashing

A dynamic dictionary stores (key, value)-pairs and supports:

add(key, value), lookup(key) (which returns value) and delete(key)

THEOREM

In the Cuckoo hashing scheme:

  • Every lookup and every delete takes O(1) worst-case time,
  • The space is O(n) where n is the number of keys stored,
  • An insert takes amortised expected O(1) time.

What does amortised expected O(1) time mean?! Let's build it up. . .

"O(1) worst-case time per operation" means every operation takes constant time. "O(1) expected time per operation" means every operation takes constant time in expectation.

"The total worst-case time complexity of performing any n operations is O(n)" does not imply that every operation takes constant time. However, it does mean that the amortised worst-case time complexity of an operation is O(1).

Similarly, "the total expected time complexity of performing any n operations is O(n)" does not imply that every operation takes constant time in expectation. However, it does mean that the amortised expected time complexity of an operation is O(1).

In Cuckoo hashing there is a single hash table but two hash functions: h1 and h2. Each key x in the table is stored at either position h1(x) or h2(x).

Important: We never store multiple keys at the same position. Therefore, as claimed, lookup takes O(1) worst-case time: inspect positions h1(x) and h2(x). Delete works the same way. But how do we do inserts?
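Before answering that, the O(1) lookup bound can be made concrete. A minimal sketch (table layout and names are illustrative, not from the slides):

```python
def lookup(T, h1, h2, key):
    """A key can only ever sit at T[h1(key)] or T[h2(key)],
    so two probes decide membership: O(1) worst-case time."""
    for pos in (h1(key), h2(key)):
        entry = T[pos]
        if entry is not None and entry[0] == key:
            return entry[1]  # the stored value
    return None  # key is not in the dictionary

# delete is the same two probes, clearing the slot instead.
```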

Inserts in Cuckoo hashing

Step 1: Attempt to put x in position h1(x).
  If that position is empty, stop (and congratulate yourself on a job well done).

Step 2: Let y be the key currently in position h1(x); evict key y and replace it with key x.

Step 3: Let pos be the other position y is allowed to be in,
  i.e. pos = h2(y) if h1(x) = h1(y), and pos = h1(y) otherwise.

Step 4: Attempt to put y in position pos.
  If that position is empty, stop.

Step 5: Let z be the key currently in position pos; evict key z and replace it with key y.

And so on. . .

Pseudocode

add(x):
  pos ← h1(x)
  Repeat at most n times:
    If T[pos] is empty then T[pos] ← x and stop.
    Otherwise, y ← T[pos], T[pos] ← x,
      pos ← the other possible location for y
      (i.e. if y was evicted from h1(y) then pos ← h2(y), otherwise pos ← h1(y)),
      x ← y.
  Give up and rehash the whole table,
  i.e. empty the table, pick two new hash functions and reinsert every key.
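The pseudocode translates almost line for line into Python. A sketch under the slides' conventions (T is the table, h1/h2 the two hash functions, n the bound on eviction steps); it returns False where the slides say "give up and rehash":

```python
def add(T, h1, h2, x, n):
    """Cuckoo insert: follow the eviction chain for at most n
    steps, then signal that a full rehash is needed."""
    pos = h1(x)
    for _ in range(n):
        if T[pos] is None:
            T[pos] = x
            return True            # inserted, done
        T[pos], x = x, T[pos]      # evict the occupant; x is now homeless
        # the evicted key was sitting at pos, so send it to its other slot
        pos = h2(x) if pos == h1(x) else h1(x)
    return False                   # give up: caller must rehash
```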

Rehashing

If we fail to insert a new key x (i.e. we still have an "evicted" key after moving keys around n times), then we declare the table "rubbish" and rehash.

What does rehashing involve? Suppose that the table contains the k keys x1, . . . , xk at the time we fail to insert key x. To rehash we:

  • Randomly pick two new hash functions h1 and h2. (More about this in a minute.)
  • Build a new empty hash table of the same size.
  • Reinsert the keys x1, . . . , xk and then x, one by one, using the normal add operation.

If we fail while rehashing. . . we start from the beginning again. This is rather slow. . . but we will prove that it happens rarely.
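The rehash loop can be sketched end to end. A self-contained illustration: the Carter–Wegman-style sampler here is a stand-in, since the slides only require two fresh random hash functions per attempt:

```python
import random

def cuckoo_add(T, h1, h2, x, n):
    """One insert attempt: at most n evictions, else give up."""
    pos = h1(x)
    for _ in range(n):
        if T[pos] is None:
            T[pos] = x
            return True
        T[pos], x = x, T[pos]
        pos = h2(x) if pos == h1(x) else h1(x)
    return False

def rehash(keys, m, n, p=(1 << 31) - 1):
    """Keep sampling fresh (h1, h2) pairs and rebuilding an empty
    size-m table until every key can be reinserted."""
    while True:
        (a1, b1), (a2, b2) = [(random.randrange(1, p), random.randrange(p))
                              for _ in range(2)]
        h1 = lambda x, a=a1, b=b1: ((a * x + b) % p) % m
        h2 = lambda x, a=a2, b=b2: ((a * x + b) % p) % m
        T = [None] * m
        if all(cuckoo_add(T, h1, h2, x, n) for x in keys):
            return T, h1, h2
```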

Assumptions

We will follow the analysis in the paper Cuckoo hashing for undergraduates, 2006, by Rasmus Pagh (see the link on unit web page).

slide-64
SLIDE 64

Assumptions

We will follow the analysis in the paper Cuckoo hashing for undergraduates, 2006, by Rasmus Pagh (see the link on unit web page). We make the following assumptions:

slide-65
SLIDE 65

Assumptions

We will follow the analysis in the paper Cuckoo hashing for undergraduates, 2006, by Rasmus Pagh (see the link on unit web page). We make the following assumptions:

h1 and h2 are independent

i.e. h1(x) says nothing about h2(x), and vice versa.

slide-66
SLIDE 66

Assumptions

We will follow the analysis in the paper Cuckoo hashing for undergraduates, 2006, by Rasmus Pagh (see the link on unit web page). We make the following assumptions:

h1 and h2 are truly random

i.e. each key is independently mapped to each of the m positions

in the hash table with probability 1

m .

h1 and h2 are independent

i.e. h1(x) says nothing about h2(x), and vice versa.

slide-67
SLIDE 67

Assumptions

We will follow the analysis in the paper Cuckoo hashing for undergraduates, 2006, by Rasmus Pagh (see the link on unit web page). We make the following assumptions:

h1 and h2 are truly random

Computing the value of h1(x) and h2(x) takes O(1) worst-case time

i.e. each key is independently mapped to each of the m positions

in the hash table with probability 1/m.

h1 and h2 are independent

i.e. h1(x) says nothing about h2(x), and vice versa.

slide-68
SLIDE 68

Assumptions

We will follow the analysis in the paper Cuckoo hashing for undergraduates, 2006, by Rasmus Pagh (see the link on unit web page). We make the following assumptions:

h1 and h2 are truly random

Computing the value of h1(x) and h2(x) takes O(1) worst-case time

i.e. each key is independently mapped to each of the m positions

in the hash table with probability 1/m.

h1 and h2 are independent

i.e. h1(x) says nothing about h2(x), and vice versa.

There are at most n keys in the hash table at any time.

slide-69
SLIDE 69

Assumptions

We will follow the analysis in the paper Cuckoo hashing for undergraduates, 2006, by Rasmus Pagh (see the link on unit web page). We make the following assumptions:

h1 and h2 are truly random

Computing the value of h1(x) and h2(x) takes O(1) worst-case time

i.e. each key is independently mapped to each of the m positions

in the hash table with probability 1/m.

h1 and h2 are independent

i.e. h1(x) says nothing about h2(x), and vice versa.

There are at most n keys in the hash table at any time.

REASONABLE ASSUMPTION

slide-70
SLIDE 70

Assumptions

We will follow the analysis in the paper Cuckoo hashing for undergraduates, 2006, by Rasmus Pagh (see the link on unit web page). We make the following assumptions:

h1 and h2 are truly random

Computing the value of h1(x) and h2(x) takes O(1) worst-case time

i.e. each key is independently mapped to each of the m positions

in the hash table with probability 1/m.

h1 and h2 are independent

i.e. h1(x) says nothing about h2(x), and vice versa.

There are at most n keys in the hash table at any time.

UNREASONABLE ASSUMPTION REASONABLE ASSUMPTION

slide-71
SLIDE 71

Assumptions

We will follow the analysis in the paper Cuckoo hashing for undergraduates, 2006, by Rasmus Pagh (see the link on unit web page). We make the following assumptions:

h1 and h2 are truly random

Computing the value of h1(x) and h2(x) takes O(1) worst-case time

i.e. each key is independently mapped to each of the m positions

in the hash table with probability 1/m.

h1 and h2 are independent

i.e. h1(x) says nothing about h2(x), and vice versa.

There are at most n keys in the hash table at any time.

UNREASONABLE ASSUMPTION REASONABLE ASSUMPTION

QUESTIONABLE

ASSUMPTION

slide-72
SLIDE 72

Assumptions

We will follow the analysis in the paper Cuckoo hashing for undergraduates, 2006, by Rasmus Pagh (see the link on unit web page). We make the following assumptions:

h1 and h2 are truly random

Computing the value of h1(x) and h2(x) takes O(1) worst-case time

i.e. each key is independently mapped to each of the m positions

in the hash table with probability 1/m.

h1 and h2 are independent

i.e. h1(x) says nothing about h2(x), and vice versa.

There are at most n keys in the hash table at any time.

UNREASONABLE ASSUMPTION REASONABLE ASSUMPTION

QUESTIONABLE

ASSUMPTION

N O T A C T U A L L Y A N

ASSUMPTION

slide-73
SLIDE 73

Cuckoo graph

Hash table

(size m)

slide-74
SLIDE 74

Cuckoo graph

Hash table (size m)

The cuckoo graph:

slide-75
SLIDE 75

Cuckoo graph

Hash table (size m)

The cuckoo graph: a vertex for each position of the table.

slide-76
SLIDE 76

Cuckoo graph

Hash table (size m)

The cuckoo graph: a vertex for each position of the table (m vertices).

slide-77
SLIDE 77

Cuckoo graph

Hash table (size m)

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

slide-78
SLIDE 78

Cuckoo graph

Hash table (size m)

x1 h2(x1) h1(x1)

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

slide-79
SLIDE 79

Cuckoo graph

Hash table (size m)

x1 h2(x1) h1(x1) x2

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

slide-80
SLIDE 80

Cuckoo graph

Hash table (size m)

x1 h2(x1) h1(x1) x2 x3

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

slide-81
SLIDE 81

Cuckoo graph

Hash table (size m)

x1 h2(x1) h1(x1) x2 x3 x4

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

slide-82
SLIDE 82

Cuckoo graph

Hash table (size m)

x2 x3 x4

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

x1

slide-83
SLIDE 83

Cuckoo graph

Hash table (size m)

x2 x3 x4 x5

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

x1

slide-84
SLIDE 84

Cuckoo graph

Hash table (size m)

x2 x3 x4 x5

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

h1(x5) x1 h2(x5)

slide-85
SLIDE 85

Cuckoo graph

Hash table (size m)

x2 x3 x4 x5

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

h1(x5) x1 h2(x5)

There is no space for x5. . .

slide-86
SLIDE 86

Cuckoo graph

Hash table (size m)

x2 x3 x4 x5

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

h1(x5) x1 h2(x5)

There is no space for x5. . . so we make space by moving x2 and then x3.

slide-87
SLIDE 87

Cuckoo graph

Hash table (size m)

x2 x3 x4 x5

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

h1(x5) x1 h2(x5)

There is no space for x5. . . so we make space by moving x2 and then x3.

slide-88
SLIDE 88

Cuckoo graph

Hash table (size m)

x2 x3 x4 x5

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

h1(x5) x1 h2(x5)

There is no space for x5. . . so we make space by moving x2 and then x3.

slide-89
SLIDE 89

Cuckoo graph

Hash table (size m)

x2 x4 x5

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

x3 h1(x5) x1 h2(x5)

There is no space for x5. . . so we make space by moving x2 and then x3.

slide-90
SLIDE 90

Cuckoo graph

Hash table (size m)

x2 x4 x5

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

x3 h1(x5) x1 h2(x5)

There is no space for x5. . . so we make space by moving x2 and then x3. The number of moves performed while adding a key is the length of the corresponding path in the cuckoo graph.

slide-91
SLIDE 91

Cuckoo graph

Hash table (size m)

x2 x4 x5

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

x3 x1

The number of moves performed while adding a key is the length of the corresponding path in the cuckoo graph.

slide-92
SLIDE 92

Cuckoo graph

Hash table (size m)

x2 x4 x5

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

x3 x1

The number of moves performed while adding a key is the length of the corresponding path in the cuckoo graph.

slide-93
SLIDE 93

Cuckoo graph

Hash table (size m)

x2 x4 x5 x6

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

x3 x1

The number of moves performed while adding a key is the length of the corresponding path in the cuckoo graph.

slide-94
SLIDE 94

Cuckoo graph

Hash table (size m)

x2 x4 x5 x6

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

Inserting key x6 creates a cycle.

x3 x1

The number of moves performed while adding a key is the length of the corresponding path in the cuckoo graph.

slide-95
SLIDE 95

Cuckoo graph

Hash table (size m)

x2 x4 x5 x6

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

Inserting key x6 creates a cycle. Cycles are dangerous. . .

x3 x1

The number of moves performed while adding a key is the length of the corresponding path in the cuckoo graph.

slide-96
SLIDE 96

Cuckoo graph

Hash table (size m)

x2 x4 x5 x6 x7

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

Inserting key x6 creates a cycle. Cycles are dangerous. . .

x3 x1

The number of moves performed while adding a key is the length of the corresponding path in the cuckoo graph.

slide-97
SLIDE 97

Cuckoo graph

Hash table (size m)

x2 x4 x5 x6 x7

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

Inserting key x6 creates a cycle. When key x7 is inserted where does it go? Cycles are dangerous. . .

x3 x1

The number of moves performed while adding a key is the length of the corresponding path in the cuckoo graph.

slide-98
SLIDE 98

Cuckoo graph

Hash table (size m)

x2 x4 x5 x6 x7

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

Inserting key x6 creates a cycle. When key x7 is inserted where does it go? There are 6 keys but only 5 spaces. Cycles are dangerous. . .

x3 x1

The number of moves performed while adding a key is the length of the corresponding path in the cuckoo graph.

slide-99
SLIDE 99

Cuckoo graph

Hash table (size m)

x2 x4 x5 x6 x7

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

Inserting key x6 creates a cycle. When key x7 is inserted where does it go? There are 6 keys but only 5 spaces.

Cycles are dangerous. . . The keys would be moved around in an infinite loop, but we stop and rehash after n moves. . .

x3 x1

The number of moves performed while adding a key is the length of the corresponding path in the cuckoo graph.

slide-100
SLIDE 100

Cuckoo graph

Hash table (size m)

x2 x4 x5 x6 x7

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

Inserting key x6 creates a cycle. When key x7 is inserted where does it go? There are 6 keys but only 5 spaces.

Cycles are dangerous. . . The keys would be moved around in an infinite loop, but we stop and rehash after n moves. . .

x3 x1

The number of moves performed while adding a key is the length of the corresponding path in the cuckoo graph. Inserting a key into a cycle always causes a rehash.

slide-101
SLIDE 101

Cuckoo graph

Hash table (size m)

x2 x4 x5 x6 x7

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

x3 x1

The number of moves performed while adding a key is the length of the corresponding path in the cuckoo graph. Inserting a key into a cycle always causes a rehash.

slide-102
SLIDE 102

Cuckoo graph

Hash table (size m)

x2 x4 x5 x6 x7

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

x3 x1

The number of moves performed while adding a key is the length of the corresponding path in the cuckoo graph. Inserting a key into a cycle always causes a rehash.

slide-103
SLIDE 103

Cuckoo graph

Hash table (size m)

x2 x4 x5 x6 x7

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

x3 x1

The number of moves performed while adding a key is the length of the corresponding path in the cuckoo graph. Inserting a key into a cycle always causes a rehash. This is the only way a rehash can happen.

slide-104
SLIDE 104

Cuckoo graph

Hash table (size m)

x2 x4 x5 x6 x7

The cuckoo graph: a vertex for each position of the table (m vertices). For each key x, there is an undirected edge between h1(x) and h2(x).

We will analyse the probability of either a cycle or a long path occurring in the graph while inserting any n keys.

x3 x1

The number of moves performed while adding a key is the length of the corresponding path in the cuckoo graph. Inserting a key into a cycle always causes a rehash. This is the only way a rehash can happen.
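The cuckoo graph just described is easy to build and inspect in code. The sketch below is my own illustration (function names and the counting criterion are assumptions, not from the slides): a connected component with more edges (keys) than vertices (positions) cannot store all of its keys, which is the "6 keys but only 5 spaces" situation that forces a rehash; a component with exactly as many edges as vertices contains one cycle but is merely full.

```python
from collections import defaultdict

def cuckoo_graph(keys, h1, h2, m):
    """Build the cuckoo graph: a vertex per table position, and an
    undirected edge {h1(x), h2(x)} for each key x."""
    adj = defaultdict(list)
    for x in keys:
        u, v = h1(x) % m, h2(x) % m
        adj[u].append(v)
        adj[v].append(u)
    return adj

def has_overloaded_component(adj, m):
    """Return True iff some component has more edges than vertices,
    i.e. more keys than available positions."""
    seen = set()
    for start in range(m):
        if start in seen or start not in adj:
            continue
        comp, stack, half_edges = {start}, [start], 0
        while stack:  # graph search over this component
            u = stack.pop()
            half_edges += len(adj[u])
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        if half_edges // 2 > len(comp):  # more keys than spaces
            return True
    return False
```

With hand-picked hash values, two keys forming a path are fine, while three parallel edges between the same two positions (three keys, two spaces) are detected as overloaded.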

slide-105
SLIDE 105

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

slide-106
SLIDE 106

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What does this say?

slide-107
SLIDE 107

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What does this say?

(let c = 2 for simplicity)

slide-108
SLIDE 108

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What does this say?

(let c = 2 for simplicity)

i j

slide-109
SLIDE 109

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What does this say?

(let c = 2 for simplicity)

i j

Probability of a shortest path of length 1 is at most 1/(2·m).

slide-110
SLIDE 110

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What does this say?

(let c = 2 for simplicity)

i j

Probability of a shortest path of length 2 is at most 1/(4·m).

slide-111
SLIDE 111

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What does this say?

(let c = 2 for simplicity)

i j

Probability of a shortest path of length 3 is at most 1/(8·m).

slide-112
SLIDE 112

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What does this say?

(let c = 2 for simplicity)

i j

Probability of a shortest path of length 4 is at most 1/(16·m).

slide-113
SLIDE 113

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What does this say?

(let c = 2 for simplicity)

i j

Probability of a shortest path of length 4 is at most 1/(16·m).

How likely is it that there even is a path?

slide-114
SLIDE 114

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What does this say?

(let c = 2 for simplicity)

How likely is it that there even is a path?

slide-115
SLIDE 115

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What does this say?

(let c = 2 for simplicity)

How likely is it that there even is a path?

slide-116
SLIDE 116

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What does this say?

(let c = 2 for simplicity)

How likely is it that there even is a path?

If a path exists from i to j, there must be a shortest path (from i to j)

slide-117
SLIDE 117

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What does this say?

(let c = 2 for simplicity)

How likely is it that there even is a path?

If a path exists from i to j, there must be a shortest path (from i to j) Therefore the probability of a path from i to j existing is at most. . .

Σ_{ℓ=1}^∞ 1/(c^ℓ · m)

(using the union bound over all possible path lengths.)

slide-118
SLIDE 118

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What does this say?

(let c = 2 for simplicity)

How likely is it that there even is a path?

If a path exists from i to j, there must be a shortest path (from i to j) Therefore the probability of a path from i to j existing is at most. . .

Σ_{ℓ=1}^∞ 1/(c^ℓ · m)

(using the union bound over all possible path lengths.)

= (1/m) · Σ_{ℓ=1}^∞ 1/c^ℓ

slide-119
SLIDE 119

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What does this say?

(let c = 2 for simplicity)

How likely is it that there even is a path?

If a path exists from i to j, there must be a shortest path (from i to j) Therefore the probability of a path from i to j existing is at most. . .

Σ_{ℓ=1}^∞ 1/(c^ℓ · m)

(using the union bound over all possible path lengths.)

= (1/m) · Σ_{ℓ=1}^∞ 1/c^ℓ = 1/(m·(c − 1)) = 1/m (with c = 2)

slide-120
SLIDE 120

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What does this say?

(let c = 2 for simplicity)

How likely is it that there even is a path?

If a path exists from i to j, there must be a shortest path (from i to j) Therefore the probability of a path from i to j existing is at most. . .

Σ_{ℓ=1}^∞ 1/(c^ℓ · m)

(using the union bound over all possible path lengths.)

= (1/m) · Σ_{ℓ=1}^∞ 1/c^ℓ = 1/(m·(c − 1)) = 1/m (with c = 2). So a path from i to j is rather unlikely to exist.
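The closing geometric sum can be sanity-checked numerically. This throwaway snippet (my own, not part of the lecture) truncates the infinite series and compares it with the closed form 1/(m·(c − 1)):

```python
# Check numerically that sum_{l>=1} 1/(c^l * m) = 1/(m*(c-1)),
# the bound on the probability that any path from i to j exists.
def path_existence_bound(c, m, terms=200):
    return sum(1.0 / (c**l * m) for l in range(1, terms + 1))

for c in (2, 3):
    m = 2 * c * 100  # satisfies m >= 2cn with n = 100
    assert abs(path_existence_bound(c, m) - 1.0 / (m * (c - 1))) < 1e-12
```

With c = 2 the bound is 1/m, exactly as on the slide.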

slide-121
SLIDE 121

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What is the proof?

slide-122
SLIDE 122

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What is the proof? The proof is in the director's cut of the slides (see notes)

slide-123
SLIDE 123

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What is the proof? The proof is in the director's cut of the slides (see notes) Can we at least see the pictures?

slide-124
SLIDE 124

Paths in the cuckoo graph

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

table size is m

n keys

What is the proof? The proof is in the director's cut of the slides (see notes) Can we at least see the pictures?

The proof is by induction on the length ℓ.

Base case: ℓ = 1. Argue that each key x has probability at most 2/m² of creating an edge (i, j), then union bound over all n keys.

Inductive step: Pick a third point k to split the path into a shortest path of length ℓ − 1 (from i to k) and an edge (k, j), then union bound over all k and then over all keys.

slide-125
SLIDE 125

Back to the start (again) (again)

A dynamic dictionary stores (key, value)-pairs and supports:

Universe U of u keys. Hash table T of size m ≥ n. Collisions are fixed by chaining

n arbitrary operations arrive online, one at a time.

add(key, value), lookup(key) (which returns value) and delete(key)

bucketing

We require that we can recover any key from its bucket in O(s) time, where s is the number of keys in the bucket.

If our construction has the property that, for any two keys x, y ∈ U (with x ≠ y), the probability that x and y are in the same bucket is O(1/m), then:

  • For any n operations, the expected run-time is O(1) per operation.

Locating the bucket containing a given key takes O(1) time

table size is m

n keys

slide-126
SLIDE 126

Don’t put all your eggs in one bucket

Hash table

x y z w

We say that two keys x, y are in the same bucket (conceptually) iff there is a path between h1(x) and h1(y) in the cuckoo graph.

table size is m

n keys

slide-127
SLIDE 127

Don’t put all your eggs in one bucket

Hash table

x y z w

We say that two keys x, y are in the same bucket (conceptually) iff there is a path between h1(x) and h1(y) in the cuckoo graph. For two distinct keys x, y, the probability that they are in the same bucket is at most

Σ_{ℓ=1}^∞ 4/(c^ℓ · m) = (4/m) · Σ_{ℓ=1}^∞ 1/c^ℓ = 4/(m(c − 1)) = O(1/m)

where c > 1 is a constant. (Another union bound over all possible path lengths.)

table size is m

n keys

slide-128
SLIDE 128

Don’t put all your eggs in one bucket

Hash table

x y z w

We say that two keys x, y are in the same bucket (conceptually) iff there is a path between h1(x) and h1(y) in the cuckoo graph. For two distinct keys x, y, the probability that they are in the same bucket is at most

Σ_{ℓ=1}^∞ 4/(c^ℓ · m) = (4/m) · Σ_{ℓ=1}^∞ 1/c^ℓ = 4/(m(c − 1)) = O(1/m)

where c > 1 is a constant. (Another union bound over all possible path lengths.)

table size is m

n keys

For any positions i and j, and any constant c > 1, if m 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ 1, is at most

1 cℓ·m .

LEMMA

slide-129
SLIDE 129

Don’t put all your eggs in one bucket

Hash table

x y z w

We say that two keys x, y are in the same bucket (conceptually) iff there is a path between h1(x) and h1(y) in the cuckoo graph. For two distinct keys x, y, the probability that they are in the same bucket is at most

Σ_{ℓ=1}^∞ 4/(c^ℓ · m) = (4/m) · Σ_{ℓ=1}^∞ 1/c^ℓ = 4/(m(c − 1)) = O(1/m)

where c > 1 is a constant. (Another union bound over all possible path lengths.)

table size is m

n keys

slide-130
SLIDE 130

Don’t put all your eggs in one bucket

Hash table

x y z w

We say that two keys x, y are in the same bucket (conceptually) iff there is a path between h1(x) and h1(y) in the cuckoo graph. For two distinct keys x, y, the probability that they are in the same bucket is at most

Σ_{ℓ=1}^∞ 4/(c^ℓ · m) = (4/m) · Σ_{ℓ=1}^∞ 1/c^ℓ = 4/(m(c − 1)) = O(1/m)

where c > 1 is a constant. (Another union bound over all possible path lengths.)

The time for an operation on x is bounded by the number of items in the bucket (assuming there are no cycles).

table size is m

n keys

slide-131
SLIDE 131

Don’t put all your eggs in one bucket

Hash table

x y z w

We say that two keys x, y are in the same bucket (conceptually) iff there is a path between h1(x) and h1(y) in the cuckoo graph. For two distinct keys x, y, the probability that they are in the same bucket is at most

Σ_{ℓ=1}^∞ 4/(c^ℓ · m) = (4/m) · Σ_{ℓ=1}^∞ 1/c^ℓ = 4/(m(c − 1)) = O(1/m)

where c > 1 is a constant. (Another union bound over all possible path lengths.)

The time for an operation on x is bounded by the number of items in the bucket (assuming there are no cycles). So we have that the expected time per operation is O(1) (assuming that m ≥ 2cn and there are no cycles).

table size is m

n keys

slide-132
SLIDE 132

Don’t put all your eggs in one bucket

Hash table

x y z w

We say that two keys x, y are in the same bucket (conceptually) iff there is a path between h1(x) and h1(y) in the cuckoo graph. For two distinct keys x, y, the probability that they are in the same bucket is at most

Σ_{ℓ=1}^∞ 4/(c^ℓ · m) = (4/m) · Σ_{ℓ=1}^∞ 1/c^ℓ = 4/(m(c − 1)) = O(1/m)

where c > 1 is a constant. (Another union bound over all possible path lengths.)

The time for an operation on x is bounded by the number of items in the bucket (assuming there are no cycles). So we have that the expected time per operation is O(1) (assuming that m ≥ 2cn and there are no cycles). Further, lookups take O(1) time in the worst case.

table size is m

n keys

slide-133
SLIDE 133

Rehashing

The previous analysis on the expected running time holds when there are no cycles.

slide-134
SLIDE 134

Rehashing

The previous analysis on the expected running time holds when there are no cycles. However, we would expect there to be cycles every now and then, causing a rehash.

slide-135
SLIDE 135

Rehashing

The previous analysis on the expected running time holds when there are no cycles. However, we would expect there to be cycles every now and then, causing a rehash. How often does this happen? (sketch proof)

slide-136
SLIDE 136

Rehashing

The previous analysis on the expected running time holds when there are no cycles. However, we would expect there to be cycles every now and then, causing a rehash. How often does this happen? (sketch proof) Consider inserting n keys into the table. . .

slide-137
SLIDE 137

Rehashing

The previous analysis on the expected running time holds when there are no cycles. However, we would expect there to be cycles every now and then, causing a rehash. A cycle is a path from a vertex i back to itself. How often does this happen? (sketch proof) Consider inserting n keys into the table. . .

i

slide-138
SLIDE 138

Rehashing

The previous analysis on the expected running time holds when there are no cycles. However, we would expect there to be cycles every now and then, causing a rehash. A cycle is a path from a vertex i back to itself. How often does this happen? (sketch proof) Consider inserting n keys into the table. . . so use previous result with i = j.. . .

i

slide-139
SLIDE 139

Rehashing

The previous analysis on the expected running time holds when there are no cycles. However, we would expect there to be cycles every now and then, causing a rehash. A cycle is a path from a vertex i back to itself. How often does this happen? (sketch proof) Consider inserting n keys into the table. . . so use previous result with i = j.. . .

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

i

slide-140
SLIDE 140

Rehashing

The previous analysis on the expected running time holds when there are no cycles. However, we would expect there to be cycles every now and then, causing a rehash. A cycle is a path from a vertex i back to itself. How often does this happen? (sketch proof) Consider inserting n keys into the table. . . so use previous result with i = j.. . .

For any positions i and j, and any constant c > 1, if m ≥ 2cn then the probability that there exists a shortest path in the cuckoo graph from i to j with length ℓ ≥ 1 is at most

1/(c^ℓ · m).

LEMMA

The probability that a position i is involved in a cycle is at most

Σ_{ℓ=1}^∞ 1/(c^ℓ · m) = 1/(m(c − 1)).

(Another union bound over all possible path lengths.)

i

slide-141
SLIDE 141

Rehashing

The probability that a position i is involved in a cycle is at most

Σ_{ℓ=1}^∞ 1/(c^ℓ · m) = 1/(m(c − 1)).

(Another union bound over all possible path lengths.)

slide-142
SLIDE 142

Rehashing

The probability that a position i is involved in a cycle is at most

Σ_{ℓ=1}^∞ 1/(c^ℓ · m) = 1/(m(c − 1)).

The probability that there is at least one cycle is at most

m · 1/(m(c − 1)) = 1/(c − 1).

(Another union bound over all possible path lengths.)

slide-143
SLIDE 143

Rehashing

The probability that a position i is involved in a cycle is at most

Σ_{ℓ=1}^∞ 1/(c^ℓ · m) = 1/(m(c − 1)).

The probability that there is at least one cycle is at most

m · 1/(m(c − 1)) = 1/(c − 1).

(Another union bound over all possible path lengths.) (Union bound over all m positions in the table.)

slide-144
SLIDE 144

Rehashing

The probability that a position i is involved in a cycle is at most

Σ_{ℓ=1}^∞ 1/(c^ℓ · m) = 1/(m(c − 1)).

The probability that there is at least one cycle is at most

m · 1/(m(c − 1)) = 1/(c − 1).

If we set c = 3, the probability that a cycle occurs (i.e. that there is a rehash) during the n insertions is at most 1/2.

(Another union bound over all possible path lengths.) (Union bound over all m positions in the table.)

slide-145
SLIDE 145

Rehashing

The probability that a position i is involved in a cycle is at most

Σ_{ℓ=1}^∞ 1/(c^ℓ · m) = 1/(m(c − 1)).

The probability that there is at least one cycle is at most

m · 1/(m(c − 1)) = 1/(c − 1).

If we set c = 3, the probability that a cycle occurs (i.e. that there is a rehash) during the n insertions is at most 1/2. The probability that there are two rehashes is at most 1/4, and so on.

(Another union bound over all possible path lengths.) (Union bound over all m positions in the table.)

slide-146
SLIDE 146

Rehashing

The probability that a position i is involved in a cycle is at most

Σ_{ℓ=1}^∞ 1/(c^ℓ · m) = 1/(m(c − 1)).

The probability that there is at least one cycle is at most

m · 1/(m(c − 1)) = 1/(c − 1).

If we set c = 3, the probability that a cycle occurs (i.e. that there is a rehash) during the n insertions is at most 1/2. The probability that there are two rehashes is at most 1/4, and so on.

So the expected number of rehashes during the n insertions is at most Σ_{i=1}^∞ (1/2)^i = 1.

(Another union bound over all possible path lengths.) (Union bound over all m positions in the table.)
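As a quick numerical aside (my own sketch, not from the slides): bounding Pr[at least k rehashes] by (1/2)^k makes the expected number of rehashes a plain geometric series summing to 1.

```python
# With c = 3 the slides bound Pr[>= k rehashes] by (1/2)^k, so the expected
# number of rehashes during the n insertions is at most sum_{k>=1} (1/2)^k = 1.
expected_rehashes_bound = sum(0.5**k for k in range(1, 100))
assert abs(expected_rehashes_bound - 1.0) < 1e-12
```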

slide-147
SLIDE 147

Rehashing

If the expected time for one rehash is O(n) then the expected time for all rehashes is also O(n) (this is because we only expect there to be one rehash).

slide-148
SLIDE 148

Rehashing

If the expected time for one rehash is O(n) then the expected time for all rehashes is also O(n) (this is because we only expect there to be one rehash). Therefore the amortised expected time for the rehashes over the n insertions is O(1) per insertion (i.e. divide the total cost by n).

slide-149
SLIDE 149

Rehashing

If the expected time for one rehash is O(n), then the expected time for all rehashes is also O(n) (this is because we only expect there to be one rehash).

Therefore the amortised expected time for the rehashes over the n insertions is O(1) per insertion (i.e. divide the total cost by n).

Why is the expected time per rehash O(n)?

slide-150
SLIDE 150

Rehashing

If the expected time for one rehash is O(n), then the expected time for all rehashes is also O(n) (this is because we only expect there to be one rehash).

Therefore the amortised expected time for the rehashes over the n insertions is O(1) per insertion (i.e. divide the total cost by n).

Why is the expected time per rehash O(n)? First pick a new random h1 and h2 and construct the cuckoo graph using the at most n keys.

slide-151
SLIDE 151

Rehashing

If the expected time for one rehash is O(n), then the expected time for all rehashes is also O(n) (this is because we only expect there to be one rehash).

Therefore the amortised expected time for the rehashes over the n insertions is O(1) per insertion (i.e. divide the total cost by n).

Why is the expected time per rehash O(n)? First pick a new random h1 and h2 and construct the cuckoo graph using the at most n keys. Then check for a cycle in the graph in O(n) time (and start again if you find one).

slide-152
SLIDE 152

Rehashing

If the expected time for one rehash is O(n), then the expected time for all rehashes is also O(n) (this is because we only expect there to be one rehash).

Therefore the amortised expected time for the rehashes over the n insertions is O(1) per insertion (i.e. divide the total cost by n).

Why is the expected time per rehash O(n)? First pick a new random h1 and h2 and construct the cuckoo graph using the at most n keys. Then check for a cycle in the graph in O(n) time (you can do this using breadth-first search) and start again if you find one.

slide-153
SLIDE 153

Rehashing

If the expected time for one rehash is O(n), then the expected time for all rehashes is also O(n) (this is because we only expect there to be one rehash).

Therefore the amortised expected time for the rehashes over the n insertions is O(1) per insertion (i.e. divide the total cost by n).

Why is the expected time per rehash O(n)? First pick a new random h1 and h2 and construct the cuckoo graph using the at most n keys. Then check for a cycle in the graph in O(n) time (you can do this using breadth-first search) and start again if you find one. If there is no cycle, insert all the elements; this takes O(n) time in expectation (as we have seen).
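The rehash procedure above can be sketched as follows. This is a hypothetical Python sketch: `make_hash` stands in for drawing a fresh random hash function, and the cycle check uses union–find in place of the breadth-first search mentioned on the slide (it detects the same "an edge closes a cycle" condition in near-linear time):

```python
import random

def make_hash(m):
    """Stand-in for drawing a fresh random hash function into [0, m)."""
    a, b = random.randrange(1, 2**31), random.randrange(2**31)
    return lambda x: (a * hash(x) + b) % m

def has_cycle(keys, h1, h2, m):
    """Cuckoo graph: table positions are vertices, each key x is an edge
    between h1(x) and h2(x).  An edge whose endpoints are already connected
    closes a cycle."""
    parent = list(range(m))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for x in keys:
        u, v = find(h1(x)), find(h2(x))
        if u == v:              # also catches h1(x) == h2(x), a self-loop
            return True
        parent[u] = v
    return False

def rehash(keys, m):
    """Retry until a cycle-free pair (h1, h2) is found.  With m = 3n the
    cycle probability is at most 1/2, so we expect at most 2 attempts."""
    while True:
        h1, h2 = make_hash(m), make_hash(m)
        if not has_cycle(keys, h1, h2, m):
            return h1, h2       # all keys can now be inserted in O(n) total
    
```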

slide-154
SLIDE 154

A word about the assumptions

We have assumed true randomness. As we have discussed, this is not realistic.

slide-155
SLIDE 155

A word about the assumptions

We have assumed true randomness. As we have discussed, this is not realistic. We have seen that weakly universal hash families are realistic, where the hash values of any two keys x, y are independent.

slide-156
SLIDE 156

A word about the assumptions

We have assumed true randomness. As we have discussed, this is not realistic. We have seen that weakly universal hash families are realistic, where the hash values of any two keys x, y are independent.

A set H of hash functions is weakly universal if for any two distinct keys x, y ∈ U,

Pr[h(x) = h(y)] ≤ 1/m

(where h is picked uniformly at random from H)

slide-157
SLIDE 157

A word about the assumptions

We have assumed true randomness. As we have discussed, this is not realistic. We have seen that weakly universal hash families are realistic, where the hash values of any two keys x, y are independent. We can define stronger hash families with k-wise independence; here the hash values of any choice of k keys are independent.

A set H of hash functions is weakly universal if for any two distinct keys x, y ∈ U,

Pr[h(x) = h(y)] ≤ 1/m

(where h is picked uniformly at random from H)

slide-158
SLIDE 158

A word about the assumptions

We have assumed true randomness. As we have discussed, this is not realistic. We have seen that weakly universal hash families are realistic, where the hash values of any two keys x, y are independent. We can define stronger hash families with k-wise independence; here the hash values of any choice of k keys are independent.

A set H of hash functions is weakly universal if for any two distinct keys x, y ∈ U,

Pr[h(x) = h(y)] ≤ 1/m

(where h is picked uniformly at random from H)

A set H of hash functions is k-wise independent if for any k distinct keys x1, x2, . . . , xk ∈ U and any k values v1, v2, . . . , vk ∈ {0, 1, 2, . . . , m − 1},

Pr[h(x1) = v1 ∧ h(x2) = v2 ∧ . . . ∧ h(xk) = vk] = 1/m^k

(where h is picked uniformly at random from H)
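One standard construction of a k-wise independent family (a sketch, not from the slides, under the usual simplifying assumptions: keys come from Z_p for a prime p at least the universe size, and the range is taken to be m = p) picks a uniformly random polynomial of degree k − 1 over Z_p:

```python
import random

P = (1 << 31) - 1   # a Mersenne prime, assumed to be at least the universe size

def make_k_wise_hash(k, p=P):
    """Pick a uniformly random polynomial of degree k-1 over Z_p.
    Its values at any k distinct points are independent and uniform,
    which is exactly k-wise independence (here with range m = p)."""
    coeffs = [random.randrange(p) for _ in range(k)]

    def h(x):
        acc = 0
        for c in reversed(coeffs):   # Horner's rule: O(k) time per evaluation
            acc = (acc * x + c) % p
        return acc

    return h
```

To hash into a table of size m < p one would take h(x) mod m, which is only approximately uniform; the O(1) evaluation time claimed on a later slide relies on more careful constructions than this O(k) sketch.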

slide-159
SLIDE 159

A word about the assumptions

We have assumed true randomness. As we have discussed, this is not realistic. We have seen that weakly universal hash families are realistic, where the hash values of any two keys x, y are independent. We can define stronger hash families with k-wise independence; here the hash values of any choice of k keys are independent.

slide-160
SLIDE 160

A word about the assumptions

We have assumed true randomness. As we have discussed, this is not realistic. We have seen that weakly universal hash families are realistic, where the hash values of any two keys x, y are independent. We can define stronger hash families with k-wise independence; here the hash values of any choice of k keys are independent.

slide-161
SLIDE 161

A word about the assumptions

We have assumed true randomness. As we have discussed, this is not realistic. We have seen that weakly universal hash families are realistic, where the hash values of any two keys x, y are independent. We can define stronger hash families with k-wise independence; here the hash values of any choice of k keys are independent. It is feasible to construct a (log n)-wise independent family of hash functions such that h(x) can be computed in O(1) time.

slide-162
SLIDE 162

A word about the assumptions

We have assumed true randomness. As we have discussed, this is not realistic. We have seen that weakly universal hash families are realistic, where the hash values of any two keys x, y are independent. We can define stronger hash families with k-wise independence; here the hash values of any choice of k keys are independent. It is feasible to construct a (log n)-wise independent family of hash functions such that h(x) can be computed in O(1) time. By changing the cuckoo hashing algorithm to perform a rehash after log n moves, it can be shown (via a similar but harder proof) that the results still hold.

slide-163
SLIDE 163

A word about the assumptions

We have assumed true randomness. As we have discussed, this is not realistic. We have seen that weakly universal hash families are realistic, where the hash values of any two keys x, y are independent. We can define stronger hash families with k-wise independence; here the hash values of any choice of k keys are independent.

It is feasible to construct a (log n)-wise independent family of hash functions such that h(x) can be computed in O(1) time. By changing the cuckoo hashing algorithm to perform a rehash after log n moves, it can be shown (via a similar but harder proof) that the results still hold.

THEOREM

In the Cuckoo hashing scheme:

  • Every lookup and every delete takes O(1) worst-case time,
  • The space is O(n), where n is the number of keys stored,
  • An insert takes amortised expected O(1) time.
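The guarantees in the theorem can be illustrated with a minimal cuckoo table. This is a hypothetical sketch, not the lecture's reference implementation: `h1`/`h2` are stand-in random hash functions, two tables of m slots each are used, and insertion gives up and rehashes after a bounded number of moves, in the spirit of the rehash-after-log-n variant above:

```python
import random

class CuckooTable:
    def __init__(self, m=64):
        self.m = m
        self.t1 = [None] * m   # each slot holds a (key, value) pair or None
        self.t2 = [None] * m
        self._new_hashes()

    def _new_hashes(self):
        def make():
            a, b = random.randrange(1, 2**31), random.randrange(2**31)
            return lambda x: (a * hash(x) + b) % self.m
        self.h1, self.h2 = make(), make()

    def lookup(self, key):
        # O(1) worst case: only two positions can ever hold the key.
        for t, h in ((self.t1, self.h1), (self.t2, self.h2)):
            slot = t[h(key)]
            if slot is not None and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        # Also O(1) worst case: check the same two positions.
        for t, h in ((self.t1, self.h1), (self.t2, self.h2)):
            i = h(key)
            if t[i] is not None and t[i][0] == key:
                t[i] = None
                return

    def add(self, key, value, max_moves=32):
        item = (key, value)
        for _ in range(max_moves):       # evict-and-move ("kick out") loop
            i = self.h1(item[0])
            item, self.t1[i] = self.t1[i], item
            if item is None:
                return
            j = self.h2(item[0])
            item, self.t2[j] = self.t2[j], item
            if item is None:
                return
        self._rehash(item)               # too many moves: pick new h1, h2

    def _rehash(self, pending):
        items = [s for t in (self.t1, self.t2) for s in t if s is not None]
        items.append(pending)
        self.t1 = [None] * self.m
        self.t2 = [None] * self.m
        self._new_hashes()
        for k, v in items:
            self.add(k, v)
```

The sketch omits duplicate-key handling and table growth; its point is only that lookup and delete touch exactly two slots, while add pays for rehashes only occasionally, matching the amortised expected O(1) bound.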