Space Efficient Hash Tables with Worst Case Constant Access Time



SLIDE 1

Fotakis/Pagh/Sanders/Spirakis: d-ary Cuckoo Hashing


INFORMATIK


Space Efficient Hash Tables with Worst Case Constant Access Time

Dimitris Fotakis and Peter Sanders (MPII), Rasmus Pagh (IT U. Copenhagen), Paul Spirakis (CTI)

SLIDE 2


Overview

✷ The Problem and Related Work
✷ Cuckoo Hashing
✷ d-ary Cuckoo Hashing
✷ Analysis
✷ Relation to Bipartite Matching
✷ Filter Hashing
✷ Discussion

SLIDE 3


The Problem

Represent a set of n elements (with associated information) using space $(1 + \epsilon)n$.

Support operations insert, delete, lookup, (doall) efficiently.

Assume a truly random hash function h.

SLIDE 4


Related Work

Uniform hashing: expected time $\approx 1/\epsilon$

[Figure: hash functions h1, h2, h3]

Dynamic Perfect Hashing [Dietzfelbinger et al. 94]: worst case constant time for lookup, but $\epsilon$ is not small.

Approaching the Information Theoretic Lower Bound:

[Brodnik Munro 99, Raman Rao 02]: space $(1 + o(1)) \times$ lower bound, without associated information

[Pagh 01]: static case.

SLIDE 5


Cuckoo Hashing

[Pagh Rodler 01]

Table of size $(2 + \epsilon)n$.

Two choices for each element.

Insert moves elements; rebuild if necessary.

Very fast lookup and delete.

Expected constant insertion time.

[Figure: hash functions h1, h2]
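The two-choice scheme on this slide can be sketched roughly as follows. This is a minimal illustration, not the implementation from [Pagh Rodler 01]: the class name `CuckooHashTable`, the eviction budget `max_kicks`, and the use of Python's salted built-in `hash` in place of truly random hash functions are all assumptions made for the example.

```python
import random


class CuckooHashTable:
    """Two-choice cuckoo hashing sketch: a key lives in cell h1(key) or h2(key)."""

    def __init__(self, capacity, max_kicks=100, seed=0):
        self.capacity = capacity
        self.max_kicks = max_kicks      # eviction budget before a rebuild (assumed value)
        self.rng = random.Random(seed)
        self.cells = [None] * capacity
        self._new_salts()

    def _new_salts(self):
        # Salted built-in hash stands in for the truly random functions h1, h2.
        self.salts = (self.rng.getrandbits(64), self.rng.getrandbits(64))

    def _positions(self, key):
        return [hash((salt, key)) % self.capacity for salt in self.salts]

    def lookup(self, key):
        # At most two probes.
        return any(self.cells[p] == key for p in self._positions(key))

    def delete(self, key):
        for p in self._positions(key):
            if self.cells[p] == key:
                self.cells[p] = None
                return True
        return False

    def insert(self, key):
        if self.lookup(key):
            return
        pos = self._positions(key)[0]
        for _ in range(self.max_kicks):
            # Place the key and pick up whatever occupied the cell before.
            key, self.cells[pos] = self.cells[pos], key
            if key is None:
                return
            # Send the evicted key to its other choice.
            a, b = self._positions(key)
            pos = b if pos == a else a
        self._rebuild(key)

    def _rebuild(self, pending):
        # Draw new hash functions and reinsert everything.
        keys = [k for k in self.cells if k is not None] + [pending]
        self.cells = [None] * self.capacity
        self._new_salts()
        for k in keys:
            self.insert(k)
```

For n keys the table would be created with a capacity of about $(2 + \epsilon)n$, for example `CuckooHashTable(250)` for 100 keys. `None` marks a free cell, so it cannot be stored as a key in this sketch.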

SLIDE 6


d-ary Cuckoo Hashing

d choices for each element.

Worst case d probes for delete and lookup.

Task: maintain an L-perfect matching in the bipartite graph (L = Elements, R = Cells, E = Choices), e.g., insert by BFS.

[Figure: hash functions h1, h2, h3]
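"Insert by BFS" can be read as a breadth-first search for a shortest augmenting path from the new element to a free cell, followed by shifting every element on that path to its next cell. The sketch below illustrates that reading; the function names `choices`, `bfs_insert` and `lookup`, the `salts` parameter, and the salted built-in `hash` standing in for h1, ..., hd are assumptions of the example, not names from the talk.

```python
from collections import deque


def choices(key, d, m, salts):
    """The d candidate cells of a key (salted built-in hash as a stand-in for h1..hd)."""
    return [hash((salts[i], key)) % m for i in range(d)]


def bfs_insert(cells, key, d, salts):
    """Insert key via a shortest augmenting path to a free cell.

    Returns True on success and False if no free cell is reachable
    (a full implementation would then rebuild with new hash functions).
    """
    m = len(cells)
    parent = {}                     # cell -> cell from which the search reached it
    queue = deque()
    for c in choices(key, d, m, salts):
        if c not in parent:
            parent[c] = None
            queue.append(c)
    while queue:
        c = queue.popleft()
        if cells[c] is None:
            # Shift elements along the path, from the free cell back to the start.
            while parent[c] is not None:
                cells[c] = cells[parent[c]]
                c = parent[c]
            cells[c] = key
            return True
        # The occupant of c could move to any of its other choices.
        for nxt in choices(cells[c], d, m, salts):
            if nxt not in parent:
                parent[nxt] = c
                queue.append(nxt)
    return False


def lookup(cells, key, d, salts):
    """Worst case d probes."""
    return any(cells[c] == key for c in choices(key, d, len(cells), salts))
```

A table is a plain list such as `cells = [None] * 1000`, with `salts` a list of d random integers. Every move places an element in one of its own d choices, so the lookup invariant is preserved after each insertion.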

SLIDE 7


Experiments

[Plot: $\epsilon \cdot$ #probes for insert versus space utilization, with one curve each for d = 2, 3, 4, 5]

SLIDE 8


Tradeoff: Space ↔ Lookup/Deletion Time

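Lookup and Delete: $d = O\left(\log\frac{1}{\epsilon}\right)$ probes

Proof Outline: the bipartite graph (L, R, E) has an L-perfect matching

⇔ (Hall's Theorem) $\nexists M \subseteq L : |\mathrm{neighbors}(M)| < |M|$

. . . Chernoff bounds . . .

true whp if $d \ge 2(1 + \epsilon) \ln\left(\frac{e}{\epsilon}\right)$

[Figure: hash functions h1, h2, h3]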
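To get a feel for the sufficient condition on d, the bound can simply be evaluated for a few values of the space overhead. The helper name `sufficient_d` is made up for this illustration; it computes only what the proof outline needs, so smaller d may still work in practice.

```python
import math


def sufficient_d(eps):
    """Smallest integer d with d >= 2 * (1 + eps) * ln(e / eps)."""
    return math.ceil(2 * (1 + eps) * math.log(math.e / eps))


for eps in (0.5, 0.1, 0.01):
    print(f"eps = {eps}: d >= {sufficient_d(eps)}")
```

For example, a space overhead of 10% (eps = 0.1) gives $2 \cdot 1.1 \cdot \ln(10e) \approx 7.27$, so d = 8 satisfies the condition. The growth in d is only logarithmic in $1/\epsilon$.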

SLIDE 9


Tradeoff: Space ↔ Insertion time

Insert: $\left(\frac{1}{\epsilon}\right)^{O(\log\log(1/\epsilon))}$ (experiments $\longrightarrow O(1/\epsilon)$?)

Expansion property: half the nodes are within distance $O(\log(1/\epsilon))$ from a free node

Shrinking property: number of far-away nodes shrinks geometrically with distance

⇒ short average augmenting path length

SLIDE 10


Average Case Analysis of Bipartite Matching

[Motwani 94]: A bipartite graph (L, R, E) with $|L| = |R| = n$ and $|E| > n \ln n$ random edges has a perfect matching whp.

Time $O(|E| \log |L| / \log\log |L|)$

Here: slight asymmetry, very sparse, linear time
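For readers who want to experiment with the matching view, the sketch below computes a maximum bipartite matching with plain BFS augmenting paths, one search per left node. This is the textbook augmenting-path method, not the algorithm analysed in [Motwani 94], and it does not achieve the running times quoted on this slide. The function name `max_matching` and the adjacency-list input format are assumptions of the example.

```python
from collections import deque


def max_matching(adj, num_right):
    """Maximum bipartite matching by repeated BFS augmenting paths.

    adj[u] lists the right-side neighbours of left node u.
    Returns match_left, where match_left[u] is u's partner or None.
    """
    match_left = [None] * len(adj)
    match_right = [None] * num_right
    for root in range(len(adj)):
        parent = {}                 # right node -> left node that reached it
        queue = deque([root])
        end = None                  # free right node at the end of the path
        while queue and end is None:
            u = queue.popleft()
            for v in adj[u]:
                if v in parent:
                    continue
                parent[v] = u
                w = match_right[v]
                if w is None:
                    end = v
                    break
                queue.append(w)     # continue from v's current partner
        # Flip matched and unmatched edges along the path back to the root.
        v = end
        while v is not None:
            u = parent[v]
            prev = match_left[u]
            match_left[u], match_right[v] = v, u
            v = prev
    return match_left
```

In the hashing setting, `adj[u]` would hold the d cells chosen for element u, and the matching is L-perfect exactly when no entry of the result is `None`.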

SLIDE 11


Filter Hashing

✷ $O\left(\log^2\frac{1}{\epsilon}\right)$ layers
✷ shrinking geometrically
✷ perfect hashing for the overflow table
✷ realistic hash functions
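One way to read these bullet points: the table is a stack of layers of geometrically decreasing size, each with its own hash function; an element goes into the first layer whose cell for it is free, and whatever passes through all layers lands in a small overflow table. The sketch below illustrates only that reading. The class name `FilterHashTable`, the number of layers, the shrink factor of 0.5, and the Python `set` standing in for the perfectly hashed overflow table are all illustrative assumptions rather than the parameters of the analysis.

```python
import random


class FilterHashTable:
    """Layered 'filter' sketch: try layer 0, 1, 2, ... and fall back to overflow."""

    def __init__(self, n, num_layers=6, shrink=0.5, seed=0):
        rng = random.Random(seed)
        self.layers = []
        self.salts = []
        size = float(n)
        for _ in range(num_layers):
            self.layers.append([None] * max(1, int(size)))
            self.salts.append(rng.getrandbits(64))   # one hash function per layer
            size *= shrink                           # layers shrink geometrically
        self.overflow = set()   # stand-in for the perfectly hashed overflow table

    def _cell(self, i, key):
        return hash((self.salts[i], key)) % len(self.layers[i])

    def lookup(self, key):
        # One probe per layer, then the overflow table.
        for i, layer in enumerate(self.layers):
            if layer[self._cell(i, key)] == key:
                return True
        return key in self.overflow

    def insert(self, key):
        if self.lookup(key):
            return
        for i, layer in enumerate(self.layers):
            c = self._cell(i, key)
            if layer[c] is None:
                layer[c] = key      # first layer with a free cell keeps the key
                return
        self.overflow.add(key)
```

Unlike d-ary cuckoo hashing, an insertion here never moves an element that is already stored, and each layer only has to handle the elements rejected by the layers before it.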

SLIDE 12


Discussion

[Figure: hash functions h1, h2, h3]

d-ary Cuckoo: fast, practical, very space efficient

Open Questions

✷ “real” hash functions
✷ Tighten insertion time
✷ average case lookup time
✷ average case max cardinality bipartite matching for sparse symmetric graphs