Strong Randomness Properties of (Hyper-)Graphs Generated by Simple Hash Functions. Martin Aumüller, Technische Universität Ilmenau, Germany. AofA’15, Strobl, June 8, 2015. Joint work with Martin Dietzfelbinger and Philipp Woelfel.

SLIDE 1

Strong Randomness Properties of (Hyper-)Graphs Generated by Simple Hash Functions

Martin Aumüller

Technische Universität Ilmenau, Germany

AofA’15, Strobl, June 8, 2015

Joint work with Martin Dietzfelbinger and Philipp Woelfel.

M. Aumüller, Graphs Generated by Simple Hash Functions, 1/17

SLIDE 2

Example: Cuckoo Hashing (Pagh/Rodler, 2001/2004)

A hashing-based implementation of the dictionary data type.

Setting:
  • set S ⊆ U of n keys
  • two tables T1[0..m − 1] and T2[0..m − 1], m ≥ (1 + ε)n
  • two (hash) functions h1, h2 with hi : U → [m]

Rules:
  • each table cell can hold exactly one key
  • a key x must be stored either in T1[h1(x)] or in T2[h2(x)] (fast lookups and deletions!)

Definition

If S can be stored according to these rules, we call (h1, h2) suitable for S.
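The setting and rules above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not on the slides: the names `make_tables`, `lookup`, `insert` and the eviction limit `max_kicks` are hypothetical, and a failed insertion only means the eviction loop gave up (it also drops the key it was holding at that moment), not that unsuitability has been proven.

```python
def make_tables(m):
    """Two empty tables T1, T2 with m cells each (None marks an empty cell)."""
    return [None] * m, [None] * m


def lookup(x, T1, T2, h1, h2):
    """A key may only live in T1[h1(x)] or T2[h2(x)], so two probes suffice."""
    return T1[h1(x)] == x or T2[h2(x)] == x


def insert(x, T1, T2, h1, h2, max_kicks=100):
    """Place x, evicting the current occupant to its other table if needed.

    Returns False if no placement was found within max_kicks evictions
    (hypothetical cut-off chosen for this sketch).
    """
    tables, hashes = (T1, T2), (h1, h2)
    side = 0
    for _ in range(max_kicks):
        pos = hashes[side](x)
        # Swap x with whatever currently occupies the cell.
        x, tables[side][pos] = tables[side][pos], x
        if x is None:
            return True
        side = 1 - side  # the evicted key must move to its other table
    return False
```

With the toy functions `h1 = lambda x: x % 5` and `h2 = lambda x: (x // 5) % 5`, the keys 0, 5, 10 can all be placed, while 0, 25, 50 compete for the same two cells and the third insertion fails.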

SLIDE 4

Improving Cuckoo Hashing: Stash

Original analysis: (h1, h2) unsuitable with probability O(1/n). In fact: Θ(1/n) (Schellbach ’09, Drmota/Kutzelnigg ’12).

Kirsch/Mitzenmacher/Wieder ’08: Θ(1/n) is too large. Proposal: put up to s = O(1) keys into additional storage (a stash).

Theorem (K/M/W ’08)

Let S ⊆ U with |S| = n. If (h1, h2) are fully random, then Pr((h1, h2) unsuitable for S with stash size s) = O(1/n^{s+1}). Again: Θ(1/n^{s+1}) (Kutzelnigg ’10).

SLIDE 13

Analysis of Cuckoo Hashing with a Stash

What is a criterion for (h1, h2) being unsuitable for stash size s?

Tool: cuckoo graph G(S, h1, h2) (Devroye/Morin ’03); vertices are table cells, edges are keys.

Excess (Janson et al. ’93): #edges − #vertices (in the pictured example: 3)

3 more keys than table cells ⇒ at least 3 keys must be put into the stash

Minimal “bad subgraph”: a MOS_s. (Example: s = 2.)

SLIDE 14

Analysis of Cuckoo Hashing with a Stash

What is a criterion for (h1, h2) being unsuitable for stash size s?

Tool: cuckoo graph G(S, h1, h2) (Devroye/Morin ’03)

Theorem (K/M/W ’08)

Let (V′, E′) consist of all connected components of G(S, h1, h2) having more than one cycle. Then the stash size needed is |E′| − |V′|.
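The theorem can be turned into a small computation. The following is a minimal sketch, assuming the cuckoo graph has one vertex per touched table cell and one edge per key; the function name `stash_size` and the union-find bookkeeping are illustrative choices, not taken from the talk.

```python
from collections import defaultdict


def stash_size(keys, h1, h2):
    """Sum of (#edges - #vertices) over components with more than one cycle."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Key x is the edge between cell h1(x) of table 1 and cell h2(x) of table 2.
    edges = [((1, h1(x)), (2, h2(x))) for x in keys]
    for u, v in edges:
        parent[find(u)] = find(v)

    n_edges = defaultdict(int)
    n_vertices = defaultdict(int)
    for u, _ in edges:
        n_edges[find(u)] += 1
    for v in list(parent):
        n_vertices[find(v)] += 1

    # A connected component with e edges and v vertices has e - v + 1
    # independent cycles, so "more than one cycle" means e - v >= 1.
    return sum(e - n_vertices[r] for r, e in n_edges.items() if e > n_vertices[r])
```

A tree or a component with exactly one cycle contributes nothing, which matches the fact that such components can be stored without a stash.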

SLIDE 21

The Quest

  • Analysis well understood when hash functions are fully random.
  • Replace fully random hash functions by an explicit, efficient construction of hash functions.
  • “Simple hash functions that work in as many applications as possible”
  • Other recent approaches, e.g., Thorup/Pătrașcu ’11, Reingold/Rothblum/Wieder ’14
  • Focus on hashing-based algorithms and data structures that allow good enough bounds via the first-moment method (cuckoo hashing [with a stash], generalized cuckoo hashing, load balancing, ...)
  • Generic approach?

SLIDE 22

Key Ingredient: Linear Functions

h(x) = ((a · x + b) mod p) mod m, where p ≥ |U| is a prime, and a and b are chosen uniformly at random from {0, . . . , p − 1}.

→ very simple structure!

(Remark: This function is 2-wise independent, i.e., for any pair x, y ∈ U with x ≠ y, h(x) and h(y) are fully random.)
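Drawing one such linear function takes only a few lines. In this sketch the name `make_linear_hash` and the optional `rng` argument are assumptions made for illustration; the caller is expected to supply a prime p that is at least the universe size.

```python
import random


def make_linear_hash(p, m, rng=random):
    """Draw h(x) = ((a*x + b) mod p) mod m with a, b uniform in {0, ..., p-1}.

    p is assumed to be a prime with p >= |U|; this is not checked here.
    """
    a = rng.randrange(p)
    b = rng.randrange(p)
    return lambda x: ((a * x + b) % p) % m
```

For example, `h = make_linear_hash(101, 10)` yields a function from {0, ..., 100} to {0, ..., 9}; passing a seeded `random.Random` instance as `rng` makes the draw reproducible.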

SLIDE 23

The Hash Class (Version for this Talk)

For given c, n ≥ 1, we combine linear functions with lookups in tables of size √n filled with random values:

$$h_i(x) = f_i(x) \oplus \bigoplus_{j=1}^{c} z^{(i)}_j[g_j(x)], \qquad i = 1, 2.$$

Class of all these pairs (h1, h2) of hash functions: Z. (Extension of the hash functions from Dietzfelbinger/Woelfel ’03.)
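One way to read the definition of the class is sketched below. Several details are assumptions, since the slides do not fix them: f_i and g_j are drawn as linear functions, the g_j are shared between h1 and h2 while each h_i gets its own tables, the table entries are uniform in {0, ..., m − 1}, and m is a power of two so that XOR stays in range. The names `linear` and `draw_pair` are hypothetical.

```python
import math
import random


def linear(p, m, rng):
    """A linear function x -> ((a*x + b) mod p) mod m with random a, b."""
    a, b = rng.randrange(p), rng.randrange(p)
    return lambda x: ((a * x + b) % p) % m


def draw_pair(n, c, m, p, rng):
    """Draw (h1, h2) and also return the g-functions for inspection."""
    assert m & (m - 1) == 0, "sketch assumes m is a power of two"
    r = math.isqrt(n - 1) + 1  # ceil(sqrt(n)) for n >= 1: the table size
    gs = [linear(p, r, rng) for _ in range(c)]

    def make_h():
        f = linear(p, m, rng)
        # c lookup tables of size r, filled with random values from [m].
        z = [[rng.randrange(m) for _ in range(r)] for _ in range(c)]

        def h(x):
            v = f(x)
            for j in range(c):
                v ^= z[j][gs[j](x)]
            return v

        return h

    return make_h(), make_h(), gs
```

Evaluating h_i costs one linear function plus c table lookups, and the random tables need only about 2c√n words in total.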

SLIDE 24

Example: Cuckoo Hashing with a Stash

Main Task

For given S and stash size s, calculate Pr((h1, h2) unsuitable for S with stash size s).

Minimal bad subgraph: MOS_s. (Example: s = 2.)

SLIDE 25

Thus, we have

$$\begin{aligned}
&\Pr_{(h_1,h_2)\in Z}\big((h_1,h_2) \text{ unsuitable for } S \text{ with stash size } s\big) \\
&\quad= \Pr_{(h_1,h_2)\in Z}\big(\exists T \subseteq S : G(T,h_1,h_2) \text{ forms a } \mathrm{MOS}_s\big) \\
&\quad\le \sum_{T\subseteq S} \Pr_{(h_1,h_2)\in Z}\big(G(T,h_1,h_2) \text{ forms a } \mathrm{MOS}_s\big).
\end{aligned}$$

If (h1, h2) are fully random, we provide a direct counting argument that this is O(1/n^{s+1}), giving an alternative proof to the original analysis by Kirsch, Mitzenmacher and Wieder (who used machinery like Markov chain coupling).

SLIDE 26

Behavior of the Hash Class on Fixed T ⊆ S

Recall:

$$h_i(x) = f_i(x) \oplus \bigoplus_{j=1}^{c} z^{(i)}_j[g_j(x)], \qquad i = 1, 2.$$

Central Observation

Let T ⊆ S. If there is a gj such that at most one pair of keys in T collides under gj (i.e., gj(x) = gj(y)), then h1, h2 are fully random on T.

  • If this is the case: (h1, h2) is T-good.
  • Otherwise (each gj has more than one colliding pair of keys): (h1, h2) is T-bad.
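The T-good / T-bad distinction depends only on the g-functions, which makes it easy to check directly. A minimal sketch, assuming T is a collection of distinct keys and `gs` is the list of g-functions; the helper names `colliding_pairs` and `is_T_good` are illustrative.

```python
from collections import Counter


def colliding_pairs(T, g):
    """Number of unordered pairs {x, y} of distinct keys in T with g(x) == g(y)."""
    buckets = Counter(g(x) for x in T)
    return sum(k * (k - 1) // 2 for k in buckets.values())


def is_T_good(T, gs):
    """T-good: some g_j has at most one colliding pair of keys in T."""
    return any(colliding_pairs(T, g) <= 1 for g in gs)
```

For T = {0, ..., 5}, the function x mod 2 produces six colliding pairs, while x mod 5 produces only the pair {0, 5}; a pair of hash functions using both would therefore be T-good.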

SLIDE 27

Collecting “Harmful” Hash Functions

We split our set of hash functions into “harmful” and “harmless” ones.

(h1, h2) is harmful if there exists T ⊆ S such that G(T, h1, h2) forms a MOS_s and (h1, h2) is T-bad.

B^{MOS_s} := the set of all harmful pairs (h1, h2). (An event in our probability space!)

SLIDE 29

Splitting the Calculation

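We calculate:

$$\Pr\big(N^{\mathrm{MOS}_s}_S > 0\big) \le \Pr\big(\{N^{\mathrm{MOS}_s}_S > 0\} \cap \neg B^{\mathrm{MOS}_s}\big) + \Pr\big(B^{\mathrm{MOS}_s}\big).$$

For the first summand, we have

$$\Pr\big(\{N^{\mathrm{MOS}_s}_S > 0\} \cap \neg B^{\mathrm{MOS}_s}\big) \le \mathrm{E}^*\big(N^{\mathrm{MOS}_s}_S\big),$$

which is O(1/n^{s+1}).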

SLIDE 31

Splitting the Calculation

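We calculate:

$$\Pr\big(N^{\mathrm{MOS}_s}_S > 0\big) \le O(1/n^{s+1}) + \Pr\big(B^{\mathrm{MOS}_s}\big).$$

For the first summand, we used

$$\Pr\big(\{N^{\mathrm{MOS}_s}_S > 0\} \cap \neg B^{\mathrm{MOS}_s}\big) \le \mathrm{E}^*\big(N^{\mathrm{MOS}_s}_S\big),$$

which is O(1/n^{s+1}).

The hard part: calculating/bounding

$$\Pr\big(B^{\mathrm{MOS}_s}\big) = \Pr\big(\exists T \subseteq S : G(T,h_1,h_2) \text{ forms a } \mathrm{MOS}_s \,\cap\, (h_1,h_2) \text{ is } T\text{-bad}\big).$$

Wish: use full randomness nonetheless.

Idea: find a suitable event that contains B^{MOS_s}.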

SLIDE 33

Peeling of Bad Graphs (Simplified)

Assume “∃T ⊆ S : G(T, h1, h2) forms a MOS_s ∩ (h1, h2) is T-bad”.

#collisions: g1: 5, g2: 7, g3: 4

Recall: $h_i(x) = f_i(x) \oplus \bigoplus_{j=1}^{c} z^{(i)}_j[g_j(x)]$

T-bad ⇔ each gj has more than one pair of colliding keys

SLIDES 34-46

Peeling of Bad Graphs (Simplified)

Assume “∃T ⊆ S : G(T, h1, h2) forms a MOS_s ∩ (h1, h2) is T-bad”.

[Figure sequence: the graph is peeled over slides 34-46; the numeric labels from the figures are omitted.]

#collisions shown while peeling:

slides   g1  g2  g3
34        5   7   4
35-37     4   5   4
38        4   4   3
39-40     3   4   3
41-44     2   3   3
45-46     1   3   3

T-good ⇔ there exists a gj with at most one pair of colliding keys

SLIDE 47

Peeling of Bad Graphs (Simplified)

If “∃T ⊆ S : G(T, h1, h2) forms a MOS_s ∩ (h1, h2) is T-bad”,

#collisions: g1: 1, g2: 3, g3: 3

then “∃T′ ⊆ S : G(T′, h1, h2) forms a “peeled graph” ∩ (h1, h2) is T′-good”.

[Figure: the peeled graph; numeric labels omitted.]
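The bookkeeping behind this implication can be illustrated with a toy loop that removes keys until some gj has at most one colliding pair. This is only a sketch of the collision counting: in the talk the order of removal is dictated by the graph structure, which is not modelled here, so the `removal_order` argument and the function name `peel` are assumptions made for illustration.

```python
from collections import Counter


def colliding_pairs(T, g):
    """Number of unordered pairs of distinct keys in T that collide under g."""
    buckets = Counter(g(x) for x in T)
    return sum(k * (k - 1) // 2 for k in buckets.values())


def peel(T, gs, removal_order):
    """Remove keys in the given order until some g_j has <= 1 colliding pair.

    Returns the remaining keys T' and the collision counts after each step.
    """
    remaining = list(T)
    history = [[colliding_pairs(remaining, g) for g in gs]]
    for x in removal_order:
        if min(history[-1]) <= 1:
            break  # T'-good reached: stop peeling
        remaining.remove(x)
        history.append([colliding_pairs(remaining, g) for g in gs])
    return remaining, history
```

The `history` list plays the role of the #collisions table on the slides: every count can only decrease as keys are removed, and the loop stops as soon as one count drops to at most one.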

SLIDE 48

Result of Peeling

Pr(∃T ⊆ S : G(T, h1, h2) forms a MOS_s ∩ (h1, h2) T-bad)
≤ Pr(∃T′ ⊆ S : G(T′, h1, h2) is peeling result ∩ (h1, h2) T′-good)

  • can again use the first-moment approach
  • resulting graphs are sparser → they are more likely to occur
  • use: when the process stops, each gj, 1 ≤ j ≤ c, has a colliding pair of keys
  • probability boost of ≈ (1/√n)^c
  • probability of B^{MOS_s} is O(n/(√n)^c), which is O(1/n^{s+1}) for c = Θ(s)

Some applications need an additional “reduction step”. (Preserve collisions, make graphs smaller.)
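The choice c = Θ(s) can be made concrete with a short calculation. The following takes the stated bound O(n/(√n)^c) at face value and ignores hidden constants; the explicit threshold is derived here and is not stated on the slide.

```latex
\[
\frac{n}{(\sqrt{n})^{c}} = n^{1 - c/2},
\qquad
n^{1 - c/2} \le n^{-(s+1)}
\iff 1 - \frac{c}{2} \le -(s+1)
\iff c \ge 2(s+2).
\]
```

So, for n ≥ 2, any c ≥ 2s + 4 is enough under this reading, which is consistent with c = Θ(s).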

SLIDE 50

Result

Graph property of interest: A, via first-moment approach

$$\mathrm{E}^*(\#\text{subgraphs with property } A) = O(n^{-\alpha}).$$

Assume there exists a peelable graph property B ⊇ A with

$$\sum_{t=2}^{n} t^{O(1)}\, \mathrm{E}^*(\#\text{subgraphs with property } B \text{ with } t \text{ edges}) = O(n^{\beta}).$$

Trick: B can be quite general, e.g., “leafless”.

Using c ≥ 2(α + β) g-functions and tables gives

$$\Pr_{(h_1,h_2)\in Z}(\text{graph contains a subgraph with property } A) = O(n^{-\alpha}).$$

SLIDE 54

Examples

Graphs:
  • Cuckoo hashing (with a stash)
  • Applications which need that the largest component is O(log n) w.h.p.
  • Simulation of a uniform hash function (Pagh/Pagh ’03)
  • Constructing a perfect hash function (Botelho/Pagh/Ziviani ’13)

Hypergraphs:
  • Parallel/sequential load balancing: basically match bounds from the fully random case (Schickinger/Steger ’00).
  • Generalized cuckoo hashing (≥ 3 hash functions, ℓ ≥ 2 keys per cell): admits a first-moment approach, but we could not find a suitable peelable graph property in the hypergraph setting to prove table loads → 1.

SLIDE 56

Conclusion

We have seen:
  • a class of hash functions that behaves “well” in different applications
  • in first-moment type analyses: can use full randomness, no properties of the hash class exposed

Open:
  • better bounds for some applications?
  • bounds beyond first moment?

Thank you!
