CS 473: Algorithms, Spring 2018. Ruta Mehta, University of Illinois, Urbana-Champaign. Lecture 11: Fingerprinting, Feb 20, 2018. Most slides are courtesy Prof. Chekuri.


SLIDE 1

CS 473: Algorithms

Ruta Mehta

University of Illinois, Urbana-Champaign

Spring 2018

Ruta (UIUC) CS473 1 Spring 2018 1 / 29

SLIDE 2

CS 473: Algorithms, Spring 2018

Fingerprinting

Lecture 11

Feb 20, 2018

Most slides are courtesy Prof. Chekuri

SLIDE 5

Fingerprinting

Source: Wikipedia

Fingerprinting is the process of mapping a large data item to a much shorter bit string, called its fingerprint.
A fingerprint uniquely identifies the data “for all practical purposes”.
Typically used to avoid comparing and transmitting bulky data. E.g., a web browser can store and fetch file fingerprints to check whether a file has changed.
As you may have guessed, fingerprint functions are hash functions.

SLIDE 7

Bloom Filters

Hashing:

1. To insert x in the dictionary, store x in the table at location h(x).
2. To look up y in the dictionary, check the contents of location h(y).

Bloom Filter: trade off space for false positives

1. What if the elements x are unwieldy objects such as long strings, images, etc., with non-uniform sizes?
2. To insert x in the dictionary, set the bit at location h(x) to 1 (initially all bits are set to 0).
3. To look up y: if the bit at location h(y) is 1 say yes, else say no.

SLIDE 10

Bloom Filters

Bloom Filter: trade off space for false positives

Reducing false positives:

1. Pick k hash functions h_1, h_2, ..., h_k independently.
2. Insert x: for 1 ≤ i ≤ k, set the bit at location h_i(x) in table i to 1.
3. Lookup y: compute h_i(y) for 1 ≤ i ≤ k and say yes only if each bit in the corresponding location is 1; otherwise say no.

If the probability of a false positive for one hash function is α < 1, then with k independent hash functions it is α^k.
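The insert and lookup steps above can be sketched in Python as follows. This is only an illustration under assumptions that are not in the slides: the class name `BloomFilter`, the use of salted SHA-256 as a stand-in for k independent hash functions, and one byte per bit for simplicity.

```python
import hashlib

class BloomFilter:
    """k tables of bits; table i is indexed by a hypothetical hash function h_i."""

    def __init__(self, bits_per_table: int, k: int):
        self.m = bits_per_table
        self.k = k
        # One byte per bit keeps the sketch simple; a real filter would pack bits.
        self.tables = [bytearray(bits_per_table) for _ in range(k)]

    def _location(self, i: int, item: bytes) -> int:
        # Assumption: salting SHA-256 with i approximates independent h_i.
        digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: bytes) -> None:
        for i in range(self.k):
            self.tables[i][self._location(i, item)] = 1

    def lookup(self, item: bytes) -> bool:
        # "yes" only if every one of the k bits is set; false positives are possible.
        return all(self.tables[i][self._location(i, item)] for i in range(self.k))

bloom = BloomFilter(bits_per_table=1024, k=3)
for word in (b"alice", b"bob"):
    bloom.insert(word)
```

An inserted item is always reported as present; an item that was never inserted is reported as present only if all k of its bits happen to be set.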

SLIDE 13

Outline

Use of hash functions for designing fast algorithms

Problem

Given a text T of length m and pattern P of length n, m ≫ n, find all occurrences of P in T.

Karp-Rabin Randomized Algorithm

It involves:
  • Sampling a prime
  • String equality via mod p arithmetic
  • Rabin’s fingerprinting scheme – rolling hash
  • Karp-Rabin pattern matching algorithm: O(m + n) time.

SLIDE 14

Part I Sampling a Prime

SLIDE 17

Sampling a prime

Problem

Given an integer x > 0, sample a prime uniformly at random from all the primes between 1 and x.

Procedure

1. Sample a number p uniformly at random from {1, ..., x}.
2. If p is a prime, then output p. Else go to Step (1).

Checking if p is prime

Agrawal-Kayal-Saxena primality test: deterministic but slow
Miller-Rabin randomized primality test: fast but randomized
  • Outputs ‘prime’ when the number is not prime with very low probability.

SLIDE 28

Sampling a Prime: Analysis

Is the returned prime sampled uniformly at random?

π(x) : number of primes in {1, ..., x}.

Lemma

For a fixed prime p∗ ≤ x, Pr[algorithm outputs p∗] = 1/π(x).

Proof.

Event A : a prime is picked in a round. Pr[A] = π(x)/x.
Event B : the number (prime) p∗ is picked. Pr[B] = 1/x.
Pr[A ∩ B] = Pr[B] = 1/x. Why? Because B ⊂ A.
The algorithm outputs the number picked in the first round in which A occurs, so
$$\Pr[B \mid A] = \frac{\Pr[A \cap B]}{\Pr[A]} = \frac{\Pr[B]}{\Pr[A]} = \frac{1/x}{\pi(x)/x} = \frac{1}{\pi(x)}.$$

SLIDE 29

Sampling a prime: Expected number of samples

Procedure

1. Sample a number p uniformly at random from {1, ..., x}.
2. If p is a prime, then output p. Else go to Step (1).

Running time in expectation

Q: How many samples in expectation before termination? A: x/π(x). Exercise.
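The sampling procedure can be sketched in Python as below. The function names `is_probable_prime` and `sample_prime`, the choice of 40 Miller-Rabin rounds, and the small-prime pre-check are assumptions made for this illustration, not part of the slides.

```python
import random

def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin: never rejects a prime; accepts a composite with tiny probability."""
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13):
        if n % q == 0:
            return n == q
    # Write n - 1 = d * 2^r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        y = pow(a, d, n)
        if y in (1, n - 1):
            continue
        for _ in range(r - 1):
            y = pow(y, 2, n)
            if y == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True

def sample_prime(x: int):
    """Rejection sampling from {1, ..., x}; returns (prime, samples used). Needs x >= 2."""
    samples = 0
    while True:
        samples += 1
        p = random.randint(1, x)
        if is_probable_prime(p):
            return p, samples

prime, samples_used = sample_prime(10**6)
```

The second return value makes it easy to compare the observed number of samples against the x/π(x) expectation stated above.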

SLIDE 33

How many primes between 0 and x

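π(x) : number of primes between 0 and x.

  • J. Hadamard and C. J. de la Vallée-Poussin (1896)

Prime Number Theorem: $\lim_{x \to \infty} \frac{\pi(x)}{x / \ln x} = 1$

Chebyshev (from 1848)

$$\pi(x) \ge \frac{7}{8} \cdot \frac{x}{\ln x} = (1.262\ldots)\, \frac{x}{\lg x} > \frac{x}{\lg x}$$

If y is drawn from {1, ..., x} u.a.r., then y is a prime w.p. $\frac{\pi(x)}{x} > \frac{1}{\lg x}$.

If we want k ≥ 4 primes then x ≥ 2k lg k suffices:
$$\pi(x) \ge \pi(2k \lg k) \ge \frac{2k \lg k}{\lg 2 + \lg k + \lg \lg k} \ge \frac{k\,(2 \lg k)}{2 \lg k} = k$$
The last inequality uses lg 2 + lg lg k ≤ lg k, which holds for k ≥ 4.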
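As a sanity check, the two bounds on this slide can be verified numerically for small values. The sketch below uses a sieve of Eratosthenes; the helper name `prime_counts` and the tested ranges (x up to 10,000, k up to 500) are arbitrary choices for illustration. The check starts at x = 5 because the strict inequality π(x) > x/lg x fails at x = 2 and x = 4.

```python
import math

def prime_counts(limit: int) -> list:
    """pi[x] = number of primes in {1, ..., x}; requires limit >= 1."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for q in range(2, math.isqrt(limit) + 1):
        if is_prime[q]:
            for multiple in range(q * q, limit + 1, q):
                is_prime[multiple] = False
    pi, count = [], 0
    for x in range(limit + 1):
        count += is_prime[x]
        pi.append(count)
    return pi

pi = prime_counts(10_000)
# pi(x) > x / lg x on the tested range.
chebyshev_holds = all(pi[x] > x / math.log2(x) for x in range(5, 10_001))
# At least k primes in {1, ..., ceil(2k lg k)} for 4 <= k <= 500.
enough_primes = all(pi[math.ceil(2 * k * math.log2(k))] >= k for k in range(4, 501))
```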

SLIDE 34

Part II String Equality

SLIDE 39

String Equality

Problem

Alice, the captain of a Mars lander, receives an N-bit string x, and Bob, back at mission control, receives a string y. They know nothing about each other's strings, but want to check if x = y.

Alice sends Bob x, and Bob confirms if x = y. But sending N bits is costly! Can they communicate less and still check equality?

Possibilities:

If we want 100% certainty, then NO. If we are OK with 99.99% certainty, then O(lg N) bits may suffice!

If x = y, then Pr[Bob says equal] = 1.
If x ≠ y, then Pr[Bob says un-equal] = 0.9999.

HOW?

SLIDE 44

String Equality: Randomized Algorithm

x, y : N-bit strings, viewed as integers in {0, ..., 2^N − 1}.

(Recall) If M = ⌈2(5N) lg 5N⌉, then there are at least 5N primes in {1, ..., M}.

Procedure

Define h_p(x) = x mod p.

1. Alice picks a random prime p from {1, ..., M}.
2. She sends Bob the prime p, and also h_p(x) = x mod p.
3. Bob checks if h_p(y) = h_p(x). If so, he says equal, else un-equal.

Lemma

If x = y then Bob always says equal.

Lemma

If x ≠ y then Pr[Bob says equal] ≤ 1/5 (error probability).

SLIDE 46

String Equality: Randomized Algorithm

x, y : N-bit strings, viewed as integers in {0, ..., 2^N − 1}.

(Recall) If M = ⌈2(sN) lg sN⌉, then there are at least sN primes in {1, ..., M}.

Procedure

Define h_p(x) = x mod p.

1. Alice picks a random prime p from {1, ..., M}.
2. She sends Bob the prime p, and also h_p(x) = x mod p.
3. Bob checks if h_p(y) = h_p(x). If so, he says equal, else un-equal.

Lemma

If x ≠ y then Pr[Bob says equal] ≤ 1/s (error probability).
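A small simulation of the procedure above, as a sketch only. The function names (`alice_message`, `bob_says_equal`), the trial-division primality check (adequate because M is small for the toy parameters used here), and the chosen values N = 64 and s = 5 are assumptions for illustration.

```python
import math
import random

def is_prime(n: int) -> bool:
    """Trial division; fine here since M is only about 2(sN) lg(sN)."""
    if n < 2:
        return False
    return all(n % q for q in range(2, math.isqrt(n) + 1))

def random_prime(M: int) -> int:
    """Rejection sampling of a prime from {1, ..., M}."""
    while True:
        p = random.randint(1, M)
        if is_prime(p):
            return p

def alice_message(x: int, N: int, s: int):
    """Alice's side: pick a random prime p <= M and send (p, x mod p)."""
    M = math.ceil(2 * s * N * math.log2(s * N))
    p = random_prime(M)
    return p, x % p

def bob_says_equal(y: int, message) -> bool:
    """Bob's side: compare his own fingerprint with the one received."""
    p, fingerprint = message
    return y % p == fingerprint

N, s = 64, 5
x = random.getrandbits(N)
always_equal = all(bob_says_equal(x, alice_message(x, N, s)) for _ in range(200))
# x = 6, y = 21 differ; Bob errs only when p divides 15.
trials = 2000
errors = sum(bob_says_equal(21, alice_message(6, N, s)) for _ in range(trials))
```

With these parameters `always_equal` is true on every run, while `errors / trials` should stay well below 1/s, since only the primes 3 and 5 cause a wrong answer.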

SLIDE 48

Question.

Let x = 6 = 2 ∗ 3. If we draw a p u.a.r. from {2, 3, 5, 7}, then what is the probability that x mod p = 0?
(A) 0. (B) 1. (C) 1/4. (D) 1/2. (E) none of the above.

Now, let y = 21. What is the probability that (y − x) mod p = 15 mod p = 0?
(A) 0. (B) 1. (C) 1/4. (D) 1/2.

SLIDE 55

String Equality: Randomized Algorithm

Error probability

x, y N-bit strings, M = ⌈2(sN) lg sN⌉, and h_p(x) = x mod p.

Lemma

If x ≠ y then Pr[Bob says equal] = Pr[h_p(x) = h_p(y)] ≤ 1/s.

Proof.

Given x ≠ y, h_p(x) = h_p(y) ⇒ x mod p = y mod p.
Let D = |x − y|. Then D mod p = 0, and 0 < D ≤ 2^N.
Let D = p_1 ⋯ p_k be the prime factorization of D. All p_i ≥ 2 ⇒ D ≥ 2^k.
2^k ≤ D ≤ 2^N ⇒ k ≤ N, so D has at most N distinct prime divisors.
The probability that a random prime p from {1, ..., M} is a divisor of D is at most
$$\frac{k}{\pi(M)} \le \frac{N}{\pi(M)} \le \frac{N}{M / \lg M} = \frac{N}{2(sN) \lg sN / \lg M} \le \frac{1}{s}$$

SLIDE 62

Error Probability and Communication

Low Error Probability

1. Choose large enough s. Error prob: 1/s.
2. Alice repeats the process R times, and Bob says equal only if he gets equal all R times. Error probability: 1/s^R. For s = 5, R = 10: 1/5^10 ≤ 0.000001.

M = ⌈2(sN) lg sN⌉

Amount of Communication

Each round sends 2 integers ≤ M. # bits: 2 lg M ≤ 4(lg s + lg N).

If x and y are copies of Wikipedia, about 25 billion characters. If 8 bits per character, then N ≈ 2^38 bits.
The second approach will send 10(2 lg (10N lg 5N)) ≤ 1280 bits.
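The Wikipedia estimate can be reproduced with a few lines of arithmetic. The helper name `communication_bits` is hypothetical, and rounding lg M up to a whole number of bits is an assumption of this sketch.

```python
import math

def communication_bits(N: int, s: int, R: int) -> int:
    """Bits sent over R rounds, each carrying two integers of at most ceil(lg M) bits."""
    M = math.ceil(2 * s * N * math.log2(s * N))
    return R * 2 * math.ceil(math.log2(M))

N = 2 ** 38                      # roughly 25 billion 8-bit characters
bits = communication_bits(N, s=5, R=10)
error_bound = 1 / 5 ** 10        # 1/s^R for s = 5, R = 10
```

Under these assumptions `bits` comes out below the 1280-bit bound on the slide, compared with the 2^38 bits needed to send x itself.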

SLIDE 63

Part III Karp-Rabin Pattern Matching Algorithm

SLIDE 68

Pattern Matching

Given a string T of length m and pattern P of length n, s.t. m ≫ n, find all occurrences of P in T.

Example

T = abracadabra, P = ab. Solution S = {1, 8}.

For j > i, let T_{i...j} = T[i]T[i + 1] ... T[j].

Brute force algorithm

S = ∅.
For each i = 1, ..., m − n + 1
    If T_{i...i+n−1} = P then S = S ∪ {i}.

O(mn) run-time.
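The brute force algorithm translates almost directly into Python. In this sketch the function name `brute_force_matches` is an assumption, and positions are reported 1-indexed to match the slide.

```python
def brute_force_matches(T: str, P: str) -> list:
    """Return all 1-indexed positions i with T[i..i+n-1] == P, in O(mn) time."""
    m, n = len(T), len(P)
    # Python slices are 0-indexed, hence the i - 1 offset.
    return [i for i in range(1, m - n + 2) if T[i - 1:i - 1 + n] == P]

matches = brute_force_matches("abracadabra", "ab")  # [1, 8]
```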

SLIDE 71

Using Hash Function

Pick a prime p u.a.r. from {1, ..., M}. h_p(x) = x mod p.

Brute force algorithm using hash function

S = ∅.
For each i = 1, ..., m − n + 1
    If h_p(T_{i...i+n−1}) = h_p(P) then S = S ∪ {i}.

If x is of length n, then computing h_p(x) takes O(n) time. Overall O(mn) running time.
Can we compute h_p(T_{i+1...i+n}) quickly using h_p(T_{i...i+n−1})?

SLIDE 73

mod p math

Let a and b be (non-negative) integers.
(a + b) mod p = ((a mod p) + (b mod p)) mod p
(a · b) mod p = ((a mod p) · (b mod p)) mod p

SLIDE 77

Rolling Hash

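x = T_{i...i+n−1} and x′ = T_{i+1...i+n}.

Example

x = 1011001, and x′ = 0110010 (or x′ = 0110011).

Let x_hb be the highest-order bit of x and x′_lb the lowest-order bit of x′. Then
$$x' = 2\,(x - x_{hb}\, 2^{n-1}) + x'_{lb} = 2x - x_{hb}\, 2^{n} + x'_{lb}$$
$$h_p(x') = x' \bmod p = \bigl(2\,(x \bmod p) - x_{hb}\,(2^{n} \bmod p) + x'_{lb}\bigr) \bmod p = \bigl(2\,h_p(x) - x_{hb}\, h_p(2^{n}) + x'_{lb}\bigr) \bmod p$$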
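The rolling update can be checked on the example from this slide. In the sketch below, the function name `roll` and the prime p = 13 are arbitrary choices for illustration.

```python
def roll(h_x: int, high_bit: int, new_low_bit: int, h_pow: int, p: int) -> int:
    """h_p(x') from h_p(x): (2*h_p(x) - x_hb*h_p(2^n) + x'_lb) mod p."""
    # Python's % always returns a value in [0, p), even if the left side is negative.
    return (2 * h_x - high_bit * h_pow + new_low_bit) % p

p, n = 13, 7
x = int("1011001", 2)
h_pow = pow(2, n, p)  # h_p(2^n), computed once
# Slide the window by one bit; the new low-order bit is either 0 or 1.
rolled = [roll(x % p, 1, low, h_pow, p) for low in (0, 1)]
direct = [int("0110010", 2) % p, int("0110011", 2) % p]
```

Here `rolled` and `direct` agree, so the O(1) update gives the same fingerprint as hashing the shifted window from scratch.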

SLIDE 82

Karp-Rabin Algorithm

p : a random prime from {1, ..., M}.

1. Set S = ∅. Compute h_p(T_{1...n}), h_p(2^n), and h_p(P).
2. For each i = 1, ..., m − n + 1
   1. If h_p(T_{i...i+n−1}) = h_p(P), then S = S ∪ {i}.
   2. If i ≤ m − n, compute h_p(T_{i+1...i+n}) using h_p(T_{i...i+n−1}) and h_p(2^n) by applying rolling hash.

Running Time

In Step 1, computing h_p(x) for an n-bit x takes O(n) time.
Assuming O(lg M)-bit arithmetic can be done in O(1) time, and since h_p(·) produces lg M-bit numbers, both steps inside the for loop can be done in O(1) time.
Overall O(m + n) time. Can’t do better.
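A minimal sketch of the algorithm for bit strings follows. It takes the prime p as a parameter rather than sampling it, so that runs are reproducible; the function name `karp_rabin` and the string-of-'0'/'1' input format are assumptions of this illustration. As in the slides, matching fingerprints are not re-verified, so the output may contain false positives.

```python
def karp_rabin(T: str, P: str, p: int) -> list:
    """Return 1-indexed positions i with h_p(T_{i...i+n-1}) = h_p(P); T, P are bit strings."""
    m, n = len(T), len(P)
    if n == 0 or n > m:
        return []

    def h(bits: str) -> int:
        value = 0
        for b in bits:  # Horner's rule: O(len(bits)) time
            value = (2 * value + int(b)) % p
        return value

    h_window, h_pattern, h_pow = h(T[:n]), h(P), pow(2, n, p)
    S = []
    for i in range(1, m - n + 2):
        if h_window == h_pattern:
            S.append(i)
        if i <= m - n:
            high_bit = int(T[i - 1])     # bit leaving the window
            low_bit = int(T[i + n - 1])  # bit entering the window
            h_window = (2 * h_window - high_bit * h_pow + low_bit) % p
    return S

exact = karp_rabin("1011001011", "1011", 1_000_000_007)  # [1, 7]
noisy = karp_rabin("1011001011", "1011", 3)              # [1, 5, 6, 7]
```

With the large prime, every 4-bit window is smaller than p, so fingerprints are exact. With p = 3 the true matches 1 and 7 are still reported, together with false positives at positions 5 and 6.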

SLIDE 86

Karp-Rabin Algorithm: Error Probability

For each i = 1, ..., m − n + 1
   1. If h_p(T_{i...i+n−1}) = h_p(P), then S = S ∪ {i}.
   2. Compute h_p(T_{i+1...i+n}) using h_p(T_{i...i+n−1}) and h_p(2^n).

Lemma

If there is a match at any position i then i ∈ S. In other words, if T_{i...i+n−1} = P, then i ∈ S.

All matched positions are in S. Can it contain unmatched positions? YES! With what probability?

SLIDE 93

Karp-Rabin Algorithm: Error Probability

Pr[S contains an index i, while there is no match at i]

1. For each i = 1, . . . , m − n + 1:
   1. If h_p(T_{i...i+n−1}) = h_p(P), then S = S ∪ {i}.
   2. Compute h_p(T_{i+1...i+n}) using h_p(T_{i...i+n−1}) and h_p(2^n).

Set M = ⌈2(sn) lg(sn)⌉. Given x ≠ y, Pr[h_p(x) = h_p(y)] ≤ 1/s.

False positive: S contains an index i, while there is no match at i.

Given T_{i...i+n−1} ≠ P, Pr[i ∈ S] ≤ 1/s.

Pr[some index in S is wrong] ≤ m/s (union bound).

To ensure S is correct with probability at least 0.99, it suffices to have 1 − m/s = 0.99 ⇔ m/s = 1/100 ⇔ s = 100m.
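
The parameter choice above can be turned into a short calculation. The sketch below is illustrative only: the helper names `fingerprint_parameters`, `is_prime` and `random_prime` are made up here, the target failure probability is expressed as 1/`inv_delta`, and it assumes that p is drawn uniformly at random from the primes in {2, ..., M}. Trial division is used for primality, which is adequate only for the small M of a toy example.

```python
import math
import random

def fingerprint_parameters(m, n, inv_delta=100):
    """Return (s, M) such that the union bound m/s equals 1/inv_delta."""
    s = inv_delta * m                              # s = 100m for 0.99 success
    M = math.ceil(2 * s * n * math.log2(s * n))    # M = ceil(2(sn) lg(sn))
    return s, M

def is_prime(q):
    """Trial division; fine for the small values used in this sketch."""
    if q < 2:
        return False
    for d in range(2, math.isqrt(q) + 1):
        if q % d == 0:
            return False
    return True

def random_prime(M, rng=random):
    """Sample a prime uniformly from {2, ..., M} by rejection sampling."""
    while True:
        q = rng.randint(2, M)
        if is_prime(q):
            return q
```

For a text of m = 1000 bits and a pattern of n = 10 bits, `fingerprint_parameters(1000, 10)` gives s = 100000, and `random_prime(M)` then supplies a modulus for the fingerprint h_p.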

slide-97
SLIDE 97

Karp-Rabin Algorithm

Back to running time

Running Time

In Step 1, computing h_p(x) for an n-bit x takes O(n) time.

Assume that arithmetic on O(lg M)-bit numbers can be done in O(1) time. Since h_p(·) produces lg M-bit numbers, both steps inside the for loop can be done in O(1) time.

Overall O(m + n) time. Can’t do better.

M = ⌈200mn lg(100mn)⌉ ⇒ lg M = O(lg m).

Even if T is the entire Wikipedia, with bit length m ≈ 2^38, lg M ≈ 64 (assuming the bit length n of the pattern is at most 2^16).

64-bit arithmetic is doable on laptops!


slide-98
SLIDE 98

Take away points

1. Hashing is a powerful and important technique with many practical applications.
2. Randomization is fundamental to understanding hashing.
3. Good and efficient hashing is possible in theory and practice with proper definitions (universal, perfect, etc.).
4. The related idea of creating a compact fingerprint/sketch for objects is very powerful in theory and practice.
