
CS 473: Algorithms
Chandra Chekuri, Ruta Mehta
University of Illinois, Urbana-Champaign
Fall 2016

CS 473: Algorithms, Fall 2016
Fingerprinting
Lecture 11
September 28, 2016

Fingerprinting
Source: Wikipedia

Fingerprinting is the process of mapping a large data item to a much shorter bit string, called its fingerprint. A fingerprint uniquely identifies the data for all practical purposes.

Fingerprints are typically used to avoid comparing and transmitting bulky data. E.g., a web browser can store or fetch file fingerprints to check whether a file has changed.

As you may have guessed, fingerprint functions are hash functions.
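The change-detection use case above can be sketched as follows. This is a minimal illustration, assuming SHA-256 from Python's standard `hashlib` as the fingerprint function; the slides do not name a specific hash for this example, and the helper names `fingerprint` and `has_changed` are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Map arbitrarily large data to a short, fixed-length hex string.

    Assumption: SHA-256 stands in for the fingerprint function here.
    """
    return hashlib.sha256(data).hexdigest()


def has_changed(stored_fingerprint: str, current_data: bytes) -> bool:
    """Compare fingerprints instead of comparing the bulky data itself."""
    return fingerprint(current_data) != stored_fingerprint


# A client keeps only the fingerprint of the copy it cached earlier.
cached = fingerprint(b"<html>version 1</html>")
unchanged = has_changed(cached, b"<html>version 1</html>")  # False
modified = has_changed(cached, b"<html>version 2</html>")   # True
```

Only the 64-character fingerprint needs to be stored or transmitted, regardless of how large the underlying file is.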

Bloom Filters

Hashing:
1. To insert x in the dictionary, store x in the table at location h(x).
2. To look up y in the dictionary, check the contents of location h(y).

Bloom filter: trade off space for false positives.
1. Storing items in a dictionary is expensive in terms of memory, especially if the items are unwieldy objects such as long strings, images, etc., with non-uniform sizes.
2. To insert x in the dictionary, set the bit at location h(x) to 1 (initially all bits are set to 0).
3. To look up y: if the bit at location h(y) is 1 say yes, else say no.

Bloom Filters

Bloom filter: trade off space for false positives.
1. To insert x in the dictionary, set the bit at location h(x) to 1 (initially all bits are set to 0).
2. To look up y: if the bit at location h(y) is 1 say yes, else say no.
3. No false negatives, but false positives are possible due to collisions.

Reducing false positives:
1. Pick k hash functions $h_1, h_2, \ldots, h_k$ independently.
2. To insert x: for 1 ≤ i ≤ k, set the bit at location $h_i(x)$ in table i to 1.
3. To look up y: compute $h_i(y)$ for 1 ≤ i ≤ k and say yes only if each bit in the corresponding location is 1; otherwise say no.

If the probability of a false positive for one hash function is α < 1, then with k independent hash functions it is $\alpha^k$.
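The k-table scheme above can be sketched as follows. This is an illustrative sketch, not the lecture's implementation: the class name `BloomFilter` is hypothetical, and salting SHA-256 with the table index is an assumed stand-in for picking k independent hash functions from a random family.

```python
import hashlib


class BloomFilter:
    """k bit tables, one per hash function; stores bits, never the items."""

    def __init__(self, table_size: int, k: int) -> None:
        self.table_size = table_size
        self.k = k
        # One table per hash function, all bits initially 0.
        # (A bytearray uses a byte per bit; kept simple for readability.)
        self.tables = [bytearray(table_size) for _ in range(k)]

    def _location(self, i: int, item: str) -> int:
        # Assumption: SHA-256 salted with i plays the role of h_i.
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.table_size

    def insert(self, item: str) -> None:
        for i in range(self.k):
            self.tables[i][self._location(i, item)] = 1

    def lookup(self, item: str) -> bool:
        # Say yes only if the bit is set in every one of the k tables.
        return all(
            self.tables[i][self._location(i, item)] for i in range(self.k)
        )


bf = BloomFilter(table_size=1024, k=3)
for word in ["apple", "banana", "cherry"]:
    bf.insert(word)
```

Every inserted item is always reported as present (no false negatives); an item that was never inserted is reported as present only if it collides in all k tables.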

Outline

Use of hash functions for designing fast algorithms.

Problem: Given a text T of length m and a pattern P of length n, with m ≫ n, find all occurrences of P in T.

Karp-Rabin Randomized Algorithm
- Sampling a prime
- String equality via mod p arithmetic
- Rabin's fingerprinting scheme (rolling hash)
- Karp-Rabin pattern matching algorithm: O(m + n) time.

Sampling a prime

Problem: Given an integer x > 0, sample a prime uniformly at random from all the primes between 1 and x.

Procedure:
1. Sample a number p uniformly at random from {1, ..., x}.
2. If p is a prime, then output p. Else go to Step (1).

Checking if p is prime:
- Agrawal-Kayal-Saxena primality test: deterministic but slow.
- Miller-Rabin randomized primality test: fast but randomized; it outputs 'prime' for a non-prime with very low probability.
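The sampling procedure, with a Miller-Rabin check in Step 2, might look like the sketch below. The function names, the number of rounds (40), and the initial trial division by small primes are illustrative assumptions rather than details taken from the slides.

```python
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin test: never rejects a prime; accepts a composite
    only with very low probability (at most 4**-rounds)."""
    if n < 2:
        return False
    for q in SMALL_PRIMES:
        if n % q == 0:
            return n == q
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        y = pow(a, d, n)
        if y in (1, n - 1):
            continue
        for _ in range(s - 1):
            y = pow(y, 2, n)
            if y == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def sample_prime(x: int) -> int:
    """Rejection sampling: draw from {1, ..., x} until a prime appears."""
    if x < 2:
        raise ValueError("there is no prime in {1, ..., x} for x < 2")
    while True:
        p = random.randint(1, x)  # Step 1: uniform over {1, ..., x}
        if is_probable_prime(p):  # Step 2: output p, else repeat
            return p
```

Because the primality test is randomized, the output is a prime except with very low probability, matching the caveat on the slide.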

Sampling a Prime: Analysis

Is the returned prime sampled uniformly at random?

π(x): number of primes in {1, ..., x}.

Lemma: For a fixed prime p* ≤ x, Pr[algorithm outputs p*] = 1/π(x).

Proof.
A: event that a prime is picked in a round. Pr[A] = π(x)/x.
B: event that the number (prime) p* is picked. Pr[B] = 1/x. B ⊂ A.

$$\Pr[B \mid A] = \frac{\Pr[A \cap B]}{\Pr[A]} = \frac{\Pr[B]}{\Pr[A]} = \frac{1/x}{\pi(x)/x} = \frac{1}{\pi(x)}$$

Running time in expectation
Q: How many samples in expectation before termination?
A: x/π(x). Exercise.
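One possible route to the stated answer x/π(x) is sketched below. It assumes the rounds are independent and writes q = π(x)/x for the per-round success probability and N for the number of samples drawn; both symbols are introduced here for illustration and do not appear in the slides.

```latex
% N is geometric with success probability q = \pi(x)/x:
\Pr[N = t] = (1 - q)^{t-1} q, \qquad t = 1, 2, \ldots

% Expected number of samples:
\mathbb{E}[N] = \sum_{t \ge 1} t \, (1 - q)^{t-1} q = \frac{1}{q} = \frac{x}{\pi(x)}

% The same sum also recovers the lemma: p^* is output in round t
% when the first t-1 rounds fail and round t picks p^*.
\Pr[\text{output } p^*] = \sum_{t \ge 1} (1 - q)^{t-1} \cdot \frac{1}{x}
  = \frac{1/x}{q} = \frac{1}{\pi(x)}
```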

How many primes between 0 and x

π(x): number of primes between 0 and x.

Prime Number Theorem:
$$\lim_{x \to \infty} \frac{\pi(x)}{x / \ln x} = 1$$
Proved by Jacques Hadamard and Charles Jean de la Vallée-Poussin in 1896.

Chebyshev (from 1848):
$$\pi(x) \ge \frac{7}{8} \cdot \frac{x}{\ln x} = (1.262\ldots) \frac{x}{\lg x} > \frac{x}{\lg x}$$

If y ∼ {1, ..., x} u.a.r., then y is a prime w.p. π(x)/x > 1/lg x.
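As a numerical sanity check of the bound above, the sketch below counts primes with a sieve of Eratosthenes and compares π(x) against x/lg x for a few values of x. The sieve and the chosen values of x are illustrative assumptions; the bound is meant for sufficiently large x (for example, it fails at x = 2).

```python
import math


def prime_count(x: int) -> int:
    """Return pi(x), the number of primes in {1, ..., x}, via a sieve."""
    if x < 2:
        return 0
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(x) + 1):
        if is_prime[i]:
            # Cross out the multiples of i, starting from i*i.
            is_prime[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    return sum(is_prime)


for x in (100, 1000, 10**6):
    pi_x = prime_count(x)
    lower_bound = x / math.log2(x)
    print(f"x={x}: pi(x)={pi_x}, x/lg x={lower_bound:.1f}")
```

For each of these x, a uniformly random y from {1, ..., x} is prime with probability π(x)/x, which the printed values show to exceed 1/lg x.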
