CS 473: Algorithms
Ruta Mehta, University of Illinois, Urbana-Champaign
Spring 2018

Fingerprinting
Lecture 11, Feb 20, 2018
Most slides are courtesy of Prof. Chekuri.

Fingerprinting
Source: Wikipedia
Fingerprinting is the process of mapping a large data item to a much shorter bit string, called its fingerprint.
A fingerprint uniquely identifies the data "for all practical purposes".
Fingerprints are typically used to avoid comparing and transmitting bulky data. E.g., a web browser can store and fetch file fingerprints to check whether a file has changed.
As you may have guessed, fingerprint functions are hash functions.
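A minimal sketch of the browser example, assuming SHA-256 as the fingerprint function (the slides do not name one) and using hypothetical helper names fingerprint and has_changed:

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Map an arbitrarily large byte string to a 64-hex-character fingerprint."""
    return hashlib.sha256(data).hexdigest()


def has_changed(stored_fingerprint: str, current_data: bytes) -> bool:
    """Compare short fingerprints instead of the bulky data itself."""
    return fingerprint(current_data) != stored_fingerprint


cached = fingerprint(b"<html>version 1</html>")
print(has_changed(cached, b"<html>version 1</html>"))  # False
print(has_changed(cached, b"<html>version 2</html>"))  # True
```

Only the 64-character fingerprint needs to be stored or transmitted, regardless of the size of the file.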

Bloom Filters
Hashing:
1. To insert x in the dictionary, store x in the table at location h(x).
2. To look up y in the dictionary, check the contents of location h(y).
Bloom filter: save space at the cost of false positives.
1. What if the elements x are unwieldy objects such as long strings, images, etc., with non-uniform sizes?
2. To insert x in the dictionary, set the bit at location h(x) to 1 (initially all bits are set to 0).
3. To look up y: if the bit at location h(y) is 1, say yes; else say no.

Bloom Filters
Bloom filter: save space at the cost of false positives.
Reducing false positives:
1. Pick k hash functions h_1, h_2, ..., h_k independently.
2. Insert x: for 1 ≤ i ≤ k, set the bit at location h_i(x) in table i to 1.
3. Lookup y: compute h_i(y) for 1 ≤ i ≤ k and say yes only if each bit in the corresponding location is 1; otherwise say no.
If the probability of a false positive for one hash function is α < 1, then with k independent hash functions it is α^k.
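The k-table scheme can be sketched as follows. This is an illustration under stated assumptions, not the lecture's implementation: the class name BloomFilter is hypothetical, and salting SHA-256 with the index i stands in for picking k independent hash functions.

```python
import hashlib


class BloomFilter:
    """Bloom filter with k hash functions, each writing to its own bit table."""

    def __init__(self, table_size: int, k: int) -> None:
        self.table_size = table_size
        self.k = k
        # One table per hash function; all bits start at 0.
        # (A bytearray spends one byte per bit, for simplicity.)
        self.tables = [bytearray(table_size) for _ in range(k)]

    def _location(self, i: int, item: bytes) -> int:
        # Assumption: SHA-256 salted with i plays the role of h_i.
        digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
        return int.from_bytes(digest, "big") % self.table_size

    def insert(self, item: bytes) -> None:
        for i in range(self.k):
            self.tables[i][self._location(i, item)] = 1

    def lookup(self, item: bytes) -> bool:
        # Say yes only if all k corresponding bits are 1.
        return all(self.tables[i][self._location(i, item)] for i in range(self.k))


bf = BloomFilter(table_size=1024, k=4)
bf.insert(b"apple")
print(bf.lookup(b"apple"))  # True: inserted items are never reported missing
```

A lookup of an item that was never inserted usually returns False, but may return True (a false positive) when all k of its bits happen to have been set by other items.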

Outline
Use of hash functions for designing fast algorithms.
Problem: Given a text T of length m and a pattern P of length n, m ≫ n, find all occurrences of P in T.
Karp-Rabin randomized algorithm. It involves:
  Sampling a prime
  String equality via mod p arithmetic
  Rabin's fingerprinting scheme: rolling hash
  Karp-Rabin pattern matching algorithm: O(m + n) time

Part I: Sampling a Prime

Sampling a prime
Problem: Given an integer x > 0, sample a prime uniformly at random from all the primes between 1 and x.
Procedure:
1. Sample a number p uniformly at random from {1, ..., x}.
2. If p is a prime, then output p. Else go to Step (1).
Checking if p is prime:
  Agrawal-Kayal-Saxena primality test: deterministic but slow.
  Miller-Rabin randomized primality test: fast but randomized; it outputs 'prime' for a composite number with very low probability.
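The procedure can be sketched in Python as below. The function names and the choice of 20 Miller-Rabin rounds are assumptions made for illustration. The sketch also rejects x < 2, since {1, ..., x} then contains no prime and the loop would never terminate.

```python
import random


def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin: wrongly accepts a composite with probability at most 4**(-rounds)."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7):
        if n % small == 0:
            return n == small
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)  # random base in [2, n - 2]
        y = pow(a, d, n)
        if y in (1, n - 1):
            continue
        for _ in range(s - 1):
            y = pow(y, 2, n)
            if y == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def sample_prime(x: int) -> int:
    """Rejection sampling of a prime from {1, ..., x}."""
    if x < 2:
        raise ValueError("there is no prime in {1, ..., x}")
    while True:
        p = random.randint(1, x)  # Step 1: uniform over {1, ..., x}
        if is_probable_prime(p):  # Step 2: output p if prime, else retry
            return p
```

Calling sample_prime(10**9) returns a (probable) prime of at most 10 digits; each round succeeds with probability π(x)/x, so the expected number of rounds is x/π(x).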

Sampling a Prime: Analysis
Is the returned prime sampled uniformly at random?
π(x): number of primes in {1, ..., x}.
Lemma: For a fixed prime p* ≤ x, Pr[algorithm outputs p*] = 1/π(x).
Proof.
Event A: a prime is picked in a round. Pr[A] = π(x)/x.
Event B: the number (prime) p* is picked. Pr[B] = 1/x.
Pr[A ∩ B] = Pr[B] = 1/x. Why? Because B ⊂ A.
The algorithm outputs the number picked in the first round in which A occurs, so the output is p* with probability
Pr[B | A] = Pr[A ∩ B] / Pr[A] = Pr[B] / Pr[A] = (1/x) / (π(x)/x) = 1/π(x).
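The lemma can also be checked empirically. The sketch below is an illustration only: it uses trial division instead of a fast primality test, and the function names, seed, and trial count are assumptions. It estimates the output distribution for a small x.

```python
import random
from collections import Counter


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small x used here."""
    if n < 2:
        return False
    return all(n % q != 0 for q in range(2, int(n ** 0.5) + 1))


def sample_prime(x: int, rng: random.Random) -> int:
    """Repeat uniform draws from {1, ..., x} until a prime is drawn."""
    while True:
        p = rng.randint(1, x)
        if is_prime(p):
            return p


def empirical_distribution(x: int, trials: int, seed: int = 0) -> dict:
    """Fraction of trials in which each prime was the output."""
    rng = random.Random(seed)
    counts = Counter(sample_prime(x, rng) for _ in range(trials))
    return {p: counts[p] / trials for p in sorted(counts)}


dist = empirical_distribution(x=30, trials=20000)
for p, freq in dist.items():
    print(p, round(freq, 3))
```

For x = 30 there are π(30) = 10 primes, so each printed frequency should be close to 1/10.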
