
CS 473: Algorithms, Ruta Mehta, University of Illinois

CS 473: Algorithms, Spring 2018. Ruta Mehta, University of Illinois, Urbana-Champaign. Lecture 11: Fingerprinting, Feb 20, 2018. Most slides are courtesy of Prof. Chekuri.


  1. CS 473: Algorithms. Ruta Mehta, University of Illinois, Urbana-Champaign, Spring 2018. Ruta (UIUC), CS473, Spring 2018, 1 / 29

  2. CS 473: Algorithms, Spring 2018. Fingerprinting. Lecture 11, Feb 20, 2018. Most slides are courtesy of Prof. Chekuri.

  3. Fingerprinting (source: Wikipedia)
     The process of mapping a large data item to a much shorter bit string, called its fingerprint. A fingerprint uniquely identifies the data "for all practical purposes".
     Fingerprints are typically used to avoid comparing and transmitting bulky data. E.g., a web browser can store and fetch file fingerprints to check whether a file has changed.
     As you may have guessed, fingerprint functions are hash functions.
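
As a rough illustration of the idea, the sketch below fingerprints a large byte string and detects a change by comparing fingerprints instead of the data itself. The slide does not name a fingerprint function; SHA-256 from Python's standard `hashlib` is an assumption made here, and the `cached` and `fetched_*` values are hypothetical stand-ins for a file held by a browser.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a short, fixed-size fingerprint of data.

    Assumption: SHA-256 is used as the fingerprint function; the lecture
    does not prescribe a particular hash.
    """
    return hashlib.sha256(data).hexdigest()


# Hypothetical example: a cached copy of a file and two freshly fetched copies.
cached = b"<html>" + b"x" * 1_000_000 + b"</html>"
fetched_same = bytes(cached)      # identical content
fetched_changed = cached + b" "   # one extra byte

# Only the 64-character fingerprints are compared, not the megabyte of data.
unchanged = fingerprint(cached) == fingerprint(fetched_same)
changed = fingerprint(cached) != fingerprint(fetched_changed)
print(unchanged, changed)
```

The fingerprint has the same length regardless of input size, which is what makes storing or transmitting it cheap.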

  6. Bloom Filters
     Hashing:
       (1) To insert $x$ into the dictionary, store $x$ in the table at location $h(x)$.
       (2) To look up $y$ in the dictionary, check the contents of location $h(y)$.
     Bloom filter: save space at the cost of false positives.
       (1) What if the elements $x$ are unwieldy objects, such as long strings or images, with non-uniform sizes?
       (2) To insert $x$ into the dictionary, set the bit at location $h(x)$ to 1 (initially all bits are set to 0).
       (3) To look up $y$: if the bit at location $h(y)$ is 1, say yes; otherwise say no.
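
The single-hash scheme above can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the class name `OneHashBloomFilter`, the table size, and the use of SHA-256 reduced modulo the table size as the hash function $h$ are all assumptions.

```python
import hashlib


class OneHashBloomFilter:
    """Bit table with a single hash function (illustrative sketch)."""

    def __init__(self, num_bits: int) -> None:
        self.num_bits = num_bits
        # One byte per bit for readability; initially all bits are 0.
        self.bits = bytearray(num_bits)

    def _h(self, item: str) -> int:
        # Assumed hash function: SHA-256 reduced modulo the table size.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_bits

    def insert(self, item: str) -> None:
        # Set the bit at location h(x) to 1; the item itself is not stored.
        self.bits[self._h(item)] = 1

    def lookup(self, item: str) -> bool:
        # "Yes" may be a false positive; "no" is always correct.
        return self.bits[self._h(item)] == 1


bf = OneHashBloomFilter(num_bits=1024)
bf.insert("a very long string " * 1000)
print(bf.lookup("a very long string " * 1000))  # True: inserted items are always found
```

The table uses one bit per location no matter how large the inserted objects are, at the price that two different objects hashing to the same location become indistinguishable.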

  8. Bloom Filters
     Bloom filter: save space at the cost of false positives. Reducing false positives:
       (1) Pick $k$ hash functions $h_1, h_2, \ldots, h_k$ independently.
       (2) Insert $x$: for $1 \le i \le k$, set the bit at location $h_i(x)$ in table $i$ to 1.
       (3) Lookup $y$: compute $h_i(y)$ for $1 \le i \le k$ and say yes only if the bit at the corresponding location in every table is 1; otherwise say no.
     If the probability of a false positive with one hash function is $\alpha < 1$, then with $k$ independent hash functions it is $\alpha^k$.
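
A sketch of the $k$-table variant, under stated assumptions: the class name `BloomFilter` and its parameters are illustrative, and the $k$ hash functions are simulated by salting SHA-256 with the table index, which only approximates truly independent hash functions. As a numeric example of the bound, $\alpha = 0.1$ and $k = 3$ give $\alpha^k = 0.001$.

```python
import hashlib


class BloomFilter:
    """k bit tables, one per hash function (illustrative sketch)."""

    def __init__(self, num_bits: int, k: int) -> None:
        self.num_bits = num_bits
        self.k = k
        # Table i belongs to hash function h_i; all bits start at 0.
        self.tables = [bytearray(num_bits) for _ in range(k)]

    def _h(self, i: int, item: str) -> int:
        # Assumption: salting SHA-256 with i stands in for k independent hashes.
        digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_bits

    def insert(self, item: str) -> None:
        for i in range(self.k):
            self.tables[i][self._h(i, item)] = 1

    def lookup(self, item: str) -> bool:
        # Say yes only if the bit is set in every one of the k tables.
        return all(self.tables[i][self._h(i, item)] == 1 for i in range(self.k))


bloom = BloomFilter(num_bits=4096, k=3)
for word in ("text", "pattern", "prime"):
    bloom.insert(word)
print(all(bloom.lookup(w) for w in ("text", "pattern", "prime")))  # True: no false negatives
```

A false positive now requires a collision in all $k$ tables at once, which is where the $\alpha^k$ bound comes from.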

  11. Outline
      Use of hash functions for designing fast algorithms.
      Problem: given a text $T$ of length $m$ and a pattern $P$ of length $n$, with $m \gg n$, find all occurrences of $P$ in $T$.
      Karp-Rabin randomized algorithm. It involves:
        - Sampling a prime
        - String equality via mod $p$ arithmetic
        - Rabin's fingerprinting scheme (rolling hash)
        - Karp-Rabin pattern matching algorithm: $O(m + n)$ time
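
To make the outline concrete, here is a sketch of one common textbook form of rolling-hash pattern matching. It is not taken from these slides, which only list the ingredients at this point. Assumptions: the prime `p` is fixed rather than sampled at random, the base is 256, and every fingerprint match is verified by direct comparison, so the function never reports a false match.

```python
def karp_rabin(text: str, pattern: str, p: int = 1_000_000_007, base: int = 256) -> list[int]:
    """Return all start positions of pattern in text (rolling-hash sketch).

    Assumption: p is a fixed prime here; the lecture's scheme samples it at random.
    """
    m, n = len(text), len(pattern)   # text length m, pattern length n, as in the slide
    if n == 0 or n > m:
        return []

    high = pow(base, n - 1, p)       # base^(n-1) mod p, weight of the leading character
    hp = ht = 0
    for i in range(n):               # fingerprints of the pattern and the first window
        hp = (hp * base + ord(pattern[i])) % p
        ht = (ht * base + ord(text[i])) % p

    matches = []
    for s in range(m - n + 1):
        # Compare fingerprints first; verify to rule out a hash collision.
        if hp == ht and text[s:s + n] == pattern:
            matches.append(s)
        if s + n < m:
            # Roll the window: drop text[s], append text[s + n].
            ht = ((ht - ord(text[s]) * high) * base + ord(text[s + n])) % p
    return matches


print(karp_rabin("abracadabra", "abra"))  # [0, 7]
```

Each window's fingerprint is obtained from the previous one with a constant number of arithmetic operations, which is what makes a linear-time scan possible.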

  14. Part I: Sampling a Prime

  15. Sampling a prime
      Problem: given an integer $x > 0$, sample a prime uniformly at random from all the primes between 1 and $x$.
      Procedure:
        (1) Sample a number $p$ uniformly at random from $\{1, \ldots, x\}$.
        (2) If $p$ is a prime, then output $p$. Else go to Step (1).
      Checking if $p$ is prime:
        - Agrawal-Kayal-Saxena primality test: deterministic but slow.
        - Miller-Rabin randomized primality test: fast, but it outputs 'prime' for a composite number with very low probability.
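
The procedure above can be sketched with a standard Miller-Rabin test. The function names, the number of test rounds (20), and the small-prime pre-check are choices made for this illustration, not details from the slides. The sketch also raises an error for $x < 2$, where there is no prime to sample and the loop would never terminate.

```python
import random


def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin test: may wrongly say True for a composite, with tiny probability."""
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13):   # quick trial division by small primes
        if n % q == 0:
            return n == q
    d, r = n - 1, 0                  # write n - 1 = d * 2^r with d odd
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        y = pow(a, d, n)
        if y in (1, n - 1):
            continue
        for _ in range(r - 1):
            y = pow(y, 2, n)
            if y == n - 1:
                break
        else:
            return False             # a witnesses that n is composite
    return True


def sample_prime(x: int) -> int:
    """Repeat: pick p uniformly from {1, ..., x}; output p if it passes the test."""
    if x < 2:
        raise ValueError("no prime exists in {1, ..., x} when x < 2")
    while True:
        p = random.randint(1, x)     # Step (1): uniform over {1, ..., x}
        if is_probable_prime(p):     # Step (2): output p, else try again
            return p


print(sample_prime(100))
```

Each round succeeds with probability $\pi(x)/x$, where $\pi(x)$ is the number of primes in $\{1, \ldots, x\}$, so the expected number of rounds is $x/\pi(x)$.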

  18. Sampling a Prime: Analysis
      Is the returned prime sampled uniformly at random?
      $\pi(x)$: number of primes in $\{1, \ldots, x\}$.
      Lemma: for a fixed prime $p^* \le x$, $\Pr[\text{algorithm outputs } p^*] = 1/\pi(x)$.
      Proof. Consider a single round.
      Event $A$: a prime is picked in the round. $\Pr[A] = \pi(x)/x$.
      Event $B$: the number (prime) $p^*$ is picked. $\Pr[B] = 1/x$.
      $\Pr[A \cap B] = \Pr[B] = 1/x$. Why? Because $B \subset A$.
      $$\Pr[B \mid A] = \frac{\Pr[A \cap B]}{\Pr[A]} = \frac{\Pr[B]}{\Pr[A]} = \frac{1/x}{\pi(x)/x} = \frac{1}{\pi(x)}$$
      The rounds are independent and the algorithm stops at the first round in which $A$ occurs, so its output is $p^*$ with exactly this probability.
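
The lemma can be checked on a small case with exact arithmetic. The sketch below computes $\Pr[A]$, $\Pr[B]$ and $\Pr[B \mid A]$ as fractions; the helper names and the choice $x = 10$ are illustrative assumptions, and the sieve is only a convenient way to count primes.

```python
import math
from fractions import Fraction


def primes_up_to(x: int) -> list[int]:
    """Sieve of Eratosthenes: all primes in {1, ..., x}, assuming x >= 2."""
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, math.isqrt(x) + 1):
        if sieve[i]:
            for j in range(i * i, x + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def output_probability(x: int) -> Fraction:
    """Pr[B | A] for any fixed prime p* <= x, following the proof step by step."""
    pi_x = len(primes_up_to(x))      # pi(x)
    pr_a = Fraction(pi_x, x)         # Pr[A] = pi(x) / x
    pr_b = Fraction(1, x)            # Pr[B] = Pr[A and B] = 1 / x, since B is contained in A
    return pr_b / pr_a               # Pr[B | A] = 1 / pi(x)


print(primes_up_to(10))              # [2, 3, 5, 7]
print(output_probability(10))        # 1/4
```

With $x = 10$ there are four primes, each round picks a prime with probability $4/10$, and each of the four primes is returned with probability $1/4$.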
