CS 498ABD: Algorithms for Big Data, Spring 2019
Limited independence and Hashing
Lecture 04
January 24, 2019
Chandra (UIUC) CS498ABD 1 Spring 2019 1 / 40
Randomized algorithms rely on independent random bits.
Pseudorandomness: when can we avoid or limit the number of random bits?
Motivated by fundamental theoretical questions and by applications: hashing, cryptography, streaming, simulations, derandomization, . . .
A large topic in TCS with many connections to mathematics.
This course: we need t-wise independent variables and hashing.
Random variables X1, X2, . . . , Xn from a range B are independent if for all b1, b2, . . . , bn ∈ B,
Pr[X1 = b1, X2 = b2, . . . , Xn = bn] = ∏_{i=1}^{n} Pr[Xi = bi].
They are uniformly distributed if Pr[Xi = b] = 1/|B| for all i and all b ∈ B.
Random variables X1, X2, . . . , Xn from a range B are pairwise independent if for all 1 ≤ i < j ≤ n and for all b, b′ ∈ B, Pr[Xi = b, Xj = b′] = Pr[Xi = b] · Pr[Xj = b′].
If X1, X2, . . . , Xn are independent then they are pairwise independent, but the converse is not necessarily true.
Example: X1, X2 are independent bits (variables from {0, 1}) and X3 = X1 ⊕ X2. Then X1, X2, X3 are pairwise independent but not independent.
Want n uniformly distributed random variables X1, X2, . . . , Xn, say bits. But we cannot store n bits because n is too large. Achievable:
storage of O(log n) random bits;
given i where 1 ≤ i ≤ n, can generate Xi in O(log n) time;
X1, X2, . . . , Xn are pairwise independent and uniform.
Hence, with small storage, we can generate n random variables "on the fly". In several applications, pairwise independence (or generalizations) suffices.
Assume for simplicity n = 2^k − 1 (otherwise consider the nearest power of 2).
Let Y1, Y2, . . . , Yk be independent uniform bits. For any S ⊆ {1, 2, . . . , k}, S ≠ ∅, define XS = ⊕_{i∈S} Yi. This gives 2^k − 1 random variables XS.
Claim: If S ≠ T then XS and XT are independent.
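A minimal Python sketch of this construction (k = 4 is an illustrative choice): only the k truly random bits are stored, and each X_S is computed on demand.

```python
import random

k = 4                                   # k truly random bits
Y = [random.randint(0, 1) for _ in range(k)]

def X(S):
    """X_S = XOR of Y_i over i in S, for a nonempty S subset of {0, ..., k-1}."""
    bit = 0
    for i in S:
        bit ^= Y[i]
    return bit

# the 2^k - 1 nonempty subsets give 2^k - 1 pairwise independent bits
subsets = [[i for i in range(k) if (mask >> i) & 1] for mask in range(1, 1 << k)]
bits = [X(S) for S in subsets]
```

Pairwise independence can be seen by noting that for S ≠ T, some Y_i appears in exactly one of X_S, X_T.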
Suppose we want n pairwise independent random variables in the range {0, 1, 2, . . . , m − 1}. Now each Xi needs to be a log m bit string. Using the preceding construction for each bit independently requires O(log m · log n) bits total. One can in fact do it with O(log n + log m) bits.
Assume n = m = p where p is a prime number. Want p pairwise independent random variables distributed uniformly in Zp = {0, 1, 2, . . . , p − 1}.
Choose a, b ∈ {0, 1, 2, . . . , p − 1} uniformly and independently at random. This requires 2⌈log p⌉ random bits. For 0 ≤ i ≤ p − 1 set Xi = ai + b mod p. Note that one needs to store only a, b, p and can generate Xi efficiently on the fly.
Exercise: Prove that each Xi is uniformly distributed in Zp.
Claim: For i ≠ j, Xi and Xj are independent.
Claim: For i ≠ j, Xi and Xj are independent.
Some math required: Zp is a field for any prime p. That is, {0, 1, 2, . . . , p − 1} forms a commutative group under addition mod p (easy), and, more importantly, {1, 2, . . . , p − 1} forms a commutative group under multiplication mod p.
Let p be a prime number and x an integer in {1, . . . , p − 1}. ⇒ There exists a unique y s.t. xy = 1 mod p. In other words: for every element there is a unique inverse. ⇒ Zp = {0, 1, . . . , p − 1}, when working modulo p, is a field.
Let p be a prime number. For any x, y, z ∈ {1, . . . , p − 1} s.t. y ≠ z, we have that xy mod p ≠ xz mod p.
Proof: Assume for the sake of contradiction that xy mod p = xz mod p. Then x(y − z) = 0 mod p ⇒ p divides x(y − z) ⇒ p divides y − z ⇒ y − z = 0 ⇒ y = z, a contradiction.
Let p be a prime number and x an integer in {1, . . . , p − 1}. ⇒ There exists a unique y s.t. xy = 1 mod p.
Uniqueness: by the above claim, if xy = 1 mod p and xz = 1 mod p then y = z.
Existence: {x · 1 mod p, x · 2 mod p, . . . , x · (p − 1) mod p} = {1, 2, . . . , p − 1}. ⇒ There exists a number y ∈ {1, . . . , p − 1} such that xy = 1 mod p.
If x ≠ y then for each (r, s) ∈ Zp × Zp there is exactly one pair (a, b) ∈ Zp × Zp such that ax + b mod p = r and ay + b mod p = s.
Proof: Solve the two equations ax + b = r mod p and ay + b = s mod p. We get a = (r − s)(x − y)^{−1} mod p and b = r − ax mod p. This is a one-to-one correspondence between (a, b) and (r, s) ⇒ if (a, b) is chosen uniformly at random from Zp × Zp then (r, s) is uniformly distributed over Zp × Zp.
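The one-to-one correspondence can be verified by brute force for a small prime (p = 7 and the pair x = 2, y = 5 are illustrative choices):

```python
# Brute-force check of the bijection (a, b) <-> (r, s) for a small prime.
p = 7
x, y = 2, 5                       # any fixed pair with x != y

seen = set()
for a in range(p):
    for b in range(p):
        r = (a * x + b) % p
        s = (a * y + b) % p
        seen.add((r, s))

# p^2 distinct pairs (r, s) arise from the p^2 pairs (a, b): a bijection,
# so (X_x, X_y) is uniform on Z_p x Z_p when (a, b) is uniform.
assert len(seen) == p * p
```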
Chebyshev's inequality: For a > 0,
Pr[|X − E[X]| ≥ a] ≤ Var(X)/a²,
or equivalently, for any t > 0, Pr[|X − E[X]| ≥ t·σX] ≤ 1/t², where σX = √Var(X) is the standard deviation of X.
Suppose X = X1 + X2 + . . . + Xn. If X1, X2, . . . , Xn are independent then Var(X) = Σi Var(Xi).
Recall the application to the random walk on the line.
Key fact: if X = Σi Xi and X1, X2, . . . , Xn are only pairwise independent, then still Var(X) = Σi Var(Xi).
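The key fact can be checked exactly on the earlier example of X1, X2 independent bits and X3 = X1 ⊕ X2, a triple that is pairwise independent but not independent:

```python
from itertools import product

# Enumerate the 4 equally likely outcomes of (Y1, Y2) and form
# X = X1 + X2 + X3 with X1 = Y1, X2 = Y2, X3 = Y1 xor Y2.
outcomes = []
for y1, y2 in product((0, 1), repeat=2):
    outcomes.append(y1 + y2 + (y1 ^ y2))

n = len(outcomes)
mean = sum(outcomes) / n
var = sum((o - mean) ** 2 for o in outcomes) / n
# each X_i is a uniform bit, so Var(X_i) = 1/4; the claim predicts Var(X) = 3/4
```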
A rough sketch: If n < m we can use a prime p ∈ [m, 2m] (one always exists, by Bertrand's postulate) and use the previous construction based on Zp. The case n > m is more difficult and also the more relevant one. The following is a fundamental theorem on finite fields.
Every finite field F has order p^k for some prime p and some integer k ≥ 1. Conversely, for every prime p and integer k ≥ 1 there is a finite field F of order p^k (unique up to isomorphism).
We will assume n and m are powers of 2, so from the above we have a field F of size n = 2^k. Generate n pairwise independent random variables over [n] by picking random a, b ∈ F and setting Xi = ai + b (operations in F). By the previous proof (we only used that Zp is a field) the Xi are pairwise independent. Now Xi ∈ [n]. Truncate Xi to [m] by dropping the most significant log n − log m bits. The resulting variables are still pairwise independent (both n and m being powers of 2 is useful here). We skip details on the computational aspects of F, which are closely tied to the proof of the theorem on fields.
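A sketch of this construction for k = 3 (so n = 8, m = 4), using the irreducible polynomial x³ + x + 1 to realize GF(2³); the polynomial and the parameter values are illustrative assumptions.

```python
import random

K = 3                      # field GF(2^3); small k for illustration
IRRED = 0b1011             # x^3 + x + 1, irreducible over GF(2)

def gf_mult(a, b):
    """Carry-less (polynomial) multiply of a and b, reduced mod IRRED."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a & (1 << K):   # degree overflow: reduce
            a ^= IRRED
    return res

n = 1 << K                 # n = 8 variables X_0, ..., X_7 over [n]
a, b = random.randrange(n), random.randrange(n)
X = [gf_mult(a, i) ^ b for i in range(n)]   # X_i = a*i + b in GF(2^k)

m = 4                      # truncate to [m]; m a power of 2 dividing n
trunc = [x & (m - 1) for x in X]            # drop the log n - log m high bits
```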
Generalizing pairwise independence:
Random variables X1, X2, . . . , Xn from a range B are t-wise independent, for integer t > 1, if Xi1, Xi2, . . . , Xit are independent for any distinct indices i1, i2, . . . , it ∈ {1, 2, . . . , n}. As t increases the variables become more and more independent; if t = n the variables are fully independent.
Fact: For any n, m one can create n t-wise independent random variables with range [m] using O(t(log n + log m)) true random bits. One can store only these bits and generate the variables on the fly in O(t · polylog(m + n)) time.
Construction using polynomials: Let F be a field. Pick t random (with replacement) elements a0, a1, . . . , a_{t−1} from F. For each i ∈ F set Xi = a0 + a1·i + a2·i² + . . . + a_{t−1}·i^{t−1} (operations in F).
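A sketch of the polynomial construction over F = Zp (p = 101 and t = 4 are illustrative choices); only the t coefficients need to be stored.

```python
import random

p = 101          # a prime; F = Z_p (small prime for illustration)
t = 4            # degree of independence: 4-wise independent variables

coeffs = [random.randrange(p) for _ in range(t)]   # a_0, ..., a_{t-1}

def X(i):
    """X_i = a_0 + a_1*i + ... + a_{t-1}*i^{t-1} mod p, via Horner's rule."""
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * i + a) % p
    return acc

samples = [X(i) for i in range(p)]   # p t-wise independent variables in Z_p
```

Pairwise independence is the special case t = 2 (the degree-1 polynomial ai + b); a degree-(t−1) polynomial is determined by its values at any t points, which gives t-wise independence.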
Suppose we want to distribute jobs to machines in a simple way to achieve load balancing. Throwing each new job onto a random machine is a simple, distributed, oblivious strategy with many benefits. Balls and bins is a simple mathematical model to analyze the core principles.
Hashing: Want a "function" h : U → B that behaves like a "random function". That is, for any distinct x1, x2, . . . , xn ∈ U we want h(x1), h(x2), . . . , h(xn) to be uniformly distributed over B and independent. But we also want h to be efficiently computable and storable in small memory.
Many applications: hash tables as a dictionary data structure, cryptography/security, pseudorandomness, . . .
1. U: universe of keys with total order: numbers, strings, etc.
2. Data structure to store a subset S ⊆ U.
3. Operations:
   1. Search/lookup: given x ∈ U, is x ∈ S?
   2. Insert: given x ∉ S, add x to S.
   3. Delete: given x ∈ S, delete x from S.
4. Static structure: S given in advance or changes very infrequently; the main operations are lookups.
5. Dynamic structure: S changes rapidly, so inserts and deletes are as important as lookups.
Can we do everything in O(1) time?
Hash Table data structure:
1. A (hash) table/array T of size m (the table size).
2. A hash function h : U → {0, . . . , m − 1}.
3. Item x ∈ U hashes to slot h(x) in T.
Given S ⊆ U, how do we store S and how do we do lookups? Ideally:
1. Each element x ∈ S hashes to a distinct slot in T; store x in slot h(x).
2. Lookup: given y ∈ U, check if T[h(y)] = y. O(1) time!
Collisions are unavoidable if |T| < |U|. Several techniques to handle them.
Collision: h(x) = h(y) for some x ≠ y. Chaining/open hashing to handle collisions:
1. For each slot i, store all items hashed to slot i in a linked list; T[i] points to the linked list.
2. Lookup: to find if y ∈ U is in T, scan the linked list at T[h(y)]. Time proportional to the size of the linked list.
Chain length determines the time for operations. Ideally we want O(1).
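A sketch of a chained hash table in Python; drawing h from the ((ax + b) mod p) mod m family is an assumption made here for concreteness, and any universal family would do.

```python
import random

class ChainedHashTable:
    """Hash table with chaining; h is drawn from the ((ax+b) mod p) mod m
    family (an illustrative choice; any universal family works)."""

    def __init__(self, m, p=2**31 - 1):
        self.m, self.p = m, p
        self.a = random.randrange(1, p)
        self.b = random.randrange(p)
        self.table = [[] for _ in range(m)]   # one chain per slot

    def _h(self, x):
        return ((self.a * x + self.b) % self.p) % self.m

    def insert(self, x):
        chain = self.table[self._h(x)]
        if x not in chain:
            chain.append(x)

    def lookup(self, x):
        return x in self.table[self._h(x)]    # time ~ chain length

    def delete(self, x):
        chain = self.table[self._h(x)]
        if x in chain:
            chain.remove(x)
```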
Parameters: N = |U| (very large), m = |T|, n = |S|. Goal: O(1)-time lookup, insertion, deletion.
If N ≥ m², then for any hash function h : U → T there exists i < m such that at least N/m ≥ m elements of U get hashed to slot i. Any S containing all of these is a very, very bad set for h! Such a bad set may lead to Ω(m) lookup time!
In practice:
Dictionary applications: choose a simple hash function and hope that worst-case bad sets do not arise.
Crypto applications: create "hard" and "complex" functions very carefully, which makes finding collisions difficult.
Consider a family H of hash functions with good properties and choose h randomly from H. Guarantees: a small number of collisions in expectation for any given S. H should allow efficient sampling, and each h ∈ H should be efficient to evaluate and require small memory to store. In other words, a hash function is a "pseudorandom" function.
Question: What are good properties of H in distributing data?
1. Uniform: Consider any element x ∈ U. If h ∈ H is picked randomly then x should go into a random slot in T. In other words, Pr[h(x) = i] = 1/m for every slot i.
2. (2-)Strongly universal: Consider any two distinct elements x, y ∈ U. If h ∈ H is picked randomly then h(x) and h(y) should be independent random variables.
Question: What are good properties of H in distributing data? (2-)Universal: Consider any two distinct elements x, y ∈ U. If h ∈ H is picked randomly then the probability of a collision between x and y should be at most 1/m. In other words, Pr[h(x) = h(y)] ≤ 1/m. Note: we do not insist on uniformity.
A family of hash functions H is (2-)strongly universal if for all distinct x, y ∈ U, h(x) and h(y) are independent for h chosen uniformly at random from H, and for all x, h(x) is uniformly distributed.
A family of hash functions H is (2-)universal if for all distinct x, y ∈ U, Pr_{h∼H}[h(x) = h(y)] ≤ 1/m, where m is the table size. Both notions generalize to t-strongly universal and t-universal families: the property must hold for every tuple of t distinct items.
Question: Fixing a set S, what is the expected time to look up x ∈ S when h is picked uniformly at random from H?
1. ℓ(x): the size of the list at T[h(x)]. We want E[ℓ(x)].
2. For y ∈ S let Dy be 1 if h(y) = h(x), else 0. Then ℓ(x) = Σ_{y∈S} Dy.
E[ℓ(x)] = Σ_{y∈S} Pr[h(x) = h(y)] ≤ 1 + (|S| − 1)/m (since H is a universal hash family; the y = x term contributes 1) ≤ 2 if |S| ≤ m.
Question: What is the expected time to look up x in T using h, assuming chaining is used to resolve collisions? Answer: O(1 + n/m).
Comments:
1. O(1) expected time also holds for insertion.
2. The analysis assumes a static set S, but it holds as long as S is a set formed with at most O(m) insertions and deletions.
3. Worst case: lookup time can be large! How large? In principle Ω(n), but if H has good properties then O(√n) or O(log n / log log n) with high probability.
Universal: H such that Pr[h(x) = h(y)] ≤ 1/m.
H: the set of all possible functions h : U → {0, . . . , m − 1} is universal, but |H| = m^|U|, so representing h requires |U| log m bits. Not O(1)! We need a compactly representable universal family.
The construction is similar to that of N pairwise independent random variables with range [m]; the hash function is given by the algorithm that constructs Xi from i. One can do this with O(log N) bits of storage, since N ≥ m in the hashing application.
Parameters: N = |U|, m = |T|, n = |S|. Assumption: m ≤ p.
1. Choose a prime number p ≥ N. Zp = {0, 1, . . . , p − 1} is a field.
2. For a, b ∈ Zp, a ≠ 0, define the hash function ha,b as ha,b(x) = ((ax + b) mod p) mod m.
3. Let H = {ha,b | a, b ∈ Zp, a ≠ 0}. Note that |H| = p(p − 1).
H is a universal hash family.
Comments:
1. The hash family is of small size and easy to sample from.
2. It is easy to store a hash function (only a and b have to be stored) and to evaluate it.
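A minimal sketch of sampling and evaluating h_{a,b}; the Mersenne prime p = 2³¹ − 1 and table size m = 1000 are illustrative choices.

```python
import random

p = 2**31 - 1     # a prime >= N (Mersenne prime, illustrative)
m = 1000          # table size, m <= p

def make_hash():
    """Sample h_{a,b} from H = {h_{a,b} : a in Z_p \ {0}, b in Z_p}."""
    a = random.randrange(1, p)    # a != 0
    b = random.randrange(p)
    return lambda x: ((a * x + b) % p) % m

h = make_hash()
slot = h(123456)   # a slot in {0, ..., m-1}
```

Storing a hash function costs only the two numbers a, b (plus p, m), i.e. O(log p) bits.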
g(x) = ax + b mod p is uniformly distributed in {0, 1, . . . , p − 1}, but h(x) = g(x) mod m is not uniformly distributed unless m = p. Still, Pr[h(x) = i] ≤ 2/m for any i.
Hashing:
1. To insert x in the dictionary, store x in the table at location h(x).
2. To look up y in the dictionary, check the contents of location h(y).
Bloom Filter: trade space for false positives.
1. Storing items in the dictionary is expensive in terms of memory, especially if items are unwieldy objects such as long strings, images, etc., with non-uniform sizes.
2. To insert x in the dictionary, set the bit at location h(x) to 1 (initially all bits are set to 0).
3. To look up y: if the bit at location h(y) is 1 say yes, else say no.
Bloom Filter: trade space for false positives.
1. To insert x, set the bit at location h(x) to 1 (initially all bits are set to 0).
2. To look up y: if the bit at location h(y) is 1 say yes, else say no.
3. No false negatives, but false positives are possible due to collisions.
Reducing false positives:
1. Pick k hash functions h1, h2, . . . , hk independently.
2. To insert x, for each i, set the bit at location hi(x) in table i to 1.
3. To look up y, compute hi(y) for 1 ≤ i ≤ k and say yes only if the bit at each corresponding location is 1; otherwise say no. If the probability of a false positive for one hash function is α < 1, then with k independent hash functions it is α^k.
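A sketch of the k-table variant in Python; using the ((ax + b) mod p) mod m family for the k hash functions is an illustrative assumption, and any suitable family works.

```python
import random

class BloomFilter:
    """k-table Bloom filter: one bit table per hash function; the hash
    functions come from the ((ax+b) mod p) mod m family (illustrative)."""

    def __init__(self, m, k, p=2**31 - 1):
        self.m, self.p = m, p
        self.params = [(random.randrange(1, p), random.randrange(p))
                       for _ in range(k)]
        self.tables = [[0] * m for _ in range(k)]   # initially all bits 0

    def _h(self, i, x):
        a, b = self.params[i]
        return ((a * x + b) % self.p) % self.m

    def insert(self, x):
        for i, table in enumerate(self.tables):
            table[self._h(i, x)] = 1

    def lookup(self, x):
        # say yes only if every table has a 1 in the corresponding location
        return all(t[self._h(i, x)] for i, t in enumerate(self.tables))
```

No item is ever stored, only bits, so lookups can return false positives but never false negatives.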
1. Hashing is a powerful and important technique for dictionaries, with many practical applications.
2. Randomization is fundamental to understanding hashing.
3. Good and efficient hashing is possible in theory and practice with proper definitions (universal, perfect, etc.).
4. Related: ideas of creating a compact fingerprint/sketch for data.
Hashing is typically used for integers, vectors, strings, etc. Universal hashing is defined for integers; to implement it for other types, one maps them to integers (e.g., via their binary representation).
Practical methods for various important cases such as vectors and strings are studied extensively. See http://en.wikipedia.org/wiki/Universal_hashing for some pointers.
Details on cuckoo hashing and its advantage over chaining: http://en.wikipedia.org/wiki/Cuckoo_hashing.
A recent important paper bridging theory and practice of hashing: "The power of simple tabulation hashing" by Mikkel Thorup and Mihai Patrascu, 2011. See http://en.wikipedia.org/wiki/Tabulation_hashing.
Cryptographic hash functions have a different motivation and design.