 
Algorithms for Big Data (VI)
Chihao Zhang
Shanghai Jiao Tong University
Oct. 25, 2019

Review

We learnt the AMS algorithm to estimate $\|f\|_k^k$ for $k \ge 2$ using $O\left(k n^{1-1/k} (\log m + \log n)\right)$ bits.

An ad-hoc algorithm for $\|f\|_2^2$ costs $O(\log m + \log n)$ bits.
▶ Pick $h : [n] \to \{-1, 1\}$ from a 4-universal family;
▶ On input $(j, \Delta)$, $x \leftarrow x + \Delta \cdot h(j)$;
▶ Output $x^2$.
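
The three steps of the ad-hoc algorithm can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the class name `TugOfWar`, the prime `P`, and the choice of a random degree-3 polynomial over $\mathbb{Z}_P$ as the 4-universal family are assumptions made here, and taking the low bit of the polynomial value gives signs that are only approximately unbiased (the bias is about $1/P$).

```python
import random

# Assumption: a Mersenne prime larger than the universe size n.
P = (1 << 61) - 1


class TugOfWar:
    """Sketch of the x <- x + delta * h(j) estimator for ||f||_2^2."""

    def __init__(self, seed=None):
        rng = random.Random(seed)
        # Four random coefficients define a degree-3 polynomial over Z_P,
        # a standard way to obtain a 4-wise independent hash family.
        self.coeffs = [rng.randrange(P) for _ in range(4)]
        self.x = 0

    def h(self, j):
        # Evaluate the polynomial at j (Horner's rule), then map to a sign.
        acc = 0
        for c in self.coeffs:
            acc = (acc * j + c) % P
        return 1 if acc & 1 else -1

    def update(self, j, delta):
        # On input (j, delta): x <- x + delta * h(j).
        self.x += delta * self.h(j)

    def estimate(self):
        # Output x^2.
        return self.x ** 2
```

A single run has high variance; the averaging trick over several independent copies is what makes the estimate reliable.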

An Algebraic View

It is instructive to view the Tug-of-War algorithm from linear algebra.

Assume that we run the algorithm $k$ times (to apply the averaging trick), each time with function $h_i$.

Consider the matrix $A = (a_{ij})_{i \in [k],\, j \in [n]}$ where $a_{ij} = h_i(j)$.

Let $x = Af$. We know that $\mathbb{E}\left[x_i^2\right] = \|f\|_2^2$. Our algorithm outputs $\frac{\sum_{i=1}^{k} x_i^2}{k} = \frac{\|x\|_2^2}{k}$.

The 2-norm of the vector $\frac{x}{\sqrt{k}}$ is close to that of $f$!
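
The matrix view can be checked numerically. The sketch below builds a $k \times n$ sign matrix, computes $x = Af$, and returns $\|x\|_2^2 / k$. For simplicity it assumes fully independent random signs instead of a 4-universal family, and the function names are hypothetical.

```python
import random


def sign_matrix(k, n, seed=None):
    # Row i plays the role of h_i: entry (i, j) is h_i(j) in {-1, +1}.
    # Assumption: fully independent signs, not a 4-universal family.
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]


def matvec(A, f):
    # x = A f, computed row by row.
    return [sum(a * v for a, v in zip(row, f)) for row in A]


def averaged_estimate(A, f):
    # (sum_i x_i^2) / k = ||x||_2^2 / k, the averaged output.
    x = matvec(A, f)
    return sum(v * v for v in x) / len(x)
```

When $f$ has a single nonzero coordinate $c$, every $x_i$ equals $\pm c$, so the estimate is exactly $c^2$; for general $f$ the estimate concentrates around $\|f\|_2^2$ as $k$ grows.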

Dimension Reduction

Suppose $k \ll n$. What the matrix $A$ does is map a vector in $\mathbb{R}^n$ to a vector in $\mathbb{R}^k$ without changing its norm much.

This operation is often referred to as dimension reduction or metric embedding.

The algorithm we met is similar to one important dimension reduction technique, the Johnson-Lindenstrauss transformation.

Johnson-Lindenstrauss transformation

Theorem
For any $0 < \varepsilon < \frac{1}{2}$ and any positive integer $m$, consider a set of $m$ points $S \subseteq \mathbb{R}^n$. There exists a matrix $A \in \mathbb{R}^{k \times n}$ where $k = O(\varepsilon^{-2} \log m)$ satisfying
$$\forall x, y \in S, \quad (1 - \varepsilon)\|x - y\| \le \|Ax - Ay\| \le (1 + \varepsilon)\|x - y\|.$$

We construct $A$ by drawing each of its entries from $N\left(0, \frac{1}{k}\right)$ independently.
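
A small experiment illustrates the construction: draw each entry of $A$ from $N(0, 1/k)$, project a few points, and measure the worst pairwise distortion $\left|\|Ax - Ay\| / \|x - y\| - 1\right|$. This is a sketch under assumptions: the function names and the sizes used in the usage note are chosen here for illustration only.

```python
import math
import random


def gaussian_matrix(k, n, seed=None):
    # Each entry is drawn independently from N(0, 1/k),
    # i.e. standard deviation 1/sqrt(k).
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(k)]


def project(A, v):
    # Compute A v, a vector in R^k.
    return [sum(a * c for a, c in zip(row, v)) for row in A]


def dist(u, v):
    # Euclidean distance between two vectors of equal length.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def max_distortion(points, A):
    # Largest | ||Ax - Ay|| / ||x - y|| - 1 | over all pairs of distinct points.
    images = [project(A, p) for p in points]
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratio = dist(images[i], images[j]) / dist(points[i], points[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst
```

For example, with $n = 500$, $k = 300$ and a handful of random points, `max_distortion` is typically a few percent, well inside $\varepsilon < \frac{1}{2}$.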

Gaussian Distribution

Recall the density function of a variable $X \sim N(\mu, \sigma^2)$ is
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$$

The distribution function is
$$F_X(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(t - \mu)^2}{2\sigma^2}} \,\mathrm{d}t.$$

Assume $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ are independent, then
$$aX_1 + bX_2 \sim N(a\mu_1 + b\mu_2,\ a^2\sigma_1^2 + b^2\sigma_2^2).$$
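
The rule for linear combinations can be sanity-checked by simulation. The sketch below samples $aX_1 + bX_2$ for independent Gaussians and computes the mean and variance the rule predicts; the function names and parameters are hypothetical, chosen only for this illustration.

```python
import random


def predicted_parameters(a, b, mu1, sigma1, mu2, sigma2):
    # Mean and variance of a*X1 + b*X2 for independent Gaussians:
    # N(a*mu1 + b*mu2, a^2*sigma1^2 + b^2*sigma2^2).
    mean = a * mu1 + b * mu2
    variance = a * a * sigma1 ** 2 + b * b * sigma2 ** 2
    return mean, variance


def sample_combination(a, b, mu1, sigma1, mu2, sigma2, trials, seed=None):
    # Draw independent X1 ~ N(mu1, sigma1^2), X2 ~ N(mu2, sigma2^2)
    # and return the samples of a*X1 + b*X2.
    rng = random.Random(seed)
    return [
        a * rng.gauss(mu1, sigma1) + b * rng.gauss(mu2, sigma2)
        for _ in range(trials)
    ]
```

Comparing the sample mean and sample variance of `sample_combination(...)` with `predicted_parameters(...)` should show agreement up to sampling error.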

Proof of JL

The statement is equivalent to: for all distinct $x, y \in S$,
$$1 - \varepsilon \le \frac{\|A(x - y)\|}{\|x - y\|} \le 1 + \varepsilon.$$

We only need to show that for every unit length vector $f$,
$$\Pr\left[\,\bigl|\|Af\| - 1\bigr| > \varepsilon\,\right] \le \delta$$
for a sufficiently small failure probability $\delta$ (for instance $\delta < 1/m^2$, so that a union bound over the fewer than $m^2$ pairs in $S$ applies).

Assume $x = Af$, then $x_i = \sum_{j \in [n]} a_{ij} \cdot f_j \sim N\left(0, \frac{1}{k}\right)$.

We need a concentration inequality for the sum of squared Gaussians:
$$\Pr\left[\,\left|\sum_{i=1}^{k} x_i^2 - 1\right| \ge \varepsilon\,\right] \le \delta.$$
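
The steps behind this reduction can be written out as a short worked derivation. It uses only that the entries $a_{ij}$ are independent $N(0, 1/k)$, that $\|f\|_2 = 1$, and that $\|Af\|^2 = \sum_i x_i^2$; the rescaled variables $X_i$ are notation introduced here for illustration.

```latex
\begin{align*}
\mathrm{Var}[x_i]
  &= \sum_{j \in [n]} f_j^2 \,\mathrm{Var}[a_{ij}]
   = \frac{\|f\|_2^2}{k} = \frac{1}{k}, \\
X_i &:= \sqrt{k}\, x_i \sim N(0, 1), \\
\sum_{i=1}^{k} x_i^2 - 1
  &= \frac{1}{k} \left( \sum_{i=1}^{k} X_i^2 - k \right), \\
\bigl| \|Af\| - 1 \bigr|
  &\le \bigl| \|Af\| - 1 \bigr| \cdot \bigl( \|Af\| + 1 \bigr)
   = \left| \sum_{i=1}^{k} x_i^2 - 1 \right|.
\end{align*}
```

The rows of $A$ are independent, so the $X_i$ are i.i.d. standard Gaussians. The last line shows that a deviation of the norm by more than $\varepsilon$ forces a deviation of the squared norm by more than $\varepsilon$, so it suffices to bound the tail of $\sum_i X_i^2$ around $k$.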

Concentration

Theorem
Let $X_1, X_2, \dots, X_k$ be i.i.d. $N(0, 1)$, then for $0 < \varepsilon < 1$,
$$\Pr\left[\,\left|\sum_{i=1}^{k} X_i^2 - k\right| \ge \varepsilon k\,\right] < 2e^{-\frac{\varepsilon^2 k}{8}}.$$

The proof is similar to the proof of the Chernoff bound we met before.
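
To see how the bound yields $k = O(\varepsilon^{-2} \log m)$, one can solve $2e^{-\varepsilon^2 k / 8} \le 1/m^2$ for $k$, which gives $k \ge 8\varepsilon^{-2} \ln(2m^2)$. The sketch below computes this value and compares the bound with an empirical tail frequency. The target $1/m^2$ is an assumption made here (a common choice that lets a union bound over all pairs go through), and the function names are hypothetical.

```python
import math
import random


def tail_bound(eps, k):
    # Right-hand side of the theorem: 2 * exp(-eps^2 * k / 8).
    return 2 * math.exp(-eps * eps * k / 8)


def rows_needed(eps, m):
    # Smallest integer k with 2 * exp(-eps^2 * k / 8) <= 1 / m^2,
    # i.e. k >= 8 * ln(2 m^2) / eps^2.
    return math.ceil(8 * math.log(2 * m * m) / (eps * eps))


def empirical_tail(eps, k, trials, seed=None):
    # Fraction of trials in which |sum_i X_i^2 - k| >= eps * k
    # for i.i.d. standard Gaussians X_1, ..., X_k.
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        s = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        if abs(s - k) >= eps * k:
            bad += 1
    return bad / trials
```

In small experiments the empirical frequency tends to sit well below `tail_bound`, which is expected since the theorem is an upper bound and not a tight estimate.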