CSE 547/Stat 548: Machine Learning for Big Data
Lecture: Random Projections
Instructor: Sham Kakade

1 The Johnson-Lindenstrauss lemma

Theorem 1.1 (Johnson-Lindenstrauss). Let $\epsilon \in (0, 1/2)$. Let $Q \subset \mathbb{R}^d$ be a set of $n$ points and let $k = \frac{20 \log n}{\epsilon^2}$. There exists a Lipschitz mapping $f : \mathbb{R}^d \to \mathbb{R}^k$ such that for all $u, v \in Q$:

$$(1 - \epsilon) \|u - v\|^2 \le \|f(u) - f(v)\|^2 \le (1 + \epsilon) \|u - v\|^2.$$

To prove JL, we appeal to the following lemma:

Theorem 1.2 (Norm preservation). Let $x \in \mathbb{R}^d$. Assume that the entries of $A \in \mathbb{R}^{k \times d}$ are sampled independently from $N(0,1)$. Then

$$\Pr\left( (1 - \epsilon) \|x\|^2 \le \left\| \tfrac{1}{\sqrt{k}} A x \right\|^2 \le (1 + \epsilon) \|x\|^2 \right) \ge 1 - 2 e^{-(\epsilon^2 - \epsilon^3) k / 4}.$$

The proof of JL then just appeals to the union bound:

Proof. The proof is constructive and is an example of the probabilistic method. Choose $f$ to be a random projection: let $f(x) = \frac{1}{\sqrt{k}} A x$, where $A$ is a $k \times d$ matrix whose entries are sampled i.i.d. from a Gaussian $N(0,1)$. Note there are $O(n^2)$ pairs $u, v \in Q$. By the union bound,

$$\Pr\left( \exists\, u, v \in Q \text{ s.t. } (1 - \epsilon) \|u - v\|^2 \le \left\| \tfrac{1}{\sqrt{k}} A (u - v) \right\|^2 \le (1 + \epsilon) \|u - v\|^2 \text{ fails} \right)$$
$$\le \sum_{u, v \in Q} \Pr\left( (1 - \epsilon) \|u - v\|^2 \le \left\| \tfrac{1}{\sqrt{k}} A (u - v) \right\|^2 \le (1 + \epsilon) \|u - v\|^2 \text{ fails} \right)$$
$$\le 2 n^2 e^{-(\epsilon^2 - \epsilon^3) k / 4} < 1,$$

where the last step follows if we choose $k = \frac{20}{\epsilon^2} \log n$. Note that the probability of finding a map $f$ which satisfies the desired conditions is strictly greater than $0$, so such a map must exist. (Aside: this proof technique is known as "the probabilistic method"; note that the theorem is a deterministic statement while the proof is via a probabilistic argument.)
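To make the statement concrete, here is a minimal NumPy sketch (not part of the original notes) that draws a Gaussian matrix $A$, forms $f(u) = \frac{1}{\sqrt{k}} A u$, and checks the pairwise distance bounds empirically; the point set and the sizes $n$, $d$ are arbitrary choices for the demonstration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Arbitrary demo sizes (not from the notes): n points in R^d.
    n, d, eps = 100, 10_000, 0.25
    k = int(np.ceil(20 * np.log(n) / eps ** 2))   # k = 20 log(n) / eps^2, as in Theorem 1.1

    Q = rng.normal(size=(n, d))                   # an arbitrary point set Q in R^d
    A = rng.normal(size=(k, d))                   # entries of A sampled i.i.d. from N(0, 1)
    fQ = Q @ A.T / np.sqrt(k)                     # f(u) = (1/sqrt(k)) A u, applied to every row

    def sq_dists(X):
        # All pairwise squared Euclidean distances ||u - v||^2.
        s = (X ** 2).sum(axis=1)
        return s[:, None] + s[None, :] - 2 * X @ X.T

    mask = ~np.eye(n, dtype=bool)                 # exclude the u = v pairs
    ratio = sq_dists(fQ)[mask] / sq_dists(Q)[mask]
    print(ratio.min(), ratio.max())               # empirically within [1 - eps, 1 + eps]

Note that the target dimension $k$ depends only on $n$ and $\epsilon$, not on the ambient dimension $d$; this is the point of the lemma.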

Now let us prove the norm preservation lemma:

Proof. First let us show that for any $x \in \mathbb{R}^d$,

$$\mathbb{E}\left[ \left\| \tfrac{1}{\sqrt{k}} A x \right\|^2 \right] = \|x\|^2.$$

To see this, let us examine the expected value of the entry $[Ax]_j^2$:

$$\mathbb{E}\left[ [Ax]_j^2 \right] = \mathbb{E}\left[ \left( \sum_{i=1}^d A_{ji} x_i \right)^2 \right] = \mathbb{E}\left[ \sum_{i, i'} A_{ji} A_{ji'} x_i x_{i'} \right] = \mathbb{E}\left[ \sum_i A_{ji}^2 x_i^2 \right] = \sum_i x_i^2 = \|x\|^2,$$

where the cross terms vanish since the entries are independent with mean zero. Noting that

$$\left\| \tfrac{1}{\sqrt{k}} A x \right\|^2 = \frac{1}{k} \sum_{j=1}^k [Ax]_j^2$$

proves the first claim. (Note that all we require for this step is independence and unit variance of the entries of $A$.)

Since $[Ax]_j$ is a linear combination of independent Gaussians, the above also shows that $\tilde{Z}_j = [Ax]_j / \|x\|$ is distributed as $N(0,1)$, and the $\tilde{Z}_j$ are independent (the rows of $A$ are independent). We now bound the failure probability of one side:

$$\Pr\left( \left\| \tfrac{1}{\sqrt{k}} A x \right\|^2 > (1 + \epsilon) \|x\|^2 \right) = \Pr\left( \sum_{j=1}^k \tilde{Z}_j^2 > (1 + \epsilon) k \right) = \Pr\left( \chi_k^2 > (1 + \epsilon) k \right),$$

where $\chi_k^2$ is the chi-squared distribution with $k$ degrees of freedom. Now we appeal to a concentration result below, which bounds this probability by

$$\exp\left( -\tfrac{k}{4} (\epsilon^2 - \epsilon^3) \right).$$

A similar argument handles the other side (and gives the factor of $2$ in the bound).

The following lemma for $\chi^2$-distributions was used in the above proof.

Lemma 1.3. We have that:

$$\Pr\left( \chi_k^2 \ge (1 + \epsilon) k \right) \le \exp\left( -\tfrac{k}{4} (\epsilon^2 - \epsilon^3) \right)$$
$$\Pr\left( \chi_k^2 \le (1 - \epsilon) k \right) \le \exp\left( -\tfrac{k}{4} (\epsilon^2 - \epsilon^3) \right)$$
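Before turning to the proof, a quick Monte Carlo sanity check of Lemma 1.3 (not part of the notes; $k$, $\epsilon$, and the number of trials are arbitrary): both empirical tails should fall below the stated bound.

    import numpy as np

    rng = np.random.default_rng(1)

    k, eps, trials = 200, 0.2, 100_000        # arbitrary demo parameters
    chi2 = rng.chisquare(df=k, size=trials)   # chi^2_k = sum of k squared N(0,1) variables

    upper = np.mean(chi2 >= (1 + eps) * k)    # empirical Pr(chi^2_k >= (1 + eps) k)
    lower = np.mean(chi2 <= (1 - eps) * k)    # empirical Pr(chi^2_k <= (1 - eps) k)
    bound = np.exp(-(k / 4) * (eps ** 2 - eps ** 3))
    print(upper, lower, bound)                # both empirical tails fall below the bound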

Proof. Let $Z_1, Z_2, \ldots, Z_k$ be i.i.d. $N(0,1)$ random variables. By Markov's inequality,

$$\Pr\left( \chi_k^2 \ge (1 + \epsilon) k \right) = \Pr\left( \sum_{i=1}^k Z_i^2 \ge (1 + \epsilon) k \right) = \Pr\left( e^{\lambda \sum_{i=1}^k Z_i^2} \ge e^{(1 + \epsilon) k \lambda} \right) \le \frac{\mathbb{E}\left[ e^{\lambda \sum_{i=1}^k Z_i^2} \right]}{e^{(1 + \epsilon) k \lambda}} = \left( \mathbb{E}\left[ e^{\lambda Z_1^2} \right] \right)^k e^{-(1 + \epsilon) k \lambda} = \left( \frac{1}{1 - 2\lambda} \right)^{k/2} e^{-(1 + \epsilon) k \lambda},$$

where the last step follows from evaluating the expectation (this is just the moment generating function of $Z_1^2$), which holds for $0 < \lambda < 1/2$. Choosing $\lambda = \frac{\epsilon}{2(1 + \epsilon)}$, which minimizes the above expression (and is less than $1/2$ as required), we have:

$$\Pr\left( \chi_k^2 \ge (1 + \epsilon) k \right) \le \left( (1 + \epsilon) e^{-\epsilon} \right)^{k/2} \le \exp\left( -\tfrac{k}{4} (\epsilon^2 - \epsilon^3) \right),$$

using the upper bound $1 + \epsilon \le \exp\left( \epsilon - (\epsilon^2 - \epsilon^3)/2 \right)$. The other bound is proved in a similar manner.

The following lemma shows that nothing is fundamental about using the Gaussian in particular: many distributions with unit variance and certain boundedness properties (or higher-order moment conditions) suffice.

Lemma 1.4. Assume for $A \in \mathbb{R}^{k \times d}$ that each $A_{ij}$ is uniform on $\{-1, +1\}$. Then for any vector $x \in \mathbb{R}^d$:

$$\Pr\left( \left\| \tfrac{1}{\sqrt{k}} A x \right\|^2 \ge (1 + \epsilon) \|x\|^2 \right) \le \exp\left( -\tfrac{k}{4} (\epsilon^2 - \epsilon^3) \right)$$
$$\Pr\left( \left\| \tfrac{1}{\sqrt{k}} A x \right\|^2 \le (1 - \epsilon) \|x\|^2 \right) \le \exp\left( -\tfrac{k}{4} (\epsilon^2 - \epsilon^3) \right)$$
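As an illustration of Lemma 1.4, here is a minimal sketch (with arbitrary sizes, not from the notes) of the sign-matrix variant, which avoids Gaussian sampling entirely:

    import numpy as np

    rng = np.random.default_rng(2)

    k, d, eps = 500, 5_000, 0.25                  # arbitrary demo sizes
    A = rng.choice([-1.0, 1.0], size=(k, d))      # each A_ij uniform on {-1, +1}
    x = rng.normal(size=d)                        # an arbitrary vector in R^d

    ratio = np.sum((A @ x) ** 2) / (k * np.sum(x ** 2))   # ||(1/sqrt(k)) A x||^2 / ||x||^2
    print(ratio)  # close to 1; outside [1 - eps, 1 + eps] w.p. <= 2 exp(-(k/4)(eps^2 - eps^3))

In practice the sign matrix is attractive because it can be stored in one bit per entry and applied with additions and subtractions only.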
