
Randomness
DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis
http://www.cims.nyu.edu/~cfgranda/pages/OBDA_fall17/index.html
Carlos Fernandez-Granda

Outline: Gaussian random variables, Gaussian random vectors, Randomized projections, SVD of a random matrix, Randomized SVD


  1–4. Chebyshev bound

Let $y := \|\vec{x}\|_2^2$, where $\vec{x}$ is an iid standard Gaussian vector of dimension $k$, so that $E(y) = k$ and $\operatorname{Var}(y) = 2k$. Then

$$
\begin{aligned}
P(|y - k| \geq k\epsilon) &= P\big((y - E(y))^2 \geq k^2\epsilon^2\big) \\
&\leq \frac{E\big((y - E(y))^2\big)}{k^2\epsilon^2} && \text{by Markov's inequality} \\
&= \frac{\operatorname{Var}(y)}{k^2\epsilon^2} \\
&= \frac{2}{k\epsilon^2}
\end{aligned}
$$
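This bound is easy to sanity-check numerically. Below is a minimal Monte Carlo sketch, assuming NumPy; the values of $k$, $\epsilon$, and the number of trials are arbitrary choices, not from the slides:

```python
import numpy as np

# Monte Carlo check of the Chebyshev bound on y = ||x||_2^2
rng = np.random.default_rng(0)
k, eps, trials = 100, 0.5, 100_000

x = rng.standard_normal((trials, k))   # iid standard Gaussian vectors
y = (x ** 2).sum(axis=1)               # y = ||x||_2^2, so E(y) = k

empirical = np.mean(np.abs(y - k) >= k * eps)
chebyshev = 2 / (k * eps ** 2)         # Var(y)/(k*eps)^2 with Var(y) = 2k
print(empirical, chebyshev)            # empirical probability <= bound
```

The gap between the two printed numbers illustrates how loose Chebyshev is here, which motivates the sharper Chernoff bound next.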

  5. Non-asymptotic Chernoff tail bound

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension $k$. For any $\epsilon > 0$

$$P\big(k(1-\epsilon) < \|\vec{x}\|_2^2 < k(1+\epsilon)\big) \geq 1 - 2\exp\left(\frac{-k\epsilon^2}{8}\right)$$

  6. Proof

Let $y := \|\vec{x}\|_2^2$. The result is implied by

$$P(y > k(1+\epsilon)) \leq \exp\left(\frac{-k\epsilon^2}{8}\right), \qquad P(y < k(1-\epsilon)) \leq \exp\left(\frac{-k\epsilon^2}{8}\right)$$

  7–11. Proof

Fix $t > 0$:

$$
\begin{aligned}
P(y > a) &= P(\exp(ty) > \exp(at)) \\
&\leq \exp(-at)\, E(\exp(ty)) && \text{by Markov's inequality} \\
&= \exp(-at)\, E\left(\exp\left(t \sum_{i=1}^{k} x_i^2\right)\right) \\
&= \exp(-at) \prod_{i=1}^{k} E\left(\exp\left(t x_i^2\right)\right) && \text{by independence of } x_1, \ldots, x_k
\end{aligned}
$$

  12. Proof

Lemma (by direct integration): for $t < 1/2$,

$$E\left(\exp\left(t x^2\right)\right) = \frac{1}{\sqrt{1 - 2t}}$$

Equivalent to controlling higher-order moments, since

$$E\left(\exp\left(t x^2\right)\right) = E\left(\sum_{i=0}^{\infty} \frac{\left(t x^2\right)^i}{i!}\right) = \sum_{i=0}^{\infty} \frac{t^i\, E\left(x^{2i}\right)}{i!}$$

  13. Proof

Fix $t > 0$:

$$P(y > a) \leq \exp(-at) \prod_{i=1}^{k} E\left(\exp\left(t x_i^2\right)\right) = \frac{\exp(-at)}{(1 - 2t)^{k/2}}$$

  14. Proof

Setting $a := k(1+\epsilon)$ and $t := \frac{1}{2} - \frac{1}{2(1+\epsilon)}$, we conclude

$$P(y > k(1+\epsilon)) \leq (1+\epsilon)^{k/2} \exp\left(\frac{-k\epsilon}{2}\right) \leq \exp\left(\frac{-k\epsilon^2}{8}\right)$$
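The resulting tail bound can also be compared against simulation. A minimal sketch, assuming NumPy; the parameter values are illustrative:

```python
import numpy as np

# Compare the empirical upper tail of y = ||x||_2^2 with the
# Chernoff bound exp(-k eps^2 / 8)
rng = np.random.default_rng(0)
k, eps, trials = 100, 0.3, 200_000

y = (rng.standard_normal((trials, k)) ** 2).sum(axis=1)

empirical = np.mean(y > k * (1 + eps))
chernoff = np.exp(-k * eps ** 2 / 8)
print(empirical, chernoff)  # the exponential bound dominates the tail
```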

  15. Projection onto a fixed subspace

[Figure: a noise vector $\vec{z}$ projected onto two fixed subspaces $S_1$ and $S_2$, showing $P_{S_1}\vec{z}$ and $P_{S_2}\vec{z}$]

$$0.007 = \frac{\|P_{S_1}\vec{z}\|_2}{\|\vec{z}\|_2} < \frac{\|P_{S_2}\vec{z}\|_2}{\|\vec{z}\|_2} = 0.043, \qquad \frac{0.043}{0.007} = 6.14 \approx \sqrt{\frac{\dim(S_2)}{\dim(S_1)}} \quad \text{(not a coincidence)}$$

  16. Projection onto a fixed subspace

Let $S$ be a $k$-dimensional subspace of $\mathbb{R}^n$ and $\vec{z} \in \mathbb{R}^n$ a vector of iid standard Gaussian noise. Then $\|P_S \vec{z}\|_2^2$ is a $\chi^2$ random variable with $k$ degrees of freedom: it has the same distribution as

$$y := \sum_{i=1}^{k} x_i^2$$

where $x_1, \ldots, x_k$ are iid standard Gaussians.

  17–24. Proof

Let $UU^T$ be a projection matrix for $S$, where the columns of $U \in \mathbb{R}^{n \times k}$ are orthonormal:

$$
\begin{aligned}
\|P_S \vec{z}\|_2^2 &= \left\|UU^T \vec{z}\right\|_2^2 \\
&= \vec{z}^T U U^T U U^T \vec{z} \\
&= \vec{z}^T U U^T \vec{z} \\
&= \vec{w}^T \vec{w} \\
&= \sum_{i=1}^{k} \vec{w}[i]^2
\end{aligned}
$$

where $\vec{w} := U^T \vec{z}$ is Gaussian with mean zero and covariance matrix

$$\Sigma_{\vec{w}} = U^T \Sigma_{\vec{z}}\, U = U^T U = I$$
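A quick empirical check of this distributional claim, as a sketch assuming NumPy and SciPy; orthonormalizing a random matrix with a QR factorization is just one convenient way to fix a $k$-dimensional subspace:

```python
import numpy as np
from scipy import stats

# The squared norm of Gaussian noise projected onto a fixed
# k-dimensional subspace should follow a chi^2_k distribution.
rng = np.random.default_rng(0)
n, k, trials = 200, 10, 50_000

# Orthonormal basis U for a fixed k-dimensional subspace of R^n
U, _ = np.linalg.qr(rng.standard_normal((n, k)))

z = rng.standard_normal((trials, n))    # iid standard Gaussian noise
proj_sq = ((z @ U) ** 2).sum(axis=1)    # ||U^T z||^2 = ||P_S z||^2

# Kolmogorov-Smirnov test against chi^2 with k degrees of freedom
print(stats.kstest(proj_sq, stats.chi2(df=k).cdf))
```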

  25. Non-asymptotic Chernoff tail bound (recap)

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension $k$. For any $\epsilon > 0$

$$P\big(k(1-\epsilon) < \|\vec{x}\|_2^2 < k(1+\epsilon)\big) \geq 1 - 2\exp\left(\frac{-k\epsilon^2}{8}\right)$$

  26. Projection onto a fixed subspace

Let $S$ be a $k$-dimensional subspace of $\mathbb{R}^n$ and $\vec{z} \in \mathbb{R}^n$ a vector of iid standard Gaussian noise. For any $\epsilon > 0$

$$P\big(k(1-\epsilon) < \|P_S \vec{z}\|_2^2 < k(1+\epsilon)\big) \geq 1 - 2\exp\left(\frac{-k\epsilon^2}{8}\right)$$

  27. Outline: Gaussian random variables, Gaussian random vectors, Randomized projections, SVD of a random matrix, Randomized SVD

  28. Dimensionality reduction

◮ PCA preserves the most energy ($\ell_2$ norm)
◮ Problem 1: Computationally expensive
◮ Problem 2: Depends on all of the data
◮ (Possible) Solution: Just project randomly!
◮ For a data set $\vec{x}_1, \vec{x}_2, \ldots \in \mathbb{R}^n$ compute $A\vec{x}_1, A\vec{x}_2, \ldots \in \mathbb{R}^k$, where $A \in \mathbb{R}^{k \times n}$ ($k < n$) has iid standard Gaussian entries (a minimal sketch follows below)
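The sketch below illustrates the idea, assuming NumPy; the data matrix `X` is a random stand-in, not a real data set:

```python
import numpy as np

# Randomized dimensionality reduction: project p points in R^n
# down to R^k with a k x n iid standard Gaussian matrix A.
rng = np.random.default_rng(0)
p, n, k = 500, 1000, 20

X = rng.standard_normal((p, n))        # stand-in for real data (rows = points)
A = rng.standard_normal((k, n))        # random projection matrix

X_proj = X @ A.T                       # row i is A x_i in R^k
print(X_proj.shape)                    # (500, 20)
```

Unlike PCA, $A$ is drawn without ever looking at the data, so it is cheap to generate and addresses both problems above.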

  29–30. Fixed vector

Let $A$ be an $a \times b$ matrix with iid standard Gaussian entries. If $\vec{v} \in \mathbb{R}^b$ is a deterministic vector with unit $\ell_2$ norm, then $A\vec{v}$ is an $a$-dimensional iid standard Gaussian vector.

Proof: $(A\vec{v})[i]$, $1 \leq i \leq a$, is Gaussian with mean zero and variance

$$\operatorname{Var}\big((A\vec{v})[i]\big) = \vec{v}^T \Sigma_{A_{i,:}} \vec{v} = \vec{v}^T I \vec{v} = \|\vec{v}\|_2^2 = 1$$
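This is straightforward to check empirically. A sketch assuming NumPy; since each entry of $A\vec{v}$ is an independent row of $A$ dotted with $\vec{v}$, one tall matrix suffices:

```python
import numpy as np

# For a fixed unit-norm v, the entries of A v are iid standard
# Gaussians; check mean and standard deviation empirically.
rng = np.random.default_rng(0)
a, b = 100_000, 50

v = rng.standard_normal(b)
v /= np.linalg.norm(v)            # deterministic unit-norm vector

A = rng.standard_normal((a, b))   # a x b iid standard Gaussian matrix
Av = A @ v                        # a-dimensional vector

print(Av.mean(), Av.std())        # both close to 0 and 1
```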

  31. Non-asymptotic Chernoff tail bound (recap)

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension $k$. For any $\epsilon > 0$

$$P\big(k(1-\epsilon) < \|\vec{x}\|_2^2 < k(1+\epsilon)\big) \geq 1 - 2\exp\left(\frac{-k\epsilon^2}{8}\right)$$

  32. Fixed vector

Let $A$ be an $a \times b$ matrix with iid standard Gaussian entries. For any $\vec{v} \in \mathbb{R}^b$ with unit norm and any $\epsilon \in (0, 1)$

$$\sqrt{a(1-\epsilon)} \leq \|A\vec{v}\|_2 \leq \sqrt{a(1+\epsilon)}$$

with probability at least $1 - 2\exp\left(-a\epsilon^2/8\right)$

  33. Johnson-Lindenstrauss lemma

Let $A$ be a $k \times n$ matrix with iid standard Gaussian entries and let $\vec{x}_1, \ldots, \vec{x}_p \in \mathbb{R}^n$ be any fixed set of $p$ deterministic vectors. For any pair $\vec{x}_i, \vec{x}_j$ and any $\epsilon \in (0, 1)$

$$(1-\epsilon)\, \|\vec{x}_i - \vec{x}_j\|_2^2 \leq \left\|\frac{1}{\sqrt{k}} A \vec{x}_i - \frac{1}{\sqrt{k}} A \vec{x}_j\right\|_2^2 \leq (1+\epsilon)\, \|\vec{x}_i - \vec{x}_j\|_2^2$$

with probability at least $\frac{1}{p}$ as long as

$$k \geq \frac{16 \log(p)}{\epsilon^2}$$
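An empirical check of the distortion guarantee, as a sketch assuming NumPy; $p$, $n$, and $\epsilon$ are illustrative choices, and $k$ is set by the bound in the lemma:

```python
import numpy as np

# Check the Johnson-Lindenstrauss distortion on a small point set.
rng = np.random.default_rng(0)
p, n, eps = 50, 1000, 0.5
k = int(np.ceil(16 * np.log(p) / eps ** 2))   # k >= 16 log(p) / eps^2

X = rng.standard_normal((p, n))               # fixed set of p vectors
Y = X @ (rng.standard_normal((k, n)) / np.sqrt(k)).T  # rows: (1/sqrt(k)) A x_i

def sq_dists(M):
    """All pairwise squared l2 distances between the rows of M."""
    d = M[:, None, :] - M[None, :, :]
    return (d ** 2).sum(axis=-1)

mask = ~np.eye(p, dtype=bool)                 # exclude i = j pairs
ratios = sq_dists(Y)[mask] / sq_dists(X)[mask]
print(ratios.min(), ratios.max())             # typically in (1-eps, 1+eps)
```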

  34. Proof

Aim: control the action of $A$ on the normalized differences

$$\vec{v}_{ij} := \frac{\vec{x}_i - \vec{x}_j}{\|\vec{x}_i - \vec{x}_j\|_2}$$

Our event of interest is the intersection of the events

$$E_{ij} = \left\{ k(1-\epsilon) < \|A\vec{v}_{ij}\|_2^2 < k(1+\epsilon) \right\}, \qquad 1 \leq i < j \leq p$$

  35. Fixed vector

Let $A$ be an $a \times b$ matrix with iid standard Gaussian entries. For any $\vec{v} \in \mathbb{R}^b$ with unit norm and any $\epsilon \in (0, 1)$

$$\sqrt{a(1-\epsilon)} \leq \|A\vec{v}\|_2 \leq \sqrt{a(1+\epsilon)}$$

with probability at least $1 - 2\exp\left(-a\epsilon^2/8\right)$. This implies

$$P\left(E_{ij}^c\right) \leq \frac{2}{p^2} \qquad \text{if } k \geq \frac{16 \log(p)}{\epsilon^2}$$

  36. Union bound

For any events $S_1, S_2, \ldots, S_n$ in a probability space

$$P\left(\cup_i S_i\right) \leq \sum_{i=1}^{n} P(S_i)$$

  37–41. Proof

The number of events $E_{ij}$ equals $\binom{p}{2} = p(p-1)/2$. By the union bound

$$
\begin{aligned}
P\left(\bigcap_{i,j} E_{ij}\right) &= 1 - P\left(\bigcup_{i,j} E_{ij}^c\right) \\
&\geq 1 - \sum_{i,j} P\left(E_{ij}^c\right) \\
&\geq 1 - \frac{p(p-1)}{2} \cdot \frac{2}{p^2} \\
&= \frac{1}{p}
\end{aligned}
$$

  42. Dimensionality reduction for visualization

Motivation: visualize high-dimensional features projected onto 2D or 3D.

Example: seeds from three different varieties of wheat: Kama, Rosa, and Canadian.

Features:
◮ Area
◮ Perimeter
◮ Compactness
◮ Length of kernel
◮ Width of kernel
◮ Asymmetry coefficient
◮ Length of kernel groove

  43. Dimensionality reduction for visualization

[Figure: 2D scatter plots of the wheat-seed data, randomized projection vs. PCA]
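A sketch of how the two 2D embeddings in the figure could be computed, assuming NumPy; `X` is a random stand-in for the seeds feature matrix, which is not reproduced here:

```python
import numpy as np

# Two ways to get 2D coordinates for visualization.
rng = np.random.default_rng(0)
X = rng.standard_normal((210, 7))        # stand-in: 210 samples x 7 features
Xc = X - X.mean(axis=0)                  # center the features

# Randomized projection: two iid Gaussian directions
A = rng.standard_normal((2, 7))
coords_random = Xc @ A.T

# PCA: project onto the top two principal directions
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
coords_pca = Xc @ Vt[:2].T

print(coords_random.shape, coords_pca.shape)  # (210, 2) twice
```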

  44. Nearest neighbors in random subspace

Nearest-neighbor classification (Algorithm 4.2 in Lecture Notes 1) computes $n$ distances in $\mathbb{R}^m$ for each new example. Cost: $O(nmp)$ for $p$ examples.

Idea: use a $k \times m$ iid standard Gaussian matrix to project onto a $k$-dimensional space beforehand. Cost (see the sketch below):
◮ $kmn$ operations to project the training set
◮ $kmp$ operations to project the test set
◮ $knp$ to perform nearest-neighbor classification

Much faster!
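A sketch of the projected nearest-neighbor pipeline, assuming NumPy; the training data, labels, and test data are random placeholders, sized to match the face-recognition example on the next slide:

```python
import numpy as np

# Nearest-neighbor classification after a random projection.
rng = np.random.default_rng(0)
m, n_train, n_test, k = 4096, 360, 40, 50

X_train = rng.standard_normal((n_train, m))   # placeholder training set
y_train = rng.integers(0, 40, n_train)        # placeholder labels
X_test = rng.standard_normal((n_test, m))     # placeholder test set

A = rng.standard_normal((k, m))    # k x m iid standard Gaussian matrix
P_train = X_train @ A.T            # k*m*n operations, done once
P_test = X_test @ A.T              # k*m*p operations

# Nearest neighbor in R^k instead of R^m: k*n*p distance work
dists = np.linalg.norm(P_test[:, None, :] - P_train[None, :, :], axis=2)
y_pred = y_train[dists.argmin(axis=1)]
print(y_pred.shape)                # one predicted label per test example
```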

  45. Face recognition

Training set: 360 images of size 64 × 64 from 40 different subjects (9 each). Test set: 1 new image from each subject. We model each image as a vector in $\mathbb{R}^{4096}$ ($m = 4096$).

To classify we:
1. Project onto a random $k$-dimensional subspace
2. Apply nearest-neighbor classification using the $\ell_2$-norm distance in $\mathbb{R}^k$

  46. Performance

[Figure: number of classification errors (average, maximum, and minimum) as a function of the projection dimension $k$, from 0 to 200]

  47. Nearest neighbor in $\mathbb{R}^{50}$

[Figure: a test image, its projection, the closest projection in the training set, and the corresponding training image]

  48. Outline: Gaussian random variables, Gaussian random vectors, Randomized projections, SVD of a random matrix, Randomized SVD

  49. Singular values of an $n \times k$ matrix, $k = 100$

[Figure: scaled singular values $\sigma_i / \sqrt{n}$ plotted against $i$ for $n/k = 2, 5, 10, 20, 50, 100, 200$; the larger $n/k$ is, the more tightly the values cluster around 1]
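The behavior in the figure can be reproduced in a few lines. A sketch assuming NumPy; the $n/k$ ratios below are a subset of those in the plot:

```python
import numpy as np

# Scaled singular values of n x k iid Gaussian matrices: as n/k grows,
# sigma_i / sqrt(n) clusters around 1 (k = 100 as in the figure).
rng = np.random.default_rng(0)
k = 100
for ratio in (2, 5, 20, 200):
    n = ratio * k
    s = np.linalg.svd(rng.standard_normal((n, k)), compute_uv=False)
    print(ratio, (s / np.sqrt(n)).min(), (s / np.sqrt(n)).max())
```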
