

Randomization
DS-GA 1013 / MATH-GA 2824 Mathematical Tools for Data Science
https://cims.nyu.edu/~cfgranda/pages/MTDS_spring19/index.html
Carlos Fernandez-Granda

Topics: Motivating applications, Gaussian random variables, Randomized dimensionality reduction, Compressed sensing


1. Variance

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension $d$, so that $E\left[\|\vec{x}\|_2^2\right] = d$. Then

$$E\left[\left(\|\vec{x}\|_2^2\right)^2\right] = E\left[\left(\sum_{i=1}^{d} \vec{x}[i]^2\right)^2\right] = \sum_{i=1}^{d}\sum_{j=1}^{d} E\left[\vec{x}[i]^2\,\vec{x}[j]^2\right]$$

$$= \sum_{i=1}^{d} E\left[\vec{x}[i]^4\right] + 2\sum_{i=1}^{d-1}\sum_{j=i+1}^{d} E\left[\vec{x}[i]^2\right] E\left[\vec{x}[j]^2\right]$$

$$= 3d + d(d-1) \qquad \text{(the 4th moment of a standard Gaussian equals 3)}$$

$$= d(d+2).$$

Therefore

$$\operatorname{Var}\left(\|\vec{x}\|_2^2\right) = E\left[\left(\|\vec{x}\|_2^2\right)^2\right] - E\left[\|\vec{x}\|_2^2\right]^2 = d(d+2) - d^2 = 2d.$$

The relative standard deviation around the mean scales as $\sqrt{2/d}$. Geometrically, the probability density concentrates close to the surface of a sphere with radius $\sqrt{d}$.
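These moments are easy to check numerically. Below is a minimal Monte Carlo sketch in NumPy; the dimension and sample size are arbitrary choices.

```python
import numpy as np

# Monte Carlo check of the mean d and variance 2d of ||x||_2^2
# for an iid standard Gaussian vector (d and n_samples arbitrary).
rng = np.random.default_rng(0)
d, n_samples = 500, 20_000

y = (rng.standard_normal((n_samples, d)) ** 2).sum(axis=1)  # ||x||_2^2
print(y.mean())            # should be close to d
print(y.var())             # should be close to 2 * d
print(y.std() / y.mean())  # should be close to sqrt(2 / d)
```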

2. Non-asymptotic tail bound

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension $d$. For any $\epsilon > 0$,

$$P\left(d(1-\epsilon) < \|\vec{x}\|_2^2 < d(1+\epsilon)\right) \geq 1 - \frac{2}{d\epsilon^2}.$$

3. Markov's inequality

Let $x$ be a nonnegative random variable. For any positive constant $a > 0$,

$$P(x \geq a) \leq \frac{E(x)}{a}.$$

4. Proof

Define the indicator variable $1_{x \geq a}$. Since $x$ is nonnegative,

$$x - a\,1_{x \geq a} \geq 0,$$

so taking expectations,

$$E(x) \geq a\,E\left(1_{x \geq a}\right) = a\,P(x \geq a).$$
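As a quick empirical sanity check of the inequality, the sketch below uses an exponential variable with mean 1 as an arbitrary nonnegative example.

```python
import numpy as np

# Compare the empirical tail P(x >= a) with the Markov bound E(x)/a
# for an exponential variable with E(x) = 1 (an arbitrary choice).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)

for a in [2.0, 5.0, 10.0]:
    print(a, (x >= a).mean(), "<=", x.mean() / a)
```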

5. Chebyshev bound

Let $y := \|\vec{x}\|_2^2$, so that $E(y) = d$. Then

$$P\left(|y - d| \geq d\epsilon\right) = P\left((y - E(y))^2 \geq d^2\epsilon^2\right)$$

$$\leq \frac{E\left[(y - E(y))^2\right]}{d^2\epsilon^2} \qquad \text{by Markov's inequality}$$

$$= \frac{\operatorname{Var}(y)}{d^2\epsilon^2} = \frac{2}{d\epsilon^2}.$$

6. Non-asymptotic Chernoff tail bound

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension $d$. For any $\epsilon > 0$,

$$P\left(d(1-\epsilon) < \|\vec{x}\|_2^2 < d(1+\epsilon)\right) \geq 1 - 2\exp\left(\frac{-d\epsilon^2}{8}\right).$$
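The following sketch compares the Chebyshev bound $2/(d\epsilon^2)$ and the Chernoff bound $2\exp(-d\epsilon^2/8)$ with the empirical tail probability; the values of $d$ and $\epsilon$ are arbitrary choices. Both bounds are conservative, but the Chernoff bound decays exponentially in $d$.

```python
import numpy as np

# Empirical tail probability of ||x||^2 versus the Chebyshev and
# Chernoff bounds (d, eps, and the sample size are arbitrary).
rng = np.random.default_rng(0)
d, eps, n_samples = 200, 0.5, 50_000

y = (rng.standard_normal((n_samples, d)) ** 2).sum(axis=1)
empirical = np.mean((y <= d * (1 - eps)) | (y >= d * (1 + eps)))

print("empirical:", empirical)
print("Chebyshev:", 2 / (d * eps ** 2))
print("Chernoff: ", 2 * np.exp(-d * eps ** 2 / 8))
```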

7. Proof

Let $y := \|\vec{x}\|_2^2$. The result is implied by

$$P\left(y > d(1+\epsilon)\right) \leq \exp\left(\frac{-d\epsilon^2}{8}\right), \qquad P\left(y < d(1-\epsilon)\right) \leq \exp\left(\frac{-d\epsilon^2}{8}\right).$$

8. Proof

Fix $t > 0$:

$$P(y > a) = P\left(\exp(ty) > \exp(at)\right)$$

$$\leq \exp(-at)\,E\left(\exp(ty)\right) \qquad \text{by Markov's inequality}$$

$$= \exp(-at)\,E\left[\exp\left(t\sum_{i=1}^{d} x_i^2\right)\right]$$

$$= \exp(-at)\prod_{i=1}^{d} E\left[\exp\left(t x_i^2\right)\right] \qquad \text{by independence of } x_1, \ldots, x_d.$$

9. Proof

Lemma (by direct integration). For $t < 1/2$,

$$E\left[\exp\left(t x^2\right)\right] = \frac{1}{\sqrt{1 - 2t}}.$$

This is equivalent to controlling the higher-order moments, since

$$E\left[\exp\left(t x^2\right)\right] = \sum_{i=0}^{\infty} \frac{E\left[(t x^2)^i\right]}{i!} = \sum_{i=0}^{\infty} \frac{t^i\,E\left[x^{2i}\right]}{i!}.$$
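The lemma can be verified by numerical integration against the standard Gaussian density, as in the sketch below (the values of $t$ are arbitrary choices below $1/2$).

```python
import numpy as np
from scipy.integrate import quad

# Check E[exp(t x^2)] = 1 / sqrt(1 - 2t) for x ~ N(0, 1)
# by integrating exp(t u^2) against the Gaussian density.
def mgf_of_square(t):
    integrand = lambda u: np.exp(t * u ** 2 - u ** 2 / 2) / np.sqrt(2 * np.pi)
    return quad(integrand, -np.inf, np.inf)[0]

for t in [0.1, 0.25, 0.4]:
    print(t, mgf_of_square(t), 1 / np.sqrt(1 - 2 * t))
```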

10. Proof

Fix $0 < t < 1/2$. Combining the bound with the lemma,

$$P(y > a) \leq \exp(-at)\prod_{i=1}^{d} E\left[\exp\left(t x_i^2\right)\right] = \frac{\exp(-at)}{(1 - 2t)^{d/2}}.$$

11. Proof

Setting $a := d(1+\epsilon)$ and $t := \frac{1}{2} - \frac{1}{2(1+\epsilon)}$, we conclude

$$P\left(y > d(1+\epsilon)\right) \leq (1+\epsilon)^{d/2}\exp\left(\frac{-d\epsilon}{2}\right) \leq \exp\left(\frac{-d\epsilon^2}{8}\right),$$

where the last inequality uses $\log(1+\epsilon) \leq \epsilon - \epsilon^2/4$ for $\epsilon \in (0, 1)$.

12. Projection onto a fixed subspace

The probability density is isotropic, with total variance $d$. A projection onto a fixed $k$-dimensional subspace should capture a fraction of the variance equal to $k/d$, so the variance of the projection should be $k$.

13. Projection onto a fixed subspace

Let $S$ be a $k$-dimensional subspace of $\mathbb{R}^d$, with $U$ a matrix whose columns form an orthonormal basis of $S$, and let $\vec{x}$ be a $d$-dimensional standard Gaussian vector.

$P_S(\vec{x}) = UU^T\vec{x}$ is not a standard Gaussian vector. Its covariance

$$\Sigma_{P_S(\vec{x})} = UU^T \Sigma_{\vec{x}}\, UU^T = UU^T$$

is not full rank.

14. Projection onto a fixed subspace

The coefficients $U^T\vec{x}$ are a Gaussian vector with covariance

$$\Sigma_{U^T\vec{x}} = U^T \Sigma_{\vec{x}}\, U = U^T U = I.$$

We have

$$\|P_S(\vec{x})\|_2^2 = (UU^T\vec{x})^T UU^T\vec{x} = \|U^T\vec{x}\|_2^2.$$

For any $\epsilon > 0$,

$$P\left(k(1-\epsilon) < \|P_S(\vec{x})\|_2^2 < k(1+\epsilon)\right) \geq 1 - 2\exp\left(\frac{-k\epsilon^2}{8}\right).$$
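A minimal simulation of this concentration result, drawing the orthonormal basis $U$ via a QR factorization of a Gaussian matrix (one standard way to fix a subspace; $d$ and $k$ are arbitrary choices):

```python
import numpy as np

# Project standard Gaussian vectors onto a fixed k-dimensional
# subspace of R^d and check that ||P_S(x)||^2 concentrates around k.
rng = np.random.default_rng(0)
d, k = 1000, 50

# Orthonormal basis of a fixed subspace via QR factorization.
U, _ = np.linalg.qr(rng.standard_normal((d, k)))

x = rng.standard_normal((10_000, d))
proj_sq = ((x @ U) ** 2).sum(axis=1)  # ||U^T x||^2 = ||P_S(x)||^2
print(proj_sq.mean(), proj_sq.var())  # close to k and 2k
```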

15. Linear regression

To analyze the performance of the least-squares estimator, we assume a linear model with additive iid Gaussian noise:

$$\vec{y}_{\text{train}} := X_{\text{train}}\,\vec{\beta}_{\text{true}} + \vec{z}_{\text{train}}.$$

The LS estimator equals

$$\vec{\beta}_{\text{LS}} := \arg\min_{\vec{\beta}} \left\| \vec{y}_{\text{train}} - X_{\text{train}}\,\vec{\beta} \right\|_2.$$

16. Training error

The training error is the projection of the noise onto the orthogonal complement of the column space of $X_{\text{train}}$:

$$\vec{y}_{\text{train}} - \vec{y}_{\text{LS}} = P_{\operatorname{col}(X_{\text{train}})^\perp}\,\vec{z}_{\text{train}}.$$

The dimension of the orthogonal complement of $\operatorname{col}(X_{\text{train}})$ equals $n - p$, so

$$\text{Training RMSE} := \frac{\|\vec{y}_{\text{train}} - \vec{y}_{\text{LS}}\|_2}{\sqrt{n}} \approx \sigma\sqrt{1 - \frac{p}{n}}.$$
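A short simulation of this approximation under the assumed model; the dimensions and noise level are arbitrary choices.

```python
import numpy as np

# Fit least squares on data from the assumed linear model and
# compare the training RMSE with sigma * sqrt(1 - p/n).
rng = np.random.default_rng(0)
n, p, sigma = 500, 50, 1.0

X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + sigma * rng.standard_normal(n)

beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
rmse = np.linalg.norm(y - X @ beta_ls) / np.sqrt(n)
print(rmse, sigma * np.sqrt(1 - p / n))  # the two should be close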

17. Temperature prediction via linear regression

[Figure: average error in degrees Celsius versus the number of training data (200 to 5000). The training error tracks $\sqrt{1 - p/n}$ and the test error tracks $\sqrt{1 + p/n}$.]

18. Outline: Motivating applications, Gaussian random variables, Randomized dimensionality reduction, Compressed sensing

19. Randomized linear maps

We use Gaussian matrices as randomized linear maps from $\mathbb{R}^d$ to $\mathbb{R}^k$, $k < d$. Each entry is sampled independently from a standard Gaussian.

Question: Do we preserve distances between the points in a set? Equivalently, does any fixed vector lie in the null space?

20. Fixed vector

Let $A$ be a $k \times d$ matrix with iid standard Gaussian entries. If $\vec{v} \in \mathbb{R}^d$ is a deterministic vector with unit $\ell_2$ norm, then $A\vec{v}$ is a $k$-dimensional standard Gaussian vector.

Proof: $(A\vec{v})[i]$, $1 \leq i \leq k$, is Gaussian with mean zero and variance

$$\operatorname{Var}\left(A_{i,:}\,\vec{v}\right) = \vec{v}^T \Sigma_{A_{i,:}}\,\vec{v} = \vec{v}^T I \vec{v} = \|\vec{v}\|_2^2 = 1.$$

The rows $A_{i,:}$, $1 \leq i \leq k$, are all independent, so the entries of $A\vec{v}$ are independent.

21. Non-asymptotic Chernoff tail bound

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension $k$. For any $\epsilon > 0$,

$$P\left(k(1-\epsilon) < \|\vec{x}\|_2^2 < k(1+\epsilon)\right) \geq 1 - 2\exp\left(\frac{-k\epsilon^2}{8}\right).$$

22. Fixed vector

Let $A$ be a $k \times d$ matrix with iid standard Gaussian entries. For any $\vec{v} \in \mathbb{R}^d$ with unit norm and any $\epsilon \in (0, 1)$,

$$\sqrt{1-\epsilon} \leq \left\| \frac{1}{\sqrt{k}} A\vec{v} \right\|_2 \leq \sqrt{1+\epsilon}$$

with probability at least $1 - 2\exp\left(-k\epsilon^2/8\right)$.
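A sketch checking this concentration over many independent draws of $A$ for a single fixed unit vector; the dimensions and trial count are arbitrary choices.

```python
import numpy as np

# For a fixed unit vector v, ||A v / sqrt(k)||_2 should
# concentrate around 1 across independent Gaussian matrices A.
rng = np.random.default_rng(0)
d, k, trials = 1000, 100, 500

v = rng.standard_normal(d)
v /= np.linalg.norm(v)  # fixed deterministic unit-norm vector

norms = np.array([
    np.linalg.norm(rng.standard_normal((k, d)) @ v) / np.sqrt(k)
    for _ in range(trials)
])
print(norms.min(), norms.mean(), norms.max())
```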

23. Distance between two vectors

The result implies that if we fix two vectors $\vec{x}_1$ and $\vec{x}_2$ and define $\vec{y} := \vec{x}_2 - \vec{x}_1$, then

$$\sqrt{1-\epsilon}\,\|\vec{y}\|_2 \leq \left\| \frac{1}{\sqrt{k}} A\vec{y} \right\|_2 \leq \sqrt{1+\epsilon}\,\|\vec{y}\|_2$$

with high probability (just set $\vec{v} := \vec{y}/\|\vec{y}\|_2$).

What about distances between a set of vectors?

24. Johnson-Lindenstrauss lemma

Let $A$ be a $k \times d$ matrix with iid standard Gaussian entries, and let $\vec{x}_1, \ldots, \vec{x}_p \in \mathbb{R}^d$ be any fixed set of $p$ deterministic vectors. For every pair $\vec{x}_i$, $\vec{x}_j$ and any $\epsilon \in (0, 1)$,

$$(1-\epsilon)\,\|\vec{x}_i - \vec{x}_j\|_2^2 \leq \left\| \frac{1}{\sqrt{k}} A\vec{x}_i - \frac{1}{\sqrt{k}} A\vec{x}_j \right\|_2^2 \leq (1+\epsilon)\,\|\vec{x}_i - \vec{x}_j\|_2^2$$

with probability at least $1/p$ as long as

$$k \geq \frac{16\log(p)}{\epsilon^2}.$$

No dependence on $d$!
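A minimal demonstration of the lemma; the dimensions, $p$, and $\epsilon$ are arbitrary choices. The guarantee above only ensures success with probability at least $1/p$, but in practice the distortion is typically well within $[1-\epsilon, 1+\epsilon]$.

```python
import numpy as np

# Project p points from R^d down to R^k with k = 16 log(p) / eps^2
# and inspect the distortion of squared pairwise distances.
rng = np.random.default_rng(0)
d, p, eps = 10_000, 50, 0.5
k = int(np.ceil(16 * np.log(p) / eps ** 2))

X = rng.standard_normal((p, d))   # p fixed points in R^d
A = rng.standard_normal((k, d))   # iid standard Gaussian map
Y = X @ A.T / np.sqrt(k)          # projected points in R^k

# Squared pairwise distances after projection divided by before.
ratios = [np.sum((Y[i] - Y[j]) ** 2) / np.sum((X[i] - X[j]) ** 2)
          for i in range(p) for j in range(i + 1, p)]
print(k, min(ratios), max(ratios))  # typically within [1-eps, 1+eps]
```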

25. Proof

Aim: control the action of $A$ on the normalized differences

$$\vec{v}_{ij} := \frac{\vec{x}_i - \vec{x}_j}{\|\vec{x}_i - \vec{x}_j\|_2}.$$

Our event of interest is the intersection of the events

$$E_{ij} = \left\{ k(1-\epsilon) < \|A\vec{v}_{ij}\|_2^2 < k(1+\epsilon) \right\}, \qquad 1 \leq i < j \leq p.$$

Is it equal to $\bigcap_{i,j} E_{ij}$?
