Randomization: DS-GA 1013 / MATH-GA 2824 Mathematical Tools for Data Science


SLIDE 1

Randomization

DS-GA 1013 / MATH-GA 2824 Mathematical Tools for Data Science

https://cims.nyu.edu/~cfgranda/pages/MTDS_spring19/index.html
Carlos Fernandez-Granda

SLIDE 2

Motivating applications
Gaussian random variables
Randomized dimensionality reduction
Compressed sensing

SLIDE 3

Dimensionality reduction

Data with a large number of features can be difficult to analyze.
Data are modeled as vectors in $\mathbb{R}^p$ ($p$ very large).
Aim: reduce the dimensionality of the representation.
The SVD provides the optimal subspace for dimensionality reduction.
Problem: it is computationally expensive, and we must see the whole dataset beforehand.
What if we instead compute inner products with some random vectors? (A sketch of this idea follows below.)
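A minimal NumPy sketch of this idea, not part of the slides; the data matrix X below is a random placeholder standing in for a real dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 500, 10_000, 2           # n points in R^p, reduced to R^k
X = rng.standard_normal((n, p))    # placeholder dataset (rows are data vectors)

# Inner products with k random Gaussian directions give the reduced representation
W = rng.standard_normal((p, k))
X_reduced = X @ W / np.sqrt(k)     # 1/sqrt(k) scaling preserves norms on average
print(X_reduced.shape)             # (500, 2)
```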

SLIDE 4

Dimensionality reduction for visualization

Motivation: visualize high-dimensional features projected onto 2D or 3D.
Example: seeds from three different varieties of wheat: Kama, Rosa and Canadian.
Features:

◮ Area
◮ Perimeter
◮ Compactness
◮ Length of kernel
◮ Width of kernel
◮ Asymmetry coefficient
◮ Length of kernel groove

SLIDE 5

Projection onto the first two PCs

[Scatter plot: seed data projected onto the first and second principal components]

SLIDE 6

Projection onto two random vectors

[Scatter plot: seed data projected onto two random vectors]

SLIDE 7

Projection onto two random vectors

[Scatter plot: seed data projected onto a different draw of two random vectors]

SLIDE 8

Projection onto two random vectors

[Scatter plot: seed data projected onto another draw of two random vectors]

SLIDE 9

Compressed sensing in MRI

An important goal in MRI is to reduce scan time.
This can be achieved by measuring fewer frequency coefficients.
What happens if we undersample in the Fourier domain?

SLIDE 10

MR image

[MR image; axes t1, t2 (cm)]

SLIDE 11

Fourier coefficients

[Magnitude of the 2D Fourier coefficients; axes k1, k2 (1/cm); logarithmic color scale]

SLIDE 12

2× regular undersampling

[Fourier coefficients after 2× regular undersampling; axes k1, k2 (1/cm); logarithmic color scale]

SLIDE 13

Recovered image

[Image recovered from the regularly undersampled coefficients; axes t1, t2 (cm)]

SLIDE 14

2× random undersampling

[Fourier coefficients after 2× random undersampling; axes k1, k2 (1/cm); logarithmic color scale]

SLIDE 15

Recovered image

[Image recovered from the randomly undersampled coefficients; axes t1, t2 (cm)]

SLIDE 16

Motivating applications
Gaussian random variables
Randomized dimensionality reduction
Compressed sensing

SLIDE 17

Gaussian random variables

The pdf of a Gaussian or normal random variable with mean $\mu$ and standard deviation $\sigma$ is given by

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

A standard Gaussian has $\mu := 0$ and $\sigma := 1$

SLIDE 18

Gaussian random variables

[Plot of the Gaussian pdf $f_X(x)$ for $\mu = 2, \sigma = 1$; $\mu = 0, \sigma = 2$; and $\mu = 0, \sigma = 4$]

SLIDE 19

Linear transformation of Gaussian

If $x$ is a Gaussian random variable with mean $\mu$ and standard deviation $\sigma$, then for any $a, b \in \mathbb{R}$

$$y := ax + b$$

is a Gaussian random variable with mean $a\mu + b$ and standard deviation $|a|\,\sigma$
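A quick numerical sanity check of this fact, not part of the slides (a NumPy sketch; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, a, b = 2.0, 1.5, -3.0, 4.0

x = rng.normal(mu, sigma, size=1_000_000)
y = a * x + b

print(y.mean(), a * mu + b)        # both close to -2.0
print(y.std(), abs(a) * sigma)     # both close to 4.5
```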

SLIDES 20-26

Proof

Let $a > 0$ (the proof for $a < 0$ is very similar). Then

$$F_y(y) = P(y \le y) = P(ax + b \le y) = P\left(x \le \frac{y - b}{a}\right) = \int_{-\infty}^{\frac{y-b}{a}} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx = \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}\,a\sigma}\, e^{-\frac{(w-a\mu-b)^2}{2a^2\sigma^2}}\, dw$$

by the change of variables $w = ax + b$. Differentiating with respect to $y$:

$$f_y(y) = \frac{1}{\sqrt{2\pi}\,a\sigma}\, e^{-\frac{(y-a\mu-b)^2}{2a^2\sigma^2}}$$

SLIDE 27

Gaussian random vector

A Gaussian random vector $\vec{x}$ is a random vector with joint pdf

$$f_{\vec{x}}(\vec{x}) = \frac{1}{\sqrt{(2\pi)^d\,|\Sigma|}} \exp\left(-\frac{1}{2}\,(\vec{x}-\vec{\mu})^T \Sigma^{-1} (\vec{x}-\vec{\mu})\right)$$

where $\vec{\mu} \in \mathbb{R}^d$ is the mean and $\Sigma \in \mathbb{R}^{d\times d}$ the covariance matrix.

A standard Gaussian vector has $\vec{\mu} := \vec{0}$ and $\Sigma := I$

SLIDE 28

Uncorrelation implies independence

If the covariance matrix is diagonal,

$$\Sigma_{\vec{x}} = \begin{pmatrix} \sigma_1^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_d^2 \end{pmatrix},$$

the entries are independent

SLIDE 29

Proof

$$\Sigma_{\vec{x}}^{-1} = \begin{pmatrix} \frac{1}{\sigma_1^2} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \frac{1}{\sigma_d^2} \end{pmatrix}, \qquad |\Sigma| = \prod_{i=1}^{d} \sigma_i^2$$

SLIDES 30-33

Proof

$$f_{\vec{x}}(\vec{x}) = \frac{1}{\sqrt{(2\pi)^d\,|\Sigma|}} \exp\left(-\frac{1}{2}\,(\vec{x}-\vec{\mu})^T \Sigma^{-1} (\vec{x}-\vec{\mu})\right) = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(\vec{x}_i - \mu_i)^2}{2\sigma_i^2}\right) = \prod_{i=1}^{d} f_{\vec{x}_i}(\vec{x}_i)$$

SLIDE 34

Linear transformations

Let $\vec{x}$ be a Gaussian random vector of dimension $d$ with mean $\vec{\mu}$ and covariance matrix $\Sigma$. For any matrix $A \in \mathbb{R}^{m\times d}$ and $\vec{b} \in \mathbb{R}^m$,

$$\vec{y} = A\vec{x} + \vec{b}$$

is Gaussian with mean $A\vec{\mu} + \vec{b}$ and covariance matrix $A\Sigma A^T$ (as long as $A\Sigma A^T$ is full rank).

This is why the Fourier and wavelet coefficients of Gaussian noise are also Gaussian noise.
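A short simulation sketch, not from the slides, comparing the empirical covariance of $A\vec{x} + \vec{b}$ with $A\Sigma A^T$ (NumPy; A, b, and Σ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 4, 2
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)

# Random positive-definite covariance matrix for x
L = rng.standard_normal((d, d))
Sigma = L @ L.T

x = rng.multivariate_normal(np.zeros(d), Sigma, size=200_000)
y = x @ A.T + b

print(np.cov(y.T))        # empirical covariance of y
print(A @ Sigma @ A.T)    # theoretical covariance, approximately equal
```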

SLIDE 35

Subvectors are also Gaussian

[Plot of a bivariate Gaussian pdf $f_{\vec{x}}(x, y)$ together with the marginal pdfs $f_{\vec{x}[1]}(x)$ and $f_{\vec{x}[2]}(y)$]

SLIDE 36

Audio data

[Audio waveform between 4.0 s and 4.5 s; amplitude roughly between -7500 and 7500]

SLIDE 37

DFT

[Magnitude of the DFT of the audio segment; frequency axis from -4000 to 4000 Hz; logarithmic magnitude scale]

SLIDE 38

Noisy image

SLIDE 39

Wavelet coefficients

SLIDES 40-42

Direction of iid standard Gaussian vectors

If the covariance matrix of a Gaussian vector $\vec{x}$ is $I$, then $\vec{x}$ is isotropic: it does not favor any direction.
For any orthogonal matrix $U$, $U\vec{x}$ has the same distribution: Gaussian with mean $U\vec{0} = \vec{0}$ and covariance matrix $UIU^T = UU^T = I$

SLIDE 43

Magnitude of iid standard Gaussian vectors

In low dimensions the joint pdf is mostly concentrated around the origin. What about in high dimensions?

SLIDE 44

ℓ2 norm of samples

[Log-log plot: ℓ2 norm of standard Gaussian samples versus dimension, for dimensions between 10^1 and 10^3]

SLIDE 45

χ2 random variable

A χ2 (chi-squared) random variable with $d$ degrees of freedom is defined as

$$y := \sum_{i=1}^{d} x_i^2$$

where $x_1, \ldots, x_d$ are standard Gaussians.
It is equal to the squared ℓ2 norm of a $d$-dimensional standard Gaussian vector.
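A small simulation sketch, not from the slides, illustrating that the squared norm of a $d$-dimensional standard Gaussian vector concentrates around $d$ (NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (10, 100, 1_000):
    x = rng.standard_normal((2_000, d))       # 2000 samples in dimension d
    sq_norms = (x ** 2).sum(axis=1)           # chi-squared with d degrees of freedom
    print(d, sq_norms.mean() / d, sq_norms.std() / d)  # mean/d -> 1, std/d -> sqrt(2/d)
```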

SLIDE 46

Squared ℓ2 norm divided by d

[Plot of the pdf $f_{y/d}(x)$ for $d = 10, 20, 50, 100$]

SLIDES 47-50

Mean

$$E\left(||\vec{x}||_2^2\right) = E\left(\sum_{i=1}^{d} \vec{x}[i]^2\right) = \sum_{i=1}^{d} E\left(\vec{x}[i]^2\right) = d$$

SLIDES 51-57

Variance

$$E\left(\left(||\vec{x}||_2^2\right)^2\right) = E\left(\left(\sum_{i=1}^{d} \vec{x}[i]^2\right)^2\right) = E\left(\sum_{i=1}^{d}\sum_{j=1}^{d} \vec{x}[i]^2\,\vec{x}[j]^2\right) = \sum_{i=1}^{d}\sum_{j=1}^{d} E\left(\vec{x}[i]^2\,\vec{x}[j]^2\right)$$

$$= \sum_{i=1}^{d} E\left(\vec{x}[i]^4\right) + 2\sum_{i=1}^{d-1}\sum_{j=i+1}^{d} E\left(\vec{x}[i]^2\right)\,E\left(\vec{x}[j]^2\right) = 3d + d(d-1) = d(d+2)$$

where we use that the 4th moment of a standard Gaussian equals 3.

SLIDE 58

Variance

$$\mathrm{Var}\left(||\vec{x}||_2^2\right) = E\left(\left(||\vec{x}||_2^2\right)^2\right) - E\left(||\vec{x}||_2^2\right)^2 = d(d+2) - d^2 = 2d$$

The relative standard deviation around the mean scales as $\sqrt{2/d}$.
Geometrically, the probability density concentrates close to the surface of a sphere with radius $\sqrt{d}$.

SLIDE 59

Non-asymptotic tail bound

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension $d$. For any $\epsilon > 0$

$$P\left(d\,(1-\epsilon) < ||\vec{x}||_2^2 < d\,(1+\epsilon)\right) \ge 1 - \frac{2}{d\epsilon^2}$$

SLIDE 60

Markov's inequality

Let $x$ be a nonnegative random variable. For any positive constant $a > 0$,

$$P(x \ge a) \le \frac{E(x)}{a}$$

SLIDES 61-62

Proof

Define the indicator variable $1_{x \ge a}$. Since $x - a\,1_{x \ge a} \ge 0$,

$$E(x) \ge a\,E(1_{x \ge a}) = a\,P(x \ge a)$$

SLIDES 63-67

Chebyshev bound

Let $y := ||\vec{x}||_2^2$. Then

$$P(|y - d| \ge d\epsilon) = P\left((y - E(y))^2 \ge d^2\epsilon^2\right) \le \frac{E\left((y - E(y))^2\right)}{d^2\epsilon^2} = \frac{\mathrm{Var}(y)}{d^2\epsilon^2} = \frac{2}{d\epsilon^2}$$

where the inequality is Markov's inequality.

SLIDE 68

Non-asymptotic Chernoff tail bound

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension $d$. For any $\epsilon > 0$

$$P\left(d\,(1-\epsilon) < ||\vec{x}||_2^2 < d\,(1+\epsilon)\right) \ge 1 - 2\exp\left(-\frac{d\epsilon^2}{8}\right)$$
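A sketch, not from the slides, comparing the empirical tail probability with the Chernoff bound above (NumPy; d and ε are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps, trials = 200, 0.5, 50_000

sq_norms = (rng.standard_normal((trials, d)) ** 2).sum(axis=1)
outside = np.mean((sq_norms <= d * (1 - eps)) | (sq_norms >= d * (1 + eps)))
bound = 2 * np.exp(-d * eps**2 / 8)
print(outside, bound)   # empirical failure probability vs. the upper bound
```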

SLIDE 69

Proof

Let $y := ||\vec{x}||_2^2$. The result is implied by

$$P(y > d\,(1+\epsilon)) \le \exp\left(-\frac{d\epsilon^2}{8}\right), \qquad P(y < d\,(1-\epsilon)) \le \exp\left(-\frac{d\epsilon^2}{8}\right)$$

SLIDES 70-74

Proof

Fix $t > 0$. Then

$$P(y > a) = P(\exp(ty) > \exp(at)) \le \exp(-at)\,E(\exp(ty)) \quad \text{(Markov's inequality)}$$

$$= \exp(-at)\,E\left(\exp\left(\sum_{i=1}^{d} t x_i^2\right)\right) = \exp(-at)\prod_{i=1}^{d} E\left(\exp\left(t x_i^2\right)\right) \quad \text{(independence of } x_1, \ldots, x_d\text{)}$$

SLIDE 75

Proof

Lemma (by direct integration): for $t < 1/2$

$$E\left(\exp\left(t x^2\right)\right) = \frac{1}{\sqrt{1 - 2t}}$$

This is equivalent to controlling higher-order moments, since

$$E\left(\exp\left(t x^2\right)\right) = E\left(\sum_{i=0}^{\infty} \frac{t^i x^{2i}}{i!}\right) = \sum_{i=0}^{\infty} \frac{t^i\, E\left(x^{2i}\right)}{i!}$$

SLIDE 76

Proof

Fix $t > 0$:

$$P(y > a) \le \exp(-at)\prod_{i=1}^{d} E\left(\exp\left(t x_i^2\right)\right) = \exp(-at)\,(1 - 2t)^{-\frac{d}{2}}$$

SLIDE 77

Proof

Setting $a := d\,(1+\epsilon)$ and $t := \frac{1}{2} - \frac{1}{2(1+\epsilon)}$, we conclude

$$P(y > d\,(1+\epsilon)) \le (1+\epsilon)^{\frac{d}{2}} \exp\left(-\frac{d\epsilon}{2}\right) \le \exp\left(-\frac{d\epsilon^2}{8}\right)$$

SLIDE 78

Projection onto a fixed subspace

The probability density is isotropic and has variance $d$.
A projection onto a fixed $k$-dimensional subspace should capture a fraction of the variance equal to $k/d$.
The variance of the projection should therefore be $k$.

SLIDES 79-81

Projection onto a fixed subspace

Let $S$ be a $k$-dimensional subspace of $\mathbb{R}^d$, with the columns of $U$ forming an orthonormal basis of $S$, and let $\vec{x}$ be a $d$-dimensional standard Gaussian vector.

$P_S(\vec{x}) = UU^T\vec{x}$ is not a Gaussian vector. Its covariance matrix

$$\Sigma_{P_S(\vec{x})} = UU^T \Sigma_{\vec{x}}\, UU^T = UU^T$$

is not full rank.

SLIDES 82-84

Projection onto a fixed subspace

The coefficients $U^T\vec{x}$ are a Gaussian vector with covariance matrix

$$\Sigma_{U^T\vec{x}} = U^T \Sigma_{\vec{x}}\, U = U^T U = I$$

We have

$$||P_S(\vec{x})||_2^2 = (UU^T\vec{x})^T UU^T\vec{x} = \left\|U^T\vec{x}\right\|_2^2$$

For any $\epsilon > 0$

$$P\left(k\,(1-\epsilon) < ||P_S(\vec{x})||_2^2 < k\,(1+\epsilon)\right) \ge 1 - 2\exp\left(-\frac{k\epsilon^2}{8}\right)$$
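A quick simulation sketch of this concentration result, not from the slides; the fixed subspace is spanned by an orthonormal basis obtained via QR (an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, trials = 1_000, 50, 2_000

# Orthonormal basis U of a fixed k-dimensional subspace of R^d
U, _ = np.linalg.qr(rng.standard_normal((d, k)))
x = rng.standard_normal((trials, d))
proj_sq_norms = ((x @ U) ** 2).sum(axis=1)   # ||U^T x||^2 = ||P_S(x)||^2

print(proj_sq_norms.mean())   # close to k = 50
print(proj_sq_norms.std())    # close to sqrt(2k) = 10
```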

SLIDE 85

Linear regression

To analyze the performance of the least-squares estimator we assume a linear model with additive iid Gaussian noise:

$$\vec{y}_{\mathrm{train}} := X_{\mathrm{train}}\,\vec{\beta}_{\mathrm{true}} + \vec{z}_{\mathrm{train}}$$

The LS estimator equals

$$\vec{\beta}_{\mathrm{LS}} := \arg\min_{\vec{\beta}}\, \left\|\vec{y}_{\mathrm{train}} - X_{\mathrm{train}}\,\vec{\beta}\right\|_2$$

SLIDE 86

Training error

The training error is the projection of the noise onto the orthogonal complement of the column space of $X_{\mathrm{train}}$:

$$\vec{y}_{\mathrm{train}} - \vec{y}_{\mathrm{LS}} = P_{\mathrm{col}(X_{\mathrm{train}})^\perp}\, \vec{z}_{\mathrm{train}}$$

The dimension of the orthogonal complement of $\mathrm{col}(X_{\mathrm{train}})$ equals $n - p$, so

$$\mathrm{Training\ RMSE} := \sqrt{\frac{||\vec{y}_{\mathrm{train}} - \vec{y}_{\mathrm{LS}}||_2^2}{n}} \approx \sigma\sqrt{1 - \frac{p}{n}}$$
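A sketch, not from the slides, checking this approximation on synthetic data (NumPy; n, p, σ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 1_000, 200, 2.0

X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + sigma * rng.standard_normal(n)

beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
rmse = np.sqrt(np.mean((y - X @ beta_ls) ** 2))
print(rmse, sigma * np.sqrt(1 - p / n))   # both close to 2 * sqrt(0.8)
```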

SLIDE 87

Temperature prediction via linear regression

[Plot: average error (deg Celsius) versus number of training data (200 to 5000); training and test errors compared with the curves $\sqrt{1 - p/n}$ and $\sqrt{1 + p/n}$]

SLIDE 88

Motivating applications
Gaussian random variables
Randomized dimensionality reduction
Compressed sensing

SLIDE 89

Randomized linear maps

We use Gaussian matrices as randomized linear maps from $\mathbb{R}^d$ to $\mathbb{R}^k$, $k < d$. Each entry is sampled independently from a standard Gaussian.
Question: do we preserve the distances between the points in a set? Equivalently, are any fixed vectors in the null space?

SLIDES 90-93

Fixed vector

Let $A$ be a $k \times d$ matrix with iid standard Gaussian entries. If $\vec{v} \in \mathbb{R}^d$ is a deterministic vector with unit $\ell_2$ norm, then $A\vec{v}$ is a $k$-dimensional standard Gaussian vector.

Proof: $(A\vec{v})[i]$, $1 \le i \le k$, is Gaussian with mean zero and variance

$$\mathrm{Var}\left(A_{i,:}^T\,\vec{v}\right) = \vec{v}^T \Sigma_{A_{i,:}}\, \vec{v} = \vec{v}^T I\, \vec{v} = ||\vec{v}||_2^2 = 1$$

The rows $A_{i,:}$, $1 \le i \le k$, are all independent.

SLIDE 94

Non-asymptotic Chernoff tail bound

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension $k$. For any $\epsilon > 0$

$$P\left(k\,(1-\epsilon) < ||\vec{x}||_2^2 < k\,(1+\epsilon)\right) \ge 1 - 2\exp\left(-\frac{k\epsilon^2}{8}\right)$$

SLIDE 95

Fixed vector

Let $A$ be a $k \times d$ matrix with iid standard Gaussian entries. For any $\vec{v} \in \mathbb{R}^d$ with unit norm and any $\epsilon \in (0,1)$

$$\sqrt{1-\epsilon} \le \left\|\frac{1}{\sqrt{k}}\,A\vec{v}\right\|_2 \le \sqrt{1+\epsilon}$$

with probability at least $1 - 2\exp\left(-k\epsilon^2/8\right)$
SLIDE 96

Distance between two vectors

The result implies that if we fix two vectors $\vec{x}_1$ and $\vec{x}_2$ and define $\vec{y} := \vec{x}_2 - \vec{x}_1$, then

$$\sqrt{1-\epsilon}\,||\vec{y}||_2 \le \left\|\frac{1}{\sqrt{k}}\,A\vec{y}\right\|_2 \le \sqrt{1+\epsilon}\,||\vec{y}||_2$$

with high probability (just set $\vec{v} := \vec{y}/||\vec{y}||_2$). What about the distances between a set of vectors?

SLIDES 97-98

Johnson-Lindenstrauss lemma

Let $A$ be a $k \times d$ matrix with iid standard Gaussian entries, and let $\vec{x}_1, \ldots, \vec{x}_p \in \mathbb{R}^d$ be any fixed set of $p$ deterministic vectors. For any pair $\vec{x}_i$, $\vec{x}_j$ and any $\epsilon \in (0,1)$

$$(1-\epsilon)\,||\vec{x}_i - \vec{x}_j||_2^2 \le \left\|\frac{1}{\sqrt{k}}\,A\vec{x}_i - \frac{1}{\sqrt{k}}\,A\vec{x}_j\right\|_2^2 \le (1+\epsilon)\,||\vec{x}_i - \vec{x}_j||_2^2$$

with probability at least $\frac{1}{p}$ as long as

$$k \ge \frac{16\log(p)}{\epsilon^2}$$

No dependence on d!
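An illustration sketch of the lemma, not from the slides: project p points from a high dimension down to k = 16 log(p)/ε² dimensions and inspect the distortion of all pairwise squared distances (NumPy; SciPy's pdist is assumed available):

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
d, p, eps = 10_000, 100, 0.5
k = int(np.ceil(16 * np.log(p) / eps**2))   # ~295, independent of d

X = rng.standard_normal((p, d))
A = rng.standard_normal((k, d))
Y = X @ A.T / np.sqrt(k)

ratios = pdist(Y) ** 2 / pdist(X) ** 2      # squared-distance distortions
print(k, ratios.min(), ratios.max())        # typically within (1 - eps, 1 + eps)
```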

SLIDES 99-100

Proof

Aim: control the action of $A$ on the normalized differences

$$\vec{v}_{ij} := \frac{\vec{x}_i - \vec{x}_j}{||\vec{x}_i - \vec{x}_j||_2}$$

Our event of interest is the intersection of the events

$$E_{ij} = \left\{ k\,(1-\epsilon) < ||A\vec{v}_{ij}||_2^2 < k\,(1+\epsilon) \right\}, \qquad 1 \le i < j \le p$$

Is it equal to $\bigcap_{i,j} E_{ij}$?

SLIDE 101

Fixed vector

Let $A$ be a $k \times d$ matrix with iid standard Gaussian entries. For any $\vec{v} \in \mathbb{R}^d$ with unit norm and any $\epsilon \in (0,1)$

$$\sqrt{1-\epsilon} \le \left\|\frac{1}{\sqrt{k}}\,A\vec{v}\right\|_2 \le \sqrt{1+\epsilon}$$

with probability at least $1 - 2\exp\left(-k\epsilon^2/8\right)$. This implies

$$P\left(E_{ij}^c\right) \le \frac{2}{p^2} \qquad \text{if } k \ge \frac{16\log(p)}{\epsilon^2}$$

SLIDE 102

Union bound

For any events $S_1, S_2, \ldots, S_n$ in a probability space

$$P\left(\cup_i S_i\right) \le \sum_{i=1}^{n} P(S_i)$$

SLIDES 103-108

Proof

The number of events $E_{ij}$ is $\binom{p}{2} = p(p-1)/2$. By the union bound

$$P\left(\bigcap_{i,j} E_{ij}\right) = 1 - P\left(\bigcup_{i,j} E_{ij}^c\right) \ge 1 - \sum_{i,j} P\left(E_{ij}^c\right) \ge 1 - \frac{p(p-1)}{2}\cdot\frac{2}{p^2} \ge \frac{1}{p}$$
SLIDE 109

Nearest-neighbor classification

Training set of points and labels $\{\vec{x}_1, l_1\}, \ldots, \{\vec{x}_n, l_n\}$. To classify a new data point $\vec{y} \in \mathbb{R}^d$, find

$$i^* := \arg\min_{1 \le i \le n} ||\vec{y} - \vec{x}_i||_2$$

and assign $l_{i^*}$ to $\vec{y}$.

Cost: $O(dnp)$ to classify $p$ new points

SLIDE 110

Nearest neighbors in random subspace

Use a $k \times d$ iid standard Gaussian matrix to project onto a $k$-dimensional space. Cost:

◮ $dkn$ operations to project the training set
◮ $dkp$ operations to project the test set
◮ $knp$ operations to perform nearest-neighbor classification

Much faster!
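A compact sketch of nearest-neighbor classification after random projection, not from the slides (NumPy; the data and labels are synthetic placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n, p = 4_096, 50, 360, 40

X_train = rng.standard_normal((n, d))          # placeholder training vectors
labels = rng.integers(0, 40, size=n)           # placeholder labels
X_test = rng.standard_normal((p, d))

A = rng.standard_normal((k, d)) / np.sqrt(k)   # random projection
P_train, P_test = X_train @ A.T, X_test @ A.T

# Nearest neighbor in the projected k-dimensional space
dists = ((P_test[:, None, :] - P_train[None, :, :]) ** 2).sum(axis=2)
predictions = labels[dists.argmin(axis=1)]
print(predictions[:10])
```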

SLIDE 111

Face recognition

Training set: 360 images of size 64 × 64 from 40 different subjects (9 each). Test set: 1 new image from each subject. We model each image as a vector in $\mathbb{R}^{4096}$ ($d = 4096$). To classify we:

1. Project onto a random $k$-dimensional subspace
2. Apply nearest-neighbor classification using the $\ell_2$-norm distance in $\mathbb{R}^k$
SLIDE 112

Performance

[Plot: number of errors (average, maximum and minimum) versus projection dimension, 20 to 200]

SLIDE 113

Nearest neighbor in $\mathbb{R}^{50}$

[Figure: test image, its projection, the closest projection, and the corresponding training image]

SLIDE 114

Motivating applications
Gaussian random variables
Randomized dimensionality reduction
Compressed sensing

SLIDE 115

Compressed sensing

Goal: recover signals from a small number of measurements.
An arbitrary vector of dimension $d$ cannot be recovered from $m < d$ linear measurements.
However, signals of interest are highly structured; for example, images are sparse in a wavelet basis.
If the signal is parametrized by $s < m$ parameters, recovery may be possible.
We focus on a simplified problem: recovering sparse vectors (a recovery sketch follows below).
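A minimal sparse-recovery sketch, not from the slides: recover an s-sparse vector from m < d Gaussian measurements via ℓ1 minimization (basis pursuit), cast as a linear program; solving it with scipy.optimize.linprog is an arbitrary implementation choice:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
d, m, s = 200, 60, 5

x_true = np.zeros(d)
x_true[rng.choice(d, s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((m, d)) / np.sqrt(m)
y = A @ x_true

# min ||x||_1 subject to Ax = y, written with x = u - v, u, v >= 0
c = np.ones(2 * d)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
x_hat = res.x[:d] - res.x[d:]
print(np.max(np.abs(x_hat - x_true)))   # near zero: exact recovery
```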

SLIDE 116

MR image

[MR image; axes t1, t2 (cm)]

SLIDE 117

Fourier coefficients

[Magnitude of the 2D Fourier coefficients; axes k1, k2 (1/cm); logarithmic color scale]

SLIDE 118

2× regular undersampling

[Fourier coefficients after 2× regular undersampling; axes k1, k2 (1/cm); logarithmic color scale]

SLIDE 119

Recovered image

[Image recovered from the regularly undersampled coefficients; axes t1, t2 (cm)]

SLIDE 120

2× random undersampling

[Fourier coefficients after 2× random undersampling; axes k1, k2 (1/cm); logarithmic color scale]

SLIDE 121

Recovered image

[Image recovered from the randomly undersampled coefficients; axes t1, t2 (cm)]

SLIDE 122

DFT regular undersampling

SLIDE 123

DFT regular undersampling

[Plot: Signal 1 and the signal recovered from regularly undersampled DFT coefficients]

SLIDE 124

DFT regular undersampling

[Plot: Signal 2 and the signal recovered from regularly undersampled DFT coefficients]

SLIDE 125

DFT random undersampling

SLIDE 126

DFT random undersampling

[Plot: signal and its recovery from randomly undersampled DFT coefficients]

SLIDE 127

DFT random undersampling

[Plot: second signal and its recovery from randomly undersampled DFT coefficients]

SLIDE 128

Gaussian measurements

SLIDE 129

Gaussian measurements

[Plot: signal and its recovery from Gaussian measurements]

SLIDE 130

Gaussian measurements

[Plot: second signal and its recovery from Gaussian measurements]

SLIDE 131

Restricted-isometry property

Different sparse vectors should never produce similar data.
If two $s$-sparse vectors $\vec{x}_1$, $\vec{x}_2$ are far apart, then $A\vec{x}_1$, $A\vec{x}_2$ should be far apart.
The measurement operator should preserve distances (be an isometry) when restricted to act upon sparse vectors.

SLIDE 132

Restricted-isometry property

$A$ satisfies the restricted isometry property (RIP) with constant $\epsilon$ if

$$(1-\epsilon)\,||\vec{x}||_2 \le ||A\vec{x}||_2 \le (1+\epsilon)\,||\vec{x}||_2$$

for any $s$-sparse vector $\vec{x}$

SLIDES 133-134

Restricted-isometry property

If $A$ satisfies the RIP for a sparsity level $2s$, then for any $s$-sparse $\vec{x}_1$, $\vec{x}_2$ (so that $\vec{x}_2 - \vec{x}_1$ is $2s$-sparse)

$$||A\vec{x}_2 - A\vec{x}_1||_2 = ||A(\vec{x}_2 - \vec{x}_1)||_2 \ge (1-\epsilon)\,||\vec{x}_2 - \vec{x}_1||_2$$

SLIDE 135

Restricted-isometry property

Deterministic matrices tend not to satisfy the RIP.
It is NP-hard to check whether the spark or RIP conditions hold.
Random matrices satisfy the RIP with high probability.
We prove it for iid Gaussian matrices; the ideas in the proof for random Fourier matrices are similar.

SLIDES 136-137

Restricted-isometry property for Gaussian matrices

Let $A \in \mathbb{R}^{m\times d}$ be a random matrix with iid standard Gaussian entries. $\frac{1}{\sqrt{m}}A$ satisfies the RIP for a constant $\epsilon$ with probability $1 - \frac{C_2}{d}$ as long as

$$m \ge \frac{C_1 s}{\epsilon^2}\,\log\left(\frac{d}{s}\right)$$

for two fixed constants $C_1, C_2 > 0$.

The number of measurements is proportional to the sparsity (up to a log factor).

SLIDE 138

Singular values of a submatrix

Fix a subset of $s$ indices $T \subset \{1, \ldots, d\}$. Any matrix $A \in \mathbb{R}^{m\times d}$, $m < d$, satisfies

$$\sigma_s(A_T)\,||\vec{x}||_2 \le ||A\vec{x}||_2 \le \sigma_1(A_T)\,||\vec{x}||_2$$

for all vectors $\vec{x} \in \mathbb{R}^d$ with support restricted to $T$. $A_T$ is the $m \times s$ submatrix of $A$ containing the columns indexed by $T$; $\sigma_1(A_T)$ and $\sigma_s(A_T)$ are the largest and smallest singular values of $A_T$.

SLIDE 139

Proof

For any vector $\vec{x} \in \mathbb{R}^d$ with support restricted to $T$,

$$A\vec{x} = A_T\,\vec{x}_T$$

where $\vec{x}_T \in \mathbb{R}^s$ is the subvector of $\vec{x}$ that contains its nonzero entries.

SLIDE 140

Proof strategy

Control the singular values of a fixed submatrix.
Apply the union bound to extend the bounds to all submatrices.

SLIDE 141

Singular values of an m × s matrix, s = 100

[Plot of $\sigma_i/\sqrt{m}$ versus $i$ for $m/s \in \{2, 5, 10, 20, 50, 100, 200\}$]

SLIDE 142

Singular values of an m × s matrix, s = 1000

[Plot of $\sigma_i/\sqrt{m}$ versus $i$ for $m/s \in \{2, 5, 10, 20, 50, 100, 200\}$]

SLIDE 143

Singular values of a Gaussian matrix

For large enough $m$,

$$M \approx U\left(\sqrt{m}\,I\right)V^T = \sqrt{m}\,UV^T$$

Standard Gaussian vectors in high dimensions are almost orthogonal.

SLIDE 144

Singular values of a Gaussian matrix

Let $M$ be an $m \times s$ matrix with iid standard Gaussian entries such that $m > s$. For any fixed $\epsilon > 0$, the singular values of $M$ satisfy

$$\sqrt{m}\,(1-\epsilon) \le \sigma_s \le \sigma_1 \le \sqrt{m}\,(1+\epsilon)$$

with probability at least

$$1 - 2\left(\frac{12}{\epsilon}\right)^s \exp\left(-\frac{m\epsilon^2}{32}\right)$$
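A quick simulation sketch of this statement, not from the slides: the extreme singular values of $M/\sqrt{m}$ approach 1 as $m/s$ grows (NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
s = 100
for ratio in (2, 10, 100):
    m = ratio * s
    M = rng.standard_normal((m, s))
    sv = np.linalg.svd(M, compute_uv=False) / np.sqrt(m)
    print(ratio, sv.min(), sv.max())   # both tend to 1 as m/s grows
```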

SLIDE 145

Union bound

For any events $S_1, S_2, \ldots, S_n$ in a probability space

$$P\left(\cup_i S_i\right) \le \sum_{i=1}^{n} P(S_i)$$

SLIDES 146-148

Proof

The number of different supports of size $s$ is

$$\binom{d}{s} \le \left(\frac{ed}{s}\right)^s$$

By the union bound,

$$\sqrt{1-\epsilon}\,||\vec{x}||_2 \le \frac{1}{\sqrt{m}}\,||A\vec{x}||_2 \le \sqrt{1+\epsilon}\,||\vec{x}||_2$$

holds for any $s$-sparse vector $\vec{x}$ with probability at least

$$1 - 2\left(\frac{ed}{s}\right)^s \left(\frac{12}{\epsilon}\right)^s \exp\left(-\frac{m\epsilon^2}{32}\right) = 1 - \exp\left(\log 2 + s + s\log\left(\frac{d}{s}\right) + s\log\left(\frac{12}{\epsilon}\right) - \frac{m\epsilon^2}{32}\right) \ge 1 - \frac{C_2}{d}$$

as long as $m \ge \frac{C_1 s}{\epsilon^2}\log\left(\frac{d}{s}\right)$

SLIDE 149

Singular values of a Gaussian matrix

Let $M$ be an $m \times s$ matrix with iid standard Gaussian entries such that $m > s$. For any fixed $\epsilon > 0$, the singular values of $M$ satisfy

$$\sqrt{m}\,(1-\epsilon) \le \sigma_s \le \sigma_1 \le \sqrt{m}\,(1+\epsilon)$$

with probability at least

$$1 - 2\left(\frac{12}{\epsilon}\right)^s \exp\left(-\frac{m\epsilon^2}{32}\right)$$

How do we prove this?
SLIDE 150

More of the same?

We need to prove that, for any vector $\vec{v}$ on the sphere $S^{s-1}$ in $\mathbb{R}^s$,

$$\sqrt{m}\,(1-\epsilon) < ||M\vec{v}||_2 < \sqrt{m}\,(1+\epsilon)$$

Can we prove it for a fixed vector and use the union bound?

SLIDE 151

Proof strategy

1. Consider a spread-out finite subset $N_\epsilon \subset S^{s-1}$ such that any point in $S^{s-1}$ is close to a point in $N_\epsilon$
2. Prove the bound on $N_\epsilon$
3. Show that the bounds hold for all points that are close to $N_\epsilon$
SLIDE 152

ε-net

An $\epsilon$-net of a set $X \subseteq \mathbb{R}^s$ is a subset $N_\epsilon \subseteq X$ such that for every vector $\vec{x} \in X$ there exists $\vec{y} \in N_\epsilon$ for which $||\vec{x} - \vec{y}||_2 \le \epsilon$. The covering number $N(X, \epsilon)$ of a set $X$ at scale $\epsilon$ is the minimal cardinality of an $\epsilon$-net of $X$.

SLIDE 153

ε-net

[Illustration of an ε-net]

SLIDE 154

Covering number of a sphere

The covering number of the sphere $S^{s-1}$ at scale $\epsilon$ satisfies

$$N\left(S^{s-1}, \epsilon\right) \le \left(\frac{2+\epsilon}{\epsilon}\right)^s \le \left(\frac{3}{\epsilon}\right)^s$$

SLIDE 155

Covering number of a sphere

◮ Initialize $N_\epsilon$ to the empty set
◮ Choose a point $\vec{x} \in S^{s-1}$ such that $||\vec{x} - \vec{y}||_2 > \epsilon$ for every $\vec{y} \in N_\epsilon$
◮ Add $\vec{x}$ to $N_\epsilon$; repeat until no point in $S^{s-1}$ is more than $\epsilon$ away from every point in $N_\epsilon$

SLIDE 156

Covering number of a sphere

[Illustration: balls of radius ε/2 centered at the net points, contained in a ball of radius 1 + ε/2]

SLIDES 157-160

Covering number of a sphere

$$\mathrm{Vol}\left(B^s_{1+\epsilon/2}\right) \ge \mathrm{Vol}\left(\cup_{\vec{x}\in N_\epsilon} B^s_{\epsilon/2}(\vec{x})\right) = |N_\epsilon|\,\mathrm{Vol}\left(B^s_{\epsilon/2}\right)$$

By multivariable calculus

$$\mathrm{Vol}\left(B^s_r\right) = r^s\,\mathrm{Vol}\left(B^s_1\right),$$

so we conclude

$$(1+\epsilon/2)^s \ge |N_\epsilon|\,(\epsilon/2)^s$$

SLIDE 161

Proof

1. We prove the bounds

$$m\,(1-\epsilon_2) < ||M\vec{v}||_2^2 < m\,(1+\epsilon_2)$$

where $\epsilon_2 := \epsilon/2$, on an $\epsilon_1 := \epsilon/4$ net of the sphere

2. We show that, by the triangle inequality, this implies that the bounds hold on all of the sphere

SLIDE 162

Fixed vector

Let $M$ be an $a \times b$ matrix with iid standard Gaussian entries. For any $\vec{v} \in \mathbb{R}^b$ with unit norm and any $\epsilon \in (0,1)$

$$\sqrt{a\,(1-\epsilon)} \le ||M\vec{v}||_2 \le \sqrt{a\,(1+\epsilon)}$$

with probability at least $1 - 2\exp\left(-a\epsilon^2/8\right)$
SLIDES 163-166

Bound on the ε1-net

We define the event

$$E_{\vec{v},\epsilon_2} := \left\{ m\,(1-\epsilon_2)\,||\vec{v}||_2^2 \le ||M\vec{v}||_2^2 \le m\,(1+\epsilon_2)\,||\vec{v}||_2^2 \right\}$$

Then

$$P\left(\cup_{\vec{v}\in N_{\epsilon_1}} E^c_{\vec{v},\epsilon_2}\right) \le \sum_{\vec{v}\in N_{\epsilon_1}} P\left(E^c_{\vec{v},\epsilon_2}\right) \le |N_{\epsilon_1}|\,P\left(E^c_{\vec{v},\epsilon_2}\right) \le 2\left(\frac{12}{\epsilon}\right)^s \exp\left(-\frac{m\epsilon^2}{32}\right)$$

SLIDES 167-171

Upper bound on the sphere

Let $\vec{x} \in S^{s-1}$. There exists $\vec{v} \in N_{\epsilon_1}$ such that $||\vec{x} - \vec{v}||_2 \le \epsilon/4$. On the event that $E_{\vec{v},\epsilon_2}$ holds for every $\vec{v} \in N_{\epsilon_1}$,

$$||M\vec{x}||_2 \le ||M\vec{v}||_2 + ||M(\vec{x} - \vec{v})||_2 \le \sqrt{m}\left(1 + \frac{\epsilon}{2}\right) + ||M(\vec{x} - \vec{v})||_2 \le \sqrt{m}\left(1 + \frac{\epsilon}{2}\right) + \sigma_1\,||\vec{x} - \vec{v}||_2 \le \sqrt{m}\left(1 + \frac{\epsilon}{2}\right) + \frac{\sigma_1\,\epsilon}{4}$$

SLIDE 172

Upper bound on the sphere

Taking the supremum over $\vec{x} \in S^{s-1}$ gives

$$\sigma_1 \le \sqrt{m}\left(1 + \frac{\epsilon}{2}\right) + \frac{\sigma_1\,\epsilon}{4}$$

so

$$\sigma_1 \le \sqrt{m}\,\frac{1+\epsilon/2}{1-\epsilon/4} = \sqrt{m}\left(1 + \epsilon - \frac{\epsilon\,(1-\epsilon)}{4-\epsilon}\right) \le \sqrt{m}\,(1+\epsilon)$$
SLIDES 173-178

Lower bound on the sphere

On the same event,

$$||M\vec{x}||_2 \ge ||M\vec{v}||_2 - ||M(\vec{x} - \vec{v})||_2 \ge \sqrt{m}\left(1 - \frac{\epsilon}{2}\right) - ||M(\vec{x} - \vec{v})||_2 \ge \sqrt{m}\left(1 - \frac{\epsilon}{2}\right) - \sigma_1\,||\vec{x} - \vec{v}||_2$$

$$\ge \sqrt{m}\left(1 - \frac{\epsilon}{2}\right) - \frac{\epsilon}{4}\,\sqrt{m}\,(1+\epsilon) \ge \sqrt{m}\,(1-\epsilon)$$

where the last inequality uses $\epsilon \le 1$.