Randomness DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis - PowerPoint PPT Presentation

SLIDE 1

Randomness

DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis

http://www.cims.nyu.edu/~cfgranda/pages/OBDA_fall17/index.html

Carlos Fernandez-Granda

slide-2
SLIDE 2

Gaussian random variables Gaussian random vectors Randomized projections SVD of a random matrix Randomized SVD

slide-3
SLIDE 3

Gaussian random variables

The pdf of a Gaussian or normal random variable with mean $\mu$ and standard deviation $\sigma$ is given by

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

slide-4
SLIDE 4

Gaussian random variables

[Plot of Gaussian pdfs f_X(x) over x from -10 to 10 for (µ = 2, σ = 1), (µ = 0, σ = 2), (µ = 0, σ = 4)]

slide-5
SLIDE 5

Linear transformation of Gaussian

If $x$ is a Gaussian random variable with mean $\mu$ and standard deviation $\sigma$, then for any $a, b \in \mathbb{R}$

$$y := ax + b$$

is a Gaussian random variable with mean $a\mu + b$ and standard deviation $|a|\,\sigma$
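This property is easy to check numerically. The following sketch (NumPy; the constants and seed are arbitrary choices) compares the sample mean and standard deviation of y = ax + b against aµ + b and |a|σ:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, a, b = 2.0, 1.5, 3.0, -1.0

x = rng.normal(mu, sigma, size=200_000)
y = a * x + b  # linear transformation of a Gaussian sample

# Sample mean and standard deviation should match a*mu + b and |a|*sigma
print(y.mean(), y.std())
```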

SLIDES 6-12

Proof

Let a > 0 (the proof for a < 0 is very similar). The cdf of y is

$$F_y(y) = P(y \le y) = P(ax + b \le y) = P\left(x \le \frac{y-b}{a}\right)$$

$$= \int_{-\infty}^{\frac{y-b}{a}} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \mathrm{d}x$$

$$= \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}\,a\sigma} \exp\left(-\frac{(w-a\mu-b)^2}{2a^2\sigma^2}\right) \mathrm{d}w \quad \text{(change of variables } w = ax + b\text{)}$$

Differentiating with respect to y:

$$f_y(y) = \frac{1}{\sqrt{2\pi}\,a\sigma} \exp\left(-\frac{(y-a\mu-b)^2}{2a^2\sigma^2}\right)$$

slide-13
SLIDE 13

Central limit theorem

Let $x_1, x_2, x_3, \ldots$ be a sequence of iid random variables with mean $\mu$ and bounded variance $\sigma^2$. The sequence of averages $a_1, a_2, a_3, \ldots$ is defined as

$$a_i := \frac{1}{i} \sum_{j=1}^{i} x_j$$

slide-14
SLIDE 14

Central limit theorem

The sequence $b_1, b_2, b_3, \ldots$ defined by $b_i := \sqrt{i}\,(a_i - \mu)$ converges in distribution to a Gaussian random variable with mean 0 and variance $\sigma^2$: for any $x \in \mathbb{R}$

$$\lim_{i \to \infty} f_{b_i}(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right)$$

For large $i$ the theorem suggests that the average $a_i$ is approximately Gaussian with mean $\mu$ and standard deviation $\sigma/\sqrt{i}$
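The exponential example on the following slides can be reproduced with a short simulation (a sketch; λ, i, and the number of trials are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, i, trials = 2.0, 2000, 2000

# Averages of i iid exponential(lambda) variables; each has mean 1/lam, std 1/lam
samples = rng.exponential(scale=1.0 / lam, size=(trials, i))
averages = samples.mean(axis=1)

# CLT: the averages are approximately Gaussian with mean 1/lam, std 1/(lam*sqrt(i))
print(averages.mean(), averages.std())
```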

SLIDES 15-17

iid exponential λ = 2, i = 10², 10³, 10⁴

[Histograms of the average a_i: as i grows, the distribution concentrates around the mean 1/λ = 0.5 and looks increasingly Gaussian]

SLIDES 18-20

iid geometric p = 0.4, i = 10², 10³, 10⁴

[Histograms of the average a_i: as i grows, the distribution concentrates around the mean 1/p = 2.5 and looks increasingly Gaussian]

slide-21
SLIDE 21

Histogram of heights

[Histogram of real height data (inches, roughly 60 to 76) overlaid with a fitted Gaussian distribution]

slide-22
SLIDE 22

Gaussian random variables Gaussian random vectors Randomized projections SVD of a random matrix Randomized SVD

slide-23
SLIDE 23

Gaussian random vector

A Gaussian random vector $\vec{x}$ is a random vector with joint pdf

$$f_{\vec{x}}(\vec{x}) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\left(-\frac{1}{2} (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})\right)$$

where $\vec{\mu} \in \mathbb{R}^n$ is the mean and $\Sigma \in \mathbb{R}^{n \times n}$ the covariance matrix
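The joint pdf can be evaluated directly from this formula. The sketch below (NumPy; the mean and covariance are made up) checks the value at the mean against the closed form $1/\sqrt{(2\pi)^n |\Sigma|}$ for n = 2:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Joint pdf of a Gaussian random vector, computed directly from the formula."""
    n = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
print(gaussian_pdf(mu, mu, Sigma))  # value at the mean
```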

slide-24
SLIDE 24

Uncorrelation implies independence

If the covariance matrix is diagonal,

$$\Sigma_{\vec{x}} = \begin{pmatrix} \sigma_1^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_n^2 \end{pmatrix},$$

the entries are independent

slide-25
SLIDE 25

Proof

$$\Sigma_{\vec{x}}^{-1} = \begin{pmatrix} \frac{1}{\sigma_1^2} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \frac{1}{\sigma_n^2} \end{pmatrix} \qquad |\Sigma| = \prod_{i=1}^{n} \sigma_i^2$$

SLIDES 26-29

Proof

$$f_{\vec{x}}(\vec{x}) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\left(-\frac{1}{2} (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})\right)$$

$$= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right)$$

$$= \prod_{i=1}^{n} f_{x_i}(x_i)$$

slide-30
SLIDE 30

Linear transformations

Let $\vec{x}$ be a Gaussian random vector of dimension n with mean $\vec{\mu}$ and covariance matrix $\Sigma$. For any matrix $A \in \mathbb{R}^{m \times n}$ and $\vec{b} \in \mathbb{R}^m$

$$\vec{y} = A\vec{x} + \vec{b}$$

is Gaussian with mean $A\vec{\mu} + \vec{b}$ and covariance matrix $A \Sigma A^T$
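A numerical check of this property (a sketch; A, b, µ and Σ are made-up values):

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])
A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0, 1.0]])
b = np.array([0.5, -1.0])

# Sample x ~ N(mu, Sigma) and form y = A x + b
x = rng.multivariate_normal(mu, Sigma, size=300_000)
y = x @ A.T + b

# Empirical mean and covariance of y should match A mu + b and A Sigma A^T
print(y.mean(axis=0), A @ mu + b)
print(np.cov(y.T), A @ Sigma @ A.T)
```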

slide-31
SLIDE 31

Subvectors are also Gaussian

[Surface plot of a bivariate Gaussian joint pdf f_x(x, y), together with the marginal pdfs f_x[1](x) and f_x[2](y), which are also Gaussian]

slide-32
SLIDE 32

Direction of iid standard Gaussian vectors

If the covariance matrix of a Gaussian vector $\vec{x}$ is $I$, then $\vec{x}$ is isotropic: it does not favor any direction. For any orthogonal matrix $U$, $U\vec{x}$ has the same distribution as $\vec{x}$ (Gaussian with mean $U\vec{0} = \vec{0}$ and covariance matrix $U I U^T = U U^T = I$)

slide-33
SLIDE 33

Magnitude of iid standard Gaussian vectors

In low dimensions the joint pdf is mostly concentrated around the origin. What happens in high dimensions?

$$\|\vec{x}\|_2^2 = \sum_{i=1}^{k} \vec{x}[i]^2$$

is a χ² (chi-squared) random variable with k degrees of freedom
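A quick simulation of the squared norm (a sketch; k and the number of trials are arbitrary). Its sample mean and variance can be compared with the values k and 2k derived on the following slides:

```python
import numpy as np

rng = np.random.default_rng(3)
k, trials = 100, 50_000

# Squared norms of iid standard Gaussian vectors of dimension k
sq_norms = (rng.standard_normal((trials, k)) ** 2).sum(axis=1)

# chi-squared with k degrees of freedom: mean k, variance 2k
print(sq_norms.mean(), sq_norms.var())
```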
slide-34
SLIDE 34

Magnitude of iid standard Gaussian vectors

[Plot of the pdf of $\|\vec{x}\|_2^2 / k$ for k = 10, 20, 50, 100: the distribution concentrates around 1 as k grows]

SLIDES 35-38

Mean

$$E\left(\|\vec{x}\|_2^2\right) = E\left(\sum_{i=1}^{k} \vec{x}[i]^2\right) = \sum_{i=1}^{k} E\left(\vec{x}[i]^2\right) = k$$

SLIDES 39-45

Variance

$$E\left(\left(\|\vec{x}\|_2^2\right)^2\right) = E\left(\left(\sum_{i=1}^{k} \vec{x}[i]^2\right)^2\right) = E\left(\sum_{i=1}^{k} \sum_{j=1}^{k} \vec{x}[i]^2\,\vec{x}[j]^2\right)$$

$$= \sum_{i=1}^{k} \sum_{j=1}^{k} E\left(\vec{x}[i]^2\,\vec{x}[j]^2\right) = \sum_{i=1}^{k} E\left(\vec{x}[i]^4\right) + 2 \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} E\left(\vec{x}[i]^2\right) E\left(\vec{x}[j]^2\right)$$

$$= 3k + k(k-1) \qquad \text{(the 4th moment of a standard Gaussian equals 3)}$$

$$= k(k+2)$$

SLIDE 46

Variance

$$\operatorname{Var}\left(\|\vec{x}\|_2^2\right) = E\left(\left(\|\vec{x}\|_2^2\right)^2\right) - E\left(\|\vec{x}\|_2^2\right)^2 = k(k+2) - k^2 = 2k$$

The relative standard deviation around the mean scales as $\sqrt{2/k}$
slide-47
SLIDE 47

Non-asymptotic tail bound

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension k. For any $\epsilon > 0$

$$P\left(k(1-\epsilon) < \|\vec{x}\|_2^2 < k(1+\epsilon)\right) \ge 1 - \frac{2}{k\epsilon^2}$$
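The bound can be compared with the empirical probability (a sketch; k, ǫ and the number of trials are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
k, eps, trials = 50, 0.3, 100_000

sq_norms = (rng.standard_normal((trials, k)) ** 2).sum(axis=1)
inside = np.mean((sq_norms > k * (1 - eps)) & (sq_norms < k * (1 + eps)))
bound = 1 - 2 / (k * eps**2)

# The empirical probability should be at least the Chebyshev-type bound
print(inside, bound)
```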

slide-48
SLIDE 48

Markov’s inequality

Let x be a nonnegative random variable. For any positive constant a > 0,

$$P(x \ge a) \le \frac{E(x)}{a}$$

SLIDES 49-50

Proof

Define the indicator variable $1_{x \ge a}$. Since $x - a\,1_{x \ge a} \ge 0$,

$$E(x) \ge a\,E\left(1_{x \ge a}\right) = a\,P(x \ge a)$$

SLIDES 51-55

Chebyshev bound

Let $y := \|\vec{x}\|_2^2$. Then

$$P(|y - k| \ge k\epsilon) = P\left((y - E(y))^2 \ge k^2\epsilon^2\right)$$

$$\le \frac{E\left((y - E(y))^2\right)}{k^2\epsilon^2} \quad \text{by Markov's inequality}$$

$$= \frac{\operatorname{Var}(y)}{k^2\epsilon^2} = \frac{2}{k\epsilon^2}$$

slide-56
SLIDE 56

Non-asymptotic Chernoff tail bound

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension k. For any $\epsilon > 0$

$$P\left(k(1-\epsilon) < \|\vec{x}\|_2^2 < k(1+\epsilon)\right) \ge 1 - 2\exp\left(-\frac{k\epsilon^2}{8}\right)$$

slide-57
SLIDE 57

Proof

Let $y := \|\vec{x}\|_2^2$. The result is implied by

$$P(y > k(1+\epsilon)) \le \exp\left(-\frac{k\epsilon^2}{8}\right) \qquad P(y < k(1-\epsilon)) \le \exp\left(-\frac{k\epsilon^2}{8}\right)$$

SLIDES 58-62

Proof

Fix t > 0:

$$P(y > a) = P(\exp(ty) > \exp(at))$$

$$\le \exp(-at)\,E(\exp(ty)) \quad \text{by Markov's inequality}$$

$$= \exp(-at)\,E\left(\exp\left(\sum_{i=1}^{k} t x_i^2\right)\right)$$

$$= \exp(-at) \prod_{i=1}^{k} E\left(\exp\left(t x_i^2\right)\right) \quad \text{by independence of } x_1, \ldots, x_k$$

slide-63
SLIDE 63

Proof

Lemma (by direct integration): for $t < 1/2$

$$E\left(\exp\left(t x^2\right)\right) = \frac{1}{\sqrt{1 - 2t}}$$

Equivalent to controlling higher-order moments, since

$$E\left(\exp\left(t x^2\right)\right) = E\left(\sum_{i=0}^{\infty} \frac{t^i x^{2i}}{i!}\right) = \sum_{i=0}^{\infty} \frac{t^i\,E\left(x^{2i}\right)}{i!}$$

slide-64
SLIDE 64

Proof

Fix t > 0:

$$P(y > a) \le \exp(-at) \prod_{i=1}^{k} E\left(\exp\left(t x_i^2\right)\right) = \exp(-at)\,(1 - 2t)^{-k/2}$$

slide-65
SLIDE 65

Proof

Setting $a := k(1+\epsilon)$ and $t := \frac{1}{2} - \frac{1}{2(1+\epsilon)}$, we conclude

$$P(y > k(1+\epsilon)) \le (1+\epsilon)^{k/2} \exp\left(-\frac{k\epsilon}{2}\right) \le \exp\left(-\frac{k\epsilon^2}{8}\right)$$

slide-66
SLIDE 66

Projection onto a fixed subspace

[Illustration of the projections $P_{S_1}\vec{z}$ and $P_{S_2}\vec{z}$ of the same noise vector]

$$0.007 = \frac{\|P_{S_1}\vec{z}\|_2}{\|\vec{z}\|_2} < \frac{\|P_{S_2}\vec{z}\|_2}{\|\vec{z}\|_2} = 0.043 \qquad \frac{0.043}{0.007} = 6.14 \approx \sqrt{\frac{\dim(S_2)}{\dim(S_1)}} \quad \text{(not a coincidence)}$$

slide-67
SLIDE 67

Projection onto a fixed subspace

Let S be a k-dimensional subspace of $\mathbb{R}^n$ and $\vec{z} \in \mathbb{R}^n$ a vector of iid standard Gaussian noise. $\|P_S \vec{z}\|_2^2$ is a χ² random variable with k degrees of freedom, i.e. it has the same distribution as

$$y := \sum_{i=1}^{k} x_i^2$$

where $x_1, \ldots, x_k$ are iid standard Gaussians.
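A simulation of this result (a sketch; the subspace here is drawn at random for illustration, while the theorem holds for any fixed subspace):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, trials = 200, 20, 20_000

# Orthonormal basis U for a k-dimensional subspace of R^n
U, _ = np.linalg.qr(rng.standard_normal((n, k)))

z = rng.standard_normal((trials, n))
proj_sq = ((z @ U) ** 2).sum(axis=1)  # ||P_S z||^2 = ||U^T z||^2

# chi-squared with k degrees of freedom: mean k, variance 2k
print(proj_sq.mean(), proj_sq.var())
```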

SLIDES 68-75

Proof

Let $U U^T$ be a projection matrix for S, where the columns of $U \in \mathbb{R}^{n \times k}$ are orthonormal:

$$\|P_S \vec{z}\|_2^2 = \left\|U U^T \vec{z}\right\|_2^2 = \vec{z}^T U U^T U U^T \vec{z} = \vec{z}^T U U^T \vec{z} = \vec{w}^T \vec{w} = \sum_{i=1}^{k} \vec{w}[i]^2$$

$\vec{w} := U^T \vec{z}$ is Gaussian with mean zero and covariance matrix

$$\Sigma_{\vec{w}} = U^T \Sigma_{\vec{z}} U = U^T U = I$$

slide-76
SLIDE 76

Non-asymptotic Chernoff tail bound

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension k. For any $\epsilon > 0$

$$P\left(k(1-\epsilon) < \|\vec{x}\|_2^2 < k(1+\epsilon)\right) \ge 1 - 2\exp\left(-\frac{k\epsilon^2}{8}\right)$$

slide-77
SLIDE 77

Projection onto a fixed subspace

Let S be a k-dimensional subspace of $\mathbb{R}^n$ and $\vec{z} \in \mathbb{R}^n$ a vector of iid standard Gaussian noise. For any $\epsilon > 0$

$$P\left(k(1-\epsilon) < \|P_S \vec{z}\|_2^2 < k(1+\epsilon)\right) \ge 1 - 2\exp\left(-\frac{k\epsilon^2}{8}\right)$$

slide-78
SLIDE 78

Gaussian random variables Gaussian random vectors Randomized projections SVD of a random matrix Randomized SVD

slide-79
SLIDE 79

Dimensionality reduction

◮ PCA preserves the most energy (ℓ2 norm)
◮ Problem 1: computationally expensive
◮ Problem 2: depends on all of the data
◮ (Possible) Solution: just project randomly!
◮ For a data set $\vec{x}_1, \vec{x}_2, \ldots \in \mathbb{R}^n$ compute $A\vec{x}_1, A\vec{x}_2, \ldots \in \mathbb{R}^k$, where $A \in \mathbb{R}^{k \times n}$ (k < n) has iid standard Gaussian entries

SLIDES 80-81

Fixed vector

Let A be an a × b matrix with iid standard Gaussian entries. If $\vec{v} \in \mathbb{R}^b$ is a deterministic vector with unit ℓ2 norm, then $A\vec{v}$ is an a-dimensional iid standard Gaussian vector.

Proof: $(A\vec{v})[i]$, $1 \le i \le a$, is Gaussian with mean zero and variance

$$\operatorname{Var}\left(A_{i,:}^T \vec{v}\right) = \vec{v}^T \Sigma_{A_{i,:}} \vec{v} = \vec{v}^T I \vec{v} = \|\vec{v}\|_2^2 = 1$$

slide-82
SLIDE 82

Non-asymptotic Chernoff tail bound

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension k. For any $\epsilon > 0$

$$P\left(k(1-\epsilon) < \|\vec{x}\|_2^2 < k(1+\epsilon)\right) \ge 1 - 2\exp\left(-\frac{k\epsilon^2}{8}\right)$$

slide-83
SLIDE 83

Fixed vector

Let A be an a × b matrix with iid standard Gaussian entries. For any $\vec{v} \in \mathbb{R}^b$ with unit norm and any $\epsilon \in (0,1)$

$$\sqrt{a(1-\epsilon)} \le \|A\vec{v}\|_2 \le \sqrt{a(1+\epsilon)}$$

with probability at least $1 - 2\exp\left(-a\epsilon^2/8\right)$
slide-84
SLIDE 84

Johnson-Lindenstrauss lemma

Let A be a k × n matrix with iid standard Gaussian entries and let $\vec{x}_1, \ldots, \vec{x}_p \in \mathbb{R}^n$ be any fixed set of p deterministic vectors. For any pair $\vec{x}_i, \vec{x}_j$ and any $\epsilon \in (0,1)$

$$(1-\epsilon)\,\|\vec{x}_i - \vec{x}_j\|_2^2 \le \left\|\frac{1}{\sqrt{k}} A\vec{x}_i - \frac{1}{\sqrt{k}} A\vec{x}_j\right\|_2^2 \le (1+\epsilon)\,\|\vec{x}_i - \vec{x}_j\|_2^2$$

with probability at least $\frac{1}{p}$, as long as

$$k \ge \frac{16 \log(p)}{\epsilon^2}$$
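A direct numerical check of the lemma (a sketch; n, p, ǫ are arbitrary, and the fixed points are drawn at random here only to have something concrete to project):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, eps = 1000, 50, 0.5
k = int(np.ceil(16 * np.log(p) / eps**2))  # k >= 16 log(p) / eps^2

X = rng.standard_normal((p, n))            # p fixed data points in R^n
A = rng.standard_normal((k, n))            # iid standard Gaussian projection
Y = X @ A.T / np.sqrt(k)                   # projected points (1/sqrt(k)) A x_i

# Worst relative distortion of squared distances over all pairs
worst = 0.0
for i in range(p):
    for j in range(i + 1, p):
        d_orig = np.sum((X[i] - X[j]) ** 2)
        d_proj = np.sum((Y[i] - Y[j]) ** 2)
        worst = max(worst, abs(d_proj / d_orig - 1))
print(k, worst)
```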

slide-85
SLIDE 85

Proof

Aim: control the action of A on the normalized differences

$$\vec{v}_{ij} := \frac{\vec{x}_i - \vec{x}_j}{\|\vec{x}_i - \vec{x}_j\|_2}$$

Our event of interest is the intersection of the events

$$E_{ij} = \left\{ k(1-\epsilon) < \|A\vec{v}_{ij}\|_2^2 < k(1+\epsilon) \right\}, \qquad 1 \le i < j \le p$$
slide-86
SLIDE 86

Fixed vector

Let A be an a × b matrix with iid standard Gaussian entries. For any $\vec{v} \in \mathbb{R}^b$ with unit norm and any $\epsilon \in (0,1)$

$$\sqrt{a(1-\epsilon)} \le \|A\vec{v}\|_2 \le \sqrt{a(1+\epsilon)}$$

with probability at least $1 - 2\exp\left(-a\epsilon^2/8\right)$. This implies

$$P\left(E_{ij}^c\right) \le \frac{2}{p^2} \quad \text{if } k \ge \frac{16 \log(p)}{\epsilon^2}$$

slide-87
SLIDE 87

Union bound

For any events $S_1, S_2, \ldots, S_n$ in a probability space

$$P\left(\cup_i S_i\right) \le \sum_{i=1}^{n} P(S_i)$$

SLIDES 88-92

Proof

The number of events $E_{ij}$ equals $\binom{p}{2} = p(p-1)/2$. By the union bound

$$P\left(\cap_{i,j} E_{ij}\right) = 1 - P\left(\cup_{i,j} E_{ij}^c\right) \ge 1 - \sum_{i,j} P\left(E_{ij}^c\right) \ge 1 - \frac{p(p-1)}{2} \cdot \frac{2}{p^2} = \frac{1}{p}$$

slide-93
SLIDE 93

Dimensionality reduction for visualization

Motivation: visualize high-dimensional features projected onto 2D or 3D. Example: seeds from three different varieties of wheat (Kama, Rosa, and Canadian). Features:

◮ Area
◮ Perimeter
◮ Compactness
◮ Length of kernel
◮ Width of kernel
◮ Asymmetry coefficient
◮ Length of kernel groove

slide-94
SLIDE 94

Dimensionality reduction for visualization

[2D scatter plots of the wheat-seed features: randomized projection vs. PCA]

slide-95
SLIDE 95

Nearest neighbors in random subspace

Nearest-neighbors classification (Algorithm 4.2 in Lecture Notes 1) computes n distances in $\mathbb{R}^m$ for each new example. Cost: O(nmp) for p examples. Idea: use a k × m iid standard Gaussian matrix to project onto a k-dimensional space beforehand. Cost:

◮ kmn operations to project the training set
◮ kmp operations to project the test set
◮ knp operations to perform nearest-neighbor classification

Much faster!
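A minimal sketch of this pipeline on synthetic data (all names, dimensions, and labels below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, k = 4096, 360, 50   # feature dimension, training size, projected dimension

# Hypothetical training set with labels, and one test vector near example 123
train = rng.standard_normal((n, m))
labels = rng.integers(0, 40, size=n)
test = train[123] + 0.01 * rng.standard_normal(m)

A = rng.standard_normal((k, m))        # random projection matrix
train_p = train @ A.T                  # project the training set once
test_p = A @ test                      # project each test example

# Nearest neighbor in the k-dimensional projected space
nearest = np.argmin(np.linalg.norm(train_p - test_p, axis=1))
print(nearest, labels[nearest])
```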

slide-96
SLIDE 96

Face recognition

Training set: 360 images of size 64 × 64 from 40 different subjects (9 each). Test set: 1 new image from each subject. We model each image as a vector in $\mathbb{R}^{4096}$ (m = 4096). To classify we:

1. Project onto a random k-dimensional subspace
2. Apply nearest-neighbor classification using the ℓ2-norm distance in $\mathbb{R}^k$
slide-97
SLIDE 97

Performance

[Plot of classification errors (average, maximum, minimum) as a function of the projection dimension, for dimensions up to 200]

slide-98
SLIDE 98

Nearest neighbor in R50

[Images: test image, its projection, the closest projection, and the corresponding training image]

slide-99
SLIDE 99

Gaussian random variables Gaussian random vectors Randomized projections SVD of a random matrix Randomized SVD

slide-100
SLIDE 100

Singular values of n × k matrix, k = 100

[Plot of the normalized singular values $\sigma_i/\sqrt{n}$, i = 1, ..., 100, for aspect ratios n/k = 2, 5, 10, 20, 50, 100, 200: they cluster around 1 as n/k grows]

slide-101
SLIDE 101

Singular values of n × k matrix, k = 1000

[Same plot for k = 1000: the normalized singular values $\sigma_i/\sqrt{n}$ again cluster around 1 as n/k grows]

slide-102
SLIDE 102

Singular values of a Gaussian matrix

Intuitively, as n grows,

$$A \approx U\left(\sqrt{n}\,I\right)V^T = \sqrt{n}\,U V^T,$$

since iid Gaussian vectors in high dimensions are almost orthogonal

slide-103
SLIDE 103

Singular values of a Gaussian matrix

Let A be an n × k matrix with iid standard Gaussian entries such that n > k. For any fixed $\epsilon > 0$, the singular values of A satisfy

$$\sqrt{n}\,(1-\epsilon) \le \sigma_k \le \sigma_1 \le \sqrt{n}\,(1+\epsilon)$$

with probability at least $1 - 1/k$, as long as

$$n > \frac{64k}{\epsilon^2} \log\frac{12}{\epsilon}$$

slide-104
SLIDE 104

Proof

Recall that

$$\sigma_1 = \max_{\{\|\vec{x}\|_2 = 1 \,:\, \vec{x} \in \mathbb{R}^k\}} \|A\vec{x}\|_2 \qquad \sigma_k = \min_{\{\|\vec{x}\|_2 = 1 \,:\, \vec{x} \in \mathbb{R}^k\}} \|A\vec{x}\|_2$$

so the bounds hold if, for every unit-norm $\vec{v} \in \mathbb{R}^k$,

$$\sqrt{n}\,(1-\epsilon) \le \|A\vec{v}\|_2 \le \sqrt{n}\,(1+\epsilon)$$

slide-105
SLIDE 105

Proof

Idea: use a union bound over all unit-norm vectors. Problem: there are infinitely many! Solution: apply the union bound on a finite set, then show that this is enough

slide-106
SLIDE 106

ǫ-net

An ǫ-net of a set $X \subseteq \mathbb{R}^k$ is a subset $N_\epsilon \subseteq X$ such that for every vector $\vec{x} \in X$ there exists $\vec{y} \in N_\epsilon$ for which $\|\vec{x} - \vec{y}\|_2 \le \epsilon$. The covering number $N(X, \epsilon)$ of a set X at scale ǫ is the minimal cardinality of an ǫ-net of X

slide-107
SLIDE 107

ǫ-net

[Illustration of an ǫ-net: every point of the set is within distance ǫ of a net point]

slide-108
SLIDE 108

Covering number of a sphere

The covering number of the sphere $S^{k-1}$ at scale ǫ satisfies

$$N\left(S^{k-1}, \epsilon\right) \le \left(\frac{2+\epsilon}{\epsilon}\right)^k \le \left(\frac{3}{\epsilon}\right)^k$$

slide-109
SLIDE 109

Covering number of a sphere

◮ Initialize $N_\epsilon$ to the empty set
◮ Choose a point $\vec{x} \in S^{k-1}$ such that $\|\vec{x} - \vec{y}\|_2 > \epsilon$ for every $\vec{y} \in N_\epsilon$, and add $\vec{x}$ to $N_\epsilon$
◮ Repeat until every point in $S^{k-1}$ is within ǫ of some point in $N_\epsilon$

slide-110
SLIDE 110

Covering number of a sphere

[Illustration: disjoint balls of radius ǫ/2 around the net points, all contained in a ball of radius 1 + ǫ/2]

SLIDES 111-114

Covering number of a sphere

$$\operatorname{Vol}\left(B^k_{1+\epsilon/2}\right) \ge \operatorname{Vol}\left(\cup_{\vec{x} \in N_\epsilon} B^k_{\epsilon/2}(\vec{x})\right) = |N_\epsilon| \operatorname{Vol}\left(B^k_{\epsilon/2}\right)$$

By multivariable calculus

$$\operatorname{Vol}\left(B^k_r\right) = r^k \operatorname{Vol}\left(B^k_1\right)$$

so we conclude

$$(1+\epsilon/2)^k \ge |N_\epsilon| \,(\epsilon/2)^k$$

slide-115
SLIDE 115

Proof

1. We prove the bounds

$$n(1-\epsilon_2) < \|A\vec{v}\|_2^2 < n(1+\epsilon_2),$$

where $\epsilon_2 := \epsilon/2$, on an $\epsilon_1 := \epsilon/4$ net of the sphere

2. We show that, by the triangle inequality, this implies that the bounds hold on the whole sphere

slide-116
SLIDE 116

Fixed vector

Let A be an a × b matrix with iid standard Gaussian entries. For any $\vec{v} \in \mathbb{R}^b$ with unit norm and any $\epsilon \in (0,1)$

$$\sqrt{a(1-\epsilon)} \le \|A\vec{v}\|_2 \le \sqrt{a(1+\epsilon)}$$

with probability at least $1 - 2\exp\left(-a\epsilon^2/8\right)$
SLIDES 117-121

Bound on the ǫ1-net

We define the event

$$E_{\vec{v},\epsilon_2} := \left\{ n(1-\epsilon_2)\,\|\vec{v}\|_2^2 \le \|A\vec{v}\|_2^2 \le n(1+\epsilon_2)\,\|\vec{v}\|_2^2 \right\}$$

Then

$$P\left(\cup_{\vec{v} \in N_{\epsilon_1}} E^c_{\vec{v},\epsilon_2}\right) \le \sum_{\vec{v} \in N_{\epsilon_1}} P\left(E^c_{\vec{v},\epsilon_2}\right) \le |N_{\epsilon_1}|\, P\left(E^c_{\vec{v},\epsilon_2}\right) \le 2\left(\frac{12}{\epsilon}\right)^k \exp\left(-\frac{n\epsilon^2}{32}\right) \le \frac{1}{k} \quad \text{if } n > \frac{64k}{\epsilon^2}\log\frac{12}{\epsilon}$$

SLIDES 122-126

Upper bound on the sphere

Let $\vec{x} \in S^{k-1}$. There exists $\vec{v} \in N_{\epsilon_1}$ such that $\|\vec{x} - \vec{v}\|_2 \le \epsilon/4$. Then

$$\|A\vec{x}\|_2 \le \|A\vec{v}\|_2 + \|A(\vec{x} - \vec{v})\|_2$$

$$\le \sqrt{n}\left(1 + \frac{\epsilon}{2}\right) + \|A(\vec{x} - \vec{v})\|_2 \quad \text{assuming the events } E_{\vec{v},\epsilon_2} \text{ hold for all } \vec{v} \in N_{\epsilon_1}$$

$$\le \sqrt{n}\left(1 + \frac{\epsilon}{2}\right) + \sigma_1 \|\vec{x} - \vec{v}\|_2 \le \sqrt{n}\left(1 + \frac{\epsilon}{2}\right) + \frac{\sigma_1 \epsilon}{4}$$

slide-127
SLIDE 127

Upper bound on the sphere

$$\sigma_1 \le \sqrt{n}\left(1 + \frac{\epsilon}{2}\right) + \frac{\sigma_1 \epsilon}{4}$$

$$\sigma_1 \le \sqrt{n}\,\frac{1 + \epsilon/2}{1 - \epsilon/4} = \sqrt{n}\left(1 + \epsilon - \frac{\epsilon(1-\epsilon)}{4-\epsilon}\right) \le \sqrt{n}\,(1+\epsilon)$$
SLIDES 128-133

Lower bound on the sphere

$$\|A\vec{x}\|_2 \ge \|A\vec{v}\|_2 - \|A(\vec{x} - \vec{v})\|_2$$

$$\ge \sqrt{n}\left(1 - \frac{\epsilon}{2}\right) - \|A(\vec{x} - \vec{v})\|_2 \quad \text{assuming the events } E_{\vec{v},\epsilon_2} \text{ hold for all } \vec{v} \in N_{\epsilon_1}$$

$$\ge \sqrt{n}\left(1 - \frac{\epsilon}{2}\right) - \sigma_1 \|\vec{x} - \vec{v}\|_2 \ge \sqrt{n}\left(1 - \frac{\epsilon}{2}\right) - \frac{\epsilon}{4}\,\sqrt{n}\,(1+\epsilon) \ge \sqrt{n}\,(1-\epsilon)$$

slide-134
SLIDE 134

Gaussian random variables Gaussian random vectors Randomized projections SVD of a random matrix Randomized SVD

slide-135
SLIDE 135

Fast SVD

For a matrix $M \in \mathbb{R}^{m \times n}$ which is approximately rank k:

1. Choose a small oversampling parameter p (usually 5 or slightly larger)
2. Find a matrix $\tilde{U} \in \mathbb{R}^{m \times (k+p)}$ with k + p orthonormal columns that approximately span the column space of M
3. Compute $W \in \mathbb{R}^{(k+p) \times n}$ defined by $W := \tilde{U}^T M$
4. Compute the SVD of $W = U_W S_W V_W^T$
5. Output $U := (\tilde{U} U_W)_{:,1:k}$, $S := (S_W)_{1:k,1:k}$ and $V := (V_W)_{:,1:k}$ as the SVD of M
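The five steps above can be sketched in NumPy as follows (an illustrative implementation without power iterations; the function name and the synthetic test matrix are choices made here, not from the deck):

```python
import numpy as np

def randomized_svd(M, k, p=5):
    """Sketch of the fast SVD: Gaussian sketch of the column space, then SVD of a small matrix."""
    m, n = M.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, k + p))   # random Gaussian test matrix
    B = M @ A                             # samples the column space of M
    U_tilde, _ = np.linalg.qr(B)          # orthonormal basis, m x (k+p)
    W = U_tilde.T @ M                     # small (k+p) x n matrix
    U_W, S_W, Vt_W = np.linalg.svd(W, full_matrices=False)
    return (U_tilde @ U_W)[:, :k], S_W[:k], Vt_W[:k, :]

# On an exactly low-rank matrix the approximation should be essentially exact
rng = np.random.default_rng(1)
M = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 300))  # rank 8
U, S, Vt = randomized_svd(M, k=10)
print(np.linalg.norm(M - U @ np.diag(S) @ Vt))
```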

slide-136
SLIDE 136

Fast SVD

For a matrix $M \in \mathbb{R}^{m \times n}$ which is approximately rank k:

1. Choose a small oversampling parameter p (usually 5 or slightly larger)
2. Find a matrix $\tilde{U} \in \mathbb{R}^{m \times (k+p)}$ with k + p orthonormal columns that approximately span the column space of M
3. Compute $W := \tilde{U}^T M \in \mathbb{R}^{(k+p) \times n}$ (cost O(kmn))
4. Compute the SVD of $W = U_W S_W V_W^T$ (cost O(k²n))
5. Output $U := (\tilde{U} U_W)_{:,1:k}$, $S := (S_W)_{1:k,1:k}$ and $V := (V_W)_{:,1:k}$ as the SVD of M

Complexity of a regular SVD is O(mn min{m, n})

SLIDES 137-138

Fast SVD

[Diagrams of the factor dimensions: $M \approx U_M S_M V_M^T$ with factors of sizes m × k, k × k and k × n, and $W = \tilde{U}^T M$ of size (k+p) × n]

SLIDES 139-145

Fast SVD

The method works exactly if (1) M is rank k and (2) $\tilde{U}$ spans the column space of M:

$$M = \tilde{U}\tilde{U}^T M = \tilde{U} W = \tilde{U} U_W S_W V_W^T$$

where $U := \tilde{U} U_W$ is an m × k matrix with orthonormal columns:

$$U^T U = U_W^T \tilde{U}^T \tilde{U} U_W = U_W^T U_W = I$$

SLIDES 146-148

Power iterations

For approximately low-rank matrices, performance depends on the gap between $\sigma_k$ and $\sigma_{k+1}$. The gap can be increased by power iterations. This method is only used when computing $\tilde{U}$; the input is

$$\tilde{M} := \left(M M^T\right)^q M = \left(U_M S_M^2 U_M^T\right)^q U_M S_M V_M^T = U_M S_M^{2q+1} V_M^T$$
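The column-space estimate with power iterations can be sketched as follows (a hypothetical implementation; the intermediate QR re-orthonormalization is a standard numerical-stability step not spelled out on the slide):

```python
import numpy as np

def column_space_estimate(M, k, p=5, q=2, seed=0):
    """Randomized column-space estimate with q power iterations (a sketch)."""
    rng = np.random.default_rng(seed)
    B = M @ rng.standard_normal((M.shape[1], k + p))
    for _ in range(q):
        B = M @ (M.T @ B)        # multiply by (M M^T) to sharpen the spectrum
        B, _ = np.linalg.qr(B)   # re-orthonormalize for numerical stability
    U_tilde, _ = np.linalg.qr(B)
    return U_tilde

# On an exactly rank-3 matrix the estimate captures the column space almost exactly
rng = np.random.default_rng(2)
M = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 80))
U_tilde = column_space_estimate(M, k=3)
residual = M - U_tilde @ (U_tilde.T @ M)
print(np.linalg.norm(residual))
```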

slide-149
SLIDE 149

Problem

How do we estimate the column space of a low-rank matrix?

◮ Project onto a random subspace with slightly larger dimension
◮ Select random columns

slide-150
SLIDE 150

Randomized column-space approximation

For a matrix $M \in \mathbb{R}^{m \times n}$ which is approximately rank k:

1. Create an n × (k + p) iid standard Gaussian matrix A, where p is a small integer (e.g. 5)
2. Compute the m × (k + p) matrix B = MA
3. Orthonormalize the columns of B and output them as a matrix $\tilde{U} \in \mathbb{R}^{m \times (k+p)}$
4. Apply power iterations if necessary
SLIDES 151-153

Randomized column-space approximation

$$B = MA = U_M S_M V_M^T A = U_M S_M C$$

◮ If M is low rank, C is a k × (k + p) iid standard Gaussian matrix
◮ Otherwise, C is a min{m, n} × (k + p) iid standard Gaussian matrix

slide-154
SLIDE 154

Randomized SVD of a video

◮ Video with 160 frames of size 1080 × 1920
◮ We interpret each frame as a vector in $\mathbb{R}^{2{,}073{,}600}$
◮ The matrix formed by these vectors is approximately low rank
◮ Regular SVD takes 12 seconds (281.1 seconds if we take 691 frames)
◮ Fast SVD with the randomized-column-space estimate takes 5.8 seconds (10.4 seconds for 691 frames) to obtain a rank-10 approximation (q = 2, p = 7)

slide-155
SLIDE 155

True singular values

[Plot of the 160 true singular values of the video matrix, rapidly decaying]

slide-156
SLIDE 156

Left singular vector approximation

[Images: true and estimated left singular vectors]

slide-157
SLIDE 157

Random column selection

For a matrix $M \in \mathbb{R}^{m \times n}$ which is approximately rank k:

1. Select a random subset of column indices $I := \{i_1, i_2, \ldots, i_{k'}\}$ with $k' \ge k$
2. Orthonormalize the columns of the submatrix corresponding to I,

$$M_I := \begin{pmatrix} M_{:,i_1} & M_{:,i_2} & \cdots & M_{:,i_{k'}} \end{pmatrix},$$

and output them as a matrix $\tilde{U} \in \mathbb{R}^{m \times k'}$

slide-158
SLIDE 158

Random column selection

(Possible) Problem: if the right singular vectors are sparse, this will not work, since

$$M_I = U_M S_M \left(V_M^T\right)_I$$

slide-159
SLIDE 159

Example

$$M := \begin{pmatrix} -3 & 2 & 2 & 2 \\ 3 & 2 & 2 & 2 \\ -3 & 2 & 2 & 2 \\ 3 & 2 & 2 & 2 \end{pmatrix}$$

slide-160
SLIDE 160

Example

$$M = U_M S_M V_M^T = \begin{pmatrix} 0.5 & -0.5 \\ 0.5 & 0.5 \\ 0.5 & -0.5 \\ 0.5 & 0.5 \end{pmatrix} \begin{pmatrix} 6.9282 & 0 \\ 0 & 6 \end{pmatrix} \begin{pmatrix} 0 & 0.577 & 0.577 & 0.577 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$

slide-161
SLIDE 161

Example, I = {2, 3}

$$M_I = \begin{pmatrix} 2 & 2 \\ 2 & 2 \\ 2 & 2 \\ 2 & 2 \end{pmatrix} = \begin{pmatrix} 0.5 \\ 0.5 \\ 0.5 \\ 0.5 \end{pmatrix} \; 5.6569 \; \begin{pmatrix} 0.7071 & 0.7071 \end{pmatrix}$$

The column space of $M_I$ misses the second left singular vector of M entirely.
slide-162
SLIDE 162

Randomized SVD of a video

◮ Video with 160 frames of size 1080 × 1920
◮ We interpret each frame as a vector in $\mathbb{R}^{2{,}073{,}600}$
◮ The matrix formed by these vectors is approximately low rank
◮ Regular SVD takes 12 seconds (281.1 seconds if we take 691 frames)
◮ Fast SVD with the random-column-selection estimate takes 5.2 seconds to obtain a rank-10 approximation (k′ = 17)
slide-163
SLIDE 163

Left singular vector approximation

[Images: true and estimated left singular vectors]

slide-164
SLIDE 164

Singular value approximation

[Plot of the first 10 singular values: true values vs. the Gaussian-projection and column-subsampling estimates]