SLIDE 1

Random Projections and Dimension Reduction

Rishi Advani¹, Madison Crim², Sean O’Hagan³

¹Cornell University  ²Salisbury University  ³University of Connecticut

Summer@ICERM, July 2020

SLIDE 2

Acknowledgements

Thank you to our organizers, Akil Narayan and Yanlai Chen, along with our TAs, Justin Baker and Liu Yang, for supporting us throughout this program.

SLIDE 3

Introduction

During this talk, we will focus on the use of randomness in two main areas:

  • low-rank approximation
  • kernel methods

SLIDE 4

Table of Contents

1 Low-rank Approximation
  • Johnson-Lindenstrauss Lemma
  • Interpolative Decomposition
  • Singular Value Decomposition
  • SVD/ID Performance
  • Eigenfaces

2 Kernel Methods
  • Kernel Methods
  • Kernel PCA
  • Kernel SVM

SLIDE 5

Johnson-Lindenstrauss Lemma

If we have $n$ data points in $\mathbb{R}^d$, there exists a linear map into $\mathbb{R}^k$, $k < d$, such that pairwise distances between data points are preserved up to an $\varepsilon$ tolerance, provided $k > C\varepsilon^{-2} \log n$, where $C \approx 24$ [JL84].

The proof follows three steps [Mic09]:

1 Define a random linear map $f \colon \mathbb{R}^d \to \mathbb{R}^k$ by $f(u) = \frac{1}{\sqrt{k}} R u$, where $R \in \mathbb{R}^{k \times d}$ is drawn elementwise from a standard normal distribution.

2 For $u \in \mathbb{R}^d$, show $\mathbb{E}\big[\|f(u)\|_2^2\big] = \|u\|_2^2$.

3 Show that the random variable $\|f(u)\|_2^2$ concentrates around $\|u\|_2^2$, and construct a union bound over all pairwise distances.

SLIDE 6

Johnson-Lindenstrauss Lemma: Demonstration

Figure: Histogram of $\|u\|_2^2 - \|f(u)\|_2^2$ for a fixed $u \in \mathbb{R}^{1000}$, $f(u) \in \mathbb{R}^{10}$
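This demonstration is easy to reproduce; the following is a minimal NumPy sketch (ours, not the authors' code), drawing many independent Gaussian maps for one fixed $u$:

```python
import numpy as np

# Project a fixed u in R^1000 down to R^10 with independent Gaussian
# maps and collect the squared-norm errors ||u||^2 - ||f(u)||^2,
# which should concentrate around 0.
rng = np.random.default_rng(0)
d, k, trials = 1000, 10, 2000

u = rng.standard_normal(d)
errors = np.empty(trials)
for t in range(trials):
    R = rng.standard_normal((k, d))
    fu = (R @ u) / np.sqrt(k)        # f(u) = (1/sqrt(k)) R u
    errors[t] = u @ u - fu @ fu      # ||u||_2^2 - ||f(u)||_2^2

print(f"mean error: {errors.mean():.3f}, std: {errors.std():.3f}")
```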

SLIDE 8

Deterministic Interpolative Decomposition

Given a matrix $A \in \mathbb{R}^{m \times n}$, we can compute an interpolative decomposition (ID), a low-rank matrix approximation that uses $A$'s own columns [Yin+18]. The ID can be computed using the column-pivoted QR factorization:

$$AP = QR.$$

To obtain our low-rank approximation, we form the submatrix $Q_k$ using the first $k$ columns of $Q$. We then have the approximation

$$A \approx Q_k Q_k^* A,$$

which gives us a particular rank-$k$ projection of $A$.
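A minimal SciPy sketch of this projection (our illustration, assuming an economy-size pivoted QR; not necessarily the authors' implementation):

```python
import numpy as np
from scipy.linalg import qr

def deterministic_id(A, k):
    """Rank-k projection A ≈ Q_k Q_k^* A via column-pivoted QR."""
    Q, R, piv = qr(A, mode="economic", pivoting=True)
    Qk = Q[:, :k]                      # first k columns of Q
    return Qk @ (Qk.conj().T @ A)

A = np.random.default_rng(1).standard_normal((200, 80))
A_k = deterministic_id(A, 40)
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))  # relative error
```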

SLIDE 9

Randomized Interpolative Decomposition

We introduce a new method to compute a randomized ID by taking a subset $S$ of $p > k$ distinct, randomly selected columns from the $n$ columns of $A$. The algorithm then performs the column-pivoted QR factorization on the submatrix:

$$A_{(:,S)} P = QR.$$

Accordingly, we have the following rank-$k$ projection of $A$:

$$A \approx Q_k Q_k^* A,$$

where $Q_k$ is the submatrix formed by the first $k$ columns of $Q$.
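A sketch of the randomized variant under the same assumptions (column sampling without replacement; our illustration):

```python
import numpy as np
from scipy.linalg import qr

def randomized_id(A, k, p, rng):
    """Rank-k projection from a pivoted QR of p > k random columns of A."""
    S = rng.choice(A.shape[1], size=p, replace=False)  # distinct columns
    Q, R, piv = qr(A[:, S], mode="economic", pivoting=True)
    Qk = Q[:, :k]
    return Qk @ (Qk.conj().T @ A)
```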

SLIDE 11

Deterministic Singular Value Decomposition

Recall the singular value decomposition of a matrix [16],

$$A_{m \times n} = U_{m \times m} \Sigma_{m \times n} V^*_{n \times n},$$

where $U$ and $V$ are orthogonal matrices, and $\Sigma$ is a rectangular diagonal matrix with positive diagonal entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r$, where $r$ is the rank of the matrix $A$. The $\sigma_i$ are called the singular values of $A$.
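As a quick NumPy check of this factorization (our sketch, not part of the slides):

```python
import numpy as np

# Verify A = U Σ V* and that the singular values come back sorted.
A = np.random.default_rng(2).standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A)              # full SVD: U is 6x6, Vt is 4x4
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)               # rectangular diagonal Σ
print(np.allclose(A, U @ Sigma @ Vt))    # True
print(np.all(np.diff(s) <= 0))           # σ1 ≥ σ2 ≥ ... : True
```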

SLIDE 12

Randomized Singular Value Decomposition

Utilizing ideas from [HMT09], our algorithm executes the following steps to compute the randomized SVD:

1 Construct an $n \times k$ random Gaussian matrix $\Omega$

2 Form $Y = A\Omega$

3 Construct a matrix $Q$ whose columns form an orthonormal basis for the column space of $Y$

4 Set $B = Q^* A$

5 Compute the SVD: $B = U' \Sigma V^*$

6 Construct the SVD approximation: $A \approx Q Q^* A = Q B = Q U' \Sigma V^*$
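A minimal sketch of these six steps (our illustration of the [HMT09] scheme, not the authors' exact code):

```python
import numpy as np

def randomized_svd(A, k, rng):
    n = A.shape[1]
    Omega = rng.standard_normal((n, k))   # 1: random Gaussian test matrix
    Y = A @ Omega                         # 2: sample the column space of A
    Q, _ = np.linalg.qr(Y)                # 3: orthonormal basis for range(Y)
    B = Q.conj().T @ A                    # 4: small k x n matrix
    U_p, s, Vt = np.linalg.svd(B, full_matrices=False)  # 5: SVD of B
    return Q @ U_p, s, Vt                 # 6: A ≈ (Q U') Σ V*
```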

SLIDE 14

Results - Testing 620 × 187500 Matrix

Figure: Error Relative to Original Data

SLIDE 15

Results - Testing 620 × 187500 Matrix

Figure: Random ID Error and Time Relative to Deterministic ID

Figure: Random SVD Error and Time Relative to Deterministic SVD

SLIDE 17

Eigenfaces

Using ideas from [BKP15], our eigenfaces experiment is based on the LFW dataset [Hua+07]. This dataset contains more than 13,000 RGB images of faces, where each image has dimensions 250 × 250. We can flatten each image to represent it as a vector of length 250 · 250 · 3 = 187500. In our experiment, we use only 620 images from the LFW dataset, giving us a data matrix $A$ of size 187500 × 620. We then perform SVD on the mean-subtracted columns of $A$.
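A minimal sketch of this pipeline (ours; a small random stand-in replaces the actual LFW matrix so the snippet stays self-contained and fast):

```python
import numpy as np

# In the real experiment, A is 187500 x 620 with one flattened
# 250x250x3 LFW image per column; a random stand-in is used here.
rng = np.random.default_rng(3)
A = rng.standard_normal((1875, 62))

mean_face = A.mean(axis=1, keepdims=True)      # mean-subtract the columns
U, s, Vt = np.linalg.svd(A - mean_face, full_matrices=False)
eigenfaces = U                                 # columns of U are eigenfaces
```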

Figure: Original LFW Images

SLIDE 18

Image Results

We obtain the following eigenfaces from the columns of the matrix U:

Figure: Eigenfaces Obtained using Deterministic SVD

Figure: Eigenfaces Obtained using Randomized SVD

SLIDE 20

Kernel Methods

Kernel methods work by mapping the data into a high-dimensional space to add more structure and encourage linear separability. Suppose we have a feature map $\phi \colon \mathbb{R}^n \to \mathbb{R}^m$, $m > n$. The ‘kernel trick’ is based on the observation that we only need the inner products of vectors in the feature space, not the explicit high-dimensional mappings:

$$k(x, y) = \langle \phi(x), \phi(y) \rangle$$

  • Ex. Gaussian/RBF kernel: $k(x, y) = \exp\big(-\gamma \|x - y\|_2^2\big)$
  • Kernel methods include kernel PCA, kernel SVM, and more.
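As a small sketch (ours), the RBF kernel matrix for two sets of points can be computed without ever forming $\phi$ explicitly:

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian/RBF kernel matrix K[i, j] = exp(-gamma ||x_i - y_j||^2)."""
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>, for all pairs at once
    sq_dists = ((X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :]
                - 2 * X @ Y.T)
    return np.exp(-gamma * sq_dists)
```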

SLIDE 21

Randomized Fourier Features Kernel

We can sample random Fourier features to approximate a kernel [RR08]. Let $k(x, y)$ denote our kernel, and $p(w)$ the probability distribution corresponding to the inverse Fourier transform of $k$. Then

$$k(x, y) = \int_{\mathbb{R}^d} p(w)\, e^{-j w^T (x - y)}\, dw \approx \frac{1}{m} \sum_{i=1}^{m} 2 \cos(w_i^T x + b_i) \cos(w_i^T y + b_i),$$

where $w_i \sim p(w)$ and $b_i \sim \mathrm{Uniform}(0, 2\pi)$. For a given $m$, define $z(x) = \sqrt{2}\, \big(\cos(w_i^T x + b_i)\big)_{i=1}^{m}$ to yield the approximation $k(x, y) \approx \frac{1}{m} z(x) z(y)^T$ [Lop+14].
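A minimal sketch for the RBF kernel (our illustration; we assume $k(x, y) = \exp(-\gamma \|x - y\|_2^2)$, whose spectral density $p(w)$ is Gaussian with standard deviation $\sqrt{2\gamma}$ per coordinate):

```python
import numpy as np

def rff_features(X, m, gamma, rng):
    """Map rows of X to m random Fourier features z(x)."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, m))  # w_i ~ p(w)
    b = rng.uniform(0, 2 * np.pi, size=m)                  # b_i ~ U(0, 2π)
    return np.sqrt(2.0) * np.cos(X @ W + b)

rng = np.random.default_rng(4)
X = rng.standard_normal((5, 3))
Z = rff_features(X, m=5000, gamma=0.5, rng=rng)
K_approx = Z @ Z.T / Z.shape[1]        # ≈ exact RBF kernel matrix of X
```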

SLIDE 23

Data for Kernel PCA Experiments

To test kernel PCA methods, we use a dataset that is not linearly separable — a cloud of points surrounded by a circle:

Figure: Data used to test kernel PCA methods

SLIDE 24

Randomized Kernel PCA Results

Figure: Random Fourier features KPCA results

SLIDE 26

Kernel SVM

We may also use kernel methods for support vector machines (SVMs). The goal of an SVM is to find the $(d-1)$-dimensional hyperplane that best separates two clusters of $d$-dimensional data points. In two dimensions, this is a line separating two clusters of points in the plane. Using the kernel trick, we can project inseparable points into a higher dimension and run an SVM algorithm on the resulting points.
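One way to realize this with the random features above is to train a linear SVM on $z(x)$; the following is a hedged sketch on toy data (ours, not the slides' MNIST experiment):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(5)
X = rng.standard_normal((400, 2))
y = (np.linalg.norm(X, axis=1) > 1.2).astype(int)   # ring vs. inner cloud

# Random Fourier features for the RBF kernel, then a linear SVM on them.
m, gamma = 500, 1.0
W = rng.normal(scale=np.sqrt(2 * gamma), size=(2, m))
b = rng.uniform(0, 2 * np.pi, size=m)
Z = np.sqrt(2.0) * np.cos(X @ W + b)

clf = LinearSVC(dual=False).fit(Z, y)
print("training accuracy:", clf.score(Z, y))
```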

SLIDE 27

Randomized Kernel SVM

Figure: Randomized Kernel SVM Accuracy and Time Results as m Varies

SLIDE 28

Comparison of Deterministic and Randomized Kernel SVM

Using the MNIST dataset [LC10], we test 10,000 images (784 features) for a fixed $\gamma$:

Deterministic kernel
  • Accuracy: 0.9195
  • Time: 37.99 s

Randomized kernel
  • Accuracy: mean 0.891, st. dev. 0.0042, min 0.881, max 0.9005
  • Mean time: 2.14 s

SLIDE 29

Comparison of Deterministic and Randomized Kernel SVM

On 1000 MNIST images, we plot the accuracies of the deterministic and random kernel SVMs as γ varies:

SLIDE 30

Application of Randomized Kernel SVM: Grid Search

Testing 100 $\gamma$ values to identify the best one:

  • Deterministic kernel, serial: 133.03 s
  • Randomized kernel, serial: 78.97 s
  • Randomized kernel, parallel: 41.18 s

The best $\gamma$ value obtained from the randomized method corresponds with either the best or second-best deterministic $\gamma$ (3 trials). The randomized method uses the approximate kernel matrix

$$\hat{K} = \frac{1}{m} z(X) z(X)^T.$$
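A hedged sketch of the parallel variant (our illustration using joblib on toy data; the slides' actual setup may differ):

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.svm import LinearSVC

rng = np.random.default_rng(6)
X = rng.standard_normal((300, 2))
y = (np.linalg.norm(X, axis=1) > 1.2).astype(int)

def score_gamma(gamma, m=300, seed=0):
    """Fit a linear SVM on random Fourier features for one gamma value."""
    r = np.random.default_rng(seed)
    W = r.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], m))
    b = r.uniform(0, 2 * np.pi, size=m)
    Z = np.sqrt(2.0) * np.cos(X @ W + b)
    return LinearSVC(dual=False).fit(Z, y).score(Z, y)

# Score all candidate gammas in parallel and keep the best one.
gammas = np.logspace(-2, 1, 100)
scores = Parallel(n_jobs=-1)(delayed(score_gamma)(g) for g in gammas)
print("best gamma:", gammas[int(np.argmax(scores))])
```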

SLIDE 31

Takeaways

When using large datasets, randomized algorithms are able to maintain most of the accuracy of their deterministic counterparts while offering a huge reduction in computational cost. These algorithms are useful for matrix factorization/decomposition as well as for kernel approximation.

SLIDE 32

References I

[ICERM] ICERM Logo. ICERM. url: https://icerm.brown.edu.

[16] The Singular Value Decomposition (SVD). 2016. url: https://math.mit.edu/classes/18.095/2016IAP/lec2/SVD_Notes.pdf.

[BKP15] Brunton, Kutz, and Proctor. Eigenfaces Example. 2015. url: http://faculty.washington.edu/sbrunton/me565/pdf/L29secure.pdf.

[HMT09] Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. 2009. arXiv: 0909.4061 [math.NA].

SLIDE 33

References II

[Hua+07] Gary B. Huang et al. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. Tech. rep. 07-49. University of Massachusetts, Amherst, Oct. 2007.

[JL84] William Johnson and Joram Lindenstrauss. “Extensions of Lipschitz maps into a Hilbert space”. In: Contemporary Mathematics 26 (Jan. 1984), pp. 189–206. doi: 10.1090/conm/026/737400.

[LC10] Yann LeCun and Corinna Cortes. “MNIST handwritten digit database”. In: (2010). url: http://yann.lecun.com/exdb/mnist/.

[Lop+14] David Lopez-Paz et al. Randomized Nonlinear Component Analysis. 2014. arXiv: 1402.0119 [stat.ML].

SLIDE 34

References III

[Mic09] Michael Mahoney. The Johnson-Lindenstrauss Lemma. Sept. 2009. url: https://cs.stanford.edu/people/mmahoney/cs369m/Lectures/lecture1.pdf.

[RR08] Ali Rahimi and Benjamin Recht. Random Features for Large-Scale Kernel Machines. Ed. by J. C. Platt et al. 2008. url: http://papers.nips.cc/paper/3182-random-features-for-large-scale-kernel-machines.pdf.

[Yin+18] Lexing Ying et al. Interpolative Decomposition and its Applications in Quantum Chemistry. 2018. url: https://www.ki-net.umd.edu/activities/presentations/9_871_cscamm.pdf.

SLIDE 35

Website

To explore more, visit our website at the following link: https://rishi1999.github.io/random-projections/
