Reducing dimensionality - Principal components
R.W. Oldford
Reducing dimensions
Recall how orthogonal projections work.
Given $V = [v_1, \ldots, v_k]$, an orthogonal projection matrix $P$ is easily constructed as
$$P = V(V^TV)^{-1}V^T.$$
And, if the column vectors of $V$ form an orthonormal basis for the subspace $S$, then $P = VV^T$.
So far, we have only considered the choice where the $v_i$ are unit vectors in the direction of the original data axes $e_i$. Projections onto these directions simply return the scatterplots on pairs of the original variates. Are there other directions which would do as well (or possibly better)?
Imagine that we have $n$ points $x_1, \ldots, x_n \in \mathbb{R}^p$, centred so that $\sum_{i=1}^n x_i = 0$, and denote by $X = [x_1, \ldots, x_n]^T$ the $n \times p$ real matrix whose $i$th row is $x_i^T$.
With the data centred, we can now ask whether they truly lie in a linear subspace of Rp. And, if they do, can we find that subspace? Alternatively, do they lie nearly in a linear subspace, and could we find it?
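Before looking for new directions, it may help to see the projection machinery in code. The following R sketch is not from the slides: it builds $P = V(V^TV)^{-1}V^T$ for an arbitrary made-up V and checks the defining properties of an orthogonal projection matrix.

# Sketch (not from the slides): construct an orthogonal projection matrix
set.seed(314)
V <- matrix(rnorm(3 * 2), nrow = 3, ncol = 2)   # p = 3, k = 2, arbitrary example
P <- V %*% solve(t(V) %*% V) %*% t(V)
# P is symmetric and idempotent, as any orthogonal projection matrix must be
all.equal(P, t(P))                      # TRUE
all.equal(P, P %*% P)                   # TRUE
# projecting an already-projected vector changes nothing
x <- rnorm(3)
all.equal(P %*% x, P %*% (P %*% x))     # TRUE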
Reducing dimensions - Finding the principal axes
For example suppose n = 20 and p = 2 so that a point cloud might look like:
[Figure: scatterplot of the n = 20 points in the plane.]
The points x1, . . . , xn lie in the plane, but do not occupy all of it. They appear nearly to lie in a one-dimensional subspace of $\mathbb{R}^2$.
Reducing dimensions - Finding the principal axes
We can think about orthogonally projecting the points (or equivalently, vectors) $x_1, \ldots, x_n$ onto any direction vector $a \in \mathbb{R}^p$ (i.e. any $a \in \mathbb{R}^p$ with $\|a\| = 1$).
[Figure: the point cloud with a unit direction vector a drawn; the point x_k is orthogonally projected onto a.]
Reducing dimensions - Finding the principal axes
The orthogonal projection of the point $x_k$ onto $a$ (i.e. onto $\mathrm{span}\{a\}$) is
$$(aa^T)x_k = a\,w_k,$$
a vector in the direction $a \times \mathrm{sign}(w_k)$ of length $|w_k|$, with $w_k = a^Tx_k$. Note that the squared length of this projection is
$$w_k^2 = \|aa^Tx_k\|^2 = x_k^Ta\,a^Tx_k = a^Tx_kx_k^Ta.$$
Since every point can be projected onto the direction vector a, we might ask which vector would maximize (or minimize) the sum of the squared lengths of the projections. That is, onto which direction would the projections have the largest average squared length? Because the points are already centred at 0, this is the same as asking in which direction the original data points are most (or least) spread out (i.e. most or least variable).
Reducing dimensions - Finding the principal axes
Mathematically we want to find the direction vector $a$ which maximizes (minimizes) the sum $\sum_{k=1}^n w_k^2$.
This sum can in turn be expressed in terms of the original points in the point cloud as:
$$\sum_{k=1}^n w_k^2 = \sum_{k=1}^n x_k^Ta\,a^Tx_k = \sum_{k=1}^n a^Tx_kx_k^Ta = a^T\left(\sum_{k=1}^n x_kx_k^T\right)a = a^T(X^TX)\,a,$$
where $X^T = [x_1, x_2, \ldots, x_n]$ is the $p \times n$ matrix of data vectors.
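This identity is easy to check numerically. The following sketch is not from the slides; it uses simulated centred data and a random unit direction a.

# Sketch: sum of squared projection lengths equals a^T (X^T X) a
set.seed(1)
n <- 20; p <- 2
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)  # centred data
a <- rnorm(p); a <- a / sqrt(sum(a^2))       # a unit direction vector
w <- X %*% a                                 # w_k = a^T x_k for each point
c(sum(w^2), t(a) %*% crossprod(X) %*% a)     # the two quantities agree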
Reducing dimensions - Finding the principal axes
The maximization (minimization) problem can now be expressed as follows. Find $a \in \mathbb{R}^p$ which maximizes (minimizes) $a^T(X^TX)a$ subject to the constraint that $a^Ta = 1$. We can write this as an unconstrained optimization by introducing a Lagrange multiplier $\lambda$. The problem then becomes: find $\lambda \in \mathbb{R}$ and $a \in \mathbb{R}^p$ which maximize (minimize)
$$a^T(X^TX)a + \lambda(1 - a^Ta),$$
which we now simply differentiate with respect to $a$, set to zero, solve, etc.
Note that the objective function to be maximized (minimized) is a quadratic form in $a$. More generally, a quadratic form in $z \in \mathbb{R}^p$ can always be written as
$$Q(z) = z^TAz + b^Tz + c$$
where $A \in \mathbb{R}^{p \times p}$, $b \in \mathbb{R}^p$, and $c \in \mathbb{R}$ are all constants (w.l.o.g. $A = A^T$).
Reducing dimensions - Finding the principal axes
Differentiating the quadratic form $Q(z) = z^TAz + b^Tz + c$ with respect to the vector $z$ gives
$$\frac{\partial}{\partial z}Q(z) = 2Az + b.$$
For our problem, we have the variable vector $z = a$ and constants $c = \lambda$, $b = 0$, and $A = X^TX - \lambda I_p$. Differentiating with respect to $a$ and setting the result to $0$ gives the set of equations
$$2(X^TX - \lambda I_p)a = 0 \quad\Longleftrightarrow\quad (X^TX)a = \lambda a.$$
Differentiating with respect to $\lambda$, setting to zero, and solving yields $a^Ta = 1$. Which should look familiar . . . ?
Reducing dimensions - Finding the principal axes
The solutions to the system of equations $(X^TX)a = \lambda a$ and $a^Ta = 1$ are the eigenvectors $a$ and their corresponding eigenvalues $\lambda$ of the real symmetric matrix $X^TX$. The quadratic form we are maximizing (minimizing) is $a^T(X^TX)a = \lambda a^Ta = \lambda$. To maximize (minimize) this quadratic form, we choose the eigenvector corresponding to the largest (smallest) eigenvalue. Denote the solution to the maximization problem by $v_1$ (and to the minimization problem by $v_p$): $v_1$ (or $v_p$) is the eigenvector of $X^TX$ corresponding to its largest (or smallest) eigenvalue $\lambda_1$ (or $\lambda_p$). Putting all eigenvectors into an orthogonal matrix $V = [v_1, \cdots, v_p]$, ordered by the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$, we have the eigen-decomposition
$$X^TX = VD_\lambda V^T \quad\text{with}\quad D_\lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p).$$
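In R, this eigen-decomposition is available directly through eigen(), which returns the eigenvalues in decreasing order (so v1 is the first column of the eigenvector matrix and vp the last). A sketch with simulated data, not from the slides:

# Sketch: the principal axes are eigenvectors of X^T X
set.seed(2)
X <- scale(matrix(rnorm(20 * 2), 20, 2), center = TRUE, scale = FALSE)
decomp <- eigen(crossprod(X), symmetric = TRUE)
V <- decomp$vectors          # columns v1, ..., vp, eigenvalues in decreasing order
lambda <- decomp$values
# the quadratic form attains lambda_1 at v1 and lambda_p at vp
c(t(V[, 1]) %*% crossprod(X) %*% V[, 1], lambda[1])
c(t(V[, 2]) %*% crossprod(X) %*% V[, 2], lambda[2])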
Reducing dimensions - Finding the principal axes
The figure below shows v1 (and v2) for this data.
[Figure: the point cloud with the principal axes v1 and v2 overlaid.]
Reducing dimensions - Finding the principal axes
Consider a change of variables $y = V^Tx$ (with $V^TV = I_p = VV^T$). The data point in the original coordinate system is
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} = [e_1, e_2, \cdots, e_p]\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} = Vy$$
and in the new coordinate system becomes
$$[v_1, v_2, \cdots, v_p]\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{bmatrix} = VV^Tx = x.$$
The values $y_1, \ldots, y_p$ are coordinates on new axes $v_1, \ldots, v_p$. The axes were chosen so that the coordinates are most spread out for variable $y_1$, next for variable $y_2$, . . . , and least for $y_p$. The axes $v_1, \ldots, v_p$ are called the principal axes and the variables $y_1, \ldots, y_p$ the principal components.
(N.B. sometimes both the axes and the variables are called the principal components . . . sigh.) Note that the transformed variates (the principal components) $y_i$ and $y_j$ are now uncorrelated for $i \ne j$ (since the $y$ points are now aligned along their principal axes).
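The uncorrelatedness of the principal components can also be verified numerically. A sketch with simulated data, not from the slides:

# Sketch: principal components are uncorrelated
set.seed(3)
X <- scale(matrix(rnorm(100 * 3), 100, 3), center = TRUE, scale = FALSE)
V <- eigen(crossprod(X), symmetric = TRUE)$vectors
Y <- X %*% V                     # principal components as columns
round(cov(Y), 10)                # diagonal matrix: off-diagonal entries are 0
diag(cov(Y)) * (nrow(Y) - 1)     # recovers the eigenvalues lambda_1 >= lambda_2 >= lambda_3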
Reducing dimensions - Finding the principal axes
For our example, the transform $y = V^Tx$ yields
$$y_1 = v_1^Tx = c_1x_1 + c_2x_2 + \cdots + c_px_p,$$
a weighted linear combination of the original $x$ variables (using the entries of $v_1$ as weights). The following figure shows the points as they appear in the new coordinate system (i.e. $y_k = V^Tx_k$).
[Figure: the point cloud plotted in the new (y1, y2) coordinate system, with one point y_k highlighted.]
Note that the transformation rotates the points into position and (in this example) reflects them through one (or more) of the principal axes.
Reducing dimensions - Finding the principal axes
For our example, consider only the versicolor species of the iris data and the first three variates. In three dimensions, the plot looks like:
library(loon)
## Loading required package: tcltk
data <- l_scale3D(iris[iris$Species == "versicolor", 1:4])
p3D <- l_plot3D(data[, 1:3], showGuides = TRUE)
plot(p3D)
[Figure: interactive 3D scatterplot of Sepal.Length, Sepal.Width, and Petal.Length for the versicolor irises.]
This can now be rotated by hand to get to the principal components.
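As a cross-check (not part of the slides), base R's prcomp() computes the same principal axes via the singular value decomposition; the numbers will differ slightly from the interactive plot above because l_scale3D rescales the variates differently.

# Sketch: the principal axes of the versicolor data via base R's prcomp()
versicolor <- iris[iris$Species == "versicolor", 1:3]
pc <- prcomp(versicolor, center = TRUE, scale. = FALSE)
pc$rotation      # columns are the principal axes v1, v2, v3
head(pc$x)       # the principal components y = V^T x for each observation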
Reducing dimensions
Note that
$$\lambda_j = \lambda_j v_j^Tv_j = v_j^T(\lambda_j v_j) = v_j^T(X^TX)v_j = (Xv_j)^T(Xv_j) = z^Tz = \sum_{i=1}^n z_i^2$$
for real values $z_i$, and so $\lambda_j \ge 0$ for all $j = 1, \ldots, p$. Note that each $z_i = v_j^Tx_i$ is the coordinate of the data point $x_i$ projected onto the principal axis $v_j$.
So, if $\lambda_j = 0$, then the projection of every point onto the direction $v_j$ is identically $0$! That is, the data lie in a space orthogonal to $v_j$.
Suppose there is a value $d < p$ such that $\lambda_1 \ge \cdots \ge \lambda_d > 0$, and $\lambda_j = 0$ for $j > d$. Then the points $x_1, \ldots, x_n$ lie in a $d$-dimensional subspace of $\mathbb{R}^p$ defined by the principal axes $v_1, \ldots, v_d$. That is, $x_i \in \mathrm{span}\{v_1, \ldots, v_d\} \subset \mathbb{R}^p$ for all $i = 1, \ldots, n$.
Question: What if we only have $\lambda_j \approx 0$ for $j > d$?
Answer: The points $x_1, \ldots, x_n$ nearly lie in a $d$-dimensional subspace. Perhaps we can reduce consideration to $d$ dimensions.
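To see the effect of a near-zero eigenvalue, the following sketch (simulated data, not from the slides) generates points that almost lie on a line in the plane and shows that the second eigenvalue is tiny relative to the first.

# Sketch: data nearly in a 1-dimensional subspace of R^2
set.seed(4)
z <- rnorm(20)
X <- cbind(z, 2 * z + rnorm(20, sd = 0.05))     # nearly on the line x2 = 2 x1
X <- scale(X, center = TRUE, scale = FALSE)
lambda <- eigen(crossprod(X), symmetric = TRUE)$values
lambda                     # lambda_2 is near 0
lambda[2] / lambda[1]      # a very small ratio: the data are effectively 1-dimensional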
Reducing dimensions
Let $Y^T = [y_1, \ldots, y_n]$ be the $p \times n$ matrix of the points $y_1, \ldots, y_n$ in the coordinate system of the principal axes. Note that these coordinates in the new space are simply given by
$$Y = X[v_1, \ldots, v_p] = XV.$$
The $i$th column of $Y$ is the $i$th principal component. When we want to reduce the dimensionality to only that defined by the first $d$ principal components, we can right-multiply $X$ by the $p \times d$ matrix $[v_1, \ldots, v_d]$, or equivalently simply select the first $d$ columns of $Y$.
There is a particularly handy decomposition of a real rectangular $n \times p$ matrix $X$ called the singular value decomposition:
$$X = UD_\sigma V^T$$
where $U$ is an $n \times p$ matrix with the property that $U^TU = I_p$, $V$ is a $p \times p$ matrix with $V^TV = VV^T = I_p$, and $D_\sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_p)$ with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$.
The scalars σi are called the singular values of X, the columns of U the left singular vectors and the columns of V the right singular vectors.
Reducing dimensions
Letting $X = UD_\sigma V^T$ be the singular value decomposition of $X$, then
$$X^TX = (UD_\sigma V^T)^T UD_\sigma V^T = VD_\sigma U^TUD_\sigma V^T = VD_\sigma D_\sigma V^T = VD_\sigma^2 V^T$$
is just the eigen-decomposition of $X^TX$ with $D_\sigma^2 = D_\lambda$. In particular, note that $\sigma_i = \sqrt{\lambda_i}$; unlike $\lambda$, $\sigma$ is on the same scale as the data values.
So, a quick way to get the principal components is to find the singular value decomposition of the (centred) matrix $X$ and then
$$Y = XV = UD_\sigma.$$
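This relationship between the SVD and the principal components can be confirmed numerically; the sketch below (simulated data, not from the slides) checks both $Y = XV = UD_\sigma$ and $\sigma_i^2 = \lambda_i$.

# Sketch: principal components from the SVD
set.seed(5)
X <- scale(matrix(rnorm(50 * 4), 50, 4), center = TRUE, scale = FALSE)
s <- svd(X)
all.equal(X %*% s$v, s$u %*% diag(s$d))                           # Y = X V = U D_sigma
all.equal(s$d^2, eigen(crossprod(X), symmetric = TRUE)$values)    # sigma_i^2 = lambda_i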
Reducing dimensions
For example, in R:

# Principal components
# ... via a ...
# Singular value decomposition (svd)
# X = U D V^T
library(loon)
data <- oliveAcids
# Scale as well as centre since we are looking at scatterplots
data <- scale(data, center = TRUE, scale = TRUE)
svd_data <- svd(data)
u <- svd_data$u
d <- svd_data$d
v <- svd_data$v
Reducing dimensions
# V has orthonormal columns
round(t(v) %*% v, 10)
## an 8 x 8 identity matrix (1 on the diagonal, 0 elsewhere)

# and, being square, so are its rows
round(v %*% t(v), 10)
## again the 8 x 8 identity matrix
Reducing dimensions
# Similarly u has orthonormal columns (but not rows)
round(t(u) %*% u, 10)
## an 8 x 8 identity matrix (1 on the diagonal, 0 elsewhere)

# compare: u %*% t(u) ... n by n, not orthogonal
Reducing dimensions
# d contains the singular values of X
# Here they are in a diagonal matrix
round(diag(d), 0)
## an 8 x 8 diagonal matrix with diagonal entries 46, 32, 24, 21, 14, 12, 8, 1

Clearly, some of these are much larger than others. Perhaps we do not need all of these dimensions to explore the data. To get a sense of their relative sizes, we look at a so-called scree plot.
Reducing dimensions
Scree plots: determining the effective dimension of the subspace.
# The following is a "Scree plot"
plot(d/max(d), type = "b",
     xlab = "principal component",
     ylab = "singular value (relative)",
     col = "grey50", pch = 16, cex = 2)
[Figure: scree plot of the relative singular values against principal component number.]
Perhaps 4 dimensions are enough? 5? Ideally there would be a large noticeable drop at some point.
Reducing dimensions
Now, $\sum_i \frac{\sigma_i}{\sigma_{\max}}$ is a measure of the effective (fractional) dimensionality of the data. This suggests that we might consider looking at the ratio
$$\frac{\sum_{i=1}^d \sigma_i}{\sum_{j=1}^p \sigma_j}$$
as a measure of the proportion of the dimensionality retained by the first $d$ components. This suggests an alternative plot:
# The following plots the cumulative proportion
plot(cumsum(d)/sum(d), type = "b", col = "grey50",
     xlab = "Number of principal components",
     ylab = "Proportion of effective dimension",
     ylim = 0:1, pch = 16, cex = 2)
[Figure: cumulative proportion of effective dimension against the number of principal components.]
Reducing dimensions - a more complex example
In the frey dataset, found in the loon.data package, we have 1,965 images of Brendan Frey's face. Here are 4 examples:
[Figure: four example images of Brendan Frey's face.]
Reducing dimensions - a more complex example
◮ Each pixel has a grey value in [0, 1].
◮ Each image is a 28 × 20 pixel image, so each image is a point in 560 dimensions (see the sketch below).
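The sketch below (a made-up image, not from the slides) shows the flattening that turns a 28 × 20 grid of grey values into a single point in 560 dimensions.

# Sketch: a 28 x 20 image flattened to a point in R^560
img <- matrix(runif(28 * 20), nrow = 28, ncol = 20)   # a made-up grey-scale image
x <- as.vector(img)                                   # one point in 560 dimensions
length(x)                                             # 560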
Reducing dimensions - a more complex example
# Principal components
# ... via a ...
# Singular value decomposition (svd)
# X = U D V^T
library(loon)
library(loon.data)
data(frey)
data <- t(frey)
# Centre only (no scaling): the pixel grey values are already on a common scale
data <- scale(data, center = TRUE, scale = FALSE)
svd_data <- svd(data)
u <- svd_data$u
d <- svd_data$d
v <- svd_data$v
Reducing dimensions - a more complex example
[Figure: "Scree plot" of the relative singular values and "Proportion effective dimension" plot for the 560 pixel dimensions.]
Doesn’t look like we need all 560 dimensions, but how many should we choose? 100? Let’s look at 20.
Reducing dimensions - a more complex example
# Here are the images
frey.imgs <- l_image_import_array(frey, 28, 20, img_in_row = FALSE, rotate = 90)
l_imageviewer(frey.imgs)
reduced <- data %*% v[, 1:20]
nav1 <- l_navgraph(reduced, linkingGroup = "frey")
gl1 <- l_glyph_add_image(nav1$plot, images = frey.imgs, label = "frey faces")
nav1$plot['glyph'] <- gl1
scags2d <- scagnostics2d(reduced)
nav2 <- l_ng_plots(measures = scags2d, linkingGroup = "frey")
gl2 <- l_glyph_add_image(nav2$plot, images = frey.imgs, label = "frey faces")
nav2$plot['glyph'] <- gl2
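A complementary way to see what the 20 retained components capture (not shown in the slides) is to reconstruct an image from them. The sketch below assumes the objects frey, data, v, and reduced created in the chunk above; the display orientation may need adjusting, since l_image_import_array above used rotate = 90.

# Sketch: rank-20 reconstruction of the first face from its principal components
centre <- attr(data, "scaled:center")                 # column means removed by scale()
approx1 <- reduced[1, ] %*% t(v[, 1:20]) + centre     # back to 560 pixel values
orig1 <- matrix(t(frey)[1, ], nrow = 28, ncol = 20)   # the original first image
rec1  <- matrix(approx1, nrow = 28, ncol = 20)        # its rank-20 approximation
# view them side by side with base graphics
op <- par(mfrow = c(1, 2))
image(orig1, col = grey(seq(0, 1, length.out = 64)))
image(rec1,  col = grey(seq(0, 1, length.out = 64)))
par(op)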
Reducing dimensions - a more complex example
Some remarks:
◮ the dimension reduction made no use of the fact that the data:
  ◮ were faces (i.e. no facial features extracted)
  ◮ were images (i.e. spatial position not really used)
◮ principal components does not care which position in the 560-dimensional vector each pixel location occupies
  ◮ i.e. we could have randomly assigned the image pixel array locations to the locations in the vector
◮ we still got some useful positioning even in only the first two principal components
◮ there is no reason to reduce the dimensionality to only 2 or 3 dimensions
  ◮ we can reduce to 20 or even more dimensions if that is what the scree plot suggests
  ◮ we can reduce to any number provided that the determination of interesting projections is computationally feasible