Principal Component Analysis
4/7/17
PCA: the setting
- Unsupervised learning: unlabeled data.
- Dimensionality reduction: simplify the data representation.
What does the algorithm do? It performs an affine change of basis: it rotates and translates the data so that the variance lies along the axes.
Where have transformed representations helped before?
- Support vector machines: data that is not linearly separable may be (approximately) linearly separable in a transformed basis, e.g. by mapping into higher-dimensional bases.
- Approximate Q-learning: we need to be able to extract features that summarize the state space well.
Both rely on a good transformed representation of the data.
This looks like the change of basis from linear algebra.
The goal: find a basis in which the variance in the data is along the axes. Axes along which the data barely varies may not be important.
First step: center the data. In each dimension, subtract the mean of that dimension from every value in that dimension. Centering the data lets the remaining steps perform a linear transformation.
(Table: the Original Data for five points x0–x4 in three dimensions, the per-dimension means, and the resulting Centered Data after subtracting those means.)
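Centering takes one line of NumPy. A minimal sketch; the dataset below is invented for illustration, since the slide's exact table is only partially legible:

    import numpy as np

    # Made-up data: five points x0..x4 (columns) in three dimensions (rows).
    data = np.array([[4.0, 3.0, 1.0, 2.0, 8.0],
                     [1.0, 6.0, 2.0, 5.0, 3.0],
                     [2.0, 8.0, 6.0, 4.0, 7.0]])

    means = data.mean(axis=1, keepdims=True)   # the mean of each dimension
    X = data - means                           # centered: each row now sums to 0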
The hard part: find an orthogonal basis that's a linear transformation of the original, where the variance in the data is explained by as few dimensions as possible.
- Basis: a set of vectors that spans the dimensions.
- Orthogonal: all vectors are perpendicular.
- Linear transformation: rotate all vectors by the same amount.
- Explaining the variance: low covariance across dimensions.
(Figure: the standard basis vectors [1,0] and [0,1].)
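To make "rotate all vectors by the same amount" concrete, here is a small check (my own example, not from the slides) that rotating the standard basis yields another orthonormal basis:

    import numpy as np

    theta = np.pi / 6                  # rotate everything by 30 degrees
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    basis = np.eye(2)                  # the standard basis [1,0], [0,1]
    rotated = R @ basis                # the rotated basis vectors, as columns

    # The rotated basis is still orthonormal: pairwise dot products give I.
    assert np.allclose(rotated.T @ rotated, np.eye(2))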
Last step: reduce the dimension.
- In the new basis, measure how much the data varies along each axis.
- Keep the axes with the most variance, e.g. those above some threshold, and throw away the other dimensions.
- Project the data onto the remaining axes.
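A minimal sketch of the thresholding step, with invented data and an invented cutoff (the slides do not give a specific threshold):

    import numpy as np

    rng = np.random.default_rng(0)
    # Stand-in for data already expressed in the new basis (one row per axis);
    # the first axis varies a lot, the other two barely at all.
    transformed = rng.normal(0.0, [[3.0], [0.1], [0.1]], size=(3, 50))

    variances = transformed.var(axis=1)    # variance along each axis
    keep = variances > 1.0                 # hypothetical threshold
    reduced = transformed[keep, :]         # project onto the remaining axes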
How do we find this basis? By computing the covariance matrix of the centered data: the new axes will be the eigenvectors of the covariance matrix.
With the centered data arranged as a matrix X, one row per dimension and one column per point (here 3 × 5), the covariance matrix works out to

    CX = ⅕ (X)(XT) = [ 7.76   4.80   8.08 ]
                     [ 4.80  18.80   3.60 ]
                     [ 8.08   3.60  21.04 ]
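In NumPy this is a one-liner; a minimal sketch with made-up data (np.cov with bias=True computes the same 1/n-normalized matrix):

    import numpy as np

    rng = np.random.default_rng(1)
    data = rng.normal(size=(3, 5))               # made-up raw data: 3 dims, 5 points
    X = data - data.mean(axis=1, keepdims=True)  # centered, as above

    n = X.shape[1]
    C = (X @ X.T) / n            # CX = 1/5 (X)(XT): a 3x3 symmetric matrix

    assert np.allclose(C, np.cov(X, bias=True))  # np.cov agrees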
Eigenvectors are vectors that the matrix doesn't rotate. If X is a matrix and v is a vector, then v is an eigenvector of X iff there is some constant λ such that:

    Xv = λv

λ, the amount by which X stretches the eigenvector, is the eigenvalue. np.linalg.eig gives eigenvalues and eigenvectors.
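A quick check of the defining property, using np.linalg.eig on the covariance matrix computed above:

    import numpy as np

    C = np.array([[7.76,  4.80,  8.08],
                  [4.80, 18.80,  3.60],
                  [8.08,  3.60, 21.04]])   # the covariance matrix from above

    vals, vecs = np.linalg.eig(C)          # eigenvalues and eigenvectors

    # Eigenvectors are the *columns* of vecs; verify Cv = λv for each pair.
    for lam, v in zip(vals, vecs.T):
        assert np.allclose(C @ v, lam * v)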
If the matrix (X)(XT) has eigenvectors vi with eigenvalues λi for i ∈ {1, …, m}, then the normalized eigenvectors v1, …, vm form an orthonormal basis. The key point: computing the eigenvectors of the covariance matrix gives us the optimal (linear) basis for explaining the variance in our data.
The size of each eigenvalue λi tells us the relative importance of each dimension.
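Sorting the axes by eigenvalue makes that importance explicit; a minimal sketch (np.linalg.eigh, a variant of eig for symmetric matrices, is my choice here):

    import numpy as np

    C = np.array([[7.76,  4.80,  8.08],
                  [4.80, 18.80,  3.60],
                  [8.08,  3.60, 21.04]])

    vals, vecs = np.linalg.eigh(C)        # eigh is suited to symmetric matrices
    order = np.argsort(vals)[::-1]        # most important axis first
    share = vals[order] / vals.sum()      # each axis's share of the total variance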
The PCA recipe:
- Center the data by subtracting the mean in each dimension.
- Re-align the data by finding an orthonormal basis for the covariance matrix.
- Reduce the dimension by keeping only the most important axes.
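Putting the recipe together; a minimal sketch where the function name, argument layout, and the choice of np.linalg.eigh are mine rather than the slides':

    import numpy as np

    def pca(data, k):
        # data: (d, n) array, one column per point; returns a (k, n) array.
        # 1. Center the data by subtracting the mean in each dimension.
        X = data - data.mean(axis=1, keepdims=True)
        # 2. Re-align: eigenvectors of the covariance matrix form the new basis.
        C = X @ X.T / X.shape[1]
        vals, vecs = np.linalg.eigh(C)
        # 3. Reduce: keep the k axes with the largest eigenvalues.
        top = np.argsort(vals)[::-1][:k]
        return vecs[:, top].T @ X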
What else have we seen before? Compare auto-encoders with PCA.
Idea: train a network for data compression / dimensionality reduction by throwing away the outputs.
(Diagram: an auto-encoder with a 3-unit input layer, a 2-unit hidden layer, and a 3-unit output layer. During training the target equals the input; afterwards the hidden layer becomes the output.)
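A minimal PyTorch sketch of the 3 → 2 → 3 auto-encoder in the diagram; the framework, optimizer, and random training data are my assumptions, and with no nonlinearity this is a purely linear auto-encoder, the closest analogue to PCA:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(3, 2),   # encoder: 3 inputs -> 2 hidden units
                          nn.Linear(2, 3))   # decoder: 2 hidden units -> 3 outputs
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    data = torch.randn(100, 3)               # made-up unlabeled data

    for _ in range(200):
        optimizer.zero_grad()
        loss = loss_fn(model(data), data)    # target = input
        loss.backward()
        optimizer.step()

    encoder = model[0]                       # keep the hidden layer as the output
    compressed = encoder(data)               # shape (100, 2)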