ECS231 PCA, revisited
May 28, 2019
Outline
1. PCA for lossy data compression
2. PCA for learning a representation of data
3. Extra: learning XOR
1. PCA for lossy data compression¹

◮ Data compression: given data points $\{x^{(1)},\ldots,x^{(m)}\}\subset\mathbb{R}^n$, find corresponding code vectors $\{c^{(1)},\ldots,c^{(m)}\}\subset\mathbb{R}^{\ell}$ with $\ell < n$
◮ Encoding function $f: x \to c$
◮ Lossy decoding function $g: c \rightsquigarrow x$
◮ Reconstruction: $x \approx g(c) = g(f(x))$
◮ PCA is defined by the choice of decoding function: $g(c) = Dc$, where $D \in \mathbb{R}^{n\times\ell}$ has orthonormal columns
◮ Questions: how to compute the code $c$, and how to choose $D$?
¹Section 2.12 of I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, 2016.
◮ The optimal code: $c^* = \arg\min_{c} \|x - g(c)\|_2^2$
◮ By vector calculus and the first-order necessary condition for optimality, $\nabla_c \|x - Dc\|_2^2 = 2c - 2D^Tx = 0$ (using $D^TD = I_\ell$), so $c^* = D^Tx$
◮ To encode x, we just need the mat-vec product $f(x) = D^Tx$
◮ PCA reconstruction operation: $r(x) = g(f(x)) = DD^Tx$
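A minimal numerical sketch of this encode/decode pair, assuming a centered data matrix whose rows are the data points; here D is built from the leading right singular vectors of X (anticipating the derivation on the following slides), and all variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))   # 100 data points in R^5, one per row
X -= X.mean(axis=0)                 # center the data

ell = 2
_, _, Vt = np.linalg.svd(X, full_matrices=False)
D = Vt[:ell].T                      # D in R^{5x2} with orthonormal columns

x = X[0]                            # one data point
c = D.T @ x                         # encode: f(x) = D^T x
x_rec = D @ c                       # reconstruct: g(f(x)) = D D^T x
print(np.linalg.norm(x - x_rec))    # reconstruction error
```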
◮ Idea: minimize the L2 distance between inputs and reconstructions:
$$D^* = \arg\min_{D} \sqrt{\sum_{i,j}\Bigl(x_j^{(i)} - r\bigl(x^{(i)}\bigr)_j\Bigr)^2} \quad \text{subject to } D^TD = I_{\ell}$$
◮ For simplicity, consider $\ell = 1$ and $D = d \in \mathbb{R}^n$; then
$$d^* = \arg\min_{d} \sum_{i} \bigl\|x^{(i)} - dd^Tx^{(i)}\bigr\|_2^2 \quad \text{subject to } \|d\|_2 = 1$$
◮ Let $X \in \mathbb{R}^{m\times n}$ with $X(i,:) = (x^{(i)})^T$; then
$$d^* = \arg\min_{d} \bigl\|X - Xdd^T\bigr\|_F^2 \quad \text{subject to } d^Td = 1$$
◮ Equivalently,
$$d^* = \arg\max_{d} \|Xd\|_2^2 = \arg\max_{d} d^TX^TXd \quad \text{subject to } d^Td = 1$$
◮ Let $(\sigma_1, u_1, v_1)$ be the largest singular triplet of $X$, i.e., $Xv_1 = \sigma_1 u_1$ with $\sigma_1 = \|X\|_2$; then
$$d^* = \arg\max_{\|d\|_2 = 1} \|Xd\|_2 = v_1$$
◮ In the general case, when $\ell > 1$, the matrix $D$ is given by the $\ell$ right singular vectors of $X$ corresponding to its $\ell$ largest singular values
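A quick numerical check of this characterization, as a sketch on synthetic data: the top right singular vector $v_1$ attains $\|Xv_1\|_2 = \sigma_1$, and no other unit vector does better.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))

_, s, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]                               # top right singular vector

print(np.linalg.norm(X @ v1), s[0])      # both equal sigma_1

d = rng.standard_normal(5)
d /= np.linalg.norm(d)                   # a random unit vector
print(np.linalg.norm(X @ d) <= s[0])     # True: ||Xd||_2 <= sigma_1
```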
[Figure: example data and their PCA reconstructions; two panels, "Height" and "Weight", each showing the series "data" and "pca"]
2. PCA for learning a representation of data²

◮ PCA can be viewed as an unsupervised learning algorithm that learns a representation of data; specifically, it
◮ learns a representation that has lower dimensionality than the original input, and
◮ learns a representation whose elements have no linear correlation with each other
◮ Consider the $m \times n$ "design" matrix $X$ of data $x$ with zero mean, $\mathbb{E}[x] = 0$, so that the unbiased sample covariance is $\mathrm{Var}[x] = \frac{1}{m-1}X^TX$
◮ PCA finds a representation of $x$ via an orthogonal linear transformation: $z = W^Tx$
²Section 5.8.1 of I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, 2016.
◮ Let $X = U\Sigma W^T$ be the SVD of $X$
◮ Then
$$\mathrm{Var}[x] = \frac{1}{m-1}X^TX = \frac{1}{m-1}W\Sigma^TU^TU\Sigma W^T = \frac{1}{m-1}W\,\Sigma^T\Sigma\,W^T$$
◮ Therefore, if we take $z = W^Tx$ (row-wise, $Z = XW$), then
$$\mathrm{Var}[z] = \frac{1}{m-1}Z^TZ = \frac{1}{m-1}W^TX^TXW = \frac{1}{m-1}\Sigma^T\Sigma,$$
which is diagonal: the elements of $z$ are mutually uncorrelated
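A small sanity check of this derivation on synthetic correlated data (a sketch; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 3))  # correlated data
X -= X.mean(axis=0)                  # enforce E[x] = 0

_, _, Wt = np.linalg.svd(X, full_matrices=False)
Z = X @ Wt.T                         # rows are z = W^T x

cov_z = Z.T @ Z / (X.shape[0] - 1)   # Var[z] = Z^T Z / (m - 1)
off_diag = cov_z - np.diag(np.diag(cov_z))
print(np.max(np.abs(off_diag)))      # ~1e-15: Var[z] is diagonal
```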
◮ The individual elements of $z$ are mutually uncorrelated: in this sense, PCA disentangles the unknown factors of variation underlying the data
◮ While correlation is an important category of dependency between elements of the data, one is also interested in representations that disentangle more complicated forms of feature dependencies
[Figure: scatter plot of the original data in the $(x_1, x_2)$ plane next to the PCA-transformed data in the $(z_1, z_2)$ plane]
3. Extra: learning XOR³

◮ The first (simplest) example of "Deep Learning"
◮ The XOR function ("exclusive or"): for $x_1, x_2 \in \{0, 1\}$, $\mathrm{XOR}(x_1, x_2) = 1$ if exactly one of $x_1, x_2$ equals 1, and 0 otherwise
◮ Task: find a function $f^*$ that matches XOR on the four points $\mathbb{X} = \{[0,0]^T, [0,1]^T, [1,0]^T, [1,1]^T\}$
◮ Model: $\hat{y} = f(x; \theta)$
◮ Measure: MSE loss function
$$J(\theta) = \frac{1}{4}\sum_{x\in\mathbb{X}}\bigl(f^*(x) - f(x;\theta)\bigr)^2$$
³Section 6.1 of I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, 2016.
◮ Linear model: $f(x; w, b) = x^Tw + b$
◮ Solution of the minimization of the MSE loss function: $w = 0$, $b = \frac{1}{2}$ (see the check below)
◮ A linear model is not able to represent the XOR function: it outputs $\frac{1}{2}$ at all four points
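A quick check of this claim, as a sketch using numpy's least-squares solver on the four XOR points:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])      # XOR targets

A = np.hstack([X, np.ones((4, 1))])     # columns: x1, x2, bias
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(theta)                            # [0. 0. 0.5]: w = 0, b = 1/2
print(A @ theta)                        # 0.5 at every point, never 0 or 1
```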
◮ Two-layer model: $f(x; W, c, w, b) = w^T\max\{0,\,W^Tx + c\} + b$, where the ReLU $\max\{0,\cdot\}$ acts elementwise
◮ Then, by taking
$$W = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad c = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \quad w = \begin{bmatrix} 1 \\ -2 \end{bmatrix}, \quad b = 0,$$
the model reproduces XOR exactly on all four points, as verified in the sketch below
◮ Question: how to find $\theta^*$?
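A minimal sketch verifying that these hand-picked parameters reproduce XOR exactly:

```python
import numpy as np

W = np.array([[1.0, 1.0], [1.0, 1.0]])
c = np.array([0.0, -1.0])
w = np.array([1.0, -2.0])
b = 0.0

def f(x):
    """Two-layer ReLU model: f(x) = w^T max(0, W^T x + c) + b."""
    h = np.maximum(0.0, W.T @ x + c)    # hidden layer with ReLU
    return w @ h + b

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", f(np.array(x, dtype=float)))   # 0.0, 1.0, 1.0, 0.0
```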