
SLIDE 1

PCA: Principal Component Analysis

Iain Murray http://iainmurray.net/

SLIDE 2

PCA: Principal Component Analysis

[Figure: 2-D scatter with K = 1; '+' marks the data X, '·' marks the projections Xproj, and the line shows the principal direction V(:,1)]

Code assuming X is zero-mean:

% Find top K principal directions:
[V, E] = eig(X'*X);
[E, id] = sort(diag(E), 1, 'descend');
V = V(:, id(1:K));      % DxK
% Project to K-dims:
X_kdim = X*V;           % NxK
% Project back:
X_proj = X_kdim * V';   % NxD
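A quick usage sketch (not from the slides; the toy data below is a hypothetical stand-in):

% Toy run of the snippet above:
N = 100; D = 2; K = 1;
X = randn(N, D) * [2 0; 0 0.3];      % anisotropic 2-D cloud
X = bsxfun(@minus, X, mean(X, 1));   % zero-mean, as the snippet assumes
[V, E] = eig(X'*X);
[E, id] = sort(diag(E), 1, 'descend');
V = V(:, id(1:K));
X_proj = (X*V) * V';                 % points squashed onto the top direction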

SLIDE 3

PCA applied to bodies

[Figure: mean body µ and the first five principal components e1, . . . , e5, each shown varied from −4σ to +4σ]

Freifeld and Black, ECCV 2012

SLIDE 4

PCA applied to DNA

Novembre et al. (2008), doi:10.1038/nature07331. Both the individuals and the features were carefully selected.

1,387 individuals; 197,146 single nucleotide polymorphisms (SNPs).

Each person reduced to two(!) numbers with PCA.
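A minimal sketch of that reduction (the genotype matrix below is a random stand-in, not the Novembre et al. data):

% Hypothetical stand-in for an individuals-x-SNPs matrix:
N = 1387; D = 500;                  % the real D was 197,146
X = double(rand(N, D) < 0.1);       % fake binary SNP indicators
X = bsxfun(@minus, X, mean(X, 1));  % zero-mean
[U, S, V] = svd(X, 0);
coords = U(:, 1:2) * S(1:2, 1:2);   % two numbers per person
plot(coords(:, 1), coords(:, 2), '.');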

SLIDE 5

SLIDE 6

MSc course enrollment data

Binary S×C matrix M, with Msc = 1 if student s is taking course c. Each course is a length-S vector . . . OR each student is a length-C vector.
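A small sketch of the two views (the enrollment matrix here is random, not the real data):

% Hypothetical S x C binary enrollment matrix:
S = 200; C = 50;
M = double(rand(S, C) < 0.15);  % M(s,c) = 1 if student s takes course c
course_vec  = M(:, 1);          % one course as a length-S vector
student_vec = M(1, :)';         % one student as a length-C vector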

SLIDE 7

PCA applied to MSc courses

[Figure: MSc courses plotted on their first two principal components; each point labelled with its course code (ALE1, ADBS, ANLP, . . . , TDD, CPSLP, SProc)]

SLIDE 8

PCA applied to MSc students

[Figure: MSc students plotted on their first two principal components]

SLIDE 9

Truncated SVD

X ≈ U S V⊤

where X is N×D, U is N×K with orthonormal columns, S is K×K diagonal holding the top K singular values, and V is D×K with orthonormal columns.

% PCA via SVD, for zero-mean X:
[U, S, V] = svd(X, 0);    % economy-size SVD
U = U(:, 1:K);
S = S(1:K, 1:K);
V = V(:, 1:K);
X_kdim = U*S;             % NxK
X_proj = U*S*V';          % NxD
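As a sanity check (a sketch, not from the slides), the SVD route gives the same reconstruction as the eigendecomposition route from slide 2, up to numerical error (assuming distinct eigenvalues, so both pick out the same subspace):

% Compare SVD-based and eig-based PCA on toy zero-mean data:
X = randn(50, 5); K = 2;
X = bsxfun(@minus, X, mean(X, 1));
[U, S, V] = svd(X, 0);
Xp_svd = U(:, 1:K) * S(1:K, 1:K) * V(:, 1:K)';
[W, E] = eig(X'*X);
[E, id] = sort(diag(E), 1, 'descend');
W = W(:, id(1:K));
Xp_eig = (X*W) * W';
max(abs(Xp_svd(:) - Xp_eig(:)))   % should be around 1e-14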

SLIDE 10

PCA summary

Project data onto the major axes of its covariance.

X⊤X is proportional to the covariance matrix if the data are first made zero-mean.

Low-dimensional coordinates can be useful:

— visualization
— when other methods can't cope with high-dimensional data

Can project back into the original space:

— detail is lost: the reconstruction still lies in a K-dim subspace
— among projections onto a K-dim subspace, PCA minimizes the squared error
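The squared-error point can be checked numerically (a sketch with toy data): the reconstruction error falls as K grows, and the rank-K truncation is the best such approximation (Eckart-Young):

% Reconstruction error vs K on toy zero-mean data:
X = randn(100, 10) * diag(10:-1:1);   % features with decaying scales
X = bsxfun(@minus, X, mean(X, 1));
[U, S, V] = svd(X, 0);
for K = 1:5
    Xp = U(:, 1:K) * S(1:K, 1:K) * V(:, 1:K)';
    fprintf('K = %d, squared error = %.2f\n', K, norm(X - Xp, 'fro')^2);
end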

SLIDE 11

PPCA: Probabilistic PCA

Gaussian model: Σ = WW⊤ + σ²I, where W is D×K; small σ² ⇒ Σ is nearly low-rank.

The maximum-likelihood W also has orthogonal columns.

As σ² → 0, we recover PCA. Need σ² > 0 to give the D-dimensional data non-zero likelihood.

Special case of factor analysis: Σ = WW⊤ + Φ, with Φ diagonal.
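A sketch of sampling from the PPCA model (W, σ, and the sizes below are arbitrary): draw a K-dim latent z, map it through W, and add isotropic noise; the marginal covariance is then WW⊤ + σ²I.

% Sample N points from a PPCA model (hypothetical parameters):
D = 3; K = 2; N = 10000;
W = randn(D, K); sigma = 0.1;
Z = randn(N, K);                       % latent coordinates
X = Z * W' + sigma * randn(N, D);      % x = W z + noise
Sigma_model = W*W' + sigma^2 * eye(D);
Sigma_empir = X' * X / N;              % approaches Sigma_model as N grows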

SLIDE 12

Dim reduction in other models

Can replace x with Ax in any model. A is a K×D matrix of projection parameters. For large D, that is a lot of extra parameters.

NB: Neural nets already have such projections
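A sketch of the idea (here A is taken from PCA purely for illustration; it could instead be learned along with the rest of the model):

% Fit a linear model on K-dim projected inputs (toy data):
N = 200; D = 50; K = 5;
X = randn(N, D); y = randn(N, 1);
Xc = bsxfun(@minus, X, mean(X, 1));
[U, S, V] = svd(Xc, 0);
A = V(:, 1:K)';        % K x D projection matrix
Z = X * A';            % rows are (A x)' for each input x
w = Z \ y;             % least-squares fit on the projected inputs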

SLIDE 13

Practical tip

Scale features to have unit variance

Equivalently: find eigenvectors of the correlation matrix rather than the covariance matrix

Avoids issues with (arbitrary?) scaling.

If a feature is multiplied by 10⁹, the top principal component will point along that feature.

E.g., this happens if the unit of a feature is changed from metres to nanometres.
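A sketch of the tip (toy data with wildly different feature scales; the variables are placeholders):

% Standardize columns before PCA:
X = randn(100, 4) * diag([1 10 100 1000]);  % features on very different scales
X = bsxfun(@minus, X, mean(X, 1));          % zero mean
X = bsxfun(@rdivide, X, std(X, 0, 1));      % unit variance per feature
[V, E] = eig(X'*X / (size(X, 1) - 1));      % eigenvectors of the correlation matrix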