Principal Component Analysis: Proseminar Data Mining (Tobias Holl)



SLIDE 1

Introduction Theory Applications

Principal Component Analysis

Proseminar Data Mining

Tobias Holl (Technische Universität München)

2017-06-09

Tobias Holl Technische Universität München Principal Component Analysis

SLIDE 2

The Problem

SLIDE 3

The Problem

Data.

Lots of data.

SLIDE 4

An Example

Reference Energy Disaggregation Data Set [1]

▶ Power usage over >100 days for >200 devices
▶ Measured every 2 s
▶ Over 500 GB of compressed data

SLIDE 5

An Example

Iris Data Set [2]

▶ 150 flowers of 3 different species
▶ Petal and sepal widths and lengths

SLIDE 6

An Example

[Scatter-plot matrix of the iris variables: sepal length, sepal width, petal length, petal width]

SLIDE 7

An Example

[Scatter plot of petal length vs. petal width]

SLIDE 8

An Example

[Scatter plot of petal length vs. petal width]

SLIDE 9

An Example

[Scatter plot of petal length vs. petal width]

Clear correlation

SLIDE 10

An Example

[Scatter plot of petal length vs. petal width]

Unnecessary redundancy

SLIDE 11

Data Matrices

$$X = \begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{pmatrix} \in \mathbb{R}^{m \times n}$$

(rows: measurements $1, \dots, m$; columns: variables $1, \dots, n$)

SLIDE 12

Data Matrices

$$X = \begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{pmatrix} \in \mathbb{R}^{m \times n}$$

(rows: measurements $1, \dots, m$; columns: variables $1, \dots, n$)

Assume that $X$ is centered around 0.
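The slides stay at the formula level; as a quick illustration, the centering assumption takes one NumPy line (a minimal sketch with made-up numbers, assuming NumPy is available):

```python
import numpy as np

# A small data matrix X: m = 4 measurements, n = 2 variables.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

# Subtract each column's mean so every variable is centered around 0.
X_centered = X - X.mean(axis=0)

print(X_centered.mean(axis=0))  # prints [0. 0.]
```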

SLIDE 13

Some Statistics

$$\operatorname{cov}(x_a, x_b) = \frac{1}{m-1}\, x_a^T x_b$$

SLIDE 14

Some Statistics

$$\operatorname{cov}(x_a, x_b) = \frac{1}{m-1}\, x_a^T x_b$$

$\operatorname{cov}(x_a, x_b)$ is the covariance of $x_a$ and $x_b$.

SLIDE 15

Some Statistics

$$\operatorname{cov}(x_a, x_b) = \frac{1}{m-1}\, x_a^T x_b$$

$\operatorname{cov}(x_a, x_b)$ is the covariance of $x_a$ and $x_b$. Covariance describes the strength of the correlation.
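For centered column vectors, this formula is one line of NumPy (a sketch; the vectors are illustrative):

```python
import numpy as np

# Two centered variables with m = 5 measurements each.
xa = np.array([1.0, -1.0, 2.0, -2.0, 0.0])
xb = np.array([2.0, -2.0, 4.0, -4.0, 0.0])

m = len(xa)
cov_ab = (xa @ xb) / (m - 1)   # cov(xa, xb) = x_a^T x_b / (m - 1)

# Agrees with NumPy's built-in covariance.
print(cov_ab, np.cov(xa, xb)[0, 1])  # prints 5.0 5.0
```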

SLIDE 16

Some Statistics

$$\operatorname{cov}(X) = \begin{pmatrix} \operatorname{cov}(x_1, x_1) & \cdots & \operatorname{cov}(x_1, x_n) \\ \vdots & \ddots & \vdots \\ \operatorname{cov}(x_n, x_1) & \cdots & \operatorname{cov}(x_n, x_n) \end{pmatrix}$$

SLIDE 17

Some Statistics

$$\operatorname{cov}(X) = \begin{pmatrix} \operatorname{cov}(x_1, x_1) & \cdots & \operatorname{cov}(x_1, x_n) \\ \vdots & \ddots & \vdots \\ \operatorname{cov}(x_n, x_1) & \cdots & \operatorname{cov}(x_n, x_n) \end{pmatrix}$$

$$\operatorname{cov}(v, v) = \frac{1}{m-1}\, v^T v = \operatorname{var}(v)$$

SLIDE 18

Some Statistics

$$\operatorname{cov}(X) = \begin{pmatrix} \operatorname{var}(x_1) & \cdots & \operatorname{cov}(x_1, x_n) \\ \vdots & \ddots & \vdots \\ \operatorname{cov}(x_n, x_1) & \cdots & \operatorname{var}(x_n) \end{pmatrix}$$

SLIDE 19

Some Statistics

$$\operatorname{cov}(X) = \begin{pmatrix} \operatorname{var}(x_1) & \cdots & \operatorname{cov}(x_1, x_n) \\ \vdots & \ddots & \vdots \\ \operatorname{cov}(x_n, x_1) & \cdots & \operatorname{var}(x_n) \end{pmatrix} = \frac{1}{m-1}\, X^T X$$
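The whole covariance matrix comes out of one matrix product (a minimal NumPy sketch with synthetic data, checked against NumPy's `np.cov`):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 4
X = rng.normal(size=(m, n))
X -= X.mean(axis=0)            # center each variable around 0

cov_X = (X.T @ X) / (m - 1)    # cov(X) = X^T X / (m - 1)

# np.cov agrees (rowvar=False treats columns as variables).
print(np.allclose(cov_X, np.cov(X, rowvar=False)))  # prints True
```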

SLIDE 20

What We Really Want

Eliminate unnecessary redundancies

SLIDE 21

What We Really Want

Transform $X$ into $Y$ so that $\operatorname{cov}(y_a, y_b) = 0 \;\; \forall\, a \neq b$

SLIDE 22

What We Really Want

Transform $X$ linearly into $Y = XP$ so that $\operatorname{cov}(y_a, y_b) = 0 \;\; \forall\, a \neq b$

SLIDE 23

What We Really Want

Transform $X$ linearly into $Y = XP$ so that

$$\operatorname{cov}(Y) = \begin{pmatrix} \operatorname{var}(y_1) & & \\ & \ddots & \\ & & \operatorname{var}(y_n) \end{pmatrix}$$

SLIDE 24

Diagonalizing Matrices

Theorem

Every symmetric real matrix $A$ has an eigenvalue decomposition $A = VDV^T$, where $D$ is a diagonal matrix whose entries are the eigenvalues of $A$, and $V$ is orthonormal. The columns of $V$ are the eigenvectors corresponding to the matching entries of $D$. $D = V^T A V$ follows directly.

SLIDE 25

Diagonalizing Matrices

Theorem

Every symmetric real matrix $A$ has an eigenvalue decomposition $A = VDV^T$, where $D$ is a diagonal matrix whose entries are the eigenvalues of $A$, and $V$ is orthonormal. The columns of $V$ are the eigenvectors corresponding to the matching entries of $D$. $D = V^T A V$ follows directly.

We want $\operatorname{cov}(Y) = V^T \operatorname{cov}(X)\, V$.

SLIDE 26

Diagonalizing Matrices

Theorem

Every symmetric real matrix $A$ has an eigenvalue decomposition $A = VDV^T$, where $D$ is a diagonal matrix whose entries are the eigenvalues of $A$, and $V$ is orthonormal. The columns of $V$ are the eigenvectors corresponding to the matching entries of $D$. $D = V^T A V$ follows directly.

From $Y = XP$: $\operatorname{cov}(Y) = \frac{1}{m-1}(XP)^T(XP) = P^T \operatorname{cov}(X)\, P$

SLIDE 27

Diagonalizing Matrices

Theorem

Every symmetric real matrix $A$ has an eigenvalue decomposition $A = VDV^T$, where $D$ is a diagonal matrix whose entries are the eigenvalues of $A$, and $V$ is orthonormal. The columns of $V$ are the eigenvectors corresponding to the matching entries of $D$. $D = V^T A V$ follows directly.

$P = V$: the columns of $P$ are the (normalized) eigenvectors of $\operatorname{cov}(X)$, matching the eigenvalues on the diagonal of $\operatorname{cov}(Y)$.
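This diagonalization is a few lines of NumPy (a sketch with synthetic data; `np.linalg.eigh` is NumPy's eigensolver for symmetric matrices, and its columns are the orthonormal eigenvectors):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 200, 3
X = rng.normal(size=(m, n)) @ rng.normal(size=(n, n))  # correlated data
X -= X.mean(axis=0)

cov_X = (X.T @ X) / (m - 1)

# eigh handles symmetric matrices; columns of V are orthonormal eigenvectors.
eigenvalues, V = np.linalg.eigh(cov_X)

P = V                                 # take P = V, so cov(Y) = P^T cov(X) P = D
Y = X @ P

cov_Y = (Y.T @ Y) / (m - 1)
# Off-diagonal covariances vanish: cov(Y) is diagonal.
print(np.allclose(cov_Y, np.diag(eigenvalues)))  # prints True
```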

SLIDE 28

Diagonalizing Matrices

One more thing!

We could order the eigenvalues in cov(Y) any way we want.

SLIDE 29

Diagonalizing Matrices

Definition

The eigenvalues in cov(Y) are ordered by descending size.
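Sorting the eigenpairs by descending eigenvalue can be sketched as follows (the values are illustrative; note that the matching eigenvector columns must be reordered too):

```python
import numpy as np

# Suppose the eigensolver gave us eigenvalues and matching eigenvector columns.
eigenvalues = np.array([0.1, 2.5, 0.7])
V = np.eye(3)                          # placeholder eigenvector columns

order = np.argsort(eigenvalues)[::-1]  # indices by descending eigenvalue
eigenvalues = eigenvalues[order]
V = V[:, order]                        # reorder the matching columns too

print(eigenvalues)  # prints [2.5 0.7 0.1]
```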

SLIDE 30

Principal Component Analysis

Definition

The columns of $P$ are the principal components $p_i$ of $X$.

SLIDE 31

Throwing Away Data, Gracefully

Dimensionality reduction reduces the number of variables while losing as little information as possible. $p_1$ is the axis with the highest variance, $p_2$ the axis with the highest variance orthogonal to $p_1$, and so on; $p_n$ is the axis with the lowest variance.


SLIDE 33

Throwing Away Data, Gracefully

In $P$, remove the last $n - i$ columns to obtain $P_i$. Then, to reduce $X$ to $i$ dimensions: $X_i = X P_i P_i^T$

SLIDE 34

Throwing Away Data, Gracefully

In $P$, remove the last $n - i$ columns to obtain $P_i$. Then, to reduce $X$ to $i$ dimensions: $X_i = X P_i P_i^T$
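A minimal NumPy sketch of this reduction (synthetic data; the result $X_i = X P_i P_i^T$ is the rank-$i$ approximation of $X$ in the original coordinates):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, i = 100, 4, 2
X = rng.normal(size=(m, n)) @ rng.normal(size=(n, n))  # correlated data
X -= X.mean(axis=0)

# Eigendecomposition of the covariance, sorted by descending eigenvalue.
eigenvalues, V = np.linalg.eigh((X.T @ X) / (m - 1))
order = np.argsort(eigenvalues)[::-1]
P = V[:, order]

P_i = P[:, :i]             # keep only the first i principal components
X_i = X @ P_i @ P_i.T      # X_i = X P_i P_i^T: rank-i approximation of X

print(np.linalg.matrix_rank(X_i))  # prints 2 (= i)
```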

SLIDE 35

Throwing Away Data, Gracefully

[Scatter-plot matrix of the iris variables: sepal length, sepal width, petal length, petal width]

SLIDE 36

Throwing Away Data, Gracefully

[Scatter-plot matrix of the iris variables after reduction to i = 2 components]

SLIDE 37

Throwing Away Data, Gracefully

[Scatter-plot matrices of the iris variables: original data (i = n = 4) vs. reduction to i = 2]
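The reduction shown above can be reproduced in outline. This sketch uses synthetic stand-in data in place of the actual iris measurements (which the sketch does not bundle) and reports the fraction of total variance retained by the first i = 2 components:

```python
import numpy as np

# Stand-in for the 150x4 iris matrix (use the real measurements in practice).
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))
X -= X.mean(axis=0)

eigenvalues, V = np.linalg.eigh((X.T @ X) / (X.shape[0] - 1))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, P = eigenvalues[order], V[:, order]

# Fraction of total variance kept by the first i = 2 components.
explained = eigenvalues[:2].sum() / eigenvalues.sum()
print(f"variance retained with i = 2: {explained:.1%}")
```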

SLIDE 38

Linear Regression

[Scatter plot of petal length vs. petal width]

SLIDE 39

Eigenfaces [3, 4]

[Figure: the first eight original Eigenfaces from [3]]

SLIDE 40

Eigenfaces [3, 4]

▶ We can treat pictures of faces as vectors of pixels.
▶ If we have many pictures, we can turn them into a matrix.
▶ Now, apply PCA to this matrix to obtain the principal components.
▶ These principal components are the Eigenfaces [3].
▶ They represent basic features of human faces.

SLIDE 41

Eigenfaces [3, 4]

▶ We can treat pictures of faces as vectors of pixels.
▶ If we have many pictures, we can turn them into a matrix.
▶ Now, apply PCA to this matrix to obtain the principal components.
▶ These principal components are the Eigenfaces [3].
▶ They represent basic features of human faces.

SLIDE 42

Eigenfaces [3, 4]

▶ We can treat pictures of faces as vectors of pixels.
▶ If we have many pictures, we can turn them into a matrix.
▶ Now, apply PCA to this matrix to obtain the principal components.
▶ These principal components are the Eigenfaces [3].
▶ They represent basic features of human faces.

SLIDE 43

Eigenfaces [3, 4]

▶ We can treat pictures of faces as vectors of pixels.
▶ If we have many pictures, we can turn them into a matrix.
▶ Now, apply PCA to this matrix to obtain the principal components.
▶ These principal components are the Eigenfaces [3].
▶ They represent basic features of human faces.

SLIDE 44

Eigenfaces [3, 4]

▶ We can treat pictures of faces as vectors of pixels.
▶ If we have many pictures, we can turn them into a matrix.
▶ Now, apply PCA to this matrix to obtain the principal components.
▶ These principal components are the Eigenfaces [3].
▶ They represent basic features of human faces.
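The pipeline above can be sketched with random stand-in images (real Eigenfaces use photographs; the small-Gram-matrix shortcut, diagonalizing the m×m matrix instead of the pixel×pixel covariance, is the one used in [3, 4]):

```python
import numpy as np

# Stand-in data: 20 random "faces" of 32x32 pixels, flattened to 1024-vectors.
rng = np.random.default_rng(4)
n_faces, n_pixels = 20, 32 * 32
faces = rng.normal(size=(n_faces, n_pixels))

# Center around the mean face.
centered = faces - faces.mean(axis=0)

# With far fewer faces than pixels, diagonalize the small 20x20 Gram matrix
# instead of the 1024x1024 covariance: eigenvectors u of C C^T map to
# eigenvectors C^T u of C^T C.
gram = centered @ centered.T / (n_faces - 1)
eigenvalues, U = np.linalg.eigh(gram)
order = np.argsort(eigenvalues)[::-1][:8]   # keep the first eight

eigenfaces = centered.T @ U[:, order]       # back to pixel space, one per column
eigenfaces /= np.linalg.norm(eigenfaces, axis=0)

# Each face is summarized by its weights on the eigenfaces.
weights = centered @ eigenfaces
print(weights.shape)  # prints (20, 8)
```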

SLIDE 45

Eigenfaces [3, 4]

▶ Every (reasonably similar) picture of a face can be represented through weights assigned to those features.
▶ This means we can use Eigenfaces to detect and recognize faces [4].

SLIDE 46

Eigenfaces [3, 4]

▶ Every (reasonably similar) picture of a face can be represented through weights assigned to those features.
▶ This means we can use Eigenfaces to detect and recognize faces [4].

SLIDE 47

Q&A

SLIDE 48

References I

[1] J. Z. Kolter and M. J. Johnson, "REDD: A public data set for energy disaggregation research," in SustKDD 2011, 2011. [Online]. Available: http://redd.csail.mit.edu/kolter-kddsust11.pdf

[2] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, no. 2, pp. 179–188, Sep. 1936.

[3] L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of human faces," Journal of the Optical Society of America A, vol. 4, no. 3, p. 519, Mar. 1987.

SLIDE 49

References II

[4] M. Turk and A. Pentland, "Face recognition using eigenfaces," in Proceedings of the 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jun. 1991, pp. 586–591.
