APPLIED MACHINE LEARNING
Methods for Reduction of Dimensionality through Linear Projection: Principal Component Analysis (PCA)


SLIDE 1

Methods for Reduction of Dimensionality through Linear Projection: Principal Component Analysis (PCA)

SLIDE 2

Curse of Dimensionality

N: number of dimensions. Computational costs grow with N: a linear increase, $O(N)$, is much preferred over a polynomial or exponential one, e.g. $O(N^2)$.

SLIDE 3

Curse of Dimensionality

N: number of dimensions. Several methods for classification and regression have computational costs that grow exponentially with the dimension N of the data. When the increase is exponential or polynomial, e.g. $O(N^2)$ rather than $O(N)$, reduce the dimensionality of the data prior to further processing.

SLIDE 4

Principal Component Analysis (PCA)

PCA is a method to reduce the dimensionality of a dataset. It does so by projecting the dataset onto a lower-dimensional space.

SLIDE 5

Examples: PCA – dimensionality reduction

Record human motion when writing the letters A, B and C. The joint-angle trajectories $x_1(t), x_2(t), x_3(t), x_4(t)$ convey redundant information → reduce the information with PCA.

[Figure: the four joint-angle trajectories plotted over time]

SLIDE 6

Examples: PCA – dimensionality reduction

4-dimensional state space $x = (x_1, x_2, x_3, x_4)^T$: project onto a 2-dimensional space through a matrix $A \in \mathbb{R}^{2 \times 4}$:

$y = A x, \quad y = (y_1, y_2)^T$

[Figure: the four original trajectories $x_1, \dots, x_4$ and the two projected trajectories $y_1, y_2$ over time]

SLIDE 7

Examples: PCA – dimensionality reduction

Rotate the trajectories $y_1, y_2$ onto the plane where the robot writes. Use inverse kinematics to drive the robot's motion.

SLIDE 8

PCA: Exercise 1.1: Reducing the dimensionality of a dataset

Can you find a way to reduce the amount of information needed to store the coordinates of these 4 datapoints?

Dataset $X = \begin{pmatrix} 1 & -2 & -3 & 1.5 \\ 2 & -4 & -6 & 3 \end{pmatrix}$

[Figure: the four datapoints plotted in the plane]
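The structure the exercise is after can be checked numerically: the second row of $X$ is exactly twice the first, so one row plus a single scalar suffices. A minimal numpy sketch (variable names are my own):

```python
import numpy as np

# The 4 datapoints of Exercise 1.1, one per column.
X = np.array([[1.0, -2.0, -3.0, 1.5],
              [2.0, -4.0, -6.0, 3.0]])

# Every point satisfies x2 = 2 * x1: the data lie on a line through the origin.
assert np.allclose(X[1], 2 * X[0])

# Store 4 numbers plus one slope instead of 8 numbers.
coords, slope = X[0], 2.0
X_rec = np.vstack([coords, slope * coords])
assert np.allclose(X, X_rec)   # lossless reconstruction
```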

SLIDE 9

PCA: Exercise 1.2: Reducing the dimensionality of a dataset

If you use the same solution as before, how much error do you get?

Dataset $X = \begin{pmatrix} 1.1 & -2 & -2.9 & 1.5 \\ 2 & -4 & -6.2 & 3 \end{pmatrix}$

[Figure: the four (slightly perturbed) datapoints plotted in the plane]

SLIDE 10

PCA: Exercise 1.3: Reducing the dimensionality of a dataset

If we project each datapoint onto a one-dimensional space spanned by a vector $a$, i.e. $x \mapsto (a^T x)\, a$, which of

$a = [0\ 1]^T, \quad a = [1\ 0]^T, \quad a = [1\ 2]^T$

minimizes the reconstruction error among these?

Dataset $X = \begin{pmatrix} 1.1 & -2 & -2.9 & 1.5 \\ 2 & -4 & -6.2 & 3 \end{pmatrix}$
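The three candidates can be compared directly. A short numpy sketch; I normalize each candidate to unit length, which I assume is intended since $[1\ 2]^T$ is not unit-norm:

```python
import numpy as np

# Dataset of Exercises 1.2/1.3, one datapoint per column.
X = np.array([[1.1, -2.0, -2.9, 1.5],
              [2.0, -4.0, -6.2, 3.0]])

for a in ([0, 1], [1, 0], [1, 2]):
    e = np.asarray(a, float)
    e /= np.linalg.norm(e)            # unit projection vector
    X_hat = np.outer(e, e @ X)        # reconstruction (e^T x) e for each point
    print(a, "error =", round(np.linalg.norm(X - X_hat), 3))

# a = [1, 2] wins by a wide margin: the points lie almost on the line x2 = 2*x1.
```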

SLIDE 11

PCA: Reduction of dimensionality

What is the 2D-to-1D projection that minimizes the reconstruction error? There is an infinite number of choices for projecting the data → we need criteria to reduce the choice.

Criterion 1: minimum information loss (minimal reconstruction error).

[Figure: 2D datapoints in the $(x_1, x_2)$ plane with a candidate projection direction]

SLIDE 12

PCA: Reduction of dimensionality

What is the 2D-to-1D projection that minimizes the reconstruction error? There is an infinite number of choices for projecting the data → we need criteria to reduce the choice.

Criterion 1: minimum information loss (minimal reconstruction error).
Criterion 2: equivalent to finding the direction with maximum variance.

[Figure: reconstruction after projection onto two candidate directions: one loses only the smallest breadth of the data, the other conserves the largest breadth]

SLIDE 13

Principal Component Analysis

PCA is a method to reduce the dimensionality of a dataset. It is used as:

  • a pre-processing method before classification, to reduce the computational costs of the classifier;
  • a compression method, for ease of data storage and retrieval;
  • a feature-extraction method.

SLIDE 14

Principal Component Analysis

(Same summary of the uses of PCA as on Slide 13.)

SLIDE 15

Examples: PCA – preprocessing for classification

Dataset with samples of two classes (red and green class). Each image is a high-dimensional vector $x$: $320 \times 240$ pixels $\times\ 3$ color channels $= 230400$ dimensions.

SLIDE 16

Examples: PCA – preprocessing for classification

Project the images onto a lower-dimensional space through a matrix $A \in \mathbb{R}^{2 \times 230400}$: $y = A x$, $y \in \mathbb{R}^2$.

[Figure: the projected points in 2D with a separating line between the two classes]

SLIDE 17

PCA: Exercise 2

Linear classification:

  • Separate the two groups of datapoints with a line.
  • Which projections make separation unfeasible?

Dataset $X = \begin{pmatrix} 1 & -2 & -3 & 1.5 \\ 2 & -4 & -6 & 3 \end{pmatrix}$

PCA does not seek projections that make the data more separable! However, among the projections, some may make the data more separable.

[Figure: original data and one candidate projection]

SLIDE 18

Principal Component Analysis

(Same summary of the uses of PCA as on Slide 13.)

SLIDE 19

Examples: PCA for Feature Extraction

Extract features and reduce dimensionality: 50 principal components extracted from a set of 100 faces, originally coded in a high-dimensional pixel space (e.g. 54150 dimensions). Shown: the first 4 projections (principal components).

Hancock, P. et al. (1996). Face processing: human perception and principal components analysis. Memory and Cognition 24(1), pp. 26–40.

SLIDE 20

Examples: PCA for Feature Extraction

Examples of six facial expressions (happy, sad, anger, fear, disgust and surprise) in their original format (full image, top row) and morphed to an average face shape (shape-free, bottom row).

Calder et al. (2001). A principal component analysis of facial expressions. Vision Research 41(9), pp. 1179–1208.

SLIDE 21

Examples: PCA for Feature Extraction

The first eight eigenfaces abstracted from a PCA of facial expressions.

Calder et al. (2001). A principal component analysis of facial expressions. Vision Research 41(9), pp. 1179–1208.

SLIDE 22

PCA: Exercise 3

To extract features common among groups and subgroups of images: which projection would extract one common feature across the 3 images?

$x_1 = (12,\ 2,\ 14,\ 5,\ 6,\ \dots)^T, \quad x_2 = (24,\ 4,\ 28,\ 6,\ 10,\ \dots)^T, \quad x_3 = (6,\ 1,\ 7,\ 8,\ 9,\ \dots)^T$
slide-23
SLIDE 23

APPLIED MACHINE LEARNING

1 2 3

12 24 6 2 4 1 14 28 7 5 6 8 6 10 9 2 3 5 2 3 10 3 1 2 3 1 x x x                                                                                       

1

x

2

x

3

x

The first 3 dimensions can be compressed into 1.

23

To extract features common among groups and subgroups of images. Which projection would extract some common feature across the 3 images?

PCA: Exercise 3

SLIDE 24

PCA ~ Feature Extraction

To extract features common among groups and subgroups of images: a first step toward clustering and classification. PCA is not classification!

$a = (6,\ 1,\ 7,\ \dots)^T$

This projection embeds a feature: the regions of the eyes are darker than the middle region.

[Figure: the three images $x_1, x_2, x_3$ and the projection vector $a$ rendered as an image]

SLIDE 25

Constructing a projection

Formally, a projection can be described as follows. Let $X = [x^1 \dots x^M]$ be a set of $M$ $N$-dimensional datapoints, $x^i \in \mathbb{R}^N$, $i = 1 \dots M$. A projection of $X$ through a linear map $p: \mathbb{R}^N \to \mathbb{R}^{p}$, $p \le N$, with matrix $A \in \mathbb{R}^{p \times N}$, is given by:

$y = A x, \quad Y = A X \in \mathbb{R}^{p \times M}$

SLIDE 26

Constructing a projection: Exercise 4

Example: 2-dimensional projection through a matrix $A$.

Original data $X$ (the 8 corners of a cube of side 15):

$X = \begin{pmatrix} 0 & 15 & 0 & 15 & 0 & 15 & 0 & 15 \\ 0 & 0 & 15 & 15 & 0 & 0 & 15 & 15 \\ 0 & 0 & 0 & 0 & 15 & 15 & 15 & 15 \end{pmatrix}$

Find a matrix $A$ which groups the points into 4 groups.

SLIDE 27

Constructing a projection: Exercise 4

$A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad Y = A X = \begin{pmatrix} 0 & 0 & 15 & 15 & 0 & 0 & 15 & 15 \\ 0 & 15 & 0 & 15 & 0 & 15 & 0 & 15 \end{pmatrix}$

The data are well grouped into 4 tiny clusters: the projection discards the third coordinate, so each of the 4 projected points is hit by two corners of the cube.
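A quick numpy check of the exercise (names are my own):

```python
import numpy as np

# Cube corners of Exercise 4, one point per column.
X = np.array([[0, 15,  0, 15,  0, 15,  0, 15],
              [0,  0, 15, 15,  0,  0, 15, 15],
              [0,  0,  0,  0, 15, 15, 15, 15]])

# Rows of A are orthonormal: drop the third coordinate, swap the first two.
A = np.array([[0, 1, 0],
              [1, 0, 0]])

Y = A @ X
# The 8 corners collapse onto 4 distinct 2D points, each hit twice.
print(np.unique(Y.T, axis=0))   # (0,0), (0,15), (15,0), (15,15)
```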

SLIDE 28

Constructing a projection

Example: 2-dimensional projection through a matrix $A$ (the $X$ and $A$ of Exercise 4).

The rows of $A$ are composed of two orthonormal vectors $a_1, a_2$. The product of each $a_j$ with each datapoint $x^i$ corresponds to the $j$-th coordinate of the image $y^i$ of the point $x^i$ in the projected space.

SLIDE 29

Constructing a projection

The columns of $A$ represent the images (in the projected space) of the axes of the original space ($\mathbb{R}^3 \to \mathbb{R}^2$); in the example, the first two columns of $A$ form a basis of $\mathbb{R}^2$.

[Figure: projections 1 & 2 of the cube data]

SLIDE 30

Constructing a projection

Uni-dimensional projection from 2-dimensional vectors: for a unit vector $a$ ($a^T a = 1$), the projection of $x$ onto $a$ is

$y = a^T x = \lVert x \rVert \cos\theta$

where $\theta$ is the angle between $a$ and $x$.

SLIDE 31

Constructing a projection

Asking that the projection vectors be orthogonal makes the computation of the projection easier. The projection of $y$ onto the plane formed by $a_1, a_2$, with $a_1^T a_2 = 0$, is given by:

$x = \dfrac{a_1^T y}{\lVert a_1 \rVert^2}\, a_1 + \dfrac{a_2^T y}{\lVert a_2 \rVert^2}\, a_2$

with $a_1^T y / \lVert a_1 \rVert^2$ the coordinate of $y$ onto $a_1$ and $a_2^T y / \lVert a_2 \rVert^2$ the coordinate of $y$ onto $a_2$.

SLIDE 32

Constructing a projection

Normalize the projection vectors: $e_i = a_i / \lVert a_i \rVert$, $i = 1, 2$. The projection formula of the previous slide then simplifies, since the denominators $\lVert e_i \rVert^2$ equal 1.

SLIDE 33

Constructing a projection

Asking that the projection vectors be orthonormal makes the computation of the projection easier. The projection of $y$ onto the plane formed by $e_1, e_2$, with $e_1^T e_2 = 0$, is given by:

$x = (e_1^T y)\, e_1 + (e_2^T y)\, e_2$

with $e_1^T y$ the coordinate of $y$ onto $e_1$ and $e_2^T y$ the coordinate of $y$ onto $e_2$.
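With an orthonormal basis each coordinate is a plain dot product, which is what makes the computation easy. A minimal sketch (the vectors are arbitrary illustrative values):

```python
import numpy as np

# Two orthonormal vectors spanning a plane in R^3.
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)

y = np.array([3.0, 2.0, 4.0])

# Coordinates of y in the plane are dot products; no matrix inversion needed.
c1, c2 = e1 @ y, e2 @ y
x = c1 * e1 + c2 * e2      # projection of y onto the plane
print(c1, c2, x)
```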

SLIDE 34

Interpreting a projection I

The coordinates of the eigenvectors in the original space provide information regarding the contribution of each original dimension. With $e_1 = (e_{11}, e_{12}, \dots, e_{1N})^T$:

$e_1^T x = e_{11} x_1 + e_{12} x_2 + \dots$

i.e. $e_{11}$ weights the contribution of the original dimension $x_1$, $e_{12}$ that of $x_2$, and so on. Example: an eigenface, whose pixel values are the coordinates of the eigenvector in the original image space.

SLIDE 35

Interpreting a projection II

The projection $Y$ of the dataset $X$ onto the plane formed by $e_1, e_2$, with $e_1^T e_2 = 0$, is given by the coordinates $e_1^T X$ and $e_2^T X$. The breadth covered by the coordinates of the projected data $Y$ onto each eigenvector represents the relative amount of the variance of $X$ explained by that eigenvector (see the following slides for the derivation): the amount of spread of $Y$ onto $e_1$ versus onto $e_2$.

SLIDE 36

PCA for Data Compression

PCA can be used with a single datapoint (a single image) to reduce the number of dimensions required to represent this datapoint. This is very useful in the processing of high-dimensional images.

[Figure: original image vs. compressed image]

SLIDE 37

PCA for Data Compression

Each image is encoded in $x \in \mathbb{R}^N$.

  1. Compute $y = A x$, but ask $A \in \mathbb{R}^{N \times N}$ (keep all $N$ projections)!
  2. Project the image: $y_i = e_i^T x$, $i = 1 \dots N$.

The larger the projection $|e_i^T x|$, the more features of the data are encapsulated in the projection $e_i$. Low values correspond mostly to noise and can be discarded.

SLIDE 38

PCA for Data Compression

Each image is encoded in $x \in \mathbb{R}^N$.

  1. Compute $y = A x$ with $A \in \mathbb{R}^{N \times N}$, $y_i = e_i^T x$.
  2. Remove the rows of $A$ with the smallest projections $|e_i^T x|$: $y^p = (e_i^T x)_{i = 1 \dots p}$, $p < N$.

The smaller $p$, the more compression.

[Figure: original image vs. compressed image]
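A sketch of the recipe above. Since the slides do not fix a dataset, a random orthonormal basis stands in for the eigenvectors of the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 10

# Stand-in for the N eigenvectors, one per row of A (orthonormal).
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
A = Q.T

x = rng.standard_normal(N)              # the "image"
y = A @ x                               # step 1: all N projections e_i^T x
keep = np.argsort(np.abs(y))[-p:]       # step 2: keep the p largest |e_i^T x|

x_hat = A[keep].T @ y[keep]             # reconstruction from p coefficients
rel = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"stored {p}/{N} coefficients, relative error {rel:.2f}")
```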

SLIDE 39

PCA for Data Compression: Exercise 5

The original image is encoded in $x \in \mathbb{R}^N$. The compressed image is $y^p = A^p x$, with $p = 0.1\,N$; $A^p$ contains $p$ lines of $A$.

What is the compression gain?

SLIDE 40

PCA for Data Compression: Exercise 5

The original image is encoded in $x \in \mathbb{R}^N$. The compressed image is $y^p = A^p x$, with $p = 0.1\,N$; $A^p$ contains $p$ lines of $A$. Storing $p = 0.1\,N$ coefficients instead of $N$ pixel values means the image is compressed by 90%.

[Figure: original image vs. image compressed by 90%]

SLIDE 41

PCA via reconstruction through error minimization

The original image is encoded in $x \in \mathbb{R}^N$; the compressed image is $y^p = A^p x$, $p = 0.1\,N$, where $A^p$ contains the first $p$ lines of $A$.

Step 1: least-squares approximation for the reconstruction, $y^* : \mathbb{R}^p \to \mathbb{R}^N$. Find $A$ such that:

$A^* = \arg\min_A \lVert y^* - x \rVert$

$A^*$ ensures minimal reconstruction error → keeps the statistics → minimal loss of information. Request that all projection vectors be orthonormal:

$\lVert e_i \rVert = 1\ \forall i, \quad A^T = [e_1\ e_2\ \dots], \quad e_i^T e_j = 0,\ i \neq j$

SLIDE 42

PCA via reconstruction through error minimization

Step 1: least-squares approximation for the reconstruction, $y^* : \mathbb{R}^p \to \mathbb{R}^N$:

$y^* = \sum_{i=1}^{p} (e_i^T x)\, e_i$

Since the $e_i$ form an orthonormal basis of $\mathbb{R}^N$, we have $x = \sum_{i=1}^{N} (e_i^T x)\, e_i$, so the reconstruction error is exactly the discarded part:

$\min_A \lVert y^* - x \rVert \;\Leftrightarrow\; \min_{e_{p+1}, \dots, e_N} \Big\lVert \sum_{i=p+1}^{N} (e_i^T x)\, e_i \Big\rVert^2$

Ask that all projections be orthogonal, $e_i^T e_j = 0$ for $i \neq j$: the cross terms vanish and

$\min_{e_{p+1}, \dots, e_N} \sum_{i=p+1}^{N} (e_i^T x)^2 = \min_{e_{p+1}, \dots, e_N} \sum_{i=p+1}^{N} e_i^T (x\, x^T)\, e_i$

SLIDE 43

PCA via reconstruction through error minimization

Generalize $\min_A \lVert y^* - x \rVert$ to minimizing the reconstruction error for a set of $M$ datapoints:

$\min_{e_{p+1}, \dots, e_N} \frac{1}{M} \sum_{j=1}^{M} \sum_{i=p+1}^{N} (e_i^T x_j)^2 = \min_{e_{p+1}, \dots, e_N} \sum_{i=p+1}^{N} e_i^T \Big( \frac{1}{M} \sum_{j=1}^{M} x_j x_j^T \Big) e_i$

Covariance matrix for zero-mean data: $C = \frac{1}{M} X X^T$

SLIDE 44

PCA via reconstruction through error minimization

Center the data first: $x \leftarrow x - E[x]$. The covariance matrix for zero-mean data is $C = \frac{1}{M} X X^T$.

[Figure: the same 2D dataset in the $(x_1, x_2)$ plane before and after subtracting the mean $E[x]$]

   

slide-45
SLIDE 45

APPLIED MACHINE LEARNING

 

     

   

1 1 1 1 1 1

,..., ,...,

1 1

min min

N M N M T T T T T i j i i j i i j j i i p j i p j p p N N

e e e e

e x e e x e e x x e M M

       

   

1

x

2

x

       

1 2 1 1 2 2

var cov , cov , var x x x C x x x         

   

 

 

 

2 2

var

i i i

x E x E x  

     

 

 

 

1 2 1 2 1 2

cov , x x E x x E x E x   

1

T

C XX M 

Covariance Matrix for zero-mean data

 

E x 

 

,...,

min

N T i i i p p N

e e

e Ce

PCA via reconstruction through error minimization

45
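To make the notation concrete, here is how the covariance matrix of the slides is computed for a toy correlated 2D dataset (all values illustrative):

```python
import numpy as np

# C = (1/M) X X^T after subtracting the mean, one datapoint per column.
rng = np.random.default_rng(1)
M = 500
x1 = rng.standard_normal(M)
X = np.vstack([x1, 2 * x1 + 0.3 * rng.standard_normal(M)])  # correlated dims

X = X - X.mean(axis=1, keepdims=True)   # zero-mean data
C = (X @ X.T) / M
print(C)   # [[var(x1), cov(x1,x2)], [cov(x2,x1), var(x2)]]
```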

SLIDE 46

PCA via reconstruction through error minimization

If $x_1, x_2$ are uncorrelated, $\mathrm{cov}(x_1, x_2) = E[x_1 x_2] - E[x_1] E[x_2] = 0$ and the covariance matrix is diagonal:

$C = \begin{pmatrix} \mathrm{var}(x_1) & 0 \\ 0 & \mathrm{var}(x_2) \end{pmatrix}$

SLIDE 47

PCA via reconstruction through error minimization

Each image is encoded in $x \in \mathbb{R}^N$; the compressed image is $y^p = A x$, with reconstruction $\sum_{i=1}^{p} (e_i^T x)\, e_i$. $A$ ensures minimal reconstruction error → keeps the statistics → minimal loss of information, and requests that all projection vectors be orthonormal. This yields, for each projection $e_j$:

$\min_{e_j} e_j^T C\, e_j \quad \text{subject to} \quad \lVert e_i \rVert = 1\ \forall i, \quad e_i^T e_j = 0,\ i \neq j$

Optimization with constraints: a convex objective function under equality constraints → Lagrange multipliers.

SLIDE 48

PCA via reconstruction through error minimization

Constraint-based optimization, solving for the first eigenvector. Form the Lagrangian:

$L(e_1, \lambda_1) = e_1^T C e_1 - \lambda_1 (e_1^T e_1 - 1)$

At the minimum of the Lagrangian:

$\frac{\partial L}{\partial e_1} = 2 C e_1 - 2 \lambda_1 e_1 = 0 \;\Rightarrow\; C e_1 = \lambda_1 e_1$

The solution is an eigenvector of the covariance matrix $C$! All eigenvectors of the (symmetric) matrix $C$ are orthonormal → the $p$ projections are $p$ eigenvectors of $C$.

SLIDE 49

Eigenvalue Decomposition of the Correlation Matrix

The correlation matrix $C$ is real, square ($N \times N$) and symmetric. It can hence be decomposed into a set of $N$ real eigenvectors $e_i$ with associated real eigenvalues $\lambda_i$, $i = 1 \dots N$, such that:

$C e_i = \lambda_i e_i$

The eigenvalues are calculated by solving the characteristic equation:

$\det(C - \lambda I) = 0$

SLIDE 50

Eigenvalue Decomposition of the Correlation Matrix

The eigenvalue decomposition yields a product of 3 matrices:

$C = V \Lambda V^T$

$V = [e_1 \dots e_N]$: matrix of the eigenvectors;
$\Lambda$: diagonal matrix composed of the eigenvalues $\lambda_1, \dots, \lambda_N$.

SLIDE 51

Eigenvalue Decomposition of the Correlation Matrix

Compute the covariance: $C = E[X X^T] = \frac{1}{M} X X^T$. It is not diagonal!

Eigenvalue decomposition: $C = V \Lambda V^T$, $V = [e_1\ e_2]$.

Project onto the eigenvectors: $Y = A X$, $A = V^T = [e_1\ e_2]^T$.

Compute the covariance of the projection: $C_Y = \frac{1}{M} Y Y^T = \Lambda$. It is diagonal → the projections are uncorrelated!
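The whole chain of this slide fits in a few lines of numpy; the dataset is a toy stand-in:

```python
import numpy as np

# Correlated 2D data, one point per column, centered.
rng = np.random.default_rng(2)
x1 = rng.standard_normal(1000)
X = np.vstack([x1, 0.8 * x1 + 0.5 * rng.standard_normal(1000)])
X = X - X.mean(axis=1, keepdims=True)

C = (X @ X.T) / X.shape[1]      # covariance: not diagonal
lam, V = np.linalg.eigh(C)      # C = V diag(lam) V^T, columns of V orthonormal

Y = V.T @ X                     # project onto the eigenvectors: A = V^T
C_Y = (Y @ Y.T) / Y.shape[1]
print(np.round(C_Y, 6))         # diagonal, with the eigenvalues on the diagonal
```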

SLIDE 52

Eigenvalue Decomposition of the Correlation Matrix

The eigenvalues give a measure of the variance of the distribution of $X$ on each projection. Each projection $i$ of $X$ onto $e_i$ is given by $Y^i = e_i^T X$. Percentage of (the variance of) the dataset covered by each projection:

$\dfrac{\lambda_i}{\sum_j \lambda_j}$

Observe: $(e_i^T X)(e_i^T X)^T = e_i^T X X^T e_i = M \lambda_i e_i^T e_i = M \lambda_i$, using $e_i^T e_i = 1$ and $e_i^T e_j = 0$ for $i \neq j$.

SLIDE 53

Eigenvalue Decomposition of the Correlation Matrix

The eigenvalues give a measure of the variance of the distribution of $X$ on each projection:

$\mathrm{var}(e_1^T X) = \frac{1}{M}\, e_1^T X X^T e_1 = \lambda_1$

[Figure: 2D data with the eigenvectors $e_1, e_2$]

SLIDE 54

PCA: Maximize Variance

There is an infinite number of choices for $A$ → we need criteria to reduce the choice.

Criterion 1: minimum loss from a statistical viewpoint (minimal reconstruction error).
Criterion 2: equivalent to finding the direction with maximum variance:

$\arg\max_{e_j} e_j^T C\, e_j \quad \text{under the constraint} \quad \lVert e_j \rVert = 1, \quad j = 1 \dots p \;\Rightarrow\; C e_j = \lambda_j e_j$

The solution is also an eigenvector of the covariance matrix. The eigenvalue determines the "importance" of the projection.

[Figure: 2D data with the first eigenvector $e_1$ along the direction of maximum variance]

SLIDE 55

Eigenvalues in MLDEMOS

[Figure: screenshot of the eigenvalue display in MLDEMOS]

SLIDE 56

Examples: PCA for Data Compression

First 4 principal components.

[Figure]

SLIDE 57

Examples: PCA for Data Compression

First 4 principal components.

[Figure]

SLIDE 58

Examples: PCA for Data Compression

[Figure]

SLIDE 59

Examples: PCA for Data Compression

[Figure]

SLIDE 60

Examples: PCA for Data Compression

[Figure]

SLIDE 61

PCA: Summary of the Algorithm

Goal:
1. Identify a suitable representation of a multivariate data set by decorrelating the dataset.
2. Reduce the dimensionality of the multidimensional data set.

Method: construct an ordered orthogonal basis $e_1, \dots, e_N$ with $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_N$, the first eigenvector having the direction of largest variance of the data. Choose the $p \ll N$ major eigenvalues → reduce the dimensionality → the projection space contains only the major directions of variation of the data → the noise encapsulated in the lower dimensions of the dataset is reduced.

SLIDE 62

Summary: PCA through Eigenvalue Decomposition

Algorithm:
1) Subtract the mean: $x \leftarrow x - E[X]$.
2) Compute the covariance matrix: $C = E[X X^T]$.
3) Compute the eigenvalues by solving $\det(C - \lambda I) = 0$.
4) Compute the eigenvectors using $C e_i = \lambda_i e_i$.
5) Choose the first $p$ eigenvectors: $e_1, \dots, e_p$, $p \leq N$.
6) Project the data onto the new basis: $Y = A X$, with $A = [e_1 \dots e_p]^T \in \mathbb{R}^{p \times N}$.
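The six steps translate directly into numpy; a minimal sketch (function and variable names are my own):

```python
import numpy as np

def pca(X, p):
    """PCA following the 6-step summary. X: N x M, one datapoint per column.
    Returns the p x M projection Y and the basis A (p x N)."""
    Xc = X - X.mean(axis=1, keepdims=True)     # 1) subtract the mean
    C = (Xc @ Xc.T) / X.shape[1]               # 2) covariance matrix
    lam, V = np.linalg.eigh(C)                 # 3-4) eigenvalues / eigenvectors
    order = np.argsort(lam)[::-1][:p]          # 5) first p, by decreasing lambda
    A = V[:, order].T
    return A @ Xc, A                           # 6) project onto the new basis

# Toy usage: 3D data varying mostly along one direction, reduced to 1D.
rng = np.random.default_rng(3)
t = rng.standard_normal(200)
X = np.vstack([t, 2 * t, -t]) + 0.05 * rng.standard_normal((3, 200))
Y, A = pca(X, p=1)
print(A)   # ~ the dominant direction (1, 2, -1)/sqrt(6), up to sign
```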

SLIDE 63

Principal Component Analysis: Properties of the Projections

1. Determine the direction (vector) along which the variance of the data is maximal.
2. Determine an orthonormal basis.
3. The projections of the data onto the different axes are uncorrelated.
4. PCA gives an optimal (in the mean-square sense) linear reduction of the dimensionality.

SLIDE 64

Practical I is next week, Friday September 30 (the room will be announced soon). Practice sessions are done in teams of three: register your team on Doodle and check the webpage of the class.

SLIDE 65

Goal of Practical I

Teach you how best to exploit PCA for:
a) reduction of dimensionality;
b) as a first pre-processing step before classification (PCA DOES NOT DO CLASSIFICATION!) → finding projections that enable the classes to be separated well.

The practical should allow you to:
  • understand the sensitivity of PCA to the choice of training dataset;
  • know how to select/generate the best possible data for training the algorithm;
  • know how to evaluate the algorithm and what it has really learned.

SLIDE 66

Creating a training and a testing set for the practicals

  • Gather enough examples of each of the objects you want to classify.
  • Make sure these examples are representative of the object and do not contain a systematic bias (a misleading feature).

SLIDE 67

Choosing your dataset well

ML methods rely on statistics → they may learn the wrong correlations if the data is ambiguous. The two sets of images (red and green class) differ in both orientation and shape. Shape is the feature we are trying to teach the algorithm, but the algorithm may cluster the data of the water bottle and of the yoghurt box according to orientation instead of their different shape and texture, since the dataset of each object shows two very distinct types of orientation.

SLIDE 68

Choosing your dataset well

ML methods rely on statistics → they may learn the wrong correlations if the data is ambiguous. Above, we see the first 5 eigenvectors found by the algorithm (e1 to e5, from left to right). The two orientations are well encapsulated and easily separable using the first and second eigenvectors in combination. To separate the different types of objects, one must go to the projections with lower representative power, here projections 4 and 5.

SLIDE 69

Choosing your dataset well

Looking at the percentage of the variance explained by the eigenvectors helps to determine those lower projections. Here, to explain 90% of the variance of the data, one must use up to 7 projections.