SLIDE 1

Nonlinear Dimension Reduction Using Kernel Representations

Katie Kempfert

University of North Carolina Wilmington Statistical and Machine Learning REU

July 25, 2017

SLIDE 2

Outline

◮ Introduction
◮ Theoretical Development
    ◮ Kernel Principal Component Analysis (KPCA)
    ◮ Supervised Kernel Principal Component Analysis (SKPCA)
    ◮ Kernel Fisher's Discriminant Analysis (KFDA)
◮ Simulation
◮ Application
◮ Conclusion

SLIDE 3

Introduction

◮ Dimensionality reduction techniques have become more popular with the rise of big data.

◮ In particular, dimensionality reduction is important for image processing.

◮ Features extracted from images often have very high dimension, especially because they contain redundant information and noise.

◮ Dimensionality reduction can be used to find meaningful patterns in the data.

SLIDE 4

Principal Component Analysis (PCA)

◮ Let X be a data matrix with covariance matrix Σ.

◮ In standard PCA, we assume the directions of variability in X are linear.

◮ Hence, we seek a transformation of X,

    Y_{n×d} = X_{n×p} A_{p×d},   (1)

such that the columns of A are orthonormal.

◮ This optimization problem can be expressed as the following eigenproblem:

    Σ a_i = λ_i a_i,   i = 1, ..., d,   (2)

where λ_1 ≥ λ_2 ≥ ... ≥ λ_d are eigenvalues with associated eigenvectors a_i (a minimal R sketch follows).
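As a concrete illustration, here is a minimal base-R sketch of this eigendecomposition view of PCA on toy data; the dataset, dimensions, and variable names are illustrative, not taken from the presentation.

    # PCA via the covariance eigenproblem (illustrative sketch).
    set.seed(1)
    X <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)   # toy data: n = 100, p = 5

    Xc    <- scale(X, center = TRUE, scale = FALSE)     # center the columns
    Sigma <- cov(Xc)                                    # sample covariance matrix
    eig   <- eigen(Sigma, symmetric = TRUE)             # eigenvalues in decreasing order

    d <- 2
    A <- eig$vectors[, 1:d]                             # loadings a_1, ..., a_d (orthonormal columns)
    Y <- Xc %*% A                                       # projections Y = X A  (n x d)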

SLIDE 5

Nonlinear Mapping

◮ A disadvantage of standard PCA is that it can only identify linear directions of variability.

◮ We overcome this by mapping the data into some higher-dimensional Hilbert space R^q via a nonlinear function Φ:

    Φ : R^p → R^q,   X ↦ Z = Φ(X).   (3)

◮ The goal is to find Φ such that the directions of variability in Φ(X) = Z are linear (a small worked example follows).
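For intuition, here is a tiny worked example of an explicit feature map; this particular map is an illustration of the idea, not one used in the presentation. The degree-2 polynomial map Φ(x_1, x_2) = (x_1², √2·x_1·x_2, x_2²) has inner products equal to the polynomial kernel (x·y)², which previews the kernel trick introduced on the following slides.

    # Toy explicit feature map (illustrative only).
    # Its inner product equals the degree-2 polynomial kernel (x . y)^2,
    # so the mapped vectors never need to be formed explicitly.
    Phi <- function(x) c(x[1]^2, sqrt(2) * x[1] * x[2], x[2]^2)

    x <- c(1, 2); y <- c(3, -1)
    sum(Phi(x) * Phi(y))   # inner product in feature space: 1
    (sum(x * y))^2         # same value via the kernel:      1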

SLIDE 6

Example of Nonlinear Mapping

Figure 1: Intuition of KPCA

SLIDE 7

Kernel Trick

◮ Performing PCA in a higher-dimensional space like R^q may present computational complexities.

◮ In order to reduce the complexity, we use the kernel trick:

    k(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩,   i, j = 1, 2, ..., n.   (4)

◮ The kernel is substituted for any dot product used in the covariance or Gram matrix.

◮ Then we can essentially perform PCA on Z in R^q (see the KPCA sketch below).
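Below is a minimal base-R KPCA sketch with the RBF kernel used later in the talk. It is a hedged illustration of the procedure, not the presentation's actual code; the function name, default arguments, and the normalization step are assumptions.

    # Kernel PCA sketch (illustrative): RBF kernel, feature-space centering,
    # eigendecomposition, and projection of the training points.
    kpca_sketch <- function(X, delta = 1, d = 2) {
      n <- nrow(X)
      K <- exp(-delta * as.matrix(dist(X))^2)       # K_ij = exp(-delta * ||x_i - x_j||^2)

      one_n <- matrix(1 / n, n, n)                  # center K in feature space
      Kc <- K - one_n %*% K - K %*% one_n + one_n %*% K %*% one_n

      eig   <- eigen(Kc, symmetric = TRUE)
      alpha <- eig$vectors[, 1:d, drop = FALSE]
      alpha <- sweep(alpha, 2, sqrt(pmax(eig$values[1:d], 1e-12)), "/")  # normalize coefficients
      Kc %*% alpha                                  # n x d nonlinear principal component scores
    }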

SLIDE 8

SKPCA Problem

◮ PCA and KPCA are unsupervised methods, since they do not consider a response variable when identifying directions of variability in the data.

◮ SKPCA is a generalization of PCA and KPCA which incorporates class information.

◮ This is done by solving the maximization problem

    max_β tr(β K H L H K βᵀ)   subject to   β K βᵀ = I,   (5)

where K_ij = k(x_i, x_j) is the kernel matrix as defined for KPCA, L_ij = l(y_i, y_j) = 1(y_i = y_j) × k(x_i, x_j) is the link matrix, and H_ij = 1(i = j) − 1/n is the centering matrix, for i, j = 1, ..., n.

SLIDE 9

SKPCA Solution

Assuming K is non-singular, this reduces to a regular eigenproblem: the stationarity condition is A v = λ B v, which implies

    B⁻¹ A v = λ v,   (6)

where λ = (vᵀ A v) / (vᵀ K v), B = K, and A = K H L H K. Since A is symmetric and B = K is symmetric positive definite, the problem can equivalently be symmetrized (e.g., via B^{-1/2} A B^{-1/2}), so the eigenvalues are real.
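A hedged base-R sketch of this eigenproblem follows; the function name, the small ridge added for numerical stability, and the use of the RBF kernel are assumptions for illustration, not details taken from the presentation.

    # SKPCA sketch (illustrative): build K, L, H as defined above and solve
    # the generalized eigenproblem B^{-1} A v = lambda v with B = K, A = K H L H K.
    skpca_sketch <- function(X, y, delta = 1, d = 2, ridge = 1e-8) {
      n <- nrow(X)
      K <- exp(-delta * as.matrix(dist(X))^2)            # kernel matrix
      L <- outer(y, y, "==") * K                         # link matrix: 1(y_i = y_j) * k(x_i, x_j)
      H <- diag(n) - matrix(1 / n, n, n)                 # centering matrix

      A <- K %*% H %*% L %*% H %*% K
      B <- K + ridge * diag(n)                           # ridge keeps B invertible

      eig  <- eigen(solve(B, A))
      beta <- Re(eig$vectors[, 1:d, drop = FALSE])       # leading supervised directions
      K %*% beta                                         # n x d supervised projections
    }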

SLIDE 10

Fisher’s Discriminant Analysis (FDA)

◮ FDA is a popular dimension reduction technique in statistical and machine learning.

◮ Given a dataset with m classes, FDA aims to find the best set of features to discriminate between the classes.

◮ FDA is a supervised method; for every observation x_i, FDA uses a class label associated with it.

◮ FDA can only identify groups that are linearly separable.

SLIDE 11

◮ In standard FDA, we seek to maximize the following objective function J(v):

    J(v) = (vᵀ S_B v) / (vᵀ S_W v),   (7)

where S_B is the between-class scatter matrix, S_W is the within-class scatter matrix (both p × p), and v is a p × 1 vector:

    S_B = Σ_c (x̄_c − x̄)(x̄_c − x̄)ᵀ,   S_W = Σ_c Σ_{i∈c} (x_i − x̄_c)(x_i − x̄_c)ᵀ.   (8)

◮ The solution to the maximization of J(v) is the eigenproblem

    S_B^{1/2} S_W^{−1} S_B^{1/2} u = λ u.   (9)

(A base-R sketch of FDA follows below.)
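Here is a minimal base-R sketch of FDA under the notation above, for illustration only; the function name, the use of S_W⁻¹ S_B rather than the symmetrized form in (9), and the optional ridge are assumptions and are not claimed to match the presentation's code. As on the slide, the between-class terms are not weighted by class size.

    # FDA sketch (illustrative): build the scatter matrices of (8) and solve an
    # equivalent eigenproblem for the discriminant directions. X is n x p, y holds class labels.
    fda_sketch <- function(X, y, d = 2, ridge = 0) {
      p    <- ncol(X)
      xbar <- colMeans(X)
      SB <- matrix(0, p, p); SW <- matrix(0, p, p)
      for (cl in unique(y)) {
        Xc    <- X[y == cl, , drop = FALSE]
        xbarc <- colMeans(Xc)
        SB <- SB + outer(xbarc - xbar, xbarc - xbar)                    # between-class scatter term
        SW <- SW + crossprod(scale(Xc, center = xbarc, scale = FALSE))  # within-class scatter term
      }
      eig <- eigen(solve(SW + ridge * diag(p), SB))       # directions maximizing J(v)
      V   <- Re(eig$vectors[, 1:d, drop = FALSE])
      X %*% V                                             # n x d discriminant projections
    }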

SLIDE 12

Kernel Trick

◮ FDA can be generalized to KFDA to accommodate nonlinearities in the data.

◮ Similar to KPCA and SKPCA, this is achieved through the kernel trick.

◮ Basically, any occurrence of the dot product in the scatter matrices is replaced with the kernel function

    k(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩,   i, j = 1, 2, ..., n.   (10)

(A hedged sketch of one way to kernelize FDA follows.)
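One common practical shortcut is to run FDA on the empirical kernel map, i.e., to treat the rows of the kernel matrix K as the mapped data. The sketch below takes that route with an RBF kernel and a small ridge on the within-class scatter (which is singular in kernel space); these choices are assumptions for illustration and are not claimed to be the presentation's exact construction.

    # KFDA sketch (illustrative): Fisher's discriminant analysis applied to the
    # rows of the RBF kernel matrix, with a ridge because S_W is singular here.
    kfda_sketch <- function(X, y, delta = 1, d = 2, ridge = 1e-6) {
      K    <- exp(-delta * as.matrix(dist(X))^2)   # empirical kernel map: one row per observation
      n    <- nrow(K)
      kbar <- colMeans(K)
      SB <- matrix(0, n, n); SW <- matrix(0, n, n)
      for (cl in unique(y)) {
        Kc    <- K[y == cl, , drop = FALSE]
        kbarc <- colMeans(Kc)
        SB <- SB + outer(kbarc - kbar, kbarc - kbar)
        SW <- SW + crossprod(scale(Kc, center = kbarc, scale = FALSE))
      }
      A <- Re(eigen(solve(SW + ridge * diag(n), SB))$vectors[, 1:d, drop = FALSE])
      K %*% A                                      # n x d nonlinear discriminant projections
    }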

SLIDE 13

Simulation Summary

◮ KPCA, SKPCA, and KFDA are applied to 3 simulated datasets generated in R.

◮ For all methods, a modification of the radial basis function (RBF) kernel,

    k(x_i, x_j) = e^{−δ ||x_i − x_j||²},   (11)

is used (a small helper for computing (11) is sketched after this list).

◮ The tuning parameter δ is chosen through a grid search for each combination of dimensionality reduction method and dataset.

◮ For data visualization purposes, plots comparing the original data and the reduced-dimension data in two dimensions (the projections) are given for each dataset.
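As a small concrete helper, (11) can be computed for a whole dataset at once; the function name, the use of dist(), and the example grid of δ values are illustrative choices, not the presentation's code.

    # RBF kernel matrix of (11) for an n x p data matrix X and tuning parameter delta.
    rbf_kernel <- function(X, delta) {
      exp(-delta * as.matrix(dist(X))^2)   # K_ij = exp(-delta * ||x_i - x_j||^2)
    }

    # Example delta grid for the grid search (values illustrative only).
    delta_grid <- 10^seq(-3, 2, by = 1)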

SLIDE 14

Three Ring Data

[Figure: (a) Original data; (b) KFDA projections in 2D; (c) KPCA projections in 2D; (d) SKPCA projections in 2D]

SLIDE 15

Wine Chocolate Data

[Figure: (a) Original data; (b) KFDA projections in 2D; (c) KPCA projections in 2D; (d) SKPCA projections in 2D]

SLIDE 16

Swiss Roll Data

[Figure: (a) Original data; (b) KFDA projections in 2D; (c) KPCA projections in 2D; (d) SKPCA projections in 2D]

SLIDE 17

Introduction to MORPH-II

◮ MORPH-II is a face imaging database used by over 500 researchers worldwide for a variety of race, gender, and age face imaging tasks.

◮ It includes 55,134 mugshots of 13,617 individuals collected over a 5-year span.

◮ Additionally, MORPH-II provides relevant metadata such as subject ID number, picture number, date of birth, date of arrest, race, gender, and age.

◮ On average, there are 4 images per subject, with ages ranging from 16 to 77 years.

SLIDE 18

Process for MORPH-II

1. Clean, pre-process, and subset the MORPH-II database.
2. Extract a number of features from the MORPH-II images: biologically-inspired features (BIFs), histograms of oriented gradients (HOGs), and local binary patterns (LBPs).
3. Use KPCA, SKPCA, and KFDA to reduce the dimension of the feature data.
4. Perform gender classification with a linear support vector machine (SVM), taking the reduced-dimension data as input (a sketch of steps 3-4 follows this list).
5. Compare results for all combinations of dimensionality reduction technique and feature type.
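A hedged sketch of steps 3-4 follows, assuming the extracted image features already sit in a matrix feats with gender labels in gender (both hypothetical names), reusing the kpca_sketch() helper sketched on the KPCA slide, and using a linear SVM from the e1071 package.

    # Steps 3-4 (illustrative): reduce the feature dimension, then classify gender
    # with a linear SVM (cost = 1, as on the results slide).
    library(e1071)

    Z   <- kpca_sketch(feats, delta = 0.1, d = 50)   # hypothetical delta and dimension
    dat <- data.frame(gender = factor(gender), Z)

    fit <- svm(gender ~ ., data = dat, kernel = "linear", cost = 1)
    mean(predict(fit, dat) == dat$gender)            # resubstitution accuracy, for illustration only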

SLIDE 19

Tuning Parameters

◮ The dimension reduction techniques used on MORPH-II require careful tuning of parameters.

◮ Tuning is done on a subset of 1,000 "even" images from MORPH-II using two-fold cross-validation.

◮ The tuning parameters (including the number of dimensions) which yield the highest average gender classification accuracy on the subset of 1,000 images are the ones which will be used on the full set of images in MORPH-II (a cross-validation sketch follows).
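Below is a hedged sketch of the two-fold cross-validated grid search, again assuming the hypothetical feats / gender objects and the kpca_sketch() helper from earlier slides; the grid values are placeholders, and for simplicity the kernel projections are computed on all rows at once (a stricter protocol would refit them on each training fold).

    # Two-fold CV grid search (illustrative) over the RBF delta and the number
    # of retained dimensions, scored by gender classification accuracy.
    library(e1071)
    set.seed(2017)
    folds <- sample(rep(1:2, length.out = nrow(feats)))

    grid <- expand.grid(delta = c(0.01, 0.1, 1), d = c(10, 50, 100))
    grid$acc <- NA_real_
    for (g in seq_len(nrow(grid))) {
      Z   <- kpca_sketch(feats, delta = grid$delta[g], d = grid$d[g])
      dat <- data.frame(gender = factor(gender), Z)
      acc <- numeric(2)
      for (f in 1:2) {
        fit    <- svm(gender ~ ., data = dat[folds != f, ], kernel = "linear", cost = 1)
        acc[f] <- mean(predict(fit, dat[folds == f, ]) == dat$gender[folds == f])
      }
      grid$acc[g] <- mean(acc)          # average accuracy over the two folds
    }
    grid[which.max(grid$acc), ]         # best (delta, d) combination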

SLIDE 20

Results from MORPH-II Subset

Table 1: Feature Type Tuning Summary

Feature | Tuning values                        | KFDA accuracy           | KPCA accuracy           | SKPCA accuracy
LBP     | r = 1,2,3; s = 10,12,14,16,18,20     | 80.80% (r=1, s=10)      | 85.20% (r=1, s=14)      | 86.00% (r=1, s=10)
HOG     | o = 4,6,8; s = 4,6,8,10,12,14        | 79.40% (o=8, s=4)       | 90.40% (o=4, s=4)       | 88.90% (o=4, s=4)
BIF     | g = 0.1,0.2,...,1.0; s = 15-29, 7-37 | 83.50% (g=0.4, s=15-29) | 90.40% (g=0.1, s=15-29) | 89.80% (g=1.0, s=15-29)

In all cases, linear SVM with cost c = 1 is used to classify gender.

SLIDE 21

Results from MORPH-II

◮ The proposed machine learning pipeline for MORPH-II has not yet been used on the full set of images.

◮ This process requires the use of high-performance computing, due to the size of MORPH-II, the dimension of the extracted features, and the computational complexity of kernel-based methods.

◮ We hope that a supercomputer can be used successfully for this task in the future.

SLIDE 22

Conclusion

◮ Dimension reduction techniques can increase computational efficiency and improve classification (or prediction) accuracy, because they identify meaningful patterns in the data.

◮ KFDA, KPCA, and SKPCA are particularly powerful, flexible techniques.

◮ However, further research is necessary to understand how they work.

◮ In particular, future directions involve investigating the choice of tuning parameters.

◮ Additionally, we would like to further understand the differences between these techniques in terms of preserving the local structure of the data.

SLIDE 23

References

K. Ricanek and T. Tesafaye, "MORPH: A longitudinal image database of normal adult age-progression," 7th International Conference on Automatic Face and Gesture Recognition (FGR06), Southampton, 2006, pp. 341-345.

B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, 10(5), July 1998, pp. 1299-1319.

E. Barshan, A. Ghodsi, Z. Azimifar, and M. Zolghadri Jahromi, "Supervised principal component analysis: Visualization, classification and regression on subspaces and submanifolds," Pattern Recognition, 44(7), July 2011, pp. 1357-1371.

Y. Wang, C. Chen, V. Watkins, and K. Ricanek, "Modified supervised kernel PCA for gender classification," in: X. He et al. (eds), Intelligence Science and Big Data Engineering. Image and Video Data Engineering, Lecture Notes in Computer Science, vol. 9242, Springer, Cham, 2015.

S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, "Fisher discriminant analysis with kernels," Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop, Madison, WI, 1999, pp. 41-48.

M. Welling, "Fisher linear discriminant analysis," Department of Computer Science, University of Toronto, 3(1), 2005.

K. Kempfert, J. Fabish, K. Park, and R. Towner, "MORPH-II: A Proposed Subsetting Scheme," University of North Carolina Wilmington NSF REU, 2017.

G. Bingham, B. Yip, M. Ferguson, and C. Nansalo, "MORPH-II: Inconsistencies and Cleaning," University of North Carolina Wilmington NSF REU, 2017.
