Dimension Reduction for Classification


SLIDE 1

Dimension Reduction for Classification

Alfred O. Hero

  • Dept. EECS, Dept. BME, Dept. Statistics

University of Michigan - Ann Arbor
hero@eecs.umich.edu
http://www.eecs.umich.edu/~hero

  • 1. The manifold supporting data sample
  • 2. Classification constrained dimension reduction
  • 3. Dimension estimation on smooth manifolds
  • 4. Applications
  • 5. Conclusions

BIRS, July 2005

SLIDE 2
  • 1. Manifold supporting data sample
  • Yale Face Database B: each 128x128 image lies in R^(16384)
  • Yet the set of face images concentrates near a d-dimensional subspace (submanifold) with d << 16384

SLIDE 3

Data-driven dimensionality reduction

  • Data-driven dimensionality reduction consists of:

– Estimation of intrinsic dimension d:

  • Direct intrinsic dimension estimation

– Reconstruction of data samples in the manifold domain:

  • Manifold learning
  • Classifiers on intrinsic data dimension

– Estimated dimension as a discriminant
– Label-constrained manifold learning

SLIDE 4

Manifold learning problem setup:

Given a finite sampling {y_1, ..., y_n} of a d-dimensional manifold M embedded in R^D, find an embedding of M into a subset of R^m, m << D (usually m = d), without any prior knowledge about M or d.

SLIDE 5

Reconstructing the mapping and attributes of the manifold from a finite dataset falls into the general manifold learning problem. Manifold reconstruction for fixed d:

1. ISOMAP: Tenenbaum, de Silva, Langford (2000)
2. Locally Linear Embedding (LLE): Roweis, Saul (2000)
3. Laplacian Eigenmaps: Belkin, Niyogi (2002)
4. Hessian Eigenmaps (HLLE): Donoho, Grimes (2003)
5. Local Tangent Space Alignment (LTSA): Zhang, Zha (2003)
6. Semidefinite Embedding (SDE): Weinberger, Saul (2004)

Manifold learning background

SLIDE 6

Laplacian Eigenmaps: preserving local information

(Belkin & Niyogi 2002)

  • 1. Constructing an Adjacency Graph:
  • a. compute a k-NN graph on the dataset;
  • b. compute a similarity/weight matrix W between data points that encodes neighborhood information, e.g., the heat kernel: W_ij = exp(-||y_i - y_j||^2 / t) if i and j are k-NN neighbors, W_ij = 0 otherwise.
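A minimal numpy/scipy sketch of this adjacency-graph step, assuming Euclidean distances; the neighborhood size k and kernel width t are illustrative choices, not values from the talk.

```python
import numpy as np
from scipy.spatial import cKDTree

def heat_kernel_weights(Y, k=10, t=1.0):
    """Step 1: symmetric k-NN weight matrix with heat-kernel weights,
    W_ij = exp(-||y_i - y_j||^2 / t) if i, j are k-NN neighbors, else 0."""
    n = Y.shape[0]
    dist, idx = cKDTree(Y).query(Y, k=k + 1)   # column 0 is the point itself
    W = np.zeros((n, n))
    for i in range(n):
        for dij, j in zip(dist[i, 1:], idx[i, 1:]):
            W[i, j] = W[j, i] = np.exp(-dij**2 / t)  # symmetrize the k-NN relation
    return W
```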

SLIDE 7
  • 2. Manifold learning as an optimization problem:
  • a. objective function:

J(X) = sum_{i,j} W_ij ||x_i - x_j||^2 = 2 tr(X L X^T),

where L = D - W is the Graph Laplacian and D = diag(sum_j W_ij) is the degree matrix.

  • b. embedding is the solution of

min_X tr(X L X^T) subject to X D X^T = I    (*)


SLIDE 8
  • 3. Eigenmaps:
  • a. the solution to (*) is given by the d generalized eigenvectors f_1, ..., f_d

associated with the d smallest nonzero generalized eigenvalues that solve L f = lambda D f; equivalently, the eigenvectors of the normalized Graph Laplacian D^(-1/2) L D^(-1/2).

  • b. if F = [f_1, ..., f_d] is the collection of such eigenvectors,

then the embedded points are given by the rows of F: x_i = (f_1(i), ..., f_d(i)).

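A sketch of this eigen-solve, consistent with the weight matrix built above; dense scipy.linalg.eigh is used for brevity, though a sparse solver would be preferred for large n.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmap(W, d=2):
    """Steps 2-3: solve L f = lambda D f and embed each point i as
    x_i = (f_1(i), ..., f_d(i)), skipping the constant eigenvector."""
    D = np.diag(W.sum(axis=1))   # degree matrix (assumes no isolated points)
    L = D - W                    # combinatorial graph Laplacian
    _, F = eigh(L, D)            # generalized eigenvectors, ascending eigenvalues
    return F[:, 1:d + 1]         # rows are the embedded points x_i
```

For example, X = laplacian_eigenmap(heat_kernel_weights(Y), d=2) produces a two-dimensional embedding of a point cloud Y, like the embeddings shown on the following slides (up to sign and scaling).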

SLIDE 9

Dimension Reduction for Labeled Data

[Figure: left, original data Y: 800 points uniform on the Swiss roll, 400 per class; right, dimension-reduced data X.]

SLIDE 10

  • 2. Classification constrained dimensionality reduction
  • Adding class-dependent constraints: "virtual" class vertices.

SLIDE 11
Label-penalized Laplacian Eigenmaps

  • 1. If C is the class membership matrix (i.e., c_ij = 1 if point j is from class i), define the objective function

J(X, Z) = sum_{i,j} W_ij ||x_i - x_j||^2 + beta * sum_{i,j} c_ij ||z_i - x_j||^2,

where z_1, ..., z_K (the columns of Z) are the "virtual" class centers and beta is a regularization parameter.

  • 2. The embedding is the solution of the generalized eigenproblem L' f = lambda D' f,

where L' is the Laplacian of the augmented weight matrix

W' = [ W        beta C^T ]
     [ beta C   0        ]
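A minimal sketch of the augmented-graph construction read off the description above; the exact weighting and normalization conventions of the ICASSP 2005 paper may differ, so treat beta and the block layout as assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def ccdr_embedding(W, C, beta=1.0, d=2):
    """Embed n data points together with K 'virtual' class centers.

    W : (n, n) data weight matrix; C : (K, n) membership matrix with
    c_kj = 1 if point j has label k (all-zero column if unlabeled)."""
    n, K = W.shape[0], C.shape[0]
    # edge of weight beta between point j and the class vertex of its label
    W_aug = np.block([[W, beta * C.T],
                      [beta * C, np.zeros((K, K))]])
    D = np.diag(W_aug.sum(axis=1))   # assumes every class has a labeled point
    L = D - W_aug                    # Laplacian of the augmented graph
    _, F = eigh(L, D)
    E = F[:, 1:d + 1]
    return E[:n], E[n:]              # embedded points x_j and class centers z_k
```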

SLIDE 12

[Figure: 2-D embeddings of the two-class Swiss roll: classification constrained dimensionality reduction (left) vs. unconstrained dimensionality reduction (right).]

SLIDE 13

Semi-Supervised Learning on Manifolds

[Figure: partially labeled data on the Swiss roll: a few labeled samples among many unlabeled samples.]

SLIDE 14

Semisupervised extension

Algorithm:

  • 1. Compute the constrained embedding of the entire data set, inserting a zero column in C for each unlabeled sample.
  • 2. Fit a (e.g., linear) classifier to the labeled embedded points by minimizing the quadratic error loss, e.g. min_A sum_{j labeled} ||t_j - A x_j||^2, where t_j is the class indicator vector of point j.
  • 3. For an unlabeled point x, label it using the fitted (linear) classifier: assign the class k that maximizes (A x)_k.
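A sketch of steps 2 and 3 on the embedded points, assuming one-hot least-squares targets (my reading of the quadratic loss above):

```python
import numpy as np

def propagate_labels(X, labels):
    """Fit a linear classifier to the labeled embedded points by least
    squares, then label the unlabeled points; labels < 0 mean unlabeled."""
    lab = labels >= 0
    classes = np.unique(labels[lab])
    Xl = np.column_stack([X[lab], np.ones(lab.sum())])            # affine features
    T = (labels[lab][:, None] == classes[None, :]).astype(float)  # one-hot targets
    A, *_ = np.linalg.lstsq(Xl, T, rcond=None)                    # min ||Xl A - T||^2
    Xu = np.column_stack([X[~lab], np.ones((~lab).sum())])
    out = labels.copy()
    out[~lab] = classes[np.argmax(Xu @ A, axis=1)]                # step 3: classify
    return out
```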
SLIDE 15

Classification Error Rates

[Figure: error rate (%) vs. number of labeled points (100 to 700) for k-NN, Laplacian eigenmaps, and CCDR.]

Percentage of errors for labeling unlabeled samples as a function of the number of labeled points, out of a total of 1000 points on the Swiss roll.

SLIDE 16
  • 3. Methods of dimension estimation
  • Scree plots

– Plot residual fitting errors of SVD, Isomap, LE, LLE

  • Kolmogorov/Entropy/Correlation dimension

– Box counting, sphere packing (Liebovitch and Toth)

  • Maximum likelihood

– Poisson approximation to the Binomial (Levina & Bickel, 2004); see the sketch at the end of this slide

  • Entropic graphs

– Spanner-graph length approximation to an entropy functional (Costa & Hero, 2003)

[Figure: ISOMAP residual curve: residual variance vs. dimensionality for Data Set 1; the intrinsic dimension is read from the knee of the curve, which is hard to locate ("Knee?").]
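As a concrete instance of the maximum-likelihood entry above, a sketch of the Levina-Bickel nearest-neighbor MLE; the neighborhood size k is an illustrative choice.

```python
import numpy as np
from scipy.spatial import cKDTree

def mle_dimension(Y, k=10):
    """Levina-Bickel MLE of intrinsic dimension, averaged over all points:
    m_k(x) = [ (1/(k-1)) * sum_{j<k} log(T_k(x)/T_j(x)) ]^(-1),
    with T_j(x) the distance from x to its j-th nearest neighbor."""
    dist, _ = cKDTree(Y).query(Y, k=k + 1)
    T = dist[:, 1:]                   # T_1, ..., T_k for each point
    m = 1.0 / np.mean(np.log(T[:, -1:] / T[:, :-1]), axis=1)
    return m.mean()
```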

SLIDE 17

Euclidean Random Graphs

  • data in D-dimensionalEuclidean space
  • Euclidean MST with edge power weighting gamma:
  • pairwise distance matrix over
  • edge length matrix of spanning trees over
  • Euclidean k-NNG with edge power weighting gamma:
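A short sketch of the kNNG length functional just defined (the MST analogue could use scipy.sparse.csgraph.minimum_spanning_tree on the pairwise distance matrix):

```python
import numpy as np
from scipy.spatial import cKDTree

def knng_length(X, k=4, gamma=1.0):
    """Power-weighted k-NN graph length:
    L_gamma(X_n) = sum_i sum_{j in N_k(i)} ||x_i - x_j||^gamma."""
    dist, _ = cKDTree(X).query(X, k=k + 1)   # column 0 is the zero self-distance
    return float(np.sum(dist[:, 1:] ** gamma))
```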
SLIDE 18

Example: Uniform Planar Sample

[Figure: uniform sample on the square (N=400); its Minimal Spanning Tree (γ=1); and its k-NN Graph (γ=1, k=4).]

SLIDE 19

Convergence of Euclidean CQF’s

Beardwood, Halton, Hammersley Theorem (BHH:1959): if X_n = {x_1, ..., x_n} is an i.i.d. sample with density f on R^d and 0 < gamma < d, then with probability 1

L_gamma(X_n) / n^((d-gamma)/d)  ->  beta_(d,gamma) * Integral f(x)^((d-gamma)/d) dx,

where beta_(d,gamma) is a constant not depending on f.
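Since BHH makes log L_gamma(X_n) approximately linear in log n with slope (d - gamma)/d, the intrinsic dimension can be estimated by regression over subsample sizes. A sketch using my own subsampling scheme (the Costa-Hero papers use a specific bootstrap), reusing knng_length from the previous sketch:

```python
import numpy as np

def dimension_from_growth(X, sizes, k=4, gamma=1.0, trials=10, seed=0):
    """Fit log L_gamma(n) ~ a*log n + b over random subsamples; BHH gives
    a = (d - gamma)/d, hence d_hat = gamma / (1 - a). The intercept b
    carries the entropy information (not extracted here)."""
    rng = np.random.default_rng(seed)
    mean_log_L = []
    for n in sizes:
        logs = [np.log(knng_length(X[rng.choice(len(X), n, replace=False)],
                                   k=k, gamma=gamma))
                for _ in range(trials)]
        mean_log_L.append(np.mean(logs))
    a, _ = np.polyfit(np.log(sizes), mean_log_L, 1)   # slope of the log-log fit
    return gamma / (1.0 - a)
```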

SLIDE 20

k-NNG Convergence Theorem in Non-Euclidean Spaces

Costa, Hero: TSP (2004), Birkhauser (2005): for i.i.d. samples on a smooth compact d-dimensional manifold embedded in R^D, the kNNG length obeys the same n^((d-gamma)/d) growth law with the intrinsic dimension d in place of D, and the limit is determined by the (Renyi) entropy of the sampling density on the manifold. Hence the growth exponent identifies d and the intercept identifies the entropy H.

SLIDE 21

Application to 2D Torus

  • 600 uniformly distributed random samples

[Figure: average kNNG (k=5) length vs. number of samples; the growth-rate fit gives d = 2 and H = 9.6 bits.]

SLIDE 22

Local Extension via kNNG

– Initialize: for each sample x_i, find the set N_i of its Q nearest neighbors.
– For i = 1, 2, ..., p:

  • compute the kNNG growth-rate estimate on N_i and set (d_hat_i, H_hat_i) to the resulting local dimension and entropy estimates.
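A hedged sketch of this local extension, reusing dimension_from_growth from the BHH sketch; the neighborhood size Q and the subsample sizes are my choices (the exact procedure is in the Costa-Girotra-Hero SSP 2005 paper).

```python
import numpy as np
from scipy.spatial import cKDTree

def local_dimensions(X, Q=65, k=4, gamma=1.0):
    """Local intrinsic dimension at each sample: run the kNNG growth-rate
    estimator on the Q-nearest-neighbor ball around each point."""
    _, nbrs = cKDTree(X).query(X, k=Q)
    sizes = [Q // 2, 3 * Q // 4, Q]   # subsample sizes for the log-log fit
    return np.array([dimension_from_growth(X[nbrs[i]], sizes, k=k, gamma=gamma)
                     for i in range(len(X))])
```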
SLIDE 23
  • 4. Application to MNIST Digits
  • Large database of 8-bit images of handwritten digits 0-9.
  • 28x28 pixels for each image
  • First 1000 images in training set used here
  • Non-adaptive: digit labels are known
SLIDE 24

Scree Plot

[Figure: scree plot for the MNIST digits (Costa & Hero, Birkhauser 2005).]

SLIDE 25

Local Dimension/Entropy Statistics

[Figure: local dimension and entropy statistics for the MNIST digits (Costa & Hero, Birkhauser 2005).]

SLIDE 26

Adaptive Anomaly Detection

  • Spatio-temporal measurement vector:

[Figure: per-router measurement time series (value vs. day) for the 11 Abilene nodes: STTL, SNVA, LOSA, KSCY, HSTN, DNVR, CHIN, IPLS, ATLA, WASH, NYCM.]

SLIDE 27

Data Observed from the Abilene Network

  • Objective: detect changes in network traffic via local intrinsic dimension
  • Hypotheses:

– High traffic from a few sources lowers the local dimension of the network traffic
– Changes in the distribution of the dimension estimate can be used as a marker for more subtle changes in traffic

  • Data collection period: 1/1/05-1/2/05
  • Data sampling: packet flow sampled every 5 minutes from all 11 routers on the Abilene Network
  • Data fields: aggregate of all flows to/from all ports

SLIDE 28

KNN Algorithm (Costa)

[Figure: time series of local dimension estimates d-hat ("Modified Knn-Algo II") above the actual traffic data; 1/01/05-1/02/05, n=150, Q=65, NN=4, N=10, M=10.]

SLIDE 29

Example – 1/1/05, 12:20 pm

  • Large data transfers from IPs 145.146.96 and 192.31.120 drastically increase flows through Chicago and NYC.

[Figure: local dimension estimates ("Modified Knn-Algo II") and actual traffic around the 12:20 pm event; n=150, Q=65, NN=4, N=10, M=10.]

SLIDE 30
  • 5. Conclusions
  • Classification constraints can be included in manifold-learning dimension reduction algorithms
  • kNNGs jointly estimate the dimension and entropy of high-dimensional data
  • Dimension can be used as a discriminant in anomaly detection
  • These estimates can serve as a precursor to model reduction and database compression
  • The methods suffer only from the curse of intrinsic dimensionality, not the ambient dimension

SLIDE 31

References

  • J. Costa, N. Patwari and A. O. Hero, "Distributed multidimensional scaling with adaptive weighting for node localization in sensor networks," ACM Journal on Networking, to appear, 2005. http://www.eecs.umich.edu/~hero/Preprints/wmds_v9.pdf
  • J. Costa and A. O. Hero, "Geodesic entropic graphs for dimension and entropy estimation in manifold learning," IEEE Trans. on Signal Processing, vol. 52, no. 8, pp. 2210-2221, Aug. 2004. http://www.eecs.umich.edu/~hero/Preprints/sp_mlsi_final_twocolumn.pdf
  • J. A. Costa, A. Girotra and A. O. Hero, "Estimating Local Intrinsic Dimension with k-Nearest Neighbor Graphs," IEEE Workshop on Statistical Signal Processing (SSP), Bordeaux, July 2005. http://www.eecs.umich.edu/~hero/Preprints/ssp_2005_final_1.pdf
  • J. Costa and A. O. Hero, "Classification constrained dimensionality reduction," Proc. of ICASSP, Philadelphia, March 2005. http://www.eecs.umich.edu/~hero/Preprints/costa_icassp2005.pdf