SLIDE 1

Sections: The Method | Supervised | Unsupervised (spectral) | Unsupervised (k-means) | Gaussian Noise Experiments | Additional Material

When Dictionary Learning Meets Classification

Bufford, Teresa (1); Chen, Yuxin (2); Horning, Mitchell (3); Shee, Liberty (1)
Mentor: Professor Yohann Tendero

(1) UCLA  (2) Dalhousie University  (3) Harvey Mudd College

August 9, 2013

SLIDE 2

Outline

  • 1. The Method: Dictionary Learning
  • 2. Experiments and Results
      2.1 Supervised Dictionary Learning
      2.2 Unsupervised Dictionary Learning
      2.3 Robustness w.r.t. Noise
  • 3. Conclusion
SLIDE 3

Dictionary Learning

GOAL: Classify discrete image signals x ∈ R^n.

The Dictionary, D ∈ R^(n×K)

x ≈ Dα, where D = [atom_1 | · · · | atom_K] and α = (α_1, . . . , α_K)^t.

  • Each dictionary can be represented as a matrix whose columns are atoms ∈ R^n, learned from a set of training data.
  • A signal x ∈ R^n can be approximated by a linear combination of atoms in a dictionary; the coefficients are collected in α ∈ R^K.
  • We seek a sparse signal representation:

arg min_{α ∈ R^K} ‖x − Dα‖₂² + λ‖α‖₁.    (1)

[1] Sprechmann et al., "Dictionary learning and sparse coding for unsupervised clustering," Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on. IEEE, 2010.
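As a concrete sketch of problem (1), the lasso objective can be minimized by iterative soft-thresholding (ISTA). This is an illustrative stand-in, not the solver used in the slides; the function names, step size, and toy dictionary below are our own choices.

```python
import numpy as np

def sparse_code(x, D, lam=0.1, n_iter=300):
    """Minimize ||x - D a||_2^2 + lam * ||a||_1 over a, via ISTA (a sketch)."""
    a = np.zeros(D.shape[1])
    L = 2.0 * np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = a - 2.0 * D.T @ (D @ a - x) / L      # gradient step on the quadratic term
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return a

# Toy check: a signal built from atoms 0 and 2 of a random dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 5))
D /= np.linalg.norm(D, axis=0)                   # unit-norm atoms
x = 1.5 * D[:, 0] - 2.0 * D[:, 2]
alpha = sparse_code(x, D, lam=0.05)
```

The recovered α concentrates on the two atoms that generated x, which is the sparsity the method relies on.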

SLIDE 4

Supervised Dictionary Learning Algorithm

Illustrating Example
Training image signals: x1, x2, x3, x4;  X = [x1 x2 x3 x4];  Classes: 0, 1

Group training images according to their labels:

class 0: {x1, x4}    class 1: {x2, x3}

Use the training images with label i to train dictionary Di:

class 0: {x1, x4} → D0    class 1: {x2, x3} → D1
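The grouping-and-training step above can be sketched as follows. `learn_dictionary` is a placeholder for whatever dictionary learner is plugged in (e.g. K-SVD or online dictionary learning); the toy "learner" at the bottom simply reuses the signals as atoms.

```python
import numpy as np
from collections import defaultdict

def train_class_dictionaries(X, labels, learn_dictionary):
    """Group the columns of X by label, then learn one dictionary per class."""
    groups = defaultdict(list)
    for x, y in zip(X.T, labels):            # one signal per column of X
        groups[y].append(x)
    return {y: learn_dictionary(np.column_stack(sigs)) for y, sigs in groups.items()}

# Toy run with a trivial 'learner' that uses the signals themselves as atoms.
X = np.array([[0., 1., 2., 3.],
              [4., 5., 6., 7.]])             # x1..x4 as columns
labels = [0, 1, 1, 0]
dicts = train_class_dictionaries(X, labels, lambda S: S)
```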

SLIDE 5

Signal Classification

Illustrating Example
Test image signals: y1, y2, y3;  Y = [y1 y2 y3];  Dictionaries: D0, D1

Take one test image. Compute its optimal sparse representation α for each dictionary. Use α to compute the energy:

E_i(y1) = min_α ‖y1 − D_i α‖₂² + λ‖α‖₁

y1 → (E0(y1), E1(y1))

  • Do this for each test signal:

E(Y) = [ E0(y1)  E0(y2)  E0(y3)
         E1(y1)  E1(y2)  E1(y3) ]

  • Classify each test signal as class i* = arg min_i E_i(y). For example:

E(Y) = [  5  12   8
         24   6   4 ]

→ class 0: {y1}    class 1: {y2, y3}
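The energy-based classification rule can be sketched like this; for brevity, plain least squares stands in for the lasso sparse coder, and all names are ours.

```python
import numpy as np

def energy(y, D, a, lam=0.1):
    # E_i(y) = ||y - D_i a||_2^2 + lam * ||a||_1
    return np.sum((y - D @ a) ** 2) + lam * np.sum(np.abs(a))

def classify(Y, dictionaries, sparse_code, lam=0.1):
    """Label each test signal (column of Y) with the index of the
    lowest-energy dictionary. `sparse_code` stands in for the lasso solver."""
    return [int(np.argmin([energy(y, D, sparse_code(y, D), lam)
                           for D in dictionaries]))
            for y in Y.T]

# Toy run: least squares as a stand-in sparse coder, two 1-atom dictionaries.
lstsq = lambda y, D: np.linalg.lstsq(D, y, rcond=None)[0]
D0 = np.array([[1.0], [0.0]])
D1 = np.array([[0.0], [1.0]])
Y = np.array([[1.0, 0.0],
              [0.0, 1.0]])                   # y1 = (1,0), y2 = (0,1)
labels = classify(Y, [D0, D1], lstsq)
```

Here y1 is reconstructed cheaply by D0 and expensively by D1, so it lands in class 0, and symmetrically for y2.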

SLIDE 6

Supervised Dictionary Learning Results

Table: Supervised Results

Cluster Type | Centered & Normalized | Digits      | K   | Misclassification rate (average)
Supervised   | False                 | {0, ..., 4} | 500 | 1.3431%
Supervised   | True                  | {0, ..., 4} | 500 | 0.5449%
Supervised   | False                 | {0, ..., 4} | 800 | 0.7784%
Supervised   | True                  | {0, ..., 4} | 800 | 0.3892%
Supervised   | False                 | {0, ..., 9} | 800 | 3.1100%
Supervised   | True                  | {0, ..., 9} | 800 | 1.6800%

The error rate for digits {0, ..., 9} and K = 800 from [1] is 1.26%.


SLIDE 7

Supervised Results, ctd.

Here is a confusion matrix for the supervised, centered, normalized case with K = 800 and digits {0, . . . , 4}. Element c_ij of C is the number of times that an image of digit i was classified as digit j. The diagonal of C is (978, 1132, 1023, 1006, 980) out of (980, 1135, 1032, 1010, 982) test images per digit, so at most nine images of any digit were misclassified.

SLIDE 8

Unsupervised Dictionary Learning Algorithm

(Spectral Clustering)

Illustrating Example
Training image signals: x1, x2, x3, x4;  X = [x1 x2 x3 x4];  Classes: 0, 1

Train a single dictionary D from all the training images:

{x1, x2, x3, x4} → D = [atom1 | atom2 | atom3]

Reminder: the number of atoms in the dictionary is a parameter.

For each image, compute the optimal sparse representation α w.r.t. D, and collect these as the columns of A (rows indexed by atoms, columns by images):

A = [α1 α2 α3 α4]

Reminder: αj holds the coefficients of the linear combination of atoms in D that approximates xj.

SLIDE 9

Illustrating Example
Training image signals: x1, x2, x3, x4;  X = [x1 x2 x3 x4];  Classes: 0, 1

Construct a similarity matrix S1 = |A|^t |A|. Perform spectral clustering on the graph G1 = {X, S1}:

G1 = {X, S1} → (spectral clustering) → class 0: {x1, x4}    class 1: {x2, x3}

Use the signal clusters to train initial dictionaries:

class 0: {x1, x4} → D0    class 1: {x2, x3} → D1
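A minimal version of the spectral step, assuming the common Fiedler-vector bipartition of the normalized graph Laplacian (the slides do not specify which spectral-clustering variant was used, so this is one reasonable choice).

```python
import numpy as np

def spectral_bipartition(S):
    """Split n signals into two clusters from an n x n similarity matrix S:
    take the sign of the 2nd-smallest eigenvector (Fiedler vector) of the
    normalized graph Laplacian."""
    d = S.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(S)) - Dinv @ S @ Dinv       # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)

# Toy A: sparse codes of 4 signals over 3 atoms; x1, x4 share atoms, x2, x3 share atoms.
A = np.array([[1.0, 0.0, 0.1, 0.9],
              [0.1, 1.0, 0.8, 0.0],
              [0.0, 0.9, 1.0, 0.1]])
S = np.abs(A).T @ np.abs(A)                    # S1 = |A|^t |A|
labels = spectral_bipartition(S)
```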

SLIDE 10

Refining Dictionaries

Repeat:

  • 1. Classify the training images using the current dictionaries:

{x1, x2, x3, x4} → (classify with D0, D1) → class 0: {x4}    class 1: {x1, x2, x3}

  • 2. Use these classifications to train new, refined dictionaries for each cluster:

class 0: {x4} → D0,new    class 1: {x1, x2, x3} → D1,new
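The refinement loop can be sketched as follows. Here least squares stands in for the lasso solver and the "dictionary learner" is a toy that stacks a cluster's signals as atoms; both are our simplifications, only the alternation is the method's.

```python
import numpy as np

def sparse_code(x, D):
    # Stand-in for the lasso solver: plain least squares.
    return np.linalg.lstsq(D, x, rcond=None)[0]

def energy(x, D, a, lam=0.1):
    return np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

def refine(X, dictionaries, n_iter=5):
    """Alternate: classify each training signal by lowest energy, then
    'retrain' each dictionary by stacking its cluster's signals as atoms."""
    for _ in range(n_iter):
        clusters = [[] for _ in dictionaries]
        for x in X.T:
            E = [energy(x, D, sparse_code(x, D)) for D in dictionaries]
            clusters[int(np.argmin(E))].append(x)
        dictionaries = [np.column_stack(c) if c else D
                        for c, D in zip(clusters, dictionaries)]
    return dictionaries

# Toy run: two axis-aligned groups, seeded with 1-atom dictionaries.
D0 = np.array([[1.0], [0.0]])
D1 = np.array([[0.0], [1.0]])
X = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])
Ds = refine(X, [D0, D1], n_iter=3)
```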

SLIDE 11

Unsupervised Dictionary Learning Spectral Clustering Results

Table: Unsupervised Results (spectral clustering)

Cluster | Centered & Normalized | Digits      | K   | Misclassification rate (average)
Atoms   | True                  | {0, ..., 4} | 500 | 24.8881%
Atoms   | True                  | {0, ..., 4} | 800 | 27.1843%
Signals | True                  | {0, ..., 4} | 500 | 27.4372%
Signals | True                  | {0, ..., 4} | 800 | 29.5777%

Misclassification rate for digits {0, ..., 4} and K = 500 from [1] is 1.44%.


SLIDE 12

Why do our results differ from Sprechmann, et al.’s?

Hypothesis

The problem lies in the authors' choice of similarity measure, S = |A|^t |A|.

Possible Problem: Normalization (A’s cols not of constant l2 norm)

Illustration of Problem

  • Assume that A contains only positive entries. Then S = |A|^t |A| = A^t A, and the ij-th entry of S is ⟨α_i, α_j⟩ = ‖α_i‖ ‖α_j‖ cos∠(α_i, α_j).
  • As a result, nearly orthogonal vectors can appear "more similar" than co-linear ones whenever the product of their norms is large enough.
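A two-dimensional toy illustration of the norm issue (the numbers are our own, not from the experiments): a long, nearly orthogonal pair of codes outscores a co-linear unit-norm pair.

```python
import numpy as np

u = np.array([10.0, 0.5])
v = np.array([0.5, 10.0])      # angle(u, v) is close to 90 degrees
p = np.array([1.0, 0.0])
q = np.array([1.0, 0.0])       # co-linear, unit norm
assert u @ v > p @ q           # 10.0 > 1.0: the "dissimilar" pair scores higher
```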

SLIDE 13

Attempted Solution

We normalized each column of A before computing S. However, this caused little change in the results, because the l2 norms of A's columns are almost constant.

Figure : The normalized histogram of the l2 norm of the columns of A. Note: The l2 norms of most columns in A are in [3, 4].


SLIDE 14

Possible Problem: Absolute Value (use |A|t|A| instead of |AtA|)

Illustration of Problem

a1 = (1, −1)t and a2 = (1, 1)t, which are orthogonal vectors, become co-linear when the entries are replaced by their absolute values.

Attempted Solution

In the experiments, replacing |A|^t |A| with |A^t A| does not significantly improve the results, because among all the entries of A associated with the MNIST data, only ≈ 13.5% are negative.
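The slide's example, checked numerically: the orthogonal pair becomes maximally similar once absolute values are applied.

```python
import numpy as np

a1 = np.array([1.0, -1.0])
a2 = np.array([1.0, 1.0])
assert a1 @ a2 == 0.0                     # orthogonal: similarity 0
assert np.abs(a1) @ np.abs(a2) == 2.0     # after |.|: co-linear, maximal similarity
```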

SLIDE 15

In the last part of Sprechmann et al, the authors state:

“We observed that best results are obtained for all the experiments when the initial dictionaries in the learning stage are constructed by randomly selecting signals from the training set. If the size of the dictionary compared to the dimension of the data is small, [it] is better to first partition the dataset (using for example Euclidean k-means) in order to obtain a more representative sample.”

Algorithm

Use k-means to cluster the training signals, instead of spectral clustering.

SLIDE 16

Unsupervised Dictionary Learning (k-means)

Illustrating Example
Training image signals: x1, x2, x3, x4;  X = [x1 x2 x3 x4];  Classes: 0, 1

Use k-means to cluster the training images:

{x1, x2, x3, x4} → (k-means) → class 0: {x1, x4}    class 1: {x2, x3}

Note: the signals must be centered and normalized after clustering.

Use the clusters to train dictionaries:

class 0: {x1, x4} → D0    class 1: {x2, x3} → D1

The refinement process is the same as in the spectral clustering case.
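A minimal Lloyd's k-means over signal columns, as a sketch of the clustering step; any standard implementation (e.g. scipy.cluster.vq.kmeans2) would serve equally well.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on the columns of X (a minimal sketch)."""
    rng = np.random.default_rng(seed)
    centers = X[:, rng.choice(X.shape[1], k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every center to every signal: shape (k, n).
        d = ((X[:, None, :] - centers[:, :, None]) ** 2).sum(axis=0)
        labels = d.argmin(axis=0)
        for j in range(k):
            if np.any(labels == j):
                centers[:, j] = X[:, labels == j].mean(axis=1)
    return labels

# Toy run: two well-separated groups of column signals.
X = np.array([[0.0, 0.1, 5.0, 5.1],
              [0.0, 0.1, 5.0, 5.1]])
labels = kmeans(X, 2)
```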

SLIDE 17

Unsupervised Dictionary Learning K-means Results

Table: Unsupervised Results (k-means), without refinement

Cluster | Centered & Normalized | Digits    | K   | iter | Misclassification rate (average)
Signals | True                  | {0,...,4} | 500 | 2    | 7.1025%

Table: Unsupervised Results (k-means), with refinement

Cluster | Centered & Normalized | Digits    | K   | iter | Misclassification rate (average)
Signals | True                  | {0,...,4} | 500 | 2    | 1.1286%
Signals | True                  | {0,...,4} | 500 | 20   | 0.5449%

The misclassification rate for digits {0, ..., 4} and K = 500 from [1] is 1.44%.


SLIDE 18

Unsupervised Results, ctd.

Here is a confusion matrix for the k-means, unsupervised, centered, normalized case with K = 500, iter = 20, and digits {0, . . . , 4}. Element c_ij of C is the number of times that an image of digit i was classified as digit j. The diagonal of C is (979, 1129, 1023, 1006, 978) out of (980, 1135, 1032, 1010, 982) test images per digit, so at most nine images of any digit were misclassified.

SLIDE 19

Gaussian Noise Experiments

Classification of Noisy Images

Goal: analyze the robustness of our method with respect to noise in images.

Questions to consider:

  • How does adding noise affect misclassification rates?
  • How does centering and normalizing the data after adding noise affect the results?
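The noise-injection and preprocessing steps of the experiments can be sketched as follows (function names are ours):

```python
import numpy as np

def add_gaussian_noise(X, var, seed=0):
    """Add zero-mean Gaussian noise with the given variance to each signal."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(0.0, np.sqrt(var), X.shape)

def center_normalize(X):
    """Center each column and scale it to unit l2 norm: the preprocessing
    whose effect on robustness the experiments measure."""
    X = X - X.mean(axis=0)
    return X / np.linalg.norm(X, axis=0)

# Noisy versions of three blank 28x28 'images' (784-dim column signals).
noisy = add_gaussian_noise(np.zeros((784, 3)), var=0.5)
clean = center_normalize(noisy)
```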

SLIDE 20

Results: Classifying Pure Gaussian Noise

Classification rates for each digit, with noise variance 0.5 in both cases: dictionaries trained on centered and normalized data vs. dictionaries trained on data that is neither centered nor normalized.

SLIDE 21

MNIST Images with Gaussian Noise Added

SLIDE 22

Results: Adding Gaussian Noise to MNIST (noise in test images only)

With test noise variance 0.5:

Centered and Normalized: misclassification rate 8.21%
Not Centered and Not Normalized: misclassification rate 87.05%

SLIDE 23

Results: Adding Gaussian Noise to MNIST (same noise variance for test & training images)

Misclassification Rates

Noise variance                | 0.1    | 0.5    | 1.0
Centered & Normalized         | 3.2%   | 17.33% | 38.92%
Not Centered & Not Normalized | 11.08% | 57.56% | 75.87%

SLIDE 24

Conclusions:

  • Unable to reproduce the results reported by Sprechmann et al.
  • K-means produces better results than spectral clustering
  • Centering and normalizing the data consistently improves results
  • The method is robust to added noise
SLIDE 25

Thanks for your attention!

SLIDE 26

Unsupervised Dictionary Learning Algorithm 2 (with split initialization)

Step 1 Train a dictionary D0 ∈ R^(n×k) from all training images.
Step 2 Obtain the matrix A0 and the similarity matrix S0 = |A0||A0|^t as before.
Step 3 Apply spectral clustering to dictionary D0 to split its atoms into D1 and D2.
Step 4 Obtain matrices A1, S1, A2, and S2 for dictionaries D1 and D2.
Step 5 Apply spectral clustering to dictionaries D1 and D2 to split each into two clusters. Keep the segmentation that results in the lower energy, and return the other dictionary to its original form. There are now three dictionaries.
Step 6 Repeat this process so that after each iteration the number of dictionaries increases by one. Stop when the desired number of clusters is reached.
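The control flow of the split-initialization loop might look like the following; `train`, `codes`, `split`, and `energy_of` are stand-ins for the dictionary learner, sparse coder, atom-splitting step, and energy of Steps 1-6, and the toy versions below only exercise the loop.

```python
import numpy as np

def split_init(X, n_clusters, train, codes, split, energy):
    """Greedy split initialization (control flow only): repeatedly try
    splitting each dictionary and keep the lowest-energy configuration."""
    dicts = [train(X)]                                   # Step 1
    while len(dicts) < n_clusters:                       # Steps 3-6
        candidates = []
        for i, D in enumerate(dicts):
            D1, D2 = split(D, codes(X, D))               # try splitting dictionary i
            candidates.append(dicts[:i] + [D1, D2] + dicts[i + 1:])
        dicts = min(candidates, key=lambda ds: energy(X, ds))
    return dicts

# Toy stand-ins: atoms are the signals themselves, splitting halves a
# dictionary, and the 'energy' favors balanced splits.
train = lambda X: X
codes = lambda X, D: None
split = lambda D, A: (D[:, :D.shape[1] // 2], D[:, D.shape[1] // 2:])
energy_of = lambda X, ds: sum(d.shape[1] ** 2 for d in ds)
dicts = split_init(np.ones((2, 4)), 3, train, codes, split, energy_of)
```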

SLIDE 27

Semisupervised Dictionary Learning Algorithm

Step 1 Use training images, a percentage of which have known labels. The remaining percentage are randomly labeled.
Step 2 Train dictionaries {D0, . . . , DN−1} independently.
Step 3 Classify the training images using the current dictionaries. Then use these classifications to train new, refined dictionaries for each cluster.
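One reading of the "perturbation percent" in the results table, as a sketch (the exact relabeling scheme is not specified in the slides, so this is an assumption):

```python
import numpy as np

def perturb_labels(labels, percent, n_classes, seed=0):
    """Randomly relabel `percent`% of the training labels; the chosen entries
    get uniform random labels (which may coincide with the old label)."""
    rng = np.random.default_rng(seed)
    out = np.asarray(labels).copy()
    idx = rng.choice(len(out), int(len(out) * percent / 100), replace=False)
    out[idx] = rng.integers(0, n_classes, len(idx))
    return out

labels = perturb_labels(np.zeros(100, dtype=int), percent=40, n_classes=5)
```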

SLIDE 28

Semisupervised Dictionary Learning Results

Dictionaries for digits {0, . . . , 4}.

Cluster Type   | Centered & Normalized | K   | Perturb. percent | Misclassification rate (average)
Semisupervised | True                  | 200 | 40               | 0.9730%
Semisupervised | True                  | 200 | 70               | 0.8951%
Semisupervised | True                  | 200 | 90               | 1.4205%

SLIDE 29


Figure : Misclassification rate average per refinement iteration. Left to right: 40%, 70%, 90% perturbation.


Figure : Energy per refinement iteration. Left to right: 40%, 70%, 90% perturbation.

SLIDE 30


Figure : Misclassification rate average per refinement iteration. Left to right: 40%, 70%, 90% perturbation.


Figure : Number of training images whose classification changed per refinement iteration. Left to right: 40%, 70%, 90% perturbation.

SLIDE 31

Unsupervised - Atoms (change over refinement iterations)


Figure: Unsupervised dictionary learning (K = 500), spectral clustering of centered and normalized signals by atoms. Left to right: (1) energy, (2) misclassification rate average, (3) number of training images whose classification changed per refinement iteration.

SLIDE 32

Unsupervised - Signals (change over refinement iterations)


Figure: Unsupervised dictionary learning (K = 500), spectral clustering of centered and normalized signals by signals. Left to right: (1) energy, (2) misclassification rate average, (3) number of training images whose classification changed per refinement iteration.

SLIDE 33

Unsupervised - k-means (change over refinement iterations)


Figure: Unsupervised dictionary learning (K = 500), k-means clustering of centered and normalized signals. Left to right: (1) energy, (2) misclassification rate average, (3) number of training images whose classification changed per refinement iteration.

SLIDE 34

Unsupervised - k-means (change over refinement iterations)


Figure: Unsupervised dictionary learning (K = 500), k-means clustering of centered and normalized signals. Left: 2 dictionary learning refinements. Right: 20 dictionary learning refinements.