When Dictionary Learning Meets Classification Bufford, Teresa 1 Chen, - PowerPoint PPT Presentation

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material When Dictionary Learning Meets Classification Bufford, Teresa 1 Chen, Yuxin 2 Horning, Mitchell 3 Shee, Liberty 1 Mentor: Professor Yohann Tendero 1 UCLA 2 Dalhousie University 3 Harvey Mudd College August 9, 2013

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material Outline 1. The Method: Dictionary Learning 2. Experiments and Results 2.1 Supervised Dictionary Learning 2.2 Unsupervised Dictionary Learning 2.3 Robustness w.r.t. Noise 3. Conclusion

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material Dictionary Learning GOAL: Classify discrete image signals x ∈ R n . The Dictionary, D ∈ R n × K   α 1   | | . . x ≈ D α = atom 1 · · · atom K   .     | | α K • Each dictionary can be represented as a matrix, where each column is an atom ∈ R n , learned from a set of training data. • A signal x ∈ R n can be approximated by a linear combination of atoms in a dictionary, represented by α ∈ R K . • We seek a sparse signal representation: α ∈ R K � x − D α � 2 arg min 2 + λ � α � 1 . (1) 0 Sprechmann, et al. ”Dictionary learning and sparse coding for unsupervised clustering.” Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on. IEEE, 2010.

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material Supervised Dictionary Learning Algorithm Illustrating Example Training image signals: x 1 , x 2 , x 3 , x 4 X = [ x 1 x 2 x 3 x 4 ] Classes: 0 , 1 Group training images according to their labels: class 0 x 1 x 4 class 1 x 2 x 3 Use training images with label i to train dictionary i : class 0 x 1 x 4 − → D 0 class 1 − → x 2 x 3 D 1

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material Signal Classification Illustrating Example Test image signals: y 1 , y 2 , y 3 Y = [ y 1 y 2 y 3 ] Dictionaries: D 0 , D 1 Take one test image. Compute its optimal sparse representation α for each dictionary. Use α to compute energy: α � y 1 − D i α � 2 E i ( y 1 ) = min 2 + λ � α � 1 � E 0 ( y 1 ) � y 1 − − − − − − − − − − − − − − − − − − − − − − − − − → E 1 ( y 1 ) Do this for each test signal: � E 0 ( y 1 ) � E 0 ( y 2 ) E 0 ( y 3 ) E ( Y ) = E 1 ( y 1 ) E 1 ( y 2 ) E 1 ( y 3 ) Classify the test signal as class i ∗ = arg min E i ( y ). For example: class 0 y 1 � 5 � 12 8 E ( Y ) = − → 24 6 4 class 1 y 2 y 3

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material Supervised Dictionary Learning Results Table : Supervised Results Cluster Centered& Digits misclassification K Type Normalized rate average Supervised False { 0 , ..., 4 } 500 1.3431% Supervised True { 0 , ..., 4 } 500 0.5449% Supervised False { 0 , ..., 4 } 800 0.7784% Supervised True { 0 , ..., 4 } 800 0.3892% Supervised False { 0 , ..., 9 } 800 3.1100% Supervised True { 0 , ..., 9 } 800 1.6800% Error rate for digits { 0 , ..., 9 } and K = 800 from [1] is 1 . 26%. 1 Sprechmann, et al. ”Dictionary learning and sparse coding for unsupervised clustering.” Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on. IEEE, 2010.

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material Supervised Results, ctd. Here is a confusion matrix for the supervised, centered, normalized case with k = 800 and digits { 0 , . . . , 4 } . Element c ij ∈ C is the number of times that an image of digit i was classified as digit j . 0 1 2 3 4 0  978 0 1 1 0  1 0 1132 2 1 0     C = 2 5 2 1023 0 2     3 0 0 3 1006 1   4 1 0 1 0 980

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material Unsupervised Dictionary Learning Algorithm (Spectral Clustering) Illustrating Example Training image signals: x 1 , x 2 , x 3 , x 4 X = [ x 1 x 2 x 3 x 4 ] Classes: 0 , 1 Train a dictionary D from all training images:   | | | All x 1 x 2 − → D = atom 1 atom 2 atom 3   images x 3 x 4 | | | Reminder: The number of atoms in the dictionary is a parameter. For each image, compute optimal sparse representation α w.r.t. D : x 1 x 2 x 3 x 4   atom 1 | | | | A = atom 2 α 1 α 2 α 3 α 4   atom 3 | | | | Reminder: α is a linear combination of atoms in D .

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material Illustrating Example Training image signals: x 1 , x 2 , x 3 , x 4 X = [ x 1 x 2 x 3 x 4 ] Classes: 0 , 1 Construct a similarity matrix S 1 = | A | t | A | . Perform spectral clustering on the graph G 1 = { X , S 1 } : class 0 x 1 x 4 spectral G 1 = { X , S 1 } − − − − − − − − → .. clustering .. class 1 x 2 x 3 Use signal clusters to train initial dictionaries: class 0 x 1 x 4 − → D 0 class 1 x 2 x 3 − → D 1

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material Refining Dictionaries Repeat: 1. Classify the training images using current dictionaries: class 0 x 4 classify x 1 , x 2 , x 3 , x 4 − − − − − − − − − − → x 1 x 2 .. with D 0 , D 1 .. class 1 x 3 2. Use these classifications to train new, refined dictionaries for each cluster: class 0 x 4 − → D 0 , new x 1 x 2 class 1 − → D 1 , new x 3

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material Unsupervised Dictionary Learning Spectral Clustering Results Table : Unsupervised Results (spectral clustering) Cluster Centered & Digits misclassification K Normalized rate average Atoms True { 0 , ..., 4 } 500 24.8881% Atoms True { 0 , ..., 4 } 800 27.1843% Signals True { 0 , ..., 4 } 500 27.4372% Signals True { 0 , ..., 4 } 800 29.5777% Misclassification rate for digits { 0 , ..., 4 } and K = 500 from [1] is 1 . 44%. 1 Sprechmann, et al. ”Dictionary learning and sparse coding for unsupervised clustering.” Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on. IEEE, 2010.

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material Why do our results differ from Sprechmann, et al.’s? Hypothesis The problems lie in the author’s choice of similarity measures, S = | A | t | A | . Possible Problem: Normalization ( A ’s cols not of constant l 2 norm) Illustration of Problem • Assume that A contains only positive entries. Then S = | A | t | A | = A t A and the ij th entry of S is < α i , α j > = � α i �� α j � cos( α i , α j ) . • Nearly orthogonal vectors are more similar than co-linear ones if the products of their norms are big enough. “more similar”

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material Attempted Solution Normalized each column of A before computing S . However, this caused little change in the results, due to the l 2 norms of A ’s columns being almost constant. Figure : The normalized histogram of the l 2 norm of the columns of A. Note : The l 2 norms of most columns in A are in [3 , 4]. Histogram of the l2−norm of the columns of A (Normalized) 0.7 0.6 0.5 Emirical probability 0.4 0.3 0.2 0.1 0 2 3 4 5 6 7 8 9 10 11 Norm interval

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material Possible Problem: Absolute Value (use | A | t | A | instead of | A t A | ) Illustration of Problem a 1 = (1 , − 1) t and a 2 = (1 , 1) t , which are orthogonal vectors, become co-linear when the entries are replaced by their absolute values. Attempted Solution In the experiments, changing | A || A | t to | AA t | does not significantly improve the results, because among all the entries of A associated with MNIST data, only ≈ 13 . 5% are negative.

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material In the last part of Sprechmann et al, the authors state: “We observed that best results are obtained for all the experiments when the initial dictionaries in the learning stage are constructed by randomly selecting signals from the training set. If the size of the dictionary compared to the dimension of the data is small, [it] is better to first partition the dataset (using for example Euclidean k-means) in order to obtain a more representative sample.” Algorithm Use k -means to cluster the training signals, instead of spectral clustering.

When Dictionary Learning Meets Classification Bufford, Teresa 1 Chen, - PowerPoint PPT Presentation

The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material When Dictionary Learning Meets Classification Bufford, Teresa 1 Chen, Yuxin 2 Horning, Mitchell 3 Shee, Liberty 1 Mentor:

The Dictionary ADT The dictionary ADT models a searchable collection findElement(k): if the

Machine Learning Algorithms for Classification Machine Learning Algorithms for Classification

CMSC 206 Dictionaries and Hashing The Dictionary ADT n a dictionary (table) is an abstract

Sparse Coding and Dictionary Learning for Image Analysis Part II: Dictionary Learning for signal

Graph Classification Classification Outline Introduction, Overview Classification using

Classification of Symmetry Classification of Symmetry Classification of Symmetry Classification

Where eco-tourism meets business park.. Where economic opportunity meets people! Westmead

Dictionaries A Good morning dictionary English: Good morning Spanish: Buenas das

Hashing - Introduction Dictionary Dictionary = a dynamic set that supports the = a dynamic set

Dictionary lookup Suppose youre looking up a word in the dictionary (paper one, not

The dictionary problem. A dictionary can be seen as a database of records; in each record we

6. Dictionary models for text compression Previous techniques: Predictive, statistical One

Hash- Tables Introduction Dictionary Dictionary stores key-value pairs Find( k ) Insert( k

Agenda Announcements Dictionary please snarf code for class today

Dictionaries and Sets Ali Taheri Sharif University of Technology Spring 2019 Outline 1.

Test of Time Award Online Dictionary Learning for Sparse Coding Julien Mairal, Francis Bach, Jean

ArtsSemNet : From Bilingual Dictionary To Bilingual Semantic Network Ivanka Atanassova

embeddings with (almost) no bilingual data Mik ikel l Art rtetxe, Gorka Labaka, Eneko Agirre

THE PRESENTATION OF A PLANNED TRILINGUAL LOGISTICS TERMINOLOGY DICTIONARY FOR LEARNERS RENATA

Development The development of a national IBM Strategy Borut Eren Border Management and Visa

Logical Categories and Parts of Speech as Structuring Devices in Pollux Onomasticon CHS

Structural Differences Between Two Graphs through Hierarchies Daniel Archambault 1 1 GRAVITE,

Highland Park Public Schools Enrollment Projection and Facility Utilization Study March, 2017

Upper Saddle River Schools Enrollment Study December, 2016 Ross Haber and Associates Selected