

  1. ECG782: Multidimensional Digital Signal Processing
     Object Recognition
     http://www.ee.unlv.edu/~b1morris/ecg782/

  2. Outline
     • Knowledge Representation
     • Statistical Pattern Recognition
     • Neural Networks
     • Boosting

  3. Object Recognition
     • Pattern recognition is a fundamental component of machine vision
     • Recognition is high-level image analysis
       ▫ From the bottom-up perspective (pixels → objects)
       ▫ Many software packages exist to easily implement recognition algorithms (e.g., the Weka project, R packages)
     • The goal of object recognition is to "learn" characteristics that help distinguish objects of interest
       ▫ Most are binary problems

  4. Knowledge Representation
     • Syntax – specifies the symbols that may be used and the ways they may be arranged
     • Semantics – specifies how meaning is embodied in the syntax
     • Representation – a set of syntactic and semantic conventions used to describe things
     • The Sonka book focuses on artificial intelligence (AI) representations
       ▫ More closely related to human cognition modeling (e.g., how humans represent things)
       ▫ Not as popular in the vision community

  5. Descriptors/Features
     • Most common representation in vision
     • Descriptors (features) usually represent some scalar property of an object
       ▫ These are often combined into feature vectors
     • Numerical feature vectors are the inputs to statistical pattern recognition techniques
       ▫ A feature vector represents a point in feature space
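As a concrete illustration of scalar descriptors combined into a feature vector, here is a minimal Python/NumPy sketch; the particular properties (area, mean intensity, bounding-box aspect ratio) are assumed examples and are not taken from the slides:

```python
import numpy as np

def region_features(mask, image):
    """Combine a few scalar descriptors of a segmented region into a feature vector.

    mask  : boolean array marking the object's pixels
    image : grayscale image of the same shape
    """
    area = mask.sum()                      # number of object pixels
    mean_intensity = image[mask].mean()    # average gray level inside the region
    rows, cols = np.nonzero(mask)
    aspect_ratio = (cols.max() - cols.min() + 1) / (rows.max() - rows.min() + 1)
    # The resulting vector is a single point in a 3-D feature space
    return np.array([area, mean_intensity, aspect_ratio], dtype=float)
```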

  6. Statistical Pattern Recognition
     • Object recognition = pattern recognition
       ▫ Pattern – measurable properties of an object
     • Pattern recognition steps:
       ▫ Description – determine the right features for the task
       ▫ Classification – technique to separate the different object "classes"
     • Separable classes – a hyper-surface exists that perfectly distinguishes the objects
       ▫ Hyper-planes are used for linearly separable classes
       ▫ This is unlikely in real-world scenarios

  7. General Classification Principles
     • A statistical classifier takes an n-dimensional feature vector x describing an object and produces a single output
       ▫ The output is one of the R available class symbols (identifiers) ω_1, …, ω_R
     • Decision rule – describes the relation between the classifier inputs and the output
       ▫ d(x) = ω_r
       ▫ Divides the feature space into R disjoint subsets K_r
     • The discrimination hyper-surface is the border between subsets
     • Discrimination functions g_r
       ▫ g_r(x) ≥ g_s(x) for all s ≠ r  ⇒  x ∈ K_r
     • Discrimination hyper-surface between class regions
       ▫ g_r(x) − g_s(x) = 0
     • Decision rule in terms of the discrimination functions
       ▫ d(x) = ω_r ⇔ g_r(x) = max_{s=1,…,R} g_s(x)
       ▫ The subset (region) providing maximum discrimination wins
     • Linear discriminant functions are simple and often used in linear classifiers
       ▫ g_r(x) = q_r0 + q_r1 x_1 + ⋯ + q_rn x_n
     • Must use non-linear functions for more complex problems
       ▫ The trick is to transform the original feature space into a higher-dimensional space
       ▫ A linear classifier can then be used in the higher-dimensional space
       ▫ g_r(x) = q_r · Φ(x)
       ▫ Φ(x) – non-linear mapping to the higher-dimensional space
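A minimal sketch of the linear decision rule above, assuming the weight vectors q_r and offsets q_r0 have already been learned (the function name and toy numbers are illustrative):

```python
import numpy as np

def linear_classifier(x, Q, q0):
    """Evaluate the linear discrimination functions g_r(x) = q_r0 + q_r . x
    and apply the decision rule d(x): return the class with the maximum g_r."""
    g = Q @ x + q0            # g_1(x), ..., g_R(x); Q is (R, n), q0 is (R,)
    return int(np.argmax(g))  # index of the winning class region K_r

# Two classes in a 2-D feature space (illustrative numbers)
Q = np.array([[1.0, -0.5], [-1.0, 0.5]])
q0 = np.array([0.2, -0.2])
print(linear_classifier(np.array([0.8, 0.1]), Q, q0))  # -> 0
```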

  8. Nearest Neighbors
     • Classifier based on the minimum distance principle
     • The minimum distance classifier labels pattern x with the class of the closest exemplar
       ▫ d(x) = argmin_s |v_s − x|
       ▫ v_s – exemplar (sample pattern) for class ω_s
     • With a single exemplar per class, this results in a linear classifier
     • Nearest neighbor (NN) classifier
       ▫ Very simple classifier that uses multiple exemplars per class
       ▫ Takes the same label as the closest exemplar
     • k-NN classifier (see the sketch after this slide)
       ▫ More robust version: examine the k closest points and take the most frequently occurring label
     • Advantage: easy "training"
     • Problems: computational complexity
       ▫ Scales with the number of exemplars and dimensions
       ▫ Must do many comparisons
       ▫ Can improve performance with K-D trees
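A minimal k-NN sketch using the brute-force distance comparisons described on the slide (the function name and data layout are assumptions):

```python
import numpy as np
from collections import Counter

def knn_classify(x, exemplars, labels, k=3):
    """Label pattern x with the most common class among its k closest exemplars.

    x         : (n,)   query feature vector
    exemplars : (N, n) stored training patterns v_s
    labels    : (N,)   class label of each exemplar
    """
    dists = np.linalg.norm(exemplars - x, axis=1)  # distance to every exemplar
    nearest = np.argsort(dists)[:k]                # indices of the k closest
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]
```

With k = 1 this reduces to the NN classifier; for large exemplar sets the linear scan can be replaced by a K-D tree query (e.g., scipy.spatial.cKDTree), as the slide suggests.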

  9. Classifier Optimization
     • Discriminative classifiers are deterministic
       ▫ Pattern x is always mapped to the same class
     • Would like to have an optimal classifier
       ▫ The classifier that minimizes the classification error
     • Define a loss function to optimize based on the classifier parameters q
       ▫ J(q*) = min_q J(q), giving the decision rule d(x, q*) = ω
     • Mean loss
       ▫ J(q) = Σ_{s=1..R} ∫_X λ(d(x, q) | ω_s) p(x | ω_s) P(ω_s) dx
       ▫ λ(ω_r | ω_s) – loss incurred if the classifier incorrectly labels an object of class ω_s as ω_r
       ▫ P(ω_s) – prior probability of the class
       ▫ p(x | ω_s) – class-conditional probability density
     • Minimum error criterion (Bayes criterion, maximum likelihood) loss function
       ▫ λ(ω_r | ω_s) = 1 for r ≠ s
     • Discrimination function
       ▫ g_r(x) = p(x | ω_r) P(ω_r)
       ▫ Corresponds to the posterior probability P(ω_r | x)
     • The posterior probability describes how often pattern x is from class ω_r
       ▫ The optimal decision is to classify x into class ω_r if the posterior P(ω_r | x) is highest
       ▫ However, we do not know the posterior
     • Bayes theorem
       ▫ P(ω_s | x) = p(x | ω_s) P(ω_s) / p(x)
       ▫ Since p(x) is a constant and the prior P(ω_s) is known, we just need to maximize the likelihood p(x | ω_s)
     • This is desirable because the likelihood is something we can learn from training data
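A worked toy example of the minimum-error rule, assuming two classes with 1-D Gaussian class-conditional densities and known priors (all numbers are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

# Assumed toy setup: known class-conditional densities p(x|w_r) and priors P(w_r)
priors = np.array([0.7, 0.3])                  # P(w_1), P(w_2)
means, stds = np.array([0.0, 2.0]), np.array([1.0, 0.5])

def bayes_classify(x):
    """Pick the class maximizing g_r(x) = p(x|w_r) P(w_r); p(x) is a common
    factor in Bayes' theorem, so it can be ignored."""
    g = norm.pdf(x, means, stds) * priors
    return int(np.argmax(g))

print(bayes_classify(1.4))  # -> 1: class w_2 wins near its mean despite the smaller prior
```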

  10. Classifier Training
     • Supervised approach
       ▫ A training set with features and the associated class labels is given
       ▫ T = {(x_i, y_i)}
       ▫ Used to set the classifier parameters q
     • Learning methods should be inductive in order to generalize well
       ▫ Represent the entire feature space
       ▫ E.g., work even on unseen examples
     • Usually, larger datasets result in better generalization
       ▫ Some state-of-the-art classifiers use millions of examples
       ▫ Try to have enough samples to statistically cover the space
     • N-fold cross-validation/testing (see the sketch after this slide)
       ▫ Divide the training data into a train and a validation set
       ▫ Only train using the training data and check the results on the validation set
       ▫ Can be used for "bootstrapping" or to select the best parameters after partitioning the data N times
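A minimal N-fold partitioning sketch matching the procedure above (the helper name and the use of NumPy are assumptions):

```python
import numpy as np

def n_fold_indices(num_samples, n_folds=5, seed=0):
    """Partition sample indices into N folds; each fold serves once as the
    validation set while the remaining folds form the training set."""
    order = np.random.default_rng(seed).permutation(num_samples)
    folds = np.array_split(order, n_folds)
    for i in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, folds[i]
```

For parameter selection, train on each train_idx, score on the matching validation fold, average the N scores per candidate setting, and keep the best one.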

  11. Classifier Learning
     • Probability density estimation
       ▫ Estimate the probability densities p(x | ω_r) and priors P(ω_r)
     • Parametric learning (see the sketch after this slide)
       ▫ Typically, the shape of the distribution p(x | ω_r) is known but its parameters must be learned
         (e.g., a Gaussian mixture model)
       ▫ Prefer a distribution family that can be efficiently estimated, such as Gaussians
       ▫ Prior estimation by relative frequency: P(ω_r) = K_r / K
         (the number of objects in class r over the total number of objects in the training database)
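A minimal sketch of parametric learning with a single Gaussian per class and priors estimated by relative frequency (a simpler stand-in for the Gaussian mixture mentioned above; the helper name is illustrative):

```python
import numpy as np

def fit_gaussian_classes(X, y):
    """Estimate per-class Gaussian parameters and priors from labeled data.

    X : (N, n) training feature vectors
    y : (N,)   integer class labels
    Returns {class: (prior, mean, covariance)}.
    """
    params = {}
    for r in np.unique(y):
        Xr = X[y == r]
        prior = len(Xr) / len(X)         # P(w_r) = K_r / K (relative frequency)
        params[int(r)] = (prior, Xr.mean(axis=0), np.cov(Xr, rowvar=False))
    return params
```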

  12. Support Vector Machines (SVM)
     • Perhaps the most popular classifier in computer vision today
     • The SVM is an optimal classifier for the separable two-class problem
       ▫ Maximizes the margin (separation) between the two classes → generalizes well and avoids overfitting
       ▫ Relaxed constraints handle non-separable classes
       ▫ Can use the kernel trick to provide non-linear separating hyper-surfaces
     • Support vectors – the vectors from each class that are closest to the discriminating surface
       ▫ They define the margin
     • Rather than explicitly modeling the likelihood, search directly for the discrimination function
       ▫ Don't waste time modeling densities when the class label is all we need

  13. SVM Insight
     • The SVM is designed for binary classification of linearly separable classes
     • Input x is n-dimensional (scaled to [0,1] to normalize) with class label ω ∈ {−1, +1}
     • Discrimination between the classes is defined by a hyperplane such that no training samples are misclassified
       ▫ w · x + b = 0
       ▫ w – plane normal, b – offset
       ▫ The optimization finds the "best" separating hyperplane
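A minimal sketch of classifying with a learned hyperplane; w and b are assumed to come from the SVM optimization:

```python
import numpy as np

def svm_decision(x, w, b):
    """Return the class label in {-1, +1} from the sign of w . x + b.
    Points with w . x + b = 0 lie exactly on the separating hyperplane."""
    return 1 if np.dot(w, x) + b >= 0 else -1
```

The optimization itself chooses w and b so that the margin between the closest training samples of the two classes is as large as possible.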

  14. SVM Power
     • Final discrimination function
       ▫ f(x) = w · x + b
     • Re-written using the training data
       ▫ f(x) = Σ_{i ∈ SV} α_i ω_i (x_i · x) + b
       ▫ α_i – weight of support vector i
       ▫ Only the support vectors need to be kept for classification
     • Kernel trick (see the sketch after this slide)
       ▫ Replace the inner product x_i · x with a non-linear kernel
       ▫ k(x_i, x) = Φ(x_i) · Φ(x)
       ▫ For specific kernels this can be computed efficiently without ever evaluating the mapping Φ
       ▫ Can even map into an infinite-dimensional space
       ▫ Allows linear separation in a higher-dimensional space
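A minimal usage sketch of a kernel SVM on made-up data; scikit-learn's SVC (which wraps LibSVM, one of the packages listed on the next slide) is used here as an assumed stand-in, with the RBF kernel playing the role of Φ(x_i) · Φ(x) without ever computing Φ:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)  # not linearly separable in 2-D

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(clf.n_support_)             # support vectors kept per class; other samples can be discarded
print(clf.predict([[0.1, 0.2]]))  # a point well inside the circle; expect class -1
```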

  15. SVM Resources
     • A more detailed treatment can be found in
       ▫ Duda, Hart, Stork, "Pattern Classification"
     • Lecture notes from Nuno Vasconcelos (UCSD)
       ▫ http://www.svcl.ucsd.edu/courses/ece271B-F09/handouts/SVMs.pdf
     • SVM software
       ▫ LibSVM [link]
       ▫ SVMLight [link]

  16. Cluster Analysis
     • Unsupervised learning method that does not require labeled training data
     • Divide the training set into subsets (clusters) based on the mutual similarity of the subset elements
       ▫ Similar objects go into a single cluster, dissimilar objects into separate clusters
     • Clustering can be performed hierarchically or non-hierarchically
     • Hierarchical clustering
       ▫ Agglomerative – each sample starts as its own cluster and clusters are merged
       ▫ Divisive – the whole dataset starts as a single cluster and is divided
     • Non-hierarchical clustering
       ▫ Parametric approaches – assume a known class-conditional distribution (similar to classifier learning)
       ▫ Non-parametric approaches – avoid a strict definition of the distribution
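A minimal sketch of k-means, a common non-hierarchical clustering method (not named on the slide), which alternates assigning samples to their most similar cluster center and re-estimating the centers:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Bare-bones k-means: assign each sample to the nearest center, then
    recompute each center as the mean of its assigned samples."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```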
