(B) Distance-based systems
Michael Biehl
Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen
www.cs.rug.nl/biehl
IAC Winter School 2018, La Laguna
Introduction:
- supervised learning, classification, regression
- machine learning "vs." statistical modeling
Early (important!) systems
- linear threshold classifier, Rosenblatt's Perceptron
- adaptive linear neuron, Widrow and Hoff's Adaline
From Perceptron to Support Vector Machine
- large margin classification
- beyond linear separability
Distance-based systems
- prototypes: K-means and Vector Quantization
- from K-Nearest Neighbors to Learning Vector Quantization
- adaptive distance measures and relevance learning
Overview
Basic concepts of similarity / distance based classification
- prototype based systems: Vector Quantization, K-means
- (K) Nearest Neighbor classifier
- Learning Vector Quantization (LVQ)
Distance measures and Relevance Learning
- predefined distances, e.g. divergence based LVQ
- adaptive distances, e.g. Matrix Relevance LVQ
Distance-based classification
distance-based classifiers
a simple distance-based system: the (K)NN classifier
- store a set of labeled examples
- classify a query according to the label of the Nearest Neighbor (or the majority of the K Nearest Neighbors)
- piece-wise linear decision boundaries according to (e.g.) Euclidean distance from all examples
[figure: query point '?' among labeled examples in the N-dim. feature space]
+ conceptually simple
+ no training phase
+ only one parameter (K)
- expensive (storage, computation)
- sensitive to mislabeled data
- overly complex decision boundaries
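As a sketch of the rule described above, a minimal K-NN classifier in numpy (function and variable names are ours, not from the slides):

```python
import numpy as np

def knn_classify(X_train, y_train, x_query, K=3):
    """Assign x_query the majority label among its K nearest
    training examples w.r.t. squared Euclidean distance."""
    d = np.sum((X_train - x_query) ** 2, axis=1)   # distances to all stored examples
    nearest = np.argsort(d)[:K]                    # indices of the K closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]               # majority vote of the K NN
```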
prototype-based classification
- represent the data by one or several prototypes per class
- classify a query according to the label of the nearest prototype (or alternative schemes)
- local decision boundaries acc. to (e.g.) Euclidean distances
+ robust, low storage needs, little computational effort
+ parameterization in feature space, interpretability
- model selection: number of prototypes per class, etc.
- requires training: placement of prototypes in feature space
[figure: prototypes and piece-wise linear decision boundaries in the N-dim. feature space]

Learning Vector Quantization [Kohonen]
Nearest Prototype Classifier
- set of prototypes carrying class labels, based on a dissimilarity/distance measure d(w, x)
- nearest prototype classifier (NPC): given x, determine the winner w_J with d(w_J, x) ≤ d(w_j, x) for all j
- assign x to the class of the winner w_J
most prominent example: (squared) Euclidean distance d(w, x) = (w − x)^2
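A minimal sketch of the NPC decision rule in numpy (names are ours):

```python
import numpy as np

def npc_classify(prototypes, proto_labels, x):
    """Nearest Prototype Classifier: label of the prototype
    closest to x w.r.t. squared Euclidean distance."""
    d = np.sum((prototypes - x) ** 2, axis=1)  # d(w_j, x) for all prototypes
    return proto_labels[np.argmin(d)]          # class of the winner w_J
```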
reasonable requirements:
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
- initialize prototype vectors for the different classes
competitive learning: LVQ1 [Kohonen]
- present a single example
- identify the winner (closest prototype)
- move the winner closer towards the data (same class)
- move the winner away from the data (different class)
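One LVQ1 training sweep as a numpy sketch (eta denotes the learning rate; names are ours):

```python
import numpy as np

def lvq1_epoch(prototypes, proto_labels, X, y, eta=0.05):
    """One sweep of LVQ1: for each example move the winning prototype
    towards it (same class) or away from it (different class)."""
    for x, label in zip(X, y):
        d = np.sum((prototypes - x) ** 2, axis=1)  # squared Euclidean distances
        J = np.argmin(d)                           # the winner takes it all
        sign = 1.0 if proto_labels[J] == label else -1.0
        prototypes[J] += eta * sign * (x - prototypes[J])
    return prototypes
```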
Learning Vector Quantization
N-dimensional data, feature vectors
∙ tessellation of feature space [piece-wise linear]
∙ distance-based classification [here: Euclidean distances]
∙ generalization ability: correct classification of new data
∙ aim: discrimination of classes (≠ vector quantization or density estimation)
LVQ1
iterative training procedure:
- randomized initialization of the prototypes, e.g. close to the class-conditional means
- sequential presentation of labelled examples … the winner takes it all
- step size controlled by a learning rate
many heuristic variants/modifications:
- learning rate schedules η_w(t)
- update more than one prototype per step
LVQ1 update step:
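In the standard formulation, the LVQ1 update of the winner w_J for a presented example (x, y) reads

w_J ← w_J + η_w ψ(c(w_J), y) (x − w_J),  with ψ = +1 if c(w_J) = y and ψ = −1 otherwise.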
LVQ1-like update for a generalized distance:
- additional requirement: the update decreases (increases) the distance if the classes coincide (are different)
remark: the curse of dimension?
concentration of distances for large N:
"distance based methods are bound to fail in high dimensions" ???
LVQ:
- prototypes are not just random data points, but carefully selected low-noise representatives of the data
- distances of a given data point to the prototypes are compared:
  a projection to a non-trivial low-dimensional subspace!
cost function based LVQ
one example: the Generalized LVQ (GLVQ) cost function [Sato & Yamada, 1995]
two winning prototypes per example: the closest correct prototype w_J (distance d_J) and the closest incorrect prototype w_K (distance d_K):

E = Σ_m e(x_m)  with  e(x_m) = φ( [d(w_J, x_m) − d(w_K, x_m)] / [d(w_J, x_m) + d(w_K, x_m)] )

minimizing E favors
- a small number of misclassifications, e.g. with large margins between classes
- small d_J, large d_K
- class-typical prototypes
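A direct numpy transcription of this cost function (a sketch: φ is taken as the identity, one common choice, and proto_labels is assumed to be a numpy array of class labels):

```python
import numpy as np

def glvq_cost(prototypes, proto_labels, X, y):
    """GLVQ cost: sum over examples of (d_J - d_K) / (d_J + d_K),
    with d_J the distance to the closest correct prototype and
    d_K the distance to the closest wrong one."""
    E = 0.0
    for x, label in zip(X, y):
        d = np.sum((prototypes - x) ** 2, axis=1)
        same = proto_labels == label
        dJ = np.min(d[same])        # closest prototype of the correct class
        dK = np.min(d[~same])       # closest prototype of any other class
        E += (dJ - dK) / (dJ + dK)  # phi taken as the identity here
    return E
```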
GLVQ
training = optimization with respect to the prototype positions, e.g.
- single example presentation, stochastic sequence of examples
- update of two prototypes per step
- based on a non-negative, differentiable distance
the update moves the two prototypes towards / away from the sample, with prefactors derived from the cost function
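A sketch of one such stochastic GLVQ step (φ as the identity, squared Euclidean distance; the prefactors follow from the chain rule; names are ours):

```python
import numpy as np

def glvq_update(prototypes, proto_labels, x, label, eta=0.01):
    """Single-example GLVQ step: update the two winning prototypes,
    w_J (correct class) towards x and w_K (wrong class) away from x,
    with prefactors derived from the cost term (d_J - d_K)/(d_J + d_K)."""
    d = np.sum((prototypes - x) ** 2, axis=1)
    same = proto_labels == label
    J = np.flatnonzero(same)[np.argmin(d[same])]    # closest correct prototype
    K = np.flatnonzero(~same)[np.argmin(d[~same])]  # closest wrong prototype
    dJ, dK = d[J], d[K]
    s = (dJ + dK) ** 2
    prototypes[J] += eta * (2 * dK / s) * 2 * (x - prototypes[J])  # attract w_J
    prototypes[K] -= eta * (2 * dJ / s) * 2 * (x - prototypes[K])  # repel w_K
    return prototypes
```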
Alternative distance measures
fixed, pre-defined distance measures: heuristic LVQ1 and GLVQ (or more general cost function based LVQ) can be based on general, differentiable distances, e.g. the Minkowski measures

d_p(w, x) = ( Σ_j |w_j − x_j|^p )^{1/p}

possible work-flow:
- select several distance measures according to prior knowledge, or make a data-driven choice in a preprocessing step
- compare the performance of the various measures
examples: kernelized distances, divergences (statistics)
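The Minkowski measure as a short numpy sketch:

```python
import numpy as np

def minkowski_distance(w, x, p=2):
    """Minkowski distance d_p(w, x); p=2 recovers the Euclidean,
    p=1 the Manhattan distance."""
    return np.sum(np.abs(w - x) ** p) ** (1.0 / p)
```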
Kernelized distances
rewrite the squared Euclidean distance in terms of the dot product:

d(w, x) = w^2 − 2 w·x + x^2

analogous: a distance measure associated with a general inner product or kernel function:

d_K(w, x) = K(w, w) − 2 K(w, x) + K(x, x)

e.g. the Gaussian kernel

K(x, y) = exp( −(x − y)^2 / (2σ^2) )  with kernel width σ

implicit mapping to a high-dimensional space for better separability of classes; similar: Support Vector Machine
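A sketch of the kernelized distance with the Gaussian kernel (names are ours):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel K(x, y) with kernel width sigma."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def kernel_distance(w, x, kernel=gaussian_kernel):
    """Distance associated with a general kernel:
    d_K(w, x) = K(w, w) - 2 K(w, x) + K(x, x)."""
    return kernel(w, w) - 2.0 * kernel(w, x) + kernel(x, x)
```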
Biehl, Hammer, Villmann: Distance measures for prototype-based classification (2014); Prototype-based models in machine learning (2016)
Relevance Learning
elegant approach: Relevance Learning / adaptive distances
- employ a parameterized distance measure with only the mathematical form fixed in advance
- optimize its parameters in the training process
- adaptive, data-driven dissimilarity
example: Matrix Relevance LVQ
- data-driven optimization of prototypes and relevance matrix in the same training process (≠ pre-processing)
GMLVQ
Generalized Matrix Relevance LVQ: a generalized quadratic distance in LVQ [Schneider, Biehl, Hammer, 2009]:

d(w, x) = (w − x)^T Λ (w − x) = [ Ω (w − x) ]^2  with  Λ = Ω^T Ω

variants:
- one global, several local, or class-wise relevance matrices
- rectangular Ω: low-dim. representation / visualization [Bunte et al., 2012]
- diagonal matrices: single feature weights [Hammer et al., 2002]
training: adaptation of prototypes and distance measure guided by the GLVQ cost function
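The generalized quadratic distance as a numpy sketch (Omega may be square or rectangular; Lambda = Omega^T Omega is positive semi-definite by construction):

```python
import numpy as np

def gmlvq_distance(w, x, Omega):
    """Generalized quadratic distance
    d(w, x) = (w - x)^T Lambda (w - x) = ||Omega (w - x)||^2."""
    diff = Omega @ (w - x)
    return float(diff @ diff)
```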
Relevance Matrix: interpretation
Λ_ii summarizes the contribution of a single dimension, i.e. the relevance of original feature i in the classifier
Λ_ij quantifies the contribution of the pair of features (i, j) to the distance
Note: this interpretation implicitly assumes that the features are of equal order of magnitude, e.g. after z-score transformation to zero mean and unit variance (averages over the data set)
after training: prototypes represent typical class properties or subtypes
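A minimal sketch of the z-score transformation mentioned in the note (averages over the data set X):

```python
import numpy as np

def z_score(X):
    """Rescale every feature to zero mean and unit variance,
    so that the relevances Lambda_ii become comparable."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```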
But this is just Mahalanobis distance…

d_M(x, y) = √[ (x − y)^T S^{-1} (x − y) ],  x, y ∈ R^N

[Mahalanobis, 1936] with S the covariance matrix of the random vectors, calculated once from the data: a fixed definition, not adaptive.

So it is a generalized Mahalanobis distance (in a 'two point version'), if you insist… No: about as meaningful as calling E = ℏω "a generalization of Ohm's law", or Λ "a generalized broccoli".
Relevance Matrix LVQ
Generalized Matrix LVQ (GMLVQ):
- optimization of prototypes and distance measure
- gradient terms for a single example x_m
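Sketching the gradient terms via the chain rule applied to the cost term e(x_m) above (our notation, with d_J = d(w_J, x_m), d_K = d(w_K, x_m)):

∂e/∂w_J = φ′ · [ 2 d_K / (d_J + d_K)^2 ] · ∂d(w_J, x_m)/∂w_J
∂e/∂w_K = − φ′ · [ 2 d_J / (d_J + d_K)^2 ] · ∂d(w_K, x_m)/∂w_K

and, for the generalized quadratic distance,

∂d(w, x)/∂w = −2 Λ (x − w),  ∂d(w, x)/∂Ω = 2 Ω (w − x)(w − x)^T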
Relevance Matrix LVQ
optimization of prototype positions and distance measure(s) in one training process (≠ pre-processing)
motivation:
improved performance
- weighting of features and pairs of features
simplified classification schemes
- elimination of non-informative, noisy features
- discriminative low-dimensional representation
insight into the data / classification problem
- identification of the most discriminative features
- incorporation of prior knowledge (e.g. structure of Ω)
Iris flower data
[figure: GMLVQ prototypes and relevance matrix for the Iris data]
Relevance Matrix LVQ
empirical observation / theory: the relevance matrix becomes singular, dominated by very few eigenvectors
- prevents over-fitting in high-dim. feature spaces
- facilitates discriminative visualization of data sets
- confirms: Setosa is well-separated from Virginica / Versicolor
Relevance Matrix LVQ
[figure: Iris data projected on the first and second eigenvector of the relevance matrix]
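Such a visualization can be obtained from a trained relevance matrix as follows (a sketch, assuming Lambda from a GMLVQ run):

```python
import numpy as np

def project_on_leading_eigenvectors(X, Lambda, k=2):
    """Project the data on the k leading eigenvectors of the
    (symmetric, positive semi-definite) relevance matrix Lambda."""
    eigvals, eigvecs = np.linalg.eigh(Lambda)  # eigenvalues in ascending order
    leading = eigvecs[:, ::-1][:, :k]          # k largest eigenvalues first
    return X @ leading
```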
a multi-class example
classification of coffee samples based on hyperspectral data (256-dim. feature vectors) [U. Seiffert et al., IFF Magdeburg]
[figure: prototypes]
related schemes
Relevance LVQ variants:
- local, rectangular, structured, restricted ... relevance matrices
- relevance matrices for visualization, functional data, texture recognition, etc.
- relevance learning in Robust Soft LVQ, Supervised NG, etc.
- combination of distances for mixed data ...
Relevance Learning related schemes in supervised learning:
- RBF Networks [Backhaus et al., 2012]
- Neighborhood Component Analysis [Goldberger et al., 2005]
- Large Margin Nearest Neighbor [Weinberger et al., 2006, 2010]
- and many more!
- Linear Discriminant Analysis (LDA): ~ one prototype per class + a global matrix, but a different objective function!
links
Matlab code:
- Relevance and Matrix adaptation in Learning Vector Quantization (GRLVQ, GMLVQ and LiRaM LVQ) [K. Bunte]: http://matlabserver.cs.rug.nl/gmlvqweb/web/
- A no-nonsense beginners' tool for GMLVQ: http://www.cs.rug.nl/~biehl/gmlvq
Python code:
- A Scikit-Learn compatible collection of Python code for LVQ and variants, including GMLVQ [B. Paaßen et al., CITEC Bielefeld]:
  https://techfak.uni-bielefeld.de/~bpaassen/glvq.zip
  https://github.com/MrNuggelz/sklearn-glvq
Related pre- and re-prints etc.: http://www.cs.rug.nl/~biehl/
References
- M. Biehl, B. Hammer, T. Villmann. Prototype-based models in machine learning. WIREs Cognitive Science, 2016. Advanced review, 20 pages.
- M. Biehl, B. Hammer, T. Villmann. Distance measures for prototype based classification. In: L. Grandinetti, T. Lippert, N. Petkov (eds.), Brain-Inspired Computing. Lecture Notes in Computer Science. Springer International Publishing, pp. 100-116, 2014.
- M. Biehl, B. Hammer, T. Villmann. Prototype-based models for the supervised learning of classification. In: M. Brescia, S.G. Djogovski, E. Feigelson, G. Longo & S. Cavuoti (eds.), Proceedings IAU Symposium No. 325, 2016.