(B) Distance-based systems


SLIDE 1

Michael Biehl
Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen
www.cs.rug.nl/biehl

(B) Distance-based systems

IAC Winter School 2018, La Laguna

SLIDE 2
Introduction:

  • supervised learning, classification, regression
  • machine learning “vs.” statistical modeling

Early (important!) systems

  • linear threshold classifier, Rosenblatt’s Perceptron
  • adaptive linear neuron, Widrow and Hoff’s Adaline

From Perceptron to Support Vector Machine

  • large margin classification
  • beyond linear separability

Distance-based systems

  • prototypes: K-means and Vector Quantization
  • from K-Nearest Neighbors to Learning Vector Quantization
  • adaptive distance measures and relevance learning
SLIDE 3

Overview

Basic concepts of similarity / distance based classification:
  • prototype based systems: Vector Quantization, K-means
  • (K) Nearest Neighbor classifier
  • Learning Vector Quantization (LVQ)

Distance measures and Relevance Learning:
  • predefined distances, e.g. divergence based LVQ
  • adaptive distances, e.g. Matrix Relevance LVQ

SLIDE 4

Distance-based classification

SLIDE 5
distance-based classifiers

a simple distance-based system: the (K)NN classifier

  • store a set of labeled examples
  • classify a query according to the label of the Nearest Neighbor (or the majority of the K Nearest Neighbors)
  • piece-wise linear decision boundaries according to (e.g.) Euclidean distance from all examples

[figure: query point “?” among labeled examples in the N-dim. feature space]

+ conceptually simple
+ no training phase
+ only one parameter (K)

− expensive (storage, computation)
− sensitive to mislabeled data
− overly complex decision boundaries

(a minimal code sketch of the rule follows below)
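As announced, a minimal NumPy sketch of the (K)NN rule just described; the function name and arguments (knn_classify, X_train, y_train, x_query, K) are illustrative, not from the slides.

```python
import numpy as np

def knn_classify(X_train, y_train, x_query, K=3):
    """(K)NN rule: label a query by the majority vote of its K
    nearest stored examples under Euclidean distance."""
    # squared Euclidean distances from the query to all stored examples
    d2 = np.sum((X_train - x_query) ** 2, axis=1)
    # labels of the K closest examples
    nearest = y_train[np.argsort(d2)[:K]]
    # majority vote over the K neighbor labels
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]
```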
SLIDE 6

prototype-based classification

  • represent the data by one or several prototypes per class
  • classify a query according to the label of the nearest prototype (or alternative schemes)
  • local decision boundaries acc. to (e.g.) Euclidean distances

[figure: prototypes and query point “?” in the N-dim. feature space]

+ robust, low storage needs, little computational effort
+ parameterization in feature space, interpretability

− model selection: number of prototypes per class, etc.
− requires training: placement of prototypes in feature space

Learning Vector Quantization [Kohonen]

SLIDE 7

Nearest Prototype Classifier

set of prototypes carrying class labels: $\{(\mathbf{w}_j, c_j)\}_{j=1}^{M}$, based on a dissimilarity/distance measure $d(\mathbf{w}, \mathbf{x})$

nearest prototype classifier (NPC): given $\mathbf{x}$, determine the winner $\mathbf{w}_L$ with $d(\mathbf{w}_L, \mathbf{x}) \leq d(\mathbf{w}_j, \mathbf{x})$ for all $j$

  • assign $\mathbf{x}$ to the class $c_L$ of the winner

most prominent example: (squared) Euclidean distance $d(\mathbf{w}, \mathbf{x}) = (\mathbf{w} - \mathbf{x})^2$

reasonable requirements: $d(\mathbf{w}, \mathbf{x}) \geq 0$ and $d(\mathbf{w}, \mathbf{w}) = 0$

(a code sketch of the decision rule follows below)
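A minimal sketch of the NPC decision rule, assuming the squared Euclidean distance named above; the names (npc_classify, W, c) are illustrative.

```python
import numpy as np

def npc_classify(W, c, x):
    """Nearest Prototype Classifier: rows of W are prototypes,
    c holds their class labels; x gets the label of the winner,
    i.e. the prototype with minimal squared Euclidean distance."""
    d = np.sum((W - x) ** 2, axis=1)  # d(w_j, x) for all prototypes
    return c[np.argmin(d)]            # class label c_L of the winner
```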

SLIDE 8

Learning Vector Quantization

∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)

N-dimensional data, feature vectors

competitive learning: LVQ1 [Kohonen]
  • initialize prototype vectors for the different classes
  • present a single example
  • identify the winner (closest prototype)
  • move the winner closer towards the data (same class) or away from the data (different class)

SLIDE 9

Learning Vector Quantization

∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)

N-dimensional data, feature vectors
∙ tesselation of feature space [piece-wise linear]
∙ distance-based classification [here: Euclidean distances]
∙ generalization ability: correct classification of new data
∙ aim: discrimination of classes (≠ vector quantization or density estimation)
SLIDE 10

LVQ1

iterative training procedure: randomized initialization of prototypes, e.g. close to the class-conditional means

sequential presentation of labelled examples … the winner takes it all

LVQ1 update step (with learning rate $\eta$): $\mathbf{w}_L \leftarrow \mathbf{w}_L + \eta\,\psi(c_L, y)\,(\mathbf{x} - \mathbf{w}_L)$, where $\psi = +1$ if the winner's label $c_L$ agrees with the class $y$ of the example and $\psi = -1$ otherwise

many heuristic variants/modifications:
  • learning rate schedules $\eta_w(t)$
  • update more than one prototype per step

(a code sketch of the update step follows below)
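A sketch of a single LVQ1 step as stated above (winner-takes-all attraction/repulsion); the in-place NumPy form and the names (lvq1_step, W, c) are illustrative.

```python
import numpy as np

def lvq1_step(W, c, x, y, eta=0.05):
    """One LVQ1 update: the winner moves towards the example if
    the class labels agree and away from it otherwise."""
    d = np.sum((W - x) ** 2, axis=1)   # squared Euclidean distances
    L = np.argmin(d)                   # the winner takes it all
    psi = 1.0 if c[L] == y else -1.0   # +1: same class, -1: different
    W[L] += eta * psi * (x - W[L])     # attraction / repulsion
    return W
```

In a full training run this step is applied to a randomized sequence of labelled examples, typically with a decreasing learning rate schedule η(t) as noted above.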

SLIDE 11

LVQ1

LVQ1-like update step for a generalized distance: $\mathbf{w}_L \leftarrow \mathbf{w}_L - \eta\,\psi(c_L, y)\,\frac{\partial d(\mathbf{w}_L, \mathbf{x})}{\partial \mathbf{w}_L}$

  • addtl. requirement: the update decreases (increases) the distance if the classes coincide (are different)

SLIDE 12

remark: the curse of dimension?

concentration of distances for large N: “distance based methods are bound to fail in high dimensions” ???

LVQ:
  • prototypes are not just random data points: they are carefully selected low-noise representatives of the data
  • distances of a given data point to the prototypes are compared: a projection to a non-trivial low-dimensional subspace!

SLIDE 13

cost function based LVQ

one example: the Generalized LVQ (GLVQ) cost function [Sato & Yamada, 1995]

two winning prototypes per example: $\mathbf{w}_J$, the closest prototype with the correct class label, and $\mathbf{w}_K$, the closest prototype with a different class label; minimize

$$E = \sum_m e(\mathbf{x}_m) \quad \text{with} \quad e(\mathbf{x}_m) = \varphi\!\left(\frac{d(\mathbf{w}_J, \mathbf{x}_m) - d(\mathbf{w}_K, \mathbf{x}_m)}{d(\mathbf{w}_J, \mathbf{x}_m) + d(\mathbf{w}_K, \mathbf{x}_m)}\right)$$

minimizing E favors
  • a small number of misclassifications, e.g. with large margins between classes
  • small $d(\mathbf{w}_J, \mathbf{x}_m)$ and large $d(\mathbf{w}_K, \mathbf{x}_m)$, i.e. class-typical prototypes

SLIDE 14-16

GLVQ

training = optimization with respect to the prototype positions: e.g. single example presentation, stochastic sequence of examples, update of two prototypes per step; based on a non-negative, differentiable distance

the update moves the two winning prototypes towards / away from the sample, with prefactors derived from the cost function (see the sketch below)
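A sketch of this two-prototype update for a single labelled example; the squared Euclidean distance and the choice ϕ = identity are assumptions here (Sato & Yamada also consider sigmoidal ϕ), and all names are illustrative.

```python
import numpy as np

def glvq_step(W, c, x, y, eta=0.05):
    """One GLVQ update: attract the closest correct prototype w_J,
    repel the closest incorrect one w_K, with prefactors obtained
    from the cost term e(x) (phi = identity)."""
    d = np.sum((W - x) ** 2, axis=1)
    J = np.flatnonzero(c == y)[np.argmin(d[c == y])]  # index of w_J
    K = np.flatnonzero(c != y)[np.argmin(d[c != y])]  # index of w_K
    dJ, dK = d[J], d[K]
    gJ = 2.0 * dK / (dJ + dK) ** 2     # prefactor  d mu / d d_J
    gK = 2.0 * dJ / (dJ + dK) ** 2     # prefactor |d mu / d d_K|
    W[J] += eta * gJ * 2.0 * (x - W[J])   # towards the sample
    W[K] -= eta * gK * 2.0 * (x - W[K])   # away from the sample
    return W
```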

SLIDE 17

Alternative distance measures

fixed, pre-defined distance measures: heuristic LVQ1, or GLVQ (and more general cost function based LVQ) can be based on general, differentiable distances, e.g. the Minkowski measures

$$d_p(\mathbf{w}, \mathbf{x}) = \left[\sum_j |w_j - x_j|^p\right]^{1/p}$$

possible work-flow:
  • select several distance measures according to prior knowledge, or make a data-driven choice in a preprocessing step
  • compare the performance of the various measures

examples: kernelized distances, divergences (statistics)

(a code sketch of the Minkowski measure follows below)
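The Minkowski measure above as a one-liner, convenient for comparing different p in the work-flow just described (an illustrative sketch):

```python
import numpy as np

def minkowski_distance(w, x, p=2.0):
    """d_p(w, x) = (sum_j |w_j - x_j|^p)^(1/p);
    p=2 is the Euclidean case, p=1 the Manhattan distance."""
    return np.sum(np.abs(w - x) ** p) ** (1.0 / p)
```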

SLIDE 18

Kernelized distances

rewrite the squared Euclidean distance in terms of dot-products: $d(\mathbf{w}, \mathbf{x}) = \mathbf{w}^2 - 2\,\mathbf{w}\cdot\mathbf{x} + \mathbf{x}^2$

analogous: a distance measure associated with a general inner product or kernel function: $d_K(\mathbf{w}, \mathbf{x}) = K(\mathbf{w}, \mathbf{w}) - 2K(\mathbf{w}, \mathbf{x}) + K(\mathbf{x}, \mathbf{x})$

e.g. the Gaussian kernel $K(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{(\mathbf{x} - \mathbf{y})^2}{2\sigma^2}\right)$ with kernel width $\sigma$: an implicit mapping to a high-dimensional space for better separability of classes; similar: Support Vector Machine

Biehl, Hammer, Villmann: Distance measures for prototype-based classification (2014); Prototype-based models in machine learning (2016)
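A direct transcription of the two formulas above into code (a sketch; function names are illustrative):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp( -(x - y)^2 / (2 sigma^2) ), kernel width sigma."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def kernel_distance(w, x, K=gaussian_kernel):
    """Kernelized distance d_K(w, x) = K(w, w) - 2 K(w, x) + K(x, x)."""
    return K(w, w) - 2.0 * K(w, x) + K(x, x)
```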

SLIDE 19

Relevance Learning

elegant approach: Relevance Learning / adaptive distances
  • employ a parameterized distance measure with only the mathematical form fixed in advance
  • optimize its parameters in the training process
  • adaptive, data driven dissimilarity

example: Matrix Relevance LVQ
  • data-driven optimization of prototypes and relevance matrix in the same training process (≠ pre-processing)
SLIDE 20

Generalized Matrix Relevance LVQ (GMLVQ) [Schneider, Biehl, Hammer, 2009]

generalized quadratic distance in LVQ:

$$d(\mathbf{w}, \mathbf{x}) = (\mathbf{w} - \mathbf{x})^\top \Lambda\, (\mathbf{w} - \mathbf{x}) = \left[\Omega\,(\mathbf{w} - \mathbf{x})\right]^2 \quad \text{with} \quad \Lambda = \Omega^\top \Omega$$
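The generalized quadratic distance in code; the parameterization Λ = ΩᵀΩ guarantees d ≥ 0 by construction (a sketch with illustrative names):

```python
import numpy as np

def gmlvq_distance(w, x, Omega):
    """d(w, x) = (w - x)^T Lambda (w - x) = [Omega (w - x)]^2,
    with Lambda = Omega^T Omega positive semi-definite."""
    z = Omega @ (w - x)   # map the difference vector by Omega
    return z @ z          # squared norm in the transformed space
```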

SLIDE 21

GMLVQ

generalized quadratic distance in LVQ: $d(\mathbf{w}, \mathbf{x}) = (\mathbf{w} - \mathbf{x})^\top \Lambda\, (\mathbf{w} - \mathbf{x}) = [\Omega\,(\mathbf{w} - \mathbf{x})]^2$ [Schneider, Biehl, Hammer, 2009]

variants:
  • one global, several local, or class-wise relevance matrices
  • rectangular Ω: low-dim. representation / visualization [Bunte et al., 2012]
  • diagonal matrices: single feature weights [Hammer et al., 2002]

training: adaptation of prototypes and distance measure, guided by the GLVQ cost function
SLIDE 22

Relevance Matrix: interpretation

$\Lambda_{ii}$ summarizes the contribution of a single dimension, i.e. the relevance of the original feature $i$ in the classifier

$\Lambda_{ij}$ quantifies the contribution of the pair of features $(i, j)$ to the distance

Note: the interpretation implicitly assumes that the features have equal order of magnitude, e.g. after z-score transformation → zero mean and unit variance (averages over the data set)

after training: the prototypes represent typical class properties or subtypes

(a code sketch of the z-score transformation and relevance read-out follows below)
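A sketch of the pre-processing and read-out implied above: z-score the features so that relevances are comparable, then inspect the diagonal of Λ = ΩᵀΩ (illustrative names):

```python
import numpy as np

def zscore(X):
    """Standardize each feature to zero mean and unit variance
    (averages over the data set), so relevances are comparable."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def feature_relevances(Omega):
    """Diagonal elements Lambda_ii of Lambda = Omega^T Omega:
    the contribution of each single original feature."""
    return np.diag(Omega.T @ Omega)
```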

SLIDE 23

But this is just the Mahalanobis distance… [Mahalanobis, 1936]

$$d_M(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})^\top S^{-1}\, (\mathbf{x} - \mathbf{y})}, \qquad \mathbf{x} \in \mathbb{R}^N$$

S: the covariance matrix of the random vectors (calculated once from the data; a fixed definition, not adaptive); if you insist… (a ‘two point version’)

So is it a generalized Mahalanobis distance? No. (By the same logic, $E = \hbar\omega$ would be ‘a generalization of Ohm’s Law’, and GMLVQ ‘a generalized broccoli’.)
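For comparison, the fixed Mahalanobis distance in code (a sketch); S is estimated once from the data and never adapted, unlike the GMLVQ relevance matrix:

```python
import numpy as np

def mahalanobis_distance(x, y, S):
    """d_M(x, y) = sqrt( (x - y)^T S^{-1} (x - y) ),
    with S the fixed covariance matrix of the data."""
    diff = x - y
    # solve S z = diff instead of explicitly inverting S
    return np.sqrt(diff @ np.linalg.solve(S, diff))

# S would typically be estimated once, e.g. S = np.cov(X, rowvar=False)
```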

SLIDE 24

Relevance Matrix LVQ

Generalized Matrix LVQ (GMLVQ): optimization of the prototypes and of the distance measure; gradient terms for a single example $\mathbf{x}_m$ with respect to the prototypes and the matrix Ω (see the sketch below)
SLIDE 25

Relevance Matrix LVQ

optimization of prototype positions and distance measure(s) in one training process (≠ pre-processing)

motivation:

improved performance
  • weighting of features and pairs of features

simplified classification schemes
  • elimination of non-informative, noisy features
  • discriminative low-dimensional representation

insight into the data / classification problem
  • identification of the most discriminative features
  • incorporation of prior knowledge (e.g. structure of Ω)
SLIDE 26

Relevance Matrix LVQ: Iris flower data

[figure: GMLVQ prototypes and relevance matrix]

SLIDE 27

Relevance Matrix LVQ

empirical observation / theory: the relevance matrix becomes singular, dominated by very few eigenvectors
  • prevents over-fitting in high-dim. feature spaces
  • facilitates discriminative visualization of datasets
  • confirms: Setosa is well-separated from Virginica / Versicolor

(a code sketch of the eigenvector projection follows below)
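A sketch of the discriminative visualization mentioned above: project the data onto the leading eigenvectors of Λ (illustrative names):

```python
import numpy as np

def relevance_projection(X, Omega, k=2):
    """Project data onto the k leading eigenvectors of the
    relevance matrix Lambda = Omega^T Omega."""
    evals, evecs = np.linalg.eigh(Omega.T @ Omega)  # ascending order
    lead = evecs[:, ::-1][:, :k]    # k eigenvectors, largest first
    return X @ lead                 # (n_samples, k) coordinates
```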

SLIDE 28

a multi-class example

classification of coffee samples based on hyperspectral data (256-dim. feature vectors) [U. Seiffert et al., IFF Magdeburg]

[figure: prototypes; projections on the first and second eigenvectors]

SLIDE 29

related schemes

Relevance LVQ variants:
  • local, rectangular, structured, restricted… relevance matrices
  • relevance matrices for visualization, functional data, texture recognition, etc.
  • relevance learning in Robust Soft LVQ, Supervised NG, etc.
  • combination of distances for mixed data …

Relevance Learning related schemes in supervised learning:
  • RBF Networks [Backhaus et al., 2012]
  • Neighborhood Component Analysis [Goldberger et al., 2005]
  • Large Margin Nearest Neighbor [Weinberger et al., 2006, 2010]
  • and many more!

Linear Discriminant Analysis (LDA): roughly one prototype per class plus a global matrix, but a different objective function!

SLIDE 30

links

Matlab code: Relevance and Matrix adaptation in Learning Vector Quantization (GRLVQ, GMLVQ and LiRaM LVQ) [K. Bunte]: http://matlabserver.cs.rug.nl/gmlvqweb/web/

Related pre- and re-prints etc.: http://www.cs.rug.nl/~biehl/

A no-nonsense beginners’ tool for GMLVQ: http://www.cs.rug.nl/~biehl/gmlvq

A Scikit-Learn compatible collection of Python code for LVQ and variants, including GMLVQ [B. Paaßen et al., CITEC Bielefeld]:
https://techfak.uni-bielefeld.de/~bpaassen/glvq.zip
https://github.com/MrNuggelz/sklearn-glvq

SLIDE 31

References

Overview articles which relate directly to the lectures and provide further refs.:

  • M. Biehl, B. Hammer, T. Villmann. Prototype-based models in machine learning. WIREs Cognitive Science, 2016. Advanced review, 20 pages.
  • M. Biehl, B. Hammer, T. Villmann. Distance measures for prototype based classification. In: L. Grandinetti, T. Lippert, N. Petkov (eds.), Brain-Inspired Computing. Lecture Notes in Computer Science. Springer International Publishing, pp. 100-116, 2014.
  • M. Biehl, B. Hammer, T. Villmann. Prototype-based models for the supervised learning of classification. In: Proceedings IAU Symposium No. 325; M. Brescia, S.G. Djogovski, E. Feigelson, G. Longo & S. Cavuoti (eds.), 10 pages, 2016.