SLIDE 1 Prototypes and Matrix Relevance Learning in Complex Fourier Space
- M. Straat, M. Kaden, M. Gay, T. Villmann, A. Lampe, U. Seiffert, M. Biehl, and F. Melchert
June 26, 2017
SLIDE 2 Overview
A study of time-series classification. In Fourier space: vectors in C^n. Generalized Matrix Learning Vector Quantization (GMLVQ).
Evaluation and interpretation of the Fourier-space classifiers.
Figure: Plane examples (feature value vs. feature index).
SLIDE 3
Learning Vector Quantization (LVQ)
Dataset of vectors x_m ∈ R^N, each carrying a class label σ_m ∈ {1, 2, ..., C}.
Training: for each class σ, identify prototype(s) w_i ∈ R^N in feature space that are typical representatives of that class.
Aim: classify novel vectors x_µ, assigning them to the class of the nearest prototype.
SLIDE 4
Figure: LVQ with 5 prototypes per class. Initialized with K-means on each class. Black line: Piece-wise linear decision boundary.
SLIDE 5
d(x, w) = (x − w)^T (x − w), the squared Euclidean distance.
procedure LVQ
  for each training epoch do
    for each labeled vector {x, σ} do
      {w*, S*} ← argmin_i d(x, w_i)
      w* ← w* + η Ψ(S*, σ) (x − w*)
with Ψ(S, σ) = +1 if S = σ, −1 otherwise.
- Classification of a novel data point x_µ:
  Closest prototype {w*, S*} ← argmin_i d(x_µ, w_i).
  Classify x_µ in class S*: {x_µ, σ_µ = S*}.
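A minimal NumPy sketch of this training loop and classification rule (all function and variable names are illustrative, not from the slides):

```python
import numpy as np

def lvq1_train(X, y, W, S, eta=0.01, epochs=50):
    """LVQ1 training: attract the winning prototype to same-class
    samples, push it away from different-class samples.
    X: (M, N) data, y: (M,) labels, W: (P, N) prototypes, S: (P,) prototype labels."""
    W = W.copy()
    for _ in range(epochs):
        for x, sigma in zip(X, y):
            d = np.sum((W - x) ** 2, axis=1)      # squared Euclidean distances
            i = np.argmin(d)                      # winner w*, with label S[i] = S*
            psi = 1.0 if S[i] == sigma else -1.0  # Psi(S*, sigma)
            W[i] += eta * psi * (x - W[i])        # attract (+1) or repel (-1)
    return W

def lvq_classify(X, W, S):
    """Assign each row of X the label of its nearest prototype."""
    d = np.sum((X[:, None, :] - W[None, :, :]) ** 2, axis=2)  # (M, P)
    return S[np.argmin(d, axis=1)]
```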
SLIDE 6
GMLVQ
Learn feature relevance and adapt d accordingly.
Adaptive quadratic distance measure: d_Ω(x, w) = (x − w)^T Ω^T Ω (x − w).
Update two prototypes upon presentation of {x, σ}:
- w+: closest prototype of the same class as x.
- w−: closest prototype of a different class than x.
Cost for one example x_m:
e_m = (d_Ω[w+] − d_Ω[w−]) / (d_Ω[w+] + d_Ω[w−]) ∈ [−1, 1].
Learning is minimization of the cost with gradient descent (a sketch of one step follows below):
w± ← w± − η_w ∇_{w±} e_m,  Ω ← Ω − η_Ω ∇_Ω e_m.
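The chain rule for this cost gives ∂e_m/∂d+ = 2 d− / (d+ + d−)² and ∂e_m/∂d− = −2 d+ / (d+ + d−)². One real-valued GMLVQ update step could then look as follows (a sketch under these definitions; names and learning rates are mine):

```python
import numpy as np

def gmlvq_step(x, sigma, W, S, Omega, eta_w=0.01, eta_om=0.001):
    """One GMLVQ update for a labeled example (x, sigma).
    Assumes at least one prototype of class sigma and one of another class.
    W: (P, N) prototypes, S: (P,) labels, Omega: (N, N)."""
    diffs = x - W                                  # rows: x - w_i
    d = np.sum((diffs @ Omega.T) ** 2, axis=1)     # d_Omega for every prototype
    same = S == sigma
    p = np.where(same)[0][np.argmin(d[same])]      # index of w+
    m = np.where(~same)[0][np.argmin(d[~same])]    # index of w-
    dp, dm = d[p], d[m]
    de_dp = 2.0 * dm / (dp + dm) ** 2              # d e_m / d d+
    de_dm = -2.0 * dp / (dp + dm) ** 2             # d e_m / d d-
    Lam = Omega.T @ Omega
    W[p] -= eta_w * de_dp * (-2.0 * Lam @ diffs[p])   # w+ moves toward x
    W[m] -= eta_w * de_dm * (-2.0 * Lam @ diffs[m])   # w- moves away from x
    grad_Om = (de_dp * 2.0 * Omega @ np.outer(diffs[p], diffs[p])
               + de_dm * 2.0 * Omega @ np.outer(diffs[m], diffs[m]))
    Omega -= eta_om * grad_Om
    return W, Omega
```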
SLIDE 7 Time series
f(t) → f(i∆T), i = 0, 1, ..., N − 1. Vectors x ∈ R^N; the dimensions carry temporal order.
Figure: Example time series (magnitude vs. sample index).
SLIDE 8 Training in coefficient space
Approximate f(t) ≈ Σ_{i=1}^{n} c_i g_i(t):
- Using the Chebyshev basis (see the sketch below).
- Using the Fourier basis: x ∈ R^N → x_f ∈ C^n. Prototypes w_i ∈ C^n and relevance matrix Λ Hermitian.
Figure: 5 Chebyshev basis functions.
Figure: Fourier complex sinusoid.
- F. Melchert, U. Seiffert, and M. Biehl, "Polynomial Approximation of Spectral Data in LVQ and Relevance Learning," in Workshop on New Challenges in Neural Computation, 2015.
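For the Chebyshev variant, a sketch of how an N-dimensional series can be compressed to n coefficients, using NumPy's Chebyshev routines on a toy signal (the slides do not prescribe this particular API):

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

N, n = 144, 21                        # series length, number of coefficients
t = np.linspace(-1.0, 1.0, N)         # Chebyshev domain
x = np.sin(3 * np.pi * t) + 0.1 * np.random.randn(N)   # toy "time series"

c = cheb.chebfit(t, x, deg=n - 1)     # n coefficients c_i (least squares)
x_hat = cheb.chebval(t, c)            # reconstruction sum_i c_i g_i(t)
print(c.shape, np.mean((x - x_hat) ** 2))   # (21,) and a small residual
```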
SLIDE 9 Fourier: Time ⇆ Frequency
Matrix F ∈ C^{n×N} with rows (e^{−j2πki/N}), i = 0, 1, ..., N − 1, for k = 0, 1, ..., n − 1.
Forward (DFT): x_f = F x ∈ C^n.
Backward (iDFT): x = (1/N) F^H x_f ∈ R^N (exact when n = N).
Figure: Example time series (magnitude vs. sample index) and its frequency magnitudes.
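A sketch of this truncated transform pair in NumPy (the toy signal and the choice n = 21 are illustrative; for n < N the backward map yields only a smoothed approximation):

```python
import numpy as np

N, n = 1024, 21                        # series length, retained coefficients
x = np.random.randn(N)                 # stand-in for a time-domain vector

# Truncated DFT matrix F in C^{n x N}: entry (k, i) is exp(-2j*pi*k*i/N)
k, i = np.arange(n)[:, None], np.arange(N)[None, :]
F = np.exp(-2j * np.pi * k * i / N)

x_f = F @ x                            # forward: x_f in C^n
assert np.allclose(x_f, np.fft.fft(x)[:n])   # matches NumPy's FFT convention

# Backward map; exact only when n = N, otherwise a smoothed approximation
x_rec = (F.conj().T @ x_f / N).real
```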
SLIDE 10 GMLVQ complex-valued data
Quadratic distance measure: d_Λ[x_f, w_f] = (x_f − w_f)^H Ω^H Ω (x_f − w_f) ∈ R≥0.
Cost for one example x_f^m:
e_m = (d_Λ[w_f^+] − d_Λ[w_f^−]) / (d_Λ[w_f^+] + d_Λ[w_f^−]) ∈ [−1, 1].
Compute gradients w.r.t. w_f^+, w_f^−, and Ω for learning, via the chain rule:
∇_{w_f^+} e_µ = (∂e_µ / ∂d_Λ^+) · (∂d_Λ / ∂w_f^+).
SLIDE 11 Wirtinger derivatives
For f(z): C → R, the operators are
∂/∂z = (1/2)(∂/∂x − i ∂/∂y),  ∂/∂z* = (1/2)(∂/∂x + i ∂/∂y).
Example: for f(z) = z z*, ∂f/∂z = z* and ∂f/∂z* = z.
Wirtinger gradients:
∂/∂z = (∂/∂z_1, ..., ∂/∂z_N)^T and ∂/∂z* = (∂/∂z_1*, ..., ∂/∂z_N*)^T.
Using the Wirtinger gradient: ∂/∂z* (z^H A z) = A z.
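The identity can be checked numerically: approximate the Wirtinger derivative from the real and imaginary partials and compare with Az (a sketch, with an arbitrary Hermitian A so that f is real-valued):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = 0.5 * (B + B.conj().T)             # Hermitian, so f(z) = z^H A z is real
z = rng.standard_normal(N) + 1j * rng.standard_normal(N)
f = lambda v: (v.conj() @ A @ v).real

eps = 1e-6
grad = np.zeros(N, dtype=complex)
for k in range(N):
    e = np.zeros(N); e[k] = eps
    dfdx = (f(z + e) - f(z - e)) / (2 * eps)             # partial wrt x_k
    dfdy = (f(z + 1j * e) - f(z - 1j * e)) / (2 * eps)   # partial wrt y_k
    grad[k] = 0.5 * (dfdx + 1j * dfdy)                   # Wirtinger d/dz*_k

print(np.allclose(grad, A @ z, atol=1e-5))   # True: d/dz* (z^H A z) = A z
```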
- M. Gay, M. Kaden, M. Biehl, A. Lampe, and T. Villmann, "Complex variants of GLVQ based on Wirtinger's calculus".
SLIDE 12 Learning rules
Complex-valued GMLVQ (Wirtinger):
∇_{w_f*} d_Λ[x_f, w_f] = −Ω^H Ω (x_f − w_f),
∇_{Ω*} d_Λ[x_f, w_f] = Ω (x_f − w_f)(x_f − w_f)^H.
Relevance matrix Λ = Ω^H Ω is Hermitian.
Real-valued GMLVQ:
∇_w d_Λ[x, w] = −2 Ω^T Ω (x − w),
∇_Ω d_Λ[x, w] = 2 Ω (x − w)(x − w)^T.
Relevance matrix Λ = Ω^T Ω is symmetric (hence also Hermitian).
After each epoch, normalize Λ such that tr(Λ) = 1.
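A direct NumPy translation of the complex-valued rules and the trace normalization (a sketch; function names are mine):

```python
import numpy as np

def wirtinger_grads(x_f, w_f, Omega):
    """Wirtinger gradients of d_Lambda = (x-w)^H Omega^H Omega (x-w)
    with respect to the conjugates w* and Omega*."""
    diff = x_f - w_f
    grad_w = -Omega.conj().T @ Omega @ diff           # nabla_{w*} d_Lambda
    grad_Om = Omega @ np.outer(diff, diff.conj())     # nabla_{Omega*} d_Lambda
    return grad_w, grad_Om

def normalize_relevance(Omega):
    """Rescale Omega so that Lambda = Omega^H Omega has unit trace."""
    tr = np.trace(Omega.conj().T @ Omega).real
    return Omega / np.sqrt(tr)
```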
SLIDE 13 The testing scenarios
1 GMLVQ in the original time domain on vectors x ∈ R^N.
2 GMLVQ (Wirtinger) in complex Fourier space on vectors x_f ∈ C^n with n ∈ {6, 11, ..., 51}.
3 GMLVQ in Fourier space on vectors x_f ∈ R^2n, real and imaginary parts concatenated.
4 GMLVQ on smoothed time-domain vectors x̂ ∈ R^N.
Before training (a sketch follows below):
- All dimensions z-score transformed.
- One prototype per class; the prototype of class i is initialized as w_i ≈ mean of the class-i training vectors.
- Λ = cI.
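A sketch of this preprocessing and initialization (the small jitter added to the class means is my reading of the "≈" on the slide):

```python
import numpy as np

def init_gmlvq(X, y, jitter=1e-3, seed=0):
    """Z-score the features, place one prototype per class near the
    class mean, and start from Lambda proportional to the identity."""
    rng = np.random.default_rng(seed)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)          # z-score transform
    classes = np.unique(y)
    W = np.stack([Xz[y == c].mean(axis=0) for c in classes])
    W += jitter * rng.standard_normal(W.shape)         # w_i ~ class mean
    Omega = np.eye(X.shape[1]) / np.sqrt(X.shape[1])   # Lambda = I/N, tr = 1
    return Xz, W, classes, Omega
```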
SLIDE 14 Plane dataset
210 labeled vectors (x, y) ∈ R^144 × {1, 2, ..., 7}; 105/105 train/validation vectors.
Figure: Plane examples (feature value vs. feature index).
SLIDE 15
Plane - Classification performance
Accuracies of the 4 testing scenarios on validation set
SLIDE 16
Interpreting the classifier
Prototypes w_f^i ∈ C^n. Matrix Λ_f is Hermitian: Λ_f = Λ_f^H.
Figure: Magnitudes of the 2 prototypes (Plane, 21-coefficient Fourier).
Map prototypes to the time domain with the iDFT: w_i = (1/N) F^H w_f^i.
Relevance matrix to the time domain: d[x_f, w_f] = (x − w)^H F^H Λ_f F (x − w), so the time-domain relevance matrix is F^H Λ_f F (a sketch of both maps follows below).
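A sketch of both backtransformations in NumPy (names are mine; taking the real part is safe because the imaginary part of F^H Λ_f F is antisymmetric and cancels in the quadratic form for real x − w):

```python
import numpy as np

def backtransform(w_f, Lam_f, N):
    """Map a Fourier-space prototype w_f in C^n and relevance matrix
    Lam_f in C^{n x n} back to the time domain via the truncated iDFT."""
    n = w_f.shape[0]
    k, i = np.arange(n)[:, None], np.arange(N)[None, :]
    F = np.exp(-2j * np.pi * k * i / N)         # same F as the forward DFT
    w_time = (F.conj().T @ w_f / N).real        # w = (1/N) F^H w_f
    Lam_time = (F.conj().T @ Lam_f @ F).real    # time-domain relevance F^H Lam_f F
    return w_time, Lam_time
```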
SLIDE 17 Plane - Prototypes and feature relevance
Time domain training vs. 21 coefficient Fourier space
Figure: Four panels over the feature index: prototypes (Plane), relevances (Plane), backtransformed prototypes (Plane), and relevances (Plane, backtransformed).
SLIDE 18 Symbols dataset
1020 feature vectors (x, y) ∈ R^398 × {1, 2, ..., 6}; 25/995 train/validation vectors.
Figure: Symbols examples (feature value vs. feature index).
SLIDE 19
Symbols - Classification performance
Accuracies of the 4 testing scenarios on validation set
SLIDE 20 Mallat dataset
2400 feature vectors (x, y) ∈ R^1024 × {1, 2, ..., 8}; 55/2345 train/validation vectors.
Figure: Mallat examples (feature value vs. feature index).
SLIDE 21
Mallat - Classification performance
Accuracies of the 4 testing scenarios on validation set
SLIDE 22 Mallat - Classification error curves
Error development on the training and validation set
Figure: Training error (left) and validation error (right) over 250 epochs for GMLVQ in the original space, complex Fourier space, and concatenated Fourier space.
SLIDE 23
Discussion
Learning in complex Fourier-coefficient space...
- can be an effective method for classifying periodic functional data.
- can provide an efficient low-dimensional representation.
- has the potential to improve classification accuracy.
For future research: how to obtain close-to-optimal accuracy with the fewest adaptive parameters.