Prototypes and Matrix Relevance Learning in Complex Fourier Space - - PowerPoint PPT Presentation

SLIDE 1

Prototypes and Matrix Relevance Learning in Complex Fourier Space

M. Straat, M. Kaden, M. Gay, T. Villmann, A. Lampe, U. Seiffert, M. Biehl, and F. Melchert

June 26, 2017

SLIDE 2

Overview

A study of classification of time series. In Fourier space:

  • vectors in C^n,
  • Generalized Matrix Learning Vector Quantization (GMLVQ) on complex-valued data,
  • evaluation and interpretation of the Fourier-space classifiers.

Figure: Plane examples (feature value vs. feature index).

SLIDE 3

Learning Vector Quantization (LVQ)

Dataset of vectors x^m ∈ R^N, each carrying a class label σ^m ∈ {1, 2, ..., C}.
Training: for each class σ, identify prototype(s) w_i ∈ R^N in feature space that are typical representatives of that class.
Aim: classify novel vectors x^µ by assigning them to the class of the nearest prototype.

SLIDE 4

Figure: LVQ with 5 prototypes per class. Initialized with K-means on each class. Black line: Piece-wise linear decision boundary.

SLIDE 5

d(x, w) = (x − w)^T (x − w), the squared Euclidean distance.

1: procedure LVQ
2:   for each training epoch do
3:     for each labeled vector {x, σ} do
4:       {w*, S*} ← argmin_i { d(x, w_i) }
5:       w* ← w* + η Ψ(S*, σ) (x − w*)

where Ψ(S, σ) = +1 if S = σ, and −1 otherwise.

Classification of a novel data point x^µ:
closest prototype {w*, S*} ← argmin_i { d(x^µ, w_i) }; classify x^µ in class S*: {x^µ, σ^µ = S*}.
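The update rule above can be sketched in a few lines of pure Python (toy 2-D data and prototypes with hypothetical values, not the datasets from the talk):

```python
# Minimal LVQ1 sketch: the winning prototype is attracted to the
# example if the labels match, repelled otherwise.
def sq_euclidean(x, w):
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))

def lvq1_step(protos, labels, x, sigma, eta=0.1):
    # find the closest prototype {w*, S*}
    i_star = min(range(len(protos)), key=lambda i: sq_euclidean(x, protos[i]))
    psi = 1.0 if labels[i_star] == sigma else -1.0
    # w* <- w* + eta * Psi(S*, sigma) * (x - w*)
    protos[i_star] = [wi + eta * psi * (xi - wi)
                      for xi, wi in zip(x, protos[i_star])]

def classify(protos, labels, x):
    return labels[min(range(len(protos)), key=lambda i: sq_euclidean(x, protos[i]))]

# one prototype per class, toy 2-D data
protos = [[0.0, 0.0], [1.0, 1.0]]
labels = [1, 2]
data = [([0.1, -0.1], 1), ([0.9, 1.2], 2), ([-0.2, 0.0], 1), ([1.1, 0.8], 2)]
for _ in range(20):                # training epochs
    for x, sigma in data:
        lvq1_step(protos, labels, x, sigma)
print(classify(protos, labels, [0.0, 0.1]))  # prints 1
```

With one prototype per class and well-separated toy classes, the prototypes settle near the class means and the piecewise-linear decision boundary follows.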

SLIDE 6

GMLVQ

Learn feature relevance and adapt d accordingly.

Adaptive quadratic distance measure: d_Ω(x, w) = (x − w)^T Ω^T Ω (x − w).

Update two prototypes upon presentation of {x, σ}:

  • w+: closest prototype of the same class as x.
  • w−: closest prototype of a different class than x.

Cost of one example x^m:

  e^m = (d_Ω[w+] − d_Ω[w−]) / (d_Ω[w+] + d_Ω[w−]) ∈ [−1, 1].

Learning is minimization of the cost with gradient descent:

  w± ← w± − η_w ∇_{w±} e^m,    Ω ← Ω − η_Ω ∇_Ω e^m.
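A minimal sketch of the adaptive distance and the cost of one example, using a hypothetical 2x2 Ω that down-weights the second feature (toy numbers, not from the talk):

```python
# GMLVQ distance d_Omega(x, w) = (x - w)^T Omega^T Omega (x - w)
# and the per-example cost e = (d+ - d-) / (d+ + d-).
def d_omega(x, w, omega):
    diff = [xi - wi for xi, wi in zip(x, w)]
    # v = Omega @ diff, then d = v^T v
    v = [sum(o * dc for o, dc in zip(row, diff)) for row in omega]
    return sum(vi * vi for vi in v)

def cost(x, w_plus, w_minus, omega):
    dp = d_omega(x, w_plus, omega)
    dm = d_omega(x, w_minus, omega)
    return (dp - dm) / (dp + dm)   # in [-1, 1]; negative = correct side

omega = [[1.0, 0.0], [0.0, 0.5]]   # second feature counts less
x = [1.0, 0.0]
e = cost(x, [0.9, 0.0], [0.0, 1.0], omega)
print(e)  # negative: x is closer to w+ under d_Omega
```

Minimizing e pushes d_Ω[w+] down and d_Ω[w−] up, which is what the gradient updates on the slide implement.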

SLIDE 7

Time series

Sampling: f(t) → f(i ΔT), i = 0, 1, ..., N − 1, giving vectors x ∈ R^N with a temporal order of the dimensions.

Figure: Example time series (magnitude vs. sample index).

SLIDE 8

Training in coefficient space

Approximate f(t) = Σ_{i=1}^{n} c_i g_i(t):

  • using a Chebyshev basis, or
  • using a Fourier basis: x ∈ R^N → x_f ∈ C^n.

Prototypes w_i ∈ C^n and relevance matrix Λ Hermitian.

Figures: 5 Chebyshev basis functions; a complex Fourier sinusoid.

F. Melchert, U. Seiffert, and M. Biehl, "Polynomial Approximation of Spectral Data in LVQ and Relevance Learning", in Workshop on New Challenges in Neural Computation 2015.
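The Fourier-coefficient representation can be illustrated with a pure-Python DFT on a toy two-frequency signal (signal and n values are assumptions for illustration): keeping only the lowest n coefficients, together with their conjugate partners, reconstructs the signal well once the relevant frequencies are included.

```python
# Truncated Fourier representation of a sampled signal: keep the first
# n coefficients of x_f and measure the reconstruction error.
import cmath, math

def dft(x):
    N = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / N) for t in range(N))
            for k in range(N)]

def idft(xf):
    N = len(xf)
    return [(sum(xf[k] * cmath.exp(2j * math.pi * k * t / N) for k in range(N)) / N).real
            for t in range(N)]

N = 32
x = [math.sin(2 * math.pi * t / N) + 0.3 * math.cos(2 * math.pi * 3 * t / N)
     for t in range(N)]
xf = dft(x)

def truncate(xf, n):
    # keep coefficients k < n and their conjugate partners k > N - n
    N = len(xf)
    return [c if (k < n or k > N - n) else 0 for k, c in enumerate(xf)]

err = lambda a, b: max(abs(ai - bi) for ai, bi in zip(a, b))
print(err(x, idft(truncate(xf, 2))), err(x, idft(truncate(xf, 4))))
```

With n = 2 the frequency-3 component is lost (error about 0.3); with n = 4 both components survive and the reconstruction is exact up to roundoff, which is the low-dimensional representation the slides exploit.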

SLIDE 9

Fourier: Time ⇆ Frequency

Matrix F ∈ C^{n×N} with rows e^{−j2πkt/N}, k = 0, 1, ..., n − 1.
Forward (DFT): x_f = F x ∈ C^n.
Backward (iDFT): x = (1/N) F^H x_f ∈ R^N.

Figures: Example time series (magnitude vs. sample index) and its frequency magnitudes.
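The forward/backward pair can be checked numerically with an explicit matrix F (small N, random toy vector; all values hypothetical):

```python
# DFT matrix F with rows e^{-j 2 pi k t / N}, and the inverse relation
# x = (1/N) F^H x_f, checked on a random vector.
import cmath, math, random

N = 8
F = [[cmath.exp(-2j * math.pi * k * t / N) for t in range(N)] for k in range(N)]

random.seed(1)
x = [random.random() for _ in range(N)]
xf = [sum(F[k][t] * x[t] for t in range(N)) for k in range(N)]       # x_f = F x
xr = [sum(F[k][t].conjugate() * xf[k] for k in range(N)).real / N    # (1/N) F^H x_f
      for t in range(N)]

print(max(abs(a - b) for a, b in zip(x, xr)))  # ~ 0
```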

SLIDE 10

GMLVQ for complex-valued data

Quadratic distance measure: d_Λ[x_f, w_f] = (x_f − w_f)^H Ω^H Ω (x_f − w_f) ∈ R≥0.

Cost of one example x_f^m:

  e^m = (d_Λ[w_f+] − d_Λ[w_f−]) / (d_Λ[w_f+] + d_Λ[w_f−]) ∈ [−1, 1].

Compute gradients w.r.t. w_f+, w_f− and Ω for learning, e.g.

  ∇_{w_f+} e^µ = (∂e^µ / ∂d_Λ+) (∂d_Λ / ∂w_f+).

SLIDE 11

Wirtinger derivatives

For f(z): C → R, the operators

  ∂/∂z = (1/2)(∂/∂x − i ∂/∂y)   and   ∂/∂z* = (1/2)(∂/∂x + i ∂/∂y).

Example: f(z) = z · z*, then ∂f/∂z = z* and ∂f/∂z* = z.

Wirtinger gradients:

  ∂/∂z = (∂/∂z_1, ..., ∂/∂z_N)^T   and   ∂/∂z* = (∂/∂z*_1, ..., ∂/∂z*_N)^T.

Using the Wirtinger gradient: ∂/∂z* (z^H A z) = A z.

M. Gay, M. Kaden, M. Biehl, A. Lampe, and T. Villmann, "Complex variants of GLVQ based on Wirtinger's calculus".
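The example f(z) = z z* can be checked against the operator definition ∂/∂z* = (1/2)(∂/∂x + i ∂/∂y) with central finite differences (step size and test point are arbitrary choices):

```python
# Numerical check of the Wirtinger rule d/dz* (z z*) = z
# for f(z) = z z* = |z|^2, a real-valued function of z.
h = 1e-6

def f(z):
    return (z * z.conjugate()).real

z = 1.3 - 0.7j
x, y = z.real, z.imag
dfdx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
dfdy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
wirtinger = 0.5 * (dfdx + 1j * dfdy)   # (1/2)(d/dx + i d/dy) f
print(abs(wirtinger - z))  # ~ 0
```

Since f is real-valued and not holomorphic, the ordinary complex derivative does not exist, but the Wirtinger derivative does, which is exactly why it is used for the cost function here.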

SLIDE 12

Learning rules

Complex-valued GMLVQ (Wirtinger):

  ∇_{w_f*} d_Λ[x_f, w_f] = −Ω^H Ω (x_f − w_f),
  ∇_{Ω*} d_Λ[x_f, w_f] = Ω (x_f − w_f)(x_f − w_f)^H.

Relevance matrix Λ = Ω^H Ω is Hermitian.

Real-valued GMLVQ:

  ∇_w d_Λ[x, w] = −2 Ω^T Ω (x − w),
  ∇_Ω d_Λ[x, w] = Ω (x − w)(x − w)^T.

Relevance matrix Λ = Ω^T Ω is symmetric (also Hermitian).

After each epoch, normalize Λ such that tr(Λ) = 1.
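A sketch of the complex prototype gradient in action (2-D toy vectors and a hypothetical Ω): a gradient step on w must decrease d_Λ, since Λ = Ω^H Ω is positive semi-definite.

```python
# Complex GMLVQ distance d_Lambda and its Wirtinger gradient w.r.t.
# the prototype: grad_{w*} d = -Omega^H Omega (x - w).
def mat_vec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def d_lambda(x, w, omega):
    diff = [xi - wi for xi, wi in zip(x, w)]
    v = mat_vec(omega, diff)
    return sum((vi * vi.conjugate()).real for vi in v)   # |Omega (x-w)|^2

def grad_w(x, w, omega):
    diff = [xi - wi for xi, wi in zip(x, w)]
    v = mat_vec(omega, diff)
    # Omega^H = conjugate transpose of Omega
    omega_H = [[omega[r][c].conjugate() for r in range(len(omega))]
               for c in range(len(omega[0]))]
    return [-g for g in mat_vec(omega_H, v)]

omega = [[1.0 + 0.5j, 0.2j], [0.0, 0.7 - 0.1j]]   # toy transform
x = [1.0 + 1.0j, -0.5j]
w = [0.0 + 0.0j, 0.3 + 0.0j]
eta = 0.1
d0 = d_lambda(x, w, omega)
w_new = [wi - eta * g for wi, g in zip(w, grad_w(x, w, omega))]
print(d_lambda(x, w_new, omega) < d0)  # True: the step reduces d_Lambda
```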

SLIDE 13

The testing scenarios

1. GMLVQ in the original time domain on vectors x ∈ R^N.
2. GMLVQ (Wirtinger) in complex Fourier space on vectors x_f ∈ C^n with n = 6, 11, ..., 51.
3. GMLVQ in Fourier space on vectors x_f ∈ R^{2n}, real and imaginary parts concatenated.
4. GMLVQ on smoothed time-domain vectors x̂ ∈ R^N.

Before training:

  • All dimensions z-score transformed.
  • One prototype per class.
  • Prototype of class i initialized at w_i ≈ mean({x | (x, y), y = i}).
  • Λ = cI.
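The preprocessing steps can be sketched as follows (hypothetical toy data; z-score each dimension, then initialize each class prototype at that class's mean):

```python
# z-score transform per dimension, then class-mean prototype init.
import math

data = [([1.0, 10.0], 1), ([2.0, 12.0], 1), ([5.0, 20.0], 2), ([6.0, 22.0], 2)]
X = [x for x, _ in data]
dims = len(X[0])

means = [sum(x[d] for x in X) / len(X) for d in range(dims)]
stds = [math.sqrt(sum((x[d] - means[d]) ** 2 for x in X) / len(X))
        for d in range(dims)]
Z = [([(x[d] - means[d]) / stds[d] for d in range(dims)], y) for x, y in data]

# one prototype per class at the class mean of the transformed vectors
protos = {}
for cls in {y for _, y in data}:
    members = [z for z, y in Z if y == cls]
    protos[cls] = [sum(m[d] for m in members) / len(members) for d in range(dims)]

print(protos)
```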

SLIDE 14

Plane dataset

210 labeled vectors (x, y) ∈ R^144 × {1, 2, ..., 7}; 105/105 train/validation split.

Figure: Plane examples (feature value vs. feature index).

SLIDE 15

Plane - Classification performance

Accuracies of the 4 testing scenarios on validation set

SLIDE 16

Interpreting the classifier

Prototypes w_f^i ∈ C^n.
Relevance matrix Λ_f is Hermitian: Λ_f = Λ_f^H.

Figure: magnitudes of the 2 prototypes (Plane, 21-coefficient Fourier).

Map prototypes to the time domain with the iDFT: w_i = (1/N) F^H w_f^i.
Relevance matrix to the time domain: d[x_f, w_f] = (x − w)^H F^H Λ_f F (x − w).
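The back-transformation of the relevance matrix rests on the identity d[x_f, w_f] = (x − w)^H F^H Λ_f F (x − w), which can be verified numerically (small N, random Hermitian Λ_f = Ω^H Ω; all values toy):

```python
# Check: Fourier-space distance == time-domain distance with the
# back-transformed relevance matrix F^H Lam_f F.
import cmath, math, random

N = 6
F = [[cmath.exp(-2j * math.pi * k * t / N) for t in range(N)] for k in range(N)]

def mv(M, v):   # matrix-vector product
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def quad(v, M): # v^H M v (real for Hermitian M)
    return sum(vi.conjugate() * mi for vi, mi in zip(v, mv(M, v))).real

random.seed(0)
Omega = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
         for _ in range(N)]
# Lam_f = Omega^H Omega is Hermitian and positive semi-definite
Lam_f = [[sum(Omega[r][i].conjugate() * Omega[r][j] for r in range(N))
          for j in range(N)] for i in range(N)]

x = [random.gauss(0, 1) for _ in range(N)]
w = [random.gauss(0, 1) for _ in range(N)]
diff = [a - b for a, b in zip(x, w)]

d_fourier = quad(mv(F, diff), Lam_f)           # (Fx - Fw)^H Lam_f (Fx - Fw)
FH = [[F[k][t].conjugate() for k in range(N)] for t in range(N)]
Lam_t = [[sum(FH[i][k] * sum(Lam_f[k][l] * F[l][j] for l in range(N))
              for k in range(N)) for j in range(N)] for i in range(N)]
d_time = quad(diff, Lam_t)                     # (x - w)^H F^H Lam_f F (x - w)
print(abs(d_fourier - d_time))  # ~ 0
```

This is what makes the Fourier-space classifier interpretable in the time domain: the same distance can be read off against time-domain features.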

SLIDE 17

Plane - Prototypes and feature relevance

Time-domain training vs. 21-coefficient Fourier space.

Figures: Prototypes Plane; Relevances Plane; Backtransformed prototypes Plane; Relevances Plane (backtransformed).

SLIDE 18

Symbols dataset

1020 feature vectors (x, y) ∈ R^398 × {1, 2, ..., 6}; 25/995 train/validation split.

Figure: Symbols examples (feature value vs. feature index).

SLIDE 19

Symbols - Classification performance

Accuracies of the 4 testing scenarios on validation set

SLIDE 20

Mallat dataset

2400 feature vectors (x, y) ∈ R^1024 × {1, 2, ..., 8}; 55/2345 train/validation split.

Figure: Mallat examples (feature value vs. feature index).

SLIDE 21

Mallat - Classification performance

Accuracies of the 4 testing scenarios on validation set

SLIDE 22

Mallat - Classification error curves

Error development on the training and validation set

Figures: train and test error vs. epoch (50-250) for GMLVQ in the original space, GMLVQ in complex Fourier space, and GMLVQ in concatenated Fourier space.

SLIDE 23

Discussion

Learning in complex Fourier-coefficient space:

  • can be an effective method for classification of periodic functional data,
  • can provide an efficient low-dimensional representation,
  • has the potential to improve classification accuracy.

For future research: how to obtain close-to-optimal accuracy with the smallest number of adaptive parameters.