SLIDE 1

LS-SVMlab & Large scale modeling

Kristiaan Pelckmans, ESAT- SCD/SISTA J.A.K. Suykens, B. De Moor

SLIDE 2

Content

  • I. Overview
  • II. Classification
  • III. Regression
  • IV. Unsupervised Learning
  • V. Time-series
  • VI. Conclusions and Outlook
SLIDE 3

People Contributors to LS-SVMlab:

  • Kristiaan Pelckmans
  • Johan Suykens
  • Tony Van Gestel
  • Jos De Brabanter
  • Lukas Lukas
  • Bart Hamers
  • Emmanuel Lambert

Supervisors:

  • Bart De Moor
  • Johan Suykens
  • Joos Vandewalle

Acknowledgements. Our research is supported by grants from several funding agencies and sources. Research Council K.U.Leuven: Concerted Research Action GOA-Mefisto 666 (Mathematical Engineering), IDO (IOTA Oncology, Genetic networks), several PhD/postdoc & fellow grants. Flemish Government: Fund for Scientific Research FWO Flanders (several PhD/postdoc grants; projects G.0407.02 (support vector machines), G.0080.01 (collective intelligence), G.0256.97 (subspace), G.0115.01 (bio-i and microarrays), G.0240.99 (multilinear algebra), G.0197.02 (power islands); research communities ICCoS, ANMMM), AWI (Bil. Int. Collaboration South Africa, Hungary and Poland), IWT (Soft4s (soft sensors), STWW-Genprom (gene promoter prediction), GBOU McKnow (knowledge management algorithms), Eureka-Impact (MPC control), Eureka-FLiTE (flutter modeling), several PhD grants). Belgian Federal Government: DWTC (IUAP IV-02 (1996-2001) and IUAP V-10-29 (2002-2006): Dynamical Systems and Control: Computation, Identification & Modelling), Program Sustainable Development PODO-II (CP-TR-18: Sustainability effects of Traffic Management Systems). Direct contract research: Verhaert, Electrabel, Elia, Data4s, IPCOS. JS is a professor at K.U.Leuven, Belgium, and a postdoctoral researcher with FWO Flanders. BDM and JWDW are full professors at K.U.Leuven, Belgium.

SLIDE 4
  • I. Overview
  • Goal of the Presentation
  • 1. Overview & Intuition
  • 2. Demonstration LS-SVMlab
  • 3. Pinpoint research challenges
  • 4. Preparation NIPS 2002
  • Research results and challenges
  • Towards applications
  • Overview LS-SVMlab
SLIDE 5

I.2 Overview research

“Learning, generalization, extrapolation, identification, smoothing, modeling”

  • Prediction (black box modeling)
  • Point of view: Statistical Learning, Machine Learning, Neural Networks, Optimization, SVM

SLIDE 6

I.2 Type, Target, Topic

SLIDE 7

I.3 Towards applications

  • System identification
  • Financial engineering
  • Biomedical signal processing
  • Datamining
  • Bio-informatics
  • Textmining
  • Adaptive signal processing
SLIDE 8

I.4 LS-SVMlab

SLIDE 9

I.4 LS-SVMlab (2)

  • Starting points:

    – Modularity
    – Object-oriented & functional interface
    – Basic bricks for advanced research

  • Website and tutorial
  • Reproducibility (preprocessing)
SLIDE 10
  • II. Classification

“Learn the decision function associated with a set of labeled data points to predict the values of unseen data”

  • Least Squares Support Vector Machines
  • Bayesian framework
  • Different norms
  • Coding schemes
SLIDE 11

II.1 Least Squares Support Vector Machines (LS-SVM(L,a))

  1. Least Squares cost-function + regularization & equality constraints
  2. Non-linearity by Mercer kernels
  3. Primal-dual interpretation (Lagrange multipliers)

Primal parametric model:

$$y_i = w^T x_i + b + e_i$$

Dual non-parametric model:

$$y_i = \sum_{j=1}^{n} \alpha_j\, K_\sigma(x_i, x_j) + b + e_i$$

with Mercer kernel $K_\sigma(\cdot,\cdot)$, Lagrange multipliers $\alpha_j$ and regularization constant $\gamma$.
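The dual model turns training into a single linear system in $(b, \alpha)$. Below is a minimal NumPy sketch of that idea; the RBF kernel, the function names and the toy usage are my own illustration, not the LS-SVMlab API:

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    # K_sigma(x_i, x_j) = exp(-||x_i - x_j||^2 / sigma^2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM dual linear system
        [ 0        1^T        ] [ b     ]   [ 0 ]
        [ 1  Omega + I/gamma  ] [ alpha ] = [ y ]
    where Omega_ij = K_sigma(x_i, x_j)."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # b, alpha

def lssvm_predict(Xnew, X, b, alpha, sigma=1.0):
    # y_hat(x) = sum_j alpha_j K_sigma(x, x_j) + b
    return rbf_kernel(Xnew, X, sigma) @ alpha + b
```

With a large `gamma` (little regularization) the model nearly interpolates a smooth target such as `np.sinc`.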

SLIDE 12

II.1 LS-SVM(L,a)

“Learning representations from relations”

$$\Omega = \begin{bmatrix} \langle a_1, a_1 \rangle & \langle a_1, a_2 \rangle & \cdots & \langle a_1, a_N \rangle \\ \langle a_2, a_1 \rangle & \langle a_2, a_2 \rangle & \cdots & \langle a_2, a_N \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle a_N, a_1 \rangle & \langle a_N, a_2 \rangle & \cdots & \langle a_N, a_N \rangle \end{bmatrix}$$

SLIDE 13

II.2 Bayesian Inference

  • Bayes rule (MAP):

$$P(\theta \mid X) = \frac{P(X \mid \theta)\, P(\theta)}{P(X)}$$

  • Closed-form formulas
  • Approximations: Hessian in the optimum, Gaussian distribution
  • Three levels of posteriors:

$$\text{Level 1: } P(\alpha \mid X, K_\sigma, \gamma), \quad \text{Level 2: } P(\gamma \mid X, K_\sigma), \quad \text{Level 3: } P(K_\sigma \mid X)$$
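For intuition on how one level of inference selects a hyper-parameter: in the Gaussian-process view of LS-SVM regression the log-evidence $\log p(y \mid X, \gamma, \sigma)$ is available in closed form with covariance $C = \Omega + I/\gamma$, and level-2 inference of $\gamma$ becomes evidence maximization. A sketch of that analogy (the grid, kernel and function names are my own illustration, not the framework's exact level-2 expressions):

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def log_evidence(X, y, gamma, sigma):
    # log p(y | X, gamma, sigma) for y ~ N(0, C), C = Omega + I/gamma
    n = len(y)
    C = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    L = np.linalg.cholesky(C)                 # C = L L^T
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ a
            - np.log(np.diag(L)).sum()        # 0.5 * log|C|
            - 0.5 * n * np.log(2 * np.pi))

def select_gamma(X, y, sigma=1.0, grid=np.logspace(-2, 4, 25)):
    # Level-2 style inference: pick gamma with maximal evidence on a grid
    scores = [log_evidence(X, y, g, sigma) for g in grid]
    return float(grid[int(np.argmax(scores))])
```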

SLIDE 14

II.3 SVM formulations & norms

  • 1-norm + inequality constraints: SVM (extensions to any convex cost-function)
  • 2-norm + equality constraints: LS-SVM (weighted versions)

SLIDE 15

II.4 Coding schemes

[Figure: a multi-class label sequence (… 1 2 4 6 2 1 3 …) is encoded into several ±1 label sequences, one per binary classifier, and the binary outputs are decoded back to the original labels.]

Encoding / decoding: multi-class classification task → (multiple) binary classifiers
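As an illustration of the encoding/decoding step, here is a minimal one-vs-rest scheme: each class gets a ±1 codeword, and outputs are decoded by minimum Hamming distance to the codebook. The function names and codebook are my own sketch, not the toolbox's own coding routines:

```python
import numpy as np

def encode_one_vs_rest(labels, classes):
    # One binary (+1/-1) target column per class.
    Y = -np.ones((len(labels), len(classes)))
    for i, l in enumerate(labels):
        Y[i, classes.index(l)] = 1.0
    return Y

def decode(outputs, classes):
    # Assign each sample the class whose codeword is closest
    # (in Hamming distance) to the sign pattern of the outputs.
    code = 2 * np.eye(len(classes)) - 1          # one-vs-rest codebook
    signs = np.sign(outputs)
    d = (signs[:, None, :] != code[None, :, :]).sum(-1)
    return [classes[j] for j in d.argmin(1)]
```

Encoding a label sequence and decoding the (noise-free) codes recovers the original labels.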

SLIDE 16
  • III. Regression

“Learn the underlying function from a set of data points and its corresponding noisy targets in order to predict the values of unseen data”

  • LS-SVM(L,a)
  • Cross-validation (CV)
  • Bayesian inference
  • Robustness
SLIDE 17

III.1 LS-SVM(L,a)

  • Least Squares cost-function + regularization & equality constraints
  • Mercer kernels
  • Lagrange multipliers: primal parametric ↔ dual non-parametric model

SLIDE 18

III.1 LS-SVM(L,a) (2)

  • Regularization parameter:
    – Do not fit the noise (overfitting)!
    – Trade off noise against information

Toy target:

$$f(x) = \mathrm{sinc}(x) + \frac{\sin(10x)}{5} + e$$

SLIDE 19

III.2 Cross-validation (CV)

“How to estimate generalization power of model?”

  • Division training set – test set
  • Repeated division: Leave-one-out CV (fast implementation)
  • L-fold cross-validation
  • Generalized Cross-validation (GCV):
  • Complexity criteria: AIC, BIC, …

$$\begin{bmatrix} \hat{y}_1 \\ \vdots \\ \hat{y}_N \end{bmatrix} = S(X \mid K_\sigma, \gamma) \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}$$

[Figure: index sequences $1, 2, 3, \ldots, n$ showing which blocks of data are held out for L-fold CV, leave-one-out CV and a single train/test split.]
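The repeated-division idea can be written generically: any train/predict pair can be scored by L-fold cross-validation. A sketch with hypothetical callables (`train`, `predict` stand in for any learner, e.g. an LS-SVM with fixed $\gamma$ and $\sigma$):

```python
import numpy as np

def l_fold_cv(X, y, train, predict, L=10, seed=0):
    """Estimate generalization MSE by L-fold cross-validation.
    `train(X, y) -> model` and `predict(model, X) -> y_hat` are
    placeholders for any learner."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, L)
    errors = []
    for k in range(L):
        test = folds[k]
        tr = np.concatenate([folds[j] for j in range(L) if j != k])
        model = train(X[tr], y[tr])
        errors.append(np.mean((predict(model, X[test]) - y[test]) ** 2))
    return float(np.mean(errors))
```

Hyper-parameter selection then amounts to minimizing this score over a grid of $(\gamma, \sigma)$ values.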

SLIDE 20

III.2 Cross-validation Procedure (CVP)

“How to optimize the model for optimal generalization performance?”

  • Trade-off between fitting and model complexity
  • Kernel parameters
  • Optimization routine?
SLIDE 21

III.1 LS-SVM(L,a) (3)

  • Kernel type and parameter

“Zoölogy as elephantism and non-elephantism”

  • Model comparison by cross-validation or Bayesian inference
SLIDE 22

III.3 Applications

“ok, but does it work?”

  • Soft4s
    – Together with O. Barrero, L. Hoegaerts, IPCOS (ISMC), BASF, B. De Moor
    – Soft sensor
  • ELIA
    – Together with O. Barrero, I. Goethals, L. Hoegaerts, I. Markovsky, T. Van Gestel, ELIA, B. De Moor
    – Prediction of short- and long-term electricity consumption

SLIDE 23

III.2 Bayesian Inference

  • Bayes rule (MAP):

$$P(\theta \mid X) = \frac{P(X \mid \theta)\, P(\theta)}{P(X)}$$

  • Closed-form formulas
  • Three levels of posteriors:

$$\text{Level 1 (model parameters): } P(\alpha \mid X, K_\sigma, \gamma)$$
$$\text{Level 2 (regularization): } P(\gamma \mid X, K_\sigma)$$
$$\text{Level 3 (model comparison): } P(K_\sigma \mid X)$$

SLIDE 24

III.4 Robustness

“How to build good models in the case of non-Gaussian noise or outliers”

  • Influence function
  • Breakdown point
  • How:
    – Down-weighting the influence of large residuals
    – Mean, trimmed mean, median
  • Robust CV, GCV, AIC, …
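The breakdown behavior behind these bullets is easy to see on a toy sample: a single outlier drags the mean arbitrarily far, while the trimmed mean and median barely move. A small sketch (function names are my own):

```python
import numpy as np

def trimmed_mean(x, trim=0.1):
    # Drop the `trim` fraction of smallest and largest values, average the rest.
    x = np.sort(np.asarray(x, dtype=float))
    k = int(len(x) * trim)
    return x[k:len(x) - k].mean()

def location_estimates(x):
    # Three location estimators with very different breakdown points.
    x = np.asarray(x, dtype=float)
    return {"mean": float(x.mean()),
            "trimmed": float(trimmed_mean(x)),
            "median": float(np.median(x))}
```

On `[0, 1, …, 9]` plus one outlier at 1000, the mean jumps to roughly 95 while the trimmed mean and median stay near 5.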
SLIDE 25
  • IV. Unsupervised Learning

“Extract important features from the unlabeled data”

  • Kernel PCA and related methods
  • Nyström approximation
    – From dual to primal
    – Fixed-size LS-SVM

SLIDE 26

IV.1 Kernel PCA

[Figure: linear Principal Component Analysis vs. kernel-based PCA.]

SLIDE 27

IV.2 Kernel PCA (2)

  • Primal-dual LS-SVM style formulations
  • For Kernel PCA, CCA, PLS
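Computationally, kernel PCA reduces to an eigendecomposition of the centered Gram matrix. A compact sketch (the RBF kernel, normalization convention and function names are my own assumptions, not the toolbox implementation):

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def kernel_pca(X, n_components=2, sigma=1.0):
    """Project the data on the leading eigenvectors of the centered Gram matrix."""
    n = len(X)
    K = rbf_kernel(X, X, sigma)
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kc = J @ K @ J                               # centering in feature space
    vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]       # sort descending
    # Component scores: project Kc on the normalized leading eigenvectors.
    W = vecs[:, :n_components] / np.sqrt(np.maximum(vals[:n_components], 1e-12))
    return Kc @ W
```

On two well-separated clusters the first kernel principal component separates the clusters by sign.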
SLIDE 28

IV.2 Nyström approximation

  • Sampling of the integral equation
  • Approximating the feature map $\varphi(\cdot)$ for a Mercer kernel

$$\int K_\sigma(x, y)\, \phi_i(x)\, p(x)\, dx = \lambda_i\, \phi_i(y)$$

$$\Downarrow$$

$$\frac{1}{N} \sum_{j=1}^{N} K_\sigma(x_j, y)\, \phi_i(x_j) = \lambda_i\, \phi_i(y), \qquad \frac{1}{n} \sum_{j=1}^{n} K_\sigma(x_j, y)\, \phi_i(x_j) = \lambda_i\, \phi_i(y)$$

$$K_\sigma(x, y) = \varphi(x)^T \varphi(y)$$
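The sampled eigenproblem yields an explicit finite-dimensional feature map $\hat\varphi(\cdot)$ such that $K(x, y) \approx \hat\varphi(x)^T \hat\varphi(y)$. A sketch assuming an RBF kernel and a given subsample `Xsub` (names are illustrative):

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def nystrom_features(X, Xsub, sigma=1.0):
    """Approximate feature map phi_hat with K(x,y) ~= phi_hat(x)^T phi_hat(y),
    built from the eigendecomposition of the small Gram matrix on Xsub."""
    Ksub = rbf_kernel(Xsub, Xsub, sigma)
    vals, vecs = np.linalg.eigh(Ksub)
    keep = vals > 1e-10                      # drop numerically null directions
    # phi_hat(x) = diag(vals^{-1/2}) vecs^T k(Xsub, x)
    M = vecs[:, keep] / np.sqrt(vals[keep])
    return rbf_kernel(X, Xsub, sigma) @ M
```

Taking `Xsub = X` recovers the full Gram matrix up to the discarded near-null directions; a small subsample trades accuracy for a map of fixed, low dimension.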

SLIDE 29

IV.3 Fixed Size LS-SVM

Primal parametric model:

$$y_i = w^T \varphi(x_i) + b + e_i$$

Dual non-parametric model:

$$y_i = \sum_{j=1}^{n} \alpha_j\, K_\sigma(x_i, x_j) + b + e_i$$

Primal or dual estimation?
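Fixed-size LS-SVM answers that question by estimating $w$ and $b$ in the primal after replacing $\varphi(\cdot)$ with the Nyström approximation built on a small subsample; computationally this amounts to ridge regression on the approximate features. A sketch under those assumptions (not the toolbox implementation):

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def fixed_size_lssvm(X, y, Xsub, gamma=100.0, sigma=1.0):
    """Primal estimation on Nystrom features:
       min_{w,b}  (1/2)||w||^2 + (gamma/2) sum_i (y_i - w^T phi_hat(x_i) - b)^2"""
    Ksub = rbf_kernel(Xsub, Xsub, sigma)
    vals, vecs = np.linalg.eigh(Ksub)
    keep = vals > 1e-10
    M = vecs[:, keep] / np.sqrt(vals[keep])     # Nystrom feature map
    Phi = rbf_kernel(X, Xsub, sigma) @ M
    Z = np.hstack([Phi, np.ones((len(X), 1))])  # constant column for the bias b
    reg = np.eye(Z.shape[1]) / gamma
    reg[-1, -1] = 0.0                           # do not regularize b
    w = np.linalg.solve(Z.T @ Z + reg, Z.T @ y)

    def predict(Xnew):
        Zn = np.hstack([rbf_kernel(Xnew, Xsub, sigma) @ M,
                        np.ones((len(Xnew), 1))])
        return Zn @ w
    return predict
```

The whole model now costs $O(n)$ kernel evaluations per prediction, with $n$ the subsample size rather than the full training set size.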

SLIDE 30
  • V. Time-series

“Learn to predict future values given a sequence of past values”

  • NARX
  • Recurrent vs. feedforward
SLIDE 31

V.1 NARX

  • Reducible to static regression
  • CV and Complexity criteria
  • Predicting in recurrent mode
  • Fixed-size LS-SVM (sparse representation)

$$\hat{y}_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-l})$$

$$\ldots,\; y_t,\; y_{t+1},\; y_{t+2},\; y_{t+3},\; y_{t+4},\; y_{t+5},\; \ldots$$

(the model $f$ is iterated forward over the sequence)
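Since NARX is reducible to static regression, any regressor trained on lagged input vectors can be iterated in recurrent mode by feeding its own predictions back as inputs. A small sketch with hypothetical helper names (`model` stands in for any fitted regressor):

```python
import numpy as np

def make_lagged(series, l):
    # Rows: (y_{t-1}, ..., y_{t-l}); targets: y_t  -> static regression data.
    X = np.array([series[t - l:t][::-1] for t in range(l, len(series))])
    y = np.array(series[l:])
    return X, y

def predict_recurrent(model, history, l, steps):
    """Iterate y_hat_t = f(y_hat_{t-1}, ..., y_hat_{t-l}),
    feeding predictions back as inputs (recurrent mode)."""
    buf = list(history[-l:])
    out = []
    for _ in range(steps):
        x = np.array(buf[-l:][::-1])[None, :]
        yhat = float(np.ravel(model(x))[0])
        out.append(yhat)
        buf.append(yhat)
    return out
```

On a series generated by a linear recursion, a least-squares AR fit recovers the dynamics and the recurrent predictions reproduce the true continuation.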

SLIDE 32

V.1 NARX (2)

Santa Fe Time-series competition

SLIDE 33

V.2 Recurrent models?

“How to learn recurrent dynamical models?”

  • Training cost = Prediction cost?
  • Non-parametric model class?
  • Convex or non-convex?
  • Hyper-parameters?

$$\hat{y}_t = f(\hat{y}_{t-1}, \hat{y}_{t-2}, \ldots, \hat{y}_{t-l})$$

SLIDE 34

VI.0 References

  • J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor & J. Vandewalle (2002), Least Squares Support Vector Machines, World Scientific.
  • V. Vapnik (1995), The Nature of Statistical Learning Theory, Springer-Verlag.
  • B. Schölkopf & A. Smola (2002), Learning with Kernels, MIT Press.
  • T. Poggio & F. Girosi (1990), “Networks for approximation and learning”, Proceedings of the IEEE, 78, 1481-1497.
  • N. Cristianini & J. Shawe-Taylor (2000), An Introduction to Support Vector Machines, Cambridge University Press.

SLIDE 35
  • VI. Conclusions

“Non-linear, non-parametric learning as a generalized methodology”

  • Non-parametric Learning
  • Intuition & Formulations
  • Hyper-parameters
  • LS-SVMlab

Questions?