LS-SVMlab & Large scale modeling
Kristiaan Pelckmans, ESAT-SCD/SISTA
J.A.K. Suykens, B. De Moor
Content
- I. Overview
- II. Classification
- III. Regression
- IV. Unsupervised Learning
- V. Time-series
- VI. Conclusions and Outlooks
People

Contributors to LS-SVMlab:
- Kristiaan Pelckmans
- Johan Suykens
- Tony Van Gestel
- Jos De Brabanter
- Lukas Lukas
- Bart Hamers
- Emmanuel Lambert
Supervisors:
- Bart De Moor
- Johan Suykens
- Joos Vandewalle
Acknowledgements

Our research is supported by grants from several funding agencies and sources: Research Council K.U.Leuven: Concerted Research Action GOA-Mefisto 666 (Mathematical Engineering), IDO (IOTA Oncology, Genetic networks), several PhD/postdoc & fellow grants; Flemish Government: Fund for Scientific Research FWO Flanders (several PhD/postdoc grants, projects G.0407.02 (support vector machines), G.0080.01 (collective intelligence), G.0256.97 (subspace), G.0115.01 (bio-i and microarrays), G.0240.99 (multilinear algebra), G.0197.02 (power islands), research communities ICCoS, ANMMM), AWI (Bil. Int. Collaboration South Africa, Hungary and Poland), IWT (Soft4s (softsensors), STWW-Genprom (gene promotor prediction), GBOU McKnow (knowledge management algorithms), Eureka-Impact (MPC-control), Eureka-FLiTE (flutter modeling), several PhD grants); Belgian Federal Government: DWTC (IUAP IV-02 (1996-2001) and IUAP V-10-29 (2002-2006): Dynamical Systems and Control: Computation, Identification & Modelling), Program Sustainable Development PODO-II (CP-TR-18: Sustainability effects of Traffic Management Systems); Direct contract research: Verhaert, Electrabel, Elia, Data4s, IPCOS. JS is a professor at K.U.Leuven Belgium and a postdoctoral researcher with FWO Flanders. BDM and JWDW are full professors at K.U.Leuven Belgium.
I. Overview
- Goal of the Presentation
- 1. Overview & Intuition
- 2. Demonstration LS-SVMlab
- 3. Pinpoint research challenges
- 4. Preparation NIPS 2002
- Research results and challenges
- Towards applications
- Overview LS-SVMlab
I.2 Overview research
“Learning, generalization, extrapolation, identification, smoothing, modeling”
- Prediction (black box modeling)
- Point of view: Statistical Learning, Machine Learning, Neural Networks, Optimization, SVM
I.2 Type, Target, Topic
I.3 Towards applications
- System identification
- Financial engineering
- Biomedical signal processing
- Datamining
- Bio-informatics
- Textmining
- Adaptive signal processing
I.4 LS-SVMlab
I.4 LS-SVMlab (2)
- Starting points:
– Modularity
– Object Oriented & Functional Interface
– Basic bricks for advanced research
- Website and tutorial
- Reproducibility (preprocessing)
II. Classification
“Learn the decision function associated with a set of labeled data points to predict the values of unseen data”
- Least Squares Support Vector Machines
- Bayesian Framework
- Different norms
- Coding schemes
II.1 Least Squares Support Vector Machines (LS-SVM(L,a))
1. Least Squares cost-function + regularization & equality constraints
2. Non-linearity by Mercer kernels
3. Primal-dual interpretation (Lagrange multipliers)
Primal parametric model:

$$y_i = w^T x_i + b + e_i$$

Dual non-parametric model:

$$y_i = \sum_{j=1}^{n} \alpha_j \, K_\sigma(x_i, x_j) + b + e_i$$

with hyper-parameters: the regularization constant $\gamma$ and the kernel parameter $\sigma$ of the kernel $K_\sigma(\cdot,\cdot)$.
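To make the primal-dual pair concrete, here is a minimal numerical sketch of the dual linear system implied by the equations above: build the kernel matrix, solve for (b, α), and predict with the dual expansion. This is an illustration independent of the LS-SVMlab toolbox; the RBF kernel choice, the toy data and the values of γ and σ are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # K_sigma(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma, sigma):
    """Solve the LS-SVM dual linear system
       [0      1^T          ] [b    ]   [0]
       [1  Omega + I / gamma] [alpha] = [y]   for (b, alpha)."""
    n = X.shape[0]
    Omega = rbf_kernel(X, X, sigma)              # kernel (Gram) matrix
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Omega + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                       # bias b and dual weights alpha

def lssvm_predict(Xnew, alpha, b, Xtrain, sigma):
    # dual non-parametric model: sum_j alpha_j K_sigma(x, x_j) + b
    return rbf_kernel(Xnew, Xtrain, sigma) @ alpha + b

# toy usage (assumed data)
X = np.linspace(-3, 3, 50)[:, None]
y = np.sinc(X).ravel() + 0.1 * np.random.randn(50)
b, alpha = lssvm_fit(X, y, gamma=10.0, sigma=0.5)
yhat = lssvm_predict(X, alpha, b, X, sigma=0.5)
```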
II.1 LS-SVM(L,a)
“Learning representations from relations”
$$\Omega = \begin{bmatrix} \langle a_1, a_1\rangle & \langle a_1, a_2\rangle & \cdots & \langle a_1, a_N\rangle \\ \vdots & & \ddots & \vdots \\ \langle a_N, a_1\rangle & \langle a_N, a_2\rangle & \cdots & \langle a_N, a_N\rangle \end{bmatrix}$$
II.2 Bayesian Inference
- Bayes rule (MAP):
- Closed form formulas
- Approximations:
  – Hessian in the optimum
  – Gaussian distribution
- Three levels of posteriors:
$$P(\theta \mid X) = \frac{P(X \mid \theta)\,P(\theta)}{P(X)}$$

- Level 1: $P(\alpha \mid X, K_\sigma, \gamma)$
- Level 2: $P(\gamma \mid X, K_\sigma)$
- Level 3: $P(K_\sigma \mid X)$
II.3 SVM formulations & norms
- 1-norm + inequality constraints: SVM, extensions to any convex cost-function
- 2-norm + equality constraints: LS-SVM, weighted versions
II.4 Coding schemes
[Diagram: a sequence of multi-class labels (… 1 2 4 6 2 1 3 …) is encoded into several ±1 codewords, one per binary classifier, and the binary outputs are decoded back into class labels]

Encoding / decoding: a multi-class classification task is handled by (multiple) binary classifiers on the encoded labels.
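A minimal sketch of such an encoding/decoding scheme (a one-versus-rest ±1 code matrix and nearest-codeword decoding; the specific class labels and code matrix are assumptions for the example, not necessarily the codes on the slide):

```python
import numpy as np

# assumed code matrix: one row of +/-1 codewords per class (one-versus-rest here)
classes = np.array([1, 2, 4, 6])
codebook = 2 * np.eye(len(classes)) - 1          # class k -> k-th row of the codebook

def encode(labels):
    """Map multi-class labels to one +/-1 target per binary classifier."""
    idx = np.searchsorted(classes, labels)
    return codebook[idx]                         # shape (n_samples, n_binary_tasks)

def decode(binary_outputs):
    """Map the sign patterns of the binary classifiers back to class labels
    by picking the nearest codeword (minimum Hamming distance)."""
    dist = (binary_outputs[:, None, :] != codebook[None, :, :]).sum(-1)
    return classes[dist.argmin(axis=1)]

Y = encode(np.array([1, 2, 6, 2]))               # targets for the binary classifiers
labels_back = decode(np.sign(Y))                 # recovers [1, 2, 6, 2]
```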
III. Regression
“Learn the underlying function from a set of data points and its corresponding noisy targets in order to predict the values of unseen data”
- LS-SVM(L,a)
- Cross-validation (CV)
- Bayesian Inference
- Robustness
III.1 LS-SVM(L,a)
- Least Squares cost-function + regularization & equality constraints
- Mercer kernels
- Lagrange multipliers: primal parametric model ↔ dual non-parametric model
III.1 LS-SVM(L,a) (2)
- Regularization parameter:
  – Do not fit the noise (overfitting)!
  – Trade off noise and information

Toy example: $f(x) = \operatorname{sinc}(x) + \sin(10x)/5 + e$
III.2 Cross-validation (CV)
“How to estimate generalization power of model?”
- Division training set – test set
- Repeated division: Leave-one-out CV (fast implementation)
- L-fold cross-validation
- Generalized Cross-validation (GCV):
- Complexity criteria: AIC, BIC, …
$$\big[\hat{y}_1 \; \cdots \; \hat{y}_N\big]^T = S(X \mid K_\sigma, \gamma)\,\big[y_1 \; \cdots \; y_N\big]^T$$

with $S$ the smoother matrix mapping the targets to the fitted values.
[Diagram: index sequences 1 … n illustrating the data splits, e.g. leaving out a single point t or a block t-l, …, t+l]
III.2 Cross-validation Procedure (CVP)
“How to optimize model for optimal generalization performance”
- Trade-off between fitting and model complexity
- Kernel parameters
- Optimization routine?
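As an illustration of the procedure, a minimal sketch of L-fold cross-validation combined with a plain grid search over (γ, σ). It reuses the lssvm_fit / lssvm_predict helpers and the toy data from the sketch in II.1; the number of folds and the grid values are assumptions, and the sketch is independent of the toolbox.

```python
import numpy as np

def kfold_cv_mse(X, y, gamma, sigma, k=10, seed=0):
    """L-fold cross-validation estimate of the prediction error for one
    (gamma, sigma) pair, using lssvm_fit / lssvm_predict from the II.1 sketch."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for val in folds:
        tr = np.setdiff1d(np.arange(len(y)), val)            # training indices
        b, alpha = lssvm_fit(X[tr], y[tr], gamma, sigma)     # fit on k-1 folds
        yhat = lssvm_predict(X[val], alpha, b, X[tr], sigma) # predict held-out fold
        errors.append(np.mean((y[val] - yhat) ** 2))
    return np.mean(errors)

# plain grid search over the hyper-parameters (grid values are assumptions)
gammas = 10.0 ** np.arange(-2, 4)
sigmas = 10.0 ** np.linspace(-1, 1, 5)
scores = {(g, s): kfold_cv_mse(X, y, g, s) for g in gammas for s in sigmas}
gamma_best, sigma_best = min(scores, key=scores.get)
```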
III.1 LS-SVM(L,a) (3)
- Kernel type and parameter
“Zoölogy as elephantism and non-elephantism”
- Model Comparison
- By cross-validation or Bayesian Inference
III.3 Applications
“ok, but does it work?”
- Soft4s
  – Together with O. Barrero, L. Hoegaerts, IPCOS (ISMC), BASF, B. De Moor
  – Soft-sensor
- ELIA
  – Together with O. Barrero, I. Goethals, L. Hoegaerts, I. Markovsky, T. Van Gestel, ELIA, B. De Moor
  – Prediction of short- and long-term electricity consumption
III.2 Bayesian Inference
- Bayes rule (MAP):
- Closed form formulas
- Three levels of posteriors:
$$P(\theta \mid X) = \frac{P(X \mid \theta)\,P(\theta)}{P(X)}$$

- Level 1 (model parameters): $P(\alpha \mid X, K_\sigma, \gamma)$
- Level 2 (regularization): $P(\gamma \mid X, K_\sigma)$
- Level 3 (model comparison): $P(K_\sigma \mid X)$
III.4 Robustness
“How to build good models in the case of non-Gaussian noise or outliers”
- Influence function
- Breakdown point
- How:
– Depreciating the influence of large residuals
– Mean → trimmed mean → median
- Robust CV, GCV, AIC,…
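One way to depreciate large residuals, sketched below: fit an LS-SVM, compute a robust scale of the residuals, down-weight suspicious points, and refit with per-sample weights. The weight function, cut-off and number of reweighting steps are assumptions chosen for illustration, not toolbox defaults; the rbf_kernel helper is the one from the II.1 sketch, repeated here for completeness.

```python
import numpy as np

def rbf_kernel(A, B, sigma):   # same helper as in the II.1 sketch
    return np.exp(-((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))

def lssvm_fit_weighted(X, y, gamma, sigma, v):
    """Weighted LS-SVM: sample i gets effective regularization gamma * v[i],
    so the diagonal of the dual system becomes 1 / (gamma * v[i])."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.diag(1.0 / (gamma * v))
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def robust_lssvm(X, y, gamma, sigma, n_iter=3):
    """Iteratively reweighted LS-SVM: down-weight points with large residuals."""
    v = np.ones(len(y))
    for _ in range(n_iter):
        b, alpha = lssvm_fit_weighted(X, y, gamma, sigma, v)
        e = y - (rbf_kernel(X, X, sigma) @ alpha + b)          # residuals
        s = 1.4826 * np.median(np.abs(e - np.median(e)))       # robust scale (MAD)
        r = np.abs(e) / max(s, 1e-12)
        v = np.where(r <= 2.5, 1.0, 2.5 / r)                   # Huber-like weights
    return b, alpha
```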
IV. Unsupervised Learning
“Extract important features from the unlabeled data”
- Kernel PCA and related methods
- Nyström approximation
  – From dual to primal
  – Fixed size LS-SVM
IV.1 Kernel PCA
[Figure: linear Principal Component Analysis vs. kernel-based PCA]
IV.2 Kernel PCA (2)
- Primal Dual LS-SVM style formulations
- For Kernel PCA, CCA, PLS
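A minimal sketch of kernel PCA itself (center the kernel matrix in feature space, take its leading eigenvectors, and read off the projections). The RBF kernel and all parameter values are assumptions, and this is the standard kernel PCA computation rather than the primal-dual LS-SVM style formulation the slide refers to.

```python
import numpy as np

def rbf_kernel(A, B, sigma):   # same helper as in the II.1 sketch
    return np.exp(-((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))

def kernel_pca(X, sigma, n_components=2):
    """Standard kernel PCA: eigen-decompose the centered kernel matrix and
    return the projections of the data onto the leading components."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kc = J @ K @ J                               # centering in feature space
    lam, U = np.linalg.eigh(Kc)                  # eigenvalues in ascending order
    lam, U = lam[::-1][:n_components], U[:, ::-1][:, :n_components]
    scores = U * np.sqrt(np.maximum(lam, 0.0))   # projections of the training points
    return scores, lam

# usage on some data matrix X (parameters assumed):
# scores, lam = kernel_pca(X, sigma=0.5, n_components=2)
```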
IV.2 Nyström approximation
- Sampling of the integral equation
- Approximating feature map for a Mercer kernel

$$\int K_\sigma(x, y)\,\phi_i(x)\,p(x)\,dx = \lambda_i\,\phi_i(y)$$

$$\Downarrow \quad \text{(sampling with the } N \text{ data points)}$$

$$\sum_{j=1}^{N} K_\sigma(x_j, y)\,\phi_i(x_j) = \lambda_i\,\phi_i(y)$$

$$\Downarrow \quad \text{(restricting to a subsample of size } n\text{)}$$

$$\sum_{j=1}^{n} K_\sigma(x_j, y)\,\phi_i(x_j) = \lambda_i\,\phi_i(y)$$

yielding an approximate feature map $\varphi(\cdot)$ with

$$K_\sigma(x, y) \approx \varphi(x)^T \varphi(y)$$
IV.3 Fixed Size LS-SVM

Primal parametric model:

$$y_i = w^T \varphi(x_i) + b + e_i$$

Dual non-parametric model:

$$y_i = \sum_{j=1}^{n} \alpha_j\,K_\sigma(x_i, x_j) + b + e_i$$

? How to go back from the dual to a finite-dimensional primal model for large-scale data (see the sketch below)?
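A minimal sketch of the fixed-size idea under the Nyström approximation above: build an approximate feature map from the eigendecomposition of the kernel matrix on a small subsample, then estimate (w, b) in the primal by ridge regression. The random subsample selection (rather than an active selection criterion), the rbf_kernel helper repeated from the II.1 sketch, and all parameter values are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(A, B, sigma):   # same helper as in the II.1 sketch
    return np.exp(-((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))

def nystrom_features(X, Xsub, sigma):
    """Approximate feature map phi(x) from the eigendecomposition of the kernel
    matrix on a subsample Xsub (n << N), so that K(x, z) ~= phi(x)^T phi(z)."""
    Ksub = rbf_kernel(Xsub, Xsub, sigma)          # n x n kernel matrix
    lam, U = np.linalg.eigh(Ksub)
    keep = lam > 1e-10                            # drop numerically zero modes
    lam, U = lam[keep], U[:, keep]
    Knm = rbf_kernel(X, Xsub, sigma)              # N x n cross-kernel matrix
    return Knm @ U / np.sqrt(lam)                 # N x n_kept feature matrix

def fixed_size_lssvm(X, y, sigma, gamma, n_sub=50, seed=0):
    """Estimate the model in the primal: ridge regression on Nystrom features."""
    rng = np.random.default_rng(seed)
    Xsub = X[rng.choice(len(X), size=n_sub, replace=False)]
    Phi = np.hstack([nystrom_features(X, Xsub, sigma), np.ones((len(X), 1))])
    reg = np.eye(Phi.shape[1]) / gamma
    reg[-1, -1] = 0.0                             # do not regularize the bias term
    wb = np.linalg.solve(Phi.T @ Phi + reg, Phi.T @ y)
    return wb, Xsub                               # primal weights [w; b] and subsample
```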
V. Time-series
“Learn to predict future values given a sequence of past values”
- NARX
- Recurrent vs. feedforward
V.1 NARX
- Reducible to static regression
- CV and Complexity criteria
- Predicting in recurrent mode
- Fixed size LS-SVM (sparse representation)
$$\hat{y}_t = f(y_{t-1}, y_{t-2}, \dots, y_{t-l})$$

applied along the sequence $\dots, y_t, y_{t+1}, y_{t+2}, y_{t+3}, y_{t+4}, y_{t+5}, \dots$
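A minimal sketch of the reduction to static regression and of prediction in recurrent mode: build a matrix of lagged values, fit the LS-SVM sketch from II.1 on it, and then iterate the one-step-ahead predictor while feeding its own predictions back in. The lag order, the toy series and the hyper-parameter values are assumptions.

```python
import numpy as np

def make_lagged(y, lags):
    """Reduce the series to static regression: inputs (y[t-1], ..., y[t-lags]),
    target y[t], for t = lags, ..., len(y)-1."""
    X = np.column_stack([y[lags - k - 1 : len(y) - k - 1] for k in range(lags)])
    return X, y[lags:]

def recurrent_forecast(y_hist, n_steps, lags, predict_one):
    """Iterate a one-step-ahead predictor in recurrent mode:
    each prediction is fed back in as an input for the next step."""
    buf = list(y_hist[-lags:])                    # most recent 'lags' values
    out = []
    for _ in range(n_steps):
        x = np.array(buf[::-1])[None, :]          # (y[t-1], ..., y[t-lags])
        y_next = float(predict_one(x))
        out.append(y_next)
        buf = buf[1:] + [y_next]                  # slide the window forward
    return np.array(out)

# usage with the LS-SVM sketch from II.1 (toy series, parameters assumed)
t = np.arange(300)
y = np.sin(0.2 * t) + 0.05 * np.random.randn(300)
Xlag, ytarget = make_lagged(y, lags=10)
b, alpha = lssvm_fit(Xlag, ytarget, gamma=100.0, sigma=2.0)
one_step = lambda x: lssvm_predict(x, alpha, b, Xlag, sigma=2.0)
y_future = recurrent_forecast(y, n_steps=50, lags=10, predict_one=one_step)
```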
V.1 NARX (2)
Santa Fe Time-series competition
V.2 Recurrent models?
“How to learn recurrent dynamical models?”
- Training cost = Prediction cost?
- Non-parametric model class?
- Convex or non-convex?
- Hyper-parameters?
$$\hat{y}_t = f(\hat{y}_{t-1}, \hat{y}_{t-2}, \dots, \hat{y}_{t-l})$$
VI.0 References
- J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor & J. Vandewalle (2002), Least Squares Support Vector Machines, World Scientific.
- V. Vapnik (1995), The Nature of Statistical Learning Theory, Springer-Verlag.
- B. Schölkopf & A. Smola (2002), Learning with Kernels, MIT Press.
- T. Poggio & F. Girosi (1990), "Networks for approximation and learning", Proceedings of the IEEE, 78, 1481-1497.
- N. Cristianini & J. Shawe-Taylor (2000), An Introduction to Support Vector Machines, Cambridge University Press.
VI. Conclusions
“Non-linear Non-parametric learning as a generalized methodology”
- Non-parametric Learning
- Intuition & Formulations
- Hyper-parameters
- LS-SVMlab