

SLIDE 1
  • Aug. 27, 2003

IFAC-SYSID2003

Functional Analytic Framework for Model Selection

Masashi Sugiyama (Tokyo Institute of Technology, Tokyo, Japan / Fraunhofer FIRST-IDA, Berlin, Germany)

SLIDE 2

Regression Problem

From training examples {(x_i, y_i)}_{i=1}^n, obtain a good approximation f̂ to the underlying function f.

f : underlying function
f̂ : learned function
(x_i, y_i) : training examples, y_i = f(x_i) + ε_i (ε_i : noise)
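The setup above can be sketched as follows. The sinc-shaped target, the sample points, and the noise level are illustrative assumptions, not values from the talk:

```python
import math
import random

# Illustrative sketch of the regression setup: training examples
# (x_i, y_i) with y_i = f(x_i) + eps_i, where eps_i is Gaussian noise.
def target(x):
    # sinc-shaped target function (an illustrative choice)
    return math.sin(math.pi * x) / (math.pi * x) if x != 0 else 1.0

def make_training_examples(f, x_points, noise_std, seed=0):
    """Return training examples (x_i, y_i) with additive Gaussian noise."""
    rng = random.Random(seed)
    return [(x, f(x) + rng.gauss(0.0, noise_std)) for x in x_points]

examples = make_training_examples(target, [0.25 * i for i in range(1, 21)], 0.1)
```

The learning task is then to recover an approximation f̂ to `target` from `examples` alone.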

SLIDE 3

Model Selection

(Figure: a target function fitted by learned functions that are too simple, appropriate, and too complex.)

The choice of the model is extremely important for obtaining a good learned function! (Here "model" refers to, e.g., the regularization parameter.)

SLIDE 4

Aims of Our Research

The model is chosen so that an estimator of the generalization error is minimized. Model selection research is therefore essentially the pursuit of an accurate estimator of the generalization error. We are interested in:

  • Devising a novel method in a different framework.
  • Estimating the generalization error with small (finite) samples.

SLIDE 5

Formulating the Regression Problem as a Function Approximation Problem

We assume f ∈ H, where H is a functional Hilbert space. We shall measure the "goodness" of the learned function (i.e., the generalization error) by

  J = E_ε ‖f̂ − f‖²_H

‖·‖_H : norm in H
E_ε : expectation over noise

SLIDE 6

Function Spaces for Learning

In learning problems, we sample values of the target function at sample points. Therefore, the values of the target function at the sample points must be specified. This means that the usual L2-space is not suitable for learning problems:

Two functions that differ in value only at isolated points have different values there, but they are treated as the same function in L2.

SLIDE 7

Reproducing Kernel Hilbert Spaces

In a reproducing kernel Hilbert space (RKHS) H, the value of a function at any input point is always specified. Indeed, an RKHS has a reproducing kernel K(·, ·) with the reproducing property:

  f(x) = ⟨f, K(·, x)⟩ for all f ∈ H and all x

⟨·, ·⟩ : inner product in H
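The reproducing property can be checked numerically for a function in the span of Gaussian kernel sections. The kernel width, centers, and coefficients below are illustrative assumptions:

```python
import math

# Gaussian reproducing kernel (width c is an illustrative choice).
def K(x, xp, c=1.0):
    return math.exp(-((x - xp) ** 2) / (2 * c ** 2))

# f = sum_i alpha_i K(., x_i): an element of the span of kernel sections.
centers = [0.0, 1.0, 2.0]
alpha = [0.5, -1.0, 2.0]

def f(x):
    return sum(a * K(c_i, x) for a, c_i in zip(alpha, centers))

def inner_f_with_kernel_section(x):
    # <f, K(., x)> expands by linearity to sum_i alpha_i <K(., x_i), K(., x)>,
    # and <K(., x_i), K(., x)> = K(x_i, x) by the reproducing property.
    return sum(a * K(c_i, x) for a, c_i in zip(alpha, centers))

# Reproducing property: <f, K(., x)> recovers the point value f(x).
assert abs(inner_f_with_kernel_section(0.7) - f(0.7)) < 1e-12
```

This is exactly what fails in L2: there, the inner product cannot pin down the value of f at a single point.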

SLIDE 8

Sampling Operator

For any RKHS H, there exists a linear operator A from H to R^n such that

  A f = (f(x_1), f(x_2), …, f(x_n))ᵀ

Indeed, by the reproducing property,

  A = Σ_{i=1}^n e_i ⊗ K(·, x_i)

⊗ : Neumann-Schatten product, (g ⊗ h)f = ⟨f, h⟩ g
e_i : i-th standard basis vector in R^n
(For vectors, the Neumann-Schatten product reduces to the outer product.)
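A minimal sketch of the sampling operator acting on functions, with a check of its linearity. The sample points and the Gaussian kernel sections used as test functions are illustrative assumptions:

```python
import math

# Sampling operator A: H -> R^n, (A f)_i = f(x_i).
# The sample points are an illustrative choice.
sample_points = [0.0, 0.5, 1.0, 1.5]

def sample(f):
    """Apply A: return the sample-value vector (f(x_1), ..., f(x_n))."""
    return [f(x) for x in sample_points]

def K(x, xp, c=1.0):
    # Gaussian kernel section, an illustrative element of H
    return math.exp(-((x - xp) ** 2) / (2 * c ** 2))

f = lambda x: K(0.0, x)
g = lambda x: K(1.0, x)

# A is linear: A(2f + 3g) = 2 Af + 3 Ag.
lhs = sample(lambda x: 2 * f(x) + 3 * g(x))
rhs = [2 * a + 3 * b for a, b in zip(sample(f), sample(g))]
assert all(abs(u - v) < 1e-12 for u, v in zip(lhs, rhs))
```

Linearity of A holds regardless of the learning method; it is the learning operator, mapping sample values back to functions, that may be non-linear.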

SLIDE 9

Our Framework

(Diagram: the learning target function f lives in the RKHS H; the sampling operator A (always linear) maps it to the sample value space R^n, where noise is added to give the observations; the learning operator (generally non-linear) maps the observations back to the learned function f̂ in H.)

Generalization error:

  J = E_ε ‖f̂ − f‖²_H

E_ε : expectation over noise

SLIDE 10

Tricks for Estimating the Generalization Error

We want to estimate J = E_ε ‖f̂ − f‖²_H. But it includes the unknown target function f, so direct estimation is not straightforward. To cope with this problem:

  • We estimate only its essential part. Expanding ‖f̂ − f‖² = ‖f̂‖² − 2⟨f̂, f⟩ + ‖f‖², the term ‖f‖² is constant (independent of the model), so E_ε[‖f̂‖² − 2⟨f̂, f⟩] is the essential part.
  • We focus on the kernel regression model f̂(x) = Σ_{i=1}^n α_i K(x, x_i), where K is the reproducing kernel of H.
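One reason the kernel regression model is convenient: for f̂ = Σ_i α_i K(·, x_i), RKHS quantities such as ‖f̂‖²_H reduce to Gram-matrix algebra, since ⟨K(·, x_i), K(·, x_j)⟩ = K(x_i, x_j). A toy check, with a Gaussian kernel and two centers as illustrative assumptions:

```python
import math

def K(x, xp, c=1.0):
    # Gaussian reproducing kernel (illustrative width)
    return math.exp(-((x - xp) ** 2) / (2 * c ** 2))

centers = [0.0, 1.0]   # sample points x_i (illustrative)
alpha = [1.0, -0.5]    # expansion coefficients (illustrative)

# ||f^||_H^2 = sum_ij alpha_i alpha_j <K(., x_i), K(., x_j)> = alpha' G alpha
G = [[K(a, b) for b in centers] for a in centers]
norm_sq = sum(alpha[i] * G[i][j] * alpha[j]
              for i in range(len(alpha)) for j in range(len(alpha)))

# Gram matrices of a reproducing kernel are positive semidefinite.
assert norm_sq > 0
```

The same Gram-matrix reduction applies to the cross term ⟨f̂, f⟩, which is what makes the essential part estimable.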

SLIDE 11

A Key Lemma

For the kernel regression model, the essential generalization error can be expressed in a form involving the sampled values and a generalized inverse, from which the unknown target function is erased!

† : generalized inverse
E_ε : expectation over noise

SLIDE 12

Estimating the Essential Part

The expression in the lemma gives an unbiased estimator of the essential generalization error. However, it involves the noise vector, which is unknown. Replacing the noise-dependent part, we define a modified estimator; clearly, it is still unbiased. What remains is to handle the noise-dependent term well.

SLIDE 13

How to Deal with the Noise Term

Depending on the type of the learning operator, we consider the following three cases:

  A) The learning operator is linear.
  B) It is non-linear but twice almost differentiable.
  C) It is general non-linear.

SLIDE 14

A) Examples of Linear Learning Operator

  • Kernel ridge regression
  • A particular form of Gaussian process regression
  • Least-squares support vector machine

α : parameters to be learned
λ : ridge parameter
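A sketch of kernel ridge regression, the first example above. The learning operator mapping the observations y to f̂ is linear, since the coefficients are α = (G + λI)⁻¹y. The kernel width, data, and λ are illustrative assumptions:

```python
import numpy as np

def gram(x1, x2, c=0.5):
    """Gaussian kernel Gram matrix (width c is an illustrative choice)."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    return np.exp(-((x1[:, None] - x2[None, :]) ** 2) / (2 * c ** 2))

def kernel_ridge_fit(x, y, lam):
    """Solve (G + lam*I) alpha = y; note alpha is linear in y."""
    G = gram(x, x)
    return np.linalg.solve(G + lam * np.eye(len(x)), np.asarray(y, float))

def predict(alpha, x_train, x_new):
    """Evaluate f^(x) = sum_i alpha_i K(x, x_i) at new points."""
    return gram(x_new, x_train) @ alpha

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 30)
y = np.sinc(x) + 0.05 * rng.standard_normal(x.size)

alpha = kernel_ridge_fit(x, y, lam=0.1)
f_hat = predict(alpha, x, x)

# Linearity of the learning operator: doubling y doubles the solution.
alpha2 = kernel_ridge_fit(x, 2 * y, lam=0.1)
assert np.allclose(2 * alpha, alpha2)
```

The linearity checked at the end is exactly the property case A relies on.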

SLIDE 15

A) Linear Learning

When the learning operator is linear, the expectation over the noise can be computed. This induces the subspace information criterion (SIC), which is unbiased with finite samples:

  E_ε[SIC] = (essential gen. error)

  • M. Sugiyama & H. Ogawa (Neural Computation, 2001)
  • M. Sugiyama & K.-R. Müller (JMLR, 2002)

* : adjoint

SLIDE 16

How to Deal with the Noise Term

Depending on the type of the learning operator, we consider the following three cases:

  A) The learning operator is linear.
  B) It is non-linear but twice almost differentiable.
  C) It is general non-linear.

SLIDE 17

B) Examples of Twice Almost Differentiable Learning Operator

  • Support vector regression with Huber's loss

λ : ridge parameter
τ : threshold of Huber's loss
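Huber's loss is what makes this SVR variant twice almost differentiable: it is quadratic near zero and linear beyond a threshold, so its first derivative is continuous everywhere and its second derivative exists except at the two matching points. The threshold value below is an illustrative choice:

```python
def huber_loss(r, threshold=1.0):
    """Huber's loss: 0.5*r^2 for |r| <= threshold,
    else threshold*(|r| - 0.5*threshold).
    The derivative is continuous; the second derivative exists
    everywhere except at |r| = threshold."""
    if abs(r) <= threshold:
        return 0.5 * r * r
    return threshold * (abs(r) - 0.5 * threshold)

# The loss matches smoothly across the threshold (no jump in value).
eps = 1e-8
assert abs(huber_loss(1.0) - huber_loss(1.0 + eps)) < 1e-6
```

The two kinks at ±threshold are why the operator is "twice *almost* differentiable" rather than twice differentiable.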

SLIDE 18

B) Twice Differentiable Learning

For Gaussian noise, the expectation of the noise-dependent term can be evaluated, giving SIC for twice almost differentiable learning. It reduces to the original SIC if the learning operator is linear, and it is still unbiased with finite samples:

  E_ε[SIC] = (essential gen. error)

SLIDE 19

How to Deal with the Noise Term

Depending on the type of the learning operator, we consider the following three cases:

  A) The learning operator is linear.
  B) It is non-linear but twice almost differentiable.
  C) It is general non-linear.

SLIDE 20

C) Examples of General Non-Linear Learning Operator

  • Kernel sparse regression
  • Support vector regression with Vapnik's loss

SLIDE 21

C) General Non-Linear Learning

For a general non-linear learning operator, the noise term is approximated by the bootstrap. This gives the bootstrap approximation of SIC (BASIC). BASIC is almost unbiased:

E_B : expectation over bootstrap replications
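The bootstrap step behind BASIC, in a hedged sketch: an expectation that cannot be computed analytically is approximated by averaging a statistic over resampled replications of the data. The data and the statistic below are illustrative; this is the generic bootstrap, not the exact BASIC formula:

```python
import random

def bootstrap_expectation(data, statistic, n_reps=500, seed=0):
    """Approximate an expectation over bootstrap replications:
    resample the data with replacement n_reps times and
    average the statistic over the replications."""
    rng = random.Random(seed)
    n = len(data)
    total = 0.0
    for _ in range(n_reps):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        total += statistic(resample)
    return total / n_reps

data = [1.0, 2.0, 3.0, 4.0, 5.0]
est = bootstrap_expectation(data, lambda d: sum(d) / len(d))
```

Because the expectation is only approximated, the resulting criterion is *almost* unbiased rather than exactly unbiased, matching the claim above.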

SLIDE 22

Simulation: Learning the Sinc Function

  • H : Gaussian RKHS
  • Learning method: kernel ridge regression
  • λ : ridge parameter

SLIDE 23

Simulation: DELVE Data Sets

(Table: normalized test errors on DELVE data sets; red indicates the best or comparable results under a 95% t-test.)

SLIDE 24

Conclusions

We provided a functional analytic framework for regression, in which the generalization error is measured by the RKHS norm: J = E_ε ‖f̂ − f‖²_H. Within this framework, we derived a generalization error estimator called SIC.

  A) Linear learning (kernel ridge regression, Gaussian process regression, LS-SVM): SIC is exactly unbiased with finite samples.
  B) Twice almost differentiable learning (SVR with Huber's loss): SIC is exactly unbiased with finite samples.
  C) General non-linear learning (kernel sparse regression, SVR with Vapnik's loss): BASIC is almost unbiased.