Estimating the Error at Given Test Input Points for Linear Regression (PowerPoint PPT Presentation)


SLIDE 1
  • Feb. 23-25, 2004

IASTED-NCI2004

Estimating the Error at Given Test Input Points for Linear Regression

Masashi Sugiyama Fraunhofer FIRST-IDA, Berlin, Germany Tokyo Institute of Technology, Tokyo, Japan

SLIDE 2

Regression Problem

From training examples {(x_i, y_i)} (i = 1, …, n), obtain a good approximation f̂(x) to the underlying function f(x).

  • f(x): underlying function
  • f̂(x): learned function
  • (x_i, y_i): training examples, where y_i = f(x_i) + ε_i and ε_i is noise

SLIDE 3

Typical Method of Learning

Linear regression model: f̂(x) = Σ_{i=1}^{p} α_i φ_i(x)

Ridge estimation: α̂ = argmin_α [ Σ_{j=1}^{n} (f̂(x_j) − y_j)² + λ ‖α‖² ]

  • α = (α_1, …, α_p): parameters
  • φ_i(x): fixed basis functions
  • λ: ridge parameter (model parameter)
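The linear model with ridge estimation can be sketched as follows (the Gaussian basis, its width, the sinc-like target, and all sizes are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def gaussian_design(x, centers, width=0.1):
    # Design matrix A with A[i, j] = phi_j(x_i): Gaussian basis function j at input i
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

def ridge_fit(A, y, lam):
    # Ridge estimator: alpha_hat = argmin ||A alpha - y||^2 + lam ||alpha||^2
    #                            = (A^T A + lam I)^{-1} A^T y
    p = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)                       # training inputs
y = np.sinc(6 * (x - 0.5)) + 0.05 * rng.normal(size=50)  # noisy sinc-like target
centers = np.linspace(0.0, 1.0, 10)                      # 10 fixed basis centers
A = gaussian_design(x, centers)
alpha_hat = ridge_fit(A, y, lam=1e-3)
train_mse = np.mean((A @ alpha_hat - y) ** 2)
```

Note that `A.T @ A + lam * np.eye(p)` is always invertible for positive λ, which is one practical benefit of ridge over plain least squares.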

SLIDE 4

Model Selection

(Figure: the learned function when λ is too small, appropriate, and too large, shown against the underlying function.)

The choice of the model is crucial for obtaining a good learned function!

SLIDE 5

Generalization Error

For model selection, we need a criterion that measures the 'closeness' between f̂ and f: the generalization error, e.g.,

∫ (f̂(x) − f(x))² p(x) dx

  • p(x): probability density of test input points

Determine the model so that an estimator of the unknown generalization error is minimized.
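The expectation over the test input density can be approximated by Monte Carlo. A minimal sketch (the underlying and learned functions here are illustrative stand-ins, and p(x) is assumed uniform on [0, 1]):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):      # underlying function (illustrative)
    return np.sinc(4 * (x - 0.5))

def f_hat(x):  # a slightly mis-fit learned function (illustrative)
    return np.sinc(4 * (x - 0.48))

# Generalization error: expected squared difference under the test input
# density p(x), estimated by averaging over samples drawn from p(x).
xs = rng.uniform(0.0, 1.0, size=200_000)   # draws from p(x)
gen_error = np.mean((f_hat(xs) - f(xs)) ** 2)
```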

SLIDE 6

Transductive Inference

Test input points are specified in advance. We do not have to estimate the entire function f(x); we only need to estimate the values of the function at the given test input points.

SLIDE 7

Model Selection for Transductive Inference

The test error at the given test input points is different from the generalization error. The model should be chosen so that the test error at those points only is minimized.

(Figure: a model can have a small generalization error but a large test error, or a large generalization error but a small test error.)

SLIDE 8

Goal of Our Research

We want to estimate the test error at the given test input points!

  • E_ε: expectation over the noise

SLIDE 9

Setting

Linear regression model: f̂(x) = Σ_{i=1}^{p} α_i φ_i(x)

Linear estimation: α̂ = X y, where y = (y_1, …, y_n)ᵀ

Realizability: f(x) = Σ_{i=1}^{p} α*_i φ_i(x)

  • α: parameters
  • φ_i(x): fixed basis functions
  • X: a matrix (it does not depend on y)
  • α*: unknown true parameters
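Ridge estimation is one instance of the "linear estimation" setting: the estimate is α̂ = X y for a matrix X that depends only on the design, not on the outputs. A minimal check (all sizes and the random design are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam = 30, 5, 0.1
A = rng.normal(size=(n, p))     # design matrix, A[i, j] = phi_j(x_i)
y = rng.normal(size=n)          # training outputs

# Ridge written as a linear estimator alpha_hat = X y:
# X is built from A and lam alone.
X = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T)
alpha_hat = X @ y

# Linearity: estimating from y1 + y2 equals the sum of the two estimates.
y1, y2 = rng.normal(size=n), rng.normal(size=n)
```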

SLIDE 10

Bias / Variance Decomposition

The expected test error decomposes into a (squared) bias term and a variance term:

E_ε ‖f̂ − f‖² = ‖E_ε f̂ − f‖² + E_ε ‖f̂ − E_ε f̂‖²
                    (bias)          (variance)

where f̂ and f denote the vectors of function values at the test input points.
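The decomposition can be checked numerically at a set of test points. In this sketch, B is an assumed test-point design matrix and all sizes are illustrative; averaging over noise realizations splits the test error exactly into a squared-bias and a variance part:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, m, sigma, lam = 40, 5, 8, 0.2, 0.5
A = rng.normal(size=(n, p))          # training design matrix
B = rng.normal(size=(m, p))          # design matrix at the m test input points
alpha_star = rng.normal(size=p)      # true parameters (realizable case)
X = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T)   # ridge: alpha_hat = X y

trials = 20_000
Y = A @ alpha_star + sigma * rng.normal(size=(trials, n))  # one row per noise draw
preds = Y @ X.T @ B.T                # test-point predictions B X y, per draw
target = B @ alpha_star              # true values at the test points

test_error = np.mean(np.sum((preds - target) ** 2, axis=1))
mean_pred = preds.mean(axis=0)
bias2 = np.sum((mean_pred - target) ** 2)
variance = np.mean(np.sum((preds - mean_pred) ** 2, axis=1))
```

The identity `test_error == bias2 + variance` holds exactly (up to floating point) when the empirical mean prediction is used, since the cross term averages to zero.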

SLIDE 11

Tricks for Estimating Bias

The true parameter α* is unknown. We utilize an unbiased estimator of the true parameter, A† y, for estimating the bias.

Sugiyama & Ogawa (Neural Comp., 2001); Sugiyama & Müller (JMLR, 2002)

  • A: design matrix (A_{ij} = φ_j(x_i))
  • A†: generalized inverse of A
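Under realizability, A† y is indeed unbiased for the true parameter, since E[A† y] = A† A α* = α* when A has full column rank. A Monte Carlo sketch (sizes and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma = 40, 5, 0.3
A = rng.normal(size=(n, p))          # design matrix
alpha_star = rng.normal(size=p)      # unknown true parameters
Apinv = np.linalg.pinv(A)            # generalized (Moore-Penrose) inverse

trials = 20_000
Y = A @ alpha_star + sigma * rng.normal(size=(trials, n))
alpha_ls = Y @ Apinv.T               # A^+ y for each noise realization

# Averaging over the noise recovers alpha_star.
mean_estimate = alpha_ls.mean(axis=0)
```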

SLIDE 12

Unbiased Estimator of Bias

Plugging the unbiased parameter estimate A† y in place of the unknown α* gives a rough estimate of the bias; subtracting the extra noise contribution it introduces makes the estimate unbiased (its expectation over the noise equals the true bias term).

SLIDE 13

Unbiased Estimator of Variance

  • An unbiased estimator of the noise variance:
  • σ²: noise variance
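The slide's formula was an image and is not recoverable here, but a standard unbiased noise-variance estimator in this setting is the residual form σ̂² = ‖y − A A† y‖² / (n − p); whether this is exactly the slide's estimator is an assumption. A Monte Carlo sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma = 40, 5, 0.3
A = rng.normal(size=(n, p))
alpha_star = rng.normal(size=p)
P = A @ np.linalg.pinv(A)            # projection onto the column span of A

trials = 20_000
Y = A @ alpha_star + sigma * rng.normal(size=(trials, n))
resid = Y - Y @ P.T                  # residuals (I - A A^+) y, per realization
sigma2_hat = np.sum(resid ** 2, axis=1) / (n - p)
```

Dividing by n − p (rather than n) compensates for the p degrees of freedom absorbed by the fit, which is what makes the estimator unbiased.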
SLIDE 14

Unbiased Estimator of Test Error

Adding the bias and variance estimators, we obtain an unbiased estimator of the test error. For simplicity, we ignore constant terms.
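Putting the pieces together, one plausible form of such an estimator can be checked for unbiasedness by Monte Carlo. Here B is an assumed test-point design matrix, X the ridge matrix, and A† the pseudoinverse; this is a sketch of the construction under the stated setting, not necessarily the paper's exact formula:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, m, sigma, lam = 50, 5, 10, 0.2, 0.3
A = rng.normal(size=(n, p))                    # training design matrix
B = rng.normal(size=(m, p))                    # test-point design matrix
alpha_star = rng.normal(size=p)
X = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T)   # ridge: alpha_hat = X y
Apinv = np.linalg.pinv(A)
M = B @ (X @ A - np.eye(p)) @ Apinv            # maps y to a bias estimate

trials = 20_000
Y = A @ alpha_star + sigma * rng.normal(size=(trials, n))

# Unbiased noise-variance estimate, per realization
resid = Y - Y @ (A @ Apinv).T
s2 = np.sum(resid ** 2, axis=1) / (n - p)

# Estimator = (plug-in bias estimate, debiased) + (variance estimate)
bias_est = np.sum((Y @ M.T) ** 2, axis=1) - s2 * np.sum(M ** 2)
var_est = s2 * np.sum((B @ X) ** 2)
J_est = bias_est + var_est

# Actual test error per realization, for comparison
actual = np.sum((Y @ X.T @ B.T - B @ alpha_star) ** 2, axis=1)
```

Averaged over many noise draws, `J_est` should match the mean of `actual`: the σ̂²‖M‖² correction removes the noise inflation of the plug-in bias term, and σ̂²‖BX‖² estimates the variance term.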

SLIDE 15

Unrealizable Cases

So far, we have assumed that the model includes the underlying function. We can prove that even when this assumption is not rigorously fulfilled, the proposed estimator is still almost unbiased.

SLIDE 16

Simulation: Toy Data Sets

  • Basis functions: 10 Gaussian functions centered at equally spaced points.
  • Target function: sinc-like function (realizable).
  • Training examples:
  • Test input points:
  • Ridge estimation is used for learning.

SLIDE 17

Results (1)

(Figure: results plotted against the ridge parameter λ.)

SLIDE 18

Results (2)

(Figure: results plotted against the ridge parameter λ.)

SLIDE 19

Simulation: DELVE Data Sets

  • Training set: 100 randomly selected samples.
  • Test set: 50 randomly selected samples.
  • Basis functions: Gaussian functions centered at the first 50 training input points.
  • Ridge estimation is used for learning.
  • The ridge parameter is selected by the proposed method, leave-one-out cross-validation, or an empirical Bayesian method.
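As a sketch of how transductive ridge-parameter selection could look in code, the following picks the λ that minimizes an estimated test error of the form sketched earlier. Synthetic Gaussian data stands in for the DELVE sets, and the estimator form and all sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, m, sigma = 100, 20, 50, 0.5        # sizes loosely echoing the DELVE setup
A = rng.normal(size=(n, p))              # training design matrix
B = rng.normal(size=(m, p))              # test-point design matrix
alpha_star = rng.normal(size=p)
y = A @ alpha_star + sigma * rng.normal(size=n)

Apinv = np.linalg.pinv(A)
s2 = np.sum((y - A @ (Apinv @ y)) ** 2) / (n - p)   # noise-variance estimate

def estimated_test_error(lam):
    # Unbiased-style estimate of the test error at the rows of B
    X = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T)
    M = B @ (X @ A - np.eye(p)) @ Apinv
    return (np.sum((M @ y) ** 2) - s2 * np.sum(M ** 2)
            + s2 * np.sum((B @ X) ** 2))

def actual_test_error(lam):
    # Uses alpha_star, which is unknown in practice; for checking only
    X = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T)
    return np.sum((B @ (X @ y) - B @ alpha_star) ** 2)

lams = np.logspace(-3, 3, 13)            # candidate ridge parameters
best = min(lams, key=estimated_test_error)
```

The selected λ should at least avoid the heavily oversmoothed end of the grid, where the squared bias at the test points dominates.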

SLIDE 20

Normalized Test Errors

Data set  | Proposed method | LOO cross-validation | Empirical Bayes
Boston    | 1.17 (0.54)     | 1.26 (0.58)          | 1.39 (0.59)
Bank-8fm  | 1.07 (0.29)     | 1.11 (0.32)          | 1.09 (0.31)
Kin-8nm   | 1.11 (0.27)     | 1.09 (0.24)          | 1.15 (0.24)
Kin-8fm   | 1.06 (0.32)     | 1.17 (0.36)          | 1.68 (0.48)
Bank-8nm  | 1.09 (0.51)     | 1.12 (0.56)          | 1.18 (0.60)

Values are mean (standard deviation); red in the original slide marks the best method and any others not significantly different from it (t-test at the 99% level).

The proposed method can be successfully applied to transductive model selection!

SLIDE 21

Conclusions

  • Model selection is usually carried out so that an estimated generalization error is minimized.
  • When test input points are specified in advance (transductive inference), it is natural to choose the model so that the test error only at those test input points is minimized.
  • We derived an unbiased estimator of the test error at given test input points.
  • Simulations showed that the proposed method works well in practical situations.