Model Selection under Covariate Shift



SLIDE 1

Model Selection under Covariate Shift

Masashi Sugiyama, Tokyo Institute of Technology, Tokyo, Japan
Klaus-Robert Müller, Fraunhofer FIRST, Berlin, Germany, and University of Potsdam, Potsdam, Germany

SLIDE 2

Standard Regression Problem

Given: a learning target function, training examples, and a test input.
Goal: obtain an approximation of the target function that minimizes the expected error for test inputs (the generalization error).
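
In symbols, the setup can be sketched as follows. The notation ($f$, $x_i$, $y_i$, $\epsilon_i$, $t$, $p_{\mathrm{te}}$, $\hat{f}$, $G$) is an assumption introduced here for illustration, and squared error is assumed as the error measure.

```latex
% Assumed notation, not taken from the slide
y_i = f(x_i) + \epsilon_i, \qquad i = 1, \dots, n
\qquad \text{(training examples } \{(x_i, y_i)\}_{i=1}^{n}\text{)}

G = \int \bigl( \hat{f}(t) - f(t) \bigr)^2 \, p_{\mathrm{te}}(t) \, dt
\qquad \text{(generalization error for test inputs } t \sim p_{\mathrm{te}}\text{)}
```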

SLIDE 3

Training Input Distribution

Common assumption: training inputs follow the same distribution as test inputs.
Here, we suppose the two distributions are different. This situation is called covariate shift.
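
Written out, the usual definition of covariate shift reads as below. The symbols $p_{\mathrm{tr}}$ and $p_{\mathrm{te}}$ (training and test densities) are assumed names, and the second condition (the input-output relation stays the same) is the standard reading of "only the input distribution changes".

```latex
% Assumed notation: p_tr = training density, p_te = test density
p_{\mathrm{tr}}(x) \neq p_{\mathrm{te}}(x), \qquad
p_{\mathrm{tr}}(y \mid x) = p_{\mathrm{te}}(y \mid x)
```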

SLIDE 4

Covariate Shift

Is covariate shift important to investigate? Yes! It often happens in reality:

Interpolation / extrapolation
Active learning (experimental design)
Classification from imbalanced data

SLIDE 5

Ordinary Least Squares under Covariate Shift

Asymptotically unbiased if the model is correct.
Asymptotically biased for misspecified models.
Need to reduce bias.

SLIDE 6

Weighted Least Squares for Covariate Shift (Shimodaira, 2000)

Training and test input densities: assumed known and strictly positive.
Asymptotically unbiased even for misspecified models.
Can have large variance.
Need to reduce variance.

SLIDE 7

λ-Weighted Least Squares (Shimodaira, 2000)

λ = 0 (ordinary least squares): large bias, small variance.
0 < λ < 1: intermediate.
λ = 1 (weighted least squares): small bias, large variance.
λ should be chosen appropriately! (Model selection)
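
The trade-off can be made concrete by writing the objective with the importance weight raised to a power between 0 and 1, as in Shimodaira (2000). The symbols below ($\hat{f}(x; \alpha)$ for the model with parameter $\alpha$, $p_{\mathrm{tr}}$ and $p_{\mathrm{te}}$ for the training and test input densities) are assumed notation.

```latex
% Assumed notation; the exponent interpolates between OLS and fully weighted LS
\hat{\alpha}_{\lambda} = \arg\min_{\alpha} \sum_{i=1}^{n}
\left( \frac{p_{\mathrm{te}}(x_i)}{p_{\mathrm{tr}}(x_i)} \right)^{\lambda}
\bigl( \hat{f}(x_i; \alpha) - y_i \bigr)^2, \qquad 0 \le \lambda \le 1
```

Setting the exponent to 0 makes every weight equal to 1 (ordinary least squares); setting it to 1 applies the full importance weight.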

SLIDE 8

Generalization Error Estimation under Covariate Shift

λ is determined so that the (estimated) generalization error is minimized.
However, standard methods such as cross-validation are heavily biased.
Goal: derive a better estimator.

[Figure: cross-validation estimate, true generalization error, and proposed estimator]

SLIDE 9

Setting

Noise: i.i.d. with mean 0 and variance σ².
Model: linear regression model.
Learning method: λ-weighted least squares.
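
For a model that is linear in its parameters, the weighted least-squares solution has the standard closed form below. The names (basis functions $\varphi_j$, design matrix $X$, diagonal weight matrix $D$, exponent $\lambda$ on the weights) are assumptions introduced for illustration.

```latex
% Assumed notation: p basis functions, n training points
\hat{f}(x) = \sum_{j=1}^{p} \alpha_j \varphi_j(x), \qquad
X_{ij} = \varphi_j(x_i), \qquad
D = \operatorname{diag}\!\left( \frac{p_{\mathrm{te}}(x_i)}{p_{\mathrm{tr}}(x_i)} \right)

\hat{\alpha}_{\lambda} = \bigl( X^{\top} D^{\lambda} X \bigr)^{-1} X^{\top} D^{\lambda} y
```

The estimate is a linear function of the output vector $y$, which is what makes an analytic treatment of its bias and variance over the noise tractable.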
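
SLIDE 10

Decomposition of Generalization Error

The generalization error splits into three terms: one that is accessible, one that must be estimated, and one that is constant (ignored).
We estimate the generalization error without the constant term.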
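
One natural reading of the three labels is the expansion of the squared error under the test input density. The notation ($G$, $\hat{f}$, $f$, $p_{\mathrm{te}}$) is assumed, and the assignment of labels to terms is an interpretation rather than a quotation of the slide.

```latex
G = \int \bigl( \hat{f}(x) - f(x) \bigr)^2 p_{\mathrm{te}}(x)\,dx
  = \underbrace{\int \hat{f}(x)^2 \, p_{\mathrm{te}}(x)\,dx}_{\text{accessible}}
  \;-\; 2 \underbrace{\int \hat{f}(x)\, f(x) \, p_{\mathrm{te}}(x)\,dx}_{\text{to be estimated}}
  \;+\; \underbrace{\int f(x)^2 \, p_{\mathrm{te}}(x)\,dx}_{\text{constant (ignored)}}
```

The first term involves only the learned function and the test density, the second involves the unknown target, and the third does not depend on the learned function at all, so it plays no role when comparing models.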

SLIDE 11

Orthogonal Decomposition of Learning Target Function

The learning target function is decomposed into the best approximation within the model, given by the optimal parameter, and a residual orthogonal to the model.
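
A sketch of such a decomposition, assuming a model spanned by basis functions $\varphi_1, \dots, \varphi_p$ and orthogonality measured under the test input density $p_{\mathrm{te}}$. The symbols $g$, $r$, and $\alpha^{*}$ are assumed names.

```latex
% alpha^* minimizes the test-density-weighted squared distance to f
\alpha^{*} = \arg\min_{\alpha} \int \Bigl( \sum_{j=1}^{p} \alpha_j \varphi_j(x) - f(x) \Bigr)^2 p_{\mathrm{te}}(x)\,dx

f(x) = g(x) + r(x), \qquad
g(x) = \sum_{j=1}^{p} \alpha^{*}_j \varphi_j(x), \qquad
\int r(x)\, \varphi_j(x)\, p_{\mathrm{te}}(x)\,dx = 0 \quad (j = 1, \dots, p)
```

The orthogonality condition is the first-order optimality condition for $\alpha^{*}$. The model is correct exactly when the residual $r$ vanishes.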

SLIDE 12

Unbiased Estimation

Suppose we have a linear unbiased estimator of the optimal parameter and an unbiased estimator of the noise variance, where expectations are taken over the noise.
Then we have an unbiased estimator of the term that must be estimated.
But these two estimators are not always available, so approximations are used instead.

SLIDE 13

Approximations

Two cases are considered: the model is correct, and the model is misspecified.

SLIDE 14

New Generalization Error Estimator

The bias of the estimator is analyzed in three cases: the model is correct (exactly unbiased), the model is almost correct, and the model is misspecified (asymptotically unbiased).

SLIDE 15

Simulation (Toy)

SLIDE 16

Results

[Figure: 10-fold cross-validation, true generalization error, and proposed estimator]

SLIDE 17

Simulation (Abalone from DELVE)

Estimate the age of abalones from 7 physical measurements.
We add bias to the 4th attribute (weight of abalones).
Training and test input densities are estimated by a standard kernel density estimator.
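
A minimal sketch of the density-estimation step, assuming one-dimensional inputs and a Gaussian kernel with a rule-of-thumb bandwidth. The experiment on the slide uses 7 attributes and real data; the synthetic samples, the bandwidth rule, and all function names here are hypothetical stand-ins.

```python
import numpy as np


def normal_reference_bandwidth(samples):
    """Rule-of-thumb bandwidth 1.06 * std * n**(-1/5) for a Gaussian kernel."""
    samples = np.asarray(samples, dtype=float)
    return 1.06 * samples.std(ddof=1) * len(samples) ** (-0.2)


def gaussian_kde_1d(samples, bandwidth=None):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    samples = np.asarray(samples, dtype=float)
    h = normal_reference_bandwidth(samples) if bandwidth is None else bandwidth

    def density(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        # Pairwise standardized distances between query points and samples.
        z = (x[:, None] - samples[None, :]) / h
        kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
        return kernel.mean(axis=1) / h

    return density


rng = np.random.default_rng(0)
x_tr = rng.normal(0.0, 1.0, size=300)   # synthetic training inputs
x_te = rng.normal(1.0, 1.0, size=300)   # synthetic test inputs, shifted

p_tr_hat = gaussian_kde_1d(x_tr)
p_te_hat = gaussian_kde_1d(x_te)

# Estimated importance weights at the training points.
weights = p_te_hat(x_tr) / p_tr_hat(x_tr)
print(weights.min(), weights.max())
```

The resulting weights can be plugged into a weighted least-squares fit in place of the true density ratio. Errors in the two density estimates carry over into the weights, which matters more as the input dimension grows.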

SLIDE 18

Generalization Error Estimation

Mean over 300 trials.

[Figure: 10-fold cross-validation (10CV), true generalization error, and proposed estimator]

SLIDE 19

Test Error After Model Selection

Extrapolation in 4th attribute

n           50             200            800
OPT         9.86±4.27      7.40±1.77      6.54±1.34
10CV        10.88±5.05     8.06±1.91      7.24±1.37
Proposed    11.67±5.74     7.95±2.15      6.77±1.40

Extrapolation in 6th attribute

n           50             200            800
OPT         9.04±4.04      6.76±1.68      6.05±1.25
10CV        10.15±4.95     7.42±1.81      6.68±1.25
Proposed    10.67±6.19     7.31±2.24      6.20±1.33

T-test (5%)

SLIDE 20

Conclusions

Covariate shift: training and test input distributions are different.
Ordinary LS: biased.
Weighted LS: unbiased but large variance.
λ-WLS: model selection needed.
Cross-validation: biased.
Proposed generalization error estimator: exactly unbiased (correct models), asymptotically unbiased (misspecified models).