Model Selection under Covariate Shift - PowerPoint PPT Presentation


  1. Model Selection under Covariate Shift
     Masashi Sugiyama (Tokyo Institute of Technology, Tokyo, Japan)
     Klaus-Robert Müller (Fraunhofer FIRST, Berlin, Germany / University of Potsdam, Potsdam, Germany)

  2. Standard Regression Problem
     - Learning target function
     - Training examples
     - Test input
     - Goal: obtain an approximation that minimizes the expected error over test inputs (the generalization error)
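A minimal formal sketch of this setup; the slide's own formulas did not survive, so the notation below (including p_test) is assumed rather than quoted:

```latex
% Standard regression setup (notation assumed)
y_i = f(x_i) + \epsilon_i, \qquad i = 1, \dots, n,
\qquad \text{training examples } \{(x_i, y_i)\}_{i=1}^{n}
% Goal: find \hat{f} minimizing the generalization error over test inputs
G \;=\; \mathbb{E}_{x \sim p_{\mathrm{test}}}\!\left[\bigl(\hat{f}(x) - f(x)\bigr)^{2}\right]
```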

  3. Training Input Distribution
     - Common assumption: training inputs follow the same distribution as test inputs.
     - Here, we suppose the two distributions are different: covariate shift.
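In symbols (a standard formalization, assumed rather than taken from the slide): only the input distribution shifts, while the input-output relation stays fixed.

```latex
p_{\mathrm{train}}(x) \;\neq\; p_{\mathrm{test}}(x),
\qquad
p(y \mid x) \ \text{identical for training and test}
```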

  4. Covariate Shift
     - Is covariate shift important to investigate? Yes: it often happens in reality.
       - Interpolation / extrapolation
       - Active learning (experimental design)
       - Classification from imbalanced data

  5. Ordinary Least Squares under Covariate Shift
     - Asymptotically unbiased if the model is correct.
     - Asymptotically biased for misspecified models.
     - Need to reduce the bias.

  6. Weighted Least Squares for Covariate Shift (Shimodaira, 2000)
     - Weight function (the ratio of test to training input densities): assumed known and strictly positive.
     - Asymptotically unbiased even for misspecified models.
     - Can have large variance.
     - Need to reduce the variance.
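A sketch of the importance-weighted objective behind this slide, under the usual assumption that the weight is the test-to-training density ratio:

```latex
w(x) \;=\; \frac{p_{\mathrm{test}}(x)}{p_{\mathrm{train}}(x)},
\qquad
\hat{\theta}_{\mathrm{WLS}}
  \;=\; \arg\min_{\theta} \sum_{i=1}^{n} w(x_i)\,\bigl(y_i - \hat{f}_{\theta}(x_i)\bigr)^{2}
```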

  7. λ-Weighted Least Squares (Shimodaira, 2000)
     - λ = 0 (ordinary LS): large bias, small variance.
     - λ = 1 (fully weighted LS): small bias, large variance.
     - Intermediate λ: a trade-off between the two.
     - λ should be chosen appropriately (model selection).
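A minimal numerical sketch of λ-weighted least squares for a linear-in-parameters model; the helper name, the toy data, and the stand-in density ratio are illustrative assumptions, and the true importance weights are taken as given.

```python
import numpy as np

def lambda_weighted_least_squares(Phi, y, w, lam):
    """Fit theta minimizing sum_i w_i**lam * (y_i - Phi[i] @ theta)**2.

    Phi : (n, b) design matrix of basis functions at the training inputs
    y   : (n,) training outputs
    w   : (n,) importance weights p_test(x_i) / p_train(x_i) (assumed known)
    lam : flattening parameter in [0, 1]; 0 = ordinary LS, 1 = fully weighted LS
    """
    d = w ** lam                                   # flattened weights
    # Closed-form weighted LS solution: (Phi^T D Phi)^{-1} Phi^T D y
    A = Phi.T @ (d[:, None] * Phi)
    b = Phi.T @ (d * y)
    return np.linalg.solve(A, b)

# Illustrative usage: sweep lam to trade bias (small lam) against variance (large lam).
rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=100)                     # hypothetical training inputs
y = np.sinc(x) + 0.1 * rng.normal(size=x.size)         # hypothetical noisy targets
Phi = np.vander(x, N=2, increasing=True)               # deliberately misspecified linear model
w = np.exp(-(x - 2.0) ** 2) / np.exp(-(x - 1.0) ** 2)  # stand-in density ratio, strictly positive
for lam in (0.0, 0.5, 1.0):
    print(lam, lambda_weighted_least_squares(Phi, y, w, lam))
```

λ = 0 recovers ordinary least squares and λ = 1 the fully weighted fit, so sweeping λ exposes the bias-variance trade-off described on the slide.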

  8. Generalization Error Estimation under Covariate Shift
     - λ is determined so that the (estimated) generalization error is minimized.
     - However, standard methods such as cross-validation are heavily biased under covariate shift.
     - Goal: derive a better estimator of the generalization error.
     - (Figure: true generalization error, cross-validation, proposed estimator.)
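For reference, a sketch of the baseline criticized here: ordinary (unweighted) k-fold cross-validation used to pick λ. The function name and fold scheme are illustrative assumptions, not from the slides.

```python
import numpy as np

def cv_select_lambda(Phi, y, w, lam_grid, n_folds=10, seed=0):
    """Pick the flattening parameter by ordinary k-fold CV (unweighted validation loss).

    Under covariate shift this criterion targets the training input
    distribution, which is why the slides report it as heavily biased.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    scores = []
    for lam in lam_grid:
        errs = []
        for k in range(n_folds):
            val = folds[k]
            tr = np.concatenate(folds[:k] + folds[k + 1:])
            d = w[tr] ** lam
            A = Phi[tr].T @ (d[:, None] * Phi[tr])
            b = Phi[tr].T @ (d * y[tr])
            theta = np.linalg.solve(A, b)
            errs.append(np.mean((y[val] - Phi[val] @ theta) ** 2))
        scores.append(np.mean(errs))
    return lam_grid[int(np.argmin(scores))]
```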

  9. Setting
     - I.i.d. noise with mean 0 and variance σ².
     - Linear regression model.
     - λ-weighted least squares.
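A plausible reconstruction of the lost formulas in common notation for this setting; the basis functions φ_j, design matrix Φ, and weight matrix D are my symbols:

```latex
\begin{gather*}
y_i = f(x_i) + \epsilon_i, \quad \mathbb{E}[\epsilon_i] = 0, \quad \operatorname{Var}[\epsilon_i] = \sigma^2
\quad \text{(i.i.d. noise)} \\
\hat{f}(x) = \sum_{j=1}^{b} \theta_j\, \varphi_j(x)
\quad \text{(linear regression model)} \\
\hat{\theta}_{\lambda}
  = \bigl(\Phi^{\top} D^{\lambda} \Phi\bigr)^{-1} \Phi^{\top} D^{\lambda} y,
\quad \Phi_{ij} = \varphi_j(x_i), \quad
D = \operatorname{diag}\bigl(w(x_1), \dots, w(x_n)\bigr)
\quad \text{($\lambda$-weighted LS)}
\end{gather*}
```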

  10. Decomposition of the Generalization Error
     - The generalization error splits into an accessible term, a term that must be estimated, and a constant term (ignored).
     - We estimate the term that is not directly accessible.
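One standard way to write such a decomposition, using the inner product induced by the test input density; this is my reconstruction, not the slide's exact expression:

```latex
\langle g, h \rangle := \int g(x)\, h(x)\, p_{\mathrm{test}}(x)\, dx,
\qquad
G = \|\hat{f} - f\|^{2}
  = \underbrace{\langle \hat{f}, \hat{f} \rangle}_{\text{accessible}}
  \; - \; 2\,\underbrace{\langle \hat{f}, f \rangle}_{\text{to be estimated}}
  \; + \; \underbrace{\langle f, f \rangle}_{\text{constant (ignored)}}
```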

  11. Orthogonal Decomposition of the Learning Target Function
     - The target function is decomposed into its orthogonal projection onto the model (expressed via the optimal parameter) plus a residual orthogonal to the model.
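In the same (assumed) notation, the decomposition reads:

```latex
f(x) = g(x) + \delta\, r(x),
\qquad
g(x) = \sum_{j=1}^{b} \theta^{*}_{j}\, \varphi_j(x)
\quad (\theta^{*}\text{: optimal parameter}),
\qquad
\langle r, \varphi_j \rangle = 0 \;\; \text{for all } j
```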

  12. Unbiased Estimation of the Term to Be Estimated
     - Expectation is taken over the noise.
     - Suppose we have an operator that gives a linear unbiased estimator of the optimal parameter, and an unbiased estimator of the noise variance.
     - Then we obtain an unbiased estimator of the term to be estimated.
     - But these quantities are not always available: use approximations instead.
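A sketch of how such an unbiased estimator is typically constructed in this line of work; everything below (the metric matrix U, the learning matrices L_λ and L_u, and the exact correction term) is my reconstruction under the stated assumptions, not necessarily the slide's formula.

```latex
% Assumptions: U_{jk} = \langle \varphi_j, \varphi_k \rangle;
% \hat{\theta}_{\lambda} = L_{\lambda} y; \hat{\theta}_{u} = L_{u} y with
% \mathbb{E}_{\epsilon}[\hat{\theta}_{u}] = \theta^{*}; \hat{\sigma}^{2} unbiased for \sigma^{2}.
\mathbb{E}_{\epsilon}\!\left[
  \hat{\theta}_{\lambda}^{\top} U\, \hat{\theta}_{u}
  - \hat{\sigma}^{2}\, \operatorname{tr}\!\bigl(U L_{u} L_{\lambda}^{\top}\bigr)
\right]
 = \mathbb{E}_{\epsilon}\!\bigl[\hat{\theta}_{\lambda}\bigr]^{\top} U\, \theta^{*}
 = \mathbb{E}_{\epsilon}\!\bigl[\langle \hat{f}, g \rangle\bigr]
```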

  13. Approximations
     - If the model is correct, …
     - If the model is misspecified, …

  14. New Generalization Error Estimator
     - Bias of the proposed estimator:
       - If the model is correct: exactly unbiased.
       - If the model is almost correct: …
       - If the model is misspecified: asymptotically unbiased.

  15. Simulation (Toy)

  16. Results
     - (Figure: true generalization error, 10-fold cross-validation, and the proposed estimator.)

  17. Simulation (Abalone from DELVE)
     - Estimate the age of abalones from 7 physical measurements.
     - We add a bias to the 4th attribute (weight of abalones).
     - Training and test input densities are estimated by a standard kernel density estimator.
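A sketch of the density estimation step mentioned here, using scipy's Gaussian KDE as a stand-in implementation (the slides do not say which estimator was used); the helper name and the eps guard are my additions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def importance_weights(x_train, x_test, eps=1e-12):
    """Estimate w(x) = p_test(x) / p_train(x) at the training inputs.

    x_train, x_test : arrays of shape (n_samples, n_features)
    Densities are estimated separately with Gaussian KDEs, as on the slide;
    eps guards against division by a vanishing training density.
    """
    p_train = gaussian_kde(x_train.T)   # gaussian_kde expects (n_features, n_samples)
    p_test = gaussian_kde(x_test.T)
    return p_test(x_train.T) / np.maximum(p_train(x_train.T), eps)
```

The resulting weights can then be plugged into the λ-weighted least-squares sketch above.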

  18. Generalization Error Estimation
     - (Figure: mean over 300 trials of the true generalization error, 10-fold CV, and the proposed estimator.)

  19. Test Error After Model Selection (mean ± std; differences assessed by t-test at the 5% level)
     Extrapolation in the 4th attribute:
       n          50             200           800
       OPT        9.86 ± 4.27    7.40 ± 1.77   6.54 ± 1.34
       Proposed   11.67 ± 5.74   7.95 ± 2.15   6.77 ± 1.40
       10CV       10.88 ± 5.05   8.06 ± 1.91   7.24 ± 1.37
     Extrapolation in the 6th attribute:
       n          50             200           800
       OPT        9.04 ± 4.04    6.76 ± 1.68   6.05 ± 1.25
       Proposed   10.67 ± 6.19   7.31 ± 2.24   6.20 ± 1.33
       10CV       10.15 ± 4.95   7.42 ± 1.81   6.68 ± 1.25

  20. Conclusions
     - Covariate shift: training and test input distributions are different.
     - Ordinary LS: biased.
     - Weighted LS: unbiased but large variance.
     - λ-WLS: model selection of λ is needed.
     - Cross-validation: biased under covariate shift.
     - Proposed generalization error estimator:
       - Exactly unbiased for correct models.
       - Asymptotically unbiased for misspecified models.
