LEARNING UNDER COVARIATE SHIFT
Domain Adaptation, Transfer Learning, Data Shift, Concept Drift…
Marco Loog, Pattern Recognition Laboratory, Delft University of Technology


SLIDE 1

SLIDE 2

LEARNING UNDER COVARIATE SHIFT

Domain Adaptation, Transfer Learning, Data Shift, Concept Drift… Marco Loog Pattern Recognition Laboratory Delft University of Technology

SLIDE 3

SLIDE 4

SLIDE 5

Covariate Shift Assumption

Covariate shift defined via the posterior or via the labeling function:

P(Y|X) = Q(Y|X) vs. ℓ(X|P) = ℓ(X|Q) = ℓ(X)

Equivalent to the missing-at-random assumption:

P(S=1|X,Y) = P(S=1|X)
Standard [i.i.d.] setting: P(S=1|X,Y) = P(S=1)

SLIDE 6

Graphically Speaking

Covariate shift: P(S=1|X,Y) = P(S=1|X)
So a change of class priors is not covariate shift; that would be P(S=1|X,Y) = P(S=1|Y)

SLIDE 7

The Canonical Example

How much does it help, really, when the hypotheses considered are very nonparametric?

SLIDE 8

Importance Weighting : Basic Idea

Expected risk on test: ∫∫ L(x,y|θ) P(x,y) dx dy
Rewrite: ∫∫ L(x,y|θ) [P(x)/Q(x)] Q(x,y) dx dy
Empirical loss [on training]: ∑ᵢ L(xᵢ,yᵢ|θ) P(xᵢ)/Q(xᵢ)
Importance weights: P(xᵢ)/Q(xᵢ)
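The rewrite above can be sketched in a few lines: training points are drawn from Q, but each loss term is reweighted by P/Q so that the empirical loss estimates the test risk under P. A minimal sketch assuming both densities are known univariate Gaussians (the particular means and the linear model are illustrative choices, not from the slides):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Univariate normal density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)

# Training data drawn from Q = N(0, 1); true relation y = x + noise.
x_tr = rng.normal(0.0, 1.0, size=2000)
y_tr = x_tr + rng.normal(0.0, 0.1, size=2000)

# Importance weights w(x) = P(x) / Q(x), with test density P = N(1, 1).
w = gauss_pdf(x_tr, 1.0, 1.0) / gauss_pdf(x_tr, 0.0, 1.0)

def weighted_risk(theta):
    """Importance-weighted empirical squared loss: (1/n) sum_i w_i * L(x_i, y_i | theta)."""
    return np.mean(w * (y_tr - theta * x_tr) ** 2)
```

The weighted loss is minimized near the true coefficient, even though it is computed entirely on samples from Q.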

SLIDE 9

Estimation of Importance : E.g.

Estimate P(x) and Q(x) [normal distributions, Parzen densities, whatever] and compute the weights as w = P/Q
Sugiyama suggests estimating the weights directly:

Find w such that KL(P||wQ) is minimal [KLIEP]; Q and P are modelled by Parzen densities
More well-founded suggestions have been given by Huang, Smola, Cortes, Mohri, Mansour, et al.

Yet another approach is based on a very simple [Laplace-smoothed] nearest neighbor estimate
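The plug-in route in the first bullet might look as follows: fit a simple density to each sample separately and take the ratio at the training points. This is a sketch with maximum-likelihood Gaussian fits standing in for "normal distributions, Parzen densities, whatever"; in practice the ratio of two separately estimated densities can be badly behaved in the tails, which is exactly what motivates the direct estimators mentioned above:

```python
import numpy as np

def fit_gauss(x):
    """Maximum-likelihood fit of a univariate normal: sample mean and std."""
    return x.mean(), x.std()

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def plugin_weights(x_train, x_test):
    """Estimate w(x) = P_hat(x) / Q_hat(x) at the training points."""
    mu_q, sd_q = fit_gauss(x_train)   # Q: training density
    mu_p, sd_p = fit_gauss(x_test)    # P: test density
    return gauss_pdf(x_train, mu_p, sd_p) / gauss_pdf(x_train, mu_q, sd_q)

rng = np.random.default_rng(1)
x_train = rng.normal(0.0, 1.0, size=1000)
x_test = rng.normal(1.0, 1.0, size=1000)

# Training points lying where the test density is high get large weights.
w_hat = plugin_weights(x_train, x_test)
```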

SLIDE 10

Again! A Shameless Plug…

But only a short one this time…
Nearest neighbor weighting [NNeW]
The idea…
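One reading of the nearest-neighbor weighting idea (a sketch of the general principle, not the paper's exact estimator): give each training point a weight proportional to the number of test points for which it is the nearest training point, with Laplace smoothing so that no weight is exactly zero.

```python
import numpy as np

def nn_weights(x_train, x_test, smooth=1.0):
    """Nearest-neighbor importance weights (sketch).

    Each training point is weighted by the number of test points whose
    nearest training point it is, Laplace-smoothed by `smooth`, then
    normalized to mean 1.
    """
    # Pairwise distances: rows index test points, columns training points.
    d = np.abs(x_test[:, None] - x_train[None, :])
    nearest = d.argmin(axis=1)  # for each test point: index of nearest training point
    counts = np.bincount(nearest, minlength=len(x_train))
    w = counts + smooth
    return w / w.mean()

rng = np.random.default_rng(2)
x_train = rng.normal(0.0, 1.0, size=500)
x_test = rng.normal(1.0, 1.0, size=500)
w = nn_weights(x_train, x_test)
```

Note that no densities are estimated at all; the counts play the role of the ratio P/Q directly.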

SLIDE 11

“Optimal” Weights

Linear regression example

Find the coefficient θ that relates y to x via y = θx + ɛ
Optimal θ = 1
Squared loss
Assume one knows the true P(X) and Q(X)

For a particular weighting, the solution can be found by means of weighted regression

[Figure: training and test densities P and Q]
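For this one-parameter model the weighted-regression solution has a simple closed form: minimizing ∑ᵢ wᵢ(yᵢ − θxᵢ)² gives θ̂ = ∑ᵢ wᵢxᵢyᵢ / ∑ᵢ wᵢxᵢ². A sketch under the slide's assumptions (both densities known, optimal θ = 1; the particular Gaussians and noise level are my own illustrative choices):

```python
import numpy as np

def gauss_pdf(t, mu, sigma):
    return np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def weighted_theta(x, y, w):
    """Minimizer of sum_i w_i * (y_i - theta * x_i)^2."""
    return np.sum(w * x * y) / np.sum(w * x * x)

rng = np.random.default_rng(3)
n = 5000
x = rng.normal(0.0, 1.0, size=n)       # training inputs, density N(0, 1)
y = x + rng.normal(0.0, 0.5, size=n)   # optimal theta = 1

# "True" weights: (test density) / (training density), test density N(1, 1).
w = gauss_pdf(x, 1.0, 1.0) / gauss_pdf(x, 0.0, 1.0)

theta_unweighted = weighted_theta(x, y, np.ones(n))
theta_weighted = weighted_theta(x, y, w)
# Both land near 1 here because the linear model is correctly specified;
# under misspecification the two solutions would differ.
```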

SLIDE 12

Learning Curve for “Optimal” Weights

Using the true weights Q/P, what behavior do we expect for increasing sample sizes?
Let us consider relative improvements: MSE(Q)/MSE(P)

1 training sample? Many [say ∞] training samples? And in between?
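One way to get a feel for this learning-curve question is a small Monte Carlo simulation on the toy regression model: estimate the MSE of the weighted and unweighted θ̂ at a given training-set size and look at their ratio. This is a sketch; the densities, noise level, and the ratio reported are my own illustrative choices, not the slide's exact experiment:

```python
import numpy as np

def gauss_pdf(t, mu, sigma):
    return np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def theta_hat(x, y, w):
    """Weighted least squares for y = theta * x."""
    return np.sum(w * x * y) / np.sum(w * x * x)

def mse_ratio(n, reps=500, seed=0):
    """Monte Carlo estimate of MSE(weighted) / MSE(unweighted) for theta_hat."""
    rng = np.random.default_rng(seed)
    err_w, err_u = [], []
    for _ in range(reps):
        x = rng.normal(0.0, 1.0, size=n)      # training density N(0, 1)
        y = x + rng.normal(0.0, 0.5, size=n)  # optimal theta = 1
        # True (test density) / (training density) weights, test = N(1, 1).
        w = gauss_pdf(x, 1.0, 1.0) / gauss_pdf(x, 0.0, 1.0)
        err_w.append((theta_hat(x, y, w) - 1.0) ** 2)
        err_u.append((theta_hat(x, y, np.ones(n)) - 1.0) ** 2)
    return np.mean(err_w) / np.mean(err_u)
```

Evaluating `mse_ratio` over a grid of `n` traces out the learning curve; in this correctly specified setting, weighting only inflates the variance of the estimate, which illustrates the remark later in the deck that the weighted version can deteriorate even with the true weights.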

SLIDE 13

As a Side Remark

Can we solve semi-supervised learning by importance weighting?

[Earlier references to Sokolovska and Kawakita]

SLIDE 14

[Further] Questions, Remarks, etc.

What problems can be modelled as covariate shift?
What if P(S=1|X,Y) cannot be simplified?
Bickel et al. take Sugiyama et al. a step further, and discrepancy minimization makes yet another step
The weighted version can deteriorate even if the “true” weights are used
Correction by weighting might have hardly any influence when nonparametric hypotheses are considered
When to use weighting in the first place?

SLIDE 15

References

  • Ben-David, Blitzer, Crammer, Kulesza, Pereira, Vaughan, “A theory of learning from different domains,” ML, 2010
  • Ben-David, Lu, Pál, “Impossibility theorems for domain adaptation,” AISTATS, 2010
  • Ben-David, Urner, “On the hardness of domain adaptation and the utility of unlabeled target samples,” ALT, 2012
  • Bickel, Brückner, Scheffer, “Discriminative learning under covariate shift,” JMLR, 2009
  • Cortes, Mohri, “Domain adaptation and sample bias correction theory and algorithm for regression,” Theoretical CS, 2014
  • Daumé III, “Frustratingly easy domain adaptation,” ACL, 2009
  • Dinh, Duin, Piqueras-Salazar, Loog, “FIDOS: A generalized Fisher based feature extraction method for domain shift,” PR, 2013
  • Gama, Zliobaite, Bifet, Pechenizkiy, Bouchachia, “A survey on concept drift adaptation,” ACM CSUR, 2014
  • Jiang, “A literature survey on domain adaptation of statistical classifiers,” 2008
  • Loog, “Nearest neighbor-based importance weighting,” MLSP, 2012
  • Lu, Behbood, Hao, Zuo, Xue, Zhang, “Transfer Learning using Computational Intelligence: A Survey,” KBS, 2015
  • Mansour, Mohri, Rostamizadeh, “Domain adaptation: Learning bounds and algorithms,” COLT, 2009
  • Margolis, “A literature review of domain adaptation with unlabeled data,” University of Washington, TR 35, 2010
  • Pan, Tsang, Kwok, Yang, “Domain adaptation via transfer component analysis,” IEEE TNN, 2011
  • Pan, Yang, “A survey on transfer learning,” IEEE TKDE, 2010
  • Quiñonero-Candela, Sugiyama, Schwaighofer, Lawrence, “Dataset shift in machine learning,” The MIT Press, 2009
  • Shimodaira, “Improving predictive inference under covariate shift by weighting the log-likelihood function,” J. Stat. Plan. Inference, 2000
  • Sugiyama, Krauledat, & Müller, “Covariate shift adaptation by importance weighted cross validation,” JMLR, 2007
  • Torrey, Shavlik, “Transfer learning,” Handbook of Research on ML Applications and Trends, 2009