

SLIDE 1

Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation

Kaichao You¹, Ximei Wang¹, Mingsheng Long¹, Michael I. Jordan²

¹School of Software, Tsinghua University · ¹National Engineering Lab for Big Data Software · ²University of California, Berkeley

International Conference on Machine Learning (ICML), 2019

Kaichao You et al Deep Embedded Validation June 12, 2019 1 / 10

SLIDE 2


Outline

1. Validation in UDA: the problem
2. IWCV: the previous solution
3. Deep Embedded Validation
4. Experiments


SLIDE 3

Validation in UDA: the problem

Supervised Learning


SLIDE 4

Validation in UDA: the problem

Supervised Learning vs. Unsupervised Domain Adaptation


SLIDE 5


Outline

1. Validation in UDA: the problem
2. IWCV: the previous solution
3. Deep Embedded Validation
4. Experiments


SLIDE 6

IWCV: the previous solution

Covariate Shift Assumption: p(y|x) = q(y|x)


SLIDE 7

IWCV: the previous solution

Covariate Shift Assumption: p(y|x) = q(y|x)
Model Selection: estimate the Target Risk R(g) = E_{x∼q}[ℓ(g(x), y)]


SLIDE 8

IWCV: the previous solution

Covariate Shift Assumption: p(y|x) = q(y|x)
Model Selection: estimate the Target Risk R(g) = E_{x∼q}[ℓ(g(x), y)]
Importance Weighted Cross Validation¹: E_{x∼p}[w(x)·ℓ(g(x), y)] = E_{x∼p}[(q(x)/p(x))·ℓ(g(x), y)] = E_{x∼q}[ℓ(g(x), y)] = R(g)

¹ Covariate shift adaptation by importance weighted cross validation, JMLR 2007
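The chain of equalities above can be checked numerically. A minimal sketch with 1-D Gaussians (the densities p, q and the loss function below are illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative source p = N(0, 1) and target q = N(1, 1).
def p_pdf(x): return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
def q_pdf(x): return np.exp(-(x - 1)**2 / 2) / np.sqrt(2 * np.pi)

def loss(x):  # stand-in for l(g(x), y): any bounded function of x
    return (np.sin(3 * x) > 0).astype(float)

n = 200_000
xs = rng.normal(0.0, 1.0, n)           # source validation samples
xt = rng.normal(1.0, 1.0, n)           # target samples (reference only)

w = q_pdf(xs) / p_pdf(xs)              # density ratio w(x) = q(x)/p(x)
iwcv_estimate = np.mean(w * loss(xs))  # E_{x~p}[w(x) * loss] = R(g)
target_risk = np.mean(loss(xt))        # direct Monte-Carlo target risk

print(iwcv_estimate, target_risk)      # agree up to sampling noise
```

The weighted source average matches the target average because the exact density ratio is used here; in practice the ratio itself must be estimated.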

SLIDE 9

IWCV: the previous solution

Covariate Shift Assumption: p(y|x) = q(y|x)
Model Selection: estimate the Target Risk R(g) = E_{x∼q}[ℓ(g(x), y)]
Importance Weighted Cross Validation¹: E_{x∼p}[w(x)·ℓ(g(x), y)] = E_{x∼p}[(q(x)/p(x))·ℓ(g(x), y)] = E_{x∼q}[ℓ(g(x), y)] = R(g)

Unbiased, but the variance is unbounded; moreover, the density ratio w(x) = q(x)/p(x) is not readily accessible.

¹ Covariate shift adaptation by importance weighted cross validation, JMLR 2007

SLIDE 10


Outline

1. Validation in UDA: the problem
2. IWCV: the previous solution
3. Deep Embedded Validation
4. Experiments


SLIDE 11

Deep Embedded Validation

IWCV's variance¹: Var_{x∼p}[ℓ·w] ≤ d_{α+1}(q‖p)·R(g)^{1−1/α} − R(g)²

¹ Learning Bounds for Importance Weighting, NeurIPS 2010

SLIDE 12

Deep Embedded Validation

IWCV's variance¹: Var_{x∼p}[ℓ·w] ≤ d_{α+1}(q‖p)·R(g)^{1−1/α} − R(g)²

Feature adaptation reduces distribution discrepancy²

¹ Learning Bounds for Importance Weighting, NeurIPS 2010
² Conditional Adversarial Domain Adaptation, NeurIPS 2018

SLIDE 13

Deep Embedded Validation

IWCV's variance¹: Var_{x∼p}[ℓ·w] ≤ d_{α+1}(q‖p)·R(g)^{1−1/α} − R(g)²

Feature adaptation reduces distribution discrepancy²

Control variate explicitly reduces the variance:
Let E[z] = ζ and E[t] = τ, and define z⋆ = z + η(t − τ).
E[z⋆] = E[z] + η·E[t − τ] = ζ + η·(E[t] − τ) = ζ
Var[z⋆] = Var[z + η(t − τ)] = η²·Var[t] + 2η·Cov(z, t) + Var[z]
min_η Var[z⋆] = (1 − ρ²_{z,t})·Var[z], attained at η̂ = −Cov(z, t)/Var[t]

¹ Learning Bounds for Importance Weighting, NeurIPS 2010
² Conditional Adversarial Domain Adaptation, NeurIPS 2018
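The control-variate derivation can be seen in action with a toy Monte-Carlo estimate; the variables z and t and their correlation below are illustrative, and η̂ is estimated from the samples themselves:

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated pair: z is the quantity of interest, t has known mean tau.
n, tau = 10_000, 0.0
t = rng.normal(tau, 1.0, n)
z = 2.0 * t + rng.normal(0.0, 0.5, n)  # E[z] = 0, strongly correlated with t

# Optimal coefficient eta_hat = -Cov(z, t) / Var[t], estimated from data.
eta_hat = -np.cov(z, t)[0, 1] / np.var(t)
z_star = z + eta_hat * (t - tau)

# z_star keeps the mean but has variance ~ (1 - rho^2) * Var[z].
rho = np.corrcoef(z, t)[0, 1]
print(np.mean(z), np.mean(z_star))
print(np.var(z), np.var(z_star))
```

The stronger the correlation ρ between z and t, the larger the variance reduction; with ρ² ≈ 0.94 here, z⋆ has roughly 6% of the variance of z.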

SLIDE 14

Deep Embedded Validation

IWCV's variance¹: Var_{x∼p}[ℓ·w] ≤ d_{α+1}(q‖p)·R(g)^{1−1/α} − R(g)²

Feature adaptation reduces distribution discrepancy²

Control variate explicitly reduces the variance:
Let E[z] = ζ and E[t] = τ, and define z⋆ = z + η(t − τ).
E[z⋆] = E[z] + η·E[t − τ] = ζ + η·(E[t] − τ) = ζ
Var[z⋆] = Var[z + η(t − τ)] = η²·Var[t] + 2η·Cov(z, t) + Var[z]
min_η Var[z⋆] = (1 − ρ²_{z,t})·Var[z], attained at η̂ = −Cov(z, t)/Var[t]

Density ratio can be estimated discriminatively.³

¹ Learning Bounds for Importance Weighting, NeurIPS 2010
² Conditional Adversarial Domain Adaptation, NeurIPS 2018
³ Discriminative learning for differing training and test distributions, ICML 2007
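One way to realize the discriminative idea (footnote 3): train a probabilistic source-vs-target classifier d(x) ≈ P(target | x); with equal sample sizes, q(x)/p(x) = d(x)/(1 − d(x)). A minimal numpy sketch with a hand-rolled logistic regression (the 1-D Gaussians and training hyperparameters below are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Source p = N(0, 1), target q = N(1, 1): the true log density ratio is
# log q(x)/p(x) = x - 0.5, linear in x, so logistic regression can recover it.
n = 5000
xs = rng.normal(0.0, 1.0, n)  # source, domain label 0
xt = rng.normal(1.0, 1.0, n)  # target, domain label 1
x = np.concatenate([xs, xt])
d = np.concatenate([np.zeros(n), np.ones(n)])

# Plain gradient-descent logistic regression: d_hat(x) = sigmoid(a*x + b).
a, b, lr = 0.0, 0.0, 0.5
for _ in range(3000):
    p_hat = 1.0 / (1.0 + np.exp(-(a * x + b)))
    grad = p_hat - d
    a -= lr * np.mean(grad * x)
    b -= lr * np.mean(grad)

def w(x):  # estimated density ratio q(x)/p(x), equal sample sizes assumed
    d_hat = 1.0 / (1.0 + np.exp(-(a * x + b)))
    return d_hat / (1.0 - d_hat)

print(a, b)    # approach the true coefficients 1.0 and -0.5 (up to noise)
print(w(0.5))  # true ratio at x = 0.5 is exp(0) = 1
```

The same recipe applies to high-dimensional inputs; with unequal sample sizes the ratio picks up a factor n_source/n_target.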

SLIDE 15


Outline

1. Validation in UDA: the problem
2. IWCV: the previous solution
3. Deep Embedded Validation
4. Experiments


SLIDE 16

Experiments

Experiments on a toy problem under covariate shift

[Figure: train/test input densities; y = f(x) with fitted models at λ = 0, 0.5, 1; error rate and standard deviation vs. λ for Source Risk, IWCV, Target Risk, and DEV]


SLIDE 17

Experiments

Experiments on a toy problem under covariate shift

[Figure: train/test input densities; y = f(x) with fitted models at λ = 0, 0.5, 1; error rate and standard deviation vs. λ for Source Risk, IWCV, Target Risk, and DEV]

Experiments on real-world problems

Various datasets: VisDA / Office / Digits
Various models: CDAN, MCD, GTA
Deep Embedded Validation is empirically validated
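As a rough illustration of how the ingredients combine, the sketch below estimates the target risk from labeled source samples, taking z = w·ℓ with the weights themselves as the control variate t (τ = E_p[w] = 1). This is a simplified sketch of the recipe, using analytic densities rather than the learned, feature-based ratios of the actual method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative source p = N(0, 1), target q = N(1, 1), loss as a function of x.
def p_pdf(x): return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
def q_pdf(x): return np.exp(-(x - 1)**2 / 2) / np.sqrt(2 * np.pi)
def loss(x): return (x > 0.5).astype(float)

n = 50_000
xs = rng.normal(0.0, 1.0, n)  # labeled source validation set
w = q_pdf(xs) / p_pdf(xs)     # density ratio, with E_p[w] = 1 known
z = w * loss(xs)              # importance-weighted losses

# Control variate t = w with known mean tau = 1; eta from the samples.
eta = -np.cov(z, w)[0, 1] / np.var(w)
dev_estimate = np.mean(z) + eta * (np.mean(w) - 1.0)

target_risk = np.mean(loss(rng.normal(1.0, 1.0, n)))  # reference value
print(dev_estimate, target_risk)
```

The control variate costs nothing extra (the weights are already computed) and shrinks the variance of the plain importance-weighted estimate whenever w and w·ℓ are correlated.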


SLIDE 18


Thanks!

Code available at github.com/thuml/Deep-Embedded-Validation
Poster: tonight at Pacific Ballroom #259
