

1. Mind the class weight bias: weighted maximum mean discrepancy for unsupervised domain adaptation — Hongliang Yan, 2017/06/21

2. Domain Adaptation (DA)
   Problem: the training set (source domain) and the test set (target domain) are related but drawn from different distributions.
   Methodology: learn a feature space that combines discriminativeness and domain invariance, i.e., minimize source error + domain discrepancy.
   Figure 1. Illustration of dataset bias. [1] https://cs.stanford.edu/~jhoffman/domainadapt/

3. Maximum Mean Discrepancy (MMD)
   • Represents the distance between two distributions as the distance between mean embeddings of features:
     $$\mathrm{MMD}^2(s, t) = \sup_{\|\phi\|_{\mathcal{H}} \le 1} \left\| E_{x^s \sim s}[\phi(x^s)] - E_{x^t \sim t}[\phi(x^t)] \right\|_{\mathcal{H}}^2$$
   • An empirical estimate:
     $$\mathrm{MMD}^2(D_s, D_t) = \left\| \frac{1}{M} \sum_{i=1}^{M} \phi(x_i^s) - \frac{1}{N} \sum_{j=1}^{N} \phi(x_j^t) \right\|_{\mathcal{H}}^2$$
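In practice the mean embeddings are never formed explicitly: expanding the squared norm turns the estimate into three kernel means. Below is a minimal NumPy sketch of this biased estimator, assuming an RBF kernel with a hand-picked bandwidth `gamma` (the slides do not specify the kernel; multi-kernel variants are common in practice):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd2(Xs, Xt, gamma=1.0):
    # Biased empirical MMD^2: expanding ||(1/M) sum phi(x_s) - (1/N) sum phi(x_t)||^2
    # with the kernel trick gives three kernel-mean terms.
    M, N = len(Xs), len(Xt)
    return (rbf_kernel(Xs, Xs, gamma).sum() / M ** 2
            + rbf_kernel(Xt, Xt, gamma).sum() / N ** 2
            - 2.0 * rbf_kernel(Xs, Xt, gamma).sum() / (M * N))
```

As a sanity check, `mmd2` on two samples from the same standard normal is close to zero, while a mean-shifted second sample yields a clearly larger value.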

4. Motivation
   • Class weight bias across domains remains unsolved but ubiquitous. Recall the empirical estimate:
     $$\mathrm{MMD}^2(D_s, D_t) = \left\| \frac{1}{M} \sum_{i=1}^{M} \phi(x_i^s) - \frac{1}{N} \sum_{j=1}^{N} \phi(x_j^t) \right\|_{\mathcal{H}}^2$$

5. Motivation (cont.)
   • Grouping samples by class, the same estimate can be rewritten in terms of class weights:
     $$\mathrm{MMD}^2(D_s, D_t) = \left\| \sum_{c=1}^{C} w_c^s\, E_c[\phi(x^s)] - \sum_{c=1}^{C} w_c^t\, E_c[\phi(x^t)] \right\|_{\mathcal{H}}^2, \quad w_c^s = \frac{M_c}{M},\ w_c^t = \frac{N_c}{N},$$
     where $E_c[\cdot]$ denotes the class-conditional expectation for class $c$.
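The class weights in this decomposition are just empirical label frequencies. A one-line NumPy sketch (a hypothetical helper, not from the slides):

```python
import numpy as np

def class_weights(labels, num_classes):
    # w_c = (# samples of class c) / (total samples),
    # i.e. w_c^s = M_c / M on the source and w_c^t = N_c / N on the target.
    return np.bincount(labels, minlength=num_classes) / len(labels)
```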

6. Motivation (cont.)
   • The effect of class weight bias should be removed:
     ① It arises from changes in sample selection criteria.

7. Motivation (cont.)
   Figure 2. Class prior distributions of three digit recognition datasets.

8. Motivation (cont.)
     ② Applications are often not concerned with the class prior distribution.

9. Motivation (cont.)
   • Consequently, MMD can be minimized either by learning a domain-invariant representation or by merely preserving the class weights of the source domain; the latter does not help adaptation, which motivates removing the class weight bias.

10. Weighted MMD
   Main idea: reweight the classes in the source domain so that they have the same class weights as the target domain.
   • Introduce an auxiliary weight $\alpha_c$ for each class $c$ in the source domain, chosen such that $\alpha_c\, w_c^s = w_c^t$. The class-level comparison then becomes
     $$\left\| \sum_{c=1}^{C} \alpha_c\, w_c^s\, E_c[\phi(x^s)] - \sum_{c=1}^{C} w_c^t\, E_c[\phi(x^t)] \right\|_{\mathcal{H}}^2$$

11. Weighted MMD (cont.)
   • With $\alpha_c = w_c^t / w_c^s$, the weighted empirical estimate reweights each source sample by its class weight:
     $$\mathrm{MMD}_w^2(D_s, D_t) = \left\| \frac{1}{M} \sum_{i=1}^{M} \alpha_{y_i^s}\, \phi(x_i^s) - \frac{1}{N} \sum_{j=1}^{N} \phi(x_j^t) \right\|_{\mathcal{H}}^2 = \left\| \sum_{c=1}^{C} w_c^t\, E_c[\phi(x^s)] - \sum_{c=1}^{C} w_c^t\, E_c[\phi(x^t)] \right\|_{\mathcal{H}}^2$$
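A kernelized sketch of this weighted estimate, reusing `rbf_kernel` from the earlier snippet. Here `wt_target` stands for the target class weights $w_c^t$; in the full method these are estimated from pseudo labels (slides 15–16), but this sketch simply takes them as input:

```python
def weighted_mmd2(Xs, ys, Xt, wt_target, gamma=1.0):
    # Weighted MMD^2 sketch: source samples are reweighted by
    # alpha_c = w_c^t / w_c^s so that source class weights match the target's.
    M, N = len(Xs), len(Xt)
    ws = np.bincount(ys, minlength=len(wt_target)) / M   # w_c^s = M_c / M
    alpha = wt_target / np.maximum(ws, 1e-12)            # alpha_c = w_c^t / w_c^s
    a = alpha[ys] / M                                    # per-sample weights (sum to ~1)
    return (a @ rbf_kernel(Xs, Xs, gamma) @ a
            + rbf_kernel(Xt, Xt, gamma).sum() / N ** 2
            - 2.0 * (a @ rbf_kernel(Xs, Xt, gamma)).sum() / N)
```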

12. Weighted DAN
   1. Replace the MMD terms in DAN [4] with weighted MMD. The DAN objective is
      $$\min_{W} \frac{1}{M} \sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \lambda \sum_{l \in \{l_1,\dots,l_L\}} \mathrm{MMD}^2(D_s^l, D_t^l)$$
   [4] Long M., Cao Y., Wang J. Learning Transferable Features with Deep Adaptation Networks. ICML, 2015.

13. Weighted DAN (cont.)
   Replacing MMD with its weighted counterpart gives
   $$\min_{W, \boldsymbol{\alpha}} \frac{1}{M} \sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \lambda \sum_{l \in \{l_1,\dots,l_L\}} \mathrm{MMD}_w^2(D_s^l, D_t^l)$$

14. Weighted DAN (cont.)
   2. To further exploit the unlabeled target data, an empirical risk on the target domain is added, as in the semi-supervised model of [5]:
   $$\min_{W, \boldsymbol{\alpha}, \{\hat y_j^t\}_{j=1}^{N}} \frac{1}{M} \sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \frac{1}{N} \sum_{j=1}^{N} \ell(x_j^t, \hat y_j^t; W) + \lambda \sum_{l \in \{l_1,\dots,l_L\}} \mathrm{MMD}_w^2(D_s^l, D_t^l)$$
   [5] Amini, Massih-Reza, and Patrick Gallinari. Semi-supervised logistic regression. Proceedings of the 15th European Conference on Artificial Intelligence (ECAI). IOS Press, 2002.
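A minimal PyTorch sketch of one mini-batch step of this objective. Everything here is an assumption for illustration: `model` is hypothetical and is assumed to return (list of adaptation-layer features, logits); the weighted MMD uses a linear kernel for brevity rather than the (unspecified) kernel of the slides; `lam` plays the role of $\lambda$:

```python
import torch
import torch.nn.functional as F

def linear_wmmd2(fs, ys, ft, alpha):
    # Weighted MMD^2 with a linear kernel (a simplification): compare the
    # alpha-reweighted source feature mean with the plain target feature mean.
    a = alpha[ys] / fs.shape[0]                          # alpha_{y_i} / M per sample
    return ((a[:, None] * fs).sum(0) - ft.mean(0)).pow(2).sum()

def wdan_step(model, opt, xs, ys, xt, yt_pseudo, alpha, lam=1.0):
    # One mini-batch step of the weighted-DAN objective (hypothetical model API).
    feats_s, logits_s = model(xs)
    feats_t, logits_t = model(xt)
    loss = F.cross_entropy(logits_s, ys)                 # source empirical risk
    loss = loss + F.cross_entropy(logits_t, yt_pseudo)   # target risk on pseudo labels
    for fs, ft in zip(feats_s, feats_t):                 # weighted-MMD penalty per layer
        loss = loss + lam * linear_wmmd2(fs, ys, ft, alpha)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return float(loss)
```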

15. Optimization: an extension of CEM [6]
   The parameters to be estimated comprise three parts: $W$, $\boldsymbol{\alpha}$, and $\{\hat y_j^t\}_{j=1}^{N}$. The model is optimized by alternating between three steps.
   • E-step: with $W$ fixed, estimate the class posterior probability of each target sample:
     $$p(y_j^t = c \mid x_j^t) \approx g_c(x_j^t; W)$$
   [6] Celeux, Gilles, and Gérard Govaert. A classification EM algorithm for clustering and two stochastic versions. Computational Statistics & Data Analysis 14.3 (1992): 315-332.

16. Optimization (cont.)
   • C-step:
     ① Assign pseudo labels on the target domain: $\hat y_j^t = \arg\max_c\, p(y_j^t = c \mid x_j^t)$.
     ② Update the auxiliary class-specific weights for the source domain:
        $$\alpha_c = \hat w_c^t / w_c^s, \quad \text{where } \hat w_c^t = \frac{1}{N} \sum_{j=1}^{N} \mathbb{1}_c(\hat y_j^t)$$
        and $\mathbb{1}_c(x)$ is an indicator function that equals 1 if $x = c$ and 0 otherwise.
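A small NumPy sketch of this C-step (the posteriors `probs_t` are whatever the E-step produced, e.g. softmax outputs):

```python
def c_step(probs_t, ys, num_classes):
    # C-step: harden the E-step posteriors into pseudo labels, then refresh
    # the class-specific weights alpha_c = \hat w_c^t / w_c^s.
    yt_pseudo = probs_t.argmax(axis=1)                                       # arg max_c p(y=c|x)
    wt_hat = np.bincount(yt_pseudo, minlength=num_classes) / len(yt_pseudo)  # \hat w_c^t
    ws = np.bincount(ys, minlength=num_classes) / len(ys)                    # w_c^s
    return yt_pseudo, wt_hat / np.maximum(ws, 1e-12)
```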

17. Optimization (cont.)
   • M-step: with $\boldsymbol{\alpha}$ and $\{\hat y_j^t\}_{j=1}^{N}$ fixed, update $W$. The problem is reformulated as
     $$\min_{W} \frac{1}{M} \sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \frac{1}{N} \sum_{j=1}^{N} \ell(x_j^t, \hat y_j^t; W) + \lambda \sum_{l \in \{l_1,\dots,l_L\}} \mathrm{MMD}_w^2(D_s^l, D_t^l)$$
     The gradients of all three terms are computable, so $W$ can be optimized with mini-batch SGD.
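Putting the three steps together, a hedged outline of the alternation on in-memory CPU tensors, reusing `c_step` and `wdan_step` from the sketches above (same hypothetical model API; data loading and shuffling are omitted):

```python
def train_wdan(model, opt, Xs, ys, Xt, num_classes, epochs=10, batch=64):
    # Alternating optimization sketch: E-step, C-step, then mini-batch M-steps.
    for _ in range(epochs):
        with torch.no_grad():                                  # E-step: target posteriors
            probs_t = F.softmax(model(Xt)[1], dim=1).cpu().numpy()
        yt_pseudo, alpha = c_step(probs_t, ys.cpu().numpy(), num_classes)  # C-step
        yt_pseudo = torch.as_tensor(yt_pseudo)
        alpha = torch.as_tensor(alpha, dtype=torch.float32)
        for i in range(0, min(len(Xs), len(Xt)) - batch + 1, batch):       # M-step (SGD)
            sl = slice(i, i + batch)
            wdan_step(model, opt, Xs[sl], ys[sl], Xt[sl], yt_pseudo[sl], alpha)
    return model
```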

18. Experimental results
   • Comparison with the state of the art.
   Table 1. Experimental results on Office-10 + Caltech-10.

19. Experimental results
   • Empirical analysis.
   Figure 3. Performance of various models under different class weight bias.
   Figure 4. Visualization of the learned features of DAN and weighted DAN.

20. Summary
   • Introduced a class-specific weight into MMD to reduce the effect of class weight bias across domains.
   • Developed the WDAN model and optimized it in a CEM framework.
   • Weighted MMD can be applied to other scenarios where MMD is used as a distribution distance measure, e.g., image generation.
