Mind the class weight bias: weighted maximum mean discrepancy for unsupervised domain adaptation
Hongliang Yan
2017/06/21
Domain Adaptation
[1] https://cs.stanford.edu/~jhoffman/domainadapt/
Figure 1. Illustration of dataset bias.
Problem:
- Training set (source) and test set (target) are related but under different distributions.
- Learn a feature space that combines discriminativeness and domain invariance.
Methodology:
- DA: minimize source error + domain discrepancy.
Maximum Mean Discrepancy (MMD)
- Definition:
  $\mathrm{MMD}(\mathcal{D}_s, \mathcal{D}_t) = \sup_{\|\phi\|_{\mathcal{H}} \le 1} \big\| \mathbb{E}_{x^s \sim \mathcal{D}_s}[\phi(x^s)] - \mathbb{E}_{x^t \sim \mathcal{D}_t}[\phi(x^t)] \big\|_{\mathcal{H}}$
- An empirical estimate:
  $\mathrm{MMD}^2(\mathcal{D}_s, \mathcal{D}_t) = \Big\| \frac{1}{M} \sum_{i=1}^{M} \phi(x_i^s) - \frac{1}{N} \sum_{j=1}^{N} \phi(x_j^t) \Big\|_{\mathcal{H}}^2$
- Representing distances between distributions as distances between mean embeddings of features.
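As a concrete illustration of the empirical estimate, here is a minimal NumPy sketch that uses the identity feature map (a linear kernel), so the squared MMD reduces to the distance between the two empirical mean embeddings; the function name and toy data are illustrative, not from the paper.

```python
import numpy as np

def linear_mmd2(xs, xt):
    """Squared MMD between two feature batches under phi = identity
    (a linear kernel): the squared distance between the two empirical
    mean embeddings.
    xs: (M, d) source features, xt: (N, d) target features."""
    mean_s = xs.mean(axis=0)   # (1/M) sum_i phi(x_i^s)
    mean_t = xt.mean(axis=0)   # (1/N) sum_j phi(x_j^t)
    diff = mean_s - mean_t
    return float(diff @ diff)

# toy usage: two Gaussians with shifted means give a clearly nonzero MMD^2
rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=(500, 16))
xt = rng.normal(0.5, 1.0, size=(400, 16))
print(linear_mmd2(xs, xt))
```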
Motivation
- Class weight bias across domains remains unsolved but ubiquitous.
- With class weights $w_c^s = M_c / M$ and $w_c^t = N_c / N$, the MMD can be rewritten in terms of class mean embeddings:
  $\mathrm{MMD}^2(\mathcal{D}_s, \mathcal{D}_t) = \Big\| \sum_{c=1}^{C} w_c^s\, \mathbb{E}_{x^s \sim \mathcal{D}_s^{(c)}}[\phi(x^s)] - \sum_{c=1}^{C} w_c^t\, \mathbb{E}_{x^t \sim \mathcal{D}_t^{(c)}}[\phi(x^t)] \Big\|_{\mathcal{H}}^2$
- The effect of class weight bias should be removed because:
  ① Class weights change with the sample selection criteria.
  ② Applications are not concerned with the class prior distribution.
Figure 2. Class prior distribution of three digit recognition datasets.
- MMD can be minimized either by learning a domain-invariant representation or by preserving the class weights of the source domain.
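To see why class weight bias by itself inflates the MMD, consider the toy sketch below: the class-conditional feature distributions are identical across the two domains, yet the standard MMD² is far from zero purely because the class priors differ (80/20 vs. 50/50). All numbers and names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_domain(priors, n, dim=8):
    """Draw n samples from a 2-class mixture whose class-conditional
    distributions are identical in every domain; only the priors differ."""
    labels = rng.choice(len(priors), size=n, p=priors)
    centers = np.array([0.0, 3.0])                       # shared class centers
    feats = centers[labels, None] + rng.normal(size=(n, dim))
    return feats, labels

xs, ys = sample_domain(priors=[0.8, 0.2], n=2000)        # source: 80/20 class split
xt, yt = sample_domain(priors=[0.5, 0.5], n=2000)        # target: 50/50 class split

# linear-kernel MMD^2: squared distance between empirical mean embeddings
diff = xs.mean(axis=0) - xt.mean(axis=0)
print("MMD^2 despite identical class-conditionals:", float(diff @ diff))
```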
Weighted MMD
- Main idea: reweight classes in the source domain so that they have the same class weights as the target domain.
- Introduce an auxiliary weight $\alpha_c = w_c^t / w_c^s$ for each class $c$ in the source domain:
  $\mathrm{MMD}_w^2(\mathcal{D}_s, \mathcal{D}_t) = \Big\| \sum_{c=1}^{C} w_c^t\, \mathbb{E}_{x^s \sim \mathcal{D}_s^{(c)}}[\phi(x^s)] - \sum_{c=1}^{C} w_c^t\, \mathbb{E}_{x^t \sim \mathcal{D}_t^{(c)}}[\phi(x^t)] \Big\|_{\mathcal{H}}^2$
- An empirical estimate:
  $\mathrm{MMD}_w^2(\mathcal{D}_s, \mathcal{D}_t) = \Big\| \frac{1}{M} \sum_{i=1}^{M} \alpha_{y_i^s}\, \phi(x_i^s) - \frac{1}{N} \sum_{j=1}^{N} \phi(x_j^t) \Big\|_{\mathcal{H}}^2$
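A minimal NumPy sketch of the empirical weighted MMD under the same linear feature map: each source sample is reweighted by α_{y_i^s} = w_c^t / w_c^s, so the source mean embedding is computed under the target class weights. The target class weights are passed in directly here for illustration; in the paper they are unknown and estimated from pseudo-labels.

```python
import numpy as np

def weighted_mmd2(xs, ys, xt, wt):
    """Empirical weighted MMD^2 with phi = identity (linear kernel).
    xs: (M, d) source features, ys: (M,) source labels,
    xt: (N, d) target features, wt: (C,) target class weights w_c^t."""
    classes = np.arange(len(wt))
    ws = np.array([(ys == c).mean() for c in classes])   # w_c^s = M_c / M
    alpha = wt / np.maximum(ws, 1e-12)                   # alpha_c = w_c^t / w_c^s
    mean_s = (alpha[ys][:, None] * xs).mean(axis=0)      # (1/M) sum_i alpha_{y_i^s} phi(x_i^s)
    mean_t = xt.mean(axis=0)                             # (1/N) sum_j phi(x_j^t)
    diff = mean_s - mean_t
    return float(diff @ diff)
```

On the toy data from the previous sketch, weighted_mmd2(xs, ys, xt, np.array([0.5, 0.5])) should fall back toward zero up to sampling noise, whereas the unweighted MMD² stayed inflated by the prior mismatch.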
Weighted DAN
- 1. Replace the MMD term in DAN[4] with the weighted MMD term:
  DAN: $\min_{\mathbf{W}} \ \frac{1}{M} \sum_{i=1}^{M} \mathcal{L}(x_i^s, y_i^s; \mathbf{W}) + \sum_{l \in \{l_1, \ldots, l_L\}} \lambda_l\, \mathrm{MMD}^2(\mathcal{D}_s^l, \mathcal{D}_t^l)$
  WDAN: $\min_{\mathbf{W}, \boldsymbol{\alpha}} \ \frac{1}{M} \sum_{i=1}^{M} \mathcal{L}(x_i^s, y_i^s; \mathbf{W}) + \sum_{l \in \{l_1, \ldots, l_L\}} \lambda_l\, \mathrm{MMD}_w^2(\mathcal{D}_s^l, \mathcal{D}_t^l)$
[4] Long M., Cao Y., Wang J. Learning Transferable Features with Deep Adaptation Networks. ICML, 2015.
Weighted DAN
- 2. To further exploit the unlabeled data in the target domain, the empirical risk is extended following the semi-supervised model in [5]:
  $\min_{\mathbf{W}, \{\hat{y}_j^t\}_{j=1}^{N}, \boldsymbol{\alpha}} \ \frac{1}{M} \sum_{i=1}^{M} \mathcal{L}(x_i^s, y_i^s; \mathbf{W}) + \frac{1}{N} \sum_{j=1}^{N} \mathcal{L}(x_j^t, \hat{y}_j^t; \mathbf{W}) + \sum_{l \in \{l_1, \ldots, l_L\}} \lambda_l\, \mathrm{MMD}_w^2(\mathcal{D}_s^l, \mathcal{D}_t^l)$
[5] Amini, Massih-Reza, and Patrick Gallinari. "Semi-supervised logistic regression." Proceedings of the 15th European Conference on Artificial Intelligence. IOS Press, 2002.
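To make the objective concrete, below is a hedged PyTorch-style sketch of the full WDAN loss (the step-1 objective is the same expression without the target term): source cross-entropy, target cross-entropy on the current pseudo-labels, and a linear-kernel weighted MMD penalty per adapted layer. All names (wdan_loss, weighted_mmd2, the per-layer feature lists, λ_l) are illustrative assumptions, not the authors' released Caffe implementation.

```python
import torch
import torch.nn.functional as F

def weighted_mmd2(fs, ys, ft, alpha):
    """Linear-kernel weighted MMD^2 on one layer's features.
    fs: (M, d) source features, ys: (M,) source labels,
    ft: (N, d) target features, alpha: (C,) class weights w_c^t / w_c^s."""
    mean_s = (alpha[ys].unsqueeze(1) * fs).mean(dim=0)   # (1/M) sum_i alpha_{y_i^s} phi(x_i^s)
    mean_t = ft.mean(dim=0)                              # (1/N) sum_j phi(x_j^t)
    return (mean_s - mean_t).pow(2).sum()

def wdan_loss(logits_s, ys, logits_t, yt_pseudo, feats_s, feats_t, alpha, lambdas):
    """Source risk + target risk on pseudo-labels + weighted MMD over layers.
    feats_s / feats_t: lists of per-layer features for the adapted layers."""
    loss = F.cross_entropy(logits_s, ys) + F.cross_entropy(logits_t, yt_pseudo)
    for fs, ft, lam in zip(feats_s, feats_t, lambdas):
        loss = loss + lam * weighted_mmd2(fs, ys, ft, alpha)
    return loss
```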
Optimization: an extension of CEM[6]
- Parameters to be estimated include three parts, i.e., $\mathbf{W}$, $\boldsymbol{\alpha}$, and $\{\hat{y}_j^t\}_{j=1}^{N}$. The model is optimized by alternating between three steps:
- E-step: with $\mathbf{W}$ fixed, estimate the class posterior probability of target samples:
  $p(y_j^t = c \mid x_j^t) = g_c(x_j^t; \mathbf{W})$
- C-step:
  ① Assign the pseudo labels on the target domain: $\hat{y}_j^t = \arg\max_c\, p(y_j^t = c \mid x_j^t)$
  ② Update the auxiliary class-specific weights for the source domain: $\alpha_c = \hat{w}_c^t / w_c^s$, where $\hat{w}_c^t = \frac{1}{N} \sum_{j=1}^{N} \mathbb{1}(\hat{y}_j^t = c)$ and $\mathbb{1}(\cdot)$ is an indicator function which equals 1 if $\hat{y}_j^t = c$ and 0 otherwise (a small sketch of this bookkeeping follows after the M-step below).
[6] Celeux, Gilles, and Gérard Govaert. "A classification EM algorithm for clustering and two stochastic versions." Computational Statistics & Data Analysis 14.3 (1992): 315-332.
- M-step: with $\boldsymbol{\alpha}$ and $\{\hat{y}_j^t\}_{j=1}^{N}$ fixed, update $\mathbf{W}$. The problem is reformulated as:
  $\min_{\mathbf{W}} \ \frac{1}{M} \sum_{i=1}^{M} \mathcal{L}(x_i^s, y_i^s; \mathbf{W}) + \frac{1}{N} \sum_{j=1}^{N} \mathcal{L}(x_j^t, \hat{y}_j^t; \mathbf{W}) + \sum_{l \in \{l_1, \ldots, l_L\}} \lambda_l\, \mathrm{MMD}_w^2(\mathcal{D}_s^l, \mathcal{D}_t^l)$
  The gradient of the three terms is computable, and $\mathbf{W}$ can be optimized using mini-batch SGD.
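As referenced in the C-step above, here is a minimal NumPy sketch of that bookkeeping: pseudo-labels from the argmax of the predicted posteriors, re-estimated target class weights, and refreshed auxiliary weights α_c. Function and variable names are invented for illustration.

```python
import numpy as np

def c_step(target_probs, source_labels, num_classes):
    """target_probs: (N, C) class posteriors g(x_j^t; W) from the E-step.
    Returns the pseudo-labels and the updated auxiliary class weights alpha."""
    # (1) pseudo-labels: argmax of the posterior for each target sample
    yt_pseudo = target_probs.argmax(axis=1)
    # (2) estimated target class weights w_hat_c^t = (1/N) sum_j 1(y_hat_j^t = c)
    wt_hat = np.bincount(yt_pseudo, minlength=num_classes) / len(yt_pseudo)
    # source class weights w_c^s = M_c / M
    ws = np.bincount(source_labels, minlength=num_classes) / len(source_labels)
    alpha = wt_hat / np.maximum(ws, 1e-12)               # alpha_c = w_hat_c^t / w_c^s
    return yt_pseudo, alpha
```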
Experimental results
- Comparison with the state of the art
Table 1. Experimental results on Office-10 + Caltech-10.
- Empirical analysis
Figure 3. Performance of various models under different class weight bias.
Figure 4. Visualization of the learned features of DAN and weighted DAN.
Summary
- Introduce class-specific weights into MMD to reduce the effect of class weight bias across domains.
- Develop the WDAN model and optimize it within a CEM framework.
- Weighted MMD can be applied to other scenarios where MMD is used for distribution distance measurement, e.g., image generation.
Thanks!
Paper & code are available:
Paper: https://arxiv.org/abs/1705.00609
Code: https://github.com/yhldhit/WMMD-Caffe