

SLIDE 1

Mind the class weight bias: weighted maximum mean discrepancy for unsupervised domain adaptation

Hongliang Yan 2017/06/21

SLIDE 2

Domain Adaptation

[1] https://cs.stanford.edu/~jhoffman/domainadapt/

Figure 1. Illustration of dataset bias.

Problem:

The training (source) and test (target) sets are related but drawn from different distributions.

  • Learn a feature space that combines discriminativeness and domain invariance.

Methodology:

DA: minimize source error + domain discrepancy.

SLIDE 3

Maximum Mean Discrepancy (MMD)

$$\mathrm{MMD}(\mathcal{D}^s,\mathcal{D}^t)=\sup_{\|\phi\|_{\mathcal{H}}\le 1}\left\|\mathbb{E}_{x^s\sim\mathcal{D}^s}[\phi(x^s)]-\mathbb{E}_{x^t\sim\mathcal{D}^t}[\phi(x^t)]\right\|_{\mathcal{H}}$$

  • An empirical estimate:

$$\mathrm{MMD}^2(\mathcal{D}^s,\mathcal{D}^t)=\left\|\frac{1}{M}\sum_{i=1}^{M}\phi(x_i^s)-\frac{1}{N}\sum_{j=1}^{N}\phi(x_j^t)\right\|_{\mathcal{H}}^2$$

  • Representing distances between distributions as distances between mean embeddings of features.
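As a concrete illustration, the empirical estimate can be sketched in a few lines of numpy. For simplicity this uses the identity feature map φ(x) = x instead of a kernel-induced embedding, so it only compares first moments; the synthetic data below is made up for the demonstration.

```python
import numpy as np

def mmd_sq(feat_s, feat_t):
    # Squared MMD with the identity feature map: the squared distance
    # between the two empirical mean embeddings.
    mu_s = feat_s.mean(axis=0)  # (1/M) * sum_i phi(x_i^s)
    mu_t = feat_t.mean(axis=0)  # (1/N) * sum_j phi(x_j^t)
    return float(np.sum((mu_s - mu_t) ** 2))

rng = np.random.default_rng(0)
same = mmd_sq(rng.normal(0.0, 1.0, (500, 8)), rng.normal(0.0, 1.0, (500, 8)))
shifted = mmd_sq(rng.normal(0.0, 1.0, (500, 8)), rng.normal(1.0, 1.0, (500, 8)))
print(same < shifted)  # a mean shift between domains yields a larger MMD
```

Two samples from the same distribution give an MMD near zero, while a mean-shifted pair gives a clearly larger value.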

SLIDE 4

Motivation

  • Class weight bias across domains remains unsolved but ubiquitous.

$$\mathrm{MMD}^2(\mathcal{D}^s,\mathcal{D}^t)=\left\|\frac{1}{M}\sum_{i=1}^{M}\phi(x_i^s)-\frac{1}{N}\sum_{j=1}^{N}\phi(x_j^t)\right\|_{\mathcal{H}}^2$$

SLIDE 5

Motivation

  • Class weight bias across domains remains unsolved but ubiquitous.

The empirical MMD can be rewritten in terms of class weights and class-conditional mean embeddings:

$$\mathrm{MMD}^2(\mathcal{D}^s,\mathcal{D}^t)=\left\|\sum_{c=1}^{C}w_c^s\,\mathbb{E}^s_c[\phi(x^s)]-\sum_{c=1}^{C}w_c^t\,\mathbb{E}^t_c[\phi(x^t)]\right\|_{\mathcal{H}}^2,$$

where $w_c^s = M_c/M$ and $w_c^t = N_c/N$ are the class weights in the source and target domains.
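The class-conditional rewriting of the empirical MMD is just the observation that a sample mean is a class-weight mixture of class-conditional means; a small numpy check (identity feature map, made-up random data) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(100, 4))             # stand-in features phi(x)
y = rng.integers(0, 3, size=100)          # class labels, C = 3

mu = x.mean(axis=0)                       # overall mean embedding
w = np.bincount(y, minlength=3) / y.size  # class weights w_c = M_c / M
mu_dec = sum(w[c] * x[y == c].mean(axis=0) for c in range(3) if w[c] > 0)

print(np.allclose(mu, mu_dec))  # the mean embedding is a class-weighted mixture
```

Because each domain's mean embedding mixes class means with that domain's own priors, a prior mismatch alone inflates the MMD even when the class-conditional distributions agree.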

SLIDE 6

Motivation

  • Class weight bias across domains remains unsolved but ubiquitous.

The effect of class weight bias should be removed:

① Changes in sample selection criteria.

SLIDE 7

Motivation

  • Class weight bias across domains remains unsolved but ubiquitous.

The effect of class weight bias should be removed:

① Changes in sample selection criteria.

Figure 2. Class prior distribution of three digit recognition datasets.

SLIDE 8

Motivation

  • Class weight bias across domains remains unsolved but ubiquitous.

The effect of class weight bias should be removed:

① Changes in sample selection criteria.
② Applications are not concerned with the class prior distribution.

SLIDE 9

Motivation

  • Class weight bias across domains remains unsolved but ubiquitous.

The effect of class weight bias should be removed:

① Changes in sample selection criteria.
② Applications are not concerned with the class prior distribution.

MMD can be minimized either by learning a domain-invariant representation or by preserving the class weights of the source domain.

SLIDE 10

Weighted MMD

Main idea: reweight the classes in the source domain so that they have the same class weights as the target domain.

  • Introduce an auxiliary weight $\alpha_c = w_c^t / w_c^s$ for each class $c$ in the source domain:

$$\mathrm{MMD}_w^2(\mathcal{D}^s,\mathcal{D}^t)=\left\|\sum_{c=1}^{C}\alpha_c w_c^s\,\mathbb{E}^s_c[\phi(x^s)]-\sum_{c=1}^{C}w_c^t\,\mathbb{E}^t_c[\phi(x^t)]\right\|_{\mathcal{H}}^2$$

SLIDE 11

Weighted MMD

Main idea: reweight the classes in the source domain so that they have the same class weights as the target domain.

  • Introduce an auxiliary weight $\alpha_c = w_c^t / w_c^s$ for each class $c$ in the source domain.

Empirical estimate:

$$\mathrm{MMD}_w^2(\mathcal{D}^s,\mathcal{D}^t)=\left\|\frac{1}{M}\sum_{i=1}^{M}\alpha_{y_i^s}\,\phi(x_i^s)-\frac{1}{N}\sum_{j=1}^{N}\phi(x_j^t)\right\|_{\mathcal{H}}^2$$
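A minimal numpy sketch of the weighted estimate (identity feature map, synthetic two-class data): each source sample is scaled by the α of its class, which makes the reweighted source class priors match the target's. The two classes below are given constant, distinct features so that the effect of the class weights is isolated.

```python
import numpy as np

def weighted_mmd_sq(feat_s, y_s, feat_t, alpha):
    # (1/M) sum_i alpha_{y_i^s} phi(x_i^s)  vs  (1/N) sum_j phi(x_j^t)
    mu_s = (alpha[y_s][:, None] * feat_s).sum(axis=0) / feat_s.shape[0]
    mu_t = feat_t.mean(axis=0)
    return float(np.sum((mu_s - mu_t) ** 2))

# Source: 80% class 0, 20% class 1.  Target: 50% / 50%.
onehot = np.eye(2)
y_s = np.repeat([0, 1], [80, 20])
feat_s = onehot[y_s]
feat_t = onehot[np.repeat([0, 1], [50, 50])]

w_s, w_t = np.array([0.8, 0.2]), np.array([0.5, 0.5])
alpha = w_t / w_s  # auxiliary class weights alpha_c = w_c^t / w_c^s

plain = weighted_mmd_sq(feat_s, y_s, feat_t, np.ones(2))  # unweighted MMD
weighted = weighted_mmd_sq(feat_s, y_s, feat_t, alpha)
print(plain, weighted)  # 0.18 0.0
```

With identical class-conditional features, the unweighted MMD is nonzero purely because of the prior mismatch, while the weighted MMD vanishes.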

SLIDE 12

Weighted DAN

  • 1. Replace the MMD term in DAN [4] with weighted MMD:

$$\min_{\mathrm{W}}\ \frac{1}{M}\sum_{i=1}^{M}\ell(x_i^s,y_i^s;\mathrm{W})+\lambda\sum_{l\in\{l_1,\ldots,l_L\}}\mathrm{MMD}^2(\mathcal{D}_l^s,\mathcal{D}_l^t)$$

[4] Long M., Cao Y., Wang J. Learning Transferable Features with Deep Adaptation Networks. ICML, 2015.

SLIDE 13

Weighted DAN

  • 1. Replace the MMD term in DAN [4] with weighted MMD:

$$\min_{\mathrm{W}}\ \frac{1}{M}\sum_{i=1}^{M}\ell(x_i^s,y_i^s;\mathrm{W})+\lambda\sum_{l\in\{l_1,\ldots,l_L\}}\mathrm{MMD}^2(\mathcal{D}_l^s,\mathcal{D}_l^t)$$

becomes

$$\min_{\mathrm{W},\alpha}\ \frac{1}{M}\sum_{i=1}^{M}\ell(x_i^s,y_i^s;\mathrm{W})+\lambda\sum_{l\in\{l_1,\ldots,l_L\}}\mathrm{MMD}_w^2(\mathcal{D}_l^s,\mathcal{D}_l^t)$$

[4] Long M., Cao Y., Wang J. Learning Transferable Features with Deep Adaptation Networks. ICML, 2015.

SLIDE 14

Weighted DAN

  • 2. To further exploit the unlabeled data in the target domain, the empirical risk is extended with pseudo-labeled target samples, as in the semi-supervised model of [5]:

$$\min_{\mathrm{W},\alpha,\{\hat{y}_j^t\}_{j=1}^{N}}\ \frac{1}{M}\sum_{i=1}^{M}\ell(x_i^s,y_i^s;\mathrm{W})+\frac{1}{N}\sum_{j=1}^{N}\ell(x_j^t,\hat{y}_j^t;\mathrm{W})+\lambda\sum_{l\in\{l_1,\ldots,l_L\}}\mathrm{MMD}_w^2(\mathcal{D}_l^s,\mathcal{D}_l^t)$$

[4] Long M., Cao Y., Wang J. Learning Transferable Features with Deep Adaptation Networks. ICML, 2015.

[5] Amini, Massih-Reza, and Patrick Gallinari. "Semi-supervised logistic regression." Proceedings of the 15th European Conference on Artificial Intelligence. IOS Press, 2002.
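The structure of this objective, a source classification loss, a pseudo-labeled target loss, and a weighted-MMD penalty summed over the adapted layers, can be sketched as a plain scalar combination. All inputs below are hypothetical placeholders, and `lam` stands in for an assumed trade-off weight:

```python
import numpy as np

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the given labels
    return float(-np.log(probs[np.arange(len(labels)), labels]).mean())

def wdan_objective(probs_s, y_s, probs_t, y_hat_t, mmd_w_layers, lam=1.0):
    # source loss + pseudo-labeled target loss
    # + lambda * sum of per-layer weighted-MMD penalties
    return (cross_entropy(probs_s, y_s)
            + cross_entropy(probs_t, y_hat_t)
            + lam * float(np.sum(mmd_w_layers)))
```

When the classifier is perfectly confident and correct on both terms, the two cross-entropy losses vanish and the objective reduces to the MMD penalty alone.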

SLIDE 15

Optimization: an extension of CEM [6]

The parameters to be estimated consist of three parts, i.e., $\mathrm{W}$, $\alpha$, and $\{\hat{y}_j^t\}_{j=1}^{N}$. The model is optimized by alternating between three steps:

  • E-step: with $\mathrm{W}$ fixed, estimate the class posterior probability of the target samples:

$$p(y_j^t=c\mid x_j^t)=g_c(x_j^t;\mathrm{W})$$

[6] Celeux, Gilles, and Gérard Govaert. "A classification EM algorithm for clustering and two stochastic versions." Computational Statistics & Data Analysis 14.3 (1992): 315-332.
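In the E-step the posteriors are simply read off the network's softmax output with W held fixed. As a sketch (the logits below are made-up stand-ins for the final-layer activations):

```python
import numpy as np

def softmax(z):
    # numerically stable softmax along the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# p(y_j^t = c | x_j^t) = g_c(x_j^t; W), one row per target sample
logits_t = np.array([[2.0, 0.5, 0.1],
                     [0.2, 1.5, 0.3]])
probs_t = softmax(logits_t)
print(probs_t.sum(axis=1))  # each row is a distribution over the C classes
```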

SLIDE 16

Optimization: an extension of CEM [6]

  • C-step:

① Assign the pseudo labels on the target domain:

$$\hat{y}_j^t=\arg\max_c\ p(y_j^t=c\mid x_j^t)$$

② Update the auxiliary class-specific weights for the source domain:

$$\alpha_c=\hat{w}_c^t/w_c^s,\quad\text{where}\ \hat{w}_c^t=\frac{1}{N}\sum_{j=1}^{N}\mathbf{1}(\hat{y}_j^t=c)$$

and $\mathbf{1}(\cdot)$ is an indicator function which equals 1 if $\hat{y}_j^t=c$, and equals 0 otherwise.

[6] Celeux, Gilles, and Gérard Govaert. "A classification EM algorithm for clustering and two stochastic versions." Computational Statistics & Data Analysis 14.3 (1992): 315-332.
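The C-step can be sketched directly: hard-assign pseudo labels from the posteriors, re-estimate the target class weights from them, and form α. The posterior matrix and source class weights below are illustrative values, not results from the paper:

```python
import numpy as np

def c_step(probs_t, w_s):
    # 1) pseudo labels: argmax of the class posteriors
    y_hat = probs_t.argmax(axis=1)
    # 2) estimated target weights: w_hat_c^t = (1/N) sum_j 1(y_hat_j = c)
    w_t_hat = np.bincount(y_hat, minlength=w_s.size) / y_hat.size
    # 3) auxiliary weights: alpha_c = w_hat_c^t / w_c^s
    return y_hat, w_t_hat / w_s

probs_t = np.array([[0.9, 0.1],
                    [0.8, 0.2],
                    [0.3, 0.7],
                    [0.6, 0.4]])
y_hat, alpha = c_step(probs_t, w_s=np.array([0.5, 0.5]))
print(y_hat, alpha)  # [0 0 1 0] [1.5 0.5]
```

Classes that are over-represented among the pseudo labels relative to the source priors receive α above 1, up-weighting their source samples in the weighted MMD.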

SLIDE 17

Optimization: an extension of CEM [6]

  • M-step: with $\alpha$ and $\{\hat{y}_j^t\}_{j=1}^{N}$ fixed, update $\mathrm{W}$. The problem is reformulated as:

$$\min_{\mathrm{W}}\ \frac{1}{M}\sum_{i=1}^{M}\ell(x_i^s,y_i^s;\mathrm{W})+\frac{1}{N}\sum_{j=1}^{N}\ell(x_j^t,\hat{y}_j^t;\mathrm{W})+\lambda\sum_{l\in\{l_1,\ldots,l_L\}}\mathrm{MMD}_w^2(\mathcal{D}_l^s,\mathcal{D}_l^t)$$

The gradients of all three terms are computable, and $\mathrm{W}$ can be optimized with mini-batch SGD.

[6] Celeux, Gilles, and Gérard Govaert. "A classification EM algorithm for clustering and two stochastic versions." Computational Statistics & Data Analysis 14.3 (1992): 315-332.

SLIDE 18

Experimental results

  • Comparison with state-of-the-arts

Table 1. Experimental results on Office-10 + Caltech-10.

SLIDE 19

Experimental results

  • Empirical analysis

Figure 3. Performance of various models under different class weight bias.

Figure 4. Visualization of the learned features of DAN and weighted DAN.

SLIDE 20

Summary

  • Introduce a class-specific weight into MMD to reduce the effect of class weight bias across domains.
  • Develop the WDAN model and optimize it in a CEM framework.
  • Weighted MMD can be applied to other scenarios where MMD is used to measure the distance between distributions, e.g., image generation.

SLIDE 21

Thanks!

Paper & code are available:

Paper: https://arxiv.org/abs/1705.00609
Code: https://github.com/yhldhit/WMMD-Caffe