Geodesic Flow Kernel for Unsupervised Domain Adaptation - Boqing Gong - PowerPoint PPT Presentation

SLIDE 1

Geodesic Flow Kernel for Unsupervised Domain Adaptation

Boqing Gong, University of Southern California

Joint work with Yuan Shi, Fei Sha, and Kristen Grauman


SLIDE 2

Motivation

Mismatch between different domains/datasets

– Object recognition

  • Ex. [Torralba & Efros’11, Perronnin et al.’10]

– Video analysis

  • Ex. [Duan et al.’09, 10]

– Pedestrian detection

  • Ex. [Dollár et al.’09]

– Other vision tasks


Performance degrades significantly!

Images from [Saenko et al.’10].

[Figure: example TRAIN and TEST images illustrating the dataset mismatch.]

SLIDE 3

Unsupervised domain adaptation

  • Source domain (labeled)
  • Target domain (unlabeled)
  • Objective

Train a classification model that works well on the target domain


Source (labeled): $\mathcal{D}_S = \{(x_i, y_i),\ i = 1, 2, \ldots, N\} \sim P_S(X, Y)$

Target (unlabeled): $\mathcal{D}_T = \{x_i,\ i = 1, 2, \ldots, M\} \sim P_T(X, Y)$, with the labels unknown

The two distributions are not the same: $P_S(X, Y) \neq P_T(X, Y)$

SLIDE 4

Challenges

  • How to make these choices optimally w.r.t. the target domain?

– define the discriminative loss function – select the model and tune its parameters

  • How to solve this ill-posed problem?

impose additional structure


SLIDE 5

Examples of existing approaches

  • Correcting sample bias

– Ex. [Shimodaira’00, Huang et al.’06, Bickel et al.’07] – Assumption: marginal distributions are the only difference.

  • Learning transductively

– Ex. [Bergamo & Torresani’10, Bruzzone & Marconcini’10] – Assumption: classifiers have high-confidence predictions across domains.

  • Learning a shared representation

– Ex. [Daumé III’07, Pan et al.’09, Gopalan et al.’11] – Assumption: a latent feature space exists in which classification hypotheses fit both domains.


SLIDE 6

Our approach: learning a shared representation

Key insight: bridging the gap

– Fantasize an infinite number of intermediate domains – Analytically integrate out the idiosyncrasies of individual domains – Learn invariant features by constructing a kernel


[Figure: source and target connected by the flow $\Phi(t)$; each sample $x$ is mapped to the infinite-dimensional feature $z^\infty = \left[\Phi(0)^T x, \ldots, \Phi(t)^T x, \ldots, \Phi(1)^T x\right]^T$, and similarity is measured by the inner product $\langle z_i^\infty, z_j^\infty \rangle$.]

SLIDE 7

Main idea: geodesic flow kernel

  • 1. Model data with linear subspaces
  • 2. Model domain shift with geodesic flow
  • 3. Derive domain-invariant features with kernel
  • 4. Classify target data with the new features


[Figure: the four steps overlaid on the source/target picture: (1) subspaces, (2) the geodesic flow $\Phi(t)$, (3) the features $z^\infty = \left[\Phi(0)^T x, \ldots, \Phi(t)^T x, \ldots, \Phi(1)^T x\right]^T$ and the kernel $\langle z_i^\infty, z_j^\infty \rangle$, (4) classification on the target.]

SLIDE 8

Modeling data with linear subspaces

Assume low-dimensional structure

  • Ex. PCA, Partial Least Squares (source only)

[Figure: source and target data, each modeled by a linear subspace.]
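As a rough sketch of this step (not the authors' code), one can keep the top d principal directions of each domain; the function name pca_subspace and the use of NumPy are our own choices, and Partial Least Squares could replace PCA on the labeled source, as noted above.

```python
import numpy as np

def pca_subspace(X, d):
    """Return a D x d orthonormal basis spanning the top-d principal directions.

    X is an N x D matrix with one feature vector per row. This is only a
    sketch of the "model each domain with a linear subspace" step.
    """
    Xc = X - X.mean(axis=0)                        # center the data
    # Right singular vectors of the centered data are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T                                # D x d basis

# Usage sketch (Xs, Xt are hypothetical source/target feature matrices):
# Ps = pca_subspace(Xs, d=20)
# Pt = pca_subspace(Xt, d=20)
```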

SLIDE 9

Characterizing domains geometrically

Grassmann manifold $\mathcal{G}(d, D)$

– Collection of $d$-dimensional subspaces of $\mathbb{R}^D$, with $d < D$ – Each point on the manifold corresponds to one subspace

[Figure: the source and target subspaces shown as two points on the Grassmann manifold $\mathcal{G}(d, D)$.]

SLIDE 10

Modeling domain shift with geodesic flow

Geodesic flow on the manifold

– starts at the source and arrives at the target in unit time – parameterized by a single parameter $t$ – closed form, easy to compute with SVD

[Figure: the geodesic flow $\Phi(t),\ 0 \le t \le 1$, from the source subspace $\Phi(0)$ to the target subspace $\Phi(1)$.]
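To make this concrete, here is a hedged NumPy/SciPy sketch of building such a flow from the principal angles between the two subspaces; Ps, Pt, and geodesic_flow are our own names, and the construction follows the standard SVD-based form rather than the authors' released code.

```python
import numpy as np
from scipy.linalg import null_space, svd

def geodesic_flow(Ps, Pt):
    """Return a function Phi(t) tracing the geodesic between two subspaces.

    Ps, Pt: D x d orthonormal bases of the source and target subspaces.
    Phi(0) spans the source subspace and Phi(1) spans the target subspace.
    """
    D, d = Ps.shape
    Rs = null_space(Ps.T)                          # D x (D - d) orthogonal complement of the source

    # Principal angles: Ps^T Pt = U1 diag(cos theta) V^T and
    # Rs^T Pt = -U2 diag(sin theta) V^T.
    U1, cos_t, Vt = svd(Ps.T @ Pt)
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
    sin_t = np.maximum(np.sin(theta), 1e-12)       # guard against zero angles
    U2 = -(Rs.T @ Pt @ Vt.T) / sin_t

    def Phi(t):
        # Rotate from the source basis toward the target basis by the fraction t.
        return Ps @ U1 @ np.diag(np.cos(t * theta)) - Rs @ U2 @ np.diag(np.sin(t * theta))

    return Phi
```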

SLIDE 11

Modeling domain shift with geodesic flow

[Figure: the flow $\Phi(t),\ 0 \le t \le 1$, from $\Phi(0)$ to $\Phi(1)$; the endpoint subspaces correspond to the source and target domains.]

SLIDE 12

Modeling domain shift with geodesic flow

[Figure: the flow $\Phi(t),\ 0 \le t \le 1$, between the source and target subspaces, with intermediate subspaces marked along the way.]

Along this flow, points (subspaces) represent intermediate domains.

SLIDE 13

Domain-invariant features

$z^\infty = \left[\,\Phi(0)^T x,\ \ldots,\ \Phi(t)^T x,\ \ldots,\ \Phi(1)^T x\,\right]$

[Figure: components of $z^\infty$ computed from subspaces near $\Phi(0)$ are more similar to the source.]

SLIDE 14

Domain-invariant features

$z^\infty = \left[\,\Phi(0)^T x,\ \ldots,\ \Phi(t)^T x,\ \ldots,\ \Phi(1)^T x\,\right]$

[Figure: components of $z^\infty$ computed from subspaces near $\Phi(1)$ are more similar to the target.]

SLIDE 15

Domain-invariant features

$z^\infty = \left[\,\Phi(0)^T x,\ \ldots,\ \Phi(t)^T x,\ \ldots,\ \Phi(1)^T x\,\right]$

[Figure: taking all components together blends the two domains.]

SLIDE 16

Measuring feature similarities with inner products

$z_i^\infty = \left[\,\Phi(0)^T x_i,\ \ldots,\ \Phi(t)^T x_i,\ \ldots,\ \Phi(1)^T x_i\,\right], \qquad z_j^\infty = \left[\,\Phi(0)^T x_j,\ \ldots,\ \Phi(t)^T x_j,\ \ldots,\ \Phi(1)^T x_j\,\right]$

$\langle z_i^\infty, z_j^\infty \rangle$ combines components that are more similar to the source with components that are more similar to the target, so it is invariant to either source or target.

SLIDE 17

Learning domain-invariant features with kernels

We define the geodesic flow kernel (GFK):

$\langle z_i^\infty, z_j^\infty \rangle = \int_0^1 \left(\Phi(t)^T x_i\right)^T \left(\Phi(t)^T x_j\right)\, dt = x_i^T G\, x_j$

  • Advantages

– Analytically computable – Robust to variation toward either the source or the target – Broadly applicable: can kernelize many classifiers

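To make the definition above concrete, here is a sketch that approximates $G$ by discretizing the integral, reusing the geodesic_flow sketch from the earlier slide; gfk_matrix and n_steps are hypothetical names, and the paper's analytic closed form replaces this numerical average.

```python
import numpy as np

def gfk_matrix(Ps, Pt, n_steps=100):
    """Approximate G, where x_i^T G x_j = <z_i_inf, z_j_inf>, by averaging
    Phi(t) Phi(t)^T over equally spaced t in [0, 1]. The closed form in the
    paper evaluates this integral exactly."""
    Phi = geodesic_flow(Ps, Pt)                    # sketch from the earlier slide
    D = Ps.shape[0]
    G = np.zeros((D, D))
    for t in np.linspace(0.0, 1.0, n_steps):
        P = Phi(t)
        G += P @ P.T
    return G / n_steps

# The kernel between two raw feature vectors is then k(x_i, x_j) = x_i @ G @ x_j.
```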

SLIDE 18

Contrast with discretely sampling the flow

[Figure: the flow $\Phi(t),\ 0 \le t \le 1$; GFK integrates over the entire flow, $\langle z_i^\infty, z_j^\infty \rangle = \int_0^1 \left(\Phi(t)^T x_i\right)^T \left(\Phi(t)^T x_j\right)\, dt = x_i^T G\, x_j$, rather than sampling a few subspaces.]

GFK (ours): no free parameters. [Gopalan et al. ICCV 2011]: samples subspaces along the flow and applies dimensionality reduction; free parameters are the number of subspaces, the dimensionality of the subspaces, and the dimensionality after reduction.

GFK is conceptually cleaner and computationally more tractable.

SLIDE 19

Recap of key steps

[Figure: recap on the source/target picture: (1) model the source and target with subspaces, (2) connect them with the geodesic flow $\Phi(t)$, (3) form $z^\infty = \left[\Phi(0)^T x, \ldots, \Phi(t)^T x, \ldots, \Phi(1)^T x\right]^T$ and the kernel $\langle z_i^\infty, z_j^\infty \rangle = x_i^T G\, x_j$, (4) classify the target with the new features.]

SLIDE 20

Experimental setup

  • Four domains: Caltech-256, Amazon, DSLR, Webcam
  • Features: bag-of-SURF
  • Classifier: 1NN
  • Results averaged over 20 random trials

[Figure: example images from the Caltech-256, Amazon, DSLR, and Webcam domains.]
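For the classification step, the following is a rough sketch of 1NN in the GFK-induced space, using kernelized squared distances; Xs, ys, Xt, and gfk_1nn_predict are our own names, G is the kernel matrix from the earlier sketch, and this is not the authors' evaluation code.

```python
import numpy as np

def gfk_1nn_predict(Xs, ys, Xt, G):
    """1-nearest-neighbor prediction with GFK distances.

    Xs: Ns x D labeled source features, ys: their labels, Xt: Nt x D target
    features. Squared distance: d(x, z)^2 = x^T G x - 2 x^T G z + z^T G z.
    """
    ys = np.asarray(ys)
    K_st = Xs @ G @ Xt.T                           # Ns x Nt cross-kernel
    ss = np.einsum('ij,jk,ik->i', Xs, G, Xs)       # x^T G x per source sample
    tt = np.einsum('ij,jk,ik->i', Xt, G, Xt)       # z^T G z per target sample
    d2 = ss[:, None] - 2.0 * K_st + tt[None, :]    # pairwise squared distances
    return ys[np.argmin(d2, axis=0)]               # label of the nearest source sample
```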

SLIDE 21

Classification accuracy on target

[Bar chart: accuracy (%) for source→target pairs W→C, W→A, C→D, C→A, A→W, A→C, D→A, comparing no adaptation, [Gopalan et al.'11], and GFK (ours); y-axis from 10 to 40%.]

SLIDE 24

Which domain should be used as the source?

[Figure: images from the DSLR, Caltech-256, Webcam, and Amazon domains.]

SLIDE 25

Automatically selecting the best

We introduce the Rank of Domains (ROD) measure. Intuition:

– Geometrically, how subspaces disagree – Statistically, how distributions disagree


SLIDE 26

Automatically selecting the best

Possible sources and our ROD measure (target: Amazon):

– Caltech-256: 0.003
– DSLR: 0.26
– Webcam: 0.05

Caltech-256 adapts the best to Amazon.

[Bar chart: accuracy (%) for W→A, C256→A, and D→A, comparing no adaptation, [Gopalan et al.'11], and GFK (ours).]


SLIDE 27

Semi-supervised domain adaptation

Label three instances per category in the target

[Bar chart: accuracy (%) for source→target pairs W→C, W→A, C→D, C→A, A→W, A→C, D→A, comparing no adaptation, [Saenko et al.'10], [Gopalan et al.'11], and GFK (ours); y-axis from 10 to 60%.]

SLIDE 28

Analyzing datasets in light of domain adaptation

Cross-dataset generalization [Torralba & Efros’11]

[Bar chart: accuracy (%) on PASCAL, ImageNet, and Caltech-101 for Self, Cross (no adaptation), and Cross (with adaptation); y-axis from 30 to 70%.]

SLIDE 29

Analyzing datasets in light of domain adaptation

Cross-dataset generalization [Torralba & Efros’11]

[Bar chart: accuracy (%) on PASCAL, ImageNet, and Caltech-101 for Self vs. Cross (no adaptation); the performance drop is highlighted.]

Without adaptation: Caltech-101 generalizes the worst, and the performance drop of ImageNet is big.

SLIDE 30

Analyzing datasets in light of domain adaptation

Cross-dataset generalization [Torralba & Efros’11]

[Bar chart: accuracy (%) on PASCAL, ImageNet, and Caltech-101 for Self, Cross (no adaptation), and Cross (with adaptation); with adaptation the performance drop becomes smaller.]

With adaptation: Caltech-101 still generalizes the worst (with or without adaptation), and there is nearly no performance drop for ImageNet.

SLIDE 31

Summary

  • Unsupervised domain adaptation

– Important in visual recognition – Challenge: no labeled data from the target

  • Geodesic flow kernel (GFK)

– Conceptually clean formulation: no free parameters – Computationally tractable: closed-form solution – Empirically successful: state-of-the-art results

  • New insight on vision datasets

– Cross-dataset generalization with domain adaptation – Leveraging existing datasets despite their idiosyncrasies


SLIDE 32

Future work

  • Beyond subspaces

Other techniques to model domain shift

  • From GFK to statistical flow kernel

Add more statistical properties to the flow

  • Applications of GFK

Ex., face recognition, video analysis


SLIDE 33

Summary


  • Unsupervised domain adaptation

– Important in visual recognition – Challenge: no labeled data from the target

  • Geodesic flow kernel (GFK)

– Conceptually clean formulation – Computationally tractable – Empirically successful

  • New insight on vision datasets

– Cross-dataset generalization with domain adaptation – Leveraging existing datasets despite their idiosyncrasies