Connecting the Dots with Landmarks: Discriminatively Learning Domain-Invariant Features for Unsupervised Domain Adaptation (PowerPoint presentation)

SLIDE 1

Connecting the Dots with Landmarks: Discriminatively Learning Domain-Invariant Features for Unsupervised Domain Adaptation

Boqing Gong, University of Southern California
Joint work with Kristen Grauman and Fei Sha

SLIDE 2

The perils of mismatched domains

TRAIN vs. TEST: poor cross-domain generalization

Different underlying distributions
Classifiers overfit to datasets' idiosyncrasies

Images from the Office dataset [Saenko et al. '10].

SLIDE 3

Common to many areas

Computer vision
Text processing
Speech recognition
Language modeling
etc.


SLIDE 5

Setup

Unsupervised domain adaptation:
Source domain: labeled data
Target domain: no labels for training; different underlying distribution

Objective: learn a classifier that works well on the target

SLIDE 9

Many existing works

Correcting sampling bias
[Shimodaira '00] [Huang et al. '07] [Bickel et al. '07] [Sugiyama et al. '08] [Sethy et al. '06, '09]

Adjusting mismatched models
[Evgeniou and Pontil '05] [Duan et al. '09] [Duan et al. '10] [Daumé III et al. '10] [Saenko et al. '10] [Kulis et al. '11] [Chen et al. '11]

Inferring domain-invariant features
[Blitzer et al. '06] [Daumé III '07] [Argyriou et al. '08] [Pan et al. '09] [Gopalan et al. '11] [Chen et al. '12] [Gong et al. '12] [Muandet et al. '13]

[This work]
SLIDE 10

Snags

Forced adaptation: attempting to adapt all source data points, including "hard" ones

Implicit discrimination: the learned discrimination is biased to the source, rather than optimized w.r.t. the target

SLIDE 11

Our key insights

Forced adaptation → select the best instances for adaptation
Implicit discrimination → approximate the discriminative loss on the target

SLIDE 12

Landmarks

Landmarks are labeled source instances that are distributed similarly to the target domain.

SLIDE 15

Landmarks

Landmarks are labeled source instances that are distributed similarly to the target domain.

Roles:
ease the difficulty of adaptation
provide discrimination (biased toward the target)

SLIDE 16

Key steps

1. Identify landmarks at multiple scales, from coarse to fine-grained.

[Figure: landmarks picked from the source to match the target]

SLIDE 19

Key steps

2. Construct auxiliary domain adaptation tasks.
3. Obtain domain-invariant features.
4. Predict target labels.

SLIDE 21

Objective

Identifying landmarks
SLIDE 24

Maximum mean discrepancy (MMD)

Empirical estimate [Gretton et al. '06]:

MMD(L, T) = || (1/|L|) Σ_{l ∈ L} φ(x_l) − (1/M) Σ_m φ(z_m) ||_H

where φ is the feature map of a universal RKHS kernel, x_l is the l-th landmark candidate (from the source domain), and z_m are the target samples.
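The empirical estimate above can be sketched in a few lines of numpy (a minimal illustration; the function and variable names are mine, not the paper's):

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd_sq(X, Z, sigma):
    """Biased squared empirical MMD between samples X and Z [Gretton et al. '06]:
    mean(K_XX) - 2*mean(K_XZ) + mean(K_ZZ)."""
    return (gaussian_kernel(X, X, sigma).mean()
            - 2.0 * gaussian_kernel(X, Z, sigma).mean()
            + gaussian_kernel(Z, Z, sigma).mean())
```

Two identically distributed samples give a small MMD; a shifted sample gives a larger one, which is what makes it usable as a selection criterion.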

SLIDE 25

Method for identifying landmarks

Integer programming: one binary indicator per source instance, selecting the subset whose distribution minimizes the MMD to the target.

SLIDE 26

Method for identifying landmarks

Convex relaxation: relax the binary indicators to continuous values; a binary solution is recovered afterward (see paper for details).

SLIDE 30

How to choose the kernel functions?

Gaussian kernels
Plus: universal (characteristic)
Minus: how to choose the bandwidth?

Our solution: bandwidth as granularity
Examine the distributions at multiple granularities: multiple bandwidths yield multiple sets of landmarks.
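The paper selects landmarks by an integer program with a convex relaxation; purely as a rough illustration of the idea, the sketch below uses a greedy stand-in (my simplification, not the paper's solver) that, at a given bandwidth σ, repeatedly adds the source point whose inclusion most reduces the MMD to the target. Running it at several bandwidths σ = 2^k gives the multiple landmark sets described above.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def greedy_landmarks(X_src, X_tgt, sigma, n_pick):
    """Greedily pick source indices whose empirical distribution is close
    (in MMD) to the target. A stand-in for the paper's integer program
    plus convex relaxation, for illustration only."""
    Kss = gaussian_kernel(X_src, X_src, sigma)
    Kst = gaussian_kernel(X_src, X_tgt, sigma)
    ktt = gaussian_kernel(X_tgt, X_tgt, sigma).mean()
    chosen = []
    for _ in range(n_pick):
        best_i, best_val = None, np.inf
        for i in range(len(X_src)):
            if i in chosen:
                continue
            idx = chosen + [i]
            # Squared MMD between the candidate landmark set and the target.
            val = Kss[np.ix_(idx, idx)].mean() - 2.0 * Kst[idx].mean() + ktt
            if val < best_val:
                best_i, best_val = i, val
        chosen.append(best_i)
    return chosen

# Multiple bandwidths = multiple granularities = multiple landmark sets, e.g.:
# landmark_sets = {2.0 ** k: greedy_landmarks(Xs, Xt, 2.0 ** k, 20) for k in range(-2, 3)}
```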

SLIDE 31

Other details

Class balance constraint on the selected landmarks
Recovering a binary solution from the relaxation (see paper for details)

SLIDE 34

What do landmarks look like?

[Figure: headphone and mug classes, Target vs. Source; landmarks selected at multiple bandwidths, e.g. σ = 2^6 and σ = 2^3, alongside the unselected source instances]

SLIDE 35

Key steps

2. Construct auxiliary domain adaptation tasks.

SLIDE 36

Constructing easier auxiliary tasks

At each scale σ, the landmarks bridge the source and the target.
Intuition: the distributions are closer (cf. Theorem 1).

SLIDE 37

Constructing easier auxiliary tasks

At each scale σ, the landmarks join the target to form a new target; the remaining source instances form a new source.
Intuition: the distributions are closer (cf. Theorem 1).
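In code, building one auxiliary task at a given scale is just index bookkeeping (a sketch; the function and array names are mine):

```python
import numpy as np

def make_auxiliary_task(X_src, y_src, X_tgt, landmark_idx):
    """Build one auxiliary adaptation task at a given scale:
    the landmarks move from the source side to the target side,
    bringing the two distributions closer."""
    mask = np.zeros(len(X_src), dtype=bool)
    mask[np.asarray(landmark_idx)] = True
    new_source_X = X_src[~mask]          # remaining source instances
    new_source_y = y_src[~mask]
    new_target_X = np.vstack([X_tgt, X_src[mask]])  # target + landmarks
    return new_source_X, new_source_y, new_target_X
```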

SLIDE 38

Auxiliary tasks → a new basis of features

Integrate out domain changes
Obtain a domain-invariant representation

via a geodesic flow kernel (GFK) based method [Gong et al. '12]
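GFK [Gong et al. '12] has a closed form; the sketch below is instead a crude numerical approximation (my construction, not the paper's formula): discretize the geodesic of subspaces between the source and target PCA bases and average the projection matrices Φ(t)Φ(t)ᵀ along it.

```python
import numpy as np

def gfk_numeric(Ps, Pt, n_steps=200):
    """Numerical approximation of the geodesic flow kernel between two
    orthonormal D x d subspace bases Ps and Pt: average Phi(t) Phi(t)^T
    over points Phi(t) on the geodesic from span(Ps) to span(Pt).
    Assumes all principal angles between the subspaces are nonzero."""
    D, d = Ps.shape
    # Orthonormal basis of the complement of span(Ps).
    Rs = np.linalg.svd(Ps, full_matrices=True)[0][:, d:]
    # Principal angles: singular values of Ps^T Pt are their cosines.
    U1, cos_t, Vt = np.linalg.svd(Ps.T @ Pt)
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
    # Complement-side directions; columns of W have norms sin(theta).
    W = Rs.T @ Pt @ Vt.T
    U2 = W / np.sin(theta)
    G = np.zeros((D, D))
    for t in np.linspace(0.0, 1.0, n_steps):
        # Phi(0) spans Ps; Phi(1) spans Pt; orthonormal for every t.
        Phi = (Ps @ U1 @ np.diag(np.cos(t * theta))
               + Rs @ U2 @ np.diag(np.sin(t * theta)))
        G += Phi @ Phi.T
    return G / n_steps
```

The resulting G is a symmetric PSD matrix defining the domain-invariant inner product x_i^T G x_j.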

SLIDE 40

Key steps

2. Construct auxiliary domain adaptation tasks.
3. Obtain domain-invariant features.

SLIDE 41

Combining features discriminatively

Multiple kernel learning on the labeled landmarks
Arriving at a domain-invariant feature space
Discriminative loss biased toward the target
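The paper learns the combination with multiple kernel learning on the labeled landmarks; as a much simplified stand-in (my sketch, not the paper's solver), one can score convex combinations of precomputed auxiliary-task kernels by holdout accuracy of a kernel ridge classifier on the landmarks:

```python
import numpy as np

def kernel_ridge_acc(K, y, train, test, lam=1e-2):
    """Holdout accuracy of kernel ridge classification with precomputed kernel K."""
    alpha = np.linalg.solve(K[np.ix_(train, train)] + lam * np.eye(len(train)),
                            y[train])
    pred = np.sign(K[np.ix_(test, train)] @ alpha)
    return (pred == y[test]).mean()

def combine_kernels(kernels, y, train, test, n_grid=11):
    """Pick the best convex combination w*K1 + (1-w)*K2 of two precomputed
    kernels by holdout accuracy on labeled landmarks. A toy stand-in for
    multiple kernel learning."""
    K1, K2 = kernels
    return max(np.linspace(0.0, 1.0, n_grid),
               key=lambda w: kernel_ridge_acc(w * K1 + (1 - w) * K2,
                                              y, train, test))
```

Because the landmarks carry labels and are distributed like the target, tuning the combination on them approximates a discriminative loss on the (unlabeled) target.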

SLIDE 43

Key steps

2. Construct auxiliary domain adaptation tasks.
3. Obtain domain-invariant features.
4. Predict target labels.

SLIDE 44

Experimental study

Four vision datasets/domains for visual object recognition [Griffin et al. '07, Saenko et al. '10]

Four types of product reviews for sentiment analysis: books, DVDs, electronics, kitchen appliances [Blitzer et al. '07]

SLIDE 46

Comparing with

Correcting sampling bias: [Huang et al. '07]

Adjusting mismatched models

Inferring domain-invariant features: [Blitzer et al. '06] [Pan et al. '09] [Gopalan et al. '11] [Gong et al. '12]
SLIDE 47

Object recognition

[Bar chart: accuracy (%), axis 15–60, on the domain pairs A→C, A→D, C→A, C→W, W→A, W→C; methods: No adaptation, Gopalan et al. '11, Pan et al. '09, GFK, Landmark]


SLIDE 50

Sentiment analysis

[Bar chart: accuracy (%), axis 55–85, on the review domain pairs K→D, D→B, B→E, E→K; methods: Pan et al. '09, Gopalan et al. '11, GFK, Saenko et al. '10, Blitzer et al. '06, Huang et al. '07, Landmark]

SLIDE 53

Empirical results on visual object recognition

The auxiliary tasks are easier to solve than the original tasks.

[Chart: accuracy on the auxiliary tasks vs. the original tasks]

SLIDE 54

Landmarks are a good proxy for target discrimination

[Bar chart: accuracy (%), axis 25–80, across nine domain pairs A→C through W→D; methods: Non-landmarks, Random selection, Landmark]

SLIDE 55

Summary

Landmarks:
an intrinsic structure, shared between domains
labeled source instances distributed similarly to the target
auxiliary tasks that are provably easier to solve
a discriminative loss despite the unlabeled target

Outperformed the state of the art.

SLIDE 58

Dropping the class balance constraint?

[Bar chart: accuracy (%), axis 30–80, across nine domain pairs A→C through W→D; methods: No class balance, Landmark]