SLIDE 1

Adaptation for Objects and Attributes

Kristen Grauman Department of Computer Science University of Texas at Austin With Adriana Kovashka (UT Austin), Boqing Gong (USC), and Fei Sha (USC)

SLIDE 2

Learning-based visual recognition

Last 10+ years: impressive strides by learning appearance models (usually discriminative).

[Pipeline figure: an annotator labels training images (CAR / NOT CAR), image features are extracted, and the learned model predicts “CAR!” on a new image.]

SLIDE 3

Typical assumptions

1. Test set will look like the training set.
2. Human labelers “see” the same thing.

SLIDE 4

Mismatched domains

TRAIN: Flickr → TEST: YouTube

SLIDE 5

Mismatched domains

TRAIN: Catalog images → TEST: Mobile phone photos

SLIDE 6

Mismatched domains

TRAIN: ImageNet → TEST: PASCAL VOC

SLIDE 7

Mismatched domains

TRAIN: ImageNet → TEST: PASCAL VOC

“It is worthwhile to note that, even with 140K training ImageNet images, we do not perform as well as with 5K PASCAL VOC training images.” – Perronnin et al., CVPR 2010

SLIDE 8

Problem: Poor cross-domain generalization

  • Different underlying distributions
  • Overfit to datasets’ idiosyncrasies

Possible solution: Unsupervised domain adaptation

Mismatched domains

SLIDE 9

Unsupervised domain adaptation

Setup: source domain (with labeled data); target domain (no labels for training); different distributions.

Objective: learn a classifier that works well on the target.

SLIDE 10

Much recent research

• Correcting sampling bias: [Shimodaira ’00] [Huang et al. ’07] [Bickel et al. ’07] [Sugiyama et al. ’08] [Sethy et al. ’06, ’09]

• Adjusting mismatched models: [Evgeniou and Pontil ’05] [Duan et al. ’09, ’10] [Daumé III et al. ’10] [Saenko et al. ’10] [Kulis et al. ’11] [Chen et al. ’11]

• Inferring domain-invariant features [this work]: [Pan et al. ’09] [Blitzer et al. ’06] [Gopalan et al. ’11] [Chen et al. ’12] [Daumé III ’07] [Argyriou et al. ’08] [Gong et al. ’12] [Muandet et al. ’13]
SLIDE 11

Problem

Existing methods attempt to adapt all source data points, including “hard” ones.

SLIDE 12

Our idea [Gong et al., ICML 2013]

Problem: existing methods attempt to adapt all source data points, including “hard” ones.

• Automatically identify the “most adaptable” instances
• Use them to create a series of easier auxiliary domain adaptation tasks

SLIDE 13

Landmarks

Landmarks are labeled source instances distributed similarly to the target domain.

[Gong et al., ICML 2013]

SLIDE 14

Landmarks

Landmarks are labeled source instances distributed similarly to the target domain.

Roles:
• Ease adaptation difficulty
• Provide discrimination (biased toward the target)

[Gong et al., ICML 2013]

SLIDE 15

Key steps

1. Identify landmarks at multiple scales (coarse to fine-grained).

[Gong et al., ICML 2013]

SLIDE 16

Key steps

2. Construct auxiliary domain adaptation tasks
3. Obtain domain-invariant features
4. Predict target labels

[Gong et al., ICML 2013]

SLIDE 17

Identifying landmarks

Objective: select the source instances whose distribution best matches the target.

[Gong et al., ICML 2013]

SLIDE 18

Maximum mean discrepancy (MMD)

Empirical estimate [Gretton et al. ’06], using a universal RKHS kernel function, with the l-th landmark drawn from the source domain (see the reconstruction below).

[Gong et al., ICML 2013]
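For reference, the standard empirical estimate of MMD between source samples $\{\mathbf{x}_m\}_{m=1}^{M}$ and target samples $\{\mathbf{y}_n\}_{n=1}^{N}$, with $\phi$ the feature map of a universal RKHS kernel, is

$$\widehat{\mathrm{MMD}}^2(\mathcal{S},\mathcal{T}) \;=\; \Bigl\| \frac{1}{M}\sum_{m=1}^{M}\phi(\mathbf{x}_m) \;-\; \frac{1}{N}\sum_{n=1}^{N}\phi(\mathbf{y}_n) \Bigr\|_{\mathcal{H}}^{2}.$$

The landmark-selection criterion on the next slide additionally weights the source terms with indicator variables.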

SLIDE 19

Method for identifying landmarks

Formulated as an integer program over the landmark indicator variables (a reconstruction follows below).

[Gong et al., ICML 2013]
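A plausible form of this integer program, consistent with the landmark definition above (the exact constraints in Gong et al., ICML 2013 may differ), selects the source subset whose weighted mean embedding is closest to the target while preserving the source class proportions:

$$\min_{\boldsymbol{\alpha}\in\{0,1\}^{M}} \Bigl\| \frac{1}{\sum_{m}\alpha_{m}} \sum_{m=1}^{M} \alpha_{m}\,\phi(\mathbf{x}_m) \;-\; \frac{1}{N}\sum_{n=1}^{N}\phi(\mathbf{y}_n) \Bigr\|_{\mathcal{H}}^{2} \quad \text{s.t.} \quad \frac{\sum_{m}\alpha_{m}\,\mathbb{1}[\ell_m = c]}{\sum_{m}\alpha_{m}} \;=\; \frac{1}{M}\sum_{m}\mathbb{1}[\ell_m = c] \;\; \forall c,$$

where $\alpha_m = 1$ marks the $m$-th source instance (with class label $\ell_m$) as a landmark.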

SLIDE 20

Method for identifying landmarks

Convex relaxation of the integer program (the relaxed quantity can be evaluated as in the sketch below).

[Gong et al., ICML 2013]
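A minimal numerical sketch of the quantity being relaxed: given candidate landmark weights (the binary $\alpha$ above, or its continuous relaxation), the squared MMD between the weighted source points and the target can be computed with a Gaussian kernel as follows. Function and variable names are illustrative, not from the authors' code.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """RBF kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def weighted_mmd2(X_src, X_tgt, alpha, sigma):
    """Squared MMD between alpha-weighted source points and the target."""
    a = alpha / alpha.sum()                    # normalized landmark weights
    b = np.full(len(X_tgt), 1.0 / len(X_tgt))  # uniform target weights
    Kss = gaussian_kernel(X_src, X_src, sigma)
    Ktt = gaussian_kernel(X_tgt, X_tgt, sigma)
    Kst = gaussian_kernel(X_src, X_tgt, sigma)
    return a @ Kss @ a + b @ Ktt @ b - 2.0 * a @ Kst @ b

# Toy usage: score a random binary selection of source points as landmarks.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(100, 5))
X_tgt = rng.normal(0.5, 1.0, size=(80, 5))
alpha = rng.integers(0, 2, size=100).astype(float)
print(weighted_mmd2(X_src, X_tgt, alpha, sigma=1.0))
```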

SLIDE 21

Scale for landmark similarity?

Gaussian kernels: how to choose the bandwidth?

Our solution: examine the distributions at multiple granularities; multiple bandwidths → multiple sets of landmarks (see the illustration below).

[Gong et al., ICML 2013]
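Continuing the sketch above, a small illustration of the multi-scale idea: rather than committing to one bandwidth, sweep a geometric grid around a base scale (the specific grid here is an assumption, not the paper's) and keep one landmark set per bandwidth.

```python
# Continue the sketch above: one landmark scoring per bandwidth.
# Base scale from the median pairwise distance (median heuristic).
sigma0 = np.median(np.linalg.norm(X_src[:, None] - X_tgt[None, :], axis=-1))
for sigma in sigma0 * 2.0 ** np.arange(-2, 3):
    # In the full method a separate alpha would be optimized for each sigma;
    # here we simply score the same candidate selection at every scale.
    print(f"sigma={sigma:.3f}  weighted MMD^2={weighted_mmd2(X_src, X_tgt, alpha, sigma):.5f}")
```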

SLIDE 22

Landmarks at multiple scales

[Figure: headphone and mug images selected as landmarks at several bandwidths σ, from coarse to fine; unselected source images shown for contrast.]

[Gong et al., ICML 2013]

SLIDE 23

Key steps

2. Construct auxiliary domain adaptation tasks

[Gong et al., ICML 2013]

SLIDE 24

Constructing easier auxiliary tasks

At each scale σ, an auxiliary task is built from the landmarks. Intuition: the distributions are closer (cf. Theorem 1).

[Gong et al., ICML 2013]

SLIDE 25

Constructing easier auxiliary tasks

At each scale σ, the landmarks define a new source and a new target for the auxiliary task. Intuition: the distributions are closer (cf. Theorem 1).

[Gong et al., ICML 2013]

SLIDE 26
Constructing easier auxiliary tasks

Each task provides a new basis of features via the geodesic flow kernel (GFK) [Gong et al., CVPR 2012]:

• Integrate out domain changes
• Obtain a domain-invariant representation
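Concretely, from Gong et al. (CVPR 2012): let $\Phi(t)$, $t \in [0,1]$, be the geodesic on the Grassmann manifold from the source subspace $\Phi(0)$ to the target subspace $\Phi(1)$. The GFK integrates projections along the whole path, giving the kernel

$$\langle \mathbf{z}_i^{\infty}, \mathbf{z}_j^{\infty} \rangle \;=\; \mathbf{x}_i^{\top} G\, \mathbf{x}_j, \qquad G \;=\; \int_{0}^{1} \Phi(t)\,\Phi(t)^{\top}\, dt,$$

which measures similarity in a representation that interpolates between the two domains; $G$ has a closed form via the generalized SVD.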

SLIDE 27

Key steps

2. Construct auxiliary domain adaptation tasks
3. Obtain domain-invariant features (combined via MKL)

SLIDE 28

Combining features discriminatively

• Multiple kernel learning on the labeled landmarks
• Arriving at a domain-invariant feature space
• Discriminative loss biased toward the target
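In sketch form (details of the objective are an assumption here, not taken from the slide), the GFK kernels $G_q$ from the auxiliary tasks at the different scales are combined with learned nonnegative weights,

$$F \;=\; \sum_{q} \beta_q\, G_q, \qquad \beta_q \ge 0,$$

and $\beta$ is chosen so that a classifier trained with the composite kernel $F$ has low discriminative loss on the labeled landmarks, which serve as a proxy for the unlabeled target.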

SLIDE 29

Key steps

2. Construct auxiliary domain adaptation tasks
3. Obtain domain-invariant features
4. Predict target labels

SLIDE 30

Experiments

• Four vision datasets/domains, on visual object recognition [Griffin et al. ’07, Saenko et al. ’10]
• Four types of product reviews (books, DVDs, electronics, kitchen appliances), on sentiment analysis [Blitzer et al. ’07]

SLIDE 31

Cross-dataset object recognition

SLIDE 32

Cross-dataset object recognition

SLIDE 33

Cross-dataset object recognition

SLIDE 34

Datasets as domains?

ASSUMED: Domain 1 … Domain 5 (one per dataset)

SLIDE 35

Datasets as domains?

REALITY: Domain 1 … Domain 10 (more domains than datasets)

SLIDE 36

Datasets as domains?

REALITY: Domain 1 … Domain 10

Dataset ≠ Domain; cross-dataset adaptation is suboptimal.

SLIDE 37

How to define a domain?

NLP: language-specific domains. Speech: speaker-specific domains. Vision: ?? Pose-specific? Illumination-specific? Occlusion? Image resolution? Background?

Challenges: many continuous factors vs. few discrete ones; factors overlap and interact.

SLIDE 38

Discovering latent visual domains

We propose to discover domains, “reshaping” them to cross dataset boundaries [Gong et al., NIPS 2013]:

• Maximum distinctiveness (measured with MMD; see the sketch below)
• Maximum learnability
• Determine K with domain-wise cross-validation
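A hedged reconstruction of the “maximum distinctiveness” criterion, reusing the MMD machinery from the landmark slides (the exact formulation in Gong et al., NIPS 2013 may differ): partition the pooled data into K latent domains that are as mutually distinct as possible while each remains learnable,

$$\max_{\mathcal{D}_1,\dots,\mathcal{D}_K} \;\sum_{k \ne k'} \widehat{\mathrm{MMD}}^{2}\bigl(\mathcal{D}_k, \mathcal{D}_{k'}\bigr) \quad \text{s.t. every } \mathcal{D}_k \text{ contains all object classes (learnability)},$$

with K then selected by domain-wise cross-validation.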

SLIDE 39

Results: discovering domains

[Figure: example images from discovered domain I and discovered domain II]

[Gong et al., NIPS 2013]

SLIDE 40

Results: discovering domains

[Bar charts (y-axis: accuracy): cross-dataset object recognition and cross-viewpoint action recognition (Domain I vs. Domain II), comparing “Domains = datasets”, Hoffman et al. 2012, and our discovered domains.]

SLIDE 41

Summary so far

• Landmarks: labeled source instances distributed similarly to the target
• Auxiliary tasks provably easier to solve
• Discriminative loss despite the unlabeled target
• Reshaping datasets into latent domains: discover cross-dataset domains that are maximally distinct & learnable

SLIDE 42

Typical assumptions

1. Test set will look like the training set.
2. Human labelers “see” the same thing.

SLIDE 43

Visual attributes

  • High-level semantic properties shared by objects
  • Human-understandable and machine-detectable

Examples: brown, indoors, outdoors, flat, four-legged, high heel, red, has-ornaments, metallic

[Oliva et al. 2001, Ferrari & Zisserman 2007, Kumar et al. 2008, Farhadi et al. 2009, Lampert et al. 2009, Endres et al. 2010, Wang & Mori 2010, Berg et al. 2010, Branson et al. 2010, Parikh & Grauman 2011, …]

SLIDE 44

Standard approach

• Learn one monolithic model per attribute
• Annotators A, B, and C vote on labels (“formal” vs. “not formal”)

SLIDE 45

Problem

There may be valid perceptual differences within an attribute.

• Binary attribute: “Formal?” User labels: 50% “yes”, 50% “no”
• Relative attribute: “More ornamented?” User labels: 50% “first”, 20% “second”, 30% “equally”

SLIDE 46

Imprecision of attributes

Fine-grained meaning: “Overweight?” or just “Chubby?”

SLIDE 47

Imprecision of attributes

Context: Is it formal? Formal wear for a conference, or formal wear for a wedding?

SLIDE 48

Imprecision of attributes

Cultural: Is it blue or green? English: “blue”. Russian: “neither” (“голубой” vs. “синий”). Japanese: “both” (“青” covers blue and green).

SLIDE 49

But do we need to be that precise?

• Yes. Applications like image search require that the user’s perception matches the system’s predictions.

Example queries: “white high heels”, “less formal than these” [WhittleSearch, Kovashka et al. CVPR 2012]

SLIDE 50

Our idea [Kovashka and Grauman, ICCV 2013]

• Treat learning perceived attributes as an adaptation problem.
• Adapt a generic attribute model with minimal user-specific labeled examples.
• Obtain implicit user-specific labels from the user’s search history.

SLIDE 51

Our idea

[Figure: each annotator’s own “formal” / “not formal” labels, rather than a single pooled vote]

[Kovashka and Grauman, ICCV 2013]

SLIDE 52
Learning adapted attributes

• Adapting binary attribute classifiers [J. Yang et al., ICDM 2007]: given user-labeled data and the generic model, learn a user-specific adapted classifier (see the sketch below).
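The cited Adaptive SVM of Yang et al. learns an additive perturbation of the generic decision function from the user's labels; roughly (kernelization and bias terms omitted):

$$f_u(\mathbf{x}) = f_G(\mathbf{x}) + \mathbf{w}^{\top}\mathbf{x}, \qquad \min_{\mathbf{w}} \; \tfrac{1}{2}\|\mathbf{w}\|^{2} + C \sum_{i} \max\bigl(0,\; 1 - y_i\,(f_G(\mathbf{x}_i) + \mathbf{w}^{\top}\mathbf{x}_i)\bigr),$$

where $f_G$ is the generic attribute classifier and $(\mathbf{x}_i, y_i)$ are the user-labeled examples; regularizing $\|\mathbf{w}\|$ keeps the adapted model $f_u$ close to the generic one.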

SLIDE 53
Learning adapted attributes

• Adapting relative attribute rankers [B. Geng et al., TKDE 2010]: given user-labeled data and the generic ranker, learn a user-specific adapted ranker (see the sketch below).
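For relative attributes the analogue is a perturbed ranking function (a sketch in the spirit of the cited adaptive ranking work; the exact objective is an assumption):

$$r_u(\mathbf{x}) = r_G(\mathbf{x}) + \mathbf{w}^{\top}\mathbf{x}, \qquad \min_{\mathbf{w}} \; \tfrac{1}{2}\|\mathbf{w}\|^{2} + C \sum_{(i,j)\in\mathcal{P}_u} \max\bigl(0,\; 1 - \bigl(r_u(\mathbf{x}_i) - r_u(\mathbf{x}_j)\bigr)\bigr),$$

where $r_G$ is the generic ranker and $\mathcal{P}_u$ are the user's ordered pairs ($i$ has more of the attribute than $j$).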
SLIDE 54

Collecting user-specific labels

• Explicitly, from actively requested labels: seek labels on uncertain and diverse images
• Implicitly, from search history: transitivity and contradictions

Example: “My target is… less formal than [one reference image] and more formal than [another]” implies that, for this user, the first reference image is more formal than the second.

SLIDE 55

Inferring implicit labels

[Figure: shoe images arranged from less sporty to more sporty, with reference images A and B marked; user feedback: “Target is more sporty than B”, “Target is less sporty than A”.]

The user’s feedback history can reveal a mismatch between perceived and predicted attributes.

SLIDE 56

Inferring implicit labels

[Figure: one spectrum for “sporty” (feedback: “Target is more sporty than B”) and one for “feminine” (~ less sporty; feedback: “Target is more feminine than A”), with reference images A, B, and C marked.]

The user’s feedback history can reveal a mismatch between perceived and predicted attributes.
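A toy sketch of how such implicit user-specific constraints can be read off the feedback history; the data layout and names here are hypothetical, not the paper's implementation:

```python
# Each feedback statement: (reference_image, relation), meaning
# "my target is MORE/LESS sporty than reference_image".
feedback = [("A", "less"), ("B", "more")]   # target < A and target > B in sportiness

# Transitivity: target > B and target < A together imply A > B for this user.
implied_pairs = [(a, b)
                 for a, rel_a in feedback if rel_a == "less"    # target < a
                 for b, rel_b in feedback if rel_b == "more"]   # target > b
print(implied_pairs)   # [('A', 'B')]: this user perceives A as sportier than B

# Contradiction: if the generic model ranks B above A in sportiness, the implied
# pair exposes a mismatch and becomes a user-specific constraint for adaptation.
```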

SLIDE 57

Datasets

• SUN Attributes: 14,340 scene images; 12 attributes: “sailing”, “hiking”, “vacationing”, “open area”, “vegetation”, etc.
• Shoes: 14,658 shoe images; 10 attributes: “pointy”, “bright”, “high-heeled”, “feminine”, etc.

SLIDE 58

Adapted attribute accuracy

  • 3 datasets
  • 22 attributes
  • 75 total users
SLIDE 59

Adapted attribute accuracy

  • 3 datasets
  • 22 attributes
  • 75 total users
SLIDE 60

Adapted attribute accuracy

  • 3 datasets
  • 22 attributes
  • 75 total users
SLIDE 61

Adapted attribute accuracy

The adaptation approach most accurately captures perceived attributes.

[Kovashka and Grauman, ICCV 2013]

SLIDE 62

Which images most influence adaptation?

Attributes: pointy, open, bright, ornamented, shiny, high-heeled, long, formal, sporty, feminine, sailing, vacationing, hiking, camping, socializing, shopping, vegetation, clouds, natural light, cold, open area, horizon far

SLIDE 63

Visualizing adapted attributes

[Figure: SUN, binary attribute “Vacationing”, and Shoes, relative attribute “Formal”: images ranked from less to more of the attribute under the generic vs. adapted models.]

SLIDE 64

Personalizing image search with adapted attributes

[Bar chart (y-axis: match rate) on Shoes-Binary and SUN, comparing generic, generic+, user-exclusive, and user-adaptive; example queries: “white shiny heels”, “shinier than [image]”.]

SLIDE 65

Impact of implicit labels

[Bar chart (y-axis: percentile rank) on Shoes-Relative, comparing explicit labels only, +contradictions, and +transitivity.]

SLIDE 66

Summary

• Practical concerns when learning visual categories:
  • Test images can look different from training images!
  • People do not perceive image labels universally!
• Domain adaptation methods help address them:
  • Landmark-based unsupervised adaptation
  • Reshaping datasets into latent domains
  • Adapting generic models to account for user-specific perception of attributes

SLIDE 67

References

• Attribute Adaptation for Personalized Image Search. A. Kovashka and K. Grauman. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, December 2013.

• Reshaping Visual Datasets for Domain Adaptation. B. Gong, K. Grauman, and F. Sha. In Proceedings of Advances in Neural Information Processing Systems (NIPS), Tahoe, Nevada, December 2013.

• Connecting the Dots with Landmarks: Discriminatively Learning Domain-Invariant Features for Unsupervised Domain Adaptation. B. Gong, K. Grauman, and F. Sha. In International Conference on Machine Learning (ICML), Atlanta, GA, June 2013.

• Geodesic Flow Kernel for Unsupervised Domain Adaptation. B. Gong, Y. Shi, F. Sha, and K. Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, June 2012.