Adaptation for Objects and Attributes



  1. Adaptation for Objects and Attributes Kristen Grauman Department of Computer Science University of Texas at Austin With Adriana Kovashka (UT Austin), Boqing Gong (USC), and Fei Sha (USC)

  2. Learning-based visual recognition Last 10+ years: impressive strides by learning appearance models (usually discriminative). (Figure: an annotator labels training images CAR / NOT CAR; image features feed a learned model that predicts CAR on a new image.)

  3. Typical assumptions 1. Test set will look like the training set. 2. Human labelers “see” the same thing.

  4. Mismatched domains TRAIN: Flickr → TEST: YouTube

  5. Mismatched domains TRAIN: Catalog images → TEST: Mobile phone photos

  6. Mismatched domains TRAIN: ImageNet → TEST: PASCAL VOC

  7. Mismatched domains “It is worthwhile to note that, even with 140K training ImageNet images, we do not perform as well as with 5K PASCAL VOC training images.” – Perronnin et al. CVPR 2010. TRAIN: ImageNet → TEST: PASCAL VOC

  8. Mismatched domains Problem: poor cross-domain generalization, due to different underlying distributions and overfitting to datasets’ idiosyncrasies. Possible solution: unsupervised domain adaptation.

  9. Unsupervised domain adaptation Setup: a source domain (with labeled data) and a target domain (no labels for training), drawn from different distributions. Objective: learn a classifier that works well on the target.

  10. Much recent research, broadly along three lines: correcting sampling bias, inferring domain-invariant features, and adjusting mismatched models [Shimodaira ’00; Evgeniou and Pontil ’05; Blitzer et al. ’06; Sethy et al. ’06, ’09; Daumé III ’07; Huang et al. ’07; Bickel et al. ’07; Argyriou et al. ’08; Sugiyama et al. ’08; Duan et al. ’09, ’10; Pan et al. ’09; Daumé III et al. ’10; Saenko et al. ’10; Kulis et al. ’11; Chen et al. ’11, ’12; Gopalan et al. ’11; Gong et al. ’12; Muandet et al. ’13; this work].

  11. Problem Existing methods attempt to adapt all source data points, including “hard” ones. (Figure: source and target point clouds.)

  12. Problem Existing methods attempt to adapt all source data points, including “hard” ones. Our idea: automatically identify the “most adaptable” instances and use them to create a series of easier auxiliary domain adaptation tasks. [Gong et al., ICML 2013]

  13. Landmarks Landmarks are labeled source instances distributed similarly to the target domain. (Figure: landmarks highlighted among the source points.) [Gong et al., ICML 2013]

  14. Landmarks Landmarks are labeled source instances distributed similarly to the target domain. Roles: ease adaptation difficulty; provide discrimination (biased toward the target). [Gong et al., ICML 2013]

  15. Key steps Step 1: identify landmarks at multiple scales, from coarse to fine-grained. [Gong et al., ICML 2013]

  16. Key steps Step 2: construct auxiliary domain adaptation tasks. Step 3: obtain domain-invariant features. Step 4: predict target labels. [Gong et al., ICML 2013]

  17. Identifying landmarks Objective. (Figure: source and target distributions.) [Gong et al., ICML 2013]

  18. Maximum mean discrepancy (MMD) [Gretton et al. ’06] Empirical estimate between the selected landmarks and the target, with a feature map $\phi$ into a universal RKHS $\mathcal{H}$ and indicators $\alpha_l \in \{0,1\}$ marking whether the $l$-th source instance is chosen as a landmark:
$$\mathrm{MMD} = \left\| \frac{1}{\sum_m \alpha_m} \sum_m \alpha_m \phi(x_m) \;-\; \frac{1}{N} \sum_n \phi(z_n) \right\|_{\mathcal{H}}$$
where $x_m$ are source instances and $z_n$ are target instances. [Gong et al., ICML 2013]
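A minimal numpy sketch of the squared empirical MMD under a Gaussian kernel (the biased estimator and the function names are illustrative choices, not the paper's code):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(X_src, X_tgt, sigma):
    # Squared empirical MMD: ||mean phi(src) - mean phi(tgt)||^2 in the RKHS,
    # expanded via the kernel trick (biased estimator for brevity).
    Kss = gaussian_kernel(X_src, X_src, sigma).mean()
    Ktt = gaussian_kernel(X_tgt, X_tgt, sigma).mean()
    Kst = gaussian_kernel(X_src, X_tgt, sigma).mean()
    return Kss + Ktt - 2 * Kst
```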

  19. Method for identifying landmarks Integer programming over the landmark indicators:
$$\min_{\alpha \in \{0,1\}^M} \left\| \frac{1}{\sum_m \alpha_m} \sum_m \alpha_m \phi(x_m) - \frac{1}{N} \sum_n \phi(z_n) \right\|_{\mathcal{H}}^2$$
subject to the selected landmarks preserving the class proportions of the labeled source data. [Gong et al., ICML 2013]

  20. Method for identifying landmarks Convex relaxation: relax the binary indicators to continuous weights, yielding a tractable quadratic program. [Gong et al., ICML 2013]
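A simplified scipy sketch of the relaxed selection problem; the helper names are mine, and the paper's per-class balance constraints and rounding scheme are omitted for brevity:

```python
import numpy as np
from scipy.optimize import minimize

def select_landmarks(K_ss, K_st, n_select):
    # K_ss: (M, M) kernel among source points.
    # K_st: (M, N) kernel between source and target points.
    M = K_ss.shape[0]
    kt = K_st.mean(axis=1)  # similarity of each source point to the target mean

    def objective(beta):
        # ||sum_m beta_m phi(x_m) - target mean embedding||^2,
        # dropping the beta-independent constant term.
        return beta @ K_ss @ beta - 2.0 * beta @ kt

    # Relaxation: beta_m in [0, 1], sum_m beta_m = 1 (beta ~ alpha / sum alpha).
    beta0 = np.full(M, 1.0 / M)
    res = minimize(objective, beta0, bounds=[(0.0, 1.0)] * M,
                   constraints={'type': 'eq', 'fun': lambda b: b.sum() - 1.0})
    # Round: keep the n_select source points with the largest weights.
    return np.argsort(res.x)[-n_select:]
```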

  21. Scale for landmark similarity? Gaussian kernels: how to choose the bandwidth? Our solution: examine distributions at multiple granularities. Multiple bandwidths → multiple sets of landmarks. [Gong et al., ICML 2013]
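Continuing the sketch above (Xs and Xt stand for source and target feature matrices; the σ grid is illustrative), each bandwidth yields its own landmark set:

```python
# One landmark set per bandwidth; coarse sigmas match broad structure,
# fine sigmas match local structure.
sigmas = [2.0 ** p for p in range(-3, 7)]
landmark_sets = {
    s: select_landmarks(gaussian_kernel(Xs, Xs, s),
                        gaussian_kernel(Xs, Xt, s), n_select=50)
    for s in sigmas
}
```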

  22. Landmarks at multiple scales (Figure: landmark images selected for “headphone” and “mug” at bandwidths σ = 2^6, 2^0, and 2^-3, alongside target images and unselected source images.) [Gong et al., ICML 2013]

  23. Key steps Step 2: construct auxiliary domain adaptation tasks. [Gong et al., ICML 2013]

  24. Constructing easier auxiliary tasks At each scale σ, the landmarks sit between source and target. Intuition: the distributions are closer (cf. Theorem 1). [Gong et al., ICML 2013]

  25. Constructing easier auxiliary tasks At each scale σ, form an auxiliary task in which the landmarks switch sides: a new source (the remaining source instances) and a new target (the original target augmented with the landmarks). Intuition: the new distributions are closer (cf. Theorem 1). [Gong et al., ICML 2013]

  26. Constructing easier auxiliary tasks Each task provides a new basis of features via the geodesic flow kernel (GFK): integrate out domain changes and obtain a domain-invariant representation. [Gong et al., CVPR 2012]
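A sketch of the GFK's closed form (following Gong et al., CVPR 2012, but with my own variable names and numerical guards; it assumes the feature dimension D is at least 2d):

```python
import numpy as np
from scipy.linalg import null_space, svd

def gfk(Ps, Pt):
    # Ps, Pt: (D, d) orthonormal PCA bases of the source / target domains.
    # Returns G (D, D) such that x^T G y integrates the similarity of the
    # projections of x and y along the geodesic from Ps to Pt.
    D, d = Ps.shape
    Rs = null_space(Ps.T)                 # (D, D-d) orthogonal complement of Ps
    U1, cos_t, Vt = svd(Ps.T @ Pt)        # cosines of the principal angles
    B = -Rs.T @ Pt @ Vt.T                 # columns = U2_i * sin(theta_i)
    sin_t = np.linalg.norm(B, axis=0)
    U2 = B / np.maximum(sin_t, 1e-12)     # safe normalize when theta ~ 0
    theta = np.arctan2(sin_t, np.clip(cos_t, -1.0, 1.0))

    # Closed-form integrals over t in [0, 1]; np.sinc(x) = sin(pi x)/(pi x),
    # so np.sinc(2 theta / pi) = sin(2 theta) / (2 theta), safe at theta = 0.
    l1 = 0.5 * (1 + np.sinc(2 * theta / np.pi))   # integral of cos^2(t theta)
    l3 = 0.5 * (1 - np.sinc(2 * theta / np.pi))   # integral of sin^2(t theta)
    l2 = np.where(theta > 1e-12,                  # cross term (vanishes at 0)
                  -(1 - np.cos(2 * theta)) / np.maximum(4 * theta, 1e-12), 0.0)

    W = np.hstack([Ps @ U1, Rs @ U2])             # (D, 2d)
    Lam = np.block([[np.diag(l1), np.diag(l2)],
                    [np.diag(l2), np.diag(l3)]])
    return W @ Lam @ W.T
```

With G in hand, k(x, y) = xᵀ G y acts as a domain-invariant kernel, and each auxiliary task contributes its own G.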

  27. Key steps Step 3: obtain domain-invariant features, combined across the auxiliary tasks with multiple kernel learning (MKL).

  28. Combining features discriminatively Multiple kernel learning on the labeled landmarks: arrive at a domain-invariant feature space while optimizing a discriminative loss biased toward the target.
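A stand-in sketch for this step: the paper learns the kernel weights jointly with the classifier, whereas this simplified sklearn version just scores candidate convex combinations by cross-validation on the labeled landmarks (function names are mine):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def combine_kernels(kernels, weights):
    # Convex combination of precomputed (n, n) kernel matrices.
    return sum(w * K for w, K in zip(weights, kernels))

def learn_combination(kernels, y, candidate_weights):
    # Score each candidate weight vector by CV accuracy on the landmarks
    # and keep the best; a grid-search stand-in for true MKL.
    best = None
    for w in candidate_weights:
        K = combine_kernels(kernels, w)
        acc = cross_val_score(SVC(kernel='precomputed'), K, y, cv=3).mean()
        if best is None or acc > best[0]:
            best = (acc, w)
    return best[1]
```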

  29. Key steps Step 4: predict target labels using the domain-invariant features from steps 2 and 3.

  30. Experiments Visual object recognition across four vision datasets/domains [Griffin et al. ’07, Saenko et al. ’10]. Sentiment analysis across four types of product reviews: books, DVDs, electronics, and kitchen appliances [Blitzer et al. ’07].

  31. Cross-dataset object recognition (results figure)

  32. Cross-dataset object recognition (results figure)

  33. Cross-dataset object recognition (results figure)

  34. Datasets as domains? ASSUMED: each dataset is a single coherent domain (Domains 1–5).

  35. Datasets as domains? REALITY: many latent domains (Domains 1–10) that do not align with dataset boundaries.

  36. Datasets as domains? REALITY: Dataset != Domain, so cross-dataset adaptation is suboptimal.

  37. How to define a domain? NLP: language-specific domains. Speech: speaker-specific domains. Vision: ?? Pose-specific? Illumination-specific? Occlusion? Image resolution? Background? Challenges: many continuous factors (vs. a few discrete ones), and the factors overlap and interact.

  38. Discovering latent visual domains We propose to discover domains, “reshaping” them to cross dataset boundaries. Two criteria: maximum distinctiveness (the discovered domains should be maximally different, measured by pairwise MMD) and maximum learnability (each discovered domain should support accurate prediction). Determine the number of domains K with domain-wise cross-validation. [Gong et al., NIPS 2013]
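A scoring sketch for the maximum-distinctiveness criterion, reusing the mmd2 helper from the earlier sketch: it evaluates how distinct a candidate partition's domains are, whereas the paper optimizes the assignments directly under additional (e.g., class-balance) constraints:

```python
import numpy as np

def distinctiveness(X, assign, sigma):
    # Sum of pairwise squared MMDs between discovered domains;
    # a higher total means a more distinct partition.
    # X: (n, D) features; assign: (n,) domain index per point.
    domains = np.unique(assign)
    total = 0.0
    for i, a in enumerate(domains):
        for b in domains[i + 1:]:
            total += mmd2(X[assign == a], X[assign == b], sigma)
    return total
```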

  39. Results: discovering domains (Figure: example images grouped into discovered domain I and discovered domain II.) [Gong et al., NIPS 2013]

  40. Results: discovering domains (Bar charts: accuracy on cross-dataset object recognition and cross-viewpoint action recognition, comparing domains=datasets, Hoffman et al. 2012, and our discovered domains, reported for discovered domains I and II.)

  41. Summary so far Landmarks: labeled source instances distributed similarly to the target. Auxiliary tasks: provably easier to solve, with a discriminative loss despite the unlabeled target. Reshaping datasets into latent domains: discover cross-dataset domains that are maximally distinct and learnable.

  42. Typical assumptions 1. Test set will look like the training set. 2. Human labelers “see” the same thing.

  43. Visual attributes • High-level semantic properties shared by objects • Human-understandable and machine-detectable. (Examples: high heel, flat, metallic, brown, red, outdoors, indoors, four-legged, has-ornaments.) [Oliva et al. 2001, Ferrari & Zisserman 2007, Kumar et al. 2008, Farhadi et al. 2009, Lampert et al. 2009, Endres et al. 2010, Wang & Mori 2010, Berg et al. 2010, Branson et al. 2010, Parikh & Grauman 2011, …]

  44. Standard approach Learn one monolithic model per attribute. (Figure: Annotators A, B, and C vote on “formal” / “not formal” labels; the majority vote trains a single model.)

  45. Problem There may be valid perceptual differences within an attribute. Binary attribute (“Formal?”): user labels split 50% “yes”, 50% “no”. Relative attribute (“More ornamented?”): user labels split 50% “first”, 20% “second”, 30% “equally”.

  46. Imprecision of attributes Fine-grained meaning: overweight, or just chubby?

  47. Imprecision of attributes Context: is it formal? Formal wear for a conference, or formal wear for a wedding?

  48. Imprecision of attributes Culture: is it blue or green? English: “blue”. Russian: “neither” (“голубой” vs. “синий”). Japanese: “both” (“青” covers blue and green).

  49. But do we need to be that precise? Yes: applications like image search require that the user’s perception match the system’s predictions, e.g., “white high heels”, “less formal than these”. [WhittleSearch, Kovashka et al. CVPR 2012]

  50. Our idea • Treat learning perceived attributes as an adaptation problem. • Adapt a generic attribute model with minimal user-specific labeled examples. • Obtain implicit user-specific labels from the user’s search history. [Kovashka and Grauman, ICCV 2013]

  51. Our idea (Figure: rather than a majority vote over annotators’ “formal” / “not formal” labels, each user’s own labels adapt the model to that user.) [Kovashka and Grauman, ICCV 2013]

  52. Learning adapted attributes • Adapting binary attribute classifiers: given user-labeled data and a generic model f′, learn an adapted classifier f(x) = f′(x) + Δf(x) that fits the user’s labels while staying close to the generic model (Adaptive SVM, J. Yang et al. ICDM 2007).
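A minimal numpy sketch in the spirit of the Adaptive SVM: the learned perturbation is regularized so the adapted model stays near the generic one; the plain subgradient loop and the hyperparameters are illustrative, not the paper's implementation:

```python
import numpy as np

def adapt_binary(X, y, f_generic, C=1.0, lr=0.01, epochs=200):
    # X: (n, D) user-labeled features; y: (n,) labels in {-1, +1};
    # f_generic: callable returning the generic model's decision scores.
    # Learns dw, b so that f(x) = f_generic(x) + dw.x + b fits the user's
    # labels, with ||dw||^2 penalizing deviation from the generic model.
    n, D = X.shape
    base = f_generic(X)                   # fixed generic scores
    dw, b = np.zeros(D), 0.0
    for _ in range(epochs):
        margins = y * (base + X @ dw + b)
        viol = margins < 1                # hinge-active examples
        grad_w = dw - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        dw -= lr * grad_w
        b -= lr * grad_b
    return lambda Xq: f_generic(Xq) + Xq @ dw + b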

  53. Learning adapted attributes • Adapting relative attribute rankers: given user-labeled ordered pairs and a generic ranking model, learn an adapted ranker that respects the user’s orderings while staying close to the generic one (ranking adaptation, B. Geng et al., TKDE 2010).
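A matching sketch for ranker adaptation: regularize the adapted weights toward the generic ranker while enforcing the user's pairwise orderings (again an illustrative subgradient version, with names and hyperparameters of my choosing):

```python
import numpy as np

def adapt_ranker(X, pairs, w_generic, C=1.0, lr=0.01, epochs=200):
    # X: (n, D) features; pairs: list of (i, j) with item i ranked above j;
    # w_generic: (D,) weights of the generic linear ranker.
    # Minimizes ||w - w_generic||^2 plus hinge losses on the ordered pairs.
    w = w_generic.copy()
    diffs = np.array([X[i] - X[j] for i, j in pairs])   # (P, D)
    for _ in range(epochs):
        viol = diffs @ w < 1               # pairs not yet separated by margin
        grad = (w - w_generic) - C * diffs[viol].sum(axis=0)
        w -= lr * grad
    return w                               # rank images by score X @ w
```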
