Teaching computers about visual categories - Kristen Grauman, Department of Computer Science, University of Texas at Austin (PowerPoint PPT Presentation)

  1. Teaching computers about visual categories Kristen Grauman Department of Computer Science University of Texas at Austin

  2. Visual category recognition. Goal: recognize and detect categories of visually and semantically related objects, scenes, and activities.

  3. The need for visual recognition: robotics, augmented reality, indexing by content, surveillance, scientific data analysis.

  4. Difficulty of category recognition: illumination, object pose, clutter, occlusions, viewpoint, intra-class appearance. ~30,000 possible categories to distinguish! [Biederman 1987]

  5. Progress charted by datasets: Roberts, COIL. [Timeline: 1963 … 1996]

  6. Progress charted by datasets: MIT-CMU Faces, INRIA Pedestrians, UIUC Cars. [Timeline: 1963 … 1996, 2000]

  7. Progress charted by datasets: MSRC 21 Objects, Caltech-101, Caltech-256. [Timeline: 1963 … 1996, 2000, 2005]

  8. Progress charted by datasets: PASCAL VOC detection challenge. [Timeline: 1963 … 1996, 2000, 2005, 2007]

  9. Progress charted by datasets: ImageNet, 80M Tiny Images, PASCAL VOC, Birds-200, Faces in the Wild. [Timeline: 1963 … 1996, 2000, 2005, 2007, 2008, 2013]

  10. Learning-based methods. Last ~10 years: impressive strides by learning appearance models (usually discriminative). [Diagram: an annotator labels training images as car / non-car; the learned model is applied to a novel image.] [Papageorgiou & Poggio 1998, Schneiderman & Kanade 2000, Viola & Jones 2001, Dalal & Triggs 2005, Grauman & Darrell 2005, Lazebnik et al. 2006, Felzenszwalb et al. 2008, …]
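
A minimal sketch of the discriminative training setup this slide describes: label image windows, extract a feature, fit a linear classifier, score new windows. The HOG features and the scikit-image/scikit-learn calls are illustrative assumptions, not the specific methods cited on the slide.

```python
# Sketch: discriminative appearance model for car / non-car windows.
# Feature choice (HOG) and library calls are illustrative assumptions.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def describe(window):
    # window: grayscale 2D array of a fixed-size crop (e.g., 64x128 pixels).
    return hog(window, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def train_detector(car_windows, noncar_windows):
    X = np.array([describe(w) for w in car_windows + noncar_windows])
    y = np.array([1] * len(car_windows) + [0] * len(noncar_windows))
    return LinearSVC(C=1.0).fit(X, y)

def score_window(clf, window):
    # Signed distance to the learned hyperplane; higher means more car-like.
    return clf.decision_function([describe(window)])[0]
```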

  11. Exuberance for image data (and their category labels). ImageNet: 14M images, 1K+ labeled object categories [Deng et al. 2009-2012]. 80M Tiny Images: 80M images, 53K noisily labeled object categories [Torralba et al. 2008]. SUN Database: 131K images, 902 labeled scene categories, 4K labeled object categories [Xiao et al. 2010].

  12. Problem. [Plots, log scale, 1998-2013: difficulty and scale of data vs. complexity of supervision.] While the complexity and scale of the recognition task have escalated dramatically, our means of “teaching” visual categories remains shallow.

  13. Envisioning a broader channel. “This image has a cow in it.” Human annotator. More labeled images ↔ more accurate models?

  14. Envisioning a broader channel. Human annotator. Need richer means to teach the system about the visual world.

  15. Envisioning a broader channel. [Diagrams: today, the human and the system connect through vision and learning; over the next 10 years, through vision, learning, knowledge representation, multi-agent systems, human computation, robotics, and language.]

  16. Our goal. Teaching computers about visual categories must be an ongoing, interactive process, with communication that goes beyond labels. This talk: 1. Active visual learning 2. Learning from visual comparisons

  17. Active learning for visual recognition. [Loop diagram: the current classifiers issue an active request over the unlabeled data; the annotator answers; the labeled data grow.] [Mackay 1992, Cohn et al. 1996, Freund et al. 1997, Lindenbaum et al. 1999, Tong & Koller 2000, Schohn and Cohn 2000, Campbell et al. 2000, Roy & McCallum 2001, Kapoor et al. 2007, …]

  18. Active learning for visual recognition. [Plot: accuracy vs. number of labels added; active selection outperforms passive.] Intent: better models, faster/cheaper.
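
The generic pool-based loop sketched on slides 17-18 could look roughly like the following; the classifier and the least-confidence uncertainty measure are illustrative assumptions rather than the talk's actual models.

```python
# Sketch of a pool-based active learning loop: train on the labeled pool,
# query the annotator for the example the classifier is least sure about,
# retrain, repeat.  Classifier and uncertainty measure are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_lab, y_lab, X_pool, oracle, rounds=50):
    # X_pool: (n, d) array of unlabeled features; oracle: human labeling function.
    X_lab, y_lab = list(X_lab), list(y_lab)
    pool = list(range(len(X_pool)))
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000).fit(np.array(X_lab), np.array(y_lab))
        # Least-confidence sampling: lowest maximum class probability.
        conf = clf.predict_proba(X_pool[pool]).max(axis=1)
        pick = pool[int(np.argmin(conf))]
        y_new = oracle(X_pool[pick])          # ask the human annotator
        X_lab.append(X_pool[pick]); y_lab.append(y_new)
        pool.remove(pick)
    return clf
```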

  19. Problem: Active selection and recognition. • Multiple levels of annotation are possible, ranging from less expensive to more expensive to obtain. • Variable cost depending on level and example.

  20. Our idea: Cost-sensitive multi-question active learning. • Compute a decision-theoretic active selection criterion that weighs which example to annotate and what kind of annotation to request for it, as compared to the predicted effort the request would require. [Vijayanarasimhan & Grauman, NIPS 2008, CVPR 2009]

  21. Decision-theoretic multi-question criterion. Value of asking a given question about a given data object = current misclassification risk − estimated risk if the candidate request were answered − cost of getting the answer. Three “levels” of requests to choose from: 1. Label a region. 2. Tag an object in the image. 3. Segment the image, name all objects.
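
A hedged sketch of how the criterion on this slide might be evaluated over candidate (example, question) pairs. The risk estimator and cost predictor are left as placeholder callables; this does not reproduce the paper's exact formulation.

```python
# Sketch of the decision-theoretic criterion on slide 21:
# value(example, question) = current misclassification risk
#                            - estimated risk if the request were answered
#                            - cost of getting the answer.
# The risk and cost models are placeholders, not the cited implementation.

def value_of_information(current_risk, expected_risk_if_answered, answer_cost):
    return current_risk - expected_risk_if_answered - answer_cost

def select_request(candidates, estimate_risk_after, predict_cost, current_risk):
    """candidates: iterable of (example, question_level) pairs, with
    question_level in, e.g., {"label_region", "tag_object", "full_segmentation"}."""
    best, best_value = None, float("-inf")
    for example, level in candidates:
        v = value_of_information(current_risk,
                                 estimate_risk_after(example, level),
                                 predict_cost(example, level))
        if v > best_value:
            best, best_value = (example, level), v
    return best
```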

  22. Predicting effort • What manual effort cost would we expect to pay for an unlabeled image? Which image would you rather annotate?

  23. Predicting effort • What manual effort cost would we expect to pay for an unlabeled image? Which image would you rather annotate?

  24. Predicting effort We estimate labeling difficulty from visual content.

  25. Predicting effort We estimate labeling difficulty from visual content. Other forms of effort cost: expertise required, resolution of data, how far the robot must move, length of video clip, …
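
One plausible way to realize "estimate labeling difficulty from visual content" is to regress recorded annotation times onto simple image features; the feature representation and regressor below are assumptions, not the cited approach.

```python
# Sketch: learn to predict annotation effort from image features by
# regressing measured labeling times.  Features and regressor are assumptions.
import numpy as np
from sklearn.svm import SVR

def train_effort_predictor(image_features, annotation_times):
    # image_features: (n, d) array, e.g. color/edge-density statistics;
    # annotation_times: (n,) array of measured labeling times in seconds.
    return SVR(kernel="rbf", C=10.0).fit(image_features, annotation_times)

def predicted_cost(model, feature_vector):
    return float(model.predict(np.asarray(feature_vector).reshape(1, -1))[0])
```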

  26. Multi-question active learning. [Loop diagram: the current classifiers issue requests such as “Completely segment image #32.” or “Does image #7 contain a cow?”; the annotator answers; the labeled data grow.] [Vijayanarasimhan & Grauman, NIPS 2008, CVPR 2009]

  27. Multi-question active learning. [Loop diagram: the current classifiers issue requests such as “Completely segment image #32.” or “Does image #7 contain a cow?”; the annotator answers; the labeled data grow.] [Vijayanarasimhan & Grauman, NIPS 2008, CVPR 2009]

  28. Multi-question active learning curves. [Plots: accuracy vs. annotation effort.]

  29. Multi-question active learning with objects and attributes [Kovashka et al., ICCV 2011]. [Loop diagram: the current model asks the annotator either “What is this object?” or “Does this object have spots?” about unlabeled data.] Weigh the relative impact of an object label or an attribute label at each iteration.

  30. Budgeted batch active learning [Vijayanarasimhan et al., CVPR 2010]. [Loop diagram: the current model selects a set of unlabeled examples, each with its own dollar cost, for the annotator.] Select a batch of examples that together improve the classifier objective and meet the annotation budget.
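
A rough sketch of batch selection under the stated constraint (informativeness vs. a dollar budget). The greedy benefit-per-cost heuristic is an illustrative stand-in, not the batch optimization used in the cited paper.

```python
# Sketch: greedily add the example with the best informativeness-per-cost
# ratio until the annotation budget is spent.  Illustrative heuristic only.

def select_batch(informativeness, costs, budget):
    """informativeness, costs: dicts mapping example id -> score / dollar cost."""
    remaining, batch = dict(costs), []
    while remaining:
        best = max(remaining, key=lambda i: informativeness[i] / remaining[i])
        if remaining[best] > budget:
            remaining.pop(best)          # too expensive for the budget left
            continue
        batch.append(best)
        budget -= remaining.pop(best)
    return batch
```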

  31. Problem: “Sandbox” active learning. Thus far, tested only in artificial settings: • Unlabeled data already fixed, small scale, biased (~10^3 prepared images). • Computational cost ignored. [Plot: accuracy vs. actual time for active and passive selection.]

  32. Our idea: Live active learning. Large-scale active learning of object detectors with crawled data and crowdsourced labels. How to scale active learning to massive unlabeled pools of data?

  33. Pool-based active learning: e.g., select the point nearest to the hyperplane decision boundary w for labeling. [Tong & Koller, 2000; Schohn & Cohn, 2000; Campbell et al. 2000]
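
The selection rule named on this slide, written out for a linear classifier with weight vector w and bias b (notation assumed):

```python
# Among unlabeled points, pick the one nearest to the current hyperplane,
# i.e., with the smallest |w.x + b|.
import numpy as np

def nearest_to_hyperplane(w, b, X_unlabeled):
    # X_unlabeled: (n, d) array; w: (d,) weight vector; b: scalar bias.
    distances = np.abs(X_unlabeled @ w + b) / np.linalg.norm(w)
    return int(np.argmin(distances))
```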

  34. Sub-linear time active selection. We propose a novel hashing approach to identify the most uncertain examples in sub-linear time. [Diagram: the current classifier indexes into a hash table built over the unlabeled data to retrieve actively selected examples.] [Jain, Vijayanarasimhan, Grauman, NIPS 2010]

  35. Hashing a hyperplane query: h(w) → {x_1, …, x_k}. [Diagram: hyperplanes w(t), w(t+1) and the unlabeled points nearest to each.] At each iteration of the learning loop, our hash functions map the current hyperplane directly to its nearest unlabeled points.

  36. Hashing a hyperplane query: h(w) → {x_1, …, x_k}. Guarantee high probability of collision for points near the decision boundary. At each iteration of the learning loop, our hash functions map the current hyperplane directly to its nearest unlabeled points.
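
A sketch in the spirit of the hyperplane hashing idea on slides 35-36: hash database points and the query hyperplane so that points near the decision boundary are the likely colliders. Treat the exact hash form as my assumption about the H-Hash family rather than the paper's definition.

```python
# Sketch of a hyperplane hash: for each bit pair, draw two Gaussian
# directions (u, v).  Points get bits [sign(u.x), sign(v.x)]; the query
# hyperplane gets [sign(u.w), sign(-v.w)], so points with w.x near zero
# are the most likely to share the query's bucket.  Assumed form.
import numpy as np

rng = np.random.default_rng(0)

def make_hyperplane_hash(dim, num_bits):
    U = rng.standard_normal((num_bits, dim))
    V = rng.standard_normal((num_bits, dim))

    def hash_point(x):
        return tuple(np.concatenate([U @ x, V @ x]) > 0)

    def hash_hyperplane(w):
        return tuple(np.concatenate([U @ w, -(V @ w)]) > 0)

    return hash_point, hash_hyperplane
```

In use, all unlabeled points are hashed once into buckets; at each learning round the current hyperplane is hashed and only its colliding bucket is scanned for candidate queries, rather than the whole pool.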

  37. Sub-linear time active selection. [Plots on 1M Tiny Images, comparing H-Hash Active, Exhaustive Active, and Passive: (a) accuracy improvements, improvement in AUROC as more data are labeled; (b) accounting for all costs, selection + labeling time (hrs); (c) time spent searching for the selection.] By minimizing both selection and labeling time, H-Hash obtains the best accuracy per unit time.

  38. PASCAL Visual Object Categorization • Closely studied object detection benchmark • Original image data from Flickr http://pascallin.ecs.soton.ac.uk/challenges/VOC/

  39. Live active learning [Vijayanarasimhan & Grauman, CVPR 2011]. [Pipeline diagram: unlabeled images yield jumping-window candidates, which are hashed into a hash table of image windows; the current hyperplane w is hashed to retrieve actively selected example windows; crowd annotations are merged by consensus (mean shift) into annotated data.]
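
The "consensus (mean shift)" box in the pipeline suggests merging multiple crowd-drawn bounding boxes into one annotation. A minimal sketch under that assumption, with the bandwidth and the largest-cluster rule chosen arbitrarily:

```python
# Sketch: cluster worker-drawn bounding boxes with mean shift and take the
# center of the largest cluster as the consensus box.  Parameters assumed.
import numpy as np
from sklearn.cluster import MeanShift

def consensus_box(worker_boxes, bandwidth=20.0):
    # worker_boxes: (n, 4) array of [x1, y1, x2, y2] from different annotators.
    ms = MeanShift(bandwidth=bandwidth).fit(np.asarray(worker_boxes, dtype=float))
    labels, counts = np.unique(ms.labels_, return_counts=True)
    return ms.cluster_centers_[labels[np.argmax(counts)]]
```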
