Learning Image Representations Tied to Ego-motion, Jayaraman and Grauman, ICCV 2015 (paper review presentation)


  1. University of Texas at Austin, Visual Recognition Presentation (paper review). Learning Image Representations Tied to Ego-motion, Jayaraman and Grauman, ICCV 2015. Hilgad Montelo, March 2016

  2. Outline • The "Kitten Carousel" Experiment • Problem • Objective • Main Idea • Related Work • Approach • Experiments and Results • Conclusions

  3. The "Kitten Carousel" Experiment (Held & Hein, 1963) active kitten passive kitten Key to perceptual development: self-generated motion + visual feedback [Slide credit: Dinesh Jayaraman] 3

  4. Problem • Today’s visual recognition algorithms learn from a “disembodied” bag of labeled snapshots.

  5. Objective • Provide a visual recognition algorithm that learns in the context of acting and moving in the world.

  6. Main Idea • Associate ego-motion and vision by teaching the computer vision system the connection: “how I move” ↔ “how my visual surroundings change”

  7. Ego-motion → vision: view prediction. After moving: [figure: predicted views of the scene]. [Slide credit: Dinesh Jayaraman]

  8. Ego-motion ↔ vision for recognition • Learning this connection requires depth/3D geometry, semantics, and context, which are also key to recognition! • It can be learned without manual labels! • Approach: unsupervised feature learning using egocentric video + motor signals

  9. Related Work
  Integrating vision and motion:
  • Agrawal, Carreira, Malik, “Learning to see by moving”, ICCV 2015
  • Watter, Springenberg, Boedecker, Riedmiller, “Embed to control...”, NIPS 2015
  • Levine, Finn, Darrell, Abbeel, “… visuomotor policies”, arXiv 2015
  • Konda, Memisevic, “Learning visual odometry ...”, VISAPP 2015
  Visual prediction:
  • Doersch, Gupta, Efros, “… context prediction”, ICCV 2015
  • Oh, Guo, Lee, Lewis, Singh, “Action-conditional video …”, NIPS 2015
  • Kulkarni, Whitney, Kohli, Tenenbaum, “… inverse graphics ...”, NIPS 2015
  • Vondrick, Pirsiavash, Torralba, “Anticipating the future ...”, arXiv 2015
  Video for unsupervised image features:
  • Wang, Gupta, “Unsupervised learning of visual …”, ICCV 2015
  • Goroshin, Bruna, Tompson, Eigen, LeCun, “Unsupervised ...”, ICCV 2015

  10. Approach: Ego-motion equivariance • Invariant features: unresponsive to some classes of transformations • Equivariant features: predictably responsive to some classes of transformations, through simple mappings (e.g., linear), via an “equivariance map” M_g: z(gx) ≈ M_g z(x) • Invariance discards information; equivariance organizes it.
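For concreteness, here is the contrast restated as a worked equation (notation as above: g is an ego-motion acting on image x, z the learned feature map, M_g the equivariance map):

```latex
% Invariance vs. equivariance of a feature map z under an ego-motion g.
\[
  \text{invariance:}\quad z(gx) \approx z(x)
  \qquad\qquad
  \text{equivariance:}\quad z(gx) \approx M_g\, z(x)
\]
```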

  11. Approach: Equivariant embedding • Training data organized by ego-motions: unlabeled video + motor signals (left turn, right turn, forward) • Learn: pairs of frames related by a similar ego-motion should be related by the same feature-space transformation. [Source: “Learning image representations equivariant to ego-motion”, Jayaraman and Grauman, ICCV 2015]

  12. Approach 1. Extract training frame pairs from video 2. Learn ego-motion-equivariant image features 3. Train on the target recognition task in parallel

  13. Training frame pair mining • Discovery of ego-motion clusters in the motor-signal space (yaw change vs. forward distance): left turn, right turn, forward. [figure: scatter of frame-pair motions grouped into clusters] [Slide credit: Dinesh Jayaraman]
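A minimal sketch of how such clusters could be mined from the motor signal, assuming frame-synchronized odometry giving a yaw change and forward distance per frame pair. The k-means discretization and the function name are illustrative choices, not the paper's exact recipe:

```python
# Illustrative pair-mining sketch: cluster the motor signal so that
# each frame pair gets a discrete ego-motion label (e.g. left turn,
# right turn, forward). The standardization and k-means steps are
# assumptions for this sketch.
import numpy as np
from sklearn.cluster import KMeans

def mine_motion_labels(yaw_deltas, fwd_dists, n_motions=3):
    """Assign each frame pair a discrete ego-motion cluster id."""
    motions = np.stack([yaw_deltas, fwd_dists], axis=1)
    # Standardize so yaw (degrees) and distance (meters) are comparable.
    motions = (motions - motions.mean(axis=0)) / (motions.std(axis=0) + 1e-8)
    return KMeans(n_clusters=n_motions, n_init=10).fit_predict(motions)
```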

  14. Ego-motion equivariant feature learning • Given: frame pairs (x_i, x_j) related by ego-motion g • Desired: for all motions g and all images x, z(gx) ≈ M_g z(x) • Unsupervised training: minimize the equivariance error ∥ M_{g_ij} z_θ(x_i) − z_θ(x_j) ∥ over the mined pairs • Supervised training: softmax loss over labeled examples (x_i, y_i) • The feature space z_θ and the maps M_g are jointly trained. [Slide credit: Dinesh Jayaraman]
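A hedged PyTorch sketch of this joint objective, under stated assumptions: a toy fully connected encoder stands in for the paper's CNN, the equivariance term is a plain squared error (the paper's full objective is a contrastive variant with negative pairs), and the mixing weight `lam` is illustrative. `g` is the integer motion label from the mining step above:

```python
# Sketch only: joint equivariance + softmax training, per slide 14.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EquivariantEmbedding(nn.Module):
    def __init__(self, feat_dim=64, n_motions=3, n_classes=25):
        super().__init__()
        # z_theta: toy encoder (an assumption; the paper uses a CNN).
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )
        # One learned linear map M_g per discrete ego-motion cluster g.
        self.maps = nn.ModuleList(
            nn.Linear(feat_dim, feat_dim, bias=False) for _ in range(n_motions)
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        return self.encoder(x)

def joint_loss(model, x_i, x_j, g, x_lab, y_lab, lam=0.1):
    """Equivariance term on mined pairs + softmax term on labeled data."""
    z_i, z_j = model(x_i), model(x_j)
    # M_g should map the pre-motion feature onto the post-motion feature.
    pred = torch.stack([model.maps[int(gi)](zi) for gi, zi in zip(g, z_i)])
    l_equiv = ((pred - z_j) ** 2).sum(dim=1).mean()
    # Supervised softmax loss; encoder and maps are trained jointly
    # through the shared feature space z_theta.
    l_sup = F.cross_entropy(model.classifier(model(x_lab)), y_lab)
    return l_sup + lam * l_equiv
```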

  15. Experiments • Validation using 3 public datasets: NORB, KITTI, SUN • Comparison with different methods: CLSNET, TEMPORAL, DRLIM

  16. Results: Recognition • Learn from unlabeled car video (KITTI) [Geiger et al., IJRR ’13] • Exploit the features for static scene classification (SUN, 397 classes) [Xiao et al., CVPR ’10]

  17. Results: Recognition • Do ego-motion equivariant features improve recognition? [bar chart: per-class recognition accuracy (%) with 6 labeled training examples per class, for KITTI → SUN (397 classes), KITTI → KITTI, and NORB → NORB; bar values 1.58, 1.21, 1.02, 0.70, 0.25, including invariance baselines] • Up to 30% accuracy increase over the state of the art! (* Hadsell et al., “Dimensionality Reduction by Learning an Invariant Mapping”, CVPR ’06; ** Mobahi et al., “Deep Learning from Temporal Coherence in Video”, ICML ’09)

  18. Results: Active recognition • Leverage the proposed equivariant embedding to select the next best view for object recognition (cup/bowl/pan?). [bar chart: accuracy (%) on NORB data, 0–50 scale; example classes include cup and frying pan] [Slide credit: Dinesh Jayaraman]
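One way such an embedding could drive view selection, offered as a speculative sketch only: use each learned map M_g to predict the post-motion feature, then pick the motion whose predicted view leaves the classifier least uncertain. The entropy criterion and the helper name are assumptions, not necessarily the method presented on the slide:

```python
# Speculative sketch: pick the next ego-motion by predicting the
# post-motion feature with each equivariance map M_g and choosing the
# motion that minimizes classifier entropy. The criterion is an
# assumption for illustration.
import torch

def next_best_view(model, x, candidate_motions):
    z = model(x)
    best_g, best_h = None, float("inf")
    for g in candidate_motions:
        z_pred = model.maps[g](z)  # predicted feature after motion g
        probs = torch.softmax(model.classifier(z_pred), dim=-1)
        h = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
        if h.item() < best_h:
            best_g, best_h = g, h.item()
    return best_g
```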

  19. Conclusion and Future Work • The paper presented a new embodied visual feature learning paradigm. • Ego-motion equivariance boosts performance across multiple challenging recognition tasks.

  20. Questions • Why train on KITTI rather than some other domain? • Why does incorporating DRLIM improve EQUIV? Are there temporal coherence properties still left to be learned? • Is it meaningful to compare EQUIV or EQUIV + DRLIM with the other cases with respect to equivariance error?
