Deep Affordance-Grounded Sensorimotor Object Recognition


  1. Deep Affordance-Grounded Sensorimotor Object Recognition ● Authors: Spyridon Thermos, Georgios Th. Papadopoulos, Petros Daras, Gerasimos Potamianos ● Presented by: Thomas Crosley, UT CS 381V, Autumn 2017

  2. Problem ● Integrate visual appearance and visual affordance information ● Object + affordance classification ● Example: “Hit Using Hammer”

  3. Affordances: “the types of actions that humans typically perform when interacting with an object.” ● Examples: Sit, Throw, Workout ● Videos: https://www.youtube.com/watch?v=V4XW74W9t4o, https://www.youtube.com/watch?v=7Qxu5cvW-ds, https://www.youtube.com/watch?v=1xS864zYIo8

  4. Related Work ● Simpler Methods ○ Factorial Conditional Random Fields and Binary SVMs [1] ○ Gaussian Processes [2] ○ SVMs + Clustering [3] ● Smaller Data ○ Few objects [1, 2, 3] ○ Small number of affordances [1, 2, 3] ○ Ex: 6 objects and 3 affordances [1]

  5. RGB-D Sensorimotor Dataset

  6. RGB-D Sensorimotor Dataset http://sor3d.vcl.iti.gr/wp-content/uploads/2017/03/sor3d.mp4?_=1

  7. RGB-D Sensorimotor Dataset

  8. RGB-D Sensorimotor Dataset Original Input

  9. RGB-D Sensorimotor Dataset Input Processing

  10. RGB-D Sensorimotor Dataset Data Extraction

  11. RGB-D Sensorimotor Dataset ● 14 Object Types ● 13 Affordances ● 54 Interactions ● 105 subjects ● 4 to 8 seconds ● 20,830 instances

  12. Architectures ● Generalized Template-Matching (GTM) ● Model spatial correlations ● Appearance CNN for object detection

  13. Architectures ● Generalized Spatio-Temporal (GST) ● Encode time-evolving procedures ● CNN+LSTM for affordance modeling

  14. Long Short Term Memory Networks (LSTMs) LSTMs: recurrent architecture capable of learning long-term dependencies Image Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

  15. LSTMs Core Idea: cell state updated and then passed on at each time step Image Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

  16. LSTMs “Forget Gate” “Remember Gate” Image Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

  17. LSTMs Image Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
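The gate mechanics named on the LSTM slides can be sketched as a single time step. This is a minimal pure-Python illustration of the standard LSTM formulation (as in the colah post cited above), not the network used in the paper. The names `lstm_step`, `zero_params`, and the `params` layout are assumptions made for this example, and the slides' "Remember Gate" is taken here to mean the usual input gate.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, v, b):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def zero_params(hidden, inputs):
    """Hypothetical helper: all-zero weights and biases for the four gates."""
    width = hidden + inputs
    return {
        name: ([[0.0] * width for _ in range(hidden)], [0.0] * hidden)
        for name in ("f", "i", "g", "o")
    }


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step.

    params maps a gate name to (W, b), where W acts on [h_prev, x].
    Returns the new hidden state h and the new cell state c.
    """
    z = h_prev + x  # list concatenation of previous hidden state and input

    def gate(name, activation):
        W, b = params[name]
        return [activation(a) for a in affine(W, z, b)]

    f = gate("f", sigmoid)    # forget gate: how much of c_prev to keep
    i = gate("i", sigmoid)    # input ("remember") gate: how much new content to write
    g = gate("g", math.tanh)  # candidate cell values
    o = gate("o", sigmoid)    # output gate: how much of the cell to expose

    # Cell state is updated, then passed on to the next time step.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

In a CNN+LSTM setup, `x` would be the per-frame feature vector produced by the CNN, and `lstm_step` would be applied once per frame while carrying `h` and `c` forward.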

  18. Fusion ● Given multiple sources of information ● At what point do we combine their features? Image Source: http://cs.stanford.edu/people/karpathy/deepvideo/

  19. Fusion ● GST Architecture ● Combines ○ Appearance ○ Affordance ● (a) Late Fusion ● (b) Slow fusion

  20. Architecture ● Late Fusion at FC ● Late Fusion at conv ● Slow Fusion ● Multi-Level Fusion
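As a rough illustration of late fusion at the fully connected level, the sketch below concatenates one feature vector per stream and applies a single linear classifier with softmax. It is an assumption-laden toy, not the authors' architecture: the stream networks are not implemented, the input vectors stand in for FC-level outputs of the appearance and affordance streams, and `late_fusion_classify`, `W`, and `b` are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(W, v, b):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def late_fusion_classify(appearance_feat, affordance_feat, W, b):
    """Fuse the two streams only after each has produced its own features.

    appearance_feat, affordance_feat: per-stream feature vectors (lists).
    W, b: classifier weights over the concatenated features (assumed given).
    Returns class probabilities.
    """
    fused = appearance_feat + affordance_feat  # concatenate at the FC level
    return softmax(linear(W, fused, b))
```

Slow fusion differs in where the merge happens: the streams are combined earlier, and the combined features continue through further shared layers rather than going straight to the classifier.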

  21. Results ● Single Stream ● (Best) Template Matching ● (Best) Spatio-Temporal

  22. Open Problems ● Authors’ Thoughts ○ NN-Autoencoders for human-object interactions ○ “In-the-wild” object-affordance detection ● Others ○ Affordance identification for control tasks ○ Better temporal sampling schemes
