COMP 150: Probabilistic Robotics for Human-Robot Interaction
Instructor: Jivko Sinapov (www.cs.tufts.edu/~jsinapov)
Today: Perception beyond Vision
Announcements
Project Deadlines
- Project Presentations: Tuesday, May 5th, 3:30-6:30 pm
- Final Report + Deliverables: May 10
- Deliverables:
– Presentation slides + videos
– Final Report (PDF)
– Source code (link to GitHub repositories + README)
Robot sensors:
- Microphones in the head
- Torque sensors in the joints
- ZCam (RGB+D)
- Logitech Webcam
- 3-axis accelerometer
Sinapov, J., Schenck, C., Staley, K., Sukhoy, V., and Stoytchev, A. (2014). Grounding Semantic Categories in Behavioral Interactions: Experiments with 100 Objects. Robotics and Autonomous Systems, Vol. 62, No. 5, pp. 632-645, May 2014.
100 objects from 20 categories
Exploratory Behaviors
grasp, lift, hold, shake, drop, tap, poke, push, press
Coupling Action and Perception
[Figure: sequence of video frames over time — action: poke; perception: optical flow]
Sensorimotor Contexts
Behaviors: look, grasp, lift, hold, shake, drop, tap, poke, push, press
Modalities: audio (DFT), haptics (joint torques), color, optical flow, SURF, proprioception (finger pos.)
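Pairing each exploratory behavior with each sensory modality yields a grid of sensorimotor contexts (e.g., poke-audio, lift-proprioception). A minimal sketch of that cross product, using the behavior and modality names from the slide (note that not every pairing is meaningful in practice — e.g., the passive look behavior produces no audio):

```python
behaviors = ["look", "grasp", "lift", "hold", "shake",
             "drop", "tap", "poke", "push", "press"]
modalities = ["audio", "haptics", "color", "optical_flow",
              "surf", "proprioception"]

# Each (behavior, modality) pair is one sensorimotor context,
# with its own recognition model trained on data from that context.
contexts = [(b, m) for b in behaviors for m in modalities]
```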
Overview
Category Recognition Model
Interaction with Object → Sensorimotor Feature Extraction → Category Estimates
Context-specific Category Recognition
Observation from the poke-audio context → M_poke-audio (recognition model for the poke-audio context) → distribution over category labels
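Each context-specific model maps an observation (a feature vector from that behavior-modality pair) to a probability distribution over the 20 category labels. A minimal NumPy sketch of this idea, with a randomly initialized linear scorer plus softmax standing in for the trained SVM, and with the feature dimensionality (`N_FEATURES`) chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

N_CATEGORIES = 20   # the dataset has 100 objects from 20 categories
N_FEATURES = 64     # illustrative, e.g., DFT bins for the audio modality


class ContextModel:
    """Recognition model M_{behavior-modality}: maps an observation from
    one sensorimotor context to a distribution over category labels."""

    def __init__(self, n_features, n_categories):
        # Random weights stand in for a model trained on labeled data.
        self.W = rng.normal(size=(n_categories, n_features))

    def predict_proba(self, x):
        scores = self.W @ x
        scores = scores - scores.max()   # numerical stability
        p = np.exp(scores)
        return p / p.sum()


# e.g., the poke-audio context
m_poke_audio = ContextModel(N_FEATURES, N_CATEGORIES)
x = rng.normal(size=N_FEATURES)          # one observation from a poke
dist = m_poke_audio.predict_proba(x)     # distribution over 20 labels
```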
Recognition Rates with SVM
Behavior   Audio   Proprioception   Color   Optical Flow   SURF    All
look         –          –           58.8        –          58.9   67.7
grasp      45.7       38.7           –         12.2        57.1   65.2
lift       48.1       63.7           –          5.0        65.9   79.0
hold       30.2       43.9           –          5.0        58.1   67.0
shake      49.3       57.7           –         32.8        75.6   76.8
drop       47.9       34.9           –         17.2        57.9   71.0
tap        63.3       50.7           –         26.0        77.3   82.4
push       72.8       69.6           –         26.4        76.8   88.8
poke       65.9       63.9           –         17.8        74.7   85.4
press      62.7       69.7           –         32.4        69.7   77.4

(– indicates a context not available for that behavior)
Combining Model Outputs
M_look-color, M_tap-audio, M_lift-SURF, M_press-prop., …
Weighted Combination
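The per-context category distributions can be fused by a weighted combination. A minimal sketch, under the assumption (consistent with the slides, but the exact weighting scheme is not specified here) that each context's weight reflects its reliability, e.g., its recognition accuracy on held-out data; the category counts and values below are hypothetical:

```python
import numpy as np


def combine(distributions, weights):
    """Weighted combination of per-context category distributions.
    Each row of `distributions` is one context's output distribution;
    `weights` encode how reliable each context is. Returns a single
    normalized distribution over category labels."""
    distributions = np.asarray(distributions, dtype=float)
    w = np.asarray(weights, dtype=float)
    combined = w @ distributions        # weighted sum over contexts
    return combined / combined.sum()    # renormalize


# Three hypothetical contexts voting over 4 categories:
d_tap_audio  = [0.70, 0.10, 0.10, 0.10]
d_push_prop  = [0.40, 0.40, 0.10, 0.10]
d_look_color = [0.25, 0.25, 0.25, 0.25]   # uninformative context

# Weights, e.g., proportional to each context's accuracy:
weights = [0.82, 0.70, 0.59]

p = combine([d_tap_audio, d_push_prop, d_look_color], weights)
# The combined estimate favors the category the reliable contexts agree on.
```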
Combining Multiple Behaviors and Modalities
Deep Models for Non-Visual Perception
Tatiya, G., and Sinapov, J. (2019). Deep Multi-Sensory Object Category Recognition Using Interactive Behavioral Exploration. In Proceedings of the 2019 IEEE International Conference on Robotics and Automation (ICRA), Montreal, Canada, May 20-24, 2019.
Haptics
Sensorimotor Word Embeddings
Sinapov, J., Schenck, C., and Stoytchev, A. (2014). Learning Relational Object Categories Using Behavioral Exploration and Multimodal Perception. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA).
Sensorimotor Word Embeddings
Thomason, J., Sinapov, J., Stone, P., and Mooney, R. (2018). Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI).
Integration into Service Robotics Platform
Robot, fetch me the green empty bottle
Thomason, J., Padmakumar, A., Sinapov, J., Walker, N., Jiang, Y., Yedidsion, H., Hart, J., Stone, P., and Mooney, R.J. (2020). Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog. Journal of Artificial Intelligence Research, 67 (2020).
How do we extend this approach beyond individual words applied to individual objects?
Language Related to Space and Locations
Example: Offices
Labs
Kitchen
Lobby
Adjectives: clean vs messy
Crowded vs Empty
Computer Vision Solution: Scene Recognition
[https://people.csail.mit.edu/bzhou/image/cover_places2.png]
Breakout Activity
- Brainstorm: how can multimodal perception be used to ground language related to locations and places?
- What are some modalities / sources of information beyond visual input that may be useful?
- What are some nouns / adjectives related to locations and places that may be difficult to ground using vision alone?
- Take notes! After 15 minutes we'll reconvene and share.