COMP 150: Probabilistic Robotics for Human-Robot Interaction


SLIDE 1

COMP 150: Probabilistic Robotics for Human-Robot Interaction

Instructor: Jivko Sinapov www.cs.tufts.edu/~jsinapov

SLIDE 2

Today: Perception beyond Vision

SLIDE 3

Announcements

SLIDE 4

Project Deadlines

  • Project Presentations: Tuesday, May 5th, 3:30-6:30 pm

  • Final Report + Deliverables: May 10
  • Deliverables:
      – Presentation slides + videos
      – Final Report (PDF)
      – Source code (link to GitHub repositories + README)

SLIDE 5

[Figure: the robot's sensor suite: microphones in the head, torque sensors in the joints, ZCam (RGB+D), Logitech webcam, 3-axis accelerometer.]

Sinapov, J., Schenck, C., Staley, K., Sukhoy, V., and Stoytchev, A. (2014). Grounding Semantic Categories in Behavioral Interactions: Experiments with 100 Objects. Robotics and Autonomous Systems, 62(5), 632-645.

SLIDE 6

100 objects from 20 categories

SLIDE 7

Exploratory Behaviors

grasp, lift, hold, shake, drop, tap, poke, push, press

SLIDE 8

SLIDE 9

Coupling Action and Perception

[Figure: a time series of camera frames recorded while the robot performs a behavior. Action: poke; perception: optical flow.]
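The perceptual signal paired with the poke here is dense optical flow. A minimal sketch of how such a signal could be extracted, assuming OpenCV and a list of grayscale frames captured during the behavior; the synthetic frames and the histogram pooling are illustrative stand-ins, not the course's exact pipeline:

    import cv2
    import numpy as np

    def optical_flow_features(frames):
        """Summarize dense optical flow across consecutive frame pairs."""
        feats = []
        for prev, nxt in zip(frames, frames[1:]):
            # Farneback dense flow: one (dx, dy) vector per pixel.
            # Positional args: pyr_scale, levels, winsize, iterations,
            # poly_n, poly_sigma, flags.
            flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
            # Pool each frame pair into an 8-bin histogram of flow
            # directions, weighted by flow magnitude.
            hist, _ = np.histogram(ang, bins=8, range=(0, 2 * np.pi), weights=mag)
            feats.append(hist / (hist.sum() + 1e-9))
        return np.asarray(feats)  # (num_frame_pairs, 8) time series

    # Demo with synthetic frames standing in for camera images during a poke.
    frames = [np.random.default_rng(i).integers(0, 255, (120, 160), dtype=np.uint8)
              for i in range(5)]
    print(optical_flow_features(frames).shape)  # (4, 8)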

slide-10
SLIDE 10

Sensorimotor Contexts

Behaviors: look, grasp, lift, hold, shake, drop, tap, poke, push, press. Modalities: audio (DFT), haptics (joint torques), color, optical flow, SURF, proprioception (finger pos.). Each behavior-modality pairing is one sensorimotor context.
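As a quick illustration of how the contexts enumerate, treat each valid behavior-modality pair as one context. Which modalities pair with which behaviors below is an assumption for illustration (e.g., that the static look behavior yields only visual features):

    # Illustrative sketch: a sensorimotor context is a (behavior, modality) pair.
    # The behavior -> modality assignment is assumed, not taken from the slide.
    interactive = ["grasp", "lift", "hold", "shake", "drop",
                   "tap", "poke", "push", "press"]
    modalities = {"look": ["color", "SURF"]}
    for b in interactive:
        modalities[b] = ["audio (DFT)", "haptics (joint torques)",
                         "optical flow", "proprioception (finger pos.)"]

    contexts = [(b, m) for b, ms in modalities.items() for m in ms]
    print(len(contexts), "contexts, e.g.,", contexts[:3])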

slide-11
SLIDE 11

Overview

[Pipeline: Interaction with Object → Sensorimotor Feature Extraction → Category Recognition Model → Category Estimates]

slide-12
SLIDE 12

Context-specific Category Recognition

An observation from the poke-audio context is fed to M_poke-audio, the recognition model for that context, which outputs a distribution over category labels.
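A minimal sketch of one such context-specific model, assuming scikit-learn; an SVM with probability outputs is used since the next slide reports SVM results, and the synthetic data stands in for real poke-audio descriptors:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n_categories, n_features = 20, 64                  # e.g., 64 DFT bins per poke
    X_train = rng.normal(size=(200, n_features))       # placeholder poke-audio features
    y_train = rng.integers(0, n_categories, size=200)  # category label per trial

    # M_poke-audio: maps a poke-audio observation to Pr(category | observation).
    M_poke_audio = SVC(probability=True).fit(X_train, y_train)

    x_new = rng.normal(size=(1, n_features))           # observation from a new poke
    dist = M_poke_audio.predict_proba(x_new)[0]        # distribution over 20 labels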

slide-13
SLIDE 13

Recognition Rates with SVM

Recognition accuracy (%) per behavior and sensory modality; "–" means the modality is not available for that behavior:

Behavior   Audio   Proprioception   Color   Optical Flow   SURF    All
look         –           –          58.8        –          58.9   67.7
grasp      45.7        38.7          –         12.2        57.1   65.2
lift       48.1        63.7          –          5.0        65.9   79.0
hold       30.2        43.9          –          5.0        58.1   67.0
shake      49.3        57.7          –         32.8        75.6   76.8
drop       47.9        34.9          –         17.2        57.9   71.0
tap        63.3        50.7          –         26.0        77.3   82.4
push       72.8        69.6          –         26.4        76.8   88.8
poke       65.9        63.9          –         17.8        74.7   85.4
press      62.7        69.7          –         32.4        69.7   77.4

SLIDE 14

Combining Model Outputs

[Diagram: the outputs of many context-specific models (M_look-color, M_tap-audio, M_lift-SURF, M_press-prop., ...) are merged via a weighted combination into a single distribution over categories.]
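In code, this amounts to a reliability-weighted average of the per-context distributions: the final estimate is Pr(c) ∝ Σ_ctx w_ctx · Pr_ctx(c). A sketch, assuming each weight reflects how trustworthy that context's model is (e.g., its cross-validated accuracy; the paper's exact weighting scheme may differ):

    import numpy as np

    def combine(distributions, weights):
        """Weighted combination of per-context category distributions.

        distributions: (n_contexts, n_categories), each row sums to 1
        weights: (n_contexts,), e.g., each context's estimated reliability
        """
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()                           # normalize the weights
        combined = w @ np.asarray(distributions)  # sum_ctx w_ctx * Pr_ctx(c)
        return combined / combined.sum()

    # Three contexts voting over four categories.
    dists = [[0.7, 0.1, 0.1, 0.1],
             [0.4, 0.4, 0.1, 0.1],
             [0.2, 0.2, 0.3, 0.3]]
    print(combine(dists, weights=[0.9, 0.6, 0.3]))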

SLIDE 15

Combining Multiple Behaviors and Modalities

SLIDE 16

Deep Models for Non-Visual Perception

Tatiya, G., and Sinapov, J. (2019). Deep Multi-Sensory Object Category Recognition Using Interactive Behavioral Exploration. In Proceedings of the 2019 IEEE International Conference on Robotics and Automation (ICRA), Montreal, Canada, May 20-24, 2019.

SLIDE 17

Haptics

SLIDE 18

Sensorimotor Word Embeddings

Sinapov, J., Schenck, C., and Stoytchev, A. (2014). Learning Relational Object Categories Using Behavioral Exploration and Multimodal Perception. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA).

SLIDE 19

Sensorimotor Word Embeddings

Thomason, J., Sinapov, J., Stone, P., and Mooney, R. (2018). Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI).

SLIDE 20

Integration into Service Robotics Platform

"Robot, fetch me the green empty bottle."

SLIDE 21

Thomason, J., Padmakumar, A., Sinapov, J., Walker, N., Jiang, Y., Yedidsion, H., Hart, J., Stone, P., and Mooney, R.J. (2020). Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog. Journal of Artificial Intelligence Research, 67.

SLIDE 22

How do we extend this approach beyond individual words applied to individual objects?

SLIDE 23

Language Related to Space and Locations

SLIDE 24

Example: Offices

SLIDE 25

Labs

SLIDE 26

Kitchen

SLIDE 27

Lobby

SLIDE 28

Adjectives: clean vs messy

SLIDE 29

Crowded vs Empty

SLIDE 30

Computer Vision Solution: Scene Recognition

[https://people.csail.mit.edu/bzhou/image/cover_places2.png]
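A sketch of this approach: run the image through a pretrained CNN classifier. ImageNet weights serve as a stand-in below since torchvision does not ship scene-classification weights; in practice you would load a checkpoint trained on a scene dataset such as Places365, which the Places2 project pictured above distributes. The image filename is hypothetical:

    import torch
    from PIL import Image
    from torchvision import models, transforms

    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    model.eval()

    # Standard ImageNet-style preprocessing.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    img = preprocess(Image.open("office.jpg").convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(img), dim=1)[0]  # distribution over classes
    print(probs.topk(5))  # five most likely class indices and scores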

SLIDE 31

Breakout Activity

  • Brainstorm: how can multimodal perception be used to ground language related to locations and places?
  • What are some modalities / sources of information beyond visual input that may be useful?
  • What are some nouns / adjectives related to locations and places that may be difficult to ground using vision alone?
  • Take notes! After 15 minutes we’ll reconvene and share what we found.

SLIDE 32

Student-led Paper Presentation

SLIDE 33