COMP 150: Probabilistic Robotics for Human-Robot Interaction


SLIDE 1

COMP 150: Probabilistic Robotics for Human-Robot Interaction

Instructor: Jivko Sinapov www.cs.tufts.edu/~jsinapov

SLIDE 2

Today: Perception beyond Vision

SLIDE 3

Announcements

SLIDE 4

Project Deadlines

  • Project Presentations: Tuesday, May 5th, 3:30-6:30 pm
  • Final Report + Deliverables: May 10
  • Deliverables:

– Presentation slides + videos
– Final Report (PDF)
– Source code (link to GitHub repositories + README)

SLIDE 5

Today: Perception beyond Vision

SLIDE 6

Language Acquisition

“How would you describe this object?”
“It is a small orange spray can.”
“My model of the word ‘orange’ has improved!”

SLIDE 7

Current Solution: connect the symbol with visual input

[Sridharan et al., 2008] [Lai et al., 2011] [Rusu et al., 2009] [Collet et al., 2009]

SLIDE 8

Current Solution: connect the symbol with visual input

[Redmon et al., 2016]

SLIDE 9

Modality Exclusivity Norms for common English nouns and adjectives

SLIDE 10

“Robot, I am thirsty, fetch me the yellow juice carton”

SLIDE 11

Solution: Lift the Object

SLIDE 12

“Fetch me the pill bottle”

SLIDE 13

Solution: Shake the Object

SLIDE 14

Solution: Shake the Object

Exploratory behaviors give us information about objects that vision cannot!

SLIDE 15

[Power, 2000] [Lederman and Klatzky, 1987]

SLIDE 16

Object Exploration in Infancy

SLIDE 17

The “5” Senses

SLIDE 18

The “5” Senses

[http://edublog.cmich.edu/meado1bl/files/2013/03/Five-Senses2.jpg]

SLIDE 19
SLIDE 20
SLIDE 21

Why sound for robotics?

SLIDE 22
SLIDE 23

What just happened?

SLIDE 24

What just happened?

What actually happened: the robot dropped a soda can.

SLIDE 25

Why Natural Sound is Important

“…natural sound is as essential as visual information because sound tells us about things that we can't see, and it does so while our eyes are occupied elsewhere.”

“Sounds are generated when materials interact, and the sounds tell us whether they are hitting, sliding, breaking, tearing, crumbling, or bouncing.”

“Moreover, sounds differ according to the characteristics of the objects, according to their size, solidity, mass, tension, and material.”

Don Norman, “The Design of Everyday Things”, p. 103

SLIDE 26

Why Natural Sound is Important

Sound Producing Event [Gaver, 1993]

SLIDE 27

What is a Sound Wave?

SLIDE 28

What is a Sound Wave?

…from a computer's point of view, raw audio is a sequence of roughly 44,100 floating-point numbers (samples) arriving every second
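To make this concrete, here is a minimal sketch of what raw audio looks like to a program: just an array of samples arriving at about 44,100 per second. It assumes SciPy and a hypothetical recording named tap.wav; neither is from the slides.

```python
import numpy as np
from scipy.io import wavfile

rate, samples = wavfile.read("tap.wav")   # rate is the sampling rate, e.g. 44100 Hz
samples = samples.astype(np.float32)      # convert integer samples to floats
print(rate, samples.shape)                # ~44,100 samples per second (per channel, if stereo)
duration = samples.shape[0] / rate        # length of the recording in seconds
```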

SLIDE 29

Sine Curve

[http://clem.mscd.edu/~talmanl/HTML/SineCurve.html]

SLIDE 30

Amplitude (vertical stretch)

[http://www.sparknotes.com/math/trigonometry/graphs/section4.rhtml]

3 sin(x)

SLIDE 31

Frequency (horizontal stretch)

[http://www.sparknotes.com/math/trigonometry/graphs/section4.rhtml]

SLIDE 32

Sinusoidal waves of various frequencies

High frequency vs. low frequency

[http://en.wikipedia.org/wiki/Frequency]
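As an illustration (a minimal NumPy sketch, not taken from the slides), amplitude and frequency are just two parameters of the same sinusoid A·sin(2πft):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 44100, endpoint=False)   # one second sampled at 44.1 kHz

def sinusoid(A, f, t):
    """Amplitude A stretches the wave vertically; frequency f stretches it horizontally."""
    return A * np.sin(2 * np.pi * f * t)

low  = sinusoid(A=1.0, f=2.0,  t=t)    # low-frequency wave: 2 cycles per second
high = sinusoid(A=1.0, f=20.0, t=t)    # high-frequency wave: 20 cycles per second
loud = sinusoid(A=3.0, f=2.0,  t=t)    # same frequency, 3x the amplitude (like 3 sin(x))
```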

SLIDE 33

Fourier Series

A Fourier series decomposes periodic functions or periodic signals into the sum of a (possibly infinite) set of simple oscillating functions, namely sines and cosines
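For instance, here is a hedged NumPy sketch of a truncated Fourier series: summing a few odd sine harmonics already gives a rough approximation of a square wave, and adding more terms sharpens it (the kind of approximation illustrated on the next slide).

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 1000)

def square_wave_partial_sum(x, n_terms):
    """Partial Fourier series of a square wave: (4/pi) * sum over odd k of sin(kx)/k."""
    total = np.zeros_like(x)
    for k in range(1, 2 * n_terms, 2):          # odd harmonics 1, 3, 5, ...
        total += (4 / np.pi) * np.sin(k * x) / k
    return total

rough  = square_wave_partial_sum(x, n_terms=3)   # crude approximation
better = square_wave_partial_sum(x, n_terms=50)  # much closer to a square wave
```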

SLIDE 34

Approximation

[http://en.wikipedia.org/wiki/Fourier_series]

SLIDE 35

Discrete Fourier Transform


SLIDE 36

Discrete Fourier Transform

[Figure: DFT spectrogram; vertical axis = frequency bin, horizontal axis = time.]
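A minimal sketch of how such a spectrogram can be computed: slide a short window over the sample stream, take the DFT of each window, and stack the magnitude vectors as columns. The window and hop sizes here are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

def spectrogram(samples, window=1024, hop=512):
    """Split the signal into short windows and take the DFT of each one.
    Returns a (frequency bin x time) matrix of magnitudes."""
    columns = []
    for start in range(0, len(samples) - window + 1, hop):
        frame = samples[start:start + window] * np.hanning(window)
        columns.append(np.abs(np.fft.rfft(frame)))   # magnitude of each frequency bin
    return np.array(columns).T                       # rows = frequency bins, cols = time steps
```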

SLIDE 37

Object Exploration in Infancy

SLIDE 38

Object Exploration by a Robot

Sinapov, J., Wiemer, M., and Stoytchev, A. (2009). Interactive Learning of the Acoustic Properties of Household Objects. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation (ICRA).

SLIDE 39

Objects

SLIDE 40

Behaviors

Grasp, Shake, Drop, Push, Tap

SLIDE 41

Audio Feature Extraction

Behavior execution → WAV file recorded → Discrete Fourier Transform

SLIDE 42
  • 1. Training a self-organizing map (SOM) using DFT column vectors:

Audio Feature Extraction

SLIDE 43
  • 2. Use SOM to convert DFT spectrogram to a sequence:

Audio Feature Extraction

SLIDE 44
  • 2. Use SOM to convert DFT spectrogram to a sequence:

Si: (3,2) ->

Audio Feature Extraction

SLIDE 45
  • 2. Use SOM to convert DFT spectrogram to a sequence:

Si: (3,2) -> (2,2) ->

Audio Feature Extraction

SLIDE 46
  • 2. Use SOM to convert DFT spectrogram to a sequence:

Si: (3,2) -> (2,2) -> (4,4) -> ….

Audio Feature Extraction

SLIDE 47
  • 1. Training a self-organizing map (SOM) using column vectors
  • 2. Discretization of the DFT of a sound using a trained SOM: the result is the sequence of activated SOM nodes over the duration of the sound

Audio Feature Extraction
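The sketch below illustrates both steps with a deliberately tiny NumPy SOM. It is not the SOM implementation or the parameter settings used in the paper, just the general idea: map each DFT column to its best-matching node to obtain a node sequence such as (3,2) -> (2,2) -> (4,4) -> ….

```python
import numpy as np

def train_som(columns, rows=6, cols=6, iters=5000, lr=0.5, sigma=1.5, seed=0):
    """Train a tiny SOM on DFT column vectors (columns: array of shape [n_steps, n_bins])."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, columns.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(iters):
        x = columns[rng.integers(len(columns))]
        dists = np.linalg.norm(weights - x, axis=2)            # distance from x to every node
        bmu = np.unravel_index(np.argmin(dists), dists.shape)  # best-matching unit
        frac = 1.0 - t / iters                                 # decay learning rate and radius
        h = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=2)
                   / (2 * (sigma * frac + 1e-3) ** 2))         # neighborhood around the BMU
        weights += (lr * frac) * h[..., None] * (x - weights)
    return weights

def to_sequence(columns, weights):
    """Replace each DFT column with the grid coordinates of its best-matching SOM node."""
    seq = []
    for x in columns:
        dists = np.linalg.norm(weights - x, axis=2)
        seq.append(np.unravel_index(np.argmin(dists), dists.shape))
    return seq   # e.g. [(3, 2), (2, 2), (4, 4), ...]

# usage: steps = spectrogram(samples).T   # one row per time step (see the spectrogram sketch above)
#        som   = train_som(steps)
#        seq   = to_sequence(steps, som)
```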

SLIDE 48

Detecting Acoustic Similarity

Auditory SOM sequences Xi and Yj → Global Sequence Alignment → similarity score

sim(Xi, Yj) = 0.89 → very similar

SLIDE 49

Detecting Acoustic Similarity

Auditory SOM sequences Xi and Yj → Global Sequence Alignment → similarity score

sim(Xi, Yj) = 0.23 → not similar
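Below is a hedged sketch of global sequence alignment over two SOM-node sequences, in the Needleman-Wunsch dynamic-programming style. The match, mismatch, and gap scores are illustrative assumptions, not the paper's exact scoring scheme.

```python
def global_alignment_similarity(X, Y, match=1.0, mismatch=0.0, gap=-0.1):
    """Needleman-Wunsch global alignment score between two SOM-node sequences,
    normalized by the longer sequence length so the result is roughly in [0, 1]."""
    n, m = len(X), len(Y)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if X[i - 1] == Y[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align X[i-1] with Y[j-1]
                              score[i - 1][j] + gap,     # gap in Y
                              score[i][j - 1] + gap)     # gap in X
    return score[n][m] / max(n, m)

# e.g. sim_xy = global_alignment_similarity(seq_x, seq_y)  # higher -> more similar sounds
```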

SLIDE 50

Problem Formulation

[Figure: a sound sequence Si is passed to an object recognition model and a behavior recognition model; each model outputs its predictions (e.g., behavior = drop).]

SLIDE 51

Acoustic Object Recognition

Auditory Data → Dimensionality Reduction using SOM → Discrete Auditory Sequence → Auditory Recognition Model → Object Probability Estimates

SLIDE 52

Recognition Model

  • k-NN: memory-based learning algorithm

For a test point with k = 3: 2 of the nearest neighbors are red and 1 is blue.

Therefore, Pr(red) = 2/3 ≈ 0.67 and Pr(blue) = 1/3 ≈ 0.33
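A small scikit-learn sketch of the same idea, using made-up 2-D feature vectors: with k = 3, the class probabilities are simply the fractions of red and blue points among the three nearest neighbors.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# toy data: 3 "red" points and 3 "blue" points (hypothetical feature vectors)
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [2.2, 0.0], [6.0, 6.0], [7.0, 7.0]])
y = np.array(["red", "red", "red", "blue", "blue", "blue"])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
test_point = np.array([[1.5, 0.0]])            # its 3 nearest neighbors: 2 red, 1 blue
print(knn.classes_)                            # ['blue' 'red']
print(knn.predict_proba(test_point))           # [[0.333 0.667]]: fractions of the 3 neighbors
```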

SLIDE 53

Recognition Model

  • SVM: discriminative learning algorithm
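For comparison, a minimal scikit-learn SVM sketch on the same toy data (illustrative only): instead of voting among stored neighbors, the SVM learns a decision boundary between the two classes.

```python
import numpy as np
from sklearn.svm import SVC

# same hypothetical two-class data as in the k-NN sketch above
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [2.2, 0.0], [6.0, 6.0], [7.0, 7.0]])
y = np.array(["red", "red", "red", "blue", "blue", "blue"])

svm = SVC(kernel="rbf").fit(X, y)            # learns a decision boundary between the classes
print(svm.predict([[1.5, 0.0]]))             # predicted class label for a test point
print(svm.decision_function([[1.5, 0.0]]))   # signed distance from the decision boundary
```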
SLIDE 54

Evaluation Results

Chance accuracy = 2.7 %

SLIDE 55

Evaluation Results

SLIDE 56

Recognition Video

SLIDE 57

Estimating Acoustic Object Similarity using Confusion Matrix

[Figure: example pairwise confusion matrices (rows = actual object, columns = predicted object); object pairs that the model frequently confuses are judged acoustically similar, while rarely confused pairs are judged different.]
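One simple way to turn a confusion matrix into a pairwise similarity measure (an illustrative assumption, not necessarily the exact formula used in the paper) is to row-normalize and symmetrize the confusion counts, so that objects the model often mistakes for one another come out as similar.

```python
import numpy as np

def acoustic_similarity(confusion):
    """Turn an object-recognition confusion matrix into a symmetric similarity matrix.
    confusion[i, j] = number of times object i was predicted as object j."""
    C = confusion.astype(float)
    C = C / C.sum(axis=1, keepdims=True)   # row-normalize: Pr(predicted = j | actual = i)
    S = 0.5 * (C + C.T)                    # symmetrize the confusion rates
    np.fill_diagonal(S, 1.0)               # an object is maximally similar to itself
    return S
```

Groupings like those on slide 59 (metal objects, wooden objects, paper objects, and so on) can then be obtained by clustering the resulting similarity matrix, for example with hierarchical clustering on 1 - S.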

SLIDE 58
SLIDE 59

[Figure: objects grouped by acoustic similarity: (mostly) metal objects, objects with contents inside, balls, paper objects, plastic objects, and (mostly) wooden objects.]
SLIDE 60

Recognizing the sounds of objects manipulated by other agents

SLIDE 61
SLIDE 62
SLIDE 63

Sinapov, J., Bergquist, T., Schenck, C., Ohiri, U., Griffith, S., and Stoytchev, A. (2011). Interactive Object Recognition Using Proprioceptive and Auditory Feedback. International Journal of Robotics Research, Vol. 30, No. 10, pp. 1250-1262, September 2011.

SLIDE 64

Objects

SLIDE 65

The Proprioceptive / Haptic Modality

[Figure: joint-torque readings from joints J1 through J7 plotted over time.]

SLIDE 66

Feature Extraction

  • Training a self-organizing map (SOM) using sampled joint torques
  • Training an SOM using sampled frequency distributions

SLIDE 67

  • Discretization of joint-torque records using a trained SOM
  • Discretization of the DFT of a sound using a trained SOM

Feature Extraction

SLIDE 68

Accuracy vs. Number of Behaviors

SLIDE 69

[Figure: recognition accuracy using 1 behavior vs. multiple behaviors.]

SLIDE 70

Sinapov, J., and Stoytchev, A. (2010). The Boosting Effect of Exploratory Behaviors. In Proceedings of the 24th National Conference on Artificial Intelligence (AAAI), 2010.

SLIDE 71
SLIDE 72

[Figure: the robot's sensors include microphones in the head, torque sensors in the joints, a ZCam (RGB+D), a Logitech webcam, and a 3-axis accelerometer.]

Sinapov, J., Schenck, C., Staley, K., Sukhoy, V., and Stoytchev, A. (2014). Grounding Semantic Categories in Behavioral Interactions: Experiments with 100 Objects. Robotics and Autonomous Systems, Vol. 62, No. 5, pp. 632-645, May 2014.

SLIDE 73

Exploratory Behaviors

grasp, lift, hold, shake, drop, tap, poke, push, press

SLIDE 74

Coupling Action and Perception

[Figure: a sequence of frames over time; Action: poke, Perception: optical flow.]

SLIDE 75

Sensorimotor Contexts

Behaviors: look, grasp, lift, hold, shake, drop, tap, poke, push, press
Sensory modalities: color, optical flow, SURF, audio (DFT), haptics (joint torques), proprioception (finger position)
Each behavior-modality pair forms a sensorimotor context.

SLIDE 76

Overview

Interaction with Object → Sensorimotor Feature Extraction → Category Recognition Model → Category Estimates

SLIDE 77

Context-specific Category Recognition

Observation from the poke-audio context → Mpoke-audio (the recognition model for the poke-audio context) → distribution over category labels

SLIDE 78

Combining Model Outputs

The context-specific models (Mlook-color, Mtap-audio, Mlift-SURF, Mpress-prop., …) each output a distribution over category labels, and their outputs are merged by a weighted combination.
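A minimal sketch of such a weighted combination follows. Using per-context reliability weights (e.g., each context's cross-validation accuracy) is an assumption here, not necessarily the paper's exact weighting scheme.

```python
import numpy as np

def combine_estimates(context_probs, context_weights):
    """Weighted combination of per-context category distributions.
    context_probs: dict mapping a context name (e.g. "tap-audio") to a probability
    vector over categories; context_weights: dict of non-negative reliability weights."""
    total = None
    for name, probs in context_probs.items():
        contribution = context_weights.get(name, 0.0) * np.asarray(probs, dtype=float)
        total = contribution if total is None else total + contribution
    return total / total.sum()   # renormalize to a valid distribution over categories

# e.g. (hypothetical numbers):
# combined = combine_estimates(
#     {"look-color": [0.2, 0.5, 0.3], "tap-audio": [0.1, 0.8, 0.1]},
#     {"look-color": 0.6, "tap-audio": 0.9})
```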

SLIDE 79

Deep Models for Non-Visual Perception

Tatiya, G., and Sinapov, J. (2019). Deep Multi-Sensory Object Category Recognition Using Interactive Behavioral Exploration. In Proceedings of the 2019 IEEE International Conference on Robotics and Automation (ICRA), Montreal, Canada, May 20-24, 2019.

SLIDE 80
Take-home Message

  • Behaviors allow robots not only to affect the world, but also to perceive it
  • Non-visual sensory feedback improves object classification and perception tasks that are typically solved using vision alone
  • A diverse sensorimotor repertoire is necessary for scaling up object recognition, categorization, and individuation to a large number of objects

SLIDE 81
SLIDE 82