CS 378: Autonomous Intelligent Robotics (Instructor: Jivko Sinapov)



slide-1
SLIDE 1

CS 378: Autonomous Intelligent Robotics

Instructor: Jivko Sinapov

http://www.cs.utexas.edu/~jsinapov/teaching/cs378/

slide-2
SLIDE 2

Audio Processing and Computational Perception of Natural Sound
slide-3
SLIDE 3

Why Sound?

slide-4
SLIDE 4

Why Sound?

What actually happened: The robot dropped a soda-can

slide-5
SLIDE 5

Why Natural Sound is Important

“…natural sound is as essential as visual information because sound tells us about things that we can't see, and it does so while our eyes are occupied elsewhere.”

“Sounds are generated when materials interact, and the sounds tell us whether they are hitting, sliding, breaking, tearing, crumbling, or bouncing.”

“Moreover, sounds differ according to the characteristics of the objects, according to their size, solidity, mass, tension, and material.”

Don Norman, “The Design of Everyday Things”, p.103

slide-6
SLIDE 6

Why Natural Sound is Important

Sound Producing Event [Gaver, 1993]

slide-7
SLIDE 7

Why should a robot use acoustic information?

  • Human environments are cluttered with objects that generate sounds
  • Help a robot perceive events and objects outside of field of view
  • Help a robot perceive material properties of objects, and form natural object categories
slide-8
SLIDE 8

What is Sound?

slide-9
SLIDE 9

What is Sound?

slide-10
SLIDE 10

What is Sound?

slide-11
SLIDE 11

What is Sound?

…from a computer's point of view, raw audio is a sequence of floating-point numbers, 44.1K of them arriving each second (a 44.1 kHz sampling rate)
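To make the "sequence of numbers" view concrete, here is a minimal sketch that synthesizes one second of a pure tone as a list of floats. The 440 Hz tone and the helper name `sine_samples` are illustrative assumptions, not part of the lecture.

```python
import math

SAMPLE_RATE = 44_100  # samples per second (the 44.1K figure from the slide)

def sine_samples(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return a list of float samples of a sine tone (hypothetical helper)."""
    n = int(round(duration_s * sample_rate))
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

# One second of a 440 Hz tone (the tone frequency is an arbitrary choice).
one_second = sine_samples(440.0, 1.0)
print(len(one_second))  # 44100
```

A real recording would arrive from a microphone or a WAV file instead, but it has the same shape: one number per sample.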

slide-12
SLIDE 12
slide-13
SLIDE 13

Sine Wave

[http://www.audiophilejournal.com/what-is-a-hz-or-hertz-in-audio/]

slide-14
SLIDE 14

Sine Curve

[http://clem.mscd.edu/~talmanl/HTML/SineCurve.html]

slide-15
SLIDE 15

Frequency

  • Measured in Hertz (Hz)
  • Named after Heinrich Hertz
  • 1 Hertz = 1 repetition per second
  • Typically denoted with the letter f
slide-16
SLIDE 16

Period

  • How long does one cycle take?
  • It is the reciprocal of the frequency
  • Measured in seconds
  • Typically denoted with the letter T
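The reciprocal relationship between period and frequency, T = 1/f, can be sketched in a few lines. The function names below are illustrative only.

```python
def period_from_frequency(f_hz):
    """Period T in seconds for a frequency f in Hertz: T = 1 / f."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / f_hz

def frequency_from_period(t_s):
    """Frequency f in Hertz for a period T in seconds: f = 1 / T."""
    if t_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / t_s

# A cycle that takes half a second repeats twice per second.
print(frequency_from_period(0.5))  # 2.0
```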
slide-17
SLIDE 17

Frequency vs Period Animation

[http://en.wikipedia.org/wiki/Frequency]

slide-18
SLIDE 18

Frequency vs Period

slide-19
SLIDE 19

Amplitude (vertical stretch)

[http://www.sparknotes.com/math/trigonometry/graphs/section4.rhtml]

3 sin(x)

slide-20
SLIDE 20

Frequency (horizontal stretch)

[http://www.sparknotes.com/math/trigonometry/graphs/section4.rhtml]

slide-21
SLIDE 21

What is the Period and the Amplitude?

[http://www.sparknotes.com/math/trigonometry/graphs/problems_3.html]

slide-22
SLIDE 22

What is the Period and the Amplitude?

[http://www.sparknotes.com/math/trigonometry/graphs/problems_3.html]

slide-23
SLIDE 23

Sines vs Cosines

[http://en.wikipedia.org/wiki/Sine_wave]

slide-24
SLIDE 24

Formula for the Sine Wave

slide-25
SLIDE 25

Formula for the Sine Wave

  • A, the amplitude, is the peak deviation of the function from its center position.
  • ω, the angular frequency, specifies how many oscillations occur in a unit time interval, in radians per second.
  • φ, the phase, specifies where in its cycle the oscillation begins at t = 0.
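The formula image did not survive extraction. The three parameters above match the standard form y(t) = A sin(ωt + φ), with ω = 2πf. The sketch below assumes that form, and the function names are illustrative.

```python
import math

def angular_frequency(f_hz):
    """Convert a frequency in Hertz to angular frequency in radians per second."""
    return 2 * math.pi * f_hz

def sine_wave(t, amplitude, omega, phase):
    """Evaluate y(t) = A * sin(omega * t + phi)."""
    return amplitude * math.sin(omega * t + phase)

omega = angular_frequency(2.0)  # a 2 Hz oscillation (arbitrary choice)

# With phase 0 the wave starts at its center position.
start_centered = sine_wave(0.0, 3.0, omega, 0.0)
# With phase pi/2 the wave starts at its peak, which equals the amplitude A = 3.
start_at_peak = sine_wave(0.0, 3.0, omega, math.pi / 2)
```

The amplitude 3 mirrors the earlier "3 sin(x)" example of a vertical stretch.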

slide-26
SLIDE 26

A function x(t) is periodic if we can find a T > 0 for which the following holds for all t: x(t + T) = x(t)
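The standard periodicity condition, x(t + T) = x(t) for every t, can be spot-checked numerically. The sketch below tests it only at a finite set of sample times, so it can refute a candidate period but cannot prove one. The helper name `is_periodic` and the 5 Hz test signal are assumptions for illustration.

```python
import math

def is_periodic(x, period, t_values, tol=1e-9):
    """Check x(t + period) == x(t), within tol, at the given sample times only."""
    return all(abs(x(t + period) - x(t)) <= tol for t in t_values)

f = 5.0  # Hz, arbitrary test frequency

def x(t):
    return math.sin(2 * math.pi * f * t)

sample_times = [i * 0.013 for i in range(100)]

full_period_ok = is_periodic(x, 1.0 / f, sample_times)  # T = 1/f repeats the signal
half_period_ok = is_periodic(x, 0.5 / f, sample_times)  # half a cycle flips the sign
```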

slide-27
SLIDE 27

Sinusoidal waves of various frequencies

High Frequency Low Frequency

[http://en.wikipedia.org/wiki/Frequency]

slide-28
SLIDE 28

Spectrum

[http://en.wikipedia.org/wiki/Spectrum]

slide-29
SLIDE 29

Light Spectrum

[http://en.wikipedia.org/wiki/Frequency]

slide-30
SLIDE 30

[http://en.wikipedia.org/wiki/Spectrum_allocation]

slide-31
SLIDE 31

Standing Wave

(shown in black, equal to the sum of the red and the blue waves traveling in opposite directions)

[http://en.wikipedia.org/wiki/Wavelength]

slide-32
SLIDE 32

Fourier Series

A Fourier series decomposes periodic functions or periodic signals into the sum of a (possibly infinite) set of simple oscillating functions, namely sines and cosines
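As a worked illustration, consider a square wave of period 2π that alternates between +1 and -1. Its Fourier series is the well-known sum (4/π)·[sin(x) + sin(3x)/3 + sin(5x)/5 + …]. The sketch below evaluates partial sums of that series. The choice of a square wave is an assumption here, since the slide's own figure did not survive extraction.

```python
import math

def square_wave_partial_sum(x, n_terms):
    """Sum the first n_terms odd harmonics of the square-wave Fourier series."""
    total = 0.0
    for i in range(n_terms):
        k = 2 * i + 1  # only odd harmonics contribute
        total += math.sin(k * x) / k
    return 4.0 / math.pi * total

# At x = pi/2 the square wave equals 1; more terms bring the sum closer to 1.
one_term = square_wave_partial_sum(math.pi / 2, 1)
many_terms = square_wave_partial_sum(math.pi / 2, 200)
```

With a single term the approximation is just a scaled sine. Adding harmonics flattens the top toward the target value.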

slide-33
SLIDE 33

Approximation

[http://en.wikipedia.org/wiki/Fourier_series]

slide-34
SLIDE 34

Approximation

slide-35
SLIDE 35

Discrete Fourier Transform


slide-36
SLIDE 36

Discrete Fourier Transform

slide-37
SLIDE 37

Discrete Fourier Transform

[Figure: spectrogram; axes: frequency bin vs. time]
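A minimal sketch of how a frequency-bin-versus-time picture can be computed: cut the signal into short frames and take the DFT magnitude of each one. This is a naive O(N²) DFT written with the standard library for clarity. Practical code would use an FFT routine and usually a window function. The frame size, hop, and test tone are arbitrary assumptions.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/N)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def magnitude_spectrum(samples):
    """Magnitudes of the non-redundant bins 0..N/2 for a real-valued signal."""
    n = len(samples)
    return [abs(c) for c in dft(samples)[: n // 2 + 1]]

def spectrogram(samples, frame_size, hop):
    """One magnitude spectrum (a column of frequency bins) per time frame."""
    starts = range(0, len(samples) - frame_size + 1, hop)
    return [magnitude_spectrum(samples[s:s + frame_size]) for s in starts]

sample_rate = 800  # Hz, kept small so the naive DFT stays fast
tone = [math.sin(2 * math.pi * 100 * i / sample_rate) for i in range(256)]
spec = spectrogram(tone, frame_size=64, hop=32)

# Locate the strongest bin in the first frame and convert it back to Hertz.
peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
peak_hz = peak_bin * sample_rate / 64
```

Each inner list corresponds to one column of the spectrogram, i.e. one moment in time.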

slide-38
SLIDE 38

Research Question

Can the DFT be used by a robot to perceive objects and their properties using sound?
slide-39
SLIDE 39

Research Question

Can the DFT be used by a robot to perceive objects and their properties using sound?

How should the robot associate a particular sound with an object?

slide-40
SLIDE 40

Object Exploration by a Robot

slide-41
SLIDE 41

Object Exploration by a Robot

slide-42
SLIDE 42

Objects

[Sinapov, Wiemer, and Stoytchev, ICRA 2009]

slide-43
SLIDE 43

Behaviors

Grasp, Shake, Drop, Push, Tap

slide-44
SLIDE 44

Recognition Video

slide-45
SLIDE 45

Audio Feature Extraction

Behavior Execution -> WAV file recorded -> Discrete Fourier Transform

slide-46
SLIDE 46
Audio Feature Extraction

  1. Training a self-organizing map (SOM) using DFT column vectors:

slide-47
SLIDE 47
Audio Feature Extraction

  2. Use SOM to convert DFT spectrogram to a sequence:

slide-48
SLIDE 48
Audio Feature Extraction

  2. Use SOM to convert DFT spectrogram to a sequence:

Si: (3,2) ->

slide-49
SLIDE 49
Audio Feature Extraction

  2. Use SOM to convert DFT spectrogram to a sequence:

Si: (3,2) -> (2,2) ->

slide-50
SLIDE 50
Audio Feature Extraction

  2. Use SOM to convert DFT spectrogram to a sequence:

Si: (3,2) -> (2,2) -> (4,4) -> ….

slide-51
SLIDE 51
Audio Feature Extraction

  1. Training a self-organizing map (SOM) using column vectors:
  2. Discretization of a DFT of a sound using a trained SOM

Si is the sequence of activated SOM nodes over the duration of the sound
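The two steps above can be sketched with a very small self-organizing map. This is a generic textbook-style SOM, not the configuration used in the cited work: the grid size, learning rate, decay schedule, and toy three-bin "spectrogram columns" are all assumptions. Node coordinates here are 0-based (row, col) pairs.

```python
import math
import random

def best_matching_unit(weights, vec):
    """Return the (row, col) of the node whose weight vector is closest to vec."""
    best, best_d = None, float("inf")
    for node, w in weights.items():
        d = sum((a - b) ** 2 for a, b in zip(w, vec))
        if d < best_d:
            best, best_d = node, d
    return best

def train_som(vectors, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Step 1: fit a rows x cols grid of weight vectors to the column vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = vectors[:]
        rng.shuffle(order)
        for vec in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # learning rate decays to 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighborhood shrinks
            br, bc = best_matching_unit(weights, vec)
            for (r, c), w in weights.items():
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_dist2 / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (vec[i] - w[i])
            step += 1
    return weights

def to_sequence(weights, spectrogram_columns):
    """Step 2: replace each spectrogram column by its activated SOM node."""
    return [best_matching_unit(weights, col) for col in spectrogram_columns]

# Toy data: two kinds of spectrum, energy in the first bin vs. the last bin.
low, high = [1.0, 0.1, 0.0], [0.0, 0.1, 1.0]
columns = [low, low, high, high, low]
som = train_som(columns, rows=3, cols=3)
sequence = to_sequence(som, columns)
```

The resulting `sequence` plays the role of Si: one grid coordinate per time step, so a sound of any spectral dimensionality becomes a string over a small discrete alphabet.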

slide-52
SLIDE 52

Problem Formulation

[Figure: a sound sequence Si is given to the Object Recognition Model and the Behavior Recognition Model, which output model predictions (e.g. drop)]

slide-53
SLIDE 53

Recognition Model

  • k-NN: memory-based learning algorithm

[Figure: a test point (?) surrounded by red and blue training points]

With k = 3: 2 red neighbors, 1 blue neighbor

Therefore, Pr(red) = 2/3 ≈ 0.67 and Pr(blue) = 1/3 ≈ 0.33
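The neighbor-counting idea can be sketched as follows. The 2-D points and labels are made-up toy data that reproduce the k = 3 situation described on the slide. Euclidean distance is an assumption, and any distance or similarity measure suited to the data could take its place.

```python
import math
from collections import Counter

def knn_probabilities(train, query, k):
    """Estimate class probabilities as the label fractions among the k nearest points.

    train is a list of (point, label) pairs; distance is Euclidean (an assumption).
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: n / k for label, n in counts.items()}

# Toy training set: the three points closest to the query are red, blue, red.
train = [
    ((1.0, 1.0), "red"),
    ((1.2, 0.8), "red"),
    ((0.9, 1.3), "blue"),
    ((5.0, 5.0), "blue"),
    ((5.2, 4.9), "blue"),
    ((4.8, 5.1), "red"),
]
probs = knn_probabilities(train, (1.0, 1.1), k=3)
```

Because k-NN is memory-based, "training" amounts to storing the labeled examples. All the work happens at query time.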

slide-54
SLIDE 54

Off-Line Evaluation

  • 10 trials performed with each of the 36 objects with each of the 5 behaviors
  • A total of 1800 interactions, about 12 hours
  • 10-fold cross-validation
  • Performance measure for object and behavior recognition:
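The evaluation protocol can be sketched as index bookkeeping: 36 objects × 5 behaviors × 10 trials gives 1800 interactions, which 10-fold cross-validation splits into ten train/test partitions. The performance-measure formula did not survive extraction, so the `accuracy` helper below assumes plain accuracy (fraction of correct predictions). The random shuffling is also an assumption.

```python
import random

def k_fold_indices(n_items, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels (assumed measure)."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

n_interactions = 36 * 5 * 10  # objects x behaviors x trials
splits = list(k_fold_indices(n_interactions, 10))
```

Each interaction is used for testing exactly once and for training in the other nine folds.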

slide-55
SLIDE 55

Evaluation Results

Chance accuracy = 1/36 ≈ 2.8 %

slide-56
SLIDE 56

Evaluation Results

slide-57
SLIDE 57

Estimating Acoustic Object Similarity using Confusion Matrix

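[Figure: confusion-matrix excerpts, Predicted → across and Actual down; entries in extraction order: 40, 4, 6, 42, 21, 6, 8, 35; panel labels: similar, similar, different, different]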
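One way to turn recognition errors into a similarity estimate is to count how often two objects are mistaken for each other. The sketch below builds a confusion matrix and derives a symmetric score from it. The scoring formula, the object names, and the toy predictions are all hypothetical, and the exact measure used in the cited work is not recoverable from the slide text.

```python
def confusion_matrix(actual, predicted, labels):
    """matrix[i][j] counts trials whose actual label is i and predicted label is j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def confusion_similarity(matrix, i, j):
    """Hypothetical score: share of i's and j's trials confused with each other."""
    total = sum(matrix[i]) + sum(matrix[j])
    return (matrix[i][j] + matrix[j][i]) / total if total else 0.0

labels = ["can", "cup", "ball"]  # made-up object names
actual = ["can"] * 4 + ["cup"] * 4 + ["ball"] * 2
predicted = ["can", "can", "cup", "cup",
             "cup", "cup", "cup", "can",
             "ball", "ball"]

matrix = confusion_matrix(actual, predicted, labels)
can_cup = confusion_similarity(matrix, 0, 1)   # often confused: acoustically similar
can_ball = confusion_similarity(matrix, 0, 2)  # never confused: acoustically different
```

Objects that the classifier mixes up frequently get a high score, which is the intuition behind reading acoustic similarity off the confusion matrix.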

slide-58
SLIDE 58

Full Confusion Matrix for all 36 objects:

[Figure: confusion matrix -> invert -> ISOMAP / Hierarchical Clustering]

slide-59
SLIDE 59
slide-60
SLIDE 60

[Figure: object clusters, labeled:]

  • (mostly) metal objects
  • Objects with contents inside
  • Balls
  • Paper Objects
  • Plastic Objects
  • (mostly) wooden objects
slide-61
SLIDE 61

Recognizing the sounds of objects manipulated by other agents

slide-62
SLIDE 62

Recognizing the sounds of objects manipulated by other agents

slide-63
SLIDE 63

Further Reading

  • Sinapov, J., Wiemer, M., and Stoytchev, A. (2008). Interactive Learning of the Acoustic Properties of Objects by a Robot. In Proceedings of the "Robot Manipulation: Intelligence in Human Environments" workshop held at the Robotics: Science and Systems Conference, 2008.
  • Sinapov, J., Wiemer, M., and Stoytchev, A. (2009). Interactive Learning of the Acoustic Properties of Household Objects. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation (ICRA).

slide-64
SLIDE 64

Discussion

  • What kind of sounds should our mobile robots pay attention to?
  • What would auditory perception allow them to do that they currently cannot?

slide-65
SLIDE 65

THE END