COMP 150: Developmental Robotics. Instructor: Jivko Sinapov



slide-1
SLIDE 1

COMP 150: Developmental Robotics

Instructor: Jivko Sinapov www.cs.tufts.edu/~jsinapov

slide-2
SLIDE 2

Audio Processing and Computational Perception of Natural Sound
slide-3
SLIDE 3

Project Deadlines

  • Project Presentations: Dec 5 and 7
  • Final Report + Deliverables: Dec 1
  • Deliverables:

– Presentation slides + videos
– Final Report (PDF)
– Source code (link to GitHub repositories)

slide-4
SLIDE 4

Undergraduate Research

  • Undergraduate research assistant positions available in my lab

  • 6-10 hours a week
  • Paid or for credit
  • Email me if interested
slide-5
SLIDE 5

Summer Internships in Robotics

  • Aurora Flight Sciences is hiring summer interns in Robotics, AI, ML
  • Email me if interested and I’ll forward you the email to which to respond

slide-6
SLIDE 6

Additional robotics internships

  • Rethink Robotics
  • Toyota Research Institute
  • ….
slide-7
SLIDE 7

Tufts Summer Scholar Program

  • Tufts offers summer scholarships and stipends for undergraduates to stay and engage in research
  • Google “tufts summer scholar” to find out more

  • Deadline to apply: March 2nd
slide-8
SLIDE 8

Why Sound?

slide-9
SLIDE 9
slide-10
SLIDE 10

Why Sound?

What actually happened: the robot dropped a soda can

slide-11
SLIDE 11

Why Natural Sound is Important

“…natural sound is as essential as visual information because sound tells us about things that we can't see, and it does so while our eyes are occupied elsewhere.”

“Sounds are generated when materials interact, and the sounds tell us whether they are hitting, sliding, breaking, tearing, crumbling, or bouncing.”

“Moreover, sounds differ according to the characteristics of the objects, according to their size, solidity, mass, tension, and material.”

Don Norman, “The Design of Everyday Things”, p.103

slide-12
SLIDE 12

Why Natural Sound is Important

Sound Producing Event [Gaver, 1993]

slide-13
SLIDE 13

Types of Listening

  • Musical listening:

– Pitch, timbre, tempo, masking, loudness

  • Everyday listening:

– Directly perceiving the event and its structural properties (e.g., a big-engine car driving up behind you)

slide-14
SLIDE 14

“The distinction between everyday and musical listening is between experiences, not sounds”

slide-15
SLIDE 15

What do we hear?

“… sound provides information about an interaction of materials at a location in an environment. We can hear an approaching automobile, its size, and its speed. We can hear where it is and how fast it is approaching. And we can hear the narrow, echoing walls of the alley it is driving along.”

slide-16
SLIDE 16

Why should a robot use acoustic information?

Human environments are cluttered with objects that generate sounds

Help a robot perceive events and objects outside of field of view

Help a robot perceive material properties of objects, and form natural object categories
slide-17
SLIDE 17

What is Sound?

slide-18
SLIDE 18

What is Sound?

slide-19
SLIDE 19

What is Sound?

slide-20
SLIDE 20

What is Sound?

…from a computer's point of view, raw audio is a sequence of floating-point numbers arriving at a rate of 44.1K (44,100) per second
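
As a minimal sketch of what "a sequence of numbers" means here, the snippet below synthesizes one second of a pure tone as a plain list of floats. The 44,100 samples-per-second rate is the one quoted on the slide; the 440 Hz tone and the names SAMPLE_RATE and synthesize_tone are illustrative assumptions, not part of the lecture.

```python
import math

SAMPLE_RATE = 44100  # samples per second, as quoted on the slide


def synthesize_tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return a pure sine tone as a list of floats in [-1.0, 1.0]."""
    n_samples = int(round(duration_s * sample_rate))
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


samples = synthesize_tone(440.0, 1.0)  # 440 Hz is an arbitrary example tone
print(len(samples))   # number of values that make up one second of audio
print(samples[:3])    # the first few raw values
```

A recorded WAV file looks the same to a program once decoded: one number per sample, with nothing but the sample rate tying the index to time.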

slide-21
SLIDE 21
slide-22
SLIDE 22

Sine Wave

[http://www.audiophilejournal.com/what-is-a-hz-or-hertz-in-audio/]

slide-23
SLIDE 23

Sine Curve

[http://clem.mscd.edu/~talmanl/HTML/SineCurve.html]

slide-24
SLIDE 24

Frequency

  • Measured in Hertz (Hz)
  • Named after Heinrich Hertz
  • 1 Hertz = 1 repetition per second
  • Typically denoted with the letter f
slide-25
SLIDE 25

Period

  • How long does one cycle take?
  • It is the reciprocal of the frequency
  • Measured in seconds
  • Typically denoted with the letter T
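
A small worked example of the reciprocal relationship between period and frequency. The 440 Hz tone is an arbitrary choice; the 44.1K rate is the one quoted earlier in these slides.

```latex
T = \frac{1}{f}, \qquad
f = 440\ \text{Hz} \;\Rightarrow\; T = \frac{1}{440}\ \text{s} \approx 2.27\ \text{ms},
\qquad
\frac{44100\ \text{samples/s}}{440\ \text{cycles/s}} \approx 100.2\ \text{samples per cycle}
```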
slide-26
SLIDE 26

Frequency vs Period Animation

[http://en.wikipedia.org/wiki/Frequency]

slide-27
SLIDE 27

Frequency vs Period

slide-28
SLIDE 28

Amplitude (vertical stretch)

[http://www.sparknotes.com/math/trigonometry/graphs/section4.rhtml]

3 sin(x)

slide-29
SLIDE 29

Frequency (horizontal stretch)

[http://www.sparknotes.com/math/trigonometry/graphs/section4.rhtml]

slide-30
SLIDE 30

What are the Period and the Amplitude?

[http://www.sparknotes.com/math/trigonometry/graphs/problems_3.html]

slide-31
SLIDE 31

What are the Period and the Amplitude?

[http://www.sparknotes.com/math/trigonometry/graphs/problems_3.html]

slide-32
SLIDE 32

Sines vs Cosines

[http://en.wikipedia.org/wiki/Sine_wave]

slide-33
SLIDE 33

Formula for the Sine Wave

slide-34
SLIDE 34

Formula for the Sine Wave

  • A, the amplitude, is the peak deviation of the function from its center position.
  • ω, the angular frequency, specifies how fast the oscillation advances, in radians per second (ω = 2πf).
  • φ, the phase, specifies where in its cycle the oscillation begins at t = 0.
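
The formula image itself did not survive in this transcript. The sketch below assumes the standard form y(t) = A · sin(ωt + φ), which matches the three symbols defined on the slide; the function and parameter names are illustrative.

```python
import math


def sine_wave(t, amplitude=1.0, omega=1.0, phase=0.0):
    """Evaluate y(t) = A * sin(omega * t + phase)."""
    return amplitude * math.sin(omega * t + phase)


def angular_frequency(freq_hz):
    """Convert an ordinary frequency f (Hz) to omega = 2*pi*f (rad/s)."""
    return 2.0 * math.pi * freq_hz


omega = angular_frequency(2.0)     # a 2 Hz oscillation
period = 2.0 * math.pi / omega     # T = 2*pi/omega = 1/f
# A quarter of a cycle after t = 0 the wave reaches its peak value A.
peak = sine_wave(period / 4.0, amplitude=3.0, omega=omega)
# A phase of pi/2 shifts the wave so that it already starts at its peak.
start = sine_wave(0.0, amplitude=3.0, omega=omega, phase=math.pi / 2.0)
```

With a phase of π/2 the sine behaves like a cosine, which is the relationship the "Sines vs Cosines" slide illustrates.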

slide-35
SLIDE 35

A function x(t) is periodic if we can find a T > 0 for which the following holds for all t: x(t + T) = x(t)

slide-36
SLIDE 36

Sinusoidal waves of various frequencies

High Frequency Low Frequency

[http://en.wikipedia.org/wiki/Frequency]

slide-37
SLIDE 37

Spectrum

[http://en.wikipedia.org/wiki/Spectrum]

slide-38
SLIDE 38

Light Spectrum

[http://en.wikipedia.org/wiki/Frequency]

slide-39
SLIDE 39

[http://en.wikipedia.org/wiki/Spectrum_allocation]

slide-40
SLIDE 40

Standing Wave

(shown in black, equal to the sum of the red and the blue waves traveling in opposite directions)

[http://en.wikipedia.org/wiki/Wavelength]

slide-41
SLIDE 41

Fourier Series

A Fourier series decomposes periodic functions or periodic signals into the sum of a (possibly infinite) set of simple oscillating functions, namely sines and cosines
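
To make the decomposition concrete, here is a minimal sketch that sums the first few terms of a well-known Fourier series, the one for a square wave. The choice of a square wave (period 2π, value +1 on the first half-cycle and -1 on the second) and the function name are assumptions made for illustration.

```python
import math


def square_wave_partial_sum(x, n_terms):
    """Sum the first n_terms odd harmonics of the square-wave Fourier series.

    The target is a square wave with period 2*pi that is +1 on (0, pi)
    and -1 on (pi, 2*pi); its series is (4/pi) * sum(sin(k*x)/k) over odd k.
    """
    total = 0.0
    for i in range(n_terms):
        k = 2 * i + 1
        total += math.sin(k * x) / k
    return 4.0 / math.pi * total


x = math.pi / 2.0  # middle of the +1 half-cycle
for n_terms in (1, 5, 50):
    print(n_terms, square_wave_partial_sum(x, n_terms))
```

With a single term the sum is just a scaled sine; adding more harmonics brings the value at the middle of each half-cycle closer to the flat +1 or -1 level.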

slide-42
SLIDE 42

Approximation

[http://en.wikipedia.org/wiki/Fourier_series]

slide-43
SLIDE 43

Approximation

slide-44
SLIDE 44

Filtering

  • Low-pass filter

– passes only the low frequencies

  • High-pass filter

– passes only the high-frequencies

  • Band-Pass Filter

– passes only frequencies in a given range
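
The three filter types can be sketched with very simple first-order filters. This is one easy-to-read choice, not a design the slides specify: the one-pole low-pass, the "input minus low-pass" high-pass, the chained band-pass, and the test tones and cutoffs are all assumptions for illustration.

```python
import math


def low_pass(samples, cutoff_hz, sample_rate):
    """First-order (one-pole) low-pass filter, i.e. exponential smoothing."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)   # move a fraction alpha toward the input
        out.append(y)
    return out


def high_pass(samples, cutoff_hz, sample_rate):
    """High-pass as the residual: the input minus its low-passed version."""
    low = low_pass(samples, cutoff_hz, sample_rate)
    return [x - l for x, l in zip(samples, low)]


def band_pass(samples, low_hz, high_hz, sample_rate):
    """Band-pass by chaining a high-pass at low_hz and a low-pass at high_hz."""
    return low_pass(high_pass(samples, low_hz, sample_rate), high_hz, sample_rate)


def rms(samples):
    """Root-mean-square level, used here to compare signal strength."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, n, sample_rate):
    """n samples of a unit-amplitude sine at freq_hz."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


RATE = 44100
slow = tone(50.0, 4410, RATE)     # 50 Hz test tone, well below the cutoff
fast = tone(5000.0, 4410, RATE)   # 5 kHz test tone, well above the cutoff
print(rms(low_pass(slow, 500.0, RATE)), rms(low_pass(fast, 500.0, RATE)))
print(rms(high_pass(slow, 500.0, RATE)), rms(high_pass(fast, 500.0, RATE)))
```

A first-order filter rolls off gently, so frequencies near the cutoff are only partly attenuated; sharper filters exist but follow the same pass/reject idea.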

slide-45
SLIDE 45

Band-Pass Filter

[http://en.wikipedia.org/wiki/Band-pass_filter]

slide-46
SLIDE 46

Discrete Fourier Transform


slide-47
SLIDE 47

Discrete Fourier Transform

slide-48
SLIDE 48

Discrete Fourier Transform

[Figure: spectrogram; axes: frequency bin vs. time]
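
A minimal sketch of how a "frequency bin vs. time" picture can be computed: cut the signal into short frames and take the DFT of each one. The naive O(N²) transform, the frame size, the hop, and the absence of a window function are simplifying assumptions; they are not taken from the slides.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a list of real samples."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of the non-redundant bins 0..N/2 for a real-valued frame."""
    spectrum = dft(frame)
    return [abs(c) for c in spectrum[:len(frame) // 2 + 1]]


def spectrogram(samples, frame_size, hop):
    """Slice the signal into frames; return one magnitude column per frame."""
    return [magnitude_spectrum(samples[start:start + frame_size])
            for start in range(0, len(samples) - frame_size + 1, hop)]


# A cosine that completes exactly 4 cycles per 32-sample frame lands in bin 4.
signal = [math.cos(2.0 * math.pi * 4 * t / 32) for t in range(128)]
columns = spectrogram(signal, frame_size=32, hop=16)
peak_bins = [col.index(max(col)) for col in columns]
print(len(columns), peak_bins)
```

Each element of `columns` is one time slice, and each entry within it is the strength of one frequency bin. Real audio code would normally use an FFT routine rather than this quadratic loop.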

slide-49
SLIDE 49

Research Question

Can the DFT be used by a robot to perceive objects and their properties using sound?
slide-50
SLIDE 50

Research Question

Can the DFT be used by a robot to perceive objects and their properties using sound?

How should the robot associate a particular sound with an object?

slide-51
SLIDE 51

Object Exploration by a Robot

slide-52
SLIDE 52

Object Exploration by a Robot

slide-53
SLIDE 53

Objects

[Sinapov, Wiemer, and Stoytchev, ICRA 2009]

slide-54
SLIDE 54

Behaviors

Grasp, Shake, Drop, Push, Tap

slide-55
SLIDE 55

Audio Feature Extraction

Behavior Execution -> WAV file recorded -> Discrete Fourier Transform

slide-56
SLIDE 56
Audio Feature Extraction

  1. Training a self-organizing map (SOM) using DFT column vectors

slide-57
SLIDE 57
Audio Feature Extraction

  2. Use SOM to convert DFT spectrogram to a sequence

slide-58
SLIDE 58
Audio Feature Extraction

  2. Use SOM to convert DFT spectrogram to a sequence:

Si: (3,2) ->

slide-59
SLIDE 59
Audio Feature Extraction

  2. Use SOM to convert DFT spectrogram to a sequence:

Si: (3,2) -> (2,2) ->

slide-60
SLIDE 60
Audio Feature Extraction

  2. Use SOM to convert DFT spectrogram to a sequence:

Si: (3,2) -> (2,2) -> (4,4) -> …

slide-61
SLIDE 61
Audio Feature Extraction

  1. Training a self-organizing map (SOM) using column vectors
  2. Discretization of a DFT of a sound using a trained SOM

Si is the sequence of activated SOM nodes over the duration of the sound
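
The two steps named on the Audio Feature Extraction slides (train a SOM on DFT column vectors, then map each column of a sound's spectrogram to its winning node) can be sketched as follows. This is a basic textbook-style SOM written for illustration; the grid size, learning rate, neighborhood radius, decay schedule, and all function names are assumptions, not the settings used in the cited work.

```python
import math
import random


def nearest_node(som, vector):
    """Return the (row, col) of the SOM node whose weights are closest to vector."""
    best, best_dist = None, float("inf")
    for (row, col), weights in som.items():
        dist = sum((w - v) ** 2 for w, v in zip(weights, vector))
        if dist < best_dist:
            best, best_dist = (row, col), dist
    return best


def train_som(vectors, rows, cols, epochs=20, rate=0.5, radius=1.0, seed=0):
    """Train a small rectangular SOM with a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    som = {(r, c): [rng.random() for _ in range(dim)]
           for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # shrink the learning rate over time
        for vector in rng.sample(vectors, len(vectors)):
            win_r, win_c = nearest_node(som, vector)
            for (r, c), weights in som.items():
                grid_dist2 = (r - win_r) ** 2 + (c - win_c) ** 2
                pull = rate * decay * math.exp(-grid_dist2 / (2.0 * radius ** 2))
                for i in range(dim):
                    weights[i] += pull * (vector[i] - weights[i])
    return som


def discretize(som, spectrogram_columns):
    """Map each spectrogram column to its winning node: the sound's sequence."""
    return [nearest_node(som, column) for column in spectrogram_columns]


# Toy 2-bin "spectrogram columns" standing in for real DFT column vectors.
toy_columns = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
toy_som = train_som(toy_columns, rows=2, cols=2)
sequence = discretize(toy_som, toy_columns)   # a list of (row, col) nodes
```

The output has the same shape as the slide's example, a sequence of grid coordinates such as (3,2) -> (2,2) -> (4,4), with one node per time slice of the sound.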

slide-62
SLIDE 62

Detecting Acoustic Similarity

[Figure: two sounds are each mapped through the auditory SOM to sequences Xi and Yj, which are compared using global sequence alignment similarity]

sim(Xi,Yj) = 0.89 (very similar)

slide-63
SLIDE 63

Detecting Acoustic Similarity

[Figure: two sounds are each mapped through the auditory SOM to sequences Xi and Yj, which are compared using global sequence alignment similarity]

sim(Xi,Yj) = 0.23 (not similar)
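
Global sequence alignment is commonly computed with the Needleman-Wunsch dynamic program. The sketch below uses it to score two sequences of SOM nodes and rescales the score to [0, 1]. The match, mismatch, and gap values and the rescaling are assumptions chosen for illustration, so this will not reproduce the 0.89 and 0.23 figures on the slides.

```python
def alignment_score(seq_x, seq_y, match=1.0, mismatch=-1.0, gap=-1.0):
    """Needleman-Wunsch global alignment score between two sequences."""
    rows, cols = len(seq_x) + 1, len(seq_y) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # aligning a prefix against nothing
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq_x[i - 1] == seq_y[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align the two items
                              score[i - 1][j] + gap,       # gap in seq_y
                              score[i][j - 1] + gap)       # gap in seq_x
    return score[-1][-1]


def similarity(seq_x, seq_y):
    """Rescale the alignment score to [0, 1]; 1.0 means identical sequences."""
    longest = max(len(seq_x), len(seq_y))
    if longest == 0:
        return 1.0
    raw = alignment_score(seq_x, seq_y)   # lies in [-longest, +longest]
    return (raw + longest) / (2.0 * longest)


x = [(3, 2), (2, 2), (4, 4)]
y = [(3, 2), (2, 2), (4, 4), (4, 4)]
z = [(0, 0), (1, 1), (0, 1)]
print(similarity(x, x), similarity(x, y), similarity(x, z))
```

A natural refinement, not shown here, is to make the mismatch cost depend on how far apart two nodes are on the SOM grid, so that neighboring nodes count as nearly the same.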

slide-64
SLIDE 64

Problem Formulation

[Figure: a sound sequence Si is passed to an Object Recognition Model and a Behavior Recognition Model, which output model predictions; example behavior label shown: drop]

slide-65
SLIDE 65

Acoustic Object Recognition

Auditory Data -> Dimensionality Reduction using SOM -> Discrete Auditory Sequence -> Auditory Recognition Model -> Object Probability Estimates

slide-66
SLIDE 66

Recognition Model

  • k-NN: memory-based learning algorithm

Test point (?) with k = 3: 2 red neighbors, 1 blue neighbor

Therefore, Pr(red) ≈ 0.67 and Pr(blue) ≈ 0.33
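
The probability estimate on the k-NN slide is just the fraction of each label among the k nearest training points. A minimal sketch, assuming 2-D points and Euclidean distance for readability (a recognizer working on sound sequences would plug in a sequence similarity instead); the points and names are made up for illustration.

```python
import math
from collections import Counter


def knn_probabilities(train_points, train_labels, query, k=3):
    """Estimate class probabilities as label fractions among the k nearest points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return {label: count / k for label, count in votes.items()}


points = [(1.0, 1.0), (1.2, 0.8), (3.0, 3.0), (0.9, 1.4), (3.2, 2.9)]
labels = ["red", "red", "blue", "blue", "blue"]
probs = knn_probabilities(points, labels, query=(1.0, 1.1), k=3)
print(probs)
```

Here the three nearest neighbors of the query are two red points and one blue point, the same situation as on the slide.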

slide-67
SLIDE 67

Recognition Model

  • SVM: discriminative learning algorithm
slide-68
SLIDE 68

Off-Line Evaluation

  • 10 trials performed with each of the 36 objects with each of the 5 behaviors
  • A total of 1800 interactions, about 12 hours
  • 10-fold cross-validation
  • Performance measure for object and behavior recognition:
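
The counts on the Off-Line Evaluation slide and the 10-fold split can be checked with a short sketch. The trial, object, and behavior counts come from the slide; the random shuffling and the way folds are formed are assumptions, since the slides do not say how the folds were built.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


N_TRIALS, N_OBJECTS, N_BEHAVIORS = 10, 36, 5    # figures from the slide
n_interactions = N_TRIALS * N_OBJECTS * N_BEHAVIORS
splits = list(k_fold_splits(n_interactions, k=10))
chance_object = 1.0 / N_OBJECTS                 # guessing among 36 objects
print(n_interactions, len(splits), round(100.0 * chance_object, 1))
```

Every interaction appears in exactly one test fold, so each recording is used for testing once and for training nine times.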

slide-69
SLIDE 69

Evaluation Results

Chance accuracy = 1/36 ≈ 2.8 %

slide-70
SLIDE 70

Evaluation Results

slide-71
SLIDE 71

Recognition Video

slide-72
SLIDE 72

Estimating Acoustic Object Similarity using Confusion Matrix

[Figure: example confusion-matrix entries (predicted vs. actual) for object pairs labeled similar and different; values shown: 40, 4, 6, 42 and 21, 6, 8, 35]

slide-73
SLIDE 73

Full Confusion Matrix for all 36 objects:

invert

ISOMAP, Hierarchical Clustering

slide-74
SLIDE 74
slide-75
SLIDE 75

  • (mostly) metal objects
  • Objects with contents inside
  • Balls
  • Paper Objects
  • Plastic Objects
  • (mostly) wooden objects
slide-76
SLIDE 76

Recognizing the sounds of objects manipulated by other agents

slide-77
SLIDE 77

Recognizing the sounds of objects manipulated by other agents

slide-78
SLIDE 78

Using Sound to Learn About Containers

Griffith, S., Sinapov, J., Sukhoy, V., and Stoytchev, A. (2012). A Behavior-Grounded Approach to Forming Object Categories: Separating Containers from Non-Containers. IEEE Transactions on Autonomous Mental Development, March 2012.

slide-79
SLIDE 79

Further Reading

  • Sinapov, J., Wiemer, M., and Stoytchev, A. (2008). Interactive Learning of the Acoustic Properties of Objects by a Robot. In proceedings of the "Robot Manipulation: Intelligence in Human Environments" workshop held at the Robotics: Science and Systems Conference, 2008.

  • Sinapov, J., Wiemer, M., and Stoytchev, A. (2009). Interactive Learning of the Acoustic Properties of Household Objects. In proceedings of the 2009 IEEE International Conference on Robotics and Automation (ICRA).

slide-80
SLIDE 80

Discussion

  • What kind of sounds should our mobile robots pay attention to?
  • What would auditory perception allow them to do that they currently cannot?

slide-81
SLIDE 81

THE END