
SLIDE 1

Josh McDermott
Dept. of Brain and Cognitive Sciences, MIT
May 6, 2015, NSF Speech Technology Workshop

SLIDE 2

My research group: Laboratory for Computational Audition

  • Spans psychology, neuroscience, and engineering: experiments in humans, auditory neuroscience, and machine algorithms
  • We study auditory scene analysis and sound recognition
  • Contact with speech technology through assistive devices and machine intelligence
  • Funded by the McDonnell Foundation and NSF
SLIDE 3

Recent approach in our lab: train deep convolutional neural networks on speech tasks, then compare their representations to brain responses

  • So far: word recognition, speaker identification in noise
  • CNN performs about as well as humans
  • Can use CNN as a hypothesis about neural representation
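In outline, the approach could look like the following minimal numpy sketch (a hypothetical toy stand-in for illustration, not the lab's actual architecture): each layer convolves and rectifies its input, and each layer's features can then be compared to neural responses.

```python
import numpy as np

def conv_relu(x, kernels):
    """Valid 2-D convolution of a (H, W) input with (K, kh, kw) kernels,
    followed by a ReLU; returns a (K, H-kh+1, W-kw+1) feature stack."""
    K, kh, kw = kernels.shape
    H, W = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((K, H, W))
    for k in range(K):
        for i in range(H):
            for j in range(W):
                out[k, i, j] = max((x[i:i+kh, j:j+kw] * kernels[k]).sum(), 0.0)
    return out

rng = np.random.default_rng(0)
cochleagram = rng.standard_normal((64, 100))                 # freq x time input
shallow = conv_relu(cochleagram, rng.standard_normal((8, 5, 5)))
deep = conv_relu(shallow.mean(axis=0), rng.standard_normal((8, 5, 5)))
```

A real network would be trained on word recognition or speaker identification; here the weights are random purely to show the layer structure.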
SLIDE 4

[Figure: CNN layers, from shallow to deep, compared against responses in primary auditory cortex and speech-selective cortex.]

The ability of shallow vs. deep CNN layers to predict brain responses provides insight into computational complexity.
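One common way to quantify how well a layer predicts brain responses is regularized regression from layer features to measured responses; the toy sketch below (an assumed method for illustration, with made-up data) shows a "voxel" driven by shallow-layer-like features being better predicted from those features than from an unrelated deep layer.

```python
import numpy as np

def ridge_r2(features, voxel, lam=1.0):
    """In-sample R^2 of a ridge regression predicting one voxel's responses
    (n_stimuli,) from a layer's features (n_stimuli, n_units)."""
    X = features - features.mean(axis=0)
    y = voxel - voxel.mean()
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return 1.0 - np.sum((y - X @ w) ** 2) / np.sum(y ** 2)

# Toy data: simulated features for 200 stimuli from two layers.
rng = np.random.default_rng(1)
shallow = rng.standard_normal((200, 50))    # 200 stimuli x 50 units
deep = rng.standard_normal((200, 50))
voxel = shallow @ rng.standard_normal(50) + 0.1 * rng.standard_normal(200)
```

With real data, a layer-by-layer profile of prediction quality across cortical regions is what supports the shallow-to-deep correspondence.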

SLIDE 5

Using speech analysis/synthesis to manipulate grouping cues (joint work with Kawahara & Ellis):

  • STRAIGHT decomposes speech into excitation and filtering.
  • Excitation is modeled sinusoidally.
  • Excitation can be altered to be inharmonic, or replaced with noise to simulate whispering.
  • Do these manipulations affect the ability to segregate speech?
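The excitation manipulations can be sketched with a toy numpy example (an illustrative stand-in, not the actual STRAIGHT pipeline; the jitter amount is a made-up parameter): a harmonic complex, a frequency-jittered inharmonic version, and a noise excitation standing in for whisper.

```python
import numpy as np

sr, dur, f0 = 16000, 0.5, 200.0          # sample rate (Hz), seconds, F0 (Hz)
t = np.arange(int(sr * dur)) / sr
rng = np.random.default_rng(2)

def complex_tone(freqs):
    """Sum of unit-amplitude sinusoids at the given frequencies."""
    return np.sum([np.sin(2 * np.pi * f * t) for f in freqs], axis=0)

harmonics = f0 * np.arange(1, 11)                 # 200, 400, ..., 2000 Hz
jitter = 1 + 0.3 * (rng.random(10) - 0.5)         # up to +/-15% per partial
harmonic = complex_tone(harmonics)                # harmonic excitation
inharmonic = complex_tone(harmonics * jitter)     # jittered (inharmonic)
whispered_like = rng.standard_normal(t.size)      # noise excitation (whisper)
```

In the actual stimuli, the manipulated excitation is recombined with the measured filtering to resynthesize intelligible speech.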

SLIDE 6

[Figure: mean # correct words for harmonic, jittered, and whispered speech, for single words vs. word pairs.]

Task: listeners hear “WORD” or “WORD 1” + “WORD 2” and type in all the words they hear.

  • Single-word recognition is similar for all conditions.
  • For word pairs, recognition is worse for inharmonic than for harmonic speech, suggestive of an effect on segregation.
  • But there is a much larger effect of whispering, potentially suggestive of the importance of sparsity.

SLIDE 7

[Figure: dry vs. reverberant signals; percent errors for machine speech recognition.]

Reverberation profoundly distorts sound signals and is a problem for machine speech recognition. It is also a challenge for hearing-impaired listeners.

SLIDE 8

Characterizing the distribution of real-world reverberation

What is the empirical distribution of environmental impulse responses?

IR Measurement:

  • Broadcast a fixed source signal
  • Record the resulting reverberant signal
  • From this, infer the environmental IR

IR Survey:

  • 24 text messages/day
  • Phone returns GPS coordinates
  • Participants reply to each text with a photo and address
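Inferring an IR from a known source and its recording can be done by regularized frequency-domain deconvolution; the sketch below is a generic version for illustration (the survey's actual measurement procedure may differ, e.g. using sweep signals).

```python
import numpy as np

def estimate_ir(source, recording, eps=1e-8):
    """Estimate an impulse response by regularized frequency-domain
    deconvolution: H(f) = Y(f) X*(f) / (|X(f)|^2 + eps)."""
    n = len(recording)
    X = np.fft.rfft(source, n)
    Y = np.fft.rfft(recording, n)
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)
    return np.fft.irfft(H, n)

# Toy check: convolving a known decaying IR with a noise source and then
# deconvolving should recover that IR.
rng = np.random.default_rng(3)
source = rng.standard_normal(4096)
true_ir = np.exp(-np.arange(256) / 40.0) * rng.standard_normal(256)
recording = np.convolve(source, true_ir)         # simulated room recording
est_ir = estimate_ir(source, recording)[:256]
```

The small `eps` keeps the division stable at frequencies where the source has little energy.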

SLIDE 9

[Figure: frequency asymmetry (skew of subband RT60) vs. mean subband RT60 (s), for 271 IRs from 301 surveyed locations; 1st vs. 4th quartile; survey vs. KEMAR HATS at 8 m.]

Everyday impulse responses are pretty stereotyped:

  • Exponential decay
  • Faster decay at high frequencies
  • Exaggerated asymmetry in large rooms
  • Suggests a prior for
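These regularities suggest a simple generative model of everyday IRs; the sketch below (a hypothetical illustration, with made-up band edges and RT60 values) builds a noise IR from log-spaced frequency bands, each decaying exponentially, faster at high frequencies.

```python
import numpy as np

def synthetic_ir(band_rt60s, sr=16000, dur=1.0, seed=4):
    """Gaussian-noise IR built from log-spaced frequency bands, each with
    exponential decay reaching -60 dB at that band's RT60."""
    rng = np.random.default_rng(seed)
    n = int(sr * dur)
    t = np.arange(n) / sr
    freqs = np.fft.rfftfreq(n, 1.0 / sr)
    edges = np.geomspace(100.0, sr / 2, len(band_rt60s) + 1)
    ir = np.zeros(n)
    for lo, hi, rt60 in zip(edges[:-1], edges[1:], band_rt60s):
        spec = np.fft.rfft(rng.standard_normal(n))
        spec[(freqs < lo) | (freqs >= hi)] = 0.0     # crude brick-wall bandpass
        carrier = np.fft.irfft(spec, n)
        ir += carrier * 10.0 ** (-3.0 * t / rt60)    # -60 dB at t = RT60
    return ir

# Shorter RT60s in higher bands, mirroring the survey's regularities:
ir = synthetic_ir([1.0, 0.8, 0.5, 0.3])
```

A model of this form could serve as a prior over IRs for dereverberation: an algorithm could favor source estimates whose implied residual reverberation matches these statistics.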

SLIDE 10

Challenges to Impacting Technology

  • Lack of large, high-quality labeled data sets in some domains
    • Emotional speech
    • Environmental sounds
  • Cultural divides between neuroscience and engineering
    • Different meetings, departments, jargon, funders
    • Possibly getting worse?
  • Workshops help, particularly if students have access