
SLIDE 1

Josh McDermott
Dept. of Brain and Cognitive Sciences, MIT
May 6, 2015, NSF Speech Technology Workshop

SLIDE 2

My research group: Laboratory for Computational Audition

  • Spans psychology, neuroscience, and engineering: experiments in humans, auditory neuroscience, and machine algorithms
  • We study auditory scene analysis and sound recognition
  • Contact with speech technology through assistive devices and machine intelligence
  • Funded by the McDonnell Foundation and NSF
SLIDE 3

Recent approach in our lab: train deep convolutional neural networks on speech tasks, then compare their representations to brain responses

  • So far: word recognition, speaker identification in noise
  • CNN performs about as well as humans
  • Can use CNN as a hypothesis about neural representation
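In outline, the approach could look like the following minimal numpy sketch (a hypothetical toy stand-in for illustration, not the lab's actual architecture): each layer convolves and rectifies its input, and each layer's features can then be compared to neural responses.

```python
import numpy as np

def conv_relu(x, kernels):
    """Valid 2-D convolution of a (H, W) input with (K, kh, kw) kernels,
    followed by a ReLU; returns a (K, H-kh+1, W-kw+1) feature stack."""
    K, kh, kw = kernels.shape
    H, W = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((K, H, W))
    for k in range(K):
        for i in range(H):
            for j in range(W):
                out[k, i, j] = max((x[i:i+kh, j:j+kw] * kernels[k]).sum(), 0.0)
    return out

rng = np.random.default_rng(0)
cochleagram = rng.standard_normal((64, 100))                 # freq x time input
shallow = conv_relu(cochleagram, rng.standard_normal((8, 5, 5)))
deep = conv_relu(shallow.mean(axis=0), rng.standard_normal((8, 5, 5)))
```

A real network would be trained on word recognition or speaker identification; here the weights are random purely to show the layer structure.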
SLIDE 4

[Figure: CNN layers, from shallow to deep, compared against responses in primary auditory cortex and speech-selective cortex.]

The ability of shallow vs. deep CNN layers to predict brain responses provides insight into computational complexity.
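One common way to quantify how well a layer predicts brain responses is regularized regression from layer features to measured responses; the toy sketch below (an assumed method for illustration, with made-up data) shows a "voxel" driven by shallow-layer-like features being better predicted from those features than from an unrelated deep layer.

```python
import numpy as np

def ridge_r2(features, voxel, lam=1.0):
    """In-sample R^2 of a ridge regression predicting one voxel's responses
    (n_stimuli,) from a layer's features (n_stimuli, n_units)."""
    X = features - features.mean(axis=0)
    y = voxel - voxel.mean()
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return 1.0 - np.sum((y - X @ w) ** 2) / np.sum(y ** 2)

# Toy data: simulated features for 200 stimuli from two layers.
rng = np.random.default_rng(1)
shallow = rng.standard_normal((200, 50))    # 200 stimuli x 50 units
deep = rng.standard_normal((200, 50))
voxel = shallow @ rng.standard_normal(50) + 0.1 * rng.standard_normal(200)
```

With real data, a layer-by-layer profile of prediction quality across cortical regions is what supports the shallow-to-deep correspondence.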

SLIDE 5

Using speech analysis/synthesis to manipulate grouping cues (joint work with Kawahara & Ellis):

  • STRAIGHT decomposes speech into excitation and filtering.
  • Excitation is modeled sinusoidally.
  • Excitation can be altered to be inharmonic, or replaced with noise to simulate whispering.
  • Do these manipulations affect the ability to segregate speech?
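The excitation manipulations can be sketched with a toy numpy example (an illustrative stand-in, not the actual STRAIGHT pipeline; the jitter amount is a made-up parameter): a harmonic complex, a frequency-jittered inharmonic version, and a noise excitation standing in for whisper.

```python
import numpy as np

sr, dur, f0 = 16000, 0.5, 200.0          # sample rate (Hz), seconds, F0 (Hz)
t = np.arange(int(sr * dur)) / sr
rng = np.random.default_rng(2)

def complex_tone(freqs):
    """Sum of unit-amplitude sinusoids at the given frequencies."""
    return np.sum([np.sin(2 * np.pi * f * t) for f in freqs], axis=0)

harmonics = f0 * np.arange(1, 11)                 # 200, 400, ..., 2000 Hz
jitter = 1 + 0.3 * (rng.random(10) - 0.5)         # up to +/-15% per partial
harmonic = complex_tone(harmonics)                # harmonic excitation
inharmonic = complex_tone(harmonics * jitter)     # jittered (inharmonic)
whispered_like = rng.standard_normal(t.size)      # noise excitation (whisper)
```

In the actual stimuli, the manipulated excitation is recombined with the measured filtering to resynthesize intelligible speech.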

SLIDE 6

[Figure: mean # correct words for harmonic, jittered, and whispered speech, for single words vs. word pairs.]

Task: listeners hear “WORD” or “WORD 1” + “WORD 2” and type in all the words they hear.

  • Single-word recognition is similar for all conditions.
  • For word pairs, recognition is worse for inharmonic than for harmonic speech, suggestive of an effect on segregation.
  • But there is a much larger effect of whispering, potentially suggestive of the importance of sparsity.

SLIDE 7

[Figure: dry vs. reverberant signals; percent errors for machine speech recognition.]

Reverberation profoundly distorts sound signals and is a problem for machine speech recognition. It is also a challenge for hearing-impaired listeners.

SLIDE 8

Characterizing the distribution of real-world reverberation

What is the empirical distribution of environmental impulse responses?

IR Measurement:

  • Broadcast a fixed source signal
  • Record the resulting reverberant signal
  • From this, infer the environmental IR

IR Survey:

  • 24 text messages/day
  • Phone returns GPS coordinates
  • Participants reply to each text with a photo and address
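Inferring an IR from a known source and its recording can be done by regularized frequency-domain deconvolution; the sketch below is a generic version for illustration (the survey's actual measurement procedure may differ, e.g. using sweep signals).

```python
import numpy as np

def estimate_ir(source, recording, eps=1e-8):
    """Estimate an impulse response by regularized frequency-domain
    deconvolution: H(f) = Y(f) X*(f) / (|X(f)|^2 + eps)."""
    n = len(recording)
    X = np.fft.rfft(source, n)
    Y = np.fft.rfft(recording, n)
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)
    return np.fft.irfft(H, n)

# Toy check: convolving a known decaying IR with a noise source and then
# deconvolving should recover that IR.
rng = np.random.default_rng(3)
source = rng.standard_normal(4096)
true_ir = np.exp(-np.arange(256) / 40.0) * rng.standard_normal(256)
recording = np.convolve(source, true_ir)         # simulated room recording
est_ir = estimate_ir(source, recording)[:256]
```

The small `eps` keeps the division stable at frequencies where the source has little energy.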

SLIDE 9

[Figure: frequency asymmetry (skew of subband RT60) vs. mean subband RT60 (s), for 271 IRs from 301 surveyed locations; 1st vs. 4th quartile; survey vs. KEMAR HATS at 8 m.]

Everyday impulse responses are pretty stereotyped:

  • Exponential decay
  • Faster decay at high frequencies
  • Exaggerated asymmetry in large rooms
  • Suggests a prior for
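These regularities suggest a simple generative model of everyday IRs; the sketch below (a hypothetical illustration, with made-up band edges and RT60 values) builds a noise IR from log-spaced frequency bands, each decaying exponentially, faster at high frequencies.

```python
import numpy as np

def synthetic_ir(band_rt60s, sr=16000, dur=1.0, seed=4):
    """Gaussian-noise IR built from log-spaced frequency bands, each with
    exponential decay reaching -60 dB at that band's RT60."""
    rng = np.random.default_rng(seed)
    n = int(sr * dur)
    t = np.arange(n) / sr
    freqs = np.fft.rfftfreq(n, 1.0 / sr)
    edges = np.geomspace(100.0, sr / 2, len(band_rt60s) + 1)
    ir = np.zeros(n)
    for lo, hi, rt60 in zip(edges[:-1], edges[1:], band_rt60s):
        spec = np.fft.rfft(rng.standard_normal(n))
        spec[(freqs < lo) | (freqs >= hi)] = 0.0     # crude brick-wall bandpass
        carrier = np.fft.irfft(spec, n)
        ir += carrier * 10.0 ** (-3.0 * t / rt60)    # -60 dB at t = RT60
    return ir

# Shorter RT60s in higher bands, mirroring the survey's regularities:
ir = synthetic_ir([1.0, 0.8, 0.5, 0.3])
```

A model of this form could serve as a prior over IRs for dereverberation: an algorithm could favor source estimates whose implied residual reverberation matches these statistics.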

SLIDE 10

Challenges to Impacting Technology

  • Lack of large, high-quality labeled data sets in some domains
    • Emotional speech
    • Environmental sounds
  • Cultural divides between neuroscience and engineering
    • Different meetings, departments, jargon, funders
    • Possibly getting worse?
  • Workshops help, particularly if students have access