
10/20/20

Primer on Auditory Processing

Mounya Elhilali Department of Electrical & Computer Engineering Johns Hopkins University mounya@jhu.edu

601.467/667 Introduction to Human Language Technology


Speech as waves


Sound is a wave


  • Sound is a mechanical wave caused by a vibrating source
  • The vibrating source causes the matter around it to move
  • No sound is produced in a vacuum
  • Matter (air, water, earth) must be present
  • Individual air molecules do not move with the wave; a given molecule vibrates back and forth about a fixed location.


[Figure: sound pressure over time, alternating between high, normal and low pressure]

Sound waves


  • Air particles do not travel with the wave; they oscillate around a point in space

  • The rate of oscillation is called frequency (f)

    – denoted in cycles per second (cps) or hertz (Hz).

[Figure: period of oscillation]


Physical Dimensions of Sound

Amplitude

  • Height of a cycle

Frequency (F)

  • Cycles per second

Wavelength (λ)

  • Distance traveled by one cycle

Period (T)

  • Duration of one cycle (T = 1/F)
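The relations between these physical dimensions can be checked numerically. This is a minimal sketch, assuming a speed of sound in air of about 343 m/s (dry air at roughly 20 °C); the constant and function names are illustrative.

```python
# Relate frequency F, period T and wavelength λ for a sound wave in air.
# Assumption: speed of sound in air ≈ 343 m/s (dry air at about 20 °C).
SPEED_OF_SOUND_AIR = 343.0  # metres per second (assumed value)


def period(freq_hz):
    """Period T = 1 / F, in seconds."""
    return 1.0 / freq_hz


def wavelength(freq_hz, speed=SPEED_OF_SOUND_AIR):
    """Wavelength λ = c / F, in metres."""
    return speed / freq_hz


if __name__ == "__main__":
    for f in (100.0, 1000.0, 10000.0):
        print(f"{f:>8.0f} Hz   T = {period(f) * 1000:.3f} ms   λ = {wavelength(f):.4f} m")
```

A 100 Hz tone thus has a wavelength of a few metres, while a 10 kHz tone has one of a few centimetres.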

Perceptual dimensions of Sound


Physical property of sound → perceptual dimension:

  • Amplitude/Intensity → Loudness
  • Frequency → Pitch
  • Complexity (frequency content & time) → Timbre


Sounds in the environment

Note: Listening to loud music will gradually damage your hearing!


Equal Loudness Curves/Contours


Each contour connects tones of different frequencies that are perceived as equally loud

[Figure: equal-loudness contours plotted against sound level (dB)]
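Levels in plots like this are expressed in decibels. As a minimal sketch, assuming the standard 20 µPa reference pressure for dB SPL in air, the conversion between RMS pressure and level looks like this (function names are illustrative):

```python
import math

# Standard reference pressure for dB SPL in air: 20 micropascals (RMS).
P_REF_PA = 20e-6


def spl_db(p_rms_pa):
    """Sound pressure level (dB SPL) of an RMS pressure in pascals: 20*log10(p / p_ref)."""
    return 20.0 * math.log10(p_rms_pa / P_REF_PA)


def pressure_from_spl(level_db):
    """RMS pressure (Pa) corresponding to a level in dB SPL."""
    return P_REF_PA * 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    for p in (20e-6, 0.02, 1.0, 20.0):
        print(f"{p:g} Pa -> {spl_db(p):.1f} dB SPL")
```

Each factor of 10 in pressure adds 20 dB, and doubling the pressure adds about 6 dB. This is a physical level, not perceived loudness: the contours show that the same dB level can sound louder or softer depending on frequency.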


Pitch

  • To a first approximation, the pitch of a simple periodic signal is determined by its frequency.

  • Most oscillators (guitar strings, vocal cords) naturally oscillate at a fundamental frequency (F0) as well as at its integer multiples (called harmonics/partials/overtones).

  • The pitch of a complex periodic signal is often determined by its fundamental frequency (F0)
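One common way to recover F0 from a complex periodic signal is to find the lag at which the waveform best matches a shifted copy of itself (autocorrelation). The slides do not prescribe an estimator; this is a minimal sketch of that swapped-in technique, assuming a clean synthetic tone, a 16 kHz sample rate and a 50-500 Hz search range (all illustrative choices).

```python
import math


def complex_tone(f0, n_harmonics, fs, n_samples):
    """Equal-amplitude sum of harmonics f0, 2*f0, ..., n_harmonics*f0 (illustrative test signal)."""
    return [
        sum(math.sin(2 * math.pi * k * f0 * n / fs) for k in range(1, n_harmonics + 1))
        for n in range(n_samples)
    ]


def estimate_f0_autocorr(x, fs, fmin=50.0, fmax=500.0):
    """Estimate F0 as fs / lag, where lag maximizes the (unnormalized) autocorrelation
    within the lag range corresponding to [fmin, fmax]."""
    min_lag = int(fs / fmax)
    max_lag = min(int(fs / fmin), len(x) - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


if __name__ == "__main__":
    fs = 16000
    x = complex_tone(200.0, 5, fs, 1600)  # 0.1 s of a 200 Hz complex tone
    print(f"estimated F0: {estimate_f0_autocorr(x, fs):.1f} Hz")
```

Because the unnormalized sum has fewer terms at longer lags, it favors the shortest repeating period over its multiples in this clean example; real speech needs normalization and voicing decisions on top of this.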


[Figure: harmonic series f, 2f, 3f, 4f, …, 8f; 2f lies 1 octave above f, 4f 2 octaves above, 8f 3 octaves above]
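Since each octave is a doubling of frequency, the octave distance between two frequencies is the base-2 logarithm of their ratio. A minimal sketch (function names are illustrative):

```python
import math


def harmonics(f0, n):
    """First n harmonics of f0: [f0, 2*f0, ..., n*f0]."""
    return [k * f0 for k in range(1, n + 1)]


def octaves_between(f_low, f_high):
    """Number of octaves from f_low up to f_high: log2(f_high / f_low)."""
    return math.log2(f_high / f_low)


if __name__ == "__main__":
    f0 = 100.0
    for k, fk in enumerate(harmonics(f0, 8), start=1):
        print(f"{k}f = {fk:.0f} Hz, {octaves_between(f0, fk):.2f} octaves above f")
```

Only the power-of-two harmonics (2f, 4f, 8f) land a whole number of octaves above f; 3f, for example, sits about 1.58 octaves above it.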

A pitch scale

  • Perceptual scale of pitch: mel scale
  • How far apart in frequency must two tones be for one to be heard as double the pitch of the other?


    – Mel scaling is used in signal processing to build filter banks that approximate human pitch perception (e.g., mel-frequency cepstral coefficients, MFCCs)

It’s a relative scale, based on pitch comparisons
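Several closed-form approximations of the mel scale exist, and the slides do not say which one is meant. This minimal sketch assumes the widely used variant m = 2595·log10(1 + f/700) and shows how filter center frequencies equally spaced in mel spread out in Hz (function names are illustrative):

```python
import math


def hz_to_mel(f_hz):
    """Hz -> mel, using the common 2595*log10(1 + f/700) variant (an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_centers(f_low, f_high, n):
    """n frequencies (Hz) equally spaced on the mel scale, endpoints included (n >= 2)."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n - 1)
    return [mel_to_hz(m_low + i * step) for i in range(n)]


if __name__ == "__main__":
    centers = mel_spaced_centers(0.0, 8000.0, 10)
    print([round(c) for c in centers])
```

With this variant, 1000 Hz maps to roughly 1000 mel. MFCC front ends typically place triangular filters at mel-spaced centers like these, so the filters get wider in Hz as frequency increases.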


Masking

  • A hearing phenomenon
  • Occurs when the perception of one sound is affected by the presence of another sound
  • i.e., one sound is masked by another
  • The term masking is used to describe the effects of noise and interference on sound perception
  • We experience masking every day


Masking


How do we perceive sounds?


The auditory system

  • Two major components in the auditory system
  • The peripheral auditory organs (the ear)
  • Converts sound pressure into mechanical vibration patterns, which are then transformed into neural firings

  • The auditory nervous system (the brain)
  • Extracts perceptual information in various stages


Auditory Pathway


The ear

  • The ear is the organ of hearing
  • It changes sound pressure waves from the outside world into a signal of nerve impulses sent to the brain.

  • It consists of 3 components:
  • Outer ear
  • Middle ear
  • Inner ear


Organ of hearing: outer ear

– The external ear acts as an acoustic antenna
– It diffracts and focuses sound waves (pinna), while the ear canal acts as a resonator that amplifies sounds in the 2-5 kHz range
– The end of the canal is closed by the eardrum, which vibrates with sound
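The 2-5 kHz boost can be estimated by treating the ear canal as a tube open at the pinna and closed at the eardrum, whose lowest resonance is a quarter-wavelength resonance, f = c / (4L). This is a rough sketch, assuming a canal length of about 2.5 cm and a speed of sound of 343 m/s; both are approximate textbook values, not figures from the slides.

```python
# Quarter-wave resonance of a tube closed at one end (idealized ear canal).
SPEED_OF_SOUND_AIR = 343.0  # m/s, assumed (air at about 20 °C)


def quarter_wave_resonance(length_m, speed=SPEED_OF_SOUND_AIR):
    """Lowest resonance of a tube open at one end and closed at the other: c / (4L)."""
    return speed / (4.0 * length_m)


if __name__ == "__main__":
    canal_length = 0.025  # metres; assumed typical adult ear-canal length
    print(f"estimated ear-canal resonance: {quarter_wave_resonance(canal_length):.0f} Hz")
```

The estimate of roughly 3.4 kHz falls inside the 2-5 kHz range; the real canal is curved and its walls are not rigid, so this idealization only gives a ballpark figure.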

Organ of hearing: middle ear

– Eardrum (tympanic membrane) vibrations cause mechanical motion of the small bones of the middle ear (malleus, incus and stapes), the three smallest bones in the human body
– The middle ear acts as an impedance adapter, compensating for the energy difference between the air environment and the fluid environment of the inner ear
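Why an impedance adapter is needed can be shown with the standard plane-wave result for normal incidence on a boundary: the fraction of intensity transmitted is 4·Z1·Z2 / (Z1 + Z2)². This sketch assumes approximate characteristic impedances of about 413 rayl for air and 1.48 × 10⁶ rayl for water, with water standing in for the inner-ear fluid.

```python
import math

# Approximate characteristic acoustic impedances in rayl (Pa·s/m); assumed values.
Z_AIR = 413.0     # air at about 20 °C
Z_WATER = 1.48e6  # water, used here as a stand-in for inner-ear fluid


def intensity_transmission(z1, z2):
    """Fraction of incident intensity transmitted across a plane boundary (normal incidence)."""
    return 4.0 * z1 * z2 / (z1 + z2) ** 2


if __name__ == "__main__":
    t = intensity_transmission(Z_AIR, Z_WATER)
    print(f"transmitted: {t:.4%}   loss: {-10 * math.log10(t):.1f} dB")
```

With these values only about 0.1% of the incident intensity would cross a direct air-to-fluid boundary, a loss of roughly 30 dB; this is the gap the middle ear's impedance matching helps close.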


Organ of hearing: inner ear

  • The cochlea translates physical vibrations into electrical signals for the brain to process
  • The cochlea acts as a frequency analyzer of sound signals
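A crude computational stand-in for "frequency analyzer" is a bank of narrowband detectors, each reporting how much energy the input has near its own center frequency. This sketch uses the Goertzel algorithm for each channel; the channel frequencies, sample rate and test tone are hypothetical, and real cochlear filters are far broader, overlapping and nonlinear.

```python
import math


def goertzel_power(x, fs, freq):
    """Squared magnitude of x's spectrum at `freq` (Hz), via the Goertzel recurrence."""
    w = 2.0 * math.pi * freq / fs
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def channel_powers(x, fs, centers):
    """Output power of each hypothetical channel, keyed by its center frequency."""
    return {f: goertzel_power(x, fs, f) for f in centers}


def strongest_channel(x, fs, centers):
    """Center frequency of the channel that responds most strongly to x."""
    powers = channel_powers(x, fs, centers)
    return max(powers, key=powers.get)


if __name__ == "__main__":
    fs = 16000
    tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1600)]
    print(strongest_channel(tone, fs, [250, 500, 1000, 2000, 4000]))
```

A 1 kHz tone excites mainly the 1000 Hz channel, loosely analogous to a pure tone exciting one region of the cochlea most strongly.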

The Cochlea

  • The cochlea is the inner-ear organ that converts sound waves into neural signals.
  • The neural signals are passed to the brain via the auditory nerve.


Cochlea as frequency analyzer
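One way to make the cochlea's frequency-to-place mapping concrete is Greenwood's place-frequency function, which is not part of these slides. This sketch assumes the constants commonly quoted for the human cochlea (A = 165.4, α = 2.1, k = 0.88), with position measured as a fraction of cochlear length from the apex.

```python
import math

# Greenwood place-frequency constants commonly quoted for the human cochlea (assumed).
GREENWOOD_A = 165.4
GREENWOOD_ALPHA = 2.1
GREENWOOD_K = 0.88


def greenwood_frequency(x):
    """Best frequency (Hz) at relative cochlear position x (0 = apex, 1 = base)."""
    return GREENWOOD_A * (10.0 ** (GREENWOOD_ALPHA * x) - GREENWOOD_K)


def greenwood_position(f_hz):
    """Inverse map: relative position (0 = apex, 1 = base) tuned to f_hz."""
    return math.log10(f_hz / GREENWOOD_A + GREENWOOD_K) / GREENWOOD_ALPHA


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"x = {x:.2f}: {greenwood_frequency(x):8.0f} Hz")
```

With these constants the map runs from about 20 Hz at the apex to about 20 kHz at the base, close to exponential along the length: low frequencies are analyzed near the apex and high frequencies near the base.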


Ascending pathway

  • Very complex; only some major pathways are shown.
  • Extensive binaural interactions
  • General principle:
    – Increasing complexity of responses along the pathway (as in vision and touch)

Ascending pathway

Function of each stage, from the highest stage down to the periphery:

  • Identify and process complex sounds
  • Principal relay to cortex
  • Form full spatial map
  • Locate sound sources in space
  • Start sound feature processing
  • Sound sensor / periphery


Tonotopy

  • Tonotopic map:
    – topographic organization (spatial arrangement) of where sounds of different frequencies are processed
  • Derived from Greek tonos (tone) and topos (place): "place of tones"
  • Most nuclei along the auditory pathway from the cochlea to A1 (primary auditory cortex) are tonotopically organized (they inherit cochleotopy from the periphery)


Auditory tonotopy

  • Adjacent cells in A1 form a frequency map, similar to the one observed in the cochlea.

[Figure: tonotopic maps in the cochlea and in A1]


Encoding speech modulation beyond the cochlea

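[Figure: range of temporal modulations encoded along the pathway, from fast (auditory nerve) through medium (midbrain) to slow (cortex), on a modulation-rate axis marked at 30, 300 and 3000 Hz; nuclei shown: AVCN, PVCN, DCN, TB, LL, NLL, IC, MGB]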

Speech carries information at multiple levels

  • Any speech signal can be separated into two signals.

[Figure: a speech signal decomposed into two component signals]

Example of a good decomposition; obtaining one is a non-trivial task.

  • The envelope is the slowly varying amplitude of the sound
  • The fine structure is the detailed waveform without its envelope (the rapid oscillations underneath it)
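The slides do not say how the two signals are obtained. A common choice, swapped in here, is the Hilbert transform: the envelope is the magnitude of the analytic signal and the fine structure is the cosine of its phase. This minimal sketch builds the analytic signal with NumPy's FFT (the same construction `scipy.signal.hilbert` uses) and tests it on a synthetic amplitude-modulated tone; the sample rate and test signal are assumptions.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal of a real 1-D array via the FFT (negative frequencies zeroed)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies, zero negative ones.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)


def envelope_and_fine_structure(x):
    """Split x into a Hilbert envelope and a unit-amplitude fine structure cos(phase)."""
    z = analytic_signal(x)
    envelope = np.abs(z)
    fine_structure = np.cos(np.angle(z))
    return envelope, fine_structure


if __name__ == "__main__":
    fs = 16000                                          # sample rate (Hz), assumed
    t = np.arange(1600) / fs                            # 0.1 s
    true_env = 1.0 + 0.5 * np.sin(2 * np.pi * 50 * t)   # slow 50 Hz modulation
    x = true_env * np.sin(2 * np.pi * 1000 * t)         # 1 kHz carrier
    env, tfs = envelope_and_fine_structure(x)
    print("max envelope error:", np.max(np.abs(env - true_env)))
```

Multiplying the two parts back together returns the original signal. For speech, this split is usually applied separately within each band of a filter bank rather than to the broadband waveform, which is part of why a good decomposition is non-trivial.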