COMP 546 Lecture 22 Spectrograms (revisited), Auditory filters - - PowerPoint PPT Presentation



SLIDE 1

COMP 546

Lecture 22 Spectrograms (revisited), Auditory filters

  • Thurs. April 5, 2018

SLIDE 2

Spectrogram

Partition a sound signal into $B$ blocks of $T$ samples each (i.e. the sound has $BT$ samples in total).

SLIDE 3

Spectrogram

Partition a sound signal into $B$ blocks of $T$ samples each (i.e. the sound has $BT$ samples in total). Take the Fourier transform of each block. Let $b$ be the block number, and let $\omega$ be in units of cycles per block. [I will convert $\omega$ to cycles per second a few slides from now.]
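The block-by-block procedure can be sketched as follows. This is a minimal illustration, not the lecture's code: it uses a direct (slow) DFT from the standard library so that frequency index $\omega$ is visibly "cycles per block"; in practice one would use an FFT such as `numpy.fft.rfft`. The function names `block_spectrum` and `spectrogram` are hypothetical.

```python
import cmath
import math

def block_spectrum(block):
    """Magnitudes |X[k]| of the DFT of one block, for k = 0 .. T/2 cycles per block.

    For a real-valued block the coefficients above T/2 mirror these, so they are omitted.
    """
    T = len(block)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / T) for n, x in enumerate(block)))
            for k in range(T // 2 + 1)]

def spectrogram(signal, T):
    """Partition `signal` into B blocks of T samples and return one spectrum per block.

    Row b (0-based here) is the magnitude spectrum of block b, indexed by omega in
    cycles per block. Samples left over after the last full block are ignored.
    """
    B = len(signal) // T
    return [block_spectrum(signal[b * T:(b + 1) * T]) for b in range(B)]
```

For example, a cosine with 4 cycles per block, cut into $B = 3$ blocks of $T = 16$ samples, gives three rows of $T/2 + 1 = 9$ magnitudes, each peaking at $\omega = 4$ cycles per block.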

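SLIDE 4

[Figure: spectrogram with block number $b = 1, 2, 3, \ldots$ on the horizontal axis and frequency $\omega$ in cycles per block ($1, 2, \ldots, \frac{T}{2}$) on the vertical axis.]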

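SLIDE 5

[Figure: the same spectrogram, with block number $b = 1, 2, 3, \ldots$ on the horizontal axis and frequency $\omega$ in cycles per second on the vertical axis (ticks $\omega_0, 2\omega_0, \ldots, \frac{T}{2}\omega_0$).]

$\omega_0$ units are $\frac{\text{blocks}}{\text{sec}}$.

$\omega$ units are $\frac{\text{cycles}}{\text{sec}} = \frac{\text{cycles}}{\text{block}} \cdot \frac{\text{blocks}}{\text{sec}}$.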

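SLIDE 6

[Figure: spectrogram with frequency $\omega$ in cycles per second on the vertical axis (ticks $\omega_0, 2\omega_0, \ldots, \frac{T}{2}\omega_0$) and time in seconds on the horizontal axis (ticks $\frac{1}{\omega_0}, \frac{2}{\omega_0}, \frac{3}{\omega_0}, \ldots, \frac{b}{\omega_0}, \ldots$).]

$\omega_0$ units are $\frac{\text{blocks}}{\text{sec}}$, so $\frac{1}{\omega_0}$ units are $\frac{\text{sec}}{\text{block}}$.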

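SLIDE 7

[Figure: spectrogram with time in seconds on the horizontal axis (ticks $\frac{1}{\omega_0}, \frac{2}{\omega_0}, \frac{3}{\omega_0}, \ldots, \frac{b}{\omega_0}, \ldots$) and frequency $\omega$ in cycles per second on the vertical axis, up to $\frac{T}{2}\omega_0$.]

High quality audio: 44,100 samples/sec.

$\frac{1}{\omega_0}$ units are $\frac{\text{sec}}{\text{block}}$. Multiply by 44,100 samples/sec to get $T$ samples per block, i.e. $T = 44{,}100 / \omega_0$. (The highest frequency, $\frac{T}{2}\omega_0$, is then 22,050 Hz.)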

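SLIDE 8

e.g. $T = 512$ samples (12 ms), $\omega_0 = 86$ Hz; $T = 2048$ samples (46 ms), $\omega_0 = 21.5$ Hz.

You cannot have high precision in both frequency and time: the block duration is $1/\omega_0$, so a finer frequency spacing $\omega_0$ requires longer blocks.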
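The trade-off can be checked numerically. This sketch (the helper name `block_resolution` is hypothetical) computes the block duration and $\omega_0$ at 44,100 samples/sec; their product is always 1, which is why improving one resolution worsens the other.

```python
SAMPLE_RATE = 44_100  # samples/sec ("high quality audio" on the previous slide)

def block_resolution(T, sample_rate=SAMPLE_RATE):
    """Return (block duration in sec, omega_0 in Hz) for blocks of T samples."""
    duration = T / sample_rate   # 1/omega_0: seconds per block
    omega_0 = sample_rate / T    # blocks per second = spacing between frequency bins
    return duration, omega_0

for T in (512, 2048):
    d, w0 = block_resolution(T)
    print(f"T = {T}: {d * 1000:.1f} ms per block, omega_0 = {w0:.1f} Hz, product = {d * w0:.0f}")
# T = 512: 11.6 ms per block, omega_0 = 86.1 Hz, product = 1
# T = 2048: 46.4 ms per block, omega_0 = 21.5 Hz, product = 1
```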

SLIDE 9

Narrowband

(good frequency resolution, poor temporal resolution; blocks of ~46 ms)

Wideband

(poor frequency resolution, good temporal resolution; blocks of ~12 ms)

SLIDE 10

Example: Wideband spectrograms of 10 vowel sounds

[Figure: the formants are marked in each spectrogram.]

SLIDE 11

Spectrogram time scales capture auditory events in the world (e.g. parts of speech, impacts, …) at relatively large time scales. e.g. a block period of 12 ms gives $\omega_0 = 86$ Hz, whose wavelength is $\lambda \approx 4$ meters.

These low frequencies play little role in spatial hearing (last lecture).
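The wavelength figure follows from $\lambda = c / \omega_0$. The speed of sound is not given on the slide; the sketch below assumes $c \approx 343$ m/s (air at room temperature).

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed value, not from the slide)

def wavelength(freq_hz, c=SPEED_OF_SOUND):
    """Wavelength in meters of a sound wave with frequency freq_hz."""
    return c / freq_hz

print(f"{wavelength(44_100 / 512):.2f} m")  # omega_0 for 512-sample (12 ms) blocks
# 3.98 m
```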

SLIDE 12

What are the impulse response functions of auditory filters? (durations, bandwidths and center frequencies)

SLIDE 13

Auditory filters

  • head related impulse response
  • basilar membrane
  • hair cells and ganglion cells in cochlea
  • brainstem e.g. MSO, LSO
  • cortex A1 (later today … larger time scales)

http://www.neurosci.info/courses/systems/Nobels/1961%20von%20Bekesy/bekesy-lecture.pdf

SLIDE 14

Auditory filters

Classical experiments used pure tones and/or noise (starting in the 1940s and continuing for 50 years):

  • recording from single cells (BM, nerve fibres in the cochlear nerve, brainstem)
  • psychophysics, e.g. masking

SLIDE 15

Example: Frequency tuning curves (thresholds) of different ganglion cells for pure tone stimuli

SLIDE 16

Psychophysical Masking

How does the presence of one frequency component affect our ability to hear other frequency components? Two similar frequencies mask each other more than two very different frequencies.

SLIDE 17

Example Masking Experiment

[Figure: Interval 1 and Interval 2 along a time axis, with tones at $\omega_{test}$ and $\omega_{mask}$.]

Task: Which interval contains the test tone?

SLIDE 18

For each test frequency $\omega_0$ with some given SPL:

  • for each masking frequency $\omega_M$, measure a masking threshold $I_M(\omega_M)$;
  • define the “critical bandwidth” $\Delta\omega$ for $\omega_0$.

[Figure: masking threshold $I_M$ plotted against masking frequency $\omega_M$, with $\omega_0$ and $\Delta\omega$ marked.]

SLIDE 19

Auditory filters: typical bandwidth model

[Figure: filter bandwidths $\Delta\omega$ along a center-frequency axis from 0 to 22,000 Hz.]

$\Delta\omega$ is ~100 Hz for center frequencies up to 1000 Hz. $\Delta\omega$ is ~1/3 octave for center frequencies from 1000 Hz up to 22,000 Hz.

SLIDE 20

Gammatone filter model

Similar to Gabor filters, but the window is asymmetric. (Also note that the impulse response is shifted in time to enforce causality.)

[Figure: gammatone impulse responses for center frequencies 400, 700, 1000, 3000, 5000 and 10000 Hz.]
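The slide gives no formula. A commonly used form, assumed here rather than taken from the lecture, is $g(t) = t^{n-1} e^{-2\pi b t} \cos(2\pi f t + \phi)$ for $t \ge 0$ and $g(t) = 0$ for $t < 0$, with order $n = 4$ and $b = 1.019\,\mathrm{ERB}(f)$, where $\mathrm{ERB}(f) = 24.7(4.37 f / 1000 + 1)$ Hz (Glasberg and Moore). The envelope $t^{n-1} e^{-2\pi b t}$ rises quickly and decays slowly, which is the asymmetric window; it peaks at $t = (n-1)/(2\pi b)$.

```python
import math

def erb(center_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula; assumed, not from the slides)."""
    return 24.7 * (4.37 * center_hz / 1000 + 1)

def gammatone(t, center_hz, order=4, phase=0.0):
    """Gammatone impulse response at time t (sec); zero for t < 0, so the filter is causal."""
    if t < 0:
        return 0.0
    b = 1.019 * erb(center_hz)
    envelope = t ** (order - 1) * math.exp(-2 * math.pi * b * t)  # asymmetric window
    return envelope * math.cos(2 * math.pi * center_hz * t + phase)

def envelope_peak_time(center_hz, order=4):
    """Time (sec) at which the envelope t**(n-1) * exp(-2*pi*b*t) is largest: (n-1) / (2*pi*b)."""
    return (order - 1) / (2 * math.pi * 1.019 * erb(center_hz))

# Sample a 10 ms impulse response at 44,100 samples/sec for a 1000 Hz filter.
ir = [gammatone(n / 44_100, 1000.0) for n in range(441)]
```

Because $b$ grows with center frequency, higher-frequency filters have shorter impulse responses, consistent with the wider bandwidths on the previous slide.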

SLIDE 21

Auditory filters

  • head related impulse response
  • basilar membrane
  • hair cells and ganglion cells in cochlea
  • brainstem e.g. MSO, LSO
  • cortex (A1 and beyond)

SLIDE 22

V1: recall Hubel and Wiesel (1962)

Such a stimulus works well if you already know the cell is orientation and motion selective.

SLIDE 23

Q: What should you do if you don’t know anything about the receptive field? A: Compute the “spike triggered average”.

SLIDE 24

Use random input (often white noise). What is the average spatio-temporal stimulus that preceded the spikes? This average is the ‘spike triggered average’.

[Figure: e.g. XT illustration of the stimulus patches that are averaged to give the ‘spike triggered average’.]
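A spike triggered average is just the mean of the stimulus windows that precede each spike. The sketch below uses a one-dimensional (time only) white-noise stimulus for brevity. The model neuron, its hidden `kernel` and its firing threshold are hypothetical, chosen only so that the recovered average can be compared with a known answer.

```python
import random

def spike_triggered_average(stimulus, spike_times, window):
    """sta[k] = mean stimulus value k steps before a spike (k = 0 is the spike's own bin).

    Spikes too early to have a full window of history are skipped.
    """
    sta = [0.0] * window
    count = 0
    for t in spike_times:
        if t < window - 1:
            continue
        for k in range(window):
            sta[k] += stimulus[t - k]
        count += 1
    return [s / count for s in sta] if count else sta

# Hypothetical model neuron (illustration only): it fires when the white-noise
# stimulus, filtered by a hidden temporal kernel, exceeds a threshold.
random.seed(0)
kernel = [0.0, 1.0, 0.5, -0.5]  # weight on the stimulus k steps in the past
stimulus = [random.gauss(0.0, 1.0) for _ in range(50_000)]
spikes = []
for t in range(len(kernel), len(stimulus)):
    drive = sum(w * stimulus[t - k] for k, w in enumerate(kernel))
    if drive > 1.5:
        spikes.append(t)

sta = spike_triggered_average(stimulus, spikes, window=len(kernel))
print([round(v, 2) for v in sta])  # roughly proportional to `kernel`
```

With Gaussian white noise and a threshold like this one, the average comes out proportional to the hidden kernel, which is why white noise is the usual choice of random input.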

SLIDE 25

[DeAngelis 1995]

Real data for a V1 receptive field (XYT): the spike triggered average stimulus, going backwards in time from a spike at t = 0.

[Figure legend: Negative, Positive]

SLIDE 26

Auditory Cortex Receptive Fields

Inputs to A1 have been spectrally bandpass filtered. There is almost no phase locking to the stimulus sound any more.

SLIDE 27

Example of responses of 8 auditory nerve fibres to a voice sound

Spectrogram of a voice saying “Joe took father’s green shoe bench out”, and spike histograms of auditory nerve fibres (cat) with different peak (“characteristic”) frequency sensitivities [Delgutte 1997].

SLIDE 28

What stimuli should we use? (Cats don’t understand human speech, so it is unlikely we would find cells tuned for it.) Recall that Hubel and Wiesel had first tried using center-surround stimuli for cells in V1. The analogy in audition would be to use the same bandpass stimuli used for auditory nerve fibres. Any other ideas?

SLIDE 29

[deCharms, 1998]

Random “chord” stimuli

[Figure: a random chord stimulus, with frequency $\omega$ on one axis.]

SLIDE 30

[Figure: $\omega$ and $t$ axes with a single positive (+) region.]

What spike triggered average should we expect from a bandpass cell?
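One way to explore the question is to simulate it. In this sketch the random chords switch each frequency channel on with probability `p_on` at every time step (the value 0.2 is an assumption), and the hypothetical "bandpass cell" tends to fire one step after its preferred channel is on. Its spectro-temporal spike triggered average then shows a single positive blob at that frequency and lag, with every other entry near the background level `p_on`.

```python
import random

def random_chords(n_freqs, n_steps, p_on=0.2, rng=random):
    """Random 'chord' stimulus: at each time step each frequency channel is on (1) with probability p_on."""
    return [[1 if rng.random() < p_on else 0 for _ in range(n_freqs)] for _ in range(n_steps)]

def spectrotemporal_sta(chords, spikes, n_lags):
    """sta[lag][f] = mean stimulus in frequency channel f, `lag` steps before each spike."""
    n_freqs = len(chords[0])
    sta = [[0.0] * n_freqs for _ in range(n_lags)]
    used = [t for t in spikes if t >= n_lags - 1]  # need a full history window
    for t in used:
        for lag in range(n_lags):
            for f in range(n_freqs):
                sta[lag][f] += chords[t - lag][f]
    return [[v / len(used) for v in row] for row in sta] if used else sta

# Hypothetical bandpass cell (illustration only): it fires with probability 0.8
# one time step after its preferred channel BEST_F is on.
rng = random.Random(1)
N_FREQS, N_STEPS, BEST_F = 12, 20_000, 5
chords = random_chords(N_FREQS, N_STEPS, rng=rng)
spikes = [t for t in range(1, N_STEPS) if chords[t - 1][BEST_F] and rng.random() < 0.8]

sta = spectrotemporal_sta(chords, spikes, n_lags=3)
# Single "+" region: sta[1][BEST_F] is 1.0; all other entries stay near p_on = 0.2.
```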

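SLIDE 31

[Figure: the bandpass spike triggered average, a single + region on $\omega$ and $t$ axes.]

Do we find more interesting cells such as… ?

[Figure: further candidate receptive fields plotted on $\omega$ and $t$ axes.]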

SLIDE 32

Examples: Spectro-temporal receptive fields of A1 neurons

[deCharms, 1998]

SLIDE 33

Orientation selective in ($\omega$, $t$)?

Verify the responses of the above cell to a tone and its harmonics, changing over time:

SLIDE 34

ASIDE: Two Applications

SLIDE 35

[Figure labels: microphone + speech/sound processor; electrode array (inserted into the cochlea).]

Cochlear implants are used for profoundly deaf people whose hair cells have been destroyed by disease but whose auditory nerve is intact.

SLIDE 36

MP3: Data Compression

  • Simultaneous masking: what I mentioned earlier.
  • Forward masking: a sound at time $t$ can mask sound at time $t + \Delta t$ and in nearby frequency bands, even if $\Delta t$ is greater than the duration of the auditory (gammatone) filter.

In both cases, you can use fewer bits to code the sound and listeners won’t notice.