Speech Processing 15-492/18-492 Speech Recognition Signal Processing

Nov 26, 2023

SLIDE 1

Speech Processing 15-492/18-492

Speech Recognition Signal Processing

SLIDE 2

Analog to Digital

Speech (sound) is analog

Computers are digital

We need to convert

Sample from A-D converter

N times a second

How many times a second?
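
One way to see why the sampling rate matters is to sample the same tone at two rates. The sketch below is illustrative only: the function names, the 3000 Hz and 1000 Hz tones, and the 16000 and 4000 samples-a-second rates are assumptions, not values from the slides. It relies on the standard result that the rate must exceed twice the highest frequency present; below that, a high tone produces the same samples as a lower one (aliasing).

```python
import math

def sample_tone(freq_hz, rate_hz, n_samples):
    """Idealized A-D conversion: read cos(2*pi*f*t) at t = i / rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * i / rate_hz) for i in range(n_samples)]

def max_difference(xs, ys):
    """Largest sample-by-sample gap between two equally long sequences."""
    return max(abs(x - y) for x, y in zip(xs, ys))

# At 16000 samples a second, 3000 Hz and 1000 Hz tones give clearly different samples.
diff_fast = max_difference(sample_tone(3000, 16000, 64),
                           sample_tone(1000, 16000, 64))

# At 4000 samples a second (less than 2 * 3000), the 3000 Hz tone yields the
# same samples as a 1000 Hz tone, so the two can no longer be told apart.
diff_slow = max_difference(sample_tone(3000, 4000, 64),
                           sample_tone(1000, 4000, 64))

print(diff_fast, diff_slow)
```

Here `diff_fast` is large while `diff_slow` is zero up to floating-point rounding, which is the practical reason N has to be chosen with the highest frequency of interest in mind.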

SLIDE 3

Goals of Signal Processing

Distinguish between phonetic types

Be invariant to channel/room conditions

Be invariant to speaker characteristics

Computational efficiency

SLIDE 4

Time vs Frequency Domain

Human ear distinguishes frequencies

Initial ASR used time domain features

Power

Zero crossings (sort of frequency)
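
The two time domain features named above can be computed directly from the samples of a frame. This is a minimal sketch under assumptions: the function names, the 16000 Hz rate, and the two synthetic test tones are hypothetical and chosen only to show that power tracks loudness while the zero crossing rate rises with frequency.

```python
import math

def short_term_power(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

RATE = 16000  # assumed sampling rate, samples a second

# Hypothetical test frames, 400 samples each: a loud low tone and a quiet high tone.
low = [math.sin(2 * math.pi * 200 * i / RATE + 0.3) for i in range(400)]
high = [0.1 * math.sin(2 * math.pi * 3000 * i / RATE + 0.3) for i in range(400)]

print(short_term_power(low), zero_crossing_rate(low))
print(short_term_power(high), zero_crossing_rate(high))
```

The low tone has the higher power and the high tone has the higher zero crossing rate, which is the sense in which zero crossings are "sort of" a frequency measure.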

SLIDE 5

Source Filter Model

[Figure: source-filter model. Labels: Pulse (voiced, set by pitch), Noise (unvoiced), Filter (vocal tract model)]
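
The source-filter idea can be sketched in a few lines: a source (pulse train for voiced sounds, noise for unvoiced) is passed through a filter standing in for the vocal tract. Everything specific here is an assumption for illustration: the function names, the 120 Hz pitch, and the single two-pole resonance at 500 Hz with 100 Hz bandwidth. A realistic vocal tract model would use several resonances.

```python
import math
import random

def excitation(n_samples, rate_hz, voiced, pitch_hz=120.0, seed=0):
    """Source: pulse train at the pitch period if voiced, white noise if not."""
    if voiced:
        period = int(round(rate_hz / pitch_hz))
        return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def resonator(source, rate_hz, center_hz, bandwidth_hz):
    """Filter: one two-pole resonance, a crude stand-in for a single formant."""
    r = math.exp(-math.pi * bandwidth_hz / rate_hz)  # pole radius, below 1
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / rate_hz)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in source:
        y = x + a1 * y1 + a2 * y2  # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        out.append(y)
        y2, y1 = y1, y
    return out

RATE = 16000  # assumed sampling rate, samples a second
voiced_out = resonator(excitation(1600, RATE, voiced=True), RATE, 500.0, 100.0)
unvoiced_out = resonator(excitation(1600, RATE, voiced=False), RATE, 500.0, 100.0)
```

Both outputs share the same filter; only the source differs, which is the separation the model is meant to capture.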

SLIDE 6

Time domain Signal

SLIDE 7

Waveform Representation

SLIDE 8

Speech Spectrogram

SLIDE 9

/iy/ vs /ae/

“beat” /b iy t/ and “bat” /b ae t/

SLIDE 10

Frequency Domain

“pencils” /p eh n s ih l z/

SLIDE 11

Frequency Domain

“beats pits” / b iy t s p ih t s /

SLIDE 12

Speech Analysis

SLIDE 13

Standard Parameterization

Split waveform into “frames”

Advance every 10ms

Size around 25ms (overlapping frames)

Window them

Perform FFT/Mel Cepstral analysis

Find Deltas (difference from previous)

Find Delta Deltas (difference in delta)
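
The framing, windowing, and delta steps above can be sketched as follows. This is not a full front end: the FFT/Mel cepstral step is replaced by a single log-energy value per frame so the example stays short, and the function names, the Hamming window, the 16000 Hz rate, and the synthetic test signal are all assumptions.

```python
import math

def split_into_frames(samples, rate_hz, size_ms=25, step_ms=10):
    """Overlapping frames: size_ms long, advancing step_ms each time."""
    size = rate_hz * size_ms // 1000
    step = rate_hz * step_ms // 1000
    return [samples[start:start + size]
            for start in range(0, len(samples) - size + 1, step)]

def hamming(frame):
    """Taper the frame edges with a Hamming window (one common choice)."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]

def log_energy(frame):
    """Stand-in per-frame feature; a real front end would use mel cepstra."""
    return math.log(sum(x * x for x in frame) + 1e-10)

def deltas(values):
    """Difference from the previous frame (the first delta is set to 0.0)."""
    return [0.0] + [b - a for a, b in zip(values, values[1:])]

RATE = 16000  # assumed sampling rate, samples a second

# Hypothetical one-second test signal: a 440 Hz tone that grows louder.
signal = [math.sin(2 * math.pi * 440 * i / RATE) * (i / RATE) for i in range(RATE)]

frames = [hamming(f) for f in split_into_frames(signal, RATE)]
feature = [log_energy(f) for f in frames]
delta = deltas(feature)        # difference from previous frame
delta_delta = deltas(delta)    # difference in delta
```

At 16000 samples a second, a 25 ms frame is 400 samples and a 10 ms advance is 160 samples, so consecutive frames share 240 samples.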

SLIDE 14

Summary

Time domain vs Frequency domain

Parameterization of speech

Frequency domain

Short term FFTs

FFT vs MEL Cepstrum

SLIDE 15