Speech Processing 15-492/18-492: Speech Recognition Signal Processing

SLIDE 1

Speech Processing 15-492/18-492

Speech Recognition Signal Processing

SLIDE 2

Analog to Digital

  • Speech (sound) is analog
  • Computers are digital
  • We need to convert
    • Sample from an A-D (analog-to-digital) converter N times a second
    • How many times a second?
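How many times a second? By the sampling theorem, the rate must be at least twice the highest frequency you want to keep (the Nyquist rate); telephone speech is commonly sampled at 8 kHz and wideband ASR audio at 16 kHz. The Python sketch below uses hypothetical helper names (`tone`, `sample`, `quantize`): a pure tone stands in for the analog input, it is sampled N times a second, quantized to 16-bit integers as an A-D converter would, and a 7 kHz tone is shown aliasing onto 1 kHz at an 8 kHz rate.

```python
import math

# Hypothetical helpers for illustration; a pure tone stands in for analog speech.
def tone(freq_hz):
    """Continuous-time cosine, as a function of t in seconds."""
    return lambda t: math.cos(2 * math.pi * freq_hz * t)

def sample(signal, rate_hz, duration_s):
    """Evaluate signal(t) rate_hz times a second for duration_s seconds."""
    n_samples = round(rate_hz * duration_s)
    return [signal(n / rate_hz) for n in range(n_samples)]

def quantize(samples, bits=16):
    """Map floats in [-1.0, 1.0] to signed integers, clipping out-of-range values."""
    max_int = 2 ** (bits - 1) - 1
    return [max(-max_int - 1, min(max_int, round(x * max_int))) for x in samples]

rate = 8000  # N = 8000 samples per second, so the Nyquist frequency is 4000 Hz
pcm = quantize(sample(tone(440), rate, 0.01))
print(len(pcm))  # 80 samples for 10 ms

# 7000 Hz is above the 4000 Hz Nyquist frequency, so at this rate its samples
# coincide with those of a 1000 Hz tone (aliasing).
high = sample(tone(7000), rate, 0.01)
low = sample(tone(1000), rate, 0.01)
print(max(abs(a - b) for a, b in zip(high, low)) < 1e-9)  # True
```

Once sampled, the 7 kHz component cannot be told apart from a 1 kHz one, which is why the input is low-pass filtered below half the sampling rate before conversion.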

SLIDE 3

Goals of Signal Processing

  • Distinguish between phonetic types
  • Be invariant to channel/room conditions
  • Be invariant to speaker characteristics
  • Computational efficiency

SLIDE 4

Time vs Frequency Domain

  • The human ear distinguishes frequencies
  • Early ASR systems used time-domain features
    • Power
    • Zero crossings (a rough proxy for frequency)
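A minimal Python sketch of the two time-domain features, with hypothetical function names: power as the mean squared amplitude of a frame, and the zero-crossing count. A pure tone of f Hz crosses zero about 2f times per second, which is why zero crossings only approximate frequency: a mixture of frequencies, as in speech, yields a single blended count.

```python
import math

# Hypothetical helper names for illustration.
def frame_power(frame):
    """Mean squared amplitude of a frame (the 'power' feature)."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossings(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def tone(freq_hz, rate_hz=8000, n=800, phase=0.3):
    """Sampled unit-amplitude sine; the phase offset keeps samples off exact zero."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz + phase) for i in range(n)]

low, high = tone(100), tone(1000)  # 0.1 s each at 8 kHz
print(zero_crossings(low), zero_crossings(high))  # about 2 * f * 0.1: roughly 20 and 200
print(round(frame_power(high), 3))  # about 0.5 for a unit-amplitude sine
```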

SLIDE 5

Source Filter Model

[Figure: source-filter model. A pitch-driven pulse source (voiced) or a noise source (unvoiced) excites a vocal tract model filter.]
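The source-filter model can be sketched in a few lines of Python. The excitation is either a pulse train at the pitch period (voiced) or random noise (unvoiced), and it drives a filter standing in for the vocal tract. As a simplifying assumption, the filter here is a single two-pole resonance (one formant-like peak), whereas a realistic vocal tract model has several; the function names, formant frequencies and bandwidths are illustrative choices, not values from the slides.

```python
import math
import random

# Hypothetical helper names; one resonance is a simplifying assumption.
def pulse_train(pitch_hz, rate_hz, n):
    """Voiced excitation: one unit impulse every pitch period."""
    period = round(rate_hz / pitch_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def white_noise(n, seed=0):
    """Unvoiced excitation: uniform random samples."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def resonator(x, freq_hz, bandwidth_hz, rate_hz):
    """Two-pole all-pole filter: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / rate_hz)  # pole radius sets bandwidth
    theta = 2 * math.pi * freq_hz / rate_hz          # pole angle sets resonance
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = sample + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

rate = 8000
voiced = resonator(pulse_train(100, rate, 800), 500, 100, rate)  # 100 Hz pitch
unvoiced = resonator(white_noise(800), 2500, 300, rate)
```

Changing `pitch_hz` moves the harmonics while leaving the resonance (the "filter") in place, which is the separation the model is built on.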

SLIDE 6

Time Domain Signal

SLIDE 7

Waveform Representation

SLIDE 8

Speech Spectrogram

SLIDE 9

/iy/ vs /ae/

  • “beat” /b iy t/ and “bat” /b ae t/

SLIDE 10

Frequency Domain

  • “pencils” /p eh n s ih l z/

SLIDE 11

Frequency Domain

  • “beats pits” /b iy t s p ih t s/

SLIDE 12

Speech Analysis

SLIDE 13

Standard Parameterization

  • Split waveform into “frames”
    • Advance every 10 ms
    • Size around 25 ms (overlapping frames)
  • Window each frame
  • Perform FFT/Mel cepstral analysis
  • Find deltas (difference from previous frame)
  • Find delta-deltas (difference in delta)
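The framing, windowing, FFT and delta steps can be sketched in plain Python. The helper names are hypothetical; the window is assumed to be a Hamming window (a common choice, since the slide only says to window the frames), and the spectrum uses a naive DFT for readability where a real front end would call an FFT routine. Mel cepstral analysis is left out here. Deltas follow the slide's definition (difference from the previous frame); many toolkits instead fit a regression over a few neighbouring frames.

```python
import cmath
import math

# Hypothetical helper names; the Hamming window is an assumed (common) choice.
def frames(signal, rate_hz, size_ms=25, step_ms=10):
    """Split into overlapping frames: size_ms long, advancing step_ms each time."""
    size = int(rate_hz * size_ms / 1000)
    step = int(rate_hz * step_ms / 1000)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def hamming(frame):
    """Taper the frame edges toward zero to reduce spectral leakage."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]

def power_spectrum(frame):
    """|DFT|^2 for bins 0..N/2 (naive O(N^2) DFT; an FFT gives the same values faster)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]

def deltas(features):
    """Difference from the previous frame; the first frame gets zeros."""
    out = [[0.0] * len(features[0])]
    for prev, cur in zip(features, features[1:]):
        out.append([c - p for c, p in zip(cur, prev)])
    return out

rate = 8000
signal = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(800)]  # 0.1 s of a 1 kHz tone
spectra = [power_spectrum(hamming(f)) for f in frames(signal, rate)]
print(len(spectra), len(spectra[0]))  # 8 101: 8 frames of 200 samples, 101 bins each
peak = max(range(len(spectra[0])), key=spectra[0].__getitem__)
print(peak * rate / 200)  # 1000.0: bins are 8000 / 200 = 40 Hz apart
d = deltas(spectra)   # deltas
dd = deltas(d)        # delta-deltas: difference in delta
```

Because the test tone is steady, every frame has the same spectrum and its deltas are (numerically) zero; on real speech the deltas capture how the spectrum moves from frame to frame.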

SLIDE 14

Summary

  • Time domain vs frequency domain
  • Parameterization of speech
    • Frequency domain
    • Short-term FFTs
    • FFT vs Mel cepstrum
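To make the FFT vs Mel cepstrum contrast concrete, here is a minimal Python sketch that turns one frame's power spectrum (FFT bins 0..N/2) into Mel cepstral coefficients: triangular filters spaced evenly on the Mel scale, the log of each filter's energy, then a DCT-II. It assumes the common formula mel(f) = 2595 log10(1 + f/700); the filter count (20), number of coefficients (13) and function names are illustrative, not taken from the slides.

```python
import math

# Hypothetical helper names; 20 filters and 13 coefficients are illustrative choices.
def hz_to_mel(f):
    """One common Mel-scale formula (assumed here)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, rate_hz):
    """Triangular filters evenly spaced in Mel from 0 Hz to rate_hz / 2.

    n_bins is the number of FFT bins covering 0..rate_hz/2 (N/2 + 1).
    """
    top = hz_to_mel(rate_hz / 2)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    # Filter edge/centre frequencies as fractional FFT bin positions.
    edges = [mel_to_hz(m) / (rate_hz / 2) * (n_bins - 1) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_bins):
            if lo <= k <= mid:
                row.append((k - lo) / (mid - lo))  # rising edge
            elif mid < k <= hi:
                row.append((hi - k) / (hi - mid))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mel_cepstrum(power, rate_hz, n_filters=20, n_ceps=13):
    """Power spectrum -> Mel filter energies -> log -> DCT-II."""
    bank = mel_filterbank(n_filters, len(power), rate_hz)
    # Floor the energies so log() never sees zero.
    log_e = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
             for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_ceps)]

spectrum = [1.0 + k for k in range(101)]  # toy spectrum: 101 bins, as from a 200-point FFT at 8 kHz
ceps = mel_cepstrum(spectrum, 8000)
louder = mel_cepstrum([10.0 * p for p in spectrum], 8000)
print(len(ceps))  # 13
print(max(abs(a - b) for a, b in zip(ceps[1:], louder[1:])) < 1e-9)  # True: only c0 changes
```

Because of the log, a flat gain applied to the whole spectrum (for example a louder recording level) adds the same constant to every log filter energy, and the DCT moves that constant entirely into coefficient 0; the higher coefficients are unchanged. The raw FFT spectrum has no such separation.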

SLIDE 15