Speech recognition frontend on Cell BE (Pavel Bazika)


SLIDE 1

IBM - CVUT Student Research Projects

Speech recognition frontend on Cell BE

Pavel Bazika (bazikp1@fel.cvut.cz)

SLIDE 2

Speech recognizer

speech → FRONTEND (preprocessing, feature extraction) → comparison with vocabulary → word probability

  • Input speech is represented by samples
  • The internal format is frames 25 ms long
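
The framing step can be sketched as follows. This is a minimal illustration, not the authors' code: the 25 ms frame length is from the slide and the 8 kHz sampling rate is the one quoted later in the deck, while the 10 ms frame shift and the name `split_into_frames` are assumptions made for the example.

```python
def split_into_frames(samples, sample_rate_hz=8000, frame_ms=25, shift_ms=10):
    """Cut a sequence of samples into fixed-length, overlapping frames.

    frame_ms=25 is taken from the slide; shift_ms=10 is an assumed frame shift.
    """
    frame_len = sample_rate_hz * frame_ms // 1000   # 200 samples at 8 kHz
    shift = sample_rate_hz * shift_ms // 1000       # 80 samples at 8 kHz
    frames = []
    start = 0
    # Only complete frames are kept; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += shift
    return frames


one_second = list(range(8000))            # dummy signal: 1 s at 8 kHz
frames = split_into_frames(one_second)
print(len(frames[0]), len(frames))        # 200 98
```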

SLIDE 3

Algorithms needed for speech recognition

  • Mean value subtraction
  • Preemphasis
  • Hamming window selection
  • FFT
  • Logarithm
  • Triangular filters
  • DCT

Together, these steps produce the cepstrum.
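
The chain of steps listed above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the SPU implementation: the preemphasis coefficient 0.97, the 256-point FFT, the 12 linearly spaced filters, and the 8 output coefficients are all example values, and every function name is hypothetical. The sketch applies the triangular filters before the logarithm, which is the usual order in cepstral frontends; the slide lists those two steps the other way round and does not say which order the authors used.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def triangular_filterbank(num_filters, num_bins):
    """Overlapping triangular filters, spaced linearly over the spectrum bins.

    Linear spacing is a simplification; real frontends often use the mel scale.
    """
    edges = [i * (num_bins - 1) / (num_filters + 1) for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(num_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))     # rising edge
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))     # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def cepstrum(frame, num_filters=12, num_coeffs=8, preemph=0.97, fft_size=256):
    """Compute cepstral coefficients for one frame (all defaults are assumptions)."""
    n = len(frame)
    # 1. Mean value subtraction
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    # 2. Preemphasis: y[i] = x[i] - a * x[i-1]
    x = [x[0]] + [x[i] - preemph * x[i - 1] for i in range(1, n)]
    # 3. Hamming window
    x = [x[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i in range(n)]
    # 4. FFT of the zero-padded frame; keep the power of non-negative frequencies
    spectrum = fft(x + [0.0] * (fft_size - n))
    power = [abs(c) ** 2 for c in spectrum[:fft_size // 2 + 1]]
    # 5. Triangular filters, then 6. logarithm (small offset avoids log(0))
    bank = triangular_filterbank(num_filters, len(power))
    log_e = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in bank]
    # 7. DCT-II of the log filter energies
    return [
        sum(e * math.cos(math.pi * k * (m + 0.5) / num_filters) for m, e in enumerate(log_e))
        for k in range(num_coeffs)
    ]


# Usage: one 25 ms frame (200 samples at 8 kHz) of a 1 kHz tone.
tone = [1000.0 * math.sin(2 * math.pi * 1000 * i / 8000) for i in range(200)]
coeffs = cepstrum(tone)
print(len(coeffs))
```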

SLIDE 4

Speed of our algorithm

  • Four frames are computed at once
  • Cepstrum calculation for a 25 ms frame at an input sampling frequency of 8 kHz takes 3.7 μs
  • One SPU can process 2700 speech streams in real time
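
The two figures on this slide can be related with a short calculation. The slide does not state the frame shift, nor whether 3.7 μs is the time per frame or per batch of four; the sketch below assumes 3.7 μs per frame and a 10 ms frame shift, which is a common choice and happens to land close to the quoted 2700. The function name `realtime_streams` is made up for the example.

```python
def realtime_streams(frame_time_us, frame_shift_ms):
    """Streams one core can sustain if every stream yields one new frame
    per frame_shift_ms milliseconds and each frame costs frame_time_us."""
    return frame_shift_ms * 1000.0 / frame_time_us


# 3.7 us per frame is from the slide; the 10 ms shift is an ASSUMPTION.
print(round(realtime_streams(3.7, 10.0)))   # 2703
# With non-overlapping 25 ms frames the same arithmetic gives a larger number.
print(round(realtime_streams(3.7, 25.0)))   # 6757
```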

SLIDE 5

Cepstrum calculation comparison with Pentium 4

[Figure: cepstrum calculation time (Time [ns]) plotted against frame size. Legend: SPU F4S, Pentium 4.]

SLIDE 6

Highlights

  • Algorithms optimized for the SPU; dual-issue used when possible
  • FFT for four streams of data implemented
  • Pentium 4 is slower in every algorithm
  • Faster FFT than FFTW with SSE2 enabled
  • Input samples are converted to the internal format in parallel with mean value computation
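
The last point can be illustrated with a single-pass loop that converts each sample and accumulates the sum for the mean at the same time, so the data is read only once. This is a scalar sketch of the idea only: it assumes the internal format is floating point, the name `convert_and_mean` is hypothetical, and on the SPU the overlap would presumably come from SIMD and dual-issue scheduling rather than from a Python loop.

```python
def convert_and_mean(raw_samples):
    """Convert integer samples to floats and compute their mean in one pass."""
    converted = []
    total = 0.0
    for s in raw_samples:
        v = float(s)          # conversion to the (assumed) float internal format
        converted.append(v)
        total += v            # mean accumulation shares the same loop
    mean = total / len(converted) if converted else 0.0
    return converted, mean


frame, mean = convert_and_mean([1, 2, 3, 6])
print(frame, mean)            # [1.0, 2.0, 3.0, 6.0] 3.0
```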