
First experiments in audio/video features for phoneme recognition

Petr Motlíček, FIT VUT Brno, motlicek@fit.vutbr.cz. M4 meeting in Prague, January 22nd - 23rd 2004.


  1. First experiments in audio/video features for phoneme recognition. Petr Motlíček, FIT VUT Brno, motlicek@fit.vutbr.cz. M4 meeting in Prague, January 22nd - 23rd 2004.

  2. Introduction
     • Data: M4 – IDIAP, 41 min. of audio-video data (training, testing).
     • Labels: 47 phoneme categories, obtained by forced alignment (models trained on ICSI data, adapted on M4 data).
     • Audio: beam-formed recordings, 16 kHz.
     • Video: cropped head regions.

  3. Audio preprocessing
     • Fs = 16 kHz, frame rate 100 Hz, 20 ms long frames of MFB log energies.
     Video preprocessing
     • Frame rate 25 Hz, RGB frames of 70x70 pixels.
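
The audio framing figures on slide 3 imply a hop of 160 samples and a frame length of 320 samples, so consecutive frames overlap by half. The sketch below only illustrates that arithmetic; the function name `frame_signal` and the plain-list signal are assumptions made for this example, and the MFB log-energy computation itself is not shown.

```python
def frame_signal(samples, sample_rate=16000, frame_rate=100, frame_ms=20):
    """Split a 1-D sample sequence into overlapping fixed-length frames."""
    hop = sample_rate // frame_rate            # samples between frame starts
    length = sample_rate * frame_ms // 1000    # samples per frame
    frames = []
    start = 0
    # Keep only frames that fit entirely inside the signal (no padding).
    while start + length <= len(samples):
        frames.append(samples[start:start + length])
        start += hop
    return frames, hop, length


# One second of silence at 16 kHz as a stand-in signal (hypothetical data).
one_second = [0.0] * 16000
frames, hop, length = frame_signal(one_second)
print(hop, length, len(frames))
```

Because incomplete trailing frames are dropped here, one second of audio yields slightly fewer than 100 frames; a real front end might pad the signal instead.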

  4. Bimodal speech recognition system
     [Block diagram]
     • Acoustic stream: audio signal (16 kHz) → acoustic parameterization → acoustic features (23 dim., 100 Hz).
     • Visual stream: visual signal (25 Hz) → visual parameterization → visual features (16 dim., 25 Hz) → interpolation.
     • Fusion: feature fusion → acoustic-visual features (39 dim., 100 Hz) → neural net → recognition results.
     • Visual parameterization blocks: gray scalization, edge calculation, 2D cross-correlation, maximum, square cropping, resize, 2D-DCT, LPF.
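
The interpolation and fusion steps on slide 4 can be sketched as follows: the 25 Hz visual features are upsampled by a factor of 4 to match the 100 Hz acoustic frame rate, and the two vectors are then concatenated per frame (23 + 16 = 39 dimensions). The slide does not say which interpolation was used, so linear interpolation is an assumption here, as are the function names `upsample_linear` and `fuse` and the synthetic feature values.

```python
def upsample_linear(frames, factor=4):
    """Linearly interpolate between consecutive feature vectors."""
    if not frames:
        return []
    out = []
    for a, b in zip(frames, frames[1:]):
        for k in range(factor):
            t = k / factor
            out.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    # Hold the last vector so the output has len(frames) * factor entries.
    out.extend([list(frames[-1]) for _ in range(factor)])
    return out


def fuse(acoustic, visual):
    """Concatenate time-aligned acoustic and visual vectors frame by frame."""
    n = min(len(acoustic), len(visual))
    return [acoustic[i] + visual[i] for i in range(n)]


# One second of synthetic features (hypothetical values, real dimensions).
visual_25hz = [[float(i)] * 16 for i in range(25)]
acoustic_100hz = [[0.0] * 23 for _ in range(100)]

visual_100hz = upsample_linear(visual_25hz, factor=4)
fused = fuse(acoustic_100hz, visual_100hz)
print(len(fused), len(fused[0]))
```

Concatenation before the classifier is the simplest form of fusion; it leaves it to the neural net to weigh the two streams.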

  5. Recognition results - Accuracy

                   Acoustic [%]   Visual [%]   Acoustic-Visual [%]
     Phonemes         31.05          12.15          31.33
     VAD              94.04          83.79          94.12
     - 0 (83%)        96.86          99.62          96.89
     - 1 (17%)        79.32           1.44          79.71
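
One way to read the VAD rows on slide 5 is as an overall accuracy followed by per-class accuracies, with the percentages in parentheses giving each class's share of the frames. Under that assumption (the slide does not spell it out), the overall figure should be roughly the share-weighted average of the two per-class figures. The sketch below checks this; the helper name `weighted_accuracy` is made up for the example.

```python
def weighted_accuracy(priors, accuracies):
    """Overall accuracy as the prior-weighted mean of per-class accuracies."""
    return sum(p * a for p, a in zip(priors, accuracies))


priors = [0.83, 0.17]  # shares of class 0 and class 1, as printed on the slide

# Per-class accuracies (class 0, class 1) copied from the slide.
per_class = {
    "acoustic": [96.86, 79.32],
    "visual": [99.62, 1.44],
    "acoustic-visual": [96.89, 79.71],
}

estimates = {name: weighted_accuracy(priors, acc) for name, acc in per_class.items()}
for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
```

With the rounded shares from the slide, the estimates land within about one point of the reported VAD accuracies. Read this way, the visual-only system almost never outputs class 1, so its VAD accuracy stays close to the 83% share of class 0.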

  6. Problems & Current focus
     • More data for acoustic-visual experiments.
     • Incorporation of a robust mouth detection algorithm.
     • Compensation algorithms to reduce the effects of lighting variations, rotation, etc.
     • LDA to reduce dimensionality and improve discrimination among the speech classes.
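
To illustrate the LDA item on slide 6, here is a minimal sketch of the two-class Fisher discriminant, which projects 2-D points onto the single direction w = Sw^{-1} (m1 - m0) that best separates the class means relative to the within-class scatter. This is a toy case with made-up data and hypothetical function names; applying LDA to 47 phoneme classes would use the multi-class form, which requires solving a generalized eigenvalue problem and is not shown.

```python
def mean(vectors):
    """Component-wise mean of a list of 2-D vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(2)]


def scatter(vectors, m):
    """2x2 scatter matrix: sum of (x - m)(x - m)^T over the class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - m[0], v[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class0, class1):
    """Return w = Sw^{-1} (m1 - m0) using the closed-form 2x2 inverse."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [
        (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
        (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det,
    ]


def project(w, v):
    """Reduce a 2-D point to one dimension along w."""
    return w[0] * v[0] + w[1] * v[1]


# Made-up 2-D feature vectors for two classes.
class0 = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]]
class1 = [[6.0, 5.0], [7.0, 8.0], [8.0, 7.0], [7.0, 6.0]]

w = fisher_direction(class0, class1)
print([round(project(w, v), 3) for v in class0])
print([round(project(w, v), 3) for v in class1])
```

On this toy data the projected values of the two classes do not overlap, so a single threshold on the 1-D projection separates them.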
