
ESAT/PSI-Speech Hugo Van hamme NSF workshop May 2015 4 July 2012 - PowerPoint PPT Presentation



  1. ESAT/PSI-Speech Hugo Van hamme NSF workshop May 2015 4 July 2012

  2. Leuven: KU Leuven introduction

  3. Leuven - university
     • Associatie K.U.Leuven
       – University + 12 university colleges
       – 85000 students, 600 programs
     • KU Leuven
       – 35000 students, 350 programs
     • Faculty of Engineering
       – 2000 students (Ba+Ma), 60 programs
     • Department of Electrical Engineering (ESAT)
       – 150 Ma students, 6 programs
       – 270 PhD students and postdocs …
       – 35 FTE permanent staff
     • Centre for Processing of Speech and Images (PSI)
       – 37 PhD students and postdocs
       – 8 FTE permanent staff
     • Speech research group
       – 12 PhD students and 1.4 postdocs
       – 5 Master's students
       – 2.3 FTE permanent staff: Patrick Wambacq, Dirk Van Compernolle, Hugo Van hamme

  4. ESAT/PSI-Speech research areas
     • noise robustness
       – speech enhancement
       – source separation
       – source localization
     • new paradigms for speech recognition
       – episodic models
     • building and consolidating digital infrastructure for the Dutch language
     • speaker properties (text-independent): ID, language, dialect, age, height
     • acoustic environment modeling
       – ADL (activities of daily living) recognition
     • zero-resource ASR: language acquisition by machines
     • speech assessment: education

  5. Speech assessment
     • Reading tutor (dyslexia) / trainer after CI (cochlear implant) fitting
     • Assess native (?) pronunciation, reading/respeak tracking
     • Children’s speech: hesitant, poorly articulated

  6. Zero-resource speech recognition: why?
     • Assistive technologies:
       – people with limited fine motor control
       – alternative to scanning
       – cope with dysarthric voices
     • Huge inter-speaker variation
     • Timing, extraneous sounds
     • Dialects
     • Long-term: interacting with robots
       – “Fetch a Hoegaarden Grand Cru from the fridge”
       – “Get my red slippers”
       – “Open the garden window for me”

  7. What’s different?
     • Learn the acoustic model and language model from examples with noisy, high-level supervision information
       – not like traditional ASR
       – not like the Zero Resource Speech Challenge (IS15, Interspeech 2015)
     • Our first steps:
       – home automation: “open the kitchen door”, “kitchen door open”
       – learn from demonstrations = weak supervision
     • Learn the acoustic model, vocabulary and grammar (ASR)
     • Learn the mapping to semantic frames (NLU)

  8. VIVOCA results

  9. Work ahead
     • Larger vocabularies
       – How does a word spurt come about?
     • Faster learning
       – ideally from one example
     • More complex instructions and semantic representations
       – continuous state space
       – dynamic representation of semantics
       – uncertainty in meaning
       – ...
       – related to many current research topics in robotics

  10. What’s needed?
     Speech assessment
     • Investment in non-native and regional accent data
     • Getting the government involved is hard (budget cuts etc.)
     Zero-resource ASR
     • Interaction data
       – growing complexity of the task
       – limited reuse from one task to the next
     • Community understanding of the relevance of the problem
       – cf. the reviewer instructions for the IS15 Zero Resource Speech Challenge
     • Investment attitude in Europe/Belgium
     • Industrial interest is growing internationally

  11. Questions?
