 
ESAT/PSI-Speech
Hugo Van hamme
NSF workshop, May 2015
4 July 2012
Leuven
KU Leuven introduction
Leuven - university
• Associatie K.U.Leuven
  – University + 12 university colleges
  – 85000 students, 600 programs
• KU Leuven
  – 35000 students, 350 programs
• Faculty of Engineering
  – 2000 students (Ba+Ma), 60 programs
• Department of Electrical Engineering (ESAT)
  – 150 Ma students, 6 programs
  – 270 PhD students and postdocs
  – …
  – 35 FTE permanent staff
• Centre for Processing of Speech and Images (PSI)
  – 37 PhD students and postdocs
  – 8 FTE permanent staff
• Speech research group
  – 12 PhD students and 1.4 postdocs
  – 5 Master students
  – 2.3 FTE permanent staff
    - Patrick Wambacq
    - Dirk Van Compernolle
    - Hugo Van hamme
ESAT/PSI-Speech research areas
• noise robustness
  – speech enhancement
  – source separation
  – source localization
• new paradigms for speech recognition
  – episodic models
• build and consolidate digital infrastructure for the Dutch language
• speaker properties (text-independent): ID, language, dialect, age, height
• acoustic environment modeling
  – ADL recognition
• zero-resource ASR - language acquisition by machines
• speech assessment - education
Speech assessment
• Reading tutor (dyslexia) / trainer after CI fitting
• Assess native (?) pronunciation, reading/respeak tracking
• Children’s speech, hesitant, poorly articulated
Zero-resource speech recognition
Why?
• Assistive technologies:
  – people with limited fine motor control
  – alternative to scanning
  – cope with dysarthric voices
• Huge inter-speaker variation
• Timing, extraneous sounds
• Dialects
• Long-term: interacting with robots
  – “Fetch a Hoegaarden Grand Cru from the fridge”
  – “Get my red slippers”
  – “Open the garden window for me”
What’s different?
• Learn acoustic model and language model from examples with noisy, high-level supervision information
  – Not like traditional ASR
  – Not like the zero-resource challenge (IS15)
• Our first steps:
  – Home automation
  – “open the kitchen door”, “kitchen door open”
  – Learn from demonstrations = weak supervision
• Learn acoustic model, vocabulary and grammar (ASR)
• Learn mapping to semantic frames (NLU)
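To make the idea of weak supervision with semantic frames concrete, here is a minimal sketch. It only illustrates the data involved and is not the group's implementation. The class names `SemanticFrame` and `Demonstration`, the slot names `action` and `target`, and the use of text strings as stand-ins for recorded audio are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticFrame:
    """Hypothetical frame for a home-automation command."""
    action: str  # assumed slot name
    target: str  # assumed slot name


@dataclass(frozen=True)
class Demonstration:
    """One training example: a whole utterance paired with the frame of the
    demonstrated action. There is no transcription and no word-level
    alignment, which is what makes the supervision weak."""
    utterance: str  # text stand-in for the recorded audio
    frame: SemanticFrame


frame = SemanticFrame(action="open", target="kitchen door")

# Two word orders from the slide, both demonstrated with the same action.
demos = [
    Demonstration("open the kitchen door", frame),
    Demonstration("kitchen door open", frame),
]


def frames_seen(demonstrations):
    """Collect the distinct frames observed across demonstrations."""
    return {d.frame for d in demonstrations}
```

Both utterances carry the same utterance-level label, so a learner has to discover for itself which stretch of speech corresponds to which slot value.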
VIVOCA results
Work ahead
• Larger vocabularies
  – How does a word spurt come about?
• Faster learning
  – Ideally from one example
• More complex instructions and semantic representations
  – Continuous state space
  – Dynamic representation of semantics
  – Uncertainty in meaning
  – ...
  – Related to many current research topics in robotics
What’s needed?
Speech assessment
• Investment in non-native and regional accent data
• Getting government involved is hard (budget cuts etc.)
Zero-resource ASR
• Interaction data
  – grow complexity of task
  – Limited reuse from one task to the next
• Understanding by the community of the relevance of the problem
  – Cf. reviewer instructions for IS15 Zero-resource Challenge
• Investment attitude in Europe/Belgium
• Industrial interest is growing internationally
Questions?