ESAT/PSI-Speech Hugo Van hamme NSF workshop May 2015 4 July 2012



SLIDE 1

ESAT/PSI-Speech Hugo Van hamme

NSF workshop May 2015

4 July 2012

SLIDE 2

Leuven

KU Leuven introduction 2

SLIDE 3

Leuven - university

  • Associatie K.U.Leuven

– University + 12 university colleges
– 85000 students, 600 programs

  • KU Leuven

– 35000 students, 350 programs

  • Faculty of Engineering

– 2000 Students (Ba+Ma), 60 programs

  • Department of Electrical Engineering (ESAT)
  • 150 Ma students, 6 programs
  • 270 PhD students and postdocs …
  • 35 FTE permanent staff

Centre for Processing of Speech and Images (PSI)

  • 37 PhD students and postdocs
  • 8 FTE permanent staff
  • Speech research group
  • 12 PhD students and 1.4 postdocs
  • 5 Master students
  • 2.3 FTE permanent staff
  • Patrick Wambacq
  • Dirk Van Compernolle
  • Hugo Van hamme


SLIDE 4
ESAT/PSI-Speech research areas

  • noise robustness – speech enhancement – source separation – source localization
  • new paradigms for speech recognition – episodic models
  • build and consolidate digital infrastructure for the Dutch language
  • speaker properties (text-independent): ID, language, dialect, age, height
  • acoustic environment modeling – ADL recognition
  • zero-resource ASR – language acquisition by machines
  • speech assessment – education

SLIDE 5

Speech assessment

  • Reading tutor (dyslexia) / trainer after CI fitting
  • Assess native (?) pronunciation, reading/respeak tracking
  • Children’s speech, hesitant, poorly articulated
SLIDE 6

Zero-resource speech recognition

Why?

  • Assistive technologies:

– people with limited fine motor control
– alternative to scanning
– cope with dysarthric voices

  • Huge inter-speaker variation
  • Timing, extraneous sounds
  • Dialects
  • Long-term: interacting with robots

– “Fetch a Hoegaarden Grand Cru from the fridge”
– “Get my red slippers”
– “Open the garden window for me”

SLIDE 7

What’s different?

  • Learn acoustic model and language model from examples with noisy, high-level supervision information

– Not like traditional ASR
– Not like the zero-resource challenge (IS15)

  • Our first steps:

– Home automation
– “open the kitchen door”, “kitchen door open”
– Learn from demonstrations = weak supervision

  • Learn acoustic model, vocabulary and grammar (ASR)
  • Learn mapping to semantic frames (NLU)
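
The idea of learning from demonstrations can be illustrated with a small data sketch. It shows only how two differently worded commands could share one utterance-level semantic frame, with no word-level alignment. It is not the learning method used in this work, and the slot names, slot values and the `SemanticFrame` and `frame_to_vector` helpers are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical slot inventory for a toy home-automation task (not from the slides).
SLOTS = {
    "action": ["open", "close"],
    "target": ["kitchen door", "garden window"],
}


@dataclass(frozen=True)
class SemanticFrame:
    """Utterance-level meaning: which action applies to which target."""
    action: str
    target: str


def frame_to_vector(frame: SemanticFrame) -> list[int]:
    """Encode a frame as a multi-hot vector with one entry per slot value."""
    vector = []
    for slot, values in SLOTS.items():
        chosen = getattr(frame, slot)
        if chosen not in values:
            raise ValueError(f"unknown {slot} value: {chosen!r}")
        vector.extend(1 if value == chosen else 0 for value in values)
    return vector


# Two word orders demonstrated with the same action share one frame.
# The supervision says what the utterance means, not which word means what.
demonstrations = [
    ("open the kitchen door", SemanticFrame("open", "kitchen door")),
    ("kitchen door open", SemanticFrame("open", "kitchen door")),
]
labels = [frame_to_vector(frame) for _, frame in demonstrations]
```

Both demonstrations end up with the same label vector, which is what makes the supervision weak: a learner would have to discover on its own which stretch of audio corresponds to "open" and which to "kitchen door".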
SLIDE 8

VIVOCA results

SLIDE 9

Work ahead

  • Larger vocabularies

– How does a word spurt come about?

  • Faster learning

– Ideally from one example

  • More complex instructions and semantic representations

– Continuous state space
– Dynamic representation of semantics
– Uncertainty in meaning
– ...
– Related to many current research topics in robotics

SLIDE 10

What’s needed?

Speech assessment

  • Investment in non-native and regional accent data
  • Getting government involved is hard (budget cuts etc.)

Zero-resource ASR

  • Interaction data

– grow complexity of task
– Limited reuse from one task to the next

  • Understanding by community of relevance of the problem

– Cf. reviewer instructions for IS15 Zero-resource Challenge

  • Investment attitude in Europe/Belgium
  • Industrial interest is growing internationally
SLIDE 11

Questions?