Description of yourself, your team/lab, your topic area, and who - - PowerPoint PPT Presentation



SLIDE 1

Description of yourself, your team/lab, your topic area, and who funds it

  • Director of the Center for Language and Speech Processing

– more than 15 faculty members in language, speech, machine translation, machine learning, cognitive sciences and neurosciences

– more than 40 graduate students

– usual funding sources

  • Collaborations with CoE HLT
  • Three-student team working directly with me

– acoustic processing for ASR

  • techniques based on temporal cues in the signal and on artificial neural net post-processing

  • biology-inspired auditory processing

– funded by IARPA, DARPA and Google

SLIDE 2

How does your area impact current speech technology (if at all) right now?

  • Temporal features (perception of modulations)

– longer (syllable and beyond) temporal context

– RASTA, LDA filters, TRAPS, MRASTA, modulation spectrum, …

  • Data-guided features

– LDA, convolutive DNNs,…

  • Parallel processing streams

– different frequency ranges, different spectro-temporal properties, different expertise (training), different degrees of prior constraints, …

  • Hierarchical processing (deep learning?)

– frequency-localized to full spectrum, short context to longer context, …
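To make the temporal-features bullet concrete, here is a minimal sketch of RASTA-style band-pass filtering applied to one feature trajectory (for example, the log energy of one frequency band over successive frames). The coefficients are the ones commonly cited in the RASTA literature and are an assumption here; the slides do not state them. The function name `rasta_filter` and its `pole` parameter are hypothetical.

```python
def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter one time trajectory of a (log) spectral feature.

    Assumed difference equation (not taken from the slides):
        y[n] = pole*y[n-1] + 0.2*x[n] + 0.1*x[n-1] - 0.1*x[n-3] - 0.2*x[n-4]
    """
    # FIR part: a smoothed temporal derivative; its taps sum to zero,
    # so a constant offset in the trajectory is suppressed.
    numer = [0.2, 0.1, 0.0, -0.1, -0.2]
    out = []
    for n in range(len(trajectory)):
        acc = 0.0
        for k, b in enumerate(numer):
            if n - k >= 0:
                acc += b * trajectory[n - k]
        if n > 0:
            # IIR part: a leaky integrator that sets the low cut-off.
            acc += pole * out[n - 1]
        out.append(acc)
    return out


# A constant offset (e.g. a fixed channel gain in the log domain)
# produces a short transient and then decays toward zero.
filtered = rasta_filter([1.0] * 500)
print(filtered[0], filtered[-1])
```

The point of the sketch is only that slow (near-constant) components of the trajectory are removed while changes on the time scale of speech sounds pass through.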

[Figure labels: perception of modulations; physiology of hearing]
SLIDE 3

Challenges

  • Human-like processing is not always appreciated (by hard-core engineers)

  • Communication between engineering and life sciences

– different goals, different vocabularies, different reward systems,…

  • Researchers trained in both the life sciences and engineering are rare

SLIDE 4

What Is The Problem?

  • ML (DNNs)

– train over all sources of unwanted variability

  • How to deal with previously unseen data?
  • Knowledge from life sciences?

– Emphasis on higher processing levels (beyond periphery)

  • Hierarchical processing in auditory system
  • Generalizations
  • Performance monitoring
  • Attention (what to ignore)
SLIDE 5

[Diagram: processing streams. Labels: bottom-up dominated (modalities, projections within modalities); top-down influenced (different strengths of prior constraints); signal; prior knowledge; “smart” fusion; information; learning]

1. How to create processing streams?
2. “Smart” fusion?

Dealing with Unknown Unknowns: Biologically-inspired multi-stream processing of sensory information

bottom-up streams

  • conflicts indicate localized corruptions
  • leave out affected streams
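The idea of leaving out streams that conflict with the others can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method behind the reported results: each stream is assumed to output a posterior distribution over the same classes for one frame, conflict is measured with a symmetric Kullback-Leibler divergence, and the names `sym_kl`, `fuse_streams` and the `threshold` value are hypothetical.

```python
import math


def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two discrete distributions."""
    return sum((a - b) * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))


def fuse_streams(posteriors, threshold=2.0):
    """Average the streams that agree with the rest; drop the conflicting ones.

    posteriors: list of per-stream class posteriors for one frame.
    threshold:  arbitrary illustrative cut-off on the mean divergence.
    Returns (fused posterior, indices of the streams that were kept).
    """
    n = len(posteriors)
    if n < 2:
        return list(posteriors[0]), [0]
    # Conflict score: mean divergence of each stream from all the others.
    conflict = [
        sum(sym_kl(posteriors[i], posteriors[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    kept = [i for i in range(n) if conflict[i] <= threshold]
    if not kept:  # every stream conflicts: fall back to using all of them
        kept = list(range(n))
    n_classes = len(posteriors[0])
    fused = [sum(posteriors[i][k] for i in kept) / len(kept) for k in range(n_classes)]
    return fused, kept


# Two streams agree on class 0; the third (say, a corrupted band) disagrees.
streams = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
fused, kept = fuse_streams(streams)
print(kept, fused)
```

In this toy example the third stream is left out and the fused posterior is the average of the two agreeing streams. A fixed threshold is the simplest possible choice; a real system would need a more principled way to decide which streams to trust.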

environment        conventional   proposed   best by hand
clean              31 %           28 %       25 %
car at 0 dB SNR    54 %           38 %       35 %

[Figure labels: “five”, “three”]

top-down and bottom-up streams

  • conflicts indicate unexpected inputs
  • opportunity for learning

[Figure: phoneme sequences along a time axis: /f/ /ay/ /v/ /z/ /iy/ /r/ /oh/ /sil/ /z/ /iy/ /r/ /oh/ “zero” /th/ /r/ /iy/; panels labeled “strong priors” and “weak priors”]

How do we know which combination of processing streams yields “correct” information?

  • the information must “make sense”

Typical sound occurrences, typical confusions, and typical temporal patterns of speech sounds

[Figure: Preserving information in a system. Labels: divergence; ~100K neurons; ~10M neurons; periphery ~1000 Hz; cortex ~10 Hz; processing]

SLIDE 6

Training / Test    Clean    10 dB SNR    5 dB SNR
Clean              3.10     15.65        36.60
10 dB SNR          5.06     4.35         14.70
5 dB SNR           9.04     4.73         7.73

Word error rates, Aurora 4
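As a reminder of what the numbers measure, word error rate is usually computed as the minimum number of word substitutions, deletions and insertions needed to turn the reference transcript into the recognizer output, divided by the number of reference words. The sketch below shows that standard definition; it is not the Aurora 4 scoring script, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate in percent, via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = minimum edits to turn the reference words seen so far into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return 100.0 * prev[len(hyp)] / len(ref)


# One of three reference words is missing from the hypothesis.
print(word_error_rate("five zero three", "five three"))
```

Because insertions count as errors, the rate can exceed 100 % when the hypothesis is much longer than the reference.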

[Diagram labels: DNN, decoder, TRAINING, signal]

                           clean    10 dB SNR    5 dB SNR
multi-condition training   4.28     5.17         11.86
multi-band                 3.06     3.12         10.29

[Diagram labels: signal, “clean” DNN, “10 dB” DNN, “5 dB” DNN]
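The comparison between multi-condition training and the multi-band system can be restated as relative error reductions. The short sketch below does the arithmetic on the figures quoted above, assuming (as the flattened layout suggests) that the three numbers in each row correspond to clean, 10 dB SNR and 5 dB SNR test signals. The variable and function names are illustrative only.

```python
# Word error rates quoted on the slide; the column assignment is an assumption.
multi_condition = {"clean": 4.28, "10 dB SNR": 5.17, "5 dB SNR": 11.86}
multi_band = {"clean": 3.06, "10 dB SNR": 3.12, "5 dB SNR": 10.29}


def relative_reduction(baseline, improved):
    """Relative error reduction of `improved` over `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


reductions = {
    cond: relative_reduction(multi_condition[cond], multi_band[cond])
    for cond in multi_condition
}
for cond, value in reductions.items():
    print(f"{cond}: {value:.1f} % relative reduction")
```

Under that reading, the multi-band system reduces the error by roughly 28.5 % (clean), 39.7 % (10 dB SNR) and 13.2 % (5 dB SNR) relative to multi-condition training.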

SLIDE 7

Where Are We Now?

[Diagram: signal processing, pattern classification, decoder, message]

Signal processing, information theory, machine learning, …

SLIDE 8

And Where Are We Heading?

Repetition, fillers, hesitations, interruptions, unfinished and non-grammatical sentences, new words, dialects, emotions, …

Current DARPA and IARPA programs, research agenda of the JHU CoE HLT, industrial efforts (Google, Microsoft, IBM, Amazon, …)

Engineering and Life Sciences together!

Signal processing, information theory, machine learning, … & neural information processing, psychophysics, physiology, cognitive science, phonetics and linguistics, …