
SLIDE 1

Voice Separation with tiny ML on the edge

Tiny ML Summit 2020

Main collaborators:

  • Dr. Lars Bramsløw (Eriksholm Research Centre, Denmark)
  • Prof. Tuomas Virtanen (University of Tampere, Finland)

Gaurav Naithani (University of Tampere, Finland)
Niels H. Pontoppidan, PhD
Research Area Manager, Augmented Hearing Science

SLIDE 2

Additional acknowledgements and references

  • Thomas “Tom” Barker
  • Giambattista Parascandolo
  • Joonas Nikunen
  • Rikke Rossing
  • Atefeh Hafez
  • Marianna Vatti
  • Umaer Hanif
  • Christian Grant
  • Christian Hansen
  • Bramsløw, L., Naithani, G., Hafez, A., Barker, T., Pontoppidan, N. H., & Virtanen, T. (2018). Improving competing voices segregation for hearing impaired listeners using a low-latency deep neural network algorithm. The Journal of the Acoustical Society of America, 144(1), 172–185.
  • Naithani, G., Barker, T., Parascandolo, G., Bramsløw, L., Pontoppidan, N. H., & Virtanen, T. (2017). Low latency sound source separation using convolutional recurrent neural networks. 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 71–75.
  • Naithani, G., Barker, T., Parascandolo, G., Bramsløw, L., Pontoppidan, N. H., & Virtanen, T. (2017). Evaluation of the benefit of neural network based speech separation algorithms with hearing impaired listeners. Proceedings of the 1st International Conference on Challenges in Hearing Assistive Technology, CHAT-17, Stockholm, Sweden.
  • Naithani, G., Parascandolo, G., Barker, T., Pontoppidan, N. H., & Virtanen, T. (2016). Low-latency sound source separation using deep neural networks. 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 272–276.
  • Pontoppidan, N. H., Vatti, M., Rossing, R., Barker, T., & Virtanen, T. (2016). Separating known competing voices for people with hearing loss. Proceedings of the Speech Processing in Realistic Environments Workshop, SPIRE Workshop.
  • Barker, T., Virtanen, T., & Pontoppidan, N. H. (2016). Hearing device comprising a low-latency sound source separation unit.
  • Barker, T., Virtanen, T., & Pontoppidan, N. H. (2014). Hearing device comprising a low-latency sound source separation unit (Patent No. US Patent App. 14/874,641).
  • Barker, T., Virtanen, T., & Pontoppidan, N. H. (2015). Low-latency sound-source separation using non-negative matrix factorisation with coupled analysis and synthesis dictionaries. Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2015, 241–245.

SLIDE 3

Facts and stats about hearing aids and market

Market

  • 15+ million units sold per year
  • Global wholesale market of USD 4+ billion per year
  • The six largest manufacturers hold a combined market share of 90%+
  • Main market: OECD countries
  • 4–6% yearly unit growth, mainly due to demographic development
  • Growing aging population and increasing life expectancy

Hearing-aid users

  • 10% of the population in OECD countries suffer from hearing loss
  • Only 20% of people suffering from a hearing loss use a hearing aid

  • 35-40% of the population aged 65+ suffer from a hearing loss
  • Average age of first-time user is 69 years (USA)
  • Average age of all users is 72 years (USA)
SLIDE 4

Hearing devices

  • Hearing devices help people communicate in simple and complex listening situations – also in sound environments where people with normal hearing give up using phones and headsets
  • Some rely on hearing devices for a few hours a day in specific situations, and many use them all waking hours
  • Power: 1 mA from zinc-air batteries replaced every week, or Li-ion batteries recharged every night
  • Hardware design employs many low-voltage and low clock-frequency methods
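The ~1 mA figure above implies week-scale battery life. A minimal sanity check, assuming (these are illustrative numbers, not from the slides) a size-312 zinc-air cell of about 180 mAh and 16 hours of daily wear:

```python
# Back-of-envelope battery budget for the ~1 mA draw quoted on the slide.
# Cell capacity (size-312 zinc-air, ~180 mAh) and 16 h/day of wear are
# assumptions for illustration only.
CURRENT_MA = 1.0
CAPACITY_MAH = 180.0
HOURS_PER_DAY = 16.0

days_per_cell = CAPACITY_MAH / (CURRENT_MA * HOURS_PER_DAY)
print(f"Estimated zinc-air cell life: {days_per_cell:.2f} days")
```

At roughly 11 days, this is consistent with the weekly replacement mentioned on the slide, leaving some headroom for peaks in processing load.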

SLIDE 5

Enhancing segregation by transforming “mono” to “stereo”

SLIDE 6

History

1953

  • Cocktail Party Problem coined by Colin Cherry
  • Cherry proposes mono-to-stereo to solve the problem

2000

  • Sam Roweis: One Microphone Source Separation at NIPS shows separation of known voices

2018

  • Bramsløw et al: First time algorithms improve segregation of known voices for people with hearing loss

2020’s

  • When will Tiny ML enable enhanced voice segregation in a hearing device?

SLIDE 7

Spatial augmentation

  • The algorithm separates voices into two (or more) channels
  • The hearing devices increase the spatial difference cues, i.e. reposition the sound sources further apart
  • In case of spatial audio-visual cue conflicts, visual cues are expected to override the auditory cues, just like with ventriloquists
  • Artificial stereo
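The repositioning step above can be sketched as a constant-power pan of the two separated channels. This is a toy illustration, not the deck's actual algorithm: the pan law and the `spread` parameter are assumptions, and a real hearing device would also manipulate interaural time and level cues per ear.

```python
import numpy as np

def pan(mono, p):
    """Constant-power pan: p=0 is hard left, p=1 is hard right."""
    theta = p * np.pi / 2.0
    return np.cos(theta) * mono, np.sin(theta) * mono   # (left, right)

def artificial_stereo(voice_a, voice_b, spread=0.8):
    """Place two separated voices symmetrically about the centre.

    spread=0 leaves both voices in the middle of the stereo image;
    spread=1 pans them hard left and hard right.
    """
    la, ra = pan(voice_a, 0.5 - spread / 2.0)
    lb, rb = pan(voice_b, 0.5 + spread / 2.0)
    return la + lb, ra + rb

# Toy usage: two sine "voices" pushed apart in the stereo image
t = np.linspace(0.0, 0.01, 160, endpoint=False)
left, right = artificial_stereo(np.sin(2 * np.pi * 200.0 * t),
                                np.sin(2 * np.pi * 300.0 * t))
```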
SLIDE 8

Flowchart for training

DNN training
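The slide shows only the flowchart. As a hedged sketch of one common formulation from the cited papers (mask estimation is an assumption here, not read off the slide), the training targets for the two known voices can be formed from their clean spectrograms:

```python
import numpy as np

def ideal_ratio_masks(mag_a, mag_b, eps=1e-8):
    """Per-voice time-frequency targets from the two clean magnitude
    spectrograms; the masks for the two voices sum to ~1 in every bin."""
    total = mag_a + mag_b + eps
    return mag_a / total, mag_b / total

# Toy "spectrograms": frequency bins x frames (stand-ins for real |STFT|s)
rng = np.random.default_rng(0)
mag_a = rng.random((257, 100))
mag_b = rng.random((257, 100))
mask_a, mask_b = ideal_ratio_masks(mag_a, mag_b)
# The DNN is then trained to map mixture frames to these mask targets.
```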

SLIDE 9

Flowchart for processing
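Again, only the flowchart appears on the slide. A minimal sketch of the per-frame path it implies, with a hypothetical `dnn` stand-in; processing one frame at a time keeps algorithmic latency at a single frame, i.e. 4 ms at the 250 Hz frame rate quoted later in the deck:

```python
import numpy as np

def process_frame(mix_spectrum, dnn):
    """One low-latency step: estimate a mask for the target voice from the
    frame's magnitudes, apply it, and keep the mixture phase unchanged."""
    mask = dnn(np.abs(mix_spectrum))   # values in [0, 1] per frequency bin
    return mask * mix_spectrum

# Stand-in "network": an all-pass mask, just to exercise the data path
identity_dnn = lambda mag: np.ones_like(mag)
frame = np.fft.rfft(np.hanning(64) * np.sin(np.arange(64)))
out = process_frame(frame, identity_dnn)
```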

SLIDE 10

Enhanced segregation for people with mild/moderate hearing loss

  • DNN processing
  • 4 MIO weights for FDNNs (not optimized)
  • 250 Hz audio frame processing rate
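Those two numbers fix the compute budget: a feed-forward network touches each weight in roughly one multiply-accumulate per frame. As simple arithmetic (not a measured figure):

```python
WEIGHTS = 4_000_000      # "4 MIO weights" from the slide
FRAME_RATE_HZ = 250      # audio frame processing rate from the slide

macs_per_second = WEIGHTS * FRAME_RATE_HZ
print(f"{macs_per_second / 1e9:.1f} GMAC/s")   # 1.0 GMAC/s
```

A budget on this order is why the later "Next steps" slide targets both smaller networks and fixed-point execution.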

Bramsløw et al, JASA, 2018

[Results figure: Unprocessed vs. Ideal conditions]

SLIDE 11

How listeners with normal hearing hear two competing voices

SLIDE 12

How listeners with impaired hearing hear the two voices [the example could be harder to segregate]

SLIDE 13

How it sounds when the two voices are separated out

SLIDE 14

Focusing only on the female voice

SLIDE 15

Focusing only on the male voice

SLIDE 16

Enhanced segregation for people with mild/moderate hearing loss

  • Processing requirements
  • 4 MIO weights for FDNNs (not optimized)
  • 250 Hz audio frame processing rate

Bramsløw et al, JASA, 2018

[Results figure: Unprocessed vs. Ideal conditions]

SLIDE 17

Next steps

Feature performance

  • Increasing robustness to additional noise and reverberation
  • Increasing robustness to personal voice changes
  • Break reliance on training on specific voices (transfer learning)
  • Further decrease network sizes from 4 MIO weights

Hardware performance

See Zuzana Jelčicová’s poster:

  • Benchmarking and improving NN execution on DSP vs. custom accelerator for hearing instruments
  • From float to fixed point
  • Parallel MACs
  • Two-step scaling
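The float-to-fixed-point step can be illustrated generically. The poster's own two-step scaling scheme is not reproduced here; this sketch shows only plain symmetric int8 quantization of a weight matrix:

```python
import numpy as np

def quantize_int8(weights):
    """Map floats to int8 with a single symmetric scale per tensor."""
    scale = np.max(np.abs(weights)) / 127.0   # largest weight -> +/-127
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=(8, 8)).astype(np.float32)
q, scale = quantize_int8(w)
max_err = np.max(np.abs(dequantize(q, scale) - w))   # bounded by scale / 2
```

Storing weights as int8 cuts memory traffic 4x versus float32 and lets the DSP or accelerator use parallel integer MACs, at the cost of the rounding error computed above.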
SLIDE 18

Zuzana Jelčicová: Benchmarking and improving NN execution on DSP vs. custom accelerator for hearing instruments

SLIDE 19

Voice Separation with tiny ML on the edge

Tiny ML Summit 2020

Main collaborators:

  • Dr. Lars Bramsløw (Eriksholm Research Centre, Denmark)
  • Prof. Tuomas Virtanen (University of Tampere, Finland)

Gaurav Naithani (University of Tampere, Finland)
Niels H. Pontoppidan, PhD
Research Area Manager, Augmented Hearing Science