
Developing Your Own Wake Word Engine Just Like Alexa and OK Google (PowerPoint presentation)



  1. Developing Your Own Wake Word Engine Just Like “Alexa” and “OK Google”; Xuchen Yao, CEO, KITT.AI; Guoguo Chen, CTO, KITT.AI

  2. What’s a “wake word”? Examples: “Alexa, what’s the weather today?”, “OK Google”, “Hey Siri”.
     • Wake word (hot word): offline; code runs on CPU/DSP/MCU; 7x24; always listening
     • One-shot understanding: online; code runs on the cloud; on demand; explicit permission

  3. Conversational UI Pipeline: voice → wake up device → speech-to-text → text understanding → dialogue management → text-to-speech
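The pipeline above, together with the wake word vs. one-shot comparison on the previous slide, can be sketched as a loop in which the local wake word stage gates every later stage. This is a minimal sketch: every stage function (`detect_wake_word`, `speech_to_text`, `understand`, `manage_dialogue`, `text_to_speech`) is a hypothetical stub standing in for a real engine or cloud service, and "audio" is faked as a list of strings so the control flow runs on its own.

```python
from typing import Iterable, List

# Hypothetical stubs: a real system would run a hotword engine on-device and
# call speech, language, and dialogue services for the later stages.
def detect_wake_word(chunk: str) -> bool:
    """Offline, always-listening stage: sees every chunk of audio."""
    return chunk.lower() == "alexa"

def speech_to_text(chunk: str) -> str:
    return chunk  # the fake "audio" chunk is already text

def understand(text: str) -> dict:
    return {"intent": "weather" if "weather" in text else "unknown"}

def manage_dialogue(frame: dict) -> str:
    return "It is sunny." if frame["intent"] == "weather" else "Sorry?"

def text_to_speech(text: str) -> str:
    return f"<audio:{text}>"

def run_pipeline(chunks: Iterable[str]) -> List[str]:
    """Send only the chunk that follows a wake word through the online stages."""
    responses: List[str] = []
    awake = False
    for chunk in chunks:
        if not awake:
            awake = detect_wake_word(chunk)
            continue
        # Online, on-demand stages: run once per request, then go back to sleep.
        text = speech_to_text(chunk)
        reply = manage_dialogue(understand(text))
        responses.append(text_to_speech(reply))
        awake = False
    return responses

print(run_pipeline(["music", "alexa", "what's the weather today", "chatter"]))
# ['<audio:It is sunny.>']
```

The split mirrors the comparison slide: the wake word stage has to be cheap enough to run 7x24 offline, while everything after it runs on demand, only after the user has explicitly addressed the device.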

  4. A customizable hotword detection engine, a.k.a. a deep neural network in 2MB of RAM (hotword.io; video; blog)

  5. Who’s using it (released 5/2016): 10,000+ developers and 7,000+ unique hotwords; dominating the developer community for hotword detection

  6. Use Cases

  7. #1 Hotword: Smart Mirror, https://github.com/evancohen/smart-mirror (credits to Evan Cohen)

  8. Command & Control: GoPiGo (credits to Paul Matz)

  9. Project RePL (credits to Chris Burns)

  10. Conversational UI Pipeline, with the speech pipeline (waking up the device and speech-to-text) highlighted: voice → wake up device → speech-to-text → text understanding → dialogue management → text-to-speech

  11. Speech Pipeline: Microphone Array → Voice Detection → Wake Word (local) → Speech Recognition (cloud/local)
     • Input speech: telephone (8KHz sampling) or others (16KHz); noises: TV, radio, street, café, car, music; pitch: children, adults, seniors; accent: US/UK/Europe/Asian …
     • Microphone array: close talking or far field (3-9 feet); 2, 4, or 6 microphones; linear/circular
     • Voice detection: Voice Activity Detection; Auto Gain Control; Adaptive Echo Cancellation; beam forming
     • Wake word (local): fast response (0.1 second); high accuracy
     • Speech recognition (cloud/local): IBM/Microsoft/Nuance/Google; Alexa Voice Service; Kaldi; PocketSphinx; HTK; command & control; language understanding
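The Voice Activity Detection step listed above can be illustrated with the simplest possible detector, a per-frame energy threshold. This is a swapped-in textbook technique, not necessarily what the presenters used; the 16 kHz rate comes from the slide, while the 10 ms frame length and the -40 dBFS threshold are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

SAMPLE_RATE = 16_000              # "Others (16KHz)" on the slide
FRAME_LEN = SAMPLE_RATE // 100    # assumed 10 ms frames -> 160 samples
THRESHOLD_DBFS = -40.0            # assumed energy threshold

def frame_energy_dbfs(frame: Sequence[float]) -> float:
    """RMS energy of a frame of samples in [-1, 1], in dB relative to full scale."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-10))  # floor avoids log10(0)

def energy_vad(samples: Sequence[float]) -> List[bool]:
    """Return one speech/non-speech decision per complete frame."""
    decisions = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        decisions.append(frame_energy_dbfs(frame) > THRESHOLD_DBFS)
    return decisions

# 10 ms of silence followed by 10 ms of a 440 Hz tone at half amplitude.
silence = [0.0] * FRAME_LEN
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(FRAME_LEN)]
print(energy_vad(silence + tone))  # [False, True]
```

A fixed energy threshold alone breaks down under the TV, radio, and café noise listed on the slide, which is why it would sit alongside gain control, echo cancellation, and beam forming rather than replace them.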

  12. Supported Platforms and Wrappers • Raspberry Pi • Mac OS X • iPhone/iPad/iPod • x86/64bit Ubuntu • Android • Pine 64 • Intel Edison • Samsung Artik • Allwinner R-series • Ingenic X1000 • Rockchip

  13. Personal vs. Universal models

                           Personal       Universal
      Voice samples needed 3              At least 1500
      Speaker-independent  No             Yes
      Speaker-specific     Sort of        No
      Robust against noise No             Yes
      Free                 Yes            No
      Time needed          Immediately    2 weeks

  14. Customizing a universal model: define hotword → collect voice (web API) → train a hotword model → evaluate → deliver & deploy to beta users → collect voice from device → iterate & improve until the desired performance is reached → ship & success. Desired performance: >90% detection rate and <= 3 false alarms in 24 hours.
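The success criteria on this slide (over 90% detection rate and at most 3 false alarms in 24 hours) can be checked with a small helper once an evaluation run has been scored. A minimal sketch: the `EvalRun` fields and function names are hypothetical, and the counts in the example are illustrative inputs, not results from the slides.

```python
from dataclasses import dataclass

@dataclass
class EvalRun:
    positives: int          # hotword utterances played to the device
    detected: int           # how many of them triggered a detection
    false_alarms: int       # detections fired on non-hotword audio
    negative_hours: float   # hours of non-hotword audio played

def detection_rate(run: EvalRun) -> float:
    return run.detected / run.positives

def false_alarms_per_24h(run: EvalRun) -> float:
    # Normalize to one day so runs of different lengths are comparable.
    return run.false_alarms * 24.0 / run.negative_hours

def meets_target(run: EvalRun, min_rate: float = 0.90, max_fa_per_day: float = 3.0) -> bool:
    """Slide target: >90% detection rate and <= 3 false alarms in 24 hours."""
    return detection_rate(run) > min_rate and false_alarms_per_24h(run) <= max_fa_per_day

# Illustrative numbers only.
run = EvalRun(positives=500, detected=465, false_alarms=5, negative_hours=48.0)
print(detection_rate(run), false_alarms_per_24h(run), meets_target(run))
# 0.93 2.5 True
```

Normalizing false alarms to a 24-hour rate matters because the negative test set is rarely exactly one day long.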

  15. Science behind wake word

  16. Challenges: Is this “Alexa”? • High detection rate • Low false alarm rate • Efficient: detect every 0.1 second • Small RAM: <2MB • Too much ambiguity, not much context (figure: a short window vs. a longer window)
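Running a detection every 0.1 second over a longer window of audio maps naturally onto a ring buffer that keeps only the most recent window and calls the detector once per hop. A sketch assuming 16 kHz mono audio and a 1-second analysis window (the window length is an assumption; the slide gives only the 0.1 s cadence); `score_fn` is a hypothetical stand-in for the neural network.

```python
from collections import deque
from typing import Callable, Iterable, List, Sequence

SAMPLE_RATE = 16_000
HOP = int(0.1 * SAMPLE_RATE)     # detect every 0.1 s -> 1600 samples
WINDOW = 1 * SAMPLE_RATE         # assumed 1 s analysis window

def stream_detect(
    samples: Iterable[float],
    score_fn: Callable[[Sequence[float]], float],
    threshold: float,
) -> List[int]:
    """Return the sample indices at which the detector fired."""
    buffer = deque(maxlen=WINDOW)  # the oldest samples drop off automatically
    fired = []
    for i, x in enumerate(samples):
        buffer.append(x)
        # Score once per hop, and only after a full window has arrived.
        if (i + 1) % HOP == 0 and len(buffer) == WINDOW:
            if score_fn(list(buffer)) >= threshold:
                fired.append(i)
    return fired

# Toy detector (hypothetical): mean absolute amplitude of the window.
audio = [0.0] * (2 * SAMPLE_RATE) + [0.9] * SAMPLE_RATE
hits = stream_detect(audio, lambda w: sum(abs(v) for v in w) / len(w), threshold=0.5)

# Bytes for one window of 16-bit samples, and the first firing index.
print(WINDOW * 2, hits[0])  # 32000 41599
```

The raw audio buffer itself is tiny (32,000 bytes against the <2MB budget); a production engine would also compute features incrementally instead of re-scoring the whole window at each hop.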

  17. Existing Algorithm

  18. Existing Algorithm

  19. Existing Algorithm • Advantage: – Simplified pipeline – Simplified decoder • Disadvantage: – Massive hotword-specific training data
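The slides do not spell out the existing algorithm. One published design with exactly this simplified pipeline is the DNN keyword spotter of Chen, Parada, and Heigold (ICASSP 2014): the network outputs a posterior per hotword word plus a filler label, the decoder smooths those posteriors over a short window, and a frame's confidence is the geometric mean of each word's peak smoothed posterior over a longer window. The sketch below implements that decoding step in plain Python under the assumption that frame posteriors are already available; it is an illustration, not confirmed to be the algorithm on these slides, and the window sizes and posteriors are arbitrary.

```python
from typing import List, Sequence

def smooth_posteriors(post: Sequence[Sequence[float]], w_smooth: int) -> List[List[float]]:
    """Average each label's posterior over the last w_smooth frames (fewer at the start)."""
    smoothed = []
    for j in range(len(post)):
        span = post[max(0, j - w_smooth + 1):j + 1]
        smoothed.append([sum(frame[i] for frame in span) / len(span)
                         for i in range(len(post[j]))])
    return smoothed

def confidence(smoothed: Sequence[Sequence[float]], j: int, w_max: int) -> float:
    """Geometric mean over hotword labels (index 0 is filler) of each label's
    peak smoothed posterior within the w_max frames ending at frame j."""
    window = smoothed[max(0, j - w_max + 1):j + 1]
    n_labels = len(smoothed[j])
    product = 1.0
    for i in range(1, n_labels):  # skip the filler label
        product *= max(frame[i] for frame in window)
    return product ** (1.0 / (n_labels - 1))

# Hypothetical posteriors for labels [filler, word 1, word 2] over five frames.
post = [
    [1.0, 0.0, 0.0],
    [0.2, 0.8, 0.0],
    [0.2, 0.8, 0.0],
    [0.1, 0.0, 0.9],
    [0.1, 0.0, 0.9],
]
smoothed = smooth_posteriors(post, w_smooth=2)
print(round(confidence(smoothed, j=4, w_max=5), 3))  # 0.849
```

The hotword fires when the confidence crosses a tuned threshold. The design also shows where the listed disadvantage comes from: the output labels are the hotword's own words, so each new hotword needs a network trained on many examples of that hotword.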

  20. Possible Ways to Improve • Data augmentation – Adding noise – Adding reverberation – And so on … (figure panels: original; with noise added; with noise and reverberation added)
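Both augmentations named on the slide fit in a few lines: scale a noise clip to hit a target signal-to-noise ratio and add it, then convolve the result with a room impulse response to simulate reverberation. Plain Python keeps the example self-contained; the 10 dB SNR, the sine-wave stand-in for speech, and the toy impulse response are assumptions, and a real recipe would draw noise and measured or simulated impulse responses from large corpora.

```python
import math
import random
from typing import List, Sequence

def power(x: Sequence[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)

def add_noise(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Mix noise into clean speech so that the mixture has the requested SNR in dB."""
    looped = [noise[i % len(noise)] for i in range(len(clean))]  # repeat noise to length
    # Pick gain g so that power(clean) / (g^2 * power(noise)) == 10 ** (snr_db / 10).
    gain = math.sqrt(power(clean) / (power(looped) * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, looped)]

def add_reverb(signal: Sequence[float], rir: Sequence[float]) -> List[float]:
    """Convolve with a room impulse response, keeping the original length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k in range(min(len(rir), n + 1)):
            acc += rir[k] * signal[n - k]
        out.append(acc)
    return out

random.seed(0)
clean = [math.sin(2 * math.pi * 440 * n / 16_000) for n in range(1600)]  # 0.1 s stand-in "speech"
noise = [random.gauss(0.0, 1.0) for _ in range(800)]
noisy = add_noise(clean, noise, snr_db=10.0)

# Toy impulse response (assumption): direct path plus echoes after 12.5 ms and 25 ms.
rir = [1.0] + [0.0] * 199 + [0.5] + [0.0] * 199 + [0.25]
augmented = add_reverb(noisy, rir)

residual = [a - b for a, b in zip(noisy, clean)]
print(10 * math.log10(power(clean) / power(residual)))  # approximately 10.0
```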

  21. Possible Ways to Improve • Network models – Model selection • Feedforward models? Recurrent models? – Model compression • 32-bit float → 16-bit float → 8-bit integer • Parameters with small absolute value
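The two compression ideas on the slide, lower-precision weights and dealing with parameters of small absolute value, can be sketched as follows. Half precision uses the standard library's `struct` format character `e` (IEEE 754 binary16); the 8-bit step is a simple symmetric per-tensor scheme and pruning is a plain magnitude threshold, both assumptions rather than the presenters' exact method. The last lines relate precision to the <2MB RAM budget from the Challenges slide, assuming 2 MB means 2 × 1024 × 1024 bytes and counting weights only.

```python
import struct
from typing import List, Sequence, Tuple

def to_float16(weights: Sequence[float]) -> List[float]:
    """Round-trip through IEEE 754 half precision (2 bytes per weight)."""
    packed = struct.pack(f"<{len(weights)}e", *weights)
    return list(struct.unpack(f"<{len(weights)}e", packed))

def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w is approximated by q * scale, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q: Sequence[int], scale: float) -> List[float]:
    return [v * scale for v in q]

def prune_small(weights: Sequence[float], threshold: float) -> List[float]:
    """Zero out parameters whose absolute value is below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]

weights = [0.51, -0.25, 0.003, -0.0007, 0.127]  # hypothetical layer weights
print(to_float16(weights))
q, scale = quantize_int8(weights)
print(q, dequantize_int8(q, scale))
print(prune_small(weights, 0.01))

# How many weights fit in a 2 MB budget at each precision?
budget = 2 * 1024 * 1024
for name, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(name, budget // nbytes)
```

Each halving of precision doubles the number of weights that fit in the same RAM, and pruned (zero) weights can further shrink storage if the model is kept in a sparse format.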

  22. Possible Ways to Improve • Decoder redesigning – Modeling smaller units • Syllables, phones, etc. – False alarm suppression • Additional classifier?
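Modeling smaller units means the network predicts sub-word units such as phones, and the decoder checks whether they appear in the hotword's order. One standard way to do that, shown here as an assumption rather than the presenters' decoder, is a left-to-right Viterbi pass: each unit of the hotword occupies one or more consecutive frames, and the score is the best average log-posterior over all such alignments. The unit names, the hotword's spelling in those units, and the posteriors are hypothetical.

```python
import math
from typing import Dict, Sequence

def keyword_score(log_post: Sequence[Dict[str, float]], units: Sequence[str]) -> float:
    """Best average log-posterior over left-to-right alignments of `units` to the frames.

    Every unit covers one or more consecutive frames, no unit may be skipped, and
    the window is assumed to contain only the candidate hotword (no filler).
    """
    n_frames, n_units = len(log_post), len(units)
    if n_units == 0 or n_frames < n_units:
        return -math.inf
    # best[u]: best total log-posterior so far with the current frame in unit u.
    best = [-math.inf] * n_units
    best[0] = log_post[0][units[0]]
    for t in range(1, n_frames):
        new = [-math.inf] * n_units
        for u in range(n_units):
            stay = best[u]                                  # unit u continues
            advance = best[u - 1] if u > 0 else -math.inf   # move on from unit u-1
            prev = max(stay, advance)
            if prev > -math.inf:
                new[u] = prev + log_post[t][units[u]]
        best = new
    return best[-1] / n_frames

# Hypothetical per-frame posteriors over three phone-like units.
frames = [
    {"k": 0.8, "ih": 0.1, "t": 0.1},
    {"k": 0.6, "ih": 0.3, "t": 0.1},
    {"k": 0.1, "ih": 0.8, "t": 0.1},
    {"k": 0.1, "ih": 0.2, "t": 0.7},
]
log_post = [{u: math.log(p) for u, p in f.items()} for f in frames]
print(keyword_score(log_post, ["k", "ih", "t"]))  # right order: higher score
print(keyword_score(log_post, ["t", "ih", "k"]))  # reversed order: lower score
```

Because hotwords are spelled in shared units, a new hotword needs only a new unit sequence for the decoder, which is one way such designs reduce the need for hotword-specific data; the slides do not say whether KITT.AI took this route. Windows that pass a score threshold could then be handed to an additional classifier for false alarm suppression, as the slide suggests.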

  23. Training with Tesla K20/K80 • Positive data – 1,500 hotword samples • Negative data – Thousands of hours of speech • Training time – Half a day with 4 K80 GPUs

  24. Software Architecture (figure: backend and frontend)

  25. KITT.AI architecture (figure): a scientific computing side (content, data, training, model deployment on a deep learning cloud) and a production cloud (ELB, message queue) serving devices over WebSocket (audio, messages) and HTTPS traffic

  26. Running Your First Snowboy Demo
