PocketSphinx: Open-Source Speech Recognition for Hand-held and Embedded Devices

David Huggins-Daines (dhuggins@cs.cmu.edu), Mohit Kumar (mohitkum@cs.cmu.edu), Arthur Chan (archan@cs.cmu.edu), Alan W Black (awb@cs.cmu.edu), Mosur Ravishankar
What is PocketSphinx?
- Based on Sphinx-II
– Open-source code under an MIT-style license
– Widely used at CMU and elsewhere
– Mature and stable API
- Design goals
– Statistical language model support (finite-state grammars also available)
– Medium-to-large vocabulary (1-10k words)
– Make it go faster
Why do we need it?
- Typical desktop/workstation of 2006
– 128-bit memory bus (6-10GB/sec)
– 1.8-3GHz processor (5000 MIPS)
– ATA, SATA, or SCSI storage (100-300MB/sec)
- Typical PDA/SoC/smartphone of 2006
– 16- or 32-bit memory bus (100-400MB/sec)
– 200-600MHz processor (200-700 MIPS)
– SD/MMC or CF storage (1-16MB/sec)
– No FPU or vector unit (sometimes a DSP...)
ASR bottlenecks
- Wait, you say:
– My cell phone is pretty darn fast!
– At least as fast as that DEC we had a real-time 20k-word system on back in 1996!
- However: ASR is system-bandwidth limited
– Sphinx benchmarks favor large caches and high memory bandwidth (Intel)
– Search, LM, and dictionary look-up are highly memory-intensive
– We will have to deal with these bottlenecks
[Chart: Sphinx benchmark results across processors; source: techreport.com]
Scaling: Hand-held vs Desktop
[Chart: decoding speed (xRT, 0.25-2.25) vs. vocabulary size (10, 1000, 5000 words), hand-held vs. desktop]
How to make it go faster
- Low-hanging fruit
– Front-end optimizations (fixed-point, logarithms)
– Speeding up GMM computation
– Old-fashioned beam tuning
- Non-speech-related work
– Memory optimization (+ model compression)
– Machine-level optimization (assembly code)
- What's left?
– Search optimization: dynamic beam tuning
– Language model compression and optimization
Front-End Optimizations
- Fixed-point calculations
– 32-bit, in 16.16 or 18.14 format
– Using a 64-bit multiply (SMULL) on ARM, or a 16.16 multiply-accumulate on a DSP
– MFCCs calculated in the log domain, using a log2 lookup table with conversion to base-1.0001 logarithms (see the sketch after this list)
- Audio downsampling
– Allows a smaller-order FFT and MFCC computation
– Not as useful for large-vocabulary systems
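As a rough illustration of the fixed-point arithmetic described above, here is a minimal C sketch of a Q16.16 multiply using a 64-bit intermediate product (a single SMULL instruction on ARM) and of converting a base-2 log into the base-1.0001 logs used for probabilities. The names and constants are illustrative, not PocketSphinx's actual API.

    /* Sketch of Q16.16 fixed-point arithmetic and log-base conversion.
     * Illustrative only; not PocketSphinx's real API. */
    #include <stdint.h>

    typedef int32_t fixed16; /* Q16.16: 16 integer bits, 16 fraction bits */

    #define FLOAT2FIX(x) ((fixed16)((x) * 65536.0))
    #define FIX2FLOAT(x) ((double)(x) / 65536.0)

    /* Multiply two Q16.16 numbers; the 64-bit intermediate keeps the
     * full product before renormalizing (one SMULL on ARM). */
    static fixed16 fixmul(fixed16 a, fixed16 b)
    {
        return (fixed16)(((int64_t)a * (int64_t)b) >> 16);
    }

    /* ln(2)/ln(1.0001) ~= 6931.8: one bit of log2 equals ~6932 steps in
     * base 1.0001. Such a small base lets log-probabilities be stored
     * as plain integers with negligible quantization error. */
    #define LOG2_TO_LOGBASE FLOAT2FIX(6931.8)

    /* Convert a Q16.16 base-2 log to an integer base-1.0001 log; the
     * 64-bit product carries 32 fraction bits (16 from each operand),
     * so shifting by 32 yields a plain integer. */
    static int32_t log2_to_logbase(fixed16 log2val)
    {
        return (int32_t)(((int64_t)log2val * LOG2_TO_LOGBASE) >> 32);
    }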
GMM Optimizations
- Top-N based Gaussian selection (Mosur 96)
– Use the previous frame's top codewords to select the current frame's; the standard Sphinx-II technique
- Partial frame-based downsampling (Woszczyna 98)
– Only update the top-N list every Mth frame (see the sketch after this list)
– Can significantly affect accuracy
- kd-tree based Gaussian selection (Fritsch 96)
– Approximate nearest-neighbor search in k dimensions using stable partition trees
– 10% speedup, little or no effect on accuracy
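Below is a minimal, self-contained C sketch of top-N Gaussian selection combined with partial frame-based downsampling. The data structures, the toy distance-based "score", and all names are illustrative; PocketSphinx's real GMM code differs.

    /* Sketch: top-N Gaussian selection + partial frame downsampling. */
    #include <stdint.h>
    #include <string.h>

    #define N_DENSITY   256 /* codebook size */
    #define N_DIM        13 /* feature dimension */
    #define TOP_N         4 /* best codewords kept per frame */
    #define DOWNSAMPLE_M  2 /* full re-ranking only every Mth frame */

    typedef struct {
        int32_t mean[N_DENSITY][N_DIM]; /* fixed-point mean vectors */
        int32_t top[TOP_N];             /* indices of current best codewords */
        int32_t frame;                  /* frame counter */
    } gmm_t;

    /* Toy density "score": negative squared distance to the mean.
     * A real system uses full diagonal Gaussians in the log domain. */
    static int32_t density_score(const gmm_t *g, int d, const int32_t *feat)
    {
        int64_t acc = 0;
        int i;
        for (i = 0; i < N_DIM; i++) {
            int64_t diff = (int64_t)feat[i] - g->mean[d][i];
            acc -= diff * diff;
        }
        return (int32_t)(acc / 1024); /* crude scaling to fit 32 bits */
    }

    /* Every Mth frame, evaluate all densities and refresh the top-N
     * list; on the frames in between, rescore only the densities that
     * were best in the last full pass. */
    void gmm_score_frame(gmm_t *g, const int32_t *feat, int32_t *out_scores)
    {
        int d, n;
        if (g->frame++ % DOWNSAMPLE_M == 0) {
            for (n = 0; n < TOP_N; n++)
                g->top[n] = -1;
            for (d = 0; d < N_DENSITY; d++) {
                out_scores[d] = density_score(g, d, feat);
                /* Insert d into the descending-sorted top-N list. */
                for (n = 0; n < TOP_N; n++) {
                    if (g->top[n] < 0 || out_scores[d] > out_scores[g->top[n]]) {
                        memmove(&g->top[n + 1], &g->top[n],
                                (TOP_N - 1 - n) * sizeof(int32_t));
                        g->top[n] = d;
                        break;
                    }
                }
            }
        } else {
            for (n = 0; n < TOP_N; n++)
                out_scores[g->top[n]] = density_score(g, g->top[n], feat);
        }
    }

Since gmm_t starts zeroed, the first frame always takes the full-evaluation path and populates the top-N list before any partial frame uses it.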
Search Optimizations
- Absolute pruning
– Approximations in the front end and GMM increase the effective beam width, paradoxically decreasing performance
– We would like to enforce a hard limit on the number of states or word exits evaluated per frame. How?
- Histogram pruning (Ney 1996)
– Partition the beam width into bins
– Dynamically recompute the beam based on bin occupancy counts (see the sketch after this list)
– 30% speedup with 10% relative degradation in WER
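Here is a minimal C sketch of the histogram pruning idea, assuming integer log-probability scores where larger is better; the bin count, names, and rounding are illustrative, not taken from PocketSphinx.

    /* Sketch of histogram pruning (after Ney 1996): bucket active
     * state scores into bins below the frame's best score, then pick
     * the tightest beam that keeps at most max_active states. */
    #include <stdint.h>
    #include <string.h>

    #define N_BINS 256

    /* score[]: integer log-probabilities (larger is better) of the
     * n_active states; best: max of score[]; beam: nominal beam width.
     * Returns a threshold; states scoring below it are pruned. */
    int32_t histogram_beam(const int32_t *score, int n_active,
                           int32_t best, int32_t beam, int max_active)
    {
        int bins[N_BINS];
        int i, b, count;

        memset(bins, 0, sizeof(bins));

        /* Map the range [best - beam, best] onto N_BINS buckets,
         * with the best scores falling into bin 0. */
        for (i = 0; i < n_active; i++) {
            int64_t d = (int64_t)best - score[i];
            b = (int)(d * N_BINS / ((int64_t)beam + 1));
            if (b >= N_BINS)
                b = N_BINS - 1; /* outside the beam; will be pruned */
            bins[b]++;
        }

        /* Admit bins from the best outward until max_active is hit. */
        count = 0;
        for (b = 0; b < N_BINS; b++) {
            count += bins[b];
            if (count > max_active)
                break;
        }
        /* Threshold at the lower edge of the last admitted bin; if
         * all bins fit, this degenerates to the nominal beam. */
        return best - (int32_t)((int64_t)beam * b / N_BINS);
    }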
Memory Optimizations
- Read-only model files
– mmap(2)-able, shareable between processes
– Leverage OS-level caching (virtual memory); see the sketch after this list
- Precompiled (binary) LM
– Inherited from Sphinx-II
– Adapted for memory-mapping
– 5000+ word vocabulary in <32MB of RAM
- Read-only binary model definition file
– Pre-built radix tree mapping triphones to senones
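The read-only model trick reduces, in essence, to mmap(2) with PROT_READ and MAP_SHARED. Here is a minimal POSIX C sketch; map_model is a hypothetical helper, not a PocketSphinx function, and error handling is abbreviated.

    /* Sketch: memory-mapping a read-only model file. */
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    const void *map_model(const char *path, size_t *out_len)
    {
        struct stat st;
        void *base;
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return NULL;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return NULL;
        }

        /* PROT_READ + MAP_SHARED: pages are read-only and backed
         * directly by the file, so all processes mapping the same
         * model share one copy, and the OS pages it in on demand. */
        base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        close(fd); /* the mapping holds its own reference to the file */
        if (base == MAP_FAILED)
            return NULL;

        *out_len = (size_t)st.st_size;
        return base;
    }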
Performance
Task          Vocabulary   Perplexity   xReal-Time   Word Error
TIDIGITS              10        13.86         0.5         0.87%
RM1                  994        46.79         0.71       13.11%
WSJ devel5k         4989        143.5         0.96       18.50%
- Test platform: iPaq 3670
– 206MHz StrongARM running Linux (FPU emulation in the kernel)
- Also running on:
– Other embedded Linux platforms
– Analog Devices Blackfin (uClinux)
– WinCE using the GNU toolchain (untested)
How to get it
- Web site: http://www.speech.cs.cmu.edu/pocketsphinx/
- Compiles with GCC for i386, ARM, PowerPC, and Blackfin
- Cross-compiles for Windows CE using an arm-wince-pe toolchain (available in various Linux distributions)
- Compatible with the Sphinx2 fbs.h interface (see the usage sketch below)
- Good (fast) acoustic models forthcoming
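For reference, a minimal sketch of decoding a raw audio file through the Sphinx2-style fbs.h interface mentioned above. The calls shown (fbs_init, uttproc_begin_utt, uttproc_rawdata, uttproc_end_utt, uttproc_result, fbs_end) are the classic Sphinx-II entry points, but exact signatures should be checked against the shipped fbs.h; the file name and sample format are placeholders.

    /* Minimal decoding loop against the Sphinx2-compatible API.
     * Model/configuration arguments are passed on the command line,
     * as with the original sphinx2 decoder. */
    #include <stdio.h>
    #include "fbs.h"

    int main(int argc, char *argv[])
    {
        int16 buf[2048];
        int32 nframes;
        char *hyp;
        FILE *fh;

        fbs_init(argc, argv);

        fh = fopen("utterance.raw", "rb"); /* 16-bit mono PCM (placeholder) */
        if (fh == NULL)
            return 1;

        uttproc_begin_utt(NULL);
        for (;;) {
            size_t n = fread(buf, sizeof(int16), 2048, fh);
            if (n == 0)
                break;
            uttproc_rawdata(buf, (int32)n, 1 /* block until consumed */);
        }
        uttproc_end_utt();

        if (uttproc_result(&nframes, &hyp, 1 /* block */) >= 0)
            printf("Recognized: %s\n", hyp);

        fclose(fh);
        fbs_end();
        return 0;
    }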
Future work
- Improve accuracy
– Remove Sphinx-II codebook limitations
- Optimize the language model and dictionary
– Statistical profiling of LM access patterns
- Investigate dynamic search strategies
- Remove various pieces of legacy code
- Fast speaker and channel adaptation
Thank you
- Any questions?
This work was supported by DARPA grant NB CH-D-03-0010. The content of the information in this publication does not necessarily reflect the position or the policy of the US Government, and no official endorsement should be inferred.