Improved Soft Decisions in Missing Data ASR: Using Harmonicity in Conjunction with Local SNR Estimates

Speech and Hearing Research Group, Dept. Computer Science, University of Sheffield, UK
January 24, 2001

Improved Soft Decisions in Missing Data ASR: Combining Masks

- Soft Decisions in Missing Data
- Harmonicity-based Fuzzy Masks
- Merging Local SNR and Harmonicity Masks
- Aurora 2000 Results
- Conclusions

September 25, 2000

Soft Decisions in Missing Data

[Figure: an SNR estimate (frequency vs. time) converted to a discrete 0/1 mask by a threshold, and to a fuzzy mask with values between 0 and 1.]

Soft mask values are interpreted as "the probability that the data is reliable". So rather than using the present-data likelihood OR the missing-data ‘induction constraint’, every point uses a weighted sum of BOTH terms.
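A minimal sketch of the two mask types, assuming the fuzzy mask is obtained by passing the local SNR estimate through a sigmoid (other slides refer to sigmoid parameters). The function names, the slope and the threshold values are illustrative and do not come from the slides.

```python
import math

def sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def discrete_mask(snr_db, threshold_db=0.0):
    """Return 1 where the local SNR estimate exceeds the threshold, else 0."""
    return [[1 if s > threshold_db else 0 for s in frame] for frame in snr_db]

def fuzzy_mask(snr_db, slope=0.5, threshold_db=0.0):
    """Map each local SNR estimate to a value in (0, 1) with a sigmoid.

    slope and threshold_db are assumed tuning parameters.
    """
    return [[sigmoid(slope * (s - threshold_db)) for s in frame] for frame in snr_db]

# Hypothetical local SNR estimates in dB: 2 time frames x 3 frequency channels.
snr = [[-12.0, 0.0, 9.0],
       [3.0, -1.0, 20.0]]
hard = discrete_mask(snr)
soft = fuzzy_mask(snr)
```

A point exactly at the threshold gets a soft value of 0.5, whereas the discrete mask must commit to 0 or 1 there.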

Using Soft Decisions

Missing data probability calculation for discrete masks, showing the separate present and missing components:

[equation lost in extraction]

With soft decisions the probability due to each feature vector component becomes a weighted sum of the present and missing probability terms:

[equation lost in extraction]
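One plausible form of the two expressions described on this slide, written as a sketch rather than a recovery of the original equations. The symbols are assumptions: q is a model state, P and M are the sets of present and missing components, m_i is the soft mask value, and the missing term is assumed to be a bounded integral over [0, x_i] with independent feature components.

```latex
% Discrete mask: P = present (reliable) components, M = missing components.
f(\mathbf{x} \mid q) = \prod_{i \in P} f(x_i \mid q)
                       \prod_{i \in M} \frac{1}{x_i} \int_{0}^{x_i} f(u \mid q)\, du

% Soft mask: m_i in [0, 1] is the probability that component i is reliable.
f(\mathbf{x} \mid q) = \prod_{i} \left[ m_i\, f(x_i \mid q)
                       + (1 - m_i)\, \frac{1}{x_i} \int_{0}^{x_i} f(u \mid q)\, du \right]
```

Under these assumptions, restricting m_i to the values 0 and 1 reduces the second expression to the first.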

Using Soft Decisions

Generalising to models employing Gaussian mixtures:

[equation lost in extraction]
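A sketch of how the weighted sum might extend to a Gaussian mixture, assuming diagonal covariances and a bounded missing-data term over [0, x_i]. The function names and the example parameters are hypothetical; the slide's own equation is not recoverable.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gaussian_cdf(x, mean, var):
    """Cumulative distribution of a univariate Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def soft_state_likelihood(x, mask, weights, means, variances):
    """Likelihood of x under a diagonal-covariance Gaussian mixture.

    Each feature component blends its present and missing terms using its
    soft mask value. Requires every x[i] > 0 (spectral energies).
    """
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        mix = w
        for xi, mi, m, v in zip(x, mask, mu, var):
            present = gaussian_pdf(xi, m, v)
            # Assumed missing term: average density over the bounds [0, xi].
            missing = (gaussian_cdf(xi, m, v) - gaussian_cdf(0.0, m, v)) / xi
            mix *= mi * present + (1.0 - mi) * missing
        total += mix
    return total

# Hypothetical two-component mixture over two spectral features.
weights = [0.6, 0.4]
means = [[1.0, 2.0], [3.0, 1.0]]
variances = [[1.0, 0.5], [2.0, 1.0]]
x = [1.5, 2.5]
all_present = soft_state_likelihood(x, [1.0, 1.0], weights, means, variances)
half_trusted = soft_state_likelihood(x, [0.5, 0.5], weights, means, variances)
```

With every mask value set to 1.0 the function reduces to the ordinary mixture likelihood, which is a convenient sanity check.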

Harmonicity Masks

[Figure: block diagram of harmonicity mask generation. Labels: Noisy Signal; Gammatone Filterbank (32 channels); Haircell Model; Instantaneous Envelope; Autocorrelation (applied to each channel over a temporal window); Correlogram (32 frequency channels, 150 lags); Sum Across Frequency; Summary Autocorrelogram; Pitch Peak Tracking (f0); Peak's lag index (~1/f0); Peak's Height (Degree of Voicing); Select lag; Harmonicity from Correlogram (freq, lag); Fuzzy Mask (s1, T1).]

- The Harmonicity Mask is designed to mark voiced speech regions.
- It works well when the noise is inharmonic or the SNR is favourable.
- Refinements are necessary when the noise is harmonic and dominant: pitch tracking, multisource decoding?
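A rough sketch of the later stages of this pipeline for a single time frame. It assumes the gammatone filterbank and haircell model have already produced one windowed envelope per channel, and it omits those stages. The normalisation, the definition of voicing as the mean peak height, and all names and parameter values are assumptions for illustration.

```python
import math

def sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def normalised_autocorrelation(frame, max_lag):
    """Autocorrelation for lags 0..max_lag-1, scaled so that lag 0 equals 1."""
    energy = sum(v * v for v in frame) or 1.0
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag)) / energy
            for lag in range(max_lag)]

def harmonicity_mask(channel_frames, min_lag, max_lag, slope, threshold):
    """Return (pitch_lag, voicing, mask) for one time frame."""
    # Correlogram: one autocorrelation function per frequency channel.
    correlogram = [normalised_autocorrelation(f, max_lag) for f in channel_frames]
    # Summary autocorrelogram: sum across frequency channels.
    summary = [sum(ch[lag] for ch in correlogram) for lag in range(max_lag)]
    # The lag of the largest summary peak approximates 1/f0.
    pitch_lag = max(range(min_lag, max_lag), key=lambda lag: summary[lag])
    # Peak height, averaged over channels, serves as the degree of voicing.
    voicing = summary[pitch_lag] / len(channel_frames)
    # Per-channel harmonicity at the selected lag, squashed into a fuzzy mask.
    mask = [sigmoid(slope * (ch[pitch_lag] - threshold)) for ch in correlogram]
    return pitch_lag, voicing, mask

# Synthetic frame: three channels carrying harmonics of a period-10 source,
# plus one aperiodic channel made of sparse, irregularly spaced clicks.
n = 200
harmonics = [[math.sin(2.0 * math.pi * k * t / 10.0) for t in range(n)]
             for k in (1, 2, 3)]
clicks = [1.0 if t in (0, 37, 90, 160) else 0.0 for t in range(n)]
pitch_lag, voicing, mask = harmonicity_mask(
    harmonics + [clicks], min_lag=2, max_lag=16, slope=10.0, threshold=0.5)
```

In this toy frame the harmonic channels receive mask values close to 1 and the click channel a value close to 0, which is the behaviour the slide describes for voiced speech in inharmonic noise.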

Mask Combination

We now have two fuzzy masks:

- Fuzzy SNR-based mask: works well in stationary noise.
- Fuzzy Harmonicity-based mask: highlights voiced speech regions.

We also have a ‘degree of voicing’ parameter, V.

How do we combine the masks?

Mask Combination

Discrete combination (one parameter):

If [condition lost in extraction], the frame is Voiced; else the frame is Unvoiced. Then:

- Voiced frames: use the harmonicity-based mask.
- Unvoiced frames: fall back on the SNR mask.

Fuzzy combination (two parameters):

[Figure: the raw harmonicity data gives the harmonicity mask M_h (parameters s1, T1); the local SNR estimate gives the SNR mask M_s (parameters s3, T3); the degree of voicing gives a weight w (parameters s2, T2). The hybrid mask is a weighted sum of M_h and M_s with weights w and (1-w); the subscript order in the formula is not recoverable.]
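A small sketch of both combination schemes. It assumes that the discrete rule thresholds the degree of voicing, and that in the fuzzy rule the weight w is a sigmoid of the degree of voicing applied to the harmonicity mask (so that strongly voiced frames lean on M_h, in line with the discrete rule). The parameter values and function names are illustrative only.

```python
import math

def sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def combine_discrete(m_h, m_s, voicing, voicing_threshold=0.5):
    """One parameter: voiced frames use M_h, unvoiced frames fall back on M_s."""
    return list(m_h) if voicing > voicing_threshold else list(m_s)

def combine_fuzzy(m_h, m_s, voicing, slope=10.0, threshold=0.5):
    """Two parameters: w = sigmoid(voicing), hybrid = w*M_h + (1-w)*M_s (assumed order)."""
    w = sigmoid(slope * (voicing - threshold))
    return [w * h + (1.0 - w) * s for h, s in zip(m_h, m_s)]

# Hypothetical mask values for three frequency channels in one frame.
m_h = [0.9, 0.2, 0.8]
m_s = [0.4, 0.6, 0.1]
voiced = combine_fuzzy(m_h, m_s, voicing=0.9)
unvoiced = combine_fuzzy(m_h, m_s, voicing=0.1)
halfway = combine_fuzzy(m_h, m_s, voicing=0.5)
```

The fuzzy rule avoids the hard switch between masks: frames with intermediate voicing receive a blend of the two.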

Tuning the Voicing Sigmoid

[Figure: two panels, Clean and Car 10dB, plotting Voicing against Lag (~1/f0, 5ms to 15ms).]

Voicing vs. Lag for female and male speakers.

Comparison with A Priori Masks

[Figure: Male "4382" + Car @ 20dB SNR, 50Hz to 3800Hz, 0 to 1.7 secs. Panels: A priori mask; Local SNR Estimate Mask; Harmonicity Based Mask; Combined Mask.]

Comparison with A Priori Masks

[Figure: Male "4382" + Car @ 10dB SNR, 50Hz to 3800Hz, 0 to 1.7 secs. Panels: A priori mask; Local SNR Estimate Mask; Harmonicity Based Mask; Combined Mask.]

Aurora 2000 Experiments

- Trained on clean data.
- Testing using Set A (i.e. subway, exhibition, babble and car noises).
- Features: 32 channel gammatone filter bank, + deltas.
- Two slightly different sets of models:
  + Aurora Models: 16 states per digit,
  + DC Models: 11.5 states per digit on average.
- 7 mixtures per state (note: a relatively large number of mixtures is needed for spectral features).

Aurora Results: Test Set A

[Figure: four panels (Car Noise, Exhibition Noise, Subway Noise, Babble Noise) plotting WER (0 to 100) against SNR (dB) from -5 to 20 and Clean, for Discrete SNR, Fuzzy SNR and +Harmonicity masks.]

(32 channel filter bank + deltas)

Aurora Results: WER averaged over noise condition

[Figure: WER (0 to 100) against SNR (dB) from -5 to 20 and Clean. Legend: Discrete SNR, Fuzzy SNR, +Harmonicity, (MultiCondition).]

MASK / SNR      -5dB   0dB    5dB    10dB   15dB   20dB   Clean
Discrete SNR    83.8   56.6   34.0   17.2   8.5    4.1    1.2
Fuzzy SNR       69.7   41.2   20.1   10.1   5.7    3.4    1.5
+ Harmonicity   66.6   36.4   16.9   8.3    4.3    2.5    1.4
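To make the table easier to read, the WER figures can be turned into relative reductions. The snippet below is only an arithmetic illustration using the numbers in the table; the choice of relative reduction as a summary measure is an assumption and is not made on the slide.

```python
# WER values copied from the table, ordered by test condition.
conditions = ["-5dB", "0dB", "5dB", "10dB", "15dB", "20dB", "Clean"]
wer = {
    "Discrete SNR": [83.8, 56.6, 34.0, 17.2, 8.5, 4.1, 1.2],
    "Fuzzy SNR": [69.7, 41.2, 20.1, 10.1, 5.7, 3.4, 1.5],
    "+ Harmonicity": [66.6, 36.4, 16.9, 8.3, 4.3, 2.5, 1.4],
}

def relative_reduction(baseline, improved):
    """Percentage by which WER drops relative to the baseline, per condition."""
    return [100.0 * (b - i) / b for b, i in zip(baseline, improved)]

vs_discrete = dict(zip(conditions,
                       relative_reduction(wer["Discrete SNR"], wer["+ Harmonicity"])))
vs_fuzzy = dict(zip(conditions,
                    relative_reduction(wer["Fuzzy SNR"], wer["+ Harmonicity"])))

for name in conditions:
    print(f"{name:>6}: {vs_discrete[name]:6.1f}% vs discrete, {vs_fuzzy[name]:6.1f}% vs fuzzy")
```

A negative value marks a condition where the combined mask is worse than the baseline, as happens against the discrete mask on clean speech (1.4 vs. 1.2).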

Aurora WER Results: Aurora vs. DC Word Models

[Figure: WER (0 to 100) against SNR (dB) from -5 to 20 and Clean, for 16 State Models and DC Models.]

Models / SNR      -5dB   0dB    5dB    10dB   15dB   20dB   Clean
16 State Models   66.6   36.4   16.9   8.3    4.3    2.5    1.4
DC Word Models    69.4   39.8   19.9   9.9    5.3    3.2    1.7

Conclusions

- In combination, Harmonicity and Local SNR masks perform better than either mask individually, i.e.:
  + better approximation to the a priori (‘cheating’) mask,
  + better recognition results.
- The mask generation parameters are robust, i.e. one set of parameters will perform well over a large range of noise types and noise levels.
- Sensible values can be estimated from clean speech.

Further Work

- Temporal Smoothing. Smoothing the masks appears to improve results for some noise types, but seriously damages results for others.
- Using F0 Information. Using F0 to distinguish between voiced speech and harmonic noise. F0 tracking. ‘Multi-pitch’ decoding.
- Adaptive Sigmoid Parameters. Techniques for fine tuning the mask generation parameters according to the noise estimate.
- More General Mask Combination Techniques.

Learning Noise Specific Parameters

[Figure: 20 kHz TIDigits + Factory Noise. Digit recognition accuracy against SNR (dB) for Discrete SNR, Fuzzy SNR (ICSLP), Tuned Fuzzy Autoc/SNR and (Apriori).]

Parameters tuned to minimise distance to a priori masks at 0 and 5 dB.
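The slide does not say how the tuning was carried out. As one possible reading, the sketch below runs an exhaustive search over sigmoid slope and threshold, scoring each pair by mean squared distance to an a priori mask. The distance measure, the search grid and the data are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def mask_distance(snr_values, apriori, slope, threshold):
    """Mean squared distance between a sigmoid mask and the a priori mask."""
    return sum((sigmoid(slope * (s - threshold)) - a) ** 2
               for s, a in zip(snr_values, apriori)) / len(apriori)

def tune_sigmoid(snr_values, apriori, slopes, thresholds):
    """Return the (slope, threshold) pair with the smallest distance."""
    candidates = [(sl, th) for sl in slopes for th in thresholds]
    return min(candidates,
               key=lambda p: mask_distance(snr_values, apriori, p[0], p[1]))

# Hypothetical data: local SNR estimates from -10 to 20 dB, and an a priori
# mask that marks every point above 3 dB as reliable.
snr_values = [float(s) for s in range(-10, 21)]
apriori = [1.0 if s > 3.0 else 0.0 for s in snr_values]
best_slope, best_threshold = tune_sigmoid(
    snr_values, apriori,
    slopes=[0.25, 0.5, 1.0, 2.0],
    thresholds=[0.0, 1.5, 3.5, 6.0])
```

On this toy data the search picks the steepest slope and the threshold nearest the true boundary, since a binary a priori mask is best matched by a sharp sigmoid.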
