An Algorithm for Determining the Endpoints for Isolated Utterances

L.R. Rabiner and M.R. Sambur, The Bell System Technical Journal, Vol. 54, No. 2, Feb. 1975, pp. 297-315

Outline
• Intro to problem
• Solution
• Algorithm
• Summary


  1. Motivation
     • Word recognition needs to detect word boundaries in speech
     • Recognizing silence can reduce:
       – Processing load
       – (Network not identified as savings source)
     • Easy in sound proof room, with digitized tape

     Visual Recognition: "Eight"
     • Easy
     • Note how quiet beginning is (tape)

     Slightly Tougher Visual Recognition: "Six"
     • "sss" starts crossing the 'zero' line, so can still detect

     Tough Visual Recognition: "Four"
     • Eye picks 'B', but 'A' is real start
       – /f/ is a weak fricative

  2. Tough Visual Recognition: "Nine"
     • Difficult to say where final trailing off ends

     Tough Visual Recognition: "Five"
     • Eye picks 'A', but 'B' is real endpoint
       – V becomes devoiced

     The Problem
     • Noisy computer room with background noise
       – Weak fricatives: /f, th, h/
       – Weak plosive bursts: /p, t, k/
       – Final nasals
       – Voiced fricatives becoming devoiced
       – Trailing off of sounds (ex: binary, three)
     • Simple, efficient processing
       – Avoid hardware costs

     The Solution
     • Two measurements:
       – Energy
       – Zero crossing rate
     • Simple, fast, accurate

     Energy
     • Sum of magnitudes of 10 ms of sound, centered on interval:
       – $E(n) = \sum_{i=-50}^{50} |s(n + i)|$

     Zero (Level) Crossing Rate
     • Number of zero crossings per 10 ms
       – Normal number of cross-overs during silence
       – Increase in cross-overs during speech
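The two measurements can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names and the `half` and `length` parameters are hypothetical. It assumes about 100 samples per 10 ms (a 10 kHz sampling rate, which the slide's ±50 index range suggests), and it counts a sample of exactly zero as non-negative.

```python
def frame_energy(s, n, half=50):
    """E(n): sum of |s(n + i)| for i in -half..half.

    The window is clipped at the signal boundaries (an assumption;
    the slides do not say how edges are handled).
    """
    lo = max(0, n - half)
    hi = min(len(s), n + half + 1)
    return sum(abs(x) for x in s[lo:hi])


def zero_crossings(s, start, length=100):
    """Count sign changes between consecutive samples in s[start:start+length].

    length=100 corresponds to 10 ms at the assumed 10 kHz sampling rate.
    """
    frame = s[start:start + length]
    count = 0
    for a, b in zip(frame, frame[1:]):
        # A crossing occurs when neighbouring samples differ in sign.
        if (a >= 0) != (b >= 0):
            count += 1
    return count
```

Computing both values once per 10 ms frame gives the two sequences the rest of the algorithm works on: a low-energy frame with many zero crossings is what a weak fricative such as /f/ tends to look like.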

  3. The Algorithm: Startup
     • At initialization, record sound for 100 ms
       – Assume 'silence'
       – Measure background noise
     • Compute average (IZC') and std dev (σ) of zero crossing rate
     • Choose zero-crossing threshold (IZCT)
       – Threshold for unvoiced speech
       – IZCT = min(25 / 10 ms, IZC' + 2σ)

     The Algorithm: Thresholds
     • Compute energy, E(n), for interval
       – Get max, IMX
       – Have silence, IMN
       – I1 = 0.03 * (IMX - IMN) + IMN (3% of peak energy)
       – I2 = 4 * IMN (4x silent energy)
     • Get energy thresholds (ITU and ITL)
       – ITL = MIN(I1, I2)
       – ITU = 5 * ITL

     The Algorithm: Energy Computation
     • Search sample for energy greater than ITL
       – Save as start of speech, say s
     • Search for energy greater than ITU
       – s becomes start of speech
       – If energy falls below ITL, restart
     • Search for energy less than ITL
       – Save as end of speech
     • Results in conservative estimates
       – Endpoints may be outside

     The Algorithm: Zero Crossing Computation
     • Search back 250 ms
       – Count number of intervals where rate exceeds IZCT
     • If 3+, set starting point, s, to first time
     • Else s remains the same
     • Do similar search after end

     The Algorithm: Example: "Half"
     • Caught trailing /f/

     Algorithm: Examples
     • (Word begins with strong fricative)
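The threshold and search steps above can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's code. It takes per-frame energy and zero-crossing values (one per 10 ms) as plain lists. All function and parameter names are hypothetical. IMN is taken as the mean energy of the initial silence frames, the zero-crossing threshold uses the silence mean plus two standard deviations, and the end-of-speech search is assumed to mirror the start search by running it on the reversed sequence.

```python
from statistics import mean, pstdev


def compute_thresholds(energy, zcr, n_silence=10, fixed_zc=25):
    """Return (ITL, ITU, IZCT); the first n_silence frames (100 ms) are assumed silent."""
    imx = max(energy)
    imn = mean(energy[:n_silence])          # assumption: IMN = mean silence energy
    i1 = 0.03 * (imx - imn) + imn           # 3% of peak energy above silence
    i2 = 4 * imn                            # 4x silent energy
    itl = min(i1, i2)
    itu = 5 * itl
    izct = min(fixed_zc, mean(zcr[:n_silence]) + 2 * pstdev(zcr[:n_silence]))
    return itl, itu, izct


def find_start(energy, itl, itu):
    """First frame at or above ITL from which energy reaches ITU without dropping below ITL."""
    candidate = None
    for n, e in enumerate(energy):
        if e < itl:
            candidate = None                # fell below ITL: restart the search
            continue
        if candidate is None:
            candidate = n
        if e >= itu:
            return candidate
    return None


def refine_start(zcr, start, izct, lookback=25, min_hits=3):
    """Search back 250 ms (25 frames); with 3+ frames above IZCT, move to the earliest."""
    hits = [n for n in range(max(0, start - lookback), start) if zcr[n] > izct]
    return hits[0] if len(hits) >= min_hits else start


def refine_end(zcr, end, izct, lookahead=25, min_hits=3):
    """Mirror of refine_start: search 250 ms after the end estimate."""
    hi = min(len(zcr), end + 1 + lookahead)
    hits = [n for n in range(end + 1, hi) if zcr[n] > izct]
    return hits[-1] if len(hits) >= min_hits else end


def find_endpoints(energy, zcr, n_silence=10):
    """Return (start_frame, end_frame), or None if no speech is found."""
    itl, itu, izct = compute_thresholds(energy, zcr, n_silence)
    start = find_start(energy, itl, itu)
    if start is None:
        return None
    end = len(energy) - 1 - find_start(energy[::-1], itl, itu)
    return refine_start(zcr, start, izct), refine_end(zcr, end, izct)


# Synthetic example: 3 low-energy, high-ZCR frames (a weak fricative)
# followed by 5 loud frames.
energy = [10] * 13 + [500] * 5 + [10] * 12
zcr = [5] * 10 + [40] * 3 + [10] * 5 + [5] * 12
print(find_endpoints(energy, zcr))  # (10, 17)
```

In the synthetic example the energy search alone would place the start at frame 13. The zero-crossing pass moves it back to frame 10, which is how the algorithm recovers a weak leading fricative that the energy thresholds miss.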

  4. Algorithm: Examples: "Four"
     • Notice how different each "four" is

     Evaluation: Part 1
     • 54-word vocabulary
     • Read by 2 males, 2 females
     • No gross errors (off by more than 50 ms)
     • Some small errors
       – Losing weak fricatives
       – None affected recognition

     Evaluation: Part 2
     • 10 speakers
     • Count 0 to 9
     • No errors at all

     Evaluation 3: Your Project 1

     Future Work
     • Three classes of speech:
       – Silence
       – Unvoiced speech
       – Voiced speech
     • May be more computationally intensive solutions that are more effective
