
Speech Detection for Text-Dependent Speaker Verification
Orith Toledo-Ronen
Persay Ltd.

Outline
• Motivation
• Review of existing techniques
• HMM-based speech detection
• The Evaluation Track corpus
• Experimental results
• Summary

Motivation
• Improving end-point detection improves text-dependent speaker verification performance
• Existing algorithms: energy-based voice activity detector (VAD)
• Problem: background speech may pass the energy threshold

Existing Techniques
• Energy
• Amplitude
• Zero-crossing rate
• Linear prediction error
• Pitch
• HMM

Comparison of Techniques
• Energy-based VAD
  - Statistics on frame energy
  - Threshold setting
• HMM-based VAD
  - Speaker-dependent model
  - Password detection
  - Filters the noise

Energy-based VAD
• Compute the energy of all frames
• Find statistics of energy values Ω(E)
• Compute the energy threshold T = f(Ω(E))
• Filter out all frames with energy below T
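The four steps on this slide can be sketched in a few lines of Python. The slides do not say which statistics Ω(E) or which function f are used, so the choices below are assumptions made only for illustration: non-overlapping frames of a hypothetical length, log energy in dB, and a threshold placed at a fixed fraction of the way between the minimum and maximum frame energy.

```python
import math

def frame_energies(samples, frame_len=160):
    """Split samples into non-overlapping frames and return the log energy (dB) of each.

    frame_len is a hypothetical value; the slides do not give the frame size.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-10))  # small floor avoids log(0)
    return energies

def energy_threshold(energies, fraction=0.5):
    """Assumed form of T = f(Ω(E)): a point between the min and max frame energy."""
    lowest, highest = min(energies), max(energies)
    return lowest + fraction * (highest - lowest)

def energy_vad(samples, frame_len=160, fraction=0.5):
    """Return the indices of frames whose energy is at or above T, and T itself."""
    energies = frame_energies(samples, frame_len)
    threshold = energy_threshold(energies, fraction)
    kept = [i for i, energy in enumerate(energies) if energy >= threshold]
    return kept, threshold
```

For a signal made of a quiet stretch, a loud stretch, and another quiet stretch, `energy_vad` keeps only the loud frames. Background speech that is loud enough would be kept as well, which is the problem stated in the Motivation slide.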

HMM-based VAD
• A left-to-right hidden Markov model of the phrase
• Not phoneme-based
• Trained from 3 repetitions

Training
• Use the energy-based VAD first
• Train the speaker HMM
• Train a background HMM from:
  - noise segments
  - background speech
• Merge the speaker and background HMMs

Merging Models
[Figure: merged model diagram; labels: Audio, Noise, Speaker, Noise]
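One possible reading of the noise, speaker, noise layout is a single transition matrix in which a background state precedes and follows the left-to-right speaker states. The sketch below assumes a one-state background model, no skip transitions in the speaker model, and made-up probabilities; none of these details are given in the slides, and `merge_models` is a hypothetical helper name.

```python
def merge_models(speaker_self_loops, noise_self_loop=0.9):
    """Build a noise -> speaker -> noise transition matrix.

    speaker_self_loops: self-loop probability of each left-to-right speaker state.
    State 0 is the leading background state, states 1..N are the speaker states,
    and state N+1 is the trailing background state (all assumed for illustration).
    """
    n_speaker = len(speaker_self_loops)
    size = n_speaker + 2
    trans = [[0.0] * size for _ in range(size)]

    # Leading background: stay in noise or enter the first speaker state.
    trans[0][0] = noise_self_loop
    trans[0][1] = 1.0 - noise_self_loop

    # Speaker states: stay, or move one state to the right.
    for state, self_loop in enumerate(speaker_self_loops, start=1):
        trans[state][state] = self_loop
        trans[state][state + 1] = 1.0 - self_loop

    # Trailing background: absorbing.
    trans[size - 1][size - 1] = 1.0
    return trans
```

With three speaker states, `merge_models([0.5, 0.6, 0.7])` returns a 5 x 5 matrix whose rows each sum to one.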

Detection
• Run Viterbi with the merged HMM and find the speaker’s states in the segmentation
• Use the HMM VAD as a filter before verification
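The detection step can be illustrated with a plain log-domain Viterbi decoder followed by a filter that keeps only the frames aligned to speaker states. This is a generic sketch, not the system's implementation: the per-frame log emission scores are assumed to be computed elsewhere, the path is assumed to start in the leading background state, and the function names are hypothetical.

```python
import math

NEG_INF = float("-inf")

def viterbi(log_emit, log_trans, start_state=0):
    """Return the most likely state path.

    log_emit[t][s]: log likelihood of frame t under state s (assumed given).
    log_trans[i][j]: log transition probability from state i to state j.
    """
    n_states = len(log_trans)
    score = [NEG_INF] * n_states
    score[start_state] = log_emit[0][start_state]
    back_pointers = []

    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for j in range(n_states):
            # Best predecessor of state j at frame t.
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + log_emit[t][j])
            pointers.append(best_i)
        score = new_score
        back_pointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def speaker_frames(path, speaker_states):
    """Keep only the frame indices that the path assigns to speaker states."""
    return [t for t, state in enumerate(path) if state in speaker_states]
```

The frames returned by `speaker_frames` are the ones that would be passed on to verification; frames aligned to the background states are dropped.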

Example

The Evaluation Track Corpus
• Database: Persay’s TD corpus
• Passwords: 9-digit telephone number, 4-digit personal code
• Speakers: 45 males, 37 females
• Impostors: up to 5 same-gender impostors for each speaker

The Evaluation Track Corpus
• Sessions: ~5 calls per speaker, with 3 repetitions of each password in each call
• Media: cellular phone
• Language: Hebrew

Experimental Results
• Results: % Equal Error Rate
Gender  Password  Energy  HMM   H+E   E+H
Male    9-digit   7.2     8.1   8.7   6.7
Male    4-digit   11.1    12.6  10.8  9.0
Female  9-digit   6.3     5.8   7.1   6.4
Female  4-digit   10.8    12.2  12.5  12.4
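For readers unfamiliar with the metric, the equal error rate is the operating point at which the false acceptance rate and the false rejection rate are equal. The sketch below approximates it by sweeping a threshold over a set of verification scores. The scores and the convention that a higher score means a better match are assumptions for illustration; the slides do not describe how the reported figures were computed.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER in percent.

    Sweeps every observed score as a threshold and returns the mean of the
    false acceptance and false rejection rates where the two are closest.
    Assumes an attempt is accepted when its score is at or above the threshold.
    """
    best_gap, eer = None, None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        false_reject = sum(s < threshold for s in target_scores) / len(target_scores)
        false_accept = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(false_accept - false_reject)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = 100.0 * (false_accept + false_reject) / 2.0
    return eer
```

With finite score lists the two error rates may never be exactly equal, which is why the sketch reports the midpoint at the closest threshold.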

Password Rejection
• Impostor: the Viterbi path does not reach the speaker’s model
• Partial password: the Viterbi path does not cover all the speaker’s states
% Rejected (Target / Impostor)
Gender  Password  H+E     E+H
Male    9-digit   1 / 39  5 / 54
Male    4-digit   0 / 21  3 / 45
Female  9-digit   2 / 52  6 / 82
Female  4-digit   1 / 33  7 / 68
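The two rejection rules on this slide amount to a check on which speaker states appear in the Viterbi path. A minimal sketch, assuming the path is a list of state indices and the speaker states are known by index; the function name and the returned labels are hypothetical:

```python
def classify_attempt(path, speaker_states):
    """Apply the two rejection rules to a decoded state path.

    Returns "impostor" if no speaker state is visited, "partial" if only some
    speaker states are visited, and "accepted" if all of them are covered.
    """
    required = set(speaker_states)
    visited = set(path) & required
    if not visited:
        return "impostor"   # path never reaches the speaker's model
    if visited != required:
        return "partial"    # path does not cover all the speaker's states
    return "accepted"
```

For example, with speaker states 1 to 3 and background states 0 and 4, the path `[0, 1, 2, 4]` would be flagged as a partial password because state 3 is never visited.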

Password Rejection - Cont’d
• Persay’s TD corpus was manually cleaned by a human listener.
• Rejected by human: 102 target attempts, 115 impostor attempts
• Algorithm rejection: 33% of the target attempts, 86% of the impostor attempts

Password Rejection - Cont’d
• Segments rejected by human and algorithm:
  - non-speech: DTMFs, ring tone, silence
  - corrupted audio
  - wrong password
  - strong background speech
• Segments rejected only by human:
  - all contain the password, but with poor quality
  - low volume, background speech, error and repair

Summary
• We have presented a method for speech detection in a text-dependent speaker verification system.
• The HMM-based VAD can be used in combination with an energy-based VAD.
• It can detect the password and reject invalid verification audio segments.
