
  1. Text-independent Speaker Verification Using Support Vector Machines (SVM)
Jamal Kharroubi, Dijana Petrovska-Delacrétaz, Gérard Chollet
(kharroub, petrovsk, chollet)@tsi.enst.fr
ENST/CNRS-LTCI, 46 rue Barrault, 75634 PARIS cedex 13
Odyssey 2001 Workshop, 18-22 June 2001

  2. Overview
1 Introduction and motivations
2 SVM principles
3 SVM and speaker recognition: identification, verification
4 SVM theory
5 Combining GMM and SVM for speaker verification
6 Database
7 Experimental protocol
8 Results
9 Conclusions and perspectives

  3. 1 Introduction and Motivations
• Gaussian Mixture Models (GMM)
  – State of the art for speaker verification
• Support Vector Machines (SVM)
  – New and promising technique in statistical learning theory
  – Discriminative method
  – Good performance in image processing and multi-modal authentication
• Combine GMM and SVM for speaker verification

  4. 2 SVM Principles
• Pattern classification problem: given a set of labelled training data, learn to classify unlabelled test data
• Solution: find decision boundaries that separate the classes, minimising the number of classification errors
• SVMs are:
  – Binary classifiers
  – Capable of determining automatically the complexity of the decision boundary

  5. 2.2 SVM Principles
[Figure: a separating hyperplane H and the optimal hyperplane H₀; the mapping Ψ(X) carries the input space into the feature space, where Class(X) is decided by H₀]

  6. 2.3 Example
Φ : ℝ² → ℝ³
(x₁, x₂) ↦ (x₁², √2·x₁x₂, x₂²)
[Figure: the mapping takes the input-space axes (X₁, X₂) to the feature-space axes (Z₁, Z₂, Z₃)]
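A minimal Python sketch (not part of the original slides) of why this mapping matters: the feature-space inner product Φ(x)·Φ(y) can be computed directly in the input space as the polynomial kernel (x·y)², so the explicit mapping never has to be carried out.

    import numpy as np

    def phi(x):
        # Map a 2-d point into the 3-d feature space of the example above.
        x1, x2 = x
        return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

    x = np.array([1.0, 2.0])
    y = np.array([3.0, 0.5])

    lhs = phi(x) @ phi(y)        # inner product after the explicit mapping
    rhs = (x @ y) ** 2           # polynomial kernel in the input space
    assert np.isclose(lhs, rhs)  # identical, so the mapping can stay implicit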

  7. 3 SVM and Speaker Recognition
Speaker identification with SVM: Schmidt and Gish, 1996
• Goal: identify one among a given closed set of speakers
• Methods used: one vs. other speakers, or pairwise classifiers (N(N−1)/2 = 325 for N = 26)
• The input vectors of the SVMs are spectral parameters
• Database: Switchboard, 26 mixed-sex speakers, 15 s for training, 5 s for tests
• Baseline comparison with Bayesian (GMM) modeling

  8. • Results => only slightly better performance with SVMs, using the pairwise classifier
• Why these disappointing results?
  – Too short train/test durations
  – GMMs perhaps better suited to model the data
  – GMMs perhaps more robust to channel variation

  9. 3.2 SVM and Speaker Verification
• Not done before
• Difficulty: mismatch in the quantity of labelled data; more data is available for impostor accesses than for true-target accesses
• Our preliminary test, with speech frames as direct input to the SVM, gave no satisfactory results
• Present approach: globally model client-client against client-impostor accesses

  10. 4. SVM Theory
Input space: D = { (x_i, y_i) | x_i ∈ E; y_i ∈ {−1, 1}; i = 1, .., m }
Feature space: D = { (Ψ(x_i), y_i) | x_i ∈ E; y_i ∈ {−1, 1}; i = 1, .., m }
Classification function:
class(x₀) = sign[ Σ_{i ∈ SV} a_i y_i ( Ψ(x_i) · Ψ(x₀) ) + b ],  where Ψ(x_i) · Ψ(x₀) = K(x_i, x₀)
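A sketch of the classification function above; the names support_vectors, alphas, labels and b are assumptions, standing in for the quantities a trained SVM provides.

    import numpy as np

    def svm_classify(x0, support_vectors, alphas, labels, b, kernel):
        # class(x0) = sign( sum over SV of a_i * y_i * K(x_i, x0) + b )
        s = sum(a * y * kernel(x_i, x0)
                for a, y, x_i in zip(alphas, labels, support_vectors))
        return np.sign(s + b)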

  11. 4.2 SVM – usual kernels used
• Linear: K(x, y) = x · y
• Polynomial: K(x, y) = [(x · y) + 1]^d
• Radial Basis Function (RBF): K(x, y) = exp(−γ ‖x − y‖²)
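The three kernels written out as a small Python sketch; the parameter defaults d=2 and gamma=1.0 are illustrative assumptions, not values from the slides.

    import numpy as np

    def linear_kernel(x, y):
        return x @ y                              # K(x, y) = x . y

    def polynomial_kernel(x, y, d=2):
        return (x @ y + 1) ** d                   # K(x, y) = [(x . y) + 1]^d

    def rbf_kernel(x, y, gamma=1.0):
        return np.exp(-gamma * np.sum((x - y) ** 2))  # exp(-gamma ||x - y||^2)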

  12. 5 Combining GMM and SVM for Speaker Verification
• Reminder: GMM speaker modeling and Log-Likelihood Ratio scoring, referred to as LLR
• SVM classifier:
  – Construction of the SVM input vector
  – SVM train/test procedure

  13. 5.1 GMM Speaker Modeling
[Diagram: speech → front-end → GMM modeling → WORLD GMM MODEL; speech → front-end → GMM adaptation → TARGET GMM MODEL]

  14. 5.2 LLR Scoring
[Diagram: speech → front-end → scored against the hypothesized TARGET GMM model P(x | λ) and the WORLD GMM model P(x | λ̄)]
Λ(x) = Log[ P(x | λ) / P(x | λ̄) ]
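A sketch of LLR scoring under stated assumptions: target_gmm and world_gmm are hypothetical fitted models exposing per-frame log-likelihoods through score_samples(), as scikit-learn's GaussianMixture does (a modern stand-in for the original toolkit).

    import numpy as np

    def llr_score(frames, target_gmm, world_gmm):
        # frames: (T, dim) array of feature vectors for one test segment
        ll_target = target_gmm.score_samples(frames)  # log P(x_t | lambda)
        ll_world = world_gmm.score_samples(frames)    # log P(x_t | lambda-bar)
        return np.mean(ll_target - ll_world)          # average log-likelihood ratio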

  15. 5.3 Construction of the SVM Input Vectors
Additional labelled development data, with T frames: t_1, ..., t_j, ..., t_T
For each frame t_j, the score S_tj is computed as follows:
S_tj = Max over g_i ∈ {λ, λ̄} of Log[ P(t_j / g_i) ]
Two vectors X_λ^(V) and X_λ̄^(V) are constructed as follows:
• First, all the components of the vectors are initialized to zero

  16. • If S_tj is given by a Gaussian g_i belonging to λ, the i-th component of the vector X_λ^(V) is incremented by the frame score; if S_tj is given by g_j belonging to λ̄, the j-th component of the vector X_λ̄^(V) is incremented by the frame score
• The input SVM vector is the concatenation of (X_λ^(V), X_λ̄^(V)); a sketch follows below
• Summation and normalization of the SVM input vector by the number T of frames of the test segment:
S_T = [ Σ_{j=1..T} S_tj ] / T
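A Python sketch of this construction under stated assumptions: target_gmm and world_gmm are fitted scikit-learn GaussianMixture models with N components each, and the per-component scores here include the mixture weights, a detail the slides leave open.

    import numpy as np

    def component_log_scores(gmm, frames):
        # Log[ w_i * P(t_j / g_i) ] for every frame j and component i,
        # using log p(t, g_i) = log p(g_i | t) + log p(t)
        return (np.log(gmm.predict_proba(frames) + 1e-300)
                + gmm.score_samples(frames)[:, None])

    def svm_input_vector(frames, target_gmm, world_gmm):
        # Concatenated, frame-normalized vector of winning-Gaussian scores.
        n = target_gmm.n_components
        scores = np.hstack([component_log_scores(target_gmm, frames),
                            component_log_scores(world_gmm, frames)])  # (T, 2N)
        vec = np.zeros(2 * n)
        for frame_scores in scores:
            best = np.argmax(frame_scores)   # winning Gaussian over both models
            vec[best] += frame_scores[best]  # increment that component only
        return vec / len(frames)             # normalize by the T frames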

  17. 5.3 SVM Input Vector Construction
[Diagram: labeled frames → front-end → scored against the hypothesized TARGET GMM model (N Gaussian mixtures, λ) and the WORLD GMM model (N Gaussian mixtures, λ̄); for each frame, S_tj = Max[ P_gi ] selects the winning Gaussian, filling an SVM input vector of dim = 2N]

  18. 5.4 SVM: Train/Test
[Diagram: Train: client-class and impostor-class input vectors feed the SVM classifier. Test: test speech → SVM input vector construction → SVM classifier → decision score]

  19. 6. Database
The complete NIST'99 evaluation data, split into:
• Development data = 100 speakers
  – 2 min GMM model
  – Corresponding test data to train the SVM classifier (519 true and 5190 impostor accesses)
• World data = 200 speakers
  – 4 sex/handset-dependent world models
  – Pseudo-impostors = 190 speakers used for the h-norm
• Evaluation data = 100 speakers = 449 true and 4490 impostor accesses

  20. 7. Experimental Protocol: 7.1 Feature Extraction
• LFCC parametrization (32.5 ms windows every 10 ms)
• Cepstral mean subtraction for channel compensation (see the sketch below)
• Feature vector dimension is 33 (16 cep, 16 Δcep, Δlog E); delta cepstral features computed on 5-frame windows
• Frame removal algorithm applied on feature vectors to discard non-significant frames (bimodal energy distributions)
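A minimal sketch of the cepstral mean subtraction step; the (T, 33) feature layout follows the slide, while the function name is an assumption.

    import numpy as np

    def cepstral_mean_subtraction(features):
        # features: (T, 33) array of LFCC-based vectors for one utterance;
        # removing the per-coefficient mean cancels a fixed convolutive channel
        return features - features.mean(axis=0, keepdims=True)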

  21. 7.2 GMM Modeling
• Speaker and background models
• GMMs with 128 mixtures
• Diagonal covariance matrices
• Standard EM algorithm with a maximum of 20 iterations
=> Four speaker-independent, gender- and handset-dependent background (world) models
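The GMM configuration above, sketched with scikit-learn's GaussianMixture as a modern stand-in (the original work predates this library; the features array is hypothetical).

    from sklearn.mixture import GaussianMixture

    gmm = GaussianMixture(n_components=128,       # 128 mixtures
                          covariance_type='diag', # diagonal covariance matrices
                          max_iter=20)            # EM capped at 20 iterations
    # gmm.fit(features)  # features: hypothetical (T, 33) array from the front-end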

  22. 7.3 SVM Scoring
• The SVM model was trained using a development corpus (coming from the NIST'99 database)
• A linear kernel is used
• There are 519 true-target speaker accesses and 5190 impostor accesses
• 5489 tests on the evaluation corpus (449 true-target speaker accesses and 4490 impostor accesses)
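A sketch of this setup with scikit-learn's SVC as a stand-in for the original SVM toolkit; X_dev, y_dev and X_eval are hypothetical arrays holding the development vectors of section 5.3 and the evaluation vectors.

    from sklearn.svm import SVC

    svm = SVC(kernel='linear')                # linear kernel, as on the slide
    # svm.fit(X_dev, y_dev)                   # y_dev: +1 true-target, -1 impostor
    # scores = svm.decision_function(X_eval)  # signed distance used as the score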

  23. 8.1 Results – preliminary results
[Figure: results of the SVM trained with raw feature vectors used as input vectors – condition: all]

  24. 8.2 SVM and LLR scoring
[Figure: no normalization; dndt = different (phone) number, different handset type; dnst = different number, same handset type]

  25. 8.3 LLR - Influence of h-norm

  26. 8.3 SVM - Influence of h-norm

  27. 8.3 SVM – LLR comparison

  28. 8.4 Results table at EER

                         DNST                DNDT
                      LLR      SVM        LLR      SVM
    no normalization  17.6 %   15.8 %     27.8 %   21.6 %
    h-norm            15.2 %   14.0 %     23.3 %   20.5 %

  29. 9. Conclusions
• Better results with the GMM-SVM method in all the experimental conditions tested
• The proposed method seems to be more robust to channel variations

  30. 10. Perspectives
• Different kernel types and features will be experimented with
• Other normalization techniques
• Another feature representation will be experimented with, to use the SVM in speaker verification:
X^(V)(λ) = [ P(X / g_1^λ), .., P(X / g_n^λ) ]
X^(V)(λ̄) = [ P(X / g_1^λ̄), .., P(X / g_n^λ̄) ]
