Text-independent Speaker Verification Using Support Vector Machines (SVM)
Jamal Kharroubi Dijana Petrovska-Delacrétaz Gérard Chollet
(kharroub, petrovsk,chollet)@tsi.enst.fr
ENST/CNRS-LTCI, 46 rue Barrault 75634 PARIS cedex 13
Odyssey
Gaussian Mixture Models (GMM)
Support Vector Machines (SVM)
Combining GMM and SVM for Speaker Verification
Pattern classification problem:
Solution: find decision boundaries that separate the classes
SVMs are:
[Figure: input space (X1, X2) mapped to feature space (Z1, Z2, Z3); separating hyperplane H, optimal hyperplane Ho, Class(X)]
Separating hyperplane H, with the optimal hyperplane Ho
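A small sketch of the mapping idea, assuming the figure shows the usual textbook case of a degree-2 map from a 2-D input space to a 3-D feature space. The map Φ(x1, x2) = (x1², √2·x1·x2, x2²) and the names `phi` and `poly2_kernel` are illustrative assumptions, not taken from the slide. The code checks that the inner product of the mapped points equals the kernel (x·y)² computed directly in input space.

```python
import math


def phi(x1: float, x2: float) -> tuple[float, float, float]:
    """Hypothetical explicit degree-2 map from the input space to a 3-D feature space."""
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v) -> float:
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y) -> float:
    """Homogeneous polynomial kernel of degree 2: K(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, -1.0)
# Inner product in the feature space vs. kernel evaluated in the input space
print(dot(phi(*x), phi(*y)), poly2_kernel(x, y))
```

Both numbers agree up to floating-point rounding, which is why an SVM only needs kernel values and never has to build Φ explicitly.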
Goal: identify one speaker among a given closed set of speakers
Methods used: one vs. other speakers or
The SVM input vectors are spectral parameters
Database: Switchboard, 26 mixed-sex speakers,
Baseline comparison with Bayesian (GMM) modeling
Results: slightly better performance with SVMs,
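A minimal sketch of the one-vs-other-speakers scheme for closed-set identification, assuming each speaker's classifier outputs a real-valued score and the highest score wins. The helper names and the example scores are hypothetical.

```python
def one_vs_others_labels(speakers: list[str], target: str) -> list[int]:
    """Training labels for `target`'s classifier: +1 for its own vectors, -1 for every other speaker."""
    return [1 if s == target else -1 for s in speakers]


def identify(scores: dict[str, float]) -> str:
    """Closed-set decision: the speaker whose classifier gives the highest score."""
    if not scores:
        raise ValueError("need at least one speaker score")
    return max(scores, key=scores.__getitem__)


# Hypothetical classifier outputs for one test utterance
example_scores = {"spk01": -0.8, "spk02": 1.3, "spk03": 0.2}
print(identify(example_scores))  # spk02
```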
Why these disappointing results?
Not done before
Difficulty: mismatch in the quantity of labelled data,
Our preliminary test, with speech frames as input to the SVM
Present approach:
$$D = \{(x_i, y_i);\ x_i \in E,\ y_i \in \{-1, 1\};\ i = 1, \dots, m\}$$
$$D = \{(\Psi(x_i), y_i);\ x_i \in E,\ y_i \in \{-1, 1\};\ i = 1, \dots, m\}$$
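Linear: $K(x, y) = x \cdot y$
Polynomial: $K(x, y) = (x \cdot y + 1)^d$
Radial Basis Function (RBF): $K(x, y) = \exp\left(-\|x - y\|^2 / 2\sigma^2\right)$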
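The three kernel types can be sketched in plain Python as below. The offset `c = 1` in the polynomial kernel and the `sigma` parametrization of the RBF kernel are assumptions (common textbook forms); the slide itself only preserves the exponents d and 2.

```python
import math
from typing import Sequence


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x: Sequence[float], y: Sequence[float],
                      d: int = 2, c: float = 1.0) -> float:
    """K(x, y) = (x . y + c)^d; c = 1 is an assumed default."""
    return (linear_kernel(x, y) + c) ** d


def rbf_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)); sigma is an assumed width parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

The RBF kernel of a vector with itself is always 1, while the linear and polynomial kernels grow with the vectors' norms, which is one reason input normalization matters for the latter two.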
Reminder: GMM speaker modeling and SVM classifier
[Figure: GMM system diagram with three front-end blocks; test speech → LLR score]
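A sketch of the GMM log-likelihood ratio (LLR) score, assuming diagonal-covariance Gaussians and a score averaged over the T frames of the test utterance: (1/T) Σ_t [log p(x_t | λ_X) − log p(x_t | λ_X̄)], with λ_X the target model and λ_X̄ the world model. The `DiagGMM` class and `llr_score` function are hypothetical names, not the authors' code.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DiagGMM:
    """Gaussian mixture with diagonal covariances (assumed representation)."""
    weights: Sequence[float]
    means: Sequence[Sequence[float]]
    variances: Sequence[Sequence[float]]

    def component_log_probs(self, x: Sequence[float]) -> list[float]:
        """log(w_i) + log N(x; mu_i, diag(var_i)) for each component i."""
        out = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            out.append(ll)
        return out

    def log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x | lambda), using log-sum-exp for numerical stability."""
        lps = self.component_log_probs(x)
        m = max(lps)
        return m + math.log(sum(math.exp(lp - m) for lp in lps))


def llr_score(frames: Sequence[Sequence[float]], target: DiagGMM, world: DiagGMM) -> float:
    """Average per-frame log-likelihood ratio between the target and world models."""
    if not frames:
        raise ValueError("empty utterance")
    total = sum(target.log_likelihood(x) - world.log_likelihood(x) for x in frames)
    return total / len(frames)
```

A positive score means the frames are better explained by the target model than by the world model; the accept/reject threshold is set on development data.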
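First, all the components of the vectors $V_{\lambda_X}^{t_j}$ and $V_{\lambda_{\bar X}}^{t_j}$ are initialized to zero
For each speech frame $t_j$, $\log P(t_j / g_i)$ is computed for every Gaussian $g_i$ of $\lambda_X$ and $\lambda_{\bar X}$
If $\max_i \log P(t_j / g_i)$ is given by $g_i$ belonging to $\lambda_X$, the $i$th component of $V_{\lambda_X}^{t_j}$ is set to $\log P(t_j / g_i)$ (and likewise for $\lambda_{\bar X}$)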
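The SVM input vector is the concatenation of $V_{\lambda_X}^{t_j}$ and $V_{\lambda_{\bar X}}^{t_j}$ (dimension $2N$)
Summation and normalization of the SVM input vector by the number of frames $T$:
$$V = \frac{1}{T} \sum_{j=1}^{T} \left[ V_{\lambda_X}^{t_j},\ V_{\lambda_{\bar X}}^{t_j} \right]$$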
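The SVM input vector construction described on this slide can be sketched as follows, under several assumptions: diagonal-covariance Gaussians, the per-frame score is the Gaussian log-density log N(t_j; μ_i, Σ_i) without the mixture weight, and the winning Gaussian's log-probability is the value stored. Only one of the 2N components is non-zero per frame; adding it to a running sum and dividing by T gives the same result as summing the per-frame vectors. The names `log_gauss` and `svm_input_vector` are hypothetical.

```python
import math
from typing import Sequence

Gaussian = tuple[Sequence[float], Sequence[float]]  # (mean, diagonal variance)


def log_gauss(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """log N(x; mean, diag(var)) for one Gaussian g_i."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xd - m) ** 2 / v
                      for xd, m, v in zip(x, mean, var))


def svm_input_vector(frames: Sequence[Sequence[float]],
                     target: Sequence[Gaussian],
                     world: Sequence[Gaussian]) -> list[float]:
    """Build [V_target, V_world] (dim 2N when both models have N Gaussians), averaged over T frames."""
    if not frames:
        raise ValueError("empty utterance")
    gaussians = list(target) + list(world)  # first block: lambda_X, second block: world model
    v = [0.0] * len(gaussians)  # all components initialized to zero
    for x in frames:
        scores = [log_gauss(x, m, s) for m, s in gaussians]
        best = max(range(len(scores)), key=scores.__getitem__)  # Gaussian with max log P(t_j / g_i)
        v[best] += scores[best]  # only the winning Gaussian contributes for this frame
    T = len(frames)
    return [c / T for c in v]  # normalization by the number of frames
```

Because each frame activates exactly one Gaussian from either model, the vector records which regions of the two models the utterance occupies and how well it fits them.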
[Figure: labeled speech frame t_j → front-end → hypothesized target GMM model and a second GMM (N Gaussian mixtures each) → Log P(t_j / g_i), keep Max[P(g_i)] → SVM input vector construction (dim = 2N) → SVM classifier (client class / impostor class) → decision score for the test speech]
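A sketch of training a linear SVM to separate client-class vectors (+1) from impostor-class vectors (−1) and producing a decision score. The slides do not say which solver was used; the Pegasos stochastic sub-gradient method below is a swapped-in, self-contained stand-in, and the function names are hypothetical. The bias is handled by appending a constant feature, a simplification that also regularizes it.

```python
import random
from typing import Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 50, seed: int = 0) -> list[float]:
    """Pegasos stochastic sub-gradient training of a linear SVM.

    A constant feature 1.0 is appended to every vector so that the last
    weight plays the role of the bias b. Labels must be +1 or -1.
    """
    if not X:
        raise ValueError("no training vectors")
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # gradient step on the L2 term
            if margin < 1.0:  # hinge loss active: move towards the correct side
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def decision_score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear SVM output w . [x, 1]; positive values favour the client class."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
```

In the GMM-SVM setting, each training vector would be one utterance's 2N-dimensional input vector, labeled +1 for true-target accesses and −1 for impostor accesses.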
Development data = 100 speakers
World data = 200 speakers
Pseudo-impostors = 190 speakers, used for H-norm
Evaluation data = 100 speakers
LFCC parametrization (32.5 ms windows every 10 ms)
Cepstral mean subtraction for channel compensation
Feature vector dimension is 33 (16 cep, 16 dcep, ∆ log E)
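Cepstral mean subtraction (CMS) removes the per-utterance mean of each cepstral coefficient: a stationary convolutive channel becomes an additive constant in the cepstral domain, so subtracting the mean cancels it. The sketch below also counts analysis frames for the 32.5 ms / 10 ms framing; both helper names are hypothetical, and the 33-dimensional vector is simply 16 + 16 + 1 coefficients.

```python
def cepstral_mean_subtraction(frames: list[list[float]]) -> list[list[float]]:
    """Subtract the utterance-level mean of each coefficient from every frame."""
    if not frames:
        return []
    dim = len(frames[0])
    T = len(frames)
    means = [sum(f[d] for f in frames) / T for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


def num_frames(duration_ms: float, window_ms: float = 32.5, shift_ms: float = 10.0) -> int:
    """Number of complete analysis windows in a signal of the given duration."""
    if duration_ms < window_ms:
        return 0
    return int((duration_ms - window_ms) // shift_ms) + 1


FEATURE_DIM = 16 + 16 + 1  # cepstra + delta cepstra + delta log-energy
```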
Frame removal algorithm applied to the feature vectors
Speaker and background models
The SVM model was trained on a development corpus (coming from
A linear kernel is used
There are 519 true-target speaker accesses
5489 tests on the evaluation corpus (449 true-target speakers
dndt = different number, different type; dnst = different number, same type
No normalization
Better results with the GMM-SVM method in all the
The proposed method seems to be more robust to channel
Different kernel types and features will be tested
Other normalization techniques
Another feature representation will be tested to