SLIDE 1
Value and Interpretation
in Forensic Automatic Speaker Recognition
Speech Processing and Biometrics Group, Swiss Federal Institute of Technology Lausanne (EPFL)
Forensic Voice Comparison and Forensic Acoustics
3aSC1 Special Session on Forensic Voice Comparison and Forensic Acoustics @ 2nd Pan-American/Iberian Meeting on Acoustics, Cancún, México, 15–19 November, 2010 http://cancun2010.forensic-voice-comparison.net
SLIDE 2
European Network of Forensic Science Institutes
Forensic Speech and Audio Analysis Working Group
SLIDE 3
Outline
- Forensics and Biometrics
- Forensic Speaker Recognition (FSR)
- Bayesian Interpretation of Forensic Evidence
- Forensic Automatic Speaker Recognition (FASR)
- Automatic Speaker Recognition (ASR)
- Deterministic and Statistical Methods
- Voice as Biometric Evidence
- FASR: Univariate (Scoring) and Multivariate (Direct) Methods
SLIDE 4
Forensics
- Forensic science (forensics) refers to the application of scientific principles and technical methods to the investigation of criminal activities, in order to demonstrate the existence of a crime and to determine the identity of its author(s) and their modus operandi.
– Forensic (adj.) refers to the use of science or technology in the investigation and establishment of facts or evidence in a court of law.
- Biometrics is the science of establishing the identity of individuals based on their biological and behavioral characteristics.
SLIDE 5
Forensic Speaker Recognition
Forensic speaker recognition (FSR) is the process of determining whether a specific individual (the suspected speaker) is the source of a questioned voice recording (the trace).
[Figure: casework setting with the trace (questioned recording) and the suspect]
SLIDE 6
Forensic Speaker Recognition
– earwitnesses, line-ups
- Visual methods and « voiceprint? »
– visual comparison of spectrograms of linguistically identical utterances (utterly misleading!)
- Aural-instrumental methods
– analytical acoustic approach combined with an auditory phonetic analysis
- Automatic methods
– Speaker verification: not adequate
– Speaker identification: not adequate
– Bayesian framework for the evaluation of voice as biometric evidence
Despite recent advances in Bayesian statistics, it is critical not to lose sight of the fact that these methods are merely tools.
SLIDE 7
Automatic Speaker Recognition
- Speaker recognition is the general pattern recognition term covering all the different tasks of discriminating between people based on the sound of their voices.
- Speaker identification is the task of deciding, given a sample of speech, who among many candidate speakers said it. This is an N-class decision task, where N is the number of candidate speakers.
- Speaker verification is the task of deciding, given a sample of speech, whether a specified candidate speaker said it. This is a 2-class decision task and is sometimes referred to as a speaker detection task.
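The difference between the two tasks fits in a few lines of Python. This is a minimal illustration. It assumes every candidate model has already produced a similarity score for the test sample, with higher meaning more similar. The function names, speaker labels, scores and threshold are hypothetical.

```python
def identify(scores):
    """Speaker identification (N-class task): pick the candidate with the highest score."""
    return max(scores, key=scores.get)


def verify(score, threshold):
    """Speaker verification (2-class task): accept the claimed identity if the score reaches the threshold."""
    return score >= threshold


# Hypothetical scores of one test sample against three candidate speaker models.
scores = {"speaker_a": -42.0, "speaker_b": -37.5, "speaker_c": -40.1}
print(identify(scores))                    # N-class decision over all candidates
print(verify(scores["speaker_a"], -39.0))  # 2-class decision for one claimed speaker
```

Both functions return hard decisions. In forensic work such decisions are considered not adequate, because the expert should report the strength of the evidence instead.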
SLIDE 8
Forensic Automatic Speaker Recognition
- Forensic automatic speaker recognition: a data-driven methodology for the quantitative interpretation of recorded speech as evidence
- The interpretation of recorded voice as evidence in the forensic context presents particular challenges, including within-speaker (within-source) variability, between-speakers (between-sources) variability, and differences in recording session conditions.
- Consequently, FASR methods should provide a probabilistic evaluation that gives the court an indication of the strength of the evidence, given the estimated within-source, between-sources and between-session variability; this evaluation should also be compatible with the interpretation of evidence in other forensic disciplines.
- The Bayesian interpretation framework, using the likelihood ratio concept, offers such interoperability.
Bayesian probability statements are about states of mind over states of the world, and not about states of the world per se.
SLIDE 9
Forensic specificity
- Short utterances
- Questioned recording - uncontrolled environment
- Investigations in controlled conditions (longer utterances)
- Telephone quality (95%)
- Clear understanding of the inferential process
- Respective duties of the actors involved in the judicial process: jurists, forensic experts, judges, etc. The forensic expert’s role is to testify to the worth of the evidence by using, if possible, a quantitative measure of this worth. It is up to the judge and/or the jury to use this information as an aid to their deliberations and decision.
SLIDE 10
Inference and Reasoning
- The role of forensic science is the provision of information (factual or opinion) to help answer questions of importance to investigators and to courts of law.
- In developing an opinion, the forensic expert has to utilise some form of inference process (from observations to the source).
– Deductive reasoning occurs in those situations where a logical rule can be applied to a particular set of observations.
– Induction is the process of reasoning from a set of observations within a framework of incomplete knowledge.
- Hypothetico-deductive method combined with statistical inference and inductive reasoning for forensic automatic speaker recognition: Bayesian interpretation of evidence
SLIDE 11
Evaluative forensic science opinion
- Evaluative opinion – an opinion of evidential weight, based upon case-specific propositions and clear conditioning information (framework of circumstances), that is provided for use as evidence in court.
- An evaluative opinion is an opinion based upon the estimation of a likelihood ratio.
– UK Association of Forensic Science Providers, "Standards for the formulation of evaluative forensic science expert opinion", Science and Justice 49 (2009), 161-164.
SLIDE 12
Adversary System
- The speaker at the origin of the questioned recording is not the suspected speaker
- The suspected speaker is the source of the questioned recording
Expert opinion testimony has to be carefully documented, and expressed with precision, in as neutral and objective a way as the adversary system permits.
SLIDE 13
Bayesian Interpretation of Forensic Evidence
Principle
The Bayesian model, proposed for forensic speaker recognition by Lewis in 1984, allows a measure of uncertainty about the pair of competing hypotheses to be revised on the basis of new information, by means of the likelihood ratio of the evidence (province of the forensic expert).
The Bayesian model shows how new data (the questioned recording) can be combined with prior background knowledge (prior odds, province of the court) to give posterior odds (province of the court) for judicial outcomes or issues.
prior odds × ? = posterior odds
Bayes’ Theorem tells us how we should rationally update subjective, probabilistic beliefs in light of evidence.
SLIDE 14
Bayesian Interpretation of Forensic Evidence
The odds form of Bayes’ theorem:
$$\frac{P(H_0)}{P(H_1)} \times \frac{P(E \mid H_0)}{P(E \mid H_1)} = \frac{P(H_0 \mid E)}{P(H_1 \mid E)}$$
- Prior odds $P(H_0)/P(H_1)$ (prior background knowledge): province of the court
- Likelihood ratio (LR) $P(E \mid H_0)/P(E \mid H_1)$ (new data): province of the forensic expert
- Posterior odds $P(H_0 \mid E)/P(H_1 \mid E)$ (posterior knowledge): province of the court
Subjective probabilities are whatever a particular person believes, provided they satisfy the axioms of probability.
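Here is a worked example of the odds form. The sketch multiplies a prior odds value by a likelihood ratio and converts the result back into a probability. The prior odds of 1/100 are purely hypothetical, since assessing them is the court's task. The likelihood ratio of 9.16 is the value interpreted later in this talk.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' theorem: posterior odds = prior odds x likelihood ratio."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favour of H0 into the probability of H0."""
    return odds / (1.0 + odds)


prior = 1 / 100  # hypothetical prior odds P(H0)/P(H1), province of the court
lr = 9.16        # likelihood ratio P(E|H0)/P(E|H1), province of the forensic expert
post = posterior_odds(prior, lr)
print(round(post, 4), round(odds_to_probability(post), 4))  # 0.0916 0.0839
```

The likelihood ratio supports H0, yet the posterior probability stays well below one half because the hypothetical prior odds are low. This is why the expert reports the LR and leaves the prior odds to the court.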
SLIDE 15
Bayesian Interpretation of Forensic Evidence
- H0 – the suspected speaker is the source of the questioned recording
- H1 – the speaker at the origin of the questioned recording is not the suspected speaker
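$$\frac{P(H_0)}{P(H_1)} \times \frac{P(E \mid H_0)}{P(E \mid H_1)} = \frac{P(H_0 \mid E)}{P(H_1 \mid E)}$$
Likelihood ratio (strength of evidence): $LR = \frac{P(E \mid H_0)}{P(E \mid H_1)}$, where the numerator reflects similarity and the denominator reflects typicality.
Open questions: evidence evaluation and its value? Relevance and the formulation of propositions?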
SLIDE 16
Bayesian Interpretation of Forensic Evidence
- At a high level of abstraction, Bayesian data analysis is extremely simple:
– it always follows the same basic recipe: via Bayes’ rule, we use the data to update prior beliefs about unknowns
- There is much to be said about the implementation of this procedure in any specific application (e.g. FASR):
– Freedom in choosing how the evidence and its value are evaluated
– Freedom in formulating propositions (and the corresponding mathematical models) relevant to the case
– Freedom in choosing the automatic speaker recognition method
SLIDE 17
Automatic Speaker Recognition
[Figure: in training, feature extraction from the speech wave yields reference models/templates for each speaker; in recognition, the extracted features are compared with these models by similarity/distance to produce the recognition results]
A speaker model is a representation of the identity of a speaker obtained from a speech utterance.
SLIDE 18
Principal structure of speaker recognition systems
[Figure: training path (speech wave 2 → feature extraction → models for each speaker) and testing path (speech wave 1 → feature extraction → similarity (distance) to the speaker models → score)]
Text-dependent methods:
- Dynamic Time Warping (DTW)
- Hidden Markov Models (HMMs)
Text-independent methods:
- Vector Quantization (VQ)
- Gaussian Mixture Models (GMMs)
SLIDE 19
Deterministic and Statistical Methods
- Deterministic methods
– Dynamic Time Warping (DTW)
– Vector Quantization (VQ)
– …
- Statistical methods
– Hidden Markov Model (HMM)
– Gaussian Mixture Model (GMM)
– …
SLIDE 20
Dynamic Time Warping (DTW)
[Figure: a test sequence of acoustic vectors aligned with a reference template (acoustic vectors) so as to minimise the accumulated distance]
SLIDE 21
Dynamic Time Warping (DTW)
[Figure: grid of test frames (acoustic vectors) against reference acoustic vectors; the local distances along the warping path add up to the accumulated distance]
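The accumulated-distance computation in the figure can be written as dynamic programming. This minimal version assumes Euclidean local distances and the common symmetric step pattern (diagonal, vertical and horizontal moves). Practical systems also use slope constraints and path-length normalisation, which are omitted here. The toy sequences are hypothetical.

```python
import math


def dtw_distance(test, reference):
    """Accumulated DTW distance between two sequences of acoustic vectors."""
    T, R = len(test), len(reference)
    # acc[i][j] = smallest accumulated distance aligning test[:i] with reference[:j]
    acc = [[math.inf] * (R + 1) for _ in range(T + 1)]
    acc[0][0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, R + 1):
            local = math.dist(test[i - 1], reference[j - 1])  # local (Euclidean) distance
            acc[i][j] = local + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    return acc[T][R]


reference = [[0.0], [1.0], [2.0]]           # toy 1-D "acoustic vectors"
test = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # same contour, spoken more slowly
print(dtw_distance(test, reference))        # 0.0: the warping absorbs the tempo change
```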
SLIDE 22
Vector Quantization (VQ)
[Figure: spectral envelopes of a speaker's speech summarised by a speaker-specific codebook]
SLIDE 23
Vector Quantization (VQ)
[Figure: each frame's acoustic vector is matched to the nearest codebook vector; the local distances are summed over frames into the accumulated distance]
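Codebook training and scoring can be sketched as follows. The sketch assumes k-means clustering, initialised with the first training vectors, to build the speaker-specific codebook. It uses the average nearest-codeword distance as the accumulated distance. The LBG splitting algorithm is a common alternative for training. All data are hypothetical.

```python
import math


def train_codebook(vectors, size, iterations=20):
    """Build a speaker-specific codebook with plain k-means (initialised with the first `size` vectors)."""
    codebook = [list(v) for v in vectors[:size]]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            # assign each training vector to its nearest codeword
            nearest = min(range(len(codebook)), key=lambda k: math.dist(v, codebook[k]))
            clusters[nearest].append(v)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                codebook[k] = [sum(column) / len(members) for column in zip(*members)]
    return codebook


def vq_distortion(vectors, codebook):
    """Average local distance from each test frame to its nearest codeword (lower = more similar)."""
    return sum(min(math.dist(v, c) for c in codebook) for v in vectors) / len(vectors)


training = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]  # toy 2-D acoustic vectors
codebook = train_codebook(training, size=2)
print(codebook)
print(vq_distortion([[0.0, 0.4], [10.0, 10.6]], codebook))
```

To compare candidate speakers, the same test frames would be scored against each speaker's codebook. The lowest distortion indicates the closest match.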
SLIDE 24
Hidden Markov Model (HMM)
[Figure: phoneme models (phonemes k-1, k, k+1) built from three-state HMMs and aligned with the feature vectors over time; each state i has an output probability b_i(x)]
SLIDE 25
Hidden Markov Model (HMM)
[Figure: frames (acoustic vectors) aligned with the states of the model; local distances are accumulated into the accumulated distance (likelihood), which is the score]
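Scoring a sequence of acoustic vectors against an HMM can be done with the forward algorithm, which sums over all state paths. To keep the sketch short, it assumes one-dimensional features with a single Gaussian output density b_i(x) per state. The parameter values are hypothetical. Replacing the log-sum-exp over predecessor states with a maximum would give the Viterbi (best-path) score instead.

```python
import math


def log_gauss(x, mean, var):
    """Log of a 1-D Gaussian output density b_i(x)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def safe_log(p):
    return math.log(p) if p > 0 else -math.inf


def hmm_log_likelihood(frames, start, trans, means, variances):
    """Forward algorithm: log p(frames | HMM), summed over all state paths."""
    n = len(start)
    log_trans = [[safe_log(p) for p in row] for row in trans]
    # alpha[i] = log p(x_1..x_t, state_t = i)
    alpha = [safe_log(start[i]) + log_gauss(frames[0], means[i], variances[i]) for i in range(n)]
    for x in frames[1:]:
        alpha = [
            log_sum_exp([alpha[i] + log_trans[i][j] for i in range(n)]) + log_gauss(x, means[j], variances[j])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


start = [1.0, 0.0, 0.0]  # hypothetical 3-state left-to-right model
trans = [[0.6, 0.4, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]]
means, variances = [0.0, 2.0, 4.0], [1.0, 1.0, 1.0]
print(hmm_log_likelihood([0.1, 1.8, 2.2, 3.9], start, trans, means, variances))
```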
SLIDE 26
Gaussian Mixture Model (GMM)
Acoustic vectors for training the GMM:
$$\mathbf{v}(t) = \left[v_1(t),\, v_2(t),\, \ldots,\, v_D(t)\right]^{\mathsf{T}}, \qquad t = 1, 2, \ldots$$
[Figure: histograms of Feature 1, Feature 2, …, Feature D]
score = likelihood (speech | model)
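The GMM score can be computed as the average log-likelihood of the frames under the model. This sketch assumes diagonal covariance matrices. It averages over frames so that recordings of different lengths give comparable scores. Training the model (usually with the EM algorithm) is not shown, and the parameters are hypothetical.

```python
import math


def gmm_score(frames, weights, means, variances):
    """Average per-frame log-likelihood log p(x | GMM), diagonal covariances."""
    total = 0.0
    for x in frames:
        # log of (weight x Gaussian density) for each mixture component
        component_logs = [
            math.log(w) + sum(-0.5 * (math.log(2.0 * math.pi * s2) + (xd - md) ** 2 / s2)
                              for xd, md, s2 in zip(x, mu, var))
            for w, mu, var in zip(weights, means, variances)
        ]
        top = max(component_logs)
        # log-sum-exp over components avoids underflow for frames far from every Gaussian
        total += top + math.log(sum(math.exp(c - top) for c in component_logs))
    return total / len(frames)


weights = [0.5, 0.5]                  # hypothetical two-component model, D = 2
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
print(gmm_score([[0.2, -0.1], [2.9, 3.1]], weights, means, variances))
```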
SLIDE 27
Gaussian Mixture Model
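[Figure: each frame's acoustic vector is evaluated against the Gaussian distributions of the model, highlighting the most probable one; the local distances (likelihoods) are accumulated over frames into the accumulated distance (likelihood), which is the score]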
SLIDE 28
Voice as Biometric Evidence
- In the case of a questioned recording (trace), the biometric evidence does not consist of the speech itself, but of the quantified degree of similarity between speaker-dependent features extracted from the trace and speaker-dependent features extracted from recorded speech of a suspect, represented by his/her model.
Value of biometric evidence
SLIDE 29
Univariate (Scoring) Method
[Figure: features extracted from the trace (questioned recording) are compared (similarity/distance) with the suspected speaker model, trained on the suspected speaker reference database (R); the resulting score is the evidence (E), whose significance calls for Bayesian interpretation]
SLIDE 30
Within-source variability
[Figure: features extracted from the suspected speaker control database (C) are compared with the suspected speaker model, trained on the suspected speaker reference database (R); the resulting scores give the distribution of the within-source variability]
SLIDE 31
Between-sources Variability
[Figure: features of the trace (questioned recording) are compared with the speaker models of the potential population, built from the potential population database (P); the resulting scores give the distribution of the between-sources variability]
SLIDE 32
Univariate (Scoring) Method
[Figure: casework data comprising the trace, the suspect and the relevant population]
- Suspected speaker reference database (R)
- Suspected speaker control database (C)
- Potential population database (P)
SLIDE 33
Strength of Evidence - Likelihood Ratio
A likelihood ratio of 9.16 means that the evidence is 9.16 times more likely to be observed given hypothesis H0 (the suspect is the source of the questioned recording) than given hypothesis H1 (another speaker from the relevant population is the source of the questioned recording).
Interpretation of Biometric Evidence
SLIDE 34
Univariate (Scoring) Method
[Figure: the trace and the suspected speaker control database (C) undergo feature extraction; the suspected speaker reference database (R) and the potential population database (P) undergo feature extraction and modelling, giving the suspected speaker model and the relevant speakers models. Comparative analysis of the trace features with the suspected speaker model gives the evidence (E); comparative analyses of the C features with the suspected speaker model and of the trace features with the relevant speakers models give two sets of similarity scores]
SLIDE 35
Interpretation of the evidence
[Figure: the two sets of similarity scores are used to model the within-source variability and the between-sources variability; the two resulting distributions, evaluated at the evidence (E), give the numerator and the denominator of the likelihood ratio (LR)]
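The two-level computation can be illustrated for a single evidence score. This sketch assumes that both variability distributions are estimated by Gaussian kernel density estimation with a fixed, hand-picked bandwidth. Other density models or bandwidth selection rules could be substituted. All score values are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def scoring_lr(evidence, within_scores, between_scores, bandwidth=0.5):
    """LR = p(E | H0) / p(E | H1), each density estimated from similarity scores.

    A production version would guard against a zero denominator when E lies far
    from every between-sources score.
    """
    numerator = gaussian_kde(within_scores, bandwidth)(evidence)     # within-source variability
    denominator = gaussian_kde(between_scores, bandwidth)(evidence)  # between-sources variability
    return numerator / denominator


within = [11.8, 12.4, 12.9, 13.3, 12.1]  # hypothetical scores: control recordings (C) vs suspect model
between = [8.2, 9.0, 9.7, 10.4, 11.1]    # hypothetical scores: trace vs potential population models
print(scoring_lr(12.0, within, between))
```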
SLIDE 36
Double Statistical Model (Scoring method)
[Figure panels: first level, GMM of the features; second level, models of within-source and between-sources variability; distributions of LRs for an individual case; Tippett plots (cumulative distributions of LRs) showing performance across several cases]
Evaluation of the Strength of Evidence
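A Tippett plot is built by counting, for each threshold, the proportion of likelihood ratios above it. This is done separately for comparisons where H0 is true and where H1 is true. This minimal sketch assumes log10(LR) values and a strict "greater than" convention. The LR values and threshold grid are hypothetical.

```python
def tippett_curve(log_lrs, thresholds):
    """For each threshold t, the proportion of log10(LR) values strictly greater than t."""
    n = len(log_lrs)
    return [sum(1 for v in log_lrs if v > t) / n for t in thresholds]


h0_true = [0.8, 1.5, 2.1, -0.2]    # hypothetical log10 LRs, suspect is the source
h1_true = [-1.9, -0.7, 0.3, -2.5]  # hypothetical log10 LRs, suspect is not the source
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(tippett_curve(h0_true, grid))
print(tippett_curve(h1_true, grid))
```

Plotting both curves against the threshold gives the Tippett plot. The further apart the H0-true and H1-true curves are, the better the system separates same-source from different-source comparisons.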
SLIDE 37
Multivariate (Direct) Method
- The odds form of Bayes’ theorem
- H0 – the speaker’s model (λ) and the questioned recording (T) have the same source
- H1 – the speaker’s model (λ) and the questioned recording (T) have different sources
$$\frac{P(H_0)}{P(H_1)} \times \frac{P(T \mid H_0)}{P(T \mid H_1)} = \frac{P(H_0 \mid T)}{P(H_1 \mid T)}$$
Likelihood ratio: $\frac{P(T \mid H_0)}{P(T \mid H_1)}$
Strength of evidence?
SLIDE 38
Multivariate (Direct) Method
- The odds form of Bayes’ theorem
- H0 – the speaker’s model (λ) and the multivariate representation of the trace (T) have the same source
- H1 – the speaker’s model (λ) and the multivariate representation of the trace (T) have different sources
$$\frac{P(H_0)}{P(H_1)} \times \frac{P(E \mid H_0)}{P(E \mid H_1)} = \frac{P(H_0 \mid E)}{P(H_1 \mid E)}$$
Likelihood ratio: $\frac{P(E \mid H_0)}{P(E \mid H_1)}$, the strength of the trace evidence with respect to the new hypotheses, where E is the multivariate feature representation of the trace evidence
SLIDE 39
Multivariate (Direct) Method – LR Numerator
[Figure: the suspected speaker reference database (R) is used to build a multivariate model of the suspected speaker; the features of the trace (questioned recording), the evidence (E), are scored against this model, giving the numerator of the likelihood ratio: score = log-likelihood(trace | H0)]
SLIDE 40
Multivariate (Direct) Method – LR Denominator
[Figure: the potential population database (P) is used to build a multivariate model of all speakers of the potential population; the features of the trace (questioned recording), the evidence (E), are scored against this model, giving the denominator of the likelihood ratio: score = log-likelihood(trace | H1)]
SLIDE 41
Multivariate (Direct) Method
[Figure: feature extraction from the trace, together with feature extraction and modelling of the suspected speaker reference database (R) and the potential population database (P), gives the suspected speaker model and the relevant speakers model (GMM 1 and GMM 2); comparative analysis of the trace features (E) with the two models gives score 1 (numerator) and score 2 (denominator), which together give the likelihood ratio (LR)]
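The direct method reduces to the difference of two log-likelihoods of the trace features. One is computed under the suspected speaker's GMM (numerator) and the other under the potential population GMM (denominator). This sketch assumes one-dimensional features, sums log-likelihoods over frames and uses hypothetical model parameters. Practical systems use multidimensional features and often normalise by the number of frames.

```python
import math


def gmm_log_likelihood(frames, weights, means, variances):
    """Sum over frames of log p(x | GMM) for 1-D features (simplifying assumption)."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        total += top + math.log(sum(math.exp(c - top) for c in logs))  # stable log-sum-exp
    return total


def direct_log_lr(trace, suspect_gmm, population_gmm):
    """log LR = log p(trace | suspect model, H0) - log p(trace | population model, H1)."""
    score_1 = gmm_log_likelihood(trace, *suspect_gmm)     # numerator
    score_2 = gmm_log_likelihood(trace, *population_gmm)  # denominator
    return score_1 - score_2


# Hypothetical models given as (weights, means, variances).
suspect = ([0.7, 0.3], [1.0, 2.5], [0.5, 0.8])
population = ([0.5, 0.5], [0.0, 3.0], [2.0, 2.0])
trace = [0.9, 1.2, 2.4, 1.1]
log_lr = direct_log_lr(trace, suspect, population)
print(log_lr, math.exp(log_lr))
```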
SLIDE 42
Conclusions
- Statistical evaluation, and particularly Bayesian methods such as the calculation of likelihood ratios based on automatic (deterministic and statistical) pattern recognition methods, have been criticized, but they are the only demonstrably rational means of quantifying and evaluating the value of biometric evidence available at the moment.
- The data-driven methodology provides a coherent way of assessing and presenting the biometric evidence of a questioned recording.
- Future methods developed for the interpretation of voice as forensic evidence should combine the objectivity of automatic signal processing and pattern recognition with the methodological transparency required in forensic investigations.
SLIDE 43
References (1998-2009)
- D. Meuwly, M. El-Maliki, A. Drygajlo, "Forensic Speaker Recognition Using Gaussian Mixture Models and a Bayesian Framework", COST 250 Workshop on Speaker Recognition by Man and by Machine: Directions for Forensic Applications, Ankara, Turkey, April 1998, pp. 52-55.
- D. Meuwly, A. Drygajlo, "Forensic Speaker Recognition Based on a Bayesian Framework and Gaussian Mixture Modelling (GMM)", The Workshop on Speaker Recognition "2001: A Speaker Odyssey", Crete, Greece, June 2001, pp. 145-150.
- D. Meuwly, "Reconnaissance de locuteurs en sciences forensiques: l’apport d’une approche automatique", PhD thesis, IPSC, University of Lausanne, 2001.
- A. Drygajlo, D. Meuwly, A. Alexander, "Statistical Methods and Bayesian Interpretation of Evidence in Forensic Automatic Speaker Recognition", EUROSPEECH'2003, Geneva, Switzerland, Sept. 2003, pp. 689-692.
- A. Alexander, A. Drygajlo, "Scoring and Direct Methods for the Interpretation of Evidence in Forensic Speaker Recognition", ICSLP 2004, Jeju, Korea, 2004.
- A. Alexander, F. Botti, D. Dessimoz, A. Drygajlo, "The Effect of Mismatched Recording Conditions on Human and Automatic Speaker Recognition in Forensic Applications", Forensic Science International, 146S (2004), pp. S95-S99.
SLIDE 44
References (1998-2009)
- A. Alexander, D. Dessimoz, F. Botti, A. Drygajlo, "Aural and Automatic Forensic Speaker Recognition in Mismatched Conditions", The International Journal of Speech, Language and the Law, vol. 12, Dec. 2005, pp. 214-234.
- M. Arcienega, A. Alexander, P. Zimmermann, A. Drygajlo, "A Bayesian Network Approach Combining Pitch and Spectral Envelope Features to Reduce Channel Mismatch in Speaker Verification and Forensic Speaker Recognition", InterSpeech 2005, Lisbon, Portugal, Sept. 4-8, 2005.
- J. Gonzalez-Rodriguez, A. Drygajlo, D. Ramos-Castro, M. Garcia-Gomar, J. Ortega-Garcia, "Robust Estimation, Interpretation and Assessment of Likelihood Ratios in Forensic Speaker Recognition", invited paper, Computer Speech and Language, vol. 20, 2006, pp. 331-355.
- A. Drygajlo, "Forensic Automatic Speaker Recognition", IEEE Signal Processing Magazine, vol. 24, no. 2, 2007, pp. 132-135.
- A. Drygajlo, "Statistical Evaluation of Biometric Evidence in Forensic Automatic Speaker Recognition", 3rd International Workshop on Computational Forensics (IWCF 2009), The Hague, The Netherlands, 2009, pp. 1-12.
- A. Drygajlo, "Forensic evidence of voice", chapter in S.Z. Li (Ed.), Encyclopedia of Biometrics, Springer, New York, August 2009, pp. 1388-1396.