SLIDE 1

Value and Interpretation of Biometric Evidence in Forensic Automatic Speaker Recognition

Dr. Andrzej Drygajlo

Speech Processing and Biometrics Group Swiss Federal Institute of Technology Lausanne (EPFL)

Forensic Voice Comparison and Forensic Acoustics

3aSC1 Special Session on Forensic Voice Comparison and Forensic Acoustics @ 2nd Pan-American/Iberian Meeting on Acoustics, Cancún, México, 15–19 November, 2010 http://cancun2010.forensic-voice-comparison.net

SLIDE 2

European Network of Forensic Science Institutes

Forensic Speech and Audio Analysis Working Group

SLIDE 3

Outline

Forensics and Biometrics
Forensic Speaker Recognition (FSR)
Bayesian Interpretation of Forensic Evidence
Forensic Automatic Speaker Recognition (FASR)
Automatic Speaker Recognition (ASR)
Deterministic and Statistical Methods
Voice as Biometric Evidence
FASR - Univariate (Scoring) and Multivariate (Direct) Methods

Conclusions

SLIDE 4

Forensics

Forensic science (forensics) refers to the application of scientific principles and technical methods to the investigation of criminal activities, in order to demonstrate the existence of a crime and to determine the identity of its author(s) and their modus operandi.

– Forensic (adj.) means the use of science or technology in the investigation and establishment of facts or evidence in a court of law.

Biometrics is the science of establishing the identity of individuals based on their biological and behavioral characteristics.

SLIDE 5

Forensic Speaker Recognition

Forensic speaker recognition (FSR) is the process of determining whether a specific individual (suspected speaker) is the source of a questioned voice recording (trace).

[Figure: casework diagram relating the trace (questioned recording) to the suspect.]

SLIDE 6

Forensic Speaker Recognition

Aural-perceptual methods

– earwitnesses, line-ups

Visual methods and « voiceprint? »

– visual comparison of spectrograms of linguistically identical utterances (utterly misleading!)

Aural-instrumental methods

– analytical acoustic approach combined with an auditory phonetic analysis

Automatic methods

– Speaker verification: not adequate
– Speaker identification: not adequate
– Bayesian framework for the evaluation of voice as biometric evidence

Despite recent advances in Bayesian statistics, it is critical not to lose sight of the fact that these methods are merely tools.

SLIDE 7

Automatic Speaker Recognition

Speaker recognition is the general pattern recognition term used to include all of the many different tasks of discriminating people based on the sound of their voices.

Speaker identification is the task of deciding, given a sample of speech, who among many candidate speakers said it. This is an N-class decision task, where N is the number of candidate speakers.

Speaker verification is the task of deciding, given a sample of speech, whether a specified candidate speaker said it. This is a 2-class decision task and is sometimes referred to as a speaker detection task.
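The difference between the N-class and the 2-class decision can be sketched in a few lines of Python. This is only an illustration: the speaker names, the scores and the threshold are hypothetical values, and the scores are assumed to come from some similarity measure where higher means more similar.

```python
def identify(scores):
    """N-class decision: pick the candidate speaker with the highest score."""
    return max(scores, key=scores.get)


def verify(score, threshold):
    """2-class decision: accept the specified speaker if the score reaches the threshold."""
    return score >= threshold


# Hypothetical similarity scores of one speech sample against three candidates.
scores = {"speaker_a": -41.2, "speaker_b": -37.5, "speaker_c": -44.0}

best = identify(scores)                         # identification among N = 3 candidates
accepted = verify(scores["speaker_b"], -40.0)   # verification of one specified candidate
```

Both functions return a hard decision rather than a measure of evidential strength, which is the limitation the forensic slides raise when they call verification and identification "not adequate".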

SLIDE 8

Forensic Automatic Speaker Recognition

Forensic automatic speaker recognition (FASR) is a data-driven methodology for the quantitative interpretation of recorded speech as evidence.

The interpretation of recorded voice as evidence in the forensic context presents particular challenges, including within-speaker (within-source) variability, between-speakers (between-sources) variability, and differences in recording session conditions.

Consequently, FASR methods should provide a probabilistic evaluation that gives the court an indication of the strength of the evidence given the estimated within-source, between-sources and between-session variabilities, and this evaluation should be compatible with interpretations in other forensic disciplines.

The Bayesian interpretation framework, using the likelihood ratio concept, offers such interoperability.

Bayesian probability statements are about states of mind over states of the world, and not about states of the world per se.

SLIDE 9

Forensic specificity

Short utterances
Questioned recording - uncontrolled environment
Investigations in controlled conditions (longer utterances)
Telephone quality (95%)
Clear understanding of the inferential process
Respective duties of the actors involved in the judicial process: jurists, forensic experts, judges, etc.

The forensic expert’s role is to testify to the worth of the evidence by using, if possible, a quantitative measure of this worth. It is up to the judge and/or the jury to use this information as an aid to their deliberations and decision.

SLIDE 10

Inference and Reasoning

The role of forensic science is the provision of information (factual or opinion) to help answer questions of importance to investigators and to courts of law.

In developing an opinion, the forensic expert has to utilise some form of inference process (from observations to the source).

Reasoning

– Deductive reasoning occurs in those situations where a logical rule can be applied to a particular set of observations.
– Induction is the process of reasoning from a set of observations within a framework of incomplete knowledge.

Hypothetico-deductive method combined with statistical inference and inductive reasoning for forensic automatic speaker recognition: Bayesian interpretation of evidence

SLIDE 11

Evaluative forensic science opinion

Evaluative opinion: an opinion of evidential weight, based upon case-specific propositions and clear conditioning information (framework of circumstances), that is provided for use as evidence in court.

An evaluative opinion is an opinion based upon the estimation of a likelihood ratio.

– UK Association of Forensic Science Providers, "Standards for the formulation of evaluative forensic science expert opinion", Science and Justice 49 (2009), 161-164.

SLIDE 12

Adversary System

– The speaker at the origin of the questioned recording is not the suspected speaker
– The suspected speaker is the source of the questioned recording

Expert opinion testimony has to be carefully documented, and expressed with precision, in as neutral and objective a way as the adversary system permits.

SLIDE 13

Bayesian Interpretation of Forensic Evidence

Principle

The Bayesian model, proposed for forensic speaker recognition by Lewis in 1984, allows for revision, based on new information, of a measure of uncertainty (the likelihood ratio of the evidence, province of the forensic expert) which is applied to the pair of competing hypotheses.

The Bayesian model shows how new data (questioned recording) can be combined with prior background knowledge (prior odds, province of the court) to give posterior odds (province of the court) for judicial outcomes or issues.

prior odds x ? = posterior odds

Bayes’ Theorem tells us how we should rationally update subjective, probabilistic beliefs in light of evidence.

SLIDE 14

Bayesian Interpretation of Forensic Evidence

$$\frac{P(H_0)}{P(H_1)} \times \frac{P(E \mid H_0)}{P(E \mid H_1)} = \frac{P(H_0 \mid E)}{P(H_1 \mid E)}$$

– Prior odds: prior background knowledge (province of the court)
– Likelihood ratio (LR): new data (province of the forensic expert)
– Posterior odds: posterior knowledge on the issue (province of the court)

The odds form of Bayes’ theorem

Subjective probabilities are whatever a particular person believes, provided they satisfy the axioms of probability.

SLIDE 15

Bayesian Interpretation of Forensic Evidence

H0 – the suspected speaker is the source of the questioned recording

H1 – the speaker at the origin of the questioned recording is not the suspected speaker

$$\frac{P(H_0)}{P(H_1)} \times \frac{P(E \mid H_0)}{P(E \mid H_1)} = \frac{P(H_0 \mid E)}{P(H_1 \mid E)}$$

Likelihood ratio (strength of evidence):

$$LR = \frac{P(E \mid H_0)}{P(E \mid H_1)}$$

Similarity (numerator), typicality (denominator)

Evidence evaluation and its value? Relevance and the formulation of propositions?
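A small numerical sketch may help to show how the odds form of Bayes' theorem separates the roles. The prior odds and the likelihood ratio below are purely hypothetical values chosen for the arithmetic; in casework the prior odds belong to the court and only the likelihood ratio comes from the forensic expert.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' theorem: posterior odds = prior odds x LR."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favour of H0 into the probability of H0."""
    return odds / (1.0 + odds)


prior = 1.0 / 100.0   # hypothetical prior odds P(H0)/P(H1), set by the court
lr = 50.0             # hypothetical likelihood ratio reported by the expert

post = posterior_odds(prior, lr)            # 0.5, i.e. odds of 1 to 2
prob_h0 = odds_to_probability(post)         # one third
```

Even a likelihood ratio of 50 leaves H0 less probable than H1 here, because the hypothetical prior odds were low. This is one reason the expert reports the likelihood ratio and not a posterior probability.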

SLIDE 16

Bayesian Interpretation of Forensic Evidence

At a high level of abstraction, Bayesian data analysis is extremely simple:

– following the same basic recipe: via Bayes' rule, we use the data to update prior beliefs about unknowns

There is much to be said on the implementation of this procedure in any specific application (e.g. FASR):

– Freedom of choosing evidence evaluation and its value
– Freedom of formulating propositions (and corresponding mathematical models) in relevance to the case
– Freedom of choosing the automatic speaker recognition method

SLIDE 17

Automatic Speaker Recognition

[Figure: block diagram of an automatic speaker recognition system. Labels: speech wave, feature extraction, reference models/templates for each speaker (training), similarity/distance (recognition), recognition results.]

A speaker model is a representation of the identity of a speaker obtained from a speech utterance of known origin.

SLIDE 18

Principal structure of speaker recognition systems

[Figure: principal structure of a speaker recognition system. Labels: speech wave 1 (testing), speech wave 2 (training), feature extraction, models for each speaker, similarity (distance), score.]

Text-dependent methods:

Dynamic Time Warping (DTW)
Hidden Markov Models (HMMs)

Text-independent methods:

Vector Quantization (VQ)
Gaussian Mixture Models (GMMs)

SLIDE 19

Deterministic and Statistical Methods

Deterministic Methods

– Dynamic Time Warping (DTW)
– Vector Quantization (VQ)
– …

Statistical Methods

– Hidden Markov Model (HMM)
– Gaussian Mixture Model (GMM)
– …

SLIDE 20

Dynamic Time Warping (DTW)

[Figure: DTW alignment of test acoustic vectors against a reference template, giving an accumulated distance.]

SLIDE 21

Dynamic Time Warping (DTW)

[Figure: DTW computation. Labels: frames, acoustic vectors, reference acoustic vectors, local distance, accumulated distance.]
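The local and accumulated distances of DTW can be sketched as a small dynamic program. This is a minimal illustration, not the system behind the slides: it assumes a Euclidean local distance, the three basic step directions, and no path constraints or length normalisation, and the example vectors are hypothetical one-dimensional features.

```python
import math


def euclidean(a, b):
    """Local distance between two acoustic vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(test, reference):
    """Accumulated distance of the best time alignment of test and reference."""
    n, m = len(test), len(reference)
    inf = float("inf")
    # acc[i][j]: best accumulated distance aligning test[:i] with reference[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(test[i - 1], reference[j - 1])
            acc[i][j] = local + min(acc[i - 1][j],      # test frame repeats a reference frame
                                    acc[i][j - 1],      # reference frame repeats a test frame
                                    acc[i - 1][j - 1])  # both advance
    return acc[n][m]


# Hypothetical one-dimensional acoustic vectors.
reference = [[0.0], [1.0], [2.0]]
test = [[0.0], [0.0], [1.0], [2.0]]   # same pattern, first frame held longer

distance = dtw_distance(test, reference)
```

The stretched test sequence aligns perfectly with the reference, so the accumulated distance is zero even though the two sequences differ in length.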

SLIDE 22

Vector Quantization (VQ)

[Figure: spectral envelopes and the speaker-specific codebook.]

SLIDE 23

Vector Quantization (VQ)

[Figure: VQ scoring. Labels: frames, acoustic vectors, code-book vectors, local distance, accumulated distance.]
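VQ scoring against a speaker-specific codebook can be sketched as follows. The codebooks are assumed to be already trained (for example by a clustering step that is not shown), the local distance is assumed to be squared Euclidean, and all vectors are hypothetical.

```python
def squared_distance(a, b):
    """Local distance between an acoustic vector and a code-book vector."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_distortion(frames, codebook):
    """Average distance from each frame to its nearest code-book vector."""
    total = 0.0
    for frame in frames:
        total += min(squared_distance(frame, code) for code in codebook)
    return total / len(frames)


# Hypothetical two-dimensional codebooks of two speakers.
codebook_a = [[0.0, 0.0], [1.0, 1.0]]
codebook_b = [[5.0, 5.0], [6.0, 6.0]]

frames = [[0.0, 0.1], [0.9, 1.0]]     # hypothetical test acoustic vectors

distortion_a = vq_distortion(frames, codebook_a)
distortion_b = vq_distortion(frames, codebook_b)
```

A lower distortion indicates a closer match, so in this sketch the test frames are closer to the codebook of speaker A than to that of speaker B.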

SLIDE 24

Hidden Markov Model (HMM)

[Figure: phoneme models (phoneme k-1, phoneme k, phoneme k+1) along the time axis, with state labels 1, 2, 3, feature vectors x and output probabilities b1(x), b2(x), b3(x). Numeric values shown in the figure: 0.2, 0.4, 0.7, 0.5, 0.6, 0.3, 0.3.]

SLIDE 25

Hidden Markov Model (HMM)

[Figure: HMM scoring. Labels: frames, states, acoustic vectors, model, local distance, accumulated distance (likelihood), score.]
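One way to read the "accumulated distance (likelihood)" of an HMM is as the log-likelihood of the best state path, which the Viterbi recursion computes. The sketch below makes several assumptions that the slides do not state: one-dimensional Gaussian output distributions, a three-state left-to-right topology, and hypothetical parameter values.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def log_gaussian(x, mean, var):
    """Log of a one-dimensional Gaussian output density b(x)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_log_score(frames, initial, transitions, means, variances):
    """Log-likelihood of the best state path through the model."""
    n_states = len(means)
    delta = [safe_log(initial[s]) + log_gaussian(frames[0], means[s], variances[s])
             for s in range(n_states)]
    for x in frames[1:]:
        delta = [max(delta[p] + safe_log(transitions[p][s]) for p in range(n_states))
                 + log_gaussian(x, means[s], variances[s])
                 for s in range(n_states)]
    return max(delta)


# Hypothetical left-to-right model with three states.
initial = [1.0, 0.0, 0.0]
transitions = [[0.6, 0.4, 0.0],
               [0.0, 0.7, 0.3],
               [0.0, 0.0, 1.0]]
means = [0.0, 2.0, 4.0]
variances = [1.0, 1.0, 1.0]

matched = viterbi_log_score([0.1, 0.0, 2.1, 1.9, 4.0],
                            initial, transitions, means, variances)
mismatched = viterbi_log_score([4.0, 4.1, 2.0, 0.0, 0.1],
                               initial, transitions, means, variances)
```

The frame sequence that follows the state order of the model receives the higher score. A full forward algorithm would sum over all paths instead of keeping only the best one.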

SLIDE 26

Gaussian Mixture Model (GMM)

Acoustic vectors for training the GMM:

$$v_t = [\,v_t(1),\; v_t(2),\; \dots,\; v_t(D)\,]^{\top}, \qquad t = 1, 2, \dots, T$$

[Figure: histograms of feature 1, feature 2, ..., feature D.]

score = likelihood (speech | model)
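The score "likelihood (speech | model)" can be sketched for a GMM whose parameters are already known. The sketch assumes diagonal covariance matrices and reports the average per-frame log-likelihood; both are common choices but are assumptions here, as are all parameter values. Training (for example with the EM algorithm) is not shown.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_average_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of the frames given the GMM."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gaussian_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        peak = max(logs)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(value - peak) for value in logs))
    return total / len(frames)


# Hypothetical two-component model of a speaker with D = 2 features.
weights = [0.4, 0.6]
means = [[0.0, 0.0], [2.0, 2.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]

close_frames = [[0.1, -0.1], [2.0, 1.9], [1.8, 2.2]]
far_frames = [[8.0, 8.0], [9.0, 7.5], [7.0, 9.0]]

score_close = gmm_average_log_likelihood(close_frames, weights, means, variances)
score_far = gmm_average_log_likelihood(far_frames, weights, means, variances)
```

Frames that lie near the component means obtain the higher score, which is what the similarity stage of a GMM-based recogniser relies on.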

SLIDE 27

Gaussian Mixture Model

[Figure: GMM scoring. Labels: frames, acoustic vectors, Gaussian distributions, the most probable Gaussian distribution, local distance (likelihood), accumulated distance (likelihood), score.]

SLIDE 28

Voice as Biometric Evidence

In the case of a questioned recording (trace), the biometric evidence does not consist of the speech itself, but of the quantified degree of similarity between speaker-dependent features extracted from the trace and speaker-dependent features extracted from recorded speech of a suspect, represented by his/her model.

Value of biometric evidence

SLIDE 29

Univariate (Scoring) Method

[Figure: univariate (scoring) method. Features extracted from the questioned recording (trace) are compared, via a similarity (distance) measure, with the suspected speaker model built from the suspected speaker reference database (R). The resulting score is the evidence (E). Its significance? Bayesian interpretation.]

SLIDE 30

Within-source variability

[Figure: within-source variability. Recordings from the suspected speaker control database (C) are compared with the suspected speaker model built from the suspected speaker reference database (R); the resulting scores give the distribution of the within-source variability.]

SLIDE 31

Between-sources Variability

[Figure: between-sources variability. The questioned recording (trace) is compared with the speaker models of the potential population, built from the potential population database (P); the resulting scores give the distribution of the between-sources variability.]

SLIDE 32

Univariate (Scoring) Method

[Figure: casework databases. Trace; suspect: suspected speaker reference database (R) and suspected speaker control database (C); relevant population: potential population database (P).]

SLIDE 33

Strength of Evidence - Likelihood Ratio

A likelihood ratio of 9.16 means that the score (E) is 9.16 times more likely to be observed under hypothesis H0 (the suspect is the source of the questioned recording) than under hypothesis H1 (another speaker from the relevant population is the source of the questioned recording).

Interpretation of Biometric Evidence

SLIDE 34

Univariate (Scoring) Method

[Figure: processing chain of the univariate (scoring) method. The trace and the suspected speaker control database (C) undergo feature extraction; the suspected speaker reference database (R) and the potential population database (P) undergo feature extraction and modelling, yielding the suspected speaker model and the relevant speakers' models. Comparative analysis then produces the evidence (E) and two sets of similarity scores.]

SLIDE 35

Interpretation of the evidence

[Figure: interpretation of the evidence. One set of similarity scores is used to model the within-source variability and the other to model the between-sources variability. The two resulting distributions, evaluated at the evidence (E), give the numerator and the denominator of the likelihood ratio (LR).]
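The step from the two score distributions to a likelihood ratio can be sketched as follows. The slides do not say which density model is fitted to the scores, so the Gaussian fit below is an assumption made to keep the example short (a kernel density estimate would be another option), and all scores are hypothetical.

```python
import math
from statistics import mean, stdev


def gaussian_pdf(x, mu, sigma):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def scoring_lr(evidence, within_scores, between_scores):
    """LR = density of E under within-source model / density under between-sources model."""
    numerator = gaussian_pdf(evidence, mean(within_scores), stdev(within_scores))
    denominator = gaussian_pdf(evidence, mean(between_scores), stdev(between_scores))
    return numerator / denominator


# Hypothetical scores: control recordings (C) against the suspect model (within-source)
# and the trace against the potential population models (between-sources).
within_scores = [9.0, 10.0, 11.0]
between_scores = [-1.0, 0.0, 1.0]

lr_supports_h0 = scoring_lr(8.0, within_scores, between_scores)   # E near within-source scores
lr_neutral = scoring_lr(5.0, within_scores, between_scores)       # E halfway between
lr_supports_h1 = scoring_lr(1.0, within_scores, between_scores)   # E near between-sources scores
```

With equal spreads in both score sets, an evidence score halfway between the two means gives a likelihood ratio of 1, which supports neither hypothesis.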

SLIDE 36

Double Statistical Model (Scoring method)

First level: GMM of the features
Second level: Models of within-source and between-sources variability

[Figure: plot panels showing distributions of LRs and Tippett plots (cumulative distributions of LRs); axis ticks omitted.]

Individual case
Performance across several cases

Evaluation of the Strength of Evidence
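The cumulative distributions behind a Tippett plot can be sketched as follows. For each threshold, the curve gives the proportion of cases whose LR exceeds it, computed separately for cases where H0 is known to be true and cases where H1 is known to be true. The LR values and the function name are hypothetical.

```python
def tippett_curve(lrs, thresholds):
    """Proportion of cases with an LR greater than each threshold."""
    n = len(lrs)
    return [sum(1 for lr in lrs if lr > t) / n for t in thresholds]


# Hypothetical LRs from test cases with known ground truth.
lrs_h0_true = [0.5, 2.0, 10.0, 100.0]    # suspect really was the source
lrs_h1_true = [0.01, 0.1, 0.5, 3.0]      # another speaker was the source

thresholds = [0.1, 1.0, 10.0]

curve_h0 = tippett_curve(lrs_h0_true, thresholds)
curve_h1 = tippett_curve(lrs_h1_true, thresholds)
```

At the threshold LR = 1, the H0 curve shows how often the evidence correctly supports H0, and the H1 curve shows how often it misleadingly does so. The further apart the two curves are, the better the system performs across cases.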

SLIDE 37

Multivariate (Direct) Method

The odds form of Bayes’ theorem
H0 – the speaker’s model (λ) and the questioned recording (T) have the same source

H1 – the speaker’s model (λ1) and the questioned recording (T) have different sources

$$\frac{P(H_0)}{P(H_1)} \times \frac{P(T \mid H_0)}{P(T \mid H_1)} = \frac{P(H_0 \mid T)}{P(H_1 \mid T)}$$

Likelihood ratio:

$$\frac{P(T \mid \lambda)}{P(T \mid \lambda_1)}$$

Strength of evidence?

SLIDE 38

Multivariate (Direct) Method

The odds form of Bayes’ theorem
H0 – the speaker’s model (λ) and the multivariate representation of the trace (T) have the same source

H1 – the speaker’s model (λ1) and the multivariate representation of the trace (T) have different sources

$$\frac{P(H_0)}{P(H_1)} \times \frac{P(E \mid H_0)}{P(E \mid H_1)} = \frac{P(H_0 \mid E)}{P(H_1 \mid E)}$$

Likelihood ratio (strength of trace evidence with respect to the new hypotheses):

$$\frac{P(E \mid H_0)}{P(E \mid H_1)}$$

E – multivariate feature representation of trace evidence

SLIDE 39

Multivariate (Direct) Method – LR Numerator

[Figure: LR numerator in the multivariate (direct) method. Features extracted from the questioned recording (trace), i.e. the evidence (E), are scored against the multivariate model of the suspected speaker, built from the suspected speaker reference database (R).]

Numerator of the likelihood ratio: score = log-likelihood (trace | H0)

SLIDE 40

Multivariate (Direct) Method – LR Denominator

[Figure: LR denominator in the multivariate (direct) method. Features extracted from the questioned recording (trace), i.e. the evidence (E), are scored against the multivariate model of the potential population (model of all speakers), built from the potential population database (P).]

Denominator of the likelihood ratio: score = log-likelihood (trace | H1)

SLIDE 41

Multivariate (Direct) Method

[Figure: processing chain of the multivariate (direct) method. The trace undergoes feature extraction, giving the features E. The suspected speaker reference database (R) and the potential population database (P) undergo feature extraction and modelling, yielding the suspected speaker model and the relevant speakers' model (GMM 1, GMM 2). Comparative analysis of E against each model gives score 1 and score 2, the numerator and the denominator of the likelihood ratio (LR).]
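The direct method can be sketched as the difference of two log-likelihood scores of the same trace features, one under the suspected speaker model and one under the population model. The sketch uses one-dimensional GMMs with hypothetical parameters and treats frames as independent, so the per-frame log-likelihoods are summed; these are simplifying assumptions, and practical systems often normalise the scores further.

```python
import math


def gmm_log_likelihood(frames, weights, means, variances):
    """Sum of per-frame log-likelihoods under a one-dimensional GMM."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
                for w, m, v in zip(weights, means, variances)]
        peak = max(logs)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(value - peak) for value in logs))
    return total


def direct_log_lr(trace, suspect_gmm, population_gmm):
    """log LR = log-likelihood(trace | H0) - log-likelihood(trace | H1)."""
    return (gmm_log_likelihood(trace, *suspect_gmm)
            - gmm_log_likelihood(trace, *population_gmm))


# Hypothetical models given as (weights, means, variances).
suspect_gmm = ([0.5, 0.5], [1.0, 2.0], [0.25, 0.25])
population_gmm = ([0.5, 0.5], [0.0, 3.0], [4.0, 4.0])

trace = [1.1, 1.8, 1.4, 2.1]          # hypothetical trace features

log_lr = direct_log_lr(trace, suspect_gmm, population_gmm)
lr = math.exp(log_lr)
```

A positive log LR means the trace features are better explained by the suspected speaker model than by the population model. When both models are identical the log LR is zero, so the evidence is neutral.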

SLIDE 42

Conclusions

Statistical evaluation, and particularly Bayesian methods such as the calculation of likelihood ratios based on automatic (deterministic and statistical) pattern recognition methods, has been criticized, but these are the only demonstrably rational means of quantifying and evaluating the value of biometric evidence available at the moment.

The data-driven methodology provides a coherent way of assessing and presenting the biometric evidence of a questioned recording.

Future methods for the interpretation of voice as forensic evidence should combine the advantages of automatic signal processing and pattern recognition objectivity with the methodological transparency solicited in forensic investigations.
