

slide-1
SLIDE 1

Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case

forensic_eval_01

Geoffrey Stewart Morrison
Ewald Enzinger

$$\frac{p(E \mid H_p)}{p(E \mid H_d)}$$

slide-2
SLIDE 2

Need for testing

In forensic voice comparison, calls for validity and reliability to be empirically tested under casework conditions date back to the 1960s, but still go widely unheeded.

Across all branches of forensic science, there is now increasing pressure to validate performance before analysis systems are used to assess strength of evidence for presentation in court

– [1993, 509 US 579] Daubert v Merrell Dow Pharmaceuticals
– National Research Council Report 2009
– Forensic Science Regulator Codes of Practice 2014
– ENFSI 2015 Methodological guidelines for best practice in forensic semiautomatic and automatic speaker recognition

slide-3
SLIDE 3

forensic_eval_01

Open to operational forensic laboratories and research laboratories

Training and test data based on a real forensic case

– relevant population
– speaking styles
– recording conditions

Virtual Special Issue in Speech Communication

– introductory paper includes rules
– describe system and procedures in sufficient detail for replication
– performance metrics and graphics
– discussion and conclusion may include recommendations for practice
– submissions accepted over a 2-year timeframe

slide-4
SLIDE 4

forensic_eval_01

Casework conditions vary substantially from case to case

forensic_eval_01 evaluates systems under conditions reflecting those of one real case

Results should not be assumed to be generalisable to other case conditions

For each case, the validity and reliability of the system employed should be assessed under conditions reflecting those of that case

slide-5
SLIDE 5

Forensic Voice Comparison Case

Offender recording

Telephone call made to a financial institution’s call centre

– landline
– call centre background noise (babble, typing)
– saved in a compressed format
– 46 seconds net speech
– adult male Australian English speaker

Suspect recording

Police interview

– reverberation
– ventilation system noise
– saved in a compressed format

slide-6
SLIDE 6

Data

Male Australian English speakers
Multiple non-contemporaneous recordings per speaker
Multiple speaking tasks per recording session
High-quality audio

Offender condition

– information exchange task as input

Suspect condition

– interview task as input

[Figure: processing flow diagrams for the two conditions. Blocks shown with the offender condition: 8 kHz, 300 Hz to 3400 Hz, a-law, G.723.1, scaling, offender recording noise, compression/decompression (twice). Blocks shown with the suspect condition: MPEG-1 layer 2, scaling, suspect recording noise, compression/decompression. Signal labels: xr[i], yr[i], xn[i], s, r.]
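The diagram labels mention "scaling" and "recording noise" for both conditions. A minimal sketch of one way such a step could work is given below: noise is scaled to reach a chosen signal-to-noise ratio and then added to the clean speech. The function names and the use of an SNR target in dB are assumptions made for illustration; the slides do not state how the scaling factor was chosen.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def add_scaled_noise(speech, noise, snr_db):
    """Scale `noise` so that speech-to-noise RMS ratio equals `snr_db`, then add it.

    Hypothetical helper: the SNR target is an assumption, not taken from the slides.
    Both inputs are equal-length sequences of float samples.
    """
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals standing in for a clean recording and a recorded noise segment.
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [((i * 37) % 11 - 5) / 50.0 for i in range(1000)]
mixed = add_scaled_noise(speech, noise, snr_db=10.0)
```

In a fuller simulation this step would sit alongside the band-limiting and compression/decompression stages named in the diagram, which are not sketched here.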

slide-7
SLIDE 7

Data

Training data:

– 423 recordings from 105 speakers
– 191 recordings in offender condition
– 232 in suspect condition

Test data:

– 223 recordings from 61 speakers
– 61 recordings in offender condition
– 162 in suspect condition

slide-8
SLIDE 8

forensic_eval_01

Preliminary results from systems already tested on the forensic_eval_01 data

slide-9
SLIDE 9

Enzinger & Morrison i-vector system

1st through 14th MFCCs + deltas

– feature warping

UBM

– 512 Gaussians

T-matrix

– 400 or 200 dimensions

i-vector domain mismatch compensation

– canonical linear discriminant functions (aka LDA), 50 dimensions

PLDA

– full rank covariance for B and for W

score to likelihood ratio conversion (aka calibration)

– logistic regression
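The last stage above, score to likelihood ratio conversion by logistic regression, can be sketched as follows. This is an illustrative pure-Python version, not the authors' implementation: the function names, the gradient-ascent fitting loop, its step size, and the toy scores are all assumptions. Each class is weighted equally, so the fitted log-odds can be read directly as a natural-log likelihood ratio.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_calibration(same_scores, diff_scores, iterations=5000, rate=0.05):
    """Fit log-odds = a + b * score by gradient ascent.

    Same-speaker and different-speaker scores get equal total weight,
    which corresponds to prior odds of 1 (an assumption of this sketch).
    """
    a, b = 0.0, 1.0
    n_same, n_diff = len(same_scores), len(diff_scores)
    for _ in range(iterations):
        grad_a = grad_b = 0.0
        for s in same_scores:          # target label 1
            p = sigmoid(a + b * s)
            grad_a += (1.0 - p) / n_same
            grad_b += (1.0 - p) * s / n_same
        for s in diff_scores:          # target label 0
            p = sigmoid(a + b * s)
            grad_a -= p / n_diff
            grad_b -= p * s / n_diff
        a += rate * grad_a
        b += rate * grad_b
    return a, b


def score_to_log10_lr(score, a, b):
    """Convert a raw score to a log10 likelihood ratio with the fitted model."""
    return (a + b * score) / math.log(10)


# Toy calibration scores (hypothetical values, overlapping on purpose).
same = [2.0, 3.1, 1.2, 4.0, 0.4]
diff = [-1.5, 0.6, -2.2, -0.3, 1.4]
a, b = fit_calibration(same, diff)
```

The calibration scores would come from comparisons whose same-speaker or different-speaker status is known; the fitted `a` and `b` are then applied to the score from the comparison of interest.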

slide-10
SLIDE 10

Enzinger & Morrison i-vector system

Generic data for training models which calculate scores
Generic data for training mismatch compensation models in i-vector domain
Case specific data for training score-to-LR model

Case specific data for training models which calculate scores
Case specific + generic data for training mismatch compensation models in i-vector domain
Case specific data for training score-to-LR model

slide-11
SLIDE 11

Enzinger & Morrison i-vector system

[Figure: Cllr-mean and Cllr-pooled (axis 0.1 to 1) against 95% credible interval in ± orders of magnitude (axis 0.5 to 1.5). Conditions: Generic data; Case specific data.]
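The plot reports Cllr, the log-likelihood-ratio cost. As a reminder of what that number measures, the sketch below computes the standard Cllr from a set of same-speaker and different-speaker likelihood ratios, all pooled together. The function name and the toy values are assumptions; how the slides derive Cllr-mean and the credible interval is not stated here and is not reproduced.

```python
import math


def cllr(same_lrs, diff_lrs):
    """Log-likelihood-ratio cost from likelihood ratios (not log LRs).

    same_lrs: LRs from comparisons known to be same-speaker.
    diff_lrs: LRs from comparisons known to be different-speaker.
    """
    same_penalty = sum(math.log2(1.0 + 1.0 / lr) for lr in same_lrs) / len(same_lrs)
    diff_penalty = sum(math.log2(1.0 + lr) for lr in diff_lrs) / len(diff_lrs)
    return 0.5 * (same_penalty + diff_penalty)


# A system that always answers LR = 1 gives no information and scores exactly 1.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])

# Hypothetical LRs from a system that strongly supports the correct hypothesis.
informative = cllr([100.0, 1000.0], [0.01, 0.001])
```

Lower values are better: values near 0 indicate large likelihood ratios in the correct direction, and values above 1 indicate a system that is doing worse than always reporting a likelihood ratio of 1.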

slide-12
SLIDE 12

Enzinger & Morrison i-vector system

[Figure: cumulative proportion (0 to 1) against log10 likelihood ratio (−4 to 4). Panels: Generic data; Case specific data.]
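Plots of cumulative proportion against log10 likelihood ratio are commonly called Tippett plots. The sketch below shows how the two curves could be computed from test results. The function name, the toy values, and the plotting convention (same-speaker curve counts values at or below each point, different-speaker curve counts values at or above it) are assumptions; other conventions exist.

```python
def tippett_curves(same_log10_lrs, diff_log10_lrs, grid):
    """Return (same_curve, diff_curve), one proportion per grid point.

    same_curve[i]: proportion of same-speaker log10 LRs <= grid[i].
    diff_curve[i]: proportion of different-speaker log10 LRs >= grid[i].
    """
    same_curve = [
        sum(v <= x for v in same_log10_lrs) / len(same_log10_lrs) for x in grid
    ]
    diff_curve = [
        sum(v >= x for v in diff_log10_lrs) / len(diff_log10_lrs) for x in grid
    ]
    return same_curve, diff_curve


# Hypothetical log10 LRs; the grid spans the axis range shown on the slide.
same = [-0.5, 1.0, 2.0, 3.0]
diff = [-3.0, -2.0, -1.0, 0.5]
same_curve, diff_curve = tippett_curves(same, diff, grid=[-4.0, 0.0, 4.0])
```

Under this convention, the height of each curve at log10 LR = 0 is the proportion of test comparisons whose likelihood ratio points towards the wrong hypothesis.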

slide-13
SLIDE 13

Batvox v4.1

Evaluated by David van der Vloed, Netherlands Forensic Institute

reference population data

– all 105 speakers (1 suspect-condition recording per speaker)
– 30 selected by Batvox

imposter data

– none
– all 105 speakers (1 offender-condition recording per speaker)

slide-14
SLIDE 14

Batvox v4.1

[Figure: Cllr-mean and Cllr-pooled (axis 0.1 to 1) against 95% credible interval in ± orders of magnitude (axis 0.5 to 1.5). Legend: all reference data + no imposter data; all reference data + imposter data; selected reference data + no imposter data; selected reference data + imposter data.]

slide-15
SLIDE 15

Batvox v4.1

[Figure: Cllr-mean and Cllr-pooled (axis 0.1 to 1) against 95% credible interval in ± orders of magnitude (axis 0.5 to 1.5). Legend: all reference data + no imposter data; all reference data + imposter data; selected reference data + no imposter data; selected reference data + imposter data. Annotations: 30 reference speakers; 105 reference speakers.]

slide-16
SLIDE 16

Batvox v4.1

[Figure: Cllr-mean and Cllr-pooled (axis 0.1 to 1) against 95% credible interval in ± orders of magnitude (axis 0.5 to 1.5). Legend: all reference data + no imposter data; all reference data + imposter data; selected reference data + no imposter data; selected reference data + imposter data. Annotations: no imposters; 105 imposters.]

slide-17
SLIDE 17

Batvox v4.1

[Figure: Cllr-mean and Cllr-pooled (axis 0.1 to 1) against 95% credible interval in ± orders of magnitude (axis 0.5 to 1.5). Legend: all reference data + no imposter data; all reference data + imposter data; selected reference data + no imposter data; selected reference data + imposter data. Annotations: 105 reference speakers; 105 imposters.]

slide-18
SLIDE 18

Batvox v4.1

[Figure: two panels of cumulative proportion (0 to 1) against log10 likelihood ratio (−4 to 4). Legend: all reference data + no imposter data; all reference data + imposter data; selected reference data + no imposter data; selected reference data + imposter data. Annotations, in slide order: 105 reference speakers, no imposters, 30 reference speakers, 105 imposters.]

slide-19
SLIDE 19

Eskerrik asko (Thank you)

http://geoff-morrison.net/ http://forensic-evaluation.net/

slide-20
SLIDE 20

Best of

[Figure: Cllr-mean and Cllr-pooled (axis 0.1 to 1) against 95% credible interval in ± orders of magnitude (axis 0.5 to 1.5). Systems: Batvox v4.1; Enzinger & Morrison.]

slide-21
SLIDE 21

Best of

[Figure: two panels of cumulative proportion (0 to 1) against log10 likelihood ratio (−4 to 4). Systems: Batvox v4.1; Enzinger & Morrison.]