
SLIDE 1

Leah Bradshaw, Vincent Hughes, and Eleanor Chodroff

Department of Language and Linguistic Science, University of York

Investigating the forensic applications of global and local temporal representations of speech for dialect discrimination

SLIDE 2

Forensic phonetics

INTRODUCTION

Voice comparison
Voice analysis

SLIDE 3

INTRODUCTION

Speaker classification

Process of determining speaker-specific features (e.g., gender, age, dialect, idiosyncratic speech markers, etc.) using:

Auditory analysis
Acoustic-phonetic analysis
Automatic speaker recognition approaches

SLIDE 4

INTRODUCTION

Acoustic-phonetic analysis frequently involves court-presentable measurements that are strongly focused on segmental information:

Formants
F0
Voice onset time

But what about suprasegmental information, and specifically information about a speaker’s rhythmic pattern?

SLIDE 5

INTRODUCTION

Rhythm in speaker classification

Previous studies demonstrate some utility of rhythm for dialect discrimination and forensic purposes
Limited in its application in research and casework

Ferragne and Pellegrino 2004, Biadsy and Hirschberg 2009, Torgersen and Szakay 2012, Leemann et al. 2012, 2015, Dellwo et al. 2015

SLIDE 6

Rhythm depends on some temporal representation of speech

Rhythm: Temporal characteristics of a spoken utterance
How can temporal characteristics of a spoken utterance be represented in an acoustic-phonetic analysis? In an ASR analysis?

REPRESENTING TIME IN SPEECH

SLIDE 7

Global temporal representations

Long-term alternations in vocalic and consonantal intervals which may approximate the rhythmic pattern of speech
Rhythm Metrics: measures examining the degree of variability in the duration of pre-specified intervals (e.g., vowels, consonants, CV sequences, adjacent intervals, etc.)

REPRESENTING TIME IN SPEECH

Ramus et al. 1999, Grabe and Low 2002, Dellwo 2006

SLIDE 8

Rhythm in speaker classification

Syllable- vs stress-timed distinctions
Syllable-timed: equal syllable durations
Stress-timed: equal stressed syllable durations (more variability between stressed and unstressed syllables)
Problematic: too coarse, but possibly a place to start

REPRESENTING TIME IN SPEECH

Pike 1945, Abercrombie 1967, Dauer 1983, Arvaniti 2009

SLIDE 9

Local temporal representations

Delta (Δ) and delta-delta (ΔΔ) features:
Reflect the change in spectral properties between adjacent temporal frames and the acceleration of that change
Common in ASR systems

REPRESENTING TIME IN SPEECH

e.g., Lee et al. 1990, Matsui and Furui 1990, Gish and Schmidt 1994

SLIDE 10

1) Analyze the rhythmic profile of four varieties of British English: Cambridge, Multicultural London English, Leicester, and Punjabi-Leicester
2) Investigate the utility of global RMs for discriminating among the dialects
3) Compare global and local temporal representations for dialect discrimination

GOALS

SLIDE 11

Introduction
Corpus Description
Global: Rhythm Metrics
Local: Deltas and Delta-deltas
Discussion

OUTLINE

SLIDE 12

METHODS

Four British English Dialects

Intonational Variation in English (IViE) corpus: 12 CE, 12 MLE, age 16
Wormald (2016): 8 LE, 22 PLE, ages 20–53

Non-contact (Anglo), "South": Cambridge English (CE)
Non-contact (Anglo), Leicester ("Midlands"): Leicester English (LE)
Contact (Ethnic), "South": Multicultural London English (MLE), Caribbean descent
Contact (Ethnic), Leicester ("Midlands"): Punjabi-Leicester English (PLE), at least one parent a native Punjabi speaker

SLIDE 13

OUTLINE

Introduction
Corpus Description
Global: Rhythm Metrics
Local: Deltas and Delta-deltas
Discussion

SLIDE 14

GLOBAL MEASURES: Rhythm Metrics

stdevV: Standard deviation of vocalic interval duration
VarcoV: Coefficient of variation for the vocalic interval duration
stdevC: Standard deviation of consonantal interval duration
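The three metrics on this slide can be sketched in a few lines of Python. This is an illustrative sketch only, not the Duration Analyzer script used in the study: the function names are hypothetical, the use of the sample (n − 1) standard deviation is an assumption, and the coefficient of variation is expressed as a percentage of the mean, which is the usual convention for Varco metrics.

```python
import statistics

def stdev_metric(durations):
    """stdevV / stdevC: sample standard deviation of interval durations.

    Assumption: sample (n - 1) standard deviation; the slides do not say
    which estimator was used.
    """
    return statistics.stdev(durations)

def varco(durations):
    """VarcoV: standard deviation as a percentage of the mean duration,
    which normalises for speech rate."""
    return 100.0 * statistics.stdev(durations) / statistics.mean(durations)

# Hypothetical vocalic interval durations in seconds (not data from the study).
vocalic = [0.08, 0.12, 0.10, 0.15, 0.05]
print("stdevV:", stdev_metric(vocalic))
print("VarcoV:", varco(vocalic))
```

Because VarcoV divides by the mean, doubling every duration (slower speech) leaves it unchanged, whereas stdevV doubles.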

SLIDE 15

nPVI-V: Normalised Pairwise Variability Index for vocalic interval durations
nPVI-C: Normalised Pairwise Variability Index for consonantal interval durations
nPVI-CV: Normalised Pairwise Variability Index for summed consonantal and vocalic interval durations

GLOBAL MEASURES: Rhythm Metrics
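The nPVI compares each interval with its immediate neighbour rather than with the utterance as a whole. A minimal sketch, assuming the standard formulation (mean absolute difference between successive durations, each divided by the mean of the pair, times 100); the function name and example values are hypothetical.

```python
def npvi(durations):
    """Normalised Pairwise Variability Index over successive durations.

    nPVI = 100 * mean( |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2) )
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * sum(terms) / len(terms)

# Hypothetical durations in seconds (not data from the study).
vocalic = [0.08, 0.12, 0.10, 0.15, 0.05]
consonantal = [0.06, 0.09, 0.07, 0.11, 0.08]

print("nPVI-V:", npvi(vocalic))
print("nPVI-C:", npvi(consonantal))
# nPVI-CV: apply the same index to summed consonantal + vocalic durations.
print("nPVI-CV:", npvi([c + v for c, v in zip(consonantal, vocalic)]))
```

Perfectly even intervals give 0; strict alternation between long and short intervals gives high values, which is why the index is associated with stress-timing.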

SLIDE 16

METHODS

Cambridge, MLE: Praat EasyAlign for British English
Leicester varieties: alignments accompanied the recordings
All phone alignments were manually adjusted
Consonantal and vocalic intervals determined from the phone alignments
RMs measured with the Duration Analyzer Praat script

Dellwo 2019
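One plausible way to derive consonantal and vocalic intervals from a phone alignment is to merge runs of adjacent phones of the same type. The sketch below is an assumption about that step, not the procedure used in the study: the vowel inventory, the `(label, start, end)` tuple layout, and the function name are all hypothetical, and pauses are not handled.

```python
from itertools import groupby

# Hypothetical vowel inventory; a real alignment would use the aligner's phone set.
VOWELS = {"a", "e", "i", "o", "u", "@"}

def cv_intervals(phones, vowels=VOWELS):
    """Collapse aligned phones into alternating C and V intervals.

    phones: list of (label, start, end) tuples with times in seconds.
    Returns a list of (kind, duration) tuples where kind is "C" or "V";
    adjacent phones of the same kind are summed into one interval.
    """
    typed = [
        ("V" if label in vowels else "C", end - start)
        for label, start, end in phones
    ]
    return [
        (kind, sum(duration for _, duration in group))
        for kind, group in groupby(typed, key=lambda item: item[0])
    ]

# Hypothetical alignment: the cluster "st" becomes one consonantal interval.
phones = [("s", 0.00, 0.10), ("t", 0.10, 0.15), ("a", 0.15, 0.30),
          ("i", 0.30, 0.40), ("n", 0.40, 0.50)]
for kind, duration in cv_intervals(phones):
    print(kind, round(duration, 3))
```

The resulting duration lists, split by kind, are what the rhythm metrics take as input.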

SLIDE 17

RESULTS: Rhythm Metrics

[Figure: z-scored rhythm metric values by dialect (CE, MLE, LE, PLE), one panel per metric: nPVI_V, nPVI_C, nPVI_CV, stdevV, stdevC, VarcoV. x-axis: Dialect; y-axis: value (z-scored).]

Dialect significantly improved model fit
No gender differences

SLIDE 18

RESULTS: Rhythm Metrics

[Figure: three panels showing z-scored stdev-V, nPVI-CV, and Varco-V by dialect (CE, MLE, LE, PLE).]

Cambridge English: higher stdev-V, VarcoV, nPVI-CV
MLE: average
Leicester English: higher VarcoV
Punjabi-Leicester: lower stdev-V, VarcoV

All relative to the average production across all four dialects
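The "higher" and "lower" labels on this slide are read off z-scored values, so each metric is expressed in standard deviations from the mean across all speakers. A minimal sketch of that transformation, assuming the sample standard deviation and pooling over all four dialects; the function name and values are hypothetical.

```python
import statistics

def zscore(values):
    """Express each value in standard deviations from the pooled mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return [(value - mean) / sd for value in values]

# Hypothetical VarcoV values for a handful of speakers (not data from the study).
varco_v = [42.0, 47.5, 51.0, 39.5, 45.0]
print([round(z, 2) for z in zscore(varco_v)])
```

A speaker or dialect with a positive z-score is above the pooled average for that metric, and one with a negative z-score is below it.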

SLIDE 19

RESULTS: Rhythm Metrics

[Figure: three panels showing z-scored stdev-C, nPVI-V, and nPVI-C by dialect (CE, MLE, LE, PLE).]

Cambridge English: lower stdev-C
MLE: lower stdev-C, nPVI-V, nPVI-C
Leicester English: higher stdev-C, nPVI-V, nPVI-CV
Punjabi-Leicester: higher stdev-C, lower nPVI-V

All relative to the average production across all four dialects

SLIDE 20

RESULTS: Rhythm Metrics

[Figure: cluster plot of the 54 speakers based on the rhythm metrics, grouped into four clusters (LE: speakers 1–2, 7–8, 14–15, 22–23; PLE: 3–6, 9–13, 16–21, 24–30; CE: 31–42; MLE: 43–54). Axes: Dim1 (54.1%), Dim2 (33.7%).]

Purity: 0.64

~Anglo vs. Contact
~Midlands vs. South
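Cluster purity summarises how well the clusters line up with the dialect labels. A minimal sketch, assuming the standard definition (each cluster is credited with its most frequent dialect, and purity is the share of speakers covered that way); the function name and the toy assignment are hypothetical and do not reproduce the study's clustering.

```python
from collections import Counter

def purity(cluster_ids, labels):
    """Fraction of items that belong to the majority label of their cluster."""
    counts_by_cluster = {}
    for cluster, label in zip(cluster_ids, labels):
        counts_by_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in counts_by_cluster.values())
    return majority_total / len(labels)

# Toy example: cluster 1 is all CE, cluster 2 mixes LE and MLE.
clusters = [1, 1, 2, 2]
dialects = ["CE", "CE", "LE", "MLE"]
print(purity(clusters, dialects))  # 0.75
```

A purity of 1.0 means every cluster contains a single dialect. With four equally sized dialects a purity of 0.25 would be the floor, but the groups here are unequal (22 PLE speakers out of 54), so the floor is somewhat higher.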

SLIDE 21

OUTLINE

Introduction
Corpus Description
Global: Rhythm Metrics
Local: Deltas and Delta-deltas
Discussion

SLIDE 22

MFCCs

Voice activity automatically detected + manually corrected
20 ms frames, shifted by 10 ms
0–4000 Hz
CMVN applied for room/equipment normalization

METHODS: Δs and ΔΔs
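The framing and normalisation steps on this slide can be illustrated as follows. This is a sketch under stated assumptions, not the study's feature-extraction pipeline: the 8 kHz sample rate is assumed only because a 0–4000 Hz band requires at least that rate, the function names are hypothetical, and CMVN is applied per recording using the population standard deviation.

```python
import statistics

def frame_starts(n_samples, sample_rate, frame_ms=20, shift_ms=10):
    """Start indices of 20 ms frames shifted by 10 ms, plus the frame length."""
    frame_len = sample_rate * frame_ms // 1000
    shift = sample_rate * shift_ms // 1000
    return list(range(0, n_samples - frame_len + 1, shift)), frame_len

def cmvn(frames):
    """Cepstral mean and variance normalisation over one recording.

    frames: list of feature vectors (one per frame). Each coefficient is
    shifted to zero mean and scaled to unit variance across the frames.
    """
    columns = list(zip(*frames))
    means = [statistics.mean(column) for column in columns]
    # Guard against constant coefficients (zero variance).
    sds = [statistics.pstdev(column) or 1.0 for column in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, sds)]
        for row in frames
    ]

# One second of audio at an assumed 8 kHz sample rate.
starts, frame_len = frame_starts(n_samples=8000, sample_rate=8000)
print(len(starts), "frames of", frame_len, "samples")  # 99 frames of 160 samples
```

Normalising per recording removes constant offsets and scaling introduced by the room or the recording equipment, which matters here because the dialects come from different corpora.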

SLIDE 23

Δs and ΔΔs

Deltas: change between MFCCs in adjacent frames
Delta-deltas: change between deltas in adjacent vectors
Averaged for each recording
MFCCs not included in the analysis

METHODS: Δs and ΔΔs
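Taken literally, the description on this slide amounts to a first difference between adjacent frames, applied twice, followed by averaging over the recording. The sketch below follows that literal reading; note that many ASR toolkits instead estimate deltas by regression over a window of several frames, and the slides do not say which variant was used. Function names and the toy MFCC values are hypothetical.

```python
def deltas(frames):
    """Frame-to-frame change: delta[t] = frames[t + 1] - frames[t]."""
    return [
        [after - before for before, after in zip(prev_frame, next_frame)]
        for prev_frame, next_frame in zip(frames, frames[1:])
    ]

def mean_vector(frames):
    """Average each coefficient over all frames of a recording."""
    n_frames = len(frames)
    return [sum(column) / n_frames for column in zip(*frames)]

# Toy MFCC matrix: three frames, two coefficients (not data from the study).
mfccs = [[1.0, 0.0], [2.0, 1.0], [4.0, 1.0]]

d = deltas(mfccs)   # [[1.0, 1.0], [2.0, 0.0]]
dd = deltas(d)      # [[1.0, -1.0]]

# One feature vector per recording: mean deltas followed by mean delta-deltas.
# The static MFCCs themselves are left out, as on the slide.
recording_features = mean_vector(d) + mean_vector(dd)
print(recording_features)  # [1.5, 0.5, 1.0, -1.0]
```

Averaging over the recording collapses all frame-level dynamics into a single vector per speaker.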

SLIDE 24

RESULTS: Deltas and delta-deltas

[Figure: cluster plot of the 54 speakers based on deltas and delta-deltas, grouped into four clusters (LE: speakers 1–2, 7–8, 14–15, 22–23; PLE: 3–6, 9–13, 16–21, 24–30; CE: 31–42; MLE: 43–54). Axes: Dim1 (12.1%), Dim2 (10.9%).]

Purity: 0.44

SLIDE 25

Significant differences in RMs among four British English dialects
CE and LE more stress-timed, but in different ways
MLE and PLE more syllable-timed, but in different ways
Combination of RMs can be used as a Rhythmic Profile

DISCUSSION

SLIDE 26

Rhythmic profile is a useful feature in dialect discrimination
Issue: RMs somewhat correlated
Future directions:
Which RMs and combinations of RMs are indeed best and least redundant?
Examine whether these results hold for dialects collected in a single corpus

DISCUSSION

SLIDE 27

Proof of concept: Global temporal representations > local temporal representations for dialect discrimination
Demonstrates need for global temporal representation in automatic speaker and language recognition systems (some work done already)
Forensic application of RMs: directly interpretable, court presentable

DISCUSSION

Adami et al. 2003, Shriberg et al. 2005, Dehak et al. 2007

SLIDE 28

Thanks to: Jess Wormald, Paul Foulkes, Peter French, Sam Hellmuth

Thank you!
