Long-Term Formant Distribution as a forensic-phonetic feature



slide-1
SLIDE 1

Long-Term Formant Distribution as a forensic-phonetic feature

ASA 2nd Pan-American/Iberian Meeting on Acoustics, Cancún, México, Nov 15-19, 2010

Michael Jessen and Timo Becker
BKA, Department of Speaker Identification and Audio Analysis (KT54)

3aSC4 Special Session on Forensic Voice Comparison and Forensic Acoustics @ 2nd Pan-American/Iberian Meeting on Acoustics, Cancún, México, 15–19 November, 2010 http://cancun2010.forensic-voice-comparison.net

slide-2
SLIDE 2

Structure

1. Long-Term Formant Distribution: measurement methods and background
2. LTF and body height
3. LTF measurement consistency
4. Language dependence of LTF
5. Recognition performance based on LTF and automatic speaker recognition
6. Conclusions

Nov 17, 2010 2 Long-Term Formant (LTF) Distribution as a forensic-phonetic feature

slide-3
SLIDE 3

Long-Term Formant (LTF) Distribution: terminology

Long-Term Formant Distribution (Nolan & Grigoras, 2005) is a global (as opposed to segment-based) representation of vowel formant frequencies over an entire recording of a speaker (or over a long stretch of speech from that speaker). Formant frequencies are extracted with a formant tracker (LPC-based) and manually corrected. No segmentation into sounds is performed.

The resulting distribution of formant values (mainly F2 and F3) can be characterized in different ways. The simplest way is to calculate the average. More advanced ways include modeling of the LTF distribution with Gaussian Mixture Models (GMM) (Becker et al., 2008).
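The "simplest way" of characterizing the distribution can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `ltf_summary`, the tuple layout and the frame values are assumptions made for the example, not part of the method described in the talk.

```python
from statistics import mean, stdev

def ltf_summary(frames):
    """Summarise pooled frame-level formant values (Hz) per formant.

    `frames` is a list of (F2, F3) tuples, one per analysis frame, pooled
    over all vowel material of one recording, with no segment labels.
    """
    f2 = [frame[0] for frame in frames]
    f3 = [frame[1] for frame in frames]
    return {
        "LT-F2": {"mean": mean(f2), "sd": stdev(f2), "n": len(f2)},
        "LT-F3": {"mean": mean(f3), "sd": stdev(f3), "n": len(f3)},
    }

# Made-up frame values (Hz), purely for illustration.
frames = [(1400.0, 2450.0), (1520.0, 2500.0), (1310.0, 2380.0),
          (1650.0, 2610.0), (1480.0, 2520.0), (1390.0, 2440.0)]
summary = ltf_summary(frames)
print(summary["LT-F2"], summary["LT-F3"])
```

Because the values are pooled without segmentation, the mean depends on which vowel material survives editing, which is why the editing step matters for comparability.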

slide-4
SLIDE 4

Illustration of the method:

Step 1: Editing the signal in a way that only vowels with clear formant structure remain

Step 2: LPC-analysis and manual correction of the formant tracks

Figure: speech file, unedited and edited, and Excel excerpt

Workshop LTF - BKA 2010 - M.Jessen 4

slide-5
SLIDE 5

Step 3: Exporting the formant tracks F1,2,3 for further processing

F1 of limited reliability in telephone speech; F4 unreliable or invisible

Figure: exported tracks of F1, F2 and F3; formant values every 10 ms

slide-6
SLIDE 6

Example of the raw LTF distribution of a speaker

from freeware Catalina Forensic Expert opinion v1.0 from Catalin Grigoras (U Colorado Denver)

http://www.forensicav.ro/download/CatalinaManual3h.pdf

slide-7
SLIDE 7

Correlation between LTF and body height

Significant negative correlations between long-term formant frequencies (F2, F3) and body height

Pearson's product-moment correlation, one-sided (less):
F2: rho=-0.315726857072528, p=0.00204454743894922
F3: rho=-0.339139631480740, p=0.00097693931875183

Figure: two scatter plots, LTF F2 [Hz] and LTF F3 [Hz], each against body height [cm]

LTF means from 81 speakers in Pool 2010 (telephone-transmitted) (thanks to Hanna Feiser for assistance)
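The following sketch shows how a one-sided p-value can be derived from a Pearson coefficient. It assumes the standard t-test for a correlation coefficient, t = r·sqrt((n-2)/(1-r²)) with n-2 degrees of freedom, and n = 81; the slide does not say which software or test variant was used. The function names are hypothetical, and the t-distribution tail is integrated numerically with Simpson's rule so that only the standard library is needed.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + t * t / df))

def one_sided_p_less(r, n, steps=2000):
    """P(T <= t) for t = r*sqrt((n-2)/(1-r^2)), alternative 'less'."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3            # P(0 <= T <= |t|), Simpson's rule
    lower_tail = 0.5 - area         # P(T <= -|t|)
    return lower_tail if t < 0 else 1 - lower_tail

# Coefficients from the slide; n = 81 is taken from "81 speakers".
p_f2 = one_sided_p_less(-0.315726857072528, 81)
p_f3 = one_sided_p_less(-0.339139631480740, 81)
print(p_f2, p_f3)
```

If the assumption about the test holds, the two values should come out close to the p-values reported on the slide.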

slide-8
SLIDE 8

Measurement consistency across phoneticians: LT-F2

Figure: LT-F2 [Hz] for recordings of 20 different speakers; legend: JF, AK, Bay, B1, B2

Pearson correlations (two-sided) between 0.84 and 0.95

LTF-means from 20 speakers in “Digs” dialect corpus under forensically realistic conditions

slide-9
SLIDE 9

Measurement consistency across phoneticians: LT-F3

Figure: LT-F3 [Hz] for recordings of 20 different speakers; legend: JF, AK, Bay, B1, B2

Pearson correlations (two-sided) between 0.98 and 0.99
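A range of correlations such as "between 0.98 and 0.99" can be obtained by correlating every pair of phoneticians over the same recordings. The sketch below shows that computation; the rater labels and the measurement values are invented for illustration and are not the data behind the slides.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical LT-F3 means (Hz): one list per phonetician,
# same recordings in the same order.
measurements = {
    "rater_A": [2350.0, 2420.0, 2510.0, 2290.0, 2605.0],
    "rater_B": [2362.0, 2409.0, 2498.0, 2310.0, 2590.0],
    "rater_C": [2340.0, 2435.0, 2525.0, 2275.0, 2620.0],
}

# One coefficient per unordered pair of raters.
pairwise = {
    (a, b): pearson_r(measurements[a], measurements[b])
    for a, b in combinations(measurements, 2)
}
print(min(pairwise.values()), max(pairwise.values()))
```

Note that a high correlation shows that raters rank the recordings alike; it does not by itself rule out a constant offset between raters.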

slide-10
SLIDE 10

Language influence on LTF

For these data, different languages do not differ in the LTF-space that they occupy (one-way ANOVA [F(4,55) = 0.44; p = 0.77]).

Figure: LT-F3 [Hz] against LT-F2 [Hz]; legend: Russian, German probe1, German probe2, German probe3, Albanian

LTF-means from three German speakers in Digs dialect corpus and from Russian and Albanian speakers in case data under analogous conditions (spont telephone)
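The degrees of freedom in F(4,55) imply five groups and 60 observations in total (k - 1 = 4, N - k = 55). The sketch below shows how a one-way ANOVA F statistic is computed. The slide does not state which variable was tested or how the observations were split across groups, so the equal group sizes and all values here are assumptions for illustration.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Made-up LT-F3 means (Hz): 5 groups of 12 recordings each (assumed split).
groups = [[2400 + 10 * ((7 * i + 3 * j) % 11) for j in range(12)]
          for i in range(5)]
f_stat, df_b, df_w = one_way_anova(groups)
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

An F value well below 1, as reported on the slide, means the group means differ less than the within-group scatter alone would lead one to expect.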

slide-11
SLIDE 11

Speaker recognition tests

37 target trials and 803 non-target trials, involving 21 speakers from casework, comparing:

  • Baseline = a standard GMM-UBM automatic system
  • FGMM = GMM-modeled LTF
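To make "GMM-modeled LTF" concrete, here is a deliberately reduced sketch: a two-component Gaussian mixture fitted to one-dimensional F2 values with plain EM, then used to score other material by average log-likelihood. It is not the system evaluated in the talk; the one-dimensional features, the number of components, the variance floor and all data values are assumptions chosen to keep the example short.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(xs, k=2, iters=100, var_floor=100.0):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    n = len(xs)
    ordered = sorted(xs)
    mus = [ordered[int((j + 0.5) * n / k)] for j in range(k)]  # quantile init
    grand = sum(xs) / n
    var0 = max(sum((x - grand) ** 2 for x in xs) / n, var_floor)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, mus, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)
    return weights, mus, variances

def avg_log_likelihood(xs, model):
    weights, mus, variances = model
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, mus, variances)))
        for x in xs
    ) / len(xs)

# Made-up F2 values (Hz) for illustration only.
known = [1180.0, 1210.0, 1195.0, 1230.0, 1175.0, 1220.0,
         1690.0, 1710.0, 1735.0, 1680.0, 1720.0, 1705.0]
similar = [1190.0, 1215.0, 1700.0, 1725.0]
different = [1400.0, 1450.0, 1500.0, 1480.0]

model = fit_gmm_1d(known)
print(avg_log_likelihood(similar, model), avg_log_likelihood(different, model))
```

In a GMM-UBM style comparison, the score would be the difference between the log-likelihood under the speaker model and under a background model; the background model is omitted here for brevity.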


slide-12
SLIDE 12

New development at BKA: DiSC-Plot (Discrimination, Scatter, Correlation)

Figure: DiSC plot of logLR (lnLR) values, automatic speaker recognition system against Long-Term Formant Distribution, with target trials (same speaker) and non-target trials (different speakers)

slide-13
SLIDE 13

Conclusions: LTF analysis in forensic phonetics and acoustics (1)

LTF (F2 and F3) correlates negatively with body height (relevant for voice profiling).

LTF measurements have high consistency across phonetic experts.

Pending further tests and with some degree of caution, LTF statistics established for one language can be used across languages.

LTF (F2 and F3) do not differ much between different vocal effort levels. Vocal effort differences are a common problem in forensic material.

slide-14
SLIDE 14

Conclusions: LTF analysis in forensic phonetics and acoustics (2)

  • Performance of LTF analysis with classical evaluation measures (DET-plots, APE-plots, Cllr) is worse than performance of automatic speaker recognition, and fusion does not increase overall performance. But:
  • The tests so far are based predominantly on matching conditions; under mismatched conditions, the relative performance of LTF analysis might increase.
  • Detailed results in the DiSC plot show that LTF and automatic speaker recognition can make different errors: using both methods is a good safeguard against false conclusions.
  • Quite limited LR values in same-speaker comparisons (max about LR=16 in case material for the tests so far): LTF cannot give very strong support for the same-speaker hypothesis.
  • Different-speaker comparisons can yield very low LR values: LTF can give very strong support for the different-speaker hypothesis.
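For readers unfamiliar with Cllr, the sketch below computes the log-likelihood-ratio cost from lists of likelihood ratios using its usual definition: the mean of log2(1 + 1/LR) over target trials and the mean of log2(1 + LR) over non-target trials, averaged. The LR values are invented for illustration and are not results from the tests reported here.

```python
import math

def cllr(target_lrs, nontarget_lrs):
    """Log-likelihood-ratio cost; 0 is ideal, 1 matches always reporting LR = 1."""
    tar = sum(math.log2(1 + 1 / lr) for lr in target_lrs) / len(target_lrs)
    non = sum(math.log2(1 + lr) for lr in nontarget_lrs) / len(nontarget_lrs)
    return 0.5 * (tar + non)

# Made-up likelihood ratios, for illustration only.
target_lrs = [16.0, 8.0, 4.0, 0.5]           # same-speaker trials
nontarget_lrs = [0.001, 0.01, 0.2, 2.0]      # different-speaker trials
print(cllr(target_lrs, nontarget_lrs))

# An uninformative system that always outputs LR = 1 has Cllr = 1.
print(cllr([1.0, 1.0], [1.0, 1.0]))
```

Each trial is penalized according to how far its LR points in the wrong direction, so a same-speaker LR below 1 (such as the 0.5 above) costs more than a modest LR in the right direction.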


slide-15
SLIDE 15

References

Becker, Timo, Michael Jessen and Catalin Grigoras (2008): Forensic speaker verification using formant features and Gaussian mixture models. Proceedings of Interspeech 2008, 1505-1508.

Kirchhübel, Christin (2009): The effects of Lombard speech on vowel formant measurements. MSc thesis, University of York, UK.

Moos, Anja (2008): Forensische Sprechererkennung mit der Messmethode LTF (long-term formant distribution). MA thesis, Universität des Saarlandes. www.psy.gla.ac.uk/docs/download.php?type=PUBLS&id=1286.

Moos, Anja (2010): Long-term formant distribution as a measure of speaker characteristics in read and spontaneous speech. To appear in The Phonetician.

Nolan, Francis and Catalin Grigoras (2005): A case for formant analysis in forensic speaker identification. International Journal of Speech, Language and the Law 12: 143-173.

Wagner, Katrin (2010): Der Einfluss der Sprechlautstärke auf die ersten drei Vokalformanten in mobilfunkübertragener Sprache: Forensischer Stimmenvergleich anhand der LTF-Methode. BA thesis, Universität Frankfurt.


slide-16
SLIDE 16


slide-17
SLIDE 17

Inter-speaker variation: Mean LTF for 71 adult male speakers of German

Figure: means of LT-F2 and LT-F3

Moos (2008, 2010), based on GSM-transmitted speech in BKA corpus “Pool 2010”

slide-18
SLIDE 18

Influence of vocal effort (Lombard condition) on LT-F1

Figure: LT-F1 [Hz] for 31 speakers, normal vs. Lombard

LT-F1 consistently higher in Lombard speech. Significant difference with paired t-test, indicating substantial intra-speaker variation. But: LT-F1 is of limited forensic use anyway (due to the effect of telephone transmission on F1)

LTF-means from 31 speakers in Pool 2010 (telephone-transmitted), based on Wagner (2010); cf. also Kirchhübel (2009) and this conference

slide-19
SLIDE 19

Influence of vocal effort (Lombard condition) on LT-F2

Figure: LT-F2 [Hz] for 31 speakers, normal vs. Lombard

Lombard-effect on LT-F2 inconsistent across speakers. Non-significant difference with paired t-test, indicating acceptable intra-speaker variation.

slide-20
SLIDE 20

Influence of vocal effort (Lombard condition) on LT-F3

Figure: LT-F3 [Hz] for 31 speakers, normal vs. Lombard

Like with LT-F2: Lombard-effect on LT-F3 inconsistent across speakers. Non-significant difference with paired t-test, indicating acceptable intra-speaker variation.
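The paired t-test used for the normal vs. Lombard comparisons works on per-speaker differences. A minimal sketch of the statistic follows; the function name and the values are illustrative assumptions, not the Pool 2010 measurements.

```python
import math
from statistics import mean, stdev

def paired_t(normal, lombard):
    """Return (t, df) for paired samples, testing mean(lombard - normal) = 0."""
    diffs = [b - a for a, b in zip(normal, lombard)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Made-up LT-F3 means (Hz) for six speakers in both conditions.
normal = [2450.0, 2380.0, 2520.0, 2410.0, 2600.0, 2335.0]
lombard = [2440.0, 2395.0, 2510.0, 2430.0, 2590.0, 2345.0]
t_stat, df = paired_t(normal, lombard)
print(t_stat, df)
```

When the differences vary in sign across speakers, as described for LT-F2 and LT-F3, their mean stays near zero and t remains small; a consistent shift in one direction, as described for LT-F1, produces a large t.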

slide-21
SLIDE 21

DET-Plot

Figure: DET plot comparing the automatic speaker recognition system with the GMM-modeled Long-Term Formant Distribution

slide-22
SLIDE 22

APE-Plot

Figure: APE plot with Cllr