
SLIDE 1

Combining linguistic and non-linguistic information in likelihood-ratio-based forensic voice comparison

School of Language Studies, Australian National University Joseph Bell Centre for Forensic Statistics and Legal Reasoning, University of Edinburgh

Acoustical Society of America Conference, Cancun, Mexico 17/11/’10: Invited Presentation at Special Session on Forensic Voice Comparison

Phil Rose

This presentation was researched as part of Australian Research Council Discovery Grant No. DP0774115.

3aSC3 Special Session on Forensic Voice Comparison and Forensic Acoustics @ 2nd Pan-American/Iberian Meeting on Acoustics, Cancún, México, 15–19 November, 2010 http://cancun2010.forensic-voice-comparison.net

SLIDE 2

Background

  • Assumption: LR-based FVC framework:

– Logically & legally correct
– Testable & tested (cf. Daubert)
– Many other advantages (e.g. combining evidence)

  • Having your FVC cake and eating it:

– ‘Traditional’ & automatic LR-based approaches
– Both must be missing information,
– so why not combine them?

  • Neglected trad. FVC parameters:

– Sonorant consonant F-pattern ([l n …])
– Fricative consonant F-pattern ([s ɕ …])
– Nasals, fricatives reflect some non-deformable aspects of articulation

SLIDE 3

“… DNA profile evidence is now seen as setting a standard for rigorous quantification of evidential weight that forensic scientists using other evidence types should seek to emulate.”

Balding: Weight-of-Evidence for Forensic DNA Profiles, 2005. Gonzalez-Rodriguez et al.: ‘Emulating DNA: Rigorous Quantification of Evidential Weight in Transparent and Testable Forensic Speaker Recognition’, IEEE TASLP, 2007.

SLIDE 4

Fricative spectra in FVC

  • R v Huffnagl et al. 2008
  • $150 million telephone fraud case
  • Small amount of offender speech
  • Adequate amount of suspect speech
  • But offender and suspect speech highly comparable in many linguistic features, incl. /s/ spectrum in yes.

[Spectrograms (frequency 500–4000 Hz vs. duration in csec.): Offender yes 2, Customs officer yes, Suspect yes]
SLIDE 5

Aim(s)

  • How well can same-speaker speech samples be discriminated from different-speaker speech samples, using voiceless sibilant [ɕ] spectral features with the LR as discriminant function?
  • I.e. should we make use of these features in FVC?
  • Can performance be enhanced by combining linguistic ([ɕ]) and non-linguistic LRs?

SLIDE 6

Integration of traditional and automatic approaches

  • Two senses:

– Use automatic back-end processing (fusion, GMM)
– Use automatic features (e.g. MFCCs), but locally; that’s what this talk is about

  • Pull out and process comparable linguistic units
  • Do the rest globally
  • Combine results
SLIDE 7

Alveolopalatal fricative [ɕ]: articulation

[Mid-sagittal diagrams labelling front cavity, back cavity, palatal channel, constriction, and abducted (vocal) cords]

SLIDE 8
Alveolopalatal fricative [ɕ]: acoustics [kaiɕa]

  • Sources at incisors, constriction
  • λ/2 resonance < front cavity
  • λ/4 resonance < palatal channel
  • λ/2 resonance < back cavity
  • Helmholtz resonance < SLVT
  • Subglottal resonances
  • Zeros

SLIDE 9

Data

  • (Japanese) National Research Institute of Police Science (NRIPS) database (ca. 2004)
  • 300 male policemen; first 84 speakers used
  • Ca. 70–80 secs. net speech per recording, sampling frequency 10 kHz
  • Set of vowels plus single- and polysyllabic word utterances, e.g. “I’ve planted a bomb”, “don’t tell the police”, “get the money ready”
  • Non-contemporaneous landline recordings
  • Separation ca. 3–4 months
  • Two repeats per recording
  • Channel not controlled, but likely similar

SLIDE 10

Data: []

  • 10 tokens of [] per repeat, various env’ts, e.g.

– kaisha [kaia] firm – ashita [a:ta] tomorrow – shikaketa [:kaketa] plant – yooishiro [jo:iio] prepare

  • 20 tokens per recording
SLIDE 11

Processing

  • Very basic front-end
  • Non-linguistic:

– LPC CCs 1 - 12 – Mean cepstral vector

  • Linguistic ([]):

– Locate utterances with [], eyeball, Praat script to extract quasi steady-state (ca, 4 to 20+ csec.) – LPC CCs 1 – 12 – Mean cepstral subtraction from non-linguistic mean vector
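The mean-cepstral-subtraction step amounts to estimating the channel (plus average-speaker) component as the long-term mean cepstrum of the whole recording and removing it from each segment's mean cepstrum. A minimal numpy sketch; the function name and array shapes are illustrative, not from the slides:

```python
import numpy as np

def cepstral_mean_subtract(segment_ccs, recording_ccs):
    """Crude channel compensation by mean cepstral subtraction.

    segment_ccs:   (n_frames, n_ccs) LPC cepstral coefficients for one token
    recording_ccs: (N_frames, n_ccs) LPC CCs pooled over the whole recording
    Returns the segment's mean cepstrum with the recording-wide mean removed.
    """
    # Long-term mean cepstrum absorbs channel + average-speaker spectrum
    channel_estimate = recording_ccs.mean(axis=0)
    return segment_ccs.mean(axis=0) - channel_estimate
```

Because a fixed channel adds the same offset to every frame's cepstrum, it cancels in the subtraction, which is the point of the step.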

SLIDE 12

Cepstral mean subtraction

Cepstral spectra of [ɕ] in shape

SLIDE 13

Typical mean cepstral spectra (spk. 86)

SLIDE 14

Back-end processing

  • Two types of LR:

– Multivariate LR (generative LR developed at the Joseph Bell Centre for Forensic Statistics and Legal Reasoning: Aitken & Lucy)
– GMM-(U)BM LR (discriminative; Morrison’s Matlab implementation of Reynolds, Quatieri & Dunn (2000) adapted-GMM speaker verification)

  • All 84 speakers (i.e. intrinsic), cross-validated
  • Log-reg fusion/calibration of LRs/scores from linguistic and non-linguistic data (Brümmer’s FoCal toolkit)
  • Evaluation with Cllr / EER
  • Empirically discard CCs 4, 6, 8, 9.
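The logistic-regression fusion/calibration step learns one weight per system plus an offset, so that the weighted sum of scores behaves as a calibrated log-LR. A minimal numpy sketch in the spirit of, but not taken from, Brümmer's FoCal toolkit; the plain gradient-descent trainer and all names are illustrative:

```python
import numpy as np

def train_fusion(scores_same, scores_diff, n_iter=2000, step=0.1):
    """Learn fusion weights w and offset b by logistic regression.

    scores_same / scores_diff: (n_trials, n_systems) arrays of per-system
    scores for same-speaker (Hp-true) and different-speaker (Hd-true) trials.
    """
    X = np.vstack([scores_same, scores_diff])
    y = np.concatenate([np.ones(len(scores_same)), np.zeros(len(scores_diff))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # P(same-speaker | scores)
        w -= step * (X.T @ (p - y)) / len(y)    # gradient of the log-loss
        b -= step * np.mean(p - y)
    return w, b

def fuse(scores, w, b):
    # Fused score: interpretable as a calibrated (natural-)log LR
    return scores @ w + b
```

With balanced training trials the learned linear combination maps the two systems' raw scores onto a common, well-calibrated log-LR scale before they are combined.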

SLIDE 15

Cllr

Performance of LR-based detection systems is currently evaluated with the Log Likelihood Ratio Cost (Cllr):

\[
C_{\mathrm{llr}} = \frac{1}{2}\left[\frac{1}{N_{H_p}}\sum_{i=1}^{N_{H_p}}\log_2\!\left(1+\frac{1}{LR_i}\right)+\frac{1}{N_{H_d}}\sum_{j=1}^{N_{H_d}}\log_2\!\left(1+LR_j\right)\right]
\]

  • Simple scalar metric with 2 hypothesis-dependent log cost functions
  • Idea is to severely penalise highly misleading LRs
  • Cllr < unity considered “good”:
  • → system is delivering some information
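As a concrete illustration, Cllr can be computed directly from two sets of likelihood ratios. A minimal Python sketch; the function and argument names are mine, not from the slides:

```python
import math

def cllr(lrs_same, lrs_diff):
    """Log Likelihood Ratio Cost (Cllr).

    lrs_same: LRs from same-speaker comparisons (Hp true); penalised when small
    lrs_diff: LRs from different-speaker comparisons (Hd true); penalised when large
    """
    penalty_same = sum(math.log2(1.0 + 1.0 / lr) for lr in lrs_same) / len(lrs_same)
    penalty_diff = sum(math.log2(1.0 + lr) for lr in lrs_diff) / len(lrs_diff)
    return 0.5 * (penalty_same + penalty_diff)
```

A system that always outputs LR = 1 (no information) scores exactly Cllr = 1; strongly misleading LRs (small for same-speaker, large for different-speaker trials) drive Cllr well above 1, which is the intended severe penalty.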
SLIDE 16

MVLR formula

numerator of MVLR =
\[
\left|2\pi(D_1+D_2)\right|^{-1/2}\exp\!\left\{-\tfrac{1}{2}(\bar{y}_1-\bar{y}_2)^{T}(D_1+D_2)^{-1}(\bar{y}_1-\bar{y}_2)\right\}\;\times\;\frac{1}{m}\sum_{i=1}^{m}\left|2\pi(D_3+h^{2}C)\right|^{-1/2}\exp\!\left\{-\tfrac{1}{2}(y^{*}-\bar{x}_i)^{T}(D_3+h^{2}C)^{-1}(y^{*}-\bar{x}_i)\right\}
\]

denominator of MVLR =
\[
\prod_{l=1}^{2}\frac{1}{m}\sum_{i=1}^{m}\left|2\pi(D_l+h^{2}C)\right|^{-1/2}\exp\!\left\{-\tfrac{1}{2}(\bar{y}_l-\bar{x}_i)^{T}(D_l+h^{2}C)^{-1}(\bar{y}_l-\bar{x}_i)\right\}
\]

where \(\bar{y}_1,\bar{y}_2\) are the mean vectors of the two speech samples, \(\bar{x}_i\) the background-speaker means (\(i=1,\dots,m\)), \(D_l\) the within-speaker covariance scaled by sample size, \(C\) the between-speaker covariance, \(h\) the kernel bandwidth, \(y^{*}=(D_1^{-1}+D_2^{-1})^{-1}(D_1^{-1}\bar{y}_1+D_2^{-1}\bar{y}_2)\), and \(D_3=(D_1^{-1}+D_2^{-1})^{-1}\) (normal-kernel form of Aitken & Lucy).
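The multivariate kernel-density LR can be evaluated numerically. Below is a hedged numpy sketch of the Aitken & Lucy (2004) normal-kernel MVLR; all names and the toy covariance settings are illustrative, not the talk's actual implementation:

```python
import numpy as np

def mvlr(y1, y2, bg_means, U, C, h):
    """Normal-kernel multivariate LR (after Aitken & Lucy 2004), sketch.

    y1, y2:   (n1, p), (n2, p) feature matrices for the two speech samples
    bg_means: (m, p) background-speaker mean vectors
    U, C:     within- and between-speaker covariance matrices; h: bandwidth
    """
    def gauss(d, S):
        # Multivariate normal density of difference vector d under covariance S
        return (np.exp(-0.5 * d @ np.linalg.inv(S) @ d)
                / np.sqrt(np.linalg.det(2 * np.pi * S)))

    m1, m2 = y1.mean(axis=0), y2.mean(axis=0)
    D1, D2 = U / len(y1), U / len(y2)
    D3 = np.linalg.inv(np.linalg.inv(D1) + np.linalg.inv(D2))
    y_star = D3 @ (np.linalg.inv(D1) @ m1 + np.linalg.inv(D2) @ m2)

    # Numerator: same-origin similarity term x kernel-density typicality of y*
    num = gauss(m1 - m2, D1 + D2) * np.mean(
        [gauss(y_star - x, D3 + h**2 * C) for x in bg_means])
    # Denominator: independent kernel-density typicality of each sample
    den = (np.mean([gauss(m1 - x, D1 + h**2 * C) for x in bg_means])
           * np.mean([gauss(m2 - x, D2 + h**2 * C) for x in bg_means]))
    return num / den
```

The numerator rewards pairs that are both mutually similar and jointly typical of some region of the background population; the denominator asks how typical each sample is on its own.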

SLIDE 17

MVLR Results …

SLIDE 18

Uncalibrated Tippetts (MVLR)

[]

Non-linguistic

[]

Non-linguistic

SLIDE 19

Fused & calibrated Tippett (MVLR)

Fairly big improvement over calibrated linguistic and non- linguistic data on their own

SLIDE 20

GMM/BM Results

SLIDE 21

Calibrated Tippetts: GMM/BM

[]

Non-linguistic

SLIDE 22

Fused & calibrated Tippett (GMM LR)

Fairly big improvement over calibrated linguistic and non-linguistic data on their own

SLIDE 23

Conclusions

  • Yes, it does improve strength-of-evidence estimates (both MV- and GMM-based, both of which are good) if you can combine linguistic with non-linguistic LRs.

  • Spectrum of [] is useful forensic parameter

IN CONJUNCTION WITH OTHERS

  • This suggests that [ɕː] will also be of (perhaps greater) use;

  • Perhaps also [s], but needs testing.
  • But there is something else …
SLIDE 24
  • Don’t choose … fuse!

We have two rather different sets of LR estimates for the same data …

SLIDE 25

Fused hybrid-GMM-MV-LR Tippett

Cllr = 0.135, EER = 4.2%

  • Ca. 1% improvement over MV
SLIDE 26

Limitations

  • Factors possibly contributing to too-good results:

– Training / test data not separated
– Too much control over channel?
– Jap. /ɕ/ may have inherently longer allophones than, say, English /ʃ/ – easier for speaker to reach target (certainly the case before devoiced /i/)

  • Also fricatives not excluded from cepstral mean
  • But, crude automatic processing: better channel compensation etc. would probably give better results

SLIDE 27

More Questions and further work

  • MFCCs vs LPC CCs? Might depend on segment.
  • Channel compensation methods other than MCS? (Or other types of MCS?)
  • Band-limited cepstra …
  • Incorporate formants (or peak-picked poles) …
  • Do nasals, rhotics, laterals …
SLIDE 28

THANK YOU Comments very welcome