SLIDE 1

ZOO PLOTS FOR SPEAKER RECOGNITION WITH TALL AND FAT ANIMALS

Anil Alexander1, Oscar Forth1, John Nash2, and Neil Yager3

1Oxford Wave Research Ltd, 2University of York, 3AICBT Ltd, United Kingdom

{anil|oscar}@oxfordwaveresearch.com, neil@aicbt.com

SLIDE 2

“ALL ANIMALS ARE EQUAL, BUT SOME ANIMALS ARE MORE EQUAL THAN OTHERS”

GEORGE ORWELL, ANIMAL FARM, 1945

SLIDE 3

IAFPA 2014 Zurich

BACKGROUND

Performance in speaker recognition is normally discussed using database-centric single figures of merit such as equal error rates. These metrics fail to capture the performances of individual speakers or speaker groups, which are very important in forensic speaker recognition.

For instance, a recognition system that works well for male speakers may perform poorly for female speakers.
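
To make the point concrete, the following minimal sketch computes an equal error rate (EER) for two speaker groups separately and for the pooled data. The scores are invented for illustration and do not come from the study; `error_rates` and `equal_error_rate` are hypothetical helper names, and the EER is approximated by scanning the observed scores as thresholds.

```python
def error_rates(genuine, imposter, threshold):
    """Return (false reject rate, false accept rate) at a given threshold."""
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in imposter) / len(imposter)
    return frr, far


def equal_error_rate(genuine, imposter):
    """Approximate the EER: try each observed score as the threshold and
    return the mean of FRR and FAR where the two rates are closest."""
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(imposter)):
        frr, far = error_rates(genuine, imposter, t)
        if best_gap is None or abs(frr - far) < best_gap:
            best_gap, best_eer = abs(frr - far), (frr + far) / 2
    return best_eer


# Hypothetical scores for two speaker groups (assumed data, not from the study).
group_a = {"genuine": [0.9, 0.8, 0.85, 0.95], "imposter": [0.1, 0.2, 0.15, 0.3]}
group_b = {"genuine": [0.6, 0.4, 0.7, 0.3], "imposter": [0.5, 0.2, 0.35, 0.1]}

eer_a = equal_error_rate(group_a["genuine"], group_a["imposter"])
eer_b = equal_error_rate(group_b["genuine"], group_b["imposter"])
eer_pooled = equal_error_rate(
    group_a["genuine"] + group_b["genuine"],
    group_a["imposter"] + group_b["imposter"],
)
print(eer_a, eer_b, eer_pooled)
```

With these invented scores the pooled EER is 12.5%, which hides the fact that one group is separated perfectly (0%) while the other has an EER of 25%.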

SLIDE 4

THE MENAGERIE

Under the original Doddington classification, sheep, who are ‘normal’ speakers and tend to match well against themselves and poorly against others, are the majority of the speakers within the database. Goats are speakers who are difficult to verify and tend to have low genuine match scores. Lambs generally match with high scores against other speakers and are thus easily impersonated, resulting in false accepts. Wolves easily impersonate other speakers, also resulting in false accepts.

SLIDE 5

ZOOPLOTS FOR SPEAKER RECOGNITION

The zoo-plot analysis, developed by Yager and Dunstone (2011), extends George Doddington’s (1998) original classification of the biometric menagerie. New animals are introduced:
1. Doves produce high match scores against their own speaker model and low match scores against the imposter models. Doves are the best performers in a system and are easily recognizable.
2. Chameleons produce high match scores against their own models and high match scores against the imposter models. Chameleon speakers appear similar to everyone.
3. Phantoms have low match scores against their own models and against imposter models. Phantom speakers do not appear similar to anyone.
4. Worms are the worst performers in a system: they produce low match scores against their own speaker model and high match scores against imposters. Worm speakers are not easily recognizable and can easily be confused with other speakers.

SLIDE 6

WHAT IS A ZOOPLOT?

It’s a plot of the average genuine match scores for an individual versus the average imposter scores for that individual.

[Figure: zoo plot with axes "Average genuine scores" and "Average imposter scores"; regions labelled Phantoms, Worms, Doves, Sheep, Chameleons, Lambs, Wolves and Goats]

SLIDE 7

PROPOSED METHODOLOGY

Zooplot analysis is performed as follows:
1. Select a group of speakers that represents a recording condition.
2. From this set of speakers, select non-contemporaneous files for testing and training. Ideally, there should be more than one file each for testing and training for the same speaker.
3. For each speaker, match their training samples against all of their testing samples and compute their average genuine match score.
4. Similarly, the mean of all the scores obtained by comparing his/her training samples with files from other speakers gives the average imposter score.
5. The average genuine score is plotted against the average imposter score for all speakers. Speakers who fall within the top or bottom 25% of both score averages are assigned to the animal groups (worms, chameleons, doves and phantoms), with each group showing different characteristics.
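
Steps 3 to 5 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the trial tuples, the speaker labels, the rank-based quartile rule and the "none" label for unassigned speakers are all assumptions made for the example.

```python
from statistics import mean


def zoo_coordinates(trials):
    """trials: iterable of (model_speaker, test_speaker, score).
    Returns {speaker: (average genuine score, average imposter score)}."""
    genuine, imposter = {}, {}
    for model, test, score in trials:
        target = genuine if model == test else imposter
        target.setdefault(model, []).append(score)
    return {spk: (mean(genuine[spk]), mean(imposter[spk])) for spk in genuine}


def quartile_flags(values):
    """Flag each key 'low' (bottom 25% by rank), 'high' (top 25%) or 'mid'."""
    ordered = sorted(values, key=values.get)
    k = len(ordered) // 4
    flags = {key: "mid" for key in ordered}
    for key in ordered[:k]:
        flags[key] = "low"
    for key in ordered[len(ordered) - k:]:
        flags[key] = "high"
    return flags


# (genuine flag, imposter flag) -> animal, following the definitions on slide 5.
ANIMALS = {
    ("high", "low"): "dove",
    ("high", "high"): "chameleon",
    ("low", "low"): "phantom",
    ("low", "high"): "worm",
}


def classify(coords):
    g = quartile_flags({s: c[0] for s, c in coords.items()})
    i = quartile_flags({s: c[1] for s, c in coords.items()})
    return {s: ANIMALS.get((g[s], i[s]), "none") for s in coords}


# Hypothetical trials for four speakers (assumed data, not from the study).
trials = [
    ("A", "A", 0.9), ("A", "B", 0.1), ("A", "C", 0.1), ("A", "D", 0.1),
    ("B", "B", 0.8), ("B", "A", 0.2), ("B", "C", 0.3), ("B", "D", 0.4),
    ("C", "C", 0.5), ("C", "A", 0.2), ("C", "B", 0.4), ("C", "D", 0.6),
    ("D", "D", 0.2), ("D", "A", 0.5), ("D", "B", 0.6), ("D", "C", 0.7),
]
animals = classify(zoo_coordinates(trials))
print(animals)
```

In this toy data, speaker A has the highest genuine and lowest imposter average and is labelled a dove, while speaker D has the reverse pattern and is labelled a worm.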

SLIDE 8

Example using speakers from the IPSC03 database and the VOCALISE spectral comparison

SLIDE 9

TALL AND FAT – SHORT AND THIN

In this work, we further extend the classification of these animals by characterising the speakers as ‘tall/short’ or ‘fat/thin’, depending on the variability of their genuine and imposter match scores. For example, if a ‘dove’ speaker has low genuine variability and high imposter variability, then he or she is a ‘tall thin dove’. Generally speaking, variability of match scores is symptomatic of an underlying problem, regardless of animal type. Tallness, thinness or fatness depends on the genuine and imposter variability, each measured as the number of standard deviations by which a given speaker's variability differs from the mean variability of all speakers. The enhanced visualization therefore adds a new dimension of independent and useful diagnostic information.
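
A minimal sketch of this variability measure is given below. It is an illustration under stated assumptions, not the authors' code: variability is taken to be the population standard deviation of a speaker's scores, the mapping of genuine variability to width (fat/thin) and imposter variability to height (tall/short) is inferred from the 'tall thin dove' example, and the cutoff of one standard deviation is hypothetical.

```python
from statistics import mean, pstdev


def variability_z_scores(per_speaker_scores):
    """per_speaker_scores: {speaker: [scores]}.
    Returns, per speaker, how many standard deviations their score spread
    lies from the mean spread of all speakers."""
    spread = {s: pstdev(v) for s, v in per_speaker_scores.items()}
    mu = mean(list(spread.values()))
    sigma = pstdev(list(spread.values()))
    return {s: (sd - mu) / sigma if sigma else 0.0 for s, sd in spread.items()}


def shape_label(genuine_z, imposter_z, cutoff=1.0):
    """Assumed mapping: imposter spread -> tall/short, genuine spread -> fat/thin."""
    height = "tall" if imposter_z > cutoff else "short" if imposter_z < -cutoff else ""
    width = "fat" if genuine_z > cutoff else "thin" if genuine_z < -cutoff else ""
    return " ".join(w for w in (height, width) if w) or "average"


# Hypothetical scores for five speakers (assumed data, not from the study).
genuine = {"s1": [0.8, 0.8], "s2": [0.7, 0.9], "s3": [0.7, 0.9],
           "s4": [0.7, 0.9], "s5": [0.7, 0.9]}
imposter = {"s1": [0.1, 0.5], "s2": [0.3, 0.3], "s3": [0.3, 0.3],
            "s4": [0.3, 0.3], "s5": [0.3, 0.3]}

gz = variability_z_scores(genuine)
iz = variability_z_scores(imposter)
shapes = {s: shape_label(gz[s], iz[s]) for s in genuine}
print(shapes)
```

Here speaker s1 has unusually consistent genuine scores and unusually variable imposter scores, so the sketch labels s1 'tall thin', while the remaining speakers stay 'average'.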

SLIDE 10

‘TALL AND FAT’ EXTENSION OF ZOOPLOTS

SLIDE 11

EXPERIMENTS

Corpus used: DyViS Task 2 (telephone)
44.1 kHz, 16-bit, band-limited to 8 kHz
Matched test audio and training data for each of 100 speakers
Speaker models created from 60 s of net speech
3 x test audio files for each speaker [60 s, 60 s, residual >= 40 s]
30,000 cross comparisons using VOCALISE speaker recognition software
Two engines used [MFCC & LTFD]
Single session used; multiple sessions planned for future study
SNR, net speech length and voice quality analysis were considered, i.e. examining acoustic features and (subjective) measurements intrinsic to the speaker
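
The figure of 30,000 cross comparisons is consistent with comparing every speaker model against every test file. The short check below assumes one model per speaker, which the slide does not state explicitly; the variable names are illustrative.

```python
speakers = 100
test_files_per_speaker = 3

models = speakers                                  # assumption: one model per speaker
test_files = speakers * test_files_per_speaker     # 300 test files in total
total_comparisons = models * test_files            # every model against every test file
genuine_comparisons = models * test_files_per_speaker   # model vs. same speaker's files
imposter_comparisons = total_comparisons - genuine_comparisons

print(total_comparisons, genuine_comparisons, imposter_comparisons)
```

Under that assumption each speaker contributes 3 genuine scores and 297 imposter scores to their zoo-plot averages.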

SLIDE 12

SPECTRAL VERSUS FORMANT COMPARISONS

Spectral comparisons: MFCC, SSBE UBM, 32 Gaussians, 13F, 1.244% EER
Formant comparisons: LTFD, SSBE UBM, F1, F2, F3, F4, 32 Gaussians, 6% EER

SLIDE 13

NET SPEECH DURATION AND SNR

SPECTRAL FEATURES

SLIDE 14

THE ANOMALOUS SPEAKER

Despite the high quality of the DyViS recordings, one speaker carried a specific and unique noise signature.

This speaker appeared by far the most distinguishable to the ASR.

This demonstrates the caution that should be exercised in accepting audio for ASR use without examination, as characteristics influencing performance may not be easily identifiable by ear alone.

Speaker 12

SLIDE 15

TALL AND FAT VERSION

SPECTRAL FEATURES

SLIDE 16

NET SPEECH DURATION AND SNR

LONG TERM FORMANT FEATURES

SLIDE 17

ZOOMING IN ON THE PHANTOMS

Spkr 4o Spkr 8o

SLIDE 18

SO…

ARE THERE CORRELATES BETWEEN VOICE QUALITY AND ASR ZOO POSITION?

SLIDE 19

Voice quality data : MFCC engine

SLIDE 20

Speakers with VQ Lax Larynx [LTFD engine, 30,000 cross comparisons]

Voice quality data : LTFD engine

SLIDE 21

CONCLUSION

While single figures of merit like equal error rates provide information about the performance of a system against a database as a whole, zooplot analysis can provide valuable insight into the properties of individual speakers and clusters of speakers in the database. It can help to identify potential algorithmic weaknesses of systems against certain classes of speakers, and can be used to adjust identification thresholds at an individual or group level. Preliminary research suggests a link between certain aspects of voice quality and speaker categories in the zooplots. We recommend that zooplot analysis is done as speakers are added into a database, to help identify commonalities of speaker groups or algorithmic weaknesses of systems.

SLIDE 22

REFERENCES

N. Yager and T. Dunstone, Biometric System and Data Analysis: Design, Evaluation, and Data Mining, Springer, 2009, ISBN-13: 978-0-387-77625-5.

G. Doddington, W. Liggett, A. Martin, M. Przybocki, D. Reynolds, Sheep, goats, lambs and wolves: a statistical analysis of speaker performance in the NIST 1998 Speaker Recognition Evaluation, Proceedings of ICSLP’98, Sydney, Australia, November 1998, pp. 1351–1354.

Special thanks to: Peter French & Louisa Stevens Cambridge University [DyViS]