

slide-1
SLIDE 1

Developing, Collecting and Sharing DyViS: A Forensic Phonetic Journey

Kirsty McDougall

(with particular thanks to Francis Nolan, Gea de Jong-Lendle and Toby Hudson)

WYRED Workshop, IAFPA 2018, University of Huddersfield

slide-2
SLIDE 2

DyViS Collaborators

Toby Hudson, Gea de Jong-Lendle, Francis Nolan

slide-3
SLIDE 3

Outline

  • Origins of DyViS
  • Development of DyViS
  • Sharing DyViS
  • Impact of DyViS

Image: http://nomanbefore.com/oxford-vs-cambridge/

slide-4
SLIDE 4

Origins of DyViS

Image: http://nomanbefore.com/oxford-vs-cambridge/

slide-5
SLIDE 5

Are voices unique?

  • To investigate this possibility (or at least determine how frequently particular voice features or combinations occur), we need population databases for speech → populations must be phonetically controlled, i.e. contain a large no. of speakers with the same:

  • accent (regional, social)
  • sex
  • age group

→ need to hold demographic characteristics constant and examine variation between the speakers (“between-speaker variation”)

slide-6
SLIDE 6

Lack of population speech data

  • 2005 (time of DyViS proposal): very few large-scale speech databases for phonetically-controlled populations available

  • Some exceptions:

German

  • Künzel (1989)
  • ‘Pool 2010’ (Jessen, Köster and Gfroerer 2005)

Japanese

  • NRIPS Speaker Database of Japanese (Japanese National Research Institute for Police Sciences) (Osanai, Tanimosto, Kido and Suzuki 1995)

  • No forensically-oriented population database for English
slide-7
SLIDE 7

Künzel (1989) German mean f0 population data

Mean f0 data for 100 male & 50 female German speakers (Künzel 1989: 121, Figure 3)

slide-8
SLIDE 8

Further complication: within-speaker variation

  • Concept of each individual having a single ‘voice’ is not straightforward
  • Voices are not like fingerprints
  • Physical dimensions of vocal tract impose some limits, but extensive variability possible within these
  • Within-speaker variation:
  • interlocutor
  • state of health
  • formality
  • competing with background noise…
  • emotion
  • time of day
slide-9
SLIDE 9

What we need

  • To assess the typicality of any given speech feature (or combination of features), we need to consider both between-speaker and within-speaker variation in a phonetically controlled population → need speech across a range of styles to model within-speaker variation, as well as from a large no. of speakers
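The idea of weighing between-speaker against within-speaker variation can be illustrated with a minimal sketch. The F0 values, speaker labels and function name below are invented for illustration and are not DyViS data; the calculation is a plain variance decomposition, not necessarily the analysis used in the project.

```python
from statistics import mean, pvariance

# Hypothetical mean-F0 measurements (Hz): one value per speaking style per speaker.
# These numbers are invented for illustration only.
measurements = {
    "spk_A": [100.0, 104.0, 108.0],
    "spk_B": [118.0, 120.0, 122.0],
    "spk_C": [95.0, 100.0, 105.0],
}

def variance_components(data):
    """Return (between-speaker variance, mean within-speaker variance)."""
    # Between-speaker: how far the speakers' own averages lie from each other.
    speaker_means = [mean(values) for values in data.values()]
    between = pvariance(speaker_means)
    # Within-speaker: how much each speaker varies across styles, averaged.
    within = mean(pvariance(values) for values in data.values())
    return between, within

between, within = variance_components(measurements)
ratio = between / within  # larger ratio = feature separates speakers better
```

A feature is more useful for distinguishing speakers when the between-speaker component is large relative to the within-speaker component, which is why both a large number of speakers and several speaking styles per speaker are needed to estimate the two.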

slide-10
SLIDE 10

ESRC no. RES-000-23-1248

Department of Linguistics, University of Cambridge

Gea de Jong, Toby Hudson, Kirsty McDougall, Francis Nolan

2005-2009

Enter: DyViS

‘Dynamic Variability in Speech: A Forensic Phonetic Study of British English’

slide-11
SLIDE 11

DyViS Research Aim 4

“To make available a speech database of SSBE for wider use by other researchers, forensic phonetic practitioners, and other interested persons”

slide-12
SLIDE 12

Developing DyViS

Image: http://nomanbefore.com/oxford-vs-cambridge/

slide-13
SLIDE 13

Developing a forensic phonetic database for SSBE

  • 100 speakers from a single speech community
  • Standard Southern British English (SSBE)
  • Male, aged 18-25 years
  • Studio and telephone (landline) quality
  • Read and spontaneous speech
  • 4 speaking tasks
  • 20 speakers to be recorded a second time (non-contemporaneous)

  • Speech files to be labelled orthographically
slide-14
SLIDE 14

Why SSBE?

  • No accent of English recorded and studied in this way before
  • SSBE – convenient, spoken by a large proportion of Cambridge University students
  • SSBE speakers commit crimes!
  • Starting point: other accents should also be studied (SSBE has its own specific social profile)
  • SSBE also of great interest in other areas of linguistic research

slide-15
SLIDE 15

Speaking tasks

  • Spontaneous speech (2 styles) and read speech (2 styles)

  • Elicitation of same variables across all 4 tasks

Task  Description                                Spontaneous  Read  Studio quality  Telephone quality
1     Simulated police interview                 x            -     x               -
2     Telephone conversation with ‘accomplice’   x            -     x               x
3     Read passage                               -            x     x               -
4     Read sentences                             -            x     x               -

slide-16
SLIDE 16

Telephone transmission

  • Telephone effect documented in several studies (Künzel 2001, Nolan 2002)
  • One task recorded at studio quality and over a BT telephone landline
  • DyViS project concerned with both technical effect of telephone transmission and speaking style effects
  • Future work: mobile telephone transmission (technical challenge: no mobile signal in Cambridge sound-treated recording studio…)

slide-17
SLIDE 17

Non-contemporaneous recordings

  • Most speaker comparison cases involve recordings made at different times, e.g. incriminating phone call and police interview
  • Typically days/weeks/months between the two recordings
  • DyViS: 20 speakers rerecorded 10-14 weeks after initial session
  • repeated the 2 reading tasks
  • nature of interview & phone call scenarios meant that these couldn’t be repeated → further creative genius needed for additional spontaneous tasks…

slide-18
SLIDE 18

Recruiting

  • University of Cambridge students
  • Paid for their participation
  • Posters, emails, flyers, word of mouth…
  • Adverts called for male speakers, aged 18-25 years, who spoke English with a ‘standard Southern accent’

slide-19
SLIDE 19

“It takes one to know one”

  • Toby Hudson, phonetically trained research assistant, native SSBE speaker
  • To vet accents, speakers asked to leave a message on answer phone
  • name, contact details, places lived
  • Sometimes further follow-up call with research assistant
  • reading passage
  • Some volunteers screened out at this stage, others after recording session

slide-20
SLIDE 20

More recruiting….

slide-21
SLIDE 21

Some DyViS speakers…

  • Speaker 95
  • Speaker 53
  • Speaker 62
  • Speaker 60
  • Speaker 65
  • Speaker 25
  • Speaker 88
  • Speaker 106
  • Speaker 112
slide-22
SLIDE 22

Recording

  • Sound-treated room, Phonetics Laboratory, University of Cambridge

  • March 2006 – August 2007
  • Recording sessions lasted 1-2 hours
slide-23
SLIDE 23

Experimental set-up

[Diagram. Labels: Sound-treated booth; Research room; Subject; Researcher; Mic; Recorder 1; Recorder 2; Telephone 1; Telephone 2; BT external telephone line; Intercept: Prospect balance unit (TC22)]

Image credit: Gea de Jong-Lendle

slide-24
SLIDE 24

Task 1: Simulated police interview

  • Extension of ‘map task’ elicitation (Anderson et al. 1991)
  • Task designed to induce stressful situation (through ‘lying’)
  • Experimenter = police officer, Subject = drug deal suspect
  • ‘Memory’ on screen: facts in black (OK to admit) OR red (must NOT be revealed)

slide-25
SLIDE 25

Scott Weadon

  • tour guide
  • friend from secondary school: Buckley School
  • regularly chat on Skype

Robert Freeman

  • owner of DIY shop
  • old friend from school
  • see regularly in the pub

slide-26
SLIDE 26

[Map. Labels: Yewtree Reservoir; Yewtree Footpath; Dexter Road; Badger Pass. Note on map: “you met Robert Freeman here last Wednesday”]

slide-27
SLIDE 27

Task 2: Telephone conversation

Method:

Researcher phones to the sound-treated booth via an external telephone line.

Recording conditions:

The subject’s speech is recorded directly into the microphone and indirectly via a telephone intercept.

Content:

The researcher requests a short debriefing from the subject about the mock interview.

slide-28
SLIDE 28

Example…

Task 2: Telephone conversation

slide-29
SLIDE 29

Task 3: Reading passage

“Report: Hoards of Heroin in Parkville last Thursday

Police announced last night that they have arrested one of two men believed to be responsible for selling large quantities of heroin at the Parkville petrol station at 10:15 pm last Thursday. The suspect, who cannot be named, works as a hairdresser in Carter Town. He is employed by Mr Eugene Burke at Eugene’s Hairdressers on Reeve Causeway, opposite the city tour bus stop. Reeve Causeway is north of the hypermarket on Pighty Road. …. ”

slide-30
SLIDE 30

Task 4: Reading sentences

Sentences designed to elicit target variables in phonetically controlled contexts

  • That driver was a CREEP yesterday.
  • We decided to HIDE today.
  • He had a difficult YOUTH I reckon.
  • It won’t be King’s Cross; we’ll meet at EUSTON next time.
  • etc.

Format: randomised x 6
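One way to produce a presentation list of this kind is sketched below. It assumes that "randomised x 6" means six independently shuffled repetitions of the sentence set, which the slide does not spell out; the function name and the seed are illustrative only.

```python
import random

# Example sentences taken from the slide; the full DyViS list is longer.
sentences = [
    "That driver was a CREEP yesterday.",
    "We decided to HIDE today.",
    "He had a difficult YOUTH I reckon.",
    "It won't be King's Cross; we'll meet at EUSTON next time.",
]

def build_reading_list(items, repetitions=6, seed=None):
    """Return `repetitions` shuffled copies of `items`, concatenated in order."""
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    reading_list = []
    for _ in range(repetitions):
        block = list(items)    # copy so the source list is left untouched
        rng.shuffle(block)
        reading_list.extend(block)
    return reading_list

reading_list = build_reading_list(sentences, repetitions=6, seed=1)
```

Shuffling within each repetition, rather than shuffling all tokens together, keeps the repetitions of a given sentence spread across the session.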

slide-31
SLIDE 31

Sharing DyViS

Image: http://nomanbefore.com/oxford-vs-cambridge/

slide-32
SLIDE 32

Database Release

  • Database released publicly via UK Data Service, 19 July 2011
  • https://www.ukdataservice.ac.uk/
  • Why this route?
slide-33
SLIDE 33

Database Release

  • Freely available (+ small admin charge) to anybody undertaking university research

  • Available to commercial users for a fee
slide-34
SLIDE 34

Database includes…

  • Documentation explaining database structure, system for filenames, transcription conventions, etc.
  • Sound files (.wav)
  • Transcripts (Praat TextGrids)
  • Elicitation material and experimenter’s instructions

  • List of target items elicited in tasks 1, 2, 3
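Because the sound files come with Praat TextGrid transcripts, a common first step with a database of this kind is to pair the two. The sketch below assumes, hypothetically, that a recording and its transcript share a file stem (e.g. `x.wav` and `x.TextGrid`); the actual DyViS filename system is described in the database documentation and is not reproduced here.

```python
from pathlib import Path

def pair_recordings(root):
    """Map each .wav file under `root` to a same-stem .TextGrid, or None if absent.

    The same-stem naming rule is an assumption made for this sketch.
    """
    root = Path(root)
    pairs = {}
    for wav in sorted(root.rglob("*.wav")):
        textgrid = wav.with_suffix(".TextGrid")
        pairs[wav] = textgrid if textgrid.exists() else None
    return pairs

def missing_transcripts(pairs):
    """List sound files that have no matching TextGrid."""
    return [wav for wav, textgrid in pairs.items() if textgrid is None]
```

Running `missing_transcripts(pair_recordings("path/to/database"))` on a local copy would flag any recording without a transcript before analysis starts; the path shown is a placeholder.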
slide-35
SLIDE 35

DyViS Database Article

(Nolan et al. 2009, IJSLL)

slide-36
SLIDE 36

Impact of DyViS

Image: http://nomanbefore.com/oxford-vs-cambridge/

slide-37
SLIDE 37

Impact of DyViS

Image: http://nomanbefore.com/oxford-vs-cambridge/

  • Our research in Cambridge
  • Forensic practitioner impact – some examples
  • Forensic phonetic research community
  • Other areas of research – sociophonetics…

slide-38
SLIDE 38

Our research in Cambridge

  • DyViS project research

→ formant dynamics & speaker characteristics (McDougall and Nolan 2007)

→ sound change and speaker identity (de Jong et al. 2007a, de Jong et al. 2007b)

→ perception of telephone speech (Lawrence, Nolan and McDougall 2008)

→ SSBE fundamental frequency distribution (Hudson et al. 2007)

[Figure: vowel plot of Frequency of F1 (Hz) against Frequency of F2 (Hz) for heed, had, hard, hoard, hood, who'd]

slide-39
SLIDE 39

Our research in Cambridge

  • ESRC VoiceSim Project (2008; Nolan, McDougall and Hudson)
  • forensic phonetic research project investigating earwitness identification, especially the effect of the telephone
  • British Academy Postdoctoral Fellowship (2010-2015; McDougall) → ‘A phonetic theory of voice similarity’
  • British Academy Small Grant (McDougall) → Voice similarity and accent differences: YorViS → Comparison of SSBE and York English

slide-40
SLIDE 40

Forensic practitioner impact: some examples

  • SSBE f0 population statistics
  • widely consulted by practitioners for cases in English (not German!), e.g. J.P. French Associates, Martin Barry Forensic Voice Services, Duckworth Consultancy

[Figure: Distribution of Speaker F0 Means. Histogram of no. of subjects against F0 (Hz), with axis ticks every 5 Hz from 51 to 141 plus ‘More’]

Hudson et al. (2007)
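A histogram of this kind can be built by binning each speaker's mean F0 into fixed-width intervals. The sketch below uses 5 Hz bins, matching the tick spacing on the figure's axis as far as it can be read from the slide; the bin origin of 50 Hz is an assumption, and the F0 values are invented rather than taken from Hudson et al. (2007).

```python
from collections import Counter

# Invented speaker mean-F0 values in Hz (illustration only).
speaker_f0_means = [88.0, 92.5, 97.0, 101.0, 103.5, 104.0, 109.0, 112.0, 118.5, 131.0]

def bin_f0_means(values, start=50.0, width=5.0):
    """Count values per bin, keyed by the bin's lower edge.

    Bin k covers the half-open interval [start + k*width, start + (k+1)*width).
    """
    counts = Counter()
    for value in values:
        index = int((value - start) // width)
        lower_edge = start + index * width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))

histogram = bin_f0_means(speaker_f0_means)
```

With population counts like these, a practitioner can ask how typical a given case value is, for example what share of the reference speakers fall in the same bin as the recording under examination.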

slide-41
SLIDE 41

Forensic practitioner impact: Oxford Wave Research Ltd.

  • DyViS used in developing software products with forensic applications, e.g.
  • ‘VOCALISE’ – combines automatic approaches with more traditional phonetic variables to undertake speaker recognition
  • ‘CLEAVER’ – extracts the speech of individuals from recordings of multi-speaker conversations
  • Products used by UK and European law enforcement agencies and private companies

http://www.oxfordwaveresearch.com/

slide-42
SLIDE 42

Forensic practitioner impact: J.P. French Associates

  • DyViS used in testing performance of automatic system in conjunction with traditional techniques with respect to casework examples
  • DyViS used in testing voice quality profiling prior to its introduction in casework
  • A large-scale forensically-oriented database was essential to both of these tasks – yet such database development beyond scope of a casework-focussed firm

http://www.jpfrench.com/

slide-43
SLIDE 43

Forensic practitioner impact: Martin Barry Forensic Voice Services

  • DyViS database used to test various technical enhancements for casework, including software for formant plotting

http://www.mbfvs.co.uk/

slide-44
SLIDE 44

Forensic practitioner impact: Duckworth Consultancy

  • Martin Duckworth: research collaboration with Kirsty McDougall investigating speaker-specific patterns of disfluency in normally fluent speakers using DyViS
  • IAFPA research grant and University of Cambridge Humanities Research Grant
  • Development of ‘TOFFA’, Taxonomy of Fluency Features for Forensic Analysis (McDougall and Duckworth 2017)

Martin Duckworth (Duckworth Consultancy)

slide-45
SLIDE 45

TOFFA: Taxonomy of Fluency Features for Forensic Analysis

  • Filled Pauses
    − er [er]
    − erm [erm]
    − others, e.g. ah [fpo]
  • Silent Pauses
    − ‘grammatical’ [pg]
    − ‘other’ [po]
  • Repetitions
    − part word [pwr]
    − whole word [wrep]
    − phrase [prep]
    − multiple [mrep]
  • Prolongations
    − vocalic, e.g. vowel, nasal, lateral [prov]
    − fricative [prof]
    − plosive or affricate [prop]
  • Self-Interruptions
    − word interruption [wint]
    − phrase interruption [pint]
  • All disfluencies [alldis]

McDougall and Duckworth (2017) Speech Communication
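The bracketed codes lend themselves to simple tallying once a transcript has been annotated. The sketch below assumes, hypothetically, that disfluencies are marked inline in a transcript string using the bracketed codes listed above; the annotation workflow in McDougall and Duckworth (2017) may differ, and the example transcript is invented.

```python
import re
from collections import Counter

# Category codes as listed on the slide, grouped by top-level category.
TOFFA = {
    "filled_pause": ["er", "erm", "fpo"],
    "silent_pause": ["pg", "po"],
    "repetition": ["pwr", "wrep", "prep", "mrep"],
    "prolongation": ["prov", "prof", "prop"],
    "self_interruption": ["wint", "pint"],
}
CODE_TO_CATEGORY = {code: cat for cat, codes in TOFFA.items() for code in codes}

def count_disfluencies(transcript):
    """Count bracketed TOFFA codes in `transcript`, per code and per category."""
    found = re.findall(r"\[(\w+)\]", transcript)
    codes = [c for c in found if c in CODE_TO_CATEGORY]  # ignore unknown tags
    per_code = Counter(codes)
    per_category = Counter(CODE_TO_CATEGORY[c] for c in codes)
    per_category["alldis"] = len(codes)  # overall total, as on the slide
    return per_code, per_category

# Invented example utterance with inline tags (not from the DyViS transcripts).
example = "I went [erm] down the [pg] the [wrep] high street [er] and then [po] left"
per_code, per_category = count_disfluencies(example)
```

Comparing such per-category counts across speakers, normalised by the amount of speech, is one way a speaker's disfluency profile could be summarised; the normalisation unit is left open here because the slide does not state it.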

slide-46
SLIDE 46

Forensic practitioner impact: Duckworth Consultancy

  • Early versions of TOFFA implemented in casework by Duckworth Consultancy since 2011
  • TOFFA also used by J.P. French Associates in casework since 2015 (IAFPA 2018 paper: McDougall, Rhodes, et al.)

Martin Duckworth (Duckworth Consultancy)

slide-47
SLIDE 47

Forensic phonetic research community

  • Ph.D. projects, e.g.

Erica Gold – clicks, f0, AR; LR approaches

Vince Hughes – LR approaches, reference populations

Colleen Kavanagh – consonant features

Nathan Atkinson – earwitness evidence

  • Many M.Sc. and M.Phil. projects
  • Various forensic phonetic research studies:
  • Duckworth et al. (2011) – consistency of formant measurement
  • IAFPA conferences:
  • 2012 Santander: 28% of talks based on DyViS
  • ….
  • 2018 Huddersfield: 22%
slide-48
SLIDE 48

Forensic phonetic research community

4 funded research projects:

  • Voice and Identity, University of York (AHRC), PI P. Foulkes: analysis of DyViS data – voice quality, long-term formants, ASR
  • TUULS, University of York (ESRC), PI C. Llamas: Newcastle, Middlesbrough, Sunderland – DyViS tasks
  • WYRED, University of Huddersfield (ESRC), PI E. Gold: Bradford, Kirklees and Wakefield – DyViS tasks
  • YorViS, University of Cambridge (British Academy), PI K. McDougall: York English, 20 male speakers – DyViS tasks

https://sites.google.com/site/yorkfss/research/grants-and-projects/voice-and-identity---source-filter-biometric

https://www.york.ac.uk/language/research/projects/tuuls/

http://wyredproject.co.uk/

slide-49
SLIDE 49

YorViS Database

(McDougall, Duckworth and Hudson 2015)

  • “York Variability in Speech”
  • 21 speakers
  • male
  • aged 18-25 years
  • recruited locally, e.g. job centre, McDonalds
  • education level not specified
  • Same 4 tasks as DyViS
  • Police Officer = me
  • Accomplice = Nathan Atkinson (East Lancashire accent)

  • Recorded at University of York
  • Acknowledgement: Paul Foulkes
slide-50
SLIDE 50

YorViS examples: Speaker 19

  • Interview: hope avenue onto reeve causeway then down the high street and then onto pightly road
  • Telephone: down hope avenue onto reeve causeway and left onto t' high street and then right onto pighty road
slide-51
SLIDE 51

Other areas of research

  • Robert Fuchs (2012) Speech Rhythm in Varieties of English: used DyViS for SSBE data; replicated methodology to collect Indian English spontaneous speech corpus
  • Sociolinguistic studies of RP, e.g. Anne Fabricius
  • Vowel variation in varieties of English: Wikström (2013) ‘An acoustic study of the RP English LOT and THOUGHT vowels’, JIPA

https://link.springer.com/book/10.1007/978-3-662-47818-9

slide-52
SLIDE 52

Some challenges

  • Everything takes more time than you think it should – and some more
  • experimental set-up, materials, recruitment, transcription, editing, file management….

  • Technical challenges
  • Speakers are human
  • logistically
  • linguistically
  • engagement
slide-53
SLIDE 53

Design and technical decisions

  • File format
  • Sampling rate
  • Bit rate
  • Level setting
  • Recording equipment and positioning
  • Recording types (+ video?)
  • Backing-up arrangements
  • File labelling and handling
  • Transcription – level and conventions
  • Post-processing
  • Collection of metadata
  • Ethical procedures
slide-54
SLIDE 54

Please go and collect speech population data

…. and please share it!

Image: http://nomanbefore.com/oxford-vs-cambridge/

slide-55
SLIDE 55

References

Anderson, A. H., M. Bader, E. G. Bard, E. Boyle, G. Doherty, S. Garrod, S. Isard, J. Kowtko, J. McAllister, J. Miller, C. Sotillo, H. S. Thompson and R. Weinert (1991) ‘The HCRC Map Task Corpus.’ Language and Speech 34.4: 351-366.

de Jong, G., K. McDougall, T. Hudson and F. Nolan (2007) ‘The speaker-discriminating power of sounds undergoing historical change: a formant-based study.’ In J. Trouvain and W. Barry (eds.), Proceedings of the 16th International Congress of Phonetic Sciences, 6-10 August 2007, Saarbrücken, 1813-1816.

de Jong, G., K. McDougall and F. Nolan (2007) ‘Sound change and speaker identity: an acoustic study.’ In C. Müller (ed.), Speaker Classification II: Selected Papers. Berlin: Springer. 130-141.

Duckworth, M., K. McDougall, G. de Jong and L. Shockey (2011) ‘The consistency of formant measurements in high quality audio data: the effect of agreeing measurement procedures.’ International Journal of Speech, Language and the Law 18.1: 35-51.

Fuchs, R. (2012) Speech Rhythm in Varieties of English: Evidence from Educated Indian English and British English. Berlin: Springer-Verlag.

Hudson, T., G. de Jong, K. McDougall, P. Harrison and F. Nolan (2007) ‘F0 statistics for 100 young male speakers of Standard Southern British English.’ In J. Trouvain and W. Barry (eds.), Proceedings of the 16th International Congress of Phonetic Sciences, 6-10 August 2007, Saarbrücken, 1809-1812.

Jessen, M., O. Köster and S. Gfroerer (2005) ‘Influence of vocal effort on average and variability of fundamental frequency.’ International Journal of Speech, Language and the Law 12(2): 174-213.

Lawrence, S., F. Nolan and K. McDougall (2008) ‘Acoustic and perceptual effects of telephone transmission on vowel quality.’ International Journal of Speech, Language and the Law 15.2: 159-190.

Künzel, H. J. (2001) ‘Beware of the “telephone effect”: the influence of telephone transmission on the measurement of formant frequencies.’ International Journal of Speech Language and the Law 8.1: 80-99.

Nolan, F. (2002) ‘The 'telephone effect' on formants: a response.’ Forensic Linguistics 9(1): 74-82.

McDougall, K. and M. Duckworth (2017) ‘Profiling fluency: an analysis of individual variation in disfluencies in adult males.’ Speech Communication 95: 16-27.

McDougall, K., M. Duckworth and T. Hudson (2015) ‘Individual and group variation in disfluency features: a cross-accent investigation.’ In: The Scottish Consortium for ICPhS 2015 (ed.) Proceedings of the 18th International Congress of Phonetic Sciences, 10-14 August 2015, Glasgow. Paper number 0308.1-5. <http://www.icphs.info/pdfs/Papers/ICPHS0308.pdf>

McDougall, K. and F. Nolan (2007) ‘Discrimination of speakers using the formant dynamics of /uː/ in British English.’ In J. Trouvain and W. Barry (eds.), Proceedings of the 16th International Congress of Phonetic Sciences, 6-10 August 2007, Saarbrücken, 1825-1828.

McDougall, K., R. Rhodes, M. Duckworth, J.P. French, C. Kirchhübel and J. Wormald (2018) ‘Applying disfluency analysis in forensic speaker comparison casework.’ Paper presented at the International Association for Forensic Phonetics and Acoustics Annual Conference, Huddersfield, 29 July – 1 August 2018.

Nolan, F., K. McDougall, G. de Jong and T. Hudson (2009) ‘The DyViS database: style-controlled recordings of 100 homogeneous speakers for forensic phonetic research.’ International Journal of Speech, Language and the Law 16.1: 31-57.

Osanai, T., M. Tanimosto, H. Kido and T. Suzuki (1995) ‘Text-dependent speaker verification using isolated word utterances based on dynamic programming.’ [In Japanese] National Research Institute for Police Science Report 48.1: 15-19.

Wikström, J. (2013) ‘An acoustic study of the RP English LOT and THOUGHT vowels.’ Journal of the International Phonetic Association 43.1: 37-47.