Speaking under cover: The impact of face-concealing garments on the acoustics of fricatives



slide-1
SLIDE 1

Speaking under cover: The impact of face-concealing garments on the acoustics of fricatives.

Natalie Fecher

Language & Linguistic Science, University of York, York, UK
PhD supervisors: Dominic Watt, David van Leeuwen
IAFPA 2011, Vienna, Austria, 27th July 2011

slide-2
SLIDE 2

Outline

  • Background of the Project
  • The ‘Face Cover’ Corpus
  • Acoustic Fricative Study
  • Conclusions and Outlook

slide-3
SLIDE 3

Background

slide-4
SLIDE 4

PhD Project on Multimodal Speech and Speaker Recognition

  • Forensic Speech Science
  • Audio-Visual Speech Processing

slide-5
SLIDE 5

Joint processing and transformation of acoustic and facial information under qualitatively variable input:

Acoustic Noise

  • microphone type/placement
  • acoustic environment
  • channel characteristics
  • complexity of the scenario
  • ► face coverings

Visual Noise

  • lighting, occlusion, perspective
  • image background
  • resolution/compression
  • appearance change
  • ► face coverings

A / V / AV speech/speaker recognition by human perceptual or automatic system

‘Identify’ speech/speaker or verify claimed content/identity

(on the basis of Aleksic & Katsaggelos, 2006)

slide-6
SLIDE 6

Previous Research

Llamas/Harrison/Donnelly/Watt (2009)

  • set of common confusions during bimodal presentation of AV stimuli
  • sound transmission loss characteristics (TL) of 3 fabrics

Watt/Llamas/Harrison (2010)

  • sound quality judgement of speech filtered with TL spectra

Zhang/Tan (2008)

  • test of an ASR system with 10 types of voice disguise
  • ‘masking’ amongst the 3 guises with the lowest similarity rate

Coniam (2005)

  • impact of surgical masks in oral exams during SARS outbreak

slide-7
SLIDE 7

‘Face Cover’ Corpus

slide-8
SLIDE 8

Where?

High-quality audio/video recordings in a professional TV studio at the University of York.

slide-9
SLIDE 9

Who?

10 British English speakers, controlled for demographic, educational and language background (details in Fecher, 2011a/b).

slide-10
SLIDE 10

Disguise?

No voice disguise per se.

Selection criteria:

  • forensic relevance
  • facial parts covered
  • mask material

slide-11
SLIDE 11

Disguise?

slide-12
SLIDE 12

What?

Phonetically controlled stimuli

  • syllable structure: /C1VC2/ (existing English words excluded)
  • vowel: [ɑ:] as in <father>
  • consonants: /p, t, k, b, d, g, f, s, ʃ, θ, v, z, ʒ, ð, m, n, ŋ, h/
  • syllable position: initial, final
  • carrier phrase: He said /stimulus/.
  • phonotactic rules: no /ŋ/ initial, no /h/ final
  • presentation: IPA, randomised

► 576 stimuli per speaker

slide-13
SLIDE 13

How?

VIDEO: half-profile camera, 3 m

AUDIO: headband

slide-14
SLIDE 14

How?

slide-15
SLIDE 15

Fricative Study

slide-16
SLIDE 16

Method

  • /s ʃ f θ/ × 2 tokens × 2 syllable positions × 6 speakers × 8 disguise conditions
  • less standardised analysis procedures for obstruents (see e.g. Haley et al., 2010; Maniwa et al., 2009; Jongman et al., 2000; Flipsen et al., 1999; Shadle & Mair, 1996; Tabain & Watson, 1996)
  • no bandpass filter, no pre-emphasis (48 kHz/16 bit/PCM)

[Figure: FFT spectra of /s θ ʃ f/, amplitude A [dB] against frequency f [Hz] up to 2.4 × 10^4 Hz, with a 20 dB scale marker]

slide-17
SLIDE 17

Variables

  • intensity
  • peak frequency
  • spectral moments: centre of gravity (CoG), variance, skewness, kurtosis
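As an illustration of these variables, the sketch below computes the peak frequency and the first four spectral moments from a power spectrum treated as a probability distribution over frequency. It is not the analysis script used in the study: the function name and the toy spectrum are hypothetical, and subtracting 3 from the kurtosis (excess kurtosis) is an assumption about the convention used.

```python
def spectral_moments(freqs_hz, power):
    """Return peak frequency and the first four spectral moments."""
    if len(freqs_hz) != len(power) or not power:
        raise ValueError("freqs_hz and power must be non-empty and equally long")
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    # Normalise the power spectrum so that the weights sum to 1.
    weights = [p / total for p in power]
    cog = sum(f * w for f, w in zip(freqs_hz, weights))

    def central(order):
        # Central moment of the given order around the centre of gravity.
        return sum((f - cog) ** order * w for f, w in zip(freqs_hz, weights))

    variance = central(2)
    if variance == 0:
        raise ValueError("all energy sits in one bin; higher moments undefined")
    peak_index = max(range(len(power)), key=lambda i: power[i])
    return {
        "peak_hz": freqs_hz[peak_index],
        "cog_hz": cog,
        "variance_hz2": variance,
        "skewness": central(3) / variance ** 1.5,
        # Assumption: excess kurtosis (normal distribution = 0).
        "kurtosis": central(4) / variance ** 2 - 3.0,
    }

# Toy symmetric spectrum (hypothetical values, not data from the study).
demo = spectral_moments([1000.0, 2000.0, 3000.0], [1.0, 2.0, 1.0])
print(demo)
```

For the symmetric toy spectrum the centre of gravity coincides with the peak at 2000 Hz and the skewness is zero, which is a quick sanity check on the definitions.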

slide-18
SLIDE 18

Intensity

[Figure: intensity for /s θ ʃ f/]

slide-19
SLIDE 19

Peak frequency

[Figure: peak frequency for /s θ ʃ f/]

slide-20
SLIDE 20

Centre of gravity

[Figure: centre of gravity for /s θ ʃ f/]

slide-21
SLIDE 21

Skewness * kurtosis

[Figure: kurtosis (dimensionless) against skewness (dimensionless) for /s ʃ f θ/ across the guises BAL, CON, HEL, HOO, NIQ, RUB, SUR, TAP. Fits reported in the figure: r²=.10, p=.44; r²=.68, p<.05; r²=.90, p<.001; r²=.67, p<.05]
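The r² values in this figure describe how closely skewness and kurtosis co-vary across the guises. A minimal sketch of that calculation is given below, using a plain Pearson correlation. The function name and the eight pairs of values are hypothetical and are not data from the study; the p-values reported on the slide are not reproduced here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-guise means for one fricative (8 guises), for illustration only.
skewness_by_guise = [0.4, 0.6, 0.9, 1.1, 1.3, 1.8, 2.2, 2.9]
kurtosis_by_guise = [-0.5, 0.8, 2.1, 2.0, 4.5, 7.9, 11.0, 17.5]

r = pearson_r(skewness_by_guise, kurtosis_by_guise)
r_squared = r * r
print(round(r_squared, 2))
```

With only eight guises per fricative, a single extreme covering can dominate r², so the ranking of guises is worth inspecting alongside the coefficient.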

slide-22
SLIDE 22

Summary

  • sound energy absorption dependent on mask material
  • but: additional intensity variation due to the speakers’ individual compensation strategies
  • overall stronger effects for the spectrally diffuse and low-energy non-sibilants /f, θ/ than for the sibilants /s, ʃ/
      • more prone to energy absorption in higher frequency bands
      • lower centre of gravity for most coverings
      • highly variable peak frequencies
  • positive correlation for skewness*kurtosis; for both measures same ranking of guises by size of effect (NIQ least, HEL most)

slide-23
SLIDE 23

Conclusions

slide-24
SLIDE 24

Speech production

Misarticulation: physiological and somatosensory effects, e.g. lip/nose contact, restricted jaw movement, skin stretching (Fuchs et al., 2010; Haley et al., 2010; Iskarous et al., 2008; Maniwa et al., 2009)

Articulatory compensation: e.g. increased vocal effort (Coniam, 2005; Sluijter et al., 1997), which may increase further when auditory self-monitoring is impaired

slide-25
SLIDE 25

Speech acoustics

Interdependence of physiological and physical events in the vocal tract

Acoustic damping effects: mask materials are assumed to act like a low-pass filter that attenuates energy in higher frequency bands (Watt et al., 2010; Llamas et al., 2009; Coniam, 2005)
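The low-pass assumption on this slide can be illustrated with a toy calculation: attenuating the higher frequency bands of a spectrum pulls its centre of gravity downwards. The sketch below is a deliberate simplification and not a model of any measured fabric. The first-order filter shape, the 4 kHz cutoff and the flat input spectrum are all assumptions chosen for illustration.

```python
def centre_of_gravity(freqs_hz, power):
    """Power-weighted mean frequency of a spectrum."""
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    return sum(f * p for f, p in zip(freqs_hz, power)) / total

def low_pass_power_gain(freq_hz, cutoff_hz):
    """Power gain of a first-order low-pass filter: 1 / (1 + (f / fc)^2)."""
    return 1.0 / (1.0 + (freq_hz / cutoff_hz) ** 2)

# Frequency bins from 100 Hz to 24 kHz (Nyquist frequency at 48 kHz sampling).
freqs = [100.0 * k for k in range(1, 241)]
uncovered = [1.0] * len(freqs)  # hypothetical flat, noise-like spectrum
cutoff_hz = 4000.0              # hypothetical cutoff, not a measured value
covered = [p * low_pass_power_gain(f, cutoff_hz) for f, p in zip(freqs, uncovered)]

cog_uncovered = centre_of_gravity(freqs, uncovered)
cog_covered = centre_of_gravity(freqs, covered)
print(cog_uncovered, cog_covered)
```

The filtered spectrum has a lower centre of gravity than the unfiltered one, in line with the lower centre of gravity reported for most coverings in the Summary.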

slide-26
SLIDE 26

Speech perception

Upcoming research

Investigating speech intelligibility when the (visual) speech signal is impaired, i.e. when the mapping between acoustically distinct signals and perceptually consistent categories may be constrained due to a) acoustic transmission loss caused by the mask material, b) the auditory consequences of impaired speech production and acoustics, c) impoverished visual (facial) speech cues.

slide-27
SLIDE 27

References

slide-28
SLIDE 28

Aleksic, P.S., Katsaggelos, A.K. 2006. Audio-Visual Biometrics. Proc. IEEE 94(11), 2025-44.
Coniam, D. 2005. The impact of wearing a face mask in a high-stakes oral examination: An exploratory post-SARS study in Hong Kong. Language Assessment Quarterly 2, 235-261.
Fecher, N. 2011a. Spectral properties of fricatives: a forensic approach. Proc. of the 4th ISCA Tutorial and Research Workshop on Experimental Linguistics, May 25-27, Paris, France, 71-74.
Fecher, N., Watt, D. 2011b. Speaking under cover: The effect of face-concealing garments on spectral properties of fricatives. Proc. of the 17th International Congress of Phonetic Sciences, Hong Kong, August 2011 (accepted).
Flipsen, P.Jr., Shriberg, L., Weismer, G., Karlsson, H., McSweeny, J. 1999. Acoustic characteristics of /s/ in adolescents. JSLHR 42, 663-677.
Fuchs, S., Weirich, M., Kroos, C., Fecher, N., Pape, D., Koppetsch, S. 2010. Time for a shave? Does facial hair interfere with visual speech intelligibility? In: Fuchs, S., Hoole, P., Mooshammer, C., Zygis, M. (eds.). Between the regular and the particular in speech and language. Frankfurt/M.: Peter Lang, 247-264.
Haley, K.L., Seelinger, E., Mandulak, K.C., Zajac, D.J. 2010. Evaluating the spectral distinction between sibilant fricatives through a speaker-centered approach. Journal of Phonetics 38(4), 548-554.
Iskarous, K., Shadle, C., Proctor, M. 2008. Evidence for the dynamic nature of fricative production: American English /s/. Proc. of the 8th Int. Seminar on Speech Production, Strasbourg, France, 405-408.
Jongman, A., Wayland, R., Wong, S. 2000. Acoustic characteristics of English fricatives. JASA 108(3), 1252-63.
Llamas, C., Harrison, P., Donnelly, D., Watt, D. 2009. Effects of different types of face coverings on speech acoustics and intelligibility. York Papers in Linguistics (Series 2) 9, 80-104.
Maniwa, K., Jongman, A., Wade, T. 2009. Acoustic characteristics of clearly spoken English fricatives. JASA 125(6), 3962-73.
Shadle, C., Mair, S.J. 1996. Quantifying spectral characteristics of fricatives. Proc. of Interspeech, Philadelphia, 1521-24.
Sluijter, A.M.C., van Heuven, V.J., Pacilly, J.J.A. 1997. Spectral balance as a cue in the perception of linguistic stress. JASA 101(1), 503-513.
Tabain, M., Watson, C. 1996. Classification of fricatives. Proc. 6th Aust. Int. Conf. Speech Sci. Technol., Adelaide, 623-628.
Watt, D., Llamas, C., Harrison, P. 2010. Differences in perceived sound quality between speech recordings filtered using transmission loss spectra of selected fabrics. Talk given at the IAFPA Conference 2010, Trier, Germany.
Zhang, C., Tan, T. 2008. Voice disguise and automatic speaker recognition. Forensic Science International 175(2-3), 118-122.

slide-29
SLIDE 29

Special thanks to Dominic Watt, Huw Llewelyn-Jones and all participants for letting me put tape on their mouth.

This research project receives funding from the European Community's 7th Framework Programme (FP7/2007-2013) under grant agreement number 238803 (Marie Curie Initial Training Network ‘BBfor2’).

Contact: natalie.fecher@york.ac.uk

Thank you.