Holistic perception of voice quality matters more than L1 when judging speaker similarity in short stimuli
Eugenia San Segundo, Paul Foulkes & Vincent Hughes
University of York
SST 2016 Parramatta, Australia 7-9 December
1. Introduction
Voice quality (VQ): a quasi-permanent quality resulting from a combination of long-term laryngeal and supralaryngeal features (Laver, 1980), i.e. a broad definition
VQ is idiosyncratic → relevance for FORENSIC PHONETICS!
Approaches to VQ: articulatory / perceptual / acoustic; perceptual assessment by naïve listeners and by experts
Differences in speaker similarity ratings by native vs non-native listeners?
Hypothesis: naïve listeners will rely on holistic VQ perception in order to judge similarity between speakers, regardless of their L1, i.e. there will be no native-language advantage (cf. Perrachione et al. 2009)
Speakers (selected for high speaker similarity):
– native Spanish (Madrid)
– no voice pathologies
– similar sounding: age mean 21 (sd 3.7); F0 mean 113 Hz (sd 13 Hz)
Expert (featural) assessment
– using a simplified version of the Vocal Profile Analysis (VPA) scheme, e.g. mandibular setting (close – neutral – open)
– Similarity Matching Coefficient (SMC) between speakers 1 and 2:
$$\mathrm{SMC}_{1,2} = \frac{\text{number of setting matches}}{\text{number of settings}}$$
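A minimal sketch of the SMC, assuming each speaker's simplified VPA profile is stored as a mapping from setting name to the rating chosen for that setting. The setting names and ratings below are hypothetical and chosen only for illustration.

```python
def similarity_matching_coefficient(profile_1, profile_2):
    """Proportion of VPA settings on which two speakers received the same rating."""
    if profile_1.keys() != profile_2.keys():
        raise ValueError("both profiles must rate the same settings")
    if not profile_1:
        raise ValueError("profiles must contain at least one setting")
    # Count settings where both speakers got the identical rating
    matches = sum(profile_1[s] == profile_2[s] for s in profile_1)
    return matches / len(profile_1)


# Hypothetical ratings on three simplified VPA settings (illustrative labels only)
speaker_1 = {"mandibular": "close", "larynx_height": "neutral", "phonation": "creaky"}
speaker_2 = {"mandibular": "close", "larynx_height": "raised", "phonation": "creaky"}

print(similarity_matching_coefficient(speaker_1, speaker_2))  # 2 matches / 3 settings
```

An SMC of 1 means the two speakers share every rated setting; 0 means they share none.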
Recording conditions:
– interlocutor controlled
– same speaking style
– different linguistic content: diverse neutral topics
Perceptual test: 90 different-speaker pairings, presented in random order
Task: “please rate their similarity from 1 to 5”, from 1 (very similar) to 5 (very different)
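The figure of 90 pairings matches the number of ordered pairs of two different speakers drawn from ten speakers (10 × 9). The sketch below builds and shuffles such a trial list. It assumes that the ten speaker codes in the results table are the full speaker set and that both orders (A–B and B–A) were presented; the slides do not state either explicitly.

```python
import itertools
import random

# Assumption: the ten speaker codes in the results table are the full speaker set.
SPEAKERS = ["AGF", "SGF", "DCT", "JCT", "ARJ", "JRJ", "ASM", "RSM", "AMG", "EMG"]


def make_trial_list(speakers, seed=None):
    """Every ordered pair of two different speakers, in shuffled order."""
    pairs = list(itertools.permutations(speakers, 2))  # (A, B) and (B, A) both kept
    random.Random(seed).shuffle(pairs)  # seeded RNG so a listener's order is reproducible
    return pairs


trials = make_trial_list(SPEAKERS, seed=1)
print(len(trials))  # 90 = 10 speakers x 9 possible partners
print(trials[:3])
```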
Analysis:
– Multidimensional scaling (MDS): to visualize the degree of perceived similarity and to detect meaningful dimensions that explain the observed (dis)similarities
– Mixed-effects models: to fit models to the similarity ratings
   Fixed effects (predictors):
   Random effects: (target speaker comparison)
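The MDS step can be sketched with classical (Torgerson) MDS. This choice is an assumption: the slides report stress values, which points to a Kruskal-type (possibly non-metric) MDS, and the exact variant is not stated. The sketch also assumes the 1–5 ratings have already been averaged into a symmetric speaker-by-speaker dissimilarity matrix. The sorted eigenvalues it returns are the quantities a scree plot displays.

```python
import numpy as np


def classical_mds(dissimilarities, n_dims=2):
    """Classical (Torgerson) MDS of a symmetric dissimilarity matrix.

    Returns (coordinates, eigenvalues), with eigenvalues sorted in decreasing
    order so they can be drawn directly as a scree plot.
    """
    d = np.asarray(dissimilarities, dtype=float)
    n = d.shape[0]
    # Centring matrix J = I - (1/n) * 11^T
    j = np.eye(n) - np.ones((n, n)) / n
    # Double-centred matrix of squared dissimilarities: B = -1/2 * J D^2 J
    b = -0.5 * j @ (d ** 2) @ j
    # B is symmetric, so eigh applies; it returns ascending eigenvalues, so reverse them
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Coordinates: leading eigenvectors scaled by sqrt(eigenvalue); negatives clipped to 0
    coords = eigvecs[:, :n_dims] * np.sqrt(np.clip(eigvals[:n_dims], 0.0, None))
    return coords, eigvals


# Toy example (not study data): distances between three points forming a 3-4-5 triangle
toy = np.array([[0.0, 3.0, 4.0],
                [3.0, 0.0, 5.0],
                [4.0, 5.0, 0.0]])
coords, eigvals = classical_mds(toy, n_dims=2)
print(np.round(eigvals, 3))  # two positive eigenvalues, third ~0: two dimensions suffice
```

A sharp drop after the k-th sorted eigenvalue suggests that k dimensions capture most of the structure in the ratings.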
Scree plot: relative magnitude of the sorted eigenvalues
stress: 0.03   stress: 0.07
*Stress = goodness-of-fit criterion to be minimized. Rule of thumb: < 0.10 is excellent; > 0.20 is poor
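Stress can be made concrete with Kruskal's stress-1 formula. In this sketch the target values ("disparities") are passed in directly, which corresponds to metric MDS; in non-metric MDS they would first be obtained by monotone regression of the dissimilarities. The toy matrices are invented for illustration, and the labelling function simply encodes the rule of thumb quoted above.

```python
import math


def kruskal_stress_1(disparities, distances):
    """Kruskal's stress-1: sqrt(sum (d_ij - dhat_ij)^2 / sum d_ij^2) over pairs i < j.

    distances   -- d_ij, distances between points in the fitted MDS configuration
    disparities -- dhat_ij, the target values (the dissimilarities in metric MDS,
                   or their monotone regression in non-metric MDS)
    Both are square matrices given as lists of lists; only the upper triangle is used.
    """
    n = len(distances)
    residual = 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            residual += (distances[i][j] - disparities[i][j]) ** 2
            total += distances[i][j] ** 2
    return math.sqrt(residual / total)


def rate_stress(stress):
    """Label a stress value with the rule of thumb quoted on the slide."""
    if stress < 0.10:
        return "excellent"
    if stress > 0.20:
        return "poor"
    return "in between"


fitted = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.5], [2.0, 1.5, 0.0]]  # toy values, not study data
targets = [[0.0, 1.1, 1.9], [1.1, 0.0, 1.5], [1.9, 1.5, 0.0]]
s = kruskal_stress_1(targets, fitted)
print(round(s, 3), rate_stress(s))  # 0.053 excellent
```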
stress: 0.8
stress: 0.4
Listeners ↓ / speaker pairs →   AGF-SGF   DCT-JCT   ARJ-JRJ   ASM-RSM   AMG-EMG
Spanish                         0.341     0.343     0.345     0.369     0.607
English                         0.264     0.219     0.349     0.435     0.445
most similar → most different
Language * Reaction time
[Figure: language*reaction_time effect plot. x-axis: reaction_time; y-axis: response (probability); panels: responses 1–5 for English and Spanish listeners]
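The effect plots show, for each response category 1–5, a predicted probability as a function of reaction_time, separately by listener language. Those per-category probability panels are what a cumulative-link (ordinal logistic) model produces; that the ratings were modelled this way is an assumption, since the slides do not name the model. The sketch shows how such a model turns a linear predictor with a language × reaction_time interaction into five category probabilities. All thresholds and coefficients are hypothetical and are not the study's estimates.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(eta, thresholds):
    """P(response = k) for k = 1..K under a cumulative logit (proportional-odds) model.

    Uses P(Y <= k) = logistic(theta_k - eta) for k = 1..K-1; thresholds must increase.
    """
    cumulative = [logistic(theta - eta) for theta in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = c
    return probs


# Hypothetical parameters for illustration only; they are NOT the study's estimates.
THRESHOLDS = [-1.5, -0.5, 0.5, 1.5]  # theta_1..theta_4 for a 1-5 scale
B_RT, B_SPANISH, B_RT_X_SPANISH = 0.10, 0.20, -0.05


def linear_predictor(reaction_time, spanish):
    """eta with a language x reaction_time interaction (English = reference level)."""
    s = 1.0 if spanish else 0.0
    return B_RT * reaction_time + B_SPANISH * s + B_RT_X_SPANISH * reaction_time * s


for rt in (2, 8, 14):
    for spanish in (False, True):
        probs = category_probabilities(linear_predictor(rt, spanish), THRESHOLDS)
        label = "Spanish" if spanish else "English"
        print(rt, label, [round(p, 2) for p in probs])
```

With this sign convention a larger linear predictor shifts probability towards response 5 (very different). The interaction term lets the reaction_time slope differ between listener groups, which is what the language*reaction_time panels visualize.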
Reaction time * Twins
(language-independent effects!)
[Figure: reaction_time*twins effect plot. x-axis: reaction_time; y-axis: response (probability); panels: responses 1–5 for twin (Yes) and non-twin (No) pairs]
SMC * Twins
(language-independent effects!)
[Figure: smc*twins effect plot. x-axis: SMC; y-axis: response (probability); panels: responses 1–5 for twin (Yes) and non-twin (No) pairs]
– lowest possible stress value: confirms VQ multidimensionality (Kreiman & Sidtis 2011)
– same cue prominence for both listener groups? different cue weighting?
[Figure: language*reaction_time effect plot, repeated from above]
sd: 0.14
sd: 0.18
population (same accent, similar age, etc.)
different salience weighting methods