phil rose Division of Humanities, Hong Kong University of Science - - PowerPoint PPT Presentation


TAL 2012 (Prosody in the Real World) Tonal Aspects across Tone and Non-Tone Languages Invited Talk Two sides of the same coin: between-speaker F0 differences in linguistic-tonetic description and forensic voice comparison. phil rose


slide-1
SLIDE 1

Two sides of the same coin: between-speaker F0 differences in linguistic-tonetic description and forensic voice comparison.

Division of Humanities, Hong Kong University of Science & Technology School of Language Studies, Australian National University Joseph Bell Centre for Forensic Statistics and Legal Reasoning, University of Edinburgh

TAL 2012 (“Prosody in the Real World”) Tonal Aspects across Tone and Non-Tone Languages Invited Talk

phil rose

slide-2
SLIDE 2
  • Between-Speaker Differences (BSDs) in tonally-relevant acoustic output from two complementary perspectives:
  • (1) BSDs in Forensic Voice Comparison
  • A case of “prosody in the real world”:
  • A real-world FVC case where intonational F0 played an important part
  • (2) BSDs in Linguistic Tonetics
  • Tonal Normalisation and some of its uses for a quantifiable linguistic-tonetic representation of tonal and intonational pitch.

The Theme

slide-3
SLIDE 3

Cords vibrating like a string (Titze 1994)

  • F0 = (1 / 2L) × √(σ / ρ)
  • L = vocal cord length
  • σ = longitudinal stress in the cords
  • stress = the tension in the cords divided by the cross-sectional area of vibrating tissue
  • (cover) tension is controlled by cricothyroid contraction/relaxation
  • ρ = tissue density

Since F0 is inversely proportional to cord length, other things being equal, if the speaker's cords are long, their F0 will be lower

Cords vibrating like a spring

  • F0 = (1 / 2π) × √(k / m)

k = stiffness (spring constant)

m = vocal cord mass

Since F0 is inversely proportional to the square root of cord mass, other things being equal, if the speaker's cords are bigger, their F0 will be lower
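As a rough numerical illustration of the two idealised models above, the sketch below evaluates both formulas. The function names and every parameter value are hypothetical, chosen only to show the direction of the effects; they are not measurements from the talk.

```python
import math

def f0_string(length_m, stress_pa, density_kg_m3):
    """String model: F0 = (1 / 2L) * sqrt(stress / density)."""
    return (1.0 / (2.0 * length_m)) * math.sqrt(stress_pa / density_kg_m3)

def f0_spring(stiffness_n_per_m, mass_kg):
    """Spring model: F0 = (1 / 2*pi) * sqrt(k / m)."""
    return (1.0 / (2.0 * math.pi)) * math.sqrt(stiffness_n_per_m / mass_kg)

# Hypothetical values, for illustration only.
short_cords = f0_string(0.012, 20_000.0, 1_040.0)  # shorter cords
long_cords = f0_string(0.016, 20_000.0, 1_040.0)   # longer cords, same stress and density
light_cords = f0_spring(80.0, 0.0001)              # smaller vibrating mass
heavy_cords = f0_spring(80.0, 0.0004)              # four times the mass, same stiffness

print(f"string model: {short_cords:.1f} Hz (short) vs {long_cords:.1f} Hz (long)")
print(f"spring model: {light_cords:.1f} Hz (light) vs {heavy_cords:.1f} Hz (heavy)")
```

With other things held equal, the longer cords give the lower F0 in the string model, and quadrupling the mass halves F0 in the spring model.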

Main anatomical source of F0 BSDs

slide-4
SLIDE 4

Forensic Voice Comparison

  • Self-evidently it is the differences between speakers that are important
  • Absent BSDs, it is not possible to recognise someone by their voice
  • FVC = comparing speech samples with respect to any aspect of voice (not just phonetics!) to help the trier-of-fact decide whether the suspect said the incriminating speech

slide-5
SLIDE 5
  • On Christmas Eve 2003 a fraudulent fax was sent to the investment bank JP Morgan Chase in Australia
  • requesting the transfer of $150 million to accounts in Switzerland, Greece and Hong Kong.
  • About 10 minutes before the close of business,
  • the bank received a phone call from a Craig Slater,
  • asking for a call-back on the fax
  • = a procedure confirming the details of the fax and verifying that the transfer could go ahead.
  • Here is part of the money-making phone call

The Crime

slide-6
SLIDE 6

“JP Morgan Greg speaking”
“Yeah hello Greg this is Craig Slater here mate”
“Oh g’day how are you?”
“Not too bad I bin havin a bit of trouble here…”

The Offender

slide-7
SLIDE 7

“em.. And we’re going to pay Hong Kong dollars 118,678,543 spot 29 to HSBC em…Hong Kong?”
“Correct”
“Hong Kong I think Hong Kong Power Limited six three six double oh three oh five five double oh one [$636,003,055,001]?”
“Yes”

Out goes the money …

slide-8
SLIDE 8
  • That is how you make $150 million in one phone call
  • And also how the Australian Commonwealth Superannuation Scheme account administered by the bank lost $150 million.

The Result

slide-9
SLIDE 9
  • 15 intercepted telephone calls containing “not too bad”, e.g.

  • “…mate, how are you?”
  • “Oh not too bad, everything’s good.”

The Suspect

slide-10
SLIDE 10
  • Both suspect and offender samples contain the utterance “not too bad” said with the same H.L.LH intonation

– rise nuclear tone on bad (“supportive interest encouraging further conversation”).
– high head on not (the suspect’s not high/low head)

  • Therefore F0 highly comparable
  • Usually F0 not much good in FVC
  • < high within-speaker variation
  • > disadvantageous variance ratio.

The (Intonational F0) Evidence

slide-11
SLIDE 11

Offender’s “not too bad” F0

[Figure: offender’s F0 contour, F0 scale 60 to 300, plotted against duration (sec.)]

H on not, L on too, LH on bad

slide-12
SLIDE 12

Degree of similarity between suspect and offender’s not too bad F0

[Figure legend: Offender F0; Suspect samples F0]

slide-13
SLIDE 13

You want to know the probability the suspect said the incriminating speech, given the similarity between the suspect and offender data? p(H|E)

Evaluating Evidence Rationally

By my theorem, that is proportional to the strength of your evidence …

… and the probability that the suspect said the incriminating speech BEFORE the evidence is taken into account …

Bayes’ Theorem: Posterior Odds = Prior Odds * Likelihood Ratio
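A minimal sketch of the odds form of Bayes’ Theorem stated above. The prior odds used here are purely hypothetical (the forensic expert normally does not know them); the likelihood ratio of 20 is just a round illustrative number.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Posterior Odds = Prior Odds * Likelihood Ratio."""
    return prior_odds * likelihood_ratio

def odds_to_probability(odds):
    """Convert odds in favour of a hypothesis into a probability."""
    return odds / (1.0 + odds)

# Hypothetical prior: odds of 1 to 100 that the suspect is the speaker.
prior = 1.0 / 100.0
posterior = posterior_odds(prior, 20.0)
probability = odds_to_probability(posterior)

print(f"posterior odds = {posterior:.2f}, p(H|E) = {probability:.3f}")
```

The same likelihood ratio leads to very different posterior probabilities depending on the prior odds, which is why the strength of the evidence and the probability of the hypothesis have to be kept apart.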

slide-14
SLIDE 14
  • Strength of Evidence in support of one hypothesis over another =
  • Probability of evidence under competing hypotheses =
  • p(E | Hsame spk) / p(E | Hdiff spk)
  • Probability of the difference between suspect and offender F0 in not too bad assuming the suspect said it, vs. the probability of the difference, assuming it was said by someone else randomly chosen from the relevant population.

The Likelihood Ratio

LR denominator is where the between-speaker differences come in!
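To make the numerator/denominator distinction concrete, here is a deliberately simplified sketch: a single F0 value, with both hypotheses modelled by normal distributions. This is an assumption made for illustration and is not the multivariate method used in the case; all names and numbers are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def likelihood_ratio(offender_f0, suspect_mean, within_sd,
                     population_mean, population_sd):
    # Numerator: p(E | same speaker), judged against the suspect's own variation.
    same = normal_pdf(offender_f0, suspect_mean, within_sd)
    # Denominator: p(E | different speaker), judged against the reference population.
    different = normal_pdf(offender_f0, population_mean, population_sd)
    return same / different

# Hypothetical numbers (Hz), for illustration only.
typical_match = likelihood_ratio(120.0, 120.0, 10.0, 120.0, 30.0)
unusual_match = likelihood_ratio(180.0, 180.0, 10.0, 120.0, 30.0)
mismatch = likelihood_ratio(120.0, 180.0, 10.0, 120.0, 30.0)

print(typical_match, unusual_match, mismatch)
```

A match on a value that is rare in the reference population yields a larger LR than a match on a typical value: the denominator is where the between-speaker distribution does its work.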

slide-15
SLIDE 15

So we have to collect a Reference Sample of “not too bad”s

  • Natural responses to “how’s it going?” etc
  • Do any two samples sound as if they are from the same speaker?
  • Relatively easy to find speakers with very similar voices!!

Speaker 10 Speaker 9 Speaker 8 Speaker 7 Speaker 6 Speaker 5 Speaker 4 Speaker 3 Speaker 2 Speaker 1

slide-16
SLIDE 16

[Figure: 30 panels, one per speaker, each with axes marked 2 to 6 and 100 to 300. Panel titles: Adam, Alderman, Andrew, Bevan, Brown, Cameron, Collette, Dando, Dave, DavidDoroth, GaryNgale, GaryYuko, GaryRenata, Hendriks, Hill, James, Jeffries, Langford, Lee, Mac, Malcolm, Pavlic-Searle, Hunter, Rose, Ruggieri, Sidwell, Stephen, Stewart, Windle, Young]

You have to go and get this!

Reference sample: non-contemporaneous variation in 30 males’ “not too bad” F0.

slide-17
SLIDE 17

[Equations: numerator and denominator of the multivariate likelihood ratio (MVLR)]

Multivariate likelihood ratio formula (Aitken & Lucy 2002)

The Formula

slide-18
SLIDE 18

[Figure: density plot; labelled value 20.6]

Multivariate LR values for comparison between suspect and offender samples using F0 in “not too bad” against a reference population of 30 males.

About 20 times more likely to get this difference in not too bad F0 if suspect said it than if someone else had said it.

NOT: the suspect is about 20 times more likely to have said it than someone else!!

The Finding

slide-19
SLIDE 19
  • By combining LRs from different features, one can get quite large strengths of evidence in support of either defence or prosecution hypotheses.
  • In this case the acoustics (F-pattern) in “yes” were also used
  • They gave an LR of about 70
  • Combined with not too bad F0 the LR is now 1400
  • All the acoustic voice evidence in the case gave an LR of about 11 million
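The figures above (about 70 and about 20 giving 1400) are consistent with combining LRs by multiplication. The sketch below shows that arithmetic; treating the features as independent, so that their LRs can simply be multiplied, is an assumption of this sketch, and the function names are hypothetical.

```python
import math

def combine_lrs(lrs):
    """Multiply likelihood ratios (assumes the features are independent)."""
    return math.prod(lrs)

def combine_log10_lrs(lrs):
    """Same combination on the log10 scale, where LRs add."""
    return sum(math.log10(lr) for lr in lrs)

# Rounded values from the slides: "not too bad" F0 and the F-pattern in "yes".
combined = combine_lrs([20.0, 70.0])
combined_log10 = combine_log10_lrs([20.0, 70.0])

print(f"combined LR = {combined:.0f}, log10 LR = {combined_log10:.2f}")
```

Working on the log scale is convenient when many features are combined, because very large or very small products become sums.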

Offender Suspect Reference sample

The Other Voice Evidence

slide-20
SLIDE 20
  • I don’t know the prior odds (= the other evidence in the case), but

  • The suspect was found guilty
  • Most of the money was recovered

The Verdict

slide-21
SLIDE 21

Forensic Voice Comparison with Tonal F0?

Yes, small contribution from tones: improves /i/ Cllr on fusion

[Figure: Tippett/reliability plots for F-pattern and [22] tonal F0 in Cantonese yih ‘two’ for 26 young male Cantonese speakers’ non-contemporaneous natural speech. Curves: LRs from same-speaker comparisons; LRs from different-speaker comparisons. Log-LR cost (0.51); EER = 15%]
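For readers unfamiliar with the log-LR cost (Cllr) quoted above, the sketch below computes it in its commonly used form: the average of a penalty on same-speaker LRs that are too low and a penalty on different-speaker LRs that are too high. The LR values are hypothetical and are not the talk's data.

```python
import math

def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log-likelihood-ratio cost; lower is better, 1.0 is an uninformative system."""
    same_cost = sum(math.log2(1.0 + 1.0 / lr) for lr in same_speaker_lrs)
    same_cost /= len(same_speaker_lrs)
    diff_cost = sum(math.log2(1.0 + lr) for lr in different_speaker_lrs)
    diff_cost /= len(different_speaker_lrs)
    return 0.5 * (same_cost + diff_cost)

# Hypothetical LRs, for illustration only.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])
informative = cllr([100.0, 50.0], [0.01, 0.02])

print(uninformative, informative)
```

A system that always outputs LR = 1 scores exactly 1.0, so a value such as 0.51 indicates that the LRs carry useful information.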

slide-22
SLIDE 22

Theme 2:

Using between-speaker differences to get a quantified linguistic-tonetic description of the tones of a variety

– For tonal typology
– Acoustic reconstruction

slide-23
SLIDE 23

Modelling tones

  • Wu dialect tones
  • Merit in complexity
  • Some typically complex data from Wencheng, Jinyun
  • Can all be easily modelled with a continuous model (e.g. Fujisaki)
  • But perhaps not quite so easily with a discrete phonological Bao-type model

slide-24
SLIDE 24

Jinyun tones (Steed & Rose 2009)

high rise-fall low fall low fall-rise low rise mid dipping low rise-fall low depressed fall Whole range fall Register Contour

slide-25
SLIDE 25

Wencheng 文成 tonal acoustics (Rose 2010)

  • Mean tonal F0 as function of mean tonal duration

low falling-rising mid? falling-rising lower rising upper short rising mid depressed level low level upper mid? level

Register Contour

slide-26
SLIDE 26

Observations

  • These systems mostly do not behave in the way theory tells us to expect
  • Simple tones are rare; complex contours abound; phonation type contrasts are found in nearly all possible different interactions with tone
  • The rules/constraints relating the isolation tones to tone sandhi forms lack phoneticity.
  • Why would such systems evolve?
  • Difficult to avoid idea of tones as indexical features

slide-27
SLIDE 27

Normalisation

  • Before you can answer these typological questions you need to be able to characterise varieties’ acoustics quantitatively
  • Let’s look at a simple single variety: Shanghai

slide-28
SLIDE 28

Shanghai raw tonal acoustics

Unstopped tones: “high falling”, “mid dipping”, “low rising”
8 male (thick lines), 8 female
Controlled for intrinsic vowel F0
Controlled for intrinsic consonantal F0

slide-29
SLIDE 29

Shanghai normalised tonal acoustics (Rose 1993)

8 males, 8 females
Normalisation: F0 - intrinsic z-score; duration – percent (NB not equalised!)
Coloured lines = mean normalised F0, duration
Solid = male, dotted = female
Note sex-related differences in high falling tone
Normalisation index (Earle 1975): How much does the normalisation reduce the original tonally-related between-speaker F0 variance? With this normalisation, about 9.5 times
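A minimal sketch of z-score normalisation of F0, assuming each speaker's values are standardised against the mean and standard deviation of all of that speaker's sampled tonal F0 values. The exact normalisation parameters used in the talk are not given on the slide, and the two speakers below are hypothetical.

```python
from statistics import mean, pstdev

def zscore_normalise(tone_contours):
    """Express one speaker's F0 contours in standard deviations from that speaker's mean."""
    all_values = [f0 for contour in tone_contours.values() for f0 in contour]
    speaker_mean = mean(all_values)
    speaker_sd = pstdev(all_values)
    return {
        tone: [(f0 - speaker_mean) / speaker_sd for f0 in contour]
        for tone, contour in tone_contours.items()
    }

# Hypothetical speakers (Hz): the second has a higher and wider F0 range,
# but the same tonal shapes relative to that range.
male = {"high falling": [160.0, 140.0, 110.0], "low rising": [100.0, 105.0, 130.0]}
female = {tone: [1.5 * f0 + 30.0 for f0 in contour] for tone, contour in male.items()}

male_z = zscore_normalise(male)
female_z = zscore_normalise(female)

print(male_z["high falling"])
print(female_z["high falling"])
```

The raw contours differ by tens of Hz, yet the normalised contours coincide: the speaker-specific part of the F0 variation is removed and the tonal shapes remain.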

slide-30
SLIDE 30

Comparing varieties

  • If we want to find out how languages differ in their tonal acoustics, and how they are the same (Anderson’s 1973 “linguistic-phonetic properties”), we need to compare varieties.
  • Problem: Comparing different varieties with normalisation is not straightforward: you need to be sure that your normalisation parameters are comparable across varieties! For example:
  • How many linguistic-tonetically shared tones are there between Standard Thai (5) and Southern Thai (7)?

slide-31
SLIDE 31

Standard Thai female Southern Thai male (Thompson 1996)

What is the correct relationship between these two sets?

slide-32
SLIDE 32

Using bilingual’s tones

The female speaker is a Southern Thai educated professional, bilingual in both Southern & Standard Thai (Rose 1997).

So our normalisation strategy must adequately reflect the relationship between her two sets of tones …

slide-33
SLIDE 33

testing z-score normalisation with bilingual’s tones

[Figure: Z-score normalisation; axis label “percent of total F0 range”]

Speaker is controlling 11 different tones

slide-34
SLIDE 34

(conservative HK) Cantonese

  • six contrasting pitch shapes on unstopped syllables (subminimal sextuplet from CF4):
  • ‘woman’ [fu]: low to mid rise “[23]”
  • ‘ancient’ [ku]: low to high rise “[24]”
  • ‘support’ [fu]: falling from low “[21], [1↓]”
  • ‘part’ [pu]: lower mid level “[22]”
  • ‘cause’ [ku]: mid level “[33]”
  • ‘father's sister’ [ku]: high level “[55]”

slide-35
SLIDE 35

Z-score normalised Cantonese unstopped tones (Rose 2000)

5 males, 5 females, controlled for intrinsic vowel F0

slide-36
SLIDE 36

Comparing Cantonese, Shanghai tones

8 different tones, “low rising” shared?

slide-37
SLIDE 37

Comparing tones across varieties: high falling tones in Yongjiang 涌江 & Oujiang 甌江 sub-groups of Wu

[Figure: normalised F0 (sd) against normalised duration (%) for the Oujiang (OJ) and Yongjiang (YJ) high falling tone]

Oujiang normalisation index = 21.6

Is anything the same??

Problem: these are two different Middle Chinese tonal cognates. The amount of variance around these normalised curves is less than that for a single variety (Shanghai). They are demonstrably linguistic-tonetically the same tone.

slide-38
SLIDE 38

Summary

  • Talk has focussed on quantified comparison of tonal/intonational F0 shapes ...
  • And testing of hypotheses about them!
  • It has shown that BSDs (from a lot of data!) are crucial for doing this.

slide-39
SLIDE 39

THANK YOU FOR LISTENING