[PPT] - A Corpus For Large-Scale Phonetic Typology Elizabeth Salesky PowerPoint Presentation

SLIDE 1

A Corpus For Large-Scale Phonetic Typology

Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Alan W Black, Jason Eisner

VoxClamantis in deserto: “a voice crying out in the wilderness”

SLIDE 2

In the beginning, there was SPEECH

‘in the beginning’  English
‘ipeuhcan’  Nahuatl
‘am Anfang’  German
‘በመጀመሪያ’  Amharic

Tower of Babel

SLIDE 3

In the beginning, there was SPEECH

‘in the beginning’  English
‘ipeuhcan’  Nahuatl
‘am Anfang’  German
‘በመጀመሪያ’  Amharic

Then the linguist asked:

How do speech and language vary?

↳ prior cross-linguistic phonetic studies have relied on reported [language-aggregate] measurements

We create our new corpus, VoxClamantis v1.0, to answer this question!

✔ spoken readings of the Bible
✔ >600 languages
✔ time-aligned phonemic transcriptions
✔ phonetic measures for vowel and sibilant tokens

SLIDE 4

This talk

① WHY we want this data
② HOW we create it
③ CASE STUDIES validating the corpus & illustrating two possible uses

SLIDE 5


Why?

SLIDE 6

Motivation

Variation in and across languages

⑤  Spanish: /i/ /u/ /o/ /a/ /e/
⑦  Romanian: /i/ /u/ /o/ /a/ /e/ /ɨ/ /ə/

[Figure: scattered “s” tokens labelled “variation”]

We know there is phonetic variation within a language, but what are its range and limits?
How do the number and set of phonemic categories influence their realizations?

SLIDE 7


How?

SLIDE 8

Resources Needed

Amharic: በመጀመሪያ

Grapheme-to-Phoneme (G2P): bəmədʒəmərija

? ? ? ?

① speech
② transcripts
③ phonemic labels
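The grapheme-to-phoneme step for the Amharic example can be sketched as a per-character lookup. This is a toy illustration only: the table `TOY_G2P` and the function `toy_g2p` are hypothetical names covering just the five characters in this one word, and they do not reproduce the rule sets of any real G2P tool.

```python
# Toy grapheme-to-phoneme table (assumption: illustrative, covers one word).
# Each Ge'ez-script character encodes a consonant plus a vowel.
TOY_G2P = {
    "በ": ["b", "ə"],
    "መ": ["m", "ə"],
    "ጀ": ["dʒ", "ə"],
    "ሪ": ["r", "i"],
    "ያ": ["j", "a"],
}


def toy_g2p(word):
    """Map each character of `word` to its phoneme labels, in order."""
    phonemes = []
    for ch in word:
        if ch not in TOY_G2P:
            raise KeyError(f"no rule for grapheme {ch!r}")
        phonemes.extend(TOY_G2P[ch])
    return phonemes


labels = toy_g2p("በመጀመሪያ")
print(" ".join(labels))
```

A real system needs rules for the whole script and for context-dependent spellings, which is why the choice of G2P resource matters later in the talk.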

SLIDE 9

Resources Needed

Amharic: በመጀመሪያ

b ə m ə dʒ ə m ə r i j a

? ? ? ?

Forced alignment (HMM acoustic model)

Phonetic measures (R or Praat): formant frequencies, mid-frequency peak, duration…

① speech
② transcripts
③ phonemic labels
④ time alignments
⑤ phonetic measures

SLIDE 10

Extraction Process

CMU Wilderness (2019)

699 Bible readings!
with ① speech! and ② transcripts!
>1TB 😲
>6 years of CPU compute 😲

‘በመጀመሪያ’  Amharic

① speech
② transcripts

SLIDE 11

Extraction Process

CMU Wilderness dataset

Utterance: <30s
Chapter: ~30min

በመጀመሪያ

① speech
② transcripts

1 የፍጥረት አጀማመር በመጀመሪያ እግዚአብሔር (ኤሎሂም) ሰማያትንና ምድርን ፈጠረ። 2 ምድርም ቅርጽ የለሽና ባዶ ነበረች።※ የምድርን ጥልቅ ስፍራ ሁሉ ጨለማ ውጦት ነበር። የእግዚአብሔርም (ኤሎሂም) መንፈስ በውሆች ላይ ይረብብ ነበር። 3 ከዚያም እግዚአብሔር (ኤሎሂም) “ብርሃን ይሁን” አለ፤ ብርሃንም ሆነ። 4 እግዚአብሔርም (ኤሎሂም) ብርሃኑ መልካም እንደሆነ አየ፤ ብርሃኑን ከጨለማ ለየ። 5 እግዚአብሔርም (ኤሎሂም) ብርሃኑን “ቀን”፣ ጨለማውን “ሌሊት” ብሎ ጠራው። መሸ፤ ነጋም፤ የመጀመሪያ ቀን። 6 እግዚአብሔር (ኤሎሂም)፣ “ውሃን ከውሃ የሚለይ ጠፈር በውሆች መካከል ይሁን” አለ። 7 ስለዚህ እግዚአብሔር (ኤሎሂም) ጠፈርን አድርጎ ከጠፈሩ በላይና ከጠፈሩ በታች ያለውን ውሃ ለየ፤

እንዳለውም ሆነ። 8 እግዚአብሔር (ኤሎሂም) ጠፈርን “ሰማይ” ብሎ ጠራው። መሸ፤ ነጋም፤ ሁለተኛ ቀን። 9 ከዚያም እግዚአብሔር (ኤሎሂም)፣ “ከሰማይ በታች ያለው ውሃ በአንድ.  

…

😲

SLIDE 12

Extraction Process

Which phonemes are present?

text → G2P → phonemes

read  /ɛ/
read  /i/
/ɹɛt/ /ɹɛd/

① speech
② transcripts
③ phonemic labels

SLIDE 13

Extraction Process

Phoneme “Transcriptions”: Grapheme-to-Phoneme

① Linguist-created rules (Epitran): 39 readings
② Wisdom of Crowds (Wiktionary/WikiPron) + our own WFST-models (Phonetisaurus 🦖): 18 readings
③ Naïve baseline (Unitran) 😲 “first-pass transcription”: All 690 readings

[Chart labels: 64, 165, 690]

① speech
② transcripts
③ phonemic labels

(disjoint)
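One way to picture how the three G2P sources combine: every reading gets Unitran first-pass labels, and a reading covered by a higher-quality resource gets a second label set as well. The sketch below is an assumption about the bookkeeping, not the authors' code; the coverage sets and the function `label_sets` are hypothetical placeholders, and the language codes in them are not the corpus's actual lists.

```python
# Hypothetical coverage sets (placeholders, not the corpus's real lists).
EPITRAN_LANGS = {"amh", "deu"}
WIKIPRON_LANGS = {"eng"}


def label_sets(lang_code):
    """Return the names of the G2P label sets to produce for one reading."""
    sets = ["unitran"]  # first-pass labels exist for every reading
    if lang_code in EPITRAN_LANGS:
        sets.append("epitran")   # linguist-created rules
    elif lang_code in WIKIPRON_LANGS:
        sets.append("wikipron")  # crowd-sourced lexicon plus a WFST model
    return sets


print(label_sets("amh"))
print(label_sets("xyz"))
```

The `if`/`elif` mirrors the "(disjoint)" note: a reading is assigned to at most one of the two high-resource sources.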

SLIDE 14

G2P Summary

“High-resource (HR)”: 57 readings (39 + 18)
“First-pass (FP)”: ALL 690 readings

🤕 Why provide FP alignments for languages with HR?
We’ll come back to that 😊

SLIDE 15

Extraction Process

Amharic

bəmədʒəmərija

? ? ? ?

Forced alignment (HMM acoustic model)

① speech
② transcripts
③ phonemic labels

SLIDE 16

Extraction Process

Amharic

b ə m ə dʒ ə m ə r i j a

? ? ? ?

b: start time → end time

① speech
② transcripts
③ phonemic labels
④ time alignments

Forced alignment (HMM acoustic model)
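Forced alignment finds, for a known phoneme sequence, the frame boundaries that best fit the audio. The sketch below is a deliberately simplified stand-in for the HMM acoustic model named on the slide: it assumes one state per phoneme, at least one frame per phoneme, and a precomputed table `scores[t][j]` of per-frame log-likelihoods. The function name `force_align` and the score values are hypothetical.

```python
def force_align(phones, scores):
    """Viterbi-style monotonic alignment.

    scores[t][j] is the log-likelihood of frame t under phones[j].
    Returns (phone, start_frame, end_frame) triples, end exclusive.
    """
    T, J = len(scores), len(phones)
    if T < J:
        raise ValueError("need at least one frame per phone")
    NEG = float("-inf")
    best = [[NEG] * J for _ in range(T)]
    stay = [[True] * J for _ in range(T)]  # True: frame t-1 had the same phone
    best[0][0] = scores[0][0]
    for t in range(1, T):
        for j in range(min(t + 1, J)):
            same = best[t - 1][j]
            prev = best[t - 1][j - 1] if j > 0 else NEG
            if prev > same:
                best[t][j], stay[t][j] = prev + scores[t][j], False
            else:
                best[t][j], stay[t][j] = same + scores[t][j], True
    # Trace back from the last frame, which must belong to the last phone.
    segments, j, end = [], J - 1, T
    for t in range(T - 1, 0, -1):
        if not stay[t][j]:
            segments.append((phones[j], t, end))
            end, j = t, j - 1
    segments.append((phones[j], 0, end))
    return segments[::-1]


# Made-up scores: frames 0-1 fit "b", frames 2-3 fit "ə", frame 4 fits "m".
toy_scores = [
    [-1, -9, -9],
    [-1, -9, -9],
    [-9, -1, -9],
    [-9, -1, -9],
    [-9, -9, -1],
]
alignment = force_align(["b", "ə", "m"], toy_scores)
print(alignment)
```

Frame indices become start and end times once multiplied by the frame shift of the acoustic front end.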

SLIDE 17

Extraction Process

Amharic

b ə m ə dʒ ə m ə r i j a

? ? ? ?

Forced alignment (HMM acoustic model)

b: start time → end time

① speech
② transcripts
③ phonemic labels
④ time alignments

SLIDE 18

Extraction Process

Amharic

b ə m …

① speech
② transcripts
③ phonemic labels
④ time alignments

Phoneme tokens:

b: start time → end time
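A phoneme token, as pictured here, is a label plus its start and end time. A minimal sketch of such a record follows; the class name `PhonemeToken`, its field names, and the example times are assumptions for illustration and do not describe the corpus's file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhonemeToken:
    label: str    # phoneme symbol, e.g. "b"
    start: float  # start time in seconds
    end: float    # end time in seconds

    @property
    def duration(self):
        """Token duration in seconds."""
        return self.end - self.start


# Made-up times for the first token of the example word.
token = PhonemeToken(label="b", start=0.12, end=0.19)
print(token.label, round(token.duration, 3))
```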

SLIDE 19

Extraction Process

Phonetic Measures

VOWELS (a a)
Formants: F1, F2, F3, F4

SIBILANTS (z z s)
Spectral peak, COG, Duration, ...

e.g. high-amplitude frequencies

PRAAT TEXTGRID

① speech
② transcripts
③ phonemic labels
④ time alignments
⑤ phonetic measures
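Two of the sibilant measures can be sketched from a magnitude spectrum: the spectral centre of gravity (COG) as a power-weighted mean frequency, and a spectral peak as the highest-magnitude bin within a band. The function names, the 3000 to 7000 Hz band, and the toy spectrum are assumptions for illustration; the talk says the measures were extracted with R or Praat, not with this code.

```python
def centre_of_gravity(freqs_hz, magnitudes, power=2.0):
    """Weighted mean frequency, weighting each bin by magnitude ** power."""
    weights = [m ** power for m in magnitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * w for f, w in zip(freqs_hz, weights)) / total


def spectral_peak(freqs_hz, magnitudes, lo=3000.0, hi=7000.0):
    """Frequency of the highest-magnitude bin within [lo, hi] Hz."""
    band = [(m, f) for f, m in zip(freqs_hz, magnitudes) if lo <= f <= hi]
    if not band:
        raise ValueError("no spectral bins inside the band")
    return max(band)[1]


# Made-up four-bin spectrum, symmetric around 5000 Hz.
freqs = [1000.0, 3000.0, 5000.0, 7000.0]
mags = [0.0, 1.0, 2.0, 1.0]
cog = centre_of_gravity(freqs, mags)
peak = spectral_peak(freqs, mags)
print(cog, peak)
```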

SLIDE 20

Evaluation

See paper! (+ appendices)

How much does quality vary across languages?
Are certain phonemes more accurate than others?
What about time alignment accuracy?

🤕 Why provide both Unitran and High-Resource alignments?

Use multiple sets of alignments to assess Unitran alignment quality

SLIDE 21

Corpus Summary

VoxClamantis v1.0 provides tokens of phoneme-level measurements in hundreds of languages!

690 recorded readings of the Bible
635 languages (ISO 639-3)
70 language families

>400 million aligned phoneme-level segments
Subsequent phonetic measures for all vowels and sibilants

SLIDE 22


Case Studies

SLIDE 23

Case Studies

Case studies with VoxClamantis v1.0

48 High-Resource Readings

Vowels: ~50 phonemes
Sibilants: /s/ /z/

① Reproduction of previous results validates resource
② Research at scale suggests general cross-linguistic principles

SLIDE 24

Phonetic Uniformity

Are shared characteristics realized uniformly within languages?

(e.g.: vowel height, POA) (e.g.: measures strongly correlated) (e.g.: language)

Formants: Vowels
/i/, /u/: high vowels

Mid-Freq Peak: Sibilants
/s/, /z/: alveolar place of articulation

Reproduce previous results, but with many more languages

Supports hypothesis that this may be a universal principle

While variation exists across languages, within a language F1 is strongly correlated
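The uniformity check can be pictured as a correlation across readings: if /i/ and /u/ share a height target, a reading with a higher mean F1 for /i/ should also have a higher mean F1 for /u/. The sketch below assumes that pairing and uses made-up per-reading means; `pearson` is a plain implementation written for this note, not the authors' analysis code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(vx * vy)


# Made-up mean F1 values in Hz, one pair per reading (not corpus data).
mean_f1_i = [280.0, 300.0, 320.0, 350.0]
mean_f1_u = [300.0, 330.0, 340.0, 380.0]
r_uniformity = pearson(mean_f1_i, mean_f1_u)
print(round(r_uniformity, 3))
```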

SLIDE 25

Phonetic Dispersion

Is inventory size correlated with articulatory precision?

VOWELS

Marshallese (4 vowels): e æ ɛ i
English (20 vowels): ɜ: i: ə u u: ɚ a: ɑ ɑ: ɪ ɒ ɔ ɔ: ᵿ e æ ɛ i

SLIDE 26

Phonetic Dispersion

Is inventory size correlated with articulatory precision?

Marshallese (4 vowels): e æ ɛ i
English (20 vowels): ɜ: i: ə u u: ɚ a: ɑ ɑ: ɪ ɒ ɔ ɔ: ᵿ e æ ɛ i

SLIDE 27

Phonetic Dispersion

Is inventory size correlated with articulatory precision?

Marshallese (4 vowels): e æ ɛ i
English (20 vowels): ɜ: i: ə u u: ɚ a: ɑ ɑ: ɪ ɒ ɔ ɔ: ᵿ e æ ɛ i

Previously shown, but not possible to study at scale

Supports hypothesis that this may [not] be a universal principle

No

(Spearman ρ = 0.11, p = 0.44; Pearson r = 0.11, p = 0.46)
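Spearman's ρ, reported above, is the Pearson correlation of the ranks of the two variables, so it asks only whether larger inventories go with larger or smaller spread, not whether the relation is linear. The sketch below shows the computation on made-up numbers; the function names and both data lists are hypothetical and are not corpus values.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(vx * vy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up data: vowel inventory size and a within-category spread measure.
inventory_size = [4, 5, 7, 20]
category_spread = [0.9, 1.4, 1.1, 1.2]
rho = spearman(inventory_size, category_spread)
print(round(rho, 2))
```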

SLIDE 28

CAUTION 😲

Utterance alignment → Filter; in future, realign!
Automatic phoneme labels → Better G(+A)2P
Alignment assessment! → Curate more resources!
Corpus representation (e.g. speakers) → Curate more resources!

SLIDE 29


Summary

SLIDE 30

VoxClamantis v1.0 corpus: Conclusion

aligned phoneme-level segments in hundreds of languages
57 high-resource, 690 first-pass

😲 methodology is not perfect: this is version 1.0!
⬇ download
🥴 use for research
⬆ contribute to v2.0!

voxclamantisproject.github.io

SLIDE 31

Questions! Comments! Contributions!

voxclamantisproject.github.io

Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Alan W Black, Jason Eisner

Contact Us!

voxclamantisproject@gmail.com

VoxClamantis in deserto: “a voice crying out in the wilderness”