A Probabilistic Approach to Diachronic Phonology



slide-1
SLIDE 1

A Probabilistic Approach to Diachronic Phonology

Alexandre Bouchard-Côté, Percy Liang, Tom Griffiths, Dan Klein

slide-2
SLIDE 2

Languages evolve

Gloss      Latin     Italian  Spanish  Portuguese
Word/verb  verbum    verbo    verbo    verbu
Fruit      fructus   frutta   fruta    fruta
Laugh      ridere    ridere   reir     rir
Center     centrum   centro   centro   centro
August     augustus  agosto   agosto   agosto
Swim       natare    nuotare  nadar    nadar
...

slide-3
SLIDE 3

Language evolution


  • Phonological rules are more regular than morphological or syntactic ones
  • This regularity is the basis of the comparative method
slide-4
SLIDE 4

Example of a mutation process as seen by the comparative method

[Figure: evolutionary tree over the nodes la, vl, it, ib, es, pt]

  • ib: Proto-Ibero-Romance
  • vl: Vulgar Latin
slide-5
SLIDE 5

Example of a mutation process as seen by the comparative method

[Figure: evolutionary tree over la, vl, it, ib, es, pt, with a list of rewrite rules attached to each branch]

u → o / some context
m → ∅ / some context
...

  • Deterministic rewrite rules at each branch
  • Activated by some context
slide-6
SLIDE 6

Example of a mutation process as seen by the comparative method

[Figure: the same tree, with a word form at each node and rewrite rules on each branch]

/werbum/ (la)
/verbo/ (vl)
/vɛɾbo/ (it)
/veɾbo/ (ib)
/beɾbo/ (es)
/veɾbu/ (pt)

u → o / some context
m → ∅ / some context
...

Gloss      Latin    Italian  Spanish  Portuguese
Word/verb  verbum   verbo    verbo    verbu

slide-7
SLIDE 7

Example of a mutation process as seen by the comparative method

[Figure: the same tree, with a word form at each node and rewrite rules on each branch]

/kentrum/ (la)
/ʧentro/ (vl)
/ʧɛntro/ (it)
/sentɾo/ (ib)
/sentɾo/ (es)
/semtɾu/ (pt)

u → o / some context
m → ∅ / some context
...

Gloss      Latin    Italian  Spanish  Portuguese
Word/verb  verbum   verbo    verbo    verbu
Center     centrum  centro   centro   centro
...

slide-8
SLIDE 8

Example of a mutation process as seen by the comparative method

[Figure: evolutionary tree over la, vl, it, ib, es, pt]

  • In practice, the ancient words and/or the evolutionary tree are unknown
  • Methodology: manually inspecting the data
slide-9
SLIDE 9

Our work:

  • A probabilistic model that captures phonological aspects of language change.
  • Many uses:

[Figure: tree with the observed forms /kinto/ and /kwinto/ at two nodes and unknown forms (?) at the remaining nodes]

Reconstruction of word forms (ancient and modern)

slide-10
SLIDE 10

Our work:

  • A probabilistic model that captures phonological aspects of language change.
  • Many uses:

[Figure: tree with the forms /kwintam/, /kinta/, /kinto/, /kimtu/, /kwinto/ at its nodes and unknown rules (?) on its four branches]

Inference of phonological rules

slide-11
SLIDE 11

Our work:

  • A probabilistic model that captures phonological aspects of language change.
  • Many uses:

[Figure: two candidate trees over the same forms /kwintam/, /kinta/, /kwinto/, /kimtu/, /kinto/, shown side by side (one vs. the other)]

Selection of phylogenies

slide-12
SLIDE 12

Our work:

  • A probabilistic model that captures phonological aspects of language change.
  • Many uses:
      – Reconstruction of word forms (ancient and modern)
      – Inference of phonological rules
      – Selection of phylogenies
  • An inference procedure and experiments on all three applications
  • A new task and evaluation framework
slide-13
SLIDE 13

The model

slide-14
SLIDE 14

Big picture

[Figure: evolutionary tree over the nodes la, vl, it, es]

  • Assume for now that the tree topology is known
slide-15
SLIDE 15

Big picture

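[Figure: evolutionary tree over la, vl, it, es, with a list of word forms at each node: /werbum/, /veɾbu/, /beɾbo/, /vɛrbo/ for one word and /kentrum/, /ʧentro/, /sentɾo/, /ʧentro/ for another, ...]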

  • Assume for now that the tree topology is known
  • Track individual words
slide-16
SLIDE 16

Stochastic edit model

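[Figure: alignment of Latin /fokus/ (# f o k u s #) with Italian /fwɔko/ (# f w ɔ k o #); # marks the word boundaries. The pair sits on one edge of the tree next to other pairs such as /werbum/ and /veɾbu/.]

  • Let's look at how a single word evolves along one of the edges of the tree
  • Mutation of Latin FOCUS (/fokus/) into Italian fuoco (/fwɔko/) (fire)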

slide-17
SLIDE 17

Stochastic edit model: operations

[Figure: alignment of Latin # f o k u s # with Italian # f w ɔ k o #]

  • Substitution
slide-18
SLIDE 18

Stochastic edit model: operations

[Figure: alignment of Latin # f o k u s # with Italian # f w ɔ k o #]

  • Substitution (incl. self-substitution)
slide-19
SLIDE 19

Stochastic edit model: operations

[Figure: alignment of Latin # f o k u s # with Italian # f w ɔ k o #]

  • Substitution (incl. self-substitution)
  • Insertion
slide-20
SLIDE 20

Stochastic edit model: operations

[Figure: alignment of Latin # f o k u s # with Italian # f w ɔ k o #]

  • Substitution (incl. self-substitution)
  • Insertion
  • Deletion
slide-21
SLIDE 21

Stochastic edit model: context

[Figure: alignment of Latin # f o k u s # with Italian # f w ɔ k o #, with one output phoneme replaced by ?]

  • Distribution over operations conditioned on adjacent phonemes
slide-22
SLIDE 22

Stochastic edit model: generation process

[Figure: alignment of Latin # f o k u s # with Italian # f w ɔ k o #]

slide-23
SLIDE 23

Stochastic edit model: generation process

[Figure: the Latin input # f o k u s #, with the output still to be generated (?)]

slide-24
SLIDE 24

Stochastic edit model: generation process

[Figure: the Latin input # f o k u s #, with the output f w generated so far]

  • P(f → f w / # _ V) = 0.05

slide-25
SLIDE 25

Stochastic edit model: generation process

[Figure: the Latin input # f o k u s #, with the output f w generated so far and the next output position marked ?]

  • P(f → f w / # _ V) = 0.05

slide-26
SLIDE 26

Stochastic edit model: generation process

[Figure: the Latin input # f o k u s #, with the output f w ɔ generated so far]

  • P(f → f w / # _ V) = 0.05
  • P(o → ɔ / C _ V) = 0.1

slide-27
SLIDE 27

Stochastic edit model: generation process

[Figure: the complete alignment of Latin # f o k u s # with Italian # f w ɔ k o #]

  • P(f → f w / # _ V) = 0.05
  • P(o → ɔ / C _ V) = 0.1
  • ...
  • P(/fokus/ → /fwɔko/) = 0.05 × 0.1 × ···
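
The product above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: only the first two probabilities (0.05 and 0.1) come from the slide, the other three are invented placeholders, and the names `EDIT_PROBS` and `derivation_probability` are hypothetical.

```python
import math

# (input phoneme, output string, context) -> P(operation | context).
# Only the first two values appear on the slide; the other three are
# invented placeholders so that the example runs end to end.
EDIT_PROBS = {
    ("f", "f w", "# _ V"): 0.05,  # from the slide
    ("o", "ɔ", "C _ V"): 0.1,     # from the slide
    ("k", "k", "V _ V"): 0.9,     # placeholder (self-substitution)
    ("u", "o", "C _ C"): 0.3,     # placeholder (substitution)
    ("s", "", "V _ #"): 0.6,      # placeholder (deletion)
}

def derivation_probability(edits, probs=EDIT_PROBS):
    """Multiply the probabilities of the edits in one derivation."""
    # Summing logs avoids numerical underflow on long words.
    return math.exp(sum(math.log(probs[e]) for e in edits))

derivation = list(EDIT_PROBS)  # the five edits, in left-to-right order
p = derivation_probability(derivation)
output = " ".join(out for _, out, _ in derivation if out)
print(output, p)
```

Working in log space is a design choice for numerical stability; the result is the same product of per-phoneme edit probabilities.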
slide-28
SLIDE 28

Edit parameters

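[Figure: evolutionary tree over la, vl, it, es, with a list of word forms at each node: /werbum/, /veɾbu/, /beɾbo/, /vɛrbo/ for one word and /kentrum/, /ʧentro/, /sentɾo/, /ʧentro/ for another, ...]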

slide-29
SLIDE 29

Edit parameters

[Figure: the same tree and word forms, with a parameter set θ attached to each edge (e.g. θ_{la→vl})]

  • One set of parameters θ_{A→B} for each edge A → B in the tree
  • Shared across all word forms evolving along this edge
slide-30
SLIDE 30

Edit parameters

[Figure: the edge la → vl with its parameter set θ_{la→vl} and the word forms /veɾbu/, /ʧentro/, ...]

  • θ_{A→B} specifies P(operation|context)

context  operation            P(operation|context)
u m #    deletion             0.1
u m #    substitution to /m/  0.8
u m #    substitution to /b/  0.1
a c b    deletion             0.8
a c b    insertion of c       0.1
...      ...                  ...
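
One way to picture such a parameter table is as a mapping from contexts to distributions over operations. The sketch below is an assumption about representation, not the authors' data structure; the names `theta_la_vl`, `check_normalised` and `sample_operation` are hypothetical. Only the `u m #` rows are used, because the `a c b` rows on the slide are truncated and do not sum to one.

```python
import random

# Parameters for one edge: context (left, current, right) -> {operation: probability}.
# The values for ("u", "m", "#") are copied from the table on the slide.
theta_la_vl = {
    ("u", "m", "#"): {
        "deletion": 0.1,
        "substitution to /m/": 0.8,
        "substitution to /b/": 0.1,
    },
}

def check_normalised(theta, tol=1e-9):
    """Each context must carry a proper distribution over operations."""
    return all(abs(sum(dist.values()) - 1.0) < tol for dist in theta.values())

def sample_operation(theta, context, rng):
    """Draw one operation for the given context."""
    ops, weights = zip(*theta[context].items())
    return rng.choices(ops, weights=weights, k=1)[0]

rng = random.Random(0)
draws = [sample_operation(theta_la_vl, ("u", "m", "#"), rng) for _ in range(1000)]
share_kept = draws.count("substitution to /m/") / len(draws)
print(check_normalised(theta_la_vl), share_kept)
```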

slide-31
SLIDE 31

Distribution on the edit parameters

  • Too many parameters
  • Addressed by:
      – Sparsity prior: independent Dirichlet priors (one for each context)
      – Group context distributions. Example:

context  operation            P(operation|context)
V m #    deletion             0.1
V m #    substitution to /a/  0.8
V m #    substitution to /b/  0.1
V c C    deletion             0.8
V c C    insertion of c       0.1
...      ...                  ...
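
Both ideas can be illustrated with a short sketch. Everything here is an assumption made for illustration: the vowel inventory, the choice to coarsen only the neighbouring phonemes, and the use of a posterior mean under a symmetric Dirichlet prior (the slides do not say which estimator is used). The names `phoneme_class`, `grouped_context` and `posterior_mean` are hypothetical.

```python
VOWELS = set("aeiouɛɔ")  # assumed vowel inventory, for illustration only

def phoneme_class(p):
    """Map a phoneme to a coarse class: '#' (boundary), 'V' or 'C'."""
    if p == "#":
        return "#"
    return "V" if p in VOWELS else "C"

def grouped_context(left, current, right):
    """Keep the current phoneme and coarsen its two neighbours."""
    return (phoneme_class(left), current, phoneme_class(right))

def posterior_mean(counts, operations, alpha=0.1):
    """Posterior mean of P(operation | context) under a symmetric Dirichlet(alpha).

    A small alpha puts little mass on operations that were never observed,
    which is the sparsity effect the prior is meant to give.
    """
    total = sum(counts.get(op, 0.0) for op in operations) + alpha * len(operations)
    return {op: (counts.get(op, 0.0) + alpha) / total for op in operations}

operations = ["deletion", "substitution to /a/", "substitution to /b/"]
dist = posterior_mean({"deletion": 8, "substitution to /a/": 2}, operations)
print(grouped_context("u", "m", "#"), grouped_context("a", "c", "b"), dist)
```

With this grouping, the contexts `u m #` and `a c b` from the previous slide become `V m #` and `V c C`, so contexts that differ only in the identity of a neighbouring vowel or consonant share one distribution.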

slide-32
SLIDE 32

Inference and experiments

slide-33
SLIDE 33

Inference: EM

  • Exact E step is intractable

– We use a stochastic E step based on Gibbs sampling

  • E: fix the edit parameters, resample the derivations
  • M: update the edit parameters from expected edit counts
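
The M step can be sketched as a count-and-normalise update. This is a minimal illustration under assumptions: the E step is taken as given and is represented only by a list of sampled derivations, each a list of (context, operation) pairs; the function name `m_step` and the sample data are hypothetical, and no prior is applied.

```python
from collections import defaultdict

def m_step(sampled_derivations):
    """Re-estimate P(operation | context) from sampled derivations.

    Averaging counts over the samples approximates the expected edit counts.
    The averaging constant cancels when normalising, so raw counts are used.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for derivation in sampled_derivations:
        for context, operation in derivation:
            counts[context][operation] += 1.0
    theta = {}
    for context, ops in counts.items():
        total = sum(ops.values())
        theta[context] = {op: n / total for op, n in ops.items()}
    return theta

# Three hypothetical samples from the E step for one word on one edge.
samples = [
    [(("u", "m", "#"), "deletion")],
    [(("u", "m", "#"), "deletion")],
    [(("u", "m", "#"), "substitution to /m/")],
]
theta = m_step(samples)
print(theta)
```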
slide-34
SLIDE 34

Automatic extraction of a Romance corpus

[Figure: extraction pipeline. Sources: Wiktionary XML dump, Bible, Europarl. Processing steps: alignment (Align.), closure, cognate detector.]

  • Noisier than manually curated cognate lists
  • More data available
  • Our model overcomes this noise

Data available online: http://nlp.cs.berkeley.edu/pages/historical.html

slide-35
SLIDE 35

Reconstruction of ancient word forms

  • Task: reconstruction of Latin given all of the Spanish and Italian words, and some of the Latin words

  • Evaluation: uniform cost edit distance on held-out data
  • Baseline: pick one of the modern languages at random
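
Uniform-cost edit distance is the standard Levenshtein distance, in which every insertion, deletion and substitution costs 1. The sketch below is a generic implementation; the function name `edit_distance` is hypothetical, and treating each character as one phoneme is a simplifying assumption.

```python
def edit_distance(a, b):
    """Uniform-cost edit distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x by y, or match
            ))
        prev = curr
    return prev[-1]

print(edit_distance("werbum", "verbo"))
print(edit_distance("fokus", "fwɔko"))
```

A lower score means the reconstruction is closer to the held-out form.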
slide-36
SLIDE 36

Reconstruction of ancient word forms

  • Task: reconstruction of Latin given all of the Spanish and Italian words, and some of the Latin words
  • Example: “teeth”, nearly correctly reconstructed

[Figure: tree with the forms /dEntis/, /djEntes/, /dEnti/ at its nodes and the edits i → E, E → j E, s → ∅ on its branches]

  • Numbers:

Language  Baseline  Model  Improvement
Latin     2.84      2.34   9%

slide-37
SLIDE 37

Reconstruction of word forms

  • Evaluation: uniform cost edit distance on held-out data
  • Baseline: pick one of the modern languages at random
  • Example: “teeth”, nearly correctly reconstructed

[Figure: tree with the forms /dEntis/, /djEntes/, /dEnti/ at its nodes and the edits i → E, E → j E, s → ∅ on its branches]

  • Numbers:

Language  Baseline  Model  Improvement
Latin     2.84      2.34   9%
Spanish   3.59      3.21   11%

slide-38
SLIDE 38

Inference of phonological rules

[Figure: evolutionary tree over the nodes la, vl, it, ib, es, pt]

  • ib: Proto-Ibero-Romance
  • vl: Vulgar Latin
slide-39
SLIDE 39

Inference of phonological rules

[Figure: evolutionary tree over la, vl, it, ib, es, pt, with the inferred rules listed on each branch, e.g.:]

m → ∅ / _ #   0.92
u → o / _     0.87
...

  • Reconstruct the internal nodes
  • Focus on the rules used most often during the last E step
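
Ranking rules by how often they are used can be sketched with a simple counter. The derivations below are hypothetical stand-ins for those sampled in the last E step, and the function name `top_rules` is an assumption.

```python
from collections import Counter

def top_rules(derivations, n=2):
    """Return the n most frequent (rule, count) pairs across derivations."""
    counts = Counter(rule for derivation in derivations for rule in derivation)
    return counts.most_common(n)

# Hypothetical derivations for three words on the la -> vl edge.
derivations = [
    ["m → ∅ / _ #", "u → o / _", "w → v"],
    ["m → ∅ / _ #", "u → o / _"],
    ["m → ∅ / _ #"],
]
ranked = top_rules(derivations)
print(ranked)
```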
slide-40
SLIDE 40

Hypothesized derivation for “word” along with top rules

/werbum/ (la), /verbo/ (vl)

Edits in the derivation: m → ∅, u → o, w → v, ..., r → ɾ, e → ɛ

Top rules:
m → ∅ / _ #
u → o / _
w → v / many environments
...

  • Comparison with historical evidence: the Appendix Probi

coluber non colober
passim non passi

slide-41
SLIDE 41

Hypothesized derivation for “word” along with top rules

/veɾbo/ (ib), /beɾbo/ (es), /veɾbu/ (pt)

Edits in the derivation: v → b, u → o

Top rules:
u → o / many environments
v → b / initial or intervocalic
t → t e / ALV _ #
...
r → ɾ
...

  • /v/ to /b/ fortition
  • /s/ to /z/ voicing in Italian
slide-42
SLIDE 42

Selection of phylogenies

slide-43
SLIDE 43

Inference of topology

[Figure: the languages la, pt, es, it with an unknown tree topology (?)]

slide-44
SLIDE 44

Example of previous approaches

  • Gray and Atkinson, 2003
  • Coarse encoding:

Cognate set 1: Latin mandere (to chew), French manger, Italian mangiare
Cognate set 2: Latin comedere (to consume), Spanish comer, Portuguese comer

Meaning      Eat        ···
Cognate set  1    2     ···
Latin        1    1     ···
French       1          ···
Italian      1          ···
Spanish           1     ···
Portuguese        1     ···

  • These characters evolve independently in their model
  • Lots of information discarded
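
The coarse encoding can be reproduced in a few lines, which also makes clear what is lost. The cognate sets are copied from the slide; the names `COGNATE_SETS`, `LANGUAGES` and `binary_characters` are hypothetical, and writing 0 for an empty cell is an assumption about how absence is encoded.

```python
# Cognate sets for the meaning "eat", copied from the slide.
COGNATE_SETS = {
    1: {"Latin": "mandere", "French": "manger", "Italian": "mangiare"},
    2: {"Latin": "comedere", "Spanish": "comer", "Portuguese": "comer"},
}
LANGUAGES = ["Latin", "French", "Italian", "Spanish", "Portuguese"]

def binary_characters(cognate_sets, languages):
    """One row per language, one 0/1 column per cognate set."""
    set_ids = sorted(cognate_sets)
    return {
        lang: [int(lang in cognate_sets[s]) for s in set_ids]
        for lang in languages
    }

matrix = binary_characters(COGNATE_SETS, LANGUAGES)
for lang in LANGUAGES:
    print(f"{lang:<11}", *matrix[lang])
```

Only set membership survives in the matrix: the word forms themselves, and hence all sound correspondences between them, are dropped.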
slide-45
SLIDE 45

Comparison

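[Figure: evolutionary tree over la, vl, it, es, with a list of word forms at each node: /werbum/, /veɾbu/, /beɾbo/, /vɛrbo/ for one word and /kentrum/, /ʧentro/, /sentɾo/, /ʧentro/ for another, ...]

Our samples look like this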

slide-46
SLIDE 46

Comparison

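[Figure: the same tree over la, vl, it, es, with binary cognate characters (1, 1, ...) in place of the word forms]

Atkinson’s samples look like this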

slide-47
SLIDE 47

What we did

  • Present good vs. bad topologies and compute the likelihood ratio

[Figure: three candidate topologies, with leaves ordered (la, it, es, pt), (la, pt, es, it) and (la, es, it, pt)]

  • This can be turned into a full topology inference algorithm using the quartet method [Erdős et al., 1996]
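
The comparison can be sketched as follows. This is an illustration under assumptions: the four languages are treated as the leaves of an unrooted quartet, for which there are exactly three binary topologies, and the log-likelihoods are made-up numbers standing in for values computed under the model. The names `quartet_topologies`, `log_likelihood_ratio`, `best_topology` and `TOY_LOGLIK` are hypothetical.

```python
def quartet_topologies(a, b, c, d):
    """The three unrooted binary topologies on four leaves, as pairs of pairs."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

def log_likelihood_ratio(loglik_good, loglik_bad):
    """log of P(data | good topology) / P(data | bad topology)."""
    return loglik_good - loglik_bad

def best_topology(leaves, loglik):
    """Pick the quartet topology with the highest log-likelihood."""
    return max(quartet_topologies(*leaves), key=loglik)

# Made-up log-likelihoods, standing in for values computed under the model.
TOY_LOGLIK = {
    (("la", "it"), ("es", "pt")): -1200.0,
    (("la", "es"), ("it", "pt")): -1260.0,
    (("la", "pt"), ("it", "es")): -1255.0,
}

best = best_topology(("la", "it", "es", "pt"), TOY_LOGLIK.get)
ratio = log_likelihood_ratio(TOY_LOGLIK[best], TOY_LOGLIK[(("la", "pt"), ("it", "es"))])
print(best, ratio)
```

A quartet-based method would repeat this choice for many four-leaf subsets and combine the winning quartets into one tree.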

slide-48
SLIDE 48

Conclusion

  • Introduced a probabilistic approach to diachronic phonology
  • Enables reconstruction of ancient and modern word forms, phonological rules and tree topologies
  • Future work:
      – We are scaling it up to larger phylogenies
      – We are working on an extension using a log-linear parametrization of the contexts, reminiscent of stochastic OT
  • Data available online: http://nlp.cs.berkeley.edu/pages/historical.html