SLIDE 1

12/1/2014 1

Natural Language Processing

Diachronics

Dan Klein – UC Berkeley

Includes joint work with Alex Bouchard‐Cote, Tom Griffiths, and David Hall

SLIDE 2

The Task

SLIDE 3

Lexical Reconstruction

Latin: focus

French: feu
Spanish: fuego
Italian: fuoco
Portuguese: fogo

SLIDE 4

Tree of Languages

We assume the phylogeny is known
Much work in biology, e.g. work by Warnow, Felsenstein, Steele…
Also in linguistics, e.g. Warnow et al., Gray and Atkinson…

http://andromeda.rutgers.edu/~jlynch/language.html

SLIDE 5

Evolution through Sound Changes

Latin: camera /kamera/
French: chambre /ʃambʁ/

Deletion: /e/, /a/
Change: /k/ → /tʃ/ → /ʃ/
Insertion: /b/

Eng. camera from Latin, “camera obscura”
Eng. chamber from Old Fr., before the initial /t/ dropped

SLIDE 6

Changes are Systematic

camera /kamera/ → camra /kamra/ (e → _)
numerus /numerus/ → numrus /numrus/ (e → _)

SLIDE 7

Changes are Contextual

camera /kamera/ → camra /kamra/
e → _
e → _ / after stress
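The contextual rule on this slide can be sketched as a small rewrite function. This is only an illustration, not the model from the talk: the stress mark ˈ placed before the stressed syllable, the five-vowel inventory, and the function name are all assumptions made for the example.

```python
import re

VOWELS = "aeiou"  # assumed toy vowel inventory

# ˈ marks the start of the stressed syllable (an assumed notation).
# Match: stress mark, onset consonants, stressed vowel, following
# consonants, then the post-tonic /e/ that the rule deletes.
POST_TONIC_E = re.compile(rf"(ˈ[^{VOWELS}]*[{VOWELS}][^{VOWELS}]*)e")


def delete_post_tonic_e(word: str) -> str:
    """Apply e → _ / after stress to a stress-marked phoneme string."""
    return POST_TONIC_E.sub(r"\1", word, count=1)


for form in ("ˈkamera", "ˈnumerus", "ˈfokus"):
    print(form, "→", delete_post_tonic_e(form))
```

Because the rule is stated once and applied to every word, the same function handles both camera and numerus, which is the sense in which the change is systematic.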

SLIDE 8

Changes Have Structure

camra /kamra/ → cambra /kambra/
_ → b
_ → b / m_r
_ → [stop x] / [nasal x]_r
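The most general form of the rule, where the inserted stop shares a feature x with the preceding nasal, might be sketched as follows. The two-entry feature table and the function name are assumptions for illustration, and /tenro/ in the usage line is a made-up form rather than data from the talk.

```python
import re

# Assumed toy feature table: each nasal mapped to the voiced stop
# sharing its place of articulation (the shared feature "x").
HOMORGANIC_STOP = {"m": "b", "n": "d"}


def insert_stop(word: str) -> str:
    """Apply _ → [stop x] / [nasal x]_r to a phoneme string."""
    return re.sub(
        r"([mn])r",
        lambda m: m.group(1) + HOMORGANIC_STOP[m.group(1)] + "r",
        word,
    )


print(insert_stop("kamra"))  # the slide's example
print(insert_stop("tenro"))  # hypothetical form showing the n/d case
```

Stating the rule over features rather than over the single pair m/b lets one parameter cover several concrete insertions.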

SLIDE 9

Changes are Systematic

English Great Vowel Shift (Simplified!)

[Vowel chart: e, i, a]

“time” = teem → “time” = taim

SLIDE 10

Diachronic Evidence

tonitru non tonotru: Appendix Probi [ca 300]
tonight not tonite: Yahoo! Answers [ca 2000]

SLIDE 11

Synchronic (Comparative) Evidence

Key idea: changes occur uniformly across the lexicon

SLIDE 12

The Data

SLIDE 14

The Data

Data sets
  Small: Romance
    French, Italian, Portuguese, Spanish
    2344 words
    Complete cognate sets
    Target: (Vulgar) Latin
  Large: Austronesian
    637 languages
    140K words
    Incomplete cognate sets
    Target: Proto‐Austronesian

[Figure labels: FR, IT, PT, ES]

SLIDE 15

Austronesian

SLIDE 16

Austronesian Examples

From the Austronesian Basic Vocabulary Database

SLIDE 17

The Model

SLIDE 18

Simple Model: Single Characters

[Figure: tree whose nodes each carry a single character, C or G]

[cf. Felsenstein 81]
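For the single-character case, the likelihood of the observed leaves can be computed by a bottom-up pass over the tree in the style of Felsenstein's pruning algorithm. The sketch below is a minimal version under stated assumptions: a two-letter alphabet, a symmetric mutation model with an invented stay probability of 0.9, a uniform root prior, and trees written as nested tuples. None of these values come from the talk.

```python
ALPHABET = ("C", "G")
STAY = 0.9  # assumed probability that a character is unchanged on a branch


def transition(parent: str, child: str) -> float:
    """P(child | parent) under the assumed two-state mutation model."""
    return STAY if parent == child else 1.0 - STAY


def partial_likelihoods(node) -> dict:
    """Map each state s to P(observed leaves below node | node = s)."""
    if isinstance(node, str):  # a leaf with an observed character
        return {s: 1.0 if s == node else 0.0 for s in ALPHABET}
    result = {s: 1.0 for s in ALPHABET}
    for child in node:  # internal node: a tuple of subtrees
        below = partial_likelihoods(child)
        for s in ALPHABET:
            result[s] *= sum(transition(s, c) * below[c] for c in ALPHABET)
    return result


def likelihood(tree) -> float:
    """Marginal likelihood of the leaves with a uniform prior at the root."""
    root = partial_likelihoods(tree)
    return sum(root[s] / len(ALPHABET) for s in ALPHABET)


def root_posterior(tree) -> dict:
    """Posterior over the root character given the leaves."""
    root = partial_likelihoods(tree)
    total = sum(root.values())
    return {s: root[s] / total for s in ALPHABET}


tree = (("C", "C"), ("C", "G"))
print(likelihood(tree), root_posterior(tree))
```

The recursion is exact here because each node takes one of a handful of values; the difficulty discussed later in the talk arises when nodes hold whole strings instead.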

SLIDE 19

Changes are Systematic

/fokus/: /fweɣo/ /fogo/ /fogo/ /fwɔko/
/kentrum/: /sentro/ /sentro/ /sentro/ /tʃɛntro/

SLIDE 20

Parameters are Branch‐Specific

LA: focus /fokus/
IB: /fogo/
ES: fuego /fweɣo/
PT: fogo /fogo/
IT: fuoco /fwɔko/

[Figure: tree over LA, IB, IT, ES, PT]

[Bouchard‐Cote, Griffiths, Klein, 07]

SLIDE 21

Edits are Contextual, Structured

IT: /fokus/ → /fwɔko/

[Figure: a contextual edit on the IT branch; surviving labels: f, #, w, ɔ]
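One way to picture branch-specific, contextual edits is a per-branch table that maps a context to a distribution over outputs. The sketch below is a toy under assumptions, not the parameterization used in the cited work: the context is just the previous parent phoneme, the table `IT_PARAMS` is invented with all probabilities set to 1.0 so the result is deterministic, and `sample_child` is a hypothetical helper name.

```python
import random

# Hypothetical parameters for one branch (Latin to Italian): for a
# (previous parent phoneme, current parent phoneme) context, a
# distribution over output strings. "" is a deletion; a two-phoneme
# output is a substitution followed by an insertion.
IT_PARAMS = {
    ("f", "o"): {"wɔ": 1.0},
    ("k", "u"): {"o": 1.0},
    ("u", "s"): {"": 1.0},
}


def sample_child(parent: str, params: dict, rng: random.Random) -> str:
    """Sample a daughter word by editing each parent phoneme in context."""
    out = []
    prev = "#"  # word-boundary symbol
    for phoneme in parent:
        # Contexts missing from the table copy the phoneme unchanged.
        dist = params.get((prev, phoneme), {phoneme: 1.0})
        outputs = list(dist)
        weights = [dist[o] for o in outputs]
        out.append(rng.choices(outputs, weights=weights, k=1)[0])
        prev = phoneme
    return "".join(out)


print(sample_child("fokus", IT_PARAMS, random.Random(0)))
```

Giving each branch of the tree its own table is what makes the parameters branch-specific: an ES or PT table would hold different entries for the same Latin contexts.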

SLIDE 22

Inference

SLIDE 23

Learning: Objective

/fokus/ /fweɣo/ /fogo/ /fogo/ /fwɔko/

z w

SLIDE 24

Learning: EM

M‐Step
Find parameters which fit (expected) sound change counts
Easy: gradient ascent on theta

E‐Step
Find (expected) change counts given parameters
Hard: variables are string‐valued

/fokus/ /fweɣo/ /fogo/ /fogo/ /fwɔko/
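The M-step can be illustrated with a very small case: fitting a distribution over the outcomes of one context to expected counts by gradient ascent on theta. The softmax parameterization, the learning rate, the step count, and the example counts are all assumptions chosen for the sketch; the slide only states that gradient ascent on theta is used.

```python
import math


def softmax(theta: dict) -> dict:
    """Turn unnormalized scores into a probability distribution."""
    m = max(theta.values())
    exps = {k: math.exp(v - m) for k, v in theta.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def m_step(expected_counts: dict, steps: int = 2000, lr: float = 0.5) -> dict:
    """Gradient ascent on the expected log-likelihood sum_k c_k log p_k."""
    theta = {k: 0.0 for k in expected_counts}
    total = sum(expected_counts.values())
    for _ in range(steps):
        probs = softmax(theta)
        for k in theta:
            # d/d theta_k of the objective is c_k - total * p_k;
            # dividing by total keeps the step size scale-free.
            theta[k] += lr * (expected_counts[k] - total * probs[k]) / total
    return theta


# Invented expected counts for what happens to /e/ in one context.
counts = {"e→∅": 7.5, "e→e": 2.0, "e→i": 0.5}
print(softmax(m_step(counts)))
```

With one free parameter per outcome the optimum is just the normalized counts, so gradient ascent is overkill here; it becomes useful once parameters are shared across contexts through features.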

SLIDE 25

Computing Expectations

‘grass’

Standard approach, e.g. [Holmes 2001]: Gibbs sampling each sequence

[Holmes 01, Bouchard‐Cote, Griffiths, Klein 07]

SLIDE 26

A Gibbs Sampler

‘grass’

SLIDE 29

Getting Stuck

How could we jump to a state where the liquids /r/ and /l/ have a common ancestor?

SLIDE 31

Efficient Sampling: Vertical Slices

[Figure panels: Single Sequence Resampling; Ancestry Resampling]

[Bouchard‐Cote, Griffiths, Klein, 08]

SLIDE 32

Results

SLIDE 33

Results: Romance

SLIDE 34

Learned Rules / Mutations

SLIDE 36

Results: Austronesian

SLIDE 37

Examples: Austronesian

[Bouchard‐Cote, Hall, Griffiths, Klein, 13]

SLIDE 38

Result: More Languages Help

[Plot: mean edit distance from Blust [1993] reconstructions, against number of modern languages used]
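The evaluation metric named here, mean edit distance between proposed and reference reconstructions, can be sketched with a standard Levenshtein distance. The unit cost for every insertion, deletion and substitution and the example pairs are assumptions; the talk does not specify the cost scheme.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-phoneme insertions, deletions and
    substitutions turning a into b (unit costs assumed)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def mean_edit_distance(pairs) -> float:
    """Average distance over (reconstruction, reference) pairs."""
    pairs = list(pairs)
    return sum(levenshtein(r, g) for r, g in pairs) / len(pairs)


# Invented pairs, for illustration only.
print(mean_edit_distance([("kamra", "kamera"), ("fokus", "fokus")]))
```

A lower mean distance means the automatic reconstructions sit closer to the reference forms, which is how the plot's vertical axis should be read.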

SLIDE 39

Visualization: Learned Universals

*The model did not have features encoding natural classes

SLIDE 40

Regularity and Functional Load

In a language, some pairs of sounds are more contrastive than others (higher functional load)
Example: English p/d versus t/th
High Load: p/d: pot/dot, pin/din, dress/press, pew/dew, ...
Low Load: t/th: thin/tin

SLIDE 41

Functional Load: Timeline

1955: Functional Load Hypothesis (FLH): Sound changes are less frequent when they merge phonemes with high functional load [Martinet, 55]
1967: Previous research within linguistics: “FLH does not seem to be supported by the data” [King, 67] (Based on 4 languages, as noted by [Hockett, 67; Surendran et al., 06])
Our approach: we reexamined the question with two orders of magnitude more data [Bouchard‐Cote, Hall, Griffiths, Klein, 13]

SLIDE 42

Regularity and Functional Load

Data: only 4 languages from the Austronesian data

[Plot: merger posterior probability against functional load as computed by [King, 67]]

Each dot is a sound change identified by the system

SLIDE 43

Regularity and Functional Load

Data: all 637 languages from the Austronesian data

[Plot: merger posterior probability against functional load as computed by [King, 67]]

SLIDE 44

Extensions

SLIDE 45

Cognate Detection

/fweɣo/ /fogo/ /fwɔko/ ‘fire’
/berβo/ /vɛrbo/ /vɛrbo/
/tʃɛntro/ /sentro/ /sɛntro/

[Hall and Klein, 11]

SLIDE 46

Grammar Induction

[Plot over languages: Dutch, Danish, Swedish, Spanish, Portuguese, Slovene, Chinese, English; grouping labels: WG, NG, RM, G, IE, GL]

Avg rel gain: 29%

[Berg‐Kirkpatrick and Klein, 07]

SLIDE 47

Language Diversity

Why are the languages of the world so similar?

Universal grammar answer: Hardware constraints
Common source answer: Not much time has passed

[Rafferty, Griffiths, and Klein, 09]