Natural Language Processing Diachronics Dan Klein UC Berkeley - - PowerPoint PPT Presentation



SLIDE 1

12/1/2014 1

Natural Language Processing

Diachronics

Dan Klein – UC Berkeley

Includes joint work with Alex Bouchard‐Cote, Tom Griffiths, and David Hall

SLIDE 2

The Task

SLIDE 3

Lexical Reconstruction

Latin: focus

French: feu
Spanish: fuego
Italian: fuoco
Portuguese: fogo

SLIDE 4

Tree of Languages

  • We assume the phylogeny is known
  • Much work in biology, e.g. work by Warnow, Felsenstein, Steele…
  • Also in linguistics, e.g. Warnow et al., Gray and Atkinson…

http://andromeda.rutgers.edu/~jlynch/language.html

SLIDE 5

Evolution through Sound Changes

Latin: camera /kamera/

French: chambre /ʃambʁ/

Deletion: /e/, /a/
Change: /k/ → /tʃ/ → /ʃ/
Insertion: /b/

  • Eng. camera from Latin, “camera obscura”
  • Eng. chamber from Old Fr. before the initial /t/ dropped

SLIDE 6

Changes are Systematic

camera /kamera/ → camra /kamra/ (e → _)
numerus /numerus/ → numrus /numrus/ (e → _)

SLIDE 7

Changes are Contextual

camera /kamera/ → camra /kamra/
e → _ / after stress
e → _

SLIDE 8

Changes Have Structure

camra /kamra/ → cambra /kambra/
_ → b / m_r
_ → b
_ → [stop x] / [nasal x]_r
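The rules on the last three slides can be read as ordered string rewrites. A minimal Python sketch, under assumptions that are not in the slides: words are plain strings with one character per phoneme, stress is assumed to fall on the first vowel, and the function names are hypothetical.

```python
import re

VOWELS = "aeiou"

def delete_e_after_stress(word):
    """e -> _ / after stress (ASSUMPTION: stress is on the first vowel)."""
    first_vowel = re.search(f"[{VOWELS}]", word)
    if first_vowel is None:
        return word
    head, tail = word[: first_vowel.end()], word[first_vowel.end():]
    # Delete the first /e/ that follows the stressed vowel.
    return head + tail.replace("e", "", 1)

def insert_b(word):
    """_ -> b / m_r (insert /b/ between /m/ and /r/)."""
    return re.sub(r"m(?=r)", "mb", word)

def apply_rules(word, rules):
    """Apply each rule in order; the order matters because rules feed each other."""
    for rule in rules:
        word = rule(word)
    return word

print(apply_rules("kamera", [delete_e_after_stress, insert_b]))  # kambra
print(apply_rules("numerus", [delete_e_after_stress]))           # numrus
```

The insertion rule only fires on /kamra/ because the deletion rule first brings /m/ and /r/ together, which is why the rules are applied in sequence.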

SLIDE 9

Changes are Systematic

English Great Vowel Shift (Simplified!)

[Figure: vowel chart with e, i, a]

“time” = teem → “time” = taim

SLIDE 10

Diachronic Evidence

tonitru non tonotru (Appendix Probi [ca 300])
tonight not tonite (Yahoo! Answers [ca 2000])

SLIDE 11

Synchronic (Comparative) Evidence

Key idea: changes occur uniformly across the lexicon

SLIDE 12

The Data

SLIDE 13

The Data

  • Data sets
    • Small: Romance
      • French, Italian, Portuguese, Spanish
      • 2344 words
      • Complete cognate sets
      • Target: (Vulgar) Latin

FR IT PT ES

SLIDE 14

The Data

  • Data sets
    • Small: Romance
      • French, Italian, Portuguese, Spanish
      • 2344 words
      • Complete cognate sets
      • Target: (Vulgar) Latin
    • Large: Austronesian
      • 637 languages
      • 140K words
      • Incomplete cognate sets
      • Target: Proto‐Austronesian

FR IT PT ES

SLIDE 15

Austronesian

SLIDE 16

Austronesian Examples

From the Austronesian Basic Vocabulary Database

SLIDE 17

The Model

SLIDE 18

Simple Model: Single Characters

[Figure: tree whose nodes each carry a single character, C or G]

[cf. Felsenstein 81]
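For the single-character model, the likelihood of the observed leaves can be computed bottom-up over the tree, in the style of the pruning recursion associated with the Felsenstein reference. A minimal sketch follows; the two-state alphabet, the symmetric change probability, the uniform root prior, and all function names are illustrative assumptions, not values from the slides.

```python
STATES = ["C", "G"]

def transition_matrix(p_change):
    """Symmetric two-state matrix P[parent][child] (ASSUMED model)."""
    return [[1 - p_change, p_change],
            [p_change, 1 - p_change]]

def partial_likelihood(node, P):
    """Return L where L[s] = P(leaves below node | node is in state s).

    A node is either a leaf (a state string such as "C") or a tuple of children.
    """
    if isinstance(node, str):
        return [1.0 if s == node else 0.0 for s in STATES]
    result = [1.0] * len(STATES)
    for child in node:
        child_vec = partial_likelihood(child, P)
        for s in range(len(STATES)):
            # Sum out the child's state along this branch.
            result[s] *= sum(P[s][c] * child_vec[c] for c in range(len(STATES)))
    return result

def tree_likelihood(tree, p_change=0.1, root_prior=(0.5, 0.5)):
    """Probability of the observed leaves, summing over all internal states."""
    P = transition_matrix(p_change)
    root_vec = partial_likelihood(tree, P)
    return sum(pi * l for pi, l in zip(root_prior, root_vec))

# Leaves that agree within each subtree are far more likely than mixed ones.
print(tree_likelihood((("C", "C"), ("G", "G"))))
print(tree_likelihood((("C", "G"), ("C", "G"))))
```

The recursion visits each node once, so the cost is linear in the size of the tree. The slides go on to replace single characters with whole strings, where this exact summation is no longer as simple.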

SLIDE 19

Changes are Systematic

/fokus/ /fweɣo/ /fogo/ /fogo/ /fwɔko/
/kentrum/ /sentro/ /sentro/ /sentro/ /tʃɛntro/

SLIDE 20

Parameters are Branch‐Specific

LA: focus /fokus/
IB: /fogo/
ES: fuego /fweɣo/
PT: fogo /fogo/
IT: fuoco /fwɔko/

[Figure: tree over LA, IB, IT, ES, PT]

[Bouchard‐Cote, Griffiths, Klein, 07]

SLIDE 21

Edits are Contextual, Structured

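/fokus/ → /fwɔko/ (IT)

[Figure: a contextual edit on the IT branch, with the symbols #, f, o on the source side and #, f, w, ɔ on the target side]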

SLIDE 22

Inference

SLIDE 23

Learning: Objective

/fokus/ /fweɣo/ /fogo/ /fogo/ /fwɔko/

z w

SLIDE 24

Learning: EM

  • M‐Step
    • Find parameters which fit (expected) sound change counts
    • Easy: gradient ascent on theta
  • E‐Step
    • Find (expected) change counts given parameters
    • Hard: variables are string‐valued

/fokus/ /fweɣo/ /fogo/ /fogo/ /fwɔko/
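The M‐step can be illustrated on a toy case. The sketch below assumes one phoneme with three possible outcomes and a plain softmax over one weight per outcome; the slides do not specify the parameterization, so this stand-in, the expected counts, the step size, and the function names are all hypothetical.

```python
import math

def softmax(theta):
    """Turn weights into outcome probabilities (numerically stable)."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def m_step(expected_counts, steps=2000, lr=0.5):
    """Gradient ascent on theta for sum_k n_k * log softmax(theta)_k."""
    total = sum(expected_counts)
    theta = [0.0] * len(expected_counts)
    for _ in range(steps):
        probs = softmax(theta)
        # Gradient of the expected log-likelihood, scaled by 1/total:
        # (expected count share) minus (current model probability).
        grad = [(n - total * p) / total for n, p in zip(expected_counts, probs)]
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta

# HYPOTHETICAL expected counts from an E-step for one source phoneme:
# kept unchanged, substituted, deleted.
counts = [8.0, 1.5, 0.5]
print([round(p, 3) for p in softmax(m_step(counts))])
```

With one free weight per outcome, the fitted probabilities simply match the normalized expected counts. Gradient ascent becomes necessary once weights are shared across contexts or features, which this toy case leaves out.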

SLIDE 25

Computing Expectations

‘grass’

Standard approach, e.g. [Holmes 2001]: Gibbs sampling each sequence

[Holmes 01; Bouchard‐Cote, Griffiths, Klein 07]

SLIDE 26

A Gibbs Sampler

‘grass’

SLIDE 27

A Gibbs Sampler

‘grass’

SLIDE 28

A Gibbs Sampler

‘grass’

SLIDE 29

Getting Stuck

How could we jump to a state where the liquids /r/ and /l/ have a common ancestor?


SLIDE 30

Getting Stuck

SLIDE 31

Efficient Sampling: Vertical Slices

[Figure panels: Single Sequence Resampling; Ancestry Resampling]

[Bouchard‐Cote, Griffiths, Klein, 08]

SLIDE 32

Results

SLIDE 33

Results: Romance

SLIDE 34

Learned Rules / Mutations

SLIDE 35

Learned Rules / Mutations

SLIDE 36

Results: Austronesian

SLIDE 37

Examples: Austronesian

[Bouchard‐Cote, Hall, Griffiths, Klein, 13]

SLIDE 38

Result: More Languages Help

[Plot: Distance from Blust [1993] Reconstructions. x-axis: number of modern languages used; y-axis: mean edit distance]
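The evaluation metric named on this slide, mean edit distance, can be sketched with the standard Levenshtein recurrence. The sketch treats each character as one segment, and the word pairs in the usage lines are illustrative only, not data from the study.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def mean_edit_distance(pairs):
    """Average distance over (reconstruction, reference) pairs."""
    return sum(edit_distance(p, g) for p, g in pairs) / len(pairs)

# ILLUSTRATIVE pairs only: (system reconstruction, reference form).
pairs = [("fokus", "fokus"), ("kentru", "kentrum")]
print(mean_edit_distance(pairs))  # 0.5
```

Transcriptions with multi-character segments, such as /tʃ/, would need to be tokenized into segment lists first; the function works unchanged on lists.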

SLIDE 39

Visualization: Learned Universals

*The model did not have features encoding natural classes

SLIDE 40

Regularity and Functional Load

In a language, some pairs of sounds are more contrastive than others (higher functional load)

Example: English p/d versus t/th

High Load: p/d: pot/dot, pin/din, dress/press, pew/dew, ...
Low Load: t/th: thin/tin
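One crude way to make "more contrastive" concrete is to count minimal pairs, that is, word pairs that differ only in the two sounds being compared. The sketch below does this over the slide's own example words. The segment transcriptions are rough assumptions, and minimal-pair counting is only a simple proxy, not the functional load measure of [King, 67] used later in the talk.

```python
from itertools import combinations

# Toy lexicon: each word as a tuple of segments (ROUGH, ASSUMED transcriptions).
LEXICON = {
    "pot": ("p", "o", "t"), "dot": ("d", "o", "t"),
    "pin": ("p", "i", "n"), "din": ("d", "i", "n"),
    "press": ("p", "r", "e", "s"), "dress": ("d", "r", "e", "s"),
    "pew": ("p", "j", "u"), "dew": ("d", "j", "u"),
    "thin": ("th", "i", "n"), "tin": ("t", "i", "n"),
}

def minimal_pairs(lexicon, a, b):
    """Word pairs that differ in exactly one position, by a versus b."""
    pairs = []
    for (w1, s1), (w2, s2) in combinations(lexicon.items(), 2):
        if len(s1) != len(s2):
            continue
        diffs = [(x, y) for x, y in zip(s1, s2) if x != y]
        if len(diffs) == 1 and set(diffs[0]) == {a, b}:
            pairs.append((w1, w2))
    return pairs

print(len(minimal_pairs(LEXICON, "p", "d")))   # 4
print(len(minimal_pairs(LEXICON, "t", "th")))  # 1
```

Merging p and d would collapse four word pairs in this toy lexicon, while merging t and th would collapse one, which is the intuition behind calling the first contrast higher load.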

SLIDE 41

Functional Load: Timeline

1955: Functional Load Hypothesis (FLH): Sound changes are less frequent when they merge phonemes with high functional load [Martinet, 55]

1967: Previous research within linguistics: “FLH does not seem to be supported by the data” [King, 67] (Based on 4 languages as noted by [Hockett, 67; Surendran et al., 06])

Our approach: we reexamined the question with two orders of magnitude more data [Bouchard‐Cote, Hall, Griffiths, Klein, 13]

SLIDE 42

Regularity and Functional Load

Functional load as computed by [King, 67]

Data: only 4 languages from the Austronesian data

Merger posterior probability

Each dot is a sound change identified by the system

SLIDE 43

Regularity and Functional Load

Data: all 637 languages from the Austronesian data

Functional load as computed by [King, 67]

Merger posterior probability

SLIDE 44

Extensions

SLIDE 45

Cognate Detection

/fweɣo/ /fogo/ /fwɔko/ ‘fire’
/berβo/ /vɛrbo/ /vɛrbo/
/tʃɛntro/ /sentro/ /sɛntro/

[Hall and Klein, 11]

SLIDE 46

Grammar Induction

[Bar chart: axis ticks from 10 to 70; languages: Dutch, Danish, Swedish, Spanish, Portuguese, Slovene, Chinese, English; group labels: WG, NG, RM, G, IE, GL]

Avg rel gain: 29%

[Berg‐Kirkpatrick and Klein, 07]

SLIDE 47

Language Diversity

Why are the languages of the world so similar?

Universal grammar answer: Hardware constraints
Common source answer: Not much time has passed

[Rafferty, Griffiths, and Klein, 09]