12/1/2014 1
Natural Language Processing
Diachronics
Dan Klein – UC Berkeley
Includes joint work with Alex Bouchard‐Cote, Tom Griffiths, and David Hall
The Task
Lexical Reconstruction
Latin: focus
French: feu
Spanish: fuego
Italian: fuoco
Portuguese: fogo
phylogeny is known
biology, e.g. work by Warnow, Felsenstein, Steele…
Warnow et al., Gray and Atkinson… http://andromeda.rutgers.edu/~jlynch/language.html
“camera obscura”
before the initial /t/ dropped
“time” = teem
“time” = taim
FR IT PT ES
[cf. Felsenstein 81]
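Felsenstein's 1981 pruning recursion computes the likelihood of observed leaves on a known tree by passing partial likelihoods from the leaves up to the root. The following is a minimal sketch for a single character, not the string-valued model used in this work. The two-symbol alphabet, the transition probabilities, the tree shape, and the leaf observations are all hypothetical values chosen for illustration, and the same transition table is assumed on every branch.

```python
def pruning(node, alphabet, trans, observed):
    """Return {state: P(observed leaves below node | node is in state)}."""
    if isinstance(node, str):
        # Leaf: probability 1 for the observed symbol, 0 otherwise.
        return {s: 1.0 if s == observed[node] else 0.0 for s in alphabet}
    result = {s: 1.0 for s in alphabet}
    for child in node:
        child_like = pruning(child, alphabet, trans, observed)
        for s in alphabet:
            # Sum over the child's possible states, weighted by the
            # probability of changing from s to t along the branch.
            result[s] *= sum(trans[s][t] * child_like[t] for t in alphabet)
    return result


# Hypothetical setup: one consonant slot that is either /k/ or /g/.
alphabet = ["k", "g"]
trans = {"k": {"k": 0.9, "g": 0.1}, "g": {"k": 0.1, "g": 0.9}}
tree = (("ES", "PT"), "IT")  # assumed topology, for illustration only
observed = {"ES": "g", "PT": "g", "IT": "k"}
prior = {"k": 0.5, "g": 0.5}  # assumed uniform prior at the root

root = pruning(tree, alphabet, trans, observed)
likelihood = sum(prior[s] * root[s] for s in alphabet)
posterior_k = prior["k"] * root["k"] / likelihood
print(likelihood, posterior_k)
```

The root table also gives a posterior over the ancestral symbol, which is the single-character analogue of reconstructing an ancestral form.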
IT ES PT IB LA
[Bouchard‐Cote, Griffiths, Klein, 07]
/fokus/ /fweɣo/ /fogo/ /fogo/ /fwɔko/
(expected) sound change counts
theta
counts given parameters
valued
/fokus/ /fweɣo/ /fogo/ /fogo/ /fwɔko/
Standard approach, e.g. [Holmes 01]: Gibbs sampling each sequence
[Holmes 01, Bouchard‐Cote, Griffiths, Klein 07]
[Bouchard‐Cote, Griffiths, Klein, 08]
[Bouchard‐Cote, Hall, Griffiths, Klein, 13]
Plot axes: number of modern languages used; mean edit distance
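Mean edit distance, the measure named on this plot, can be sketched as the average Levenshtein distance between reconstructed and reference forms. The snippet below is an illustrative sketch only: it treats each character as one phoneme and uses unit costs for insertion, deletion, and substitution, both of which are assumptions, and the example pairs are made up rather than taken from the evaluation data.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def mean_edit_distance(pairs):
    """Average distance over (reconstruction, reference) pairs."""
    return sum(edit_distance(r, g) for r, g in pairs) / len(pairs)


# Hypothetical (reconstruction, reference) pairs, for illustration only.
pairs = [("fokus", "fokus"), ("fogo", "fokus")]
print(mean_edit_distance(pairs))
```

Lower values mean the reconstructions are closer to the reference forms.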
*The model did not have features encoding natural classes
1955: Functional Load Hypothesis (FLH): Sound changes are less frequent when they merge phonemes with high functional load [Martinet, 55]
1967: Previous research within linguistics: “FLH does not seem to be supported by the data” [King, 67] (based on 4 languages, as noted by [Hockett, 67; Surendran et al., 06])
Our approach: we reexamined the question with two orders … [Bouchard‐Cote, Hall, Griffiths, Klein, 13]
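To make "functional load" concrete, here is a small sketch of one way to quantify how much work a phoneme contrast does: the relative drop in word-level entropy when two phonemes are merged. This is an information-theoretic stand-in chosen for illustration and is not the formula of [King, 67]. The toy corpus, the one-character-per-phoneme encoding, and the function names are all assumptions.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of the distribution given by raw counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def functional_load(word_counts, x, y):
    """Relative entropy lost when phoneme y is merged into phoneme x."""
    base = entropy(word_counts)
    if base == 0:
        return 0.0  # a one-word corpus has no contrasts to lose
    merged = Counter()
    for word, c in word_counts.items():
        # Words that differ only in x vs. y collapse into one entry.
        merged[word.replace(y, x)] += c
    return (base - entropy(merged)) / base


# Hypothetical toy corpus of word frequencies, for illustration only.
corpus = Counter({"pat": 4, "bat": 4, "pit": 2, "tip": 2})
fl_pb = functional_load(corpus, "p", "b")  # merges "pat" and "bat"
fl_pk = functional_load(corpus, "p", "k")  # /k/ never occurs: no loss
print(fl_pb, fl_pk)
```

Under the hypothesis, a merger such as /p/ and /b/ here, which collapses frequent minimal pairs, would be expected to occur less often than a merger that loses no contrasts.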
Functional load as computed by [King, 67]
Merger posterior probability
Each dot is a sound change identified by the system
/fweɣo/ /fogo/ /fwɔko/
/berβo/ /vɛrbo/ /vɛrbo/
/tʃɛntro/ /sentro/ /sɛntro/
[Hall and Klein, 11]
Figure labels: languages Dutch, Danish, Swedish, Spanish, Portuguese, Slovene, Chinese, English; groupings WG, NG, RM, G, IE, GL; axis range 10 to 70
[Berg‐Kirkpatrick and Klein, 07]
[Rafferty, Griffiths, and Klein, 09]