SLIDE 1

A Bayesian test of the lineage-specificity of word-order correlations

Gerhard Jäger

Tübingen University

Workshop The origins and evolution of word order

April 16, 2018

Gerhard Jäger (Tübingen) Word-order Universals Evolang 2018 1 / 23

SLIDE 2

Introduction

SLIDE 3

Introduction

Word order correlations

• Greenberg, Keenan, Lehmann etc.: general tendency for languages to be either consistently head-initial or consistently head-final
• alternative account (Dryer, Hawkins): phrases are consistently left- or consistently right-branching
• can be formalized as a collection of implicative universals, such as
    “With overwhelmingly greater than chance frequency, languages with normal SOV order are postpositional.” (Greenberg’s Universal 4)
• both generativist and functional/historical explanations in the literature

SLIDE 4

Introduction

Phylogenetic non-independence

• languages are phylogenetically structured
• if two closely related languages display the same pattern, these are not two independent data points
⇒ we need to control for phylogenetic dependencies

(figure from Dunn et al., 2011)

SLIDE 5

Introduction

Phylogenetic non-independence

Maslova (2000):

“If the A-distribution for a given typology cannot be assumed to be stationary, a distributional universal cannot be discovered on the basis of purely synchronic statistical data.”

“In this case, the only way to discover a distributional universal is to estimate transition probabilities and as it were to ‘predict’ the stationary distribution on the basis of the equations in (1).”
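As a generic illustration of the idea in the second quote (not Maslova's own equations (1), which the slide does not reproduce), consider a typology with only two types A and B. The rate symbols $q_{AB}$ and $q_{BA}$ are assumptions introduced here. The stationary distribution follows from the balance condition alone:

```latex
\pi_A\, q_{AB} = \pi_B\, q_{BA}, \qquad \pi_A + \pi_B = 1
\quad\Longrightarrow\quad
\pi_A = \frac{q_{BA}}{q_{AB} + q_{BA}}, \qquad
\pi_B = \frac{q_{AB}}{q_{AB} + q_{BA}}.
```

With $q_{AB} = 1$ and $q_{BA} = 3$, for instance, $\pi_A = 0.75$, whatever the present-day frequencies of A and B happen to be.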

SLIDE 6

The phylogenetic comparative method

SLIDE 9

The phylogenetic comparative method

Modeling language change

• Markov process
• Phylogeny
• Branching process
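A minimal sketch of what "Markov process" can mean here, assuming the simplest case of a binary feature changing in continuous time. The rates `q01` and `q10` and the state coding (0 = OV, 1 = VO) are made up for illustration and are not taken from the talk. For two states the transition probabilities along a branch of length `t` have a closed form:

```python
import math

def transition_probs(q01, q10, t):
    """Return the 2x2 matrix P(t) for a two-state continuous-time Markov chain.

    q01 is the rate of change 0 -> 1, q10 the rate 1 -> 0 (both hypothetical).
    """
    total = q01 + q10
    pi0, pi1 = q10 / total, q01 / total   # stationary probabilities
    decay = math.exp(-total * t)          # weight of the starting state at time t
    return [
        [pi0 + pi1 * decay, pi1 - pi1 * decay],
        [pi0 - pi0 * decay, pi1 + pi0 * decay],
    ]

# Example with made-up rates: state 0 = OV, state 1 = VO.
for t in (0.0, 0.5, 5.0):
    row_ov, row_vo = transition_probs(q01=0.2, q10=0.6, t=t)
    print(t, [round(p, 3) for p in row_ov], [round(p, 3) for p in row_vo])
```

On short branches a language most likely keeps its state; on long branches both rows approach the stationary distribution, so the starting state no longer matters.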

SLIDE 11

The phylogenetic comparative method

Estimating rates of change

• if phylogeny and states of extant languages are known...
• ... transition rates, stationary probabilities and ancestral states can be estimated based on a Markov model
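The estimation step can be sketched for a binary feature with Felsenstein's pruning algorithm, a standard way to compute the likelihood of tip states on a known tree. This is an illustration under assumptions, not the procedure used in the talk: the tree, branch lengths, and tip states are invented, and a crude grid search stands in for proper maximum-likelihood or Bayesian estimation.

```python
import math

def transition_probs(q01, q10, t):
    """P(t) for a two-state continuous-time Markov chain (closed form)."""
    total = q01 + q10
    pi0, pi1 = q10 / total, q01 / total
    decay = math.exp(-total * t)
    return [[pi0 + pi1 * decay, pi1 * (1 - decay)],
            [pi0 * (1 - decay), pi1 + pi0 * decay]]

def partial_likelihoods(node, q01, q10):
    """Felsenstein pruning: P(data below node | state at node) for states 0, 1.

    A leaf is an int (its observed state); an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, int):
        return [1.0 if node == s else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, length in node:
        p = transition_probs(q01, q10, length)
        below = partial_likelihoods(child, q01, q10)
        for s in (0, 1):
            result[s] *= p[s][0] * below[0] + p[s][1] * below[1]
    return result

def log_likelihood(tree, q01, q10):
    """Log-likelihood of the leaf states, root drawn from the stationary distribution."""
    total = q01 + q10
    root_prior = [q10 / total, q01 / total]
    partial = partial_likelihoods(tree, q01, q10)
    return math.log(root_prior[0] * partial[0] + root_prior[1] * partial[1])

# Hypothetical four-language tree ((0, 0), (1, 0)) with made-up branch lengths.
toy_tree = [([(0, 0.3), (0, 0.3)], 0.5), ([(1, 0.2), (0, 0.2)], 0.6)]

# Crude maximum-likelihood estimate by grid search over both rates.
grid = [0.05 * k for k in range(1, 61)]
best = max((log_likelihood(toy_tree, a, b), a, b) for a in grid for b in grid)
print("best log-likelihood %.3f at q01=%.2f, q10=%.2f" % best)
```

The stationary probabilities then follow from the estimated rates, and the same partial likelihoods are the starting point for ancestral state reconstruction.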

SLIDE 12

The phylogenetic comparative method

Correlation between features

• Pagel and Meade (2006) construct two types of Markov processes:
    • independent: the two features evolve according to independent Markov processes
    • dependent: rates of change in one feature depend on the state of the other feature
• fit both models to the data
• apply statistical model comparison

[Figure: state diagrams of the independent model (states VO, OV and PN, NP) and the dependent model (states VO/PN, OV/NP, OV/PN, VO/NP)]
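The two model types can be written as rate matrices over the four joint states. The sketch below is an illustration under assumptions: the function names and all rate values are hypothetical, and simultaneous changes in both features are given rate 0, as in Pagel-style dependent models. The independent model appears as the special case in which a feature's rate of change ignores the state of the other feature, so it has 4 free rates instead of 8.

```python
from itertools import product

STATES = list(product(("VO", "OV"), ("PN", "NP")))  # four joint states

def dependent_q(rates):
    """Build the 4x4 rate matrix of the dependent model.

    `rates` maps (from_state, to_state) to a rate for the 8 transitions that
    change exactly one feature; simultaneous changes get rate 0.
    """
    q = [[0.0] * 4 for _ in STATES]
    for i, src in enumerate(STATES):
        for j, dst in enumerate(STATES):
            changed = sum(a != b for a, b in zip(src, dst))
            if changed == 1:
                q[i][j] = rates[(src, dst)]
        q[i][i] = -sum(q[i])  # diagonal makes each row sum to zero
    return q

def independent_rates(verb_rates, adp_rates):
    """Expand 4 single-feature rates into the 8 rates of the joint process.

    In the independent model the rate of a change in one feature is the same
    whatever the state of the other feature.
    """
    rates = {}
    for src, dst in product(STATES, STATES):
        if src[0] != dst[0] and src[1] == dst[1]:
            rates[(src, dst)] = verb_rates[(src[0], dst[0])]
        elif src[1] != dst[1] and src[0] == dst[0]:
            rates[(src, dst)] = adp_rates[(src[1], dst[1])]
    return rates

# Made-up rates, for illustration only.
verb = {("VO", "OV"): 0.3, ("OV", "VO"): 0.5}
adp = {("PN", "NP"): 0.2, ("NP", "PN"): 0.4}
q_independent = dependent_q(independent_rates(verb, adp))
for state, row in zip(STATES, q_independent):
    print(state, [round(x, 2) for x in row])
```

Because the independent model is nested in the dependent one, comparing the two asks whether the data support letting the four extra rates differ.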

SLIDE 13

Dunn et al. (2011)

SLIDE 14

Dunn et al. (2011)

Dunn et al. (2011)

• all 28 pairs of 8 word-order features considered
• 4 language families: Austronesian, Bantu, Indo-European, and Uto-Aztecan
• main finding: wildly different results between families
• conclusion: word-order correlations are lineage-specific
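The count of 28 is the number of unordered pairs that can be drawn from 8 features. A quick check, using the eight feature labels that appear in the results of this deck (an assumption: Dunn et al. (2011) may label their features differently):

```python
from itertools import combinations

# Feature labels as used in the results slides of this deck (assumed here).
FEATURES = ["Adp-N", "N-Adj", "N-Dem", "N-Gen", "N-Num", "N-Rel", "V-Obj", "V-Subj"]

pairs = list(combinations(FEATURES, 2))
print(len(pairs))  # 8 * 7 / 2 = 28
```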

SLIDE 15

Universal and lineage-specific models

SLIDE 16

Universal and lineage-specific models

This study

Experiments

1. replication of Dunn et al. (2011) with different data
2. model comparison: universal vs. lineage-specific correlations
3. word-order correlations across a comprehensive collection of language families

[Figure: lineage-specific setup with one model per family (M1 to M4, each with its own trees and data) next to the universal setup with a single model M fitted to the trees and data of all four families]

SLIDE 17

Universal and lineage-specific models

Data

• word-order data: WALS
• phylogeny:
    • ASJP word lists (Wichmann et al., 2016)
    • feature extraction (automatic cognate detection, inter alia) → character matrix
    • Maximum-Likelihood phylogenetic inference with Glottolog (Hammarström et al., 2016) tree as backbone
    • advantages over hand-coded Swadesh lists:
        • applicable across language families
        • covers more languages than those for which expert cognate judgments are available
• 1004 languages in total
• Austronesian: 123; Bantu: 41; Indo-European: 53; Uto-Aztecan: 13
• 34 families with at least five languages, comprising 768 languages in total

SLIDE 18

Universal and lineage-specific models

Phylogenetic tree sample

SLIDE 19

Universal and lineage-specific models

Replication of Dunn et al.

SLIDE 20

Universal and lineage-specific models

Comparing universal and lineage-specific models

• so far: fitting a separate model for each language family
    • advantage: good fit of the lineage-specific data
    • disadvantage: many parameters (8 per family for a dependent model)
• statistical model comparison: quantifying to what degree the data support the excess parameters of lineage-specific models
• models to be compared:
    • universal: one set of rates (8 parameters), applying to all 4 families
    • lineage-specific: a separate set of rates for each family
• comparison via Bayes Factor (implementation with RevBayes; Höhna et al. 2016)
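A small sketch of the bookkeeping behind this comparison. The parameter counts follow from the slide (8 rates per dependent model, 4 families). The log marginal likelihoods are hypothetical placeholders, not results from the study, and the sketch does not show how RevBayes estimates them. A Bayes factor is the ratio of the two models' marginal likelihoods. Since a marginal likelihood averages over the prior on all parameters, the 24 extra rates of the lineage-specific model only pay off if the data really call for them.

```python
N_FAMILIES = 4
RATES_PER_DEPENDENT_MODEL = 8

params_universal = RATES_PER_DEPENDENT_MODEL
params_lineage_specific = RATES_PER_DEPENDENT_MODEL * N_FAMILIES

def log_bayes_factor(log_ml_a, log_ml_b):
    """Natural-log Bayes factor of model A over model B from log marginal likelihoods."""
    return log_ml_a - log_ml_b

# Hypothetical log marginal likelihoods (not values from the study).
log_ml_universal = -412.6
log_ml_lineage_specific = -431.0

log_bf = log_bayes_factor(log_ml_universal, log_ml_lineage_specific)
favoured = "universal" if log_bf > 0 else "lineage-specific"
print(params_universal, params_lineage_specific)
print(round(log_bf, 1), favoured)
```

On a log scale, positive values favour the first model and negative values the second.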

SLIDE 21

Universal and lineage-specific models

Results

universal vs. lineage specific
(scale: universal at the positive end, lineage-specific at the negative end)

feature pair        Bayes Factor
Adp-N  V-Obj            58.1
Adp-N  N-Gen            47.2
N-Adj  N-Rel            41.6
N-Gen  V-Obj            36.9
Adp-N  V-Subj           23.6
N-Gen  N-Rel            21.9
N-Dem  N-Num            20.6
Adp-N  N-Rel            18.7
V-Obj  N-Rel            18.1
N-Dem  N-Rel            17.4
N-Rel  V-Subj           14.5
N-Gen  V-Subj           13.7
V-Obj  V-Subj           12.1
N-Adj  N-Dem             5.4
Adp-N  N-Dem            -5.3
N-Dem  N-Gen            -5.7
N-Adj  N-Num            -5.8
N-Adj  V-Subj          -12.3
N-Dem  V-Obj           -12.8
N-Num  N-Rel           -15.7
N-Adj  Adp-N           -17.0
N-Dem  V-Subj          -18.6
N-Adj  V-Obj           -22.0
N-Adj  N-Gen           -23.2
Adp-N  N-Num           -28.2
N-Gen  N-Num           -34.1
N-Num  V-Subj          -37.6
N-Num  V-Obj           -45.4

correlated vs. independent
(scale: correlated at the positive end, uncorrelated at the negative end)

feature pair        Bayes Factor
Adp-N  N-Gen           115.7
Adp-N  V-Obj           104.8
N-Dem  N-Num            99.6
N-Adj  N-Num            93.3
N-Gen  V-Obj            68.0
N-Adj  N-Dem            64.9
N-Adj  N-Rel            48.5
N-Gen  V-Subj           41.1
V-Obj  V-Subj           38.2
V-Obj  N-Rel            35.3
N-Dem  N-Rel            33.5
Adp-N  V-Subj           31.3
N-Gen  N-Rel            23.8
N-Dem  N-Gen            23.5
Adp-N  N-Rel            22.6
N-Gen  N-Num            16.5
N-Dem  V-Obj            15.4
Adp-N  N-Dem            15.0
N-Num  V-Subj           14.4
Adp-N  N-Num            13.5
N-Adj  N-Gen            12.2
N-Num  V-Obj             7.6
N-Rel  V-Subj            6.8
N-Num  N-Rel             5.0
N-Adj  Adp-N             0.3
N-Adj  V-Subj           -0.3
N-Adj  V-Obj            -1.0
N-Dem  V-Subj           -1.2
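The first table can be tallied by sign. The sketch below copies its values, reading the entries from Adp-N N-Dem downward as negative. That reading is an assumption, based on the descending order of the list and on the "universal ... lineage-specific" labels of the scale. It then counts how many feature pairs fall on each side.

```python
# Bayes factors from the "universal vs. lineage specific" table.
# Assumption: entries listed after N-Adj/N-Dem carry a minus sign.
BAYES_FACTORS = {
    ("Adp-N", "V-Obj"): 58.1,
    ("Adp-N", "N-Gen"): 47.2,
    ("N-Adj", "N-Rel"): 41.6,
    ("N-Gen", "V-Obj"): 36.9,
    ("Adp-N", "V-Subj"): 23.6,
    ("N-Gen", "N-Rel"): 21.9,
    ("N-Dem", "N-Num"): 20.6,
    ("Adp-N", "N-Rel"): 18.7,
    ("V-Obj", "N-Rel"): 18.1,
    ("N-Dem", "N-Rel"): 17.4,
    ("N-Rel", "V-Subj"): 14.5,
    ("N-Gen", "V-Subj"): 13.7,
    ("V-Obj", "V-Subj"): 12.1,
    ("N-Adj", "N-Dem"): 5.4,
    ("Adp-N", "N-Dem"): -5.3,
    ("N-Dem", "N-Gen"): -5.7,
    ("N-Adj", "N-Num"): -5.8,
    ("N-Adj", "V-Subj"): -12.3,
    ("N-Dem", "V-Obj"): -12.8,
    ("N-Num", "N-Rel"): -15.7,
    ("N-Adj", "Adp-N"): -17.0,
    ("N-Dem", "V-Subj"): -18.6,
    ("N-Adj", "V-Obj"): -22.0,
    ("N-Adj", "N-Gen"): -23.2,
    ("Adp-N", "N-Num"): -28.2,
    ("N-Gen", "N-Num"): -34.1,
    ("N-Num", "V-Subj"): -37.6,
    ("N-Num", "V-Obj"): -45.4,
}

universal = [pair for pair, bf in BAYES_FACTORS.items() if bf > 0]
lineage_specific = [pair for pair, bf in BAYES_FACTORS.items() if bf < 0]
print(len(universal), "pairs favour the universal model")
print(len(lineage_specific), "pairs favour the lineage-specific model")
```

Under this reading the 28 pairs split evenly, which fits the conclusion that the distinction is a matter of degree.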

SLIDE 22

Universal and lineage-specific models

Results

[Figure: graph over the features V-Subj, N-Adj, N-Dem, N-Num, N-Gen, Adp-N, V-Obj, N-Rel]

• one tightly connected cluster of mutually universally correlated word-order features
• comprises Dryer’s (1992) verb patterners + V-Subj
• additionally some correlations regarding NP syntax

SLIDE 23

Universal and lineage-specific models

Results

[Figure: estimated transition rates between the four joint states of a feature pair]

universal (Adp-N/V-Obj): states PN VO, PN OV, NP VO, NP OV

lineage-specific (N-Gen/N-Num): one panel each for Austronesian, Bantu, Indo-European, and Uto-Aztecan; states NG NNum, NG NumN, GN NNum, GN NumN

SLIDE 24

Universal and lineage-specific models

What the universal dependencies look like

[Figure: estimated transition-rate diagrams, one panel per feature pair, each over the four joint states of the pair. Panels: Adp-N/N-Gen, Adp-N/N-Rel, Adp-N/V-Obj, Adp-N/V-Subj, N-Adj/N-Rel, N-Dem/N-Num, N-Dem/N-Rel, N-Gen/N-Rel, N-Gen/V-Obj, N-Gen/V-Subj, N-Rel/V-Subj, V-Obj/N-Rel, V-Obj/V-Subj]

SLIDE 25

Conclusion

SLIDE 26

Conclusion

Conclusion

empirical

• universal vs. lineage-specific is not an absolute distinction, but a matter of degree
• some “classical” word-order correlations fall very close to the universal end

methodological

• important to fit statistical models across language families

SLIDE 27

Conclusion

Matthew S. Dryer. The Greenbergian word order correlations. Language, 68(1):81–138, 1992.

Michael Dunn, Simon J. Greenhill, Stephen Levinson, and Russell D. Gray. Evolved structure of language shows lineage-specific trends in word-order universals. Nature, 473(7345):79–82, 2011.

Harald Hammarström, Robert Forkel, Martin Haspelmath, and Sebastian Bank. Glottolog 2.7. Max Planck Institute for the Science of Human History, Jena, 2016. Available online at http://glottolog.org, accessed on 2017-01-29.

Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie. The World Atlas of Language Structures online. Max Planck Digital Library, Munich, 2008. http://wals.info/.

Sebastian Höhna, Michael J. Landis, Tracy A. Heath, Bastien Boussau, Nicolas Lartillot, Brian R. Moore, John P. Huelsenbeck, and Fredrik Ronquist. RevBayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language. Systematic Biology, 65(4):726–736, 2016.

Elena Maslova. A dynamic approach to the verification of distributional universals. Linguistic Typology, 4(3):307–333, 2000.

Mark Pagel and Andrew Meade. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. The American Naturalist, 167(6):808–825, 2006.

Søren Wichmann, Eric W. Holman, and Cecil H. Brown. The ASJP database (version 17). http://asjp.clld.org/, 2016.