On the proper use of phylogenetic information in typology

Gerhard Jäger

Tübingen University

Workshop Phylogenetic Linguistics and Linguistic Theory

York, November 15, 2018

Gerhard Jäger (Tübingen) Phylogenetic typology LanGeLin Workshop 1 / 39

Introduction

Word order correlations

• Greenberg, Keenan, Lehmann, etc.: general tendency for languages to be either consistently head-initial or consistently head-final
• alternative account (Dryer, Hawkins): phrases are consistently left-branching or consistently right-branching
• can be formalized as a collection of implicative universals, such as
  “With overwhelmingly greater than chance frequency, languages with normal SOV order are postpositional.” (Greenberg’s Universal 4)
• both generativist and functional/historical explanations exist in the literature

Phylogenetic non-independence

• languages are phylogenetically structured
• if two closely related languages display the same pattern, these are not two independent data points
⇒ we need to control for phylogenetic dependencies

(from Dunn et al., 2011)

Phylogenetic non-independence

Maslova (2000):
“If the A-distribution for a given typology cannot be assumed to be stationary, a distributional universal cannot be discovered on the basis of purely synchronic statistical data.”
“In this case, the only way to discover a distributional universal is to estimate transition probabilities and as it were to ‘predict’ the stationary distribution on the basis of the equations in (1).”
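Maslova's argument can be illustrated with the simplest possible case, a two-state Markov process. The following Python sketch (the states A and B and the rates alpha and beta are hypothetical, not taken from the talk) shows that the frequencies observed at a given moment only approach the stationary distribution once the process has run long enough; until then they still reflect the starting distribution, which is why synchronic counts alone can mislead.

```python
import math
from typing import Tuple

def stationary(alpha: float, beta: float) -> Tuple[float, float]:
    """Stationary distribution (P(A), P(B)) of a two-state Markov process.

    alpha: rate of change A -> B; beta: rate of change B -> A (illustrative values).
    At equilibrium the flows balance: P(A) * alpha == P(B) * beta.
    """
    total = alpha + beta
    return beta / total, alpha / total

def distribution_at(p_a0: float, alpha: float, beta: float, t: float) -> Tuple[float, float]:
    """Distribution over (A, B) after time t, starting from P(A) = p_a0."""
    pi_a, _ = stationary(alpha, beta)
    # The distance to equilibrium shrinks by the factor exp(-(alpha + beta) * t).
    p_a = pi_a + (p_a0 - pi_a) * math.exp(-(alpha + beta) * t)
    return p_a, 1.0 - p_a

if __name__ == "__main__":
    alpha, beta = 0.1, 0.3  # hypothetical rates per 1,000 years
    print("stationary:", stationary(alpha, beta))
    for t in (0.0, 1.0, 5.0, 20.0):
        # A population starting with few A-languages stays far from equilibrium for a while.
        print(t, distribution_at(0.1, alpha, beta, t))
```

With these made-up rates, a snapshot taken after one time unit would suggest that A is rare, although A is the majority type at equilibrium.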

The phylogenetic comparative method

Modeling language change

Figure panels: Markov process; phylogeny; branching process
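A minimal sketch of the generative model these panels combine: a two-state character evolves as a continuous-time Markov process along each branch, and at every split the daughter languages inherit the state of their mother and then change independently. The tree, the rates and the 0/1 coding below are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical tree: each internal node maps to (child, branch length) pairs.
# Branch lengths are in arbitrary time units (e.g. 1,000 years).
Tree = Dict[str, List[Tuple[str, float]]]
EXAMPLE_TREE: Tree = {
    "root": [("A", 1.0), ("n1", 0.4)],
    "n1": [("B", 0.6), ("C", 0.6)],
}

def evolve(state: int, length: float, rates: Tuple[float, float], rng: random.Random) -> int:
    """Run a two-state Markov process (0 <-> 1) along one branch.

    rates[0] is the rate of 0 -> 1, rates[1] the rate of 1 -> 0; both must be > 0.
    """
    elapsed = 0.0
    while True:
        # The waiting time until the next change is exponentially distributed.
        elapsed += rng.expovariate(rates[state])
        if elapsed > length:
            return state
        state = 1 - state

def simulate(tree: Tree, root: str, root_state: int,
             rates: Tuple[float, float], seed: int = 0) -> Dict[str, int]:
    """Simulate the character down the tree and return the states at the leaves."""
    rng = random.Random(seed)
    leaves: Dict[str, int] = {}
    stack = [(root, root_state)]
    while stack:
        node, state = stack.pop()
        children = tree.get(node, [])
        if not children:
            leaves[node] = state
        for child, length in children:
            # Branching: each daughter starts from the state of its mother.
            stack.append((child, evolve(state, length, rates, rng)))
    return leaves

print(simulate(EXAMPLE_TREE, "root", 0, rates=(0.5, 0.5), seed=42))
```

Because sister languages share the history above their split, their simulated states are correlated, which is exactly the non-independence the comparative method has to account for.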

Estimating rates of change

If the phylogeny and the states of the extant languages are known, transition rates, stationary probabilities and ancestral states can be estimated based on a Markov model.
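For one fixed tree with branch lengths and observed leaf states, the likelihood of a two-state Markov model can be computed with Felsenstein's pruning algorithm. Maximizing it gives rate estimates, the stationary probabilities follow from the rates, and the same partial likelihoods yield ancestral-state probabilities at the root. This Python sketch assumes a single tree, two states, a root drawn from the stationary distribution and a crude grid search in place of the Bayesian machinery used in the talk; all function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Tree = Dict[str, List[Tuple[str, float]]]

def transition_probs(a: float, b: float, t: float) -> List[List[float]]:
    """P(t) for a two-state process with rates a (0 -> 1) and b (1 -> 0)."""
    s = a + b
    e = math.exp(-s * t)
    return [[b / s + a / s * e, a / s * (1 - e)],
            [b / s * (1 - e), a / s + b / s * e]]

def partial_likelihoods(tree: Tree, node: str, tips: Dict[str, int],
                        a: float, b: float) -> List[float]:
    """Pruning step: P(data below node | node is in state 0 or 1)."""
    if node not in tree:  # a leaf: its state is observed
        return [1.0 if tips[node] == k else 0.0 for k in (0, 1)]
    result = [1.0, 1.0]
    for child, length in tree[node]:
        p = transition_probs(a, b, length)
        below = partial_likelihoods(tree, child, tips, a, b)
        for s in (0, 1):
            result[s] *= sum(p[s][k] * below[k] for k in (0, 1))
    return result

def likelihood(tree: Tree, root: str, tips: Dict[str, int], a: float, b: float) -> float:
    """Sum over root states, weighted by the stationary distribution."""
    pi = (b / (a + b), a / (a + b))
    root_partial = partial_likelihoods(tree, root, tips, a, b)
    return sum(pi[s] * root_partial[s] for s in (0, 1))

def root_state_probs(tree: Tree, root: str, tips: Dict[str, int],
                     a: float, b: float) -> List[float]:
    """Marginal probability of each ancestral state at the root, given the tips."""
    pi = (b / (a + b), a / (a + b))
    root_partial = partial_likelihoods(tree, root, tips, a, b)
    joint = [pi[s] * root_partial[s] for s in (0, 1)]
    total = sum(joint)
    return [j / total for j in joint]

def grid_mle(tree: Tree, root: str, tips: Dict[str, int],
             grid: Sequence[float]) -> Tuple[float, float]:
    """Crude maximum-likelihood estimate of (a, b) by grid search over positive rates."""
    return max(((a, b) for a in grid for b in grid),
               key=lambda ab: likelihood(tree, root, tips, *ab))

# Hypothetical tree and leaf states.
TREE: Tree = {"r": [("x", 0.5), ("n", 0.3)], "n": [("y", 0.2), ("z", 0.7)]}
TIPS = {"x": 0, "y": 1, "z": 1}
a_hat, b_hat = grid_mle(TREE, "r", TIPS, [0.25, 0.5, 1.0, 2.0])
print(a_hat, b_hat, root_state_probs(TREE, "r", TIPS, a_hat, b_hat))
```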

Correlation between features

Pagel and Meade (2006) construct two types of Markov processes:

• independent: the two features evolve according to independent Markov processes
• dependent: the rates of change in one feature depend on the state of the other feature
• fit both models to the data
• apply statistical model comparison

Figure: independent model (VO/OV and PN/NP as two separate two-state processes) vs. dependent model (one process over the four combined states VO/PN, OV/NP, OV/PN, VO/NP)
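To make the contrast concrete: with two binary features there are four combined states, and if only one feature changes at a time, there are eight possible transitions. In the independent model their rates are fixed by four numbers (a gain and a loss rate per feature); in the dependent model each of the eight transitions has its own rate. A Python sketch with illustrative names and a hypothetical 0/1 coding of the features:

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

State = Tuple[int, int]  # (feature 1, feature 2), e.g. (VO=0/OV=1, PN=0/NP=1)
STATES: List[State] = list(product((0, 1), repeat=2))

def single_changes() -> Iterator[Tuple[State, State, int]]:
    """All transitions in which exactly one feature changes (no simultaneous changes)."""
    for s in STATES:
        for i in (0, 1):
            t = list(s)
            t[i] = 1 - t[i]
            yield s, (t[0], t[1]), i

def to_matrix(rates: Dict[Tuple[State, State], float]) -> List[List[float]]:
    """Turn off-diagonal rates into a 4x4 generator whose rows sum to zero."""
    q = [[0.0] * len(STATES) for _ in STATES]
    for (s, t), r in rates.items():
        q[STATES.index(s)][STATES.index(t)] = r
    for i in range(len(STATES)):
        q[i][i] = -sum(q[i])
    return q

def independent_q(a1: float, b1: float, a2: float, b2: float) -> List[List[float]]:
    """4 parameters: gain/loss rate of each feature, regardless of the other feature."""
    rates = {}
    for s, t, i in single_changes():
        gain, loss = (a1, b1) if i == 0 else (a2, b2)
        rates[(s, t)] = gain if s[i] == 0 else loss
    return to_matrix(rates)

def dependent_q(rates: Dict[Tuple[State, State], float]) -> List[List[float]]:
    """8 parameters: every single-feature change gets its own rate."""
    expected = {(s, t) for s, t, _ in single_changes()}
    if set(rates) != expected:
        raise ValueError("need exactly one rate per single-feature change (8 in total)")
    return to_matrix(rates)
```

The independent model is the special case of the dependent one in which the rate of a change does not depend on the state of the other feature, so the two models are nested and can be compared directly.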

Dunn et al. (2011)

• all 28 pairs of 8 word-order features considered
• 4 language families: Austronesian, Bantu, Indo-European, and Uto-Aztecan
• main finding: wildly different results between families
• conclusion: word-order correlations are lineage-specific

Universal and lineage-specific models

This study

Experiments

1. replication of Dunn et al. (2011) with different data
2. model comparison: universal vs. lineage-specific correlations
3. word-order correlations across a comprehensive collection of language families

Figure: lineage-specific models M1 to M4, each fitted to the trees and data of one family, vs. a single universal model M fitted to the trees and data of all four families

Data

• word-order data: WALS
• phylogeny:
  ◦ ASJP word lists (Wichmann et al., 2016)
  ◦ feature extraction (automatic cognate detection, inter alia) → character matrix
  ◦ maximum-likelihood phylogenetic inference with the Glottolog (Hammarström et al., 2016) tree as backbone
  ◦ advantages over hand-coded Swadesh lists: applicable across language families; covers more languages than those for which expert cognate judgments are available
• 1004 languages in total
  ◦ Austronesian: 123; Bantu: 41; Indo-European: 53; Uto-Aztecan: 13
  ◦ 34 families with at least five languages, comprising 768 languages in total

Phylogenetic tree sample

Replication of Dunn et al.

Comparing universal and lineage-specific models

• so far: a separate model is fitted to each language family
  ◦ advantage: good fit to the lineage-specific data
  ◦ disadvantage: many parameters (8 per family for a dependent model)
• statistical model comparison quantifies to what degree the data support the extra parameters of the lineage-specific models
• models to be compared:
  ◦ universal: one set of rates (8 parameters) applying to all 4 families
  ◦ lineage-specific: a separate set of rates for each family
• comparison via Bayes factor (implementation with RevBayes; Höhna et al., 2016)
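For reference, the standard definition of the Bayes factor between the universal model M_U and the lineage-specific model M_L, given the data D of all four families. The marginal likelihood averages the likelihood over the prior, so the 24 extra rate parameters of M_L (4 × 8 = 32 instead of 8) are only rewarded if they improve the fit enough. The slides do not say on which scale (ln BF or 2 ln BF) the reported values are given; that is left open here.

```latex
\mathrm{BF}_{U,L} = \frac{p(D \mid M_U)}{p(D \mid M_L)}, \qquad
p(D \mid M) = \int p(D \mid \theta, M)\, p(\theta \mid M)\, \mathrm{d}\theta, \qquad
\ln \mathrm{BF}_{U,L} = \ln p(D \mid M_U) - \ln p(D \mid M_L)

|\theta_U| = 8, \qquad |\theta_L| = 4 \times 8 = 32
```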

Results

Universal vs. lineage-specific (positive values favor the universal model, negative values the lineage-specific model)

feature pair    Bayes factor
Adp-N / V-Obj           58.1
Adp-N / N-Gen           47.2
N-Adj / N-Rel           41.6
N-Gen / V-Obj           36.9
Adp-N / V-Subj          23.6
N-Gen / N-Rel           21.9
N-Dem / N-Num           20.6
Adp-N / N-Rel           18.7
V-Obj / N-Rel           18.1
N-Dem / N-Rel           17.4
N-Rel / V-Subj          14.5
N-Gen / V-Subj          13.7
V-Obj / V-Subj          12.1
N-Adj / N-Dem            5.4
Adp-N / N-Dem           -5.3
N-Dem / N-Gen           -5.7
N-Adj / N-Num           -5.8
N-Adj / V-Subj         -12.3
N-Dem / V-Obj          -12.8
N-Num / N-Rel          -15.7
N-Adj / Adp-N          -17.0
N-Dem / V-Subj         -18.6
N-Adj / V-Obj          -22.0
N-Adj / N-Gen          -23.2
Adp-N / N-Num          -28.2
N-Gen / N-Num          -34.1
N-Num / V-Subj         -37.6
N-Num / V-Obj          -45.4

Correlated vs. independent (positive values favor correlated evolution, negative values uncorrelated evolution)

feature pair    Bayes factor
Adp-N / N-Gen          115.7
Adp-N / V-Obj          104.8
N-Dem / N-Num           99.6
N-Adj / N-Num           93.3
N-Gen / V-Obj           68.0
N-Adj / N-Dem           64.9
N-Adj / N-Rel           48.5
N-Gen / V-Subj          41.1
V-Obj / V-Subj          38.2
V-Obj / N-Rel           35.3
N-Dem / N-Rel           33.5
Adp-N / V-Subj          31.3
N-Gen / N-Rel           23.8
N-Dem / N-Gen           23.5
Adp-N / N-Rel           22.6
N-Gen / N-Num           16.5
N-Dem / V-Obj           15.4
Adp-N / N-Dem           15.0
N-Num / V-Subj          14.4
Adp-N / N-Num           13.5
N-Adj / N-Gen           12.2
N-Num / V-Obj            7.6
N-Rel / V-Subj           6.8
N-Num / N-Rel            5.0
N-Adj / Adp-N            0.3
N-Adj / V-Subj          -0.3
N-Adj / V-Obj           -1.0
N-Dem / V-Subj          -1.2

Results

Figure: graph of the universally correlated feature pairs over V-Subj, N-Adj, N-Dem, N-Num, N-Gen, Adp-N, V-Obj and N-Rel

• one tightly connected cluster of mutually and universally correlated word-order features
• it comprises Dryer’s (1992) verb patterners plus V-Subj
• additionally, some correlations concern NP syntax

Results

Figure: estimated transition rates between the four combined states of a feature pair
• universal model, Adp-N / V-Obj (states PN/VO, PN/OV, NP/VO, NP/OV)
• lineage-specific models, N-Gen / N-Num (states NG/NNum, NG/NumN, GN/NNum, GN/NumN), one panel each for Austronesian, Bantu, Indo-European and Uto-Aztecan

What the universal dependencies look like

Figure: estimated transition rates between the four combined states for each universally correlated feature pair: Adp-N / N-Gen, Adp-N / N-Rel, Adp-N / V-Obj, Adp-N / V-Subj, N-Adj / N-Rel, N-Dem / N-Num, N-Dem / N-Rel, N-Gen / N-Rel, N-Gen / V-Obj, N-Gen / V-Subj, N-Rel / V-Subj, V-Obj / N-Rel, V-Obj / V-Subj

Hierarchical Models

Hierarchical Bayesian models

Figure: three ways of modeling four families (trees1/data1 to trees4/data4)
• lineage-specific: a separate CTMC (CTMC1 to CTMC4) for each family
• universal: a single CTMC shared by all families
• hierarchical: family-specific CTMC1 to CTMC4, linked by a common hyper-parameter

Hierarchical Models

• each family has its own parameters
• the parameters are all drawn from the same distribution D
• the shape of D is learned from the data
• prior assumption that there is little cross-family variation → can be overwritten by the data
• enables information flow across families

Figure: hierarchical model, with family-specific CTMC1 to CTMC4 linked by a shared hyper-parameter
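The talk's hierarchical model draws each family's CTMC rates from a shared distribution D. The effect behind "information flow across families" can be seen in a much simpler stand-in, a normal-normal hierarchy in which D's location mu and variance tau2 are held fixed (in a full hierarchical model they would be estimated as well). Each family's estimate is pulled towards the shared mean, the more so the less data the family has. This Python sketch illustrates that shrinkage only; it is not the model used in the talk, and all numbers are hypothetical.

```python
from typing import List

def partial_pooling(family_means: List[float], family_sizes: List[int],
                    noise_var: float, mu: float, tau2: float) -> List[float]:
    """Posterior means of family-level parameters in a normal-normal hierarchy.

    Assumed model (illustrative only):
        theta_f ~ Normal(mu, tau2)                          # the shared distribution D
        observed mean of family f ~ Normal(theta_f, noise_var / n_f)
    """
    estimates = []
    for ybar, n in zip(family_means, family_sizes):
        data_precision = n / noise_var
        prior_precision = 1.0 / tau2
        # Precision-weighted average: data-poor families borrow more from mu.
        estimates.append((data_precision * ybar + prior_precision * mu)
                         / (data_precision + prior_precision))
    return estimates

# Hypothetical numbers: a large family (n = 100) and a small one (n = 2).
print(partial_pooling([1.0, 3.0], [100, 2], noise_var=1.0, mu=2.0, tau2=1.0))
```

A large tau2 corresponds to the lineage-specific extreme (no pooling), a tau2 near zero to the universal extreme (complete pooling); the hierarchical model lets the data settle in between.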

Trans-dimensional parameter estimation

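Which version should we choose for CTMC_i, the dependent or the independent one? The choice can be left to the data via trans-dimensional parameter estimation, a.k.a. reversible-jump Markov chain Monte Carlo.

Figure: CTMC1 in its independent version (VO/OV and PN/NP as separate two-state processes) and its dependent version (one process over VO/PN, OV/NP, OV/PN, VO/NP)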
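In reversible-jump MCMC the choice between the two versions is itself a parameter that the chain samples, so the posterior probability of the dependent model can be read off as the fraction of samples spent in it, and dividing posterior odds by prior odds gives a Bayes factor. A Python sketch, assuming the sampler's output has been reduced to a sequence of model labels; the labels, the trace and the equal prior probabilities are assumptions.

```python
from typing import Sequence

def posterior_model_probability(trace: Sequence[str], model: str = "dependent") -> float:
    """Share of MCMC samples in which the chain is in `model`."""
    return sum(1 for m in trace if m == model) / len(trace)

def bayes_factor_from_trace(trace: Sequence[str], prior_dependent: float = 0.5) -> float:
    """Bayes factor dependent vs. independent = posterior odds / prior odds.

    Undefined if the chain never (or always) visits the dependent model.
    """
    p = posterior_model_probability(trace)
    posterior_odds = p / (1.0 - p)
    prior_odds = prior_dependent / (1.0 - prior_dependent)
    return posterior_odds / prior_odds

# Hypothetical thinned trace of the model indicator for one family.
trace = ["dependent"] * 30 + ["independent"] * 10
print(posterior_model_probability(trace), bayes_factor_from_trace(trace))
```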

Model comparison

Overall, the hierarchical model outperforms both the lineage-specific and the universal model, with exceptions in extreme cases.

Posterior probability of dependent model

Legend: dependent / independent

Intermediate summary

• strong signal for universal word-order correlations, e.g.
  Adp-N / V-Obj, Adp-N / N-Gen, N-Gen / V-Obj, N-Gen / V-Subj, N-Dem / N-Num, N-Adj / N-Rel, V-Obj / V-Subj, V-Obj / N-Rel
• the signal only becomes apparent if we look at several families simultaneously
• Bayesian hierarchical models:
  ◦ allow the model fits for individual families to inform each other
  ◦ let the data decide to what degree patterns are universal and to what degree lineage-specific

Further applications (work in progress)

Case marking patterns

Maslova and Nikitina (2007): implementation of Maslova’s (2000) program
• rate estimation of the CTMC using two heuristics:
  ◦ how many languages of type A occur in a predominantly B family
  ◦ how many pairs of closely related languages differ in their type
• no phylogenetic information at intermediate time depths
• no branch-length information
• universality is assumed a priori
• conclusion: nominative is at least three times as likely as ergative in the equilibrium distribution of the CTMC

Case marking patterns

• data: from Maslova and Nikitina (2007), intersected with (character-transformed) ASJP data
• 260 languages from 23 families

Legend: ergative / neutral / split / nominative

Case marking patterns

main conclusions
• with 85% posterior probability, nominative is more likely than ergative in equilibrium
• with 82% posterior probability, ergative is more likely than nominative
• very high degree of uncertainty
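Statements of the form "with 85% posterior probability, nominative is more likely in equilibrium than ergative" summarize posterior samples: they give the share of samples in which the stationary probability of one type exceeds that of the other. A Python sketch, assuming the stationary distribution has already been computed for every posterior sample; the sample values below are made up.

```python
from typing import Dict, List

def prob_greater(samples: List[Dict[str, float]], a: str, b: str) -> float:
    """Share of posterior samples in which the equilibrium probability of a exceeds that of b."""
    return sum(1 for s in samples if s[a] > s[b]) / len(samples)

# Hypothetical posterior samples of the stationary distribution over alignment types.
samples = [
    {"nominative": 0.40, "ergative": 0.20, "neutral": 0.30, "split": 0.10},
    {"nominative": 0.25, "ergative": 0.35, "neutral": 0.30, "split": 0.10},
    {"nominative": 0.50, "ergative": 0.10, "neutral": 0.30, "split": 0.10},
    {"nominative": 0.45, "ergative": 0.15, "neutral": 0.25, "split": 0.15},
]
print(prob_greater(samples, "nominative", "ergative"))
```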

Major word orders

Statistics of major word order distribution

• data: WALS intersected with ASJP
• 1,055 languages, 201 lineages, 71 families with at least 3 languages

pattern   raw numbers    weighted by lineages
SOV       497 (47.1%)    135.1 (67.2%)
SVO       447 (42.4%)    46.9 (23.3%)
VSO       78 (7.4%)      10.5 (5.2%)
VOS       20 (1.9%)      4.0 (2.0%)
OVS       10 (0.9%)      3.7 (1.8%)
OSV       3 (0.3%)       0.8 (0.4%)

Figure: bar charts of frequency by pattern, counted by language and by family
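The slide does not spell out the weighting. One scheme consistent with the numbers (the weighted counts add up to 135.1 + 46.9 + 10.5 + 4.0 + 3.7 + 0.8 = 201.0, the number of lineages) gives each lineage a total weight of 1, split equally among its languages. A Python sketch under that assumption, with hypothetical data:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def weighted_by_lineage(languages: List[Tuple[str, str]]) -> Dict[str, float]:
    """Frequency of each pattern when every lineage has a total weight of 1.

    languages: (lineage, pattern) pairs, one per language.
    """
    by_lineage: Dict[str, List[str]] = defaultdict(list)
    for lineage, pattern in languages:
        by_lineage[lineage].append(pattern)
    totals: Dict[str, float] = defaultdict(float)
    for patterns in by_lineage.values():
        # Split the lineage's weight of 1 equally among its languages.
        for pattern, count in Counter(patterns).items():
            totals[pattern] += count / len(patterns)
    return dict(totals)

# Hypothetical data: a large SOV-dominated lineage and two singleton lineages.
sample = [("L1", "SOV")] * 8 + [("L1", "SVO")] * 2 + [("L2", "SVO"), ("L3", "VSO")]
print(weighted_by_lineage(sample))
```

Under this scheme a single large family cannot dominate the totals, which is why SOV's share rises and SVO's falls relative to the raw counts.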

Estimating transition rates

• totally unrestricted model: all 30 transition rates are estimated independently
• implementation using RevBayes (Höhna et al., 2016)

Figure: expected strength of flow between SOV, VOS, VSO, SVO, OVS and OSV
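With six word-order types, a totally unrestricted model has one rate for each ordered pair of distinct types, 6 × 5 = 30. One natural reading of "expected strength of flow" (an assumption; the slide does not define it) is the equilibrium flow pi_i * q_ij, the expected number of changes from type i to type j per unit of time once the process is stationary. The Python sketch below computes pi by power iteration on the uniformized chain; the rate values are placeholders, not estimates from the talk.

```python
from typing import Dict, List, Tuple

ORDERS = ["SOV", "SVO", "VSO", "VOS", "OVS", "OSV"]

def rate_matrix(rates: Dict[Tuple[str, str], float]) -> List[List[float]]:
    """Unrestricted generator: one free rate per ordered pair of distinct states."""
    n = len(ORDERS)
    q = [[0.0] * n for _ in range(n)]
    for (src, dst), r in rates.items():
        q[ORDERS.index(src)][ORDERS.index(dst)] = r
    for i in range(n):
        q[i][i] = -sum(q[i])
    return q

def stationary(q: List[List[float]], iterations: int = 5000) -> List[float]:
    """Power iteration on the uniformized chain P = I + Q / lam (same stationary distribution)."""
    n = len(q)
    lam = 1.1 * max(-q[i][i] for i in range(n))  # > every exit rate, so P has a positive diagonal
    p = [[(1.0 if i == j else 0.0) + q[i][j] / lam for j in range(n)] for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi

def flows(q: List[List[float]]) -> Dict[Tuple[str, str], float]:
    """Equilibrium flow pi_i * q_ij: expected number of i -> j changes per unit time."""
    pi = stationary(q)
    n = len(q)
    return {(ORDERS[i], ORDERS[j]): pi[i] * q[i][j]
            for i in range(n) for j in range(n) if i != j}

# Hypothetical rates: 6 * 5 = 30 free parameters in the unrestricted model.
example_rates = {(a, b): 1.0 for a in ORDERS for b in ORDERS if a != b}
assert len(example_rates) == 30
print(max(flows(rate_matrix(example_rates)).items(), key=lambda kv: kv[1]))
```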

Reconstructing history with SIMMAP

Estimated frequency of mutations within the 77 families under consideration (posterior mean and 95% HPD interval, 100 simulations):

      SOV               SVO               VSO               VOS               OVS               OSV
SOV   −                 51.5 [19; 82]     10.2 [1; 19]      7.5 [0; 29]       5.8 [0; 14]       4.2 [0; 13]
SVO   83.8 [31; 131]    −                 22.3 [2; 42]      10.4 [0; 30]      2.8 [0; 8]        3.9 [0; 12]
VSO   1.4 [0; 5]        8.3 [0; 24]       −                 29.0 [5; 45]      3.0 [0; 9]        1.1 [0; 5]
VOS   4.3 [0; 15]       141.9 [115; 188]  30.9 [17; 47]     −                 2.1 [0; 9]        1.0 [0; 3]
OVS   11.1 [0; 28]      0.8 [0; 4]        1.8 [0; 8]        0.4 [0; 3]        −                 0.8 [0; 5]
OSV   4.2 [0; 15]       0.4 [0; 3]        1.9 [0; 11]       1.1 [0; 7]        1.1 [0; 9]        −
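Each SIMMAP run draws one complete history of changes on the trees that is consistent with the observed states; counting the changes of each type in every run and summarizing across runs gives table cells of the form "mean [HPD interval]". A Python sketch of the summarizing step only, assuming the per-run counts are already available; the counts are invented, and the HPD interval is taken as the shortest interval covering at least 95% of the sampled values.

```python
import math
from typing import List, Tuple

def hpd(samples: List[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval that contains at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

def summarize(counts: List[int], mass: float = 0.95) -> Tuple[float, Tuple[float, float]]:
    """Posterior mean and HPD interval of one transition count across simulated histories."""
    return sum(counts) / len(counts), hpd(counts, mass)

# Hypothetical counts of one transition type in 10 stochastic maps (not the talk's data).
counts = [18, 25, 22, 30, 19, 24, 21, 27, 23, 20]
print(summarize(counts))
```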

Posterior distributions

Empirical vs. estimated distribution

Posterior distributions

Waiting times

expected waiting time in 1,000 years
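Under a continuous-time Markov model, the expected time a language stays in a given state before changing is the reciprocal of that state's total exit rate. If rates are measured per 1,000 years, as assumed here, the waiting times come out in units of 1,000 years. A Python sketch with a hypothetical two-state generator; that the figure reports exactly this quantity for each word order is an assumption.

```python
from typing import Dict, List

def waiting_times(q: List[List[float]], states: List[str]) -> Dict[str, float]:
    """Expected time before leaving each state: 1 / total exit rate = 1 / -q[i][i]."""
    return {s: 1.0 / -q[i][i] for i, s in enumerate(states)}

# Hypothetical two-state generator with rates per 1,000 years.
q = [[-0.5, 0.5],
     [2.0, -2.0]]
print(waiting_times(q, ["A", "B"]))  # {'A': 2.0, 'B': 0.5}
```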

References

Matthew S. Dryer. The Greenbergian word order correlations. Language, 68(1):81–138, 1992.

Michael Dunn, Simon J. Greenhill, Stephen Levinson, and Russell D. Gray. Evolved structure of language shows lineage-specific trends in word-order universals. Nature, 473(7345):79–82, 2011.

Harald Hammarström, Robert Forkel, Martin Haspelmath, and Sebastian Bank. Glottolog 2.7. Max Planck Institute for the Science of Human History, Jena, 2016. Available online at http://glottolog.org, accessed on 2017-01-29.

Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie. The World Atlas of Language Structures online. Max Planck Digital Library, Munich, 2008. http://wals.info/.

Sebastian Höhna, Michael J. Landis, Tracy A. Heath, Bastien Boussau, Nicolas Lartillot, Brian R. Moore, John P. Huelsenbeck, and Frederik Ronquist. RevBayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language. Systematic Biology, 65(4):726–736, 2016.

Elena Maslova. A dynamic approach to the verification of distributional universals. Linguistic Typology, 4(3):307–333, 2000.

Elena Maslova and E. Nikitina. Stochastic universals and dynamics of cross-linguistic distributions: the case of alignment types. Unpublished manuscript, Stanford University, 2007.

Mark Pagel and Andrew Meade. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. The American Naturalist, 167(6):808–825, 2006.

Søren Wichmann, Eric W. Holman, and Cecil H. Brown. The ASJP database (version 17). http://asjp.clld.org/, 2016.