Harnessing Bayesian phylogenetics to test a Greenbergian universal

SLIDE 1

Harnessing Bayesian phylogenetics to test a Greenbergian universal

Gerhard Jäger (1), Ramon Ferrer-i-Cancho (2)

(1) Tübingen University, (2) Universitat Politècnica de Catalunya

52nd Annual Meeting of the Societas Linguistica Europaea

Leipzig, August 21, 2019

SLIDE 3

Greenberg’s Universal 17

SLIDE 4

With overwhelmingly more than chance frequency, languages with dominant order VSO have the adjective after the noun. (Greenberg, 1963)

Mirror image: Verb-final languages prefer adjective-noun order.

But: Dryer (1992)

SLIDE 5

Dependency Length Minimization

Example sentence: The dog was chased by the cat

Dependency distances: 1, 2, 1, 3, 2, 1

n = 7 words, D = 10 (sum of the dependency distances)

  • Dependency distances.
  • DDm: dependency distance minimization principle (Liu et al., 2017).
  • Cognitive origins of DDm: interference and decay (Liu et al., 2017).
  • The challenge of aggregating D over heterogeneous data: sentences of different lengths, multiple authors, ... (Ferrer-i-Cancho and Liu, 2014)
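
The total D for the example sentence can be checked with a few lines of Python. This is a minimal sketch: the head assignment below is an assumed dependency analysis (nouns head their determiners and the preposition), chosen because it reproduces the distances shown on the slide; the slide itself does not list the arcs.

```python
# Word positions are 1-indexed; a head of 0 marks the root.
# ASSUMPTION: this head assignment is illustrative, not taken from the slide.
words = ["The", "dog", "was", "chased", "by", "the", "cat"]
heads = [2, 4, 4, 0, 7, 7, 4]


def dependency_distances(heads):
    """Return |position - head position| for every non-root word."""
    return [abs(i - h) for i, h in enumerate(heads, start=1) if h != 0]


distances = dependency_distances(heads)
n = len(words)       # sentence length in words
D = sum(distances)   # total dependency length
print(n, D, distances)
```

Because D grows with sentence length, raw sums from sentences of different lengths are not directly comparable, which is the aggregation problem raised in the last bullet.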

SLIDE 6

Total dependency length D for a verb with two noun arguments, each modified by one adjective:

        V1     Vmed   Vfin
NAdj    D=6    D=5    D=8
AdjN    D=8    D=5    D=6

DDm provides functional motivation for Universal 17 and its mirror image.
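
The D values for the six word-order combinations can be recomputed as follows. This is a sketch under stated assumptions: the clause consists of one verb and two nouns, each noun depends on the verb, and each adjective depends on and is adjacent to its noun. The token names (`V`, `N1`, `A1`, ...) and function names are illustrative.

```python
def total_dependency_length(order):
    """Sum of head-dependent distances for one linear order of tokens."""
    pos = {tok: i for i, tok in enumerate(order)}
    # ASSUMED structure: nouns depend on the verb, adjectives on their noun.
    arcs = [("V", "N1"), ("V", "N2"), ("N1", "A1"), ("N2", "A2")]
    return sum(abs(pos[head] - pos[dep]) for head, dep in arcs)


def clause(verb_position, adj_order):
    """Linearize the clause for a given verb position and adjective order."""
    np1 = ["N1", "A1"] if adj_order == "NAdj" else ["A1", "N1"]
    np2 = ["N2", "A2"] if adj_order == "NAdj" else ["A2", "N2"]
    if verb_position == "V1":
        return ["V"] + np1 + np2
    if verb_position == "Vmed":
        return np1 + ["V"] + np2
    return np1 + np2 + ["V"]  # Vfin


results = {
    (v, a): total_dependency_length(clause(v, a))
    for v in ("V1", "Vmed", "Vfin")
    for a in ("NAdj", "AdjN")
}
for key, value in results.items():
    print(key, value)
```

Under these assumptions, NAdj gives the shorter total for V1, AdjN gives the shorter total for Vfin, and the two orders tie for Vmed.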

SLIDE 7

Frequency distribution (WALS)

[Figure: frequency of NAdj and AdjN order among V1, Vmed, and Vfin languages]

SLIDE 8

Frequency distribution, weighted by lineage

[Figure: lineage-weighted frequency of NAdj and AdjN order among V1, Vmed, and Vfin languages]

SLIDE 9

Geographic distribution

[Map: languages marked by verb position (V1, Vmed, Vfin) and adjective order (NAdj, AdjN)]

SLIDE 10

Phylogenetic non-independence

  • languages are phylogenetically structured
  • if two closely related languages display the same pattern, these are not two independent data points

⇒ we need to control for phylogenetic dependencies

(Figure from Dunn et al., 2011)

SLIDE 11

Phylogenetic non-independence

Maslova (2000):

“If the A-distribution for a given typology cannot be assumed to be stationary, a distributional universal cannot be discovered on the basis of purely synchronic statistical data.”

“In this case, the only way to discover a distributional universal is to estimate transition probabilities and as it were to ‘predict’ the stationary distribution on the basis of the equations in (1).”

SLIDE 12

The phylogenetic comparative method

SLIDE 15

Modeling language change

[Figure panels: Markov process, Phylogeny, Branching process]

SLIDE 17

Estimating rates of change

  • if phylogeny and states of extant languages are known...
  • ... transition rates, stationary probabilities and ancestral states can be estimated based on a Markov model
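
To make the link between transition rates and stationary probabilities concrete, here is a minimal sketch for a two-state continuous-time Markov chain, where both quantities have closed forms. The rate values `q01` and `q10` are hypothetical and chosen only for illustration; the models in the talk have more states and estimate the rates from data.

```python
import math


def stationary(q01, q10):
    """Equilibrium probabilities (pi0, pi1) of a two-state CTMC."""
    total = q01 + q10
    return q10 / total, q01 / total


def transition_probs(q01, q10, t):
    """P[i][j] = probability of state j at time t given state i at time 0."""
    pi0, pi1 = stationary(q01, q10)
    decay = math.exp(-(q01 + q10) * t)
    return [
        [pi0 + pi1 * decay, pi1 - pi1 * decay],
        [pi0 - pi0 * decay, pi1 + pi0 * decay],
    ]


# HYPOTHETICAL rates: state 0 -> 1 at 0.2, state 1 -> 0 at 0.6 per unit time.
q01, q10 = 0.2, 0.6
pi = stationary(q01, q10)
short_branch = transition_probs(q01, q10, 0.1)
long_branch = transition_probs(q01, q10, 100.0)
print(pi, short_branch, long_branch)
```

On a long branch every row of P(t) approaches the stationary distribution, so the starting state is forgotten; on a short branch the chain mostly stays where it began.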

SLIDE 18

Correlation between features

SLIDE 19

Pagel and Meade (2006)

  • construct two types of Markov processes:
      • independent: the two features evolve according to independent Markov processes
      • dependent: rates of change in one feature depend on the state of the other feature
  • fit both models to the data
  • apply statistical model comparison
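
The difference between the two model types can be sketched by building both as rate matrices over the six joint states. This is an illustration under assumptions, not the authors' implementation: it assumes only one feature changes at a time, and all rate values and helper names (`joint_rate_matrix`, `independent_rate`) are hypothetical.

```python
from itertools import product

ADJ = ["Adj-N", "N-Adj"]
VERB = ["V1", "Vmed", "Vfin"]
STATES = list(product(ADJ, VERB))  # six joint states


def single_change(src, dst):
    """True if exactly one of the two features differs."""
    return sum(a != b for a, b in zip(src, dst)) == 1


def joint_rate_matrix(rate):
    """6x6 rate matrix; rate(src, dst) is the rate of a single-feature change."""
    n = len(STATES)
    Q = [[0.0] * n for _ in range(n)]
    for i, src in enumerate(STATES):
        for j, dst in enumerate(STATES):
            if single_change(src, dst):  # ASSUMPTION: no simultaneous changes
                Q[i][j] = rate(src, dst)
        Q[i][i] = -sum(Q[i])  # diagonal makes each row sum to zero
    return Q


# HYPOTHETICAL rates for the independent model: 2 adjective + 6 verb rates.
adj_rates = {("Adj-N", "N-Adj"): 0.3, ("N-Adj", "Adj-N"): 0.1}
verb_rates = {(a, b): 0.2 for a in VERB for b in VERB if a != b}


def independent_rate(src, dst):
    """Rate depends only on the feature that changes, not on the other one."""
    if src[0] != dst[0]:
        return adj_rates[(src[0], dst[0])]
    return verb_rates[(src[1], dst[1])]


Q_independent = joint_rate_matrix(independent_rate)

# In the dependent model every allowed joint transition gets its own rate.
dependent_pairs = [(s, d) for s in STATES for d in STATES if single_change(s, d)]
n_independent_params = len(adj_rates) + len(verb_rates)
n_dependent_params = len(dependent_pairs)
print(n_independent_params, n_dependent_params)
```

The independent model is the special case of the dependent one in which rates that differ only in the state of the other feature are constrained to be equal, which is what makes the two models comparable.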

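[Figure: state diagrams of the two models. Independent model: adjective order (Adj-N, N-Adj) and verb position (V1, Vmed, Vfin) as separate state spaces. Dependent model: six joint states (N-Adj/V1, N-Adj/Vmed, N-Adj/Vfin, Adj-N/V1, Adj-N/Vmed, Adj-N/Vfin).]
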
SLIDE 20

Data

  • word-order data: WALS
  • phylogeny:
      • ASJP word lists (Wichmann et al., 2016)
      • feature extraction (automatic cognate detection, inter alia) → character matrix
      • Bayesian phylogenetic inference with Glottolog (Hammarström et al., 2016) tree as backbone
  • advantages over hand-coded Swadesh lists
      • applicable across language families
      • covers more languages than those for which expert cognate judgments are available
  • 902 languages in total
      • 76 families and 105 isolates

SLIDE 21

Phylogenetic tree sample

SLIDE 23

Hierarchical Bayesian models

[Figure: three model architectures over four families (trees1 to trees4, data1 to data4). Universal: one CTMC shared by all families. Lineage-specific: a separate CTMC (CTMC1 to CTMC4) per family. Hierarchical: a separate CTMC per family, all linked through a shared hyper-parameter.]

SLIDE 25

Hierarchical Models

  • each family has its own parameters
  • parameters are all drawn from the same distribution f
  • shape of f is learned from the data
  • prior assumption that there is little cross-family variation → can be overwritten by the data
  • enables information flow across families

[Figure: hierarchical model. Each family has its own trees, data, and CTMC (CTMC1 to CTMC4); all CTMCs are linked through a shared hyper-parameter.]
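
The idea that family-level parameters are drawn from one shared distribution f can be sketched generatively. The log-normal form of f, the hyper-parameter names `mu` and `sigma`, and all numeric values are assumptions made for illustration; the slides do not specify the form of f.

```python
import math
import random


def draw_family_rates(n_families, mu, sigma, rng):
    """Draw one positive rate per family from a shared log-normal f(mu, sigma)."""
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n_families)]


rng = random.Random(0)

# Small sigma: families have nearly identical rates (little cross-family variation).
tight = draw_family_rates(4, mu=-1.0, sigma=0.05, rng=rng)

# Large sigma: families are free to differ strongly.
loose = draw_family_rates(4, mu=-1.0, sigma=1.5, rng=rng)

# Limiting case sigma = 0: every family gets exactly the same rate,
# which corresponds to a single universal model.
pooled = draw_family_rates(4, mu=-1.0, sigma=0.0, rng=rng)
print(tight, loose, pooled)
```

In a fitted model, `mu` and `sigma` are themselves inferred, so the data decide how far the families move away from the shared value.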

SLIDE 26

What about isolates?

  • Continuous Time Markov Chain defines a unique equilibrium distribution
  • hierarchical model assumes a different CTMC, and thus a different equilibrium distribution, for each lineage
  • by modeling assumption, root state of a lineage is drawn from this distribution (Uniformity Principle)
  • isolates are treated as families of size 1, i.e., they are drawn from their equilibrium distribution
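
The treatment of isolates can be illustrated numerically: the likelihood of an isolate's observed state is its probability under the equilibrium distribution of that lineage's CTMC. The sketch below approximates the equilibrium by uniformization and repeated multiplication; the three-state rate matrix is hypothetical, and this is not the inference procedure used in the talk.

```python
def stationary_distribution(Q, iterations=10_000):
    """Approximate the equilibrium distribution of a CTMC with rate matrix Q.

    Uniformization: P = I + Q / lam is a stochastic matrix with the same
    stationary distribution as Q, found here by repeated multiplication.
    """
    n = len(Q)
    lam = 1.1 * max(-Q[i][i] for i in range(n))
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


STATES = ["V1", "Vmed", "Vfin"]
# HYPOTHETICAL rate matrix for one lineage; each row sums to zero.
Q = [
    [-0.50, 0.40, 0.10],
    [0.10, -0.30, 0.20],
    [0.05, 0.15, -0.20],
]

pi = stationary_distribution(Q)

# An isolate observed as verb-final contributes this likelihood term.
isolate_likelihood = pi[STATES.index("Vfin")]
print(pi, isolate_likelihood)
```

A family with several languages instead contributes a likelihood computed over its tree, with the root state drawn from the same equilibrium distribution.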

SLIDE 27

Results

SLIDE 28

[Figure: state diagrams of the independent model (Adj-N, N-Adj; V1, Vmed, Vfin) and the dependent model (six joint states: N-Adj/V1, N-Adj/Vmed, N-Adj/Vfin, Adj-N/V1, Adj-N/Vmed, Adj-N/Vfin)]

  • Bayes Factor: 260 in favor of dependent model (*)

(*) In the abstract we reported the opposite conclusion, but there we used a non-hierarchical universal model.
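
For readers who prefer other scales, the reported Bayes factor can be converted as follows. The only input taken from the slide is the value 260; the equal prior probability for the two models is an assumption made for this illustration.

```python
import math

bayes_factor = 260.0  # reported on the slide: dependent vs. independent model

# Natural-log Bayes factor (difference of log marginal likelihoods).
log_bf = math.log(bayes_factor)

# ASSUMPTION: both models have prior probability 0.5, so the posterior
# odds equal the Bayes factor.
posterior_dependent = bayes_factor / (1.0 + bayes_factor)

print(f"ln BF = {log_bf:.2f}, P(dependent | data) = {posterior_dependent:.4f}")
```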

SLIDE 29

No posterior support for Universal 17/17’

[Figure: posterior distributions of P(NAdj|V1) and P(NAdj|Vfin), each on an axis from 0.00 to 1.00]

SLIDE 30

Correlation between verb order and adjective order

  • lineages fall into two, about equally sized, groups:
      1. negative or no correlation: Nuclear Macro-Je, Mande, Siouan, Pama-Nyungan, Austronesian, ...
      2. positive correlation: Uto-Aztecan, Afro-Asiatic, Indo-European, Dravidian, Austroasiatic, Otomanguean, ...

[Figure: word order correlation, lineage-wise posterior distribution; axes: correlation (−0.5 to 0.5) and lineages]

SLIDE 31

Correlation between verb order and adjective order

[Figure: plot over correlation; axis from −0.50 to 0.50]

SLIDE 32

Correlation between verb order and adjective order

[Figure: plot over correlation; axis from −0.2 to 0.4]

SLIDE 33

A representative family for each type

SLIDE 34

[Figure: dependent-model state diagrams over the six joint states (N-Adj/V1, N-Adj/Vmed, N-Adj/Vfin, Adj-N/V1, Adj-N/Vmed, Adj-N/Vfin), one for Pama-Nyungan and one for Austroasiatic]

SLIDE 35

Conclusion

SLIDE 36
  • no empirical support for Universal 17
  • more nuanced picture for its mirror image:
      • two different possible dynamics governing the relationship between verb-object and noun-adjective order
      • Dependency Length Minimization is operative in one dynamic, but not the other
  • reminiscent of an OT-style pattern, with two competing constraints

SLIDE 37

Matthew S. Dryer. The Greenbergian word order correlations. Language, 68(1):81–138, 1992.

Michael Dunn, Simon J. Greenhill, Stephen Levinson, and Russell D. Gray. Evolved structure of language shows lineage-specific trends in word-order universals. Nature, 473(7345):79–82, 2011.

Ramon Ferrer-i-Cancho and H. Liu. The risks of mixing dependency lengths from sequences of different length. Glottotheory, (5):143–155, 2014.

Joseph Greenberg. Some universals of grammar with special reference to the order of meaningful elements. In Universals of Language, pages 73–113. MIT Press, Cambridge, MA, 1963.

Harald Hammarström, Robert Forkel, Martin Haspelmath, and Sebastian Bank. Glottolog 2.7. Max Planck Institute for the Science of Human History, Jena, 2016. Available online at http://glottolog.org, accessed on 2017-01-29.

Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie. The World Atlas of Language Structures online. Max Planck Digital Library, Munich, 2008. http://wals.info/.

H. Liu, C. Xu, and J. Liang. Dependency distance: a new perspective on syntactic patterns in natural languages. Physics of Life Reviews, 21:171–193, 2017.

Elena Maslova. A dynamic approach to the verification of distributional universals. Linguistic Typology, 4(3):307–333, 2000.

Mark Pagel and Andrew Meade. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. The American Naturalist, 167(6):808–825, 2006.

Søren Wichmann, Eric W. Holman, and Cecil H. Brown. The ASJP database (version 17). http://asjp.clld.org/, 2016.