Harnessing Bayesian phylogenetics to test a Greenbergian universal


  1. Harnessing Bayesian phylogenetics to test a Greenbergian universal Gerhard Jäger 1 Ramon Ferrer-i-Cancho 2 Tübingen University 1 Universitat Politècnica de Catalunya 2 52nd Annual Meeting of the Societas Linguistica Europaea Leipzig, August 21, 2019

  2. 1 / 30

  3. Greenberg’s Universal 17 2 / 30

  4. With overwhelmingly more than chance frequency, languages with dominant order VSO have the adjective after the noun. (Greenberg, 1963) Mirror image: Verb-final languages prefer adjective-noun order. But: Dryer (1992) 3 / 30

  5. Dependency Length Minimization [Figure: dependency tree for the sentence "The dog was chased by the cat", with n = 7 words and total dependency length D = 10 (individual dependency distances 3, 2, 2, 1, 1, 1)] • Dependency distances. • DDm: dependency distance minimization principle (Liu et al., 2017). • Cognitive origins of DDm: interference and decay (Liu et al., 2017). • The challenge of aggregating D over heterogeneous data: sentences of different lengths, multiple authors, ... (Ferrer-i-Cancho and Liu, 2014) 4 / 30
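
To make the computation of D concrete, here is a minimal Python sketch (mine, not from the talk). The head indices encode one plausible dependency analysis of the example sentence, chosen so that the distances match the slide (n = 7, D = 10).

```python
# Minimal sketch (not from the talk): total dependency length D is the sum of
# absolute distances between each word and its head; head index 0 marks the root.
sentence = ["The", "dog", "was", "chased", "by", "the", "cat"]
heads = [2, 4, 4, 0, 7, 7, 4]  # 1-based head positions; one assumed UD-style analysis

def total_dependency_length(heads):
    """Sum |dependent position - head position| over all non-root words."""
    return sum(abs(i - h) for i, h in enumerate(heads, start=1) if h != 0)

n = len(sentence)
D = total_dependency_length(heads)
print(f"n = {n}, D = {D}")  # expected: n = 7, D = 10
```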

  6. [Figure: toy seven-word sentences with the verb in initial (V1), medial (Vmed), and final (Vfin) position; total dependency length D (values 5, 6, 8) is compared for noun-adjective (NAdj) versus adjective-noun (AdjN) order across the three verb positions] DDm provides functional motivation for Universal 17 and its mirror image. 5 / 30

  7. Frequency distribution (WALS) [Figure: frequencies of the six word-order types, adjective order (NAdj vs. AdjN) crossed with verb position (V1, Vmed, Vfin)] 6 / 30

  8. Frequency distribution, weighted by lineage [Figure: the same six word-order types, with frequencies weighted by lineage] 7 / 30

  9. Geographic distribution [Figure: world map of the sampled languages, coded for verb position (V1, Vmed, Vfin) and adjective order (NAdj, AdjN)] 8 / 30

  10. Phylogenetic non-independence • languages are phylogenetically structured • if two closely related languages display the same pattern, these are not two independent data points ⇒ we need to control for phylogenetic dependencies (from Dunn et al., 2011) 9 / 30

  11. Phylogenetic non-independence Maslova (2000): “If the A-distribution for a given typology cannot be assumed to be stationary, a distributional universal cannot be discovered on the basis of purely synchronic statistical data.” “In this case, the only way to discover a distributional universal is to estimate transition probabilities and as it were to ‘predict’ the stationary distribution on the basis of the equations in (1).” 10 / 30
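
For concreteness, a standard two-state formulation of Maslova's point (this is a generic textbook form, not necessarily her equation (1)): the stationary distribution is fixed by the transition rates alone, so it can be "predicted" from the rates even when the synchronic sample is not stationary.

```latex
% Illustration only: stationary distribution of a generic two-state chain
% with transition rates q_{AB} (A -> B) and q_{BA} (B -> A).
\pi_A = \frac{q_{BA}}{q_{AB} + q_{BA}}, \qquad
\pi_B = \frac{q_{AB}}{q_{AB} + q_{BA}}.
```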

  12. The phylogenetic comparative method 11 / 30

  13. Modeling language change Markov process 12 / 30

  14. Modeling language change Markov process Phylogeny 12 / 30

  15. Modeling language change Markov process Phylogeny Branching process 12 / 30

  16. Estimating rates of change • if phylogeny and states of extant languages are known... 13 / 30

  17. Estimating rates of change • if phylogeny and states of extant languages are known... • ... transition rates, stationary probabilities and ancestral states can be estimated based on Markov model 13 / 30
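
A minimal numerical sketch of the building block behind such estimation (the rates are invented for illustration, not estimates from the talk): given a rate matrix Q, the transition probabilities over a branch of length t come from the matrix exponential, and long branches approach the stationary distribution.

```python
import numpy as np
from scipy.linalg import expm

# Two-state CTMC over adjective order, states AdjN and NAdj (rates made up).
q_adjn_to_nadj = 0.3   # hypothetical rate AdjN -> NAdj
q_nadj_to_adjn = 0.1   # hypothetical rate NAdj -> AdjN
Q = np.array([[-q_adjn_to_nadj,  q_adjn_to_nadj],
              [ q_nadj_to_adjn, -q_nadj_to_adjn]])

# Probability of each end state after a branch of length t: P(t) = expm(Q * t).
t = 2.0
P_t = expm(Q * t)
print(P_t)                   # rows sum to 1: P(end state | start state, t)

# As t grows, every row approaches the stationary distribution of the chain.
print(expm(Q * 1000.0)[0])   # ~ [0.25, 0.75] for these rates
```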

  18. Correlation between features 14 / 30

  19. Pagel and Meade (2006) • construct two types of Markov processes: • independent: the two features evolve according to independent Markov processes • dependent: the rates of change in one feature depend on the state of the other feature • fit both models to the data • apply statistical model comparison [Figure: state-transition diagrams for the independent and the dependent model over the six states {Adj-N, N-Adj} × {V1, Vmed, Vfin}] 15 / 30
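
A sketch (my own, not the authors' implementation) of how the two rate matrices could be set up over the six combined states; the function names and dictionary interfaces are illustrative only.

```python
import numpy as np
from itertools import product

# Sketch in the style of Pagel & Meade (2006), adapted to 2 x 3 states.
adj_orders = ["AdjN", "NAdj"]
verb_pos = ["V1", "Vmed", "Vfin"]
states = list(product(adj_orders, verb_pos))   # 6 combined states

def fill_diagonal(Q):
    """Set each diagonal entry so that rows sum to zero, as a CTMC requires."""
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

def independent_Q(adj_rates, verb_rates):
    """adj_rates[(a1, a2)] and verb_rates[(v1, v2)] do not depend on the other feature."""
    Q = np.zeros((6, 6))
    for i, (a1, v1) in enumerate(states):
        for j, (a2, v2) in enumerate(states):
            if a1 != a2 and v1 == v2:
                Q[i, j] = adj_rates[(a1, a2)]
            elif a1 == a2 and v1 != v2:
                Q[i, j] = verb_rates[(v1, v2)]
            # simultaneous changes of both features get rate 0
    return fill_diagonal(Q)

def dependent_Q(rates):
    """rates[(s1, s2)]: every single-feature change gets its own rate."""
    Q = np.zeros((6, 6))
    for i, s1 in enumerate(states):
        for j, s2 in enumerate(states):
            if (s1[0] != s2[0]) + (s1[1] != s2[1]) == 1:
                Q[i, j] = rates[(s1, s2)]
    return fill_diagonal(Q)
```

Under this parameterization the independent model has 8 free rates (2 adjective-order rates plus 6 verb-position rates), while the dependent model has 18; the talk compares such models by Bayes factor.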

  20. Data • word-order data: WALS • phylogeny: • ASJP word lists (Wichmann et al., 2016) • feature extraction (automatic cognate detection, inter alia) → character matrix • Bayesian phylogenetic inference with Glottolog (Hammarström et al., 2016) tree as backbone • advantages over hand-coded Swadesh lists: • applicable across language families • covers more languages than those for which expert cognate judgments are available • 902 languages in total • 76 families and 105 isolates 16 / 30

  21. Phylogenetic tree sample 17 / 30

  22. Hierarchical Bayesian models [Figure: two graphical models — a “universal” model in which a single CTMC generates the data of all lineages, and a “lineage-specific” model with a separate CTMC per lineage, each with its own data and tree sample] 18 / 30

  23. Hierarchical Bayesian models [Figure: the same two graphical models plus a “hierarchical” model, in which the lineage-specific CTMCs are tied together by a shared hyper-parameter] 18 / 30

  24. Hierarchical Models [Figure: graphical model with a hyper-parameter generating CTMC 1–4, which in turn generate data 1–4 on trees 1–4] • each family has its own parameters • parameters are all drawn from the same distribution f • shape of f is learned from the data • prior assumption that there is little cross-family variation → can be overwritten by the data 19 / 30

  25. Hierarchical Models [Figure: as on the previous slide — a hyper-parameter generating CTMC 1–4, which generate data 1–4 on trees 1–4] • each family has its own parameters • parameters are all drawn from the same distribution f • shape of f is learned from the data • prior assumption that there is little cross-family variation → can be overwritten by the data • enables information flow across families 19 / 30
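
The generative structure can be sketched as follows (my own illustration with invented priors, not the authors' exact specification); in a full analysis the hyper-parameters and the family-level rates would be inferred jointly, e.g. by MCMC.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generative sketch: per-family log-rates are drawn from a shared distribution f
# whose mean and spread are themselves unknown hyper-parameters.
n_families = 4

# Hyper-parameters: cross-family mean and (small) cross-family spread of log-rates.
mu_log_rate = rng.normal(0.0, 1.0)          # hyper-prior on the mean log-rate
sigma_log_rate = abs(rng.normal(0.0, 0.5))  # weakly informative: little variation a priori

# Family-specific rates: all drawn from the same f = Normal(mu, sigma).
family_log_rates = rng.normal(mu_log_rate, sigma_log_rate, size=n_families)
family_rates = np.exp(family_log_rates)     # one CTMC rate per family (CTMC 1..4)
print(family_rates)
```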

  26. What about isolates? • Continuous Time Markov Chain defines a unique equilibrium distribution • hierarchical model assumes a different CTMC, and thus a different equilibrium distribution for each lineage • by modeling assumption, root state of a lineage is drawn from this distribution (Uniformity Principle) • isolates are treated as families of size 1, i.e., they are drawn from their equilibrium distribution 20 / 30
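
A small sketch of that modelling assumption (rates invented for illustration): the equilibrium distribution pi of a CTMC with rate matrix Q solves pi·Q = 0 with pi summing to 1, and the root state of a lineage — in particular the single observed state of an isolate — is modelled as one draw from pi.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)

# Two-state rate matrix, e.g. over AdjN / NAdj (rates made up).
Q = np.array([[-0.3,  0.3],
              [ 0.1, -0.1]])

pi = null_space(Q.T).ravel()   # left null vector of Q
pi = pi / pi.sum()             # normalize to a probability distribution
print(pi)                      # ~ [0.25, 0.75] for these rates

# An isolate is a "family of size 1": its observed state is one draw from pi.
root_state = rng.choice(["AdjN", "NAdj"], p=pi)
print(root_state)
```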

  27. Results 21 / 30

  28. [Figure: the independent and the dependent model, as on slide 19] • Bayes factor: 260 in favor of the dependent model (footnote: in the abstract we reported the opposite conclusion, but there we used a non-hierarchical universal model) 22 / 30
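
For orientation (my own arithmetic, not from the talk): a Bayes factor is the ratio of the two models' marginal likelihoods, so it is usually reported or computed on the log scale.

```python
import math

# A Bayes factor of 260 for the dependent model corresponds to a
# log marginal-likelihood difference of roughly 5.6 nats.
bayes_factor = 260.0
log_bf = math.log(bayes_factor)   # log ML(dependent) - log ML(independent)
print(round(log_bf, 2))           # ~5.56
```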

  29. No posterior support for Universal 17/17’ [Figure: posterior distributions of P(NAdj | V1) and P(NAdj | Vfin), both on a 0–1 scale] 23 / 30

  30. Correlation between verb order and adjective order [Figure: lineage-wise posterior distribution of the word-order correlation, roughly from -0.5 to 0.5] • lineages fall into two, about equally sized, groups: 1. negative or no correlation: Nuclear Macro-Je, Mande, Siouan, Pama-Nyungan, Austronesian, ... 2. positive correlation: Uto-Aztecan, Afro-Asiatic, Indo-European, Dravidian, Austroasiatic, Otomanguean, ... 24 / 30

  31. Correlation between verb order and adjective order [Figure: lineage-wise correlation estimates, roughly from -0.50 to 0.50] • lineages fall into two, about equally sized, groups: 1. negative or no correlation: Nuclear Macro-Je, Mande, Siouan, Pama-Nyungan, Austronesian, ... 2. positive correlation: Uto-Aztecan, Afro-Asiatic, Indo-European, Dravidian, Austroasiatic, Otomanguean, ... 25 / 30

  32. Correlation between verb order and adjective order [Figure: lineage-wise correlations on a scale from -0.2 to 0.4] 26 / 30

  33. A representative family for each type 27 / 30

  34. [Figure: fitted transition diagrams over the six states {Adj-N, N-Adj} × {V1, Vmed, Vfin} for Pama-Nyungan and for Austroasiatic, one representative family from each of the two groups] 28 / 30

  35. Conclusion 29 / 30

  36. • no empirical support for Universal 17 • more nuanced picture for its mirror image: • two different possible dynamics governing the relationship between verb-object and noun-adjective order • Dependency Length Minimization is operative in one dynamic, but not in the other • reminiscent of an OT-style pattern, with two competing constraints 30 / 30

  37. Matthew S. Dryer. The Greenbergian word order correlations. Language, 68(1):81–138, 1992.
  Michael Dunn, Simon J. Greenhill, Stephen Levinson, and Russell D. Gray. Evolved structure of language shows lineage-specific trends in word-order universals. Nature, 473(7345):79–82, 2011.
  Ramon Ferrer-i-Cancho and H. Liu. The risks of mixing dependency lengths from sequences of different length. Glottotheory, 5:143–155, 2014.
  Joseph Greenberg. Some universals of grammar with special reference to the order of meaningful elements. In Universals of Language, pages 73–113. MIT Press, Cambridge, MA, 1963.
  Harald Hammarström, Robert Forkel, Martin Haspelmath, and Sebastian Bank. Glottolog 2.7. Max Planck Institute for the Science of Human History, Jena, 2016. Available online at http://glottolog.org, accessed 2017-01-29.
  Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie. The World Atlas of Language Structures Online. Max Planck Digital Library, Munich, 2008. http://wals.info/.
  H. Liu, C. Xu, and J. Liang. Dependency distance: a new perspective on syntactic patterns in natural languages. Physics of Life Reviews, 21:171–193, 2017.
  Elena Maslova. A dynamic approach to the verification of distributional universals. Linguistic Typology, 4(3):307–333, 2000.
  Mark Pagel and Andrew Meade. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. The American Naturalist, 167(6):808–825, 2006.
  Søren Wichmann, Eric W. Holman, and Cecil H. Brown. The ASJP database (version 17). http://asjp.clld.org/, 2016.
  30 / 30
