Bayesian Typology Gerhard Jäger Tübingen University RAILS, Universität des Saarlandes October 24, 2019

Major word orders 1 / 45

Statistics of major word order distribution • data: WALS intersected with ASJP • 1,055 languages, 201 lineages, 71 families with at least 3 languages

Raw numbers
        SOV     SVO     VSO    VOS    OVS    OSV
count   497     447     78     20     10     3
share   47.1%   42.4%   7.4%   1.9%   0.9%   0.3%

Weighted by lineages
        SOV     SVO     VSO    VOS    OVS    OSV
count   135.1   46.9    10.5   4.0    3.7    0.8
share   67.2%   23.3%   5.2%   2.0%   1.8%   0.4%

[Figure: bar charts of pattern frequency, by language and by family] 2 / 45
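
The shares on this slide can be checked from the counts. The sketch below recomputes both percentage rows and also illustrates one possible reading of "weighted by lineages": every language contributes 1 / (size of its lineage), so each lineage counts once in total. That weighting rule and the toy lineages are assumptions for illustration, not taken from the slides.

```python
ORDERS = ["SOV", "SVO", "VSO", "VOS", "OVS", "OSV"]

# Counts as given on the slide.
raw = dict(zip(ORDERS, [497, 447, 78, 20, 10, 3]))
weighted = dict(zip(ORDERS, [135.1, 46.9, 10.5, 4.0, 3.7, 0.8]))


def shares(counts):
    """Percentage share of each word order, rounded to one decimal."""
    total = sum(counts.values())
    return {order: round(100 * value / total, 1) for order, value in counts.items()}


def lineage_weighted_counts(languages):
    """languages: list of (lineage, word_order) pairs.

    ASSUMPTION: each language is weighted by 1 / (number of languages in
    its lineage), so the weighted counts sum to the number of lineages.
    """
    sizes = {}
    for lineage, _ in languages:
        sizes[lineage] = sizes.get(lineage, 0) + 1
    counts = {}
    for lineage, order in languages:
        counts[order] = counts.get(order, 0.0) + 1.0 / sizes[lineage]
    return counts


raw_pct = shares(raw)
weighted_pct = shares(weighted)

# Hypothetical toy sample: lineage "A" has four languages, lineage "B" has one.
toy = [("A", "SOV"), ("A", "SOV"), ("A", "SVO"), ("A", "SVO"), ("B", "VSO")]
toy_weighted = lineage_weighted_counts(toy)
```

In the toy sample the four languages of lineage "A" together weigh as much as the single language of lineage "B", which is why a large SVO family loses share under this kind of weighting.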

Previous approaches • Gell-Mann and Ruhlen (2011): • Proto-world was SOV • general pathway: SOV → SVO ↔ VSO/VOS • minor pathway: SOV → OVS/OSV • exceptions due to diffusion • Ferrer-i-Cancho (2015): • permutation circle [Figure: the six orders SOV, SVO, OSV, VSO, OVS, VOS arranged on a circle] • transition probability inversely related to path length 3 / 45
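
A minimal sketch of the path-length idea, assuming the circle arises from swapping two adjacent constituents (an assumption; the slide does not spell out the construction). The code only computes distances. It does not model the transition probabilities, whose exact dependence on path length is not given here.

```python
from collections import deque

ORDERS = ["SOV", "SVO", "VSO", "VOS", "OVS", "OSV"]


def neighbours(order):
    """Orders reachable by swapping two adjacent constituents."""
    result = []
    for i in range(len(order) - 1):
        chars = list(order)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        result.append("".join(chars))
    return result


def path_length(source, target):
    """Smallest number of adjacent swaps from source to target (breadth-first search)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return distance[current]
        for nxt in neighbours(current):
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    raise ValueError("target is not a permutation of source")


# Distance table from SOV to every other order.
from_sov = {order: path_length("SOV", order) for order in ORDERS}
```

Every order has exactly two neighbours under this construction, so the six orders do form a single cycle, and the largest distance is three swaps.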

Phylogenetic non-independence • languages are phylogenetically structured • if two closely related languages display the same pattern, these are not two independent data points ⇒ we need to control for phylogenetic dependencies 4 / 45

Phylogenetic non-independence 5 / 45

Typological distributions 6 / 45

Typological distributions • common practice since Greenberg (1963): • collect a sample of languages • classify them according to some typological feature ⇒ skewed distribution indicates something interesting going on • Problem: languages are not independent samples • skewed distribution may reflect • skewed diversification rate across families • properties of an ancestral bottleneck • balanced sampling mitigates the first, but not the second problem 7 / 45

Typological distributions Maslova (2000): “If the A-distribution for a given typology cannot be assumed to be stationary, a distributional universal cannot be discovered on the basis of purely synchronic statistical data.” “In this case, the only way to discover a distributional universal is to estimate transition probabilities and as it were to ‘predict’ the stationary distribution on the basis of the equations in (1).” 8 / 45
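
To illustrate what "predicting" a stationary distribution from transition probabilities means, here is a small sketch with a made-up two-type system. The matrix values are hypothetical, and the method (repeated application of the transition matrix) is a generic one, not the equations in (1) that Maslova refers to, which are not reproduced in these slides.

```python
def step(distribution, P):
    """One generation of change: new[j] = sum_i distribution[i] * P[i][j]."""
    n = len(P)
    return [sum(distribution[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary_distribution(P, iterations=500):
    """Approximate the stationary distribution by iterating from a uniform start."""
    distribution = [1.0 / len(P)] * len(P)
    for _ in range(iterations):
        distribution = step(distribution, P)
    return distribution


# Hypothetical transition probabilities between two types A and B per time step.
P = [[0.9, 0.1],   # A stays A with 0.9, becomes B with 0.1
     [0.3, 0.7]]   # B becomes A with 0.3, stays B with 0.7

observed_now = [0.2, 0.8]            # a skewed synchronic sample (made up)
predicted = stationary_distribution(P)
```

With these numbers the stationary distribution is 75% A and 25% B, even though the hypothetical synchronic sample shows 80% B. That gap is the situation the quotation describes: synchronic frequencies alone would point to the wrong preference.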

The phylogenetic comparative method 9 / 45

Modeling language change Markov process Phylogeny cf. Dunn et al. (2011); Levinson and Gray (2012), inter alia 10 / 45

Estimating rates of change • if phylogeny and states of extant languages are known... • ... transition rates and ancestral states can be estimated based on Markov model 11 / 45
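
A minimal sketch of how a known tree and known tip states constrain the rates and the ancestral states, assuming a two-state continuous-time Markov chain (the word-order model in this talk has six states) and using Felsenstein's pruning algorithm. The tree encoding, the function names and the toy numbers are all hypothetical.

```python
import math


def transition_probs(rate01, rate10, t):
    """P(t) for a two-state continuous-time Markov chain (closed form)."""
    total = rate01 + rate10
    decay = math.exp(-total * t)
    pi0, pi1 = rate10 / total, rate01 / total   # stationary probabilities
    return [[pi0 + pi1 * decay, pi1 - pi1 * decay],
            [pi0 - pi0 * decay, pi1 + pi0 * decay]]


def partial_likelihoods(node, rate01, rate10):
    """P(tip states below node | state at node), for node state 0 and 1.

    A leaf is an int (observed state); an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, int):
        return [1.0 if state == node else 0.0 for state in (0, 1)]
    result = [1.0, 1.0]
    for child, branch_length in node:
        P = transition_probs(rate01, rate10, branch_length)
        below = partial_likelihoods(child, rate01, rate10)
        for state in (0, 1):
            result[state] *= sum(P[state][c] * below[c] for c in (0, 1))
    return result


def root_prior(rate01, rate10):
    """Stationary distribution, used here as the prior on the root state."""
    total = rate01 + rate10
    return [rate10 / total, rate01 / total]


def tree_likelihood(tree, rate01, rate10):
    """Probability of the observed tip states given tree and rates."""
    partial = partial_likelihoods(tree, rate01, rate10)
    return sum(p * l for p, l in zip(root_prior(rate01, rate10), partial))


def root_posterior(tree, rate01, rate10):
    """Posterior probability of each state at the root."""
    partial = partial_likelihoods(tree, rate01, rate10)
    joint = [p * l for p, l in zip(root_prior(rate01, rate10), partial)]
    return [value / sum(joint) for value in joint]


# Toy tree ((0:1.0, 0:1.0):1.0, 1:2.0) with hypothetical branch lengths.
toy_tree = [([(0, 1.0), (0, 1.0)], 1.0), (1, 2.0)]
likelihood = tree_likelihood(toy_tree, rate01=0.3, rate10=0.7)
posterior = root_posterior(toy_tree, rate01=0.3, rate10=0.7)
```

Rate estimation then amounts to searching for (or, in a Bayesian setting, sampling) the rate values under which `tree_likelihood` is high, and `root_posterior` gives the corresponding ancestral-state estimate.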

Inferring trees across many families 12 / 45

From words to trees • Swadesh lists → (training pair-Hidden Markov Model) → sound similarities → (applying pair-Hidden Markov Model) → word alignments → (classification/clustering) → cognate classes → (feature extraction) → character matrix → (Bayesian phylogenetic inference) → phylogenetic tree 13 / 45

From words to trees [Figure: the same pipeline shown next to a world tree of language families; legible labels include Khoisan, Altaic, Niger-Congo, Afro-Asiatic, NW Eurasia, Australia/Papua, Torricelli, Sepik, Trans-NewGuinea, Otomanguean, Cariban, Tucanoan, Penutian, Austronesian, Algic, Uto-Aztecan, Mayan, Salish, Hmong-Mien, Sino-Tibetan, Tai-Kadai, Timor-Alor-Pantar, Austro-Asiatic] 13 / 45
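
The "feature extraction" step can be sketched as follows, assuming one binary character per (concept, cognate class) pair, coded 1 if the language's word for that concept belongs to the class, 0 if it belongs to another class, and ? if the concept is missing. This coding is a common convention and an assumption here; the languages and class labels are invented, not ASJP data.

```python
def character_matrix(cognate_classes):
    """Turn {language: {concept: class_label}} into binary character strings.

    Returns (characters, rows), where characters is the sorted list of
    (concept, class_label) pairs and rows maps each language to a string
    over the alphabet 0, 1, ?.
    """
    characters = sorted({(concept, label)
                         for classes in cognate_classes.values()
                         for concept, label in classes.items()})
    rows = {}
    for language, classes in cognate_classes.items():
        cells = []
        for concept, label in characters:
            if concept not in classes:
                cells.append("?")   # no word recorded for this concept
            else:
                cells.append("1" if classes[concept] == label else "0")
        rows[language] = "".join(cells)
    return characters, rows


# Hypothetical cognate classes for three made-up languages.
toy_classes = {
    "lang_a": {"hand": "c1", "water": "c3"},
    "lang_b": {"hand": "c1", "water": "c4"},
    "lang_c": {"hand": "c2"},
}
characters, rows = character_matrix(toy_classes)
```

Each row is then one taxon line of the kind of character matrix a Bayesian phylogenetics program takes as input.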

Estimating word-order transition patterns 14 / 45

Workflow (data from all 77 families with ≥ 3 languages in the database; 924 languages in total) • estimate posterior tree distributions with MrBayes for each family, using Glottolog as constraint tree • estimate transition rates • estimate stationary distribution of major word order categories • apply stochastic character mapping (SIMMAP; Bollback 2006) • estimate expected number of mutations for each transition type 15 / 45

Estimating posterior tree distributions • using characters extracted from ASJP data (Jäger 2018) • Glottolog as constraint tree • Γ-distributed rates • ascertainment bias correction • relaxed molecular clock (IGR) • uniform tree prior • stop rule: 0.01, samplefreq=1000 • if convergence is reached only after more than 1,000,000 steps, sample 1,000 trees from the posterior 16 / 45

Phylogenetic tree sample 17 / 45

Estimating transition rates • totally unrestricted model: all 30 transition rates are estimated independently • implementation using RevBayes (Höhna et al., 2016) [Figure: expected strength of flow between SOV, SVO, VSO, VOS, OVS and OSV] 18 / 45
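
The figure of 30 rates follows from six states with a separate rate for every ordered pair of distinct states (6 × 5). The sketch below builds the corresponding rate matrix in plain Python. It is not RevBayes code, and the uniform rate value is a placeholder, not an estimate from the talk. The holding-time function uses the standard property that a continuous-time Markov chain stays in state i for an exponentially distributed time with mean 1 / (-Q[i][i]); whether the talk's waiting times are defined this way is not stated.

```python
from itertools import permutations

ORDERS = ["SOV", "SVO", "VSO", "VOS", "OVS", "OSV"]
INDEX = {order: i for i, order in enumerate(ORDERS)}

# One free parameter per ordered pair of distinct word orders.
PAIRS = list(permutations(ORDERS, 2))


def rate_matrix(rates):
    """Build the 6x6 matrix Q from {(source, target): rate}.

    Off-diagonal entries are the rates; each diagonal entry is minus the
    sum of the other entries in its row, so every row sums to zero.
    """
    Q = [[0.0] * len(ORDERS) for _ in ORDERS]
    for (source, target), rate in rates.items():
        Q[INDEX[source]][INDEX[target]] = rate
    for i in range(len(ORDERS)):
        Q[i][i] = -sum(Q[i])   # the diagonal is still 0.0 at this point
    return Q


def mean_holding_time(Q, order):
    """Expected time spent in a state before the next change."""
    return 1.0 / -Q[INDEX[order]][INDEX[order]]


# Placeholder rates (hypothetical): every transition has rate 0.1.
toy_rates = {pair: 0.1 for pair in PAIRS}
Q = rate_matrix(toy_rates)
```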

Reconstructing history with SIMMAP • estimated frequency of mutations within the 77 families under consideration (posterior mean and 95% HPD, 100 simulations)

       SOV              SVO                VSO             VOS            OVS           OSV
SOV    −                51.5 [19; 82]      10.2 [1; 19]    7.5 [0; 29]    5.8 [0; 14]   4.2 [0; 13]
SVO    83.8 [31; 131]   −                  22.3 [2; 42]    10.4 [0; 30]   2.8 [0; 8]    3.9 [0; 12]
VSO    1.4 [0; 5]       8.3 [0; 24]        −               29.0 [5; 45]   3.0 [0; 9]    1.1 [0; 5]
VOS    4.3 [0; 15]      141.9 [115; 188]   30.9 [17; 47]   −              2.1 [0; 9]    1.0 [0; 3]
OVS    11.1 [0; 28]     0.8 [0; 4]         1.8 [0; 8]      0.4 [0; 3]     −             0.8 [0; 5]
OSV    4.2 [0; 15]      0.4 [0; 3]         1.9 [0; 11]     1.1 [0; 7]     1.1 [0; 9]    −

19 / 45
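
Counts like those in the table come from simulating character histories and tallying the changes of each type. The sketch below shows only the tallying idea with a plain forward simulation along a single branch. It is not SIMMAP: stochastic character mapping additionally conditions the histories on the observed tip states and runs over whole trees. The two-state rate matrix and all parameter values are hypothetical.

```python
import random


def simulate_branch(Q, start, length, rng):
    """Simulate a continuous-time Markov chain along one branch.

    Returns the end state and the list of (source, target) changes.
    """
    state, elapsed, jumps = start, 0.0, []
    while True:
        exit_rate = -Q[state][state]
        if exit_rate <= 0:                  # absorbing state: nothing can change
            return state, jumps
        elapsed += rng.expovariate(exit_rate)
        if elapsed >= length:               # next change would fall past the branch end
            return state, jumps
        targets = [j for j in range(len(Q)) if j != state]
        weights = [Q[state][j] for j in targets]
        new_state = rng.choices(targets, weights=weights)[0]
        jumps.append((state, new_state))
        state = new_state


def mean_jump_counts(Q, start, length, n_sims=100, seed=0):
    """Average number of changes of each type over n_sims simulated histories."""
    rng = random.Random(seed)
    n = len(Q)
    totals = [[0.0] * n for _ in range(n)]
    for _ in range(n_sims):
        _, jumps = simulate_branch(Q, start, length, rng)
        for source, target in jumps:
            totals[source][target] += 1
    return [[value / n_sims for value in row] for row in totals]


# Hypothetical two-state chain with rate 1.0 in both directions.
TOY_Q = [[-1.0, 1.0],
         [1.0, -1.0]]
toy_counts = mean_jump_counts(TOY_Q, start=0, length=10.0, n_sims=2000, seed=1)
```

For this toy chain the changes form a Poisson process with rate 1, so the two off-diagonal means should add up to roughly 10 on a branch of length 10.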

Posterior distributions Empirical vs. estimated distribution 20 / 45

Posterior distributions Waiting times expected waiting time in 1,000 years 21 / 45

Differential case marking 22 / 45

Universal syntactic-semantic primitives • three universal core roles S: intransitive subject A: transitive subject O: transitive object 23 / 45

Alignment systems • Accusative system (Latin) • Puer puellam vidit. boy.NOM girl.ACC saw 'The boy saw the girl.' • Puer venit. boy.NOM came 'The boy came.' [Figure: S and A share the nominative; O takes the accusative] 24 / 45

Alignment systems • Ergative system (Dyirbal) • ŋuma yabu-ŋgu bura-n. father mother.ERG see-NONFUT 'The mother saw the father.' • ŋuma banaga-nu. father return-NONFUT 'The father returned.' [Figure: S and O share the nominative (absolutive); A takes the ergative] 25 / 45

Alignment systems • Neutral system (Mandarin) • rén lái le. person come CRS 'The person has come.' • Zhāngsān mà Lǐsì le ma. Zhangsan scold Lisi CRS Q 'Did Zhangsan scold Lisi?' [Figure: S, A and O all share one form, labelled nominative] 26 / 45
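
The three systems differ only in which of the core roles S, A and O share a case form. A small sketch of that classification, with case labels taken from the glosses on the slides ("unmarked" and the function name are illustrative choices):

```python
def alignment(s_form, a_form, o_form):
    """Classify an alignment system from the case forms of S, A and O."""
    if s_form == a_form == o_form:
        return "neutral"        # all three roles look alike
    if s_form == a_form:
        return "accusative"     # S and A pattern together, O is set apart
    if s_form == o_form:
        return "ergative"       # S and O pattern together, A is set apart
    return "other"              # e.g. all three roles distinct


latin = alignment(s_form="NOM", a_form="NOM", o_form="ACC")
dyirbal = alignment(s_form="unmarked", a_form="ERG", o_form="unmarked")
mandarin = alignment(s_form="unmarked", a_form="unmarked", o_form="unmarked")
```

Here `latin` is "accusative", `dyirbal` is "ergative" and `mandarin` is "neutral".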

Differential case marking • many languages have mixed systems • e.g., some NPs follow the accusative paradigm and others the neutral paradigm, as in Hebrew: (1) Ha-seret herʔa ʔet-ha-milxama the-movie showed acc-the-war ‘The movie showed the war.’ (2) Ha-seret herʔa (*ʔet-)milxama the-movie showed (*acc-)war ‘The movie showed a war.’ (from Aissen, 2003) 27 / 45
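
The Hebrew pattern in (1) and (2) can be written as a toy rule: the object takes the accusative marker only when it is definite. The function below covers just these two examples (a common noun with or without the article ha-) and is a sketch, not a description of Hebrew object marking in general.

```python
ACC = "ʔet-"   # accusative marker from examples (1) and (2); ʔ is the glottal stop
DEF = "ha-"    # definite article


def object_form(noun, definite):
    """Surface form of a direct object under the toy differential-marking rule."""
    if definite:
        return ACC + DEF + noun   # definite object: accusative paradigm
    return noun                   # indefinite object: neutral paradigm, no marker


def sentence(obj, definite):
    """Rebuild the example sentence with the chosen object."""
    return "Ha-seret herʔa " + object_form(obj, definite)


example_1 = sentence("milxama", definite=True)
example_2 = sentence("milxama", definite=False)
```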

Differential case marking 28 / 45
