Statistical estimation of diachronic stability from synchronic data
Gerhard Jäger
Tübingen University
Cape Town, July 6, 2018

Introduction
From the workshop description: “The workshop starts from the null hypothesis that diachronically stable properties are those that appear as the typologically most frequent ones, and that cross-linguistic rarity correlates with diachronic instability.”
Inferring diachronic stability of a feature from its typological frequency is potentially fallacious for three reasons:
1. Processes of different rates may lead to identical equilibrium distributions.
2. Individual languages are not independent random samples, since genetically related languages are likely to have similar typological profiles.
3. The stability of a feature value might depend on the value of other, correlated features.

Frequency, stability, and Markov chains

Rainy days per year in Mumbai and Rome: 83 days, 78 days
source: https://weather-and-climate.com

Markov chains
[Figure: Markov chain diagram with states A, B, C]
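The first point of the introduction (processes of different rates can share one equilibrium distribution) can be illustrated with a minimal two-state sketch. The per-step probabilities below are invented for illustration and are not taken from the slides; the function names are likewise hypothetical.

```python
import math

def stationary_on(p_gain, p_loss):
    """Long-run probability of the 'on' state in a two-state chain
    (off -> on with probability p_gain, on -> off with probability p_loss)."""
    return p_gain / (p_gain + p_loss)

def evolve(p_on, p_gain, p_loss, steps):
    """Push the probability of being 'on' forward by `steps` transitions."""
    for _ in range(steps):
        p_on = p_on * (1.0 - p_loss) + (1.0 - p_on) * p_gain
    return p_on

def expected_run(p_leave):
    """Mean number of consecutive steps spent in a state before leaving it."""
    return 1.0 / p_leave

# Hypothetical (gain, loss) probabilities per step; NOT values from the talk.
slow = (0.02, 0.08)   # changes rarely
fast = (0.20, 0.80)   # changes often

for name, (gain, loss) in (("slow", slow), ("fast", fast)):
    print(name, round(stationary_on(gain, loss), 3), round(expected_run(loss), 2))
```

Both chains settle at the same 20% share of the 'on' state, yet the slow chain stays 'on' ten times longer per visit. Frequency alone therefore cannot distinguish the two.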

Phylogenetic structure
[Figure labels: Markov process, Phylogeny, Branching process]

Phylogenetic non-independence
▶ languages are phylogenetically structured
▶ if two closely related languages display the same pattern, these are not two independent data points
⇒ we need to control for phylogenetic dependencies

Phylogenetic non-independence
Maslova (2000):
“If the A-distribution for a given typology cannot be assumed to be stationary, a distributional universal cannot be discovered on the basis of purely synchronic statistical data.”
“In this case, the only way to discover a distributional universal is to estimate transition probabilities and as it were to ‘predict’ the stationary distribution on the basis of the equations in (1).”

The phylogenetic comparative method

Estimating rates of change
▶ if phylogeny and states of extant languages are known...
▶ ... transition rates and ancestral states can be estimated based on Markov model
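As a rough illustration of how rates can be estimated from a known tree and known tip states, here is a minimal sketch for a binary feature. It uses Felsenstein's pruning algorithm for the likelihood and a plain grid search for the maximum. The slides do not say which estimation procedure was used, so the grid search, the toy tree, its branch lengths, and the equilibrium root prior are all assumptions made for this example.

```python
import math

def transition_probs(gain, loss, t):
    """2x2 transition matrix of a two-state continuous-time Markov chain
    after time t (state 0 = absent, state 1 = present)."""
    total = gain + loss
    pi1, pi0 = gain / total, loss / total
    decay = math.exp(-total * t)
    return [[pi0 + pi1 * decay, pi1 * (1.0 - decay)],
            [pi0 * (1.0 - decay), pi1 + pi0 * decay]]

def partial_likelihoods(node, gain, loss):
    """Pruning step. A node is either a tip state (0 or 1) or a list of
    (child, branch_length) pairs."""
    if isinstance(node, int):
        return [1.0 if node == s else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, t in node:
        child_like = partial_likelihoods(child, gain, loss)
        probs = transition_probs(gain, loss, t)
        for s in (0, 1):
            result[s] *= sum(probs[s][c] * child_like[c] for c in (0, 1))
    return result

def log_likelihood(tree, gain, loss):
    """Log-likelihood of the tip states, with the root drawn from the
    equilibrium distribution (an assumption of this sketch)."""
    total = gain + loss
    root_prior = [loss / total, gain / total]
    like = partial_likelihoods(tree, gain, loss)
    return math.log(sum(p * l for p, l in zip(root_prior, like)))

# Hypothetical four-language tree: two sister pairs, every tip at depth 3.0.
toy_tree = [([(1, 1.0), (1, 1.0)], 2.0), ([(0, 1.5), (0, 1.5)], 1.5)]

# Coarse grid search over (gain, loss) rates; a stand-in for proper inference.
grid = [0.01 * k for k in range(1, 101)]
best = max(((g, l) for g in grid for l in grid),
           key=lambda rates: log_likelihood(toy_tree, *rates))
print("best (gain, loss) on the grid:", best)
```

With only four tips the estimate is very uncertain. The sketch is meant to show the mechanics, not to produce meaningful rates.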

Inferring a world tree of languages

From words to trees
Swadesh lists → training pair-Hidden Markov Model → sound similarities → applying pair-Hidden Markov Model → word alignments → classification/clustering → cognate classes → feature extraction → character matrix → Bayesian phylogenetic inference → phylogenetic tree
[Figure: world tree of languages, labelled by family (e.g. Niger-Congo, Afro-Asiatic, Uralic, Sino-Tibetan, Trans-New Guinea, Uto-Aztecan)]
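The feature-extraction step, which turns cognate classes into a character matrix, can be sketched as follows. This shows one common binary encoding (one presence/absence character per cognate class, with "?" for missing concepts). Whether the talk used exactly this encoding is an assumption, and the languages, concepts, and class labels are invented.

```python
def character_matrix(cognates):
    """cognates maps language -> {concept: cognate class label}.
    Returns the list of characters and one 0/1/? string per language."""
    characters = sorted({(concept, cls)
                         for entries in cognates.values()
                         for concept, cls in entries.items()})
    matrix = {}
    for language, entries in cognates.items():
        row = []
        for concept, cls in characters:
            if concept not in entries:
                row.append("?")          # concept not attested: missing data
            else:
                row.append("1" if entries[concept] == cls else "0")
        matrix[language] = "".join(row)
    return characters, matrix

# Hypothetical toy data; labels A-D are arbitrary cognate class names.
toy = {
    "L1": {"hand": "A", "water": "C"},
    "L2": {"hand": "A", "water": "D"},
    "L3": {"hand": "B"},
}
characters, matrix = character_matrix(toy)
for language, row in matrix.items():
    print(language, row)
```

Each column of the resulting matrix is a binary character that a phylogenetic inference program can treat as evolving along the tree.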

From tree to forest
▶ branch lengths within Glottolog families estimated from lexical data
▶ calibration: Proto-Austronesian ∼ 5,000 years
▶ branches above family level effectively set to infinity

Case study 1: Rare consonants

Synchronic statistics
▶ data: ASJP word lists (word lists from ca. 6,000 living languages and dialects; Wichmann et al. 2016)
▶ variables:
  ▶ voiceless and voiced dental fricative (transcribed as 8)
  ▶ voiceless and voiced uvular fricative, voiceless and voiced pharyngeal fricative (transcribed as X)

                      8       X
raw numbers           334     378
average               5.7%    6.6%
weighted by family    14.6    22.2
average               4.6%    7.0%
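One plausible way to arrive at a family-weighted count is to let every family contribute the share of its languages that have the feature, so that large families do not dominate. The slides do not spell out the weighting scheme, so the sketch below is an assumption, and the data in it are invented.

```python
from collections import defaultdict

def family_weighted(languages):
    """languages: iterable of (family, has_feature) pairs.
    Returns (weighted count, weighted average over families)."""
    per_family = defaultdict(list)
    for family, has_feature in languages:
        per_family[family].append(1.0 if has_feature else 0.0)
    # Each family contributes the fraction of its members with the feature.
    weighted_count = sum(sum(v) / len(v) for v in per_family.values())
    return weighted_count, weighted_count / len(per_family)

# Hypothetical sample: one large family with the feature, two isolates without.
sample = [("F1", True), ("F1", True), ("F1", True), ("F1", False),
          ("F2", False), ("F3", False)]

raw_share = sum(1 for _, has in sample if has) / len(sample)
weighted_count, weighted_share = family_weighted(sample)
print(raw_share, weighted_count, weighted_share)
```

In this toy sample the raw share is 50%, but the family-weighted share is only 25%, because all occurrences sit in a single family.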

Phylogenetic estimates
                              8       X
equilibrium probability       5.5%    7.4%
half-life present (kyrs)      1.8     4.6
half-life absent (kyrs)       30.1    58.4
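The reported half-lives and equilibrium probabilities can be cross-checked against each other. The sketch assumes a two-state continuous-time Markov chain in which a half-life equals ln 2 divided by the rate of leaving the state; the slides do not define "half-life", so this reading is an assumption.

```python
import math

def rate_from_half_life(half_life):
    """Exit rate of a state whose occupancy decays with the given half-life."""
    return math.log(2) / half_life

def equilibrium_present(half_life_present, half_life_absent):
    """Equilibrium probability of 'present' implied by the two half-lives."""
    loss = rate_from_half_life(half_life_present)   # present -> absent
    gain = rate_from_half_life(half_life_absent)    # absent -> present
    return gain / (gain + loss)

# (half-life present, half-life absent) in kyrs, as reported on this slide.
reported = {"8": (1.8, 30.1), "X": (4.6, 58.4)}
for label, (hl_present, hl_absent) in reported.items():
    print(label, round(equilibrium_present(hl_present, hl_absent), 3))
```

Under these assumptions the half-lives imply equilibrium probabilities of about 5.6% and 7.3%, close to the reported 5.5% and 7.4%. Both sounds have similar equilibrium frequencies, but the presence of X is estimated to persist more than twice as long as the presence of 8.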

Case study 2: Major word orders

Statistics of major word order distribution
▶ data: WALS intersected with ASJP
▶ 1,045 languages, 211 lineages

pattern    raw numbers         weighted by lineages
SOV        491    47.0%        139.1    66.3%
SVO        442    42.3%         49.3    23.4%
VSO         79     7.6%         11.8     5.6%
VOS         19     1.8%          4.7     2.2%
OVS         11     1.1%          4.5     2.1%
OSV          3     0.3%          0.8     0.4%

[Figure: bar charts of pattern frequency, by language and by family]

Phylogenetically estimated Markov process

Case study 3: Word order and case

Statistics
▶ data: WALS intersected with ASJP
▶ 204 languages, 103 lineages

pattern       raw numbers       weighted by lineages
case/VO       29    14.2%       12.2    11.8%
case/OV       94    46.1%       57.7    56.0%
no case/VO    64    31.4%       22.6    21.9%
no case/OV    17     8.3%       10.6    10.3%

Phylogenetically estimated Markov process: features individually
[Figure: Markov process diagram; surviving state labels: case, no case]

Phylogenetically estimated Markov process: dependent features
[Figure: Markov process diagram over combined case and word-order states; surviving state labels: no case/OV, no case/VO]
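To see what a dependent model changes, one can treat the four case/word-order combinations as the states of a single Markov process and compare equilibrium distributions. In the sketch below all rates are hypothetical and are not the estimates from the talk. The equilibrium is approximated by repeatedly applying a discretised step, which is a convenience of this example rather than the method used in the slides.

```python
# State indices: 0 = case/OV, 1 = case/VO, 2 = no case/OV, 3 = no case/VO
def joint_rates(case_loss_ov, case_loss_vo, case_gain, word_order_flip):
    """Rates {(from, to): rate}; only one feature changes at a time.
    Case loss may differ between OV and VO languages."""
    return {
        (0, 2): case_loss_ov, (1, 3): case_loss_vo,
        (2, 0): case_gain, (3, 1): case_gain,
        (0, 1): word_order_flip, (1, 0): word_order_flip,
        (2, 3): word_order_flip, (3, 2): word_order_flip,
    }

def stationary_distribution(rates, n_states=4, iterations=5000):
    """Approximate the equilibrium by iterating P = I + h*Q from a uniform start."""
    exit_rate = [sum(r for (i, _), r in rates.items() if i == s)
                 for s in range(n_states)]
    h = 0.5 / max(exit_rate)          # keeps all step probabilities in [0, 1]
    dist = [1.0 / n_states] * n_states
    for _ in range(iterations):
        new = [d * (1.0 - h * exit_rate[s]) for s, d in enumerate(dist)]
        for (i, j), r in rates.items():
            new[j] += dist[i] * h * r
        dist = new
    return dist

# Hypothetical rates: identical case loss everywhere vs. faster loss in VO.
independent = stationary_distribution(joint_rates(0.3, 0.3, 0.1, 0.2))
dependent = stationary_distribution(joint_rates(0.1, 0.6, 0.1, 0.2))

for name, pi in (("independent", independent), ("dependent", dependent)):
    p_case, p_ov = pi[0] + pi[1], pi[0] + pi[2]
    print(name, round(pi[0], 4), round(p_case * p_ov, 4))
```

When case loss does not depend on word order, the equilibrium probability of case/OV equals the product of the two marginals. Once the loss rate differs between OV and VO, the joint equilibrium no longer factorises, which is the kind of dependency a per-feature model cannot capture.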

Conclusion
▶ connection between cross-linguistic frequency and diachronic stability is loose at best
▶ to assess diachronic stability, we need information on
  ▶ phylogenetic structure
  ▶ branch lengths
▶ stability of feature values may depend on other features → potentially complex causal network between typological variables, waiting to be explored
▶ todo:
  ▶ comparison to related but different approaches, such as Bickel’s Family Bias Method (Bickel, 2013) or Greenhill et al.’s (2017) approach
  ▶ factoring in language contact
  ▶ non-homogeneous Markov chains?
