Language Typology and Areal Linguistics (Yiru, July 13, 2016), PowerPoint presentation

SLIDE 1

Language Typology and Areal Linguistics

Yiru July 13, 2016

Yiru Language Typology July 13, 2016 1 / 26

SLIDE 2

Overview

1. Introduction
2. Typologically-based Clusters
3. Areal Linguistics

SLIDE 3

Language similarity

Why are some languages more alike than others?

  • The languages may be related "genetically", i.e., derived from a common ancestor language.
  • The similarities may be due to chance, or may reflect linguistic universals.
  • The languages may be related areally, i.e., similar because they share features with neighbouring languages.

SLIDE 4

Language similarity

The difference between genetic relatedness and language similarity raises three questions:

  • Q1: If we cluster languages based only on their typological features, how do the induced clusters compare to phylogenetic groupings?
  • Q2: How well do induced clusters and genetic families perform in predicting the values of typological features?
  • Q3: Which typological features tend to stay the same within language families, and which are likely to differ?

SLIDE 5

WALS (World Atlas of Language Structures)

The WALS project is a database that catalogs linguistic features for over 2,556 languages in 208 language families, using 142 features in 11 categories.

  • Data sparsity: only 16% of the cells are filled, which presents serious problems for clustering algorithms.

SLIDE 6

Pruning Methods

Pruning the data produces a smaller but denser subset:

  • Prune languages by minimum features: require each language to have at least 25 filled features for the whole-world set, or 10 features when comparing across subfamilies.
  • Prune features by minimum coverage: remove features that do not cover more than 10% of the selected languages in the whole-world set, or 25% in comparisons across subfamilies.
  • Use a dense language family.
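The two pruning steps can be sketched as follows, assuming a hypothetical in-memory table that maps each language to its features, with None for an empty WALS cell. The function names and the toy data are illustrative only; the sketch applies language pruning first and feature pruning second, and reports the fill rate so the gain in density is visible.

```python
from typing import Dict, Optional

# Hypothetical layout: language -> {feature -> value}, with None for an empty WALS cell.
# Every language is assumed to list the same feature keys.
Table = Dict[str, Dict[str, Optional[str]]]


def density(table: Table) -> float:
    """Fraction of cells in the table that are filled."""
    cells = [v for row in table.values() for v in row.values()]
    return sum(v is not None for v in cells) / len(cells) if cells else 0.0


def prune_languages(table: Table, min_features: int) -> Table:
    """Keep languages that have at least `min_features` filled features."""
    return {
        lang: row
        for lang, row in table.items()
        if sum(v is not None for v in row.values()) >= min_features
    }


def prune_features(table: Table, min_coverage: float) -> Table:
    """Keep features filled for more than `min_coverage` of the remaining languages."""
    if not table:
        return {}
    n_langs = len(table)
    features = list(next(iter(table.values())))
    keep = [
        f
        for f in features
        if sum(row[f] is not None for row in table.values()) / n_langs > min_coverage
    ]
    return {lang: {f: row[f] for f in keep} for lang, row in table.items()}


# Toy data, invented for illustration (not taken from WALS).
toy: Table = {
    "lang_a": {"f1": "SOV", "f2": "Postpositions", "f3": None},
    "lang_b": {"f1": "SVO", "f2": None, "f3": None},
    "lang_c": {"f1": None, "f2": None, "f3": None},
}

# The slide's whole-world thresholds would be min_features=25 and min_coverage=0.10;
# min_features=1 is used here only because the toy table is tiny.
pruned = prune_features(prune_languages(toy, min_features=1), min_coverage=0.10)
print(round(density(toy), 2), density(pruned))  # 0.33 0.75
```

On the toy table, language pruning drops `lang_c` and feature pruning drops `f3`, raising the fill rate from one third to three quarters.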

SLIDE 7

Features and Feature Values

WALS feature values are categorical labels, so their raw representation cannot be compared using distance measures. Solution: binarization, where each feature-value pair becomes its own binary feature.
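A minimal sketch of binarization, assuming the same hypothetical language-to-features table as above. Each (feature, value) pair that occurs in the data becomes one 0/1 column. How empty cells should be encoded is a design choice the slide does not specify; here they simply produce zeros, which is an assumption.

```python
from typing import Dict, List, Optional, Tuple

Table = Dict[str, Dict[str, Optional[str]]]


def binarize(table: Table) -> Tuple[List[Tuple[str, str]], Dict[str, List[int]]]:
    """One binary column per (feature, value) pair seen in the table.

    A language gets 1 in the column matching its value and 0 elsewhere.
    An empty cell (None) yields 0 in every column of that feature, which
    treats "unknown" the same as "does not have this value" (an assumption).
    """
    columns = sorted(
        {(f, v) for row in table.values() for f, v in row.items() if v is not None}
    )
    vectors = {
        lang: [1 if row.get(f) == v else 0 for f, v in columns]
        for lang, row in table.items()
    }
    return columns, vectors


# Toy data, invented for illustration.
table: Table = {
    "lang_a": {"order": "SOV", "adposition": "Postpositions"},
    "lang_b": {"order": "SVO", "adposition": None},
}
columns, vectors = binarize(table)
print(columns)
# [('adposition', 'Postpositions'), ('order', 'SOV'), ('order', 'SVO')]
print(vectors)
# {'lang_a': [1, 1, 0], 'lang_b': [0, 0, 1]}
```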

SLIDE 8

Experimental Setup

Q1: How do induced clusters compare to phylogenetic groupings?

Clustering methods

  • the k-medoids algorithm
  • methods from the CLUTO toolkit: repeated bisection (rb), a k-means implementation (direct), an agglomerative algorithm (agglo) that uses UPGMA to produce hierarchical clusters, and bagglo, a variant of agglo

Similarity measures

  • CLUTO's default cosine similarity measure (cos)
  • shared overlap:

$$\text{shared overlap}(L_1, L_2) = \frac{\#\,\text{features with the same value}}{\#\,\text{features filled out in WALS for both } L_1 \text{ and } L_2}$$
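The two similarity measures can be sketched as below. `cosine` is plain cosine similarity over binarized vectors and is not a reproduction of CLUTO's internal implementation; `shared_overlap` follows the ratio above, computed on the raw categorical values. Returning 0 when two languages share no filled features is an assumption, since the slide does not say how that case is handled.

```python
import math
from typing import Dict, List, Optional


def cosine(u: List[int], v: List[int]) -> float:
    """Cosine similarity between two (binarized) feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def shared_overlap(a: Dict[str, Optional[str]], b: Dict[str, Optional[str]]) -> float:
    """Features with the same value / features filled out for both languages."""
    both = [f for f, val in a.items() if val is not None and b.get(f) is not None]
    if not both:
        return 0.0  # assumption: no jointly filled features means zero similarity
    same = sum(a[f] == b[f] for f in both)
    return same / len(both)


# Toy languages, invented for illustration.
lang_a = {"order": "SOV", "adposition": "Postpositions", "negation": None}
lang_b = {"order": "SOV", "adposition": "Prepositions", "negation": "Prefix"}
print(shared_overlap(lang_a, lang_b))  # 0.5 (1 match out of 2 jointly filled features)
print(cosine([1, 1, 0], [1, 0, 1]))
```

Unlike cosine over binarized vectors, shared overlap ignores every feature that is empty for either language, which makes it less sensitive to the sparsity of WALS.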

SLIDE 9

Clustering Performance Metrics

Gold standard: the genetic families.

  • Rand index
  • Cluster precision, recall, and F-score
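One way to compute these metrics is over pairs of languages: a pair counts as a true positive when it shares both an induced cluster and a gold family. The sketch below uses this pairwise formulation, which is an assumption; the exact precision and recall definitions used in the study may differ. Cluster ids and family names in the example are invented.

```python
from itertools import combinations
from typing import Dict, Hashable, Tuple

Labels = Dict[str, Hashable]  # language -> cluster id or family name


def pair_counts(pred: Labels, gold: Labels) -> Tuple[int, int, int, int]:
    """Count language pairs by whether they share a cluster and/or a family."""
    tp = fp = fn = tn = 0
    for x, y in combinations(sorted(pred), 2):
        same_pred = pred[x] == pred[y]
        same_gold = gold[x] == gold[y]
        if same_pred and same_gold:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rand_index(pred: Labels, gold: Labels) -> float:
    """Share of language pairs on which clustering and gold standard agree."""
    tp, fp, fn, tn = pair_counts(pred, gold)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 1.0


def pairwise_prf(pred: Labels, gold: Labels) -> Tuple[float, float, float]:
    """Pairwise precision, recall, and F-score against the gold families."""
    tp, fp, fn, _ = pair_counts(pred, gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score


gold = {"a": "Indo-European", "b": "Indo-European", "c": "Uralic", "d": "Uralic"}
pred = {"a": 0, "b": 0, "c": 0, "d": 1}
print(rand_index(pred, gold))  # 0.5
print(pairwise_prf(pred, gold))
```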

SLIDE 10

Prediction Accuracy

Q2: How do induced clusters and genetic families compare in predicting the values of features for languages in the same group?
Q3: Which typological features tend to stay the same within related families?

Prediction accuracy:

  • use 90% of the filled cells to build the clusters
  • predict the values of the remaining 10% of filled cells
  • compute the accuracy by comparing these predicted values with the actual values in the gold standard
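The evaluation loop can be sketched as follows. The slide does not say how a held-out value is predicted from a group; this sketch assumes a majority vote over the other languages in the same group, using only training cells as evidence. The `groups` mapping can hold either genetic families or induced clusters; for induced clusters, it should be built from the training cells only. All names and toy data are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Optional, Set, Tuple

Table = Dict[str, Dict[str, Optional[str]]]
Cell = Tuple[str, str]  # (language, feature)


def split_cells(table: Table, test_fraction: float = 0.1, seed: int = 0) -> Tuple[Set[Cell], Set[Cell]]:
    """Randomly split the filled cells into training (~90%) and held-out (~10%) sets."""
    filled = sorted((lang, f) for lang, row in table.items() for f, v in row.items() if v is not None)
    random.Random(seed).shuffle(filled)
    n_test = round(len(filled) * test_fraction)
    return set(filled[n_test:]), set(filled[:n_test])


def predict(table: Table, groups: Dict[str, Hashable], held_out: Set[Cell],
            lang: str, feature: str) -> Optional[str]:
    """Most frequent value of `feature` among the other languages in `lang`'s group.

    Held-out cells are never used as evidence; returns None if there is no evidence.
    """
    values = [
        row[feature]
        for other, row in table.items()
        if other != lang
        and groups[other] == groups[lang]
        and row.get(feature) is not None
        and (other, feature) not in held_out
    ]
    return Counter(values).most_common(1)[0][0] if values else None


def accuracy(table: Table, groups: Dict[str, Hashable], held_out: Set[Cell]) -> float:
    """Share of held-out cells whose predicted value matches the gold value."""
    if not held_out:
        return 0.0
    correct = sum(
        predict(table, groups, held_out, lang, feat) == table[lang][feat]
        for lang, feat in held_out
    )
    return correct / len(held_out)


# Toy data, invented for illustration.
table: Table = {name: {"order": v} for name, v in
                zip("abcdef", ["SOV", "SOV", "SOV", "SVO", "SVO", "SOV"])}
groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(accuracy(table, groups, {("a", "order"), ("f", "order")}))  # 0.5
train, test = split_cells(table)
```

In the toy run, language `a` is predicted correctly from its group, while `f` is outvoted by its two SVO neighbours, giving an accuracy of 0.5.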

SLIDE 11

Results & Analysis

Cluster Similarity

SLIDE 12

Results & Analysis

Prediction Accuracy

SLIDE 13

Results & Analysis

Prediction Accuracy

SLIDE 14

Results & Analysis

Feature Selection

SLIDE 15

Error Analysis

Language Similarity vs. Genetic Relatedness

SLIDE 16

Error Analysis

  • WALS as the dataset
  • The feature set in WALS
  • Data sparsity and shared features

SLIDE 17

Areal Linguistics

Using linguistic areas improves the genetic reconstruction of languages according to a variety of metrics.

Basic ideas:

  • develop a Bayesian model of typology that allows for the existence of linguistic areas
  • allow some features to have a preference for being shared areally
  • show that reconstructing language family trees is significantly aided by knowledge of areal features

SLIDE 18

Areal Linguistics

Some well-known linguistic areas:

  • The Balkans: Albanian, Bulgarian, Greek, Macedonian, Romanian, and Serbo-Croatian (sometimes also Romani and Turkish)
  • The Baltic: Baltic languages, Baltic German, and Finnic languages (especially Estonian and Livonian)

Linguistic features most easily shared areally:

  • Ross (1988): nouns > verbs > adjectives > syntax > non-bound function words > bound morphemes > phonemes
  • Curnow (2001): 15 categories of borrowable features, e.g. phonetics (rare), phonology (common), lexicon (very common)

SLIDE 19

A Bayesian Model for Areal Linguistics

  • Pitman-Yor process for modeling linguistic areas
  • Kingman's coalescent for modeling linguistic phylogeny
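The two priors named above can be illustrated by sampling from each in isolation. This is a sketch of the standard generative processes only: the Pitman-Yor process in its Chinese-restaurant form assigns languages to a growing set of areas, and Kingman's coalescent merges lineages backward in time into a binary tree. How the full model ties these priors to the typological features, and how inference is done, is not shown; parameter and function names are hypothetical.

```python
import random
from typing import List, Tuple


def pitman_yor_partition(n: int, discount: float, concentration: float,
                         rng: random.Random) -> List[int]:
    """Sample a partition of n items from the Pitman-Yor Chinese restaurant process.

    Each new item joins existing block k with weight (size_k - discount) and opens
    a new block with weight (concentration + discount * number_of_blocks).
    """
    if not (0 <= discount < 1 and concentration > -discount):
        raise ValueError("need 0 <= discount < 1 and concentration > -discount")
    sizes: List[int] = []
    labels: List[int] = []
    for _ in range(n):
        if not sizes:
            k = 0  # the first item always opens a new block
        else:
            weights = [s - discount for s in sizes] + [concentration + discount * len(sizes)]
            k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k)
    return labels


def kingman_coalescent(leaves: List[str], rng: random.Random) -> List[Tuple[float, str, str]]:
    """Sample merge events (time, left, right) from Kingman's coalescent.

    With m lineages, the waiting time to the next merge is exponential with rate
    m(m-1)/2 and the merging pair is chosen uniformly; times run backward.
    """
    active = list(leaves)
    t = 0.0
    events: List[Tuple[float, str, str]] = []
    while len(active) > 1:
        m = len(active)
        t += rng.expovariate(m * (m - 1) / 2)
        i, j = sorted(rng.sample(range(m), 2))
        right = active.pop(j)  # pop the larger index first so i stays valid
        left = active.pop(i)
        active.append(f"({left},{right})")
        events.append((t, left, right))
    return events


rng = random.Random(0)
areas = pitman_yor_partition(8, discount=0.5, concentration=1.0, rng=rng)
tree = kingman_coalescent(["lang_a", "lang_b", "lang_c", "lang_d"], rng)
print(areas)     # an area index for each of 8 languages
print(tree[-1])  # the final merge joins everything into one root
```

The discount parameter lets the number of areas grow faster than under a plain Chinese restaurant process, which gives heavier-tailed area sizes.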

SLIDE 20

Identifying Language Areas

SLIDE 21

Identifying Areal Features

SLIDE 22

Genetic Reconstruction

SLIDE 23

Genetic Reconstruction

SLIDE 24

Conclusion

  • 1. Comparing clusters derived from typological features to genetic groups:
    • across the world's languages, the induced clusters look very different from the genetic groupings
    • despite these differences, the induced clusters show similar, or even greater, levels of typological similarity than the genetic groupings

  • 2. Using linguistic areas improves the genetic reconstruction of languages

SLIDE 25

References

  • Ryan Georgi, Fei Xia, and William Lewis (2010). Comparing Language Similarity across Genetic and Typologically-Based Groupings.
  • Hal Daumé III (2009). Non-Parametric Bayesian Areal Linguistics.

SLIDE 26

Thank You!
