The semantics of color terms. A quantitative cross-linguistic investigation




slide-1
SLIDE 1

The semantics of color terms. A quantitative cross-linguistic investigation

Gerhard Jäger

gerhard.jaeger@uni-tuebingen.de

May 20, 2010

University of Leipzig

1/124

slide-2
SLIDE 2

The psychological color space

• physical color space has infinite dimensionality: every wavelength within the visible spectrum is one dimension
• psychological color space is only 3-dimensional
• this fact is employed in technical devices like computer screens (additive color space) or color printers (subtractive color space)
[Figures: additive color space; subtractive color space]

2/124

slide-3
SLIDE 3

The psychological color space

• a psychologically correct color space should correctly represent not only the topology of colors but also the distances between them
• distance is an inverse function of perceived similarity
• the L*a*b* color space has this property
• three axes:
  • black – white
  • red – green
  • blue – yellow
• irregularly shaped 3D color solid
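If distance in L*a*b* space is meant to track perceived dissimilarity, the simplest candidate measure is the Euclidean distance between two points (the CIE76 color difference). A minimal sketch; the function name and the coordinates are made up for illustration and are not taken from the talk.

```python
import math

def lab_distance(c1, c2):
    """Euclidean distance between two (L*, a*, b*) triples (CIE76 Delta E)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1, c2)))

# Hypothetical coordinates, chosen only so the arithmetic is easy to check.
grey = (50.0, 0.0, 0.0)
tinted = (50.0, 3.0, 4.0)
print(lab_distance(grey, tinted))  # 5.0
```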

3/124

slide-4
SLIDE 4

The color solid

4/124

slide-5
SLIDE 5

The Munsell chart

• for psychological investigations, the Munsell chart is used
• 2D rendering of the surface of the color solid
  • 8 levels of lightness
  • 40 hues
• plus: black–white axis with 8 shades of grey in between
• neighboring chips differ in the minimally perceivable way
[Figure: Munsell chart, rows A–J, columns 1–40]

5/124

slide-6
SLIDE 6

Berlin and Kay 1969

• pilot study of how different languages carve up the color space into categories
• informants: speakers of 20 typologically distant languages (who happened to be around the Bay Area at the time)
• questions (using the Munsell chart):
  • What are the basic color terms of your native language?
  • What is the extension of these terms?
  • What are the prototypical instances of these terms?
• results are not random
• they indicate that there are universal tendencies in color naming systems

6/124

slide-7
SLIDE 7

Berlin and Kay 1969

extensions

Arabic

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

7/124

slide-8
SLIDE 8

Berlin and Kay 1969

extensions

Bahasa Indonesia

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

8/124

slide-9
SLIDE 9

Berlin and Kay 1969

extensions

Bulgarian

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

9/124

slide-10
SLIDE 10

Berlin and Kay 1969

extensions

Cantonese

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

10/124

slide-11
SLIDE 11

Berlin and Kay 1969

extensions

Catalan

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

11/124

slide-12
SLIDE 12

Berlin and Kay 1969

extensions

English

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

12/124

slide-13
SLIDE 13

Berlin and Kay 1969

extensions

Hebrew

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

13/124

slide-14
SLIDE 14

Berlin and Kay 1969

extensions

Hungarian

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

14/124

slide-15
SLIDE 15

Berlin and Kay 1969

extensions

Ibibio

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

15/124

slide-16
SLIDE 16

Berlin and Kay 1969

extensions

Japanese

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

16/124

slide-17
SLIDE 17

Berlin and Kay 1969

extensions

Korean

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

17/124

slide-18
SLIDE 18

Berlin and Kay 1969

extensions

Mandarin

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

18/124

slide-19
SLIDE 19

Berlin and Kay 1969

extensions

Mexican Spanish

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

19/124

slide-20
SLIDE 20

Berlin and Kay 1969

extensions

Pomo

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

20/124

slide-21
SLIDE 21

Berlin and Kay 1969

extensions

Swahili

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

21/124

slide-22
SLIDE 22

Berlin and Kay 1969

extensions

Tagalog

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

22/124

slide-23
SLIDE 23

Berlin and Kay 1969

extensions

Thai

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

23/124

slide-24
SLIDE 24

Berlin and Kay 1969

extensions

Tzeltal

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

24/124

slide-25
SLIDE 25

Berlin and Kay 1969

extensions

Urdu

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

25/124

slide-26
SLIDE 26

Berlin and Kay 1969

extensions

Vietnamese

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extensions of the basic color terms]

26/124

slide-27
SLIDE 27

Berlin and Kay 1969

• identification of absolute and implicational universals, like
  • all languages have words for black and white
  • if a language has a word for yellow, it has a word for red
  • if a language has a word for pink, it has a word for blue
  • ...

27/124

slide-28
SLIDE 28

The World Color Survey

• B&K was criticized for methodological reasons
• in response, in 1976 Kay and co-workers launched the World Color Survey
• investigation of 110 unwritten languages from around the world
• around 25 informants per language
• two tasks:
  • the 330 Munsell chips were presented to each test person one after the other in random order; they had to assign each chip to some basic color term from their native language
  • for each native basic color term, each informant identified the prototypical instance(s)
• data are publicly available at http://www.icsi.berkeley.edu/wcs/

28/124

slide-29
SLIDE 29

Data digging in the WCS

distribution of focal colors across all informants:

[Figure: distribution of focal colors; x-axis: Munsell chips, y-axis: number of times named as focal color]

29/124

slide-30
SLIDE 30

Data digging in the WCS

distribution of focal colors across all informants:

[Figure: Munsell chart (rows A–J, columns 1–40) showing the distribution of focal colors]

30/124

slide-31
SLIDE 31

Data digging in the WCS

partition of a randomly chosen informant from a randomly chosen language

31/124

slide-41
SLIDE 41

What is the extension of categories?

• data from individual informants are extremely noisy
• averaging over all informants from a language helps, but there is still noise, plus dialectal variation
• desirable: distinction between “genuine” variation and noise

41/124

slide-42
SLIDE 42

Statistical feature extraction

• first step: representation of raw data in contingency matrix
  • rows: color terms from various languages
  • columns: Munsell chips
  • cells: number of test persons who used the row-term for the column-chip

A0 B0 B1 B2 · · · I38 I39 I40 J0
red · · · 2
green · · ·
blue · · ·
black · · · 18 23 21 25
white 25 25 22 23 · · ·
. . .
rot · · · 1
grün · · ·
gelb 1 · · ·
. . .
rouge · · ·
vert · · ·
. . .

• further processing:
  • divide each row by the number n of test persons using the corresponding term
  • duplicate each row n times
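The two processing steps can be sketched as follows. This is a minimal illustration on invented toy counts (two terms, four chips), not the WCS data; the function name and data layout are assumptions. Repeating each normalized row n times has the effect of weighting a term by the number of people who use it.

```python
def expand_rows(counts, n_speakers):
    """Normalize each term's row by its number of speakers n, then repeat it n times."""
    rows = []
    for term, row in counts.items():
        n = n_speakers[term]
        normalized = [c / n for c in row]
        rows.extend(list(normalized) for _ in range(n))
    return rows

# Toy data (hypothetical): chip counts per term and number of speakers per term.
counts = {"black": [0, 0, 18, 23], "white": [25, 22, 0, 0]}
n_speakers = {"black": 25, "white": 25}

rows = expand_rows(counts, n_speakers)
print(len(rows))  # 50
print(rows[0])    # [0.0, 0.0, 0.72, 0.92]
```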

42/124

slide-43
SLIDE 43

Principal Component Analysis

• technique to reduce the dimensionality of data
• input: set of vectors in an n-dimensional space
• first step: rotate the coordinate system such that
  • the new n coordinates are orthogonal to each other
  • the variations of the data along the new coordinates are uncorrelated (and stochastically independent if the data are Gaussian)
• second step: choose a suitable m < n
• project the data onto those m new coordinates where the data have the highest variance
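The two steps can be sketched with NumPy by diagonalizing the covariance matrix. This is one standard way to compute PCA, not necessarily the implementation used for the talk; the toy data and the function name are assumptions.

```python
import numpy as np

def pca(X, m):
    """Return (mean, top-m principal axes as rows, their variances) for data rows X."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)       # features are columns
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]      # indices of the m largest
    return mean, eigvecs[:, order].T, eigvals[order]

# Toy data (hypothetical): 3-dimensional points lying close to one line.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(200, 3))

mean, axes, variances = pca(X, m=1)
scores = (X - mean) @ axes.T                   # coordinates in the rotated system
explained = variances[0] / np.trace(np.cov(X, rowvar=False))
print(axes.shape, scores.shape)                # (1, 3) (200, 1)
```

Here nearly all variance falls on the first new coordinate, so m = 1 loses very little.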

43/124

slide-44
SLIDE 44

Principal Component Analysis

alternative formulation:

  • choose an m-dimensional linear sub-manifold of your n-dimensional space
  • project your data onto this manifold
  • when doing so, pick your sub-manifold such that the average squared distance of the data points from the sub-manifold is minimized

intuition behind this formulation:

  • data are “actually” generated in an m-dimensional space
  • observations are disturbed by n-dimensional noise
  • PCA is a way to reconstruct the underlying data distribution

applications: picture recognition, latent semantic analysis, statistical data analysis in general, data visualization, ...

44/124

slide-45
SLIDE 45

Statistical feature extraction: PCA

• first 15 principal components jointly explain 91.6% of the total variance
• choice of m = 15 is determined by using “Kaiser’s stopping rule”
[Figure: proportion of variance explained, by principal component]
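Kaiser's stopping rule is usually stated for PCA on a correlation matrix: keep the components whose eigenvalue exceeds 1, which is the same as exceeding the average eigenvalue. The sketch below uses the average-eigenvalue form on invented eigenvalues. Whether the talk applied the rule to a correlation or a covariance matrix is not stated, so that choice is an assumption.

```python
def kaiser_cutoff(eigenvalues):
    """Number of components whose eigenvalue exceeds the mean eigenvalue."""
    mean_ev = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for ev in eigenvalues if ev > mean_ev)

# Hypothetical eigenvalues, not the WCS values.
eigenvalues = [4.0, 2.5, 1.0, 0.3, 0.2]
m = kaiser_cutoff(eigenvalues)
explained = sum(sorted(eigenvalues, reverse=True)[:m]) / sum(eigenvalues)
print(m, explained)  # 2 0.8125
```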

45/124

slide-46
SLIDE 46

Statistical feature extraction: PCA

after some post-processing (“varimax” algorithm):

[Figure: Munsell chart (rows A–J, columns 1–40) showing the extracted features after varimax rotation]

46/124

slide-47
SLIDE 47

Projecting observed data on lower-dimensional-manifold

• noise removal: project observed data onto the lower-dimensional submanifold that was obtained via PCA
• in our case: noisy binary categories are mapped to smoothed fuzzy categories (= probability distributions over Munsell chips)
• some examples:
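The noise-removal step can be sketched as projecting each data row onto the top m principal axes and mapping it back into the original space. The example uses synthetic data in place of the WCS matrix, and the function name is an assumption. How the talk turns the smoothed rows into probability distributions is not stated, so that step is left out.

```python
import numpy as np

def smooth_by_pca(X, m):
    """Project the rows of X onto the m-dimensional PCA subspace and map them back."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:m]                                  # top-m principal axes, one per row
    return mean + (Xc @ V.T) @ V

# Synthetic stand-in: a one-dimensional signal observed with 10-dimensional noise.
rng = np.random.default_rng(0)
pattern = np.linspace(0.0, 1.0, 10)
signal = np.outer(rng.normal(scale=3.0, size=200), pattern)
noisy = signal + rng.normal(scale=0.3, size=signal.shape)

smoothed = smooth_by_pca(noisy, m=1)
err_noisy = np.mean((noisy - signal) ** 2)
err_smoothed = np.mean((smoothed - signal) ** 2)
print(err_smoothed < err_noisy)
```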

47/124

slide-68
SLIDE 68

Smoothing the partitions

• from smoothed extensions we can recover smoothed partitions
• each pixel is assigned to the category in which it has the highest degree of membership
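Recovering a partition from fuzzy extensions amounts to an argmax per chip. A minimal sketch on invented membership values; the term names and numbers are hypothetical.

```python
def smoothed_partition(membership):
    """membership: term -> list of membership degrees, one per chip.
    Returns, for each chip, the term with the highest degree of membership."""
    terms = list(membership)
    n_chips = len(membership[terms[0]])
    return [max(terms, key=lambda t: membership[t][i]) for i in range(n_chips)]

# Hypothetical smoothed extensions of three terms over three chips.
membership = {
    "red":   [0.7, 0.4, 0.1],
    "green": [0.2, 0.5, 0.3],
    "blue":  [0.1, 0.1, 0.6],
}
print(smoothed_partition(membership))  # ['red', 'green', 'blue']
```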

68/124

slide-69
SLIDE 69

Smoothed partitions of the color space

69/124

slide-79
SLIDE 79

Convexity

• note: so far, we only used information from the WCS
• the location of the 330 Munsell chips in L*a*b* space played no role so far
• still, apparently partition cells always form continuous clusters in L*a*b* space
• Hypothesis (Gärdenfors): extensions of color terms always form convex regions of L*a*b* space

79/124

slide-80
SLIDE 80

Support Vector Machines

• supervised learning technique
• smart algorithm to classify data in a high-dimensional space by a (for instance) linear boundary
• minimizes number of mis-classifications if the training data are not linearly separable
[Figure: SVM classification plot with the classes green and red; axes x and y]
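To make the idea of a linear boundary concrete, here is a minimal soft-margin linear SVM trained by full-batch subgradient descent on the regularized hinge loss. This is a deliberately simple stand-in for a real SVM solver and not the software used for the talk; the data points, hyperparameters and function names are all assumptions.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * |w|^2 + mean hinge loss; labels must be +1 or -1."""
    dim, n = len(points[0]), len(points)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:                      # point lies inside the margin
                for k in range(dim):
                    grad_w[k] -= y * x[k] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Side of the linear boundary on which x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical, linearly separable toy data.
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print([predict(w, b, p) for p in points])
```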

80/124

slide-81
SLIDE 81

Convex partitions

• a binary linear classifier divides an n-dimensional space into two convex half-spaces
• the intersection of two convex sets is itself convex
• hence: the intersection of k binary classifications leads to convex sets
• procedure: if a language partitions the Munsell space into m categories, train m(m−1)/2 binary SVMs, one for each pair of categories in L*a*b* space
• leads to m convex sets (which need not split the L*a*b* space exhaustively)
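The construction can be illustrated without training anything. In the sketch below the pairwise linear boundaries are hand-picked and purely hypothetical, standing in for trained SVMs. A point belongs to a category's convex cell only if that category wins every pairwise comparison it takes part in. The example is two-dimensional rather than in L*a*b* space, and it shows a point that ends up in no cell.

```python
from itertools import combinations

def n_pairwise_classifiers(m):
    """Number of category pairs, m(m-1)/2."""
    return m * (m - 1) // 2

# Hypothetical boundaries: pair (i, j) -> (w, b); w.x + b > 0 votes for i, else j.
BOUNDARIES = {
    (0, 1): ((0.0, 1.0), 0.0),
    (1, 2): ((1.0, 0.0), 0.0),
    (0, 2): ((1.0, 1.0), -1.0),
}

def winner(pair, point):
    w, b = BOUNDARIES[pair]
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return pair[0] if score > 0 else pair[1]

def convex_cell(point, m=3):
    """Category whose intersection of half-spaces contains the point, or None."""
    for c in range(m):
        pairs = [p for p in combinations(range(m), 2) if c in p]
        if all(winner(p, point) == c for p in pairs):
            return c
    return None

print(convex_cell((0.5, 2.0)))   # 0
print(convex_cell((0.2, 0.2)))   # None: no category wins all its comparisons
```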

81/124

slide-82
SLIDE 82

Convex approximation

82/124

slide-92
SLIDE 92

Convex approximation

• on average, 93.7% of all Munsell chips are correctly classified by convex approximation
[Figure: proportion of correctly classified Munsell chips]

92/124

slide-93
SLIDE 93

Convex approximation

• compare to the outcome of the same procedure without PCA, and with PCA but using a random permutation of the Munsell chips
[Figure: degree of convexity (%) for the three conditions 1, 2, 3]

93/124

slide-94
SLIDE 94

Convex approximation

• choice of m = 10 is somewhat arbitrary
• outcome does not depend very much on this choice, though
[Figure: mean degree of convexity (%) against the number of principal components used]

94/124

slide-95
SLIDE 95

Implicative universals

• first six features correspond nicely to the six primary colors white, black, red, green, blue, yellow
• according to Kay et al. (1997) (and many other authors), there is a simple system of implicative universals regarding possible partitions of the primary colors

95/124

slide-96
SLIDE 96

Implicative universals

stages: I, II, III, IV, V

partition types (categories separated by commas, merged primaries by slashes):

• white, red/yellow, green/blue, black
• white, red, yellow, green/blue, black
• white/red/yellow, black/green/blue
• white, red/yellow, black/green/blue
• white, red/yellow, green, black/blue
• white, red, yellow, green, blue, black
• white, red, yellow, black/green/blue
• white, red, yellow, green, black/blue
• white, red, yellow/green/blue, black
• white, red, yellow/green, blue, black
• white, red, yellow/green, black/blue

source: Kay et al. (1997)

96/124

slide-97
SLIDE 97

Partition of the primary colors

• each speaker/term pair can be projected to a 15-dimensional vector
• primary colors correspond to first 6 entries
• each primary color is assigned to the term for which it has the highest value
• defines for each speaker a partition over the primary colors
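The extraction of a speaker's partition can be sketched as follows. The vectors are invented and cut down to their first six entries, and the order of the primaries along those entries is an assumption taken from the order in which the talk lists them.

```python
PRIMARIES = ["white", "black", "red", "green", "blue", "yellow"]  # assumed order

def primary_partition(term_vectors):
    """term_vectors: term -> vector whose first six entries line up with PRIMARIES.
    Each primary goes to the term with the highest value in that entry."""
    cells = {}
    for k, color in enumerate(PRIMARIES):
        best_term = max(term_vectors, key=lambda t: term_vectors[t][k])
        cells.setdefault(best_term, set()).add(color)
    return {frozenset(cell) for cell in cells.values()}

# Hypothetical speaker with three terms.
speaker = {
    "t1": [0.90, 0.00, 0.10, 0.00, 0.00, 0.20],
    "t2": [0.05, 0.10, 0.80, 0.10, 0.00, 0.70],
    "t3": [0.00, 0.80, 0.05, 0.70, 0.90, 0.10],
}
partition = primary_partition(speaker)
print(sorted(sorted(cell) for cell in partition))
# [['black', 'blue', 'green'], ['red', 'yellow'], ['white']]
```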

97/124

slide-98
SLIDE 98

Partition of the primary colors

• for instance: sample speaker (from Pirahã)
• extracted partition: white/yellow, red, green/blue, black
• supposedly impossible, but occurs 61 times in the database
[Figure: Munsell chart (rows A–J, columns 1–40) showing the sample speaker's partition]

98/124

slide-99
SLIDE 99

Partition of primary colors

most frequent partition types:

1. {white}, {red}, {yellow}, {green, blue}, {black} (41.9%)
2. {white}, {red}, {yellow}, {green}, {blue}, {black} (25.2%)
3. {white}, {red, yellow}, {green, blue, black} (6.3%)
4. {white}, {red}, {yellow}, {green}, {black, blue} (4.2%)
5. {white, yellow}, {red}, {green, blue}, {black} (3.4%)
6. {white}, {red}, {yellow}, {green, blue, black} (3.2%)
7. {white}, {red, yellow}, {green, blue}, {black} (2.6%)
8. {white, yellow}, {red}, {green, blue, black} (2.0%)
9. {white}, {red}, {yellow}, {green, blue, black} (1.6%)
10. {white}, {red}, {green, yellow}, {blue, black} (1.2%)

99/124

slide-100
SLIDE 100

Partition of primary colors

• 87.1% of all speaker partitions obey Kay et al.’s universals
• the ten partitions that conform to the universals occupy ranks 1, 2, 3, 4, 6, 7, 9, 10, 16, 18
• the decision what counts as an exception seems somewhat arbitrary on the basis of these counts

100/124

slide-101
SLIDE 101

Partition of primary colors

• more fundamental problem:
  • partition frequencies are distributed according to a power law: frequency ∼ rank^(−1.99)
  • no natural cutoff point to distinguish regular from exceptional partitions
[Figure: log-log plot of frequency against rank]
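A relation frequency ∼ rank^(−α) is a straight line with slope −α on log-log axes, so a quick way to estimate the exponent is a least-squares line through the points (log rank, log frequency). Whether the talk's exponents were obtained this way is not stated; the sketch below is an assumption and is checked only on synthetic data with a known exponent.

```python
import math

def fit_rank_exponent(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Synthetic frequencies that follow rank^(-2) exactly.
synthetic = [1000.0 * rank ** -2.0 for rank in range(1, 51)]
exponent = fit_rank_exponent(synthetic)
print(round(exponent, 6))  # -2.0
```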

101/124

slide-102
SLIDE 102

Partition of seven most important colors

frequency ∼ rank^(−1.64)

[Figure: log-log plot of frequency against rank]

102/124

slide-103
SLIDE 103

Partition of eight most important colors

frequency ∼ rank^(−1.46)

[Figure: log-log plot of frequency against rank]

103/124

slide-104
SLIDE 104

Power laws

104/124

slide-106
SLIDE 106

Power laws

from Newman 2006

106/124

slide-107
SLIDE 107

Other linguistic power law distributions

number of vowels | vowel systems and their frequency of occurrence
3                | 14
4                | 14, 5, 4, 2
5                | 97, 3
6                | 26, 12, 12
7                | 23, 6, 5, 4, 3
8                | 6, 3, 3, 2
9                | 7, 7, 3

(from Schwartz et al. 1997, based on the UCLA Phonetic Segment Inventory Database)

107/124

slide-108
SLIDE 108

Other linguistic power law distributions

frequency ∼ rank^(−1.06)

[Figure: log-log plot of frequency against rank]

108/124

slide-109
SLIDE 109

Other linguistic power law distributions

• size of language families
• source: Ethnologue
• frequency ∼ rank^(−1.32)
[Figure: log-log plot of frequency against rank]

109/124

slide-110
SLIDE 110

Other linguistic power law distributions

• number of speakers per language
• source: Ethnologue
• frequency ∼ rank^(−1.01)
[Figure: log-log plot of frequency (in million) against rank]

110/124

slide-111
SLIDE 111

The World Atlas of Language Structures

• large-scale typological database, compiled mainly by the MPI EVA, Leipzig
• 2,650 languages in total are used
• 142 features, with between 120 and 1,370 languages per feature
• available online

111/124

slide-112
SLIDE 112

The World Atlas of Language Structures

• question: are the frequencies of feature values power-law distributed?
• problem: number of feature values usually too small for statistical evaluation
• solution:
  • cross-classification of two (randomly chosen) features
  • only such feature pairs are considered that lead to at least 30 non-empty feature value combinations
  • pilot study with 10 such feature pairs
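Cross-classifying two features means counting, over the languages coded for both, how often each combination of values occurs. A minimal sketch on invented data; the language names, feature values and function names are hypothetical, and only the threshold of 30 comes from the slide.

```python
from collections import Counter

def cross_classify(feature1, feature2):
    """feature1, feature2: language -> value. Counts value combinations
    over the languages that are coded for both features."""
    shared = feature1.keys() & feature2.keys()
    return Counter((feature1[lang], feature2[lang]) for lang in shared)

def usable(feature1, feature2, min_combinations=30):
    """True if the pair yields enough non-empty value combinations."""
    return len(cross_classify(feature1, feature2)) >= min_combinations

# Hypothetical toy features.
f1 = {"lang1": "low", "lang2": "low", "lang3": "high", "lang4": "high"}
f2 = {"lang1": "A", "lang2": "A", "lang3": "B", "lang5": "B"}

combos = cross_classify(f1, f2)
print(combos[("low", "A")], len(combos))  # 2 2
print(usable(f1, f2))                     # False
```

The frequencies in `combos` are what would then be ranked and tested for a power-law distribution.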

112/124

slide-113
SLIDE 113

The World Atlas of Language Structures

Feature 1: Consonant-Vowel Ratio
Feature 2: Subtypes of Asymmetric Standard Negation
Kolmogorov-Smirnov test: positive

[Figure: log-log plot of Pr(X ≥ x) against x]

113/124

slide-114
SLIDE 114

The World Atlas of Language Structures

Feature 1: Weight Factors in Weight-Sensitive Stress Systems
Feature 2: Ordinal Numerals
Kolmogorov-Smirnov test: positive

[Figure: log-log plot of Pr(X ≥ x) against x]

114/124

slide-115
SLIDE 115

The World Atlas of Language Structures

Feature 1: Third Person Zero of Verbal Person Marking
Feature 2: Subtypes of Asymmetric Standard Negation
Kolmogorov-Smirnov test: positive

[Figure: log-log plot of Pr(X ≥ x) against x]

115/124

slide-116
SLIDE 116

The World Atlas of Language Structures

Feature 1: Relationship between the Order of Object and Verb and the Order of Adjective and Noun
Feature 2: Expression of Pronominal Subjects
Kolmogorov-Smirnov test: positive

[Figure: log-log plot of Pr(X ≥ x) against x]

116/124

slide-117
SLIDE 117

The World Atlas of Language Structures

Feature 1: Plurality in Independent Personal Pronouns
Feature 2: Asymmetrical Case-Marking
Kolmogorov-Smirnov test: positive

[Figure: log-log plot of Pr(X ≥ x) against x]

117/124

slide-118
SLIDE 118

The World Atlas of Language Structures

Feature 1: Locus of Marking: Whole-language Typology
Feature 2: Number of Cases
Kolmogorov-Smirnov test: positive

[Figure: log-log plot of Pr(X ≥ x) against x]

118/124

slide-119
SLIDE 119

The World Atlas of Language Structures

Feature 1: Prefixing vs. Suffixing in Inflectional Morphology
Feature 2: Coding of Nominal Plurality
Kolmogorov-Smirnov test: positive

[Figure: log-log plot of Pr(X ≥ x) against x]

119/124

slide-120
SLIDE 120

The World Atlas of Language Structures

Feature 1: Prefixing vs. Suffixing in Inflectional Morphology
Feature 2: Ordinal Numerals
Kolmogorov-Smirnov test: positive

[Figure: log-log plot of Pr(X ≥ x) against x]

120/124

slide-121
SLIDE 121

The World Atlas of Language Structures

Feature 1: Coding of Nominal Plurality
Feature 2: Asymmetrical Case-Marking
Kolmogorov-Smirnov test: positive

[Figure: log-log plot of Pr(X ≥ x) against x]

121/124

slide-122
SLIDE 122

The World Atlas of Language Structures

Feature 1: Position of Case Affixes
Feature 2: Ordinal Numerals
Kolmogorov-Smirnov test: negative

[Figure: log-log plot of Pr(X ≥ x) against x]

122/124

slide-123
SLIDE 123

Why power laws?

• critical states
• self-organized criticality
• preferential attachment
• random walks
• ...

Preferential attachment

• items are stochastically added to bins
• probability to end up in bin n is linear in number of items that are already in bin n
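A small simulation makes the mechanism concrete. Picking an existing item uniformly at random and joining its bin gives each bin a probability proportional to its current size. The slide does not say how new bins come into being, so the probability `p_new` of opening a fresh bin is an assumption of this sketch, as are the function name and the seed.

```python
import random

def preferential_attachment(n_items, p_new=0.1, seed=0):
    """Simulate bin sizes under preferential attachment; returns sizes, largest first."""
    rng = random.Random(seed)
    owners = [0]                 # owners[i] = bin of item i; start with one item in bin 0
    n_bins = 1
    for _ in range(n_items - 1):
        if rng.random() < p_new:
            owners.append(n_bins)              # open a new bin (assumed mechanism)
            n_bins += 1
        else:
            owners.append(rng.choice(owners))  # uniform over items = proportional to bin size
    sizes = [0] * n_bins
    for b in owners:
        sizes[b] += 1
    return sorted(sizes, reverse=True)

sizes = preferential_attachment(10000)
print(len(sizes), sizes[:5])
```

Plotting `sizes` against rank on log-log axes is a quick way to inspect how close the simulated distribution comes to a straight line.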

123/124

slide-124
SLIDE 124

(Wide) Open questions

• Preferential attachment explains power law distribution if there are no a priori biases for particular types
• first simulations suggest that preferential attachment + biased type assignment does not lead to power law
• negative message: uneven typological frequency distribution does not prove that frequent types are inherently preferred linguistically/cognitively/socially
• unsettling questions:
  • Are there linguistic/cognitive/social biases in favor of certain types?
  • If yes, can statistical typology supply information about this?
  • If power law distributions are the norm, is there any content to the notion of statistical universal in a Greenbergian sense?

124/124