
Introduction to Dialectometry, Wilbert Heeringa, Språkbanken



  1. Introduction to Dialectometry
     Wilbert Heeringa, Språkbanken, University of Gothenburg
     30 January 2019

  2. Introduction

  3. What is dialectometry?
     • ‘The measure of dialect’ (Jean Séguy).
     • Measures the degree of difference or similarity between dialects.
     • Thus patterns in the dialect landscape can be revealed.

  4. Why dialectometry?
     • For the record of cultural history: to reveal migrations, contacts with other peoples, and internal cultural divisions.
     • May be of use to language learners, publishers, broadcasters, educators and language planners.

  5. Isogloss method
     • The primary tool of traditional dialectology has been the isogloss.
     • Greek isos means ‘equal’, Greek glōssa means ‘language’.

  6. Nucleus in ripe: [rip(ə)] (west), [rɛˑp] (central), [rip(ə)] (east)

  7. Coda in cold: [kɔˑut] (west), [kɔˑlt] (east)

  8. Nucleus in ripe & coda in cold

  9. Isogloss method
     Overlay the isogloss maps of 14 phenomena:
      1. [ʋɛrk] vs. [ʋɛrək]
      2. [splɪntər] vs. [splɪntəö]
      3. [kni] vs. [kneː] vs. [knɛːi] vs. [knɪbəl]
      4. [ziˑn] vs. [əziˑn] vs. [ɣəziˑn] vs. [jəziˑn]
      5. [steˑn̩] vs. [steˑnə] vs. [stɪˑəs]
      6. [meːstər] vs. [miˑəstər] vs. [mɛˑstər]
      7. [rip] vs. [rɛˑip]
      8. [zɛs] vs. [sɛs] vs. [sɛz]
      9. [kɔˑut] vs. [kɔˑlt]
     10. [roːzn̩] vs. [roːzən] vs. [roːzə]
     11. [lɑdər] vs. [liˑər(ə)]
     12. [bruːr] vs. [brœːijər] vs. [bruˑrə]
     13. [brʏx] vs. [brʏɣ(ə)] vs. [brʏg]
     14. [blɔˑw] vs. [blɑːt]

  10. Isoglosses of 14 phenomena. Isogloss bundles represent dialect boundaries.

  11. Isogloss method
      • Dialect borders are hard to determine, except by selecting coinciding isoglosses.

  12. Dialectometry
      We need a methodology that:
      • is purely linguistic;
      • includes all linguistic levels;
      • uses a representative data set of contemporary spoken dialect;
      • includes all data without making subjective selections;
      • utilizes the data maximally;
      • allows comparisons regardless of whether varieties are geographically close or not;
      • produces results that are unambiguous.
      Use dialectometry?

  13. Relative difference value
      • The term ‘dialectometry’ was coined by Jean Séguy.
      • He was director of the Atlas linguistique de la Gascogne.
      • Assisted and inspired by Henri Guiter.
      • Dialect distance: the number of items on which two dialects differ, expressed as a percentage.

  14. Relative difference value
      • Example: calculate the lexical relative difference value between Middelstum and Ommen on the basis of six items:

      | Item   | Middelstum | Ommen    | Difference |
      |--------|------------|----------|------------|
      | friend | kɑmərʊˑt   | kɑmərɔːt | 0          |
      | ship   | sxɪp       | sxɪp     | 0          |
      | far    | vɛːr       | ʋit      | 1          |
      | are    | bɪn        | bɪnt     | 0          |
      | still  | nɔx        | nɔx      | 0          |
      | push   | støˑtn̩     | drʏkŋ̩    | 1          |
      | Total  |            |          | 2          |

      • Distance: 2/6 = 0.33. Percentage: 33%.
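
The six-item calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function name `relative_difference` and the lexeme labels (`"A"`, `"B"`) are hypothetical stand-ins that only record whether two dialects use the same lexeme for an item.

```python
def relative_difference(dialect_a, dialect_b):
    """Share of shared items for which the two dialects use different lexemes."""
    items = dialect_a.keys() & dialect_b.keys()
    if not items:
        raise ValueError("the dialects share no items")
    differing = sum(1 for item in items if dialect_a[item] != dialect_b[item])
    return differing / len(items)


# Hypothetical lexeme labels: equal labels mean "same lexeme" for that item.
middelstum = {"friend": "A", "ship": "A", "far": "A", "are": "A", "still": "A", "push": "A"}
ommen = {"friend": "A", "ship": "A", "far": "B", "are": "A", "still": "A", "push": "B"}

distance = relative_difference(middelstum, ommen)
print(round(distance, 2))  # 0.33
```

Because each item contributes either 0 or 1, the same function applies to any linguistic level for which items can be judged same or different.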

  15. Relative difference value
      • We call this the ‘relative difference value’.
      • Can be used for all linguistic levels.
      • No gradual distances between items: a pair of items counts as either the same (0) or different (1).
      • Goebl (1982 and later) measured dialect similarity and called this the Relative Identity Value (RIV).

  16. Weighted difference value
      • Goebl (1984) introduced the Weighted Identity Value (WIV).
      • Basic idea: similarity in rare lexemes contributes more strongly to the overall similarity between two local dialects than similarity in common lexemes.
      • Since we focus on distances rather than on similarity, we present the ‘weighted difference value’.

  17. Weighted difference value
      • Example: in a set of 360 dialects we find the following lexemes for schip ‘ship’: schip (353), boot (2), lager (1), schuit (4). In terms of distances:
        schip vs. schip: 353/360 = 0.981
        schuit vs. schuit: 4/360 = 0.011
        boot vs. boot: 2/360 = 0.006
      • The distance between different lexemes (for example schip versus boot) is always 1.

  18. Weighted difference value
      • Example: calculate the lexical weighted difference value between Middelstum and Ommen on the basis of 6 words:

      | Item   | Middelstum | Ommen    | Frequency | Weight |
      |--------|------------|----------|-----------|--------|
      | friend | kɑmərʊˑt   | kɑmərɔːt | 140/354   | 0.40   |
      | ship   | sxɪp       | sxɪp     | 353/360   | 0.98   |
      | far    | vɛːr       | ʋit      |           | 1      |
      | are    | bɪn        | bɪnt     | 176/360   | 0.49   |
      | still  | nɔx        | nɔx      | 354/355   | 1.00   |
      | push   | støˑtn̩     | drʏkŋ̩    |           | 1      |
      | Total  |            |          |           | 4.87   |

      • Distance: 4.87/6 = 0.81. Percentage: 81%.
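
The weighting rule can be sketched as follows, assuming (as the schip example suggests) that the distance for a shared lexeme is its relative frequency in the data set and that differing lexemes always get distance 1. The function name `weighted_item_distance` is hypothetical. Note that the total of 4.87 adds up the rounded per-item weights; summing the unrounded fractions gives about 4.86, and both lead to 0.81.

```python
def weighted_item_distance(lexeme_a, lexeme_b, counts):
    """Distance for one item: 1 if the lexemes differ, else the lexeme's relative frequency."""
    if lexeme_a != lexeme_b:
        return 1.0
    return counts[lexeme_a] / sum(counts.values())


# Lexeme counts for 'ship' in the set of 360 dialects.
ship_counts = {"schip": 353, "boot": 2, "lager": 1, "schuit": 4}
print(round(weighted_item_distance("schip", "schip", ship_counts), 3))    # 0.981
print(round(weighted_item_distance("schuit", "schuit", ship_counts), 3))  # 0.011
print(weighted_item_distance("schip", "boot", ship_counts))               # 1.0

# Per-item weights for Middelstum vs. Ommen, taken from the six-word example.
item_weights = [140 / 354, 353 / 360, 1.0, 176 / 360, 354 / 355, 1.0]
weighted_difference = sum(item_weights) / len(item_weights)
print(round(weighted_difference, 2))  # 0.81
```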

  19. Levenshtein distance
      Pronunciations of one word in ten dialects (� marks a symbol lost in extraction): Groningen m�lk, Grouw mɔlkə, Haarlem mɛlək, Almelo m�l�k, Polsbroek m�l�k, Renesse mæl�k, Venray m�l�k, Mechelen m�l�k, Alveringem mæk, Kerkrade m�l�x.
      How to quantify differences between the dialect pronunciations?

  20. Levenshtein distance
      • Levenshtein distance was introduced in dialectology by Brett Kessler.
      • In 1995 he measured linguistic distances between Irish Gaelic dialects.
      • Later it was applied to Dutch, Sardinian, Norwegian, American English, German, Bulgarian and Bantu dialect/language varieties by others.
      • Calculate the cost of changing one string into another.

  21. Levenshtein distance
      • Example: milk may be pronounced as [mɛlək] in the dialect of Haarlem and as [mɔlkə] in the dialect of Grouw.
      • Change the first pronunciation into the other:

      | String | Operation      | Cost |
      |--------|----------------|------|
      | mɛlək  | substitute ɛ/ɔ | 1    |
      | mɔlək  | delete ə       | 1    |
      | mɔlk   | insert ə       | 1    |
      | mɔlkə  |                |      |
      | Total  |                | 3    |

      • Many sequences of operations map [mɛlək] → [mɔlkə]. Levenshtein distance = cost of the cheapest mapping.

  22. Levenshtein distance
      • Alignment:

      | Position | 1 | 2 | 3 | 4 | 5 | 6 |
      |----------|---|---|---|---|---|---|
      | Haarlem  | m | ɛ | l | ə | k |   |
      | Grouw    | m | ɔ | l |   | k | ə |
      | Cost     |   | 1 |   | 1 |   | 1 |

      • We keep track of the alignment length.
      • If multiple alignments all have the minimum cost, we calculate the length of the longest alignment.
      • The longest alignment has the greatest number of matches and is linguistically most plausible.
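
A minimal dynamic-programming sketch of Levenshtein distance with binary weights, which also returns the length of the longest alignment among those with minimum cost. It is an illustration under two assumptions, not the author's code: each character of the input string is one segment (so length marks and diacritics would need to be pre-segmented), and the function name `levenshtein` is hypothetical.

```python
def levenshtein(a, b):
    """Return (cost, length): minimum edit cost and longest alignment length at that cost."""
    # Each cell holds (cost, -length); min() then prefers lower cost and,
    # among equal costs, the longer alignment.
    table = [[(0, 0)] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        table[i][0] = (i, -i)  # delete the first i segments of a
    for j in range(1, len(b) + 1):
        table[0][j] = (j, -j)  # insert the first j segments of b
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            diag, up, left = table[i - 1][j - 1], table[i - 1][j], table[i][j - 1]
            table[i][j] = min(
                (diag[0] + substitution, diag[1] - 1),  # match or substitution
                (up[0] + 1, up[1] - 1),                 # deletion
                (left[0] + 1, left[1] - 1),             # insertion
            )
    cost, negative_length = table[-1][-1]
    return cost, -negative_length


print(levenshtein("mɛlək", "mɔlkə"))  # (3, 6)
```

For the Haarlem and Grouw pronunciations there is also a five-slot alignment with cost 3 (three substitutions); the tie-break on length selects the six-slot alignment, which has more matches.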

  23. Alignment
      • In a linguistic alignment we ensure that the minimum cost is based on an alignment in which:
        ◦ a vowel matches with a vowel
        ◦ a consonant matches with a consonant
        ◦ the [j] or [w] matches with a vowel
        ◦ the [i] or [u] matches with a consonant
        ◦ the schwa matches with a sonorant
      • A pair of pronunciations to be compared with Levenshtein distance preferably consists of cognates, as in all of the examples.
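
The matching constraints can be expressed as a predicate over segment pairs. The sketch below is only an illustration: the segment inventories are small hypothetical sets chosen to cover the examples in these slides, the name `may_align` is invented, and how a disallowed pair is penalised inside the distance computation is left open because it is not specified here.

```python
# Hypothetical, deliberately small segment inventories (assumptions for illustration).
VOWELS = set("aeiouyɛɔɪʏɑəøœ")
SONORANTS = set("mnŋlrjw")
CONSONANTS = set("pbtdkgfvszxɣhʋ") | SONORANTS
GLIDES = {"j", "w"}
HIGH_VOWELS = {"i", "u"}
SCHWA = "ə"


def may_align(x, y):
    """True if segments x and y may be matched under the constraints listed above."""
    def one_way(p, q):
        return (
            (p in VOWELS and q in VOWELS)
            or (p in CONSONANTS and q in CONSONANTS)
            or (p in GLIDES and q in VOWELS)
            or (p in HIGH_VOWELS and q in CONSONANTS)
            or (p == SCHWA and q in SONORANTS)
        )
    # The constraints are symmetric, so test both orders.
    return one_way(x, y) or one_way(y, x)


print(may_align("ɛ", "ɔ"))  # True: vowel with vowel
print(may_align("ə", "r"))  # True: schwa with sonorant
print(may_align("ə", "k"))  # False: schwa with a non-sonorant consonant
```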

  24. Levenshtein distance
      • Variation among dialects is usually not measured on the basis of a single word, but on a set of words.
      • Assume that for two dialects we calculate the Levenshtein distance for n word pairs.
      • How do we combine them into one distance, i.e. how do we calculate the aggregated distance?

  25. Calculating the aggregate
      • Example: calculate the distance in the sound components between Middelstum and Ommen on the basis of 6 words:

      | Item   | Middelstum | Ommen   | Sum of weights | Length of alignment |
      |--------|------------|---------|----------------|---------------------|
      | ship   | sxɪp       | sxɪp    | 0              | 4                   |
      | cap    | pɛt        | pɛtə    | 1              | 4                   |
      | called | rɔupm      | ərupm   | 2              | 6                   |
      | jump   | sprɪŋ      | sprɪŋkt | 2              | 7                   |
      | cellar | kɛlər      | kɛldər  | 1              | 6                   |
      | house  | hus        | hys     | 1              | 3                   |
      | Total  |            |         | 7              | 30                  |

      • ‘Raw distance’ is 7/6 = 1.17, normalized distance is 7/30 = 0.233 = 23.3%.
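
The two aggregates can be computed directly from the per-word costs and alignment lengths. This is a small sketch with a hypothetical function name (`aggregate`); the input pairs are the six (sum of weights, alignment length) values from the Middelstum and Ommen example.

```python
def aggregate(results):
    """Return (raw, normalized) distance from a list of (cost, alignment_length) pairs."""
    total_cost = sum(cost for cost, _ in results)
    total_length = sum(length for _, length in results)
    raw = total_cost / len(results)        # mean cost per word pair
    normalized = total_cost / total_length  # cost per alignment slot
    return raw, normalized


# (sum of weights, length of alignment) for ship, cap, called, jump, cellar, house.
word_pairs = [(0, 4), (1, 4), (2, 6), (2, 7), (1, 6), (1, 3)]
raw, normalized = aggregate(word_pairs)
print(round(raw, 2), round(normalized, 3))  # 1.17 0.233
```

Dividing by the summed alignment length rather than by the number of words keeps long words from dominating the aggregate.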

  26. Operation weights
      • In the examples above we used binary weights:
        ◦ the weight is 0 (match of two sounds) or 1 (substitution of one sound by another);
        ◦ when a sound is inserted or deleted, the weight is also 1.
      • Refinement by using gradual PMI distances as operation weights.

  27. PMI-based Levenshtein distance
      • Introduced in dialectology by Martijn Wieling, Jelena Prokić and John Nerbonne in 2009.
      • Pointwise Mutual Information (PMI) assesses the degree of dependence between aligned segments.
      Procedure:
      repeat
        ◦ compare each dialect to every other dialect using Levenshtein distance (the first time with binary weights, later with the newly calculated weights);
        ◦ find new weights by analyzing the alignments: the more frequently segments co-occur in an alignment, the smaller the distance weight;
      until the weights no longer change.
      • Alignments made by PMI Levenshtein are better; see Wieling, Prokić and Nerbonne (2009).
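
The ‘find new weights’ step can be sketched as below. This covers one pass only, not the full repeat-until loop, and it rests on stated assumptions: PMI is taken in its standard form log2(p(x, y) / (p(x) p(y))), segment probabilities are estimated from the aligned pairs themselves, and PMI is turned into a weight by linear rescaling to [0, 1]. The published procedure may differ in these details, and the names `pmi_weights` and `weight` are hypothetical.

```python
import math
from collections import Counter


def pmi_weights(aligned_pairs):
    """Map each observed segment pair to a weight in [0, 1]; higher PMI gives a lower weight."""
    pairs = [tuple(sorted(pair)) for pair in aligned_pairs]  # treat (x, y) and (y, x) alike
    pair_counts = Counter(pairs)
    segment_counts = Counter(segment for pair in pairs for segment in pair)
    n_pairs = len(pairs)
    n_segments = 2 * n_pairs
    pmi = {}
    for (x, y), count in pair_counts.items():
        p_xy = count / n_pairs
        p_x = segment_counts[x] / n_segments
        p_y = segment_counts[y] / n_segments
        pmi[(x, y)] = math.log2(p_xy / (p_x * p_y))
    highest, lowest = max(pmi.values()), min(pmi.values())
    span = (highest - lowest) or 1.0  # avoid division by zero when all PMIs are equal
    return {pair: (highest - value) / span for pair, value in pmi.items()}


def weight(weights, x, y):
    """Look up the weight of a segment pair regardless of order."""
    return weights[tuple(sorted((x, y)))]


# Toy alignment data (invented): ɛ~e and ɔ~o co-occur often, ɛ~o rarely.
toy_pairs = [("ɛ", "e")] * 3 + [("ɔ", "o")] * 3 + [("ɛ", "o")]
weights = pmi_weights(toy_pairs)
print(weight(weights, "ɛ", "e") < weight(weights, "ɛ", "o"))  # True
```

In a full implementation these weights would replace the binary substitution costs in the next round of alignments, and the loop would stop once the weights no longer change.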

  28. Application
      • Reeks Nederlandse Dialectatlassen, compiled by E. Blancquaert and W. Pée.
      • Texts from 1922–1975, 1956 local dialects, 139 sentences each.
      • We selected 361 dialects, 125 words.

  29. Distribution of the 361 dialects in the Dutch dialect area.

  30. Beam maps
      • Introduced by Goebl (±1983).
      • Distances between dialects are represented by lines between local dialects on a map.
      • Each local dialect is connected by a straight line with every other dialect.
      • Darker lines represent smaller distances, lighter lines represent larger distances.

  31. Beam maps: lexical relative difference values (left), lexical weighted difference values (middle) and pronunciation Levenshtein distances (right).

  32. Honeycomb maps
      • Have existed since Haag (1898) and were ‘reintroduced’ by Goebl (±1983).
      • Show distances between geographically neighboring dialects.
      • Related dialects are separated by lighter lines, and more remote dialects are separated by darker lines.
      • Cartographic inversion of beam maps.

  33. Honeycomb maps: lexical relative difference values (left), lexical weighted difference values (middle) and pronunciation Levenshtein distances (right).
