Introduction to Dialectometry III, Wilbert Heeringa

SLIDE 1

Introduction to Dialectometry III

Wilbert Heeringa

German Academic Exchange Service (DAAD)
University of Bielefeld, Faculty of Linguistics and Literary Studies
Frisian Academy

Abidjan, December 19–23, 2016

SLIDE 2

Topics

  • Gabmap
  • Literature

SLIDE 3

Gabmap

SLIDE 4

What is Gabmap?

  • A web application that visualizes dialect variation: ‘Doing dialect analysis on the web’.
  • Developed by Peter Kleiweg under the supervision of John Nerbonne.
  • Based on functions in the RuG/L04 package, which has existed since 2001 and has been freely distributed since 2004.
  • Gabmap has been under development since the end of 2010 and was first published on GitHub on June 4, 2011.

SLIDE 5

What is Gabmap?

  • Original version available at:

http://www.let.rug.nl/~kleiweg/L04/webapp

  • Version forked and maintained by Çağrı Çöltekin:

http://www.gabmap.nl/ (also maintained by Martijn Wieling).

SLIDE 7

Gabmap running on USB stick

  • Peter Kleiweg developed a Docker image of Gabmap, available at https://github.com/pebbe/Gabmap-docker, which enables us to run Gabmap without an internet connection.
  • The ‘Docker version’ of Gabmap is installed in Lubuntu 16.04, an operating system based on the Linux kernel.
  • GpsPrune and LibreOffice are also installed. The ‘Calc spreadsheet’ component enables you to create and view dialect data tables.

SLIDE 8

How to boot from the USB stick

  • Turn on your computer. Then:
  • on a PC, press F9 or F12 or ESC or ... right after this;
  • on a Mac, hold the Alt/Option key as soon as you hear the Mac's startup chime.
  • A list of devices will appear. Choose the entry with ‘Lexar JumpDrive S75 USB 3.0’.
  • After a while, another ‘boot menu’ will appear. Just press ENTER.
  • Log in with:

username: guest, password: guest

  • After some time, Gabmap opens in Firefox.

SLIDE 9

How to boot from the USB stick

  • If booting from the USB stick does not succeed, check your UEFI settings.
  • Right after turning on your computer, press ESC or ?.
  • Disable Secure Boot (enable it again when you are done).
  • Change the boot mode to CSM or Legacy (legacy BIOS compatibility mode, legacy USB support, ?).

SLIDE 12

Input

  • Gabmap needs three input files:
  • map
  • dialect data
  • feature definition file

SLIDE 13

Input: map

  • A map consists of at least:
  • an outline of the area;
  • placemarks for the locations where the data were collected. NB: place names should be spelled exactly as in your data file!

  • Optionally, more details can be added to the map, for example internal borders, rivers.
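Because place names must be spelled exactly as in the data file, a quick consistency check before uploading can save time. The following is a minimal sketch of our own, not a Gabmap feature: it assumes the map uses the standard KML 2.2 namespace (as written by Google Earth; older files may use a different one), and it treats only placemarks containing a `Point` as locations, so an outline placemark is skipped. The helper names `placemark_names` and `check_names` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the map uses the standard KML 2.2 namespace.
KML = "{http://www.opengis.net/kml/2.2}"


def placemark_names(kml_text: str) -> set[str]:
    """Hypothetical helper: names of all point placemarks (the locations)."""
    root = ET.fromstring(kml_text)
    names = set()
    for placemark in root.iter(KML + "Placemark"):
        name = placemark.find(KML + "name")
        point = placemark.find(KML + "Point")
        # Placemarks without a Point (e.g. the outline polygon) are ignored.
        if point is not None and name is not None and name.text:
            names.add(name.text.strip())
    return names


def check_names(kml_text: str, data_places: list[str]) -> tuple[list[str], list[str]]:
    """Hypothetical helper: (places missing on the map, placemarks missing in the data)."""
    on_map = placemark_names(kml_text)
    in_data = set(data_places)
    return sorted(in_data - on_map), sorted(on_map - in_data)
```

Here `data_places` would be the first column of the data table, since its rows represent the locations. A placemark named ‘Bouaké’ and a data row labelled ‘Bouake’ would show up on both sides of the result, because the two spellings differ.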

SLIDE 14

Input: map

  • The maps can be created with Google Earth or Google Maps.
  • For manuals on creating maps, see:

http://www.let.rug.nl/~kleiweg/L04/kml/manual.html (Google Earth)
http://coltekin.net/cagri/courses/leuven/ (Google Maps)

  • The two manuals are also found in the folder Gabmap/manuals on the USB stick.
  • Save the map as .kml or .kmz file
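Maps are normally drawn in Google Earth or Google Maps, but it can help to see what a saved .kml file minimally contains: one placemark whose `Polygon` is the outline, and one point placemark per location. The sketch below builds such a file with Python's standard `xml.etree.ElementTree`. It only illustrates the KML 2.2 structure; whether Gabmap expects anything beyond this is not covered here, and the names `tag` and `build_kml` as well as the example coordinates are hypothetical. Note that KML writes coordinates as longitude,latitude and that a `LinearRing` ends where it starts. A .kmz file is a zip archive containing such a .kml file.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # write KML as the default namespace


def tag(name: str) -> str:
    """Hypothetical helper: qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{name}"


def build_kml(outline: list[tuple[float, float]],
              places: dict[str, tuple[float, float]]) -> str:
    """Hypothetical sketch. outline: (lon, lat) border points; places: name -> (lon, lat)."""
    kml = ET.Element(tag("kml"))
    doc = ET.SubElement(kml, tag("Document"))

    # Outline of the area as a polygon; close the ring by repeating the first point.
    ring = outline if outline[0] == outline[-1] else outline + [outline[0]]
    border = ET.SubElement(doc, tag("Placemark"))
    polygon = ET.SubElement(border, tag("Polygon"))
    boundary = ET.SubElement(polygon, tag("outerBoundaryIs"))
    linear_ring = ET.SubElement(boundary, tag("LinearRing"))
    ET.SubElement(linear_ring, tag("coordinates")).text = " ".join(
        f"{lon},{lat}" for lon, lat in ring)

    # One point placemark per location; the name must match the data file exactly.
    for name, (lon, lat) in places.items():
        placemark = ET.SubElement(doc, tag("Placemark"))
        ET.SubElement(placemark, tag("name")).text = name
        point = ET.SubElement(placemark, tag("Point"))
        ET.SubElement(point, tag("coordinates")).text = f"{lon},{lat}"

    return ET.tostring(kml, encoding="unicode")
```

The returned string can be saved as UTF-8 text with a .kml extension and then opened in GpsPrune or Google Earth for further editing.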

SLIDE 15

Input: map

  • Once at least an outline is available as a .kml file, we can edit the file with GpsPrune.
  • Locations can be added when their coordinates are known. Coordinates can be obtained
via Google Maps.
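One pitfall when copying coordinates: Google Maps displays them as latitude, longitude (for example `5.3600, -4.0083`), whereas KML stores longitude first. The hypothetical helper below converts a pair as copied from Google Maps into a KML `coordinates` value. It is a sketch for illustration, not part of GpsPrune or Gabmap.

```python
def google_to_kml(latlng: str, altitude: float = 0.0) -> str:
    """Hypothetical helper: convert 'lat, lon' (as copied from Google Maps) to KML 'lon,lat,alt'."""
    lat_text, lon_text = latlng.split(",")
    lat, lon = float(lat_text), float(lon_text)
    # Reject values outside the valid ranges (this also catches some swapped pairs).
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"not a valid latitude/longitude pair: {latlng!r}")
    return f"{lon},{lat},{altitude:g}"
```

For example, `google_to_kml("5.3600, -4.0083")` returns `"-4.0083,5.36,0"`.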

SLIDE 22

Input: map

  • GpsPrune converts polygons into lines. However, Gabmap requires polygons!
  • Load your .kml file in Leafpad and edit it there.
  • Use Search > Replace and check ‘Replace all at once’ to perform the replacements throughout the whole document.

  • Replace:
  • ‘<LineString>’ by ‘<Polygon><outerBoundaryIs><LinearRing>’
  • ‘</LineString>’
by ‘</LinearRing></outerBoundaryIs></Polygon>’
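The same two replacements can also be scripted, which is handy when several maps have to be fixed. This minimal sketch uses hypothetical function names and performs exactly the textual substitutions listed on the slide and nothing more, so, as with the manual procedure, it assumes the `<LineString>` tags carry no attributes. It also assumes each line drawn in GpsPrune ends where it starts, since a KML `LinearRing` is expected to be closed (last coordinate equal to the first).

```python
from pathlib import Path

# The two substitutions from the slide: turn each line into a polygon outline.
REPLACEMENTS = [
    ("<LineString>", "<Polygon><outerBoundaryIs><LinearRing>"),
    ("</LineString>", "</LinearRing></outerBoundaryIs></Polygon>"),
]


def lines_to_polygons(kml_text: str) -> str:
    """Hypothetical helper: apply all replacements throughout the whole document."""
    for old, new in REPLACEMENTS:
        kml_text = kml_text.replace(old, new)
    return kml_text


def convert_file(source: str, target: str) -> None:
    """Hypothetical helper: read a .kml file, convert lines to polygons, save the result."""
    text = Path(source).read_text(encoding="utf-8")
    Path(target).write_text(lines_to_polygons(text), encoding="utf-8")
```

A call such as `convert_file("map.kml", "map-polygons.kml")` (hypothetical file names) leaves the original file untouched, so the conversion can be repeated if something goes wrong.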

SLIDE 23

Input: dialect data

  • The dialect data should be in a table where:
  • the rows represent the locations where the data was collected;
  • the columns represent the data items.
  • Prepare the data file using LibreOffice Calc (on USB stick) or Microsoft Excel.
  • Use the IPA chart Unicode keyboard at https://westonruter.github.io/ipa-chart/keyboard/ to find the Unicode characters.
  • The chart covers the International Phonetic Alphabet as revised to 2005.

SLIDE 26

Input: dialect data

  • To be uploaded to Gabmap, the data file has to be a tab-separated plain-text file encoded as Unicode (UTF-8 or UTF-16).
  • To save the file in this format, choose ‘Save As’ in the ‘File’ menu and select ‘Text CSV (.csv)’ in the lower right corner of the ‘Save’ window.
  • In the ‘Export Text File’ window, choose ‘Tab’ as the field delimiter.
  • The resulting file with the extension .csv can be uploaded in Gabmap.
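If the table is generated by a script rather than typed in Calc, the same format can be written directly. The sketch below, using Python's standard `csv` module, writes tab-separated UTF-8 text in which rows are locations and columns are items. Leaving the top-left cell empty and putting the item names in the first row is our assumption about the layout; the Gabmap manual on preparing dialect data is the authority. The data and the names `to_tsv` and `write_table` are hypothetical.

```python
import csv
import io


def to_tsv(items: list[str], rows: dict[str, list[str]]) -> str:
    """Hypothetical helper: render a table (rows = locations, columns = items) as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + items)  # header row: empty corner cell, then item names (assumed layout)
    for place, values in rows.items():
        if len(values) != len(items):
            raise ValueError(f"{place}: {len(values)} values for {len(items)} items")
        writer.writerow([place] + values)
    return buffer.getvalue()


def write_table(path: str, items: list[str], rows: dict[str, list[str]]) -> None:
    """Hypothetical helper: save the table as UTF-8 text (e.g. with a .csv extension)."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(to_tsv(items, rows))


# Hypothetical example with two locations and two items.
example = to_tsv(["item1", "item2"], {"Place A": ["hus", "mus"], "Place B": ["hys", "mys"]})
```

Writing through the `csv` module rather than joining strings by hand means that a field which happens to contain a tab or a line break is quoted instead of silently shifting the columns.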

SLIDE 29

Input: dialect data

  • When opening an existing file in LibreOffice Calc, load it as Unicode (UTF-8 or UTF-16) and choose the tab as the separator.

SLIDE 34

Input: dialect data

  • Data types other than transcriptions can also be analyzed in Gabmap, especially categorical data.

SLIDE 36

Input: dialect data

  • See also the manual on preparing dialect data for Gabmap, which can be found
  • under Help in Gabmap;
  • in the folder Gabmap/manuals on the USB stick.
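Besides following the manual, a prepared data file can be checked before uploading. The sketch below is our own helper, not a Gabmap tool, and the name `read_table` is hypothetical. It decodes the file as UTF-16 when it starts with a UTF-16 byte-order mark and as UTF-8 otherwise, splits it on tabs, skips blank lines, and reports rows whose number of fields differs from the header row.

```python
import codecs
import csv
import io


def read_table(data: bytes) -> tuple[list[str], list[list[str]]]:
    """Hypothetical helper: decode a tab-separated data file and check the row lengths."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = data.decode("utf-16")  # the byte-order mark tells the codec the byte order
    else:
        text = data.decode("utf-8-sig")  # strict UTF-8; an optional UTF-8 BOM is removed
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    if not rows:
        raise ValueError("empty file")
    header, body = rows[0], rows[1:]
    for number, row in enumerate(body, start=2):
        if len(row) != len(header):
            raise ValueError(f"row {number}: {len(row)} fields, expected {len(header)}")
    return header, body
```

It could be called as `read_table(Path("data.csv").read_bytes())` with `pathlib.Path`. A file that is not valid UTF-8 (for instance one saved in a legacy encoding) raises `UnicodeDecodeError`, which is itself a `ValueError`.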

SLIDE 37

Input: feature definition file

  • The file IPA.def is found in /Gabmap/datasets.
  • It covers the Unicode characters of the IPA as revised to 2005.
  • Using this file ensures that in an alignment of two pronunciations:
  • a vowel matches with a vowel
  • a consonant matches with a consonant

and allows that:

  • the [j] or [w] matches with a vowel
  • the [i] or [u] matches with a consonant
  • the schwa matches with a sonorant
  • Substitutions, insertions and deletions have a weight of 1.

SLIDE 38

Input: feature definition file

  • If two segments are the same but differ in suprasegmentals or diacritics, the weight is 0.3.
  • The following are not processed: primary stress, secondary stress, minor (foot) group, major (intonation) group, syllable break and linking (absence of a break).

  • NB: language-specific adjustments may be necessary!

However, be careful when changing IPA.def.
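The two slides above describe in words how pronunciation differences are weighted. The following Python sketch puts these rules together in a plain Levenshtein distance over segments. It is our illustration under clear simplifications, not Gabmap's implementation and not a reader for the IPA.def format: the symbol sets are tiny hypothetical samples, diacritics are recognized only as Unicode combining characters or a few listed modifier letters (precomposed letters such as [ç] count as one symbol), and pairs that may not be aligned get an infinite substitution cost, so the algorithm falls back on a deletion plus an insertion (cost 2). The names `segments`, `allowed`, `substitution_cost` and `distance` are hypothetical.

```python
import math
import unicodedata

# Not processed (per the slide): primary/secondary stress, minor (foot) group,
# major (intonation) group, syllable break, linking.
IGNORED = {"ˈ", "ˌ", "|", "‖", ".", "‿"}
# Hypothetical, deliberately small symbol sets; IPA.def defines many more.
MODIFIERS = {"ː", "ˑ", "ʰ", "ʷ", "ʲ"}  # length marks and spacing diacritics
VOWELS = set("aeiouyɛɔəɪʊæɑøœ")
SONORANTS = set("mnŋlrjw")


def segments(transcription: str) -> list[str]:
    """Split a transcription into segments: a base symbol plus its diacritics."""
    result: list[str] = []
    for ch in transcription:
        if ch in IGNORED or ch.isspace():
            continue  # not processed
        if result and (unicodedata.combining(ch) or ch in MODIFIERS):
            result[-1] += ch  # attach the diacritic to the preceding symbol
        else:
            result.append(ch)
    return result


def allowed(a: str, b: str) -> bool:
    """May base symbols a and b be aligned with each other?"""
    if (a in VOWELS) == (b in VOWELS):
        return True  # vowel with vowel, consonant with consonant
    vowel, cons = (a, b) if a in VOWELS else (b, a)
    return (cons in {"j", "w"}                   # [j] or [w] with a vowel
            or vowel in {"i", "u"}               # [i] or [u] with a consonant
            or (vowel == "ə" and cons in SONORANTS))  # schwa with a sonorant


def substitution_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0
    if a[0] == b[0]:
        return 0.3  # same segment, different diacritics or suprasegmentals
    return 1.0 if allowed(a[0], b[0]) else math.inf


def distance(x: str, y: str) -> float:
    """Levenshtein distance over segments; insertions and deletions cost 1."""
    s, t = segments(x), segments(y)
    d = [[0.0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        d[i][0] = float(i)
    for j in range(1, len(t) + 1):
        d[0][j] = float(j)
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            d[i][j] = min(d[i - 1][j] + 1.0,  # deletion
                          d[i][j - 1] + 1.0,  # insertion
                          d[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]))
    return d[len(s)][len(t)]
```

For instance, `distance("ˈkaːt", "kat")` yields 0.3: the stress mark is skipped and [aː] versus [a] differ only in length. In `distance("pa", "pm")` the vowel [a] may not match the consonant [m], so the result is 2.0, whereas `distance("pə", "pn")` is 1.0 because schwa may match a sonorant.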

SLIDE 39

Running Gabmap

  • Now that we have a map, a data table and a feature definition file, we can run Gabmap.

SLIDE 42

Literature

SLIDE 43

Literature (1)

Goebl, H. (1982). Dialektometrie; Prinzipien und Methoden des Einsatzes der numerischen Taxonomie im Bereich der Dialektgeographie. Wien: Verlag der Öst. Akademie der Wissenschaften.

Goebl, H. (1984). Dialektometrische Studien anhand italoromanischer, rätoromanischer und galloromanischer Sprachmaterialien aus AIS und ALF. Volume 1 (Volumes 2 and 3 contain maps and tables). Tübingen: Max Niemeyer.

Goebl, H. (2006). Recent Advances in Salzburg Dialectometry. Literary and Linguistic Computing 21(4), 411–435.

Goebl, H. (2010a). Dialectometry and quantitative mapping. In Language and Space. An International Handbook of Linguistic Variation. Volume 2: Language Mapping. Handbücher zur Sprach- und Kommunikationswissenschaft [HSK], edited by Alfred Lameli, Roland Kehrein and Stefan Rabanus, 30.2, 433–457, 2201–2212. Berlin: de Gruyter Mouton.

Goebl, H. (2010b). Dialectometry: Theoretical prerequisites, practical problems, and concrete applications (mainly with examples drawn from the “Atlas Linguistique de La France”, 1902–1910). Dialectologia. Special Issue, I(2010): 63–77.

SLIDE 44

Literature (2)

Gooskens, Ch., Beijering, K. and Heeringa, W. (2008). Phonetic and lexical predictors of intelligibility. International Journal of Humanities and Arts Computing, 2(1–2): 63–81.

Gooskens, Ch. and Heeringa, W. (2004). Perceptive Evaluation of Levenshtein Dialect Distance Measurements using Norwegian Dialect Data. Language Variation and Change, 16(3), 189–207.

Heeringa, W. (2004). Measuring dialect pronunciation differences using Levenshtein distance. PhD thesis, University of Groningen.

Heeringa, W., Kleiweg, P., Gooskens, Ch. and Nerbonne, J. (2006). Evaluation of String Distance Algorithms for Dialectology. In Linguistic Distances Workshop at the joint conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics, Sydney, July, 2006, edited by John Nerbonne and Erhard Hinrichs, 51–62. Stroudsburg PA: The Association for Computational Linguistics (ACL).

Kessler, B. (1995). Computational dialectology in Irish Gaelic. In Proceedings of the 7th Conference of the European Chapter of the Association for Computational Linguistics, 60–67, Dublin. EACL.

SLIDE 45

Literature (3)

Kruskal, J. B. (1999). An overview of sequence comparison. In Time Warps, String Edits, and Macromolecules. The Theory and Practice of Sequence Comparison, edited by D. Sankoff and J. Kruskal, 2nd ed., 1–44. Stanford: Center for the Study of Language and Information. 1st edition appeared in 1983.

Leinonen, T., Çöltekin, Ç. and Nerbonne, J. (2016). Using Gabmap. Lingua 178 (2016), special issue on Linguistic Infrastructure, edited by Jan Odijk, 71–83.

Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Cybernetics and Control Theory, 10(8): 707–710.

Nerbonne, J. (2009). Data-driven dialectology. Language and Linguistics Compass 3(1), 175–198.

Nerbonne, J. (2010). Mapping aggregate variation. In A. Lameli, R. Kehrein and S. Rabanus (eds.), Language and Space Vol. 2. Language Mapping. Berlin: De Gruyter, 476–495.

Nerbonne, J., Colen, R., Gooskens, Ch., Kleiweg, P. and Leinonen, T. (2011). Gabmap – a web application for dialectology. Dialectologia: revista electrònica, special issue on Production, Perception and Attitude, edited by John Nerbonne, Stef Grondelaers, Dirk Speelman & Maria-Pilar Perea, 65–89.

SLIDE 46

Literature (4)

Nerbonne, J. and Heeringa, W. (2010). Measuring dialect differences. In J. E. Schmidt and P. Auer (eds.), Language and Space Vol. 1. Theories and Methods. Berlin: De Gruyter, 550–567.

Snoek, C. (2014). Review of Gabmap: Doing Dialect Analysis on the Web. Language Documentation and Conservation 8, 192–208.

Spruit, M. R., Heeringa, W. and Nerbonne, J. (2009). Associations among Linguistic Levels. Lingua, special issue on The Forests behind the Trees, edited by John Nerbonne and Franz Manni, 119(11): 1624–1642.

Wieling, M. (2012). A Quantitative Approach to Social and Geographical Dialect Variation. PhD dissertation, University of Groningen.

Wieling, M., Bloem, J., Mignella, K., Timmermeister, M. and Nerbonne, J. (2014). Measuring foreign accent strength in English. Validating Levenshtein Distance as a Measure. Language Dynamics and Change 4(2): 253–269.

Wieling, M., Margaretha, E. and Nerbonne, J. (2012). Inducing a measure of phonetic similarity from dialect variation. Journal of Phonetics, 40(2), 307–314.

SLIDE 47

Literature (5)

Wieling, M., Prokić, J. and Nerbonne, J. (2009). Evaluating the Pairwise String Alignments of Pronunciations. In Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education (LaTeCH SHELT&R 2009), EACL Workshop, edited by Lars Borin and Piroska Lendvai, 18–25.

SLIDE 48

Final remarks

  • The course materials can be found
  • in the folder /Gabmap/ on the USB stick;
  • on the web at http://www.wjheeringa.nl/dialect/.
  • There you can find:
  • the slides
  • exercises
  • data sets (only USB)
  • software (only WEB)
  • manuals
  • literature about Gabmap.

SLIDE 50

Thank you for your attention!
