Tagging modality in Oceanic languages of Melanesia

SLIDE 1

MelaTAMP project Data Method Inter-annotator agreement Conclusion

Tagging modality in Oceanic languages of Melanesia

Annika Tjuka, Lena Weißmann, and Kilu von Prince
August 1st, 2019
The 13th Linguistic Annotation Workshop

SLIDE 2

The MelaTAMP project

SLIDE 3

Introduction

Saliba-Logea Mavea Daakaka Dalkalaen Daakie Nafsan North Ambrym

Figure 1: Subject languages of the MelaTAMP project.

SLIDE 4

The Languages

  • Subject languages: Daakaka, Dalkalaen, Daakie, Mavea, Nafsan, Saliba-Logea, and North Ambrym.
  • Speaker populations range from about 30 (Mavea) to around 6000 (Nafsan).
  • So far, our understanding of the Oceanic languages of Melanesia is based mostly on descriptive accounts.

SLIDE 5

The MelaTAMP Project

  • Comparative research
  • Based on corpus data
  • Texts were recorded during fieldwork sessions with speakers of the respective language.
  • Investigation of tense, aspect, modality, and polarity (TAMP) in Oceanic languages.

This talk focuses on our study of tagging modality in five of the seven subject languages.

SLIDE 6

Expressing TAMP

  • TAM-related meanings are often expressed obligatorily within the verbal complex, sometimes in more than one place.

Slot:  sbj.agr  cond  neg    it/incpt    num      impf   redup-  Verb  adv  tr  obj
Form:  i-, …    mo-   sopo-  m̋e-/pete-  r-/tol-  l(o)-                     =i  =a/NP

Table 1: The verbal complex in Mavea (Guérin, 2011).

  • In contrast, Saliba-Logea only uses optional particles to express TAM-related meanings.

SLIDE 8

Data

SLIDE 9

Corpora

  • Corpora of the following languages were considered in this study: Daakaka, Dalkalaen, Mavea, Nafsan, and Saliba-Logea.
  • In contrast to previous approaches, we did not identify a specific target set of expressions to label (e.g., modal auxiliaries and adverbs).

SLIDE 10

Sub-Corpus

  • We prioritized a comparable sub-corpus (26 texts).
  • Text types: descriptions of wildlife behaviour, and tales and fables about miraculous events including mysterious figures and animals native to the region.

SLIDE 11

Overview

                 Total              Tagged
Language         #Texts   #Tok.     #Texts   #Clauses
Daakaka          119      68k       5        141
Dalkalaen        114      34k       6        658
Mavea            61       45k       3        634
Nafsan           110      65k       6        363
Saliba-Logea     214      150k*     6        157
Total            618      362k      26       1953

Table 2: Corpora included in this study; Tok.: tokens; *of the 150k tokens in this corpus, about 70k are fully annotated.

SLIDE 12

Method

SLIDE 13

Previous Approaches to Tagging Modality

  • Previous approaches differentiate between modal flavours, such as deontic and epistemic, and modal forces, such as necessity and possibility.
  • These distinctions are notoriously difficult to tag (Rubinstein et al., 2013).

SLIDE 14

Our Tag Set

Category          Name      Tags
Clause type       clause    assertion, question, directive;
                            embedded: proposition, conditional, e.question,
                            temporal, adverbial, attributive
Temporal domain   time      past, future, present
Modal domain      mood      factual, counterfactual, possible
Aspectual domain  event     bounded, ongoing, repeated, stative
Polarity          polarity  positive, negative

Table 3: Tag set of the MelaTAMP project, see https://wikis.hu-berlin.de/melatamp/Main_page.
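The tag set in Table 3 can be read as a small controlled vocabulary with one value per layer. The following is a minimal sketch of that idea, not the project's actual tooling: the dictionary copies the tags from Table 3, while the helper name `validate_annotation` and the assumption that every clause carries exactly one tag on each of the five layers are hypothetical.

```python
# Tags copied from Table 3. The structure and the helper below are a
# hypothetical illustration, not the MelaTAMP project's own software.
TAG_SET = {
    "clause": {
        "assertion", "question", "directive",
        # embedded clause types
        "proposition", "conditional", "e.question",
        "temporal", "adverbial", "attributive",
    },
    "time": {"past", "future", "present"},
    "mood": {"factual", "counterfactual", "possible"},
    "event": {"bounded", "ongoing", "repeated", "stative"},
    "polarity": {"positive", "negative"},
}


def validate_annotation(annotation):
    """Return a list of problems; an empty list means the annotation is valid.

    Assumption: each clause receives exactly one tag per layer.
    """
    problems = []
    for layer, allowed in TAG_SET.items():
        if layer not in annotation:
            problems.append(f"missing layer: {layer}")
        elif annotation[layer] not in allowed:
            problems.append(f"invalid tag for {layer}: {annotation[layer]}")
    for layer in annotation:
        if layer not in TAG_SET:
            problems.append(f"unknown layer: {layer}")
    return problems


# A clause tagged as a past, factual, bounded, positive assertion.
example = {
    "clause": "assertion",
    "time": "past",
    "mood": "factual",
    "event": "bounded",
    "polarity": "positive",
}
print(validate_annotation(example))
```

A check of this kind would catch labels from other modal taxonomies (for instance "deontic") that are not part of the three-way mood distinction.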

SLIDE 15

Branching-times Framework

Figure 2: The three domains of the factual (solid line), the counterfactual (dotted lines), and the possible future (dashed lines). Vertically aligned indices are here taken to be simultaneous (von Prince, 2019).

SLIDE 16

Example: factual

(1) mwe   liye  an        bosi
    REAL  take  3SG.POSS  copra.chisel
    “He took his copra chisel.” (Daakaka)

  • clause: assertion
  • time: past
  • mood: factual
  • event: bounded
  • polarity: positive

SLIDE 17

Example: counterfactual

(2) ru=mroki      [na    ruk=fan    sol  tete  mane   em̃rom   sto].
    3PL.RS=think  COMP   3PL.IR=go  get  some  money  inside  shop
    “they thought [someone had taken money from inside the shop].” (Nafsan: 030.048)

  • clause: proposition
  • time: past
  • mood: counterfactual
  • event: bounded
  • polarity: positive

SLIDE 18

Example: possible

(3) ka   na-p     pwer-pwer   yen  or
    MOD  1SG-POT  REDUP-stay  in   bush
    “I will live in the bush.” (Daakaka: 1348)

  • clause: assertion
  • time: future
  • mood: possible
  • event: stative
  • polarity: positive

SLIDE 19

Results of Inter-Annotator Agreement

SLIDE 20

Results in each Category

Figure 3: Percentages of total inter-annotator consistencies (orange) and inconsistencies (grey) in each TAMP category of the tag set.
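Percentages of the kind plotted in Figure 3 can be obtained by comparing two annotators' tags clause by clause within one category. The sketch below assumes exactly two annotators who tagged the same clauses in the same order; the function name `percent_agreement` and the four example tags are made up for illustration and are not project data.

```python
def percent_agreement(tags_a, tags_b):
    """Share of clauses (in percent) on which two annotators chose the same tag."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotators must tag the same clauses")
    if not tags_a:
        raise ValueError("no clauses to compare")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return 100.0 * matches / len(tags_a)


# Invented mood tags for four clauses (illustration only).
annotator_a = ["factual", "possible", "factual", "counterfactual"]
annotator_b = ["factual", "possible", "possible", "counterfactual"]

consistent = percent_agreement(annotator_a, annotator_b)
inconsistent = 100.0 - consistent
print(consistent, inconsistent)
```

Raw percentages do not correct for agreement that would arise by chance, which is why a chance-corrected coefficient is usually reported alongside them.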

SLIDE 21

Inter-Annotator Agreement Score for each Category

  • Polarity: α¹ = 0.91
  • Mood: α = 0.86
  • Clause: α = 0.85
  • Time: α = 0.85
  • Event: α = 0.79

¹ Krippendorff’s alpha coefficient (Krippendorff, 1980).
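For readers unfamiliar with the coefficient, Krippendorff's alpha for nominal tags is 1 minus the ratio of observed to expected disagreement, both derived from a coincidence matrix of tag pairs. The slides do not say how the scores were computed or how many annotators were involved, so the following is only a sketch of the standard nominal-data form; the function name and input layout are assumptions.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of lists; each inner list holds the tags that all
    annotators assigned to one clause (hypothetical input layout).
    """
    # Coincidence matrix: every ordered pair of tags within a unit
    # contributes 1 / (m - 1), where m is the number of tags in the unit.
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a unit with a single tag cannot be paired
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per tag and overall number of pairable values.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("not enough pairable values")

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("only one tag value occurs; alpha is undefined")
    return 1.0 - observed / expected


# Invented polarity tags from two annotators for four clauses.
toy_units = [
    ["positive", "positive"],
    ["positive", "positive"],
    ["negative", "negative"],
    ["positive", "negative"],
]
print(krippendorff_alpha_nominal(toy_units))
```

A value of 1 indicates perfect agreement and 0 indicates agreement at chance level, so the scores above sit toward the upper end of the scale.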

SLIDE 22

Results in the Mood Category

  • The high α score in this category suggests that our three-way distinction (factual, counterfactual, possible) is efficient.

SLIDE 23

Conclusion

SLIDE 24

Conclusion

  • The overall tag set that we used to annotate the TAM categories exhibits a high percentage of inter-annotator consistency across the different categories.
  • Our modal tag set has proven useful for our purposes.
  • Depending on the languages and the goals of tagging modality, our tag set may be an interesting alternative to other models.

Thank you!

SLIDE 28

Appendix

References

Carletta, Jean, 1996. Assessing agreement on classification tasks: the kappa statistic. Computational Linguistics, 22(2):249–254.
Druskat, Stephan, 2018. ToolboxTextModules (Version 1.1.0).
Franjieh, Michael, 2013. A documentation of North Ambrym, a language of Vanuatu. London: SOAS, ELAR.
Guérin, Valérie, 2006. Documentation of Mavea. London: SOAS, ELAR.
Guérin, Valérie, 2011. A grammar of Mavea: An Oceanic language of Vanuatu. Honolulu: University of Hawai’i Press.
Klecha, Peter, 2011. Optional and obligatory modal subordination. In Proceedings of Sinn und Bedeutung, volume 15.
Krause, Thomas and Amir Zeldes, 2016. ANNIS3: A new architecture for generic corpus query and visualization. Digital Scholarship in the Humanities, 31(1):118–139.
Krifka, Manfred, 2013. Daakie, The Language Archive. Nijmegen: MPI for Psycholinguistics.
Krippendorff, Klaus, 1980. Content analysis: An introduction to its methodology. Sage Publications.
Margetts, Anna, Andrew Margetts, and Carmen Dawuda, 2017. Saliba/Logea. The Language Archive.
MelaTAMP, 2017. Primary data repository – MelaTAMP. https://wikis.hu-berlin.de/melatamp.
von Prince, Kilu, 2019. Counterfactuality and Past. Linguistics and Philosophy.
von Prince, Kilu, 2013a. Daakaka, The Language Archive. Nijmegen: MPI for Psycholinguistics.
von Prince, Kilu, 2013b. Dalkalaen, The Language Archive. Nijmegen: MPI for Psycholinguistics.
Rubinstein, Aynat et al., 2013. Toward fine-grained annotation of modality in text. In Proceedings of IWCS 10, WAMM, Potsdam.
Thieberger, Nick, 2006. Dictionary and texts in South Efate. Digital collection managed by PARADISEC.