Rusyn as a language between state borders: a statistical approach to variation (for small sample sizes)


  1. Rusyn as a language between state borders – a statistical approach to variation (for small sample sizes). Albert-Ludwig University of Freiburg, Germany, Department of Slavonic Studies. Prof. Dr. Achim Rabus & M. Zaidan Lahjouji. Project: Russinisch als eine Staatsgrenzen überschreitende Minderheitensprache: Quantitative Perspektiven (Rusyn as a minority language across state borders: quantitative perspectives)

  2. Topics: • The Rusyn project • Interests and Aims • Border Effects • The Corpus of Spoken Rusyn • Quantitative approaches to spoken data • Pitfalls, Possible Solutions and Limitations • Example Dataset & Analysis

  3. The Rusyn Project • Interests and aims: • Status / Condition of the Carpatho-Rusyn language • Documentation of Spoken Carpatho-Rusyn (Corpus) • Language Contact with Several "Roofing Languages" (Slavic and Non-Slavic) • Contact-induced changes? • Language Perception • Border Effects (Woolhiser 2005) • Quantitative / Statistical Approaches to Spoken Language Data

  4. The Rusyn Project • Interests and aims: • Status / Condition of the Carpatho-Rusyn language • Documentation of the Rusyn Varieties (Corpus) • Language Contact with Several "Roofing Languages" (Slavic and Non-Slavic) • Contact-induced changes? • Language Perception • Border Effects (Woolhiser 2005) • Quantitative / Statistical Approaches to Spoken Language Data (RStudio)

  5. Magocsi, P. R.: Národ znikadiaľ : ilustrovaná história karpatských Rusínov. Prešov : Rusín a Ľudové noviny, 2007, p. 34.

  6. Rusyns as National Minority

  7. Sociolinguistic Factors • Status? Age? Sex? Education? Mobility? Religion? Which factors determine how people speak?

  8. Border Effects as Hypothesis: Border effects (Woolhiser 2005) are detectable within the Rusyn vernacular. [Map labels: Poland, Ukraine, Slovakia, Hungary, Romania]

  9. Example: A(j) Conjugation. Source: Pugh, S. M. (2009). The Rusyn language: A grammar of the literary standard of Slovakia with reference to Lemko and Subcarpathian Rusyn. München, p. 117.

  10. Corpus of Spoken Rusyn. CQP query search: [word='ма|має|мат*|зна|знає|знат*|позна|познає|познат*'%cd]

  11. Example: Variation within conjugation types AJ and A(j) (Pugh 2009: 116-20) • Our dataset contains: • Threefold variation: мати, 3 Ps. Sg. Pres. (ма, має, мат(ь)) and (по-)знати, 3 Ps. Sg. Pres. ((по-)зна, (по-)знає, (по-)знат(ь)) • Several utterances by the same speakers: biased data and a violation of assumptions! • Context • Metadata of speakers
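
The threefold variation on slide 11 can be made concrete with a small tagging sketch. The pattern, the function name `classify`, the labels `a` / `aje` / `at`, and the token list below are all assumptions for illustration; they are not taken from the project's corpus tooling.

```python
import re
from collections import Counter

# Hypothetical pattern for the forms listed on the slide:
# stem (ма | зна | позна) followed by an ending ("" | є | т | ть).
FORM = re.compile(r"(ма|(?:по)?зна)(є|ть?)?", re.IGNORECASE)

def classify(token):
    """Return 'a', 'aje' or 'at' for a matching verb form, else None."""
    m = FORM.fullmatch(token)
    if m is None:
        return None
    ending = (m.group(2) or "").lower()
    if ending == "":
        return "a"      # ма, зна, позна
    if ending == "є":
        return "aje"    # має, знає, познає
    return "at"         # мат(ь), знат(ь), познат(ь)

# Invented example tokens, not corpus data.
tokens = ["ма", "Має", "мать", "знат", "познає", "мама"]
counts = Counter(c for c in map(classify, tokens) if c is not None)
print(counts)
```

Matching whole tokens (`fullmatch`) keeps unrelated words such as "мама" out of the counts.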

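  12. Coefficients in Multinomial Logistic Regression Model
      \[ \ln\frac{P(\mathit{verbForm} = \mathit{ma})}{P(\mathit{verbForm} = \mathit{mae})} = b_{10} + b_{11}(\mathit{variety} = \mathit{Slo}) + b_{12}(\mathit{variety} = \mathit{Lem}) + b_{13}\,\mathit{Age} + b_{14}(\mathit{sex} = m) \]
      \[ \ln\frac{P(\mathit{verbForm} = \mathit{mat})}{P(\mathit{verbForm} = \mathit{mae})} = b_{20} + b_{21}(\mathit{variety} = \mathit{Slo}) + b_{22}(\mathit{variety} = \mathit{Lem}) + b_{23}\,\mathit{Age} + b_{24}(\mathit{sex} = m) \]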
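
As a rough illustration of how the two log-odds equations on slide 12 translate into probabilities for the three verb forms, the sketch below inverts them for one speaker profile. The function name and the coefficient values are hypothetical; the reference form is "mae", and the variety that is neither "Slo" nor "Lem" is assumed to be the reference level.

```python
import math

def verb_form_probs(b1, b2, variety, age, sex):
    """Probabilities of ma / mat / mae from two log-odds equations.

    b1, b2: coefficient tuples (b_k0, b_k1, b_k2, b_k3, b_k4) for
    ln(P(ma)/P(mae)) and ln(P(mat)/P(mae)) respectively.
    """
    # Design vector: intercept, two variety dummies, age, sex dummy.
    x = (1.0, float(variety == "Slo"), float(variety == "Lem"),
         float(age), float(sex == "m"))
    eta_ma = sum(b * xi for b, xi in zip(b1, x))
    eta_mat = sum(b * xi for b, xi in zip(b2, x))
    # The reference category contributes exp(0) = 1 to the denominator.
    denom = 1.0 + math.exp(eta_ma) + math.exp(eta_mat)
    return {
        "ma": math.exp(eta_ma) / denom,
        "mat": math.exp(eta_mat) / denom,
        "mae": 1.0 / denom,
    }

# Hypothetical coefficients, chosen only to make the example run.
b1 = (0.5, 1.2, -0.3, 0.01, 0.2)
b2 = (-0.4, 0.1, 0.9, 0.02, -0.1)
probs = verb_form_probs(b1, b2, variety="Slo", age=60, sex="m")
print(probs)
```

The three probabilities sum to 1, and taking the log of `probs["ma"] / probs["mae"]` recovers the linear predictor of the first equation.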

  13. Problems • Data set is rather small • Biased data set • Dependent variable (verb_form) is categorical • Threefold variation • Independent variables are predominantly categorical • Violation of assumptions (independence) • We have collected precious data, so we don't want to give up

  14. Bootstrapping [Diagram: the original sample (Sample0) is resampled into Sample1, Sample2, Sample3, …, Sample500; a regression is fitted to each sample, and the results are combined into a robust estimation.]
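
The resampling idea on slide 14 can be sketched in a few lines. This is a minimal percentile bootstrap under stated assumptions: the statistic is a simple proportion standing in for the regression that the talk fits to each resample, and the function names and the data are invented for illustration.

```python
import random

def bootstrap_ci(data, statistic, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Draw n_boot resamples of size n with replacement and
    # evaluate the statistic on each of them.
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def share_of_ma(forms):
    """Proportion of 'ma' forms in a list of verb-form labels."""
    return sum(f == "ma" for f in forms) / len(forms)

# Invented observations, not corpus data.
forms = ["ma"] * 14 + ["mae"] * 9 + ["mat"] * 7
lower, upper = bootstrap_ci(forms, share_of_ma)
print(share_of_ma(forms), (lower, upper))
```

Because the data set contains several utterances by the same speakers, one option would be to resample whole speakers rather than single utterances, so that each resample keeps the dependence within a speaker intact.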

  15. Conclusion • Bootstrapping provides a robust estimate of the values of interest, even when assumptions are not met or the data set is small and/or biased. • Even after bootstrapping, clear tendencies remain visible: settlement area (variety) seems to be the most significant factor.

  16. Conclusion • Statistical methods are useful for several aspects of our research. • Our possibilities are rather limited. • Assumptions are often violated when applying state-of-the-art methods. • Nevertheless, robust methods help us to obtain less biased estimates. • Robust estimates should always be reported.

  17. Файно Вам дякуєме за Вашу увагу! Thank you very much for your attention! Contact: zaidan.lahjouji@slavistik.uni-freiburg.de achim.rabus@slavistik.uni-freiburg.de www.russinisch.de

  18. Literature • Christ, Oliver (1994). A modular and flexible architecture for an integrated corpus query system. In: Proceedings of COMPLEX'94: 3rd Conference on Computational Lexicography and Text Research, 23–32. • Evert, S. and Hardie, A. (2011). Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium. In: Proceedings of the Corpus Linguistics 2011 Conference, Birmingham, UK. University of Birmingham. • Hinneburg, Alexander, Heikki Mannila, Samuli Kaislaniemi, Terttu Nevalainen & Helena Raumolin-Brunberg (2007). "How to handle small samples: Bootstrap and Bayesian methods in the analysis of linguistic change". Literary and Linguistic Computing 22(2): 137–150. • Mueller, T., Schmid, H. & Schütze, H. (2013). Efficient higher-order CRFs for morphological tagging. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 322–332, Seattle, Washington, USA, October. Association for Computational Linguistics. • Rabus, A. & A. Šymon (2015): Na novŷch putjach isslidovanja rusyns'kŷch dialektu. Korpus rozhovornoho rusyns'koho jazŷka. In: Koporová, Kvetoslava (Hrsg.): Rusyn'skŷj literaturnŷj jazŷk na Slovakiji. 20 rokiv kodifikaciji. Prešov, 40-54. • Rabus, Achim (2015): Current Developments in Carpatho-Rusyn Speech - Preliminary Observations. In: Krafcik P. & V. Padjak (eds.): Juvilejnyj zbirnyk na čest' profesora Pavla-Roberta Magočija. Užhorod, 489-496. • Rabus, A. & Scherrer, Y. (2017): Lexicon Induction for Spoken Rusyn - Challenges and Results. In: Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, 27-32.

  19. Literature • Rabus, A., Savić, S., Waldenfels, R. v. (2012). Towards an electronic corpus of the Velikie Minei Čet'i. In: Rediscovery: Bulgarian Codex Suprasliensis of the 10th century. Sofia: Iztok Zapad. • Scherrer, Y. & Rabus, A. (2017): Multi-source morphosyntactic tagging for spoken Rusyn. In: Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), 84–92. • Schimon, A. & A. Rabus (2016): Wahrnehmungsdialektologische Untersuchungen zum Russinischen in Zakarpattja am Beispiel der Region Chust. In: Zeitschrift für Slawistik 61(3), 401-432. • Šymon, A. & A. Rabus (2016): Ysslidovanja rusyns'koho jazŷka yz pohljada vospryymatel'noji dialektologiji. In: Dynamické procesy v súčasnej slavistike, S. 71-88. (Nachdruck in Rusyn 5/2016 und 6/2016) • v. Waldenfels, R.; Woźniak, M. (2017). SpoCo – a simple and adaptable web interface for dialect corpora. In: Journal for Language Technology and Computational Linguistics, 31(1), 145–160. • v. Waldenfels, R.; Daniel, M.; Dobrushina, N. (2014): Why Standard Orthography? Building the Ustya River Basin Corpus, an online corpus of a Russian dialect. Komp'juternaja lingvistika i intellektual'nye technologii: Po materialam ežegodnoj Meždunarodnoj konferencii «Dialog» (Bekasovo, 4–8 ijunja 2014 g.). Vyp. 13 (20). M.: Izd-vo RGGU, 2014. • Woolhiser, C. (2005). Political borders and dialect divergence/convergence in Europe. In: Peter Auer, Frans Hinskens, and Paul Kerswill (eds.), Dialect change, 236–262. Cambridge: Cambridge Univ. Press.
