A Sentence is a Sentence is a Sentence? Zarah Weiss

Segmenting Oral and Historical Language Data. A Sentence is a Sentence is a Sentence? Parallels and Differences between the Segmentation of Oral and Historical [...] Zarah Weiss


  1. Requirements of Sentential Segmentation Units
     By Auer (2010):
     1. Exhaustivity: capture all the linguistic material
     2. Atomism: do not include other segments of their type
     3. Discreteness: do not allow overlapping segments
     4. Coherence on linguistic level: use linguistic descriptions
     Additional criteria:
     ◮ Be reliably applicable across and within annotators
     ◮ Avoid written language bias and comparative fallacy
     Outline (slide sidebar): Introduction; Defining Sentences; Written Language Bias; Defining Sentential Units; Historical Continuity; The LangBank T-Unit; Background; Annotation Principles; Basic Definition; Sentential Properties; Handling Ambiguities; Inter-Rater Reliability; Segmenting Speech; The SegCor Project; Maximal Syntactic Unit; Generalizable Solutions; Open Issues; Conclusion; References

  2. Define Sentential Units Beyond a Written Norm
     ◮ Define sentential units for non-standard language
     ◮ Here for Early New High German (ENHG; 1482–1652)
     ◮ Bridge from ENHG to spoken language data
     → ENHG shares linguistic properties with spoken German
     → Historical continuity of non-normative syntactic patterns (Hennig 2007; Sandig 1973)

  3. Dependent Clause with Verb-Last (VL)
     (4) genannt Absinthium / welcher Name bis auf den heutigen Tag in den Apotheken geblieben ist
         called Absinthium / which name until the present day in the pharmacies remained is
         ‘called Absinthium, which to the present day is the name used in pharmacies.’
     Fuchs (1543)

  4. weil-V2 in Spoken German
     (5) a. weil ich Sie hierauf Aufmerksam machen will
         b. weil ich will Sie hierauf Aufmerksam machen
         c. ‘because I want to call your attention to this’
     STUTTGART21 (2010)

  5. weil-V2 in ENHG
     (6) M. nennt sie Rapiens Vitam / wann sie ist giftig
         M. calls her Rapiens Vitam / because she is poisonous
         ‘M. calls it Rapiens Vitam because it is poisonous.’
     Tallat (1532)

  6. The LangBank Project
     ◮ From 09/2015 to 09/2018
     ◮ Create annotated corpora for teaching and learning
     ◮ Languages: Classical Latin and ENHG
     ◮ Complement teaching with corpus-based work
     ◮ Investigate applicability of NLP for non-standard data
     ◮ ENHG corpus available online (footnote 1: https://korpling.org/annis3/)

  7. Why Early New High German?
     ◮ Part of the German Arts and Literature curriculum
     ◮ Shares morpho-syntactic properties with contemporary German, e.g. verb asymmetry
     ◮ Interesting diachronic differences with respect to linguistic properties
     → Written ENHG register is forming
     ◮ Highly variable non-standard language variety

  8. The RIDGES Corpus
     ◮ Register in Diachronic German Science (RIDGES)
     ◮ Serves as empirical basis for the project
     ◮ Corpus of herbal texts (1482 to 1914) (Lüdeling et al. 2018)
     ◮ LangBank time period: 1482 to 1652
     ◮ LangBank core: 14 books (80,095 dipl tokens)
     → Sentence-ending punctuation emerges (Hartweg & Wegera 2005)

  9. Punctuation Marks in ENHG (Bock, 1539)
     [Facsimile image not reproduced. Marks labelled on the slide: virgule (sentence final); punctuation mark (sentence final).]

  12. Punctuation Marks in ENHG (cont.)
      (7) das Wasser [...] braucht [...] Hieronymus von
          the water [...] needs [...] Hieronymus of
          Braunschweig für das Abnehmen . Für den
          Braunschweig for the losing weight . For the
          Hauptschwindel . Denen so Blut speien .
          dizziness . those who blood vomit
          ‘Hieronymus von Braunschweig uses this water against phthisis, dizziness, and to heal those who vomit blood’
      Megenberg (1482)

  13. Sentential Units in LangBank
      Approach
      ◮ No graphematic definition possible
      ◮ Syntactic definition required
      Purpose
      1. Annotation of syntactic phenomena
      2. Basis for automatic and manual linguistic analysis
      3. Unit for corpus representation (visualization, query)
      4. (Investigation of diachronic changes from 1482 to 1914)

  14. Assumptions from Contemporary German
      Linguistic units
      ◮ Words and parts-of-speech
      ◮ Phrases and constituents
      ◮ Clauses
      → Sentential units based on original t-unit (Hunt 1965)
      → ENHG t-unit, henceforth TU
      Linguistic theories
      ◮ X-bar theory
      ◮ Topological field model

  15. X-bar Theory
      [XP spec [X’ [X’ [X head] comp] adj]]

  16. X-bar Theory (cont.)
      [VP [NP Julia] [V’ [V’ [V meets] [NP Franca]] [PP at noon]]]

  18. Topological Field Model
      ◮ Non-hierarchical model of German clause structure
      ◮ Helps identify word order and constituency structure
      → Arranged around the German sentence bracket
      → Model core applicable to ENHG, but with limitations
      ◮ Wöllstein (2014); Pittner & Berman (2004); Drach (1937)

  19. Topological Field Model (cont.)
      N    | Pre | LSB   | Middle Field  | RSB         | Post
      S 0  | Ich | habe  | heute früher  | gegessen    | S 3
           | I   | have  | today earlier | eaten       |
      S 2  | -   | Ist   | das ein Satz? | -           | -
           |     | Is    | this a sentence? |          |
      S 3  | -   | damit | ich mir Zeit  | lassen kann | -
           |     | so.that | I me time   | leave can   |
      (Pre = prefield, LSB = left sentence bracket, RSB = right sentence bracket, Post = postfield. The slide builds up row by row; in the final step, S 3 fills the postfield of S 0.)
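The field table can be mirrored in a small data structure. The following is only a sketch for illustration: the `Clause` class, the `linearize` function, and the convention that a field may hold the label of another clause are assumptions, not part of the slides or of any LangBank tooling.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Clause:
    pre: str = ""      # prefield
    lsb: str = ""      # left sentence bracket
    middle: str = ""   # middle field
    rsb: str = ""      # right sentence bracket
    post: str = ""     # postfield; may hold the label of an embedded clause


def linearize(label: str, clauses: Dict[str, Clause]) -> str:
    """Concatenate the fields of a clause in topological order.

    A field whose content is the label of another clause is expanded
    recursively (as with S3 in the postfield of S0).
    """
    clause = clauses[label]
    parts = []
    for content in (clause.pre, clause.lsb, clause.middle, clause.rsb, clause.post):
        if content in clauses:
            parts.append(linearize(content, clauses))
        elif content:
            parts.append(content)
    return " ".join(parts)


# Rows S0 and S3 from the slide's table.
clauses = {
    "S0": Clause(pre="Ich", lsb="habe", middle="heute früher", rsb="gegessen", post="S3"),
    "S3": Clause(lsb="damit", middle="ich mir Zeit", rsb="lassen kann"),
}
print(linearize("S0", clauses))
```

Reading the fields left to right reproduces the surface order of the clause, which is the sense in which the model is non-hierarchical yet still tells you where the verb bracket sits.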

  23. Starting with a T-unit
      A t-unit is the “shortest grammatically allowable sentence into which (writing can be split) or minimally terminable unit” (Hunt 1965)
      → Circular if used as a sentence definition

  25. Adjusting the Traditional T-unit
      A TU consists of the head of a phrase and all of its arguments and adjuncts and nothing else. (Weiss & Schnelle 2016)
      ◮ Does not require a predefined notion of a sentence
      ◮ Uses X-bar terminology to express “grammatically” and “minimally terminable”
      ◮ On its own misses crucial sentence properties
      → Elaborated on in the remaining rules

  27. Atomism
      Property
      ◮ Sentential units (SU) do not contain other SU
      → they are atomic (Auer 2010)
      Rule
      ◮ TU do not govern other TU
      ◮ The head of a TU may not itself be the argument or the adjunct of another head
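The atomism rule can be sketched in code under one loud assumption: that every token is annotated with the index of the head that governs it, and that ungoverned heads carry `None`. This encoding and the function name `tu_of_each_token` are hypothetical and chosen only for illustration; the slides do not describe a data format.

```python
from typing import List, Optional


def tu_of_each_token(heads: List[Optional[int]]) -> List[int]:
    """Map every token to the index of the ungoverned head that dominates it.

    heads[i] is the index of the token governing token i, or None if token i
    is not an argument or adjunct of any other head. Under the atomism rule
    only such ungoverned heads can head a TU. (Assumed encoding.)
    """
    result = []
    for i in range(len(heads)):
        current = i
        seen = set()
        while heads[current] is not None:
            if current in seen:
                raise ValueError("cycle in head assignments")
            seen.add(current)
            current = heads[current]
        result.append(current)
    return result


# Toy input: 'Julia meets Franca' followed by an invented second clause.
tokens = ["Julia", "meets", "Franca", "it", "rains"]
heads = [1, None, 1, 4, None]
print(tu_of_each_token(heads))
```

Every token ends up in the unit of exactly one ungoverned head, so a clause whose head is itself governed (for instance a subordinate clause) can never surface as a TU of its own.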

  28. Atomism in ENHG
      (8) dass auch hoch notwendig zu wissen [...] welche in /
          that also highly necessary to know [...] which in /
          ihrem Leben das ist wann sie noch grün und / /
          their life that is when they still green and / /
          saftig sind ihre Tugend bald erzeigen
          juicy are their virtue soon show
          ‘that it is also necessary to know [...] which ones show their virtue while they are alive, i.e. while they are green and juicy’
      von Bodenstein (1557)

  29. Atomism in Speech
      A: If the whole thing is drawn the same size then (.) is (.) you go on you go on I made the start
      B: Then you can see better
      A: [...]
      B: What is bigger and what is smaller
      7th graders in math class (Prediger & Wessel 2018)

  30. Discreteness
      Property
      ◮ SU do not overlap with other SU
      → they are discrete (Auer 2010)
      Rule
      ◮ TU may not overlap
      ◮ No phrase is part of more than one ENHG-TU.

  31. Discreteness in ENHG
      (9) das andere [...] hat Blättlein sind ein wenig rauher /
          the other [...] has small.leaves are a bit rougher /
          ‘The other one has leaves are a bit more rough.’
      Fuchs (1543)

  32. Discreteness in Speech
      (10) and there he is on the roof was he
      FOLK corpus (footnote 2: https://dgd.ids-mannheim.de/DGD2Web/ExternalAccessServlet?command=displayTranscript&id=FOLK_E_00084_SE_01_T_01_DF_01&cID=c28&wID=c28)

  33. Exhaustivity
      Property
      ◮ SU exhaustively partition a text
      → they are exhaustive (Auer 2010)
      Rule
      ◮ A text has to be partitioned exhaustively into TU
      ◮ No material is left over
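Taken together with the discreteness rule, exhaustivity means that the TU form a partition of the token sequence. A minimal sketch of such a check follows; the representation of TU as `(start, end)` token offsets with an exclusive end, and the name `check_segmentation`, are assumptions made for this example only.

```python
from typing import List, Tuple


def check_segmentation(n_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Return a list of violations for TU spans given as (start, end), end exclusive."""
    problems = []
    covered = [0] * n_tokens  # how many units each token belongs to
    for start, end in spans:
        if not (0 <= start < end <= n_tokens):
            problems.append(f"invalid span ({start}, {end})")
            continue
        for i in range(start, end):
            covered[i] += 1
    for i, count in enumerate(covered):
        if count == 0:
            problems.append(f"token {i} is left over (exhaustivity)")
        elif count > 1:
            problems.append(f"token {i} is in {count} units (discreteness)")
    return problems


print(check_segmentation(5, [(0, 3), (3, 5)]))  # a clean partition
print(check_segmentation(5, [(0, 3), (2, 4)]))  # one overlap, one leftover token
```

An empty result means the segmentation covers every token exactly once. Encoding each unit as a single offset pair also presupposes that units are continuous strings of tokens.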

  34. Exhaustivity in ENHG
      (11) Da ich aber das Widerspiel erfahren habe
           Because I though the counter-play experienced have
           dass auf eine Zeit da der Samen unter dem [...] / /
           that at a time that the seeds under the [...] / /
           Grund hervorgekommen ist der über den /
           carried ground emerge is which over /
           Winter grün verblieben ist und nachwärts /
           the winter green remained is and /
           im Sommer sehr groß geworden ist
           afterwards in the summer very big become is
           ‘but because I experienced the following counter-play: that after the seeds emerged from the earth [...] they remain green throughout winter and then grow up in summer’
      Fuchs (1543)

  35. Exhaustivity in Speech
      A: On the strip board [...] you can see it better because #
      B: # Because of Mehmet
      A: ((laughs)) # because it is in a row hence in a section #
      B: # a board #
      A: # or whatever you call it
      7th graders in math class (Prediger & Wessel 2018)

  36. (Dis)continuity
      Question
      ◮ Is a sentential unit continuous or discontinuous?
      ◮ Depends on the purpose of the segmentation
      ◮ Here focus on syntactic analysis
      Rule
      ◮ TU are continuous strings of tokens
      ◮ Exception: meta text artificially inserted by OCR

  37. (Dis)continuity in ENHG
      (12) dass auch hoch notwendig zu wissen ... welche in /
           that also highly necessary to know ... which in /
           ihrem Leben das ist wann sie noch grün und / /
           their life that is when they still green and / /
           saftig sind ihre Tugend bald erzeigen
           juicy are their virtue soon show
           ‘that it is also necessary to know ... which ones show their virtue while they are alive, i.e. while they are green and juicy’
      von Bodenstein (1557)

  38. (Dis)continuity in Speech
      A: On the strip board [...] you can see it better because #
      B: # Because of Mehmet
      A: ((laughs)) # because it is in a row hence in a section #
      B: # a board #
      A: # or whatever you call it
      7th graders in math class (Prediger & Wessel 2018)

  39. Punctuation
      Question
      ◮ Where is punctuation located in a SU?
      ◮ In practice, it is often SU-final
      → Does not play a role for speech
      Rule
      ◮ Punctuation is located at a TU’s outermost right periphery

  40. Punctuation in ENHG
      (13) Das heimisch Eppich ist wohlschmeckend aber es ist /
           the local ivy is tasty but it is /
           dem Haupt böse
           the head evil
           ‘The local ivy is tasty. But it is bad for the head.’
      von Megenberg (1482)

  42. Handling Attachment Ambiguities
      ◮ Attachment ambiguities of phrases and clauses
      → Common in non-standard language, in particular ENHG
      → Challenges the uniqueness and discreteness of TU

  43. TU Attachment Ambiguity
      Ambiguity
      ◮ Is a unit a subordinate clause or a TU on its own?
      Rule
      ◮ Minimize the maximal TU length in words
      ◮ Two shorter TU are preferred over one longer TU
      → Ambiguous cases are analyzed as TU of their own

  44. TU Attachment Ambiguity in ENHG
      (15) Wermut ist ein Kraut mit vielen Zinken und
           wormwood is a herb with many branches and
           Ästen an welchem sind aschefarbene Blätter /
           branches on which/these are ash colored leaves /
           ‘Wormwood is a herb with many branches ...
           A. ... on which there are ash colored leaves.’
           B. ... On these there are ash colored leaves.’
      Excerpt from Fuchs (1543)

  45. TU Attachment Ambiguity in Speech
      (16) Larry is grad fast ausm Fenster gefalln voll schlimm
           Larry is just nearly out window fell really bad
           A. ‘Larry just nearly fell out of the window really badly’
           B. ‘Larry just nearly fell out of the window. So bad!’
      FOLK corpus (footnote 3: https://dgd.ids-mannheim.de/DGD2Web/ExternalAccessServlet?command=displayTranscript&id=FOLK_E_00084_SE_01_T_01_DF_01&cID=c15&wID=&textSize=200&contextSize=4)

  46. Phrase Attachment Ambiguity
      Ambiguity
      ◮ To which TU does a phrase (or clause) belong?
      Rule
      ◮ Minimize the maximal TU length in words
      ◮ Two shorter TU are preferred over one longer TU
      → Attachment to the shorter TU
      → Attachment to the preceding TU for equal TU length
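The attachment rule is a small optimization and can be written out as one. The sketch below is an illustration under stated assumptions, not the project's implementation: the function name `attach_phrase`, its parameters, and the word counts in the calls are hypothetical.

```python
def attach_phrase(prev_len: int, next_len: int, phrase_len: int) -> str:
    """Decide which neighbouring TU receives an ambiguous phrase.

    prev_len and next_len are the lengths in words of the preceding and the
    following TU without the ambiguous phrase; phrase_len is the length of
    the phrase itself. The option with the smaller maximal TU length wins;
    ties go to the preceding TU.
    """
    max_if_preceding = max(prev_len + phrase_len, next_len)
    max_if_following = max(prev_len, next_len + phrase_len)
    if max_if_following < max_if_preceding:
        return "following"
    return "preceding"


# Hypothetical word counts, not taken from the corpus.
print(attach_phrase(prev_len=11, next_len=10, phrase_len=4))  # following
print(attach_phrase(prev_len=5, next_len=5, phrase_len=2))    # preceding
```

Because the phrase adds the same number of words to whichever side receives it, comparing the two maxima always selects the shorter neighbour, and the maxima tie exactly when both neighbours have equal length. That is why the slide can state the rule simply as "attachment to the shorter TU".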

  47. Segmenting Oral Phrase Attachment Ambiguity in ENHG and Historical Language Data Zarah Weiss Introduction Defining Sentences Written Language Bias So bedeuten nun diese zwei Blätter das Gesetz in (17) / Defining Sentential Units so mean now these two leaves the law in Historical Continuity / The LangBank zwei Tafeln fest von Gott gegeben die anderen drei / / T-Unit Background two tablets firm by god given the other three / / Annotation Principles Basic Definition grünen Blättlein die drei Personen dir zeigen / Sentential Properties Handling Ambiguities green leaves which three people you show / Inter-Rater Reliability Segmenting ‘These two leaves symbolize the law in two tablets ... Speech A. that were given by god. The other three green leaves The SegCor Project Maximal Syntactic Unit B. The other three green leaves given by god Generalizable Solutions Open Issues ... show you three people’ Conclusion References Rosbachs (1588) 48 / 67

  50. Phrase Attachment Ambiguity in Speech
      (20) ihr könnt euch das mal gegenseitig [...] erklären was
           you can you this now each.other [...] explain what
           ihr da gemacht habt am besten ihr habt das ja
           you there done have the best you have this yes
           alle richtig gemacht
           all correctly done
           A. ‘It would be best if you explained to each other what you did. All of you did everything correctly anyway.’
           B. ‘Now explain to each other what you did. Ideally, all of you did everything correctly anyway.’
      Prediger & Wessel (2018)

  53. Well-Formedness vs. Brevity
      Issue
      ◮ SU in non-standard language may (not) be well-formed
      ◮ What to do if a well-formed analysis is possible?
      → Ambiguities between well-formed and short TU arise
      Rule
      ◮ When in doubt, prefer well-formedness over brevity
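
      The priority of well-formedness over brevity can be sketched as a two-level preference. In the Python sketch below, the `Candidate` structure and the function `choose_segmentation` are hypothetical names introduced for illustration. Well-formedness is assumed to be a judgment supplied by the annotator, and brevity is measured as the length in words of the longest TU.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """One possible segmentation of an ambiguous stretch (hypothetical structure)."""
    label: str
    tu_lengths: Tuple[int, ...]  # length in words of each TU in this analysis
    well_formed: bool            # annotator's judgment, not computed here


def choose_segmentation(candidates: Sequence[Candidate]) -> Candidate:
    """Prefer well-formed analyses; among equals, prefer the shorter longest TU."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    # False sorts before True, so well-formed candidates are ranked first;
    # the longest TU length only decides between equally well-formed analyses.
    return min(candidates, key=lambda c: (not c.well_formed, max(c.tu_lengths)))


# Toy example: one long well-formed TU versus two short TUs judged ill-formed.
one_long = Candidate("one TU", (9,), True)
two_short = Candidate("two TUs", (5, 4), False)
print(choose_segmentation([two_short, one_long]).label)  # one TU
```

      If both analyses are judged well-formed, the same function falls back to brevity and picks the segmentation with the shorter longest TU.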

  54. Well-Formed vs. Brevity in ENHG
      (23) aber sie trocknen die Zungen [...] wenn man Köl-
           but they dry the tongues [...] if you köl-
           und Haselbaum pflanzt zu Weinrebenwurzeln /
           and hazeltree plant next.to grape.vine.roots /
           so verderben sie die Reben
           then spoil they the grape.vine
           ‘But they dry the tongues [...]. If you plant köl tree and hazel tree next to grape vine roots, then they spoil the grape vine.’
      von Megenberg (1482)

  55. Well-Formed vs. Brevity in ENHG
      (24) I have already all the time been hanging in the no not all the time a minute I have been hanging in the line it’s quite funny when you don’t hear anything
      FOLK corpus: https://dgd.ids-mannheim.de/DGD2Web/ExternalAccessServlet?command=displayTranscript&id=FOLK_E_00084_SE_01_T_01_DF_01&cID=c10&wID=c10

  56. Our TU Definition in One Slide
      1. A TU consists of a phrasal head and all its arguments and adjuncts
      2. TUs are independent, unique, exhaustive
      3. TUs are continuous and do not start with punctuation
      4. Ambiguous clause and phrase attachment is resolved by preferring brevity
      5. Well-formedness trumps brevity

  57. Inter-Rater Reliability
      ◮ End of every word is sentence boundary candidate
      → Binary ± sentence boundary annotation
      ◮ 3 annotators
      ◮ 5 text excerpts from 1532 to 1639
      ◮ 2,609 tokens → ca. 5% sentence boundaries
      ◮ Cohen’s Kappa (κ) = .82
      → Almost perfect agreement: κ ≥ .80 (Landis & Koch 1977)
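
      To make the agreement measure concrete, the Python sketch below computes Cohen's κ for a single pair of annotators on binary boundary decisions (1 = boundary after the word, 0 = no boundary). The slide does not say how the three annotators were combined into one value, so the sketch covers only the two-rater case. The function name `cohens_kappa` and the label sequences are illustrative toy data, not the study's annotations.

```python
from typing import Sequence


def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of word ends where both raters decided alike.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, computed from each rater's own label proportions.
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (list(rater_a).count(lab) / n) * (list(rater_b).count(lab) / n)
        for lab in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Toy data: ten word ends, the raters disagree on one boundary.
a = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(round(cohens_kappa(a, b), 3))  # 0.615
```

      Because boundaries are rare (about 5% of word ends), raw percentage agreement would be high even for careless annotators; κ corrects for this by subtracting the agreement expected by chance.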

  58. The SegCor Project
      ◮ Segmentation of Oral Corpora (06/16 to 02/19)
      ◮ Develop methods for segmentation of spoken language
      ◮ Languages: Spoken French and German
      ◮ Corpora: FOLK (German), CLAPI and ESLO (French)
      ◮ Westpfahl et al. (2018); Westpfahl & Gorisch (2018); Schmidt & Westpfahl (2018)

  59. SegCor Guidelines: Purpose and Means
      ◮ Based on syntactic criteria
      ◮ Find phenomena typical for spoken language
      ◮ Offer information about syntactic phenomena
      ◮ Practical approach
      ◮ See Westpfahl et al. (2018, p. 3)

  60. Syntactic Segmentation Layers
      ◮ Topological fields
      ◮ Clause type (based on verb position)
      ◮ Maximal syntactic unit (MSU)
      ◮ (Certain speech phenomena)
      → Hierarchical multi-layer annotation approach
      → MSU serve as ‘maximal unit of related clauses’ (SU)
      → Based on same linguistic units and theories as TU

  62. Types of Maximal Syntactic Units
      ◮ Simple sentential unit (complete, 1 clause)
      ◮ Complex sentential unit (complete, 2+ clauses)
      ◮ Abandoned unit (incomplete clause)
      ◮ Non-sentential unit (independent phrases, non-verbal)
      → Currently no similar differentiation for TU
      → Applicable to current TU definition
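
      The four MSU types can be written down as a small data structure. The Python sketch below is a simplified reading of the parenthetical descriptions on this slide, not the SegCor guidelines themselves. The names `MSUType` and `classify_msu` are hypothetical, and the clause count and completeness are assumed to be supplied by an annotator.

```python
from enum import Enum


class MSUType(Enum):
    SIMPLE = "simple sentential unit"
    COMPLEX = "complex sentential unit"
    ABANDONED = "abandoned unit"
    NON_SENTENTIAL = "non-sentential unit"


def classify_msu(clause_count: int, complete: bool) -> MSUType:
    """Map annotator-supplied features to an MSU type (simplified assumption)."""
    if clause_count < 0:
        raise ValueError("clause_count must not be negative")
    if clause_count == 0:
        # No clause at all: independent phrases or other non-verbal material.
        return MSUType.NON_SENTENTIAL
    if not complete:
        # At least one clause was started but left incomplete.
        return MSUType.ABANDONED
    return MSUType.SIMPLE if clause_count == 1 else MSUType.COMPLEX


print(classify_msu(2, True).value)  # complex sentential unit
```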

  63. Similarities between MSU and TU
      ◮ Maximal units of connected clauses (syntactic definition)
      ◮ Atomic, discrete, exhaustive, and coherent (Auer 2010)
      ◮ Differ in linguistic theories used to express concept
      → TU: maximal phrasal projection
      → MSU: all related clauses or independent/abandoned units

  64. Historical and Spoken Language Phenomena
      Shared phenomena
      ◮ Dialect and non-standard orthography
      ◮ Parentheses and ellipses, apo koinu
      ◮ Non-sentential and abandoned language material
      ◮ Syntactic similarities between ENHG and spoken German
      Different challenges and cues
      ◮ Speech: collaborative or interrupted speech, pauses
      ◮ ENHG: no native speakers’ judgments, sections

  65. Similar Solutions to Similar Challenges
      Apo koinu
      (25) and there he is on the roof was he (FOLK corpus)
      Parentheses and insertions
      (26) a. I’ve been waiting all the time in the no not all the time for a minute in the line (FOLK corpus)
           b. because (.) madam minister (.) you don’t only know the conservatory (FOLK corpus)

  66. Similar Solutions to Similar Challenges (cont.)
      Free noun phrase
      (27) a reasonable cat (.) the big black one (FOLK corpus)
      Non-language material
      (28) and then ((laughter)) ((breathing)) (FOLK corpus)
      (29) then they spoil the vine - ¶ (von Megenberg, 1482)

  67. Discontinuous Speech vs. Continuity
      T  Why is it difficult to formulate?
      A  On the strip board, on the strip board you can see it better because #
      B  # Because of Mehmet
      A  ((laughs)) # because it is in a row hence in a section #
      B  # a board #
      A  # or whatever you call it
      T  Uhm what do you mean #
      7th graders in math class (Prediger & Wessel 2018)

  69. Collaborative Speech vs. Continuity
      T  Uhm (.) yes, well, now try to use the sentence
      A  If the whole thing is drawn the same size then (.) is (.) you go on you go on I made the start
      B  Then you can see better
      A  What big what
      B  What is bigger and what is smaller
      T  Very good
      7th graders in math class (Prediger & Wessel 2018)

  71. Conclusion
      Sentence segmentation
      ◮ Sentence definitions suffer from written language bias
      ◮ Different purposes call for different SU
      ◮ Yet many principles can be shared across purposes
      ◮ Only few phenomena require specialized rules
      → Multi-layer annotation of SU (see SegCor annotation)

  72. Conclusion (cont.)
      Generalizability of TU and MSU
      ◮ Share assumptions and principles
      ◮ Often yield same or comparable segmentation
      ◮ TU lack applicability to collaborative speech
      ◮ MSU relies on topological fields and native-speaker intuition → limits generalizability

  73. Outlook
      ◮ COLD project: 04/2019 to 04/2021 (https://www.die-bonn.de/COLD/)
      ◮ Investigate interaction of teachers and students
      ◮ Adaptation strategies of teachers to language learners
      ◮ Challenges
      → How do we segment our classroom interactions?
      → Multiple interlocutors and parallel discourses
      → Incomplete and collaborative utterances
      ◮ Currently: decide transcription and segmentation

  74. References
      Auer, P. (2010). Zum Segmentierungsproblem in der Gesprochenen Sprache. InLiSt – Interaction and Linguistic Structures 49, 1–19.
      Bley-Vroman, R. (1983). The comparative fallacy in interlanguage studies: The case of systematicity. Language Learning 33(1), 1–17.
      Bredel, U. (2008). Die Interpunktion des Deutschen. Ein kompositionelles System zur Online-Steuerung des Lesens. Tübingen: Niemeyer.
      Bredel, U. (2011). Interpunktion. Heidelberg: Winter.
      Drach, E. (1937). Grundgedanken der deutschen Satzlehre. Frankfurt am Main, Germany: Diesterweg.
      Eugen, S. (1935). Geschichte und Kritik der wichtigsten Satzdefinitionen. Jena: Fromannsche Buchhandlung.
      Feilke, H. (2010). Schriftliches Argumentieren zwischen Nähe und Distanz – am Beispiel wissenschaftlichen Schreibens. Nähe und Distanz im Kontext variationslinguistischer Forschung 35, 209–231.
      Foster, P. & P. Skehan (1996). The Influence of Planning and Task Type on Second Language Performance. Studies in Second Language Acquisition 18(3), 299–323.
      Gallmann, P. (2009). Duden. Die Grammatik, Mannheim, Leipzig, Wien, Zürich: Dudenverlag, chap. Der Satz, pp. 763–1056.
      Harris, R. (1980). The Language-Makers. Ithaca, N.Y.: Cornell University Press.
      Hartweg, F. & K.-P. Wegera (2005). Frühneuhochdeutsch: eine Einführung in die deutsche Sprache des Spätmittelalters und der frühen Neuzeit. Germanistische Arbeitshefte 33.

  75. References (cont.)
      Hennig, M. (2007). Thesen zur Erforschung historischer Nähesprachlichkeit. In M. Balaskó & P. Szatmári (eds.), Sprach- und Literaturwissenschaftliche Brückenschläge. Vorträge der 13. Jahrestagung der GESUS in Szombathely, München: Lincom (Edition Linguistik 59), pp. 13–26.
      Hennig, M. (2008). Grammatik der gesprochenen Sprache in Theorie und Praxis.
      Hennig, M. (2009). Nähe und Distanzierung. Verschriftlichung und Reorganisation des Nähebereichs im Neuhochdeutschen. Kassel: Kassel University Press.
      Hunt, K. W. (1965). Grammatical Structures Written at Three Grade Levels. NCTE Research Report No. 3.
      Landis, J. R. & G. G. Koch (1977). The measurement of observer agreement for categorical data. Biometrics 33(1), 159–174.
      Linell, P. (1982). The Written Language Bias in Linguistics. Linköping: University of Linköping.
      Lüdeling, A., C. Odebrecht, L. Perlitz & A. Zeldes (2018). RIDGES-HErbology (Version 8.0). http://korpling.org/ridges/. http://hdl.handle.net/11022/0000-0007-C6A3-1.
      Pittner, K. & J. Bermann (2004). Deutsche Syntax: ein Arbeitsbuch. Tübingen, Germany: Narr.
      Prediger, S. & L. Wessel (2018). Brauchen mehrsprachige Jugendliche eine andere fach- und sprachintegrierte Förderung als einsprachige? Zeitschrift für Erziehungswissenschaft 21(2), 361–382.
      Reichmann, O. & R. P. Ebert (1993). Frühneuhochdeutsche Grammatik, vol. 12. Tübingen, Germany: Niemeyer.
      Sandig, B. (1973). Zur historischen Kontinuität normativ diskriminierter syntaktischer Muster in spontaner Sprechsprache. Deutsche Sprache, pp. 37–56.

  76. References (cont.)
      Schmidt, K. (2016). Der graphematische Satz. Zeitschrift für germanistische Linguistik 44(2), 215–256.
      Schmidt, T. & S. Westpfahl (2018). A Study on Gaps and Syntactic Boundaries in Spoken Interaction. In Proceedings of the 14th Conference on Natural Language Processing (KONVENS 14). Vienna, Austria, pp. 40–49.
      STUTTGART21 (2010). STUTTGART21: German panel discussion. Phoenix TV.
      Weiss, Z. & G. Schnelle (2016). Early New High German Sentence Segmentation Guidelines.
      Westpfahl, S. & J. Gorisch (2018). A Syntax-Based Scheme for the Annotation and Segmentation of German Spoken Language Interactions. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions. Santa Fe, New Mexico, USA, pp. 109–120.
      Westpfahl, S., N. Proske, M. Hobich, A. Borlinghaus & H. Strub (2018). Syntactic Segmentation in the SegCor project. Version 1.
      Wöllstein, A. (2014). Topologisches Satzmodell. Heidelberg: Winter, 2 ed.

  77. A sentence is a sentence is a sentence

  78. Written Language Bias and Comparative Fallacy
      ‘The learner’s system is worthy of study in its own right, not just as a degenerate form of the target system’ (Bley-Vroman 1983, p. 4)
      ‘In grammar description, linguists tend to regard the peculiarities of the former [= spoken language] as ‘deviations’ rather than as independent structural principles.’ (translated from Small 1985a: 13)

  79. Particles After Comparative in ENHG
      (30) Die Wasser [...] sind gut jedoch nicht so stark und /
           the waters [...] are good but not as strong and /
           kräftig als wie durch das Instrument destilliert /
           vigorous than as through the instrument distilled /
           ‘The waters are good but not as strong and vigorous as waters distilled with the instrument.’
      Libavius (1603)

  80. Particles After Comparative in Spoken German
      (31) Eine Eisenbahnfahrt im Tiefbahnhof findet
           a train ride in the underground station takes
           unter keinen anderen eisenbahntechnischen
           under not other technical railway
           Voraussetzungen statt als wie im Kopfbahnhof
           conditions place than as in the terminus station
           ‘A railway journey in an underground station does not take place under any other technical railway conditions than in a terminus station.’
      STUTTGART21 (2010)

  81. Another Punctuation Example
      (32) das is mehr für gelehrte Leute als die sich /
           this is more for educated people as those themselves /
           mit dem Werk zu belustigen begehre . In welchem er
           with the work to entertain seek . in which he
           aus Plinius sehr viel genommen
           from Plinius very much took
           ‘this is rather for researchers than for those who read for pleasure the book in which he adopted a lot from Plinius’
      Rhagor (1693c)

  82. VFIN-initial Verb Cluster
      (33) derer man alle morgens und abends zwei
           of which you all mornings and evenings two
           Skrupel dem Schwindsüchtigen [soll geben] /
           scrupuli the tuberculosis infected shall give /
      Adam von Bodenstein (1557)

  83. Another TU Attachment Ambiguity in ENHG
      (34) aber von dem Baum der Erkenntnis Gutes und Böses
           but of the tree of knowledge good and bad
           sollst du nicht essen denn welches Tages du /
           shall you not eat because which day you /
           davon isst sollst du des Todes sterben /
           of it eat shall you the death die /
           ‘but of the tree of knowledge about good and evil you shall not eat, because on the day that you eat of it, you will die’
      Rosbachs (1588)

  85. A Sentence is a Sentence is a Sentence?
      "LEASH DOGS SHEEP" from Feilke (2010)

  86. Spoken Learner Language
      A  which which what is your opinion?
      B  (1.0) maybe er (5.0) he (7.0)
      A  long time? or it’s for for you it’s a major mistake or a small mistake?
      B  maybe three months
      A  three months for this one okay for me it’s ten
      B  ten?
      A  ten years
      B  yeah ten years oh very long
      L2 interaction on prison sentences (Foster & Skehan 1996)

  87. Interrupted Turns L1 German
      T  Why is it difficult to formulate?
      A  On the strip board, on the strip board you can see it better because #
      B  # Because of Mehmet
      A  ((laughs)) # because it is in a row hence in a section #
      B  # a board #
      A  # or whatever you call it
      T  Uhm what do you mean #
      7th graders in math class (Prediger & Wessel 2018)
