Example Sentences and Making them Useful for Theoretical and Computational Linguistics

  1. Example Sentences and Making them Useful for Theoretical and Computational Linguistics
     Stefan Müller
     Email: Stefan.Mueller@cl.uni-bremen.de
     http://www.cl.uni-bremen.de/~stefan/
     DGfS-Jahrestagung Mainz, 27.02.2004

  2. Outline
     • Why test suites / data collections?
     • What do we have?
     • B-Ger-TS
     • Demo
     • Suggestions for using test suites / data collections
     • Guidelines
     • Conclusions

  3. Why Are Test Suites Needed for NLP?
     • Language is very complex → minimal changes to a grammar may have unexpected effects
     • Check improvement in grammar development
       – coverage
       – processing speed
       – memory requirements
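A regression check of this kind could be sketched as follows. This is only an illustration: `evaluate` and the `accepts` callback (a stand-in for whatever parser the grammar is run with) are hypothetical names, and the test suite is assumed to be a list of (sentence, is_grammatical) pairs.

```python
from typing import Callable, Dict, Iterable, Tuple


def evaluate(items: Iterable[Tuple[str, bool]],
             accepts: Callable[[str], bool]) -> Dict[str, float]:
    """Compare a grammar's judgments with the test suite's annotations.

    coverage       = share of grammatical items the grammar accepts
    overgeneration = share of ungrammatical items the grammar accepts
    """
    items = list(items)
    good = [s for s, ok in items if ok]
    bad = [s for s, ok in items if not ok]
    coverage = sum(1 for s in good if accepts(s)) / len(good) if good else 0.0
    overgeneration = sum(1 for s in bad if accepts(s)) / len(bad) if bad else 0.0
    return {"coverage": coverage, "overgeneration": overgeneration}


# Toy usage: a hypothetical "grammar" that only accepts phrases starting with "die".
suite = [
    ("die alte Wand", True),
    ("der alte Wand", False),
    ("das alte Wand", False),
]
print(evaluate(suite, lambda s: s.startswith("die ")))
```

Running the same suite before and after a grammar change and comparing the two result dictionaries is one simple way to notice unexpected effects; timing and memory measurements would need additional instrumentation that is not shown here.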

  7. What Test Suites and Databases Are There?
     • Test suites developed in TSNLP (Oepen, Netter and Klein, 1997)
       – English
       – German
       – French
     • Test suites that come with [incr TSDB()], which is part of the LKB (Copestake, 2002)
       – English (LinGO, CSLI)
       – German (VM, DFKI)
       – Spanish
       – Japanese
       – Norwegian
     • Babel Test Suite
     • A3-Datenbank in Tübingen (Sternefeld et al.)
     • Others?

  9. Why Should We Have Additional Ones? (I)
     • Babel Test Suite is unsystematic, naturally grown from a diploma thesis
     • TSNLP is very systematic:
       (1) a. die alte Wand
           b. * der alte Wand
           c. * das alte Wand
           d. * des alte Wand
           e. * den alte Wand
           f. * dem alte Wand
           g. * die alte Wände
           h. * der alte Wände
           i. * das alte Wände
           j. * des alte Wände
           k. * den alte Wände
           l. * dem alte Wände
           m. * der alte Wänden
           n. * die alte Wänden

  11. Why Should We Have Additional Ones? (II)
      but it is only a part of what is needed:
      • phenomena are missing
      • There are many strange ungrammatical sentences that are relevant only in the context of a discussion of a particular analysis. Such examples are not in TSNLP. Examples:
        – Agreement as a head feature and coordination.
        – Haider’s Designated Argument as a head feature and coordination of unergatives and unaccusatives

  13. B-Ger-TS (I)
      • B-Ger-TS developed from Babel-TS
      • contains examples I gathered over the past ten years
      • I started to systematize it and to cross-classify items with regard to phenomena
      • extended the database by examples from the literature
      • provided references to bibliographic sources
      • eliminated lexical ambiguity

  14. B-Ger-TS (II)
      • verb position, scrambling, fronting and island data, extraposition, subjacency, ...
      • coherent/incoherent constructions, complex predicates, particle verbs, control and raising, AcI constructions
      • incomplete category fronting with adjectives and verbs, multiple frontings
      • adjunction in the nominal and verbal area
        – attributive adjectives and participles
        – prepositional phrases
        – relative clauses
      • free relative clauses
      • left dislocation
      • topic drop

  17. B-Ger-TS (III)
      • depictive secondary predicates
      • passive in various forms (e.g., stative passive, dative passive, lassen passive)
      • modal infinitives
      • coordination
      • and the interaction between all of this!
      • items are cross-classified according to the phenomena
      • retrieval with respect to various aspects is possible
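Retrieval over cross-classified items can be pictured as a subset test on each item's set of phenomena. The sketch below is an in-memory illustration only, not how [incr TSDB()] stores or queries data; `Item` and `select` are hypothetical names, and the two sentences are taken from the format examples in this talk.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Item:
    sentence: str
    grammatical: bool
    phenomena: FrozenSet[str]  # one item may belong to several phenomena


def select(items: Iterable[Item], *phenomena: str) -> List[Item]:
    """Return all items annotated with every one of the requested phenomena."""
    wanted = set(phenomena)
    return [item for item in items if wanted <= item.phenomena]


items = [
    Item("daß der Mann schläft, der stirbt.", True,
         frozenset({"Extraposition"})),
    Item("daß ich nicht weiß, dieses Buch warum ich lesen sollte.", False,
         frozenset({"Extraktion", "w-Satz"})),
]

# All items that involve extraction out of a w-clause:
for item in select(items, "Extraktion", "w-Satz"):
    print(("" if item.grammatical else "* ") + item.sentence)
```

Because each item carries a set rather than a single label, the same sentence is found under either phenomenon alone as well as under their combination.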

  19. Demo of TSDB

  21. Suggestions for Using Test Suites / Data Collections
      • All published grammar fragments should come with a list of the test suites used and the results. (Many already do, mainly those connected to the CSLI/DFKI groups.)
      • example: http://www.cl.uni-bremen.de/Fragments/b-ger-gram.html
      • Journal articles can be written and reviewed with reference to publicly available data collections.

  25. The Format
      • simple ASCII text
      • a line starting with ‘;;;’ names a phenomenon, which applies to all items up to the next line starting with ‘;;;’

        ;;; Extraposition
        daß der Mann schläft, der stirbt. ;; Extraposition aus Subjekt
        Der Mann liebt Maria, der ihn verachtet. ;; Extraposition aus Subjekt im Vorfeld
        Den Mann liebt Maria, der ihn verachtet. ;; Extraposition aus Objekt im Vorfeld
        Daß Karl schläft, ist dem Mann aufgefallen, der ihn kennt. ;; @ nach Haider94

      • everything that follows ‘;;’ and precedes ‘@’ is a comment
      • everything that follows ‘@’ is the source of the example
      • cross-classification of phenomena: list the phenomena separated by ‘+’

        ;;; Extraktion + w-Satz
        * daß ich nicht weiß, dieses Buch warum ich lesen sollte. ;; @GMueller98a:244
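The format is simple enough to read with a few lines of code. The following is a minimal sketch, not the tool used for B-Ger-TS: `Item` and `parse_suite` are hypothetical names, and it assumes one item per line and a leading ‘*’ for ungrammatical items, as in the examples shown in this talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    sentence: str
    grammatical: bool
    phenomena: List[str] = field(default_factory=list)
    comment: str = ""
    source: str = ""


def parse_suite(text: str) -> List[Item]:
    """Parse a test suite file: ';;;' headers, ';;' comments, '@' sources."""
    items: List[Item] = []
    phenomena: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(";;;"):
            # Header line: phenomena are separated by '+'.
            phenomena = [p.strip() for p in line[3:].split("+") if p.strip()]
            continue
        sentence, _, rest = line.partition(";;")
        comment, _, source = rest.partition("@")
        sentence = sentence.strip()
        if not sentence:
            continue  # a pure comment line carries no test item
        grammatical = not sentence.startswith("*")  # assumed '*' convention
        items.append(Item(sentence.lstrip("*").strip(), grammatical,
                          list(phenomena), comment.strip(), source.strip()))
    return items


sample = """\
;;; Extraposition
daß der Mann schläft, der stirbt. ;; Extraposition aus Subjekt
;;; Extraktion + w-Satz
* daß ich nicht weiß, dieses Buch warum ich lesen sollte. ;; @GMueller98a:244
"""
for item in parse_suite(sample):
    print(item)
```

Keeping the phenomena as a list per item preserves the cross-classification, so the parsed items can be filtered by any combination of phenomena afterwards.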

  28. Lexical Ambiguity and Efficiency
      Ambiguity in case does not hurt, but ambiguity in number does.
      (2) a. Will der Manager lachen?
          b. Will der Mann lachen?
      Because Manager can also be plural, it projects to a full NP on its own, and Manager lachen to a full VP and a sentence.
      Even worse: if the verb has an optional object, we get unwanted ambiguities:
      (3) Will der Manager essen? (der = subject, Manager = object)
      (4) a. Will der Manager essen? → 307 passive edges
          b. Will der Mann essen? → 114 passive edges

  29. Lexical Ambiguity and Usability of Test Suites (Grammatical Sentences)
      ihr is ambiguous between the dative feminine pronoun, the second person plural pronoun, and the possessive pronoun. A theory/grammar that makes wrong claims about case could analyze (5) as a sentence with two nominatives.
      (5) Ihr helfen wir.
      So the grammatical sentence could be parsed although the theory assigns a wrong structure or wrong case values.
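One way to apply this guideline when assembling a test suite is to flag items that contain ambiguous word forms. The sketch below assumes a hypothetical toy lexicon that lists only the forms discussed on these slides, with informal reading labels; a real check would consult the grammar's own lexicon and, following the previous slide, might ignore ambiguity that is only in case.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon: word form -> readings (labels are informal).
TOY_LEXICON: Dict[str, List[str]] = {
    "manager": ["noun, singular", "noun, plural"],
    "mann": ["noun, singular"],
    "ihr": ["pronoun, dative feminine",
            "pronoun, second person plural",
            "possessive pronoun"],
}


def ambiguous_forms(sentence: str,
                    lexicon: Dict[str, List[str]] = TOY_LEXICON) -> List[str]:
    """Return the word forms in the sentence that have more than one reading.

    Forms missing from the lexicon are skipped, not treated as unambiguous.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    return [t for t in tokens if len(lexicon.get(t, [])) > 1]


for s in ["Will der Manager lachen?", "Will der Mann lachen?", "Ihr helfen wir."]:
    print(s, "->", ambiguous_forms(s))
```

Items flagged this way can then be replaced by variants with unambiguous forms, as in the Manager/Mann pair.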
