A mas novas vos torn / Now I take you back to my tale

  1. “A mas novas vos torn / Now I take you back to my tale”: The Romance of Flamenca
     Olga Scrivner, E.D. Blodgett*, Sandra Kübler, Michael McGuire
     Indiana University; *University of Alberta
     June 2013
     Outline: Introduction, Parallel Corpora, Corpus Structure, Corpus Study, Conclusion, References

  2. Introduction
     In the past, historical documents and manuscripts were studied exclusively with a manual, paper-based approach.
     Recent advances in corpus linguistics have introduced state-of-the-art methods and tools for digitization, semi-automatic annotation, and visualization of such resources.

  3. Linguistic Annotation
     “By accessing linguistic annotation, we can extend the range of phenomena that can be found with high precision” (Kübler and Zinsmeister, 2014)
     1. Morphological annotation: collocations, spelling variation
     2. Syntactic annotation: sentence structure in narratives vs. dialogues, prose vs. verse
     3. Discourse annotation: analysis of scenes and characters (female vs. male speaker, King vs. servants)

  5. Medieval Romance Languages
     In recent years, a number of annotated corpora have been developed for Medieval Romance languages:
     - Old Spanish (Davies, 2002)
     - Old Portuguese (Davies and Ferreira, 2006)
     - Old French (Stein, 2008; Martineau et al., 2010)
     For Old Occitan, there exist (to our knowledge) two electronic databases:
     1. “The Concordance of Medieval Occitan” (Ricketts and Reed, 2005)
     2. “Provençal poetry” (ARTFL Project, 1998)
     Users of these two databases are limited to lexical search.

  6. Le Roman de Flamenca - 13th century
     Le Roman de Flamenca, a “universally acknowledged masterpiece of Old Occitan narrative” (Fleischmann, 1995).
     “Flamenca est la création d’un homme d’esprit qui a voulu faire une oeuvre agréable où fût représentée dans ce qu’elle avait de plus brillant la vie des cours au XIIe siècle. C’était un roman de moeurs contemporaines” (Meyer, 1865)
     “Flamenca is the creation of a man of talent who wished to write an agreeable work representing the most brilliant aspects of courtly life in the twelfth century. It is a novel of manners” (Bradley, 1922)

  7. Le Roman de Flamenca
     This romance presents a very intriguing love story between the beautiful Flamenca, who is imprisoned in a tower by her jealous husband Archambaut, and the sharp-witted knight Guillem.
     The photo of the tapestry is used by permission of FreeLargePhotos.com

  8. Le Roman de Flamenca
     The anonymous manuscript of Le Roman de Flamenca was accidentally discovered in Carcassonne (France) by Raynouard and was first fully edited by P. Meyer in 1865.
     This romance is unique in its genre (“the first modern novel”) and in its use of setting, adventures, and character portrayal (Blodgett, 1995; Bradley, 1922; Meyer, 1865).
     The potential value of this historical resource, however, is limited by the lack of an accessible digital format and linguistic annotation.

  9. Goals
     Our corpus is intended not only as material for linguistic research, but also to aid in broader studies:
     - Interactive online database with access to a glossary, to translations of verses, and to comments (Meyer, 1901): http://nlp.indiana.edu/~obscrivn/Introduction.html
     - Multiple-level annotation: morphological, syntactic, and pragmatic (Scrivner et al., 2013)
     - Parallel English-Occitan corpus (Blodgett, 1995)

  10. What Is a Parallel Corpus?
      A parallel corpus is “an association between two texts (written or spoken) in different languages that represent translations of each other” (Tufis, 2006).
      Parallel alignment links reciprocal translation units, which encode valuable lexical and syntactic knowledge.

  12. Alignment types
      - One-to-one: one word from a source language corresponds to only one word in a target language
      - One-to-many:
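The two alignment types can be sketched on the sentence pair from the title. The links below are hand-made for illustration and are an assumption, not output of the corpus pipeline; the names `links` and `alignment_type` are hypothetical.

```python
from collections import Counter

# Sentence pair taken from the presentation title.
source = ["A", "mas", "novas", "vos", "torn"]
target = ["Now", "I", "take", "you", "back", "to", "my", "tale"]

# Hand-made links (illustrative assumption): source index -> target indices.
links = {
    0: [5],     # A     -> to
    1: [6],     # mas   -> my
    2: [7],     # novas -> tale
    3: [3],     # vos   -> you
    4: [2, 4],  # torn  -> take ... back
}

def alignment_type(src_index, links):
    """Classify the alignment of one source token."""
    targets = links.get(src_index, [])
    if not targets:
        return "unaligned"
    if len(targets) > 1:
        return "one-to-many"
    # Count how many source tokens point at each target token.
    fan_in = Counter(t for ts in links.values() for t in ts)
    return "one-to-one" if fan_in[targets[0]] == 1 else "many-to-one"

types = {source[i]: alignment_type(i, links) for i in range(len(source))}
print(types)
```

Under these assumed links, `vos` is a one-to-one case and `torn` a one-to-many case, since it is rendered by the discontinuous English "take ... back".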

  13. Historical Parallel Corpora
      Given that parallel words have the same content, we can identify forms that have not been studied (Koolen et al., 2006; Enrique-Arias, 2012):
      - Spelling and lexical variation
      - Morphosyntactic variation
      - Null occurrences

  14. Null Occurrences
      A mas novas Ø vos torn
      Now I take you back to my tale
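One way to surface candidate null occurrences is to list the English tokens that no Occitan token is linked to. This is a minimal sketch under the assumption of hand-made word links; the function name `unaligned_target_tokens` is hypothetical and is not part of the corpus tooling.

```python
source = ["A", "mas", "novas", "vos", "torn"]
target = ["Now", "I", "take", "you", "back", "to", "my", "tale"]

# Hand-made (source index, target index) links; an illustrative assumption.
links = [(0, 5), (1, 6), (2, 7), (3, 3), (4, 2), (4, 4)]

def unaligned_target_tokens(target, links):
    """Return target tokens that have no counterpart on the source side."""
    aligned = {t for _, t in links}
    return [tok for i, tok in enumerate(target) if i not in aligned]

null_candidates = unaligned_target_tokens(target, links)
print(null_candidates)  # ['Now', 'I']
```

Here the unlinked "I" corresponds to the Ø position in the Occitan line, while "Now" simply has no counterpart, so candidates of this kind still need manual inspection.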

  16. Automatic Alignment
      - English and Occitan texts are aligned by lines of verse
      - A bilingual lexicon (a matrix of word-to-word probabilities) is generated by NATools (http://linguateca.di.uminho.pt/natools/)
      - Automatic word alignment via the Berkeley Aligner (Liang et al., 2006)
      - Manual correction of the alignment
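To give a feel for what a matrix of word-to-word probabilities is, here is a minimal sketch that estimates one from line-aligned text with IBM Model 1 style expectation-maximization. This is a stand-in technique chosen for illustration and is not claimed to be what NATools does internally; the tokens `x`, `y`, `a`, `b` are placeholders, not corpus data, and `train_lexicon` is a hypothetical name.

```python
from collections import defaultdict

def train_lexicon(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] = P(target_word | source_word)."""
    target_vocab = {f for _, tgt in pairs for f in tgt}
    # Start from a uniform distribution over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: share each target word among the source words of its line.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts per source word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

# Placeholder "verse lines": (source tokens, target tokens).
toy_pairs = [(["x", "y"], ["a", "b"]), (["x"], ["a"])]
lexicon = train_lexicon(toy_pairs)
print(round(lexicon[("a", "x")], 3), round(lexicon[("b", "y")], 3))
```

Because `x` occurs alone with `a` in the second line, the probability mass for `x` moves toward `a`, which in turn pushes `y` toward `b`.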

  17. Morphological Annotation - TnT Tagger

  18. Syntactic Annotation - Berkeley Parser

  19. Syntactic Annotation
      “...nor did he want to omit Flamenca”

  20. Discourse Annotation - Speakers
      The labels correspond to the main characters' names, namely Flamenca, Archambaut, Guillem, Father, King, and Queen.
      Less important characters are marked as FemaleSpeakers and MaleSpeakers.
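The labeling scheme described above can be sketched as a small lookup. The function name `speaker_label`, its `gender` argument, and the minor character names in the usage lines are hypothetical and serve only to illustrate the fallback to the two collective labels.

```python
# Main-character labels listed on the slide.
MAIN_CHARACTERS = {"Flamenca", "Archambaut", "Guillem", "Father", "King", "Queen"}

def speaker_label(name, gender):
    """Return the discourse label for a speaker.

    `gender` ("f" or "m") is a hypothetical field, consulted only for
    characters that are not in the main-character list.
    """
    if name in MAIN_CHARACTERS:
        return name
    return "FemaleSpeakers" if gender == "f" else "MaleSpeakers"

print(speaker_label("Guillem", "m"))  # main character keeps its own label
print(speaker_label("Maid", "f"))     # hypothetical minor character
print(speaker_label("Servant", "m"))  # hypothetical minor character
```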

  21. Parallel Alignment Annotation

  22. Corpus Design
      Since we are targeting two types of users with different needs, linguists and non-linguists, the corpus is made available in two modes:
      - Web Interface: users can mainly browse the text and look up translations, glosses, and comments
      - Query Search: users interested in the linguistic annotation can query the corpus online

  23. 1. Web Database
      Glossary definitions, comments, and footnotes are linked to tokens and are made visible when the user hovers over a marked word.

  24. 2. Search Tool (ANNIS)
      Our web search, based on ANNIS, allows both basic queries (searching for a word or phrase) and more complex queries over syntactic, morphosyntactic, discourse, and alignment annotation.

  25. Null Subject
      Modern Occitan varieties are null subject languages (Hinzelin and Kaiser, 2012)
      (1) Ø Era pertot, dintrava pertot
          was everywhere, entered everywhere
          ‘the light was everywhere and it was coming from everywhere’
          (Lo bòn de la nuòch, Max Rouquette)

  26. Previous Findings - Overt Subject
      Overt subjects: disambiguation or “mise en relief” (emphasis)
      (2) Femna que ieu ame illuminada de non rèn
          ‘Woman who I love illuminated from nothingness’
          (Saume dins lo vent, Serge Bec)
      - Person: more frequent with 1st person (Vance, 2009)
      - Genre: more frequent in prose
      - No difference by clause types (Sitaridou, 2005)
      - Subjunctive clause: preference for null subject (Vance, 2009)

  27. Search by Genre
      Discourse annotation: Flamenca, King, etc.
      e.g. speaker="Flamenca"
      - Narrative vs. dialogues
      - Male vs. female
      - High social rank vs. low social rank
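Once clauses carry a speaker label, comparisons such as narrative vs. dialogue reduce to grouping and counting. The sketch below assumes a hypothetical record layout (`speaker`, `subject`) and uses invented toy records; the resulting numbers are not corpus findings.

```python
from collections import defaultdict

# Toy clause records; every value here is invented for illustration.
# speaker=None stands for narration (no speaker annotation).
clauses = [
    {"speaker": "Flamenca", "subject": "null"},
    {"speaker": "Flamenca", "subject": "overt"},
    {"speaker": "King", "subject": "overt"},
    {"speaker": "King", "subject": "null"},
    {"speaker": None, "subject": "null"},
    {"speaker": None, "subject": "null"},
    {"speaker": None, "subject": "overt"},
]

def null_subject_rate(clauses, group_of):
    """Share of null-subject clauses per group, where group_of maps a clause to a group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [null clauses, all clauses]
    for clause in clauses:
        group = group_of(clause)
        counts[group][1] += 1
        if clause["subject"] == "null":
            counts[group][0] += 1
    return {group: n / total for group, (n, total) in counts.items()}

def genre(clause):
    return "narrative" if clause["speaker"] is None else "dialogue"

rates = null_subject_rate(clauses, genre)
print(rates)
```

Passing a different `group_of` function (for example, one that maps speaker labels to gender or social rank) gives the other comparisons listed on the slide.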
