COGNITIVE STUDIES | ÉTUDES COGNITIVES, 13: 183–193 SOW Publishing House, Warsaw 2013
DOI: 10.11649/cs.2013.012
LUDMILA DIMITROVA (1, A) & RALITSA DUTSOVA (1, B)
(1) Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia
(A) ludmila@cc.bas.bg; (B) r.dutsova@yahoo.com
WEB-APPLICATION FOR THE PRESENTATION OF BILINGUAL CORPORA (FOCUSING ON BULGARIAN AS ONE OF THE TWO PAIRED LANGUAGES)
Abstract
This paper briefly presents a web-application for the presentation of bilingual aligned corpora, focusing on Bulgarian as one of the two paired languages. The emphasis is on the description of the software tools and the user interface. The software is developed at IMI-BAS and will be hosted on a server there. Some examples of the use of the web-application for presenting a Bulgarian-Polish aligned corpus are included.
Keywords: parallel corpus, aligned corpus, concordance, linguistic annotation, lemmatization, POS-tagging, web-interface, web-application.
- 1. Introduction
The software tool Web-application for the presentation of bilingual aligned corpora with Bulgarian focuses on pairs of languages in which Bulgarian is one of the two. The texts in the current version of the corpora are automatically aligned at the sentence level. The corpus as a whole is designed to emphasize the applicability of the digital bilingual data not only for computerized natural language processing but also as a source of human-readable information.
- 2. Format of the Texts
The bilingual aligned corpora using Bulgarian as one of the paired languages, prepared at the Mathematical Linguistics Department of the IMI-BAS under the supervision of L. Dimitrova, will serve as input files for the software tool Web-application for the presentation of bilingual aligned corpora with Bulgarian.
- 2.1. Alignment of a Corpus
For a parallel corpus to be useful, it must be treated with a special program for "alignment". An aligned corpus is a parallel corpus containing relations between