Bozzi, Passarotti, CHLT LEMLAT 1
CHLT Project
(IST-2001-32745)
Workpackage 5. Neo-Latin Morphological Analyser
C.N.R.
Istituto di Linguistica Computazionale
Andrea Bozzi Giuseppe Cappelli Marco Passarotti Paolo Ruffolo
– Georges
– Gradenwitz
– Oxford Latin Dictionary
– TLL (partially)
– 58147 LES (invariable parts of the inflected forms)
A0014     ABALIENATION   N31
A0015  V  ABALIEN        V1
A0015     ABALEN         V1
A0016     ABALIUD        I
A0017     ABALTERUTRUM   I
A0018     ABAMBUL        V1I
A0019     ABAMIT         N1
A0020     ABANTE         I
A0021  V  ABARC          V2
A0021     ABERC          V2
Different LES receive the same ID Number, if they have a common lemma (generated by the LES registered with V code):
ID Num.   LES       COD LES
A0015  V  ABALIEN   V1
A0015     ABALEN    V1
Lemma: abalieno
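The shared-ID rule above can be sketched as a small grouping step. This is only an illustration: the rows are copied from the excerpt, while the tuple layout and the function names `group_by_id` and `lemma_generating_les` are hypothetical and not part of LEMLAT.

```python
from collections import defaultdict

# Rows copied from the look-up table excerpt.
# Tuple layout (an assumption for this sketch): (ID Num., V flag, LES, COD LES).
ROWS = [
    ("A0014", False, "ABALIENATION", "N31"),
    ("A0015", True, "ABALIEN", "V1"),
    ("A0015", False, "ABALEN", "V1"),
    ("A0019", False, "ABAMIT", "N1"),
    ("A0021", True, "ABARC", "V2"),
    ("A0021", False, "ABERC", "V2"),
]


def group_by_id(rows):
    """Collect all LES that share the same ID number."""
    groups = defaultdict(list)
    for id_num, is_v, les, cod in rows:
        groups[id_num].append((les, cod, is_v))
    return dict(groups)


def lemma_generating_les(group):
    """Return the LES flagged V; fall back to the only LES of a one-row group."""
    for les, _cod, is_v in group:
        if is_v:
            return les
    return group[0][0] if len(group) == 1 else None


groups = group_by_id(ROWS)
print(groups["A0015"])
print(lemma_generating_les(groups["A0015"]))
```

For A0015 the sketch picks ABALIEN, the LES registered with the V code, as the one that generates the lemma abalieno.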
[Figure labels: Input form, Lemma, Segmentation attempts, COD LEM, ID Num.]
Intellettuale Europeo, Roma), in Leibniz texts
published)
Lisboa)
– Words: Version 1.97 by William Whitaker http://www.erols.com/whitaker/words.htm
– Nomen: by Paravia (Italian publishing house)
– Perseus Latin Morphological Analysis: by Perseus Project
– LEMLAT: 58147 LES
– Words: 48698 stems
– Nomen: 31903 lemmas
– Perseus: ?
Example
– pardalios
– vies (form of via: abl., pl. in Corp. Inscr. Lat. 4, 1410)
with an analytical one, by adding the following items to the LEMLAT lemmatization results:
– new morphological information
aquai
Common, Noun, I Decl., Gen., Sing., Fem.
– new stylistic and historical-linguistic information
aquai
Common, Noun, I Decl., Gen., Sing., Fem., Poetic., Arch.
– LES: antiqu-
– SM (paradigmatic suffixes): -issim-
– SF (endings): -orum
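The three-part split above (LES, SM, SF) can be illustrated with a brute-force segmenter. This is a minimal sketch under stated assumptions: the toy sets `LES`, `SM` and `SF` and the function `segment` are hypothetical, and LEMLAT's real tables and matching strategy are not described in these slides.

```python
# Toy tables for illustration only; the real LEMLAT tables are far larger.
LES = {"antiqu", "ros", "abamit"}   # invariable parts of the inflected forms
SM = {"", "issim"}                  # paradigmatic suffixes ("" = none)
SF = {"orum", "a", "as"}            # endings


def segment(form):
    """Return every (LES, SM, SF) split of `form` licensed by the toy tables."""
    form = form.lower()
    results = []
    for i in range(1, len(form)):
        les, rest = form[:i], form[i:]
        if les not in LES:
            continue
        # Try every way of cutting the remainder into SM + SF.
        for j in range(len(rest)):
            sm, sf = rest[:j], rest[j:]
            if sm in SM and sf in SF:
                results.append((les, sm, sf))
    return results


print(segment("antiquissimorum"))
print(segment("rosa"))
```

With these tables, antiquissimorum yields the single split antiqu- / -issim- / -orum, matching the example on the slide.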
– accepted standard
– largely tested on a number of languages
– flexibility and personalization (useful for this first application on a dead language)
====== ==================
Code P ATTRIBUTE
====== ==================
1      PoS
2      Type
3      Flexive Category
4      Mood
5      Tense
6      Case
7      Gender
8      Number
9      Person
10     Degree
====== ==================
= ===================== ===================== =
P ATTRIBUTE             VALUE                 C
= ===================== ===================== =
3 Flexive Category      I decl.               A
                        II decl.              B
                        III decl.             C
                        IV decl.              D
                        V decl.               E
                        I conjug.             F
                        II conjug.            G
                        III conjug.           H
                        IV conjug.            L
                        Conjug e/i            M
                        Exceptional Conjug.   N
                        No Flexive Category   -
= ===================== ===================== =
==== =========== =========== =========
SF   LEMLAT Cod. EAGLES Cod. Examples
==== =========== =========== =========
a    n1          NcA--bfs--  ros-a
a    n1          NcA--bms--  pirat-a
a    n1          NcA--nfs--  ros-a
a    n1          NcA--nms--  pirat-a
a    n1          NcA--vfs--  ros-a
a    n1          NcA--vms--  pirat-a
a    n1e         NcA--bfs--  plastic-a
a    n1e         NcA--bms--  poet-a
a    n1e         NcA--nfs--  plastic-a
a    n1e         NcA--nms--  poet-a
a    n1e         NcA--vfs--  plastic-a
a    n1e         NcA--vms--  poet-a
abus n1e         NcA--bfp--  de-abus
abus n1e         NcA--dfp--  de-abus
==== =========== =========== =========
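The positional codes in the table can be read by pairing each character with the attribute at the same position. The sketch below uses only the ten attribute names and the Flexive Category values listed on the slides; the function names `split_tag` and `flexive_category` are hypothetical, and the value codes for the other positions are left undecoded because their tables are not given here.

```python
# Attribute names by position, as listed on the slide (positions 1 to 10).
ATTRIBUTES = [
    "PoS", "Type", "Flexive Category", "Mood", "Tense",
    "Case", "Gender", "Number", "Person", "Degree",
]

# Value codes for position 3, as listed on the slide.
FLEXIVE_CATEGORY = {
    "A": "I decl.", "B": "II decl.", "C": "III decl.", "D": "IV decl.",
    "E": "V decl.", "F": "I conjug.", "G": "II conjug.", "H": "III conjug.",
    "L": "IV conjug.", "M": "Conjug e/i", "N": "Exceptional Conjug.",
    "-": "No Flexive Category",
}


def split_tag(tag):
    """Map each character of a 10-position code to its attribute name."""
    if len(tag) != len(ATTRIBUTES):
        raise ValueError(f"expected {len(ATTRIBUTES)} positions, got {len(tag)}")
    return dict(zip(ATTRIBUTES, tag))


def flexive_category(tag):
    """Decode position 3 of the code into its Flexive Category value."""
    return FLEXIVE_CATEGORY[split_tag(tag)["Flexive Category"]]


fields = split_tag("NcA--bfs--")
print(fields)
print(flexive_category("NcA--bfs--"))
```

For `NcA--bfs--` (ros-a) the third position is `A`, which the Flexive Category table maps to I decl.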
with no segmentation:
– FE (exceptional forms): registered as such in the look-up table, with COD LES FE (ex. amassint)
A1705     AMASSINT   FE
A1705  V  AM         V1
– LE (exceptional lemmas): generated through special information registered in the fourth field of the look-up table (ex. agape)
A1128 AGAP N1E -E
– I (invariable forms): registered as such in the look-up table, with COD LES I (ex. assultim)
A3200 ASSULTIM I
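The three non-segmented cases above can be sketched as a whole-form match against the look-up table. This is an illustration under assumptions: the rows are copied from the examples, but the tuple layout and the function `whole_form_matches` are hypothetical, and reading the fourth field `-E` as "append E to the LES" follows the agape example rather than a documented rule.

```python
# Rows copied from the examples above.
# Tuple layout (an assumption): (ID Num., LES, COD LES, fourth field or None).
ROWS = [
    ("A1705", "AMASSINT", "FE", None),
    ("A1705", "AM", "V1", None),
    ("A1128", "AGAP", "N1E", "-E"),
    ("A3200", "ASSULTIM", "I", None),
]


def whole_form_matches(form):
    """Return (ID Num., kind) for entries that match `form` without segmentation."""
    form = form.upper()
    hits = []
    for id_num, les, cod, fourth in ROWS:
        if cod in ("FE", "I") and les == form:
            # Exceptional and invariable forms are stored as full word forms.
            hits.append((id_num, cod))
        elif fourth is not None and les + fourth.lstrip("-") == form:
            # Exceptional lemma: LES plus the content of the fourth field.
            hits.append((id_num, "LE"))
    return hits


for word in ("amassint", "agape", "assultim"):
    print(word, whole_form_matches(word))
```

Ordinary LES such as AM are skipped here on purpose, since they only match through segmentation.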
ASSULTIM I Ri-------
– the COD LES of the LE LES
– the kind of information registered in the fourth field of the LE LES row in the look-up table: LEMLAT adds this information to the LES to generate the LE
A1128 AGAP N1E -E
LE: agape (AGAP plus -E): no segmented wordform!
COD LES: N1E + Fourth field: -E
Morphological analysis:
Common, Noun, I Decl., Nomin., Sing., Fem.
Common, Noun, I Decl., Voc., Sing., Fem.
Common, Noun, I Decl., Abl., Sing., Fem.
– Input form: abamitas
Segmentation: abamit-as
as n1 NcA--afp-
as n1 NcA--amp-
A0019f ABAMIT N1
Selected SF:
as n1 NcA--afp-
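The selection step for abamitas can be sketched as a filter on the Gender position of the candidate codes. Two points here are assumptions rather than statements from the slides: that the trailing `f` in `A0019f` records the gender of the LES, and that this gender is compared with position 7 (Gender) of the EAGLES code. The function `select_sf` is hypothetical.

```python
GENDER_POSITION = 7  # 1-based position of Gender in the EAGLES code


def select_sf(candidates, les_gender):
    """Keep the candidate endings whose Gender position matches the LES gender."""
    return [
        (sf, lemlat_cod, eagles_cod)
        for sf, lemlat_cod, eagles_cod in candidates
        if eagles_cod[GENDER_POSITION - 1] == les_gender
    ]


# Candidate endings for the segmentation abamit-as, as listed above.
candidates = [
    ("as", "n1", "NcA--afp-"),
    ("as", "n1", "NcA--amp-"),
]

# Assumption: the trailing letter of the ID "A0019f" is the gender of the LES.
les_id = "A0019f"
selected = select_sf(candidates, les_id[-1])
print(selected)
```

Under these assumptions only the feminine ending survives, which agrees with the selected SF shown on the slide.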
– To choose an RDBMS (Relational Database Management System) among the available open-source systems
– To use the chosen RDBMS in LEMLAT
– Software development to implement new features
(but additional funds are needed)
Proposal for EU Sixth Framework, 2003
– To be added to lemmas:
– Metric reading through a multimedial tool (text-to-speech and sound reproduction)