SLIDE 1

7-8 December 2000, 7th International Workshop on Academic Information Networks and Systems

Papillon Lexical Database Project

Monolingual Dictionaries & Interlingual Links

Mathieu Mangeot

GETA/CLIPS IMAG, Grenoble, France

Mathieu.Mangeot@imag.fr

SLIDE 2

Plan

  • Initiators & Partners of the Project
  • Motivations & Goals of the Project
  • General View & Architecture of the Database
  • Structure of Monolingual Dictionaries
  • Construction Methodology

– Integration of Existing Resources
– Addition of New Entries
– Revision of New Entries

  • Consultation of the Lexical Database
  • Ongoing Work
  • Conclusion & Contacts

SLIDE 3

Initiators & Partners

  • Initiators:

– Dr. Emmanuel Planas (GETA/CLIPS, France)
– François Brown de Colstoun (French Embassy, Japan)
– Dr. Mutsuko Tomokiyo (GETA/CLIPS, France)

  • Partners:

– NII: National Institute of Informatics (Tokyo, Japan)
– GETA/CLIPS: Machine Translation (Grenoble, France)

  • & numerous voluntary contributors

SLIDE 4

Motivations of the Project

  • Lack of French <-> Japanese usage dictionaries usable by French speakers (and the existing ones are not free)
  • Lack of dictionaries for lingware
  • Information not computerized
  • Internet allows linguists, translators & researchers to collaborate easily
  • Make the project's data available under an open source license

SLIDE 5

Goals of the Project: Production of Dictionaries

  • For humans, in usual formats:

– Online consultation over the Internet
– Paper edition

  • For humans, thanks to databases:

– Direct help for editors, browsers or PDAs

  • For machines:

– Terminological resources for lingware

  • For Science:

– Creation of multilingual dictionaries from monolingual ones

SLIDE 6

References & Previous Work

  • Data:

– FeM French->English-Malay - M. Lafourcade (Ass. Champollion/GETA, Grenoble; USM, Penang; DBP, KL)
– JMDict Japanese->English - Jim Breen (Monash University, Clayton, Australia)

  • Entry Logical Structure:

– DEC, DiCo & LAF - I. Mel'čuk & A. Polguère (Université de Montréal, Montréal, Canada)

  • Interlingual Databases:

– PARAX - E. Blanc - (GETA/CLIPS)
– SUBLIM - Ph.D. thesis of G. Sérasset - (GETA)

  • Collaborative project:

– SAIKAM - (NII & NECTEC)

SLIDE 7

General View of the Database

[Diagram: the Lexical Database at the centre, with three layers around it.
– Users: Interaction with the Dictionaries
– Dictionaries: Extraction of Dictionaries
– Resources: Integration of existing resources]

SLIDE 8

Internal Architecture of the Database

French Dictionary

– Vocable: carte (n.f.)
– Lexie: carte à jouer
– Lexie: carte géographique

Interlingual Dictionary

– Acception 343, UNL: card(icl>play)
– Acception 345, UNL: map(fld>geography)

Japanese Dictionary

– 地図
– カード

Architecture Derived from Dr. Gilles Sérasset's Ph.D. Thesis
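
The pivot idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the project's actual data model: the class names `Lexie` and `Acception`, the `translations` helper, and the pairing of each lexie with an acception (carte à jouer and カード with 343, carte géographique and 地図 with 345) are assumptions read off the diagram.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Lexie:
    """One word sense in a monolingual dictionary (hypothetical model)."""
    lang: str      # language code, e.g. "fr" or "ja"
    vocable: str   # headword the sense belongs to
    label: str     # the sense as written on the slide


@dataclass
class Acception:
    """Interlingual node that lexies of any language attach to."""
    ident: int
    unl: str
    lexies: list = field(default_factory=list)


def translations(lexie, acceptions, target_lang):
    """Follow lexie -> acception -> lexies of the target language."""
    return [
        other.label
        for acc in acceptions
        if lexie in acc.lexies
        for other in acc.lexies
        if other.lang == target_lang
    ]


carte_jeu = Lexie("fr", "carte", "carte à jouer")
carte_geo = Lexie("fr", "carte", "carte géographique")
kaado = Lexie("ja", "カード", "カード")
chizu = Lexie("ja", "地図", "地図")

# Assumed links: the slide shows the nodes, the pairing is inferred.
acceptions = [
    Acception(343, "card(icl>play)", [carte_jeu, kaado]),
    Acception(345, "map(fld>geography)", [carte_geo, chizu]),
]

print(translations(carte_geo, acceptions, "ja"))
```

Because no dictionary points directly at another, adding a language only means attaching its lexies to existing acceptions.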

SLIDE 9

Monolingual Dictionaries

1. Name of the lexical unit: MEURTRE
2. Grammatical properties: nom, masc
3. Semantic Formula: action de tuer: ~ PAR L'individu X DE L'individu Y
4. Government pattern: X = I = de N, A-poss; Y = II = de N, A-poss
5. (Quasi-)synonyms: {QSyn} assassinat, homicide#1; crime
6. Semantic derivations & collocations:

– {V0} tuer
– {A0} meurtrier-adj /*Nom pour X*/
– {S1} auteur [de ART Ø] //meurtrier-n /*Nom pour Y*/
– {S2} victime [de ART Ø] /*Très choquant*/

7. Examples: La mésentente pourrait être le mobile du meurtre.
8. Full Idioms:

– appel au meurtre
– crier au meurtre

Structure derived from Prof. Alain Polguère’s Work on DiCo
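
As a rough sketch, the eight fields of the MEURTRE entry could be held in a record like the one below. The class `LexicalEntry`, its field names, and the `derivations` helper are hypothetical; only the values come from the slide, and the DiCo comments (`/*...*/`) are left out.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """Hypothetical container for one DiCo-style monolingual entry."""
    name: str
    grammar: list
    semantic_formula: str
    government: dict
    synonyms: list
    lexical_functions: dict = field(default_factory=dict)
    examples: list = field(default_factory=list)
    idioms: list = field(default_factory=list)


meurtre = LexicalEntry(
    name="MEURTRE",
    grammar=["nom", "masc"],
    semantic_formula="action de tuer: ~ PAR L'individu X DE L'individu Y",
    government={"X": "I = de N, A-poss", "Y": "II = de N, A-poss"},
    synonyms=["assassinat", "homicide#1", "crime"],
    lexical_functions={
        "V0": ["tuer"],
        "A0": ["meurtrier-adj"],
        "S1": ["auteur", "meurtrier-n"],
        "S2": ["victime"],
    },
    examples=["La mésentente pourrait être le mobile du meurtre."],
    idioms=["appel au meurtre", "crier au meurtre"],
)


def derivations(entry, function):
    """Return the values of one lexical function, or an empty list."""
    return entry.lexical_functions.get(function, [])


print(derivations(meurtre, "S1"))
```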

SLIDE 10

Construction Methodology

  • Creation of the lexical soup

– Integration of existing data

  • Revision of the lexical soup

– Revision of the links created automatically

  • Creation of new data

– The lexicographer writes monolingual entries
– The translator edits interlingual links

  • Revision of the data

– The lexicologist reviews links & entries

SLIDE 11

Creation of the lexical soup

[Diagram: the existing dictionaries FeM and JMDict are integrated into the Lexical Database.]

SLIDE 12

FeM: French->English

http://clips.imag.fr/geta/services/fem/

SLIDE 13

FeM structure

http://clips.imag.fr/geta/services/fem/

(:fem-entry
  (:ENTRY "dictionnaire")
  (:FRENCH_PRON "diksyone+r")
  (:FRENCH_CAT "n.m.")
  (:FRENCH_GLOSS " u n texte")
  (:ENGLISH_EQU "dictionary")
  (:FRENCH_PHRASE "les enfants qui ne connaissent pas l'ordre alphabétique ne peuvent pas consulter le dictionnaire")
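
A minimal sketch of how such an entry could be read into a dictionary, assuming every field has the form `(:NAME "value")` as in the excerpt above. The function name `parse_fem_entry` is hypothetical, and the sample is shortened and closed with a final parenthesis for the purpose of the example.

```python
import re

# One field looks like (:NAME "value"); the (:fem-entry opener has no
# quoted value and is therefore skipped by this pattern.
FIELD = re.compile(r'\(:([A-Z_]+)\s+"([^"]*)"\)')


def parse_fem_entry(text):
    """Map field names to values; a repeated field keeps its last value."""
    return {name: value for name, value in FIELD.findall(text)}


sample = '''(:fem-entry
  (:ENTRY "dictionnaire")
  (:FRENCH_CAT "n.m.")
  (:ENGLISH_EQU "dictionary"))'''

entry = parse_fem_entry(sample)
print(entry["ENTRY"], "->", entry["ENGLISH_EQU"])
```

A regular expression is enough here only because the fields are flat; nested structures would call for a real s-expression reader.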

SLIDE 14

JMDict: Japanese->English

from Prof. Jim Breen, Monash University, Australia

http://www.csse.monash.edu.au/~jwb/wwwjdic.html

SLIDE 15

JMDict structure

<entry>
  <ent_seq>1582710</ent_seq>
  <k_ele>
    <keb>日本</keb>
    <ke_pri>jdd1</ke_pri>
  </k_ele>
  <r_ele>
    <reb>にほん</reb>
  </r_ele>
  <r_ele>
    <reb>にっぽん</reb>
    <re_pri>jdd1</re_pri>
  </r_ele>
  <sense>
    <gloss>Japan</gloss>
    <gloss g_lang="de">Japan</gloss>
  </sense>
</entry>
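
The entry above can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is illustrative: the function name `parse_entry` and the shape of the returned dictionary are hypothetical, and treating a gloss without `g_lang` as English is an assumption based on the Japanese->English direction of the dictionary.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<entry>
  <ent_seq>1582710</ent_seq>
  <k_ele><keb>日本</keb><ke_pri>jdd1</ke_pri></k_ele>
  <r_ele><reb>にほん</reb></r_ele>
  <r_ele><reb>にっぽん</reb><re_pri>jdd1</re_pri></r_ele>
  <sense>
    <gloss>Japan</gloss>
    <gloss g_lang="de">Japan</gloss>
  </sense>
</entry>"""


def parse_entry(xml_text):
    """Extract the sequence number, kanji forms, readings and glosses."""
    entry = ET.fromstring(xml_text)
    return {
        "seq": int(entry.findtext("ent_seq")),
        "kanji": [k.text for k in entry.findall("k_ele/keb")],
        "readings": [r.text for r in entry.findall("r_ele/reb")],
        # Assumption: a gloss with no g_lang attribute is English.
        "glosses": [
            (g.get("g_lang", "en"), g.text)
            for g in entry.findall("sense/gloss")
        ],
    }


parsed = parse_entry(SAMPLE)
print(parsed["kanji"], parsed["readings"], parsed["glosses"])
```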

SLIDE 16

Revision of the Links

[Diagram: French (fr) and Japanese (ja) entries in the Lexical Database, connected by the links to be revised.]
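
The slides do not say how the links under revision are first created. One plausible way, sketched here purely as an assumption, is to pair French and Japanese words that share an English equivalent, since FeM is French->English and JMDict is Japanese->English. The function `candidate_links` and the word pairs other than dictionnaire/dictionary and 日本/Japan are illustrative, not taken from the project.

```python
from collections import defaultdict


def candidate_links(fr_to_en, ja_to_en):
    """Propose (fr, english, ja) triples for words sharing an English gloss."""
    by_english = defaultdict(list)
    for ja_word, english in ja_to_en:
        by_english[english.lower()].append(ja_word)

    links = []
    for fr_word, english in fr_to_en:
        key = english.lower()
        for ja_word in by_english.get(key, []):
            links.append((fr_word, key, ja_word))
    return links


# Illustrative pairs; "Japon" and "辞書" are not on the slides.
fr_to_en = [("dictionnaire", "dictionary"), ("Japon", "Japan")]
ja_to_en = [("日本", "Japan"), ("辞書", "dictionary")]

for link in candidate_links(fr_to_en, ja_to_en):
    print(link)
```

Matching on an English word alone cannot tell senses apart (carte maps to both card and map), which is one reason such automatic links would need human revision.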

SLIDE 17

Revision Interface

SLIDE 18

Writing of New Entries

[Diagram: the Lexicographer writes a new entry, the Lexicologist revises it, and the revised entry goes into the Lexical Database.]

SLIDE 19

Consultation of the Database

[Diagram: entries from the Lexical Database are delivered on the Web, as a Book, and to Machines and Humans.]

SLIDE 20

Ongoing Work

  • Ph.D. intern (Monthon) in Tokyo

– Preparation of the lexical soup with specific tools

  • 4-month contract (M. Tomokiyo) (12/00-02/01)

– Preliminary studies on linguistic content

  • 2-year CNRS/JSPS grant in Tokyo (from 10/2001)

– Management of the technical aspects of Papillon
– Building of the server and CSCW tools

  • Papillon 2001 workshop in Grenoble, France

– July 2001, organized by GETA/CLIPS

SLIDE 21

Conclusion

  • Advantages:

– Easy integration of new languages (ongoing discussions for Thai (KU & NECTEC) & Malay)
– Availability of the data under an open source license
– Generation of multiple formats from the database

  • Needs for the development of the project:

– Centralized server & team of experts
– Development of cooperative tools
– Voluntary contributors!

SLIDE 22

Contacts

  • Web Site: http://vulab.ias.unu.edu/papillon
  • Responsible: Emmanuel Planas
  • mailto:Emmanuel.Planas@imag.fr
  • Technical aspects: Mathieu Mangeot
  • mailto:Mathieu.Mangeot@imag.fr
  • NII responsible: Frédéric Andrès
  • mailto:andres@nii.ac.jp

SLIDE 23

Construction Methodology

Comments                      French  Japanese  English  UNL
Ideal: result of the merge    yes     yes       yes      yes
Ideal but not available       yes     yes       no       yes
GETA data                     yes     no        yes      yes
FeM + JMDict                  yes     yes       yes      no
Small available lists         yes     yes       no       no
FeM                           yes     no        yes      no
JMDict                        no      yes       yes      no
[name illegible]              yes     no        no       no
From scratch                  no      no        no       no