 
Integrating software and lingware in the OMNIA and AXiMAG projects, following a KISS approach
Achille Falaise
Laboratoire d'Informatique de Grenoble (Grenoble Computer Science Laboratory)
Motivations behind software integration for NLP
● High-level projects involve...
● ...many existing linguistic resources and linguistic software (lingware), such as lexical databases, tokenisers, syntactic parsers, MT systems, etc.
● ...backend software (databases, web servers, etc.)
● ...end-user software (offline or online GUI)
● Many of these were designed by different teams, following heterogeneous approaches, for various purposes
● What do we integrate at the LIG-GETALP team, and how?
● 2 examples: the OMNIA project and the AXiMAG project
Outline
● The KISS approach and its REST implementation
● OMNIA project (completed)
● AXiMAG project (reimplementation in progress)
The KISS principle
● Stands for: Keep It Simple, Stupid
● Acronym coined by military aircraft manufacturers
● A general design principle, applied here to IT
KISS for integration: REST architecture
● Stands for: REpresentational State Transfer
● Integration of services over the HTTP protocol
● Client-server (independent of any GUI)
● Stateless (services have nothing to remember between requests)
● Cacheable (for better scalability)
● Layered (redirections permitted without restrictions)
● A service can be called from:
– Any web browser
– Any programming language supporting HTTP requests
– The command line, with popular utilities like cURL
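The properties listed above can be illustrated with a minimal sketch that uses only Python's standard library. The `/tokenise` path, the `text` parameter and the JSON response shape are hypothetical and are not taken from OMNIA or AXiMAG. The sketch starts a toy stateless service on a free local port and calls it with a plain HTTP GET.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import ProxyHandler, build_opener


class ToyService(BaseHTTPRequestHandler):
    """Hypothetical stateless service: each GET carries everything it needs."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/tokenise":  # hypothetical endpoint name
            self.send_error(404)
            return
        text = parse_qs(url.query).get("text", [""])[0]
        body = json.dumps({"tokens": text.split()}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # Cacheable: identical requests may be answered by an intermediate cache.
        self.send_header("Cache-Control", "max-age=3600")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def call_service(base_url, text):
    """Client side: a plain HTTP GET, with no GUI and no session."""
    opener = build_opener(ProxyHandler({}))  # ignore any system proxy settings
    query = urlencode({"text": text})
    with opener.open(f"{base_url}/tokenise?{query}") as response:
        return json.loads(response.read().decode("utf-8"))


def demo():
    server = HTTPServer(("127.0.0.1", 0), ToyService)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base_url = f"http://127.0.0.1:{server.server_port}"
        return call_service(base_url, "Iraqi women sit")
    finally:
        server.shutdown()
        server.server_close()
```

Because the service keeps no state, the same URL could equally be opened in a web browser or fetched with cURL while the server is running.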
Lingware integration in the OMNIA project
OMNIA project (2008-2010): multilingual information retrieval
● From companion texts (e.g. picture captions) in several languages to interlingual concepts
● Ontology-based indexing
[Figure: image sources (press, Picasa, etc.), multimodal analysis, indexation, ontology, requests, relevant images]
OMNIA modular architecture: from text to concepts
4 modules & 4 resources
● Resources: form-lemma dictionary, lemma-UW dictionary, UW-concept map, ontology
● Modules: lemmatisation, interlingual annotation, ambig. annotation, concept annotation
● Input: companion texts and NL requests
● Output: concepts
Example companion text
AWA05 - 20020924 - BAGHDAD, IRAQ : Iraqi women sit under a portrait of Iraqi President Saddam Hussein in a waiting room in Baghdad's al-Mansur hospital 24 September 2002. Saddam Hussein is doggedly pursuing the development of weapons of mass destruction and will do his best to hide them from UN inspectors, the British government claimed in a 55-page dossier made public just hours before a special House of Commons debate on Iraq. Iraqi Culture Minister Hamad Yussef Hammadi called the British allegations "baseless." EPA PHOTO AFPI AWAD AWAD
Lemmatisation process
Input: Iraqi women sitting in a waiting room of Bagdad hospital
Output: (Iraqi, NOUN) (woman, NOUN) (sit, VERB) (in, PREP) (a, DET) (waiting room, NOUN) (of, PREP) (Bagdad, NOUN) (hospital, NOUN)
Graph structure with ambiguities:
[Figure: lattice over nodes 40 to 43 with competing edges a (DET), a (NOUN), waiting room (NOUN), waiting (NOUN), wait (VERB), room (NOUN), room (VERB)]
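A graph with ambiguities of this kind can be sketched as a small lattice whose edges carry (lemma, part of speech) pairs. The node numbers and edge labels below come from the slide. Which nodes each edge connects is an assumption made for this sketch, since the layout of the original figure is not recoverable here.

```python
from collections import defaultdict

# Edges of the lattice: (source node, target node, lemma, part of speech).
# Labels and node numbers are from the slide; the connectivity is assumed.
EDGES = [
    (40, 41, "a", "DET"),
    (40, 41, "a", "NOUN"),
    (41, 43, "waiting room", "NOUN"),
    (41, 42, "waiting", "NOUN"),
    (41, 42, "wait", "VERB"),
    (42, 43, "room", "NOUN"),
    (42, 43, "room", "VERB"),
]


def build_graph(edges):
    """Index the outgoing edges of each node."""
    graph = defaultdict(list)
    for source, target, lemma, pos in edges:
        graph[source].append((target, lemma, pos))
    return graph


def readings(graph, node, end):
    """Yield every path from node to end as a list of (lemma, pos) pairs."""
    if node == end:
        yield []
        return
    for target, lemma, pos in graph.get(node, []):
        for rest in readings(graph, target, end):
            yield [(lemma, pos)] + rest


all_readings = list(readings(build_graph(EDGES), 40, 43))
```

Keeping all readings in one lattice, rather than choosing one early, lets the later annotation and disambiguation steps decide between the compound "waiting room" and the separate words.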
Interlingual lexicon: Universal Words (UW)
● Universal Words (UW)
– represent word senses (acceptions) without ambiguity
– consist of a headword with meaning restrictions
● Examples:
– book(icl>thing)
– book(icl>do, agt>human, obj>thing)
– ikebana(icl>flower arrangement)
● 200,000 UW++ entries built from WordNet synsets
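The "headword with meaning restrictions" structure can be made concrete with a small parser. This is a sketch whose grammar is inferred only from the examples on these slides (comma-separated restrictions, each a relation followed by one or more `>`-separated terms). It is not a full implementation of the UW notation, and `parse_uw` is a hypothetical helper name.

```python
def parse_uw(uw):
    """Split a UW string into (headword, restrictions).

    Each restriction is a (relation, chain) pair,
    e.g. ("icl", ["room", "thing"]) for "icl>room>thing".
    """
    headword, sep, rest = uw.partition("(")
    if not sep:
        return headword.strip(), []  # bare headword, no restrictions
    if not rest.endswith(")"):
        raise ValueError(f"unbalanced parentheses in {uw!r}")
    restrictions = []
    for part in rest[:-1].split(","):
        relation, *chain = (piece.strip() for piece in part.split(">"))
        restrictions.append((relation, chain))
    return headword.strip(), restrictions
```

For instance, `parse_uw("book(icl>do, agt>human, obj>thing)")` separates the headword `book` from its three restrictions, which is what distinguishes the verb sense from `book(icl>thing)`.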
Interlingual annotation process
[Figure: the lemma lattice (nodes 40 to 43, edges a (DET), a (NOUN), waiting room (NOUN), waiting (NOUN), waiting (VERB), room (VERB)), each edge annotated with candidate UWs]
Candidate UWs shown in the figure:
● waiting_room(icl>room>thing,equ>lounge)
● waiting(icl>inactivity>thing)
● wait(icl>act>occur,obj>thing)
● wait(icl>work>do,obj>thing)
● room(icl>opportunity>thing) (shown twice)
● room(icl>position>thing)
● room(icl>area>thing)
● room(icl>dwell>do)
Automatic disambiguation process
[Figure: the annotated lattice (nodes 40 to 43), each candidate UW now carrying a score]
● 0.0020 waiting_room(icl>room>thing,equ>lounge)
● 0.0002 waiting(icl>inactivity>thing)
● 0.0001 wait(icl>act>occur,obj>thing)
● 0.0001 wait(icl>work>do,obj>thing)
● 0.0001 room(icl>opportunity>thing) (shown twice)
● 0.0001 room(icl>position>thing)
● 0.0002 room(icl>area>thing)
● 0.0001 room(icl>dwell>do)
Ant algorithm (Schwab & Lafourcade 2007)
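Once scores are available, using them can be as simple as keeping the best candidate per headword. The sketch below does not implement the ant algorithm. It only shows how the scores from the slide might be consumed, grouping candidates by UW headword (a simplification that ignores the lattice structure). `best_uw` is a hypothetical helper.

```python
# Scores copied from the slide, grouped by UW headword (grouping is assumed).
CANDIDATES = {
    "waiting_room": [("waiting_room(icl>room>thing,equ>lounge)", 0.0020)],
    "waiting": [("waiting(icl>inactivity>thing)", 0.0002)],
    "wait": [
        ("wait(icl>act>occur,obj>thing)", 0.0001),
        ("wait(icl>work>do,obj>thing)", 0.0001),
    ],
    "room": [
        ("room(icl>opportunity>thing)", 0.0001),
        ("room(icl>position>thing)", 0.0001),
        ("room(icl>area>thing)", 0.0002),
        ("room(icl>dwell>do)", 0.0001),
    ],
}


def best_uw(candidates):
    """Keep the highest-scoring (uw, score) pair for each headword.

    Ties are resolved in favour of the first candidate listed.
    """
    return {
        headword: max(uws, key=lambda pair: pair[1])
        for headword, uws in candidates.items()
    }


selected = best_uw(CANDIDATES)
```

With these figures the compound reading `waiting_room(icl>room>thing,equ>lounge)` has the highest score overall, an order of magnitude above any candidate for the separate words.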
Ontology for conceptual annotation
● Ontology
– Domain-dependent
– Concept hierarchy
– Interlingual
● Alignment with interlingual lexicon
– Manual or automatic (Rouquet & Nguyen, 2009)
[Figure: ontology (excerpt) with the concepts WOMEN, PRESIDENT, MINISTER, HOSPITAL, HOUSE, RESIDENTIAL BUILDING, PEOPLE, POLITICS, BUILDING]
Extracted concepts
● Output for whole text
Concept               Score
RESIDENTIAL BUILDING  0.0002
PRESIDENT             0.0004
BUILDING              0.0004
HOUSE                 0.0002
HOSPITAL              0.0002
POLITICS              0.0044
MINISTER              0.0040
WOMAN                 0.0002
PEOPLE                0.0146
[Figure: ontology (excerpt) with the concepts WOMEN, PRESIDENT, MINISTER, HOSPITAL, HOUSE, RESIDENTIAL BUILDING, PEOPLE, POLITICS, BUILDING]
OMNIA demo
Lingware & software integration in the AXiMAG project
AXiMAG version 2 project (2011): collaborative website translation
[Figure: architecture diagram with the components Systran, Google, Reverso, Sistec; Tradoh (MultiMT system); Sectraw (translation memories manager); AXiMAG (website collaborative translation widget)]
AXiMAG demo
Design issue in service-oriented architecture
● Request building & request processing
– May lead to denial of service (DoS)
● Means to avoid this issue
– Scope of the service
– Internal cache
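The two mitigations can be sketched in a few lines. The code below is an illustration under stated assumptions: `process_request` is a hypothetical stand-in for an expensive lingware call, "scope of the service" is interpreted here as bounding the size of an accepted request, and the limit `MAX_CHARS` is an arbitrary value chosen for the example.

```python
from functools import lru_cache

MAX_CHARS = 10_000       # hypothetical bound on request size
CALLS = {"count": 0}     # counts how often the expensive step really runs


def process_request(text):
    """Hypothetical stand-in for an expensive lingware call."""
    CALLS["count"] += 1
    return text.lower().split()


@lru_cache(maxsize=1024)
def _cached(text):
    # Tuples are immutable, so a cached result can be shared safely.
    return tuple(process_request(text))


def cached_process(text):
    """Reject out-of-scope requests, then answer from the internal cache."""
    if len(text) > MAX_CHARS:
        raise ValueError("request exceeds the scope of this service")
    return _cached(text)
```

Repeated identical requests then cost one dictionary lookup instead of a full processing run, and oversized requests are refused before any work is done.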