
Framework for Supporting Multilingual Resource Development at Expert System



  1. Framework for Supporting Multilingual Resource Development at Expert System
     Jose Manuel Gomez-Perez (jmgomez@expertsystem.com)
     META-FORUM 2016, July 5th, 2016

  2. Expert System – About us

  3. Expert System’s COGITO
     • COGITO interprets text to enable better, more informed decision making
     • Based on Sensigrafo, a monolingual representation of knowledge that is both deep and wide
     • Sensigrafo contains millions of word definitions, related concepts and linguistic information
     • Each Sensigrafo represents several person-years of work
     • COGITO leverages context information for disambiguation based on Sensigrafo
     • Document categorization and information extraction are encoded on top of Sensigrafo in rule-based categorization and extraction languages
     • Rule modeling is supported by COGITO Studio
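The slides do not describe COGITO's actual disambiguation algorithm. As a rough illustration of what "leveraging context information for disambiguation" can mean, the sketch below uses a simplified Lesk-style overlap: each candidate sense of a lemma is scored by how many of its related words appear in the surrounding text. The sense inventory, the sense IDs and the function `disambiguate` are hypothetical stand-ins for a Sensigrafo. They do not reflect its real structure or API.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy inventory standing in for a Sensigrafo:
# lemma -> {sense_id -> words related to that sense}
SENSE_INVENTORY: Dict[str, Dict[str, Set[str]]] = {
    "bank": {
        "bank.n.finance": {"money", "loan", "account", "deposit", "interest"},
        "bank.n.river": {"river", "water", "shore", "fishing", "slope"},
    },
}


def disambiguate(lemma: str, context: List[str]) -> Optional[str]:
    """Pick the sense whose related words overlap most with the context."""
    senses = SENSE_INVENTORY.get(lemma)
    if not senses:
        return None  # lemma unknown to the inventory
    context_words = {w.lower() for w in context}
    best_sense, best_score = None, -1
    for sense_id, related in senses.items():
        score = len(related & context_words)
        if score > best_score:  # ties keep the first-listed sense
            best_sense, best_score = sense_id, score
    return best_sense
```

For example, `disambiguate("bank", "she opened an account to deposit money".split())` returns `"bank.n.finance"`. A production system would use a far richer context model than raw word overlap.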

  4. Expert System Today: 14 languages natively supported

  5. Challenges and Opportunities
     • Due to Expert System's rapid expansion in the European market, the company faced the challenge of creating new monolingual resources from scratch, or…
     • … achieving native multilinguality in a cost-effective manner, while maintaining high accuracy and reducing time to market
     • Generalized MT is not the solution: the resulting accuracy drops by at least 10% on average
     • Many of the projects in the new countries are conceptually similar to previous projects in other languages
     • Enable reuse of existing semantic and linguistic resources, including monolingual rule bases, across languages

  6. Approach
     • The goal is not to automate the whole process, but rather to:
         • Bootstrap resources, giving knowledge engineers a solid base and alleviating the blank-page syndrome, particularly for rule development
         • Leverage context information, both in text and in the monolingual Sensigrafos, to improve translation quality
         • Provide confidence values to guide validation efforts
         • Focus on the existing monolingual rule bases
     [Diagram: Automatic Rule Translation, when a reusable rule base exists (in a different language); Word & Sense Embeddings, for context-based mapping identification; Rule Learning, when there is no previous rule base to reuse but a large document corpus is available]
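The slides do not say how the confidence values are consumed. One plausible reading is a review queue, and that is an assumption in the sketch below. Machine-produced rules whose confidence falls below a threshold go to knowledge engineers for validation, least confident first. `TranslatedRule`, `review_queue` and the 0.8 threshold are illustrative names and values, not part of Expert System's tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TranslatedRule:
    rule_id: str
    confidence: float  # hypothetical score in [0, 1]


def review_queue(rules: List[TranslatedRule],
                 threshold: float = 0.8) -> List[TranslatedRule]:
    """Rules below the (assumed) threshold, least confident first."""
    flagged = [r for r in rules if r.confidence < threshold]
    return sorted(flagged, key=lambda r: r.confidence)
```

Sorting ascending puts the most doubtful output at the front of the queue. Review effort then concentrates where errors are most likely, while high-confidence rules can be spot-checked.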

  7. Automatic Rule Translation
     • Transform rules in the original language into Abstract Syntax Trees (ASTs). The main node types are concepts (word senses), lemmas and keywords
     • The AST translator replicates the ASTs, modifying or replacing source-language nodes with their target-language equivalents
     • Each node and operator type is handled differently, relying on the concept mapping between the source and target Sensigrafos
     • Applied to 90K rules (IPTC, EUROVOC, etc.) and the language pairs IT-ES, IT-FR and EN-DE
       ✓ 99.9% of rules translated
       ✓ 55% to 70% accuracy
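Below is a minimal sketch of the node-by-node AST translation described above. It assumes a toy rule language with three leaf types (concepts, lemmas and keywords) combined by Boolean operators. The dictionaries `concept_map` and `lemma_map` stand in for the mapping between source and target Sensigrafos. Two further behaviours are hypothetical heuristics: keywords are copied unchanged, and the confidence score is the share of leaves that were mapped or kept. None of the classes or sense IDs reflect COGITO's actual rule syntax.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Concept:
    sense_id: str  # a word sense in the source or target Sensigrafo


@dataclass(frozen=True)
class Lemma:
    text: str


@dataclass(frozen=True)
class Keyword:
    text: str  # literal string, assumed language-independent here


@dataclass(frozen=True)
class Op:
    name: str  # e.g. "AND", "OR"
    children: Tuple["Node", ...]


Node = Union[Concept, Lemma, Keyword, Op]


def _translate(node: Node, concept_map: Dict[str, str],
               lemma_map: Dict[str, str], unmapped: List[Node]) -> Node:
    """Replicate the tree, replacing source-language leaves with target ones."""
    if isinstance(node, Op):
        # Operators are kept; their children are translated recursively
        return Op(node.name, tuple(
            _translate(c, concept_map, lemma_map, unmapped) for c in node.children))
    if isinstance(node, Concept):
        target = concept_map.get(node.sense_id)
        if target is None:
            unmapped.append(node)  # no concept mapping: keep node, flag it
            return node
        return Concept(target)
    if isinstance(node, Lemma):
        target = lemma_map.get(node.text)
        if target is None:
            unmapped.append(node)
            return node
        return Lemma(target)
    return node  # Keyword: copied unchanged (assumption)


def _count_leaves(node: Node) -> int:
    if isinstance(node, Op):
        return sum(_count_leaves(c) for c in node.children)
    return 1


def translate_rule(rule: Node, concept_map: Dict[str, str],
                   lemma_map: Dict[str, str]) -> Tuple[Node, float]:
    """Return the translated AST and a naive confidence in [0, 1]."""
    unmapped: List[Node] = []
    translated = _translate(rule, concept_map, lemma_map, unmapped)
    confidence = 1.0 - len(unmapped) / max(1, _count_leaves(rule))
    return translated, confidence


# Usage: "concept banca#1 AND (lemma prestito OR keyword BCE)", IT -> ES
rule = Op("AND", (Concept("it:banca#1"),
                  Op("OR", (Lemma("prestito"), Keyword("BCE")))))
translated, conf = translate_rule(rule, {"it:banca#1": "es:banco#1"}, {})
# conf is about 0.67 here, because the lemma "prestito" had no mapping
```

Unmapped nodes are kept in place rather than dropped, so the rule stays structurally intact for a knowledge engineer to fix.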

  8. Word and Sense Embeddings
     • Suggest missing links between Sensigrafos, using context for word sense disambiguation
     • Builds on MT work to infer missing dictionary entries (Mikolov et al.)
     • Learn monolingual models and a linear projection between them
     • The learnt relations display several degrees of relatedness with different confidence values, e.g. equivalence, similarity, co-occurrence, etc.
     • Pleno (ES) -> full, plenary, partsession, Hortefeux, approve, summarize (EN)
     • Tokenized, lemmatized and normalized the EUROPARL parallel corpora using COGITO
     • Skip-gram model, window size 10, vector dimensionality 400
     • Linear projection learnt from a dictionary with the 5,000 most frequent terms in the source language and their MT equivalents in the target language
     • Translation matrix code in Java available on GitHub: https://github.com/josemanuelgp/word2vec_vector-translation-java
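The linear projection between the two monolingual embedding spaces can be sketched as follows. The sketch assumes the seed dictionary has already been turned into row-aligned matrices: `X` holds the source vectors and `Z` the vectors of their target-language equivalents. The translation matrix is obtained here in closed form with NumPy's least-squares solver, and the linked Java code may train it differently. Candidate translations are the target words whose vectors have the highest cosine similarity with the projected vector. That similarity is one natural candidate for the confidence value mentioned above. The function names are illustrative.

```python
import numpy as np


def learn_projection(X: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Find W minimising ||X @ W - Z||^2.

    X: (n_pairs, d_src) source-language vectors of the seed dictionary
    Z: (n_pairs, d_tgt) vectors of their target-language equivalents
    """
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return W


def translate_vector(x, W, target_vocab, target_matrix, k=5):
    """Return the k target words closest (cosine) to the projected vector."""
    z = x @ W
    norms = np.linalg.norm(target_matrix, axis=1) * np.linalg.norm(z)
    sims = (target_matrix @ z) / np.where(norms == 0, 1.0, norms)
    best = np.argsort(-sims)[:k]
    return [(target_vocab[i], float(sims[i])) for i in best]


# Synthetic check: with 5,000 seed pairs and 400-d vectors, X and Z would
# be 5000 x 400; tiny random matrices are used here instead.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(4, 3))
X = rng.normal(size=(50, 4))
W = learn_projection(X, X @ W_true)
```

Returning several ranked neighbours rather than one reflects the slide's observation. Projected words land near true equivalents but also near similar or merely co-occurring terms, as in the *Pleno* example.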

  9. Rule Learning
     • Automatically bootstrap a rule base starting from a targeted …
     • Focus on beginner's rules rather than perfect rules
     • Two main approaches, based on tf-idf and decision trees
       ✓ Precision >34%
       ✓ Recall >65%
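The slides name tf-idf as one basis for bootstrapping beginner's rules but give no details. The sketch below shows one simple variant and assumes labelled example documents are available for each category. All documents of a category are pooled into one pseudo-document. The highest-scoring terms become candidate keyword conditions for a knowledge engineer to refine. Terms that occur in every category get an idf of zero and are dropped. The decision-tree approach is not shown, and `candidate_rule_terms` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, List


def candidate_rule_terms(docs_by_category: Dict[str, List[str]],
                         top_n: int = 5) -> Dict[str, List[str]]:
    """Rank terms per category by tf-idf over category pseudo-documents."""
    # Pool each category's documents into one bag of lowercase tokens
    bags = {cat: Counter(tok for doc in docs for tok in doc.lower().split())
            for cat, docs in docs_by_category.items()}
    n_categories = len(bags)

    # Document frequency: in how many categories each term appears
    df = Counter()
    for bag in bags.values():
        df.update(bag.keys())

    rules = {}
    for cat, bag in bags.items():
        total = sum(bag.values())
        scores = {term: (count / total) * math.log(n_categories / df[term])
                  for term, count in bag.items()}
        # Highest score first; ties broken alphabetically for stable output
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        rules[cat] = [t for t in ranked[:top_n] if scores[t] > 0]
    return rules
```

Such term lists are deliberately rough, in the spirit of "beginner's rules rather than perfect rules". They trade precision for recall, leaving the knowledge engineer to prune false positives.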

  10. Come see our poster!
