MOLTO: Multilingual On-Line Translation



SLIDE 1

FP7-247914

MOLTO: Multilingual On-Line Translation

non multa, sed multum

MOLTO Consortium

SLIDE 2

Project summary

MOLTO’s goal is to develop a set of tools for translating texts between multiple languages in real time with high quality. Languages are separate modules in the tool and can be varied; prototypes covering a majority of the EU’s 23 official languages will be built.

SLIDE 3

Consortium

SLIDE 4

How much?

• Total: 3,000,000 EUR, EC contribution 2,375,000 EUR
• 90% for work (390 person months)
• 1 March 2010 – 28 February 2013

Management 4%, RTD 86%, Dissemination 10%

SLIDE 5

What’s new

               Google/Babelfish    MOLTO
target user    consumer            producer
input          unpredictable       predictable
coverage       unlimited           limited
quality        browsing            publishing

SLIDE 6

Translation directions

Statistical methods work best when translating into English
• rigid word order
• simple morphology

Grammar-based methods work equally well for different languages
• German word order
• Finnish cases

SLIDE 7

MOLTO domains

• Mathematical exercises (WebALT)
• Biomedical and pharmaceutical patents
• Museum object descriptions

SLIDE 8

More potential uses

• Wikipedia articles
• E-commerce sites
• Medical treatment recommendations
• Tourist phrasebooks
• Social media
• SMS

SLIDE 9

MOLTO technologies

• OWL Ontologies
• Statistical Machine Translation
• GF (grammaticalframework.org)

SLIDE 10

GF - Grammatical Framework

Core of MOLTO is a multilingual GF grammar:

• meaning-preserving translation by composition of parsing and generation
• abstract syntax as interlingua
• RGL, GF Resource Grammar Library, for inflectional morphology and syntactic combination functions of 16 languages
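The idea of translation as parsing composed with generation can be sketched in a few lines of Python. This is a toy illustration only, not GF itself: the functions `linearize`, `parse` and `translate`, the template tables, and the brute-force parser are all assumptions made for the example, and the real RGL handles morphology and word order far more generally.

```python
# Toy illustration only: real GF grammars and the RGL are far richer than this.
from itertools import product

# Abstract syntax trees are nested tuples, e.g. ("Even", ("Num", 2)).
# Each language has one linearization template per abstract function.
CONCRETE = {
    "Eng": {"Even": "{0} is even", "Odd": "{0} is odd",
            "Gt": "{0} is greater than {1}"},
    "Ger": {"Even": "{0} ist gerade", "Odd": "{0} ist ungerade",
            "Gt": "{0} ist größer als {1}"},
}
ARITY = {"Even": 1, "Odd": 1, "Gt": 2}


def linearize(tree, lang):
    """Generation: abstract tree -> string in the given language."""
    fun, *args = tree
    if fun == "Num":
        return str(args[0])
    return CONCRETE[lang][fun].format(*(linearize(a, lang) for a in args))


def parse(text, lang, numbers=range(10)):
    """Parsing by brute force: every small tree whose linearization is `text`."""
    atoms = [("Num", n) for n in numbers]
    candidates = (
        (fun, *args)
        for fun, arity in ARITY.items()
        for args in product(atoms, repeat=arity)
    )
    return [tree for tree in candidates if linearize(tree, lang) == text]


def translate(text, src, tgt):
    """Translation = parse in the source language, then linearize in the target."""
    return [linearize(tree, tgt) for tree in parse(text, src)]


print(translate("3 is greater than 0", "Eng", "Ger"))
```

Because both directions go through the same abstract tree, adding a language in this sketch means adding one more template table, not one more translation pair.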

SLIDE 11

Abstract Syntax

MOLTO Languages

SLIDE 12

Domain-specific interlinguas

The abstract syntax must be formally specified, well-understood

• semantic model for translation
• fixed word senses
• proper idioms

e.g. a mathematical theory, an ontology

SLIDE 13

Challenge

Grammar tools

Scale up production of domain interpreters

• 100’s of words; GF experts; months hand-crafting a grammar
• 1000’s of words; domain experts & translators; days translating a set of examples

SLIDE 14

Mathematics

Abstract syntax

  Nat : Set
  Even : Exp -> Prop
  Odd : Exp -> Prop
  Gt : Exp -> Exp -> Prop
  Sum : Exp -> Exp

English concrete syntax (by examples)

  Nat = "number"
  Even x = "x is even"
  Odd x = "x is odd"
  Gt x y = "x is greater than y"
  Sum x = "the sum of x"
  ...
  every even number that is greater than 0 is the sum of two odd numbers

German concrete syntax (by examples)

  Nat = "Zahl"
  Even x = "x ist gerade"
  Odd x = "x ist ungerade"
  Gt x y = "x ist größer als y"
  Sum x = "die Summe von x"
  ...
  jede gerade Zahl, die größer als 0 ist, ist die Summe von zwei ungeraden Zahlen

Grammar generalization
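A minimal Python sketch of the "by examples" idea: each example line is read as a template whose variable names are placeholders. The helper `compile_examples` is hypothetical, and plain token substitution is an assumption of this sketch. It does not perform the grammar generalization step named on the slide, which would also have to produce agreement and word order as in "jede gerade Zahl, die größer als 0 ist".

```python
import re

EXAMPLES_ENG = '''
Nat = "number"
Even x = "x is even"
Gt x y = "x is greater than y"
Sum x = "the sum of x"
'''

EXAMPLES_GER = '''
Nat = "Zahl"
Even x = "x ist gerade"
Gt x y = "x ist größer als y"
Sum x = "die Summe von x"
'''


def compile_examples(spec):
    """Turn lines like  Gt x y = "x is greater than y"  into linearization functions."""
    rules = {}
    for line in spec.strip().splitlines():
        head, example = re.fullmatch(r'\s*(.+?)\s*=\s*"(.*)"\s*', line).groups()
        fun, *params = head.split()
        tokens = example.split()

        # Default arguments freeze params/tokens for this particular rule.
        def lin(*args, params=params, tokens=tokens):
            if len(args) != len(params):
                raise TypeError(f"expected {len(params)} argument(s)")
            binding = dict(zip(params, args))
            # Replace whole tokens only, so the "x" inside a word is never touched.
            return " ".join(binding.get(tok, tok) for tok in tokens)

        rules[fun] = lin
    return rules


eng = compile_examples(EXAMPLES_ENG)
ger = compile_examples(EXAMPLES_GER)
print(eng["Gt"](eng["Sum"]("two odd numbers"), "0"))
print(ger["Gt"](ger["Sum"]("zwei ungeraden Zahlen"), "0"))
```

The German line comes out with main-clause word order only, which shows why examples alone are not enough and a generalization step is still needed.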

SLIDE 15

Translator’s tools

• text input + prediction
• syntax editor for modification
• disambiguation
• on the fly extension
• normal workflows: API for plug-ins in standard tools, web, mobile phones...

SLIDE 16

Authoring: document edits

SLIDE 17

Authoring: document edits

Chère Madame X, j’ai l’honneur de vous informer que vous avez été promue chargée de projet. Avec mes salutations distinguées, le président.

SLIDE 18

Authoring: document edits

Chère Monsieur Y, j’ai l’honneur de vous informer que vous avez été promue chargée de projet. Avec mes salutations distinguées, le président.

Madame X → Monsieur Y

SLIDE 19

Authoring: syntax edits

Chère Madame X, j’ai l’honneur de vous informer que vous avez été promue chargée de projet. Avec mes salutations distinguées, le président.

Cher Monsieur Y, j’ai l’honneur de vous informer que vous avez été promu chargé de projet. Avec mes salutations distinguées, le président.

Letter (Dear (Mrs "X")) (Honour (Promote ProjectManager)) (Formal President)
Letter (Dear (Mr "Y")) (Honour (Promote ProjectManager)) (Formal President)

Mrs X → Mr Y
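Why a syntax edit keeps the letter grammatical can be sketched in Python: the edit changes one subtree, and every gender-dependent word is chosen again when the tree is linearized. The lookup tables and the functions `linearize_fre` and `replace_addressee` are hypothetical stand-ins for a real French concrete syntax, assumed here only to reproduce the two letters on this slide.

```python
# Hypothetical mini-grammar for the promotion letter. Gender is a feature of the
# addressee, and every agreeing word is picked from it at linearization time.
ADDRESSEE = {"Mrs": ("Madame", "f"), "Mr": ("Monsieur", "m")}
DEAR = {"f": "Chère", "m": "Cher"}
PROMOTED = {"f": "promue", "m": "promu"}
POSITION = {"ProjectManager": {"f": "chargée de projet", "m": "chargé de projet"}}
SENDER = {"President": "le président"}


def linearize_fre(tree):
    """Linearize a tree of the shape Letter (Dear (T n)) (Honour (Promote p)) (Formal s)."""
    _, (_, (title, name)), (_, (_, position)), (_, sender) = tree
    noun, gender = ADDRESSEE[title]
    return (
        f"{DEAR[gender]} {noun} {name}, j’ai l’honneur de vous informer que "
        f"vous avez été {PROMOTED[gender]} {POSITION[position][gender]}. "
        f"Avec mes salutations distinguées, {SENDER[sender]}."
    )


def replace_addressee(tree, title, name):
    """Syntax edit: swap the addressee subtree and leave the rest of the tree alone."""
    letter, _, body, closing = tree
    return (letter, ("Dear", (title, name)), body, closing)


letter_x = ("Letter", ("Dear", ("Mrs", "X")),
            ("Honour", ("Promote", "ProjectManager")), ("Formal", "President"))
letter_y = replace_addressee(letter_x, "Mr", "Y")
print(linearize_fre(letter_x))
print(linearize_fre(letter_y))
```

A document edit, by contrast, would replace only the string "Madame X" and leave "Chère", "promue" and "chargée" in the feminine.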

SLIDE 20

Statistical Machine Translation

Main goal: improve robustness of raw GF on a quasi-open domain by statistical machine translation

SLIDE 21

Challenge

Robustness & statistics

• Statistical Machine Translation as fall-back
• Hybrid systems
• Learning of GF grammars by statistics
• Improving SMT by grammars

SLIDE 22

Models of hybrid MT systems

• baseline: cascade of independent MT systems;
• hard integration: GF partial output is fixed in a regular SMT decoding;
• soft integration I: GF partial output, as phrase pairs, is integrated as a discriminative probability feature model in a phrase-based SMT system;
• soft integration II: GF partial output, as tree fragment pairs, is integrated as a discriminative probability model in a syntax-based SMT system.
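One possible reading of the baseline model, a cascade in which the grammar-based system is tried first and SMT serves as the fall-back, can be sketched as follows. Everything here is an assumption for illustration: `cascade`, `toy_grammar_mt` and `toy_statistical_mt` are hypothetical names, and the word-by-word lookup merely stands in for a real SMT decoder. The hard and soft integration models are not covered.

```python
from typing import Callable, Optional

# A translator returns None when it cannot handle the sentence.
Translator = Callable[[str], Optional[str]]


def cascade(*systems: Translator) -> Callable[[str], str]:
    """Baseline hybrid: try independent systems in order, keep the first answer."""
    def translate(sentence: str) -> str:
        for system in systems:
            result = system(sentence)
            if result is not None:
                return result
        raise ValueError(f"no system could translate: {sentence!r}")
    return translate


def toy_grammar_mt(sentence: str) -> Optional[str]:
    """Stand-in for grammar-based MT: precise, but only inside its small domain."""
    in_domain = {"2 is even": "2 ist gerade", "3 is odd": "3 ist ungerade"}
    return in_domain.get(sentence)


def toy_statistical_mt(sentence: str) -> Optional[str]:
    """Stand-in for SMT: word-by-word lookup, robust but with no quality guarantee."""
    lexicon = {"the": "die", "number": "Zahl", "is": "ist", "odd": "ungerade"}
    return " ".join(lexicon.get(word, word) for word in sentence.split())


hybrid = cascade(toy_grammar_mt, toy_statistical_mt)
print(hybrid("2 is even"))          # handled by the grammar stand-in
print(hybrid("the number is odd"))  # falls back to the statistical stand-in
```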

SLIDE 23

Innovation: OWL interoperability

OWL as a way to specify interlinguas:

• 2-way transformation ontology-grammar
• Web pages with ontologies... will soon be equipped with translation systems
• Natural language search and inference

SLIDE 24

NL Knowledge Management

The MOLTO infrastructure will

• semi-automatically create abstract grammars from ontologies;
• derive ontologies from grammars;
• retrieve instance level knowledge from/in NL by transforming queries to semantic queries, and by expressing the knowledge in NL.

SLIDE 25

OWL↔Grammar (sketch)

Class(pp:Nat ...)                                     cat Nat
ObjectProperty(pp:Odd domain(pp:Nat))                 fun Odd: Nat->Prop
ObjectProperty(pp:Gt domain(pp:Nat) range(pp:Nat))    fun Gt: Nat->Nat->Prop
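The ontology-to-grammar direction of this sketch can be written out in Python. The classes `OwlClass` and `ObjectProperty` and the function `to_gf` are hypothetical, simplified records and not the API of any OWL library. The mapping rule (class to `cat`, property to `fun` ending in `Prop`) is taken from the three rows above and assumed to hold only for axioms of that shape.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class OwlClass:
    """Simplified stand-in for an OWL Class axiom."""
    name: str


@dataclass(frozen=True)
class ObjectProperty:
    """Simplified stand-in for an OWL ObjectProperty with a domain and optional range."""
    name: str
    domain: str
    range_: Optional[str] = None


def to_gf(axiom: Union[OwlClass, ObjectProperty]) -> str:
    """Map one axiom to one GF abstract syntax judgement, following the slide."""
    if isinstance(axiom, OwlClass):
        return f"cat {axiom.name}"
    argument_types = [axiom.domain]
    if axiom.range_ is not None:
        argument_types.append(axiom.range_)
    return f"fun {axiom.name}: " + "->".join(argument_types + ["Prop"])


ontology = [
    OwlClass("Nat"),
    ObjectProperty("Odd", domain="Nat"),
    ObjectProperty("Gt", domain="Nat", range_="Nat"),
]
for axiom in ontology:
    print(to_gf(axiom))
```

The reverse direction, deriving an ontology from a grammar, would read the `cat` and `fun` lines back into such records.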

SLIDE 26

First results

• Online Demo, Jun 2010 at molto-project.eu
• Knowledge Representation Infrastructure, Nov 2010
• GF Grammar Compiler API, Mar 2011