Multilingual Verbalisation of Modular Ontologies using GF and lemon

SLIDE 1

Multilingual Verbalisation of Modular Ontologies using GF and lemon

Brian Davis, Ramona Enache, Jeroen van Grondelle and Laurette Pretorius
CNL 2012, August 29, 2012

SLIDE 2

Structure

WHY? Be Informed use case as context
  • The meta-model/model separation: meta-model semantics
WHAT? Verbalisation and …
  • Modularisation
  • Label variants and their manipulation
  • Multilingualism
  • lemon-GF mapping
HOW? Achieving these four aspects in GF (and lemon)
WHAT NEXT? Ideas about future work

SLIDE 3

Context: Be Informed Business Processes Platform

Challenges:
  • Adoption of ontologies -> new audiences (knowledge engineers and ontologists, business users, end users, etc.) -> access via verbalisation in multiple languages
  • Dealing with complexity -> many constraints; changing rules; contextual rules, e.g. customer, time, …; rules from many sources that may cause conflict and overlap

Be Informed Business Process Ontology:
  • Captures all relevant activities, artifacts, involved roles etc. and the relations between these in a modularised way:
  – a meta-model, using pre- and post-condition semantics, and
  – models of specific business process applications

Verbalisation: based on pattern sentences

SLIDE 4

MOLTO is funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement FP7-ICT-247914.

Context: Be Informed Use Case

  • Be Informed offers ontology-driven support throughout the policy lifecycle
  – Business processes, products and decisions, registrations, interaction
  – Drafting, choosing, communicating, executing, evaluating
  • Multilingual, because
  – Customers are offering multilingual services (for example: immigration, Dutch government in the Caribbean)
  – Customers are sharing international models (for example: Europe, emission trading)
  – Be Informed is developing international business
  • Natural language, because
  – New audiences for ontologies (domain experts, policy makers, citizens)
  – Models lead to specification, documentation, case documents and letters

SLIDE 5

Remaining Challenges

CNL 2010: Van Grondelle, Heller, Grijzen, Spreeuwenberg

  • “How to prevent large numbers of patterns
  – Language variations, e.g. inflectional morphology: plurals, …
  – Other natural languages
  • Mathematical expressions
  • Named things vs anonymous things
  • Extending/relating CNLs like we extend/relate meta-models”

CNL 2012: GF, lemon, multilingualism, label variants, modularisation

SLIDE 6

Meta-model

SLIDE 7

Model: Grant application process

SLIDE 8

Pattern Sentences

Van Grondelle, J.C., Gülpers, M.: Specifying flexible business processes using pre and post conditions. In PoEM, Volume 92 of Lecture Notes in Business Information Processing, Springer (2011) 38–51:

  1. The activity PUBLISHING THE RESULT may be performed if
     (a) a document of type DOCUMENT WITH DETAILS is available.
  2. The activity PUBLISHING THE RESULT is completed if
     (a) a document of type SUBMISSION FORM has been created.

Drawbacks:
  • Clumsy grammar and lack of fluency
  • Non-scalability in terms of the number of supported languages
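To make the two drawbacks concrete, here is a minimal Python sketch of the template approach, assuming labels are simply upper-cased and pasted into fixed English strings. The template texts are copied from the two pattern sentences above; the dictionary, the function and the relation name `creates` are hypothetical, not Be Informed's actual implementation.

```python
# Naive pattern-sentence verbaliser (hypothetical sketch).
# One hand-written English template per relation; labels are pasted in verbatim.
TEMPLATES = {
    "requires_available": (
        "The activity {activity} may be performed if\n"
        "(a) a document of type {document} is available."
    ),
    "creates": (  # hypothetical relation name for the post-condition
        "The activity {activity} is completed if\n"
        "(a) a document of type {document} has been created."
    ),
}


def verbalise(relation: str, activity: str, document: str) -> str:
    """Fill the template for `relation`, upper-casing the labels as the slide does."""
    return TEMPLATES[relation].format(activity=activity.upper(), document=document.upper())


print(verbalise("requires_available", "Publishing the result", "Document with details"))
```

Supporting Dutch would require a second, hand-maintained copy of every template, and because labels are inserted unchanged nothing can inflect them or adapt word order around them.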

SLIDE 9

Modularisation

| Meta-model | Models |
|---|---|
| Concepts and relations chosen based on consensus | Individual parties (no consensus) |
| Determined once, fixed | Introduced over time, frequent changes |
| Ontology formalism | Various information sources/formalisms/styles |
| Created by knowledge engineers | Created by a wide range of people |
| BI default meta-models (stable) | Resource-intensive development (changes/updates) |
| Lexicalisation and verbalisation: | |
| Labels follow the ontology according to guidelines (e.g. case, activity, etc.) | Labels exhibit large variation |
| Complexity at lexical and syntax/grammatical levels (pattern sentences) | Complexity at lexical level |

SLIDE 10

Label variants

Sources of variation

Non-linguistic

  • Different styles of choosing labels
  • Different backgrounds of the people involved in modelling
  • Trade-off: guidelines and standards for systematically choosing good labels vs. industry adoptability and robustness in practice
  • Multilingual contexts

Linguistic

  • Concepts referred to by a proper name, noun or compound noun (term), e.g. “Intake” or “Equality principle”
  • Concepts referred to by a description, in the form of a proposition or in a verb-oriented style, e.g. “Publishing the result”, “Publish the result” or “The result is published”

SLIDE 11

Label manipulation

Manipulation:

  • While allowing some freedom of choice in label selection, the verbalisation of label variants that refer to the same concept in the ontology should be unique
  • Verbalisation of triples with complex labels according to the required sentence patterns, with increased fluency (L, C and N)
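One way to read the uniqueness requirement: map every label variant to a shared analysis first, then verbalise from the analysis rather than from the raw string. The Python sketch below does this with string heuristics and a hypothetical two-verb mini-lexicon (`VERB_FORMS`); it is only an illustration, not the lemon-based label analysis the slides refer to, and the choice of the nominalised form as the canonical output is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-lexicon: verb lemma -> (gerund, past participle).
VERB_FORMS = {
    "publish": ("publishing", "published"),
    "submit": ("submitting", "submitted"),
}


@dataclass(frozen=True)
class LabelAnalysis:
    verb: Optional[str]  # verb lemma, or None for term-style labels
    obj: str             # the noun phrase the label is about


def analyse(label: str) -> LabelAnalysis:
    """Map gerund, imperative and passive label variants to one analysis."""
    words = label.lower().split()
    for lemma, (gerund, participle) in VERB_FORMS.items():
        if words[0] in (lemma, gerund):  # "Publish the result", "Publishing the result"
            return LabelAnalysis(lemma, " ".join(words[1:]))
        if words[-2:] in (["is", participle], ["are", participle]):  # "The result is published"
            return LabelAnalysis(lemma, " ".join(words[:-2]))
    return LabelAnalysis(None, " ".join(words))  # term label, e.g. "Intake"


def verbalise(label: str) -> str:
    """Produce one canonical (nominalised) verbalisation per analysis."""
    analysis = analyse(label)
    if analysis.verb is None:
        return "the " + analysis.obj
    gerund, _ = VERB_FORMS[analysis.verb]
    return f"{gerund} {analysis.obj}"


variants = ["Publishing the result", "Publish the result", "The result is published"]
print({verbalise(v) for v in variants})  # a single canonical form for all three variants
```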

SLIDE 12

Requirements for ‘ontology-lexicon’ model

– Represent linguistic information relative to the ontology

  • Avoid unnecessary ambiguities by representing only the lexical features relevant to the semantics of the underlying application

– Keep semantics separate from linguistic information

  • Clearly separate ‘world’ knowledge (properties of objects referred to by words) from ‘word’ knowledge (properties of words)

– Modular, minimal design

  • Provide a simple core model that can easily be extended as needed

Lexicon model for ontologies: ‘lemon’

– General model for formalising lexical features relative to independently defined ontological semantics
– http://www.monnet-project.eu/lemon
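The world/word separation can be pictured with a small data model. The Python sketch below loosely mirrors lemon's core idea (a lexical entry with a written form that points to an ontology entity through its sense) but it is not lemon's RDF vocabulary: the class, its field names and the example.org URIs are hypothetical. The Dutch labels are taken from the example later in this deck.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LexicalEntry:
    # "word" knowledge: how the entry is written, in which language, which part of speech
    written_rep: str
    language: str
    part_of_speech: str
    # "world" knowledge is only referenced, never copied: the URI of the ontology entity
    reference: str


INTAKE = "http://example.org/bp#Intake"                  # hypothetical ontology URIs
SUBMISSION_FORM = "http://example.org/bp#SubmissionForm"

LEXICON = [
    LexicalEntry("intake", "en", "noun", INTAKE),
    LexicalEntry("inname", "nl", "noun", INTAKE),
    LexicalEntry("submission form", "en", "noun", SUBMISSION_FORM),
    LexicalEntry("aanvraagformulier", "nl", "noun", SUBMISSION_FORM),
]


def labels_for(reference: str, language: str) -> List[str]:
    """All written forms that lexicalise `reference` in `language`."""
    return [e.written_rep for e in LEXICON if e.reference == reference and e.language == language]


print(labels_for(SUBMISSION_FORM, "nl"))
```

Adding a language in this model only appends entries; the ontology URIs, and hence the semantics, stay untouched.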


SLIDE 13

lemon: Overview

SLIDE 14

Verbalisation

Ontology verbalisation exploits the complementary strengths of GF and lemon (modularisation, mapping …):
  • GF: captures ontological information as well as the required sentence structure for multiple languages
  • lemon: provides concrete label information in multiple languages

Specification of BI business processes in terms of pre- and post-conditions requires verbalisation of such conditions, in accordance with the sentence patterns:
  • Concept labels are to be verbalised as propositional statements.
  • Triples (activities with pre-conditions and/or post-conditions) are verbalised as conditional statements (“A if B”), where A and B are simple propositional statements with modalities, as appropriate.
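The “A if B” scheme composes two propositions, the first optionally carrying a modality. A minimal Python sketch, assuming each proposition is already reduced to a subject string and a verb phrase string (the class and function names are hypothetical); the example sentence is the English output shown on the Example slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Proposition:
    subject: str
    verb_phrase: str             # base form if a modal is present, finite form otherwise
    modal: Optional[str] = None  # e.g. "may" for a pre-condition

    def render(self) -> str:
        parts = [self.subject, self.modal, self.verb_phrase]
        return " ".join(p for p in parts if p)


def conditional(a: Proposition, b: Proposition) -> str:
    """Verbalise a triple as the conditional statement "A if B"."""
    sentence = f"{a.render()}, if {b.render()}."
    return sentence[0].upper() + sentence[1:]


print(conditional(
    Proposition("the intake", "be completed", modal="may"),
    Proposition("the submission form", "is available"),
))
```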

SLIDE 15

Verbalising the triple

(Activity, Requires_Available, Artifact subtyped as Document)

SLIDE 16

Grammatical Framework (GF) in a nutshell

  • Grammar formalism
  • GF grammar = abstract syntax + concrete syntaxes
  • Mainly used for multilingual natural language applications describing limited domains
  • Resource library: basic syntactic constructions for 30 languages, usable as a software library
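The abstract/concrete split can be illustrated with a toy: one language-neutral tree, and one table of linearisation rules per language. This is a Python analogy, not GF (real GF grammars have typed categories, records and parameters, and can parse as well as linearise); the function names `submission_form` and `is_available` are hypothetical.

```python
from typing import Callable, Dict, Tuple

Tree = Tuple  # (function name, *subtrees)
Concrete = Dict[str, Callable[..., str]]

# Concrete syntaxes: one linearisation rule per abstract function.
ENG: Concrete = {
    "submission_form": lambda: "the submission form",
    "is_available": lambda x: f"{x} is available",
}
DUT: Concrete = {
    "submission_form": lambda: "het aanvraagformulier",
    "is_available": lambda x: f"{x} is beschikbaar",
}


def linearize(tree: Tree, concrete: Concrete) -> str:
    """Linearise bottom-up: subtrees first, then the rule for the root function."""
    fun, *args = tree
    return concrete[fun](*(linearize(arg, concrete) for arg in args))


tree = ("is_available", ("submission_form",))  # one abstract tree...
for name, concrete in [("Eng", ENG), ("Dut", DUT)]:
    print(name, linearize(tree, concrete))     # ...rendered in two languages
```

In GF, translation between two languages amounts to parsing with one concrete syntax and linearising the resulting tree with another.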

SLIDE 17

Prior uses of GF

  • Multilingual mathematical exercises (WebAlt)
  • Dialogue systems (TALK)
  • Verbalising ontologies (SUMO, MOLTO)
  • Modelling controlled natural languages (Attempto)

SLIDE 18

Advantages of domain-specific GF grammars

  • Code reuse: new grammars are easily developed using the resource library
  • Accessible without linguistic training
  • Accessible without extensive GF training (example-based)
  • Can model sophisticated aspects of natural language (long-distance dependencies, discontinuous constituents, clitics)
  • Defines a translation system for any pair of languages that have a concrete syntax

SLIDE 19

The problem

  • Verbalise a business model for English and Dutch

SLIDE 20

The problem

SLIDE 21

The problem

Separate the concepts into:

  • T-Box: basic architecture of the meta-model => general patterns for verbalisation
  • A-Box: instances of the concepts from the T-Box, particular cases of the meta-model
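The payoff of the separation is that patterns are written once per relation and language, while instances are just labels. The Python sketch below uses plain string templates as a stand-in for GF concrete syntax (GF does not work this way internally); all keys and names are hypothetical, and the English and Dutch sentences are those of the Example slide.

```python
# T-Box: one verbalisation pattern per relation and language, written once.
TBOX_PATTERNS = {
    "requires_available": {
        "eng": "{activity} may be completed, if {artifact} is available.",
        "dut": "{activity} kan worden afgerond, als {artifact} beschikbaar is.",
    },
}

# A-Box: instance labels per language; adding instances never touches the patterns.
ABOX_LABELS = {
    "Intake": {"eng": "the intake", "dut": "de inname"},
    "SubmissionForm": {"eng": "the submission form", "dut": "het aanvraagformulier"},
}


def verbalise(triple, lang):
    """Fill the T-Box pattern for the triple's relation with A-Box labels."""
    subject, relation, obj = triple
    text = TBOX_PATTERNS[relation][lang].format(
        activity=ABOX_LABELS[subject][lang], artifact=ABOX_LABELS[obj][lang]
    )
    return text[0].upper() + text[1:]


triple = ("Intake", "requires_available", "SubmissionForm")
print(verbalise(triple, "eng"))
print(verbalise(triple, "dut"))
```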

SLIDE 22

The problem

T-Box

  • Main modelling task
  • Involves specifying the abstract syntax and the concrete syntaxes that define the verbalisation patterns

SLIDE 23

The solution: T-Box

Abstract syntax:

  • Translate the concepts in the model to GF categories
  • Translate the signatures of the functions in the model to GF functions

SLIDE 24

The solution: T-Box

Abstract syntax:

    cat Artifact ; Activity ; Fragment ;
    fun requires_available : Activity -> Artifact -> Fragment ;
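What the abstract syntax buys is type checking: only trees whose arguments have the right categories are well formed. A Python sketch of that check, assuming signatures are stored as (argument categories, result category); `Aintake` and `ApublishingOfResult` come from the A-Box slide, while `AsubmissionForm` is a hypothetical artifact constant.

```python
# Hypothetical mirror of the abstract syntax: function -> (argument categories, result category).
SIGNATURES = {
    "requires_available": (("Activity", "Artifact"), "Fragment"),
    "Aintake": ((), "Activity"),
    "ApublishingOfResult": ((), "Activity"),
    "AsubmissionForm": ((), "Artifact"),  # hypothetical artifact constant
}


def infer(tree: tuple) -> str:
    """Return the category of `tree`, raising TypeError if it is ill-typed."""
    fun, *args = tree
    arg_cats, result = SIGNATURES[fun]
    if len(args) != len(arg_cats):
        raise TypeError(f"{fun} expects {len(arg_cats)} argument(s), got {len(args)}")
    for arg, expected in zip(args, arg_cats):
        actual = infer(arg)
        if actual != expected:
            raise TypeError(f"{fun}: expected {expected}, got {actual}")
    return result


print(infer(("requires_available", ("Aintake",), ("AsubmissionForm",))))  # Fragment
```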

SLIDE 25

The solution: T-Box

  • Concrete syntax:
  • Language specific
  • Maps concepts to basic syntactic categories (NP, S) and to complex (record) categories for higher-quality language generation

SLIDE 26

The solution: T-Box

Concrete syntax:

    lincat Artifact = NP ;
           Activity = {noun : NP ; subj : NP ; vp : VP ; hasVerb : Bool} ;
           Fragment = {subj : NP ; pred : VP ; ext : {s : S ; hasExt : Bool}} ;
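The record types can be mirrored as Python dataclasses to show what each field carries. In this sketch NP, VP and S are plain strings (in GF they are resource-library records), and the `render` function, including the default modal "may", is a hypothetical illustration of how a Fragment might be turned into text, not the authors' code.

```python
from dataclasses import dataclass

# NP, VP and S are plain strings here; in GF they are resource-library records.
NP = VP = S = str


@dataclass(frozen=True)
class Activity:
    noun: NP        # nominal use, e.g. "publishing the result"
    subj: NP        # subject of the clausal use, e.g. "the result"
    vp: VP          # predicate of the clausal use, e.g. "be published"
    has_verb: bool  # False when the label is a plain noun such as "intake"


@dataclass(frozen=True)
class Ext:
    s: S
    has_ext: bool   # whether `s` holds a real subordinate clause


@dataclass(frozen=True)
class Fragment:
    subj: NP
    pred: VP
    ext: Ext


def render(fragment: Fragment, modal: str = "may") -> str:
    """Hypothetical rendering: a modal clause plus an optional "if" clause."""
    main = f"{fragment.subj} {modal} {fragment.pred}"
    return f"{main}, if {fragment.ext.s}" if fragment.ext.has_ext else main


publishing = Activity("publishing the result", "the result", "be published", True)
frag = Fragment(publishing.subj, publishing.vp, Ext("the submission form is available", True))
print(render(frag))
```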
SLIDE 27

The solution: T-Box

Concrete syntax:

  • Define overloaded functions to build the categories

    oper mkFragm = overload {
      mkFragm : NP -> VP -> Fragm =
        \np, vp -> {subj = np ; pred = vp ;
                    ext = {s = dontCareS ; hasExt = False}} ;
      mkFragm : NP -> VP -> S -> Fragm =
        \np, vp, sub -> {subj = np ; pred = vp ;
                         ext = {s = sub ; hasExt = True}}
    } ;
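Python has no compile-time overloading, so a sketch of `mkFragm` can emulate the two variants with an optional argument. `DONT_CARE_S` is a hypothetical stand-in for `dontCareS`. The flag presumably lets later steps decide whether to emit the subordinate clause without inspecting the placeholder string.

```python
from dataclasses import dataclass
from typing import Optional

DONT_CARE_S = ""  # placeholder clause, standing in for dontCareS


@dataclass(frozen=True)
class Ext:
    s: str
    has_ext: bool


@dataclass(frozen=True)
class Fragment:
    subj: str
    pred: str
    ext: Ext


def mk_fragm(np: str, vp: str, sub: Optional[str] = None) -> Fragment:
    """Both mkFragm overloads in one function: with or without a subordinate clause."""
    if sub is None:
        return Fragment(np, vp, Ext(DONT_CARE_S, has_ext=False))
    return Fragment(np, vp, Ext(sub, has_ext=True))


plain = mk_fragm("the result", "be published")
conditional = mk_fragm("the result", "be published", "the submission form is available")
print(plain.ext.has_ext, conditional.ext.has_ext)  # False True
```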
SLIDE 28

The solution: T-Box

Concrete syntax:

    oper mkActivity = overload {
      mkActivity : NP -> Activity =
        \o -> {noun = o ; subj = o ; vp = noVP ; hasVerb = False} ;
      mkActivity : V2 -> NP -> Activity =
        \v, o -> {noun = nominalize (mkVPSlash v) o ;
                  subj = o ; vp = passiveVP v ; hasVerb = True}
    } ;
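The two `mkActivity` variants can be sketched in Python as one function that dispatches on its argument types. The string manipulation only approximates `nominalize` and `passiveVP` for regular English verbs; the `V2` class with its `gerund` and `participle` fields and the `NO_VP` placeholder are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union

NO_VP = ""  # placeholder predicate, standing in for noVP


@dataclass(frozen=True)
class V2:
    """Hypothetical transitive-verb entry holding only the forms this sketch needs."""
    gerund: str
    participle: str


@dataclass(frozen=True)
class Activity:
    noun: str
    subj: str
    vp: str
    has_verb: bool


def mk_activity(first: Union[str, V2], obj: Optional[str] = None) -> Activity:
    """Emulate both mkActivity overloads: a bare noun phrase, or a verb plus its object."""
    if isinstance(first, V2):
        if obj is None:
            raise TypeError("a verb-based activity needs an object noun phrase")
        return Activity(
            noun=f"{first.gerund} {obj}",  # rough analogue of nominalize: "publishing the result"
            subj=obj,                      # the object becomes the passive subject
            vp=f"be {first.participle}",   # rough analogue of passiveVP: "be published"
            has_verb=True,
        )
    if obj is not None:
        raise TypeError("a noun-phrase activity takes no object")
    return Activity(noun=first, subj=first, vp=NO_VP, has_verb=False)


publish_V2 = V2(gerund="publishing", participle="published")
print(mk_activity("the intake"))
print(mk_activity(publish_V2, "the result"))
```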
SLIDE 29

The solution: T-Box

Concrete syntax:

  • Implement the functions as verbalisation patterns

    lin requires_available ac ar =
      mkFragm ac.subj ac.vp (mkS (mkCl ar (mkVP available_A))) ;
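Putting the pieces together, a Python sketch of `requires_available` builds the availability clause and attaches it to the activity's subject and predicate. The slides do not show the final sentence-level step, so `to_sentence`, the modal "may" and the default predicate "be completed" for verb-less activities are assumptions, chosen so that the intake case reproduces the English output on the Example slide.

```python
from dataclasses import dataclass

NO_VP = ""                   # stands in for noVP
DEFAULT_VP = "be completed"  # hypothetical predicate for activities without a verb


@dataclass(frozen=True)
class Activity:
    noun: str
    subj: str
    vp: str
    has_verb: bool


@dataclass(frozen=True)
class Fragment:
    subj: str
    pred: str
    ext_s: str
    has_ext: bool


def available_clause(artifact: str) -> str:
    """Analogue of mkS (mkCl ar (mkVP available_A))."""
    return f"{artifact} is available"


def requires_available(ac: Activity, ar: str) -> Fragment:
    """Analogue of lin requires_available: activity subject and VP plus the clause."""
    return Fragment(ac.subj, ac.vp, available_clause(ar), has_ext=True)


def to_sentence(f: Fragment, modal: str = "may") -> str:
    """Hypothetical final step: add the modal, a default predicate and the 'if' clause."""
    pred = f.pred or DEFAULT_VP
    text = f"{f.subj} {modal} {pred}"
    if f.has_ext:
        text += f", if {f.ext_s}"
    text += "."
    return text[0].upper() + text[1:]


intake = Activity("the intake", "the intake", NO_VP, has_verb=False)
publishing = Activity("publishing the result", "the result", "be published", has_verb=True)
print(to_sentence(requires_available(intake, "the submission form")))
print(to_sentence(requires_available(publishing, "the submission form")))
```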

SLIDE 30

The solution: A-Box

Concrete syntax:

    lin Aintake = mkActivity (mkNP intake_N) ;
    lin ApublishingOfResult = mkActivity publish_V2 (mkNP the_Det result_N) ;
SLIDE 31

The solution: Example

The triple (Activity, Requires_Available, Artifact) from the model generates the following:
  • Eng: The intake may be completed, if the submission form is available.
  • Dut: De inname kan worden afgerond, als het aanvraagformulier beschikbaar is.

SLIDE 32

The solution: Example

SLIDE 33

Demo

Structure of the grammar and a couple of examples

SLIDE 34

Future Work

  • Further exploration of the GF-lemon mapping within MOLTO and Monnet
  – GF generation based on automatic, NLP-based label analysis as offered by the lemon generator
  • Add advanced sentence planning and improve fluency
  • Investigate label choosing practices in the real world
  – and support them in this framework

SLIDE 35

About this work

This work is funded in part by the European Community's Seventh Framework Program (FP7/2007-2013) under Grant Agreement no.: FP7-ICT-248458 and FP7-ICT-247914

Brian Davis, Ramona Enache, Jeroen van Grondelle, Laurette Pretorius