When life gives you lemons, make GF! Inducing grammars from the lexicon-ontology interface


slide-1
SLIDE 1

When life gives you lemons, make GF!

Inducing grammars from the lexicon-ontology interface

Christina Unger

Semantic Computing Group CITEC, Bielefeld University

1 / 41

slide-2
SLIDE 2

In collaboration with: Jeroen van Grondelle, Frank Smit, Jouri Fledderman (Be Informed, The Netherlands)

2 / 41

slide-3
SLIDE 3

Motivation

Conceptually scoped language technology

3 / 41

slide-4
SLIDE 4

Today

Natural language plays an increasingly important role as interface to existing services and data.

4 / 41

slide-9
SLIDE 9

Requirements

1. Alignment of natural language expressions and domain concepts, data or services

   Would I get housing benefits?

   ASK WHERE { :user :eligible "true". }

2. High precision (reliability and predictability)
3. Expertise and time for creating and maintaining grammars (and for porting them across languages or switching domains)
4. Unrestricted coverage

5 / 41

slide-10
SLIDE 10

Conceptually scoped language technology

The underlying application introduces a conceptual scope that determines the language fragment that is relevant and meaningful.

6 / 41

slide-11
SLIDE 11

Goal

CONCEPTUALIZATION

(ONTOLOGY)

LEXICAL INFORMATION

(ONTOLOGY LEXICON)

GRAMMAR

. .

7 / 41

slide-12
SLIDE 12

If life gives you lemons...

The lexicon-ontology interface

8 / 41

slide-13
SLIDE 13

Ontology

Example: Fresh water animals

[Ontology diagram. Classes: Species, Bird, Fish, Crayfish, Perlfish, Heron, Sea Raven, BodyOfWater, Country. Datatypes: Integer, String. Properties: in, livesIn, conservationStatus, pollution, predator.]

9 / 41

slide-14
SLIDE 14

Modelling data w.r.t. an ontology

Example: Chiemsee fish

:Germany rdf:type :Country .

:Chiemsee rdf:type :BodyOfWater ;
          :in :Germany ;
          :pollution 2 .

:ChiemseeCrayfish rdf:type :Crayfish ;
                  :livesIn :Chiemsee ;
                  :conservationStatus "EX" .

:ChiemseePerlfish rdf:type :Perlfish ;
                  :livesIn :Chiemsee ;
                  :predator :Heron, :SeaRaven ;
                  :conservationStatus "EN" .
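Modelling data with respect to an ontology means the data uses only the vocabulary the ontology declares. A minimal sketch of that check, assuming the triples are held as plain Python tuples (the names CLASSES, PROPERTIES, TRIPLES and undeclared_terms are illustrative and not part of the talk):

```python
# Vocabulary taken from the fresh water animals ontology slide.
CLASSES = {"Species", "Bird", "Fish", "Crayfish", "Perlfish", "Heron",
           "SeaRaven", "BodyOfWater", "Country"}
PROPERTIES = {"in", "livesIn", "conservationStatus", "pollution", "predator"}

# The Chiemsee data as (subject, predicate, object) tuples.
TRIPLES = [
    ("Germany", "rdf:type", "Country"),
    ("Chiemsee", "rdf:type", "BodyOfWater"),
    ("Chiemsee", "in", "Germany"),
    ("Chiemsee", "pollution", 2),
    ("ChiemseeCrayfish", "rdf:type", "Crayfish"),
    ("ChiemseeCrayfish", "livesIn", "Chiemsee"),
    ("ChiemseeCrayfish", "conservationStatus", "EX"),
    ("ChiemseePerlfish", "rdf:type", "Perlfish"),
    ("ChiemseePerlfish", "livesIn", "Chiemsee"),
    ("ChiemseePerlfish", "predator", "Heron"),
    ("ChiemseePerlfish", "predator", "SeaRaven"),
    ("ChiemseePerlfish", "conservationStatus", "EN"),
]


def undeclared_terms(triples):
    """Return predicates and rdf:type objects that the ontology does not declare."""
    unknown = set()
    for _subject, predicate, obj in triples:
        if predicate == "rdf:type":
            if obj not in CLASSES:
                unknown.add(obj)
        elif predicate not in PROPERTIES:
            unknown.add(predicate)
    return unknown
```

For the Chiemsee data the check returns an empty set; a triple with a made-up property such as "depth" would be reported.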

10 / 41

slide-15
SLIDE 15

Ontology lexica

Aim: capture rich and structured linguistic information about how ontology elements are lexicalized in a particular language

11 / 41

slide-16
SLIDE 16

Why simple terminological knowledge is not enough

The conceptual granularity of language often does not coincide with that of the schema underlying a particular dataset...

:team
  → to play for (if the subject is any kind of player)
  → to race for (if the subject is a race driver)

...and can also vary across languages.

:eat
  →en eat
  →de essen (if the subject is a human)
  →de fressen (if the subject is an animal)
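The point about granularity can be made concrete with a small lookup that attaches a subject condition to each lexicalization. This is only a sketch: the table layout, the function name lexicalize and the class names "Human" and "Animal" are assumptions made for illustration.

```python
# (property, language) -> list of (verb, required subject class or None).
# None means the verb applies regardless of the subject.
LEXICALIZATIONS = {
    ("eat", "en"): [("eat", None)],
    ("eat", "de"): [("essen", "Human"), ("fressen", "Animal")],
}


def lexicalize(prop, lang, subject_class):
    """Pick the first verb whose subject condition is met, or None."""
    for verb, condition in LEXICALIZATIONS[(prop, lang)]:
        if condition is None or condition == subject_class:
            return verb
    return None
```

A flat term list (one label per property and language) could not express the German distinction, which is why the lexicon needs structured conditions.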

12 / 41

slide-17
SLIDE 17

Why simple terminological knowledge is not enough

Not only lexicalizations of single classes or properties are relevant, but also lexicalizations of complex constructions.

Which fish live in Germany?

[Ontology fragment. Classes: Fish, BodyOfWater, Country. Properties: livesIn, in.]

Which fish are endangered?

[Ontology fragment. Class: Species. Property: conservationStatus. Value: "EN".]
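Both questions correspond to more than a single class or property: "live in Germany" spans the two properties livesIn and in, and "endangered" is the property conservationStatus with the fixed value "EN". A rough sketch over a subset of the Chiemsee data, with hypothetical helper names and without the class filter for "fish" (the subclass hierarchy is not spelled out here):

```python
# Subset of the Chiemsee data as (subject, predicate, object) tuples.
TRIPLES = {
    ("Chiemsee", "in", "Germany"),
    ("ChiemseeCrayfish", "livesIn", "Chiemsee"),
    ("ChiemseeCrayfish", "conservationStatus", "EX"),
    ("ChiemseePerlfish", "livesIn", "Chiemsee"),
    ("ChiemseePerlfish", "conservationStatus", "EN"),
}


def live_in(country):
    """Subjects x such that x livesIn w and w in country (a two-step path)."""
    waters = {s for s, p, o in TRIPLES if p == "in" and o == country}
    return {s for s, p, o in TRIPLES if p == "livesIn" and o in waters}


def endangered():
    """Subjects whose conservationStatus is the literal "EN"."""
    return {s for s, p, o in TRIPLES
            if p == "conservationStatus" and o == "EN"}
```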

13 / 41

slide-18
SLIDE 18

lemon (Lexicon Model for Ontologies)

http://lemon-model.net

meta-model for describing ontology lexica with RDF
declarative, thus abstracting from specific syntactic and semantic theories
separation of lexicon and ontology

Semantics by reference

The meaning of lexical entries is specified by pointing to elements in the ontology.

14 / 41

slide-19
SLIDE 19

The lemon model (core)

[Diagram of the lemon core model. Classes: Lexicon, LexicalEntry, Word, Phrase, Part, LexicalForm, LexicalSense, Ontology. Properties: entry, language:String, form, canonicalForm, otherForm, abstractForm, writtenRep:String, sense, isSenseOf, reference, isReferenceOf, prefRef, altRef, hiddenRef.]

15 / 41

slide-20
SLIDE 20

The lemon model (argument mapping)

[Diagram of the lemon argument mapping. Nodes: Lexical Entry, Frame, Argument, LexicalSense, Ontology, Syntactic Role Marker. Properties: synBehavior, synArg, semArg, subjOfProp, objOfProp, isA, sense, isSenseOf, reference, isReferenceOf, propertyDomain, propertyRange, subsense, marker, context:Resource, definition:Resource, condition:Resource.]

16 / 41

slide-21
SLIDE 21

Example

:Perlfish :predator :Heron . → Herons eat perl fish.

[Diagram of the lexical entry. eat : Word (partOfSpeech=verb), with a canonical form : Form (writtenRep="eat"@en), a sense : LexicalSense with reference <http://example.org/OceanWildlife.owl#predator>, and synBehavior : TransitiveFrame whose two arguments (subject, directObject) are linked to the sense via subjOfProp and objOfProp.]
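The argument mapping is what lets the triple :Perlfish :predator :Heron come out as "Herons eat perl fish" rather than the reverse. A minimal sketch of that mapping, assuming (as the example sentence suggests) that the syntactic subject corresponds to the object of the property and the direct object to its subject; the class and function names are made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitiveEntry:
    written_rep: str        # canonical form of the verb
    reference: str          # ontology property the sense points to
    subject_is: str         # "subjOfProp" or "objOfProp"
    direct_object_is: str   # "subjOfProp" or "objOfProp"


# Assumed mapping, inferred from "Herons eat perl fish".
EAT = TransitiveEntry(
    written_rep="eat",
    reference="http://example.org/OceanWildlife.owl#predator",
    subject_is="objOfProp",
    direct_object_is="subjOfProp",
)


def syntactic_arguments(entry, triple):
    """Map the subject and object of a triple to the frame's syntactic roles."""
    subj, _predicate, obj = triple
    semantic = {"subjOfProp": subj, "objOfProp": obj}
    return {
        "subject": semantic[entry.subject_is],
        "directObject": semantic[entry.direct_object_is],
    }
```

With this entry, the triple (Perlfish, predator, Heron) yields Heron as syntactic subject and Perlfish as direct object of "eat".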

17 / 41

slide-22
SLIDE 22

...make GF!

Mapping ontology lexica to grammars

18 / 41

slide-23
SLIDE 23

Roadmap

Mapping ontology lexica to GF requires capturing

the ontological (semantic) level
the lexical (morpho-syntactic) level

General method:

1. ontology → abstract syntax
2. lexical entries → concrete syntax

19 / 41

slide-24
SLIDE 24

From an ontology to abstract syntax [1]

[1] K. Angelov: The abstract syntax as ontology. GFSS 2009.
    K. Angelov & R. Enache: Typeful Ontologies with Direct Multilingual Verbalization. CNL 2010.

20 / 41

slide-25
SLIDE 25

Ontology to abstract syntax

cat
  Class;
  Individual Class;

  Datatype;
  Literal Datatype;

  Statement;

21 / 41

slide-26
SLIDE 26

Example

[Ontology fragment. Classes: Species, Fish, Bird. Datatype: String. Property: conservationStatus.]

fun
  Species, Fish, Bird : Class;
  String : Datatype;

  ChiemseePerlfish : Individual Fish;

  conservationStatus : Individual Species
                    -> Literal String
                    -> Statement;

  coerce_Fish_to_Species : Individual Fish
                        -> Individual Species;
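The declarations above follow a regular pattern, so they can be generated from a description of the ontology. The following is a hedged sketch of such a generator, not the lemon2gf implementation; the function name and the shape of its arguments are assumptions:

```python
def abstract_syntax(classes, datatypes, individuals,
                    datatype_properties, subclass_of):
    """Render GF `fun` declarations for a small ontology description.

    individuals:          list of (name, class)
    datatype_properties:  list of (property, domain class, datatype)
    subclass_of:          list of (subclass, superclass)
    """
    lines = ["fun"]
    lines.append(f"  {', '.join(classes)} : Class;")
    lines.append(f"  {', '.join(datatypes)} : Datatype;")
    for name, cls in individuals:
        lines.append(f"  {name} : Individual {cls};")
    for prop, domain, datatype in datatype_properties:
        lines.append(
            f"  {prop} : Individual {domain} -> Literal {datatype} -> Statement;"
        )
    # One coercion function per subclass axiom.
    for sub, sup in subclass_of:
        lines.append(
            f"  coerce_{sub}_to_{sup} : Individual {sub} -> Individual {sup};"
        )
    return "\n".join(lines)
```

Called with the classes, the String datatype, the individual ChiemseePerlfish, the property conservationStatus and the axiom Fish ⊑ Species, it reproduces the declarations shown on this slide (on single lines).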

22 / 41

slide-27
SLIDE 27

OWL constructs

Add functions for complex

classes (union, intersection, complement, restriction classes)
properties (inverse properties, property chains)

Example:

:Endangered rdf:type owl:Restriction;
            owl:onProperty onto:conservationStatus ;
            owl:hasValue "EN" .

Things_with_conservationStatus_EN : Class;

23 / 41

slide-28
SLIDE 28

From a lexicon to concrete syntax

24 / 41

slide-29
SLIDE 29

Example

Lexicon:

:ocean_N a lemon:Word ;
  lexinfo:partOfSpeech lexinfo:commonNoun;
  lemon:canonicalForm [ lemon:writtenRep "ocean"@en ];
  lemon:otherForm     [ lemon:writtenRep "oceans"@en;
                        lexinfo:number lexinfo:plural ];
  lemon:sense         [ lemon:reference onto:Ocean ] .

Concrete syntax:

lin Ocean = mkCN (mkN "ocean" "oceans");

25 / 41

slide-30
SLIDE 30

Example

Lexicon:

:sea_N a lemon:Word ;
  lexinfo:partOfSpeech lexinfo:commonNoun;
  lemon:canonicalForm [ lemon:writtenRep "sea"@en ];
  lemon:otherForm     [ lemon:writtenRep "seas"@en;
                        lexinfo:number lexinfo:plural ];
  lemon:sense         [ lemon:reference onto:Ocean ] .

Concrete syntax:

lin Ocean = variants {
  mkCN (mkN "ocean" "oceans");
  mkCN (mkN "sea" "seas")
};
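The step from noun entries to a linearization rule can be sketched as string generation: one entry gives a plain lin rule, several entries for the same reference are wrapped in variants. The helper name noun_lin and its input format (singular and plural written representations) are assumptions for illustration only.

```python
def noun_lin(reference, nouns):
    """Build a GF lin rule for an ontology class.

    nouns: list of (singular, plural) written representations,
           one pair per lexical entry that refers to `reference`.
    """
    terms = [f'mkCN (mkN "{sg}" "{pl}")' for sg, pl in nouns]
    if len(terms) == 1:
        return f"lin {reference} = {terms[0]};"
    # Several entries share the reference: offer them as free variants.
    body = ";\n".join("  " + term for term in terms)
    return f"lin {reference} = variants {{\n{body}\n}};"
```

Passing only the ocean entry gives the rule from the previous slide; passing both ocean and sea gives the variants rule shown here.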

26 / 41

slide-34
SLIDE 34

Linearization categories

lincat
  Individual = NP;
      the Pacific Ocean

  Class = { cn:CN; ap:AP };
      whale (cn), endangered (ap)

  Statement = { np:NP; vp:VP; vpSlash:VPSlash };
      [NP The finback] [VP lives in the Pacific Ocean].
      Which ocean does [NP the finback] [VPSlash live in]?
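Keeping a Statement as a record of parts, rather than as a finished sentence, is what allows the same content to surface both as a declarative and as a question. A toy illustration with plain strings in place of GF categories (the dictionary and both functions are hypothetical):

```python
# A Statement as a record of already linearized parts.
statement = {
    "np": "the finback",
    "vp": "lives in the Pacific Ocean",
    "vpSlash": "live in",
}


def declarative(st):
    """NP followed by the complete VP."""
    sentence = f"{st['np']} {st['vp']}."
    return sentence[0].upper() + sentence[1:]


def which_question(st, noun):
    """Question over the missing object: uses the VPSlash instead of the VP."""
    return f"Which {noun} does {st['np']} {st['vpSlash']}?"
```

In GF the combination is done by resource grammar functions that also handle agreement and word order; the strings here only mirror the bracketing in the examples above.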

27 / 41

slide-35
SLIDE 35

Mapping lexical entries to linearizations

Starting point:

Input lexicon (centered around entries)

:ocean_N lemon:sense [lemon:reference onto:Ocean].
:sea_N   lemon:sense [lemon:reference onto:Ocean].

:eat_V lemon:sense [lemon:reference onto:predator],
                   [lemon:reference onto:prey].

Target grammar (centered around senses)

lin Ocean    = variants { ocean_N; sea_N };
lin predator = eat_V;
lin prey     = eat_V;

28 / 41

slide-37
SLIDE 37

Mapping lexical entries to linearizations

1. Collect all senses that occur in the lexicon (simple or compound), together with all entries that denote this sense.

Example:

:ocean_N lemon:sense [lemon:reference onto:Ocean].
:sea_N   lemon:sense [lemon:reference onto:Ocean].

{ reference: Ocean, entries: [ocean_N, sea_N] }
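This step inverts the lexicon from entry-centered to sense-centered. A small sketch, assuming the lexicon has already been read into (entry, reference) pairs; the names LEXICON and collect_senses are illustrative:

```python
from collections import defaultdict

# (entry, reference) pairs, one per lemon:sense of each entry.
LEXICON = [
    ("ocean_N", "Ocean"),
    ("sea_N", "Ocean"),
    ("eat_V", "predator"),
    ("eat_V", "prey"),
]


def collect_senses(lexicon):
    """Group entries by the ontology element their senses refer to."""
    by_reference = defaultdict(list)
    for entry, reference in lexicon:
        by_reference[reference].append(entry)
    return [
        {"reference": reference, "entries": entries}
        for reference, entries in by_reference.items()
    ]
```

For the pairs above, the first record is the one from the example; eat_V ends up in two records, one for predator and one for prey.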

30 / 41

slide-38
SLIDE 38

Mapping lexical entries to linearizations

2. For each such entry, extract all relevant lexical information:

canonical form
part of speech (e.g. noun, verb)
syntactic frames (with arguments and argument-specific information, such as markers and optionality)
POS-specific information
  noun: gender, singular and plural forms
  verb: present, past, participle, gerund forms
  adjective: positive, comparative, superlative forms

31 / 41

slide-40
SLIDE 40

Mapping lexical entries to linearizations

3. Based on the collected information, for every sense construct a list of lin variants by instantiating a GF template for each frame of each entry lexicalizing that sense.

Example:

lin predator o s = variants {
  { np = s;
    vp = mkVP (mkV2 eat_V) o;
    vpSlash = mkVPSlash (mkV2 eat_V) };
  ... };

oper eat_V = mkV "eat" ...;

33 / 41
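Template instantiation can be pictured as filling a verb operation into a fixed GF record skeleton. The sketch below covers only the transitive frame with one frame per entry, and the template text and function name are assumptions modelled on the example above rather than the actual lemon2gf templates:

```python
# GF record skeleton for a transitive frame; {verb} is the only slot.
# The argument order "o s" follows the example: the object of the
# property comes first and is realized as the direct object.
TRANSITIVE_TEMPLATE = (
    "{{ np = s;\n"
    "    vp = mkVP (mkV2 {verb}) o;\n"
    "    vpSlash = mkVPSlash (mkV2 {verb}) }}"
)


def lin_for_sense(reference, verbs):
    """One variant per verb entry that lexicalizes `reference`."""
    variants = [TRANSITIVE_TEMPLATE.format(verb=verb) for verb in verbs]
    body = ";\n  ".join(variants)
    return f"lin {reference} o s = variants {{\n  {body}\n}};"
```

A fuller version would choose the template by frame type (intransitive, prepositional, and so on) and by which syntactic argument maps to which argument of the property.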

slide-41
SLIDE 41

Architecture

34 / 41

slide-42
SLIDE 42

Domain grammars as grammar modules

35 / 41

slide-43
SLIDE 43

Examples

liveIn (coerce_Perlfish_to_Fish ChiemseePerlfish) (coerce_Lake_to_BodyOfWater Chiemsee)
    the Chiemsee perlfish lives in Lake Chiemsee

liveIn (Most Fish) (Generic BodyOfWater)
    most fish live in bodies of water

neg (liveIn_o_in (Most Fish) Sweden)
    most fish don't live in Sweden

mod Can (predator (All Species) (Generic Species))
    every species can be eaten

predator (That Species) (This Species)
    this species is a predator of that species

36 / 41

slide-44
SLIDE 44

Outlook

An ecosystem for language technology

37 / 41

slide-47
SLIDE 47

Appendix

40 / 41

slide-48
SLIDE 48

Code and resources

lemon2gf (code and documentation)

https://github.com/cunger/lemon2gf

Grammar modules

https://github.com/cunger/grammars

41 / 41