FrameNet CNL: A Knowledge Representation and Information Extraction - - PowerPoint PPT Presentation



slide-1
SLIDE 1

FrameNet CNL: A Knowledge Representation and Information Extraction Language

Guntis Barzdins

Institute of Mathematics and Computer Science, University of Latvia

CNL-2014, 20-22 August 2014, Galway

This research was partially supported by the Project Nr.2DP/2.1.1.1.0/13/APIA/VIAA/014 (ERAF) “Identification of relations in newswire texts and graph visualization of the extracted relation database” under contract Nr. 1/5-2013, LU MII Nr. 3-27.3-5-2013.

slide-2
SLIDE 2

Outline

• FrameNet frames of interest and automatic SRL
– We use 26 frames
• Frame Element filler disambiguation towards the real-world entities
– We disambiguate only Persons and Organizations
– Named Entity Linking (NEL) or Cross Document Coreference (CDC)
• IE & KR system for the Latvian News Agency
– Entire digitized Latvian news-archive (~12M articles) processed
– The method is part of Horizon-2020 ICT-15 proposals
• From MicroEvents to CompositeEvents, n-ary relation extraction from text
• Multilingual Information Extraction, Knowledge visualization

slide-3
SLIDE 3

Information Extraction and CNL

[Diagram labels: FN-CNL; abstract knowledge representation (AKR); Source documents in NL; Text in FN-CNL; information extraction; FN-CNL text generation; FN-CNL parsing]

slide-5
SLIDE 5

Commercial Transaction Frame

• The event known as Commercial Transaction is a small bit of history where a Buyer and a Seller exchange Money and Goods
• The frame brings with it a checklist of the frame elements that have to be part of the event
– Buyer, Seller, Goods, Money
• Frames are language independent (multilinguality)

From C.J.Fillmore’s slides presented at FrameNet MasterClass during TLT8 (2009)
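The "checklist" idea can be sketched as a small data structure. This is only an illustration: the class names `Frame` and `FrameInstance`, the method `missing_elements`, and the identifier `Commercial_transaction` are assumptions, not part of FrameNet or of the system described in these slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Frame:
    """A frame is a name plus its checklist of frame elements."""
    name: str
    elements: tuple


@dataclass
class FrameInstance:
    """One event: a frame plus the fillers found for its elements."""
    frame: Frame
    fillers: dict = field(default_factory=dict)

    def missing_elements(self):
        """Return the checklist items that have no filler yet."""
        return [e for e in self.frame.elements if e not in self.fillers]


# Frame elements as listed on the slide.
COMMERCIAL_TRANSACTION = Frame(
    "Commercial_transaction", ("Buyer", "Seller", "Goods", "Money")
)

# "I bought the laptop for $1100." leaves the Seller unexpressed.
event = FrameInstance(
    COMMERCIAL_TRANSACTION,
    {"Buyer": "I", "Goods": "the laptop", "Money": "$1100"},
)
print(event.missing_elements())  # ['Seller']
```

An element missing from one sentence may still be filled from another sentence or document, which is one reason to keep the checklist separate from the fillers.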

slide-7
SLIDE 7

Commercial Transaction Frame

• Various target words may evoke the same frame
1. They sold me the laptop for $1100.
2. I bought the laptop for $1100.
3. They only charged me $1100 for the laptop.
4. My laptop cost me $1100.
5. I got the laptop for a mere $1100.
• Frame Elements (in various syntactic realizations): Buyer, Seller, Goods, Money

From C.J.Fillmore’s slides presented at FrameNet MasterClass during TLT8 (2009)
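The five example sentences can be written down as data to show how different target words lead to the same frame while realizing different subsets of its elements. The role assignments below are a hand annotation made for illustration, and the names `TARGETS`, `ANNOTATED` and `realized` are hypothetical.

```python
# Hypothetical mini-lexicon: target lemma -> evoked frame (following the slide).
TARGETS = {
    "sell": "Commercial_transaction",
    "buy": "Commercial_transaction",
    "charge": "Commercial_transaction",
    "cost": "Commercial_transaction",
    "get": "Commercial_transaction",
}

# (sentence, target lemma, hand-annotated frame elements)
ANNOTATED = [
    ("They sold me the laptop for $1100.", "sell",
     {"Seller": "They", "Buyer": "me", "Goods": "the laptop", "Money": "$1100"}),
    ("I bought the laptop for $1100.", "buy",
     {"Buyer": "I", "Goods": "the laptop", "Money": "$1100"}),
    ("They only charged me $1100 for the laptop.", "charge",
     {"Seller": "They", "Buyer": "me", "Money": "$1100", "Goods": "the laptop"}),
    ("My laptop cost me $1100.", "cost",
     {"Goods": "My laptop", "Buyer": "me", "Money": "$1100"}),
    ("I got the laptop for a mere $1100.", "get",
     {"Buyer": "I", "Goods": "the laptop", "Money": "a mere $1100"}),
]


def realized(element):
    """Target lemmas whose example sentence expresses the given element."""
    return [lemma for _, lemma, roles in ANNOTATED if element in roles]


evoked = {TARGETS[lemma] for _, lemma, _ in ANNOTATED}
print(evoked)               # {'Commercial_transaction'}
print(realized("Seller"))   # ['sell', 'charge']
```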

slide-8
SLIDE 8

FrameNet Labeling Example

• Phrase head-words are labelled in the dependency tree
• A complete MachineLearning pipeline developed for FN labeling (POS, NER, Syntax, Coreference, FrameNet SRL)

slide-9
SLIDE 9

FrameNets

| FrameNet 1.3 (English) | FrameNet 1.5 (English) | FrameNet LV (Latvian)
Frame types | 665 (795) | 877 (1019) | 26
FrameElement types | 720 | 1068 | 80
Training sentences with full annotation | 2198 | 3256 | 4079
Training sentences with single frame annotation | 139439 | 154607 | –
Test sentences with full annotation | 120 | 2420 | 844

SemEval-2007, Task19 dataset

slide-10
SLIDE 10

Accuracy of Automated FrameNet SRL

• C6.0 FrameNet SRL demo http://c60.ailab.lv

English FrameNet 1.3 SemEval-2007 dataset

System | Frame Target identification: Precision | Recall | F1 | Frame Element identification: Precision | Recall | F1
LTH 1) | 68.9 | 53.6 | 60.3 | 51.6 | 35.4 | 42.0
SEMAFOR/Google 2) | 69.7 | 54.9 | 61.4 | 58.1 | 38.8 | 46.5
C6.0 RuleSet EN | 77.1 | 53.7 | 63.3 | 47.3 | 47.0 | 47.1
C6.0 RuleSet LV | 63.5 | 62.7 | 63.1 | 65.9 | 76.8 | 70.9

1) Johansson, R., Nugues, P. (2007). LTH: semantic structure extraction using nonprojective dependency trees. In Proceedings of SemEval-2007: 4th International Workshop on Semantic Evaluations. Prague, pp. 227-230.
2) Das, D., Chen, D., Martins, A.F.T., Schneider, N., Smith, N.A. (2014). Frame-Semantic Parsing. Computational Linguistics, 40(1), pp. 9-56.
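The F1 columns follow from the precision and recall columns as their harmonic mean. The sketch below recomputes them from the figures quoted on this slide; the function name `f1` and the `ROWS` layout are illustrative only.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs from the slide:
# first pair = frame target identification, second = frame element identification.
ROWS = {
    "LTH": ((68.9, 53.6), (51.6, 35.4)),
    "SEMAFOR/Google": ((69.7, 54.9), (58.1, 38.8)),
    "C6.0 RuleSet EN": ((77.1, 53.7), (47.3, 47.0)),
    "C6.0 RuleSet LV": ((63.5, 62.7), (65.9, 76.8)),
}

for system, (target, element) in ROWS.items():
    print(system, round(f1(*target), 1), round(f1(*element), 1))
```

Rounded to one decimal, the recomputed values agree with the F1 figures quoted for all four systems.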

slide-11
SLIDE 11

Accuracy of Automated FrameNet SRL

C6.0 builds the entire Latvian FrameNet RuleSet in 5 minutes (26 frames, 5000 annotated sentences)

– enables incremental learning

[Figure: Frame target identification F1 score as a function of sentences in the training set]

[Figure: Frame target identification F1 score for some English FrameNet frames]

slide-12
SLIDE 12

Named Entity Recognition (NER) and Anaphora resolution

• Named Entity Linking (NEL)
– Assisted by alias-name lists (multilingual) and cosine-similarity of context bag-of-words
– DBpedia often used as an entity KnowledgeBase
• Cross-Document Coreference (CDC)
– There is no a priori manually created entity KnowledgeBase
– Entities need to be identified and linked «on the fly»
– Typically ignore ontological entity relations (part-of, previous-name)

The key problem: persons with the same name
– «John Brown» problem
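The cosine-similarity step can be sketched as follows. This is a minimal illustration under stated assumptions: the two candidate profiles, the mention context, and the whitespace tokenizer are invented for the example and do not describe the actual system, which also relies on alias-name lists.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased whitespace tokens with their counts (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical context profiles for two different persons named "John Brown".
profiles = {
    "John Brown (politician)": bag_of_words("minister parliament election party vote"),
    "John Brown (athlete)": bag_of_words("match goal team season coach"),
}

# Context words around a new mention of the name.
mention_context = bag_of_words("John Brown scored a goal for his team")

best = max(profiles, key=lambda name: cosine(profiles[name], mention_context))
print(best)
```

In a CDC setting there is no prior list of candidates, so a low best score would be a cue to create a new entity rather than link to an existing one; the threshold for that decision is not given in the slides.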

slide-13
SLIDE 13

Abstract Knowledge Representation (with 26 frames)

[OWL ontology diagram. Classes and datatype attributes as far as they can be recovered from the diagram:]

Organization
People_by_age (Age:string)
People_by_origin (Ethnicity:string, Origin:string)
People_by_vocation (Vocation:string, Descriptor:string)
Product_line (Brand:string, Products:string)
TimeFrames (Time:dateTime)
Personal_relationship (Relationship:string)
Lending (Theme:string, Collateral:string, Units:string)
Earnings_and_losses (Earnings:string, Goods:string, Profit:string, Unit:string, Growth:string)
Possession (Possession2:string, Share:string)
Public_procurement (Theme:string, Expected_amount:string, Result:string)
PersonOrOrganization (PrimaryName:string, Alias:string)
Residence (Frquency:string)
TimePlaceFrames (Place:string)
Win_prize (Prize:string, Competition:string, Result:string, Rank:string)
Participation (Event:string, Manner:string)
Membership (Standing:string)
Statement (Message:string)
Person
Education_teaching (Subject:string, Qualification:string)
Trial (Laiks:string [Latvian: "time"], Person:string, Charges:string)
Giving (Theme:string)
Being_employed (Compensation:string, Employment_start:dateTime, Employment_end:dateTime, Position:string)

Role labels on the links of the diagram: Earner, Student, Institution, Donor, Borrower, Lender, Group, Participant_1, Person, Person, Partner_1, Partner_2, Partners, Owner, Institution, Institution, Winner, Candidates, Speaker, Medium, Defendant, Prosecutor, Competitor, Oponent, Organizer, Employee, Employer, Resident, Person, Member, Court, Recipient, Advokāts [Latvian: "attorney"]

OWL ontology visualised with OWLGrEd: http://owlgred.lumii.lv

slide-14
SLIDE 14

DBpedia 3.9 – a Non-Linguistic Knowledge Representation example

slide-15
SLIDE 15

Information Extraction System User Interface

[Screenshot callouts:]
– CNL verbalization of the extracted frames
– Selected Person or Organization
– Documents supporting the selected statement
– The selected source document (natural language)
– Statement manual verification status

slide-16
SLIDE 16

Information Extraction and CNL

[Diagram labels: FN-CNL; abstract knowledge representation (AKR); Source documents in NL; Text in FN-CNL; information extraction; FN-CNL text generation; FN-CNL parsing]

Ieva Akuratere bija solista amatā [23] (Ieva Akuratere had a soloist position)
Ieva Akuratere bija Puķu burves amatā [8] (Ieva Akuratere had a Flower fairy position)
Ieva Akuratere bija mūziķes un aktrises amatā [5] (… had a musician and actress position)
Ieva Akuratere bija deputātes amatā Rīgas domē [ (… had a member position in Riga city council)
Ieva Akuratere bija solista amatā Koncertuzvedumā [4] (… had a soloist position in a Concert)
Ieva Akuratere bija dziedātājas amatā [3] (… had a singer position)
Ieva Akuratere bija triju Zvaigžņu ordeņa virsnieka amatā Latvijā [3] (… had an Honor position in Latvia)

Ieva Akuratere held a soloist position
Ieva Akuratere held a Flower fairy position
Ieva Akuratere held a musician and actress position
Ieva Akuratere held a member position in Riga city council
Ieva Akuratere held a soloist position in a Concert
Ieva Akuratere held a singer position
Ieva Akuratere held an Honorary position in Latvia
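The English statements above all follow one sentence pattern, which suggests template-based generation from a Being_employed frame instance. The sketch below is an assumption about how such a template could look, not the system's actual generator; the function names are hypothetical, and the naive article rule would produce "a Honorary" where the slide has "an Honorary".

```python
def indefinite_article(word):
    """Naive a/an choice based on the first letter only (ignores silent 'h')."""
    return "an" if word and word[0].lower() in "aeiou" else "a"


def verbalize_being_employed(employee, position, place=None):
    """Render a Being_employed frame instance as one controlled English sentence."""
    sentence = f"{employee} held {indefinite_article(position)} {position} position"
    if place:
        sentence += f" in {place}"
    return sentence


print(verbalize_being_employed("Ieva Akuratere", "soloist"))
print(verbalize_being_employed("Ieva Akuratere", "member", "Riga city council"))
```

Because the pattern is fixed, the same sentence can in principle be parsed back into the frame instance, which is what makes the verbalization a controlled language rather than free text.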

slide-17
SLIDE 17

Open Challenges

• Transitioning to AMR (Abstract Meaning Representation)
• User interface: CNL + Graphic
• Time representation

slide-18
SLIDE 18

Extension (1): AMR

Pascale was charged with public intoxication and resisting arrest.

(c / charge-05
   :ARG1 (p / person
            :name (n / name :op1 "Pascale"))
   :ARG2 (a / and
            :op1 (i / intoxicate-01
                    :ARG1 p
                    :location (p2 / public))
            :op2 (r / resist-01
                    :ARG0 p
                    :ARG1 (a2 / arrest-01
                             :ARG1 p))))

[Figure: the same AMR drawn as a graph with nodes c (charge-05), p (person), n (name, «Pascale»), a (and), i (intoxicate-01), p2 (public), r (resist-01), a2 (arrest-01)]

http://amr.isi.edu/
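The graph view of this AMR can be made concrete by listing it as triples. The sketch below is hand-transcribed from the example, using a2 for the arrest-01 node as in the slide's node list; the names `INSTANCES`, `EDGES` and `incoming` are illustrative, and no AMR library is used.

```python
# Variable -> concept, transcribed by hand from the AMR on the slide.
INSTANCES = {
    "c": "charge-05", "p": "person", "n": "name", "a": "and",
    "i": "intoxicate-01", "p2": "public", "r": "resist-01", "a2": "arrest-01",
}

# (source variable, relation, target variable or constant)
EDGES = [
    ("c", ":ARG1", "p"),
    ("p", ":name", "n"),
    ("n", ":op1", '"Pascale"'),
    ("c", ":ARG2", "a"),
    ("a", ":op1", "i"),
    ("i", ":ARG1", "p"),
    ("i", ":location", "p2"),
    ("a", ":op2", "r"),
    ("r", ":ARG0", "p"),
    ("r", ":ARG1", "a2"),
    ("a2", ":ARG1", "p"),
]


def incoming(variable):
    """All (source, relation) pairs that point at the given variable."""
    return [(src, rel) for src, rel, tgt in EDGES if tgt == variable]


# The person node is shared by four predicates, which is why AMR is a graph, not a tree.
print(len(incoming("p")))  # 4
```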

slide-19
SLIDE 19

Extension (2): Graphic UI

slide-20
SLIDE 20

Extension (3): Time Representation

[Diagram labels: Ontology (DB schema); OWL T-Box (terminology); timeline; RDF NamedGraph1, RDF NamedGraph2, RDF NamedGraph3, RDF NamedGraph4; sequential snapshots of OWL A-Box (assertions); SPARQL/PROLOG update (three times); Background knowledge]

FrameNet (Frames implemented as SPARQL or PROLOG update procedures):

Little Red Riding Hood lived in a wood with her mother. [Residence]
She baked tasty bread [Cooking_Creation]
and brought it to her grandmother. [Bringing]

PAO-CNL reported at CNL-2009 workshop (Marettimo): http://www.semti-kamols.lv/doc_upl/LRRH.mov
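The idea of frames as update procedures over sequential A-Box snapshots can be sketched in Python instead of SPARQL or PROLOG. Everything below is an assumption made for illustration: the triple vocabulary (`lives_in`, `created_by`, `located_at`, `brought_by`), the function names, and the choice to start from an empty snapshot with no background knowledge.

```python
# Each "frame" is an update procedure: it takes the current A-Box snapshot
# (a set of triples) and returns the next snapshot. Names are illustrative.

def residence(abox, resident, location):
    return abox | {(resident, "lives_in", location)}


def cooking_creation(abox, cook, food):
    return abox | {(food, "created_by", cook)}


def bringing(abox, agent, theme, goal):
    return abox | {(theme, "located_at", goal), (theme, "brought_by", agent)}


# The timeline keeps every snapshot, mirroring one named graph per point in time.
timeline = [frozenset()]


def step(update, *args):
    """Apply one frame's update procedure to the latest snapshot and store the result."""
    timeline.append(frozenset(update(timeline[-1], *args)))


step(residence, "LittleRedRidingHood", "wood")
step(cooking_creation, "LittleRedRidingHood", "bread")
step(bringing, "LittleRedRidingHood", "bread", "grandmother")

print(len(timeline))  # 4
print(("bread", "located_at", "grandmother") in timeline[2])  # False
print(("bread", "located_at", "grandmother") in timeline[3])  # True
```

Keeping whole snapshots makes "what was true when" a lookup by index, at the cost of storing unchanged assertions repeatedly.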

slide-21
SLIDE 21

Questions?

C6.0 FrameNet SRL demo http://c60.ailab.lv

slide-22
SLIDE 22

FrameNet Manual SRL Interface

slide-23
SLIDE 23

FrameNet SRL Review Interface

slide-24
SLIDE 24

FrameNet SRL: a Complete RuleSet

Laplace ratio (n-m+1)/(n+2) avoids overfitting/underfitting the seen training data

– Realistic estimate of rule’s precision on unseen test data

FRAME TARGET: Revenge

Rule | n | m | Laplace ratio
[_, _, _, _, {retaliation.n.1, punish.v.1, revengeful.s.1}, _, _, _, _, _] | 193 | 9 | 95%
[_, _, _, {avenger, retaliated, retaliate, avenged}, _, _, _, _, _, _] | 49 | 0 | 98%
[_, MD, _, get, _, _, _, _, RB, _] | 23 | 3 | 84%
[_, JJ, _, sanction, _, _, _, _, _, _] | 4 | 0 | 83%
[_, _, _, sanction, _, NNS, _, _, IN, _] | 5 | 1 | 71%
[_, _, #NONE#, sanction, _, _, _, ',', _, _] | 2 | 0 | 75%

Features: PLEMMA, PPOS, PNER, LEMMA, HYPERNYM, POS, NER, NLEMMA, NPOS, NNER

Validation on training data: Correct YES = 206; Correct NO = 56473; False YES = 10; False NO = 8
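Both the Laplace ratio and the positional rule format can be sketched in a few lines. This is an illustration under assumptions: n is read as the number of training cases a rule covers and m as the number of those it gets wrong, the token is a hypothetical feature dictionary, and `matches` is not the C6.0 implementation.

```python
FEATURES = ["PLEMMA", "PPOS", "PNER", "LEMMA", "HYPERNYM",
            "POS", "NER", "NLEMMA", "NPOS", "NNER"]


def laplace_ratio(n, m):
    """(n - m + 1) / (n + 2), with n = cases covered and m = errors (assumed reading)."""
    return (n - m + 1) / (n + 2)


def matches(rule, token):
    """A rule slot is '_' (wildcard), a single value, or a set of allowed values."""
    for slot, feature in zip(rule, FEATURES):
        if slot == "_":
            continue
        allowed = slot if isinstance(slot, (set, frozenset)) else {slot}
        if token.get(feature) not in allowed:
            return False
    return True


# Third rule from the slide: previous POS is MD, lemma is "get", next POS is RB.
rule = ["_", "MD", "_", "get", "_", "_", "_", "_", "RB", "_"]
token = {"PPOS": "MD", "LEMMA": "get", "NPOS": "RB"}  # hypothetical token features

print(matches(rule, token))               # True
print(round(100 * laplace_ratio(23, 3)))  # 84
```

The ratio pulls rules with little support toward 50%: a rule with n = 2 and m = 0 scores 75%, not 100%, which is how it guards against trusting rules seen only a few times.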

slide-25
SLIDE 25

Multilinguality

[Table with columns NL text, Objects, FN Events, EN Paraphrase, LV Paraphrase; one block per row:]

NL text: Sophie Amundsen was on her way home from school.
Objects: X1:Sophie Amundsen; X72:home; X73:school; X3:way
FN Events: E1:Self_motion(self_mover:X1; source:X73; goal:X72; path:X3)
EN Paraphrase: E1:Sophie Amundsen moved from school to home.
LV Paraphrase: E1:Sofija Amundsena pārvietojās no skolas uz mājām

NL text: She had walked the first part of the way with Joanna.
Objects: X4:the first part of X3; X5:Joanna
FN Events: E2:Self_motion(self_mover:X1; path:X4; co_theme:X5; time:during E1)
EN Paraphrase: E2:During E1 the first part of the way Sophie Amundsen walked with Joanna.
LV Paraphrase: E2:E1 laikā ceļa pirmo pusi Sofija Amundsena gāja kopā ar Jūrunu.

NL text: They had been discussing robots.
Objects: X6:robots
FN Events: E3:Discussion(interlocutors:X1,X5; topic:X6; time:during E2)
EN Paraphrase: E3:During E2 Sophie Amundsen and Joanna discussed robots.
LV Paraphrase: E3:E2 laikā Sofija Amundsena un Jūruna apsprieda robotus.

NL text: Joanna thought
FN Events: E4:Opinion(cognizer:X5; opinion:E5; time:during E3)
EN Paraphrase: E4:During E3 Joanna stated E5.
LV Paraphrase: E4:E3 laikā Jūruna apgalvoja E5.

NL text: the human brain was like an advanced computer.
Objects: X7:the human brain; X8:an advanced computer
FN Events: E5:Similarity(entity1:X7; entity2:X8)
EN Paraphrase: E5:The human brain is similar to an advanced computer.
LV Paraphrase: E5:Cilvēka smadzenes ir līdzīgas sarežģītam datoram.
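The step from frame events to an English paraphrase can be sketched with per-frame templates. This is only an illustration: the templates, the `OBJECTS` table and the function names are assumptions, only two of the frames are covered, and a Latvian generator would additionally need case inflection, which simple string templates do not provide.

```python
# Object table for the entities used below (identifiers as in the table above).
OBJECTS = {
    "X1": "Sophie Amundsen", "X5": "Joanna", "X6": "robots",
    "X7": "the human brain", "X8": "an advanced computer",
}


def name(ref):
    """Resolve an object identifier; event references such as E2 stay symbolic."""
    return OBJECTS.get(ref, ref)


def cap_first(text):
    """Upper-case only the first character, leaving the rest untouched."""
    return text[:1].upper() + text[1:]


def paraphrase_en(event_id, frame, roles):
    """Render one frame event with a hypothetical English template."""
    if frame == "Discussion":
        who = " and ".join(name(x) for x in roles["interlocutors"])
        body = f"{cap_first(roles['time'])} {who} discussed {name(roles['topic'])}."
    elif frame == "Similarity":
        body = f"{cap_first(name(roles['entity1']))} is similar to {name(roles['entity2'])}."
    else:
        raise ValueError(f"no template for frame {frame}")
    return f"{event_id}:{body}"


e3 = paraphrase_en("E3", "Discussion",
                   {"interlocutors": ["X1", "X5"], "topic": "X6", "time": "during E2"})
e5 = paraphrase_en("E5", "Similarity", {"entity1": "X7", "entity2": "X8"})
print(e3)  # E3:During E2 Sophie Amundsen and Joanna discussed robots.
print(e5)  # E5:The human brain is similar to an advanced computer.
```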