Case study: GATE in the NeOn project. Diana Maynard, University of Sheffield


SLIDE 1

University of Sheffield, NLP

Case study: GATE in the NeOn project

Diana Maynard University of Sheffield

SLIDE 2

Aims of this talk

  • Demonstrates using GATE for automating SW-specific tasks such as semantic annotation and ontology learning from texts
  • SARDINE: pattern-based relation extraction in the fisheries domain
  • Adding new concepts and instances to the ontology
  • Finding relations between existing concepts in the ontology
  • SPRAT: generic version of SARDINE
SLIDE 3

Recap: IE for the Semantic Web

  • Traditional IE is based on a flat structure, e.g. recognising Person, Location, Organisation, Date, Time etc.
  • For the Semantic Web, we need information in a hierarchical structure
  • The idea is that we attach annotations to the documents, pointing to concepts in an ontology
  • Information can be exported as an ontology annotated with instances
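
As a rough illustration of the difference between flat labels and annotations that point into a hierarchy, here is a minimal Python sketch. It is not GATE code: the toy ontology, the class names and the `is_a` helper are all assumptions made up for the example.

```python
# Hypothetical toy ontology: each class maps to its parent (None = root).
PARENT = {
    "Thing": None,
    "Species": "Thing",
    "Fish": "Species",
    "Cod": "Fish",
    "Location": "Thing",
    "FishingArea": "Location",
}

def is_a(cls, ancestor):
    """Return True if cls equals ancestor or is one of its descendants."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT[cls]
    return False

# Each annotation attaches a text span to an ontology class, not a flat label.
text = "Atlantic cod are fished in the Gulf of Maine"
annotations = [
    {"start": 0, "end": 12, "class": "Cod"},
    {"start": 31, "end": 44, "class": "FishingArea"},
]

# A hierarchical query: every span whose class falls anywhere under Fish.
fish_mentions = [text[a["start"]:a["end"]] for a in annotations
                 if is_a(a["class"], "Fish")]
```

Because the annotation carries a class rather than a label, a query for "Fish" also finds the more specific "Cod" mention, which a flat scheme could not do without extra rules.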

SLIDE 4

Linking the Text to the Ontology

SLIDE 5

The NeOn project

  • NeOn (Networking Ontologies) is a 4-year, 14.7 million Euro EU project involving 14 European partners.
  • Focus on using ontologies for large-scale semantic applications in distributed organizations
  • Handles multiple networked ontologies that exist in a particular context, are created collaboratively, and might be highly dynamic and constantly evolving.

SLIDE 6

ODd SOFAS

  • The Food and Agriculture Organization of the UN have odd sofas…
SLIDE 7

Wall climbing sofa

SLIDE 8

Sofa made from bicycle seats

SLIDE 9

FAO Case Study

  • Actually, it’s nothing to do with sofas, or any kind of seating.
  • They do, however, have an Ontology-driven stock over-fishing alert system
  • Focuses on agricultural sector and information management for hunger prevention
  • Case study aims at management of alerts to avoid over-fishing in already stretched global waters
  • Role of GATE is to analyse textual resources to find new information such as new fish names, and relations between ontology elements, e.g. “Atlantic cod are fished in the Gulf of Maine”

SLIDE 10

SARDINE

Species Annotation, Recognition and Indexing of Named Entities

  • SARDINE identifies mentions of fish species in text
  • It identifies:
    – existing fish names listed in the ontology and their morphological variants
    – potential new fish names not listed in the ontology
    – potential relations between fish names
  • For the new fish, it attempts to classify them in the ontology, based on linguistic information such as synonyms and hyponyms of existing fish
  • It may also generate properties for existing fish in the ontology
SLIDE 11

SLIDE 12

SLIDE 13

Using patterns to find new fish

Synonyms:

    – mummichogs (fundulus heteroclitus)

Names appearing in lists:

    – “plankton, herring and clams...”
    – “clams, herring and other types of fish”

More specific fish names:

    – Japanese flounder
    – Red salmon
    – Suberites sponges
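
The first two pattern types above can be approximated with plain regular expressions. The Python sketch below is only an illustration, not how SARDINE is implemented (it uses JAPE grammars over GATE annotations); `KNOWN_FISH` is a hypothetical seed list standing in for the ontology, and the function names are invented.

```python
import re

KNOWN_FISH = {"herring"}  # hypothetical seed list standing in for the ontology

def synonym_candidates(text):
    """'name (other name)' suggests that the two names are synonyms."""
    return re.findall(r"(\w+) \(([^)]+)\)", text)

def hyponym_candidates(text):
    """'A, B and other types of fish' suggests A and B are kinds of fish."""
    m = re.search(r"((?:\w+, )*\w+) and other types of fish", text)
    return m.group(1).split(", ") if m else []

def list_candidates(text):
    """Unknown names listed alongside a known fish may also be fish."""
    m = re.search(r"((?:\w+, )+\w+) and (\w+)", text)
    if not m:
        return []
    items = m.group(1).split(", ") + [m.group(2)]
    if not KNOWN_FISH & set(items):
        return []  # no known fish in the list, so nothing to anchor on
    return [item for item in items if item not in KNOWN_FISH]
```

Word-level patterns like these over-generate (the list pattern proposes "plankton" as a fish, for instance), which is one reason to match over part-of-speech tags and noun phrases instead of raw strings.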

SLIDE 14

Example of JAPE rule (1)

Example: “Suberites sponges” (where “sponge” is a known class)

    Rule: AdjClass
    (
      ({Token.category == JJ})
      ({Class}):super
    ):sub
    -->
    :sub.SardineSubclass = {rule=AdjClass},
    :super.SardineSuperclass = {rule=AdjClass}, …
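
For readers unfamiliar with JAPE, the logic of this rule can be paraphrased in Python as below. This is a sketch only: `CLASS_LOOKUP` is a hypothetical stand-in for the Class annotations GATE derives from the ontology, and the part-of-speech tags in the sample input are assumed rather than produced by a real tagger.

```python
# Hypothetical lookup from surface forms to ontology classes; in GATE this
# role is played by Class annotations created from the ontology.
CLASS_LOOKUP = {"sponge": "Sponge", "sponges": "Sponge"}

def adj_class(tagged_tokens):
    """Find (subclass, superclass) pairs: an adjective followed by a known class.

    tagged_tokens is a list of (word, POS tag) pairs; the tag "JJ" marks an
    adjective, mirroring {Token.category == JJ} in the JAPE rule.
    """
    pairs = []
    for (word, tag), (next_word, _) in zip(tagged_tokens, tagged_tokens[1:]):
        superclass = CLASS_LOOKUP.get(next_word.lower())
        if tag == "JJ" and superclass is not None:
            # The whole phrase is the candidate subclass (:sub in the rule),
            # the known class on its own is the superclass (:super).
            pairs.append((f"{word} {next_word}", superclass))
    return pairs

# Assumed tagging of a short example sentence.
tokens = [("Suberites", "JJ"), ("sponges", "NNS"), ("are", "VBP"), ("common", "JJ")]
pairs = adj_class(tokens)
```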

SLIDE 15

Example of JAPE rule (2)

Example: “Frogs are a kind of amphibian.”

    Rule: Subclass1
    (
      ({NP}):sub
      (
        {Lookup.minorType == be}
        {Token.category == DT}
        {Lookup.majorType == kind}
      )
      ({NP}):super
    )
    --> …
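
JAPE matches a rule against a sequence of annotations rather than against characters. The Python sketch below imitates that for the Subclass1 pattern. The annotation tuples, the `PATTERN` list and the helper names are assumptions for illustration; real JAPE matching is considerably richer (overlapping annotations, matching styles, priorities).

```python
# Each annotation is (type, features, covered text); the values are hypothetical.
annotations = [
    ("NP", {}, "Frogs"),
    ("Lookup", {"minorType": "be"}, "are"),
    ("Token", {"category": "DT"}, "a"),
    ("Lookup", {"majorType": "kind"}, "kind of"),
    ("NP", {}, "amphibian"),
]

# The left-hand side of the rule as a list of (type, required features).
PATTERN = [
    ("NP", {}),
    ("Lookup", {"minorType": "be"}),
    ("Token", {"category": "DT"}),
    ("Lookup", {"majorType": "kind"}),
    ("NP", {}),
]

def matches(annotation, constraint):
    """True if the annotation has the required type and feature values."""
    ann_type, features, _ = annotation
    want_type, want_features = constraint
    return ann_type == want_type and all(
        features.get(key) == value for key, value in want_features.items())

def subclass1(annotations):
    """Slide a window over the annotations and collect (sub, super) pairs."""
    pairs = []
    n = len(PATTERN)
    for i in range(len(annotations) - n + 1):
        window = annotations[i:i + n]
        if all(matches(a, c) for a, c in zip(window, PATTERN)):
            pairs.append((window[0][2], window[-1][2]))
    return pairs
```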

SLIDE 16

Annotated text in GATE

SLIDE 17

Augmenting the Ontology

  • The new classes found are linked to existing classes in the ontology
  • For existing fish, and new fish which we identified as a synonym or hyponym of an existing fish, the link is to an existing ontology instance
  • When we don't identify a link to any existing fish, we create a new concept
  • The changes to the ontology are stored and can be verified later by human experts
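
The decision described in these bullets can be sketched in a few lines of Python. The data structures here (a dict of concept names and a list of pending changes) and the `augment` function are hypothetical simplifications, not the representation used in the NeOn tools.

```python
def augment(ontology, pending, name, related_to=None):
    """Link a found name to the ontology, or create a new concept for it.

    ontology maps a concept name to the set of names linked to it
    (a hypothetical structure). pending collects every change so that a
    human expert can verify it later.
    """
    if name in ontology:
        change = ("existing", name, name)
    elif related_to in ontology:
        # Synonym or hyponym of a known fish: link to the existing entry.
        ontology[related_to].add(name)
        change = ("linked", name, related_to)
    else:
        # No link to anything known: create a new concept.
        ontology[name] = set()
        change = ("new_concept", name, None)
    pending.append(change)
    return change

ontology = {"flounder": set()}
pending = []
augment(ontology, pending, "Japanese flounder", related_to="flounder")
augment(ontology, pending, "mummichog")
```

Keeping the change log separate from the ontology itself makes it straightforward to review, accept or roll back individual additions.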

SLIDE 18

Generated “animal” ontology

SLIDE 19

Recognising components from the ontology

  • In addition to the standard IE components, we use some special ontology components.
  • The OntoRootGazetteer enables us to match words or phrases in the text with classes, instances or properties in an ontology, in any morphological variant
  • Morphological analysis is performed on both text and ontology, then matching is done between the two at the root level.
  • Text is annotated with features containing the root and original string(s)
  • When new elements are added to the ontology, these features can be used to regenerate alternative forms
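
Root-level matching can be illustrated with a short Python sketch. The `root` function below is a deliberately crude stand-in for a real morphological analyser, and the function names and annotation keys are assumptions; none of this is the OntoRootGazetteer's actual code.

```python
def root(word):
    """Crude stand-in for morphological analysis: lower-case, strip plurals."""
    word = word.lower()
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def build_gazetteer(ontology_labels):
    """Index ontology labels by their root form."""
    return {root(label): label for label in ontology_labels}

def annotate(text, gazetteer):
    """Return one annotation per token whose root matches an ontology label."""
    annotations = []
    for token in text.split():
        token = token.strip(".,;:")
        token_root = root(token)
        if token_root in gazetteer:
            annotations.append({
                "string": token,                        # original string
                "root": token_root,                     # shared root form
                "ontology_label": gazetteer[token_root],
            })
    return annotations

gazetteer = build_gazetteer(["Sponges", "Herring"])
found = annotate("A sponge and two herrings.", gazetteer)
```

Because both sides are reduced to roots, the singular "sponge" in the text matches the plural label "Sponges" in the ontology, and "herrings" matches "Herring".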

SLIDE 20

Modifying the ontology

  • We developed a special GATE plugin called NEBOnE (Named Entity Based ONtology Editor)
  • This reuses technology taken from CLOnE (Controlled Language ONtology Editor)
  • CLOnE is designed to create new classes, instances etc. from raw (controlled) text generated by the user
  • NEBOnE enables changes to be made to the ontology based on information extraction from input texts (e.g. web pages) in natural language
  • Morphological analysis enables both root forms and variants to be added to the ontology (as properties), along with other variants (e.g. capitalisation)

SLIDE 21

Finding relations between known elements

  • In this case study, we use existing information from the ontology to find relations between known elements, e.g. fish species -- gear type
  • We have already annotated all fish species, gear types, fishing areas and so on in the text, based on ontology lookup
  • A JAPE grammar first finds the subject of the document (a gear type) and adds the information as a document feature
  • When a species name is found, we create a new annotation for the relation “gear_used”, with a property denoting the species, and another property denoting the ID number of the gear.
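
The two steps (record the document's gear type, then relate each species to it) can be sketched in Python as follows. The annotation dictionaries, their keys, the sample gear ID and the choice of the first gear type as the document subject are all assumptions for illustration; the slides do not specify these details.

```python
def find_gear_relations(annotations):
    """Build 'gear_used' relations from ontology-lookup annotations.

    annotations is a list of dicts in document order with hypothetical keys:
    'type' ('GearType' or 'Species'), 'string', and 'id' for gear types.
    """
    # Step 1: treat the first gear type mentioned as the document's subject
    # (an assumption) and store it as document-level features.
    gear = next((a for a in annotations if a["type"] == "GearType"), None)
    if gear is None:
        return {}, []
    doc_features = {"gear_type": gear["string"], "gear_id": gear["id"]}

    # Step 2: every species mention yields a relation to that gear.
    relations = [
        {"type": "gear_used", "species": a["string"],
         "gear_id": doc_features["gear_id"]}
        for a in annotations if a["type"] == "Species"
    ]
    return doc_features, relations

# Made-up sample input; the gear name and ID are not from the FAO data.
sample = [
    {"type": "GearType", "string": "beach seine", "id": "G-001"},
    {"type": "Species", "string": "Atlantic cod"},
    {"type": "Species", "string": "herring"},
]
doc_features, relations = find_gear_relations(sample)
```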

SLIDE 22

Viewing relations

SLIDE 23

Using ANNIC to view results

  • By running our application on a Lucene datastore, we can then use ANNIC to view the results
  • Search for the pattern consisting of the name of the relation annotation (in this case “gear_used”)
  • Show the relevant features (species, gear ID, gear type)

SLIDE 24

Using ANNIC to view results

SLIDE 25

SARDINE demo

  • Running SARDINE from NeOn toolkit and visualising results
  • Showing relation finding in GATE
  • Viewing results with ANNIC
SLIDE 26

SPRAT

  • Semantic Pattern Recognition and Annotation Tool
  • This is a generic version of SARDINE that runs on all kinds of texts, not just fisheries
  • Does not require a seed ontology
  • Useful for building a domain ontology from scratch
  • Tested on Wikipedia pages
SLIDE 27

How well can we do it?

  • Traditional NE recognition on news texts: ~90% precision/recall
  • Ontology-based information extraction on news texts: ~80% precision/recall
  • Pattern-based relation extraction on Wikipedia texts: high precision but low recall (or vice versa depending on setup)
  • Relation finding between known entities: ~90% precision/recall
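
For reference, precision and recall are computed from counts of correct, spurious and missed results. The Python sketch below shows the standard definitions; the counts in the example are invented purely to illustrate the "high precision, low recall" case and are not results from this work.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Standard precision and recall from raw counts."""
    found = true_positives + false_positives      # everything the system returned
    expected = true_positives + false_negatives   # everything it should have returned
    precision = true_positives / found if found else 0.0
    recall = true_positives / expected if expected else 0.0
    return precision, recall

# Invented counts: a strict pattern set that returns few relations,
# most of them correct, but misses many others.
precision, recall = precision_recall(true_positives=18,
                                     false_positives=2,
                                     false_negatives=82)
```

Loosening the patterns typically moves errors from the "missed" column to the "spurious" column, which is the trade-off behind "or vice versa depending on setup".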

SLIDE 28

More information

NeOn Project: http://www.neon-project.org

NeOn Toolkit is freely available: http://www.neon-toolkit.org

SARDINE application can be downloaded from the GATE website: http://gate.ac.uk/projects/neon/sardine