NIF – NLP Interchange Format (http://aksw.org/Projects/NIF), Sebastian Hellmann



SLIDE 1

Creating Knowledge out of Interlinked Data

LOD2 Presentation . 02.09.2010 . Page http://lod2.eu

AKSW, Universität Leipzig

Sebastian Hellmann

NIF – NLP Interchange Format

http://aksw.org/Projects/NIF

SLIDE 2

NIF – NLP Interchange Format: Outline

  • NLP Interchange Format
  • Use Cases
    – Integration of tools
    – Meaning Representation Language
    – Knowledge Extraction with SPARQL
    – Machine Learning
  • Related Projects

2 KAIST LOD2 17.8.2011

SLIDE 3

NIF – NLP Interchange Format: Problem

  • Currently, NLP software is organized in pipelines
  • Integration is hard-wired
    – An adapter has to be created for each tool and each framework (n*m adapters)
  • It is difficult to aggregate output
  • It is difficult to exchange single components
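
The n*m figure can be checked with a little arithmetic. The sketch below is not from the slides. It assumes a hypothetical setup with n tools and m frameworks, and compares pairwise adapters with one wrapper per tool plus one reader per framework around a shared format. The n + m count is an assumption about how a common format would be deployed; the function names are made up for illustration.

```python
def pairwise_adapters(n_tools: int, n_frameworks: int) -> int:
    """Hard-wired integration: one adapter per (tool, framework) pair."""
    return n_tools * n_frameworks


def shared_format_wrappers(n_tools: int, n_frameworks: int) -> int:
    """Assumed alternative: one wrapper per tool, one reader per framework."""
    return n_tools + n_frameworks


for n, m in [(3, 2), (10, 5)]:
    print(
        f"{n} tools, {m} frameworks: "
        f"{pairwise_adapters(n, m)} adapters vs. "
        f"{shared_format_wrappers(n, m)} wrappers"
    )
```

The gap widens as either number grows, which is the motivation for a common interchange format.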

SLIDE 4

NIF – NLP Interchange Format: Overview

  • NLP tools can be integrated via a common output format (a common pattern in Enterprise Application Integration)
  • For each tool, a wrapper needs to be created that reads NIF and produces NIF
  • Tools can be combined ad hoc, i.e. no pipeline needs to be configured
  • Multi-layer and overlapping annotations are possible
  • Ontologies provide interfaces for each layer and for applications
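
A minimal sketch of the wrapper idea, assuming a graph is modelled as a plain set of (subject, predicate, object) string triples. The `ex:` property names, the toy "stemmer", and the string URIs are all hypothetical and are not part of NIF. Each wrapper reads a graph and returns it with its own annotations added, so the order of application does not matter.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Wrapper = Callable[[Graph], Graph]

ANCHOR = "ex:anchorOf"  # hypothetical property holding the annotated string


def stemmer_wrapper(graph: Graph) -> Graph:
    """Toy stand-in for a stemmer: lowercases and strips a trailing 's'."""
    added = {(s, "ex:stem", o.lower().rstrip("s")) for s, p, o in graph if p == ANCHOR}
    return graph | added


def length_wrapper(graph: Graph) -> Graph:
    """Toy stand-in for a second tool: records the string length."""
    added = {(s, "ex:length", str(len(o))) for s, p, o in graph if p == ANCHOR}
    return graph | added


def run_ad_hoc(graph: Graph, wrappers: Iterable[Wrapper]) -> Graph:
    """Apply wrappers in whatever order they are given."""
    for wrapper in wrappers:
        graph = wrapper(graph)
    return graph


doc: Graph = frozenset({
    ("ex:doc#offset_0_4", ANCHOR, "Cats"),
    ("ex:doc#offset_5_10", ANCHOR, "sleep"),
})

a = run_ad_hoc(doc, [stemmer_wrapper, length_wrapper])
b = run_ad_hoc(doc, [length_wrapper, stemmer_wrapper])
print(a == b)
```

Because every wrapper only adds triples about existing string URIs, the two orderings produce the same graph, which is what makes ad hoc combination possible in this toy model.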

SLIDE 5

NIF – NLP Interchange Format

  • First Challenge: Representing Strings in RDF
  • How to give a part of a document or text an identifier (URI)?
  • What properties can such URIs have?

SLIDE 6

LOD2 Event . 06.09.2010 . Page http://lod2.eu

NIF – NLP Interchange Format

SLIDE 7

NIF – NLP Interchange Format


Example URIs for annotating „Semantic Web“
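
The example URIs from this slide were not preserved in the transcript. As an illustration only, the sketch below builds an offset-based identifier for a substring of a document. The `offset_<begin>_<end>_<text>` fragment layout, the document URI, and the function name are assumptions chosen for this example, not the normative NIF scheme.

```python
from urllib.parse import quote


def offset_uri(document_uri: str, text: str, begin: int, end: int,
               max_chars: int = 20) -> str:
    """Build a fragment URI that identifies text[begin:end] in a document.

    The fragment layout is an illustrative assumption.
    """
    if not 0 <= begin < end <= len(text):
        raise ValueError("begin/end must delimit a non-empty span inside the text")
    # Keep a short, percent-encoded copy of the string for readability.
    snippet = quote(text[begin:end][:max_chars])
    return f"{document_uri}#offset_{begin}_{end}_{snippet}"


text = "The Semantic Web is a web of data."
begin = text.index("Semantic Web")
uri = offset_uri("http://example.org/doc.txt", text, begin, begin + len("Semantic Web"))
print(uri)
```

Any tool that knows the document URI and the character offsets can compute the same identifier independently.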

SLIDE 8

NIF – NLP Interchange Format

  • First Challenge: Representing Strings in RDF
  • How to give a part of a document or text an identifier (URI)?
  • What properties can such URIs have?

SLIDE 9

NIF – NLP Interchange Format

  • URIs are used to integrate output. RDF merges naturally if the URIs are the same (or convertible using a certain recipe)
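
A small sketch of why shared URIs make merging trivial, again modelling a graph as a set of triples. The URI, the `ex:` properties, and the two tool outputs are hypothetical.

```python
from collections import defaultdict


def merge(*graphs):
    """Merging graphs that share URIs is a plain set union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect everything the merged graph says about one URI."""
    properties = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            properties[p].add(o)
    return dict(properties)


# Hypothetical string URI that both tools agree on.
uri = "http://example.org/doc.txt#offset_4_16_Semantic%20Web"

stemmer_output = {(uri, "ex:stem", "semantic web")}
linker_output = {(uri, "ex:means", "http://dbpedia.org/resource/Semantic_Web")}

merged = merge(stemmer_output, linker_output)
print(describe(merged, uri))
```

Neither tool needs to know about the other: both annotations end up attached to the same node because the subject URI is identical.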

SLIDE 10

NIF – NLP Interchange Format

  • Second challenge: Output of each layer is required to be stable.
  • Components and layers can be interchanged
  • Domain ontologies are needed to provide stable interfaces:
    – OLiA provides an ontological interface for morpho-syntax (http://nachhalt.sfb632.uni-potsdam.de/owl/)
    – DBpedia provides stable ids for Things
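
To illustrate what a stable interface buys, the sketch below maps two tagger-specific tagsets onto shared category names, so a consumer can ask for adjectives without knowing which tagger ran. The mapping tables are a hand-written assumption for this example and are not taken from the actual OLiA linking models.

```python
# Illustrative tag-to-category tables (assumed, not the real OLiA models).
TAGSET_TO_OLIA = {
    "penn": {"NN": "olia:CommonNoun", "JJ": "olia:Adjective", "VBZ": "olia:Verb"},
    "stts": {"NN": "olia:CommonNoun", "ADJA": "olia:Adjective", "VVFIN": "olia:Verb"},
}


def to_olia(tagset, tagged_tokens):
    """Replace tool-specific tags by shared category names (None if unmapped)."""
    mapping = TAGSET_TO_OLIA[tagset]
    return [(token, mapping.get(tag)) for token, tag in tagged_tokens]


def adjectives(annotated_tokens):
    """A consumer written once against the shared categories."""
    return [token for token, category in annotated_tokens
            if category == "olia:Adjective"]


english = to_olia("penn", [("aquatic", "JJ"), ("animal", "NN")])
german = to_olia("stts", [("aquatisches", "ADJA"), ("Tier", "NN")])

print(adjectives(english))
print(adjectives(german))
```

Swapping one tagger for another only requires a new mapping table; the consumer code stays unchanged.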

SLIDE 11

NIF – NLP Interchange Format

SLIDE 12

NIF – NLP Interchange Format

SLIDE 13

NIF – NLP Interchange Format

SLIDE 14

Demo - Integration

  • http://nlp2rdf.lod2.eu/annotator-stanford/NIFStemmer?input=My%20favor
  • http://nlp2rdf.lod2.eu/annotator-stanford/NIFStanfordCore?input=My%20fa
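
The demo URLs above are truncated in the transcript, so the full input text is unknown. The sketch below only shows how such a request URL could be assembled with a percent-encoded `input` parameter. The sample text is hypothetical, whether the services accept further parameters is not known, and the endpoints may no longer be online, so nothing is actually fetched.

```python
from urllib.parse import quote, urlencode

BASE = "http://nlp2rdf.lod2.eu/annotator-stanford"


def demo_url(service: str, text: str) -> str:
    """Assemble a demo request URL; spaces are encoded as %20, as on the slide."""
    query = urlencode({"input": text}, quote_via=quote)
    return f"{BASE}/{service}?{query}"


# Hypothetical input text; the original example text is cut off.
for service in ("NIFStemmer", "NIFStanfordCore"):
    print(demo_url(service, "Hello world"))
```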

SLIDE 15

Use Cases

  • Use Cases
    – Integration of tools
    – Meaning Representation Language
    – Knowledge Extraction with SPARQL
    – Machine Learning

SLIDE 16

Use Case – Integration of tools

SLIDE 17

Use Case – Meaning Representation Language

  • RDF makes data integration easy: URIref, LinkedData
  • OWL is based on Description Logics (Guarded Fragment)
  • Availability of open data sets (access and licence)
  • Diverse serializations for annotations: XML, Turtle, RDFa+XHTML
  • Scalable tool support (Databases, Reasoning)

SLIDE 18

Use Case – Meaning Representation Language

SLIDE 19

Use Case – Knowledge Extraction with SPARQL

  • Classical approach:
  • POS tag / Dependency parser (e.g. Stanford)
  • create a rule/pattern language to extract knowledge

SLIDE 20

Use Case – Knowledge Extraction with SPARQL

Johanna Völker – Learning Expressive Ontologies (LExO)

# Example:
# A fish is any aquatic vertebrate animal that is covered with scales,
# and equipped with two sets of paired fins and several unpaired fins.
# [fish] subClassOf [any aquatic vertebrate animal that is covered …]
CONSTRUCT { ?sub rdfs:subClassOf ?super }
WHERE {
  ?is a penn:BePresentTense .
  ?is nlp:superToken ?is_any_aquatic_ .
  ?is_any_aquatic_ a olia:VerbPhrase .
  ?is_any_aquatic_ nlp:syntacticSubToken [ nlp:normUri ?super ] .
  ?animal nlp:cop ?is .
  ?animal nlp:nsubj ?fish .
  ?fish nlp:superToken [ nlp:normUri ?sub ] .
}
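
To make the join in this query concrete, the sketch below evaluates the same graph pattern by hand over a toy set of triples. In practice a SPARQL engine would do this work. The `t:` token identifiers, the `ex:` class names, and the toy annotations are hypothetical stand-ins for real parser output.

```python
RDF_TYPE = "rdf:type"

# Toy annotations for "A fish is any aquatic vertebrate animal ..." (hypothetical).
triples = {
    ("t:is", RDF_TYPE, "penn:BePresentTense"),
    ("t:is", "nlp:superToken", "t:vp"),
    ("t:vp", RDF_TYPE, "olia:VerbPhrase"),
    ("t:vp", "nlp:syntacticSubToken", "t:np_super"),
    ("t:np_super", "nlp:normUri", "ex:AquaticVertebrateAnimal"),
    ("t:animal", "nlp:cop", "t:is"),
    ("t:animal", "nlp:nsubj", "t:fish"),
    ("t:fish", "nlp:superToken", "t:np_sub"),
    ("t:np_sub", "nlp:normUri", "ex:Fish"),
}


def objects(graph, subject, predicate):
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(graph, predicate, obj):
    return {s for s, p, o in graph if p == predicate and o == obj}


def extract_subclass_axioms(graph):
    """Hand-written equivalent of the CONSTRUCT pattern."""
    results = set()
    for is_token in subjects(graph, RDF_TYPE, "penn:BePresentTense"):
        for phrase in objects(graph, is_token, "nlp:superToken"):
            if (phrase, RDF_TYPE, "olia:VerbPhrase") not in graph:
                continue
            # ?super: normalised URI of a syntactic sub-token of the verb phrase
            supers = {
                uri
                for node in objects(graph, phrase, "nlp:syntacticSubToken")
                for uri in objects(graph, node, "nlp:normUri")
            }
            # ?animal nlp:cop ?is . ?animal nlp:nsubj ?fish .
            for head in subjects(graph, "nlp:cop", is_token):
                for subj in objects(graph, head, "nlp:nsubj"):
                    for node in objects(graph, subj, "nlp:superToken"):
                        for sub in objects(graph, node, "nlp:normUri"):
                            for sup in supers:
                                results.add((sub, "rdfs:subClassOf", sup))
    return results


axioms = extract_subclass_axioms(triples)
print(axioms)
```

The nested loops correspond one-to-one to the triple patterns in the query, which shows why a declarative pattern language is a convenient replacement for hand-written extraction rules.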

SLIDE 21

Use Case - Machine Learning

SLIDE 22

Use Case - Machine Learning

SLIDE 23

Workplan

  • EU Deliverable almost finished
  • Integration of SnowballStemming and the Stanford Parser
  • Next step: Integration of Knowledge Extraction tools (Zemanta, DBpedia Spotlight, Alchemy, OpenCalais, FOX)
  • Web services that read NIF and output NIF
  • Google Code Project: http://code.google.com/p/nlp2rdf/
  • Web Site: http://aksw.org/Projects/NIF

SLIDE 24

Summary

  • NIF makes it possible to represent NLP output using Knowledge Representation Formalisms (RDF/OWL)
  • It can be mixed with other knowledge (e.g. Wikipedia/DBpedia)
  • Good foundation for optimizing machine learning:
    – Choose the best algorithms
    – Choose the best data

SLIDE 25

Related Projects

  • Wiktionary
  • LLOD
  • CKAN / Open Linguistics

SLIDE 26

NLP2RDF – http://aksw.org/Projects/NLP2RDF . Page 26 http://lod2.eu

Creation of data sets: Wiktionary2RDF

SLIDE 27

Creation of data sets: Wiktionary2RDF

http://en.wiktionary.org/wiki/house

  • Covers 170 languages
  • Total of 10 million pages
  • 900,000 users
  • RDF dump will increase the number of editors
  • Same properties as Wikipedia (stable identifiers)
  • Hundreds of Wiktionary parsers (especially for English)
  • Information is trapped in the wiki
  • Structure changes make software obsolete
  • Why try it again?
  • DBpedia Extraction Framework is very mature (5 years, 15 developers)
  • Configuration over code; templates will allow Wiktionarians to update parsers
  • Early contact with the community
SLIDE 28

Wiktionary, Wortschatz, and OLiA can become the crystallization point for a Linguistic Linked Data Web.

Four major types:

  • Lexical Semantic Resources
  • Dictionaries
  • Corpora
  • Schemas/Ontologies
SLIDE 29

Open Licences – Focus of LOD2 and OKFN

http://ckan.net/
CKAN is an open registry of data and content packages. Harnessing the CKAN software, this site makes it easy to find, share and reuse content and data, especially in ways that are machine automatable.

Working Group on Open Data in Linguistics: http://linguistics.okfn.org

  • Founded in Nov 2010
  • 40 Members
  • Membership open, please join
  • Over 100 data sets in CKAN
SLIDE 30

Thank you for your attention!