Linked Data in Linguistics for NLP and Web Annotation




SLIDE 1

MultilingualWeb – 2012/06/11 Dublin – Page 1 http://lod2.eu

Creating Knowledge out of Interlinked Data

LOD2 Presentation . 02.09.2010 . Page http://lod2.eu

AKSW, Universität Leipzig

Sebastian Hellmann

Linked Data in Linguistics for NLP and Web Annotation

http://nlp2rdf.org http://lod2.eu

SLIDE 2

The Semantic Gap

SLIDE 3

Turning Walled Gardens into Park Networks of Semantic Linguistic Data

How can we leverage the Data Web for natural language processing?

  • 1. Use the Data Web as background knowledge for NLP
  • 2. Use Data Web technologies for integrating NLP tools & approaches
  • 3. Make the output of NLP tools available on the Data Web

  • On the Web, the value of information increases by sharing and copying
  • 50 billion facts covering all kinds of domains are readily available
  • Leverage the wisdom of the crowds
  • RDF is all about semantic interoperability

SLIDE 4

  • 1. Use the Data Web as background knowledge for NLP

Linguistic data is currently filed under “cross-domain” in the LOD cloud.

SLIDE 5

  • 1. Use the Data Web as background knowledge for NLP

Three communities with three resources:

  • Working Group for Open Linguistics Data (OWLG)
    -> http://linguistics.okfn.org
  • DBpedia Internationalization Committee
    -> http://wiki.dbpedia.org/Internationalization
  • Wiktionary2RDF Wrappers
    -> http://dbpedia.org/Wiktionary

All communities are open, please join!

SLIDE 6

The Linguistic Linked Open Data Cloud

SLIDE 7

Main question

SLIDE 8

Wiktionary2RDF – Mediator Wrapper

http://dbpedia.org/Wiktionary

SLIDE 9

Wiktionary2RDF – Mediator Wrapper

http://dbpedia.org/Wiktionary

[Figure: Mediator, lemon]
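The mediator idea can be illustrated with a small sketch: once a Wiktionary entry has been parsed, its parts are mapped onto the lemon model and serialised as N-Triples. This is a minimal sketch, assuming the entry is already available as a Python dict; the dict layout, the base URI `http://example.org/wiktionary/` and the URI naming pattern are hypothetical and are not Wiktionary2RDF's actual mapping configuration. The lemon terms used (`LexicalEntry`, `canonicalForm`, `writtenRep`, `sense`, `reference`) come from the lemon namespace `http://lemon-model.net/lemon#`; the DBpedia sense references are purely illustrative.

```python
LEMON = "http://lemon-model.net/lemon#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/wiktionary/"  # hypothetical base URI for generated resources


def iri(value):
    """Wrap an absolute IRI for N-Triples output."""
    return f"<{value}>"


def literal(text, lang):
    """Language-tagged N-Triples literal with backslashes and quotes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def entry_to_lemon(entry):
    """Map a parsed entry (hypothetical dict layout) to lemon triples."""
    word, lang = entry["word"], entry["lang"]
    lex = f"{BASE}{word}-{lang}"  # hypothetical URI naming pattern
    form = f"{lex}-form"
    triples = [
        (iri(lex), iri(RDF_TYPE), iri(LEMON + "LexicalEntry")),
        (iri(lex), iri(LEMON + "canonicalForm"), iri(form)),
        (iri(form), iri(LEMON + "writtenRep"), literal(word, lang)),
    ]
    # One lemon sense per meaning, each pointing to an ontology entity.
    for i, ref in enumerate(entry["senses"], start=1):
        sense = f"{lex}-sense{i}"
        triples.append((iri(lex), iri(LEMON + "sense"), iri(sense)))
        triples.append((iri(sense), iri(LEMON + "reference"), iri(ref)))
    return triples


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    entry = {
        "word": "bank",
        "lang": "en",
        "senses": [  # illustrative DBpedia references
            "http://dbpedia.org/resource/Bank",
            "http://dbpedia.org/resource/Bank_(geography)",
        ],
    }
    print(to_ntriples(entry_to_lemon(entry)))
```

Keeping the mapping in one function mirrors the wrapper's role: source-specific parsing stays on one side, and everything downstream only sees lemon.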

SLIDE 10

  • 2. Use Data Web Technologies for Integrating NLP Tools and Approaches

Golden Hammer Anti-pattern

The question is not whether to use RDF and Linked Data, but when to use...

Image from http://pbmo.wordpress.com/2011/09/29/maslows-hammer/

SLIDE 11

SLIDE 12

  • 2. Use Data Web Technologies for Integrating NLP Tools and Approaches

  • Ontologies provide (formal) documentation, as UML or ER diagrams do
  • The structure is easy to understand
  • A wide range of RDF tools can be used, e.g. the LOD2 Stack
  • Indexing and querying the big picture becomes possible
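To make the last point concrete: once the outputs of different tools share one triple structure, a single pattern query covers all of them. The sketch below is a toy in-memory index standing in for a real RDF store and SPARQL; the `ex:` predicate names and the `ex:doc#char=...` subjects are hypothetical placeholders, not part of any tool mentioned on the slide.

```python
from collections import defaultdict


class TripleIndex:
    """Toy in-memory triple index keyed by predicate; None in a pattern is a wildcard."""

    def __init__(self):
        self._by_predicate = defaultdict(set)

    def add(self, s, p, o):
        self._by_predicate[p].add((s, o))

    def match(self, s=None, p=None, o=None):
        """Return all (s, p, o) triples matching the pattern, sorted."""
        predicates = [p] if p is not None else list(self._by_predicate)
        results = []
        for pred in predicates:
            # .get() avoids creating empty entries for unknown predicates
            for subj, obj in self._by_predicate.get(pred, ()):
                if (s is None or subj == s) and (o is None or obj == o):
                    results.append((subj, pred, obj))
        return sorted(results)


if __name__ == "__main__":
    idx = TripleIndex()
    # Output of a (hypothetical) POS tagger ...
    idx.add("ex:doc#char=0,6", "ex:posTag", "NNP")
    # ... and of a (hypothetical) entity linker, about the same string.
    idx.add("ex:doc#char=0,6", "ex:entity", "dbpedia:Dublin")
    for triple in idx.match(s="ex:doc#char=0,6"):
        print(triple)
```

Because both tools name the same substring with the same URI, one query returns their combined annotations without any tool-specific glue code.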

SLIDE 13

  • 2. Use Data Web Technologies for Integrating NLP Tools and Approaches

The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.

  • Road map
  • Bootstrapped by LOD2, but a community project
  • First release in September 2011
  • Strong response from the community:
    – Over 50 people joined the mailing list: http://lists.okfn.org/mailman/listinfo/open-linguistics
    – First third-party implementations and contributions
    – Several projects are discussing usage
  • Currently setting up an advisory board; next draft in July
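As a rough illustration of what a NIF document can look like, the sketch below serialises a text and its annotated substrings as Turtle. It assumes the NIF 2.0 core vocabulary (namespace `http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#`) and RFC 5147-style `#char=begin,end` URIs; both may differ from the draft discussed on this slide. The document URI `http://example.org/doc` is a placeholder.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def char_uri(base, begin, end):
    """RFC 5147-style URI for the characters begin..end (end exclusive)."""
    return f"{base}#char={begin},{end}"


def quote(text):
    """Turtle string literal with backslashes, quotes and newlines escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def nif_turtle(base, text, spans):
    """Serialise a text (as nif:Context) and annotated (begin, end) spans as Turtle."""
    ctx = char_uri(base, 0, len(text))
    lines = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"<{ctx}> a nif:Context, nif:RFC5147String ;",
        f"    nif:isString {quote(text)} ;",
        '    nif:beginIndex "0"^^xsd:nonNegativeInteger ;',
        f'    nif:endIndex "{len(text)}"^^xsd:nonNegativeInteger .',
    ]
    for begin, end in spans:
        # Each substring points back to the full text via nif:referenceContext.
        lines += [
            "",
            f"<{char_uri(base, begin, end)}> a nif:RFC5147String ;",
            f"    nif:anchorOf {quote(text[begin:end])} ;",
            f'    nif:beginIndex "{begin}"^^xsd:nonNegativeInteger ;',
            f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger ;',
            f"    nif:referenceContext <{ctx}> .",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(nif_turtle("http://example.org/doc", "Dublin is in Ireland.", [(0, 6)]))
```

Since every annotated substring gets a stable URI, the output of any further tool can attach its own triples to the same resources.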

SLIDE 14

  • S. Auer and S. Hellmann: The Web of Data: Decentralized, collaborative, interlinked and interoperable. Keynote, LREC 2012. http://www.lrec-conf.org/proceedings/lrec2012/keynotes/LREC%202012.Keynote%20Speech%201.Soeren%20Auer.pdf

SLIDE 15

  • 3. Make the Output of NLP Tools available on the Web

Currently, there is no standard mechanism to transparently combine the WWW, the GGG and NLP.

GGG = Giant Global Graph (essentially the Web of Data), see: http://dig.csail.mit.edu/breadcrumbs/node/215

SLIDE 16

  • 3. Make the Output of NLP Tools available on the Web

SLIDE 17

  • 3. Make the Output of NLP Tools available on the Web

http://dbpedia.org/spotlight

P. Mendes et al.: DBpedia Spotlight: Shedding light on the web of documents. In: I-Semantics 2011.
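Spotlight-style output fits the Web of Data naturally when each recognised mention gets its own URI that links to a DBpedia resource. The sketch below is a deliberately naive longest-match dictionary spotter, not DBpedia Spotlight's algorithm or its web service API; the three-entry lexicon is hypothetical, and linking via `itsrdf:taIdentRef` (a property of the W3C ITS 2.0 RDF vocabulary) is an assumption about the output vocabulary.

```python
import re

# Hypothetical surface-form lexicon; the DBpedia URIs are illustrative only.
LEXICON = {
    "Dublin": "http://dbpedia.org/resource/Dublin",
    "Dublin Core": "http://dbpedia.org/resource/Dublin_Core",
    "Leipzig": "http://dbpedia.org/resource/Leipzig",
}


def spot(text, lexicon=LEXICON):
    """Naive longest-first dictionary spotting; returns sorted (begin, end, surface, uri)."""
    taken = [False] * len(text)
    found = []
    for surface in sorted(lexicon, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            begin, end = m.start(), m.end()
            if not any(taken[begin:end]):  # skip overlaps with longer matches
                found.append((begin, end, surface, lexicon[surface]))
                taken[begin:end] = [True] * (end - begin)
    return sorted(found)


def link_triples(base, text):
    """One (mention URI, itsrdf:taIdentRef, DBpedia URI) triple per spotted mention."""
    return [
        (f"{base}#char={begin},{end}", "itsrdf:taIdentRef", uri)
        for begin, end, _surface, uri in spot(text)
    ]


if __name__ == "__main__":
    sample = "Dublin Core was discussed in Dublin, not Leipzig."
    for triple in link_triples("http://example.org/doc", sample):
        print(triple)
```

Processing longer surface forms first is what keeps "Dublin" from being spotted inside "Dublin Core"; real systems additionally disambiguate between candidate resources, which this sketch skips.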

SLIDE 18

  • 3. Make the Output of NLP Tools available on the Web

http://annotateit.org
http://sourceforge.net/projects/fragmentlinks/

SLIDE 19

  • 3. Make the Output of NLP Tools available on the Web

NLP Interchange Format (NIF): join the mailing list at http://nlp2rdf.org

Hellmann et al.: Towards an Ontology for Representing Strings. In: EKAW 2012. http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
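Offset-based string URIs can be resolved back to the text they denote, and storing the anchor text alongside them makes it possible to detect annotations that went stale after the document changed. A minimal sketch, assuming RFC 5147-style `#char=begin,end` fragments (only the character-range form is handled); the function names are illustrative and are not taken from the paper or from any NIF library.

```python
import re
from urllib.parse import urldefrag

CHAR_RANGE = re.compile(r"char=(\d+),(\d+)")


def parse_char_fragment(uri):
    """Return (begin, end) from an RFC 5147-style '#char=begin,end' fragment."""
    _, fragment = urldefrag(uri)
    m = CHAR_RANGE.fullmatch(fragment)
    if m is None:
        raise ValueError(f"not a char range fragment: {uri!r}")
    begin, end = int(m.group(1)), int(m.group(2))
    if begin > end:
        raise ValueError("begin must not exceed end")
    return begin, end


def resolve(uri, text, anchor=None):
    """Return the substring a URI points to; if an anchor is given, verify it still matches."""
    begin, end = parse_char_fragment(uri)
    if end > len(text):
        raise ValueError("range exceeds document length")
    substring = text[begin:end]
    if anchor is not None and substring != anchor:
        raise ValueError(f"stale annotation: expected {anchor!r}, found {substring!r}")
    return substring


if __name__ == "__main__":
    doc = "Dublin is in Ireland."
    print(resolve("http://example.org/doc#char=0,6", doc, anchor="Dublin"))
    try:
        # The same offsets no longer point at "Dublin" in an edited document.
        resolve("http://example.org/doc#char=0,6", "Leipzig is in Germany.", anchor="Dublin")
    except ValueError as err:
        print(err)
```

Offsets keep URIs short and easy to compute, but any edit before a span shifts them; the anchor check is one cheap way to notice that, not a way to repair it.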

SLIDE 20

Address: University of Leipzig, Faculty of Mathematics and Computer Science, Institute of Computer Science, Department of Business Information Systems, Postfach 100920, 04009 Leipzig, Germany

Thanks for your attention!

Contact

Project: http://lod2.eu
Organisation: http://uni-leipzig.de, http://aksw.org
Presenter: http://bis.informatik.uni-leipzig.de/SebastianHellmann
NLP2RDF page: http://nlp2rdf.org
Acknowledgement: some slides are taken from the keynote of Sören Auer at LREC 2012.

CC-BY-SA unless otherwise stated