IXA pipes: Efficient and Ready to Use Multilingual NLP tools



slide-1
SLIDE 1

IXA pipes: Efficient and Ready to Use Multilingual NLP tools

Rodrigo Agerri

IXA NLP Group, UPV/EHU OpenNLP project, Apache Software Foundation

Rodrigo Agerri (IXA NLP Group, UPV/EHU OpenNLP project, Apache Software Foundation) IXA pipes: Efficient and Ready to Use Multilingual NLP tools 1 / 21

slide-2
SLIDE 2

Outline

1 Introduction
2 Pipes
3 Concluding Remarks


slide-3
SLIDE 3

Introduction

Overview

http://ixa2.si.ehu.es/ixa-pipes


slide-4
SLIDE 4

Introduction

Motivation

Lowering the barriers to using NLP technology, so that researchers and SMEs can focus on their primary interests:

  • Simple: two simple steps, or one if you get the binaries!
  • Portable: only a JVM 1.7+ and Maven are required.
  • Modular, data-centric architecture: the tools behave like Unix pipes; components are easily replaceable and extensible.
  • Multilingual: 8 languages and more coming soon!
  • Accurate: state-of-the-art results.
  • Apache License 2.0 (APL 2.0): to facilitate integration with commercial applications as well.
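The pipe-like design can be illustrated with a minimal Python sketch. This is not ixa-pipes code: the stage names `tokenize` and `tag_capitalized`, the dict-based document, the toy capitalization rule and the input sentence are all assumptions, chosen only to show how each stage reads the layers produced upstream and adds its own.

```python
from functools import reduce

def tokenize(doc):
    """Toy stage: add a 'tokens' layer by splitting the raw text on whitespace."""
    return {**doc, "tokens": doc["text"].split()}

def tag_capitalized(doc):
    """Toy stage: read the 'tokens' layer and add a 'capitalized' layer."""
    return {**doc, "capitalized": [t for t in doc["tokens"] if t[:1].isupper()]}

def run_pipeline(text, stages):
    """Feed the output of each stage into the next, like `a | b | c` in a shell."""
    return reduce(lambda doc, stage: stage(doc), stages, {"text": text})

# Hypothetical input; any stage can be swapped without touching the others.
result = run_pipeline("Richard Levin visited Quito", [tokenize, tag_capitalized])
print(result["capitalized"])
```

Because every stage only depends on the layers it reads, replacing `tag_capitalized` with a different tagger leaves `tokenize` and `run_pipeline` unchanged.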


slide-5
SLIDE 5

Introduction

Architecture (or lack thereof)


slide-6
SLIDE 6

Introduction

Basics

  • NAF (Natural Language Annotation Format): https://github.com/newsreader/NAF
  • kaflib: https://github.com/ixa-ehu/kaflib
  • Apache Maven: http://maven.apache.org
  • GitHub and Git: https://github.com/ixa-ehu/
  • Apache OpenNLP Machine Learning Library: http://opennlp.apache.org


slide-7
SLIDE 7

Introduction

NLP Annotation example

http://www.opener-project.eu/project/demos/


slide-8
SLIDE 8

Introduction

NLP Annotation example

http://ber2tekdemo-opener.rhcloud.com/welcome.action


slide-9
SLIDE 9

Introduction

NLP annotation example

http://pikes.fbk.eu/


slide-10
SLIDE 10

Pipes

Pipes

task         languages                      ixa-pipe
tok          en, es, eu, fr, gl, it, nl     ixa-pipe-tok
pos          en, es, eu, fr, gl, it         ixa-pipe-pos
lemmatizer   en, es, eu, fr, gl, it         ixa-pipe-pos
nerc         de, en, es, eu, gl, it, nl     ixa-pipe-nerc
ote          en, es, fr, ru, tr, nl         ixa-pipe-nerc
sst          en                             ixa-pipe-nerc
parse        en, es                         ixa-pipe-parse
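The task/language table can be held in a small lookup structure to answer questions such as "which tasks are covered for Basque?". The sketch below is only an illustration in Python: the `PIPES` mapping and both helper functions are hypothetical names, and the row label `ote` is an assumed reading of the partly garbled fifth row.

```python
# Task -> (module, supported languages), transcribed from the table above.
PIPES = {
    "tok": ("ixa-pipe-tok", ("en", "es", "eu", "fr", "gl", "it", "nl")),
    "pos": ("ixa-pipe-pos", ("en", "es", "eu", "fr", "gl", "it")),
    "lemmatizer": ("ixa-pipe-pos", ("en", "es", "eu", "fr", "gl", "it")),
    "nerc": ("ixa-pipe-nerc", ("de", "en", "es", "eu", "gl", "it", "nl")),
    "ote": ("ixa-pipe-nerc", ("en", "es", "fr", "ru", "tr", "nl")),  # label assumed
    "sst": ("ixa-pipe-nerc", ("en",)),
    "parse": ("ixa-pipe-parse", ("en", "es")),
}

def tasks_for(language):
    """Return the tasks listed for a two-letter language code, in table order."""
    return [task for task, (_, langs) in PIPES.items() if language in langs]

def modules_for(language):
    """Return the distinct modules that cover at least one task for a language."""
    return sorted({module for module, langs in PIPES.values() if language in langs})

print(tasks_for("eu"))
print(modules_for("en"))
```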


slide-11
SLIDE 11

Pipes

ixa-pipe-tok

<wf id="w69" sent="4" para="4" offset="354" length="9">announced</wf>

  • Tested for many languages, apostrophe treatment, etc.
  • Treebank normalization: Ancora and Penn Treebank, Universal Dependencies normalized tokenization...
  • Paragraph treatment, whitespace tokenizer...
  • Rule-based: regular expressions.
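To make the `<wf>` example concrete, here is a minimal Python sketch of a rule-based tokenizer that records the character offset and length of each token. It is an assumption-laden toy, not the ixa-pipe-tok rule set: the single regular expression `TOKEN_RE`, the function name `tokenize_to_wf` and the input sentence are hypothetical, and the `sent` and `para` attributes are left out because sentence and paragraph segmentation are not modelled.

```python
import re
import xml.etree.ElementTree as ET

TOKEN_RE = re.compile(r"\w+|[^\w\s]")  # a run of word characters, or one punctuation mark

def tokenize_to_wf(text):
    """Return <wf> elements carrying an id, character offset and length per token."""
    elements = []
    for i, match in enumerate(TOKEN_RE.finditer(text), start=1):
        wf = ET.Element("wf", id=f"w{i}", offset=str(match.start()),
                        length=str(len(match.group())))
        wf.text = match.group()
        elements.append(wf)
    return elements

wfs = tokenize_to_wf("They announced it.")
for wf in wfs:
    print(ET.tostring(wf, encoding="unicode"))
```

Keeping `offset` and `length` on every token lets downstream stages map any annotation back to the exact span of the original text.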


slide-12
SLIDE 12

Pipes

ixa-pipe-pos

rosa rosa AQ0CS0
rosa rosa NCFS000
rosa rosa NCMS000

<term id="t69" type="open" lemma="rosa" pos="R" morphofeat="NCFS000">
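The example shows one word form with several candidate tags and the single tag kept in the `<term>` annotation. A minimal Python sketch of that relationship follows. It assumes the three columns are word, lemma and tag, and it self-closes the `<term>` element so that it parses; `group_candidates` and the constant names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Candidate analyses as shown on the slide (assumed columns: word, lemma, tag).
CANDIDATES = """rosa rosa AQ0CS0
rosa rosa NCFS000
rosa rosa NCMS000"""

def group_candidates(lines):
    """Map each (word, lemma) pair to the list of tags proposed for it."""
    table = defaultdict(list)
    for line in lines.splitlines():
        word, lemma, tag = line.split()
        table[(word, lemma)].append(tag)
    return dict(table)

# The slide shows only the opening tag; it is self-closed here so it parses.
TERM = '<term id="t69" type="open" lemma="rosa" pos="R" morphofeat="NCFS000"/>'

candidates = group_candidates(CANDIDATES)
term = ET.fromstring(TERM)
chosen = term.get("morphofeat")
print(candidates[("rosa", "rosa")], "->", chosen)
```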

POS tagger (columns: Basque, English, Spanish, Italian)
ixa-pipe-pos: 94.28, 97.07, 98.88, 95.00
SVMTool: 97.16, 98.86∗
Stanford POS: 97.24
Freeling: 97∗∗, 97∗∗
Felice 2009: 96.34


slide-13
SLIDE 13

Pipes

ixa-pipe-nerc

Morras munduko txapeldun izan zen juniorretan 1994an, Ekuadorko hiriburuan, Quiton.

NERC                     eu      en      es      nl      de
ixa-pipe-nerc            75.70   91.36   84.16   85.04   76.48
Passos et al. 2014       −       90.90   −       −       −
Ratinov and Roth 2009    −       90.57   −       −       −
Stanford NER             −       88.65   −       −       −
CMP (2002-03)            −       85.00   81.39   77.05   −
C&C                      −       −       −       79.63   −
Eihera                   71.31   −       −       −       −
ExB (2014)               −       −       −       −       76.38


slide-14
SLIDE 14

Pipes

Out-of-domain: Wikinews

               English                     Spanish                     Dutch
               Outer         Inner         Outer         Inner         Outer         Inner
Features       F1     T-F1   F1     T-F1   F1     T-F1   F1     T-F1   F1     T-F1   F1     T-F1
Local          41.83  54.17  48.57  57.85  34.42  42.95  37.14  41.93  48.49  54.84  49.77  55.86
best-clusters  54.04  65.96  63.72  71.13  56.78  62.55  59.77  63.04  59.94  66.03  60.27  65.42
best-overall   55.48  67.36  64.95  71.98  58.94  65.63  62.14  65.54  63.40  70.68  63.93  70.94
Stanford NER   53.14  64.62  62.45  69.76  46.42  54.40  47.48  54.27  −      −      −      −
Illinois NER   53.24  65.68  62.72  71.04  −      −      −      −      −      −      −      −
Freeling 3.1   −      −      −      −      38.27  48.06  40.93  46.52  −      −      −      −
Sonar ner      −      −      −      −      −      −      −      −      48.60  53.60  48.44  52.79
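One way to read the out-of-domain table is as the gain of the `best-overall` feature set over the `Local` features. The short Python sketch below does that arithmetic for the Outer F1 column of each language; the values are copied from the two rows, and the variable names are hypothetical.

```python
# Outer F1 on Wikinews, copied from the "Local" and "best-overall" rows.
LOCAL = {"English": 41.83, "Spanish": 34.42, "Dutch": 48.49}
BEST_OVERALL = {"English": 55.48, "Spanish": 58.94, "Dutch": 63.40}

# Absolute gain in F1 points, rounded to the table's two decimals.
gains = {lang: round(BEST_OVERALL[lang] - LOCAL[lang], 2) for lang in LOCAL}

for lang, gain in gains.items():
    print(f"{lang}: +{gain:.2f} F1 points")
```

On these figures Spanish shows the largest absolute gain of the three languages.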

slide-15
SLIDE 15

Pipes

OTE at ABSA SemEval 2014 and 2015

This place is not good enough, especially the service is disgusting.

System (type)        Precision   Recall   F1 score
Baseline             55.42       43.4     48.68
EliXa (u)            68.93       71.22    70.05
NLANGP (u)           70.53       64.02    67.12
EliXa (c)            67.23       66.61    66.91
IHS-RD-Belarus (c)   67.58       59.23    63.13
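The F1 column is the harmonic mean of precision and recall, which can be checked against the reported rows. The Python sketch below recomputes it; the function name `f1_score` and the `ROWS` list are illustrative, with the figures copied from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (system, precision, recall, reported F1) as listed on the slide.
ROWS = [
    ("Baseline", 55.42, 43.4, 48.68),
    ("EliXa (u)", 68.93, 71.22, 70.05),
    ("NLANGP (u)", 70.53, 64.02, 67.12),
    ("EliXa (c)", 67.23, 66.61, 66.91),
    ("IHS-RD-Belarus (c)", 67.58, 59.23, 63.13),
]

for system, p, r, reported in ROWS:
    print(f"{system:20s} computed={f1_score(p, r):.2f} reported={reported:.2f}")
```

The recomputed values land within about 0.01 of the reported scores, which is consistent with precision and recall having been rounded before the F1 figures were listed.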


slide-16
SLIDE 16

Pipes

ixa-pipe-parse

[Figure: constituent parse tree (S, VP, NP and PP nodes) with the noun phrases NP#1 "Richard Levin", NP#2 "the Chancellor", NP#3 "this prestigious university" and NP#4 "the Globalization Studies Center".]

slide-17
SLIDE 17

Pipes

ixa-pipe-parse

Constituent Parsing    English   Spanish
ixa-pipe-parse         87.42     87.8∗
Collins                88.1      85.0∗
Stanford PCFG          85.5      n/a
St. Factored           86.6      n/a
St. PCFG Factored      89.4      n/a
St. CVG (SURNN)        90.4      n/a
Berkeley               90.1      n/a


slide-18
SLIDE 18

Pipes

Third-party tools

  • Corefgraph: rule-based coreference resolution for English and Spanish.
  • Unsupervised word sense disambiguation (WSD) with UKB.
  • Semantic role labeling (SRL) and dependency parsing with Mate tools.
  • Named entity disambiguation (NED) with DBpedia Spotlight.

http://ixa2.si.ehu.es/ixa-pipes/third-party-tools.html


slide-19
SLIDE 19

Concluding Remarks

Used in

  • OpeNER: http://www.opener-project.eu/
  • NewsReader: http://www.newsreader-project.eu/
  • QTLeap: http://qtleap.eu/
  • Limosine: http://limosine-project.eu/
  • Spanish Administration
  • Trivago, Olery, Vicomtech-IK4, Elhuyar...
  • DSS2016: http://behagunea.dss2016.eu/


slide-20
SLIDE 20

Concluding Remarks

DSS2016 Behagunea

http://behagunea.dss2016.eu/


slide-21
SLIDE 21

Concluding Remarks

Conclusion

  • More languages, more annotations.
  • Easy to use, easy to train, easy to adapt, easy to deploy.
  • Server mode.
  • State-of-the-art performance.
  • Free software and industry-friendly.
  • Cross-fertilization with the Apache Software Foundation’s OpenNLP project.
