Introduction to the Natural Language Processing (NLP) Thierry Hamon - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Introduction to the Natural Language Processing (NLP)

Thierry Hamon

Institut Galilée - Université Paris 13, Villetaneuse, France & LIMSI-CNRS, Orsay, France
hamon@limsi.fr
http://perso.limsi.fr/hamon/

March 2014 ERASMUS Mobility - Mälardalen University (MDH) - Västerås - Sweden

Thierry Hamon (LIMSI & Paris Nord) Introduction to NLP March 2014 1 / 47

slide-2
SLIDE 2

Plan

  • History and context
  • Example
  • Introduction to NLP approaches
  • Formal language vs. Natural language
  • Evaluation


slide-3
SLIDE 3

History and Context

The very beginning

  • Context: back in the fifties (Cold War)
  • Main application: machine translation, i.e. the use of computers to translate texts or messages from one language (the source language) into another language (the target language)
  • Budget: about $20 million over 10 years

slide-7
SLIDE 7

History and Context

The mythological tests/jokes

Translation of the Biblical sentence
    The spirit is willing, but the flesh is weak
or
    Out of sight, out of mind
into Russian and then back into English:
    The vodka is strong, but the meat is rotten
    Invisible idiot

  • Literal (word-for-word) translation is inappropriate (problem with idioms)
  • More information is needed


slide-9
SLIDE 9

History and Context

The linguistic underside

Requirements:
  • Machine-readable dictionaries
  • Syntactic information (order and function of the words)

Problems:
  • Ambiguities, polysemy, ...
  • Complex syntactic structures
  • Semantics (relations, categories, ...)
  • Anaphora, ...

→ Need for (a lot of) context


slide-10
SLIDE 10

History and Context

The (in)famous "ALPAC report"

In 1966, by the US National Academy of Sciences
Y. Bar-Hillel

  • Complete machine translation: slow, time-consuming, of low quality, and possibly more expensive than human translators
  • Machine translation is hopeless (!)

Recommendations:
  • Evaluation of the translations (quality and cost)
  • Machine-aided translation
  • More effort on computational linguistics research, whether for machine translation or not

Consequences: a lower budget for machine translation, but the beginning of Natural Language Processing (NLP)


slide-11
SLIDE 11

History and Context

Contributions

Interdisciplinary research field:

Linguistics
  • Phonology
  • Generative grammars
  • Syntax
  • Philosophy of language

Mathematics
  • Logic
  • Formal language theory
  • Statistics

Computer science
  • Algorithms
  • Software engineering
  • Machine learning


slide-12
SLIDE 12

History and Context

Research fields

Two main fields:
  • 1960: Computational linguistics. Focus on mathematics and linguistics
  • 1965: Natural Language Processing. Focus on algorithms for software development

1970: Natural Language Understanding (AI). Cognitive approaches
T. Winograd, M. Minsky, J. Allen, ...


slide-13
SLIDE 13

History and Context

50 years later

  • Phonetics, phonology, prosody
  • Morphology
  • Syntax
  • Semantics
  • Pragmatics


slide-14
SLIDE 14

History and Context

50 years later

Figure: linguistic levels with their resources, tasks and applications

  • Phonetics: syllabification, prosody, lexique.org, ..., pronunciation, speech recognition, speech synthesis (text-to-speech)
  • Morphology: inflected forms, derivation, composition, MorTAL, Celex, ..., morphological analysis, morphological segmentation
  • Syntax: syntactic lexicon, LTAG, FTAG, LFG, ..., part-of-speech tagging, syntactic analysis, chunking
  • Semantics: semantic network, WordNet, DEC, ..., semantic lexicon, terminology, extraction of semantic units (simple, complex), relation acquisition, decomposition into primitives, definition analysis
  • Pragmatics: text structure, anaphora, communication, disambiguation rules

Applications: speech recognition, spell checking, corpus linguistics, text generation, MT (Machine Translation), CAT (Computer-assisted Translation), man-machine dialogue, resource building, weather forecasts, reports, ..., stylistics, terminology, ontology, statistical NLP, automatic summarization, QA (Question Answering), IE/TM (Information Extraction/Text Mining), Natural Language Generation, IR (Information Retrieval)


slide-15
SLIDE 15

History and Context

Around the world

  • ACL: Association for Computational Linguistics
  • Journals: Computational Linguistics, JNLE, ...
  • Conferences: ACL, COLING, EACL, NAACL, LREC, ...
  • Web site: http://www.aclweb.org
  • Mailing lists: linguist, corpora
  • Universities and research centers (JRC in Europe)
  • Companies (Xerox, IBM, Microsoft, Lingsoft, etc.)


slide-18
SLIDE 18

Example

How to deal with the processing of natural language data?

  • Natural language: a system composed of signs, used to produce utterances
  • Words are the basic signs of a language
  • A word has two sides:
      • Phonological form (the signifier: "train")
      • Meaning (the signified: the mental picture of the train)

(Ferdinand de Saussure, Cours de linguistique générale, 1916)


slide-19
SLIDE 19

Example

How to deal with the processing of natural language data?

  • Several types of linguistic information help to go from one side to the other
  • Those types of linguistic information are more or less autonomous
  • Each interacts with the others


slide-20
SLIDE 20

Example

How to deal with the processing of natural language data?

Example

Query to a kiosk to get a train schedule (by means of human speech)
Location: Västerås Station
Question: What time is the first train to Stockholm, tomorrow morning?


slide-21
SLIDE 21

Example

How to deal with the processing of natural language data

First step

Speech processing and recognition
Conversion of the speech signal into the words of the question (phonetics and phonology)


slide-22
SLIDE 22

Example

Phonetics and Phonology

Phonetics: study of the sounds of human speech (phones)
  • From the physical point of view
  • More related to speech processing

Phonology: study of how sounds are grouped to make words or utterances in a natural language
  • From the linguistic point of view (phonemes)
  • Organisation of the sounds, syllables, rhymes, etc.
  • Related to the meaning

Both also include the study of sign languages


slide-26
SLIDE 26

Example

How to deal with the processing of natural language data

Second step

Morphological analysis
  • Description of the words with regard to their form (morphemes)
  • Recognition of the
      • Canonical form (dictionary entry)
      • Part of speech (grammatical category)
      • Inflectional parameters (gender, number, ...)

What[what,WDT] time[time,NN:n,sg] is[be,VBZ:3,sg,ind,pres] the[the,DT] first[first,ADJ:num,ord] train[train,NN:sg] to[to,PREP] Stockholm[Stockholm,NAM] , tomorrow[tomorrow,NN:sg] morning[morning,NN:sg]?
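
The annotated sentence above follows a form[lemma,tag:features] pattern. As a minimal sketch (the `Token` type, the regular expression and the function name are assumptions made for illustration, not the tool used for the slide), such output could be read back into structured records like this:

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    form: str        # surface form as it appears in the question
    lemma: str       # canonical form (dictionary entry)
    pos: str         # part-of-speech tag
    features: tuple  # inflectional parameters, possibly empty


# form[lemma,POS] or form[lemma,POS:feat1,feat2,...]
ANNOTATED = re.compile(r"(\S+?)\[([^,\]]+),([^:\]]+)(?::([^\]]*))?\]")


def parse_annotated(text):
    """Turn an annotated sentence into a list of Token records."""
    tokens = []
    for form, lemma, pos, feats in ANNOTATED.findall(text):
        features = tuple(feats.split(",")) if feats else ()
        tokens.append(Token(form, lemma, pos, features))
    return tokens


sentence = (
    "What[what,WDT] time[time,NN:n,sg] is[be,VBZ:3,sg,ind,pres] "
    "the[the,DT] first[first,ADJ:num,ord] train[train,NN:sg] "
    "to[to,PREP] Stockholm[Stockholm,NAM] , "
    "tomorrow[tomorrow,NN:sg] morning[morning,NN:sg]?"
)
tokens = parse_annotated(sentence)
for token in tokens:
    print(token)
```

Punctuation carries no annotation in the slide, so this sketch simply skips it.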


slide-27
SLIDE 27

Example

How to deal with the processing of natural language data

Third step

Parsing (syntactic analysis)
  • Combination of the words to make sentences
  • Two points of view: recognition of
      • The constituents of the sentence (noun phrases, verb phrases, adjectival phrases, ...)
      • The dependencies between the words (modifier of a noun, subject of a verb, ...)


slide-28
SLIDE 28

Example

How to deal with the processing of natural language data

Third step

(output of the Stanford parser)
det(time-2, What-1)
attr(is-3, time-2)
det(train-6, the-4)
amod(train-6, first-5)
nsubj(is-3, train-6)
prep_to(train-6, Stockholm-8)
nn(morning-11, tomorrow-10)
appos(Stockholm-8, morning-11)
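
Each dependency has the shape relation(head-index, dependent-index). The sketch below reads that textual output into tuples and finds the word that governs the sentence. The regular expression and function names are illustrative assumptions, not part of the Stanford parser API.

```python
import re

# relation(head-index, dependent-index)
DEP = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")

PARSER_OUTPUT = (
    "det(time-2, What-1) attr(is-3, time-2) det(train-6, the-4) "
    "amod(train-6, first-5) nsubj(is-3, train-6) "
    "prep_to(train-6, Stockholm-8) nn(morning-11, tomorrow-10) "
    "appos(Stockholm-8, morning-11)"
)


def parse_dependencies(text):
    """Return (relation, (head, index), (dependent, index)) tuples."""
    return [
        (rel, (head, int(head_idx)), (dep, int(dep_idx)))
        for rel, head, head_idx, dep, dep_idx in DEP.findall(text)
    ]


def roots(deps):
    """Words that govern other words but are governed by none."""
    heads = {head for _, head, _ in deps}
    dependents = {dep for _, _, dep in deps}
    return sorted(heads - dependents, key=lambda node: node[1])


deps = parse_dependencies(PARSER_OUTPUT)
print(roots(deps))
```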


slide-29
SLIDE 29

Example

How to deal with the processing of natural language data

Third step

(output of the Stanford parser)
(ROOT
  (SBARQ
    (WHNP (WDT What) (NN time))
    (SQ (VBZ is)
      (NP
        (NP (DT the) (JJ first) (NN train))
        (PP (TO to)
          (NP
            (NP (NNP Stockholm))
            (, ,)
            (NP (NN tomorrow) (NN morning))))))
    (? ?)))
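
The bracketed constituent tree can be read with a few lines of code. This is a minimal sketch with hypothetical helper names (`parse_tree`, `leaves`), which represents each node as a (label, children) pair:

```python
BRACKETED = (
    "(ROOT (SBARQ (WHNP (WDT What) (NN time)) (SQ (VBZ is) "
    "(NP (NP (DT the) (JJ first) (NN train)) (PP (TO to) "
    "(NP (NP (NNP Stockholm)) (, ,) (NP (NN tomorrow) (NN morning)))))) "
    "(? ?)))"
)


def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    position = 0

    def read_node():
        nonlocal position
        assert tokens[position] == "(", "a node must start with '('"
        position += 1
        label = tokens[position]
        position += 1
        children = []
        while tokens[position] != ")":
            if tokens[position] == "(":
                children.append(read_node())      # nested constituent
            else:
                children.append(tokens[position])  # a word of the sentence
                position += 1
        position += 1  # consume the closing parenthesis
        return (label, children)

    return read_node()


def leaves(tree):
    """Return the words of the sentence, left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


tree = parse_tree(BRACKETED)
print(" ".join(leaves(tree)))
```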


slide-30
SLIDE 30

Example

How to deal with the processing of natural language data

Fourth step

Semantic analysis
  • Identification of the
      • meaning of the words or phrases
      • semantic relations between them
  • Without taking the context into account
  • Logic can be used to represent the semantics of a sentence


slide-31
SLIDE 31

Example

How to deal with the processing of natural language data

Fourth step

train → object, mode of transportation
first → first answer? first train?
Stockholm → Location/City/railway station/direction/destination (Stockholm C)
What time → Hour?
Tomorrow → (next day) Today + 1 day (12th of March, 2014)
morning → (daytime, day period) 8:00-12:00? 7:00-12:00? before noon? ...
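
The interpretation of "tomorrow" and "morning" can be sketched as a small lookup. The 7:00-12:00 window is an assumption: the slide leaves the boundaries of "morning" open. The function and table names are hypothetical.

```python
from datetime import date, time, timedelta

# Assumed boundaries; the slide itself hesitates between 7:00 and 8:00.
DAY_PERIODS = {"morning": (time(7, 0), time(12, 0))}
DAY_OFFSETS = {"today": 0, "tomorrow": 1}


def resolve(day_word, period_word, today):
    """Map relative time expressions onto a date and a time window."""
    day = today + timedelta(days=DAY_OFFSETS[day_word])
    start, end = DAY_PERIODS[period_word]
    return day, start, end


# The question is asked on the 11th of March, 2014.
day, start, end = resolve("tomorrow", "morning", today=date(2014, 3, 11))
print(day, start, end)
```

Note that `today` has to be supplied by the caller: it is contextual information, not something the sentence itself provides.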


slide-32
SLIDE 32

Example

How to deal with the processing of natural language data

Fifth step

Pragmatics
  • Semantic interpretation of the sentence according to the context
  • Contextual information:
      • departure? (Västerås - Västerås C)
      • date (today)? (11th of March, 2014)
      • the results are sorted by time (of departure)
      • need for the schedule
  • but also reference resolution (anaphora)


slide-33
SLIDE 33

Example

How to deal with the processing of natural language data

Fifth step

Translation into an SQL query: ad-hoc methods or compilation methods

SELECT MIN(startHour)
FROM train
WHERE departureDay = '03/12/2014'
  AND departureLocation = 'Västerås'
  AND arrivalLocation = 'Stockholm'
  AND departureHour < 12:00
  AND departureHour > 7:00;

(The answer is 7:04)
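
To make the query concrete, here is a sketch that runs an equivalent query with Python's standard `sqlite3` module. The table layout and all rows are invented for illustration. Dates are stored as ISO strings and times as zero-padded HH:MM strings so that text comparison orders them correctly, and a single `departureHour` column is assumed where the slide selects `startHour`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE train (
           departureDay TEXT, departureLocation TEXT,
           arrivalLocation TEXT, departureHour TEXT)"""
)
# Hypothetical timetable, not real schedule data.
rows = [
    ("2014-03-12", "Västerås", "Stockholm", "06:31"),
    ("2014-03-12", "Västerås", "Stockholm", "07:04"),
    ("2014-03-12", "Västerås", "Stockholm", "08:04"),
    ("2014-03-12", "Västerås", "Göteborg", "07:02"),
    ("2014-03-13", "Västerås", "Stockholm", "07:01"),
]
conn.executemany("INSERT INTO train VALUES (?, ?, ?, ?)", rows)


def first_train(conn, day, origin, destination, earliest, latest):
    """Earliest departure strictly between two times, or None."""
    cursor = conn.execute(
        """SELECT MIN(departureHour) FROM train
           WHERE departureDay = ? AND departureLocation = ?
             AND arrivalLocation = ?
             AND departureHour > ? AND departureHour < ?""",
        (day, origin, destination, earliest, latest),
    )
    return cursor.fetchone()[0]


answer = first_train(conn, "2014-03-12", "Västerås", "Stockholm", "07:00", "12:00")
print(answer)
```

Passing the values as parameters rather than pasting them into the SQL string keeps the words extracted from the user's question from being interpreted as SQL.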


slide-34
SLIDE 34

Example

How to deal with the processing of natural language data

Comments

  • In reality, the kiosk could need more information (need for a human/machine dialogue)
  • What I didn't say/ask (yet?):
      • Direct train
      • Track (at Västerås C and/or Stockholm C)
      • Travel time
      • Class
      • Buy a ticket
      • Return ticket
      • Price
      • Rebookable or not, refundable
      • For adult or child
      • Number of tickets


slide-35
SLIDE 35

Example

How to deal with the processing of natural language data

here and back again

Answer generation
  • Translation of the query result into a textual form:
    The first train to Stockholm is at 7:04, tomorrow
  • In case of a spoken answer, speech synthesis of the text
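
The simplest form of answer generation is a template filled with the query result. The sketch below assumes that approach. The function name and the fallback message are hypothetical, and real generation systems are usually more elaborate.

```python
def generate_answer(destination, departure_time, day_word):
    """Fill a fixed sentence template with the query result."""
    if departure_time is None:
        # Assumed fallback wording for an empty query result.
        return f"No train to {destination} was found for {day_word}"
    return f"The first train to {destination} is at {departure_time}, {day_word}"


answer_text = generate_answer("Stockholm", "7:04", "tomorrow")
print(answer_text)
```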


slide-36
SLIDE 36

Example

How to deal with the processing of natural language data?

Two directions:
  • Analysis of language data (textual data or human speech), towards (more or less) the understanding of the message
  • Generation of language data (textual data or speech synthesis), towards a linguistic realisation

Usually, NLP deals with sentences


slide-37
SLIDE 37

Introduction to NLP approaches

How to deal with the processing of natural language data?

Two paradigms for processing texts:
  • Symbolic paradigm: extraction of linguistic information with symbolic information or linguistic resources
      • Use of dictionaries, grammars, rules
  • Stochastic paradigm: use of stochastic approaches to extract linguistic information from textual corpora
      • Use of machine learning (classification, decision trees, ...)

Both can be mixed


slide-38
SLIDE 38

Introduction to NLP approaches

Presentations of NLP approaches

  • Focus on NLP for acquisition and text understanding
  • More or less with symbolic approaches
  • Use of electronic texts: collections of textual documents


slide-39
SLIDE 39

Introduction to NLP approaches

Which documents?

Texts or collections of texts: textual corpora
Great variations:
  • Electronic formats (raw text, HTML, XML, PDF, Word, etc.)
  • Character encoding (ASCII, ISO-LATIN-1, windows-1252, UTF-8, etc.)
  • Type of documents (web pages, blogs, scientific articles, journal articles, books, tables, support group messages, emails, SMS, etc.)
  • Size: from a few kilobytes to several gigabytes

→ Work on raw text


slide-40
SLIDE 40

Introduction to NLP approaches

Raw text

Medline abstract

1: Biosci Biotechnol Biochem. 2003 Aug;67(8):1825-7. Related Articles, Links

Comparative Analyses of Hairpin Substrate Recognition by Escherichia coli and Bacillus subtilis Ribonuclease P Ribozymes.

Ando T, Tanaka T, Kikuchi Y.

Division of Bioscience and Biotechnology, Department of Ecological Engineering, Toyohashi University of Technology.

Previously, we reported that the substrate shape recognition of the Escherichia coli ribonuclease (RNase) P ribozyme depends on the concentration of magnesium ion in vitro. We additionally examined the Bacillus subtilis RNase P ribozyme and found that the B. subtilis enzyme also required high magnesium ion, above 10 mM, for cleavage of a hairpin substrate. The results of kinetic studies showed that the metal ion concentration affected both the catalysis and the affinity of the ribozymes toward a hairpin RNA substrate.

PMID: 12951523 [PubMed - in process]


slide-41
SLIDE 41

Introduction to NLP approaches

HTML

Web page

<h1 id="sidrubrik">About me - Sergei Silvestrov</h1>
<div id="artikeldata">
<h4 align="justify"><img hspace="7" vspace="7" align="right" src="/polopoly_fs/1.417
<p align="left"><strong>I am Professor in Mathematics and Applied Mathematics</strong></p>
<p><strong>I am also the Subject Chair for the subject of</strong><strong> </strong>
<p><strong>Mathematics and Applied Mathematics</strong></p>
<p><strong>in M&auml;lardalen University</strong></p>
<p><strong>(&Auml;mnesf&ouml;retr&auml;dare In Swedish)&nbsp;</strong></p>
<p>&nbsp;<a href="/polopoly_fs/1.49763!/CV%20for%20Sergei%20Silvestrov%201%20sida.pdf"
<p>&nbsp;<!--[if gte mso 9]>

→ Need to extract the content
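
One way to extract the content is to keep only the text nodes of the page. The sketch below uses Python's standard `html.parser` module on a shortened, well-formed excerpt of the page above. The class and function names are hypothetical, and real pages often need extra care (scripts, styles, broken markup).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text found between tags, ignoring the markup."""

    def __init__(self):
        super().__init__()  # character references are decoded by default
        self.chunks = []

    def handle_data(self, data):
        text = data.replace("\xa0", " ").strip()  # &nbsp; becomes a plain blank
        if text:
            self.chunks.append(text)


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = (
    '<h1 id="sidrubrik">About me - Sergei Silvestrov</h1>'
    "<p><strong>in M&auml;lardalen University</strong></p>"
    "<p><strong>(&Auml;mnesf&ouml;retr&auml;dare In Swedish)&nbsp;</strong></p>"
)
text = extract_text(sample)
print(text)
```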


slide-42
SLIDE 42

Introduction to NLP approaches

Raw text in the Web page

About me - Sergei Silvestrov
I am Professor in Mathematics and Applied Mathematics
I am also the Subject Chair for the subject of
Mathematics and Applied Mathematics
in Mälardalen University
(Ämnesföreträdare In Swedish)
CV (short, 1 page, pdf)


slide-43
SLIDE 43

Introduction to NLP approaches

Pre-processing

  • Cleaning of the texts (HTML markup)
  • Homogenisation of the character encoding
  • Extra-linguistic normalisation
      • duplicated blank characters
      • hyphenation
      • font marks
      • typographic ligatures (e.g. the ff in difference, the fi in specific)
      • long dash: – (--)
      • ...
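
Several of these normalisation steps can be sketched with the standard library. The function name and the order of the steps are assumptions. The de-hyphenation rule is deliberately naive: it also joins genuine compounds that happen to be split at a line end.

```python
import re
import unicodedata


def normalise(text):
    """Apply a few extra-linguistic normalisation steps to raw text."""
    # NFKC replaces compatibility characters such as the ff and fi ligatures.
    text = unicodedata.normalize("NFKC", text)
    # Long dashes (en dash, em dash) become a double hyphen.
    text = re.sub(r"[\u2013\u2014]", "--", text)
    # A word hyphenated at the end of a line is joined back together.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Runs of blank characters collapse to a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


raw = "The di\ufb00erence  is\nspeci\ufb01c \u2013 a hyphen-\nated word"
clean = normalise(raw)
print(clean)
```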

To be continued...


slide-44
SLIDE 44

Formal language vs. Natural language

Formal language vs. Natural language

  • Formal language L: a (possibly infinite) set of words, L ⊆ Σ∗, over a finite alphabet of symbols Σ
  • word: finite sequence of symbols of the alphabet
  • (syntactic) rules are used to decide whether a word belongs to L
  • typical examples: regular expressions, context-free grammars

Formal language:
  • rough approximation of natural language
  • tool for analysing texts
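
As an illustration of deciding whether a word belongs to L, the sketch below tests membership in two toy languages over Σ = {a, b}. One is regular, (ab)*, and the other is context-free, { aⁿbⁿ | n ≥ 0 }. Both languages and the function names are chosen for the example only.

```python
import re


def in_regular_language(word):
    """L1 = (ab)*, described by a regular expression."""
    return re.fullmatch(r"(ab)*", word) is not None


def in_context_free_language(word):
    """L2 = { a^n b^n | n >= 0 }, generated by S -> a S b | empty."""
    if word == "":
        return True  # S -> empty
    # S -> a S b: strip one 'a' on the left and one 'b' on the right.
    return (
        word[0] == "a"
        and word[-1] == "b"
        and in_context_free_language(word[1:-1])
    )


for candidate in ["", "ab", "abab", "aabb", "aab"]:
    print(repr(candidate), in_regular_language(candidate),
          in_context_free_language(candidate))
```

The word "abab" belongs to L1 but not to L2, and "aabb" the other way round. No regular expression in the formal sense can describe L2, since that would require counting the a's.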


slide-45
SLIDE 45

Formal language vs. Natural language

Formal language vs. Natural language

Formal language:

  • concatenation of symbols to make the words of the language (possibly infinitely many)
  • words have two sides: form and meaning

Natural language:

  • words are concatenated to make utterances/sentences (possibly infinitely many)
  • sentences have two sides: sound (or string) and meaning

→ Formalisation of grammars for natural language (Chomsky 1956)


slide-46
SLIDE 46

Formal language vs. Natural language

Formal language vs. Natural language

But... ambiguities:
  • Avoided/rejected in formal languages
  • Very important in natural languages (several linguistic structures can be associated with a sentence)


slide-47
SLIDE 47

Formal language vs. Natural language

Formal language vs. Natural language

Ambiguity appears at every linguistic level:
  • phonological: I scream / Ice cream
  • lexical:
      • (river) bank / bank (financial institution)
      • unlockable (impossible to lock / possible to unlock)
      • saw (to see / to saw)
  • syntactic: Mary ate a salad with spinach
      • Mary ate a (salad with spinach)
      • Mary (ate a salad) with spinach
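
The syntactic ambiguity can be made visible by counting the parse trees a grammar assigns to the sentence. The sketch below uses a CKY-style chart over a toy grammar in Chomsky normal form. The grammar and lexicon are assumptions written for this one sentence.

```python
from collections import defaultdict

# Toy grammar (parent, left child, right child); an illustrative assumption.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the PP attaches to the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "Mary": ["NP"], "ate": ["V"], "a": ["Det"],
    "salad": ["N"], "with": ["P"], "spinach": ["NP"],
}


def count_parses(words, start="S"):
    """Count the distinct parse trees of a sentence (CKY with counts)."""
    n = len(words)
    chart = defaultdict(int)  # (i, j, symbol) -> number of derivations
    for i, word in enumerate(words):
        for symbol in LEXICON.get(word, []):
            chart[i, i + 1, symbol] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[i, j, parent] += chart[i, k, left] * chart[k, j, right]
    return chart[0, n, start]


print(count_parses("Mary ate a salad with spinach".split()))
```

The two rules that introduce a PP are what produce the two readings: without "with spinach", the sentence has a single parse.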


slide-48
SLIDE 48

Formal language vs. Natural language

Formal language vs. Natural language

  • semantic:
      • The police were ordered to stop drinking after midnight.
      • A sailor was dancing with a wooden leg.
      • Teacher strikes idle kids
  • anaphoric: Margaret invited Susan for a visit, and she gave her a good lunch.

→ All the above sentences can be correct (but one meaning can be more probable than the other)
The interpretation depends on the context


slide-49
SLIDE 49

Evaluation

How to measure effectiveness of a NLP system?

Evaluation

→ In general, evaluation measures come from Information Retrieval and Machine Learning
Require a gold standard (all the good answers)
Consider the contingency table:

                            Gold standard
                      YES                    NO
System      YES   True Positive (TP)     False Positive (FP)
answers     NO    False Negative (FN)    True Negative (TN)
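
The four cells of the table can be computed with set operations when the system answers and the gold standard are both given as sets of items. The data below is invented for the example, and the function name is hypothetical.

```python
def contingency(system_yes, gold_yes, universe):
    """Return (TP, FP, FN, TN) for a system compared with a gold standard."""
    tp = len(system_yes & gold_yes)             # answered YES, gold says YES
    fp = len(system_yes - gold_yes)             # answered YES, gold says NO
    fn = len(gold_yes - system_yes)             # answered NO, gold says YES
    tn = len(universe - system_yes - gold_yes)  # answered NO, gold says NO
    return tp, fp, fn, tn


# Hypothetical data: ten candidate items identified by a number.
universe = set(range(10))
gold = {0, 1, 2, 3}
system = {2, 3, 4}
counts = contingency(system, gold, universe)
print(counts)
```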


slide-50
SLIDE 50

Evaluation

How to measure effectiveness of a NLP system?

Evaluation

Three measures:
  • Precision: $P_i = \frac{TP_i}{TP_i + FP_i}$
  • Recall: $R_i = \frac{TP_i}{TP_i + FN_i}$
  • F-measure: avoids the difficulty of comparing systems with two measures (weighted harmonic mean of precision and recall): $F_\beta = \frac{(\beta^2 + 1) \times P \times R}{\beta^2 P + R}$ (usually $\beta = 1$)
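
The three measures translate directly into code. In this sketch the counts (TP = 2, FP = 1, FN = 2) are invented, and returning 0 when a denominator is zero is a convention chosen for the example, not something the slides prescribe.

```python
def precision(tp, fp):
    """P = TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """R = TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(p, r, beta=1.0):
    """F_beta = (beta^2 + 1) * P * R / (beta^2 * P + R)."""
    denominator = beta ** 2 * p + r
    return (beta ** 2 + 1) * p * r / denominator if denominator else 0.0


p = precision(tp=2, fp=1)   # 2/3
r = recall(tp=2, fn=2)      # 1/2
f1 = f_measure(p, r)        # 4/7
print(round(p, 3), round(r, 3), round(f1, 3))
```

With β > 1 the measure gives more weight to recall, and with β < 1 more weight to precision.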


slide-51
SLIDE 51

Evaluation

Gold standard

How to build a gold standard?
  • Need for human annotators
  • Time-consuming
  • Not so easy to build for some tasks
  • Annotators can make different choices (inter-annotator agreement varies according to the difficulty of the task)
  • Impossible to build for terabytes of data
  • Alternative: silver standard (combination of the results of several systems)


slide-52
SLIDE 52

Evaluation

Local vs. global evaluation

If there are several classes, effectiveness must be measured at the individual or class level:

Microaveraging: sum over all the individual instances (without taking the class into account)
  • Precision: $P_\mu = \frac{\sum_{i=1}^{|C|} TP_i}{\sum_{i=1}^{|C|} (TP_i + FP_i)}$
  • Recall: $R_\mu = \frac{\sum_{i=1}^{|C|} TP_i}{\sum_{i=1}^{|C|} (TP_i + FN_i)}$

Macroaveraging: evaluation by class (locally), then averaging over the per-class results (globally)
  • Precision: $P_M = \frac{\sum_{i=1}^{|C|} P_i}{|C|}$
  • Recall: $R_M = \frac{\sum_{i=1}^{|C|} R_i}{|C|}$
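
The difference between the two averages shows up as soon as the classes have different sizes. The sketch below uses invented counts for two classes and assumes every denominator is non-zero.

```python
def micro_average(counts):
    """Pool the counts of all classes, then compute P and R once."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    return tp / (tp + fp), tp / (tp + fn)


def macro_average(counts):
    """Compute P and R per class, then average over the classes."""
    precisions = [c["tp"] / (c["tp"] + c["fp"]) for c in counts.values()]
    recalls = [c["tp"] / (c["tp"] + c["fn"]) for c in counts.values()]
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)


# Hypothetical per-class counts: a frequent class A and a rare class B.
counts = {
    "A": {"tp": 8, "fp": 2, "fn": 2},
    "B": {"tp": 1, "fp": 1, "fn": 3},
}
micro_p, micro_r = micro_average(counts)
macro_p, macro_r = macro_average(counts)
print(micro_p, micro_r, macro_p, macro_r)
```

Microaveraging is dominated by the frequent class, while macroaveraging gives the rare class the same weight as the frequent one.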


slide-53
SLIDE 53

Evaluation

Other evaluation metrics

In some applications there is more than one good answer (translation, rewriting, summarization):

  • BLEU (Bilingual Evaluation Understudy)
  • NIST metric
  • METEOR (Metric for Evaluation of Translation with Explicit ORdering)
  • ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
  • ...

The correct answer among the n first ranked answers: P@n (precision among the n first answers)
Evaluation of the accuracy, sensitivity, utility, ...
How to measure the satisfaction of the final users?
Finally, comparing systems requires evaluating the statistical significance of their results (t-test, randomisation testing)
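
P@n can be sketched as the proportion of relevant answers among the n first ranked ones. The ranked list and the set of relevant answers below are invented for the example.

```python
def precision_at_n(ranked, relevant, n):
    """Proportion of relevant answers among the n first ranked answers."""
    top = ranked[:n]
    return sum(1 for answer in top if answer in relevant) / n


# Hypothetical system ranking and gold standard.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}
print(precision_at_n(ranked, relevant, 3), precision_at_n(ranked, relevant, 5))
```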


slide-54
SLIDE 54

Evaluation

Evaluation campaigns and challenges

  • Message Understanding Conferences (MUC): information extraction (1987-1997)
  • Text REtrieval Conference (TREC): information retrieval (since 1992). Several tracks (and new tracks each year) on chemistry-related documents, medical documents, crowdsourcing, etc.
  • Cross-Language Evaluation Forum (CLEF) / Conference and Labs of the Evaluation Forum: since 2000. Several tracks: question answering, web people search, XML retrieval (INEX), reputation management technologies, etc.
  • BioCreative (information extraction in biology): three campaigns since 2004
  • I2B2 NLP Challenge (processing of clinical data): since 2008
  • ...


slide-55
SLIDE 55

Evaluation

Conclusion

How to analyse natural language data automatically?

Need for a lot of varied information:
  • Linguistic information
  • Contextual knowledge (general and specific to the task)

Several steps of analysis based on:
  • Various approaches (formal languages/grammars, rules, machine learning)
  • Linguistic resources (dictionaries)

Evaluation based on data (no formal evaluation)


slide-56
SLIDE 56

Evaluation

Conclusion

Examples of full analysis of a sentence

Types of analysis        The cat eats the mouse
Part-of-speech           DET NOUN VERB DET NOUN
Fundamental structure    Subject Predicate
Constituents             NP VP NP
Functions                Subject Verb Object
Thematic roles           Topic Focus
Semantic roles           Agent Action Patient
Modality                 Assertion
