SLIDE 1

Recycling Named Entity Taggers

Unsupervised Domain and Language Adaptation for Named Entity Recognition based on Parallel Corpora

Master thesis of

Chrysoula Zerva

EPFL supervisor: Dr Martin Rajman
SONY supervisor: Dr Wilhelm Haag

SLIDE 2

Outline

  • Named Entity Recognition task

○ Definition, Process, Evaluation

  • Importance

○ of NER
○ of Language (& domain) adaptation
  • Core System

○ Architecture, Early results, Problems

  • Evaluation

○ Final Results and Error analysis

  • Conclusions
SLIDE 3

Named Entity Recognition

SLIDE 4

Named Entity: Definition

Named entities: Atomic elements (in a text) that consist of one or more consecutive words and belong to predefined categories (labels).

SLIDE 5

Named Entity: Definition

Named entities: Atomic elements (in a text) that consist of one or more consecutive words and belong to predefined categories (labels). Common labels: ORGANISATION, PERSON, LOCATION

SLIDE 6

Named Entity: Definition

Named entities: Atomic elements (in a text) that consist of one or more consecutive words and belong to predefined categories (labels). Common labels: ORGANISATION, PERSON, LOCATION

The word sequence has to refer to a particular representation of the label. For example:

The president failed to explain the new military policy → NO NE
The president Barack Obama failed to explain the new military policy → NE: "Barack Obama" (PERSON)

SLIDE 7

Named Entity Recognition: LABELS

Name expressions:

  • PERSON (people, including fictional): Mr Thomson explained...
  • NORP (nationalities or religious or political groups): The Swiss law prohibits...
  • FACILITY (buildings, airports, highways, bridges): Our reporter at the White House...
  • ORGANIZATION (companies, agencies, institutions): EPFL is located near...
  • GPE (countries, cities, states, administrative areas): Lausanne has a population of...
  • LOCATION (non-GPE locations, mountains, rivers): The situation in the Balkans is...
  • PRODUCT (vehicles, weapons, foods; not services): He is driving an SUV car...
  • EVENT (named hurricanes, battles, sports events): After the second world war the ...
  • WORK OF ART (titles of books, songs): "Lord of the Rings" is a three ...
  • LAW (named documents made into laws): In the European Constitution...
  • LANGUAGE (any named language): English is an international...

SLIDE 8

Named Entity Recognition: LABELS

Time and Date expressions:

  • DATE (absolute or relative dates or periods): Last year the results...
  • TIME (times smaller than a day): Tomorrow at noon ...
  • PERCENT (percentage, including "%"): An estimated 5% of the people...
  • MONEY (monetary values, including unit): A monthly salary of 5000$
  • QUANTITY (measurements, as of weight or distance): It weighs 3 pounds.
  • ORDINAL ("first", "second", etc.): The first time that I ...
  • CARDINAL (numerals that do not fall under another type): At least three people

SLIDE 9

Named Entity Recognition: LABELS

[Chart: label distribution vs F-score performance, comparing the Ontonotes (pre-annotated) and EuroParl (non-annotated) test sets over all labels: ORG, PERSON, CARDINAL, MONEY, ORDINAL, TIME, WORK_OF_ART, FAC, LAW, GPE, DATE, NORP, PERCENT, LOC, QUANTITY, EVENT, PRODUCT, LANGUAGE.]

Choosing criterion: sufficient training resources

SLIDE 10

Named Entity Recognition

Step 1: Named Entity Identification
Step 2: Named Entity Classification

SLIDE 11

Named Entity Recognition

Step 1: Named Entity Identification

Classify every token under the following set of labels (BIOES scheme):

Label   Meaning
B       beginning of NE
I       inside NE
O       outside NE
E       end of NE
S       single NE

Step 2: Named Entity Classification

Classify the tokens that are part of a NE under a given set of predefined labels:

ORGANISATION PERSON LOCATION CARDINAL PERCENT ORDINAL NORP GPE DATE
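To make the two steps concrete, here is a minimal decoding sketch (a Python illustration, not the thesis implementation) that maps a BIOES-tagged token sequence back to labelled NE spans; the function name and the example sentence are assumptions.

```python
# Minimal sketch (illustrative): decoding a BIOES-tagged token
# sequence back into labelled NE spans.

def decode_bioes(tokens, tags):
    """Return (entity_text, label) pairs from BIOES tags like 'B-ORG'."""
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        scheme, _, label = tag.partition("-")
        if scheme == "S":                     # single-token NE
            entities.append((token, label))
        elif scheme == "B":                   # beginning of a multi-token NE
            current = [token]
        elif scheme in ("I", "E") and current:
            current.append(token)
            if scheme == "E":                 # end of NE: emit the whole span
                entities.append((" ".join(current), label))
                current = []
        else:                                 # 'O': outside any NE
            current = []
    return entities

tokens = ["The", "European", "Parliament", "meets", "in", "Strasbourg"]
tags   = ["B-ORG", "I-ORG", "E-ORG", "O", "O", "S-GPE"]
print(decode_bioes(tokens, tags))
# [('The European Parliament', 'ORG'), ('Strasbourg', 'GPE')]
```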

SLIDE 12

Feature Extraction

SLIDE 13

Feature Extraction:

Always performed: before training AND before parsing

Preprocessing:

  • Tokenization
  • Part Of Speech tagging
  • Use of gazetteers, lexicons containing NEs

Feature categories:

  • Character-based (N-grams, Capitalised, All-Capitalised, Special Character, Numeric)
  • Lexical (included in a gazetteer/lexicon, wordForm, left wordForm, right wordForm)
  • Grammatical (Genitive, POS tag, left POS tag, right POS tag)
  • Other (position in sentence, context (sequence of words))

++ Combined Features: pair combinations of the above
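As an illustration of these feature categories, the sketch below collects a feature map for one token; the exact feature names, the tiny gazetteer and the single combined feature shown are assumptions for the example, not the thesis feature set.

```python
# Illustrative sketch of the feature categories listed above.

GAZETTEER = {"Kosovo", "EPFL", "Lausanne"}   # hypothetical NE lexicon

def extract_features(tokens, pos_tags, i):
    word = tokens[i]
    feats = {
        # character-based features
        "capitalised": word[0].isupper(),
        "all_capitalised": word.isupper(),
        "numeric": word.isdigit(),
        "special_char": any(not c.isalnum() for c in word),
        "char_trigrams": [word[j:j + 3] for j in range(max(1, len(word) - 2))],
        # lexical features
        "in_gazetteer": word in GAZETTEER,
        "word_form": word.lower(),
        "left_word_form": tokens[i - 1].lower() if i > 0 else "<S>",
        "right_word_form": tokens[i + 1].lower() if i + 1 < len(tokens) else "</S>",
        # grammatical features
        "pos": pos_tags[i],
        "left_pos": pos_tags[i - 1] if i > 0 else "<S>",
        "right_pos": pos_tags[i + 1] if i + 1 < len(tokens) else "</S>",
        # other
        "position": i,
    }
    # a combined feature: pair combination of two basic features
    feats["pos+capitalised"] = (feats["pos"], feats["capitalised"])
    return feats

tokens = ["We", "are", "dealing", "with", "a", "situation", "in", "Kosovo"]
pos    = ["PRP", "VBP", "VBG", "IN", "DT", "NN", "IN", "NNP"]
print(extract_features(tokens, pos, 7)["in_gazetteer"])   # True
```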

SLIDE 14

Feature extraction: Example

Example sentence: We are dealing with a horrific situation in Kosovo .

[Feature matrix: each token is mapped to a binary feature vector covering N-grams, Right word, Capitalised, Numeric, Genitive, and similarly for the rest of the features, followed by its IDENTIFICATION and CLASSIFICATION labels. All tokens are labelled O / O, except "Kosovo", which is labelled I / GPE.]

SLIDE 15

Evaluation: Metrics and Methods

SLIDE 16

Evaluation: Exact and Partial Matches

Token       Original  Output1  Output2  Output3  Output4  Output5  Output6  Output7
The         B-L1      B-L1     O        O        O        O        B-L2     O
European    I-L1      I-L1     B-L1     O        I-L1     I-L2     I-L2     O
Parliament  E-L1      E-L1     E-L1     I-L1     I-L1     I-L1     E-L2     O

Evaluation metrics: Precision, Recall, F-score
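For reference, the standard definitions of these three metrics, with true positives (TP), false positives (FP) and false negatives (FN) counted over NEs under the chosen matching criterion:

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F\text{-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
```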

SLIDE 17

Evaluation: Exact and Partial Matches

Exact matches: a NE (word sequence) labelled as L1 counts as correctly identified if all tokens in the NE are attributed labelling identical to the original.

Partial matches: a NE (word sequence) labelled as L1 counts as correctly identified if at least one token in the NE is also labelled as L1.
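The two criteria can be sketched as follows (an illustrative Python fragment; representing a NE as a label plus a set of token indices is an assumption, and BIOES prefixes are ignored for brevity):

```python
# Sketch of the two matching criteria above (illustrative data structures:
# a gold NE is a (label, token_index_set) pair; system output is a dict
# mapping token index to predicted label).

def exact_match(gold_ne, system_tags):
    """All tokens of the gold NE carry the identical label in the output."""
    label, indices = gold_ne
    return all(system_tags.get(i) == label for i in indices)

def partial_match(gold_ne, system_tags):
    """At least one token of the gold NE carries the gold label."""
    label, indices = gold_ne
    return any(system_tags.get(i) == label for i in indices)

# "Stockholm on Monday , 6 March": gold DATE span = tokens 2..5
gold = ("DATE", {2, 3, 4, 5})
system = {0: "GPE", 2: "DATE", 4: "DATE", 5: "DATE"}  # token 3 missed
print(exact_match(gold, system), partial_match(gold, system))  # False True
```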

SLIDE 18

Evaluation: Example

Example 1:

Tokens      Original   Attributed
to          O          O
talk        O          O
to          O          O
Minister    O          O
Ringholm    I-PERSON   O
,           O          O
to          O          O
members     O          O
of          O          O
the         B-ORG      O
Swedish     I-ORG      I-NORP
parliament  E-ORG      O
.           O          O

Example 2 (Stockholm → exact + partial match; Monday, 6 March → partial match):

Tokens      Original   Attributed
I           O          O
will        O          O
leave       O          O
for         O          O
Stockholm   I-GPE      I-GPE
on          O          O
Monday      B-DATE     I-DATE
,           I-DATE     O
6           I-DATE     B-DATE
March       E-DATE     E-DATE
,           O          O
in          O          O
order       O          O

SLIDE 19

Importance of NER

...and of Recycling it...

SLIDE 20

Why is efficient NER important?

Applications of NER in NLP

Generally, NER is an important first step in extracting meaningful information from text.

  • Provide keywords for indexing documents

○ news recommenders: document clustering, user profiles
○ document classification/retrieval
○ search engines
○ automated keyword extraction

  • Entities (especially proper names) point to objects about which we need to define relations, roles, events

○ question answering: refers to "grounding" named entities to a model, defining their scope and role
○ semantic parsing
○ coreference resolution

SLIDE 21

Why Recycle?

Need for multilingual NLP applications → multilingual NE recognition

Sufficient resources and tools exist for English, BUT for other languages resources are fewer and expensive…

  • manual annotation requires time and manpower
  • acquiring a new corpus for every adaptation need is not a very flexible method

SLIDE 22

Why Recycle?

Adaptation to other domains is also important:

  • New domains require NER (biology, medicine, scientific texts)
  • Even top scorers in evaluation campaigns fail to perform well on different test sets (drop of 10%-30%) [1],[2]

SLIDE 23

What to Recycle?

Available: one NE tagger trained for English news articles

  • Ontonotes corpus (English news broadcasts)
  • CoNLL 2012 labels
  • F-score performance: 74% - 79% (exact matches)

SLIDE 24

What to Recycle?

Available: one NE tagger trained for English news articles

Used for:

  • news recommender (main application)
  • conference management tool
  • coreference resolution
  • sentiment analysis
SLIDE 25

Recycling Scheme: Core System Architecture

SLIDE 26

Recycling Scheme: Core System Architecture

[System diagram: the Source Language NE tagger annotates the source corpus (SC); the annotations are transferred SC → TC; the target corpus (TC) then yields the Target Language NE tagger.]

SLIDE 27

Recycling Scheme: Core System Architecture

[Diagram: the existing NE tagger annotates the Source Corpus (SC); the Target Corpus (TC) is the parallel side of the European Parliament Proceedings (EuroParl), English - French and English - Greek.]

SLIDE 28

Recycling Scheme: Core System Architecture

[Pipeline diagram:
Phase 1: Train the Source Language NE Tagger on a Manually Annotated Source Language corpus.
Phase 2: Parse the Source Language Parallel corpus with the tagger and transfer the NEs to the Target Language Parallel corpus.
Phase 3: Train the Target Language NE Tagger on the resulting annotated Target Language Parallel corpus.]
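The NE-transfer step of Phase 2 can be sketched as follows, assuming word alignments come as (source index, target index) pairs, e.g. from a statistical word aligner; this shows only the general idea, not the thesis algorithm (which must also rebuild BIOES boundaries on the target side).

```python
# Sketch of Phase 2: projecting NE labels from the parsed source side of the
# parallel corpus to the target side through word alignments. The alignment
# format and the direct one-to-one projection are assumptions.

def transfer_labels(src_labels, alignment, tgt_len):
    """src_labels: per-token NE labels on the source sentence ('O' or a label).
    alignment: iterable of (src_idx, tgt_idx) word-alignment pairs.
    Returns per-token labels for the target sentence (BIOES not rebuilt)."""
    tgt_labels = ["O"] * tgt_len
    for src_idx, tgt_idx in alignment:
        if src_labels[src_idx] != "O":
            tgt_labels[tgt_idx] = src_labels[src_idx]
    return tgt_labels

# "the European Parliament" -> "le Parlement europeen" (word order differs)
src = ["O", "ORG", "ORG"]
align = [(0, 0), (1, 2), (2, 1)]
print(transfer_labels(src, align, 3))   # ['O', 'ORG', 'ORG']
```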

SLIDE 29

Early Results

SLIDE 30

Early Results:

                    Exact Match                   Partial Match
                    Precision  Recall  F-score    Precision  Recall  F-score
English EuroParl    69.06      67.3    68.17      87.5       73.3    80.01
French EuroParl     63.23      53.41   57.91      74.88      74.05   74.46
Greek EuroParl      50.77      45.18   47.81      68.34      75.76   71.86
English Ontonotes   80.24      78.81   79.52      83.2       96.16   89.21

We also need to adapt to other domains...

SLIDE 31

Domain Adaptation

SLIDE 32

Not only in NER

Domain adaptation is necessary in various machine learning approaches:

  • In multiple NLP tasks:

○ spam filtering (adapt to new users)
○ semantic parsing
○ syntactic parsing
○ speech recognition
○ sentiment analysis

  • And in other tasks:

○ computer vision applications (image recognition etc.)
○ note recognition/music processing
○ localisation problems (indoor WiFi localization)

SLIDE 33

What is domain difference?

Different domains ⇒ different instances

○ Generalists → appear in both domains in the same way/context, easy to classify

■ UNICEF is a non-governmental organisation

○ Bridges → appear in both domains in different ways/contexts, not always easy to correctly classify

■ Buyers also have to pay a 10 percent commission to StubHub (Ontonotes)
■ The Commission cannot be held responsible for the situation (Europarl)

○ Specialists → appear exclusively in one domain, hard to identify in the other

■ Berger Report (A5-0017/2000): (Europarl)

SLIDE 34

What is domain difference?

  • Different sentence structure (syntax)

○ EuroParl: formal speech, common use of 1st (2nd) person
○ Ontonotes: less formal speech, more common use of 3rd person

  • Average sentence length:

○ Ontonotes: 21 words per sentence
○ EuroParl: 30 words per sentence

SLIDE 35

Improving the initial system: Identifying the error sources

SLIDE 36

Improving the initial architecture

[Pipeline diagram as in Slide 28 (Phases 1-3: Train → Parse → Align NEs → Train), annotated with the error sources: Domain Difference, Alignment errors, Translation Errors and Language Difference.]

SLIDE 37

Final Architecture & Results

SLIDE 38

Final Architecture

SLIDE 39

Final Results

SLIDE 40

How to evaluate: What to compare

In both cases we want to compare with the performance BEFORE the error source

[Pipeline diagram as in Slide 28 (Phases 1-3: Train → Parse → Align NEs → Train), annotated with the two error sources: Domain Difference and Language Difference.]

SLIDE 41

Final Results

ENGLISH:

Exact matches:
                               Precision  Recall  F-score
EuroParl Initial Architecture  69.06      67.3    68.17
EuroParl Final Architecture    73         68.45   70.65
Ontonotes                      80.24      78.81   79.52

Partial matches:
                               Precision  Recall  F-score
EuroParl Initial Architecture  87.5       73.3    80.01
EuroParl Final Architecture    86.22      78.66   82.27
Ontonotes                      83.2       96.16   89.21

SLIDE 42

Final Results

FRENCH:

Exact matches:
                 Precision  Recall  F-score
initial French   63.23      53.41   57.91
final French     71.71      64.04   67.66
final English    73         68.45   70.65

Partial matches:
                 Precision  Recall  F-score
initial French   74.88      74.05   74.46
final French     79.06      83.49   81.21
final English    78.66      86.22   82.27

GREEK:

Exact matches:
                 Precision  Recall  F-score
initial Greek    50.77      45.18   47.81
final Greek      63.65      56.20   59.69
final English    73         68.45   70.65

Partial matches:
                 Precision  Recall  F-score
initial Greek    68.34      75.76   71.86
final Greek      74.96      79.34   77.09
final English    78.66      86.22   82.27

SLIDE 43

Final Results: Focusing on labels

PER LABEL RESULTS (English):

Exact matches:
          Precision  Recall  F-score
CARDINAL  88.06      74.68   80.82
DATE      74.42      71.91   73.14
GPE       83.33      77.32   80.21
LOC       89.36      79.25   84
NORP      67.5       80.6    73.47
ORDINAL   89.13      93.18   91.11
ORG       58.17      54.06   56.04
PERCENT   85.71      85.71   85.71
PERSON    86.27      65.67   74.58
OVERALL   73         68.45   70.65

Partial matches:
          Recall  Precision  F-score
CARDINAL  88.24   75         81.08
DATE      90.8    87.78      89.27
GPE       91.01   83.51      87.1
LOC       93.62   83.02      88
NORP      73.75   86.76      79.73
ORDINAL   89.13   93.18      91.11
ORG       77.47   69.26      73.13
PERCENT   100     100        100
PERSON    90      68.18      77.59
OVERALL   78.66   86.22      82.27

SLIDE 44

Final Results: Focusing on labels

PER LABEL RESULTS (English, exact matches):

          Precision  Recall  F-score
CARDINAL  88.06      74.68   80.82
DATE      74.42      71.91   73.14
GPE       83.33      77.32   80.21
LOC       89.36      79.25   84
NORP      67.5       80.6    73.47
ORDINAL   89.13      93.18   91.11
ORG       58.17      54.06   56.04
PERCENT   85.71      85.71   85.71
PERSON    86.27      65.67   74.58
OVERALL   73         68.45   70.65

SLIDE 45

Final Results: ORG Error patterns

1. Non-ORG capitalised entities (domain specific):

E.g.: MEDA program (Europarl), UNICEF (Ontonotes)

2. ORG entities (bridges?):

E.g.: "Commission" is considered an ORG only in EuroParl:

■ Buyers also have to pay a 10 percent commission to StubHub (Ontonotes)
■ The Commission cannot be held responsible for the situation (Europarl)

SLIDE 46

Final Results: ORG Error patterns

3. Very long ORG entities:

4. Confusion with NORP:

SLIDE 47

Final Results: ORG Error patterns

The most frequently occurring error patterns are related to word sequences that appear in both domains but with different roles (bridges).

SLIDE 48

Final Results: Error Propagation

PER LABEL RESULTS (exact matches):

ENGLISH:
          Precision  Recall  F-score
CARDINAL  88.06      74.68   80.82
DATE      74.42      71.91   73.14
GPE       83.33      77.32   80.21
LOC       89.36      79.25   84
NORP      67.5       80.6    73.47
ORDINAL   89.13      93.18   91.11
ORG       58.17      54.06   56.04
PERCENT   85.71      85.71   85.71
PERSON    86.27      65.67   74.58
OVERALL   73         68.45   70.65

FRENCH:
          Precision  Recall  F-score
CARDINAL  74.07      64.52   68.97
DATE      66.67      57.47   61.73
GPE       76.14      73.63   74.86
LOC       68.09      60.38   64
NORP      75.31      84.72   79.74
ORDINAL   79.07      89.47   83.95
ORG       64.89      53.09   58.4
PERCENT   100        71.43   83.33
PERSON    88         69.84   77.88
OVERALL   71.71      64.04   67.66

GREEK:
          Precision  Recall  F-score
CARDINAL  64.58      73.81   68.89
DATE      39.51      39.51   39.51
GPE       63.1       52.48   57.3
LOC       76.09      68.63   72.16
NORP      64.56      72.86   68.46
ORDINAL   69.44      80.65   74.63
ORG       64.85      46.79   54.36
PERCENT   100        100     100
PERSON    74.14      68.25   71.07
OVERALL   63.65      56.2    59.69

SLIDE 49

Final Results: Focusing on Greek

PER LABEL RESULTS (Greek, exact matches):

          Precision  Recall  F-score
CARDINAL  64.58      73.81   68.89
DATE      39.51      39.51   39.51
GPE       63.1       52.48   57.3
LOC       76.09      68.63   72.16
NORP      64.56      72.86   68.46
ORDINAL   69.44      80.65   74.63
ORG       64.85      46.79   54.36
PERCENT   100        100     100
PERSON    74.14      68.25   71.07
OVERALL   63.65      56.2    59.69

The DATE label includes expressions whose structure is not easy to define, such as: "daily", "five years ago", "the last five months", etc.

Conclusion: alignment error may increase for NEs that depend more on language structure → classification error may increase as well.

SLIDE 50

Final Results: Focusing on Greek

Exact matches:
          Precision  Recall  F-score
CARDINAL  64.58      73.81   68.89
DATE      39.51      39.51   39.51
GPE       63.1       52.48   57.3
LOC       76.09      68.63   72.16
NORP      64.56      72.86   68.46
ORDINAL   69.44      80.65   74.63
ORG       64.85      46.79   54.36
PERCENT   100        100     100
PERSON    74.14      68.25   71.07
OVERALL   63.65      56.2    59.69

Partial matches:
          Precision  Recall  F-score
CARDINAL  66.67      76.19   71.11
DATE      77.33      72.5    74.84
GPE       83.53      70.3    76.34
LOC       84.78      76.47   80.41
NORP      70.89      80      75.17
ORDINAL   66.67      83.87   74.29
ORG       84.77      59.64   70.02
PERCENT   100        100     100
PERSON    79.31      73.02   76.03
OVERALL   74.96      79.34   77.09

SLIDE 51

Final Results: More on label confusion

Confusion matrix (Exact matches):

Rows: original label. Columns in order O, CARD, DATE, GPE, LOC, NORP, ORD, ORG, PERC, PERS (row percentages; zero cells omitted):

O      99.44  0.03   0.06   0.03   0.04   0.03   0.36   0.01   0.01
CARD   22.62  76.19  1.19
DATE   18.5   3.47   76.3   1.73
GPE    8.59   74.22  0.78   0.78   12.5   3.13
LOC    12.16  2.7    67.57  6.76   10.81
NORP   1.45   1.45   1.45   2.9    86.96  4.35   1.45
ORD    4.55   2.27   93.18
ORG    29.17  0.73   0.44   3.48   66.18
PERC   100
PERS   13.92  6.33   12.66  67.09
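A row-normalised confusion matrix like this one can be computed from per-token original vs attributed labels roughly as follows (illustrative sketch):

```python
# Sketch: build a row-normalised (percentage) confusion matrix from
# parallel lists of original and attributed per-token labels.

from collections import Counter, defaultdict

def confusion_matrix(original, attributed):
    counts = defaultdict(Counter)
    for gold, pred in zip(original, attributed):
        counts[gold][pred] += 1
    # normalise each row so that its entries sum to 100
    return {gold: {pred: 100.0 * n / sum(row.values())
                   for pred, n in row.items()}
            for gold, row in counts.items()}

original   = ["O", "ORG", "ORG", "O", "GPE"]
attributed = ["O", "ORG", "O",   "O", "GPE"]
print(confusion_matrix(original, attributed)["ORG"])
# {'ORG': 50.0, 'O': 50.0}
```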

SLIDE 52
Final Results: More on label confusion

  • Confusion of Labels I: ORG-NORP
  • Confusion of Labels II: PERSONS

"The Fisher report states…" vs "Mr Fisher explained…"

SLIDE 53
Final Results: More on label confusion

  • Confusion of Labels III: LOC-NORP
  • Confusion of Labels IV: LOC-ORG

SLIDE 54

Final Results: Boundary Identification

Revisiting the Confusion matrix (Exact matches):

Rows: original label. Columns in order O, CARD, DATE, GPE, LOC, NORP, ORD, ORG, PERC, PERS (row percentages; zero cells omitted):

O      99.44  0.03   0.06   0.03   0.04   0.03   0.36   0.01   0.01
CARD   22.62  76.19  1.19
DATE   18.5   3.47   76.3   1.73
GPE    8.59   74.22  0.78   0.78   12.5   3.13
LOC    12.16  2.7    67.57  6.76   10.81
NORP   1.45   1.45   1.45   2.9    86.96  4.35   1.45
ORD    4.55   2.27   93.18
ORG    29.17  0.73   0.44   3.48   66.18
PERC   100
PERS   13.92  6.33   12.66  67.09

SLIDE 55

Final Results: Boundary Identification

Identification evaluation:

                         Precision  Recall  F-score
ENGLISH  Exact Match     77.01      72.01   74.42
         Partial Match   91.47      86.22   88.77
FRENCH   Exact Match     76.91      68.77   72.61
         Partial Match   95.1       83.47   88.9
GREEK    Exact Match     68.8       60.91   64.62
         Partial Match   92.01      79.34   85.21

→ Significant boundary identification error

SLIDE 56

Final Results: Boundary Identification

Incorrect Boundary examples:

SLIDE 57

Final Results: Boundary Identification

Questions:

  • How much does the error in the identification phase affect the error in the classification phase? → Hard to evaluate
  • How important is exact matching in terms of obtaining valuable results? Does it make sense to focus on partial matches? → Depends on the application

SLIDE 58

Conclusions:

  • Language Adaptation:

○ Overall satisfactory results with parallel corpora and alignment

■ French: less than 3% underperformance (compared to English)
■ Greek: significantly lower: 10% underperformance (compared to English)

○ Still room for improvement

■ perhaps combine alignment methods
■ use different translation lexicons/methods
■ experiment with different POS taggers and lemmatisers (Greek)

SLIDE 59

Conclusions:

  • Domain Adaptation:

○ Less satisfactory results

■ Most significant improvement with instance selection
■ Improving methods do not combine in an additive way
■ Could be useful as a preprocessing step, but there is room for improvement

○ It seems that fully unsupervised domain adaptation is not an easy task, but compromising with a semi-supervised idea could help

■ Exploit deep architectures
■ Active learning: supply correct examples
■ Experiment more with external knowledge sources?

SLIDE 60

References:

[1] Massimiliano Ciaramita and Yasemin Altun. Named-entity recognition in novel domains with external lexical knowledge. In Advances in Structured Learning for Text and Speech Processing Workshop, 2005.

[2] Thierry Poibeau and Leila Kosseim. Proper name extraction from non-journalistic texts. Language and Computers, 37(1):144-157, 2001.

[3] Erik F. Tjong Kim Sang and Fien De Meulder. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, Volume 4 (CoNLL '03).

[4] David Nadeau and Satoshi Sekine. A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1), 2007.

SLIDE 61

Thank you!!! Questions?

SLIDE 62

Additional Slides

SLIDE 63

How to evaluate?

SLIDE 64

How to evaluate: Baseline

Hard to identify a valid baseline method. Most commonly used: consider only the complete, unambiguous named entities that appear in the training data.

CoNLL 2003 (English): 59.61% [3]
MUC-6 (English): 21% [4]
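A minimal sketch of this baseline, which memorises the unambiguous entities of the training data and re-tags exact re-occurrences by greedy longest match (the data format and the max_len cut-off are assumptions, not the campaign implementations):

```python
# Sketch of the baseline described above: keep only the complete,
# unambiguous NEs seen in training, then tag exact re-occurrences.

def build_baseline(train_entities):
    """train_entities: iterable of (entity_tokens_tuple, label) pairs.
    Keeps only entities that always received a single label (unambiguous)."""
    seen = {}
    for tokens, label in train_entities:
        seen.setdefault(tokens, set()).add(label)
    return {tokens: labels.pop() for tokens, labels in seen.items()
            if len(labels) == 1}

def tag(sentence, lexicon, max_len=5):
    """Greedy longest-match lookup of memorised entities in a test sentence."""
    tags, i = ["O"] * len(sentence), 0
    while i < len(sentence):
        for n in range(min(max_len, len(sentence) - i), 0, -1):
            span = tuple(sentence[i:i + n])
            if span in lexicon:
                tags[i:i + n] = [lexicon[span]] * n
                i += n
                break
        else:                       # no memorised entity starts here
            i += 1
    return tags

lexicon = build_baseline([(("European", "Parliament"), "ORG"),
                          (("Kosovo",), "GPE")])
print(tag(["The", "European", "Parliament", "discussed", "Kosovo"], lexicon))
# ['O', 'ORG', 'ORG', 'O', 'GPE']
```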

SLIDE 65

Baseline

A baseline SL method that is often proposed consists of tagging words of a test corpus when they are annotated as entities in the training corpus. The performance of the baseline system depends on the vocabulary transfer, i.e. the proportion of words, without repetitions, appearing in both the training and the testing corpus.

  • D. Palmer and Day (1997) calculated the vocabulary transfer on the MUC-6 training data. They report a transfer of 21%, with as much as 42% of location names being repeated but only 17% of organizations and 13% of person names. Vocabulary transfer is a good indicator of the recall (number of entities identified over the total number of entities) of the baseline system, but it is a pessimistic measure since some entities are frequently repeated in documents.

  • A. Mikheev et al. (1999) precisely calculated the recall of the baseline system on the MUC-7 corpus. They report a recall of 76% for locations, 49% for organizations and 26% for persons, with precision ranging from 70% to 90%. Whitelaw and Patrick (2003) report consistent results on MUC-7 for the aggregated enamex class. For the three enamex types together, the precision of recognition is 76% and the recall is 48%.

Reference: Nadeau, David, and Satoshi Sekine. "A survey of named entity recognition and classification." Lingvisticae Investigationes 30.1 (2007).
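Under one natural reading of the vocabulary-transfer definition above (normalising by the distinct words of the test corpus), the measure reduces to a set intersection; a small sketch:

```python
# Sketch of the vocabulary-transfer measure: the proportion of distinct
# test-corpus words that already appear in the training corpus. The
# normalisation by the test vocabulary is an assumption of this sketch.

def vocabulary_transfer(train_words, test_words):
    train, test = set(train_words), set(test_words)
    return len(train & test) / len(test) if test else 0.0

train = ["Kosovo", "Parliament", "Commission", "Stockholm"]
test  = ["Kosovo", "Parliament", "Ringholm", "StubHub"]
print(vocabulary_transfer(train, test))   # 0.5
```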

SLIDE 66

How to evaluate: Baseline

English:

Exact matches:
          Precision  Recall  F-score
CARDINAL  20.44      70.89   31.73
DATE      39.58      64.04   48.93
GPE       40.91      74.23   52.75
LOC       63.79      69.81   66.67
NORP      65.75      71.64   68.57
ORDINAL   84.21      72.73   78.05
ORG       40.28      30.74   34.87
PERCENT   83.33      71.43   76.92
PERSON    23.68      13.24   16.98
OVERALL   39.39      51.21   44.53

Partial matches:
          Recall  Precision  F-score
CARDINAL  29.38   77.5       42.61
DATE      58.57   91.11      71.3
GPE       42.37   77.32      54.74
LOC       75.86   83.02      79.28
NORP      65.33   72.06      68.53
ORDINAL   84.21   72.73      78.05
ORG       93.33   49.47      64.67
PERCENT   85.71   85.71      85.71
PERSON    28.95   16.42      20.95
OVERALL   48.31   65.86      55.73

SLIDE 67

How to evaluate: Baseline

French:

Exact matches:
          Precision  Recall  F-score
CARDINAL  22.22      6.45    10
DATE      56.52      14.94   23.64
GPE       56         15.38   24.14
LOC       50         37.74   43.01
NORP      -          -       -
ORDINAL   100        5.26    10
ORG       1.49       0.73    0.98
PERCENT   100        71.43   83.33
PERSON    25.71      14.06   18.18
OVERALL   24.47      9.22    13.4

Partial matches:
          Recall  Precision  F-score
CARDINAL  50      14.52      22.5
DATE      95.65   25.29      40
GPE       88      24.18      37.93
LOC       97.44   73.08      83.52
NORP      -       -          -
ORDINAL   100     5.26       10
ORG       58.33   28.21      38.02
PERCENT   100     71.43      83.33
PERSON    34.29   18.75      24.24
OVERALL   68.12   27.92      39.61

SLIDE 68

How to evaluate: Baseline

Greek:

Exact matches:
          Precision  Recall  F-score
CARDINAL  71.43      23.26   35.09
DATE      44.44      14.81   22.22
GPE       -          -       -
LOC       -          -       -
NORP      -          -       -
ORDINAL   -          -       -
ORG       -          -       -
PERCENT   100        71.43   83.33
PERSON    75         14.06   23.68
OVERALL   38.71      4.95    8.77

Partial matches:
          Recall  Precision  F-score
CARDINAL  71.43   23.26      35.09
DATE      92.59   31.25      46.73
GPE       -       -          -
LOC       -       -          -
NORP      -       -          -
ORDINAL   -       -          -
ORG       5.71    0.71       1.27
PERCENT   100     71.43      83.33
PERSON    100     18.75      31.58
OVERALL   59.34   8.24       14.47

SLIDE 69

How to evaluate: What to compare

Domain adaptation:

  • Need to evaluate: the English tagger on EuroParl
  • Need to assess: the adaptation to the new domain
  • Compare with: the performance of the English tagger on Ontonotes

Language adaptation:

  • Need to evaluate: the French and Greek taggers
  • Need to assess: the knowledge transfer (NE transfer) across languages
  • Compare with: the performance of the English tagger on EuroParl