Recycling Named Entity Taggers
Unsupervised Domain and Language Adaptation for Named Entity Recognition based on Parallel Corpora
Master thesis of
Chrysoula Zerva
EPFL supervisor: Dr Martin Rajman
SONY supervisor: Dr Wilhelm Haag
Outline
○ Named Entity Recognition: Definition, Process, Evaluation
○ Applications of NER
○ Motivation: Domain and Language Adaptation
○ Architecture, Early results, Problems
○ Final Results and Error analysis
Named entities: atomic elements (in a text) that consist of one or more consecutive words and belong to predefined categories (labels).
Common labels: ORGANISATION, PERSON, LOCATION
The word sequence has to refer to a particular instance of the label. For example:
"The president failed to explain the new military policy" → no NE
"The president Barack Obama failed to explain the new military policy" → "Barack Obama" is a PERSON NE
Name expressions:
PERSON (people, including fictional): Mr Thomson explained...
NORP (nationalities or religious or political groups): The Swiss law prohibits...
FACILITY (buildings, airports, highways, bridges): Our reporter at the White House...
ORGANIZATION (companies, agencies, institutions): EPFL is located near...
GPE (countries, cities, states, administrative areas): Lausanne has a population of...
LOCATION (non-GPE locations, mountains, rivers): The situation in the Balkans is...
PRODUCT (vehicles, weapons, foods; not services): He is driving an SUV car...
EVENT (named hurricanes, battles, sports events): After the second world war the...
WORK OF ART (titles of books, songs): "Lord of the Rings" is a three...
LAW (named documents made into laws): In the European Constitution...
LANGUAGE (any named language): English is an international...
Time and date expressions:
DATE (absolute or relative dates or periods): Last year the results...
TIME (times smaller than a day): Tomorrow at noon...
PERCENT (percentages, including "%"): An estimated 5% of the people...
MONEY (monetary values, including unit): A monthly salary of 5000$
QUANTITY (measurements, as of weight or distance): It weighs 3 pounds.
ORDINAL ("first", "second", etc.): The first time that I...
CARDINAL (numerals that do not fall under another type): At least three people
[Chart: label distribution vs F-score performance. Compares label frequencies in Ontonotes (pre-annotated), Europarl (non-annotated) and the Europarl test set against per-label F-scores on Europarl and Ontonotes, for the labels ORG, PERSON, CARDINAL, MONEY, ORDINAL, TIME, WORK_OF_ART, FAC, LAW, GPE, DATE, NORP, PERCENT, LOC, QUANTITY, EVENT, PRODUCT and LANGUAGE.]
Choosing criterion: sufficient training resources.
The NER process consists of two steps: identification, then classification.
Step 1: Named Entity Identification
Classify every token under the following set of labels (BIOES scheme):
B: beginning of NE
I: inside NE
O: outside any NE
E: end of NE
S: single-token NE
Step 2: Named Entity Classification
Classify the tokens that are part of a NE under a given set of predefined labels:
ORGANISATION, PERSON, LOCATION, CARDINAL, PERCENT, ORDINAL, NORP, GPE, DATE
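For illustration, a minimal sketch (hypothetical, not the thesis implementation) of how entity spans map to BIOES tags in the identification step:

```python
# Minimal sketch: converting NE span annotations into per-token BIOES tags.

def to_bioes(tokens, entities):
    """entities: list of (start, end, label) token spans, end exclusive."""
    tags = ["O"] * len(tokens)               # O: token outside any NE
    for start, end, label in entities:
        if end - start == 1:
            tags[start] = f"S-{label}"       # S: single-token NE
        else:
            tags[start] = f"B-{label}"       # B: beginning of NE
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"       # I: inside NE
            tags[end - 1] = f"E-{label}"     # E: end of NE
    return tags

tokens = "The president Barack Obama failed to explain the policy".split()
print(to_bioes(tokens, [(2, 4, "PERSON")]))
# ['O', 'O', 'B-PERSON', 'E-PERSON', 'O', 'O', 'O', 'O', 'O']
```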
Preprocessing: always performed, both before training AND before parsing.
Feature categories: n-grams, neighbouring words (e.g. the right word), capitalisation, numeric form, genitive, ...
++ Combined features: pair combinations of the above
Example: "We deal with a horrific story in Kosovo"
[Feature-matrix illustration: each token of the sentence "We are dealing with a horrific situation in Kosovo ." is encoded as a binary feature vector (n-grams, right word, capitalised, numeric, genitive, and similarly for the rest of the features); each token receives an identification and a classification label, here O / O for every token except "Kosovo", labelled I / GPE.]
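As an illustration of how such binary vectors can be produced, here is a minimal sketch; the feature template below (capitalisation, numeric form, genitive, right word, a character n-gram) is a simplified stand-in for the thesis feature set:

```python
# Minimal sketch, assuming a simple binary feature template per token;
# the actual feature set is richer and includes combined (paired) features.

def token_features(tokens, i):
    w = tokens[i]
    return {
        "capitalised": w[0].isupper(),       # first character is upper-case
        "numeric": w.isdigit(),              # token is a numeral
        "genitive": w.endswith("'s"),        # token is in genitive form
        "right_word=" + (tokens[i + 1] if i + 1 < len(tokens) else "<END>"): True,
        "prefix3=" + w[:3]: True,            # a character n-gram feature
    }

sent = "We are dealing with a horrific situation in Kosovo .".split()
for i, w in enumerate(sent):
    print(w, sorted(k for k, v in token_features(sent, i).items() if v))
```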
Possible outputs for a NE originally labelled L1:

Token       Original  Out1  Out2  Out3  Out4  Out5  Out6  Out7
The         B-L1      B-L1  O     O     O     O     B-L2  O
European    I-L1      I-L1  B-L1  O     I-L1  I-L2  I-L2  O
Parliament  E-L1      E-L1  E-L1  I-L1  I-L1  I-L1  E-L2  O
Evaluation metrics: Precision, Recall, F-score
Exact match (correctly identified NE): assuming a NE (word sequence) labelled as L1, all tokens in the NE are attributed exactly the same labelling as in the original annotation.
Partial match (correctly identified NE): assuming a NE (word sequence) labelled as L1, at least one token in the NE is also labelled as L1.
Partial match example:

Tokens      Original   Attributed
to          O          O
talk        O          O
to          O          O
Minister    O          O
Ringholm    I-PERSON   O
,           O          O
to          O          O
members     O          O
of          O          O
the         B-ORG      O
Swedish     I-ORG      I-NORP
parliament  E-ORG      O
.           O          O

Exact + partial match example:

Tokens      Original   Attributed
I           O          O
will        O          O
leave       O          O
for         O          O
Stockholm   I-GPE      I-GPE
on          O          O
Monday      B-DATE     I-DATE
,           I-DATE     O
6           I-DATE     B-DATE
March       E-DATE     E-DATE
,           O          O
in          O          O
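A minimal sketch of the two matching criteria, assuming per-token BIOES-style label sequences (hypothetical helper, not the thesis scorer):

```python
# Minimal sketch of the exact/partial matching criteria described above.

def match_type(gold_tags, pred_tags, start, end, label):
    """Check a gold NE spanning tokens [start, end) with label `label`."""
    gold = gold_tags[start:end]
    pred = pred_tags[start:end]
    if gold == pred:
        return "exact"        # every token carries the identical labelling
    if any(p.endswith("-" + label) for p in pred):
        return "partial"      # at least one token is also labelled `label`
    return "miss"

gold = ["B-DATE", "I-DATE", "I-DATE", "E-DATE"]   # Monday , 6 March
pred = ["I-DATE", "O",      "B-DATE", "E-DATE"]
print(match_type(gold, pred, 0, 4, "DATE"))       # -> 'partial'
```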
Applications of NER in NLP
Generally, NER is an important first step in extracting meaningful information from text.
○ news recommenders: document clustering, user profiles
○ document classification/retrieval
○ search engines
○ automated keyword extraction
○ information extraction: relations, roles, events
○ question answering: refers to "grounding" named entities to a model, defining their scope and role
○ semantic parsing
○ coreference resolution
Need for multilingual NLP applications → multilingual NE recognition.
There are sufficient resources and tools for English, BUT for other languages resources are fewer and expensive…
Manual annotation requires time and manpower.
Acquiring a new corpus for every adaptation need is not a very flexible method.
Adaptation to other domains is also important:
New domains require NER (biology, medicine, scientific texts).
Even top scorers in evaluation campaigns fail to perform well on different test sets (drop of 10%-30%) [1],[2].
Available: one NE tagger trained on the Ontonotes corpus (English news broadcasts, CoNLL-2012 labels).
F-score performance: 74%-79% (exact matches).
Transfer approach: Source Language NE tagger → (SC → TC transfer) → Target Language NE tagger.
The existing NE tagger, trained on the Source Corpus (SC), is applied to the source side of a parallel corpus, the European Parliament Proceedings (EuroParl): English-French and English-Greek. The NEs are then transferred to the Target Corpus (TC).
Phase 1: train the Source Language NE tagger on a manually annotated source-language corpus.
Phase 2: parse the source-language side of the parallel corpus and transfer the NEs to the target-language side.
Phase 3: train the Target Language NE tagger on the transferred annotations.
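A minimal sketch of the transfer step, assuming word alignments are available as index pairs (e.g. from an alignment tool such as GIZA++); the span-projection heuristic below is an illustration, not the exact thesis procedure:

```python
# Minimal sketch: projecting source-side NE spans onto the target side
# of a parallel sentence through word alignments.

def project_ne(src_spans, alignment, n_target_tokens):
    """src_spans: list of (start, end, label) on the source side;
    alignment: list of (source_index, target_index) pairs."""
    target_labels = ["O"] * n_target_tokens
    align = {}                       # source index -> aligned target indices
    for s, t in alignment:
        align.setdefault(s, []).append(t)
    for start, end, label in src_spans:
        targets = sorted(t for s in range(start, end) for t in align.get(s, []))
        if targets:
            # label the contiguous target span covering all aligned tokens
            for t in range(targets[0], targets[-1] + 1):
                target_labels[t] = label
    return target_labels

# "European Parliament" (source tokens 1-2) aligned to "Parlement européen"
print(project_ne([(1, 3, "ORG")], [(0, 0), (1, 2), (2, 1)], 3))
# -> ['O', 'ORG', 'ORG']
```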
                   Exact Match                   Partial Match
                   Precision  Recall  F-score    Precision  Recall  F-score
English EuroParl   69.06      67.3    68.17      87.5       73.3    80.01
French EuroParl    63.23      53.41   57.91      74.88      74.05   74.46
Greek EuroParl     50.77      45.18   47.81      68.34      75.76   71.86
English Ontonotes  80.24      78.81   79.52      83.2       96.16   89.21
Domain adaptation is necessary in various machine learning approaches:
○ spam filtering (adapt to new users)
○ semantic parsing
○ syntactic parsing
○ speech recognition
○ sentiment analysis
○ computer vision applications (image recognition etc.)
○ note recognition / music processing
○ localisation problems (indoor WiFi localisation)
Different domains ⇒ different instances
○ Generalists → appear in both domains in the same way/context; easy to classify
■ UNICEF is a non-governmental organisation
○ Bridges → appear in both domains in different ways/contexts; not always easy to classify correctly
■ Buyers also have to pay a 10 percent commission to StubHub (Ontonotes)
■ The Commission cannot be held responsible for the situation (Europarl)
○ Specialists → appear exclusively in one domain; hard to identify in the other
■ Berger Report (A5-0017/2000): (Europarl)
Style:
○ EuroParl: formal speech, common use of 1st (and 2nd) person
○ Ontonotes: less formal speech, more common use of 3rd person
Sentence length:
○ Ontonotes: 21 words per sentence
○ EuroParl: 30 words per sentence
Revisiting the pipeline (Phases 1-3 above), errors enter at two points: domain difference, and language difference (NE alignment errors and translation errors).
In both cases we want to compare against the performance BEFORE the error source is introduced.
ENGLISH:

Exact matches:
                               Precision  Recall  F-score
EuroParl Initial Architecture  69.06      67.3    68.17
EuroParl Final Architecture    73         68.45   70.65
Ontonotes                      80.24      78.81   79.52

Partial matches:
                               Precision  Recall  F-score
EuroParl Initial Architecture  87.5       73.3    80.01
EuroParl Final Architecture    78.66      86.22   82.27
Ontonotes                      83.2       96.16   89.21
FRENCH:

Exact matches:
                Precision  Recall  F-score
initial French  63.23      53.41   57.91
final French    71.71      64.04   67.66
final English   73         68.45   70.65

Partial matches:
                Precision  Recall  F-score
initial French  74.88      74.05   74.46
final French    79.06      83.49   81.21
final English   78.66      86.22   82.27

GREEK:

Exact matches:
                Precision  Recall  F-score
initial Greek   50.77      45.18   47.81
final Greek     63.65      56.20   59.69
final English   73         68.45   70.65

Partial matches:
                Precision  Recall  F-score
initial Greek   68.34      75.76   71.86
final Greek     74.96      79.34   77.09
final English   78.66      86.22   82.27
PER-LABEL RESULTS, English:

Exact matches:
          Precision  Recall  F-score
CARDINAL  88.06      74.68   80.82
DATE      74.42      71.91   73.14
GPE       83.33      77.32   80.21
LOC       89.36      79.25   84
NORP      67.5       80.6    73.47
ORDINAL   89.13      93.18   91.11
ORG       58.17      54.06   56.04
PERCENT   85.71      85.71   85.71
PERSON    86.27      65.67   74.58
OVERALL   73         68.45   70.65

Partial matches:
          Precision  Recall  F-score
CARDINAL  88.24      75      81.08
DATE      90.8       87.78   89.27
GPE       91.01      83.51   87.1
LOC       93.62      83.02   88
NORP      73.75      86.76   79.73
ORDINAL   89.13      93.18   91.11
ORG       77.47      69.26   73.13
PERCENT   100        100     100
PERSON    90         68.18   77.59
OVERALL   78.66      86.22   82.27
PER-LABEL RESULTS, English (exact matches; see table above): ORG performs worst. Error patterns:
1. Non-ORG capitalised entities (domain specific), e.g. MEDA program (Europarl), UNICEF (Ontonotes)
2. ORG entities (bridges?), e.g. "Commission" is considered an ORG only in EuroParl:
■ Buyers also have to pay a 10 percent commission to StubHub (Ontonotes)
■ The Commission cannot be held responsible for the situation (Europarl)
3. Very long ORG entities
4. Confusion with NORP
The most frequently occurring error patterns are related to word sequences that appear in both domains but with different roles (bridges).
PER-LABEL RESULTS (exact matches), French and Greek, compared to the English table above:

FRENCH:
          Precision  Recall  F-score
CARDINAL  74.07      64.52   68.97
DATE      66.67      57.47   61.73
GPE       76.14      73.63   74.86
LOC       68.09      60.38   64
NORP      75.31      84.72   79.74
ORDINAL   79.07      89.47   83.95
ORG       64.89      53.09   58.4
PERCENT   100        71.43   83.33
PERSON    88         69.84   77.88
OVERALL   71.71      64.04   67.66

GREEK:
          Precision  Recall  F-score
CARDINAL  64.58      73.81   68.89
DATE      39.51      39.51   39.51
GPE       63.1       52.48   57.3
LOC       76.09      68.63   72.16
NORP      64.56      72.86   68.46
ORDINAL   69.44      80.65   74.63
ORG       64.85      46.79   54.36
PERCENT   100        100     100
PERSON    74.14      68.25   71.07
OVERALL   63.65      56.2    59.69
PER-LABEL RESULTS, Greek:
The DATE label includes expressions whose structure is not easy to define, such as: "daily", "five years ago", "the last five months", etc.
Conclusion: alignment errors may increase for NEs that depend more on language structure → classification errors may increase as well.

Exact matches: (Greek table above)

Partial matches:
          Precision  Recall  F-score
CARDINAL  66.67      76.19   71.11
DATE      77.33      72.5    74.84
GPE       83.53      70.3    76.34
LOC       84.78      76.47   80.41
NORP      70.89      80      75.17
ORDINAL   66.67      83.87   74.29
ORG       84.77      59.64   70.02
PERCENT   100        100     100
PERSON    79.31      73.02   76.03
OVERALL   74.96      79.34   77.09
Confusion matrix (exact matches): rows = original label, values = row percentages over attributed labels (empty cells omitted; the proportion attributed to O and the correct-label proportion are marked):

O:    99.44 (O), 0.03, 0.06, 0.03, 0.04, 0.03, 0.36, 0.01, 0.01
CARD: 22.62 (O), 76.19 (CARD), 1.19
DATE: 18.5 (O), 3.47, 76.3 (DATE), 1.73
GPE:  8.59 (O), 74.22 (GPE), 0.78, 0.78, 12.5, 3.13
LOC:  12.16 (O), 2.7, 67.57 (LOC), 6.76, 10.81
NORP: 1.45 (O), 1.45, 1.45, 2.9, 86.96 (NORP), 4.35, 1.45
ORD:  4.55 (O), 2.27, 93.18 (ORD)
ORG:  29.17 (O), 0.73, 0.44, 3.48, 66.18 (ORG)
PERC: 100 (PERC)
PERS: 13.92 (O), 6.33, 12.66, 67.09 (PERS)
The same surface form may call for different labels in different contexts: "The Fisher report states…" vs "Mr Fisher explained…"
Revisiting the confusion matrix (exact matches; see table above):
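For concreteness, a minimal sketch (hypothetical helper, not the thesis code) of computing such a row-normalised, token-level confusion matrix:

```python
# Minimal sketch: a token-level confusion matrix, row-normalised to
# percentages, built from gold and predicted label sequences.

from collections import Counter, defaultdict

def confusion(gold_labels, pred_labels):
    counts = defaultdict(Counter)
    for g, p in zip(gold_labels, pred_labels):
        counts[g][p] += 1                    # count gold -> predicted pairs
    return {g: {p: 100.0 * c / sum(row.values()) for p, c in row.items()}
            for g, row in counts.items()}

gold = ["ORG", "ORG", "PERS", "O", "O"]
pred = ["PERS", "ORG", "PERS", "O", "O"]
print(confusion(gold, pred))
# {'ORG': {'PERS': 50.0, 'ORG': 50.0}, 'PERS': {'PERS': 100.0}, 'O': {'O': 100.0}}
```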
Identification evaluation:
          Exact match                     Partial match
          Precision  Recall  F-score     Precision  Recall  F-score
ENGLISH   77.01      72.01   74.42       91.47      86.22   88.77
FRENCH    76.91      68.77   72.61       95.1       83.47   88.9
GREEK     68.8       60.91   64.62       92.01      79.34   85.21
→ Significant boundary-identification error (large gap between exact and partial identification).
Incorrect boundary examples:
Questions:
○ How to evaluate the intermediate phase? → Hard to evaluate
○ Does it make sense to focus on partial matches? → Depends on the application
Language adaptation:
○ Overall satisfactory results with parallel corpora and alignment
■ French: less than 3% underperformance (compared to English)
■ Greek: significantly lower, 10% underperformance (compared to English)
○ Still room for improvement
■ perhaps combine alignment methods
■ use different translation lexicons/methods
■ experiment with different POS taggers and lemmatisers (Greek)
Domain adaptation:
○ Less satisfactory results
■ Most significant improvement with instance selection
■ Improving methods do not combine in an additive way
■ Could be useful as a preprocessing step, but there is room for improvement
○ It seems that fully unsupervised domain adaptation is not an easy task, but compromising with a semi-supervised approach could help
■ Exploit deep architectures
■ Active learning: supply correct examples
■ Experiment more with external knowledge sources?
[1] Massimiliano Ciaramita and Yasemin Altun. Named-entity recognition in novel domains with external lexical knowledge. In Advances in Structured Learning for Text and Speech Processing Workshop, 2005.
[2] Thierry Poibeau and Leila Kosseim. Proper name extraction from non-journalistic texts.
[3] Erik F. Tjong Kim Sang and Fien De Meulder. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, Volume 4 (CoNLL '03), 2003.
[4] David Nadeau and Satoshi Sekine. A survey of named entity recognition and classification. Lingvisticae Investigationes 30.1 (2007).
Baselines reported in the literature:
CoNLL 2003 (English): 59.61% baseline F-score [3]
MUC-6 (English): 21% vocabulary transfer [4]
A baseline SL (supervised learning) method that is often proposed consists of tagging the words of a test corpus when they are annotated as entities in the training corpus. The performance of this baseline system depends on the vocabulary transfer: the proportion of words, without repetitions, appearing in both the training and the testing corpus.
They report a transfer of 21%, with as much as 42% of location names being repeated, but only 17% of organizations and 13% of person names. Vocabulary transfer is a good indicator of the recall (number of entities identified over the total number of entities) of the baseline system, but is a pessimistic measure since some entities are frequently repeated in documents. The baseline achieves a recall of 76% for locations, 49% for organizations and 26% for persons, with precision ranging from 70% to 90%. Whitelaw and Patrick (2003) report consistent results on MUC-7 for the aggregated enamex class: for the three enamex types together, the precision of recognition is 76% and the recall is 48%.
Reference: Nadeau, David, and Satoshi Sekine. "A survey of named entity recognition and classification." Lingvisticae Investigationes 30.1 (2007).
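A minimal sketch of this baseline, assuming token/label lists as input (hypothetical code, not the survey's implementation):

```python
# Minimal sketch of the baseline described above: tag a test word with
# label L if it appeared annotated as L in the training corpus.

def train_baseline(train_tokens, train_labels):
    lexicon = {}
    for tok, lab in zip(train_tokens, train_labels):
        if lab != "O":
            lexicon[tok.lower()] = lab    # last-seen label wins (naive)
    return lexicon

def tag_baseline(lexicon, test_tokens):
    # words never seen as entities in training are tagged O
    return [lexicon.get(tok.lower(), "O") for tok in test_tokens]

lex = train_baseline(["Stockholm", "is", "nice"], ["GPE", "O", "O"])
print(tag_baseline(lex, ["I", "visit", "Stockholm"]))   # ['O', 'O', 'GPE']
```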
Exact matches:
          Precision  Recall  F-score
CARDINAL  20.44      70.89   31.73
DATE      39.58      64.04   48.93
GPE       40.91      74.23   52.75
LOC       63.79      69.81   66.67
NORP      65.75      71.64   68.57
ORDINAL   84.21      72.73   78.05
ORG       40.28      30.74   34.87
PERCENT   83.33      71.43   76.92
PERSON    23.68      13.24   16.98
OVERALL   39.39      51.21   44.53

Partial matches:
          Precision  Recall  F-score
CARDINAL  29.38      77.5    42.61
DATE      58.57      91.11   71.3
GPE       42.37      77.32   54.74
LOC       75.86      83.02   79.28
NORP      65.33      72.06   68.53
ORDINAL   84.21      72.73   78.05
ORG       93.33      49.47   64.67
PERCENT   85.71      85.71   85.71
PERSON    28.95      16.42   20.95
OVERALL   48.31      65.86   55.73
Exact matches:
          Precision  Recall  F-score
CARDINAL  22.22      6.45    10
DATE      56.52      14.94   23.64
GPE       56         15.38   24.14
LOC       50         37.74   43.01
NORP      -          -       -
ORDINAL   100        5.26    10
ORG       1.49       0.73    0.98
PERCENT   100        71.43   83.33
PERSON    25.71      14.06   18.18
OVERALL   24.47      9.22    13.4

Partial matches:
          Precision  Recall  F-score
CARDINAL  50         14.52   22.5
DATE      95.65      25.29   40
GPE       88         24.18   37.93
LOC       97.44      73.08   83.52
NORP      -          -       -
ORDINAL   100        5.26    10
ORG       58.33      28.21   38.02
PERCENT   100        71.43   83.33
PERSON    34.29      18.75   24.24
OVERALL   68.12      27.92   39.61
Exact matches:
          Precision  Recall  F-score
CARDINAL  71.43      23.26   35.09
DATE      44.44      14.81   22.22
GPE       -          -       -
LOC       -          -       -
NORP      -          -       -
ORDINAL   -          -       -
ORG       -          -       -
PERCENT   100        71.43   83.33
PERSON    75         14.06   23.68
OVERALL   38.71      4.95    8.77

Partial matches:
          Precision  Recall  F-score
CARDINAL  71.43      23.26   35.09
DATE      92.59      31.25   46.73
GPE       -          -       -
LOC       -          -       -
NORP      -          -       -
ORDINAL   -          -       -
ORG       5.71       0.71    1.27
PERCENT   100        71.43   83.33
PERSON    100        18.75   31.58
OVERALL   59.34      8.24    14.47