Learning to Recognize Discontiguous Entities
Aldrian Obaja Muis and Wei Lu
Singapore University of Technology and Design
aldrian muis@sutd.edu.sg / luwei@sutd.edu.sg

Outline: Introduction, Our Model, Experiments, Ambiguity, Conclusion, Appendix
Introduction
Previous Works in Entity Recognition

Assuming non-overlapping and contiguous entities:
- Mostly using the BIO/BILOU tagset

Allowing overlaps/nesting, but still assuming contiguous entities:
1. Tag n-grams instead of words (Byrne 2007) [1]
2. Tag in multiple layers (Alex, Haddow, and Grover 2007) [2]
3. Treat it as a parsing task (Finkel and Manning 2009) [3]
4. Use a mention hypergraph (Lu and Roth 2015) [4]

How about discontiguous entities?

[1] Kate Byrne (2007). “Nested Named Entity Recognition in Historical Archive Text”. In: IEEE ICSC 2007. IEEE Computer Society, pp. 589–596.
[2] Beatrice Alex, Barry Haddow, and Claire Grover (2007). “Recognising Nested Named Entities in Biomedical Text”. In: BioNLP Workshop 2007, pp. 65–72.
[3] Jenny Rose Finkel and Christopher D. Manning (2009). “Nested named entity recognition”. In: Proc. of EMNLP 2009. Vol. 1, pp. 141–150.
[4] Wei Lu and Dan Roth (2015). “Joint Mention Extraction and Classification with Mention Hypergraphs”. In: Proc. of EMNLP 2015, pp. 857–867.
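As an illustrative sketch (not code from any of the cited works), the standard BIO tagset marks each token as Beginning, Inside, or Outside an entity. This works only for contiguous, non-overlapping spans: there is no tag that links two separated spans into one entity.

```python
# Illustrative sketch of BIO encoding for contiguous, non-overlapping entities.
# Spans are (start, end) token-index pairs with an exclusive end.
def bio_encode(tokens, spans):
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I"          # continuation tokens
    return tags

tokens = ["EGD", "showed", "hiatal", "hernia"]
print(bio_encode(tokens, [(2, 4)]))  # ['O', 'O', 'B', 'I']
```

A discontiguous entity such as "laceration ... esophagus" cannot be expressed here: tagging both spans with B/I would decode as two separate entities.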
Discontiguous Entity Recognition

Definition: a task to recognize entities in text, where the entities can be discontiguous (and possibly overlapping with each other).

Examples from SemEval 2014 Task 7 (Analysis of Clinical Text):

“EGD showed hiatal hernia and vertical laceration in distal esophagus with blood in stomach and overlying lac.”
1. hiatal hernia
2. laceration ... esophagus
3. blood in stomach
4. stomach ... lac

“Infarctions either water shed or embolic”
1. Infarctions
2. Infarctions ... water shed
3. Infarctions ... embolic
Previous Approaches

In SemEval 2014 Task 7, only two teams could handle discontiguous and overlapping entities:
1. Pathak et al. (2014) [5]: standard NER using the BIO tagset, pipelined with an SVM to combine the spans
2. Zhang et al. (2014) [6] (best team): an extended BIO tagset coupled with heuristics [7]
   - B, I for contiguous tokens
   - BD, ID for discontiguous tokens
   - BH, IH for overlapping tokens

[5] Parth Pathak et al. (2014). “ezDI: A Hybrid CRF and SVM based Model for Detecting and Encoding Disorder Mentions in Clinical Notes”. In: SemEval 2014.
[6] Yaoyun Zhang et al. (2014). “UTH CCB: A report for SemEval 2014 – Task 7 Analysis of Clinical Text”. In: SemEval 2014.
[7] Buzhou Tang et al. (2013). “Recognizing and Encoding Disorder Concepts in Clinical Text using Machine Learning and Vector Space”. In: ShARe/CLEF Eval. Lab.
Encoding in Model of Zhang et al.

[[[Infarctions]1]2]3 either [water shed]1 or [embolic]2
BH                   O      BD    ID      O  BD

1. Infarctions ... water shed
2. Infarctions ... embolic
3. Infarctions

This is the canonical encoding of this particular set of entities.

(Example taken from the full sentence: “... protocol to evaluate for any infarctions, either water shed or embolic, ...”)
Decoding in Model of Zhang et al.

[[Infarctions]1]2 either [water shed]1 or [embolic]2
BH                O      BD    ID      O  BD

1. Infarctions ... water shed
2. Infarctions ... embolic
3. Infarctions (?)

Ambiguous! The same tag sequence is consistent with more than one set of entities.
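The ambiguity can be made concrete with a small sketch (not the authors' decoder). From the tag sequence alone we can recover the shared component (BH) and the discontiguous components (BD/ID), but not whether the shared token also stands on its own as an entity:

```python
# Sketch: the extended-tagset sequence from the slide, and why decoding it is
# ambiguous. BH marks a token shared by several entities; BD/ID mark the spans
# of discontiguous entities.
tokens = ["Infarctions", "either", "water", "shed", "or", "embolic"]
tags   = ["BH",          "O",      "BD",    "ID",   "O",  "BD"]

def components(tokens, tags):
    """Collect contiguous components by coarse type: H = shared, D = discontiguous."""
    comps = {"H": [], "D": []}
    for i, tag in enumerate(tags):
        if tag == "O":
            continue
        kind = "H" if tag.endswith("H") else "D"
        if tag.startswith("B"):
            comps[kind].append([tokens[i]])   # start a new component
        else:
            comps[kind][-1].append(tokens[i])  # I-tag continues the previous one
    return comps

c = components(tokens, tags)
# c["H"] == [["Infarctions"]]; c["D"] == [["water", "shed"], ["embolic"]]

# Both of these distinct entity sets are consistent with the same tags:
reading_1 = [("Infarctions", "water shed"), ("Infarctions", "embolic")]
reading_2 = [("Infarctions",), ("Infarctions", "water shed"), ("Infarctions", "embolic")]
```

The tags identify the components, but pairing them into entities (and deciding whether "Infarctions" is also an entity by itself) requires heuristics outside the tagset.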
Number of Entity Combinations

In a sentence with n words, there are:
1. 2^n - 1 possible discontiguous entities
2. 2^(2^n - 1) possible combinations of discontiguous entities*
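These counts can be checked by brute force for small n (illustrative sketch; the count of combinations here includes the empty combination):

```python
# Brute-force check of the combination counts on the slide.
from itertools import combinations

def brute_entities(n):
    """Every non-empty subset of the n token positions is a possible
    (possibly discontiguous) entity: 2**n - 1 of them."""
    return [c for k in range(1, n + 1) for c in combinations(range(n), k)]

n = 4
entities = brute_entities(n)
assert len(entities) == 2**n - 1           # 15 possible entities for n = 4
# Any subset of those entities is a possible combination of entities:
assert 2**len(entities) == 2**(2**n - 1)   # 2^15 combinations
```

The doubly exponential number of combinations is what makes a compact encoding (rather than enumeration) necessary.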
Discontiguous Entity Recognition

1. How to efficiently model these discontiguous (and possibly overlapping) entities?
2. How to compare the ambiguity between models for discontiguous entities?
Contributions

In this paper, we contribute:
1. A new hypergraph-based model to better handle discontiguous entities
2. A simple theoretical framework to compare ambiguity between models
Our Hypergraph-based Model

[Figure: the full hypergraph over the sentence “Infarctions either water shed or embolic”, with A-, E-, T-, B-, O-, and X-nodes at each word position.]
[Figure: the subgraph encoding the three entities of the example sentence: Infarctions; Infarctions ... water shed; Infarctions ... embolic.]
Our Hypergraph-based Model

Key ideas:
1. Build a hypergraph that can encode any entity combination.
2. For any sentence annotated with entities, there is a unique subgraph that represents it (the canonical encoding).
3. Each entity is represented as a path in the entity-encoded hypergraph, where the B-nodes indicate which tokens are part of the entity.
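Key idea 3 can be sketched in a simplified form (illustrative only: the actual model uses typed nodes such as B0/B1 and O1, and shares nodes between overlapping entities):

```python
# Simplified sketch of key idea 3: an entity is a left-to-right path whose
# B-nodes mark the member tokens and whose O-nodes skip everything else.
def entity_path(n_tokens, member_indices):
    members = set(member_indices)
    return ["B" if i in members else "O" for i in range(n_tokens)]

# "Infarctions ... water shed" over the 6-token example sentence
# "Infarctions either water shed or embolic" (member tokens 0, 2, 3):
print(entity_path(6, {0, 2, 3}))  # ['B', 'O', 'B', 'B', 'O', 'O']
```

Because each entity is just a path, any subset of the 2^n - 1 possible discontiguous entities corresponds to some set of paths, so the hypergraph can encode any entity combination.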
[Figure: step-by-step construction of the entity-encoded hypergraph for “Infarctions either water shed or embolic”: a B0-node marks “Infarctions”; O1-nodes skip “either” and “or”; B1-nodes mark “water”, “shed”, and “embolic”; the result encodes the three entity paths (Infarctions; Infarctions ... water shed; Infarctions ... embolic).]
Our Hypergraph-based Model

Training and predicting:
1. Training: maximize the conditional log-likelihood of the training data.
2. Predicting: use Viterbi to find the highest-scoring subgraph.
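For intuition, here is a standard first-order Viterbi over a tag lattice, the linear-chain analogue of the search the model performs over its hypergraph (a sketch with toy scores; the paper's model scores hyperedges with learned features, not these tables):

```python
# Minimal first-order Viterbi sketch: find the highest-scoring tag sequence
# under per-position emission scores and pairwise transition scores.
def viterbi(n, tags, emit, trans):
    """emit[i][t]: score of tag t at position i; trans[(s, t)]: transition score."""
    best = {t: emit[0][t] for t in tags}
    back = []
    for i in range(1, n):
        back.append({})
        new_best = {}
        for t in tags:
            prev = max(tags, key=lambda s: best[s] + trans[(s, t)])
            back[-1][t] = prev
            new_best[t] = best[prev] + trans[(prev, t)] + emit[i][t]
        best = new_best
    last = max(tags, key=lambda t: best[t])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Toy example: 3 positions, two tags, a penalty for B-B transitions.
emit = [{"B": 2, "O": 0}, {"B": 0, "O": 1}, {"B": 1, "O": 0}]
trans = {("B", "B"): -1, ("B", "O"): 0, ("O", "B"): 0, ("O", "O"): 0}
print(viterbi(3, ["B", "O"], emit, trans))  # ['B', 'O', 'B']
```

The same max-plus dynamic program generalizes to the hypergraph: each node keeps the best score over its incoming hyperedges instead of over single predecessors.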
Experiments
Experimental Setup

- Dataset taken from SemEval 2014 Task 7, keeping sentences that contain discontiguous entities.
- Two setups for the training set: “Discontiguous” (smaller) and “Original” (larger).
- Models optimized for F1-score on the dev set by varying λ.
- Features follow Tang et al. (2013): words, POS tags, Brown clusters, semantic categories, ...
Results Using the Smaller Training Set

Model     Precision   Recall   F1-score
Li-Enh        54.70    41.20      47.00
Li-All        15.20    44.90      22.70
Sh-Enh        76.90    40.10      52.70
Sh-All        76.00    40.50      52.80
(all scores in %)
Results Using the Larger Training Set

Model     Precision   Recall   F1-score
Li-Enh        64.10    46.50      53.90
Li-All        52.80    49.40      51.10
Sh-Enh        73.90    49.10      59.00
Sh-All        73.40    49.50      59.10
(all scores in %)
Ambiguity
Ambiguity

One encoding can have multiple interpretations (sets of entities).

Hypergraph example: apparent [atrial [pacemaker]2 artifact]1 without [capture]2
[Figure: the hypergraph encoding of this sentence.]
- Interpretation A: (1) atrial pacemaker artifact; (2) pacemaker ... capture
- Interpretation B: (1) pacemaker artifact; (2) atrial pacemaker ... capture

Tagset example: Infarctions either water shed or embolic, tagged BH O BD ID O BD
- Interpretation A: (1) infarctions ... water shed; (2) infarctions ... embolic
- Interpretation B: (1) infarctions; (2) infarctions ... water shed; (3) infarctions ... embolic
Ambiguity

The models need further processing after prediction to generate one set of entities. We compare two heuristics:
1. All: return the union of all possible interpretations.
2. Enough: return one possible interpretation.
Ambiguity

Definition: the ambiguity level A(M) of a model M is the average number of interpretations of each canonical encoding in the model.
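The definition can be checked by brute force on a tiny case (my own illustrative construction, not from the paper): for plain BIO over contiguous, non-overlapping entities, every canonical encoding has exactly one interpretation, so A(BIO) = 1.

```python
# Brute-force ambiguity level of plain BIO over sentences of n tokens.
from itertools import combinations

def bio_encode(n, spans):
    tags = ["O"] * n
    for s, e in spans:
        tags[s] = "B"
        for i in range(s + 1, e):
            tags[i] = "I"
    return tuple(tags)

def entity_sets(n):
    """All sets of pairwise non-overlapping contiguous spans over n tokens."""
    spans = [(s, e) for s in range(n) for e in range(s + 1, n + 1)]
    result = []
    for k in range(len(spans) + 1):
        for combo in combinations(spans, k):
            if all(a[1] <= b[0] or b[1] <= a[0] for a, b in combinations(combo, 2)):
                result.append(combo)
    return result

n = 3
readings = {}
for es in entity_sets(n):
    readings.setdefault(bio_encode(n, es), []).append(es)

# Average number of interpretations per encoding:
ambiguity = sum(map(len, readings.values())) / len(readings)
print(ambiguity)  # 1.0 -- BIO is unambiguous
```

The interesting models in this paper (the extended tagset and the hypergraph) have A(M) > 1, which is what the heuristics above have to resolve.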
Introduction Our Model Experiments Ambiguity Conclusion Appendix
Counting Number of Encodings
How many canonical encodings do the models have? For the baseline model:
There are 7 possible tags per word (B, I, BD, ID, BH, IH, O) The model can output any combination of those: 7n Not all are canonical, so: MLi(n) < 7n < 23n
For our hypergraph-based model:
Number of canonical encoding = number of subgraphs Q: How to calculate the number of subgraphs?
24 / 37
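The bound on the baseline's encoding count follows from 7 < 8 = 2^3, so 7^n < 2^(3n) for every n. A quick numeric check (a sketch added here, not part of the original analysis):

```python
# Sanity check of the baseline bound MLi(n) < 7**n < 2**(3*n):
# with 7 tags per word there are 7**n possible tag sequences,
# and 7**n < 8**n = 2**(3*n) since 7 < 2**3.
for n in range(1, 21):
    assert 7 ** n < 2 ** (3 * n)

print(7 ** 4, 2 ** (3 * 4))  # 2401 4096
```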
Introduction Our Model Experiments Ambiguity Conclusion Appendix
Counting Number of Encodings
A: Use dynamic programming on combination of nodes 25 / 37
Introduction Our Model Experiments Ambiguity Conclusion Appendix
Counting Number of Encodings
A: Use dynamic programming on combination of nodes
- Fig. 1: Simplified graph to illustrate
subgraph counting
25 / 37
Introduction Our Model Experiments Ambiguity Conclusion Appendix
Counting Number of Encodings
A: Use dynamic programming on combination of nodes
- Fig. 1: Simplified graph to illustrate
subgraph counting
- Fig. 2: State transitions
25 / 37
Introduction Our Model Experiments Ambiguity Conclusion Appendix
Counting Number of Encodings
A: Use dynamic programming on combination of nodes
- Fig. 1: Simplified graph to illustrate
subgraph counting
- Fig. 2: State transitions
f11(n) = 2 ∗ f11(n − 1) + f01(n − 1) (1) 25 / 37
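Recurrence (1) can be evaluated with simple memoization. The sketch below is illustrative only: the slides give just this one transition, so the f01 transition and both base cases are hypothetical placeholders, not the paper's actual DP.

```python
from functools import lru_cache

# Illustrative DP in the style of recurrence (1): f11(n) = 2*f11(n-1) + f01(n-1).
# Only the f11 recurrence itself comes from the slides; the f01 transition
# and the base cases below are hypothetical placeholders.

@lru_cache(maxsize=None)
def f01(n: int) -> int:
    if n == 0:
        return 1  # assumed base case
    return f01(n - 1)  # placeholder transition (not given in the slides)

@lru_cache(maxsize=None)
def f11(n: int) -> int:
    if n == 0:
        return 1  # assumed base case
    return 2 * f11(n - 1) + f01(n - 1)  # recurrence (1)

print([f11(n) for n in range(6)])  # [1, 3, 7, 15, 31, 63]
```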
Introduction Our Model Experiments Ambiguity Conclusion Appendix
Counting Number of Encodings

How many canonical encodings do the models have?
For the baseline: MLi(n) < 7^n < 2^(3n)
For our hypergraph-based model: number of canonical encodings = number of subgraphs; after more calculations: MSh(n) > C · 2^(10n)
So our model is less ambiguous than the baseline model

26 / 37
Introduction Our Model Experiments Ambiguity Conclusion Appendix
Empirical Ambiguity

         Discontiguous                  Original
         Prec Err        Rec Err       Prec Err        Rec Err
Li-all   63.66%          0.00%*        23.81%          0.00%*
         (3,478/5,463)   (0/1,985)     (3,484/14,632)  (0/11,147)
Sh-all   1.73%           0.00%*        0.35%           0.00%*
         (35/2,020)      (0/1,985)     (39/11,186)     (0/11,147)
Li-enh   2.74%           3.82%         0.52%           0.90%
         (54/1,969)      (76/1,991)    (58/11,123)     (101/11,166)
Sh-enh   1.21%           1.46%         0.25%           0.38%
         (24/1,986)      (29/1,991)    (28/11,152)     (42/11,166)

Table 1: Precision and recall errors (%) of each model on the "Discontiguous" and "Original" training data when given the gold output structures. Lower numbers are better.

27 / 37
Introduction Our Model Experiments Ambiguity Conclusion Appendix
Conclusion
28 / 37
Introduction Our Model Experiments Ambiguity Conclusion Appendix
Conclusion

The hypergraph-based model we proposed is better at recognizing discontiguous and overlapping spans than a strong baseline
Our theoretical analysis (by counting encodings) shows that our model is less ambiguous in representing discontiguous entities, which matches the results of the ambiguity experiments

29 / 37
Introduction Our Model Experiments Ambiguity Conclusion Appendix
Future Work

Explore applications of discontiguous span recognition to other tasks
Explore further extensions of this model, similar to semi-Markov CRF
Explore other training procedures (SSVM, max-margin)

30 / 37
Introduction Our Model Experiments Ambiguity Conclusion Appendix
Thank You
Code available at: http://statnlp.org/research/ie/
Aldrian Obaja Muis and Wei Lu
Singapore University of Technology and Design
Introduction Our Model Experiments Ambiguity Conclusion Appendix
Ambiguity in Our Model

- Fig.: Hypergraph encoding built over the sentence (nodes labeled A, E, T, B0, O1, B1, X)

apparent [atrial [pacemaker]2 artifact]1 without [capture]2

The same encoding covers multiple entity readings:
atrial pacemaker artifact
pacemaker artifact
pacemaker . . . capture
atrial pacemaker . . . capture

31 / 37
Introduction Our Model Experiments Ambiguity Conclusion Appendix
Double Counting in Naive DP
32 / 37
Introduction Our Model Experiments Ambiguity Conclusion Appendix
Counting Number of Encodings

n   MLi(n)     MSh(n)    N(n)
1   2          2         2^1  = 2
2   8          8         2^3  = 8
3   46         80        2^7  = 128
4   < 2401     3584      2^15 = 32768
5   < 16807    533504    2^31 = 2147483648

Table 2: The number of possible encodings for small values of n

33 / 37
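Two numeric patterns are visible in Table 2 (observations from the table values, not claims made on the slide): the MLi(n) upper bounds at n = 4, 5 equal 7^n, and the listed N(n) values all match 2^(2^n − 1).

```python
# Patterns read off Table 2 (observations, not claims from the slide):
# the MLi(n) upper bounds at n = 4, 5 are exactly 7**n,
# and every listed N(n) equals 2**(2**n - 1).
assert 7 ** 4 == 2401 and 7 ** 5 == 16807  # the "< 2401" and "< 16807" bounds

table_N = {1: 2, 2: 8, 3: 128, 4: 32768, 5: 2147483648}
for n, N in table_N.items():
    assert N == 2 ** (2 ** n - 1)

print("Table 2 patterns verified")
```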
Introduction Our Model Experiments Ambiguity Conclusion Appendix
Ambiguity Level

Definition: The relative ambiguity Ar(M1, M2) between models M1 and M2 is the ratio of the logs of the numbers of canonical encodings:

Ar(M1, M2) = lim_{n→∞} log( Σ_{i=1}^{n} M_{M2}(i) ) / log( Σ_{i=1}^{n} M_{M1}(i) )

where MM(i) is the number of encodings in model M for a sequence of length i.

With MSh(n) > C · 2^(10n) and MLi(n) < 2^(3n), this results in:

Ar(Li, Sh) ≥ lim_{n→∞} (log C + 10n log 2) / (3n log 2) = 10/3 > 1

34 / 37