
HITS at TAC 2015 Entity Discovery and Linking



  1. HITS at TAC 2015 Entity Discovery and Linking. Benjamin Heinzerling (1, 2) and Michael Strube (2). (1) AIPHES, (2) Heidelberg Institute for Theoretical Studies

  2. Joint Global Disambiguation and NIL Clustering • Our previous years’ system (Fahrni et al., 2013; Judea et al., 2014). • Jointly performs global disambiguation and NIL clustering using a Markov Logic Network (MLN). • General, not trained on TAC data. • Performed consistently well in various evaluations. • Gets some easy decisions wrong. • Mention detection is not part of the joint model.
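The slide does not show the MLN itself. As a rough illustration of what "joint" means here, the sketch below scores possible worlds the way a ground Markov network does. A world's unnormalised log-probability is the sum of the weights of the ground formulas it satisfies, and MAP inference picks the best-scoring world. The mention ids, candidate labels, formulas and weights are all hypothetical. Brute-force enumeration only works at toy scale, so this is not the HITS model or its solver.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

# A possible world assigns each mention a KB entity or a NIL cluster label.
World = Dict[str, str]
Formula = Tuple[float, Callable[[World], bool]]

# Hypothetical candidate labels per mention (toy data, not from the HITS system).
CANDIDATES: Dict[str, List[str]] = {
    "m1": ["Nigeria_(country)", "Nigeria_(album)"],  # "Nigeria"
    "m2": ["Pierre_Daura", "Daura_(Nigeria)"],       # "Daura"
    "m3": ["NIL:1", "NIL:2"],                         # an out-of-KB name
    "m4": ["NIL:1", "NIL:2"],                         # the same name again
}

# Hypothetical weighted ground formulas.
FORMULAS: List[Formula] = [
    # Local priors: on its own, "Daura" favours Pierre_Daura.
    (1.5, lambda w: w["m1"] == "Nigeria_(country)"),
    (0.5, lambda w: w["m1"] == "Nigeria_(album)"),
    (0.8, lambda w: w["m2"] == "Pierre_Daura"),
    (0.3, lambda w: w["m2"] == "Daura_(Nigeria)"),
    # Global coherence: related senses reward each other.
    (2.0, lambda w: w["m1"] == "Nigeria_(country)" and w["m2"] == "Daura_(Nigeria)"),
    # NIL clustering: identical out-of-KB names share a cluster.
    (1.0, lambda w: w["m3"] == w["m4"]),
]


def score(world: World, formulas: List[Formula]) -> float:
    """Unnormalised log-probability: sum of weights of satisfied formulas."""
    return sum(weight for weight, holds in formulas if holds(world))


def map_inference(candidates: Dict[str, List[str]],
                  formulas: List[Formula]) -> Tuple[World, float]:
    """Brute-force MAP: enumerate every joint assignment (toy sizes only)."""
    mentions = sorted(candidates)
    best_world: Optional[World] = None
    best_score = float("-inf")
    for labels in product(*(candidates[m] for m in mentions)):
        world = dict(zip(mentions, labels))
        s = score(world, formulas)
        if s > best_score:  # strict ">" keeps the first of tied worlds
            best_world, best_score = world, s
    assert best_world is not None
    return best_world, best_score


world, best = map_inference(CANDIDATES, FORMULAS)
print(world, round(best, 2))
```

In this sketch, the coherence formula overrides the local prior: "Daura" ends up linked to Daura_(Nigeria). In the same inference step, the two NIL mentions are placed in one cluster.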

  3. Goals • Better integrate mention detection. • Get the easy decisions right. • See how far the new KB gets us. • English EDL with mention detection.

  4. Architecture: Options • Joint approach: Strong interaction, slow, development difficult. • Pipeline of tasks: Weak interaction, fast, ordering difficult. • Pipeline of decisions: Some interaction, fast, ordering feasible.

  5. Architecture [Figure: three candidate architectures, each starting with segmentation, POS tagging and NER and ending with post-processing. Left ("Too simple!"): mention detection, then linking / NIL classification, then NIL clustering. Middle ("Happy middle."): interleaved mention detection and linking / NIL classification steps, followed by joint linking and NIL clustering. Right ("Too complex!"): joint mention detection, linking, NIL classification and NIL clustering.]

  6. Sieves • High-precision linking: unambiguous CrossWiki mentions, entity type checking, sense label mismatch filter, person name matching, salient semantic paths. • General and TAC-specific post-processing: dominant sense fallback, country adjectivals mapping, media organization filter.
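The sieve idea can be sketched as an ordered list of functions. Each function either commits to a link for a mention or abstains, and later sieves only see mentions that are still undecided. This is a minimal sketch under stated assumptions. The toy `CANDIDATES` dictionary stands in for CrossWiki, and `ENTITY_TYPE` stands in for KB type information. All four sieve functions are hypothetical simplifications of the sieves named on the slide, and the sense label, person name, salient path and media organization sieves are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Mention:
    text: str
    ner_type: str               # e.g. "PER", "GPE", "ORG"
    link: Optional[str] = None  # KB id, set by the first sieve that decides


# Hypothetical toy resources standing in for CrossWiki and the KB.
# surface form -> {candidate entity: prior probability of that sense}
CANDIDATES: Dict[str, Dict[str, float]] = {
    "Netanyahu": {"Benjamin_Netanyahu": 1.0},
    "Nigeria": {"Nigeria": 0.9, "Nigeria_(album)": 0.1},
    "Jordan": {"Michael_Jordan": 0.6, "Jordan_(country)": 0.4},
}
ENTITY_TYPE: Dict[str, str] = {
    "Benjamin_Netanyahu": "PER", "Nigeria": "GPE", "Nigeria_(album)": "WORK",
    "Michael_Jordan": "PER", "Jordan_(country)": "GPE",
}
COUNTRY_ADJECTIVALS: Dict[str, str] = {"Nigerian": "Nigeria"}

Sieve = Callable[[Mention], Optional[str]]


def unambiguous(m: Mention) -> Optional[str]:
    """Link if the surface form has exactly one candidate sense."""
    cands = CANDIDATES.get(m.text, {})
    return next(iter(cands)) if len(cands) == 1 else None


def type_check(m: Mention) -> Optional[str]:
    """Link if exactly one candidate matches the mention's NER type."""
    typed = [e for e in CANDIDATES.get(m.text, {}) if ENTITY_TYPE.get(e) == m.ner_type]
    return typed[0] if len(typed) == 1 else None


def dominant_sense(m: Mention) -> Optional[str]:
    """Fallback: pick the candidate with the highest prior."""
    cands = CANDIDATES.get(m.text, {})
    return max(cands, key=lambda e: cands[e]) if cands else None


def country_adjectival(m: Mention) -> Optional[str]:
    """Map adjectival forms such as "Nigerian" to the country."""
    return COUNTRY_ADJECTIVALS.get(m.text)


def run_sieves(mentions: List[Mention], sieves: List[Sieve]) -> None:
    """Apply sieves in order; each one only sees still-undecided mentions."""
    for sieve in sieves:
        for m in mentions:
            if m.link is None:
                m.link = sieve(m)


HIGH_PRECISION: List[Sieve] = [unambiguous, type_check]
POST_PROCESSING: List[Sieve] = [dominant_sense, country_adjectival]

doc = [Mention("Netanyahu", "PER"), Mention("Jordan", "GPE"), Mention("Nigeria", "GPE"),
       Mention("Nigerian", "GPE"), Mention("Daura", "GPE"), Mention("Jordan", "ORG")]
run_sieves(doc, HIGH_PRECISION + POST_PROCESSING)
print([(m.text, m.link) for m in doc])
```

In this toy run, "Daura" has no candidates and stays unlinked, so it becomes a NIL candidate. "Jordan" tagged ORG matches no typed candidate and falls through to the dominant sense fallback. Putting the fallback last is what keeps the earlier decisions high-precision.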

  11. Salient Semantic Paths • Previous work maximizes relatedness between candidate senses of two or more given mentions (Hoffart et al., 2011; Moro et al., 2014). Example: "Nigeria became Africa's largest economy ... town of Daura." Candidate sense pairs: Nigeria (Africa) and Pierre Daura: no path; Nigeria (Jazz Album) and Pierre Daura: no path; Nigeria (Africa) /location/location/contains Daura (Nigeria); Nigeria (Jazz Album) and Daura (Nigeria): no path. • Slow on Freebase, since paths must be searched between candidate senses.
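To make the cost of the previous-work approach concrete, the sketch below tests every pair of candidate senses for a direct KB edge (a path of length one), using the example pairs from the slide. The one-triple `KB` and the helper names `build_index` and `related_pairs` are hypothetical. Real systems such as those cited score relatedness more richly than a single edge lookup.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Toy KB fragment mirroring the slide's example.
KB: Set[Triple] = {
    ("Nigeria (Africa)", "/location/location/contains", "Daura (Nigeria)"),
}


def build_index(kb: Iterable[Triple]) -> Dict[Tuple[str, str], List[str]]:
    """Map (subject, object) to the relations that connect them."""
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for s, r, o in kb:
        index[(s, o)].append(r)
    return index


def related_pairs(cands_a: List[str], cands_b: List[str],
                  index: Dict[Tuple[str, str], List[str]]) -> Tuple[List[Triple], int]:
    """Test every candidate-sense pair for a direct edge in either direction."""
    found: List[Triple] = []
    checks = 0
    for a, b in product(cands_a, cands_b):
        checks += 1
        found += [(a, r, b) for r in index.get((a, b), [])]
        found += [(b, r, a) for r in index.get((b, a), [])]
    return found, checks


index = build_index(KB)
paths, checks = related_pairs(
    ["Nigeria (Africa)", "Nigeria (Jazz Album)"],
    ["Pierre Daura", "Daura (Nigeria)"],
    index,
)
print(paths, checks)
```

Even here, two mentions with two senses each need four pair checks. With n mentions of k senses each, checking all mention pairs costs on the order of n²k² lookups. Longer paths multiply this further by node degree, which is one plausible reading of why the slide calls the approach slow on Freebase.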

  12. Salient Semantic Paths • Our approach: Given one linked mention, follow salient semantic paths through the knowledge base graph. • Faster than looking for paths between candidate senses. Examples: /location/location/contains: "Nigeria became Africa's largest economy ... town of Daura." /people/person/children (Avner, Yair): "Netanyahu's sons, Avner and Yair, were chosen ..." /common/topic/alias: "Think of it as Oscar Pistorius on steroids. I couldn't help but think of the blade runner."
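Here is a minimal sketch of the idea, under stated assumptions. Starting from one already-linked entity, it follows a short, hand-picked list of relations and links any still-unlinked mention whose string matches a reachable entity's name or alias. The `SALIENT_RELATIONS` list, the toy `KB` and `NAMES` tables, and `link_via_salient_paths` are hypothetical. The relation names are taken from the slide's examples.

```python
from typing import Dict, List, Set

# Hypothetical hand-picked salient relations.
SALIENT_RELATIONS: List[str] = [
    "/location/location/contains",
    "/people/person/children",
]

# Toy KB: entity -> relation -> neighbouring entities.
KB: Dict[str, Dict[str, List[str]]] = {
    "Nigeria (Africa)": {"/location/location/contains": ["Daura (Nigeria)", "Lagos"]},
    "Benjamin Netanyahu": {"/people/person/children": ["Avner Netanyahu", "Yair Netanyahu"]},
}

# Lower-cased names per entity: label plus /common/topic/alias values (toy data).
NAMES: Dict[str, Set[str]] = {
    "Daura (Nigeria)": {"daura"},
    "Lagos": {"lagos"},
    "Avner Netanyahu": {"avner netanyahu", "avner"},
    "Yair Netanyahu": {"yair netanyahu", "yair"},
    "Oscar Pistorius": {"oscar pistorius", "blade runner", "the blade runner"},
}


def link_via_salient_paths(linked_entity: str, unlinked: List[str]) -> Dict[str, str]:
    """Link unlinked mentions to entities reachable from one linked entity."""
    # The entity itself is included so its own aliases can match (alias path).
    reachable = [linked_entity]
    for relation in SALIENT_RELATIONS:
        reachable.extend(KB.get(linked_entity, {}).get(relation, []))
    links: Dict[str, str] = {}
    for mention in unlinked:
        for entity in reachable:
            if mention.lower() in NAMES.get(entity, set()):
                links[mention] = entity
                break
    return links


print(link_via_salient_paths("Nigeria (Africa)", ["Daura", "Abuja"]))
print(link_via_salient_paths("Benjamin Netanyahu", ["Avner", "Yair"]))
print(link_via_salient_paths("Oscar Pistorius", ["the blade runner"]))
```

The work per linked entity is one neighbour lookup per salient relation, regardless of how many candidate senses the other mentions have. That is the contrast the slide draws with pairwise candidate search.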

  13. Results: Linking [Figure: bar chart of linking precision, recall and F1 (strong all match) for Sieves + MLN (HITS2), Sieves only (HITS1) and Best; bar values 70.9, 70.7, 70.3, 64.0, 64.0, 63.6, 58.8, 58.8, 57.8.]

  14. Results: Clustering [Figure: bar chart of clustering precision, recall and F1 (mention CEAF) for Sieves + MLN (HITS2), Sieves only (HITS1) and Best; bar values 76.5, 74.8, 74.7, 68.4, 68.2, 67.1, 62.2, 62.2, 61.0.]

  15. Nominal Coreference? "Hillary Clinton's latest book ... the author ..." "Netanyahu will be busy being a leader, unlike Obama the golfer!" • Can common noun surface forms (NOM) be reached in the KB? • Tested for paths up to length two. • Only 30 percent of non-NIL NOMs were reachable.
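The reachability test described above can be sketched as a depth-limited breadth-first search. It starts at the linked entity and asks whether any node within two edges carries a name matching the nominal. The toy `GRAPH` and `NAMES` tables and the `nominal_reachable` helper are hypothetical. The property names are illustrative and do not reproduce Freebase's actual schema.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

# Toy KB graph: entity -> [(relation, neighbour)]. Property names are illustrative.
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "Hillary Clinton": [
        ("/people/person/profession", "Author"),
        ("/people/person/spouse", "Bill Clinton"),
    ],
    "Bill Clinton": [("/people/person/profession", "Lawyer")],
    "Barack Obama": [("/people/person/profession", "Politician")],
}
NAMES: Dict[str, Set[str]] = {
    "Author": {"author", "writer"},
    "Lawyer": {"lawyer"},
    "Politician": {"politician"},
    "Bill Clinton": {"bill clinton"},
}


def nominal_reachable(entity: str, nominal: str, max_len: int = 2) -> bool:
    """Breadth-first search: is a node named `nominal` within max_len edges?"""
    target = nominal.lower()
    frontier: Deque[Tuple[str, int]] = deque([(entity, 0)])
    seen = {entity}
    while frontier:
        node, depth = frontier.popleft()
        if depth > 0 and target in NAMES.get(node, set()):
            return True
        if depth == max_len:
            continue  # do not expand beyond the path-length limit
        for _relation, neighbour in GRAPH.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return False


print(nominal_reachable("Hillary Clinton", "author"))     # one hop via profession
print(nominal_reachable("Barack Obama", "golfer"))        # no matching node
print(nominal_reachable("Hillary Clinton", "lawyer", 1))  # needs two hops
```

The "lawyer" case is reached only through the spouse edge. Longer paths raise coverage but can also match nodes unrelated to the mention's referent.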

  16. Limitations • Fails for nominal coreference. • Hand-picked, genre-specific semantic paths. • Only works for non-NILs.

  17. Conclusions • State-of-the-art monolingual linking and clustering. • Simple symbolic approach: large-scale resources, string match, KB queries. • Global disambiguation has only a small effect on linking performance. • Joint disambiguation and NIL clustering improves clustering performance.

  18. Future Work • Ordering and granularity of decisions. • The top three systems' linking performance is similar: are they all solving the same easy problems? • Hard: metonymy, cohyperonymal lists, nominal coreference. • Symbolic approaches are unlikely to solve these. • Need to combine them with distributional methods.

  19. Thank You
