HITS at TAC 2015 Entity Discovery and Linking Benjamin Heinzerling - - PowerPoint PPT Presentation



SLIDE 1

HITS at TAC 2015 Entity Discovery and Linking

Benjamin Heinzerling1,2 and Michael Strube2

1AIPHES, 2Heidelberg Institute for Theoretical Studies

SLIDE 2

Joint Global Disambiguation and NIL Clustering

  • Our previous years’ system (Fahrni et al., 2013; Judea et al., 2014).
  • Jointly performs global disambiguation and NIL clustering using a Markov Logic Network (MLN).

  • General, not trained on TAC data.
  • Performed consistently well in various evaluations.
  • Gets some easy decisions wrong.
  • Mention detection not joint.
SLIDE 3

Goals

  • Better integrate mention detection.
  • Get the easy decisions right.
  • See how far the new KB gets us.
  • English EDL with mention detection.
SLIDE 4

Architecture: Options

  • Joint approach: Strong interaction, slow, development difficult.
  • Pipeline of tasks: Weak interaction, fast, ordering difficult.
  • Pipeline of decisions: Some interaction, fast, ordering feasible.
SLIDE 5

Architecture

  • Too simple: Segmentation, POS → NER → Mention Detection → Linking and NIL classification → NIL clustering → Post-Processing.
  • Happy middle: Segmentation, POS → interleaved decision stages (NER; Mention Detection; Linking / NIL classification; NIL clustering; Joint Linking & Clustering), with Mention Detection and Linking / NIL classification repeated several times → Post-Processing.
  • Too complex: Segmentation, POS → NER → Joint Mention Detection, Linking, and NIL Clustering → Post-Processing.
SLIDE 6

Sieves

  • High-precision linking:
  • Unambiguous CrossWiki mentions
  • Entity type checking
  • Sense label mismatch filter
  • Person name matching
  • Salient semantic paths
  • General and TAC-specific post-processing:
  • Dominant sense fallback
  • Country adjectivals mapping
  • Media organization filter
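
The "pipeline of decisions" idea behind the sieves can be illustrated with a minimal sketch. It is not the HITS implementation: the `Mention` class, the toy `CANDIDATES` dictionary, and both sieve functions are hypothetical stand-ins. Each sieve only decides mentions that earlier sieves left open, so high-precision decisions are made first and a fallback runs last.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Mention:
    surface: str
    entity: Optional[str] = None  # None means "not decided yet"


# Hypothetical toy dictionary: surface form -> {candidate entity: prior}.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "Heidelberg": {"Heidelberg": 1.0},
    "Nigeria": {"Nigeria (Africa)": 0.95, "Nigeria (Jazz Album)": 0.05},
}

Sieve = Callable[[List[Mention]], None]


def unambiguous_sieve(mentions: List[Mention]) -> None:
    """High-precision step: link only surface forms with exactly one candidate."""
    for m in mentions:
        cands = CANDIDATES.get(m.surface, {})
        if m.entity is None and len(cands) == 1:
            m.entity = next(iter(cands))


def dominant_sense_fallback(mentions: List[Mention]) -> None:
    """Post-processing step: give still-open mentions their most likely candidate."""
    for m in mentions:
        cands = CANDIDATES.get(m.surface, {})
        if m.entity is None and cands:
            m.entity = max(cands, key=cands.get)


def run_sieves(mentions: List[Mention], sieves: List[Sieve]) -> List[Mention]:
    """Apply sieves in order; later sieves never override earlier decisions."""
    for sieve in sieves:
        sieve(mentions)
    return mentions


mentions = run_sieves(
    [Mention("Heidelberg"), Mention("Nigeria"), Mention("Daura")],
    [unambiguous_sieve, dominant_sense_fallback],
)
for m in mentions:
    print(m.surface, "->", m.entity)
```

In this sketch, a mention with no candidates stays undecided (`None`), which is where NIL classification and clustering would take over.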
SLIDE 11

Salient Semantic Paths

  • Previous work maximizes relatedness between candidate senses of two or more given mentions (Hoffart et al., 2011; Moro et al., 2014).

Nigeria became Africa's largest economy ... town of Daura.

Candidate sense pairs:

  • Nigeria (Africa) and Pierre Daura
  • Nigeria (Jazz Album) and Pierre Daura
  • Nigeria (Africa) /location/location/contains Daura (Nigeria)
  • Nigeria (Jazz Album) and Daura (Nigeria)

  • Slow on Freebase.
SLIDE 12

Salient Semantic Paths

  • Our approach: Given one linked mention, follow salient semantic paths through the knowledge base graph.
  • Faster than looking for paths between candidate senses.

Nigeria became Africa's largest economy ... town of Daura.

/location/location/contains +

Netanyahu's sons, Avner and Yair, were chosen ...

/people/person/children /people/person/children

Think of it as Oscar Pistorius on steroids. I couldn't help but think of the blade runner.

/common/topic/alias
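
A minimal sketch of path following from one linked mention, under stated assumptions: the toy `KB` and `NAMES` dictionaries, the function names, and the intermediate entity are all hypothetical illustrations, not Freebase data or the authors' code. Starting from an already linked entity, the sketch follows a fixed sequence of relations and links an open mention if its surface form names one of the entities reached.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy knowledge base: (entity, relation) -> target entities.
KB: Dict[Tuple[str, str], List[str]] = {
    ("Nigeria (Africa)", "/location/location/contains"): ["Katsina State", "Lagos"],
    ("Katsina State", "/location/location/contains"): ["Daura (Nigeria)"],
}

# Hypothetical name table: entity -> surface forms that may refer to it.
NAMES: Dict[str, Set[str]] = {
    "Katsina State": {"Katsina", "Katsina State"},
    "Lagos": {"Lagos"},
    "Daura (Nigeria)": {"Daura"},
}


def follow_path(start: str, path: List[str]) -> Set[str]:
    """Return all entities reached by following the relations in `path` in order."""
    frontier = {start}
    for relation in path:
        frontier = {t for e in frontier for t in KB.get((e, relation), [])}
    return frontier


def link_via_paths(linked_entity: str, surface: str,
                   paths: List[List[str]]) -> Optional[str]:
    """Link `surface` to an entity reachable from `linked_entity`, if any."""
    for path in paths:
        for entity in sorted(follow_path(linked_entity, path)):
            if surface in NAMES.get(entity, set()):
                return entity
    return None


CONTAINS = "/location/location/contains"
paths = [[CONTAINS], [CONTAINS, CONTAINS]]
print(link_via_paths("Nigeria (Africa)", "Daura", paths))
```

Because the search starts from a single known entity, only the neighbourhood of that entity is explored, rather than all pairs of candidate senses.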

SLIDE 13

Results: Linking

Metric: strong all match

  • Sieves + MLN (HITS2): Precision 70.3, Recall 58.8, F1 64.0
  • Sieves only (HITS1): Precision 70.7, Recall 57.8, F1 63.6
  • Best: Precision 70.9, Recall 58.8, F1 64.0

SLIDE 14

Results: Clustering

Metric: mention ceaf

  • Sieves + MLN (HITS2): Precision 74.8, Recall 62.2, F1 68.2
  • Sieves only (HITS1): Precision 74.7, Recall 61.0, F1 67.1
  • Best: Precision 76.5, Recall 62.2, F1 68.4

SLIDE 15

Nominal Coreference?

Hilary Clinton's latest book ... the author ...

Netanyahu will be busy being a leader, unlike Obama the golfer!

  • Can common noun surface forms (NOM) be reached in the KB?
  • Tested for paths up to length two.
  • Only 30 percent of non-NIL NOMs reachable.
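
A reachability test of this kind can be sketched as a depth-limited breadth-first search. This is an illustration under assumptions, not the authors' experiment: the toy `GRAPH`, the function names, and the way pairs are formed are hypothetical. Given (start, target) pairs, it reports the share of targets reachable within a fixed number of hops.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy graph: node -> directly connected nodes.
GRAPH: Dict[str, List[str]] = {
    "A": ["B"],
    "B": ["C"],
    "C": ["D"],
}


def reachable_within(graph: Dict[str, List[str]], start: str,
                     max_hops: int) -> Set[str]:
    """Return all nodes reachable from `start` in at most `max_hops` steps."""
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        # Expand one hop and drop nodes that were already visited.
        frontier = {n for node in frontier for n in graph.get(node, [])} - seen
        seen |= frontier
    return seen - {start}


def reachable_fraction(graph: Dict[str, List[str]],
                       pairs: List[Tuple[str, str]], max_hops: int = 2) -> float:
    """Share of (start, target) pairs whose target lies within `max_hops`."""
    if not pairs:
        return 0.0
    hits = sum(1 for s, t in pairs if t in reachable_within(graph, s, max_hops))
    return hits / len(pairs)


print(reachable_within(GRAPH, "A", 2))
print(reachable_fraction(GRAPH, [("A", "C"), ("A", "D")]))
```

Raising `max_hops` increases coverage but also the number of nodes visited, which is one reason to cap the path length.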
SLIDE 16

Limitations

  • Fails for nominal coreference.
  • Hand-picked, genre-specific semantic paths.
  • Only works for non-NILs.
SLIDE 17

Conclusions

  • State-of-the-art monolingual linking and clustering.
  • Simple symbolic approach: large-scale resources, string match, KB queries.
  • Global disambiguation has only a small effect on linking performance.
  • Joint disambiguation and NIL clustering improves clustering performance.
SLIDE 18

Future Work

  • Ordering and granularity of decisions.
  • Top three systems’ linking performance similar: All solving the same easy problems?

  • Hard: Metonymy, cohyperonymal lists, nominal coreference.
  • Symbolic approaches unlikely to solve these.
  • Need to combine with distributional methods.
SLIDE 19

Thank You