NERank: Bringing Order to Named Entities from Texts

SLIDE 1

NERank: Bringing Order to Named Entities from Texts

Chengyu Wang¹, Rong Zhang¹, Xiaofeng He¹, Guomin Zhou², Aoying Zhou¹

¹ Institute for Data Science and Engineering, East China Normal University

² Zhejiang Police College

SLIDE 2

Outline

  • Introduction
  • Problem Statement
  • Proposed Approach
  • Experiments
  • Conclusion

SLIDE 3

Entity Ranking

  • Ranking entities from texts

– Input: a text collection
– Output: a ranked order of named entities

  • Why entity ranking?

– Entity-oriented Web search

  • given a query, retrieve a list of entities from relevant documents

– Web semantification

  • add semantic tags to Web documents

– Knowledge base population

  • extract and rank entities and then link them to knowledge bases

SLIDE 4

Outline

  • Introduction
  • Problem Statement
  • Proposed Approach
  • Experiments
  • Conclusion

SLIDE 5

Problem Statement

  • Given a document collection $D$ and a normalized named entity collection $E$ detected from $D$, the goal is to give each entity $e \in E$ a rank $r(e)$ denoting its relative importance, such that

– $0 \le r(e) \le 1$
– $\sum_{e \in E} r(e) = 1$
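The two constraints say that the ranks form a probability distribution over the entity collection. As a minimal sketch, assuming some non-negative raw importance scores are already available (the helper name `normalize_scores` and the toy entities are made up for illustration, not taken from the slides), the scores can be rescaled to satisfy both constraints:

```python
def normalize_scores(raw_scores):
    """Rescale non-negative raw scores so that they lie in [0, 1] and sum to 1.

    raw_scores: dict mapping an entity name to a non-negative number
    (a hypothetical input format, chosen only for this illustration).
    """
    if any(score < 0 for score in raw_scores.values()):
        raise ValueError("raw scores must be non-negative")
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("at least one raw score must be positive")
    return {entity: score / total for entity, score in raw_scores.items()}


# Toy usage with invented scores.
ranks = normalize_scores({"Egypt": 3.0, "Mubarak": 1.0})
```

Dividing by the total keeps the relative order of the entities unchanged, so only the scale of the scores is affected.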

SLIDE 6

General Framework

SLIDE 7

Outline

  • Introduction
  • Problem Statement
  • Proposed Approach
  • Experiments
  • Conclusion

SLIDE 8

Topical Tripartite Graph Modeling

  • Topics in Egypt Revolution
  • TTG construction

SLIDE 9

Prior Topic Rank Estimation: Three Quality Metrics

  • Probabilities derived from TTG modeling

– $\theta_{i,j}$: probability of topic $t_j$ in document $d_i$
– $\hat{\phi}_{i,j}$: probability of normalized entity $e_j$ in topic $t_i$

  • Quality metrics

– Prior probability: $pr(t_i) = \frac{1}{|D|} \sum_{j=1}^{|D|} \theta_{j,i}$
– Entity richness: $er(t_i) = \frac{1}{Z_{er}} \sum_{j=1}^{|E|} \hat{\phi}_{i,j}$
– Topic specificity: $ts(t_i) = \begin{cases} 0 & (pr(t_i) < \epsilon) \\ \frac{1}{Z_{ts}} \sum_{j=1}^{|D|} \theta_{j,i} \log_2 \theta_{j,i} & (pr(t_i) \ge \epsilon) \end{cases}$
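The three metrics can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: `theta[d][t]` holds the probability of topic t in document d, `phi_hat[t][e]` holds the probability of normalized entity e in topic t, and the normalizers `z_er` and `z_ts` as well as the threshold `eps` are passed in as plain parameters because the slides do not give their values. The sign of the specificity sum is kept exactly as written on the slide.

```python
import math


def topic_quality_metrics(theta, phi_hat, eps=0.05, z_er=1.0, z_ts=1.0):
    """Return a list of (pr, er, ts) tuples, one per topic.

    theta:   docs x topics matrix (list of lists), theta[d][t]
    phi_hat: topics x entities matrix (list of lists), phi_hat[t][e]
    eps, z_er, z_ts: assumed values; the slides leave them unspecified.
    """
    num_docs = len(theta)
    metrics = []
    for t, entity_probs in enumerate(phi_hat):
        doc_probs = [theta[d][t] for d in range(num_docs)]
        pr = sum(doc_probs) / num_docs      # prior probability
        er = sum(entity_probs) / z_er       # entity richness
        if pr < eps:
            ts = 0.0                        # rare topics get zero specificity
        else:
            # 0 * log2(0) is treated as 0, so zero entries are skipped.
            ts = sum(p * math.log2(p) for p in doc_probs if p > 0.0) / z_ts
        metrics.append((pr, er, ts))
    return metrics


# Toy usage: two documents, two topics, two normalized entities.
toy_theta = [[0.5, 0.5], [1.0, 0.0]]
toy_phi_hat = [[0.2, 0.1], [0.05, 0.05]]
toy_metrics = topic_quality_metrics(toy_theta, toy_phi_hat, eps=0.3)
```

With `eps=0.3` the second toy topic falls below the threshold (its prior probability is 0.25), so its specificity is set to zero.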

SLIDE 10

Prior Topic Rank Estimation: Ranking Function

  • Linear ranking function

$r_0(t_i) = W^\top \cdot F(t_i)$

– $F(t_i) = \langle pr(t_i), er(t_i), ts(t_i) \rangle$
– $\sum_i w_i = 1$

  • Parameter learning

– For two topics $t_i$ and $t_j$, if $t_i$ is a more important topic than $t_j$, we have $r_0(t_i) > r_0(t_j)$
– Optimization objective: minimize $\|W\|_2^2 + C \cdot \sum_{i,j} \xi_{i,j}$
– Constraints: $W^\top \cdot F(t_i) - W^\top \cdot F(t_j) \ge 1 - \xi_{i,j}$
– Train a linear SVM classifier to learn the weights
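The pairwise objective above can be approximated with plain subgradient descent on the hinge loss. The sketch below is a stand-in for the linear SVM mentioned on the slide, not the authors' training code; the function names, learning rate, epoch count, and toy feature vectors are all assumptions made for illustration.

```python
def score(weights, features):
    """Linear ranking score W^T . F(t)."""
    return sum(w * f for w, f in zip(weights, features))


def train_pairwise_ranker(pairs, dim=3, c=1.0, lr=0.01, epochs=500):
    """Minimize ||W||^2 + C * sum of hinge losses over ordered topic pairs.

    pairs: list of (features_of_more_important, features_of_less_important),
    each feature vector being (pr, er, ts) for one topic.
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        grad = [2.0 * w for w in weights]          # gradient of ||W||^2
        for better, worse in pairs:
            diff = [a - b for a, b in zip(better, worse)]
            if score(weights, diff) < 1.0:         # margin violated: hinge is active
                for k in range(dim):
                    grad[k] -= c * diff[k]
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Toy usage with invented (pr, er, ts) vectors; the first topic in each pair
# is the one labeled as more important.
toy_pairs = [
    ((0.6, 0.3, 0.1), (0.1, 0.2, 0.1)),
    ((0.5, 0.4, 0.0), (0.2, 0.1, 0.0)),
]
toy_weights = train_pairwise_ranker(toy_pairs)
```

The sketch does not enforce the constraint that the weights sum to 1. When the learned weights have a positive sum, dividing them by that sum afterwards leaves the induced topic order unchanged.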

SLIDE 11

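Meta-Path Constrained Random Walk Algorithm

  • Initialization

– $r(t_i) = r_0(t_i)$

  • Probability propagation

– Following the TDT (Topic-Doc-Topic) meta path (with prob. $\alpha > 0$): $t_i \xrightarrow{\theta_{j,i} / \sum_{d_k \in D} \theta_{k,i}} d_j \xrightarrow{\theta_{j,k}} t_k$
– Following the TET (Topic-Entity-Topic) meta path (with prob. $\beta > 0$): $t_i \xrightarrow{\hat{\phi}_{i,j} / \sum_{e_k \in E} \hat{\phi}_{i,k}} e_j \xrightarrow{\hat{\phi}_{k,j} / \sum_{t_l \in T} \hat{\phi}_{l,j}} t_k$
– Random jump (with prob. $1 - \alpha - \beta > 0$)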
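The propagation step can be sketched as a short power iteration over topic ranks. This is a simplified illustration, not the authors' implementation. It assumes `theta[d][t]` is the probability of topic t in document d with each row summing to 1, `phi_hat[t][e]` is the probability of normalized entity e in topic t, no topic or entity has all-zero probabilities, and the random jump returns to the prior topic rank. The helper names and default values of `alpha` and `beta` are made up for the example.

```python
def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def row_normalize(matrix):
    """Scale every row to sum to 1 (assumes each row has a positive sum)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total for value in row])
    return normalized


def matmul(a, b):
    b_columns = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns] for row in a]


def topic_random_walk(theta, phi_hat, r0, alpha=0.3, beta=0.3, iters=100):
    """Iterate topic ranks along the TDT and TET meta paths with random jumps."""
    topic_to_doc = row_normalize(transpose(theta))        # theta[j][i] / sum_k theta[k][i]
    topic_to_entity = row_normalize(phi_hat)              # phi[i][j] / sum_k phi[i][k]
    entity_to_topic = row_normalize(transpose(phi_hat))   # phi[k][j] / sum_l phi[l][j]
    p_tdt = matmul(topic_to_doc, theta)                   # topic -> doc -> topic
    p_tet = matmul(topic_to_entity, entity_to_topic)      # topic -> entity -> topic
    jump = 1.0 - alpha - beta
    num_topics = len(r0)
    ranks = list(r0)
    for _ in range(iters):
        ranks = [
            alpha * sum(ranks[i] * p_tdt[i][k] for i in range(num_topics))
            + beta * sum(ranks[i] * p_tet[i][k] for i in range(num_topics))
            + jump * r0[k]
            for k in range(num_topics)
        ]
    return ranks


# Toy usage: three documents, two topics, three normalized entities.
toy_theta = [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]
toy_phi_hat = [[0.3, 0.1, 0.0], [0.05, 0.05, 0.2]]
toy_ranks = topic_random_walk(toy_theta, toy_phi_hat, r0=[0.6, 0.4])
```

Because both meta-path matrices are row-stochastic under these assumptions, the topic ranks keep summing to 1 whenever the prior ranks do.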

SLIDE 12

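Proof of Convergence (1)

  • Update rule of NERank

$T_n = \alpha \cdot \Theta_k^\top \Theta \cdot T_{n-1} + \beta \cdot \hat{\Phi}_o \hat{\Phi}_k^\top \cdot T_{n-1} + (1 - \alpha - \beta) T_0$

  • Non-recursive form of NERank

$T_n = M^n T_0 + (1 - \alpha - \beta) \sum_{i=0}^{n-1} M^i T_0$

– where $M = \alpha \cdot \Theta_k^\top \Theta + \beta \cdot \hat{\Phi}_o \hat{\Phi}_k^\top$

  • Matrix limit of $T_n$

– $\lim_{n \to \infty} T_n = \lim_{n \to \infty} M^n T_0 + (1 - \alpha - \beta) \lim_{n \to \infty} \sum_{i=0}^{n-1} M^i T_0$
– $\lim_{n \to \infty} M^n T_0 = 0$ (because $\Theta_k^\top \Theta$ and $\hat{\Phi}_o \hat{\Phi}_k^\top$ are transition matrices with $0 < \alpha + \beta < 1$)
– $\lim_{n \to \infty} \sum_{i=0}^{n-1} M^i T_0 = (I - M)^{-1} T_0$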
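The step from the update rule to the non-recursive form, and then to the limit, can be spelled out as a short worked derivation. It writes M for the combined transition matrix, T_0 for the prior topic rank vector, and introduces the shorthand c = 1 - α - β, which is not used on the slides.

```latex
% Shorthand (not from the slides): c = 1 - \alpha - \beta,
% so the update rule reads T_n = M T_{n-1} + c T_0.
\begin{align*}
T_1 &= M T_0 + c\,T_0 \\
T_2 &= M T_1 + c\,T_0 = M^2 T_0 + c\,(M + I)\,T_0 \\
T_n &= M T_{n-1} + c\,T_0 = M^n T_0 + c \sum_{i=0}^{n-1} M^i T_0
      \quad \text{(by induction on } n\text{)} \\
(I - M) \sum_{i=0}^{n-1} M^i &= I - M^n \;\longrightarrow\; I
      \quad (n \to \infty,\ \text{since } M^n \to 0) \\
\lim_{n \to \infty} \sum_{i=0}^{n-1} M^i T_0 &= (I - M)^{-1} T_0
\end{align*}
```

The fourth line is the usual telescoping argument for a geometric (Neumann) series of matrices; it relies only on M^n vanishing in the limit, which is what the condition 0 < α + β < 1 is used for.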

SLIDE 13

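Proof of Convergence (2)

  • Matrix limit of $T_n$

$\lim_{n \to \infty} T_n = (1 - \alpha - \beta)(I - M)^{-1} T_0$

  • Closed form of $T_n$

$T^* = (1 - \alpha - \beta)(I - \alpha \cdot \Theta_k^\top \Theta - \beta \cdot \hat{\Phi}_o \hat{\Phi}_k^\top)^{-1} T_0$

  • Closed form of $E_n$

$E^* = (1 - \alpha - \beta) \hat{\Phi}_k^\top (I - \alpha \cdot \Theta_k^\top \Theta - \beta \cdot \hat{\Phi}_o \hat{\Phi}_k^\top)^{-1} T_0$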
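A small numeric check can make the closed form concrete. The sketch below compares the iterative update with the closed-form limit on a hand-made 2x2 example. The matrices `toy_a` and `toy_b` are arbitrary column-stochastic stand-ins for the two meta-path transition matrices, not values from the paper, and the explicit 2x2 inverse is used only to keep the example free of external libraries.

```python
def combine(alpha, a, beta, b):
    """M = alpha * A + beta * B for two 2x2 matrices."""
    return [[alpha * a[r][c] + beta * b[r][c] for c in range(2)] for r in range(2)]


def iterate(m, t0, jump, steps):
    """Apply T_n = M T_{n-1} + jump * T_0 for the given number of steps."""
    t = list(t0)
    for _ in range(steps):
        t = [sum(m[r][c] * t[c] for c in range(2)) + jump * t0[r] for r in range(2)]
    return t


def closed_form(m, t0, jump):
    """T* = jump * (I - M)^{-1} T_0, using the explicit 2x2 inverse."""
    a, b = 1.0 - m[0][0], -m[0][1]
    c, d = -m[1][0], 1.0 - m[1][1]
    det = a * d - b * c
    inverse = [[d / det, -b / det], [-c / det, a / det]]
    return [jump * sum(inverse[r][k] * t0[k] for k in range(2)) for r in range(2)]


# Toy usage: alpha = beta = 0.3, so the random-jump probability is 0.4.
toy_a = [[0.9, 0.2], [0.1, 0.8]]   # columns sum to 1
toy_b = [[0.5, 0.5], [0.5, 0.5]]   # columns sum to 1
toy_m = combine(0.3, toy_a, 0.3, toy_b)
toy_t0 = [0.6, 0.4]
t_iterated = iterate(toy_m, toy_t0, jump=0.4, steps=200)
t_closed = closed_form(toy_m, toy_t0, jump=0.4)
```

On this toy input the iterated vector and the closed-form vector agree to within floating-point error, and both still sum to 1.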

SLIDE 14

Outline

  • Introduction
  • Problem Statement
  • Proposed Approach
  • Experiments
  • Conclusion

SLIDE 15

Experiments (1)

  • Datasets

– 50 newswire collections from TimelineData and CrisisData, each related to an international event
– Example events: Egypt Revolution, Iraq War, BP Oil Spill, etc.

  • Hyper-parameter settings

SLIDE 16

Experiments (2)

  • Comparative study

– Baselines: TF-IDF, TextRank, LexRank, and Kim et al.
– Variants of our approach: $NERank_{Uni}$ and $NERank_{\alpha=0}$

SLIDE 17

Experiments (3)

  • Case studies

SLIDE 18

Outline

  • Introduction
  • Problem Statement
  • Proposed Approach
  • Experiments
  • Conclusion

SLIDE 19

Conclusion

  • NERank

– Effective at ranking named entities in documents with little human intervention

  • Future work

– A general framework for entity ranking from different types of texts (e.g., documents, tweets)
– A complete benchmark for evaluating entity ranking

SLIDE 20

Thanks!

Questions & Answers