NERank: Bringing Order to Named Entities from Texts
Chengyu Wang¹, Rong Zhang¹, Xiaofeng He¹, Guomin Zhou², Aoying Zhou¹
¹ Institute for Data Science and Engineering, East China Normal University; ² Zhejiang Police College


  1. NERank: Bringing Order to Named Entities from Texts. Chengyu Wang¹, Rong Zhang¹, Xiaofeng He¹, Guomin Zhou², Aoying Zhou¹ (¹ Institute for Data Science and Engineering, East China Normal University; ² Zhejiang Police College)

  2. Outline • Introduction • Problem Statement • Proposed Approach • Experiments • Conclusion

  3. Entity Ranking
     • Ranking entities from texts
       – Input: a text collection
       – Output: a ranked order of named entities
     • Why entity ranking?
       – Entity-oriented Web search: given a query, retrieve a list of entities from relevant documents
       – Web semantification: add semantic tags to Web documents
       – Knowledge base population: extract and rank entities, then link them to knowledge bases

  4. Outline • Introduction • Problem Statement • Proposed Approach • Experiments • Conclusion

  5. Problem Statement
     • Given a document collection $D$ and a normalized named entity collection $E$ detected from $D$, the goal is to give each entity $e \in E$ a rank $r(e)$ that denotes its relative importance, such that
       – $0 \le r(e) \le 1$
       – $\sum_{e \in E} r(e) = 1$

  6. General Framework

  7. Outline • Introduction • Problem Statement • Proposed Approach • Experiments • Conclusion

  8. Topical Tripartite Graph Modeling • Topics in Egypt Revolution • TTG construction

  9. Prior Topic Rank Estimation: Three Quality Metrics
     • Probabilities derived from TTG modeling
       – $\theta_{i,j}$: probability of topic $t_j$ in document $d_i$
       – $\hat{\varphi}_{i,j}$: probability of normalized entity $e_j$ in topic $t_i$
     • Quality metrics
       – Prior probability: $pr(t_i) = \frac{1}{|D|} \sum_{j=1}^{|D|} \theta_{j,i}$
       – Entity richness: $er(t_i) \propto \sum_{j=1}^{|E|} \hat{\varphi}_{i,j}$
       – Topic specificity: $ts(t_i) = 0$ if $pr(t_i) < \epsilon$; if $pr(t_i) \ge \epsilon$, $ts(t_i)$ is a normalized score built from the entropy term $\sum_{j=1}^{|D|} \theta_{j,i} \log_2 \theta_{j,i}$
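
The prior probability and the entropy term behind topic specificity can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: `theta[i, j]` is taken to be the probability of topic j in document i, the function names and toy values are hypothetical, and the normalization constants of the slide's entity richness and topic specificity are not reproduced, so `entropy_term` returns the raw sum only.

```python
import numpy as np

def prior_probability(theta):
    """Average topic probability over all documents.

    theta[i, j] is assumed to be the probability of topic j in document i.
    """
    return theta.mean(axis=0)

def entropy_term(theta, pr, eps):
    """Per-topic sum over documents of theta * log2(theta).

    Topics whose prior probability is below eps get 0, mirroring the
    thresholded definition. The slide's normalization is NOT applied here.
    """
    safe = np.where(theta > 0, theta, 1.0)  # log2(1) = 0, so zero entries add nothing
    term = (theta * np.log2(safe)).sum(axis=0)
    return np.where(pr < eps, 0.0, term)

# Toy document-topic matrix: 3 documents, 3 topics (illustrative values only).
theta = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.8, 0.1, 0.1]])
pr = prior_probability(theta)
ts_raw = entropy_term(theta, pr, eps=0.15)
```

In this toy example the third topic falls below the threshold, so its entropy term is zeroed out regardless of how it is distributed over documents.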

  10. Prior Topic Rank Estimation: Ranking Function
      • Linear ranking function: $r_0(t_i) = W^T \cdot F(t_i)$
        – $F(t_i) = \langle pr(t_i), er(t_i), ts(t_i) \rangle$
        – $\sum_i w_i = 1$
      • Parameter learning
        – For two topics $t_i$ and $t_j$, if $t_i$ is a more important topic than $t_j$, we have $r_0(t_i) > r_0(t_j)$
        – Optimization objective: minimize $\|W\|^2 + C \cdot \sum_{i,j} \xi_{i,j}^2$
        – Constraints: $W^T \cdot F(t_i) - W^T \cdot F(t_j) \ge 1 - \xi_{i,j}$
        – Train a linear SVM classifier to learn the weights
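
A minimal sketch of the pairwise learning step, assuming the slack variables are eliminated as a squared hinge loss and the objective is minimized by plain gradient descent rather than by an SVM package. The function name, toy feature values, step size, and the final rescaling of the weights to sum to 1 are all illustrative assumptions, not the authors' training setup.

```python
import numpy as np

def learn_rank_weights(features, preferred_pairs, C=10.0, lr=0.01, epochs=2000):
    """Minimize ||W||^2 + C * sum(xi^2) with xi = max(0, 1 - W.(F(t_i) - F(t_j))).

    features: (num_topics, 3) array holding <pr, er, ts> per topic.
    preferred_pairs: list of (i, j), meaning topic i should outrank topic j.
    """
    diffs = np.array([features[i] - features[j] for i, j in preferred_pairs])
    w = np.zeros(features.shape[1])
    for _ in range(epochs):
        slack = np.maximum(0.0, 1.0 - diffs @ w)  # xi_{i,j} for every pair
        grad = 2.0 * w - 2.0 * C * (slack[:, None] * diffs).sum(axis=0)
        w -= lr * grad
    return w

# Toy feature vectors <pr, er, ts> for three topics (illustrative values only).
features = np.array([[0.5, 0.6, 0.7],
                     [0.3, 0.3, 0.4],
                     [0.2, 0.1, 0.1]])
pairs = [(0, 1), (1, 2), (0, 2)]

w = learn_rank_weights(features, pairs)
w_normalized = w / w.sum()        # rescale so the weights sum to 1 (assumption)
scores = features @ w_normalized  # r_0(t_i) for each topic
```

Rescaling the weights by a positive constant leaves the induced topic order unchanged, which is why the normalization can be applied after training in this sketch.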

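  11. Meta-Path Constrained Random Walk Algorithm
      • Initialization
        – $r(t_i) = r_0(t_i)$
      • Probability propagation
        – Following the TDT (Topic-Doc-Topic) meta-path $t_i \to d_j \to t_k$ (with prob. $\alpha > 0$): move $t_i \to d_j$ with probability $\theta_{j,i} / \sum_{d_k \in D} \theta_{k,i}$, then $d_j \to t_k$ with probability $\theta_{j,k}$, where $\theta_{j,i}$ is the probability of topic $t_i$ in document $d_j$
        – Following the TET (Topic-Entity-Topic) meta-path $t_i \to e_j \to t_k$ (with prob. $\beta > 0$): move $t_i \to e_j$ with probability $\hat{\varphi}_{i,j} / \sum_{e_k \in E} \hat{\varphi}_{i,k}$, then $e_j \to t_k$ with probability $\hat{\varphi}_{k,j} / \sum_{t_l \in T} \hat{\varphi}_{l,j}$
        – Random jump (with prob. $1 - \alpha - \beta > 0$)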
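
One propagation step of such a walk can be sketched as follows. The sketch assumes `theta[j, i]` is the probability of topic i in document j and `phi[i, j]` is the weight of entity j in topic i; the helper names `tdt_matrix`, `tet_matrix`, and `propagate`, and the toy numbers, are hypothetical. Each two-step meta-path is collapsed into a topic-to-topic matrix whose columns sum to 1.

```python
import numpy as np

def tdt_matrix(theta):
    """Topic-Doc-Topic transitions: A[k, i] = P(t_i -> some document -> t_k)."""
    theta_col = theta / theta.sum(axis=0, keepdims=True)  # step t_i -> d_j
    return theta.T @ theta_col                            # step d_j -> t_k uses theta[j, k]

def tet_matrix(phi):
    """Topic-Entity-Topic transitions: B[k, i] = P(t_i -> some entity -> t_k)."""
    phi_row = phi / phi.sum(axis=1, keepdims=True)  # step t_i -> e_j
    phi_col = phi / phi.sum(axis=0, keepdims=True)  # step e_j -> t_k
    return phi_col @ phi_row.T

def propagate(rank, rank0, theta, phi, alpha, beta):
    """One step: follow TDT w.p. alpha, TET w.p. beta, else jump back to rank0."""
    walk = alpha * tdt_matrix(theta) + beta * tet_matrix(phi)
    return walk @ rank + (1.0 - alpha - beta) * rank0

# Toy data: 3 documents, 2 topics, 3 entities (illustrative values only).
theta = np.array([[0.9, 0.1],
                  [0.5, 0.5],
                  [0.2, 0.8]])
phi = np.array([[0.5, 0.3, 0.2],
                [0.1, 0.2, 0.7]])
rank0 = np.array([0.6, 0.4])

A = tdt_matrix(theta)
B = tet_matrix(phi)
rank1 = propagate(rank0, rank0, theta, phi, alpha=0.4, beta=0.4)
```

Because both matrices are column-stochastic and the three move probabilities sum to 1, a rank vector that sums to 1 keeps summing to 1 after every step.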

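  12. Proof of Convergence (1)
      • Update rule of NERank: $T_n = \alpha \cdot \Theta^T \Theta \cdot T_{n-1} + \beta \cdot \hat{\Phi} \hat{\Phi}^T \cdot T_{n-1} + (1 - \alpha - \beta) T_0$
      • Non-recursive form of NERank: $T_n = M^n T_0 + (1 - \alpha - \beta) \sum_{i=0}^{n-1} M^i T_0$
        – where $M = \alpha \cdot \Theta^T \Theta + \beta \cdot \hat{\Phi} \hat{\Phi}^T$, with the factors normalized so that both products are transition matrices
      • Matrix limit of $T_n$
        – $\lim_{n \to \infty} T_n = \lim_{n \to \infty} M^n T_0 + (1 - \alpha - \beta) \lim_{n \to \infty} \sum_{i=0}^{n-1} M^i T_0$
        – $\lim_{n \to \infty} M^n T_0 = 0$ (because $\Theta^T \Theta$ and $\hat{\Phi} \hat{\Phi}^T$ are transition matrices with $0 < \alpha + \beta < 1$)
        – $\lim_{n \to \infty} \sum_{i=0}^{n-1} M^i T_0 = (I - M)^{-1} T_0$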
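
The step from the recursive update to the non-recursive form can be spelled out by unrolling the recursion. In the sketch below, $T_n$ is the topic rank vector after $n$ iterations, $M$ is the combined walk matrix, and $c$ is shorthand introduced here for the random-jump probability. It assumes $M$ is a combination of two stochastic matrices weighted by $\alpha$ and $\beta$, so that its induced 1-norm is at most $\alpha + \beta < 1$ and $M^n \to 0$.

```latex
\begin{align*}
c &:= 1 - \alpha - \beta \\
T_1 &= M T_0 + c\,T_0 \\
T_2 &= M T_1 + c\,T_0 = M^2 T_0 + c\,(M + I)\,T_0 \\
T_n &= M^n T_0 + c \sum_{i=0}^{n-1} M^i T_0 \\
(I - M) \sum_{i=0}^{n-1} M^i &= I - M^n \;\longrightarrow\; I
  \quad (n \to \infty,\ \text{since } M^n \to 0) \\
\Rightarrow\ \lim_{n \to \infty} T_n &= c\,(I - M)^{-1} T_0
\end{align*}
```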

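  13. Proof of Convergence (2)
      • Matrix limit of $T_n$: $\lim_{n \to \infty} T_n = (1 - \alpha - \beta)(I - M)^{-1} T_0$
      • Closed form of $T_n$: $T^* = (1 - \alpha - \beta)(I - \alpha \cdot \Theta^T \Theta - \beta \cdot \hat{\Phi} \hat{\Phi}^T)^{-1} T_0$
      • Closed form of $E_n$: $E^* = (1 - \alpha - \beta)\, \hat{\Phi}^T (I - \alpha \cdot \Theta^T \Theta - \beta \cdot \hat{\Phi} \hat{\Phi}^T)^{-1} T_0$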
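
A quick numerical check of the closed form, as a sketch under assumptions: the two meta-path matrices are replaced by random column-stochastic stand-ins (hypothetical names `A` and `B`), and the values of `alpha` and `beta` are arbitrary. The iterated update and the direct linear solve should agree.

```python
import numpy as np

def iterate(M, t0, jump, steps=500):
    """Repeatedly apply T_n = M T_{n-1} + jump * T_0, starting from T_0."""
    t = t0.copy()
    for _ in range(steps):
        t = M @ t + jump * t0
    return t

def closed_form(M, t0, jump):
    """T* = jump * (I - M)^{-1} T_0, computed with a linear solve."""
    return jump * np.linalg.solve(np.eye(M.shape[0]) - M, t0)

rng = np.random.default_rng(0)

def random_column_stochastic(n):
    """Random n x n matrix whose columns each sum to 1 (stand-in for a walk matrix)."""
    a = rng.random((n, n))
    return a / a.sum(axis=0, keepdims=True)

alpha, beta = 0.5, 0.3
A = random_column_stochastic(4)  # stand-in for the TDT transition matrix
B = random_column_stochastic(4)  # stand-in for the TET transition matrix
M = alpha * A + beta * B

t0 = rng.random(4)
t0 /= t0.sum()                   # prior topic ranks summing to 1

t_iter = iterate(M, t0, 1.0 - alpha - beta)
t_closed = closed_form(M, t0, 1.0 - alpha - beta)
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a common numerical choice; for a handful of topics either would do.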

  14. Outline • Introduction • Problem Statement • Proposed Approach • Experiments • Conclusion

  15. Experiments (1)
      • Datasets
        – 50 newswire collections from TimelineData and CrisisData, each related to an international event
        – Example events: Egypt Revolution, Iraq War, BP Oil Spill, etc.
      • Hyper-parameter settings

  16. Experiments (2)
      • Comparative study
        – Baselines: TF-IDF, TextRank, LexRank, and Kim et al.
        – Variants of our approach: NERank_Uni and NERank_{α=0}

  17. Experiments (3) • Case studies

  18. Outline • Introduction • Problem Statement • Proposed Approach • Experiments • Conclusion

  19. Conclusion
      • NERank
        – Effective at ranking named entities in documents with little human intervention
      • Future work
        – A general framework for entity ranking from different types of texts (e.g., documents, tweets)
        – A complete benchmark for evaluating entity ranking

  20. Thanks! Questions & Answers
