
Event Phase Extraction and Summarization, Chengyu Wang, Rong Zhang - PowerPoint PPT Presentation



  1. Event Phase Extraction and Summarization
     Chengyu Wang¹, Rong Zhang¹, Xiaofeng He¹, Guomin Zhou², Aoying Zhou¹
     ¹ Institute for Data Science and Engineering, East China Normal University
     ² Zhejiang Police College

  2. Outline • Introduction • Problem Statement • Proposed Approach • Experiments • Conclusion

  3. Event Phase Extraction and Summarization (1)
     • Event phase
       – Model a single event as multiple event phases
       – Each event phase corresponds to a single development period of a long, complicated event
     • Example: Egypt Revolution (https://en.wikipedia.org/wiki/Egyptian_revolution_of_2011)
       1. Protests against Hosni Mubarak
       2. Egypt under the Supreme Council
       3. Egypt under President Morsi
       4. Protests against President Morsi

  4. Event Phase Extraction and Summarization (2)
     • Event phase extraction and summarization
       – Input: a collection of news articles about the same event
       – Event phase extraction: cluster the news articles into event phases
       – Event phase summarization: for each event phase, select the top-k news headlines as its summary
     • Techniques
       – Graph-based representation of news articles: the Temporal Content Coherence Graph (TCCG)
       – A structural clustering algorithm that partitions news articles into event phases: EPCluster
       – News headline ranking and selection: a vertex-reinforced random walk process

  5. Outline • Introduction • Problem Statement • Proposed Approach • Experiments • Conclusion

  6. Problem Statement
     • News article $d_i = (h_i, t_i, s_i)$
       – $h_i$: news headline
       – $t_i$: publication time
       – $s_i$: the collection of sentences in the news content
     • News collection $D = \{d_i\}$
     • Event phase summary $P = \{(h_i, t_i)\}_{i=1}^{k}$
       – A collection of $k$ (news headline, publication time) pairs
     • Event phase extraction and summarization
       – Input: a news collection $D$
       – Output: a collection of $N$ event phase summaries $\mathbf{P} = \{P_j\}_{j=1}^{N}$
       – The number $N$ is not pre-defined.
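
To make the notation concrete, here is a minimal Python sketch of the data involved. The names `NewsArticle`, `PhaseSummary`, and `to_summary` are hypothetical. The time unit is an assumption, and listing a summary's pairs in chronological order is a presentation choice that the slides do not specify.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class NewsArticle:
    """One news article d_i = (h_i, t_i, s_i); class name is hypothetical."""
    headline: str                                       # h_i
    time: float                                         # t_i (unit assumed, e.g. days)
    sentences: List[str] = field(default_factory=list)  # s_i


# An event phase summary P: k (headline, publication time) pairs.
PhaseSummary = List[Tuple[str, float]]


def to_summary(selected: List[NewsArticle]) -> PhaseSummary:
    """Project selected articles onto (h_i, t_i) pairs, listed in time order (assumption)."""
    return [(a.headline, a.time) for a in sorted(selected, key=lambda a: a.time)]


# The full output is a list of N such summaries; N is discovered, not fixed in advance.
```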

  7. Outline • Introduction • Problem Statement • Proposed Approach • Experiments • Conclusion

  8. Framework of Event Phase Extraction

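  9. Semantic Relatedness (1)
     • Content coherence
       – Topic-level similarity: Jensen-Shannon divergence between the topic distributions $\theta_i$ and $\theta_j$
         $D_{JS}(\theta_i \| \theta_j) = \frac{D_{KL}(\theta_i \| \bar{\theta}) + D_{KL}(\theta_j \| \bar{\theta})}{2}$, where $\bar{\theta} = \frac{\theta_i + \theta_j}{2}$
       – Entity-level similarity: Tanimoto coefficient, where $\mathbf{c}_i$ is the count vector of key entities in $d_i$
         $TC(\mathbf{c}_i, \mathbf{c}_j) = \frac{\mathbf{c}_i^{\top}\mathbf{c}_j}{\|\mathbf{c}_i\|^2 + \|\mathbf{c}_j\|^2 - \mathbf{c}_i^{\top}\mathbf{c}_j}$
       – Content coherence score
         $w_c(d_i, d_j) = \alpha\,(1 - D_{JS}(\theta_i \| \theta_j)) + (1 - \alpha)\,TC(\mathbf{c}_i, \mathbf{c}_j)$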
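
The three content-coherence formulas can be sketched in plain Python as follows. The sketch assumes that topic distributions are probability vectors over the same topics (for example, from LDA). It uses base-2 logarithms so that $D_{JS}$ stays in [0, 1]. The log base and the default weight `alpha=0.5` are assumptions, not values taken from the slides.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Base-2 KL divergence; terms with p_k == 0 contribute 0 by convention."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_JS(p || q) = (D_KL(p || m) + D_KL(q || m)) / 2 with m the average of p and q."""
    m = [(pk + qk) / 2 for pk, qk in zip(p, q)]
    return (kl_divergence(p, m) + kl_divergence(q, m)) / 2


def tanimoto(c_i: Sequence[float], c_j: Sequence[float]) -> float:
    """Tanimoto coefficient of two entity count vectors."""
    dot = sum(a * b for a, b in zip(c_i, c_j))
    denom = sum(a * a for a in c_i) + sum(b * b for b in c_j) - dot
    return dot / denom if denom > 0 else 0.0


def content_coherence(theta_i, theta_j, c_i, c_j, alpha: float = 0.5) -> float:
    """w_c(d_i, d_j); alpha=0.5 is a hypothetical default, not the paper's value."""
    return alpha * (1 - js_divergence(theta_i, theta_j)) + (1 - alpha) * tanimoto(c_i, c_j)
```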

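  10. Semantic Relatedness (2)
     • Temporal influence
       – Use a Hamming kernel to map the publication time gap to a real number in [0, 1]
         $\Delta t_{i,j} = |t_i - t_j|$
         $w_t(d_i, d_j) = \begin{cases} \frac{1}{2}\left(1 + \cos\frac{\pi\,\Delta t_{i,j}}{\sigma}\right), & \Delta t_{i,j} < \sigma \\ 0, & \Delta t_{i,j} \ge \sigma \end{cases}$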
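
The kernel can be transcribed directly. This sketch assumes that publication times are plain numbers in one unit (for example, days) and that `sigma` is the kernel width $\sigma$ in that same unit. The function name is hypothetical.

```python
import math


def temporal_influence(t_i: float, t_j: float, sigma: float) -> float:
    """w_t(d_i, d_j): cosine (Hamming) kernel on the publication time gap."""
    dt = abs(t_i - t_j)
    if dt >= sigma:
        return 0.0  # articles at least sigma apart have no temporal influence
    return 0.5 * (1 + math.cos(math.pi * dt / sigma))
```

The score is 1 for articles published at the same time. It decreases smoothly as the gap grows and reaches 0 once the gap equals $\sigma$.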

  11. Structural Clustering
     • Temporal Content Coherence Graph (TCCG)
       – Vertices are news articles; an edge links $d_i$ and $d_j$ when both conditions hold:
         content coherence $w_c(d_i, d_j) > \mu_1$ and temporal influence $w_t(d_i, d_j) > \mu_2$
     • EPCluster: a structural clustering algorithm
       – Parameter: $MinPts$ (the slide's example uses $MinPts = 3$)
       – Core object
       – Border object
       – Noise object
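
The slides name only the parameter $MinPts$ and the three object types. The sketch below therefore assumes a DBSCAN-style procedure on the TCCG:

- A vertex counts as a core object when it has at least `min_pts` neighbours.
- Clusters grow from core objects along edges.
- Non-core vertices reached from a core object become border objects.
- All remaining vertices are noise.

This is not the authors' exact EPCluster. `build_tccg` and `structural_cluster` are hypothetical names, and the pairwise scores are passed in as callables.

```python
from collections import deque
from typing import Callable, Dict, Set

Pairwise = Callable[[int, int], float]


def build_tccg(n: int, w_c: Pairwise, w_t: Pairwise,
               mu1: float, mu2: float) -> Dict[int, Set[int]]:
    """Undirected TCCG over articles 0..n-1: edge iff w_c > mu1 and w_t > mu2."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if w_c(i, j) > mu1 and w_t(i, j) > mu2:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def structural_cluster(adj: Dict[int, Set[int]], min_pts: int) -> Dict[int, int]:
    """DBSCAN-style clustering on the graph; label -1 marks noise objects.

    Assumption: a vertex is a core object when it has >= min_pts neighbours.
    """
    labels: Dict[int, int] = {}
    next_id = 0
    for v in adj:
        if v in labels or len(adj[v]) < min_pts:
            continue  # already clustered, or not a core object
        labels[v] = next_id
        queue = deque([v])
        while queue:
            u = queue.popleft()
            if len(adj[u]) < min_pts:
                continue  # border object: joins the cluster but does not expand it
            for x in adj[u]:
                if x not in labels:
                    labels[x] = next_id
                    queue.append(x)
        next_id += 1
    # Vertices never reached from a core object are noise objects.
    return {v: labels.get(v, -1) for v in adj}
```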

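  12. Cluster Postprocessing
     • Goal
       – Use a classifier to filter out "small" clusters that do not correspond to an actual event phase
     • Features of a cluster $C_i$
       – Article quantity: $N(C_i) = \frac{|C_i|}{|D|} \times 100\%$
       – Time interval: $T(C_i) = t_{\max} - t_{\min}$, the gap between the latest and earliest publication times in $C_i$
       – Pairwise topic similarity: $ATS(C_i) = 1 - \frac{2}{|C_i|(|C_i| - 1)} \sum_{d_m, d_n \in C_i} D_{JS}(\theta_m \| \theta_n)$
       – Pairwise entity similarity: $AES(C_i) = \frac{2}{|C_i|(|C_i| - 1)} \sum_{d_m, d_n \in C_i} TC(\mathbf{c}_m, \mathbf{c}_n)$
     • Prediction function (logistic, over the feature vector $\mathbf{x}(C_i)$): $f(C_i) = \frac{1}{1 + e^{-\mathbf{w}^{\top}\mathbf{x}(C_i)}}$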
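
The four features and the prediction function could be computed as shown below. The sketch rests on several assumptions; none of these names or values come from the slides:

- The pairwise divergence and entity similarity are supplied as callables (`js`, `tc`).
- Averaging over unordered pairs is used, which equals the $\frac{2}{|C_i|(|C_i|-1)}\sum$ form.
- Single-article clusters are handled with a made-up convention.
- The logistic weights and the explicit `bias` term would have to be learned.

```python
import math
from itertools import combinations
from typing import Callable, List, Mapping, Sequence

Pairwise = Callable[[int, int], float]


def cluster_features(cluster: Sequence[int], n_total: int,
                     times: Mapping[int, float], js: Pairwise, tc: Pairwise) -> List[float]:
    """Return [N(C_i), T(C_i), ATS(C_i), AES(C_i)] for one cluster of article ids."""
    quantity = len(cluster) * 100.0 / n_total  # N(C_i), in percent
    interval = max(times[d] for d in cluster) - min(times[d] for d in cluster)  # T(C_i)
    pairs = list(combinations(cluster, 2))  # unordered pairs (d_m, d_n)
    if pairs:
        ats = 1 - sum(js(m, n) for m, n in pairs) / len(pairs)
        aes = sum(tc(m, n) for m, n in pairs) / len(pairs)
    else:
        ats = aes = 1.0  # hypothetical convention for single-article clusters
    return [quantity, interval, ats, aes]


def keep_probability(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic f(C_i); weights and bias are hypothetical and would be learned."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```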

  13. News Article Ranking
     • Goal
       – Assign each news article in an event phase an "informativeness" rank value
     • Vertex-reinforced random walk process
       – Graph construction: build a complete graph whose nodes are the news articles in an event phase
       – Prior transition probability: $M_0(a, b) = \frac{1}{Z}\, w_c(d_a, d_b) \cdot w_t(d_a, d_b)$, where $Z$ is a normalization constant
     • Rank propagation process
       – Transition matrix update: $T_\tau = [R_\tau\ R_\tau\ \cdots\ R_\tau]$, $M_{\tau+1} = \lambda T_\tau M_\tau + (1 - \lambda) M_0$
       – Rank update: $R_{\tau+1} = \lambda M_{\tau+1} R_\tau + (1 - \lambda) R_0$
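
The slides write the reinforcement step as $T_\tau M_\tau$, with $T_\tau$ built from copies of $R_\tau$. The sketch below assumes the usual vertex-reinforced reading, as in ranking methods such as DivRank: each transition into $d_a$ is scaled element-wise by the current rank of $d_a$, and the columns are then renormalised. Entry `m[a][b]` is the probability of moving from $d_b$ to $d_a$. The uniform $R_0$, the absence of self-loops, `lam=0.85`, and the convergence test are all assumptions.

```python
from typing import List

Matrix = List[List[float]]


def _normalize_columns(mat: Matrix) -> Matrix:
    """Make every column sum to 1 (uniform column if it is all zeros)."""
    n = len(mat)
    out = [[0.0] * n for _ in range(n)]
    for b in range(n):
        s = sum(mat[a][b] for a in range(n))
        for a in range(n):
            out[a][b] = mat[a][b] / s if s > 0 else 1.0 / n
    return out


def vrrw_rank(weights: Matrix, lam: float = 0.85,
              iters: int = 200, tol: float = 1e-12) -> List[float]:
    """weights[a][b] = w_c(d_a, d_b) * w_t(d_a, d_b); returns one rank per article."""
    n = len(weights)
    # Prior M_0: column-normalised edge weights, no self-loops (assumption).
    m0 = _normalize_columns([[weights[a][b] if a != b else 0.0 for b in range(n)]
                             for a in range(n)])
    r0 = [1.0 / n] * n  # uniform initial rank R_0 (assumption)
    m, r = m0, r0
    for _ in range(iters):
        # Reinforcement: scale row a by r[a], so moves into d_a are boosted
        # in proportion to its current rank; then renormalise columns.
        boosted = _normalize_columns([[r[a] * m[a][b] for b in range(n)]
                                      for a in range(n)])
        m = [[lam * boosted[a][b] + (1 - lam) * m0[a][b] for b in range(n)]
             for a in range(n)]
        new_r = [lam * sum(m[a][b] * r[b] for b in range(n)) + (1 - lam) * r0[a]
                 for a in range(n)]
        converged = max(abs(x - y) for x, y in zip(new_r, r)) < tol
        r = new_r
        if converged:
            break
    return r
```

Because every column of `m` sums to 1 and `r0` sums to 1, the returned ranks also sum to 1.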

  14. Event Phase Summary Generation
     • News article selection problem
       – Select $k$ news articles from $C_i$ (denoted $S_i$) to generate the event phase summary
       – Optimization problem, where $r(d_j)$ is the rank value of $d_j$
         • Objective function: $\max_{S_i \subset C_i} R(S_i) = \sum_{d_j \in S_i} r(d_j)$
         • Subject to: $|S_i| = k$, and $\forall d_a, d_b \in S_i$: $w_c(d_a, d_b) \le \mu_1$, $w_t(d_a, d_b) \le \mu_2$
       – Algorithm
         • A greedy algorithm with approximation ratio $1 - \frac{1}{e}$
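
Here is a plain greedy pass over the ranked articles, offered as a sketch. It takes articles in decreasing rank order and keeps one only if it does not conflict with any article already chosen. The caller supplies a hypothetical predicate `redundant(a, b)`, which should return True when a pair violates the slide's $w_c$/$w_t$ constraints. This simple loop is not claimed to be the authors' exact algorithm, and it does not by itself establish the $1 - \frac{1}{e}$ guarantee.

```python
from typing import Callable, List, Sequence


def greedy_select(ranks: Sequence[float], k: int,
                  redundant: Callable[[int, int], bool]) -> List[int]:
    """Return up to k article indices, highest rank first, with no redundant pair."""
    selected: List[int] = []
    for d in sorted(range(len(ranks)), key=lambda i: ranks[i], reverse=True):
        if len(selected) == k:
            break
        # Keep d only if it satisfies the pairwise constraints with every chosen article.
        if not any(redundant(d, s) for s in selected):
            selected.append(d)
    return selected
```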

  15. Outline • Introduction • Problem Statement • Proposed Approach • Experiments • Conclusion

  16. Experiments (1)
     • Datasets
       – Four English news datasets about recent, long-running armed conflicts
       – News sources: 24 news agencies, e.g., Associated Press, Reuters, and The Guardian

  17. Experiments (2)
     • Parameter tuning
       – Pairwise judgment
         • Test set: news article pairs $T_i = \{(d_a, d_b)\}$
         • Each pair is manually labeled according to whether both articles belong to the same event phase
       – Evaluation metrics: precision, recall, and F-measure
       – Experimental results (selected parameters): $\mu_1 = 0.4$, $\mu_2 = 0.5$, $MinPts = 10$
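
Pairwise precision, recall, and F-measure can be computed from the labeled pairs as follows. The sketch assumes a pair is predicted to be in the "same phase" when both articles receive the same cluster label and neither is noise (label -1). The function name and that noise convention are assumptions.

```python
from typing import Mapping, Sequence, Tuple


def pairwise_prf(pairs: Sequence[Tuple[int, int]], gold_same: Sequence[bool],
                 labels: Mapping[int, int]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure of same-phase predictions over labeled pairs."""
    tp = fp = fn = 0
    for (a, b), same in zip(pairs, gold_same):
        # Assumption: noise articles (label -1) are never in the same phase.
        predicted = labels[a] != -1 and labels[a] == labels[b]
        if predicted and same:
            tp += 1
        elif predicted:
            fp += 1
        elif same:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```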

  18. Experiments (3)
     • Baselines
       – VSMCluster: k-means on TF-IDF-weighted word features
       – TopicCluster: k-means on LDA topic distributions
       – SCAN: a structural clustering algorithm for network partitioning
       – EPCluster-C: EPCluster without postprocessing
     • Results
       – Our method, EPCluster, is effective for event phase extraction.

  19. Experiments (4)
     • Baselines
       – Random: selects news articles at random
       – Longest: selects the news articles with the longest headlines
       – Tran et al., Chieu et al.: timeline generation methods
       – Our Method (PageRank): a variant of our method based on PageRank
     • Evaluation
       – Evaluate the relevance of the selected news headlines against gold-standard event summaries
       – Experimental results

  20. Case Study

  21. Outline • Introduction • Problem Statement • Proposed Approach • Experiments • Conclusion

  22. Conclusion
     • Event phase extraction and summarization
       – A structural clustering algorithm for event phase extraction based on the TCCG
       – Summary generation via news article ranking and rank optimization
     • Future work
       – Improve document summarization and timeline generation by taking event phases into account

  23. Thanks! Questions & Answers
