Event Phase Extraction and Summarization
Chengyu Wang1, Rong Zhang1, Xiaofeng He1, Guomin Zhou2, Aoying Zhou1
1) Institute for Data Science and Engineering,
East China Normal University
2) Zhejiang Police College
– Model a single event as multiple event phases
– Each event phase corresponds to a single development period of a long, complicated event.
Egypt Revolution
https://en.wikipedia.org/wiki/Egyptian_revolution_of_2011
– Input: a collection of news articles about the same event
– Event phase extraction: cluster news articles into different event phases
– Event phase summarization: select top-k news headlines as the event phase summary for each event phase
– Graph-based representation of news articles: Temporal Content Coherence Graph (TCCG)
– A structural clustering algorithm to partition news articles into event phases: EPCluster
– News headline ranking and selection: vertex-reinforced random walk process
– $h_i$: news headline
– $t_i$: publication time
– $s_i$: the sentence collection of news contents
– Event phase summary: a collection of $k$ (news headline, publication time) pairs
– Input: a news collection $D$
– Output: a collection of $N$ event phase summaries $\mathbf{P} = \{P_j\}_{j=1}^{N}$
– The number $N$ is not pre-defined.
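The input and output described above can be pictured with a small Python sketch. The class name `NewsArticle`, the alias `EventPhaseSummary`, and the helper `to_summary` are hypothetical names chosen for illustration; the slides only state that an article has a headline, a publication time, and a sentence collection, and that a summary is a set of headline and time pairs.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class NewsArticle:
    """One news article: headline, publication time, sentence collection."""
    headline: str
    published: date
    sentences: List[str]


# Assumed layout: an event phase summary is a list of (headline, time) pairs.
EventPhaseSummary = List[Tuple[str, date]]


def to_summary(articles: List[NewsArticle]) -> EventPhaseSummary:
    """Turn already-selected articles into (headline, time) pairs, oldest first."""
    ordered = sorted(articles, key=lambda a: a.published)
    return [(a.headline, a.published) for a in ordered]
```

How the articles are selected is a separate step; this helper only fixes the shape of the output.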
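– Topic level similarity: Jensen-Shannon divergence between topic distributions
$D_{JS}(\theta_i \| \theta_j) = \frac{D_{KL}(\theta_i \| \bar{\theta}) + D_{KL}(\theta_j \| \bar{\theta})}{2}$
– Entity level similarity: Tanimoto coefficient between entity vectors
$TC(C_i, C_j) = \frac{C_i^{T} \cdot C_j}{\|C_i\|^2 + \|C_j\|^2 - C_i^{T} \cdot C_j}$
– Content coherence score
$w_c(d_i, d_j) = \alpha \left(1 - D_{JS}(\theta_i \| \theta_j)\right) + (1 - \alpha) \, TC(C_i, C_j)$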
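The content coherence score mixes a topic term (one minus the Jensen-Shannon divergence) with an entity term (the Tanimoto coefficient). The sketch below is one way to compute it in plain Python. It assumes base-2 logarithms so that the divergence stays in [0, 1], takes the mean distribution as the element-wise average of the two topic distributions, and uses a hypothetical default of 0.5 for the mixing weight (called `alpha` here); none of these choices is stated on the slides.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) with base-2 logs; terms where p[k] == 0 contribute 0."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence via the average distribution of p and q."""
    mean = [(pk + qk) / 2 for pk, qk in zip(p, q)]
    return (kl_divergence(p, mean) + kl_divergence(q, mean)) / 2


def tanimoto(a, b):
    """Tanimoto coefficient of two entity vectors (0.0 if both are all-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


def content_coherence(theta_i, theta_j, ent_i, ent_j, alpha=0.5):
    """Weighted mix of topic similarity and entity similarity."""
    topic_sim = 1 - js_divergence(theta_i, theta_j)
    return alpha * topic_sim + (1 - alpha) * tanimoto(ent_i, ent_j)
```

Two articles with identical topic distributions and identical entity vectors get a score of 1; articles with disjoint topics and no shared entities get 0.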
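– Use a Hamming kernel to map the publication time gap to a real number in [0,1]
$\Delta t_{i,j} = |t_i - t_j|$
$w_t(d_i, d_j) = \begin{cases} \frac{1}{2}\left(1 + \cos\left(\frac{\Delta t_{i,j} \cdot \pi}{\sigma}\right)\right), & \Delta t_{i,j} < \sigma \\ 0, & \Delta t_{i,j} \geq \sigma \end{cases}$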
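The temporal influence can be sketched in a few lines of Python. The sketch assumes that the time gap is taken as an absolute difference, that times and the window width (called `sigma` here) share one unit such as days, and that the score drops to 0 once the gap reaches the window width. These are readings of the slide, not confirmed details.

```python
import math


def temporal_influence(t_i, t_j, sigma):
    """Raised-cosine weight in [0, 1]: 1 for equal times, 0 at or beyond sigma."""
    gap = abs(t_i - t_j)
    if gap >= sigma:
        return 0.0
    return 0.5 * (1 + math.cos(gap * math.pi / sigma))
```

With a hypothetical window of 7 days, two articles published on the same day score 1, articles 3.5 days apart score about 0.5, and articles a week or more apart score 0.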
– Parameter: $MinPts$
– Core object
– Border object
– Noise object
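One way to read the three object types is in the density-based style of DBSCAN and SCAN. The sketch below assumes that two articles are neighbours when both their content coherence and their temporal influence exceed a threshold (called `mu1` and `mu2` here), that a core object has at least `min_pts` neighbours not counting itself, that a border object is a non-core neighbour of a core object, and that everything else is noise. EPCluster's exact definitions may differ.

```python
def classify_objects(w_c, w_t, mu1, mu2, min_pts):
    """Label each article as 'core', 'border' or 'noise'.

    w_c, w_t: symmetric n x n lists of pairwise content and temporal scores.
    """
    n = len(w_c)
    # Neighbours must pass both thresholds; an article is not its own neighbour.
    neighbors = [
        [j for j in range(n) if j != i and w_c[i][j] > mu1 and w_t[i][j] > mu2]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbors[i]):
            labels.append("border")
        else:
            labels.append("noise")
    return labels
```

Clusters would then be grown from core objects through their neighbours, with border objects attached to a neighbouring cluster and noise objects left out.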
– Content coherence: $w_c(d_i, d_j) > \mu_1$
– Temporal influence: $w_t(d_i, d_j) > \mu_2$
– $MinPts = 3$
– Use a classifier to filter out “small” clusters that do not correspond to an actual event phase
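– Article quantity: $N(C_i) = \frac{|C_i|}{|D|} \times 100\%$
– Time interval: $T(C_i) = t^{i}_{\max} - t^{i}_{0}$
– Pairwise topic similarity: $ATS(C_i) = 1 - \frac{2 \sum_{d_m, d_n \in C_i} D_{JS}(\theta_m \| \theta_n)}{|C_i| \cdot (|C_i| - 1)}$
– Pairwise entity similarity: $AES(C_i) = \frac{2 \sum_{d_m, d_n \in C_i} TC(C_m, C_n)}{|C_i| \cdot (|C_i| - 1)}$
– Classifier output (logistic function of the cluster features): $\frac{1}{1 + e^{-w \cdot f(C_i)}}$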
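The four cluster features can be computed as in the following sketch. It assumes that each article is a dict with the hypothetical keys `time`, `theta` and `entities`, that the pairwise sums run over unordered pairs (so that the factor 2 over |C|(|C| - 1) gives a mean), and that the divergence and Tanimoto functions are passed in as `js` and `tc`.

```python
from itertools import combinations


def cluster_features(cluster, total_articles, js, tc):
    """Article quantity (%), time interval, and mean topic/entity similarity."""
    size = len(cluster)
    if size < 2:
        raise ValueError("pairwise features need at least two articles")
    pairs = list(combinations(cluster, 2))  # unordered pairs (assumption)
    norm = size * (size - 1)
    times = [d["time"] for d in cluster]
    return {
        "N": 100 * size / total_articles,
        "T": max(times) - min(times),
        "ATS": 1 - 2 * sum(js(a["theta"], b["theta"]) for a, b in pairs) / norm,
        "AES": 2 * sum(tc(a["entities"], b["entities"]) for a, b in pairs) / norm,
    }
```

The resulting feature dict is what a classifier would consume to decide whether a small cluster is kept as an event phase.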
– Assign each news article in an event phase an “informative-ness” rank value
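– Edge weight between articles $d_m$ and $d_n$ in an event phase: $w_c(d_m, d_n) \cdot w_t(d_m, d_n)$
– Iterative update:
$T_n = [R_n \; R_n \; \cdots \; R_n]$
$M_{n+1} = \lambda T_n M_n + (1 - \lambda) M_0$
$R_{n+1} = \lambda M_{n+1} R_n + (1 - \lambda) R_0$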
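The idea of a vertex-reinforced random walk can be illustrated with a short Python sketch in the style of DivRank. It is not the exact update from the slide: it assumes that the initial transition matrix is the row-normalised edge weight matrix, that the initial rank is uniform, and that each step scales transitions by the destination's current rank before renormalising. The damping value 0.85 and the iteration count are hypothetical defaults.

```python
def vertex_reinforced_rank(weights, lam=0.85, iterations=50):
    """Rank vertices with a rank-reinforced random walk (DivRank-style sketch)."""
    n = len(weights)
    # Initial transition matrix: row-normalised weights (uniform if a row is empty).
    m0 = []
    for row in weights:
        total = sum(row)
        m0.append([w / total for w in row] if total else [1.0 / n] * n)
    r0 = [1.0 / n] * n
    rank = r0[:]
    for _ in range(iterations):
        # Reinforcement: transitions into highly ranked vertices become more likely.
        trans = []
        for u in range(n):
            scaled = [m0[u][v] * rank[v] for v in range(n)]
            total = sum(scaled)
            trans.append([s / total for s in scaled] if total else r0[:])
        # Damped propagation of the current rank through the reinforced matrix.
        rank = [
            lam * sum(rank[u] * trans[u][v] for u in range(n)) + (1 - lam) * r0[v]
            for v in range(n)
        ]
    return rank
```

Because well-ranked articles attract more of the walk, near-duplicate headlines compete with each other, which tends to favour a few representative articles per event phase.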
– Select $k$ news articles from $C_i$ (denoted as $S_i$) to generate the event phase summary
– Optimization problem: $\max_{S_i \subset C_i} R(S_i) = \sum_{d_j \in S_i} r(d_j)$
– Algorithm
– A greedy algorithm with approximation ratio $1 - \frac{1}{e}$
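A generic greedy selection loop looks like the sketch below. The function `greedy_select` and the example rank values are hypothetical. The objective is passed in as a function so that the same loop works for the plain sum of rank values and for richer objectives; the 1 - 1/e guarantee is the classical bound for greedy maximisation of monotone submodular functions, and for a plain sum the loop reduces to taking the k highest-ranked articles.

```python
def greedy_select(candidates, k, objective):
    """Pick up to k items, each time adding the one with the largest marginal gain."""
    selected = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        base = objective(selected)
        best = max(remaining, key=lambda d: objective(selected + [d]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical rank values r(d_j) for three articles of one event phase.
ranks = {"d1": 0.5, "d2": 0.3, "d3": 0.2}
summary = greedy_select(ranks, 2, lambda s: sum(ranks[d] for d in s))
```

Here `summary` holds the identifiers of the two selected articles, whose headlines would form the event phase summary.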
– VSMCluster: KMeans using word features of TF-IDF weights
– TopicCluster: KMeans using topic distributions based on LDA
– SCAN: structural clustering algorithm for network partitioning
– EPCluster-C: EPCluster without postprocessing
– Our method EPCluster is effective for event phase extraction.
– Random: selects news articles randomly
– Longest: selects news articles with the longest headlines
– Tran et al., Chieu et al.: timeline generation methods
– Our Method (PageRank): a variant of our method that ranks with PageRank
– Evaluate the relevance of news headlines based on gold-standard event summaries
– Experimental results