Event Phase Extraction and Summarization Chengyu Wang 1 , Rong Zhang - - PowerPoint PPT Presentation



SLIDE 1

Event Phase Extraction and Summarization

Chengyu Wang¹, Rong Zhang¹, Xiaofeng He¹, Guomin Zhou², Aoying Zhou¹

1) Institute for Data Science and Engineering, East China Normal University

2) Zhejiang Police College

SLIDE 2

Outline

  • Introduction
  • Problem Statement
  • Proposed Approach
  • Experiments
  • Conclusion

SLIDE 3

Event Phase Extraction and Summarization (1)

  • Event phase

– Model a single event as multiple event phases
– Each event phase relates to a single development period of a long, complicated event.

  • Example: Egypt Revolution

Egypt Revolution

  • 1. Protests against Hosni Mubarak
  • 2. Egypt under the Supreme Council
  • 3. Egypt under President Morsi
  • 4. Protests against President Morsi

https://en.wikipedia.org/wiki/Egyptian_revolution_of_2011

SLIDE 4

Event Phase Extraction and Summarization (2)

  • Event phase extraction and summarization

– Input: a collection of news articles w.r.t. the same event
– Event phase extraction: cluster news articles into different event phases
– Event phase summarization: select top-k news headlines as the event phase summary for each event phase

  • Techniques

– Graph-based representation of news articles: Temporal Content Coherence Graph (TCCG)
– A structural clustering algorithm to partition news articles into event phases: EPCluster
– News headline ranking and selection: vertex-reinforced random walk process

SLIDE 5

Outline

  • Introduction
  • Problem Statement
  • Proposed Approach
  • Experiments
  • Conclusion

SLIDE 6

Problem Statement

  • News article $d_i = (h_i, t_i, s_i)$

– $h_i$: news headline
– $t_i$: publication time
– $s_i$: the sentence collection of news contents

  • News collection $D = \{d_i\}$
  • Event phase summary $P = \{(h_i, t_i)\}_{i=1}^{k}$

– A collection of $k$ news headline and publication time pairs

  • Event phase extraction and summarization

– Input: a news collection $D$
– Output: a collection of $N$ event phase summaries $\mathbf{P} = \{P_j\}_{j=1}^{N}$
– The number $N$ is not pre-defined.
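
The input and output types above can be pictured with a small data-structure sketch. The class name `NewsArticle`, its field names, and the helper `summary_pairs` are hypothetical and chosen only for illustration; the slides define the triple (headline, publication time, sentences) but no concrete representation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class NewsArticle:
    """One news article: headline, publication time, sentence collection."""
    headline: str
    published: date
    sentences: Tuple[str, ...]


def summary_pairs(articles: List[NewsArticle]) -> List[Tuple[str, date]]:
    """Turn the selected articles into (headline, publication time) pairs,
    the form an event phase summary takes in the problem statement."""
    return [(a.headline, a.published) for a in articles]
```

A full result would then be a list of such summaries, one per extracted phase, with the list length decided by the clustering step rather than fixed in advance.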

SLIDE 7

Outline

  • Introduction
  • Problem Statement
  • Proposed Approach
  • Experiments
  • Conclusion

SLIDE 8

Framework of Event Phase Extraction

SLIDE 9

Semantic Relatedness (1)

  • Content coherence

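– Topic level similarity: Jensen-Shannon divergence between topic distributions

$$D_{JS}(\theta_i \,\|\, \theta_j) = \frac{D_{KL}(\theta_i \,\|\, \bar{\theta}) + D_{KL}(\theta_j \,\|\, \bar{\theta})}{2}$$

where $\bar{\theta}$ is the average of $\theta_i$ and $\theta_j$.

– Entity level similarity: Tanimoto coefficient

  • $C_i$: count vector of key entities in $d_i$

$$TC(C_i, C_j) = \frac{C_i^{T} \cdot C_j}{\|C_i\|^2 + \|C_j\|^2 - C_i^{T} \cdot C_j}$$

– Content coherence score

$$w_c(d_i, d_j) = \alpha \left(1 - D_{JS}(\theta_i \,\|\, \theta_j)\right) + (1 - \alpha)\, TC(C_i, C_j)$$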
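
A minimal sketch of the content coherence score, assuming topic distributions and entity count vectors are plain Python lists of equal length. The function names and the default mixing weight of 0.5 are illustrative assumptions; base-2 logarithms are also an assumption, chosen so that the Jensen-Shannon divergence stays in [0, 1].

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_k = 0 contribute 0."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the average distribution."""
    mean = [(pk + qk) / 2 for pk, qk in zip(p, q)]
    return (kl_divergence(p, mean) + kl_divergence(q, mean)) / 2


def tanimoto(a, b):
    """Tanimoto coefficient of two count vectors (0.0 if both are all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


def content_coherence(topics_i, topics_j, entities_i, entities_j, alpha=0.5):
    """Weighted mix of topic similarity (1 - JS divergence) and entity similarity."""
    topic_sim = 1 - js_divergence(topics_i, topics_j)
    return alpha * topic_sim + (1 - alpha) * tanimoto(entities_i, entities_j)
```

Two articles with identical topic distributions and identical entity counts score 1; articles with disjoint topics and no shared entities score 0.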

SLIDE 10

Semantic Relatedness (2)

  • Temporal influence

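– Use Hamming kernel to map the publication time gap to a real number in [0,1]

$$\Delta t_{i,j} = |t_i - t_j|$$

$$w_t(d_i, d_j) = \begin{cases} \frac{1}{2}\left(1 + \cos\left(\frac{\Delta t_{i,j} \cdot \pi}{\sigma}\right)\right), & \Delta t_{i,j} < \sigma \\ 0, & \Delta t_{i,j} \ge \sigma \end{cases}$$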
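
The kernel can be sketched in a few lines. This is an illustration under stated assumptions: publication times are numbers (for example, days since some origin), the gap is taken as an absolute difference, and the score drops to 0 once the gap reaches the window width. The name `temporal_influence` and the parameter `sigma` are chosen for this example.

```python
import math


def temporal_influence(t_i, t_j, sigma):
    """Raised-cosine score in [0, 1]: 1 for simultaneous articles,
    decreasing smoothly to 0 as the time gap approaches sigma."""
    gap = abs(t_i - t_j)
    if gap >= sigma:
        return 0.0
    return 0.5 * (1 + math.cos(gap * math.pi / sigma))
```

With a window of 10 days, articles published on the same day score 1, articles 5 days apart score about 0.5, and articles 10 or more days apart score 0.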

SLIDE 11

Structural Clustering

  • Temporal Content Coherence Graph (TCCG)
  • EPCluster: Structural clustering algorithm

– Parameter: $MinPts$
– Core Object
– Border Object
– Noise Object

[Figure: example TCCG. Edge labels: content coherence $w_c(d_i, d_j) > \mu_1$, temporal influence $w_t(d_i, d_j) > \mu_2$; $MinPts = 3$]
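
The slide names the object types but not EPCluster's exact rules, so the following is a DBSCAN-style stand-in rather than the authors' algorithm. It assumes an edge joins two articles when both the content coherence and the temporal influence exceed their thresholds, and that an article is a core object when it and its neighbours number at least the minimum-points parameter. All function and parameter names are hypothetical.

```python
from collections import deque


def build_tccg(n, w_c, w_t, mu1, mu2):
    """Adjacency sets over article indices 0..n-1; w_c and w_t are pairwise score functions."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if w_c(i, j) > mu1 and w_t(i, j) > mu2:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def density_cluster(adj, min_pts):
    """Return (cluster id per clustered node, role per node: core/border/noise)."""
    core = {v for v, nbrs in adj.items() if len(nbrs) + 1 >= min_pts}
    label = {}
    role = {v: "noise" for v in adj}
    next_id = 0
    for seed in sorted(core):
        if seed in label:
            continue
        label[seed] = next_id
        queue = deque([seed])
        while queue:
            v = queue.popleft()
            role[v] = "core"
            for u in adj[v]:
                if u not in label:
                    label[u] = next_id
                    if u in core:
                        queue.append(u)   # core objects keep expanding the cluster
                    else:
                        role[u] = "border"  # border objects join but do not expand
        next_id += 1
    return label, role
```

Nodes left without a label are noise objects; each remaining cluster is a candidate event phase.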

SLIDE 12

Cluster Postprocessing

  • Goal

– Use a classifier to filter out “small” clusters that do not correspond to an actual event phase

  • Features

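– Article quantity $N(C_i) = \frac{|C_i|}{|D|} \times 100\%$
– Time interval $T(C_i) = t^{i}_{max} - t^{i}_{0}$
– Pairwise topic similarity $ATS(C_i) = 1 - \frac{2 \sum_{d_m, d_n \in C_i} D_{JS}(\theta_m \,\|\, \theta_n)}{|C_i| \cdot (|C_i| - 1)}$
– Pairwise entity similarity $AES(C_i) = \frac{2 \sum_{d_m, d_n \in C_i} TC(C_m, C_n)}{|C_i| \cdot (|C_i| - 1)}$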

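  • Prediction function $f(C_i) = \frac{1}{1 + e^{-\mathbf{w} \cdot \mathbf{F}(C_i)}}$, where $\mathbf{F}(C_i)$ is the feature vector of cluster $C_i$ and $\mathbf{w}$ the learned weights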
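
A sketch of how the four cluster features and a logistic prediction could be computed. It assumes the time interval is the span between the latest and earliest article in the cluster, that pairwise divergence and Tanimoto scores are supplied as functions, and that the classifier is a logistic model with weights learned elsewhere. The names `cluster_features`, `phase_probability`, `weights`, and `bias` are hypothetical.

```python
import math
from itertools import combinations


def cluster_features(cluster, total_articles, times, js, tc):
    """cluster: list of article ids; times: id -> numeric timestamp;
    js(a, b) and tc(a, b): pairwise topic divergence and entity similarity."""
    pairs = list(combinations(cluster, 2))
    n_pairs = len(pairs) or 1  # avoid dividing by zero for one-article clusters
    quantity = len(cluster) / total_articles * 100
    interval = max(times[a] for a in cluster) - min(times[a] for a in cluster)
    # Averaging over unordered pairs equals 2 * sum / (|C| * (|C| - 1)).
    topic_sim = 1 - sum(js(a, b) for a, b in pairs) / n_pairs
    entity_sim = sum(tc(a, b) for a, b in pairs) / n_pairs
    return [quantity, interval, topic_sim, entity_sim]


def phase_probability(features, weights, bias=0.0):
    """Logistic score in (0, 1); clusters scoring low would be filtered out."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 / (1 + math.exp(-z))
```

The decision threshold and the training of the weights are outside this sketch.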

SLIDE 13

News Article Ranking

  • Goal

– Assign each news article in an event phase an “informative-ness” rank value

  • Vertex-reinforced random walk process
  • Graph construction: build a complete graph where the node set is news articles in an event phase
  • Prior transition probability $M_{(m,n)} \propto w_c(d_m, d_n) \cdot w_t(d_m, d_n)$
  • Rank propagation process
  • Transition matrix update:

$$T_n = [R_n \; R_n \; \cdots \; R_n], \qquad M_{n+1} = \lambda T_n M_n + (1 - \lambda) M_0$$

  • Rank update:

$$R_{n+1} = \lambda M_{n+1} R_n + (1 - \lambda) R_0$$
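
One way to read the two update rules is sketched below in plain Python. Several choices here are assumptions, not taken from the slides: the reinforcement step is interpreted as scaling each transition by the current rank of its destination and then renormalizing columns, the initial rank is uniform, and the damping value 0.85 and iteration count are arbitrary defaults. The function names are hypothetical.

```python
def normalize_columns(matrix):
    """Make every column sum to 1 (uniform column if it sums to 0)."""
    n = len(matrix)
    out = [row[:] for row in matrix]
    for col in range(n):
        total = sum(matrix[row][col] for row in range(n))
        for row in range(n):
            out[row][col] = matrix[row][col] / total if total else 1.0 / n
    return out


def reinforced_rank(weights, lam=0.85, iterations=50):
    """weights[a][b]: product of content coherence and temporal influence
    between articles a and b. Returns one rank value per article."""
    n = len(weights)
    prior = normalize_columns(weights)          # prior transition matrix
    rank0 = [1.0 / n] * n                       # uniform initial rank
    trans, rank = prior, rank0[:]
    for _ in range(iterations):
        # Reinforce transitions into articles that already rank highly.
        boosted = normalize_columns(
            [[rank[a] * trans[a][b] for b in range(n)] for a in range(n)]
        )
        trans = [
            [lam * boosted[a][b] + (1 - lam) * prior[a][b] for b in range(n)]
            for a in range(n)
        ]
        rank = [
            lam * sum(trans[a][b] * rank[b] for b in range(n)) + (1 - lam) * rank0[a]
            for a in range(n)
        ]
    return rank
```

Because every transition matrix stays column-stochastic, the rank values keep summing to 1 and can be compared directly within an event phase.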

SLIDE 14

Event Phase Summary Generation

  • News article selection problem

– Select $k$ news articles from $C_i$ (denoted as $S_i$) to generate the event phase summary
– Optimization problem

  • Objective function: $\max_{S_i \subset C_i} R(S_i) = \sum_{d_j \in S_i} r(d_j)$
  • Subject to: $|S_i| = k$, $\forall d_m, d_n \in S_i$, $w_c(d_m, d_n) \le \mu_1$, $w_t(d_m, d_n) \le \mu_2$

– Algorithm: a greedy algorithm with approximation ratio $1 - \frac{1}{e}$
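
The slide does not spell out the greedy procedure, so the sketch below is one plain greedy pass and makes no claim to the stated approximation guarantee. It takes articles in descending rank order and skips any candidate that violates the pairwise constraint with an article already chosen. The `conflicts` predicate is a hypothetical parameter standing in for the threshold checks on content coherence and temporal influence.

```python
def greedy_summary(ranks, k, conflicts):
    """ranks: article id -> rank value; conflicts(a, b) -> True when a and b
    are too closely related to appear in the same summary."""
    chosen = []
    for candidate in sorted(ranks, key=ranks.get, reverse=True):
        if len(chosen) == k:
            break
        if all(not conflicts(candidate, picked) for picked in chosen):
            chosen.append(candidate)
    return chosen
```

If fewer than the requested number of mutually compatible articles exist, the function returns a shorter list instead of relaxing the constraint.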

SLIDE 15

Outline

  • Introduction
  • Problem Statement
  • Proposed Approach
  • Experiments
  • Conclusion

SLIDE 16

Experiments (1)

  • Datasets

– Four English news datasets regarding long-span recent armed conflicts
– News source: 24 news agencies, e.g., Associated Press, Reuters, Guardian, etc.

SLIDE 17

Experiments (2)

  • Parameter Tuning

– Pairwise judgment

  • Testing set: news article pairs $T_i = \{(d_m, d_n)\}$
  • Manually label whether each pair is related to the same event phase

– Evaluation metrics: Precision, Recall and F-measure
– Experimental results

  • $\mu_1 = 0.4$, $\mu_2 = 0.5$, $MinPts = 10$
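
The pairwise judgment protocol can be sketched as follows. This assumes a pair is predicted "same phase" when both articles carry the same cluster id and that unclustered (noise) articles never match; the function name and argument layout are illustrative, not taken from the slides.

```python
def pairwise_prf(pairs, gold_same_phase, cluster_of):
    """pairs: list of (a, b) article ids; gold_same_phase: (a, b) -> bool;
    cluster_of: article id -> cluster id (missing ids count as noise)."""
    tp = fp = fn = 0
    for a, b in pairs:
        ca, cb = cluster_of.get(a), cluster_of.get(b)
        predicted = ca is not None and ca == cb
        gold = gold_same_phase[(a, b)]
        if predicted and gold:
            tp += 1
        elif predicted:
            fp += 1
        elif gold:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return precision, recall, f_measure
```

Sweeping the two thresholds and the minimum-points parameter and keeping the setting with the best F-measure would be one way to arrive at tuned values like those reported.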

SLIDE 18

Experiments (3)

  • Baselines

– VSMCluster: KMeans using word features of TF-IDF weights
– TopicCluster: KMeans using topic distributions based on LDA
– SCAN: structural clustering algorithm for network partitioning
– EPCluster-C: EPCluster without postprocessing

  • Results

– Our method EPCluster is effective for event phase extraction.

SLIDE 19

Experiments (4)

  • Baselines

– Random: selects news articles randomly
– Longest: selects news articles with longest headlines
– Tran et al., Chieu et al.: timeline generation methods
– Our Method (PageRank): the variant of our method

  • Evaluation

– Evaluate the relevance of news headlines based on gold-standard event summaries
– Experimental results

SLIDE 20

Case Study

SLIDE 21

Outline

  • Introduction
  • Problem Statement
  • Proposed Approach
  • Experiments
  • Conclusion

SLIDE 22

Conclusion

  • Event Phase Extraction and Summarization

– A structural clustering algorithm for event phase extraction based on TCCG
– Summary generation via news article ranking and rank optimization
  • Future work

– Improving the performance of document summarization and timeline generation when event phases are considered

SLIDE 23

Thanks!

Questions & Answers