
Modeling Event Importance for Ranking Daily News Events
Speaker: Shih-Han Lo
Advisor: Professor Jia-Ling Koh
Author: Vinay Setty, Abhijit Anand, Arunav Mishra, Avishek Anand
Date: 2017/03/21
Source: WSDM ’17


SLIDE 1

Modeling Event Importance for Ranking Daily News Events

Speaker: Shih-Han Lo
Advisor: Professor Jia-Ling Koh
Author: Vinay Setty, Abhijit Anand, Arunav Mishra, Avishek Anand
Date: 2017/03/21
Source: WSDM ’17

SLIDE 2

Outline

• Introduction
• Method
• Experiment
• Conclusion

SLIDE 3

Introduction

Business Insider, Google News

SLIDE 4

Introduction

Motivation

• Both automated aggregation and manual curation of news events need to solve two fundamental tasks:
    • Mining news events
    • Modeling news importance

SLIDE 5

Introduction

Goal

• Model the importance of a wide variety of news events reported by a large number of news articles.

SLIDE 6

Introduction

https://en.wikipedia.org/wiki/Portal:Current_events/April_2014

SLIDE 7

Outline

• Introduction
• Method
• Experiment
• Conclusion

SLIDE 8

Method

Problem Definition

• News story
    • 𝑒 ∈ 𝒠 is a news article document.
• News event
    • c is a cluster of stories associated with a news event.
• News topic, σ.
• We approach the news ranking problem as a Learning-to-Rank task, specifically with SVMRank.
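
As a rough illustration of the pairwise idea behind SVMRank, the sketch below trains a linear scorer with a hinge loss on feature differences between more important and less important events of a single day. It is a simplified stand-in using plain subgradient descent, not the SVMRank tool itself. The function names, the two toy features, and the hyperparameters are all assumptions made for the example.

```python
import random


def pairwise_differences(features, relevance):
    """Return x_i - x_j for every pair where event i is more relevant than event j."""
    pairs = []
    for xi, ri in zip(features, relevance):
        for xj, rj in zip(features, relevance):
            if ri > rj:
                pairs.append([a - b for a, b in zip(xi, xj)])
    return pairs


def train_linear_ranker(pairs, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit weights w so that w . (x_i - x_j) >= 1 for preferred pairs (hinge loss)."""
    if not pairs:
        raise ValueError("need at least one preference pair")
    rng = random.Random(seed)
    pairs = list(pairs)  # copy so the caller's list is not shuffled
    w = [0.0] * len(pairs[0])
    for _ in range(epochs):
        rng.shuffle(pairs)
        for diff in pairs:
            margin = sum(wi * di for wi, di in zip(w, diff))
            # L2 regularisation step (shrinks the weights slightly).
            w = [wi * (1.0 - lr * reg) for wi in w]
            # Hinge-loss step: only pairs violating the margin push the weights.
            if margin < 1.0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank_events(features, w):
    """Return event indices sorted from highest to lowest score."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


# Toy data: each event has two hypothetical features (popularity, historical).
events = [[0.9, 0.8], [0.5, 0.4], [0.1, 0.2]]
labels = [2, 1, 0]  # higher means more important
weights = train_linear_ranker(pairwise_differences(events, labels))
print(rank_events(events, weights))
```

In a multi-day setting each day would act as a separate query, so pairs would only be formed between events of the same day.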

SLIDE 9

Method

Mining Daily News Events

• First, we need to mine events from the news collection.
    • A bag of entities ℰ(𝑒)
    • A bag of shingles 𝒯(𝑒) (w-shingling, n-grams)
• We combine entities and shingles into a single bag ℱ(𝑒) = ℰ(𝑒) ∪ 𝒯(𝑒). Then:

[Equation label: frequency of unique entities]
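
The combined bag ℱ(𝑒) = ℰ(𝑒) ∪ 𝒯(𝑒) can be sketched as follows. This is a minimal example that assumes the story is already tokenized and its entities already extracted. The function names and the ("E", ...) / ("T", ...) key tags are hypothetical; the tags only keep an entity and an identical-looking shingle from being merged.

```python
from collections import Counter


def shingle_bag(tokens, w=2):
    """Bag of contiguous w-token shingles (word n-grams) of a story."""
    return Counter(
        ("T", " ".join(tokens[i:i + w])) for i in range(len(tokens) - w + 1)
    )


def feature_bag(entities, tokens, w=2):
    """Combine the entity bag and the shingle bag into one bag of features."""
    bag = Counter(("E", entity) for entity in entities)
    bag.update(shingle_bag(tokens, w))  # keys are disjoint thanks to the tags
    return bag


story_tokens = "obama visits berlin today".split()
story_entities = ["Barack_Obama", "Berlin"]
print(feature_bag(story_entities, story_tokens, w=2))
```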

SLIDE 10

Method

• Problem: the true number of events cannot be determined accurately.
• We resort to Locality-Sensitive Hashing (LSH) with min-wise independent permutations.
• Cluster cohesiveness:
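
To make the LSH step with min-wise hashing more concrete, here is a small sketch of MinHash signatures with banding. It approximates the random permutations with seeded SHA-1 hashes, which is a common shortcut and an assumption here, as are the function names and the `num_hashes` and `bands` values. Every story must have a non-empty feature set.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def seeded_hash(seed, feature):
    """Deterministic 64-bit hash; each seed stands in for one permutation."""
    data = f"{seed}:{feature}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_signature(features, num_hashes=32):
    """Signature = minimum hash value of the feature set under each seed."""
    return tuple(
        min(seeded_hash(seed, f) for f in features) for seed in range(num_hashes)
    )


def lsh_candidate_pairs(stories, num_hashes=32, bands=8):
    """Stories sharing at least one identical signature band become candidates."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for story_id, features in stories.items():
        signature = minhash_signature(features, num_hashes)
        for b in range(bands):
            band = signature[b * rows:(b + 1) * rows]
            buckets[(b, band)].add(story_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


stories = {
    "a": {"obama", "berlin", "visit"},
    "b": {"obama", "berlin", "visit"},
    "c": {"earthquake", "chile"},
}
print(lsh_candidate_pairs(stories))
```

No number of clusters has to be fixed in advance: stories fall into the same bucket only when their feature sets are similar enough to agree on a whole band.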

SLIDE 11

Method

Improved Popularity Estimation

• Improving Cluster Size Estimate
    • Maximum Sub-Cluster Density
        • For a given k, let ρ_k be the radius containing the k nearest neighbors of the centroid.
        • Find the sub-cluster that maximizes k/ρ_k (= ψ_max).
        • Effective size:

[Figure labels: cluster centroid, radius]
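
The search for the sub-cluster maximizing k/ρ_k can be sketched as below. The example assumes stories are points in a Euclidean feature space and that the centroid is supplied by the caller; both are assumptions, as is the function name. It returns only k and ψ_max and does not compute the effective size.

```python
import math


def max_subcluster_density(points, centroid):
    """Return (k, psi_max): the k nearest neighbours of the centroid that
    maximise k / rho_k, where rho_k is the distance to the k-th neighbour."""
    distances = sorted(math.dist(p, centroid) for p in points)
    best_k, best_density = 0, 0.0
    for k, radius in enumerate(distances, start=1):
        if radius == 0.0:
            continue  # density is undefined for a zero radius
        density = k / radius
        if density > best_density:
            best_k, best_density = k, density
    return best_k, best_density


# Three tightly grouped stories and one outlier.
cluster = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (50.0, 0.0)]
print(max_subcluster_density(cluster, centroid=(0.0, 0.0)))  # (3, 3.0)
```

The outlier at distance 50 lowers the density to 4/50, so the dense core of three stories is selected instead of the full cluster.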

SLIDE 12

Method

• Source Diversity
    • Collection bias: relying only on structural features may be misleading.
    • Compute a diversity score for each cluster:
• Source Authority
    • We extract all possible news citations and construct a probability distribution based on their frequencies.
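
The citation-based distribution for source authority can be illustrated with a short sketch. It assumes the citations have already been extracted into a flat list of source names; the function name and the sample sources are hypothetical, and how the distribution is turned into a per-cluster authority score is not covered here.

```python
from collections import Counter


def citation_distribution(cited_sources):
    """Normalise citation counts into a probability distribution over sources."""
    counts = Counter(cited_sources)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {source: n / total for source, n in counts.items()}


citations = ["reuters", "reuters", "ap", "bbc"]
print(citation_distribution(citations))
```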

SLIDE 13

Method

Historical Importance

• Cluster Chaining
    • Previous day similarity:
    • The overall historical value for a chain initiated from c is:
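
One possible way to picture cluster chaining is sketched below. The slide's similarity and historical-value formulas are not reproduced here, so the example substitutes Jaccard similarity over feature sets and a fixed threshold; both are assumptions, as are the function names, the integer day indices, and the `max_days` limit. It only builds the chain and does not compute the historical value.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_chain(cluster, clusters_by_day, day, threshold=0.3, max_days=7):
    """Walk backwards one day at a time, linking to the most similar cluster
    of the previous day while the similarity stays above the threshold."""
    chain = []
    current = cluster
    for back in range(1, max_days + 1):
        candidates = clusters_by_day.get(day - back, [])
        if not candidates:
            break
        best = max(candidates, key=lambda c: jaccard(current, c))
        similarity = jaccard(current, best)
        if similarity < threshold:
            break
        chain.append((day - back, best, similarity))
        current = best
    return chain


history = {
    2: [{"a", "b", "c"}, {"x", "y"}],
    1: [{"a", "b"}],
    0: [{"z"}],
}
for link in build_chain({"a", "b", "c", "d"}, history, day=3):
    print(link)
```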

SLIDE 14

Method

SLIDE 15

Method

• Temporal Profile from Named Events
    • Moving Window Language Model:
    • Moving Window Entity Overlap using the disambiguated entities:

SLIDE 16

Method

• Temporal Prior:
• Finally, we compute historical significance on a day t:

[Equation label: frequency of edits]

SLIDE 17

Outline

• Introduction
• Method
• Experiment
• Conclusion

SLIDE 18

Experiment

Datasets

• Gdelt
    • 8 million stories.
    • Sep. 2013 – Aug. 2014 (365 days).
    • 6000 sources from 167 different countries.
• Stics
    • 1.69 million stories.
    • Jan. 2014 – Jun. 2015 (545 days).
    • 300 sources from 10 different countries.

SLIDE 19

Experiment

Benchmark

• GTS
    • We add the news stories referred to in the WCEP (Wikipedia Current Events Portal) summaries to the input collection.
• Time Lag
    • Within a 3-day window of the WCEP dates.
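
The time-lag condition can be expressed as a simple date check. The sketch below assumes the window is symmetric around the WCEP date, which the slide does not state explicitly; the function name and the example dates are hypothetical.

```python
from datetime import date


def within_time_lag(cluster_day, wcep_day, window_days=3):
    """True if the cluster's day lies within window_days of the WCEP date
    (assumes a symmetric window)."""
    return abs((cluster_day - wcep_day).days) <= window_days


print(within_time_lag(date(2014, 4, 3), date(2014, 4, 1)))  # True
print(within_time_lag(date(2014, 4, 5), date(2014, 4, 1)))  # False
```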

SLIDE 20

Experiment

Ranking Results

SLIDE 21

Outline

• Introduction
• Method
• Experiment
• Conclusion

SLIDE 22

Conclusion

• We introduced the problem of ranking a daily batch of events for large heterogeneous news corpora.
• Using improved popularity and historical features for events in a learning-to-rank framework, we obtained an effective daily event ranking.