SLIDE 1

Ranking Related News Predictions

Nattiya Kanhabua¹, Roi Blanco² and Michael Matthews²

¹Norwegian University of Science and Technology, Norway
²Yahoo! Research, Barcelona, Spain

SIGIR’2011, Beijing

SLIDE 2

Ranking Related News Predictions Outline

Outline

◮ Introduction: Problem Statement, Related Work, Contributions
◮ Task Definition: System Architecture, Models
◮ Approach: Features, Ranking Method
◮ Evaluation: Experiment Setting, Experimental Results

SLIDE 7

Ranking Related News Predictions Introduction Problem Statement

Problem statement

People are naturally curious about the future.

◮ How long will a war in the Middle East last?
◮ What is the latest health care plan?
◮ What will happen to EU economies in the next 5 years?
◮ What will be the potential effects of climate change?

Over 32% of 2.5M documents from Yahoo! News (July 2009 to July 2010) contain at least one prediction. We propose a new task called ranking related news predictions:

◮ Retrieve predictions related to a news story in news archives.
◮ Rank them according to their relevance to the news story.

SLIDE 12

Ranking Related News Predictions Introduction Problem Statement

Related News Predictions

Query = <gas, emission, percent, european, global, climate>

SLIDE 14

Ranking Related News Predictions Introduction Related Work

Future-related Information Analyzing Tools

Recorded Future

Difference: a user must specify a query in advance using “predefined” entities.

SLIDE 15

Ranking Related News Predictions Introduction Related Work

Future-related Information Analyzing Tools

Yahoo’s Time Explorer

Difference: No ranking or performance evaluation is done.

SLIDE 16

Ranking Related News Predictions Introduction Related Work

Previous Work on Future Retrieval

R. Baeza-Yates. Searching the future. SIGIR’2005 Workshop on Mathematical/Formal Methods in IR.

◮ Extract temporal expressions from news articles.
◮ Retrieve future information using a probabilistic model, i.e., multiplying term similarity and a time confidence.
◮ Only a small data set and a year granularity are used.

SLIDE 17

Ranking Related News Predictions Introduction Related Work

Previous Work on Future Retrieval

A. Jatowt et al. Supporting analysis of future-related information in news archives and the web. JCDL’2009.

◮ Extract future mentions from news snippets obtained from search engines.
◮ Summarize and aggregate results using clustering methods.
◮ Does not focus on relevance and ranking of future information.

SLIDE 19

Ranking Related News Predictions Introduction Contributions

Contributions

I. Formally define the task of ranking related news predictions.
II. Propose four classes of features: term similarity, entity-based similarity, topic similarity and temporal similarity.
III. Conduct an extensive evaluation using a dataset with over 6000 judgments from the NYT Annotated Corpus.

SLIDE 21

Ranking Related News Predictions Task Definition System Architecture

System Architecture

Step 1: Document annotation.

◮ Extract temporal expressions using time and event recognition.
◮ Normalize them to dates so they can be anchored on a timeline.
◮ Output: sentences annotated with named entities and dates, i.e., predictions.

SLIDE 22

Ranking Related News Predictions Task Definition System Architecture

System Architecture

Step 2: Retrieving predictions.

◮ Automatically generate a query from a news article being read.
◮ Retrieve predictions that match the query.
◮ Rank predictions by relevance. A prediction is “relevant” if it is about the topics of the article.

SLIDE 24

Ranking Related News Predictions Task Definition Models

Annotated Document Model

Collection $C = \{d_1, \ldots, d_n\}$. Document $d = \{\{w_1, \ldots, w_n\}, \mathrm{time}(d)\}$.

◮ $\mathrm{time}(d)$ gives the publication date of $d$.

Annotated document $\hat{d}$ is composed of:

◮ Named entities $\hat{d}_e = \{e_1, \ldots, e_n\}$
◮ Temporal expressions $\hat{d}_t = \{t_1, \ldots, t_m\}$
◮ Sentences $\hat{d}_s = \{s_1, \ldots, s_z\}$

SLIDE 26

Ranking Related News Predictions Task Definition Models

Prediction Model

Let dp be the parent document of a prediction p. p is a sentence containing field/value pairs:

Field        Value
ID           1136243_1
PARENT_ID    1136243
TITLE        Gore Pledges A Health Plan For Every Child
TEXT         Vice President Al Gore proposed today to guarantee access to affordable health insurance for all children by 2005, expanding on a program enacted two years ago that he conceded had had limited success so far.
CONTEXT      Mr. Gore acknowledged that the number of Americans without health coverage had increased steadily since he and President Clinton took office.
ENTITY       Al Gore
FUTURE_DATE  2005
PUB_DATE     1999/09/08
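
The field/value structure of a prediction can be sketched as a small record type. This is only an illustration: the field names follow the table, while the Python types (for example, storing future dates as years) and the shortened TEXT and CONTEXT strings are assumptions, not the authors' storage format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class Prediction:
    """A prediction sentence with its field/value pairs (types are assumed)."""
    id: str                        # ID
    parent_id: str                 # PARENT_ID: the parent document d_p
    title: str                     # TITLE of the parent document
    text: str                      # TEXT: the prediction sentence itself
    context: str                   # CONTEXT: surrounding sentence(s)
    entities: Tuple[str, ...]      # ENTITY annotations
    future_dates: Tuple[int, ...]  # FUTURE_DATE, simplified to years here
    pub_date: date                 # PUB_DATE of the parent document


# The example from the table; TEXT and CONTEXT are shortened for brevity.
gore = Prediction(
    id="1136243_1",
    parent_id="1136243",
    title="Gore Pledges A Health Plan For Every Child",
    text="Vice President Al Gore proposed today to guarantee access to "
         "affordable health insurance for all children by 2005",
    context="Mr. Gore acknowledged that the number of Americans without "
            "health coverage had increased steadily",
    entities=("Al Gore",),
    future_dates=(2005,),
    pub_date=date(1999, 9, 8),
)
```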

SLIDE 27

Ranking Related News Predictions Task Definition Models

Query Model

Query q is extracted from a news article being read dq.

1. Keywords qtext
2. Time constraints qtime

SLIDE 28

Ranking Related News Predictions Task Definition Models

Query Keywords

[Figure: query keywords are extracted from the news article being read to form a term query QT, an entity query QE and a combined query QC; each query is matched against the fields of a prediction (ID, PARENT_ID, TITLE, TEXT, ENTITY, CONTEXT, FUTURE_DATE, PUB_DATE).]

Entity query: $Q_E = \{e_1, \ldots, e_m\}$, e.g., Barack Obama, Iraq, America

SLIDE 29

Ranking Related News Predictions Task Definition Models

Query Keywords

[Figure: query keyword extraction from the news article being read, forming the term query QT, the entity query QE and the combined query QC over the fields of a prediction.]

Term query: $Q_T = \{w_1, \ldots, w_n\}$, e.g., troop, war, withdraw

SLIDE 30

Ranking Related News Predictions Task Definition Models

Query Keywords

[Figure: query keyword extraction from the news article being read, forming the term query QT, the entity query QE and the combined query QC over the fields of a prediction.]

Combined query: $Q_C = \{e_1, \ldots, e_m\} \cup \{w_1, \ldots, w_n\}$, e.g., Barack Obama, Iraq, America, troop, war, withdraw
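
As a minimal sketch of the combined query, the union of entity and term keywords can be built as below. The function name `combine_query` and the case-insensitive de-duplication are assumptions made for illustration; how the keywords themselves are extracted from the article is not shown here.

```python
from typing import List


def combine_query(entities: List[str], terms: List[str]) -> List[str]:
    """Return Q_C as the union of entity and term keywords.

    Order is preserved and duplicates are dropped case-insensitively
    (an assumption; the slides only state a set union).
    """
    seen = set()
    combined = []
    for keyword in entities + terms:
        key = keyword.lower()
        if key not in seen:
            seen.add(key)
            combined.append(keyword)
    return combined


q_e = ["Barack Obama", "Iraq", "America"]  # entity query from the slide
q_t = ["troop", "war", "withdraw"]         # term query from the slide
q_c = combine_query(q_e, q_t)
```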

SLIDE 31

Ranking Related News Predictions Task Definition Models

Query Time

Time constraints qtime

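1. Only predictions whose future dates are later than time(dq), i.e., within (time(dq), tmax]
2. Only predictions from articles published before time(dq), i.e., within [tmin, time(dq)]

[Figure: timeline labeled past, now and future, showing the query and three predictions P; years shown: 1999, 2002, 2006, 2016, 2018, 2033.]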
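
The two time constraints can be read as a simple filter over a prediction's publication date and future date. The sketch below assumes dates are compared at day granularity and that tmin and tmax default to the widest possible range; the function name is hypothetical.

```python
from datetime import date


def satisfies_time_constraints(
    pub_date: date,
    future_date: date,
    query_date: date,
    t_min: date = date.min,
    t_max: date = date.max,
) -> bool:
    """Check both constraints of q_time for one prediction.

    1. The future date lies in (time(d_q), t_max].
    2. The parent article was published in [t_min, time(d_q)].
    """
    refers_to_future = query_date < future_date <= t_max
    published_before = t_min <= pub_date <= query_date
    return refers_to_future and published_before
```

For example, with a query article dated 2007-06-01, a prediction published in 1999 about 2016 passes, while one about 2005 does not.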

SLIDE 33

Ranking Related News Predictions Approach Features

Term Similarity

Capture the term-similarity between q and p.

1. retScore(q,p): Lucene’s TF-IDF scoring function

◮ Problem: keyword matching, short texts
◮ Predictions not containing query terms are not retrieved.

2. bm25f(q,p): field-aware ranking function

◮ Extend a sentence structure by surrounding sentences.
◮ Search CONTEXT in addition to TEXT [Blanco et al. 2010].
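
To illustrate why a field-aware function helps with short texts, here is a rough sketch of a simple BM25F variant: term frequencies are length-normalised per field, weighted by a field boost, summed, and then saturated with k1. The boosts and the values of b and k1 are those listed under Parameter setting; using a single b for all fields, passing precomputed IDF values as a dictionary, and the tokenised field layout are assumptions of this sketch, not the authors' implementation.

```python
from typing import Dict, List

BOOST = {"TEXT": 5.0, "CONTEXT": 1.0, "TITLE": 2.0}
B = 0.75   # length normalisation, shared by all fields (assumption)
K1 = 1.2   # term-frequency saturation


def bm25f(
    query_terms: List[str],
    fields: Dict[str, List[str]],   # field name -> tokens of the prediction
    avg_len: Dict[str, float],      # field name -> average field length
    idf: Dict[str, float],          # term -> precomputed IDF (hypothetical input)
) -> float:
    """Score one prediction against a query with a simple BM25F variant."""
    score = 0.0
    for term in query_terms:
        pseudo_tf = 0.0
        for name, tokens in fields.items():
            tf = tokens.count(term)
            if tf == 0:
                continue
            norm = 1.0 - B + B * len(tokens) / avg_len[name]
            pseudo_tf += BOOST.get(name, 1.0) * tf / norm
        score += idf.get(term, 0.0) * pseudo_tf / (K1 + pseudo_tf)
    return score
```

With this form, a prediction whose TEXT lacks a query term can still receive a non-zero score if the term occurs in its CONTEXT, although a match in TEXT counts more because of the larger boost.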

SLIDE 36

Ranking Related News Predictions Approach Features

Entity-based Similarity

Measure the similarity between q and p by exploiting annotated entities in dp, p, q.

◮ Only applicable for QE and QC.
◮ Features commonly employed in entity ranking tasks.
◮ Time distance captures the relationship of term and time.

ID  Feature
1   entitySim(q, p)
2   title(e, dp)
3   titleSim(e, dp)
4   senPos(e, dp)
5   senLen(e, dp)
6   cntSenSubj(e, dp)
7   cntEvent(e, dp)
8   cntFuture(e, dp)
9   cntEventSubj(e, dp)
10  cntFutureSubj(e, dp)
11  timeDistEvent(e, dp)
12  timeDistFuture(e, dp)
13  tagSim(e, dp)
14  isSubj(e, p)
15  timeDist(e, p)

SLIDE 37

Ranking Related News Predictions Approach Features

Topic Similarity

Compute the similarity between q and p on a topic level.

◮ Latent Dirichlet allocation [Blei et al. 2003] for modeling topics.

1. Train a topic model
2. Infer topics
3. Compute topic similarity

SLIDE 38

Ranking Related News Predictions Approach Features

Topic Similarity

Step 1: Learn a topic model.

◮ Partition $D_N$ into sub-collections, called document snapshots $D_{train,t_k}$.
◮ For each $D_{train,t_k}$, randomly select documents for training a topic model.
◮ Output: topic models at different time snapshots, e.g., $\phi_{t_k}$ at $t_k$.

SLIDE 39

Ranking Related News Predictions Approach Features

Topic Similarity

Step 2: Infer topics.

◮ Determine topics for q and p using their contents, called topic inference.
◮ Both q and p are represented by a probability distribution of topics.
◮ $p_\phi = p(z_1), \ldots, p(z_n)$, where $p(z)$ is the probability of a topic $z$.

SLIDE 40

Ranking Related News Predictions Approach Features

Topic Similarity

I. Which model snapshot should be used for inference?

Select a topic model $\phi_{t_k}$ for inference in 2 ways:

◮ $t_k = \mathrm{time}(d_q)$
◮ $t_k = \mathrm{time}(d_p)$

II. Which contents should be used for inference?

For a query q, the parent document $d_q$ is used. For a prediction p, the contents can be:

◮ Only text $p_{txt}$
◮ Both text $p_{txt}$ and context $p_{ctx}$
◮ Parent document $d_p$

SLIDE 44

Ranking Related News Predictions Approach Features

Topic Similarity

Step 3: Measuring topic similarity.

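◮ q and p are represented by topic distributions.
◮ $q_\phi = p(z_1), \ldots, p(z_n)$
◮ $p_\phi = p(z_1), \ldots, p(z_n)$
◮ Compute the topic similarity using cosine similarity.

$$\mathrm{topicSim}(q, p) = \frac{q_\phi \cdot p_\phi}{\|q_\phi\| \cdot \|p_\phi\|} = \frac{\sum_{z \in Z} q_{\phi_z} \cdot p_{\phi_z}}{\sqrt{\sum_{z \in Z} q_{\phi_z}^2} \cdot \sqrt{\sum_{z \in Z} p_{\phi_z}^2}}$$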
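
The cosine similarity between two topic distributions can be computed directly, as in the sketch below. It assumes both distributions are given as equal-length sequences of topic probabilities inferred elsewhere; returning 0.0 for an all-zero vector is a choice made here, not something stated on the slides.

```python
import math
from typing import Sequence


def topic_sim(q_phi: Sequence[float], p_phi: Sequence[float]) -> float:
    """Cosine similarity between the topic distributions of q and p."""
    if len(q_phi) != len(p_phi):
        raise ValueError("topic distributions must have the same length")
    dot = sum(a * b for a, b in zip(q_phi, p_phi))
    norm_q = math.sqrt(sum(a * a for a in q_phi))
    norm_p = math.sqrt(sum(b * b for b in p_phi))
    if norm_q == 0.0 or norm_p == 0.0:
        return 0.0  # undefined in theory; treated as "no similarity" here
    return dot / (norm_q * norm_p)
```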

SLIDE 45

Ranking Related News Predictions Approach Features

Temporal Similarity

Hypothesis I. Predictions that are more recent to the query are more relevant.

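[Figure: timeline labeled past, now and future, showing the query, three predictions P and the time distance between them; years shown: 1999, 2002, 2006, 2016, 2018, 2033.]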

SLIDE 46

Ranking Related News Predictions Approach Features

Temporal Similarity

Hypothesis II. Predictions extracted from more recent documents are more relevant.

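[Figure: timeline labeled past, now and future, showing the query, three predictions P and the time distance between them; years shown: 1999, 2002, 2006, 2016, 2018, 2033.]

◮ Timestamp-based Uncertainty (TSU) [Kanhabua and Nørvåg 2010]
◮ FuzzySet (FS) [Kalczynski and Chou 2005]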
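
Both hypotheses turn a time distance into a score that decays as the distance grows. The sketch below assumes the exponential-decay form usually given for TSU, DecayRate^(λ · |t1 − t2| / µ), with times measured in years and the default values taken from the Parameter setting slide. Which pair of timestamps is compared under each hypothesis is left to the caller; the FuzzySet feature is not sketched.

```python
def tsu(t_query: float, t_other: float,
        decay_rate: float = 0.5, lam: float = 0.5, mu: float = 2.0) -> float:
    """Timestamp-based uncertainty score (assumed exponential-decay form).

    t_query and t_other are times in years; mu is the unit of time
    distance (2 years by default). The score is 1.0 at zero distance
    and decreases towards 0 as the distance grows.
    """
    if not 0.0 < decay_rate < 1.0:
        raise ValueError("decay_rate must lie strictly between 0 and 1")
    return decay_rate ** (lam * abs(t_query - t_other) / mu)
```

With the default values, a distance of four years halves the score.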

SLIDE 48

Ranking Related News Predictions Approach Ranking Method

Ranking Method

Learning-to-rank: Given an unseen (q, p), p is ranked using a model trained over a set of labeled query/prediction pairs.

$$\mathrm{score}(q, p) = \sum_{i=1}^{N} w_i \times f_i$$

◮ SVMMAP [Yue et al. 2007]
◮ RankSVM [Joachims 2002]
◮ SGD-SVM [Zhang 2004]
◮ PegasosSVM [Shalev-Shwartz et al. 2007]
◮ PA-Perceptron [Crammer et al. 2006]
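
Once a learner has produced the weights, applying the model is a weighted sum of feature values followed by a sort. The sketch below assumes the weights are already learned and that each candidate prediction comes with a feature vector of the same length; the identifiers and numbers are made up for illustration, and the training step is not shown.

```python
from typing import Dict, List, Sequence


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear ranking score: sum over i of w_i * f_i."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def rank(weights: Sequence[float],
         candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Return prediction ids ordered by decreasing score."""
    return sorted(candidates,
                  key=lambda pid: score(weights, candidates[pid]),
                  reverse=True)


# Hypothetical weights and feature vectors for two predictions.
weights = [1.0, 0.5]
candidates = {"p1": [0.2, 0.9], "p2": [0.8, 0.1]}
ranking = rank(weights, candidates)
```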

SLIDE 50

Ranking Related News Predictions Evaluation Experiment Setting

Document collection

NYT Annotated Corpus: 1.8M articles from 1987 to 2007.

◮ More than 25% contain at least one prediction.

Annotation process uses several language processing tools.

◮ OpenNLP for tokenizing, sentence splitting, part-of-speech tagging, shallow parsing
◮ SuperSense tagger for named entity recognition
◮ TARSQI for extracting temporal expressions

Apache Lucene for indexing and retrieving.

◮ 44,335,519 sentences and 548,491 predictions
◮ 939,455 future dates (on average 1.7 future dates per prediction)

SLIDE 51

Ranking Related News Predictions Evaluation Experiment Setting

Relevance judgments

42 future-related topics

POLITICS             ENVIRONMENT          SPACE
president election   global warming       Mars
Iraq war             energy efficiency    Moon

SCIENCE              PHYSICS              HEALTH
earthquake           particle physics     bird flu
tsunami              Big Bang             influenza

BUSINESS             SPORT                TECHNOLOGY
subprime             Olympics             Internet
financial crisis     World Cup            search engine

SLIDE 52

Ranking Related News Predictions Evaluation Experiment Setting

Relevance judgments

Human assessors gave a relevance score Grade(q, p, t).

◮ 4 (very relevant), 3 (relevant), 2 (related), 1 (non-relevant), and 0 (incorrectly tagged date)
◮ relevant if Grade(q, p, t) ≥ 3 and non-relevant if 1 ≤ Grade(q, p, t) ≤ 2

In total, assessors judged 52 queries.

◮ On average 94 predictions were retrieved per query
◮ 4,888 query/prediction pairs (approximately 6,032 triples)

Available for download at:

www.idi.ntnu.no/~nattiya/data/sigir2011/futurepredictions.zip
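
The mapping from graded judgments to binary relevance can be written as a small helper. Returning `None` for grade 0 (an incorrectly tagged date) is an assumption of this sketch, since the slide defines neither label for that grade.

```python
from typing import Optional


def is_relevant(grade: int) -> Optional[bool]:
    """Binarize Grade(q, p, t) as described on the slide.

    Returns True for grades 3 and 4, False for grades 1 and 2, and
    None for grade 0, which falls outside both definitions.
    """
    if grade not in (0, 1, 2, 3, 4):
        raise ValueError("grade must be an integer between 0 and 4")
    if grade == 0:
        return None
    return grade >= 3
```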

SLIDE 53

Ranking Related News Predictions Evaluation Experiment Setting

Parameter setting

BM25F: b = 0.75, k1 = 1.2 [Robertson et al. 1994]

◮ boost(TEXT) = 5.0
◮ boost(CONTEXT) = 1.0
◮ boost(TITLE) = 2.0

LDA: Stanford Topic Modeling Toolbox

◮ randomly select 4% of documents in each year for training
◮ filter out the 100 most common terms and terms occurring in fewer than 15 documents
◮ number of topics Nz is 500
◮ collapsed variational Bayes approximation algorithm

Temporal features:

◮ DecayRate = 0.5, λ = 0.5, µ = 2y
◮ n = 2, m = 2, smin = 4y, smax = 2y
◮ α1 = time(dq) − 4y, α2 = time(dq) + 2y

SLIDE 55

Ranking Related News Predictions Evaluation Experimental Results

Methods for comparison

Baseline: QE, QT, QC

◮ Rank using Lucene’s default ranking function.

Our approach: Re-QE, Re-QT, Re-QC

◮ Re-rank the baseline results using learning-to-rank.

Metrics: P@1, P@3, MRR

◮ Typically, a user is interested in a few top predictions.
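
For reference, the three metrics can be computed from a ranked list of binary relevance labels, as in the sketch below. It uses the standard definitions of precision at k and mean reciprocal rank; the input format (one list of booleans per query, in rank order) is an assumption.

```python
from typing import List, Sequence


def precision_at_k(rels: Sequence[bool], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(rels[:k]) / k


def reciprocal_rank(rels: Sequence[bool]) -> float:
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for position, relevant in enumerate(rels, start=1):
        if relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: List[Sequence[bool]]) -> float:
    """Average reciprocal rank over all queries."""
    return sum(reciprocal_rank(r) for r in runs) / len(runs)
```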

SLIDE 56

Ranking Related News Predictions Evaluation Experimental Results

Selecting top-m entities and top-n terms

Select m and n with reasonable improvement in a hold-out set.

◮ Using QE to retrieve predictions, choose m = 11.
◮ Observing the performance of QC when m = 11, choose n = 10.

[Figure: two plots of P10 and MAP, one against the number of top-m entities and one against the number of top-n terms.]

SLIDE 57

Ranking Related News Predictions Evaluation Experimental Results

Compare all other methods against QE

[Figure: bar charts of P1, P3 and MRR for QE, QT, QC, Re-QE, Re-QT and Re-QC.]

Results:

◮ QE performs worst among the baselines, while QC is superior to QT.
◮ Re-QC gains the highest effectiveness, followed by Re-QT.
◮ The re-ranking approach gains improvement, except for Re-QE.

SLIDE 58

Ranking Related News Predictions Evaluation Experimental Results

Compare all other methods against QE

[Figure: bar charts of P1, P3 and MRR for QE, QT, QC, Re-QE, Re-QT and Re-QC.]

Analysis:

◮ QE did not retrieve any relevant result in the judged pool, which makes re-ranking difficult.
◮ Entity-based features perform well for some topics.

SLIDE 59

Ranking Related News Predictions Evaluation Experimental Results

Feature analysis

Top-5 features with highest weights and lowest weights for each query type.

Highest weights:

QE                     QT                       QC
Feature         Wi     Feature           Wi     Feature           Wi
tagSim          1.00   bm25f             1.00   LDA1,parent,k     1.00
FS1             0.97   retScore          0.60   retScore          0.99
TSU2            0.88   LDA1,parent,k     0.55   LDA1,parent,all   0.96
LDA1,txt,k      0.87   LDA2,parent,k     0.51   bm25f             0.93
LDA1,txt,all    0.82   LDA1,parent,all   0.49   isSubj            0.87

Lowest weights:

QE                     QT                       QC
Feature         Wi     Feature           Wi     Feature           Wi
cntSenSubj      0.01   timeDistEvent     0.03   cntEventSen       0.02
cntEventSubj    0.01   timeDistFuture    0.11   querySim          0.05
isInTitle       0.00   cntEventSen       0.12   cntFutureSen      0.10
cntEventSen     0.00   cntFutureSen      0.12   timeDistFuture    0.14
querySim        0.01   senLen            0.16   senLen            0.18

◮ Topic-based features play an important role in the re-ranking model.
◮ Although relying on terms, retScore and bm25f help to re-rank predictions.
◮ Features in top-5 features with lowest weights are from the entity-based class.

SLIDE 61

Ranking Related News Predictions Conclusions Conclusions and Future Work

Conclusions and future work

◮ Define the task of ranking related future predictions.
◮ Employ learning-to-rank incorporating 4 feature classes.
◮ Conduct extensive experiments and create an evaluation dataset with over 6000 relevance judgments.
◮ Future work:
    ◮ Combining multiple sources (Wikipedia, blogs, home pages, etc.) of future-related information.
    ◮ Sentiment analysis for future-related information.
