Classifying News Stories to Estimate the Direction of a Stock Market Index



SLIDE 1

Classifying News Stories to Estimate the Direction of a Stock Market Index (Brett Drury, Luis Torgo and J. J. Almeida) [1]

Hao Fu, Jiatong Ruan

SLIDE 2

Introduction

SLIDE 3

Background

  • Timely information from news -> prediction of the prospects of economic actors
  • News information covers the past or the future, whereas numeric data covers only the past
  • Some published methods exist:
      ○ manually created rules
      ○ models learnt from manually selected data and manually constructed dictionaries
  • Disadvantage: these methods rely on human annotators
SLIDE 4

Related Work

Manually organize news stories

  • 19 categories with different levels. [2]
  • Using machine-readable news to automatically classify stories. [3]
  • Increase to 39 categories. [4]
  • Dictionary contains 423 features. [5]

Alignment of news stories to market movement [6]

  • Limited to single companies and to stories where the company name appears in the headline. [6]

SLIDE 5

News Story Classification

  • Manually constructed rules with automatically constructed dictionaries
  • Alignment of stories with sharp market movement
  • Self-training to construct a model to classify news stories

Fig 1: Proposed Classification

SLIDE 6

Data

  • Amount: more than 300,000 news stories
  • News source: Really Simple Syndication (RSS) feeds
  • Time period: Oct. 2008 - Jun. 2010; the crawler ran at the same time each day
  • Database: an RDBMS storing headline, description, published date and story text
  • Stock data: Yahoo Finance

SLIDE 7

Data

Data pre-processing:

  • Remove duplicate stories and non-finance stories.
  • Remove sentences that do not contain named entities: companies, organizations, market indexes and company employees.
  • The remaining sentence set was parsed with the ANNIE Part of Speech Tagger [8].
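
As a rough illustration of the first two pre-processing steps, the sketch below removes duplicate stories and keeps only sentences that mention a known entity. It is a simplification under stated assumptions: the slides use ANNIE for entity recognition, whereas `KNOWN_ENTITIES`, `deduplicate` and `sentences_with_entities` are hypothetical names that use a plain substring lookup instead.

```python
import re

# Hypothetical entity list for illustration only; the slides rely on ANNIE
# for named-entity recognition rather than a fixed lookup like this one.
KNOWN_ENTITIES = {"acme corp", "ftse 100", "federal reserve"}


def deduplicate(stories):
    """Keep the first copy of each story, comparing case- and whitespace-normalized text."""
    seen, unique = set(), []
    for story in stories:
        key = " ".join(story.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(story)
    return unique


def sentences_with_entities(story, entities=KNOWN_ENTITIES):
    """Split on sentence-ending punctuation and keep sentences that name an entity."""
    sentences = re.split(r"(?<=[.!?])\s+", story.strip())
    return [s for s in sentences if any(e in s.lower() for e in entities)]
```

A real pipeline would also need the non-finance filter mentioned above, which the slides do not describe in enough detail to sketch.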
SLIDE 8

Model from Rule Selected Data[7]

Event or sentiment phrases: economic actor (company, organization, market, etc.) + verb/adjective + object (profits, unemployment, etc.)

Outcome: classified as positive or negative, or left unclassified

Fig 2: Rule Classifier Model

SLIDE 9

Alignment of Market Data

  • Assumption: if the market moves sharply, the movement will be reflected in the published news stories.
  • This strategy selects data by labelling news stories according to their co-occurrence with a single market movement.
  • A positive day is assumed to be one on which the market gains more than 1.7%, and a negative day one on which it loses more than 2.11%.
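
The labelling rule above can be sketched in a few lines. The two thresholds come from the slide; everything else is an assumption made for illustration, in particular that "movement" means the close-to-close return of the index and that stories are grouped by publication date. The names `label_day` and `align_stories` are hypothetical.

```python
POSITIVE_THRESHOLD = 0.017    # gain of more than 1.7% (from the slide)
NEGATIVE_THRESHOLD = -0.0211  # loss of more than 2.11% (from the slide)


def label_day(previous_close, close):
    """Label one trading day from its close-to-close return (an assumed definition)."""
    change = (close - previous_close) / previous_close
    if change > POSITIVE_THRESHOLD:
        return "positive"
    if change < NEGATIVE_THRESHOLD:
        return "negative"
    return None  # the move was not sharp enough to label


def align_stories(stories_by_date, closes_by_date):
    """Give every story published on a sharply moving day that day's label.

    stories_by_date maps a date to a list of stories; closes_by_date maps a
    date to the index close. Dates must sort chronologically (ISO strings do).
    """
    labelled = []
    dates = sorted(closes_by_date)
    for previous, day in zip(dates, dates[1:]):
        label = label_day(closes_by_date[previous], closes_by_date[day])
        if label is None:
            continue
        for story in stories_by_date.get(day, []):
            labelled.append((story, label))
    return labelled
```

Stories published on days without a sharp move receive no label and drop out of the aligned set.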

SLIDE 10

Hybrid of Rules and Alignment

  • This strategy attempts to mitigate the flaws of the rule classifier and of alignment with a simple voting strategy.
  • When the rule classifier and alignment assign a news story equal labels, the story is added to the training set.

Fig 3: Hybrid Strategy for equal labels

SLIDE 11

Hybrid of Rules and Alignment

  • The strategy ensures that stories which are contrary to the market trend are not included in the training set.
  • When the rule classifier and alignment assign a news story contradictory labels, the story is kept out of the training set.

Fig 4: Hybrid Strategy for contradictory labels
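
A minimal sketch of the voting step described on the two hybrid slides: a story enters the training set only when both sources give it the same label. How the authors treat stories that one source leaves unlabelled is not stated in the slides; dropping them here is an assumption, and `hybrid_label` and `build_training_set` are hypothetical names.

```python
def hybrid_label(rule_label, alignment_label):
    """Return the agreed label, or None if a vote is missing or the votes differ."""
    if rule_label is None or alignment_label is None:
        return None  # assumption: a story lacking either vote is left out
    return rule_label if rule_label == alignment_label else None


def build_training_set(candidates):
    """candidates is an iterable of (story, rule_label, alignment_label) tuples."""
    training = []
    for story, rule_label, alignment_label in candidates:
        label = hybrid_label(rule_label, alignment_label)
        if label is not None:
            training.append((story, label))
    return training
```

The design trades coverage for label quality: contradictory stories are discarded rather than resolved in favour of one source.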

SLIDE 12

Proposed Algorithm

Fig 5: Flow Diagram for Proposed Algorithm
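
The flow diagram itself is not reproduced in this transcript. As a generic illustration of self-training, and not the authors' exact procedure, the sketch below trains on the labelled stories, labels the unlabelled stories it is confident about, adds them to the training data and repeats. The learner (`TinyNaiveBayes`), the confidence `threshold` and `max_rounds` are all assumptions; the slides do not say which classifier or stopping rule was used.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict_proba(self, text):
        """Return {label: posterior probability} for a single text."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.labels:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # unseen words carry no evidence
                    score += math.log((counts[word] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())
        scaled = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(scaled.values())
        return {label: value / norm for label, value in scaled.items()}


def self_train(labelled, unlabelled, threshold=0.9, max_rounds=10):
    """Grow the (text, label) list with confidently predicted unlabelled texts."""
    labelled = list(labelled)
    pool = list(unlabelled)
    model = TinyNaiveBayes().fit(*zip(*labelled))
    for _ in range(max_rounds):
        confident, remaining = [], []
        for text in pool:
            probs = model.predict_proba(text)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new was added, so another round would change nothing
        labelled.extend(confident)
        pool = remaining
        model = TinyNaiveBayes().fit(*zip(*labelled))
    return model, labelled
```

In the setting of these slides, the initial labelled set would presumably be the training set selected by the rule and alignment strategies, with the remaining stories as the unlabelled pool.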

SLIDE 13

Evaluation

  • The evaluation methodology is based on an estimated F-measure.
  • The F-measure is estimated for models generated from headline, description and story text information.

Strategy     Headline   Text   Description
Rules        0.77       0.60   0.65
Alignment    0.57       0.57   0.57
Hybrid       0.66       0.57   0.58
Proposed     0.84       0.71   0.77

Fig 6: Estimated F-Measure for competing strategies
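
For reference, the F-measure is conventionally the weighted harmonic mean of precision and recall. The slides do not say how the estimate was obtained or which class was treated as positive, so the sketch below only shows the standard definition; the function name `f_measure` and its parameters are hypothetical.

```python
def f_measure(true_labels, predicted_labels, positive="positive", beta=1.0):
    """Standard F-beta score for one class; beta=1 gives the usual F1."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With `beta=1.0` precision and recall are weighted equally, which is the most common reading of "F-measure" when no weighting is given.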

SLIDE 14

Conclusion

  • The paper presents a method for categorizing news stories into positive or negative categories.
  • By combining a rule classifier and alignment with market movement, the chance of identifying events which may influence the market is increased. The proposed method adds further documents with a self-training method.
  • The proposed method has the highest estimated F-measure of the competing strategies for all three inputs: 0.84 for headline, 0.71 for story text and 0.77 for description.
SLIDE 15

Contribution

  • Designed a hybrid strategy that can mitigate the flaws of the rule classifier and of alignment with market data.
  • Proposed a new algorithm that introduces self-training to make use of unlabelled data when training a more robust model.

SLIDE 16

Limitations

  • How models are induced from headline, description and story text is not clearly presented in the paper, although this is important for evaluating the method.
  • Market movement depends on many factors, some of which may be contradictory, so it is probably not a good idea to ignore data that is contrary to the market trend.

SLIDE 17

Future Work

  • Evaluate techniques with news published when the market is closed.
  • Assign a relevance measure to each news story.
  • Utilize news volume.
SLIDE 18

Q & A

SLIDE 19

Reference

[1] Drury, Brett, Luis Torgo, and J. J. Almeida. "Classifying news stories to estimate the direction of a stock market index." Information Systems and Technologies (CISTI), 2011 6th Iberian Conference on. IEEE, 2011.
[2] Taleb, Nassim Nicholas and Lane, Allen. The Black Swan (The impact of the highly improbable). Random House, 2008.
[3] Thomas, James D. News and Trading Rules. s.l.: CiteSeer, 2003.
[4] Mittermayer, M. A. and Knolmayer, G. F. Text Mining Systems for Market Response to News: A Survey. University of Bern, 2006.

SLIDE 20

Reference

[5] Wuthrich, B., et al. Daily prediction of major stock indices from textual www data. International Conference on Knowledge Discovery and Data Mining, 1998.
[6] Lavrenko, Victor, et al. Language Models for Financial News Recommendation. ACM Press, 2000.
[7] Drury, Brett and Almeida, J. J. Identification of Fine Grained Feature Based Event and Sentiment Phrases from Business News Stories. ACM, 2011.
[8] Cunningham, H., Maynard, D., Bontcheva, K. and Tablan, V. GATE: A framework and graphical development environment for robust NLP tools and applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002.