Classifying News Stories to Estimate the Direction of a Stock Market Index


  1. Classifying News Stories to Estimate the Direction of a Stock Market Index (Brett Drury, Luis Torgo and J. J. Almeida)[1] Hao Fu, Jiatong Ruan

  2. Introduction

  3. Background
      Timely information from news -> prediction of the prospects of economic actors.
      ● News information: the past or the future, vs. numeric data: the past only.
      ● Some published methods exist:
          ○ manually created rules
          ○ models learnt from manually selected data and manually constructed dictionaries
      ● Disadvantage: they rely on human annotators.

  4. Related Work
      Manually organized news stories:
      ● 19 categories with different levels. [2]
      ● Using machine-readable news to automatically classify stories. [3]
      ● Increased to 39 categories. [4]
      ● Dictionary containing 423 features. [5]
      Alignment of news stories to market movement [6]:
      ● Limited to single companies and to stories where the company names appear in the headlines. [6]

  5. News Story Classification
      ● Manually constructed rules with automatically constructed dictionaries
      ● Alignment of stories with sharp market movement
      ● Self-training to construct a model to classify news stories
      Fig 1: Proposed Classification

  6. Data
      ● Amount: more than 300,000 news stories
      ● News source: Really Simple Syndication (RSS) feeds
      ● Time period: Oct. 2008 - Jun. 2010; the crawler ran at the same time each day
      ● Database: RDBMS storing headline, description, published date and story text
      ● Stock data: Yahoo Finance

  7. Data
      Data pre-processing:
      ● Remove duplicate stories and non-finance stories.
      ● Remove sentences that do not contain named entities: companies, organizations, market indexes and company employees.
      ● The sentence set was parsed with the ANNIE Part of Speech Tagger [8].
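
The first two pre-processing steps can be sketched as below. This is a minimal illustration, not the authors' pipeline: the entity set `KNOWN_ENTITIES`, the exact-match duplicate test and the regex sentence splitter are all assumptions, and the non-finance filter and ANNIE tagging are not reproduced.

```python
import re

# Hypothetical entity list; the slides do not say how entities were recognised.
KNOWN_ENTITIES = {"acme corp", "ftse 100"}


def deduplicate(stories):
    """Keep the first occurrence of each story, comparing case- and whitespace-normalised text."""
    seen = set()
    unique = []
    for story in stories:
        key = " ".join(story.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(story)
    return unique


def sentences_with_entities(story, entities=KNOWN_ENTITIES):
    """Return only the sentences that mention at least one known entity."""
    # Naive splitter: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", story.strip())
    return [s for s in sentences if any(e in s.lower() for e in entities)]
```

A real system would use near-duplicate detection and a proper named-entity recogniser; the substring match here is only meant to show where the filter sits.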

  8. Model from Rule-Selected Data [7]
      A rule combines three elements:
      ● Economic actor (company, organization, market, etc.)
      ● Verb/adjective: event or sentiment phrases
      ● Object (profits, unemployment, etc.)
      A sentence is either classified as positive or negative, or left unclassified.
      Fig 2: Rule Classifier Model
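
A toy version of such a rule is sketched below. The dictionaries and the sign-multiplication scheme are assumptions made for illustration; the slides name the three rule elements but not how polarity is computed, and the actual rules are described in [7].

```python
import re

# Hypothetical dictionaries; the real ones were constructed automatically [7].
ACTORS = {"acme", "ftse"}
OBJECT_POLARITY = {"profits": 1, "unemployment": -1}  # is more of this object good (+1) or bad (-1)?
PHRASE_POLARITY = {"rise": 1, "rises": 1, "fall": -1, "falls": -1}


def classify_sentence(sentence):
    """Return 'positive', 'negative', or None when the rule does not match."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    has_actor = any(t in ACTORS for t in tokens)
    obj = next((t for t in tokens if t in OBJECT_POLARITY), None)
    phrase = next((t for t in tokens if t in PHRASE_POLARITY), None)
    if not has_actor or obj is None or phrase is None:
        return None  # unclassified: one of the three elements is missing
    # Assumption: a falling "bad" object counts as positive news, and so on.
    score = OBJECT_POLARITY[obj] * PHRASE_POLARITY[phrase]
    return "positive" if score > 0 else "negative"
```

Sentences that lack an actor, an object or a sentiment phrase stay unclassified, which matches the "Unclassified" outcome in the figure.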

  9. Alignment of Market Data
      ● Assumption: if the market moves sharply, this movement will be reflected in the published news stories.
      ● This strategy selects data by labelling news stories by their co-occurrence with a single market movement.
      ● A positive day is assumed to be one on which the market gains more than 1.7%, and a negative day one on which it loses more than 2.11%.
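
The alignment labelling can be sketched as follows. The two thresholds come from the slide; measuring the move close-to-close and the function names are assumptions, since the slides do not state how the daily change is computed.

```python
POSITIVE_THRESHOLD = 1.7    # percent gain, from the slide
NEGATIVE_THRESHOLD = -2.11  # percent loss, from the slide


def daily_change_percent(previous_close, close):
    """Percentage change between two closing prices (close-to-close is an assumption)."""
    return (close - previous_close) / previous_close * 100.0


def label_day(change_percent):
    """Label a trading day, or return None when the move is not sharp enough."""
    if change_percent > POSITIVE_THRESHOLD:
        return "positive"
    if change_percent < NEGATIVE_THRESHOLD:
        return "negative"
    return None


def label_stories(stories_by_day, change_by_day):
    """Give every story published on a sharp-movement day that day's label."""
    labelled = []
    for day, stories in stories_by_day.items():
        label = label_day(change_by_day.get(day, 0.0))
        if label is not None:
            labelled.extend((story, label) for story in stories)
    return labelled
```

Stories published on days with a smaller move receive no label and are simply left out of the aligned set.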

  10. Hybrid of Rules and Alignment
      ● This strategy attempts to mitigate the flaws of the rule classifier and of alignment with a simple voting strategy.
      ● Each news story is labelled by both the rule classifier and alignment; when the labels are equal, the story is added to the training set.
      Fig 3: Hybrid Strategy for equal labels

  11. Hybrid of Rules and Alignment
      ● When the rule classifier and alignment give contradictory labels, the news story is not added to the training set.
      ● The strategy ensures that stories contrary to the market trend are not included in the training set.
      Fig 4: Hybrid Strategy for contradictory labels
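
The voting step in Figs 3 and 4 can be sketched as below. The function names and the tuple layout are hypothetical, and dropping stories for which either source gives no label is an assumption; the slides only cover the equal and contradictory cases.

```python
def hybrid_label(rule_label, alignment_label):
    """Return the shared label when both sources agree, otherwise None."""
    if rule_label is not None and rule_label == alignment_label:
        return rule_label
    return None  # contradictory or missing labels: exclude the story


def build_training_set(stories):
    """stories: iterable of (text, rule_label, alignment_label) tuples."""
    training_set = []
    for text, rule_label, alignment_label in stories:
        label = hybrid_label(rule_label, alignment_label)
        if label is not None:
            training_set.append((text, label))
    return training_set
```

Requiring agreement shrinks the training set but removes stories whose rule-based label runs against the market trend.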

  12. Proposed Algorithm Fig 5: Flow Diagram for Proposed Algorithm
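
The flow diagram is not reproduced in this transcript. As a rough idea of what a self-training loop looks like in general, the sketch below uses a toy word-weight model; the classifier, the confidence measure and the `min_confidence` and `max_rounds` parameters are all assumptions and are not taken from the paper.

```python
from collections import Counter


def train(labelled):
    """Toy model: each word gets +1 per positive story and -1 per negative story."""
    weights = Counter()
    for text, label in labelled:
        sign = 1 if label == "positive" else -1
        for word in set(text.lower().split()):
            weights[word] += sign
    return weights


def predict(weights, text):
    """Return (label, confidence); confidence is the absolute summed word weight."""
    score = sum(weights[w] for w in set(text.lower().split()))
    label = "positive" if score >= 0 else "negative"
    return label, abs(score)


def self_train(labelled, unlabelled, min_confidence=2, max_rounds=5):
    """Repeatedly add confidently classified stories to the training set and retrain."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        weights = train(labelled)
        confident, remaining = [], []
        for text in pool:
            label, confidence = predict(weights, text)
            if confidence >= min_confidence:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new to learn from
        labelled.extend(confident)
        pool = remaining
    return train(labelled), labelled
```

In the proposed method the seed set would be the hybrid rule-plus-alignment training set, with the remaining stories serving as the unlabelled pool.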

  13. Evaluation
      ● The evaluation methodology is based on the estimated F-Measure.
      ● The F-Measure is estimated for models generated from headline, description and story text information.

      | Strategy  | Headline | Text | Description |
      |-----------|----------|------|-------------|
      | Rules     | 0.77     | 0.60 | 0.65        |
      | Alignment | 0.57     | 0.57 | 0.57        |
      | Hybrid    | 0.66     | 0.57 | 0.58        |
      | Proposed  | 0.84     | 0.71 | 0.77        |

      Fig 6: Estimated F-Measure for competing strategies
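
For reference, the standard F-Measure is the weighted harmonic mean of precision and recall. The sketch below computes it for one target class; the slides do not state which beta or which estimation procedure (for example cross-validation) was used, so `beta=1.0` and the function signature are assumptions.

```python
def f_measure(true_labels, predicted_labels, positive="positive", beta=1.0):
    """F-beta score for the `positive` class; returns 0.0 when there are no true positives."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With `beta=1.0` this reduces to the familiar F1 score, `2 * precision * recall / (precision + recall)`.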

  14. Conclusion
      ● The paper presents a method for categorizing news stories into positive or negative categories.
      ● Combining a rule classifier with alignment to market movement increases the chance of identifying events that may influence the market.
      ● The proposed method adds further documents with a self-training method.
      ● The proposed method has a clear advantage over the competing methods by F-Measure.

  15. Contribution
      ● Designed a hybrid strategy that can mitigate the flaws of the rule classifier and of alignment with market data.
      ● Proposed a new algorithm that introduces self-training to use unlabelled data for training a more robust model.

  16. Limitations
      ● How models are induced from headline, description and story text, which is important for evaluating the method, is not clearly presented in the paper.
      ● Market movement depends on many factors, some of which may be contradictory, so it is probably not a good idea to ignore data that is contrary to the market trend.

  17. Future Work
      ● Evaluate the techniques with news published when the market is closed.
      ● Assign a relevance measure to each news story.
      ● Utilize news volume.

  18. Q & A

  19. Reference
      [1] Drury, Brett, Luis Torgo, and J. J. Almeida. "Classifying news stories to estimate the direction of a stock market index." Information Systems and Technologies (CISTI), 2011 6th Iberian Conference on. IEEE, 2011.
      [2] Taleb, Nassim Nicholas and Lane, Allen. The Black Swan (The impact of the highly improbable). Random House, 2008.
      [3] Thomas, James D. News and Trading Rules. s.l.: CiteSeer, 2003.
      [4] Mittermayer, M. A. and Knolmayer, G. F. Text Mining Systems for Market Response to News: A Survey. University of Bern, 2006.

  20. Reference
      [5] Wuthrich, B., et al. Daily prediction of major stock indices from textual www data. International Conference on Knowledge Discovery and Data Mining, 1998.
      [6] Lavrenko, Victor, et al. Language Models for Financial News Recommendation. ACM Press, 2000.
      [7] Drury, Brett and Almeida, J. J. Identification of Fine Grained Feature Based Event and Sentiment Phrases from Business News Stories. ACM, 2011.
      [8] H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: A framework and graphical development environment for robust NLP tools and applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002.
