BRNIR at the NTCIR-14 FinNum task: Scalable feature extraction


  1. BRNIR at the NTCIR-14 FinNum task: Scalable feature extraction technique for numeral classification. Alan Spark, Team Lead at AUTO1 GROUP

  2. Agenda
     • Motivation
     • Feature types
     • Extraction pipeline
     • Experiment design
     • Results

  3. Motivation
     • Focus on unsupervised feature extraction
     • Experiment with different feature concatenations
     • Suggest a feature extraction pipeline

  4. Feature types
     • Topic distribution: a vector with the topic distribution of a tweet
     • Tickers: multi-label encoding of the tickers present in a tweet
     • Tags: multi-label encoding of the tags present in a tweet
     • Number properties: a vector encoding number properties such as value, position, type and others
     • Token context: "bag-of-words"-like encoding of the tokens neighboring a number
     • Character context: "bag-of-words"-like encoding of the characters neighboring a number
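
To make three of these feature types concrete, here is a minimal Python sketch of the ticker multi-label encoding and the token and character contexts. It is not the author's implementation: the cashtag regex, the window sizes, and the function names `ticker_multi_hot`, `token_context`, and `char_context` are all assumptions chosen for illustration, and the topic distribution and number properties are left out.

```python
import re
from collections import Counter

# Assumed cashtag pattern ("$" followed by 1-6 letters); the slides do not define one.
TICKER_RE = re.compile(r"\$([A-Za-z]{1,6})\b")


def ticker_multi_hot(tweet, vocabulary):
    """Multi-label (multi-hot) encoding of the tickers present in a tweet."""
    present = {t.upper() for t in TICKER_RE.findall(tweet)}
    return [1 if ticker in present else 0 for ticker in vocabulary]


def token_context(tokens, index, window=2):
    """Bag-of-words-like counts of the tokens neighboring tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return Counter(tok.lower() for tok in left + right)


def char_context(text, start, end, window=3):
    """Bag-of-characters counts around the character span [start, end)."""
    return Counter(text[max(0, start - window):start] + text[end:end + window])


# Illustrative input (not taken from the task corpus).
tweet = "$FNKO should trade back to $12 soon"
tokens = tweet.split()
ticker_vector = ticker_multi_hot(tweet, ["AAPL", "FNKO"])
token_counts = token_context(tokens, tokens.index("$12"))
char_counts = char_context("back to $12 soon", 9, 11)
```

The counters would still need to be mapped onto a fixed vocabulary before being concatenated with the other feature vectors; that step is omitted here.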

  5. Extraction pipeline

  6. Experiment design

  7. Results

  8. Summary and Future work
     • unsupervised approaches for feature extraction in application to the FinNum task
     • methods are parallelizable and meant to be run at scale
     • utilize data discovered at the preprocessing step
     • address natural imbalance
     • embedding for all "sparse" features
     • experiment with classification models

  9. Thank you

  10. Q&A

  11. AUTO1 Group GmbH, c/o Alan Spark, Bergmannstraße 72, 10961 Berlin. alan.spark@auto1.com, mail@alanspark.net

  12. Additional plots

  13. Preprocessing highlight
      Tweet: "$ FNKO $ 10 is a no-brainer. Should trade back to IPO price $ 12. Remember, initial range on IPO was $ 16 on high end. Quiet period expiry soon."
      target num: ["10", "12."]
      discovered numbers: [10, 12, 16]
      The approach detects extra numbers in more than 32% of tweets in the given corpus.
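
The slide does not say how the extra numbers are discovered. As a rough sketch only, a plain regular expression over the tweet text is enough to reproduce this particular example; the pattern `NUMBER_RE` and the helper names `discover_numbers` and `extra_numbers` are assumptions, not the method used in the presentation.

```python
import re

# Assumed pattern: integers and simple decimals. A trailing "." with no digit
# after it (as in "12.") is treated as punctuation, not as part of the number.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def discover_numbers(text):
    """Return every number found in the text, in order of appearance."""
    found = []
    for match in NUMBER_RE.finditer(text):
        literal = match.group()
        found.append(float(literal) if "." in literal else int(literal))
    return found


def extra_numbers(tweet, target_num):
    """Numbers present in the tweet that are not among the annotated targets."""
    targets = {n for t in target_num for n in discover_numbers(t)}
    return [n for n in discover_numbers(tweet) if n not in targets]


tweet = (
    "$ FNKO $ 10 is a no-brainer. Should trade back to IPO price $ 12. "
    "Remember, initial range on IPO was $ 16 on high end. Quiet period expiry soon."
)
discovered = discover_numbers(tweet)
extra = extra_numbers(tweet, ["10", "12."])
```

On the tweet from the slide this yields the same discovered numbers, with 16 as the one number that is not an annotated target. Real tweets would likely need more cases (thousands separators, percentages, suffixes such as "k" or "M"), which this sketch ignores.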

  14. Number of "target numbers" per tweet (left); number of unique categories/subcategories per tweet (right)
