BRNIR at the NTCIR-14 FinNum task: Scalable feature extraction



SLIDE 1

BRNIR at the NTCIR-14 FinNum task: Scalable feature extraction technique for numeral classification

Alan Spark, Team Lead at AUTO1 GROUP

SLIDE 2

Agenda

  • Motivation
  • Feature types
  • Extraction pipeline
  • Experiment design
  • Results
SLIDE 3

Motivation

  • Focus on unsupervised feature extraction
  • Experiment with different feature concatenations
  • Suggest a feature extraction pipeline
SLIDE 4

Feature types

Topic distribution

a vector with the topic distribution of a tweet

Number properties

a vector encoding a number's properties such as value, position, type, and others

Tickers

multi-label encoding of the tickers present in a tweet

Token context

"bag-of-words"-like encoding of the tokens neighboring a number

Tags

multi-label encoding of the tags present in a tweet

Character context

"bag-of-words"-like encoding of the characters neighboring a number
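
To make the sparse feature types above more concrete, here is a minimal Python sketch of the ticker multi-label encoding and the token and character context bags. The slides do not show the actual implementation, so everything here is an assumption for illustration: the regular expressions, the window sizes, the whitespace tokenization, the three-entry ticker vocabulary, and the helper names `multi_label`, `token_context`, and `char_context` are all hypothetical.

```python
import re
from collections import Counter

# Assumed patterns, not taken from the slides.
TICKER_RE = re.compile(r"\$([A-Za-z]+)")  # cashtags such as $FNKO
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")  # plain integers and decimals


def multi_label(items, vocabulary):
    """Multi-hot vector: 1 where a vocabulary entry occurs in items."""
    present = set(items)
    return [1 if entry in present else 0 for entry in vocabulary]


def token_context(tokens, index, window=2):
    """Bag of the tokens within `window` positions of tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return Counter(left + right)


def char_context(text, start, end, window=3):
    """Bag of the characters within `window` characters of text[start:end]."""
    return Counter(text[max(0, start - window):start] + text[end:end + window])


tweet = "$FNKO $10 is a no-brainer. Should trade back to IPO price $12."

# Tickers: multi-label encoding over a toy vocabulary (hypothetical).
tickers = TICKER_RE.findall(tweet)
ticker_vector = multi_label(tickers, vocabulary=["AAPL", "FNKO", "TSLA"])

# Token context around the token "$10" (whitespace tokenization assumed).
tokens = tweet.split()
token_bag = token_context(tokens, tokens.index("$10"))

# Character context around the first number found in the raw text.
first_number = NUMBER_RE.search(tweet)
char_bag = char_context(tweet, first_number.start(), first_number.end())
```

Tags could be encoded with the same `multi_label` helper over a tag vocabulary. In practice the bags would be mapped onto a fixed vocabulary so that every number yields a vector of the same length.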

SLIDE 5

Extraction pipeline

SLIDE 6

Experiment design

SLIDE 7

Results

SLIDE 8

Summary and Future work

  • unsupervised feature extraction approaches applied to the FinNum task
  • the methods are parallelizable and meant to be run at scale
  • utilize data discovered at the preprocessing step
  • address the natural class imbalance
  • embeddings for all "sparse" features
  • experiment with classification models
SLIDE 9

Thank you

SLIDE 10

Q&A

SLIDE 11

AUTO1 Group GmbH
c/o Alan Spark
Bergmannstraße 72
10961 Berlin
alan.spark@auto1.com
mail@alanspark.net

SLIDE 12

Additional plots

SLIDE 13

$FNKO $10 is a no-brainer. Should trade back to IPO price $12. Remember, initial range on IPO was $16 on high end. Quiet period expiry soon.

Preprocessing highlight

target num: ["10", "12."]
discovered numbers: [10, 12, 16]

The approach detects extra numbers in more than 32% of tweets in the given corpus.
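
The number discovery shown in this example can be approximated with a simple pattern match. The sketch below is only an illustration under stated assumptions: the slides do not describe how numbers are detected, so the regular expression, the trailing-period normalization of target strings, and the helper names `discover_numbers` and `extra_numbers` are hypothetical. Numbers are kept as strings here rather than converted to integers.

```python
import re

# Assumed pattern: integers and decimals; not taken from the slides.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def discover_numbers(text):
    """Return every number-like substring of text, in order of appearance."""
    return [match.group() for match in NUMBER_RE.finditer(text)]


def extra_numbers(text, target_nums):
    """Return discovered numbers that are not among the annotated targets."""
    # Targets such as "12." carry sentence punctuation; strip it to compare.
    targets = {target.rstrip(".") for target in target_nums}
    return [num for num in discover_numbers(text) if num not in targets]


tweet = (
    "$FNKO $10 is a no-brainer. Should trade back to IPO price $12. "
    "Remember, initial range on IPO was $16 on high end. "
    "Quiet period expiry soon."
)

discovered = discover_numbers(tweet)          # ["10", "12", "16"]
extras = extra_numbers(tweet, ["10", "12."])  # ["16"]
```

On this tweet the sketch finds the same three numbers as the slide, with 16 being the one that is not an annotated target.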

SLIDE 14

Number of "target numbers" per tweet (left); number of unique categories/subcategories per tweet (right)