
Incorporating Knowledge into DNN for Financial Numeral Classification

ASNLU at NTCIR-14 FinNum Task: Incorporating Knowledge into DNN for Financial Numeral Classification. ChaoChun Liang, Institute of Information Science, Academia Sinica, Taipei. June 12, 2019.


  1. ASNLU at NTCIR-14 FinNum Task: Incorporating Knowledge into DNN for Financial Numeral Classification. ChaoChun Liang, Institute of Information Science, Academia Sinica, Taipei. June 12, 2019.

  2. Outline
     • Proposed Approaches
     • Experimental Results
     • Discussion
     • Conclusion

  3. Task Overview
     • Purpose: To understand the fine-grained numeral information in financial tweets.
     • Example (T1): 8 breakouts: $CHMT (stop: $17.99), $FLO (200-day MA), $OMX (gap), $SIRO (gap). One sub-$1 stock. Modest selection on attempted swing low.
       o "8" is a numeral about quantity.
       o "17.99" is a stop-loss price.
       o "200" is a parameter of a technical indicator.

  4. Proposed Approach 1/5
     • Model the Numeral Classification as a Sequence Labeling Process
       o Input Word Sequence: W1, W2, … Wn
       o Output Label Sequence: T1, T2, … Tn
       o M: main category class set; S: sub-category class set; O: not a target word to be classified
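The sequence-labeling framing can be sketched as follows. This is a minimal illustration, not the authors' code: the tokenization of tweet (T1), the `to_label_sequence` helper, and the specific label assignments are assumptions; the main-category names are the ones listed in the appendix slide.

```python
from typing import Dict, List

# Main-category set M (names as listed in the appendix slide).
MAIN_CATEGORIES = {
    "Monetary", "Quantity", "Percentage", "Temporal",
    "Product Number", "Option", "Indicator",
}
OUT = "O"  # label for tokens that are not target numerals


def to_label_sequence(tokens: List[str], targets: Dict[int, str]) -> List[str]:
    """Build T1..Tn for W1..Wn: target numerals get a category, the rest get O.

    `targets` maps a token index to its category (hypothetical gold annotation).
    """
    labels = []
    for i in range(len(tokens)):
        label = targets.get(i, OUT)
        if label != OUT and label not in MAIN_CATEGORIES:
            raise ValueError(f"unknown category: {label}")
        labels.append(label)
    return labels


# Assumed tokenization of the start of tweet (T1).
tokens = ["8", "breakouts", ":", "$CHMT", "(", "stop", ":", "$", "17.99", ")"]
targets = {0: "Quantity", 8: "Monetary"}
labels = to_label_sequence(tokens, targets)
print(list(zip(tokens, labels)))
```

Labeling against the sub-category set S would work the same way with a different label inventory.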

  5. Proposed Approach 2/5
     • Propose a token representation with external knowledge to represent the word meaning in tweet sentences
     • Implement three vanilla neural network models

  6. Proposed Approach 3/5
     • Token Representation
       o W: Pre-trained Word Embedding
       o P: Part-of-Speech, N: Named Entity Type
       o C: Category-Pattern Feature (#=6)
         • Company. ('$NTNX')
         • Money. ('$20' or '13$')
         • Product number. ('PS4')
         • Date. ('11/09/17' or '11-09-17')
         • Time. ('6:45' or '3:25 p.m.')
         • Number. ('68')
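A rough sketch of how the six category-pattern features (C) could be computed as a one-hot vector. The slides do not give the actual handcrafted rules, so every regular expression below is a hypothetical stand-in chosen only to match the examples on the slide; `category_pattern` is an illustrative name.

```python
import re
from typing import List

# Hypothetical rules, checked in order; the first match wins.
PATTERNS = [
    ("Company", re.compile(r"\$[A-Za-z]+")),                      # '$NTNX'
    ("Money", re.compile(r"\$\d+(?:\.\d+)?|\d+(?:\.\d+)?\$")),    # '$20', '13$'
    ("Product number", re.compile(r"[A-Za-z]+\d+")),              # 'PS4'
    ("Date", re.compile(r"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}")),       # '11/09/17'
    ("Time", re.compile(r"\d{1,2}:\d{2}(?:\s?[ap]\.?m\.?)?", re.IGNORECASE)),
    ("Number", re.compile(r"\d+(?:\.\d+)?")),                     # '68'
]


def category_pattern(token: str) -> List[int]:
    """Return a 6-dimensional one-hot vector; all zeros if no rule matches."""
    vector = [0] * len(PATTERNS)
    for i, (_, pattern) in enumerate(PATTERNS):
        if pattern.fullmatch(token):
            vector[i] = 1
            break
    return vector


for tok in ["$NTNX", "$20", "PS4", "11-09-17", "3:25 p.m.", "68", "breakouts"]:
    print(tok, category_pattern(tok))
```

In the proposed representation this vector would sit alongside W, P, and N for each token; how the authors combine them is not stated on the slide.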

  7. Proposed Approach 4/5
     • CNN (detect local patterns, e.g. '85%')
     • RNN (capture context information)
     • RNN+CNN (capture local info in RNN)

  8. Proposed Approach 5/5
     • Rescoring in Prediction Time:
       o Exclude the Out-of-Category ('O') label from the candidate set for each target numeral to avoid inconsistency.
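The rescoring step can be illustrated with a small sketch. It assumes the model emits one score per label for each target numeral; the `rescore` helper and the example scores are made up for illustration.

```python
from typing import Dict


def rescore(scores: Dict[str, float], out_label: str = "O") -> str:
    """Pick the best-scoring label after removing 'O' from the candidates.

    A target numeral must belong to some category, so predicting 'O' for it
    would be inconsistent with the task definition.
    """
    candidates = {label: s for label, s in scores.items() if label != out_label}
    if not candidates:
        raise ValueError("no in-category label to choose from")
    return max(candidates, key=candidates.get)


# Hypothetical model scores for one target numeral: 'O' scores highest,
# but it is excluded, so the best remaining category is chosen.
scores = {"O": 0.46, "Monetary": 0.31, "Quantity": 0.23}
prediction = rescore(scores)
print(prediction)
```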

  9. Experiment Setting
     • Pre-trained Embedding
       o GloVe 840B.300d
     • CNN
       o Kernel sizes of 2, 3, 4 and 5
       o 32 filters for each kernel
     • RNN
       o Bi-GRUs with 128 hidden nodes
     • Dropout 0.5
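For reference, the settings above can be collected into a small configuration object. This is only a sketch: the `ExperimentConfig` class and its field names are hypothetical, and the two derived dimensions assume that the per-kernel CNN outputs are concatenated and that the forward and backward GRU states are concatenated, neither of which the slide states.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExperimentConfig:
    embedding_dim: int = 300                       # 300-dimensional GloVe vectors
    cnn_kernel_sizes: Tuple[int, ...] = (2, 3, 4, 5)
    cnn_filters_per_kernel: int = 32
    gru_hidden_units: int = 128
    bidirectional: bool = True
    dropout: float = 0.5

    @property
    def cnn_feature_dim(self) -> int:
        # Assumption: one pooled feature per filter, concatenated across kernels.
        return len(self.cnn_kernel_sizes) * self.cnn_filters_per_kernel

    @property
    def rnn_feature_dim(self) -> int:
        # Assumption: forward and backward hidden states are concatenated.
        return self.gru_hidden_units * (2 if self.bidirectional else 1)


config = ExperimentConfig()
print(config.cnn_feature_dim, config.rnn_feature_dim)
```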

  10. Overall Performance

      Task-1 Test Set Performance

      | Features | CNN Micro | CNN Macro | RNN Micro | RNN Macro | RNN+CNN Micro | RNN+CNN Macro |
      |---|---|---|---|---|---|---|
      | None | 81.83 | 69.54 | 84.22 | 73.36 | 82.71 | 69.63 |
      | +POS&NE | 88.21 | 79.14 | 88.45 | 78.63 | 89.72 | 80.93 |
      | +POS&NE+Pattern | 87.73 | 78.47 | 88.76 | 83.55 | 89.24 | 81.50 |

      Task-2 Test Set Performance

      | Features | CNN Micro | CNN Macro | RNN Micro | RNN Macro | RNN+CNN Micro | RNN+CNN Macro |
      |---|---|---|---|---|---|---|
      | None | 69.88 | 58.66 | 75.22 | 71.72 | 73.94 | 65.54 |
      | +POS&NE | 75.14 | 65.77 | 78.49 | 72.37 | 78.17 | 70.16 |
      | +POS&NE+Pattern | 76.41 | 68.5 | 79.36 | 70.5 | 79.12 | 72.51 |

      "None" denotes the NN models without incorporating any knowledge.
      "POS&NE" denotes the NN models with both POS and NE information.
      "Pattern" denotes the NN models that incorporate category patterns specified by handcrafted rules.
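The Micro and Macro columns are micro- and macro-averaged scores. The slides do not say how the task scorer computes them, so the sketch below uses the standard F1 definitions as an assumption; the gold and predicted labels are toy data, not task output.

```python
from collections import Counter
from typing import List, Tuple


def micro_macro_f1(gold: List[str], pred: List[str]) -> Tuple[float, float]:
    """Micro-F1 pools counts over all classes; macro-F1 averages per-class F1."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative

    def f1(t: int, f_p: int, f_n: int) -> float:
        denom = 2 * t + f_p + f_n
        return 2 * t / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Toy data: the rare class is always missed, which hurts macro more than micro.
gold = ["Monetary", "Monetary", "Monetary", "Monetary", "Option"]
pred = ["Monetary", "Monetary", "Monetary", "Monetary", "Monetary"]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro={micro:.4f} macro={macro:.4f}")
```

Because macro averaging weights every class equally, errors on rare categories lower the macro score far more than the micro score, which is consistent with the Macro columns being lower throughout.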

  11. Experimental Results 1/3 (page 10)

      Task-1 testing set performance

      | Features | CNN Micro | CNN Macro | RNN Micro | RNN Macro | RNN+CNN Micro | RNN+CNN Macro |
      |---|---|---|---|---|---|---|
      | None | 81.83 | 69.54 | 84.22 | 73.36 | 82.71 | 69.63 |

      • Division of classification results between CNN and RNN models

  12. Experimental Results 2/3

      Task-1 testing set performance

      | Features | CNN Micro | CNN Macro | RNN Micro | RNN Macro | RNN+CNN Micro | RNN+CNN Macro |
      |---|---|---|---|---|---|---|
      | None | 81.83 | 69.54 | 84.22 | 73.36 | 82.71 | 69.63 |
      | +POS&NE | 88.21 | 79.14 | 88.45 | 78.63 | 89.72 | 80.93 |

      • OOVs provide no useful information
        o OOVs: 30+% on Development and Test sets
      • Linguistic information (POS&NE) attached to OOVs improved the performance significantly (4% ~ 10%).
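An OOV rate of this kind can be measured with a few lines of code. The sketch assumes a token-level rate (the share of tokens missing from the embedding vocabulary) and exact string matching; the slides do not say whether the reported figure is token-level or type-level, and the vocabulary and tokens below are toy values.

```python
from typing import Iterable, Set


def oov_rate(tokens: Iterable[str], vocab: Set[str]) -> float:
    """Fraction of tokens that have no entry in the embedding vocabulary."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    missing = sum(1 for tok in tokens if tok not in vocab)
    return missing / len(tokens)


# Toy vocabulary standing in for the pre-trained embedding's word list.
vocab = {"stop", "at", "FY", "22", "12", "3", "2017", "/"}
tokens = ["stop", "FY22", "at", "12/3/2017"]
rate = oov_rate(tokens, vocab)
print(f"OOV rate: {rate:.0%}")
```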

  13. Experimental Results 3/3 (page 12)

      Task-1 testing set performance

      | Features | CNN Micro | CNN Macro | RNN Micro | RNN Macro | RNN+CNN Micro | RNN+CNN Macro |
      |---|---|---|---|---|---|---|
      | None | 81.83 | 69.54 | 84.22 | 73.36 | 82.71 | 69.63 |
      | +POS&NE | 88.21 | 79.14 | 88.45 | 78.63 | 89.72 | 80.93 |
      | +POS&NE+Pattern | 87.73 | 78.47 | 88.76 | 83.55 | 89.24 | 81.50 |

      • Category-pattern features offer small improvements or even degrade performance.
      • The manually encoded rules do not cover enough patterns.

  14. Discussion 1/2
      • Issue-1: High OOV rate
      • Issue-2: Diverse patterns in tweets (not enough coverage with handcrafted patterns)
      • Solution: Numeral-Splitting
        o Most OOVs are concatenations of a numeral and other characters.
        o Split each token with numbers into individual sub-tokens.
        o e.g., "FY22" -> "FY" and "22"
        o e.g., "12/3/2017" -> "12", "/", "3", "/", "2017"

      | OOV Rate | Dev | Test |
      |---|---|---|
      | Before | 36% | 39% |
      | After | 22% | 23% |
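Numeral splitting as described above can be sketched with a single regular expression. This is an illustrative implementation, not the authors' code: the `split_numerals` name is hypothetical, and keeping decimals such as "17.99" as one sub-token is an assumption the slides neither confirm nor rule out.

```python
import re
from typing import List

# Sub-token types: a number (optionally with a decimal part), a run of
# ASCII letters, or any single remaining non-space character.
_PIECES = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+|[^A-Za-z\d\s]")


def split_numerals(token: str) -> List[str]:
    """Split a token that contains digits into sub-tokens; leave others intact."""
    if not any(ch.isdigit() for ch in token):
        return [token]
    return _PIECES.findall(token)


for tok in ["FY22", "12/3/2017", "200-day", "$17.99", "gap"]:
    print(tok, "->", split_numerals(tok))
```

After splitting, pieces such as "FY" and "22" are far more likely to have pre-trained embeddings than the fused token "FY22", which is the mechanism behind the OOV reduction reported in the table.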

  15. Discussion 2/2
      • Performance improves significantly, e.g., by 9% (micro) and 18% (macro) in RNN+CNN ("None").
      • Numeral splitting outperforms the handcrafted patterns.

      Task-1 Test Set Performance (before Numeral Splitting)

      | Features | CNN Micro | CNN Macro | RNN Micro | RNN Macro | RNN+CNN Micro | RNN+CNN Macro |
      |---|---|---|---|---|---|---|
      | None | 81.83 | 69.54 | 84.22 | 73.36 | 82.71 | 69.63 |
      | +POS&NE+Pattern | 87.73 | 78.47 | 88.76 | 83.55 | 89.24 | 81.50 |

      Task-1 Test Set Performance (after Numeral Splitting)

      | Features | CNN Micro | CNN Macro | RNN Micro | RNN Macro | RNN+CNN Micro | RNN+CNN Macro |
      |---|---|---|---|---|---|---|
      | None | 89.56 | 83.17 | 92.27 | 86.60 | 92.11 | 88.18 |
      | +POS&NE | 90.68 | 83.60 | 91.95 | 88.36 | 92.99 | 88.25 |
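The quoted gains are absolute differences in percentage points, which can be checked directly from the two tables. The snippet below only restates numbers from the slide; the variable names are illustrative.

```python
# RNN+CNN with no added knowledge ("None"), Task-1 test set.
before = {"micro": 82.71, "macro": 69.63}   # before numeral splitting
after = {"micro": 92.11, "macro": 88.18}    # after numeral splitting

gain = {k: round(after[k] - before[k], 2) for k in before}
print(gain)

# Best handcrafted-pattern result for RNN+CNN before splitting.
pattern_before = {"micro": 89.24, "macro": 81.50}
beats_patterns = all(after[k] > pattern_before[k] for k in after)
print(beats_patterns)
```

The differences come to about 9.4 (micro) and 18.55 (macro) points, matching the rounded 9% and 18% on the slide, and the split-only model exceeds the +POS&NE+Pattern model on both measures.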

  16. Conclusion
      • The proposed token representation (with linguistic knowledge) improves performance significantly.
      • A suitable pre-processing step (splitting numerals) to reduce OOV rates is essential.
      • Jointly adopting both approaches could offer additional benefits.

  17. Q & A. Thanks.

  18. Appendix – P10 1/2
      • Errors made by RNN were due to the model missing local patterns
        o E.g., "num/num" (Temporal) in "10/24"; "num%" (Percentage) in "7.8%"
      • Errors made by CNN were due to the model missing context information
        o E.g., "You sold ESPR at 11 and CLVS at 29 but thanks for this tip."

  19. Appendix – P10 2/2
      • Errors made by both RNN and CNN occurred where the number cannot be categorized explicitly (i.e., more information is needed).
        o E.g., "$NGAS Buy on dips on $UGAZ $UNG. Dip to 3.075, NG is on wave 3 move to 3.27 on 8HR chart."

  20. Appendix – P12
      • Category F-Score of RNN with POS&NE and Category-Patterns

      | Category | +POS&NE | +POS&NE+Pattern |
      |---|---|---|
      | Monetary | 0.9107 | 0.9085 |
      | Quantity | 0.7727 | 0.7857 |
      | Percentage | 0.9882 | 0.9882 |
      | Temporal | 0.8978 | 0.8903 |
      | Product Number | 0.3182 | 0.6818 |
      | Option | 0.7727 | 0.7727 |
      | Indicator | 0.7778 | 0.7037 |
