Minimal Domain Expertise in a Financial Domain Mayank Kejriwal - - PowerPoint PPT Presentation

SLIDE 1

Predicting Role Relevance with Minimal Domain Expertise in a Financial Domain

Mayank Kejriwal

SLIDE 2

Role Relevance Problem

"For the year ended December 31, 2015, billed business from charge cards comprised 56 percent of total U.S. Card Services billed business. Centurion Bank and American Express Bank as Issuers of Certain Cards and Deposit Products We have two U.S. bank subsidiaries, American Express Centurion Bank (“Centurion Bank”) and American Express Bank, FSB (“American Express Bank”), which are both FDIC-insured depository institutions. Certain information regarding each bank is set forth in the table below:"

FILER_NAME          | MENTIONED_FINANCIAL_ENTITY      | ROLE
AMERICAN EXPRESS CO | American Express Bank           | Issuers
AMERICAN EXPRESS CO | American Express Centurion Bank | Issuers
AMERICAN EXPRESS CO | American Express Bank, FSB      | Issuers

In the context of a text fragment, how relevant are the triples?
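
As a rough illustration of the problem's input, the sketch below pairs a text fragment with one (filer, mentioned entity, role) triple. The class and field names are hypothetical, and the label type is an assumption, since the slides do not specify how relevance is encoded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleTriple:
    """One row of the table above (field names are illustrative)."""
    filer_name: str
    mentioned_financial_entity: str
    role: str


@dataclass(frozen=True)
class RelevanceExample:
    """A triple in the context of the fragment it was extracted from."""
    fragment: str
    triple: RoleTriple
    # Assumption: the relevance label is a string; None marks an unlabeled example.
    label: Optional[str] = None


fragment = "We have two U.S. bank subsidiaries"  # excerpt of the fragment above
entities = [
    "American Express Bank",
    "American Express Centurion Bank",
    "American Express Bank, FSB",
]
examples = [
    RelevanceExample(fragment, RoleTriple("AMERICAN EXPRESS CO", name, "Issuers"))
    for name in entities
]
```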

SLIDE 3

Domain Expertise: what do we mean?

  • Definitions of terms (taken from NYS-CPA)

Accrued Interest: INTEREST that has accumulated between the most recent payment and the sale of a BOND or other fixed-income security.

  • Definitions of roles

GUARANTOR: A legal arrangement involving a promise by a person (guarantor) to perform the obligations of a second person (or many persons), in the event that the latter person fails to meet their obligations.

  • Characteristics of the problem to be solved
  • Feature crafting
  • Ability to diagnose unexpected outputs
  • Many more!
SLIDE 4

Motivation 1: Feature crafting is tricky!

SLIDE 5

Motivation 2: Unlabeled data can be powerful...if used right

  • Intuitively, a source of (additional) background knowledge
  • What if there’s no human?
SLIDE 6

Motivation 3: Impressive recent advances in NLP

  • Mainly due to neural networks and low-dimensional latent space modeling
SLIDE 7

Case study: *2vec (word2vec, doc2vec...)

SLIDE 8

Approach

  • Use labeled+unlabeled text data to train a vector for each ‘contextual fragment’ (after some preprocessing)

"For the year ended December 31, 2015, billed business from charge cards comprised 56 percent of total U.S. Card Services billed business. Centurion Bank and American Express Bank as Issuers of Certain Cards and Deposit Products We have two U.S. bank subsidiaries, American Express Centurion Bank (“Centurion Bank”) and American Express Bank, FSB (“American Express Bank”), which are both FDIC-insured depository institutions. Certain information regarding each bank is set forth in the table below:"

[Figure: Skip-gram embedding of documents (doc1, doc4, doc2, doc3) and words (Centurion, percent, bank, FDIC, institutions, billed)]
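
To make the skip-gram idea concrete, here is a minimal sketch of how (center, context) training pairs can be formed from a fragment. It shows only the pair-generation step, not the training of the vectors, and it is not the code behind these slides. The function names, the tokenizer (a stand-in for "some preprocessing"), and the window size are assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (assumed preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for all tokens within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# A short excerpt of the fragment shown above.
tokens = tokenize("We have two U.S. bank subsidiaries")
pairs = skipgram_pairs(tokens, window=2)
```

A skip-gram model is trained to predict the context word from the center word for each such pair. Document-level variants such as doc2vec additionally learn one vector per document, which is what gets passed to the classifiers later.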

SLIDE 9

Train role-specific classifiers

  • Use explicitly given role-specific relevance labels (ignore documents without labels in training)

[Figure: embedded documents (doc1, doc4, doc2, doc3) and words (Centurion, percent, bank, FDIC, institutions, billed)]

Random Forest Classifier: Affiliate ...
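
The step of building one training set per role while ignoring unlabeled documents might look like the sketch below. The function name, the tuple layout, the toy vectors, and the binary labels are all placeholders chosen for illustration, not data or code from the talk.

```python
from collections import defaultdict


def build_role_training_sets(examples):
    """Group labeled document vectors by role, skipping unlabeled documents.

    `examples` is an iterable of (doc_id, role, vector, label) tuples;
    label is None when the document has no relevance label.
    """
    by_role = defaultdict(lambda: {"X": [], "y": []})
    for doc_id, role, vector, label in examples:
        if label is None:
            continue  # unlabeled documents only contribute to the embedding step
        by_role[role]["X"].append(vector)
        by_role[role]["y"].append(label)
    return dict(by_role)


# Toy placeholder data: 2-dimensional vectors and 0/1 relevance labels.
examples = [
    ("doc1", "affiliate", [0.1, 0.3], 1),
    ("doc2", "affiliate", [0.2, -0.1], None),
    ("doc3", "issuer", [0.5, 0.4], 0),
    ("doc4", "affiliate", [-0.3, 0.2], 0),
]
training_sets = build_role_training_sets(examples)
```

Each role's `X` and `y` could then be handed to a separate classifier, for example scikit-learn's `RandomForestClassifier` through its `fit(X, y)` method.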

SLIDE 10

Training results

  • Consistent with (eventually released) ground truth 2 results

Role      | Precision   | Recall      | F1-Measure
affiliate | 0.941176471 | 0.941176471 | 0.941176471
trustee   | 0.976744186 | 1           | 0.988235294
issuer    | 0.909090909 | 0.909090909 | 0.909090909
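
As a quick check on the table, the F1-measure is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the reported precision and recall values; the helper name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per role, copied from the table above.
reported = {
    "affiliate": (0.941176471, 0.941176471, 0.941176471),
    "trustee": (0.976744186, 1.0, 0.988235294),
    "issuer": (0.909090909, 0.909090909, 0.909090909),
}

for role, (p, r, f1) in reported.items():
    print(f"{role}: computed F1 = {f1_score(p, r):.9f}, reported = {f1:.9f}")
```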

SLIDE 11

Conclusion

  • Extremely simple to set up (1-2 hours of programming and setup)
  • Will likely improve with more data, whether labeled or unlabeled
  • Does not require us to understand the domain
  • Could also work in tandem with hand-crafted features
  • Achieved median results on all ground truths