Predicting Role Relevance with Minimal Domain Expertise in a Financial Domain (Mayank Kejriwal)


  1. Predicting Role Relevance with Minimal Domain Expertise in a Financial Domain Mayank Kejriwal

  2. Role Relevance Problem
     In the context of a text fragment, how relevant are the triples?

     "For the year ended December 31, 2015, billed business from charge cards comprised 56 percent of total U.S. Card Services billed business.
     Centurion Bank and American Express Bank as Issuers of Certain Cards and Deposit Products
     We have two U.S. bank subsidiaries, American Express Centurion Bank (“Centurion Bank”) and American Express Bank, FSB (“American Express Bank”), which are both FDIC-insured depository institutions. Certain information regarding each bank is set forth in the table below:"

     | FILER_NAME          | MENTIONED_FINANCIAL_ENTITY      | ROLE    |
     |---------------------|---------------------------------|---------|
     | AMERICAN EXPRESS CO | American Express Bank           | Issuers |
     | AMERICAN EXPRESS CO | American Express Centurion Bank | Issuers |
     | AMERICAN EXPRESS CO | American Express Bank, FSB      | Issuers |
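One way to picture the input to the role relevance problem is as a text fragment paired with a (filer, mentioned entity, role) triple. The sketch below is only illustrative: the class names `RoleTriple` and `RelevanceExample` are hypothetical, and the slides do not say how relevance is encoded, so the optional numeric `relevance` field is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleTriple:
    """One (filer, mentioned entity, role) triple, as in the slide's table."""
    filer_name: str
    mentioned_financial_entity: str
    role: str


@dataclass(frozen=True)
class RelevanceExample:
    """A triple together with the text fragment it was extracted from."""
    fragment: str
    triple: RoleTriple
    # Assumption: relevance is a number; None marks an unlabeled example.
    relevance: Optional[float] = None


example = RelevanceExample(
    fragment="We have two U.S. bank subsidiaries, American Express Centurion Bank ...",
    triple=RoleTriple(
        filer_name="AMERICAN EXPRESS CO",
        mentioned_financial_entity="American Express Centurion Bank",
        role="Issuers",
    ),
)
```

Leaving `relevance` unset models a fragment that has no label, which is the kind of unlabeled data the later slides rely on.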

  3. Domain Expertise: what do we mean?
     • Definitions of terms (taken from NYS-CPA)
       Accrued Interest: INTEREST that has accumulated between the most recent payment and the sale of a BOND or other fixed-income security.
     • Definitions of roles
       GUARANTOR: A legal arrangement involving a promise by a person (guarantor) to perform the obligations of a second person (or many persons), in the event that the latter person fails to meet their obligations.
     • Characteristics of the problem to be solved
     • Feature crafting
     • Ability to diagnose unexpected outputs
     • Many more!

  4. Motivation 1: Feature crafting is tricky!

  5. Motivation 2: Unlabeled data can be powerful... if used right
     • Intuitively, a source of (additional) background knowledge
     • What if there’s no human?

  6. Motivation 3: Impressive recent advances in NLP • Mainly due to neural networks and low-dimensional latent space modeling

  7. Case study: *2vec (word2vec, doc2vec...)

  8. Approach
     • Use labeled + unlabeled text data to train a vector for each ‘contextual fragment’ (after some preprocessing)

     "For the year ended December 31, 2015, billed business from charge cards comprised 56 percent of total U.S. Card Services billed business.
     Centurion Bank and American Express Bank as Issuers of Certain Cards and Deposit Products
     We have two U.S. bank subsidiaries, American Express Centurion Bank (“Centurion Bank”) and American Express Bank, FSB (“American Express Bank”), which are both FDIC-insured depository institutions. Certain information regarding each bank is set forth in the table below:"

     [Figure: skip-gram embedding space showing fragment vectors (doc1, doc2, doc3, doc4) alongside word vectors such as "billed", "percent", "institutions", "FDIC", "bank", and "Centurion"]
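The approach above can be sketched roughly as follows. The slides do not name a library or describe the preprocessing, so everything here is an assumption: the regex tokenizer, the `doc1`, `doc2`, ... tagging scheme, the use of gensim (version 4 or later, for `model.dv`), and the hyperparameters. The helper names `preprocess`, `tag_fragments`, and `train_fragment_vectors` are hypothetical.

```python
from __future__ import annotations

import re


def preprocess(fragment: str) -> list[str]:
    """Lowercase a fragment and split it into tokens (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", fragment.lower())


def tag_fragments(fragments: list[str]) -> list[tuple[str, list[str]]]:
    """Pair each fragment's tokens with a tag like 'doc1', labeled or not."""
    return [(f"doc{i}", preprocess(text)) for i, text in enumerate(fragments, start=1)]


def train_fragment_vectors(tagged, vector_size=50, epochs=40):
    """Train one vector per fragment with gensim's Doc2Vec (assumed library)."""
    # Imported lazily so the helpers above work without gensim installed.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = [TaggedDocument(words=tokens, tags=[tag]) for tag, tokens in tagged]
    # dm=0 selects PV-DBOW, the doc2vec analogue of skip-gram (an assumption).
    model = Doc2Vec(vector_size=vector_size, dm=0, min_count=1, epochs=epochs)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    return {tag: model.dv[tag] for tag, _ in tagged}


fragments = [
    "We have two U.S. bank subsidiaries, which are both FDIC-insured depository institutions.",
    "Billed business from charge cards comprised 56 percent of total billed business.",
]
tagged = tag_fragments(fragments)
# vectors = train_fragment_vectors(tagged)  # requires gensim
```

Because the vectors are trained without looking at relevance labels, labeled and unlabeled fragments can go through the same `tag_fragments` call.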

  9. Train role-specific classifiers
     • Use explicitly given role-specific relevance labels (ignore documents without labels in training)

     [Figure: fragment vectors (doc1, doc2, doc3, doc4, ...) from the embedding space feed a Random Forest classifier for one role, here Affiliate]
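A minimal sketch of this step, under stated assumptions: fragment vectors are held in a dict keyed by document tag, labels are keyed by `(tag, role)`, and relevance is binary (the slides do not specify the label format). The Random Forest comes from the slide; using scikit-learn's `RandomForestClassifier` and the helper names `labeled_examples` and `train_role_classifiers` are assumptions.

```python
from __future__ import annotations


def labeled_examples(vectors: dict, labels: dict, role: str):
    """Return (X, y) for one role, skipping fragments with no label for it."""
    X, y = [], []
    for doc_id, vec in vectors.items():
        label = labels.get((doc_id, role))
        if label is None:
            continue  # unlabeled for this role: left out of classifier training
        X.append(list(vec))
        y.append(label)
    return X, y


def train_role_classifiers(vectors: dict, labels: dict, roles: list[str]) -> dict:
    """Fit one Random Forest per role on that role's labeled fragments."""
    # Imported lazily so labeled_examples works without scikit-learn installed.
    from sklearn.ensemble import RandomForestClassifier

    classifiers = {}
    for role in roles:
        X, y = labeled_examples(vectors, labels, role)
        classifiers[role] = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    return classifiers


# Toy two-dimensional vectors and labels, purely illustrative.
vectors = {"doc1": [0.9, 0.1], "doc2": [0.8, 0.2], "doc3": [0.1, 0.9], "doc4": [0.2, 0.7]}
labels = {("doc1", "affiliate"): 1, ("doc3", "affiliate"): 0, ("doc2", "issuer"): 1}
X, y = labeled_examples(vectors, labels, "affiliate")
```

Here `doc2` and `doc4` carry no affiliate label, so they are dropped from the affiliate training set even though they would still contribute to vector training.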

  10. Training results
     • Consistent with (eventually released) ground truth 2 results

     | Role      | Precision   | Recall      | F1-Measure  |
     |-----------|-------------|-------------|-------------|
     | affiliate | 0.941176471 | 0.941176471 | 0.941176471 |
     | trustee   | 0.976744186 | 1           | 0.988235294 |
     | issuer    | 0.909090909 | 0.909090909 | 0.909090909 |
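For readers who want to check how the three columns relate, here is a small sketch of the standard precision, recall, and F1 formulas. The counts used in the example (42 true positives, 1 false positive, 0 false negatives) are hypothetical: they were chosen only because they reproduce the trustee row, and the slides do not report the underlying counts.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts consistent with the trustee row above.
p, r, f1 = precision_recall_f1(tp=42, fp=1, fn=0)
print(round(p, 9), round(r, 9), round(f1, 9))  # 0.976744186 1.0 0.988235294
```

When precision and recall are equal, as in the affiliate and issuer rows, the harmonic mean equals both of them, which is why all three columns match there.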

  11. Conclusion
     • Extremely simple to set up (1-2 hours of programming and setup)
     • Will likely improve with more data, whether labeled or unlabeled
     • Does not require us to understand the domain
     • Could also work in tandem with hand-crafted features
     • Achieved median results on all ground truths
