DOCUMENT DIGITIZATION
Rethinking it with Machine Learning
Nischal Harohalli Padmanabha QConAI SFO 2019
"The brain sure as hell doesn't work by somebody programming in rules." - Geoffrey Hinton
@nischalhp | Document Digitization | QconAI SFO 2019
Understanding unstructured documents and extracting semantic information to automate claims handling.
DOCUMENT CLASS: Policy
POLICY NUMBER: H 54/16 307 728
CUSTOMER: Renolate GmbH, 10115 Berlin
AGENT: pma Insurance Broker, 48149 Nurnberg
RISK DESCRIPTION / INSURED LOCATION: Private liability insurance comfort plus; Dog liability; Environmental damage insurance; Employees on premises
POLICY: Liability Protection
EFFECTIVE DATE OF CHANGE: 22.12.2016 12:00
TERMINATION: 22.12.2019 12:00
ANNUAL CHARGE: EUR 424,63
COVERAGES:
Persons & property damage flat: EUR 3.000.000
Financial losses: EUR 100.000
Environmental damage basic flat: EUR 3.000.000
TABULAR INFORMATION EXTRACTION
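As a rough illustration of what the extraction target could look like downstream, the sketch below normalises a few of the fields shown on the slide into typed values. The `PolicyRecord` class, the helper names and the pairing of each coverage with an amount are assumptions made for this example; the talk does not show its actual data model.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


def parse_eur(text: str) -> Decimal:
    """Parse a German-formatted amount such as 'EUR 424,63' or 'EUR 3.000.000'."""
    number = text.replace("EUR", "").strip()
    # '.' is the thousands separator and ',' the decimal separator.
    return Decimal(number.replace(".", "").replace(",", "."))


def parse_timestamp(text: str) -> datetime:
    """Parse a timestamp such as '22.12.2016 12:00' (day.month.year)."""
    return datetime.strptime(text, "%d.%m.%Y %H:%M")


@dataclass
class PolicyRecord:
    # Hypothetical schema covering a subset of the fields on the slide.
    document_class: str
    policy_number: str
    effective_date: datetime
    termination: datetime
    annual_charge: Decimal
    coverages: dict[str, Decimal]


record = PolicyRecord(
    document_class="Policy",
    policy_number="H 54/16 307 728",
    effective_date=parse_timestamp("22.12.2016 12:00"),
    termination=parse_timestamp("22.12.2019 12:00"),
    annual_charge=parse_eur("EUR 424,63"),
    coverages={
        # Assumed pairing of coverage names and amounts.
        "Persons & property damage flat": parse_eur("EUR 3.000.000"),
        "Financial losses": parse_eur("EUR 100.000"),
        "Environmental damage basic flat": parse_eur("EUR 3.000.000"),
    },
)
```

Typed values like these make it possible to validate extractions (for example, that the termination date is later than the effective date) before they reach claims handling.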
Writing a lot of rules
COURSE OF ACTION - ROUND 1
Initial results gave us a lot of happiness. Evaluation was on known data.
In production: 58% accuracy
We failed, miserably. Rules became cumbersome & brittle.
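To make the brittleness concrete, here is a hypothetical rule of the kind such a system might contain. The regular expression and the second layout are invented for illustration and are not taken from the project. The rule works on the layout it was written for and silently returns nothing when the label or spacing changes.

```python
import re
from typing import Optional

# Hypothetical rule: the policy number follows the literal label "POLICY NUMBER".
POLICY_NUMBER_RULE = re.compile(
    r"POLICY NUMBER[:\s]+([A-Z] \d{2}/\d{2} \d{3} \d{3})"
)


def extract_policy_number(text: str) -> Optional[str]:
    """Return the policy number if the rule matches, otherwise None."""
    match = POLICY_NUMBER_RULE.search(text)
    return match.group(1) if match else None


known_layout = "POLICY NUMBER H 54/16 307 728"
# Same information, but a different label and no spaces inside the number.
unseen_layout = "Policy No.: H54/16307728"

found = extract_policy_number(known_layout)
missed = extract_policy_number(unseen_layout)
```

Each new layout needs another rule or another special case, which is one way a rule set can look good on known data and degrade in production.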
Life or death situation for the project (and us engineers)
ADAPTIVE LEARNING THOUGHT PROCESS
How does a human solve the same problem?
Identifies groupings of text to build context, e.g. tables, paragraphs, passages
Given the context, applies domain knowledge and semantic understanding of the text
TECH STACK CHECK
Which algorithms to use?
What should we feed as input to the algorithm?
What to annotate?
What are our deadlines?
Human and computation resources required?
How to agile this?
Which algorithms to use?
COURSE OF ACTION - ROUND 2
Supervised Learning: Computer Vision, NLP
Unsupervised Learning: Computer Vision, NLP
Unsupervised learning was used to generate data for supervised training. Wrote implementations of deep clustering, word / sentence / page / document embeddings, and domain adaptation
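The idea of using unlabelled data to produce training examples can be sketched with a deliberately tiny stand-in. The example below uses bag-of-words counts instead of learned embeddings and nearest-seed assignment by cosine similarity instead of deep clustering, so it illustrates the general pattern only, not the implementations mentioned on the slide. All names, seed phrases and the threshold are assumptions.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for learned embeddings)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pseudo_label(unlabeled, seeds, threshold=0.3):
    """Give each unlabeled text the label of its most similar seed, if similar enough."""
    labeled = []
    for text in unlabeled:
        vector = embed(text)
        score, label = max((cosine(vector, embed(seed)), lab) for seed, lab in seeds)
        if score >= threshold:
            labeled.append((text, label))
    return labeled


# A handful of hand-labelled seed phrases (hypothetical).
seeds = [
    ("annual charge EUR", "charge"),
    ("policy number", "policy_number"),
]
unlabeled = ["Annual charge EUR 424,63", "Policy number H 54/16", "Dear customer"]
training_data = pseudo_label(unlabeled, seeds)
```

Texts that are not close to any seed are left out rather than forced into a class, which keeps the generated training set cleaner at the cost of coverage.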
EMPHASIS ON SUPERVISED LEARNING
Computer Vision, NLP
Complex annotation of passages, phrases, tables, line items and the hierarchical nature of textual information
What should we feed as input to the algorithm? What to annotate?
Built an in-house annotation system
COURSE OF ACTION - ROUND 2
Workflows support huge annotation jobs
Human and computation resources required?
Data Scientists
Engineers
Leadership & Mentors
Cloud startup programmes
COURSE OF ACTION - ROUND 2
What are our deadlines? How to agile this?
Sprint planning for research
Quick turnaround of POCs
Engineer AI systems to run experiments in a systematic and automated way
COURSE OF ACTION - ROUND 2
In production: 94% accuracy
Successful AI delivery
TECH STACK CHECK
Trained Models Predict
AI IN PRODUCTION
Human in the loop fixes the errors and validates corrections
Train on the corrections, continuous improvements
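The predict, correct, retrain cycle described above can be sketched as a small feedback collector. The `FeedbackLoop` class and its method names are hypothetical; the sketch only shows how human-validated fields that differ from the prediction could be captured as new training examples.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLoop:
    """Collects human corrections so they can be used as new training examples."""

    corrections: list = field(default_factory=list)

    def review(self, document_id, predicted, validated):
        """Compare model output with the human-validated fields; keep what changed."""
        fixed = {k: v for k, v in validated.items() if predicted.get(k) != v}
        if fixed:
            self.corrections.append({"document_id": document_id, "fields": fixed})
        return validated

    def training_batch(self):
        """Hand the accumulated corrections to retraining and start a fresh batch."""
        batch, self.corrections = self.corrections, []
        return batch


loop = FeedbackLoop()
# The model misread the last digit of the annual charge; the reviewer fixed it.
predicted = {"policy_number": "H 54/16 307 728", "annual_charge": "EUR 424,68"}
validated = {"policy_number": "H 54/16 307 728", "annual_charge": "EUR 424,63"}
loop.review("doc-1", predicted, validated)
batch = loop.training_batch()
```

Only the corrected field ends up in the batch, so retraining focuses on the cases the current model gets wrong.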
DO NOT IGNORE
Domain knowledge is essential
Educate your customers on AI
Engineer end-to-end AI systems to solve a business use case, not a dataset
PLATFORM
Training Platform
Prediction Platform with human in the loop
Management Console
Applications & Users
Training Platform
COURSE OF ACTION - ROUND 3
Annotation System: system to define data models, annotate data, manage annotation jobs, audit the annotated data and version control the datasets
Ability to train and evaluate models: mechanism and system to trigger training, retraining, evaluation and versioning of different types of models, in a managed way across various infrastructures supporting CPU and GPU
Console connecting the two together
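One way to picture dataset and model versioning working together is shown below: a content-derived dataset id plus a registry that records which dataset each model version was trained on. This is a minimal sketch under assumed names (`dataset_version`, `ModelRegistry`, the `table_extractor` model type), and the accuracy figures are made up for the example; it is not the platform's actual design.

```python
import hashlib
import json


def dataset_version(annotations: list) -> str:
    """Derive a version id from the content, so any change to the annotations changes the id."""
    canonical = json.dumps(annotations, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


class ModelRegistry:
    """Records which dataset version and metric each trained model version belongs to."""

    def __init__(self):
        self._entries = []

    def register(self, model_type: str, dataset_id: str, accuracy: float) -> int:
        """Store a new model version and return its per-type version number."""
        version = 1 + sum(1 for e in self._entries if e["model_type"] == model_type)
        self._entries.append(
            {
                "model_type": model_type,
                "version": version,
                "dataset_id": dataset_id,
                "accuracy": accuracy,
            }
        )
        return version

    def best(self, model_type: str) -> dict:
        """Return the registered entry with the highest accuracy for this model type."""
        candidates = [e for e in self._entries if e["model_type"] == model_type]
        return max(candidates, key=lambda e: e["accuracy"])


annotations_v1 = [{"text": "EUR 424,63", "label": "annual_charge"}]
annotations_v2 = annotations_v1 + [{"text": "H 54/16 307 728", "label": "policy_number"}]

registry = ModelRegistry()
# Accuracy values here are placeholders, not results from the talk.
registry.register("table_extractor", dataset_version(annotations_v1), accuracy=0.80)
registry.register("table_extractor", dataset_version(annotations_v2), accuracy=0.85)
best = registry.best("table_extractor")
```

Because each entry carries the dataset id, a model can always be traced back to the exact annotations it was trained on.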
COURSE OF ACTION - ROUND 3
Prediction Platform with human in the loop
Async API for Ingestion: REST API that supports asynchronous data upload
Data Pipelines: robust data pipelines connecting the services, providing high throughput, reliability and retry mechanisms
Validation UI: user interface to fix prediction errors
AI microservices: scaling deep learning models as microservices
Prediction console connects all
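Asynchronous upload and retrying pipeline steps can be illustrated with a short `asyncio` sketch. The `IngestionService` class, the injected `predict` callable and the backoff values are assumptions for this example; a real deployment would sit behind a REST endpoint and a durable queue.

```python
import asyncio
import uuid


class IngestionService:
    """Accepts uploads immediately and processes them in the background with retries."""

    def __init__(self, predict, max_retries: int = 3):
        self.predict = predict          # async callable standing in for a model service
        self.max_retries = max_retries
        self.status = {}
        self._tasks = {}

    async def submit(self, document: bytes) -> str:
        """Register the upload and return a job id without waiting for the prediction."""
        job_id = uuid.uuid4().hex
        self.status[job_id] = "queued"
        self._tasks[job_id] = asyncio.create_task(self._process(job_id, document))
        return job_id

    async def _process(self, job_id: str, document: bytes) -> None:
        for attempt in range(1, self.max_retries + 1):
            try:
                await self.predict(document)
                self.status[job_id] = "done"
                return
            except ConnectionError:
                # Exponential backoff before the next attempt.
                await asyncio.sleep(0.01 * 2 ** attempt)
        self.status[job_id] = "failed"

    async def wait(self, job_id: str) -> None:
        """Block until the background job has finished (used here for the demo)."""
        await self._tasks[job_id]


async def main():
    calls = {"n": 0}

    async def flaky_predict(document: bytes) -> dict:
        # Fails on the first call to simulate a temporarily unavailable service.
        calls["n"] += 1
        if calls["n"] == 1:
            raise ConnectionError("model service unavailable")
        return {"pages": 1}

    service = IngestionService(flaky_predict)
    job_id = await service.submit(b"%PDF-")
    status_after_submit = service.status[job_id]
    await service.wait(job_id)
    return status_after_submit, service.status[job_id], calls["n"]


result = asyncio.run(main())
```

The caller gets a job id back straight away and can poll the status later, while transient failures in the model service are absorbed by the retry loop.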
Management Console
Applications & Users
COURSE OF ACTION - ROUND 3
Management and monitoring console
Configuration management: central management of configuration of various systems, consoles and services
Application logs: monitoring logs of applications and setting up dashboards for internal and external stakeholders
User management: managing users and providing authentication and authorisation capabilities for services
Infrastructure logs: monitoring infrastructure usage and patterns to set up alerts and notifications
TECH STACK CHECK
hammer for all.
certain problems well. Use them wisely.
research.