
DOCUMENT DIGITIZATION: Rethinking it with Machine Learning


  1. DOCUMENT DIGITIZATION Rethinking it with Machine Learning Nischal Harohalli Padmanabha QConAI SFO 2019

  2. “The brain sure as hell doesn’t work by somebody programming in rule.” - Geoffrey Hinton @nischalhp | Document Digitization | QconAI SFO 2019

  3. PROBLEM: Understanding unstructured documents and extracting semantic information to automate claims handling.

  4. DOCUMENT CLASS: Policy
     POLICY NUMBER: H 54/16 307 728
     POLICY: Liability Protection
     CUSTOMER: Renolate GmbH, 10115 Berlin
     AGENT: pma Insurance Broker, 48149 Nurnberg
     EFFECTIVE DATE OF CHANGE: 22.12.2016 12:00
     TERMINATION: 22.12.2019 12:00
     ANNUAL CHARGE: EUR 424,63
     COVERAGES:
     ● Persons & property damage: flat EUR 3.000.000
     ● Financial losses: EUR 100.000
     ● Environmental damage basic: flat EUR 3.000.000
     RISK DESCRIPTION / INSURED LOCATION: Private liability insurance comfort plus, Dog liability, Environmental damage insurance, Employees on premises
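The fields on a policy like the one in slide 4 are the kind of target an extraction system has to fill. The following is a minimal sketch of such a target record, assuming a flat structure. The names `PolicyRecord`, `Coverage` and `parse_eur` are hypothetical and do not come from the talk; only the sample values are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Coverage:
    """One coverage line; name and limit are kept as printed on the document."""
    name: str
    limit: str


@dataclass
class PolicyRecord:
    """Hypothetical target structure for one digitized policy document."""
    document_class: str
    policy_number: str
    policy: str
    customer: str
    agent: str
    annual_charge: str
    coverages: List[Coverage] = field(default_factory=list)


def parse_eur(amount: str) -> float:
    """Convert a German-formatted amount such as 'EUR 424,63' to a float."""
    digits = amount.replace("EUR", "").strip()
    # '.' is the thousands separator and ',' is the decimal separator.
    return float(digits.replace(".", "").replace(",", "."))


# Sample values copied from the slide; the field assignment is an assumption.
record = PolicyRecord(
    document_class="Policy",
    policy_number="H 54/16 307 728",
    policy="Liability Protection",
    customer="Renolate GmbH",
    agent="pma Insurance Broker",
    annual_charge="EUR 424,63",
    coverages=[Coverage("Financial losses", "EUR 100.000")],
)
```

Keeping the raw strings on the record and normalizing them in a separate helper makes it possible to audit what was actually read from the page.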

  5. REWIND

  6. TABULAR INFORMATION EXTRACTION

  7. COURSE OF ACTION - ROUND 1
     ● Writing a lot of rules
     ● Evaluation on known data
     ● Initial results gave us a lot of happiness.

  8. RESULT: 58% accuracy in production

  9. RESULT: 58% accuracy in production. We failed, miserably. Rules became cumbersome & brittle.
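To illustrate how a hand-written rule can look fine on known data and then break in production, here is a minimal sketch of a single extraction rule. The regular expression and both sample strings are assumptions made up for illustration; the talk does not show its actual rules.

```python
import re
from typing import Optional

# Hypothetical rule: the policy number follows the label "POLICY NUMBER"
# and has the shape "H 54/16 307 728".
POLICY_NUMBER_RULE = re.compile(r"POLICY NUMBER\s+([A-Z] \d{2}/\d{2} \d{3} \d{3})")


def extract_policy_number(text: str) -> Optional[str]:
    """Return the policy number if the rule matches, else None."""
    match = POLICY_NUMBER_RULE.search(text)
    return match.group(1) if match else None


# Layout the rule was written against.
known_layout = "POLICY NUMBER H 54/16 307 728 POLICY Liability Protection"
# Invented variation: different label and separators for the same field.
new_layout = "Policy No.: H-54/16-307728"

found = extract_policy_number(known_layout)   # matches
missed = extract_policy_number(new_layout)    # no match, returns None
```

Each new layout needs another rule or another special case, which is one way a rule set grows cumbersome.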

  10. Life or death situation for the project (and us engineers)

  11. ADAPTIVE LEARNING THOUGHT PROCESS: How does a human solve the same problem?
      ● Identifies grouping of text to build context, e.g. tables, paragraphs, passages
      ● Given the context, domain knowledge and semantic understanding of text

  12. Sounds straightforward, right?

  13. TECH STACK CHECK

  14. NEXT STEPS

  15. ● What are our deadlines?
      ● Which algorithms to use?
      ● How to agile this?
      ● What should we feed as input to the algorithm?
      ● Human and computation resources required?
      ● What to annotate?

  16. COURSE OF ACTION - ROUND 2: Which algorithms to use?
      Supervised Learning
      ● Computer Vision: Object detection, Messaging parsing networks, Custom CNN networks
      ● NLP: Implementation of Deep Topic modeling, Custom RNN + CNN networks with domain adaptation
      Unsupervised Learning
      ● Computer Vision, NLP: Using this technique to generate data for supervised training. Wrote implementations of deep clustering and word / sentence / page / document embeddings.

  17. EMPHASIS ON SUPERVISED LEARNING

  18. COURSE OF ACTION - ROUND 2: What should we feed as input to the algorithm? What to annotate?
      Built an in-house Annotation System. Workflows support huge annotation jobs.
      ● Computer Vision: Drawing polygon bounding boxes, Labeling pages, Labeling documents
      ● NLP: Complex annotation of passages, phrases, tables, line items, and the hierarchical nature of textual information
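A polygon annotation of the kind listed in slide 18 can be represented as a labeled list of vertices. This is a minimal sketch under that assumption; `RegionAnnotation` and its fields are hypothetical names, not the schema of the annotation system described in the talk.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class RegionAnnotation:
    """Hypothetical annotation: a labeled polygon drawn on one page."""
    page: int
    label: str              # e.g. "table", "passage", "line_item"
    polygon: List[Point]    # vertices in page pixel coordinates

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the polygon."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return min(xs), min(ys), max(xs), max(ys)

    def area(self) -> float:
        """Polygon area computed with the shoelace formula."""
        total = 0.0
        n = len(self.polygon)
        for i in range(n):
            x1, y1 = self.polygon[i]
            x2, y2 = self.polygon[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0


# A slightly skewed quadrilateral, as a scanned table might appear.
table = RegionAnnotation(
    page=1,
    label="table",
    polygon=[(10, 20), (200, 20), (210, 120), (10, 110)],
)
```

Storing the polygon rather than only its bounding box keeps skewed or rotated regions from scanned pages intact; the box can always be derived later for models that expect rectangles.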

  19. COURSE OF ACTION - ROUND 2: Human and computation resources required?
      Data Scientists
      ● Data Scientists from Academia
      ● Deep learning engineers
      ● Research programme with Universities
      ● Master Thesis sponsorship at omni:us
      Engineers
      ● Full stack engineers
      ● Data Engineers
      ● Devops
      Leadership & Mentors
      ● Team leads with experience in AI
      ● Identifying and convincing industry experts to mentor
      ● Devops
      Programmes
      ● Cloud startup credits to support memory and GPU training algorithms
      ● Mentoring to scale operations

  20. COURSE OF ACTION - ROUND 2: What are our deadlines? How to agile this?
      ● Sprint planning for research
      ● Quick turnaround of POC
      ● Engineer AI systems to run experiments in a systematic and automated way

  21. RESULT: 94% accuracy in production. Successful AI delivery.

  22. TECH STACK CHECK

  23. GO LIVE OR GO HOME

  24. AI IN PRODUCTION
      ● Trained models predict
      ● Human in the loop fixes the errors and validates corrections
      ● Train on the corrections, continuous improvements
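The cycle in slide 24 (predict, human validation, retraining on corrections) can be sketched as a small loop. This is only an illustration under strong assumptions: `LookupModel` is a toy stand-in that memorizes labels, and the reviewer is simulated; neither reflects the models or tooling used in the talk.

```python
from typing import Callable, Dict, List, Tuple


class LookupModel:
    """Toy stand-in for a trained model: maps a text snippet to a label."""

    def __init__(self) -> None:
        self.memory: Dict[str, str] = {}

    def predict(self, text: str) -> str:
        return self.memory.get(text, "unknown")

    def train(self, examples: List[Tuple[str, str]]) -> None:
        for text, label in examples:
            self.memory[text] = label


def run_cycle(
    model: LookupModel,
    inputs: List[str],
    reviewer: Callable[[str, str], str],
) -> int:
    """Predict, let a human validate or fix each prediction, then retrain.

    Returns the number of predictions the reviewer had to correct.
    """
    validated: List[Tuple[str, str]] = []
    fixes = 0
    for text in inputs:
        predicted = model.predict(text)
        final = reviewer(text, predicted)  # human in the loop
        if final != predicted:
            fixes += 1
        validated.append((text, final))
    model.train(validated)  # train on the validated corrections
    return fixes


# Simulated reviewer who knows the right answers (illustration only).
truth = {"EUR 424,63": "annual_charge", "22.12.2016 12:00": "effective_date"}


def reviewer(text: str, predicted: str) -> str:
    return truth[text]


model = LookupModel()
first = run_cycle(model, list(truth), reviewer)   # every prediction needs a fix
second = run_cycle(model, list(truth), reviewer)  # nothing left to fix
print(first, second)  # prints "2 0"
```

The number of fixes per cycle is a simple signal to track: if it stops falling, the corrections are not reaching the model.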

  25. DO NOT IGNORE
      ● Domain knowledge is essential
      ● Educate your customers on AI
      ● Engineer end to end AI systems to solve business use case, not a dataset

  26. PLATFORM
      ● Training Platform
      ● Prediction Platform with human in the loop
      ● Management Console of Infrastructure, Applications & Users

  27. COURSE OF ACTION - ROUND 3: Training Platform
      ● Annotation System: System to define data models, annotate data, manage annotation jobs, audit the annotated data and version control the datasets
      ● Ability to train and evaluate models: Mechanism and system to trigger training, retraining, evaluation and versioning of different types of models, in a managed way, across various infrastructures supporting CPU and GPU
      ● Console connecting the two together
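One simple way to version-control a dataset, so that every trained model can be traced back to the exact annotations it saw, is to derive an identifier from the content itself. The sketch below assumes that approach; the talk does not say how its platform versions datasets, and `dataset_version` and `registry` are hypothetical names.

```python
import hashlib
import json
from typing import Any, Dict, List


def dataset_version(records: List[Dict[str, Any]]) -> str:
    """Derive a short version id from the content of the annotated records."""
    # Canonical JSON (sorted keys) so the same content always gives the same id.
    # Record order still matters: reordering the list changes the id.
    canonical = json.dumps(records, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


v1 = [{"doc": "policy-001", "label": "Policy"}]
v2 = v1 + [{"doc": "claim-007", "label": "Claim"}]

# Hypothetical registry: model name -> dataset version it was trained on.
registry: Dict[str, str] = {}
registry["doc-classifier"] = dataset_version(v2)
```

A content-derived id changes whenever an annotation is added or corrected, which makes it clear when retraining and re-evaluation are due.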

  28. COURSE OF ACTION - ROUND 3: Prediction Platform with human in the loop
      ● Async API for Ingestion: REST API that supports asynchronous data upload capabilities
      ● Validation UI: User interface to fix prediction errors
      ● AI microservices: Scaling deep learning models as microservices
      ● Data Pipelines: Robust data pipelines connecting the services, providing capabilities of high throughput, reliability and retry mechanisms
      ● Prediction console connects all.
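The retry mechanisms mentioned for the data pipelines are commonly built as a wrapper that re-runs a failing step with growing delays. The following is a minimal sketch of that general pattern, assuming exponential backoff; `call_with_retry` and `flaky_prediction_service` are hypothetical names, and the talk does not describe its actual retry policy.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    step: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run one pipeline step, retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return step()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
    raise ValueError("attempts must be at least 1")


calls = {"count": 0}


def flaky_prediction_service() -> str:
    """Hypothetical service call that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("service unavailable")
    return "prediction"


# The sleep function is injected so the example runs without real waiting.
result = call_with_retry(flaky_prediction_service, sleep=lambda _: None)
```

In a real pipeline the wrapper would usually retry only on transient errors (timeouts, connection failures) and let validation errors fail immediately.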

  29. COURSE OF ACTION - ROUND 3: Management Console of Infrastructure, Applications & Users
      ● Configuration management: Central management of configuration of various systems, consoles and services
      ● User management: Managing users and providing authentication and authorisation capabilities for services
      ● Infrastructure logs: Monitoring infrastructure usage and patterns to set up alerts and notifications
      ● Application logs: Monitoring logs of applications and setting up dashboards for internal and external stakeholders
      ● Management and monitoring console

  30. TECH STACK CHECK

  31. omni:us platform console

  32. Learnings

  33. Learnings
      ● It is very important for an entire organization to believe that AI can solve problems.
      ● Engineer AI products; do not believe that having just AI models is good enough.
      ● Agile for AI works; choose an interpretation that works for your team.
      ● Pay attention to details, domain knowledge and the use case to be solved.
      ● A combination of multiple technologies has to be used to solve a use case, not just one hammer for all.
      ● Do not try to “AI” everything; certain mature technologies are capable of solving certain problems well. Use them wisely.
      ● Believe in human in the loop; it builds trust with the business.
      ● Educate internal and external stakeholders about the possibilities and limitations of AI.
      ● Visualisation is a powerful tool to understand and explain AI to everybody. Use it.
      ● AI is no longer a black box; it can be fine-tuned, managed and configured appropriately.
      ● Automate your current processes as much as possible; this gives more room for research.
