Your Data. Your Value.
23384: How AI Revolutionizes Data & Document Management
Florian Kuhlmann (CTO & Co-founder)
LEVERTON - Company overview
100% Data Transparency
100+ Global Corporate Clients
20+ Languages supported
4+ Years using Deep Learning Technology
75+ Employees in offices in Berlin, London and New York
Data Security – ISO 27001 / 9001
Document Management
Data Management
Manual data input: inefficient, non-transparent, error-prone
Use AI to classify documents and extract data
Control the results using an efficient 4-eyes principle
Keep the link to the position in the document
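To make the last two points concrete, here is a minimal Python sketch of an extracted value that keeps a pointer to its page and bounding box and only counts as approved once two different reviewers have confirmed it. The class names, the page number and the coordinates are hypothetical; this illustrates the 4-eyes idea, not LEVERTON's actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourcePosition:
    """Where a value was found in the original document (hypothetical layout)."""
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates


@dataclass
class ExtractedValue:
    """An extracted data point that keeps its link back to the document."""
    field_name: str
    value: str
    position: SourcePosition
    confirmed_by: set[str] = field(default_factory=set)

    def confirm(self, reviewer: str) -> None:
        """Record that a reviewer checked the value against its source position."""
        self.confirmed_by.add(reviewer)

    @property
    def approved(self) -> bool:
        """4-eyes principle: at least two different people must confirm the value."""
        return len(self.confirmed_by) >= 2


rent = ExtractedValue("Rent charge", "USD 1,834.00",
                      SourcePosition(page=3, bbox=(72.0, 410.0, 520.0, 460.0)))
rent.confirm("alice")
rent.confirm("alice")  # the same reviewer twice still counts as one pair of eyes
print(rent.approved)   # False
rent.confirm("bob")
print(rent.approved)   # True
```

Storing reviewers in a set is what enforces "two different people": repeated confirmations by the same person do not add up.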
Deep Learning
Upload scanned PDFs → Automated extraction → Review and correct
Document Classification – categorize documents
Information Extraction – extract relevant data
OCR – convert images into machine-readable text
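The three components can be chained into a simple pipeline. The sketch below assumes hypothetical function signatures and uses toy stand-ins instead of trained models, so it only shows the order of the stages (OCR, then classification, then extraction), not how any of them works internally.

```python
from typing import Callable

# Hypothetical stage signatures: in practice each stage would be backed by a trained model.
OcrFn = Callable[[bytes], str]           # scanned page image -> machine-readable text
ClassifyFn = Callable[[str], str]        # full text -> document category
ExtractFn = Callable[[str, str], dict]   # (text, category) -> extracted fields


def process_document(pages: list[bytes], ocr: OcrFn, classify: ClassifyFn, extract: ExtractFn) -> dict:
    """Run OCR on every page, classify the document, then extract data for that category."""
    text = "\n".join(ocr(page) for page in pages)
    category = classify(text)
    return {"category": category, "fields": extract(text, category)}


# Toy stand-ins so the sketch runs end to end (not real models).
def toy_ocr(page: bytes) -> str:
    return page.decode("utf-8")  # pretend the "image" already contains text


def toy_classify(text: str) -> str:
    return "lease" if "rent" in text.lower() else "other"


def toy_extract(text: str, category: str) -> dict:
    return {"mentions_rent": True} if category == "lease" else {}


result = process_document([b"The rent shall be USD 1,834.00"], toy_ocr, toy_classify, toy_extract)
print(result)  # {'category': 'lease', 'fields': {'mentions_rent': True}}
```

Passing the category into the extraction step reflects that different document types carry different data points.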
DATA CORE
Train
Rent charge
Amount: DECIMAL
Currency: LIST (Currencies)
Period: LIST (Periods)
Start date: DATE
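One way to express the Rent charge schema above as typed Python, assuming a few example currencies and periods (the actual LIST values are not given in the talk). Validating raw strings against these types rejects values that do not fit, for instance a "currency" that is not in the list.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from enum import Enum


class Currency(Enum):
    """LIST (Currencies): only a few illustrative members."""
    USD = "USD"
    EUR = "EUR"
    GBP = "GBP"


class Period(Enum):
    """LIST (Periods): assumed members."""
    MONTHLY = "monthly"
    QUARTERLY = "quarterly"
    YEARLY = "yearly"


@dataclass(frozen=True)
class RentCharge:
    amount: Decimal      # DECIMAL: exact decimal arithmetic for money
    currency: Currency   # LIST (Currencies)
    period: Period       # LIST (Periods)
    start_date: date     # DATE

    @classmethod
    def from_strings(cls, amount: str, currency: str, period: str, start_date: str) -> "RentCharge":
        """Type-check raw extracted strings.

        Raises ValueError for an unknown currency/period or a malformed ISO date,
        and decimal.InvalidOperation for a non-numeric amount.
        """
        return cls(
            amount=Decimal(amount),
            currency=Currency(currency),
            period=Period(period),
            start_date=date.fromisoformat(start_date),
        )


charge = RentCharge.from_strings("1834.00", "USD", "monthly", "2006-10-01")
print(charge.amount, charge.currency.value, charge.period.value, charge.start_date)
# 1834.00 USD monthly 2006-10-01
```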
Example clause:
From 1 October 2006 to 31 December 2010: The rent (inclusive of management fees and air-conditioning charges during normal office hours) for the period from the Commencement Date to the Expiration Date of the said term shall be USD ONE THOUSAND EIGHT HUNDRED AND THIRTY FOUR (USD 1,834.00) per calendar month.
Extracted rent charge: amount 1,834.00, currency USD, period monthly, start date 2006-10-01
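The mapping from this clause to structured values can be sketched with a few regular expressions tailored to this one sentence. Note that this is itself a hand-written rule, so it illustrates the target output rather than the deep-learning extraction the talk describes; the pattern choices and the `PERIOD_WORDS` mapping are assumptions.

```python
import re
from datetime import datetime
from decimal import Decimal

CLAUSE = ("From 1 October 2006 to 31 December 2010: The rent (inclusive of management fees and "
          "air-conditioning charges during normal office hours) for the period from the Commencement "
          "Date to the Expiration Date of the said term shall be USD ONE THOUSAND EIGHT HUNDRED AND "
          "THIRTY FOUR (USD 1,834.00) per calendar month.")

# Assumed mapping from wording in the lease to normalized period values.
PERIOD_WORDS = {"calendar month": "monthly", "month": "monthly", "annum": "yearly", "year": "yearly"}


def normalize_rent(clause: str) -> dict:
    """Pull start date, amount, currency and period out of a single rent sentence."""
    start = re.search(r"From (\d{1,2} \w+ \d{4})", clause)           # e.g. "From 1 October 2006"
    money = re.search(r"\(([A-Z]{3}) ([\d,]+\.\d{2})\)", clause)      # e.g. "(USD 1,834.00)"
    period = re.search(r"per (calendar month|month|annum|year)", clause)
    if not (start and money and period):
        raise ValueError("clause does not match the expected pattern")
    return {
        "start_date": datetime.strptime(start.group(1), "%d %B %Y").date().isoformat(),
        "amount": Decimal(money.group(2).replace(",", "")),
        "currency": money.group(1),
        "period": PERIOD_WORDS[period.group(1)],
    }


print(normalize_rent(CLAUSE))
# {'start_date': '2006-10-01', 'amount': Decimal('1834.00'), 'currency': 'USD', 'period': 'monthly'}
```

Any rewording of the clause (a different date format, "per month payable in advance", the amount only in words) would break these patterns, which is exactly the weakness of rule-based extraction discussed next.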
Rule based
Extraction based on hand-written rules:
1. An apple is bad if it has brown spots.
2. An apple is bad if it has wrinkles.
3. An apple is bad if it is brown.
4. …
Machine & Deep Learning
Learn from labeled examples (bad apple, bad apple, good apple)
Rule: Phone1
(
  // label: "Telefon", "Tel" or "Tel.", "Fax"
  (
    ({Token.string == "Telefon"}) |
    ({Token.string == "Tel"} ({Token.string == "."})?) |
    ({Token.string == "Fax"})
  )
  ({Token.kind == punctuation})?
  ({Token.string == "("})?
  ({Token.numtype == ordinal})[1,3]
  ({Token.string == ")"} | {Token.string == "/"})?
  ({Token.numtype == ordinal})[1,6]
  ({Token.string == "-"} {Token.numtype == ordinal})?
):contact  // bind the whole match, not just the last group
-->
:contact.Phone = {rule = "Phone"}
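The rule above is written in JAPE, the pattern language of the GATE text-processing framework. For readers who do not use GATE, here is a rough character-level approximation with Python's `re` module. The JAPE rule matches tokens rather than characters, so translating its token repetitions into digit counts is an assumption; the point is how many special cases even a simple phone-number rule accumulates.

```python
import re

# Character-level approximation of the token-based rule (an assumption, not an exact translation).
PHONE = re.compile(
    r"(?:Telefon|Tel\.?|Fax)"   # label
    r"[:.]?\s*"                 # optional punctuation after the label
    r"\(?(\d{2,5})[)/]?\s*"     # area code, optionally in brackets or followed by "/"
    r"(\d[\d ]{2,12}\d)"        # subscriber number, possibly grouped with spaces
    r"(?:-(\d{1,5}))?"          # optional extension after a hyphen
)


def find_phone(text: str):
    """Return (area_code, number, extension) for the first match, or None."""
    match = PHONE.search(text)
    return match.groups() if match else None


print(find_phone("Tel.: (030) 123456-78"))  # ('030', '123456', '78')
print(find_phone("Telefon 030/4711"))       # ('030', '4711', None)
print(find_phone("No number here"))         # None
```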
Figure: the input is turned into a vector of numbers (… 0.023 0.121 3.421 0.223 …), from which the model predicts yes / no
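Learning from examples can be illustrated with the simplest trainable classifier, a perceptron, which turns a feature vector into a yes/no answer and adjusts its weights whenever it misclassifies a labeled example. This is a stand-in for the deep learning models in the talk, and the two apple features are invented for the sketch.

```python
def score(weights: list[float], bias: float, x: list[float]) -> float:
    """Weighted sum of the features plus bias; positive means "yes"."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_perceptron(examples, labels, epochs=20, lr=0.1):
    """Learn weights from labeled vectors (1 = "yes", 0 = "no") with the perceptron update rule."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            error = y - (1 if score(weights, bias, x) > 0 else 0)  # 0 when the prediction is right
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, x) -> str:
    return "yes" if score(weights, bias, x) > 0 else "no"


# Hypothetical features per apple: (share of brown spots, wrinkle score); label: is it a bad apple?
apples = [[0.8, 0.7], [0.6, 0.9], [0.1, 0.05], [0.05, 0.2]]
is_bad = [1, 1, 0, 0]
weights, bias = train_perceptron(apples, is_bad)
print(predict(weights, bias, [0.7, 0.6]))  # yes
print(predict(weights, bias, [0.1, 0.1]))  # no
```

Unlike the hand-written apple rules, nothing here states what makes an apple bad; the weights are derived entirely from the labeled examples.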
Rule based vs. Machine & Deep Learning: PROS and CONS of each approach
... So what is the better choice?
Rule based → 20%–65% on 20 different data points in 1 language
Machine & Deep Learning → 70% to nearly 100% on 700+ different data points in 4 languages
→ This sounds good, but how can we get better?
Different layouts might need different extraction strategies
Structured, semi-structured, unstructured
Interpreting the extracted text leads to the correct reading order
Multiple steps (visual separation, recognition of characters and words, interpretation of words to form sentences), which together lead to correct separation / layout recognition
… problem at a time
… solve such problems
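A minimal sketch of why reading order depends on layout: OCR word boxes (hypothetical coordinates below) are grouped into lines by their vertical position and then sorted left to right. On a two-column page this naive ordering interleaves the columns unless the column boundary, here a hand-picked `column_split` parameter, is known in advance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    x: float  # left edge of the word box
    y: float  # top edge of the word box (grows downwards)


def reading_order(words: list[Word], column_split: Optional[float] = None, line_tol: float = 5.0) -> str:
    """Order words top-to-bottom and left-to-right, optionally one column at a time."""
    if column_split is None:
        columns = [words]
    else:
        columns = [[w for w in words if w.x < column_split],
                   [w for w in words if w.x >= column_split]]
    ordered: list[Word] = []
    for column in columns:
        lines: list[list[Word]] = []
        for word in sorted(column, key=lambda w: w.y):
            # Same line if the top edge is close to that of the first word in the current line.
            if lines and abs(lines[-1][0].y - word.y) <= line_tol:
                lines[-1].append(word)
            else:
                lines.append([word])
        for line in lines:
            ordered.extend(sorted(line, key=lambda w: w.x))
    return " ".join(w.text for w in ordered)


page = [
    Word("Rent", 10, 10), Word("is", 60, 10), Word("USD", 10, 30), Word("1,834.00", 60, 30),  # left column
    Word("Term", 300, 10), Word("ends", 360, 11), Word("2010-12-31", 300, 30),                 # right column
]
print(reading_order(page))                    # Rent is Term ends USD 1,834.00 2010-12-31
print(reading_order(page, column_split=200))  # Rent is USD 1,834.00 Term ends 2010-12-31
```

The fixed `column_split` and `line_tol` values are exactly the kind of layout-specific choices that break on the next document, which is why structured, semi-structured and unstructured layouts call for different strategies.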
Rent charge, second example: Amount 1 Peppercorn, Period yearly, Start date DATE ???
robust on layout
Extract data → System integration → Global consolidation
IFRS 16 DATA
LEVERTON can be integrated into customers' ERP systems
Real estate leases, machinery leases, car leases & more
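IFRS 16 requires lessees to recognize a lease liability measured at the present value of the lease payments, discounted at the interest rate implicit in the lease or, if that cannot be readily determined, the lessee's incremental borrowing rate. For equal payments P over n periods at rate r this is liability = sum over t = 1..n of P / (1 + r)^t. The sketch below shows how extracted amount, period and term feed into that calculation. It assumes fixed payments at the end of each month and ignores extension options and variable payments; the rent figures are reused from the earlier clause purely as numbers, and the 0.4% monthly rate is made up.

```python
def months_inclusive(start_year: int, start_month: int, end_year: int, end_month: int) -> int:
    """Number of monthly periods from the start month to the end month, both included."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


def lease_liability(payment: float, periods: int, rate_per_period: float) -> float:
    """Present value of equal payments made at the end of each period, at a constant discount rate."""
    return sum(payment / (1 + rate_per_period) ** t for t in range(1, periods + 1))


n = months_inclusive(2006, 10, 2010, 12)        # October 2006 to December 2010
liability = lease_liability(1834.00, n, 0.004)  # 0.4% per month is a hypothetical discount rate
print(n)  # 51
print(round(liability, 2))
```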
Florian Kuhlmann CTO & Co-founder florian.kuhlmann@leverton.ai