Module 8 Using ABBYY: Practice



SLIDE 1

Module 8 Using ABBYY: Practice

Uwe Springmann

Centrum für Informations- und Sprachverarbeitung (CIS) Ludwig-Maximilians-Universität München (LMU)

2015-09-15

Uwe Springmann Module 8 Using ABBYY: Practice 2015-09-15 1 / 4

SLIDE 2

Practice session: Overview

register for a free ABBYY developer account:

  register with the name of your app (make up a project name)
  enter a Cloud OCR SDK promo code
    promo code: (ask instructor)
    1,000 pages, valid until 15 January 2016 (thanks to Michael Fuchs of ABBYY Deutschland)

adapt a script for the Cloud OCR service

OCR some sample images with various output formats


SLIDE 3

Adapt the script

download the data for Module 8 to your laptop

insert your data into the script cloud_recognize.sh:

ApplicationId=""
Password=""

open a terminal and run the following command from your data directory:

./cloud_recognize.sh

you will see the following output:

ABBYY Cloud OCR SDK demo recognition script
Invalid arguments.
Usage: ./cloud_recognize.sh <input> <output> [-f output_format] [-l language] \
    [-t typeface]
output_format: txt|rtf|docx|xlsx|pptx|pdfSearchable|pdfTextAndImages|xml
typeface: normal (default), gothic
Some language examples: English (default), Russian, ChinesePRC, \
    German, OldGerman etc. For full list see ocrsdk documentation
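To make the usage message easier to read, here is a minimal sketch of how a shell script could parse such a command line: two positional arguments followed by the optional `-f`, `-l` and `-t` switches. It is an illustration only and not the content of the real cloud_recognize.sh; the function name `parse_args` and the variable names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the code of the real cloud_recognize.sh.
# It mimics the interface shown in the usage message:
#   <input> <output> [-f output_format] [-l language] [-t typeface]

parse_args() {
    # Defaults as stated in the course material.
    output_format="txt"
    language="English"
    typeface="normal"

    # The two positional arguments are mandatory.
    if [ "$#" -lt 2 ]; then
        echo "Invalid arguments." >&2
        return 1
    fi
    input=$1
    output=$2
    shift 2

    # Parse the optional switches that follow the positional arguments.
    local opt
    OPTIND=1   # reset so the function can be called more than once
    while getopts "f:l:t:" opt; do
        case "$opt" in
            f) output_format=$OPTARG ;;
            l) language=$OPTARG ;;
            t) typeface=$OPTARG ;;
            *) echo "Invalid arguments." >&2; return 1 ;;
        esac
    done
}

parse_args goethe.tif goethe.txt -l OldGerman -t gothic
echo "$input -> $output (format=$output_format, language=$language, typeface=$typeface)"
```

Calling the sketch with fewer than two arguments returns an error, which corresponds to the "Invalid arguments." message printed when the script is run without parameters.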


SLIDE 4

Recognize some page images

the downloaded data contain the following images:

goethe.tif (Goethe 1809, Wahlverwandtschaften)
grenzboten.tif (Grenzboten 1841)
latin.tif (Hobbes 1668, Leviathan)
greek.tif (Zonaras 1870, Epitome)

OCR some of the images:

default options: output_format=txt, language=english, typeface=normal

./cloud_recognize.sh goethe.tif goethe.txt -l oldgerman -t gothic
./cloud_recognize.sh latin.tif latin.txt -l latin
./cloud_recognize.sh greek.tif greek.txt -l greek

prepare a searchable pdf

compare with ground truth, e.g.

ocrevalutf8 accuracy somefile.gt.txt somefile.txt | more
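The last two steps could look roughly like the following dry-run sketch. It only prints the commands, so no request is sent to the Cloud OCR service until you copy a line into your terminal. The helper names `pdf_command` and `eval_command` are hypothetical, and the ground-truth naming `<name>.gt.txt` is an assumption based on the `somefile.gt.txt` pattern above; the `-f pdfSearchable` value is taken from the usage message of the script.

```bash
#!/usr/bin/env bash
# Dry-run sketch: prints the commands instead of executing them.
# pdf_command and eval_command are hypothetical helpers, not part of
# the course data.

# Print the call that requests a searchable PDF for one image.
# Arguments: image file, language, typeface (optional, default: normal)
pdf_command() {
    local image=$1 language=$2 typeface=${3:-normal}
    echo "./cloud_recognize.sh $image ${image%.tif}.pdf -f pdfSearchable -l $language -t $typeface"
}

# Print the evaluation call for the text output of one image.
# Assumption: the ground truth for <name>.tif is stored as <name>.gt.txt.
eval_command() {
    local image=$1
    echo "ocrevalutf8 accuracy ${image%.tif}.gt.txt ${image%.tif}.txt"
}

pdf_command goethe.tif oldgerman gothic
eval_command goethe.tif
```

Printing the commands first makes it easy to check file names and options before any pages are processed.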
