Zoning Tabular Documents, Heath Nielson and William Barrett



slide-1
SLIDE 1

Zoning Tabular Documents

Heath Nielson and William Barrett

slide-2
SLIDE 2

Motivation

  • Move granularity of indexing from image level to field level
    – Search or browse through fields rather than images
  • Let the computer perform the repetitive task of finding regions within a document and determining the content of those regions

slide-3
SLIDE 3

Processing Pipeline

Cropping

slide-4
SLIDE 4

Processing Pipeline

Zoning

slide-5
SLIDE 5

Processing Pipeline

Recognition

NAME and Surname of each Person Michael Morrison Mary J. Ellen

slide-6
SLIDE 6

Zoning Tabular Documents

slide-7
SLIDE 7

Profiles

  • Horizontal profile: $p_h(y) = \sum_{i}^{M} \mathrm{image}(i, y)$
  • Vertical profile: $p_v(x) = \sum_{i}^{N} \mathrm{image}(x, i)$
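A minimal sketch of the two profiles, assuming the image is stored as a list of rows (`image[y][x]`) with ink pixels set to 1. The function names and the toy image are illustrative and not taken from the slides.

```python
def horizontal_profile(image):
    """p_h(y): sum of the pixel values along each row y."""
    return [sum(row) for row in image]


def vertical_profile(image):
    """p_v(x): sum of the pixel values down each column x."""
    return [sum(column) for column in zip(*image)]


# Toy binary image (assumption: 1 = ink). Row 1 is a ruled horizontal line
# and column 2 is a ruled vertical line.
image = [
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]

p_h = horizontal_profile(image)
p_v = vertical_profile(image)
```

Ruled lines show up as peaks: here the largest value of `p_h` is at row 1 and the largest value of `p_v` is at column 2.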

slide-8
SLIDE 8
slide-9
SLIDE 9

Matched Filter

Creation

  • Get 3 samples from the profile containing the highest “peak”
  • Determine the number of points on either side of the “peak” to establish the size of the filter
  • Set the value at each point in the filter to the average value from the corresponding points in the 3 samples
  • Compute the average value from each of the filter’s points and subtract that amount from each point in the filter
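The creation steps above can be sketched as follows. This is one reading of the slide, not the authors' implementation: the three samples are taken as windows around the three strongest local maxima of the profile, and the number of points on either side of the peak (`half_width`) is passed in by the caller rather than determined automatically.

```python
def build_matched_filter(profile, half_width, num_samples=3):
    """Average windows around the strongest peaks, then remove the mean."""
    if half_width < 1:
        raise ValueError("half_width must be at least 1")
    n = len(profile)
    # Local maxima that leave room for a full window on both sides.
    candidates = [
        i for i in range(half_width, n - half_width)
        if profile[i] >= profile[i - 1] and profile[i] >= profile[i + 1]
    ]
    if not candidates:
        raise ValueError("no peak found in profile")
    peaks = sorted(candidates, key=lambda i: profile[i], reverse=True)[:num_samples]
    windows = [profile[i - half_width:i + half_width + 1] for i in peaks]
    # Point-by-point average over the sampled windows.
    averaged = [sum(values) / len(windows) for values in zip(*windows)]
    # Subtract the filter's own mean so its points sum to zero.
    mean = sum(averaged) / len(averaged)
    return [value - mean for value in averaged]


profile = [0, 1, 9, 1, 0, 0, 2, 10, 2, 0, 1, 8, 1, 0]
matched_filter = build_matched_filter(profile, half_width=1)
```

Subtracting the mean makes the filter respond to peak shape rather than to the overall brightness of the profile.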

slide-10
SLIDE 10
slide-11
SLIDE 11

Geometric Layout

  • Split the document into its component parts, representing similar geometric layouts:
    – Header
    – Body
    – Footer
slide-12
SLIDE 12

Body Identification

  • Exploit the periodicity of the rows
  • Compute $P_h(s) = \mathcal{F}(p_h(y))$
  • Identify first peak (lowest frequency)
  • Compute $w = ps / f$
  • Identify lines using the 2-prong probe
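A small sketch of the frequency step, assuming that `ps` is the length of the profile and `f` is the index of the first significant peak in the amplitude spectrum, so that `w` is the row spacing in pixels. The `min_fraction` threshold is a hypothetical guard against numerical noise and does not come from the slides.

```python
import cmath


def amplitude_spectrum(profile):
    """Magnitude of the discrete Fourier transform for s = 0 .. n // 2."""
    n = len(profile)
    return [
        abs(sum(profile[y] * cmath.exp(-2j * cmath.pi * s * y / n) for y in range(n)))
        for s in range(n // 2 + 1)
    ]


def first_peak_frequency(spectrum, min_fraction=0.5):
    """Lowest nonzero frequency that is a local maximum above a noise floor."""
    floor = min_fraction * max(spectrum[1:])
    for s in range(1, len(spectrum) - 1):
        is_local_max = spectrum[s] > spectrum[s - 1] and spectrum[s] > spectrum[s + 1]
        if is_local_max and spectrum[s] >= floor:
            return s
    return None


# Synthetic horizontal profile: one ruled line every 8 pixels over 64 rows.
profile = [1 if y % 8 == 0 else 0 for y in range(64)]
f = first_peak_frequency(amplitude_spectrum(profile))
w = len(profile) / f  # assumed meaning of w = ps / f
```

For this synthetic profile the first peak falls at `f = 8`, which recovers the row spacing of 8 pixels.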

slide-13
SLIDE 13

Body Identification

Figure: Horizontal Profile Amplitude Spectrum, with the lowest peak frequency marked

slide-14
SLIDE 14

Body Identification

2-Prong Probe

$$C(i) = \sum_{j=i-\delta}^{i+\delta} p_h(j) + \sum_{j=i-\delta}^{i+\delta} p_h(j - w)$$

Figure panels: Filtered Profile, Output Profile
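A sketch of the probe, reading it as the sum of two windows of half-width δ, one centered at `i` and one centered at `i - w`. Treating positions outside the profile as zero is an assumption made here for illustration.

```python
def two_prong_probe(profile, w, delta):
    """C(i): window sum around i plus window sum around i - w."""
    n = len(profile)

    def window_sum(center):
        lo = max(0, center - delta)
        hi = min(n - 1, center + delta)
        # Positions outside the profile contribute nothing (assumption).
        return sum(profile[lo:hi + 1]) if lo <= hi else 0

    return [window_sum(i) + window_sum(i - w) for i in range(n)]


# Filtered profile with rows at positions 2, 6 and 10 (spacing w = 4).
filtered = [0, 0, 5, 0, 0, 0, 5, 0, 0, 0, 5, 0]
output = two_prong_probe(filtered, w=4, delta=0)
```

The response is strongest where two peaks sit exactly `w` apart, so isolated peaks that do not fit the row period score lower.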

slide-15
SLIDE 15

Body Line Classification

Intra-document Consensus

Row Candidates

Green row identified as false positive

slide-16
SLIDE 16

Initial Pass

slide-17
SLIDE 17

Image “Snapping”

  • For each line segment in a row or column
    – Generate a profile over the segment’s area
    – Calculate line strength
    – “Snap” to the location with the largest value

) ( ) ( 1 1 ) ( i f ls gp i i ls

g l

+ − =
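A rough sketch of the snapping step for a horizontal segment. The line strength used here is simply the sum of ink along the candidate row, a stand-in for the slide's line-strength measure. The search `radius` and the function name are hypothetical.

```python
def snap_horizontal_segment(image, y, x0, x1, radius):
    """Move a segment spanning columns x0..x1 to the strongest nearby row."""
    height = len(image)
    best_y, best_strength = y, None
    for candidate in range(max(0, y - radius), min(height - 1, y + radius) + 1):
        # Stand-in line strength: total ink under the segment at this row.
        strength = sum(image[candidate][x0:x1 + 1])
        if best_strength is None or strength > best_strength:
            best_y, best_strength = candidate, strength
    return best_y


image = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],  # the actual ruled line
    [0, 0, 0, 0],
]
snapped_y = snap_horizontal_segment(image, y=2, x0=0, x1=3, radius=1)
```

A segment initially placed at row 2 moves to row 3, where the ruled line actually lies.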

slide-18
SLIDE 18

Image “Snapping”

slide-19
SLIDE 19

False Positive Identification

  • Generate a profile perpendicular to the line segment
  • Line profiles have low variance
  • Text profiles have high variance
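One possible reading of this test, sketched below: project the pixels in a narrow band perpendicular to the segment, giving one value per position along it, then compare the variance of those values against a threshold. The band width and the `max_line_variance` threshold are hypothetical values chosen for the toy data.

```python
from statistics import pvariance


def segment_profile(image, y, x0, x1, half_band):
    """Sum ink across a band around row y, one value per column x0..x1."""
    rows = range(max(0, y - half_band), min(len(image) - 1, y + half_band) + 1)
    return [sum(image[r][x] for r in rows) for x in range(x0, x1 + 1)]


def looks_like_text(image, y, x0, x1, half_band=1, max_line_variance=0.1):
    """Flag the segment as a false positive when its profile varies too much."""
    return pvariance(segment_profile(image, y, x0, x1, half_band)) > max_line_variance


ruled_line = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
text_like = [
    [1, 0, 0, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 1],
]

line_is_text = looks_like_text(ruled_line, y=1, x0=0, x1=7)
text_is_text = looks_like_text(text_like, y=1, x0=0, x1=7)
```

The solid line gives a constant profile with zero variance, while the irregular strokes give a profile that fluctuates from column to column.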

slide-20
SLIDE 20

Edge Variance

Figure labels: Variance, Edges

slide-21
SLIDE 21

Document Template Creation

Inter-document Consensus

  • Combine meshes generated from several documents
  • Vote on line positions
  • Discard line segments with a low vote count
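The voting idea can be sketched in one dimension, assuming each document contributes a list of detected line positions rather than a full mesh. The `tolerance` and `min_votes` parameters are hypothetical.

```python
def vote_on_lines(documents, tolerance, min_votes):
    """Cluster nearby line positions and keep those seen in enough documents."""
    tagged = sorted(
        (position, doc_index)
        for doc_index, positions in enumerate(documents)
        for position in positions
    )
    clusters = []
    for position, doc_index in tagged:
        # Join the current cluster when close to its last member.
        if clusters and position - clusters[-1][-1][0] <= tolerance:
            clusters[-1].append((position, doc_index))
        else:
            clusters.append([(position, doc_index)])

    template = []
    for cluster in clusters:
        votes = len({doc_index for _, doc_index in cluster})  # one vote per document
        if votes >= min_votes:
            template.append(sum(p for p, _ in cluster) / len(cluster))
    return template


documents = [
    [10, 50, 90],
    [11, 49, 91],
    [9, 51, 70],  # the line at 70 appears in only one document
]
template = vote_on_lines(documents, tolerance=2, min_votes=2)
```

The stray line at position 70 receives a single vote and is dropped, while the three recurring lines survive at their averaged positions.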
slide-22
SLIDE 22

Document Templates

slide-23
SLIDE 23

Template to Image Registration

  • Identify the document’s body within the image
  • Position the template to the corresponding location
  • Locally snap each line segment to the image
slide-24
SLIDE 24

Template to Image Registration

slide-25
SLIDE 25

Classification

Machine Printed Text

slide-26
SLIDE 26

Classification

Handwriting

slide-27
SLIDE 27

Zoned Image

slide-28
SLIDE 28

Future Work

  • Implement mesh-to-mesh registration
  • Classification through the use of document templates