SLIDE 1
Zoning Tabular Documents
Heath Nielson and William Barrett
SLIDE 2 Motivation
- Move the granularity of indexing from the image level to the field level
– Search or browse through fields rather than images
- Let the computer perform the repetitive task of finding regions within a document and determining the content of those regions
SLIDE 3 Processing Pipeline
Cropping
SLIDE 4 Processing Pipeline
Zoning
SLIDE 5 Processing Pipeline
Recognition
[Figure: recognized field contents, e.g. column "NAME and Surname of each Person": Michael Morrison; Mary J. Ellen]
SLIDE 6
Zoning Tabular Documents
SLIDE 7 Profiles
- Horizontal profile
  $p_h(y) = \sum_{i=1}^{M} \mathrm{image}(i, y)$
- Vertical profile
  $p_v(x) = \sum_{i=1}^{N} \mathrm{image}(x, i)$
  (M: image width, N: image height)
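The two profiles can be sketched as follows. This is a minimal sketch that assumes a binarized image stored as a list of rows (`image[y][x]`, 1 = ink), so the slide's image(x, y) corresponds to `image[y][x]`. The function names are illustrative.

```python
from typing import List

Image = List[List[int]]  # image[y][x]: ink value at column x, row y (assumed layout)

def horizontal_profile(image: Image) -> List[int]:
    """p_h(y): sum of ink across every column x of row y."""
    return [sum(row) for row in image]

def vertical_profile(image: Image) -> List[int]:
    """p_v(x): sum of ink down every row y of column x."""
    width = len(image[0]) if image else 0
    return [sum(row[x] for row in image) for x in range(width)]

if __name__ == "__main__":
    img = [[0, 1, 0],
           [1, 1, 1]]
    print(horizontal_profile(img))  # [1, 3]
    print(vertical_profile(img))    # [1, 2, 1]
```

Peaks in the horizontal profile mark rows of ink, such as ruled lines or text lines. Peaks in the vertical profile mark columns.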
SLIDE 8
SLIDE 9 Matched Filter Creation
- Get 3 samples from the profile containing the highest “peak”
- Determine the number of points on either side of the “peak” to establish the size of the filter
- Set the value at each point in the filter to the average of the corresponding points in the 3 samples
- Compute the mean of the filter’s points and subtract it from each point, giving a zero-mean filter
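The steps above can be sketched as follows. The slide does not say how the three sample centers or the half-width are found, so this sketch assumes the caller supplies them (`centers`, `half_width`). It also assumes the filter is applied by correlating it with the profile. Both function names are hypothetical.

```python
from typing import List, Sequence

def make_matched_filter(profile: Sequence[float], centers: Sequence[int],
                        half_width: int) -> List[float]:
    """Average the windows around each sample center, then remove the mean."""
    size = 2 * half_width + 1
    acc = [0.0] * size
    for c in centers:
        if c - half_width < 0 or c + half_width >= len(profile):
            raise ValueError(f"window around {c} falls outside the profile")
        for k in range(size):
            acc[k] += profile[c - half_width + k]
    filt = [v / len(centers) for v in acc]   # point-wise average of the samples
    mean = sum(filt) / size
    return [v - mean for v in filt]           # zero-mean filter

def apply_filter(profile: Sequence[float], filt: Sequence[float]) -> List[float]:
    """Correlate the profile with a centered filter; out-of-range points count as 0."""
    half = len(filt) // 2
    out = []
    for y in range(len(profile)):
        total = 0.0
        for k, weight in enumerate(filt):
            j = y - half + k
            if 0 <= j < len(profile):
                total += weight * profile[j]
        out.append(total)
    return out
```

For example, `profile = [0, 0, 5, 0, 0, 0, 5, 0, 0, 0, 5, 0, 0]` with `centers=[2, 6, 10]` and `half_width=1` gives a filter of about `[-1.67, 3.33, -1.67]`. The filtered response is largest at the three peaks. Subtracting the mean means that flat stretches of the profile produce no response.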
SLIDE 10
SLIDE 11 Geometric Layout
- Split the document into its component parts, each with a similar geometric layout
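SLIDE 12 Body Identification
- Exploit the periodicity of the rows
- Compute the amplitude spectrum of the horizontal profile: $P_h(s) = \left|\mathcal{F}\left(p_h(y)\right)\right|$
- Identify the first peak (lowest frequency), $f$
- Compute the row width $w = ps / f$, where $ps$ is the profile size
- Identify lines using the 2-prong probe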
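The body-identification steps can be sketched as follows. The sketch takes the amplitude spectrum of $p_h$, picks the lowest-frequency peak $f$, and returns $w = ps / f$, reading $ps$ as the profile length. The sketch makes two assumptions. It treats a "peak" as a local maximum that reaches at least `rel_threshold` of the strongest bin. It also removes the profile mean so that the zero-frequency bin does not dominate. The function names are hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple

def amplitude_spectrum(profile: Sequence[float]) -> List[float]:
    """|DFT| of the mean-removed profile for bins 0..n//2 (direct O(n^2) DFT)."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [v - mean for v in profile]
    return [abs(sum(centered[y] * cmath.exp(-2j * cmath.pi * k * y / n)
                    for y in range(n)))
            for k in range(n // 2 + 1)]

def row_width(profile: Sequence[float],
              rel_threshold: float = 0.5) -> Tuple[int, float]:
    """Return (f, w): the lowest-frequency spectral peak and the row width ps / f."""
    ps = len(profile)
    amp = amplitude_spectrum(profile)
    top = max(amp[1:])
    if top == 0:
        raise ValueError("profile has no periodic component")
    for k in range(1, len(amp)):
        left = amp[k - 1]
        right = amp[k + 1] if k + 1 < len(amp) else 0.0
        # Assumed peak test: local maximum, at least rel_threshold of the strongest bin.
        if amp[k] >= rel_threshold * top and amp[k] >= left and amp[k] >= right:
            return k, ps / k
    raise ValueError("no spectral peak found")
```

Consider a profile of 100 samples with a ruled line every 10 rows. Its spectrum has equal peaks at bins 10, 20, 30 and so on. The lowest one, bin 10, gives a row width of 10. Harmonics of the row period appear at higher frequencies, which is why the slide picks the lowest-frequency peak. In practice an FFT such as `numpy.fft.rfft` would replace the direct DFT used here.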
SLIDE 13
Body Identification
[Figure: horizontal profile amplitude spectrum, with the lowest peak frequency marked]
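SLIDE 14 Body Identification
2-Prong Probe
$C(i) = \sum_{j=i-\delta}^{i+\delta} p_h(j) + \sum_{j=i-\delta}^{i+\delta} p_h(j - w)$
where $w$ is the row width and $\delta$ is the half-width of each prong
[Figure: filtered profile (input) and output profile]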
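The 2-prong probe can be sketched as follows, reading it as $C(i) = \sum_{j=i-\delta}^{i+\delta} \left[p_h(j) + p_h(j - w)\right]$. One prong sits at $i$ and the other one row width earlier. The response is high only where two lines are spaced one row apart. The sketch assumes that `p_h` is the matched-filtered profile and that $w$ is rounded to an integer. It also assumes that samples outside the profile count as 0. The function name is hypothetical.

```python
from typing import List, Sequence

def two_prong_probe(p_h: Sequence[float], w: int, delta: int) -> List[float]:
    """C(i) = sum over j in [i-delta, i+delta] of p_h(j) + p_h(j - w)."""
    n = len(p_h)

    def at(j: int) -> float:
        # Samples outside the profile contribute nothing (assumption).
        return p_h[j] if 0 <= j < n else 0.0

    return [sum(at(j) + at(j - w) for j in range(i - delta, i + delta + 1))
            for i in range(n)]
```

Take a profile with unit peaks at 5, 15 and 25, with `w=10` and `delta=1`. The output is 2 at rows 15 and 25, because both prongs hit a line there. At row 5 the output is only 1, because no line lies a row width above it. This sketch does not cover how lines are then selected from the output profile, which the slide does not describe.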
SLIDE 15
Body Line Classification
Intra-document Consensus
[Figure: row candidates; the green row is identified as a false positive]
SLIDE 16
Initial Pass
SLIDE 17 Image “Snapping”
- For each line segment in a row or column
– Generate a profile over the segment’s area
– Calculate line strength
– “Snap” to the location with the largest value
[Equation: line strength ls(i), computed from the segment profile]
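The snapping step can be sketched for a horizontal segment as follows. The slide's line-strength formula is not available. The sketch therefore uses the segment profile value itself as a stand-in: the total ink along the segment's columns in each candidate row. The search window `radius`, the tie-break toward the original row, and the function name are all hypothetical. For a vertical segment, the same code applies to the transposed image.

```python
from typing import Sequence

def snap_horizontal_segment(image: Sequence[Sequence[int]], y0: int,
                            x0: int, x1: int, radius: int) -> int:
    """Return the row near y0 where the segment spanning x0..x1 is strongest."""
    lo = max(0, y0 - radius)
    hi = min(len(image), y0 + radius + 1)

    def strength(y: int) -> int:
        # Stand-in for the line strength ls(i): total ink along the segment (assumption).
        return sum(image[y][x0:x1 + 1])

    # Largest strength wins; ties go to the row closest to the original position.
    return max(range(lo, hi), key=lambda y: (strength(y), -abs(y - y0)))
```

If the window contains no ink at all, the segment stays where it was. That stops a segment from drifting when it lies in a blank area.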
SLIDE 18
Image “Snapping”
SLIDE 19 False Positive Identification
[Figure: variance measured perpendicular to the line segment]
SLIDE 20
Edge Variance
[Figure panels: Variance; Edges]
SLIDE 21 Document Template Creation
Inter-document Consensus
- Combine meshes generated from several documents
- Vote on line positions
- Discard line segments with a low vote count
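The voting step can be sketched with a simplification: each document's mesh is reduced to a one-dimensional list of line positions, for example row-line y-coordinates. The sketch groups nearby positions from different documents, counts one vote per document, and drops groups with too few votes. The `tolerance` and `min_votes` parameters and the function name are hypothetical.

```python
from typing import Dict, List, Sequence

def vote_line_positions(meshes: Sequence[Sequence[int]], tolerance: int,
                        min_votes: int) -> List[int]:
    """Merge line positions from several documents and keep well-supported ones."""
    # (position, document index) pairs, sorted by position.
    observations = sorted((p, doc) for doc, mesh in enumerate(meshes) for p in mesh)
    clusters: List[Dict] = []
    for pos, doc in observations:
        if clusters and pos - clusters[-1]["positions"][-1] <= tolerance:
            clusters[-1]["positions"].append(pos)
            clusters[-1]["docs"].add(doc)
        else:
            clusters.append({"positions": [pos], "docs": {doc}})
    # One vote per document; low-vote clusters are discarded.
    return [round(sum(c["positions"]) / len(c["positions"]))
            for c in clusters if len(c["docs"]) >= min_votes]
```

Consider meshes `[[10, 50, 90], [11, 49, 91, 70], [9, 51, 90]]` with `tolerance=3` and `min_votes=2`. The result is `[10, 50, 90]`. The line at 70 appears in only one document and is discarded. Grouping by distance to the previous position is single-linkage grouping, so a long run of closely spaced positions can merge into one group. A full implementation would vote on 2-D line segments rather than single coordinates.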
SLIDE 22
Document Templates
SLIDE 23 Template to Image Registration
- Identify the document’s body within the image
- Position the template at the corresponding location
- Locally snap each line segment to the image
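A one-dimensional sketch of the registration steps above follows. It makes several assumptions. The template is reduced to row offsets relative to its first line. The horizontal profile of the new image stands in for the image itself. The body is located by an exhaustive search for the shift that puts the most profile mass under the template rows. That search replaces the slides' body identification and is swapped in for illustration. Each placed row is then snapped locally to the strongest nearby profile value. The function name and parameters are hypothetical.

```python
from typing import List, Sequence

def register_template(p_h: Sequence[float], template_rows: Sequence[int],
                      max_shift: int, snap_radius: int) -> List[int]:
    """Place template row offsets on the profile, then snap each row locally."""
    n = len(p_h)

    def at(y: int) -> float:
        return p_h[y] if 0 <= y < n else 0.0

    # Global placement: the shift whose template rows land on the most profile mass.
    best_shift = max(range(max_shift + 1),
                     key=lambda s: sum(at(r + s) for r in template_rows))

    snapped = []
    for r in template_rows:
        y0 = r + best_shift
        # Local snap: strongest profile value within the radius, closest on ties.
        candidates = range(y0 - snap_radius, y0 + snap_radius + 1)
        snapped.append(max(candidates, key=lambda y: (at(y), -abs(y - y0))))
    return snapped
```

Suppose the profile has lines at 25, 45 and 67, and the template rows are `[0, 20, 41]`. Placement picks a shift of 25, which puts the template rows at 25, 45 and 66. Local snapping then moves the last row from 66 to 67. The two stages split the work: the global shift absorbs the document's position on the page, and snapping absorbs small per-line distortions.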
SLIDE 24
Template to Image Registration
SLIDE 25
Classification
Machine Printed Text
SLIDE 26
Classification
Handwriting
SLIDE 27
Zoned Image
SLIDE 28 Future Work
- Implement mesh-to-mesh registration
- Classification through the use of document
templates