Keeping Informed: Automatic Processing of Residual Functional Capacity Form Images
JULIA PORCINO AND CHUNXIAO ZHOU HIP’19 SEPTEMBER 20-21, 2019
Acknowledgements
This research was supported by the Intramural Research Program of the National … Administration. All opinions expressed here are the authors' and not those of the US government. We have no conflicts of interest to disclose.
Disability Programs:
Adjudication Process:
evidence
processing data
Figure: Number of Beneficiaries (Millions), 1970 to 2015; series: Total, Disabled Workers, Spouses, Children
SSA Office of the Chief Actuary: https://www.ssa.gov/oact/STATS/DIbenies.html
Function as it relates to work
Why are we interested in historical RFC Forms?
Millions of paper forms
SSA stores all documents as TIF images
RFC forms come from templates that can be edited
Number of checkboxes per section:
Sections per page:
Section Spans Two Pages:
Distance between rows and columns:
Steps:
➢ Checkbox Detection
➢ Checkbox Matching
➢ Templates
➢ Template Matching Algorithm
➢ Record Output
Use Python's OpenCV to detect checkboxes based on their size and shape.
The ratio of black to white pixels at the center of a checkbox indicates whether it is marked.
Checkbox Position:
𝑗, 𝑑𝑗
Checkbox Alignment:
𝑘
𝑗 = 𝑠 𝑘
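The position and alignment notation above did not survive extraction. As a hedged sketch of one way to obtain (row, column) indices like those in the RCC examples, detected checkbox centers can be grouped into rows and columns by alignment: coordinates closer than a tolerance share a row (or column). The function names, the 8-pixel tolerance, and the gap-based grouping are assumptions, not the authors' method.

```python
def _group(values, tol):
    """Cluster 1-D coordinates; a new group starts when the gap exceeds tol pixels."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    # Represent each row/column by the mean coordinate of its members.
    return [sum(g) / len(g) for g in groups]


def grid_indices(centers, tol=8):
    """Map checkbox centers (x, y) to 1-based (row, column) indices (hypothetical helper)."""
    rows = _group([y for _, y in centers], tol)
    cols = _group([x for x, _ in centers], tol)

    def nearest(v, levels):
        # Index of the closest row/column line, counted from 1.
        return min(range(len(levels)), key=lambda i: abs(levels[i] - v)) + 1

    return [(nearest(y, rows), nearest(x, cols)) for x, y in centers]
```

For instance, centers at (10, 10), (50, 40), (90, 41), (50, 80), and (91, 79) map to [(1,1), (2,2), (2,3), (3,2), (3,3)], the same cell list used in the first RCC example.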
RCC when no break occurs:
Before: [(1,1), (2,2), (2,3), (3,2), (3,3)] After: {}
RCC when break occurs after 1st row:
Before: [(1,1)] After: [(1,1), (1,2), (2,1), (2,2)]
RCC when break occurs after 2nd row:
Before: [(1,1), (2,2), (2,3)] After: [(1,1), (1,2)]
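The three before/after lists are consistent with a simple rule, inferred here from the examples rather than taken from the authors' code: cells in rows up to the break stay on the first page unchanged, and cells after the break are renumbered so that the continuation starts at row 1 and its left-most column becomes column 1. A minimal sketch (the function name is hypothetical):

```python
def split_at_break(cells, rows_before):
    """Split a section's (row, column) cells at a page break after `rows_before` rows."""
    before = [(r, c) for r, c in cells if r <= rows_before]
    rest = [(r, c) for r, c in cells if r > rows_before]
    if not rest:
        # No break inside this section: nothing continues on the next page.
        return before, []
    # Renumber the continuation relative to its first row and left-most column.
    r0 = min(r for r, _ in rest)
    c0 = min(c for _, c in rest)
    return before, [(r - r0 + 1, c - c0 + 1) for r, c in rest]


SECTION = [(1, 1), (2, 2), (2, 3), (3, 2), (3, 3)]
```

With `SECTION`, a break after row 3, row 1, or row 2 reproduces the three cases listed above.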
3 Types of Templates:
to match form
sections
breaks
Environmental Limitations
File Name | Extreme Cold | Extreme Heat | Wetness | Humidity
SAMPLE | Avoid Concentrated | Unlimited | Avoid Concentrated | Unlimited
SAMPLE.tif:
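A hedged sketch of the record-output step: the marked (row, column) cells of one image are looked up in a section template and written as one CSV row per file. The template layout (one row per limitation, one column per answer), the helper names, and the CSV format are assumptions for illustration; the actual RFC section layout may differ.

```python
import csv
import io
from pathlib import Path

FIELDS = ["Extreme Cold", "Extreme Heat", "Wetness", "Humidity"]
ANSWERS = ["Unlimited", "Avoid Concentrated"]
# Hypothetical section template: (row, column) -> (field, answer).
TEMPLATE = {
    (r, c): (field, answer)
    for r, field in enumerate(FIELDS, start=1)
    for c, answer in enumerate(ANSWERS, start=1)
}


def record_row(file_name, marked_cells, template=TEMPLATE, fields=FIELDS):
    """Turn the (row, column) cells marked on one image into a flat record."""
    row = {"File Name": Path(file_name).stem, **{f: "" for f in fields}}
    for cell in marked_cells:
        field, answer = template[cell]
        # Keep every answer if a field has several marks, so conflicts stay visible.
        row[field] = f"{row[field]};{answer}" if row[field] else answer
    return row


def to_csv(rows, fields=FIELDS):
    """Serialize records to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["File Name", *fields], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Usage: `to_csv([record_row("SAMPLE.tif", [(1, 2), (2, 1), (3, 2), (4, 1)])])` produces the header line followed by `SAMPLE,Avoid Concentrated,Unlimited,Avoid Concentrated,Unlimited`, matching the SAMPLE row above.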
Task | Purpose | Physical RFCs* | Mental RFCs*
Validation | Evaluate templates and matching algorithm performance against … | 10000 | 5000
Comparison | Evaluate template matching (RCC) against location matching (Euclidean) | 4914 | 2364
Sample Generation | Perform data entry for entire sample | 497646 | 98408
*Refers to number of images in sample
Performance across 3 tasks for Physical RFC (PRFC) and Mental RFC (MRFC)
Comparison of Template vs. Location Matching
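For contrast with template matching, location matching (Euclidean) can be sketched as assigning each expected checkbox position to the nearest detected box within a distance cutoff. The function name, the 15-pixel cutoff, and the coordinate format are assumptions; this is an illustration of the general idea, not the baseline used in the study.

```python
import math


def match_by_location(template_positions, detected, max_dist=15.0):
    """Map each template label to the index of the nearest detected box, or None."""
    matches = {}
    for label, (tx, ty) in template_positions.items():
        best, best_d = None, max_dist
        for i, (dx, dy) in enumerate(detected):
            d = math.hypot(dx - tx, dy - ty)
            if d <= best_d:
                best, best_d = i, d
        # Note: nothing prevents two labels from claiming the same detected box.
        matches[label] = best
    return matches
```

Because it relies on absolute coordinates, a form whose layout was edited or whose scan is shifted by more than the cutoff leaves template labels unmatched, which is one reason a structural (row/column) match can be more robust.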
Recall Errors:
Precision Errors:
Checkbox Identification:
Checkbox Matching:
Generalization:
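Recall and precision errors can be counted in the usual sense: a recall error is a mark present in the reference data but missed by the pipeline, and a precision error is an extracted mark absent from the reference. A minimal sketch, assuming both sides are sets of hypothetical (file, field, answer) triples; what the reference consisted of is not recoverable from the truncated Validation row.

```python
def precision_recall(extracted, reference):
    """Precision and recall of extracted (file, field, answer) triples against a reference."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    # Empty inputs are scored 0.0 rather than raising ZeroDivisionError.
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall
```

For example, three extracted triples of which two appear among four reference triples give a precision of 2/3 and a recall of 0.5.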
Successfully used novel templates to extract checkbox data.
Good performance comes from the specificity of the task and strong assumptions.
Achieved good performance with basic computer vision techniques.
identification or may be necessary for other applications (e.g., medical records)
Contact Information: julia.porcino@nih.gov