SLIDE 1
Intelligent Indexing: A Semi-Automated, Trainable System for Field Labeling
Robert Clawson, Bill Barrett
Brigham Young University
SLIDE 2
What is indexing?
SLIDE 3
Currently…
SLIDE 4
- 5-10 times faster
- More accurate
- More enjoyable
Our Research
SLIDE 5
https://www.youtube.com/watch?v=eBQjHgejchA
Word Morphing
SLIDE 6
How can handwriting recognition be used to improve indexing?
Question
SLIDE 7
Machine Learning Approach
Split training/test set
1920 Utah Census
About 50,000 fields per category
Accuracy ~80%
- Error rate too high
- Would require lots of corrections after the fact
Possible Methods
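To make the "lots of corrections" point concrete, here is a minimal arithmetic sketch using the figures on this slide (about 50,000 fields per category, roughly 80% accuracy). The function name `expected_corrections` is illustrative and not part of the presented system.

```python
def expected_corrections(num_fields: int, accuracy: float) -> int:
    """Approximate number of fields an indexer would still have to fix by hand."""
    return round(num_fields * (1.0 - accuracy))


# Figures taken from the slide: about 50,000 fields per category at ~80% accuracy.
corrections = expected_corrections(50_000, 0.80)
print(corrections)
```

At these figures, roughly one field in five in each category would still need manual correction.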
SLIDE 8
Pre-clustering
Cost matrix for each category/document
Possible Methods
Relationship to head of household
20.2 15.9 11.4 8.0 9.3 3.2
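A cost matrix of this kind can be sketched as follows. The slides do not state how the cost between two handwritten fields is computed, so this sketch assumes each field has already been reduced to a numeric feature vector and uses an L1 distance as a stand-in cost. The names `l1_cost`, `cost_matrix`, and `nearest_neighbor` are hypothetical.

```python
from typing import Callable, List, Sequence

Feature = Sequence[float]


def l1_cost(a: Feature, b: Feature) -> float:
    """Stand-in cost (assumption): sum of absolute feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cost_matrix(
    fields: List[Feature],
    cost: Callable[[Feature, Feature], float] = l1_cost,
) -> List[List[float]]:
    """Symmetric matrix of pairwise costs between all fields in one category/document."""
    n = len(fields)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = cost(fields[i], fields[j])
    return matrix


def nearest_neighbor(matrix: List[List[float]], i: int) -> int:
    """Index of the field with the lowest cost to field i (excluding i itself)."""
    candidates = [j for j in range(len(matrix)) if j != i]
    return min(candidates, key=lambda j: matrix[i][j])


# Toy feature vectors, purely illustrative.
fields = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
matrix = cost_matrix(fields)
```

Fields whose mutual costs are low would fall into the same cluster, so a whole cluster could be labeled at once.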
SLIDE 9
Pre-clustering
Possible Methods
SLIDE 10
Pre-clustering Problems
- Still makes frequent errors
- Indexer has to scan up and down the page to look for mistakes
- How to show clusters when there are many different words in the column?
Possible Methods
SLIDE 11
Interactive training: learn as you go, correct as you go
Indexer drives
Training set can start empty, quick ramp up
Still use cost matrix
- Switch from per document to per enumerator
Introduce threshold
- All fields that match under a threshold are labeled
- Learn the threshold
Demo
Breakthrough
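The interactive loop described on this slide might look roughly like the sketch below. It is not the presented system: the feature vectors, the L1 stand-in cost, the class name `InteractiveLabeler`, and in particular the threshold update rule (tighten the threshold whenever an automatic match turns out to be wrong) are all assumptions made for illustration, since the slides do not say how the threshold is learned.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Feature = Sequence[float]


def l1_cost(a: Feature, b: Feature) -> float:
    """Stand-in cost (assumption): sum of absolute feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


class InteractiveLabeler:
    """Hypothetical sketch of 'learn as you go, correct as you go'."""

    def __init__(
        self,
        threshold: float,
        cost: Callable[[Feature, Feature], float] = l1_cost,
    ) -> None:
        self.threshold = threshold
        self.cost = cost
        # The training set starts empty and grows as the indexer works.
        self.training: List[Tuple[Feature, str]] = []

    def _best_match(self, field: Feature) -> Optional[Tuple[float, str]]:
        """Lowest cost to any training example, with that example's label."""
        if not self.training:
            return None
        feat, label = min(self.training, key=lambda ex: self.cost(field, ex[0]))
        return self.cost(field, feat), label

    def suggest(self, field: Feature) -> Optional[str]:
        """Auto-label only if the best match falls under the threshold."""
        match = self._best_match(field)
        if match is not None and match[0] < self.threshold:
            return match[1]
        return None

    def confirm(self, field: Feature, label: str) -> None:
        """Indexer typed or accepted a label: add it to the training set."""
        self.training.append((field, label))

    def correct(self, field: Feature, label: str) -> None:
        """Indexer fixed a wrong auto-label.

        Assumed update rule: lower the threshold to the cost of the wrong
        match so that an equally distant match is no longer auto-labeled.
        """
        match = self._best_match(field)
        if match is not None and match[0] < self.threshold:
            self.threshold = match[0]
        self.training.append((field, label))


# Toy session with illustrative features and labels.
labeler = InteractiveLabeler(threshold=3.0)
first = labeler.suggest([0.0, 0.0])   # nothing learned yet
labeler.confirm([0.0, 0.0], "Head")
second = labeler.suggest([1.0, 0.0])  # cost 1.0 is under the threshold
third = labeler.suggest([5.0, 5.0])   # cost 10.0 is too far to auto-label
labeler.correct([2.0, 0.0], "Wife")   # wrong auto-label at cost 2.0
```

Keeping one such labeler per enumerator, rather than per document, would mirror the switch noted above, since each enumerator's handwriting is consistent across the pages they filled in.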
SLIDE 12
Training Set
SLIDE 13
Training Set
SLIDE 14
Training Set
SLIDE 15
We need volunteers to test Intelligent Indexing
To volunteer: email me (Robert Clawson) at: