Intelligent Indexing: A Semi-Automated, Trainable System for Field Labeling




SLIDE 1

Intelligent Indexing: A Semi-Automated, Trainable System for Field Labeling

Robert Clawson, Bill Barrett Brigham Young University

SLIDE 2

What is indexing?

SLIDE 3

Currently…

SLIDE 4

Our Research

• 5-10 times faster
• More accurate
• More enjoyable

SLIDE 5

Word Morphing

• https://www.youtube.com/watch?v=eBQjHgejchA

SLIDE 6

Question

• How can handwriting recognition be used to improve indexing?

SLIDE 7

Possible Methods

• Machine Learning Approach
• Split training/test set
• 1920 Utah Census
• About 50,000 fields per category
• Accuracy ~80%
  • Error rate too high
  • Would require lots of corrections after the fact
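To put "lots of corrections" in numbers, here is a rough calculation. It assumes the ~80% accuracy applies uniformly to all of the roughly 50,000 fields in a category; the slide does not say how the fields were divided between training and test sets, so the result is only an order-of-magnitude estimate. The function name is illustrative.

```python
def expected_corrections(n_fields, accuracy):
    """Estimate how many fields an indexer would still have to fix by hand."""
    return round(n_fields * (1.0 - accuracy))

# Assumed inputs taken from the slide: ~50,000 fields, ~80% accuracy.
corrections = expected_corrections(50_000, 0.80)
print(corrections)
```

Under these assumptions, about one field in five still needs a manual fix, which is on the order of ten thousand fields per category.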

SLIDE 8

Possible Methods

• Pre-clustering
• Cost matrix for each category/document

[Figure: "Relationship to head of household" column, with values 20.2, 15.9, 11.4, 8.0, 9.3, 3.2]
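A minimal sketch of how a cost matrix can drive pre-clustering. The slides do not say how the matching cost between two handwritten fields is computed, so this example assumes each field is already reduced to a small feature vector and uses Euclidean distance as a stand-in cost. The greedy grouping rule and the `max_cost` parameter are also assumptions for illustration, not the authors' method.

```python
import math

def cost(a, b):
    """Stand-in matching cost (assumption): Euclidean distance between feature vectors."""
    return math.dist(a, b)

def cost_matrix(fields):
    """Pairwise costs for all fields in one category (column) of one document."""
    return [[cost(a, b) for b in fields] for a in fields]

def pre_cluster(fields, max_cost):
    """Greedy grouping: a field joins the first cluster whose first member
    is within max_cost of it; otherwise it starts a new cluster."""
    costs = cost_matrix(fields)
    clusters = []  # each cluster is a list of field indices
    for i in range(len(fields)):
        for cluster in clusters:
            if costs[i][cluster[0]] <= max_cost:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy feature vectors standing in for word images in one column.
fields = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (0.0, 0.2)]
clusters = pre_cluster(fields, max_cost=1.0)
print(clusters)
```

Each resulting cluster could then be shown to the indexer as a group to label once, which is the idea behind pre-clustering.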

SLIDE 9

Possible Methods

• Pre-clustering

SLIDE 10

Possible Methods

• Pre-clustering
• Problems
  • Still makes frequent errors
  • Indexer has to scan up and down the page to look for mistakes
  • How to show clusters when there are many different words in the column?

SLIDE 11

Breakthrough

• Interactive training: learn as you go, correct as you go
• Indexer drives
• Training set can start empty, quick ramp up
• Still use cost matrix
  • Switch from per document to per enumerator
• Introduce threshold
  • All fields that match under a threshold are labeled
  • Learn the threshold
• Demo
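The interactive loop above can be sketched as follows. This is an illustration under stated assumptions, not the system shown in the demo: fields are represented by hypothetical feature vectors, Euclidean distance stands in for the matching cost, and the threshold is "learned" with a simple rule invented for this example (tighten it whenever a correction shows that an automatic label would have been wrong).

```python
import math

class InteractiveLabeler:
    """Toy interactive trainer: the training set starts empty and grows
    as the indexer labels or corrects fields."""

    def __init__(self, threshold=1.0):
        self.examples = []          # (features, label) pairs
        self.threshold = threshold  # fields matching under this cost are auto-labeled

    def nearest(self, features):
        """Return (label, cost) of the closest labeled example, if any."""
        if not self.examples:
            return None, math.inf
        feats, label = min(self.examples, key=lambda e: math.dist(e[0], features))
        return label, math.dist(feats, features)

    def suggest(self, features):
        """Auto-label only when the best match is under the threshold."""
        label, c = self.nearest(features)
        return label if c < self.threshold else None

    def teach(self, features, true_label):
        """Indexer supplies the correct label for a field."""
        label, c = self.nearest(features)
        if label is not None and label != true_label and c < self.threshold:
            # Assumed learning rule: this match would have been auto-labeled
            # incorrectly, so tighten the threshold to exclude it.
            self.threshold = c
        self.examples.append((features, true_label))

labeler = InteractiveLabeler(threshold=1.0)
first = labeler.suggest((0.0, 0.0))    # nothing learned yet
labeler.teach((0.0, 0.0), "Head")
second = labeler.suggest((0.1, 0.0))   # close to the "Head" example
labeler.teach((0.6, 0.0), "Wife")      # correction: tightens the threshold
third = labeler.suggest((0.0, 0.7))    # now too far from any example
print(first, second, third)
```

Keeping one labeler per enumerator rather than per document would let examples from one handwriting style carry over to that enumerator's other pages.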

SLIDE 12

Training Set

SLIDE 13

Training Set

SLIDE 14

Training Set

SLIDE 15

You can help

• We need volunteers to test Intelligent Indexing
• To volunteer: email me (Robert Clawson) at: intelligentindexing@gmail.com