LOC-DB Reference Extraction
DR. DR.-ING SHERAZ AHMED, SYED TAHSEEN RAZA RIZVI
LOC-DB Architecture
LOC-DB OCR Component
Types of Input files: Digital Born PDF, Scanned Documents, XML/HTML
XML File, Textual PDF, Scanned Document
Binarization: RGB Image to Binary Image (0-1)
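The binarization step named above turns an RGB page image into a 0-1 image. The slides do not say which thresholding method LOC-DB uses, so the following is only a minimal sketch that assumes a global Otsu threshold. The function names (to_gray, otsu_threshold, binarize) and the convention 0 = ink, 1 = background are illustrative assumptions, not taken from the slides.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples (0-255) to rows of integer gray levels."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in rgb_image]


def otsu_threshold(gray):
    """Return the gray level that maximizes the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]              # pixels at or below t (dark class)
        if w_bg == 0:
            continue
        w_fg = total - w_bg          # pixels above t (bright class)
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(rgb_image):
    """Assumed convention: 0 = ink (dark), 1 = background (bright)."""
    gray = to_gray(rgb_image)
    t = otsu_threshold(gray)
    return [[1 if value > t else 0 for value in row] for row in gray]
```

A global threshold like this is usually adequate for clean scans; unevenly lit pages would likely need an adaptive method instead.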
Single Column Documents, Double Column Documents
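The slides distinguish single column from double column pages but do not describe the classifier. As a rough illustration only, the sketch below uses a simple projection-profile heuristic (not necessarily what LOC-DB does): it looks for a run of ink-free pixel columns in the middle third of the page with ink on both sides. The function name classify_columns, the min_gutter parameter, and the convention 0 = ink, 1 = background are all assumptions.

```python
def classify_columns(binary, min_gutter=3):
    """Label a binary page (0 = ink, 1 = background) as 'single' or 'double'.

    Heuristic: a double column page shows a vertical gutter, i.e. at least
    `min_gutter` consecutive ink-free pixel columns inside the middle third
    of the page, with ink on both sides of it.
    """
    width = len(binary[0])
    # Vertical projection profile: number of ink pixels in each pixel column.
    ink = [sum(1 for row in binary if row[x] == 0) for x in range(width)]

    start = width // 3
    end = width - width // 3
    run = 0
    for x in range(start, end):
        run = run + 1 if ink[x] == 0 else 0
        if run >= min_gutter:
            gutter_left = x - run + 1
            has_left_text = any(ink[:gutter_left])
            has_right_text = any(ink[x + 1:])
            if has_left_text and has_right_text:
                return "double"
    return "single"
```

For example, a page whose pixel columns 8 to 11 are empty while both sides contain ink would be labeled "double" under this heuristic.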
OCR Result
Textual PDF, Extracted Text
Scanned Documents, Textual PDFs, Structured XML
Binarization, Image Classification, OCR, Text Extraction, Reference Segmentation, Pre-Processing
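The slides list the processing steps and the three input types but not how they connect. The sketch below shows one plausible routing, assuming that only scanned documents need binarization, image classification, and OCR, while textual PDFs and structured XML go through text extraction. The function name plan_steps, the input type keys, and the step order are hypothetical and not confirmed by the slides.

```python
# Hypothetical routing table: which steps each input type passes through.
# Step names come from the slides; the grouping and order are assumptions.
ROUTES = {
    "scanned": ["binarization", "image classification", "ocr",
                "reference segmentation"],
    "textual_pdf": ["text extraction", "reference segmentation"],
    "xml": ["text extraction", "reference segmentation"],
}


def plan_steps(input_type):
    """Return the assumed list of processing steps for an input type."""
    try:
        # Return a copy so callers cannot mutate the routing table.
        return list(ROUTES[input_type])
    except KeyError:
        raise ValueError(f"unknown input type: {input_type!r}") from None
```

Keeping the routing in a table rather than in branching code makes it easy to adjust once the actual architecture is known.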
ParsCit Output, DeepBibX Output
Figure: Extraction Comparison. Total References and Total Detections for ParsCit and the FCN based approach (axis ticks from 1000 to 6000).