Semantic PDF Segmentation for Legacy Documents in Technical Documentation
Jan Oevermann
jan.oevermann@dfki.de
SEMANTiCS 2018, Vienna, 13.09.18
Technical Documentation
Most common: PDF documents
Digital paper, archival &
13.09.18 Jan Oevermann (DFKI), SEMANTiCS 2018, Vienna 2
[Figure labels: Task, Desc, Task]
Search
Faceted search
Information requests expressed with semantic concepts that can be used as facets
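Faceted search over segmented documents can be illustrated with a minimal Python sketch, assuming each segment already carries semantic metadata (for example, a topic type and a component). The segment records, the facet names `topic_type` and `component`, and both helper functions are hypothetical; the sketch keeps only segments matching every selected facet value and counts the values of another facet, as a faceted search interface typically displays them.

```python
from collections import Counter

# Hypothetical segments: each PDF segment carries semantic metadata per facet.
SEGMENTS = [
    {"id": "s1", "topic_type": "Task", "component": "Pump"},
    {"id": "s2", "topic_type": "Description", "component": "Pump"},
    {"id": "s3", "topic_type": "Task", "component": "Valve"},
]


def faceted_search(segments, selected):
    """Return the segments whose metadata matches every selected facet value."""
    return [
        s for s in segments
        if all(s.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(segments, facet):
    """Count how many segments carry each value of a facet (shown next to each option)."""
    return Counter(s[facet] for s in segments if facet in s)


hits = faceted_search(SEGMENTS, {"topic_type": "Task"})
print([s["id"] for s in hits])          # → ['s1', 's3']
print(facet_counts(hits, "component"))  # → Counter({'Pump': 1, 'Valve': 1})
```

Selecting a further facet value (for example `{"topic_type": "Task", "component": "Valve"}`) narrows the result set, which is the drill-down behaviour facets are meant to provide.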
Classifier (vector space model, VSM)
Learning phase: training data → feature extraction (bag of n-grams) → weighting (TF-ICF-CF)
Classification: new (unclassified) data → feature extraction → weighting → prediction (cosine similarity / k-nearest neighbour)
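The pipeline above can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's implementation: features are word unigrams and bigrams; weighting is plain TF-ICF, i.e. term frequency times log(|C| / cf), where cf is the number of classes whose training data contain the term, because the additional CF factor of TF-ICF-CF is not spelled out on the slide; and prediction is a similarity-weighted vote over the k nearest training segments by cosine similarity. The class `TfIcfKnn`, the helper `ngrams` and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Bag of word n-grams (n = 1..n_max) from lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    return Counter(
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    )


class TfIcfKnn:
    """VSM classifier sketch: TF-ICF weighting + cosine-similarity kNN.

    Assumption: plain TF-ICF; the extra CF factor named on the slide is omitted.
    """

    def __init__(self, k=3):
        self.k = k

    def fit(self, texts, labels):
        bags = [ngrams(t) for t in texts]
        classes_per_term = defaultdict(set)
        for bag, label in zip(bags, labels):
            for term in bag:
                classes_per_term[term].add(label)
        n_classes = len(set(labels))
        # Inverse class frequency: a term found in every class gets weight 0.
        self.icf = {t: math.log(n_classes / len(cs)) for t, cs in classes_per_term.items()}
        self.vectors = [self._weight(bag) for bag in bags]
        self.labels = list(labels)
        return self

    def _weight(self, bag):
        # Terms never seen during training have no ICF value and are dropped.
        return {t: tf * self.icf[t] for t, tf in bag.items() if t in self.icf}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    def predict(self, text):
        query = self._weight(ngrams(text))
        scored = sorted(
            ((self._cosine(query, vec), label) for vec, label in zip(self.vectors, self.labels)),
            reverse=True,
        )
        # Similarity-weighted vote among the k nearest training segments.
        votes = defaultdict(float)
        for sim, label in scored[: self.k]:
            votes[label] += sim
        return max(votes, key=votes.get)


# Hypothetical toy training data: segments labelled with topic types.
train_texts = [
    "open the valve and remove the cover",
    "remove the filter and clean it",
    "the pump consists of a motor and a housing",
    "the housing is made of stainless steel",
]
train_labels = ["Task", "Task", "Description", "Description"]

clf = TfIcfKnn(k=3).fit(train_texts, train_labels)
print(clf.predict("remove the cover and clean the filter"))  # → Task
print(clf.predict("the motor housing is made of steel"))     # → Description
```

Words such as "the" and "and" occur in both classes and therefore receive an ICF weight of zero, so only class-specific n-grams drive the similarity; this is the motivation for class-based rather than document-based inverse frequency.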
Range finding
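The slide gives only the heading. One plausible reading, stated here as an assumption rather than as the author's method, is that range finding merges consecutive pages that received the same predicted class into page ranges. A minimal sketch with the hypothetical function `find_ranges`, which takes one predicted label per page in reading order:

```python
from itertools import groupby


def find_ranges(page_labels, first_page=1):
    """Merge runs of consecutive pages with the same predicted class.

    Returns (class, start_page, end_page) tuples with inclusive page numbers.
    """
    ranges = []
    page = first_page
    for label, run in groupby(page_labels):
        length = sum(1 for _ in run)
        ranges.append((label, page, page + length - 1))
        page += length
    return ranges


print(find_ranges(["Task", "Task", "Description", "Task"]))
# → [('Task', 1, 2), ('Description', 3, 3), ('Task', 4, 4)]
```

`itertools.groupby` groups only adjacent equal items, which is exactly what is needed here: the second "Task" run starts a new range instead of being merged with the first.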
https://iirds.org/
Metadata generation
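Building on the iiRDS link, a hypothetical sketch of the metadata step could turn each classified page range into one record that points into the PDF. The record layout and the function `ranges_to_metadata` are invented for illustration and are not the iiRDS vocabulary (iiRDS metadata is expressed in RDF); the page references use the `#page=N` fragment identifier defined for PDF in RFC 3778. The input ranges are assumed to be `(class, start_page, end_page)` tuples.

```python
def ranges_to_metadata(pdf_name, ranges):
    """Turn (class, start_page, end_page) ranges into simple metadata records.

    Hypothetical record layout for illustration only; NOT the iiRDS vocabulary.
    Page references use the PDF fragment identifier "#page=N" (RFC 3778).
    """
    records = []
    for label, start, end in ranges:
        records.append({
            "document": pdf_name,
            "class": label,
            "start": f"{pdf_name}#page={start}",
            "end": f"{pdf_name}#page={end}",
        })
    return records


for record in ranges_to_metadata("manual.pdf", [("Task", 1, 2), ("Description", 3, 3)]):
    print(record)
```

In a real pipeline these records would be serialized into the target metadata format rather than printed.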
Application
Results