Document Management using Protégé, Henrik Eriksson, Linköping University: PowerPoint PPT Presentation


SLIDE 1

Document Management using Protégé

Henrik Eriksson Linköping University

SLIDE 2

Approach: Semantic Documents

  • Combine documents with knowledge representation
  • Like semantic web, but for “real” documents
  • Semantic Documents
  • Printable electronic documents
  • Knowledge representation: Ontologies, workflows, and rules
  • An integrated format that keeps textual and computer-based guidelines together

  • Based on wide-spread document formats
  • Currently supported format: PDF
SLIDE 3

Adding Additional Information to the PDF Structure

  • Ontologies inside PDF documents
  • OWL-based metadata

[Diagram: PDF object structure. Node labels: Document, Root/Catalog, Pages, Outlines, Metadata, Contents, XMP, OWLMetadata, OWL. Callout: added OWL statements.]
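
The idea of adding an OWL entry next to the existing XMP metadata can be illustrated with a minimal sketch. It does not read or write real PDF files: a plain dictionary stands in for the document catalog, and the `http://example.org/docs#` namespace, the class name, the individual name, and both function names are assumptions made for illustration only. The entry name `OWLMetadata` is taken from the slide.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"
EX_NS = "http://example.org/docs#"  # hypothetical namespace for this sketch


def build_owl_metadata(class_name: str, individual: str) -> str:
    """Serialize one OWL class and one typed individual as RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("owl", OWL_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    cls = ET.SubElement(root, f"{{{OWL_NS}}}Class")
    cls.set(f"{{{RDF_NS}}}about", EX_NS + class_name)
    ind = ET.SubElement(root, f"{{{RDF_NS}}}Description")
    ind.set(f"{{{RDF_NS}}}about", EX_NS + individual)
    typ = ET.SubElement(ind, f"{{{RDF_NS}}}type")
    typ.set(f"{{{RDF_NS}}}resource", EX_NS + class_name)
    return ET.tostring(root, encoding="unicode")


def add_owl_to_catalog(catalog: dict, owl_xml: str) -> dict:
    """Return a copy of the catalog with an added OWLMetadata entry.

    The existing entries (including the XMP Metadata) are left untouched.
    """
    updated = dict(catalog)
    updated["OWLMetadata"] = owl_xml
    return updated


# Stand-in for a PDF catalog; the values are placeholders, not real PDF objects.
catalog = {"Pages": "pages-object", "Outlines": "outlines-object",
           "Metadata": "xmp-packet"}
owl_xml = build_owl_metadata("StatisticsReport", "report_2006_01")
annotated = add_owl_to_catalog(catalog, owl_xml)
```

The original catalog entries stay as they were, which matches the slide's picture of OWL statements being added to the structure rather than replacing the XMP metadata.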

SLIDE 4

PDFTab: Annotation Tool for Protégé

[Figure labels: Protégé; Adobe Acrobat (PDF); annotation tool]

SLIDE 5

Tool Architecture

[Diagram labels: Protégé; Adobe Acrobat; Protégé extensions; PDFTab extension; Acrobat extension]

SLIDE 6

Corresponding Ontology

SLIDE 7

Document Mark Up

SLIDE 8

Annotation Process

SLIDE 9

Document-centric Annotation Framework

SLIDE 10

Ontology Structure

  • Linking documents and ontologies
  • Standard ontology structure
  • Annotation ontology
      • The annotation types
  • Document ontology
      • The key document parts
  • Domain ontology
      • The “regular” ontology
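
The three-part structure can be sketched as follows, assuming that an annotation links one document part to one domain concept. All class names, field names, file names, and sample values are hypothetical; the slides do not show the actual ontology classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentPart:
    """Document ontology: a key part of a document."""
    document: str
    page: int
    label: str


@dataclass(frozen=True)
class DomainConcept:
    """Domain ontology: a concept from the "regular" ontology."""
    name: str


@dataclass(frozen=True)
class Annotation:
    """Annotation ontology: a typed link from a document part to a concept."""
    kind: str
    part: DocumentPart
    concept: DomainConcept


def concepts_for_document(annotations, document):
    """Return the sorted names of all domain concepts annotated in a document."""
    return sorted({a.concept.name for a in annotations
                   if a.part.document == document})


annotations = [
    Annotation("table", DocumentPart("yearbook_2006.pdf", 12, "Table 1"),
               DomainConcept("Population")),
    Annotation("heading", DocumentPart("yearbook_2006.pdf", 3, "Chapter 2"),
               DomainConcept("Labour market")),
    Annotation("table", DocumentPart("report_a.pdf", 1, "Table 1"),
               DomainConcept("Population")),
]
print(concepts_for_document(annotations, "yearbook_2006.pdf"))
```

Keeping the three kinds of objects separate means the domain ontology carries no document-specific detail, and the same concept can be linked from parts of several documents.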
SLIDE 11

Supporting multiple documents

  • Architecture with multiple ontologies and ontology modules

SLIDE 12

Case Study: Document Repository in Protégé

  • Document data set
      • All statistics reports (PDF) published by Statistics Sweden in 2006
      • Five volumes of Statistical Yearbook (2002–2006)
  • Method
      • Document acquisition
      • Ontology development
      • Automated annotation (through annotator program)
  • Number of automatically-annotated documents: 302
  • Total number of annotations for these documents: 17,470
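
The two figures above imply an average annotation density per document. The short calculation below is derived only from the numbers on the slide; the variable names are illustrative.

```python
# Figures reported on the slide.
annotated_documents = 302
total_annotations = 17_470

# Mean number of annotations per automatically annotated document.
average_per_document = total_annotations / annotated_documents
print(f"{average_per_document:.1f} annotations per document on average")
```

This is a mean over all 302 documents; the slides do not report how the annotations are distributed across individual documents.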
SLIDE 13

Statistics Reports Loaded in Protégé

SLIDE 14

Discussion

  • Scalability issues
      • Beyond hundreds of documents
      • Too many ontologies for the current Protégé implementation
      • How can we scale to thousands or millions of documents?
  • Vision: Repository storage backend
      • Possibly a backend based on a document-repository database (e.g., DSpace)
      • Normal document services and semantic services
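
One way to picture a database-backed repository is a minimal sketch in which documents and their annotations live in indexed tables instead of in-memory ontologies. It uses an in-memory SQLite database as a stand-in; it is not DSpace and not a Protégé backend, and the schema, function names, and sample titles are assumptions for illustration.

```python
import sqlite3


def open_repository() -> sqlite3.Connection:
    """Create an in-memory repository with documents and annotations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE documents (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE annotations (
            document_id INTEGER NOT NULL REFERENCES documents(id),
            concept TEXT NOT NULL
        );
        CREATE INDEX idx_annotations_concept ON annotations(concept);
    """)
    return conn


def add_document(conn, title, concepts):
    """Store a document (a normal document service) with its annotations."""
    cur = conn.execute("INSERT INTO documents (title) VALUES (?)", (title,))
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO annotations (document_id, concept) VALUES (?, ?)",
        [(doc_id, concept) for concept in concepts],
    )
    return doc_id


def find_by_concept(conn, concept):
    """Return titles of documents annotated with a concept (a semantic service)."""
    rows = conn.execute(
        "SELECT DISTINCT d.title FROM documents d "
        "JOIN annotations a ON a.document_id = d.id "
        "WHERE a.concept = ? ORDER BY d.title",
        (concept,),
    )
    return [row[0] for row in rows]


conn = open_repository()
add_document(conn, "Statistical Yearbook 2006", ["Population", "Labour market"])
add_document(conn, "Report A", ["Population"])
print(find_by_concept(conn, "Population"))
```

The index on `concept` is what lets lookups stay fast as the number of documents grows, which is the scaling concern raised on this slide.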
SLIDE 15

Summary

  • Semantic Documents
  • Protégé — a platform for document management
  • Ontologies as model document repositories
  • Furthermore, ontologies can act as document repositories
  • However, large document sets will require a custom-tailored database backend