Support for Semantic Documents in Protégé, Henrik Eriksson, Linköping University

SLIDE 1

Support for Semantic Documents in Protégé

Henrik Eriksson Linköping University

SLIDE 2

2005-07-19

Semantic Documents

  • Combining documents with knowledge representation
  • Like semantic web, but for “real” documents
  • Problem: Large amounts of information are available electronically, but it is
      • difficult to find the right information when the search query is complex, and
      • difficult to navigate content-rich information.
  • Goal
      • Semantic description of document content (i.e., a meta-model for documents)
      • Support for systematic authoring of complex electronic documents
      • Adding support for PDF to Protégé – a PDF tab for Protégé

SLIDE 3

One Document—Many Applications

[Diagram labels: semantic document (SAGE Diabetes Guideline; metadata: guideline logic); applications: printing, on-screen viewing, workflow-based decision support, reasoning, consistency check, semantic search]

SLIDE 4

Semantic Documents

  • Knowledge representation
      • Semantic web: OWL
      • Ontologies
  • Document models
      • Adobe’s Portable Document Format (PDF)
      • Extensible Metadata Platform (XMP)
      • MS Word, RTF (?)
  • Functions
      • Semantic search based on metadata
      • Reasoning, inference

[Diagram labels: statistics documents (PDF); report publication database; XMP markup (three times); document retrieval; functions; semantic search; reasoning engine]
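
As an illustration of semantic search based on metadata, the sketch below reads Dublin Core fields from an XMP packet and filters documents by subject keyword. It is a minimal sketch using only the Python standard library; the sample packet, the file names, and the helper names `read_xmp` and `search` are made up for this example and are not part of PDFTab.

```python
import xml.etree.ElementTree as ET

# Namespaces used by XMP packets that carry Dublin Core properties.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical XMP packet; the title and subjects are invented sample data.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">Sample statistics report</rdf:li></rdf:Alt>
      </dc:title>
      <dc:subject>
        <rdf:Bag><rdf:li>population</rdf:li><rdf:li>migration</rdf:li></rdf:Bag>
      </dc:subject>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_xmp(packet: str) -> dict:
    """Extract the title and subject keywords from an XMP packet."""
    root = ET.fromstring(packet)
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    subjects = root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)
    return {
        "title": title.text if title is not None else None,
        "subjects": [s.text for s in subjects],
    }


def search(documents: dict, keyword: str) -> list:
    """Return the names of documents whose XMP subjects include keyword."""
    return sorted(
        name
        for name, packet in documents.items()
        if keyword in read_xmp(packet)["subjects"]
    )


if __name__ == "__main__":
    library = {"report-a.pdf": SAMPLE_XMP}
    print(read_xmp(SAMPLE_XMP))
    print(search(library, "migration"))
```

Keyword matching on `dc:subject` is the simplest case; the reasoning and inference functions on this slide would need the OWL statements rather than flat keywords.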

SLIDE 5

The “Secrets” of the Portable Document Format (PDF)

  • Open and documented format
  • PDF files contain something like a file system
      • Indexing for fast random access
      • Like the .doc format of MS Word
  • Extensible file layout
      • Custom additions
      • Different objects and streams with support for text, binary data, compression, and encryption

[Diagram labels: Document (PDF); pages; metadata; objects; streams; index (xref)]
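
The index (xref) is what gives PDF its fast random access: a reader starts at the end of the file, where the `startxref` keyword records the byte offset of the cross-reference table. The sketch below locates that offset in a small hand-built byte string. It assumes a classic (uncompressed) cross-reference table; the helper name `find_xref_offset` and the toy file are illustrative only.

```python
import re


def find_xref_offset(pdf_bytes: bytes) -> int:
    """Return the byte offset stored after the last 'startxref' keyword."""
    tail = pdf_bytes[-1024:]  # the pointer sits near the end of the file
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref entry found")
    return int(matches[-1])


def build_toy_pdf() -> bytes:
    """Assemble a tiny PDF-like byte string with one object and an xref table."""
    body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    xref_position = len(body)  # the xref table starts right after the body
    xref = (
        b"xref\n0 2\n"
        b"0000000000 65535 f \n"
        b"0000000009 00000 n \n"  # object 1 starts at byte 9, after the header
        b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    )
    footer = b"startxref\n" + str(xref_position).encode() + b"\n%%EOF\n"
    return body + xref + footer


if __name__ == "__main__":
    pdf = build_toy_pdf()
    offset = find_xref_offset(pdf)
    print(offset, pdf[offset:offset + 4])
```

Jumping straight to the table and from there to individual objects is what makes the format behave like a small file system.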

SLIDE 6

Internal PDF Structure

[Diagram labels: object tree with the nodes Document, Root/Catalog, Pages, Outlines, Metadata, Contents, XMP]

SLIDE 7

Adding Additional Information to the PDF Structure

  • OWL-based metadata

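[Diagram labels: Document (PDF) with XMP, pages, and a knowledge base (OWL); object tree with the nodes Document, Root/Catalog, Pages, Outlines, Metadata, Contents, XMP, plus the added nodes OWLMetadata and OWL, marked "Added OWL statements"]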

SLIDE 8

Annotations

  • Relates document text to OWL individuals

[Diagram labels: object tree with the nodes Document, Root/Catalog, Pages, Outlines, Metadata, Contents, XMP, OWLMetadata, OWL, Annotations]
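
The relationship between the document parts can be pictured as a small data model: XMP metadata, embedded OWL statements, and annotations that tie a text passage to an OWL individual. The sketch below is a plain-Python stand-in and does not read or write PDF files; the class names, field names, and the example IRI are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """Links a passage of document text to an OWL individual."""
    page: int
    text: str
    individual: str  # IRI of the OWL individual (hypothetical values below)


@dataclass
class SemanticDocument:
    """Stand-in for a PDF carrying XMP, OWL statements, and annotations."""
    xmp: str
    owl: str
    annotations: list = field(default_factory=list)

    def annotate(self, page: int, text: str, individual: str) -> Annotation:
        """Record that the given text on the given page denotes the individual."""
        annotation = Annotation(page, text, individual)
        self.annotations.append(annotation)
        return annotation

    def passages_for(self, individual: str) -> list:
        """Return (page, text) pairs annotated with the given individual."""
        return [
            (a.page, a.text)
            for a in self.annotations
            if a.individual == individual
        ]


if __name__ == "__main__":
    doc = SemanticDocument(xmp="<x:xmpmeta/>", owl="<rdf:RDF/>")
    doc.annotate(12, "Number of divorces", "http://example.org/stats#Divorces")
    doc.annotate(40, "Divorces by cohort", "http://example.org/stats#Divorces")
    print(doc.passages_for("http://example.org/stats#Divorces"))
```

Keeping the annotation as a separate record, rather than editing the page text, mirrors the slide: the page contents stay untouched and the links live in their own branch of the object tree.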

SLIDE 9

A Protégé Extension for PDF

  • Adobe Acrobat runs inside a Protégé tab

[Screenshot labels: Protégé; PDFTab; Acrobat extension; Adobe Acrobat; list of documents; control buttons]

SLIDE 10

PDFTab: Annotation Tool for Protégé

[Screenshot labels: Protégé; Adobe Acrobat (PDF); annotation tool]

SLIDE 11

Corresponding Ontology

SLIDE 12

Mark up of Table Headings

SLIDE 13

A Semantic Document Architecture for Knowledge Management

[Diagram labels: word processor; Protégé/PDFTab; user-interface front-end; inference or search engine; domain ontology; document ontology; semantic documents (PDF)]

SLIDE 14

Document Production Process

  • Basic idea: Tool support for the entire chain
  • Knowledge-management approach
  • Metadata is kept throughout the process
  • Support for annotation (tagging) based on data sources, including metadata

[Diagram labels: knowledge source; metadata; data; authoring; analysis; editing; semantic mark-up; publication]

SLIDE 15

Application Areas

  • Statistics
      • Annotation of statistics reports
      • Highly structured documents with tables and diagrams
      • Report series (e.g., quarterly and annual reports)
      • Collaboration with Statistics Sweden (SCB)
  • Clinical guidelines
      • Generation of documentation from SAGE knowledge bases
      • Highly structured documents with graphs and cross links
      • Target: Guideline documents in PDF complete with annotations
      • Collaboration with Samson Tu, Stanford University
  • Document search
      • Searching text and metadata
      • Different levels of search
      • Test case: Statistics reports

SLIDE 16

Statistics Reports as Semantic Documents

  • Statistics Reports
      • Statistical Yearbook of Sweden (784 pages)
      • Manual and (semi-)automated annotation
      • Statistical metadata available
  • Development of relevant ontologies
      • Annotation ontology
      • Document ontology
      • Macro data ontology
      • Domain ontology
      • In general, an ontology of the entire country!
  • Interesting idea: Use annotation of the previous document edition as the starting point
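
One way to picture reusing the previous edition's annotations: match headings in the new edition against headings already annotated last year and carry the ontology terms over as suggestions, leaving the rest for manual mark-up. This is a minimal sketch under the assumption that headings stay textually stable between editions; the function name, the sample headings, and the ontology term names are hypothetical.

```python
def normalize(heading: str) -> str:
    """Lower-case a heading and collapse whitespace so small edits still match."""
    return " ".join(heading.lower().split())


def carry_over(previous: dict, new_headings: list) -> tuple:
    """Suggest annotations for a new edition from the previous edition.

    previous maps heading text to an ontology term. Returns a pair:
    (suggested annotations for matched headings, headings left unmatched).
    """
    index = {normalize(h): term for h, term in previous.items()}
    suggested, unmatched = {}, []
    for heading in new_headings:
        term = index.get(normalize(heading))
        if term is None:
            unmatched.append(heading)
        else:
            suggested[heading] = term
    return suggested, unmatched


if __name__ == "__main__":
    last_year = {"Population by county": "stats:PopulationByCounty"}
    this_year = ["Population  by County", "Households by size"]
    print(carry_over(last_year, this_year))
```

Exact matching after normalization is deliberately conservative; anything unmatched falls back to manual or semi-automated annotation.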

SLIDE 17

Mark-up of Statistical Yearbook

SLIDE 18

Statistics Ontologies in Protégé

SLIDE 19

Document and Domain Modeling

[Diagram labels: Document; TextAnnotation; Table; NumberOf; Person; PDF annotation; Annotation ontology; Document ontology; Domain ontology; Acrobat; Protégé]

SLIDE 20

Questions to the OWL Experts…

  • 1. How would you model things like:
      • “Asylum applicants, rejections at border and persons granted residence permits as refugees or similar, by basis of residence permit,” or
      • “Number of divorces in each marriage cohort by number of years since marriage”?
  • 2. How would you then search for this information?
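
To make the two questions concrete, the sketch below shows one possible decomposition of such table headings into the things being counted and the dimensions they are broken down by, followed by a simple search over that structure. It is not an OWL model and not an answer from the presentation; the class, its fields, and the way the two headings are split are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableHeading:
    """A table heading split into counted subjects and breakdown dimensions."""
    subjects: frozenset
    dimensions: frozenset


# Hypothetical decompositions of the two headings quoted on the slide.
ASYLUM = TableHeading(
    subjects=frozenset({
        "asylum applicants",
        "rejections at border",
        "persons granted residence permits as refugees or similar",
    }),
    dimensions=frozenset({"basis of residence permit"}),
)
DIVORCES = TableHeading(
    subjects=frozenset({"divorces"}),
    dimensions=frozenset({"marriage cohort", "years since marriage"}),
)


def find(headings: list, subject: str, dimensions: tuple = ()) -> list:
    """Return headings that count subject and include all given dimensions."""
    wanted = set(dimensions)
    return [
        h for h in headings
        if subject in h.subjects and wanted <= h.dimensions
    ]


if __name__ == "__main__":
    tables = [ASYLUM, DIVORCES]
    print(find(tables, "divorces", ("marriage cohort",)))
```

A flat structure like this handles lookups by subject and dimension, but it says nothing about how subjects relate to each other, which is presumably where an ontology would earn its keep.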

SLIDE 21

Clinical Guidelines as Semantic Documents

  • Experiments with SAGE clinical guideline knowledge bases in collaboration with Samson Tu
  • SAGE uses knowledge bases to store authoritative guidelines
  • Uses of the knowledge bases
      • Inference
      • Workflow engines
      • Generation of guideline documentation (XML, HTML, and PDF)
  • Goal: Semantic document with the knowledge base
      • PDF file with annotations and embedded SAGE knowledge base

SLIDE 22

Document Generation from XML

  • Generation of guideline documentation in PDF

[Diagram labels: Guideline KB (e.g., SAGE); Protégé; converter; XML; metadata; XSLT (twice); HTML; XSL-FO; FOP; guideline document (PDF)]
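
The XML-to-XSL-FO step of this pipeline can be sketched without an XSLT processor. The code below builds a minimal XSL-FO document from a small guideline XML using the Python standard library, standing in for the XSLT stylesheet on the slide. The input element names (`guideline`, `recommendation`) and the sample text are invented for this example; the real SAGE export format is not shown in the slides.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)

# Hypothetical guideline XML; element names and text are made up.
GUIDELINE_XML = """<guideline title="Example guideline">
  <recommendation>First recommendation text.</recommendation>
  <recommendation>Second recommendation text.</recommendation>
</guideline>"""


def fo(tag: str) -> str:
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def guideline_to_fo(xml_text: str) -> str:
    """Turn the guideline XML into an XSL-FO document, one block per item."""
    source = ET.fromstring(xml_text)
    root = ET.Element(fo("root"))

    # Page layout: a single A4 page master with a body region.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(
        masters,
        fo("simple-page-master"),
        {"master-name": "A4", "page-height": "29.7cm", "page-width": "21cm"},
    )
    ET.SubElement(page, fo("region-body"))

    # Content: a title block followed by one block per recommendation.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    title = ET.SubElement(flow, fo("block"), {"font-size": "16pt", "font-weight": "bold"})
    title.text = source.get("title")
    for recommendation in source.findall("recommendation"):
        block = ET.SubElement(flow, fo("block"), {"space-before": "6pt"})
        block.text = recommendation.text

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(guideline_to_fo(GUIDELINE_XML))
```

The resulting file would then go to a formatter such as Apache FOP (for example `fop -fo guideline.fo -pdf guideline.pdf`) to produce the PDF; embedding the metadata and knowledge base is a separate step not covered by this sketch.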

SLIDE 23

The Resulting Guideline Document

SLIDE 24

Summary

  • Semantic documents
      • An approach to combining printable documents with ontologies and knowledge bases
      • Combined documentation (human-readable) and reasoning (machine-readable)
      • One document with several applications
  • Tool support: PDFTab
      • Creation of semantic documents
      • Support for document annotation
      • Editing of ontologies and knowledge bases stored in PDF files