
Support for Semantic Documents in Protégé (Henrik Eriksson, Linköping University)



  1. Support for Semantic Documents in Protégé Henrik Eriksson Linköping University

  2. Semantic Documents
     • Combining documents with knowledge representation
       - Like the semantic web, but for “real” documents
     • Problem: Large amounts of information are available electronically, but it is
       - difficult to find the right information when the search query is complex, and
       - difficult to navigate content-rich information.
     • Goal
       - Semantic description of document content (i.e., a meta-model for documents)
       - Support for systematic authoring of complex electronic documents
       - Adding support for PDF to Protégé: a PDF tab for Protégé
     2005-07-19

  3. One Document, Many Applications
     [Diagram: a semantic document (SAGE Diabetes Guideline; metadata: guideline logic) serving several applications: on-screen viewing, printing, workflow-based decision support, reasoning, consistency check, and semantic search.]

  4. Semantic Documents
     • Knowledge representation
       - Semantic web: OWL
       - Ontologies
     • Document models
       - Adobe’s Portable Document Format (PDF)
       - Extensible Metadata Platform (XMP)
       - MS Word, RTF (?)
     • Functions
       - Semantic search based on metadata
       - Reasoning, inference
     [Diagram labels: statistics documents (PDF) with XMP markup, document retrieval, semantic search, reasoning engine, report publication database, functions.]

  5. The “Secrets” of the Portable Document Format (PDF)
     • Open and documented format
     • PDF files contain something like a file system
       - Indexing for fast random access
       - Streams
     • Like the .doc format of MS Word
     • Extensible file layout
       - Custom additions
     • Different objects and streams with support for text, binary data, compression, and encryption
     [Diagram labels: Document (PDF), objects, metadata, pages, index (xref).]
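
The file-system analogy on slide 5 can be illustrated with a minimal sketch, assuming the classic uncompressed cross-reference table with a single subsection: the end of the file records where the index (xref) starts, and each index entry gives the byte offset of one object. The hand-built file and the helper names `build_tiny_pdf` and `read_xref_offsets` are illustrative assumptions, not part of PDFTab.

```python
import re


def build_tiny_pdf() -> bytes:
    """Assemble a minimal PDF-like byte string with a classic xref table."""
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
    ]
    data = b"%PDF-1.4\n"
    offsets = []
    for obj in objects:
        offsets.append(len(data))  # byte offset where this object starts
        data += obj
    xref_pos = len(data)
    data += b"xref\n0 %d\n" % (len(objects) + 1)
    data += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for off in offsets:
        data += b"%010d 00000 n \n" % off
    data += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    data += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return data


def read_xref_offsets(data: bytes) -> dict:
    """Return {object number: byte offset} for in-use objects.

    Assumption: one xref subsection, no cross-reference streams.
    """
    match = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", data)
    if match is None:
        raise ValueError("no startxref found")
    pos = int(match.group(1))
    lines = data[pos:].split(b"\n")
    if lines[0].strip() != b"xref":
        raise ValueError("xref table not at the recorded offset")
    first, count = (int(n) for n in lines[1].split())
    offsets = {}
    for i in range(count):
        offset, _generation, kind = lines[2 + i].split()
        if kind == b"n":  # "n" marks an object that is in use
            offsets[first + i] = int(offset)
    return offsets


pdf = build_tiny_pdf()
offsets = read_xref_offsets(pdf)
# Random access: jump straight to object 2 without scanning the file.
start = offsets[2]
print(pdf[start:start + 7])  # b'2 0 obj'
```

Real PDF files may also use incremental updates and compressed cross-reference streams, which this sketch does not handle.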

  6. Internal PDF Structure
     [Diagram labels: Document, Root/Catalog, Outlines, Pages, Contents, Metadata (XMP).]

  7. Adding Additional Information to the PDF Structure
     • OWL-based metadata
     [Diagram labels: Document (PDF) with pages, XMP, and a knowledge base (OWL). Structure: Document, Root/Catalog, Pages, Contents, Outlines, Metadata (XMP), OWLMetadata (OWL, the added OWL statements).]
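
The structure on slide 7 can be sketched as a plain data model, assuming the document catalog behaves like a dictionary whose entries point to other objects. The `OWLMetadata` key follows the label in the slide's diagram; how PDFTab actually encodes the entry is not stated in the slides, and the `Stream` class and helper names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    """A PDF-style stream: a dictionary of attributes plus raw bytes."""
    attributes: dict
    data: bytes


def make_catalog() -> dict:
    """Build a toy catalog with the standard entries shown on the slide."""
    return {
        "Type": "Catalog",
        "Pages": {"Type": "Pages", "Kids": [], "Count": 0},
        "Outlines": {"Type": "Outlines", "Count": 0},
        # XMP metadata is stored as an XML stream referenced from the catalog.
        "Metadata": Stream({"Type": "Metadata", "Subtype": "XML"}, b"<x:xmpmeta/>"),
    }


def attach_owl(catalog: dict, owl_xml: str) -> dict:
    """Return a copy of the catalog with one extra entry holding OWL statements."""
    extended = dict(catalog)
    # Assumption: the OWL knowledge base is serialized as XML into its own stream.
    extended["OWLMetadata"] = Stream({"Subtype": "XML"}, owl_xml.encode("utf-8"))
    return extended


catalog = make_catalog()
extended = attach_owl(catalog, "<owl:Ontology/>")
print(sorted(extended))
```

The existing entries are left untouched, which is the point of the design: the pages and XMP metadata remain usable as before, while tools that know about the extra entry can read the knowledge base.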

  8. Annotations
     • Relates document text to OWL individuals
     [Diagram labels: Document, Root/Catalog, Pages, Contents, Annotations, Outlines, Metadata (XMP), OWLMetadata (OWL).]
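
The link between document text and OWL individuals on slide 8 can be sketched as a small record type. This is a minimal illustration under stated assumptions: the `TextAnnotation` fields, the helper functions, and the `example.org` IRIs are hypothetical and do not reproduce PDFTab's actual annotation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextAnnotation:
    page: int        # page on which the marked text appears
    text: str        # the marked-up text itself
    individual: str  # IRI of the OWL individual the text refers to (hypothetical)


def individuals_on_page(annotations, page):
    """List the individuals mentioned on one page, without duplicates."""
    return sorted({a.individual for a in annotations if a.page == page})


def occurrences_of(annotations, individual):
    """Find every place in the document that refers to one individual."""
    return [(a.page, a.text) for a in annotations if a.individual == individual]


annotations = [
    TextAnnotation(1, "Number of divorces", "http://example.org/stats#Divorces"),
    TextAnnotation(1, "marriage cohort", "http://example.org/stats#MarriageCohort"),
    TextAnnotation(4, "divorces", "http://example.org/stats#Divorces"),
]

print(individuals_on_page(annotations, 1))
print(occurrences_of(annotations, "http://example.org/stats#Divorces"))
```

Because each annotation points to an individual rather than to a keyword, two differently worded passages can be recognized as being about the same thing.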

  9. A Protégé Extension for PDF
     • Adobe Acrobat runs inside a Protégé tab
     [Diagram labels: Protégé, PDFTab (control buttons, list of documents), Adobe Acrobat, Acrobat extension.]

  10. PDFTab: Annotation Tool for Protégé
     [Figure labels: annotation tool, Protégé, Adobe Acrobat (PDF).]

  11. Corresponding Ontology

  12. Mark-up of Table Headings

  13. A Semantic Document Architecture for Knowledge Management
     [Diagram labels: domain ontology, document ontology, Protégé/PDFTab, word processor, semantic documents (PDF), inference or search engine, user-interface front-end.]

  14. Document Production Process
     • Basic idea: Tool support for the entire chain
       - Knowledge-management approach
       - Metadata is kept throughout the process
       - Support for annotation (tagging) based on data sources, including metadata
     [Diagram labels: knowledge source, analysis, authoring, editing, publication; data, metadata, semantic mark-up.]

  15. Application Areas
     • Statistics
       - Annotation of statistics reports
       - Highly structured documents with tables and diagrams
       - Report series (e.g., quarterly and annual reports)
       - Collaboration with Statistics Sweden (SCB)
     • Clinical guidelines
       - Generation of documentation from SAGE knowledge bases
       - Highly structured documents with graphs and cross links
       - Target: Guideline documents in PDF complete with annotations
       - Collaboration with Samson Tu, Stanford University
     • Document search
       - Searching text and metadata
       - Different levels of search
       - Test case: Statistics reports
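
The difference between searching text and searching metadata can be sketched as follows. This is a toy illustration only: the class hierarchy, the report records, and the function names are assumptions made for the example, not ontologies or tools from the talk.

```python
# child -> parent; a toy class hierarchy (hypothetical, not from the talk)
SUBCLASS_OF = {
    "Divorce": "CivilStatusChange",
    "Marriage": "CivilStatusChange",
    "CivilStatusChange": "PopulationStatistic",
}

# Each report carries its plain text plus the concepts it was annotated with.
DOCUMENTS = [
    {"title": "Report A", "text": "Divorces by years since marriage", "concepts": ["Divorce"]},
    {"title": "Report B", "text": "Energy use by sector", "concepts": ["EnergyUse"]},
]


def is_a(cls, ancestor):
    """Walk up the hierarchy to test whether cls equals or specializes ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def text_search(docs, word):
    """Level 1: plain substring match on the document text."""
    return [d["title"] for d in docs if word.lower() in d["text"].lower()]


def semantic_search(docs, concept):
    """Level 2: match on annotated concepts, including subclasses of the query."""
    return [d["title"] for d in docs if any(is_a(c, concept) for c in d["concepts"])]


print(text_search(DOCUMENTS, "population"))               # []
print(semantic_search(DOCUMENTS, "PopulationStatistic"))  # ['Report A']
```

The text search misses Report A because the word "population" never occurs in it, while the metadata search finds it through the class hierarchy.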

  16. Statistics Reports as Semantic Documents
     • Statistics reports
     • Statistical Yearbook of Sweden (784 pages)
     • Manual and (semi-)automated annotation
     • Statistical metadata available
     • Development of relevant ontologies
       - Annotation ontology
       - Document ontology
       - Macro data ontology
       - Domain ontology
     • In general, an ontology of the entire country!
     • Interesting idea: Use annotation of the previous document edition as the starting point

  17. Mark-up of Statistical Yearbook

  18. Statistics Ontologies in Protégé

  19. Document and Domain Modeling
     [Diagram labels: PDF annotation (Acrobat); annotation ontology, document ontology, and domain ontology (Protégé); example classes Document, TextAnnotation, Table, NumberOf, Person.]

  20. Questions to the OWL Experts…
     1. How would you model things like:
        - “Asylum applicants, rejections at border and persons granted residence permits as refugees or similar, by basis of residence permit,” or
        - “Number of divorces in each marriage cohort by number of years since marriage”?
     2. How would you then search for this information?

  21. Clinical Guidelines as Semantic Documents
     • Experiments with SAGE clinical guideline knowledge bases in collaboration with Samson Tu
     • SAGE uses knowledge bases to store authoritative guidelines
     • Uses of the knowledge bases
       - Inference
       - Workflow engines
       - Generation of guideline documentation (XML, HTML, and PDF)
     • Goal: Semantic document with the knowledge base
       - PDF file with annotations and embedded SAGE knowledge base

  22. Document Generation from XML
     • Generation of guideline documentation in PDF
     [Diagram labels: Protégé, guideline KB (e.g., SAGE), converter, XML, XSLT, HTML, XSLT, XSL-FO, FOP, guideline document (PDF), metadata.]
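
The XML-to-XSL-FO step of this pipeline can be sketched as below. The slide names XSLT for this step; because Python's standard library has no XSLT processor, the sketch builds the same kind of XSL-FO output directly with `xml.etree.ElementTree` as a stand-in. The input element names (`guideline`, `recommendation`), the sample text, and the function name `guideline_to_fo` are hypothetical; the actual SAGE export format is not shown in the slides.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Qualify a tag name with the XSL-FO namespace."""
    return f"{{{FO_NS}}}{tag}"


def guideline_to_fo(guideline_xml: str) -> str:
    """Turn a (hypothetical) guideline XML export into a minimal XSL-FO document."""
    source = ET.fromstring(guideline_xml)
    root = ET.Element(fo("root"))
    # Page layout: one page master with a body region.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {"master-name": "A4"})
    ET.SubElement(page, fo("region-body"))
    # Content: one page sequence flowing into the body region.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    title = ET.SubElement(flow, fo("block"), {"font-weight": "bold"})
    title.text = source.get("title", "")
    for rec in source.findall("recommendation"):
        block = ET.SubElement(flow, fo("block"))
        block.text = rec.text
    return ET.tostring(root, encoding="unicode")


sample = (
    '<guideline title="Example guideline">'
    "<recommendation>First recommendation text.</recommendation>"
    "<recommendation>Second recommendation text.</recommendation>"
    "</guideline>"
)
fo_document = guideline_to_fo(sample)
print(fo_document[:60])
```

The resulting XSL-FO would then be handed to a formatter such as FOP to produce the PDF; that step, and the attachment of metadata, is outside this sketch.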

  23. The Resulting Guideline Document

  24. Summary
     • Semantic documents
       - An approach to combining printable documents with ontologies and knowledge bases
       - Combined documentation (human-readable) and reasoning (machine-readable)
       - One document with several applications
     • Tool support: PDFTab
       - Creation of semantic documents
       - Support for document annotation
       - Editing of ontologies and knowledge bases stored in PDF files
