
CONCERTO: Conceptual indexing, querying and retrieval of digital documents (presentation)



  1. CONCERTO Conceptual indexing, querying and retrieval of digital documents J. McNaught, W.J. Black, F. Rinaldi, E. Bertino, A. Brasher, D. Deavin, B. Catania, D. Silvestri, B. Armani, P. Leo, A. Persidis, G. Semeraro, F. Esposito, V. Candela, G-P. Zarri & L. Gilardoni

  2. Overview • Concerto: Textual annotation • Functionalities • Textual annotation and the knowledge worker • Relation to principles and assumptions of Knowledge Management

  3. Concerto: Esprit project P29159 • Project funded by the European Commission • Esprit: Strategic Research in IT • Conceptual indexing, querying and retrieval of digital documents: WWW, digital libraries, corporate document bases, … • Ended September 2000

  4. CONCERTO: main features • Full knowledge engineering software environment • Computer-aided conceptual annotation • Intelligent IR via annotations • No full semantic or conceptual analysis

  5. Textual annotation • Use KB & document management, IR & language engineering technologies • Knowledge represented as annotations to textual sources of knowledge (unlike a traditional KB) • Only annotate what is relevant to user needs • User decides what knowledge is finally stored
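
The idea of holding knowledge as annotations on textual sources, with the user deciding what is finally stored, can be sketched as stand-off annotations that point into the source text by character offsets. This is a minimal illustration only: the `Annotation` class, its fields and the `approved` flag are hypothetical and are not Concerto's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    # Hypothetical stand-off annotation: the knowledge sits beside the text,
    # anchored to a character span, instead of in a separate KB record.
    doc_id: str
    start: int
    end: int
    concept: str             # e.g. an ontology class or template name
    approved: bool = False   # set by the user, who decides what is stored


def anchored_text(text: str, ann: Annotation) -> str:
    """Return the span of source text that an annotation refers to."""
    return text[ann.start:ann.end]


def commit(candidates: list[Annotation]) -> list[Annotation]:
    """Keep only the annotations the user has approved for storage."""
    return [a for a in candidates if a.approved]


text = "Pira International publishes abstracts on printing."
cands = [
    Annotation("doc1", 0, 18, "company", approved=True),
    Annotation("doc1", 42, 50, "industry_sector"),  # proposed, not yet approved
]
```

Only the approved annotation survives `commit`, so unreviewed machine proposals never reach the store.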

  6. Concerto functionality • Document capture → XML documents • Basic semantic element discovery (terms, names, relationships, actions) • Interactive conceptual annotation: map textual elements to NKRL objects and templates • KM facility: store NKRL annotations in XML documents; index annotations • Query facility, report generation

  7. [Figure: Concerto architecture. Acquisition & Preprocessing: XML Document Translator, BSE Extraction, Mapping to Ontology. Knowledge Repositories & Management: Concept Manager, Concept Repository, Template Manager, Template Repository, Knowledge Manager, Conceptual Annotation Repository, Document Repository, Inference Engine. Interfaces: Conceptual Query Environment; Concept Ontology, Template Ontology, Document Acquisition and Annotation Builder interfaces. Inputs: abstracts, web pages. Users: conceptual annotation editors & general users; knowledge administrators.]

  8. Document capture → XML • Define suitable Document Type Definitions (DTDs) • Automatic analysis of logical structure of input documents • Generation of corresponding XML documents • Simple DTDs: later processing enriches DTDs with other meta-information
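
Capturing a document by analysing its logical structure and emitting XML that follows a simple DTD might look like the sketch below. The element names (`document`, `title`, `para`), the DTD in the comment, and the heuristic that the first line is the title and blank lines separate paragraphs are all assumptions for illustration; Concerto's actual DTDs are not given in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal DTD the output is meant to follow:
#   <!ELEMENT document (title, para+)>
#   <!ELEMENT title (#PCDATA)>
#   <!ELEMENT para (#PCDATA)>


def capture(raw: str, doc_id: str) -> ET.Element:
    """Turn plain text into XML: first line -> <title>, blank-line-separated blocks -> <para>."""
    blocks = [b.strip() for b in raw.strip().split("\n\n") if b.strip()]
    if not blocks:
        raise ValueError("empty document")
    root = ET.Element("document", id=doc_id)
    title_line, _, first_body = blocks[0].partition("\n")
    ET.SubElement(root, "title").text = title_line.strip()
    bodies = ([first_body.strip()] if first_body.strip() else []) + blocks[1:]
    for body in bodies:
        # Collapse line breaks and extra spaces inside a paragraph.
        ET.SubElement(root, "para").text = " ".join(body.split())
    return root


raw = """Ink prices rise
Suppliers announced increases.

Printers expect higher costs."""
xml = ET.tostring(capture(raw, "d42"), encoding="unicode")
```

Keeping the DTD deliberately simple matches the slide's approach: later stages can enrich the markup with further meta-information.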

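  9. Document capture and normalisation to XML [Figure: the DAI (Document Acquisition Interface) in a web browser, a web server with servlet engine, a servlet and an XML DT Bean, linked over HTTP, with HTML and XML as the formats handled, in numbered steps 1 to 3.]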

  10. Basic Semantic Element Extraction • Extract & tag basic semantic elements • Named entities: companies, products, locations, people, trade names, offices, amounts, … • Context-sensitive rules, which also handle co-reference
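
Named-entity extraction of this kind combines a lookup in a database of known names with context-sensitive cue rules, and links later short-form mentions back to the full name. The sketch below is a toy illustration, not FACILE: the gazetteer entries, the company-suffix rule and the co-reference heuristic (a later mention equal to the first word of an earlier company name) are all invented for the example.

```python
import re

# Hypothetical database of known names -> entity type.
GAZETTEER = {"Biovista": "company", "Athens": "location"}
# Hypothetical context rule: capitalised words followed by a company suffix.
COMPANY_CUE = re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Ltd|Inc|plc)\b")


def extract(text: str) -> list[tuple[str, str, str]]:
    """Return (mention, entity_type, canonical_name) triples."""
    found = []
    companies = []  # (full name, end offset of its first mention)
    # 1. Context-sensitive rule: "<Capitalised words> Ltd/Inc/plc" -> company.
    for m in COMPANY_CUE.finditer(text):
        companies.append((m.group(0), m.end()))
        found.append((m.group(0), "company", m.group(0)))
    # 2. Lookup of known names.
    for name, etype in GAZETTEER.items():
        if re.search(rf"\b{re.escape(name)}\b", text):
            found.append((name, etype, name))
    # 3. Naive co-reference: a later bare mention of a company's first word
    #    is linked back to the full company name.
    for name, end in companies:
        short = name.split()[0]
        for _ in re.finditer(rf"\b{re.escape(short)}\b", text[end:]):
            found.append((short, "company", name))
    return found


sample = "Acme Printing Ltd signed a deal in Athens. Acme will supply inks."
```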

  11. BSE Extraction continued… • Partial filling of templates (“business rules”) • Database of known (part) names and cues to aid analysis • Success > 90% with proven FACILE technology • Results checked manually, which is faster than a purely manual approach
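
Partial template filling can be pictured as a simple business rule that fires on cue words and copies extracted entities into template slots, leaving the other slots empty for the user to complete and verify. The template, its slot names, the cue words and the "X acquired Y" word-order assumption are hypothetical.

```python
from typing import Optional

# Hypothetical acquisition template and the cue words that trigger it.
TEMPLATE_SLOTS = ("acquirer", "acquired", "date", "price")
CUES = ("acquired", "acquires", "bought")


def fill_acquisition(sentence: str, companies: list[str]) -> Optional[dict]:
    """Return a partially filled acquisition template, or None if the rule does not fire."""
    lowered = sentence.lower()
    if not any(cue in lowered for cue in CUES) or len(companies) < 2:
        return None
    # Order companies by position; assumes active voice, "X acquired Y".
    ordered = sorted(companies, key=sentence.find)
    template = {slot: None for slot in TEMPLATE_SLOTS}
    template["acquirer"], template["acquired"] = ordered[0], ordered[1]
    return template  # "date" and "price" stay None for the user to complete
```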

  12. Basic Semantic Element Extraction [Figure: pipeline from input text through basic preprocessing (external tagger & morph. analyser), database lookup and NE analyser (NE rules) to named entities and filled templates.]

  13. [Screenshot: text in the process of undergoing Basic Semantic Element Extraction; colours mark different types of entity, and a tool-tip shows the rule that applied.]

  14. Ontology Mapper • Associates names, terms & partially filled templates with classes & templates of main Concerto repositories • Ontology covers domains of current user partners (publishing/printing & biotechnology) • OM can be used to classify documents (as in FACILE)
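
Mapping extracted terms to ontology classes, and reusing those mappings to classify a document, might be sketched as below. The lexicon, the class names (loosely echoing the publishing/printing and biotechnology domains) and the majority-vote classification are assumptions, not the Ontology Mapper's actual method.

```python
from collections import Counter

# Hypothetical term -> ontology class lexicon.
LEXICON = {
    "offset press": "PrintingEquipment",
    "ink": "PrintingMaterial",
    "monoclonal antibody": "Biopharmaceutical",
    "clinical trial": "DrugDevelopmentProcess",
}
# Hypothetical class -> domain table used for document classification.
DOMAIN = {
    "PrintingEquipment": "publishing/printing",
    "PrintingMaterial": "publishing/printing",
    "Biopharmaceutical": "biotechnology",
    "DrugDevelopmentProcess": "biotechnology",
}


def map_terms(terms):
    """Associate each known term with an ontology class; unknown terms are skipped."""
    return {t: LEXICON[t] for t in terms if t in LEXICON}


def classify(terms):
    """Classify a document by the most frequent domain among its mapped terms."""
    votes = Counter(DOMAIN[LEXICON[t]] for t in terms if t in LEXICON)
    return votes.most_common(1)[0][0] if votes else None
```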

  15. Conceptual annotation builder interface • User checks output of BSEE module • Completes/verifies partially filled templates • Interaction ensures high-quality results • Higher precision in search

  16. Knowledge representation • Most proposals for conceptual annotation are limited in scope: they cannot handle complex narrative information (actions, facts, events, states) relating to the real or intended behaviour of actors, which is typical of industrial/economic contexts • Concerto uses the Narrative Knowledge Representation Language (NKRL)

  17. Knowledge management facilities • Set of repositories to store: documents; conceptual annotations; plus the information required to construct annotations

  18. Management facilities… • Set of manager modules providing: access to repositories; advanced manipulation operations

  19. Knowledge Repositories • Concept repository: concepts (ontology) and NKRL instances
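
A concept repository holding both the ontology (an is-a hierarchy of concepts) and instances can be approximated with two dictionaries and a subsumption test. The concept and instance names are hypothetical, and the single-parent hierarchy is a simplification of what an NKRL ontology can express.

```python
# Hypothetical fragment of an is-a hierarchy: concept -> parent concept.
IS_A = {
    "company": "organisation",
    "publisher": "company",
    "biotech_company": "company",
    "organisation": "entity",
    "person": "entity",
}
# Hypothetical instances: individual -> the concept it directly instantiates.
INSTANCE_OF = {"PIRA_INTERNATIONAL": "publisher", "BIOVISTA": "biotech_company"}


def ancestors(concept):
    """Yield the concept itself and then each ancestor, walking up is-a links."""
    while concept is not None:
        yield concept
        concept = IS_A.get(concept)


def subsumes(general, specific):
    """True if `general` is `specific` or one of its ancestors."""
    return general in ancestors(specific)


def is_instance(individual, concept):
    """True if the individual belongs to `concept`, directly or via is-a."""
    direct = INSTANCE_OF.get(individual)
    return direct is not None and subsumes(concept, direct)
```

A query for any "organisation" therefore also retrieves publishers and biotech companies without listing them explicitly.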

  20. Repositories… • Template repository: NKRL templates represented in RDF, plus the information required to construct instances (predicative occurrences) from predicative templates, e.g. “move a generic object” → “Tomorrow, I will move the table”
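
Constructing a predicative occurrence from a predicative template, as in the “move a generic object” → “Tomorrow, I will move the table” example, amounts to filling the template's roles with instances that satisfy each role's constraint. The sketch assumes NKRL-style role labels (SUBJ, OBJ) and the predicate MOVE; the template name, the constraint classes, the class table and the `date1` field are simplified, hypothetical stand-ins, not actual NKRL syntax.

```python
from dataclasses import dataclass, field

# Hypothetical class lookup: instance -> classes it belongs to.
CLASSES = {
    "SPEAKER_1": {"human_being", "entity"},
    "TABLE_1": {"furniture", "physical_entity", "entity"},
}


@dataclass
class Template:
    name: str
    predicate: str
    constraints: dict  # role -> class that a filler must belong to


@dataclass
class Occurrence:
    template: str
    predicate: str
    roles: dict
    temporal: dict = field(default_factory=dict)


def instantiate(t: Template, fillers: dict, **temporal) -> Occurrence:
    """Check every constrained role, then build the occurrence."""
    for role, cls in t.constraints.items():
        filler = fillers.get(role)
        if filler is None or cls not in CLASSES.get(filler, set()):
            raise ValueError(f"{role} filler {filler!r} does not satisfy {cls}")
    return Occurrence(t.name, t.predicate, dict(fillers), dict(temporal))


move_generic_object = Template(
    "Move:GenericObject", "MOVE", {"SUBJ": "human_being", "OBJ": "physical_entity"})
occ = instantiate(move_generic_object, {"SUBJ": "SPEAKER_1", "OBJ": "TABLE_1"}, date1="tomorrow")
```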

  21. Repositories… • Document repository: XML documents that are (to be) annotated • Conceptual annotation repository: conceptual annotations in terms of NKRL predicative occurrences and the bindings between them, represented in RDF

  22. Resource Description Framework (RDF) • W3C proposal for metadata • Handles complex NKRL structures, including second-order structures
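
To see how RDF can carry NKRL structures, including second-order ones, each predicative occurrence can be given its own identifier so that other statements can refer to it. The sketch emits N-Triples lines by hand using only the standard library; the `http://example.org/concerto#` namespace, the property names, the instance names and the use of a `CAUSE` binding node are hypothetical illustrations, not Concerto's actual RDF schema.

```python
NS = "http://example.org/concerto#"  # hypothetical namespace


def uri(local: str) -> str:
    return f"<{NS}{local}>"


def occurrence_triples(occ_id, predicate, roles):
    """One node per occurrence; each role becomes a property of that node."""
    triples = [(uri(occ_id), uri("predicate"), uri(predicate))]
    for role, filler in roles.items():
        triples.append((uri(occ_id), uri(role), uri(filler)))
    return triples


def binding_triples(bind_id, operator, args):
    """Second-order structure: a node whose arguments are other occurrences.
    The semantics of argument order is not modelled here."""
    triples = [(uri(bind_id), uri("operator"), uri(operator))]
    for i, arg in enumerate(args, start=1):
        triples.append((uri(bind_id), uri(f"arg{i}"), uri(arg)))
    return triples


def to_ntriples(triples):
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


graph = (
    occurrence_triples("occ1", "PRODUCE", {"SUBJ": "COMPANY_A", "OBJ": "DRUG_X"})
    + occurrence_triples("occ2", "RECEIVE", {"SUBJ": "COMPANY_A", "OBJ": "GRANT_1"})
    + binding_triples("bind1", "CAUSE", ["occ2", "occ1"])
)
```

Because occurrences are ordinary RDF resources, a binding can point at them just as an occurrence points at its role fillers.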

  23. Knowledge Repository management [Figure: components: Knowledge manager; Concepts manager and Concept repository; Template manager and Template repository; Conceptual annotations repository; Document repository; Inference engine.]

  24. Annotation and the Knowledge Worker: Pira International • Pira Int operates a database of abstracts: published on-line; printed abstracts journals; 600 titles (many types) about publishing • 3 current types of knowledge worker: abstracters, editors, system workers • Metadata: currently simple lists of index terms, company and trade names

  25. Evolving Knowledge Work Roles at Pira International • Need for a new type of worker, the Knowledge Administrator, who develops user templates and maintains templates and domain-specific ontologies • User templates: application-specific; written using user-friendly means, not raw NKRL (which is for internal system use)

  26. Example Pira Int Template: Contract (slot: expected values; example of use) • between: Organisation or Person; e.g. Author • and: Organisation or Person; e.g. Publisher • for use of: Resource; e.g. any content (resource) created by Author • in: Resource; e.g. Book • status: In progress, negotiation, completed
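
A user template such as Contract can be held as a slot schema recording each slot's expected values, so an annotation can be checked before it is stored. Slot names and expected values follow the slide; the checking scheme, the `KIND_OF` table and the sample fillers are hypothetical.

```python
# Slot -> allowed kinds of filler, following the Contract example.
CONTRACT = {
    "between": {"Organisation", "Person"},
    "and": {"Organisation", "Person"},
    "for use of": {"Resource"},
    "in": {"Resource"},
    "status": {"in progress", "negotiation", "completed"},
}

# Hypothetical filler -> kind table (in practice this would come from the ontology).
KIND_OF = {
    "author_1": "Person",
    "publisher_1": "Organisation",
    "manuscript_1": "Resource",
    "book_1": "Resource",
}


def check(template, filled):
    """Return a list of problems; an empty list means the annotation fits the template."""
    problems = [f"missing: {s}" for s in template if s not in filled]
    for slot, value in filled.items():
        if slot not in template:
            problems.append(f"unknown slot: {slot}")
            continue
        allowed = template[slot]
        # Status takes literal values; other slots are checked by the filler's kind.
        kind = value if slot == "status" else KIND_OF.get(value)
        if kind not in allowed:
            problems.append(f"{slot}: {value!r} not in {sorted(allowed)}")
    return problems
```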

  27. Annotation and the Knowledge Worker: Biovista • Production of corporate intelligence industry reports • Editors scan large amounts of information: expert analysis of events and synthesis of trends add much value; editors need to identify links between business entities and events, and wish to be able to ask questions of stored knowledge

  28. Biovista: specific needs • Business entity identification, e.g. companies, people, products, processes (mergers, collaborations, drug development) • Business relationship identification, e.g. the ‘Employee’ relationship; company activity in an industrial sector due to a co-development agreement

  29. Biovista Ontologies • Generic business entities • Biotechnology concepts • Fine-grained, leading to richly connected concepts that help to answer complex editor queries • As at Pira Int, a knowledge administrator is needed for ontologies and templates
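
Richly connected concepts pay off when an editor's question has to follow several links. The sketch stores hypothetical relationship assertions as triples and answers "which people work for companies in a co-development agreement, and in which sectors are those companies active?"; the relation names and all data are invented, and a real Concerto query would run over NKRL annotations rather than plain tuples.

```python
# Hypothetical stored assertions: (subject, relation, object).
FACTS = [
    ("alice", "employee_of", "gene_co"),
    ("bob", "employee_of", "pharma_co"),
    ("carol", "employee_of", "print_co"),
    ("gene_co", "co_development_with", "pharma_co"),
    ("gene_co", "active_in", "oncology"),
]


def objects(subject, relation):
    return {o for s, r, o in FACTS if s == subject and r == relation}


def subjects(relation, obj):
    return {s for s, r, o in FACTS if r == relation and o == obj}


def co_developers():
    """Companies on either side of a co-development agreement."""
    return {c for s, r, o in FACTS if r == "co_development_with" for c in (s, o)}


def staff_of_co_developers():
    """Two-hop query: agreement -> companies -> their employees."""
    return {p for c in co_developers() for p in subjects("employee_of", c)}


def sectors_of_co_developers():
    return {sec for c in co_developers() for sec in objects(c, "active_in")}
```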

  30. Principles and Assumptions of KM: Davenport’s Principles (subset) • KM is expensive: Concerto reduces costs by easing document capture, enabling rapid integration of knowledge, and adding value via reliable (human-validated) conceptual annotations

  31. KM is expensive (continued): Concerto reduces costs by offering automatic categorisation, allowing flexible access, and offering a basis for training and updating

  32. Davenport’s Principles • Effective management of knowledge requires hybrid solutions of people and technology: Concerto provides an ‘intelligent assistant’ • KM never ends, so descriptions must be quick and dirty, and highly relevant to the user: we offer quick and accurate description, guided by a user who maximises relevance

  33. Davenport’s Principles • KM benefits more from maps than models: Concerto offers partial maps, not full models • KM involves sharing and reuse of knowledge (‘unnatural acts’): we offer a shared knowledge base, typically accessed by those who did not contribute the knowledge initially

  34. Davenport’s Principles • KM means improving work processes: Pira Int and Biovista are examining how business processes will be affected and improved by use of Concerto • Knowledge access is only the beginning: annotations can be used to provide summaries, enabling wider involvement

  35. Applehans et al.: KM Assumptions • KM does not have to be profound: we offer partial, highly accurate and relevant knowledge, easily targetable • Document management concepts, technologies and procedures are the basis for success in KM: Concerto strongly supports document-based KM

  36. Applehans et al.: KM Assumptions • Think big, but start small: terminologies and ontologies are the backbone, and are useful in their own right in many ways • An additional principle: KM relies on accurate, well-defined, structured terminologies and concept systems
