CONCERTO Conceptual indexing, querying and retrieval of digital - - PowerPoint PPT Presentation

SLIDE 1

CONCERTO Conceptual indexing, querying and retrieval of digital documents

  • J. McNaught, W.J. Black, F. Rinaldi,
  • E. Bertino, A. Brasher, D. Deavin,
  • B. Catania, D. Silvestri, B. Armani,
  • P. Leo, A. Persidis, G. Semeraro,
  • F. Esposito, V. Candela, G-P. Zarri &
  • L. Gilardoni
SLIDE 2

Overview

  • Concerto: Textual annotation
  • Functionalities
  • Textual annotation and the knowledge worker
  • Relation to principles and assumptions of Knowledge Management

SLIDE 3

Concerto: Esprit P29159

  • Project funded by the European Commission
  • Esprit: Strategic Research in IT
  • Conceptual indexing, querying and retrieval of digital documents: WWW, digital libraries, corporate document bases, …
  • Ended September 2000
SLIDE 4

CONCERTO: main features

  • Full knowledge engineering software environment
  • Computer-aided conceptual annotation
  • Intelligent IR via annotations
  • No full semantic or conceptual analysis

SLIDE 5

Textual annotation

  • Uses KB & document management, IR and language engineering technologies
  • Knowledge is represented as annotations to textual sources of knowledge (hence unlike a traditional KB)
  • Only what is relevant to user needs is annotated
  • The user decides what knowledge is finally stored

SLIDE 6

Concerto functionality

  • Document capture into XML documents
  • Basic semantic element discovery (terms, names, relationships, actions)
  • Interactive conceptual annotation: map textual elements to NKRL objects and templates
  • KM facility: store NKRL annotations in XML documents; index annotations
  • Query facility, report generation
SLIDE 7

[Figure: Concerto architecture, with layers Interfaces; Acquisition & Preprocessing; Knowledge Management; Repositories. Components: Document Acquisition Interface, Conceptual Annotation Builder Interface, Query Environment Interface, Concept Ontology Interface, Template Ontology Interface; XML Document Translator, BSE & Extraction, Mapping to Ontology; Concept Manager, Template Manager, Knowledge Manager, Inference Engine; Concept Repository, Template Repository, Conceptual Annotation Repository, Document Repository. Inputs: abstracts, web pages. Users: conceptual annotation editors & general users; knowledge administrators.]

SLIDE 8

Document capture to XML

  • Define suitable Document Type Definitions (DTDs)
  • Automatic analysis of the logical structure of input documents
  • Generation of corresponding XML documents
  • Simple DTDs: later processing enriches them with other meta-information
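The capture step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the DTD (elements `document`, `title`, `body`, `para`) is hypothetical, and the "logical structure analysis" is deliberately naive, taking the first line as the title and blank-line-separated blocks as paragraphs. The slides do not give Concerto's actual DTDs or analysis rules.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical "simple DTD"; Concerto's real DTDs are not shown in the slides.
SIMPLE_DTD = """<!DOCTYPE document [
  <!ELEMENT document (title, body)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT body (para+)>
  <!ELEMENT para (#PCDATA)>
]>"""


def analyse_structure(text: str) -> tuple[str, list[str]]:
    """Naive logical-structure analysis: first line = title,
    blank-line-separated blocks after it = paragraphs."""
    lines = text.strip().splitlines()
    if not lines:
        raise ValueError("empty document")
    title = lines[0].strip()
    body = "\n".join(lines[1:])
    # Collapse internal whitespace so each paragraph becomes one line of text.
    paras = [" ".join(block.split())
             for block in re.split(r"\n\s*\n", body) if block.strip()]
    if not paras:
        raise ValueError("the DTD requires at least one paragraph")
    return title, paras


def to_xml(text: str) -> str:
    """Generate an XML document, with the DOCTYPE prepended, from raw text."""
    title, paras = analyse_structure(text)
    root = ET.Element("document")
    ET.SubElement(root, "title").text = title
    body = ET.SubElement(root, "body")
    for p in paras:
        ET.SubElement(body, "para").text = p
    # ElementTree escapes &, < and > in text content.
    return SIMPLE_DTD + "\n" + ET.tostring(root, encoding="unicode")


print(to_xml("R&D in printing\n\nInk demand rose.\n\nPrices fell."))
```

Keeping the DTD this simple matches the slide's point: richer meta-information can be added by later processing rather than designed in up front.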

SLIDE 9

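[Figure: Document capture and normalisation to XML. A web browser running the DAI (Document Acquisition Interface) communicates over HTTP with a web server + servlet engine; the servlet returns HTML and calls an XML DT Bean. Steps 1, 2 and 3 are numbered in the original diagram.]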

SLIDE 10

Basic Semantic Element Extraction

  • Extract & tag basic semantic elements
  • Named entities: companies, products, locations, people, trade names, offices, amounts, …
  • Context-sensitive rules, which also handle co-reference
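To illustrate what "context-sensitive rules" and "co-reference" mean here, the following Python sketch tags companies by a following designator (Ltd, plc, ...) and people by a preceding title (Dr, Mr, Ms), then links later short forms ("Acme Printing", "Smith") back to the full names. The patterns, designators and titles are assumptions for illustration only; this is not FACILE's rule formalism.

```python
import re

# Illustrative patterns only (assumptions); not FACILE's actual rules.
COMPANY = re.compile(r"\b((?:[A-Z][A-Za-z]+ )+)(Ltd|Inc|plc|GmbH|SpA)\b")
# Context-sensitive rule: a capitalised name is a PERSON only when it is
# preceded by a title such as "Dr".
PERSON = re.compile(r"\b(?:Mr|Ms|Dr)\.? ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")


def extract(text):
    entities = {}  # canonical name -> entity type
    aliases = {}   # short form -> canonical name (used for co-reference)
    for m in COMPANY.finditer(text):
        entities[m.group(0)] = "COMPANY"
        aliases[m.group(1).strip()] = m.group(0)      # "Acme Printing" -> "Acme Printing Ltd"
    for m in PERSON.finditer(text):
        entities[m.group(1)] = "PERSON"
        aliases[m.group(1).split()[-1]] = m.group(1)  # surname -> full name
    # Co-reference: every occurrence of a short form is linked to its entity.
    mentions = []
    for alias, canonical in aliases.items():
        for m in re.finditer(r"\b" + re.escape(alias) + r"\b", text):
            mentions.append((m.start(), alias, canonical))
    return entities, sorted(mentions)


text = "Dr Jane Smith joined Acme Printing Ltd in May. Acme Printing expects Smith to lead the unit."
entities, mentions = extract(text)
print(entities)
print(mentions)
```

The second sentence mentions neither "Ltd" nor "Dr", yet both mentions are resolved to the entities found earlier. A real rule set would also need to stop sentence-initial capitalised words from being absorbed into company names, which these toy patterns do not handle.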

SLIDE 11

BSE Extraction continued…

  • Partial filling of templates (“business rules”)
  • Database of known (part) names and cues to aid analysis
  • Success > 90% with proven FACILE technology
  • Checking the results manually is faster than a purely manual approach
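Partial template filling by a "business rule" can be sketched as follows. The sketch assumes a hypothetical `acquisition` template with slots `buyer`, `target`, `date` and `value`, and a rule triggered by "<COMPANY> acquired/bought <COMPANY>" over names already tagged as companies. Slots the rule cannot fill stay `None` and are left for the user to complete, which is the manual check the slide describes.

```python
import re
from dataclasses import dataclass


@dataclass
class PartialTemplate:
    name: str
    fillers: dict  # slot -> value, or None while still unfilled

    def missing(self):
        """Slots left for the user to complete during annotation."""
        return [slot for slot, value in self.fillers.items() if value is None]


# Hypothetical template; the slot names are assumptions.
ACQUISITION_SLOTS = ("buyer", "target", "date", "value")


def acquisition_rule(text, entities):
    """Business rule: '<COMPANY> acquired|bought <COMPANY>' partially fills
    an acquisition template; the other slots stay None."""
    companies = [name for name, kind in entities.items() if kind == "COMPANY"]
    if not companies:
        return []
    # Longest names first so a longer name is preferred over a shorter prefix.
    alt = "|".join(re.escape(c) for c in sorted(companies, key=len, reverse=True))
    pattern = re.compile(rf"({alt}) (?:has acquired|acquired|bought) ({alt})")
    found = []
    for m in pattern.finditer(text):
        fillers = dict.fromkeys(ACQUISITION_SLOTS)
        fillers["buyer"], fillers["target"] = m.group(1), m.group(2)
        found.append(PartialTemplate("acquisition", fillers))
    return found


ents = {"Acme Ltd": "COMPANY", "Beta plc": "COMPANY", "Leeds": "LOCATION"}
for t in acquisition_rule("Acme Ltd acquired Beta plc for an undisclosed sum.", ents):
    print(t.fillers, "still missing:", t.missing())
```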

SLIDE 12

Basic Semantic Element Extraction

[Figure: Input text → Basic Preprocessing (External Tagger & Morph. Analyser) → Database Lookup (Database) → NE Analyser (NE Rules) → outputs: Named Entities, Filled Templates]
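The first two stages of such a pipeline can be sketched in Python. The assumptions: a regex tokenizer stands in for the external tagger and morphological analyser, the "database" is a plain dict of known names, and lookup uses greedy longest match so that a multi-word name wins over a shorter known prefix. The NE rules stage would then work on the tokens not covered by these spans.

```python
import re


def preprocess(text):
    """Basic preprocessing: split text into word and punctuation tokens.
    (A stand-in for the external tagger & morphological analyser.)"""
    return re.findall(r"\w+|[^\w\s]", text)


def database_lookup(tokens, db, max_len=4):
    """Greedy longest-match lookup of known (multi-word) names.
    Returns (start, end, name, type) spans over the token list."""
    spans, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in db:
                spans.append((i, i + n, candidate, db[candidate]))
                i += n
                break
        else:  # no known name starts at token i
            i += 1
    return spans


# Hypothetical database of known names and their types.
DB = {"Pira International": "COMPANY", "Pira": "COMPANY", "New York": "LOCATION"}
print(database_lookup(preprocess("Pira International opened an office in New York."), DB))
# [(0, 2, 'Pira International', 'COMPANY'), (6, 8, 'New York', 'LOCATION')]
```

Because candidates are rebuilt by joining tokens with single spaces, database entries containing punctuation (e.g. "Smith & Co.") would need the same tokenization applied to them before lookup.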

SLIDE 13

[Screenshot: text in the process of undergoing Basic Semantic Element Extraction. A tool-tip shows the rule that applied; colours mark different types of entity.]

SLIDE 14

Ontology Mapper

  • Associates names, terms & partially filled templates with classes & templates of the main Concerto repositories
  • The ontology covers the domains of the current user partners (publishing/printing & biotechnology)
  • The OM can be used to classify documents (as in FACILE)
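Ontology mapping and document classification can be sketched in a few lines. Everything here is an assumption for illustration: the ontology fragment, the lexicon mapping surface terms to classes, and the voting scheme, in which each mapped term votes for the top-level domain class above it and the document is assigned to the domain with most votes. The slides do not describe how the Ontology Mapper actually classifies.

```python
from collections import Counter

# Hypothetical ontology fragment: class -> parent class (None = root).
PARENT = {
    "domain": None,
    "publishing_printing": "domain",
    "biotechnology": "domain",
    "printing_ink": "publishing_printing",
    "book": "publishing_printing",
    "drug": "biotechnology",
    "gene_therapy": "biotechnology",
}

# Hypothetical lexicon mapping extracted terms/names to ontology classes.
LEXICON = {
    "offset ink": "printing_ink",
    "paperback": "book",
    "vaccine": "drug",
    "gene therapy": "gene_therapy",
}


def ancestors(cls):
    """Yield cls and every class above it, up to the root."""
    while cls is not None:
        yield cls
        cls = PARENT[cls]


def classify(terms, top_level=("publishing_printing", "biotechnology")):
    """Map each term to a class, then vote for the top-level class above it."""
    votes = Counter()
    for term in terms:
        cls = LEXICON.get(term.lower())
        if cls is None:
            continue  # term not known to the ontology mapper
        for a in ancestors(cls):
            if a in top_level:
                votes[a] += 1
    return votes.most_common(1)[0][0] if votes else None


print(classify(["Gene therapy", "vaccine", "paperback"]))
```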

SLIDE 15

Conceptual annotation builder interface

  • The user checks the output of the BSEE module
  • Completes/verifies partially filled templates
  • Interaction ensures high-quality results
  • Higher precision in search
SLIDE 16
SLIDE 17

Knowledge representation

  • Most proposals for conceptual annotation are limited in scope: they cannot handle complex narrative actions, facts, events and states relating to the real or intended behaviour of actors (typical of the industrial/economic context)
  • Use of the Narrative Knowledge Representation Language (NKRL)

SLIDE 18

Knowledge management facilities

  • Set of repositories to store documents, conceptual annotations, and the information required to construct annotations

SLIDE 19

Management facilities…

  • Set of manager modules providing access to the repositories and advanced manipulation operations

SLIDE 20

Knowledge Repositories

  • Concept repository: concepts (ontology) and NKRL instances

SLIDE 21

Repositories…

  • Template repository: NKRL templates represented in RDF, i.e. the information required to construct instances (predicative occurrences) from predicative templates
  • Example: the template “move a generic object” is instantiated for “Tomorrow, I will move the table”
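Building an occurrence from a template can be sketched in Python. The sketch assumes a toy is-a hierarchy, a hypothetical template name `Move:GenericObject` with predicate `MOVE`, and role constraints on `SUBJ` and `OBJ` (role names borrowed from NKRL). Each filler must fall under the class its role requires before an occurrence is created. The `date_1` field is a free-text placeholder, not NKRL's temporal representation.

```python
from dataclasses import dataclass
from typing import Optional

# Toy is-a hierarchy (hypothetical; not Concerto's actual ontology).
IS_A = {
    "individual_person_1": "human_being",
    "human_being": "physical_entity",
    "table_1": "table",
    "table": "furniture",
    "furniture": "physical_entity",
}


def subsumed_by(concept, constraint):
    """True if `concept` is `constraint` or lies below it in IS_A."""
    while concept is not None:
        if concept == constraint:
            return True
        concept = IS_A.get(concept)
    return False


@dataclass
class PredicativeTemplate:
    name: str
    predicate: str
    constraints: dict  # role -> class that its filler must belong to


@dataclass
class PredicativeOccurrence:
    template: str
    predicate: str
    roles: dict
    date_1: Optional[str] = None  # free-text placeholder, not NKRL's date format


def instantiate(template, fillers, date_1=None):
    """Build an occurrence after checking every filler against the template."""
    unknown = set(fillers) - set(template.constraints)
    if unknown:
        raise ValueError(f"roles not allowed by template: {sorted(unknown)}")
    for role, required in template.constraints.items():
        if role not in fillers:
            raise ValueError(f"missing filler for {role}")
        if not subsumed_by(fillers[role], required):
            raise ValueError(f"{fillers[role]} is not a {required} ({role})")
    return PredicativeOccurrence(template.name, template.predicate, dict(fillers), date_1)


# "move a generic object" (template and class names are illustrative).
MOVE_GENERIC = PredicativeTemplate("Move:GenericObject", "MOVE",
                                   {"SUBJ": "human_being", "OBJ": "physical_entity"})
# "Tomorrow, I will move the table"
occ = instantiate(MOVE_GENERIC, {"SUBJ": "individual_person_1", "OBJ": "table_1"},
                  date_1="tomorrow")
print(occ)
```

Rejecting a filler such as `table_1` in the `SUBJ` role is the kind of check that lets the annotation interface guide the user rather than accept any binding.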

SLIDE 22

Repositories…

  • Document repository: XML documents that are (to be) annotated
  • Conceptual annotation repository: conceptual annotations in terms of NKRL predicative occurrences and the bindings between them, represented in RDF

SLIDE 23

Resource Description Framework (RDF)

  • W3C proposal for metadata
  • Handles complex NKRL structures, including second-order structures
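Second-order structures fit naturally in RDF because an occurrence is itself a resource that other statements can refer to. The Python sketch below writes two occurrences and a binding over them as N-Triples. Assumptions: the namespace `http://example.org/concerto#` and property names (`predicate`, `operator`, `arg1`, ...) are hypothetical; only the `rdf:type` IRI is standard. The GOAL operator is modelled on NKRL's binding occurrences, and the predicates and fillers are illustrative.

```python
# Hypothetical namespace; Concerto's RDF vocabulary is not given in the slides.
EX = "http://example.org/concerto#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def occurrence_triples(occ_id, predicate, roles):
    """First-order level: one predicative occurrence as RDF triples."""
    s = EX + occ_id
    triples = [(s, RDF_TYPE, EX + "PredicativeOccurrence"),
               (s, EX + "predicate", EX + predicate)]
    triples += [(s, EX + role, EX + filler) for role, filler in roles.items()]
    return triples


def binding_triples(bind_id, operator, args):
    """Second-order level: a binding whose arguments are other occurrences."""
    s = EX + bind_id
    triples = [(s, RDF_TYPE, EX + "BindingOccurrence"),
               (s, EX + "operator", EX + operator)]
    # Numbered argument properties preserve the order of the arguments.
    triples += [(s, f"{EX}arg{i}", EX + a) for i, a in enumerate(args, start=1)]
    return triples


def to_ntriples(triples):
    """Serialise (subject, predicate, object) IRIs as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Illustrative: company_a concludes an agreement (c1) with the goal of producing drug_x (c2).
graph = (occurrence_triples("c1", "PRODUCE", {"SUBJ": "company_a", "OBJ": "agreement_1"})
         + occurrence_triples("c2", "PRODUCE", {"SUBJ": "company_a", "OBJ": "drug_x"})
         + binding_triples("b1", "GOAL", ["c1", "c2"]))
print(to_ntriples(graph))
```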

SLIDE 24

[Figure: Knowledge management layer. Managers: Concept manager, Template manager, Knowledge manager, Inference engine. Repositories: Concepts repository, Template repository, Conceptual annotations repository, Document repository.]

SLIDE 25

Annotation and the Knowledge Worker: Pira International

  • Pira Int operates a database of abstracts: published on-line and as printed abstracts journals; 600 titles (many types) about publishing
  • 3 current types of knowledge worker: abstracters, editors, system workers
  • Metadata: currently simple lists of index terms, company and trade names

SLIDE 26

Evolving Knowledge Work Roles at Pira International

  • Need for a new type of worker, the Knowledge Administrator, who develops user templates and maintains templates and domain-specific ontologies
  • User templates: application-specific, written using user-friendly means rather than raw NKRL (which is for internal system use)

SLIDE 27

Example Pira Int Template: Contract

Slot: expected values (example of use)

  • between: Organisation or Person (Author)
  • and: Organisation or Person (Publisher)
  • for use of: Resource (any content (resource) created by Author)
  • in: Resource (Book)
  • status: In progress, negotiation, completed

SLIDE 28

Annotation and the Knowledge Worker: Biovista

  • Production of corporate intelligence: industry reports
  • Editors scan large amounts of information, providing expert analysis of events and synthesis of trends (much added value); they need to identify links between business entities and events, and wish to be able to ask questions of the stored knowledge

SLIDE 29

Biovista: specific needs

  • Business entity identification, e.g. companies, people, products, processes (mergers, collaborations, drug development)
  • Business relationship identification, e.g. an ‘Employee’ relationship; company activity in an industrial sector due to a co-development agreement
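Asking questions that link entities across annotations amounts to matching patterns with shared variables. The Python sketch below runs a conjunctive query over stored relationships. Assumptions: the flat record format and field names (`relation`, `party_1`, `employer`, ...) are hypothetical stand-ins for stored annotations, not NKRL syntax, and the matcher is a generic technique rather than Concerto's query environment or inference engine.

```python
# Hypothetical flat records standing in for stored annotations (not NKRL syntax).
STORE = [
    {"id": "c1", "relation": "co_development", "party_1": "company_a",
     "party_2": "company_b", "sector": "biotechnology"},
    {"id": "c2", "relation": "employee", "person": "jane_smith", "employer": "company_a"},
    {"id": "c3", "relation": "co_development", "party_1": "company_b",
     "party_2": "company_c", "sector": "diagnostics"},
]


def match(pattern, record, bindings):
    """Extend `bindings` so that `pattern` matches `record`, or return None.
    Pattern values starting with '?' are variables."""
    result = dict(bindings)
    for key, want in pattern.items():
        if key not in record:
            return None
        have = record[key]
        if want.startswith("?"):
            # Bind a fresh variable, or check an already-bound one.
            if result.setdefault(want, have) != have:
                return None
        elif want != have:
            return None
    return result


def query(patterns, store):
    """Conjunctive query: all bindings satisfying every pattern; a variable
    shared between patterns acts as a join between annotations."""
    results = [{}]
    for pattern in patterns:
        extended = []
        for bindings in results:
            for record in store:
                b = match(pattern, record, bindings)
                if b is not None:
                    extended.append(b)
        results = extended
    return results


# "Who works for a company with a biotechnology co-development agreement?"
print(query([{"relation": "co_development", "party_1": "?c", "sector": "biotechnology"},
             {"relation": "employee", "person": "?p", "employer": "?c"}], STORE))
# [{'?c': 'company_a', '?p': 'jane_smith'}]
```

The first pattern only inspects `party_1`. A symmetric relationship such as co-development would need a second query using `party_2`, or records stored in both directions.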

SLIDE 30

Biovista Ontologies

  • Generic business entities
  • Biotechnology concepts
  • Fine-grained, leading to richly-connected concepts that help to answer complex editor queries
  • Like Pira Int, Biovista needs a knowledge administrator for ontologies and templates

SLIDE 31

Principles and Assumptions of KM: Davenport’s Principles (subset)

  • KM is expensive: Concerto reduces costs by easing document capture, enabling rapid integration of knowledge, and adding value via conceptual annotations that are reliable (human-validated)

SLIDE 32

KM is expensive: reduce costs by

  • Offering automatic categorisation
  • Allowing flexible access
  • Offering a basis for training and updating

SLIDE 33

Davenport’s Principles

  • Effective management of knowledge requires hybrid solutions of people & technology: Concerto provides an ‘intelligent assistant’
  • KM never ends, so descriptions must be quick and dirty and highly relevant to the user: we offer quick and accurate description, guided by the user, who maximises relevance

SLIDE 34

Davenport’s Principles

  • KM benefits more from maps than models: Concerto offers partial maps, not full models
  • KM involves sharing & reuse of knowledge (‘unnatural acts’): we offer a shared knowledge base, typically accessed by those who did not contribute the knowledge initially

SLIDE 35

Davenport’s Principles

  • KM means improving work processes: Pira Int and Biovista are examining how their business processes will be affected and improved by the use of Concerto
  • Knowledge access is only the beginning: annotations can be used to provide summaries, enabling wider involvement

SLIDE 36

Applehans et al.: KM Assumptions

  • KM does not have to be profound: we offer partial, highly accurate and relevant knowledge that is easily targetable
  • Document management concepts, technologies and procedures are the basis for success in KM: Concerto strongly supports document-based KM

SLIDE 37

Applehans et al.: KM Assumptions

  • Think big, but start small: terminologies and ontologies are the backbone, and are useful in their own right in many ways
  • An additional principle: KM relies on accurate, well-defined, structured terminologies and concept systems

SLIDE 38

Applehans et al.: KM Assumptions

  • Classification & tagging of documents (metadata) is crucial for effective collaboration: we fully exploit metadata as a means to carry conceptual annotations, thus enriching documents and classifying them

SLIDE 39

Conclusion

  • Concerto offers strong support for the knowledge-based enterprise
  • The store of knowledge in the form of conceptual annotations is accurate and relevant due to human interaction
  • Users are keen on close involvement in the production of knowledge stores

SLIDE 40