

SLIDE 1

Yielding Ontologies for Transition-Based Organization

ICT-211423 February, 2008 Intelligent Content and Semantics

SLIDE 2

General presentation, February 2008 ICT-211423

KYOTO (ICT-211423) Overview

  • Title: Yielding Ontologies for Transition-Based Organization
  • Funded:

– 7th Framework Programme (ICT) of the European Union: Intelligent Content and Semantics
– Taiwan and Japan funded by national grants

  • Goal:

– Platform for knowledge sharing across languages and cultures
– Transfer of knowledge and information across different target groups, crossing linguistic, cultural and geographic boundaries
– Open text mining and deep semantic search
– Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills

  • Duration:

– March 2008 – March 2011

  • Effort:

– 364 person months of work.

SLIDE 3

KYOTO (ICT-211423) Overview

  • Languages:

– English, Dutch, Italian, Spanish, Basque, Chinese, Japanese

  • Domain:

– Environmental domain, BUT usable in any domain

  • Global:

– Both European and non-European languages

  • Available:

– Free: as open source system and data (GPL)

  • Future perspective:

– Content standardization that supports worldwide communication
– Global Wordnet Grid

SLIDE 4

Consortium

  • 1. Vrije Universiteit Amsterdam (Amsterdam, The Netherlands),
  • 2. Consiglio Nazionale delle Ricerche (Pisa, Italy),
  • 3. Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany),
  • 4. Euskal Herriko Unibertsitatea (San Sebastian, Spain),
  • 5. Academia Sinica (Taipei, Taiwan),
  • 6. National Institute of Information and Communications Technology (Kyoto, Japan),
  • 7. Irion Technologies (Delft, The Netherlands),
  • 8. Synthema (Rome, Italy),
  • 9. European Centre for Nature Conservation (Tilburg, The Netherlands),
  • Subcontractors:

– World Wide Fund for Nature (Zeist, The Netherlands),
– Masaryk University (Brno, Czech Republic)

SLIDE 5

[Figure: KYOTO system overview. Capture (images, docs, URLs, experts) feeds concept mining and fact mining by Kybots; the results are indexed for search and dialogue. Knowledge resources shown: a universal ontology (top and middle levels: Abstract, Physical, Process, Substance, water, CO2), domain concepts (CO2 emission, water pollution), wordnets linked to the Global Wordnet Grid, and a domain wiki. User groups shown: citizens, governors, companies, environmental organizations.]

SLIDE 6

Generic Knowledge & Language Layer

[Figure: sources feeding a central ontology with a top, middle and domain layer. Source groups: language-independent ontology sources and language-dependent ontology sources. Named sources: SUMO, MILO, DOLCE, GEMET, GEO DB, Wikipedia, wordnets, MEANING, others. Steps shown: map & parse, merge, ontologize.]
  • type hierarchy
  • axioms
SLIDE 7

Ontologize synsets

  • (Semi-)rigid type hierarchy in the ontology:

– Canine => PoodleDog; NewfoundlandDog; DalmatianDog, etc.

  • Wordnet consists of names for (semi-)rigid dog-types and other words for dogs with roles:

– NAMES for TYPES:

{poodle}EN, {poedel}NL, {pudoru}JP ⇔ (instance x PoodleDog)

– LABELS for ROLES:

{watchdog}EN, {waakhond}NL, {banken}JP ⇒ ((instance x Canine) and (role x GuardingProcess))

  • Type hierarchy remains compact and pure
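
The distinction between names for types and labels for roles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `SynsetMapping` and the helper `types_required` are hypothetical names, not part of the KYOTO software. The words and ontology types are taken from the slide.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SynsetMapping:
    """Links one synset (one word per language) to the ontology."""
    words: Dict[str, str]        # language label -> word
    onto_type: str               # ontology type the referent is an instance of
    role: Optional[str] = None   # process the referent plays a role in, if any

    def is_type_name(self) -> bool:
        # A synset without a role names a (semi-)rigid type.
        return self.role is None

    def expression(self, var: str = "x") -> str:
        type_part = f"(instance {var} {self.onto_type})"
        if self.role is None:
            return type_part
        return f"({type_part} and (role {var} {self.role}))"


def types_required(mappings):
    """Ontology types the hierarchy must contain to host these synsets."""
    return {m.onto_type for m in mappings}


poodle = SynsetMapping({"EN": "poodle", "NL": "poedel", "JP": "pudoru"}, "PoodleDog")
watchdog = SynsetMapping(
    {"EN": "watchdog", "NL": "waakhond", "JP": "banken"}, "Canine", "GuardingProcess"
)

print(poodle.expression())
print(watchdog.expression())
print(sorted(types_required([poodle, watchdog])))
```

In this sketch the role synset reuses the existing type Canine, so adding "watchdog" does not add a node to the type hierarchy.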
SLIDE 8

Ontologize

– "theewater" (water for making tea), Dutch

(exists (?A ?W)
  (and
    (instance ?W Water)
    (hasPurposeForAgent ?W
      (exists (?T)
        (and
          (instance ?T Tea)
          (part ?W ?T))))))
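
An axiom in this parenthesized notation can be read into nested lists with a very small parser, which makes it easy to inspect. The sketch below is an assumption-laden illustration, not the project's tooling: `tokenize`, `parse` and `instance_types` are hypothetical helper names, and the axiom text is copied from the slide.

```python
def tokenize(text):
    """Split an s-expression into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Recursively turn a token list into nested Python lists."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing parenthesis
        return node
    return token


def instance_types(node, found=None):
    """Collect {variable: type} from every (instance ?var Type) clause."""
    if found is None:
        found = {}
    if isinstance(node, list):
        if len(node) == 3 and node[0] == "instance":
            found[node[1]] = node[2]
        for child in node:
            instance_types(child, found)
    return found


AXIOM = """
(exists (?A ?W)
  (and
    (instance ?W Water)
    (hasPurposeForAgent ?W
      (exists (?T)
        (and
          (instance ?T Tea)
          (part ?W ?T))))))
"""

tree = parse(tokenize(AXIOM))
print(instance_types(tree))
```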

SLIDE 9

Ontologize

  • Ontologize concepts from a specific wordnet:

– Only disjoint types need to be added (Fellbaum and Vossen 2007).
– For example, CO2 is a type of substance, but greenhouse gas does not represent a different type of gas or substance; it refers to substances that play a specific role in specific circumstances.

  • All languages can contribute
  • Knowledge is shared among all participating languages through the mapping of the different wordnets to the ontology.

SLIDE 10

Knowledge mining

  • Concept mining:

– Extract terms and relations in a language
– Map the terms to an existing wordnet
– Ontologize terms to concepts and axioms

  • Fact mining:

– Define logical patterns
– Define expression rules in a language

SLIDE 11

Concept mining

[Figure: concept mining pipeline. Source documents pass through linguistic processors for morpho-syntactic analysis, for example: [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP. Concept miners build a term hierarchy (emission; gas > greenhouse gas; area > agricultural area) with the relations "of" and "in" between terms. Terms are mapped to the English Wordnet: emission:2, emission:3, natural process:1, gas:1, greenhouse gas:1, substance:1, area:1, rural area:1, geographical area:1, region:3, location:3, farmland:2.]
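
The term hierarchy in this example (greenhouse gas under gas, agricultural area under area) can be approximated by a simple head-word heuristic. The sketch below assumes English head-final noun compounds and is not the method used by the KYOTO concept miners; `term_hierarchy` is a hypothetical name.

```python
def term_hierarchy(terms):
    """Map each term to its closest known head term, or None for top-level terms.

    Assumption: in English noun compounds the head is the rightmost word,
    so "greenhouse gas" is treated as a kind of "gas".
    """
    known = set(terms)
    parents = {}
    for term in terms:
        words = term.split()
        # Try progressively shorter right-hand suffixes of the term.
        for start in range(1, len(words)):
            candidate = " ".join(words[start:])
            if candidate in known:
                parents[term] = candidate
                break
        else:
            parents[term] = None
    return parents


terms = ["emission", "gas", "greenhouse gas", "area", "agricultural area"]
for term, parent in term_hierarchy(terms).items():
    print(term, "->", parent)
```

A real concept miner would also need lemmatization ("gases" to "gas") and handling of other languages, where the head is not always the last word.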

SLIDE 12

Concept integration

[Figure: concept integration. The English wordnet, extended for the domain (emission:2, emission:3, gas:1, greenhouse gas:1, substance:1, natural process:1), is linked to the ontology (Abstract, Physical, Process, Chemical Reaction, Substance, H2O, CO2, CO2Emission, WaterPollution, GlobalWarming). Ontologize: CO2 is added to the ontology. Axiomatize: GreenhouseGas is expressed by the axiom (instance s1 Substance) (instance e1 Warming) (katalyist s1 e1).]

SLIDE 13

Fact mining

  • KYBOT = Knowledge Yielding Robot
  • Logical expression

– (instance, e1, Burn) (instance, e2, Warming) (cause, e1, e2)
– (instance, s1, CO2) (instance, e1, GlobalWarming) (katalyist, s1, e1)

  • Expression rules per language:

– [N[s1]V[e1]]S
– [N[e1]N[s1]]N
– [[N[e1]][prep][N[s2]]]NP

  • Ontology * Wordnets

– Capabilities
– Conditions: WNT -> adjectives, WNT -> nouns
– Causes: WNT -> verbs, WNT -> nouns
– Process: DamageProcess, ProduceProcess

  • Kybot compiler

– kybots = logical pattern + ontology + WN[Lx] + ER[Lx]
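
The compiler formula (logical pattern + ontology + WN[Lx] + ER[Lx]) can be illustrated with a toy example. Everything below is a hypothetical sketch: the wordnet fragments, the expression templates and the names `Kybot` and `compile_kybot` are assumptions made for illustration, and plain substring matching stands in for the real linguistic analysis. Only the logical pattern is taken from the slide.

```python
from dataclasses import dataclass
from typing import List

# Logical pattern from the slide: a Burn event causes a Warming event.
PATTERN = [("instance", "e1", "Burn"), ("instance", "e2", "Warming"), ("cause", "e1", "e2")]

# Hypothetical ontology fragment: the types a pattern is allowed to mention.
ONTOLOGY = {"Burn", "Warming", "CO2", "GlobalWarming"}

# Hypothetical wordnet fragments WN[Lx]: ontology type -> words of language Lx.
WORDNETS = {
    "en": {"Burn": ["burning", "combustion"], "Warming": ["warming"]},
    "nl": {"Burn": ["verbranding"], "Warming": ["opwarming"]},
}

# Hypothetical expression rules ER[Lx]: one surface template per relation.
EXPRESSION_RULES = {
    "en": {"cause": "{0} causes {1}"},
    "nl": {"cause": "{0} veroorzaakt {1}"},
}


@dataclass
class Kybot:
    language: str
    phrases: List[str]

    def matches(self, text: str) -> bool:
        lowered = text.lower()
        return any(phrase in lowered for phrase in self.phrases)


def compile_kybot(pattern, ontology, wordnet, rules, language):
    """Combine pattern + ontology + WN[Lx] + ER[Lx] into surface phrases."""
    types = {var: typ for rel, var, typ in pattern if rel == "instance"}
    unknown = set(types.values()) - ontology
    if unknown:
        raise ValueError(f"types missing from ontology: {sorted(unknown)}")
    phrases = []
    for rel, left, right in pattern:
        if rel == "instance":
            continue
        template = rules[rel]
        for left_word in wordnet[types[left]]:
            for right_word in wordnet[types[right]]:
                phrases.append(template.format(left_word, right_word))
    return Kybot(language, phrases)


kybots = {
    lang: compile_kybot(PATTERN, ONTOLOGY, WORDNETS[lang], EXPRESSION_RULES[lang], lang)
    for lang in WORDNETS
}
print(kybots["en"].phrases)
print(kybots["nl"].matches("Verbranding veroorzaakt opwarming van de aarde"))
```

The point of the design is that the logical pattern is written once; only the wordnet and the expression rules change per language.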

SLIDE 14

Fact mining

[Figure: fact mining pipeline. Source documents pass through linguistic processors for morpho-syntactic analysis. Fact analysis then applies logical expressions, which draw on the ontology (generic: Abstract, Physical, Process, Chemical Reaction, Substance, H2O, CO2; domain: CO2 emission, water pollution) and on wordnets and linguistic expressions.]

Fact analysis of [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP:

– [the emission]NP: Process e1
– [of greenhouse gases]PP: Patient s2
– [in agricultural areas]PP: Location a3
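
The role assignment in this example can be mimicked with a few lines of Python that read the bracketed chunks and map them to roles. This is a minimal sketch under assumptions: the rule table `ROLE_RULES` (head NP is the process, "of" marks the patient, "in" marks the location) is hypothetical and covers only this one example.

```python
import re

# Matches innermost chunks such as "[of greenhouse gases]PP".
CHUNK = re.compile(r"\[([^\[\]]+)\](NP|PP)")

# Hypothetical role rules: (chunk label, leading preposition) -> (role, variable prefix).
ROLE_RULES = {
    ("NP", None): ("Process", "e"),
    ("PP", "of"): ("Patient", "s"),
    ("PP", "in"): ("Location", "a"),
}


def assign_roles(bracketed):
    """Return (role, variable, text) triples for the chunks of one analysed phrase."""
    facts = []
    for index, (text, label) in enumerate(CHUNK.findall(bracketed), start=1):
        words = text.split()
        preposition = words[0] if label == "PP" else None
        rule = ROLE_RULES.get((label, preposition))
        if rule is None:
            continue  # no rule for this chunk; leave it unanalysed
        role, prefix = rule
        content = " ".join(words[1:]) if preposition else text
        facts.append((role, f"{prefix}{index}", content))
    return facts


phrase = "[[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP"
for fact in assign_roles(phrase):
    print(fact)
```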

SLIDE 15

Wiki for knowledge sharing

  • Uses the XFLOW workflow engine as underlying mechanism;
  • Easy interface tailored to domain experts who don't know the underlying complex data model (ontology plus multi grid wordnet);
  • Simplified wiki syntax that is much easier for non-technical users than e.g. HTML;
  • Web-based interface;
  • Rollback mechanism: each change to the content is versioned;
  • Search functions: synset;
  • Automatic downloading of information from web resources, e.g. Wikipedia;
  • Support for collaborative editing and consensus achievement, such as discussion forums and a list of last updates;
  • Role-based user management;
SLIDE 16

Wiki for knowledge sharing

  • Manage the underlying complex data model in order to keep it consistent:

– "water pollution" is inserted into a language-specific wordnet by a domain expert
– a new entry will be automatically inserted in the ontology extension and in every wordnet.
– list all dummy entries to be filled in.
– English used as the common ground language to support the extension and propagation of changes between the different wordnets and the ontology.
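
The propagation of a new entry can be sketched as follows. This is an illustration only, assuming plain dictionaries as storage: the function names `insert_concept` and `dummy_entries`, the concept identifier and the Dutch example word are hypothetical, and the actual wiki and its database are not modelled.

```python
# ISO 639-1 codes for the seven project languages:
# English, Dutch, Italian, Spanish, Basque, Chinese, Japanese.
LANGUAGES = ["en", "nl", "it", "es", "eu", "zh", "ja"]


def insert_concept(wordnets, ontology_extension, concept_id, language, word, english_label):
    """Add a word in one wordnet and propagate placeholders everywhere else."""
    # English serves as the common ground label in the ontology extension.
    ontology_extension[concept_id] = english_label
    for lang, entries in wordnets.items():
        if lang == language:
            entries[concept_id] = word
        else:
            # Dummy entry: keep an existing word, otherwise mark as "to be filled in".
            entries.setdefault(concept_id, None)


def dummy_entries(wordnets):
    """List (language, concept) pairs that still need a word."""
    return sorted(
        (lang, concept_id)
        for lang, entries in wordnets.items()
        for concept_id, word in entries.items()
        if word is None
    )


wordnets = {lang: {} for lang in LANGUAGES}
ontology_extension = {}
insert_concept(wordnets, ontology_extension, "WaterPollution", "nl", "watervervuiling", "water pollution")
print(dummy_entries(wordnets))
```

After the call, the Dutch wordnet holds the new word and the six other wordnets each hold a dummy entry for the same concept.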

SLIDE 17

Evaluation

  • Wordnets and ontologies are evaluated across linguistic partners;
  • Language and ontology experts will use the Wiki system to build the basic ontology and wordnet layers needed for the extension to the domain;
  • Domain experts will use the top layer and middle layer of wordnets and ontologies plus the Wiki system to encode the knowledge in their domains and reach consensus;
  • The system is tested by integration in a retrieval system;

SLIDE 18

Evaluation

  • Cross-lingual portal:

– show the effects of deep semantic processing for user scenarios
– match queries across languages and cultures.

  • User queries processed by Kybots and matched with deep semantic patterns:

– polluting substance and polluted substance

SLIDE 19

Knowledge sharing

  • Domains share the generic:

– Generic knowledge from the wordnets and the ontology is re-used and shared in various domains
– Generic Kybots (knowledge yielding miners) are re-used and shared in various domains

  • Languages share the knowledge:

– Ontologies (both generic and domain-specific) are shared across languages
– Kybots (both generic and domain-specific) are re-used and shared across languages

SLIDE 20

[Figure: Kybots combine logical expressions over the ontology (generic: Abstract, Physical, Process, Chemical Reaction, Substance, H2O, CO2; domain: CO2 emission, water pollution) with wordnets and linguistic expressions, which connect them to the words of each language.]

Kybot sharing

SLIDE 21

Sharing Kybots

  • General conceptual patterns using a simple logical expression: concentrations of substances, causal relations between processes or conditional states for processes
  • Domain text:

– people usually do not use special words in a language to refer to the causal relation itself but they use general words such as “cause” or “factor”.
– Certain valid conditions can be specified in addition to the general ones, as they are relevant for the users.

  • CO2 emissions can be derived from a certain process involving certain amounts of the substance CO2, but critical levels can be defined in the text miner as a conceptual constraint.
  • Limit the ambiguity of interpretation that arises at the generic levels to only one interpretation at the domain level.

SLIDE 22

Major Innovations

  • Specific knowledge acquired from different textual sources, domains and languages is grounded to a shared ontology: the specific is anchored in the generic.
  • Specific text miners developed for different languages and domains are shared through logical expressions based on the shared ontology.
  • Language-based knowledge is anchored to universal knowledge so that all languages can contribute to and benefit from acquisition.
  • Community software allows for maintenance, fine-tuning and customization of the wordnets and ontology and consequently of the information system.

SLIDE 23

Results of Kyoto

  • Open knowledge sharing and anchoring system.
  • Ontologies:

– High-level and mid-level concepts needed to accommodate the information in the environmental domain.
– Most generic level to maximize re-usability
– Precise enough to yield useful constraints in detecting relations in the domain
– Database and XML data free for the whole community.

  • Wordnets:

– Existing wordnets extended and harmonized with the ontology
– Database and XML data free for the whole community.

  • Acquisition tools:

– Software in all 7 languages to automatically extract synsets and synset-relations from text within a domain.

  • Linguistic processors:

– Tokenization, segmentation, tagging, parsing and word-sense disambiguation.
– Use existing technology and resources.

SLIDE 24

Table 3: Work package list

WP No | Work package title        | Lead partic. | PM (person months) | Start (month) | End (month)
WP0   | Management                | VUA          | 9                  | 1             | 36
WP1   | User requirements         | VUA          | 5                  | 1             | 6
WP2   | System design             | SYNTHEMA     | 12                 | 1             | 6
WP3   | Capture                   | IRION        | 10                 | 1             | 9
WP4   | Indexing                  | IRION        | 11                 | 4             | 12
WP5   | Knowledge mining          | EHU          | 120                | 7             | 30
WP6   | Knowledge integration     | BBAW         | 106                | 4             | 24
WP7   | Database systems and wiki | CNR-ILC-IIT  | 25                 | 1             | 24
WP8   | Domain extension          | ECNC         | 12                 | 13            | 30
WP9   | Evaluation                | ECNC         | 20                 | 4             | 33
WP10  | Exploitation              | SYNTHEMA     | 8                  | 19            | 36
WP11  | Dissemination             | VUA          | 26                 | 1             | 36
TOTAL |                           |              | 364                |               |
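
The total of 364 person months in Table 3 can be cross-checked against the individual work packages. The short script below re-enters the figures from the table; the variable and function names are illustrative. Note that the per-partner figure counts person months of the work packages a partner leads, not the effort of that partner alone, which the table does not give.

```python
# (WP No, title, lead participant, person months, start month, end month)
WORK_PACKAGES = [
    ("WP0", "Management", "VUA", 9, 1, 36),
    ("WP1", "User requirements", "VUA", 5, 1, 6),
    ("WP2", "System design", "SYNTHEMA", 12, 1, 6),
    ("WP3", "Capture", "IRION", 10, 1, 9),
    ("WP4", "Indexing", "IRION", 11, 4, 12),
    ("WP5", "Knowledge mining", "EHU", 120, 7, 30),
    ("WP6", "Knowledge integration", "BBAW", 106, 4, 24),
    ("WP7", "Database systems and wiki", "CNR-ILC-IIT", 25, 1, 24),
    ("WP8", "Domain extension", "ECNC", 12, 13, 30),
    ("WP9", "Evaluation", "ECNC", 20, 4, 33),
    ("WP10", "Exploitation", "SYNTHEMA", 8, 19, 36),
    ("WP11", "Dissemination", "VUA", 26, 1, 36),
]


def total_person_months(rows):
    return sum(pm for _, _, _, pm, _, _ in rows)


def person_months_by_lead(rows):
    """Person months of the work packages led by each partner."""
    totals = {}
    for _, _, lead, pm, _, _ in rows:
        totals[lead] = totals.get(lead, 0) + pm
    return totals


print(total_person_months(WORK_PACKAGES))
print(person_months_by_lead(WORK_PACKAGES))
```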

SLIDE 25

[Figure: system architecture by work package. Work packages: WP1 User requirements, WP2 System Design, WP3 Capture, WP4 Index, WP5 Knowledge mining, WP6 Knowledge Integration, WP7 Databases & wiki, WP8 Domain extension, WP9 Evaluation. Components and data shown: source data, capture, text and metadata in XML format, concept miners, term hierarchy, term relations, wordnet, ontology, Kybots, data and facts in XML format, manual revision, wiki, DEB client, DEB server, domain wordnet, domain ontology, indexing, index, access, end users, user scenarios, manual test, benchmark data, benchmarking.]

SLIDE 26

Milestone Overview

Mil. | Description                    | WPs no's                          | Lead | Delivery date
M1   | System architecture and design | WP1, WP2, WP9                     | VUA  | month 6
M2   | Generic knowledge layer        | WP3, WP4, WP5, WP6                | BBAW | month 12
M3   | Intermediate evaluation        | WP3, WP4, WP5, WP6, WP7, WP8, WP9 | ECNC | month 21
M4   | Final evaluation               | WP3, WP4, WP5, WP6, WP7, WP8, WP9 | VUA  | month 33

SLIDE 27

Complex questions in the cross-lingual environmental portal

Original question | Translation
Welke bedrijven stoten veel schadelijke stoffen uit | What companies produce a lot of damaging substances?
Waar wordt de lucht gemeten | Where is air being measured
Akzo Nobel schuim Apeldoorns Kanaal | Akzo Nobel foam in the Apeldoorn Canal
Luchtvervuiling door verkeer | air pollution by traffic
Op hoeveel manieren wordt de luchtkwaliteit gemeten? | In how many different ways can you measure air quality?
ziek door luchtverontreiniging | sick because of air pollution
zware metalen in grondwater | heavy metals in ground water
milieu klacht afval batterij | environmental complaint waste batteries
welke bedrijven zijn grote luchtvervuilers | what companies are the biggest polluters?
oorzaak luchtverontreining | cause of air pollution
groente uit tuin | vegetables from garden
luchtverontreiniging vanuit het ruhrgebied | air pollution from the Ruhr area
geluidsreducerende maatregelen | measures to reduce noise

SLIDE 28

Complex questions for Aarhus-registered documents of government permits

Original question | Translation
Waar wordt de lucht gemeten | Where is air being measured
Akzo Nobel schuim Apeldoorns Kanaal | Akzo Nobel foam in the Apeldoorn Canal
fijn stof emissies Electrabel | Fine dust emissions Electrabel
Luchtvervuiling door verkeer | air pollution by traffic
ziek door luchtverontreiniging | sick because of air pollution
zware metalen in grondwater | heavy metals in ground water
milieu klacht afval batterij | environmental complaint waste batteries
welke bedrijven zijn grote luchtvervuilers | what companies are the biggest polluters?
oorzaak luchtverontreining | cause of air pollution
groente uit tuin | vegetables from garden
luchtverontreiniging vanuit het ruhrgebied | air pollution from the Ruhr area
geluidsreducerende maatregelen | measures to reduce noise

SLIDE 29