Yielding Ontologies for Transition-Based Organization
General presentation, February 2008 ICT-211423
KYOTO (ICT-211423) Overview
- Title: Yielding Ontologies for Transition-Based Organization
- Funded:
– 7th Framework Programme of the European Union (FP7-ICT): Intelligent Content and Semantics
– Taiwanese and Japanese partners: funded by national grants
- Goal:
– Platform for knowledge sharing across languages and cultures
– Transition of knowledge and information across different target groups, crossing linguistic, cultural and geographic boundaries
– Open text mining and deep semantic search
– Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge-engineering skills
- Duration:
– March 2008 – March 2011
- Effort:
– 364 person-months
KYOTO (ICT-211423) Overview
- Languages:
– English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
- Domain:
– Environmental domain, BUT usable in any domain
- Global:
– Both European and non-European languages
- Available:
– Free, as an open-source system and data (GPL)
- Future perspective:
– Content standardization that supports worldwide communication
– Global Wordnet Grid
Consortium
- 1. Vrije Universiteit Amsterdam (Amsterdam, The Netherlands)
- 2. Consiglio Nazionale delle Ricerche (Pisa, Italy)
- 3. Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany)
- 4. Euskal Herriko Unibertsitatea (San Sebastian, Spain)
- 5. Academia Sinica (Taipei, Taiwan)
- 6. National Institute of Information and Communications Technology (Kyoto, Japan)
- 7. Irion Technologies (Delft, The Netherlands)
- 8. Synthema (Rome, Italy)
- 9. European Centre for Nature Conservation (Tilburg, The Netherlands)
- Subcontractors:
– World Wide Fund for Nature (Zeist, The Netherlands)
– Masaryk University (Brno, Czech Republic)
[Figure: KYOTO system overview, showing capture and indexing of documents, images and URLs; concept mining and fact mining (Kybots); a domain wiki in which experts from environmental organizations maintain domain wordnets; the Global Wordnet Grid linking the wordnets to a universal ontology (top, middle and domain levels, e.g. Abstract, Physical, Substance, Process, water, CO2); and search and dialogue for citizens, governors and companies, with example queries "CO2 emission" and "water pollution".]
Generic Knowledge & Language Layer
[Figure: Generic knowledge and language layer. Sources shown include DOLCE, SUMO, MILO, Wikipedia, wordnets, GEMET, GEO DB and MEANING, divided into language-independent and language-dependent ontology sources; they are merged, mapped and parsed, and ontologized into a central ontology (top, middle and domain levels) consisting of a type hierarchy and axioms.]
- axioms
Ontologize synsets
- (Semi-)rigid type hierarchy in the ontology:
– Canine => PoodleDog; NewfoundlandDog; DalmatianDog, etc.
- Wordnet contains names for (semi-)rigid dog types and other words for dogs playing roles:
– NAMES for TYPES:
{poodle}EN, {poedel}NL, {pudoru}JP ⇔ (instance x PoodleDog)
– LABELS for ROLES:
{watchdog}EN, {waakhond}NL, {banken}JP ⇒((instance x Canine) and (role x GuardingProcess))
- Type hierarchy remains compact and pure
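The two mapping kinds above can be sketched in Python. This is a minimal illustration assuming a hypothetical `SynsetMapping` record (not part of the KYOTO data model): a synset that names a type is equivalent to that type, while a synset that labels a role only implies a more general type plus a role, so it adds no new type to the hierarchy. `<=>` and `=>` stand in for ⇔ and ⇒.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class SynsetMapping:
    """Hypothetical link from a wordnet synset to the ontology."""
    synset: str                  # e.g. "{poodle}EN"
    ontology_type: str           # a (semi-)rigid type, e.g. "PoodleDog"
    role: Optional[str] = None   # set only for role labels, e.g. "GuardingProcess"


def to_formula(m: SynsetMapping) -> str:
    """Render a mapping in the slide's notation."""
    if m.role is None:
        # NAME for a TYPE: equivalent to being an instance of the type.
        return f"{m.synset} <=> (instance x {m.ontology_type})"
    # LABEL for a ROLE: implies a general type plus the role it plays.
    return f"{m.synset} => ((instance x {m.ontology_type}) and (role x {m.role}))"


def types_used(mappings: Iterable[SynsetMapping]) -> Set[str]:
    """Ontology types the mappings rely on; role labels add no new type."""
    return {m.ontology_type for m in mappings}


mappings = [
    SynsetMapping("{poodle}EN", "PoodleDog"),
    SynsetMapping("{poedel}NL", "PoodleDog"),
    SynsetMapping("{watchdog}EN", "Canine", "GuardingProcess"),
    SynsetMapping("{waakhond}NL", "Canine", "GuardingProcess"),
]
for m in mappings:
    print(to_formula(m))
print(sorted(types_used(mappings)))  # ['Canine', 'PoodleDog']
```

Four synsets in two languages need only two ontology types, which is what keeps the hierarchy compact.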
Ontologize
– "theewater" (water for making tea), Dutch
- (exists (?A ?W)
    (and
      (instance ?W Water)
      (hasPurposeForAgent ?W
        (exists (?T)
          (and
            (instance ?T Tea)
            (part ?W ?T))))))
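Axioms like the one above use a KIF-style prefix notation. As a rough illustration (hypothetical helpers, not KYOTO software), the Python sketch below parses such an S-expression into nested lists and collects the ontology classes used in `instance` statements, one way to see which types a word's definition depends on.

```python
from typing import List, Union

# Hypothetical helpers for illustration; not part of KYOTO.
SExpr = Union[str, List["SExpr"]]


def tokenize(text: str) -> List[str]:
    """Split a KIF string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str]) -> SExpr:
    """Recursively build nested lists from tokens (consumes the list)."""
    token = tokens.pop(0)
    if token == "(":
        expr: List[SExpr] = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return expr
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return token


def instance_classes(expr: SExpr) -> List[str]:
    """Collect the classes used in (instance ?X Class) statements."""
    if isinstance(expr, str):
        return []
    found: List[str] = []
    if len(expr) == 3 and expr[0] == "instance" and isinstance(expr[2], str):
        found.append(expr[2])
    for sub in expr:
        found.extend(instance_classes(sub))
    return found


THEEWATER = """
(exists (?A ?W)
  (and (instance ?W Water)
       (hasPurposeForAgent ?W
         (exists (?T)
           (and (instance ?T Tea)
                (part ?W ?T))))))
"""

print(instance_classes(parse(tokenize(THEEWATER))))  # ['Water', 'Tea']
```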
Ontologize
- Ontologize concepts from a specific wordnet:
– Only disjoint types need to be added (Fellbaum and Vossen 2007).
– For example, CO2 is a type of substance, but greenhouse gas does not represent a different type of gas or substance; it refers to substances that play a specific role in specific circumstances.
- All languages can contribute.
- Knowledge is shared among all participating languages through the mapping of the different wordnets to the ontology.
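The sharing mechanism can be illustrated with a toy Python model: knowledge is attached once to ontology types, and any word in any wordnet that maps to a type inherits it. The ontology fragment, the attached statements and the `knowledge_for` helper are hypothetical; only the idea of mapping every wordnet to one ontology comes from the slide.

```python
from typing import Dict, List, Optional

# Hypothetical ontology fragment: type -> parent type (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Substance": None,
    "Gas": "Substance",
    "CO2": "Gas",
}

# Hypothetical knowledge attached once, at the ontology level.
KNOWLEDGE: Dict[str, List[str]] = {
    "Substance": ["can occur in a concentration"],
    "Gas": ["can be emitted into the air"],
}

# Wordnet entries in three languages, all mapped to the same ontology type.
WORDNETS: Dict[str, Dict[str, str]] = {
    "en": {"carbon dioxide": "CO2"},
    "nl": {"kooldioxide": "CO2"},
    "ja": {"二酸化炭素": "CO2"},
}


def knowledge_for(lang: str, word: str) -> List[str]:
    """Collect knowledge inherited by a word through its ontology type."""
    node: Optional[str] = WORDNETS[lang][word]
    collected: List[str] = []
    while node is not None:
        collected.extend(KNOWLEDGE.get(node, []))
        node = PARENT[node]
    return collected


print(knowledge_for("nl", "kooldioxide"))
# ['can be emitted into the air', 'can occur in a concentration']
```

Adding a statement to `Gas` would make it available to the English, Dutch and Japanese words at once, without touching any wordnet.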
Knowledge mining
- Concept mining:
– Extract terms and relations in a language
– Map the terms to an existing wordnet
– Ontologize terms into concepts and axioms
- Fact mining:
– Define logical patterns
– Define expression rules for each language
Concept mining
[Figure: Concept mining. Source documents are processed by linguistic processors (morpho-syntactic analysis), e.g. [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP. Terms are linked to English WordNet senses (emission:2, emission:3, gas:1, greenhouse gas:1, substance:1, natural process:1, area:1, rural area:1, geographical area:1, region:3, location:3, farmland:2) and to each other through the relations "of" and "in". Concept miners build a term hierarchy: emission; gas > greenhouse gas; area > agricultural area.]
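The term-hierarchy step can be approximated with a simple heuristic: group multiword terms under their head word. The sketch below assumes lemmatized English terms with the head in final position (so "greenhouse gas" falls under "gas"); the actual concept miners presumably rely on richer morpho-syntactic analysis, and `build_term_hierarchy` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def build_term_hierarchy(terms: Iterable[str]) -> Dict[str, List[str]]:
    """Map each head word to the multiword terms it heads.

    Assumes lemmatized, right-headed English terms: the last token is the
    head, so "agricultural area" becomes a child of "area".
    """
    hierarchy: Dict[str, List[str]] = {}
    for term in terms:
        words = term.split()
        children = hierarchy.setdefault(words[-1], [])
        if len(words) > 1 and term not in children:
            children.append(term)
    return hierarchy


# Lemmatized terms from "the emission of greenhouse gases in agricultural areas".
terms = ["emission", "greenhouse gas", "agricultural area"]
print(build_term_hierarchy(terms))
# {'emission': [], 'gas': ['greenhouse gas'], 'area': ['agricultural area']}
```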
Concept integration
[Figure: Concept integration. The English WordNet, extended for the domain (emission:2, emission:3, gas:1, greenhouse gas:1, substance:1, natural process:1), is ontologized and axiomatized against the ontology (Abstract, Physical, Substance, Process, Chemical Reaction, H2O, CO2, CO2Emission, WaterPollution, GlobalWarming). CO2 is added as a type, whereas GreenhouseGas is expressed by an axiom: (instance s1 Substance) (instance e1 Warming) (catalyst s1 e1).]
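Because GreenhouseGas is captured by an axiom rather than by a new type, membership can be inferred from mined facts. A minimal Python sketch, assuming facts are stored as (relation, argument, argument) triples and a hypothetical subclass table in which CO2 is a Substance and GlobalWarming is a kind of Warming; the relation is written `catalyst`.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (relation, arg1, arg2)

# Hypothetical fragment of the type hierarchy: child type -> parent type.
SUBCLASS: Dict[str, str] = {
    "CO2": "Substance",
    "H2O": "Substance",
    "GlobalWarming": "Warming",
}


def is_a(cls: str, target: str) -> bool:
    """True if cls is target or a (transitive) subtype of it."""
    while cls != target:
        if cls not in SUBCLASS:
            return False
        cls = SUBCLASS[cls]
    return True


def has_type(facts: Set[Triple], entity: str, target: str) -> bool:
    """True if some (instance entity T) fact has T under target."""
    return any(rel == "instance" and arg == entity and is_a(cls, target)
               for rel, arg, cls in facts)


def greenhouse_gases(facts: Set[Triple]) -> Set[str]:
    """Entities s with (instance s Substance) (instance e Warming) (catalyst s e)."""
    return {s for rel, s, e in facts
            if rel == "catalyst"
            and has_type(facts, s, "Substance")
            and has_type(facts, e, "Warming")}


facts: Set[Triple] = {
    ("instance", "s1", "CO2"),
    ("instance", "e1", "GlobalWarming"),
    ("catalyst", "s1", "e1"),
    ("instance", "s2", "H2O"),  # a substance with no catalyst fact
}
print(greenhouse_gases(facts))  # {'s1'}
```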
Fact mining
- KYBOT = Knowledge Yielding Robot
- Logical expressions:
– (instance, e1, Burn) (instance, e2, Warming) (cause, e1, e2)
– (instance, s1, CO2) (instance, e1, GlobalWarming) (catalyst, s1, e1)
- Expression rules per language:
– [N[s1]V[e1]]S
– [N[e1]N[s1]]N
– [[N[e1]][prep][N[s2]]]NP
- Ontology * wordnets:
– Capabilities
– Conditions: WNT -> adjectives, WNT -> nouns
– Causes: WNT -> verbs, WNT -> nouns
– Processes: DamageProcess, ProduceProcess
- Kybot compiler:
– Kybot = logical pattern + ontology + WN[Lx] + ER[Lx]
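One way to read the Kybot compiler formula: a single language-independent logical pattern, combined with each language's wordnet (WN[Lx]) and expression rule (ER[Lx]), yields a language-specific extractor. The Python sketch below compiles such a pattern into one regular expression per language. The pattern, the tiny English and Dutch word lists and the rule templates are all hypothetical, and real Kybots would work on linguistically analysed text rather than raw strings.

```python
import re
from typing import Dict, List

# Language-independent logical pattern: variable -> ontology type.
PATTERN: Dict[str, str] = {"e1": "Emission", "s1": "Substance"}

# Hypothetical per-language wordnets: ontology type -> lexicalizations.
WN: Dict[str, Dict[str, List[str]]] = {
    "en": {"Emission": ["emission", "release"],
           "Substance": ["carbon dioxide", "CO2", "methane"]},
    "nl": {"Emission": ["uitstoot", "emissie"],
           "Substance": ["kooldioxide", "CO2", "methaan"]},
}

# Hypothetical expression rules: process noun + preposition + substance.
ER: Dict[str, str] = {"en": r"\b{e1} of {s1}\b", "nl": r"\b{e1} van {s1}\b"}


def compile_kybot(lang: str) -> "re.Pattern[str]":
    """Combine PATTERN + WN[lang] + ER[lang] into one regular expression."""
    groups = {}
    for var, onto_type in PATTERN.items():
        # Longest words first so multiword lexicalizations win.
        words = sorted(WN[lang][onto_type], key=len, reverse=True)
        alternatives = "|".join(re.escape(w) for w in words)
        groups[var] = f"(?P<{var}>{alternatives})"
    return re.compile(ER[lang].format(**groups), re.IGNORECASE)


def apply_kybot(lang: str, text: str) -> List[Dict[str, str]]:
    """Return one variable binding per match, e.g. {'e1': ..., 's1': ...}."""
    return [m.groupdict() for m in compile_kybot(lang).finditer(text)]


print(apply_kybot("en", "The emission of CO2 in agricultural areas"))
print(apply_kybot("nl", "De uitstoot van kooldioxide neemt toe"))
```

Both calls bind the same variables e1 and s1, so the resulting facts can be expressed against the shared ontology regardless of the source language.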
Fact mining
[Figure: Fact mining. Source documents pass through linguistic processors (morpho-syntactic analysis); Kybots combine the ontology (generic and domain levels: Abstract, Physical, Substance, Process, Chemical Reaction, H2O, CO2, CO2 emission, water pollution), wordnets and linguistic expressions, and logical expressions. Fact analysis of [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP yields Process: e1 (the emission), Patient: s2 (greenhouse gases), Location: a3 (agricultural areas).]
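The fact-analysis step assigns roles to the chunks of [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP. A minimal sketch of that assignment, assuming the input is already chunked and using two hypothetical English rules (of -> Patient, in -> Location) and a one-word set of process nouns.

```python
from typing import Dict, List, Tuple

Chunk = Tuple[str, str]  # (chunk text, phrase label)

# Hypothetical English expression rules: preposition -> semantic role.
PREP_ROLES: Dict[str, str] = {"of": "Patient", "in": "Location"}

# Hypothetical set of nouns that denote processes in the ontology.
PROCESS_NOUNS = {"emission"}


def analyse(chunks: List[Chunk]) -> Dict[str, str]:
    """Assign Process/Patient/Location roles to a chunked noun phrase."""
    head_text, head_label = chunks[0]
    head = head_text.split()[-1]
    if head_label != "NP" or head not in PROCESS_NOUNS:
        return {}  # the phrase is not headed by a process noun
    roles = {"Process": head_text}
    for text, label in chunks[1:]:
        prep, _, rest = text.partition(" ")
        if label == "PP" and prep in PREP_ROLES:
            roles[PREP_ROLES[prep]] = rest
    return roles


chunks = [("the emission", "NP"),
          ("of greenhouse gases", "PP"),
          ("in agricultural areas", "PP")]
print(analyse(chunks))
# {'Process': 'the emission', 'Patient': 'greenhouse gases', 'Location': 'agricultural areas'}
```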
Wiki for knowledge sharing
- Uses XFLOW workflow engine as underlying mechanism
- Easy interface tailored to domain experts who do not know the underlying complex data model (ontology plus multilingual wordnet grid);
- Simplified wiki syntax that is much easier for non-technical users than, e.g., HTML;
- Web-based interface;
- Rollback mechanism: each change to the content is versioned;
- Search functions, e.g. synset search;
- Automatic downloading of information from web resources, e.g. Wikipedia;
- Support for collaborative editing and consensus building, such as discussion forums and a list of latest updates;
- Role-based user management.
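The rollback feature implies that no revision is ever overwritten. A minimal Python sketch of that idea, using a hypothetical `VersionedEntry` class rather than the actual XFLOW-based implementation: every edit appends a revision, and a rollback restores an old revision by appending a copy of it, so the history stays complete.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Revision:
    content: str
    author: str


class VersionedEntry:
    """Hypothetical wiki entry that keeps every revision of its content."""

    def __init__(self, content: str, author: str) -> None:
        self.history: List[Revision] = [Revision(content, author)]

    @property
    def current(self) -> str:
        return self.history[-1].content

    def edit(self, content: str, author: str) -> None:
        self.history.append(Revision(content, author))

    def rollback(self, version: int, author: str) -> None:
        """Restore revision `version` (0-based) by appending it as a new revision."""
        self.edit(self.history[version].content, author)


entry = VersionedEntry("water pollution: contamination of water bodies", "expert1")
entry.edit("water pollution: ???", "expert2")
entry.rollback(0, "moderator")
print(entry.current)       # water pollution: contamination of water bodies
print(len(entry.history))  # 3
```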
Wiki for knowledge sharing
- Manage the underlying complex data model in order to keep it consistent:
– When a domain expert inserts "water pollution" into a language-specific wordnet, a new entry is automatically inserted in the ontology extension and in every wordnet.
– All dummy entries still to be filled in are listed.
– English is used as the common ground language to support the extension and propagation of changes between the different wordnets and the ontology.
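The propagation rule can be sketched as follows, assuming a hypothetical `KnowledgeBase` with one concept table (glossed in English, the common ground language) and one wordnet per project language. Inserting a lemma in one language creates the concept if needed and a dummy entry in every other wordnet; `dummy_entries` lists what remains to be filled in.

```python
from typing import Dict, List, Optional, Tuple

# The seven KYOTO languages, as ISO 639-1 codes.
LANGUAGES = ["en", "nl", "it", "es", "eu", "zh", "ja"]


class KnowledgeBase:
    """Hypothetical store: ontology extension plus one wordnet per language."""

    def __init__(self) -> None:
        self.ontology: Dict[str, str] = {}  # concept id -> English gloss
        # concept id -> lemma; None marks a dummy entry still to be filled in
        self.wordnets: Dict[str, Dict[str, Optional[str]]] = {
            lang: {} for lang in LANGUAGES}

    def insert(self, lang: str, lemma: str, concept: str, gloss_en: str) -> None:
        """Add a lemma and propagate the new concept to every wordnet."""
        self.ontology.setdefault(concept, gloss_en)
        for wordnet in self.wordnets.values():
            wordnet.setdefault(concept, None)
        self.wordnets[lang][concept] = lemma

    def dummy_entries(self) -> List[Tuple[str, str]]:
        """(language, concept) pairs that still lack a lemma."""
        return [(lang, concept)
                for lang, wordnet in self.wordnets.items()
                for concept, lemma in wordnet.items() if lemma is None]


kb = KnowledgeBase()
kb.insert("nl", "waterverontreiniging", "WaterPollution", "water pollution")
print(kb.dummy_entries())  # six entries: every language except "nl"
```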
Evaluation
- Wordnets and ontologies are evaluated across linguistic partners;
- Language and ontology experts will use the Wiki system to build the basic ontology and wordnet layers needed for the extension to the domain;
- Domain experts will use the top and middle layers of the wordnets and ontologies plus the Wiki system to encode the knowledge in their domains and reach consensus;
- The system is tested by integration in a retrieval system.
Evaluation
- Cross-lingual portal:
– show the effects of deep semantic processing for user scenarios
– match queries across languages and cultures
- User queries are processed by Kybots and matched with deep semantic patterns:
– e.g. distinguishing "polluting substance" from "polluted substance"
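"Polluting substance" and "polluted substance" share their words, so keyword matching cannot separate them; matching on the role a substance plays in a mined fact can. A minimal Python sketch, assuming Kybot output of the form (event, role, entity) with hypothetical role names `cause` and `patient` and made-up example facts.

```python
from typing import List, Set, Tuple

Fact = Tuple[str, str, str]  # (event id, role, entity)

# Made-up facts of the kind Kybots might yield for pollution events.
FACTS: List[Fact] = [
    ("p1", "cause", "nitrate"),
    ("p1", "patient", "groundwater"),
    ("p2", "cause", "heavy metals"),
    ("p2", "patient", "surface water"),
]

# Hypothetical deep semantic patterns: query modifier -> role of the substance.
QUERY_ROLES = {"polluting": "cause", "polluted": "patient"}


def answer(query: str) -> Set[str]:
    """Return the entities that play the role the query asks about."""
    role = QUERY_ROLES[query.split()[0]]
    return {entity for _, fact_role, entity in FACTS if fact_role == role}


print(sorted(answer("polluting substance")))  # ['heavy metals', 'nitrate']
print(sorted(answer("polluted substance")))   # ['groundwater', 'surface water']
```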
Knowledge sharing
- Domains share the generic:
– Generic knowledge from the wordnets and the ontology is reused and shared in various domains
– Generic Kybots (knowledge yielding miners) are reused and shared in various domains
- Languages share the knowledge:
– Ontologies (both generic and domain-specific) are shared across languages
– Kybots (both generic and domain-specific) are reused and shared across languages
Kybot sharing
[Figure: Kybot sharing. Kybots are defined as logical expressions over the shared ontology (generic level: Abstract, Physical, Substance, Process, Chemical Reaction; domain level: H2O, CO2, CO2 emission, water pollution), which wordnets and linguistic expressions connect to the words of each language.]
Sharing Kybots
- General conceptual patterns expressed as simple logical expressions: concentrations of substances, causal relations between processes, or conditional states for processes
- Domain text:
– people usually do not use special words in a language to refer to the causal relation itself, but general words such as “cause” or “factor”
– Certain valid conditions that are relevant for the users can be specified in addition to the general ones
- CO2 emissions can be derived from a process involving certain amounts of the substance CO2, but critical levels can be defined in the text miner as a conceptual constraint.
- Limit the ambiguity of interpretation that arises at the generic level to a single interpretation at the domain level.
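The split between a generic pattern and a domain constraint can be sketched as a filter: the generic Kybot extracts every amount of a substance involved in a process, and a domain-level constraint keeps only amounts above a critical level. The `Amount` record, the unit and the threshold are hypothetical placeholders, not real regulatory limits.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Amount:
    """Generic fact: an amount of a substance involved in some process."""
    substance: str
    value: float
    unit: str


# Hypothetical domain constraint: (substance, unit) -> critical level.
# The number is an arbitrary placeholder, not a real limit.
CRITICAL_LEVELS: Dict[Tuple[str, str], float] = {("CO2", "t/year"): 1000.0}


def above_critical(facts: List[Amount]) -> List[Amount]:
    """Keep only facts exceeding the critical level defined for the domain."""
    return [f for f in facts
            if (f.substance, f.unit) in CRITICAL_LEVELS
            and f.value > CRITICAL_LEVELS[(f.substance, f.unit)]]


facts = [Amount("CO2", 250.0, "t/year"),
         Amount("CO2", 4000.0, "t/year"),
         Amount("CH4", 9000.0, "t/year")]
print(above_critical(facts))  # only the 4000.0 t/year CO2 fact
```

The generic extraction stays shareable across domains; only the `CRITICAL_LEVELS` table is domain-specific.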
Major Innovations
- Specific knowledge acquired from different textual sources, domains and languages is grounded in a shared ontology: the specific is anchored in the generic.
- Specific text miners developed for different languages and domains are shared through logical expressions based on the shared ontology.
- Language-based knowledge is anchored to universal knowledge so that all languages can contribute to and benefit from acquisition.
- Community software allows for maintenance, fine-tuning and customization of the wordnets and ontology, and consequently of the information system.
Results of Kyoto
- Open knowledge sharing and anchoring system.
- Ontologies:
– High-level and mid-level concepts needed to accommodate the information in the environmental domain
– Kept at the most generic level to maximize reusability
– Precise enough to yield useful constraints for detecting relations in the domain
– Database and XML data free for the whole community
- Wordnets:
– Existing wordnets extended and harmonized with the ontology
– Database and XML data free for the whole community
- Acquisition tools:
– Software in all 7 languages to automatically extract synsets and synset relations from text within a domain
- Linguistic processors:
– Tokenization, segmentation, tagging, parsing and word-sense disambiguation
– Using existing technology and resources
Table 3: Work package list
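- WP0 Management (lead: VUA): 9 PM, months 1-36
- WP1 User requirements (lead: VUA): 5 PM, months 1-6
- WP2 System design (lead: SYNTHEMA): 12 PM, months 1-6
- WP3 Capture (lead: IRION): 10 PM, months 1-9
- WP4 Indexing (lead: IRION): 11 PM, months 4-12
- WP5 Knowledge mining (lead: EHU): 120 PM, months 7-30
- WP6 Knowledge integration (lead: BBAW): 106 PM, months 4-24
- WP7 Database systems and wiki (lead: CNR-ILC-IIT): 25 PM, months 1-24
- WP8 Domain extension (lead: ECNC): 12 PM, months 13-30
- WP9 Evaluation (lead: ECNC): 20 PM, months 4-33
- WP10 Exploitation (lead: SYNTHEMA): 8 PM, months 19-36
- WP11 Dissemination (lead: VUA): 26 PM, months 1-36
- Total: 364 PM (person-months)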
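As a quick consistency check, the per-work-package person-months in Table 3 add up to the stated total, and knowledge mining (WP5) plus knowledge integration (WP6) account for about 62% of the effort.

```python
# Person-months (PM) per work package, copied from Table 3.
PERSON_MONTHS = {
    "WP0": 9, "WP1": 5, "WP2": 12, "WP3": 10, "WP4": 11, "WP5": 120,
    "WP6": 106, "WP7": 25, "WP8": 12, "WP9": 20, "WP10": 8, "WP11": 26,
}

total = sum(PERSON_MONTHS.values())
mining_and_integration = PERSON_MONTHS["WP5"] + PERSON_MONTHS["WP6"]

print(total)                                    # 364
print(f"{mining_and_integration / total:.0%}")  # 62%
```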
[Figure: Work package structure. Work packages shown: WP1 User requirements, WP2 System design, WP3 Capture, WP4 Index, WP5 Knowledge mining, WP6 Knowledge integration, WP7 Databases & wiki, WP8 Domain extension, WP9 Evaluation. Components shown: text and metadata in XML format, concept miners, term hierarchy, term relations, wordnet, ontology, Kybots, manual revision in the wiki, DEB client and DEB server, domain wordnet, domain ontology, indexing of source data, captured data and facts in XML format, end-user access, user scenarios, manual tests, benchmark data and benchmarking.]
Milestone Overview
- M1 System architecture and design (WP1, WP2, WP9; lead: VUA): month 6
- M2 Generic knowledge layer (WP3, WP4, WP5, WP6; lead: BBAW): month 12
- M3 Intermediate evaluation (WP3-WP9; lead: ECNC): month 21
- M4 Final evaluation (WP3-WP9; lead: VUA): month 33
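Milestone months are counted from the project start. Assuming month 1 is March 2008 (the start date given in the overview), the Python sketch below converts project months to calendar months: M1 falls in August 2008, M2 in February 2009, M3 in November 2009 and M4 in November 2010.

```python
from datetime import date

# Assumption: project month 1 = March 2008 (from "March 2008 - March 2011").
PROJECT_START = date(2008, 3, 1)


def project_month_to_date(month: int) -> date:
    """First day of the calendar month matching a 1-based project month."""
    offset = (PROJECT_START.month - 1) + (month - 1)
    return date(PROJECT_START.year + offset // 12, offset % 12 + 1, 1)


MILESTONES = {"M1": 6, "M2": 12, "M3": 21, "M4": 33}
for name, month in MILESTONES.items():
    print(name, project_month_to_date(month).isoformat())
```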
Complex questions in the cross-lingual environmental portal
Original question (Dutch) -> Translation:
- Welke bedrijven stoten veel schadelijke stoffen uit -> Which companies emit a lot of harmful substances?
- Waar wordt de lucht gemeten -> Where is air being measured?
- Akzo Nobel schuim Apeldoorns Kanaal -> Akzo Nobel foam in the Apeldoorn Canal
- Luchtvervuiling door verkeer -> air pollution by traffic
- Op hoeveel manieren wordt de luchtkwaliteit gemeten? -> In how many different ways is air quality measured?
- ziek door luchtverontreiniging -> sick because of air pollution
- zware metalen in grondwater -> heavy metals in groundwater
- milieu klacht afval batterij -> environmental complaint waste batteries
- welke bedrijven zijn grote luchtvervuilers -> which companies are major air polluters?
- oorzaak luchtverontreining -> cause of air pollution
- groente uit tuin -> vegetables from the garden
- luchtverontreiniging vanuit het ruhrgebied -> air pollution from the Ruhr area
- geluidsreducerende maatregelen -> noise-reducing measures
Complex questions for Aarhus-registered documents of government permits
Original question (Dutch) -> Translation:
- Waar wordt de lucht gemeten -> Where is air being measured?
- Akzo Nobel schuim Apeldoorns Kanaal -> Akzo Nobel foam in the Apeldoorn Canal
- fijn stof emissies Electrabel -> fine dust emissions Electrabel
- Luchtvervuiling door verkeer -> air pollution by traffic
- ziek door luchtverontreiniging -> sick because of air pollution
- zware metalen in grondwater -> heavy metals in groundwater
- milieu klacht afval batterij -> environmental complaint waste batteries
- welke bedrijven zijn grote luchtvervuilers -> which companies are major air polluters?
- oorzaak luchtverontreining -> cause of air pollution
- groente uit tuin -> vegetables from the garden
- luchtverontreiniging vanuit het ruhrgebied -> air pollution from the Ruhr area
- geluidsreducerende maatregelen -> noise-reducing measures