 
Knowledge Management Institute
707.009 Foundations of Knowledge Management
"Participative Knowledge Acquisition Methods"
Markus Strohmaier
Univ. Ass. / Assistant Professor
Knowledge Management Institute
Graz University of Technology, Austria
e-mail: markus.strohmaier@tugraz.at
web: http://www.kmi.tugraz.at/staff/markus
Markus Strohmaier 2011
Overview
• Knowledge Organization
  – Problems of Categorization
• Broad Knowledge Bases
  – WordNet, Cyc, ConceptNet and others
• Knowledge Acquisition
  – Knowledge and Ontology Engineering
  – Collaborative Knowledge Acquisition Systems Perspective
  – Game-Based Knowledge Acquisition
Review
Homonyms: ambiguous terms (e.g. German "Bank", meaning both bench and bank)
Homophones: terms that sound alike (e.g. German "Mohr", "Moor")
Homographs: identical spellings (e.g. German "Wach(-)s(-)tube", read as Wachs-tube or Wach-stube)
Synonyms: several terms denote the same concept (e.g. German "Auto", "PKW")
Antonyms: opposites (e.g. hard - soft)
Hypernyms/hyponyms: more abstract / more specific concepts (e.g. vehicle / passenger car)
Formal concept systems often aim to leave little room for interpretation!
  – Homonym qualifiers (e.g. "Ring <piece of jewelry>", "Ring <mathematics>")
  – The correct assignment of concepts to terms can often only be interpreted from the context!
Retrospect
Structure and characteristics of
• Semantic Representations / Ontologies
• WordNet
• ConceptNet
• Cyc
Retrospect: ConceptNet
Liu, H. & Singh, P. (2004) ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, Volume 22, Kluwer Academic Publishers.
Semiotic Triangle
[Figure: the semiotic triangle. Corners: object (the "real world"); word / expression / symbol (language), with the example "Dolphin"; concept (knowledge).]
The Ontology Engineering Process
Example:
• Granularity
• Clarity
• Completeness
• Reusability
• Modularity
• Consistency
• Redundancy
• Dissemination
• Accessibility
• Form of access
The Ontology Engineering Process
• Requirements specification
• Knowledge gathering, building up domain knowledge
• Concept identification
• Informal representation
• Development of an initial taxonomy
• Conceptualization and formalization
• Integration of any existing ontologies
• Evaluation and documentation
• Maintenance
• Further development, iteration
Figure adapted from Studer
Knowledge Acquisition from Text: Hearst Patterns
Automatic Acquisition of Hyponyms from Large Text Corpora (1992)
cereals: rice* wheat*
countries: Cuba Vietnam France*
hydrocarbon: ethylene
substances: bromine* hydrogen*
protozoa: paramecium
liqueurs: anisette* absinthe*
rocks: granite*
substances: phosphorus* nitrogen*
species: steatornis oilbirds
bivalves: scallop*
fungi: smuts* rusts*
fabrics: acrylics* nylon* silk*
antibiotics: ampicillin erythromycin*
institutions: temples king
seabirds: penguins albatross*
flatworms: tapeworms planaria
amphibians: frogs*
Hearst Patterns
Automatic Acquisition of Hyponyms from Large Text Corpora (1992)
(S1) The bow lute, such as the Bambara ndang, is plucked and has an individual curved neck for each string.
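A minimal sketch of how a "such as" pattern could be applied to sentence (S1). The regular expression and the function name extract_hyponym are assumptions made for illustration. Noun phrases are approximated very crudely as an optional determiner followed by words, whereas a real system would use a noun-phrase chunker, so this version only behaves well on short sentences like (S1).

```python
import re

# Crude stand-in for "NP0, such as NP1": a noun phrase is approximated as an
# optional determiner followed by one or more words (no real chunking).
SUCH_AS = re.compile(
    r"(?:\b(?:the|a|an)\s+)?(?P<hypernym>\w+(?:\s\w+)*?)\s*,?\s+such as\s+"
    r"(?:(?:the|a|an)\s+)?(?P<hyponym>\w+(?:\s\w+)*?)(?=\s*[,.;]|$)",
    re.IGNORECASE,
)


def extract_hyponym(sentence):
    """Return (hyponym, hypernym) for the first 'such as' match, else None."""
    match = SUCH_AS.search(sentence)
    if match is None:
        return None
    return (match.group("hyponym"), match.group("hypernym"))


S1 = ("The bow lute, such as the Bambara ndang, is plucked and has an "
      "individual curved neck for each string.")
print(extract_hyponym(S1))
```

For (S1) this yields the pair (Bambara ndang, bow lute), i.e. the Bambara ndang is read as a kind of bow lute.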
Hearst Patterns
Automatic Acquisition of Hyponyms from Large Text Corpora (1992)
Process for identifying further patterns:
1. Decide on a lexical relation, R, that is of interest, e.g., "group/member" (in our formulation this is a subset of the hyponymy relation).
2. Gather a list of terms for which this relation is known to hold, e.g., "England-country". This list can be found automatically using the method described here, bootstrapping from patterns found by hand, or by bootstrapping from an existing lexicon or knowledge base.
3. Find places in the corpus where these expressions occur syntactically near one another and record the environment.
4. Find the commonalities among these environments and hypothesize that common ones yield patterns that indicate the relation of interest.
5. Once a new pattern has been positively identified, use it to gather more instances of the target relation and go to Step 2.
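Steps 3 and 4 of this process can be sketched roughly as follows. The toy corpus, the seed pairs, the max_gap window and the function name record_environments are all assumptions made for this illustration. "Syntactically near" is approximated by a short character window between the two terms, which is much weaker than real syntactic analysis.

```python
import re
from collections import Counter

# Toy corpus and seed pairs, invented for illustration (not from the source).
CORPUS = [
    "Many countries, such as England, have a monarch.",
    "He visited cities such as Graz last year.",
    "Graz and other cities attract students.",
]
SEED_PAIRS = [("England", "countries"), ("Graz", "cities")]  # (hyponym, hypernym)


def record_environments(corpus, seed_pairs, max_gap=30):
    """Step 3: record the text between co-occurring seed terms.
    Step 4: count identical environments so common ones stand out."""
    gap = r"(.{1,%d}?)" % max_gap  # short window between the two terms
    counts = Counter()
    for sentence in corpus:
        for hyponym, hypernym in seed_pairs:
            hypo = r"\b" + re.escape(hyponym) + r"\b"
            hyper = r"\b" + re.escape(hypernym) + r"\b"
            # Hypernym first, e.g. "countries, such as England"
            m = re.search(hyper + gap + hypo, sentence, re.IGNORECASE)
            if m:
                env = m.group(1).strip(" ,").lower()
                counts["HYPERNYM " + env + " HYPONYM"] += 1
            # Hyponym first, e.g. "Graz and other cities"
            m = re.search(hypo + gap + hyper, sentence, re.IGNORECASE)
            if m:
                env = m.group(1).strip(" ,").lower()
                counts["HYPONYM " + env + " HYPERNYM"] += 1
    return counts


environments = record_environments(CORPUS, SEED_PAIRS)
print(environments.most_common())
```

Environments that recur across several seed pairs are the candidates for new patterns in step 5. On this toy corpus "HYPERNYM such as HYPONYM" is seen twice and "HYPONYM and other HYPERNYM" once.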
Open Mind Common Sense Project
http://commonsense.media.mit.edu/
Cyc uses paid experts to enter facts in CycL, a proprietary language to represent knowledge.
ConceptNet leverages
• User participation
Two types of input:
• Template-based acquisition
• Freeform input (restricted in length)
Open Mind Common Sense Project
http://commonsense.media.mit.edu/
Types of relations: [figure not preserved in this transcript]
Open Mind Common Sense Project
http://commonsense.media.mit.edu/
Source: Liu & Singh (2004)
Building ConceptNet
1. Extraction phase
   50 extraction rules in regular expression form
   Syntactic and semantic constraints are enforced
2. Normalisation phase
   Spelling correction, lemmatization (replacing terms with their base form), removal of determiners ("the", "a")
3. Relaxation phase
   Inferring assertions: improving the connectivity of the network
   Merging duplicate assertions, adding frequency metadata, heuristics, utilization of WordNet's and FrameNet's synsets and class hierarchies
Open Mind Common Sense Project
http://commonsense.media.mit.edu/
DEMO: Open Mind Common Sense
Example: A car "is a kind of" animal
Constructing ConceptNet
http://commonsense.media.mit.edu/
Source: Liu & Singh (2004)
Extraction Phase
Each node is an English fragment composed of 4 syntactic constructions:
• Verbs (buy, not eat, drive)
• Noun phrases (red car, laptop computer)
• Prepositional phrases (at work)
• Adjectival phrases (very sour, red)
Verbs must precede noun phrases and adjectival phrases, which in turn must precede prepositional phrases.
Illustration: "If you want to own an expensive car then you should have lots of money or rich parents"
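A rough sketch of what regular-expression extraction rules might look like. The two rules below are hypothetical and are not taken from ConceptNet's actual rule set. The relation name "Prerequisite" is an illustrative label only, while "IsA" and the phrase "is a kind of" appear elsewhere in these slides. The syntactic ordering constraints on nodes are not enforced here.

```python
import re

# Hypothetical extraction rules: (relation name, pattern with left/right slots).
RULES = [
    ("IsA", re.compile(
        r"^(?:(?:an?|the)\s+)?(?P<left>.+?) is a kind of (?P<right>.+?)\.?$",
        re.IGNORECASE)),
    ("Prerequisite", re.compile(  # illustrative label, not ConceptNet's
        r"^if you want to (?P<left>.+?),? then you should (?P<right>.+?)\.?$",
        re.IGNORECASE)),
]


def extract(sentence):
    """Return (relation, left, right) for the first matching rule, else None."""
    for relation, pattern in RULES:
        match = pattern.match(sentence.strip())
        if match:
            return (relation, match.group("left"), match.group("right"))
    return None


print(extract("A car is a kind of animal"))
print(extract("If you want to own an expensive car then you should "
              "have lots of money or rich parents"))
```

The extracted fragments still contain determiners and inflected forms (for example "own an expensive car"). Cleaning those up is left to the normalization phase.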
Constructing ConceptNet
http://commonsense.media.mit.edu/
Source: Liu & Singh (2004)
Normalization Phase
• Unsupervised spellchecker
• Stripping of determiners (the, a)
Lemmatization:
• Words are stripped of tense (is/are/were -> be)
• Plural -> Singular (apples -> apple)
Illustration: "If you want to own an expensive car then you should have earned lots of money or have rich parents"
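The determiner stripping and lemmatization steps can be sketched as below. The lemma table is a tiny hand-written stand-in containing only the examples from this slide; a real system would use a proper lemmatizer. Including "an" among the determiners is an assumption, and spell checking is omitted entirely.

```python
# Determiners to strip ("an" is an added assumption; the slide lists the, a).
DETERMINERS = {"the", "a", "an"}

# Tiny hand-written lemma table for illustration only.
LEMMAS = {"is": "be", "are": "be", "were": "be", "apples": "apple"}


def normalize(fragment):
    """Lowercase, drop determiners, and map known word forms to base forms."""
    tokens = fragment.lower().split()
    tokens = [t for t in tokens if t not in DETERMINERS]
    return " ".join(LEMMAS.get(t, t) for t in tokens)


print(normalize("The apples were red"))
print(normalize("own an expensive car"))
```

With this table, "The apples were red" becomes "apple be red" and "own an expensive car" becomes "own expensive car".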
Constructing ConceptNet
http://commonsense.media.mit.edu/
Source: Liu & Singh (2004)
Relaxation Phase
Goal: Improve the connectivity of the network
Merge duplicate assertions
Add additional metadata field "frequency"
• f counts the number of times a fact is uttered in the OMCS corpus.
• i counts how many times an assertion was inferred during the relaxation phase.
"Lifting" knowledge by leveraging the IsA relationship (example: Apple IsA fruit)
"Lifting" knowledge by leveraging adjectival modifiers
Produce "intermediate" knowledge such as semantic and lexical generalisations
Helps bridge other knowledge and improve the connectivity of the knowledge base
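One plausible reading of merging and IsA-based "lifting", as a sketch. The toy utterances, the relation name "PropertyOf", the min_support threshold and the way i is computed (the number of children supporting an inferred assertion) are all assumptions for illustration, not ConceptNet's documented procedure.

```python
from collections import Counter, defaultdict

# Toy utterances as (relation, left, right), invented for illustration.
UTTERANCES = [
    ("IsA", "apple", "fruit"),
    ("IsA", "banana", "fruit"),
    ("PropertyOf", "apple", "sweet"),
    ("PropertyOf", "apple", "sweet"),   # duplicate utterance
    ("PropertyOf", "banana", "sweet"),
    ("PropertyOf", "apple", "red"),
]


def merge_duplicates(assertions):
    """Merge duplicates; the count plays the role of f (times uttered)."""
    return Counter(assertions)


def lift_by_isa(f_counts, relation="PropertyOf", min_support=2):
    """Infer (relation, parent, value) when at least min_support children
    of parent share the same value. The returned count plays the role of i."""
    parents = defaultdict(set)
    for rel, child, parent in f_counts:
        if rel == "IsA":
            parents[child].add(parent)
    support = Counter()
    for rel, concept, value in f_counts:
        if rel == relation:
            for parent in parents.get(concept, ()):
                support[(relation, parent, value)] += 1
    return {a: n for a, n in support.items() if n >= min_support}


f = merge_duplicates(UTTERANCES)
i = lift_by_isa(f)
print(f[("PropertyOf", "apple", "sweet")])
print(i)
```

Here the duplicate "apple is sweet" utterances merge into one assertion with f = 2, and because both apple and banana are fruits that are sweet, the assertion (PropertyOf, fruit, sweet) is inferred with i = 2. "Red" is not lifted because only one child supports it.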