707.009 Foundations of Knowledge Management: Participative Knowledge Acquisition Methods (PowerPoint Presentation)

SLIDE 1

Knowledge Management Institute

707.009 Foundations of Knowledge Management „Participative Knowledge Acquisition Methods“

Markus Strohmaier

Univ. Ass. / Assistant Professor

Knowledge Management Institute, Graz University of Technology, Austria
e-mail: markus.strohmaier@tugraz.at
web: http://www.kmi.tugraz.at/staff/markus


Markus Strohmaier 2011

SLIDE 2

Overview

Knowledge Organization

– Problems of Categorization

Broad Knowledge Bases

– WordNet, CyC, ConceptNet and others

Knowledge Acquisition

– Knowledge and Ontology Engineering
– Collaborative Knowledge Acquisition
– Game-Based Knowledge Acquisition Systems Perspective

SLIDE 3

Review

Homonyms: ambiguous terms (e.g. German "Bank": bench or financial institution)
Homophones: terms that sound alike (e.g. German "Mohr", "Moor")
Homographs: identical spellings (e.g. German "Wach(-)s(-)tube": "Wachstube", guardroom, or "Wachs-Tube", tube of wax)
Synonyms: several terms denote the same concept (e.g. German "Auto", "PKW": car)
Antonyms: opposites (e.g. hard - soft)
Hypernyms/Hyponyms: more abstract / more specific concepts (e.g. vehicle / car)

Formal concept systems often aim to leave little room for interpretation!

– Homonym qualifiers (e.g. „Ring <jewellery>“, „Ring <mathematics>“)
– The correct mapping between concepts and terms can often be determined only from context!

SLIDE 4

Retrospect

Structure and characteristics of

Semantic Representations / Ontologies
WordNet
ConceptNet
CyC

SLIDE 5

Retrospect: ConceptNet

Liu, H. & Singh, P. (2004) ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, Volume 22, Kluwer Academic Publishers.

SLIDE 6

Semiotic Triangle

[Figure: semiotic triangle. Labels: „Real world“, Object, Word, Expression, Symbol, Notion, Concept, Language, Knowledge. Example: Dolphin]

SLIDE 7

The Ontology Engineering Process

Example:

Granularity
Clarity
Completeness
Reusability
Modularity
Consistency
Redundancy
Dissemination
Accessibility
Form of access

SLIDE 8

The Ontology Engineering Process

Requirements specification
Knowledge gathering, building up domain knowledge
Concept identification
Informal representation
Development of an initial taxonomy
Conceptualization and formalization
Integration of any existing ontologies
Evaluation and documentation
Maintenance
Further development, iteration

Figure adapted from Studer

SLIDE 9

Knowledge Acquisition from Text: Hearst Patterns

Automatic Acquisition of Hyponyms from Large Text Corpora (1992)

cereals: rice* wheat*
countries: Cuba Vietnam France*
hydrocarbon: ethylene
substances: bromine* hydrogen*
protozoa: paramecium
liqueurs: anisette* absinthe*
rocks: granite*
substances: phosphorus* nitrogen*
species: steatornis oilbirds
bivalves: scallop*
fungi: smuts* rusts*
fabrics: acrylics* nylon* silk*
antibiotics: ampicillin erythromycin*
institutions: temples king
seabirds: penguins albatross*
flatworms: tapeworms planaria
amphibians: frogs*

SLIDE 10

Hearst Patterns

Automatic Acquisition of Hyponyms from Large Text Corpora (1992)

(S1) The bow lute, such as the Bambara ndang, is plucked and has an individual curved neck for each string.
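A minimal sketch of how a pattern like the one in sentence (S1) could be matched automatically. It assumes a purely lexical approximation: a noun phrase is taken to be the one or two words next to "such as", whereas the original method works on properly detected noun phrases. The names `SUCH_AS` and `extract_hyponym_pairs` are illustrative only.

```python
import re

# Simplified lexical version of the pattern "NP0, such as NP1".
# Assumption: a noun phrase is approximated by one or two adjacent words.
SUCH_AS = re.compile(
    r"(?P<hypernym>\w+(?: \w+)?),? such as (?:the |a |an )?(?P<hyponym>\w+(?: \w+)?)"
)

def extract_hyponym_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    return [(m.group("hyponym"), m.group("hypernym"))
            for m in SUCH_AS.finditer(sentence)]

s1 = ("The bow lute, such as the Bambara ndang, is plucked and has an "
      "individual curved neck for each string.")
pairs = extract_hyponym_pairs(s1)
print(pairs)
```

For (S1) this yields the pair ("Bambara ndang", "bow lute"), i.e. the Bambara ndang is read as a kind of bow lute. Longer noun phrases would need a real chunker instead of the two-word approximation.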

SLIDE 11

Hearst Patterns

Automatic Acquisition of Hyponyms from Large Text Corpora (1992)

Process for identifying further patterns:

1. Decide on a lexical relation, R, that is of interest, e.g., "group/member" (in our formulation this is a subset of the hyponymy relation).
2. Gather a list of terms for which this relation is known to hold, e.g., "England-country". This list can be found automatically using the method described here, bootstrapping from patterns found by hand, or by bootstrapping from an existing lexicon or knowledge base.
3. Find places in the corpus where these expressions occur syntactically near one another and record the environment.
4. Find the commonalities among these environments and hypothesize that common ones yield patterns that indicate the relation of interest.
5. Once a new pattern has been positively identified, use it to gather more instances of the target relation and go to Step 2.
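Steps 3 and 4 of the process above can be sketched as follows. This is an illustration under simplifying assumptions, not the procedure from the paper: the "environment" is reduced to the few words that separate a known hypernym from a known hyponym, and the three sentences form a made-up toy corpus. The function name `collect_environments` and the parameter `max_gap_words` are hypothetical.

```python
import re
from collections import Counter

def collect_environments(sentences, known_pairs, max_gap_words=4):
    """Count the word sequences that separate known (hyponym, hypernym) pairs."""
    environments = Counter()
    for sentence in sentences:
        for hyponym, hypernym in known_pairs:
            # Look for "<hypernym> ... <hyponym>" with a short gap in between.
            pattern = re.compile(
                re.escape(hypernym)
                + r"\W+((?:\w+\W+){1,%d}?)" % max_gap_words
                + re.escape(hyponym),
                re.IGNORECASE,
            )
            match = pattern.search(sentence)
            if match:
                gap = " ".join(re.findall(r"\w+", match.group(1).lower()))
                environments[gap] += 1
    return environments

# Toy corpus and seed pairs (made up for illustration).
sentences = [
    "He visited countries such as England during the tour.",
    "Cereals such as wheat are grown here.",
    "Many countries, including England, signed the treaty.",
]
known_pairs = [("England", "countries"), ("wheat", "cereals")]
environments = collect_environments(sentences, known_pairs)
print(environments.most_common())
```

Environments that recur across several seed pairs (here "such as") are the candidates for new patterns; a candidate would still have to be checked before it is used in Step 5.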

SLIDE 12

Open Mind Common Sense Project

http://commonsense.media.mit.edu/

Cyc uses paid experts to enter facts in CycL, a proprietary language to represent knowledge
ConceptNet leverages user participation

Two types of input:

Template-based acquisition
Freeform input (restricted in length)

SLIDE 13

Open Mind Common Sense Project

http://commonsense.media.mit.edu/

Types of relations:

SLIDE 14

Open Mind Common Sense Project

http://commonsense.media.mit.edu/

Liu, H. & Singh, P. (2004) ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, Volume 22, Kluwer Academic Publishers.

Building ConceptNet

1. Extraction phase

50 extraction rules in regular expression form
Syntactic and semantic constraints are enforced

2. Normalisation phase

Spelling correction, lemmatization (replacing terms with their base form), removal of determiners („the“, „a“)

3. Relaxation phase

Improving the connectivity of the network
Merging duplicate assertions, adding frequency metadata
Inferring assertions: heuristics, utilization of WordNet‘s and FrameNet‘s synsets and class hierarchies

SLIDE 15

Open Mind Common Sense Project

DEMO: Openmind Common Sense

http://commonsense.media.mit.edu/

Example: A car „is a kind of“ animal

SLIDE 16

Constructing ConceptNet

http://commonsense.media.mit.edu/

Liu, H. & Singh, P. (2004) ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, Volume 22, Kluwer Academic Publishers.

Extraction Phase

Each node is an English fragment composed of 4 syntactic constructions:

Verbs (buy, not eat, drive)
Noun phrases (red car, laptop computer)
Prepositional phrases (at work)
Adjectival phrases (very sour, red)

Verbs must precede noun phrases and adjectival phrases, which in turn must precede prepositional phrases

Illustration:

„If you want to own an expensive car then you should have lots of money or rich parents“

SLIDE 17

Constructing ConceptNet

http://commonsense.media.mit.edu/

Liu, H. & Singh, P. (2004) ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, Volume 22, Kluwer Academic Publishers.

Normalization Phase

Unsupervised spellchecker
Stripping of determiners (the, a)

Lemmatization:

Words are stripped of tense (is/are/were -> be)
Plural -> Singular (apples -> apple)

Illustration:

„If you want to own an expensive car then you should have earned lots of money or have rich parents“
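The normalization steps listed above can be illustrated with a small sketch. It is not the ConceptNet implementation: the determiner list, the lemma table and the naive plural rule are hand-written assumptions, and the spellchecking step is left out entirely.

```python
# Tiny hand-written tables; assumptions for illustration only.
DETERMINERS = {"the", "a", "an"}
IRREGULAR_LEMMAS = {"is": "be", "are": "be", "were": "be", "was": "be"}

def lemmatize(word):
    """Map a word to its base form using the toy table and a naive plural rule."""
    if word in IRREGULAR_LEMMAS:
        return IRREGULAR_LEMMAS[word]
    # Naive plural rule: strip a final "s" (but not "ss").
    # A real lemmatizer would consult a lexicon instead.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def normalize(fragment):
    """Lowercase, strip determiners, and lemmatize each remaining word."""
    words = fragment.lower().split()
    return " ".join(lemmatize(w) for w in words if w not in DETERMINERS)

print(normalize("the apples"))
print(normalize("The cars were expensive"))
```

With these toy rules "the apples" becomes "apple" and "The cars were expensive" becomes "car be expensive", which mirrors the two lemmatization examples on the slide.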

SLIDE 18

Constructing ConceptNet

http://commonsense.media.mit.edu/

Liu, H. & Singh, P. (2004) ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, Volume 22, Kluwer Academic Publishers.

Relaxation Phase

Goal: Improve the connectivity of the network

Merge duplicate assertions
Add additional metadata field „frequency“

f counts the number of times a fact is uttered in the OMCS corpus.
i counts how many times an assertion was inferred during the relaxation phase

Produce „intermediate“ knowledge such as semantic and lexical generalisations
Helps bridge other knowledge and to improve the connectivity of the knowledgebase

„Lifting“ knowledge by leveraging the IsA relationship (example: Apple IsA fruit)

„Lifting“ knowledge by leveraging adjectival modifiers
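A rough sketch of the two bookkeeping ideas from the relaxation phase: merging duplicate assertions while counting them in f, and recording inferred assertions in i. The lifting rule shown here is a simplified, hypothetical heuristic (a property shared by at least two children of the same IsA parent is lifted to the parent); the relation name `PropertyOf`, the threshold `min_children` and the sample assertions are assumptions made for illustration.

```python
from collections import defaultdict

def merge_duplicates(assertions):
    """Merge identical (relation, concept1, concept2) triples and count them in f."""
    table = defaultdict(lambda: {"f": 0, "i": 0})
    for triple in assertions:
        table[triple]["f"] += 1
    return table

def lift_via_isa(table, min_children=2):
    """Hypothetical lifting rule: a property shared by at least `min_children`
    children of the same IsA parent is inferred for the parent (counted in i)."""
    parents = defaultdict(set)
    for relation, child, parent in list(table):
        if relation == "IsA":
            parents[parent].add(child)
    for parent, children in parents.items():
        support = defaultdict(set)
        for relation, concept, value in list(table):
            if relation == "PropertyOf" and concept in children:
                support[value].add(concept)
        for value, supporters in support.items():
            if len(supporters) >= min_children:
                table[("PropertyOf", parent, value)]["i"] += 1
    return table

# Made-up sample assertions.
assertions = [
    ("IsA", "apple", "fruit"),
    ("IsA", "apple", "fruit"),        # duplicate, merged into f = 2
    ("IsA", "banana", "fruit"),
    ("PropertyOf", "apple", "sweet"),
    ("PropertyOf", "banana", "sweet"),
]
table = lift_via_isa(merge_duplicates(assertions))
print(dict(table))
```

In this toy run the duplicated "apple IsA fruit" ends up with f = 2, and the lifted assertion "fruit PropertyOf sweet" has f = 0 and i = 1, since it was never uttered but inferred once.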

SLIDE 19

Constructing ConceptNet

http://commonsense.media.mit.edu/

Liu, H. & Singh, P. (2004) ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, Volume 22, Kluwer Academic Publishers.

Relaxation Phase

Goal: Improve the connectivity of the network

Resolve vocabulary discrepancies and morphological variations (bike / bicycle)
Adding SuperThematicKline to reconcile action/state differences (relax/relaxation) or adjective/nominal differences (sad/sadness)
Utilizing WordNet and FrameNet‘s verb synonym-sets and class hierarchies

SLIDE 20

Constructing ConceptNet

http://commonsense.media.mit.edu/

Liu, H. & Singh, P. (2004) ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, Volume 22, Kluwer Academic Publishers.

Output:

f counts the number of times a fact is uttered in the OMCS corpus.
i counts how many times an assertion was inferred during the relaxation phase

SLIDE 21

Human Computation

Wired News on Steve Fossett gone missing

http://www.wired.com/science/planetearth/news/2007/09/fossett_search_expands

Also see: Wired Magazine article: “Inside the High Tech Hunt for a Missing Silicon Valley Legend” http://www.wired.com/techbiz/people/magazine/15-08/ff_jimgray?currentPage=all

SLIDE 22

Mechanical Turk

SLIDE 23

Mechanical Turk

Demo: http://www.mturk.com

Examples: Powerset, Steve Fossett
Criticism: minimum wage, exploitation of workers, and more

SLIDE 24

Taylorism & Fordism

(Chaplin's Modern Times (1936), mins 04:00-05:30)

SLIDE 25

Now, Luis von Ahn, Ass. Prof. at CMU, asks:

What if…

This work would be fun?
And people would love to do it?
Even if they do not get paid?

Recommended lecture: http://video.google.com/videoplay?docid=-8246463980976635143

SLIDE 26

Games with a Purpose / Human Computation

EXAMPLE

The Problem:

Computers are not so good at understanding images
This restricts the ability to provide effective image search engines
Image search relies on accurate metadata
Accurate metadata is costly to generate

Observation: humans can easily identify complex concepts that consist of multiple parts in images, such as trees, bicycles, …
Idea: human computation

SLIDE 27

Games with a Purpose / Human Computation

The Google Image Labeler

Slide: Luis von Ahn

SLIDE 28

Games with a Purpose / Human Computation

The ESP Game

[Figure: labels entered for an image: Purse, Bag, Handbag, brown]

Addresses two problems at the same time:
1) metadata generation and
2) metadata validation

L. von Ahn and L. Dabbish. Labeling images with a computer game. In Proceedings of the ACM CHI, 2004.

SLIDE 29

Games with a Purpose / Human Computation

Slide: Luis von Ahn

SLIDE 30

The Google Image Labeler

SLIDE 31

Games with a Purpose / Human Computation

DEMO: Let‘s play the Google Image Labeling Game:

http://images.google.com/imagelabeler/

SLIDE 32

Verbosity

Luis von Ahn, Mihir Kedia and Manuel Blum, Verbosity: a game for collecting common-sense facts, CHI '06: Proceedings of the SIGCHI conference on Human Factors in computing systems, pp. 75-78, 2006.

Instead of asking users to enter true or false statements, or to rate statements, Verbosity leverages the fact that a game exists that requires users to state common-sense facts: Taboo™

Example: Players have to describe the word „apple“ without saying „apple“ and without saying „red“, „pie“, „fruit“, „macintosh“ etc.
The player has to give a good enough description of the word to get their teammates to guess the right concept („you can make juice out of it“)
The game requires players to say a list of common-sense facts about each word in order to get their teammates to guess it

SLIDE 33

Verbosity

Luis von Ahn, Mihir Kedia and Manuel Blum, Verbosity: a game for collecting common-sense facts, CHI '06: Proceedings of the SIGCHI conference on Human Factors in computing systems, pp. 75-78, 2006.

One of the players is chosen as the “Narrator” while the other is the “Guesser.”
The Narrator gets a secret word and must get the Guesser to type that word by sending hints to the Guesser.
The hints take the form of sentence templates with blanks to be filled in. The Narrator can fill in the blanks with any word they wish except the secret word (or any string containing the secret word).
For example, if the word is LAPTOP, the Narrator might say: “it has a KEYBOARD.”
The Guesser guesses. The Narrator can see all of these guesses, and can tell the Guesser whether each is „hot“ or „cold“.

Fact: Laptop contains a keyboard

SLIDE 34

Verbosity

Luis von Ahn, Mihir Kedia and Manuel Blum, Verbosity: a game for collecting common-sense facts, CHI '06: Proceedings of the SIGCHI conference on Human Factors in computing systems, pp. 75-78, 2006.

Players can describe the secret word by using sentence templates only. Reasons:

Disambiguation
Categorization
Parsing
Fun

Template examples:

__ is a kind of __
__ is used for __
__ is typically near/in/on __
__ is the opposite of __
__ is related to __
__ (wildcard)
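A minimal sketch of how template-based hints could be built and checked against the rule that a hint must not contain the secret word. It is not the game's actual code: the dictionary keys, the function name `make_hint`, and the choice to display the first blank as "it" are assumptions made for illustration.

```python
# A few of the sentence templates; the keys are hypothetical names.
TEMPLATES = {
    "kind_of": "{} is a kind of {}",
    "used_for": "{} is used for {}",
    "opposite_of": "{} is the opposite of {}",
    "related_to": "{} is related to {}",
}

def make_hint(template_key, filler, secret_word):
    """Fill a template's blank; reject fillers that contain the secret word."""
    if secret_word.lower() in filler.lower():
        raise ValueError("hint must not contain the secret word")
    # Assumption: the first blank stands for the secret word and is shown as "it".
    return TEMPLATES[template_key].format("it", filler.upper())

hint = make_hint("used_for", "typing", secret_word="laptop")
print(hint)

try:
    make_hint("related_to", "laptops", secret_word="laptop")
except ValueError as error:
    rejected = str(error)
    print("rejected:", rejected)
```

Because every hint is an instance of a known template, each accepted hint maps directly to a relation between the secret word and the filler, which is what makes the collected facts easy to parse.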

SLIDE 35

Verbosity

Google Techtalks, Luis von Ahn, 2006

http://video.google.com/videoplay?docid=-8246463980976635143

[Figure: NARRATOR screen with the secret word MILK and the hints „Is typically near CEREAL“, „Is a LIQUID“, …; GUESSER screen showing the hint „Is typically near CEREAL“ and the guesses CEREAL and MILK!]

SLIDE 36

Verbosity

Google Techtalks, Luis von Ahn, 2006

http://video.google.com/videoplay?docid=-8246463980976635143

[Figure: the NARRATOR (secret word MILK) sends common-sense facts about milk; the GUESSER answers MILK]

SLIDE 37

Verbosity

Luis von Ahn, Mihir Kedia and Manuel Blum, Verbosity: a game for collecting common-sense facts, CHI '06: Proceedings of the SIGCHI conference on Human Factors in computing systems, pp. 75-78, 2006.

Validation and strategies for assuring accuracy of facts:

Both the Guesser and the Narrator receive points whenever the Guesser enters the correct word
Success of the Guesser: time taken to enter the proper word as an indicator for the quality of the Narrator‘s statements
Random pairing of the players: avoiding manipulation
Description testing: single player mode

SLIDE 38

Verbosity

Google Techtalks, Luis von Ahn, 2006

http://video.google.com/videoplay?docid=-8246463980976635143

Asymmetric verification game

[Figure: the NARRATOR receives the INPUT (SECRET) and produces an OUTPUT; the GUESSER receives an INPUT and produces an OUTPUT]

Properties: Often fun, verified output

SLIDE 39

Verbosity

Google Techtalks, Luis von Ahn, 2006

http://video.google.com/videoplay?docid=-8246463980976635143

Symmetric verification game

[Figure: Player 1 and Player 2 each receive an INPUT and produce OUTPUT1 and OUTPUT2]

If Output1 = Output2, both players get points
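The agreement rule of a symmetric verification game can be sketched in a few lines. This is an illustration under assumptions, not the ESP Game's implementation: labels are compared case-insensitively, any overlap counts as agreement, and the function name `symmetric_round` and the `points` parameter are hypothetical.

```python
def symmetric_round(labels_player1, labels_player2, points=1):
    """Return (agreed_labels, score1, score2) for one shared input."""
    set1 = {label.strip().lower() for label in labels_player1}
    set2 = {label.strip().lower() for label in labels_player2}
    agreed = sorted(set1 & set2)
    # Both players are rewarded only if at least one output matches.
    score = points if agreed else 0
    return agreed, score, score

match = symmetric_round(["purse", "bag", "brown"], ["Handbag", "Bag"])
no_match = symmetric_round(["purse"], ["handbag"])
print(match, no_match)
```

An agreed label is both generated and validated in the same step, since two players who cannot communicate produced it independently for the same input.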

SLIDE 40

Verbosity

Luis von Ahn, Mihir Kedia and Manuel Blum, Verbosity: a game for collecting common-sense facts, CHI '06: Proceedings of the SIGCHI conference on Human Factors in computing systems, pp. 75-78, 2006.

Symmetric vs. Asymmetric games

Symmetric games: The ESP game
Constraint is number of outputs per input
If a given input has too many outputs, a symmetric game is not going to work: both players are never going to agree on the same output.

Asymmetric games: Verbosity
Constraint is number of inputs that yield the same output
If there are too many inputs that yield the same output, then given only the output you‘ll never guess what the input was.

Example hints:
> Is typically near CEREAL
> Is a LIQUID

SLIDE 41

Translating Wikipedia?

SLIDE 42

Examples: Fun with Data

http://games.freebaseapps.com/
http://www.gwap.com

SLIDE 43

Example

SLIDE 44

Crime Spotting As An Online Game

“Whoever has a CCTV camera, be it the police, local authorities or business or home owners can sign up to have their cameras watched. We hope to include police cameras very soon.”

Players are awarded one point for spotting a suspected crime and three points if they see someone committing an actual crime. They also lose points if the camera operator rules the alert was not a crime. Players who help catch the most criminals each month will win prizes up to £1,000.

Source: http://www.telegraph.co.uk/news/uknews/crime/6263882/Snoopers-could-win-1000-prizes-for-monitoring-CCTV-cameras-on-the-internet.html

SLIDE 45

Tagging Images with your Mind

Researchers at Microsoft have invented a system for tagging images by reading brain scans from an electroencephalograph (EEG). Tagging images is an important task because many images on the web are unlabeled and have no semantic information. This new method allows an appropriate tag to be generated by an AI algorithm interpreting the EEG scan of a person's brain while they view an image. The person need only view the image for as little as 500 ms. Other current methods for generating tags include flat out paying people to do it manually, putting the task on Amazon Mechanical Turk, or using Google Image Labeler.

Source: http://scitedaily.wordpress.com/2009/11/25/tag-images-with-your-mind/

SLIDE 46

Any questions?
See you next week!