Vocabulary management and SKOS Putting Business in the Lead Jan - - PowerPoint PPT Presentation




SLIDE 1

Vocabulary management and SKOS

Putting Business in the Lead

Jan Voskuil (Taxonic) September 5th, 2014, Leipzig SEMANTiCS 2014

SLIDE 2

Introduction

Jan Voskuil
Taxonic (co-founder)
Consultancy in Semantic Technology

“SKOS is used for findability, but should also be used for vocabulary management in organizations. Business owns the dictionary, not IT”

  • What are dictionaries and what are they for?
  • SKOS: Tooling and benefits
  • Practicalities

SLIDE 3

Dienst Justitiële Inrichtingen (DJI)

  • Custodial Institutions Agency
    – Ca. 10,000 employees
    – Ca. 70,000 inmates per year
    – Ca. 50 facilities
  • Four groups of detainees
    – Adult detainees
    – Juvenile offenders
    – Patients in forensic care
    – Foreign nationals

SLIDE 4

Dictionaries: Benefits

  • Knowledge management
  • Quality of information
  • Manageability
    – If your systems contain 100K+ attribute names, then they contain unstructured information (Dave McComb)
  • Findability
    – Document (DMS)
    – Data (DBMS)
  • Exchangeability

SLIDE 5

How many key words are enough?

  • Zipf’s Law
  • 5000 words are enough to understand 95% of any corpus. For the other 5% you need to know the other 200,000 words

[Figure: word-frequency chart; labels: “Frequency of the most frequent word”, “Frequency of the second most frequent word”]

Source: Tiberius and Schoonheim, A Frequency Dictionary of Dutch, 2014

  • Pocket dictionary: 5K
  • General dictionary: 100K
  • Lexicographic dictionary: 1M+
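
The Zipf’s Law argument on this slide can be illustrated with a small sketch. It assumes the idealized form of the law, in which the word at rank r occurs with a frequency proportional to 1/r^s. The exponent s, the vocabulary size and the function names are assumptions made for illustration, not values from the talk. The share it computes depends strongly on those assumptions, so it need not reproduce the corpus-based 95% figure quoted above.

```python
def zipf_weight(rank: int, exponent: float = 1.0) -> float:
    """Relative frequency of the word at a given rank (rank 1 = most frequent)."""
    return 1.0 / rank ** exponent


def coverage(top_k: int, vocabulary_size: int, exponent: float = 1.0) -> float:
    """Share of all word occurrences covered by the top_k most frequent words.

    Assumes an idealized Zipf distribution over vocabulary_size distinct words.
    """
    total = sum(zipf_weight(r, exponent) for r in range(1, vocabulary_size + 1))
    covered = sum(zipf_weight(r, exponent) for r in range(1, top_k + 1))
    return covered / total


if __name__ == "__main__":
    # With exponent 1, the most frequent word is twice as frequent as the second.
    print(zipf_weight(1) / zipf_weight(2))
    # Share of a corpus covered by a 5K "pocket dictionary" out of 205K words,
    # under the idealized law (an assumption, not a measured value).
    print(f"{coverage(5_000, 205_000):.1%}")
```

Raising the exponent concentrates more of the corpus in the top-ranked words, which is the effect that makes a small “pocket dictionary” worthwhile.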

SLIDE 6

The Real World

Dictionary                       | Owner
---------------------------------+-----------
Begrippenwoordenboek DJI         | Dept X
Begrippenlijst Project Y         | Project Y
Mega Glossary                    | ICT-Dept
Information chain dictionaries   |
  Ketenwoordenboek Strafrecht    | JustID
  Ketenwoordenboek Vreemdelingen | JustID
  Justitiethesaurus              | WODC
Data Dictionaries                |
  Gegevenswoordenboek MITS       | ICT-Dept
  Datadictionary Tulp MIR        | ICT-Dept

… It just does not work!

  • What is the correct definition of x?
  • Who decides this?
  • My project introduces new terms, how can I get these accepted?

SLIDE 7

OLD SITUATION                        | NEW SITUATION
-------------------------------------+--------------------------
Various lists                        | Single source of truth
Various versions                     | Single source of truth
Word documents                       | Intranet (Internet)
Distribution by e-mail               | Intranet (Internet)
Endless discussions                  | Clear-cut governance
Responsibility of IT dept or project | Ownership by the business

SLIDE 8

Some How To’s

  • Keep the dictionary lean and mean
    – Create a “pocket dictionary”
    – Example: 1200 key words
  • Governance: be pragmatic
  • Ownership within the business!
  • Use clear, explanatory descriptions
    – Language of the work force
    – Avoid legal speak!
  • Dictionary maintenance is a continuous process!
    – Release cycle
    – One major, four minor releases per year
    – Major release is approved by senior executives

SLIDE 9

Why SKOS is so great: just enough semantics

  • Semantic relations
    – Compare one-dimensional lists
  • A LIMITED number of STANDARDIZED semantic relations
    – Broader, Narrower, Related Term
    – Semantics is sufficiently vague
  • Intuitive, easy to understand
    – Ideal for “pidginization”
    – Use is far broader than Class Diagrams, ERDs and ontologies
  • Only the most relevant info
  • “GENERALIZED CLASSIFICATION”

Examples of “narrower”:

Justitiabele (“Detainee”)
    narrower: Adult detainee, Juvenile offender, Foreign national, Patient in forensic care
Criminal Law
    narrower: Penal Institution
Sex
    narrower: Male, Female, Unknown, Undisclosed
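
The “narrower” examples on this slide can be written down as a very small data structure. The following is a minimal sketch that uses plain Python dictionaries rather than an RDF library; the variable and function names are hypothetical, and the concept labels are the ones shown on the slide. It shows how the broader direction can be derived from the narrower links, and how a tool might walk the hierarchy.

```python
from collections import defaultdict

# Direct "narrower" links, keyed by the broader concept (labels from the slide).
NARROWER = {
    "Justitiabele": {
        "Adult detainee",
        "Juvenile offender",
        "Foreign national",
        "Patient in forensic care",
    },
    "Criminal Law": {"Penal Institution"},
    "Sex": {"Male", "Female", "Unknown", "Undisclosed"},
}


def broader_index(narrower):
    """Invert the narrower links: skos:broader is the inverse of skos:narrower."""
    index = defaultdict(set)
    for parent, children in narrower.items():
        for child in children:
            index[child].add(parent)
    return dict(index)


def all_narrower(concept, narrower):
    """Collect every concept reachable by following narrower links.

    skos:narrower itself is not transitive; this walk corresponds to what
    SKOS calls skos:narrowerTransitive.
    """
    found = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for child in narrower.get(current, ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


if __name__ == "__main__":
    print(sorted(all_narrower("Sex", NARROWER)))
    print(broader_index(NARROWER)["Penal Institution"])
```

The same relation is used here for subtypes of a detainee, for a topic within criminal law and for the values of a code list, which is the sense in which the slide calls it a generalized classification.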

SLIDE 10

Why SKOS is so great: tooling

SLIDE 11

Tooling: PoolParty Thesaurus Manager

SLIDE 12

End User View

SLIDE 13

SKOS is an Open Standard: Project Linking

SLIDE 14

http://vocabulary.wolterskluwer.de

SLIDE 15

prefLabel: Unfallverhütung (German: “accident prevention”)

From Wolters Kluwer:
  • Alternative labels
  • Broaders
  • Narrowers
  • Related terms

Other thesauri on the web:
  • From DBpedia
  • From lod.gesis.org
  • From eurovoc.org

SLIDE 16

prefLabel: Unfallverhütung (German: “accident prevention”)

From Wolters Kluwer:
  • Alternative labels
  • Broaders
  • Narrowers
  • Related terms

Other thesauri on the web:
  • From DBpedia
  • From lod.gesis.org
  • From eurovoc.org

DJI and the POLICE have very different meanings for the word ARRESTANT

DO:
> RESPECT DIFFERENCES BETWEEN ORGANIZATIONS
> MAKE LEXICOGRAPHIC DIFFERENCES EXPLICIT USING LINKED THESAURI

DON’T:
> TRY MAKING ALL ORGANIZATIONS USE EXACTLY THE SAME LANGUAGE
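
One way to picture “linked thesauri” for the ARRESTANT case is sketched here. Everything concrete in it is an assumption made for illustration: the example.org URIs are hypothetical, the definitions are placeholders because the slides do not give them, and the choice of skos:closeMatch (a SKOS mapping property for concepts that are similar but not interchangeable) is one possible way to record the difference, not necessarily the one used in the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    uri: str
    pref_label: str
    definition: str


# Hypothetical URIs and placeholder definitions (not taken from the slides).
DJI_ARRESTANT = Concept(
    uri="http://example.org/dji/arrestant",
    pref_label="Arrestant",
    definition="<definition as used by DJI>",
)
POLICE_ARRESTANT = Concept(
    uri="http://example.org/police/arrestant",
    pref_label="Arrestant",
    definition="<definition as used by the police>",
)

# Mapping links between the two thesauri, as (subject, property, object).
# skos:closeMatch is an assumed choice; it keeps both concepts distinct.
MAPPINGS = {
    (DJI_ARRESTANT.uri, "skos:closeMatch", POLICE_ARRESTANT.uri),
}


def mapped_concepts(uri, mappings):
    """Return (property, other_uri) pairs for mapping links that touch uri.

    SKOS defines closeMatch, exactMatch and relatedMatch as symmetric,
    so links are followed in both directions.
    """
    links = set()
    for subject, prop, obj in mappings:
        if subject == uri:
            links.add((prop, obj))
        elif obj == uri:
            links.add((prop, subject))
    return sorted(links)


if __name__ == "__main__":
    print(mapped_concepts(POLICE_ARRESTANT.uri, MAPPINGS))
```

Each organization keeps its own concept, label and definition; only the link between them is shared.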

SLIDE 17

Conclusion and next step: Linking Thesauri to Data Models

  • Data models: not owned by business
    – Too detailed
    – Too complex
    – NO ownership at the strategic level
  • Thesauri
    – Relatively abstract
    – Relatively simple
    – Ownership by the business
  • SKOS bridges the gap
    – With data models in RDF, the gap can be bridged!

SLIDE 18

THESAURUS AND DOMAIN MODELS: SCENARIO 1

[Diagram: RDF graph in two parts; statements as recovered from the slide]

DOMAIN MODEL | Data dictionary

:inmate#9818763  :cell            “B.23.a”
:inmate#9818763  :isRegisteredAt  :pi_Dordrecht
:pi_Dordrecht    rdf:type         :penitentiaryInstitution

THESAURUS

voc:4862      rdf:type         skos:Concept
voc:4862      skos:prefLabel   “Penitentiary Institution”
voc:4862      skos:broader     “Detention Facility”
voc:4862      skos:exactMatch  eurovoc:C877
eurovoc:C877  rdf:type         skos:Concept
eurovoc:C877  skos:prefLabel   “място за лишаване от свобода”@bg, “Penal Institution”@en
skos:definition  “A prison, gaol or jail is a facility in which inmates are forcibly confined and denied a variety of freedoms under the authority of…”

How to link the domain model to the thesaurus?

  • owl:sameAs?
  • skos:exactMatch?
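
The graph on this slide can be approximated with plain tuples to show what the link buys you. This is a sketch only: it uses a Python set of (subject, predicate, object) triples instead of an RDF library, the function names are hypothetical, and the link from :penitentiaryInstitution to voc:4862 is an assumed reading of the question the slide raises. The linking property is passed as a parameter because the slide leaves the choice between owl:sameAs and skos:exactMatch open.

```python
RDF_TYPE = "rdf:type"
PREF_LABEL = "skos:prefLabel"

# Statements recovered from the slide, without any link between the two parts.
TRIPLES = {
    (":inmate#9818763", ":cell", "B.23.a"),
    (":inmate#9818763", ":isRegisteredAt", ":pi_Dordrecht"),
    (":pi_Dordrecht", RDF_TYPE, ":penitentiaryInstitution"),
    ("voc:4862", RDF_TYPE, "skos:Concept"),
    ("voc:4862", PREF_LABEL, "Penitentiary Institution"),
    ("voc:4862", "skos:exactMatch", "eurovoc:C877"),
    ("eurovoc:C877", PREF_LABEL, "Penal Institution"),
}


def objects(triples, subject, predicate):
    """All objects of the triples matching subject and predicate, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def with_link(triples, link_property):
    """Add the assumed link from the domain-model class to the thesaurus concept."""
    return triples | {(":penitentiaryInstitution", link_property, "voc:4862")}


def business_labels(triples, instance, link_property):
    """Labels the thesaurus offers for the classes of a domain-model instance."""
    labels = []
    for cls in objects(triples, instance, RDF_TYPE):
        for concept in objects(triples, cls, link_property):
            labels.extend(objects(triples, concept, PREF_LABEL))
    return labels


if __name__ == "__main__":
    for prop in ("owl:sameAs", "skos:exactMatch"):
        linked = with_link(TRIPLES, prop)
        print(prop, business_labels(linked, ":pi_Dordrecht", prop))
```

Without the link, the data record for :pi_Dordrecht has no route to the business-owned label; with it, the label can be looked up whichever of the two properties is chosen. What the two properties imply for reasoning differs, and this lookup does not model that.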

SLIDE 19

THESAURUS AND DOMAIN MODELS: SCENARIO 2

[Diagram: RDF graph with regions labelled “DOMAIN MODEL | Data dictionary” and “THESAURUS”; labels as recovered from the slide]

Domain-model statements:

:inmate#9818763  :cell            “B.23.a”
:inmate#9818763  :isRegisteredAt  :pi_Dordrecht
:pi_Dordrecht    rdf:type         :penitentiaryInstitution

Thesaurus labels:

  • Nodes: skos:Concept, eurovoc:C877, “Penitentiary Institution”, “Detention Facility”
  • Edges: rdf:type, skos:exactMatch, skos:prefLabel
  • Labels of eurovoc:C877: “място за лишаване от свобода”@bg, “Penal Institution”@en
  • Definition: “A prison, gaol or jail is a facility in which inmates are forcibly confined and denied a variety of freedoms under the authority of…”

SLIDE 20

jan.voskuil@taxonic.com www.taxonic.com