
SLIDE 1

TRANSFORMATION OF A LEGACY UDC-BASED CLASSIFICATION SYSTEM: EXPLOITING AND REMODELLING SEMANTIC RELATIONSHIPS

Fran Alexander, Taxonomy Manager, BBC Information and Archives, London, UK
Andy Heather, Chief Technical Officer, Dods Parliamentary Communications, London, UK (formerly Principal Programme Architect, BBC Technology, London, UK)

*All views expressed here are entirely our own personal views and in no way represent the BBC or official BBC policy.

SLIDE 2

INTRODUCTION TO THE BBC ARCHIVE

• 2 million items of TV and video
• 300,000 hours of audio
• still photographs, sheet music, and documents
• 4,000 loans per week
• Lonclass (London Classification), based on UDC, introduced 1964
• Telclass (Television Classification), used mainly by the Natural History Unit (NHU), established 1979

SLIDE 3

DMI PROJECT – “Fabric”

• launched in 2008
• preserve intellectual property and semantic richness of classifications
• facilitate publishing of classification data in semantically rich and interoperable forms

SLIDE 4

FACET CLASSES AS A BASIS FOR ONTOLOGICAL RELATIONSHIP MODELLING

Facet/Class               Example                 Format
Subject                   Emergency Services      Polyhierarchy
Geographic                Birmingham              Simple hierarchy
Event date                1585                    Simple hierarchy
Motion                    Takeoff                 Flat list
Organisations             The British Library     Flat list, divided into sections
Person                    Elizabeth I             Flat list, divided into sections
Artistic work             The Mill on the Floss   Flat list, divided into sections
Shot type                 POV                     Flat list
Shooting date (archive)   1971                    Simple hierarchy
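The three formats in the table differ mainly in how many parents a term may have. The following is a minimal sketch of that distinction, not the BBC data model: the `parents` mapping, the helper names, and every term that does not appear in the table are assumptions made for illustration.

```python
# Hypothetical parent links. Terms other than those in the table are invented.
# Flat list: no parents. Simple hierarchy: at most one parent per term.
# Polyhierarchy: a term may have several parents.
parents = {
    "Takeoff": [],                                         # Motion (flat list)
    "Birmingham": ["West Midlands"],                       # Geographic (simple hierarchy)
    "West Midlands": ["England"],
    "England": [],
    "Emergency Services": ["Public Services", "Safety"],   # Subject (polyhierarchy)
    "Public Services": [],
    "Safety": [],
}


def ancestors(term):
    """Return every ancestor of `term`, following all parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def structure(terms):
    """Name the format implied by the largest parent count among `terms`."""
    most = max(len(parents.get(t, [])) for t in terms)
    return {0: "flat list", 1: "simple hierarchy"}.get(most, "polyhierarchy")
```

In this sketch a polyhierarchy is simply a directed acyclic graph, so a term such as "Emergency Services" can be reached by more than one path from the top.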

SLIDE 5

ANALYSIS OF LONCLASS

Lonclass: 370,000 concepts

• 20,000 simple concepts
• 350,000 compound concepts
• some 150,000 KOS concepts (40%) used for only 1 catalogue item
• 50,000 (14%) used for only 2 catalogue items
• 300,000 (80%) used 10 times or fewer
• less than 5% of the concepts (approximately 16,000) were used 100 times or more
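The percentages on this slide can be checked against the 370,000 total with a few lines of arithmetic. This sketch uses only the approximate counts quoted above; the dictionary labels are ours.

```python
TOTAL = 370_000  # concepts in Lonclass, as quoted on the slide

# The simple and compound counts should add up to the total.
assert 20_000 + 350_000 == TOTAL

# Approximate usage counts quoted on the slide.
usage = {
    "used for 1 item": 150_000,
    "used for 2 items": 50_000,
    "used 10 times or fewer": 300_000,
    "used 100 times or more": 16_000,
}

# Share of all concepts, as a percentage rounded to one decimal place.
shares = {label: round(100 * count / TOTAL, 1) for label, count in usage.items()}

for label, share in shares.items():
    print(f"{label}: {share}%")
```

The computed shares come out at roughly 40.5%, 13.5%, 81.1% and 4.3%, which is consistent with the rounded figures on the slide.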
SLIDE 6

ANALYSIS OF A LONCLASS COMPOUND TERM

SLIDE 7

DECOMPOSITION METHODOLOGY

• decompose the PCCs in Lonclass and build term hierarchies for each of the defined Classes of Concepts
• use multiple, redundant classification points to mitigate loss of semantic accuracy
• define a set of terms from the legacy KOS that have value as classifications for clustering assets
• utilise terms in the legacy KOS that carry additional semantic value
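The first two points can be pictured as splitting a compound into simple concepts while keeping the original compound as an extra access point. The sketch below is illustrative only: the colon-separated notation, the codes in `SIMPLE_CONCEPTS`, and both function names are assumptions, since the slides do not give the actual Lonclass notation or the project's decomposition rules.

```python
# Hypothetical lookup from a simple notation to (class, preferred term).
# The codes are invented; only the class names and terms echo the facet table.
SIMPLE_CONCEPTS = {
    "A1": ("Subject", "Emergency Services"),
    "G7": ("Geographic", "Birmingham"),
    "D1971": ("Shooting date (archive)", "1971"),
}


def decompose(compound, separator=":"):
    """Split a compound notation into simple concepts, grouped by class.

    Components missing from the lookup are returned separately so that
    nothing is silently dropped during migration.
    """
    by_class = {}
    unresolved = []
    for code in compound.split(separator):
        code = code.strip()
        if code in SIMPLE_CONCEPTS:
            cls, term = SIMPLE_CONCEPTS[code]
            by_class.setdefault(cls, []).append(term)
        else:
            unresolved.append(code)
    return by_class, unresolved


def classification_points(compound):
    """Keep the legacy compound alongside its decomposed parts.

    Storing both is one possible reading of 'multiple, redundant
    classification points': the asset stays findable by the original
    compound and by each component.
    """
    by_class, unresolved = decompose(compound)
    return {
        "legacy_compound": compound,
        "by_class": by_class,
        "unresolved": unresolved,
    }
```

Returning unresolved components explicitly is a design choice of this sketch: it makes any loss of semantic accuracy visible rather than hiding it.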

SLIDE 8

PATHWAYS TO ASSET

SLIDE 9

CLASSIFICATION DATA MODEL

• nodes in the classification space modelled as Concepts with a variable number of alternate and preferred terms
• URIs to provide access to concepts and terms: http://fabric.bbc.co.uk/classification/<UUID>
• classification groups containing multiple sets of classification terms
• classification groups attached at all levels in the Product Information hierarchy
• classification groups attached at any point on the media timeline
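The model described above can be sketched with two small record types. Only the URI pattern comes from the slide; the class names, field names, and the use of seconds for timeline positions are assumptions, and this is not the Fabric schema.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional

BASE_URI = "http://fabric.bbc.co.uk/classification/"  # URI pattern from the slide


@dataclass
class Concept:
    """A node in the classification space (field names are assumptions)."""
    preferred_terms: List[str]
    alternate_terms: List[str] = field(default_factory=list)
    concept_id: uuid.UUID = field(default_factory=uuid.uuid4)

    @property
    def uri(self) -> str:
        """Address the concept by its UUID under the base URI."""
        return f"{BASE_URI}{self.concept_id}"


@dataclass
class ClassificationGroup:
    """A set of concepts attached to a product level or a timeline position."""
    concepts: List[Concept]
    attached_to: str                       # hypothetical product or version identifier
    start_seconds: Optional[float] = None  # set when attached to the media timeline
    end_seconds: Optional[float] = None
```

A group with no timeline values stands for a classification of the whole item at some level of the hierarchy; one with `start_seconds` set stands for a classification at a point on the media timeline.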

SLIDE 10

INFORMATION DISCOVERY ENVIRONMENT

• integrate the classification space into the Search environment
• match queries against the taxonomy to increase the degree of relevance of the response
• open source Solr search engine selected
• classification space denormalised in the engine to allow runtime node counts to be calculated
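The last point can be illustrated without any search engine: if each indexed document carries every ancestor of the nodes it is classified under, a per-node count becomes a simple tally. The sketch below is plain Python, not Solr's API or the project's index schema; the `PARENT` links, field names, and asset identifiers are assumptions.

```python
from collections import Counter

# Hypothetical child -> parent links in a simple hierarchy.
PARENT = {
    "Birmingham": "West Midlands",
    "West Midlands": "England",
    "London": "England",
    "England": None,
}


def path_to_root(node):
    """Return the node followed by all of its ancestors."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT.get(node)
    return path


def denormalise(asset_id, nodes):
    """Build a flat document that stores every ancestor with the asset."""
    expanded = set()
    for node in nodes:
        expanded.update(path_to_root(node))
    return {"id": asset_id, "classification": sorted(expanded)}


def node_counts(documents):
    """Count assets per node; a faceting engine does the equivalent at query time."""
    counts = Counter()
    for doc in documents:
        counts.update(doc["classification"])
    return counts


docs = [
    denormalise("asset-1", ["Birmingham"]),
    denormalise("asset-2", ["London"]),
]
counts = node_counts(docs)
```

Here both assets contribute to the count for "England" even though neither is classified there directly, which is the effect denormalisation is meant to achieve.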

SLIDE 11

PROBLEMS AND LIMITATIONS

• inability of SKOS to fully model the order of relationships between multiple concept instances
• SKOS vocabulary of relationship types is limited
• stopping point for decomposition

SLIDE 12

BENEFITS OF EXPORTING TAXONOMIES IN OPEN FORMATS

SLIDE 13

CONCLUSIONS

• preserve semantics through migrations
• export in open formats

SLIDE 14

KEY REFERENCES

Ben-Yitzhak, Neumann, Sznajder et al. (2008). Beyond Basic Faceted Search. IBM Research Labs.
Bergman, M. K. (2009). Confronting Misconceptions with Adaptive Ontologies. [Blog post.] Available at: http://www.mkbergman.com/553/confronting-misconceptions-with-adaptive-ontologies/
Black, P. E. (2004). Dictionary of Algorithms and Data Structures [online], ed., U.S. National Institute of Standards and Technology. Available at: http://www.nist.gov/dads/HTML/directAcycGraph.html
Bosch, M. (2006). Ontologies, Different Reasoning Strategies, Different Logics, Different Kinds of Knowledge Representation: Working Together. Knowledge Organization, 33(3), pp. 153-159.
Brickley, D. (2010). Lonclass and RDF. [Blog post.] Available at: http://danbri.org/words/2010/11/18/585
Brickley, D. (2011). Video Linking: Archives and Encyclopedias. [Blog post.] Available at: http://danbri.org/words/2011/02/01/658
Foskett, A. C. (1971). The Subject Approach to Information. London, UK: Clive Bingley.
Frické, M. (2011). Classification, Facets, and Metaproperties. Journal of Information Architecture, 2(2). Available at: http://journalofia.org/volume2/issue2/04-fricke/
NoTube. http://notube.tv/about-3/partners/
Rodriguez-Castro, B.; Glaser, H.; Carr, L. (2010). How to Reuse a Faceted Classification and Put It on the Semantic Web. In ISWC 2010, Part I, LNCS 6496; P.F. Patel-Schneider et al. (eds.), pp. 663–678. Berlin/Heidelberg: Springer-Verlag.

Acknowledgements: Nicholas Chivers; Ken Haylock; Kathryn Stickley; Helen Pritchard (DMI development team); Oliver Gardiner; John Jordan

Map of the Semantic Web: http://www.flickr.com/photos/jurvetson/3277667570/