KNOWLEDGE-BASED LINGUISTIC ANNOTATION OF DIGITAL CULTURAL HERITAGE COLLECTION
Speaker: Chenhua Date: 24th Feb 2010
Tuukka Ruotsalo, Lora Aroyo and Guus Schreiber
Outline: Introduction, Motivation, Methodology, Experimental Results
2/24/2010 2 Text Mining Seminar
Better run …
Is there a smart way to annotate such a massive collection?
– Structured vocabulary
– Enhance retrieval performance
– Concept identification, e.g. "Paris" as a city
– Role identification, e.g. "Paris" as the subject matter of an artwork
Phase 1: Dependency structure analysis, morphological analysis, part-of-speech tagging, named entity tagging
Phase 2: Concept Identification (using the ontology knowledge base)
Phase 3: Role Identification (using the feature knowledge base)
Output: Annotation
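The three-phase flow above can be sketched as a chain of functions. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the `Annotation` record, the `annotate` function and the `toy_phase*` functions are hypothetical, the knowledge-base identifiers (`ulan:rembrandt`, `wn:amsterdam`) are invented, and the hand-written role rule only stands in for the trained SVM of Phase 3.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, str]


@dataclass
class Annotation:
    """One output annotation (hypothetical record type)."""
    span: str     # text span taken from the description
    concept: str  # knowledge-base concept found in Phase 2
    role: str     # metadata role assigned in Phase 3


def annotate(
    text: str,
    phase1: Callable[[str], List[Features]],
    phase2: Callable[[Features], str],
    phase3: Callable[[Features, str], str],
) -> List[Annotation]:
    """Run the three phases in order; spans without a KB concept are dropped."""
    annotations = []
    for features in phase1(text):         # syntactic features per candidate span
        concept = phase2(features)         # lookup in the ontology knowledge base
        if not concept:
            continue
        role = phase3(features, concept)   # classifier trained on the feature KB
        annotations.append(Annotation(features["span"], concept, role))
    return annotations


# Toy stand-ins for the three phases (illustrative assumptions only).
TOY_KB = {"Rembrandt": "ulan:rembrandt", "Amsterdam": "wn:amsterdam"}


def toy_phase1(text: str) -> List[Features]:
    # Capitalised words are candidate spans; the sentence-initial word is
    # treated as the subject, every other candidate as an object.
    words = text.split()
    return [
        {"span": w, "function": "subj" if i == 0 else "obj"}
        for i, w in enumerate(words)
        if w[0].isupper()
    ]


def toy_phase2(features: Features) -> str:
    return TOY_KB.get(features["span"], "")


def toy_phase3(features: Features, concept: str) -> str:
    # Hand-written rule in place of the SVM: subjects become creators.
    return "creator" if features["function"] == "subj" else "subject"


if __name__ == "__main__":
    for a in annotate("Rembrandt painted Amsterdam", toy_phase1, toy_phase2, toy_phase3):
        print(a)
```

Passing the phases in as functions keeps the sketch agnostic about which parser, vocabularies or classifier sit behind each phase.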
Phase 1
– Dependency structure analysis: internal dependency structure; subject, direct object
– Morphological analysis: number (singular or plural)
– Part-of-speech tagging: verbs, adjectives and nouns
– Named entity tagging: persons, organizations, locations, miscellaneous NEs
Output: syntactic features
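To make the feature types concrete, here is a minimal Python sketch that merges the outputs of the four analyses into one feature record per noun. It assumes the tokens were already tagged and parsed by some external tool; the `Token` fields, the tag values ("N", "sg", "PER", "subj", ...) and the `syntactic_features` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    """A pre-analysed token; the tag values are an assumed scheme."""
    word: str
    pos: str               # part-of-speech tag: "N", "V", "A", ...
    number: Optional[str]  # morphology: "sg" or "pl" (None if not applicable)
    ne: Optional[str]      # NE class: "PER", "ORG", "LOC", "MISC" or None
    dep: str               # dependency relation to the head, e.g. "subj", "obj"
    head: int              # index of the governing token, -1 for the root


def syntactic_features(tokens: List[Token]) -> List[Dict[str, Optional[str]]]:
    """Build one feature record per noun from the four analyses."""
    records = []
    for tok in tokens:
        if tok.pos != "N":
            continue
        governor = tokens[tok.head].word if tok.head >= 0 else None
        records.append({
            "word": tok.word,
            "pos": tok.pos,
            "number": tok.number,
            "ne": tok.ne,
            "function": tok.dep,   # grammatical function, e.g. subject or object
            "governor": governor,  # the word this noun depends on (often the verb)
        })
    return records


if __name__ == "__main__":
    sentence = [
        Token("Rembrandt", "N", "sg", "PER", "subj", 1),
        Token("painted", "V", None, None, "root", -1),
        Token("soldiers", "N", "pl", None, "obj", 1),
    ]
    for record in syntactic_features(sentence):
        print(record)
```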
Phase 2: Concept Identification
Input: syntactic features
– Map chunks, NEs and bi-words to concepts in the knowledge base
– Example for matching NEs: NEs tagged as persons are matched against ULAN, all others against WordNet
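The NE matching rule above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the two dictionaries are toy stand-ins for ULAN and WordNet with invented identifiers, chunks are simplified to single words, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy stand-ins for the vocabularies; the identifiers are invented.
ULAN: Dict[str, str] = {"rembrandt": "ulan:rembrandt"}
WORDNET: Dict[str, str] = {
    "paris": "wn:paris",
    "windmill": "wn:windmill",
    "oil painting": "wn:oil_painting",
}


def bi_words(words: List[str]) -> List[str]:
    """All pairs of adjacent words, e.g. ["oil", "painting"] -> ["oil painting"]."""
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def match_ne(text: str, ne_class: str) -> Optional[str]:
    """Person NEs are looked up in ULAN, all other NEs in WordNet."""
    vocabulary = ULAN if ne_class == "PER" else WORDNET
    return vocabulary.get(text.lower())


def identify_concepts(
    words: List[str], nes: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return (term, concept) pairs for NEs, bi-words and single words."""
    matches = []
    for text, ne_class in nes:
        concept = match_ne(text, ne_class)
        if concept:
            matches.append((text, concept))
    ne_texts = {text for text, _ in nes}
    # Bi-words are listed before single words, so multi-word matches
    # come first in the output.
    for term in bi_words(words) + words:
        if term in ne_texts:
            continue  # already handled by the NE rule above
        concept = WORDNET.get(term.lower())
        if concept:
            matches.append((term, concept))
    return matches


if __name__ == "__main__":
    print(identify_concepts(["oil", "painting", "windmill"],
                            [("Rembrandt", "PER"), ("Paris", "LOC")]))
```

Because the vocabulary is chosen by NE class, a string tagged as a person is never looked up in WordNet, even if WordNet happens to contain it.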
Phase 3: Role Identification
– "Rembrandt" is an instance of the concept "person", independent of context
– "Rembrandt" can take various roles, e.g. creator or subject of an artwork, depending on context
– Classifier: SVM
– Features include the parse path from the constituent to the verb or predicate
Pipeline: Phase 2 (Concept Identification) → Phase 3 (Role Identification), with syntactic features and the feature knowledge base as inputs
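The path feature named for Phase 3 resembles the path feature used in semantic role labelling: the chain of tree edges between a candidate constituent and the predicate. The Python sketch below computes such a path over a dependency tree, since Phase 1 produces dependency structure; it assumes a single-rooted tree given as parent indices (-1 for the root), and the function names and dependency labels are hypothetical. The SVM itself is not shown; one possible choice would be to one-hot encode these path strings (e.g. with scikit-learn's `DictVectorizer`) and train a `LinearSVC`, but the slides only say "SVM".

```python
from typing import List


def ancestors(node: int, heads: List[int]) -> List[int]:
    """The node itself followed by its chain of heads up to the root."""
    chain = [node]
    while heads[chain[-1]] != -1:
        chain.append(heads[chain[-1]])
    return chain


def path_to_predicate(constituent: int, predicate: int,
                      heads: List[int], deps: List[str]) -> str:
    """Dependency path from a constituent to the predicate.

    Steps up to the lowest common ancestor are written "label↑", steps
    down from it to the predicate "label↓". Assumes a single-rooted tree.
    """
    up_chain = ancestors(constituent, heads)
    down_chain = ancestors(predicate, heads)
    down_set = set(down_chain)
    common = next(n for n in up_chain if n in down_set)
    up = [deps[n] + "↑" for n in up_chain[: up_chain.index(common)]]
    down = [deps[n] + "↓" for n in reversed(down_chain[: down_chain.index(common)])]
    return "".join(up + down)


if __name__ == "__main__":
    # "Rembrandt painted portrait of Saskia" with a hand-built parse.
    heads = [1, -1, 1, 2, 3]
    deps = ["subj", "root", "obj", "mod", "pcomp"]
    print(path_to_predicate(0, 1, heads, deps))
    print(path_to_predicate(4, 1, heads, deps))
```

Distinct paths such as `subj↑` and `pcomp↑mod↑obj↑` let the classifier separate, for instance, the creator of a work from a person depicted in it, even when both are the same concept ("person").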
– ARIA collection from the Rijksmuseum Amsterdam
– 250 randomly selected artworks
– Typical descriptions cover what, who, where and when, and which people or cultures are related to the artworks
– Vocabularies: AAT, TGN, ULAN and WordNet
– Metadata schema: Visual Resources Association (VRA), specialized for artworks
– Proposed method: 61.2%
– Baseline method: 57.8%
– Human annotator: 65.1%
– Performance close to the level of a human annotator
– Performance better than the baseline method
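The slide does not name the evaluation measure, so the comparison below is expressed only in percentage points of whatever measure the three figures share:

```latex
\begin{aligned}
61.2 - 57.8 &= 3.4 \quad \text{(points above the baseline method)}\\
65.1 - 61.2 &= 3.9 \quad \text{(points below the human annotator)}\\
61.2 / 65.1 &\approx 0.94 \quad \text{(about 94\% of the human annotator's score)}
\end{aligned}
```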
Knowledge base + natural language processing techniques → improved performance
Future work:
– More extensive context
– Co-reference resolution w.r.t. NEs
– Advanced classification strategies
The method takes a text description, a set of structured vocabularies, a metadata schema, and a training set of annotations of the text descriptions as input, and automatically produces annotations for the objects; its performance is close to the level of a human annotator.
Knowledge base + natural language techniques → annotation with better performance