SLIDE 1

KNOWLEDGE-BASED LINGUISTIC ANNOTATION OF DIGITAL CULTURAL HERITAGE COLLECTION

Speaker: Chenhua
Date: 24th Feb 2010

Tuukka Ruotsalo, Lora Aroyo and Guus Schreiber

SLIDE 2

Outline

  • Introduction
  • Motivation
  • Methodology
  • Experimental Results
  • Conclusion

2/24/2010 2 Text Mining Seminar

SLIDE 3

Introduction

  • Paris was painted in 1888.
  • In Paris, Van Gogh painted the work in 1888.

SLIDE 4

Motivation

Better run …

SLIDE 5

Research Question

Is there a smart way to annotate such a massive collection?

SLIDE 6

Methodology

  • Background knowledge

– Structured vocabulary
– Enhance performance of retrieval

  • Automatic annotation

– Concept identification

e.g. Paris as a city

– Role identification

e.g. Paris as a subject matter

SLIDE 7

System Architecture

Phase 1: Linguistic analysis

– Dependency structure analysis
– Morphological analysis
– Part of speech tagging
– Named entity tagging

Phase 2: Concept Identification
Phase 3: Role Identification

Ontology knowledge base
Feature knowledge base

Annotation

SLIDE 8

Knowledge Base

  • Art and Architecture Thesaurus (AAT)
  • Getty Thesaurus of Geographic Names (TGN)
  • Union List of Artist Names (ULAN)
  • WordNet
  • etc.

SLIDE 9

Linguistic Analysis

– Dependency structure analysis: internal dependency structure (subject, direct object)
– Morphological analysis: number (singular or plural)
– Part of speech tagging: verbs, adjectives and nouns
– Named entity tagging: persons, organizations, locations, miscellaneous NEs

Syntactic features

SLIDE 10

Concept Identification

  • Define (chunking) and map meaningful units to concepts in structured vocabularies
  • Performed differently for nouns, verbs and NEs

Phase 2: Concept Identification

Syntactic features

Mapping chunks, NEs and bi-words to the KB

Examples for matching NEs:

– NEs tagged as persons: ULAN
– Others: WordNet
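
The matching rule for NEs can be sketched as a simple lookup. This is only a minimal illustration, assuming tiny in-memory dictionaries in place of the real ULAN and WordNet; the names `ULAN_TOY`, `WORDNET_TOY`, `map_entity` and all concept identifiers are hypothetical and are not taken from the authors' system.

```python
from typing import Optional, Tuple

# Toy stand-ins for the real vocabularies (hypothetical entries and identifiers).
ULAN_TOY = {"rembrandt": "ulan:rembrandt", "van gogh": "ulan:van-gogh"}
WORDNET_TOY = {"paris": "wn:paris-city", "amsterdam": "wn:amsterdam-city"}


def map_entity(text: str, ne_type: str) -> Optional[Tuple[str, str]]:
    """Return (vocabulary, concept id) for a named entity, or None if unmatched.

    Follows the rule on the slide: NEs tagged as persons are looked up in
    ULAN; all other NEs are looked up in WordNet.
    """
    key = text.strip().lower()
    if ne_type == "PERSON":
        vocabulary, table = "ULAN", ULAN_TOY
    else:
        vocabulary, table = "WordNet", WORDNET_TOY
    concept = table.get(key)
    return (vocabulary, concept) if concept is not None else None


if __name__ == "__main__":
    print(map_entity("Van Gogh", "PERSON"))
    print(map_entity("Paris", "LOCATION"))
```

Note that the NE tag alone decides which vocabulary is searched, so "Paris" tagged as a person finds no match in this toy setup.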

SLIDE 11

Role Identification

  • Difference between concept and role identification

– “Rembrandt” is an instance of the concept “person”, independent of context
– “Rembrandt” can take various roles, e.g. creator or subject of artworks, dependent on context

  • How to do the role identification task?

– SVM
– Based on features:

  • syntactic and semantic
  • e.g. PoS tag, voice of the sentence verb, PoS path from the parsing constituent to the verb or predicate

Phase 2: Concept Identification
Phase 3: Role Identification

Syntactic features

Feature knowledge base
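
How such features might be prepared for an SVM can be sketched as follows. This is an illustration only: the `Constituent` class, the feature names, and the PoS path strings are hypothetical, and the two examples are hand-filled for the word "Paris" in the two sentences from the Introduction rather than produced by a real parser. The resulting 0/1 vectors could then be passed to any SVM implementation; training is left out of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Constituent:
    """A candidate phrase with the features named on the slide (values are illustrative)."""
    text: str
    pos_tag: str      # part-of-speech tag of the head word
    verb_voice: str   # voice of the sentence verb: "active" or "passive"
    pos_path: str     # PoS path from the constituent to the verb or predicate


def to_feature_dict(c: Constituent) -> Dict[str, int]:
    """One-hot encode the categorical features as 'name=value' -> 1."""
    return {
        f"pos={c.pos_tag}": 1,
        f"voice={c.verb_voice}": 1,
        f"path={c.pos_path}": 1,
    }


def vectorize(dicts: List[Dict[str, int]]) -> List[List[int]]:
    """Turn feature dicts into dense 0/1 vectors over a sorted shared vocabulary."""
    vocab = sorted({name for d in dicts for name in d})
    return [[d.get(name, 0) for name in vocab] for d in dicts]


if __name__ == "__main__":
    # "Paris was painted in 1888." (hand-filled, hypothetical path)
    subject_matter = Constituent("Paris", "NNP", "passive", "NNP>VBN")
    # "In Paris, Van Gogh painted the work in 1888." (hand-filled, hypothetical path)
    location = Constituent("Paris", "NNP", "active", "NNP>IN>VBD")
    for row in vectorize([to_feature_dict(subject_matter), to_feature_dict(location)]):
        print(row)
```

The PoS tag is identical in both examples, so only the voice and path features distinguish the two roles of "Paris" here.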

SLIDE 12

Evaluation

  • Using a collection of natural language descriptions of artworks

– ARIA collection from Rijksmuseum Amsterdam
– 250 artworks randomly selected
– Typical descriptions on “what, who, where, when and which people or culture” related to the artworks

  • Using 3 structured vocabularies (Knowledge Base)

– AAT, TGN and ULAN, plus WordNet

  • Using an artwork annotation schema

– Visual Resources Association (VRA), specialized on artwork

SLIDE 13

Evaluation (Cont.)

SLIDE 14

Experimental Results

  • Accuracy

– Proposed method: 61.2%
– Baseline method: 57.8%
– Human annotator: 65.1%

  • Discussion

– Performance close to the level of a human annotator
– Performance better than the baseline method
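
The two discussion points can be made concrete with a little arithmetic on the reported accuracies. The sketch below assumes the unlabeled 61.2% is the proposed method's accuracy, as the discussion implies; the variable names are illustrative.

```python
# Accuracies reported on the slide, in percent.
accuracy = {"baseline": 57.8, "method": 61.2, "human": 65.1}

# Absolute differences in percentage points, rounded to one decimal.
gain_over_baseline = round(accuracy["method"] - accuracy["baseline"], 1)
gap_to_human = round(accuracy["human"] - accuracy["method"], 1)

# The method's accuracy as a share of the human annotator's accuracy.
relative_to_human = round(100 * accuracy["method"] / accuracy["human"], 1)

if __name__ == "__main__":
    print(f"Gain over baseline: {gain_over_baseline} points")
    print(f"Gap to human annotator: {gap_to_human} points")
    print(f"Share of human accuracy: {relative_to_human}%")
```

Under that assumption the method gains 3.4 points over the baseline and trails the human annotator by 3.9 points, reaching about 94% of the human accuracy.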

SLIDE 15

Further Discussions & Future Work

  • Knowledge base and natural language processing techniques: improved performance
  • More extensive context
  • Co-reference resolution w.r.t. NEs
  • Advanced classification strategies

SLIDE 16

Summary

  • Given a set of objects each accompanied by a text description, a set of structured vocabularies, a metadata schema, and a training set of annotations of the text descriptions, the method automatically produces annotations for the objects, and its performance is close to the level of a human annotator.

Knowledge base + natural language techniques: better performance on annotation

SLIDE 17

THANKS!

SLIDE 18

APPENDIX

SLIDE 19

metadata

SLIDE 20

Feature knowledge base
