| 1
From Web 2.0 to Semantic Web A Semi-Automated Approach
Andreas Heß, Christian Maaß and Francis Dierick Lycos Europe 01/06/2008
| 2
Outline
» Motivation
» Proposals for better tagging
» Tag suggestion / semi-automated tagging
» Tag merging
» Conclusion
| 3
Motivation
» Ontologies: high entrance barriers for ordinary users
» Folksonomies: widely used, low entrance barriers
» Goals
  » Draw benefits from complementary nature
  » Improve quality of folksonomies
  » Eventually merge folksonomies and ontologies
[Diagram, built up over several slides: Semantic Web vs. Web 2.0]
» Semantic Web: experts develop an ontology
  » Nodes: Thing, Person, Party, Merkel, Occupation, Chancellor; edges labelled is_a, has a, is a
  » Advantages: + ontology controlled by experts + reasoning, inference
  » Disadvantages: … language of users
» Web 2.0: community provides content and meta-data (tags)
  » Tags Angela Merkel, CDU, Berlin refer to 123.jpg
  » Search request: Angela Merkel; search result: 123.jpg
  » Advantages: + user's vocabulary + high proliferation & cheap
  » Disadvantages: (text not preserved)
» Mutual assistance
» Background information: Merkel → Chancellor
| 9
Moving from Folksonomies to Ontologies: Tag Quality
[Tag cloud, refined step by step on slides 9 to 15]
» Initial tags: CPU, Berlin, Angela Merkel, politics, member of parliament, Europe, Techno, Alt, screen, ugagua, hardware, computers, software, clock, Lycos, hard drive, computer, Germany, watch, histryo, hard drives, MP, history, Tiger
» Tag Merging (slides 9 to 11): eliminate duplicates / synonyms / misspellings / nonsense
  » ugagua, histryo, hard drive and computer disappear from the cloud
» Topic Detection (slides 12 to 13): the cloud is reduced to CPU, screen, hardware, computers, software, hard drives
» Relation Extraction (slide 14): applied to the tags of the detected topic
» Relation Qualification (slide 15): relations labelled is_a and part_…
Proposed Measures
» Semi-Automated Tagging
  » Lower the threshold towards creating meta-data
» Tag Merging
  » Improving tag quality
» Extract Relations
  » First step on the move from folksonomies to more structured form
» User Rating
  » Involve user in refining quality
» Information Extraction
  » Automatically fill blanks
| 18
Semi-Automated Tagging
» Text classification, training data needed
» Semi-automated annotation of very short texts
| 19
Choice of Classification Algorithm
» Speed is important
  » Interactive: user does not want to wait
» Use well-known Rocchio text classification algorithm
  » Simple, fast, incremental, suitable for high number of classes
  » Works well only if texts are short and of similar length
  » ... but this is the case here
» Use part-of-speech-tagger for dimensionality reduction
  » Only nouns and proper nouns
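The slides name Rocchio but do not give its details, so the following is only a minimal sketch of a centroid-based (Rocchio-style) tag suggester, under stated assumptions: it uses positive examples only, plain term frequencies, and cosine similarity. `RocchioTagger`, `tokenize` and `normalize` are hypothetical names, and `tokenize` is a crude stand-in for the part-of-speech filter (it does not select nouns).

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: a simple whitespace tokenizer stands in for the
    # part-of-speech filter (nouns and proper nouns) named on the slide.
    return [w for w in text.lower().split() if w.isalpha()]


def normalize(vec):
    # Scale a sparse term vector to unit length (empty dict for a zero vector).
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


class RocchioTagger:
    """Keeps one prototype vector per tag; each tag is one class."""

    def __init__(self):
        self.sums = defaultdict(Counter)  # tag -> sum of normalized postings

    def add(self, text, tags):
        # Incremental training: adding a posting only updates its tags' sums.
        vec = normalize(Counter(tokenize(text)))
        for tag in tags:
            for term, weight in vec.items():
                self.sums[tag][term] += weight

    def suggest(self, text, k=5):
        # Rank tags by cosine similarity between the posting and each prototype.
        query = normalize(Counter(tokenize(text)))
        scores = {}
        for tag, total in self.sums.items():
            centroid = normalize(total)
            scores[tag] = sum(w * centroid.get(t, 0.0) for t, w in query.items())
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [tag for tag, score in ranked[:k] if score > 0]


tagger = RocchioTagger()
tagger.add("new hard drive for my computer", ["hardware"])
tagger.add("Merkel speaks in Berlin parliament", ["politics"])
print(tagger.suggest("cheap computer hard drive"))
```

Because training only adds to per-tag sums and suggestion is one dot product per tag, this kind of classifier stays cheap enough for interactive use, which matches the speed argument above.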
| 20
Evaluation (I): Precision
» Tested precision with 4 test users
» Original tagging far from perfect
» Suggestion quality not great
  » But good enough for interactive use
  » In 87% of cases, at least one correct prediction within the top 5
[Bar chart: precision of Original Tags vs. Suggested Tags for Person 1 to Person 4 and Average; axis from 10 to 100]
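The two kinds of figures on this slide (per-tag precision and "at least one correct prediction within top 5") can be computed as sketched below. The slides do not say how the judgements were collected, so the data layout is an assumption: one ranked suggestion list per posting and one set of tags the test user accepted as correct. `hit_rate_at_k` and `precision` are hypothetical helper names.

```python
def hit_rate_at_k(suggestions, accepted, k=5):
    """Fraction of postings with at least one accepted tag among the top k."""
    if not suggestions:
        return 0.0
    hits = sum(
        1
        for ranked, good in zip(suggestions, accepted)
        if any(tag in good for tag in ranked[:k])
    )
    return hits / len(suggestions)


def precision(suggestions, accepted):
    """Fraction of all suggested tags that were accepted as correct."""
    total = sum(len(ranked) for ranked in suggestions)
    correct = sum(
        1
        for ranked, good in zip(suggestions, accepted)
        for tag in ranked
        if tag in good
    )
    return correct / total if total else 0.0


# Hypothetical judgements for three postings.
suggested = [["berlin", "politics"], ["hardware"], ["cpu", "software"]]
judged_ok = [{"politics"}, set(), {"cpu", "software"}]
print(hit_rate_at_k(suggested, judged_ok), precision(suggested, judged_ok))
```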
| 21
Evaluation (II): absolute numbers
» More correct suggestions than
» Assumption: People will tag more
[Bar chart: absolute numbers of Incorrect and Correct tags for Original Tags vs. Suggested Tags; axis from 500 to 4000]
| 22
Tag Merging
» Goals
  » Elimination and merging of incorrectly spelled tags
  » Merging of different spelling variations
» Example
  » "computer" vs. "computers" (singular/plural)
| 23
Tag Merging - Algorithm
[Diagram: an input tag is passed to a spell checker backed by a dictionary of tags; the spell checker returns candidate tags (scores .543, .334, .275, ...); the candidates are inspected using different similarity measures, giving similar tags with score (.76, .38)]
| 24
Tag Merging - Algorithm
» Why this extra step?
| 25
Tag Merging - Algorithm
» Computing similarities is slow
» Pairwise checking is Θ(n²)
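The point of the candidate step is that the expensive similarity measure only runs on a short list instead of on all Θ(n²) tag pairs. The sketch below illustrates that idea under assumptions: a character-bigram inverted index stands in for the spell checker (the slides do not say which one was used), and `difflib.SequenceMatcher` from the Python standard library stands in for the similarity measures. All function names and the thresholds are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def bigrams(tag):
    # Character bigrams with start/end markers, e.g. "^c", "co", ..., "r$".
    padded = f"^{tag.lower()}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def build_index(tags):
    # Inverted index: bigram -> set of tags containing it.
    index = defaultdict(set)
    for tag in tags:
        for gram in bigrams(tag):
            index[gram].add(tag)
    return index


def candidates(tag, index, min_shared=2):
    # Cheap filter: only tags sharing at least `min_shared` bigrams survive.
    counts = defaultdict(int)
    for gram in bigrams(tag):
        for other in index.get(gram, ()):
            if other != tag:
                counts[other] += 1
    return [other for other, n in counts.items() if n >= min_shared]


def similar_tags(tag, index, threshold=0.8):
    # Expensive step: score the few candidates, keep those above the threshold.
    scored = [(SequenceMatcher(None, tag, c).ratio(), c) for c in candidates(tag, index)]
    return sorted([(s, c) for s, c in scored if s >= threshold], reverse=True)


index = build_index(["computer", "computers", "history", "histryo", "Berlin"])
print(similar_tags("histryo", index))
```

With this layout each input tag is compared only against tags that already look alike on the surface, so the number of similarity computations depends on the candidate list length rather than on the size of the whole tag set.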
| 26
Tag Merging - Algorithm
[Diagram detail: spell-checker candidates (.543, .334, .275, ...) are inspected using different similarity measures, giving scored tags (.76, .38)]
» Levenshtein
» Jaro-Winkler
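As an illustration of the first measure named here, the following is a standard dynamic-programming Levenshtein distance plus one common way to turn it into a score between 0 and 1. The normalization by the longer string's length is an assumption; the slides do not say how the scores were scaled. Jaro-Winkler is not shown.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def similarity(a, b):
    # Assumed normalization: 1.0 for identical strings, 0.0 for no overlap.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("computer", "computers"), similarity("history", "histryo"))
```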
| 27
[Diagram detail: spell-checker candidates inspected using different similarity measures; added labels: Related Tags, Tag Relations]
| 28
Tag Merging - Algorithm
[Full diagram again: input tag, spell checker with dictionary of tags, candidates, inspection using different similarity measures, similar tags with score]
Fine-tuning with Machine Learning!
| 29
Tag Merging - Evaluation
» Can reach high precision by fine-tuning with machine learning
  » Trade-off between precision and recall is tunable
  » Precision in sample (100 tags): 95%
  » Fully automated batch processing possible
  » With this setting, the tag cloud is 12% smaller
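How the precision/recall trade-off can be tuned is sketched below with a plain score threshold. This is a simplification and an assumption: the slides mention fine-tuning with machine learning but do not describe the model, so a threshold sweep over hand-labelled pairs stands in for it. The scores and labels are made up for illustration.

```python
def precision_recall(scored_pairs, threshold):
    """scored_pairs: list of (similarity score, is_true_duplicate)."""
    predicted = [label for score, label in scored_pairs if score >= threshold]
    true_positives = sum(predicted)
    positives = sum(label for _, label in scored_pairs)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / positives if positives else 1.0
    return precision, recall


def lowest_threshold_with_precision(scored_pairs, target):
    # Recall never rises as the threshold rises, so the lowest threshold
    # that reaches the target precision also keeps recall as high as possible.
    for threshold in sorted({score for score, _ in scored_pairs}):
        precision, recall = precision_recall(scored_pairs, threshold)
        if precision >= target:
            return threshold, precision, recall
    return None


# Hypothetical labelled tag pairs: (score, merge is correct?)
pairs = [(0.95, True), (0.90, True), (0.85, False),
         (0.80, True), (0.60, False), (0.40, False)]
print(lowest_threshold_with_precision(pairs, 0.95))
```

A stricter precision target pushes the threshold up and merges fewer tags, which is the trade-off that makes fully automated batch processing a matter of choosing an acceptable error rate.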
| 30
Conclusion
» Proposed ways to combine strengths of folksonomies and ontologies
  » Semi-automated Tagging and ...
  » Tag Merging to increase folksonomy quality
» Outlined plan for future work
| 31
Thank You for Your Attention!
» Questions?
| 32
Tag Suggestions - Algorithm
» Rocchio with dimensionality reduction
[Diagram: tagged postings go through term extraction / dimensionality reduction into a bag of words for each tag, stored in an index; a query against the index returns tags]