From Web 2.0 to Semantic Web: A Semi-Automated Approach



slide-1
SLIDE 1

| 1

From Web 2.0 to Semantic Web: A Semi-Automated Approach

Andreas Heß, Christian Maaß and Francis Dierick
Lycos Europe
01/06/2008

slide-2
SLIDE 2

| 2

Outline

» Motivation
» Proposals for better tagging
» Tag suggestion / semi-automated tagging
» Tag merging
» Conclusion

slide-3
SLIDE 3

| 3

Motivation

» Ontologies: high entrance barriers for ordinary users
» Folksonomies: widely used, low entrance barriers
» Goals
  » Draw benefits from their complementary nature
  » Improve quality of folksonomies
  » Eventually merge folksonomies and ontologies

slide-4
SLIDE 4

[Diagram: Semantic Web vs. Web 2.0. On the Semantic Web side, experts develop the ontology: the concepts Thing, Person, Party, Occupation, Merkel and Chancellor, linked by is_a and has_a relations.]

slide-5
SLIDE 5

[Diagram: Semantic Web vs. Web 2.0.
Web 2.0 side: the community provides content (123.jpg) and meta-data (the tags "Angela Merkel", "CDU", "Berlin") that refers to it. Search request: Angela Merkel. Search result: 123.jpg.
Semantic Web side: experts develop the ontology (Thing, Person, Party, Occupation, Merkel, Chancellor; is_a and has_a relations).]

slide-6
SLIDE 6

[Diagram: Semantic Web (expert-built ontology) vs. Web 2.0 (community content 123.jpg with the tags "Angela Merkel", "CDU", "Berlin"), annotated with the advantages and disadvantages of each side.]

Semantic Web

Advantages

+ Ontology controlled by experts
+ Reasoning, inference

Disadvantages

  • Language of experts != language of users
  • Low proliferation & expensive

Web 2.0

Advantages

+ User's vocabulary
+ High proliferation & cheap

Disadvantages

  • Error-prone & unstructured
  • Lack of quality control

slide-7
SLIDE 7

[Diagram: Semantic Web (expert-built ontology) vs. Web 2.0 (community content 123.jpg with the tags "Angela Merkel", "CDU", "Berlin"), now connected by "Mutual assistance".]

Mutual assistance

slide-8
SLIDE 8

[Diagram: Semantic Web (expert-built ontology) vs. Web 2.0 (community content 123.jpg with the tags "Angela Merkel", "CDU", "Berlin"). The ontology contributes background information to the tagged content.]

Background information: Merkel → Chancellor

slide-9
SLIDE 9

| 9

Moving from Folksonomies to Ontologies: Tag Quality

[Tag cloud: CPU, Berlin, Angela Merkel, politics, member of parliament, Europe, Techno, Alt, screen, ugagua, hardware, computers, software, clock, Lycos, hard drive, hard drives, computer, Germany, watch, histryo, MP, history, Tiger]

Tag Merging: Eliminate duplicates / synonyms / misspellings / nonsense

slide-10
SLIDE 10

| 10

Moving from Folksonomies to Ontologies: Tag Quality

[Tag cloud: CPU, Berlin, Angela Merkel, politics, member of parliament, Europe, Techno, Alt, screen, ugagua, hardware, computers, software, clock, Lycos, hard drive, hard drives, computer, Germany, watch, histryo, MP, history, Tiger]

Tag Merging: Eliminate duplicates / synonyms / misspellings / nonsense

slide-11
SLIDE 11

| 11

Moving from Folksonomies to Ontologies: Tag Quality

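[Tag cloud after merging: CPU, Berlin, Angela Merkel, politics, member of parliament, Europe, Techno, Alt, screen, hardware, computers, software, clock, Lycos, hard drives, Germany, watch, MP, history, Tiger. The tags "ugagua", "hard drive", "computer" and "histryo" no longer appear.]

Tag Merging: Eliminate duplicates / synonyms / misspellings / nonsense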

slide-12
SLIDE 12

| 12

Moving from Folksonomies to Ontologies: Tag Quality

[Tag cloud: CPU, Berlin, Angela Merkel, politics, member of parliament, Europe, Techno, Alt, screen, hardware, computers, software, clock, Lycos, hard drives, Germany, watch, MP, history, Tiger]

Topic Detection

slide-13
SLIDE 13

| 13

Moving from Folksonomies to Ontologies: Tag Quality

[Tag cluster for one detected topic: CPU, screen, hardware, computers, software, hard drives]

Topic Detection

slide-14
SLIDE 14

| 14

Moving from Folksonomies to Ontologies: Tag Quality

[Tag cluster: CPU, screen, hardware, computers, software, hard drives, with relations drawn between the tags]

Relation Extraction

slide-15
SLIDE 15

| 15

Moving from Folksonomies to Ontologies: Tag Quality

[Tag cluster: CPU, screen, hardware, computers, software, hard drives, with the relations between the tags labelled is_a, part_of and is_a]

Relation Qualification

slide-16
SLIDE 16

| 16

Proposed Measures

» Semi-Automated Tagging
  » Lower the threshold towards creating meta-data
» Tag Merging
  » Improving tag quality
» Extract Relations
  » First step on the move from folksonomies to a more structured form
» User Rating
  » Involve users in refining quality
» Information Extraction
  » Automatically fill blanks

slide-18
SLIDE 18

| 18

Semi-Automated Tagging

» Text classification, training data needed
» Semi-automated annotation of very short texts

slide-19
SLIDE 19

| 19

Choice of Classification Algorithm

» Speed is important
  » Interactive: user does not want to wait
» Use well-known Rocchio text classification algorithm
  » Simple, fast, incremental, suitable for high number of classes
  » Works well only if texts are short and of similar length
  » ... but this is the case here
» Use part-of-speech tagger for dimensionality reduction
  » Only nouns and proper nouns
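A minimal sketch of a Rocchio-style tag suggester, to make the "simple, fast, incremental" claim concrete. It is an illustration under assumptions, not the implementation behind the slides: the class name `RocchioTagger`, raw term counts as weights, and the use of positive examples only (no negative-example term) are all choices made here for brevity.

```python
import math
from collections import Counter, defaultdict


class RocchioTagger:
    """Centroid-based (Rocchio-style) tag suggester over bag-of-words vectors."""

    def __init__(self):
        # One running term-count sum per tag. Adding a posting only touches
        # the sums of its own tags, which keeps training incremental.
        self.term_sums = defaultdict(Counter)

    def add(self, terms, tags):
        """Fold one tagged posting (list of terms, list of tags) into the model."""
        for tag in tags:
            self.term_sums[tag].update(terms)

    @staticmethod
    def _cosine(a, b):
        # Cosine similarity ignores vector length, so the summed counts
        # behave like an averaged centroid.
        dot = sum(weight * b.get(term, 0) for term, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def suggest(self, terms, k=5):
        """Return up to k (tag, score) pairs, best first, dropping zero scores."""
        query = Counter(terms)
        scored = [(tag, self._cosine(query, centroid))
                  for tag, centroid in self.term_sums.items()]
        scored = [(tag, score) for tag, score in scored if score > 0]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy postings, invented for this example.
tagger = RocchioTagger()
tagger.add(["merkel", "chancellor", "berlin"], ["politics"])
tagger.add(["cpu", "hard", "drive"], ["hardware"])
print(tagger.suggest(["merkel", "berlin"]))
```

Each suggestion costs one sparse dot product per known tag, which is why the approach stays responsive in an interactive setting with many classes.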

slide-20
SLIDE 20

| 20

Evaluation (I): Precision

» Tested precision with 4 test users
» Original tagging far from perfect
» Suggestion quality not great
» But good enough for interactive use
» In 87% of cases, at least one correct prediction within the top 5

[Bar chart: precision of original tags vs. suggested tags for Person 1 to Person 4 and the average; axis from 10 to 100]
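The "at least one correct prediction within top 5" figure is a hit rate at k. The sketch below shows how such a number could be computed; the function name and the toy judgments are assumptions made for illustration and are not the data from the evaluation.

```python
def hit_rate_at_k(suggestions, judged_correct, k=5):
    """Share of postings with at least one correct tag among the top-k suggestions.

    suggestions:    one ranked list of suggested tags per posting
    judged_correct: one set of tags judged correct per posting
    """
    if not suggestions:
        return 0.0
    hits = sum(
        1
        for ranked, correct in zip(suggestions, judged_correct)
        if any(tag in correct for tag in ranked[:k])
    )
    return hits / len(suggestions)


# Invented judgments for three postings.
ranked_suggestions = [["politics", "berlin"], ["hardware"], ["music"]]
correct_tags = [{"berlin"}, {"cpu"}, {"music", "techno"}]
print(hit_rate_at_k(ranked_suggestions, correct_tags))
```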

slide-21
SLIDE 21

| 21

Evaluation (II): absolute numbers

» More correct suggestions than original tags in total
» Assumption: People will tag more

[Bar chart: absolute numbers of correct and incorrect tags for original tags vs. suggested tags; axis from 500 to 4000]

slide-22
SLIDE 22

| 22

Tag Merging

» Goals
  » Elimination and merging of incorrectly spelled tags
  » Merging of different spelling variations
» Example
  » "computer" vs. "computers" (singular/plural)

slide-23
SLIDE 23

| 23

Tag Merging - Algorithm

[Diagram: the input tag goes to a spell checker, which draws on a dictionary and the existing tags and returns scored candidates (.543, .334, .275, ...). The candidates are then inspected using different similarity measures, giving similar tags with a score (.76, .38).]

slide-24
SLIDE 24

| 24

Tag Merging - Algorithm

[Diagram: the input tag goes to a spell checker, which draws on a dictionary and the existing tags and returns scored candidates (.543, .334, .275, ...). The candidates are then inspected using different similarity measures, giving similar tags with a score (.76, .38).]

» Why this extra step?

slide-25
SLIDE 25

| 25

Tag Merging - Algorithm

[Diagram: the input tag goes to a spell checker, which draws on a dictionary and the existing tags and returns scored candidates (.543, .334, .275, ...). The candidates are then inspected using different similarity measures, giving similar tags with a score (.76, .38).]

» Computing similarities is slow
» Pairwise checking is Θ(n²)
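One way to avoid the Θ(n²) pairwise comparison is to let a cheap index propose candidates and run the expensive similarity measures only on those. The slides do not say which spell checker was used, so the sketch below substitutes a simple deletion-variant index (two tags become candidates if deleting at most one character from each yields the same string). All function names are hypothetical.

```python
from collections import defaultdict


def deletion_variants(tag):
    """The tag itself plus every string obtained by deleting one character."""
    return {tag} | {tag[:i] + tag[i + 1:] for i in range(len(tag))}


def build_index(tags):
    """Map each deletion variant to the set of tags that produce it."""
    index = defaultdict(set)
    for tag in tags:
        for variant in deletion_variants(tag):
            index[variant].add(tag)
    return index


def candidates(tag, index):
    """Tags sharing a deletion variant with `tag`, excluding `tag` itself."""
    found = set()
    for variant in deletion_variants(tag):
        found |= index.get(variant, set())
    found.discard(tag)
    return found


# Tags taken from the tag-cloud slides.
tags = ["computer", "computers", "history", "histryo",
        "hard drive", "hard drives", "Berlin", "CPU"]
index = build_index(tags)
for tag in tags:
    print(tag, "->", sorted(candidates(tag, index)))
```

Building the index costs work proportional to the total length of all tags, and each lookup touches only the handful of tags that share a variant, so the slower similarity measures run on a short candidate list instead of on every pair.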

slide-26
SLIDE 26

| 26

Tag Merging - Algorithm

[Diagram: the spell checker returns scored candidates (.543, .334, .275, ...) for a tag. The candidates are inspected using different similarity measures, giving scores such as .76 and .38.]

Similarity measures: Levenshtein, Jaro-Winkler
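For reference, a compact sketch of the first of the two named measures, Levenshtein distance, together with one common way to scale it to a similarity between 0 and 1. The scaling by the longer string's length is an assumption made here; the slides do not state how scores were normalised. Jaro-Winkler is not implemented in this sketch.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Distance scaled to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("computer", "computers"), similarity("computer", "computers"))
print(levenshtein("histryo", "history"), similarity("histryo", "history"))
```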

slide-27
SLIDE 27

| 27

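[Diagram: the spell checker returns scored candidates (.543, .334, .275, ...) for a tag. The candidates are inspected using different similarity measures, giving scores such as .76 and .38. Related tags and tag relations feed into the inspection.]

Related Tags

Tag Relations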

slide-28
SLIDE 28

| 28

Tag Merging - Algorithm

[Diagram: the input tag goes to a spell checker, which draws on a dictionary and the existing tags and returns scored candidates (.543, .334, .275, ...). The candidates are then inspected using different similarity measures, giving similar tags with a score (.76, .38).]

Fine-tuning with Machine Learning!

slide-29
SLIDE 29

| 29

Tag Merging - Evaluation

» Can reach high precision by fine-tuning with machine learning
» Trade-off between precision and recall tunable
» Precision in sample (100 tags): 95%
» Fully automated batch processing possible
» With this setting 12% smaller tag cloud
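The tunable trade-off can be illustrated with a plain threshold sweep over labelled tag pairs. This is a deliberately simple stand-in: the slides mention machine learning without naming the learner, and the function names and scored pairs below are invented for the example.

```python
def precision_recall(scored_pairs, threshold):
    """Precision and recall when merging every pair scoring at or above threshold.

    scored_pairs: list of (similarity score, is_true_match) tuples
    """
    merged = [is_match for score, is_match in scored_pairs if score >= threshold]
    true_positives = sum(merged)
    total_matches = sum(is_match for _, is_match in scored_pairs)
    precision = true_positives / len(merged) if merged else 1.0
    recall = true_positives / total_matches if total_matches else 0.0
    return precision, recall


def lowest_threshold_for(scored_pairs, target_precision):
    """Smallest observed score that reaches the target precision, or None."""
    for threshold in sorted({score for score, _ in scored_pairs}):
        precision, _ = precision_recall(scored_pairs, threshold)
        if precision >= target_precision:
            return threshold
    return None


# Invented labelled pairs: (similarity score, whether the merge is correct).
pairs = [(0.95, True), (0.90, True), (0.80, False), (0.76, True), (0.38, False)]
threshold = lowest_threshold_for(pairs, target_precision=0.95)
print(threshold, precision_recall(pairs, threshold))
```

Raising the target precision pushes the threshold up and the recall down, which is the knob a fully automated batch run would have to fix in advance.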

slide-30
SLIDE 30

| 30

Conclusion

» Proposed ways to combine strengths of folksonomies and ontologies
  » Semi-automated Tagging and ...
  » Tag Merging to increase folksonomy quality
» Outlined plan for future work

slide-31
SLIDE 31

| 31

Thank You for Your Attention!

» Questions?

slide-32
SLIDE 32

| 32

Tag Suggestions - Algorithm

» Rocchio with dimensionality reduction

[Diagram: tagged postings → extract terms / dimensionality reduction → bag of words for each tag → index. A query against the index returns tags.]
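The "extract terms / dimensionality reduction" step can be sketched as a filter that keeps only nouns and proper nouns before the bag of words is built. The sketch assumes the posting has already been run through a part-of-speech tagger that emits Penn Treebank tags; which tagger and tag set were actually used is not stated in the slides, and `noun_bag` is a hypothetical name.

```python
from collections import Counter

# Penn Treebank tags for common and proper nouns, singular and plural.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_bag(tagged_tokens):
    """Bag of words over the nouns and proper nouns of one POS-tagged posting."""
    return Counter(token.lower() for token, pos in tagged_tokens if pos in NOUN_TAGS)


# A hand-tagged example sentence, invented for illustration.
posting = [("Merkel", "NNP"), ("visited", "VBD"), ("the", "DT"), ("old", "JJ"),
           ("parliament", "NN"), ("in", "IN"), ("Berlin", "NNP")]
print(noun_bag(posting))
```

Dropping verbs, adjectives and function words shrinks the vectors that the classifier has to compare, which matters when suggestions must be produced while the user waits.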