On the Use of Linked Open Data for Trusting Web Data Davide Ceolin - - PowerPoint PPT Presentation



slide-1
SLIDE 1

On the Use of Linked Open Data for Trusting Web Data

Davide Ceolin and Valentina Maccatrozzo VU University Amsterdam

slide-2
SLIDE 2

Outline

  • Premises
  • Introduction
  • A Natural History Case Study
  • A Cultural Heritage Case Study
  • Future directions
  • Recap, bibliography, etc.
slide-3
SLIDE 3

Premises

  • Trust ≈ Reliability.
  • We make no assumption about the intentions of the data creator.
  • This presentation reflects on past work (see refs.) and outlines future directions.

slide-4
SLIDE 4

Introduction

  • Trust management: subjective logic (Jøsang, 2001).
  • Extends Boolean and probabilistic logic.
  • Reasoning on “opinions” about propositions based on evidence.
  • Accounts for source and uncertainty (inversely proportional to size of evidence set).

slide-5
SLIDE 5

Subjective logic: basics

$\omega^{\text{source}}_{\text{proposition}} = (b, d, u)$

  • b + d + u = 1
  • b ≈ p(proposition)
  • u is inversely proportional to the size of the evidence set.
  • operators: boolean, discounting, fusion...
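
A minimal sketch of how an opinion (b, d, u) can be built from evidence counts, assuming the standard evidence mapping from Jøsang (2001) with a non-informative prior weight of 2. The class and function names are illustrative only and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """Binomial opinion (b, d, u) with b + d + u = 1."""
    belief: float
    disbelief: float
    uncertainty: float

    def expected_value(self, base_rate: float = 0.5) -> float:
        # Probability estimate: belief plus the share of uncertainty
        # assigned by the base rate (0.5 is an assumed default).
        return self.belief + base_rate * self.uncertainty


def opinion_from_evidence(positive: int, negative: int) -> Opinion:
    """Map counts of positive/negative observations to an opinion."""
    if positive < 0 or negative < 0:
        raise ValueError("evidence counts must be non-negative")
    total = positive + negative + 2  # 2 = assumed non-informative prior weight
    return Opinion(positive / total, negative / total, 2 / total)


few = opinion_from_evidence(3, 1)     # small evidence set
many = opinion_from_evidence(30, 10)  # same ratio, ten times the evidence
```

With the same ratio of positive to negative evidence, the larger evidence set gives a lower uncertainty, which is the behaviour the bullet on u describes.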
slide-6
SLIDE 6

Trusting Web data using subjective logic

  • We adopt supervised learning algorithms (when a training set is available).
  • Trust is subjective.
  • We look for different “opinions” about the data. Subjective logic allows us to handle them.
  • First we estimate the data trustworthiness, then we select the “best” data (based, e.g., on author reputation).
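
One way to handle several opinions about the same data is the fusion operator mentioned among the subjective logic operators. The sketch below assumes Jøsang's cumulative fusion (consensus) for two opinions from independent sources; the names are illustrative, and the equal-weight average used when both opinions have zero uncertainty is a simplifying assumption.

```python
from typing import NamedTuple


class Opinion(NamedTuple):
    belief: float
    disbelief: float
    uncertainty: float


def cumulative_fusion(a: Opinion, b: Opinion) -> Opinion:
    """Combine two opinions about the same proposition."""
    k = a.uncertainty + b.uncertainty - a.uncertainty * b.uncertainty
    if k == 0:
        # Both opinions are dogmatic (u = 0); assume equal weights.
        return Opinion((a.belief + b.belief) / 2,
                       (a.disbelief + b.disbelief) / 2,
                       0.0)
    return Opinion(
        (a.belief * b.uncertainty + b.belief * a.uncertainty) / k,
        (a.disbelief * b.uncertainty + b.disbelief * a.uncertainty) / k,
        (a.uncertainty * b.uncertainty) / k,
    )


# Opinion built from 3 positive and 1 negative observations.
first = Opinion(3 / 6, 1 / 6, 2 / 6)
# Opinion built from 1 positive and 1 negative observation.
second = Opinion(1 / 4, 1 / 4, 2 / 4)
fused = cumulative_fusion(first, second)
```

Under these assumptions the fused opinion equals the one obtained by pooling the evidence (4 positive, 2 negative), so fusing opinions lowers uncertainty in the same way as collecting more evidence.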

slide-7
SLIDE 7

Using LOD to assist evidential reasoning

  • LOD provide lots of useful data.
  • More evidence.
  • Subjective logic’s distributions ≈ (at least some) LOD datasets’ distributions (Ceolin et al., 2011).

slide-8
SLIDE 8

Photo: flickr.com/clumsyjim

Museums...

slide-9
SLIDE 9

Photo: flickr.com/clumsyjim

Museums...

...have a problem.

Photo: flickr.com/grrrl

slide-10
SLIDE 10

Photo: flickr.com/anirudhkoul

So they recruit some help...

slide-11
SLIDE 11

Trusting Museum Annotations

  • Museums manage large collections.
  • Several museums crowdsource annotations.
  • The quality and accuracy of annotations are crucial for their business.

  • Can they trust crowdsourced annotations?
slide-12
SLIDE 12

A Natural History Case Study

specimen 1   user1   aves xx    ✓
specimen 2   user2   aves xyz   ✗
specimen 3   user1   aves xz1   ✓
specimen 4   user2   aves yy    ✓
specimen 5   user3   aves zz    ✗

slide-13
SLIDE 13

A Natural History Case Study

specimen 1   user1   aves xx    ✓
specimen 2   user2   aves xyz   ✗
specimen 3   user1   aves xz1   ✓
specimen 4   user2   aves yy    ✓
specimen 5   user3   aves zz    ✗

slide-14
SLIDE 14

A Natural History Case Study

specimen 1   user1   aves xx    ✓   tax author1
specimen 2   user2   aves xyz   ✗   tax author2
specimen 3   user1   aves xz1   ✓   tax author1
specimen 4   user2   aves yy    ✓   tax author1
specimen 5   user3   aves zz    ✗   tax author2

slide-15
SLIDE 15

A Natural History Case Study

specimen 1   user1   aves xx    ✓   tax author1
specimen 2   user2   aves xyz   ✗   tax author2
specimen 3   user1   aves xz1   ✓   tax author1
specimen 4   user2   aves yy    ✓   tax author1
specimen 5   user3   aves zz    ✗   tax author2

Increased accuracy, from 53% to 82%, on a museum dataset (Ceolin et al., 2010).
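
The toy table on this slide can be turned into evidence-based opinions per user or per taxon author. The sketch below is only an illustration under stated assumptions: it is not the model of Ceolin et al. (2010), the pairing of taxon authors with rows follows the order in which they appear on the slide, and the evidence mapping with prior weight 2 and the 0.5 base rate are assumed defaults.

```python
from collections import defaultdict

# Rows as read off the slide:
# (specimen, user, annotation, taxon author, verified as correct).
ROWS = [
    ("specimen 1", "user1", "aves xx", "tax author1", True),
    ("specimen 2", "user2", "aves xyz", "tax author2", False),
    ("specimen 3", "user1", "aves xz1", "tax author1", True),
    ("specimen 4", "user2", "aves yy", "tax author1", True),
    ("specimen 5", "user3", "aves zz", "tax author2", False),
]
USER, TAXON_AUTHOR = 1, 3  # column indexes used as grouping keys


def opinions_by(rows, column):
    """Group verified rows by a column and build (b, d, u) per group."""
    counts = defaultdict(lambda: [0, 0])  # key -> [positive, negative]
    for row in rows:
        counts[row[column]][0 if row[-1] else 1] += 1
    result = {}
    for key, (positive, negative) in counts.items():
        total = positive + negative + 2  # assumed prior weight of 2
        result[key] = (positive / total, negative / total, 2 / total)
    return result


def expected_value(opinion, base_rate=0.5):
    belief, _disbelief, uncertainty = opinion
    return belief + base_rate * uncertainty


by_user = opinions_by(ROWS, USER)
by_author = opinions_by(ROWS, TAXON_AUTHOR)
ranked_users = sorted(by_user,
                      key=lambda user: expected_value(by_user[user]),
                      reverse=True)
```

In this toy data the extra taxon-author column separates correct from incorrect annotations more sharply than the user column does, which illustrates why enriching the evidence can help.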

slide-16
SLIDE 16

Another Case Study

Semantic similarity for weighing evidence.

[Figure: diagram labelled Expertise, Training set, New annotation and Semantic similarity; example terms: Tulip, Flower, Red, Pink, Purple, Rose.]

Up to 84% accuracy, 88% precision and 96% recall on two museum datasets (Ceolin et al., 2013a).
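
A minimal sketch of weighing evidence by semantic similarity: each past annotation counts as evidence in proportion to its similarity to the new annotation. Everything specific here is hypothetical: the similarity scores, the accepted/rejected outcomes of the training annotations, and the choice of "rose" as the new annotation are made up for illustration, and the slides do not state which similarity measure was used.

```python
# Hypothetical similarity scores in [0, 1]; a real system would derive
# them from a semantic similarity measure.
SIMILARITY = {
    ("rose", "tulip"): 0.8,
    ("rose", "flower"): 0.7,
    ("rose", "red"): 0.3,
    ("rose", "pink"): 0.3,
    ("rose", "purple"): 0.2,
}


def similarity(a: str, b: str) -> float:
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def weighted_opinion(new_term, training):
    """Build (b, d, u) from past annotations weighted by similarity."""
    positive = sum(similarity(new_term, term)
                   for term, accepted in training if accepted)
    negative = sum(similarity(new_term, term)
                   for term, accepted in training if not accepted)
    total = positive + negative + 2  # assumed prior weight of 2
    return positive / total, negative / total, 2 / total


# Hypothetical training set: (annotation, was it accepted?).
TRAINING = [
    ("tulip", True),
    ("flower", True),
    ("red", False),
    ("pink", True),
    ("purple", False),
]

belief, disbelief, uncertainty = weighted_opinion("rose", TRAINING)
```

Because dissimilar annotations contribute less than one unit of evidence each, the resulting opinion is more uncertain than an unweighted count over the same five annotations would be.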

slide-17
SLIDE 17

Future work

  • We used similar methods (plus other statistical techniques) to analyze the reliability of UK Police Open Data (Ceolin et al., 2013b).
  • We plan to extend them with LOD, e.g. for:
      • geodisambiguation;
      • crime type hierarchies.
slide-18
SLIDE 18

Recap

  • LOD + evidential reasoning (subjective logic) is a powerful combination for trust (reliability) estimation:
      • enrichment;
      • weighing.
  • The more the better, but:
      • evidence quality counts;
      • data needs to be tracked (W3C PROV) and properly managed.

slide-19
SLIDE 19

Bibliography

  • Jøsang, A. A logic for uncertain probabilities. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 9(3), pp. 279-311, 2001.
  • Ceolin, D., van Hage, W.R., Fokkink, W. A Trust Model to Estimate Quality of Annotations using the Web. In WebSci, Web Science Repository, 2010.
  • Ceolin, D., van Hage, W.R., Fokkink, W., Schreiber, G. Estimating Uncertainty of Categorical Web Data. In URSW, CEUR-ws.org, 2011.
  • Ceolin, D., Nottamkandath, A., Fokkink, W. Semi-automated Assessment of Annotations Trustworthiness. In PST Conference, IEEE, 2013a.
  • Ceolin, D., Moreau, L., O'Hara, K., Schreiber, G., Sackley, A., Fokkink, W., van Hage, W.R., Shadbolt, N. Reliability Analyses of Open Government Data. In URSW, CEUR-ws.org, 2013b.
  • Ceolin, D., Nottamkandath, A., Fokkink, W. Efficient Semi-automated Assessment of Annotations Trustworthiness. In Journal of Trust Management, Springer. (Accepted, 2014)

slide-20
SLIDE 20

Thank you! Any questions? d.ceolin@vu.nl