Provenance in Dynamic Linked Data (Marcin Wylot)



SLIDE 1

Provenance in Dynamic Linked Data

Marcin Wylot

SLIDE 2

Linking Everything: Dynamic Graphs

➢ Integrated and summarized uncertain graph data
➢ Dynamic physical and logical network of “things”
➢ Necessity to establish transparency

SLIDE 3

Data Provenance

“Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness.”

Which pieces of data were combined, and how, to produce the results?

SLIDE 4

Outline

➢ Storing and tracking provenance in Linked Data [DONE]
➢ Restricting query execution with provenance data [DONE]
➢ Provenance in dynamic data [FUTURE]
➢ Provenance for performance [FUTURE]

SLIDE 5

How to store and track provenance in Linked Data processing?

➢ a new way to express the provenance of query results at two different granularity levels by leveraging the concept of provenance polynomials
➢ two new storage models to represent provenance data compactly in a native RDF data store
➢ query execution strategies to derive the provenance polynomials while executing the queries

Wylot, Marcin, Philippe Cudré-Mauroux, and Paul Groth. "TripleProv: Efficient processing of lineage queries in a native RDF store." Proceedings of the 23rd International Conference on World Wide Web. ACM, 2014.

SLIDE 6

Provenance Polynomials

➢ Ability to characterize the ways each source contributed
➢ Pinpoint the exact source of each result
➢ Trace back the list of sources and the way they were combined to deliver a result

Geerts, Floris, et al. "Algebraic structures for capturing the provenance of SPARQL queries." Proceedings of the 16th International Conference on Database Theory. ACM, 2013.

SLIDE 7

Polynomial Operators

➢ Union (⊕)
  ○ constraint or projection satisfied with multiple sources: l1 ⊕ l2 ⊕ l3
  ○ multiple entities satisfy a set of constraints or projections
➢ Join (⊗)
  ○ sources joined to handle a constraint or a projection
  ○ OS and OO joins between a few sets of constraints: (l1 ⊕ l2) ⊗ (l3 ⊕ l4)
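The two operators can be pictured as a small expression tree. The following is a minimal sketch, not TripleProv's actual data structure: the names `Poly`, `src`, `union`, `join`, `render` and `sources` are all hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Poly:
    """Node of a provenance polynomial: a leaf source, a union, or a join."""
    op: str                          # "src", "⊕" or "⊗"
    label: str = ""                  # source identifier, used by leaves only
    parts: Tuple["Poly", ...] = ()   # operands, used by ⊕ and ⊗ nodes only


def src(label: str) -> Poly:
    return Poly("src", label=label)


def union(*parts: Poly) -> Poly:
    """⊕: any one of the operands satisfies the constraint or projection."""
    return Poly("⊕", parts=parts)


def join(*parts: Poly) -> Poly:
    """⊗: all operands were combined to satisfy the constraint or projection."""
    return Poly("⊗", parts=parts)


def render(p: Poly, top: bool = True) -> str:
    """Print the polynomial in the notation used on the slides."""
    if p.op == "src":
        return p.label
    body = f" {p.op} ".join(render(c, top=False) for c in p.parts)
    return body if top else f"({body})"


def sources(p: Poly) -> Set[str]:
    """Collect every source that appears anywhere in the polynomial."""
    if p.op == "src":
        return {p.label}
    return set().union(*(sources(c) for c in p.parts))


l1, l2, l3, l4 = (src(n) for n in ("l1", "l2", "l3", "l4"))
print(render(union(l1, l2, l3)))                   # l1 ⊕ l2 ⊕ l3
print(render(join(union(l1, l2), union(l3, l4))))  # (l1 ⊕ l2) ⊗ (l3 ⊕ l4)
```

Keeping the tree rather than a flat list of sources is what preserves "the way they were combined"; `sources` recovers the flat list when only that is needed.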

SLIDE 8

Example Polynomial

select ?lat ?long where {
  ?a ?p "Eiffel Tower" .
  ?a inCountry FR .
  ?a lat ?lat .
  ?a long ?long .
}

(l1 ⊕ l2 ⊕ l3) ⊗ (l4 ⊕ l5) ⊗ (l6 ⊕ l7) ⊗ (l8 ⊕ l9)
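One way to read this polynomial is that each parenthesized factor lists the sources that matched one triple pattern of the query. That mapping is an assumption here (the slide does not state which sources belong to which pattern), as is treating ⊕ and ⊗ as semiring operations so that ⊗ distributes over ⊕. Under those assumptions, a short sketch can rebuild the polynomial and enumerate the individual derivations it encodes:

```python
from itertools import product

# Assumed mapping (not given on the slide): triple pattern -> matching sources.
matches = {
    '?a ?p "Eiffel Tower"': ["l1", "l2", "l3"],
    "?a inCountry FR": ["l4", "l5"],
    "?a lat ?lat": ["l6", "l7"],
    "?a long ?long": ["l8", "l9"],
}


def polynomial(matches):
    """Union the sources of each pattern, then join the patterns."""
    factors = ["(" + " ⊕ ".join(srcs) + ")" for srcs in matches.values()]
    return " ⊗ ".join(factors)


def derivations(matches):
    """Distribute ⊗ over ⊕: each derivation picks one source per pattern."""
    return [" ⊗ ".join(combo) for combo in product(*matches.values())]


print(polynomial(matches))
# (l1 ⊕ l2 ⊕ l3) ⊗ (l4 ⊕ l5) ⊗ (l6 ⊕ l7) ⊗ (l8 ⊕ l9)
print(len(derivations(matches)))   # 24
print(derivations(matches)[0])     # l1 ⊗ l4 ⊗ l6 ⊗ l8
```

The factored form stays compact (nine source mentions) while standing for 3 × 2 × 2 × 2 = 24 alternative derivations.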

SLIDE 9

How can we efficiently support queries tailored with provenance information?

➢ a characterization of provenance-enabled queries (RDF queries tailored with provenance data)
➢ storage model and indexing technique extensions to handle provenance-aware query execution strategies
➢ five provenance-oriented query execution strategies

Wylot, Marcin, Philippe Cudré-Mauroux, and Paul Groth. "Executing provenance-enabled queries over web data." Proceedings of the 24th International Conference on World Wide Web. ACM, 2015.

SLIDE 10

Provenance-Enabled Query

A Workload Query is a query producing results a user is interested in. These results are referred to as workload query results.

A Provenance Query is a query that selects a set of data from which the workload query results should originate.

A Provenance-Enabled Query is a pair consisting of a Workload Query and a Provenance Query, producing results a user is interested in (as specified by the Workload Query) and originating only from data pre-selected by the Provenance Query.

SLIDE 11

Provenance-Enabled Query: Example

SELECT ?title WHERE {
  ?a <type> <article> .
  ?a <tag> <Obama> .
  ?a <title> ?title .
}

➢ ensure that the articles come from sources attributed to the government

SELECT ?ctx WHERE {
  ?ctx prov:wasAttributedTo <government> .
}

➢ ensure that the data used to produce the answer was associated with a “SeniorEditor” and a “Manager”

SELECT ?ctx WHERE {
  ?ctx prov:wasGeneratedBy <articleProd> .
  <articleProd> prov:wasAssociatedWith ?ed .
  ?ed rdf:type <SeniorEditor> .
  <articleProd> prov:wasAssociatedWith ?m .
  ?m rdf:type <Manager> .
}
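The pairing of a workload query with a provenance query can be illustrated over a handful of in-memory quads. This is only a sketch: the toy data, the `Quad` type and the function names are hypothetical, and "filter by context first, then run the workload query" is just one simple evaluation order, not necessarily any of the strategies from the paper.

```python
from typing import List, NamedTuple, Set, Tuple


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    ctx: str   # context (named graph) the triple belongs to


# Hypothetical toy data: two articles stored in two different contexts.
DATA = [
    Quad("a1", "type", "article", "ctx1"),
    Quad("a1", "tag", "Obama", "ctx1"),
    Quad("a1", "title", "Budget speech", "ctx1"),
    Quad("a2", "type", "article", "ctx2"),
    Quad("a2", "tag", "Obama", "ctx2"),
    Quad("a2", "title", "Campaign rumours", "ctx2"),
]

# Hypothetical provenance statements about those contexts.
PROV = [
    ("ctx1", "prov:wasAttributedTo", "government"),
    ("ctx2", "prov:wasAttributedTo", "tabloid"),
]


def provenance_query(prov: List[Tuple[str, str, str]]) -> Set[str]:
    """SELECT ?ctx WHERE { ?ctx prov:wasAttributedTo <government> . }"""
    return {s for s, p, o in prov
            if p == "prov:wasAttributedTo" and o == "government"}


def workload_query(quads: List[Quad]) -> List[str]:
    """SELECT ?title WHERE { ?a <type> <article> . ?a <tag> <Obama> . ?a <title> ?title . }"""
    articles = {q.s for q in quads if q.p == "type" and q.o == "article"}
    tagged = {q.s for q in quads if q.p == "tag" and q.o == "Obama"}
    return sorted(q.o for q in quads
                  if q.p == "title" and q.s in articles & tagged)


def provenance_enabled_query(quads, prov) -> List[str]:
    """Run the workload query only over data pre-selected by the provenance query."""
    allowed = provenance_query(prov)
    scoped = [q for q in quads if q.ctx in allowed]
    return workload_query(scoped)


print(workload_query(DATA))                   # ['Budget speech', 'Campaign rumours']
print(provenance_enabled_query(DATA, PROV))   # ['Budget speech']
```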

SLIDE 12

TripleProv: Query Execution Pipeline

SLIDE 13

Lessons Learnt

  • Provenance overhead does not have to be high.
  • We can leverage provenance information to improve performance.

SLIDE 14

Dynamic Linked Data

➢ Velocity
➢ Dynamic structure of the graph
➢ Incomplete data
➢ Heterogeneous environment

SLIDE 15

Continuous Provenance Polynomial

  1. It has to be computed efficiently in a continuous fashion along with the execution of the query.
  2. It has to take into account the missing and recovered pieces of the data.
  3. It has to show how the query execution process evolves over time.
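The slides list these as requirements for future work and do not give a design. Purely as an illustration of what the three requirements ask for, the sketch below keeps one ⊕-group of currently available sources per triple pattern, updates it per event instead of recomputing from scratch, and records a timestamped snapshot after every change. The class name, the event model and the "∅" marker for a pattern with no available source are all assumptions made for this sketch.

```python
from typing import Dict, List, Set, Tuple


class ContinuousPolynomial:
    """Hypothetical sketch: one ⊕-group of live sources per triple pattern."""

    def __init__(self, patterns: List[str]) -> None:
        self.live: Dict[str, Set[str]] = {p: set() for p in patterns}
        self.history: List[Tuple[int, str]] = []   # (timestamp, polynomial)

    def _render(self) -> str:
        groups = []
        for srcs in self.live.values():
            if srcs:
                groups.append("(" + " ⊕ ".join(sorted(srcs)) + ")")
            else:
                groups.append("∅")   # assumed marker: no source available
        return " ⊗ ".join(groups)

    def update(self, t: int, pattern: str, source: str, present: bool) -> None:
        """Apply one event: a source appears (or reappears) or goes missing."""
        if present:
            self.live[pattern].add(source)
        else:
            self.live[pattern].discard(source)
        self.history.append((t, self._render()))


cp = ContinuousPolynomial(["?a inCountry FR", "?a lat ?lat"])
cp.update(1, "?a inCountry FR", "l4", True)    # l4 arrives
cp.update(2, "?a lat ?lat", "l6", True)        # l6 arrives
cp.update(3, "?a inCountry FR", "l4", False)   # l4 goes missing
cp.update(4, "?a inCountry FR", "l4", True)    # l4 is recovered
for t, poly in cp.history:
    print(t, poly)
# 1 (l4) ⊗ ∅
# 2 (l4) ⊗ (l6)
# 3 ∅ ⊗ (l6)
# 4 (l4) ⊗ (l6)
```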

SLIDE 16

Provenance for Performance

➢ Heavy analytics
➢ Hypothetical queries
➢ Reasoning

SLIDE 17

Take Home Message

Provenance can be traced efficiently and can be leveraged to improve the performance of query execution.
