SPARQLytics: Multidimensional Analytics for RDF
Michael Rudolf
Database Technology Group, Technische Universität Dresden
March 8, 2017

Agenda
- Motivation
- RDF and SPARQL
- Multidimensional Analytics for RDF

Motivation

Focus of Interest
- Focus moved from single entity (OLTP): bookkeeping. Where is what?
- To aggregations over sets of entities of the same kind (OLAP): reporting. What are the sales figures?
- To connections between entities: Who likes what and why? What do the friends of your customers buy?

Business Use Cases
Supply Chain Management
- Transportation & logistics: routing, tendering, tracking, auditing, payment
Track & Trace
- Pinpoint product recalls
- Mandated by law for certain industries (e.g. pharmaceuticals, food, waste)
- EU Commission's Rapid Alert System:
    Year   non-food (RAPEX)   food & feed (RASFF)
    2013   2364               3137
    2014   2435               3157
Image source: http://787updates.newairplane.com/787-Suppliers/World-Class-Supplier-Quality

RDF and SPARQL

Resource Description Framework (RDF) [WLC14]
- Subjects name an entity
- Predicates describe the relationship
- Objects can be literals or name entities
- No built-in schema
- Can re-use vocabularies and ontologies
- Suitable for inferencing facts

    @prefix amazon: <http://www.amazon.com/#> .
    @prefix customer: <http://www.amazon.com/customer#> .
    @prefix product: <http://www.amazon.com/product#> .
    @prefix category: <http://www.amazon.com/category#> .

    product:1 amazon:capacity "64 GB" .
    product:1 amazon:color "black" .
    product:1 amazon:in category:7 .
    category:7 amazon:name "Tablets" .
    category:7 amazon:partOf category:6 .
    category:6 amazon:name "Computers & Accessories" .
    customer:8 amazon:country "FR" .
    customer:8 amazon:rates product:1 .

[Figure: example graph of customers, products, and categories, connected by edges such as ordered, rates, likes, authors, records, in, part of, and contains.]
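The subject, predicate, object structure can be illustrated with a few lines of Python. This is only a sketch and not an RDF library: triples are plain tuples, prefixed names are kept as opaque strings, and `match` is a hypothetical helper introduced here for illustration. It uses the product and category triples from the slide.

```python
# A subset of the slide's triples, stored as (subject, predicate, object).
# Prefixed names are plain strings here; no RDF library is involved.
TRIPLES = [
    ("product:1", "amazon:capacity", "64 GB"),
    ("product:1", "amazon:color", "black"),
    ("product:1", "amazon:in", "category:7"),
    ("category:7", "amazon:name", "Tablets"),
    ("category:7", "amazon:partOf", "category:6"),
    ("category:6", "amazon:name", "Computers & Accessories"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Everything stated about product:1; there is no schema to consult.
about_product = match(TRIPLES, s="product:1")

# Follow two edges: product -> category -> category name.
category = match(TRIPLES, s="product:1", p="amazon:in")[0][2]
category_name = match(TRIPLES, s=category, p="amazon:name")[0][2]
```

Because objects can themselves name entities, the second lookup simply reuses the object of the first triple as the subject of the next pattern.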

SPARQL Protocol and RDF Query Language [HS13]
- Built around pattern matching, produces pattern variable bindings
- Grouping and aggregation, CRUD operations
- No multidimensional concepts ➔ complex and error-prone queries

    PREFIX amazon: <http://www.amazon.com/#>
    PREFIX category: <http://www.amazon.com/category#>

    # Average capacity per category below category:6
    SELECT (AVG(?capacity) AS ?avgCap) (?name AS ?categoryName)
    WHERE {
      ?product amazon:in ?category .
      ?category amazon:name ?name .
      ?category amazon:partOf+ category:6 .   # one or more partOf steps
      ?product amazon:capacity ?capacity
    }
    GROUP BY ?name   # group on the bound variable; ?categoryName is only a projection alias
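To make explicit what the query above computes, here is a rough Python equivalent over a tiny in-memory data set. Everything in it is an assumption made for illustration: the dictionaries, the extra products, the numeric capacities (the slide's literals such as "64 GB" are strings and could not be averaged as they stand), and the single-parent category hierarchy.

```python
from collections import defaultdict

# Hypothetical data: capacities are plain numbers (GB) so they can be averaged.
IN_CATEGORY = {
    "product:1": "category:7",
    "product:2": "category:5",
    "product:3": "category:5",
    "product:4": "category:7",
}
CAPACITY = {"product:1": 64, "product:2": 16, "product:3": 32, "product:4": 16}
NAME = {
    "category:5": "Phones",
    "category:6": "Computers & Accessories",
    "category:7": "Tablets",
}
PART_OF = {"category:7": "category:6"}  # direct parent of each category


def is_below(category, ancestor):
    """Mimic the property path amazon:partOf+ (one or more partOf steps)."""
    seen = set()
    current = PART_OF.get(category)
    while current is not None and current not in seen:
        if current == ancestor:
            return True
        seen.add(current)
        current = PART_OF.get(current)
    return False


def avg_capacity_by_category(root):
    """Group products by category name and average their capacity."""
    groups = defaultdict(list)
    for product, category in IN_CATEGORY.items():
        if is_below(category, root) and product in CAPACITY:
            groups[NAME[category]].append(CAPACITY[product])
    return {name: sum(values) / len(values) for name, values in groups.items()}


result = avg_capacity_by_category("category:6")
```

Even in this small form, the hierarchy traversal, the grouping, and the aggregation are three separate concerns that the query author has to combine by hand, which is the point the slide makes about missing multidimensional concepts.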

Multidimensional Analytics for RDF

Multidimensional Data Model [KR13]
(Base) Facts
- Describe events and measurements
- Mostly numeric and continuous
Dimensions
- Provide context for facts
- If numeric, then often discrete
- Can embody structure
Measures
- Are computed from grouped facts
- Are "arranged" in (hyper-)cubes
[Figure labels: star schema, snowflake schema; slice, dice, drill-down, roll-up]
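The relationship between facts, dimensions, and measures can be sketched in Python. The sales facts, the city-to-country hierarchy, and the function names below are all hypothetical and chosen only to show grouping, roll-up, and slice on a small scale.

```python
from collections import defaultdict

# Hypothetical base facts: one sale each, with context (city, year)
# and a numeric value (amount).
FACTS = [
    {"city": "Dresden", "year": 2013, "amount": 10.0},
    {"city": "Dresden", "year": 2014, "amount": 20.0},
    {"city": "Berlin", "year": 2014, "amount": 30.0},
    {"city": "Paris", "year": 2014, "amount": 40.0},
]

# A dimension can embody structure: a two-level hierarchy city -> country.
COUNTRY_OF = {"Dresden": "DE", "Berlin": "DE", "Paris": "FR"}


def level_value(fact, level):
    """Map a fact to its member at the requested location level."""
    return fact["city"] if level == "city" else COUNTRY_OF[fact["city"]]


def cube(facts, level, aggregate=sum):
    """Compute a measure per (location member, year) cell."""
    cells = defaultdict(list)
    for fact in facts:
        cells[(level_value(fact, level), fact["year"])].append(fact["amount"])
    return {key: aggregate(values) for key, values in cells.items()}


def slice_year(facts, year):
    """Slice: fix one dimension to a single member."""
    return [f for f in facts if f["year"] == year]


by_city = cube(FACTS, "city")
by_country = cube(FACTS, "country")                 # roll-up to a coarser level
sliced = cube(slice_year(FACTS, 2014), "country")   # slice, then aggregate
```

Drill-down is the inverse of the roll-up shown here: going from `by_country` back to `by_city` requires the base facts, which is why the measure is recomputed rather than derived from the coarser cells.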

From Intensional to Extensional Analytics
Three approaches, arranged along a time axis:
- Data Warehouse (data transformation): user, MD query, data warehouse, ETL. Intension fixed by domain expert or metadata. Import data using ETL process.
- MD Model (query generation): user, MD query, MD model, graph query. Intension fixed by metadata. Generate SPARQL queries from model.
- Extensional: user, intension & MD query, graph query. Intension not fixed up-front. Generate graph queries from user-specified intension.

SPARQLytics for the Data Enthusiast
SPARQLytics Workflow
[Figure: the user sends DSL commands and receives results; an artifacts repository holds facts (Message), dimensions (Time, Location), cubes (Postings), and so on; a query generator sends queries to a SPARQL endpoint.]

SPARQLytics for the Data Enthusiast
Example

    USING REPOSITORY "myrepo";
    SELECT FACTS {
      ?person rdf:type snvoc:Person ;
              snvoc:birthday ?birthday .
      FILTER (YEAR(NOW()) - YEAR(?birthday) >= 18)
    };
    DEFINE DIMENSION "Location" FROM (
      ?person snvoc:isLocatedIn ?city .
      ?city snvoc:isPartOf ?country .
      ?country snvoc:isPartOf ?continent .
    ) WITH (
      LEVEL "City" AS ?city,
      LEVEL "Country" AS ?country,
      LEVEL "Continent" AS ?continent
    );
    DEFINE MEASURE "Avg. No. Languages" AS COUNT(DISTINCT ?language)
      WHERE ( ?person snvoc:speaks ?language ) WITH "AVG";
    CREATE CUBE "QB" FROM "Location", ... WITH "Avg. No. Languages", ...;

1. Create artifacts in repository
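One way to picture how such definitions could be turned into a SPARQL query is the Python sketch below. It is not the SPARQLytics query generator: the `Dimension` and `Measure` classes, the `generate_query` function, the `?m` and `?value` variable names, and the nested-subquery shape are assumptions made for illustration. Prefix declarations are omitted, and persons without any `snvoc:speaks` triple are dropped rather than counted as zero.

```python
from dataclasses import dataclass


@dataclass
class Dimension:
    name: str
    pattern: str   # graph pattern linking the fact to the level variables
    levels: dict   # level name -> SPARQL variable


@dataclass
class Measure:
    name: str
    expression: str  # aggregate computed per fact
    pattern: str     # graph pattern binding the aggregated variable
    aggregate: str   # aggregate applied across facts within a group


def generate_query(fact_var, fact_pattern, dimension, level, measure):
    """Build an illustrative SPARQL query for one measure at one level."""
    level_var = dimension.levels[level]
    lines = [
        f"SELECT {level_var} ({measure.aggregate}(?m) AS ?value) WHERE {{",
        f"  {{ SELECT {fact_var} {level_var} ({measure.expression} AS ?m) WHERE {{",
        f"      {fact_pattern}",
        f"      {dimension.pattern}",
        f"      {measure.pattern}",
        f"    }} GROUP BY {fact_var} {level_var} }}",
        f"}} GROUP BY {level_var}",
    ]
    return "\n".join(lines)


location = Dimension(
    name="Location",
    pattern=("?person snvoc:isLocatedIn ?city . ?city snvoc:isPartOf ?country . "
             "?country snvoc:isPartOf ?continent ."),
    levels={"City": "?city", "Country": "?country", "Continent": "?continent"},
)
languages = Measure(
    name="Avg. No. Languages",
    expression="COUNT(DISTINCT ?language)",
    pattern="?person snvoc:speaks ?language .",
    aggregate="AVG",
)
facts = ("?person rdf:type snvoc:Person ; snvoc:birthday ?birthday . "
         "FILTER (YEAR(NOW()) - YEAR(?birthday) >= 18)")

query = generate_query("?person", facts, location, "Country", languages)
```

The inner query counts languages per person, and the outer query averages those counts per country, mirroring the two-stage definition `COUNT(DISTINCT ?language) ... WITH "AVG"` in the DSL example.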

SPARQLytics for the Data Enthusiast
Example

    USING CUBE "QB" OVER <http://localhost:3030/ds/sparql>;
    SLICE("Location", "Country", dbpedia:Italy);
    COMPUTE ("Avg. No. Languages");
    RESET FILTER("Location", "Country");
    ROLLUP("Location", 1);
    COMPUTE ("Avg. No. Languages");
    ...

1. Create artifacts in repository
2. Start session re-using artifacts
3. Iteratively explore data, optionally create additional artifacts
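The stateful nature of such a session can be sketched in Python. The `Session` class and its semantics are assumptions, not taken from the talk: the sketch assumes that a session starts at the finest level, that `ROLLUP("Location", 1)` moves one level coarser, and that `COMPUTE` hands the current state to a query generator.

```python
class Session:
    """Keeps the analytical state between commands (hypothetical semantics)."""

    def __init__(self, levels):
        # levels: dimension name -> list of level names, finest first
        self.levels = levels
        self.current = {dim: 0 for dim in levels}  # index of the active level
        self.filters = {}                          # (dimension, level) -> value

    def slice(self, dimension, level, value):
        """Restrict one level of a dimension to a single member."""
        self.filters[(dimension, level)] = value

    def reset_filter(self, dimension, level):
        """Drop a previously set restriction, if any."""
        self.filters.pop((dimension, level), None)

    def rollup(self, dimension, steps):
        """Move to a coarser level, stopping at the coarsest one."""
        top = len(self.levels[dimension]) - 1
        self.current[dimension] = min(self.current[dimension] + steps, top)

    def compute(self, measure):
        """Return the state a query generator would need for this measure."""
        return {
            "measure": measure,
            "group_by": {d: self.levels[d][i] for d, i in self.current.items()},
            "filters": dict(self.filters),
        }


# Replay the commands from the example session.
session = Session({"Location": ["City", "Country", "Continent"]})
session.slice("Location", "Country", "dbpedia:Italy")
first = session.compute("Avg. No. Languages")
session.reset_filter("Location", "Country")
session.rollup("Location", 1)
second = session.compute("Avg. No. Languages")
```

Each `compute` call sees the accumulated effect of the earlier commands, so the user only states what changes between two analytical steps instead of rewriting the whole query.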

Summary
Big Graph Data
- Not just social networks, also business scenarios
- Not enough data scientists, enable data enthusiasts
RDF and SPARQL
- Linked Open Data is a rich source of information
- SPARQL does not expose multidimensional concepts
SPARQLytics
- Re-use core SPARQL elements for defining multidimensional model
- Generate complex SPARQL queries from analytical session
- Stateful approach integrates well with data enthusiasts' workflow

Additional Material & References

References I
- Charu C. Aggarwal and Haixun Wang. A Survey of Clustering Algorithms for Graph Data. In Charu C. Aggarwal and Haixun Wang, editors, Managing and Mining Graph Data, volume 40 of Advances in Database Systems, chapter 9, pages 275–301. Springer US, 2010.
- Seyed-Mehdi-Reza Beheshti, Boualem Benatallah, Hamid Reza Motahari-Nezhad, and Mohammad Allahbakhsh. A framework and a language for on-line analytical processing on graphs. In Proceedings of the 13th International Conference on Web Information Systems Engineering (WISE), volume 7651 of Lecture Notes in Computer Science, pages 213–227. Springer, 2012.
- Peter Boncz. LDBC: Benchmarks for Graph and RDF Data Management. In Proc. IDEAS, pages 1–2. ACM, 2013.
- Fabio Crestani. Application of spreading activation techniques in information retrieval. Artificial Intelligence Review, 11(6):453–482, December 1997.
- Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han, and Philip S. Yu. Graph OLAP: Towards Online Analytical Processing on Graphs. In Proceedings of the 8th International Conference on Data Mining, pages 103–112. IEEE, December 2008.
- Hartmut Ehrig, Gregor Engels, Hans-Jörg Kreowski, and Grzegorz Rozenberg, editors. Handbook of Graph Grammars and Computing by Graph Transformation: Applications, Languages and Tools, volume 2. World Scientific, 1997.
