Stream Reasoning introduction, Emanuele Della Valle

SLIDE 1

Stream Reasoning For Linked Data

M. Balduini, J-P Calbimonte, O. Corcho, D. Dell'Aglio, and E. Della Valle

http://streamreasoning.org/events/sr4ld2014

Stream Reasoning introduction

Emanuele Della Valle emanuele.dellavalle@polimi.it http://emanueledellavalle.org

SLIDE 2

Share, Remix, Reuse — Legally

§ This work is licensed under the Creative Commons Attribution 3.0 Unported License.

§ You are free:

  • to Share: to copy, distribute and transmit the work
  • to Remix: to adapt the work

§ Under the following conditions

  • Attribution: you must attribute the work by inserting

– “[source http://streamreasoning.org/events/sr4ld2014]” at the end of each reused slide
– a credits slide stating: These slides are partially based on “Streaming Reasoning for Linked Data 2014” by M. Balduini, J-P Calbimonte, O. Corcho, D. Dell'Aglio, and E. Della Valle, http://streamreasoning.org/events/sr4ld2013

§ To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/

SLIDE 3

Agenda

§ It's a streaming world
§ Continuous semantics
§ Data Stream Management Systems and Complex Event Processors
§ Stream Reasoning
§ Research Challenges
§ Approaches
§ Structure of the tutorial
§ More on Stream Reasoning at ISWC 2014

SLIDE 4

It's a streaming World! 1/3

[source http://y2socialcomputing.files.wordpress.com/2012/06/social-media-visual-last-blog-post-what-happens-in-an-internet-minute-infographic.jpg ]

SLIDE 5

It's a streaming World! 2/3

§ Oil operations
§ Traffic
§ Financial markets
§ Social networks
§ Generate data streams!

SLIDE 6

It's a streaming World! 3/3

§ What is the expected time to failure when that turbine's bearing starts to vibrate, as detected in the last 10 minutes?
§ Is public transportation where the people are?
§ Who are the best available agents to route all these unexpected contacts about the tariff plan launched yesterday?
§ Who is driving the discussion about the top 10 emerging topics?

  • E. Della Valle, S. Ceri, F. van Harmelen, D. Fensel: It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems 24(6): 83-89 (2009)

SLIDE 7

Requirements 1/8

A system able to answer those queries must be able to

§ handle massive datasets

  • A typical oil production platform is equipped with about 400,000 sensors
  • Telecom data is the most pervasive data source in urban areas; in Milano there are 1.8 million mobile users
  • A global contact centre of a Telecom operator counts 500 million clients
  • Facebook alone has 1.1 billion active users

SLIDE 8

Requirements 2/8

A system able to answer those queries must be able to

§ process data streams on the fly

  • The sensors on a typical oil production platform generate 10,000 observations per minute, with peaks of 100,000 o/m
  • The mobile users in Milano generate 20,000 call/sms/data connections per minute, with peaks of 80,000 c/m
  • A global contact centre receives 10,000 contacts per minute, with peaks of 30,000 c/m
  • Facebook, as of May 2013, observes 3 million "I like" per minute

SLIDE 9

Requirements 3/8

A system able to answer those queries must be able to

§ cope with heterogeneous datasets

  • The sensors on a typical oil production platform have been deployed over 10 years by 10s of different producers
  • Tens of data sources are normally needed to make sense of an urban phenomenon
  • A global contact centre consists of 100s of offices owned by different subsidiary companies engaged yearly
  • Each social network has its own data model, APIs, …

SLIDE 10

Requirements 4/8

A system able to answer those queries must be able to

§ cope with incomplete data

  • 10s of sensors and networking links break down daily
  • Coverage is incomplete
  • Only standard cases are covered by fully machine-processable data records; 100s of contacts per minute are managed ad hoc
  • Conversations happen outside the social networks, too!

SLIDE 11

Requirements 5/8

A system able to answer those queries must be able to

§ cope with noisy data

  • Sensors out of operating range
  • Faulty sensors
  • Agents misunderstand, get tired, …
  • Irony, sarcasm, …

SLIDE 12

Requirements 6/8

A system able to answer those queries must be able to

§ provide reactive answers

  • detection of dangerous situations must occur within minutes
  • recommendations to citizens must be performed in a few seconds
  • routing a contact through each step of the decision tree must take less than a second
  • search autocompletion may need to be updated every few minutes

SLIDE 13

Requirements 7/8

A system able to answer those queries must be able to

§ support fine-grained information access

  • Identify a turbine among thousands
  • Locate a bus among thousands
  • Contact an agent among thousands
  • Identify an opinion maker among thousands of influencers for a topic

SLIDE 14

Requirements 8/8

A system able to answer those queries must be able to

§ integrate complex domain models of

  • operational and control processes
  • various city aspects
  • contact management, contract types, agent skills, contactor profiles, …
  • topics, user profiles, …

SLIDE 15

Requirements (wrap up)

A system able to answer those queries must be able to

§ handle massive datasets
§ process data streams on the fly
§ cope with heterogeneous datasets
§ cope with incomplete data
§ cope with noisy data
§ provide reactive answers
§ support fine-grained information access
§ integrate complex domain models

SLIDE 16

What are data streams anyway?

§ Formally:

  • Data streams are unbounded sequences of time-varying data elements

§ Less formally:

  • an (almost) “continuous” flow of information

§ Assumption

  • recent information is more relevant as it describes the current state of a dynamic system
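The definition above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `sensor_stream` generator, the `Reading` type, the area names and the synthetic values are hypothetical and do not come from the tutorial. It models a stream as an unbounded, time-ordered sequence of which a consumer can only ever hold a finite part.

```python
import itertools
from typing import Iterator, List, NamedTuple


class Reading(NamedTuple):
    timestamp: int   # logical time step (assumed unit: seconds)
    area: str
    value: float


def sensor_stream(areas=("A", "B")) -> Iterator[Reading]:
    """Unbounded stream: yields one synthetic reading per time step, forever."""
    for t in itertools.count():
        yield Reading(timestamp=t, area=areas[t % len(areas)], value=20.0 + (t % 7))


def take(stream: Iterator[Reading], n: int) -> List[Reading]:
    """Consume a finite prefix; the stream itself never terminates."""
    return list(itertools.islice(stream, n))


first_five = take(sensor_stream(), 5)
```

Because the generator never ends, any computation over it has to be incremental or restricted to a bounded portion of the data, which is the motivation for the windows introduced in the following slides.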

SLIDE 17

The continuous nature of streams

§ The nature of streams requires a paradigmatic change*

  • from persistent data

– to be stored and queried on demand
– a.k.a. one-time semantics

  • to transient data

– to be consumed on the fly by continuous queries
– a.k.a. continuous semantics

§ * This paradigmatic change first arose in the DB community [Henzinger98]

SLIDE 18

Continuous Semantics

§ Continuous queries are registered over streams that, in most of the cases, are observed through windows

[Figure: input streams produced by a dynamic system are observed through a window by a registered continuous query, which emits streams of answers]
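A minimal Python sketch of a continuous query evaluated over a count-based sliding window. The function names, the window parameters and the choice of an average as the query are assumptions made for illustration; they are not taken from any DSMS discussed in the tutorial.

```python
from collections import deque
from typing import Iterable, Iterator, List


def sliding_windows(stream: Iterable[float], size: int, slide: int) -> Iterator[List[float]]:
    """Yield the last `size` elements every `slide` arrivals (count-based window)."""
    buffer = deque(maxlen=size)          # old elements fall out automatically
    for i, item in enumerate(stream, start=1):
        buffer.append(item)
        if i >= size and (i - size) % slide == 0:
            yield list(buffer)


def continuous_average(stream: Iterable[float], size: int, slide: int) -> Iterator[float]:
    """A registered continuous query: emits one answer per window, not one in total."""
    for window in sliding_windows(stream, size, slide):
        yield sum(window) / len(window)


answers = list(continuous_average([1, 2, 3, 4, 5, 6, 7], size=3, slide=2))
```

The query is registered once and keeps producing answers as the window moves, in contrast to a one-time query that returns a single result over stored data.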

SLIDE 19

Example

§ Input

  • Smoke and Temperature sensors in many areas

§ Query

  • Alert me when there is a fire, i.e. smoke and temp>50

§ DSMS formulation

  • Stream the areas where smoke is detected over two windows open on smoke and temperature streams

    Select IStream(Smoke.area)
    From Smoke[Rows 30 Slide 10], Temp[Rows 50 Slide 5]
    Where Smoke.area = Temp.area AND Temp.value > 50

§ CEP formulation

  • Raise a fire event in an area when smoke and high temperature events are received within 1 minute

    define Fire(area: string, measuredTemp: double)
    from Smoke(area=$a) and each Temp(area=$a and val>50) within 1min.
    where area=Smoke.area and measuredTemp=Temp.value
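The CEP formulation can be approximated in plain Python. This is a sketch under stated assumptions, not the semantics of any particular CEP engine: it reads the rule as "when a Smoke event arrives, raise one Fire event for each Temp event above 50 seen in the same area during the preceding 60 seconds", and the `Event`, `Fire` and `detect_fire` names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    kind: str      # "Smoke" or "Temp"
    area: str
    value: float   # temperature for Temp events, unused for Smoke events
    time: float    # seconds


class Fire(NamedTuple):
    area: str
    measured_temp: float


def detect_fire(events: Iterable[Event], threshold: float = 50.0,
                within: float = 60.0) -> Iterator[Fire]:
    """Assumes events arrive in time order."""
    recent_hot = defaultdict(deque)        # area -> high-temperature events kept so far
    for ev in events:
        if ev.kind == "Temp" and ev.value > threshold:
            recent_hot[ev.area].append(ev)
        elif ev.kind == "Smoke":
            hot = recent_hot[ev.area]
            while hot and ev.time - hot[0].time > within:
                hot.popleft()              # drop readings older than the time bound
            for temp in hot:
                yield Fire(ev.area, temp.value)


events = [
    Event("Temp", "kitchen", 55.0, 0.0),
    Event("Temp", "hall", 70.0, 10.0),
    Event("Temp", "kitchen", 40.0, 20.0),
    Event("Temp", "kitchen", 62.0, 50.0),
    Event("Smoke", "kitchen", 0.0, 70.0),
    Event("Smoke", "hall", 0.0, 200.0),
]
fires = list(detect_fire(events))
```

In this run only the kitchen reading of 62.0 at time 50 falls within 60 seconds of a smoke event in the same area, so a single Fire event is produced.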

SLIDE 20

DSMS/CEP State of the Art

§ Gianpaolo Cugola, Alessandro Margara: Processing flows of information: From data stream to complex event processing. ACM Comput. Surv. 44(3): 15 (2012)

§ Content

  • Type of models compared

– Functional and processing
– Deployment and interactions
– Data, Time, and Rule
– Language

  • # of systems surveyed:

– Academic: 24
– Industrial: 9
– Total: 33

  • To learn more:

– http://home.dei.polimi.it/margara/papers/survey.pdf

SLIDE 21

DSMS/CEP Market Players

[source https://ctrlaltcep.files.wordpress.com/2013/01/cepmarket1212.png ]

SLIDE 22

Existing solutions

Requirement                        DSMS   CEP
massive datasets
data streams
heterogeneous dataset
incomplete data
noisy data
reactive answers
fine-grained information access
complex domain models

[Table: the slide marks three cells with ✗; their row and column positions did not survive extraction]

SLIDE 23

Ontology Based Data Access

§ Given ontology O and query Q, use O to rewrite Q as Q’ so that, for any set of ground facts A contained in multiple databases:

  • answer(Q,O,A) = answer(Q’,∅,A)

– The answer of the query Q using the ontology O for any set of ground facts A is equal to the answer of a query Q’ without considering the ontology O

§ Use mapping M to map Q’ to multiple SQL queries to the various databases

[Figure: Q is rewritten using O into Q’; Q’ is mapped using M into SQL; the SQL queries are evaluated over A to produce the answer]
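The rewriting idea can be sketched in Python for the simplest possible case. The assumptions are strong and explicit: the ontology contains only subclass axioms, queries ask for the instances of a single class, the ground facts are an in-memory set rather than several databases, and the mapping M to SQL is left out. All class and individual names are hypothetical.

```python
from typing import Dict, Set, Tuple

Ontology = Dict[str, Set[str]]     # class -> its direct subclasses
Facts = Set[Tuple[str, str]]       # (individual, asserted class)


def rewrite(query_class: str, onto: Ontology) -> Set[str]:
    """Q -> Q': a union of class queries that no longer needs the ontology."""
    found, todo = set(), [query_class]
    while todo:
        cls = todo.pop()
        if cls not in found:
            found.add(cls)
            todo.extend(onto.get(cls, ()))
    return found


def answer_rewritten(union_query: Set[str], facts: Facts) -> Set[str]:
    """answer(Q', empty ontology, A): plain lookup over the ground facts."""
    return {ind for ind, cls in facts if cls in union_query}


def answer_with_reasoning(query_class: str, onto: Ontology, facts: Facts) -> Set[str]:
    """answer(Q, O, A): materialise inferred types, then look up the class."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for sup, subs in onto.items():
            for ind, cls in list(inferred):
                if cls in subs and (ind, sup) not in inferred:
                    inferred.add((ind, sup))
                    changed = True
    return {ind for ind, cls in inferred if cls == query_class}


O = {"Sensor": {"RFIDSensor", "TempSensor"}, "TempSensor": {"Thermocouple"}}
A = {("s1", "RFIDSensor"), ("s2", "Thermocouple"), ("alice", "Person")}
q_prime = rewrite("Sensor", O)
```

Both evaluation routes return the same individuals for this input, which is the equality answer(Q,O,A) = answer(Q’,∅,A) stated on the slide, restricted to this toy setting.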

SLIDE 24

Existing solutions

Requirement                        DSMS   CEP   OBDA
massive datasets
data streams
heterogeneous dataset
incomplete data
noisy data
reactive answers
fine-grained information access
complex domain models

[Table: the slide marks six cells with ✗; their row and column positions did not survive extraction]

SLIDE 25

Stream Reasoning Definition

§ Making sense

  • in real time
  • of multiple, heterogeneous, gigantic and inevitably noisy data streams
  • in order to support the decision process of extremely large numbers of concurrent users

§ Note: making sense of streams necessarily requires processing them against rich background knowledge, an unsolved problem in databases

  • D. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Rettinger, H. Wermser: Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics. IEEE Intelligent Systems, 30 Aug. 2010.

SLIDE 26

Research Challenges

§ Relation with DSMSs and CEPs

  • Just as RDF relates to data-base systems?

§ Data types and query languages for semantic streams

  • Just RDF and SPARQL but with continuous semantics?

§ Reasoning on Streams

  • Theory: formal semantics
  • Efficiency
  • Scalability and approximation

§ Dealing with incomplete & noisy data

  • Even more than on the current Web of Data

§ Distributed and parallel processing

  • Streams are parallel in nature, data stream sources are distributed, …

§ Engineering Stream Reasoning Applications

  • Development Environment
  • Integration with other technologies
  • Benchmarks as rigorous means for comparison

SLIDE 27

Stream Reasoning feasibility (intuition)

§ Many relevant reasoning methods are not able to deal with high-frequency data streams
§ However, a trade-off exists between the complexity of the reasoning method and the frequency of the data stream the reasoner is able to handle

[Figure: Dynamics and Scale vs. Complexity. Layers: Raw Stream Processing, Semantic Streams, Logic Programs, DL. Steps: Querying, Rewriting, Reasoning; Selection, Abstraction, Interpretation. Complexity axis: AC0, PTIME, 2NEXPTIME. Change Frequency axis: from 10^4 Hz to 1 Hz.]

Heiner Stuckenschmidt, Stefano Ceri, Emanuele Della Valle, Frank van Harmelen: Towards Expressive Stream Reasoning. Proceedings of the Dagstuhl Seminar on Semantic Aspects of Sensor Networks, 2010.

SLIDE 28

Approaches (a selection) 1/4

§ RDF Stream Processors (ordered by year)

  • Streaming SPARQL

– Andre Bolles, Marco Grawunder, Jonas Jacobi: Streaming SPARQL - Extending SPARQL to Process Data Streams. ESWC 2008: 448-462

  • C-SPARQL

– Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: Querying RDF streams with C-SPARQL. SIGMOD Record 39(1): 20-26 (2010)

  • SPARQLstream

– Jean-Paul Calbimonte, Óscar Corcho, Alasdair J. G. Gray: Enabling Ontology-Based Access to Streaming Data Sources. International Semantic Web Conference (1) 2010: 96-111

  • CQELS

– Danh Le Phuoc, Minh Dao-Tran, Josiane Xavier Parreira, Manfred Hauswirth: A Native and Adaptive Approach for Unified Processing of Linked Streams and Linked Data. International Semantic Web Conference (1) 2011: 370-388

  • It continues in next slide …

SLIDE 29

Approaches (a selection) 2/4

  • … it continues from previous slide
  • INSTANS

– Rinne, M., Nuutila, E., Törma, S.: INSTANS: High-Performance Event Processing with Standard RDF and SPARQL. Poster in ISWC2012.

  • Streaming Linked Data

– Marco Balduini, Emanuele Della Valle, Daniele Dell’Aglio, Mikalai Tsytsarau, Themis Palpanas, Cristian Confalonieri: Social listening of City Scale Events using the Streaming Linked Data Framework. ISWC 2013
  • TEF-SPARQL (under development)

– Shen Gao, Thomas Scharrenbach, Abraham Bernstein: The CLOCK Data-Aware Eviction Approach: Towards Processing Linked Data Streams with Limited Resources. ESWC 2014: 6-20

SLIDE 30

Approaches (a selection) 3/4

§ Stream Reasoners (ordered by year)

  • Streaming Knowledge Bases

– Walavalkar, O., Joshi, A., Finin, T., Yesha, Y.: Streaming knowledge bases. In: International Workshop on Scalable Semantic Web Knowledge Base Systems, 2008

  • IMaRS

– Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: Incremental Reasoning on Streams and Rich Background Knowledge. ESWC (1) 2010: 1-15

  • TrOWL

– Yuan Ren, Jeff Z. Pan: Optimising ontology stream reasoning with truth maintenance system. CIKM 2011: 831-836

  • ETALIS (EP-SPARQL)

– Darko Anicic, Paul Fodor, Sebastian Rudolph, Nenad Stojanovic: EP-SPARQL: a unified language for event processing and stream reasoning. WWW 2011: 635-644
  • It continues in next slide …

SLIDE 31

Approaches (a selection) 4/4

  • … continues from previous slide
  • Sparkwave

– Srdjan Komazec, Davide Cerri, Dieter Fensel: Sparkwave: continuous schema-enhanced pattern matching over RDF data streams. DEBS 2012: 58-68
  • SR-Based on Answer Set Programming

– Martin Gebser, Torsten Grote, Roland Kaminski, Philipp Obermeier, Orkunt Sabuncu, Torsten Schaub: Stream Reasoning with Answer Set Programming: Preliminary Report. KR 2012

  • Parallelising Stream Reasoning

– Jacopo Urbani, Alessandro Margara, Ceriel J. H. Jacobs, Frank van Harmelen, Henri E. Bal: DynamiTE: Parallel Materialization of Dynamic RDF Data. International Semantic Web Conference (1) 2013: 657-672
– Chang Liu, Jacopo Urbani, Guilin Qi: Efficient RDF stream reasoning with graphics processing units (GPUs). WWW (Companion Volume) 2014: 343-344

  • STARQL (under development)

– ÖL Özçep, R Möller, C Neuenstadt, “A Stream-Temporal Query Language for Ontology Based Data Access”. Description Logics, 2014

SLIDE 32

Running Example

[Figure: two rooms, RedRoom and BlueRoom, equipped with RedSensor and BlueSensor; five people: Alice, Bob, Carl, David, Elena. Legend: R = RFID, 4 = Foursquare, f = Facebook "is with"]

SLIDE 33

Running Example

§ Four ways to learn who is where

Sensor      Room      Person   Time-stamp
RedSensor   RedRoom   Alice    T1
…           …         …        …

Person   ChecksIn   Time-stamp
Bob      BlueRoom   T2
…        …          …

Person   IsIn      With    Time-stamp
Carl     null      Bob     T2
David    RedRoom   Elena   T3
…        …         …       …

SLIDE 34

Running Example – Data Model

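[Figure: data model with classes Observation, Sensor, Person, Post, Room and properties observes, who, where, posts, discusses, isWith, isIn, isConnectedTo, linked by subClassOf and subPropOf relations. The model is split into streaming information and background information.]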

SLIDE 35

Running Example – Data Model (formally)

§ Details about the hands-on ontology

  • isConnectedTo is a symmetric property
  • discusses is a transitive property
  • isWith is a composition of posts and who
  • isIn is either a composition of posts and where or a composition of isWith and isIn

§ Available online

  • http://www.streamreasoning.org/ontologies/sr4ld2014-onto.rdf
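A minimal Python sketch of the last property chain, isWith composed with isIn implying isIn, applied to the sample rows of the running example. It is a simplification made for illustration: the isIn and isWith facts are given directly instead of being derived from posts, who and where, timestamps are ignored, and the function name is hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def infer_is_in(is_in: Set[Pair], is_with: Set[Pair]) -> Set[Pair]:
    """Apply the chain (x isWith y) and (y isIn r) => (x isIn r) to a fixpoint."""
    derived = set(is_in)
    changed = True
    while changed:
        changed = False
        for person, companion in is_with:
            for who, room in list(derived):
                if who == companion and (person, room) not in derived:
                    derived.add((person, room))
                    changed = True
    return derived


# Sample rows from the running example tables
is_in = {("Alice", "RedRoom"), ("Bob", "BlueRoom"), ("David", "RedRoom")}
is_with = {("Carl", "Bob"), ("David", "Elena")}
locations = infer_is_in(is_in, is_with)
```

Carl's location is not stated anywhere, but it follows from "Carl is with Bob" and "Bob is in BlueRoom". Nothing is inferred for Elena here, because the listed ontology details do not declare isWith symmetric.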

SLIDE 36

Structure of the tutorial

§ 9.00 - 10.30

  • Stream Reasoning introduction (30 min)
  • RDF stream processing models (60 min)

§ 11.00 – 12.30

  • An overview of Stream Reasoning (30 min)
  • C-SPARQL Engine: an RDF Stream Processing system for the Continuous Extension of SPARQL (C-SPARQL) with Naive Stream Reasoning support (30m)

  • MorphStream: Ontology-based streaming data access (30m)

§ 13:45 - 15.30

  • Hands on session (120min)

– From Web tools to JavaCode in Eclipse

§ 16:00 - 17.30

  • IMaRS: Incremental Materialization for RDF Streams (30m)
  • Other Stream Reasoning approaches (30 min)
  • Wrap-up and conclusions (30 min)

SLIDE 37

Water, water, every where, Nor any drop to drink.
- The Rime of the Ancient Mariner, Samuel Taylor Coleridge, 1798

Streams, streams everywhere nor any actionable fact to use
- Emanuele and Daniele, 2013 :-P

Have fun! Any questions?
