SLIDE 1

Tabled CLP for Reasoning over Stream Data

October 18, 2016

Joaquín Arias (1, 2)

(1) IMDEA Software Institute, (2) Technical University of Madrid

madrid institute for advanced studies in software development technologies

SLIDE 2

www.software.imdea.org

Goal: Change the Way Stream Data is Analyzed

Stream data is a continuous flow of data. Because the data keeps changing, its analysis has to be updated dynamically. Every day more sensors collect information, and we want to analyze this information to make decisions. We propose a high-level language rooted in logic and constraints for writing the programs that analyze the data:

  • The language will make the programs easier to maintain.
  • The constraints will prune the search space at early stages.
  • It will make it possible to reuse previous results to update the analysis.

SLIDE 4

Use Case: Fast Flower Delivery

A consortium of flower stores uses independent van drivers to deliver the flowers.

  • Broadcast the delivery request to the drivers who satisfy a location and ranking requirement.
  • Collect the drivers' bids and assign the delivery based on the shop requirements.
  • Monitor the delivery process and generate alerts when needed.
  • Evaluate each driver's ranking.

SLIDE 5

1 - Easy-to-maintain Programs: Prolog

In stream data analysis, not only the data changes but also the requirements of the problem. Therefore, the programs have to be modified.

Most programs are written by combining computational and query languages. As a result, the bottleneck is on the human side rather than on the machine side. With logic programming, problems are expressed in a more natural way:

  • Etalis [2] (by Darko Anicic et al.)

Uses 2,500 lines of code instead of 150,000 lines of relational database code.

  • DeALS [16] (developed at UCLA)
  • LogiQL [8] (by LogicBlox)
  • Yedalog [6] (by Google)

70% fewer lines of code than with C++.

SLIDE 6

1 - Easy-to-maintain Programs: Prolog

    ce1(Result) <- e(Name, Result) SEQ e(Name, Result) WHERE (Name = "a", Result = 1).
    ce2(Result) <- ce1(Result) AND ce1(Result) WHERE (Result = 1).

------------------------

    <Query name="ce1" text="insert into tmpE(ceName, Result) select "ce1" as ceName, e1.Result as Result from pattern [every (+ e1=e(e1.Name="a" and e1.Result=1) -> e2=e(e2.Name="a" and e2.Result=1))]" />
    <Query name="ce2" text="select "ce2" as Name, e1.Result as Result from pattern [every (+ e1=tmpE(e1.ceName="ce1" and e1.Result=1) and e2=tmpE(e2.ceName="ce1" and e2.Result=1))]" />

Figure: Two versions of event detection rules written in ETALIS, a logic programming language (above), and in ESPER, a relational database language (below).

SLIDE 7

2 - Heterogeneous Data Sources: RDF Stream

Equivalent data (e.g. GPS position) may be generated by different sources, which could provide extra information. RDF represents the data as labeled directed edges (triples). RDF_Stream annotates them with a time reference: ⟨Subject, Predicate, Object, Time⟩. With RDF / RDF_Stream the data model is independent of the source, so adapting the model is easier when the requirements change.

    :- rdf_register_ns(ffd, 'http://fast_flower_delivery.com/').

    delivery_ranking_position(Delivery, Area, Rank) :-
        rdf_s(Shop, ffd:request, Delivery, _Time),
        rdf(Shop, ffd:area, Area),
        rdf(Shop, ffd:ranking, Rank).

Figure: Join query written in Prolog combining RDF and RDF_Stream.

SLIDE 8

3 - Background Knowledge: Ontology Domains (OWL)

A shop delivery request is contextualized with the position of the shop. Defining a delivery hierarchy (⊑):

⟨premium_delivery, type, delivery⟩

  • We do not have to duplicate the code because premium_delivery is a delivery.
  • Additionally, we can add specific rules for premium_delivery.

The Web Ontology Language (OWL) can be represented using RDF. OWL can be used to define concept hierarchies and predicate properties. A common representation for stream data and background knowledge makes it easier to write, maintain and extend the programs. There are several RDF APIs and ontology reasoners in Prolog, such as F-OWL [18].
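
The code reuse that a concept hierarchy gives can be sketched in plain Prolog. This is only an illustration under stated assumptions: the predicates subclass_of/2, instance_of/2, has_type/2, needs_driver/1 and needs_priority/1 and the requests r1 and r2 are hypothetical names, not part of OWL, F-OWL or the original slides, and the hierarchy is assumed to be acyclic.

```prolog
% Hypothetical background knowledge: premium_delivery is a kind of delivery.
subclass_of(premium_delivery, delivery).

% Hypothetical requests: r1 is a premium delivery, r2 a plain delivery.
instance_of(r1, premium_delivery).
instance_of(r2, delivery).

% An individual has a type if it is asserted directly or inherited
% from a subclass (assumes an acyclic hierarchy, so no tabling is needed).
has_type(X, Class) :- instance_of(X, Class).
has_type(X, Class) :- subclass_of(Sub, Class), has_type(X, Sub).

% General rule, written once for every kind of delivery.
needs_driver(X) :- has_type(X, delivery).

% Specific rule that applies to premium deliveries only.
needs_priority(X) :- has_type(X, premium_delivery).
```

With these facts, needs_driver/1 holds for both r1 and r2 without duplicating the rule, while needs_priority/1 holds only for r1.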

SLIDE 9

4 - Define Event Relationships: Constraints (CLP)

To increase the rank of a driver we have to check that the driver picked up and delivered the flowers on time.

⟨D, on_time, F, T2⟩ ← T2 #< T1 + 10, ⟨D, pickup, F, T1⟩, ⟨D, delivery, F, T2⟩.

NOTE: The rule should also fire if ⟨D, delivery, F, T2⟩ reaches the system first.

Most event relationships are time-based, and we have to deal with out-of-order data arrival, which makes the problem even more complex. When an event relationship is detected, the system generates a new event, which may be used in several rules. Constraints make it possible to define more complex relationships.
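
The on_time rule can be tried out as an ordinary CLP program. The sketch below assumes SWI-Prolog's library(clpfd) (the slides target Ciao Prolog, whose constraint packages differ), stores events as hypothetical pickup/3 and delivery/3 facts instead of RDF_Stream tuples, and uses made-up timestamps in minutes. It shows only that the constraint can be stated before either timestamp is known; it does not model out-of-order arrival on a live stream.

```prolog
:- use_module(library(clpfd)).

% Hypothetical events: driver, flowers, timestamp (minutes).
pickup(d1, f1, 100).
delivery(d1, f1, 105).
pickup(d2, f2, 100).
delivery(d2, f2, 130).

% The time constraint is posted first, while T1 and T2 are still unbound;
% the solver checks it as soon as the events bind the timestamps.
on_time(D, F, T2) :-
    T2 #< T1 + 10,
    pickup(D, F, T1),
    delivery(D, F, T2).
```

Because the relationship is a constraint rather than an arithmetic test, the two event goals could be swapped without changing the answers: only driver d1 delivers within 10 minutes of the pickup.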

SLIDE 10

5 - Reuse Answers: Tabling (Answer on Demand)

The top-down execution of Prolog reduces the search tree but can enter loops where the bottom-up execution of Datalog terminates. Tabling solves this drawback and makes it possible to reuse previous answers. Many tabling implementations use local scheduling, which tries to find all the answers to a query (i.e. to reach the fixpoint) before returning them. Due to the unbounded nature of data streams, the tabling engine should:

  • Discard repeated and redundant answers [15, 4, 3].
  • Return answers on demand [7, 13, 5].
  • Remove obsolete answers (a kind of non-monotonicity), e.g. because their timestamp has expired.
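
The termination benefit of tabling can be seen on a small left-recursive program over a cyclic graph. This is a minimal sketch, assuming a Prolog system that accepts the `:- table` directive as written (e.g. SWI-Prolog or XSB); the predicate reach/2 and the three edges are hypothetical. It illustrates plain tabling only, not the answer-on-demand or answer-removal behaviour listed above.

```prolog
:- table reach/2.

% Hypothetical graph containing the cycle a -> b -> c -> a.
edge(a, b).
edge(b, c).
edge(c, a).

% Left recursion: plain Prolog (SLD resolution) would loop on the first
% clause, while tabled execution suspends the repeated call and resumes
% it with the answers found so far.
reach(X, Y) :- reach(X, Z), edge(Z, Y).
reach(X, Y) :- edge(X, Y).
```

The query `?- reach(a, Y).` terminates with the three answers a, b and c, each stored once in the table and reused by later calls.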

SLIDE 11

6 - Aggregate Rules: New Semantics

If we want to know the number of deliveries requested by each shop during the last hour, we do not need to store the details of the delivery requests. Aggregates (e.g. count) are meta-predicates and reduce the volume of data that we have to store. Some research has been done (see [9, 12, 17]), but the correct semantics of aggregates in recursive Prolog programs under tabled execution is still unclear.

    :- aggregate p(min).
    p(1).
    p(0) :- p(1).

Neither p(0) nor p(1) is a least Herbrand model consistent with the intended semantics of the program.
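
For the well-behaved case in which the aggregate can only improve (a minimum over positive weights), some engines already support aggregates through mode-directed tabling, also known as answer subsumption [15]. The sketch below assumes SWI-Prolog's `:- table dist(_,_,min)` syntax rather than the `:- aggregate` directive shown above, and a hypothetical four-edge graph with a cycle. It does not address the problematic program p/1, whose semantics remains the open question.

```prolog
% Keep only the minimal third argument for each pair of nodes
% (SWI-Prolog mode-directed tabling; an assumption, not the slides' syntax).
:- table dist(_, _, min).

% Hypothetical weighted graph with the cycle 1 -> 2 -> 3 -> 1.
edge(1, 2, 9).
edge(2, 3, 13).
edge(1, 3, 30).
edge(3, 1, 5).

% Longer paths around the cycle never improve the stored minimum,
% so they are discarded and the evaluation reaches a fixpoint.
dist(X, Y, D) :- dist(X, Z, D1), edge(Z, Y, D2), D is D1 + D2.
dist(X, Y, D) :- edge(X, Y, D).
```

Only one number per pair of nodes is stored, which is the data reduction that aggregates are meant to provide: for nodes 1 and 3 the table keeps the distance 22 (via node 2) and drops the direct edge of weight 30.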

SLIDE 12

7 - Incremental Evaluation: Incremental TCLP

Assume we choose the vans based on their time-distance. When an accident is reported, the time-distance should be recalculated. Since not all the vans are affected, incremental strategies [10] can be used.

Figure: Sliding time window from time t to t+1. Example from [11].

Incremental tabling [14] performs dynamic updates of the tabled results taking into account the dependency structure in order to remove the results inferred using an expired fact. The presence of constraints makes it more complex.

SLIDE 13

Our Proposal: Stream TCLP

  • Language: Prolog
  • Heterogeneous data: RDF Stream
  • Background knowledge: OWL
  • Event relationships: CLP
  • Reuse answers: Tabling AoD
  • Aggregates: New semantics
  • Incremental evaluation: Incremental TCLP

SLIDE 14

Preliminary Results: TCLP

TCLP facilitates the integration of CLP solvers with the tabling engine in Ciao Prolog. We validate its advantages versus Prolog, CLP and tabling with respect to:

  • Declarativeness and logical reading.
  • Termination properties.
  • Performance.

Example: Find nodes in a weighted graph within a distance K from each other. It is a typical query for the analysis of social networks [15].

SLIDE 15

Preliminary Results: Declarativeness and logical reading

The program dist/3 finds the nodes in the graph edge/3 within a distance K.

Prolog / tabling:

    dist(X, Y, D) :-
        dist(X, Z, D1),
        edge(Z, Y, D2),
        D is D1 + D2.
    dist(X, Y, D) :-
        edge(X, Y, D).

    edge(1, 2, 9).
    edge(2, 3, 13).
    edge(...

    ?- dist(X, Y, D), D < K.

CLP / TCLP:

    dist(X, Y, D) :-
        D1 #> 0, D2 #> 0,
        D #= D1 + D2,
        dist(X, Z, D1),
        edge(Z, Y, D2).
    dist(X, Y, D) :-
        edge(X, Y, D).

    edge(1, 2, 9).
    edge(2, 3, D) :- D #> 11, D #< 15.
    edge(...

    ?- D #< K, dist(X, Y, D).

Figure: Versions of distance in a graph: Prolog / tabling (first) and CLP / TCLP (second).

SLIDE 16

Preliminary Results: Termination properties

                                   Prolog   CLP   Tabling   TCLP
Without cycles   Left recursion      x       x       ✓        ✓
                 Right recursion     ✓       ✓       ✓        ✓
With cycles      Left recursion      x       x       x        ✓
                 Right recursion     x       ✓       x        ✓

Termination properties of similar programs (✓: terminates, x: does not terminate).

SLIDE 17

Preliminary Results: Performance

                                       Prolog   CLP(Q)   Tabling   TCLP(Q)
Without cycles (1)   Left recursion       –        –       144        45
                     Right recursion   1917      200       291       184
With cycles (2)      Left recursion       –        –         –       420
                     Right recursion      –     4261         –      1027

Run time (ms) for dist/3. A ‘–’ means no termination.

(1) Graph of 25 nodes and 584 edges. (2) Graph of 25 nodes and 785 edges.

SLIDE 18

Open Issues I

Constraint solver over ontologies

  • Only stores an answer if it is more general than the previous answers.
  • Avoids the execution of queries where the concepts are more particular.
  • Can state the relationships defined in the ontology as constraints before the analysis starts, in order to prune the search space.

Temporal constraint solver

  • Should deal with the operations required by temporal reasoning tasks [1].

Answer on demand

  • The tabling engine should use an incremental answering strategy similar to batch scheduling [7], the JET mechanism [13] or swapping evaluation [5].

SLIDE 19

Open Issues II

Incremental TCLP

A more complex technique, similar to incremental tabling [14], has to be defined in order to:

  • deal with constraints;
  • invalidate knowledge inferred from data that is updated or removed (e.g. when the temporal window slides); and
  • remove previous tabled results.

Stream recursive aggregates

A new semantics of aggregates in recursive Prolog programs under tabled execution (see [9, 12, 17]) is needed, taking into account the presence of stream data.

SLIDE 21

Questions

THANKS

SLIDE 22

Bibliography I

[1] James F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832–843, 1983.

[2] Darko Anicic, Paul Fodor, Sebastian Rudolph, Roland Stühmer, Nenad Stojanovic, and Rudi Studer. A rule-based language for complex event processing and reasoning. In International Conference on Web Reasoning and Rule Systems, pages 42–57. Springer, 2010.

[3] J. Arias and M. Carro. Description and Evaluation of a Generic Design to Integrate CLP and Tabled Execution. In 18th Int’l. ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming (PPDP’16), pages 10–23. ACM Press, September 2016.

[4] P. Chico de Guzmán, M. Carro, M. Hermenegildo, and P. Stuckey. A General Implementation Framework for Tabled CLP. In Tom Schrijvers and Peter Thiemann, editors, FLOPS’12, number 7294 in LNCS, pages 104–119. Springer Verlag, May 2012.

SLIDE 23

Bibliography II

[5] P. Chico de Guzmán, M. Carro, and David S. Warren. Swapping Evaluation: A Memory-Scalable Solution for Answer-On-Demand Tabling. Theory and Practice of Logic Programming, 26th Int’l. Conference on Logic Programming (ICLP’10) Special Issue, 10(4–6):401–416, July 2010.

[6] Brian Chin, Daniel von Dincklage, Vuk Ercegovac, Peter Hawkins, Mark S. Miller, Franz Och, Christopher Olston, and Fernando Pereira. Yedalog: Exploring knowledge at scale. In LIPIcs-Leibniz International Proceedings in Informatics, volume 32. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2015.

[7] Juliana Freire, Terrance Swift, and David Scott Warren. Beyond Depth-First Strategies: Improving Tabled Logic Programs through Alternative Scheduling. Journal of Functional and Logic Programming, 1998(3), 1998.

[8] Todd J. Green, Dan Olteanu, and Geoffrey Washburn. Live programming in the LogicBlox system: a MetaLogiQL approach. Proceedings of the VLDB Endowment, 8(12):1782–1791, 2015.

SLIDE 24

Bibliography III

[9] David B. Kemp and Peter J. Stuckey. Semantics of logic programs with aggregates. In ISLP, volume 91, pages 387–401. Citeseer, 1991.

[10] Danh Le-Phuoc. Operator-aware approach for boosting performance in RDF stream processing. Web Semantics: Science, Services and Agents on the World Wide Web, 2016.

[11] Pei Lee, Laks V. S. Lakshmanan, and Evangelos E. Milios. Incremental cluster evolution tracking from highly dynamic network data. In 2014 IEEE 30th International Conference on Data Engineering, pages 3–14. IEEE, 2014.

[12] Nikolay Pelov, Marc Denecker, and Maurice Bruynooghe. Well-Founded and Stable Semantics of Logic Programs with Aggregates. TPLP, 7(3):301–353, 2007.

[13] Konstantinos F. Sagonas and Peter J. Stuckey. Just Enough Tabling. In Principles and Practice of Declarative Programming, pages 78–89. ACM, August 2004.

[14] Terrance Swift. Incremental tabling in support of knowledge representation and reasoning. Theory and Practice of Logic Programming, 14(4-5):553–567, 2014.

SLIDE 25

Bibliography IV

[15] Terrance Swift and David Scott Warren. Tabling with answer subsumption: Implementation, applications and performance. In Tomi Janhunen and Ilkka Niemelä, editors, JELIA, volume 6341 of Lecture Notes in Computer Science, pages 300–312. Springer, 2010.

[16] Deductive Application Language System. http://wis.cs.ucla.edu/deals/.

[17] Alexander Vandenbroucke, Maciej Pirog, Benoit Desouter, and Tom Schrijvers. Tabling with Sound Answer Subsumption. Theory and Practice of Logic Programming, 32nd Int’l. Conference on Logic Programming (ICLP’16), 16, October 2016.

[18] Youyong Zou, Tim Finin, and Harry Chen. F-OWL: An Inference Engine for Semantic Web. In Formal Approaches to Agent-Based Systems, volume 3228 of Lecture Notes in Computer Science, pages 238–248. Springer Verlag, January 2005.