exploiting exploiting back end back end apis apis fo for
play

Exploiting Exploiting Back-End Back-End APIs APIs fo for Feasible - PowerPoint PPT Presentation

Simon Simon Schiff Schiff, zgr zgr L. L. zep zep Exploiting Exploiting Back-End Back-End APIs APIs fo for Feasible easible Ontology-Based Ontology-Based Stream Stream Access cess Fourth Stream Reasoning Workshop,


  1. Simon Simon Schiff Schiff, Özgür Özgür L. L. Özçep Özçep Exploiting Exploiting Back-End Back-End APIs APIs fo for Feasible easible Ontology-Based Ontology-Based Stream Stream Access cess Fourth Stream Reasoning Workshop, Linköping, 17 th April Institute of Information Systems University of Lübeck

  2. Q STARQL Q SQL historic data OBDA on huge datasets (w/o optimization) ⇒ Long processing times 2 / 6

  3. Q STARQL Q SQL Q SparkSQL historic historic data data Processing times are reducible OBDA on huge datasets using additional hardware and (w/o optimization) Idea parallelisation ⇒ Long processing times ⇒ Short processing times . . . Horizontal scaling 2 / 6

  4. STARQL Query Example Measurements Temp/C ◦ 94 . 0 93 . 0 92 . 0 91 . 0 90 . 0 Time/min 0 . 0 1 . 0 2 . 0 3 . 0 4 . 0 5 . 0 6 . 0 Information need for monotonicity Tell every minute whether the temperature measured by a sensor increased monotonically in the last 5 minutes. 3 / 6

  5. STARQL Query Example Measurements Temp/C ◦ 94 . 0 93 . 0 92 . 0 91 . 0 90 . 0 Time/min 0 . 0 1 . 0 2 . 0 3 . 0 4 . 0 5 . 0 6 . 0 STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 3 / 6

  6. STARQL Query Example Measurements Temp/C ◦ 94 . 0 93 . 0 92 . 0 91 . 0 90 . 0 Time/min 0 . 0 1 . 0 2 . 0 3 . 0 4 . 0 5 . 0 6 . 0 STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 3 / 6

  7. STARQL Query Example Measurements Temp/C ◦ 94 . 0 93 . 0 92 . 0 91 . 0 90 . 0 Time/min 0 . 0 1 . 0 2 . 0 3 . 0 4 . 0 5 . 0 6 . 0 STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 3 / 6

  8. STARQL Query Example Measurements Temp/C ◦ 94 . 0 93 . 0 92 . 0 91 . 0 90 . 0 Time/min 0 . 0 1 . 0 2 . 0 3 . 0 4 . 0 5 . 0 6 . 0 STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 3 / 6

  9. STARQL Query Example Measurements Temp/C ◦ 94 . 0 93 . 0 y2 at t 3 92 . 0 91 . 0 y1 at t 1 90 . 0 Time/min 0 . 0 1 . 0 2 . 0 3 . 0 4 . 0 5 . 0 6 . 0 STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 3 / 6

  10. STARQL Query Example STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 The FOL template language is domain independent 1 : STARQL HAVING clause can be unfolded into languages such as SQL. ⇒ Process historic (e.g. timestamped datasets) 1 Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases: The Logical Level. Addison-Wesley Longman Publishing Co., Inc., 1995. 3 / 6

  11. STARQL Query Example STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 No function exists for executing the unfolded query per window! ◮ First idea: Create table with window intervals and join with historic dataset. ◮ Too slow? 3 / 6

  12. STARQL Query Example STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 No function exists for executing the unfolded query per window! ◮ Second idea: Create a function for executing the unfolded query per window using PL/pgSQL. ◮ Sufficient? 3 / 6

  13. STARQL Query Example STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 No function exists for executing the unfolded query per window! ◮ Second idea: Create a function for executing the unfolded query per window using PL/pgSQL. ◮ Sufficient? ◮ 3 / 6

  14. STARQL Query Example CREATE TYPE window_state AS (memory measurements [], 1 wid bigint , start timestamp , stop timestamp , pulse timestamp); 2 CREATE TABLE measurements_data AS SELECT NULL :: bigint 3 AS wid , NULL :: timestamp AS timestamp , NULL :: integer AS sensor , NULL :: numeric (12 ,3) AS value WHERE false; 4 CREATE OR REPLACE FUNCTION moving_window (source text , 5 pulse interval , range interval , slide interval) RETURNS SETOF measurements_data AS $$ DECLARE 6 win window_state ; 7 8 cnt bigint; 9 10 line_cursor refcursor; 11 line measurements ; 12 BEGIN 13 OPEN line_cursor FOR EXECUTE ’SELECT * FROM ’ format 14 (’%1$s ’, source) ’ ORDER BY timestamp ASC ’; FETCH line_cursor INTO line; 15 16 win.start := line.timestamp; 17 3 / 6

  15. STARQL Query Example win.stop := line.timestamp + range; 18 win.pulse := line.timestamp; 19 20 WHILE line.timestamp < win.stop LOOP 21 win.memory := array_append (win.memory , line); 22 FETCH line_cursor INTO line; 23 END LOOP; 24 25 win.wid := 0; 26 RETURN QUERY SELECT win.wid , (unnest(win.memory :: 27 measurements [])).*; 28 win.pulse := win.pulse + pulse; 29 WHILE line.timestamp IS NOT NULL LOOP 30 IF win.pulse < win.stop AND win.pulse < win.start + 31 slide THEN win.wid := win.wid + 1; 32 RETURN QUERY SELECT win.wid , (unnest(win.memory :: 33 measurements [])).*; win.pulse := win.pulse + pulse; 34 ELSIF win.pulse >= win.stop AND win.pulse < win. 35 start + slide THEN win.pulse := win.pulse + pulse; 36 37 win.start := win.start + slide; 38 3 / 6

  16. STARQL Query Example win.stop := win.stop + slide; 39 WHILE line.timestamp < win.stop LOOP 40 win.memory := array_append (win.memory , line); 41 FETCH line_cursor INTO line; 42 END LOOP; 43 cnt := 1; 44 FOR i IN coalesce( array_lower (win.memory , 1), 1) .. 45 coalesce( array_upper (win.memory , 1), 1) LOOP IF win.memory[i]. timestamp < win.start THEN 46 cnt := cnt + 1; 47 ELSE 48 EXIT; 49 END IF; 50 END LOOP; 51 win.memory := win.memory[cnt :]; 52 ELSIF win.pulse >= win.start + slide THEN 53 win.start := win.start + slide; 54 win.stop := win.stop + slide; 55 WHILE line.timestamp < win.stop LOOP 56 win.memory := array_append (win.memory , line); 57 FETCH line_cursor INTO line; 58 END LOOP; 59 cnt := 1; 60 3 / 6

  17. STARQL Query Example FOR i IN coalesce( array_lower (win.memory , 1), 1) .. 61 coalesce( array_upper (win.memory , 1), 1) LOOP IF win.memory[i]. timestamp < win.start THEN 62 cnt := cnt + 1; 63 ELSE 64 EXIT; 65 END IF; 66 END LOOP; 67 win.memory := win.memory[cnt :]; 68 END IF; 69 END LOOP; 70 win.wid := win.wid + 1; 71 RETURN QUERY SELECT win.wid , (unnest(win.memory :: 72 measurements [])).*; CLOSE line_cursor ; 73 END 74 $$ language plpgsql; 75 3 / 6

  18. STARQL Query Example STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 Function exists for executing the . . . unfolded query per window! Horizontal scaling ◮ Apache Spark SQL scales horizontally and vertically. ◮ Scalable? 3 / 6

  19. Processing times using different back ends 12 PostgreSQL PL/pgSQL Apache Spark SQL 10 Processing time in minutes 8 6 4 2 0 0 2 4 6 8 10 12 14 16 Measurement duration in days 4 / 6

  20. Processing times using different back ends 12 PL/pgSQL Apache Spark SQL 10 Processing time in minutes 8 6 4 2 0 0 20 40 60 80 100 120 140 160 Measurement duration in days 5 / 6

  21. Conclusion and Future Work Conclusion: ◮ Window function can be realized by using PL/pgSQL ◮ Speed gain by using Apache Spark SQL 1 ◮ Complexity hidden by STARQL Future Work: ◮ Incremental stream processing (Not possible for every STARQL query) 1 Simon Schiff, Özgür L. Özçep, and Ralf Möller. “Ontology-based Data Access to Big Data”. In: Open Journal of Databases (OJDB) 6 (1 2018). Postproceeding of Hidest’18, pp. 21–32. 6 / 6

  22. Processing times using distributed Apache Spark SQL 4 1 Node 2 Nodes 3 . 5 3 Nodes 4 Nodes 3 Processing time in hours 2 . 5 2 1 . 5 1 0 . 5 0 0 10 20 30 40 50 60 Measurement duration in years 7 / 6

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend