
Database Learning: Toward a Database that Becomes Smarter Over Time


  1. Database Learning: Toward a Database that Becomes Smarter Over Time. Yongjoo Park, Ahmad Shahab Tajik, Michael Cafarella, Barzan Mozafari. University of Michigan, Ann Arbor.

  2. Today's databases: users send a query to the database and receive an answer to the query. After answering queries, the work is gone. Our goal: reuse the work.

  8. Our high-level approach. [Diagram: users send queries Q to an AQP engine, which answers from the database with A (10% err, 1 sec). A learning component backed by a query synopsis returns an improved answer Â (2% err).]

  16. Our high-level approach. [Figure: Error (%) versus Time (sec) for the AQP engine and for database learning.]

  19. Technical challenges. How to leverage those queries for future queries? Queries use the data in different columns/rows.

  24. Our idea. [Diagram: query-answer pairs (Q1, A1), (Q2, A2), "more queries and answers", and a "?".]

  32. Concrete example. [Figure: three panels plotting SUM(count), 20M to 40M, against Week Number, 1 to 100. Legend: true data; ranges observed by past queries; model (with 95% confidence interval).]

  38. Design goals. (1) Support a wide class of SQL queries, for example: select X3, avg(Y1) from t where X2 between Apr and May group by X3; and: select sum(Y2) from t where 5 < X1 < 8; (2) No assumptions about data. (3) Lightweight. [Figure: latency, BlinkDB versus DBL.]

  41. Our Approach

  42. Problem statement. Problem: Given past queries (q_1, ..., q_n), a new query (q_{n+1}), and their approximate answers, find the most likely answer to the new query (q_{n+1}) and its estimated error. Our result: under a certain model assumption, our answer's error bound ≤ original answer's error bound (in practice, much more accurate), if the error bounds provide the same probabilistic guarantees.

  45. Overview of our technique. Example queries: select count(Y2) from t where 1 < X1 < 2; select avg(Y2) from t where 6 < X1 < 8; select sum(Y2) from t where 5 < X1 < 8; Each answer is treated as a random variable (our uncertainty on answers) with a probability distribution and an estimated answer. There is correlation between answers when two aggregations involve common values.
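The problem statement and the technique overview can be illustrated with a small numeric sketch. The slides only say "under a certain model assumption" and do not name the model, so everything here is an assumption made for illustration: the exact answers of one past query and one new query are taken to be jointly Gaussian with a known correlation, and each raw AQP answer is the exact answer plus independent Gaussian sampling noise. The names `Estimate`, `predict_from_past`, and `combine` are hypothetical and not taken from the presentation.

```python
import math
from dataclasses import dataclass


@dataclass
class Estimate:
    mean: float  # estimated answer
    std: float   # standard deviation, used here as the error measure


def predict_from_past(prior_new, prior_past, rho, raw_past):
    """Condition the new query's exact answer on one past raw answer.

    Assumes (illustration only) a bivariate Gaussian prior over the two
    exact answers with correlation rho, and Gaussian noise on raw_past.
    """
    cov = rho * prior_new.std * prior_past.std
    # Variance of the past raw answer: prior uncertainty plus sampling noise.
    past_var = prior_past.std ** 2 + raw_past.std ** 2
    gain = cov / past_var
    mean = prior_new.mean + gain * (raw_past.mean - prior_past.mean)
    var = prior_new.std ** 2 - gain * cov
    return Estimate(mean, math.sqrt(var))


def combine(prediction, raw_new):
    """Precision-weighted merge of the model prediction and the raw answer."""
    w_pred = 1.0 / prediction.std ** 2
    w_raw = 1.0 / raw_new.std ** 2
    var = 1.0 / (w_pred + w_raw)
    mean = var * (w_pred * prediction.mean + w_raw * raw_new.mean)
    return Estimate(mean, math.sqrt(var))


if __name__ == "__main__":
    prior = Estimate(100.0, 20.0)      # assumed prior for both exact answers
    raw_past = Estimate(120.0, 2.0)    # past raw answer (made-up numbers)
    raw_new = Estimate(115.0, 10.0)    # new raw answer from the AQP engine
    pred = predict_from_past(prior, prior, rho=0.9, raw_past=raw_past)
    improved = combine(pred, raw_new)
    print(f"raw error: {raw_new.std:.2f}, improved error: {improved.std:.2f}")
```

Under these assumptions the combined variance is 1 / (1/v_pred + 1/v_raw), which can never exceed the raw answer's variance v_raw. That mirrors the stated result that the improved error bound is at most the original one, and the gain grows with the correlation between the two answers.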
