automatically synthesizing sql queries
play

Automatically Synthesizing SQL Queries from Input-Output Examples - PowerPoint PPT Presentation

Automatically Synthesizing SQL Queries from Input-Output Examples Sai Zhang University of Washington Joint work with: Yuyin Sun Goal: making it easier for non-expert users to write correct SQL queries Non-expert database end-users


  1. Automatically Synthesizing SQL Queries from Input-Output Examples Sai Zhang University of Washington Joint work with: Yuyin Sun

  2. Goal: making it easier for non-expert users to write correct SQL queries • Non-expert database end-users – Business analysts, scientists, marketing managers, etc. can describe what the query task is do not know how write a correct query This paper: bridge the gap! 2

  3. An example Table: enrolled Output table Table: student name MAX(score) stu_id course_id score name stu_id 1 504 100 Alice 100 Alice 1 1 505 99 Charlie 88 Bob 2 2 504 96 Charlie 3 3 501 60 Dan 4 3 502 88 3 505 68 Find the name and the maximum course score of each student enrolled in more than 1 course. The correct SQL query: SELECT name, MAX(score) FROM student, enrolled WHERE student.stu_id = enrolled.stu_id GROUP BY student.stu_id HAVING COUNT(enrolled.course_id) > 1

  4. Existing solutions for querying a database • General programming languages + powerful − learning barriers • GUI tools + easy to use − limited in customization and personalization − hard to discover desired features in complex GUIs 4

  5. Our solution: programming by example Table: enrolled Output table Table: student name MAX(score) stu_id course_id score name stu_id 1 504 100 Alice 100 Alice 1 1 505 99 Charlie 88 Bob 2 2 504 96 Charlie 3 3 501 60 Dan 4 3 502 88 3 505 68 SQLSynthesizer SELECT name, MAX(score) FROM student, enrolled WHERE student.stu_id = enrolled.stu_id GROUP BY student.stu_id HAVING COUNT(enrolled.course_id) > 1

  6. How do end-users use SQLSynthesizer? SQL? Desired output result Real, large database tables SQLSynthesizer Small , representative Input-output examples 6

  7. SQLSynthesizer’s advantages • Fully automated − Only requires input-output examples − No need of annotations, hints, or specification of any form • Support a wide range of SQL queries − Beyond the “select -from- where” queries [Tran’09 ] 7

  8. Comparison of solutions GUI tools SQLSynthesizer Ease of Use Programming languages Expressiveness

  9. Outline • Motivation • A SQL Subset • Synthesis Approach • Evaluation • Related Work • Conclusion 9

  10. Designing a SQL subset • 1000+ pages specification • PSPace-Completeness [ Sarma’10 ] • Some features are rarely used The full SQL language A SQL Subset SQLSynthesizer’s focus: a widely-used SQL subset 10

  11. How to design a SQL subset? • Previous approaches: – Decided by the paper authors [ Kandel’11 ] [ Tran’09 ] The full SQL language ? A SQL Subset • Our approach : – Ask experienced IT professionals for the most widely-used SQL features 11

  12. Our approach in designing a SQL subset 1. Online survey : eliciting design requirement − Ask each participant to select 10 most widely-used SQL features − Got 12 responses 2. Designing the SQL subset Supported SQL features SELECT.. FROM…WHERE Supported in the previous work 1) [ Tran’09 ] 2) JOIN 3) GROUP BY / HAVING 4) Aggregators (e.g., MAX, COUNT, SUM, etc) 5) ORDER BY 3. Follow-up interview : obtaining feedback − Ask each participant to rate the sufficiency of the subset 0 5 Not sufficient at all Completely sufficient Average rating: 4.5

  13. Our approach in designing a SQL subset 1. Online survey : eliciting design requirement − Ask each participant to select 10 most widely-used SQL features − Got 12 respondents 1. Designing the SQL subset The SQL subset is enough to Supported SQL features SELECT.. FROM…WHERE Supported in the previous work - [ Tran’09 ] - JOIN write most common queries. - GROUP BY / HAVING - Aggregators (e.g., MAX, COUNT, SUM, etc) - ORDER BY 2. Follow-up interview : obtaining feedback − Ask each participant to rate the sufficiency of the subset 0 5 Not sufficient at all Completely sufficient Average rating: 4.5

  14. Outline • Motivation • Language Design • Synthesis Approach • Evaluation • Related Work • Conclusion 14

  15. SQLSynthesizer Workflow Select the desired query, or provide more examples Input-Output SQLSynthesizer Queries Examples Input Output tables table 15

  16. SQLSynthesizer Workflow Select the desired query, or provide more examples Input-Output SQLSynthesizer Queries Examples A SQL query Input Output tables table 16

  17. SQLSynthesizer Workflow Select the desired query, or provide more examples Input-Output SQLSynthesizer Queries Examples Filter Combine Project Input Output tables table 17

  18. SQLSynthesizer Workflow A complete SQL: Select the desired query, or provide more examples SELECT name, MAX(score) Projection columns FROM student, enrolled Join condition WHERE student.stu_id = enrolled.stu_id GROUP BY student.stu_id Input-Output Query condition SQLSynthesizer Queries HAVING COUNT(enrolled.course_id) > 1 Examples Join Query Projection condition condition columns Filter Combine Project Input Output tables table 18

  19. Multiple solutions Query 1 Query 2 Query 3 Input Output tables table … 19

  20. Multiple solutions Project Filter Combine Input Output tables table … … Computes all solutions, ranks them, and shows them to the user. 20

  21. Key techniques Join Query Projection condition condition columns Filter Combine Project Input Output tables table 1. Combine: Exhaustive search over legal combinations (e.g., cannot join columns with different types) 2. Filter: A machine learning approach to infer query conditions 3. Project: Exhaustive search over legal columns (e.g., cannot apply AVG to a string column) 21

  22. Learning query conditions Cast as a rule learning problem: Finding rules that can perfectly divide a search space into a positive part and a negative part All rows in the The rest of Rows contained joined table the rows in the output table 22

  23. Search space: the joined table Table: student name stu_id The joined table Alice 1 Name stu_id course_id score Bob 2 Join on the Alice 1 504 100 Charlie 3 stu_id column Alice 1 505 99 Dan 4 Bob 2 504 96 (inferred in the Table: enrolled Charlie 3 501 60 Combine step) Charlie 3 502 88 stu_id course_id score Charlie 3 505 68 1 504 100 1 505 99 2 504 96 3 501 60 3 502 88 3 505 68 23

  24. Finding rules selecting rows contained in the output table The joined table name stu_id course_id score Alice 1 504 100 Alice 1 505 99 Bob 2 504 96 Charlie 3 501 60 Charlie 3 502 88 Charlie 4 505 68 Output table name MAX(score) Alice 100 Charlie 88 24

  25. Finding rules selecting rows containing the output table The joined table name name stu_id stu_id course_id score course_id score Alice 1 504 100 Alice 1 504 100 Alice 1 505 99 Alice 1 505 99 Bob 2 504 96 Charlie 3 Charlie 3 501 501 60 60 Charlie 3 Charlie 3 502 502 88 88 Charlie 4 Charlie 4 505 505 68 68 name stu_id course_id score Alice 1 504 100 Alice 1 505 99 Charlie 3 501 60 Charlie 3 502 88 Charlie 4 505 68 25

  26. Finding rules selecting rows containing the output table The joined table name stu_id course_id score Alice 1 504 100 Alice 1 505 99 Bob 2 504 96 No good rules! Charlie 3 501 60 Charlie 3 502 88 Charlie 4 505 68 name stu_id course_id score Alice 1 504 100 Alice 1 505 99 Charlie 3 501 60 Charlie 3 502 88 Charlie 4 505 68

  27. Solution: computing additional features • Key idea: – Expand the search space with additional features • Enumerate all possibilities that a table can be aggregated • Precompute aggregation values as features additional features Suppose grouping it by stu_id MAX(score) SUM COUNT name stu_id course_id score ... (score) (course_id) Alice 1 504 100 100 199 2 Alice 1 505 99 100 199 2 Bob 2 504 96 96 96 1 Charlie 3 501 60 88 216 3 Charlie 3 502 88 88 216 3 Charlie 3 505 68 88 216 3 The joined table

  28. Finding rules without additional features The joined table name stu_id course_id score Alice 1 504 100 Alice 1 505 99 Bob 2 504 96 No good rules! Charlie 3 501 60 Charlie 3 502 88 Charlie 4 505 68 name stu_id course_id score Alice 1 504 100 Alice 1 505 99 Charlie 3 501 60 Charlie 3 502 88 Charlie 4 505 68 28

  29. Finding rules with additional features The joined table after the table is grouped by stu_id … name stu_id course_id score COUNT(course_id) MIN(score) Alice 1 504 100 2 99 Alice 1 505 99 2 99 Bob 2 504 96 1 96 Charlie 3 501 60 3 60 Charlie 3 502 88 3 60 Charlie 4 505 68 3 60 COUNT(course_id) > 1 (after groupping by stu_id ) name stu_id course_id score Alice 1 504 100 SELECT name, MAX(score) Alice 1 505 99 FROM student, enrolled WHERE student.stu_id = Charlie 3 501 60 enrolled.stu_id Charlie 3 502 88 GROUP BY student.stu_id HAVING COUNT(enrolled.course_id) > 1 Charlie 4 505 68

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend