bringing sql to the masses with program synthesis
play

Bringing SQL to the Masses with Program Synthesis Chenglong Wang, - PowerPoint PPT Presentation

Bringing SQL to the Masses with Program Synthesis Chenglong Wang, Alvin Cheung, Ras Bodik University of Washington 1 End-User SQL Select x.id, x.customer, x.total From PURCHASES x Select rows with Join (Select p.customer, maximum value


  1. Bringing SQL to the Masses with Program Synthesis Chenglong Wang, Alvin Cheung, Ras Bodik University of Washington 1

  2. End-User SQL Select x.id, x.customer, x.total From PURCHASES x Select rows with Join (Select p.customer, maximum value for Max(total) From PURCHASES p each group. Group By p.customer) y On y.customer = x.customer And y.max_total = x.total Select * Find rows From Users a containing duplicate Where Exists (Select * values. From Users b Where (a.name = b.name Or a.email = b.email) And a.ID <> b.id) Calculate the Select a.ord, a.total, Sum(b.total) running total for a From t As a Join t As b table. Where b.ord <= a.ord Group By a.ord,a.total Order By a.ord 2

  3. Observations A lot of common tasks require using complex SQL constructs . greatest-n-per-group Aggregation running-total Subquery duplicates Exists/In-clauses Many tasks can be concisely expressed with input-output examples . Idea: summarize our observation on StackOverflow Transition: these problems can be concisely expressed with examples, can we build some system that allows users to ask question using examples only? 3

  4. Programming by Example System T2 oid val T1 id date uid 1 30 Join two tables and return 1 12/25 1 1 10 the rows containing the 2 11/21 3 1 10 maximum val below 50 for 4 12/24 2 2 50 2 10 each group. Synthesize Out oid date uid oid MaxVal Select * 1 12/25 1 1 30 From T1 4 12/24 2 2 10 Join (Select id, Max(val) From T2 Where val < 50 Group By oid) T3 On T3.oid = T1.uid Idea: introduce what we want to do: build a PBE system. Transition: let’s first see what is the traditional algorithm to build such system. 4

  5. Synthesis Algorithm: Value-directed Search FlashFill Input : 2, 2, Output : 6, Operators : add, mul add(2, add(2, 2)) mul(2, add(2, 2)) add(2, 4) 6 add(2, 2) 2 4 add(2, mul(2, 2)) mul(2, 2) 2 add(4, 4) 8 mul(2, mul(2,2 )) SuperOptimizer add(2, add(2, 2)) = 6 add(2, mul(2, 2)) = 6 5

  6. id date uid oid MaxVal 1 12/25 1 1 30 Enumerative Search on SQL 2 11/21 1 1 30 4 12/24 2 1 30 1 12/25 1 1 10 2 11/21 1 1 10 Select * 4 12/24 2 1 10 From T1 Input: T1, T2, Output: T out , Operators: Select, Join, Aggr 1 12/25 1 2 50 Join (Select id, Max(val) As MaxVal 2 11/21 1 2 50 From T2 4 12/24 2 2 50 Where val < 50 1 12/25 1 2 10 evaluate enumerate Select * 2 11/21 1 2 10 Group By oid) T3 Select * From T1 From T3 Join T2 id date uid 4 12/24 2 2 10 Where id > 1 On T3.oid = T1.uid On True 2 11/21 3 Select * From T1 4 12/24 2 …… T1 Select id, Max(uid) Where id > uid From T3 id date uid Group By id T3 …… 1 12/25 1 …… id date uid q4 2 11/21 3 Select * …… 1 12/25 1 q1 From T3 4 12/24 2 Select * Select * Join T4 2 11/21 3 From T3 From T1 On uid = oid Join T5 Where True 4 12/24 2 On id = oid T2 …… T6 Select * q4 oid val T4 From T2 oid date uid oid MaxVal Select * 1 30 From T3 oid MaxVal q3 2 12/25 1 1 30 Join T5 1 10 Select oid,Max(val) 1 30 On uid = oid q2 4 12/24 2 2 10 T5 From T4 1 10 1 10 Select * Group By oid …… oid MaxVal = From T2 2 50 1 10 Where val < 50 1 30 2 10 2 10 2 10 Select oid,Max(val) oid date uid oid MaxVal From T4 Select * oid MaxVal Group By oid From T2 1 12/25 1 1 30 Having maxVal < 50 2 50 Tout Where val = 50 4 12/24 2 2 10 oid date uid oid MaxVal …… 1 12/25 1 1 30 4 12/24 2 2 10 Challenge 1: Challenge 2: Large number of queries Large tables 6

  7. Insight: Decomposition Select * From T1 T2 Join (Select id, Max(val) T1 oid val From T2 id date uid 1 30 1 12/25 1 1 10 Where val < 50 2 11/21 3 1 10 Group By oid) T3 4 12/24 2 2 50 On T3.oid = T1.uid 2 10 Search in the space of SQL queries Tout oid date uid oid MaxVal 2 12/25 1 1 30 4 12/24 2 2 10 7

  8. Insight: Decomposition Abstract Queries Select * Select * From (Select * From (Select * From T1 From T1 Where □ ) T2 Where True) Join (Select id, Max(val) oid val Join (Select id, Max(val) T1 From T2 1 30 From T2 id date uid Where □ 1 10 Where val < 50 1 12/25 1 Group By oid 1 10 Group By oid 2 11/21 3 2 50 Having □ ) T3 Having True) T3 4 12/24 2 2 10 On □ On T3.oid = T1.uid Search abstract Instantiate SQL queries abstract queries Tout oid date uid oid MaxVal 1 12/25 1 1 30 4 12/24 2 2 10 Pro: Smaller space of programs. Challenge: which ones to search for? 8

  9. Goal: Select * Search with Abstract Queries From (Select * From T1 Where □ ) Join (Select id, Max(val) From T2 Input: T1, T2, Output: T out , Operators: abstract query operators Where □ Group By oid Having □ ) T3 Select id,Max(uid) On □ … … From T3 Group By id T3 Having □ T1 id date uid id date uid T6 Select * 1 12/25 1 1 12/25 1 ? id date uid oid MaxVal From T1 2 11/21 3 2 11/21 3 Where □ 1 12/25 1 1 30 4 12/24 2 4 12/24 2 2 11/21 3 1 30 Select * 4 12/24 2 1 30 From T3 1 12/25 1 1 10 Join T5 T5 T2 T4 On □ 2 11/21 3 1 10 Select oid, MAX(val) oid MaxVal oid uid oid uid 4 12/24 2 1 10 From T4 1 30 1 30 1 30 1 12/25 1 2 50 Group By oid Select * 1 10 1 10 1 10 2 11/21 3 2 50 ? Having □ From T2 2 50 1 10 1 10 4 12/24 2 2 50 Where □ 2 10 2 50 2 50 1 12/25 1 2 10 … … 2 10 2 10 2 11/21 3 2 10 4 12/24 2 2 10 ⊆ Tout oid date uid oid MaxVal How to evaluate 1 12/25 1 1 30 oid date uid oid MaxVal 4 12/24 2 2 10 1 12/25 1 1 30 abstract queries? 4 12/24 2 2 10 9

  10. Instantiate Abstract Queries Abstract Queries Select * Select * From (Select * From (Select * From T1 From T1 Where □ ) T2 Where True) Join (Select id, Max(val) oid val Join (Select id, Max(val) T1 From T2 1 30 From T2 id date uid Where □ 1 10 Where val < 50 1 12/25 1 Group By oid 1 10 Group By oid 2 11/21 3 2 50 Having □ ) T3 Having True) T3 4 12/24 2 2 10 On □ On T3.oid = T1.uid Search abstract Instantiate SQL queries abstract queries Tout oid date uid oid MaxVal 1 12/25 1 1 30 4 12/24 2 2 10 10

  11. Instantiate Abstract Queries True, False, Select * uid < id, From (Select * … True + val < 50 + False + T1.uid = T3.oid From T1 Where □ ) Join (Select id, Max(val) False + val < 50 + False + T1.uid = T3.oid val < 50, From T2 val == 50, Where □ val > id, True + val == 50 + False + T1.uid = T3.oid Group By oid True, Having □ ) T3 … On □ True + val == 50 + MaxVal < 50 + T1.uid = T3.oid True, …… False, A intuitive solution that does no scale. MaxVal < 50, T1.uid = T3.oid, Transition: can we use … properties of the abstract T1.id = T3.oid,… query to optimize this? 11

  12. Select * From (Select * From T1 Instantiate Abstract Queries Where □ ) Join (Select id, Max(val) From T2 Where □ Group By oid True, Having □ ) T3 False, uid < id, On □ … T3 T1 id date uid T1.uid = T3.oid, id date uid T6 Select * 1 12/25 1 T1.id = T3.oid,… 1 12/25 1 id date uid oid MaxVal From T1 [100000000001] 2 11/21 3 2 11/21 3 Where □ 1 12/25 1 1 30 1 12/25 1 1 30 4 12/24 2 4 12/24 2 2 11/21 3 1 30 Select * 4 12/24 2 1 30 From T3 1 12/25 1 1 10 Join T5 T2 T4 2 11/21 3 1 10 On □ oid uid oid uid 4 12/24 2 1 10 Select oid, MAX(val) 1 30 1 30 1 12/25 1 2 50 Select * From T4 1 10 1 10 2 11/21 3 2 50 From T2 Group By oid 1 10 1 10 4 12/24 2 2 50 Where □ Having □ 2 50 2 50 1 12/25 1 2 10 2 10 2 10 2 11/21 3 2 10 4 12/24 2 2 10 4 12/24 2 2 10 val < 50, True, val == 50, ⊆ False, val > id, MaxVal < 50, True, oid date uid oid MaxVal … … 1 12/25 1 1 30 4 12/24 2 2 10 12

  13. Generating Solutions Select * Select * … True id <> uid From T1 From (Select * Join (Select id, Max(val) From T1 From T2 Where □ ) Where val < 50 Group By oid) T3 Join (Select id, Max(val) On T3.oid = T1.uid From T2 Where □ val < 50 Group By oid …… val <> 50 Having □ ) T3 … On □ Select * From (Select * From T1 Where id <> uid) Join (Select id, Max(val) … True True From T2 T1.uid = T3.oid Where val <> 50 Group By oid) T3 On T1.uid = T3.oid And T1.id <> T2.id T1.uid = T3.oid And T1.id <> T2.id … 13

  14. Ranking & Interaction • Heuristically rank candidate queries. Criteria: complexity, naturalness etc. • • When the result is not desirable: Provide new input-output examples. • 14

  15. Evaluation Benchmark: 193 34: more features 15: run out of time 1: fail to disambiguate Scythe: 143 Enum: 92 34X 15

  16. Demo 16

  17. Conclusion • Goal: Helping end users to program SQL with input-output examples. • Solution: An efficient two-phase synthesis algorithm. • Evaluation: Able to solve 143/193 problems on StackOverflow. 17

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend