Program Synthesis in the Industrial World: Inductive, Incremental, Interactive

  1. Program Synthesis in the Industrial World: Inductive, Incremental, Interactive
     Alex Polozov (polozov@cs.washington.edu), Sumit Gulwani (sumitg@microsoft.com), and the rest of the PROSE team (prose-contact@microsoft.com)
     July 18, 2016, SYNT-16, Toronto, Canada

  2. PROgram Synthesis using Examples
     Team: Ranvijay Kumar, Sumit Gulwani, Daniel Perelman, Allen Cypher, Vu Le, Mohammad Raza, Abhishek Udupa, Danny Simmons, Adam Smith, Alex Polozov
     We are hiring! Interns or full-time
     • R&D team, MSR → industrial Microsoft

  3. This talk: Lessons, Solutions, Challenges

  4. Outline
     • Programming by Examples (PBE) & PROSE: Quick Background
     • Mass-Market Deployment
       ↪ Goals
       ↪ Challenges
       ↪ Solutions
     • Discussion

  5. PBE & PROSE: A 3-slide Background

  6. Motivation
     • 99% of spreadsheet users do not know programming
     • Data scientists spend 80% of their time wrangling raw data

  7. PROSE Timeline
     • FlashFill (text transformations), 2010-2012 [POPL 11]
     • FlashExtract (text extraction), 2012-2014 [PLDI 14]
     • FlashRelate (table transformations), 2012-2015 [PLDI 15]
     • FlashMeta (PBE framework), 2014-2015 [OOPSLA 15]
     • PROSE SDK, 2015-present
     • …

  8. PBE Architecture
     [Figure: PBE architecture. Labels: example-based intent spec $\chi$; Program Synthesizer, parameterized by the DSL $\mathcal{L}$ and the ranking function $h$; ranked program set $\tilde{O}$; intended program $Q \in \mathcal{L}$; Translator; intended program in Python/C#/C++/…; Debugging on test inputs $\vec{\tau}$; refined intent.]

  9. Mass-Market Deployment: Goals & Challenges

  10. User Experience
      [Diagram: four properties arranged around "User Experience".]
      • Inductive (intent is easily specified)
      • Scalable (snappy UI = responds in < 1 s)
      • Interactive (facilitates the debugging cycle)
      • Agile (quick software development)
      Associated topics: ambiguity resolution, incremental synthesis, predictive synthesis, engineering practices

  11. Engineering practices
      • Production-quality library code
        • Prototyping still exists, but it’s not the final form
      • Unit tests & TDD
      • Integration tests: real-life scenarios
        • Close to 8K for all DSLs in total
        • Most are mined from public sources (e.g. help forums)
        • In preparation: benchmark suite release for the community

  12. Performance-minded engineering
      • Parallelization of learning matters
        • E.g.: multi-user log file processing in Azure Log Analytics
      • Performance of program execution matters
        • E.g.: “Big Data” on an end-user’s machine
      • Smallest ≠ fastest!
        • (1) Synthesize many correct programs, then (2) optimize for the fast ones
      [Figure labels: robustness-based ranking; performance-based ranking]

  13. Development
      • DSL design: ≈ 10 months → ≈ 2 weeks
        • This is not a bottleneck! *
      • Ranking: bulk of the effort
        • Designing a score for an operator $G$ is 2-3x longer than designing $G$ (incl. synthesis!)
        • E.g.: rock-paper-scissors among string processing operators
      Callout: Should I process the string “25-06-11” with regexes? Treat it as a numeric computation? A date?
      * Once you learn the skill…

  14. Candidate programs
      From all lines:
      • ending with “Number ∘ Dot”
      • ending with “Space ∘ Number ∘ Dot”
      • starting with “Word ∘ Space ∘ CamelCase”
      Extract:
      • the first “Number” before a “Dot”
      • the last “Number” before a “Dot”
      • the last “Number” before a “Dot ∘ LineBreak”
      • the last “Number”
      • text between the last “Space” and the last “Dot”
      • text between the first “Comma ∘ Space” and the last “Dot ∘ LineBreak”
      …and up to $10^{20}$ more candidates

  15. Anecdotes
      • FlashFill was not accepted to Excel until it solved the most common scenarios from 1 example
          Adam Smith      → Adam
          Alice Williams  → Alic
      • Some users still don’t know you can give 2!

  16. Ambiguity resolution
      Option 1: machine-learned robustness-based ranking [CAV 15]
      • Idioms/patterns from test data can influence search & ranking
      • E.g.: bucketing
          100 → 76-100
          51  → 51-75
          86  → ?
        x ⇒ Concat(Round(x, Down, 25), Const(“-”), Round(x, Up, 25))
      Option 2: interactive clarification
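
The bucketing behaviour in this example can be mimicked in a few lines of Python. This is only a sketch: the slides do not spell out the semantics of `Round`, so the helper below assumes 25-wide buckets starting at 1, 26, 51, and 76, which is what the two given examples (100 → 76-100, 51 → 51-75) imply. The name `bucket` and the `width` parameter are illustrative and not part of any PROSE DSL.

```python
def bucket(x: int, width: int = 25) -> str:
    """Map a positive integer to its "low-high" bucket label.

    Assumption (not stated on the slide): buckets are 1..width,
    width+1..2*width, and so on.
    """
    if x < 1:
        raise ValueError("this sketch only handles positive integers")
    low = ((x - 1) // width) * width + 1   # first value of the bucket containing x
    high = low + width - 1                 # last value of that bucket
    return f"{low}-{high}"


# The two examples from the slide must be reproduced exactly.
examples = {100: "76-100", 51: "51-75"}
assert all(bucket(x) == out for x, out in examples.items())

# The third input from the slide, whose output is left for the system to fill in.
print(bucket(86))  # 76-100
```

Many other programs agree with the same two examples, which is why a ranking function or a clarifying question is needed to pick one.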

  20. PBE Architecture
      [Figure: PBE architecture with interactive clarification. Labels: example-based intent spec $\chi$; Program Synthesizer, parameterized by the DSL $\mathcal{L}$ and the ranking function $h$; ranked program set $\tilde{O}$; Hypothesizer; Questions; User; test inputs $\vec{\tau}$; refined intent; intended program $Q \in \mathcal{L}$; Translator; intended program in Python/C#/C++/….]

  23. Hypothesizer
      Given a program set $\tilde{O}$, find program constraints (“hypotheses”) $\chi$ that best disambiguate among programs in $\tilde{O}$, and present them to the user as multiple-choice questions.
      • Reduces the cognitive load on the user
      • Reduces the number of iterations by choosing the most effective disambiguating questions
      • Increases the user’s confidence in the system (“proactive = smart”)
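
One way to picture how a multiple-choice question could be derived from a program set is to group the candidate programs by the output they produce on a chosen input. The sketch below is an illustration under assumptions, not the PROSE implementation: programs are modelled as plain Python functions, `None` stands in for ⊥, and all function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Program = Callable[[str], Optional[str]]


def first_number(s: str) -> Optional[str]:
    m = re.search(r"\d+", s)
    return m.group() if m else None


def last_number(s: str) -> Optional[str]:
    nums = re.findall(r"\d+", s)
    return nums[-1] if nums else None


def text_after_last_comma(s: str) -> Optional[str]:
    return s.rsplit(",", 1)[1].strip() if "," in s else None


def number_after_literal(s: str) -> Optional[str]:
    # An overfitted candidate: it only works when the literal word is present.
    m = re.search(r"numbers, (\d+)", s)
    return m.group(1) if m else None


def make_question(programs: List[Program], test_input: str) -> Dict[Optional[str], List[Program]]:
    """Group programs by their output on `test_input`.

    Each distinct output is one answer choice; choosing it keeps exactly
    the programs in that group.
    """
    choices: Dict[Optional[str], List[Program]] = defaultdict(list)
    for program in programs:
        choices[program(test_input)].append(program)
    return dict(choices)


candidates = [first_number, last_number, text_after_last_comma, number_after_literal]

# Every candidate agrees with the example the user already gave.
assert all(p("Missing page numbers, 1993") == "1993" for p in candidates)

question = make_question(candidates, "64-67, 1995")
for output, group in question.items():
    print(output, [p.__name__ for p in group])
```

Answering with one of the printed outputs discards every program outside the chosen group, which is the effect a hypothesis has on the program set.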

  24. Example
      Input                         Output
      Missing page numbers, 1993    1993
      64-67, 1995                   64
      …                             …
      Outputs of the programs in $\tilde{O}$ on the second input: 64, 67, 1995, ⊥
      Which output is correct here? a. 64  b. 67  c. 1995
      Answering (c) adds the constraint $\chi_{j+1}: Q(\tau_2) = \text{“1995”}$, and the second output becomes 1995.

  27. Example – alternative
      Input                         Output
      Missing page numbers, 1993    1993
      64-67, 1995                   64
      …                             …
      Outputs of the programs in $\tilde{O}$ on the second input: 64, 67, 1995, ⊥
      Is this part of the input (“64-67”) relevant? a. Yes  b. No  c. Maybe

  28. Picking the right question
      “Distinguishability” = effectiveness for disambiguation
      1. An input is distinguishing if many top-ranked candidate programs disagree on the intended output on it.
         • Any response will partition the program set well
      2. A question is distinguishing if the alternative candidate programs corresponding to all potential responses have high ranks.
         • Any response will lead to a good alternative program
      Preliminary results: good questions yield just 4-6 iterations until convergence
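
The first criterion can be sketched as a scoring function over inputs. The slides do not define the measure, so this example assumes one plausible choice: the number of top-ranked programs that would survive the least informative answer (smaller is better). Programs are modelled as Python string functions, and every name here is hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

Program = Callable[[str], Hashable]


def worst_case_survivors(programs: Sequence[Program], x: str) -> int:
    """Size of the largest group of programs that agree on input `x`.

    If the user confirms that group's output, this many programs remain.
    """
    counts = Counter(p(x) for p in programs)
    return max(counts.values())


def most_distinguishing_input(programs: Sequence[Program],
                              inputs: Sequence[str],
                              top_k: int = 10) -> str:
    """Pick the input on which the top-ranked programs disagree the most.

    Assumes `programs` is already sorted by rank, best first.
    """
    top = programs[:top_k]
    return min(inputs, key=lambda x: worst_case_survivors(top, x))


# Four candidates that all turn "adam" into a capitalized form in different ways.
programs = [
    str.upper,
    str.title,
    str.capitalize,
    lambda s: s[0].upper() + s[1:],
]
inputs = ["adam", "adam smith", "aDAM smith"]

for x in inputs:
    print(repr(x), worst_case_survivors(programs, x))
print(most_distinguishing_input(programs, inputs))
```

On the last input all four candidates produce different outputs, so any answer leaves a single program; on the first input three of the four agree and an answer may eliminate only one.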

  29. Big Data

  30. Big Data + Program Synthesis

  31. Problem definition
      Given a program set $\tilde{O}_j \subset \mathcal{L}$ that satisfies the currently accumulated spec $\chi_j$, and a new constraint $\omega_{j+1}$, learn a subset $\tilde{O}_{j+1} \subset \tilde{O}_j$ of programs that satisfy the new spec $\chi_{j+1} = \chi_j \wedge \omega_{j+1}$.
      • $\mathcal{L}$ is an industrial DSL (e.g., FlashFill)
      • $|\tilde{O}_j| \approx 10^{20}$
      • Time limit: ≈ 1 sec
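
The specification of this refinement step can be written down as a naive filter. The sketch below only illustrates what has to be computed: it enumerates the program set explicitly, which is not feasible for a set of about 10^20 programs under a one-second budget. Programs are modelled as Python functions and a constraint as a hypothetical (input, expected output) pair.

```python
from typing import Callable, Iterable, List, Tuple

Program = Callable[[str], str]
Constraint = Tuple[str, str]  # (input, expected output); an assumed encoding


def refine(program_set: Iterable[Program], new_constraint: Constraint) -> List[Program]:
    """Keep the programs that also satisfy the new constraint.

    The result is a subset of `program_set`, so earlier constraints
    stay satisfied without being re-checked.
    """
    x, expected = new_constraint
    return [p for p in program_set if p(x) == expected]


# A toy program set; real sets are far too large to list like this.
step0 = [str.upper, str.title, str.capitalize]

step1 = refine(step0, ("adam", "Adam"))
step2 = refine(step1, ("alice williams", "Alice Williams"))

print([p.__name__ for p in step1])  # ['title', 'capitalize']
print([p.__name__ for p in step2])  # ['title']
```

Because each step starts from the previous result rather than from the whole language, the work done for earlier examples is reused.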

  32. Background: Version Space Algebra
      int positionIn[string s] := AbsPos(s, k) | RegPos(s, std.Pair(r, r), k);
      Sharing #1: cross-product representation
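
The idea behind a cross-product representation can be illustrated with a small tree of union and join nodes: a join stores one sub-space per operator argument instead of every combination. This is a minimal sketch, not the PROSE data structure; the class names, the candidate constants for `k`, and the token names for `r` are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import product
from math import prod
from typing import Iterator, Tuple


@dataclass(frozen=True)
class Leaf:
    """An explicit set of alternatives, e.g. candidate constants."""
    values: Tuple[str, ...]


@dataclass(frozen=True)
class UnionNode:
    """Programs from any child sub-space (the `|` of the grammar)."""
    children: tuple


@dataclass(frozen=True)
class JoinNode:
    """All applications of `operator` to one choice per argument sub-space."""
    operator: str
    args: tuple


def size(node) -> int:
    """Count programs without enumerating them (children assumed disjoint)."""
    if isinstance(node, Leaf):
        return len(node.values)
    if isinstance(node, UnionNode):
        return sum(size(c) for c in node.children)
    return prod(size(a) for a in node.args)


def programs(node) -> Iterator[str]:
    """Enumerate programs explicitly; only practical for tiny spaces."""
    if isinstance(node, Leaf):
        yield from node.values
    elif isinstance(node, UnionNode):
        for child in node.children:
            yield from programs(child)
    else:
        for combo in product(*(list(programs(a)) for a in node.args)):
            yield f"{node.operator}({', '.join(combo)})"


s = Leaf(("s",))
k = Leaf(("0", "1", "-1"))                      # hypothetical constants
r = Leaf(("Word", "Space", "Number", "Dot"))    # hypothetical regex tokens

position = UnionNode((
    JoinNode("AbsPos", (s, k)),
    JoinNode("RegPos", (s, JoinNode("std.Pair", (r, r)), k)),
))

print(size(position))  # 3 + 1 * (4 * 4) * 3 = 51
```

The tree holds 8 leaf values in total yet represents 51 programs, and the gap grows multiplicatively with every additional argument or nesting level.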
