

  1. Neural Program Synthesis with Priority Queue Training Daniel A. Abolafia, Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le https://arxiv.org/abs/1801.03526

  2. Why Program Synthesis? ● One of the hard AI reasoning domains ● A tool for planning in robotics ● Increased interpretability (humans can read code more easily than NN weights)

  3. Deep Reinforcement Learning ● Value-based RL, e.g. Q-learning ● Policy-based RL, e.g. policy gradient (diagram: agent-environment interaction loop) https://becominghuman.ai/the-very-basics-of-reinforcement-learning-154f28a79071

  4. Deep RL for Combinatorial Optimization ● Neural Architecture Search with Reinforcement Learning

  5. Deep RL for Combinatorial Optimization ● Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision

  6. Deep RL for Combinatorial Optimization ● Neural Combinatorial Optimization with Reinforcement Learning

  7. "Fundamental" Program Synthesis ● Focus on algorithmic coding problems. ● No ground-truth program solutions. ● Simple Turing-complete language.

  8. HelloWorld.bf (annotated example from https://en.wikipedia.org/wiki/Brainfuck)

     ++++++++                Set Cell #0 to 8
     [
         >++++               Add 4 to Cell #1; this will always set Cell #1 to 4
         [                   as the cell will be cleared by the loop
             >++             Add 2 to Cell #2
             >+++            Add 3 to Cell #3
             >+++            Add 3 to Cell #4
             >+              Add 1 to Cell #5
             <<<<-           Decrement the loop counter in Cell #1
         ]                   Loop till Cell #1 is zero; number of iterations is 4
         >+                  Add 1 to Cell #2
         >+                  Add 1 to Cell #3
         >-                  Subtract 1 from Cell #4
         >>+                 Add 1 to Cell #6
         [<]                 Move back to the first zero cell you find; this will
                             be Cell #1 which was cleared by the previous loop
         <-                  Decrement the loop counter in Cell #0
     ]                       Loop till Cell #0 is zero; number of iterations is 8

     The result of this is:
     Cell No :   0   1   2   3   4   5   6
     Contents:   0   0  72 104  88  32   8
     Pointer :   ^

     >>.                     Cell #2 has value 72 which is 'H'
     >---.                   Subtract 3 from Cell #3 to get 101 which is 'e'
     +++++++..+++.           Likewise for 'llo' from Cell #3
     >>.                     Cell #5 is 32 for the space
     <-.                     Subtract 1 from Cell #4 for 87 to give a 'W'
     <.                      Cell #3 was set to 'o' from the end of 'Hello'
     +++.------.--------.    Cell #3 for 'rl' and 'd'
     >>+.                    Add 1 to Cell #5 gives us an exclamation point
     >++.                    And finally a newline from Cell #6

  9. Anatomy of BF: Turing complete! https://esolangs.org/wiki/Brainfuck#Computational_class

  10. BF Execution Demo: Reverse a list
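
To make the execution model concrete, here is a minimal BF interpreter sketch in Python. This is not the interpreter used in the paper: the function name run_bf, the 1000-cell tape, the step cap, and the convention that "," reads 0 once the input is exhausted are assumptions of this sketch (the mod-256 cells match the base B = 256 used later in the reward function). The demo call runs the experimenter's reverse solution shown on slide 18.

def run_bf(code, inputs, max_steps=5000):
    # The 8 BF commands:
    #   >  <   move the data pointer right / left
    #   +  -   increment / decrement the current cell (mod 256)
    #   .      append the current cell to the output
    #   ,      read the next input value (assumption: 0 once input is exhausted)
    #   [  ]   jump past / back to the matching bracket when the cell is 0 / nonzero
    # Assumes balanced brackets and a pointer that stays on the tape.
    jump, stack = {}, []
    for i, c in enumerate(code):
        if c == '[':
            stack.append(i)
        elif c == ']':
            j = stack.pop()
            jump[i], jump[j] = j, i

    tape, dp, ip, in_ptr, out = [0] * 1000, 0, 0, 0, []
    for _ in range(max_steps):
        if ip >= len(code):
            break
        c = code[ip]
        if c == '>':
            dp += 1
        elif c == '<':
            dp -= 1
        elif c == '+':
            tape[dp] = (tape[dp] + 1) % 256
        elif c == '-':
            tape[dp] = (tape[dp] - 1) % 256
        elif c == '.':
            out.append(tape[dp])
        elif c == ',':
            tape[dp] = inputs[in_ptr] if in_ptr < len(inputs) else 0
            in_ptr += 1
        elif c == '[' and tape[dp] == 0:
            ip = jump[ip]
        elif c == ']' and tape[dp] != 0:
            ip = jump[ip]
        ip += 1
    return out

# The experimenter's "reverse" solution from slide 18:
print(run_bf(">,[>,]<[.<].", [1, 2, 3, 4, 0]))   # -> [4, 3, 2, 1, 0]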

  11. Why BF ● Turing complete, and suitable for any algorithmic task in theory. ● Many algorithms have surprisingly elegant BF implementations. ● No syntax errors (with minor adjustment to interpreter). ● No names (variables, functions).

  12. Training Setup (diagram): the RNN generates candidate code by inference; the BF interpreter runs the code on the test-case inputs; the scoring function compares the resulting outputs with the expected outputs to produce a reward; the reward drives a gradient update of the RNN.

  13. Training Setup: Reward Function
     Scoring function: S(Y, Y*) = d(∅, Y*) − d(Y, Y*), where d(Y, Y*) is a variable-length Hamming distance with base B = 256 and ∅ = [] is the empty output.
     Test case: input X = [1, 2, 3, 4, 0], expected output Y* = [4, 3, 2, 1, 0].
     Program P = ,>,.<. produces Y = P(X) = [2, 1].
     d(Y, Y*)  = 2 + 2 + B + B + B   (elementwise differences, plus B per missing position)
     d(∅, Y*) = B + B + B + B + B
     Reward for a program: Reward = ∑ S(P(I), Y*) over the test cases.
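
The scoring function can be written out directly. This is a sketch following the slide's definitions; the names B, dist, and score are mine, and the paper's exact distance may differ in detail. The worked numbers from the slide are included as comments.

B = 256  # base: tape cells are bytes, so a missing or extra output position costs B

def dist(y, y_star):
    # Variable-length Hamming distance, as on the slide: elementwise absolute
    # difference over the overlapping prefix, plus B per unmatched position.
    d = sum(abs(a - b) for a, b in zip(y, y_star))
    return d + B * abs(len(y) - len(y_star))

def score(y, y_star):
    # S(Y, Y*) = d([], Y*) - d(Y, Y*): zero for the empty output and maximal
    # (equal to d([], Y*)) exactly when Y == Y*.
    return dist([], y_star) - dist(y, y_star)

# Worked example from the slide:
y_star = [4, 3, 2, 1, 0]
y = [2, 1]              # output of P = ",>,.<." on X = [1, 2, 3, 4, 0]
print(dist(y, y_star))  # 2 + 2 + 3*B = 772
print(score(y, y_star)) # 5*B - (2 + 2 + 3*B) = 2*B - 4 = 508

# Reward for a program P over all test cases (X_i, Y*_i), using the interpreter
# sketched above:  Reward(P) = sum(score(run_bf(P, X), Y_star) for X, Y_star in test_cases)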

  14. Problems with policy gradient (REINFORCE) ● Catastrophic forgetting and unstable learning ● Sample inefficient
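
For reference, this is the REINFORCE estimator the slide refers to, in its standard textbook form rather than as reproduced from the paper; b is an optional baseline subtracted to reduce variance, and a_i are programs sampled from the policy pi_theta:

\[
\nabla_\theta \, \mathbb{E}_{a \sim \pi_\theta}[R(a)]
= \mathbb{E}_{a \sim \pi_\theta}\!\big[(R(a) - b)\,\nabla_\theta \log \pi_\theta(a)\big]
\approx \frac{1}{N}\sum_{i=1}^{N} (R(a_i) - b)\,\nabla_\theta \log \pi_\theta(a_i)
\]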

  15. Solution: Priority Queue Training (PQT) (diagram): code sampled from the RNN is scored by the reward function; the highest-reward unique programs are kept in a max-unique priority queue; the queue's contents are fed back to the RNN as training targets.
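
A sketch of the PQT buffer and training loop, under the assumption that the queue keeps the top-K unique programs by reward and that, per the slide, the queue's contents serve as training targets (a log-likelihood objective on those programs). The class name MaxUniquePriorityQueue and the helpers sample_codes, reward_fn, and nll_train_step are hypothetical stand-ins, not APIs from the paper's code.

import heapq

class MaxUniquePriorityQueue:
    """Keeps the top-K unique programs seen so far, ordered by reward."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []      # min-heap of (reward, program); smallest reward at heap[0]
        self.seen = set()   # enforce uniqueness of programs

    def push(self, reward, program):
        if program in self.seen:
            return
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, (reward, program))
            self.seen.add(program)
        elif reward > self.heap[0][0]:
            # Evict the current worst entry and insert the better program.
            _, evicted = heapq.heapreplace(self.heap, (reward, program))
            self.seen.discard(evicted)
            self.seen.add(program)

    def best(self):
        return sorted(self.heap, reverse=True)   # (reward, program), best first

# Hypothetical training loop (not the paper's code):
# queue = MaxUniquePriorityQueue(capacity=10)
# for step in range(num_steps):
#     for program in sample_codes(rnn, batch_size):   # sample BF code from the RNN
#         queue.push(reward_fn(program), program)     # score it on the test cases
#     nll_train_step(rnn, [p for _, p in queue.best()])  # fit the RNN to the queue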

  16. Results

  17. Fixed Length Programs (program length 100)

     remove   <>[,[<+,.],<<]],[[-[+.>>][]>[>[>[[<+>>.+<>]>]<<>]],]>+-++--,>[+[[<----].->+]->]]]-[,.]+>>,-,,-]><,,]
     reverse  ,[[>,<>]]-]+[<[.,++,<]<>[->.,+,[<+]<-]<,,<<>>[[[<+<[],.>->]>,<-]<]<>,-<,,[+>,<,><.[.<-+,+-<]+<[,+-<>
     add      ,>,[-<+>][,],>]<]-<.+,,+,<.,>]>,[><<-,][+-[.[[+<[.>]],>.[]-<,],+,[,->]>>->+,[+[>]-,-]--,.,>+-<<]]<,+

  18. Synthesized vs "Ground Truth"

     task               synthesized                               experimenter's best solution
     reverse            ,[>,]+[,<.]                               >,[>,]<[.<].
     remove             ,-[+.,-]+[,.]                             ,[-[+.[-]],].
     count-char         ,[-[>]>+<<<<,]>.                          >,[-[<->[-]]<+>,]<.
     add                ,[+>,<<->],<.,.                           ,>,<[->+<]>.
     bool-logic         ,+>,<[,>],<+<.                            ???
     print              ++++++++.---.+++++++..+++.                ++++++++.---.+++++++..+++.
     zero-cascade       ,.,[.>.-<,[[[.+,>+[-.>]..<]>+<<]>+<<]]    ,[.>[->+>.<<]>+[-<+>]<<,]
     cascade            ,[.,.[.,.[..,[....,[.....,[.>]<]].]]      ,>>+<<[>>[-<+>]<[->+<<.>]>+<<,].
     shift-left         ,>,[.,]<.>.                               ,>,[.,]<.,.
     shift-right        ,[>,]<.,<<<<<.[>.]                        >,[>,]<.[-]<[<]>[.>].
     unriffle           -[,>,[.,>,]<[>,]<.]                       >,[>,[.[-]],]<[.<].
     remove-last        ,>,[<.>>,].                               ,>,[[<.[-]>[-<+>]],].
     remove-last-two    >,<,>>,[<.,[<.[>]],].                     ,>,>,[[<<.[-]>[-<+>]>[-<+>]],].
     echo-alternating   ,[.,>,]<<<<.[>.]                          >,[.,>,]<<[<]>[.>].
     length             ,[>+<,]>.                                 >+>,[[<]>+[>],]<[<]>-.
     echo-second-seq    ,[,]-[,.]                                 ,[,],[.,].
     echo-nth-seq       ,-[->-[,]<]-[,.]                          ,-[->,[,]<],[.,].

  19. What's next? ● Scale up to harder coding problems and more complex programming languages. ● Augment RL with supervised training on a large corpus of programs. ● Give the code synthesizer access to auxiliary information, such as stack traces and program execution internals. ● Data augmentation techniques, such as Hindsight Experience Replay. ● Few-shot learning techniques can help with generalization issues, e.g. MAML.

  20. Thank you! Questions? Thank you to my coauthors: Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le.

  21. Prior Work ● Algorithm induction ○ Neural Programmer, A. Neelakantan, et al. ○ Neural Programmer-Interpreters, S. Reed, et al. ● Domain specific languages ○ RobustFill, J. Devlin, et al. ○ DeepCoder, M. Balog, et al. ○ TerpreT, A. Gaunt, et al. ● Precursors to PQT ○ Noisy Cross-Entropy Method, I. Szita, et al. ○ Neural Symbolic Machines, C. Liang, et al.
