Neural Program Synthesis with Priority Queue Training (PowerPoint Presentation)

SLIDE 1

Neural Program Synthesis with Priority Queue Training

Daniel A. Abolafia, Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le

https://arxiv.org/abs/1801.03526

SLIDE 2

Why Program Synthesis?

  • One of the hard AI reasoning domains
  • A tool for planning in robotics
  • Increased interpretability (humans can read code more easily than neural-network weights)
SLIDE 3

SLIDE 4

SLIDE 5

Deep Reinforcement Learning

https://becominghuman.ai/the-very-basics-of-reinforcement-learning-154f28a79071

  • Value-based RL, e.g. Q-learning
  • Policy-based RL, e.g. policy gradient

(Figure: agent-environment interaction loop.)

SLIDE 6

Deep RL for Combinatorial Optimization

  • Neural Architecture Search with Reinforcement Learning
SLIDE 7

Deep RL for Combinatorial Optimization

  • Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision

SLIDE 8

Deep RL for Combinatorial Optimization

  • Neural Combinatorial Optimization with Reinforcement Learning
SLIDE 9

"Fundamental" Program Synthesis

  • Focus on algorithmic coding problems.
  • No ground-truth program solutions.
  • Simple Turing-complete language.
SLIDE 10

++++++++            Set Cell #0 to 8
[
    >++++           Add 4 to Cell #1; this will always set Cell #1 to 4
    [               as the cell will be cleared by the loop
        >++         Add 2 to Cell #2
        >+++        Add 3 to Cell #3
        >+++        Add 3 to Cell #4
        >+          Add 1 to Cell #5
        <<<<-       Decrement the loop counter in Cell #1
    ]               Loop till Cell #1 is zero; number of iterations is 4
    >+              Add 1 to Cell #2
    >+              Add 1 to Cell #3
    >-              Subtract 1 from Cell #4
    >>+             Add 1 to Cell #6
    [<]             Move back to the first zero cell you find; this will
                    be Cell #1 which was cleared by the previous loop
    <-              Decrement the loop Counter in Cell #0
]                   Loop till Cell #0 is zero; number of iterations is 8

The result of this is:
Cell No :   0   1   2   3   4   5   6
Contents:   0   0  72 104  88  32   8
Pointer :   ^

>>.                   Cell #2 has value 72 which is 'H'
>---.                 Subtract 3 from Cell #3 to get 101 which is 'e'
+++++++..+++.         Likewise for 'llo' from Cell #3
>>.                   Cell #5 is 32 for the space
<-.                   Subtract 1 from Cell #4 for 87 to give a 'W'
<.                    Cell #3 was set to 'o' from the end of 'Hello'
+++.------.--------.  Cell #3 for 'rl' and 'd'
>>+.                  Add 1 to Cell #5 gives us an exclamation point
>++.                  And finally a newline from Cell #6

HelloWorld.bf

++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.

https://en.wikipedia.org/wiki/Brainfuck

SLIDE 11

Anatomy of BF

Turing complete! https://esolangs.org/wiki/Brainfuck#Computational_class
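The eight BF commands are simple enough that the whole execution model fits in a short interpreter. Below is a minimal sketch in Python (the function name and conventions are mine, not the paper's implementation): tape cells are integers mod 256, `,` reads 0 once the input is exhausted, the pointer is clamped at the left edge, and a step limit guards against non-halting programs. The sketch assumes balanced brackets; the paper's adjusted interpreter also tolerates malformed programs.

```python
def run_bf(code, inputs, max_steps=10_000):
    # Pre-match brackets so '[' and ']' can jump in O(1).
    jumps, stack = {}, []
    for i, c in enumerate(code):
        if c == '[':
            stack.append(i)
        elif c == ']':
            j = stack.pop()
            jumps[i], jumps[j] = j, i

    tape, ptr, pc, in_ptr, out = [0] * 30_000, 0, 0, 0, []
    for _ in range(max_steps):          # step limit guards non-halting code
        if pc >= len(code):
            break
        c = code[pc]
        if c == '>':
            ptr += 1
        elif c == '<':
            ptr = max(ptr - 1, 0)       # clamp at the left edge of the tape
        elif c == '+':
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == '-':
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == '.':
            out.append(tape[ptr])
        elif c == ',':                  # read 0 once the input is exhausted
            tape[ptr] = inputs[in_ptr] if in_ptr < len(inputs) else 0
            in_ptr += 1
        elif c == '[' and tape[ptr] == 0:
            pc = jumps[pc]              # skip past the loop body
        elif c == ']' and tape[ptr] != 0:
            pc = jumps[pc]              # jump back to the matching '['
        pc += 1
    return out
```

For example, `run_bf('>,[>,]<[.<].', [1, 2, 3, 0])` returns `[3, 2, 1, 0]`, the reversed list.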

SLIDE 12

BF Execution Demo: Reverse a list

SLIDE 13

SLIDE 14

SLIDE 15

Why BF

  • Turing complete, so in theory suitable for any algorithmic task.
  • Many algorithms have surprisingly elegant BF implementations.
  • No syntax errors (with minor adjustment to interpreter).
  • No names (variables, functions).
SLIDE 16

Training Setup

(Figure: the training loop. The RNN generates code (inference); the BF interpreter executes that code on the test-case inputs to produce outputs; the scoring function compares the program's outputs with the expected outputs of the test cases; the resulting reward drives the gradient update of the RNN.)

SLIDE 17

Training Setup: Reward Function

Reward = ∑ S(P(I), Y*), summed over test cases (I, Y*)

where P is the synthesized program, I a test-case input, and Y* the expected output.

Example test case: X = [1, 2, 3, 4, 0], Y* = [4, 3, 2, 1, 0]
Program: P = ,>,.<.    Program output: P(X) = [2, 1]

Scoring function:

S(Y, Y*) = d(∅, Y*) - d(Y, Y*)

d(Y, Y*) = variable-length Hamming distance: aligned positions contribute the absolute difference of their values, and each position by which the lists differ in length contributes the base B = 256.

Y* = [4, 3, 2, 1, 0], Y = [2, 1]  →  d = 2 + 2 + B + B + B
Y* = [4, 3, 2, 1, 0], ∅ = []     →  d = B + B + B + B + B
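The scoring function above can be written out directly. This is a sketch in Python (the function names are mine, and the paper's full reward shaping may include terms beyond what the slide shows):

```python
B = 256  # base: penalty for each position missing from the shorter list

def hamming_dist(y, y_star):
    """Variable-length Hamming distance from the slide: aligned positions
    contribute the absolute difference of their values; each position by
    which the lengths differ contributes the full base B."""
    n = min(len(y), len(y_star))
    d = sum(abs(a - b) for a, b in zip(y, y_star))
    d += B * (max(len(y), len(y_star)) - n)
    return d

def score(y, y_star):
    # S(Y, Y*) = d(empty, Y*) - d(Y, Y*): zero for an empty output,
    # maximal when the program output matches Y* exactly.
    return hamming_dist([], y_star) - hamming_dist(y, y_star)
```

For the worked example, Y* = [4, 3, 2, 1, 0] and Y = P(X) = [2, 1] give d(Y, Y*) = 2 + 2 + 3·256 = 772 and d(∅, Y*) = 5·256 = 1280, so S = 508.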

SLIDE 18

Problems with policy gradient (REINFORCE)

  • Catastrophic forgetting and unstable learning
  • Sample inefficiency
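For context, REINFORCE updates the policy parameters along r · ∇ log π(program). A toy sketch, assuming a flat softmax policy over BF tokens rather than the paper's autoregressive RNN (all names here are illustrative, not the paper's code):

```python
import math
import random

TOKENS = list('+-<>[],.')  # the 8 BF commands

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(logits, reward_fn, length=5, lr=0.05, baseline=0.0):
    """One REINFORCE update. Each of the `length` program positions is
    sampled i.i.d. from a softmax over tokens; the gradient of log pi for
    a sampled token under a softmax is (one_hot - probs)."""
    probs = softmax(logits)
    program = ''.join(random.choices(TOKENS, weights=probs, k=length))
    advantage = reward_fn(program) - baseline
    for ch in program:                   # ascend advantage * grad log pi
        for i, t in enumerate(TOKENS):
            grad = (1.0 if t == ch else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return program
```

With a reward like "number of '+' tokens", repeated steps push probability mass toward '+'; the slide's point is that this estimator is high-variance and sample-hungry, and the policy can drift away from good programs it has already found.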
SLIDE 19

Solution: Priority Queue Training (PQT)

(Figure: the PQT loop. Code sampled from the RNN is scored by the reward function; the K highest-reward unique programs are kept in a max-unique priority queue, and the queue contents are used as training targets for the RNN.)
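The loop in the figure can be sketched as follows, with helper callables standing in for sampling, reward evaluation, and the supervised update (a simplification of the paper's implementation, where the PQT objective is a log-likelihood term over the queue contents, optionally mixed with the policy-gradient objective):

```python
import heapq

def pqt_step(sample_program, reward_fn, train_on, queue, k=10):
    """One iteration of priority queue training (a sketch; the callable
    names are illustrative). `queue` is a heap of (reward, program) pairs
    holding the best k unique programs seen so far; the policy is then
    trained to maximize the likelihood of the queue contents."""
    code = sample_program()                  # sample code from the RNN policy
    r = reward_fn(code)                      # run on test cases, get reward
    if code not in {p for _, p in queue}:    # max-unique: no duplicate programs
        heapq.heappush(queue, (r, code))
        if len(queue) > k:                   # evict the lowest-reward entry
            heapq.heappop(queue)
    for _, program in queue:                 # supervised update on queue targets
        train_on(program)
    return queue
```

Because the queue only ever replaces entries with higher-reward unique programs, the training targets improve monotonically, which addresses the forgetting and instability problems on the previous slide.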

SLIDE 20

Results

SLIDE 21

SLIDE 22

Fixed Length Programs (length 100)

remove

<>[,[<+,.],<<]],[[-[+.>>][]>[>[>[[<+>>.+<>]>]<<>]],]>+-++--,>[+[[<----].->+]->]]]-[,.]+>>,-,,-]><,,]

reverse

,[[>,<>]]-]+[<[.,++,<]<>[->.,+,[<+]<-]<,,<<>>[[[<+<[],.>->]>,<-]<]<>,-<,,[+>,<,><.[.<-+,+-<]+<[,+-<>

add

,>,[-<+>][,],>]<]-<.+,,+,<.,>]>,[><<-,][+-[.[[+<[.>]],>.[]-<,],+,[,->]>>->+,[+[>]-,-]--,.,>+-<<]]<,+

SLIDE 23

SLIDE 24

Synthesized vs "Ground Truth"

Task              Synthesized                             Experimenter's best solution
reverse           ,[>,]+[,<.]                             >,[>,]<[.<].
remove            ,-[+.,-]+[,.]                           ,[-[+.[-]],].
count-char        ,[-[>]>+<<<<,]>.                        >,[-[<->[-]]<+>,]<.
add               ,[+>,<<->],<.,.                         ,>,<[->+<]>.
bool-logic        ,+>,<[,>],<+<.                          ???
print             ++++++++.---.+++++++..+++.              ++++++++.---.+++++++..+++.
zero-cascade      ,.,[.>.-<,[[[.+,>+[-.>]..<]>+<<]>+<<]]  ,[.>[->+>.<<]>+[-<+>]<<,]
cascade           ,[.,.[.,.[..,[....,[.....,[.>]<]].]]    ,>>+<<[>>[-<+>]<[->+<<.>]>+<<,].
shift-left        ,>,[.,]<.>.                             ,>,[.,]<.,.
shift-right       ,[>,]<.,<<<<<.[>.]                      >,[>,]<.[-]<[<]>[.>].
unriffle          [,>,[.,>,]<[>,]<.]                      >,[>,[.[-]],]<[.<].
remove-last       ,>,[<.>>,].                             ,>,[[<.[-]>[-<+>]],].
remove-last-two   >,<,>>,[<.,[<.[>]],].                   ,>,>,[[<<.[-]>[-<+>]>[-<+>]],].
echo-alternating  ,[.,>,]<<<<.[>.]                        >,[.,>,]<<[<]>[.>].
length            ,[>+<,]>.                               >+>,[[<]>+[>],]<[<]>-.
echo-second-seq   ,[,]-[,.]                               ,[,],[.,].
echo-nth-seq      ,-[->-[,]<]-[,.]                        ,-[->,[,]<],[.,].

SLIDE 25

What's next?

Scale up to harder coding problems and more complex programming languages.

  • Augment RL with supervised training on a large corpus of programs.
  • Give the code synthesizer access to auxiliary information, such as stack traces and program execution internals.

  • Data augmentation techniques, such as Hindsight experience replay.
  • Few-shot learning techniques can help with generalization issues, e.g. MAML.
SLIDE 26

Thank you! Questions?

Thank you to my coauthors: Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le.
SLIDE 27

Prior Work

  • Algorithm induction

○ Neural Programmer, A. Neelakantan, et al.
○ Neural Programmer-Interpreters, S. Reed, et al.

  • Domain specific languages

○ RobustFill, J. Devlin, et al.
○ DeepCoder, M. Balog, et al.
○ TerpreT, A. Gaunt, et al.

  • Precursors to PQT

○ Noisy Cross-Entropy Method, I. Szita, et al.
○ Neural Symbolic Machines, C. Liang, et al.