SLIDE 1

Memory Augmented Policy Optimization (MAPO) for Program Synthesis and Semantic Parsing

Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc Le, Ni Lao

SLIDE 2

Program Synthesis / Semantic Parsing

how many more passengers flew to los angeles than to saskatoon?

SLIDE 3

(filterin rows ['saskatoon'] r.city)
(filterin rows ['los angeles'] r.city)
(diff v1 v0 r.passengers)

Question: how many more passengers flew to los angeles than to saskatoon?
Answer: 12,467

Program Synthesis / Semantic Parsing
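
To make the example concrete, here is a minimal Python sketch of how such a program could be executed against a table. The table contents are hypothetical toy values (not the data behind the slide), and the semantics given to `filterin` and `diff` are assumptions inferred from their names and arguments, not the authors' interpreter.

```python
# Toy table; the passenger counts are hypothetical, not the data behind the slide.
rows = [
    {"city": "saskatoon", "passengers": 120},
    {"city": "los angeles", "passengers": 500},
    {"city": "toronto", "passengers": 300},
]

def filterin(rows, values, column):
    """Assumed semantics: keep rows whose `column` value is one of `values`."""
    return [r for r in rows if r[column] in values]

def diff(rows_a, rows_b, column):
    """Assumed semantics: `column` of the single row in rows_a minus that of rows_b."""
    (a,), (b,) = rows_a, rows_b  # each argument must hold exactly one row
    return a[column] - b[column]

# Each expression's result is stored in the next variable (v0, v1, v2),
# which is how the third expression can refer to v1 and v0.
v0 = filterin(rows, ["saskatoon"], "city")
v1 = filterin(rows, ["los angeles"], "city")
v2 = diff(v1, v0, "passengers")
print(v2)  # 380 with the toy counts above
```

Only the final answer is compared with the expected one, so the intermediate program itself is never observed as supervision.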

SLIDE 9

(filterin rows ['saskatoon'] r.city)
(filterin rows ['los angeles'] r.city)
(diff v1 v0 r.passengers)

Question: how many more passengers flew to los angeles than to saskatoon?
Answer: 12,467

Latent

Program Synthesis / Semantic Parsing

Sparse

SLIDE 11

Policy Gradient

[Diagram: Actor and Learner exchanging "On-policy Samples" and "Updated Policy"]

  • High variance => slow training
  • Unbiased => optimal solution
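
As an illustration of the trade-off on this slide, the sketch below computes a REINFORCE-style gradient estimate for a toy softmax policy over five candidate programs and compares it with the exact gradient obtained by enumeration. The policy parameterisation, the reward vector and all function names are assumptions made for the example; they are not taken from the talk.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def grad_log_prob(probs, a):
    """Gradient of log pi(a) w.r.t. the logits of a softmax policy."""
    return [(1.0 if j == a else 0.0) - p for j, p in enumerate(probs)]

def exact_gradient(probs, rewards):
    """Gradient of the expected reward, computed by enumerating every program."""
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected) for p, r in zip(probs, rewards)]

def reinforce_estimate(probs, rewards, rng, n_samples):
    """Average of R(a) * grad log pi(a) over on-policy samples."""
    grad = [0.0] * len(probs)
    for _ in range(n_samples):
        a = rng.choices(range(len(probs)), weights=probs)[0]
        for j, g in enumerate(grad_log_prob(probs, a)):
            grad[j] += rewards[a] * g / n_samples
    return grad

# Hypothetical toy problem: five programs, only the last one is correct.
probs = softmax([0.0] * 5)
rewards = [0, 0, 0, 0, 1]
rng = random.Random(0)

exact = exact_gradient(probs, rewards)
noisy = reinforce_estimate(probs, rewards, rng, n_samples=10)
averaged = reinforce_estimate(probs, rewards, rng, n_samples=20000)
print(exact)
print(noisy)     # few samples: typically far from the exact gradient
print(averaged)  # many samples: close to the exact gradient
```

With sparse rewards most samples carry zero reward and contribute nothing, which is one way to see why the estimate needs many samples before it becomes useful.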

SLIDE 13

Imitation Learning

[Diagram: Actor and Learner; labels "Demonstration" and "Updated Policy"]

  • Low variance => fast training
  • Biased => suboptimal solution
  • Requires human supervision

SLIDE 16

MAPO

[Diagram: Actor, Learner and Memory buffer; labels "High-reward samples", "Samples inside memory", "Samples outside memory" and "Updated Policy"]

  • Unbiased => optimal solution
  • Low variance => fast training

SLIDE 18

[Figure: "Gradient Estimate" over the "Program space"; labels "Expectation" and "Sampling"]

  • Unbiased
  • High variance

SLIDE 19

MAPO

[Figure: "Gradient Estimate"; labels "Enumeration", "Sampling", "Programs inside Memory" and "Programs outside Memory"]

  • Unbiased
  • Sampling from a smaller space => variance reduction

SLIDE 20

MAPO

[Figure: "Gradient Estimate"; labels "Enumeration", "Sampling", "Programs inside Memory" and "Programs outside Memory"]

  • Unbiased
  • Stratified sampling => variance reduction
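
The stratified estimate can be sketched in a few lines of Python. The sketch assumes that enumeration is applied to programs inside the memory and sampling to programs outside it; the toy softmax policy, the reward vectors and the name `mapo_estimate` are hypothetical and chosen only for illustration, not taken from the authors' code.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def grad_log_prob(probs, a):
    """Gradient of log pi(a) w.r.t. the logits of a softmax policy."""
    return [(1.0 if j == a else 0.0) - p for j, p in enumerate(probs)]

def exact_gradient(probs, rewards):
    """Reference value: gradient of the expected reward by full enumeration."""
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected) for p, r in zip(probs, rewards)]

def mapo_estimate(probs, rewards, memory, rng, n_samples):
    """Enumerate programs inside memory; sample the rest from the renormalised policy."""
    n = len(probs)
    grad = [0.0] * n
    # Inside memory: exact contribution, each program weighted by pi(a).
    for a in memory:
        for j, g in enumerate(grad_log_prob(probs, a)):
            grad[j] += probs[a] * rewards[a] * g
    # Outside memory: Monte Carlo estimate, weighted by the remaining mass 1 - pi_B.
    outside = [a for a in range(n) if a not in memory]
    pi_b = sum(probs[a] for a in memory)
    if outside and pi_b < 1.0:
        weights = [probs[a] for a in outside]
        for _ in range(n_samples):
            a = rng.choices(outside, weights=weights)[0]
            for j, g in enumerate(grad_log_prob(probs, a)):
                grad[j] += (1.0 - pi_b) * rewards[a] * g / n_samples
    return grad

probs = softmax([0.0] * 5)
rng = random.Random(0)

# Case 1: memory holds every rewarded program, so the sampled part
# contributes zero and the estimate equals the exact gradient.
rewards_1 = [0, 0, 0, 0, 1]
full = mapo_estimate(probs, rewards_1, {4}, rng, n_samples=1)

# Case 2: one rewarded program (index 3) is still outside memory, so the
# estimate is noisy again but remains centred on the exact gradient.
rewards_2 = [0, 0, 0, 1, 1]
partial = mapo_estimate(probs, rewards_2, {4}, rng, n_samples=5000)

print(full, exact_gradient(probs, rewards_1))
print(partial, exact_gradient(probs, rewards_2))
```

In the first case a single sample is enough, because all of the reward sits in the enumerated stratum; this is the variance reduction the slide refers to.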

SLIDE 21

MAPO

[Equation not recovered from the slide; its legend defines one symbol as a program and another as whether that program is correct or not]
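
For reference, one way to write the decomposition behind the stratified estimate is given below. The notation is an assumption modelled on the linked arXiv paper rather than copied from the slide: $a$ is a program, $R(a)$ its reward (correct or not), $\mathcal{B}$ the memory buffer, $\pi_\theta$ the policy, and $\pi_\theta^-$ the policy renormalised over programs outside the buffer.

```latex
\begin{aligned}
\pi_{\mathcal{B}} &= \sum_{a \in \mathcal{B}} \pi_\theta(a),
\qquad
\pi_\theta^-(a) =
  \begin{cases}
    \pi_\theta(a) / (1 - \pi_{\mathcal{B}}) & a \notin \mathcal{B} \\
    0 & a \in \mathcal{B}
  \end{cases} \\[4pt]
\mathcal{O}(\theta) &= \mathbb{E}_{a \sim \pi_\theta}\left[ R(a) \right]
  = \sum_{a \in \mathcal{B}} \pi_\theta(a)\, R(a)
  + (1 - \pi_{\mathcal{B}})\, \mathbb{E}_{a \sim \pi_\theta^-}\left[ R(a) \right] \\[4pt]
\nabla_\theta \mathcal{O}(\theta) &=
  \sum_{a \in \mathcal{B}} \pi_\theta(a)\, \nabla_\theta \log \pi_\theta(a)\, R(a)
  + (1 - \pi_{\mathcal{B}})\, \mathbb{E}_{a \sim \pi_\theta^-}\left[ \nabla_\theta \log \pi_\theta(a)\, R(a) \right]
\end{aligned}
```

The first term is computed exactly by enumerating the buffer, and only the second term is estimated by sampling, so the overall estimate stays unbiased.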

SLIDE 23

WikiTableQuestions: first SOTA using RL

SLIDE 25

WikiSQL: strong vs. weak supervision!

Strong supervision

SLIDE 27

  • MAPO converges more slowly than iterative maximum likelihood, but reaches a better solution.
  • REINFORCE doesn’t make much progress (<10% accuracy).

SLIDE 28

  • MAPO converges more slowly than maximum likelihood training, but reaches a better solution.
  • REINFORCE doesn’t make much progress (<10% accuracy).

SLIDE 29

An efficient policy optimization method for learning to generate sequences from sparse rewards.

https://github.com/crazydonkey200/neural-symbolic-machines
https://arxiv.org/abs/1807.02322
http://crazydonkey200.github.io/

Poster: Room 517 AB #137