CSC421/2516 Lecture 20: Policy Gradient
Roger Grosse and Jimmy Ba
Overview
Most of this course was about supervised learning, plus a little unsupervised learning.
$\frac{\partial}{\partial\theta} p(\tau)$
$\sum_{k=0}^{T} r(s_k, a_k) \, \frac{\partial}{\partial\theta} \log \pi_\theta(a_t \mid s_t)$
$\sum_{k=t}^{T} r(s_k, a_k) \, \frac{\partial}{\partial\theta} \log \pi_\theta(a_t \mid s_t)$
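The two expressions above differ only in the lower limit of the reward sum: the first weights the log-probability gradient at step t by the reward of the whole episode (k = 0 onward), the second only by the reward collected from step t onward. A minimal Python sketch of the two weightings follows. It assumes an undiscounted, finite episode whose rewards are given as a plain list, and the function names are hypothetical, not taken from the lecture.

```python
def total_return_weights(rewards):
    """Weight every time step by the full-episode return, sum over k = 0..T of r_k."""
    total = sum(rewards)
    return [total for _ in rewards]


def reward_to_go_weights(rewards):
    """Weight time step t by the reward from t onward, sum over k = t..T of r_k."""
    weights = []
    running = 0.0
    # Accumulate from the end of the episode backwards, so each entry is a suffix sum.
    for r in reversed(rewards):
        running += r
        weights.append(running)
    weights.reverse()
    return weights


# Illustrative rewards for a three-step episode (made-up numbers).
rewards = [1.0, 0.0, 2.0]
print(total_return_weights(rewards))  # [3.0, 3.0, 3.0]
print(reward_to_go_weights(rewards))  # [3.0, 2.0, 2.0]
```

In a full policy gradient update, each weight would multiply the gradient of log π_θ(a_t | s_t) for the corresponding step. The sketch covers only the reward sums, since the policy parameterization is not specified here.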