Reinforcement learning
Chapter 21, Sections 1–4
Outline
- Examples
- Learning a value function for a fixed policy
  - temporal difference learning
- Q-learning
- Function approximation
- Exploration
[Figure: grid world with columns 1 to 4 and rows 1 to 3, showing a START square and terminal squares marked +1 and −1]
[Figure: board diagram with positions numbered 1 to 25]
[Figure: two plots against number of trials. One shows utility estimates for states (1,1), (1,3), (2,1), (3,3), and (4,3); the other shows RMS error in utility.]
[Figure: RMS error and policy loss plotted against number of trials]