Implicit Imitation in Multiagent Reinforcement Learning
Bob Price and Craig Boutilier
Slides: Dana Dahlstrom, CSE 254, UCSD, 2002.04.23
Overview
In imitation, a learner observes a mentor in action. The approach proposed in this paper …
[Equation fragment: a ∈ A_o]
[Equation fragment: a ∈ A_o, t ∈ S]
… m on V(s) within a suitable confidence interval.
… m < v− …
[Equation fragment: a ∈ A_o and t ∈ S, each appearing twice]
[Equation fragment: a′ ∈ A_o, Q(t, a′)]
[Figure: goals over previous 1000 time steps vs. time step. Curves: Imitation, Control, (Imitation − Control).]
[Figure: goals (imitation − control) over previous 1000 time steps vs. time step. Curves: 10x10, 10% noise; 13x13, 10% noise; 10x10, 40% noise.]
[Figure: goals over previous 1000 time steps vs. time step. Curves: Control, Imitation, (Imitation − Control).]
[Figure: goals over previous 1000 time steps vs. time step (x 100000). Curves: (Imitation − Control), Imitation, Control.]
[Figure: goals over previous 1000 time steps vs. time steps (x 10000). Curves: (Imitation − Control), Imitation, Control.]
[Figure: goals over previous 1000 time steps vs. time step. Curves: Control, Imitation, (Imitation − Control).]