Reinforcement Learning for Safe Decision-Making in Autonomous Driving
Edouard Leurent1,2,3, Odalric-Ambrym Maillard1, Denis Efimov2
1Inria SequeL, 2Inria Valse, 3Renault Group
01 Motivation and Scope
The objective is the expected discounted return:
$$\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right]$$
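As a minimal illustration (not from the slides), the discounted return over a finite horizon can be computed with the backward recursion $G_t = r_t + \gamma G_{t+1}$:

```python
def discounted_return(rewards, gamma=0.95):
    """Sum of gamma^t * r_t over a finite horizon, accumulated
    backwards: G_t = r_t + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A constant reward of 1 approaches 1 / (1 - gamma) = 20 as the horizon grows
print(discounted_return([1.0] * 1000))
```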
The optimal value function satisfies the Bellman optimality equation:
$$Q^{*}(s, a) = \mathbb{E}_{s'}\left[r(s, a) + \gamma \max_{a'} Q^{*}(s', a')\right]$$
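A tabular Q-learning step regresses toward this fixed point (a sketch for illustration, not the agents trained in these experiments):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Move Q(s, a) toward the Bellman optimality target
    r + gamma * max_a' Q(s', a')."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = np.zeros((2, 2))  # toy table: 2 states, 2 actions
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```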
[Figure: state representations — a list of per-vehicle features $(x_i, y_i, \dots)$ and a spatial occupancy-grid alternative]
Ego-Attention architecture: the ego vehicle and vehicle$_1, \dots,$ vehicle$_N$ are each passed through a shared Encoder; the encodings feed ego-attention heads, whose output goes through a Decoder producing $Q$. Each head applies linear projections $L_q$, $L_k$, $L_v$ to obtain a single ego query $q_0$, keys $K = (k_0, \dots, k_n)$ and values $V = (v_0, \dots, v_n)$, combined by scaled dot-product attention:
$$\sigma\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
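The scaled dot-product attention at the core of the ego-attention head can be sketched in NumPy (the sizes below are illustrative, not the trained network):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """sigma(Q K^T / sqrt(d_k)) V, with sigma a row-wise softmax."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
q0 = rng.normal(size=(1, 32))  # single ego query
K = rng.normal(size=(5, 32))   # keys k_0..k_4: ego + 4 vehicles
V = rng.normal(size=(5, 64))   # matching values v_0..v_4
out = scaled_dot_product_attention(q0, K, V)
```

The output is a convex combination of the value rows, so a variable number of vehicles is handled naturally and the result is permutation invariant.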
Agent                   FCN/List      CNN/Grid                       Ego-Attention
Input sizes             [15, 7]       [32, 32, 7]                    [·, 7]
Layer sizes             [128, 128]    3 conv layers, kernel size 2,  Encoder: [64, 64]
                                      stride 2, head: [20]           Attention: 2 heads, dk = 32
                                                                     Decoder: [64, 64]
Number of parameters    3.0e4         3.2e4                          3.4e4
Variable input size     No            No                             Yes
Permutation invariant   No            Yes                            Yes
[Figure: training curves over 4000 episodes — total reward, velocity, and episode length for the FCN/List, CNN/Grid and Ego-Attention agents]
Safety
For a decision $u$ and policy $\rho$, a scalarised criterion aggregates the efficiency term $S_s$ and the danger term $S_d$:
$$\rho^{*} = \operatorname*{argmax}_{\rho} \sum_{u} \delta_u S_u(S_s, -S_d)$$
Making the trade-off explicit as a constraint instead:
$$\rho^{*} = \operatorname*{argmax}_{\rho} \sum_{u} \delta_u S_s(u) \quad \text{s.t.} \quad \sum_{u} \delta_u S_d(u) < \gamma$$
Constrained MDP formulation:
$$\max_{\pi} \; \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R_r(s_t, a_t) \,\middle|\, s_0 = s\right] \quad \text{s.t.} \quad \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R_c(s_t, a_t) \,\middle|\, s_0 = s\right] \le \beta$$
In the budgeted MDP, the budget $\beta$ becomes part of the state:
$$\max_{\pi} \; \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R_r(s_t, a_t) \,\middle|\, s_0 = s, \beta_0 = \beta\right] \quad \text{s.t.} \quad \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R_c(s_t, a_t) \,\middle|\, s_0 = s, \beta_0 = \beta\right] \le \beta$$
Budgeted returns, values and Q-functions are defined component-wise:
$$G^{\pi} = (G^{\pi}_r, G^{\pi}_c) \stackrel{\text{def}}{=} \sum_{t=0}^{\infty} \gamma^{t}\left(R_r(s_t, a_t), R_c(s_t, a_t)\right)$$
$$V^{\pi} = (V^{\pi}_r, V^{\pi}_c) \stackrel{\text{def}}{=} \mathbb{E}\left[G^{\pi} \mid s_0 = s\right], \qquad Q^{\pi} = (Q^{\pi}_r, Q^{\pi}_c) \stackrel{\text{def}}{=} \mathbb{E}\left[G^{\pi} \mid s_0 = s, a_0 = a\right]$$
Admissible policies respect the budget, and optimality is defined in two stages:
$$\Pi_a(s, \beta) \stackrel{\text{def}}{=} \{\pi : V^{\pi}_c(s, \beta) \le \beta\}$$
$$V^{*}_r(s) \stackrel{\text{def}}{=} \max_{\pi \in \Pi_a} V^{\pi}_r(s), \qquad V^{*}_c(s) \stackrel{\text{def}}{=} \min_{\pi \in \Pi_a,\; V^{\pi}_r = V^{*}_r} V^{\pi}_c(s)$$
The greedy policy first maximises the expected reward under the cost constraint, then minimises the expected cost among the maximisers:
$$\pi_{\text{greedy}}(s) \in \operatorname*{argmin}_{\rho \in \Pi^{*}_r} \mathbb{E}_{a \sim \rho}\, Q_c(s, a), \quad \text{where} \quad \Pi^{*}_r \stackrel{\text{def}}{=} \operatorname*{argmax}_{\rho} \left\{\mathbb{E}_{a \sim \rho}\, Q_r(s, a) : \mathbb{E}_{a \sim \rho}\, Q_c(s, a) \le \beta\right\}$$
Properties:
- fixed-point: $Q^{*}$ is a fixed point of the budgeted Bellman operator;
- tractable: $\pi_{\text{hull}}(Q^{*})$ can be computed from the convex hull of $Q^{*}$;
- equal: $\pi_{\text{greedy}}(Q^{*}) = \pi_{\text{hull}}(Q^{*})$.
[Diagram: the Agent sends the current state to a Planner, which returns an action recommendation; the Agent then sends the action to the Environment, which returns the next state and reward]
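The agent-environment-planner loop can be sketched with stub classes (all names here are hypothetical placeholders, not the highway-env API):

```python
class ConstantPlanner:
    """Stub planner: always recommends action 0."""
    def plan(self, state):
        return 0

class UnitRewardEnv:
    """Stub environment: every step yields reward 1."""
    def reset(self):
        return 0
    def step(self, action):
        return 0, 1.0  # next state, reward

def run_episode(env, planner, steps=10):
    """The agent queries the planner for a recommendation,
    acts, and observes the next state and reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(steps):
        action = planner.plan(state)      # action recommendation
        state, reward = env.step(action)  # environment transition
        total_reward += reward
    return total_reward
```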
To evaluate a sequence of actions $a_{1:h}$: follow the sequence, then act optimally. The upper bound combines per-step reward bounds (empirical mean plus exploration bonus) with an optimistic term for future rewards:
$$U_{a_{1:h}}(m) = \underbrace{\sum_{t=1}^{h} \gamma^{t}\, U^{\mu}_{a_{1:t}}(m)}_{\text{followed sequence}} + \underbrace{\frac{\gamma^{h+1}}{1-\gamma}}_{\text{future rewards}}$$
Sharper B-values take the minimum over all prefixes:
$$B_{a_{1:h}}(m) = \min_{1 \le t \le L} U_{a_{1:t}}(m)$$
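These bounds can be sketched as follows, taking the per-step upper confidence bounds as given inputs (the helper names are mine):

```python
def sequence_upper_bound(step_ucbs, gamma=0.9):
    """U_{a_{1:h}}: discounted per-step reward UCBs along the followed
    sequence, plus gamma^{h+1} / (1 - gamma) for acting optimally after."""
    h = len(step_ucbs)
    followed = sum(gamma ** t * u for t, u in enumerate(step_ucbs, start=1))
    return followed + gamma ** (h + 1) / (1 - gamma)

def b_value(step_ucbs, gamma=0.9):
    """Sharpened bound: minimum of U over all prefixes."""
    return min(sequence_upper_bound(step_ucbs[:t], gamma)
               for t in range(1, len(step_ucbs) + 1))
```

With vacuous per-step bounds (all equal to 1) every prefix yields the same value $\gamma/(1-\gamma)$, so the minimum brings no sharpening; informative bounds below 1 tighten it.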
Limitation of the Chernoff-Hoeffding bound: although rewards lie in $[0, 1]$, the exploration bonus can make $U^{\mu}_a(m) > 1,\ \forall a$, when few samples are available. The bound is then vacuous, and the sequence bound degenerates: the term $\gamma^{2}\, U^{\mu}_{a_2}(m)$ is effectively replaced by $\gamma^{2} \cdot 1$ in $U^{\mu}_{a_1}(m) + \gamma^{2}\, U^{\mu}_{a_2}(m)$.
KL-OLOP replaces the Hoeffding bound by a KL upper confidence bound:
$$U^{\mu}_a(m) \stackrel{\text{def}}{=} \max\left\{q \in [0, 1] : T_a(m)\, d_{\mathrm{ber}}\!\left(\hat{\mu}_a(m), q\right) \le f(m)\right\}$$
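The KL upper bound can be computed by bisection, since $q \mapsto d_{\mathrm{ber}}(\hat{\mu}, q)$ is increasing on $[\hat{\mu}, 1]$ (a sketch; the function names are mine):

```python
import math

def d_ber(p, q):
    """Bernoulli KL divergence d(p, q), with clipping for stability."""
    eps = 1e-12
    p, q = min(max(p, eps), 1 - eps), min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb(mu_hat, count, level, tol=1e-9):
    """max{q in [mu_hat, 1] : count * d_ber(mu_hat, q) <= level}."""
    lo, hi = mu_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if count * d_ber(mu_hat, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo

u = kl_ucb(0.5, count=10, level=math.log(100))  # stays within [0, 1]
```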
[Figure: the interval $[L^{\mu}_a, U^{\mu}_a]$ around the empirical mean $\hat{\mu}_a$ in $[0, 1]$, given by the level set of $d_{\mathrm{ber}}(\hat{\mu}_a, q)$ at $f(m)/T_a(m)$]
By construction, $U^{\mu}_a(m) \in I = [0, 1],\ \forall a$.
[Figure: return vs. planning budget ($10^1$ to $10^4$) on Highway and Stochastic Gridworld, comparing OPD, KL-OLOP, KL-OLOP(1), OLOP and a Random agent]
The robust control objective maximises the worst-case return over a confidence set of dynamics:
$$\sup_{\pi} \inf_{T \in \mathcal{C}_\delta} \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\right]$$
The model is estimated by regularised least squares:
$$\theta_{N_p, \lambda} = \left(\Phi_{[N_p]}^{\top} \Phi_{[N_p]} + \lambda I_d\right)^{-1} \Phi_{[N_p]}^{\top} Y_{[N_p]}$$
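A NumPy sketch of this estimator on synthetic data (the feature matrix and true parameters below are made up for illustration):

```python
import numpy as np

def ridge_estimate(Phi, Y, lam=1e-6):
    """theta = (Phi^T Phi + lam * I)^-1 Phi^T Y."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ Y)

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 2))    # features of 50 observed transitions
theta_true = np.array([2.0, -1.0])
Y = Phi @ theta_true              # noiseless targets
theta_hat = ridge_estimate(Phi, Y)
```

Solving the linear system directly is preferred over forming the explicit inverse, for numerical stability.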
Interval arithmetic bounds the product $Ax$ for $x \in [\underline{x}, \overline{x}]$, using the decompositions $A = A^{+} - A^{-}$ and $x = x^{+} - x^{-}$:
$$A^{+}\underline{x}^{+} - A^{+}\overline{x}^{-} - A^{-}\overline{x}^{+} + A^{-}\underline{x}^{-} \le Ax \le A^{+}\overline{x}^{+} - A^{+}\underline{x}^{-} - A^{-}\underline{x}^{+} + A^{-}\overline{x}^{-}$$
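The enclosure can be checked numerically with the simpler two-term variant $A^{+}\underline{x} - A^{-}\overline{x} \le Ax \le A^{+}\overline{x} - A^{-}\underline{x}$, which is also valid for interval inputs and easier to code (a sketch, not the construction used in the slides):

```python
import numpy as np

def interval_matvec(A, x_lo, x_hi):
    """Elementwise bounds on A @ x for any x with x_lo <= x <= x_hi,
    using the decomposition A = A+ - A- with A+, A- >= 0."""
    Ap, Am = np.maximum(A, 0.0), np.maximum(-A, 0.0)
    return Ap @ x_lo - Am @ x_hi, Ap @ x_hi - Am @ x_lo

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
x_lo = np.array([-1.0, 0.0, 0.5])
x_hi = np.array([1.0, 2.0, 1.5])
x = rng.uniform(x_lo, x_hi)              # any point inside the box
lo, hi = interval_matvec(A, x_lo, x_hi)  # then lo <= A @ x <= hi
```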
The interval predictor propagates these bounds through the dynamics $\dot{x} = Ax + Bd$ with disturbance $d(t) \in [\underline{d}(t), \overline{d}(t)]$:
$$\dot{\underline{x}}(t) = A^{+}\underline{x}^{+}(t) - A^{+}\overline{x}^{-}(t) - A^{-}\overline{x}^{+}(t) + A^{-}\underline{x}^{-}(t) + B^{+}\underline{d}(t) - B^{-}\overline{d}(t)$$
$$\dot{\overline{x}}(t) = A^{+}\overline{x}^{+}(t) - A^{+}\underline{x}^{-}(t) - A^{-}\underline{x}^{+}(t) + A^{-}\overline{x}^{-}(t) + B^{+}\overline{d}(t) - B^{-}\underline{d}(t)$$
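An Euler-discretised sketch of such bound propagation, again using the simpler two-term variant; real interval predictors need additional structure (e.g. Metzler dynamics) to guarantee stable, tight bounds, which this sketch does not address:

```python
import numpy as np

def predictor_step(A, B, x_lo, x_hi, d_lo, d_hi, dt=0.01):
    """One Euler step of bound propagation for x' = A x + B d,
    with x in [x_lo, x_hi] and disturbance d in [d_lo, d_hi]."""
    Ap, Am = np.maximum(A, 0.0), np.maximum(-A, 0.0)
    Bp, Bm = np.maximum(B, 0.0), np.maximum(-B, 0.0)
    dx_lo = Ap @ x_lo - Am @ x_hi + Bp @ d_lo - Bm @ d_hi
    dx_hi = Ap @ x_hi - Am @ x_lo + Bp @ d_hi - Bm @ d_lo
    return x_lo + dt * dx_lo, x_hi + dt * dx_hi

A = np.array([[-1.0]])
B = np.array([[1.0]])
lo, hi = predictor_step(A, B,
                        x_lo=np.array([0.0]), x_hi=np.array([1.0]),
                        d_lo=np.array([-0.1]), d_hi=np.array([0.1]))
```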
[Figure: a trajectory $x(t)$ enclosed by the interval predictor bounds $\underline{x}(t), \overline{x}(t)$]
The uncertain dynamics are described by a polytope: $A(\theta) = A_0 + \sum_{i=1}^{N} \theta_i A_i$ with $\theta_i \ge 0$, and
$$\Delta A^{+} = \sum_{i=1}^{N} A_i^{+}, \qquad \Delta A^{-} = \sum_{i=1}^{N} A_i^{-}$$
[Figure: the trajectory $x(t)$ with the resulting interval bounds $\underline{x}(t), \overline{x}(t)$]
The robust objective evaluates a policy pessimistically, over the confidence set of dynamics and over the state interval:
$$\hat{V}^{\mathrm{r}}(\pi) = \min_{A(\theta) \in \mathcal{C}_\delta} \sum_{t=0}^{H} \gamma^{t} \min_{x \in [\underline{x}(t), \overline{x}(t)]} R(x, \pi(x))$$
Robust planning combines optimal planning over action sequences ($\max_a$) with worst-case aggregation over dynamics models ($\min_m$), using optimistic evaluation of paths at the leaves for all dynamics.
At a leaf $i$ of depth $d$, paths are evaluated optimistically for each dynamics model and aggregated pessimistically:
$$b^{\mathrm{r}}_i(n) \stackrel{\text{def}}{=} \min_{m \in [1, M]} \sum_{t=0}^{d-1} \gamma^{t} r_t(m) + \frac{\gamma^{d}}{1-\gamma}$$
backed up through internal nodes by $\max_{a \in \mathcal{A}} b^{\mathrm{r}}_{ia}(n)$, yielding a simple regret of order $n^{-\frac{\log 1/\gamma}{\log \kappa}}$.
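The worst-case aggregation at a leaf can be sketched as follows (the per-model reward sequences are hypothetical inputs):

```python
def robust_leaf_bound(model_rewards, gamma=0.9):
    """min over dynamics models of (discounted rewards along the path
    + optimistic gamma^d / (1 - gamma) for everything beyond depth d)."""
    d = len(model_rewards[0])
    tail = gamma ** d / (1 - gamma)
    return min(sum(gamma ** t * r for t, r in enumerate(rs)) + tail
               for rs in model_rewards)

# Two candidate dynamics, same action sequence of depth 2
value = robust_leaf_bound([[1.0, 1.0], [0.0, 1.0]], gamma=0.5)
```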