Background Neural Fitted Actor-Critic Future works
Neural Fitted Actor-Critic
Matthieu Zimmer
Alain Dutech Yann Boniface
University of Lorraine, LORIA
8th July 2016
Outline
1. Background
2. Neural Fitted Actor-Critic
3. Future works
Unsatisfied constraints:
(1) No continuous environments
(2) No prior models of the agent or environment
(3) Use of a linear approximator
(4) No prior goal states or trajectories
[Figure: NFQ placed on a plot with axes "decisional complexity" and "data required"]
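$$Q_{k+1} = \arg\min_{Q \in F_c} \sum_{t=1}^{N} \Big[ Q(s_t, a_t) - \big( r_{t+1} + \gamma \max_{a' \in A} Q_k(s_{t+1}, a') \big) \Big]^2$$
$$\pi(s) = \arg\max_{a \in A} Q(s, a)$$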
[Figure: neural network for Q(s, a): inputs s1, s2, s3, a1, a2; one hidden layer; output Q(s, a)]
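As a rough illustration of the fitted Q-iteration step behind NFQ, the sketch below builds the regression targets r + γ max_a' Q_k(s', a') for a batch of transitions and selects the greedy action. It assumes a finite action set and ignores terminal states; `fitted_q_targets`, `greedy_action`, and the toy `q_k` are hypothetical names chosen for illustration and do not come from the slides.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
# One transition (s_t, a_t, r_{t+1}, s_{t+1}) with a discrete action index.
Transition = Tuple[State, int, float, State]


def fitted_q_targets(
    batch: Sequence[Transition],
    q_k: Callable[[State, int], float],
    actions: Sequence[int],
    gamma: float = 0.9,
) -> List[float]:
    """Return r + gamma * max_a' Q_k(s', a') for every transition in the batch."""
    targets = []
    for _s, _a, r, s_next in batch:
        best_next = max(q_k(s_next, a_next) for a_next in actions)
        targets.append(r + gamma * best_next)
    return targets


def greedy_action(q: Callable[[State, int], float], s: State, actions: Sequence[int]) -> int:
    """Pick the action with the highest Q-value in state s."""
    return max(actions, key=lambda a: q(s, a))


# Toy stand-in for the current Q-network (assumption, for illustration only).
def q_k(s: State, a: int) -> float:
    return s[0] + a


batch = [((0.0,), 0, 1.0, (1.0,)), ((1.0,), 1, 0.0, (2.0,))]
targets = fitted_q_targets(batch, q_k, actions=[0, 1], gamma=0.5)
```

In a full NFQ loop, a network would be refit on the pairs ((s_t, a_t), target) at each iteration k; that supervised step is omitted here.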
[Figure: CACLA and CMA-ES placed on a plot with axes "decisional complexity" and "data required"]
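$$\theta^V_{i,k+1} = \theta^V_{i,k} + \alpha_v \delta_t \frac{\partial V_k(s_t)}{\partial \theta^V_{i,k}}$$
$$\theta^\pi_{i,k+1} = \theta^\pi_{i,k} + \alpha_\pi \big( a_t - \pi_k(s_t) \big) \frac{\partial \pi_k(s_t)}{\partial \theta^\pi_{i,k}} \quad \text{if } \delta_t > 0$$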
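A minimal sketch of one CACLA-style update may help make the rule concrete. It assumes linear approximators (so the gradient of V with respect to its weights is just the feature vector), a one-dimensional action, and the usual TD error δ = r + γ V(s') − V(s). The name `cacla_step` and the actor step size `alpha_pi` are assumptions for illustration; only α_v and δ_t appear on the slide.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def cacla_step(
    theta_v: Sequence[float],
    theta_pi: Sequence[float],
    phi: Sequence[float],
    phi_next: Sequence[float],
    action: float,
    reward: float,
    gamma: float = 0.9,
    alpha_v: float = 0.1,
    alpha_pi: float = 0.1,
) -> Tuple[List[float], List[float], float]:
    """Apply one critic update and, if the TD error is positive, one actor update."""
    # TD error computed with the critic before its update.
    delta = reward + gamma * dot(theta_v, phi_next) - dot(theta_v, phi)

    # Critic: gradient step on the linear value estimate.
    new_theta_v = [w + alpha_v * delta * x for w, x in zip(theta_v, phi)]

    # Actor: move toward the executed action only when it did better than expected.
    if delta > 0:
        error = action - dot(theta_pi, phi)
        new_theta_pi = [w + alpha_pi * error * x for w, x in zip(theta_pi, phi)]
    else:
        new_theta_pi = list(theta_pi)
    return new_theta_v, new_theta_pi, delta


good = cacla_step([0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  action=2.0, reward=1.0, gamma=0.5, alpha_v=0.5, alpha_pi=0.5)
bad = cacla_step([0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                 action=2.0, reward=-1.0, gamma=0.5, alpha_v=0.5, alpha_pi=0.5)
```

With a positive TD error (`good`) both weight vectors change; with a negative one (`bad`) only the critic moves and the actor is left untouched.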
[Diagram: two branches labeled "> 0" and "≤ 0"; two blocks labeled "Rprop"]
[Figure: NFAC placed on a plot with axes "decisional complexity" and "data required"]
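$V \in F_c$
[Figure: neural network for V(s): inputs s1, s2, s3; one hidden layer; output V(s)]
$\pi \in F_a$
[Figure: neural network for the policy: inputs s1, s2, s3; one hidden layer; outputs a1, a2]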
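One plausible reading of how the two networks (V in F_c, π in F_a) could be trained from a batch is sketched below. It assumes NFAC keeps the CACLA rule in batch form: the critic regresses toward r + γ V_k(s'), and the actor imitates only the actions whose TD error is positive. How samples with a non-positive TD error are handled is not recoverable from the slides, so they are simply skipped here. `nfac_batch_targets` and the toy `v_k` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
# One transition (s_t, executed action, r_{t+1}, s_{t+1}).
Transition = Tuple[State, Action, float, State]


def nfac_batch_targets(
    batch: Sequence[Transition],
    v_k: Callable[[State], float],
    gamma: float = 0.9,
) -> Tuple[List[Tuple[State, float]], List[Tuple[State, Action]]]:
    """Build supervised datasets for the critic and the actor from one batch."""
    critic_data = []  # (state, value target) pairs for fitting V
    actor_data = []   # (state, action) pairs for fitting pi
    for s, u, r, s_next in batch:
        target = r + gamma * v_k(s_next)
        delta = target - v_k(s)  # TD error under the current critic
        critic_data.append((s, target))
        if delta > 0:
            # The executed action did better than expected: imitate it.
            actor_data.append((s, u))
    return critic_data, actor_data


# Toy stand-in for the current critic (assumption, for illustration only).
def v_k(s: State) -> float:
    return s[0]


batch = [((0.0,), (0.5,), 1.0, (1.0,)), ((2.0,), (-0.5,), 0.0, (1.0,))]
critic_data, actor_data = nfac_batch_targets(batch, v_k, gamma=0.5)
```

Each dataset would then be passed to a batch optimizer such as Rprop, which the slides mention; the fitting step itself is left out of this sketch.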
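$$Q_{k+1} = \arg\min_{Q \in F_c} \sum_{t=1}^{N} \Big[ Q(s_t, a_t) - \big( r_{t+1} + \gamma\, Q_k(s_{t+1}, \pi(s_{t+1})) \big) \Big]^2$$
$$\underset{\pi \in F_a}{\ldots} \sum_{t=1}^{N} \ldots$$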
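To illustrate the term Q_k(s_{t+1}, π(s_{t+1})), the sketch below computes critic targets in which the next action comes from the current policy instead of a maximization over actions, which is what makes the target usable with continuous actions. It ignores terminal states, and the actor objective is not sketched because it is not recoverable from the slides. `actor_critic_q_targets`, `q_k`, and `pi` are hypothetical names for illustration.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
# One transition (s_t, a_t, r_{t+1}, s_{t+1}) with a scalar continuous action.
Transition = Tuple[State, float, float, State]


def actor_critic_q_targets(
    batch: Sequence[Transition],
    q_k: Callable[[State, float], float],
    pi: Callable[[State], float],
    gamma: float = 0.9,
) -> List[float]:
    """Return r + gamma * Q_k(s', pi(s')) for every transition in the batch."""
    return [r + gamma * q_k(s_next, pi(s_next)) for _s, _a, r, s_next in batch]


# Toy stand-ins for the current critic and policy (assumptions).
def q_k(s: State, a: float) -> float:
    return s[0] + a


def pi(s: State) -> float:
    return 2.0 * s[0]


q_targets = actor_critic_q_targets([((0.0,), 0.0, 1.0, (1.0,))], q_k, pi, gamma=0.5)
```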