SLIDE 27
Standard Neural Network VFA for On-Line Implementation

NN for Value (Critic):
$$\hat{V}_i(x_k, W_{Vi}) = W_{Vi}^T \phi(x_k)$$

NN for control action:
$$\hat{u}_i(x_k, W_{ui}) = W_{ui}^T \sigma(x_k)$$

where $\phi(x_k)$ and $\sigma(x_k)$ are the basis (activation) function vectors of the critic and action networks.

Define target cost function:
$$d(\phi(x_k), W_{Vi}^T) = x_k^T Q x_k + \hat{u}_i^T(x_k) R \hat{u}_i(x_k) + \hat{V}_i(x_{k+1}) = x_k^T Q x_k + \hat{u}_i^T(x_k) R \hat{u}_i(x_k) + W_{Vi}^T \phi(x_{k+1})$$

HDP:
$$u_i(x_k) = \arg\min_{u} \left( x_k^T Q x_k + u^T R u + V_i(x_{k+1}) \right)$$
$$V_{i+1}(x_k) = x_k^T Q x_k + u_i^T R u_i + V_i(x_{k+1})$$

Implicit equation for DT control: use gradient descent for action update (Backpropagation, P. Werbos)
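$$W_{ui(j+1)} = W_{ui(j)} - \alpha \frac{\partial \left( x_k^T Q x_k + \hat{u}_{i(j)}^T R \hat{u}_{i(j)} + \hat{V}_i(x_{k+1}) \right)}{\partial W_{ui(j)}}$$
$$W_{ui(j+1)} = W_{ui(j)} - \alpha\, \sigma(x_k) \left( 2 R \hat{u}_{i(j)} + g(x_k)^T \left( \frac{\partial \phi(x_{k+1})}{\partial x_{k+1}} \right)^T W_{Vi} \right)^T$$
$$W_{ui} = \arg\min_{W} \left[ x_k^T Q x_k + \hat{u}^T(x_k, W) R \hat{u}(x_k, W) + \hat{V}_i\left( f(x_k) + g(x_k) \hat{u}(x_k, W) \right) \right]$$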
(can use 2-layer NN)
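
The gradient-descent action update can be tried on a toy problem. The sketch below is only an illustration under assumed values: a hypothetical scalar plant with f(x) = A x and g(x) = B, a single critic basis phi(x) = x^2, a single action basis sigma(x) = x, and a fixed critic weight w_v. None of these numbers or function names come from the slide.

```python
# Hypothetical scalar plant x_{k+1} = f(x_k) + g(x_k) u_k; all numbers are assumed.
A, B = 0.9, 1.0      # plant parameters (illustrative)
Q, R = 1.0, 1.0      # cost weights (illustrative)


def f(x):
    return A * x


def g(x):
    return B


def phi(x):
    """Critic basis, so that V_hat(x) = w_v * phi(x)."""
    return x * x


def dphi_dx(x):
    """Derivative of the critic basis with respect to the state."""
    return 2.0 * x


def sigma(x):
    """Action basis, so that u_hat(x) = w_u * sigma(x)."""
    return x


def action_update(w_u, w_v, states, alpha=0.1, sweeps=200):
    """Gradient descent on x'Qx + u'Ru + V_hat(x_next) with respect to w_u."""
    for _ in range(sweeps):
        for x in states:
            u = w_u * sigma(x)
            x_next = f(x) + g(x) * u
            # Scalar form of sigma(x) * (2 R u + g(x)^T dphi/dx(x_next)^T w_v)
            grad = sigma(x) * (2.0 * R * u + g(x) * dphi_dx(x_next) * w_v)
            w_u -= alpha * grad
    return w_u


w_v = 2.0
w_u = action_update(w_u=0.0, w_v=w_v, states=[-1.0, -0.5, 0.5, 1.0])
w_exact = -w_v * A * B / (R + w_v * B * B)   # closed-form minimizer for this toy case
print(w_u, w_exact)
```

For this linear-quadratic toy case the minimizing weight is known in closed form, w = -w_v A B / (R + w_v B^2) = -0.6, so the loop can be checked against it. With a general nonlinear g(x) or a 2-layer network no such closed form is available, which is why the slide resorts to gradient descent.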
$$W_{Vi+1} = \arg\min_{W_{Vi+1}} \left\{ \int \left| W_{Vi+1}^T \phi(x_k) - d(\phi(x_k), W_{Vi}^T) \right|^2 dx \right\}$$
Explicit equation for cost – use LS for Critic NN update or RLS
$$W_{Vi+1} = \left( \int \phi(x_k) \phi(x_k)^T dx \right)^{-1} \int \phi(x_k)\, d^T(\phi(x_k), W_{Vi}^T, W_{ui}^T)\, dx$$
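
A minimal sketch of the least-squares critic update, with the integrals replaced by sums over a handful of sampled states. Everything specific here is an assumption for illustration: the scalar plant x_{k+1} = A x_k + B u_k, the two-element critic basis [x^2, x^4], the action weight w_u, and the helper names `target` and `ls_critic_update`.

```python
A, B, Q, R = 0.9, 1.0, 1.0, 1.0   # assumed scalar plant and cost weights


def phi(x):
    """Assumed critic basis vector."""
    return [x ** 2, x ** 4]


def target(x, w_v, w_u):
    """Target d = x'Qx + u'Ru + W_Vi^T phi(x_next), using the action u = w_u * x."""
    u = w_u * x
    x_next = A * x + B * u
    v_next = sum(w * p for w, p in zip(w_v, phi(x_next)))
    return Q * x * x + R * u * u + v_next


def ls_critic_update(w_v, w_u, states):
    """Solve (sum phi phi^T) W = sum phi d for the new critic weights (2x2 case)."""
    g11 = g12 = g22 = 0.0
    b1 = b2 = 0.0
    for x in states:
        p1, p2 = phi(x)
        d = target(x, w_v, w_u)
        g11 += p1 * p1
        g12 += p1 * p2
        g22 += p2 * p2
        b1 += p1 * d
        b2 += p2 * d
    det = g11 * g22 - g12 * g12   # nonzero if the states excite both basis functions
    return [(g22 * b1 - g12 * b2) / det, (g11 * b2 - g12 * b1) / det]


w_next = ls_critic_update(w_v=[2.0, 0.0], w_u=-0.6, states=[-1.0, -0.5, 0.5, 1.0])
print(w_next)
```

In this toy case the target is exactly quadratic in x, so the fit should put essentially all of the weight on the x^2 term and leave the x^4 weight near zero. The sampled states must be rich enough for the summed matrix to be invertible, which is the sampled analogue of requiring the integral of phi phi^T to be nonsingular.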
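$$W_{Vi+1}|_{m+1} = W_{Vi+1}|_{m} - \alpha\, \phi(x_k) \left( W_{Vi+1}^T|_{m}\, \phi(x_k) - r(x_k, u_k) - W_{Vi}^T \phi(x_{k+1}) \right)^T$$

with $r(x_k, u_k) = x_k^T Q x_k + u_k^T R u_k$.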
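
The iterative critic update with inner index m can be sketched as a sample-by-sample correction of the new critic weight toward the target r(x_k, u_k) + W_Vi^T phi(x_{k+1}). The plant, the single basis phi(x) = x^2, the step size alpha, and the function name are all assumptions made for this illustration.

```python
A, B, Q, R = 0.9, 1.0, 1.0, 1.0   # assumed scalar plant and cost weights


def phi(x):
    """Assumed single critic basis function."""
    return x * x


def critic_gradient_update(w_new, w_old, w_u, states, alpha=0.5, sweeps=200):
    """Iterate w_new <- w_new - alpha * phi(x) * (w_new*phi(x) - r - w_old*phi(x_next))."""
    for _ in range(sweeps):
        for x in states:
            u = w_u * x                      # action from a fixed action weight
            x_next = A * x + B * u
            r = Q * x * x + R * u * u        # one-step utility r(x_k, u_k)
            error = w_new * phi(x) - r - w_old * phi(x_next)
            w_new -= alpha * phi(x) * error
    return w_new


w_next = critic_gradient_update(w_new=0.0, w_old=2.0, w_u=-0.6,
                                states=[-1.0, -0.5, 0.5, 1.0])
print(w_next)
```

The old critic weight w_old stays frozen while w_new is iterated, mirroring the roles of W_Vi and W_Vi+1. The step size has to be small relative to phi(x)^2 over the sampled states for the iteration to converge.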
$$x_{k+1} = f(x_k) + g(x_k) u(x_k)$$
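
As a sanity check of the HDP recursion on these dynamics, consider a hypothetical scalar linear-quadratic case with f(x) = A x, g(x) = B and V_i(x) = p_i x^2. Both the minimization over u and the value update then have closed forms, and the iteration can be compared with the positive root of the scalar discrete-time Riccati equation. The numbers and names below are assumptions, not taken from the slide.

```python
import math

A, B, Q, R = 0.9, 1.0, 1.0, 1.0   # assumed scalar plant and cost weights


def hdp_scalar(iterations=100):
    """HDP value iteration with V_i(x) = p_i x^2, starting from V_0 = 0."""
    p = 0.0
    k = 0.0
    for _ in range(iterations):
        # u_i(x) = -k x minimizes Q x^2 + R u^2 + p (A x + B u)^2
        k = p * A * B / (R + p * B * B)
        # V_{i+1}(x) = Q x^2 + R u_i^2 + V_i(x_next)
        p = Q + R * k * k + p * (A - B * k) ** 2
    return p, k


p, k = hdp_scalar()

# Positive root of B^2 p^2 + (R - Q B^2 - A^2 R) p - Q R = 0 (scalar Riccati equation)
c1 = R - Q * B * B - A * A * R
p_riccati = (-c1 + math.sqrt(c1 * c1 + 4.0 * B * B * Q * R)) / (2.0 * B * B)
print(p, p_riccati)
```

The two printed values should agree closely, which illustrates the iteration converging to the optimal value function in the linear-quadratic special case.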
Asma Al-Tamimi & F. Lewis