Machine Learning
Reinforcement learning Hamid Beigy
Sharif University of Technology
Fall 1396
Hamid Beigy (Sharif University of Technology) Machine Learning Fall 1396 1 / 32
[Figure: tree of moves from the starting position; nodes labeled b, c, c*, d, e, e*, g.]
[Figure: agent-environment interaction. The agent receives state $s_t$ and reward $r_t$ and emits action $a_t$; the environment responds with reward $r_{t+1}$ and next state $s_{t+1}$.]
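The interaction loop in the agent-environment figure can be sketched in a few lines. This is only an illustration under stated assumptions: `CoinFlipEnv`, `RandomAgent`, `run_episode`, and their method names are hypothetical and do not come from the slides; they exist only so that the loop over $s_t$, $a_t$, $r_{t+1}$, $s_{t+1}$ is runnable.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess a coin flip, reward 1 if correct."""

    def __init__(self, horizon=5, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the state is simply the time step

    def step(self, action):
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0  # r_{t+1}
        self.t += 1
        done = self.t >= self.horizon
        return self.t, reward, done  # s_{t+1}, r_{t+1}, episode-finished flag


class RandomAgent:
    """Hypothetical agent that ignores the state and guesses at random."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.randint(0, 1)  # a_t


def run_episode(env, agent):
    """Run one episode and record (s_t, a_t, r_{t+1}, s_{t+1}) tuples."""
    state = env.reset()
    trajectory = []
    done = False
    while not done:
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory


trajectory = run_episode(CoinFlipEnv(), RandomAgent())
```

Each recorded tuple corresponds to one pass around the loop in the figure; a learning agent would use the reward to adjust how `act` chooses actions.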
$\Pr\{a_t = a\} = \dfrac{\exp(Q_t(a)/\tau)}{\sum_{b} \exp(Q_t(b)/\tau)}$
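The normalizing sum over $\exp(Q_t(b)/\tau)$ is the denominator of softmax (Boltzmann) action selection. The sketch below assumes that rule; the function names `softmax_probabilities` and `select_action` and the example action values are illustrative, not taken from the slides.

```python
import math
import random


def softmax_probabilities(q_values, tau):
    """Return exp(Q(a)/tau) / sum_b exp(Q(b)/tau) for every action a."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Subtracting the maximum before exponentiating avoids overflow and
    # leaves the resulting probabilities unchanged.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / tau) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, tau, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probabilities(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# High temperature: nearly uniform. Low temperature: nearly greedy.
probs_hot = softmax_probabilities([1.0, 2.0, 3.0], tau=100.0)
probs_cold = softmax_probabilities([1.0, 2.0, 3.0], tau=0.1)
```

The temperature $\tau$ controls exploration: large values spread probability almost evenly over actions, while small values concentrate it on the action with the highest estimate.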
$p_j(k+1) = \dfrac{b}{r-1} + p_j(k)(1-b)$
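The expression $\frac{b}{r-1} + p_j(k)(1-b)$ has the form of the penalty branch of a linear reward-penalty learning automaton, applied to the actions that were not chosen. The sketch below assumes that standard scheme; the reward branch, the reward step size `a`, and the names `lrp_update`, `chosen`, and `penalized` are assumptions added for illustration and are not recoverable from the slide itself.

```python
def lrp_update(p, chosen, penalized, a, b):
    """One step of an assumed linear reward-penalty update.

    p         : list of action probabilities p_j(k), summing to 1
    chosen    : index i of the action taken at step k
    penalized : True if the environment responded with a penalty
    a, b      : reward and penalty step sizes in [0, 1]
    """
    r = len(p)
    new_p = []
    for j, pj in enumerate(p):
        if penalized:
            if j == chosen:
                new_p.append((1 - b) * pj)                # shrink the chosen action
            else:
                new_p.append(b / (r - 1) + (1 - b) * pj)  # term shown on the slide
        else:
            if j == chosen:
                new_p.append(pj + a * (1 - pj))           # reinforce the chosen action
            else:
                new_p.append((1 - a) * pj)
    return new_p


after_penalty = lrp_update([0.5, 0.25, 0.25], chosen=0, penalized=True, a=0.1, b=0.2)
after_reward = lrp_update([0.5, 0.25, 0.25], chosen=0, penalized=False, a=0.1, b=0.2)
```

Both branches keep the probabilities normalized: on a penalty the mass removed from the chosen action is shared equally among the other $r-1$ actions.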
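$M(k) = \sum_{i=1}^{r} c_i \, p_i(k)$
$\lim_{k\to\infty} E[M(k)] < M(0)$
$\lim_{k\to\infty} E[M(k)] = \min_i c_i$
$\lim_{k\to\infty} E[M(k)] < \min_i c_i + \epsilon$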
$R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$
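A sum over $k = 0, \dots, \infty$ at this point in the deck is consistent with the discounted return, the sum of $\gamma^k r_{t+k+1}$. The sketch below assumes that definition and truncates the infinite sum to a finite list of rewards; the function name `discounted_return` and the example rewards are illustrative.

```python
def discounted_return(rewards, gamma):
    """Compute sum_k gamma**k * rewards[k] for a finite reward sequence.

    rewards[k] plays the role of r_{t+k+1}; gamma is the discount factor.
    """
    g = 0.0
    # Accumulate from the last reward backwards: G = r + gamma * G_next.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25 = 1.75
```

With $\gamma = 0$ only the immediate reward counts; as $\gamma$ approaches 1, later rewards weigh almost as much as early ones.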
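$P^a_{ss'} = \Pr\{s_{t+1} = s' \mid s_t = s, a_t = a\}$
$R^a_{ss'} = E\{r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s'\}$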
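[Figure: transition graph of an MDP with states high and low and actions search, wait, recharge. Edge labels (transition probability, reward): $1, 0$; $1-\beta, -3$; $\beta, R^{search}$; $1-\alpha, R^{search}$; $\alpha, R^{search}$; $1, R^{wait}$; $1, R^{wait}$.]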
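$V^\pi(s) = E_\pi\left\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \mid s_t = s\right\} = \sum_a \pi(s,a) \sum_{s'} P^a_{ss'}\left[R^a_{ss'} + \gamma V^\pi(s')\right]$
$Q^\pi(s,a) = E_\pi\left\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \mid s_t = s, a_t = a\right\}$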
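The bracketed term $R^a_{ss'} + \gamma V^\pi(s')$ is the core of the Bellman equation for $V^\pi$, and applying that equation repeatedly as an update gives iterative policy evaluation. The following is a minimal sketch under stated assumptions: the dictionary layout `P[s][a] = [(prob, next_state, reward), ...]`, the name `policy_evaluation`, and the two-state example MDP are hypothetical and chosen only for illustration.

```python
def policy_evaluation(P, policy, gamma, tol=1e-10, max_sweeps=10_000):
    """Estimate V^pi by sweeping the Bellman equation until values settle.

    P[s][a]      : list of (probability, next_state, reward) triples
    policy[s][a] : probability pi(s, a) of taking action a in state s
    """
    V = {s: 0.0 for s in P}
    for _ in range(max_sweeps):
        delta = 0.0
        for s in P:
            v_new = sum(
                policy[s][a] * sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place update: later states in the sweep see it
        if delta < tol:
            break
    return V


# Hypothetical two-state MDP: A moves to B with reward 1; B loops with reward 2.
P = {
    "A": {"go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)]},
}
policy = {"A": {"go": 1.0}, "B": {"stay": 1.0}}
V = policy_evaluation(P, policy, gamma=0.5)
```

For this example the fixed point can be checked by hand: $V(B) = 2 / (1 - 0.5) = 4$ and $V(A) = 1 + 0.5 \cdot 4 = 3$.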
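$V^*(s) = \max_\pi V^\pi(s)$
$Q^*(s,a) = \max_\pi Q^\pi(s,a)$
[Figure: backup diagrams (a), rooted at state $s$, and (b), rooted at the pair $(s, a)$, over nodes $a$, $r$, $s'$, $a'$; arcs marked max.]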
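$V^*(s) = \max_a E\left\{r_{t+1} + \gamma V^*(s_{t+1}) \mid s_t = s, a_t = a\right\} = \max_a \sum_{s'} P^a_{ss'}\left[R^a_{ss'} + \gamma V^*(s')\right]$
$Q^*(s,a) = E\left\{r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \mid s_t = s, a_t = a\right\} = \sum_{s'} P^a_{ss'}\left[R^a_{ss'} + \gamma \max_{a'} Q^*(s', a')\right]$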
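$\pi_0 \xrightarrow{E} V^{\pi_0} \xrightarrow{I} \pi_1 \xrightarrow{E} V^{\pi_1} \xrightarrow{I} \pi_2 \xrightarrow{E} \cdots \xrightarrow{I} \pi^* \xrightarrow{E} V^*$
(E: policy evaluation, I: policy improvement)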
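$V^\pi(s) = E_\pi\left\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \mid s_t = s\right\} = \sum_a \pi(s,a) \sum_{s'} P^a_{ss'}\left[R^a_{ss'} + \gamma V^\pi(s')\right]$
$\pi'(s) = \arg\max_a \sum_{s'} P^a_{ss'}\left[R^a_{ss'} + \gamma V^\pi(s')\right]$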
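$V_{k+1}(s) = \max_a E\left\{r_{t+1} + \gamma V_k(s_{t+1}) \mid s_t = s, a_t = a\right\} = \max_a \sum_{s'} P^a_{ss'}\left[R^a_{ss'} + \gamma V_k(s')\right]$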
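The update with $\gamma V_k(s')$ inside a maximum over actions is the value iteration backup. A minimal sketch follows, assuming a small tabular MDP; the layout `P[s][a] = [(prob, next_state, reward), ...]`, the names `value_iteration` and `greedy_policy`, and the example MDP are hypothetical and used only to make the backup concrete.

```python
def backup(P, V, s, a, gamma):
    """Expected one-step value: sum_{s'} P * (R + gamma * V(s'))."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma, tol=1e-10, max_sweeps=10_000):
    """Iterate V_{k+1}(s) = max_a backup(s, a) until the change is below tol."""
    V = {s: 0.0 for s in P}
    for _ in range(max_sweeps):
        # Synchronous sweep: V_{k+1} is computed entirely from V_k.
        V_next = {s: max(backup(P, V, s, a, gamma) for a in P[s]) for s in P}
        delta = max(abs(V_next[s] - V[s]) for s in P)
        V = V_next
        if delta < tol:
            break
    return V


def greedy_policy(P, V, gamma):
    """Pick, in each state, an action that maximizes the one-step backup."""
    return {s: max(P[s], key=lambda a: backup(P, V, s, a, gamma)) for s in P}


# Hypothetical MDP: in A, "left" stays in A for reward 0, "right" moves to B
# for reward 1; in B, "stay" loops for reward 2.
P = {
    "A": {"left": [(1.0, "A", 0.0)], "right": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)]},
}
V_star = value_iteration(P, gamma=0.5)
pi_star = greedy_policy(P, V_star, gamma=0.5)
```

Unlike policy iteration, this folds evaluation and improvement into a single backup per state, so no explicit policy is kept until the values have converged.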