Monte Carlo Methods
CS60077: Reinforcement Learning Abir Das
IIT Kharagpur
Sep 06 and 12, 2019
Agenda
Understand how to evaluate policies in a model-free setting using Monte Carlo methods
Agenda Introduction MC Evaluation MC Control
Abir Das (IIT Kharagpur) CS60077 Sep 06 and 12, 2019 2 / 32
Images taken from: www.livescience.com and www.digitaltrends.com
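Draw an i.i.d. set of samples $\{x^{(i)}\}_{i=1}^{N}$ from the probability density $p(x)$.
An expectation under $p$ is then approximated by the sample average: $\mathbb{E}_{p}[f(x)] \approx \frac{1}{N}\sum_{i=1}^{N} f(x^{(i)})$.
Image taken from: Nando de Freitas: MLSS 08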
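Drawing samples from a density $p(x)$ lets an expectation be approximated by a plain average over those samples. The following is a minimal Python sketch of that idea, not code from the lecture. The target density (a standard normal), the test function $f(x) = x^2$, and the helper name `mc_estimate` are assumptions chosen only for illustration.

```python
import random


def mc_estimate(f, sampler, n, seed=0):
    """Approximate E_p[f(x)] by averaging f over n i.i.d. draws from p.

    `sampler(rng)` is a hypothetical callable that returns one draw from p.
    """
    rng = random.Random(seed)
    return sum(f(sampler(rng)) for _ in range(n)) / n


def square(x):
    return x * x


def standard_normal(rng):
    # Assumed target density p(x): a standard normal, for which E[x^2] = 1.
    return rng.gauss(0.0, 1.0)


estimate = mc_estimate(square, standard_normal, n=100_000)
print(estimate)  # close to 1; the error shrinks roughly like 1/sqrt(n)
```

The estimate depends only on the ability to sample from $p(x)$, which is why the same idea carries over to settings where no model is available.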
Slide courtesy: David Silver [Deepmind]
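Greedy policy improvement over $v(s)$ requires a model of the MDP:
$$\pi'(s) = \arg\max_{a \in A} \Big[ r(s, a) + \gamma \sum_{s' \in S} p(s' \mid s, a)\, v(s') \Big]$$
Greedy policy improvement over $q(s, a)$ is model-free:
$$\pi'(s) = \arg\max_{a \in A} q(s, a)$$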
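In an $\epsilon$-greedy policy, every non-greedy action is given the minimal probability of selection, $\frac{\epsilon}{|A(s)|}$, whereas the remaining bulk of the probability, $1 - \epsilon + \frac{\epsilon}{|A(s)|}$, is given to the greedy action.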
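The $\epsilon$-greedy probability split can be written out directly. The sketch below is an illustration under assumptions, not the lecture's code: the function names `epsilon_greedy_probs` and `epsilon_greedy_action`, the list-of-action-values representation, and the tie-breaking rule (lowest index wins) are all hypothetical choices.

```python
import random


def epsilon_greedy_probs(q_values, epsilon):
    """Return the epsilon-greedy selection probability of each action.

    Every action receives epsilon / |A(s)|; the greedy action additionally
    receives the remaining 1 - epsilon.
    """
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])  # ties: lowest index
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Sample one action index according to the epsilon-greedy probabilities."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Assumed example: four actions, action 1 has the highest estimated value.
probs = epsilon_greedy_probs([0.1, 0.5, 0.2, 0.0], epsilon=0.2)
print(probs)
```

With $\epsilon = 0.2$ and four actions, each non-greedy action gets probability $0.2 / 4 = 0.05$ and the greedy action gets $1 - 0.2 + 0.05 = 0.85$, so every action keeps a nonzero chance of being explored.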
A policy is $\epsilon$-soft if $\pi(a \mid s) \geq \frac{\epsilon}{|A(s)|}$ for all states and actions, for some $\epsilon > 0$.
1. Sample $x^{(i)} \sim q(x)$ and $u \sim U(0,1)$.
2. If $u < \frac{p(x^{(i)})}{M q(x^{(i)})}$, then accept $x^{(i)}$ and increment counter $i$ by 1.
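The two rejection-sampling steps above translate almost line for line into code. What follows is a minimal sketch under stated assumptions rather than the lecture's implementation: the target $p(x) = 6x(1-x)$ on $[0, 1]$, the uniform proposal $q$, the bound $M = 1.5$ (chosen so that $p(x) \leq M q(x)$ everywhere), and the function name `rejection_sample` are all illustrative choices.

```python
import random


def rejection_sample(p_pdf, q_sample, q_pdf, M, n, seed=0):
    """Collect n samples from p using proposal q, assuming p(x) <= M * q(x)."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = q_sample(rng)           # step 1: x ~ q(x)
        u = rng.random()            # step 1: u ~ U(0, 1)
        if u < p_pdf(x) / (M * q_pdf(x)):
            samples.append(x)       # step 2: accept and advance the counter
    return samples


def target_pdf(x):
    # Assumed target: Beta(2, 2) density, whose maximum is 1.5 at x = 0.5.
    return 6.0 * x * (1.0 - x)


def uniform_sample(rng):
    return rng.random()


def uniform_pdf(x):
    return 1.0


samples = rejection_sample(target_pdf, uniform_sample, uniform_pdf, M=1.5, n=20_000)
mean = sum(samples) / len(samples)
print(mean)  # close to 0.5, the mean of the assumed target
```

On average only a fraction $1/M$ of the proposals is accepted, so a loose bound $M$ wastes samples.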
The ratio $\frac{p(x^{(i)})}{q(x^{(i)})}$ is called the importance weight.
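To see the importance weight at work, the sketch below estimates an expectation under $p$ using samples drawn only from $q$, by averaging $f(x^{(i)})$ times the weight $p(x^{(i)}) / q(x^{(i)})$. This is the standard ordinary importance sampling estimator, written here as an assumption-laden illustration: the densities (target $N(0, 1)$, proposal $N(0, 2^2)$), the test function $f(x) = x^2$, and the helper names are not from the lecture.

```python
import math
import random


def normal_pdf(x, sigma):
    """Density of a zero-mean normal with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def importance_estimate(f, p_pdf, q_pdf, q_sample, n, seed=0):
    """Estimate E_p[f(x)] from n draws of q, reweighting each by p(x) / q(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = q_sample(rng)
        weight = p_pdf(x) / q_pdf(x)  # the importance weight
        total += f(x) * weight
    return total / n


# Assumed example: target p = N(0, 1), proposal q = N(0, 2^2), f(x) = x^2.
estimate = importance_estimate(
    f=lambda x: x * x,
    p_pdf=lambda x: normal_pdf(x, 1.0),
    q_pdf=lambda x: normal_pdf(x, 2.0),
    q_sample=lambda rng: rng.gauss(0.0, 2.0),
    n=100_000,
)
print(estimate)  # close to 1, the value of E_p[x^2]
```

The proposal here is wider than the target, which keeps the weights bounded. A proposal that puts little mass where $p(x) f(x)$ is large would give high-variance estimates.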
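$$\frac{p(x^{(i)})}{q(x^{(i)})} = \frac{\cancel{p(s_1)}\,\pi(a_1 \mid s_1)\,\cancel{p(s_2 \mid s_1, a_1)}\,\pi(a_2 \mid s_2)\,\cancel{p(s_3 \mid s_2, a_2)} \cdots}{\cancel{p(s_1)}\,\mu(a_1 \mid s_1)\,\cancel{p(s_2 \mid s_1, a_1)}\,\mu(a_2 \mid s_2)\,\cancel{p(s_3 \mid s_2, a_2)} \cdots} = \frac{\pi(a_1 \mid s_1)\,\pi(a_2 \mid s_2) \cdots}{\mu(a_1 \mid s_1)\,\mu(a_2 \mid s_2) \cdots} = \prod_{t=1}^{T_i} \frac{\pi(a_t \mid s_t)}{\mu(a_t \mid s_t)}$$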
For each of the $N$ sampled trajectories, $i = 1, \dots, N$, the importance weight is
$$\frac{p(x^{(i)})}{q(x^{(i)})} = \prod_{t=1}^{T_i} \frac{\pi(a_t^{(i)} \mid s_t^{(i)})}{\mu(a_t^{(i)} \mid s_t^{(i)})}$$
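Because the transition probabilities cancel, the importance weight of a trajectory depends only on the two policies, $\pi$ (target) and $\mu$ (behaviour). The sketch below computes that product of per-step ratios. It is an illustration under assumptions: the nested-dictionary policy representation, the state and action labels, and the function name `trajectory_weight` are hypothetical, and $\mu(a \mid s) > 0$ is assumed wherever $\pi(a \mid s) > 0$.

```python
def trajectory_weight(trajectory, pi, mu):
    """Return the product over time steps of pi(a_t | s_t) / mu(a_t | s_t).

    `trajectory` is a list of (state, action) pairs; `pi` and `mu` map
    state -> action -> probability. Assumes mu[s][a] > 0 for visited pairs.
    """
    weight = 1.0
    for state, action in trajectory:
        weight *= pi[state][action] / mu[state][action]
    return weight


# Assumed toy policies over two states and two actions.
pi = {"s1": {"L": 0.9, "R": 0.1}, "s2": {"L": 0.2, "R": 0.8}}
mu = {"s1": {"L": 0.5, "R": 0.5}, "s2": {"L": 0.5, "R": 0.5}}

# N = 2 sampled trajectories, each a sequence of (state, action) pairs.
trajectories = [
    [("s1", "L"), ("s2", "R")],
    [("s1", "R"), ("s2", "L"), ("s1", "L")],
]
weights = [trajectory_weight(tau, pi, mu) for tau in trajectories]
print(weights)
```

For the first trajectory the weight is $(0.9 / 0.5) \times (0.8 / 0.5) = 2.88$. No environment model is needed to compute it, only the action probabilities of the two policies.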