Monte Carlo Methods
CS60077: Reinforcement Learning Abir Das
IIT Kharagpur
Oct 05 and 06, 2020
Agenda: Introduction, MC Evaluation, MC Control. Understand how to evaluate policies in the model-free setting using Monte Carlo methods.
Abir Das (IIT Kharagpur) CS60077 Oct 05 and 06, 2020 2 / 32
[Images taken from: www.livescience.com; www.digitaltrends.com]
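Draw N samples $\{x^{(i)}\}_{i=1}^{N}$ from the probability density $p(x)$. [Image taken from: Nando de Freitas: MLSS 08]
The samples approximate expectations under $p$:
$$I(f) = \int f(x)\,p(x)\,dx \approx I_N(f) = \frac{1}{N}\sum_{i=1}^{N} f(x^{(i)})$$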
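Drawing samples from $p(x)$ lets us replace an expectation under $p$ by a sample average, $\frac{1}{N}\sum_i f(x^{(i)})$. A minimal Python sketch, assuming we can sample from $p$ directly; the function name `mc_expectation` and the example (p = Uniform(0, 1), f(x) = x², exact value 1/3) are illustrative assumptions, not taken from the slides.

```python
import random


def mc_expectation(f, sampler, n, rng):
    """Plain Monte Carlo estimate of E_p[f(x)]: (1/N) * sum_i f(x_i), with x_i ~ p."""
    total = 0.0
    for _ in range(n):
        total += f(sampler(rng))  # sampler draws one x ~ p using the given RNG
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative choice: p = Uniform(0, 1) and f(x) = x**2, so E_p[f] = 1/3.
    estimate = mc_expectation(lambda x: x * x, lambda r: r.random(), 100_000, rng)
    print(f"estimate={estimate:.4f}, exact={1 / 3:.4f}")
```

The error of this estimate shrinks like $1/\sqrt{N}$ regardless of the dimension of $x$.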
Slide courtesy: David Silver [DeepMind]
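MC evaluation estimates $v_\pi(s)$ by averaging the returns observed after visits to $s$, with no model of the MDP. A minimal sketch of first-visit Monte Carlo prediction (every-visit MC via a flag). The episode format, a list of (state, reward) pairs where the reward is $R_{t+1}$ received after leaving $S_t$, and the name `mc_evaluate` are assumptions for illustration.

```python
from collections import defaultdict


def mc_evaluate(episodes, gamma=1.0, first_visit=True):
    """Estimate v_pi(s) by averaging sampled returns G_t.

    Assumed format: each episode is a list of (state, reward) pairs, where
    reward is R_{t+1}, received after leaving state S_t under policy pi.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Backward sweep: G_t = R_{t+1} + gamma * G_{t+1}, with G_T = 0.
        g = 0.0
        returns = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            g = episode[t][1] + gamma * g
            returns[t] = g
        seen = set()
        for t, (state, _) in enumerate(episode):
            if first_visit and state in seen:
                continue  # first-visit MC: later visits to s in this episode are ignored
            seen.add(state)
            returns_sum[state] += returns[t]
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```

For example, with episodes `[[("A", 0.0), ("B", 1.0), ("A", 2.0)], [("B", 3.0)]]` and $\gamma = 1$, first-visit MC gives $V(A) = 3$, while every-visit MC averages both visits to A and gives $V(A) = 2.5$.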
Greedy policy improvement over $v(s)$ requires a model of the MDP:
$$\pi'(s) = \arg\max_{a\in\mathcal{A}} \Big[ r(s,a) + \gamma \sum_{s'\in\mathcal{S}} p(s'\mid s,a)\, v(s') \Big]$$
Greedy policy improvement over $q(s,a)$ is model-free:
$$\pi'(s) = \arg\max_{a\in\mathcal{A}} q(s,a)$$
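Greedy improvement with respect to an action-value estimate needs no transition probabilities $p(s'\mid s,a)$: it only compares $q(s,a)$ across actions. A minimal sketch, assuming $q$ is stored as a dictionary keyed by (state, action); the name `greedy_policy` and the tie-breaking rule are illustrative choices.

```python
def greedy_policy(q, actions):
    """Model-free greedy improvement: pi'(s) = argmax_a q(s, a).

    q maps (state, action) -> estimated action value; pairs missing from q
    are treated as 0.0. Ties go to the action listed first in `actions`.
    """
    states = {s for (s, _) in q}
    return {s: max(actions, key=lambda a: q.get((s, a), 0.0)) for s in states}


if __name__ == "__main__":
    q = {("s0", "left"): -2.0, ("s0", "right"): -1.0,
         ("s1", "left"): 0.5, ("s1", "right"): 0.5}
    print(greedy_policy(q, ["left", "right"]))
```

Here `s0` maps to `right`, and the tie in `s1` is broken towards `left`.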
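In an ε-greedy policy, every non-greedy action is selected with the minimal probability $\frac{\epsilon}{|\mathcal{A}(s)|}$, whereas the remaining bulk of the probability, $1-\epsilon+\frac{\epsilon}{|\mathcal{A}(s)|}$, is given to the greedy action.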
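The ε-greedy probabilities can be checked directly: each of the $|\mathcal{A}(s)|$ actions receives $\epsilon/|\mathcal{A}(s)|$, and the greedy action additionally receives $1-\epsilon$. A minimal sketch for a single state, assuming action values are given as a list indexed by action; the function names are illustrative.

```python
import random


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy in one state.

    Every action gets epsilon / |A(s)|; the greedy action also gets 1 - epsilon,
    so its total is 1 - epsilon + epsilon / |A(s)|.
    """
    n = len(q_values)
    probs = [epsilon / n] * n
    greedy = max(range(n), key=lambda a: q_values[a])  # first maximiser on ties
    probs[greedy] += 1.0 - epsilon
    return probs


def sample_action(q_values, epsilon, rng):
    """Draw one action index from the epsilon-greedy distribution."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With `q_values = [1.0, 3.0, 2.0]` and `epsilon = 0.3`, the probabilities are approximately `[0.1, 0.8, 0.1]`.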
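ε-soft policies are policies for which $\pi(a\mid s) \ge \frac{\epsilon}{|\mathcal{A}(s)|}$ for all states and actions, for some $\epsilon > 0$.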
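Because an ε-greedy policy is ε-soft, every action keeps being tried, so Monte Carlo control can estimate $q(s,a)$ and improve the policy without exploring starts. A minimal sketch of on-policy first-visit MC control with an ε-greedy policy. The corridor task (states 0 to 3, state 3 terminal, reward −1 per step), the `max_steps` truncation that keeps early episodes finite, and all function names are hypothetical choices for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy task: states 0..3 in a corridor, state 3 is terminal.
# Action 0 moves left (state 0 stays put), action 1 moves right; each step gives -1.
TERMINAL, ACTIONS = 3, (0, 1)


def step(state, action):
    next_state = max(state - 1, 0) if action == 0 else state + 1
    return next_state, -1.0


def epsilon_greedy(q, state, epsilon, rng):
    """Epsilon-soft behaviour: with prob. epsilon pick uniformly, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def mc_control(num_episodes=2000, epsilon=0.1, gamma=1.0, max_steps=50, seed=0):
    """On-policy first-visit MC control with an epsilon-greedy policy."""
    rng = random.Random(seed)
    q = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(num_episodes):
        # Generate one episode (assumption: truncated at max_steps).
        state, episode = 0, []
        for _ in range(max_steps):
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward = step(state, action)
            episode.append((state, action, reward))
            state = next_state
            if state == TERMINAL:
                break
        # Remember the first time step at which each (s, a) pair occurs.
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)
        # Backward sweep over returns; update q only at first visits.
        g = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, a, r = episode[t]
            g = r + gamma * g
            if first_visit[(s, a)] == t:
                counts[(s, a)] += 1
                q[(s, a)] += (g - q[(s, a)]) / counts[(s, a)]  # incremental sample mean
    return q


if __name__ == "__main__":
    q = mc_control()
    print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(TERMINAL)})
```

Policy improvement is implicit here: the next episode acts ε-greedily with respect to the updated $q$. On this corridor the greedy action should end up being "right" (1) in every non-terminal state.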
1. Sample $x^{(i)} \sim q(x)$ and $u \sim \mathcal{U}(0,1)$.
2. If $u < \frac{p(x^{(i)})}{M\,q(x^{(i)})}$, then accept $x^{(i)}$ and increment counter $i$ by 1.
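The two steps translate directly into a loop that runs until N samples have been accepted. A minimal sketch, assuming a bound $M$ with $p(x) \le M\,q(x)$ for all $x$ (standard for rejection sampling but not stated in the extracted text). The example target $p(x) = 2x$ on $[0, 1]$ with a Uniform(0, 1) proposal and $M = 2$ is an illustrative choice.

```python
import random


def rejection_sample(p, q_sample, q_pdf, m, n, rng):
    """Draw n samples from density p with proposal q, assuming p(x) <= m * q(x)."""
    samples = []
    while len(samples) < n:                 # the counter i is len(samples)
        x = q_sample(rng)                   # step 1: x ~ q(x)
        u = rng.random()                    #         u ~ U(0, 1)
        if u < p(x) / (m * q_pdf(x)):       # step 2: accept with prob. p(x) / (M q(x))
            samples.append(x)               # accepted: i increases by 1
        # otherwise x is rejected and we sample again
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    xs = rejection_sample(lambda x: 2.0 * x, lambda r: r.random(), lambda x: 1.0, 2.0, 20000, rng)
    print(sum(xs) / len(xs))  # close to E_p[x] = 2/3
```

The expected acceptance rate is $1/M$, so a tight bound $M$ wastes fewer proposals.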
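$$\mathbb{E}_{p}[f(x)] \approx \frac{1}{N}\sum_{i=1}^{N} f(x^{(i)})\, w(x^{(i)}), \qquad x^{(i)} \sim q(x),$$
where $w(x^{(i)}) = \frac{p(x^{(i)})}{q(x^{(i)})}$ is called the importance weight.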
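Importance sampling draws from a proposal $q$ and corrects each sample by the weight $p(x)/q(x)$. A minimal sketch, assuming both densities can be evaluated pointwise. The example (target $p = \mathcal{N}(0, 1)$, proposal $q = \mathcal{N}(0, 2^2)$, $f(x) = x^2$ with exact value 1) and the helper names are illustrative assumptions.

```python
import math
import random


def normal_pdf(x, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def importance_sampling(f, p_pdf, q_pdf, q_sample, n, rng):
    """Estimate E_p[f(x)] as (1/N) * sum_i f(x_i) * w(x_i), with x_i ~ q and w = p / q."""
    total = 0.0
    for _ in range(n):
        x = q_sample(rng)            # sample from the proposal, not from p
        w = p_pdf(x) / q_pdf(x)      # importance weight
        total += f(x) * w
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    est = importance_sampling(lambda x: x * x,
                              lambda x: normal_pdf(x, 1.0),
                              lambda x: normal_pdf(x, 2.0),
                              lambda r: r.gauss(0.0, 2.0), 100_000, rng)
    print(est)  # close to E_p[x^2] = 1
```

The proposal must be positive wherever $p(x) f(x) \neq 0$; a proposal wider than the target, as here, keeps the weights bounded.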
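$$\mathbb{E}_{p}[f(x)] = \int f(x)\,\frac{p(x)}{q(x)}\,q(x)\,dx \approx \frac{1}{N}\sum_{i=1}^{N} f(x^{(i)})\,\frac{p(x^{(i)})}{q(x^{(i)})}, \qquad x^{(i)} \sim q(x)$$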
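$$\frac{p(x^{(i)})}{q(x^{(i)})} = \frac{\cancel{p(s_1)}\,\pi(a_1\mid s_1)\,\cancel{p(s_2\mid s_1,a_1)}\,\pi(a_2\mid s_2)\,\cancel{p(s_3\mid s_2,a_2)}\cdots}{\cancel{p(s_1)}\,\mu(a_1\mid s_1)\,\cancel{p(s_2\mid s_1,a_1)}\,\mu(a_2\mid s_2)\,\cancel{p(s_3\mid s_2,a_2)}\cdots} = \frac{\pi(a_1\mid s_1)\,\pi(a_2\mid s_2)\cdots}{\mu(a_1\mid s_1)\,\mu(a_2\mid s_2)\cdots} = \prod_{t=1}^{T_i}\frac{\pi(a_t\mid s_t)}{\mu(a_t\mid s_t)}$$
The initial-state and transition probabilities appear in both numerator and denominator and cancel, so the ratio depends only on the two policies.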
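Since the dynamics cancel, the weight of a trajectory can be computed from the policies alone, without knowing $p(s'\mid s,a)$. A minimal sketch, assuming a trajectory is a sequence of (state, action) pairs and that policies are callables returning $\pi(a\mid s)$; the example target policy (always "R") and uniform behaviour policy are hypothetical.

```python
import math


def trajectory_ratio(trajectory, pi, mu):
    """Importance ratio prod_t pi(a_t|s_t) / mu(a_t|s_t) of one trajectory.

    Assumes mu(a|s) > 0 wherever pi(a|s) > 0 (coverage), so no division by zero
    occurs for actions the target policy could take.
    """
    return math.prod(pi(s, a) / mu(s, a) for s, a in trajectory)


def pi_always_right(state, action):
    """Hypothetical deterministic target policy: always choose 'R'."""
    return 1.0 if action == "R" else 0.0


def mu_uniform(state, action):
    """Hypothetical behaviour policy: uniform over {'L', 'R'}."""
    return 0.5


if __name__ == "__main__":
    print(trajectory_ratio([(0, "R"), (1, "R")], pi_always_right, mu_uniform))  # 4.0
    print(trajectory_ratio([(0, "R"), (1, "L")], pi_always_right, mu_uniform))  # 0.0
```

A trajectory containing any action the target policy would never take gets weight 0; the others are up-weighted by $2^{T_i}$ in this example.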
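$$\mathbb{E}_{p}[f(x)] \approx \frac{1}{N}\sum_{i=1}^{N} f(x^{(i)})\,\frac{p(x^{(i)})}{q(x^{(i)})} = \frac{1}{N}\sum_{i=1}^{N} f(x^{(i)}) \prod_{t=1}^{T_i}\frac{\pi(a^{(i)}_t\mid s^{(i)}_t)}{\mu(a^{(i)}_t\mid s^{(i)}_t)}$$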
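Taking $f(x^{(i)})$ to be the return of trajectory $i$ turns this estimator into off-policy Monte Carlo evaluation of $\pi$ from episodes generated by $\mu$ (ordinary importance sampling). A minimal sketch; the two-step task (visit states 0 then 1, action "R" pays 1, "L" pays 0), the policies, and all names are hypothetical. Under the target policy "always R", the true value of the start state is 2 with $\gamma = 1$.

```python
import math
import random


def off_policy_mc_value(episodes, pi, mu, gamma=1.0):
    """Ordinary importance-sampling estimate of v_pi(s0) from episodes generated by mu.

    Each episode is a list of (state, action, reward) triples starting in s0.
    Estimate: (1/N) * sum_i G^(i) * prod_t pi(a_t|s_t) / mu(a_t|s_t).
    """
    total = 0.0
    for episode in episodes:
        ratio = math.prod(pi(s, a) / mu(s, a) for s, a, _ in episode)
        g = sum(gamma ** t * r for t, (_, _, r) in enumerate(episode))  # return G^(i)
        total += ratio * g
    return total / len(episodes)


def behaviour_episode(rng):
    """Hypothetical two-step task under mu: states 0 then 1; 'R' pays 1, 'L' pays 0."""
    episode = []
    for state in (0, 1):
        action = rng.choice(("L", "R"))  # uniform behaviour policy mu
        episode.append((state, action, 1.0 if action == "R" else 0.0))
    return episode


def pi(state, action):
    """Hypothetical target policy: always 'R'."""
    return 1.0 if action == "R" else 0.0


def mu(state, action):
    """Hypothetical behaviour policy: uniform over {'L', 'R'}."""
    return 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    episodes = [behaviour_episode(rng) for _ in range(50_000)]
    print(off_policy_mc_value(episodes, pi, mu))  # close to v_pi(0) = 2
```

Only the roughly one quarter of episodes that follow "R, R" contribute, each with weight 4 and return 2. This illustrates why ordinary importance sampling can have high variance when $\pi$ and $\mu$ differ substantially.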