Exploration Conscious Reinforcement Learning Revisited
Lior Shani* Yonathan Efroni* Shie Mannor Technion Institute of Technology
Why? To learn a good policy, an RL agent must explore! However, it can cause hazardous behavior during
Shani, Efroni & Mannor /19
I LOVE ε-GREEDY
12-Jun-19 Exploration Conscious Reinforcement Learning revisited 2
Damn you Exploration!
$$\pi^*_\alpha \in \arg\max_{\pi \in \Pi} \mathbb{E}^{(1-\alpha)\pi + \alpha\pi_0}\Big[\sum_{t=0}^{\infty} \gamma^t r(s_t, a_t)\Big]$$
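The criterion above can be made concrete on a small tabular MDP. The following is a minimal sketch, not the authors' code, assuming known transitions `P`, expected rewards `R` and a fixed exploration policy `pi0` (all hypothetical names). It runs value iteration with a backup in which the next-state value is averaged over the α-greedy mixture $(1-\alpha)\pi + \alpha\pi_0$, with $\pi$ greedy with respect to the current estimate; $\pi^*_\alpha$ is then read off as the greedy action of the result.

```python
import numpy as np


def exploration_conscious_q(P, R, gamma, alpha, pi0, tol=1e-10, max_iter=100_000):
    """Q-value iteration for the alpha-greedy (exploration-conscious) criterion.

    P:   (S, A, S) transition probabilities P[s, a, s'].
    R:   (S, A) expected rewards r(s, a).
    pi0: (S, A) fixed exploration policy pi_0(a | s).
    Fixed point: Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) V(s'), with
    V(s') = (1 - alpha) * max_a' Q(s', a') + alpha * sum_a' pi0(a' | s') Q(s', a').
    """
    Q = np.zeros_like(R, dtype=float)
    for _ in range(max_iter):
        # Next-state value when acting greedily w.p. 1 - alpha and with pi0 w.p. alpha.
        V = (1 - alpha) * Q.max(axis=1) + alpha * (pi0 * Q).sum(axis=1)
        # (S, A, S) @ (S,) -> (S, A): expected next-state value per state-action pair.
        Q_new = R + gamma * (P @ V)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


# Toy example: one state, two self-looping actions; action 0 pays 1, action 1 pays 0.
P = np.ones((1, 2, 1))
R = np.array([[1.0, 0.0]])
pi0 = np.full((1, 2), 0.5)
Q = exploration_conscious_q(P, R, gamma=0.5, alpha=0.5, pi0=pi0)  # approx. [[1.75, 0.75]]
pi_star = Q.argmax(axis=1)  # greedy part of pi*_alpha: [0]
```

Because the backup mixes a max with a fixed expectation, it remains a γ-contraction, so plain iteration converges; with `alpha=0` it reduces to standard Q-value iteration.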
I'm Exploration Conscious
Choose Greedy Action $a_t^{greedy}$ → Draw Exploratory Action $a_t^{act}$ → Act $a_t^{act}$ → Receive $r, s'$
w.p. $1-\alpha$: $a_t^{act} = a_t^{greedy}$; else: $a_t^{act} \sim \pi_0$
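The acting loop above can be sketched as follows. This is an illustration under assumptions: a tabular Q estimate for the current state, a NumPy random generator, and a hypothetical helper name `alpha_greedy_step`. Both the greedy and the acted action are returned, since exploration-conscious updates may need either one.

```python
import numpy as np


def alpha_greedy_step(q_values, alpha, pi0, rng):
    """Select the greedy and the acted action in one state (hypothetical helper).

    q_values: (A,) current Q estimates for this state.
    alpha:    probability of replacing the greedy action by an exploratory one.
    pi0:      (A,) fixed exploration distribution.
    rng:      a numpy.random.Generator.
    """
    # 1. Choose the greedy action.
    a_greedy = int(np.argmax(q_values))
    # 2. W.p. alpha draw an exploratory action from pi0, else keep the greedy one.
    if rng.random() < alpha:
        a_act = int(rng.choice(len(q_values), p=pi0))
    else:
        a_act = a_greedy
    # 3. The environment is stepped with a_act; a_greedy is kept for learning.
    return a_greedy, a_act


rng = np.random.default_rng(0)
q = np.array([0.1, 0.7, 0.2])
uniform = np.full(3, 1 / 3)
a_greedy, a_act = alpha_greedy_step(q, alpha=0.1, pi0=uniform, rng=rng)
```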
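1. Update $Q^{\pi}_{\alpha}(s_t, a_t^{act})$
2. Expect that the agent might explore in the next state:
$$Q^{\pi}_{\alpha}(s_t, a_t^{act}) \mathrel{+}= \eta \Big( r_t + \gamma\, \mathbb{E}^{(1-\alpha)\pi + \alpha\pi_0} Q^{\pi}_{\alpha}(s_{t+1}, a) - Q^{\pi}_{\alpha}(s_t, a_t^{act}) \Big)$$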
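A minimal tabular sketch of this update, assuming a NumPy Q-table indexed as `Q[s, a]`, a per-state exploration distribution `pi0[s]`, and a step size `eta` (the step-size symbol is a reading of the slide). The helper name is hypothetical, and $\pi$ is taken to be greedy with respect to the current `Q`, so the expectation mixes the max with the `pi0`-weighted average.

```python
import numpy as np


def expected_alpha_greedy_update(Q, s, a_act, r, s_next, alpha, pi0, gamma, eta):
    """One tabular update of Q(s, a_act) toward an alpha-greedy expected target.

    Q:   (S, A) table, modified in place at (s, a_act).
    pi0: (S, A) exploration policy; eta: step size.
    """
    q_next = Q[s_next]
    # E^{(1-alpha)pi + alpha pi0} Q(s_next, .) with pi greedy w.r.t. Q.
    v_next = (1 - alpha) * q_next.max() + alpha * np.dot(pi0[s_next], q_next)
    td_error = r + gamma * v_next - Q[s, a_act]
    Q[s, a_act] += eta * td_error
    return td_error


Q = np.zeros((2, 2))
Q[1] = [2.0, 0.0]
pi0 = np.full((2, 2), 0.5)
expected_alpha_greedy_update(Q, s=0, a_act=1, r=1.0, s_next=1,
                             alpha=0.5, pi0=pi0, gamma=0.9, eta=0.5)
# Target: 1 + 0.9 * (0.5 * 2 + 0.5 * 1) = 2.35, so Q[0, 1] becomes 1.175.
```

The updated entry is the acted action's, because $r_t$ and $s_{t+1}$ were produced by it; only the bootstrap term accounts for future exploration.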
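1. Update $Q^{\pi}_{\alpha}(s_t, a_t^{greedy})$
2. The reward and next state $r_t, s_{t+1}$ are given by the acted action $a_t^{act}$:
$$Q^{\pi}_{\alpha}(s_t, a_t^{greedy}) \mathrel{+}= \eta \Big( r_t + \gamma\, Q^{\pi}_{\alpha}(s_{t+1}, a_{t+1}^{greedy}) - Q^{\pi}_{\alpha}(s_t, a_t^{greedy}) \Big)$$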
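A matching tabular sketch, under the same assumptions (NumPy table `Q[s, a]`, step size `eta`, hypothetical helper name). Here $a_{t+1}^{greedy}$ is assumed to be the argmax of `Q` at $s_{t+1}$; the acted action never enters the formula, it only determines which `r` and `s_next` were observed.

```python
import numpy as np


def surrogate_alpha_greedy_update(Q, s, a_greedy, r, s_next, gamma, eta):
    """One tabular update of Q(s, a_greedy) from a transition generated by a_act.

    Q: (S, A) table, modified in place at (s, a_greedy).
    r, s_next: observed after executing the acted action (possibly exploratory).
    """
    # Assumed: the next greedy action is argmax_a Q(s_next, a).
    a_greedy_next = int(np.argmax(Q[s_next]))
    td_error = r + gamma * Q[s_next, a_greedy_next] - Q[s, a_greedy]
    Q[s, a_greedy] += eta * td_error
    return td_error


Q = np.zeros((2, 2))
Q[1] = [2.0, 0.0]
surrogate_alpha_greedy_update(Q, s=0, a_greedy=0, r=1.0, s_next=1, gamma=0.9, eta=0.5)
# Target: 1 + 0.9 * 2 = 2.8, so Q[0, 0] becomes 1.4.
```

Crediting the greedy action with the outcome of the acted one roughly treats the α-replacement by $\pi_0$ as part of the environment's dynamics, so the table learns values of the greedy choice with exploration already folded in.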
Training Evaluation
and evaluation regimes.
approach can easily help improve a variety of RL algorithms.
SEE YOU AT POSTER #90