Welcome to
DS595/CS525 Reinforcement Learning
- Prof. Yanhua Li
Time: 6:00pm – 8:50pm, R, Zoom Lecture, Fall 2020
This lecture will be recorded!!!
v No Quiz Today
v Project 3 due today
v Next Thursday: No class - Happy Thanksgiving
v Project 4
v https://github.com/yingxue-
v Important Dates:
  v Project Proposal: Thursday 11/12/2020
  v Progress report: Thursday 11/26/2020
  v Final Project:
v Actor-Critic methods
§ Advanced RL Techniques
v Tabular representation of reward
  § Model-based control
  § Model-free control (MC, SARSA, Q-Learning)
v Function representation of reward
  § Value-based (Deep Q-Learning, Double DQN, prioritized DQN, Dueling DQN)
  § Policy-based (Policy gradient, PPO, TRPO)
  § Actor-Critic (A3C, Pathwise Derivative PG)
  § Advanced topics in RL (Sparse Rewards)
  § Review of Deep Learning: as bases for non-linear function approximation (used in 2-4)
v Linear reward function learning
  § Imitation learning
  § Apprenticeship learning
  § Inverse reinforcement learning (MaxEnt IRL, MaxCausalEnt IRL, MaxRelEnt IRL)
v Non-linear reward function learning
  § Generative adversarial imitation learning (GAIL)
  § Adversarial inverse reinforcement learning (AIRL)
  § Review of Generative Adversarial nets: as bases for non-linear IRL
v Multi-Agent Reinforcement Learning
  § Multi-agent Actor-Critic etc.
v Multi-Agent Inverse Reinforcement Learning
  § MA-GAIL, MA-AIRL, AMA-GAIL
v Actor-Critic methods
§ Advanced RL Techniques
§ Project #4 progress update
v Value-based (Learned Value Function)
v Policy-based (Learned Policy Function)
§ Actor-Critic (Learned both Value and Policy Functions)
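As a concrete illustration of "learned both Value and Policy Functions", here is a minimal one-step actor-critic update in PyTorch. It is a hedged sketch, not the course's reference code: the network sizes, learning rate, and function names are assumptions.

import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99   # illustrative sizes

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                      nn.Linear(64, n_actions))        # learned policy function pi(a|s)
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                       nn.Linear(64, 1))               # learned value function V(s)
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-3)

def actor_critic_update(s, a, r, s_next, done):
    s = torch.as_tensor(s, dtype=torch.float32)
    s_next = torch.as_tensor(s_next, dtype=torch.float32)
    v = critic(s)
    v_next = critic(s_next).detach()
    # One-step TD target; the critic's value plays the role of the return/baseline.
    td_target = r + gamma * (1.0 - done) * v_next
    advantage = (td_target - v).detach()
    # Policy-gradient step on log pi(a|s) weighted by the advantage (the "actor").
    logp = torch.distributions.Categorical(logits=actor(s)).log_prob(torch.as_tensor(a))
    actor_loss = -(logp * advantage).sum()
    # Squared TD error for the value function (the "critic").
    critic_loss = (td_target - v).pow(2).sum()
    opt.zero_grad()
    (actor_loss + critic_loss).backward()
    opt.step()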
Unbiased estimator
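The equation behind this caption did not survive the slide export; as a hedged reconstruction from standard policy-gradient material, the point is that the likelihood-ratio (REINFORCE) gradient is an unbiased estimator of the true policy gradient, and it stays unbiased when a state-dependent baseline such as the critic's value is subtracted:

\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\big(G_t - b(s_t)\big)\right],
\qquad \mathbb{E}_{\pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, b(s_t)\right] = 0.

Actor-critic methods choose b(s_t) = V_w(s_t), or replace G_t with a learned Q or advantage estimate, trading a little bias for much lower variance.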
David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller, "Deterministic Policy Gradient Algorithms", ICML, 2014
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra, "Continuous Control with Deep Reinforcement Learning", ICLR, 2016
Replaced the ε-greedy policy with a π network.
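A minimal sketch of what that replacement looks like, in the spirit of the DPG/DDPG papers cited above (layer sizes, names, and optimizer settings are assumptions, not the papers' reference implementation): a deterministic actor network μ_θ(s) outputs a continuous action directly, and it is trained by ascending the critic's estimate Q_φ(s, μ_θ(s)).

import torch
import torch.nn as nn

obs_dim, act_dim = 3, 1   # illustrative sizes

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())    # deterministic policy mu_theta(s)
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))                    # Q_phi(s, a)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

def actor_update(states):
    # states: a batch of observations, shape (B, obs_dim)
    actions = actor(states)                                 # a = mu_theta(s), continuous
    q = critic(torch.cat([states, actions], dim=-1))        # Q_phi(s, mu_theta(s))
    loss = -q.mean()                                        # ascend the critic's value estimate
    actor_opt.zero_grad()
    loss.backward()   # gradient flows through the critic into the actor; only actor params are stepped
    actor_opt.step()  # the critic has its own TD update, not shown here

Exploration is then handled separately, e.g. by adding Gaussian or Ornstein-Uhlenbeck noise to the actor's output, rather than ε-greedy over discrete Q-values.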
v Actor-Critic methods
§ Advanced RL Techniques
v Noise on Action (Epsilon Greedy)
v Noise on Parameters
https://arxiv.org/abs/1706.01905
https://arxiv.org/abs/1706.10295
https://blog.openai.com/better-exploration-with-parameter-noise/
Which one is action noise vs parameter noise?
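A hedged sketch of the distinction (the network, noise scales, and function names are made up for illustration): action noise perturbs the chosen action at every step, e.g. ε-greedy or added Gaussian noise, while parameter noise perturbs the policy network's weights once (e.g. per episode), so exploration stays state-dependent and consistent within the episode.

import copy
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))   # illustrative Q/policy net

def act_with_action_noise(state, eps=0.1):
    # Noise on the action: with probability eps pick a random action (epsilon-greedy);
    # for continuous actions, Gaussian noise would be added to the output instead.
    q = policy(state)
    if torch.rand(()) < eps:
        return torch.randint(q.shape[-1], ()).item()
    return q.argmax().item()

def perturbed_policy(sigma=0.05):
    # Noise on the parameters: copy the network and add Gaussian noise to every weight;
    # the perturbed copy is then used greedily for a whole episode.
    noisy = copy.deepcopy(policy)
    with torch.no_grad():
        for p in noisy.parameters():
            p.add_(sigma * torch.randn_like(p))
    return noisy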
https://youtu.be/yFBwyPuO2Vg
https://arxiv.org/abs/1710.02298
https://www.youtube.com/watch?v=ZhsEKTo7V04
v Actor-Critic methods
§ Advanced RL Techniques
§ Project #4 progress update
https://openreview.net/forum?id=Hk3mPK5gg&noteId=Hk3mPK5gg
https://openreview.net/pdf?id=Hk3mPK5gg
https://arxiv.org/abs/1705.05363
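The paper linked above (arXiv:1705.05363, curiosity-driven exploration) adds an intrinsic reward when the outcome of an action is hard to predict. A minimal sketch of that idea, with the feature encoder and sizes simplified to assumptions:

import torch
import torch.nn as nn

feat_dim, act_dim = 16, 2   # illustrative sizes

encoder = nn.Sequential(nn.Linear(4, feat_dim), nn.ReLU())               # phi(s): state features
forward_model = nn.Sequential(nn.Linear(feat_dim + act_dim, 32), nn.ReLU(),
                              nn.Linear(32, feat_dim))                   # predicts phi(s')

def intrinsic_reward(s, a_onehot, s_next, scale=0.01):
    # Curiosity bonus: the forward model's error in predicting the next state's
    # features. Hard-to-predict transitions earn a larger bonus, which keeps the
    # agent exploring even when the extrinsic reward is sparse.
    with torch.no_grad():
        phi, phi_next = encoder(s), encoder(s_next)
        pred_next = forward_model(torch.cat([phi, a_onehot], dim=-1))
        return scale * (pred_next - phi_next).pow(2).mean()

# Total reward used by the RL update: r_total = r_extrinsic + intrinsic_reward(s, a, s_next)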
v Starting from simple training examples, and then gradually moving to more difficult ones (curriculum learning; see the sketch below)
https://arxiv.org/abs/1805.08180
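A hedged sketch of that curriculum idea (the task list, difficulty score, and agent interface are hypothetical placeholders, not from the slides): train on the easiest tasks first and only move on once the agent solves them reliably.

def make_curriculum(tasks, difficulty):
    # Order tasks from easy to hard; `difficulty` is a user-supplied scoring function.
    return sorted(tasks, key=difficulty)

def train_with_curriculum(agent, tasks, difficulty, success_threshold=0.8):
    # Early, simple tasks provide dense learning signal that the final sparse-reward
    # task alone would not; the agent only graduates to a harder task once it
    # reaches the success threshold on the current one.
    for task in make_curriculum(tasks, difficulty):
        while agent.evaluate(task) < success_threshold:   # hypothetical agent API
            agent.train_one_episode(task)                 # hypothetical agent API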