LunarLander-v2 using Deep Reinforcement Learning



  1. LunarLander-v2 using Deep Reinforcement Learning
     A project developed for the Autonomous Agents course (PLH513)
     Portokalakis Petros, February 2020

  2. Simple Game
     ● 8-dimensional state space
     ● 4 actions per state
     ● +100 points for landing
     ● -100 points when crashed
     ● Infinite fuel, but -0.3 points per frame when firing the main engine
     ● +10 for each leg ground contact (to encourage smooth landing)

  3. Deep Reinforcement Learning
     Objective: approximate the optimal Q-function (which satisfies the Bellman equation).
     Neural network:
     ● 8-node input layer - dimensionality of the state space
     ● 150-node fully connected 1st hidden layer
     ● 128-node fully connected 2nd hidden layer
     ● 4-node output layer - Q-values for the actions
     The 4-layer approach works well with a variety of hidden-layer sizes. 5 layers proved insufficient to even train the agent.
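The layer sizes above can be sketched as a plain forward pass. This is only an illustration using the standard library: the slides do not name the activation function, the weight initialization, or the framework used, so ReLU, the scaled Gaussian initialization, and all function names here are assumptions.

```python
import random

# Layer widths taken from the slide: 8 inputs, two hidden layers, 4 Q-values.
LAYER_SIZES = [8, 150, 128, 4]


def init_layer(n_in, n_out, rng):
    """Create one fully connected layer (assumed scaled Gaussian init)."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def init_network(sizes, rng):
    """Build one layer for each consecutive pair of sizes."""
    return [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]


def forward(network, state):
    """Map a state vector to one Q-value per action."""
    x = list(state)
    for index, (weights, biases) in enumerate(network):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if index < len(network) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers (assumption)
    return x


def count_parameters(network):
    """Total number of weights and biases."""
    return sum(len(weights) * len(weights[0]) + len(biases) for weights, biases in network)


rng = random.Random(0)
q_network = init_network(LAYER_SIZES, rng)
example_state = [0.1] * 8  # placeholder state, not a real LunarLander observation
q_values = forward(q_network, example_state)
greedy_action = max(range(len(q_values)), key=lambda a: q_values[a])
```

With these sizes the network has 21,194 trainable parameters (1,350 + 19,328 + 516), which is small enough to train quickly.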

  4. Deep Reinforcement Learning: Advancing performance
     Experience replay:
     ● Every tuple (s, a, r, s', done) is stored in a replay buffer (maxlength = 1M)
     ● Randomly sample a batch of 64 previous experiences to break the correlation between consecutive samples
     ● Predict the best action for all items in the batch via the NN
     ● Update the neural network weights
     ● Generate episodes via exploration or exploitation
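A minimal sketch of the replay buffer described above, assuming a bounded `collections.deque` as the storage. The class and method names are hypothetical; only the capacity (1M) and batch size (64) come from the slides.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (s, a, r, s', done) transitions."""

    def __init__(self, max_length=1_000_000, seed=None):
        # A bounded deque silently drops the oldest transition when full.
        self.memory = deque(maxlen=max_length)
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform sampling without replacement breaks the correlation
        # between consecutive transitions of the same episode.
        indices = self.rng.sample(range(len(self.memory)), batch_size)
        return [self.memory[i] for i in indices]

    def __len__(self):
        return len(self.memory)


# Usage with dummy transitions; a small capacity keeps the example fast.
buffer = ReplayBuffer(max_length=200, seed=0)
for step in range(300):
    buffer.store(step, step % 4, 0.0, step + 1, False)
batch = buffer.sample(64)
```

Training would normally wait until the buffer holds at least one full batch before sampling.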

  5. Deep Reinforcement Learning: Advancing performance
     ● Calculating the loss between the output Q-value and the target Q-value requires a second pass through the network for the next state
     ● s and s' share the same network and are only one step apart
     ● Optimization becomes unstable
     Target network: use a network identical to the policy network, but update the target network's weights only every C iterations (C is a hyperparameter).
     The first pass occurs with the policy network.
     The second pass occurs with the target network.
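The two passes can be sketched as follows, assuming the standard one-step Q-learning target r + gamma * max Q_target(s', a') and a mean squared error loss. The slides do not state the loss function, so MSE is an assumption, and the function names and the parameter dictionary are hypothetical.

```python
import copy


def td_targets(rewards, dones, next_q_values, gamma=0.99):
    """Targets built from the target network's Q-values for s' (second pass)."""
    targets = []
    for reward, done, q_next in zip(rewards, dones, next_q_values):
        # Terminal transitions do not bootstrap from the next state.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets


def mse_loss(predicted, targets):
    """Mean squared error between policy-network outputs and targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, targets)) / len(targets)


def sync_target(step, policy_params, target_params, sync_every):
    """Copy the policy parameters into the target every `sync_every` steps (C)."""
    if step % sync_every == 0:
        return copy.deepcopy(policy_params)
    return target_params


# Two dummy transitions: one ordinary step and one crash (terminal).
example_targets = td_targets(
    rewards=[1.0, -100.0],
    dones=[False, True],
    next_q_values=[[0.5, 2.0, -1.0, 0.0], [3.0, 1.0, 0.0, 2.0]],
)
```

Because the target parameters stay frozen between syncs, the regression target does not shift with every gradient step, which is what stabilizes optimization.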

  6. Deep Reinforcement Learning: Advancing performance Abstract version of the agent algorithm implemented

  7. Deep Reinforcement Learning: Performance of Lunar Lander

  8. Deep Reinforcement Learning: Performance of Lunar Lander Adding a third hidden layer

  9. Deep Reinforcement Learning: Hyperparameter Tuning

     | Hyperparameter          | Value   |
     |-------------------------|---------|
     | Starting epsilon        | 1       |
     | Minimum epsilon         | 0.01    |
     | Decay factor of epsilon | 0.99    |
     | Discount factor gamma   | 0.99    |
     | Learning rate           | 0.001   |
     | Batch size              | 64      |
     | Replay buffer           | 1000000 |
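The epsilon values above suggest an epsilon-greedy policy with multiplicative decay. A minimal sketch, assuming the decay is applied once per episode (the slides do not say whether it is per episode or per step) and using hypothetical function names:

```python
import random

EPSILON_START = 1.0
EPSILON_MIN = 0.01
EPSILON_DECAY = 0.99


def decay_epsilon(epsilon, decay=EPSILON_DECAY, minimum=EPSILON_MIN):
    """Shrink epsilon by the decay factor, never going below the minimum."""
    return max(minimum, epsilon * decay)


def select_action(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Count how many decay steps it takes to reach the minimum epsilon.
epsilon = EPSILON_START
decay_steps = 0
while epsilon > EPSILON_MIN:
    epsilon = decay_epsilon(epsilon)
    decay_steps += 1
```

Under this assumption epsilon reaches its floor of 0.01 after 459 decay steps, after which the agent acts greedily 99% of the time.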

  10. Thank you. Questions? Contact: pportokalakis@gmail.com
