Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?


SLIDE 1

Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?

Kei Ota¹, Tomoaki Oiki¹, Devesh K. Jha², Toshisada Mariyama¹, and Daniel Nikovski²

  • 1. Mitsubishi Electric, Kanagawa, JP
  • 2. Mitsubishi Electric Research Labs, MA, US.

SLIDE 2

Introduction

  • Deep RL algorithms have achieved impressive success

✓ Can solve complex tasks
✗ Learning representations requires a large amount of data

https://www.youtube.com/watch?v=rQIShnTz1kU [Akkaya, 2019]

SLIDE 3

Introduction (cont.)

  • State Representation Learning (SRL)

– Learned features are low-dimensional, evolve through time, and are influenced by the actions of the agent
– The lower the dimensionality, the faster and better RL algorithms will learn

[Diagram: in Standard RL, the raw observation o_t goes directly to the policy π, which outputs the action a_t; in SRL + RL, o_t first passes through a feature extractor that produces a compact feature z_{o_t}, which the policy consumes.]

SLIDE 4

Introduction (cont.)

Can Increasing Input Dimensionality Improve Deep RL?

SLIDE 5

OFENet: Online Feature Extractor Network

  • OFENet

– Train feature extractor networks φ_o and φ_{o,a} that produce the high-dimensional representations z_{o_t} and z_{o_t,a_t} (a wiring sketch follows below)

[Diagram: the observation o_t enters the State Feature Extractor φ_o, producing z_{o_t}; z_{o_t} and the action a_t enter the State-Action Feature Extractor φ_{o,a}, producing z_{o_t,a_t}. The policy network π acts on z_{o_t}; the value function networks Q act on z_{o_t,a_t}.]
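Below is a minimal PyTorch sketch of how the two extractors could be wired. It is an illustration under assumptions, not the authors' implementation (which stacks the MLP-DenseNet blocks of Slide 7): the sizes obs_dim, act_dim, feat and the one-layer bodies of phi_o and phi_oa are made up.

```python
# Hedged sketch of OFENet's two feature extractors (illustrative sizes and names).
import torch
import torch.nn as nn

obs_dim, act_dim, feat = 17, 6, 240   # assumed, e.g. a MuJoCo-scale task

phi_o = nn.Sequential(nn.Linear(obs_dim, feat), nn.ReLU())                    # state extractor
phi_oa = nn.Sequential(nn.Linear(obs_dim + feat + act_dim, feat), nn.ReLU())  # state-action extractor

def features(o, a):
    # z_{o_t}: the raw observation concatenated with learned features, so the
    # representation is *higher*-dimensional than the input (17 -> 257 here).
    z_o = torch.cat([o, phi_o(o)], dim=-1)
    # z_{o_t,a_t}: built on top of z_{o_t} and the action (-> 503 dims here).
    z_oa = torch.cat([z_o, a, phi_oa(torch.cat([z_o, a], dim=-1))], dim=-1)
    return z_o, z_oa  # z_o feeds the policy network, z_oa feeds the value networks
```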

SLIDE 6

OFENet: Online Feature Extractor Network (cont.)

  • OFENet

– Optimize the parameters θ_aux = {θ_o, θ_{o,a}, θ_pred} by learning to predict the next observation through a linear prediction network f_pred (an update-step sketch follows below):

    L_aux = E_{(o_t, a_t) ∼ p, π} [ ‖ f_pred(z_{o_t,a_t}) − o_{t+1} ‖² ]

– Increasing the search space allows the agent to learn much more complex policies
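Continuing the Slide 5 sketch, one auxiliary update step could look as follows; f_pred's shape, the Adam optimizer, and the squared-error reduction are assumptions rather than the authors' exact code.

```python
# One gradient step on L_aux = E‖f_pred(z_{o_t,a_t}) − o_{t+1}‖² (hedged sketch;
# phi_o, phi_oa, features, and obs_dim come from the Slide 5 sketch).
import torch
import torch.nn as nn

f_pred = nn.Linear(503, obs_dim)  # linear prediction network; 503 = dim of z_oa above
opt = torch.optim.Adam(list(phi_o.parameters())
                       + list(phi_oa.parameters())
                       + list(f_pred.parameters()))  # theta_aux = {theta_o, theta_oa, theta_pred}

def aux_update(o, a, o_next):
    _, z_oa = features(o, a)
    loss = ((f_pred(z_oa) - o_next) ** 2).sum(dim=-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```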

SLIDE 7

Network Architecture

  • What is the best architecture for extracting features?

– Deeper networks: better optimization ability and expressiveness
– Shallow layers: physically meaningful output
– Use Batch Normalization to suppress changes in input distributions

  • MLP-DenseNet (a block sketch follows below)

– Combines the advantages of deep layers and shallow layers: each fully connected layer's output is concatenated with its input

[Diagram: a feature extractor built from FC + concat blocks produces z_{o_t} and z_{o_t,a_t}, which feed the policy π and the value function Q(o_t, a_t).]
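A minimal sketch of one such block, assuming PyTorch and placeholder sizes; the real network stacks several of these with the activations found in the search on Slide 9.

```python
# One MLP-DenseNet block: FC -> BatchNorm -> activation, with the block input
# concatenated to the output so shallow features stay visible to deeper layers.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_dim, growth):
        super().__init__()
        self.fc = nn.Linear(in_dim, growth)
        self.bn = nn.BatchNorm1d(growth)  # suppresses shifts in input distributions

    def forward(self, x):
        h = torch.relu(self.bn(self.fc(x)))
        return torch.cat([x, h], dim=-1)  # output dim = in_dim + growth

# Stacking blocks grows the representation: 17 -> 57 -> 97 dims (placeholder sizes).
extractor = nn.Sequential(DenseBlock(17, 40), DenseBlock(57, 40))
```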

SLIDE 8

Experiments

  • 1. What is a good architecture for learning effective state and state-action representations that train better RL agents?
  • 2. Can OFENet learn more sample-efficient and better-performing policies compared to state-of-the-art techniques?
  • 3. What leads to the performance gain obtained by OFENet?

SLIDE 9

What is a good architecture?

  • Compare the auxiliary score and the actual RL score to search for a good architecture (a selection sketch follows below), using

    L_aux = E_{(o_t, a_t) ∼ p, π} [ ‖ f_pred(z_{o_t,a_t}) − o_{t+1} ‖² ]

– Number of layers: N_layers ∈ {1, 2, 3, 4} for MLP, N_layers ∈ {2, 4, 6, 8} for the others
– Activation function: {ReLU, tanh, Leaky ReLU, Swish, SELU}
– Connectivity architecture: {MLP, MLP-ResNet, MLP-DenseNet}

  • Aux. score: randomly collect 100K transitions for training, 20K for evaluation
  • Actual score: measure the returns of a SAC agent after 500K training steps

[Diagram: the three connectivity patterns. MLP Net stacks FC layers; MLP ResNet adds skip connections around FC layers; MLP DenseNet concatenates each FC layer's input with its output.]
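A hedged sketch of the selection loop; build_extractor, train_aux, and eval_aux are hypothetical helpers standing in for whatever training code is used.

```python
# Pick the architecture with the smallest held-out auxiliary loss,
# without running any expensive RL training (sketch with hypothetical helpers).
def select_architecture(candidates, train_set, eval_set):
    best_cfg, best_score = None, float("inf")
    for cfg in candidates:                 # cfg = (n_layers, activation, connectivity)
        model = build_extractor(cfg)       # hypothetical constructor
        train_aux(model, train_set)        # minimize L_aux on the 100K training transitions
        score = eval_aux(model, eval_set)  # L_aux on the 20K held-out transitions
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg  # the smallest aux. score tracks the best RL score (Slide 10)
```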

SLIDE 10

What is a good architecture?

  • We can select the architecture with the smallest aux. score without solving the heavy RL problem!
  • MLP-DenseNet consistently achieves the higher actual score
  • The smaller the aux. score, the better the actual score

[Plot: actual score vs. aux. score across the candidate architectures; a higher actual score and a lower aux. score are better.]
SLIDE 11

More sample-efficient and better-performing policies?

  • Measure the performance of SAC, TD3, and PPO with and without OFENet

– No changes to the hyperparameters of each algorithm (a wiring sketch follows below)

  • Compare to the closest work: ML-DDPG [Munk2016]

– ML-DDPG reduces the dimension of the observation to one third of its original size

[Diagram: three input pipelines into the policy π: the raw observation o_t (standard), the OFENet representation z_{o_t}, and the ML-DDPG feature extractor output.]
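As a hedged illustration (reusing the Slide 5 sketch; agent stands in for any off-the-shelf SAC/TD3/PPO implementation), only the policy's input changes.

```python
# The RL agent itself is untouched; OFENet just changes what the policy sees (sketch).
def act(o):
    z_o = torch.cat([o, phi_o(o)], dim=-1)  # OFENet representation z_{o_t}
    return agent.policy(z_o)                # hypothetical unmodified agent
```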

SLIDE 12

More sample-efficient and better-performing policies?

  • OFENet improves sample efficiency and returns without changing any hyperparameters
  • OFENet effectively learns meaningful features

[Plots: learning curves on SAC, TD3, and PPO comparing OFE (ours) against the original algorithms, plus the ML-SAC (1/3) and ML-SAC (OFE-like) baselines.]

SLIDE 13

What leads to the performance gain?

  • Just increasing the network size doesn't improve performance

SLIDE 14

What leads to the performance gain? (cont.)

  • BN stabilizes training

SLIDE 15

What leads to the performance gain? (cont.)

  • Decoupling feature extraction and the control policy is important
  • Online SRL handles the unknown distribution encountered during training

SLIDE 16

Conclusion

  • Proposed the Online Feature Extractor Network (OFENet)

– Provides a much higher-dimensional representation
– Demonstrated that OFENet can significantly accelerate RL

  • OFENet can be used as a new RL toolbox

– Just use OFENet as the base layer of existing RL algorithms (an end-to-end sketch follows below)
– No need to tune the hyperparameters of the original algorithms!
– Code link: www.merl.com/research/license/OFENet

Can increasing input dimensionality improve deep RL? Yes, it can!
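Putting the earlier sketches together, the online loop might be outlined as below; env, buffer, agent, and the update schedule are placeholders, and the authors' actual procedure (including any pretraining of the extractor) may differ.

```python
# Hedged outline: the auxiliary update is interleaved with the unchanged RL update.
o = env.reset()
for step in range(total_steps):
    a = act(o)                                  # policy on OFENet features (Slide 11 sketch)
    o_next, reward, done, _ = env.step(a)
    buffer.add(o, a, reward, o_next, done)
    batch = buffer.sample(256)                  # assumed batch size
    aux_update(batch.o, batch.a, batch.o_next)  # extractor trained online (Slide 6 sketch)
    agent.update(batch)                         # standard SAC/TD3/PPO update, unchanged
    o = env.reset() if done else o_next
```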