Trade and Manage Wealth with Deep RL and Memory
NVIDIA GTC 2018, March 26, 2018
Daniel Egloff, Founder, CEO, Head R&D
Problem
Retail investor customer demands
- Manage portfolio more actively
- Get additional return from smart investment decisions
- Market barriers: regulation limits access to hedge-fund-style active products
- Costs: existing actively managed products charge high fees
- Lack of products: current digital platforms focus on passively managed products
Question
Can modern AI technology replace a PM or trader?
Solution: AI Agents
Smart data-aware AI agents for active investment decisions
- AI powered alternative to smart beta ETFs
- AI supervised and automated trading strategies
- Fully automated without human interaction; don't miss market opportunities
- Save time: delegate work to smart agents
- Smart returns: extra return from smart, data-driven decisions
- Save costs
Market Validation
- Intergenerational wealth transfer
- Digital channels
- Growth of smart beta ETFs
- Investor behavior change
- Smart beta: 30% growth in 2017. Source: EY ETF report 2017.
- Reduced margins: robo advisors and the growth of online brokers.
- Growth of the ETF market: 90% of robo-advisors today are ETF-based. ETFs alone have run out of steam to fuel the next growth of robo advisors. Robo advisors need more than 6 years to make a profit off a customer, post acquisition. Source: Burnmark report, April 2017.
- 12tn of wealth transferring from the 1920/30 generation to the 1946/64 generation. Source: Burnmark report, April 2017.
AI Foundations
- Several recent innovations in AI and Big Data
- Deep Reinforcement Learning
- Differentiable Neural Computer
- Large scale data streaming infrastructures from eCommerce
Classical Portfolio Construction
- Information bottleneck
- Signal-based, not end-to-end
- Partial and staged data usage
- Many data sources cannot be integrated with current factor-based portfolio construction
- Retrospective
- Design, fit, deploy, re-engineer offline
- Missing feedback link and online learning
- Difficult to account for nonlinear dependencies
Supervised Learning: Signal Based
[Diagram: the market state St feeds a forecasting system θ1, whose signals feed a trade rule system θ2; the resulting trades incur transaction costs and produce P&L/utility U(θ1, θ2). The error signal flows back only through the signals, creating an information bottleneck, and the portfolio weights carry over into the next market state St+1.]
Reinforcement Learning Based
- Full information in portfolio weights and trades
- Feedback loop to improve on good decisions and
avoid unsuccessful decisions
- Allows for more realistic modeling of intelligence of a
successful PM or trader
- Much easier process
Reinforcement Learning Based
[Diagram: the market state St feeds a trade system θ1 that outputs weights/trades directly; after transaction costs they produce P&L/utility U(θ1), which is fed back with a delay as the reinforcement learning signal to update θ1, and the market evolves to the next state St+1.]
AI Trading Agents
[Mock-up of an agent specification card: performance and learning sources; frequency (1h); universe (AAPL, GOOG, BAC, JMP, AMZN, WMT, VIX); data sources (price, LOB, news); objective (excess return, medium risk); training data and scenarios (historic and live; bull, bear, and crash statistics); strategy & style.]
Streaming architecture
[Diagram: a streaming architecture connects the AI agent (initial training, online learning) to brokers & exchanges through online and batch processing paths.]
PaaS to design, train, test, deploy and run agents
Challenges and Insights
Reinforcement Learning Setup
Learning a behavioral strategy that maximizes the long-term sum of rewards through direct interaction with an unknown and uncertain environment
[Diagram: agent-environment loop exchanging state, action, and reward.]
While not terminal do:
- Agent perceives state st
- Agent performs action at
- Agent receives reward rt
- Environment evolves to state st+1
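As a minimal sketch, this loop in Python (assuming a Gym-style env with reset/step and a hypothetical agent object; not the speaker's actual code):

# Generic agent-environment interaction loop (Gym-style interface assumed).
def run_episode(env, agent):
    s = env.reset()                     # agent perceives initial state s_t
    total_reward = 0.0
    done = False
    while not done:                     # while not terminal
        a = agent.act(s)                # agent performs action a_t
        s_next, r, done = env.step(a)   # reward r_t, environment evolves to s_{t+1}
        agent.observe(s, a, r, s_next)  # feedback used for learning
        total_reward += r
        s = s_next
    return total_reward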
RL - State
Environment state
- What is the market state?
- Which data are required?
- Price data: top of the book
- LOB levels L1, L2
- LOB messages
- Secondary and non-standard data
- Event data versus time-clocked data
- How to combine agent and market state into the environment state?
RL - Policy
Agent policy specification
- What is the agent action?
- Continuous action for percentages of wealth
- Discrete units of lots to buy/sell
- Order implementation using market/limit orders
- Long only vs. long/short?
- Long-only agents do not face bankruptcy
- Short positions can lead to bankruptcy
RL - Policy
Distributions on the simplex
- Commonly known distributions (Dirichlet, …) are not appropriate
- Exploit the less well-known Hilbert space structure on the (open) simplex, which leads to an isometry to Euclidean space (Aitchison)
Isometry: pull back the normal distribution, Student-t, etc. (sketched below)
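A minimal numpy sketch of this construction, assuming the standard isometric log-ratio (ilr) basis from Aitchison geometry; sampling a Gaussian in Euclidean coordinates and pulling it back yields strictly positive portfolio weights that sum to one:

import numpy as np

def ilr_basis(D):
    # Orthonormal basis of the zero-sum hyperplane in R^D (columns).
    V = np.zeros((D, D - 1))
    for i in range(1, D):
        V[:i, i - 1] = 1.0 / i
        V[i, i - 1] = -1.0
        V[:, i - 1] *= np.sqrt(i / (i + 1))
    return V

def inv_ilr(z, V):
    # Pull a point z in R^(D-1) back to the open simplex.
    y = np.exp(V @ z)
    return y / y.sum()                # closure: normalize to weights

D = 5                                 # number of assets
V = ilr_basis(D)
rng = np.random.default_rng(0)
z = rng.standard_normal(D - 1)        # N(0, I) in Euclidean coordinates
w = inv_ilr(z, V)                     # positive weights summing to 1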
Interaction of agent and environment
- Market evolution, LOB resilience
- Temporal and permanent market impact
- Position change
- Order cancellations
- Partial fills
RL - Interaction
[Diagram: the state (market prices, agent positions) feeds the policy, which emits target trades; execution against market liquidity yields the filled/executed trades; market evolution and impact then produce new market prices and agent positions, i.e. the new state, as time advances.]
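A toy sketch of one such environment step in Python; the liquidity cap, impact coefficient, and noise term are illustrative assumptions, not the production market model:

import numpy as np

class MarketEnv:
    # Toy market: partial fills against a liquidity cap, linear permanent
    # impact, and random exogenous price evolution.
    def __init__(self, prices0, liquidity, impact=1e-4, seed=0):
        self.p = np.asarray(prices0, dtype=float)  # current market prices
        self.liq = liquidity                       # max shares fillable per step
        self.impact = impact
        self.pos = np.zeros_like(self.p)           # agent positions
        self.rng = np.random.default_rng(seed)

    def step(self, target_trades):
        # Partial fill: clip target trades to available liquidity.
        filled = np.clip(target_trades, -self.liq, self.liq)
        self.pos += filled
        # Permanent impact plus exogenous market evolution.
        noise = 0.01 * self.rng.standard_normal(self.p.shape)
        self.p *= 1.0 + self.impact * filled + noise
        return (self.p.copy(), self.pos.copy()), filled  # new state, executed trades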
Our 6 Main Challenges
- Sparse trading, learn to use cash and wait for opportunities
- Robustness of RL
- Scaling up RL training
- Handling high resolution event time series data
- Adapting agents to changing markets while not forgetting
- Explaining agent decisions and behavior
Sparse Trading
- Reward modelling, including realistic transaction cost modelling
- Adding risk to give cash a value
- Properly balance risk and reward
- Combining tree search and RL, or the options framework, to learn to postpone trading (see the reward sketch after this list)
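A minimal sketch of such a reward, assuming proportional transaction costs and a volatility-based risk penalty; both rates are illustrative, and it is the risk term that gives idle cash a value and discourages overtrading:

import numpy as np

def reward(pnl, trades, prices, returns_window, tc_rate=5e-4, risk_aversion=0.1):
    # P&L net of proportional transaction costs, minus a risk penalty.
    costs = tc_rate * np.sum(np.abs(trades) * prices)
    risk = risk_aversion * np.std(returns_window)   # e.g. rolling volatility
    return pnl - costs - risk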
Robustness
- Reward modelling
- Very long history
- Looking at different scales of time series
- Training on synthesized data, e.g. reconstructing prices from skewed sampling of the empirical return distribution (sketched below)
- Breaking long episodes into partial episodes with differentiable memory
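A numpy sketch of this kind of data synthesis; the overweighting of negative returns is one illustrative choice of sampling bias:

import numpy as np

def synthetic_prices(prices, n_paths=100, tilt=0.5, seed=0):
    # Resample empirical log returns with a skewed weighting to
    # synthesize new price paths for training.
    rng = np.random.default_rng(seed)
    prices = np.asarray(prices, dtype=float)
    r = np.diff(np.log(prices))
    w = np.where(r < 0, 1.0 + tilt, 1.0)   # overweight the negative tail
    w = w / w.sum()
    idx = rng.choice(len(r), size=(n_paths, len(r)), p=w)
    return prices[0] * np.exp(np.cumsum(r[idx], axis=1))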
Scaling up RL
[Diagram: long episodes are broken into partial rollouts; a DNC consumes partial states to estimate the initial state of each rollout, from which the policy maps states s to actions a while the environment reacts between rollouts.]
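A schematic sketch of partial rollouts with carried memory; the policy.rollout interface and the memory object are assumptions standing in for the actual DNC machinery:

def partial_rollouts(episode, chunk_len, policy, initial_memory):
    # Break one long episode into partial rollouts; the memory state
    # (e.g. a DNC's read/write state) carried across chunks serves as
    # the estimated initial state of each partial rollout.
    memory = initial_memory
    rollouts = []
    for start in range(0, len(episode), chunk_len):
        chunk = episode[start:start + chunk_len]
        transitions, memory = policy.rollout(chunk, memory)  # (s, a, r) list + new memory
        rollouts.append(transitions)
    return rollouts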
High Resolution Event TS
- New hybrid RNN-CNN network topology
- Properly apply convolution over time and cross section
- Cross section should be permutation invariant!
- Convolution at different time frequencies
- Residual NN
- OHLC bars are too simplistic (see the convolution sketch below)
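A PyTorch sketch of one way to achieve permutation invariance: a shared temporal convolution per asset followed by symmetric pooling across the cross section, so the output does not depend on asset ordering; layer sizes are illustrative:

import torch
import torch.nn as nn

class CrossSectionConv(nn.Module):
    # Shared temporal conv per asset + symmetric pooling over assets.
    def __init__(self, in_ch=1, hidden=16, kernel=5):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, hidden, kernel, padding=kernel // 2)

    def forward(self, x):
        # x: (batch, assets, channels, time)
        b, a, c, t = x.shape
        h = self.conv(x.reshape(b * a, c, t))  # same weights for every asset
        h = h.reshape(b, a, -1, t)
        return h.mean(dim=1)                   # permutation-invariant pooling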
Adapting while not Forgetting
- New attention mechanism relative to the prior attention, with a penalty
- Prior attention reflects the "agent style"
[Diagram: important times marked in the prior attention drive the prioritization of new data as the history grows.]
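One hedged sketch of such a mechanism in PyTorch: penalize the divergence of the new attention from the prior attention pattern; the exact penalty used in the talk is not specified:

import torch.nn.functional as F

def style_regularized_attention(scores, prior_attention, penalty=1.0):
    # Softmax attention plus a KL penalty keeping the new attention
    # close to the prior "agent style" pattern.
    attn = F.softmax(scores, dim=-1)
    kl = F.kl_div(attn.log(), prior_attention, reduction='batchmean')
    return attn, penalty * kl   # attention weights and extra loss term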
Explaining Agent Decisions
- Learning a supervised model to explain agent returns (a regression sketch follows the list)
- Compare to different ETFs and investment products following a specific investment style:
- Value
- Growth
- Momentum
- Mean reversion
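A least-squares sketch of such an explanation model in numpy; the style factor returns themselves (value, growth, momentum, mean reversion) are assumed given:

import numpy as np

def style_exposures(agent_returns, style_returns):
    # Regress agent returns on style-factor returns to explain behavior.
    X = np.column_stack([np.ones(len(agent_returns)), style_returns])
    beta, *_ = np.linalg.lstsq(X, agent_returns, rcond=None)
    return beta[0], beta[1:]   # alpha, style exposures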