SLIDE 1

Trade and Manage Wealth with Deep RL and Memory

Daniel Egloff, Founder, CEO, Head of R&D | NVIDIA GTC 2018 | March 26, 2018

SLIDE 2

Problem

Retail investor demands:

  • Manage portfolio more actively
  • Get additional return from smart investment decisions

  • Market barriers: regulation limits access to hedge-fund-style active products
  • Costs: existing actively managed products charge high fees
  • Lack of products: current digital platforms focus on passively managed products

SLIDE 3

Question

Can modern AI technology replace a PM or trader?

SLIDE 4

Solution – AI Agents

Smart data-aware AI agents for active investment decisions

  • AI-powered alternative to smart beta ETFs
  • AI-supervised and automated trading strategies

  • Smart returns: extra return from smart, data-driven decisions
  • Save costs: fully automated, without human interaction
  • Save time: delegate work to smart agents; don't miss market opportunities

SLIDE 5

Market Validation

  • Intergenerational wealth transfer: 12tn of wealth transferring from the 1920/30 generation to the 1946/64 generation. Source: Burnmark report, April 2017.
  • Growth of smart beta ETFs: smart beta grew 30% in 2017. Source: EY ETF report 2017.
  • Digital channels and investor behavior change: growth of robo advisors and online brokers, reduced margins.
  • Growth of the ETF market: 90% of robo-advisors today are ETF-based. ETFs alone have run out of steam to fuel the next growth of robo advisors. Robo advisors need more than 6 years to make a profit on a customer, post acquisition. Source: Burnmark report, April 2017.

SLIDE 6

AI Foundations

Several recent innovations in AI and Big Data:

  • Deep Reinforcement Learning
  • Differentiable Neural Computer
  • Large-scale data streaming infrastructures from eCommerce
SLIDE 7

Classical Portfolio Construction

  • Information bottleneck
  • Signal based, not end-to-end
  • Partial and staged data usage
  • Many data sources cannot be integrated with current factor-based portfolio construction
  • Retrospective
  • Design, fit, deploy, re-engineer offline
  • Missing feedback link and online learning
  • Difficult to account for nonlinear dependencies
SLIDE 8

Supervised learning

Signal Based

[Diagram] Signal-based pipeline: the market state S_t feeds a forecasting system θ1; its signals pass through an information bottleneck into a trade rule system θ2, whose trades (net of transaction costs) yield the P&L/utility V(θ1, θ2). Errors propagate only within each stage's weights, and the environment evolves to the next market state S_{t+1}.
SLIDE 9

Reinforcement Learning Based

  • Full information in portfolio weights and trades
  • Feedback loop to improve on good decisions and avoid unsuccessful ones
  • Allows for more realistic modeling of the intelligence of a successful PM or trader
  • Much simpler end-to-end process
SLIDE 10

Reinforcement Learning Based

[Diagram] RL-based pipeline: the market state S_t feeds a single trade system θ1 that outputs weights/trades directly; the P&L/utility V(θ1) (net of transaction costs) is fed back, with delay, as the reinforcement learning signal, and the environment evolves to the next market state S_{t+1}.
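As a hedged reading of the diagram, the end-to-end objective can be written (the weight notation w, returns r, utility U, and cost rate c are assumptions, not from the slides) as:

V(θ) = E[ Σ_t U( w_t(θ) · r_{t+1} − c · |w_t(θ) − w_{t−1}(θ)| ) ]

where w_t(θ) are the weights the trade system emits in state S_t and r_{t+1} are the next-period asset returns; reinforcement learning improves V(θ) directly, with no separate forecasting stage.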

SLIDE 11

AI Trading Agents

[Diagram] Agent configuration and platform architecture:

  • Agent spec: universe (AAPL, GOOG, BAC, JMP, AMZN, WMT, VIX), data (price, LOB, news), frequency (1h), objective (excess return at medium risk), strategy & style, "my performance" and "I learn from" panels
  • Training data: historic and live scenarios (bull, bear, crash), each with summary stats
  • Architecture: streaming, online, and batch architectures connecting initial training, online learning, the AI agent, and brokers & exchanges

PaaS to design, train, test, deploy and run agents

SLIDE 12

Challenges and Insights

SLIDE 13

Reinforcement Learning Setup

Learning a behavioral strategy that maximizes the long-term sum of rewards through direct interaction with an unknown and uncertain environment

[Diagram] Agent and environment exchange state, action, and reward:

While not terminal do:
  • Agent perceives state s_t
  • Agent performs action a_t
  • Agent receives reward r_t
  • Environment evolves to state s_{t+1}
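A minimal sketch of this loop in code on a toy price environment; TradingEnv and RandomAgent are illustrative stand-ins, not from the talk:

```python
import numpy as np

class TradingEnv:
    """Toy environment: state = current return, reward = position * next return."""
    def __init__(self, n_steps=100, seed=0):
        self.returns = np.random.default_rng(seed).normal(0.0, 0.01, n_steps)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.returns[0]

    def step(self, action):
        self.t += 1
        reward = action * self.returns[self.t]        # P&L of the held position
        done = self.t == len(self.returns) - 1
        return self.returns[self.t], reward, done

class RandomAgent:
    def __init__(self, seed=1):
        self.rng = np.random.default_rng(seed)

    def act(self, state):
        return self.rng.choice([-1.0, 0.0, 1.0])      # short, flat, long

env, agent = TradingEnv(), RandomAgent()
state, done, episode_reward = env.reset(), False, 0.0
while not done:                                       # "while not terminal do"
    action = agent.act(state)                         # agent performs action a_t
    state, reward, done = env.step(action)            # environment evolves to s_{t+1}
    episode_reward += reward                          # agent receives reward r_t
```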

SLIDE 14

RL - State

Environment state

  • What is the market state?
  • Which data are required?
  • Price data: top of the book
  • LOB L1, L2
  • LOB messages
  • Secondary and non-standard data
  • Event data versus time-clocked data
  • How to combine agent and market state into the environment state?

SLIDE 15

RL - Policy

Agent policy specification

  • What is the agent action?
  • Continuous action for percentages of wealth (see the sketch below)
  • Discrete units of lots to buy/sell
  • Order implementation using market/limit orders
  • Long only vs. long/short?
  • Long-only agents do not face bankruptcy
  • Short positions can lead to bankruptcy
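A sketch of these action parameterizations; the function names and the lot-rounding rule are illustrative assumptions, not from the talk:

```python
import numpy as np

def long_only_weights(logits):
    """Continuous action: percentages of wealth on the simplex (long only).
    Softmax keeps every weight >= 0 and summing to 1, so shorts cannot bankrupt."""
    z = np.exp(logits - logits.max())
    return z / z.sum()

def long_short_weights(scores, gross_leverage=1.0):
    """Long/short action: signed weights normalized to a gross-leverage budget.
    Short legs are possible, so losses can in principle exceed wealth."""
    return gross_leverage * scores / np.abs(scores).sum()

def lots_from_weights(weights, wealth, prices, lot_size=100):
    """Discrete action: round target weights into whole lots to buy/sell."""
    target_shares = weights * wealth / prices
    return np.round(target_shares / lot_size).astype(int) * lot_size

logits = np.array([0.2, -0.1, 0.5])
w = long_only_weights(logits)
lots = lots_from_weights(w, wealth=1e6, prices=np.array([170.0, 1050.0, 30.0]))
```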

SLIDE 16

RL - Policy

Distributions on the simplex

  • Commonly known distributions (Dirichlet, …) are not appropriate
  • Exploit the less known Hilbert space structure on the (open) simplex, which leads to an isometry with Euclidean space (Aitchison); see the sketch below
  • Isometry: pull back the normal distribution, Student-t, etc. from Euclidean space to the simplex
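A minimal sketch of the pull-back via the isometric log-ratio (ilr) transform of Aitchison geometry; the Helmert-style basis and the choice of a standard normal are illustrative:

```python
import numpy as np

def ilr_basis(D):
    """Orthonormal (Helmert-style) basis of the clr hyperplane in R^D."""
    V = np.zeros((D, D - 1))
    for i in range(D - 1):
        V[: i + 1, i] = 1.0 / (i + 1)
        V[i + 1, i] = -1.0
        V[:, i] *= np.sqrt((i + 1) / (i + 2))
    return V

def ilr_inverse(z, V):
    """Pull a point from Euclidean R^(D-1) back to the open simplex."""
    x = np.exp(z @ V.T)                       # inverse clr, up to closure
    return x / x.sum(axis=-1, keepdims=True)  # closure: renormalize to sum 1

rng = np.random.default_rng(0)
D = 5                                         # five assets
V = ilr_basis(D)
z = rng.standard_normal((1000, D - 1))        # standard normal in ilr coordinates
weights = ilr_inverse(z, V)                   # strictly positive, rows sum to 1
```

Sampling a Student-t (or any other Euclidean distribution) in ilr coordinates and applying the same ilr_inverse gives the corresponding pulled-back distribution on the simplex.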

SLIDE 17

RL - Interaction

Interaction of agent and environment

  • Market evolution, LOB resilience
  • Temporary and permanent market impact
  • Position change
  • Order cancellations
  • Partial fills

[Diagram] One interaction step: from the state (market prices, agent positions), the policy emits target trades; market liquidity determines the executed and filled trades; market evolution and impact produce the new state (market prices, agent positions) at the next time step.
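A sketch of one such transition with liquidity, partial fills, and temporary/permanent impact; the coefficients and the fill rule are illustrative assumptions, as the talk does not spell out a concrete impact model:

```python
import numpy as np

def env_step(price, position, target_trade, rng,
             depth=10_000, perm_impact=1e-7, temp_impact=5e-7):
    liquidity = depth * rng.uniform(0.5, 1.5)            # size available this step
    filled = float(np.clip(target_trade, -liquidity, liquidity))  # partial fill
    exec_price = price * (1 + temp_impact * filled)      # temporary impact on our fill
    new_price = price * (1 + perm_impact * filled)       # permanent impact on the mid
    new_price *= np.exp(rng.normal(0.0, 0.001))          # exogenous market evolution
    return new_price, position + filled, filled, filled * exec_price

rng = np.random.default_rng(0)
price, pos = 100.0, 0.0
price, pos, filled, cash_spent = env_step(price, pos, target_trade=20_000, rng=rng)
```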

SLIDE 18

Our 6 Main Challenges

  • Sparse trading, learn to use cash and wait for opportunities
  • Robustness of RL
  • Scaling up RL training
  • Handling high-resolution event time series data
  • Adapting agents to changing markets while not forgetting
  • Explaining agent decisions and behavior
SLIDE 19

Sparse Trading

  • Reward modelling, including realistic transaction cost modelling (see the sketch below)
  • Adding risk to give cash a value
  • Properly balance risk and reward
  • Combining tree search and RL, or an options framework, to learn to postpone trading
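A sketch of a reward that makes sparse trading rational: a cost term penalizes churn and a risk term gives cash a value. The functional form and all coefficients are assumptions for illustration:

```python
import numpy as np

def step_reward(weights, prev_weights, asset_returns,
                tc=0.0005, risk_aversion=2.0):
    pnl = float(weights @ asset_returns)              # portfolio return this step
    turnover = np.abs(weights - prev_weights).sum()   # fraction of wealth traded
    risk = risk_aversion * pnl ** 2                   # crude variance proxy
    return pnl - tc * turnover - risk

# Holding cash (weights near zero) scores zero, which beats paying costs
# for low-conviction trades, so the agent can learn to wait for opportunities.
w_prev = np.zeros(3)
w_new = np.array([0.2, 0.0, 0.1])
r = np.array([0.004, -0.002, 0.001])
print(step_reward(w_new, w_prev, r))
```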

SLIDE 20

Robustness

  • Reward modelling
  • Very long history
  • Looking at different scales of time series
  • Training on synthesized data, e.g. reconstructing prices from skewed sampling of the empirical return distribution (see the sketch below)
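A sketch of synthesizing training prices by resampling empirical returns with a skewed weighting (here, oversampling the left tail); the weighting scheme is an assumption for illustration, as the talk only names the idea:

```python
import numpy as np

def synthesize_prices(prices, n_steps, tail_boost=3.0, seed=0):
    rng = np.random.default_rng(seed)
    rets = np.diff(np.log(prices))                  # empirical log returns
    probs = np.where(rets < np.quantile(rets, 0.1), tail_boost, 1.0)
    probs = probs / probs.sum()                     # skewed sampling weights
    sampled = rng.choice(rets, size=n_steps, p=probs, replace=True)
    return prices[-1] * np.exp(np.cumsum(sampled))  # reconstructed price path

history = 100 * np.exp(np.cumsum(np.random.default_rng(1).normal(0, 0.01, 500)))
synthetic = synthesize_prices(history, n_steps=1000)
```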

SLIDE 21

Scaling up RL

  • Breaking long episodes into partial episodes with differentiable memory (see the sketch below)

[Diagram] Partial rollouts: environment states are split into partial states; a DNC estimates the initial state of each partial episode, and the policy rolls out state-action pairs from that estimate.
SLIDE 22

High Resolution Event TS

  • New hybrid RNN-CNN network topology (see the sketch below)
  • Properly apply convolution over time and cross section
  • The cross section should be permutation invariant!
  • Convolution at different time frequencies
  • Residual NN

OHLC bars are too simplistic
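A minimal PyTorch sketch of such a topology, assuming a GRU for the RNN part and a residual temporal convolution; all layer sizes are illustrative. Convolution weights are shared across assets and pooling is a mean, so the cross section stays permutation invariant:

```python
import torch
import torch.nn as nn

class TimeConvRNN(nn.Module):
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.conv1 = nn.Conv1d(n_features, hidden, kernel_size=5, padding=2)
        self.conv2 = nn.Conv1d(hidden, hidden, kernel_size=5, padding=2)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, x):                     # x: (batch, assets, time, features)
        b, a, t, f = x.shape
        x = x.reshape(b * a, t, f).transpose(1, 2)   # fold assets into the batch
        h = torch.relu(self.conv1(x))                # convolution over time only
        h = torch.relu(h + self.conv2(h))            # residual conv block
        h, _ = self.rnn(h.transpose(1, 2))           # RNN over the conv features
        h = h[:, -1].reshape(b, a, -1)               # last hidden state per asset
        return h.mean(dim=1)                         # permutation-invariant pool

net = TimeConvRNN(n_features=4)
out = net(torch.randn(2, 7, 128, 4))                 # 7 assets, 128 time steps
```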

SLIDE 23

Adapting while not Forgetting

  • New attention mechanism: attention relative to a prior attention, with a penalty (see the sketch below)
  • Prior attention reflects "agent style"
  • Prioritization of new data

[Diagram] History vs. new data: important times are marked in the prior attention.
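A sketch of attention regularized toward a prior attention distribution: a KL penalty keeps the agent's "style" (the prior) while new data can still shift attention. The penalty form and the beta weight are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def attention_with_prior(query, keys, values, prior_attn, beta=1.0):
    scores = keys @ query / keys.shape[-1] ** 0.5       # (T,) attention scores
    attn = F.softmax(scores, dim=-1)                    # new attention weights
    kl = torch.sum(attn * (torch.log(attn + 1e-9)
                           - torch.log(prior_attn + 1e-9)))
    return attn @ values, beta * kl                     # add the penalty to the loss

T, d = 64, 16
keys, values, query = torch.randn(T, d), torch.randn(T, d), torch.randn(d)
prior = F.softmax(torch.randn(T), dim=-1)               # prior marks important times
context, penalty = attention_with_prior(query, keys, values, prior)
```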
SLIDE 24

Explaining Agent Decisions

  • Learning a supervised model to explain agent returns (see the sketch after this list)
  • Compare to different ETF and investment products following a specific investment style:
  • Value
  • Growth
  • Momentum
  • Mean reversion
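A minimal sketch of such a supervised explanation as a style regression: regress agent returns on the returns of style products; the exposures (betas) explain the behavior and the intercept is unexplained alpha. Factor data here are simulated placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 250
styles = ["value", "growth", "momentum", "mean_reversion"]
factor_returns = rng.normal(0, 0.01, (T, len(styles)))     # placeholder factors
agent_returns = factor_returns @ np.array([0.1, 0.0, 0.6, 0.2]) \
    + rng.normal(0, 0.002, T)                              # toy agent P&L

X = np.column_stack([np.ones(T), factor_returns])          # intercept = alpha
coef, *_ = np.linalg.lstsq(X, agent_returns, rcond=None)
alpha, betas = coef[0], coef[1:]
for name, b in zip(styles, betas):
    print(f"{name:15s} beta = {b:+.2f}")                   # style exposures
print(f"unexplained alpha = {alpha:+.5f}")
```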
SLIDE 25

Contact Info

www.flink.ai | daniel@flink.ai