

  1. Solving High-dimensional PDEs Using Deep Learning. Jiequn Han, The Program in Applied & Computational Mathematics, Princeton University. Joint work with Weinan E and Arnulf Jentzen. Inverse Problems and Machine Learning, Caltech, February 9, 2018.

  2. Outline
     1. Introduction
     2. Mathematical Formulation
     3. Neural Network Approximation
     4. Numerical Examples
     5. Summary

  4. Well-known Examples of PDEs
  • The Schrödinger equation in the quantum many-body problem:
    $$i\hbar\,\frac{\partial}{\partial t}\Psi(t,x) = \Big(-\frac{1}{2}\Delta + V\Big)\Psi(t,x).$$
  • The Black-Scholes equation for pricing financial derivatives:
    $$v_t + \frac{1}{2}\mathrm{Tr}\big(\sigma\sigma^T(\mathrm{Hess}_x v)\big) + r\,\nabla v\cdot x - rv = 0.$$
  • The Hamilton-Jacobi-Bellman (HJB) equation in stochastic control (dynamic programming):
    $$v_t + \max_{u}\Big\{\frac{1}{2}\mathrm{Tr}\big(\sigma\sigma^T(\mathrm{Hess}_x v)\big) + \nabla v\cdot b + f\Big\} = 0.$$

  5. Curse of Dimensionality
  • The dimension of PDEs can easily be large in practice:
       Equation                   Dimension (roughly)
       Schrödinger equation       # of electrons × 3
       Black-Scholes equation     # of underlying financial assets
       HJB equation               the same as the state space
  • A key computational challenge is the curse of dimensionality: the complexity of finite difference/element methods grows exponentially in the dimension d, making them usually infeasible for d ≥ 4.
  • There is a huge gap between PDE models and computational algorithms.
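A back-of-the-envelope illustration (added here, not on the slide) of how fast a tensor-product grid grows with dimension:

```python
# A grid-based method needs roughly N^d points for N points per dimension.
# Even a coarse N = 100 becomes astronomically large long before d = 100.
for d in (1, 2, 3, 4, 10, 100):
    print(f"d = {d:3d}: 100^{d} = 1e{2 * d} grid points")
```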

  6. Remarkable Success of Deep Learning
  • Machine learning/data analysis faces the same curse of dimensionality.
  • In recent years, deep learning has achieved remarkable success on such high-dimensional problems.
  • An old but essential idea: represent functions in a compositional form rather than an additive one.

  7. Related Work in the High-dimensional Case
  • Linear parabolic PDEs: Monte Carlo methods based on the Feynman-Kac formula.
  • Semilinear parabolic PDEs:
    1. the branching diffusion approach (Henry-Labordère 2012, Henry-Labordère et al. 2014)
    2. multilevel Picard approximation (E et al. 2016)
  • Hamilton-Jacobi PDEs: the Hopf formula combined with fast convex/nonconvex optimization methods (Darbon & Osher 2016, Chow et al. 2017).

  9. Semilinear Parabolic PDE
  We consider a general semilinear parabolic PDE on $[0,T] \times \mathbb{R}^d$:
  $$\frac{\partial u}{\partial t}(t,x) + \frac{1}{2}\mathrm{Tr}\big(\sigma\sigma^T(t,x)\,(\mathrm{Hess}_x u)(t,x)\big) + \nabla u(t,x)\cdot\mu(t,x) + f\big(t,x,u(t,x),\sigma^T(t,x)\nabla u(t,x)\big) = 0.$$
  • The terminal condition is given: $u(T,x) = g(x)$.
  • To fix ideas, we are interested in the solution at $t = 0$, $x = \xi$ for some vector $\xi \in \mathbb{R}^d$.
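A special case worth keeping in mind (added for context, not on the slide): taking $f \equiv 0$ gives a linear parabolic PDE, whose solution admits the classical Feynman-Kac representation
$$u(0,\xi) = \mathbb{E}\big[g(X_T)\big], \qquad X_t = \xi + \int_0^t \mu(s,X_s)\,ds + \int_0^t \sigma(s,X_s)\,dW_s.$$
This is what the Monte Carlo methods on the related-work slide exploit; the nonlinear term $f$ is precisely what makes plain Monte Carlo inapplicable and motivates the BSDE machinery below.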

  10. Connection between PDE and BSDE
  • The link between parabolic PDEs and backward stochastic differential equations (BSDEs) has been extensively investigated (Pardoux & Peng 1992, El Karoui et al. 1997, etc.).
  • In particular, Markovian BSDEs give a nonlinear Feynman-Kac representation of some nonlinear parabolic PDEs.
  • Consider the following BSDE:
  $$X_t = \xi + \int_0^t \mu(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s,$$
  $$Y_t = g(X_T) + \int_t^T f(s, X_s, Y_s, Z_s)\,ds - \int_t^T (Z_s)^T\,dW_s.$$
  The solution is an adapted process $\{(X_t, Y_t, Z_t)\}_{t\in[0,T]}$ with values in $\mathbb{R}^d \times \mathbb{R} \times \mathbb{R}^d$.

  11. Connection between PDE and BSDE
  • Under suitable regularity assumptions, the BSDE is well-posed and related to the PDE in the sense that for all $t \in [0,T]$ it holds a.s. that
  $$Y_t = u(t, X_t) \quad\text{and}\quad Z_t = \sigma^T(t, X_t)\,\nabla u(t, X_t).$$
  • In other words, given the stochastic process satisfying
  $$X_t = \xi + \int_0^t \mu(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s,$$
  the solution of the PDE satisfies the following SDE (by Itô's formula applied to $u(t, X_t)$, with the PDE cancelling the drift terms):
  $$u(t, X_t) - u(0, X_0) = -\int_0^t f\big(s, X_s, u(s, X_s), \sigma^T(s, X_s)\nabla u(s, X_s)\big)\,ds + \int_0^t [\nabla u(s, X_s)]^T\,\sigma(s, X_s)\,dW_s.$$

  12. BSDE and Control – An LQG Example
  Consider a classical linear-quadratic-Gaussian (LQG) control problem in $\mathbb{R}^d$:
  $$dX_t = 2\sqrt{\lambda}\,m_t\,dt + \sqrt{2}\,dW_t,$$
  with cost functional $J(\{m_t\}_{0\le t\le T}) = \mathbb{E}\big[\int_0^T \|m_t\|_2^2\,dt + g(X_T)\big]$.
  The HJB equation for this problem is
  $$\frac{\partial u}{\partial t}(t,x) + \Delta u(t,x) - \lambda\,\|\nabla u(t,x)\|_2^2 = 0.$$
  The optimal control is given by $m_t^{*} = -\sqrt{\lambda}\,\nabla u(t,x)$ (recall $Z_t = \sigma^T(t,X_t)\,\nabla u(t,X_t) = \sqrt{2}\,\nabla u(t,X_t)$ here).
  In the context of BSDEs for control, $Y_t$ denotes the optimal value and $Z_t$ the optimal control (up to a constant scaling).
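To see how this HJB equation fits the general semilinear form of slide 9 (a short check, spelled out here rather than on the slide):
$$\mu \equiv 0, \qquad \sigma \equiv \sqrt{2}\,I_d \;\Longrightarrow\; \frac{1}{2}\mathrm{Tr}\big(\sigma\sigma^T\,\mathrm{Hess}_x u\big) = \Delta u,$$
$$z = \sigma^T\nabla u = \sqrt{2}\,\nabla u \;\Longrightarrow\; f(t,x,y,z) = -\lambda\,\|\nabla u\|_2^2 = -\frac{\lambda}{2}\,\|z\|_2^2.$$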

  14. Neural Network Approximation
  • Key step: approximate the function $x \mapsto \sigma^T(t,x)\nabla u(t,x)$ at each discretized time step $t = t_n$ by a feedforward neural network
  $$\sigma^T(t_n, X_{t_n})\,\nabla u(t_n, X_{t_n}) = (\sigma^T\nabla u)(t_n, X_{t_n}) \approx (\sigma^T\nabla u)(t_n, X_{t_n} \,|\, \theta_n),$$
  where $\theta_n$ denotes the neural network parameters.
  • Observation: we can stack all the subnetworks together to form a deep neural network (DNN) as a whole, based on the time discretization (see the next two slides).

  15. Time Discretization
  We consider the simple Euler scheme of the BSDE, with a partition of the time interval $[0,T]$, $0 = t_0 < t_1 < \ldots < t_N = T$:
  $$X_{t_{n+1}} - X_{t_n} \approx \mu(t_n, X_{t_n})\,\Delta t_n + \sigma(t_n, X_{t_n})\,\Delta W_n,$$
  and
  $$u(t_{n+1}, X_{t_{n+1}}) - u(t_n, X_{t_n}) \approx -f\big(t_n, X_{t_n}, u(t_n, X_{t_n}), \sigma^T(t_n, X_{t_n})\nabla u(t_n, X_{t_n})\big)\,\Delta t_n + [\nabla u(t_n, X_{t_n})]^T\,\sigma(t_n, X_{t_n})\,\Delta W_n,$$
  where $\Delta t_n = t_{n+1} - t_n$ and $\Delta W_n = W_{t_{n+1}} - W_{t_n}$.
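A minimal NumPy sketch of one step of this scheme, vectorized over sample paths; the function and variable names are illustrative (not from the slides), and `z` stands in for an approximation of $\sigma^T\nabla u(t_n, X_{t_n})$:

```python
import numpy as np

def euler_step(x, y, z, t, dt, mu, sigma_dw, f, rng):
    """One Euler step of the discretized FBSDE.

    Since Z = sigma^T grad u, the martingale term [grad u]^T sigma dW
    equals the inner product z . dW, so only z is needed for the Y update.
    """
    dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)           # Delta W_n
    y_next = y - f(t, x, y, z) * dt + np.sum(z * dw, axis=-1)
    x_next = x + mu(t, x) * dt + sigma_dw(t, x, dw)           # sigma(t,x) Delta W_n
    return x_next, y_next

# Toy usage in the LQG setting: mu = 0, sigma = sqrt(2) I, f = -(lambda/2) ||z||^2.
rng = np.random.default_rng(0)
lam, dt, d = 1.0, 1.0 / 20, 100
x, y, z = np.zeros((256, d)), np.zeros(256), np.zeros((256, d))
f = lambda t, x, y, z: -0.5 * lam * np.sum(z * z, axis=-1)
x, y = euler_step(x, y, z, 0.0, dt, lambda t, x: 0.0,
                  lambda t, x, dw: np.sqrt(2.0) * dw, f, rng)
```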

  16. Network Architecture
  Figure: Network architecture for solving parabolic PDEs. Each column corresponds to a subnetwork at time $t = t_n$. The whole network has $(H+2)(N-1)$ layers in total.

  17. Optimization
  • The network takes the paths $\{X_{t_n}\}_{0\le n\le N}$ and $\{W_{t_n}\}_{0\le n\le N}$ as input data and gives the final output, denoted by $\hat{u}(\{X_{t_n}\}_{0\le n\le N}, \{W_{t_n}\}_{0\le n\le N})$, as an approximation of $u(t_N, X_{t_N})$.
  • The mismatch with the given terminal condition defines the expected loss function
  $$\ell(\theta) = \mathbb{E}\Big[\big|g(X_{t_N}) - \hat{u}\big(\{X_{t_n}\}_{0\le n\le N}, \{W_{t_n}\}_{0\le n\le N}\big)\big|^2\Big].$$
  • The paths can be simulated easily, so the commonly used SGD algorithm fits this problem well (see the sketch below).
  • We call this methodology the deep BSDE method, since it uses the BSDE and a DNN as its essential tools.
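A schematic end-to-end training sketch in TensorFlow 2 (the deck's implementation, linked on the Implementation slide, uses TensorFlow 1.x; everything here, including the choices of f, g, d, N, and the learning rate, is an illustrative assumption based on the LQG example, not the authors' code):

```python
import numpy as np
import tensorflow as tf

d, N, T, lam = 10, 20, 1.0, 1.0      # illustrative sizes, not the slides' values
dt = T / N

def f(y, z):                          # LQG nonlinearity: f = -(lambda/2) ||z||^2
    return -0.5 * lam * tf.reduce_sum(tf.square(z), axis=1)

def g(x):                             # assumed terminal cost g(x) = ln((1+||x||^2)/2)
    return tf.math.log(0.5 * (1.0 + tf.reduce_sum(tf.square(x), axis=1)))

# One subnetwork per interior time step, approximating sigma^T grad u(t_n, .).
subnets = [tf.keras.Sequential([
               tf.keras.layers.Dense(d + 10, activation="relu"),
               tf.keras.layers.Dense(d + 10, activation="relu"),
               tf.keras.layers.Dense(d)]) for _ in range(N - 1)]
y0 = tf.Variable(1.0)                 # trainable guess of u(0, xi)
z0 = tf.Variable(tf.zeros([1, d]))    # trainable guess of sigma^T grad u(0, xi)
opt = tf.keras.optimizers.Adam(1e-2)

def loss_fn(batch=256):
    x = tf.zeros([batch, d])                         # xi = 0
    y = y0 * tf.ones([batch])
    z = tf.tile(z0, [batch, 1])
    for n in range(N):
        dw = tf.random.normal([batch, d], stddev=np.sqrt(dt))
        # Euler step for Y (note [grad u]^T sigma dW = Z^T dW):
        y = y - f(y, z) * dt + tf.reduce_sum(z * dw, axis=1)
        x = x + np.sqrt(2.0) * dw                    # mu = 0, sigma = sqrt(2) I
        if n < N - 1:
            z = subnets[n](x)
    return tf.reduce_mean(tf.square(g(x) - y))       # terminal mismatch

for step in range(500):
    with tf.GradientTape() as tape:
        loss = loss_fn()
    vars_ = [y0, z0] + [v for net in subnets for v in net.trainable_variables]
    opt.apply_gradients(zip(tape.gradient(loss, vars_), vars_))

print(float(y0))                      # approximation of u(0, (0, ..., 0))
```

After training, `y0` approximates the solution value at the point of interest; this is the quantity reported on the numerical-example slides.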

  18. Time Discretization as Skip Connections
  Why can such deep networks be trained? Intuition: there are skip connections between the different subnetworks:
  $$u(t_{n+1}, X_{t_{n+1}}) - u(t_n, X_{t_n}) \approx -f\big(t_n, X_{t_n}, u(t_n, X_{t_n}), (\sigma^T\nabla u)(t_n, X_{t_n} \,|\, \theta_n)\big)\,\Delta t_n + \big[(\sigma^T\nabla u)(t_n, X_{t_n} \,|\, \theta_n)\big]^T\,\Delta W_n.$$

  19. Analogy to Deep Reinforcement Learning
  • Deep reinforcement learning (DRL) has achieved great success in game domains and sophisticated control tasks. A common strategy is to represent the policy function (the control) by neural networks.
  • Recall that in the LQG control example, $Z_t$ denotes the optimal control, which is approximated by neural networks.
  Table: Informal analogy
     deep BSDE method             DRL
     BSDE                     ←→  Markov decision model
     gradient of the solution ←→  optimal policy function

  21. Implementation
  • Each subnetwork has 4 layers: 1 input layer ($d$-dimensional), 2 hidden layers (both $(d+10)$-dimensional), and 1 output layer ($d$-dimensional).
  • We choose the rectifier function (ReLU) as the activation function and optimize with the Adam method.
  • The method is implemented in TensorFlow, and all reported examples are run on a MacBook Pro.
  • GitHub: https://github.com/frankhan91/DeepBSDE

  22. LQG Example Revisited
  We solve the HJB equation introduced earlier in $[0,1] \times \mathbb{R}^{100}$. It admits an explicit formula, which allows an accuracy test:
  $$u(t,x) = -\frac{1}{\lambda}\,\ln\Big(\mathbb{E}\big[\exp\big(-\lambda\, g(x + \sqrt{2}\,W_{T-t})\big)\big]\Big).$$
  Figure: Left: relative error of the deep BSDE method for $u(t{=}0, x{=}(0,\ldots,0))$ when $\lambda = 1$, which reaches 0.17% in a runtime of 330 seconds. Right: optimal cost $u(t{=}0, x{=}(0,\ldots,0))$ against different $\lambda$ (deep BSDE solver vs. Monte Carlo).
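A quick NumPy sketch of how the explicit formula can be evaluated by Monte Carlo to produce reference values; the terminal cost `g` below is an assumption stated for concreteness, since the slide does not define it:

```python
import numpy as np

def u_explicit(t, x, lam, g, T=1.0, num_samples=200_000, seed=0):
    """Monte Carlo evaluation of u(t,x) = -(1/lam) ln E[exp(-lam g(x + sqrt(2) W_{T-t}))]."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, np.sqrt(T - t), size=(num_samples, x.shape[0]))  # W_{T-t}
    return -np.log(np.mean(np.exp(-lam * g(x + np.sqrt(2.0) * w)))) / lam

# Assumed terminal cost g(x) = ln((1 + ||x||^2) / 2), evaluated at x = 0 in d = 100.
g = lambda x: np.log(0.5 * (1.0 + np.sum(x * x, axis=1)))
print(u_explicit(0.0, np.zeros(100), lam=1.0, g=g))
```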

  23. Black-Scholes Equation with Default Risk
  • The classical Black-Scholes model can and should be augmented with factors that matter in real markets, including defaultable securities, transaction costs, uncertainty in the model parameters, etc.
  • Ideally, pricing models should take into account the whole basket of underlyings of a financial derivative, resulting in high-dimensional nonlinear PDEs.
  • To test the deep BSDE method, we study a special case of the recursive valuation model with default risk (Duffie et al. 1996, Bender et al. 2015).
