Solving High-dimensional PDEs Using Deep Learning Jiequn Han The - PowerPoint PPT Presentation

Solving High-dimensional PDEs Using Deep Learning Jiequn Han The Program in Applied & Computational Mathematics, Princeton University Joint work with Weinan E and Arnulf Jentzen Inverse Problems and Machine Learning, Caltech, February 9, 2018 1 / 32

Outline 1. Introduction 2. Mathematical Formulation 3. Neural Network Approximation 4. Numerical Examples 5. Summary 2 / 32

Table of Contents 1. Introduction 2. Mathematical Formulation 3. Neural Network Approximation 4. Numerical Examples 5. Summary 3 / 32

Well-known Examples of PDEs • The Schr¨ odinger equation in quantum many-body problem, i � ∂ ∂t Ψ( t, x ) = ( − 1 2∆ + V )Ψ( t, x ) . • The Black-Scholes equation for pricing financial derivatives, v t + 1 � � σσ T ( Hess x v ) 2 Tr + r ∇ v · x − rv = 0 . • The Hamilton-Jacobi-Bellman equation in stochastic control (dynamic programming), � 1 � � � σσ T ( Hess x v ) v t + max 2 Tr + ∇ v · b + f = 0 . u 4 / 32

Curse of Dimensionality • The dimension of PDEs can be easily large in practice. Equation Dimension (roughly) Schr¨ odinger equation # of electrons × 3 Black-Scholes equation # of underlying financial assets HJB equation the same as the state space • A key computational challenge is the curse of dimensionality: the complexity is exponential in dimension d for finite difference/element method – usually unavailable for d ≥ 4 . • There is a huge gap between PDE modelings and computational algorithms. 5 / 32

Remarkable Success of Deep Learning • Machine learning/data analysis also face the same curse of dimensionality • In recent years, deep learning has achieved remarkable success • An old but essential idea: represent functions in a compositional form rather than additive 6 / 32

Related Work in High-dimensional Case • Linear parabolic PDEs: Monte Carlo methods based on the Feynman-Kac formula • Semilinear parabolic PDEs: 1. branching diffusion approach (Henry-Labord` ere 2012, Henry-Labord` ere et al. 2014) 2. multilevel Picard approximation (E et al. 2016) • Hamilton-Jacobi PDEs: using Hopf formula and fast convex/nonconvex optimization methods (Darbon & Osher 2016, Chow et al. 2017) 7 / 32

Semilinear Parabolic PDE We consider a general semilinear parabolic PDE in [0 , T ] × R d : ∂u ∂t ( t, x ) + 1 � � σσ T ( t, x )( Hess x u )( t, x ) 2Tr + ∇ u ( t, x ) · µ ( t, x ) � = 0 . � t, x, u ( t, x ) , σ T ( t, x ) ∇ u ( t, x ) + f • Terminal condition is given: u ( T, x ) = g ( x ) . • To fix ideas, we are interested in the solution at t = 0 , x = ξ for some vector ξ ∈ R d . 9 / 32

Connection between PDE and BSDE • The link between parabolic PDEs and backward stochastic differential equations (BSDEs) has been extensively investigated (Pardoux & Peng 1992, El Karoui et al. 1997, etc). • In particular, Markovian BSDEs give a nonlinear Feynman-Kac representation of some nonlinear parabolic PDEs. • Consider the following BSDE � t � t  X t = ξ + µ ( s, X s ) ds + σ ( s, X s ) dW s ,     0 0 � T � T ( Z s ) T dW s ,  Y t = g ( X T ) + f ( s, X s , Y s , Z s ) ds −    t t The solution is an adapted process { ( X t , Y t , Z t ) } t ∈ [0 ,T ] with values in R d × R × R d . 10 / 32

Connection between PDE and BSDE • Under suitable regularity assumptions, the BSDE is well-posed and related to the PDE in the sense that for all t ∈ [0 , T ] it holds a.s. that Z t = σ T ( t, X t ) ∇ u ( t, X t ) . Y t = u ( t, X t ) and • In other words, given the stochastic process satisfying � t � t X t = ξ + µ ( s, X s ) ds + σ ( s, X s ) dW s , 0 0 the solution of PDE satisfies the following SDE u ( t, X t ) − u (0 , X 0 ) � t � ds � s, X s , u ( s, X s ) , σ T ( s, X s ) ∇ u ( s, X s ) = − f 0 � t [ ∇ u ( s, X s )] T σ ( s, X s ) dW s . + 0 11 / 32

BSDE and Control – A LQG Example Consider a classical linear-quadratic-Gaussian (LQG) control problem in R d : √ √ dX t = 2 λ m t dt + 2 dW t , � � T 0 � m t � 2 with cost functional J ( { m t } 0 ≤ t ≤ T ) = ❊ 2 dt + g ( X T ) � . The HJB equation for this problem is ∂u ∂t ( t, x ) + ∆ u ( t, x ) − λ �∇ u ( t, x ) � 2 2 = 0 . The optimal control is given by t = ∇ u ( t, x ) ( recall Z t = σ T ( t, X t ) ∇ u ( t, X t )) . m ∗ √ , 2 λ In the context of BSDE for control, Y t denotes the optimal value and Z t denotes the optimal control (up to a constant scaling). 12 / 32

Neural Network Approximation • Key step: approximate the function x �→ σ T ( t, x ) ∇ u ( t, x ) at each discretized time step t = t n by a feedforward neural network σ T ( t n , X t n ) ∇ u ( t n , X t n ) = ( σ T ∇ u )( t n , X t n ) ≈ ( σ T ∇ u )( t n , X t n | θ n ) , where θ n denotes neural network parameters. • Observation: we can stack all the subnetworks together to form a deep neural network (DNN) as a whole, based on the time discretization (see the next two slides). 14 / 32

Time Discretization We consider the simple Euler scheme of the BSDE, with a partition of the time interval [0 , T ] , 0 = t 0 < t 1 < . . . < t N = T : X t n +1 − X t n ≈ µ ( t n , X t n ) ∆ t n + σ ( t n , X t n ) ∆ W n , and u ( t n +1 , X t n +1 ) − u ( t n , X t n ) � ∆ t n � t n , X t n , u ( t n , X t n ) , σ T ( t n , X t n ) ∇ u ( t n , X t n ) ≈ − f + [ ∇ u ( t n , X t n )] T σ ( t n , X t n ) ∆ W n , where ∆ t n = t n +1 − t n , ∆ W n = W t n +1 − W t n . 15 / 32

Network Architecture Figure: Network architecture for solving parabolic PDEs. Each column corresponds to a subnetwork at time t = t n . The whole network has ( H + 2)( N − 1) layers in total. 16 / 32

Optimization • This network takes the paths { X t n } 0 ≤ n ≤ N and { W t n } 0 ≤ n ≤ N as the input data and gives the final output, denoted by ˆ u ( { X t n } 0 ≤ n ≤ N , { W t n } 0 ≤ n ≤ N ) , as an approximation to u ( t N , X t N ) . • The error in the matching of given terminal condition defines the expected loss function � 2 � �� l ( θ ) = ❊ � g ( X t N ) − ˆ u � { X t n } 0 ≤ n ≤ N , { W t n } 0 ≤ n ≤ N . • The paths can be simulated easily. Therefore the commonly used SGD algorithm fits this problem well. • We call the introduced methodology deep BSDE method since we use the BSDE and DNN as essential tools. 17 / 32

Time Discretization as Skip Connection Why such deep networks can be trained? Intuition: there are skip connections between different subnetworks u ( t n +1 , X t n +1 ) − u ( t n , X t n ) � ∆ t n � t n , X t n , u ( t n , X t n ) , ( σ T ∇ u )( t n , X t n | θ n ) ≈ − f + ( σ T ∇ u )( t n , X t n | θ n ) ∆ W n , 18 / 32

Analogy to Deep Reinforcement Learning • Deep Reinforcement Learning (DRL) has achieved great success in game domains and sophisticated control tasks. A common strategy is to represent policy function (control) through neural networks. • Recall that in the example of LQG control problem, Z t denotes the optimal control, which is approximated by neural networks. Table: Informal analogy Deep BSDE method DRL BSDE ← → Markov decision model gradient of the solution ← → optimal policy function 19 / 32

Implementation • Each subnetwork has 4 layers, with 1 input layer ( d -dimensional), 2 hidden layers (both d + 10 -dimensional), and 1 output layer ( d -dimensional). • Choose the rectifier function (ReLU) as the activation function and optimize with Adam method. • Implement in Tensorflow and reported examples are all run on a Macbook Pro. • Github: https://github.com/frankhan91/DeepBSDE 21 / 32

LQG Example Revisited We solve the introduced HJB equation in [0 , 1] × R 100 . It admits an explicit formula, which allows accuracy test: √ u ( t, x ) = − 1 � �� λ ln exp − λg ( x + 2 W T − t ) . ❊ 4.7 Deep BSDE Solver 4.6 Monte Carlo 4.5 u(0,0,...,0) 4.4 4.3 4.2 4.1 4.0 0 10 20 30 40 50 lambda Figure: Left: Relative error of the deep BSDE method for u ( t =0 , x =(0 , . . . , 0)) when λ = 1 , which achieves 0 . 17% in a runtime of 330 seconds. Right: Optimal cost u ( t =0 , x =(0 , . . . , 0)) against different λ . 22 / 32

Black-Scholes Equation with Default Risk • The classical Black-Scholes model can and should be augmented by some important factors in real markets, including defaultable securities, transactions costs, uncertainties in the model parameters, etc. • Ideally the pricing models should take into account the whole basket of financial derivative underlyings, resulting in high-dimensional nonlinear PDEs. • To test the deep BSDE method, we study a special case of the recursive valuation model with default risk (Duffie et al. 1996, Bender et al. 2015). 23 / 32

Solving High-dimensional PDEs Using Deep Learning Jiequn Han The - PowerPoint PPT Presentation

Solving High-dimensional PDEs Using Deep Learning Jiequn Han The Program in Applied & Computational Mathematics, Princeton University Joint work with Weinan E and Arnulf Jentzen Inverse Problems and Machine Learning, Caltech, February 9,

Integrating Problem Solving 2020 Integrating Problem Solving 2020 Integrating Problem Solving

Hao Su July 6, 2017 Outline Overview of 3D deep learning 3D deep learning algorithms

All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from

Deep Neural Network Mathematical Mysteries for High Dimensional Learning Stphane Mallat

Deep Neural Networks and Deep Reinforcement Learning Deep Learning, Goodfellow, Bengio and

On some applied problems using nonlinear elliptic PDEs C. Finlay Department of Mathematics and

Sparse methods and high dimensional parametric PDEs Albert Cohen Laboratoire Jacques-Louis

n -dimensional manifold M with T := TM n -dimensional manifold M with T := TM T n -dimensional

Branching for PDEs Xavier Warin CEMRACS July Xavier Warin Branching for PDEs

AM 205: lecture 16 Last time: hyperbolic PDEs Today: parabolic and elliptic PDEs,

AM 205: lecture 16 Last time: hyperbolic PDEs Today: parabolic and elliptic PDEs,

AGN deep multiwavelength AGN deep multiwavelength AGN deep multiwavelength surveys: surveys:

Solving Differential Equations through Means of Deep Learning Juliane Braunsmann February 8,

Deep Learning: Theory and Practice Deep Learning - Practical 02-04-2020 Considerations

Using Local Neighborhoods to Find Subspace Clusters Emin Aksehirli with Bart Goethals, Emmanuel

Deep Neural Networks for PDEs Philipp Grohs DL and Vis, September 2018 Short Reading List 1 Ian

Towards Evaluating the Robustness of Neural Networks Nicholas Carlini and David Wagner

Reasoning with Deep Learning: an Open Challenge Marco Lippi marco.lippi@unimore.it Marco Lippi

Introduction to Deep Learning A. G. Schwing & S. Fidler University of Toronto, 2014 A. G.

Neural Network Part 2: Regularization Yingyu Liang Computer Sciences 760 Fall 2017

Deep Learning for Broad Coverage Semantics: SRL, Coreference, and Beyond Luke Zettlemoyer *

Deep Canonical Correlation Analysis Galen Andrew 1 Raman Arora 2 Jeff Bilmes 1 Karen Livescu 2 1

Deep Argument Inspection Linux Plumbers Conference 2019 Kees Cook <keescook@chromium.org>

Deep Equilibrium Models Shaojie Bai Carnegie Mellon University joint work with J. Zico Kolter

Solving High-dimensional PDEs Using Deep Learning Jiequn Han The - PowerPoint PPT Presentation

Solving High-dimensional PDEs Using Deep Learning Jiequn Han The Program in Applied & Computational Mathematics, Princeton University Joint work with Weinan E and Arnulf Jentzen Inverse Problems and Machine Learning, Caltech, February 9,

Integrating Problem Solving 2020 Integrating Problem Solving 2020 Integrating Problem Solving

Hao Su July 6, 2017 Outline Overview of 3D deep learning 3D deep learning algorithms

All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from

Deep Neural Network Mathematical Mysteries for High Dimensional Learning Stphane Mallat

Deep Neural Networks and Deep Reinforcement Learning Deep Learning, Goodfellow, Bengio and

On some applied problems using nonlinear elliptic PDEs C. Finlay Department of Mathematics and

Sparse methods and high dimensional parametric PDEs Albert Cohen Laboratoire Jacques-Louis

n -dimensional manifold M with T := TM n -dimensional manifold M with T := TM T n -dimensional

Branching for PDEs Xavier Warin CEMRACS July Xavier Warin Branching for PDEs

AM 205: lecture 16 Last time: hyperbolic PDEs Today: parabolic and elliptic PDEs,

AM 205: lecture 16 Last time: hyperbolic PDEs Today: parabolic and elliptic PDEs,

AGN deep multiwavelength AGN deep multiwavelength AGN deep multiwavelength surveys: surveys:

Solving Differential Equations through Means of Deep Learning Juliane Braunsmann February 8,

Deep Learning: Theory and Practice Deep Learning - Practical 02-04-2020 Considerations

Using Local Neighborhoods to Find Subspace Clusters Emin Aksehirli with Bart Goethals, Emmanuel

Deep Neural Networks for PDEs Philipp Grohs DL and Vis, September 2018 Short Reading List 1 Ian

Towards Evaluating the Robustness of Neural Networks Nicholas Carlini and David Wagner

Reasoning with Deep Learning: an Open Challenge Marco Lippi marco.lippi@unimore.it Marco Lippi

Introduction to Deep Learning A. G. Schwing &amp; S. Fidler University of Toronto, 2014 A. G.

Neural Network Part 2: Regularization Yingyu Liang Computer Sciences 760 Fall 2017

Deep Learning for Broad Coverage Semantics: SRL, Coreference, and Beyond Luke Zettlemoyer *

Deep Canonical Correlation Analysis Galen Andrew 1 Raman Arora 2 Jeff Bilmes 1 Karen Livescu 2 1

Deep Argument Inspection Linux Plumbers Conference 2019 Kees Cook &lt;keescook@chromium.org&gt;

Deep Equilibrium Models Shaojie Bai Carnegie Mellon University joint work with J. Zico Kolter

Introduction to Deep Learning A. G. Schwing & S. Fidler University of Toronto, 2014 A. G.

Deep Argument Inspection Linux Plumbers Conference 2019 Kees Cook <keescook@chromium.org>