SLIDE 1 Inverse Optimization and Equilibrium with Applications in Finance and Statistics

Jong-Shi Pang
Department of Industrial and Enterprise Systems Engineering
University of Illinois at Urbana-Champaign

Presented at the SPECIAL SEMESTER on Stochastics with Emphasis on Finance
Linz, Austria, Monday, October 27, 2008, 10:50–11:40 AM
SLIDE 2 Contents of Presentation
- A preface
- What is inverse optimization?
- Three applications:
  — cross-validated support-vector regression
  — optimal mixing in statistics
  — implied volatility of American options
- Focusing on concepts and ideas; omitting technical details.
SLIDE 3 Preface
Up to now, most inverse problems in mathematics involve the inversion of partial differential equations (PDEs), the forward models, in the presence of observed and/or experimental data. They lead to optimization problems with PDE constraints.

In contrast, the kind of inverse problems we are interested in involves optimization or equilibrium problems as the forward models, and requires the solution of finite-dimensional optimization problems with algebraic inequality constraints together with certain complementarity constraints.

The latter inverse problems require theories and methods of contemporary optimization and variational inequalities, where inequalities provide the key challenges. Inequalities lead to non-smoothness, multi-valuedness, and disjunctions, which are atypical characteristics in modern computational mathematics.
SLIDE 4 Forward versus Inverse Optimization

Optimization pertains to the computation of the maximum or minimum value of an objective function in the presence of constraints, which are, for all practical purposes, expressed in terms of a finite number of algebraic equations and/or inequalities defined by a finite number of decision variables.

Traditional optimization is a forward process; namely, input data are fed into the optimization model, yielding a resolution of the problem: the model solution.

Inverse optimization attempts to build improved optimization models for the goal of better generalization, by choosing the model parameters so that the model solution optimizes a secondary objective, e.g., reproducing an observed solution, either exactly or as closely as possible.
SLIDE 5 An illustration: inverse convex quadratic programming

Given a set Ω and an outer objective function θ, find (x, Q, A, b, c):

$$\begin{array}{ll}
\displaystyle\operatorname*{minimize}_{(x,\,Q,\,A,\,b,\,c)} & \theta(x, Q, A, b, c) \\[4pt]
\text{subject to} & (x, Q, A, b, c) \in \Omega \\[4pt]
\text{and} & x \in \displaystyle\operatorname*{argmin}_{x'} \left\{ \tfrac{1}{2}\, (x')^T Q\, x' + c^T x' \;:\; A x' \le b \right\}.
\end{array}$$
Three salient features:
- for each (Q, A, b, c) there is a lower-level quadratic program (in box);
- an optimal solution x of that program is sought such that the upper-level constraint (x, Q, A, b, c) ∈ Ω is satisfied;
- a tuple (x, Q, A, b, c) with the above properties is sought to minimize the upper-level objective function θ(x, Q, A, b, c).
SLIDE 6 Bilevel support-vector regression

Given a finite set of in-sample data points $\{(x_i, y_i)\}_{i=1}^{n}$, fit a hyperplane $y = x^T w + b$ by solving the convex quadratic program for (w, b):

$$\operatorname*{minimize}_{(w,\,b)} \;\; C \sum_{i=1}^{n} \max\big( \, | w^T x_i + b - y_i | - \varepsilon, \; 0 \, \big) \; + \; \tfrac{1}{2}\, w^T w$$

for given (C, ε) > 0. Let (w(C, ε), b(C, ε)) be optimal. The inverse problem is to choose (C, ε) to minimize an error over a set of out-of-sample data, such as

$$\operatorname*{minimize}_{(C,\,\varepsilon)} \;\; \sum_{j=n+1}^{n+k} | \, w(C, \varepsilon)^T x_j + b(C, \varepsilon) - y_j \, |.$$

Extension to the statistical methodology of cross validation, including leave-one-out validation.
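To make the forward and inverse roles concrete, here is a minimal Python sketch; it replaces the bilevel (LPCC) treatment of the following slides with a naive grid search over (C, ε), and the synthetic data, the grid values, and the use of scikit-learn's LinearSVR (which solves this ε-insensitive quadratic program in primal form) are illustrative assumptions only.

```python
# A minimal sketch of the forward/inverse loop for support-vector regression.
# Assumptions: numpy and scikit-learn are available; the naive grid search
# below only illustrates the outer objective and is not the talk's bilevel
# (LPCC) solution method.
import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))                    # synthetic features
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 1.0 + 0.1 * rng.normal(size=120)

X_in, y_in = X[:100], y[:100]                    # in-sample training data
X_out, y_out = X[100:], y[100:]                  # out-of-sample data

def out_of_sample_error(C, eps):
    """Forward model: solve the eps-insensitive SVR QP for (w, b), then
    evaluate the outer objective sum_j |w^T x_j + b - y_j| out of sample."""
    model = LinearSVR(C=C, epsilon=eps, loss="epsilon_insensitive",
                      max_iter=10000).fit(X_in, y_in)
    return np.abs(model.predict(X_out) - y_out).sum()

# Naive outer search standing in for the bilevel formulation.
grid = [(C, eps) for C in (0.1, 1.0, 10.0) for eps in (0.01, 0.1, 0.5)]
C_best, eps_best = min(grid, key=lambda p: out_of_sample_error(*p))
print(C_best, eps_best)
```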
SLIDE 7 The (inner-level) SVM quadratic program

$$\begin{array}{ll}
\displaystyle\operatorname*{minimize}_{(w,\,b,\,e)} & C \displaystyle\sum_{i=1}^{n} e_i \; + \; \tfrac{1}{2}\, w^T w \\[4pt]
\text{subject to, for all } i = 1, \dots, n: & e_i \ge w^T x_i + b - y_i - \varepsilon, \\
& e_i \ge -w^T x_i - b + y_i - \varepsilon, \ \text{and} \ e_i \ge 0,
\end{array}$$

and its Karush-Kuhn-Tucker optimality conditions:

$$\left.\begin{array}{l}
0 \le \lambda_i^{+} \perp e_i - w^T x_i - b + y_i + \varepsilon \ge 0 \\
0 \le \lambda_i^{-} \perp e_i + w^T x_i + b - y_i + \varepsilon \ge 0 \\
0 \le e_i \perp C - \lambda_i^{+} - \lambda_i^{-} \ge 0
\end{array}\right\} \quad i = 1, \dots, n,$$

$$0 = w + \sum_{i=1}^{n} ( \lambda_i^{+} - \lambda_i^{-} )\, x_i \qquad \text{and} \qquad 0 = \sum_{i=1}^{n} ( \lambda_i^{+} - \lambda_i^{-} ),$$

where ⊥ denotes the complementary slackness condition; thus 0 ≤ a ⊥ b ≥ 0 if and only if [a = 0 ≤ b] or [a ≥ 0 = b].
SLIDE 8 The bilevel SVM problem

$$\begin{array}{ll}
\displaystyle\operatorname*{minimize}_{(C,\,\varepsilon,\,e,\,w,\,b)} & \displaystyle\sum_{j=n+1}^{n+k} e_j \\[4pt]
\text{subject to} & e_j \ge | \, w^T x_j + b - y_j \, |, \quad j = n+1, \dots, n+k,
\end{array}$$

and the inner SVM KKT conditions.

- An instance of a linear program with linear complementarity constraints, abbreviated as an LPCC; i.e., a linear program except for the disjunctive complementarity slackness constraints.
- As such, it is a nonconvex optimization problem, albeit of a very special kind.
- In this application, the inverse process is to optimize an out-of-sample error based on an in-sample training set of data.
SLIDE 9 Extension: Cross-validated support-vector regression

T : a positive integer (the number of folds)
$\Omega = \bigcup_{t=1}^{T} \Omega_t$ : a partitioning of the data into disjoint subgroups
$N_t$ : index set of the data in $\Omega_t$, with complement $\bar N_t$.

The fold-t training subproblem:

$$\begin{array}{ll}
\displaystyle\operatorname*{minimize}_{(w^t,\,b^t)\,\in\,\Re^{n+1}} & \tfrac{1}{2}\, \| w^t \|_2^2 \; + \; \dfrac{C}{|\bar N_t|} \displaystyle\sum_{i \in \bar N_t} \max\big( \, | (w^t)^T x_i + b^t - y_i | - \varepsilon, \; 0 \, \big) \\[4pt]
\text{subject to} & -\bar w \le w^t \le \bar w, \ \text{for feature selection,}
\end{array}$$

yielding the fold-t loss, which depends on the choice of $(C, \varepsilon, \bar w)$:

$$\sum_{i \in N_t} | \, (w^t)^T x_i + b^t - y_i \, |.$$
SLIDE 10 Cross-validated support-vector regression (cont.)

Given $\bar C > \underline C \ge 0$, $\bar\varepsilon > \underline\varepsilon \ge 0$, and $w^{\mathrm{ub}} > w^{\mathrm{lb}}$,

$$\begin{array}{ll}
\displaystyle\operatorname*{minimize}_{C,\;\varepsilon,\;\bar w,\;\{(w^t,\,b^t)\}_{t=1}^{T}} & \displaystyle\sum_{t=1}^{T} \frac{1}{|N_t|} \sum_{i \in N_t} | \, (w^t)^T x_i + b^t - y_i \, | \\[4pt]
\text{subject to} & \underline C \le C \le \bar C, \quad \underline\varepsilon \le \varepsilon \le \bar\varepsilon, \quad w^{\mathrm{lb}} \le \bar w \le w^{\mathrm{ub}}, \\[4pt]
\text{and for } t = 1, \dots, T, & (w^t, b^t) \in \displaystyle\operatorname*{argmin}_{-\bar w \,\le\, w^t \,\le\, \bar w} \; \tfrac{1}{2}\, \| w^t \|_2^2 + \frac{C}{|\bar N_t|} \sum_{i \in \bar N_t} \max\big( | (w^t)^T x_i + b^t - y_i | - \varepsilon, \; 0 \big).
\end{array}$$

- same $(C, \varepsilon, \bar w)$ across all folds (one fold's training subproblem is sketched in code below)
- can easily accommodate other convex loss functions and constraints
- extension to parameterized kernel selection
- other tasks, such as classification and semi-supervised learning, can be similarly handled.
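Below is a minimal cvxpy sketch of one fold's training subproblem with the feature-selection box constraint; the data, the fold split, and the fixed values of (C, ε, w̄) are hypothetical stand-ins, since in the bilevel model they are upper-level decision variables.

```python
# A minimal cvxpy sketch of the fold-t training subproblem with the box
# constraint -wbar <= w <= wbar. Here (C, eps, wbar) are fixed hypothetical
# values; the bilevel model chooses them to minimize the cross-validated loss.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
X_tr = rng.normal(size=(80, 5))                  # data outside fold t
y_tr = X_tr @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=80)
C, eps = 1.0, 0.1
wbar = np.full(5, 2.0)                           # feature-selection bound

w = cp.Variable(5)
b = cp.Variable()
loss = cp.sum(cp.pos(cp.abs(X_tr @ w + b - y_tr) - eps))   # eps-insensitive
objective = cp.Minimize(0.5 * cp.sum_squares(w) + (C / X_tr.shape[0]) * loss)
problem = cp.Problem(objective, [w <= wbar, w >= -wbar])
problem.solve()

print(w.value, b.value)                          # fold-t regressor (w^t, b^t)
```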
SLIDE 11 A bilevel maximum-likelihood approach to target classification

- Problem. Given data points for two target classes, identified by the rows of the two matrices
  $X^{\mathrm{I}} \in \mathbb{R}^{n_1 \times d}$, for target class I, and
  $X^{\mathrm{II}} \in \mathbb{R}^{n_2 \times d}$, for target class II,
  determine a statistical model to classify future data as type I or II.

Our approach is as follows:

- Aggregate the data via a common set of weights $w \in W \subseteq \mathbb{R}^d$, obtaining the aggregated data $X^{\mathrm{I,II}} w$.
- Apply an m-term Gaussian mixture model

$$\Psi(x, \mu, \sigma, p) \;\equiv\; \sum_{i=1}^{m} p_i \, \frac{1}{\sigma_i \sqrt{2\pi}} \exp\!\left( -\frac{1}{2} \left( \frac{x - \mu_i}{\sigma_i} \right)^{2} \right)$$

to the aggregated data $X^{\mathrm{I,II}} w$.
SLIDE 12

- Determine the mixing coefficients $p_i$ via a log-likelihood maximization (one such problem per class):

$$p^{\mathrm{I,II}} \in \operatorname*{argmax}_{p} \; \sum_{j=1}^{n_{1},\,n_{2}} \log \Psi\big( X^{\mathrm{I,II}}_{j\bullet} w, \; \mu, \sigma, p \big) \quad \text{subject to} \; \sum_{i=1}^{m} p_i = 1, \; p \ge 0.$$

- The overall process chooses the parameters $(p^{\mathrm{I,II}}, w, \mu, \sigma)$ by maximizing a measure of separation between the two classes based on the given data $X^{\mathrm{I,II}}$:

$$\begin{array}{ll}
\displaystyle\operatorname*{argmax}_{p^{\mathrm{I,II}},\,w,\,\mu,\,\sigma} & \theta(p^{\mathrm{I,II}}, w, \mu, \sigma) \\[4pt]
\text{subject to} & w \in W \ \text{and} \\[4pt]
& p^{\mathrm{I,II}} \in \displaystyle\operatorname*{argmax}_{p} \left\{ \sum_{j=1}^{n_{1},\,n_{2}} \log \Psi\big( X^{\mathrm{I,II}}_{j\bullet} w, \; \mu, \sigma, p \big) \;:\; \sum_{i=1}^{m} p_i = 1, \; p \ge 0 \right\}.
\end{array}$$
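A minimal numpy/scipy sketch of the inner log-likelihood maximization over the mixing coefficients p, with (w, µ, σ) held fixed; the data, the number of terms m, and all parameter values are hypothetical stand-ins.

```python
# A minimal sketch of the inner problem: maximize the mixture log-likelihood
# over p on the simplex, with the aggregation weights w and the component
# parameters (mu, sigma) fixed at hypothetical values.
import numpy as np
from scipy.optimize import minimize

def log_likelihood(p, s, mu, sigma):
    """sum_j log Psi(s_j, mu, sigma, p) for the m-term Gaussian mixture."""
    # dens[j, i] is the i-th component density evaluated at s_j
    dens = np.exp(-0.5 * ((s[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return np.log(dens @ p).sum()

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))                 # stacked class data (stand-in)
w = np.array([0.5, -1.0, 0.2, 1.5])          # fixed aggregation weights
s = X @ w                                    # aggregated data X w
mu = np.array([-1.0, 1.0])                   # m = 2 component means
sigma = np.array([0.8, 1.2])                 # component standard deviations

res = minimize(lambda p: -log_likelihood(p, s, mu, sigma),
               x0=np.full(2, 0.5), method="SLSQP",
               bounds=[(0.0, 1.0)] * 2,
               constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0}])
print(res.x)                                 # maximizing mixing coefficients
```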
SLIDE 13
Pricing American options: the vanilla Black-Scholes model

Consider the forward pricing of an American put/call option on an underlying asset whose (random) price S(t) satisfies the stochastic differential equation

$$dS = \big( \mu S - D(S,t) \big)\, dt + \sigma(S,t)\, S \, dW,$$

where
- µ : drift of the price process,
- r : prevailing interest rate, assumed constant,
- D(S, t) : dividend rate of the asset,
- σ(S, t) : non-constant volatility of the asset,
- dW : standard Wiener process with mean zero and variance dt.

Let the Black-Scholes operator be denoted

$$\mathcal{L}_{\mathrm{BS}} \;\equiv\; \frac{\partial}{\partial t} + \frac{1}{2}\, \sigma^{2}(S,t)\, S^{2} \frac{\partial^{2}}{\partial S^{2}} + \big( r S - D(S,t) \big) \frac{\partial}{\partial S} - r.$$
SLIDE 14
The forward pricing model

The American option price V(S, t) satisfies the partial-differential linear complementarity system: for (S, t) ∈ (0, ∞) × [0, T],

$$0 \le V(S,t) - \Lambda(S,t) \;\perp\; \mathcal{L}_{\mathrm{BS}}(V) \le 0,$$

plus boundary conditions at the terminal time t = T and at the extreme asset values S = 0, ∞, where
- T : time of expiry of the option,
- Λ(S, t) : payoff function of the option at expiry.

The complementarity expresses the early-exercise feature of an American option.
SLIDE 15
The discretized complementarity problem

Discretizing time and asset values, we obtain a finite-dimensional linear complementarity problem, parameterized by the asset volatilities:

$$0 \le V - \Lambda \;\perp\; q(\sigma) + M(\sigma)\, V \ge 0,$$

where $V \equiv \{ V(m\,\delta S, n\,\delta t) \}$ is the vector of approximated option prices at times t = nδt and asset values S = mδS. With a suitable discretization, M(σ) is a strictly row diagonally dominant, albeit not always symmetric, matrix for fixed σ.

Extensions to multiple-state problems, such as options on several assets, models with stochastic volatilities and interest rates, as well as some exotic options.
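A classical solver for such lower-obstacle LCPs is projected SOR; below is a minimal sketch on a tiny symmetric stand-in matrix, assuming M has a positive diagonal (as in the diagonally dominant discretizations just mentioned). The data are hypothetical, not an actual Black-Scholes discretization.

```python
# A minimal projected-SOR sketch for the lower-obstacle LCP
#   0 <= V - Lam  (perp)  q + M V >= 0,
# assuming M has a positive diagonal. The tiny instance below is a
# hypothetical stand-in, not a Black-Scholes discretization.
import numpy as np

def psor(M, q, Lam, omega=1.2, tol=1e-10, max_iter=10_000):
    V = np.maximum(Lam, 0.0).astype(float)        # feasible start: V >= Lam
    for _ in range(max_iter):
        V_old = V.copy()
        for i in range(len(q)):
            r = q[i] + M[i] @ V                   # residual of row i
            V[i] = max(Lam[i], V[i] - omega * r / M[i, i])  # project onto V_i >= Lam_i
        if np.max(np.abs(V - V_old)) < tol:
            break
    return V

M = np.array([[3.0, -1.0], [-1.0, 3.0]])
q = np.array([-1.0, -2.0])
Lam = np.array([0.0, 0.5])
V = psor(M, q, Lam)
print(V, q + M @ V, V - Lam)                      # complementarity check
```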
SLIDE 16 The inverse pricing problem

Suppose $V^{\mathrm{obs}}$ is a set of observed American option prices. We want to determine an implied volatility surface that minimizes the deviations between the model and observed option prices:

$$\begin{array}{ll}
\displaystyle\operatorname*{minimize}_{\sigma,\,V} & \| V - V^{\mathrm{obs}} \| \\[4pt]
\text{subject to} & 0 \le V - \Lambda \;\perp\; q(\sigma) + M(\sigma)\, V \ge 0,
\end{array}$$

possibly with additional constraints, such as positivity of σ.

- An instance of a mathematical program with complementarity constraints, abbreviated as MPCC.
- Computationally more challenging than an LPCC due to the nonlinear product M(σ)V, in addition to the complementarity constraint.
- Research is underway to develop an efficient algorithm.
SLIDE 17
A Unified Mathematical Framework
A Mathematical Program with Complementarity Constraints:

$$\begin{array}{ll}
\displaystyle\operatorname*{minimize}_{(x,\,y,\,w)} & \theta(x, y, w) \\[4pt]
\text{subject to} & g(x, y, w) \le 0 \\
& h(x, y, w) = 0 \\
\text{and} & 0 \le y \;\perp\; w \ge 0,
\end{array}$$

where the functions θ (real-valued) and g and h (both vector-valued) are all (possibly twice) continuously differentiable.

A feasible triple $(\bar x, \bar y, \bar w)$ is stationary if $(dx, dy, dw) = (0, 0, 0)$ is a global minimizer of the linearization of the MPCC at $(\bar x, \bar y, \bar w)$:

$$\begin{array}{ll}
\displaystyle\operatorname*{minimize}_{(dx,\,dy,\,dw)} & \nabla_x \theta(\bar x, \bar y, \bar w)^T dx + \nabla_y \theta(\bar x, \bar y, \bar w)^T dy + \nabla_w \theta(\bar x, \bar y, \bar w)^T dw \\[4pt]
\text{subject to} & g(\bar x, \bar y, \bar w) + J_x g(\bar x, \bar y, \bar w)\, dx + J_y g(\bar x, \bar y, \bar w)\, dy + J_w g(\bar x, \bar y, \bar w)\, dw \le 0 \\
& J_x h(\bar x, \bar y, \bar w)\, dx + J_y h(\bar x, \bar y, \bar w)\, dy + J_w h(\bar x, \bar y, \bar w)\, dw = 0 \\
\text{and} & 0 \le dy \;\perp\; dw \ge 0,
\end{array}$$

which is an LPCC.
SLIDE 18 Some theoretical issues

- Is stationarity necessary and/or sufficient for (local) optimality?
  — Necessity depends on non-traditional constraint qualifications.
  — Sufficiency holds for local optimality if the objective is (pseudo-)convex and the constraint functions g and h are affine.
- How do we calculate (and verify) a stationary tuple?
  — This requires the global resolution of an LPCC, a recent research topic.
  — It is possible when the objective is convex and the constraint functions g and h are affine, but quite impossible in general.

The principal challenge of the MPCC, besides the possible non-convexity of the constraint functions, is the complementarity constraint, which arises as a result of the lower-level optimality/equilibrium conditions.
SLIDE 19
Breaking up the complementarities

An MPCC is equivalent to $2^m$ nonlinear programs, each called a piece and derived from a subset $\alpha \subseteq \{1, \dots, m\}$ with complement $\bar\alpha$, where m is the dimension of the variables y and w:

$$\mathrm{NLP}(\alpha): \quad \begin{array}{ll}
\displaystyle\operatorname*{minimize}_{(x,\,y,\,w)} & \theta(x, y, w) \\[4pt]
\text{subject to} & g(x, y, w) \le 0 \\
& h(x, y, w) = 0 \\
& w_\alpha \ge 0 = y_\alpha \ \ \text{and} \ \ w_{\bar\alpha} = 0 \le y_{\bar\alpha}.
\end{array}$$

The goal is to develop practically efficient algorithms without complete enumeration of all the pieces.
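To illustrate the pieces, here is a minimal sketch that enumerates them with scipy's linprog on a hypothetical toy LPCC with a single complementarity pair (m = 1); the instance is invented for illustration and reappears in the later sketches.

```python
# A minimal sketch enumerating the 2^m pieces of a toy LPCC (m = 1):
#   minimize  -x - y
#   subject to x + y <= 2,  x + w = 1,  x >= 0,  0 <= y  (perp)  w >= 0.
# Each piece fixes y = 0 (with w >= 0 free) or w = 0 (with y >= 0 free).
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -1.0, 0.0])                   # variables (x, y, w)
A_ub, b_ub = np.array([[1.0, 1.0, 0.0]]), [2.0]   # x + y <= 2
A_eq, b_eq = np.array([[1.0, 0.0, 1.0]]), [1.0]   # x + w = 1

best = None
for y_is_zero in (True, False):                   # the two pieces NLP(alpha)
    y_bnd = (0, 0) if y_is_zero else (0, None)
    w_bnd = (0, None) if y_is_zero else (0, 0)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None), y_bnd, w_bnd])
    if res.success and (best is None or res.fun < best.fun):
        best = res
print(best.x, best.fun)    # global minimum: x = 1, y = 1, w = 0, value -2
```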
SLIDE 20 Nonlinear programming approaches

- Smoothing: for ε > 0 sufficiently small,

$$\begin{array}{ll}
\displaystyle\operatorname*{minimize}_{(x,\,y,\,w)} & \theta(x, y, w) \\[4pt]
\text{subject to} & g(x, y, w) \le 0, \quad h(x, y, w) = 0, \\
& (y, w) \ge 0, \ \text{and} \ y^T w \le \varepsilon.
\end{array}$$

- Penalization: for ρ > 0 sufficiently large (sketched in code below),

$$\begin{array}{ll}
\displaystyle\operatorname*{minimize}_{(x,\,y,\,w)} & \theta(x, y, w) + \rho\, y^T w \\[4pt]
\text{subject to} & g(x, y, w) \le 0, \quad h(x, y, w) = 0, \ \text{and} \ (y, w) \ge 0.
\end{array}$$

- NEOS solvers are practically quite efficient and robust.
- Yet they are incapable of ascertaining the quality of the computed solution, motivating research on the global resolution of MPCCs.
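Here is a minimal scipy sketch of the penalization approach on the same hypothetical toy LPCC as in the enumeration sketch above; rho = 100 and the starting point are arbitrary choices, and SLSQP is a local solver, so only a stationary point is guaranteed.

```python
# A minimal sketch of the penalization approach on the toy LPCC:
#   minimize  -x - y + rho * y * w
#   subject to x + y <= 2,  x + w = 1,  (x, y, w) >= 0.
# The complementarity constraint has been moved into the objective.
import numpy as np
from scipy.optimize import minimize

rho = 100.0                                        # penalty parameter

def objective(v):
    x, y, w = v
    return -x - y + rho * y * w

constraints = [
    {"type": "ineq", "fun": lambda v: 2.0 - v[0] - v[1]},  # x + y <= 2
    {"type": "eq",   "fun": lambda v: v[0] + v[2] - 1.0},  # x + w = 1
]
res = minimize(objective, x0=np.array([0.5, 0.5, 0.5]), method="SLSQP",
               bounds=[(0, None)] * 3, constraints=constraints)
print(res.x)    # a local solver; the global minimizer is (1, 1, 0)
```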
SLIDE 21 Formulation as a mixed nonlinear integer program

For ρ > 0 sufficiently large (a toy instance of this formulation is sketched after the bullets below),

$$\begin{array}{ll}
\displaystyle\operatorname*{minimize}_{(x,\,y,\,w,\,z)} & \theta(x, y, w) \\[4pt]
\text{subject to} & g(x, y, w) \le 0, \quad h(x, y, w) = 0, \\
& 0 \le y \le \rho\, z, \quad 0 \le w \le \rho\, (\mathbf{1} - z), \\
\text{and} & z \in \{0, 1\}^m.
\end{array}$$
- Drawback: Applicable only to feasible MPCCs with computable bounds.
- Goal: Follow this conceptual formulation but treat ρ implicitly.
- Current work focuses on the LPCC.
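Below is a minimal PuLP sketch of the big-ρ integer formulation on the same toy LPCC; the bound ρ = 10 is an assumed valid a-priori bound on y and w, and PuLP with its bundled CBC solver is an illustrative choice.

```python
# A minimal sketch of the mixed-integer formulation of the toy LPCC,
# with an assumed valid bound rho = 10 on y and w:
#   z = 0 forces y = 0;  z = 1 forces w = 0.
import pulp

rho = 10.0
prob = pulp.LpProblem("toy_lpcc", pulp.LpMinimize)
x = pulp.LpVariable("x", lowBound=0)
y = pulp.LpVariable("y", lowBound=0)
w = pulp.LpVariable("w", lowBound=0)
z = pulp.LpVariable("z", cat="Binary")

prob += -x - y                       # objective
prob += x + y <= 2                   # g(x, y, w) <= 0
prob += x + w == 1                   # h(x, y, w) = 0
prob += y <= rho * z                 # complementarity via the binary z
prob += w <= rho * (1 - z)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.value(x), pulp.value(y), pulp.value(w))   # expect 1, 1, 0
```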
SLIDE 22 Concluding remarks We have
- introduced the idea of inverse optimization
- described three applications in statistics and finance, and
- briefly mentioned some solution approaches.
We welcome questions and discussion about the omitted details. Thank you!