From CVA to the Resolution of a Large Number of Small Random Systems

SLIDE 1

From CVA to the Resolution of a Large Number of Small Random Systems

Lokman Abbas-Turki

First part from a joint work with M. A. Mikou
Last part from a joint work with S. Graillat

UPMC, LPMA

6 April 2016

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 1 / 38

SLIDE 2

Plan

Introduction
  From linear/linear to linear/nonlinear
  From nonlinear/linear to nonlinear/nonlinear
Simulation algorithms
  Without funding constraints
  With funding constraints
Adaptation issues to American options
  The difference with the references
  LDLt
  Householder tridiagonalization + PCR
  Divide and conquer for eigenproblem
Some simulation results
Conclusion

SLIDE 3

Introduction

SLIDE 4

Introduction From linear/linear to linear/nonlinear

Credit Valuation Adjustment

In a financial transaction in which a party C has to pay another party B some amount V, the CVA is the price of the insurance contract that covers the default of party C on the whole sum V.

$$\mathrm{CVA}_{t,T} = (1 - R)\,E_t\left(V_\tau^+ \mathbf{1}_{t<\tau\le T}\right) \tag{1}$$

R is the recovery to make if the counterparty defaults (Assume R = 0),

τ is the random default time of the counterparty,

T is the protection time horizon.

Numerical simulation

$$\mathrm{CVA}_{0,T} \approx \sum_{k=0}^{N-1} E\left(V_{t_k}^+ \mathbf{1}_{\tau\in(t_k,t_{k+1}]}\right), \tag{2}$$

where N ≤ the number of time steps used for the SDE discretization.
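As a minimal sketch of the estimator in (2), assuming R = 0 and that the exposure paths and default times have already been simulated, the sum over time steps could be written as follows. The function name `cva_estimate` and the toy inputs are hypothetical and are not taken from the talk.

```python
def cva_estimate(exposures, default_times, grid):
    """Monte Carlo version of (2) with R = 0 (illustrative sketch).

    exposures[i][k] : simulated exposure V_{t_k} on path i
    default_times[i]: simulated default time tau on path i
    grid            : time grid t_0 < t_1 < ... < t_N
    """
    n_paths = len(exposures)
    n_steps = len(grid) - 1
    total = 0.0
    for k in range(n_steps):
        acc = 0.0
        for v_path, tau in zip(exposures, default_times):
            # indicator 1_{tau in (t_k, t_{k+1}]} times the positive part of V_{t_k}
            if grid[k] < tau <= grid[k + 1]:
                acc += max(v_path[k], 0.0)
        total += acc / n_paths
    return total


# Toy data (assumed): 3 paths, 2 time steps.
toy_grid = [0.0, 0.5, 1.0]
toy_exposures = [[1.0, 2.0], [-1.0, 3.0], [4.0, 5.0]]
toy_defaults = [0.25, 0.75, 2.0]
toy_cva = cva_estimate(toy_exposures, toy_defaults, toy_grid)
```

On the toy data, the first path defaults in the first interval, the second path in the second interval, and the third path never defaults before T.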

Importance

Hold a sufficient amount of liquid assets to face the counterparty default.

Basel III includes the calculation of the CVA (Credit Valuation Adjustment) as an important part of the prudential rules.

SLIDE 5

Introduction From linear/linear to linear/nonlinear

Various kinds of contracts

Simulating the trajectories of the assets $S_t = (S_t^1, ..., S_t^d)$, then the trajectories of the contracts, to get $V_t$ as a sum:

$$V_t = \sum_{i_e} \phi^{exp}_{i_e}(S_t) + \sum_{i_i} \phi^{eui}_{i_i}(S_t) + \sum_{i_d} \phi^{eud}_{i_d,t}(S_t) + \sum_{i_a} \phi^{am}_{i_a,t}(S_t), \tag{3}$$

where $i_e$, $i_i$, $i_d$ and $i_a$ are the exposure indices and:

$\phi^{exp}$ is an explicit function, for example: $\phi^{exp}(S_{t_k}) = S^1_{t_k} - S^2_{t_k}$.

$\phi^{eui}$ is a path-independent European contract,

$$\phi^{eui}(S_{t_k}) = E\left(f^{eui}(S_T) \mid S_{t_k}\right). \tag{4}$$

$\phi^{eud}_t$ is a path-dependent European contract,

$$\phi^{eud}_{t_k}(S_{t_k}) = E\left(f^{eud}_{t_k}(S_{t_{k+1}}) \mid S_{t_k}\right), \tag{5}$$

for example: $f^{eud}_{t_k}(S_{t_{k+1}}) = \left(\max_{i=0,..,k} S^1_{t_i} \vee S^1_{t_{k+1}} - S^2_{t_{k+1}}\right)^+$.

$\phi^{am}_t$ is an American contract, involving an optimal stopping problem

$$\phi^{am}_{t_k}(S_{t_k}) = f(S_{t_k}) \vee E\left(\phi^{am}_{t_{k+1}}(S_{t_{k+1}}) \mid S_{t_k}\right) \tag{6}$$

with f an explicit payoff that generally does not depend on the asset path.

Common problem

$$\varphi(x) = E\left(f(S_{t_{k+1}}) \mid S_{t_k} = x\right).$$
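One way to read the common problem is as an inner Monte Carlo average started from the point x. The sketch below assumes a one-dimensional Black & Scholes asset with rate `r` and volatility `sigma`; these dynamics, the function name `inner_estimate` and all parameters are illustrative assumptions rather than the talk's implementation.

```python
import math
import random


def inner_estimate(f, x, r, sigma, h, m, rng):
    """Estimate phi(x) = E(f(S_{t_{k+1}}) | S_{t_k} = x) with m inner draws.

    Assumes geometric Brownian motion over one step of length h (hypothetical
    choice of model for this sketch).
    """
    total = 0.0
    for _ in range(m):
        z = rng.gauss(0.0, 1.0)
        # exact one-step transition of geometric Brownian motion
        s_next = x * math.exp((r - 0.5 * sigma ** 2) * h + sigma * math.sqrt(h) * z)
        total += f(s_next)
    return total / m


# Usage: a call payoff with strike 100, 64 inner trajectories.
rng = random.Random(0)
call_value = inner_estimate(lambda s: max(s - 100.0, 0.0), 100.0, 0.02, 0.2, 0.1, 64, rng)
```

The number of inner draws `m` plays the role of the inner trajectory counts discussed later in the deck.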

SLIDE 6

Introduction From nonlinear/linear to nonlinear/nonlinear

TVA definition

Total valuation adjustment (>CVA+DVA+FVA), it covers:

Both defaults: $\tau = \tau^c \wedge \tau^b$, CVA and DVA.

Funding our risk and the risk of the counterparty: Nonlinear BSDE part, FVA.

S. Crépey (2012)

Ignoring the external funding and denoting $\beta_t = e^{-\int_0^t r_u\,du}$, where r is the risk-free short rate process, Θ satisfies the following BSDE on $[0, \tau \wedge T]$:

$$\beta_t \Theta_t = E\left(\beta_\tau \mathbf{1}_{\tau<T}(V_\tau - R_\tau) + \int_t^{\tau\wedge T} \beta_s g_s(V_s - \Theta_s)\,ds \;\Big|\; G_t\right) \tag{7}$$

where G is the extension of F by the natural filtration generated by $\tau^c$ and by $\tau^b$. R is the total close-out cash-flow specified thanks to the CSA (Credit Support Annex) and g is the funding coefficient.

TVA BSDE simulation

Only for European contracts.

Requires a good approximation of the exposure V .

Practitioners usually use rough approximations.

No reliable procedure in the general case.

SLIDE 7

Introduction From nonlinear/linear to nonlinear/nonlinear

WLOG: V = P

For one American contract

$P_t$ is the American derivative exposure.

Let $\tau^* \in [0, T]$ be the optimal stopping time associated with $P_t$.

Theorem

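$\Theta = \tilde\Theta J + (1 - J)\xi_\tau$, where $J_t = \mathbf{1}_{\{\tau>t\}}$ and the pre-default TVA $\tilde\Theta$ satisfies the following BSDE on $[0, \tau^*]$:

$$\tilde\beta_t \tilde\Theta_t = E_t\left(\int_t^{\tau^*} \tilde\beta_s\, \tilde g_s(P_s - \tilde\Theta_s)\,ds\right), \qquad \tilde g_t(P_t - \tilde\Theta_t) = g_t(P_t - \tilde\Theta_t) + \gamma_t \tilde\xi_t. \tag{8}$$

$\tilde\beta_t = e^{-\int_0^t (\gamma_u + r_u)\,du}$.

$\tilde\xi_t := \dfrac{1}{E_t(\mathbf{1}_{\{\tau>t\}})}\, E_t(\xi_t \mathbf{1}_{\{\tau>t\}})$ with $\xi_t := P_t - E(R\,\mathbf{1}_{\{t=\bar\tau\}} \mid G_t)$.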

Extension

It is possible to extend (8) to a portfolio of various American options.

SLIDE 8

Simulation algorithms

SLIDE 9

Simulation algorithms

Without funding constraints

$$\mathrm{CVA}_{0,T} = \sum_{k=0}^{N-1} E\left(P_{k+1}^+ \mathbf{1}_{\tau\in(kh,(k+1)h]}\right), \qquad h = \frac{T}{N}.$$

With funding constraints

$$\Theta_k = E_k\left(\Theta_{k+1} + h\,g(k+1, P_{k+1}, \Theta_{k+1})\right), \qquad \Theta_N = 0.$$
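To make the backward recursion concrete, here is a sketch of the special case in which the exposure is deterministic, so that the conditional expectation $E_k$ reduces to the identity. The funding coefficient used in the usage lines, `g(k, p, theta) = p - theta`, is a hypothetical choice made only for illustration.

```python
def theta_backward(p, h, g):
    """Backward scheme Theta_k = Theta_{k+1} + h * g(k+1, P_{k+1}, Theta_{k+1}).

    p : list of deterministic exposures P_0, ..., P_N (assumption of this sketch,
        which makes the conditional expectation trivial)
    h : time step T / N
    g : funding coefficient, called as g(k, p_k, theta_k)
    """
    n = len(p) - 1
    theta = [0.0] * (n + 1)  # terminal condition Theta_N = 0
    for k in range(n - 1, -1, -1):
        theta[k] = theta[k + 1] + h * g(k + 1, p[k + 1], theta[k + 1])
    return theta


# Usage with a hypothetical linear funding coefficient.
theta_path = theta_backward([0.0, 1.0, 1.0], 0.5, lambda k, p, th: p - th)
```

In the stochastic case the same loop applies, with each step wrapped in an approximation of $E_k$ such as a regression.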

The exposure

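$$P_k(x) = E\left(\Phi_{k,\tau_k}(S_{\tau_k}) \mid S_k = x\right)$$

with

$$\tau_N = N, \quad \forall k \in \{N-1, ..., 0\}, \quad \tau_k = k\,\mathbf{1}_{\Gamma_k} + \tau_{k+1}\mathbf{1}_{\Gamma_k^c}, \tag{9}$$

where $\Gamma_k = \left\{\Phi_{k+1,k}(S_k) > E\left(P_{k+1}(S_{k+1}) \mid S_k\right)\right\}$. The conditional expectation involved in $\Gamma_k$ is approximated using a regression on a basis of monomial functions, where $K_k$ is its cardinality.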
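The recursion (9) on stopping times can be sketched on a single path once the exercise events are known. In the snippet below, `in_gamma[k]` is an assumed boolean flag telling whether the event $\Gamma_k$ holds on that path; how the flag is computed (the regression) is left out.

```python
def stopping_times(in_gamma):
    """Apply tau_N = N and tau_k = k if Gamma_k holds, else tau_{k+1}.

    in_gamma[k] (k = 0..N-1) is True when the exercise event Gamma_k holds
    on the path under consideration (hypothetical input for this sketch).
    """
    n = len(in_gamma)
    tau = [n] * (n + 1)  # tau[n] = N
    for k in range(n - 1, -1, -1):
        tau[k] = k if in_gamma[k] else tau[k + 1]
    return tau


# Usage: N = 3, exercise is only optimal at step 1.
taus = stopping_times([False, True, False])
```

Here `taus[k]` is the first exercise date at or after step k, or N when no exercise event occurs.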

SLIDE 10

Simulation algorithms


An example of a two stage simulation with M0 = 2, M6 = 8 and M8 = 4

SLIDE 11

Simulation algorithms

CVA0,T approximation

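$$\widehat{\mathrm{CVA}}_{0,T} = \sum_{k=0}^{N-1} \frac{1}{M_0} \sum_{i=1}^{M_0} F^2_{k+1}\left(\widehat P_1(S_1^i), ..., \widehat P_{k+1}(S_{k+1}^i)\right) \tag{10}$$

With

$$F^1_{k+1}(x_1, ..., x_{k+1}) = E\left(\mathbf{1}_{\tau\in(kh,(k+1)h]} \mid P_1 = x_1, ..., P_{k+1} = x_{k+1}\right),$$

$$F^2_{k+1}(x_1, ..., x_{k+1}) = (x_{k+1})^+\, F^1_{k+1}(x_1, ..., x_{k+1}). \tag{11}$$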

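Θk approximation

For $k = 1, ..., N-1$:

$$\widehat\Theta_k(x) = {}^t\psi(x)\,\Psi_k^{-1}\,\frac{1}{M_0}\sum_{j=1}^{M_0} \psi(S_k^j)\left[\widehat\Theta_{k+1}(S_{k+1}^j) + \frac{1}{N}\,g\left(k+1, \widehat\Theta_{k+1}(S_{k+1}^j), \widehat P_{k+1}(S_{k+1}^j)\right)\right] \quad\text{and}\quad \widehat\Theta_N(x) = 0,$$

$$\widehat\Theta_0(S_0) = \frac{1}{M_0}\sum_{j=1}^{M_0}\left[\widehat\Theta_1(S_1^j) + \frac{1}{N}\,g\left(1, \widehat\Theta_1(S_1^j), \widehat P_1(S_1^j)\right)\right]. \tag{12}$$

Where

$$\Psi_k = \mathcal{T}\left(\frac{1}{M_0}\sum_{i=1}^{M_0} \psi(\tilde S_k^i)\,{}^t\psi(\tilde S_k^i)\right)$$

with: $\{S^i\}_{i\in\{1,...,M_0\}}$ and $\{\tilde S^i\}_{i\in\{1,...,M_0\}}$ are two independent simulations of the underlying asset S, ψ is a basis of monomial functions of cardinality K, and $\mathcal{T}$ is an operator that must satisfy some desired properties.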

SLIDE 12

Simulation algorithms Without funding constraints

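Theorem

$$E\left(\widehat{\mathrm{CVA}}_{0,T} - \mathrm{CVA}_{0,T}\right)^2 \le \frac{N^2}{M_0} \max_{k\in\{0,...,N-1\}} \mathrm{Var}\left(F^2_{k+1}\left(P_1(S_1^i), ..., P_{k+1}(S_{k+1}^i)\right)\right)$$

$$+ \sum_{j=1}^{N} \frac{1}{4NM_j^2}\left(E\left[V_j(S_j^i)\, f''_j(P_j(S_j^i))\, F^3_j\left(P_1(S_1^i), ..., P_j(S_j^i)\right)\right]\right)^2$$

$$+ \sum_{j=1}^{N} \frac{1}{4NM_j^2}\left(E\left[V_j(S_j^i)\, f''_j(P_j(S_j^i)) \sum_{k=j}^{N-1} F^4_{k+1}\left(P_1(S_1^i), ..., P_j(S_j^i), S_j^i\right)\right]\right)^2$$

$$+ \sum_{j=1}^{N} \frac{N}{4M_j^2}\left(E\left[V_j(S_j^i)\, F^1_j\left(P_1(S_1^i), ..., P_j(S_j^i)\right) \mid P_j(S_j^i) = 0\right]\varphi_j(0)\right)^2 + N\sum_{j=1}^{N} (N-j+1)^2\, O\left(\frac{1}{M_j^4}\right)$$

Where $\varphi_j$ is the density of $P_j(S_j^i)$ and $V_j(x) = \mathrm{Var}\left(\sqrt{M_j}\left(\widehat P_j(x) - P_j(x)\right)\right)$.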

Good choices

If $\varphi_j(0)$ is large then take $M_j \sim \sqrt{M_0}$, otherwise $M_j \sim \sqrt{M_0}/N$. In both cases, N must be small when compared to $\sqrt{M_0}$. When American options are involved, make sure that $K_j^3/M_j$ is small enough.

For example

$$M_j = \frac{N-j}{N-1}\,M_1 \quad\text{with either}\quad M_1 = \frac{\sqrt{M_0}}{N} \quad\text{or}\quad M_1 = \sqrt{M_0}. \tag{13}$$
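The allocation rule (13) can be tabulated directly. The sketch below is only illustrative: the function name `inner_sizes`, the boolean switch between the two regimes, the rounding, and the floor of one trajectory are all assumptions added here, not choices stated in the talk.

```python
import math


def inner_sizes(m0, n, large_density_at_zero):
    """Number of inner trajectories M_j for j = 1..N-1 following (13).

    m0 : number of outer trajectories M_0
    n  : number of time steps N (assumed >= 2)
    large_density_at_zero : True selects M_1 = sqrt(M_0), False selects
        M_1 = sqrt(M_0) / N (hypothetical switch for this sketch)
    """
    m1 = math.sqrt(m0) if large_density_at_zero else math.sqrt(m0) / n
    # rounding and the minimum of 1 are conveniences of this sketch
    return [max(1, round((n - j) / (n - 1) * m1)) for j in range(1, n)]


sizes_large = inner_sizes(10000, 5, True)
sizes_small = inner_sizes(10000, 5, False)
```

The sizes decrease linearly with j, so fewer inner trajectories are spent close to the horizon.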

SLIDE 13

Simulation algorithms With funding constraints

Theorem

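As long as $\{\Theta_i(x)\}_{0\le i\le N-1}$ are of class $C^s$ on the support of $S \in \mathbb{R}^d$, there exists a positive constant C such that for each $0 \le k \le N-1$

$$E\left(\widehat\Theta_k(S_k^i) - \Theta_k(S_k^i)\right)^2 \le \frac{CK}{N^2}\sum_{l=k}^{N-1}\left(E\left[\frac{V_{l+1}(S_{l+1}^j)\,\partial_P^2 g\left(l+1, \Theta_{l+1}(S_{l+1}^j), P_{l+1}(S_{l+1}^j)\right)}{2M_l}\right]\right)^2$$

$$+\, O\left(\frac{K}{M_0} + \frac{K^2}{N^2M_0} + \frac{K}{N^4M_l^2} + \frac{K^{1-2s/d}}{N^2} + K^{-2s/d}\right).$$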

Good choice

Take $M_l \sim \sqrt{M_0/N}$; N must be sufficiently small, $N \sim 10$. When American options are involved, make sure that $K_l^3/M_l$ is small enough.

For example

$$M_l = \frac{N-l}{N-1}\,M_1 \quad\text{with}\quad M_1 = \sqrt{\frac{M_0}{N}}. \tag{14}$$

SLIDE 14

Adaptation issues to American options

SLIDE 15

Adaptation issues to American options

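An example of a two-stage simulation with $M_0 = 2$, $M_6 = 8$ and $M_8 = 4$

Inner dynamic programming

$$\tau_N = N, \quad \forall k \in \{N-1, ..., 0\}, \quad \tau_k = k\,\mathbf{1}_{\Gamma_k} + \tau_{k+1}\mathbf{1}_{\Gamma_k^c}, \tag{15}$$

where $\Gamma_k = \left\{\Phi_{k+1,k}(S_k) > R_k \cdot \psi(S_k)\right\}$. The vector $R_k$ minimizes the quadratic error

$$\left\|P_{k+1}(S_{k+1}) - R_k \cdot \psi(S_k)\right\|_{L^2} \tag{16}$$

and denoting $A_k = E\left(\psi(S_k)\,{}^t\psi(S_k)\right)$, we get

$$R_k = A_k^{-1}\,E\left(\psi(S_k)\,P_{k+1}(S_{k+1})\right). \tag{17}$$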
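Equation (17) is what produces the many small symmetric systems studied in the rest of the talk. The following sketch builds the empirical versions of $A_k$ and of the right-hand side for a one-dimensional monomial basis and solves the system. The helper names are hypothetical, and the dense Gaussian elimination is only a generic stand-in for the solvers compared in the talk.

```python
def monomials(x, degree):
    """Monomial basis psi(x) = (1, x, ..., x^degree) in one dimension."""
    return [x ** p for p in range(degree + 1)]


def solve_dense(a, b):
    """Gaussian elimination with partial pivoting (generic stand-in solver)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= f * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def regression_coefficients(s_k, y_next, degree):
    """Empirical R_k = A_k^{-1} E(psi(S_k) Y) from samples (s_k[i], y_next[i])."""
    m = len(s_k)
    size = degree + 1
    a = [[0.0] * size for _ in range(size)]
    b = [0.0] * size
    for s, y in zip(s_k, y_next):
        psi = monomials(s, degree)
        for i in range(size):
            b[i] += psi[i] * y / m
            for j in range(size):
                a[i][j] += psi[i] * psi[j] / m
    return solve_dense(a, b)


# Usage: noiseless data y = 2 + 3 s should be recovered by a degree-2 fit.
samples = [0.5, 1.0, 1.5, 2.0, 2.5]
coeffs = regression_coefficients(samples, [2.0 + 3.0 * s for s in samples], 2)
```

With higher degrees or nearly collinear samples, the matrix `a` becomes ill-conditioned, which is the situation the later slides address.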

SLIDE 16

Adaptation issues to American options The difference with the references

Three main methods for large symmetric matrices

Cholesky factorization

  • V. Volkov and J. Demmel. LU, QR and Cholesky Factorizations using Vector Capabilities of GPUs. Berkeley Technical Report. 2008.

  • G. Ballard, J. Demmel, O. Holtz and O. Schwartz. Communication-Optimal Parallel and Sequential Cholesky Decomposition. SIAM J. SCI. COMPUT. 32(6), 3495–3523. 2010.

Tridiagonal form + cyclic reduction

  • Y. Zhang, J. Cohen and J. D. Owens. 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 127–136. 2010.

  • D. Goddeke and R. Strzodka. Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed Precision Multigrid. Parallel and Distributed Systems, IEEE Trans. 22(1), 22–32. 2010.

Tridiagonal form + eigenproblem

  • J. W. Demmel, O. A. Marques, B. N. Parlett and C. Vomel. Performance and Accuracy of Lapack’s Symmetric Tridiagonal Eigensolvers. SIAM J. SCI. COMPUT. 30(3), 1508–1526. 2008.

  • C. Vomel, S. Tomov and J. Dongarra. Divide & Conquer on Hybrid Gpu-Accelerated Multicore Systems. SIAM J. SCI. COMPUT. 34(2), 70–82. 2012.

SLIDE 17

Adaptation issues to American options The difference with the references

None of the previous works can be used directly

The reason

Large number of small random linear systems: The size does not exceed 64 and the communication is reduced.

Some of these random systems could be ill-conditioned.

$$A_{k,l} = \frac{1}{M_k}\sum_{j=1}^{M_k} \psi_l\left(S^{(j)}_{t_k}\right)\psi_l\left(S^{(j)}_{t_k}\right)^t \tag{18}$$

Typical condition numbers for linear regression n = 30 in the Black & Scholes model

SLIDE 18

Adaptation issues to American options The difference with the references

Must we systematically use Householder tridiagonalization with divide & conquer when we suspect the random linear systems to be ill-conditioned?

Our answer

Perform Householder tridiagonalization, $O(4n^3/3)$, and solve the linear systems cheaply using parallel cyclic reduction, $O(n\log_2(n))$.

Take a decision according to the value of the residual error:

* If the residual error is small then we already have good solutions.

* Otherwise, we must perform divide & conquer diagonalizations, $O(4n^3/3)$, and discard the smallest eigenvalues.

The next time we solve this same kind of linear system:

* If they used to be well-conditioned then we just process LDLt, $O(n^3/6)$.

* Otherwise we directly execute the combination of Householder tridiagonalization and divide & conquer diagonalization.
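The decision rule above can be summarized in a few lines. This is a sketch under assumptions: the relative residual in the Euclidean norm, the tolerance `tol`, and the string labels are hypothetical choices made for illustration; the talk does not specify them here.

```python
import math


def relative_residual(a, x, b):
    """||A x - b|| / ||b|| in the Euclidean norm (b is assumed nonzero)."""
    r = [sum(a[i][j] * x[j] for j in range(len(x))) - b[i] for i in range(len(b))]
    return math.sqrt(sum(v * v for v in r)) / math.sqrt(sum(v * v for v in b))


def after_pcr(a, x, b, tol=1e-4):
    """First pass: keep the tridiagonalization + PCR solution or fall back.

    tol is a hypothetical threshold, not a value taken from the talk.
    """
    return "accept" if relative_residual(a, x, b) <= tol else "divide_and_conquer"


def solver_for_next_time(was_well_conditioned):
    """Later passes on the same kind of system reuse the earlier diagnosis."""
    return "ldlt" if was_well_conditioned else "householder_divide_and_conquer"


# Usage on a tiny diagonal system.
good = after_pcr([[2.0, 0.0], [0.0, 2.0]], [1.0, 2.0], [2.0, 4.0])
bad = after_pcr([[2.0, 0.0], [0.0, 2.0]], [1.0, 1.0], [2.0, 4.0])
```

The point of the design is that the expensive eigen-decomposition is only paid when the cheap solve is observed to fail.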

SLIDE 19

Adaptation issues to American options LDLt

Shared occupation $n(n+1)/2 + n$ and complexity $O(n^3/6)$

$$A = LDL^t, \qquad D_{j,j} = A_{j,j} - \sum_{k=1}^{j-1} L_{j,k}^2 D_{k,k}, \qquad L_{i,j} = \frac{1}{D_{j,j}}\left(A_{i,j} - \sum_{k=1}^{j-1} L_{i,k}L_{j,k}D_{k,k}\right) \text{ if } i > j. \tag{19}$$

Standard LDLt parallel strategy
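A sequential reference version of (19), followed by the two triangular solves, may help when checking a parallel implementation. This is a plain sketch with hypothetical function names; it does not reproduce the thread layout or the shared-memory packing discussed in the talk.

```python
def ldlt(a):
    """Factor a symmetric matrix as A = L D L^t following (19).

    Returns the unit lower-triangular L (as a full list of lists) and the
    diagonal of D. No pivoting is performed in this sketch.
    """
    n = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(l[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            l[i][j] = (a[i][j] - sum(l[i][k] * l[j][k] * d[k] for k in range(j))) / d[j]
    return l, d


def ldlt_solve(a, b):
    """Solve A x = b through L y = b, D z = y, L^t x = z."""
    l, d = ldlt(a)
    n = len(b)
    y = list(b)
    for i in range(n):  # forward substitution
        y[i] -= sum(l[i][k] * y[k] for k in range(i))
    x = [y[i] / d[i] for i in range(n)]  # diagonal scaling
    for i in range(n - 1, -1, -1):  # backward substitution
        x[i] -= sum(l[k][i] * x[k] for k in range(i + 1, n))
    return x


# Usage on a 2 x 2 symmetric positive definite system.
factor_l, factor_d = ldlt([[4.0, 2.0], [2.0, 3.0]])
solution = ldlt_solve([[4.0, 2.0], [2.0, 3.0]], [6.0, 5.0])
```

In the inner loop over i, the entries of column j are independent of each other, which is what a collaborative-thread version can exploit.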

SLIDE 20

Adaptation issues to American options LDLt

Three different versions studied

1. An SIMD version that requires only independent threads, one for each linear system.
2. A collaborative version that involves n collaborative threads for each linear system with n unknowns.
3. An optimal hybrid solution that involves n∗ (n∗ < n) collaborative threads for each linear system with n unknowns.

The speedup of the collaborative and the hybrid versions when compared to the SIMD implementation.

SLIDE 21

Adaptation issues to American options LDLt

[Figure: (a) Optimal number of collaborative threads. (b) Number of systems solved within a second.]

SLIDE 22

Adaptation issues to American options Householder tridiagonalization + PCR

Householder tridiagonalization: Shared occupation $n^2 + 2n$ and complexity $O(4n^3/3)$

1. An SIMD version that requires only independent threads, one for each linear system.
2. A collaborative version that involves n collaborative threads for each linear system with n unknowns.

For symmetric A

$$U = H_3^t \cdots H_n^t\, A\, H_n \cdots H_3 = \begin{pmatrix} d_1 & c_1 & & & \\ c_1 & d_2 & c_2 & & \\ & c_2 & d_3 & \ddots & \\ & & \ddots & \ddots & c_{n-1} \\ & & & c_{n-1} & d_n \end{pmatrix}, \tag{20}$$

with each Householder matrix H given by

$$H = I - uu^t/b, \qquad b = u^tu/2. \tag{21}$$
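The reduction (20) with the reflectors (21) can be sketched sequentially as below. The function name is hypothetical, the sign convention for the reflector is one common choice rather than necessarily the authors', and the code updates the full matrix at each step for readability instead of exploiting symmetry.

```python
import math


def householder_tridiagonalize(a):
    """Return U = H^t ... A ... H tridiagonal, for a symmetric matrix a.

    Each step uses H = I - u u^t / b with b = u^t u / 2, applied on both
    sides through the rank-two update U <- U - u q^t - q u^t.
    """
    n = len(a)
    u_mat = [row[:] for row in a]
    for k in range(n - 2):
        x = [u_mat[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(v * v for v in x))
        if norm_x == 0.0:
            continue  # column already has the required zeros
        alpha = -math.copysign(norm_x, x[0])  # sign chosen to avoid cancellation
        u = [0.0] * n
        for i in range(k + 1, n):
            u[i] = u_mat[i][k]
        u[k + 1] -= alpha
        b = sum(v * v for v in u) / 2.0
        p = [sum(u_mat[i][j] * u[j] for j in range(n)) / b for i in range(n)]
        kappa = sum(u[i] * p[i] for i in range(n)) / (2.0 * b)
        q = [p[i] - kappa * u[i] for i in range(n)]
        for i in range(n):
            for j in range(n):
                u_mat[i][j] -= u[i] * q[j] + q[i] * u[j]
    return u_mat


# Usage on a 4 x 4 symmetric matrix.
sym = [[4.0, 1.0, 2.0, 2.0],
       [1.0, 2.0, 0.0, 1.0],
       [2.0, 0.0, 3.0, 2.0],
       [2.0, 1.0, 2.0, 1.0]]
tri = householder_tridiagonalize(sym)
```

Since the transformation is an orthogonal similarity, the trace is preserved and the first off-diagonal entry has the magnitude of the norm of the eliminated column part.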

SLIDE 23

Adaptation issues to American options Householder tridiagonalization + PCR

From CR: Shared occupation $3n$ and complexity $O(n\log_2(n))$

CR when n = 8

Step 1: Forward reduction to a 4-unknown system involving z2, z4, z6 and z8
Step 2: Forward reduction to a 2-unknown system involving z4 and z8
Step 3: Solve 2-unknown system
Step 4: Backward substitution to solve the rest 2 unknowns
Step 5: Backward substitution to solve the rest 4 unknowns

SLIDE 24

Adaptation issues to American options Householder tridiagonalization + PCR

To a new version of PCR: Shared occupation $4n$ and complexity $O(n\log_2(n))$

PCR when n = 8

Step 1: Reduced to 2 systems of 4 unknowns
Step 2: Reduced to 4 systems of 2 unknowns
Step 3: Solve
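For reference, here is a sequential sketch of textbook parallel cyclic reduction in which neighbours that fall outside the index range are simply dropped, so the loop runs for any size n and not only powers of two. It is not claimed to be the new version presented in the talk; the function name and the array convention (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side) are assumptions.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system a[i] x[i-1] + b[i] x[i] + c[i] x[i+1] = d[i].

    Each pass doubles the coupling distance; after ceil(log2(n)) passes every
    equation involves a single unknown. No pivoting, so the matrix is assumed
    to be, for instance, diagonally dominant.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0] = 0.0       # no neighbour to the left of the first unknown
    c[n - 1] = 0.0   # no neighbour to the right of the last unknown
    stride = 1
    while stride < n:
        na, nb, nc, nd = a[:], b[:], c[:], d[:]
        for i in range(n):  # on a GPU, one thread per index i
            lo, hi = i - stride, i + stride
            alpha = -a[i] / b[lo] if lo >= 0 else 0.0
            gamma = -c[i] / b[hi] if hi < n else 0.0
            nb[i] = b[i]
            nd[i] = d[i]
            na[i] = 0.0
            nc[i] = 0.0
            if lo >= 0:
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
                na[i] = alpha * a[lo]
            if hi < n:
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
                nc[i] = gamma * c[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    return [d[i] / b[i] for i in range(n)]


# Usage with n = 5 (not a power of two); the exact solution is 1, 2, 3, 4, 5.
x_pcr = pcr_solve([0.0, -1.0, -1.0, -1.0, -1.0],
                  [4.0, 4.0, 4.0, 4.0, 4.0],
                  [-1.0, -1.0, -1.0, -1.0, 0.0],
                  [2.0, 4.0, 6.0, 8.0, 16.0])
```

All n updates within a pass read only values from the previous pass, which is why the passes map naturally onto n collaborative threads with a synchronization between passes.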

SLIDE 25

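Adaptation issues to American options Householder tridiagonalization + PCR

Reduction (R) and permutation (P) steps for n = 7:

$$\begin{pmatrix}
d_1 & c_1 & & & & & \\
c_1 & d_2 & c_2 & & & & \\
& c_2 & d_3 & c_3 & & & \\
& & c_3 & d_4 & c_4 & & \\
& & & c_4 & d_5 & c_5 & \\
& & & & c_5 & d_6 & c_6 \\
& & & & & c_6 & d_7
\end{pmatrix}
\xrightarrow{(R)}
\begin{pmatrix}
d'_1 & & c'_2 & & & & \\
& d'_2 & & c'_3 & & & \\
c'_2 & & d'_3 & & c'_4 & & \\
& c'_3 & & d'_4 & & c'_5 & \\
& & c'_4 & & d'_5 & & c'_6 \\
& & & c'_5 & & d'_6 & \\
& & & & c'_6 & & d'_7
\end{pmatrix}
\xrightarrow{(P)}
\begin{pmatrix}
d'_1 & c'_2 & & & & & \\
c'_2 & d'_3 & c'_4 & & & & \\
& c'_4 & d'_5 & c'_6 & & & \\
& & c'_6 & d'_7 & & & \\
& & & & d'_2 & c'_3 & \\
& & & & c'_3 & d'_4 & c'_5 \\
& & & & & c'_5 & d'_6
\end{pmatrix}$$

$$\xrightarrow{(R)}
\begin{pmatrix}
d''_1 & & c''_2 & & & & \\
& d''_3 & & c''_4 & & & \\
c''_2 & & d''_5 & & & & \\
& c''_4 & & d''_7 & & & \\
& & & & d''_2 & & c''_3 \\
& & & & & d''_4 & \\
& & & & c''_3 & & d''_6
\end{pmatrix}
\xrightarrow{(P)}
\begin{pmatrix}
d''_1 & c''_2 & & & & & \\
c''_2 & d''_5 & & & & & \\
& & d''_3 & c''_4 & & & \\
& & c''_4 & d''_7 & & & \\
& & & & d''_2 & c''_3 & \\
& & & & c''_3 & d''_6 & \\
& & & & & & d''_4
\end{pmatrix}$$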

SLIDE 26

Adaptation issues to American options Householder tridiagonalization + PCR

[Figure: (a) Number of PCRs + two matrix/vector multiplications performed per second. (b) LDLt vs. tridiagonal + PCR.]

SLIDE 27

Adaptation issues to American options Divide and conquer for eigenproblem

For ill-conditioned systems

Standard procedure

Tridiagonal Householder decomposition A = QUQt where Q is orthogonal and U is symmetric tridiagonal.

Divide & conquer algorithm for symmetric tridiagonal eigenproblems to establish U = ODOt where O is orthogonal and D is diagonal.

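Discard the smallest eigenvalues of D that provide a condition number larger than $10^5$.

$$U = \begin{pmatrix}
d_1 & c_1 & & & & & & \\
c_1 & \ddots & \ddots & & & & & \\
& \ddots & \ddots & c_{m-1} & & & & \\
& & c_{m-1} & d_m - c_m & & & & \\
& & & & d_{m+1} - c_m & c_{m+1} & & \\
& & & & c_{m+1} & \ddots & \ddots & \\
& & & & & \ddots & \ddots & c_{n-1} \\
& & & & & & c_{n-1} & d_n
\end{pmatrix} + c_m \mathbf{1}_{m,m+1}\mathbf{1}^t_{m,m+1}
= \begin{pmatrix} U_1 & \\ & U_2 \end{pmatrix} + c_m \mathbf{1}_{m,m+1}\mathbf{1}^t_{m,m+1}$$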
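The splitting above is easy to check numerically. The sketch below, with hypothetical helper names, extracts $U_1$, $U_2$ and $c_m$ from the diagonal and off-diagonal of U and then recombines them, reading $\mathbf{1}_{m,m+1}$ as the vector with ones in positions m and m+1 and zeros elsewhere.

```python
def split_tridiagonal(d, c, m):
    """Split U (diagonal d_1..d_n, off-diagonal c_1..c_{n-1}) after row m.

    m is the 1-based size of the first block. Returns the (diagonal,
    off-diagonal) pairs of U_1 and U_2 and the coupling coefficient c_m.
    """
    cm = c[m - 1]
    d1 = d[:m]
    d1[-1] -= cm          # last diagonal entry of U_1 is d_m - c_m
    d2 = d[m:]
    d2[0] -= cm           # first diagonal entry of U_2 is d_{m+1} - c_m
    return (d1, c[:m - 1]), (d2, c[m:]), cm


def dense_tridiagonal(d, c):
    """Dense symmetric tridiagonal matrix from its diagonals."""
    n = len(d)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = d[i]
    for i in range(n - 1):
        a[i][i + 1] = a[i + 1][i] = c[i]
    return a


def recombine(part1, part2, cm):
    """Rebuild U = diag(U_1, U_2) + c_m 1 1^t as a dense matrix."""
    (d1, c1), (d2, c2) = part1, part2
    m = len(d1)
    u = dense_tridiagonal(d1 + d2, c1 + [0.0] + c2)
    for i in (m - 1, m):
        for j in (m - 1, m):
            u[i][j] += cm  # rank-one correction on rows/columns m and m+1
    return u


p1, p2, coupling = split_tridiagonal([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0], 2)
rebuilt = recombine(p1, p2, coupling)
```

The two blocks can then be diagonalized independently, which is the "divide" half of the algorithm.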

SLIDE 28

Adaptation issues to American options Divide and conquer for eigenproblem

Shared occupation $2n(n+2) + 2^{1+\lfloor\log_2(n-1)\rfloor}$ and complexity $O(4n^3/3)$

Steps

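1. $U = \begin{pmatrix} O_1 & \\ & O_2 \end{pmatrix}\left[\begin{pmatrix} D_1 & \\ & D_2 \end{pmatrix} + c_m uu^t\right]\begin{pmatrix} O_1^t & \\ & O_2^t \end{pmatrix}$ where $u = \begin{pmatrix} O_1^t & \\ & O_2^t \end{pmatrix}\mathbf{1}_{m,m+1} = \begin{pmatrix} \text{last column of } O_1^t \\ \text{first column of } O_2^t \end{pmatrix}$.

2. Let $\Lambda = \{\lambda_1, ..., \lambda_n\}$ be the ordered family of eigenvalues of $\begin{pmatrix} D_1 & \\ & D_2 \end{pmatrix}$. If $c_m \neq 0$ and the eigenvalue λ of U satisfies $\lambda \notin \Lambda$, then its value is obtained as a solution of

$$\sum_{i=1}^{n} \frac{u_i^2}{\lambda_i - \lambda} + \frac{1}{c_m} = 0. \tag{23}$$

3. From u and the solutions of (23), Löwner’s Theorem provides a vector $\hat u$ that is used to compute the eigenvector $V_\lambda$ of $\begin{pmatrix} D_1 & \\ & D_2 \end{pmatrix} + c_m uu^t$.

4. Let $W = (V_\lambda)_{\lambda \text{ eigenvalue of } U}$; we get the eigenvectors of U thanks to the multiplication $\begin{pmatrix} O_1 & \\ & O_2 \end{pmatrix} W$.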
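To illustrate step 2, the roots of the secular equation (23) can be bracketed between consecutive $\lambda_i$ and located one interval at a time. The sketch below deliberately swaps in plain bisection, which is slower than the Gragg scheme used in the talk but easy to verify. It assumes strictly increasing $\lambda_i$, nonzero $u_i$ and $c_m > 0$; the function names are hypothetical.

```python
def secular(lam, lambdas, u, c_m):
    """Left-hand side of (23) evaluated at lam."""
    return sum(ui * ui / (li - lam) for ui, li in zip(u, lambdas)) + 1.0 / c_m


def secular_roots(lambdas, u, c_m, iters=200):
    """Roots of (23) by bisection, one per interval (assumes c_m > 0).

    For c_m > 0 the k-th root lies in (lambda_k, lambda_{k+1}) and the last
    one in (lambda_n, lambda_n + c_m * ||u||^2). The function is increasing
    on each interval, so its sign tells on which side the root lies.
    """
    n = len(lambdas)
    upper = lambdas[-1] + c_m * sum(ui * ui for ui in u)
    roots = []
    for k in range(n):
        lo = lambdas[k]
        hi = lambdas[k + 1] if k + 1 < n else upper
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if mid <= lo or mid >= hi:
                break  # interval exhausted at floating-point resolution
            if secular(mid, lambdas, u, c_m) < 0.0:
                lo = mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return roots


# Usage: diag(1, 3) + 2 u u^t with u = (1, 1) / sqrt(2) is [[2, 1], [1, 4]],
# whose eigenvalues are 3 - sqrt(2) and 3 + sqrt(2).
eig = secular_roots([1.0, 3.0], [2.0 ** -0.5, 2.0 ** -0.5], 2.0)
```

Because each interval is handled independently, the root searches can run concurrently, one interval per thread.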

SLIDE 29

Adaptation issues to American options Divide and conquer for eigenproblem

Additional details on steps 1. and 2.

Step 1.

Advantage: Pure divide and conquer algorithm; it prevents eigenvalues of multiplicity greater than two at each conquering step.

Step 2.

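Using Gragg’s scheme (based on Newton’s method): Choose $h_k$ such that

$$h_k(\lambda) = x_{k,0} + \frac{x_{k,1}}{\lambda_k - \lambda} + \frac{x_{k,2}}{\lambda_{k+1} - \lambda}$$

matches

$$\sum_{i=1}^{n} \frac{u_i^2}{\lambda_i - \lambda} + \frac{1}{c_m}$$

at its root $\in (\lambda_k, \lambda_{k+1})$ up to the second derivative.

Advantage: Cubic monotonic convergence.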

SLIDE 30

Adaptation issues to American options Divide and conquer for eigenproblem

Comparison with Householder tridiagonalization

Execution time comparison: $T_{DC}/T_{Householder}$

[Figure: execution time ratio as a function of the system size.]

Comments

Small matrices.

Iterative algorithm to solve the secular equation.

Divergence produced by deflation.

SLIDE 31

Some simulation results

SLIDE 32

Some simulation results

Within less than 1 minute simulation on GPU: M0 = 131K, N = 10, Neds = 50

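European path-dependent option

$$\Phi(S_T) = \left(\frac{S_T^1}{2} + \frac{S_T^2}{2} - S_T^3\right)^+$$

| $M_1$ | $\Theta_0$ | $\Theta_0$ std | $\mathrm{CVA}_{0,T}$ | $\mathrm{CVA}_{0,T}$ std |
|---|---|---|---|---|
| $\sqrt{M_0}/N$ | 0.01364 | $4 \times 10^{-5}$ | 0.0296 | $2 \times 10^{-4}$ |
| $\sqrt{M_0}/\sqrt{N}$ | 0.01307 | $4 \times 10^{-5}$ | 0.0294 | $2 \times 10^{-4}$ |
| $\sqrt{M_0}$ | 0.01265 | $3 \times 10^{-5}$ | 0.0291 | $2 \times 10^{-4}$ |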

SLIDE 33

Some simulation results

Within less than 1 minute simulation on GPU: M0 = 131K, N = 10, Neds = 50

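European path-dependent option

$$\Phi(S_T) = \left(\frac{3S_T^1}{10} + \frac{7S_T^2}{10} - S_T^3\right)^+ \left(\frac{7S_T^1}{10} + \frac{3S_T^2}{10} - S_T^3\right)^+$$

| $M_1$ | $\Theta_0$ | $\Theta_0$ std | $\mathrm{CVA}_{0,T}$ | $\mathrm{CVA}_{0,T}$ std |
|---|---|---|---|---|
| $\sqrt{M_0}/N$ | $2.72 \times 10^{-3}$ | $10^{-5}$ | 0.0365 | $8 \times 10^{-4}$ |
| $\sqrt{M_0}/\sqrt{N}$ | $2.44 \times 10^{-3}$ | $10^{-5}$ | 0.0453 | $8 \times 10^{-4}$ |
| $\sqrt{M_0}$ | $2.28 \times 10^{-3}$ | $10^{-5}$ | 0.0520 | $8 \times 10^{-4}$ |
| $\sqrt{N}\sqrt{M_0}$ | $2.24 \times 10^{-3}$ | $10^{-5}$ | 0.0528 | $8 \times 10^{-4}$ |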

SLIDE 34

Some simulation results

Within less than 1 minute simulation on GPU: M0 = 131K, N = 10, Neds = 50 and n = 10

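American option

$$\Phi(S_T) = \left(K - \frac{S_T^1}{3} - \frac{S_T^2}{3} - \frac{S_T^3}{3}\right)^+$$

| $M_1$ | $\Theta_0$ | $\Theta_0$ std | $\mathrm{CVA}_{0,T}$ | $\mathrm{CVA}_{0,T}$ std |
|---|---|---|---|---|
| $\sqrt{M_0}/\sqrt{N}$ | 0.0242 | $10^{-4}$ | 0.0356 | $2 \times 10^{-4}$ |
| $\sqrt{M_0}$ | 0.0229 | $10^{-4}$ | 0.0351 | $2 \times 10^{-4}$ |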

SLIDE 35

Conclusion

SLIDE 36

Conclusion

Mathematical and computing work suited to GPUs

Mathematical part

Extending CVA (TVA) to American options.

Judicious choice of the number of inner and outer trajectories.

Computing part

CUDA source code of: LDLt, Householder reduction, parallel cyclic reduction for sizes that are not necessarily a power of two, and divide and conquer for eigenproblem.

Execution time comparison of the different methods mentioned above.

Original method to further optimize the adaptation of LDLt to our context.

Original parallel cyclic reduction that can be used for any vector size and not only a power of two.

Precise answer to the following question: Must we systematically use Householder tridiagonalization with divide & conquer when we suspect the random linear systems to be ill-conditioned?

SLIDE 37

Conclusion

Future work

Further developments on the use of Nested Monte Carlo on GPUs for BSDEs.

Studying the rounding errors and error propagation.

Use CADNA library to test each procedure: http://www-pequan.lip6.fr/cadna/

Source code

http://www.proba.jussieu.fr/~abbasturki/soft.htm

Premia library: https://www.rocq.inria.fr/mathfi/Premia/index.html

References

L.A. Abbas-Turki and Stef Graillat. Resolution of a large number of small random symmetric linear systems in single precision arithmetic on GPUs: https://hal.archives-ouvertes.fr/hal-01295549

L.A. Abbas-Turki and M.A. Mikou. TVA on American Derivatives: https://hal.archives-ouvertes.fr/hal-01142874

SLIDE 38

The End

Thank you. Questions?
