CDS Rate Construction Methods by Machine Learning Techniques (Presentation Slides)



slide-1
SLIDE 1

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/317273929

CDS Rate Construction Methods by Machine Learning Techniques (Presentation Slides)

Article in SSRN Electronic Journal · January 2017

DOI: 10.2139/ssrn.2973065

1 author: Zhongmin Luo, Birkbeck, University of London. Related projects: Machine Learning and Application in Finance; No-arbitrage conditions for the CDS term structure, arbitrage opportunities and implications.

18 publications, 16 citations.

All content following this page was uploaded by Zhongmin Luo on 27 November 2018.


slide-2
SLIDE 2

CDS Rate Construction by Machine Learning Techniques

Zhongmin Luo

07-March-2017 *

Department of Economics, Mathematics and Statistics, Birkbeck, University of London; and Standard Chartered Bank, London, UK

* A presentation delivered at the invitation of the London School of Economics. Disclaimer: thanks to participants for their feedback; the views and opinions expressed in this presentation are those of the authors and do not necessarily reflect the practices or positions of the above affiliated institutions.

Raymond Brummelhuis

University of Reims Champagne-Ardenne, Department of Mathematics and Computer Science, Reims, France, and Department of Economics, Mathematics and Statistics, Birkbeck, University of London, UK

Brummelhuis, Raymond and Luo, Zhongmin, CDS Rate Construction Methods by Machine Learning Techniques (May 12, 2017). Available at SSRN: https://ssrn.com/abstract=2967184

slide-3
SLIDE 3

Agenda

  • 1. Set the scene and contexts
  • 2. The Machine Learning Technique based Solution

1) A very brief summary of the Machine Learning Technique based CDS Proxy Methods.
2) A very brief summary of the three top classification performers: Neural Network (NN), Support Vector Machine (SVM) and Ensemble/Bagged Tree.
3) A very brief summary of cross-classifier performances, including the other five classifier families: Discriminant Analysis (DA), Naïve Bayes (NB), k-Nearest Neighbours (kNN), Logistic Regression (LR) and Decision Tree (DT).
4) Parameterization choices for classifiers; regularization; optimal parameterizations and tuning.
5) Correlation impacts on classification performances.

  • 3. Conclusions
  • 4. Q&A


slide-4
SLIDE 4

The Scene: what would be our losses if X or Y defaults?

  • Lehman declared bankruptcy on 15Sep08; beside leafy Green Park in London, at one of the largest European hedge funds, two questions popped up:
  • What would be our losses to Bank X (which has liquid CDS quotes) if it defaults within the coming year? It's a fair question after seeing what happened to Lehman.
  • What would be our losses to Pension Fund Y (which doesn't have liquid CDS quotes) if it defaults within the coming year? It's a tricky question!
  • To answer the second question, financial institutions employ the so-called CDS Proxy Method; CDS proxies are extensively used in XVA pricing and credit risk management.


slide-5
SLIDE 5

1.1 A Shortage of Liquidity Problem

  • In response to the financial crisis of 2008, banking regulators and accounting standards boards have required Financial Institutions to measure Counterparty Default Risks and make CVA/FVA (XVA) adjustments based on Credit Default Swap (CDS) rates for their counterparties, either observed or proxied via a so-called CDS Proxy Method.
  • Shortage of Liquidity problem: in reality, the vast majority of FIs' counterparties do not have liquid CDS quotes.

[Figure: a European bank's counterparty distribution by regions/sectors, showing the % of counterparties with observable vs. non-observable CDS quotes in each bucket; per-bucket shares range from 63% to 99% (overall: 84.4%; EBA survey: >75%).]

slide-6
SLIDE 6

1.2 Regulatory Criteria and Two Existing CDS Proxy Methods

Two Existing CDS Proxy Methods

1. Credit Curve Mapping: proxies CDS rates by the mean/median of the CDS rates within a Region/Sector/Rating bucket.
2. Cross-sectional Regression: explains the term-specific CDS rate of counterparty j by its responses (regression coefficients) to indicator variables for whether the counterparty belongs to a given region, sector, rating or seniority class, estimated by running one cross-sectional regression per CDS term.

Regulatory Criteria

1. The CDS Proxy Method has to be based on an algorithm that discriminates at least 3 types of variables: Region, Industry and Credit Quality (e.g., rating).

2. Both the observable counterparty (or "observables") and the non-observable counterparty (or "non-observables") must come from the same peer group defined by the above three variables.

3. The appropriateness of a Proxy CDS Spread should be determined by the CDS spread volatility across the constituents within the bucket and not by its level; i.e., any CDS Proxy Method should reflect the idiosyncratic components of counterparty default risks.
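Neither the slide text nor the extraction preserves the regression equation, so the following is only a minimal, hypothetical sketch of the Cross-sectional Regression idea described on this slide: for a single CDS term, regress log CDS rates on indicator (dummy) variables for region, sector and rating, then read off a proxy rate for a non-observable counterparty. All data values and bucket definitions are invented, and `numpy.linalg.lstsq` stands in for whatever estimator the paper actually uses:

```python
import numpy as np

# Hypothetical observable counterparties: (region, sector, rating, 5y CDS in bps).
obs = [
    ("EU", "Bank", "A",   120.0),
    ("EU", "Bank", "BBB", 240.0),
    ("EU", "Corp", "A",   150.0),
    ("US", "Bank", "A",   100.0),
    ("US", "Corp", "BBB", 260.0),
    ("US", "Bank", "BBB", 210.0),
]

regions = sorted({r for r, _, _, _ in obs})
sectors = sorted({s for _, s, _, _ in obs})
ratings = sorted({q for _, _, q, _ in obs})

def design_row(region, sector, rating):
    # Intercept plus indicator (dummy) variables for region, sector and rating.
    return [1.0] + [float(region == r) for r in regions] \
                 + [float(sector == s) for s in sectors] \
                 + [float(rating == q) for q in ratings]

X = np.array([design_row(r, s, q) for r, s, q, _ in obs])
y = np.log([rate for _, _, _, rate in obs])   # log-rates keep proxies positive

# One least-squares regression per CDS term (here: the 5y term only).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def proxy_rate(region, sector, rating):
    # Proxy CDS rate for a non-observable counterparty from its attributes.
    return float(np.exp(np.array(design_row(region, sector, rating)) @ beta))

print(round(proxy_rate("EU", "Corp", "BBB"), 1))
```

A real implementation would run one such regression per CDS term and include seniority indicators, as the slide describes.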

slide-7
SLIDE 7

1.3. Research Gaps and Research Objectives

Research Gaps

1. As CDS Curve Proxy Methods:

  • Credit Curve Mapping: the bucket-specific CDS means/medians only represent the bucket-average default risk level; they represent neither counterparty-specific default risk nor the volatility of default risks across the counterparties within the bucket.
  • Cross-sectional Regression: for each CDS term, one regression is run for the bucket; like the Curve Mapping Method, it fails to account for counterparty-specific default risk and volatility within the bucket. Furthermore, it can introduce Arbitrage Opportunities by producing an "inverted CDS curve" for the bucket even in cases where many counterparties within the bucket have "normal" upward-sloping curves.
  • Bond spreads include significant liquidity premiums, as indicated by the literature, and thus are not a good choice for a CDS proxy.

2. As Classifier Performance Comparison based on financial market data: given the large number of potential classifier candidates, the question of cross-classifier performance comparison arises; existing Classifier Performance Comparison studies [7][8] are based on non-financial market data.

Research Objectives

1. Research into CDS Proxy Methods based on Machine Learning Techniques.
2. A Classifier Performance Comparison based on financial market data, in search of a best-of-best solution to the problem discussed on Slide 4.

slide-8
SLIDE 8

1.4 Mean and Std. of 5-year CDS for Europe/Banking/A-rated counterparties

  • On 15Sep08, Lehman Brothers EU (declared bankrupt), Commerzbank, Credit Suisse, Standard Chartered, Macquarie EU, Wachovia, UniCredit, Fortis (nationalized by the Dutch government), AIB (nationalized by the Irish government) and Northern Rock (nationalized by the UK government) were all rated 'A'. Should they all be treated equally for counterparty default risk based on the fact that they belong to the Europe/Banking/A-rating bucket?
  • On 15Sep08, it was obviously wrong to use one bucket-level CDS proxy spread (the Average-A of 370 bps) to represent the default risks of all counterparties, which carry a significant amount of idiosyncrasy, as shown by the day's CDS volatility of 620 bps (Std-A) across the Europe/Banking/A-rating bucket.

slide-9
SLIDE 9

2.1 A Machine Learning Technique based Solution

Given a so-called Training Set E_T = {(x_j, z_j)}, with z_j the class label and x_j the Feature Vector, Machine Learning Techniques enable us to construct a mapping z = G(x); G is learned from E_T by a given algorithm, or Classifier Family, available from Machine Learning. An algorithm from a Classifier Family with a specific parameterization choice is referred to as a Classifier. In this paper, we studied 8 Classifier Families and presented 156 Classifiers, out of hundreds of different parameterization choices, based on a "top-3" principle.

slide-10
SLIDE 10

2.2 Feature Selections: based on empirical experience and literature

slide-11
SLIDE 11

2.3 Eight Classifier Families and 156 Classifiers

  • 1. Neural Network (NN): e.g., activation functions, # of hidden units.
  • 2. Support Vector Machine (SVM): e.g., kernel functions.
  • 3. Ensemble Bagged Tree (BT): e.g., the number of learning cycles.
  • 4. Discriminant Analysis (DA): e.g., Linear/Quadratic; regularization.
  • 5. Naïve Bayes (NB): e.g., kernel choices; bandwidth selections.
  • 6. k-Nearest Neighbours (kNN): e.g., distance metrics; k in kNN.
  • 7. Logistic Regression (LR).
  • 8. Decision Tree (DT): e.g., impurity measure choices; tree sizes.
slide-12
SLIDE 12

2.4 Eight Classifier Families and 156 Classifiers

slide-13
SLIDE 13

2.5 Cross/Intra-classifier Performance for Our CDS Proxy Methods

slide-14
SLIDE 14

2.6 A Simple n-Unit 2-layer Neural Network

Activation functions with n hidden units: e.g., the sigmoid function. Output transform function: the softmax function. d is the # of features. Fitting of the Neural Network.
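A minimal sketch of the forward pass of such a 2-layer network, with sigmoid hidden units and a softmax output; the dimensions (d = 2 features, n = 3 hidden units, 2 classes) and all weights below are arbitrary illustrative values, not fitted parameters:

```python
import math

def sigmoid(v):
    # Hidden-unit activation function.
    return 1.0 / (1.0 + math.exp(-v))

def softmax(vs):
    # Output transform: turns scores into class probabilities.
    exps = [math.exp(v - max(vs)) for v in vs]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, W1, b1, W2, b2):
    # Hidden layer: n sigmoid units over the d input features.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    # Output layer: linear scores followed by softmax.
    scores = [sum(w * hi for w, hi in zip(row, h)) + b
              for row, b in zip(W2, b2)]
    return softmax(scores)

# Hypothetical weights: d = 2 features, n = 3 hidden units, 2 classes.
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b1 = [0.0, 0.1, -0.1]
W2 = [[1.0, -1.0, 0.5], [-0.5, 0.7, -0.2]]
b2 = [0.0, 0.0]

print(forward([0.2, 0.6], W1, b1, W2, b2))  # two probabilities summing to 1
```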

slide-15
SLIDE 15

2.7 Mathematical Representation of Neural Network

Top performing Activation Functions

Minimizing the Cross-Entropy
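The formula itself did not survive extraction; in the standard multi-class formulation (notation assumed here, not recovered from the slide), fitting the network means minimizing the cross-entropy over the weights w:

```latex
E(\mathbf{w}) = -\sum_{j=1}^{N}\sum_{c=1}^{C} t_{jc}\,\ln y_c(\mathbf{x}_j, \mathbf{w})
```

where t_{jc} = 1 if training point j belongs to class c (and 0 otherwise), and y_c is the network's softmax output for class c.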

slide-16
SLIDE 16

2.8 Neural Network Performance for CDS Proxy Method

slide-17
SLIDE 17

2.9 Support Vector Machine for Linearly Separable Data

Maximizing the margin in case of linearly separable data
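For reference, the standard hard-margin primal problem (notation assumed, with z_j ∈ {−1, +1} the class labels): maximizing the margin 2/‖w‖ is equivalent to

```latex
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w}\rVert^{2}
\quad \text{subject to} \quad
z_j\,(\mathbf{w}^{\mathsf{T}}\mathbf{x}_j + b) \ge 1, \qquad j = 1,\dots,N .
```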

slide-18
SLIDE 18

2.10 Support Vector Machine for Nonlinearly Separable Data

  • In the case of non-linearly separable data, the data can first be transformed into linearly separable data; by limiting ourselves to w = Σ_j β_j x_j, the previous optimization problem becomes its dual form.
  • Then, one can replace the inner product x_j^T x_k with a kernel function K(x_j, x_k); this substitution is the so-called "kernel trick".
  • Linear kernel: K(x_j, x_k) = x_j^T x_k
  • Polynomial kernel: K(x_j, x_k) = (γ x_j^T x_k + r)^d, where γ > 0
  • Gaussian kernel: K(x_j, x_k) = exp(−‖x_j − x_k‖² / (2σ²)), where σ > 0
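The three kernels listed above translate directly into code; this is a self-contained sketch in pure Python, with arbitrary hyperparameter defaults:

```python
import math

def linear_kernel(x, y):
    # K(x, y) = x^T y
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, gamma=1.0, r=1.0, d=2):
    # K(x, y) = (gamma * x^T y + r)^d, with gamma > 0
    return (gamma * linear_kernel(x, y) + r) ** d

def gaussian_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), with sigma > 0
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

x, y = [1.0, 2.0], [3.0, 0.5]
print(linear_kernel(x, y))      # → 4.0
print(polynomial_kernel(x, y))  # → 25.0
print(gaussian_kernel(x, y))    # a value in (0, 1)
```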

slide-19
SLIDE 19

2.11 Performance of SVM

  • Across different kernel functions and different feature selections.
slide-20
SLIDE 20

2.12 Bagged Tree / Ensemble

  • Ensemble is based on a committee of learning algorithms; e.g., Bagged Trees are based on Bootstrapping (sampling with replacement).
  • The learning outcome is determined by a Majority Vote Rule over a sequence of Decision Tree classification results.
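A minimal sketch of bagging with majority voting: each committee member is trained on a bootstrap sample and predictions are combined by vote. The data and the toy one-feature threshold "stump" base learner are invented for illustration; the paper's Bagged Trees use full decision trees:

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    # Sampling with replacement: each draw may repeat elements of `data`.
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    # Toy base learner: a threshold on the single feature, chosen to
    # minimize training error on the bootstrap sample.
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for label_above in (0, 1):
            errors = sum(
                ((x >= threshold) == bool(label_above)) != bool(y)
                for x, y in sample
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    _, threshold, label_above = best
    return lambda x: label_above if x >= threshold else 1 - label_above

def bagged_predict(models, x):
    # Majority Vote Rule over the committee's predictions.
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Toy 1-D data: label 1 iff the feature exceeds 5.
data = [(x, int(x > 5)) for x in range(11)]
rng = random.Random(0)
models = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
print(bagged_predict(models, 9), bagged_predict(models, 1))
```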
slide-21
SLIDE 21

2.13 Model Assessments: K-fold Cross Validation

  • First, we split the observable data E^O into L folds, typically of equal size:

E^O = ∪_{l=1}^{L} E_l

  • Second, for l = 1, 2, ..., L, define the holdout sample E_H^l = E_l and define the l-th Training Set:

E_T^l = E^O − E_H^l

  • Third, for l = 1, 2, ..., L, we apply the Classifier trained on Training Set E_T^l to estimate the label ẑ(x) for each data point (x, z) of the holdout sample E_H^l, and calculate the expected Misclassification Rate as:

err_l = (1 / #E_H^l) · Σ_{(x,z) ∈ E_H^l} I(z ≠ ẑ(x))
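The three steps above can be sketched as follows, in pure Python; the majority-class base learner is a hypothetical stand-in used only to exercise the procedure:

```python
import random

def k_fold_error(data, fit, k=5, seed=0):
    """K-fold cross-validated misclassification rate for classifier `fit`,
    where fit(train) returns a predict(x) function."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # step 1: split into K folds
    rates = []
    for hold in folds:
        hold_set = set(hold)
        # Step 2: the training set is everything outside the holdout fold.
        train = [data[i] for i in idx if i not in hold_set]
        # Step 3: train, predict on the holdout, record the error rate.
        predict = fit(train)
        errors = sum(predict(data[i][0]) != data[i][1] for i in hold)
        rates.append(errors / len(hold))
    return sum(rates) / k  # average expected misclassification rate

# A trivial majority-class learner (hypothetical stand-in classifier).
def majority_fit(train):
    labels = [z for _, z in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top

data = [((i,), int(i >= 6)) for i in range(10)]  # 6 zeros, 4 ones
print(k_fold_error(data, majority_fit, k=5))     # → 0.4
```

The majority-class learner always predicts the dominant label, so its cross-validated error equals the minority-class share (4/10 here), which makes the procedure easy to check.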

slide-22
SLIDE 22

3.1 Summary of Cross-classifier Performances

  • We studied the other 5 classifier families: Discriminant Analysis (DA), Naïve Bayes (NB), k-Nearest Neighbours (kNN), Logistic Regression (LR) and Decision Tree (DT).
  • We rank all 156 classifiers and present the top three from each Classifier Family.
  • The ranking of top performing Classifier Families is in line with those reported in the classic performance comparison literature based on non-financial data, e.g., [7][8].

slide-23
SLIDE 23

3.2. Conclusions

  • 1. Relative to the two existing CDS Proxy Methods, the Machine Learning Technique based solution to the Shortage of Liquidity problem satisfies all regulatory requirements and provides a more accurate default-risk proxy by addressing counterparty-specific default risk, as required for CVA pricing and counterparty credit risk management.
  • 2. Model assessment is based on sound Statistical/Machine Learning theories and produces satisfactory results under the Cross-validation procedure.
  • 3. Based on our empirical studies of 156 classifiers across the 8 most popular algorithms [11], Neural Network, SVM and Ensemble/Bagged Tree are the top 3 performers; the results are in line with the classic literature in the area (Accuracy Rate/Std. Dev.):
  • Neural Network, tangent activation function: 99.3% (0.6%);
  • SVM, polynomial kernel: 96.8% (1.6%);
  • Ensemble/Bagged Tree: 96.0% (2.2%).
  • 4. To the best of our knowledge, this study is:
  • the first Machine Learning Technique based CDS Proxy Method;
  • the first Classifier Performance Comparison research based on financial market data;
  • the first research effort to look at correlation impacts on cross-classifier performance.


slide-24
SLIDE 24

4.1. References

  • Due to time constraints, we refer interested readers to the following paper: Brummelhuis, Raymond and Luo, Zhongmin, CDS Rate Construction Methods by Machine Learning Techniques (May 12, 2017). Available at SSRN: https://ssrn.com/abstract=2967184
