SHOULD A MODEL ‘KNOW’ ITS OWN ID?
SOME THOUGHTS ABOUT MITIGATING INVENTORY RISK BY ACCURATELY TRACKING MODEL USAGE
7TH ANNUAL RISK AMERICAS CONFERENCE
MAY 17-18, 2018 NEW YORK CITY
PRESENTED BY JON HILL, PH. D.
FORMER MANAGING DIRECTOR, GLOBAL MODEL RISK GOVERNANCE, CREDIT SUISSE, NEW YORK
JONHILL@OPTONLINE.NET
All of the ideas, opinions, suggestions, notions or asides offered in this presentation are entirely the opinions of the speaker and should not be construed to represent in any way those of Credit Suisse, Morgan Stanley, Citigroup or any other previous employers. Furthermore, any anecdotes, cautionary tales or war stories that may insinuate themselves into this presentation shall be understood to have occurred at a mythical institution that will remain nameless.
Why are many important questions about model usage and inventory difficult to answer with today's database inventories?
How could these questions be answered accurately?
Models would need new functionality to support a Model Transponder Function.
A phased implementation approach can minimize disruption and production overhead.
My Definition of Inventory Risk (adapted from SR11-7): Inventory risk is the risk resulting from incomplete or inaccurate quantitative model inventories, the use of models that have previously been retired or remain unvalidated, or the use of models that have never been entered into inventory. Mitigating this risk requires keeping the firm's model inventory complete and accurate.1
Model Inventory Attestation Is Still Primarily a Manual Process! Why Is That?
In the typical attestation process, model supervisors, managers or functional heads for every asset class and business unit are asked to sign off on the complete set of models that fall within their domain of ownership and responsibility, that is, the models that inventory indicates are owned and maintained by that model supervisor or functional head. The process is error-prone: some models may simply be overlooked (the technical term is "falling through the cracks"); some may be 'orphans', models mis-assigned due to staff turnover or re-allocation of responsibilities and therefore without owners; and some orphans may no longer be in use at all.
1 None of the firms I have worked at (Salomon Smith-Barney, Citigroup, Morgan Stanley, Credit Suisse) or with as a consultant have any accurate quantitative way of answering these types of questions other than to query model owners/developers or their downstream users and receive qualitative estimates. It is also an uncomfortable fact that model supervisors/owners/developers do not always know who all of their downstream users are.
It can be difficult to determine the current correct ownership of orphan models, and of course it is not an uncommon experience that some models have no owner assigned at all due to staff turnover. Particularly problematic are upstream and downstream dependencies between models. We rely on model risk managers1 and model owners to identify these dependencies. Model owners should be aware of upstream dependencies (these can be traced by following all model inputs back to their source2), but very often they will not have complete knowledge of all downstream models, models that receive other models' output as their input; these would be best known to the downstream model users.
1 Note: the role of model risk manager is relatively new and complements the role of model validator in mitigating model risk.
2 If they cannot, there are larger problems in model development management.
1) What is the exact number of different models that have been used over the last year?
2) How often has each model been executed, by day, by month, by year?
3) Where are the firm's models being used? Business unit, legal entity, geographic regions?
4) Are there any models in your inventory that were not executed during the last year?
5) Are there any models that were executed on any of your firm's computers that do not appear in inventory? Please provide a full listing.
6) Are you able to provide a full list of the IDs of models that exhibit significant seasonality? If so, what are the peaks and troughs of seasonal model usage?
7) Were there any instances of a retired model still being executed during the last year?
1 There are likely other types of questions regarding model inventory that are difficult to answer accurately. These seven are the most
important questions I can think of. Perhaps you can think of some others.
Inability to answer the previous questions regarding model usage is indicative of a form of model risk that is not often identified or analyzed in its totality, because it belongs to a class of seldom-recognized risks that reside outside of and between models.1
What are the sorts of liabilities that may arise from model inventory risk? Here are a few: reliance on models that have been retired or never validated, models missing from inventory altogether, and the inability to report accurately on where models are being used across business units and legal entities.
In an age of automation, machine learning and big data, we really should ask ourselves whether we can find better ways to make firm-wide model usage more transparent and, in doing so, help to automate the model attestation process.
1 A tip of the hat to Martin Goldberg for his seminal 2017 paper "Much of Model Risk Does Not Come From Any Model", The Journal of Structured Finance, Spring 2017, pp. 32-37. Although not described in that paper, inventory risk clearly belongs to the class of less well-recognized model risks that are external to models. Martin is currently working at Bloomberg on credit risk models.
The root cause of model usage opacity may be traced to a single surprising blind spot in most firms' model risk management frameworks.1 Let's try to put this into perspective by comparing models to some other familiar technologies. A mobile phone 'knows' its own identity: a unique device identifier is burned into memory that stays with the phone for life. Cars and appliances likewise carry unique identifiers embedded in the onboard electronics that control these devices, and serial numbers have been stamped somewhere on almost all manufactured products of any significance since the first automobiles Henry Ford produced. Today, Tesla can track every car they've ever made: its location, travel speed, level of charge, etc. Most important financial models are assigned Model IDs as a convenient lookup index into the automated model databases that almost all firms have to maintain today. These databases typically house all of the relevant documentation for each model, such as development and validation documents, and in some rare cases even source code. Yet the models themselves do not 'know' their own IDs. This presentation will introduce the concept of a Transponder Function which, if added to every model in a firm's inventory, can go a long way towards improving the transparency of model usage and mitigating many of the risks listed in slide #8.
1 At first blush this may not seem to be a true root cause. This presentation will endeavor to convince any doubters
that this is indeed the case.
Embedding Model IDs into Source Code is a Trivially Simple But Necessary First Step
This can be implemented by developers, once a new model is assigned an ID, by adding a single trivial assignment statement as the first executable line in the model's main routine:

Main ()
{
    Global Int Model_ID = 1234567;
    ModelCode();
    Exit;
}
An Aside: An obvious question is why weren't embedded IDs standard practice from the first models? The answer is probably because model IDs only became standard with the introduction of centralized model databases sometime in the new millennium. Apparently, no one in the industry saw an incentive to retrofit thousands of models with embedded IDs.1 This presentation will attempt to provide that incentive. This first step is very simple, requires almost no effort and will not impact performance. Once done, it is hard-coded (like a serial number stamped onto an automobile frame) for the life of the model, so long as model IDs are uniquely assigned and not re-used.
1 I suspect a more accurate answer to this question may well be "pure sloth". Quantitative models for use in finance go back at least 50 years, long before
model inventory databases became a regulatory requirement sometime in the new millennium. Only then did it become necessary to assign unique IDs to models as a clean way to index them into the databases.
Once every model 'knows' its own ID, what could we do with that information? The answer is: with a little thought, quite a lot. The Transponder Function proposed here takes its name from the radio transponders that air traffic controllers rely on to track civilian and commercial aircraft.
Embedding Model IDs into Source Code is a Necessary First Step But The Second and Final Step Will Require More Investment
What indicative data should a model's Transponder Function transmit to a central database via the firm's intranet? Here are a few important indicative data fields to start with (sketched as a data structure after this list):
1) Model ID
2) Name of the model (as a text string) – model names may not be unique, so cannot serve as index
3) Timestamp at execution – date, hour, minute granularity
4) Type of model – pricing, risk, credit, forecasting, finance, HR, etc.
5) Implementation – production code (C++, JAVA, etc.), or EUC model
6) A MAC address1 – uniquely identifies the processor executing the model
7) Vector of upstream model IDs – this information would be invaluable if the model ID is also embedded in any results produced by the model. If deployed comprehensively across the firm, this information could capture all upstream and downstream dependencies.
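To make the record concrete, here is a minimal sketch in C++ of what one such usage record might look like as a data structure. The field names and types are illustrative assumptions, not a prescribed schema; each firm's IT staff would define their own.

#include <ctime>
#include <string>
#include <vector>

// Sketch of one usage record emitted by a model's Transponder Function.
// Field names and types are illustrative assumptions, not a prescribed schema.
struct ModelUsageRecord {
    int              model_id;            // unique ID assigned in the model inventory database
    std::string      model_name;          // text name; not unique, so never used as an index
    std::time_t      executed_at;         // timestamp at execution (date/hour/minute granularity)
    std::string      model_type;          // pricing, risk, credit, forecasting, finance, HR, ...
    std::string      implementation;      // production code (C++, Java, ...) or EUC model
    std::string      mac_address;         // identifies the machine that executed the model
    std::vector<int> upstream_model_ids;  // IDs of models whose output feeds this model
};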
1 The Media Access Control, or MAC, address is the hardware equivalent of an IP address. It is a unique identifier embedded in every computer's network interface card and can be used to identify not only the actual computer executing the software but also, through a lookup function, its physical location. A computer's unique MAC address can be obtained via a function call to the computer's Operating System.
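For illustration, here is a minimal, Linux-specific C++ sketch of how a Transponder might obtain the MAC address from the operating system. The function name GetMacAddress and the interface name "eth0" are assumptions; production code would enumerate interfaces rather than hard-code one.

#include <cstdio>
#include <cstring>
#include <string>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

// Returns the MAC address of a network interface as "aa:bb:cc:dd:ee:ff",
// or an empty string on failure. Linux-specific sketch only.
std::string GetMacAddress(const std::string& iface = "eth0") {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) return "";
    struct ifreq ifr;
    std::memset(&ifr, 0, sizeof(ifr));
    std::strncpy(ifr.ifr_name, iface.c_str(), IFNAMSIZ - 1);
    std::string mac;
    if (ioctl(sock, SIOCGIFHWADDR, &ifr) == 0) {
        const unsigned char* hw =
            reinterpret_cast<const unsigned char*>(ifr.ifr_hwaddr.sa_data);
        char buf[18];
        std::snprintf(buf, sizeof(buf), "%02x:%02x:%02x:%02x:%02x:%02x",
                      hw[0], hw[1], hw[2], hw[3], hw[4], hw[5]);
        mac = buf;
    }
    close(sock);
    return mac;
}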
Embedding Model IDs into Source Code is a Necessary First Step But The Second and Final Step Will Require More Investment
In order to use embedded Model IDs to track model usage globally, a firm's developers would need to add basic new functionality to each model in the form of a Transponder Function:
1) The Transponder Function would be called once each time the model code is executed.
2) The Transponder should have the ability to transmit indicative data about the model via the Firm's intranet to a central database. (These data fields are listed in the previous slide.)
3) Transmission permission must be strictly one-way, from model to database, in order to avoid opening a channel through which model execution could be interfered with.
4) As an option to #3, and to avoid the risk of jamming the firm's intranet, Transponder output could instead be written into local temporary file systems (or databases); see the sketch after this list.
5) Since the usage data is not needed in real time, a sweep of all temp files into a central database could be made on a periodic (e.g. nightly) batch schedule.
6) At the end of a year's worth of data collection, a treasure trove of information about model usage would be available in the central database.1
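To make option 4 concrete, here is a minimal C++ sketch of a Transponder Function that appends one usage record per execution to a local temporary file; the file path, field order and parameter list are assumptions for illustration only. A periodic sweep job would later load these files into the central database.

#include <cstddef>
#include <ctime>
#include <fstream>
#include <string>
#include <vector>

// Append one usage record (one line per execution) to a local temp file.
// Strictly one-way and local: the Transponder only writes; it never reads
// from or blocks on the network, and a failure is reported only via the
// return code so that usage logging can never interfere with the model run.
int Transponder(int model_id,
                const std::string& model_name,
                const std::string& model_type,
                const std::string& implementation,
                const std::string& mac_address,
                const std::vector<int>& upstream_ids) {
    std::ofstream out("/tmp/model_usage.log", std::ios::app);  // assumed local path
    if (!out) return -1;
    out << model_id << '|' << model_name << '|' << std::time(nullptr) << '|'
        << model_type << '|' << implementation << '|' << mac_address << '|';
    for (std::size_t i = 0; i < upstream_ids.size(); ++i)
        out << (i ? "," : "") << upstream_ids[i];
    out << '\n';
    return 0;
}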
1 The resulting trove would constitute a voluminous audit trail of information about model usage, amenable to analysis using data mining and Machine Learning algorithms to find patterns of model usage not readily detectable by human inspection and analysis of the usage data.
A practical way to establish the value added by embedding IDs and installing a Transponder Function Using 'Dummy' Models
A proof of concept could be demonstrated via a simulation that doesn't require modifying any production models and very little time or IT resources:
1) Create a set of hundreds of 'dummy' skeleton models that contain only an embedded test ID and a prototype Model Transponder Function.1
2) Develop a script that calls all of the dummy models with randomly assigned frequencies: some very frequent, some infrequent, and one or two not at all (a sketch of such a driver follows this list).
3) Use the script to simulate seasonality and regionality for a subset of the models.
4) Simulate a full year's worth of model usage.
5) Mine the resulting database information to create various types of analyses (frequency histograms, seasonality charts, distribution by regions, usage spikes, dead periods, etc.) and to identify patterns of usage.
6) Use the simulation to identify flaws in the Transponder Function, communication pipelines and the centralized database. This can help to identify problems and refine the method before production deployment.
7) Present the results to management to make the case for authorizing formal production.
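Here is a minimal C++ sketch of such a simulation driver. The function DummyModel, the test ID range and the frequency scheme are assumptions; in the real proof of concept each dummy model would call the prototype Transponder Function instead of printing.

#include <cstdio>
#include <random>
#include <vector>

// Skeleton 'dummy' model: it does nothing except report its execution.
void DummyModel(int model_id) {
    std::printf("model %d executed\n", model_id);
}

int main() {
    const int num_models = 300;                      // "hundreds" of dummy models
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> freq(0, 20);  // average executions per day

    // Assign each dummy model a random usage frequency; zeros represent
    // models that sit in inventory but are never executed.
    std::vector<int> avg_calls(num_models);
    for (int m = 0; m < num_models; ++m) avg_calls[m] = freq(rng);

    // Simulate one year of usage. Seasonality or regionality could be added
    // by scaling avg_calls by month or tagging subsets of models with regions.
    for (int day = 0; day < 365; ++day) {
        for (int m = 0; m < num_models; ++m) {
            if (avg_calls[m] == 0) continue;
            std::poisson_distribution<int> calls_today(avg_calls[m]);
            const int n = calls_today(rng);
            for (int c = 0; c < n; ++c) DummyModel(1000000 + m);  // assumed test ID range
        }
    }
    return 0;
}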
1 Note that the source code for the transponder does not have to be included in the model's source code; in fact, it probably should not be. Rather, the Transponder Function code should be maintained separately from any model and compiled into a Dynamically Linked Library (DLL) that can be linked with the compiled model code during the build process. This will allow the Transponder Function to be modified without modifying the model code.
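A minimal sketch of what that separation might look like: models compile against only a small header, while the Transponder's implementation is built into its own shared library and linked in during the build. The names and parameter list below are assumptions, chosen to match the temp-file sketch above.

// transponder.h -- the only piece models see. The implementation is compiled
// into its own shared library (DLL / .so) and linked in at build time, so the
// Transponder can be upgraded without touching any model source code.
#pragma once
#include <string>
#include <vector>

int Transponder(int model_id,
                const std::string& model_name,
                const std::string& model_type,
                const std::string& implementation,
                const std::string& mac_address,
                const std::vector<int>& upstream_model_ids);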
[Diagram: a Dummy Model with an embedded ID calls its Transponder Function, which sends model usage indicative data via the intranet or a temp file to a Centralized Database. But the devil may be hiding in the details …]
Note: It may not be necessary for the Transponder to send data to a centralized database via the Firm's intranet; this is really a placeholder for any type of communication pipe that a Firm's IT staff choose. For the purposes of this presentation it is not particularly important to specify how the communication is to be implemented, only that the final destination is a central database with a log of the model usage statistics from slide #12, indexed by model ID and collected over a significant length of time, e.g. at least one year.
This is what a dummy model might look like in pseudocode:
Main ()
{
    Global Int Model_ID = 1234567;            /* Embed the Model ID */
    Int Time = SystemClock();                 /* Get the current date and time */
    Char *Name = GetModelName();              /* Get the model's name as a text string */
    Char *Type = "Dummy";                     /* Type of model (placeholder value) */
    Char *Implementation = "Production";      /* Production code or EUC (placeholder value) */
    Char *Mac_Address = GetMacAddress();      /* Get the MAC address from the operating system */
    Int *Upstream_Array = GetUpstreamIDs();   /* Get an array of one or more upstream model IDs */
    Char *DB_Name = GetDbName();              /* Get the name of the centralized destination database */
    Int ModelReturnCode = ModelCode();        /* Execute the dummy model */
    Int TransponderReturnCode = Transponder(Model_ID, Time, Name, Type, Implementation,
        Mac_Address, Upstream_Array, DB_Name);  /* Call the transponder and pass the indicative data to it */
    Exit();
}
This presentation has described an innovative method for improving the transparency of model usage across an entire Firm, but not without cost. Here are the pros and cons of this approach:
Pros:
1) The approach can be phased in over time, beginning with limited sets of models such as those used for CCAR/DFAST stress testing or the set of pricing models in the high-risk tier. Changes could be included in the regular release cycles.
2) Usage tracking travels with the model itself rather than relying on an external execution platform to track and store usage statistics.1
3) It works for any model executing on any computer that has access to the firm's intranet (or that can write results to a temporary file).
Cons:
1) Vendors may be unwilling or unable to embed IDs and Transponder Functions in their models. But there may be workarounds through the in-house execution scripts or host programs that Firms use to interface between the vendor code and the Firm's computers.
1 Most production models at banks are managed by host execution platforms, although most EUC models are not. It is possible for execution
platforms to be designed or modified to track usage statistics but large firms may have hundreds of different platforms and each would have to be customized to provide similar data. Any changes would have to be made to all such platforms.
Have you ever wondered why, while browsing, say, the NYTimes online, ads pop up for items similar to ones you have recently purchased online (shower curtains or bedsheets, for example) from different websites? That is because when you made the transaction, an embedded 'transponder' sent indicative data about your activities and interests to a centralized database maintained by the vendor. This information helps vendors target their online ads to potential customers, monitor consumer interests and build profiles of each of millions or hundreds of millions of clients. This is why we see those popup ads mysteriously tailored to our individual purchasing patterns. One-way transponder functions have been used for years to track external clients' behavior patterns.
There is nothing new about the concept of embedded one-way 'transponder' functions. Google, Amazon, eBay and Tesla have had their equivalents in place for a decade or more. Now consider Tesla. Somewhere at Tesla Central Command is a large screen that can display the location of every Tesla vehicle ever sold, along with its current speed, direction, time since last charge, driving patterns of the owner, and a host of other tracking data that help Tesla understand usage, geographic concentrations, charging stations used, etc., so they can improve and expand their services and market share optimally. They can do this because every Tesla vehicle has the equivalent of a Transponder Function embedded in its onboard computers. So why is it that financial firms are so far behind and cannot manage to collect similar patterns of behavior for their internal clients (i.e. their quantitative models)? Are we not as good as the Techs?
Note: nothing in this presentation addresses the problems associated with 'near models' or 'calculator tools' whose owners refuse to acknowledge that they function as models, which would require them to be assigned IDs, entered into inventory and submitted for validation. The distinction between model and tool or near-model is a governance issue; treatment of these grey-area quasi-models should fall under the Firm's model governance policies and procedures.