"RobExtremes" – Robust Extreme Value Statistics – a New Member in the RobASt-Family of R Packages


  1. "RobExtremes" – Robust Extreme Value Statistics – a New Member in the RobASt-Family of R Packages

     Nataliya Horbenko (1), Matthias Kohl (2), Peter Ruckdeschel (3, 4)

     (1) KPMG AG Wirtschaftsprüfungsgesellschaft, The SQUAIRE / Am Flughafen, 60549 Frankfurt/Main, Germany
     (2) Furtwangen University, Dept. of Medical and Life Sciences, Jakob-Kienzle-Straße 17, 78054 Villingen-Schwenningen, Germany
     (3) Fraunhofer ITWM, Dept. of Financial Mathematics, Fraunhofer-Platz 1, 67663 Kaiserslautern, Germany
     (4) TU Kaiserslautern, Dept. of Mathematics, Erwin-Schrödinger-Straße, Geb 48, 67663 Kaiserslautern, Germany

     UseR! 2013, Albacete, Spain, July 11, 2013

     Outline
     • Introduction
       – Project “Robust Risk Estimation”
     • Infrastructure of R packages
       – Packages for distributions: distr family
       – Package for models: distrMod
     • Our Robustification Approach
       – Contribution of Robust Statistics: Outliers
       – Packages for robust asymptotic statistics: RobASt family
       – Optimally robust estimation in R
       – Storing interpolation grids
     • New package RobExtremes
       – Concept
       – GPD as parametric model
     • Application to OpRisk quantification
       – Setup
       – Challenges
       – Optimally robust estimation in RobExtremes
       – Diagnostic plots

     Who we are / Project funded by: [images not preserved]

     Introduction: Project “Robust Risk Estimation”
     . . . aims at theoretical foundation, development and application of robust procedures for risk management of complex systems in the presence of extreme events
     common base:
     • <theory> robust statistics
     • <implementation> R packages of the distr & RobASt families

  2. Infrastructure of R packages
     • Packages for distributions: distr family
     • Package for models: distrMod

     Available Infrastructure for Distributions: distr family

     Package name         Short description
     distr                S4 classes for distributions
     distrEx              Functionals for distributions
     distrMod             S4 classes for probability models
     distrEllipse         S4 classes for elliptically contoured distrs
     distrRmetrics        S4 classes for distrs from fBasics & fGarch
     distrSim             S4 classes for simulations
     distrTEst            S4 classes for estimation and testing
     distrTeach           Extensions for teaching
     distrDoc             Documentation for distr packages
     startupmsg           Utilities for start-up messages
     SweaveListingUtils   Utilities for Sweave

     Distributions as Objects and Arithmetics

         ## Create a normal and a Poisson Distribution
         R> N <- Norm(mean = 0, sd = 3)
         R> P <- Pois(lambda = 2)
         ## identical calls for r (RNG), d (density),
         ## p (cdf), q (quantile fct)
         R> c(p(N)(.5), p(P)(2))
         [1] 0.5661838 0.6766764
         R> c(q(N)(.5), q(P)(.5))
         [1] 0 2
         ## Arithmetics
         R> X <- sin(N+P)
         R> c(p(X)(.5), q(X)(.5))
         [1] 0.6642434 0.008785884
         ## plotting density, cdf and quantile fct
         R> plot(X)

     [Figure: three panels for X: "Density of AbscontDistribution" (d(x) against x), "CDF of AbscontDistribution" (p(q) against q), "Quantile function of AbscontDistribution" (q(p) against p)]

     Functionals for Distributions – the E Operator (distrEx)

         ## Initialize GPD-object
         R> GPD <- GPareto(loc = 10, scale = 2, shape = 0.5)
         ## Returns analytical value
         R> E(GPD)
         [1] 14
         ## Classical integration of density
         R> E(as(GPD, "AbscontDistribution"))
         [1] 13.40216

     i.e., numerically compute $E[\mathrm{GPD}] = \int_{-\infty}^{\infty} x \, d_{\mathrm{GPD}}(x) \, \lambda(dx)$

         ## Integration with probability integral transform
         R> E(GPD, fun=function(x){x})
         [1] 13.99747

     i.e., numerically compute $E[\mathrm{fun}(\mathrm{GPD})] = \int_0^1 \mathrm{fun}(q_{\mathrm{GPD}}(x)) \, \lambda(dx)$

         ## Identical code for all distribution objects
         R> E(Pois(lambda = 10))
         [1] 10
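The two numerical routes to the expectation mentioned for the E operator (integrating x times the density, and integrating the quantile function over (0, 1)) can be sketched in base R. This is only an illustration under stated assumptions: it does not use or reproduce the distrEx implementation, the helper names `dgpd_sketch` and `qgpd_sketch` are hypothetical, and the parameters are those of the `GPareto(loc = 10, scale = 2, shape = 0.5)` example.

```r
# Base-R sketch (hypothetical helpers, not the distrEx code).
# GPD with location mu, scale beta and shape xi > 0.
mu <- 10; beta <- 2; xi <- 0.5

# Density of the GPD, zero below the location parameter
dgpd_sketch <- function(x) {
  ifelse(x < mu, 0, (1 + xi * (x - mu) / beta)^(-1 / xi - 1) / beta)
}

# Quantile function of the GPD
qgpd_sketch <- function(p) {
  mu + beta / xi * ((1 - p)^(-xi) - 1)
}

# Route 1: classical integration of x * d(x) over the support
e_density <- integrate(function(x) x * dgpd_sketch(x),
                       lower = mu, upper = Inf)$value

# Route 2: probability integral transform, integrate q(u) over (0, 1)
e_pit <- integrate(qgpd_sketch, lower = 0, upper = 1)$value

# Closed form for xi < 1: mu + beta / (1 - xi)
e_exact <- mu + beta / (1 - xi)

print(c(density = e_density, pit = e_pit, exact = e_exact))
```

Both numerical values should land close to the closed-form mean. How close depends on the quadrature settings, which is presumably also why the package output shown above differs slightly between the two routes.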

  3. Maximum Likelihood Estimation (distrMod)

     [Figure: "Copper in wholemeal flour": Parts per million [ppm] against Observation Index, with the mean and the 95%-CI of the mean]
     [Figure: "Operational risk data" and "Operational risk data, zoomed": Density against Amount of loss, with the ML fit]

     Our Robustification Approach
     • Contribution of Robust Statistics: Outliers
     • Packages for robust asymptotic statistics: RobASt family
     • Optimally robust estimation in R
     • Storing interpolation grids

     Outliers
     • What makes an observation an outlier?
       – happens rarely (5 % – 10 %)
       – uncontrollable, of unknown distribution, unpredictable
       – often: no error-free distinction from ideal observations
       – outlier situation may change from obs. to obs.
     • Despite outliers: a realistic sample should be close to one from the ideal setting
     • neighborhoods: $\mathcal{U} = \{ Q \mid Q = (1 - \varepsilon) P_\theta + \varepsilon H \}$
       note: no standard mixing model, H may vary from obs. to obs.!
     • accuracy as $\mathrm{maxMSE}(S_n) := \max_{Q \in \mathcal{U}} \mathrm{E}_Q \, |S_n - \theta|^2$
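The maxMSE criterion over a contamination neighborhood can be made concrete with a small Monte Carlo sketch in base R. Everything here is an assumption for illustration: the ideal model is N(θ, 1), the contaminating H is restricted to a handful of point masses and kept fixed across observations (the slide allows H to vary), and the helper names are hypothetical. The maximum over this finite family is therefore only a lower bound for the maxMSE over the full neighborhood.

```r
set.seed(1)

# Draw n observations from Q = (1 - eps) * N(theta, 1) + eps * H,
# where H is a point mass at `outlier` (one simple choice of H).
r_contaminated <- function(n, theta, eps, outlier) {
  is_out <- runif(n) < eps
  ifelse(is_out, outlier, rnorm(n, mean = theta))
}

# Monte Carlo estimate of E_Q |S_n - theta|^2 for an estimator S_n
mse_hat <- function(estimator, n, theta, eps, outlier, reps = 2000) {
  est <- replicate(reps, estimator(r_contaminated(n, theta, eps, outlier)))
  mean((est - theta)^2)
}

# Maximum of the estimated MSE over a small, finite family of H
outliers <- c(0, 5, 20, 100)
max_mse <- function(estimator) {
  max(sapply(outliers, function(h) {
    mse_hat(estimator, n = 50, theta = 0, eps = 0.05, outlier = h)
  }))
}

max_mse_mean   <- max_mse(mean)
max_mse_median <- max_mse(median)
print(c(mean = max_mse_mean, median = max_mse_median))
```

With 5 % contamination the sample mean can be pushed arbitrarily far by a distant point mass, while the median stays bounded, which is the qualitative behaviour the maxMSE criterion is meant to capture.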

  4. Available Infrastructure for Robust Asymptotic Statistics: RobASt family

     Package name   Short description
     RandVar        Implementation of random variables
     RobAStBase     Robust Asymptotic Statistics
     ROptEst        Optimally robust estimation
     RobAStRDA      sysdata.rda for packages of the RobASt family
     RobExtremes    Opt-rob. estimators for extreme value distributions
     RobLox         Opt-rob. ICs and estimators for location and scale
     RobLoxBioC     Opt-rob. estimators for preprocessing omics data
     ROptEstOld     Optimally robust estimation, old version
     ROptRegTS      Opt-rob. estimators for regression-type models
     RobRex         Opt-rob. estimators for regression and scale

     Optimally robust estimation in R

         ## data: wholemeal flour (the chem data set ships with MASS)
         R> library(MASS)
         R> library(ROptEst)
         R> s0 <- c(median(chem), mad(chem))
         R> ROest1 <- roptest(chem, NormLocationScaleFamily())
         # speed-up by interpolation:
         R> library(RobLox)
         R> ROest2 <- roblox(chem, returnIC = TRUE)
         R> rbind(estimate(ROest1),estimate(ROest2))
                  mean        sd
         [1,] 3.163591 0.6613414
         [2,] 3.338290 0.6184967

     • roptest computes the estimator with min max (as)MSE
     • w/o specifying an outlier rate, roptest selects the least favorable rate → RMXE
     • roptest takes 28 sec, roblox ∼ 0.1 sec

     [Figure: "Copper in wholemeal flour": Parts per million [ppm] against Observation Index, with the mean, the 95%-CI of the mean, the RMXE, and the 95%-CI of the RMXE]

     Storing interpolation grids: some R insights
     just seen: interpolation is useful; technique: not only store grids, but also the interpolating functions

     Issues and solutions / lessons learnt
     • issue: return values from approxfun and splinefun generated ≤ R-2.15.2 are no longer valid ≥ R-3.0.0, and vice versa
       solution: store two sets of interpolators and switch according to the R version at run-time
     • issue: with many models/procedures, a package containing interpolators can get large → conflict with CRAN policies
       solution: delegate interpolators to a separate (less frequently updated) package → RobAStRDA
     • issue: conflicts with namespaces when modifying interpolators outside the package
       solution: functions to generate/manipulate interpolators from within the package namespace

     New package RobExtremes
     • Concept
     • GPD as parametric model
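The "store two sets of interpolators and switch at run-time" idea can be sketched in base R as follows. This is a hedged illustration, not the RobAStRDA mechanism: `slow_fun` is a hypothetical stand-in for an expensive computation, the list layout and the names `pre3`, `post3` and `pick_interpolator` are assumptions, and both interpolators are built in the current session, whereas the real setup would generate each set under the matching R version.

```r
# Hypothetical stand-in for an expensive function of a shape parameter
slow_fun <- function(xi) log1p(xi^2)

# Evaluate once on a grid; this is the part one would precompute and store
grid_x <- seq(0.05, 5, length.out = 200)
grid_y <- slow_fun(grid_x)

# Two sets of interpolating functions, keyed by R version era.
# Here both are built in the running session, purely for illustration.
interpolators <- list(
  pre3  = splinefun(grid_x, grid_y),
  post3 = splinefun(grid_x, grid_y)
)

# Switch between the stored sets according to the running R version
pick_interpolator <- function(fcts) {
  if (getRversion() < "3.0.0") fcts$pre3 else fcts$post3
}

fast_fun <- pick_interpolator(interpolators)

# Compare interpolated and exact values off the grid
test_x  <- c(0.7, 1.3, 2.9)
max_err <- max(abs(fast_fun(test_x) - slow_fun(test_x)))
print(max_err)
```

Keeping the raw grids next to the interpolating functions means the functions can be regenerated if a future R release changes their internal representation again.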

  5. New package RobExtremes
     • infrastructure for opt-rob. estimation for extreme value distributions / scale-shape models, i.e.,
       – Gamma
       – Generalized Extreme Value D.
       – Weibull
       – Gumbel
       – Generalized Pareto D.
       – Pareto
     • particular methods for expectations
     • high breakdown starting estimators
       – scale functionals: Sn and Qn (Rousseeuw & Croux [93]), kMAD (asym. variant of mad, R. & Horbenko [12])
       – LDEstimators (Marazzi & Ruffieux [99]), in particular medkMAD, medSn, and medQn
       – Pickands' estimator (including asy. variance and IC)
       – Quantile estimator for Weibull (Boudt et al. [11])
     • speed-up by interpolation for opt-rob. estimators
     • enhanced diagnostic plots (from RobAStBase)

     Extreme Value Setup: GPD as parametric model
     • Fisher-Tippett-Gnedenko Theorem: possible limit distributions of $\max(X_i)$ have cdf
       $H_\theta(x) = \exp\left( -\left( 1 + \xi (x - \mu)/\beta \right)^{-1/\xi} \right)$ (GEVD)
     • Pickands-Balkema-de Haan Theorem: tails ∼ Generalized Pareto distribution (GPD) with cdf
       $F_\theta(x) = 1 - \left( 1 + \xi (x - \mu)/\beta \right)^{-1/\xi}$

     Parameter $\theta = (\xi, \beta, \mu)^\tau$:
     • shape ξ (≥ 0) (tail behavior) [goal]
     • scale β [goal]
     • location/threshold µ (≤ x) [fixed]

     [Figure: "GPD: F_θ(x)" for ξ = 0.7, β = 1, µ = 0, on a logarithmic x-axis from 1e−04 to 1e+05]

     Application to OpRisk quantification
     • Setup
     • Challenges
     • Optimally robust estimation in RobExtremes
     • Diagnostic plots

     Setup
     • OpRisk: risk of loss resulting from inadequate or failed internal processes, people and systems or from external events
     • Basel II: standards for regulatory capital required to cover losses from OpRisk
     • assessed by the Loss Distribution Approach (LDA): model severity and frequency of losses separately and cell-wise in a matrix built by business lines and event types; OpRisk is quantified as 99 %-OpVaR, i.e., the 99 % quantile of the respective compound distribution
     • involves parameter estimation in the GPD

     Data [source: Algorithmics, Inc. (IBM)]
     • data: losses in business line Asset Management (AM in the sequel)
     • collected from 2431 institutes in the last 20 yrs
     • 600 observed damages > 1 Mio USD
     • frequency: λ = 0.012 / yr
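The GPD cdf and the 99 %-OpVaR as a quantile of a compound distribution can be sketched in base R. This is a minimal illustration under loudly stated assumptions: the helper names are hypothetical, the frequency is taken as Poisson with the λ = 0.012 / yr quoted above, and the severity parameters (ξ = 0.7, β = 1, threshold µ = 1 in Mio USD) are illustrative values, not estimates from the OpRisk data. It does not use RobExtremes.

```r
set.seed(2013)

# GPD cdf F_theta(x) = 1 - (1 + xi * (x - mu) / beta)^(-1 / xi), xi > 0
pgpd_sketch <- function(x, xi, beta, mu) {
  z <- pmax(x - mu, 0) / beta
  1 - (1 + xi * z)^(-1 / xi)
}

# Corresponding quantile function (inverse of the cdf on (0, 1))
qgpd_sketch <- function(p, xi, beta, mu) {
  mu + beta / xi * ((1 - p)^(-xi) - 1)
}

# Illustrative parameters (assumptions, not fitted values)
xi <- 0.7; beta <- 1; mu <- 1   # severity: GPD above 1 Mio USD
lambda <- 0.012                 # frequency: losses per institute and year

# Simulate annual aggregate losses: Poisson count, GPD severities
n_years <- 1e5
counts <- rpois(n_years, lambda)
annual_loss <- vapply(
  counts,
  function(k) sum(qgpd_sketch(runif(k), xi, beta, mu)),
  numeric(1)
)

# 99 %-OpVaR as the empirical 99 % quantile of the compound distribution
op_var_99 <- quantile(annual_loss, probs = 0.99, names = FALSE)
print(op_var_99)
```

Because the shape ξ governs the tail, small changes in its estimate move high quantiles of the compound distribution considerably, which is one reason the parameter estimation step in the GPD deserves care.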
