 
Semiparametric regression with hierarchical models
Yanwei (Wayne) Zhang
Statistical Research, CNA Insurance Company
New Orleans, March 17, 2011
Wayne Zhang (CNA insurance company), Semiparametric models, March 17, 2011
Antitrust Notice
• The Casualty Actuarial Society is committed to adhering strictly to the letter and spirit of the antitrust laws. Seminars conducted under the auspices of the CAS are designed solely to provide a forum for the expression of various points of view on topics described in the programs or agendas for such meetings.
• Under no circumstances shall CAS seminars be used as a means for competing companies or firms to reach any understanding, expressed or implied, that restricts competition or in any way impairs the ability of members to exercise independent business judgment regarding matters affecting competition.
• It is the responsibility of all seminar participants to be aware of antitrust regulations, to prevent any written or verbal discussions that appear to violate these laws, and to adhere in every respect to the CAS antitrust compliance policy.
Outline
• Case study I: Review basic concepts and theories in hierarchical models
• Case study II: Build connection between penalized splines and hierarchical models
• Case study III: Geo-spatial smoothing with bivariate penalized splines
Case study I: Hierarchical models
Review of hierarchical models: Overview
Hierarchies in insurance data
• Insurance data often come with an inherent hierarchy (classification)
• Homogeneity vs. stability?
[Figure: tree diagram of the data hierarchy. Top level: Insurance Company. Second level: states (California, New York, ..., Texas). Third level: years (2008 to 2010). Bottom level: lines of business (Property, Liability, Auto, ...).]
Hierarchical models
Three methods to deal with data with inherent hierarchies:
• Complete pooling, assuming all groups are exactly the same
• No pooling, assuming complete heterogeneity
• Partial pooling (hierarchical), a compromise between the two extremes
Advantages of using hierarchical models:
• Use all the data to make robust inferences (e.g., for groups with small sample sizes)
• Inference of group-level variation
• Inclusion of group-level predictors
• Prediction for a new group is available, and accounts for group variation
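The three pooling strategies can be compared on group means with a small sketch. It assumes the within-group variance `sigma_y2` and the between-group variance `sigma_a2` are known, and it uses the overall mean as a stand-in for the population mean; in practice a hierarchical model estimates all of these. The function name and the data are illustrative only.

```python
from statistics import mean

def pooled_estimates(groups, sigma_y2, sigma_a2):
    """Return (complete, no_pool, partial) estimates of the group means.

    groups   : dict mapping group name -> list of observations
    sigma_y2 : within-group variance (assumed known for this sketch)
    sigma_a2 : between-group variance (assumed known for this sketch)
    """
    all_obs = [y for ys in groups.values() for y in ys]
    complete = mean(all_obs)                               # one mean for every group
    no_pool = {g: mean(ys) for g, ys in groups.items()}    # each group on its own
    partial = {}
    for g, ys in groups.items():
        w_data = len(ys) / sigma_y2     # precision of the group's own mean
        w_prior = 1.0 / sigma_a2        # precision of the group-level distribution
        # Precision-weighted compromise between the two extremes
        partial[g] = (w_data * no_pool[g] + w_prior * complete) / (w_data + w_prior)
    return complete, no_pool, partial

# Hypothetical log-loss data: one large group and one small group
groups = {"large": [8.0, 8.4] * 25, "small": [6.0, 6.4]}
complete, no_pool, partial = pooled_estimates(groups, sigma_y2=1.0, sigma_a2=0.25)
```

The small group is pulled much further toward the overall mean than the large one, which is the stability that partial pooling buys for thinly populated groups.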
Review of hierarchical models: Example
Example: contents loss due to theft/burglary
Suppose we have reported loss data (severity) for contents coverage due to theft/burglary in California:
• $Y$: reported loss for each claim; $X$: contents insurance amount
• Can build a simple model for severity: $\log E(Y_i) = \alpha + \beta \log X_i$
• Exponentiating it leads to $E(Y_i) = \exp(\alpha) X_i^{\beta}$
• $\exp(\alpha)$ will be the rate per insurance amount (or per 1,000, ...)
• $\beta$ determines the curvature of the curve
[Figure: two scatter plots, log loss vs. log insurance amount, and loss vs. insurance amount (10,000s).]
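A minimal sketch of fitting the severity curve: it regresses log loss on log insurance amount by ordinary least squares, which matches the log-scale plots but is only an approximation to modeling $\log E(Y_i)$ directly. The function name and the data are hypothetical, and the data are generated without noise so the fit recovers the inputs.

```python
import math

def fit_loglog(amounts, losses):
    """Least-squares fit of log(loss) = alpha + beta * log(amount)."""
    lx = [math.log(a) for a in amounts]
    ly = [math.log(l) for l in losses]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    beta = sxy / sxx              # slope: curvature on the original scale
    alpha = my - beta * mx        # intercept: exp(alpha) is the rate at amount 1
    return alpha, beta

# Hypothetical noise-free data generated from E(Y) = exp(7.0) * X^0.4
amounts = [1.0, 5.0, 20.0, 100.0, 400.0]
losses = [math.exp(7.0) * a ** 0.4 for a in amounts]
alpha, beta = fit_loglog(amounts, losses)
rate = math.exp(alpha)            # rate per unit of insurance amount
```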
Data and model
How would you determine the rate $\exp(\alpha)$ for each county?
• Run one big regression using all data? It does not fit well, and cannot give a rate for each county!
• Run a separate regression for each county? The estimates are very volatile for small counties, and can even show slope reversal!
• Hierarchical model [random intercepts]: $\log E(Y_i) = \alpha_{j[i]} + \beta \log X_i$
[Figure: county-level graphic with numeric labels; the values are not reliably recoverable from the extraction.]
Visualization of the three models
[Figure: log insurance loss vs. log insurance amount, one panel per county (Butte, Los Angeles, Merced, Monterey, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz, Ventura), with fitted lines from three models: complete pooling, no pooling, multi-level.]
Adding group level predictor
• Improve the model by adding a county-level predictor ($Z$), the crime index: $\log E(Y_i) = \alpha_{j[i]} + \beta \log X_i$ and $\alpha_j = a + b Z_j$
• Reduce group-level variation
• Make groups conditionally exchangeable
Models:
M3: logloss ~ 1 + logamt + (1 | county)
M4: logloss ~ 1 + logamt + crime + (1 | county)

    Df     AIC     BIC   logLik   Chisq  Chi Df  Pr(>Chisq)
M3   4  6228.9  6251.0  -3110.5
M4   5  6226.1  6253.7  -3108.1  4.8153       1     0.02821 *
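The M3 vs. M4 comparison above is a likelihood-ratio test with one degree of freedom, and its numbers can be checked by hand. The sketch below uses only the values reported in the table; the function names are illustrative, and the chi-square tail formula used is specific to one degree of freedom.

```python
import math

def lrt_1df(loglik_small, loglik_big):
    """Likelihood-ratio statistic and p-value for nested models differing by 1 df."""
    chisq = 2.0 * (loglik_big - loglik_small)
    # For 1 df, P(chi-square > c) = erfc(sqrt(c / 2))
    p_value = math.erfc(math.sqrt(chisq / 2.0))
    return chisq, p_value

def aic(df, loglik):
    """Akaike information criterion: 2 * (number of parameters) - 2 * logLik."""
    return 2.0 * df - 2.0 * loglik

# The rounded log-likelihoods from the table give a statistic of about 4.8
chisq, p_value = lrt_1df(-3110.5, -3108.1)

# The unrounded statistic reported in the table reproduces the reported p-value
p_reported = math.erfc(math.sqrt(4.8153 / 2.0))

# AIC values agree with the table up to rounding of logLik
aic_m3 = aic(4, -3110.5)
aic_m4 = aic(5, -3108.1)
```

AIC and the likelihood-ratio test both favor M4, while BIC, which penalizes the extra parameter more heavily, is slightly lower for M3.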
Visualizing group-level regression
[Figure: estimated county intercepts (roughly 7.4 to 8.2) plotted against the crime index (40 to 100).]
Comparison to hierarchical model with no group-level predictors
[Figure: log insurance loss vs. log insurance amount, one panel per county, labeled as on the slide: Butte(71), Los Angeles(103), Merced(100), Monterey(99), Riverside(95), San Diego(70), San Francisco(100), Santa Barbara(55), Santa Cruz(49), Ventura(45). Fitted lines from two models: no county predictor, county predictor.]
Rate map
• Can produce rate relativity (average county loss / average state loss) for a fixed insurance amount, assuming the modeled frequency is flat
• If a county is not available in the data, it is automatically set to the state average
• The figure is a rate map at an insurance amount of 10,000
[Figure: county map shaded by relativity. Legend bins: 0.7-0.8, 0.8-0.9, 0.9-1.0, 1.0, 1.0-1.15, 1.15-1.30, 1.30-1.5.]
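One way to read the relativity calculation under the random-intercept model, sketched under assumptions: with a common slope, the insurance-amount term cancels in the county-to-state ratio, so the relativity reduces to the exponentiated difference between the county intercept and the state-level intercept. A county absent from the data falls back to the state-level intercept and gets a relativity of 1.0. The function name, county names and intercept values are hypothetical.

```python
import math

def relativity(county, county_intercepts, state_intercept):
    """Rate relativity of a county against the state average.

    county_intercepts : dict of estimated intercepts alpha_j (hypothetical values)
    state_intercept   : state-level intercept, used for counties with no data
    """
    alpha_j = county_intercepts.get(county, state_intercept)
    return math.exp(alpha_j - state_intercept)

# Hypothetical estimates on the log scale
state_intercept = 7.8
county_intercepts = {
    "County A": 7.8 + math.log(1.2),   # 20% above the state average
    "County B": 7.8 + math.log(0.8),   # 20% below the state average
}

rel_a = relativity("County A", county_intercepts, state_intercept)
rel_new = relativity("County C", county_intercepts, state_intercept)  # not in the data
```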
Review of hierarchical models: Inference
Inference on linear models
Suppose
$$y \mid u \sim N(X\beta + Zu,\ R) \tag{1}$$
$$u \sim N(0,\ G) \tag{2}$$
Maximum likelihood estimation leads to minimizing the following:
$$(y - X\beta - Zu)^T R^{-1} (y - X\beta - Zu) + u^T G^{-1} u \tag{3}$$
This yields the GLS estimator
$$\hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} y, \qquad V = ZGZ^T + R, \tag{4}$$
and the best linear unbiased predictor
$$\hat{u} = G Z^T V^{-1} (y - X\hat{\beta}). \tag{5}$$
These are used to maximize the profile likelihood to obtain estimates of $V$ and $R$, which are then plugged back into (4) and (5).
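Equations (4) and (5) can be evaluated directly once the covariance matrices are fixed. The sketch below treats $G$ and $R$ as known (a real fit would estimate them from the profile likelihood) and uses small pure-Python matrix helpers so that it has no dependencies. The helper names and the data are illustrative, not taken from the slides.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def gls_blup(y, X, Z, G, R):
    """Return (beta_hat, u_hat) as column matrices, following (4) and (5)."""
    Xt, Zt = transpose(X), transpose(Z)
    ZGZt = matmul(matmul(Z, G), Zt)
    V = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(ZGZt, R)]   # V = Z G Z^T + R
    beta = solve(matmul(Xt, solve(V, X)), matmul(Xt, solve(V, y)))     # eq. (4)
    Xb = matmul(X, beta)
    resid = [[yi[0] - xb[0]] for yi, xb in zip(y, Xb)]
    u = matmul(matmul(G, Zt), solve(V, resid))                         # eq. (5)
    return beta, u

# Hypothetical data: 6 claims in 3 counties, random intercept per county
x = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0]
y = [[7.9], [8.4], [7.2], [8.0], [8.6], [9.3]]
group = [0, 0, 1, 1, 2, 2]
X = [[1.0, xi] for xi in x]                                    # intercept and slope
Z = [[1.0 if g == j else 0.0 for j in range(3)] for g in group]
G = [[0.25 if i == j else 0.0 for j in range(3)] for i in range(3)]   # assumed known
R = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]    # assumed known
beta_hat, u_hat = gls_blup(y, X, Z, G, R)
```

Solving linear systems with `V` instead of forming its inverse keeps the sketch short; for large data sets, software typically exploits the sparse structure of `Z` instead.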
Case study II: Semiparametric models
Penalized Splines: Introduction
Motivation
• Flexible modeling of nonlinear patterns
• Hard to find a parametric nonlinear model
• Even if one is found, it is hard to estimate
• Rely on basis functions (e.g., two knots $\kappa_1$, $\kappa_2$):
  Linear: $1,\ x,\ (x - \kappa_1)_+,\ (x - \kappa_2)_+$
  Quadratic: $1,\ x,\ x^2,\ (x - \kappa_1)^2_+,\ (x - \kappa_2)^2_+$
  Cubic: $1,\ x,\ x^2,\ x^3,\ (x - \kappa_1)^3_+,\ (x - \kappa_2)^3_+$
[Figure: scatter plot of data showing a nonlinear pattern.]
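The bases listed above are truncated power bases, and a row of the corresponding design matrix is easy to build. This is a minimal sketch for degree 1 or higher; the function name and the knot locations are illustrative.

```python
def truncated_power_basis(x, knots, degree):
    """Basis values 1, x, ..., x^degree, (x - k)_+^degree for each knot k."""
    row = [x ** p for p in range(degree + 1)]                 # polynomial part
    row += [max(x - k, 0.0) ** degree for k in knots]         # truncated part
    return row

knots = [1.0, 3.0]   # hypothetical knots kappa_1, kappa_2

linear = truncated_power_basis(2.0, knots, degree=1)   # [1.0, 2.0, 1.0, 0.0]
cubic = truncated_power_basis(2.0, knots, degree=3)    # [1.0, 2.0, 4.0, 8.0, 1.0, 0.0]

# Design matrix for a set of x values: one row per observation
design = [truncated_power_basis(x, knots, degree=1) for x in [0.5, 2.0, 4.0]]
```

Each truncated term is zero to the left of its knot, so its coefficient changes the fitted curve only to the right of that knot.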