Balancing robust statistics and data mining in ratemaking: Gradient Boosting Modeling


SLIDE 1

Balancing robust statistics and data mining in ratemaking: Gradient Boosting Modeling

Leo Guelman, Simon Lee, and Helen Gao

Royal Bank of Canada - RBC Insurance

March, 2012


SLIDE 2

Antitrust Notice

The Casualty Actuarial Society is committed to adhering strictly to the letter and spirit of the antitrust laws. Seminars conducted under the auspices of the CAS are designed solely to provide a forum for the expression of various points of view on topics described in the programs or agendas for such meetings. Under no circumstances shall CAS seminars be used as a means for competing companies or firms to reach any understanding, expressed or implied, that restricts competition or in any way impairs the ability of members to exercise independent business judgment regarding matters affecting competition. It is the responsibility of all seminar participants to be aware of antitrust regulations, to prevent any written or verbal discussions that appear to violate these laws, and to adhere in every respect to the CAS antitrust compliance policy.


SLIDE 3

Agenda

  • Introduction to boosting methods
  • Connection between boosting and statistical concepts (linear models, additive models, etc.)
  • Gradient boosting trees in detail
  • An application to auto insurance loss cost modeling
  • Limitations of Gradient Boosting and a proposed improvement: Direct Boosting
  • Comparison of various modeling techniques
  • Additional features of boosting machines


SLIDE 4

Non-life insurance ratemaking models: The two cultures

Data generating process in ratemaking models: x → nature → y
  • x: driver, vehicle and policy characteristics
  • y: claim frequency, claim severity, loss cost, etc.

The data modeling culture: x → Poisson, Gamma, Tweedie → y

The algorithmic modeling culture: x → unknown → y; algorithms (e.g., decision trees, NN, SVMs) operate on x to predict y

Objectives of statistical modeling:
  • Accurate prediction
  • Extract useful information


SLIDE 5

Boosting methods: A compromise between both cultures

In particular, Gradient Boosting Trees provide:
  • Accuracy comparable to Neural Networks, SVMs and Random Forests
  • Interpretable results
  • 'Little' data pre-processing
  • Detection and identification of important interactions
  • Built-in feature selection
  • Results invariant under order-preserving transformations of variables (no need to ever consider functional form revisions: log, sqrt, power)
  • Applicability to a variety of response distributions (e.g., Poisson, Bernoulli, Gaussian)
  • Not too much parameter tuning


SLIDE 6

Boosting framework

Boosting idea: based on "strength of weak learnability" principles. Example:

IF Gender=MALE AND Age<=25 THEN claim_freq.='high'

Simple or "weak" learners are not perfect! Combination of weak learners ⇒ increased accuracy.

Problems:
  • What to use as the weak learner?
  • How to generate a sequence of weak learners?
  • How to combine them?


SLIDE 7

The predictive learning problem

Let $x = \{x_1, \ldots, x_p\}$ be a vector of predictor variables, $y$ a target variable, and $\{(y_i, x_i);\ i = 1, \ldots, M\}$ a collection of $M$ instances of known $(y, x)$ values. The objective is to learn a prediction function $\hat{f}(x): x \to y$ that minimizes the expectation of some loss function $L(y, f)$ over the joint distribution of all $(y, x)$ values:

$$\hat{f}(x) = \operatorname*{argmin}_{f(x)} \; E_{y,x}\, L(y, f(x))$$

(e.g., $L(y, f(x))$ = squared-error, absolute-error, exponential loss, etc.)


SLIDE 8

Boosting ⊇ Additive Model ⊇ Linear Model

Linear Model: $E(y|x) = f(x) = \sum_{j=1}^{p} \beta_j x_j$

Additive Model: $E(y|x) = f(x) = \sum_{j=1}^{p} f_j(x_j)$

Boosting: $E(y|x) = f(x) = \sum_{t=1}^{T} \beta_t h(x; a_t)$

where the functions $h(x; a_t)$ represent the weak learner, characterized by a set of parameters $a = \{a_1, a_2, \ldots\}$. Parameter estimation in Boosting amounts to solving

$$\min_{\{\beta_t, a_t\}_1^T} \; \sum_{i=1}^{M} L\Big(y_i, \sum_{t=1}^{T} \beta_t h(x_i; a_t)\Big)$$

where $L(y, f(x))$ is the chosen loss function to define lack-of-fit.


SLIDE 9

Gradient boosting

  • Friedman (2001) proposed a Gradient Boosting algorithm to solve the minimization problem above, which works well with a variety of different loss functions
  • Models include regression (e.g., Gaussian, Poisson), outlier-resistant regression (Huber) and K-class classification, among others
  • Trees are used as the weak learner
  • Tree size is a parameter that determines the order of interaction
  • The number of trees T in the sequence is chosen using a validation set (T too large will overfit)


SLIDE 10

Gradient boosting in detail

Algorithm 1 Gradient Boosting

1: Initialize $f_0(x)$ to be a constant, $f_0(x) = \operatorname*{argmin}_{\beta} \sum_{i=1}^{M} L(y_i, \beta)$
2: for t = 1 to T do
3:   Compute the negative gradient as the working response
       $r_i = -\left[ \dfrac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f(x) = f_{t-1}(x)}, \quad i = 1, \ldots, M$
4:   Fit a regression tree to the $r_i$ by least squares using the inputs $x_i$, obtaining the estimate $a_t$ of the tree parameters in $\beta h(x; a)$
5:   Obtain the estimate $\beta_t$ by minimizing $\sum_{i=1}^{M} L(y_i, f_{t-1}(x_i) + \beta h(x_i; a_t))$
6:   Update $f_t(x) = f_{t-1}(x) + \beta_t h(x; a_t)$
7: end for
8: Output $\hat{f}(x) = f_T(x)$


SLIDE 11

Gradient boosting for squared-error loss

For squared-error loss, the negative gradient of L is just (a multiple of) the usual residual:

$$L(y_i, f(x_i)) = (y_i - f(x_i))^2, \qquad -\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} = 2\,(y_i - f(x_i)) \propto r_i$$

In this case, the gradient boosting algorithm simply becomes

$$\hat{f}(x) = \mathrm{Tree}_1(x) + \mathrm{Tree}_2(x) + \ldots + \mathrm{Tree}_T(x)$$


SLIDE 12

Injecting randomness and shrinkage

Two additional ingredients to the boosting algorithm (see the sketch after this list):

Shrinkage
  • Scale the contribution of each tree by a factor τ ∈ (0, 1]; the update at each iteration is then $f_t(x) = f_{t-1}(x) + \tau \cdot \beta_t h(x; a_t)$
  • Low values of τ slow down the learning rate and require a higher number of trees in compensation, but accuracy is better

Randomness
  • Sample the training data without replacement before fitting each tree, usually 1/2 the size
  • This increases the variance of the individual trees but decreases the correlation between trees in the sequence; the net effect is a decrease in the variance of the combined model
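
To make Algorithm 1 plus these two ingredients concrete, here is a minimal R sketch (not the authors' implementation) for squared-error loss, using rpart stumps as the weak learner; X (a data frame of predictors), y (a numeric response), tau and sub are hypothetical inputs.

library(rpart)

gradient_boost <- function(X, y, n.trees = 1000, tau = 0.1, sub = 0.5) {
  f <- rep(mean(y), length(y))                 # constant initializer
  trees <- vector("list", n.trees)
  n <- length(y)
  for (t in 1:n.trees) {
    s <- sample(n, floor(sub * n))             # subsample without replacement
    r <- y[s] - f[s]                           # negative gradient = residuals
    fit <- rpart(r ~ ., data = data.frame(X[s, , drop = FALSE], r = r),
                 maxdepth = 1, cp = 0)         # single-split weak learner
    trees[[t]] <- fit
    f <- f + tau * predict(fit, X)             # shrunken update on all rows
  }
  list(init = mean(y), trees = trees, tau = tau, fitted = f)
}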


SLIDE 13

An application to Loss Cost modeling

The Data
  • Extracted from a major Canadian insurer; approx. 3.5 accident-years
  • At-fault collision coverage
  • Approx. 427,000 earned exposures (vehicle-years)
  • Approx. 15,000 claims
  • Data randomly partitioned into train (70%) and test (30%) data sets


SLIDE 14

Overview of model candidate input variables

Driver: Age of p/o; Yrs. licensed; Age licensed; License class; Gender; Marital status; Occ. driver under 25; Occ. driver over 25

Accidents/convictions: # at-fault accidents (1-3 yrs.); # at-fault accidents (4-6 yrs.); # not-at-fault accidents (1-3 yrs.); # not-at-fault accidents (4-6 yrs.); # driving convictions (1-3 yrs.); Examination costs (AB claims); Prior FA

Policy: Time on risk; Multi-vehicle flag; Deductible; Billing type; Billing status; Territory; u/w score; Insurance lapses; Insurance suspensions; Group business; Business origin; Property flag

Vehicle: Vehicle make; Vehicle new/used; Vehicle lease flag; hpwr; Vehicle age; Vehicle price

SLIDE 15

Building the model

Loss functions:
  • Frequency model: Bernoulli deviance
  • Severity model: squared-error loss

Other settings:
  • Shrinkage parameter τ = 0.001
  • Sub-sampling rate = 50%
  • Size of the individual trees: started with single-split (no interactions), followed by (2-6)-way interactions
  • Number of trees: selected by cross-validation (see the sketch below)

(Figure: squared-error loss versus boosting iterations, with train error and CV error curves; the CV error curve is used to pick the number of trees.)
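
As a concrete illustration, a hedged sketch of the frequency-model fit described above using the gbm package; the data frame freq_data and the 0/1 response claim are hypothetical stand-ins for the insurer's data.

library(gbm)
freq.model <- gbm(claim ~ ., data = freq_data,
                  distribution = "bernoulli",   # Bernoulli deviance
                  n.trees = 20000,
                  interaction.depth = 1,        # single-split to start
                  shrinkage = 0.001,            # tau
                  bag.fraction = 0.5,           # 50% sub-sampling
                  cv.folds = 5)
best.iter <- gbm.perf(freq.model, method = "cv")  # number of trees by CV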


SLIDE 16

Relative importance of predictors

Frequency (left) and Severity (right).

(Figure: relative importance bar charts, scaled 0-100. Frequency model, in decreasing order: Yrs. licensed, ODU25, # Convictions, Age of p/o, Vehicle age, Hpwr, Age licensed, u/w score, Territory, Vehicle lease flag. Severity model: Vehicle age, Vehicle price, Hpwr, Deduct., Yrs. licensed, # Convictions, u/w score, # chg. acc, ODU25, Group business.)
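
The rankings above can be extracted from a fitted gbm model; a short sketch, reusing the hypothetical freq.model and best.iter from the earlier fit.

rel.inf <- summary(freq.model, n.trees = best.iter, plotit = FALSE)
head(rel.inf, 10)   # top 10 predictors by relative influence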


SLIDE 17

Sample partial dependence plots – Frequency model

(Figure: six partial dependence panels for the frequency model: Yrs. Licensed, ODU25, # Convictions (last 3 yrs.), Age of p/o, Vehicle Age, and u/w score, each plotted on the model's link scale.)
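
Plots of this kind come from gbm's plot method, which integrates out all other predictors; a sketch, with hypothetical variable names standing in for those in the data.

plot(freq.model, i.var = "yrs_licensed", n.trees = best.iter)
plot(freq.model, i.var = "uw_score",     n.trees = best.iter)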


SLIDE 18

Inspecting interactions using Friedman’s H-stat

require(gbm)
n <- 50                       # number of inputs
x <- 1:n
best.iter <- gbm.perf(gbm.model, plot.it = FALSE, method = "cv")
# pairwise H-statistics for all input pairs (lower triangle only)
ans <- matrix(nrow = length(x), ncol = length(x))
for (i in 1:length(x)) {
  for (j in 1:length(x)) {
    if (i > j) {
      ans[i, j] <- interact.gbm(gbm.model, data = mydata,
                                i.var = c(x[i], x[j]),
                                n.trees = best.iter)
    }
  }
}

Interaction Matrix (Friedman's H for each pair; lower triangle only):

        x1    x2    ...   xn
  x1    na    na    ...   na
  x2    0.5   na    ...   na
  ...   ...   ...   ...   ...
  xn    0.9   0.8   ...   na

(Figure: two-variable partial dependence plot of the frequency model over Yrs. Licensed and Hpwr, one of the pairs flagged by the H-statistic.)
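
The joint partial dependence surface above can be reproduced for any pair flagged by the H-statistic; again a sketch with hypothetical variable names.

plot(gbm.model, i.var = c("yrs_licensed", "hpwr"), n.trees = best.iter)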


SLIDE 19

Prediction performance – Gradient Boosting vs. GLM

(Figure: double-lift chart on the test set. Policies are bucketed by the ratio of GB predicted loss cost to GLM predicted loss cost, with buckets (0.418,0.896], (0.896,0.973], (0.973,1.05], (1.05,1.15], and (1.15,3.36]; the bars show the exposure count in each bucket and the line shows actual losses / GLM predicted loss cost, ranging from roughly 0.9 to 1.3.)
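
For reference, a hedged sketch of the computation behind such a double-lift chart; the data frame test, with columns actual, gb_pred, glm_pred and exposure, is a hypothetical stand-in for the hold-out set, and quintile buckets are used for illustration.

test$ratio  <- test$gb_pred / test$glm_pred
test$bucket <- cut(test$ratio,
                   breaks = quantile(test$ratio, probs = seq(0, 1, 0.2)),
                   include.lowest = TRUE)
agg <- aggregate(cbind(actual, gb_pred, glm_pred, exposure) ~ bucket,
                 data = test, FUN = sum)
agg$actual_vs_glm <- agg$actual / agg$glm_pred   # the plotted line
agg$exposure                                     # the plotted bars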


SLIDE 20

Improvement over GBM - Direct Boosting

GBM has quite a few advantages over other modeling techniques:
  • It is very intuitive: it aims to correct errors to the maximum extent at each iteration
  • It is predictive: empirical tests have shown that GBM is superior to other popular modeling techniques
  • It provides output with easy interpretation: the results can be visualized, while those of neural networks or genetic algorithms cannot

But it does have some disadvantages as well ...
  • It is not very fast: it can take 6 hours to model a dataset with 4 million records
  • It is deficient on datasets with many zeros when using the exponential form
  • Some distributions are not easily available, e.g. the Tweedie distribution


SLIDE 21

Improvement over GBM - Direct Boosting

What if ...

there were a model with all the advantages of GBM ... but not the disadvantages? Direct Boosting may do the work.

DBM at a Glance
  • It is a modified version of GBM
  • It is faster, as it requires fewer calculations at each iteration
  • The algorithm is more robust on data having many zeros
  • The Tweedie distribution is incorporated


SLIDE 22

Direct Boosting in detail

GBM first:
  • calculates the gradient for each observation
  • splits the dataset into several groups, with each group having the maximum average difference in gradient
  • obtains the group loss function minimizer
  • applies the shrinkage factor

DBM "thinks" in reverse. We first obtain the form of the group loss function minimizer. Because of the shrinkage, we can apply a Taylor series to find a linear approximation of the minimizer (recall that exp(x) ≈ 1 + x when x is around 0).


SLIDE 23

Direct Boosting in detail

This approximation is in general a sum of terms, e.g. $\frac{1}{n}\sum_i (y_i / f_i(x) - 1)$. Noting this, DBM calculates the summand at the observation level, e.g. $y_i / f_i(x) - 1$; we call this the pseudo minimizer. Similar to GBM, DBM splits the dataset into several groups, with each group having the maximum average difference in pseudo minimizer. Since the group average of the pseudo minimizers is already the group loss function minimizer, the last step of GBM is not necessary.


SLIDE 24

Direct Boosting in detail

Algorithm 2 Direct Boosting for the Tweedie Distribution

1: Take the loss function to be the negative log-likelihood of the Tweedie distribution in exponential (log-link) form:
     $L(y, f(x)) = -\sum_i \left[ \dfrac{y_i\, e^{(1-p) f(x_i)}}{1-p} - \dfrac{e^{(2-p) f(x_i)}}{2-p} \right]$
2: The group loss minimizer is $h = \ln\!\left( \sum_i y_i\, e^{(1-p) f(x_i)} \big/ \sum_i e^{(2-p) f(x_i)} \right)$
3: Linear approximation through Taylor expansion: $h \approx \frac{1}{n} \sum_i y_i\, e^{(1-p) f(x_i)} - \frac{1}{n} \sum_i e^{(2-p) f(x_i)}$
4: Pseudo loss minimizer at the observation level: $h_i = y_i\, e^{(1-p) f(x_i)} - e^{(2-p) f(x_i)}$
5: for t = 1 to T do
6:   Update $f_t(x) = f_{t-1}(x) + h_i$, with the $h_i$ recomputed from $f_{t-1}$ and grouped by a tree
7: end for
8: Output $\hat{f}(x) = f_T(x)$
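
Putting the pieces together, here is a minimal R sketch of our reading of Algorithm 2 (not the authors' implementation): rpart stumps do the grouping, and direct_boost, the Tweedie power p, and the shrinkage tau are hypothetical names and inputs.

library(rpart)

direct_boost <- function(X, y, p = 1.5, n.trees = 1000, tau = 0.01) {
  f <- rep(log(mean(y) + 1e-8), length(y))   # initializer on the log scale
  trees <- vector("list", n.trees)
  for (t in 1:n.trees) {
    # step 4: observation-level pseudo minimizer, recomputed from f
    h <- y * exp((1 - p) * f) - exp((2 - p) * f)
    # group by a tree fitted to h; its terminal-node means are the
    # (approximate) group loss minimizers, so no line search is needed
    fit <- rpart(h ~ ., data = data.frame(X, h = h), maxdepth = 1, cp = 0)
    trees[[t]] <- fit
    f <- f + tau * predict(fit, X)           # step 6 update with shrinkage
  }
  list(trees = trees, fitted = exp(f))
}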



SLIDE 27

Direct Boosting in detail - The predictive power: Retention modeling

The performance of various models is tested using the same data and input variables. The model predicts the probability of churn (or renewal). For the predictive models, we use a 40/30/30 split for training/validation/testing.

Model                     Lift (top-decile churn / average churn)   ROC area
Decision Tree             2.6692                                    0.6981
GLM - Logistic            3.0332                                    0.7275
Support Vector Machines   3.0520                                    0.7312
Neural Net                3.0828                                    0.7293
GBM - Poisson             3.0879                                    0.7304
GBM - Logistic            3.1016                                    0.7330
DBM - Poisson             3.1306                                    0.7330
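
For reference, a hedged sketch of the two metrics in the table, for hypothetical vectors p_hat (predicted churn probability) and y (0/1 churn outcome); the ROC area uses the rank-sum identity for the AUC.

top_decile_lift <- function(p_hat, y) {
  cutoff <- quantile(p_hat, 0.9)          # top decile by predicted churn
  mean(y[p_hat >= cutoff]) / mean(y)      # decile churn / average churn
}
roc_area <- function(p_hat, y) {          # AUC via the rank-sum identity
  r  <- rank(p_hat)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}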

SLIDE 28

Direct Boosting in detail - The predictive power: Loss cost modeling

Continuing the GBM vs GLM comparison for collision coverage, we compare DBM's performance against GBM. Since GBM does not work well with Poisson and Tweedie responses here:
  • We first model frequency using logistic regression
  • Gamma modeling in the severity module then follows
  • The two are combined to form the loss cost model; relativities cannot be obtained, as logistic regression is not in exponential form

By contrast, DBM can model loss cost directly using Tweedie models.

SLIDE 29

Direct Boosting vs Gradient Boosting


SLIDE 30

Direct Boosting - Relativities at a Glance


SLIDE 31

Direct Boosting - Relativities at a Glance


SLIDE 32

Direct Boosting in detail - Additional features

With the above form, DBM is already more predictive than any other predictive model on all 6 of the datasets we have tried. However, some additional features help make the model even more predictive.

Monotonic constraint
  • On many occasions, certain patterns are desirable, e.g. loss cost decreasing with years licensed
  • This additional feature tells the machine not to split the data in the case of a reversal
  • The improvement is promising (see the sketch below)
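
For comparison only (this is not the authors' DBM feature): the gbm package exposes a similar facility through its var.monotone argument, one value per predictor in the formula: -1 forces a decreasing fit, +1 an increasing one, 0 leaves it unconstrained. Variable names here are hypothetical.

mono.model <- gbm(claim ~ yrs_licensed + vehicle_age + hpwr,
                  data = freq_data, distribution = "bernoulli",
                  var.monotone = c(-1, 0, 0),   # frequency decreasing in yrs. licensed
                  n.trees = 5000, shrinkage = 0.001, bag.fraction = 0.5)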


SLIDE 33

Monotonic Constraint


SLIDE 34

Monotonic Constraint


SLIDE 35

Direct Boosting in detail - Additional features

Interaction constraint
  • The well-promoted advantage of data mining techniques is the ability to model any interaction to any degree
  • However, this can be a double-edged sword: very often the interactions are generated from noise
  • We are working towards the flexibility to allow users to select meaningful interactions. For example, the model might fit only 4 groups of interactions: Group 1 - vehicle related, Group 2 - driver related, Group 3 - location related, Group 4 - user specified
