
SLIDE 1

Tree-based estimators and actuarial applications

Lyon-Columbia Workshop (Lyon), 06/27/2016. Xavier Milhaud, joint work with O. Lopez and P. Thérond.

SLIDE 2

Two problems with censoring - Lifetime / Claim amount

Estimate an individual lifetime T given features X ∈ R^d; we only observe the follow-up time Y: a censored observation.
The claim is still open and has been under payment for a time Y (the claim is not closed); the total claim amount M is still unknown: only N ≤ M has been paid so far.
M is to be predicted (or the total claim lifetime T) from X ∈ R^d.

SLIDE 3

Clustering by trees: key components

To estimate our quantity of interest, use a tree approach where:

1. the root: the whole population to segment ⇒ the starting point;
2. the branches: correspond to splitting rules;
3. the leaves: homogeneous disjoint subsamples of the initial population; they give the estimate of the quantity of interest.

A reference in actuarial science → [Olb12]: builds experimental mortality tables of a reinsurance portfolio by predicting death rates.

SLIDE 4

Example: predicting owner status given income and size

SLIDE 5

Partition and tree: maximal global homogeneity

Create subspaces maximizing homogeneity within each partition.

SLIDE 6

SLIDE 7

2. Building the tree - steps
Building steps to estimate the expectation
Stopping rules
Pruning criterion

SLIDE 8

Regression trees: Y continuous and fully observed

Regression problem:

π_0(x) = E_0[Y | X = x]    (1)

→ Most famous option: a linear relationship between Y and X (limiting ourselves to a given class of estimators) ⇒ mean squared error.

→ In full generality, we cannot consider all potential estimators of π_0(x) ⇒ trees form another class: piecewise constant functions.

Building a tree provides a sieve of estimators, obtained from successive splits of covariate space X.

SLIDE 9

CART estimator: a piecewise constant estimator

π̂(x) := π̂_L(x) = Σ_{l=1}^{L} γ̂_l R_l(x)    (2)

where
L is the number of leaves of the tree and l its index,
R_l(x) = 1(x ∈ X_l) encodes the splitting rules,
γ̂_l = E_n[Y | x ∈ X_l] is the empirical mean of Y in leaf l.

The partitions X_l ⊆ X are
disjoint (X_l ∩ X_l′ = ∅ for l ≠ l′),
exhaustive (X = ∪_l X_l).

This (piecewise constant) form can be generalized to any quantity of interest (expectation, median, ...).
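As an illustration of eq. (2), a regression tree fit on fully observed data is exactly such a piecewise constant function. The sketch below uses scikit-learn's DecisionTreeRegressor (our choice of tool, not prescribed by the talk) and checks that each leaf prediction equals the leaf's empirical mean γ̂_l; all data and parameters are illustrative.

```python
# Sketch: a fitted regression tree is the piecewise-constant estimator
# of eq. (2). Standard CART on fully observed data, illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 1))                 # covariate X in R^d (d = 1)
y = np.where(X[:, 0] < 0.5, 1.0, 3.0) + rng.normal(0, 0.1, 500)

tree = DecisionTreeRegressor(max_leaf_nodes=4, random_state=0).fit(X, y)

# Each leaf l carries gamma_l, the empirical mean of Y in the leaf; the
# prediction is constant on each region X_l of the partition.
leaf_of = tree.apply(X)                              # which leaf each x falls in
for leaf in np.unique(leaf_of):
    gamma_l = y[leaf_of == leaf].mean()
    pred_l = tree.predict(X[leaf_of == leaf][:1])[0]
    assert abs(gamma_l - pred_l) < 1e-10             # prediction = leaf mean
```

The number of distinct predicted values is at most the number of leaves, which makes the piecewise-constant structure visible directly on the fitted model.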

SLIDE 10

Building the tree: splitting criterion

→ Must be suited to our task.

→ To solve (1), OLS is used, since the solution is given by

π_0(x) = arg min_{π(x)} E_0[ φ(T, π(x)) | X = x ],    (3)

where φ(T, π(x)) = (T − π(x))² (φ is the loss function).

→ Here, this results in minimizing the intra-node variance at each step.

→ If T is fully observed, building the regression tree with this criterion is consistent ([BFOS84]).

SLIDE 11

Pruning: penalize by tree complexity

CART principle: do not stop the splitting process; build the "maximal" tree (size K(n)), then prune it.

→ We get a sieve of estimators (π̂_K(x))_{K=1,...,K(n)}.

Avoid overfitting ⇒ find the best subtree of the maximal tree, with a trade-off between goodness of fit and complexity:

R_α(π̂_K(x)) = E_n[ Φ(Y, π̂_K(x)) ] + α (K/n).

For fixed α, the final estimator (pruned tree) is

π̂_{K_α}(x) = arg min_{(π̂_K)_{K=1,...,K(n)}} R_α(π̂_K(x)).    (4)
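The grow-then-prune principle can be sketched with scikit-learn's minimal cost-complexity pruning, where the penalty α plays the same role as in R_α above (penalizing the number of leaves K). This is our illustration, not the authors' implementation; data and parameters are placeholders.

```python
# Sketch of cost-complexity pruning: grow a deep ("maximal") tree, then
# select the subtree minimizing penalized risk = error + alpha * complexity.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(400, 1))
y = np.sin(4 * X[:, 0]) + rng.normal(0, 0.2, 400)

maximal = DecisionTreeRegressor(min_samples_leaf=2, random_state=0).fit(X, y)
path = maximal.cost_complexity_pruning_path(X, y)    # increasing alphas

# For each alpha we get one pruned subtree: a sieve of estimators
# (pi_hat_K) indexed by a decreasing number of leaves K.
sizes = [DecisionTreeRegressor(min_samples_leaf=2, random_state=0,
                               ccp_alpha=a).fit(X, y).get_n_leaves()
         for a in path.ccp_alphas]
assert sizes == sorted(sizes, reverse=True)          # larger alpha => smaller tree
```

In practice the final α is chosen by cross-validation over this sieve, which selects the pruned tree of eq. (4).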

SLIDE 12

3. Extend to (potentially) censored data

SLIDE 13

Back to our data

We observe a sample of i.i.d. random variables (Y_i, N_i, δ_i, X_i)_{1≤i≤n} with the same distribution as (Y, N, δ, X), where

Y = inf(T, C),  N = inf(M, D),  and  δ = 1_{T≤C} = 1_{M≤D}.

C and D are the censoring variables, for instance:

C = the time between the declaration date and the extraction date;
D = the current amount paid for the claim.

SLIDE 14

Focus on lifetime T : what we would like to do

In practice, we only observe i.i.d. replications (Y_i, δ_i, X_i)_{1≤i≤n}, where

Y = inf(T, C)  and  δ = 1_{T≤C}.

For a claim still open, Y is the current lifetime and δ = 0. We seek

T* = E[T | δ = 0, Y, X].

Goal: find an estimator of T* from the observations.
Pitfall: we do not observe i.i.d. replications of T ⇒ standard methods (LLN) do not apply.

SLIDE 15

Ingredients: Kaplan-Meier estimator and IPCW

Assume that T is independent of C. Define

F̂(t) = 1 − ∏_{Y_i ≤ t} ( 1 − δ_i / Σ_{j=1}^n 1_{Y_j ≥ Y_i} ).

This estimator converges to F(t) = P(T ≤ t). Additive version:

F̂(t) = Σ_{i=1}^n W_{i,n} 1_{Y_i ≤ t},  where  W_{i,n} = δ_i / ( n [1 − Ĝ(Y_i−)] ),

with Ĝ(t) the Kaplan-Meier estimator of G(t) = P(C ≤ t).
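The weights W_{i,n} can be computed directly from the observed pairs (Y_i, δ_i). The pure-NumPy sketch below is our own illustration (the function name is ours) and assumes no ties among the Y_i; it builds Ĝ from the censoring indicators and returns the Kaplan-Meier weights.

```python
# Sketch: Kaplan-Meier / IPCW weights W_{i,n} = delta_i / (n * (1 - G_hat(Y_i-))),
# with G_hat the Kaplan-Meier estimator of the censoring c.d.f.
# G(t) = P(C <= t). Assumes no ties among the Y_i. Illustrative only.
import numpy as np

def km_ipcw_weights(Y, delta):
    """IPCW weights; Y = observed times, delta = 1 if uncensored."""
    n = len(Y)
    order = np.argsort(Y, kind="stable")
    Y_s, d_s = Y[order], delta[order]
    at_risk = n - np.arange(n)                # sum_j 1{Y_j >= Y_i}, after sorting
    # Censoring "events" are the observations with delta = 0, so the KM
    # survival of C multiplies factors (1 - (1 - delta_i) / at_risk_i).
    factors = 1.0 - (1.0 - d_s) / at_risk
    surv_C = np.cumprod(factors)              # 1 - G_hat at Y_(i)
    # Left limit 1 - G_hat(Y_i-): product over strictly earlier observations.
    surv_before = np.concatenate(([1.0], surv_C[:-1]))
    w = np.zeros(n)
    w[order] = d_s / (n * surv_before)
    return w

Y = np.array([2.0, 3.0, 1.0, 4.0, 5.0])
delta = np.array([1, 0, 1, 1, 1])
w = km_ipcw_weights(Y, delta)
```

On this toy sample, the weights coincide with the jumps of the Kaplan-Meier estimator of T, and they sum to F̂(max Y_i).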

SLIDE 16

Why does it work?

1. Recall that W_{i,n} = (1/n) δ_i / (1 − Ĝ(Y_i−)) is "close" to W*_{i,n} = (1/n) δ_i / (1 − G(Y_i−)).

2. Moreover (LLN),

Σ_{i=1}^n W*_{i,n} φ(Y_i) = (1/n) Σ_{i=1}^n δ_i φ(Y_i) / (1 − G(Y_i−)) → a.s. E[ δ φ(Y) / (1 − G(Y−)) ].

Proposition. For every function φ such that E[φ(T)] < ∞,

E[ δ φ(Y) / (1 − G(Y−)) ] = E[φ(T)].

SLIDE 17

Application to our context

We would like to estimate quantities like E[φ(T, X)] (see eq. (3)).

Proposition. Assume that C is independent of (T, X). Then

E[ δ φ(Y, X) / (1 − G(Y−)) ] = E[φ(T, X)],

and

E[ δ φ(Y, X) / (1 − G(Y−)) | X ] = E[φ(T, X) | X].

SLIDE 18

Thus, to estimate E[φ(T, X)], we use

(1/n) Σ_{i=1}^n δ_i φ(Y_i, X_i) / (1 − Ĝ(Y_i−)) = Σ_{i=1}^n W_{i,n} φ(Y_i, X_i).

Therefore, to estimate quantities like

E[ (φ(T) − a)² 1_{X ∈ X'} ],

where X' is a subspace, we compute

Σ_{i=1}^n W_{i,n} (φ(Y_i) − a)² 1_{X_i ∈ X'}.
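With these weights, the splitting criterion of the earlier slides becomes a weighted intra-node squared error, minimized by the weighted mean. The sketch below (function names and data are ours, not the authors' code) searches for the best split of a single covariate under IPCW weights.

```python
# Sketch: under censoring, the intra-node squared error is computed with
# IPCW weights W_i instead of 1/n: sum_i W_i * (Y_i - a)^2 * 1{X_i in node},
# and the minimizing constant a is the weighted mean (the leaf estimate).
import numpy as np

def weighted_node_error(Y, w, in_node):
    """Weighted intra-node SSE; the minimizer is the weighted mean."""
    yw, ww = Y[in_node], w[in_node]
    if ww.sum() == 0:
        return 0.0, np.nan
    gamma = np.average(yw, weights=ww)        # weighted leaf estimate gamma_hat
    return np.sum(ww * (yw - gamma) ** 2), gamma

def best_split(Y, w, x):
    """Exhaustive search of the split x <= s minimizing total weighted SSE."""
    best = (np.inf, None)
    for s in np.unique(x)[:-1]:
        left = x <= s
        err = weighted_node_error(Y, w, left)[0] + \
              weighted_node_error(Y, w, ~left)[0]
        if err < best[0]:
            best = (err, s)
    return best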

SLIDE 19

Quality of our CART estimator: simulation study

Consider the following simulation scheme:

1. draw n + v i.i.d. replications (X_1, ..., X_{n+v}) of the covariate, with X_i ∼ U(0, 1);
2. draw n + v i.i.d. lifetimes (T_1, ..., T_{n+v}) from an exponential distribution such that T_i ∼ E(β), with β = α_1 1_{X_i ∈ [a,b[} + α_2 1_{X_i ∈ [b,c[} + α_3 1_{X_i ∈ [c,d[} + α_4 1_{X_i ∈ [d,e]} (notice that there thus exist four subgroups in the whole population);
3. draw n + v i.i.d. censoring times, Pareto-distributed: C_i ∼ Pareto(λ, µ);
4. from the simulated lifetimes and censoring times, get for each i the observed lifetime Y_i = inf(T_i, C_i) and the indicator δ_i = 1_{T_i ≤ C_i};
5. compute the estimator Ĝ from the whole generated sample (Y_i, δ_i)_{1≤i≤n+v}.
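The steps above can be sketched as follows. The group boundaries a, ..., e, the rates α_1, ..., α_4 and the Pareto parameters λ, µ are not given numerically on the slide, so the values below (and the Lomax parameterization of the Pareto law) are our placeholders.

```python
# Sketch of the simulation scheme: group-specific exponential lifetimes
# with Pareto censoring. All numerical parameters are assumed, not the
# authors' settings.
import numpy as np

rng = np.random.default_rng(42)
n_total = 5000                                  # n + v
a, b, c, d, e = 0.0, 0.25, 0.5, 0.75, 1.0       # assumed group boundaries
alphas = np.array([1.0, 2.0, 4.0, 8.0])         # assumed exponential rates

# 1. covariates X_i ~ U(0, 1)
X = rng.uniform(0, 1, n_total)
# 2. group-specific exponential lifetimes (four subgroups)
group = np.digitize(X, [b, c, d])               # group index 0..3
T = rng.exponential(1.0 / alphas[group])
# 3. Pareto (Lomax) censoring: S(t) = (1 + t/mu)^(-lam), inverted here
lam, mu = 3.0, 1.0
C = mu * (rng.uniform(size=n_total) ** (-1.0 / lam) - 1.0)
# 4. observed data
Y = np.minimum(T, C)
delta = (T <= C).astype(int)
# 5. G_hat would then be the Kaplan-Meier estimator of C computed on
#    (Y_i, 1 - delta_i), as on the earlier slides.
```

Feeding (Y_i, δ_i, X_i) to the weighted tree procedure and comparing the leaf estimates with the four true group means gives the MWSE figures reported in the next table.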

SLIDE 20

% censored   n       Group 1   Group 2   Group 3   Group 4   Global
obs.                 MWSE      MWSE      MWSE      MWSE      MWSE
10%          100     0.19516   0.42008   0.17937   0.30992   1.10454
             500     0.03058   0.07523   0.03183   0.06029   0.19796
             1 000   0.01509   0.03650   0.01517   0.02619   0.09306
             5 000   0.00295   0.00714   0.00289   0.00530   0.01804
             10 000  0.00105   0.00378   0.00117   0.00292   0.00910
30%          100     0.20060   0.43664   0.17448   0.29022   1.10765
             500     0.03736   0.07604   0.04301   0.06584   0.22217
             1 000   0.01748   0.04095   0.01535   0.02674   0.10043
             5 000   0.00319   0.00758   0.00291   0.00547   0.01904
             10 000  0.00117   0.00372   0.00125   0.00292   0.00930
50%          100     0.19784   0.45945   0.17387   0.28363   1.11476
             500     0.04906   0.08993   0.05301   0.06466   0.25668
             1 000   0.02481   0.05115   0.01788   0.03004   0.12387
             5 000   0.00520   0.00867   0.00389   0.00516   0.02299
             10 000  0.00153   0.00407   0.00162   0.00308   0.01057

SLIDE 21

[Figure: Global MWSE as a function of n (logarithmic scale), for censorship rates of 10%, 30% and 50%.]

SLIDE 22

4. Applications

SLIDE 23

Application 1: income protection

We refer to short-term disability contracts over 6 years, with the following information:

83 547 claims;
policyholder ID, cause (sickness or accident), gender, SPC, age, duration in the disability state (censored or not), distribution channel;
a censoring rate of 7.2%;
a mean lifetime in the disability state of 100 days.

Goal: find a segmentation to predict how long the disability state lasts.

SLIDE 24

Tree estimator: the age at claim seems to be key

Figure: Disability duration explained by sex, SPC, network, age, cause.

SLIDE 25

Usually, the recovery rates used to compute technical provisions for this guarantee depend on the age at the claim date, due to local prudential regulation ⇒ we fit a Cox PH model with this covariate:

this confirms the high predictive power of this variable;
the PH assumption is rejected by all tests (LR, Wald and log-rank);
the results obtained are used as benchmarks to enable a comparison with those resulting from the tree approach.

Classes   Mean age   Tree     Cox
a         26.83      64.44    80.01
b         34.19      85.48    96.35
c         39.57      100.04   110.19
d         45.05      111.38   126.03
e         51.29      126.40   146.28

Table: Expected disability time (days) depending on age at disability time.

SLIDE 26

→ We observe significant differences between the Tree and Cox estimates.

→ These differences can be explained by two phenomena resulting from the use of the Cox proportional-hazards model:

the estimation of the baseline hazard is very sensitive to the highest disability durations (mainly concentrated in class e), which affects the estimates of all other classes;
our approach directly targets the duration expectation, while the Cox partial likelihood focuses on estimating the hazard rate.

SLIDE 27

Application 2: reserving

We seek E[M | δ = 0, X, Y, N]. Get back to quantities conditioned only on the covariates X:

E[M | δ = 0, X = x, Y = y, N = n]
= E[M | M ≥ n, T ≥ y, X = x]
= E[ M 1_{M ≥ n, T ≥ y} | X = x ] / P(T ≥ y, M ≥ n | X = x).

Define

φ_1(t, m) = m 1_{m ≥ n, t ≥ y},  φ_2(t, m) = 1_{t ≥ y, m ≥ n}.

Estimate the ratio of

(1) E[φ_1(T, M) | X = x]

over

(2) E[φ_2(T, M) | X = x].
SLIDE 28

Our data

Third-party insurance in the medical field in the US, with 648 claims and various individual characteristics (specialty, class, county, reopen status, ...) with large heterogeneity.

[Data excerpt: for each claim, the entry date (2000-07-14 to 2000-08-28 in the excerpt), the indemnity and ALAE reserves, the censoring indicator, the amount already paid and the reserved amount; the original table is garbled in this extraction, as is the output of summary(myData$Observed.total).]

Censoring rates: 32.19% on the learning sample, 34.38% on the validation sample.

SLIDE 29

Predictions of quantity (1): E[M 1_{M>n, T>y} | X = x]

[Figure: pruned survival tree. Splits involve County, T.decla, Class and Specialty; leaf predictions range from 1.058e+04 (n = 290) to 1.481e+06 (n = 2).]

SLIDE 30

Predictions of quantity (2): P(M > n, T > y | X = x)

Pruned survival tree, numerical results:

The test-sample estimate of the prediction error of the pruned tree is 18.6%.

[R output: predicted censorship probabilities for the denominator, garbled in this extraction.]

SLIDE 31

Final ratio (1)/(2) and comparison to experts' opinions

[R output: final predictions of the total claim amount for censored claims, compared claim by claim with the experts' predictions; garbled in this extraction.]

Difference in % (also due to missing expert opinions leading to no reserve): 14.47.

⇒ It seems that experts have a tendency to overestimate the reserve.

SLIDE 32

Final remarks

+ Can prove to be a useful method for many applications, e.g. experimental mortality databases, ...
+ Simple and easy-to-understand final estimator.
+ Consistent procedure with theoretical guarantees.
+ Reveals the discriminating power of covariates.
+ Extensions are possible by working on the loss function.
− Instability: robustness still to be gained (random forests, ...).

SLIDE 33

References

[BFOS84] L. Breiman, J. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Chapman and Hall, 1984.

[Olb12] Walter Olbricht. Tree-based methods: a useful tool for life insurance. European Actuarial Journal, 2(1):129–147, 2012.

And our working paper:
https://hal.archives-ouvertes.fr/hal-01141228/file/TreeCensoredRegression-LopezMilhaudTherond.pdf
