Tree-based estimators and actuarial applications


  1. Tree-based estimators and actuarial applications. Lyon-Columbia Workshop (Lyon), 06/27/2016. Xavier Milhaud, joint work with O. Lopez and P. Thérond. 1 / 33

  2. Two problems with censoring - lifetime / claim amount.
  - Lifetime: estimate an individual lifetime T given features X ∈ R^d; we only observe the follow-up time Y, a censored observation.
  - Claim amount: the claim is still open and has been under payment for a time Y (the claim is not closed); the total claim amount M is still unknown, only N ≤ M has been paid so far. We want to predict M (or the total claim lifetime T) from X ∈ R^d.
  2 / 33

  3. Clustering by trees: key components. To estimate our quantity of interest, use a tree approach where:
  1. the root is the whole population to segment ⇒ the starting point;
  2. the branches correspond to splitting rules;
  3. the leaves are homogeneous disjoint subsamples of the initial population, and give the estimation of the quantity of interest.
  A reference in actuarial sciences is [Olb12], which builds experimental mortality tables for a reinsurance portfolio by predicting death rates.
  3 / 33
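  These components map naturally onto a recursive data structure. The block below is a minimal, hypothetical sketch (not the authors' code), assuming binary splits on numeric features: internal nodes carry a splitting rule, leaves carry the estimate of the quantity of interest.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Internal node: the splitting rule "x[feature] <= threshold" sends an
    # observation to the left child, otherwise to the right child.
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf: estimate of the quantity of interest on that homogeneous subsample
    # (e.g. an empirical mean, or a death rate as in [Olb12]).
    estimate: Optional[float] = None

    def predict(self, x):
        """Route one observation x (a sequence of feature values) to its leaf."""
        if self.estimate is not None:        # leaf reached
            return self.estimate
        child = self.left if x[self.feature] <= self.threshold else self.right
        return child.predict(x)
```

  For instance, Node(feature=0, threshold=30.0, left=Node(estimate=0.2), right=Node(estimate=0.8)) encodes a root split on the first covariate with two leaves.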

  4. Example: predicting owner status given income and size. 4 / 33

  5. Partition and tree: maximal global homogeneity. Create subspaces maximizing homogeneity within each element of the partition. 5 / 33

  6. 6 / 33

  7. Building the tree (section 2): building steps to estimate the expectation; stopping rules; pruning criterion. 7 / 33

  8. Regression trees: Y continuous and fully observed. Regression problem: π_0(x) = E_0[Y | X = x] (1). → The most famous option assumes a linear relationship between Y and X (we limit ourselves to a given class of estimators) ⇒ mean squared error. → In full generality, we cannot consider all potential estimators of π_0(x) ⇒ trees are another class: piecewise constant functions. Building a tree provides a sieve of estimators, obtained from successive splits of the covariate space X. 8 / 33

  9. CART estimator: a piecewise constant estimator. π̂(x) := π̂_L(x) = Σ_{l=1}^{L} γ̂_l R_l(x) (2), where L is the number of leaves of the tree and l its index, R_l(x) = 1(x ∈ X_l) is the splitting rule, and γ̂_l = E_n[Y | x ∈ X_l] is the empirical mean of Y in leaf l. The subsets X_l ⊆ X are disjoint (X_l ∩ X_{l'} = ∅ for l ≠ l') and exhaustive (X = ∪_l X_l). This (piecewise constant) form can be generalized whatever the quantity of interest (expectation, median, ...). 9 / 33
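  A minimal sketch of the piecewise constant form (2), assuming a hypothetical helper leaf_index(x) that returns the index l of the leaf X_l containing x (the partition itself is taken as given here):

```python
import numpy as np

def fit_leaf_means(X, Y, leaf_index, n_leaves):
    """gamma_hat_l: empirical mean of Y over the observations falling in leaf l."""
    leaves = np.array([leaf_index(x) for x in X])
    return np.array([Y[leaves == l].mean() for l in range(n_leaves)])

def pi_hat(x, gamma_hat, leaf_index):
    """pi_hat(x) = sum_l gamma_hat_l * 1(x in X_l): the sum has a single
    non-zero term because the leaves X_l are disjoint and exhaustive."""
    return gamma_hat[leaf_index(x)]
```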

  10. Building the tree: splitting criterion. → The criterion must be suited to our task. → To solve (1), ordinary least squares are used, since the solution is given by π_0(x) = argmin_{π(x)} E_0[ φ(T, π(x)) | X = x ] (3), where φ(T, π(x)) = (T − π(x))^2 (φ is the loss function). → Here, this results in minimizing the intra-node variance at each step. → If T is fully observed, building the regression tree with this criterion is consistent ([BFOS84]). 10 / 33
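  Under the squared loss, one splitting step amounts to minimizing the within-children sum of squares. A minimal sketch for a single numeric covariate (an illustrative assumption, not the authors' implementation):

```python
import numpy as np

def best_split(x, y):
    """Scan candidate thresholds on one covariate and return the cut point
    minimizing the intra-node (within-children) sum of squared errors."""
    best_t, best_crit = None, np.inf
    for t in np.unique(x)[:-1]:                       # candidate cut points
        left, right = y[x <= t], y[x > t]
        crit = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if crit < best_crit:
            best_t, best_crit = t, crit
    return best_t, best_crit
```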

  11. Pruning: penalize by tree complexity. CART principle: do not stop the splitting process, but build the "maximal" tree (of size K(n)), then prune it. → We get a sieve of estimators (π̂_K(x))_{K=1,...,K(n)}. To avoid overfitting ⇒ find the best subtree of the maximal tree, with a trade-off between goodness of fit and complexity: R_α(π̂_K(x)) = E_n[ Φ(Y, π̂_K(x)) ] + α (K / n). If α is fixed, the final estimator (pruned tree) is π̂_{K_α}(x) = argmin over (π̂_K)_{K=1,...,K(n)} of R_α(π̂_K(x)) (4). 11 / 33
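  A minimal sketch of the pruning rule (4), assuming the nested subtrees and their in-sample risks E_n[Φ(Y, π̂_K(X))] have already been computed (variable names below are illustrative):

```python
import numpy as np

def prune(empirical_risks, n, alpha):
    """empirical_risks[K - 1] holds E_n[Phi(Y, pi_hat_K(X))] for K = 1..K(n).
    Return the number of leaves K minimizing R_alpha = risk + alpha * K / n."""
    K_values = np.arange(1, len(empirical_risks) + 1)
    penalized = np.asarray(empirical_risks) + alpha * K_values / n
    return int(K_values[np.argmin(penalized)])
```

  In practice the penalty constant α is itself typically chosen by cross-validation.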

  12. Section 3: extend to (potentially) censored data. 12 / 33

  13. Back to our data. We observe a sample of i.i.d. random variables (Y_i, N_i, δ_i, X_i)_{1 ≤ i ≤ n} with the same distribution as (Y, N, δ, X), where Y = inf(T, C), N = inf(M, D), and δ = 1_{T ≤ C} = 1_{M ≤ D}. C and D are the censoring variables, for instance: C = time between the declaration date and the extraction date; D = current amount paid for this claim. 13 / 33

  14. Focus on lifetime T: what we would like to do. In practice, we only observe i.i.d. replications (Y_i, δ_i, X_i)_{1 ≤ i ≤ n}, where Y = inf(T, C) and δ = 1_{T ≤ C}. For a current lifetime Y (claim not closed), δ = 0. We seek T* = E[T | δ = 0, Y, X]. Goal: find an estimator of T* from the observations. Pitfall: we do not observe i.i.d. replications of T ⇒ standard methods (LLN) do not apply. 14 / 33

  15. Ingredients: Kaplan-Meier estimator and IPCW. Assume that T is independent from C. Define F̂(t) = 1 − ∏_{Y_i ≤ t} ( 1 − δ_i / Σ_{j=1}^{n} 1_{Y_j ≥ Y_i} ). This estimator tends to F(t) = P(T ≤ t). Additive version: F̂(t) = Σ_{i=1}^{n} W_{i,n} 1_{Y_i ≤ t}, where W_{i,n} = δ_i / ( n [1 − Ĝ(Y_i−)] ), with Ĝ(t) the Kaplan-Meier estimator of G(t) = P(C ≤ t). 15 / 33
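  A minimal sketch of the IPCW weights W_{i,n}, with Ĝ obtained by applying the Kaplan-Meier estimator to the censoring times (i.e. with the roles of the event and censoring indicators reversed). Ties among the Y_i are assumed away for simplicity; this is an illustrative implementation, not the authors' code.

```python
import numpy as np

def ipcw_weights(Y, delta):
    """W_{i,n} = delta_i / (n * [1 - G_hat(Y_i-)]), with G_hat the
    Kaplan-Meier estimator of G(t) = P(C <= t). Assumes no ties in Y."""
    Y, delta = np.asarray(Y, float), np.asarray(delta, int)
    n = len(Y)
    order = np.argsort(Y)
    delta_sorted = delta[order]
    at_risk = n - np.arange(n)                    # #{j : Y_j >= Y_(i)} in sorted order
    # Survival function of the censoring variable, S_C = 1 - G:
    # a Kaplan-Meier jump occurs at each censored observation (delta = 0).
    factors = 1.0 - (1.0 - delta_sorted) / at_risk
    S_C_left = np.concatenate(([1.0], np.cumprod(factors)[:-1]))   # S_C(Y_(i)-)
    W_sorted = delta_sorted / (n * S_C_left)
    W = np.empty(n)
    W[order] = W_sorted
    return W
```

  On a small example such as Y = [1, 2, 3], delta = [1, 0, 1], this returns the Kaplan-Meier jumps 1/3, 0 and 2/3, which sum to at most one as expected.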

  16. Why does it work? Recall that W_{i,n} = (1/n) δ_i / (1 − Ĝ(Y_i−)) is "close" to W*_{i,n} = (1/n) δ_i / (1 − G(Y_i−)). Moreover, by the LLN, Σ_{i=1}^{n} W*_{i,n} φ(Y_i) = (1/n) Σ_{i=1}^{n} δ_i φ(Y_i) / (1 − G(Y_i−)) → (a.s.) E[ δ φ(Y) / (1 − G(Y−)) ]. Proposition: for every function φ such that E[φ(T)] < ∞, E[ δ φ(Y) / (1 − G(Y−)) ] = E[ φ(T) ]. 16 / 33

  17. Application to our context. We would like to estimate quantities like E[φ(T, X)] (see eq. (3)). Proposition: assume that C is independent from (T, X). Then E[ δ φ(Y, X) / (1 − G(Y−)) ] = E[ φ(T, X) ], and E[ δ φ(Y, X) / (1 − G(Y−)) | X ] = E[ φ(T, X) | X ]. 17 / 33

  18. Thus, to estimate E[φ(T, X)], we use (1/n) Σ_{i=1}^{n} δ_i φ(Y_i, X_i) / (1 − Ĝ(Y_i−)) = Σ_{i=1}^{n} W_{i,n} φ(Y_i, X_i). Therefore, to estimate quantities like E[ (φ(T) − a)^2 1_{X ∈ X'} ], where X' is a subspace, we compute Σ_{i=1}^{n} W_{i,n} (φ(Y_i) − a)^2 1_{X_i ∈ X'}. 18 / 33
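  A minimal sketch of the weighted node criterion that replaces the usual within-node sum of squares when the tree is built on censored data. Here φ is taken as the identity, and in_node is an assumed boolean mask encoding 1_{X_i ∈ X'}; W are the IPCW weights computed above.

```python
import numpy as np

def weighted_node_criterion(Y, W, in_node):
    """sum_i W_{i,n} (Y_i - a)^2 1(X_i in the node), with a the weighted
    node mean (the IPCW analogue of the leaf mean gamma_hat)."""
    Yn, Wn = Y[in_node], W[in_node]
    if Wn.sum() == 0.0:                 # node containing only censored points
        return 0.0
    a = np.sum(Wn * Yn) / np.sum(Wn)
    return float(np.sum(Wn * (Yn - a) ** 2))
```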

  19. Quality of our CART estimator: simulation study. Consider the following simulation scheme (a minimal sketch follows this list):
  1. draw n + v i.i.d. replications (X_1, ..., X_{n+v}) of the covariate, with X_i ∼ U(0, 1);
  2. draw n + v i.i.d. lifetimes (T_1, ..., T_{n+v}) following an exponential distribution such that T_i ∼ E( β = α_1 1_{X_i ∈ [a,b)} + α_2 1_{X_i ∈ [b,c)} + α_3 1_{X_i ∈ [c,d)} + α_4 1_{X_i ∈ [d,e]} ) (notice that there thus exist four subgroups in the whole population);
  3. draw n + v i.i.d. censoring times, Pareto-distributed: C_i ∼ Pareto(λ, μ);
  4. from the simulated lifetimes and censoring times, get for each i the actual observed lifetime Y_i = inf(T_i, C_i) and the indicator δ_i = 1_{T_i ≤ C_i};
  5. compute the estimator Ĝ from the whole generated sample (Y_i, δ_i)_{1 ≤ i ≤ n+v}.
  19 / 33
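  A minimal sketch of this simulation scheme. The cut points a < b < c < d < e, the rates α_1, ..., α_4 and the Pareto parameters are illustrative placeholders (not the values used in the study), and the Pareto generator uses one common (Lomax-type) parametrization.

```python
import numpy as np

rng = np.random.default_rng(0)
n_plus_v = 1_000

# Illustrative placeholder parameters.
b, c, d = 0.25, 0.50, 0.75                      # cut points inside [a, e] = [0, 1]
alphas = np.array([0.5, 1.0, 2.0, 4.0])         # alpha_1, ..., alpha_4
lam, mu = 2.0, 1.0                              # Pareto parameters

# 1. covariates X_i ~ U(0, 1)
X = rng.uniform(0.0, 1.0, size=n_plus_v)
# 2. exponential lifetimes with a group-dependent rate beta (four subgroups)
beta = alphas[np.searchsorted([b, c, d], X, side="right")]
T = rng.exponential(scale=1.0 / beta)
# 3. Pareto-distributed censoring times C_i (Lomax parametrization, illustrative)
U = rng.uniform(size=n_plus_v)
C = mu * ((1.0 - U) ** (-1.0 / lam) - 1.0)
# 4. observed follow-up times and censoring indicators
Y = np.minimum(T, C)
delta = (T <= C).astype(int)
# 5. G_hat would now be computed from (Y_i, delta_i), e.g. with ipcw_weights above.
```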

  20. Group-specific and global MWSE by censoring rate and sample size n.

  % censored | n      | Group 1 MWSE | Group 2 MWSE | Group 3 MWSE | Group 4 MWSE | Global MWSE
  10%        | 100    | 0.19516      | 0.42008      | 0.17937      | 0.30992      | 1.10454
  10%        | 500    | 0.03058      | 0.07523      | 0.03183      | 0.06029      | 0.19796
  10%        | 1 000  | 0.01509      | 0.03650      | 0.01517      | 0.02619      | 0.09306
  10%        | 5 000  | 0.00295      | 0.00714      | 0.00289      | 0.00530      | 0.01804
  10%        | 10 000 | 0.00105      | 0.00378      | 0.00117      | 0.00292      | 0.00910
  30%        | 100    | 0.20060      | 0.43664      | 0.17448      | 0.29022      | 1.10765
  30%        | 500    | 0.03736      | 0.07604      | 0.04301      | 0.06584      | 0.22217
  30%        | 1 000  | 0.01748      | 0.04095      | 0.01535      | 0.02674      | 0.10043
  30%        | 5 000  | 0.00319      | 0.00758      | 0.00291      | 0.00547      | 0.01904
  30%        | 10 000 | 0.00117      | 0.00372      | 0.00125      | 0.00292      | 0.00930
  50%        | 100    | 0.19784      | 0.45945      | 0.17387      | 0.28363      | 1.11476
  50%        | 500    | 0.04906      | 0.08993      | 0.05301      | 0.06466      | 0.25668
  50%        | 1 000  | 0.02481      | 0.05115      | 0.01788      | 0.03004      | 0.12387
  50%        | 5 000  | 0.00520      | 0.00867      | 0.00389      | 0.00516      | 0.02299
  50%        | 10 000 | 0.00153      | 0.00407      | 0.00162      | 0.00308      | 0.01057

  20 / 33

  21. Figure: global MSE versus n (logarithmic scale), one panel per censorship rate (10%, 30%, 50%). 21 / 33

  22. Section 4: applications. 22 / 33

  23. Application 1: income protection. We consider short-term disability contracts observed over 6 years, with the following information: 83 547 claims; PH ID, cause (sickness or accident), gender, SPC, age, duration in the disability state (censored or not), distribution channel; a censoring rate of 7.2%; a mean lifetime in the disability state of 100 days. Goal: find a segmentation to predict how long the disability state lasts. 23 / 33

  24. Tree estimator: the age at claim seems to be key. Figure: disability duration explained by sex, SPC, network, age and cause. 24 / 33

  25. Usually, the recovery rates used to compute the technical provisions for this guarantee depend on the age at the claim date, due to local prudential regulation ⇒ we fit a Cox PH model with this covariate:
  - this leads us to consider the high predictive power of this variable;
  - the PH assumption is rejected by all tests (LR, Wald and log-rank);
  - the obtained results are considered as benchmarks to enable a comparison with those resulting from the tree approach.

  Class | Mean age | Tree   | Cox
  a     | 26.83    | 64.44  | 80.01
  b     | 34.19    | 85.48  | 96.35
  c     | 39.57    | 100.04 | 110.19
  d     | 45.05    | 111.38 | 126.03
  e     | 51.29    | 126.40 | 146.28

  Table: expected disability time (days) depending on age at disability time. 25 / 33
