Predicting Return to Work with Data Mining



slide-1
SLIDE 1

Predicting Return to Work with Data Mining

Claim Analytics

IAAHS Colloquium
Dresden, Germany
April 27-29, 2004

Jonathan Polon
Barry Senensky
www.claimanalytics.com

slide-2
SLIDE 2

Predicting Return to Work with Data Mining

  • About Us
  • Why Score Claims?
  • Data Mining
  • Model Benefits
  • About the Model
  • Building the Model
  • Results
  • Conclusion

slide-3
SLIDE 3

About Us

  • Founded by Barry Senensky and Jonathan Polon in 2001
  • To use data mining tools to bring new insights and solutions to the insurance industry.

Recent Projects

1. Create a predictive scoring model for group LTD claimants.
2. Produce a case study of creating a predictive model for the SOA Health Section.

slide-4
SLIDE 4

Why Score Claims?

Every day, claims managers make many decisions and choices. Should they:

  • Order an independent medical examination for Derek T.?
  • Provide extensive rehab to Pat B.?
  • Call Jacob Z. again, to monitor his progress?
  • Have an investigator check out that suspicious-sounding bad back of Brenda B.?

slide-5
SLIDE 5

Imagine a system that:

  • Scores each claim with a number from 1 to 10, predicting likelihood of recovery within a given time frame.
  • Is a fast, objective, consistent method of ranking claims.
  • Helps claims managers spend time where it is most productive, and optimize resource allocation.

What benefits would accrue from such a system?

slide-6
SLIDE 6

Benefits

slide-7
SLIDE 7

Benefits

Helps claims managers to quickly establish a starting point for a new claim.

slide-8
SLIDE 8

Benefits

Helps claims managers to optimize allocation of time and resources.

slide-9
SLIDE 9

Benefits

Can be used to balance the workload among claims personnel.

slide-10
SLIDE 10

Benefits

Provides an early indication of changes in aggregate claim quality, allowing claims management to take appropriate financial measures.

slide-11
SLIDE 11

Model Benefits

Does not tie up claims staff in new operational activities... Entails no costly interference in established working procedures.

slide-12
SLIDE 12

Benefits

Facilitates early intervention in claims management.

slide-13
SLIDE 13

Benefits

Distinguishes between ‘gray area’ claims (those claims which are neither particularly promising nor particularly unpromising) by scoring them as low as ‘4’ or as high as ‘7.’

slide-14
SLIDE 14

Data Mining

  • How it works
  • What it can do
  • Data mining tools

slide-15
SLIDE 15

Data Mining

  • Uses sophisticated statistical tools to “mine” through databases to find hidden patterns and trends
  • Harnesses speed, capability, and capacity of modern computers.

Most statistical methods used in business:

  • Predate the invention of electric light, cars and the telephone.
  • Were developed under the constraints of what humans were able to calculate in a reasonable timespan.

slide-16
SLIDE 16

Our Data Mining Tools

We use three:

  • 1. CART: Powerful filter. Identifies factors with greatest impact; reduces amount of ‘noise’ being introduced to the model from non-impacting factors
  • 2. Neural Networks
  • 3. Genetic Algorithms

Neural Networks and Genetic Algorithms: Optimization tools

slide-17
SLIDE 17

Neural Networks: How they learn

  • Network is presented with data sample with known outcomes
  • Network predicts result, and compares it to actual outcome
  • Network parameters are changed to better approximate the sample…
  • …Over and over again.
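The learning cycle on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single logistic neuron stands in for the network, and the record, the starting weights, the learning rate and the helper names (`predict`, `learn_from_record`) are hypothetical, not taken from the presentation.

```python
import math

def predict(weights, bias, inputs):
    """Logistic output of a single artificial neuron."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))

def learn_from_record(weights, bias, inputs, actual, rate=0.5):
    """One learning step: predict, compare with the known outcome, adjust."""
    error = actual - predict(weights, bias, inputs)
    new_weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return new_weights, bias + rate * error

# Hypothetical record: two input factors and a known outcome (1 = recovered).
inputs, actual = [1.0, 0.5], 1.0
weights, bias = [0.0, 0.0], 0.0
errors = []
for _ in range(20):  # "...over and over again"
    errors.append(abs(actual - predict(weights, bias, inputs)))
    weights, bias = learn_from_record(weights, bias, inputs, actual)
```

Each pass through the loop leaves the prediction a little closer to the known outcome, so the values in `errors` shrink from one step to the next.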

slide-18
SLIDE 18

Uses of Neural Networks

Neural nets are a statistical tool for making predictions

Used in:
  – Detection of credit card, tax, and securities fraud
  – Bioinformatics
  – Customer behaviour prediction
  – Text analysis

But, as yet, rarely in the insurance industry.

Example: Design a neural network model to predict the results of professional rugby matches.

slide-19
SLIDE 19

Neural Networks: Example

Who’s going to win the footy game?

[Diagram: Input → Weights → Hidden Layer → Weights → Output]

The neural network weights each variable as it sees fit.
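As a rough illustration of the diagram's path from inputs through a hidden layer to an output, here is a minimal forward pass in Python. The input variables, the weight values and the function names are all hypothetical, and bias terms are left out for brevity; the slide does not say which inputs or weights the rugby model used.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, output_weights):
    """Input -> weights -> hidden layer -> weights -> single output."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in hidden_weights]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# Hypothetical match inputs: home advantage, recent win rate, injury level.
match = [1.0, 0.6, 0.2]
hidden_weights = [[0.8, 1.5, -1.0],   # weights into hidden unit 1
                  [-0.3, 0.9, -2.0]]  # weights into hidden unit 2
output_weights = [1.2, 0.7]           # weights from hidden units to output
home_win_probability = forward(match, hidden_weights, output_weights)
```

Training consists of finding weight values that make outputs like `home_win_probability` agree with known results.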

slide-20
SLIDE 20

Use of This Neural Network Analysis

A tipping model that outperforms all but one professional in its first year of use.

  • Alan McCabe, a computer scientist from James Cook University, developed software to predict the results of Australian Rugby League matches.
  • He used data from a number of different seasons of the Australian National Rugby League to develop his model.
  • In its first year of use, the model achieved 67% accuracy, tying the top newspaper tipper and beating every one of the rest. In the Final Series matches, having ‘learned’ from the season, the model achieved a 78% success rate.

slide-21
SLIDE 21

Genetic Algorithms

  • Inspired by Darwinian concept of survival of the fittest
  • Multiple solutions considered simultaneously
  • Best of these solutions are most likely to “survive”

slide-22
SLIDE 22

Genetic Algorithms: Process

Solutions evolve in two manners:
  – Reproduction
  – Mutation

Solution A + Solution B = New Solution

slide-23
SLIDE 23

Genetic Algorithms: Summary

  • Solutions evolve over several generations
  • When process stops, best surviving solution is chosen

[Image: Duck-billed platypus]
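The reproduction, mutation and survival steps can be sketched as a small genetic algorithm in Python. Everything specific here is an assumption made for illustration: the objective in `fitness`, the population size, the uniform crossover and the Gaussian mutation are not described in the presentation.

```python
import random

def fitness(solution):
    """Hypothetical objective: higher is better, best possible value is 0."""
    return -sum((x - 0.5) ** 2 for x in solution)

def reproduce(a, b, rng):
    """Solution A + Solution B = new solution (uniform crossover)."""
    return [rng.choice(pair) for pair in zip(a, b)]

def mutate(solution, rng, scale=0.1):
    """Randomly nudge one position of the solution."""
    child = list(solution)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, scale)
    return child

def evolve(generations=50, size=20, length=4, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(length)] for _ in range(size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        survivors = population[: size // 2]  # the fittest half survives
        children = [mutate(reproduce(*rng.sample(survivors, 2), rng), rng)
                    for _ in range(size - len(survivors))]
        population = survivors + children
    # When the process stops, the best surviving solution is chosen.
    return max(population, key=fitness), best_per_generation

best, history = evolve()
```

Because the fittest half is carried over unchanged, the best fitness recorded in `history` never gets worse from one generation to the next.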

slide-24
SLIDE 24

About the Model

  • Assigns each claim a score from one to ten, predicting recovery within a given time frame
  • Incorporates predictive strengths of both neural networks and genetic algorithms
  • Incorporates industry and other external data to enhance robustness and predictivity
  • Uses a committee of experts approach: final score averages output from several hundred models.
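A minimal sketch of the committee idea, in Python, averages the outputs of several models and maps the result to a 1 to 10 score. The presentation does not say how the average is turned into a score, so the rounding rule below, the function name and the sample outputs are assumptions.

```python
import math

def committee_score(probabilities):
    """Average the committee's recovery probabilities and map to 1..10."""
    mean = sum(probabilities) / len(probabilities)
    # Assumed mapping: each tenth of probability corresponds to one score band.
    return min(10, max(1, math.ceil(mean * 10)))

# Hypothetical outputs from five committee members for one claim.
score = committee_score([0.62, 0.70, 0.66, 0.71, 0.64])
```

Averaging many models tends to smooth out the quirks of any single one, which is the usual motivation for a committee approach.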

slide-25
SLIDE 25

How We Built the Model

  • State the Goal
  • Data Requirements
  • Split the Data
  • Filter the Factors
  • Prepare the Data
  • Train the Model
  • Neural Networks and Genetic Algorithms
  • Validate the Completed Model

slide-26
SLIDE 26

How We Built the Model

The Goal

Build a model to predict likelihood of recovery for LTD claims, producing a single comprehensible output, the score. Define a benchmark for success:

  • 75% or more of claims scored 8-10 return to work within 2 years.
  • 5% or less of claims scored 1-3 return to work within 2 years.
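The benchmark can be expressed as a simple check in Python. The claim records and the function names below are hypothetical and exist only to show the arithmetic; the sketch assumes each score band contains at least one claim.

```python
def recovery_rate(claims, low, high):
    """Share of claims scored low..high that returned to work within 2 years."""
    outcomes = [recovered for score, recovered in claims if low <= score <= high]
    return sum(outcomes) / len(outcomes)

def meets_benchmark(claims):
    """True if high scores recover at 75%+ and low scores at 5% or less."""
    return (recovery_rate(claims, 8, 10) >= 0.75
            and recovery_rate(claims, 1, 3) <= 0.05)

# Hypothetical scored claims: (score, returned to work within 2 years).
claims = [(9, True), (8, True), (10, True), (8, False),
          (2, False), (1, False), (3, False), (5, True)]
passed = meets_benchmark(claims)
```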

slide-27
SLIDE 27

How We Built the Model

Data Requirements

  • Determine the factors that influence recovery
  • Determine what data is needed to decide if a claim has recovered
  • Determine how many records are required.

slide-28
SLIDE 28

How We Built the Model

Split the Data

Validate the data, using a series of manual and automatic checks, and then split it into three parts:

(i) 80% for training the model
(ii) 10% for testing
(iii) 10% for final validation.
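One way to produce the 80/10/10 split is sketched below in Python. The random shuffle and the fixed seed are assumptions; the presentation does not say how records were assigned to each part.

```python
import random

def split_records(records, seed=0):
    """Shuffle, then split 80% / 10% / 10% into train / test / validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # assumed: random assignment
    n_train = int(len(shuffled) * 0.8)
    n_test = int(len(shuffled) * 0.1)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains
    return train, test, validation

# Hypothetical data set of 1,000 record identifiers.
train, test, validation = split_records(range(1000))
```

Keeping the validation part untouched until the very end is what makes the final check a fair test of the model.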

slide-29
SLIDE 29

How We Built the Model

Filter the Factors

Use an initial filtering tool (we used Salford Systems CART) to key in on which data factors impact recovery most.

slide-30
SLIDE 30

How We Built the Model

Prepare the Data

Considerable data manipulation goes into readying the data for modeling.

Example I: Diagnostic Category

Diagnosis: Muscular Dystrophy
Diagnostic category: Brain & Nervous System
Impact on vision:
Impact on fine motor skills: 9
Impact on gross motor skills: 9
Likelihood of drug treatment: 2
Likelihood of being fatal: 3

slide-31
SLIDE 31

How We Built the Model

Prepare the Data - II

Example II – Age

Age treated as 3 categorical (unordered) variables: 18-34, 35-49 and 50-65.

Ashley, 21: 18-34 = 1
Bruce, 37: 35-49 = 1
Claire, 53: 50-65 = 1

The absolute differences found in the three age categories are much more meaningful to the neural network than what it would have found in just one category, comparing the relative values of the ages.
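The age example amounts to one indicator variable per age band. A minimal Python sketch follows; the function and variable names are hypothetical, and the zeros for the non-matching bands are implied by the encoding rather than shown on the slide.

```python
AGE_BANDS = [(18, 34), (35, 49), (50, 65)]

def encode_age(age):
    """Return one 0/1 indicator per age band."""
    return [1 if low <= age <= high else 0 for low, high in AGE_BANDS]

claimants = {"Ashley": 21, "Bruce": 37, "Claire": 53}
encoded = {name: encode_age(age) for name, age in claimants.items()}
```

Here `encoded["Bruce"]` is `[0, 1, 0]`: the network sees membership in the 35-49 band rather than the raw number 37.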

slide-32
SLIDE 32

How We Built the Model

Train the Model

Make design decisions to maximize the ability of the model to learn.

Examples:

  • Set network size
  • Set training tolerance

slide-33
SLIDE 33

How We Built the Model

Train the Model: Example I

Set network size

  • Determines # of weights (parameters) in the model
  • Probably the most critical setting.

There is a trade-off between accuracy (more weights) and ability to generalize (fewer weights).
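To see how network size drives the number of weights, the count for a fully connected network can be computed directly. The layer sizes below are hypothetical, and the sketch assumes one bias weight per non-input unit.

```python
def count_weights(layer_sizes):
    """Weights plus one bias per non-input unit in a fully connected network."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical designs: 20 inputs, one hidden layer, one output.
small = count_weights([20, 5, 1])    # 5 hidden units
large = count_weights([20, 50, 1])   # 50 hidden units
```

With these assumed sizes, growing the hidden layer from 5 to 50 units raises the parameter count from 111 to 1101, which illustrates why a larger network fits the training data more closely but needs more records to generalize.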

slide-34
SLIDE 34

How We Built the Model

Train the Model: Example II

Set Training Tolerance

How accurate must the output be to be considered correct?

During training:

  • Neural network cycles through data one record at a time.
  • At each record the network compares predicted output to actual output, and adjusts its weights, if necessary, to better approximate the actual output.
  • The network continues cycling through the data until there is a set of weights for which every record is within the training tolerance.
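The role of the training tolerance as a stopping rule can be sketched as follows. A single logistic neuron stands in for the network, and the records, the tolerance of 0.2, the learning rate and the epoch cap are all illustrative assumptions.

```python
import math

def predict(weights, inputs):
    """Logistic output of a single neuron (the bias is the first weight)."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))

def train(records, tolerance=0.2, rate=0.5, max_epochs=10_000):
    """Cycle through records until every one is within the tolerance."""
    weights = [0.0] * len(records[0][0])
    for epoch in range(1, max_epochs + 1):
        all_within = True
        for inputs, actual in records:
            error = actual - predict(weights, inputs)
            if abs(error) > tolerance:  # adjust only if necessary
                all_within = False
                weights = [w + rate * error * x
                           for w, x in zip(weights, inputs)]
        if all_within:
            return weights, epoch
    raise RuntimeError("tolerance not reached; try a looser tolerance")

# Hypothetical records: (constant 1 for bias, factor A, factor B) -> outcome.
records = [([1, 0, 0], 0), ([1, 1, 0], 1), ([1, 0, 1], 1), ([1, 1, 1], 1)]
weights, epochs = train(records)
```

A tighter tolerance demands more passes through the data, while a looser one stops earlier with a rougher fit.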

slide-35
SLIDE 35

How We Built the Model

Neural Networks and Genetic Algorithms…

Neural Nets are: Fast and efficient but carry risk of getting caught in local minima

Genetic Algorithms are: Not fooled by local minima but may be slow and inefficient

Characteristics of a Hybrid Approach

Genetic Algorithm finds a good solution ... Neural Network optimizes it
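The division of labour in the hybrid approach can be illustrated on a toy problem. In this Python sketch a simplified genetic search (selection and mutation only, no crossover) looks for a good region of a bumpy one-dimensional error surface, and a gradient-following step then polishes the result, standing in for the neural network's own optimization. The `loss` function and every parameter are hypothetical.

```python
import math
import random

def loss(x):
    """Hypothetical bumpy error surface with many local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)

def genetic_search(rng, size=30, generations=20):
    """Coarse global search: keep the fittest half, refill with mutants."""
    population = [rng.uniform(-10.0, 10.0) for _ in range(size)]
    for _ in range(generations):
        population.sort(key=loss)
        survivors = population[: size // 2]
        population = survivors + [x + rng.gauss(0.0, 1.0) for x in survivors]
    return min(population, key=loss)

def gradient_refine(x, rate=0.05, steps=200, h=1e-6):
    """Local polish: follow the slope downhill, never accepting a worse point."""
    for _ in range(steps):
        slope = (loss(x + h) - loss(x - h)) / (2.0 * h)
        candidate = x - rate * slope
        if loss(candidate) >= loss(x):
            rate /= 2.0  # step was too large; shorten it
        else:
            x = candidate
    return x

coarse = genetic_search(random.Random(0))
refined = gradient_refine(coarse)
```

The refinement step can only settle into the valley it starts in, which is why a good starting point from the global search matters.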

slide-36
SLIDE 36

How We Built the Model

Validate

The completed model was validated by comparing its recovery predictions for the validation data (the 10% of set-aside historic data not seen by the model until this point) to real-world outcomes.

slide-37
SLIDE 37

Model Results

Recovery Rate by Score (real-life example)

Model’s Predictive Score    Actual Recovery (% Recover)
1                           0%
2                           0%
3                           11%
4                           29%
5                           44%
6                           56%
7                           70%
8                           80%
9                           75%
10                          91%
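A recovery-rate-by-score summary of this kind can be computed from validation records with a short Python function. The records below are hypothetical and are not the presentation's data.

```python
from collections import defaultdict

def recovery_rate_by_score(claims):
    """Map each score to the fraction of its claims that actually recovered."""
    totals, recovered = defaultdict(int), defaultdict(int)
    for score, did_recover in claims:
        totals[score] += 1
        recovered[score] += int(did_recover)
    return {score: recovered[score] / totals[score] for score in sorted(totals)}

# Hypothetical validation records: (model score, recovered?).
validation = [(1, False), (1, False), (5, True), (5, False),
              (10, True), (10, True), (10, True), (10, False)]
rates = recovery_rate_by_score(validation)
```

A well-behaved model shows rates that rise with the score, as the real-life figures on this slide largely do.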

slide-38
SLIDE 38

Summary

We were delighted with the results of our project. Our model proved itself able to accurately score claimants and predict recovery, providing a valuable tool to help with claims management.

The Claim Analytics scoring model is a fast, objective, consistent method of ranking claims that integrates easily into the workplace and helps claims managers optimize resource allocation.

slide-39
SLIDE 39

Questions?