Data Cleansing for Predictive Models: The Next Level

Roosevelt C. Mosley, Jr., FCAS, MAAA
CAS Ratemaking & Product Management Seminar
Philadelphia, PA
March 19–21, 2012
Experience the Pinnacle Difference!

Data Cleaning
Data cleansing – the next level
• Why simple visualization may not tell the whole story
Data homogeneity
• There are distinct groups in your underlying data
Multivariate data anomalies
• Certain combinations of variables may point to data issues

Data Cleansing – The Next Level

Data Validation – One and Two Way Summaries

Data Cleansing – the Next Level
• One-way and two-way data summarization and visualization are key to determining that individual factors are valid
• In building predictive models, multivariate techniques consider independent variables simultaneously to account for dependencies
• Data issues don’t exist only in one and two dimensions; they can exist in n dimensions (where n is the number of individual elements)
• Underlying causes: heterogeneity, data anomalies
• Multivariate data exploration techniques can be used to address these issues
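A minimal sketch of why one-way summaries can miss a problem: it flags pairs of values that appear far less often than the one-way counts would predict if the two fields were independent. The function name, thresholds, field values, and toy records are all hypothetical; the deck does not prescribe this particular check.

```python
from collections import Counter

def rare_combinations(records, field_a, field_b, min_expected=5.0, max_ratio=0.2):
    """Flag (a, b) pairs observed far less often than independence would predict."""
    n = len(records)
    count_a = Counter(r[field_a] for r in records)
    count_b = Counter(r[field_b] for r in records)
    count_ab = Counter((r[field_a], r[field_b]) for r in records)
    flagged = []
    for (a, b), observed in count_ab.items():
        # Expected count if the two fields were independent of each other.
        expected = count_a[a] * count_b[b] / n
        if expected >= min_expected and observed / expected <= max_ratio:
            flagged.append((a, b, observed, round(expected, 1)))
    return flagged

# Hypothetical toy data: both "masonry" and protection class "10" are common
# on their own, so a one-way summary of either field looks unremarkable.
records = (
    [{"construction": "frame", "protection_class": "3"}] * 40
    + [{"construction": "frame", "protection_class": "10"}] * 40
    + [{"construction": "masonry", "protection_class": "3"}] * 19
    + [{"construction": "masonry", "protection_class": "10"}] * 1
)

flagged = rare_combinations(records, "construction", "protection_class")
print(flagged)  # [('masonry', '10', 1, 8.2)]
```

A flagged pair is only a prompt for review: the combination may be a coding error or a genuinely rare but valid risk.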

Data Homogeneity

Clustering/Segmentation
• Unsupervised classification technique
• Groups data into a set of discrete clusters or contiguous groups of cases
• Performs disjoint cluster analysis on the basis of Euclidean distances computed from one or more quantitative input variables and cluster seeds
• Objects in each cluster tend to be similar; objects in different clusters tend to be dissimilar
• Can be used as a dimension reduction technique
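The seed-based, Euclidean-distance clustering described above can be sketched with a plain k-means loop. This is an illustrative stand-in, not the tool used for the presentation; the function name, the choice of seeds, and the toy coverage and age values are assumptions. Inputs are standardized first because Euclidean distance is sensitive to scale (dollars would otherwise swamp years).

```python
import numpy as np

def kmeans(x, seeds, n_iter=20):
    """Disjoint clustering: each row belongs to the nearest center (Euclidean)."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(seeds, dtype=float).copy()

    def assign(c):
        # Distance from every row to every center, shape (n_rows, n_clusters).
        dist = np.linalg.norm(x[:, None, :] - c[None, :, :], axis=2)
        return dist.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = x[labels == k]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(centers), centers

# Hypothetical risks: [Coverage A, age of home].
raw = np.array([
    [150000, 55], [160000, 58], [155000, 60],
    [270000, 24], [265000, 26], [260000, 22],
], dtype=float)
z = (raw - raw.mean(axis=0)) / raw.std(axis=0)  # standardize each column

labels, centers = kmeans(z, seeds=z[[0, 3]])
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
```

Categorical inputs such as billing option or construction would need a numeric encoding before a distance can be computed; how to encode them is a modeling choice this sketch leaves open.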

Example
• Homeowners dataset
• Ran clustering analysis using key risk characteristics
  • Amount of insurance
  • Age of home
  • Billing option
  • Construction
  • Protection class
  • Deductible
  • Multiline
  • State/territory
• Developed predictive model on clusters independently
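The deck does not say what model form was fitted on each cluster. As a simplified stand-in, the sketch below computes one-way pure-premium relativities for a rating factor, once on all data and once per cluster, so the results can be compared side by side. The field names, the base level, and every number in the toy rows are hypothetical.

```python
from collections import defaultdict

def relativities(rows, factor, base_level):
    """Pure premium of each level of `factor`, relative to `base_level`."""
    loss = defaultdict(float)
    expo = defaultdict(float)
    for r in rows:
        loss[r[factor]] += r["loss"]
        expo[r[factor]] += r["exposure"]
    base = loss[base_level] / expo[base_level]
    return {level: (loss[level] / expo[level]) / base for level in expo}

def relativities_by_cluster(rows, factor, base_level):
    """Relativities on all rows ('Total') and on each cluster separately."""
    by_cluster = defaultdict(list)
    for r in rows:
        by_cluster[r["cluster"]].append(r)
    out = {"Total": relativities(rows, factor, base_level)}
    for cluster, members in by_cluster.items():
        out[f"Cluster {cluster}"] = relativities(members, factor, base_level)
    return out

# Hypothetical aggregated cells (one row per cluster and bill plan).
rows = [
    {"cluster": 9, "bill_plan": "Pay in Full", "exposure": 100, "loss": 50000},
    {"cluster": 9, "bill_plan": "Monthly", "exposure": 100, "loss": 70000},
    {"cluster": 20, "bill_plan": "Pay in Full", "exposure": 100, "loss": 60000},
    {"cluster": 20, "bill_plan": "Monthly", "exposure": 100, "loss": 66000},
]

rel = relativities_by_cluster(rows, "bill_plan", "Pay in Full")
for segment, values in rel.items():
    print(segment, {k: round(v, 3) for k, v in values.items()})
```

In this toy data the Monthly relativity differs by cluster, which is the kind of heterogeneity a single all-data model would average away.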

Cluster Distance Map

Cluster Characteristics

Segment      Coverage A   Age of Home   Percent without Multiline Discount
Total        219,585      43            25%
Cluster 20   267,415      25            15%
Cluster 9    155,509      56            35%

Billing Plan Indications
[Chart: indicated relativity by bill plan (Monthly, Semi-Annual, Pay in Full, Mortgagee) for Total, Cluster 9, and Cluster 20; data labels range from 0.992 to 1.407]

Deductible Indications
[Chart: indicated relativity by deductible (50, 100, 250, 500, 1000, 2500, 5000, 10000) for Total, Cluster 9, and Cluster 20]

Multi Line Indications
[Chart: indicated relativity for the Auto & Home multi-line discount for Total, Cluster 9, and Cluster 20; data labels 0.942, 0.907, and 0.892]

Multivariate Data Anomalies – Back to Cluster 1
• Higher value homes
• Segment of the business that is certainly heterogeneous – will behave differently than overall population
• Represents 0.2% of the overall exposures
• Should we exclude data points such as these?

                                     Cluster 1     Total
Average Amount of Insurance          $1,109,048    $219,585
Average Age of Home                  19.6 years    42.7 years
Percentage of Deductibles > $2500    19.9%         1.9%

Outlier Data Points
• Midpoint of the cluster: represents an average risk for that cluster
• Risk that is slightly different than average, but still fits well with that cluster
• Potential anomaly: data point fits best within this cluster but is actually an outlier for the cluster. This generally means it doesn’t fit well anywhere.
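One way to operationalize the three cases above is to measure each point's distance to the midpoint of its own cluster and flag points that sit unusually far out. The sketch assumes cluster labels and centers from an earlier clustering run; the function name, the median-based rule, and the threshold of 3 are illustrative assumptions, not the presenter's criterion.

```python
import numpy as np

def flag_outliers(x, labels, centers, threshold=3.0):
    """Flag rows whose distance to their own cluster center is unusually large."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float)
    # Euclidean distance from each row to the center of its assigned cluster.
    dist = np.linalg.norm(x - centers[labels], axis=1)
    flags = np.zeros(len(x), dtype=bool)
    for k in np.unique(labels):
        in_k = labels == k
        typical = np.median(dist[in_k])  # typical distance within cluster k
        if typical > 0:
            flags[in_k] = dist[in_k] > threshold * typical
    return dist, flags

# Hypothetical standardized points, all assigned to one cluster centered at (0, 0).
points = [[1, 0], [0, 1], [-1, 0], [0, -1], [6, 0]]
dist, flags = flag_outliers(points, labels=[0, 0, 0, 0, 0], centers=[[0, 0]])
print(flags.tolist())  # [False, False, False, False, True]
```

The last point is still closest to this cluster's midpoint, yet it lies six times farther away than a typical member, which matches the "fits best here but doesn't fit well anywhere" case.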

Data “Cleanup”
• Reflect heterogeneity in final product (rating plan adjustments, underwriting, tiering)
• Data verification
• Modify data
• Exclude data
