Confidential Missing Marital Status Prediction for Hypermarkets - PowerPoint PPT Presentation


  1. Confidential Missing Marital Status Prediction for Hypermarkets Project Presentation BADM Team B-5: Sankalp Gaur, Sonali Gadekar, Harshita Jujjuru, Tushna Mistry, Vineet Jain

  2. Business Problem
     Missing values for 'Marital Status': 13%
     • Stakeholder: Marketing team of the supermarket could be the client
     • Use Case: Targeting family bulk shopping offers to family customers
     • Objective: To identify married customers in the customer data set.
     • Benefit: Correct grasp of the marital status for customer segmentation.

  3. Data Mining Problem
     • Analytics Objective: To successfully predict (classify) marital status where it is missing.
     • Methodology: A supervised predictive (classification) task. It is both forward-looking and retrospective, since new and old records fall under its purview.
     • Outcome Variable: Marital status, for rows where marital status is currently missing. In fact, even those who are unmarried seem to exhibit married behavior.
     • Objective: To identify FAMILY customers in the customer data set.
     • Objective: To identify MARRIED customers in the customer data set.

  4. Data Description
     Data sources: Customer Data, Transaction Data
     Transaction Level
     • KNN (SKU#, Age (derived field), Qty Sold, Extended Price)
     Basket Level
     • Classification Trees (Age, Qty Sold, Sex, Extended Price)
     • Association Rules (Classes within a basket)
     Customer Level
     • KNN (Frequency of Class/Subclass, Age, Dummy Sex)
     • Logistic Regression

  5. KNN (Transaction Level Data)

     Validation error log for different k

     Value of k   % Error Training   % Error Validation
          1              2.14               38.53
          2             19.17               37.98
          3             19.82               37.06
          4             23.96               36.07
          5             24.52               36.22
          6             26.21               35.07
          7             26.61               35.59
          8             27.59               34.99
          9             27.98               35.29
         10             28.66               34.81
         11             28.93               35.04
         12             29.42               34.71   <--- Best k
         13             29.63               35.18
         14             29.92               34.77
         15             29.94               35.19

     Training Data scoring - Summary Report (for k=12)
     Cut off Prob.Val. for Success (Updatable): 0.5

     Classification Confusion Matrix
                   Predicted Y   Predicted N
     Actual Y          3951          1008
     Actual N          1933          3105

     Error Report
     Class     # Cases   # Errors   % Error
     Y            4959       1008     20.33
     N            5038       1933     38.37
     Overall      9997       2941     29.42

     Validation Data scoring - Summary Report (for k=12)
     Cut off Prob.Val. for Success (Updatable): 0.5

     Classification Confusion Matrix
                   Predicted Y   Predicted N
     Actual Y         12244          4199
     Actual N          7257          9301

     Error Report
     Class     # Cases   # Errors   % Error
     Y           16443       4199     25.54
     N           16558       7257     43.83
     Overall     33001      11456     34.71
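
The "Best k" marker in the validation error log can be reproduced with a few lines of Python. This is only a sketch of one plausible selection rule (lowest validation error, ties going to the smaller k); the slide does not state which rule the tool applied, and the names `ERROR_LOG` and `best_k` are illustrative, not from the deck. The figures are copied from the slide.

```python
# Error log from slide 5: k -> (% error on training, % error on validation).
ERROR_LOG = {
    1: (2.14, 38.53),
    2: (19.17, 37.98),
    3: (19.82, 37.06),
    4: (23.96, 36.07),
    5: (24.52, 36.22),
    6: (26.21, 35.07),
    7: (26.61, 35.59),
    8: (27.59, 34.99),
    9: (27.98, 35.29),
    10: (28.66, 34.81),
    11: (28.93, 35.04),
    12: (29.42, 34.71),
    13: (29.63, 35.18),
    14: (29.92, 34.77),
    15: (29.94, 35.19),
}


def best_k(error_log):
    """Return the k with the lowest validation error.

    Assumption: ties are broken in favour of the smaller k, because
    min() keeps the first minimum it sees and the keys are sorted.
    """
    return min(sorted(error_log), key=lambda k: error_log[k][1])


print(best_k(ERROR_LOG))  # 12
```

Selecting on validation error rather than training error matters here: training error is lowest at k=1 (2.14%), where each record is largely matched to itself, while validation error at k=1 is the highest in the log.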

  6. KNN – Customer Level Aggregation

  7. Classification Tree (Basket Level Data)

     Cut off Prob.Val. for Success (Updatable): 0.5

     Classification Confusion Matrix
                   Predicted Y   Predicted N
     Actual Y          6309          1408
     Actual N          4172          4887

     Error Report
     Class     # Cases   # Errors   % Error
     Y            7717       1408   18.24543216
     N            9059       4172   46.05364831
     Overall     16776       5580   33.26180258
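
The error report on this slide follows directly from the confusion matrix: each class's errors are its off-diagonal cell, and the overall error is the sum of both off-diagonal cells over all cases. A minimal sketch, using the classification tree counts from the slide; the function name `error_report` and its argument names are illustrative choices, not part of the original tool.

```python
def error_report(yy, yn, ny, nn):
    """Build a per-class error report from a 2x2 confusion matrix.

    yy: actual Y predicted Y    yn: actual Y predicted N
    ny: actual N predicted Y    nn: actual N predicted N
    Returns {label: (cases, errors, percent_error)}.
    """
    counts = {
        "Y": (yy + yn, yn),                      # Y cases misread as N
        "N": (ny + nn, ny),                      # N cases misread as Y
        "Overall": (yy + yn + ny + nn, yn + ny),
    }
    return {
        label: (cases, errors, 100.0 * errors / cases)
        for label, (cases, errors) in counts.items()
    }


# Confusion matrix of the classification tree (basket level data).
report = error_report(yy=6309, yn=1408, ny=4172, nn=4887)
for label, (cases, errors, pct) in report.items():
    print(f"{label:8s} {cases:6d} {errors:6d} {pct:6.2f}")
```

With these counts the tree misclassifies about 18% of actual Y cases but about 46% of actual N cases, so at a 0.5 cutoff it leans toward predicting Y.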

  8. Association Rules (Basket Level Data)

  9. KNN (Customer Level Data)
     Predictors
     • Frequency of Class (in transaction level data)
     • Age, Dummy variable for Sex
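
The predictors listed above imply rolling transaction rows up to one row per customer. The sketch below shows one way that aggregation might look. It assumes "Frequency of Class" means the number of transaction rows per product class for each customer, and every field name (`customer_id`, `item_class`, `age`, `sex`), the sample records, and the coding of the sex dummy (1 for "F") are hypothetical, since the deck does not show the schema.

```python
from collections import Counter, defaultdict


def customer_features(transactions, customers, classes):
    """Aggregate transaction rows into one feature vector per customer.

    Each vector holds the purchase count for every class in `classes`,
    followed by age and a 0/1 dummy for sex (assumed: 1 means "F").
    """
    counts = defaultdict(Counter)
    for row in transactions:
        counts[row["customer_id"]][row["item_class"]] += 1

    features = {}
    for cust in customers:
        cid = cust["customer_id"]
        vector = [counts[cid][cls] for cls in classes]  # 0 if never bought
        vector.append(cust["age"])
        vector.append(1 if cust["sex"] == "F" else 0)
        features[cid] = vector
    return features


# Made-up sample data, for illustration only.
transactions = [
    {"customer_id": 1, "item_class": "dairy"},
    {"customer_id": 1, "item_class": "dairy"},
    {"customer_id": 1, "item_class": "baby"},
    {"customer_id": 2, "item_class": "snacks"},
]
customers = [
    {"customer_id": 1, "age": 34, "sex": "F"},
    {"customer_id": 2, "age": 22, "sex": "M"},
]
classes = ["baby", "dairy", "snacks"]

print(customer_features(transactions, customers, classes))
# {1: [1, 2, 0, 34, 1], 2: [0, 0, 1, 22, 0]}
```

Because KNN is distance based, count and age columns on different scales would normally be normalized before use; the slide does not say whether that was done.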

  10. Logistic Regression (Customer Level Data)

  11. Ensemble
