variable selection

Variable selection: Introduction to Predictive Analytics in Python - PowerPoint PPT Presentation

Variable selection. Introduction to Predictive Analytics in Python. Nele Verbiest, Ph.D, Data Scientist @PythonPredictions. Candidate predictors: age, max_gift, income_low, min_gift, mean_gift, median_gift, country_USA, ...


  1. Variable selection. Introduction to Predictive Analytics in Python. Nele Verbiest, Ph.D, Data Scientist @PythonPredictions

  2. Candidate predictors: age; max_gift; income_low; min_gift, mean_gift, median_gift; country_USA, country_India, country_UK; number_gift_min50, number_gift_min100, number_gift_min150

  3. Variable selection: motivation. Drawbacks of models with many variables: over-fitting; hard to maintain or implement; hard to interpret, multi-collinearity

  4. Model evaluation: AUC

         import numpy as np
         from sklearn.metrics import roc_auc_score

         roc_auc_score(true_target, prob_target)
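To make the AUC less of a black box, the following minimal sketch computes it by hand as the share of (target, non-target) pairs in which the target receives the higher predicted probability, with ties counted as half. The function name `pairwise_auc` and the four example values are assumptions for illustration. This pairwise definition is equivalent to the area under the ROC curve, so it should agree with `roc_auc_score` on the same inputs.

```python
def pairwise_auc(true_target, prob_target):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative counts as half a correct pair.
    """
    positives = [p for t, p in zip(true_target, prob_target) if t == 1]
    negatives = [p for t, p in zip(true_target, prob_target) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Illustrative values only
true_target = [0, 0, 1, 1]
prob_target = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(true_target, prob_target))  # 3 of 4 pairs ordered correctly: 0.75
```

The double loop is quadratic in the number of observations, so this is meant for understanding the metric rather than for use on a full basetable.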

  5. Let's practice!

  6. Forward stepwise variable selection

  7. The forward stepwise variable selection procedure: start from the empty set; find the best variable v1; find the best variable v2 in combination with v1; find the best variable v3 in combination with v1, v2; ... (until all variables are added or until a predefined number of variables is added)
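The greedy structure of this procedure does not depend on the model used to score a set of variables. As a minimal sketch, the function below takes any scoring function and repeatedly adds the candidate that gives the best score together with the variables chosen so far. The names `forward_selection`, `toy_score` and the `VALUE` table are assumptions made up for illustration; in the course the score is the AUC of a logistic regression.

```python
def forward_selection(candidates, score, max_variables):
    """Greedy forward selection: add the variable that maximises score(selected + [v])."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < max_variables:
        best = max(remaining, key=lambda v: score(selected + [v]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy score (an assumption for illustration): each variable contributes a fixed
# amount, and min_gift adds little once max_gift is already selected.
VALUE = {"age": 0.03, "max_gift": 0.10, "min_gift": 0.08, "income_low": 0.01}


def toy_score(variables):
    total = 0.5 + sum(VALUE[v] for v in variables)
    if "max_gift" in variables and "min_gift" in variables:
        total -= 0.06  # overlapping information
    return total


print(forward_selection(list(VALUE), toy_score, max_variables=3))
# ['max_gift', 'age', 'min_gift']
```

Note that min_gift is the second-best variable on its own, yet age is picked second because it adds more in combination with max_gift. That is the point of evaluating each candidate together with the current set.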

  8. Functions in Python

         def function_sum(a, b):
             s = a + b
             return s

         print(function_sum(1, 2))  # 3

  9. Implementation of the forward stepwise procedure: a function auc that calculates the AUC given a certain set of variables; a function next_best that returns the next best variable in combination with the current variables; a loop that runs until the desired number of variables is reached

  10. Implementation of the AUC function

          from sklearn import linear_model
          from sklearn.metrics import roc_auc_score

          def auc(variables, target, basetable):
              X = basetable[variables]
              y = basetable[target]
              logreg = linear_model.LogisticRegression()
              logreg.fit(X, y)
              # Probability of the positive class (second column)
              predictions = logreg.predict_proba(X)[:, 1]
              return roc_auc_score(y, predictions)

          # Store the result under a different name so the function auc is not overwritten
          auc_value = auc(["age", "gender_F"], ["target"], basetable)
          print(round(auc_value, 2))  # 0.54

  11. Calculating the next best variable

          def next_best(current_variables, candidate_variables, target, basetable):
              best_auc = -1
              best_variable = None
              for v in candidate_variables:
                  auc_v = auc(current_variables + [v], target, basetable)
                  if auc_v >= best_auc:
                      best_auc = auc_v
                      best_variable = v
              return best_variable

          current_variables = ["age", "gender_F"]
          candidate_variables = ["min_gift", "max_gift", "mean_gift"]
          target = ["target"]
          # target must be passed as the third argument
          next_variable = next_best(current_variables, candidate_variables, target, basetable)
          print(next_variable)  # min_gift

  12. The forward stepwise variable selection procedure

          candidate_variables = ["mean_gift", "min_gift", "max_gift",
                                 "age", "gender_F", "country_USA", "income_low"]
          current_variables = []
          target = ["target"]
          max_number_variables = 5
          number_iterations = min(max_number_variables, len(candidate_variables))
          for i in range(0, number_iterations):
              # Same name for the result here and in the two lines that use it
              next_variable = next_best(current_variables, candidate_variables, target, basetable)
              current_variables = current_variables + [next_variable]
              candidate_variables.remove(next_variable)
          print(current_variables)
          # ['max_gift', 'mean_gift', 'min_gift', 'age', 'gender_F']
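The slides rely on a basetable that is not included here. To try the whole procedure end to end, the sketch below builds a small synthetic basetable and runs the same auc, next_best and loop logic on it. The column names `strong`, `moderate` and `noise`, the coefficients and the random seed are all assumptions chosen for illustration, and the target is flattened to a one-dimensional array before fitting.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the basetable (an assumption, not the course data).
rng = np.random.default_rng(0)
n = 2000
basetable = pd.DataFrame({
    "strong": rng.normal(size=n),    # strongly related to the target
    "moderate": rng.normal(size=n),  # moderately related to the target
    "noise": rng.normal(size=n),     # unrelated to the target
})
logit = 2.0 * basetable["strong"] + 1.0 * basetable["moderate"]
basetable["target"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)


def auc(variables, target, basetable):
    """In-sample AUC of a logistic regression on the given variables."""
    X = basetable[variables]
    y = basetable[target].values.ravel()  # 1-D target array
    logreg = LogisticRegression()
    logreg.fit(X, y)
    predictions = logreg.predict_proba(X)[:, 1]
    return roc_auc_score(y, predictions)


def next_best(current_variables, candidate_variables, target, basetable):
    """Candidate that gives the highest AUC together with the current variables."""
    best_auc = -1
    best_variable = None
    for v in candidate_variables:
        auc_v = auc(current_variables + [v], target, basetable)
        if auc_v >= best_auc:
            best_auc = auc_v
            best_variable = v
    return best_variable


candidate_variables = ["noise", "moderate", "strong"]
current_variables = []
target = ["target"]
for _ in range(2):
    next_variable = next_best(current_variables, candidate_variables, target, basetable)
    current_variables = current_variables + [next_variable]
    candidate_variables.remove(next_variable)
print(current_variables)
```

With data generated this way, the procedure should pick the strongly related variable first and leave the unrelated one for last, regardless of the order in which the candidates are listed.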

  13. Let's practice!

  14. Deciding on the number of variables

  15. Evaluating the AUC

          # variables_forward is assumed to hold the variables in the order
          # in which the forward stepwise procedure selected them
          auc_values = []
          variables_evaluate = []
          for v in variables_forward:
              variables_evaluate.append(v)
              auc_value = auc(variables_evaluate, ["target"], basetable)
              auc_values.append(auc_value)

  16. Evaluating the AUC

  17. Over-fitting

  18. Detecting over-fitting
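A common way to detect over-fitting, in line with the train/test partitioning on the next slide, is to fit the model on one partition and compare the AUC on both. The sketch below does this on synthetic data while adding variables one at a time. The helper `auc_train_test`, the synthetic columns and the fixed variable order are assumptions for illustration, not the course's own code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def auc_train_test(variables, target, train, test):
    """Fit on train only, then report the AUC on both partitions."""
    logreg = LogisticRegression()
    logreg.fit(train[variables], train[target])
    auc_train = roc_auc_score(train[target], logreg.predict_proba(train[variables])[:, 1])
    auc_test = roc_auc_score(test[target], logreg.predict_proba(test[variables])[:, 1])
    return auc_train, auc_test


# Synthetic data (an assumption): one informative variable and ten pure-noise variables.
rng = np.random.default_rng(1)
n = 300
data = pd.DataFrame(rng.normal(size=(n, 11)),
                    columns=["informative"] + [f"noise_{i}" for i in range(10)])
data["target"] = (data["informative"] + rng.normal(size=n) > 0).astype(int)
train, test = train_test_split(data, test_size=0.4, stratify=data["target"], random_state=0)

variables_forward = list(data.columns[:-1])  # assumed order, informative variable first
variables_evaluate = []
results = []
for v in variables_forward:
    variables_evaluate.append(v)
    auc_train, auc_test = auc_train_test(variables_evaluate, "target", train, test)
    results.append((len(variables_evaluate), auc_train, auc_test))
    print(len(variables_evaluate), round(auc_train, 3), round(auc_test, 3))
```

A widening gap between the train and test columns as variables are added is the usual sign of over-fitting: the extra variables help the model describe the training rows without improving predictions on unseen rows.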

  19. Partitioning

          import pandas as pd
          # train_test_split lives in sklearn.model_selection in current scikit-learn
          from sklearn.model_selection import train_test_split

          X = basetable.drop("target", axis=1)
          y = basetable["target"]
          # stratify=y keeps the proportion of targets nearly the same in train and test
          X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, stratify=y)
          train = pd.concat([X_train, y_train], axis=1)
          test = pd.concat([X_test, y_test], axis=1)
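To see what the stratify argument does, the following sketch partitions a synthetic basetable with exactly 5% targets and checks the target incidence in each partition. The synthetic columns, the seed and `random_state=0` are assumptions added so the example is reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic basetable (an assumption) with exactly 50 targets out of 1000 rows.
rng = np.random.default_rng(42)
basetable = pd.DataFrame({
    "age": rng.integers(18, 90, size=1000),
    "max_gift": rng.gamma(2.0, 50.0, size=1000),
})
basetable["target"] = 0
basetable.loc[:49, "target"] = 1  # label-based slice, rows 0 to 49 inclusive

X = basetable.drop("target", axis=1)
y = basetable["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)

print(len(X_train), len(X_test))      # 600 400
print(y_train.mean(), y_test.mean())  # 0.05 0.05
```

Without stratification the 50 targets would be spread over the two partitions at random, which matters most when targets are rare.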

  20. Deciding the cut-off: high test AUC; low number of variables
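The two criteria pull in opposite directions, so some rule is needed to trade them off. One possible rule, sketched below, is to take the smallest number of variables whose test AUC is within a small tolerance of the best test AUC. The function name `choose_cutoff`, the tolerance of 0.005 and the AUC values are all made up for illustration; the slides do not prescribe a specific rule.

```python
def choose_cutoff(auc_test_values, tolerance=0.005):
    """Smallest number of variables whose test AUC is within
    `tolerance` of the best test AUC in the list."""
    best = max(auc_test_values)
    for number_of_variables, value in enumerate(auc_test_values, start=1):
        if value >= best - tolerance:
            return number_of_variables


# Made-up test AUC values for 1, 2, ..., 7 variables (illustration only).
auc_test_values = [0.60, 0.66, 0.69, 0.70, 0.702, 0.701, 0.695]
print(choose_cutoff(auc_test_values))  # 4
```

Here the best test AUC is reached with five variables, but four variables are within the tolerance, so the simpler model is preferred.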

  21. Deciding the cut-off

  22. Let's practice!
