Selecting features for model performance
DIMENSIONALITY REDUCTION IN PYTHON
Jeroen Boeye
Machine Learning Engineer, Faktion
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

lr = LogisticRegression()
lr.fit(X_train_std, y_train)

X_test_std = scaler.transform(X_test)
y_pred = lr.predict(X_test_std)
print(accuracy_score(y_test, y_pred))
0.99
print(lr.coef_)
array([[-3.  ,  0.14,  7.46,  1.22,  0.87]])

print(dict(zip(X.columns, abs(lr.coef_[0]))))
{'chestdepth': 3.0, 'handlength': 0.14, 'neckcircumference': 7.46,
 'shoulderlength': 1.22, 'earlength': 0.87}
X.drop('handlength', axis=1, inplace=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
lr.fit(scaler.fit_transform(X_train), y_train)
print(accuracy_score(y_test, lr.predict(scaler.transform(X_test))))
0.99
from sklearn.feature_selection import RFE
rfe = RFE(estimator=LogisticRegression(), n_features_to_select=2, verbose=1)
rfe.fit(X_train_std, y_train)
Fitting estimator with 5 features.
Fitting estimator with 4 features.
Fitting estimator with 3 features.
Dropping a feature will affect the other features' coefficients
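A small synthetic illustration of this point (the data and feature names here are made up, not the ANSUR dataset): when two correlated features share predictive weight, dropping one shifts the weight onto the other.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two strongly correlated features that jointly determine the target
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)
y = (x1 + x2 > 0).astype(int)

# Full model: the weight is shared between the correlated features
X_full = np.column_stack([x1, x2])
lr = LogisticRegression().fit(X_full, y)
print(lr.coef_)

# Reduced model: x1's coefficient grows once x2 is dropped
lr_reduced = LogisticRegression().fit(x1.reshape(-1, 1), y)
print(lr_reduced.coef_)
```

This is why RFE refits the model after each elimination instead of dropping several features based on a single fit.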
X.columns[rfe.support_]
Index(['chestdepth', 'neckcircumference'], dtype='object')

print(dict(zip(X.columns, rfe.ranking_)))
{'chestdepth': 1, 'handlength': 4, 'neckcircumference': 1,
 'shoulderlength': 2, 'earlength': 3}

print(accuracy_score(y_test, rfe.predict(X_test_std)))
0.99
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rf = RandomForestClassifier()
rf.fit(X_train, y_train)
print(accuracy_score(y_test, rf.predict(X_test)))
0.99
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
print(rf.feature_importances_)
array([0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.04, 0.  , 0.01, 0.01,
       ...])

print(sum(rf.feature_importances_))
1.0
mask = rf.feature_importances_ > 0.1
print(mask)
array([False, False, ..., True, False])

X_reduced = X.loc[:, mask]
print(X_reduced.columns)
Index(['chestheight', 'neckcircumference', 'neckcircumferencebase',
       'shouldercircumference'], dtype='object')
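The manual mask above can also be expressed with scikit-learn's `SelectFromModel`, which fits the estimator and applies the importance threshold in one step. A minimal sketch on synthetic data (standing in for the ANSUR features):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic dataset: 10 features, only a few of them informative
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=3, random_state=0)

# Fit the forest and keep features with importance >= 0.1 in one step
sfm = SelectFromModel(RandomForestClassifier(random_state=0), threshold=0.1)
X_reduced = sfm.fit_transform(X, y)
print(X_reduced.shape)
```

Since the importances sum to 1, a threshold of 0.1 keeps only features that carry more than an equal share of the total importance here.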
from sklearn.feature_selection import RFE
rfe = RFE(estimator=RandomForestClassifier(), n_features_to_select=6, verbose=1)
rfe.fit(X_train, y_train)
Fitting estimator with 94 features.
Fitting estimator with 93 features.
...
Fitting estimator with 8 features.
Fitting estimator with 7 features.

print(accuracy_score(y_test, rfe.predict(X_test)))
0.99
from sklearn.feature_selection import RFE
rfe = RFE(estimator=RandomForestClassifier(), n_features_to_select=6,
          step=10, verbose=1)
rfe.fit(X_train, y_train)
Fitting estimator with 94 features.
Fitting estimator with 84 features.
...
Fitting estimator with 24 features.
Fitting estimator with 14 features.

print(X.columns[rfe.support_])
Index(['biacromialbreadth', 'handbreadth', 'handcircumference',
       'neckcircumference', 'neckcircumferencebase', 'shouldercircumference'],
      dtype='object')
[Table: sample rows of three features x1, x2 and x3]
Creating our own target feature:
y = 20 + 5x1 + 2x2 + 0x3 + error
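The formula above can be turned into data directly. A sketch of how such a synthetic target could be generated (the sample size and noise scale are illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Three random features; only x1 and x2 influence the target
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
noise = rng.normal(scale=1.0, size=1000)
y = 20 + 5 * X[:, 0] + 2 * X[:, 1] + 0 * X[:, 2] + noise

# A linear model should recover coefficients close to [5, 2, 0]
lr = LinearRegression().fit(X, y)
print(lr.coef_)
print(lr.intercept_)  # close to 20
```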
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train, y_train)

# Actual coefficients = [5 2 0]
print(lr.coef_)
[ 4.95  1.83 -0.05]

# Actual intercept = 20
print(lr.intercept_)
19.8
# Calculates R-squared
print(lr.score(X_test, y_test))
0.976
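For a regressor, `.score()` returns R-squared: the fraction of target variance the model explains. A minimal sketch on synthetic data showing the underlying computation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic one-feature regression problem
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=200)

lr = LinearRegression().fit(X, y)
y_pred = lr.predict(X)

# R-squared = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(np.isclose(r_squared, lr.score(X, y)))  # the two values agree
```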
The strength of regularization is set with alpha: when it's too low the model might overfit, when it's too high the model might become too simple and inaccurate.
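This trade-off can be made visible with a small sweep over alpha on synthetic data (the values and data here are illustrative): as alpha grows, Lasso forces more coefficients to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Ten random features; only the first two influence the target
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = 5 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=500)

# Count non-zero coefficients for increasing regularization strength
n_nonzero = {}
for alpha in [0.01, 0.1, 1.0, 10.0]:
    la = Lasso(alpha=alpha).fit(X, y)
    n_nonzero[alpha] = int(np.sum(la.coef_ != 0))
print(n_nonzero)
```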
from sklearn.linear_model import Lasso
la = Lasso()
la.fit(X_train, y_train)

# Actual coefficients = [5 2 0]
print(la.coef_)
[4.07 0.59 0.  ]

print(la.score(X_test, y_test))
0.861
from sklearn.linear_model import Lasso
la = Lasso(alpha=0.05)
la.fit(X_train, y_train)

# Actual coefficients = [5 2 0]
print(la.coef_)
[ 4.91  1.76  0.  ]

print(la.score(X_test, y_test))
0.974
from sklearn.linear_model import LassoCV
lcv = LassoCV()
lcv.fit(X_train, y_train)
print(lcv.alpha_)
0.09
mask = lcv.coef_ != 0
print(mask)
[ True  True False ]

reduced_X = X.loc[:, mask]
A random forest is a combination of decision trees. We can use a combination of models for feature selection too.
from sklearn.linear_model import LassoCV
lcv = LassoCV()
lcv.fit(X_train, y_train)
lcv.score(X_test, y_test)
0.99

lcv_mask = lcv.coef_ != 0
sum(lcv_mask)
66
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestRegressor

rfe_rf = RFE(estimator=RandomForestRegressor(), n_features_to_select=66,
             step=5, verbose=1)
rfe_rf.fit(X_train, y_train)
rf_mask = rfe_rf.support_
from sklearn.feature_selection import RFE
from sklearn.ensemble import GradientBoostingRegressor

rfe_gb = RFE(estimator=GradientBoostingRegressor(), n_features_to_select=66,
             step=5, verbose=1)
rfe_gb.fit(X_train, y_train)
gb_mask = rfe_gb.support_
import numpy as np
votes = np.sum([lcv_mask, rf_mask, gb_mask], axis=0)
print(votes)
array([3, 2, 2, ..., 3, 0, 1])

mask = votes >= 2
reduced_X = X.loc[:, mask]