Regression: feature selection
PRACTICING MACHINE LEARNING INTERVIEW QUESTIONS IN PYTHON
Lisa Stuart
Data Scientist
Selecting the correct features:
- Reduces overfitting
- Improves accuracy
- Increases interpretability
- Reduces training time
https://www.analyticsindiamag.com/what-are-feature-selection-techniques-in-machine-learning/
- Filter: rank features based on statistical performance
- Wrapper: use an ML method to evaluate performance
- Embedded: iterative model training to extract features
- Feature importance: tree-based ML models
Method              Use an ML model   Select best subset   Can overfit
Filter              No                No                   No
Wrapper             Yes               Yes                  Sometimes
Embedded            Yes               Yes                  Yes
Feature importance  Yes               Yes                  Yes
Feature \ Response   Continuous              Categorical
Continuous           Pearson's Correlation   LDA
Categorical          ANOVA                   Chi-Square
Function                   Returns
df.corr()                  Pearson's correlation matrix
sns.heatmap(corr_object)   heatmap plot
abs()                      absolute value
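A minimal sketch of the filter approach using these functions: keep only features whose absolute Pearson correlation with the target clears a threshold. The data frame and the 0.5 cutoff below are purely illustrative.

```python
import pandas as pd

# Illustrative numeric data; column names are made up for the example
df = pd.DataFrame({
    "rooms": [3, 4, 2, 5, 3, 4],
    "area":  [70, 95, 50, 120, 72, 90],
    "age":   [30, 10, 40, 5, 25, 12],
    "price": [200, 310, 150, 420, 210, 300],
})

# Absolute Pearson correlation of each feature with the target
corr_with_target = df.corr()["price"].abs().drop("price")

# Filter: keep features above a chosen threshold
selected = corr_with_target[corr_with_target > 0.5].index.tolist()
print(selected)
```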
- Forward selection: starts with no features, adds one at a time
- Backward elimination: starts with all features, eliminates one at a time
- RFECV: recursive feature elimination with cross-validation
- Random Forest --> sklearn.ensemble.RandomForestRegressor
- Extra Trees --> sklearn.ensemble.ExtraTreesRegressor
- After model fit --> tree_mod.feature_importances_
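As a sketch, feature importances can be read off a fitted tree-based model like so (synthetic data stands in for a real dataset):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data standing in for a real dataset
X, y = make_regression(n_samples=200, n_features=5,
                       n_informative=2, random_state=42)

tree_mod = RandomForestRegressor(n_estimators=50, random_state=42)
tree_mod.fit(X, y)

# One non-negative score per feature; the scores sum to 1
print(tree_mod.feature_importances_)
```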
Function                                 Returns
sklearn.svm.SVR                          support vector regression estimator
sklearn.feature_selection.RFECV          recursive feature elimination with cross-validation
rfe_mod.support_                         boolean array of selected features
rfe_mod.ranking_                         feature ranking (selected = 1)
sklearn.linear_model.LinearRegression    linear model estimator
sklearn.linear_model.LarsCV              least angle regression with cross-validation
LarsCV.score                             R-squared score
LarsCV.alpha_                            estimated regularization parameter
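A minimal sketch of wrapping RFECV around a support vector regression estimator (shapes, the linear kernel, and cv=3 are illustrative choices):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

X, y = make_regression(n_samples=100, n_features=6,
                       n_informative=3, random_state=0)

# A linear kernel exposes coef_, which RFECV uses to rank features
rfe_mod = RFECV(estimator=SVR(kernel="linear"), cv=3)
rfe_mod.fit(X, y)

print(rfe_mod.support_)   # boolean mask of selected features
print(rfe_mod.ranking_)   # rank 1 = selected
```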
- Ridge regression
- Lasso regression
- ElasticNet regression
https://en.wikipedia.org/wiki/Linear_regression#Simple_and_multiple_linear_regression
https://gerardnico.com/data_mining/ridge_regression#tuning_parameter_math_lambdamath
https://stats.stackexchange.com/questions/155192/why-discrepancy-between-lasso-and-randomforest
Regularization        L1 (Lasso)                               L2 (Ridge)
penalizes             sum of absolute values of coefficients   sum of squares of coefficients
solutions             sparse                                   non-sparse
number of solutions   multiple                                 one
feature selection     yes                                      no
robust to outliers?   yes                                      no
complex patterns?     no                                       yes
Features                            CHAS   NOX   RM
Coefficient estimates               2.7    …     3.8
Regularized coefficient estimates   0      …     0.95
# Lasso estimator
sklearn.linear_model.Lasso
# Lasso estimator with cross-validation
sklearn.linear_model.LassoCV
# Ridge estimator
sklearn.linear_model.Ridge
# Ridge estimator with cross-validation
sklearn.linear_model.RidgeCV
# ElasticNet estimator
sklearn.linear_model.ElasticNet
# ElasticNet estimator with cross-validation
sklearn.linear_model.ElasticNetCV
# Train/test split
sklearn.model_selection.train_test_split
# Mean squared error
sklearn.metrics.mean_squared_error(y_test, predict(X_test))
# Best regularization parameter
mod_cv.alpha_
# Array of log values
alphas = np.logspace(-6, 6, 13)
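These pieces fit together roughly as follows; this sketch uses LassoCV on synthetic data (the sample sizes, noise level, and cv=5 are illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data in place of a real regression dataset
X, y = make_regression(n_samples=200, n_features=10,
                       n_informative=3, noise=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Cross-validate over a log-spaced grid of regularization strengths
mod_cv = LassoCV(alphas=np.logspace(-6, 6, 13), cv=5)
mod_cv.fit(X_train, y_train)

mse = mean_squared_error(y_test, mod_cv.predict(X_test))
print(mod_cv.alpha_)  # best regularization parameter
print(mse)
```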
- Extracts additional information from the data
- Creates additional relevant features
- One of the most effective ways to improve predictive models
- Increased predictive power of the learning algorithm
- Makes your machine learning models perform even better!
- Indicator variables
- Interaction features
- Feature representation
- Threshold indicator: age (high school vs college)
- Multiple features used as a flag: special events (Black Friday, Christmas)
- Groups of classes: website traffic paid flag (Google AdWords, Facebook ads)
- Sum
- Difference
- Product
- Quotient
- Other mathematical combinations
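A quick sketch of interaction features in pandas; the measurement columns below are invented for illustration:

```python
import pandas as pd

# Hypothetical measurements; names are illustrative
df = pd.DataFrame({"length": [2.0, 3.0, 4.0],
                   "width":  [1.0, 1.5, 2.0]})

# Product interaction
df["area"] = df["length"] * df["width"]
# Quotient interaction
df["aspect_ratio"] = df["length"] / df["width"]
print(df)
```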
- Datetime stamps: day of week, hour of day
- Grouping categorical levels into 'Other'
- Transform categorical to dummy variables: (k - 1) binary columns
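A sketch of the dummy-variable transform with pd.get_dummies, where drop_first=True yields k - 1 binary columns for k levels (the color column is made up):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})

# k = 3 levels -> k - 1 = 2 binary columns with drop_first=True
dummies = pd.get_dummies(df["color"], drop_first=True)
print(dummies.columns.tolist())
```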
Training data: model trained with [red, blue, green]
Test data: model tested with [red, green, yellow] (a color not seen in training)
Solution: robust one-hot encoding
https://blog.cambridgespark.com/robust-one-hot-encoding-in-python-3e29bfcec77e
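One robust option, sketched here, is scikit-learn's OneHotEncoder with handle_unknown="ignore", which encodes an unseen level as an all-zero row instead of raising an error:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

train = np.array([["red"], ["blue"], ["green"]])
test = np.array([["red"], ["green"], ["yellow"]])

# Unseen levels ("yellow") become an all-zero row instead of an error
enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(train)

encoded = enc.transform(test).toarray()
print(encoded)
```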
Ratio feature: Monthly Debt / (Annual Income / 12)
Function                                                  Returns
sklearn.linear_model.LogisticRegression                   logistic regression estimator
sklearn.model_selection.train_test_split                  train/test split function
sns.countplot(x='Loan Status', data=data)                 bar plot
df.drop(['Feature 1', 'Feature 2'], axis=1)               drops list of features
df["Loan Status"].replace({'Paid': 0, 'Not Paid': 1})     Loan Status as integers
pd.get_dummies()                                          k - 1 binary features
sklearn.metrics.accuracy_score(y_test, predict(X_test))   model accuracy
DataCamp article: categorical data
- Bootstrap aggregation
- Boosting
- Model stacking
Linear relationship assumption (incorrect):
- High bias
- Underfitting
- Poor model generalization
- Increasing complexity decreases bias
High complexity models:
- High variance
- Overfitting
- Poor model generalization
Source: Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman
- Bootstrapped samples: subset selected with replacement (same row of data may be chosen)
- Model built for each sample
- Average the output
- Reduces variance
https://medium.com/@rrfd/boosting-bagging-and-stacking-ensemble-methods-with-sklearn-and-mlens-a455c0c982de
1
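A minimal sketch of bagging with scikit-learn on synthetic data (the default base estimator is a decision tree; sizes and seeds are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=200, random_state=123)

# Each base tree is fit on a bootstrap sample drawn with replacement;
# averaging/voting over the trees reduces variance
bag = BaggingClassifier(n_estimators=25, random_state=123)
bag.fit(X, y)

bag_score = bag.score(X, y)
print(bag_score)
```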
- Multiple models built sequentially
- Incorrect predictions are weighted
- Reduces bias
https://blog.bigml.com/2017/03/14/introduction-to-boosted-trees/
1
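A matching sketch of boosting with AdaBoost on the same kind of synthetic data (settings are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, random_state=123)

# Learners are fit sequentially; misclassified rows are up-weighted
# so later learners focus on them, which reduces bias
boost = AdaBoostClassifier(n_estimators=25, random_state=123)
boost.fit(X, y)

boost_score = boost.score(X, y)
print(boost_score)
```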
- Model 1 predictions, Model 2 predictions, ..., Model N predictions
- Stack for highest accuracy model
- Uses base model predictions as input to a 2nd-level model
http://supunsetunga.blogspot.com/
# import modules
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
from vecstack import stacking

# Create list: stacked_models
stacked_models = [BaggingClassifier(n_estimators=25, random_state=123),
                  AdaBoostClassifier(n_estimators=25, random_state=123)]

# Stack the models: stack_train, stack_test
stack_train, stack_test = stacking(stacked_models, X_train, y_train, X_test,
                                   regression=False, mode='oof_pred_bag',
                                   needs_proba=False, metric=accuracy_score,
                                   n_folds=4, stratified=True, shuffle=True,
                                   random_state=0, verbose=0)

# Initialize and fit 2nd level model
final_model = XGBClassifier(random_state=123, n_jobs=-1, learning_rate=0.1,
                            n_estimators=10, max_depth=3)
final_model_fit = final_model.fit(stack_train, y_train)

# Predict: stacked_pred
stacked_pred = final_model.predict(stack_test)

# Final prediction score
print('Final prediction score: [%.8f]' % accuracy_score(y_test, stacked_pred))
https://towardsdatascience.com/automate-stacking-in-python-fc3e7834772e
Algorithm               Function
Bootstrap aggregation   sklearn.ensemble.BaggingClassifier()
Boosting                sklearn.ensemble.AdaBoostClassifier()
XGBoost                 xgboost.XGBClassifier()
Technique                         Bias       Variance
Bootstrap aggregation (Bagging)   Increase   Decrease
Boosting                          Decrease   Increase
Which of the following statements is true about the three major techniques used for ensemble methods in machine learning? Select the statement that is true:
- Boosting methods decrease model variance.
- Boosting methods increase the predictive abilities of a classifier.
- Bootstrap aggregation, or bagging, decreases model bias.
- Model stacking takes the predictions from individual models and combines them to create a higher accuracy model.
The correct answer is: Model stacking takes the predictions from individual models and combines them to create a higher accuracy model. (The final model obtained from the predictions of several individual models almost always outperforms the individuals.)
Why the other statements are false:
- Boosting methods decrease model variance. (Boosting methods decrease model bias which, at the same time, helps increase variance to find that sweet spot for best model generalization.)
- Boosting methods increase the predictive abilities of a classifier. (Boosting decreases model bias, which may or may not increase the predictive abilities of a classifier.)
- Bootstrap aggregation, or bagging, decreases model bias. (Bagging decreases model variance.)