Regression: feature selection. Practicing Machine Learning Interview Questions in Python (PowerPoint presentation)

  1. Regression: feature selection. PRACTICING MACHINE LEARNING INTERVIEW QUESTIONS IN PYTHON. Lisa Stuart, Data Scientist.

  2. Selecting the correct features reduces overfitting, improves accuracy, increases interpretability, and reduces training time. [1] https://www.analyticsindiamag.com/what-are-feature-selection-techniques-in-machine-learning/

  3. Feature selection methods:
     - Filter: rank features by a statistical measure of their relationship to the response
     - Wrapper: use an ML model to evaluate the performance of feature subsets
     - Embedded: select features as part of model training (e.g., through regularization)
     - Feature importance: rank features with tree-based ML models

  4. Compare and contrast methods:
     Method | Uses an ML model | Selects best subset | Can overfit
     Filter | No | No | No
     Wrapper | Yes | Yes | Sometimes
     Embedded | Yes | Yes | Yes
     Feature importance | Yes | Yes | Yes

  5. Correlation coefficient statistical tests (choose by feature type and response type):
     Feature \ Response | Continuous | Categorical
     Continuous | Pearson's correlation | LDA
     Categorical | ANOVA | Chi-square
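For the categorical/categorical cell of the table, a chi-square test of independence can be run on a contingency table of counts. A minimal sketch with SciPy's `chi2_contingency`, assuming a small hypothetical dataset (the `plan` and `churned` columns and their values are invented for illustration):

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data: one categorical feature ("plan") and a categorical response ("churned")
df = pd.DataFrame({
    "plan": ["basic"] * 50 + ["premium"] * 50,
    "churned": ["yes"] * 40 + ["no"] * 10 + ["yes"] * 10 + ["no"] * 40,
})

# Count how often each feature level co-occurs with each response level
table = pd.crosstab(df["plan"], df["churned"])

# Chi-square test of independence: a small p-value suggests the feature is related to the response
chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2={chi2:.2f}, p={p_value:.2e}, dof={dof}")
```

In a filter method, the same test would be repeated for every categorical feature and the features ranked by p-value (or by the statistic).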

  6. Filter functions:
     Function | Returns
     df.corr() | Pearson's correlation matrix
     sns.heatmap(corr_object) | heatmap plot
     abs() | absolute value
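The functions above combine into a simple correlation filter. A minimal sketch, assuming a synthetic DataFrame with hypothetical columns `x1`, `x2`, `x3`, and `target`, and an arbitrary cutoff of 0.1 on the absolute correlation:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "x1" drives the target strongly, "x2" weakly, "x3" not at all
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),
})
df["target"] = 3 * df["x1"] + 0.5 * df["x2"] + rng.normal(size=n)

# Pearson correlation matrix (sns.heatmap(corr) would plot it)
corr = df.corr()

# Rank features by the absolute value of their correlation with the target
ranking = corr["target"].drop("target").abs().sort_values(ascending=False)

# Keep features above an assumed threshold
threshold = 0.1
selected = ranking[ranking > threshold].index.tolist()
print(ranking)
print(selected)
```

No model is trained here, which is why the table on slide 4 marks filter methods as unable to overfit.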

  7. Wrapper methods:
     1. Forward selection (LARS, least angle regression): starts with no features and adds one at a time
     2. Backward elimination: starts with all features and eliminates one at a time
     3. Combined forward selection/backward elimination (bidirectional elimination)
     4. Recursive feature elimination (RFECV)
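Recursive feature elimination with cross-validation (item 4) is available in scikit-learn as `RFECV`. A minimal sketch, assuming synthetic data from `make_regression` and `LinearRegression` as the wrapped estimator; forward and backward selection are also available as `SequentialFeatureSelector` (scikit-learn 0.24+), not shown here:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): 8 features, only 3 of which carry signal
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=10, random_state=0)

# Repeatedly fit the model and drop the feature with the smallest absolute coefficient;
# 5-fold cross-validation decides how many features to keep
rfe_mod = RFECV(estimator=LinearRegression(), step=1, cv=5)
rfe_mod.fit(X, y)

print("Number of features kept:", rfe_mod.n_features_)
print("Selected mask:", rfe_mod.support_)   # boolean array of selected features
print("Ranking:", rfe_mod.ranking_)         # selected features have rank 1
```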

  8. Embedded methods: Lasso regression, Ridge regression, ElasticNet.

  9. Tree-based feature importance methods:
     Random Forest: sklearn.ensemble.RandomForestRegressor
     Extra Trees: sklearn.ensemble.ExtraTreesRegressor
     After the model is fit: tree_mod.feature_importances_
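A minimal sketch of reading `feature_importances_` after fitting a random forest, assuming synthetic data from `make_regression` and hypothetical feature names `f0` to `f5`; `ExtraTreesRegressor` could be swapped in unchanged:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption): 6 features, 3 informative
X, y = make_regression(n_samples=300, n_features=6, n_informative=3,
                       noise=5, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]

# Fit the forest; ExtraTreesRegressor exposes the same attribute
tree_mod = RandomForestRegressor(n_estimators=100, random_state=0)
tree_mod.fit(X, y)

# Impurity-based importances, one per feature, summing to 1
importances = tree_mod.feature_importances_
order = np.argsort(importances)[::-1]
for i in order:
    print(f"{feature_names[i]}: {importances[i]:.3f}")
```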

  10. Function | Returns
     sklearn.svm.SVR | support vector regression estimator
     sklearn.feature_selection.RFECV | recursive feature elimination with cross-validation
     rfe_mod.support_ | boolean array of selected features
     rfe_mod.ranking_ | feature ranking (selected features = 1)
     sklearn.linear_model.LinearRegression | linear model estimator
     sklearn.linear_model.LarsCV | least angle regression with cross-validation
     LarsCV.score | R-squared score
     LarsCV.alpha_ | estimated regularization parameter
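A minimal sketch of `LarsCV`, `score`, and `alpha_` from the table above, assuming synthetic data from `make_regression` and a 70/30 train/test split (both choices are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LarsCV
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 10 features, 4 informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       noise=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Least angle regression; cross-validation chooses where to stop along the path
lars_mod = LarsCV(cv=5)
lars_mod.fit(X_train, y_train)

r2 = lars_mod.score(X_test, y_test)   # R-squared on held-out data
print("R^2:", round(r2, 3))
print("alpha_:", lars_mod.alpha_)     # estimated regularization parameter
print("coef_:", lars_mod.coef_)       # zero entries are features LARS never added
```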

  11. Let's practice!

  12. Regression: regularization

  13. Regularization algorithms: Ridge regression, Lasso regression, ElasticNet regression.

  14. Ordinary least squares [1] https://en.wikipedia.org/wiki/Linear_regression#Simple_and_multiple_linear_regression
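For reference, the standard ordinary least squares objective, in notation assumed here (n observations, response y_i, feature vector x_i with p entries, intercept β_0, coefficient vector β), minimizes the residual sum of squares:

```latex
\hat{\beta}^{\text{OLS}} = \arg\min_{\beta_0,\,\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^\top \beta \right)^2
```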

  15. Ridge loss function [1] https://gerardnico.com/data_mining/ridge_regression#tuning_parameter_math_lambdamath
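In the usual textbook form (notation assumed: n observations, response y_i, feature vector x_i with p entries, intercept β_0, coefficients β_j), ridge regression adds an L2 penalty to the least squares objective. The tuning parameter λ sets the penalty strength, λ = 0 recovers ordinary least squares, and the intercept is not penalized:

```latex
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta_0,\,\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^\top \beta \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2, \qquad \lambda \ge 0
```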

  16. Lasso loss function [1] https://stats.stackexchange.com/questions/155192/why-discrepancy-between-lasso-and-randomforest
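In the same textbook notation (assumed: n observations, response y_i, feature vector x_i with p entries, intercept β_0, coefficients β_j), the lasso replaces the squared penalty with the sum of absolute values. The L1 penalty is what allows some coefficients to be set exactly to zero:

```latex
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta_0,\,\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^\top \beta \right)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert, \qquad \lambda \ge 0
```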

  17. Ridge vs lasso:
     Regularization | L1 (Lasso) | L2 (Ridge)
     Penalizes | sum of absolute values of coefficients | sum of squares of coefficients
     Solutions | sparse | non-sparse
     Number of solutions | multiple | one
     Feature selection | yes | no
     Robust to outliers? | yes | no
     Complex patterns? | no | yes
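The "sparse vs non-sparse" and "feature selection" rows can be checked directly. A minimal sketch, assuming synthetic data with 10 features of which only 3 are informative, and the same arbitrary penalty strength (`alpha=1.0`) for both models:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): 10 features, only 3 informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

# L1 drives some coefficients exactly to zero; L2 only shrinks them toward zero
print("Lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))
print("Ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))
```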

  18. ElasticNet
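ElasticNet combines both penalties. In the textbook form (notation assumed: n observations, response y_i, feature vector x_i with p entries, intercept β_0, coefficients β_j, penalty weights λ_1 and λ_2), and then in the parametrization minimized by scikit-learn's `ElasticNet`, where α is `alpha`, ρ is `l1_ratio`, and X is the n × p design matrix (intercept omitted). Setting ρ = 1 gives the lasso; ρ = 0 gives a pure L2 penalty:

```latex
\hat{\beta}^{\text{EN}} = \arg\min_{\beta_0,\,\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^\top \beta \right)^2 + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert + \lambda_2 \sum_{j=1}^{p} \beta_j^2

\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \alpha \rho \lVert \beta \rVert_1 + \frac{\alpha (1 - \rho)}{2} \lVert \beta \rVert_2^2
```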

  19. Regularization with Boston housing data:
     Feature | CHAS | NOX | RM
     Coefficient estimate | 2.7 | -17.8 | 3.8
     Regularized coefficient estimate | 0 | 0 | 0.95

  20. Regularization functions:
     sklearn.linear_model.Lasso | Lasso estimator
     sklearn.linear_model.LassoCV | Lasso estimator with cross-validation
     sklearn.linear_model.Ridge | Ridge estimator
     sklearn.linear_model.RidgeCV | Ridge estimator with cross-validation
     sklearn.linear_model.ElasticNet | ElasticNet estimator
     sklearn.linear_model.ElasticNetCV | ElasticNet estimator with cross-validation
     sklearn.model_selection.train_test_split | train/test split
     sklearn.metrics.mean_squared_error(y_test, mod.predict(X_test)) | mean squared error
     mod_cv.alpha_ | best regularization parameter
     alphas=np.logspace(-6, 6, 13) | array of 13 log-spaced values from 1e-6 to 1e6
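A minimal sketch tying these functions together: each cross-validated estimator searches the same `alphas` grid, and `alpha_` reports the value it picked. Synthetic data from `make_regression` stands in for the Boston housing data (its loader was removed in scikit-learn 1.2); `l1_ratio=0.5` and `max_iter=10000` are assumed settings:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV, LassoCV, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption)
X, y = make_regression(n_samples=300, n_features=10, n_informative=4,
                       noise=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Candidate regularization strengths: 13 values from 1e-6 to 1e6
alphas = np.logspace(-6, 6, 13)

models = {
    "ridge": RidgeCV(alphas=alphas),
    "lasso": LassoCV(alphas=alphas, cv=5, max_iter=10000),
    "elasticnet": ElasticNetCV(alphas=alphas, l1_ratio=0.5, cv=5, max_iter=10000),
}

results = {}
for name, mod_cv in models.items():
    mod_cv.fit(X_train, y_train)
    mse = mean_squared_error(y_test, mod_cv.predict(X_test))
    results[name] = (mod_cv.alpha_, mse)   # alpha chosen by CV, test MSE
    print(f"{name}: alpha_={mod_cv.alpha_:g}, test MSE={mse:.1f}")
```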

  21. Let's practice!

  22. Classification: feature engineering

  23. Feature engineering... why? It extracts additional information from the data, creates additional relevant features, and is one of the most effective ways to improve predictive models.

  24. Benefits of feature engineering: increased predictive power of the learning algorithm; it makes your machine learning models perform even better!

  25. Types of feature engineering: indicator variables, interaction features, feature representation.

  26. Indicator variables:
     - Threshold indicator: age, e.g., high school vs. college
     - Multiple features used as a flag
     - Special events: Black Friday, Christmas
     - Groups of classes: a website-traffic "paid" flag covering Google AdWords and Facebook ads
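A minimal pandas sketch of the three kinds of indicator variables above. The DataFrame, its column names, the age cutoff of 18, and the list of event dates are all hypothetical choices made for illustration:

```python
import pandas as pd

# Hypothetical customer records
df = pd.DataFrame({
    "age": [15, 17, 19, 22],
    "visit_date": pd.to_datetime(["2023-11-24", "2023-07-04", "2023-12-25", "2023-03-15"]),
    "traffic_source": ["google_adwords", "organic", "facebook_ads", "email"],
})

# Threshold indicator: college age (assumed 18+) vs high-school age
df["college_age"] = (df["age"] >= 18).astype(int)

# Special-event flag: assumed list of event dates (Black Friday 2023, Christmas 2023)
special_days = pd.to_datetime(["2023-11-24", "2023-12-25"])
df["special_event"] = df["visit_date"].isin(special_days).astype(int)

# Group of classes collapsed into one flag: any paid traffic source
paid_sources = {"google_adwords", "facebook_ads"}
df["paid_traffic"] = df["traffic_source"].isin(paid_sources).astype(int)
print(df)
```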

  27. Interaction features: sum, difference, product, quotient, and other mathematical combinations.
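Each interaction is a new column computed from existing ones. A minimal sketch with two hypothetical numeric columns `a` and `b` (scikit-learn's `PolynomialFeatures(interaction_only=True)` can generate products automatically):

```python
import pandas as pd

# Hypothetical numeric features
df = pd.DataFrame({"a": [2.0, 5.0, 9.0], "b": [1.0, 4.0, 3.0]})

df["a_plus_b"] = df["a"] + df["b"]     # sum
df["a_minus_b"] = df["a"] - df["b"]    # difference
df["a_times_b"] = df["a"] * df["b"]    # product
df["a_div_b"] = df["a"] / df["b"]      # quotient (assumes b is never 0)
print(df)
```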

  28. Feature representation:
     - Datetime stamps: day of week, hour of day
     - Grouping categorical levels into 'Other'
     - Transforming categorical variables into dummy variables: (k - 1) binary columns
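A minimal pandas sketch of the three representations above, assuming a hypothetical event log and an arbitrary cutoff (`min_count = 2`) below which a color level is folded into "Other":

```python
import pandas as pd

# Hypothetical event log
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-02 08:15", "2023-01-03 13:40", "2023-01-04 19:05",
        "2023-01-07 22:30", "2023-01-08 09:00",
    ]),
    "color": ["red", "red", "blue", "blue", "teal"],
})

# Datetime stamp -> day of week (Monday=0) and hour of day
df["day_of_week"] = df["timestamp"].dt.dayofweek
df["hour"] = df["timestamp"].dt.hour

# Group rare levels (fewer than min_count rows, an assumed cutoff) into "Other"
min_count = 2
counts = df["color"].value_counts()
rare = counts[counts < min_count].index
df["color"] = df["color"].where(~df["color"].isin(rare), "Other")

# k levels -> k-1 dummy columns (drop_first=True drops one reference level)
dummies = pd.get_dummies(df["color"], prefix="color", drop_first=True, dtype=int)
df = pd.concat([df.drop(columns="color"), dummies], axis=1)
print(df)
```

Here the three levels (red, blue, Other) become two dummy columns; the dropped level is implied when both are 0.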

  29. Different categorical levels:
     - Training data: model trained with [red, blue, green]
     - Test data: model tested with [red, green, yellow]: one additional color not seen in training (yellow) and one color missing (blue)
     - Remedy: robust one-hot encoding [1] https://blog.cambridgespark.com/robust-one-hot-encoding-in-python-3e29bfcec77e
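One way to make one-hot encoding robust to this mismatch is to encode the test set and then align its columns to the training columns. A minimal pandas sketch using the slide's colors; scikit-learn's `OneHotEncoder(handle_unknown="ignore")` is an alternative that likewise encodes unseen levels as all zeros:

```python
import pandas as pd

# Training and test data with mismatched levels (the slide's example)
train = pd.DataFrame({"color": ["red", "blue", "green"]})
test = pd.DataFrame({"color": ["red", "green", "yellow"]})

train_enc = pd.get_dummies(train, columns=["color"], dtype=int)
test_enc = pd.get_dummies(test, columns=["color"], dtype=int)

# Align test columns to the training columns:
# - "color_blue" (missing from test) is added and filled with 0
# - "color_yellow" (unseen in training) is dropped, so the yellow row becomes all zeros
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)
print(test_enc)
```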

  30. Debt-to-income ratio = Monthly Debt / (Annual Income / 12)
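The ratio divides monthly debt by monthly income. A minimal pandas sketch, assuming a hypothetical DataFrame whose column names follow the slide's wording:

```python
import pandas as pd

# Hypothetical loan records
df = pd.DataFrame({
    "Monthly Debt": [1500.0, 2000.0],
    "Annual Income": [60000.0, 48000.0],
})

# Debt-to-income ratio = monthly debt / monthly income
df["Debt to Income"] = df["Monthly Debt"] / (df["Annual Income"] / 12)
print(df)
```

For the first row, the monthly income is 60000 / 12 = 5000, so the ratio is 1500 / 5000 = 0.3.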

  31. Feature engineering functions:
     Function | Returns
     sklearn.linear_model.LogisticRegression | logistic regression
     sklearn.model_selection.train_test_split | train/test split function
     sns.countplot(x='Loan Status', data=data) | bar plot
     df.drop(['Feature 1', 'Feature 2'], axis=1) | drops a list of features
     df["Loan Status"].replace({'Paid': 0, 'Not Paid': 1}) | Loan Status as integers
     pd.get_dummies(df, drop_first=True) | k - 1 binary features (without drop_first=True, k features)
     sklearn.metrics.accuracy_score(y_test, model.predict(X_test)) | model accuracy
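A minimal end-to-end sketch of these functions on synthetic loan data. The data-generating rule (a higher debt-to-income ratio makes "Not Paid" more likely) and the "Home Ownership" column are assumptions for illustration. It uses `Series.map` instead of `replace` for the target so the result is an integer column in every pandas version:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic loan data (assumption)
rng = np.random.default_rng(0)
n = 400
income = rng.uniform(30000, 120000, n)
dti = rng.uniform(0.1, 0.7, n)
data = pd.DataFrame({
    "Annual Income": income,
    "Monthly Debt": dti * income / 12,
    "Home Ownership": rng.choice(["Rent", "Own", "Mortgage"], n),
    "Loan Status": np.where(dti + rng.normal(0, 0.05, n) > 0.4, "Not Paid", "Paid"),
})

# Engineered feature: debt-to-income ratio
data["Debt to Income"] = data["Monthly Debt"] / (data["Annual Income"] / 12)

# Encode the target as integers
data["Loan Status"] = data["Loan Status"].map({"Paid": 0, "Not Paid": 1})

# k-1 dummy columns for the categorical feature
data = pd.get_dummies(data, columns=["Home Ownership"], drop_first=True, dtype=int)

# Drop the target and the raw columns the ratio replaces, then split
X = data.drop(["Loan Status", "Annual Income", "Monthly Debt"], axis=1)
y = data["Loan Status"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print("Accuracy:", round(acc, 3))
```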

  32. An excellent tutorial: DataCamp article on categorical data.

  33. Let's practice!

  34. Ensemble methods

  35. Ensemble learning techniques: bootstrap aggregation (bagging), boosting, and model stacking.
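A minimal scikit-learn sketch of the three techniques on the same synthetic regression data (an assumption); the estimator counts and the `RidgeCV` meta-model for stacking are illustrative choices, not settings from the slides:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                              StackingRegressor)
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic data (assumption)
X, y = make_regression(n_samples=300, n_features=8, n_informative=4,
                       noise=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: decision trees fit on bootstrap samples, predictions averaged
bagging = BaggingRegressor(n_estimators=50, random_state=0)
# Boosting: shallow trees fit sequentially, each to the errors of the ensemble so far
boosting = GradientBoostingRegressor(n_estimators=100, random_state=0)
# Stacking: a final model learns how to combine the base models' predictions
stacking = StackingRegressor(
    estimators=[("bagging", bagging), ("boosting", boosting)],
    final_estimator=RidgeCV(),
)

scores = {}
for name, model in [("bagging", bagging), ("boosting", boosting), ("stacking", stacking)]:
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)   # R-squared on held-out data
    print(f"{name}: R^2 = {scores[name]:.3f}")
```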

  36. Error measurement

  37. Short trees

  38. Tall trees

  39. Fat trees

  40. Linear model

  41. Bias: an incorrect assumption of a linear relationship leads to high bias, underfitting, and poor model generalization. Increasing model complexity decreases bias.

  42. Complex model

  43. Variance: high-complexity models have high variance, which leads to overfitting and poor model generalization.
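The bias and variance slides can be illustrated by fitting polynomials of increasing degree to non-linear data. A minimal sketch, assuming a noisy sine curve with 30 training points; degree 1 plays the role of the (incorrect) linear model and degree 15 the overly complex one. Typically degree 1 has high training and test error (underfitting), while degree 15 has the lowest training error but a test error that is usually worse than a moderate degree (overfitting):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic non-linear data (assumption): a noisy sine curve
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, 30).reshape(-1, 1)
y_train = np.sin(2 * np.pi * X_train).ravel() + rng.normal(0, 0.2, 30)
X_test = rng.uniform(0, 1, 200).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test).ravel() + rng.normal(0, 0.2, 200)

results = {}
for degree in (1, 4, 15):
    # Degree 1 = linear model (high bias); degree 15 = very flexible model (high variance)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```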
