Why generate features?


Why generate features? Feature Engineering for Machine Learning in Python. Robert O'Callaghan, Director of Data Science, Ordergroove.


  1. Why generate features?
     Feature Engineering for Machine Learning in Python
     Robert O'Callaghan, Director of Data Science, Ordergroove

  2. Feature Engineering

  3. Different types of data
     Continuous: either integers (or whole numbers) or floats (decimals)
     Categorical: one of a limited set of values, e.g. gender, country of birth
     Ordinal: ranked values, often with no detail of distance between them
     Boolean: True/False values
     Datetime: dates and times

  4. Course structure
     Chapter 1: Feature creation and extraction
     Chapter 2: Engineering messy data
     Chapter 3: Feature normalization
     Chapter 4: Working with text features

  5. Pandas

         import pandas as pd
         df = pd.read_csv(path_to_csv_file)
         print(df.head())

  6. Dataset

                     SurveyDate  \
         0  2018-02-28 20:20:00
         1  2018-06-28 13:26:00
         2  2018-06-06 03:37:00
         3  2018-05-09 01:06:00
         4  2018-04-12 22:41:00

                                     FormalEducation
         0  Bachelor's degree (BA. BS. B.Eng.. etc.)
         1  Bachelor's degree (BA. BS. B.Eng.. etc.)
         2  Bachelor's degree (BA. BS. B.Eng.. etc.)
         3         Some college/university study ...
         4  Bachelor's degree (BA. BS. B.Eng.. etc.)

  7. Column names

         print(df.columns)

         Index(['SurveyDate', 'FormalEducation', 'ConvertedSalary', 'Hobby',
                'Country', 'StackOverflowJobsRecommend', 'VersionControl', 'Age',
                'Years Experience', 'Gender', 'RawSalary'],
               dtype='object')

  8. Column types

         print(df.dtypes)

         SurveyDate           object
         FormalEducation      object
         ConvertedSalary     float64
         ...
         Years Experience      int64
         Gender               object
         RawSalary            object
         dtype: object

  9. Selecting specific data types

         only_ints = df.select_dtypes(include=['int'])
         print(only_ints.columns)

         Index(['Age', 'Years Experience'], dtype='object')

  10. Let's get going!
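The type inspection and selection steps from this lesson can be tried without the survey CSV. The sketch below is only an illustration: the small DataFrame is hypothetical toy data that borrows a few column names from the slides, and the `'number'` selector is an addition that does not appear in the deck.

```python
import pandas as pd

# Hypothetical toy data standing in for the survey CSV used on the slides.
df = pd.DataFrame({
    "Age": [21, 38, 45],
    "Years Experience": [1, 12, 20],
    "ConvertedSalary": [30000.0, 95000.0, None],
    "Country": ["USA", "UK", "India"],
})

# Inspect the type pandas inferred for each column.
print(df.dtypes)

# Keep only the integer columns, as on the "Selecting specific data types" slide.
only_ints = df.select_dtypes(include=["int"])
print(only_ints.columns)

# Assumption beyond the slides: 'number' selects integer and float columns together.
only_numeric = df.select_dtypes(include=["number"])
print(only_numeric.columns)
```

Selecting by type first is a convenient way to route numeric and categorical columns to different feature engineering steps.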

  11. Dealing with Categorical Variables
      Robert O'Callaghan, Director of Data Science, Ordergroove

  12. Encoding categorical features

  13. Encoding categorical features

  14. Encoding categorical features
      One-hot encoding
      Dummy encoding

  15. One-hot encoding

          pd.get_dummies(df, columns=['Country'], prefix='C')

             C_France  C_India  C_UK  C_USA
          0         0        1     0      0
          1         0        0     0      1
          2         0        0     1      0
          3         0        0     1      0
          4         1        0     0      0

  16. Dummy encoding

          pd.get_dummies(df, columns=['Country'], drop_first=True, prefix='C')

             C_India  C_UK  C_USA
          0        1     0      0
          1        0     0      1
          2        0     1      0
          3        0     1      0
          4        0     0      0

  17. One-hot vs. dummies
      One-hot encoding: Explainable features
      Dummy encoding: Necessary information without duplication

  18. 

          Index  Sex
          0      Male
          1      Female
          2      Male

          Index  Male  Female
          0      1     0
          1      0     1
          2      1     0

          Index  Male
          0      1
          1      0
          2      1

  19. Limiting your columns

          counts = df['Country'].value_counts()
          print(counts)

          'USA'       8
          'UK'        6
          'India'     2
          'France'    1
          Name: Country, dtype: int64

  20. Limiting your columns

          mask = df['Country'].isin(counts[counts < 5].index)
          df.loc[mask, 'Country'] = 'Other'
          print(df['Country'].value_counts())

          'USA'      8
          'UK'       6
          'Other'    3
          Name: Country, dtype: int64

  21. Now you deal with categorical variables
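The categorical steps in this lesson can be chained: group rare categories first, then encode. The following is a minimal sketch on hypothetical toy data that reuses the country counts from the slides; the threshold of 5 is the one shown on the slide, and the variable names `rare`, `one_hot` and `dummy` are illustrative only.

```python
import pandas as pd

# Hypothetical toy column with the same counts as the slides:
# USA 8, UK 6, India 2, France 1.
df = pd.DataFrame(
    {"Country": ["USA"] * 8 + ["UK"] * 6 + ["India"] * 2 + ["France"]}
)

# Find categories that occur fewer than 5 times and relabel them as 'Other'.
counts = df["Country"].value_counts()
rare = counts[counts < 5].index
df.loc[df["Country"].isin(rare), "Country"] = "Other"
print(df["Country"].value_counts())

# One-hot encoding: one indicator column per remaining category.
one_hot = pd.get_dummies(df, columns=["Country"], prefix="C")

# Dummy encoding: drop the first category, which is then implied
# by all other indicator columns being zero.
dummy = pd.get_dummies(df, columns=["Country"], prefix="C", drop_first=True)

print(one_hot.columns.tolist())
print(dummy.columns.tolist())
```

Grouping before encoding keeps the number of generated columns small, which is the point of the "Limiting your columns" slides.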

  22. Numeric variables
      Robert O'Callaghan, Director of Data Science, Ordergroove

  23. Types of numeric features
      Age
      Price
      Counts
      Geospatial data

  24. Does size matter?

  25. Binarizing numeric variables

          df['Binary_Violation'] = 0
          df.loc[df['Number_of_Violations'] > 0, 'Binary_Violation'] = 1

  26. Binarizing numeric variables

  27. Binning numeric variables

          import numpy as np
          df['Binned_Group'] = pd.cut(
              df['Number_of_Violations'],
              bins=[-np.inf, 0, 2, np.inf],
              labels=[1, 2, 3]
          )

  28. Binning numeric variables

  29. Let's start practicing!
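To see what the binarizing and binning slides produce, it helps to run them on a handful of values. This is a sketch under stated assumptions: the five violation counts are hypothetical, and the binary column is built with a boolean comparison cast to `int`, which should give the same result as the two-step assignment on the slide.

```python
import numpy as np
import pandas as pd

# Hypothetical violation counts; the column name follows the slides.
df = pd.DataFrame({"Number_of_Violations": [0, 1, 2, 3, 7]})

# Binarize: 1 if there was at least one violation, otherwise 0.
df["Binary_Violation"] = (df["Number_of_Violations"] > 0).astype(int)

# Bin with the edges from the slide. Intervals are closed on the right
# by default, so the groups are (-inf, 0], (0, 2] and (2, inf).
df["Binned_Group"] = pd.cut(
    df["Number_of_Violations"],
    bins=[-np.inf, 0, 2, np.inf],
    labels=[1, 2, 3],
)

print(df)
```

Because the bins are right-closed, a count of 2 lands in group 2 and a count of 3 lands in group 3.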
