Data distributions

Data distributions: Feature Engineering for Machine Learning in Python - PowerPoint PPT Presentation



  1. Data distributions: Feature Engineering for Machine Learning in Python. Robert O'Callaghan, Director of Data Science, Ordergroove

  2. Distribution assumptions

  3. Observing your data
      # pyplot is the plotting interface; `import matplotlib as plt` would not expose show()
      import matplotlib.pyplot as plt

      df.hist()
      plt.show()

  4. Delving deeper with box plots

  5. Box plots in pandas
      df[['column_1']].boxplot()
      plt.show()

  6. Pairing distributions
      import seaborn as sns

      sns.pairplot(df)

  7. Further details on your distributions
      df.describe()

  8. Let's practice!
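The inspection steps in slides 3 to 7 can be tied together numerically. The sketch below is an illustration only: the toy DataFrame and its column names are made up, and the 1.5 x IQR whisker rule is the common box-plot default rather than something stated on the slides. It reads the quartiles from `df.describe()`, derives the whisker limits a box plot would typically draw, and uses `df.skew()` as a rough check on symmetry.

```python
import pandas as pd

# Hypothetical toy data: one symmetric column and one right-skewed column
df = pd.DataFrame({
    "symmetric": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "skewed": [1, 1, 2, 2, 3, 3, 4, 5, 100],
})

# describe() reports count, mean, std, min, quartiles, and max per column
summary = df.describe()

# Quartiles and interquartile range, the quantities a box plot is built from
q1 = summary.loc["25%"]
q3 = summary.loc["75%"]
iqr = q3 - q1

# Assumed whisker rule: points beyond 1.5 * IQR from the box are drawn as fliers
upper_whisker_limit = q3 + 1.5 * iqr
lower_whisker_limit = q1 - 1.5 * iqr

# Count the points per column that fall outside the whisker limits
beyond_whiskers = ((df > upper_whisker_limit) | (df < lower_whisker_limit)).sum()

# Sample skewness: close to 0 for symmetric data, positive for a long right tail
skewness = df.skew()

print(beyond_whiskers)
print(skewness)
```

A column with many points beyond the whiskers or a skewness far from zero is a candidate for the transformations and outlier handling covered later in the deck.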

  9. Scaling and transformations. Robert O'Callaghan, Data Scientist

  10. Scaling data

  11. Min-Max scaling

  12. Min-Max scaling

  13. Min-Max scaling in Python
      from sklearn.preprocessing import MinMaxScaler

      scaler = MinMaxScaler()
      scaler.fit(df[['Age']])
      df['normalized_age'] = scaler.transform(df[['Age']])

  14. Standardization

  15. Standardization in Python
      from sklearn.preprocessing import StandardScaler

      scaler = StandardScaler()
      scaler.fit(df[['Age']])
      df['standardized_col'] = scaler.transform(df[['Age']])

  16. Log transformation

  17. Log transformation in Python
      from sklearn.preprocessing import PowerTransformer

      # Note: PowerTransformer defaults to the Yeo-Johnson method, a
      # log-like power transform rather than a literal logarithm
      log = PowerTransformer()
      log.fit(df[['ConvertedSalary']])
      df['log_ConvertedSalary'] = log.transform(df[['ConvertedSalary']])

  18. Final Slide
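The scalers in slides 13 to 17 can be read as simple column arithmetic. The following is a minimal sketch, not the course's own code: the toy values are invented, and the plain `np.log` call is a swapped-in stand-in for the `PowerTransformer` on slide 17, which applies a different (Yeo-Johnson) transform. Standardization here uses `ddof=0` because `StandardScaler` divides by the population standard deviation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data reusing the column names from the slides
df = pd.DataFrame({
    "Age": [20.0, 30.0, 40.0, 60.0],
    "ConvertedSalary": [10_000.0, 20_000.0, 40_000.0, 1_000_000.0],
})
age = df["Age"]

# Min-Max scaling: (x - min) / (max - min), mapping the column onto [0, 1]
df["normalized_age"] = (age - age.min()) / (age.max() - age.min())

# Standardization: (x - mean) / std, giving mean 0 and unit variance.
# ddof=0 matches StandardScaler; pandas' default std() uses ddof=1.
df["standardized_age"] = (age - age.mean()) / age.std(ddof=0)

# Plain natural log (assumes strictly positive values); compresses the long right tail
df["log_ConvertedSalary"] = np.log(df["ConvertedSalary"])

print(df)
```

Min-Max scaling keeps the shape of the distribution but is sensitive to extreme values, since a single large maximum squeezes everything else toward 0; standardization and log-style transforms are less affected by this.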

  19. Removing outliers. Robert O'Callaghan, Director of Data Science, Ordergroove

  20. What are outliers?

  21. Quantile-based detection

  22. Quantiles in Python
      # Keep only rows below the 95th percentile
      q_cutoff = df['col_name'].quantile(0.95)
      mask = df['col_name'] < q_cutoff
      trimmed_df = df[mask]

  23. Standard deviation-based detection

  24. Standard deviation detection in Python
      mean = df['col_name'].mean()
      std = df['col_name'].std()

      # Keep only rows within three standard deviations of the mean
      cut_off = std * 3
      lower, upper = mean - cut_off, mean + cut_off
      new_df = df[(df['col_name'] < upper) & (df['col_name'] > lower)]

  25. Let's practice!
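To see how the two detection rules from slides 22 and 24 differ in practice, the sketch below runs both on the same made-up column: the integers 1 to 99 plus a single extreme value of 1000. The data is hypothetical; the variable names follow the slides.

```python
import pandas as pd

# Hypothetical data: 99 ordinary values and one extreme value
df = pd.DataFrame({"col_name": list(range(1, 100)) + [1000]})

# Quantile-based rule: always drops roughly the top 5% of rows,
# whether or not those rows are actually unusual
q_cutoff = df["col_name"].quantile(0.95)
trimmed_df = df[df["col_name"] < q_cutoff]

# Standard-deviation rule: drops only rows more than 3 std from the mean
mean = df["col_name"].mean()
std = df["col_name"].std()
cut_off = std * 3
lower, upper = mean - cut_off, mean + cut_off
new_df = df[(df["col_name"] < upper) & (df["col_name"] > lower)]

print(len(df), len(trimmed_df), len(new_df))
```

On this data the quantile rule removes the extreme value together with four ordinary ones, while the three-standard-deviation rule removes only the value 1000. Which behavior is preferable depends on whether a fixed share of rows or only genuinely extreme rows should be discarded.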

  26. Scaling and transforming new data. Robert O'Callaghan, Director of Data Science, Ordergroove

  27. Reuse training scalers
      import pandas as pd
      from sklearn.preprocessing import StandardScaler

      # Fit the scaler on the training data only
      scaler = StandardScaler()
      scaler.fit(train[['col']])
      train['scaled_col'] = scaler.transform(train[['col']])

      # FIT SOME MODEL
      # ....

      # Apply the already-fitted scaler to the test data (no refitting)
      test = pd.read_csv('test_csv')
      test['scaled_col'] = scaler.transform(test[['col']])

  28. Training transformations for reuse
      # Single brackets select a Series, so the statistics are scalars
      # and the comparisons below filter rows
      train_mean = train['col'].mean()
      train_std = train['col'].std()

      cut_off = train_std * 3
      train_lower = train_mean - cut_off
      train_upper = train_mean + cut_off

      # Subset train data
      train = train[(train['col'] < train_upper) & (train['col'] > train_lower)]

      test = pd.read_csv('test_csv')

      # Subset test data using the thresholds learned from the training data
      test = test[(test['col'] < train_upper) & (test['col'] > train_lower)]

  29. Why only use training data? Data leakage: using data that you won't have access to when assessing the performance of your model.

  30. Avoid data leakage!
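The point of slides 27 to 29 can be checked with a small experiment. This is a sketch under stated assumptions: the `train` and `test` frames are invented in place of the CSV files on the slides, and `.ravel()` is added only to flatten the scaler's 2D output before assignment. The scaler is fitted on the training column alone, so the test data never influences the learned mean and scale.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical train/test split; the test values sit higher than the training values
train = pd.DataFrame({"col": [1.0, 2.0, 3.0, 4.0, 5.0]})
test = pd.DataFrame({"col": [4.0, 6.0, 8.0]})

# Learn the mean and standard deviation from the training data only
scaler = StandardScaler()
scaler.fit(train[["col"]])

# Apply the same fitted transformation to both sets
train["scaled_col"] = scaler.transform(train[["col"]]).ravel()
test["scaled_col"] = scaler.transform(test[["col"]]).ravel()

print(scaler.mean_[0])             # mean learned from the training data
print(train["scaled_col"].mean())  # approximately 0 by construction
print(test["scaled_col"].mean())   # not 0: test is expressed in training units
```

The scaled test column does not have mean zero, and that is expected. Refitting the scaler on the test set would hide the shift between the two sets and would use information that is not available when the model is actually deployed.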
