Model evaluation - PRACTICING MACHINE LEARNING INTERVIEW QUESTIONS IN R - PowerPoint PPT Presentation


SLIDE 1

Model evaluation

PRACTICING MACHINE LEARNING INTERVIEW QUESTIONS IN R

Rafael Falcon

Data Scientist at Shopify

SLIDE 2

Q: What aspects need to be considered when you evaluate a Machine Learning model?

  • 1. Type of Machine Learning task

    Classification, regression, or clustering

  • 2. Carefully choose your performance metrics
  • 3. Get a realistic performance estimate

    Split data into training/validation/test sets; use cross-validation

SLIDE 3

Classification: confusion matrix
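The confusion-matrix figure on this slide does not survive the export; as a minimal sketch in base R, a 2×2 confusion matrix can be built with `table()` (the label vectors below are made up for illustration):

```r
# Hypothetical ground-truth labels and model predictions
actual    <- factor(c("pos", "pos", "neg", "neg", "pos", "neg"),
                    levels = c("pos", "neg"))
predicted <- factor(c("pos", "neg", "neg", "neg", "pos", "pos"),
                    levels = c("pos", "neg"))

# Rows = predictions, columns = ground truth
conf_mat <- table(Predicted = predicted, Actual = actual)
print(conf_mat)
```

The diagonal cells are the correct predictions (TP and TN); the off-diagonal cells are the FP and FN counts used by the metrics on the next slides.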

SLIDE 4

Classification: accuracy

Accuracy: proportion of correctly classified examples
  useful when errors in predicting all classes are equally important
  beware of class imbalance scenarios!
    always predicting the most frequent class → high accuracy
  cost-sensitive accuracy weights each error type by its misclassification cost

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Cost-sensitive accuracy = (TP + TN) / (TP + TN + c1·FP + c2·FN)
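These formulas can be checked with a quick base-R sketch (the counts and costs below are made up):

```r
# Made-up confusion-matrix counts
TP <- 50; TN <- 30; FP <- 10; FN <- 10

accuracy <- (TP + TN) / (TP + TN + FP + FN)   # 80 / 100 = 0.8

# Cost-sensitive variant: weight each error type by a misclassification cost
c1 <- 2; c2 <- 5   # hypothetical costs for FP and FN
cost_sensitive_accuracy <- (TP + TN) / (TP + TN + c1 * FP + c2 * FN)
# 80 / 150 — penalizing FNs more drags the score down
```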

SLIDE 5

Q: What is the ROC curve?

Receiver Operating Characteristic (ROC)
  For models that return class probabilities
  Is the model able to distinguish between the classes?
  For each possible classification threshold:
    True Positive Rate:  TPR = TP / (TP + FN)
    False Positive Rate: FPR = FP / (FP + TN)
  Area under the ROC Curve (AUC): higher is better
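The TPR/FPR definitions can be sketched in base R by fixing one threshold over hypothetical class probabilities (labels and scores below are made up; sweeping all thresholds traces the ROC curve):

```r
# Hypothetical labels (1 = positive) and predicted probabilities
labels <- c(1, 1, 1, 0, 0, 1, 0, 0)
probs  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1)

rates_at <- function(threshold) {
  pred <- as.integer(probs >= threshold)
  TP <- sum(pred == 1 & labels == 1); FN <- sum(pred == 0 & labels == 1)
  FP <- sum(pred == 1 & labels == 0); TN <- sum(pred == 0 & labels == 0)
  c(TPR = TP / (TP + FN), FPR = FP / (FP + TN))
}

r <- rates_at(0.5)   # one point on the ROC curve
```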

SLIDE 6

Regression: Root Mean Squared Error (RMSE)

Average distance between model predictions and the ground truth (actual values)
Easy to compute
In the same units as the response variable
  Example: y = house price, RMSE = 8,000
  → the model is $8,000 off from the true house price on average

RMSE = sqrt( (1/n) · Σ (yᵢ − ŷᵢ)² )
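RMSE is a one-liner in base R (the house prices below are made up):

```r
# Toy house prices: ground truth vs. model predictions
actual    <- c(200000, 310000, 150000, 420000)
predicted <- c(205000, 300000, 160000, 415000)

rmse <- sqrt(mean((actual - predicted)^2))
rmse   # in dollars, the same units as the response
```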

SLIDE 7

Clustering: validity indices

No label information
Two criteria to consider:
  compact clusters
  well-separated clusters
Several validity indices:
  Dunn's index
  Davies-Bouldin index
  Silhouette index
  etc.
Use multiple indices!
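As one sketch of a validity index, the silhouette can be computed with the recommended `cluster` package that ships with R (the two-blob data set below is made up):

```r
library(cluster)

set.seed(1)
# Two well-separated 2-D blobs
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
km <- kmeans(x, centers = 2)

# Silhouette width per point: near 1 → compact and well separated
sil <- silhouette(km$cluster, dist(x))
mean(sil[, "sil_width"])
```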

SLIDE 8

Let's practice!

SLIDE 9

Handling imbalanced data


Rafael Falcon

Data Scientist at Shopify

SLIDE 10

Nuclear submarine detection

SLIDE 11

Results are in!

SLIDE 12

Frequency of the decision classes

SLIDE 13

Q: What is imbalanced classification?

Large disparity in the frequencies of the decision classes.
The accuracy metric is especially sensitive to these scenarios:
  always predicting the majority class → high accuracy!
Two popular avenues:
  cost-sensitive classification
  subsampling imbalanced data

SLIDE 14

Cost-sensitive classification

Misclassication cost for minority classes is higher than for the majority class.

SLIDE 15

Q: How to subsample imbalanced data?

Subsample the training data in a way that mitigates the class imbalance.
Three common approaches:
  downsampling
  upsampling
  SMOTE

SLIDE 16

Downsampling

Reduce the frequency of the overrepresented classes to match the frequency of the underrepresented classes.
Example:
  Before: majority class (80 samples), minority class (20 samples)
  After:  majority class (20 samples), minority class (20 samples)

SLIDE 17

Upsampling

Increase the frequency of the underrepresented classes to match the frequency of the overrepresented classes
  random sampling with replacement
Example:
  Before: majority class (80 samples), minority class (20 samples)
  After:  majority class (80 samples), minority class (80 samples)
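Both schemes can be sketched in base R (`caret` also provides `downSample()`/`upSample()`; the 80/20 data frame below is made up to match the example):

```r
set.seed(42)
df <- data.frame(x = rnorm(100),
                 class = rep(c("majority", "minority"), times = c(80, 20)))

majority_df <- df[df$class == "majority", ]
minority_df <- df[df$class == "minority", ]

# Downsampling: shrink the majority class to the minority size
down <- rbind(majority_df[sample(nrow(majority_df), nrow(minority_df)), ],
              minority_df)

# Upsampling: grow the minority class by sampling with replacement
up <- rbind(majority_df,
            minority_df[sample(nrow(minority_df), nrow(majority_df),
                               replace = TRUE), ])

table(down$class)   # 20 / 20
table(up$class)     # 80 / 80
```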

SLIDE 18

SMOTE

Synthetic Minority Oversampling TEchnique
Generates new (synthetic) instances from the minority class by interpolating between a minority sample and one of its nearest minority-class neighbours.
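The SMOTE illustration slides that follow are image-only in this export; as an illustrative sketch (not the reference implementation), the interpolation idea looks like this in base R:

```r
# Minimal SMOTE-style sketch: each synthetic point lies on the segment
# between a minority sample and one of its k nearest minority neighbours.
smote_sketch <- function(X, k = 3) {
  D <- as.matrix(dist(X))                    # distances within the minority class
  t(sapply(seq_len(nrow(X)), function(i) {
    neighbours <- order(D[i, ])[2:(k + 1)]   # skip the point itself
    j   <- sample(neighbours, 1)
    gap <- runif(1)                          # random interpolation factor in [0, 1]
    X[i, ] + gap * (X[j, ] - X[i, ])
  }))
}

set.seed(7)
minority  <- matrix(rnorm(20), ncol = 2)     # 10 made-up minority samples
synthetic <- smote_sketch(minority)
dim(synthetic)   # 10 new minority instances
```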

SLIDE 19

SMOTE

SLIDE 20

SMOTE

SLIDE 21

Subsampling before model evaluation

SLIDE 22

Subsampling as part of model evaluation

SLIDE 23

Let's practice!

SLIDE 24

Hyperparameter tuning


Rafael Falcon

Data Scientist at Shopify

SLIDE 25

Q: Model parameter vs. hyperparameter?

Parameters vs. hyperparameters

Parameters: learned by the model during training
  • often in an iterative manner

Hyperparameters: not learned but specified prior to training
  influence different aspects of the training process
  do not change as training unfolds
  tuned as part of a meta-learning process

SLIDE 26

Example: Multi Layer Perceptron (MLP)

Parameters:
  weight matrix
  bias vector
Hyperparameters:
  learning rate
  number of hidden layers
  number of hidden neurons per layer

SLIDE 27

Example: K-means clustering

Parameters:
  cluster prototypes (centroids)
Hyperparameters:
  number of clusters K
  centroid initialization method

SLIDE 28

Q: What is hyperparameter tuning and how is it done?

Finding adequate hyperparameter values
Iterative process:
  generate a hyperparameter vector
  train the model with this vector
  evaluate model performance
Computationally expensive!

SLIDE 29

Hyperparameter tuning strategies

Three main strategies:
  grid search
  random search
  informed search

SLIDE 30

Grid search

Exhaustive search over a manually specified subset of the hyperparameter space
All possible combinations are considered
  expensive but highly parallelizable
Example: α ∈ [0, 1], β ∈ [2, 5]
Sample each hyperparameter range:

α ∈ {0.2, 0.5, 0.8}
β ∈ {2, 3, 4, 5}

3 × 4 = 12 hyperparameter vectors are tested (Cartesian product)
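The 3 × 4 grid from the example can be enumerated with base R's `expand.grid()` (the scoring function below is a made-up placeholder for model training and evaluation):

```r
# Cartesian product of the sampled hyperparameter values
grid <- expand.grid(alpha = c(0.2, 0.5, 0.8), beta = 2:5)
nrow(grid)   # 12 hyperparameter vectors

# Evaluate every combination with a placeholder objective
score <- function(alpha, beta) -(alpha - 0.5)^2 - (beta - 3)^2
grid$score <- mapply(score, grid$alpha, grid$beta)

best <- grid[which.max(grid$score), ]
best   # alpha = 0.5, beta = 3 under this toy objective
```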

SLIDE 31

Random search

Randomly selects hyperparameter vectors
  discrete: sample from a set of values
  continuous: sample from a hyperparameter distribution
Highly parallelizable
Can outperform grid search when only a few hyperparameters affect model performance
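The sampling step can be sketched in base R with the same two hyperparameters as the grid-search example (the iteration budget is made up):

```r
# Random search: sample candidate vectors instead of enumerating a grid
set.seed(123)
n_iter <- 20
candidates <- data.frame(
  alpha = runif(n_iter, min = 0, max = 1),      # continuous: sample a distribution
  beta  = sample(2:5, n_iter, replace = TRUE)   # discrete: sample a value set
)
head(candidates)   # each row is one hyperparameter vector to train and score
```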

SLIDE 32

Informed search

Bayesian optimization:
  probabilistically maps a hyperparameter vector to a model performance indicator
  samples more frequently around promising hyperparameter vectors
  better results in fewer evaluations compared to grid and random search

SLIDE 33

Hyperparameter tuning in R

Several R packages available:

caret
mlr
h2o

Check out the Hyperparameter tuning in R DataCamp course! We will show you how to tune hyperparameters using caret in the exercises.

SLIDE 34

Let's practice!

SLIDE 35

Random Forests or Gradient Boosted Trees?


Rafael Falcon

Data Scientist at Shopify

SLIDE 36

Q: Commonalities between RFs and GBTs?

They are both top-performing ensemble models.
Suitable for both classification and regression tasks.
Decision trees as base learners.
Can handle missing values.
Provide a model-specific variable importance metric.

SLIDE 37

Q: Main differences between RFs and GBTs?

Random Forests                Gradient Boosted Trees
--------------                ----------------------
Bagging ensemble              Boosting ensemble
Deeper decision trees         Shallower decision trees
Aimed at reducing variance    Aimed at reducing bias
Trees grown in parallel       Trees grown sequentially
Easier to tune                Harder to tune
All trees used                Trees added as needed

SLIDE 38

R implementation

There are multiple R packages that implement RFs and GBTs.

library(randomForest)
library(ranger)
library(gbm)
library(xgboost)
library(caret)

SLIDE 39

Hyperparameter tuning: RFs

library(randomForest)
# Tunes mtry
tunedModel <- tuneRF(x = predictors, y = response, ntreeTry = 500)

library(caret)
# Tunes mtry by default, others if configured
tunedModel <- train(x = predictors, y = response, method = 'rf')

SLIDE 40

Hyperparameter tuning: GBTs

library(gbm)
# Tunes n.trees based on CV/OOB error
opt_ntree_cv  <- gbm.perf(model, method = "cv")
opt_ntree_oob <- gbm.perf(model, method = "OOB")

library(caret)
# Tunes several hyperparameters
model <- train(x = predictors, y = response, method = 'xgbLinear')

SLIDE 41

Let's practice!

SLIDE 42

You made it!


Rafael Falcon

Data Scientist at Shopify

SLIDE 43

Chapter 1: Data preprocessing and visualization

Data normalization
  max-min scaling vs. standardization
Handling missing data
  exploration and visualization
  imputation methods
Anomaly detection
  IQR rule
  KNN distance score
  Local Outlier Factor (LOF)
Package list (in alphabetical order):

dbscan dplyr FNN ggplot2 naniar simputation tidyr

SLIDE 44

Chapter 2: Supervised learning

Model interpretability
  linear regression, decision trees
Regularization
  Ridge, Lasso and Elastic net regression
Bias and variance
  bias-variance analysis
Model ensembles
  bagging, boosting, stacking
Package list (in alphabetical order):

caret caretEnsemble dplyr e1071 elasticnet Metrics nnet rattle rpart rpart.plot

SLIDE 45

Chapter 3: Unsupervised learning

K-means clustering
  checking assumptions
  determining the optimal number of clusters
Clustering algorithms
  hierarchical, K-means, PAM
Cluster validity indices
Feature selection
  filter, wrapper, embedded methods
Feature extraction
  PCA, LDA
Package list (in alphabetical order):

caret clValid dplyr MASS Metrics plot3D stats

SLIDE 46

Chapter 4: Model selection and evaluation

Model evaluation metrics
  classification, regression, clustering
Handling imbalanced data
  downsampling, upsampling, SMOTE
Hyperparameter tuning
  grid search, random search
Random Forests vs. Gradient Boosted Trees
  commonalities and differences
Package list (in alphabetical order):

caret clValid dplyr gbm Metrics randomForest

SLIDE 47

Next steps

Keep learning! DataCamp courses on:
  Deep Learning
  Model validation
  Machine Learning with Apache Spark
  Linear classifiers
  etc.
Your constructive feedback about this course is important!

SLIDE 48

Keep up the great work!
