Model evaluation - PRACTICING MACHINE LEARNING INTERVIEW QUESTIONS IN R - PowerPoint PPT Presentation


SLIDE 1

Model evaluation

PRACTICING MACHINE LEARNING INTERVIEW QUESTIONS IN R

Rafael Falcon

Data Scientist at Shopify

SLIDE 2

Q: What aspects need to be considered when you evaluate a Machine Learning model?

  • 1. Type of Machine Learning task

    Classification, regression, or clustering

  • 2. Carefully choose your performance metrics
  • 3. Get a realistic performance estimate

    Split data into training/validation/test sets; use cross-validation

SLIDE 3

Classification: confusion matrix
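The confusion-matrix figure on this slide does not survive the export; as a minimal sketch in base R, a 2×2 confusion matrix can be built with `table()` (the label vectors below are made up for illustration):

```r
# Hypothetical ground-truth labels and model predictions
actual    <- factor(c("pos", "pos", "neg", "neg", "pos", "neg"),
                    levels = c("pos", "neg"))
predicted <- factor(c("pos", "neg", "neg", "neg", "pos", "pos"),
                    levels = c("pos", "neg"))

# Rows = predictions, columns = ground truth
conf_mat <- table(Predicted = predicted, Actual = actual)
print(conf_mat)
```

The diagonal cells are the correct predictions (TP and TN); the off-diagonal cells are the FP and FN counts used by the metrics on the next slides.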

SLIDE 4

Classification: accuracy

Accuracy: proportion of correctly classified examples
  useful when errors in predicting all classes are equally important
  beware of class imbalance scenarios!
    always predicting the most frequent class → high accuracy
  cost-sensitive accuracy weights each error type by its misclassification cost

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Cost-sensitive accuracy = (TP + TN) / (TP + TN + c1·FP + c2·FN)
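These formulas can be checked with a quick base-R sketch (the counts and costs below are made up):

```r
# Made-up confusion-matrix counts
TP <- 50; TN <- 30; FP <- 10; FN <- 10

accuracy <- (TP + TN) / (TP + TN + FP + FN)   # 80 / 100 = 0.8

# Cost-sensitive variant: weight each error type by a misclassification cost
c1 <- 2; c2 <- 5   # hypothetical costs for FP and FN
cost_sensitive_accuracy <- (TP + TN) / (TP + TN + c1 * FP + c2 * FN)
# 80 / 150 — penalizing FNs more drags the score down
```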

SLIDE 5

Q: What is the ROC curve?

Receiver Operating Characteristic (ROC)
  For models that return class probabilities
  Is the model able to distinguish between the classes?
  For each possible classification threshold:
    True Positive Rate:  TPR = TP / (TP + FN)
    False Positive Rate: FPR = FP / (FP + TN)
  Area under the ROC Curve (AUC): higher is better
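The TPR/FPR definitions can be sketched in base R by fixing one threshold over hypothetical class probabilities (labels and scores below are made up; sweeping all thresholds traces the ROC curve):

```r
# Hypothetical labels (1 = positive) and predicted probabilities
labels <- c(1, 1, 1, 0, 0, 1, 0, 0)
probs  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1)

rates_at <- function(threshold) {
  pred <- as.integer(probs >= threshold)
  TP <- sum(pred == 1 & labels == 1); FN <- sum(pred == 0 & labels == 1)
  FP <- sum(pred == 1 & labels == 0); TN <- sum(pred == 0 & labels == 0)
  c(TPR = TP / (TP + FN), FPR = FP / (FP + TN))
}

r <- rates_at(0.5)   # one point on the ROC curve
```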

SLIDE 6

Regression: Root Mean Squared Error (RMSE)

Average distance between model predictions and the ground truth (actual values)
Easy to compute
In the same units as the response variable
  Example: y = house price, RMSE = 8,000
  → the model is $8,000 off from the true house price on average

RMSE = sqrt( (1/n) · Σ (yᵢ − ŷᵢ)² )
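RMSE is a one-liner in base R (the house prices below are made up):

```r
# Toy house prices: ground truth vs. model predictions
actual    <- c(200000, 310000, 150000, 420000)
predicted <- c(205000, 300000, 160000, 415000)

rmse <- sqrt(mean((actual - predicted)^2))
rmse   # in dollars, the same units as the response
```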

SLIDE 7

Clustering: validity indices

No label information
Two criteria to consider:
  compact clusters
  well-separated clusters
Several validity indices:
  Dunn's index
  Davies-Bouldin index
  Silhouette index
  etc.
Use multiple indices!
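As one sketch of a validity index, the silhouette can be computed with the recommended `cluster` package that ships with R (the two-blob data set below is made up):

```r
library(cluster)

set.seed(1)
# Two well-separated 2-D blobs
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
km <- kmeans(x, centers = 2)

# Silhouette width per point: near 1 → compact and well separated
sil <- silhouette(km$cluster, dist(x))
mean(sil[, "sil_width"])
```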

SLIDE 8

Let's practice!

SLIDE 9

Handling imbalanced data


Rafael Falcon

Data Scientist at Shopify

SLIDE 10

Nuclear submarine detection

SLIDE 11

Results are in!

SLIDE 12

Frequency of the decision classes

SLIDE 13

Q: What is imbalanced classification?

Large disparity in the frequencies of the decision classes.
The accuracy metric is especially sensitive to these scenarios:
  always predicting the majority class → high accuracy!
Two popular avenues:
  cost-sensitive classification
  subsampling imbalanced data

SLIDE 14

Cost-sensitive classification

Misclassication cost for minority classes is higher than for the majority class.

SLIDE 15

Q: How to subsample imbalanced data?

Subsample the training data in a way that mitigates the class imbalance.
Three common approaches:
  downsampling
  upsampling
  SMOTE

SLIDE 16

Downsampling

Reduce the frequency of the overrepresented classes to match the frequency of the underrepresented classes.
Example:
  Before: majority class (80 samples), minority class (20 samples)
  After:  majority class (20 samples), minority class (20 samples)

SLIDE 17

Upsampling

Increase the frequency of the underrepresented classes to match the frequency of the overrepresented classes
  random sampling with replacement
Example:
  Before: majority class (80 samples), minority class (20 samples)
  After:  majority class (80 samples), minority class (80 samples)
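Both schemes can be sketched in base R (`caret` also provides `downSample()`/`upSample()`; the 80/20 data frame below is made up to match the example):

```r
set.seed(42)
df <- data.frame(x = rnorm(100),
                 class = rep(c("majority", "minority"), times = c(80, 20)))

majority_df <- df[df$class == "majority", ]
minority_df <- df[df$class == "minority", ]

# Downsampling: shrink the majority class to the minority size
down <- rbind(majority_df[sample(nrow(majority_df), nrow(minority_df)), ],
              minority_df)

# Upsampling: grow the minority class by sampling with replacement
up <- rbind(majority_df,
            minority_df[sample(nrow(minority_df), nrow(majority_df),
                               replace = TRUE), ])

table(down$class)   # 20 / 20
table(up$class)     # 80 / 80
```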

SLIDE 18

SMOTE

Synthetic Minority Oversampling TEchnique
Generates new (synthetic) instances from the minority class by interpolating between a minority sample and one of its nearest minority-class neighbours.
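The SMOTE illustration slides that follow are image-only in this export; as an illustrative sketch (not the reference implementation), the interpolation idea looks like this in base R:

```r
# Minimal SMOTE-style sketch: each synthetic point lies on the segment
# between a minority sample and one of its k nearest minority neighbours.
smote_sketch <- function(X, k = 3) {
  D <- as.matrix(dist(X))                    # distances within the minority class
  t(sapply(seq_len(nrow(X)), function(i) {
    neighbours <- order(D[i, ])[2:(k + 1)]   # skip the point itself
    j   <- sample(neighbours, 1)
    gap <- runif(1)                          # random interpolation factor in [0, 1]
    X[i, ] + gap * (X[j, ] - X[i, ])
  }))
}

set.seed(7)
minority  <- matrix(rnorm(20), ncol = 2)     # 10 made-up minority samples
synthetic <- smote_sketch(minority)
dim(synthetic)   # 10 new minority instances
```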

SLIDE 19

SMOTE

SLIDE 20

SMOTE

SLIDE 21

Subsampling before model evaluation

SLIDE 22

Subsampling as part of model evaluation

SLIDE 23

Let's practice!

SLIDE 24

Hyperparameter tuning


Rafael Falcon

Data Scientist at Shopify

SLIDE 25

Q: Model parameter vs. hyperparameter?

Parameters vs. hyperparameters

Parameters: learned by the model during training
  • often in an iterative manner

Hyperparameters: not learned but specified prior to training
  influence different aspects of the training process
  do not change as training unfolds
  tuned as part of a meta-learning process

SLIDE 26

Example: Multi Layer Perceptron (MLP)

Parameters:
  weight matrix
  bias vector
Hyperparameters:
  learning rate
  number of hidden layers
  number of hidden neurons per layer

SLIDE 27

Example: K-means clustering

Parameters:
  cluster prototypes (centroids)
Hyperparameters:
  number of clusters K
  centroid initialization method

SLIDE 28

Q: What is hyperparameter tuning and how is it done?

Finding adequate hyperparameter values
Iterative process:
  generate a hyperparameter vector
  train the model with this vector
  evaluate model performance
Computationally expensive!

SLIDE 29

Hyperparameter tuning strategies

Three main strategies:
  grid search
  random search
  informed search

SLIDE 30

Grid search

Exhaustive search over a manually specified subset of the hyperparameter space
All possible combinations are considered
  expensive but highly parallelizable
Example: α ∈ [0, 1], β ∈ [2, 5]
Sample each hyperparameter range:

α ∈ {0.2, 0.5, 0.8}
β ∈ {2, 3, 4, 5}

3 × 4 = 12 hyperparameter vectors are tested (Cartesian product)
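The 3 × 4 grid from the example can be enumerated with base R's `expand.grid()` (the scoring function below is a made-up placeholder for model training and evaluation):

```r
# Cartesian product of the sampled hyperparameter values
grid <- expand.grid(alpha = c(0.2, 0.5, 0.8), beta = 2:5)
nrow(grid)   # 12 hyperparameter vectors

# Evaluate every combination with a placeholder objective
score <- function(alpha, beta) -(alpha - 0.5)^2 - (beta - 3)^2
grid$score <- mapply(score, grid$alpha, grid$beta)

best <- grid[which.max(grid$score), ]
best   # alpha = 0.5, beta = 3 under this toy objective
```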

SLIDE 31

Random search

Randomly selects hyperparameter vectors
  discrete: sample from a set of values
  continuous: sample from a hyperparameter distribution
Highly parallelizable
Can outperform grid search when only a few hyperparameters affect model performance
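The sampling step can be sketched in base R with the same two hyperparameters as the grid-search example (the iteration budget is made up):

```r
# Random search: sample candidate vectors instead of enumerating a grid
set.seed(123)
n_iter <- 20
candidates <- data.frame(
  alpha = runif(n_iter, min = 0, max = 1),      # continuous: sample a distribution
  beta  = sample(2:5, n_iter, replace = TRUE)   # discrete: sample a value set
)
head(candidates)   # each row is one hyperparameter vector to train and score
```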

SLIDE 32

Informed search

Bayesian optimization:
  probabilistically maps a hyperparameter vector to a model performance indicator
  samples more frequently around promising hyperparameter vectors
  better results in fewer evaluations compared to grid and random search

SLIDE 33

Hyperparameter tuning in R

Several R packages available:

caret
mlr
h2o

Check out the Hyperparameter tuning in R DataCamp course! We will show you how to tune hyperparameters using caret in the exercises.

SLIDE 34

Let's practice!

SLIDE 35

Random Forests or Gradient Boosted Trees?


Rafael Falcon

Data Scientist at Shopify

SLIDE 36

Q: Commonalities between RFs and GBTs?

They are both top-performing ensemble models.
Suitable for both classification and regression tasks.
Decision trees as base learners.
Can handle missing values.
Provide a model-specific variable importance metric.

SLIDE 37

Q: Main differences between RFs and GBTs?

Random Forests                Gradient Boosted Trees
--------------                ----------------------
Bagging ensemble              Boosting ensemble
Deeper decision trees         Shallower decision trees
Aimed at reducing variance    Aimed at reducing bias
Trees grown in parallel       Trees grown sequentially
Easier to tune                Harder to tune
All trees used                Trees added as needed

SLIDE 38

R implementation

There are multiple R packages that implement RFs and GBTs.

library(randomForest)
library(ranger)
library(gbm)
library(xgboost)
library(caret)

SLIDE 39

Hyperparameter tuning: RFs

library(randomForest)
# Tunes mtry
tunedModel <- tuneRF(x = predictors, y = response, ntreeTry = 500)

library(caret)
# Tunes mtry by default, others if configured
tunedModel <- train(x = predictors, y = response, method = 'rf')

SLIDE 40

Hyperparameter tuning: GBTs

library(gbm)
# Tunes n.trees based on CV/OOB error
opt_ntree_cv  <- gbm.perf(model, method = "cv")
opt_ntree_oob <- gbm.perf(model, method = "OOB")

library(caret)
# Tunes several hyperparameters
model <- train(x = predictors, y = response, method = 'xgbLinear')

SLIDE 41

Let's practice!

SLIDE 42

You made it!


Rafael Falcon

Data Scientist at Shopify

SLIDE 43

Chapter 1: Data preprocessing and visualization

Data normalization
  max-min scaling vs. standardization
Handling missing data
  exploration and visualization
  imputation methods
Anomaly detection
  IQR rule
  KNN distance score
  Local Outlier Factor (LOF)
Package list (in alphabetical order):

dbscan dplyr FNN ggplot2 naniar simputation tidyr

SLIDE 44

Chapter 2: Supervised learning

Model interpretability
  linear regression, decision trees
Regularization
  Ridge, Lasso and Elastic net regression
Bias and variance
  bias-variance analysis
Model ensembles
  bagging, boosting, stacking
Package list (in alphabetical order):

caret caretEnsemble dplyr e1071 elasticnet Metrics nnet rattle rpart rpart.plot

SLIDE 45

Chapter 3: Unsupervised learning

K-means clustering
  checking assumptions
  determining the optimal number of clusters
Clustering algorithms
  hierarchical, K-means, PAM
Cluster validity indices
Feature selection
  filter, wrapper, embedded methods
Feature extraction
  PCA, LDA
Package list (in alphabetical order):

caret clValid dplyr MASS Metrics plot3D stats

SLIDE 46

Chapter 4: Model selection and evaluation

Model evaluation metrics
  classification, regression, clustering
Handling imbalanced data
  downsampling, upsampling, SMOTE
Hyperparameter tuning
  grid search, random search
Random Forests vs. Gradient Boosted Trees
  commonalities and differences
Package list (in alphabetical order):

caret clValid dplyr gbm Metrics randomForest

SLIDE 47

Next steps

Keep learning! DataCamp courses on:
  Deep Learning
  Model validation
  Machine Learning with Apache Spark
  Linear classifiers
  etc.
Your constructive feedback about this course is important!

SLIDE 48

Keep up the great work!
