SLIDE 1

Data Mining and Machine Learning: Fundamental Concepts and Algorithms

dataminingbook.info Mohammed J. Zaki1 Wagner Meira Jr.2

1Department of Computer Science

Rensselaer Polytechnic Institute, Troy, NY, USA

2Department of Computer Science

Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 22: Classification Assessment

SLIDE 2

Classification Assessment

A classifier is a model or function M that predicts the class label ŷ for a given input example x: ŷ = M(x), where x = (x1, x2, ..., xd)ᵀ ∈ ℝᵈ is a point in d-dimensional space and ŷ ∈ {c1, c2, ..., ck} is its predicted class. To build the classification model M we need a training set of points along with their known classes. Once the model M has been trained, we assess its performance over a separate testing set of points for which we know the true classes. Finally, the model can be deployed to predict the class for future points whose class we typically do not know.

SLIDE 3

Classification Performance Measures

Let D be the testing set comprising n points in a d-dimensional space, let {c1, c2, ..., ck} denote the set of k class labels, and let M be a classifier. For xi ∈ D, let yi denote its true class, and let ŷi = M(xi) denote its predicted class.

Error Rate: The error rate is the fraction of incorrect predictions for the classifier over the testing set, defined as

Error Rate = (1/n) Σ_{i=1}^{n} I(yi ≠ ŷi)

where I is an indicator function. The error rate is an estimate of the probability of misclassification. The lower the error rate, the better the classifier.

Accuracy: The accuracy of a classifier is the fraction of correct predictions:

Accuracy = (1/n) Σ_{i=1}^{n} I(yi = ŷi) = 1 − Error Rate

Accuracy gives an estimate of the probability of a correct prediction; thus, the higher the accuracy, the better the classifier.
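As a concrete illustration, below is a minimal sketch of how these two measures can be computed, assuming NumPy arrays y_true and y_pred of true and predicted integer labels (the array names and the simulated data are illustrative, not from the slides):

import numpy as np

def error_rate(y_true, y_pred):
    # fraction of test points whose predicted class differs from the true class
    return np.mean(y_true != y_pred)

def accuracy(y_true, y_pred):
    # fraction of correct predictions; equals 1 - error_rate
    return np.mean(y_true == y_pred)

# Example: 8 mistakes out of 30 test points, as in the Iris example that follows
y_true = np.array([0]*10 + [1]*10 + [2]*10)
y_pred = y_true.copy()
y_pred[:8] = (y_pred[:8] + 1) % 3      # flip 8 labels to simulate misclassifications
print(error_rate(y_true, y_pred))      # about 0.267
print(accuracy(y_true, y_pred))        # about 0.733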

SLIDE 4

Iris Data: Full Bayes Classifier

Training data in grey. Testing data in black.

Three Classes: Iris-setosa (c1; circles), Iris-versicolor (c2; squares) and Iris-virginica (c3; triangles)

[Scatter plot of the Iris data in the (X1, X2) space showing the three classes, training points in grey and testing points in black.]

Mean (in white) and density contours (1 and 2 standard deviations) are shown for each class. The classifier misclassifies 8 out of the 30 test cases. Thus, we have
Error Rate = 8/30 = 0.27      Accuracy = 22/30 = 0.73

SLIDE 5

Contingency Table–based Measures

Let D = {D1, D2, ..., Dk} denote a partitioning of the testing points based on their true class labels, where Dj = {xi ∈ D | yi = cj}. Let ni = |Di| denote the size of true class ci.

Let R = {R1, R2, ..., Rk} denote a partitioning of the testing points based on the predicted labels, that is, Rj = {xi ∈ D | ŷi = cj}. Let mj = |Rj| denote the size of the predicted class cj.

R and D induce a k × k contingency table N, also called a confusion matrix, defined as follows:

N(i, j) = nij = |Ri ∩ Dj| = |{xa ∈ D | ŷa = ci and ya = cj}|

where 1 ≤ i, j ≤ k. The count nij denotes the number of points with predicted class ci whose true label is cj. Thus, nii (for 1 ≤ i ≤ k) denotes the number of cases where the classifier agrees with the true label ci. The remaining counts nij, with i ≠ j, are cases where the classifier and the true labels disagree.
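A short sketch of how the k × k confusion matrix N can be tabulated from predicted and true labels; the function name and the encoding of classes as integers 0..k−1 are assumptions for illustration:

import numpy as np

def confusion_matrix(y_pred, y_true, k):
    # N[i, j] = number of points predicted as class i whose true class is j
    N = np.zeros((k, k), dtype=int)
    for yp, yt in zip(y_pred, y_true):
        N[yp, yt] += 1
    return N

# Row sums give the predicted class sizes m_i; column sums give the true class sizes n_j.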

SLIDE 6

Accuracy/Precision and Coverage/Recall

The class-specific accuracy or precision of the classifier M for class ci is given as the fraction of correct predictions over all points predicted to be in class ci:

acci = preci = nii / mi

where mi is the number of examples predicted as ci by classifier M. The higher the accuracy on class ci, the better the classifier.

The overall precision or accuracy of the classifier is the weighted average of the class-specific accuracies:

Accuracy = Precision = Σ_{i=1}^{k} (mi/n) · acci = (1/n) Σ_{i=1}^{k} nii

The class-specific coverage or recall of M for class ci is the fraction of correct predictions over all points in class ci:

coveragei = recalli = nii / ni

The higher the coverage, the better the classifier.

SLIDE 7

F-measure

The class-specific F-measure tries to balance the precision and recall values by computing their harmonic mean for class ci:

Fi = 2 / (1/preci + 1/recalli) = (2 · preci · recalli) / (preci + recalli) = 2 nii / (ni + mi)

The higher the Fi value, the better the classifier.

The overall F-measure for the classifier M is the mean of the class-specific values:

F = (1/k) Σ_{i=1}^{k} Fi

For a perfect classifier, the F-measure attains its maximum value of 1.
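Given the confusion matrix N built in the earlier snippet, the class-specific precision, recall, and F-measure follow directly; a minimal sketch (assuming N is a NumPy array and every class is predicted at least once):

import numpy as np

def per_class_scores(N):
    # returns per-class precision, recall, F, and the overall (mean) F-measure
    m = N.sum(axis=1)                  # points predicted as each class
    n = N.sum(axis=0)                  # points truly in each class
    diag = np.diag(N).astype(float)
    prec = diag / m
    rec = diag / n
    F = 2 * diag / (n + m)             # harmonic mean of precision and recall
    return prec, rec, F, F.mean()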

SLIDE 8

Contingency Table for Iris: Full Bayes Classifier

                              True class
Predicted                c1 (Iris-setosa)   c2 (Iris-versicolor)   c3 (Iris-virginica)
Iris-setosa (c1)               10                   0                      0            m1 = 10
Iris-versicolor (c2)            0                   7                      5            m2 = 12
Iris-virginica (c3)             0                   3                      5            m3 = 8
                             n1 = 10              n2 = 10                n3 = 10         n = 30

The class-specific precision and recall values are:
prec1 = n11/m1 = 10/10 = 1.0      recall1 = n11/n1 = 10/10 = 1.0
prec2 = n22/m2 = 7/12 = 0.583     recall2 = n22/n2 = 7/10 = 0.7
prec3 = n33/m3 = 5/8 = 0.625      recall3 = n33/n3 = 5/10 = 0.5

The overall accuracy and F-measure are
Accuracy = (n11 + n22 + n33)/n = (10 + 7 + 5)/30 = 22/30 = 0.733
F = (1/3)(1.0 + 0.636 + 0.556) = 2.192/3 = 0.731

SLIDE 9

Binary Classification: Positive and Negative Class

When there are only k = 2 classes, we call class c1 the positive class and c2 the negative class. The entries of the resulting 2 × 2 confusion matrix are:

                          True class
Predicted           Positive (c1)            Negative (c2)
Positive (c1)       True Positive (TP)       False Positive (FP)
Negative (c2)       False Negative (FN)      True Negative (TN)

SLIDE 10

Binary Classification: Positive and Negative Class

True Positives (TP): The number of points the classifier correctly predicts as positive:
TP = n11 = |{xi | ŷi = yi = c1}|

False Positives (FP): The number of points the classifier predicts to be positive that in fact belong to the negative class:
FP = n12 = |{xi | ŷi = c1 and yi = c2}|

False Negatives (FN): The number of points the classifier predicts to be negative that in fact belong to the positive class:
FN = n21 = |{xi | ŷi = c2 and yi = c1}|

True Negatives (TN): The number of points the classifier correctly predicts as negative:
TN = n22 = |{xi | ŷi = yi = c2}|

SLIDE 11

Binary Classification: Assessment Measures

Error Rate: The error rate for the binary classification case is the fraction of mistakes (false predictions):
Error Rate = (FP + FN) / n

Accuracy: The accuracy is the fraction of correct predictions:
Accuracy = (TP + TN) / n

The precision for the positive and negative class is given as
precP = TP / (TP + FP) = TP / m1
precN = TN / (TN + FN) = TN / m2
where mi = |Ri| is the number of points predicted by M as having class ci.

SLIDE 12

Binary Classification: Assessment Measures

Sensitivity or True Positive Rate: The fraction of correct predictions with respect to all points in the positive class, i.e., the recall for the positive class:
TPR = recallP = TP / (TP + FN) = TP / n1
where n1 is the size of the positive class.

Specificity or True Negative Rate: The recall for the negative class:
TNR = specificity = recallN = TN / (FP + TN) = TN / n2
where n2 is the size of the negative class.

SLIDE 13

Binary Classification: Assessment Measures

False Negative Rate: Defined as
FNR = FN / (TP + FN) = FN / n1 = 1 − sensitivity

False Positive Rate: Defined as
FPR = FP / (FP + TN) = FP / n2 = 1 − specificity
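All of the binary measures above follow from the four confusion matrix entries; a small illustrative sketch (the helper name and the encoding of the positive class as the integer 1 are assumptions):

import numpy as np

def binary_measures(y_true, y_pred, pos=1):
    TP = np.sum((y_pred == pos) & (y_true == pos))
    FP = np.sum((y_pred == pos) & (y_true != pos))
    FN = np.sum((y_pred != pos) & (y_true == pos))
    TN = np.sum((y_pred != pos) & (y_true != pos))
    n1, n2 = TP + FN, FP + TN                      # positive / negative class sizes
    return {
        "error_rate": (FP + FN) / (n1 + n2),
        "accuracy":   (TP + TN) / (n1 + n2),
        "prec_P": TP / (TP + FP), "prec_N": TN / (TN + FN),
        "TPR": TP / n1, "TNR": TN / n2,
        "FNR": FN / n1, "FPR": FP / n2,
    }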

SLIDE 14

Iris Principal Components Data: Naive Bayes Classifier

Iris-versicolor (class c1; in circles) and other two Irises (class c2; in triangles). Training data in grey and testing data in black. Class means in white and density contours.

[Scatter plot of the data in the (u1, u2) principal components space, showing the two classes with class means and density contours.]

SLIDE 15

Iris Principal Components Data: Assessment Measures

                     True class
Predicted         Positive (c1)   Negative (c2)
Positive (c1)     TP = 7          FP = 7          m1 = 14
Negative (c2)     FN = 3          TN = 13         m2 = 16
                  n1 = 10         n2 = 20         n = 30

The naive Bayes classifier misclassifies 10 out of the 30 test instances, resulting in an error rate and accuracy of
Error Rate = 10/30 = 0.33      Accuracy = 20/30 = 0.67

Other performance measures:
precP = TP/(TP + FP) = 7/14 = 0.5
precN = TN/(TN + FN) = 13/16 = 0.8125
sensitivity = TP/(TP + FN) = 7/10 = 0.7
specificity = TN/(TN + FP) = 13/20 = 0.65
FNR = 1 − sensitivity = 0.3
FPR = 1 − specificity = 0.35

SLIDE 16

ROC Analysis

Receiver Operating Characteristic (ROC) analysis is a popular strategy for assessing the performance of classifiers when there are two classes. ROC analysis requires that a classifier output a score value for the positive class for each point in the testing set. These scores can then be used to order points in decreasing order. Typically, a binary classifier chooses some positive score threshold ρ, and classifies all points with score above ρ as positive, with the remaining points classified as negative. ROC analysis plots the performance of the classifier over all possible values of the threshold parameter ρ. In particular, for each value of ρ, it plots the false positive rate (1-specificity) on the x-axis versus the true positive rate (sensitivity) on the y-axis. The resulting plot is called the ROC curve or ROC plot for the classifier.

SLIDE 17

ROC Analysis

Let S(xi) denote the real-valued score for the positive class output by a classifier M for the point xi. Let the minimum and maximum score thresholds observed on the testing dataset D be

ρmin = min_i {S(xi)}      ρmax = max_i {S(xi)}

Initially, we classify all points as negative, so both TP and FP are zero, as given in the confusion matrix:

             True
Predicted   Pos   Neg
Pos           0     0
Neg          FN    TN

This results in TPR and FPR values of zero, corresponding to the point (0, 0) at the lower left corner of the ROC plot.

SLIDE 18

ROC Analysis

Next, for each distinct value of ρ in the range [ρmin, ρmax], we tabulate the set of points predicted as positive:

R1(ρ) = {xi ∈ D : S(xi) > ρ}

and compute the corresponding true and false positive rates, obtaining a new point in the ROC plot.

Finally, in the last step, we classify all points as positive, so both FN and TN are zero, as per the confusion matrix:

             True
Predicted   Pos   Neg
Pos          TP    FP
Neg           0     0

This results in TPR and FPR values of 1, giving the point (1, 1) at the top right corner of the ROC plot.

SLIDE 19

ROC Analysis

An ideal classifier corresponds to the top left point (0, 1), which corresponds to the case FPR = 0 and TPR = 1: the classifier has no false positives and identifies all true positives (and, as a consequence, also correctly predicts all points in the negative class). This case is shown in the confusion matrix:

             True
Predicted   Pos   Neg
Pos          TP     0
Neg           0    TN

A classifier whose curve is closer to this ideal case, that is, closer to the upper left corner, is a better classifier.

Area Under ROC Curve: The area under the ROC curve, abbreviated AUC, can be used as a measure of classifier performance. The AUC value is essentially the probability that the classifier will rank a random positive test case higher than a random negative test instance.

SLIDE 20

ROC: Different Cases for 2 × 2 Confusion Matrix

             True                       True                       True
Predicted   Pos   Neg     Predicted   Pos   Neg     Predicted   Pos   Neg
Pos           0     0     Pos          TP    FP     Pos          TP     0
Neg          FN    TN     Neg           0     0     Neg           0    TN
(a) Initial: all negative    (b) Final: all positive    (c) Ideal classifier

SLIDE 21

Random Classifier

A random classifier corresponds to a diagonal line in the ROC plot. Consider a classifier that randomly guesses the class of a point as positive half the time, and negative the other half. We then expect that half of the true positives and half of the true negatives will be identified correctly, resulting in the point (FPR, TPR) = (0.5, 0.5) in the ROC plot. In general, any fixed probability of prediction, say r, for the positive class yields the point (r, r) in ROC space. The diagonal line thus represents the performance of a random classifier, over all possible positive class prediction thresholds r.

SLIDE 22

ROC/AUC Algorithm

The ROC/AUC algorithm takes as input the testing set D and the classifier M. The first step is to predict the score S(xi) for the positive class (c1) for each test point xi ∈ D. Next, we sort the (S(xi), yi) pairs, that is, the score and true class pairs, in decreasing order of score.

Initially, we set the positive score threshold ρ = ∞. We then examine each pair (S(xi), yi) in sorted order, and for each distinct value of the score we set ρ = S(xi) and plot the point

(FPR, TPR) = (FP/n2, TP/n1)

As each test point is examined, the true and false positive counts are adjusted based on the true class yi of the test point xi: if yi = c1, we increment the true positives; otherwise, we increment the false positives.

SLIDE 23

ROC/AUC Algorithm

The AUC value is computed as each new point is added to the ROC plot. The algorithm maintains the false and true positive counts, FPprev and TPprev, for the previous score threshold ρ. Given the current FP and TP values, we compute the area under the curve defined by the four points

(x1, y1) = (FPprev/n2, TPprev/n1)
(x2, y2) = (FP/n2, TP/n1)
(x1, 0) = (FPprev/n2, 0)
(x2, 0) = (FP/n2, 0)

These four points define a trapezoid whenever x2 > x1 and y2 > y1; otherwise, they define a rectangle (which may be degenerate, with zero area). The area under the trapezoid is given as b · h, where b = |x2 − x1| is the length of the base of the trapezoid and h = (y2 + y1)/2 is its average height.

SLIDE 24

Algorithm ROC-Curve

ROC-Curve(D, M):
    n1 ← |{xi ∈ D | yi = c1}|                        // size of positive class
    n2 ← |{xi ∈ D | yi = c2}|                        // size of negative class
    // classify, score, and sort all test points
    L ← sort the set {(S(xi), yi) : xi ∈ D} by decreasing score
    FP ← TP ← 0
    FPprev ← TPprev ← 0
    AUC ← 0
    ρ ← ∞
    foreach (S(xi), yi) ∈ L do
        if ρ > S(xi) then
            plot point (FP/n2, TP/n1)
            AUC ← AUC + Trapezoid-Area((FPprev/n2, TPprev/n1), (FP/n2, TP/n1))
            ρ ← S(xi)
            FPprev ← FP
            TPprev ← TP
        if yi = c1 then TP ← TP + 1
        else FP ← FP + 1
    plot point (FP/n2, TP/n1)
    AUC ← AUC + Trapezoid-Area((FPprev/n2, TPprev/n1), (FP/n2, TP/n1))
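A direct Python transcription of this procedure is sketched below, assuming scores and labels are array-like with the positive class encoded as 1 (the function name is illustrative):

import numpy as np

def roc_auc(scores, y, pos=1):
    # ROC points and AUC via the trapezoid rule, following the pseudocode above
    scores, y = np.asarray(scores), np.asarray(y)
    n1 = np.sum(y == pos)
    n2 = np.sum(y != pos)
    order = np.argsort(-scores)                     # decreasing scores
    FP = TP = FP_prev = TP_prev = 0
    auc, rho, points = 0.0, np.inf, []
    for s, label in zip(scores[order], y[order]):
        if rho > s:                                 # new distinct threshold
            points.append((FP / n2, TP / n1))
            auc += abs(FP - FP_prev) / n2 * (TP + TP_prev) / (2 * n1)
            rho, FP_prev, TP_prev = s, FP, TP
        if label == pos:
            TP += 1
        else:
            FP += 1
    points.append((FP / n2, TP / n1))               # final point (1, 1)
    auc += abs(FP - FP_prev) / n2 * (TP + TP_prev) / (2 * n1)
    return points, auc

On the small example given two slides ahead (scores 0.9, 0.8, 0.8, 0.8, 0.1 with classes c1, c2, c1, c1, c2), this sketch reproduces the final AUC of 0.833.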

SLIDE 25

Algorithm Trapezoid-Area

Trapezoid-Area((x1, y1), (x2, y2)):
    b ← |x2 − x1|            // base of trapezoid
    h ← (y2 + y1)/2          // average height of trapezoid
    return b · h

SLIDE 26

Iris PC Data: ROC Analysis

We use the naive Bayes classifier to compute the probability that each test point belongs to the positive class (c1; iris-versicolor). The score of the classifier for test point xi is therefore S(xi) = P(c1|xi). The sorted scores (in decreasing order) along with the true class labels are as follows:

S(xi)   0.93   0.82   0.80   0.77   0.74   0.71   0.69   0.67   0.66   0.61
yi       c2     c1     c2     c1     c1     c1     c2     c1     c2     c2

S(xi)   0.59   0.55   0.55   0.53   0.47   0.30   0.26   0.11   0.04   2.97e-03
yi       c2     c2     c1     c1     c1     c1     c1     c2     c2     c2

S(xi)   1.28e-03   2.55e-07   6.99e-08   3.11e-08   3.109e-08
yi         c2         c2         c2         c2         c2

S(xi)   1.53e-08   9.76e-09   2.08e-09   1.95e-09   7.83e-10
yi         c2         c2         c2         c2         c2

SLIDE 27

ROC Plot for Iris PC Data

AUC for naive Bayes is 0.775, whereas the AUC for the random classifier (ROC plot in grey) is 0.5.

[ROC plot (False Positive Rate vs. True Positive Rate) for the naive Bayes classifier on the Iris PC data; the random classifier diagonal is shown in grey.]

SLIDE 28

ROC Plot and AUC: Trapezoid Region

Consider the following sorted scores, along with the true classes, for a testing dataset with n = 5, n1 = 3 and n2 = 2:

(0.9, c1), (0.8, c2), (0.8, c1), (0.8, c1), (0.1, c2)

The algorithm yields the following points added to the ROC plot, along with the running AUC:

ρ      FP   TP   (FPR, TPR)    AUC
∞       0    0   (0, 0)        0
0.9     0    1   (0, 0.333)    0
0.8     1    3   (0.5, 1)      0.333
0.1     2    3   (1, 1)        0.833

SLIDE 29

ROC Plot and AUC: Trapezoid Region

[ROC plot for this example (False Positive Rate vs. True Positive Rate), with the two regions under the curve of areas 0.333 and 0.5 that sum to AUC = 0.833.]

SLIDE 30

Classifier Evaluation

Consider a classifier M and some performance measure θ. Typically, the input dataset D is randomly split into a disjoint training set and testing set. The training set is used to learn the model M, and the testing set is used to evaluate the measure θ.

How confident can we be in the measured classification performance? The results may be an artifact of the particular random split. Moreover, D is itself a d-dimensional multivariate random sample drawn from the true (unknown) joint probability density function f(x) that represents the population of interest.

Ideally, we would like to know the expected value E[θ] of the performance measure over all possible testing sets drawn from f. However, because f is unknown, we have to estimate E[θ] from D. Cross-validation and resampling are two common approaches to compute the expected value and variance of a given performance measure.

SLIDE 31

K-fold Cross-Validation

Cross-validation divides the dataset D into K equal-sized parts, called folds, namely D1, D2, ..., DK. Each fold Di is, in turn, treated as the testing set, with the remaining folds comprising the training set D \ Di = ∪_{j≠i} Dj.

After training the model Mi on D \ Di, we assess its performance on the testing set Di to obtain the i-th estimate θi. The expected value of the performance measure can then be estimated as

μ̂θ = E[θ] = (1/K) Σ_{i=1}^{K} θi

and its variance as

σ̂θ² = (1/K) Σ_{i=1}^{K} (θi − μ̂θ)²

Usually K is chosen to be 5 or 10. The special case K = n is called leave-one-out cross-validation.

SLIDE 32

K-fold Cross-Validation Algorithm

Cross-Validation(K, D):
    D ← randomly shuffle D
    {D1, D2, ..., DK} ← partition D into K equal parts
    foreach i ∈ [1, K] do
        Mi ← train classifier on D \ Di
        θi ← assess Mi on Di
    μ̂θ ← (1/K) Σ_{i=1}^{K} θi
    σ̂θ² ← (1/K) Σ_{i=1}^{K} (θi − μ̂θ)²
    return μ̂θ, σ̂θ²
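A minimal NumPy sketch of this procedure, assuming X and y are NumPy arrays; the train_and_assess callback, which should fit a model on the training split and return the performance measure θ on the test split, is a placeholder you would supply:

import numpy as np

def cross_validation(K, X, y, train_and_assess, seed=0):
    # K-fold cross-validation estimate of the mean and variance of a performance measure
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))                 # randomly shuffle D
    folds = np.array_split(idx, K)                # partition into K (nearly) equal parts
    thetas = []
    for i in range(K):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(K) if j != i])
        thetas.append(train_and_assess(X[train], y[train], X[test], y[test]))
    thetas = np.array(thetas)
    return thetas.mean(), thetas.var()            # mu_hat and sigma_hat^2 (1/K normalization)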

SLIDE 33

K-fold Cross-Validation

Consider the 2-dimensional Iris dataset with k = 3 classes. We assess the error rate of the full Bayes classifier via 5-fold cross-validation, obtaining the following error rates when testing on each fold:

θ1 = 0.267   θ2 = 0.133   θ3 = 0.233   θ4 = 0.367   θ5 = 0.167

The mean and variance of the error rate are

μ̂θ = 1.167/5 = 0.233      σ̂θ² = 0.00833

Performing ten 5-fold cross-validation runs on the Iris dataset gives a mean expected error rate of 0.232 and a mean variance of 0.00521, with the variance of both of these estimates being less than 10⁻³.

SLIDE 34

Bootstrap Resampling

The bootstrap method draws K random samples of size n, with replacement, from D. Each sample Di is thus the same size as D and may contain repeated points. The probability that a particular point xj is not selected even after n tries is given as

P(xj ∉ Di) = qⁿ = (1 − 1/n)ⁿ ≃ e⁻¹ = 0.368

which implies that each bootstrap sample contains approximately 63.2% of the points from D.

The bootstrap samples can be used to evaluate the classifier by training it on each sample Di and then using the full input dataset D as the testing set. However, the estimated mean and variance of θ will be somewhat optimistic owing to the fairly large overlap between the training and testing datasets (63.2%).

SLIDE 35

Bootstrap Resampling Algorithm

Bootstrap-Resampling(K, D):
    for i ∈ [1, K] do
        Di ← sample of size n with replacement from D
        Mi ← train classifier on Di
        θi ← assess Mi on D
    μ̂θ ← (1/K) Σ_{i=1}^{K} θi
    σ̂θ² ← (1/K) Σ_{i=1}^{K} (θi − μ̂θ)²
    return μ̂θ, σ̂θ²
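A matching sketch for bootstrap resampling, with the same assumed train_and_assess callback; here each classifier is trained on a bootstrap sample Di and assessed on the full dataset D:

import numpy as np

def bootstrap_resampling(K, X, y, train_and_assess, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    thetas = []
    for _ in range(K):
        idx = rng.integers(0, n, size=n)          # sample of size n with replacement
        thetas.append(train_and_assess(X[idx], y[idx], X, y))
    thetas = np.array(thetas)
    return thetas.mean(), thetas.var()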

SLIDE 36

Iris 2D Data: Bootstrap Resampling using Error Rate

We apply bootstrap sampling to estimate the error rate for the full Bayes classifier, using K = 50 samples. The sampling distribution of error rates is:

[Histogram of the error rates over the K = 50 bootstrap samples (frequency vs. error rate, ranging roughly from 0.18 to 0.27).]

The expected value and variance of the error rate are

μ̂θ = 0.213      σ̂θ² = 4.815 × 10⁻⁴

SLIDE 37

Confidence Intervals

We would like to derive confidence bounds on how much the estimated mean and variance may deviate from the true values. To answer this question we make use of the central limit theorem, which states that the sum of a large number of independent and identically distributed (IID) random variables has approximately a normal distribution, regardless of the distribution of the individual random variables.

Let θ1, θ2, ..., θK be a sequence of IID random variables, representing, for example, the error rate or some other performance measure over the K folds in cross-validation or over K bootstrap samples. Assume that each θi has a finite mean E[θi] = μ and finite variance var(θi) = σ². Let μ̂ denote the sample mean:

μ̂ = (1/K)(θ1 + θ2 + ··· + θK)

SLIDE 38

Confidence Intervals

By linearity of expectation, we have

E[μ̂] = E[(1/K)(θ1 + θ2 + ··· + θK)] = (1/K) Σ_{i=1}^{K} E[θi] = (1/K)(Kμ) = μ

The variance of μ̂ is given as

var(μ̂) = var((1/K)(θ1 + θ2 + ··· + θK)) = (1/K²) Σ_{i=1}^{K} var(θi) = (1/K²)(Kσ²) = σ²/K

Thus, the standard deviation of μ̂ is

std(μ̂) = sqrt(var(μ̂)) = σ/√K

We are interested in the distribution of the z-score of μ̂, which is itself a random variable:

ZK = (μ̂ − E[μ̂]) / std(μ̂) = (μ̂ − μ) / (σ/√K) = √K (μ̂ − μ)/σ

ZK specifies the deviation of the estimated mean from the true mean in units of its standard deviation.

SLIDE 39

Confidence Intervals

The central limit theorem states that, as the sample size increases, the random variable ZK converges in distribution to the standard normal distribution (which has mean 0 and variance 1). That is, as K → ∞, for any x ∈ ℝ, we have

lim_{K→∞} P(ZK ≤ x) = Φ(x)

where Φ(x) is the cumulative distribution function of the standard normal density f(x | 0, 1).

Let zα/2 denote the z-score value that encompasses α/2 of the probability mass of a standard normal distribution, that is,

P(0 ≤ ZK ≤ zα/2) = Φ(zα/2) − Φ(0) = α/2

Then, because the normal distribution is symmetric about the mean, we have

lim_{K→∞} P(−zα/2 ≤ ZK ≤ zα/2) = 2 · P(0 ≤ ZK ≤ zα/2) = α

SLIDE 40

Confidence Intervals

Note that −zα/2 ≤ ZK ≤ zα/2 implies that

μ̂ − zα/2 · σ/√K ≤ μ ≤ μ̂ + zα/2 · σ/√K

so we obtain bounds on the value of the true mean μ in terms of the estimated value μ̂:

lim_{K→∞} P(μ̂ − zα/2 · σ/√K ≤ μ ≤ μ̂ + zα/2 · σ/√K) = α

Thus, for any given level of confidence α, we can compute the probability that the true mean μ lies in the α% confidence interval

(μ̂ − zα/2 · σ/√K, μ̂ + zα/2 · σ/√K)

SLIDE 41

Confidence Intervals: Unknown Variance

In general we do not know the true variance σ². However, we can replace σ² by the sample variance

σ̂² = (1/K) Σ_{i=1}^{K} (θi − μ̂)²

because σ̂² is a consistent estimator for σ²: as K → ∞, σ̂² converges with probability 1 (converges almost surely) to σ².

The central limit theorem then states that the random variable Z*K defined below converges in distribution to the standard normal distribution:

Z*K = √K (μ̂ − μ) / σ̂

and thus we have

lim_{K→∞} P(μ̂ − zα/2 · σ̂/√K ≤ μ ≤ μ̂ + zα/2 · σ̂/√K) = α

That is, (μ̂ − zα/2 · σ̂/√K, μ̂ + zα/2 · σ̂/√K) is the α% confidence interval for μ.

SLIDE 42

Confidence Intervals: Small Sample Size

The preceding confidence interval applies only as the sample size K → ∞. In practice, however, K is small for K-fold cross-validation or bootstrap resampling. In the small sample case, instead of the normal density, we use the Student's t distribution to derive the confidence interval. In particular, we choose the value tα/2,K−1 such that the cumulative t distribution with K − 1 degrees of freedom encompasses α/2 of the probability mass, that is,

P(0 ≤ Z*K ≤ tα/2,K−1) = TK−1(tα/2) − TK−1(0) = α/2

where TK−1 is the cumulative distribution function of the Student's t distribution with K − 1 degrees of freedom. The α% confidence interval for the true mean μ is thus

μ̂ − tα/2,K−1 · σ̂/√K ≤ μ ≤ μ̂ + tα/2,K−1 · σ̂/√K

Note the dependence of the interval on both α and the sample size K.
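A short sketch of the small-sample interval, using SciPy's Student's t quantile function; note that scipy.stats.t.ppf expects a cumulative probability, so with the confidence-level convention used in these slides the required quantile is (1 + α)/2 (the function name and the 1/K variance normalization follow the formulas above):

import numpy as np
from scipy.stats import t

def t_confidence_interval(thetas, alpha=0.95):
    # small-sample confidence interval for the true mean of a performance measure
    K = len(thetas)
    mu = np.mean(thetas)
    sigma = np.sqrt(np.var(thetas))               # 1/K-normalized sample variance
    t_crit = t.ppf((1 + alpha) / 2, df=K - 1)     # e.g. 2.776 for alpha = 0.95, K = 5
    half = t_crit * sigma / np.sqrt(K)
    return mu - half, mu + half

# e.g. with the five fold-wise error rates used in the Iris 2D examples:
lo, hi = t_confidence_interval([0.267, 0.133, 0.233, 0.367, 0.167])
# the interval is centered at 0.233 and is wider than the normal-approximation interval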

SLIDE 43

Student’s t Distribution: K Degrees of Freedom

[Density plots of the standard normal f(x | 0, 1) and the Student's t distribution with 10, 4, and 1 degrees of freedom.]

SLIDE 44

Iris 2D Data: Confidence Intervals

We apply 5-fold cross-validation (K = 5) to assess the error rate of the full Bayes classifier on the Iris 2D data. The estimated expected value and variance of the error rate are

μ̂θ = 0.233      σ̂θ² = 0.00833      σ̂θ = √0.00833 = 0.0913

Let α = 0.95 be the confidence value. It is known that the standard normal distribution has 95% of its probability mass within zα/2 = 1.96 standard deviations of the mean. Thus, we have

zα/2 · σ̂θ/√K = (1.96 × 0.0913)/√5 = 0.08

and the confidence interval is

P(μ ∈ (0.233 − 0.08, 0.233 + 0.08)) = P(μ ∈ (0.153, 0.313)) = 0.95

SLIDE 45

Iris 2D Data: Small Sample Confidence Intervals

Due to the small sample size (K = 5), we can get a better confidence interval by using the t distribution. For K − 1 = 4 degrees of freedom and α = 0.95, we get tα/2,K−1 = 2.776. Thus,

tα/2,K−1 · σ̂θ/√K = (2.776 × 0.0913)/√5 = 0.113

The 95% confidence interval is therefore

(0.233 − 0.113, 0.233 + 0.113) = (0.12, 0.346)

which is much wider than the overly optimistic confidence interval (0.153, 0.313) obtained for the large sample case.

SLIDE 46

Comparing Classifiers: Paired t-Test

How can we test for a significant difference in the classification performance of two alternative classifiers, MA and MB, on a given dataset D?

We can apply K-fold cross-validation (or bootstrap resampling) and tabulate their performance over each of the K folds, using identical folds for both classifiers. That is, we perform a paired test, with both classifiers trained and tested on the same data.

Let θ1^A, θ2^A, ..., θK^A and θ1^B, θ2^B, ..., θK^B denote the performance values for MA and MB, respectively. To determine whether the two classifiers have different or similar performance, define the random variable δi as the difference in their performance on the ith dataset:

δi = θi^A − θi^B

The expected difference and the variance estimates are given as

μ̂δ = (1/K) Σ_{i=1}^{K} δi      σ̂δ² = (1/K) Σ_{i=1}^{K} (δi − μ̂δ)²

SLIDE 47

Comparing Classifiers: Paired t-Test

The null hypothesis H0 is that the performance of MA and MB is the same; the alternative hypothesis Ha is that it is not:

H0: μδ = 0      Ha: μδ ≠ 0

Define the z-score random variable for the estimated expected difference as

Z*δ = √K (μ̂δ − μδ) / σ̂δ

Z*δ follows a t distribution with K − 1 degrees of freedom. Under the null hypothesis μδ = 0, and thus

Z*δ = √K μ̂δ / σ̂δ ∼ tK−1

that is, Z*δ follows the t distribution with K − 1 degrees of freedom.

Given a desired confidence level α, we conclude that

P(−tα/2,K−1 ≤ Z*δ ≤ tα/2,K−1) = α

Put another way, if Z*δ ∉ (−tα/2,K−1, tα/2,K−1), then we may reject the null hypothesis with confidence α.

SLIDE 48

Paired t-Test via Cross-Validation

Paired t-Test(α, K, D):
    D ← randomly shuffle D
    {D1, D2, ..., DK} ← partition D into K equal parts
    foreach i ∈ [1, K] do
        Mi^A, Mi^B ← train the two different classifiers on D \ Di
        θi^A, θi^B ← assess Mi^A and Mi^B on Di
        δi ← θi^A − θi^B
    μ̂δ ← (1/K) Σ_{i=1}^{K} δi
    σ̂δ² ← (1/K) Σ_{i=1}^{K} (δi − μ̂δ)²
    Z*δ ← √K · μ̂δ / σ̂δ
    if Z*δ ∈ (−tα/2,K−1, tα/2,K−1) then
        Accept H0: both classifiers have similar performance
    else
        Reject H0: the classifiers have significantly different performance
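A compact sketch of the paired test given the per-fold performance values; SciPy's t quantile supplies the critical value, and with the confidence-level convention of these slides the two-sided critical value is t.ppf((1 + α)/2, K − 1) (the function name and the 1/K variance normalization follow the formulas above):

import numpy as np
from scipy.stats import t

def paired_t_test(theta_A, theta_B, alpha=0.95):
    # returns (z_star, t_crit, reject); reject is True if H0 (equal performance) is rejected
    delta = np.asarray(theta_A) - np.asarray(theta_B)
    K = len(delta)
    mu = delta.mean()
    sigma = np.sqrt(delta.var())                  # 1/K-normalized variance
    z_star = np.sqrt(K) * mu / sigma
    t_crit = t.ppf((1 + alpha) / 2, df=K - 1)
    return z_star, t_crit, abs(z_star) > t_crit

# Example: naive Bayes vs. full Bayes error rates over 5 folds (next slide)
print(paired_t_test([0.233, 0.267, 0.1, 0.4, 0.3], [0.2, 0.2, 0.167, 0.333, 0.233]))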

SLIDE 49

Paired t-Test

Consider the 2-dimensional Iris dataset with k = 3 classes. We compare the naive Bayes (MA) and full Bayes (MB) classifiers via cross-validation with K = 5 folds, using the error rate as the performance measure:

i        1       2       3       4       5
θi^A     0.233   0.267   0.1     0.4     0.3
θi^B     0.2     0.2     0.167   0.333   0.233
δi       0.033   0.067   −0.067  0.067   0.067

The estimated expected difference and variance of the differences are

μ̂δ = 0.167/5 = 0.033      σ̂δ² = 0.00333      σ̂δ = √0.00333 = 0.0577

The z-score value is

Z*δ = √K μ̂δ / σ̂δ = (√5 × 0.033)/0.0577 = 1.28

For α = 0.95 and K − 1 = 4 degrees of freedom, we have tα/2,K−1 = 2.776. Because Z*δ = 1.28 ∈ (−2.776, 2.776) = (−tα/2, tα/2), we cannot reject the null hypothesis; that is, there is no significant difference between the naive and full Bayes classifiers on this dataset.

SLIDE 50

Bias-Variance Decomposition

In many applications there may be costs associated with making wrong predictions. A loss function specifies the cost or penalty of predicting the class to be ŷ = M(x) when the true class is y.

A commonly used loss function for classification is the zero-one loss, defined as

L(y, M(x)) = I(M(x) ≠ y) = { 0 if M(x) = y;  1 if M(x) ≠ y }

Thus, zero-one loss assigns a cost of zero if the prediction is correct, and one otherwise.

Another commonly used loss function is the squared loss, defined as

L(y, M(x)) = (y − M(x))²

where we assume that the classes are discrete valued, and not categorical.

SLIDE 51

Expected Loss

An ideal or optimal classifier is one that minimizes the loss function. Because the true class is not known for a test case x, the goal of learning a classification model can be cast as minimizing the expected loss:

Ey[L(y, M(x)) | x] = Σ_y L(y, M(x)) · P(y|x)

where P(y|x) is the conditional probability of class y given test point x, and Ey denotes that the expectation is taken over the different class values y.

Minimizing the expected zero-one loss corresponds to minimizing the error rate. Let M(x) = ci; then we have

Ey[L(y, M(x)) | x] = Σ_y I(y ≠ ci) · P(y|x) = Σ_{y≠ci} P(y|x) = 1 − P(ci|x)

Thus, to minimize the expected loss we should choose ci as the class that maximizes the posterior probability, that is, ci = argmax_y P(y|x). Because the error rate is simply an estimate of the expected zero-one loss, this choice also minimizes the error rate.

SLIDE 52

Bias and Variance

The expected loss for the squared loss function offers important insight into the classification problem because it can be decomposed into bias and variance terms.

Intuitively, the bias of a classifier refers to the systematic deviation of its predicted decision boundary from the true decision boundary, whereas the variance of a classifier refers to the deviation among the learned decision boundaries over different training sets.

Because M depends on the training set, given a test point x, we denote its predicted value as M(x, D). Consider the expected squared loss:

Ey[L(y, M(x, D)) | x, D] = Ey[(y − M(x, D))² | x, D]
                         = Ey[(y − Ey[y|x])² | x, D]      (var(y|x))
                           + (M(x, D) − Ey[y|x])²         (squared error)

The first term is simply the variance of y given x. The second term is the squared error between the predicted value M(x, D) and the expected value Ey[y|x].

SLIDE 53

Bias and Variance

The squared error depends on the training set. We can eliminate this dependence by averaging over all possible training sets of size n. The average or expected squared error for a given test point x over all training sets is then

ED[(M(x, D) − Ey[y|x])²] = ED[(M(x, D) − ED[M(x, D)])²]      (variance)
                          + (ED[M(x, D)] − Ey[y|x])²          (bias)

The expected squared loss over all test points x and over all training sets D of size n yields the following decomposition:

Ex,D,y[(y − M(x, D))²] = Ex,y[(y − Ey[y|x])²]                 (noise)
                        + Ex,D[(M(x, D) − ED[M(x, D)])²]       (average variance)
                        + Ex[(ED[M(x, D)] − Ey[y|x])²]         (average bias)

That is, the expected squared loss over all test points and training sets decomposes into three terms: noise, average bias, and average variance.

SLIDE 54

Bias and Variance

The noise term is the average variance var(y|x) over all test points x. It contributes a fixed cost to the loss, independent of the model, and can thus be ignored when comparing classifiers. The classifier-specific loss can then be attributed to the variance and bias terms.

Bias indicates whether the model M is correct or incorrect. If the decision boundary is nonlinear and we use a linear classifier, it is likely to have high bias. A nonlinear (or more complex) classifier is more likely to capture the true decision boundary and is thus likely to have low bias. However, the complex classifier is not necessarily better, since we also have to consider the variance term, which measures the inconsistency of the classifier's decisions. A complex classifier induces a more complex decision boundary and may therefore be prone to overfitting, making it susceptible to small changes in the training set and thus to high variance.

In general, the expected loss can be attributed to high bias or high variance, with typically a trade-off between the two.

SLIDE 55

Bias-variance Decomposition: SVM Quadratic Kernels

Iris PC Data: Iris-versicolor (class c1; circles) versus the other two Irises (class c2; triangles). K = 10 bootstrap samples are drawn and SVMs with quadratic kernels are trained on each, varying the regularization constant C from 10⁻² to 10². A small value of C emphasizes the margin, whereas a large value of C tries to minimize the slack terms.

[Decision boundaries learnt over the 10 bootstrap samples in the (u1, u2) space: (a) C = 0.01, (b) C = 1.]

SLIDE 56

Bias-variance Decomposition: SVM Quadratic Kernels

[(c) Decision boundaries over the 10 bootstrap samples for C = 100. (d) Bias-variance plot: average bias, average variance, and expected loss as functions of C.]

The variance of the SVM model increases as we increase C, as seen from the varying decision boundaries. Panel (d) plots the average variance and average bias for different values of C, as well as the expected loss. The bias-variance trade-off is clearly visible: as the bias decreases, the variance increases. The lowest expected loss is obtained when C = 1.

SLIDE 57

Ensemble Classifiers

A classifier is called unstable if small perturbations in the training set result in large changes in its predictions or decision boundary. High variance classifiers are inherently unstable, since they tend to overfit the data. On the other hand, high bias methods typically underfit the data and usually have low variance. In either case, the aim of learning is to reduce classification error by reducing the variance or the bias, ideally both.

Ensemble methods create a combined classifier using the output of multiple base classifiers, which are trained on different data subsets. Depending on how the training sets are selected, and on the stability of the base classifiers, ensemble classifiers can help reduce the variance and the bias, leading to better overall performance.

SLIDE 58

Bagging

Bagging stands for Bootstrap Aggregation. It is an ensemble classification method that draws multiple bootstrap samples (with replacement) from the input training data D to create slightly different training sets Di, i = 1, 2, ..., K. A different base classifier Mi is learned on each Di.

Given a test point x, it is first classified using each of the K base classifiers Mi. Let the number of classifiers that predict the class of x as cj be

vj(x) = |{Mi(x) = cj | i = 1, ..., K}|

The combined classifier, denoted M^K, predicts the class of a test point x by majority voting among the k classes:

M^K(x) = argmax_{cj} { vj(x) | j = 1, ..., k }

Bagging can help reduce the variance, especially if the base classifiers are unstable, owing to the averaging effect of majority voting. In general, it does not have much effect on the bias.
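A minimal sketch of bagging by majority vote, assuming integer class labels; the base_fit callback, which trains and returns a classifier exposing a .predict method (e.g. a scikit-learn estimator factory), is an assumption for illustration:

import numpy as np

def bagging_fit(X, y, K, base_fit, seed=0):
    # train K base classifiers on bootstrap samples of (X, y)
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(K):
        idx = rng.integers(0, n, size=n)          # bootstrap sample D_i
        models.append(base_fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # combined classifier M^K: majority vote over the base classifiers
    votes = np.stack([m.predict(X) for m in models])        # shape (K, n_test)
    return np.array([np.bincount(col).argmax() for col in votes.T])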

SLIDE 59

Bagging: Combined SVM Classifiers

SVM classifiers are trained on K = 10 bootstrap samples using C = 1.

K     Zero-one loss   Squared loss
3     0.047           0.187
5     0.04            0.16
8     0.02            0.10
10    0.027           0.113
15    0.027           0.107

SLIDE 60

Bagging: Combined SVM Classifiers

The combined (average) classifier is shown in bold.

[(a) Base and combined decision boundaries for K = 10 in the (u1, u2) space. (b) Effect of K on the combined decision boundary.]

The worst training performance is obtained for K = 3 (in thick gray) and the best for K = 8 (in thick black).

SLIDE 61

Random Forest

A random forest is an ensemble of K classifiers, M1, ..., MK, where each classifier is a decision tree built from a different bootstrap sample, with a random subset of the attributes sampled at each internal node of the tree. Bagging alone would generate similar decision trees; the random sampling of the attributes reduces the correlation between the trees in the ensemble.

The random forest algorithm uses the tth bootstrap sample to learn a decision tree model Mt, but evaluates only p randomly chosen attributes at each split point. A typical choice is p = √d.

The K decision trees M1, M2, ..., MK predict the class of a test point x by majority voting:

M^K(x) = argmax_{cj} { vj(x) | j = 1, ..., k }

where vj(x) is the number of trees that predict the class of x as cj.

Notice that if p = d, the random forest approach is equivalent to bagging over decision tree models.

SLIDE 62

Random Forest Algorithm

RandomForest(D, K, p, η, π):
    foreach xi ∈ D do
        vj(xi) ← 0 for all j = 1, 2, ..., k
    for t ∈ [1, K] do
        Dt ← sample of size n with replacement from D
        Mt ← DecisionTree(Dt, η, π, p)
        foreach (xi, yi) ∈ D \ Dt do                           // out-of-bag votes
            ŷi ← Mt(xi)
            if ŷi = cj then vj(xi) ← vj(xi) + 1
    εoob ← (1/n) Σ_{i=1}^{n} I(yi ≠ argmax_{cj} {vj(xi)})      // OOB error
    return {M1, M2, ..., MK}

SLIDE 63

Random Forest - Out of bag estimation

Given Dt, any point in D \ Dt is called an out-of-bag point for Mt. The out-of-bag error rate for each Mt may be calculated by considering its predictions over its out-of-bag points. The out-of-bag (OOB) error rate for the random forest is given as

εoob = (1/n) Σ_{i=1}^{n} I(yi ≠ argmax_{cj} { vj(xi) | (xi, yi) ∈ D })

Here I is an indicator function that takes the value 1 if its argument is true, and 0 otherwise. We compute the majority out-of-bag class for each point xi ∈ D and check whether it matches the true class yi. The out-of-bag error rate is simply the fraction of points for which the out-of-bag majority class does not match the true class yi. The out-of-bag error rate approximates the cross-validation error rate quite well.
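In practice, a library implementation exposes both ingredients directly; for instance, a hedged scikit-learn sketch (the parameter names follow the scikit-learn API; the deck's η and π map only approximately onto the library's own stopping controls):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(
    n_estimators=10,        # K trees
    max_features="sqrt",    # p = sqrt(d) attributes per split
    bootstrap=True,
    oob_score=True,         # compute the out-of-bag accuracy
    random_state=0,
).fit(X, y)

print("OOB error rate:", 1 - rf.oob_score_)    # oob_score_ is the OOB accuracy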

SLIDE 64

Random Forest

Consider the Iris principal components dataset comprising n = 150 points in 2-dimensional space. The task is to separate Iris-versicolor (class c1; circles) from the other two Irises (class c2; triangles). Since there are only two attributes in this dataset, we pick p = 1 attribute at random for each split-point evaluation in a decision tree. Each decision tree is grown with η = 3, that is, a maximum leaf size of 3 (with default minimum purity π = 1.0), and we grow K = 5 decision trees on different bootstrap samples.

The decision boundary of the random forest is shown in bold in the figure. The error rate on the training data is 2.0%. However, the out-of-bag error rate is 49.33%, which is overly pessimistic in this case, since the dataset has only two attributes and we use only one attribute to evaluate each split point.

SLIDE 65

Random Forest

[Decision boundaries of the K = 5 decision trees and the combined random forest boundary (in bold) in the (u1, u2) principal components space.]

SLIDE 66

Random Forest: Varying K

We used the full Iris dataset, which has four attributes (d = 4) and three classes (k = 3), with p = √d = 2, η = 3 and π = 1.0.

K     εoob     ε
1     0.4333   0.0267
2     0.2933   0.0267
3     0.1867   0.0267
4     0.1200   0.0400
5     0.1133   0.0333
6     0.1067   0.0400
7     0.0733   0.0333
8     0.0600   0.0267
9     0.0467   0.0267
10    0.0467   0.0267

We can see that the out-of-bag error decreases rapidly as we increase the number of trees.

SLIDE 67

Boosting

In boosting, the main idea is to carefully select the samples so as to boost performance on instances that are hard to classify. Starting from an initial training sample D1, we train the base classifier M1 and obtain its training error rate. To construct the next sample D2, we select the misclassified instances with higher probability, and after training M2 we obtain its training error rate. To construct D3, instances that are hard to classify by M1 or M2 have a higher probability of being selected. This process is repeated for K iterations. Finally, the combined classifier is obtained via weighted voting over the output of the K base classifiers M1, M2, ..., MK.

SLIDE 68

Boosting

Boosting is most beneficial when the base classifiers are weak, that is, have an error rate that is slightly less than that for a random classifier. The idea is that whereas M1 may not be particularly good on all test instances, by design M2 may help classify some cases where M1 fails, and M3 may help classify instances where M1 and M2 fail, and so on. Thus, boosting has more of a bias reducing effect. Each of the weak learners is likely to have high bias (it is only slightly better than random guessing), but the final combined classifier can have much lower bias, since different weak learners learn to classify instances in different regions of the input space.

SLIDE 69

Adaptive Boosting: AdaBoost

The boosting process is repeated K times. Let t denote the iteration and let αt denote the weight of the tth classifier Mt. Let wi^t denote the weight of point xi, with w^t = (w1^t, w2^t, ..., wn^t)ᵀ being the weight vector over all points for the tth iteration. w is a probability vector, whose elements sum to one. Initially all points have equal weights:

w⁰ = (1/n, 1/n, ..., 1/n)ᵀ = (1/n)·1

During iteration t, the training sample Dt is obtained via weighted resampling using the distribution w^(t−1); that is, we draw a sample of size n with replacement such that the ith point is chosen according to its probability wi^(t−1). Using Dt, we train the classifier Mt and compute its weighted error rate εt on the entire input dataset D:

εt = Σ_{i=1}^{n} wi^(t−1) · I(Mt(xi) ≠ yi)
SLIDE 70

Adaptive Boosting: AdaBoost

The weight of the tth classifier is then set as

αt = ln((1 − εt)/εt)

and the weight of each point xi ∈ D is updated as

wi^t = wi^(t−1) · exp(αt · I(Mt(xi) ≠ yi))

If the predicted class matches the true class, that is, if Mt(xi) = yi, then the weight of point xi remains unchanged. If the point is misclassified, that is, Mt(xi) ≠ yi, then

wi^t = wi^(t−1) · exp(αt) = wi^(t−1) · exp(ln((1 − εt)/εt)) = wi^(t−1) · (1/εt − 1)

Thus, if the error rate εt is small, there is a greater weight increment for xi. The intuition is that a point misclassified by a good classifier (one with a low error rate) should be more likely to be selected for the next training dataset.

SLIDE 71

Adaptive Boosting: AdaBoost

For boosting we require that a base classifier has an error rate at least slightly better than random guessing, that is, ǫ_t < 0.5. If the error rate ǫ_t ≥ 0.5, then the boosting method discards the classifier and tries another data sample.

Combined Classifier: Given the set of boosted classifiers M_1, M_2, ..., M_K, along with their weights α_1, α_2, ..., α_K, the class for a test case x is obtained via weighted majority voting. Let v_j(x) denote the weighted vote for class c_j over the K classifiers, given as

v_j(x) = Σ_{t=1}^{K} α_t · I(M_t(x) = c_j)

Because I(M_t(x) = c_j) is 1 only when M_t(x) = c_j, the variable v_j(x) simply obtains the tally for class c_j among the K base classifiers, taking into account the classifier weights. The combined classifier, denoted M^K, then predicts the class for x as follows:

M^K(x) = argmax_{c_j} { v_j(x) | j = 1, ..., k }
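A minimal sketch of this weighted vote (assuming Python lists of fitted models and their α weights; the names and the predict interface are illustrative, not from the slides):

    def boosted_predict(models, alphas, x, classes):
        """Weighted majority vote of the K base classifiers for a single point x."""
        votes = {c: 0.0 for c in classes}
        for M_t, a_t in zip(models, alphas):
            votes[M_t.predict([x])[0]] += a_t   # add alpha_t to the predicted class's tally v_j(x)
        return max(votes, key=votes.get)        # argmax over v_j(x)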
Zaki & Meira Jr. (RPI and UFMG) Data Mining and Machine Learning Chapter 22: Classification Assessment 71

slide-72
SLIDE 72

AdaBoost Algorithm

AdaBoost(K, D):
 1  w^0 ← (1/n) · 1 ∈ R^n
 2  t ← 1
 3  while t ≤ K do
 4      D_t ← weighted resampling with replacement from D using w^{t−1}
 5      M_t ← train classifier on D_t
 6      ǫ_t ← Σ_{i=1}^{n} w_i^{t−1} · I(M_t(x_i) ≠ y_i) // weighted error rate on D
 7      if ǫ_t = 0 then break
 8      else if ǫ_t < 0.5 then
 9          α_t ← ln((1 − ǫ_t)/ǫ_t) // classifier weight
10          foreach i ∈ [1, n] do // update point weights
11              w_i^t ← w_i^{t−1} if M_t(x_i) = y_i;  w_i^t ← w_i^{t−1} · (1 − ǫ_t)/ǫ_t if M_t(x_i) ≠ y_i
12          w^t ← w^t / (1^T w^t) // normalize weights
13          t ← t + 1
14  return {M_1, M_2, ..., M_K}
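The pseudocode translates almost line for line into Python. The sketch below is an illustrative implementation under stated assumptions (scikit-learn-style base estimators with fit/predict; it also returns the α weights, which the combined classifier needs), not the book's code:

    import numpy as np
    from sklearn.base import clone

    def adaboost(K, X, y, base_estimator, seed=0):
        """Illustrative AdaBoost following the slide's pseudocode."""
        rng = np.random.default_rng(seed)
        n = len(y)
        w = np.full(n, 1.0 / n)                              # w^0 = (1/n) * 1
        models, alphas = [], []
        t = 1
        while t <= K:
            idx = rng.choice(n, size=n, replace=True, p=w)   # weighted resampling D_t
            M_t = clone(base_estimator).fit(X[idx], y[idx])
            miss = (M_t.predict(X) != y).astype(float)
            eps = float(np.dot(w, miss))                     # weighted error rate on D
            if eps == 0:
                break                                        # perfect fit on D; stop as in the pseudocode
            if eps < 0.5:
                alpha = np.log((1 - eps) / eps)              # classifier weight
                w = w * np.where(miss == 1, (1 - eps) / eps, 1.0)  # boost misclassified points
                w = w / w.sum()                              # normalize to a probability vector
                models.append(M_t)
                alphas.append(alpha)
                t += 1
            # else: eps >= 0.5, discard M_t and draw another sample on the next pass
        return models, alphas

    # Usage sketch: models, alphas = adaboost(50, X, y, DecisionTreeClassifier(max_depth=1))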

Zaki & Meira Jr. (RPI and UFMG) Data Mining and Machine Learning Chapter 22: Classification Assessment 72

slide-73
SLIDE 73

Boosting SVMs: Linear Kernel (C = 1)

[Figure omitted: (a) the first K = 4 boosted hyperplanes h1, h2, h3, h4 shown in the (u1, u2) plane; (b) average training and testing error (5-fold CV) as the number of boosting rounds K increases, up to 200.]

Iris PC Data: The hyperplane learnt in the tth iteration is h_t. We can observe that the first three hyperplanes h1, h2 and h3 already capture the essential features of the nonlinear decision boundary. Further reduction in the training error is obtained by increasing the number of boosting steps K.

Zaki & Meira Jr. (RPI and UFMG) Data Mining and Machine Learning Chapter 22: Classification Assessment 73

slide-74
SLIDE 74

Stacking

Stacking or stacked generalization is an ensemble technique where we employ two layers of classifiers. The first layer is composed of K base classifiers which are trained independently on the entire training data D.

The second layer comprises a combiner classifier C that is trained on the predicted classes from the base classifiers, so that it automatically learns how to combine the outputs of the base classifiers to make the final prediction for a given input. For example, the combiner classifier may learn to ignore the output of a base classifier for an input that lies in a region of the input space where that base classifier has poor performance. It can also learn to correct the prediction in cases where most base classifiers do not predict the outcome correctly. Stacking is a strategy for estimating and correcting the biases of the set of base classifiers.

Zaki & Meira Jr. (RPI and UFMG) Data Mining and Machine Learning Chapter 22: Classification Assessment 74

slide-75
SLIDE 75

Stacking

Stacking(K, M, C, D):
    // Train the K base classifiers
 1  for t ∈ [1, K] do
 2      M_t ← train tth base classifier on D
    // Train the combiner model C on Z
 3  Z ← ∅
 4  foreach (x_i, y_i) ∈ D do
 5      z_i ← (M_1(x_i), M_2(x_i), ..., M_K(x_i))^T
 6      Z ← Z ∪ {(z_i, y_i)}
 7  C ← train combiner classifier on Z
 8  return (C, M_1, M_2, ..., M_K)
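A rough Python sketch of this two-layer scheme, following the pseudocode above; the fit/predict interface and the base/combiner models are assumptions for illustration (and the class labels are assumed numeric so the combiner can take them as features):

    import numpy as np

    def stacking_fit(base_models, combiner, X, y):
        """Train base classifiers on (X, y), then train the combiner on their predictions."""
        for M_t in base_models:                                            # first layer
            M_t.fit(X, y)
        Z = np.column_stack([M_t.predict(X) for M_t in base_models])       # z_i = (M_1(x_i), ..., M_K(x_i))
        combiner.fit(Z, y)                                                  # second layer
        return combiner, base_models

    def stacking_predict(base_models, combiner, X):
        Z = np.column_stack([M_t.predict(X) for M_t in base_models])
        return combiner.predict(Z)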

Zaki & Meira Jr. (RPI and UFMG) Data Mining and Machine Learning Chapter 22: Classification Assessment 75

slide-76
SLIDE 76

Stacking

We apply stacking on the Iris principal components dataset. We use three base classifiers, namely an SVM with a linear kernel (regularization constant C = 1), a random forest (number of trees K = 5, number of random attributes p = 1), and naive Bayes. The combiner classifier is an SVM with a Gaussian kernel (regularization constant C = 1 and spread parameter σ² = 0.2). We trained on a random subset of 100 points and tested on the remaining 50 points.

Classifier      Test Accuracy
Linear SVM      0.68
Random Forest   0.82
Naive Bayes     0.74
Stacking        0.92
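For readers who want to try a similar setup, here is a hedged scikit-learn sketch. Note two assumptions: StackingClassifier trains its combiner on cross-validated base predictions rather than on the same fit as the pseudocode above, and the Gaussian spread maps to gamma = 1/(2σ²), so the accuracies reported in the table will not be reproduced exactly:

    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.naive_bayes import GaussianNB

    # Base classifiers roughly mirroring the slide's setup (parameter mapping is approximate).
    base = [
        ("linear_svm", SVC(kernel="linear", C=1)),
        ("random_forest", RandomForestClassifier(n_estimators=5, max_features=1, random_state=0)),
        ("naive_bayes", GaussianNB()),
    ]
    # Gaussian-kernel SVM combiner; sigma^2 = 0.2 corresponds to gamma = 1 / (2 * 0.2) = 2.5.
    combiner = SVC(kernel="rbf", C=1, gamma=2.5)

    stack = StackingClassifier(estimators=base, final_estimator=combiner, stack_method="predict")
    # Usage sketch (X_train, y_train, X_test, y_test are assumed splits of the Iris PC data):
    # stack.fit(X_train, y_train)
    # print("stacking accuracy:", stack.score(X_test, y_test))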

Zaki & Meira Jr. (RPI and UFMG) Data Mining and Machine Learning Chapter 22: Classification Assessment 76

slide-77
SLIDE 77

Stacking

[Figure omitted: the two classes (circles and triangles) plotted over attributes X1 and X2, together with the decision regions produced by the stacking ensemble.]

Zaki & Meira Jr. (RPI and UFMG) Data Mining and Machine Learning Chapter 22: Classification Assessment 77

slide-78
SLIDE 78

Data Mining and Machine Learning: Fundamental Concepts and Algorithms

dataminingbook.info Mohammed J. Zaki1 Wagner Meira Jr.2

1Department of Computer Science

Rensselaer Polytechnic Institute, Troy, NY, USA

2Department of Computer Science

Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 22: Classification Assessment

Zaki & Meira Jr. (RPI and UFMG) Data Mining and Machine Learning Chapter 22: Classification Assessment 78