Data Mining Classification: Alternative Techniques
Imbalanced Class Problem
Introduction to Data Mining, 2nd Edition
by Tan, Steinbach, Karpatne, Kumar
10/05/2020 Introduction to Data Mining, 2nd Edition 2
Class Imbalance Problem
Lots of classification problems where the classes are
a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)
                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL   Class=Yes        10          0
CLASS    Class=No         10        980
                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL   Class=Yes        10          0
CLASS    Class=No         10        980

                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL   Class=Yes         1          9
CLASS    Class=No          0        990
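The two confusion matrices above can be scored with a few lines of Python. This is a minimal sketch rather than code from the book: the helper name `score_matrix` is hypothetical, rows are taken to be the actual class and columns the predicted class, and it assumes FN = 0 in the first matrix and FP = 0 in the second.

```python
def score_matrix(tp, fn, fp, tn):
    """Accuracy, precision, recall and F-measure for one 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators so degenerate matrices do not raise.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# ASSUMPTION: FN = 0 in the first matrix and FP = 0 in the second.
first = score_matrix(tp=10, fn=0, fp=10, tn=980)
second = score_matrix(tp=1, fn=9, fp=0, tn=990)

for name, result in (("first", first), ("second", second)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Under these assumptions both matrices give an accuracy of about 0.99, while precision and recall differ sharply: the first finds all ten positive records at precision 0.5, and the second finds only one of them.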
                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL   Class=Yes        40         10
CLASS    Class=No         10         40
                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL   Class=Yes        40         10
CLASS    Class=No         10         40

                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL   Class=Yes        40         10
CLASS    Class=No       1000       4000
                PREDICTED CLASS
                Yes    No
ACTUAL   Yes    TP     FN
CLASS    No     FP     TN
α is the probability that we reject the null hypothesis when it is true. This is a Type I error or a false positive (FP).
β is the probability that we accept the null hypothesis when it is false. This is a Type II error or a false negative (FN).
                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL   Class=Yes        40         10
CLASS    Class=No         10         40

Precision (p) = 0.8
TPR = Recall (r) = 0.8
FPR = 0.2
F-measure (F) = 0.8
Accuracy = 0.8
TPR/FPR = 4

                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL   Class=Yes        40         10
CLASS    Class=No       1000       4000

Precision (p) = 0.038
TPR = Recall (r) = 0.8
FPR = 0.2
F-measure (F) = 0.07
Accuracy = 0.8
TPR/FPR = 4
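The comparison above can be checked numerically. The sketch below is illustrative only: `rates` and `scale_negatives` are hypothetical helper names, and rows are assumed to hold the actual class. It multiplies the negative row of the balanced matrix by 100 to obtain the skewed one and recomputes each measure.

```python
def rates(tp, fn, fp, tn):
    """Precision, TPR (recall), FPR, F-measure and accuracy for a 2x2 matrix."""
    precision = tp / (tp + fp)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    f_measure = 2 * precision * tpr / (precision + tpr)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {
        "precision": precision,
        "tpr": tpr,
        "fpr": fpr,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


def scale_negatives(tp, fn, fp, tn, k):
    """Simulate class skew: k times as many negative records, same error rates."""
    return tp, fn, fp * k, tn * k


balanced = (40, 10, 10, 40)                   # TP, FN, FP, TN
skewed = scale_negatives(*balanced, k=100)    # (40, 10, 1000, 4000)

balanced_rates = rates(*balanced)
skewed_rates = rates(*skewed)

for name, r in (("balanced", balanced_rates), ("skewed", skewed_rates)):
    print(name, {key: round(value, 3) for key, value in r.items()})
```

TPR and FPR are each computed within a single actual class, so they do not move when the negative class grows; precision and F-measure mix the two classes and fall sharply.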
                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL   Class=Yes        10         40
CLASS    Class=No         10         40

                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL   Class=Yes        25         25
CLASS    Class=No         25         25

                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL   Class=Yes        40         10
CLASS    Class=No         40         10
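One way to compare the three matrices above is to compute the same measures for each. This is a rough sketch under the assumption that rows are the actual class and columns the predicted class; the function name `summarize` and the labels "A", "B", "C" are hypothetical, chosen only for this illustration.

```python
def summarize(tp, fn, fp, tn):
    """Precision, TPR, FPR and F-measure for a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    f_measure = 2 * precision * tpr / (precision + tpr)
    return {"precision": precision, "tpr": tpr, "fpr": fpr, "f_measure": f_measure}


# (TP, FN, FP, TN) read row by row from the three tables.
matrices = {
    "A": (10, 40, 10, 40),
    "B": (25, 25, 25, 25),
    "C": (40, 10, 40, 10),
}

summary = {name: summarize(*cells) for name, cells in matrices.items()}

for name, s in summary.items():
    print(name, {key: round(value, 3) for key, value in s.items()})
```

In all three cases precision is 0.5 and TPR equals FPR, so the predictions carry no information about the actual class; only the share of records labelled Yes changes, and that alone is enough to move the F-measure.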
prediction is opposite
Decision trees, rule-based classifiers, neural networks,
Class              +     -     +     -     -     -     +     -     +     +
Threshold >=    0.25  0.43  0.53  0.76  0.85  0.85  0.85  0.87  0.93  0.95  1.00
TP                 5     4     4     3     3     3     3     2     2     1     0
FP                 5     5     4     4     3     2     1     1     0     0     0
TN                 0     0     1     1     2     3     4     4     5     5     5
FN                 0     1     1     2     2     2     2     3     3     4     5
TPR                1   0.8   0.8   0.6   0.6   0.6   0.6   0.4   0.4   0.2     0
FPR                1     1   0.8   0.8   0.6   0.4   0.2   0.2     0     0     0
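The counts in the threshold table above can be regenerated from the ten scored instances. The following is a minimal sketch, not the book's procedure: the function name `roc_rows` is hypothetical, and the class labels are an assumption, inferred from how TP and FP change between adjacent thresholds. Instances are removed from the predicted-positive set one at a time in ascending score order, so the three instances tied at 0.85 each get their own column, as in the table.

```python
# ASSUMPTION: labels inferred from the TP/FP counts, listed in ascending score order.
scores = [0.25, 0.43, 0.53, 0.76, 0.85, 0.85, 0.85, 0.87, 0.93, 0.95]
labels = ["+", "-", "+", "-", "-", "-", "+", "-", "+", "+"]


def roc_rows(scores, labels, positive="+"):
    """One row of TP/FP/TN/FN/TPR/FPR per cut point, lowest threshold first."""
    # Stable sort on the score alone keeps tied instances in their input order.
    pairs = sorted(zip(scores, labels), key=lambda pair: pair[0])
    n_pos = sum(1 for _, label in pairs if label == positive)
    n_neg = len(pairs) - n_pos
    rows = []
    for cut in range(len(pairs) + 1):
        predicted_positive = pairs[cut:]   # everything at or above the cut
        tp = sum(1 for _, label in predicted_positive if label == positive)
        fp = len(predicted_positive) - tp
        rows.append({
            "tp": tp,
            "fp": fp,
            "tn": n_neg - fp,
            "fn": n_pos - tp,
            "tpr": tp / n_pos,
            "fpr": fp / n_neg,
        })
    return rows


rows = roc_rows(scores, labels)
for row in rows:
    print(row)
```

Plotting `fpr` against `tpr` for these rows gives the points of the ROC curve; the final row corresponds to a threshold above every score, where nothing is predicted positive.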