Automatic Sample-by-sample Model Selection Between Two Off-the-shelf Classifiers
Steve P. Chadwick
University of Texas at Dallas
Model Selection by Predicting the Better Classifier
Idea:
- Two classifiers, "primary" and "secondary"
- Use confidence to predict which one is expected to perform better
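The idea above can be sketched as a simple selection rule. This is only an illustration under stated assumptions, not the method from the slides: the confidence threshold, the toy weights `w` and `s`, the tiny `TRAIN` set, and all function names are hypothetical, and a fixed threshold stands in for whatever selection rule is actually learned.

```python
import math

# Hypothetical toy training set for the secondary (1-nearest-neighbor) classifier.
TRAIN = [((0.0, 1.0), 0), ((1.0, 0.0), 1), ((0.4, 0.5), 1)]


def primary(x):
    """Toy linear classifier; confidence is the distance of the score from zero."""
    w, s = (1.0, -1.0), 0.0  # assumed weights and offset, for illustration only
    score = sum(wi * xi for wi, xi in zip(w, x)) + s
    return (1 if score >= 0 else 0), abs(score)


def secondary(x):
    """Toy 1-nearest-neighbor classifier; confidence shrinks with distance."""
    point, label = min(TRAIN, key=lambda item: math.dist(item[0], x))
    return label, 1.0 / (1.0 + math.dist(point, x))


def select_and_classify(x, threshold=0.3):
    """Use the primary classifier when it is confident, else the secondary."""
    label, confidence = primary(x)
    if confidence >= threshold:
        return label, "primary"
    label, _ = secondary(x)
    return label, "secondary"
```

A sample far from the linear boundary is handled by the primary classifier, while a sample close to the boundary is handed to the secondary one.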
Pima Indian Diabetes
- Primary classifier: Fisher LD
[Figure: percentage of error within equal-sized bins, bins 1 to 10]
Fisher LD classifies over 70% of the data before half the total error is accumulated.
- Secondary classifier: 1-Nearest Neighbor
[Figure: percentage of error within equal-sized bins, bins 1 to 10]
1-Nearest Neighbor classifies about 50% of the data when half the total error is accumulated.
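The two statistics quoted on this slide can be computed roughly as follows. This sketch assumes samples are ordered from most to least confident and that "percentage of error" means the error rate inside each bin. Both are readings of the slide rather than something it states, and the function names are hypothetical.

```python
def error_percent_per_bin(confidences, correct, n_bins=10):
    """Error rate (in percent) inside each equal-sized bin, most confident first."""
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    n = len(order)
    rates = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        wrong = sum(1 for i in members if not correct[i])
        rates.append(100.0 * wrong / len(members) if members else 0.0)
    return rates


def fraction_before_half_error(confidences, correct):
    """Fraction of samples classified before half of all errors have occurred."""
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    total_errors = sum(1 for i in order if not correct[i])
    if total_errors == 0:
        return 1.0
    running = 0
    for count, i in enumerate(order):
        if not correct[i]:
            running += 1
            if 2 * running >= total_errors:
                # `count` samples were classified before this error arrived.
                return count / len(order)
    return 1.0
```

A confidence measure is more useful for selection the later the errors arrive in this ordering, which is what the "over 70%" versus "about 50%" comparison expresses.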
Ljubljana Breast Cancer
- Primary classifier: Fisher LD
[Figure: percentage of error within equal-sized bins, bins 1 to 10]
Fisher LD classifies over 60% of the data before half the total error is accumulated.
- Secondary classifier: Nearest Unlike Neighbor (NUN)
[Figure: percentage of error within equal-sized bins, bins 1 to 10]
NUN classifies about 60% of the data before half the total error is accumulated.
Confidence measure profiles
- (Ljubljana Breast Cancer)
[Figure panels: one profile per confidence measure, bins 1 to 10]
- Fisher LD using 1/(w^T x + s)
- 1-Nearest Neighbor using 1-Neighbor distance
- MSE using Q
- 1-Nearest Neighbor using distance from centers
- 1-Nearest Neighbor using nearest unlike neighbor ratio
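Two of the nearest-neighbor confidence measures named above might be computed as in the following sketch. The slides do not define them, so the definitions here are assumptions: the "1-Neighbor distance" is taken to be the Euclidean distance to the nearest training point, and the "nearest unlike neighbor ratio" is taken to be the distance to the nearest training point of a different class divided by that nearest-neighbor distance. The function name is hypothetical.

```python
import math


def nn_confidence_measures(x, train_points, train_labels):
    """Return (predicted label, nearest-neighbor distance, unlike-neighbor ratio).

    Assumes the training set contains at least two classes; otherwise there is
    no "unlike" neighbor and min() raises ValueError.
    """
    nearest = min(range(len(train_points)),
                  key=lambda i: math.dist(x, train_points[i]))
    label = train_labels[nearest]
    d_nearest = math.dist(x, train_points[nearest])
    d_unlike = min(math.dist(x, p)
                   for p, y in zip(train_points, train_labels) if y != label)
    # Assumed definition: a larger ratio means the other class is comparatively far away.
    ratio = d_unlike / d_nearest if d_nearest > 0 else math.inf
    return label, d_nearest, ratio
```

Under these assumed definitions, a small nearest-neighbor distance or a large ratio would indicate a more confident 1-Nearest Neighbor decision.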
Differential error
- (Ljubljana Breast Cancer)
30% class A, 70% class B
Fisher LD: [figure; horizontal axis 1 to 10, vertical axis 10 to 70]
Nearest Unlike Neighbor: [figure; horizontal axis 1 to 10, vertical axis 20 to 80]
Differential error
- (Pima Indian Diabetes)
35% class A, 65% class B
Fisher LD: [figure; horizontal axis 1 to 10, vertical axis 10 to 50]
1-Nearest Neighbor: [figure; horizontal axis 1 to 10, vertical axis 10 to 70]
Differential error
- (Synthetic Data)
50% class A, 50% class B
Fisher LD: [figure; horizontal axis 1 to 10, vertical axis 5 to 30]
1-Nearest Neighbor: [figure; horizontal axis 1 to 10, vertical axis 20 to 80]
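The slides do not define "differential error". One plausible reading, given that each slide lists the class proportions, is that the error is broken down by true class within each confidence bin. The sketch below follows that assumed reading. The function name and the most-confident-first ordering are also assumptions.

```python
def per_class_error_by_bin(confidences, true_labels, predicted_labels, n_bins=10):
    """For each confidence bin, the error rate (in percent) of each true class.

    A class with no samples in a bin is reported as None for that bin.
    """
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    classes = sorted(set(true_labels))
    n = len(order)
    rows = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        row = {}
        for cls in classes:
            in_class = [i for i in members if true_labels[i] == cls]
            wrong = sum(1 for i in in_class if predicted_labels[i] != cls)
            row[cls] = 100.0 * wrong / len(in_class) if in_class else None
        rows.append(row)
    return rows
```

With unbalanced classes such as 30% A and 70% B, a breakdown of this kind shows whether a classifier's low-confidence errors fall mostly on one class.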
Obstacles
- About 25% of the training data contributes to
calculating the selection LD when combining linear discriminant and nearest neighbor classifiers.
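One way to see why only a fraction of the training data could contribute is sketched below. It assumes, without confirmation from the slides, that the selection LD is trained only on samples where exactly one of the two classifiers is correct, since samples where both are right or both are wrong give no preference. The function name and the choice of features are hypothetical.

```python
def selection_training_set(primary_conf, secondary_conf,
                           primary_correct, secondary_correct):
    """Keep only samples where the two classifiers differ in correctness.

    Returns (features, targets): features are (primary confidence,
    secondary confidence) pairs, targets name the classifier that was right.
    """
    features, targets = [], []
    for pc, sc, p_ok, s_ok in zip(primary_conf, secondary_conf,
                                  primary_correct, secondary_correct):
        if p_ok == s_ok:
            continue  # both right or both wrong: no selection signal
        features.append((pc, sc))
        targets.append("primary" if p_ok else "secondary")
    return features, targets
```

Under this assumption, the usable fraction is `len(features)` divided by the number of training samples, so a figure near 25% would mean the selection rule is fitted on far less data than either base classifier.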