SLIDE 1

Automatic Sample-by-sample Model Selection Between Two Off-the-shelf Classifiers

Steve P. Chadwick

University of Texas at Dallas

SLIDE 2

Model Selection by Predicting the Better Classifier

Idea:

  • Two classifiers, "primary" and "secondary"

  • Use confidence to predict which one is expected to perform best

Pima Indian Diabetes

  • Primary classifier: Fisher LD

[Figure: Percentage of error within equal-sized bins (bins 1 to 10, vertical axis 10 to 40)]

Fisher LD classifies over 70% of the data before half the total error is accumulated.

  • Secondary classifier: 1-Nearest Neighbor

[Figure: Percentage of error within equal-sized bins (bins 1 to 10, vertical axis 10 to 40)]

1-Nearest Neighbor classifies only about 50% of the data by the time half the total error is accumulated.
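The two quantities quoted on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say how samples are ordered before binning or how the percentage is normalized, so the sketch assumes samples are sorted from most to least confident and that each bin reports its share of the total error. The function names `error_profile` and `coverage_at_half_error` are hypothetical.

```python
import numpy as np

def error_profile(confidence, is_error, n_bins=10):
    """Percentage of the total error that falls in each equal-sized bin.

    Assumption: samples are ordered from most to least confident before
    binning, and each bin's value is its share of all errors.
    """
    confidence = np.asarray(confidence, dtype=float)
    is_error = np.asarray(is_error, dtype=bool)
    order = np.argsort(-confidence, kind="stable")   # most confident first
    bins = np.array_split(is_error[order], n_bins)   # near-equal bin sizes
    total = is_error.sum()
    if total == 0:
        return np.zeros(n_bins)
    return np.array([100.0 * b.sum() / total for b in bins])

def coverage_at_half_error(confidence, is_error):
    """Fraction of samples classified before half the total error accumulates."""
    confidence = np.asarray(confidence, dtype=float)
    is_error = np.asarray(is_error, dtype=bool)
    total = is_error.sum()
    if total == 0:
        return 1.0                                   # no errors at all
    order = np.argsort(-confidence, kind="stable")
    cum_err = np.cumsum(is_error[order])
    # Index of the first sample at which cumulative error reaches half.
    first = int(np.searchsorted(cum_err, total / 2.0, side="left"))
    return first / len(is_error)
```

Under this reading, a classifier whose confidence measure is informative pushes its errors into the last bins, so its coverage at half error is well above 0.5; a value near 0.5 means the errors are spread roughly evenly over the confidence ordering.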

SLIDE 3

Ljubljana Breast Cancer

  • Primary classifier: Fisher LD

[Figure: Percentage of error within equal-sized bins (bins 1 to 10, vertical axis 10 to 40)]

Fisher LD classifies over 60% of the data before half the total error is accumulated.

  • Secondary classifier: Nearest Unlike Neighbor (NUN)

[Figure: Percentage of error within equal-sized bins (bins 1 to 10, vertical axis 10 to 50)]

NUN classifies about 60% of the data before half the total error is accumulated.

SLIDE 4

Confidence measure profiles (Ljubljana Breast Cancer)

[Figure: five error-profile panels, each over bins 1 to 10]

  • Fisher LD using 1/(w^T x + s)

  • 1-Nearest Neighbor using 1-Neighbor distance

  • MSE using Q

  • 1-Nearest Neighbor using distance from centers

  • 1-Nearest Neighbor using nearest unlike neighbor ratio
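Three of the measures named above can be sketched with NumPy. The slides give only the panel titles, so every definition here is an assumption: the Fisher LD measure takes the magnitude of the discriminant value before inverting it, and the nearest unlike neighbor ratio is taken as the distance to the nearest neighbor divided by the distance to the nearest training point of a different class than that neighbor. All function names are hypothetical; "MSE using Q" and "distance from centers" are left out because the slides do not define them.

```python
import numpy as np

def fisher_ld_measure(X, w, s):
    """1 / |w^T x + s| per sample (the use of the magnitude is an assumption).

    Large values flag samples close to the discriminant boundary.
    """
    return 1.0 / np.abs(X @ w + s)

def pairwise_distances(X, X_train):
    """Euclidean distance from every query row to every training row."""
    return np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)

def nearest_neighbor_distance(X, X_train):
    """Distance from each query sample to its single nearest training sample."""
    return pairwise_distances(X, X_train).min(axis=1)

def nearest_unlike_ratio(X, X_train, y_train):
    """Assumed definition: d(nearest neighbor) / d(nearest unlike neighbor).

    The "unlike" neighbor is the closest training sample whose class differs
    from the class of the nearest neighbor. Values near 0 suggest the query
    sits deep inside one class; values near 1 suggest it sits near a boundary.
    """
    d = pairwise_distances(X, X_train)
    nn = d.argmin(axis=1)
    unlike = y_train[None, :] != y_train[nn][:, None]
    d_unlike = np.where(unlike, d, np.inf).min(axis=1)
    return d[np.arange(len(X)), nn] / d_unlike
```

Note that the three measures live on different scales (an unbounded reciprocal, a raw distance, and a ratio between 0 and 1).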

SLIDE 5

Differential error (Ljubljana Breast Cancer)

30% class A, 70% class B

  • Fisher LD: [Figure: bins 1 to 10, vertical axis 10 to 70]

  • Nearest Unlike Neighbor: [Figure: bins 1 to 10, vertical axis 20 to 80]

Differential error (Pima Indian Diabetes)

35% class A, 65% class B

  • Fisher LD: [Figure: bins 1 to 10, vertical axis 10 to 50]

  • 1-Nearest Neighbor: [Figure: bins 1 to 10, vertical axis 10 to 70]

Differential error (Synthetic Data)

50% class A, 50% class B

  • Fisher LD: [Figure: bins 1 to 10, vertical axis 5 to 30]

  • 1-Nearest Neighbor: [Figure: bins 1 to 10, vertical axis 20 to 80]

SLIDE 6

Obstacles

  • About 25% of the training data contributes to calculating the selection LD when combining linear discriminant and nearest neighbor classifiers.

[Figure: Selection LD and data in q-space]

  • The different confidence measures have different ranges, which makes them difficult to compare with each other.
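A selection LD in q-space might be fitted along the following lines. This is a sketch under assumptions, not the method from the slides: q-space is taken to be the space spanned by the two classifiers' confidence values, and the selection LD is fitted only on training samples where exactly one classifier is correct, since the others carry no preference. That restriction is one plausible reading of why only a fraction of the training data contributes. The names `fit_selection_ld` and `select_and_predict` are hypothetical.

```python
import numpy as np

def fit_selection_ld(q, primary_ok, secondary_ok):
    """Fit a two-class Fisher linear discriminant in q-space.

    q            : (n, 2) confidence values, one column per classifier
    primary_ok   : (n,) True where the primary classifier was correct
    secondary_ok : (n,) True where the secondary classifier was correct
    Returns (w, s) such that q @ w + s >= 0 favors the primary classifier.
    """
    q = np.asarray(q, dtype=float)
    primary_ok = np.asarray(primary_ok, dtype=bool)
    secondary_ok = np.asarray(secondary_ok, dtype=bool)
    informative = primary_ok != secondary_ok         # exactly one is correct
    qa = q[informative & primary_ok]                 # primary should be chosen
    qb = q[informative & secondary_ok]               # secondary should be chosen
    ma, mb = qa.mean(axis=0), qb.mean(axis=0)
    # Pooled within-class scatter, with a small ridge for numerical stability.
    sw = (qa - ma).T @ (qa - ma) + (qb - mb).T @ (qb - mb)
    sw += 1e-9 * np.eye(q.shape[1])
    w = np.linalg.solve(sw, ma - mb)
    s = -0.5 * w @ (ma + mb)                         # threshold midway between means
    return w, s

def select_and_predict(q, pred_primary, pred_secondary, w, s):
    """Per sample, return the prediction of whichever classifier the LD favors."""
    use_primary = np.asarray(q, dtype=float) @ w + s >= 0
    return np.where(use_primary, pred_primary, pred_secondary)
```

Because the columns of `q` come from measures with different ranges, the fitted weights absorb the scale differences; the sketch makes no attempt to put the measures on a common scale first.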