Classification Complexity Measures
and Their Relationship to Feature Selection
- J. L. Solka and D. A. Johannsen
solkajl@nswc.navy.mil;johannsen@nswc.navy.mil
NSWCDD
Interface04 – p.1/40
Agenda
Minimal spanning tree complexity measures. An
Single nearest neighbor cross-validated classification performance.
Graph theoretic measures.
Minimal spanning tree (MST) measures.
Class cover catch digraph measures.
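The first measure listed above can be sketched in a few lines. This is a minimal illustration only: it assumes leave-one-out cross-validation (the slides say only "cross-validated"), and the function names `minkowski` and `loo_1nn_accuracy` and the toy data are hypothetical, not taken from the presentation.

```python
def minkowski(a, b, p=2.0):
    """Minkowski distance of order p between two equal-length sequences."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def loo_1nn_accuracy(points, labels, p=2.0):
    """Leave-one-out accuracy of a single-nearest-neighbor classifier.

    Assumption: leave-one-out is used as the cross-validation scheme.
    """
    correct = 0
    for i, query in enumerate(points):
        # Nearest neighbor of the held-out point among all other observations.
        j = min(
            (k for k in range(len(points)) if k != i),
            key=lambda k: minkowski(query, points[k], p),
        )
        correct += labels[j] == labels[i]
    return correct / len(points)


# Toy data (hypothetical): two well-separated classes in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
labels = [0, 0, 0, 1, 1, 1]
print(loo_1nn_accuracy(points, labels))
```

On well-separated classes every held-out point finds a neighbor of its own class, so the estimate is high; overlapping classes drive it down.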
[Figure: Minimum Spanning Tree Inter-Class Edges for Two Bivariate Normal Samples. Axis ranges roughly -4 to 4 and -5 to 3.]
[Figure: Minimum Spanning Tree Inter-Class Edges for Two Bivariate Normal Samples. Axis ranges roughly -3 to 5 and -6 to 4.]
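One way to turn the inter-class edges shown in these figures into a number is to build a minimum spanning tree over the pooled sample and report the fraction of its edges that join points of different classes. The sketch below makes several assumptions not confirmed by the slides: Euclidean distance, normalization by the total number of MST edges, and the hypothetical names `mst_edges` and `mst_complexity`.

```python
import math


def mst_edges(points):
    """Edges (i, j) of a Euclidean minimum spanning tree, via Prim's algorithm."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known connection of each vertex to the tree
    best_from = [-1] * n        # tree vertex that provides that connection
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the vertex outside the tree that is closest to it.
        u = min((k for k in range(n) if not in_tree[k]), key=lambda k: best_dist[k])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # Update the cheapest connections of the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v], best_from[v] = d, u
    return edges


def mst_complexity(points, labels):
    """Fraction of MST edges whose endpoints carry different class labels.

    Assumption: dividing by the number of MST edges is one plausible normalization.
    """
    edges = mst_edges(points)
    inter = sum(labels[i] != labels[j] for i, j in edges)
    return inter / len(edges)


# Toy data (hypothetical): two well-separated classes, so one bridging edge out of five.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
labels = [0, 0, 0, 1, 1, 1]
print(mst_complexity(points, labels))
```

Well-separated classes need only a single bridging edge, giving a value near zero, while heavily interleaved classes push the fraction toward one.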
Is there a correspondence between nearest neighbor cross-validated performance and the MST complexity measure?
Can one use the MST complexity measure as a surrogate for nearest neighbor classifier performance during classifier
What is the effect of the choice of the Minkowski p parameter on classifier performance?
Can the Minkowski p parameter and feature selection be simultaneously optimized based on some measure of classifier performance?
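For reference, the standard Minkowski distance of order p between two feature vectors x and y with d components is written out below. Whether the presentation used exactly this form (for example, with per-feature weights) cannot be recovered from the extracted text, so treat it as the textbook definition rather than the authors' equation.

```latex
d_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{d} \lvert x_i - y_i \rvert^{p} \right)^{1/p}, \qquad p \ge 1
```

Setting p = 1 gives the city-block distance, p = 2 the Euclidean distance, and the limit p → ∞ the maximum coordinate difference, so varying p changes which feature differences dominate the nearest-neighbor decision.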
[Figure: Performance and Complexity as a Function of Minkowski p Parameter for the (Nonsmoothed) Nose Data. Panels: Average Complexity (over cross validation) vs. p; Average Performance (over cross validation) vs. p. Annotation: p=5.]
[Figure: Average Performance Over the Cross-Validation vs. Number of Fibers. Annotated point: (21, 0.78125).]
[Figure: Performance and Complexity as a Function of Minkowski p Parameter for the Smoothed Nose Data. Panels: Average Complexity (over cross validation) vs. p; Average Performance (over cross validation) vs. p. Annotated point: (29, 0.85625).]
[Figure: Average Performance Over the Cross Validation. Annotated point: (37, 0.85).]
[Figure: Performance and Complexity as a Function of Minkowski p Value. Panels: Average Complexity (over cross validation) vs. p; Average Performance (over cross validation) vs. p.]
[Figure: Average Performance Over the Cross-Validation vs. Number of Genes. Annotated point: (372, 0.84722).]
Only retain those genes whose expression level is 20
Consider a genes by patients data matrix
Divide each column by its mean
Subject each row to a standard normalizing transformation
Reduces the dimensionality to roughly 1753 genes
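The two normalization steps above can be sketched as follows. This is an illustration under stated assumptions: the "standard normalizing transformation" is taken to be a z-score (subtract the row mean, divide by the row standard deviation), the function name `normalize` and the toy matrix are hypothetical, and the expression-level filter is omitted because its exact criterion is cut off in the slide text.

```python
from statistics import mean, pstdev


def normalize(matrix):
    """Normalize a genes-by-patients matrix, where matrix[g][j] is gene g in patient j."""
    n_genes, n_patients = len(matrix), len(matrix[0])
    # Step 1: divide each column (patient) by its mean.
    col_means = [mean(matrix[g][j] for g in range(n_genes)) for j in range(n_patients)]
    scaled = [
        [matrix[g][j] / col_means[j] for j in range(n_patients)]
        for g in range(n_genes)
    ]
    # Step 2 (assumed z-score): center each row (gene) and scale it to unit
    # standard deviation; constant rows are mapped to zeros.
    out = []
    for row in scaled:
        mu, sigma = mean(row), pstdev(row)
        out.append([(x - mu) / sigma if sigma > 0 else 0.0 for x in row])
    return out


# Toy matrix (hypothetical): 3 genes by 3 patients.
matrix = [
    [20.0, 40.0, 60.0],
    [30.0, 50.0, 100.0],
    [40.0, 30.0, 20.0],
]
for row in normalize(matrix):
    print([round(x, 3) for x in row])
```

After both steps every gene has zero mean and unit variance across patients, so no single gene dominates a distance computation because of its scale.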
[Figure: Complexity and Performance vs. Minkowski p for Pruned Leukemia Data. Panels: Average Complexity (over cross validation) vs. p; Average Performance (over cross validation) vs. p.]
[Figure: Average Performance Over the Cross-Validation vs. Number of Genes.]
[Figure: plot over x and y, each ranging from about -0.2 to 1.2. Labels: x^2, (x-1)^2 + y^2, (y-1)^2.]
 1   2   3   4   5   6   7   8   9  10
73  59  53  54  59  43  47  43  48  48
[Figure: Histogram of Dimension Count. x-axis: Number of Dimensions Selected (1 to 10); y-axis: Count.]
Presented preliminary results on the benefits of simultaneously optimizing (w,p,s)
Presented results illustrating the use of MST as a classification complexity measure
Our next step is to perform simultaneous (w,p,s)
such as the MST criteria in conjunction with appropriate