Order parameters and model selection in Machine Learning: model characterization and feature selection
Romaric Gaudel
Advisor: Michèle Sebag; Co-advisor: Antoine Cornuéjols
Introduction Relational Kernels Feature Selection Conclusion +
Expected loss: E_{P(x,y)} [ℓ(h(x), y)]
[Figure: decision regions of h∗: h∗(x) > 0, h∗(x) = 0, h∗(x) < 0]
Model Characterization and Feature Selection PhD, December 14, 2010 2 / 52
h∗_H = argmin_{h∈H} …
h_n = argmin_{h∈H} …
[Figure: hypothesis space H with h∗, h∗_H, h_n and ĥ_n; errors labelled Approximation, Estimation, Optimization]
Introduction Relational Kernels Feature Selection Conclusion + Position Theory Lower bound Experiments Discussion
(Giordana & Saitta, 00)
(Botta et al., 03)
[Figure: negative key ring; positive key ring]
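Optimization over α ∈ ℝⁿ, subject to Σ_{i=1}^{n} α_i y_i = 0
[Figure: soft-margin SVM: boundary ĥ_n(x) = 0, margins ĥ_n(x) = 1 and ĥ_n(x) = −1, half-spaces ĥ_n(x) > 0 and ĥ_n(x) < 0; slack values ξ_i = 0, 0 < ξ_i < 1, ξ_i > 1]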
K(x, x′) = Σ_{x_i∈x} Σ_{x_j∈x′} k(x_i, x_j)
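A minimal sketch of a set kernel of this form, in which the kernel between two bags of instances sums an instance-level kernel k(x_i, x_j) over all pairs with x_i in x and x_j in x′. The Gaussian choice for k, the bandwidth `gamma`, the function names and the toy bags are assumptions made for illustration; none of them come from the slides.

```python
import math

def gaussian_kernel(xi, xj, gamma=1.0):
    """Instance-level kernel k(xi, xj). The Gaussian form is an assumption."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)

def set_kernel(bag_x, bag_xp, k=gaussian_kernel):
    """Sum of k(xi, xj) over all xi in bag_x and all xj in bag_xp."""
    return sum(k(xi, xj) for xi in bag_x for xj in bag_xp)

# Two small bags of 2-D instances (hypothetical data).
bag_a = [(0.0, 0.0), (1.0, 0.0)]
bag_b = [(0.0, 0.0)]

# k((0,0),(0,0)) + k((1,0),(0,0)) = 1 + exp(-1)
print(set_kernel(bag_a, bag_b))
```

Because the double sum runs over every instance pair, the cost of one kernel evaluation grows with the product of the two bag sizes.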
[Figure: two molecules drawn as atom graphs, with atoms C, N, O and groups CH, CH3]
m+/M+ = m−/M−
[Figure: scatter plot with axes K(x−, x) and K(x+, x), both ranging from 0.001 to 0.008; legend: positive examples, negative examples]
Σ_{i=1}^{n} α_i y_i = 0
ĥ_n(x′) = Σ_{i=1}^{n} α_i y_i K(x_i, x′) + b
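A small sketch of how a decision function of the form Σ_i α_i y_i K(x_i, x′) + b can be evaluated once the coefficients α_i and the offset b are known. The linear kernel, the function names and the two-point toy problem are assumptions chosen for illustration; the slides use a relational kernel K instead.

```python
def linear_kernel(u, v):
    """Stand-in kernel for the example (an assumption, not the slides' kernel)."""
    return sum(ui * vi for ui, vi in zip(u, v))

def decision_function(alphas, labels, examples, x_new, kernel, b=0.0):
    """Evaluate sum_i alpha_i * y_i * K(x_i, x_new) + b."""
    return sum(a * y * kernel(xi, x_new)
               for a, y, xi in zip(alphas, labels, examples)) + b

# Toy problem: one positive and one negative training example.
alphas = [0.5, 0.5]
labels = [+1, -1]
examples = [(1.0, 0.0), (-1.0, 0.0)]

# The equality constraint sum_i alpha_i * y_i = 0 holds for these coefficients.
constraint = sum(a * y for a, y in zip(alphas, labels))

print(constraint)                                                              # 0.0
print(decision_function(alphas, labels, examples, (2.0, 0.0), linear_kernel))  # 2.0
```

With these coefficients both training examples sit exactly on the margin: the function returns 1 on the positive example and −1 on the negative one.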
[Figure: two plots with axes r+ and r−]
m+/M+ ≈ m−/M−
Introduction Relational Kernels Feature Selection Conclusion + Position FS-RL MCTS FUSE Experiments Discussion
F ⊆ 𝓕
[Figure: lattice of feature subsets built from f1, f2, f3]
[Figure: search tree over feature subsets; each edge adds one feature among f1, f2, f3]
π∗(F) = argmin_{f∈𝓕} V∗(F ∪ {f})
[Figure: one MCTS iteration, with labels: explored tree, search tree, bandit-based phase, new node, random phase]
Select argmax_{a∈A} μ̂_a + sqrt( (log T / t_a) · min(1/4, σ̂_a² + sqrt(2 log T / t_a)) )
σ̂_a²: Empirical variance of reward for action a
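The selection rule on this slide has the shape of the standard UCB1-tuned bandit rule, in which the exploration bonus is capped by min(1/4, empirical variance + a correction term). The sketch below implements that standard rule under assumptions: the function names, the reward lists and the feature names are hypothetical, and any exploration constant that the slides may use is not modelled.

```python
import math

def ucb1_tuned_score(mean, variance, t_a, total):
    """Empirical mean plus the UCB1-tuned exploration bonus.

    t_a is the number of trials of the action, total the number of trials overall.
    """
    log_ratio = math.log(total) / t_a
    bound = min(0.25, variance + math.sqrt(2.0 * log_ratio))
    return mean + math.sqrt(log_ratio * bound)

def select_action(stats):
    """stats maps each action to its observed rewards (at least one per action)."""
    total = sum(len(rewards) for rewards in stats.values())

    def score(action):
        rewards = stats[action]
        t_a = len(rewards)
        mean = sum(rewards) / t_a
        variance = sum((r - mean) ** 2 for r in rewards) / t_a
        return ucb1_tuned_score(mean, variance, t_a, total)

    return max(stats, key=score)

# Hypothetical reward history for three candidate features.
stats = {"f1": [0.6, 0.7, 0.65, 0.7], "f2": [0.9], "f3": [0.2, 0.3]}
print(select_action(stats))  # f2
```

Here `f2` wins both because its single observed reward is high and because it has been tried least, so its exploration bonus is the largest.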
(… 2 in experiments)
[Figure: number of considered actions against number of iterations]
t_{F,f} = Number of trials for feature f after visiting state F
* Mann Whitney Wilcoxon test (AUC): V(F) = |{((x,y),(x′,y′)) ∈ V², N_{F,k}(x) < N_{F,k}(x′), y < y′}| / |{((x,y),(x′,y′)) ∈ V², y < y′}|
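The ratio above counts, among all pairs of examples with y < y′, the fraction whose scores are ordered the same way. A direct sketch of that count follows; the score of each example (N_{F,k}(x) on the slide) is taken as a given number, since how it is computed is not spelled out here, and the function name and toy data are assumptions.

```python
def mww_auc(scores, labels):
    """Fraction of pairs with y < y' whose scores satisfy score(x) < score(x')."""
    ordered = 0
    total = 0
    for s, y in zip(scores, labels):
        for s2, y2 in zip(scores, labels):
            if y < y2:
                total += 1
                if s < s2:
                    ordered += 1
    if total == 0:
        raise ValueError("need at least one pair of examples with y < y'")
    return ordered / total

# Hypothetical scores for two negative (0) and two positive (1) examples.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(mww_auc(scores, labels))  # 0.75
```

Three of the four (negative, positive) pairs are ranked correctly, hence 0.75. Ties in the score count as misordered here because the inequality is strict, as in the formula.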
T-test “CFS vs. FUSE^R” with 100 features: p-value = 0.036
[Table: columns DATABASE, ALGORITHM, CHALLENGE ERROR, SUBMITTED FEATURES, IRRELEVANT FEATURES]
[1] … Li, E. P. V. Wilder-Smith. Feature selection via sensitivity analysis of SVM probabilistic outputs. Mach. …
[2] Feature extraction, foundations and applications, Springer 2006
Bibliography In a slide TP on MIP FUSE
Bibliography In a slide TP on MIP FUSE Position Contribution
Bibliography In a slide TP on MIP FUSE Stopping feature Computational effort Hyperparameters
[Figure: search tree over f1, f2, f3 extended with the stopping feature fS]
[Figure: test error (0.1 to 0.5) against iteration (1 to 10^5, log scale); curves: D-FUSE, D-FUSE^R, C-FUSE, C-FUSE^R, RAND^R]
[Figure: number of features chosen by FUSE (5 to 20) against iteration (1 to 10^5, log scale); curves: D-FUSE, C-FUSE]