Order parameters and model selection in Machine Learning: model characterization and feature selection


  1. Order parameters and model selection in Machine Learning: model characterization and feature selection. Romaric Gaudel. Advisor: Michèle Sebag; co-advisor: Antoine Cornuéjols. PhD defense, December 14, 2010.

  2. Supervised Machine Learning: background.
      Unknown distribution P(x, y) on X × Y.
      Objective: find h* minimizing the generalization error
          Err(h) = E_{P(x,y)}[ℓ(h(x), y)],
      where ℓ(h(x), y) is the cost of the error made on example x.
      Given: training examples L = {(x_1, y_1), ..., (x_n, y_n)}, with (x_i, y_i) ~ P(x, y), i = 1, ..., n.
      [Figure: a classifier's decision regions, with h*(x) > 0, h*(x) = 0 and h*(x) < 0.]
      A minimal sketch of the empirical counterpart Err_n follows.
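The empirical counterpart of Err(h) above is just an average of per-example losses. A minimal sketch, assuming the 0/1 loss and labels in {−1, +1} (the function names and toy data are illustrative, not from the slides):

```python
import numpy as np

def zero_one_loss(score, label):
    """l(h(x), y): cost 1 if the sign of h(x) disagrees with y, else 0."""
    return float(np.sign(score) != label)

def empirical_error(h, L):
    """Err_n(h): average loss over the training set L = {(x_i, y_i)}."""
    return np.mean([zero_one_loss(h(x), y) for x, y in L])

# Toy usage: a linear scoring hypothesis h(x) = <w, x>, labels in {-1, +1}.
w = np.array([1.0, -0.5])
h = lambda x: x @ w
L = [(np.array([0.2, 0.1]), 1), (np.array([-0.3, 0.4]), -1)]
print(empirical_error(h, L))  # 0.0: both examples are on the right side
```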

  3–5. Supervised Machine Learning (Vapnik–Chervonenkis; Bottou & Bousquet, 08): the error decomposition, built up over three slides.
      Approximation error (a.k.a. bias): the learned hypothesis belongs to H, and
          h*_H = argmin_{h ∈ H} Err(h).
      Estimation error (a.k.a. variance): Err is estimated by the empirical error
          Err_n(h) = (1/n) Σ_{i=1}^n ℓ(h(x_i), y_i),   h_n = argmin_{h ∈ H} Err_n(h).
      Optimization error: the learned hypothesis is returned by an optimization algorithm A,
          ĥ_n = A(L).
      [Figure: h*, h*_H, h_n and ĥ_n in the hypothesis space H, with the approximation, estimation and optimization gaps.]
      A toy illustration of the three argmins follows.
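To make the three argmins concrete, here is a toy illustration (my own construction, not from the defense): a finite class of 1-D threshold classifiers, where h*_H minimizes the (near-)true error on a huge sample, h_n minimizes the empirical error on a small sample, and ĥ_n is what a budget-limited optimizer A returns.

```python
import numpy as np

rng = np.random.default_rng(0)

# H: 1-D threshold classifiers h_theta(x) = sign(x - theta) on X = [0, 1].
thresholds = np.linspace(0.0, 1.0, 101)

def err(theta, x, y):
    """0/1 error of h_theta on the sample (x, y)."""
    return np.mean(np.where(x >= theta, 1, -1) != y)

# A large i.i.d. sample stands in for the unknown distribution P(x, y):
# y = sign(x - 0.3), flipped with probability 0.1 (label noise).
x_all = rng.random(100_000)
y_all = np.where(x_all >= 0.3, 1, -1)
y_all[rng.random(100_000) < 0.1] *= -1

x_n, y_n = x_all[:30], y_all[:30]                      # small training set L

h_star_H = thresholds[np.argmin([err(t, x_all, y_all) for t in thresholds])]
h_n      = thresholds[np.argmin([err(t, x_n, y_n) for t in thresholds])]
coarse   = thresholds[::10]                            # A only explores a coarse grid
h_hat_n  = coarse[np.argmin([err(t, x_n, y_n) for t in coarse])]
print(h_star_H, h_n, h_hat_n)  # estimation and optimization gaps are visible
```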

  6. Focus of the thesis: combinatorial optimization problems hidden in Machine Learning.
      Relational representation ⟹ combinatorial optimization problem (example: the Mutagenesis database).
      Feature Selection ⟹ combinatorial optimization problem (example: microarray data).

  7. Outline: 1. Relational Kernels; 2. Feature Selection.

  8. Outline (repeated at the start of Part 1): 1. Relational Kernels; 2. Feature Selection.

  9. Relational Learning / Inductive Logic Programming: position.
      Relational database — X: keys in the database.
      Background knowledge — H: set of logical formulas, an expressive language.
      The actual covering test is a Constraint Satisfaction Problem (CSP).

  10. CSP consequences within Inductive Logic Programming: consequences of the Phase Transition.
      Complexity: NP-hard in the worst case; "easy" on average, except in the Phase Transition (Cheeseman et al., 91).
      Phase Transition in Inductive Logic Programming: existence (Giordana & Saitta, 00); impact: learning fails in the Phase Transition region (Botta et al., 03).

  11. Multiple Instance Problems: the missing link between relational and propositional learning.
      Multiple Instance Problems (MIP) (Dietterich et al., 89): an example is a set of instances; an instance is a vector of features.
      Target concept: there exists an instance satisfying a predicate P,
          pos(x) ⟺ ∃ I ∈ x, P(I).
      Example of a MIP: key rings and a locked door — a positive key ring contains a key which can unlock the door. [Figure: a positive key ring, a negative key ring, a locked door.]
      A minimal sketch of this covering test follows.
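In code, the MIP covering test is a single existential check over the bag. A minimal sketch (the key encoding and door predicate are invented for illustration):

```python
def positive(bag, P):
    """pos(x) <=> there exists an instance I in the bag x with P(I)."""
    return any(P(instance) for instance in bag)

# Hypothetical key-ring encoding: a key opens the door iff its bitting matches.
opens_door = lambda key: key == "bitting-4711"
positive_ring = ["bitting-0001", "bitting-4711", "bitting-9000"]
negative_ring = ["bitting-0001", "bitting-9000"]
print(positive(positive_ring, opens_door))  # True: one key unlocks the door
print(positive(negative_ring, opens_door))  # False
```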

  12. Support Vector Machines: a convex optimization problem.
      Dual formulation:
          α̂ = argmin_{α ∈ ℝⁿ}  (1/2) Σ_{i,j} α_i α_j y_i y_j ⟨x_i, x_j⟩ − Σ_i α_i
          s.t.  Σ_i α_i y_i = 0  and  0 ≤ α_i ≤ C, i = 1, ..., n.
      Kernel trick: replace ⟨x_i, x_j⟩ with K(x_i, x_j).
      [Figure: margin and slack variables: ξ_i = 0 for well-classified points, 0 < ξ_i < 1 inside the margin, ξ_i > 1 for misclassified points; level lines ĥ_n(x) = −1, 0, +1.]
      Kernel-based propositionalization (differs from the RKHS framework): given L = {(x_1, y_1), ..., (x_n, y_n)} and a kernel K, map
          Φ : x ↦ (K(x_1, x), ..., K(x_n, x)).
      A sketch of this map follows.
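A sketch of the propositionalization map Φ, assuming some kernel K on examples is given (the Gaussian kernel below is only a stand-in; a relational kernel fits the same slot):

```python
import numpy as np

def propositionalize(K, train_examples, x):
    """Phi: x -> (K(x_1, x), ..., K(x_n, x)): one coordinate per training example."""
    return np.array([K(x_i, x) for x_i in train_examples])

# Stand-in kernel on vectors; any kernel on examples can replace it.
gauss = lambda a, b: np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2))
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(propositionalize(gauss, train, [0.5, 0.5]))  # x re-described in R^3
```

Each example is thus re-described by its similarities to the n training examples, after which any propositional learner applies.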

  13. SVMs and MIP: the averaging kernel for MIP (Gärtner et al., 02).
      Given a kernel k on instances:
          K(x, x') = ( Σ_{I ∈ x} Σ_{I' ∈ x'} k(I, I') ) / ( norm(x) · norm(x') ).
      Question: the MIP target concept is existential, whereas the averaging kernel averages properties over the whole bag. Do averaging kernels sidestep the limitations of Relational Learning? (A sketch of the kernel follows.)
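A sketch of the averaging kernel; the slide leaves norm(·) unspecified, so the choice below, norm(x) = sqrt(Σ_{I,I' ∈ x} k(I, I')), which gives K(x, x) = 1, is an assumption:

```python
import numpy as np

def averaging_kernel(k, x, x_prime):
    """K(x, x') = sum of k over instance pairs, divided by norm(x) * norm(x')."""
    cross = sum(k(i, j) for i in x for j in x_prime)
    # Assumed normalization (not fixed by the slide): makes K(x, x) = 1.
    norm = lambda bag: np.sqrt(sum(k(i, j) for i in bag for j in bag))
    return cross / (norm(x) * norm(x_prime))

# Stand-in instance kernel on numerical vectors.
k = lambda a, b: np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2))
bag, bag_prime = [[0.0], [1.0]], [[0.2], [0.9]]
print(averaging_kernel(k, bag, bag_prime))
```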

  14. Methodology, inspired by Phase Transition studies.
      Usual Phase Transition framework: generate data according to the control parameters; observe the results; draw the phase diagram, i.e. the results w.r.t. the order parameters (a skeleton of this sweep follows).
      This study: Generalized Multiple Instance Problems; experimental results of averaging-kernel-based propositionalization.
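The framework boils down to a sweep over the control-parameter grid. A skeleton, where generate, learn and evaluate are placeholders to be supplied by the study at hand:

```python
import itertools
import numpy as np

def phase_diagram(generate, learn, evaluate, grid_a, grid_b, runs=10):
    """Average the observed result at each point of the control-parameter grid."""
    diagram = np.zeros((len(grid_a), len(grid_b)))
    for (i, a), (j, b) in itertools.product(enumerate(grid_a), enumerate(grid_b)):
        diagram[i, j] = np.mean([evaluate(learn(generate(a, b)))
                                 for _ in range(runs)])
    return diagram  # plotted against the two order parameters
```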

  15. Outline — 1. Relational Kernels: theoretical failure region; lower bound on the generalization error; empirical failure region. 2. Feature Selection.

  16. Generalized Multiple Instance Problems: Generalized MIP (Weidmann et al., 03).
      An example is a set of instances; an instance is a vector of features.
      Target concept: a conjunction of predicates P_1, ..., P_m,
          pos(x) ⟺ ∃ I_1, ..., I_m ∈ x,  ⋀_{i=1}^m P_i(I_i).
      Example of a Generalized MIP: a molecule is a set of sub-graphs, and bioactivity requires several sub-graphs to be present. [Figure: two molecular graphs built from C, N, O and CH3 groups.]
      A sketch of the covering test follows.
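Since the instances I_1, ..., I_m need not be distinct, the covering test reduces to: every predicate is satisfied by some instance of the bag. A minimal sketch (the string-encoded sub-graphs and predicates are invented):

```python
def positive_generalized(bag, predicates):
    """pos(x) <=> each P_i is satisfied by some instance of the bag x."""
    return all(any(P(instance) for instance in bag) for P in predicates)

# Hypothetical sub-graph predicates on string-encoded molecular fragments.
has_amide = lambda g: "C(=O)N" in g
has_methyl = lambda g: "CH3" in g
molecule = ["CH3-N", "C(=O)N-C"]                 # a bag of sub-graphs
print(positive_generalized(molecule, [has_amide, has_methyl]))  # True
```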

  17. Control parameters.
      Instances:
          |Σ| — size of the alphabet Σ (a ∈ Σ)
          d   — number of numerical features; an instance is I = (a, z) with z ∈ [0, 1]^d
      Examples:
          M+  — number of instances per positive example
          M−  — number of instances per negative example
          m+  — number of instances in a predicate, for a positive example
          m−  — number of instances in a predicate, for a negative example
          P_m — number of predicates "missed" by each negative example
      Concept:
          P — number of predicates
          ε — radius of each predicate (an ε-ball)
      A generator sketch driven by these parameters follows.
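A sketch of a generator driven by the instance-level parameters (|Σ|, d, M+/M−); the concept-level constraints (m+, m−, P_m, P, ε) are omitted, so this only illustrates the shape of the generated data, not the full protocol of the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_instance(sigma, d):
    """One instance I = (a, z): a symbol a of the alphabet Sigma and a vector z
    of d numerical features drawn uniformly in [0, 1]^d."""
    return rng.choice(sigma), rng.random(d)

def generate_bag(sigma, d, M):
    """An example: a bag of M instances (M = M+ or M- per the table above)."""
    return [generate_instance(sigma, d) for _ in range(M)]

bag = generate_bag(sigma=list("abc"), d=5, M=10)
print(bag[0])  # e.g. ('c', array([...5 features...]))
```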
