High Performance Computing and Networking Institute
National Research Council, Italy

A constructive approach to incremental learning

Mario Rosario Guarracino
October 12, 2006
Acknowledgements
- Prof. Franco Giannessi – U. of Pisa
- Prof. Panos Pardalos – CAO, UFL
- Onur Seref – CAO, UFL
- Claudio Cifarelli – U. of Rome La Sapienza
Agenda
- Generalized eigenvalue classification
- Purpose of incremental learning
- Subset selection algorithm
- Initial points selection
- Accuracy results
- Conclusion and future work
Introduction
Supervised learning refers to the capability of a system to learn from examples (the training set).
The trained system is able to provide an answer (output) for each new question (input).
Supervised means that the desired output for the training set is provided by an external teacher.
Binary classification is among the most successful methods for supervised learning.
Applications
Many applications in biology and medicine:
- Tissues that are prone to cancer can be detected with high accuracy.
- New DNA sequences or proteins can be traced back to their origins.
- Identification of new genes or isoforms of gene expression in large datasets.
- Analysis and reduction of data dimensionality and principal characteristics for drug design.
Peculiarity of the problem
Data produced in biomedical applications will increase exponentially in the coming years.
In genomic/proteomic applications, data are often updated, which poses problems for the training step.
Publicly available datasets contain gene expression data with tens of thousands of features.
Current classification methods can over-fit the problem, providing models that do not generalize well.
Linear discriminant planes
Consider a binary classification task with points in two linearly separable sets:
- There exists a plane that classifies all points in the two sets.
There are infinitely many planes that correctly classify the training data.
Best plane
To construct the plane farthest from both classes, we examine the convex hull of each set.
The best plane bisects the closest points in the two convex hulls.
[Figure: convex hulls of the two sets A and B; the best plane bisects the closest points c and d]
SVM classification
A different approach, yielding the same solution, is to maximize the margin between support planes:
- Support planes leave all points of a class on one side.
The support planes are pushed apart until they "bump" into a small set of data points (the support vectors).
SVM classification
Support Vector Machines are the state of the art among existing classification methods.
Their robustness is due to the strong foundations of statistical learning theory.
Training relies on the optimization of a quadratic convex cost function, for which many methods are available.
- Available software includes SVMlight and LIBSVM.
These techniques can be extended to nonlinear discrimination by embedding the data in a nonlinear space using kernel functions.
A different religion
Mangasarian (2004) showed that the binary classification problem can be formulated as a generalized eigenvalue problem (GEPSVM).
Find the plane x'w1 = γ1 that is closest to A and farthest from B:

    min_{w,γ}  ||Aw − eγ||² / ||Bw − eγ||²

(e denotes the vector of all ones).
- O. L. Mangasarian and E. W. Wild, Multisurface Proximal Support Vector Classification via Generalized Eigenvalues, Data Mining Institute Tech. Rep. 04-03, June 2004.
GEP technique

Let

    G = [A  −e]' [A  −e],    H = [B  −e]' [B  −e],    x = [w' γ]'.

The previous equation becomes the Rayleigh quotient

    min_x  x'Gx / x'Hx,

whose stationary points are the eigenvectors of the generalized eigenvalue problem (GEP)

    Gx = λHx.
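For concreteness, a minimal Python sketch (not the authors' code) that assembles G and H and solves the GEP with SciPy's generalized eigensolver; the function name gepsvm_planes and the guard against infinite eigenvalues are our additions:

    import numpy as np
    from scipy.linalg import eig

    def gepsvm_planes(A, B):
        # A, B: (points x features) arrays, one class each.
        # Returns x = [w, gamma] for the plane closest to A (min eigenvalue)
        # and the plane closest to B (max eigenvalue).
        Ma = np.hstack([A, -np.ones((A.shape[0], 1))])   # [A  -e]
        Mb = np.hstack([B, -np.ones((B.shape[0], 1))])   # [B  -e]
        G = Ma.T @ Ma
        H = Mb.T @ Mb
        vals, vecs = eig(G, H)                 # solves G x = lambda H x
        vals = vals.real
        # guard against infinite eigenvalues from a singular H
        lo = np.argmin(np.where(np.isfinite(vals), vals, np.inf))
        hi = np.argmax(np.where(np.isfinite(vals), vals, -np.inf))
        return vecs[:, lo].real, vecs[:, hi].real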
GEP technique
Conversely, to find the plane closest to B and farthest from A we need to solve

    min_x  x'Hx / x'Gx,

which has the same eigenvectors as the previous problem and reciprocal eigenvalues. We only need to evaluate the eigenvectors related to the minimum and maximum eigenvalues of Gx = λHx.
GEP technique

Let [w1 γ1] and [wm γm] be the eigenvectors associated with the minimum and maximum eigenvalues of Gx = λHx. Then:
- each a in A is closer to x'w1 − γ1 = 0 than to x'wm − γm = 0,
- each b in B is closer to x'wm − γm = 0 than to x'w1 − γ1 = 0.
Regularization
A and B can be rank-deficient. Then G and H are rank-deficient as well: each is the product of matrices of rank at most n, and therefore has 0 among its eigenvalues. Do we need to regularize the problem to obtain a well-posed problem?
A useful theorem

Consider the GEP Gx = λHx and the transformed problem G*x = λ*H*x defined by

    G* = τ1 G + δ1 H,    H* = τ2 H + δ2 G,

for each choice of the scalars τ1, τ2, δ1 and δ2 such that the 2×2 matrix

    Ω = | τ1  δ1 |
        | δ2  τ2 |

is nonsingular. Then G*x = λ*H*x and Gx = λHx have the same eigenvectors.
Linear case
In the linear case, the theorem can be applied. For τ1 = τ2 = 1 and δ1 = δ2 = δ, the transformed problem is

    (G + δH) x = λ* (H + δG) x.

As long as δ ≠ 1, the matrix Ω is nonsingular.
In practice, each class of the training set must contain a number of linearly independent points equal to the number of features:

    prob( Ker(G) ∩ Ker(H) ≠ {0} ) = 0.
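As a sketch, the transformation itself is a one-liner (assuming the plus-sign form stated above):

    def regularized_pencil(G, H, delta):
        # (G, H) -> (G + delta*H, H + delta*G); for delta != 1 the new
        # pencil has the same eigenvectors as G x = lambda H x
        return G + delta * H, H + delta * G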
Classification accuracy: linear kernel
Dataset          dim   train    SVM    GEPSVM   ReGEC
NDC                7     300   89.00    86.70   87.60
ClevelandHeart    13     297   83.60    81.80   86.05
PimaIndians        8     768   75.70    73.60   74.91
GalaxyBright      14    2462   98.30    98.60   98.24

Accuracy results have been obtained using ten-fold cross-validation.
Nonlinear case
A standard technique to obtain greater separability between sets is to embed the points into a nonlinear space via kernel functions, like the Gaussian kernel. Each element of the kernel matrix is

    K(A,B)_ij = exp( −||A_i − B_j||² / σ ),

where A_i and B_j denote the i-th row of A and the j-th row of B.
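A direct transcription of this kernel in Python (σ as a plain width parameter, matching the formula above):

    import numpy as np

    def gaussian_kernel(X, C, sigma):
        # K[i, j] = exp(-||X_i - C_j||^2 / sigma) for row vectors X_i, C_j
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / sigma)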
Nonlinear case
Using a Gaussian kernel, the problem becomes

    min_{u,γ}  ||K(A,C)u − eγ||² / ||K(B,C)u − eγ||²,

where C = [A; B], producing the proximal surfaces

    K(x', C) u1 − γ1 = 0    and    K(x', C) u2 − γ2 = 0.

The associated GEP involves matrices of order equal to the size of the training set, with rank at most the number of points in the corresponding class.
ReGEC
The matrices are deeply rank-deficient and the problem is ill-posed.
We propose to generate the two proximal surfaces K(x',C)u1 − γ1 = 0 and K(x',C)u2 − γ2 = 0 by solving the regularized problem

    min_{u,γ}  ( ||K(A,C)u − eγ||² + δ ||K̃_B u||² ) / ( ||K(B,C)u − eγ||² + δ ||K̃_A u||² ),

where K̃_A and K̃_B are diagonal matrices whose entries are the main diagonals of K(A,C) and K(B,C).
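A sketch of ReGEC training along these lines, reusing the gaussian_kernel function above. This is not the authors' implementation; as a simplification, the diagonal regularizers here are built from the diagonals of G and H rather than from K̃_A and K̃_B, which plays the same stabilizing role:

    import numpy as np
    from scipy.linalg import eig

    def regec_train(A, B, sigma, delta=1e-2):
        C = np.vstack([A, B])                       # whole training set
        def quad(K):                                # [K  -e]' [K  -e]
            M = np.hstack([K, -np.ones((K.shape[0], 1))])
            return M.T @ M
        G = quad(gaussian_kernel(A, C, sigma))
        H = quad(gaussian_kernel(B, C, sigma))
        G_star = G + delta * np.diag(np.diag(H))    # stabilized pencil
        H_star = H + delta * np.diag(np.diag(G))
        vals, vecs = eig(G_star, H_star)            # G* x = lambda H* x
        vals = vals.real
        x1 = vecs[:, np.argmin(vals)].real          # surface close to A
        x2 = vecs[:, np.argmax(vals)].real          # surface close to B
        return x1, x2                               # each is [u, gamma]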
Classification accuracy: Gaussian kernel

Dataset          m   train   test    SVM    GEPSVM   ReGEC
Breast-cancer    9     200     77   73.49    71.73   73.40
Diabetis         8     468    300   76.21    74.75   74.56
German          20     700    300   75.66    69.36   70.26
Thyroid          5     140     75   95.20    92.71   92.76
Heart           13     170    100   83.05    81.43   82.06
Waveform        21     400   4600   90.21    87.70   88.56
Flare-solar      9     666    400   65.80    59.63   58.23
Titanic          3     150   2051   77.36    75.77   75.29
Banana           2     400   4900   89.15    85.53   84.44

Accuracy with the ten random splits provided by the IDA repository.
Methods generalization
The classification surfaces are very tangled. Such models fit the original data well, but do not generalize to new data (over-fitting).
How to solve the problem?
Incremental classification
A possible solution is to find a small and robust subset of the training set that provides comparable accuracy.
A smaller set of points reduces the probability of over-fitting the problem.
A kernel built from a smaller subset is computationally more efficient in predicting new points than kernels that use the entire training set.
As new points become available, the cost of retraining the algorithm decreases if the influence of the new points is evaluated only against the small subset.
Incremental learning algorithm
Γ0 = C \ C0
{M0, Acc0} = Classify(C, C0)
k = 1
while |Γk| > 0 do
    xk = arg max { dist(x, P_class(x)) : x ∈ Mk−1 ∩ Γk−1 }
    {Mk, Acck} = Classify(C, Ck−1 ∪ {xk})
    if Acck > Acck−1 then
        Ck = Ck−1 ∪ {xk}
        k = k + 1
    end if
    Γk = Γk−1 \ {xk}
end while

(C is the training set, C0 the initial points, Mk the points misclassified at step k, and Γk the points not yet considered.)
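A Python sketch of this loop; the classify callback (train on a subset, evaluate on the whole of C) and the dist callback (distance of a point from the proximal surface of its class) are hypothetical stand-ins for the ReGEC machinery:

    def incremental_selection(n_points, C0, classify, dist):
        # n_points : size of the full training set C (indices 0..n-1)
        # C0       : list of initial point indices (e.g. chosen by k-means)
        # classify : callable(subset) -> (misclassified_indices, accuracy)
        # dist     : callable(i) -> distance of point i from P_class(i)
        Ck = list(C0)                            # incremental set C_k
        pool = set(range(n_points)) - set(Ck)    # Gamma_k = C \ C_0
        mis, acc = classify(Ck)                  # M_0, Acc_0
        while pool:
            cand = [i for i in mis if i in pool] # M_{k-1} and Gamma_{k-1}
            if not cand:
                break
            xk = max(cand, key=dist)             # worst classified point
            new_mis, new_acc = classify(Ck + [xk])
            if new_acc > acc:                    # keep x_k only if it helps
                Ck, mis, acc = Ck + [xk], new_mis, new_acc
            pool.discard(xk)                     # x_k is never reconsidered
        return Ck, acc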
I-ReGEC: Incremental ReGEC
[Figure: Banana dataset. Left: ReGEC, accuracy 84.44. Right: I-ReGEC, accuracy 85.49.]

When the ReGEC algorithm is trained on all points, the surfaces are affected by noisy points (left).
I-ReGEC achieves clearly defined boundaries while preserving accuracy (right). Less than 5% of the points are needed for training!
Initial points selection
Unsupervised clustering techniques can be adapted to select the initial points.
We compare the classification obtained with k randomly selected starting points per class against that obtained with k points determined by the k-means method (a sketch follows below).
Results show higher classification accuracy and a more consistent representation of the training set when k-means is used instead of random selection.
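A sketch of the k-means seeding (using scikit-learn's KMeans; mapping each centroid back to the nearest actual training point is our assumption):

    import numpy as np
    from sklearn.cluster import KMeans

    def initial_points(X, y, k):
        # returns indices of k starting points per class: the training
        # points nearest to the k-means centroids of that class
        idx = []
        for label in np.unique(y):
            cls = np.flatnonzero(y == label)
            km = KMeans(n_clusters=k, n_init=10).fit(X[cls])
            for c in km.cluster_centers_:
                idx.append(cls[np.argmin(((X[cls] - c) ** 2).sum(axis=1))])
        return np.array(idx)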
Initial points selection
Starting points Ci are chosen randomly (top) or by k-means (bottom). For each kernel produced by Ci, a set of evenly distributed points x is classified. The procedure is repeated 100 times. Let yi ∈ {1, −1} be the classification based on Ci; then

    y = |Σi yi| / 100

estimates the probability that x is consistently classified in one class.

[Figure: random acc = 84.5, std = 0.05 (top); k-means acc = 85.5, std = 0.01 (bottom)]
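The estimate can be computed directly from the repeated runs (a sketch; runs is assumed to hold one ±1 labelling of the grid points per repetition):

    import numpy as np

    def consistency(runs):
        # runs: (n_repetitions, n_points) array with labels in {1, -1};
        # |mean| per point is 1 when x always gets the same class and
        # near 0 when the label flips between repetitions
        return np.abs(np.asarray(runs).mean(axis=0))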
Initial points selection

The same experiment, repeated on a second dataset:

[Figure: random acc = 72.1, std = 1.45 (top); k-means acc = 97.6, std = 0.04 (bottom)]
Initial point selection
Effect of increasing the number of initial k-means points on the Chessboard dataset.
The graph shows the classification accuracy versus the total number of initial points 2k from both classes.
This result empirically shows that there is a minimum k at which high accuracy is reached.
Initial point selection
The bottom figure shows k versus the number of additional points included in the incremental dataset.

[Figure: accuracy vs. 2k (top); number of additional points vs. k (bottom)]
Dataset reduction
Dataset       I-ReGEC chunk   % of train
Banana            15.70          3.92
German            29.09          4.15
Diabetis          16.63          3.55
Haberman           7.59          2.76
Bupa              15.28          4.92
Votes             25.90          6.62
WPBC              42.15         42.58
Thyroid           12.40          8.85
Flare-solar        9.67          1.45

(chunk = number of training points selected by I-ReGEC, averaged over splits)
Accuracy results
Dataset       train   ReGEC acc   chunk    k   I-ReGEC acc   SVM acc
Banana          400     84.44     15.70    5     85.49        89.15
German          700     70.26     29.09    8     73.50        75.66
Diabetis        468     74.56     16.63    5     74.13        76.21
Haberman        275     73.26      7.59    2     73.45        71.70
Bupa            310     59.03     15.28    4     63.94        69.90
Votes           391     95.09     25.90   10     93.41        95.60
WPBC             99     58.36     42.15    2     60.27        63.60
Thyroid         140     92.76     12.40    5     94.01        95.20
Flare-solar     666     58.23      9.67    3     65.11        65.80
Positive results
Incremental learning, in conjunction with ReGEC, reduces the size of the training set.
Accuracy does not deteriorate when fewer training points are selected.
Classification surfaces generalize better.
Positive results
Incremental classification can be applied to different algorithms and still enhances accuracy:

Dataset       T.r.a.c.e. acc (bar)   I-T.r.a.c.e. acc (bar)
Banana           85.06 (129.35)         87.26 (23.56)
German           69.50 (268.04)         72.15 (34.11)
Diabetis         67.83 (185.60)         72.55 (9.85)
Haberman         63.85 (129.22)         72.82 (11.14)
Bupa             65.80 (153.80)         66.21 (11.79)
Votes            92.70 (60.69)          93.25 (15.12)
WPBC             66.00 (129.35)         69.78 (23.56)
Thyroid          94.77 (21.57)          94.55 (13.41)
Flare-Solar      60.23 (68.06)          65.81 (4.20)

(courtesy of Claudio Cifarelli)
Not so positive results
There are points in the training set that are not chosen by the method but would increase accuracy.
Block selection does not give any improvement.
Work in progress
Incremental classification with feature selection for microarray datasets.

Dataset (samples × features)    chunk   % of train   features   % of features
H-BRCA1     (22 × 3226)          6.11     30.55        49.85        1.55
H-BRCA2     (22 × 3226)          4.28     21.40        56.48        1.75
H-Sporadic  (22 × 3226)          6.80     34.00        57.15        1.77
Singh      (136 × 12600)         6.87      5.63       288.23        2.29
Nutt        (50 × 12625)         8.29     18.42       211.66        1.68
Vantveer    (98 × 24188)         8.10      9.31       474.35        1.96
Iizuka      (60 × 7129)         20.14     37.30       122.63        1.72
Alon        (62 × 2000)          5.43      9.70        32.43        1.62
Golub       (72 × 7129)          7.25     11.15        95.39        1.34
Work in progress
Dataset (samples × features)   L-LS SVM  K-LS SVM  U-PCA FDA  S-PCA FDA  L-U PCA FDA  L-S PCA FDA  K-U PCA FDA  Golub  I-ReGEC
H-BRCA1     (22 × 3226)          75.00     72.62     77.38      75.00       76.19        69.05        66.67     52.38   80.00
H-BRCA2     (22 × 3226)          84.52     77.38     72.62      79.76       69.05        72.62        64.29     63.10   85.00
H-Sporadic  (22 × 3226)          73.81     78.57     69.05      75.00       70.24        79.76        69.05     69.05   77.00
Singh      (136 × 12600)         91.20     90.48      n.a.       n.a.       88.74        84.85         n.a.      n.a.   77.86
Nutt        (50 × 12625)         72.22     74.60      n.a.       n.a.       67.46        67.46         n.a.      n.a.   76.60
Vantveer    (98 × 24188)         66.86     66.86      n.a.       n.a.       65.33        64.57         n.a.      n.a.   68.00
Iizuka      (60 × 7129)          67.10     61.90      n.a.       n.a.       66.67        61.90         n.a.      n.a.   69.00
Alon        (62 × 2000)          91.27     82.14     90.08      89.68       90.08        84.52        90.87     81.75   83.50
Golub       (72 × 7129)          96.83     93.65     93.25      93.25       94.44        90.08        92.06     88.10   96.86

L = linear, K = RBF, U = unsupervised, S = supervised. http://www.esat.kuleuven.be/MACBETH/
Conclusions
Generalized eigenvalue classification is a competitive method.
Incremental learning reduces redundancy in training sets and can help to avoid over-fitting.
The subset selection algorithm provides a constructive way to reduce the complexity of kernel-based classification algorithms.
The initial points selection strategy can help in finding regions where knowledge is missing.
I-ReGEC can be a starting point to explore very large problems.