A constructive approach to incremental learning - Mario Rosario Guarracino (PowerPoint PPT presentation)


  1. High Performance Computing and Networking Institute, National Research Council, Italy
     The Data Reference Model: A constructive approach to incremental learning
     Mario Rosario Guarracino
     October 12, 2006

  2. Acknowledgements
     • Prof. Franco Giannessi, University of Pisa
     • Prof. Panos Pardalos, CAO, University of Florida
     • Onur Seref, CAO, University of Florida
     • Claudio Cifarelli, University of Rome La Sapienza

  3. Agenda
     • Generalized eigenvalue classification
     • Purpose of incremental learning
     • Subset selection algorithm
     • Initial points selection
     • Accuracy results
     • Conclusion and future work

  4. Introduction
     • Supervised learning refers to the capability of a system to learn from examples (the training set).
     • The trained system is able to provide an answer (output) for each new question (input).
     • Supervised means that the desired output for the training set is provided by an external teacher.
     • Binary classification is among the most successful methods for supervised learning.

  5. Applications
     Many applications in biology and medicine:
     • Tissues that are prone to cancer can be detected with high accuracy.
     • New DNA sequences or proteins can be tracked down to their origins.
     • Identification of new genes or isoforms of gene expression in large datasets.
     • Analysis and reduction of data dimensionality and principal characteristics for drug design.

  6. Peculiarity of the problem
     • Data produced in biomedical applications will increase exponentially in the coming years.
     • In genomic/proteomic applications, data are often updated, which poses problems for the training step.
     • Publicly available datasets contain gene expression data for tens of thousands of characteristics.
     • Current classification methods can over-fit the problem, providing models that do not generalize well.

  7. Linear discriminant planes
     • Consider a binary classification task with points in two linearly separable sets.
       – There exists a plane that classifies all points in the two sets correctly.
     • There are infinitely many planes that correctly classify the training data.

  8. Best plane
     • To construct the plane furthest from both classes, we examine the convex hull of each set.
       [Figure: convex hulls of the two classes, with c and d the closest points between them]
     • The best plane bisects the closest points in the convex hulls.

  9. SVM classification
     • A different approach, yielding the same solution, is to maximize the margin between support planes.
       – Support planes leave all points of a class on one side.
       [Figure: support planes of the two classes and the margin between them]
     • Support planes are pushed apart until they “bump” into a small set of data points (the support vectors).

  10. SVM classification
     • Support Vector Machines are the state of the art among existing classification methods.
     • Their robustness is due to the strong foundations of statistical learning theory.
     • Training relies on the optimization of a quadratic convex cost function, for which many methods are available.
       – Available software includes SVMlight and LIBSVM.
     • These techniques can be extended to nonlinear discrimination by embedding the data in a nonlinear space using kernel functions.
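For orientation, a minimal linear-SVM training run, sketched here with scikit-learn's SVC class (a LIBSVM wrapper); the toy data and the parameter C=1.0 are illustrative assumptions, not values from the talk:

```python
# Minimal linear SVM sketch using scikit-learn's SVC, which wraps LIBSVM.
# Toy data and C=1.0 are illustrative assumptions only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
A = rng.normal(+2.0, 1.0, size=(50, 2))          # class A points
B = rng.normal(-2.0, 1.0, size=(50, 2))          # class B points
X = np.vstack([A, B])
y = np.array([+1] * 50 + [-1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print("number of support vectors:", len(clf.support_))
print("training accuracy:", clf.score(X, y))
```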

  11. A different religion
     • Mangasarian and Wild (2004) showed that the binary classification problem can be formulated as a generalized eigenvalue problem (GEPSVM).
     • Find the plane x'w1 = γ1 closest to A and farthest from B:

       \min_{w,\gamma \neq 0} \; \frac{\|Aw - e\gamma\|^2}{\|Bw - e\gamma\|^2}

       where e denotes the vector of ones.
     O. L. Mangasarian and E. W. Wild, Multisurface Proximal Support Vector Classification via Generalized Eigenvalues, Data Mining Institute Tech. Rep. 04-03, June 2004.

  12. GEP technique
     Let

       G = [A \;\; -e]' \, [A \;\; -e], \qquad H = [B \;\; -e]' \, [B \;\; -e], \qquad z = [w' \;\; \gamma]'.

     The previous equation becomes

       \min_{z \neq 0} \; \frac{z'Gz}{z'Hz},

     the Rayleigh quotient of the generalized eigenvalue problem Gx = λHx.
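A numerical sketch of this construction, assuming NumPy/SciPy and random toy data in place of real classes A and B:

```python
# Build G = [A -e]'[A -e] and H = [B -e]'[B -e] from toy data and solve G x = lambda H x.
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(1)
A = rng.normal(+1.0, 1.0, size=(60, 5))          # 60 points of class A, 5 features
B = rng.normal(-1.0, 1.0, size=(80, 5))          # 80 points of class B, 5 features

GA = np.hstack([A, -np.ones((A.shape[0], 1))])   # [A -e]
GB = np.hstack([B, -np.ones((B.shape[0], 1))])   # [B -e]
G = GA.T @ GA
H = GB.T @ GB

vals, vecs = eig(G, H)                           # generalized eigenvalue problem
vals = vals.real
z1 = vecs[:, np.argmin(vals)].real               # [w1; gamma1]: plane closest to A, farthest from B
zm = vecs[:, np.argmax(vals)].real               # [wm; gammam]: plane closest to B, farthest from A
w1, gamma1 = z1[:-1], z1[-1]
wm, gammam = zm[:-1], zm[-1]
```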

  13. GEP technique
     Conversely, to find the plane closest to B and farthest from A we need to solve

       \min_{w,\gamma \neq 0} \; \frac{\|Bw - e\gamma\|^2}{\|Aw - e\gamma\|^2},

     which has the same eigenvectors as the previous problem and reciprocal eigenvalues. We therefore only need to evaluate the eigenvectors related to the minimum and maximum eigenvalues of Gx = λHx.

  14. GEP technique
     Let [w1 γ1] and [wm γm] be the eigenvectors associated with the minimum and maximum eigenvalues of Gx = λHx:
     • each a ∈ A is closer to x'w1 - γ1 = 0 than to x'wm - γm = 0,
     • each b ∈ B is closer to x'wm - γm = 0 than to x'w1 - γ1 = 0.
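Continuing the sketch above (reusing its A, B, w1, gamma1, wm, gammam), the classification rule simply compares distances to the two proximal planes:

```python
# Assign each point to the class whose proximal plane is nearer (continuation of the previous sketch).
def plane_distance(X, w, gamma):
    """Distance of each row of X from the plane x'w - gamma = 0."""
    return np.abs(X @ w - gamma) / np.linalg.norm(w)

def classify(X):
    dA = plane_distance(X, w1, gamma1)           # distance from the plane proximal to A
    dB = plane_distance(X, wm, gammam)           # distance from the plane proximal to B
    return np.where(dA <= dB, +1, -1)

X_train = np.vstack([A, B])
y_train = np.array([+1] * A.shape[0] + [-1] * B.shape[0])
print("training accuracy:", np.mean(classify(X_train) == y_train))
```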

  15. Regularization
     • A and B can be rank-deficient, and then G and H are rank-deficient as well:
       – if A has fewer than n linearly independent rows, [A -e]'[A -e] (of order n+1) has rank at most n, and a common null space of G and H produces an indeterminate 0/0 eigenvalue.
     • Do we need to regularize the problem to obtain a well-posed problem?

  16. A useful theorem
     Consider the GEP Gx = λHx and the transformed problem G*x = λ*H*x defined by

       G^* = \tau_1 G + \delta_1 H, \qquad H^* = \tau_2 H + \delta_2 G,

     for each choice of scalars τ1, τ2, δ1 and δ2 such that the 2 × 2 matrix

       \Omega = \begin{pmatrix} \tau_1 & \delta_1 \\ \delta_2 & \tau_2 \end{pmatrix}

     is nonsingular. Then G*x = λ*H*x and Gx = λHx have the same eigenvectors, with eigenvalues related by λ* = (τ1λ + δ1)/(δ2λ + τ2).
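A quick numerical check of the statement, with randomly generated symmetric positive definite G and H and arbitrary admissible scalars (purely illustrative):

```python
# Verify numerically that the transformed pencil shares the eigenvectors of the original one.
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(5)
M1, M2 = rng.normal(size=(6, 6)), rng.normal(size=(6, 6))
G = M1 @ M1.T + np.eye(6)                        # random symmetric positive definite matrices
H = M2 @ M2.T + np.eye(6)

tau1, tau2, delta1, delta2 = 1.0, 1.0, 0.3, 0.5  # tau1*tau2 - delta1*delta2 != 0
G_star = tau1 * G + delta1 * H
H_star = tau2 * H + delta2 * G

vals, vecs = eig(G, H)
for lam, x in zip(vals.real, vecs.T.real):
    lam_star = (tau1 * lam + delta1) / (delta2 * lam + tau2)
    assert np.linalg.norm(G_star @ x - lam_star * (H_star @ x)) < 1e-6
print("every eigenvector of G x = lambda H x also solves the transformed problem")
```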

  17. Linear case
     • In the linear case the theorem can be applied. For τ1 = τ2 = 1 and δ1 = δ2 = δ, the transformed problem is

       \min_{w,\gamma \neq 0} \; \frac{\|Aw - e\gamma\|^2 + \delta\,\|Bw - e\gamma\|^2}{\|Bw - e\gamma\|^2 + \delta\,\|Aw - e\gamma\|^2}

     • As long as δ ≠ 1, the matrix Ω is nonsingular.
     • In practice, each class of the training set must contain a number of linearly independent points equal to the number of features:
       – prob( Ker(G) ∩ Ker(H) ≠ {0} ) = 0.
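A sketch of this regularization on an under-determined toy problem (fewer points than features, so both G and H are singular); the value of δ is an arbitrary small choice here:

```python
# Solve the regularized pencil (G + delta*H) x = lambda (H + delta*G) x on rank-deficient toy data.
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(4)
n = 10                                           # number of features
A = rng.normal(+1.0, 1.0, size=(6, n))           # only 6 points in class A
B = rng.normal(-1.0, 1.0, size=(7, n))           # only 7 points in class B

GA = np.hstack([A, -np.ones((A.shape[0], 1))])
GB = np.hstack([B, -np.ones((B.shape[0], 1))])
G, H = GA.T @ GA, GB.T @ GB
print("order:", G.shape[0],
      "rank(G):", np.linalg.matrix_rank(G),
      "rank(H):", np.linalg.matrix_rank(H))      # both ranks fall below the order n+1

delta = 1e-3                                     # arbitrary small regularization parameter
G_reg, H_reg = G + delta * H, H + delta * G      # same eigenvectors; matrices now (generically) nonsingular
vals, vecs = eig(G_reg, H_reg)
z1 = vecs[:, np.argmin(vals.real)].real          # [w1; gamma1]: plane proximal to A
w1, gamma1 = z1[:-1], z1[-1]
```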

  18. Classification accuracy: linear kernel

     Dataset          train   dim   ReGEC   GEPSVM     SVM
     NDC                300     7   87.60    86.70   89.00
     ClevelandHeart     297    13   86.05    81.80   83.60
     PimaIndians        768     8   74.91    73.60   75.70
     GalaxyBright      2462    14   98.24    98.60   98.30

     Accuracy results have been obtained using ten-fold cross validation.
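For context, ten-fold cross-validation accuracy is commonly computed along these lines (sketched with scikit-learn; the dataset and the linear SVM are placeholders, not the methods or data of the table above):

```python
# Ten-fold cross-validation sketch; the dataset and classifier are placeholders only.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=10)      # 10 folds
print(f"mean accuracy: {100 * scores.mean():.2f} +/- {100 * scores.std():.2f}")
```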

  19. Nonlinear case
     • A standard technique to obtain greater separability between sets is to embed the points into a nonlinear space via kernel functions, such as the Gaussian kernel:

       K(x_i, x_j) = e^{-\frac{\|x_i - x_j\|^2}{\sigma}}

     • Each element of the kernel matrix is

       K(A, C)_{ij} = e^{-\frac{\|A_i - C_j\|^2}{\sigma}},

       where C = [A' \; B']' collects the training points of both classes.
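A small sketch of the kernel matrix computation with NumPy, following the entrywise formula above (toy data and an illustrative σ):

```python
# Gaussian kernel matrix K(X, C) with entries exp(-||X_i - C_j||^2 / sigma).
import numpy as np

def gaussian_kernel(X, C, sigma):
    sq_dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)   # pairwise squared distances
    return np.exp(-sq_dists / sigma)

rng = np.random.default_rng(2)
A = rng.normal(+1.0, 1.0, size=(30, 4))      # class A
B = rng.normal(-1.0, 1.0, size=(40, 4))      # class B
C = np.vstack([A, B])                        # C stacks all training points
K_A = gaussian_kernel(A, C, sigma=1.0)       # shape (30, 70)
K_B = gaussian_kernel(B, C, sigma=1.0)       # shape (40, 70)
```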

  20. Nonlinear case
     • Using a Gaussian kernel, the problem becomes

       \min_{u,\gamma \neq 0} \; \frac{\|K(A,C)u - e\gamma\|^2}{\|K(B,C)u - e\gamma\|^2}

     • producing the proximal surfaces

       K(x,C)u_1 - \gamma_1 = 0, \qquad K(x,C)u_2 - \gamma_2 = 0.

     • The associated GEP involves matrices whose order equals the size of the training set but whose rank is at most the number of points in each class.

  21. ReGEC
     • These matrices are deeply rank-deficient and the problem is ill-posed.
     • We propose to generate the two proximal surfaces

       K(x,C)u_1 - \gamma_1 = 0, \qquad K(x,C)u_2 - \gamma_2 = 0

     • by solving the regularized problem

       \min_{u,\gamma \neq 0} \; \frac{\|K(A,C)u - e\gamma\|^2 + \delta\,\|\tilde{K}_B u - e\gamma\|^2}{\|K(B,C)u - e\gamma\|^2 + \delta\,\|\tilde{K}_A u - e\gamma\|^2}

     • where K̃_A and K̃_B are the main diagonals of K(A,C) and K(B,C).
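A compact end-to-end sketch of this regularized kernel construction, assuming NumPy/SciPy, toy data, and illustrative values for σ and δ; how the rectangular "main diagonal" regularizers are embedded as matrices is an assumption of the sketch, not taken from the slides:

```python
# ReGEC-style sketch: regularized kernel generalized eigenvalue classification on toy data.
import numpy as np
from scipy.linalg import eig

def gaussian_kernel(X, C, sigma):
    sq = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / sigma)

def gram(K):
    """[K -e]'[K -e] for a kernel block K."""
    M = np.hstack([K, -np.ones((K.shape[0], 1))])
    return M.T @ M

rng = np.random.default_rng(3)
A = rng.normal(+1.0, 1.0, size=(30, 4))          # class A training points
B = rng.normal(-1.0, 1.0, size=(40, 4))          # class B training points
C = np.vstack([A, B])
sigma, delta = 1.0, 1e-2                         # illustrative parameter choices

K_A = gaussian_kernel(A, C, sigma)
K_B = gaussian_kernel(B, C, sigma)

# Regularizers built from the main diagonals of K(A,C) and K(B,C);
# zero-padding the rectangular blocks is an assumption of this sketch.
K_A_diag = np.zeros_like(K_A); np.fill_diagonal(K_A_diag, np.diag(K_A))
K_B_diag = np.zeros_like(K_B); np.fill_diagonal(K_B_diag, np.diag(K_B))

G = gram(K_A) + delta * gram(K_B_diag)           # numerator matrix
H = gram(K_B) + delta * gram(K_A_diag)           # denominator matrix

vals, vecs = eig(G, H)
z1 = vecs[:, np.argmin(vals.real)].real          # [u1; gamma1]: surface proximal to A
z2 = vecs[:, np.argmax(vals.real)].real          # [u2; gamma2]: surface proximal to B
u1, g1 = z1[:-1], z1[-1]
u2, g2 = z2[:-1], z2[-1]

# Classify the training points by their distance to the two kernel surfaces.
K_all = gaussian_kernel(np.vstack([A, B]), C, sigma)
d1 = np.abs(K_all @ u1 - g1) / np.linalg.norm(u1)
d2 = np.abs(K_all @ u2 - g2) / np.linalg.norm(u2)
pred = np.where(d1 <= d2, +1, -1)
truth = np.array([+1] * A.shape[0] + [-1] * B.shape[0])
print("training accuracy:", np.mean(pred == truth))
```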
