SVM vs Regularized Least Squares Classification
Peng Zhang and Jing Peng
Electrical Engineering and Computer Science Department
Tulane University, New Orleans, LA 70118, USA
{zhangp,jp}@eecs.tulane.edu

Abstract
Support vector machines (SVMs) and regularized least squares (RLS) are two recent promising techniques for classification. SVMs implement the structural risk minimization principle and use the kernel trick to extend them to the nonlinear case. RLS, on the other hand, minimizes a regularized functional directly in a reproducing kernel Hilbert space defined by a kernel. While both have a sound mathematical foundation, RLS is strikingly simple, whereas SVMs in general yield a sparse representation of solutions. In addition, the performance of SVMs has been well documented, while little is known empirically about RLS. This paper applies the two techniques to a collection of data sets and presents results demonstrating virtually identical performance by the two methods.
1. Introduction
Support vector machines (SVMs) have been successfully used as a classification tool in a number of areas, ranging from object recognition to classification of cancer morphologies [4, 7, 8, 9, 10]. SVMs realize the Structural Risk Minimization principle [10] by maximizing the margin between the separating plane and the data, and use the kernel trick to extend them to the nonlinear case. The regularized least squares (RLS) method [6], on the other hand, constructs classifiers by minimizing a regularized functional directly in a reproducing kernel Hilbert space (RKHS) induced by a kernel function [5, 6].

While both methods have a sound mathematical foundation, the performance of SVMs has been relatively well documented, whereas little is known empirically about RLS. RLS is claimed to be fully comparable in performance to SVMs [6], but empirical evidence has been lacking thus far. We present in this paper the results of applying these two techniques to a collection of data sets. Our results demonstrate that the two methods are indeed similar in performance.
2. SVMs and RLS
Our learning problem is formulated as follows. Given a set of training data (x_i, y_i), where x_i represents the ith feature vector in ℜ^n and y_i ∈ ℜ is the label of x_i (in the binary case y_i ∈ {−1, 1}), the goal of learning is to find a mapping f : X → Y that is predictive (i.e., generalizes well). The data (x, y) are drawn randomly according to an unknown probability measure ρ on the product space X × Y. There is a true input-output function f_ρ reflecting the environment that produces the data. Given any mapping f, the error of f is measured by

\int_X (f - f_\rho)^2 \, d\rho_X,

where ρ_X is the measure on X induced by the marginal measure ρ. The objective of learning is to find an f as close to f_ρ as possible. Given the training data z = {(x_i, y_i)}_{i=1}^m,
R_{SVM} = \frac{1}{m} \sum_{i=1}^{m} |y_i - f_z(x_i)|        (1)

represents the empirical error that f_z makes on the data z, where the classifier f_z is induced by SVMs from z. For RLS, on the other hand, the empirical error is

R_{RLS} = \frac{1}{m} \sum_{i=1}^{m} (y_i - f_z(x_i))^2.        (2)

Note that the main issue concerning learning is generalization: a good (predictive) classifier minimizes the error it makes on new (unseen) data, not the error on the training data. Also, learning starts from a hypothesis space from which f is chosen.
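To make the two empirical errors concrete, the following is a minimal sketch (our illustration, not code from the paper) of how equations (1) and (2) can be computed with NumPy; the callable f_z, the linear decision function, and the toy data are hypothetical placeholders:

    import numpy as np

    def empirical_errors(f_z, X, y):
        # Empirical errors of equations (1) and (2) for a trained
        # classifier f_z evaluated on data z = {(x_i, y_i)}_{i=1}^m.
        out = f_z(X)                          # real-valued outputs f_z(x_i)
        r_svm = np.mean(np.abs(y - out))      # equation (1): mean absolute error
        r_rls = np.mean((y - out) ** 2)       # equation (2): mean squared error
        return r_svm, r_rls

    # Toy usage with a hypothetical linear decision function <w, x> + b.
    w, b = np.array([0.5, -1.0]), 0.1
    X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]])
    y = np.array([1.0, -1.0, -1.0])
    print(empirical_errors(lambda X: X @ w + b, X, y))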
2.1. SVMs
In the SVM framework, unlike typical classification methods that simply minimize R_{SVM}, SVMs minimize the following upper bound on the expected generalization error:

R \le R_{SVM} + C(h),        (3)

where C(h) is a confidence term that grows with the capacity (VC dimension) h of the hypothesis space.
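As an illustration of the kind of comparison carried out in this paper, the sketch below (ours, assuming scikit-learn, which is not the software the authors used) trains a Gaussian-kernel SVM and an RLS classifier on a synthetic data set; RLS is realized as kernel ridge regression on the ±1 labels followed by a sign threshold, and all hyperparameter values are placeholders:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Synthetic binary problem with labels mapped to {-1, +1}.
    X, y = make_classification(n_samples=400, n_features=10, random_state=0)
    y = 2 * y - 1
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    # SVM: hinge loss with a Gaussian kernel; the solution is sparse in the
    # training points (the support vectors).
    svm = SVC(kernel="rbf", C=1.0, gamma=0.1).fit(X_tr, y_tr)
    svm_acc = np.mean(svm.predict(X_te) == y_te)

    # RLS: squared loss regularized in the same RKHS (kernel ridge
    # regression on the +/-1 labels); classify by the sign of the output.
    rls = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1).fit(X_tr, y_tr)
    rls_acc = np.mean(np.sign(rls.predict(X_te)) == y_te)

    print(f"SVM accuracy: {svm_acc:.3f}   RLS accuracy: {rls_acc:.3f}")

On problems of this kind the two test accuracies typically come out close, consistent with the paper's claim, while the SVM retains the practical advantage of a sparse solution: SVC keeps only the support vectors, whereas KernelRidge uses all training points.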