2012/02/22 12:00
1 Measuring Similarity with Kernels
1.1 Introduction
Over the last ten years, estimation and learning methods utilizing positive definite kernels have become rather popular, particularly in machine learning. Since these methods have a stronger mathematical slant than earlier machine learning methods (e.g., neural networks), there is also significant interest in the statistical and mathematical community for these methods. The present chapter aims to summarize the state of the art on a conceptual level. In doing so, we build on various sources (including Vapnik (1998); Burges (1998); Cristianini and Shawe-Taylor (2000); Herbrich (2002) and in particular Schölkopf and Smola (2002)), but we also add a fair amount of recent material which helps in unifying the exposition.
The main idea of all the described methods can be summarized in one paragraph. Traditionally, theory and algorithms of machine learning and statistics have been very well developed for the linear case. Real-world data analysis problems, on the other hand, often require nonlinear methods to detect the kind of dependences that allow successful prediction of properties of interest. By using a positive definite kernel, one can sometimes have the best of both worlds. The kernel corresponds to a dot product in a (usually high-dimensional) feature space. In this space, our estimation methods are linear, but as long as we can formulate everything in terms of kernel evaluations, we never explicitly have to work in the high-dimensional