Transductive Support Vector Machines
OP1 (Transductive SVM (hard-margin))

$$
\begin{aligned}
\text{minimize:} \quad & V(y^*_{u_1}, \ldots, y^*_{u_u}, w, b) = \tfrac{1}{2}\, w \cdot w && (6.15)\\
\text{subject to:} \quad & \forall_{i=1}^{l}: \; y_{l_i}\left[ w \cdot x_{l_i} + b \right] \ge 1 && (6.16)\\
& \forall_{j=1}^{u}: \; y^*_{u_j}\left[ w \cdot x^*_{u_j} + b \right] \ge 1 && (6.17)\\
& \forall_{j=1}^{u}: \; y^*_{u_j} \in \{-1, +1\} && (6.18)
\end{aligned}
$$

Solving this problem means finding the labeling $y^*_{u_1}, \ldots, y^*_{u_u}$ of the test data for which the hyperplane that separates both training and test data has maximum margin. Figure 6.2 illustrates this.
The figure also shows the solution that an inductive SVM (Cortes and Vapnik, 1995; Vapnik, 1998) computes. An inductive SVM also finds a large-margin hyperplane, but it considers only the training vectors while ignoring all test vectors. In particular, a hard-margin inductive SVM computes the separating hyperplane that has zero training error and the largest margin with respect to the training examples.

To be able to handle nonseparable data, one can introduce slack variables $\xi_i$ (Joachims, 1999) similar to inductive SVMs (Cortes and Vapnik, 1995).
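Since the constraint in Eq. 6.18 makes the test labels discrete, the hard-margin TSVM can in principle be solved by enumerating every labeling of the test points, training a hard-margin SVM on the combined data for each, and keeping the labeling whose separator has the largest margin. The following toy sketch illustrates this reading of OP1; it is not from the chapter, the data are made up, and the hard margin is approximated by scikit-learn's soft-margin SVC with a very large C.

```python
# Illustrative brute-force solution of the hard-margin TSVM (OP1) on toy data.
# Assumptions: scikit-learn is available; a very large C approximates the hard margin.
from itertools import product

import numpy as np
from sklearn.svm import SVC

X_l = np.array([[0.0, 1.0], [0.5, 1.2], [3.0, -1.0], [3.5, -0.8]])  # labeled points
y_l = np.array([1, 1, -1, -1])                                      # their labels
X_u = np.array([[1.0, 0.8], [2.8, -0.5], [1.8, 0.1]])               # test points

best_norm, best_labels = np.inf, None
for y_u in product([-1, 1], repeat=len(X_u)):      # enumerate labelings, Eq. 6.18
    y_u = np.array(y_u)
    X_all = np.vstack([X_l, X_u])
    y_all = np.concatenate([y_l, y_u])
    svm = SVC(kernel="linear", C=1e6).fit(X_all, y_all)
    margins = y_all * svm.decision_function(X_all)
    if margins.min() < 1 - 1e-3:                   # reject labelings that are not
        continue                                   # separable with margin >= 1
    w_norm = np.linalg.norm(svm.coef_)             # smaller ||w|| means larger margin
    if w_norm < best_norm:
        best_norm, best_labels = w_norm, y_u

print("maximum-margin labeling of the test points:", best_labels)
```

The enumeration is exponential in the number of test examples, which is why approximate methods for solving the TSVM are discussed later.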
OP2 (Transductive SVM (soft-margin))

$$
\begin{aligned}
\text{min:} \quad & W(y^*_{u_1}, \ldots, y^*_{u_u}, w, b, \xi_1, \ldots, \xi_l, \xi^*_1, \ldots, \xi^*_u) = \tfrac{1}{2}\, w \cdot w + C \sum_{i=1}^{l} \xi_i + C^* \sum_{j=1}^{u} \xi^*_j && (6.19)\\
\text{s.t.:} \quad & \forall_{i=1}^{l}: \; y_{l_i}\left[ w \cdot x_{l_i} + b \right] \ge 1 - \xi_i && (6.20)\\
& \forall_{j=1}^{u}: \; y^*_{u_j}\left[ w \cdot x^*_{u_j} + b \right] \ge 1 - \xi^*_j && (6.21)\\
& \forall_{j=1}^{u}: \; y^*_{u_j} \in \{-1, +1\} && (6.22)\\
& \forall_{i=1}^{l}: \; \xi_i \ge 0 && (6.23)\\
& \forall_{j=1}^{u}: \; \xi^*_j \ge 0 && (6.24)
\end{aligned}
$$
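To make the role of the slack variables concrete: for a fixed hyperplane $(w, b)$ and a fixed candidate labeling, the smallest slacks allowed by Eqs. 6.20–6.24 are the hinge losses $\xi_i = \max(0, 1 - y_{l_i}(w \cdot x_{l_i} + b))$ and $\xi^*_j = \max(0, 1 - y^*_{u_j}(w \cdot x^*_{u_j} + b))$, so the objective in Eq. 6.19 can be evaluated directly. The following sketch is not from the chapter; numpy and the example numbers are assumptions.

```python
# Illustrative evaluation of the soft-margin TSVM objective (Eq. 6.19) for a fixed
# hyperplane and candidate test labeling. At the optimum over the slacks alone,
# each slack equals the corresponding hinge loss, so W reduces to the sum below.
import numpy as np

def tsvm_objective(w, b, X_l, y_l, X_u, y_u, C, C_star):
    xi_l = np.maximum(0.0, 1.0 - y_l * (X_l @ w + b))  # smallest feasible xi_i   (6.20), (6.23)
    xi_u = np.maximum(0.0, 1.0 - y_u * (X_u @ w + b))  # smallest feasible xi*_j  (6.21), (6.24)
    return 0.5 * (w @ w) + C * xi_l.sum() + C_star * xi_u.sum()

# Made-up numbers: a smaller C_star makes violations on the test points cheaper,
# which is one way to reduce the influence of outliers among the test examples.
w, b = np.array([1.0, -0.5]), 0.2
X_l = np.array([[1.0, 0.0], [-1.0, 0.5]]); y_l = np.array([1, -1])
X_u = np.array([[0.3, -0.2]]);             y_u = np.array([1])
print(tsvm_objective(w, b, X_l, y_l, X_u, y_u, C=1.0, C_star=0.5))
```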
$C$ and $C^*$ are parameters set by the user. They allow trading off margin size against misclassifying training examples or excluding test examples. $C^*$ can be used to reduce sensitivity toward outliers (i.e., single examples falsely reducing the margin on the test data).

Both inductive and transductive SVMs can be extended to include kernels (Boser et al., 1992; Vapnik, 1998). Making use of duality techniques from optimization theory, kernels allow learning nonlinear rules as well as classification rules over nonvectorial data (see, e.g., Schölkopf and Smola, 2002) without substantially changing the optimization problems.

Note that in both the hard-margin formulation (OP1) and the soft-margin formulation (OP2) of the TSVM, the labels of the test examples enter as integer variables. Due to the constraints in Eqs. 6.18 and 6.22, respectively, neither OP1 nor OP2 is a convex quadratic program, unlike the analogous optimization problems for inductive SVMs. Before discussing methods for (approximately) solving the TSVM