Reproducing Kernel Hilbert Spaces for Classification
Katarina Domijan and Simon P. Wilson
Department of Statistics, University of Dublin, Trinity College, Ireland
November 1, 2005
Working Group on Statistical Learning
General problem
Linear basis expansions: transform the inputs X via basis functions h_m and fit a linear model in the new space of input features:

  f(X) = Σ_{m=1}^M h_m(X) β_m.

The basis functions h_m(X) can be arbitrary transformations of X, for example X_1^3, X_1 X_2, sin(X_1), etc.

Writing {H}_{i,m} = h_m(x_i), the least-squares estimate is

  β̂ = (H^T H)^{-1} H^T y.
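As a sketch, the basis expansion and least-squares fit can be carried out directly; the data and the particular basis functions below are illustrative assumptions, not taken from the talk:

```python
import numpy as np

# Illustrative data; the basis functions chosen below are arbitrary examples.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=50)
y = np.sin(x) + 0.1 * rng.normal(size=50)

# Basis expansion: h_1(X) = 1, h_2(X) = X, h_3(X) = X^2, h_4(X) = sin(X).
H = np.column_stack([np.ones_like(x), x, x**2, np.sin(x)])

# Least-squares estimate: beta_hat = (H^T H)^{-1} H^T y.
beta_hat = np.linalg.solve(H.T @ H, H.T @ y)
y_fit = H @ beta_hat
```

Since the constant function is in the span of the basis, the fitted residual sum of squares can never exceed the total sum of squares about the mean.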
Under Gaussian errors the likelihood is

  f(Y | X, β) = ∏_{i=1}^n (1/√(2πσ²)) exp( -(1/(2σ²)) (y_i - f(x_i))² ).
For example, a cubic spline with knots ψ_1 and ψ_2 has the truncated power basis

  h_1(X) = 1,    h_3(X) = X²,    h_5(X) = (X - ψ_1)³_+,
  h_2(X) = X,    h_4(X) = X³,    h_6(X) = (X - ψ_2)³_+.
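A minimal sketch of this basis in code; the knot locations are arbitrary assumptions for illustration:

```python
import numpy as np

def cubic_spline_basis(x, psi1=-0.5, psi2=0.5):
    """Truncated power basis for a cubic spline with knots psi1, psi2."""
    x = np.asarray(x, dtype=float)
    pos = lambda t: np.maximum(t, 0.0)   # the (.)_+ truncation
    return np.column_stack([
        np.ones_like(x),      # h1(X) = 1
        x,                    # h2(X) = X
        x**2,                 # h3(X) = X^2
        x**3,                 # h4(X) = X^3
        pos(x - psi1)**3,     # h5(X) = (X - psi1)^3_+
        pos(x - psi2)**3,     # h6(X) = (X - psi2)^3_+
    ])

H = cubic_spline_basis(np.linspace(-2.0, 2.0, 7))
```

The truncated cubic terms are identically zero to the left of their knots, which is what makes the fitted curve a single cubic polynomial on each inter-knot interval.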
[Figure: a fitted cubic spline f(x) plotted against x, with the knots ψ_1 and ψ_2 marked.]
For two-class problems, each observation is coded y_i ∈ {0, 1} according to the classification. The log-odds are modelled as

  log [ P(Y = 1 | X = x) / P(Y = 0 | X = x) ] = f(x).

Hence:

  P(Y = 1 | X = x) = e^{f(x)} / (1 + e^{f(x)}).
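The inverse-logit map from f(x) to a class probability, as a small sketch:

```python
import math

def prob_y1(f_x):
    """P(Y = 1 | X = x) = e^{f(x)} / (1 + e^{f(x)})."""
    return math.exp(f_x) / (1.0 + math.exp(f_x))
```

At f(x) = 0 the two classes are equally likely, and the map is symmetric: prob_y1(t) + prob_y1(-t) = 1.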
If f is only required to be, say, twice continuously differentiable, the problem is ill-posed: any f that interpolates the data achieves

  Σ_{i=1}^n (y_i - f(x_i))² = 0.

A roughness penalty is therefore added:

  RSS(f, λ) = Σ_{i=1}^n (y_i - f(x_i))² + λ ∫ {f''(t)}² dt.
The minimizer has the form

  f(x) = Σ_{j=1}^n D_j(x) β_j,

where the D_j(x) are an n-dimensional set of basis functions representing a family of natural cubic splines, i.e. cubic splines that are linear beyond the boundary knots.
The penalized criterion then has the explicit solution

  β̂ = (D^T D + λ Φ_D)^{-1} D^T y,

where D and Φ_D are matrices with elements {D}_{i,j} = D_j(x_i) and {Φ_D}_{j,k} = ∫ D_j''(t) D_k''(t) dt, respectively.
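The estimate is a generalized ridge solve. Below is a sketch in which an illustrative polynomial design matrix and a second-difference penalty stand in for the natural-spline quantities D and Φ_D; both are assumptions for demonstration only:

```python
import numpy as np

def penalized_ls(D, Phi, y, lam):
    """beta_hat = (D^T D + lam * Phi)^{-1} D^T y."""
    return np.linalg.solve(D.T @ D + lam * Phi, D.T @ y)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
D = np.column_stack([x**k for k in range(5)])   # illustrative basis
A = np.diff(np.eye(5), n=2, axis=0)             # second-difference operator
Phi = A.T @ A                                   # stand-in roughness penalty
y = np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=30)

beta_ls = penalized_ls(D, Phi, y, lam=0.0)      # ordinary least squares
beta_pen = penalized_ls(D, Phi, y, lam=1e3)     # heavily penalized
```

Increasing λ can only decrease the penalty term at the minimizer, so the heavily penalized coefficients are smoother in the sense measured by Φ.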
In two dimensions, x ∈ R², the analogous problem is

  min_f Σ_{i=1}^n (y_i - f(x_i))² + λ J(f),

with the thin-plate penalty

  J(f) = ∫∫_{R²} [ (∂²f(x)/∂x_1²)² + 2 (∂²f(x)/∂x_1∂x_2)² + (∂²f(x)/∂x_2²)² ] dx_1 dx_2.
The solution is a thin-plate spline,

  f(x) = β_0 + β^T x + Σ_{j=1}^n α_j h_j(x),

where the h_j are radial basis functions: h_j(x) = ||x - x_j||² log(||x - x_j||).
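A sketch of the radial basis function itself, using the usual convention that h_j(x_j) = 0, since r² log r → 0 as r → 0:

```python
import numpy as np

def thin_plate_rbf(x, xj):
    """h_j(x) = ||x - x_j||^2 * log(||x - x_j||), with value 0 at r = 0."""
    r = np.linalg.norm(np.asarray(x, float) - np.asarray(xj, float))
    return 0.0 if r == 0.0 else r**2 * np.log(r)
```

Note that the function also vanishes at distance r = 1 (since log 1 = 0) and is slightly negative for 0 < r < 1.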
More generally, consider

  min_{f ∈ H} Σ_{i=1}^n L(y_i, f(x_i)) + λ J(f)    (1)

for a loss function L and penalty functional J; this general class of regularization problems is treated by Girosi, Jones and Poggio (1995).
An important special case takes H to be a reproducing kernel Hilbert space,

  H_K = { f(x) | f(x) = Σ_{i=1}^∞ c_i φ_i(x) },

generated by a kernel with eigen-expansion

  K(x_1, x_2) = Σ_{i=1}^∞ γ_i φ_i(x_1) φ_i(x_2),    (2)

where γ_i ≥ 0 and Σ_{i=1}^∞ γ_i² < ∞.
Taking the penalty to be the squared norm in this space,

  J(f) = ||f||²_{H_K} = Σ_{i=1}^∞ c_i²/γ_i < ∞,

problem (1) has a finite-dimensional solution given by:

  f(x) = Σ_{i=1}^n β_i K(x, x_i).
Substituting this into (1) reduces it to the finite-dimensional problem

  min_β L(y, Kβ) + λ β^T K β,

where K is an n × n matrix with elements {K}_{i,j} = K(x_i, x_j).
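For squared-error loss this finite-dimensional criterion has a closed-form minimizer, β̂ = (K + λI)^{-1} y. A sketch under illustrative assumptions (the polynomial kernel, the data, and λ are all chosen for demonstration):

```python
import numpy as np

def poly_kernel(X1, X2, degree=2):
    """An illustrative positive semi-definite kernel: (1 + <x_i, x_j>)^degree."""
    return (1.0 + X1 @ X2.T) ** degree

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(30, 2))
y = X[:, 0] ** 2 + 0.05 * rng.normal(size=30)

K = poly_kernel(X, X)                       # {K}_{ij} = K(x_i, x_j)
lam = 0.1
# For squared-error loss, minimizing
#   (y - K beta)'(y - K beta) + lam * beta' K beta
# gives beta_hat = (K + lam I)^{-1} y.
beta_hat = np.linalg.solve(K + lam * np.eye(len(y)), y)
f_hat = K @ beta_hat                        # f(x_i) = sum_j beta_j K(x_i, x_j)
```

The target x_1² lies in the feature space of the degree-2 polynomial kernel, so the fit recovers it up to noise and shrinkage.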
Application (Mallick, Ghosh and Ghosh, 2005): Bayesian classification of tumors from gene expression data obtained using cDNA microarrays. The inputs are x_{ij}, the expression level of the jth gene for the ith sample. Latent variables z_i link the class labels y_i to the inputs:

  p(y|z) = ∏_{i=1}^n p(y_i|z_i),  z_i = f(x_i) + ε_i,  i = 1, ..., n,  ε_i ~ i.i.d. N(0, σ²).
With the Gaussian kernel K(x_i, x_j) = exp(-||x_i - x_j||²/θ), the model becomes

  z_i = f(x_i) + ε_i = β_0 + Σ_{j=1}^n β_j K(x_i, x_j|θ) + ε_i,  i = 1, ..., n,  ε_i ~ i.i.d. N(0, σ²).
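The kernel matrix and the design rows K'_i can be sketched as follows; the data dimensions and θ are illustrative assumptions:

```python
import numpy as np

def gaussian_kernel_matrix(X, theta):
    """{K}_{ij} = exp(-||x_i - x_j||^2 / theta)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / theta)

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 5))     # 20 samples of 5 illustrative gene expressions
K = gaussian_kernel_matrix(X, theta=4.0)

# Row i of the design: K'_i = (1, K(x_i, x_1|theta), ..., K(x_i, x_n|theta)).
design = np.column_stack([np.ones(len(X)), K])
```

The matrix is symmetric with unit diagonal, since each point is at distance zero from itself.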
The hierarchical prior specification for β, θ and σ² is:

  z_i | β, θ, σ² ~ N(z_i | K'_i β, σ²),
  β, σ² ~ N(β | 0, σ² M^{-1}) IG(σ² | γ_1, γ_2),
  θ ~ ∏_{q=1}^p U(a_{1q}, a_{2q}),

where K'_i = (1, K(x_i, x_1|θ), ..., K(x_i, x_n|θ)) and M is a diagonal matrix with elements ξ = (ξ_1, ..., ξ_{n+1}). The prior p(ξ) ∝ ∏_{i=1}^{n+1} ξ_i^{-1} promotes sparseness (Figueiredo, 2002).
The class labels are Bernoulli given the latent variables:

  p(y_i|z_i) = [p_i(z_i)]^{y_i} [1 - p_i(z_i)]^{1-y_i},  p_i(z_i) = e^{z_i}/(1 + e^{z_i}),

so the log-likelihood is

  Σ_{i=1}^n y_i z_i - Σ_{i=1}^n log(1 + e^{z_i}).
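The log-likelihood can be computed directly; a small sketch:

```python
import numpy as np

def bernoulli_loglik(y, z):
    """sum_i [ y_i z_i - log(1 + e^{z_i}) ] for binary labels y and latents z."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    return float(np.sum(y * z - np.log1p(np.exp(z))))
```

Using log1p keeps the term log(1 + e^{z_i}) accurate when z_i is strongly negative; for large positive z_i a further rewrite would be needed to avoid overflow, which this sketch does not attempt.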
All inference is based on the joint posterior distribution p(β, θ, z, λ, σ²|y).
References

[1] T. Evgeniou, M. Pontil and T. Poggio (2000). Regularization Networks and Support Vector Machines. Advances in Computational Mathematics, 13(1), 1–50.
[2] M. Figueiredo (2002). Adaptive sparseness using Jeffreys prior. In Advances in Neural Information Processing Systems 14 (eds T. G. Dietterich, S. Becker and Z. Ghahramani), 697–704. Cambridge: MIT Press.
[3] F. Girosi, M. Jones and T. Poggio (1995). Regularization Theory and Neural Networks Architectures. Neural Computation, 7(2), 219–269.
[4] T. Hastie, R. Tibshirani and J. Friedman (2001). The Elements of Statistical Learning. New York: Springer.
[5] B. K. Mallick, D. Ghosh and M. Ghosh (2005). Bayesian Classification of Tumors Using Gene Expression Data. Journal of the Royal Statistical Society, Series B, 67, 219–234.
[6] G. Wahba (1990). Spline Models for Observational Data. Philadelphia: SIAM.