Partial order embedding with multiple kernels
Brian McFee and Gert Lanckriet
University of California, San Diego
Goal
Embed a set of objects into a Euclidean space such that:
1. Distances conform to human perception
2. Multiple feature modalities (kernels) are combined
Motivation: leverage existing technologies for Euclidean data
Is Oasis similar to The Beatles, or not? And if so, how similar are they?
[Schultz and Joachims, 2004, Agarwal et al., 2007]
Which pair is more similar: (i, j) or (k, ℓ)? e.g., (i, j, k, ℓ) = (Oasis, Beatles, Oasis, Metallica)
‖g(i) − g(j)‖ < ‖g(k) − g(ℓ)‖
(the more similar pair embeds closer than the less similar pair)
Chaining such comparisons into a partial order captures global structure.
‖g(i) − g(j)‖² + e_ijkℓ ≤ ‖g(k) − g(ℓ)‖²
[Figure: constraint graph over the pairs (i,j), (j,k), (i,k), (i,ℓ), (k,ℓ), with margin e_ijkℓ on the edge from (i,j) to (k,ℓ)]
all margins are preserved, and for all i ≠ j: 1 ≤ ‖g(i) − g(j)‖² ≤ O(n · diam(C))
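The margin constraints above are easy to verify for a candidate embedding. Below is a minimal pure-Python sketch of such a check; the embedding points, constraint tuples, and margin values are made-up toy data, not from the paper:

```python
# Check POE margin constraints: ||g(i)-g(j)||^2 + e_ijkl <= ||g(k)-g(l)||^2
def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def margins_satisfied(g, constraints):
    """g: list of embedded points; constraints: (i, j, k, l, e_ijkl) tuples."""
    return [sq_dist(g[i], g[j]) + e <= sq_dist(g[k], g[l])
            for (i, j, k, l, e) in constraints]

# Toy embedding of 4 objects in the plane (hypothetical values).
g = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (3.0, 0.0)]
# "(0,1) is more similar than (2,3), with margin 1": 1 + 1 <= 9 holds.
constraints = [(0, 1, 2, 3, 1.0)]
print(margins_satisfied(g, constraints))  # [True]
```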
Dimensionality for POE: we seek low-dimensional solutions informed by the constraints.
max_{A ⪰ 0}  Tr(A)                          (Variance)
s.t.  d(i, j) ≤ O(n · diam(C))              (Diameter)
      d(i, j) + e_ijkℓ ≤ d(k, ℓ)            (Margins)
      Σ_{i,j} A_ij = 0                      (Centering)
where d(i, j) := A_ii + A_jj − 2A_ij        (Squared distance)
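The program optimizes over a Gram matrix A ⪰ 0 rather than over the points themselves; squared distances are linear in A, which is what makes the margin constraints tractable. A quick sanity check of the identity d(i, j) = A_ii + A_jj − 2A_ij, with made-up toy points:

```python
# Verify d(i,j) = A_ii + A_jj - 2*A_ij when A = G G^T (Gram matrix).
def gram(points):
    """Gram matrix of inner products between all pairs of points."""
    return [[sum(a * b for a, b in zip(u, v)) for v in points] for u in points]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

points = [(1.0, 2.0), (0.0, -1.0), (3.0, 0.5)]  # toy data
A = gram(points)
for i in range(3):
    for j in range(3):
        d_from_A = A[i][i] + A[j][j] - 2 * A[i][j]
        assert abs(d_from_A - sq_dist(points[i], points[j])) < 1e-9
print("distance identity holds")
```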
A = VΛVᵀ ⇒ g(i) = i-th column of Λ^{1/2}Vᵀ
In kernel form: g(x) = N K_x (K_x = x-th column of K)
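Once a map N is in hand, embedding a point reduces to a matrix-vector product against its kernel column. A minimal sketch, where N and the kernel column K_x are hypothetical toy values rather than learned quantities:

```python
# Out-of-sample embedding: g(x) = N @ K_x, where K_x holds kernel
# evaluations between x and the training points.
def embed(N, K_x):
    """Multiply the learned map N (rows) by the kernel column K_x."""
    return [sum(n * k for n, k in zip(row, K_x)) for row in N]

# Hypothetical 2x3 map N and kernel column for a new point x.
N = [[0.5, 0.0, 1.0],
     [0.0, 2.0, 0.0]]
K_x = [1.0, 0.2, 0.4]   # k(x, x_1), k(x, x_2), k(x, x_3)
print(embed(N, K_x))    # [0.9, 0.4]
```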
[Diagram: input space mapped through feature spaces 1, 2, …, m into a single output space]
max_{W^(1),…,W^(m) ⪰ 0, ξ ≥ 0}  Σ_{p=1}^{m} [ Tr(K^(p) W^(p) K^(p)) − γ Tr(W^(p)) ] − β Σ_{(i,j,k,ℓ) ∈ C} ξ_ijkℓ
s.t.  ∀ i, j ∈ X:         d(i, j) ≤ O(n · diam(C))
      ∀ (i, j, k, ℓ) ∈ C: d(i, j) + e_ijkℓ ≤ d(k, ℓ) + ξ_ijkℓ
where d(i, j) := Σ_{p=1}^{m} (K_i^(p) − K_j^(p))ᵀ W^(p) (K_i^(p) − K_j^(p))
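The learned distance sums one Mahalanobis-style quadratic form per kernel. A pure-Python sketch of that computation with two hypothetical kernels; all matrices below are toy values, not learned metrics:

```python
# d(i,j) = sum_p (K_i^p - K_j^p)^T W^p (K_i^p - K_j^p)
def quad_form(W, v):
    """Quadratic form v^T W v."""
    n = len(v)
    return sum(v[a] * W[a][b] * v[b] for a in range(n) for b in range(n))

def multi_kernel_dist(K_cols_i, K_cols_j, Ws):
    """Sum one quadratic form per kernel over the kernel-column differences."""
    total = 0.0
    for K_i, K_j, W in zip(K_cols_i, K_cols_j, Ws):
        diff = [a - b for a, b in zip(K_i, K_j)]
        total += quad_form(W, diff)
    return total

# Two kernels with identity metrics (toy): the distance reduces to the
# sum of squared Euclidean distances between kernel columns.
I2 = [[1.0, 0.0], [0.0, 1.0]]
d = multi_kernel_dist([[1.0, 0.0], [0.0, 2.0]],   # K_i columns, one per kernel
                      [[0.0, 0.0], [0.0, 0.0]],   # K_j columns, one per kernel
                      [I2, I2])
print(d)  # 1.0 + 4.0 = 5.0
```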
Data [Agarwal et al., 2007]
Constraint processing: 13,000 → 9,000 constraints
[Figure: final constraint graph]
[Figure: POE (top 2 PCA dimensions)]
Data [Geusebroek et al., 2005]:
- object images with varying out-of-plane rotation
- label taxonomy: Food, Clothing, Toys, All
Kernels: histograms
Diagonally-constrained N: the SDP reduces to an LP.
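With the per-kernel metric restricted to be diagonal, each quadratic form collapses to a weighted sum of squared coordinate differences, so the distance is linear in the diagonal weights and the margin constraints become linear. A sketch of that identity, with made-up toy numbers:

```python
# Diagonal metric: (K_i - K_j)^T W (K_i - K_j) = sum_a w_a * (K_ia - K_ja)^2,
# which is linear in the weights w -- so margin constraints become linear.
def diag_dist(w, K_i, K_j):
    """Weighted squared distance under a diagonal metric with weights w."""
    return sum(wa * (a - b) ** 2 for wa, a, b in zip(w, K_i, K_j))

w = [2.0, 0.5, 1.0]            # hypothetical nonnegative diagonal weights
K_i = [1.0, 3.0, 0.0]
K_j = [0.0, 1.0, 2.0]
print(diag_dist(w, K_i, K_j))  # 2*1 + 0.5*4 + 1*4 = 8.0
```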
[Figures: sum-kernel space, learned embedding, and learned weights per kernel (Dot, Red, Green, Blue, Gray) as a function of training set size]
% Constraints satisfied

Kernel          | Native | Optimized
----------------|--------|----------
Dot product     | 0.83   | 0.85
Red             | 0.63   | 0.63
Green           | 0.65   | 0.67
Blue            | 0.77   | 0.83
Gray            | 0.68   | 0.69
Unweighted sum  | 0.76   | 0.77
Multi           | —      | 0.95
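The "% constraints satisfied" figures are the fraction of relative comparisons (i, j, k, ℓ) for which the embedded pair (i, j) ends up strictly closer than (k, ℓ). A sketch of that evaluation metric on made-up toy data:

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def frac_satisfied(g, comparisons):
    """comparisons: (i, j, k, l) meaning pair (i,j) should be closer than (k,l)."""
    ok = sum(1 for (i, j, k, l) in comparisons
             if sq_dist(g[i], g[j]) < sq_dist(g[k], g[l]))
    return ok / len(comparisons)

g = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]    # toy embedding
comparisons = [(0, 1, 0, 2), (0, 2, 0, 1)]  # the second one is violated
print(frac_satisfied(g, comparisons))       # 0.5
```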
Goal: embed musical artists so that distances reflect perceived similarity
Data: [Ellis et al., 2002]
Features: TF-IDF/cosine kernels over tags (e.g., rock, piano, female vocals) and artist biographies
Prediction accuracy
[Chart: Native vs. Optimized accuracy for the Tags, Biography, and Tags+Bio kernels, with a Random baseline; reported values: 0.776, 0.705, 0.514, 0.705, 0.640, 0.790]
Note: test comparisons are not internally consistent
Questions?
Agarwal, S., Wills, J., Cayton, L., Lanckriet, G., Kriegman, D., and Belongie, S. (2007). Generalized non-metric multi-dimensional scaling. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics.

Ellis, D., Whitman, B., Berenzweig, A., and Lawrence, S. (2002). The quest for ground truth in musical artist similarity. In Proceedings of the International Symposium on Music Information Retrieval (ISMIR), pages 170–177.

Geusebroek, J. M., Burghouts, G. J., and Smeulders, A. (2005). The Amsterdam library of object images. International Journal of Computer Vision, 61(1):103–112.

Roth, V., Laub, J., Buhmann, J. M., and Müller, K.-R. (2003). Going metric: denoising pairwise data. In Becker, S., Thrun, S., and Obermayer, K., editors, Advances in Neural Information Processing Systems 15, pages 809–816, Cambridge, MA. MIT Press.

Schultz, M. and Joachims, T. (2004). Learning a distance metric from relative comparisons. In Thrun, S., Saul, L., and Schölkopf, B., editors, Advances in Neural Information Processing Systems 16, Cambridge, MA. MIT Press.

Weinberger, K. Q., Sha, F., and Saul, L. K. (2004). Learning a kernel matrix for nonlinear dimensionality reduction. In Proceedings of the Twenty-first International Conference on Machine Learning.