Partial order embedding with multiple kernels, by Brian McFee and Gert Lanckriet (PowerPoint presentation)



SLIDE 1

Partial order embedding with multiple kernels

Brian McFee and Gert Lanckriet

University of California, San Diego

SLIDE 2

Goal

Embed a set of objects into a Euclidean space such that:

  1. Distances conform to human perception
  2. Multiple feature modalities are integrated coherently
  3. We can extend to unseen data

Motivation: leverage existing technologies for Euclidean data

SLIDE 3

Example

SLIDE 4

Example

  • Features may not match human perception
SLIDE 5

Example

  • Features may not match human perception
  • Use human input to guide the embedding
SLIDE 6

Human input

  • Binary similarity can be ambiguous in multi-media data
  • Example:

Is Oasis similar to The Beatles, or not?

  • Quantifying similarity may also be difficult: how similar are they?

SLIDE 7

Relative comparisons

[Schultz and Joachims, 2004, Agarwal et al., 2007]

  • Instead, we ask which of two pairs is more similar:

(i, j) or (k, ℓ)? Example: (i, j, k, ℓ) = (Oasis, Beatles, Oasis, Metallica)

  • Learn a map g from the data set X to a Euclidean space
  • For each (i, j, k, ℓ),

‖g(i) − g(j)‖ < ‖g(k) − g(ℓ)‖

[Figure: points i, j, k, ℓ in the embedded space]

SLIDE 8

Partial order

[Figure: pairs ordered from more similar to less similar: il, kl, ik, ij, jk]

  • Relative comparisons should exhibit global structure
  • Collect comparisons into a directed graph C
  • Cycles must be broken by any embedding
  • Comparisons should describe a partial order over X × X
SLIDE 9

Constraint graphs

  • Force margins between distances:

‖g(i) − g(j)‖² + e_ijkℓ ≤ ‖g(k) − g(ℓ)‖²

  • Represent e_ijkℓ as edge weights
  • Graph representation lets us:
    • detect inconsistencies (cycles)
    • prune redundancies by transitive reduction
    • simplify: focus on meaningful constraints

[Figure: constraint (i, j, k, ℓ) with margin e_ijkℓ as an edge weight; pairs il, kl, ik, ij, jk in the constraint graph]
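The graph operations above can be sketched in plain Python (an illustrative helper, not the authors' implementation): a DFS reachability test doubles as a cycle check, and greedy edge deletion computes the transitive reduction of a DAG.

```python
from collections import defaultdict

def has_path(adj, src, dst):
    """DFS reachability test in the constraint digraph."""
    stack, seen = [src], set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(adj[u])
    return False

def transitive_reduction(edges):
    """Drop every edge (u, v) already implied by a longer path u -> ... -> v.
    For a DAG the reduction is unique, so greedy deletion suffices."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    kept = []
    for u, v in edges:
        adj[u].remove(v)            # test reachability without this edge
        if has_path(adj, u, v):
            continue                # redundant: leave it deleted
        kept.append((u, v))
        adj[u].append(v)            # restore the non-redundant edge
    return kept

# Each comparison (i, j, k, l) is an edge: node "ij" (more similar)
# points to node "kl" (less similar). The third edge is implied.
edges = [("ij", "kl"), ("kl", "il"), ("ij", "il")]
print(transitive_reduction(edges))  # [('ij', 'kl'), ('kl', 'il')]
```

A cycle check before inserting an edge (u, v) is the same primitive: the insertion is inconsistent exactly when `has_path(adj, v, u)` already holds.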

SLIDE 10

Constraint simplification

SLIDE 11

Constraint simplification

SLIDE 12

Margin-preserving embeddings

  • Claim: there exists g : X → ℝⁿ⁻¹ such that all margins are preserved, and for all i ≠ j:

1 ≤ ‖g(i) − g(j)‖ ≤ (4n + 1)(diam(C) + 1)
  • Reduction via constant-shift embedding [Roth et al., 2003]
  • Constraint diameter bounds embedding diameter
  • May produce artificially high-dimensional embeddings
SLIDE 13

Dimensionality reduction

  • We show that it is NP-hard to minimize dimensionality for POE
  • Instead, optimize a convex objective that prefers low-dimensional solutions
  • Assume objects are dissimilar, unless otherwise informed
  • Adapt MVU [Weinberger et al., 2004]:
    • Maximize all distances
    • Diameter bound ensures that a solution exists
    • Respect all partial order constraints
SLIDE 14

Partial Order Embedding (SDP)

  • Input: n objects X, margin-weighted constraints C
  • Output: g : X → ℝⁿ

max_{A ⪰ 0}  Tr(A)                                       (Variance)

s.t.  ∀ i, j:              d(i, j) ≤ O(n · diam(C))      (Diameter)
      ∀ (i, j, k, ℓ) ∈ C:  d(i, j) + e_ijkℓ ≤ d(k, ℓ)    (Margins)
      Σ_{i,j} A_ij = 0                                   (Centering)
      d(i, j) := A_ii + A_jj − 2A_ij                     (Distance²)

  • Decompose A = V Λ Vᵀ  ⇒  g(i) = i-th column of Λ^{1/2} Vᵀ
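Given a solved Gram matrix A, the final decomposition step is mechanical. A numpy sketch (the matrix here is a toy stand-in, not the output of the SDP):

```python
import numpy as np

def embedding_from_gram(A, tol=1e-9):
    """Recover g from a centered PSD Gram matrix: A = V Λ Vᵀ,
    g(i) = i-th column of Λ^{1/2} Vᵀ (zero eigendirections dropped)."""
    eigvals, V = np.linalg.eigh(A)
    eigvals = np.clip(eigvals, 0.0, None)     # clamp numerical negatives
    G = np.sqrt(eigvals)[:, None] * V.T       # rows of Λ^{1/2} Vᵀ
    return G[np.sqrt(eigvals) > tol].T        # row i is g(i)

# Toy stand-in for a solved SDP: Gram matrix of three centered points.
X = np.array([[-1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
X -= X.mean(axis=0)
A = X @ X.T
g = embedding_from_gram(A)

# The SDP's distance identity d(i, j) = A_ii + A_jj − 2 A_ij
# is exactly ‖g(i) − g(j)‖² in the recovered embedding.
d01 = A[0, 0] + A[1, 1] - 2 * A[0, 1]
assert np.isclose(d01, np.sum((g[0] - g[1]) ** 2))
```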

SLIDE 15

Out-of-sample extension: kernels

  • How can we extend embeddings to unseen data?
  • Learn a linear projection from a feature space
  • Parameterization: g(x) = N K_x, where K_x is the column of K corresponding to x
  • Learn N by solving an SDP over W = NᵀN ⪰ 0
  • PO constraints may be impossible to satisfy:
    • Soften ordering constraints
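A minimal sketch of the out-of-sample map g(x) = N K_x, assuming a learned projection N and a kernel function k (the names `embed`, `X_train`, and `k` are illustrative stand-ins, not from the paper):

```python
import numpy as np

def embed(x, X_train, N, k):
    """Out-of-sample extension: g(x) = N K_x, where
    (K_x)_t = k(x_t, x) is the kernel column for the unseen point x."""
    K_x = np.array([k(x_t, x) for x_t in X_train])
    return N @ K_x

# Toy example: linear kernel, random stand-in for a learned N.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(5, 3))        # 5 training points in R^3
N = rng.normal(size=(2, 5))              # projection R^5 -> R^2
k = lambda a, b: float(a @ b)            # linear kernel
print(embed(X_train[0], X_train, N, k).shape)  # (2,)
```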
SLIDE 16

Multi-kernel embedding

  • Concatenate linear projections from m feature spaces:

[Figure: feature spaces 1 … m, each mapped from the input space and concatenated into the output space]

  • The projections N⁽¹⁾, …, N⁽ᵐ⁾ are jointly optimized by an SDP to form the space
SLIDE 17

MK-POE

max_{W ⪰ 0, ξ ≥ 0}  Σ_{p=1}^m [ Tr(K⁽ᵖ⁾ W⁽ᵖ⁾ K⁽ᵖ⁾) − γ Tr(W⁽ᵖ⁾ K⁽ᵖ⁾) ] − β Σ_C ξ_ijkℓ

s.t.  ∀ i, j ∈ X:          d(i, j) ≤ O(n · diam(C))
      ∀ (i, j, k, ℓ) ∈ C:  d(i, j) + e_ijkℓ ≤ d(k, ℓ) + ξ_ijkℓ

      d(i, j) := Σ_{p=1}^m (K_i⁽ᵖ⁾ − K_j⁽ᵖ⁾)ᵀ W⁽ᵖ⁾ (K_i⁽ᵖ⁾ − K_j⁽ᵖ⁾)
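The distance term of the program above can be sketched directly (toy kernels and weight matrices, not learned ones):

```python
import numpy as np

def mk_distance(Ks, Ws, i, j):
    """Multi-kernel squared distance:
    d(i, j) = Σ_p (K_i⁽ᵖ⁾ − K_j⁽ᵖ⁾)ᵀ W⁽ᵖ⁾ (K_i⁽ᵖ⁾ − K_j⁽ᵖ⁾)."""
    total = 0.0
    for K, W in zip(Ks, Ws):
        delta = K[:, i] - K[:, j]        # difference of kernel columns
        total += float(delta @ W @ delta)
    return total

# Toy stand-in: two kernels over 4 points, with PSD weight matrices.
rng = np.random.default_rng(1)
Ks, Ws = [], []
for _ in range(2):
    X = rng.normal(size=(4, 3))
    Ks.append(X @ X.T)                   # linear kernel matrix
    M = rng.normal(size=(4, 4))
    Ws.append(M @ M.T)                   # W = NᵀN ⪰ 0
print(mk_distance(Ks, Ws, 0, 1) >= 0)    # True: a sum of PSD quadratic forms
```

Since each W⁽ᵖ⁾ is PSD, every term is nonnegative, so d really is a squared distance in the concatenated output space.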

SLIDE 18

Experiment 1: Human perception

Data [Agarwal et al., 2007]

  • 55 images of 3D rabbits with varying surface reflectance
  • 13049 human perception measurements: (i, j, i, k)

Constraint processing

  • Random sampling to achieve a maximal DAG
  • Transitive reduction to eliminate redundancies

13000 → 9000 constraints

Final constraint graph

  • Unit margins
  • Diameter = 55
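The "random sampling to achieve a maximal DAG" step can be sketched as greedy edge insertion (an illustrative reconstruction, not the authors' code): shuffle the candidate comparisons and keep each one unless it would close a directed cycle.

```python
import random
from collections import defaultdict

def sample_maximal_dag(candidates, seed=0):
    """Shuffle candidate constraint edges; keep each (u, v) unless v can
    already reach u, i.e. unless adding it would create a cycle.
    Greedy insertion yields a maximal (not maximum) consistent subset."""
    def reachable(adj, src, dst):
        stack, seen = [src], set()
        while stack:
            u = stack.pop()
            if u == dst:
                return True
            if u not in seen:
                seen.add(u)
                stack.extend(adj[u])
        return False

    rng = random.Random(seed)
    edges = list(candidates)
    rng.shuffle(edges)
    adj, kept = defaultdict(list), []
    for u, v in edges:
        if not reachable(adj, v, u):     # adding (u, v) stays acyclic
            adj[u].append(v)
            kept.append((u, v))
    return kept

# Inconsistent comparisons forming a 3-cycle: exactly one must be dropped.
print(len(sample_maximal_dag([("a", "b"), ("b", "c"), ("c", "a")])))  # 2
```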
SLIDE 19

Experiment 1 results

[Figure: POE embedding (top 2 PCA dimensions) of the 55 rabbit images, varying along luminance and glare]

SLIDE 20

Experiment 2: Multi-kernel

Data [Geusebroek et al., 2005]

  • 10 classes from ALOI
  • 10 images from each class, varying out-of-plane rotation
  • Constraints generated by a label taxonomy

Kernels

  • Grayscale dot product
  • RBF of R, G, B, and grayscale histograms

[Figure: label taxonomy grouping classes under Food, Clothing, Toys, All]

Diagonally-constrained N: SDP ⇒ LP

SLIDE 21

Experiment 2 results

[Figure: sum-kernel space vs. learned embedding on the training set, with learned weights for the Dot, Red, Green, Blue, and Gray kernels]

SLIDE 22

Experiment 2 kernel comparison

% Constraints satisfied

  Kernel           Native   Optimized
  Dot product      0.83     0.85
  Red              0.63     0.63
  Green            0.65     0.67
  Blue             0.77     0.83
  Gray             0.68     0.69
  Unweighted sum   0.76     0.77
  Multi            —        0.95
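"% constraints satisfied" is the fraction of comparisons whose distance inequality holds in a given embedding; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def constraints_satisfied(g, constraints):
    """Fraction of comparisons (i, j, k, l) satisfying
    ‖g(i) − g(j)‖² < ‖g(k) − g(l)‖² in the embedding g
    (rows of g are embedded points)."""
    ok = sum(
        np.sum((g[i] - g[j]) ** 2) < np.sum((g[k] - g[l]) ** 2)
        for i, j, k, l in constraints
    )
    return ok / len(constraints)

# Toy embedding on a line: the pair (0, 1) is closer than the pair (2, 3),
# so the first comparison holds and its reverse does not.
g = np.array([[0.0], [1.0], [3.0], [6.0]])
print(constraints_satisfied(g, [(0, 1, 2, 3), (2, 3, 0, 1)]))  # 0.5
```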

SLIDE 23

Experiment 3: Out-of-sample

Goal

  • Predict comparisons (i, j, i, k) with i out of sample

Data

  • 412 popular artists (aset400)

[Ellis et al., 2002]

  • 10-fold cross-validation
  • ≈6300 human-derived training constraints
  • Mean diameter ≈30 (over CV folds)

Features: TFIDF/cosine kernels

  • Tags: 7737 words (e.g., rock, piano, female vocals)
  • Biographies: 16753 words
SLIDE 24

Experiment 3 results

Prediction accuracy

[Figure: prediction accuracy for the Tags, Biography, and Tags+Bio kernels, native vs. optimized, against a random baseline; reported values: 0.776, 0.705, 0.514, 0.705, 0.640, 0.790]

Note: test comparisons are not internally consistent

SLIDE 25

Conclusion

  • We developed the partial order embedding framework:
    • Simplifies relative comparison embeddings
    • Enables more careful constraint processing
    • Graph manipulations can increase embedding robustness
  • Derived a novel multiple kernel learning technique
  • Widely applicable to metric learning problems
SLIDE 26

Thanks!

Questions?

SLIDE 27

Agarwal, S., Wills, J., Cayton, L., Lanckriet, G., Kriegman, D., and Belongie, S. (2007). Generalized non-metric multi-dimensional scaling. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics.

Ellis, D., Whitman, B., Berenzweig, A., and Lawrence, S. (2002). The quest for ground truth in musical artist similarity. In Proceedings of the International Symposium on Music Information Retrieval (ISMIR), pages 170–177.

Geusebroek, J. M., Burghouts, G. J., and Smeulders, A. W. M. (2005). The Amsterdam library of object images. Int. J. Comput. Vis., 61(1):103–112.

Roth, V., Laub, J., Buhmann, J. M., and Müller, K.-R. (2003). Going metric: denoising pairwise data. In Becker, S., Thrun, S., and Obermayer, K., editors, Advances in Neural Information Processing Systems 15, pages 809–816, Cambridge, MA. MIT Press.

Schultz, M. and Joachims, T. (2004). Learning a distance metric from relative comparisons. In Thrun, S., Saul, L., and Schölkopf, B., editors, Advances in Neural Information Processing Systems 16, Cambridge, MA. MIT Press.

Weinberger, K. Q., Sha, F., and Saul, L. K. (2004). Learning a kernel matrix for nonlinear dimensionality reduction. In Proceedings of the Twenty-first International Conference on Machine Learning, pages 839–846.