

SLIDE 1

Making predictions involving pairwise data

Aditya Menon and Charles Elkan University of California, San Diego September 17, 2010

1 / 44

SLIDE 2

Overview of talk

Propose a new problem, dyadic label prediction, and explain its importance

◮ Within-network classification is a special case

Show how to learn supervised latent features to solve the dyadic label prediction problem
Compare different approaches to the problem from different communities
Highlight remaining challenges

2 / 44

SLIDE 3

Outline

1. Background: dyadic prediction
2. A related problem: label prediction for dyads
3. Latent feature approach to dyadic label prediction
4. Analysis of label prediction approaches
5. Experimental comparison
6. Conclusions
7. References

3 / 44

SLIDE 4

The dyadic prediction problem

Supervised learning: labeled examples (x_i, y_i) → predict the label of an unseen example x′
Dyadic prediction: labeled dyads ((r_i, c_i), y_i) → predict the label of an unseen dyad (r′, c′)
Labels describe interactions between pairs of entities

◮ Example: (user, movie) dyads with a label denoting the rating (collaborative filtering)
◮ Example: (user, user) dyads with a label denoting whether the two users are friends (link prediction)

4 / 44

SLIDE 5

Dyadic prediction as matrix completion

Imagine a matrix X ∈ X^{m×n}, with rows indexed by r_i and columns by c_i
The space X = X′ ∪ {?}

◮ Entries with value "?" are missing

The dyadic prediction problem is to predict the values of the missing entries
Henceforth call the r_i row objects and the c_i column objects

5 / 44

SLIDE 6

Dyadic prediction and link prediction

Consider a graph where only some edges are observed. Link prediction means predicting the presence/absence of the unobserved edges
There is a two-way reduction between the problems

◮ Link prediction is dyadic prediction on an adjacency matrix
◮ Dyadic prediction is link prediction on a bipartite graph with nodes for the rows and columns

Can apply link prediction methods for dyadic prediction, and vice versa

◮ This will be necessary when comparing methods later in the talk

6 / 44

SLIDE 7

Latent feature methods for dyadic prediction

Common strategy for dyadic prediction: learn latent features
Simplest form: X ≈ UV^T

◮ U ∈ R^{m×k}
◮ V ∈ R^{n×k}
◮ k ≪ min(m, n) is the number of latent features

Learn U, V by optimizing the (nonconvex) objective

  ||X − UV^T||²_O + (λ_U/2)||U||²_F + (λ_V/2)||V||²_F

where || · ||_O is the Frobenius norm over the non-missing entries

Can be thought of as a form of regularized SVD
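As a concrete sketch, the objective above can be minimized by plain gradient descent over the observed entries. This is an illustrative implementation, not the authors' code; the function name `factorize` and all parameter values are my own choices.

```python
import numpy as np

def factorize(X, mask, k=2, lam=0.01, lr=0.05, iters=2000, seed=0):
    """Minimize ||X - U V^T||_O^2 + (lam/2)(||U||_F^2 + ||V||_F^2),
    where the squared error is summed only over observed entries
    (mask == 1), by gradient descent."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = 0.1 * rng.standard_normal((m, k))
    V = 0.1 * rng.standard_normal((n, k))
    for _ in range(iters):
        R = mask * (U @ V.T - X)       # residual on observed entries only
        gU = 2 * R @ V + lam * U       # gradient of the objective wrt U
        gV = 2 * R.T @ U + lam * V     # gradient wrt V
        U -= lr * gU
        V -= lr * gV
    return U, V

# Tiny demo: a rank-1 matrix with one entry hidden; the factorization
# should recover the hidden entry from the low-rank structure.
X = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])
mask = np.ones_like(X)
mask[2, 1] = 0.0                       # pretend X[2, 1] is missing
U, V = factorize(X, mask, k=1)
```

With only five observed entries of a rank-1 matrix, `(U @ V.T)[2, 1]` comes out close to the hidden true value 6.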

7 / 44

SLIDE 8

Outline

1. Background: dyadic prediction
2. A related problem: label prediction for dyads
3. Latent feature approach to dyadic label prediction
4. Analysis of label prediction approaches
5. Experimental comparison
6. Conclusions
7. References

8 / 44

SLIDE 9

Label prediction for dyads

Want to predict labels for individual row/column entities:

  Labeled dyads ((r_i, c_i), y_i) + labeled entities (r_i, y^r_i) → predict the label of an unseen entity r′

Optionally, predict labels for dyads too
Attach labels to row objects only, without loss of generality
Let y^r_i ∈ {0, 1}^L to allow multi-label prediction

9 / 44

SLIDE 10

Dyadic label prediction as matrix completion

The new problem is also a form of matrix completion
Input is the standard dyadic prediction matrix X ∈ X^{m×n} and a label matrix Y ∈ Y^{m×L}
Each column of Y is one tag
As before, let Y = {0, 1} ∪ {?}, where "?" means missing
Y can have any pattern of missing entries
The goal is to fill in the missing entries of Y
Optionally, fill in the missing entries of X, if any

10 / 44

SLIDE 11

Important real-world applications

Predict whether users in a collaborative filtering population will respond to an ad campaign
Score the suspiciousness of users in a social network, e.g. the probability of being a terrorist
Predict which strains of bacteria will appear in food processing plants [2]

11 / 44

SLIDE 12

Dyadic label prediction and supervised learning

An extension of transductive supervised learning: we predict labels for individual examples, but:

◮ Explicit features (side information) for examples may be absent
◮ Relationship information between examples is known via the X matrix
◮ The relationship information may have missing data
◮ Optionally, predict relationship information also

12 / 44

SLIDE 13

Within-network classification

Consider G = (V, E), where only the nodes V′ ⊆ V have labels
Predicting labels for the nodes in V \ V′ is called within-network classification
An instance of dyadic label prediction: X is the adjacency matrix of G, while Y consists of the node labels

13 / 44

SLIDE 14

Why is the dyadic interpretation useful?

We can let the edges E be partially observed, combining link prediction with label prediction
Can use existing methods for dyadic prediction for within-network classification

◮ Exploit advantages of dyadic prediction methods, such as the ability to use side information
◮ Learn latent features

14 / 44

SLIDE 15

Outline

1. Background: dyadic prediction
2. A related problem: label prediction for dyads
3. Latent feature approach to dyadic label prediction
4. Analysis of label prediction approaches
5. Experimental comparison
6. Conclusions
7. References

15 / 44

SLIDE 16

Latent feature approach to dyadic label prediction

Given features for the row objects, predicting the labels in Y is standard supervised learning
But we do not have such features

◮ Can learn them using a latent feature approach
◮ Model X ≈ UV^T and think of U as a feature representation for the row objects

Given U, learn a weight matrix W via ridge regression:

  min_W ||Y − UW^T||²_F + (λ_W/2)||W||²_F
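Because this ridge problem is quadratic in W, it has a closed-form solution, W^T = (U^T U + (λ_W/2) I)^{-1} U^T Y. A minimal sketch, with a hypothetical helper name and toy data of my own:

```python
import numpy as np

def ridge_labels(U, Y, lam_w=1.0):
    """Closed-form solution of min_W ||Y - U W^T||_F^2 + (lam_w/2)||W||_F^2:
    W^T = (U^T U + (lam_w/2) I)^{-1} U^T Y."""
    k = U.shape[1]
    Wt = np.linalg.solve(U.T @ U + (lam_w / 2) * np.eye(k), U.T @ Y)
    return Wt.T  # shape (L, k): one weight vector per tag

# Sanity check on noiseless synthetic data: with negligible
# regularization, the true weights are recovered.
rng = np.random.default_rng(0)
U = rng.standard_normal((50, 3))      # latent features for 50 row objects
W_true = rng.standard_normal((2, 3))  # 2 tags, 3 latent dimensions
Y = U @ W_true.T
W = ridge_labels(U, Y, lam_w=1e-6)
```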

16 / 44

SLIDE 17

The SocDim approach

SocDim is a method for within-network classification on G [3]

◮ Compute the modularity matrix from the adjacency matrix X:

    Q(X) = X − (1/(2|E|)) d d^T

  where d is the vector of node degrees

◮ The latent features are eigenvectors of Q(X)
◮ Use the latent features in standard supervised learning to predict Y

A special case of our approach: G undirected, no missing edges, Y not multilabel, U unsupervised
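A sketch of the SocDim feature construction on an undirected graph, following the recipe above (my own minimal reading; the real method's choice of k and downstream classifier are not shown):

```python
import numpy as np

def socdim_features(A, k):
    """SocDim-style latent features: the k leading eigenvectors of the
    modularity matrix Q = A - d d^T / (2|E|) of an undirected graph."""
    d = A.sum(axis=1)
    two_e = d.sum()                     # equals 2|E| for an undirected graph
    Q = A - np.outer(d, d) / two_e
    vals, vecs = np.linalg.eigh(Q)      # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]    # indices of the k largest eigenvalues
    return vecs[:, top]

# Two disconnected triangles: the leading eigenvector of the modularity
# matrix separates the two communities by sign.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
U = socdim_features(A, k=1)
```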

17 / 44

SLIDE 18

Supervised latent feature approach

We learn U to jointly model the data and label matrices, yielding supervised latent features:

  min_{U,V,W} ||X − UV^T||²_F + ||Y − UW^T||²_F + (1/2)(λ_U||U||²_F + λ_V||V||²_F + λ_W||W||²_F)

Equivalent to

  min_{U,V,W} ||[X Y] − U[V; W]^T||²_F + (1/2)(λ_U||U||²_F + λ_V||V||²_F + λ_W||W||²_F)

Intuition: treat the tags as new movies
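The "tags as new movies" equivalence — separate data-fit and label-fit terms versus a single term on the concatenated matrix [X Y] with the stacked factor [V; W] — can be checked numerically (toy sizes and variable names are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, L, k = 8, 5, 3, 2
X = rng.standard_normal((m, n))   # data matrix
Y = rng.standard_normal((m, L))   # label matrix
U = rng.standard_normal((m, k))
V = rng.standard_normal((n, k))
W = rng.standard_normal((L, k))

# Sum of the two separate squared-Frobenius fit terms...
separate = (np.linalg.norm(X - U @ V.T, "fro") ** 2
            + np.linalg.norm(Y - U @ W.T, "fro") ** 2)

# ...equals the single fit term on the column-concatenated matrix
# [X Y] against U [V; W]^T, since the squared Frobenius norm is
# additive over columns.
stacked = np.linalg.norm(np.hstack([X, Y])
                         - U @ np.vstack([V, W]).T, "fro") ** 2
```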

18 / 44

SLIDE 19

Why not use the reduction?

If the goal is predicting labels, reconstructing X is less important
So, weight the "label movies" with a tradeoff parameter µ:

  min_{U,V,W} ||X − UV^T||²_F + µ||Y − UW^T||²_F + (1/2)(λ_U||U||²_F + λ_V||V||²_F + λ_W||W||²_F)

Assuming no missing entries in X, this is essentially the supervised matrix factorization (SMF) method [4]

◮ SMF was designed for directed graphs, unlike SocDim

19 / 44

SLIDE 20

From SMF to dyadic prediction

Move from the SMF approach to one based on dyadic prediction
Obtain important advantages:

◮ Deal with missing data in X
◮ Allow arbitrary missingness in Y, including partially observed rows

Specifically, use the LFL approach [1]

◮ Exploit side information about the row objects
◮ Predict calibrated probabilities for tags
◮ Handle nominal and ordinal tags

20 / 44

SLIDE 21

Latent feature log-linear (LFL) model

Assume discrete entries in the input matrix X, say {1, . . . , R}
Per row and per column, have a latent feature vector for each outcome: U^r_i and V^r_j
Posit the log-linear probability model

  p(X_ij = r | U, V) = exp((U^r_i)^T V^r_j) / Σ_{r′} exp((U^{r′}_i)^T V^{r′}_j)

21 / 44

SLIDE 22

LFL inference and training

The model is

  p(X_ij = r | U, V) = exp((U^r_i)^T V^r_j) / Σ_{r′} exp((U^{r′}_i)^T V^{r′}_j)

For nominal outcomes, predict argmax_r p(r | U, V)
For ordinal outcomes, predict Σ_r r · p(r | U, V)

Optimize MSE for ordinal outcomes
Optimize log-likelihood for nominal outcomes; this gives well-calibrated predictions
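A sketch of LFL inference under these rules, assuming outcomes {1, …, R} (function names and array shapes are my own; training is omitted):

```python
import numpy as np

def lfl_probs(U, V, i, j):
    """p(X_ij = r | U, V) for the latent feature log-linear model.
    U has shape (R, m, k) and V has shape (R, n, k): one latent
    vector per outcome r for each row i and column j."""
    scores = np.einsum("rk,rk->r", U[:, i, :], V[:, j, :])
    e = np.exp(scores - scores.max())   # numerically stable softmax
    return e / e.sum()

def predict(U, V, i, j, ordinal=False):
    p = lfl_probs(U, V, i, j)
    outcomes = np.arange(1, len(p) + 1)      # outcomes {1, ..., R}
    if ordinal:
        return float(outcomes @ p)           # expected value sum_r r p(r)
    return int(outcomes[np.argmax(p)])       # mode, for nominal outcomes

# Toy instance with R = 5 outcomes.
rng = np.random.default_rng(0)
R, m, n, k = 5, 4, 6, 2
U = rng.standard_normal((R, m, k))
V = rng.standard_normal((R, n, k))
p = lfl_probs(U, V, 0, 0)
```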

22 / 44

SLIDE 23

Incorporating side-information

Known features can be highly predictive of matrix entries
They are essential for solving cold-start problems, where there are no existing observations for a row/column
Let a_i and b_j denote covariates for rows and columns respectively
The extended model is

  p(X_ij = r | U, V) ∝ exp((U^r_i)^T V^r_j + (w^r)^T [a_i; b_j])

The weight vector w^r says how the side information predicts outcome r

23 / 44

SLIDE 24

Extending LFL to graphs

Consider the following generalization of the LFL model:

  p(X_ij = r | U, V, Λ) ∝ exp((U^r_i)^T Λ V^r_j)

Constrain the latent features depending on the nature of the graph:

◮ If the rows and columns are distinct sets of entities, let Λ = I
◮ For asymmetric graphs, set V = U and let Λ be unconstrained
◮ For symmetric graphs, set V = U and Λ = I

24 / 44

SLIDE 25

Using the LFL model for label prediction

Idea: fill in the missing entries in X and also the missing tags in Y
The combined regularized optimization is

  min_{U,V,W} ||X − E(X)||²_O + (1/2) Σ_r (λ_U||U^r||²_F + λ_V||V^r||²_F)
              − Σ_{(i,l)∈O} log [ e^{Y_il (W_l^T U_i)} / (1 + e^{W_l^T U_i}) ] + (λ_W/2)||W||²_F

where the second sum is the logistic log-likelihood of the observed tags

If the entries in X are ordinal then E(X)_ij = Σ_r r · p(X_ij = r | U, V)
25 / 44

SLIDE 26

Outline

1. Background: dyadic prediction
2. A related problem: label prediction for dyads
3. Latent feature approach to dyadic label prediction
4. Analysis of label prediction approaches
5. Experimental comparison
6. Conclusions
7. References

26 / 44

SLIDE 27

Summary of methods

Three previously unrelated approaches to label prediction:

◮ SocDim
◮ SMF
◮ LFL

They have not been compared before
How do they differ?

27 / 44

SLIDE 28

Comparison of approaches

Properties of the methods:

  Property                     SocDim      SMF   LFL
  Supervised latent features?  No          Yes   Yes
  Asymmetric graphs?           No          Yes   Yes
  Handles missing data?        No          No    Yes
  Finds latent features of?    Modularity  Data  Data
  Single minimum?              Yes         No    No

Many of the differences arise from the objective function being optimized

28 / 44

SLIDE 29

Alternative objective functions

Compare the objective functions for a shared special case:

◮ Since SocDim and SMF operate natively on graphs, assume X is a graph
◮ Assume no missing data in X, for fairness to SocDim and SMF
◮ Assume the graph is undirected, as SocDim does
◮ Don't learn the latent features in a supervised manner, for fairness to SocDim

29 / 44

SLIDE 30

Comparing objective functions

SocDim: if Q denotes the modularity matrix, then

  min_{U, Λ diagonal} ||Q(X) − UΛU^T||²_F

Supervised matrix factorization:

  min_{U,Λ} ||X − UΛU^T||²_F + (λ_U/2)||U||²_F + (λ_Λ/2)||Λ||²_F

LFL: denoting σ(x) = 1/(1 + e^{−x}),

  min_U ||X − σ(UU^T)||²_F + (λ_U/2)||U||²_F

In general:

  min_{U,Λ} ||f(X) − g(U, Λ)||²_F + (λ_U/2)||U||²_F + (λ_Λ/2)||Λ||²_F

30 / 44

SLIDE 31

SocDim versus LFL

SocDim transforms the input X, but LFL transforms the estimate
Transforming the estimate ensures predictions lie in [0, 1]
Transforming the input is analogous to spectral clustering:

◮ The graph Laplacian normalizes nodes with respect to their degrees

Does the input transformation make a difference? Does SocDim perform similarly using the Laplacian instead of the modularity matrix?

31 / 44

SLIDE 32

SocDim versus SMF

Without supervised features or missing data, there are two differences:

◮ SocDim uses the modularity matrix, while SMF uses the data matrix
◮ SocDim has a closed-form solution, while SMF does not, so SocDim is immune to local optima

Reaching the global optimum may offset the drawback that SocDim is unsupervised

32 / 44

SLIDE 33

Outline

1. Background: dyadic prediction
2. A related problem: label prediction for dyads
3. Latent feature approach to dyadic label prediction
4. Analysis of label prediction approaches
5. Experimental comparison
6. Conclusions
7. References

33 / 44

SLIDE 34

Questions for empirical study

Do supervised latent features help?
Does immunity to local optima help?
Which data transform is best? Does it matter?

◮ Can using the Laplacian matrix with SocDim improve performance?
◮ Can using the modularity or Laplacian matrix with SMF improve performance?

Can naïve approaches to missing edges succeed?

◮ Just impute row/column averages for missing entries?
◮ If so, then SocDim and SMF can be applied to more problems

34 / 44

SLIDE 35

Datasets

blogcatalog: fully observed links between 2500 bloggers in a blog directory. Labels are users' interests, divided into 39 categories (multilabel problem)

senator: "Yea" or "Nay" votes of 101 U.S. senators on 315 bills. The label is Republican or Democrat

usps: binarized grayscale 16 × 16 images of handwritten digits. We occlude some pixels, so X has missing entries. Labels are the true digits.

◮ Shows how dyadic label prediction can solve a difficult version of a standard supervised learning task

35 / 44

SLIDE 36

Accuracy measures

For the senator and usps binary tasks, 0–1 error
For the blogcatalog multi-label task, F1-micro and F1-macro scores

◮ Given true tags y_il and predictions ŷ_il,

    micro = 2 Σ_{i,l} y_il ŷ_il / Σ_{i,l} (y_il + ŷ_il)

    macro = (2/L) Σ_l [ Σ_i y_il ŷ_il / Σ_i (y_il + ŷ_il) ]

10-fold cross-validation
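The micro and macro definitions above can be sketched directly in code (a minimal illustration; the helper names and toy matrices are my own, and no handling for tags with a zero denominator is included):

```python
import numpy as np

def f1_micro(Y, Yhat):
    """2 * sum_{i,l} y*yhat / sum_{i,l} (y + yhat): all tags pooled."""
    return 2 * np.sum(Y * Yhat) / np.sum(Y + Yhat)

def f1_macro(Y, Yhat):
    """Per-tag F1, then averaged over the L tags (columns)."""
    num = 2 * np.sum(Y * Yhat, axis=0)
    den = np.sum(Y + Yhat, axis=0)   # assumes every tag appears somewhere
    return np.mean(num / den)

# 3 examples, 2 tags: one false negative on the second tag's column.
Y    = np.array([[1, 0], [1, 1], [0, 1]])
Yhat = np.array([[1, 0], [1, 0], [0, 1]])
```

Here `f1_micro` gives 6/7 and `f1_macro` gives 5/6, showing how macro weights each tag equally while micro pools all decisions.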

36 / 44

SLIDE 37

F1-micro results on blogcatalog

Left to right: adjacency matrix, modularity, Laplacian
Blue is training, red is test. Higher is better
SMF is best. The raw data matrix is as good as modularity
All methods overfit, despite ℓ2 regularization

37 / 44

SLIDE 38

F1-macro results on blogcatalog

Left to right: adjacency matrix, modularity, Laplacian
Blue is training, red is test. Higher is better
SMF is again best. The raw data matrix is best
All methods overfit

38 / 44

SLIDE 39

Results on senator

Left to right: adjacency matrix, modularity, Laplacian
Blue is training, red is test. Lower is better
LFL is best
The other two methods overfit badly

39 / 44

SLIDE 40

Results on usps

Left to right: adjacency matrix, modularity, Laplacian
Blue is training, red is test. Lower is better
SocDim is best, despite ignoring the missing values
The raw data matrix is best

40 / 44

SLIDE 41

Outline

1. Background: dyadic prediction
2. A related problem: label prediction for dyads
3. Latent feature approach to dyadic label prediction
4. Analysis of label prediction approaches
5. Experimental comparison
6. Conclusions
7. References

41 / 44

SLIDE 42

Conclusions

Unified label prediction and within-network prediction
Unified collaborative filtering with cold start and link prediction with side information
Showed how to use supervised latent features to predict labels and links
Experiments show that good regularization is an open problem

42 / 44

SLIDE 43

Outline

1. Background: dyadic prediction
2. A related problem: label prediction for dyads
3. Latent feature approach to dyadic label prediction
4. Analysis of label prediction approaches
5. Experimental comparison
6. Conclusions
7. References

43 / 44

SLIDE 44

References

[1] Aditya Krishna Menon and Charles Elkan. Dyadic prediction using a latent feature log-linear model. http://arxiv.org/abs/1006.2156, 2010.

[2] Purnamrita Sarkar, Lujie Chen, and Artur Dubrawski. Dynamic network model for predicting occurrences of salmonella at food facilities. In Proceedings of the BioSecure International Workshop, pages 56–63. Springer, 2008.

[3] Lei Tang and Huan Liu. Relational learning via latent social dimensions. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 817–826. ACM, 2009.

[4] Shenghuo Zhu, Kai Yu, Yun Chi, and Yihong Gong. Combining content and link for classification using matrix factorization. In ACM SIGIR Conference on Research and Development in Information Retrieval, pages 487–494. ACM, 2007.

44 / 44