Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs
- A. Epasto°, J. Feldman*, S. Lattanzi*, S. Leonardi°, V. Mirrokni*.
*Google Research °Sapienza U. Rome
Motivation
Recommendation systems are naturally modeled as bipartite graphs between users and items.
[Figure: the Query-Ads bipartite graph. Millions of advertisers (e.g. "Nike Store New York") place bids (2$-5$) on billions of queries (e.g. "Soccer Shoes", "Soccer Ball"); hundreds of labels categorize the advertisers (Retailers, Apparel, Sport Equipment).]
Similarity ranking is useful in several contexts:
- Movie recommendation: find similar users and suggest movies.
- Bibliographic networks (e.g. DBLP): find related authors and suggest papers to read.
Goal: Find the nodes most “similar” to a given node A.
Many well-known similarity measures apply: Personalized PageRank, Katz, Jaccard Coefficient, Adamic-Adar.
Our contribution: a general framework (Reduce and Aggregate) to induce real-time similarity rankings in multi-categorical bipartite graphs, which we apply to several similarity measures via efficient large-scale algorithms.
Personalized PageRank (PPR). For a node v (the seed) and a probability alpha, the random walk at each step jumps back to v with probability alpha and otherwise follows a random outgoing edge. The stationary distribution assigns a similarity score to each node u in the graph w.r.t. node v.
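As a concrete illustration, PPR can be computed by power iteration. A minimal dense-NumPy sketch (for intuition only, not the large-scale implementation discussed here):

```python
import numpy as np

def personalized_pagerank(P, seed, alpha=0.15, tol=1e-10, max_iter=1000):
    # P: row-stochastic transition matrix of the graph (n x n).
    # At every step the walk restarts at the seed with probability alpha,
    # otherwise it follows a random edge according to P.
    n = P.shape[0]
    restart = np.zeros(n)
    restart[seed] = 1.0
    pi = restart.copy()
    for _ in range(max_iter):
        new = alpha * restart + (1 - alpha) * (pi @ P)
        if np.abs(new - pi).sum() < tol:  # L1 convergence check
            return new
        pi = new
    return pi
```

The resulting vector ranks every node by similarity to the seed; the iteration is an L1 contraction with factor (1 - alpha), so it converges for any graph.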
Our algorithms are designed to run on very large-scale MapReduce systems.
At query time, the similarity ranking must be computed w.r.t. an arbitrary subset of labels (categories).
Reduce: given the bipartite graph and a category, construct a graph with only A-side nodes that preserves the ranking on the entire graph.
Aggregate: given a node v in A and the reduced graphs of the subset of categories of interest, determine the ranking for v.
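The query-time interface can be sketched as follows. This naive version simply sums per-category scores to convey the flow; it is an illustrative stand-in, not the provably correct aggregation developed in this work (all names are hypothetical):

```python
def aggregate(reduced_scores, categories, v, top_k=10):
    # reduced_scores[c][v]: for category c, maps each A-side node to its
    # precomputed similarity score w.r.t. v in the reduced graph of c
    # (the output of the Reduce step).
    # Naive aggregation: sum the per-category scores, then rank.
    totals = {}
    for c in categories:
        for node, score in reduced_scores[c].get(v, {}).items():
            totals[node] = totals.get(node, 0.0) + score
    return sorted(totals, key=totals.get, reverse=True)[:top_k]
```

Because the per-category scores are precomputed offline, a query over any subset of categories touches only the small reduced structures, which is what makes real-time ranking feasible.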
At query time, the precomputed per-category rankings are combined, e.g. into the ranking for the Red + Yellow categories.
Aggregation of Markov chains has a long history (Simon and Ando, ’61; Meyer, ’89, etc.). Our goal is to reduce the chain to the A-side nodes while correctly preserving the PPR distribution on the entire graph.
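For intuition, here is one textbook way a PPR-preserving reduction can work on a bipartite graph (a sketch based on censored random walks, not necessarily the exact construction used here): collapsing each two-step A→B→A move into one step yields an A-side chain whose PPR equals the full chain's distribution restricted to A.

```python
import numpy as np

def reduce_bipartite_ppr(P_ab, P_ba, alpha):
    # P_ab: row-stochastic A -> B transitions; P_ba: B -> A transitions.
    # Two steps of the restarting walk become one reduced step, so the
    # restart probability compounds: alpha' = 1 - (1 - alpha)**2.
    return P_ab @ P_ba, 1 - (1 - alpha) ** 2
```

Since every restart lands on the A-side seed, the walk watched only at A-visits is itself a PPR walk with transition matrix P_ab @ P_ba and the compounded restart probability.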
Koury et al. Aggregation-Disaggregation Algorithm
Step 1: Partition the Markov chain into disjoint subsets.
Step 2: Approximate the stationary distribution on each subset independently.
Step 3: Consider the transitions between subsets.
Step 4: Aggregate the distributions. Repeat until convergence.
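The four steps above can be sketched as follows, as a minimal dense-matrix version of iterative aggregation/disaggregation (the smoothing step is a simplified choice, not a faithful reimplementation of Koury et al.):

```python
import numpy as np

def iad_stationary(P, blocks, iters=200):
    # Iterative aggregation/disaggregation for a row-stochastic P.
    # blocks: disjoint lists of state indices covering all states (Step 1).
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        # Step 2: conditional (within-block) distributions.
        phi = np.zeros(n)
        for b in blocks:
            phi[b] = pi[b] / pi[b].sum()
        # Step 3: coupling matrix of transitions between blocks.
        K = len(blocks)
        C = np.zeros((K, K))
        for k, bk in enumerate(blocks):
            for l, bl in enumerate(blocks):
                C[k, l] = (phi[bk][:, None] * P[np.ix_(bk, bl)]).sum()
        # Stationary distribution of the small aggregated chain.
        w, v = np.linalg.eig(C.T)
        xi = np.abs(np.real(v[:, np.argmax(np.real(w))]))
        xi /= xi.sum()
        # Step 4: recombine block masses with the conditionals, then
        # smooth with one power step before repeating.
        for k, bk in enumerate(blocks):
            pi[bk] = xi[k] * phi[bk]
        pi = pi @ P
        pi /= pi.sum()
    return pi
```

At the true stationary distribution the aggregated chain's stationary vector equals the block masses, so the exact answer is a fixed point of this loop.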
Precompute the stationary distributions of the subsets X and Y individually.
The reduced chains contain only Advertiser-side nodes.
We show that the aggregation converges to the correct distribution.
Experiments on public and proprietary datasets:
- Query-Ads graph from Google AdWords: billions of nodes, > 5 billion edges.
- DBLP Author-Paper graph.
- Patent Inventor-Invention graphs.
[Plot: Precision vs Recall for Inter, Jaccard, Adamic-Adar, Katz, and PPR.]
[Plot: Kendall-Tau correlation vs position k (10-50, All) on DBLP, Patent, and Query-Ads (cost).]
[Plot: Approximation error (1 - Cosine similarity) vs number of iterations on DBLP and Patent.]