ChoiceRank
Identifying Preferences from Node Traffic in Networks
Lucas Maystre, Matthias Grossglauser
School of Computer and Communication Sciences, EPFL
ICML — August 8th, 2017
Motivating Example

Problem Statement
Explain how users navigate along edges, given the network structure and marginal traffic.
[Figure: example network with transition probabilities (0.6, 0.1, 0.1) and traffic counts (294, 51, 73, 96, 127, 196, 51).]
Underconstrained problem
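[Figure: example network with a parameter $\lambda_i$ attached to each node.]
→ “low-rank” parametrization of $p_{ij}$:
$$p_{ij} = \frac{\lambda_j}{\sum_{k \in N_i^+} \lambda_k}$$
Consistent with Luce's choice axiom: the probability of choosing i over j does not depend on the other alternatives available. [Luce 1959]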
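The parametrization above can be illustrated with a minimal sketch. The strengths in `lam` and the helper name `transition_probs` are hypothetical and chosen only for illustration; the point is that the odds of choosing node 1 over node 2 stay the same when further alternatives are added.

```python
def transition_probs(lam, out_neighbors):
    """Return {j: p_ij} for one node i, with p_ij = lam[j] / sum of lam over N_i^+."""
    total = sum(lam[k] for k in out_neighbors)
    return {j: lam[j] / total for j in out_neighbors}

# Hypothetical strengths for four nodes (illustrative values, not from the talk).
lam = {1: 4.0, 2: 2.0, 3: 1.0, 4: 1.0}

p_small = transition_probs(lam, [1, 2])        # choice set {1, 2}
p_large = transition_probs(lam, [1, 2, 3, 4])  # choice set {1, 2, 3, 4}

# Under Luce's axiom the odds of 1 versus 2 equal lam[1] / lam[2] in both sets.
ratio_small = p_small[1] / p_small[2]
ratio_large = p_large[1] / p_large[2]
print(ratio_small, ratio_large)
```

Both ratios equal `lam[1] / lam[2]` up to floating-point rounding, which is the independence property stated by the axiom.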
Inverting a Steady-State [Kumar et al. WSDM 2015]
Random-walk framework.
Given: graph $G = (V, E)$ and stationary distribution $\pi$.
Find matrix $P$ such that $\pi = \pi P$ and $p_{ij} = 0$ whenever $(i, j) \notin E$.
Our work
We merely assume discrete choices
works with:
Marginal traffic is a minimally sufficient statistic: $\{(c_i^+, c_i^-) \mid i \in V\}$
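Given network structure + marginal traffic, find “good” parameters λ.
Transitions: $D = \{c_{ij} \mid (i, j) \in E\}$, with $p_{ij} = \lambda_j / \sum_{k \in N_i^+} \lambda_k$.
$$\ell(\lambda; D) = \sum_{(i,j) \in E} c_{ij} \Big[ \log \lambda_j - \log \sum_{k \in N_i^+} \lambda_k \Big] = \sum_{i=1}^{n} \Big[ c_i^- \log \lambda_i - c_i^+ \log \sum_{k \in N_i^+} \lambda_k \Big]$$
where $c_i^- = \sum_{j \in N_i^-} c_{ji}$ is the incoming traffic and $c_i^+ = \sum_{j \in N_i^+} c_{ij}$ is the outgoing traffic of node $i$.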
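The equality of the two forms of the log-likelihood can be checked numerically. The sketch below uses a hypothetical four-node graph and two hypothetical sets of transition counts that differ edge by edge but aggregate to the same marginal traffic; the function names, graph, counts, and `lam` values are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def out_neighbors(edges):
    """Map each node i to its list of out-neighbors N_i^+."""
    nbrs = defaultdict(list)
    for i, j in edges:
        nbrs[i].append(j)
    return nbrs

def loglik_from_transitions(lam, edges, counts):
    """Edge form: sum over (i, j) in E of c_ij * (log lam_j - log sum_{k in N_i^+} lam_k)."""
    nbrs = out_neighbors(edges)
    total = 0.0
    for i, j in edges:
        denom = sum(lam[k] for k in nbrs[i])
        total += counts.get((i, j), 0) * (math.log(lam[j]) - math.log(denom))
    return total

def marginal_traffic(edges, counts):
    """Aggregate transitions into incoming (c_in) and outgoing (c_out) traffic per node."""
    c_in, c_out = defaultdict(int), defaultdict(int)
    for i, j in edges:
        c = counts.get((i, j), 0)
        c_out[i] += c
        c_in[j] += c
    return dict(c_in), dict(c_out)

def loglik_from_marginals(lam, edges, c_in, c_out):
    """Node form: sum over i of c_in[i] * log lam_i - c_out[i] * log sum_{k in N_i^+} lam_k."""
    nbrs = out_neighbors(edges)
    total = 0.0
    for i in lam:
        total += c_in.get(i, 0) * math.log(lam[i])
        if nbrs.get(i):
            total -= c_out.get(i, 0) * math.log(sum(lam[k] for k in nbrs[i]))
    return total

# Hypothetical graph and parameters (not taken from the slides).
edges = [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (3, 4), (4, 2)]
lam = {1: 0.5, 2: 2.0, 3: 1.0, 4: 1.5}

# Two different transition data sets with identical marginal traffic.
counts_a = {(1, 2): 1, (2, 3): 1, (3, 1): 1, (3, 4): 1, (4, 2): 1}
counts_b = {(1, 3): 1, (2, 1): 1, (3, 2): 1, (3, 4): 1, (4, 2): 1}

marg_a = marginal_traffic(edges, counts_a)
marg_b = marginal_traffic(edges, counts_b)

ll_a = loglik_from_transitions(lam, edges, counts_a)
ll_b = loglik_from_transitions(lam, edges, counts_b)
ll_m = loglik_from_marginals(lam, edges, *marg_a)
print(marg_a == marg_b, ll_a, ll_b, ll_m)
```

Because the likelihood depends on the data only through `c_in` and `c_out`, the three values agree up to rounding even though the individual transitions differ.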
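ML estimate is often ill-defined because of graph structure or data sparsity.
→ embed in a Bayesian setting by postulating a prior on $\lambda_i$.
$$\log p(\lambda \mid D) = \sum_{i=1}^{n} \Big[ c_i^- \log \lambda_i - c_i^+ \log \sum_{k \in N_i^+} \lambda_k \Big] + \sum_{i=1}^{n} \log p(\lambda_i) + \text{const}$$
Theorem: if α > 1 and β > 0, there is always a unique maximum.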
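We maximize the log-posterior using the MM algorithm. [Hunter 2004]
$$\lambda_i^{(t+1)} = \frac{c_i^-}{\sum_{j \in N_i^-} \gamma_j^{(t)}}, \quad \text{where} \quad \gamma_j^{(t)} = \frac{c_j^+}{\sum_{k \in N_j^+} \lambda_k^{(t)}}$$
[Figure: update from $\lambda_i^{(t)}$ to $\lambda_i^{(t+1)}$.]
One iteration requires two passes over the edges.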
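The update can be sketched in a few lines of plain Python. This is not the authors' implementation: the function names and the toy graph are hypothetical, and the `alpha` and `beta` arguments assume an independent Gamma(α, β) prior on each parameter, which changes the update to (c_in + α − 1) / (Σγ + β). With the defaults `alpha=1.0, beta=0.0` the code reduces to the update written above.

```python
import math

def choicerank_mm(edges, c_in, c_out, alpha=1.0, beta=0.0, n_iter=50):
    """Run n_iter MM updates starting from lam_i = 1 for every node."""
    nodes = sorted({v for edge in edges for v in edge})
    lam = {i: 1.0 for i in nodes}
    for _ in range(n_iter):
        # Pass 1 over the edges: denom[i] = sum of lam_k over out-neighbors k of i.
        denom = {i: 0.0 for i in nodes}
        for i, k in edges:
            denom[i] += lam[k]
        gamma = {j: (c_out.get(j, 0) / denom[j] if denom[j] > 0 else 0.0)
                 for j in nodes}
        # Pass 2 over the edges: gamma_sum[i] = sum of gamma_j over in-neighbors j of i.
        gamma_sum = {i: 0.0 for i in nodes}
        for j, i in edges:
            gamma_sum[i] += gamma[j]
        # With alpha=1 and beta=0 this can divide by zero, which mirrors the
        # ill-defined ML estimate; the assumed Gamma prior (beta > 0) avoids that.
        lam = {i: (c_in.get(i, 0) + alpha - 1.0) / (gamma_sum[i] + beta)
               for i in nodes}
    return lam

def log_likelihood(lam, edges, c_in, c_out):
    """Node form of the log-likelihood, used here only to monitor progress."""
    denom = {i: 0.0 for i in lam}
    for i, k in edges:
        denom[i] += lam[k]
    return sum(c_in.get(i, 0) * math.log(lam[i])
               - (c_out.get(i, 0) * math.log(denom[i]) if denom[i] > 0 else 0.0)
               for i in lam)

# Hypothetical four-node example (graph and counts are illustrative only).
edges = [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (3, 4), (4, 2)]
c_in = {1: 1, 2: 2, 3: 1, 4: 1}
c_out = {1: 1, 2: 1, 3: 2, 4: 1}

# Log-likelihood after 0, 1, ..., 5 iterations of the plain (no prior) update.
trace = [log_likelihood(choicerank_mm(edges, c_in, c_out, n_iter=t), edges, c_in, c_out)
         for t in range(6)]
lam_map = choicerank_mm(edges, c_in, c_out, alpha=2.0, beta=1.0, n_iter=50)
print(trace)
print(lam_map)
```

Each MM step should not decrease the objective, so `trace` is expected to be non-decreasing. The two loops over `edges` correspond to the two passes per iteration mentioned above.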
Scales well to large graphs. Tested on Common Crawl hyperlink graph:
machine
English Wikipedia traffic: 2 M nodes, 13 M edges, 1.2 B transitions.
How well do we recover the transition probabilities?
C-Rank: $p_{ij} \propto \lambda_j$
Traffic: $p_{ij} \propto c_j^-$
P-Rank: $p_{ij} \propto \mathrm{PR}_j$
Uniform: $p_{ij} \propto 1$
[Figure: KL-divergence and displacement for C-Rank, Traffic, P-Rank, and Uniform.]
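A rough sketch of how such a comparison could be computed at a single node follows. The numbers are hypothetical and are not results from the talk, the helper names are made up for illustration, and how the per-node values are aggregated over the whole graph is not specified here. Each method supplies a score per neighbor, the scores are normalized over the out-neighbors, and the result is compared with the observed transition frequencies using the KL-divergence.

```python
import math

def predict(scores, out_nbrs):
    """Normalize per-node scores over a node's out-neighbors: p_ij proportional to scores[j]."""
    total = sum(scores[j] for j in out_nbrs)
    return {j: scores[j] / total for j in out_nbrs}

def kl_divergence(p, q):
    """KL(p || q) for two distributions given as dicts over the same keys."""
    return sum(p[j] * math.log(p[j] / q[j]) for j in p if p[j] > 0)

# Hypothetical observed transition frequencies out of one node with neighbors 1, 2, 4.
observed = {1: 0.6, 2: 0.3, 4: 0.1}
out_nbrs = list(observed)

# Hypothetical scores: model strengths versus the uniform baseline.
model_scores = {1: 2.5, 2: 1.5, 4: 1.0}
uniform_scores = {1: 1.0, 2: 1.0, 4: 1.0}

kl_model = kl_divergence(observed, predict(model_scores, out_nbrs))
kl_uniform = kl_divergence(observed, predict(uniform_scores, out_nbrs))
print(kl_model, kl_uniform)
```

A lower value means the predicted transition probabilities are closer to the observed ones. The same `predict` helper covers the Traffic and P-Rank baselines by passing incoming traffic or PageRank scores instead.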
github.com/lucasmaystre/choix
ChoiceRank
Given network structure and marginal traffic, find transition probabilities.
Consistent with Luce's choice axiom.
$\lambda_i$ reflects a page's utility.
PageRank
Given network structure, find steady-state traffic.
Transitions are uniformly random over neighbors.
$\mathrm{PR}_i$ reflects a page's popularity.
[Figure: four-node example (nodes 1 to 4) annotated with marginal traffic.]
$c_1^- = 1$, $c_1^+ = 1$
$c_2^- = 2$, $c_2^+ = 1$
$c_3^- = 1$, $c_3^+ = 2$
$c_4^- = 1$, $c_4^+ = 1$
[Figure: KL-divergence and displacement for C-Rank, Traffic, P-Rank, and Uniform.]
Applications beyond clickstream data — e.g., mobility networks.