SLIDE 1 Searching and Sampling
Take a Walk Through a Network!
Antonio Carzaniga
Faculty of Informatics, Università della Svizzera italiana
March 4, 2020
SLIDE 2
Outline
◮ Applications
◮ The network as a linear transformation
◮ Other applications of linear algebra
SLIDES 3-7
[Figure, built up over several slides: node v has a very limited local view of the network]
SLIDE 8
[Figure: node v and its local view]
Networks
◮ peer-to-peer
◮ . . .
Services
◮ address-based
◮ content-based
◮ multicast
◮ search
◮ sampling
◮ . . .
Algorithms
◮ random walks
◮ . . .
SLIDE 9
[Figure: network of 26 nodes labeled GE VS TI VD FR BE OW NW UR GR NE LU ZG SZ GL SG JU SO BS BL AG SH ZH TG AR AL]
SLIDES 10-19
[Same network; the caption counts down by one per slide, from "remaining hops: 9" to "remaining hops: 0!"]
SLIDE 20
[Same network] node SG selected
SLIDE 21 Other Applications
Relevance score for hyper-linked documents (PageRank)
◮ Input: a large collection of linked documents such as Web pages
◮ Output: a ranking of the pages by reputation
◮ a page that is linked by reputable pages acquires more reputation
◮ equivalent to a random walk over the Web
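The random-walk reading of PageRank can be made concrete with a small sketch. Everything specific here is an assumption for illustration: the four-page link graph, the function name `pagerank`, and the damping factor of 0.85 (the "random surfer" occasionally jumps to a random page), which the slide does not mention.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Rank pages by repeatedly redistributing reputation along links.

    links: dict mapping every page to the list of pages it links to.
    damping: assumed probability of following a link instead of jumping
             to a uniformly random page (not taken from the slides).
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start from a uniform distribution
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}   # random-jump share
        for p in pages:
            targets = links[p] or pages         # page without links: jump anywhere
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share                 # p passes reputation to q
        rank = new
    return rank


# Hypothetical link graph: a -> b, a -> c, b -> c, c -> a, d -> c
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = pagerank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 4))
```

In this toy graph, page c is linked by three pages and ends up with the highest score, while page d, which nobody links to, keeps only the random-jump share.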
SLIDE 22
SLIDE 23
Problem: given a directed graph $G = (V, A)$, compute, for every node $u \in V$, the probability $p_u$ that a sufficiently long random walk would end at node $u$.
SLIDE 24
Approaches:
1. Simulation
2. Math! (linear algebra)
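The simulation approach can be sketched as follows: run many independent walks and count where they end. The three-node graph, the function names, and the parameters (`walks`, `hops`, `seed`) are assumptions chosen for illustration, not values from the slides.

```python
import random
from collections import Counter

# Hypothetical example graph: node -> list of (successor, transition probability).
GRAPH = {
    "v1": [("v2", 1.0)],
    "v2": [("v3", 1.0)],
    "v3": [("v1", 0.5), ("v2", 0.5)],
}


def walk(graph, start, hops, rng):
    """Follow `hops` random transitions from `start` and return the final node."""
    node = start
    for _ in range(hops):
        successors, weights = zip(*graph[node])
        node = rng.choices(successors, weights=weights, k=1)[0]
    return node


def estimate_end_distribution(graph, walks=20000, hops=30, seed=0):
    """Estimate p_u as the fraction of walks that end at node u."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    ends = Counter(walk(graph, rng.choice(nodes), hops, rng) for _ in range(walks))
    return {u: ends[u] / walks for u in nodes}


if __name__ == "__main__":
    print(estimate_end_distribution(GRAPH))
```

The estimate is only as good as the number of walks and the walk length allow, which is one motivation for the linear-algebra approach.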
SLIDE 25
Random Walks
[Figure: node v and its local view]
SLIDES 26-27
Random Walks
[Figure: node v and its local view, with transition probabilities 0.2, 0.2, 0.5, 0.1 on its links]
Execution
◮ trivial, local process
Configuration and bias
◮ how do we choose transition probabilities?
◮ is the sample biased? How?
◮ how long do we walk?
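The "trivial, local process" could look roughly like the sketch below: each node only needs its own transition probabilities and the hop counter carried by the walk. The neighbor names `a` to `d`, the star-shaped topology, and the function names are hypothetical; only the probabilities 0.2, 0.2, 0.5, 0.1 come from the figure.

```python
import random

# Hypothetical local views: node -> {neighbor: transition probability}.
TRANSITIONS = {
    "v": {"a": 0.2, "b": 0.2, "c": 0.5, "d": 0.1},
    "a": {"v": 1.0},
    "b": {"v": 1.0},
    "c": {"v": 1.0},
    "d": {"v": 1.0},
}


def forward(node, remaining_hops, transitions, rng):
    """One local step: return (next node, remaining hops, selected?)."""
    if remaining_hops == 0:
        return node, 0, True                    # the walk stops here: node is the sample
    neighbors = list(transitions[node])
    weights = [transitions[node][n] for n in neighbors]
    next_node = rng.choices(neighbors, weights=weights, k=1)[0]
    return next_node, remaining_hops - 1, False


def sample(start, hops, transitions, seed=None):
    """Run a walk of `hops` steps from `start`; return (selected node, path)."""
    rng = random.Random(seed)
    node, remaining = start, hops
    path = [node]
    while True:
        node, remaining, selected = forward(node, remaining, transitions, rng)
        if selected:
            return node, path
        path.append(node)


if __name__ == "__main__":
    print(sample("v", 9, TRANSITIONS, seed=1))
```

The three configuration questions map onto the inputs: the weights in `TRANSITIONS`, the distribution of the returned node, and the value of `hops`.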
SLIDE 28
[Figure: stationary distribution (hops → ∞) over the nodes AL AG AR BE BL BS FR GE GL GR JU LU NE NW OW SG SH SO SZ TG TI UR VD VS ZG ZH]
SLIDES 29-32
[Figure: directed graph on nodes v1, v2, v3 with edges v1 → v2 (probability 1), v2 → v3 (probability 1), v3 → v1 (probability 0.5), v3 → v2 (probability 0.5)]
let $p_i(t) = \Pr[\text{walk is at node } v_i \text{ at time } t]$
$p(0) = [0\ 1\ 0]^T$ means the walk starts at $v_2$
$p_1(t+1) = 0.5\, p_3(t)$
$p_2(t+1) = p_1(t) + 0.5\, p_3(t)$
$p_3(t+1) = p_2(t)$
In matrix form, with $A = \begin{bmatrix} 0 & 0 & 0.5 \\ 1 & 0 & 0.5 \\ 0 & 1 & 0 \end{bmatrix}$:
$p(t+1) = A\,p(t)$
$p(t) = A^t p(0)$
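The recurrence $p(t+1) = A\,p(t)$ can be run directly. The sketch below uses the matrix read off from the three update equations; the function names `step` and `distribution_after` are made up for this example.

```python
# Column-stochastic matrix for the three-node example: entry A[i][j] is the
# probability of moving from node v_{j+1} to node v_{i+1}.
A = [[0.0, 0.0, 0.5],
     [1.0, 0.0, 0.5],
     [0.0, 1.0, 0.0]]


def step(A, p):
    """One step of the walk: return the matrix-vector product A p."""
    n = len(p)
    return [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]


def distribution_after(A, p0, t):
    """Return p(t) = A^t p(0) by applying `step` t times."""
    p = list(p0)
    for _ in range(t):
        p = step(A, p)
    return p


p0 = [0.0, 1.0, 0.0]        # the walk starts at v2
for t in (0, 1, 2, 3, 50):
    print(t, distribution_after(A, p0, t))
```

After two steps the walk is at v1 or v2 with probability 0.5 each; after many steps the vector settles close to [0.2, 0.4, 0.4].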
SLIDES 33-41
[Same graph and definitions as on the previous slides]
Write $p(0)$ as a combination of the eigenvectors $x_1, \dots, x_n$ of $A$, with eigenvalues $\lambda_1, \dots, \lambda_n$:
$p(0) = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$
$p(t) = A^t p(0) = \lambda_1^t c_1 x_1 + \lambda_2^t c_2 x_2 + \cdots + \lambda_n^t c_n x_n$
A is stochastic (and the network ergodic): $1 = |\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \dots$
For this graph: $\lambda_1 = 1$, $\lambda_{2,3} = -0.5 \pm 0.5i$
The first term, $\lambda_1^t c_1 x_1 = c_1 x_1$, is $\pi$; the remaining terms have magnitude $\epsilon_t \approx |\lambda_2|^t \to 0$
Stationary distribution: $\pi$
Mixing time: $\tau \approx \log_{|\lambda_2|} \epsilon$ such that $\epsilon_t < \epsilon$ for $t > \tau$
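As a rough numerical check of the mixing-time estimate, the sketch below plugs the eigenvalue from the slide into $\tau \approx \log_{|\lambda_2|} \epsilon$ and compares it with the measured distance from the limit. The target $\epsilon = 10^{-3}$, the use of the maximum per-node deviation as the error, and the 500-step iterate used as a stand-in for $\pi$ are all assumptions of this example.

```python
import math

A = [[0.0, 0.0, 0.5],
     [1.0, 0.0, 0.5],
     [0.0, 1.0, 0.0]]


def step(A, p):
    """One step of the walk: return A p."""
    n = len(p)
    return [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]


def iterate(A, p, t):
    """Return A^t p."""
    for _ in range(t):
        p = step(A, p)
    return p


def mixing_time(lambda2_abs, eps):
    """tau = log base |lambda_2| of eps, i.e. ln(eps) / ln(|lambda_2|)."""
    return math.log(eps) / math.log(lambda2_abs)


lambda2_abs = abs(complex(-0.5, 0.5))   # |lambda_2| for lambda_{2,3} = -0.5 ± 0.5i
eps = 1e-3                              # assumed target error
tau = mixing_time(lambda2_abs, eps)

p0 = [0.0, 1.0, 0.0]
pi_ref = iterate(A, p0, 500)            # numerical stand-in for pi


def error(t):
    """Largest per-node deviation of p(t) from the reference distribution."""
    return max(abs(a - b) for a, b in zip(iterate(A, p0, t), pi_ref))


print(f"|lambda_2| = {lambda2_abs:.4f}, tau = {tau:.1f}")
for t in (5, 10, 20, 40):
    print(t, error(t), lambda2_abs ** t)
```

Since $\epsilon_t \approx |\lambda_2|^t$ holds only up to a constant factor, the printed error and $|\lambda_2|^t$ should be compared by order of magnitude, not digit by digit.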
SLIDE 42
Notice that, if the network is ergodic, then there is a stationary distribution $\pi$ that satisfies the equation
$\pi = A\pi$.
So we can compute the stationary distribution directly by solving this system of equations:
$A\pi = \pi$
$\sum_{i=1}^{n} \pi_i = 1$
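One way to solve this system exactly, sketched here for the three-node example, is to rewrite $A\pi = \pi$ as $(A - I)\pi = 0$, replace one of these (linearly dependent) equations with the normalization $\sum_i \pi_i = 1$, and run Gaussian elimination. The function name `stationary` and the use of exact fractions are choices made for this illustration.

```python
from fractions import Fraction


def stationary(A):
    """Solve (A - I) pi = 0 together with sum(pi) = 1 by Gauss-Jordan elimination.

    Assumes the stationary distribution is unique; otherwise the pivot
    search below fails with StopIteration.
    """
    n = len(A)
    # Rows of (A - I); the last row is replaced by the normalization equation.
    M = [[Fraction(A[i][j]) - (1 if i == j else 0) for j in range(n)]
         for i in range(n)]
    b = [Fraction(0)] * n
    M[n - 1] = [Fraction(1)] * n
    b[n - 1] = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        b[col], b[pivot] = b[pivot], b[col]
        inv = 1 / M[col][col]
        M[col] = [x * inv for x in M[col]]
        b[col] *= inv
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
                b[r] -= f * b[col]
    return b


half = Fraction(1, 2)
A = [[0, 0, half],
     [1, 0, half],
     [0, 1, 0]]
pi = stationary(A)
print([str(x) for x in pi])
```

For this graph the solution is $\pi = [1/5,\ 2/5,\ 2/5]$.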