Comparison of Random Walk based techniques for estimating network averages. Jithin K. Sreedharan (INRIA, France), Arun Kadavankandy (INRIA, France), Konstantin Avrachenkov (INRIA, France), Vivek S. Borkar (IIT Bombay, India). CSoNet 2016, August 2.


SLIDE 1

Comparison of Random Walk based techniques for estimating network averages

Jithin K. Sreedharan, INRIA, France
Arun Kadavankandy, INRIA, France
Konstantin Avrachenkov, INRIA, France
Vivek S. Borkar, IIT Bombay, India

CSoNet 2016, August 2

SLIDE 3

Jithin K. Sreedharan (jithin.sreedharan@inria.fr)

Motivation

  • Estimation in Online Social Networks (OSNs)
  • Example: What proportion of a population supports a given political party? How young is a given social network?

Easy to answer if the graph is fully known beforehand. What if the network is not known?

  • Can only crawl the network
  • Few queries
SLIDE 11

Problem definition

Let [equation]

  • Undirected graph
  • Nodes have labels
  • Large graph

Estimate [equation]

  • Graph is unknown
  • Only local information available

Known at any time: the seed nodes and their neighbor IDs, plus the visited nodes and their neighbor IDs. Allowed operation: query (visit) a neighbor.
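
The formulas of this slide are not preserved in the transcript. Going by the talk's title (network averages) and the labelled nodes, the target is presumably the plain average of a node function over all nodes. The notation below (G, V, E, f, and μ(G)) is assumed for illustration and may differ from the slide's own symbols.

```latex
G = (V, E), \qquad f : V \to \mathbb{R}, \qquad
\mu(G) = \frac{1}{|V|} \sum_{v \in V} f(v)
```

For the political-party example, f(v) would be 1 if node v supports the party and 0 otherwise, so μ(G) is the supporting proportion.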

SLIDE 22

Random Walk based estimation

A random walk has a unique stationary distribution if the graph G is connected and non-bipartite.

  • Goal: Estimate [equation]
  • How: Ergodic theorem. For any initial distribution, [equation]

How to make [equation]? How to compare different random walks?
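
A minimal sketch of why the plain ergodic average needs correcting. It assumes a simple random walk (move to a uniformly chosen neighbor), whose stationary distribution is proportional to node degree. The toy graph `ADJ` and the labels `F` are hypothetical and not from the talk.

```python
import random

# Hypothetical toy graph: connected, and non-bipartite because of the triangle 0-1-2.
ADJ = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [3]}
# Hypothetical binary node labels, e.g. 1 = "supports the party".
F = {0: 1, 1: 0, 2: 1, 3: 0, 4: 0}


def simple_random_walk(adj, start, steps, rng):
    """Return the nodes visited by `steps` moves of a simple random walk."""
    node, visited = start, []
    for _ in range(steps):
        node = rng.choice(adj[node])  # move to a uniformly chosen neighbor
        visited.append(node)
    return visited


rng = random.Random(0)
walk = simple_random_walk(ADJ, start=4, steps=200_000, rng=rng)

# Ergodic (time) average of the labels along the walk.
plain_average = sum(F[v] for v in walk) / len(walk)

# What we want (uniform average) versus what the walk converges to (degree-weighted average).
uniform_mean = sum(F.values()) / len(F)
total_degree = sum(len(nbrs) for nbrs in ADJ.values())
degree_weighted_mean = sum(len(ADJ[v]) * F[v] for v in ADJ) / total_degree

print(f"plain walk average   : {plain_average:.3f}")
print(f"degree-weighted mean : {degree_weighted_mean:.3f}")
print(f"uniform mean (target): {uniform_mean:.3f}")
```

On this toy graph the walk's time average settles near the degree-weighted mean rather than the uniform mean, which is the gap the samplers on the following slides are designed to close.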

SLIDE 30

Respondent Driven Sampling

  • With re-weighting of the function f: [equation]

Requires knowledge of the number of nodes and the number of edges.

Estimator: [equation]
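
The estimator formula is missing from the transcript. The sketch below assumes the usual degree re-weighted ratio form for a simple random walk: divide each sampled label by the degree of the sampled node, and normalize by the sum of inverse degrees, so that the unknown numbers of nodes and edges cancel. The graph `ADJ`, labels `F`, and the function name `rds_estimate` are hypothetical.

```python
import random

# Hypothetical toy graph and labels (not from the talk).
ADJ = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [3]}
F = {0: 1, 1: 0, 2: 1, 3: 0, 4: 0}


def rds_estimate(adj, f, start, steps, rng):
    """Degree re-weighted ratio estimate of the uniform average of f (assumed form)."""
    numerator = 0.0    # running sum of f(X_t) / d(X_t)
    denominator = 0.0  # running sum of 1 / d(X_t)
    node = start
    for _ in range(steps):
        node = rng.choice(adj[node])  # simple random walk step
        degree = len(adj[node])
        numerator += f[node] / degree
        denominator += 1.0 / degree
    return numerator / denominator


rng = random.Random(1)
estimate = rds_estimate(ADJ, F, start=4, steps=200_000, rng=rng)
true_mean = sum(F.values()) / len(F)
print(f"RDS-style estimate: {estimate:.3f}, true mean: {true_mean:.3f}")
```

Only the degree of each visited node is needed, which matches the local-information setting of the problem definition.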

SLIDE 36

Metropolis Hastings Sampling

If head appears: move to j. If tail appears: stay at X_t.

For any initial distribution, [equation]
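
The proposal and the coin bias are not preserved in the transcript. The sketch assumes the standard Metropolis-Hastings walk with a uniform target: propose a uniformly chosen neighbor j of the current node X_t and accept it ("head") with probability min(1, d(X_t)/d(j)). The graph `ADJ`, labels `F`, and the function name are hypothetical.

```python
import random

# Hypothetical toy graph and labels (not from the talk).
ADJ = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [3]}
F = {0: 1, 1: 0, 2: 1, 3: 0, 4: 0}


def mh_estimate(adj, f, start, steps, rng):
    """Plain average of f along a Metropolis-Hastings walk with uniform target."""
    node = start
    total = 0.0
    for _ in range(steps):
        candidate = rng.choice(adj[node])  # propose a uniformly chosen neighbor
        accept_prob = min(1.0, len(adj[node]) / len(adj[candidate]))
        if rng.random() < accept_prob:     # "head": move to the candidate
            node = candidate
        # "tail": stay at the current node, which is sampled again
        total += f[node]
    return total / steps


rng = random.Random(2)
estimate = mh_estimate(ADJ, F, start=4, steps=200_000, rng=rng)
true_mean = sum(F.values()) / len(F)
print(f"MH estimate: {estimate:.3f}, true mean: {true_mean:.3f}")
```

Because the stationary distribution is uniform under this acceptance rule, no re-weighting is needed, at the price of repeated samples whenever a move is rejected.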

SLIDE 39

Reinforcement Learning technique

  • The graph is not necessarily connected, or it includes connected components of interest
  • Few seed nodes

[Figure: seed nodes]

SLIDE 50

Idea of tours

[Figure: graph on nodes b, a, c, d, e, f, g, i, h, k, l, m, n, p, q, r, and a second view showing nodes b, a, d, e, f, i, h, l, m, n, p, r; label: Sample]

Properties of tours:

  • Tours are independent
  • Fully distributed crawler implementation
  • The larger the super node, the shorter the tours
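
A small sketch of tours, assuming the super node is formed by merging a set of seed nodes and a tour is the stretch of a simple random walk between leaving the super node and the next return to it. The graph `ADJ` and the helper `sample_tour` are hypothetical. Each call draws a fresh tour, so tours can be collected independently (for instance by separate crawlers).

```python
import random

# Hypothetical toy graph (not from the talk).
ADJ = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [3]}


def sample_tour(adj, super_node, rng):
    """Return the non-seed nodes visited by one tour that starts and ends at the super node."""
    # Edges leaving the super node; edges between seed nodes vanish when the seeds are merged.
    exits = [v for s in sorted(super_node) for v in adj[s] if v not in super_node]
    node = rng.choice(exits)
    tour = []
    while node not in super_node:
        tour.append(node)
        node = rng.choice(adj[node])
    return tour


def mean_tour_length(adj, super_node, num_tours, rng):
    """Average number of non-seed nodes per tour over independently sampled tours."""
    return sum(len(sample_tour(adj, super_node, rng)) for _ in range(num_tours)) / num_tours


rng = random.Random(3)
mean_len_small = mean_tour_length(ADJ, {0}, 20_000, rng)     # super node = one seed
mean_len_large = mean_tour_length(ADJ, {0, 2}, 20_000, rng)  # super node = two seeds
print(f"mean tour length, super node {{0}}   : {mean_len_small:.2f}")
print(f"mean tour length, super node {{0, 2}}: {mean_len_large:.2f}")
```

On this toy graph, enlarging the super node from one seed to two roughly halves the mean tour length, in line with the last property on the slide.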

SLIDE 55

Reinforcement Learning technique (contd.)

Stochastic Approximation Algorithm

Seed set: [equation]

For each node i in the seed set: [equation]

Tours: sample 1, sample 2, …, sample k

Terms of the update: function sum inside a tour, cost function
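
The update rule itself is not preserved in the transcript. The sketch below is therefore an assumption, not the slide's algorithm: a relative-value-iteration style stochastic approximation in which every seed node i keeps a value V[i], each iteration runs one tour from every seed until the walk next hits the seed set, and the value at a reference seed tracks the long-run average of the summed function. To target the uniform average, it is run twice (on f/d and on 1/d) and the ratio is taken. All names (`tour_from`, `rl_stationary_average`, `ADJ`, `F`) and the step-size schedule are hypothetical.

```python
import random

# Hypothetical toy graph and labels (not from the talk).
ADJ = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [3]}
F = {0: 1, 1: 0, 2: 1, 3: 0, 4: 0}


def tour_from(adj, g, seed_set, start, rng):
    """Walk from seed node `start` until the walk next hits the seed set.

    Returns (sum of g over the nodes visited before that hit, including `start`;
    number of steps taken; seed node where the tour ends).
    """
    total, steps, node = 0.0, 0, start
    while True:
        total += g[node]
        steps += 1
        node = rng.choice(adj[node])
        if node in seed_set:
            return total, steps, node


def rl_stationary_average(adj, g, seeds, iterations, rng):
    """Assumed update rule: V[ref] tracks the walk's long-run average of g."""
    seeds = sorted(seeds)
    seed_set = set(seeds)
    ref = seeds[0]  # reference seed node whose value plays the role of the average
    V = {i: 0.0 for i in seeds}
    for n in range(iterations):
        step = 1.0 / (n + 10)  # decreasing step size; the offset 10 is arbitrary
        updated = dict(V)
        for i in seeds:  # one tour per seed node and iteration
            total, steps, j = tour_from(adj, g, seed_set, i, rng)
            updated[i] = V[i] + step * (total - V[ref] * steps + V[j] - V[i])
        V = updated
    return V[ref]


rng = random.Random(4)
seeds = {0, 3}
g_num = {v: F[v] / len(ADJ[v]) for v in ADJ}  # f(v) / d(v)
g_den = {v: 1.0 / len(ADJ[v]) for v in ADJ}   # 1 / d(v)
beta_num = rl_stationary_average(ADJ, g_num, seeds, 50_000, rng)
beta_den = rl_stationary_average(ADJ, g_den, seeds, 50_000, rng)
estimate = beta_num / beta_den
print(f"RL-style estimate: {estimate:.3f}, true mean: {sum(F.values()) / len(F):.3f}")
```

Every tour starts at a seed node, so no samples are discarded as burn-in, and the step-size schedule is the knob that trades speed against stability of the iterates.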

SLIDE 61

Which Random Walk method to select?

  • Mixing time: not a good criterion here, due to the burn-in period.

[Figure: sample path X1, X2, …, Xk with accepted and rejected samples marked; a burn-in period followed by the approximate stationary regime]

The Reinforcement Learning technique does not require a burn-in period.

  • Efficiency of the estimator: how many samples are needed to achieve a certain accuracy?

SLIDE 64

Asymptotic Variance

Asymptotic variance of the estimator: [equation]

Also, from the Central Limit Theorem, equivalently: [equation]
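
Both formulas are missing from the transcript. The standard definitions they most likely correspond to are written out below. The notation is assumed: the estimate from n samples is written as mu-hat with superscript (n), and the target as mu(G).

```latex
\sigma^2 \;=\; \lim_{n \to \infty} n \,\operatorname{Var}\!\big[\hat{\mu}^{(n)}\big],
\qquad
\sqrt{n}\,\big(\hat{\mu}^{(n)} - \mu(G)\big) \;\xrightarrow{\;d\;}\; \mathcal{N}\big(0, \sigma^2\big)
```

Read this way, a smaller asymptotic variance means fewer samples are needed for a given accuracy, which is the efficiency criterion raised on the previous slide.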

SLIDE 68

Asymptotic Variance (contd.)

  • For Metropolis-Hastings Sampling, [equation]
  • For Respondent Driven Sampling, [equation]
  • For Reinforcement Learning based sampling, [equation]

where [equation] is the fundamental matrix of the Markov chain.
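
The per-method expressions are not preserved. As a hedged illustration of how the fundamental matrix enters, the sketch below uses the textbook formula for the asymptotic variance of a plain time average of f along an ergodic chain with transition matrix P and stationary distribution pi: with Z = (I - P + 1 pi^T)^(-1), the variance is 2<f, Zf>_pi - <f, f>_pi - (sum_i pi_i f_i)^2. This applies directly to plain averages (as in Metropolis-Hastings sampling); ratio estimators need an extra step that is not shown. The function names are hypothetical.

```python
def mat_inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def asymptotic_variance(P, pi, f):
    """Asymptotic variance of the time average of f along the chain (P, pi)."""
    n = len(P)
    # Fundamental matrix Z = (I - P + 1 pi^T)^(-1).
    m = [[(1.0 if i == j else 0.0) - P[i][j] + pi[j] for j in range(n)]
         for i in range(n)]
    Z = mat_inverse(m)
    Zf = [sum(Z[i][j] * f[j] for j in range(n)) for i in range(n)]
    mean = sum(pi[i] * f[i] for i in range(n))
    inner_f_Zf = sum(pi[i] * f[i] * Zf[i] for i in range(n))
    inner_f_f = sum(pi[i] * f[i] ** 2 for i in range(n))
    return 2.0 * inner_f_Zf - inner_f_f - mean ** 2


# Independent samples: every row of P equals pi, so the result is the ordinary variance.
var_iid = asymptotic_variance([[0.3, 0.7], [0.3, 0.7]], [0.3, 0.7], [1.0, 0.0])

# Two-state chain with positively correlated samples: the variance is inflated.
var_chain = asymptotic_variance([[0.8, 0.2], [0.3, 0.7]], [0.6, 0.4], [1.0, 0.0])
print(var_iid, var_chain)
```

The two examples show the effect of correlation: for independent samples the formula reduces to the ordinary variance, while a sticky chain inflates it.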

SLIDE 74

Numerical Studies

Normalized Root Mean Square Error (NRMSE) vs. budget B: [equation]

Budget B: number of allowed samples.

Why MSE? [equation]
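
The NRMSE formula and the answer to "Why MSE?" are missing from the transcript. The sketch assumes the common definition (root of the mean square error divided by the true value) and illustrates one likely reason for using MSE: it accounts for both the bias and the variance of an estimator. The sample estimates are made up for illustration.

```python
import math


def mse(estimates, true_value):
    """Mean square error of repeated estimates around the true value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


def nrmse(estimates, true_value):
    """Root mean square error normalized by the true value (assumed definition)."""
    return math.sqrt(mse(estimates, true_value)) / true_value


def bias_and_variance(estimates, true_value):
    """Empirical bias and (population) variance of the estimates."""
    mean = sum(estimates) / len(estimates)
    variance = sum((e - mean) ** 2 for e in estimates) / len(estimates)
    return mean - true_value, variance


# Made-up estimates from four independent runs with the same budget B.
estimates = [0.5, 0.3, 0.4, 0.6]
true_value = 0.4

error = mse(estimates, true_value)
bias, variance = bias_and_variance(estimates, true_value)
print(f"NRMSE = {nrmse(estimates, true_value):.3f}")
print(f"MSE = {error:.4f} = bias^2 + variance = {bias ** 2 + variance:.4f}")
```

In an experiment of this kind, the list of estimates would come from repeated runs of one sampler at a fixed budget B, giving one NRMSE point per budget.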

SLIDE 78

Les Misérables network

Number of nodes: 77, number of edges: 254.

SLIDE 81

Les Misérables network (contd.)

SLIDE 86

Les Misérables network (contd.)

Study of asymptotic variance

SLIDE 90

Friendster network

Number of nodes: ~65K, number of edges: ~1.25M.

SLIDE 95

Friendster network (contd.)

Stability of sample paths (single path example):

  • Varying super-node size
  • Varying step size

SLIDE 100

Conclusions

  • Random Walk based estimators of [equation]
  • Numerical and theoretical study of Mean Square Error & Asymptotic Variance of
      - Metropolis-Hastings sampling
      - Respondent Driven sampling (RDS)
      - New Reinforcement Learning based sampling (RL)
  • Reinforcement Learning technique:
      - Tackles disconnected graphs
      - A cross between deterministic iteration and MCMC
      - Can control the stability of the algorithm with step sizes
  • RDS works better. The RL technique is comparable, yet more stable and needs no burn-in!
SLIDE 101

Thank you!

http://bit.do/Jithin