Distribution and Dependence of Extremes in Network Sampling



SLIDE 1

Distribution and Dependence of Extremes in Network Sampling Processes

Jithin K. Sreedharan* with Konstantine Avrachenkov* and Natalia M. Markovich†

*INRIA Sophia Antipolis, France

†Institute of Control Sciences, Russian Academy of Sciences, Moscow

March 30, 2015

SLIDE 2

Random Sampling

No complete picture a priori!

All we have: $X_1, X_2, \ldots, X_n$

Samples: any stationary (most likely dependent) sequence, e.g. node IDs, degrees, number of followers, or income of the nodes in an OSN

SLIDE 3
Correlations in Graphs and Sampling

  • Correlations in graph properties exist in real networks, e.g. correlation in a coauthorship network
  • Usually neglected in the analysis of sampling algorithms

Effect of neglecting correlations:

  • Assuming i.i.d. degrees, largest degree ≈ $K N^{1/\gamma}$, where $N$ is the number of nodes and $\gamma$ is the tail index of the Pareto distribution (N. Litvak et al., LNCS'12)
  • Twitter graph (2012): $N$ = 537M, $\gamma$ = 1.124 for out-degree.
  • Largest out-degree predicted is 59M. Actual largest out-degree is 𝟑𝟑M!
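As a rough check of the i.i.d. prediction on this slide, the following sketch evaluates $K N^{1/\gamma}$ with the quoted Twitter figures. The slide does not give the value of $K$, so `scale = 1.0` is an assumption, and the function name `iid_max_degree` is made up for illustration.

```python
def iid_max_degree(num_nodes: float, tail_index: float, scale: float = 1.0) -> float:
    """Order-of-magnitude estimate K * N**(1/gamma) of the largest degree
    when degrees are i.i.d. with a Pareto tail.

    `scale` plays the role of K; its default of 1.0 is an assumption.
    """
    if num_nodes <= 0 or tail_index <= 0:
        raise ValueError("num_nodes and tail_index must be positive")
    return scale * num_nodes ** (1.0 / tail_index)


# Figures quoted on the slide for the 2012 Twitter graph (out-degree).
twitter_prediction = iid_max_degree(num_nodes=537e6, tail_index=1.124)

if __name__ == "__main__":
    print(f"predicted largest out-degree: {twitter_prediction / 1e6:.1f}M")
```

With $K = 1$ the result is of the same order as the 59M prediction quoted on the slide.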

SLIDE 4

Questions We Address Here…

  • First passage time
  • Statistical properties of clusters
  • Kth largest value of samples
  • and many more extremal properties

Is there a simple way to get information about many extremal properties?

Answer: Extremal Index

SLIDE 5

Relation to Extreme Value Theory

Extremal Index ($\theta$):

Point Process: point process of exceedances → compound Poisson process (rate $\theta\tau$); tendency to form clusters
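For reference, the usual textbook definition of the extremal index (as in Leadbetter, Lindgren and Rootzén) can be written as follows. The symbols $F$, $M_n$ and $u_n(\tau)$ are notation assumed here, not taken from the slide.

```latex
% Stationary sequence X_1, X_2, \ldots with marginal distribution F,
% and M_n = \max(X_1, \ldots, X_n).
% For each \tau > 0 choose levels u_n(\tau) such that
\lim_{n\to\infty} n\,\bigl(1 - F(u_n(\tau))\bigr) = \tau .
% The sequence has extremal index \theta \in [0, 1] if
\lim_{n\to\infty} P\bigl(M_n \le u_n(\tau)\bigr) = e^{-\theta\tau} .
% \theta = 1 gives the same limit e^{-\tau} as in the i.i.d. case;
% smaller \theta means stronger clustering of exceedances.
```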

SLIDE 6

Extremal Index: Applications

Gives maxima of the degree sequence with a certain probability.

Pareto case revisited:

  • i.i.d. degrees: largest degree ≈ $K N^{1/\gamma}$, where $N$ is the number of nodes and $\gamma$ is the tail index of the Pareto distribution (N. Litvak et al., LNCS'12)
  • Stationary degree samples with EI: largest degree ≈ $K (N\theta)^{1/\gamma}$
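To see how the extremal index changes the Pareto prediction, here is a small sketch comparing the two approximations on this slide. The values `theta = 0.5` and `scale = 1.0` (the constant $K$) are made-up illustration values, not estimates for any real graph, and the function names are hypothetical.

```python
def max_degree_with_ei(num_nodes: float, tail_index: float,
                       theta: float = 1.0, scale: float = 1.0) -> float:
    """Approximate largest degree K * (N * theta)**(1/gamma).

    theta = 1.0 reduces to the i.i.d. approximation K * N**(1/gamma).
    """
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must lie in (0, 1]")
    if num_nodes <= 0 or tail_index <= 0:
        raise ValueError("num_nodes and tail_index must be positive")
    return scale * (num_nodes * theta) ** (1.0 / tail_index)


def shrink_factor(theta: float, tail_index: float) -> float:
    """Ratio of the EI-adjusted prediction to the i.i.d. one: theta**(1/gamma)."""
    return theta ** (1.0 / tail_index)


# Illustration only: theta = 0.5 is an assumed value.
iid_value = max_degree_with_ei(537e6, 1.124, theta=1.0)
dependent_value = max_degree_with_ei(537e6, 1.124, theta=0.5)
```

The ratio `dependent_value / iid_value` equals `shrink_factor(0.5, 1.124)`, so dependence lowers the predicted maximum by the factor $\theta^{1/\gamma}$ regardless of $N$ and $K$.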
SLIDE 7

Extremal Index: Applications

First passage time:

  • The lower the value of EI, the more time it takes to hit extreme levels
  • e.g. Pareto
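One standard way to make the first passage statement precise is sketched below. $T_n$ (the first time the samples exceed the level $u_n$), $M_k$ (the maximum of the first $k$ samples) and $\bar F = 1 - F$ are notation assumed here; the levels $u_n$ are chosen so that $n \bar F(u_n) \to \tau$, as in the usual definition of the extremal index.

```latex
% T_n = \min\{k \ge 1 : X_k > u_n\}, so that \{T_n > k\} = \{M_k \le u_n\}.
\lim_{n\to\infty} P\!\left(\frac{T_n}{n} > x\right)
  = \lim_{n\to\infty} P\bigl(M_{\lfloor nx \rfloor} \le u_n\bigr)
  = e^{-\theta\tau x}, \qquad x > 0 .
% Heuristically, E[T_n] \approx \frac{n}{\theta\tau} \approx \frac{1}{\theta\,\bar F(u_n)},
% so the time to reach a high level scales like 1/\theta.
```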

SLIDE 8

Extremal Index: Applications

Relation to Mean Cluster Size:
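The relation usually quoted in the extreme value literature is that the extremal index is the reciprocal of the limiting mean cluster size. A sketch, with $\pi_n(j)$ (the probability that a cluster of exceedances of $u_n$ has size $j$) as notation assumed here:

```latex
% Holds under additional mild conditions on the cluster size distribution.
\theta^{-1} = \lim_{n\to\infty} \sum_{j \ge 1} j\,\pi_n(j)
```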

SLIDE 9

Calculation of Extremal Index

Two mixing conditions on the samples:

  • Cond-1: limits long-range dependence. Stationary Markov samples, or measurable functions of them, satisfy this.
  • Cond-2:

SLIDE 10

Proposition

If the sampled sequence is stationary and satisfies the mixing conditions, then the Extremal Index satisfies $0 \le \theta \le 1$ and
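A sketch of how a bivariate copula can determine the extremal index, under the assumption that the two mixing conditions are the standard $D(u_n)$ and $D''(u_n)$ conditions (the transcript does not name them). $C$ denotes the copula of two consecutive samples $(X_1, X_2)$ and $F$ their common continuous marginal; both symbols are notation assumed here.

```latex
% Under D(u_n) and D''(u_n) (Leadbetter and Nandagopalan):
\theta = \lim_{n\to\infty} P\bigl(X_2 \le u_n \mid X_1 > u_n\bigr).
% With x = F(u_n) \to 1 and the diagonal section \delta(x) = C(x, x):
P\bigl(X_2 \le u_n \mid X_1 > u_n\bigr)
  = \frac{x - \delta(x)}{1 - x}
  = \frac{1 - \delta(x)}{1 - x} - 1 .
% If \delta has a left derivative at x = 1, letting x \to 1 gives
\theta = \delta'(1) - 1 .
% Checks: C(u, v) = uv gives \delta'(1) = 2 and \theta = 1;
%         C(u, v) = \min(u, v) gives \delta'(1) = 1 and \theta = 0.
```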

SLIDE 11

Degree Correlations

  • Undirected and correlated
  • is enough to construct graph
  • Crawling via Random Walks on vertices
  • Degree sequence is a Hidden Markov chain
  • What is the joint stationary distribution on degree state space?
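The crawling step can be illustrated with a simple random walk that records the degree of each visited vertex. This is a minimal sketch on a hand-made toy graph; the adjacency list, the function name `random_walk_degrees`, and the seed are all hypothetical and not from the slides.

```python
import random
from collections import Counter


def random_walk_degrees(adjacency, start, num_steps, seed=0):
    """Run a simple random walk on an undirected graph and return
    the degree of the vertex visited at each step."""
    rng = random.Random(seed)
    current = start
    degrees = []
    for _ in range(num_steps):
        degrees.append(len(adjacency[current]))
        # Move to a uniformly chosen neighbour of the current vertex.
        current = rng.choice(adjacency[current])
    return degrees


# Toy undirected graph (symmetric adjacency lists), for illustration only.
toy_graph = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3],
    3: [0, 2, 4],
    4: [3],
}

degree_samples = random_walk_degrees(toy_graph, start=0, num_steps=10_000)

# Empirical counts of consecutive degree pairs (d_t, d_{t+1}).
pair_counts = Counter(zip(degree_samples, degree_samples[1:]))
```

In the stationary regime a simple random walk on an undirected graph traverses every directed edge equally often, so `pair_counts` approximates the joint distribution of the degrees at the two ends of a uniformly chosen edge.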
SLIDE 12

Meanfield Models

  • Standard Random Walk
  • PageRank
  • Random Walk with Jumps (RWJ)

SLIDE 13

Check of Meanfield Model in Random Walks

SLIDE 14

Extremal Index for Bivariate Pareto Model

SLIDE 15

Estimation of Extremal Index

Empirical copula based estimator:

  • EI: slope at (1, 1)
  • Linear least squares fitting & numerical differentiation

Intervals Estimator: Based on
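As an illustration of the second estimator, here is a minimal sketch of the intervals estimator in the form given by Ferro and Segers (2003), which works from the gaps between successive exceedances of a threshold. That this is the exact variant used in the talk is an assumption; the function name and the toy data are hypothetical.

```python
def intervals_estimator(samples, threshold):
    """Ferro-Segers intervals estimate of the extremal index,
    computed from the times between exceedances of `threshold`."""
    exceed_idx = [i for i, x in enumerate(samples) if x > threshold]
    if len(exceed_idx) < 2:
        raise ValueError("need at least two exceedances of the threshold")
    # Inter-exceedance times T_1, ..., T_{N-1}.
    gaps = [b - a for a, b in zip(exceed_idx, exceed_idx[1:])]
    count = len(gaps)
    if max(gaps) <= 2:
        numerator = 2.0 * sum(gaps) ** 2
        denominator = count * sum(g * g for g in gaps)
    else:
        numerator = 2.0 * sum(g - 1 for g in gaps) ** 2
        denominator = count * sum((g - 1) * (g - 2) for g in gaps)
    # The estimate is capped at 1, the largest admissible extremal index.
    return min(1.0, numerator / denominator)


# Toy sequence: exceedances of level 1 at positions 0, 1, 2, 9, 10, 19.
toy_samples = [5, 5, 5] + [0] * 6 + [5, 5] + [0] * 8 + [5]
toy_estimate = intervals_estimator(toy_samples, threshold=1)
```

For the toy sequence the gaps are 1, 1, 7, 1, 9, so the second branch applies and the estimate is 2·14² / (5·86) = 392/430. In practice the threshold is usually a high empirical quantile of the samples.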

SLIDE 16

Numerical Results: Synthetic Graphs

Synthetic graph (5K nodes):

  • EI, analysis: 0.56
  • EI, copula based estimator: 0.53
  • EI, intervals estimator: 0.58

[Figures: copula based estimator; intervals estimator]

SLIDE 17

Numerical Results: Real Graphs

  • DBLP (32K nodes, 1.1M edges): EI 0.29 (copula based estimator), 0.25 (intervals estimator)
  • Enron Email (37K nodes, 368K edges): EI 0.61 (copula based estimator), 0.62 (intervals estimator)

SLIDE 18
Conclusions

  • Associated Extreme Value Theory of stationary sequences to the sampling of large graphs
  • For any general stationary samples meeting two mixing conditions, knowledge of the bivariate distribution or bivariate copula is sufficient to derive many extremal properties
  • Extremal Index (EI) encapsulates this relation
  • Applications of EI to many relevant extremes:
      • First hitting time
      • Order statistics
      • Mean cluster size
  • Modeled correlation in degrees of adjacent nodes and random walk in degree state space
  • Estimated EI for a synthetic graph with degree correlations and found a good match with theory
  • Estimated EI for two real-world networks

SLIDE 19

Thank You!