Distribution and Dependence of Extremes in Network Sampling

Distribution and Dependence of Extremes in Network Sampling Processes. Jithin K. Sreedharan* with Konstantine Avrachenkov* and Natalia M. Markovich† *INRIA Sophia Antipolis, France †Institute of Control Sciences, Russian Academy of Sciences,

SLIDE 1

Distribution and Dependence of Extremes in Network Sampling Processes

Jithin K. Sreedharan* with Konstantine Avrachenkov* and Natalia M. Markovich†

*INRIA Sophia Antipolis, France

†Institute of Control Sciences, Russian Academy of Sciences, Moscow

March 30, 2015

SLIDE 2

Random Sampling

No complete picture a priori! All we have: $X_1, X_2, \ldots, X_n$
Samples: any stationary (most likely dependent) sequence, e.g. node IDs, degrees, number of followers, or income of the nodes in an OSN.

SLIDE 3

Correlations in Graphs and Sampling

Correlations in graph properties exist in real networks, e.g. correlation in a coauthorship network.
Usually neglected in the analysis of sampling algorithms.
Effect of neglecting correlations:
Assuming i.i.d. degrees, largest degree ≈ $K N^{1/\gamma}$, where $N$ is the number of nodes and $\gamma$ is the tail index of the Pareto distribution (N. Litvak et al., LNCS'12).
Twitter graph (2012): $N$ = 537M, $\gamma$ = 1.124 for out-degree.
Largest out-degree predicted is 59M. Actual largest out-degree is 22M!

SLIDE 4

Questions We Address Here…

First passage time
Statistical properties of clusters
Kth largest value of samples
and many more extremal properties

Is there a simple way to get information about many extremal properties? Ans: Extremal Index

SLIDE 5

Relation to Extreme Value Theory

Extremal Index ($\theta$):
Tendency to form clusters
Point Process
Point process of exceedances → compound Poisson process (rate $\theta\tau$)
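For reference, the usual textbook definition of the extremal index is sketched below. The notation ($F$ for the marginal distribution function, $M_n$ for the running maximum, $u_n(\tau)$ for the threshold sequence) is an assumption and may differ from the formula shown on the original slide.

```latex
% Stationary sequence X_1, X_2, ... with marginal distribution function F
M_n = \max(X_1, \ldots, X_n), \qquad
n\,\bigl(1 - F(u_n(\tau))\bigr) \to \tau, \qquad
P\bigl(M_n \le u_n(\tau)\bigr) \to e^{-\theta\tau}, \qquad 0 \le \theta \le 1 .
% For i.i.d. samples the limit is e^{-\tau}, i.e. \theta = 1.
```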

SLIDE 6

Extremal Index: Applications

Gives maxima of the degree sequence with certain probability.
Pareto case revisited:
i.i.d. degrees: largest degree ≈ $K N^{1/\gamma}$, where $N$ is the number of nodes and $\gamma$ is the tail index of the Pareto distribution (N. Litvak et al., LNCS'12)
Stationary degree samples with EI: largest degree ≈ $K (N\theta)^{1/\gamma}$
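The two approximations on this slide can be compared numerically. The sketch below is only an illustration: the constant K is set to 1 as an assumption (the deck does not give its value), the function name is hypothetical, and the extremal index values in the loop are arbitrary, not estimates for Twitter. It writes the tail index as `gamma` and the extremal index as `theta`.

```python
def largest_degree_estimate(n_nodes, gamma, theta=1.0, k=1.0):
    """Approximate largest degree as k * (n_nodes * theta) ** (1 / gamma).

    theta=1.0 recovers the i.i.d. formula; k=1.0 is an assumed constant.
    """
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must lie in (0, 1]")
    return k * (n_nodes * theta) ** (1.0 / gamma)


if __name__ == "__main__":
    n_nodes, gamma = 537e6, 1.124  # Twitter figures quoted earlier in the deck
    for theta in (1.0, 0.6, 0.3):  # illustrative values only
        estimate = largest_degree_estimate(n_nodes, gamma, theta)
        print(f"theta={theta:.1f}: largest degree ~ {estimate / 1e6:.1f}M")
```

With K fixed, an extremal index below 1 shrinks the predicted maximum by the factor θ^(1/γ), which is the direction needed to explain an i.i.d. prediction that overshoots the observed maximum.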

SLIDE 7

Extremal Index: Applications

First passage time:
The lower the value of EI, the more time it takes to hit extreme levels.
e.g. Pareto
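The effect of the extremal index on first passage times can be illustrated with a small simulation. The sketch below does not use the deck's Pareto example. It swaps in the max-autoregressive (ARMAX) process, a textbook stationary sequence with unit Fréchet marginals whose extremal index is known to be 1 − alpha. All function names and parameter values are assumptions chosen for illustration.

```python
import random


def unit_frechet(rng):
    """Draw a unit Frechet variate as the reciprocal of a positive exponential."""
    e = 0.0
    while e <= 0.0:  # guard against a zero draw before dividing
        e = rng.expovariate(1.0)
    return 1.0 / e


def armax_first_passage(alpha, level, rng):
    """Number of steps until a stationary ARMAX path first exceeds `level`."""
    x = unit_frechet(rng)  # start from the stationary marginal
    steps = 1
    while x <= level:
        # X_t = max(alpha * X_{t-1}, (1 - alpha) * Z_t), Z_t unit Frechet
        x = max(alpha * x, (1.0 - alpha) * unit_frechet(rng))
        steps += 1
    return steps


def mean_first_passage(alpha, level, runs=2000, seed=0):
    """Monte Carlo average of the first passage time over `runs` paths."""
    rng = random.Random(seed)
    total = sum(armax_first_passage(alpha, level, rng) for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    for alpha in (0.0, 0.5):  # extremal index 1 - alpha = 1.0 and 0.5
        print(alpha, mean_first_passage(alpha, level=50.0))
```

With alpha = 0 the samples are i.i.d. With alpha = 0.5 the marginal distribution is unchanged but exceedances cluster, so the same level takes roughly twice as long to reach on average.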

SLIDE 8

Extremal Index: Applications

Relation to Mean Cluster Size:
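A standard result in extreme value theory is that the extremal index is the reciprocal of the limiting mean cluster size of exceedances. The sketch below illustrates that relation with a simple runs rule for declustering. The runs rule, the function name, and the `run_length` parameter are assumptions introduced here, not the estimator used in the deck.

```python
def runs_cluster_summary(samples, threshold, run_length):
    """Return (extremal index estimate, mean cluster size) under a runs rule.

    Two exceedances belong to the same cluster unless they are separated by
    at least `run_length` consecutive non-exceedances.
    """
    exceed_times = [i for i, x in enumerate(samples) if x > threshold]
    if not exceed_times:
        raise ValueError("no sample exceeds the threshold")
    clusters = 1
    for prev, cur in zip(exceed_times, exceed_times[1:]):
        if cur - prev > run_length:  # gap holds >= run_length non-exceedances
            clusters += 1
    mean_cluster_size = len(exceed_times) / clusters
    return clusters / len(exceed_times), mean_cluster_size


if __name__ == "__main__":
    toy = [0, 5, 6, 0, 0, 0, 7, 0, 0, 0, 8, 9, 9]
    print(runs_cluster_summary(toy, threshold=4, run_length=2))
```

The two returned numbers are reciprocals of each other by construction, which is the finite-sample analogue of the relation between EI and mean cluster size.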

SLIDE 9

Calculation of Extremal Index

Two mixing conditions on the samples:
Cond-1: Limits long-range dependence. Stationary Markov samples or their measurable functions satisfy this.
Cond-2:

SLIDE 10

Proposition

If the sampled sequence is stationary and satisfies the mixing conditions, then the Extremal Index

$0 \le \theta \le 1$ and
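The displayed formula of the proposition is not part of this transcript. A form consistent with the "slope at (1, 1)" remark on the estimation slide, given here as an assumption and not as the authors' exact statement, expresses the extremal index (written $\theta$) through the copula $C$ of two consecutive samples with a continuous marginal distribution:

```latex
\theta \;=\; \lim_{n\to\infty} P\bigl(X_2 \le u_n \mid X_1 > u_n\bigr)
       \;=\; \lim_{x \to 1^-} \frac{x - C(x,x)}{1 - x}
       \;=\; C'(1,1) - 1,
\qquad C'(1,1) := \frac{d}{dx}\, C(x,x)\Big|_{x = 1^-}
% Sanity checks: independence, C(x,x) = x^2, gives \theta = 1;
% complete dependence, C(x,x) = x, gives \theta = 0.
```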

SLIDE 11

Degree Correlations

Undirected and correlated
is enough to construct graph
Crawling via Random Walks on vertices
Degree sequence is a Hidden Markov chain
What is the joint stationary distribution on degree state space?
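For the standard random walk on a connected undirected graph, the stationary chain traverses every directed edge equally often, so the joint stationary distribution of consecutive degrees equals the joint degree distribution over edge endpoints. The sketch below computes it for a toy graph. The adjacency-dictionary format, the function name, and the toy graph are assumptions made for illustration, and the result applies to the standard random walk only.

```python
from collections import Counter


def edge_degree_joint(adjacency):
    """Joint distribution of (degree of i, degree of j) over directed edges i -> j."""
    counts = Counter()
    for node, neighbours in adjacency.items():
        for other in neighbours:
            counts[(len(neighbours), len(adjacency[other]))] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


if __name__ == "__main__":
    # Toy undirected graph with edges 0-1, 0-2, 0-3, 1-2, 2-3.
    toy_graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
    for pair, prob in sorted(edge_degree_joint(toy_graph).items()):
        print(pair, prob)
```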

SLIDE 12

Meanfield Models

Standard Random Walk
PageRank
Random Walk with Jumps (RWJ)
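The three walks can be sketched as one-step transition rules on the vertices. This is a minimal illustration under stated assumptions: the graph is an adjacency dictionary, the PageRank walk follows a uniformly chosen neighbour with probability `damping` and otherwise jumps to a uniformly chosen node, and RWJ jumps to a uniformly chosen node with probability alpha / (degree + alpha). Function names and default parameter values are hypothetical, and the meanfield formulas in degree space from the slide are not reproduced.

```python
import random


def srw_step(adjacency, node, rng):
    """Standard random walk: move to a uniformly chosen neighbour."""
    return rng.choice(adjacency[node])


def pagerank_step(adjacency, node, rng, damping=0.85):
    """PageRank walk: follow a link w.p. `damping`, else jump uniformly."""
    if rng.random() < damping:
        return rng.choice(adjacency[node])
    return rng.choice(list(adjacency))


def rwj_step(adjacency, node, rng, alpha=1.0):
    """Random walk with jumps: jump uniformly w.p. alpha / (degree + alpha)."""
    degree = len(adjacency[node])
    if rng.random() < alpha / (degree + alpha):
        return rng.choice(list(adjacency))
    return rng.choice(adjacency[node])


def sample_degrees(step, adjacency, start, steps, seed=0, **params):
    """Run a walk with the given step rule and record the visited degrees."""
    rng = random.Random(seed)
    node, degrees = start, []
    for _ in range(steps):
        degrees.append(len(adjacency[node]))
        node = step(adjacency, node, rng, **params)
    return degrees


if __name__ == "__main__":
    toy_graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
    print(sample_degrees(rwj_step, toy_graph, start=0, steps=10, alpha=0.5))
```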

SLIDE 13

Check of Meanfield Model in Random Walks

SLIDE 14

Extremal Index for Bivariate Pareto Model

SLIDE 15

Estimation of Extremal Index

Empirical Copula based estimator: EI from the slope at (1, 1); linear least squares fitting and numerical differentiation
Intervals Estimator: Based on
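Both estimators can be sketched in a few lines. The first function fits a least squares line to the diagonal of the empirical copula of consecutive samples near 1 and subtracts 1 from the slope. The fitting window is an assumption, and because the fit is a secant it tends to underestimate the slope at exactly 1. The second function is written in the form of the intervals estimator of Ferro and Segers (2003), which uses the times between threshold exceedances. Whether the deck uses exactly this form is an assumption, and both function names are hypothetical.

```python
from bisect import bisect_right


def copula_slope_ei(samples, grid=None):
    """EI estimate: (slope of the empirical copula diagonal near u = 1) - 1."""
    n = len(samples)
    ordered = sorted(samples)
    # Pseudo-observations: empirical distribution function at each sample.
    u_vals = [bisect_right(ordered, x) / n for x in samples]
    pairs = list(zip(u_vals, u_vals[1:]))  # consecutive samples
    if grid is None:
        grid = [0.90 + 0.01 * i for i in range(10)]  # assumed fitting window
    diag = [sum(1 for a, b in pairs if a <= u and b <= u) / len(pairs)
            for u in grid]
    # Ordinary least squares slope of diag against grid.
    mean_u = sum(grid) / len(grid)
    mean_c = sum(diag) / len(diag)
    sxy = sum((u - mean_u) * (c - mean_c) for u, c in zip(grid, diag))
    sxx = sum((u - mean_u) ** 2 for u in grid)
    return min(1.0, max(0.0, sxy / sxx - 1.0))


def intervals_ei(samples, threshold):
    """Intervals estimator from the gaps between successive exceedance times."""
    times = [i for i, x in enumerate(samples) if x > threshold]
    gaps = [b - a for a, b in zip(times, times[1:])]
    if not gaps:
        raise ValueError("need at least two exceedances")
    m = len(gaps)
    if max(gaps) <= 2:
        num = 2 * sum(gaps) ** 2
        den = m * sum(g * g for g in gaps)
    else:
        num = 2 * sum(g - 1 for g in gaps) ** 2
        den = m * sum((g - 1) * (g - 2) for g in gaps)
    return min(1.0, num / den)
```

In practice the threshold for the intervals estimator and the fitting window for the copula slope both have to be tuned, and the estimates are usually reported over a range of choices.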

SLIDE 16

Numerical Results: Synthetic Graphs

Graph                        EI (analysis)   EI (copula-based estimator)   EI (intervals estimator)
Synthetic graph (5K nodes)   0.56            0.53                          0.58

[Figure panels: copula-based estimator; intervals estimator]

SLIDE 17

Numerical Results: Real Graphs

Graph                                  EI (copula-based estimator)   EI (intervals estimator)
DBLP (32K nodes, 1.1M edges)           0.29                          0.25
Enron Email (37K nodes, 368K edges)    0.61                          0.62

SLIDE 18

Conclusions

Associated Extreme Value Theory of stationary sequences with the sampling of large graphs.
For any general stationary samples meeting two mixing conditions, knowledge of the bivariate distribution or bivariate copula is sufficient to derive many extremal properties.
Extremal Index (EI) encapsulates this relation.
Applications of EI to many relevant extremes:
First hitting time
Order statistics
Mean cluster size
Modeled correlation in degrees of adjacent nodes and random walk in degree state space.
Estimated EI for a synthetic graph with degree correlations and found a good match with theory.
Estimated EI for two real-world networks.

SLIDE 19

Thank You!