Estimating and Sampling Graphs with Multidimensional Random Walks - PowerPoint PPT Presentation

Estimating and Sampling Graphs with Multidimensional Random Walks Group 2: Mingyan Zhao, Chengle Zhang, Biao Yin, Yuchen Liu

Motivation Complex Network Social Network Biological Network Reference: www.forbe.com imdevsoftware.wordpress.com

Existing Approaches ´ Random vertex sampling ´ Random edge sampling ´ Random walks

Frontier Sampling ´ A new m-dimensional random walk that uses m dependent random walkers.

Contribution ´ Mitigates the large estimation errors caused by disconnected or loosely connected components. ´ Shows that the tail of the degree distribution is better estimated using random edge sampling than random vertex sampling. ´ Presents asymptotically unbiased estimators

Definitions
Notation : Meaning
G_d = (V, E_d) : A labeled directed graph representing the (original) network graph, where V is a set of vertices and E_d is a set of edges
(u, v) : A connection from u to v (a.k.a. an edge)
L_v and L_e : Finite sets of vertex and edge labels
L_e(u, v) = ∅ : Edge (u, v) is unlabeled
L_v(v) = ∅ : Vertex v is unlabeled

Vertex vs. Edge Sampling

Section 4 1. Mathematical theory and derivations for random walk sampling 2. Strong Law of Large Numbers 3. Four estimators that will be applied in Section 5 4. Deficiency of RW 5. Multiple independent random walkers

Strong Law of Large Numbers
B: number of RW steps (total RW sampled edges)
B*(B): number of edges in E*
Shown alongside the original statistical theorem and the weak law.
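
For reference, the random-walk form of the strong law that the estimators rely on can be sketched as follows. The symbols f and (u_i, v_i) are notation assumed here rather than recovered from the slide: (u_i, v_i) is the i-th edge traversed by the walk, f is any real-valued function on edges, and E counts each undirected edge once per direction. The statement assumes a finite, connected, undirected graph.

```latex
\lim_{B \to \infty} \frac{1}{B} \sum_{i=1}^{B} f(u_i, v_i)
  = \frac{1}{|E|} \sum_{(u,v) \in E} f(u, v)
  \quad \text{almost surely.}
```

In words: time averages along one long walk converge to uniform averages over edges, regardless of where the walk starts.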

Estimator 1: Edge Label Density First, label the edges of interest; the quantity to estimate is the probability of the labelled edges. The estimator is based on the SLLN.
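
A minimal sketch of this estimator, assuming a small undirected toy graph stored as an adjacency dict: the fraction of random-walk-sampled edges that carry the label of interest. The names `random_walk_edges`, `edge_label_density`, `adj`, and `has_label` are hypothetical and chosen for illustration only.

```python
import random

def random_walk_edges(adj, start, steps, rng):
    """Return the edges (u, v) traversed by a simple random walk."""
    edges = []
    u = start
    for _ in range(steps):
        v = rng.choice(adj[u])  # move to a uniformly chosen neighbor
        edges.append((u, v))
        u = v
    return edges

def edge_label_density(edges, has_label):
    """Fraction of sampled edges for which has_label(u, v) is true."""
    return sum(1 for u, v in edges if has_label(u, v)) / len(edges)

# Toy graph (an assumption): triangle 0-1-2 plus a tail 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rng = random.Random(0)
edges = random_walk_edges(adj, start=0, steps=20000, rng=rng)

# Label of interest: edges that touch vertex 3.
# 2 of the 8 directed edges qualify, so the true density is 0.25.
est = edge_label_density(edges, lambda u, v: 3 in (u, v))
print(est)
```

Because a stationary random walk samples edges uniformly, no reweighting is needed for edge labels.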

Estimator 2: Assortative Mixing Coefficient (AMC) Considering directed G, an asymptotically unbiased estimator of AMC: Covariance Which two are highly correlated?

Estimator 3: Vertex Label Density Construct an asymptotically unbiased estimator: Since: So, this estimator converges to:
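
The slide's formulas did not survive, so the following is only a sketch of the standard degree-reweighting idea, not a reconstruction of the slide. A stationary random walk visits vertex v with probability proportional to deg(v), so each visit is weighted by 1/deg(v) and the weights are normalized. The function names and the toy graph are assumptions.

```python
import random

def random_walk_vertices(adj, start, steps, rng):
    """Return the vertices visited by a simple random walk."""
    visited = []
    u = start
    for _ in range(steps):
        u = rng.choice(adj[u])
        visited.append(u)
    return visited

def vertex_label_density(visited, adj, has_label):
    """Ratio estimator: weight each visit by 1/deg(v) to undo degree bias."""
    num = sum(1.0 / len(adj[v]) for v in visited if has_label(v))
    den = sum(1.0 / len(adj[v]) for v in visited)
    return num / den

# Toy graph (an assumption): triangle 0-1-2 plus a tail 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rng = random.Random(1)
visited = random_walk_vertices(adj, start=0, steps=20000, rng=rng)

# Label of interest: "is vertex 3". One of four vertices, so the truth is 0.25.
est = vertex_label_density(visited, adj, lambda v: v == 3)
# The unweighted visit frequency is degree-biased: it tends to deg(3)/8 = 0.125.
naive = sum(1 for v in visited if v == 3) / len(visited)
print(est, naive)
```

Comparing `est` with `naive` shows why the reweighting matters: low-degree vertices are under-visited by the walk.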

Estimator 4: Global Clustering Coefficient The unbiased estimator by SLLN Since:

Deficiency of RW from one point
1. The walker can get "trapped" inside a subgraph (raises MSE)
2. The walk starts from a non-stationary (non-steady) state (raises MSE and bias)
Burn-in period: discard the non-stationary samples
1. It only reduces the error caused by the non-stationary start
2. Discarding samples from a small sample is not ideal
This motivates multiple independent random walkers.

Single RW and Multiple RW The error of a single RW depends on the sample size, as the estimators show. MultipleRW splits the sample size across its walkers, so each path is shorter and the total error (CNMSE) is still very high. Log-log plot.

Why not m independent RWs? MultipleRW is hard to start in steady state: it needs m independent vertices sampled with probability proportional to their degrees.

Section 5 Motivation ´ We want an m-dimensional random walk that, in steady state, samples edges uniformly at random but, unlike MultipleRW, can benefit from starting its walkers at uniformly sampled vertices.

Frontier Sampling $p = \frac{\deg(u)}{\sum_{v \in L} \deg(v)} \cdot \frac{1}{\deg(u)} = \frac{1}{\sum_{v \in L} \deg(v)}$
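
The selection rule on this slide can be sketched in code: pick a walker with probability proportional to the degree of its current vertex, then pick one of that vertex's edges uniformly, so every edge incident to the frontier is equally likely. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `frontier_sampling`, `adj`, `budget`, and `rng` are hypothetical, and starting walkers are drawn uniformly with replacement.

```python
import random

def frontier_sampling(adj, m, budget, rng):
    """Sample `budget` edges using m dependent walkers."""
    vertices = list(adj)
    # Assumption: walkers start at uniformly sampled vertices, with replacement.
    frontier = [rng.choice(vertices) for _ in range(m)]
    sampled = []
    for _ in range(budget):
        # Choose a walker with probability proportional to its vertex degree.
        weights = [len(adj[u]) for u in frontier]
        i = rng.choices(range(m), weights=weights, k=1)[0]
        u = frontier[i]
        # Choose one of u's edges uniformly, record it, and move that walker.
        v = rng.choice(adj[u])
        sampled.append((u, v))
        frontier[i] = v
    return sampled

# Toy graph (an assumption): triangle 0-1-2 plus a tail 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rng = random.Random(2)
sampled = frontier_sampling(adj, m=3, budget=40000, rng=rng)

# In the long run each of the 8 directed edges should appear about 1/8 of the time.
freq_tail = sum(1 for e in sampled if e == (2, 3)) / len(sampled)
print(freq_tail)
```

Only one walker moves per step, which is what makes the walkers dependent and distinguishes this from running m separate walks.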

Frontier Sampling: An m-dimensional Random Walk ´ The frontier sampling process is equivalent to the sampling process of a single random walker over $G^m$. (Lemma 5.1) ´ P(selecting a vertex and its outgoing edge in FS) = P(randomly sampling an edge from $e(L_n)$ in a single random walker over $G^m$).

FS Steady State vs. Uniform Distribution ´ Let $K_{fs}(m)$ be a random variable that denotes the number of random walkers in $V_A$ in steady state. ´ Let $K_{un}(m)$ be a random variable that denotes the number of sampled vertices, out of m uniformly sampled vertices from V, that belong to $V_A$. ´ Showing that the distribution of $K_{fs}(m)$ approaches that of $K_{un}(m)$ indicates that the FS algorithm, started with m random walkers at m uniformly sampled vertices, begins close to its steady state distribution. FS therefore benefits from starting its walkers at uniformly sampled vertices, because this shortens the transient of the RW.

FS Steady State vs. Uniform Distribution ´ By definition ´ Theorem 5.2 ´ Lemma 5.3 ´ Theorem 5.4

MultipleRW Steady State vs. Uniform Distribution ´ Let $K_{mw}(m)$ be a random variable that denotes the steady state number of MultipleRW random walkers in $V_A$. Note: $d_A$ is the average degree of vertices in $V_A$. Conclusion: If we initialize m random walkers with uniformly sampled vertices, FS starts closer to steady state than MultipleRW.

Distributed Frontier Sampling ´ Frontier Sampling can also be parallelized. ´ A MultipleRW sampling process where the cost of sampling a vertex v is an exponentially distributed random variable with parameter deg(v) is equivalent to a FS process. (Theorem 5)
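
The equivalence stated on this slide can be illustrated with a small simulation: m independent walkers, each waiting an exponentially distributed time with rate deg(v) before leaving vertex v. Ordering the moves by time means the next walker to move is chosen with probability proportional to its vertex degree, which matches the FS selection rule. This is a sketch under stated assumptions: a single event queue stands in for truly parallel walkers, and `distributed_fs`, `adj`, `budget`, and `rng` are hypothetical names.

```python
import heapq
import random

def distributed_fs(adj, m, budget, rng):
    """Simulate m independent walkers with Exp(deg(v)) sampling costs."""
    vertices = list(adj)
    events = []  # heap of (time of next move, walker id, current vertex)
    for w in range(m):
        v = rng.choice(vertices)
        heapq.heappush(events, (rng.expovariate(len(adj[v])), w, v))
    sampled = []
    while len(sampled) < budget:
        # The walker whose clock fires first moves next.
        t, w, u = heapq.heappop(events)
        v = rng.choice(adj[u])
        sampled.append((u, v))
        # Schedule this walker's next move from its new vertex.
        heapq.heappush(events, (t + rng.expovariate(len(adj[v])), w, v))
    return sampled

# Toy graph (an assumption): triangle 0-1-2 plus a tail 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rng = random.Random(3)
sampled = distributed_fs(adj, m=3, budget=40000, rng=rng)

# As with FS, each of the 8 directed edges should appear about 1/8 of the time.
freq_tail = sum(1 for e in sampled if e == (2, 3)) / len(sampled)
print(freq_tail)
```

The walkers never consult each other's state here, which is why this formulation suits parallel execution.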

Experiment and Result ´ Data: ´ “Flickr”, “Livejournal”, and “YouTube” ´ Barabási-Albert [5] graph ´ Goal: ´ Compare FS to MultipleRW and SingleRW ´ Compare FS to random vertex and edge sampling ´ Result: FS is consistently more accurate

Assortative Mixing Coefficient

In-degree Distribution Estimates

Out-degree Distribution Estimates

In-degree Distribution with Loosely Connected Components ´ Barabási-Albert Graph

FS vs. Stationary MultipleRW & SingleRW ´ MultipleRW and SingleRW start in steady state

FS vs. Random Independent Sampling

Density of Special Interest Group

Global Clustering Coefficient Estimates ´ Global clustering coefficient: a measure of the degree to which nodes in a graph tend to cluster together.

Conclusion ´ In almost all of the tests, FS is more accurate than the methods it was compared against. Future Work • Estimating characteristics of dynamic networks • Design of new MCMC-based approximation algorithms

Thank you!
