
Probabilistic Foundations of Statistical Network Analysis
Chapter 3: Network sampling

Harry Crane

Based on Chapter 3 of Probabilistic Foundations of Statistical Network Analysis
Book website: http://www.harrycrane.com/networks.html

Harry Crane    Chapter 3: Network sampling    1 / 18

Table of Contents

Chapter
1 Orientation
2 Binary relational data
3 Network sampling
4 Generative models
5 Statistical modeling paradigm
6 Vertex exchangeability
7 Getting beyond graphons
8 Relative exchangeability
9 Edge exchangeability
10 Relational exchangeability
11 Dynamic network models

Illustration: the effects of sampling

Let $X_1, X_2, \ldots, X_N$ be i.i.d. from

$$\Pr(X_i = k + 1) = \lambda^k e^{-\lambda} / k!, \quad k = 0, 1, \ldots. \tag{1}$$

What is the distribution of $X'$ obtained by:

1. Sampling $\ell = 1, \ldots, N$ uniformly and putting $X' = X_\ell$, and
2. Choosing $\ell = 1, \ldots, N$ according to
   $$\Pr(\ell = k \mid X_1, \ldots, X_N) \propto X_k, \quad k = 1, \ldots, N,$$
   and putting $X' = X_\ell$?

Simple observation: Method of sampling affects the distribution of $X'$.
Must be accounted for in inference.
Easy for this example. Easier said than done for networks.

1. Under uniform sampling, $X'$ distributed as in (1).
2. Under size-biased sampling, $X'$ distributed as size-biased distribution:
   $$\Pr(X' = k + 1) \propto (k + 1)\, \lambda^k e^{-\lambda} / k!, \quad k = 0, 1, \ldots.$$

Parameters are not just Greek letters!
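To make the difference between the two sampling schemes concrete, the following minimal sketch evaluates both distributions numerically and compares their means. The function names, the value $\lambda = 2$, and the truncation of the support at 60 are illustrative assumptions, not part of the slides. The normalizing constant of the size-biased law is $E[X] = 1 + \lambda$, since $X - 1$ is Poisson with mean $\lambda$.

```python
import math

def shifted_poisson_pmf(x, lam):
    """Pr(X = x) for X = 1 + Poisson(lam), x = 1, 2, ... (equation (1))."""
    k = x - 1
    return lam**k * math.exp(-lam) / math.factorial(k)

def size_biased_pmf(x, lam):
    """Size-biased law: x * Pr(X = x) / E[X], with E[X] = 1 + lam."""
    return x * shifted_poisson_pmf(x, lam) / (1.0 + lam)

lam = 2.0                 # assumed value, for illustration only
support = range(1, 60)    # truncation; the neglected tail mass is negligible here

total_size_biased = sum(size_biased_pmf(x, lam) for x in support)
mean_uniform = sum(x * shifted_poisson_pmf(x, lam) for x in support)
mean_size_biased = sum(x * size_biased_pmf(x, lam) for x in support)

print(mean_uniform, mean_size_biased)
```

The size-biased mean equals $E[X^2]/E[X]$, which exceeds $E[X]$ whenever $X$ is not constant, so an estimate of $\lambda$ that ignores the sampling scheme would be biased upward.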

Network modeling

Conventional Definition: A (parameterized) statistical model is a family of probability distributions $\mathcal{M} = \{P_\theta : \theta \in \Theta\}$, each defined on the sample space.

Population or Sample model? And what's the connection?

[Diagram: Population, Observed network (sample), and Model $\{P_\theta : \theta \in \Theta\}$, with the links between them marked "???".]

Guiding Question: How to draw sound inferences about population model based on sampled network?

Need to model data in a manner consistent with (i) population model and (ii) sampling mechanism.

Selection sampling

"Selection of $[m]$ from $[n]$": $A \mapsto A|_{[m]}$.

For example, for $A = (A_{ij})_{1 \le i, j \le n}$ given by

$$A = \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1m} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2m} & \cdots & A_{2n} \\
\vdots & \vdots & \ddots & \vdots & & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mm} & \cdots & A_{mn} \\
\vdots & \vdots & & \vdots & \ddots & \vdots \\
A_{n1} & A_{n2} & \cdots & A_{nm} & \cdots & A_{nn}
\end{pmatrix},$$

the restriction $A|_{[m]}$, for $m \le n$, is the upper $m \times m$ submatrix given by

$$A|_{[m]} = \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1m} \\
A_{21} & A_{22} & \cdots & A_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mm}
\end{pmatrix}.$$
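As a small illustration of the restriction operation, here is a sketch that represents an array as a list of rows and keeps the upper-left $m \times m$ block. The function name `restrict` and the example array are hypothetical.

```python
def restrict(A, m):
    """Return A|[m]: the upper-left m x m block of the square array A."""
    if not 0 <= m <= len(A):
        raise ValueError("m must satisfy 0 <= m <= n")
    return [row[:m] for row in A[:m]]

# Hypothetical 4 x 4 adjacency array, for illustration only.
A = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
A_restricted = restrict(A, 2)
print(A_restricted)
```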

Consistency under selection

Let $Y_N$ and $Y_n$, $n < N$, be random arrays and write

$$S_{n,N} : \{0,1\}^{N \times N} \to \{0,1\}^{n \times n}$$

to denote the act of selecting $[n]$ from $[N]$.

Definition: The distributions of $Y_N$ and $Y_n$ are consistent under selection if

$$Y_n =_{\mathcal{D}} S_{n,N}(Y_N).$$

Example: $p_1$ model (Why? See Equation (3.10) and Exercise 3.1.)

ERGMs consistent under selection only if sufficient statistics have 'separable increments' (Shalizi and Rinaldo, 2013).

              Population    Observed network (sample)
              Y_N           S_{n,N}(Y_N)
Distribution  Y_N           Y_n
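The definition can be checked exactly on a toy case. The sketch below uses the Erdős–Rényi–Gilbert model with independent Bernoulli($\theta$) off-diagonal entries, chosen here as an assumption because it is the simplest model to enumerate (the slide's own example is the $p_1$ model). It pushes the distribution of $Y_N$ through $S_{n,N}$ by brute-force enumeration and compares the result with the same model on $n$ vertices. All function names and the values of $\theta$, $n$, $N$ are illustrative.

```python
from itertools import product

def er_pmf(y, theta):
    """Pmf of a directed 0/1 array y with i.i.d. Bernoulli(theta) off-diagonal entries."""
    p = 1.0
    for i in range(len(y)):
        for j in range(len(y)):
            if i != j:
                p *= theta if y[i][j] == 1 else 1.0 - theta
    return p

def all_arrays(n):
    """Yield every n x n 0/1 array with zero diagonal."""
    cells = [(i, j) for i in range(n) for j in range(n) if i != j]
    for bits in product([0, 1], repeat=len(cells)):
        y = [[0] * n for _ in range(n)]
        for (i, j), b in zip(cells, bits):
            y[i][j] = b
        yield y

def selected_pmf(n, N, theta):
    """Distribution of S_{n,N}(Y_N), obtained by summing over the unobserved entries."""
    out = {}
    for y in all_arrays(N):
        key = tuple(tuple(row[:n]) for row in y[:n])
        out[key] = out.get(key, 0.0) + er_pmf(y, theta)
    return out

theta, n, N = 0.3, 2, 3   # assumed toy values
pushed = selected_pmf(n, N, theta)
direct = {tuple(map(tuple, y)): er_pmf(y, theta) for y in all_arrays(n)}
max_gap = max(abs(pushed[k] - direct[k]) for k in direct)
print(max_gap)
```

A gap of zero up to floating-point error means the two distributions agree on every $n \times n$ array, which is what consistency under selection requires for this pair $(n, N)$.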

Significance of sampling consistency

Example: Suppose $Y_N$ follows $p_1$ model with parameters $(\rho, \theta, \alpha, \beta)$, for $\alpha = (\alpha_1, \ldots, \alpha_N)$ and $\beta = (\beta_1, \ldots, \beta_N)$.

Want to estimate reciprocity $\rho$ based on observation $Y_n = S_{n,N}(Y_N)$ for $n < N$.

By consistency under selection, $Y_n$ distributed from $p_1$ model with parameter $(\rho, \theta, \alpha_{[n]}, \beta_{[n]})$ for $\alpha_{[n]} = (\alpha_1, \ldots, \alpha_n)$ and $\beta_{[n]} = (\beta_1, \ldots, \beta_n)$.

$\Longrightarrow$ If $Y_N$ from $p_1$ model and $Y_n$ obtained from $Y_N$ by selection sampling, then $Y_n$ also from $p_1$ model with same parameters.
$\Longrightarrow$ $\rho, \alpha_i, \beta_i$ are the 'same' for $Y_N$ and $Y_n$.
$\Longrightarrow$ estimate $\hat{\rho}_n$ based on $Y_n$ and use same estimate for $Y_N$.

Same logic does not apply to estimating ERGM unless separable increments holds. (See Chapter 2 and Shalizi–Rinaldo (2013).)

Toward a coherent theory for network modeling

I do not suggest that consistency under selection is the be-all and end-all.
It is a useful illustration of the importance of consistency with respect to subsampling.
But selection is just one special kind of subsampling.
And selection is very unrealistic in almost all network applications of interest.

Three essential observations:
(i) sampling is an indispensable part of network modeling,
(ii) the relationship between observed and unobserved data established by the sampling mechanism is critical for statistical inference, and
(iii) the nature of this relationship and the reason why it is important have not been properly emphasized in the developments of network analysis to date.

Selection from sparse networks

Suppose $Y_N = (Y_{ij})_{1 \le i, j \le N}$ is "sparse" (aside: "sparse" a misnomer):

$$\sum_{1 \le i, j \le N} Y_{ij} \approx \varepsilon N \quad \text{for "small" } \varepsilon > 0.$$

Sample $n \ll N$ vertices uniformly at random and observe the subgraph $Y^*_n$ induced by $Y_N$. What does $Y^*_n$ look like?

Since vertices sampled uniformly, $Y^*_n$ is exchangeable and

$$\Pr(Y^*_{12} = 1) \approx \varepsilon N / (N(N-1)) \approx \varepsilon / N \approx 0.$$

Furthermore, we compute

$$\Pr\Big( \bigcup_{1 \le i \ne j \le n} \{ Y^*_{ij} = 1 \} \Big) \le \sum_{1 \le i \ne j \le n} \Pr(Y^*_{ij} = 1) \approx n^2 \varepsilon / N \approx 0.$$

What are the practical implications of this?
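The union bound above is easy to evaluate for specific sizes. The sketch below plugs in hypothetical values of $N$, $n$, and $\varepsilon$ (none of which come from the slides) to show how small the probability of observing any edge at all can be.

```python
def edge_probability(eps, N):
    """Approximate Pr(Y*_12 = 1) = eps * N / (N * (N - 1)) for a sparse network."""
    return eps * N / (N * (N - 1))

def any_edge_upper_bound(n, eps, N):
    """Union bound on the probability that the induced subgraph on n vertices has an edge."""
    ordered_pairs = n * (n - 1)          # number of (i, j) with i != j
    return min(1.0, ordered_pairs * edge_probability(eps, N))

# Assumed illustrative sizes: a million-vertex network, 100 sampled vertices.
N, n, eps = 1_000_000, 100, 0.5
bound = any_edge_upper_bound(n, eps, N)
print(bound)
```

With these assumed values the bound is below one percent, so the sampled subgraph is empty with probability above 0.99 and carries almost no information about the population network.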

Scenario: Ego networks in high school friendships

Suppose $Y_N$ modeled by Erdős–Rényi–Gilbert distribution with parameter $\theta \in [0, 1]$:

$$\Pr(Y_N = y; \theta) = \prod_{1 \le i \ne j \le N} \theta^{y_{ij}} (1 - \theta)^{1 - y_{ij}}, \quad y \in \{0,1\}^{N \times N}.$$

Observe $Y^*$ by sampling $v^*$ uniformly from $[N]$ and observing $Y^* = Y_N|_S$, for

$$S = \{v^*\} \cup \{v : Y_{v^* v} = 1 \text{ or } Y_{v v^*} = 1\}.$$

What is the distribution of $Y^*$?

Figure: Depiction of one-step snowball sampling operation in Section 2.4. The solid filled vertex (bottom right) corresponds to the randomly chosen vertex $v^*$ and those partially filled with dots are its one-step neighborhood.
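The sampling operation in this scenario can be sketched directly from the definition of $S$. The code below is a minimal simulation under assumed values ($N = 50$, $\theta = 0.1$, a fixed seed); the function names are hypothetical and vertices are indexed from 0 rather than 1.

```python
import random

def sample_er(N, theta, rng):
    """Directed array with i.i.d. Bernoulli(theta) off-diagonal entries and zero diagonal."""
    return [[1 if i != j and rng.random() < theta else 0 for j in range(N)]
            for i in range(N)]

def one_step_snowball(Y, v_star):
    """Return S = {v*} U {v : Y[v*][v] = 1 or Y[v][v*] = 1} and the restriction of Y to S."""
    S = sorted({v_star} | {v for v in range(len(Y))
                           if Y[v_star][v] == 1 or Y[v][v_star] == 1})
    return S, [[Y[i][j] for j in S] for i in S]

rng = random.Random(0)                 # fixed seed, for reproducibility only
Y = sample_er(50, 0.1, rng)
v_star = rng.randrange(50)             # uniform choice of the ego vertex
S, Y_star = one_step_snowball(Y, v_star)
print(len(S))
```

By construction every vertex in $S$ other than $v^*$ shares an edge with $v^*$, so the entries of $Y^*$ involving $v^*$ are not i.i.d. Bernoulli($\theta$), even though $Y_N$ itself is Erdős–Rényi–Gilbert.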

Network sampling schemes

Vertex sampling: As in Section 2.4 (students in a high school).
Relational sampling
  edge sampling: phone calls
  hyperedge sampling: movie collaborations, co-authorships
  path sampling: traceroute
Snowball sampling: As in Section 3.5.

Sampling scheme affects the units of observation.
Units of observation affect inference/modeling.

Edge sampling (phone call database)

Table: Database of phone calls. Each row contains information about a single phone call: caller and receiver (identified by phone number), time of call, topic discussed, etc.

Caller        Receiver      Time of Call  Topic Discussed  ...
555-7892 (a)  555-1243 (b)  15:34         Business         ...
550-9999 (c)  555-7892 (a)  15:38         Birthday         ...
555-1200 (d)  445-1234 (e)  16:01         School           ...
555-7892 (a)  550-9999 (c)  15:38         Sports           ...
555-1243 (b)  555-1200 (d)  16:17         Business         ...
...           ...           ...           ...              ...

Figure: Network depiction of phone call sequence of caller-receiver pairs $(a, b)$, $(c, a)$, $(d, e)$, $(a, c)$ as in the first four rows of Table 1. Edges are labeled in correspondence with the order in which the corresponding calls were observed.
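One way to hold edge-sampled data is to keep each call as its own labeled unit rather than collapsing calls into an adjacency matrix. The sketch below does this for the four caller-receiver pairs named in the figure caption; the variable names are hypothetical.

```python
# Caller-receiver pairs in the order observed (first four rows of the table).
calls = [("a", "b"), ("c", "a"), ("d", "e"), ("a", "c")]

# Label each edge by its position in the observed sequence, starting from 1.
edge_labels = {label: pair for label, pair in enumerate(calls, start=1)}

# The vertex set is whatever appears in at least one sampled edge.
vertices = sorted({v for pair in calls for v in pair})

print(edge_labels)
print(vertices)
```

Keeping `("c", "a")` and `("a", "c")` as separate labeled edges preserves both direction and order of observation, which an unlabeled graph on the same vertices would lose.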

Traceroute sampling (Path sampling)

Sample paths in the Internet by sending signals between different IP addresses and tracing the path (traceroute sampling).

Figure: Path-labeled network constructed from sequence path$(a, c) = (a, b, c)$, path$(a, f) = (a, b, e, f)$, path$(a, h) = (a, g, h)$, and path$(a, d) = (a, d)$. Edges are labeled according to which path they belong. For example, the three edges labeled '2' should be regarded as comprising a single path, namely path$(a, f) = (a, b, e, f)$, and not as three distinct edges $(a, b)$, $(b, e)$, $(e, f)$.
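The path labeling described in the caption can be sketched as a mapping from each edge to the paths that traverse it. The four paths are taken from the caption; the function and variable names are hypothetical.

```python
from collections import defaultdict

# Paths in the order listed in the caption; path k is labeled k (starting from 1).
paths = [("a", "b", "c"), ("a", "b", "e", "f"), ("a", "g", "h"), ("a", "d")]

def label_edges(paths):
    """Map each edge (u, v) to the list of path labels that traverse it."""
    labels = defaultdict(list)
    for k, path in enumerate(paths, start=1):
        for u, v in zip(path, path[1:]):   # consecutive vertices form an edge
            labels[(u, v)].append(k)
    return dict(labels)

edge_to_paths = label_edges(paths)
edges_of_path_2 = [e for e, ks in edge_to_paths.items() if 2 in ks]

print(edge_to_paths)
print(edges_of_path_2)
```

The unit of observation here is the whole tuple in `paths`, so a model for these data should assign probability to paths; `edge_to_paths` is only a derived view for drawing the labeled network.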

Hyperedge sampling

Actor collaborations:

Movie title     Starring cast
Rocky           Sylvester Stallone, Burt Young, Carl Weathers, ...
Rounders        Matt Damon, Ed Norton, John Malkovich, John Turturro, ...
Groundhog Day   Bill Murray, Andie MacDowell, Chris Elliott, ...
A Bronx Tale    Robert De Niro, Chazz Palminteri, Joe Pesci, ...
Over the Top    Sylvester Stallone, Robert Loggia, ...
The Room        Tommy Wiseau, Greg Sestero, ...
...             ...

Scientific coauthorships:

Article title                                          Authors
A nonparametric view of network models ...             Bickel, Chen
Edge exchangeable models for interaction networks      Crane, Dempsey
Snowball sampling                                      Goodman
Latent space approaches to social network analysis     Hoff, Raftery, Handcock
...                                                    ...
