SLIDE 1

Models for Network Graphs

Gonzalo Mateos

Dept. of ECE and Goergen Institute for Data Science
University of Rochester
gmateosb@ece.rochester.edu
http://www.ece.rochester.edu/~gmateosb/

April 2, 2020

Network Science Analytics Models for Network Graphs 1

SLIDE 2

Random graph models

Random graph models
Small-world models
Network-growth models
Exponential random graph models
Case study: Modeling collaboration among lawyers

SLIDE 3

Why statistical graph modeling?

◮ Statistical inference typically conducted in the context of a model

⇒ Models key to transition from descriptive to inferential tasks

◮ In practice, graph models are used for a variety of reasons:

1) Mechanisms explaining properties observed in real-world networks
   Ex: small-world effects, power-law degree distributions
2) Testing for 'significance' of a characteristic η(G) in a network graph
   Ex: is the observed average degree unusual or anomalous?
3) Alternative to the design-based framework for estimating η(G)
   Ex: model-based, e.g., maximum likelihood estimation

SLIDE 4

Modeling network graphs

◮ So far the focus has been on network analysis methods to:

⇒ Collect relational data and construct network graphs
⇒ Characterize and summarize their structural properties
⇒ Obtain sample-based estimates of partially-observed structure

◮ Emphasis now on the construction and use of models for network data

◮ Def: A model for a network graph is a collection

{Pθ(G), G ∈ G : θ ∈ Θ}

◮ G is an ensemble of possible graphs
◮ Pθ(·) is a probability distribution on G (often written simply P(·))
◮ Parameters θ range over values in the parameter space Θ

SLIDE 5

Model specification

◮ Richness of models derives from how we specify P(·)

⇒ Methods range from the simple to the complex

1) Let P(·) be uniform on G, add structural constraints to G
   Ex: Erdős-Rényi random graphs, generalized random graph models
2) Induce P(·) via application of simple generative mechanisms
   Ex: small world, preferential attachment, copying models
3) Model structural features and their effect on G's topology
   Ex: exponential random graph models

◮ Computational cost of associated inference algorithms relevant

SLIDE 6

Classical random graph models

◮ Assign equal probability on all undirected graphs of given order and size

◮ Specify the collection G_{Nv,Ne} of graphs G(V, E) with |V| = Nv and |E| = Ne

◮ Assign P(G) = (N choose Ne)^(-1) to each G ∈ G_{Nv,Ne}, where N = |V^(2)| = (Nv choose 2)

◮ Most common variant is the Erdős-Rényi random graph model Gn,p
⇒ Undirected graph on Nv = n vertices
⇒ Edge (u, v) present w.p. p, independently of other edges

◮ Simulation: simply draw N = (Nv choose 2) ≈ Nv^2/2 i.i.d. Ber(p) RVs
◮ Inefficient when p ∼ Nv^(-1) ⇒ sparse graph, most draws are 0

◮ Skip over non-edges by drawing i.i.d. Geo(p) RVs instead, runs in O(Nv + Ne) time
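The geometric-skip idea above can be sketched in a few lines (a minimal sketch; the function name and interface are illustrative, not from the slides):

```python
import math
import random

def gnp_sparse(n, p, seed=None):
    """Sample the edge list of an Erdos-Renyi G(n, p) graph, 0 < p < 1.

    Rather than flipping all ~n^2/2 Bernoulli(p) coins, jump directly
    between successive edges with geometric skips, so the expected
    running time is O(n + Ne) for Ne edges.
    """
    rng = random.Random(seed)
    log_q = math.log(1.0 - p)
    edges = []
    v, w = 1, -1  # walk the upper triangle of the adjacency matrix row by row
    while v < n:
        # skip a Geometric(p) number of vertex pairs to the next edge
        w += 1 + int(math.log(1.0 - rng.random()) / log_q)
        while w >= v and v < n:
            w -= v
            v += 1
        if v < n:
            edges.append((v, w))
    return edges
```

The expected number of random draws falls from ~Nv^2/2 Bernoulli flips to one geometric draw per realized edge, which is where the O(Nv + Ne) running time comes from.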

SLIDE 7

Properties of Gn,p

◮ Gn,p is well-studied and tractable. Noteworthy properties:

P1) Degree distribution P (d) is binomial with parameters (n − 1, p)

◮ Large graphs have concentrated P (d) with exponentially-decaying tails

P2) Phase transition on the emergence of a giant component

◮ If np > 1, Gn,p has a giant component of size O(n) w.h.p.
◮ If np < 1, Gn,p has components of size only O(log n) w.h.p.


P3) Small clustering coefficient O(n−1) and short diameter O(log n) w.h.p.

SLIDE 8

Generalized random graph models

◮ Recipe for generalization of Erdős-Rényi models

⇒ Specify G of fixed order Nv, possessing a desired characteristic
⇒ Assign equal probability to each graph G ∈ G

◮ Configuration model: fixed degree sequence {d(1), . . . , d(Nv)}

◮ Size is also fixed under this model, since Ne = d̄Nv/2 ⇒ G ⊂ G_{Nv,Ne}

◮ Equivalent to specifying model via conditional distribution on GNv ,Ne

◮ Configuration models useful as reference, i.e., ‘null’ models

Ex: compare observed G with G′ ∈ G having power-law P(d)
Ex: expected group-wise edge counts in the modularity measure

SLIDE 9

Results on the configuration model

P1) Phase transition on the emergence of a giant component

◮ Condition depends on the first two moments of the given P(d)
◮ Giant component has size O(Nv), as in G_{Nv,p}
◮ M. Molloy and B. Reed, "A critical point for random graphs with a given degree sequence," Random Struct. and Alg., vol. 6, pp. 161-180, 1995

P2) Clustering coefficient vanishes more slowly than in G_{Nv,p}
◮ M. Newman et al, "Random graphs with arbitrary degree distributions and their applications," Physical Rev. E, vol. 64, p. 026118, 2001

P3) Special case of a given power-law degree distribution P(d) ∼ Cd^(−α)
◮ For α ∈ (2, 3), short diameter O(log Nv), as in G_{Nv,p}
◮ F. Chung and L. Lu, "The average distances in random graphs with given expected degrees," PNAS, vol. 99, pp. 15879-15882, 2002

SLIDE 10

Simulating generalized random graphs

◮ Matching algorithm

[Figure: each node drawn with as many edge stubs ('spokes') as its degree; stubs are matched uniformly at random in pairs to produce a sample graph]

Given: nodes with spokes → randomly match mini-nodes → sample graph

◮ Switching algorithm

[Figure: starting from a sample graph, a randomly chosen pair of edges is 'switched' (endpoints exchanged); repeat ~100·Ne times]
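The switching algorithm admits a compact sketch (illustrative names; rejecting switches that would create self-loops or multi-edges is one common convention):

```python
import random

def switch_randomize(edges, n_switch, seed=None):
    """Degree-preserving randomization via edge switching.

    Repeatedly pick two edges (a, b) and (c, d) and try to replace them
    with (a, d) and (c, b). A switch is rejected if it would create a
    self-loop or a multi-edge, so the degree sequence never changes.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_switch):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: switch would make a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # switch would make a multi-edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges
```

After ~100·Ne accepted or attempted switches, the result is (approximately) a uniform draw from the graphs sharing the original degree sequence.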

SLIDE 11

Task 1: Model-based estimation in network graphs

◮ Consider a sample G ∗ of a population graph G(V , E)

⇒ Suppose a given characteristic η(G) is of interest
⇒ Q: Useful estimate η̂ = η̂(G∗) of η(G)?

◮ Statistical inference in sampling theory via design-based methods

⇒ Only source of randomness is due to the sampling design

◮ Augment this perspective to include a model-based component

◮ Assume G drawn uniformly from the collection G, prior to sampling

◮ Inference on η(G) should incorporate both randomness due to

⇒ Selection of G from G and sampling G ∗ from G

SLIDE 12

Example: size of a “hidden population”

◮ Directed graph G(V , E), V the members of the hidden population

⇒ Graph describing willingness to identify other members
⇒ Arc (i, j) present when individual i, upon being asked, mentions j as a member

◮ For given V , model G as drawn from a collection G of random graphs

⇒ Independently add arcs between vertex pairs w.p. pG

◮ Graph G∗ obtained via one-wave snowball sampling, i.e., V∗ = V∗_0 ∪ V∗_1

⇒ Initial sample V∗_0 obtained via Bernoulli sampling (BS) from V with probability p0

◮ Consider the following RVs of interest

◮ N = |V∗_0|: size of the initial sample
◮ M1: number of arcs among individuals in V∗_0
◮ M2: number of arcs from individuals in V∗_0 to individuals in V∗_1

◮ Snowball sampling yields measurements n, m1, and m2 of these RVs

SLIDE 13

Method of moments estimator

◮ Method of moments: now Aij = I{(i, j) ∈ E} is also a RV

E[N] = E[ Σ_i I{i ∈ V∗_0} ] = Nv p0 = n

E[M1] = E[ Σ_j Σ_{i≠j} I{i ∈ V∗_0} I{j ∈ V∗_0} Aij ] = Nv(Nv − 1) p0^2 pG = m1

E[M2] = E[ Σ_j Σ_{i≠j} I{i ∈ V∗_0} I{j ∉ V∗_0} Aij ] = Nv(Nv − 1) p0(1 − p0) pG = m2

◮ Expectation w.r.t. the randomness in selecting G and the sample V∗_0. Solution:

p̂0 = m1/(m1 + m2),  p̂G = m1(m1 + m2)/(n[(n − 1)m1 + n·m2]),  and  N̂v = n(m1 + m2)/m1

⇒ Same estimates for p0 and Nv as in the design-based approach
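The moment equations solve in closed form, which is easy to check numerically (the helper name below is my own; inputs n, m1, m2 are the measurements defined above):

```python
def mom_estimates(n, m1, m2):
    """Method-of-moments estimates for the hidden-population model.

    Solves E[N] = n, E[M1] = m1, E[M2] = m2 for (p0, pG, Nv):
      p0_hat = m1 / (m1 + m2)
      pG_hat = m1 (m1 + m2) / (n [(n - 1) m1 + n m2])
      Nv_hat = n (m1 + m2) / m1
    """
    p0_hat = m1 / (m1 + m2)
    pG_hat = m1 * (m1 + m2) / (n * ((n - 1) * m1 + n * m2))
    Nv_hat = n * (m1 + m2) / m1
    return p0_hat, pG_hat, Nv_hat
```

Plugging in the exact expected values of N, M1, M2 for any (Nv, p0, pG) recovers those parameters exactly, confirming the algebra.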

SLIDE 14

Directly modeling η(G)

◮ So far considered modeling G for model-based estimation of η(G)

⇒ Classical random graphs typical in social networks research

◮ Alternatively, one may specify a model for η(G) directly

Example

◮ Estimate the power-law exponent η(G) = α from degree counts
◮ A power law implies the linear model log P(d) = C − α log d + ε

⇒ Could use a model-based estimator such as least squares

◮ Better to form the MLE for the model f(d; α) = ((α − 1)/dmin) (d/dmin)^(−α)

Hill estimator ⇒ α̂ = 1 + [ (1/Nv) Σ_{i=1}^{Nv} log(di/dmin) ]^(−1)
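The Hill estimator is a one-liner, sketched here with a synthetic check (names and the continuous Pareto test data are my own illustration):

```python
import math
import random

def hill_alpha(degrees, dmin):
    """Hill/MLE estimate of a power-law exponent:
    alpha_hat = 1 + [ (1/N) * sum_i log(d_i / dmin) ]^(-1),
    computed from the observations with d_i >= dmin."""
    tail = [d for d in degrees if d >= dmin]
    return 1.0 + len(tail) / sum(math.log(d / dmin) for d in tail)

# sanity check on synthetic data: invert the CDF of
# f(d; alpha) = ((alpha - 1) / dmin) * (d / dmin)^(-alpha)
rng = random.Random(7)
alpha, dmin = 2.5, 1.0
sample = [dmin * rng.random() ** (-1.0 / (alpha - 1)) for _ in range(20000)]
```

With 20,000 samples the estimate lands within a few hundredths of the true exponent, consistent with its (α − 1)/√N standard error.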

SLIDE 15

Task 2: Assessing significance in network graphs

◮ Consider a graph G_obs derived from observations
◮ Q: Is a structural characteristic η(G_obs) significant, i.e., unusual?

⇒ Assessing significance requires a frame of reference, or null model
⇒ Random graph models are often used in setting up such comparisons

◮ Define a collection G, and compare η(G_obs) with the values {η(G) : G ∈ G}

⇒ Formally, construct the reference distribution

P_{η,G}(t) = |{G ∈ G : η(G) ≤ t}| / |G|

◮ If η(G_obs) is found to be sufficiently unlikely under P_{η,G}(t)

⇒ Evidence against the null H0: G_obs is a uniform draw from G

SLIDE 16

Example: Zachary’s karate club

◮ Zachary’s karate club has clustering coefficient cl(G obs) = 0.2257

⇒ Random graph models to assess whether the value is unusual

◮ Construct two 'comparable' abstract frames of reference

1) Collection G1 of random graphs with the same Nv = 34 and Ne = 78
2) Collection G2 with the added constraint of the same degree distribution as G_obs

◮ |G1| ≈ 8.4 × 10^96 and |G2| is much smaller, but still large

⇒ Enumerating G1 to obtain P_{η,G1}(t) exactly is intractable

◮ Instead use simulations to approximate both distributions

⇒ Draw 10,000 uniform samples G from each of G1 and G2
⇒ Calculate η(G) = cl(G) for each sample, plot histograms
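The simulation recipe can be sketched as follows (a minimal sketch with my own function names, using uniform G(n, m) draws for the 'same order and size' reference and far fewer replicates than the 10,000 on the slide, to keep it quick):

```python
import random
from itertools import combinations

def transitivity(adj):
    """Clustering coefficient: (#closed connected triples) / (#connected triples)."""
    closed = triples = 0
    for v, nbrs in adj.items():
        triples += len(nbrs) * (len(nbrs) - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triples if triples else 0.0

def random_gnm(n, m, rng):
    """Uniform draw from the graphs with n vertices and m edges."""
    adj = {v: set() for v in range(n)}
    for a, b in rng.sample(list(combinations(range(n), 2)), m):
        adj[a].add(b)
        adj[b].add(a)
    return adj

def empirical_pvalue(cl_obs, n, m, reps, seed=1):
    """Fraction of uniform G(n, m) draws with clustering >= cl_obs."""
    rng = random.Random(seed)
    return sum(transitivity(random_gnm(n, m, rng)) >= cl_obs
               for _ in range(reps)) / reps
```

For the karate-club numbers (n = 34, m = 78, cl_obs = 0.2257) the empirical p-value comes out near zero, matching the conclusion on the next slide.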

SLIDE 17

Example: Zachary’s karate club (cont.)

◮ Plot histograms to approximate the distributions

[Figure: histograms of the clustering coefficient over 10,000 samples; left panel: same order and size (G1), right panel: same degree distribution (G2)]

◮ Unlikely to see a value cl(G_obs) = 0.2257 under either graph model

Ex: only 3 out of 10,000 samples from G1 had cl(G) > 0.2257

◮ Strong evidence against G_obs having been obtained as a sample from G1 or G2

SLIDE 18

Task 3: Detecting network motifs

◮ Related use of random graph models is for detecting network motifs

⇒ Find the simple ‘building blocks’ of a large complex network

◮ Def: Network motifs are small subgraphs occurring far more frequently in a given network than in comparable random graphs

◮ Ex: there are L3 = 13 different connected 3-vertex subdigraphs
◮ Let Ni be the count in G of the i-th type of k-vertex subgraph, i = 1, . . . , Lk

⇒ Each value Ni can be compared to a suitable reference P_{Ni,G}
⇒ Subgraphs for which Ni is extreme are declared network motifs

SLIDE 19

Example: AIDS blog network

◮ AIDS blog network G obs with Nv = 146 bloggers and Ne = 183 links

⇒ Examined evidence for motifs of size k = 3 and 4 vertices

[Figure 1.4: AIDS blog network, with the detected 3-vertex and 4-vertex motifs highlighted]

◮ Simulated 10,000 digraphs using a switching algorithm

⇒ Fixed in- and out-degree sequences, and mutual edges, as in G_obs
⇒ Constructed approximate reference distributions P_{Ni,G}(t)

◮ Ex: two bloggers with a mutual edge and a common ‘authority’

SLIDE 20

Challenges in detecting motifs

◮ Individual motifs frequently overlap with other copies of themselves

⇒ May require them to be frequent and mostly disjoint subgraphs

◮ With large graphs come significant computational challenges

⇒ Number of different potential motifs Lk grows fast with k
Ex: connected subdigraphs: L3 = 13, L4 = 199, L5 = 9364

◮ May sample subgraphs H along with the Horvitz-Thompson (HT) estimation framework

N̂i = Σ_{H of type i} π_H^(−1)

⇒ π_H is the inclusion probability of subgraph H under the sampling design

SLIDE 21

Small-world models

Random graph models
Small-world models
Network-growth models
Exponential random graph models
Case study: Modeling collaboration among lawyers

SLIDE 22

Models for real-world networks

◮ Arguably the most important innovation in modern graph modeling

[Diagram: transition from traditional random graph models to models mimicking observed "real-world" properties]

SLIDE 23

A “small” world?

◮ Six degrees of separation popularized by a play [Guare’90]

⇒ Short paths between us and everyone else on the planet
⇒ The term is relatively new, but the concept has a long history

◮ Traced back to F. Karinthy in the 1920s

⇒ 'Shrinking' modern world due to increased human connectedness
⇒ Challenge: find someone whose distance from you is > 5
⇒ Inspired by G. Marconi's Nobel prize speech in 1909

◮ First mathematical treatment [Kochen-Pool’50]

⇒ Formally modeled the mechanics of social networks
⇒ But left the 'degrees of separation' question unanswered

◮ Chain of events led to a groundbreaking experiment [Milgram’67]

SLIDE 24

Milgram’s experiment

◮ Q1: What is the typical geodesic distance between two people?

⇒ Experiment on the global friendship (social) network
⇒ Cannot measure it in full, so need to probe explicitly

◮ S. Milgram’s ingenious small-world experiment in 1967

◮ 296 letters sent to people in Wichita, KS and Omaha, NE
◮ Letters indicated a (unique) contact person in Boston, MA
◮ Recipients were asked to forward the letter toward the contact, following rules

◮ Def: a friend is someone known on a first-name basis

Rule 1: If the contact is a friend, then send her the letter; else
Rule 2: Relay to the friend most likely to be a friend of the contact

◮ Q2: How many letters arrived? How long did they take?

SLIDE 25

Milgram’s experimental results

◮ 64 of the 296 letters reached the destination, average path length ℓ̄ = 6.2

⇒ Inspiring Guare's '6 degrees of separation'

◮ Conclusion: short paths connect arbitrary pairs of people
◮ S. Milgram, "The small-world problem," Psychology Today, vol. 2, pp. 60-67, 1967

SLIDE 26

Moment to reflect

◮ Milgram demonstrated that short paths are in abundance ◮ Q: Is the small-world theory reasonable? Sure, e.g., assumes:

◮ We have 100 friends, each of them has 100 other friends, . . . ◮ After 5 degrees we get 1010 friends > twice the Earth’s population

Friends Friends of friends Friends Friends of friends

◮ Not a realistic model of social networks exhibiting:

⇒ Homophily [Lazarzfeld’54] ⇒ Triadic closure [Rapoport’53]

◮ Q: How can networks be highly-structured locally and globally small?

SLIDE 27

Structure and randomness as extremes

[Figure: regular lattice Gr (high clustering and diameter) vs. random graph Gn,p (low clustering and diameter)]

◮ One-dimensional regular lattice Gr on Nv vertices
◮ Each node is connected to its 2r closest neighbors (r to each side)

Structure yields high clustering and high diameter:

cl(Gr) = (3r − 3)/(4r − 2)  and  diam(Gr) = Nv/(2r)

◮ The other extreme is a G_{Nv,p} random graph with p = O(Nv^(−1))

Randomness yields low clustering and low diameter:

cl(G_{Nv,p}) = O(Nv^(−1))  and  diam(G_{Nv,p}) = O(log Nv)
SLIDE 28

The Watts-Strogatz model

◮ Small-world model: blend of structure with a little randomness

S1: Start with a regular lattice that has the desired clustering
S2: Introduce randomness to generate shortcuts in the graph
⇒ Each edge is randomly rewired with (small) probability p

◮ Rewiring interpolates between the regular and random extremes
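Steps S1-S2 can be sketched directly (a minimal sketch; the function name and the convention of rewiring only one endpoint are my own choices):

```python
import random

def watts_strogatz(n, r, p, seed=None):
    """S1: ring lattice where each vertex links to its r nearest
    neighbors on each side; S2: rewire each lattice edge w.p. p to a
    uniformly chosen new endpoint (avoiding loops and duplicate edges)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):                       # S1: build the ring lattice
        for k in range(1, r + 1):
            adj[v].add((v + k) % n)
            adj[(v + k) % n].add(v)
    for v in range(n):                       # S2: random rewiring
        for k in range(1, r + 1):
            w = (v + k) % n
            if w in adj[v] and rng.random() < p:
                candidates = [u for u in range(n)
                              if u != v and u not in adj[v]]
                if candidates:
                    u = rng.choice(candidates)
                    adj[v].remove(w)
                    adj[w].remove(v)
                    adj[v].add(u)
                    adj[u].add(v)
    return adj
```

Each rewiring removes one edge and adds one, so the edge count Nv·r is preserved for every p.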

SLIDE 29

Numerical results

◮ Simulate the Watts-Strogatz model with Nv = 1,000 and r = 6

◮ Rewiring probability p varied from 0 (lattice Gr) to 1 (random G_{Nv,p})
◮ Normalized cl(G) and diam(G) by their maximum values (at p = 0)

[Figure: normalized cl(G) and diam(G) vs. log10(p); diam(G) drops sharply while cl(G) stays high over an intermediate 'small world' range]

◮ A broad range of p ∈ [10^(−3), 10^(−1)] yields small diam(G) and high cl(G)

SLIDE 30

Closing remarks

◮ Structural properties of the Watts-Strogatz model [Barrat-Weigt'00]

P1: Large-Nv analysis of the clustering coefficient:

cl(G) ≈ ((3r − 3)/(4r − 2)) (1 − p)^3 = cl(Gr)(1 − p)^3

P2: Degree distribution concentrated around 2r

◮ Small-world graph models are of interest across disciplines
◮ Particularly relevant to 'communication' in a broad sense

⇒ Spread of news, gossip, rumors
⇒ Spread of natural diseases and epidemics
⇒ Search for content in peer-to-peer networks

SLIDE 31

Network-growth models

Random graph models
Small-world models
Network-growth models
Exponential random graph models
Case study: Modeling collaboration among lawyers

SLIDE 32

Time-evolving networks

◮ Many networks grow or otherwise evolve in time

Ex: Web, scientific citations, Twitter, genome . . .

◮ General approach: model construction mimicking network growth

◮ Specify simple mechanisms for network dynamics
◮ Study emergent structural characteristics as time t → ∞

◮ Q: Do these properties match those observed in real-world networks?
◮ Two fundamental and popular classes of growth processes:

⇒ Preferential attachment models
⇒ Copying models

◮ Tenable mechanisms for popularity and gene duplication, respectively

SLIDE 33

Preferential attachment model

◮ Simple model for the creation of, e.g., links among Web pages
◮ Vertices are created one at a time, denoted 1, . . . , Nv
◮ When node j is created, it makes a single arc to some i, 1 ≤ i < j
◮ Creation of arc (j, i) governed by a probabilistic rule:

◮ With probability p, j links to i chosen uniformly at random
◮ With probability 1 − p, j links to i with probability ∝ d^in_i

◮ The resulting graph is directed, and each vertex has d^out_v = 1

◮ The preferential attachment model leads to "rich-gets-richer" dynamics

⇒ Arcs formed preferentially to the (currently) most popular nodes
⇒ Prob. that i increases its popularity ∝ i's current popularity
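The growth rule above can be simulated in a few lines (a minimal sketch with my own names; drawing the target of a uniformly chosen existing arc is a standard way to sample a node proportionally to its in-degree):

```python
import random

def pref_attachment(n, p, seed=None):
    """Growth with one arc per new node j:
    w.p. p the target i < j is uniform over earlier nodes; w.p. 1 - p it
    is drawn proportionally to in-degree, by picking the target of a
    uniformly chosen existing arc."""
    rng = random.Random(seed)
    arcs = [(1, 0)]       # node 1 links to node 0
    targets = [0]         # multiset of arc targets seen so far: sampling
    for j in range(2, n): # uniformly from it is sampling prop. to in-degree
        if rng.random() < p:
            i = rng.randrange(j)
        else:
            i = rng.choice(targets)
        arcs.append((j, i))
        targets.append(i)
    return arcs
```

Simulating even a few thousand nodes already shows the heavy tail: the maximum in-degree sits far above the average in-degree of about 1.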

SLIDE 34

Preferential attachment yields power laws

Theorem: The preferential attachment model gives rise to a power-law in-degree distribution with exponent α = 1 + 1/(1 − p), i.e.,

P(d^in = d) ∝ d^(−(1 + 1/(1−p)))

◮ Key: "j links to i with probability ∝ d^in_i" is equivalent to copying, i.e., "j chooses k uniformly at random, and links to i if (k, i) ∈ E"

◮ Reflect: copying others' decisions vs. independent decisions in Gn,p
◮ As p → 0 ⇒ copying more frequent ⇒ smaller α → 2
◮ Intuitive: more likely to see extremely popular pages (heavier tail)

SLIDE 35

The Barabási-Albert model

◮ The Barabási-Albert (BA) model is for undirected graphs

◮ Initial graph GBA(0) of Nv(0) vertices and Ne(0) edges (t = 0)
◮ For t = 1, 2, . . ., the current graph GBA(t − 1) grows to GBA(t) by:

◮ Adding a new vertex u of degree du(t) = m ≥ 1
◮ The m new edges are incident to m different vertices in GBA(t − 1)
◮ New vertex u is connected to v ∈ V(t − 1) w.p.

P((u, v) ∈ E(t)) = dv(t − 1) / Σ_{v′} dv′(t − 1)

◮ The vertices connected to u are chosen preferentially towards higher degrees

⇒ GBA(t) has Nv(t) = Nv(0) + t and Ne(t) = Ne(0) + tm

◮ A. Barabási and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, pp. 509-512, 1999

SLIDE 36

Linearized chord diagram

◮ The BA model is ambiguous in how to select m vertices ∝ to their degree

⇒ The joint distribution is not specified by the marginal on each vertex

◮ The linearized chord diagram (LCD) model removes these ambiguities
◮ For m = 1, start with GLCD(0) consisting of a single vertex with a self-loop
◮ For t = 1, 2, . . ., the current graph GLCD(t − 1) grows to GLCD(t) by:

◮ Adding a new vertex vt with an edge to vs ∈ V(t)
◮ Vertex vs, 1 ≤ s ≤ t, is chosen w.p.

P(s = j) = dvj(t − 1)/(2t − 1),  if 1 ≤ j ≤ t − 1
P(s = j) = 1/(2t − 1),           if j = t

◮ For m > 1, simply run the above process m times for each t
◮ Collapse all created vertices into a single one, retaining edges

◮ B. Bollobás et al, "The degree sequence of a scale-free random graph process," Random Struct. and Alg., vol. 18, pp. 279-290, 2001

SLIDE 37

Properties of the LCD model

P1) The LCD model allows loops and multi-edges, though they occur rarely
P2) GLCD(t) has a power-law degree distribution with α = 3, as t → ∞
P3) The BA model yields connected graphs if GBA(0) is connected
⇒ Not true for the LCD model, but GLCD(t) is connected w.h.p.
P4) Small-world behavior:

diam(GLCD(t)) = O(log Nv(t)) for m = 1, and O(log Nv(t)/log log Nv(t)) for m > 1

P5) Unsatisfactory clustering, since it is small for m > 1:

E[cl(GLCD(t))] ≈ ((m − 1)/8) (log Nv(t))^2 / Nv(t)

⇒ Marginally better than the O(Nv^(−1)) of classical random graphs

SLIDE 38

Copying models

◮ Copying is another mechanism of fundamental interest

Ex: gene duplication to re-use information in organism’s evolution

◮ Different from preferential attachment, but still results in power laws
◮ Initialize with a graph GC(0) (t = 0)
◮ For t = 1, 2, . . ., the current graph GC(t − 1) grows to GC(t) by:

◮ Adding a new vertex u
◮ Choosing a vertex v ∈ V(t − 1) with uniform probability 1/Nv(t − 1)
◮ Joining vertex u to each of v's neighbors independently w.p. p

◮ Case p = 1 leads to full duplication of the edges of an existing node
◮ F. Chung et al, "Duplication models for biological networks," Journal of Computational Biology, vol. 10, pp. 677-687, 2003
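The three growth steps can be sketched as follows (a minimal sketch; the function name and the single-edge seed graph are my own choices):

```python
import random

def duplication_model(t, p, seed=None):
    """Partial-duplication growth: at each of t steps, add a new vertex,
    choose an existing 'anchor' vertex v uniformly, and copy each edge
    of v independently with probability p."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # seed graph GC(0): a single edge
    for _ in range(t):
        u = len(adj)
        v = rng.randrange(u)       # uniform anchor
        adj[u] = set()
        for w in list(adj[v]):     # copy each of v's edges w.p. p
            if rng.random() < p:
                adj[u].add(w)
                adj[w].add(u)
    return adj
```

Setting p = 1 gives full duplication (the new vertex inherits all of the anchor's edges), the case that, per the next slide, does not by itself produce power-law behavior.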

SLIDE 39

Asymptotic degree distribution

◮ The degree distribution tends to a power law w.h.p. [Chung et al'03]

⇒ The exponent α is the solution to the equation p(α − 1) = 1 − p^(α−1)

[Figure: the solution α of p(α − 1) = 1 − p^(α−1), plotted as a function of p ∈ (0, 1)]

◮ Full duplication does not lead to power-law behavior; but it does if
⇒ Partial duplication is performed a fraction q ∈ (0, 1) of the time

SLIDE 40

Fitting network growth models

◮ Most common practical usage of network growth models is predictive

Goal: compare characteristics of G obs and G(t) from the models

◮ Little attempt to date to fit network growth models to data

⇒ Expected, due to the simplicity of such models
⇒ Still useful to estimate, e.g., the duplication probability p

◮ To fit a model, one ideally would like to observe a sequence {G_obs(τ)}_{τ=1}^{t}

⇒ Unfortunately, such dynamic network data remain fairly elusive

◮ Q: Can we fit a network growth model to a single snapshot G_obs?
◮ A: Yes, if we leverage the Markovianity of the growth process

SLIDE 41

Duplication-attachment models

◮ Similar to all network growth models described so far, suppose:

As1: A single vertex is added to G(t − 1) to create G(t); and
As2: The manner in which it is added depends only on G(t − 1)

◮ In other words, we assume {G(t)}_{t=0}^{∞} is a Markov chain
◮ Let graph δ(G(t), v) be obtained by deleting v and its edges from G(t)
◮ Def: vertex v is removable if G(t) can be obtained from δ(G(t), v) via copying. If G(t) has no removable vertices, we call it irreducible

◮ The class of duplication-attachment (DA) models satisfies:

(i) The initial graph G(0) is irreducible; and
(ii) Pθ(G(t) | G(t − 1)) > 0 ⇔ G(t) is obtained by copying a vertex in G(t − 1)

◮ C. Wiuf et al, "A likelihood approach to analysis of network data," PNAS, vol. 103, pp. 7566-7570, 2006

SLIDE 42

Example: reducible graph

[Figure: graph G(t) on vertices A, B, C, D, shown alongside δ(G(t), vA) and δ(G(t), vB)]

◮ Vertex vA is removable (likewise vC, by symmetry)

⇒ Obtain G(t) from δ(G(t), vA) by copying vC

◮ This implies that G(t) is reducible

⇒ Notice though that vB and vD are not removable

SLIDE 43

MLE for DA model parameters

◮ Suppose that G_obs = G(t) represents the observed network graph
◮ The likelihood of the parameter θ is given recursively by

L(θ; G(t)) = (1/t) Σ_{v ∈ R_{G(t)}} Pθ(G(t) | δ(G(t), v)) L(θ; δ(G(t), v))

⇒ R_{G(t)} is the set of all removable vertices in G(t)

◮ The MLE for θ is thus defined as

θ̂ = arg max_θ L(θ; G(t))

⇒ Computing L(θ; G(t)) is non-trivial, even for modest-size graphs

◮ Monte Carlo methods to approximate L(θ; G(t)) [Wiuf et al'06]

⇒ Open issues: vector θ, other growth models, scalability

SLIDE 44

Exponential random graph models

Random graph models
Small-world models
Network-growth models
Exponential random graph models
Case study: Modeling collaboration among lawyers

SLIDE 45

Statistical network graph models

◮ Good statistical network graph models should be [Robins-Morris'07]:

⇒ Estimable from, and reasonably representative of, the data
⇒ Theoretically plausible about the underlying network effects
⇒ Discriminative among competing effects to best explain the data

◮ Network-based versions of canonical statistical models:

⇒ Regression models - exponential random graph models (ERGMs)
⇒ Latent variable models - latent network models
⇒ Mixture models - stochastic block models

◮ Focus here on ERGMs, also known as p∗ models
◮ G. Robins et al., "An introduction to exponential random graph (p∗) models for social networks," Social Networks, vol. 29, pp. 173-191, 2007

SLIDE 46

Exponential family

◮ Def: a discrete random vector Z ∈ Z belongs to an exponential family if

Pθ(Z = z) = exp{θ⊤g(z) − ψ(θ)}

◮ θ ∈ R^p is a vector of parameters and g : Z → R^p is a function
◮ ψ(θ) is a normalization term, ensuring Σ_{z∈Z} Pθ(z) = 1

◮ Ex: Bernoulli, binomial, Poisson, geometric distributions

◮ For continuous exponential families, the pdf has an analogous form

Ex: Gaussian, Pareto, chi-square distributions

◮ Exponential families share useful algebraic and geometric properties

⇒ Mathematically convenient for inference and simulation

SLIDE 47

Exponential random graph model

◮ Let G(V, E) be a random undirected graph, with Yij := I{(i, j) ∈ E}

◮ Matrix Y = [Yij] is the random adjacency matrix, y = [yij] a realization

◮ An ERGM specifies the distribution of Y in exponential family form, i.e.,

Pθ(Y = y) = (1/κ(θ)) exp{ Σ_H θH gH(y) },

where

(i) each H is a configuration, meaning a set of possible edges in G;
(ii) gH(y) is the network statistic corresponding to configuration H,
gH(y) = Π_{yij ∈ H} yij = I{H occurs in y};
(iii) θH ≠ 0 only if all edges in H are mutually conditionally dependent; and
(iv) κ(θ) is a normalization constant ensuring Σ_y Pθ(y) = 1

SLIDE 48

Discussion

◮ Graph order Nv is fixed and given, only edges are random

⇒ Assumed unweighted, undirected edges. Extensions possible

◮ ERGMs describe random graphs 'built on' localized patterns

◮ These configurations are the structural characteristics of interest
◮ Ex: Are there reciprocity effects? Add mutual arcs as configurations
◮ Ex: Are there transitivity effects? Consider triangles

◮ (In)dependence is conditional on all other variables (edges) in G

⇒ Controls which configurations are relevant (i.e., have θH ≠ 0) in the model

◮ Well-specified dependence assumptions imply particular model classes

SLIDE 49

A general framework for model construction

◮ In positing an ERGM for a network, one implicitly follows five steps

⇒ Explicit choices connecting hypothesized theory to data analysis

Step 1: Each edge (relational tie) is regarded as a random variable
Step 2: A dependence hypothesis is proposed
Step 3: The dependence hypothesis implies a particular model form
Step 4: Simplification of parameters through, e.g., homogeneity
Step 5: Estimate and interpret the model parameters

SLIDE 50

Example: Bernoulli random graphs

◮ Assume edges present independently of all other edges (e.g., in Gn,p)

⇒ Simplest possible (and unrealistic) dependence assumption

◮ For each (i, j), assume Yij independent of Yuv, for all (u, v) ≠ (i, j)

⇒ θH = 0 for all H involving two or more edges

◮ Only edge configurations are relevant, i.e., gH(y) = yij, and the ERGM becomes

Pθ(Y = y) = (1/κ(θ)) exp{ Σ_{i,j} θij yij }

◮ Specifies that edge (i, j) is present independently, with probability

pij = exp(θij) / (1 + exp(θij))

SLIDE 51

Constraints on parameters: homogeneity

◮ Too many parameters make estimation infeasible from a single y

⇒ Under independence we have O(Nv^2) parameters {θij}. Reduction?

◮ Homogeneity across all of G, i.e., θij = θ for all (i, j), yields

Pθ(Y = y) = (1/κ(θ)) exp{θ L(y)}

◮ The relevant statistic is the number of observed edges L(y) = Σ_{i,j} yij

◮ ERGM identical to Gn,p, where p = exp(θ)/(1 + exp(θ))

Ex: suppose we know a priori that vertices fall into two sets

◮ Can impose homogeneity on edges within and between sets, i.e.,

Pθ(Y = y) = (1/κ(θ)) exp{θ1 L1(y) + θ12 L12(y) + θ2 L2(y)}

SLIDE 52

Example: Markov random graphs

◮ Markov dependence notion for network graphs [Frank-Strauss’86]

◮ Assumes two ties are dependent if they share a common node
◮ Edge status Yij is dependent on any other edge involving i or j

Theorem: Under homogeneity, G is a Markov random graph if and only if

Pθ(Y = y) = (1/κ(θ)) exp{ Σ_{k=1}^{Nv−1} θk Sk(y) + θτ T(y) },

where Sk(y) is the number of k-stars and T(y) the number of triangles

[Figure: a 1-star (edge), a 2-star, a 3-star, and a triangle]

SLIDE 53

Alternative statistics

◮ Including many higher-order terms challenges estimation

⇒ High-order star effects often omitted, e.g., θk = 0 for k ≥ 4
⇒ But these models tend to fit real data poorly. Dilemma?

◮ Idea: impose the parametric form θk ∝ (−1)^k λ^(2−k) [Snijders et al'06]
◮ Combine the Sk(y), k ≥ 2, into a single alternating k-star statistic, i.e.,

AKSλ(y) = Σ_{k=2}^{Nv−1} (−1)^k Sk(y)/λ^(k−2),  λ > 1

◮ Can show AKSλ(y) is ∝ the geometrically-weighted degree count

GWDγ(y) = Σ_{d=0}^{Nv−1} e^(−γd) Nd(y),  γ > 0

⇒ Nd(y) is the number of vertices with degree d

SLIDE 54

Incorporating vertex attributes

◮ Straightforward to incorporate vertex attributes to ERGMs

Ex: gender, seniority in organization, protein function

◮ Consider a realization x of a random vector X ∈ R^{Nv} defined on V
◮ Specify an exponential family form for the conditional distribution

Pθ(Y = y | X = x)

⇒ Will include additional statistics g(·) of both y and x

◮ Ex: configurations for Markov dependence with binary vertex attributes

SLIDE 55

Estimating ERGM parameters

◮ The MLE for the parameter vector θ in an ERGM is

    θ̂ = arg max_θ { θ⊤g(y) − ψ(θ) },  where ψ(θ) := log κ(θ)

◮ The optimality condition yields g(y) = ∇ψ(θ)|_{θ = θ̂}

◮ Using also that E_θ[g(Y)] = ∇ψ(θ), the MLE solves E_θ̂[g(Y)] = g(y)

◮ Unfortunately ψ(θ) cannot be computed except for small graphs

⇒ Involves a summation over 2^(N_v choose 2) values of y for each θ

⇒ Numerical methods needed to obtain approximate values of θ̂

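For the edge-count-only ERGM the enumeration defining ψ(θ) can actually be carried out on tiny graphs, and the moment equation E_θ̂[L(Y)] = L(y) has a closed-form solution. A sketch (the closed forms below hold only for this toy model):

```python
from itertools import product
from math import comb, exp, log

def psi_edge_model(n_v, theta):
    """Brute-force psi(theta) = log kappa(theta) for the edge-count-only
    ERGM by enumerating all 2^C(n_v, 2) graphs; feasible only for tiny
    n_v. For this model the closed form C(n_v, 2) * log(1 + e^theta)
    exists, which makes the enumeration easy to sanity-check."""
    n_dyads = comb(n_v, 2)
    return log(sum(exp(theta * sum(y))
                   for y in product((0, 1), repeat=n_dyads)))

def mle_theta(n_v, n_edges):
    """Solve the moment equation E_theta[L(Y)] = L(y): here
    E_theta[L(Y)] = C(n_v, 2) * e^theta / (1 + e^theta), so the MLE is
    the logit of the observed edge density."""
    p_hat = n_edges / comb(n_v, 2)
    return log(p_hat / (1 - p_hat))
```

Even at N_v = 10 the sum already ranges over 2^45 ≈ 3.5 × 10^13 graphs, which is why the slide's conclusion about needing numerical approximation kicks in almost immediately.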
SLIDE 56

Proof of E [g(Y)] = ∇ψ(θ)

◮ The pmf of Y is P_θ(Y = y) = exp{θ⊤g(y) − ψ(θ)}, hence

    E_θ[g(Y)] = Σ_y g(y) P_θ(Y = y) = Σ_y g(y) exp{θ⊤g(y) − ψ(θ)}

◮ Recall ψ(θ) = log Σ_y exp{θ⊤g(y)} and use the chain rule

    ∇ψ(θ) = [ Σ_y g(y) exp{θ⊤g(y)} ] / [ Σ_y exp{θ⊤g(y)} ]
          = [ Σ_y g(y) exp{θ⊤g(y)} ] / exp ψ(θ)
          = Σ_y g(y) exp{θ⊤g(y) − ψ(θ)}

◮ The two sums are identical ⇒ E_θ[g(Y)] = ∇ψ(θ) follows

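The identity can be checked numerically on the toy edge-count model by enumerating all graphs and comparing the exact mean E_θ[L(Y)] against a finite-difference derivative of ψ(θ). A sketch (helper name is mine):

```python
from itertools import product
from math import comb, exp, log

def psi_and_mean(n_v, theta):
    """Enumerate all graphs on n_v vertices to compute psi(theta) and
    E_theta[L(Y)] exactly for the edge-count ERGM (tiny n_v only)."""
    n_dyads = comb(n_v, 2)
    terms = [(sum(y), exp(theta * sum(y)))
             for y in product((0, 1), repeat=n_dyads)]
    kappa = sum(w for _, w in terms)
    mean_edges = sum(n * w for n, w in terms) / kappa
    return log(kappa), mean_edges

# Check E_theta[g(Y)] = d psi / d theta via a central finite difference
theta, h = 0.3, 1e-5
grad_fd = (psi_and_mean(4, theta + h)[0]
           - psi_and_mean(4, theta - h)[0]) / (2 * h)
mean_L = psi_and_mean(4, theta)[1]
```

The two numbers agree to within the finite-difference error, mirroring the algebraic proof above.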
SLIDE 57

Markov chain Monte Carlo MLE

◮ Idea: for fixed θ_0, maximize instead the log-likelihood ratio

    r(θ, θ_0) = ℓ(θ) − ℓ(θ_0) = (θ − θ_0)⊤g(y) − [ψ(θ) − ψ(θ_0)]

◮ Key identity: will show that

    exp{ψ(θ) − ψ(θ_0)} = E_θ0[ exp{(θ − θ_0)⊤g(Y)} ]

◮ Markov chain Monte Carlo MLE algorithm to search over θ

    Step 1: draw samples Y_1, . . . , Y_n from the ERGM under θ_0
    Step 2: approximate the above E_θ0[·] via sample averaging
    Step 3: the logarithm of the result approximates ψ(θ) − ψ(θ_0)
    Step 4: evaluate an approximate log-likelihood ratio r(θ, θ_0)

◮ For large n, the maximum value found approximates the MLE θ̂

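Steps 1–3 can be run end-to-end on the toy edge-count model and checked against the closed-form ψ, since exact sampling under θ_0 amounts to independent Bernoulli dyads (no Markov chain is actually needed in this special case; the sketch below only illustrates the sample-averaging identity):

```python
import math
import random

def mc_psi_diff(n_v, theta, theta0, n_samples=50000, seed=1):
    """Monte Carlo estimate of psi(theta) - psi(theta0) via the key
    identity
        exp{psi(theta) - psi(theta0)} = E_theta0[exp{(theta - theta0) L(Y)}].
    For the edge-count-only ERGM, sampling under theta0 is exact: each
    dyad is an independent Bernoulli(p0) draw."""
    rng = random.Random(seed)
    n_dyads = math.comb(n_v, 2)
    p0 = math.exp(theta0) / (1 + math.exp(theta0))
    acc = 0.0
    for _ in range(n_samples):
        n_edges = sum(rng.random() < p0 for _ in range(n_dyads))
        acc += math.exp((theta - theta0) * n_edges)
    return math.log(acc / n_samples)

# Closed form for this toy model: psi(theta) = C(n_v, 2) * log(1 + e^theta)
est = mc_psi_diff(10, 0.2, 0.0)
exact = math.comb(10, 2) * (math.log(1 + math.exp(0.2)) - math.log(2))
```

In practice the estimate degrades quickly when θ is far from θ_0, which is why MCMC MLE implementations re-center θ_0 iteratively.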
SLIDE 58

Derivation of key identity

◮ Recall exp ψ(θ) = Σ_y exp{θ⊤g(y)} to write

    exp{ψ(θ) − ψ(θ_0)} = [ Σ_y exp{θ⊤g(y)} ] / exp ψ(θ_0)

◮ Multiplying and dividing each term by exp{θ_0⊤g(y)} > 0 yields

    exp{ψ(θ) − ψ(θ_0)} = Σ_y exp{(θ − θ_0)⊤g(y)} × exp{θ_0⊤g(y)} / exp ψ(θ_0)
                       = Σ_y exp{(θ − θ_0)⊤g(y)} P_θ0(Y = y)
                       = E_θ0[ exp{(θ − θ_0)⊤g(Y)} ]

◮ Used that exp{θ_0⊤g(y) − ψ(θ_0)} is the exponential family pmf P_θ0(Y = y)

SLIDE 59

Model goodness-of-fit

◮ Best fit chosen from a given class of models . . . may not be a good
fit to the data if the model class is not rich enough

◮ Assessing goodness-of-fit for ERGMs

    Step 1: simulate numerous random graphs from the fitted model
    Step 2: compare high-level characteristics with those of G_obs
    Ex: distributions of degree, centrality, diameter

◮ If significant differences are found relative to G_obs, conclude

⇒ Systematic gap between specified model class and data

⇒ Lack of goodness-of-fit

◮ Take home: model specification for ERGMs highly nontrivial

⇒ Goodness-of-fit diagnostics can play key facilitating role

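A stripped-down version of this procedure for the edge-count-only model (a sketch with an illustrative function name; real ERGM goodness-of-fit compares whole distributions of characteristics, not a single scalar):

```python
import math
import random

def gof_quantile(n_v, theta_hat, stat_obs, n_sims=500, seed=2):
    """Goodness-of-fit sketch: simulate graphs from the fitted
    (edge-count-only) ERGM and return the empirical quantile of the
    observed statistic among the simulated ones. Quantiles near 0 or 1
    flag a systematic gap between model class and data."""
    rng = random.Random(seed)
    p = math.exp(theta_hat) / (1 + math.exp(theta_hat))
    n_dyads = math.comb(n_v, 2)
    sims = [sum(rng.random() < p for _ in range(n_dyads))
            for _ in range(n_sims)]
    return sum(s <= stat_obs for s in sims) / n_sims

q = gof_quantile(36, 0.0, 315)  # theta_hat = 0 gives p = 0.5, E[L(Y)] = 315
```

An observed count near the simulated median (quantile around 0.5) is consistent with the fitted model; extreme quantiles suggest the model class is not rich enough.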
SLIDE 60

Case study

Random graph models
Small-world models
Network-growth models
Exponential random graph models
Case study: Modeling collaboration among lawyers

SLIDE 61

Lawyer collaboration network

◮ Network G_obs of working relationships among lawyers [Lazega'01]

◮ Nodes are N_v = 36 partners, edges indicate partners worked together

[Figure: collaboration graph among the 36 partners, nodes labeled 1–36]

◮ Data includes various node-level attributes:

    ◮ Seniority (node labels indicate rank ordering)
    ◮ Office location (triangle, square or pentagon)
    ◮ Type of practice, i.e., litigation (red) and corporate (cyan)
    ◮ Gender (three partners are female, labeled 27, 29 and 34)

◮ Goal: study cooperation among social actors in an organization

SLIDE 62

Modeling lawyer collaborations

◮ Assess network effects S_1(y) = N_e and the alternating k-triangle statistic

    AKT_λ(y) = 3T_1(y) + Σ_{k=2}^{N_v−2} (−1)^{k+1} T_k(y) / λ^{k−1}

⇒ T_k(y) counts sets of k individual triangles sharing a common base

◮ Test the following set of exogenous effects:

    h^(1)(x_i, x_j) = seniority_i + seniority_j,
    h^(2)(x_i, x_j) = practice_i + practice_j,
    h^(3)(x_i, x_j) = I{practice_i = practice_j},
    h^(4)(x_i, x_j) = I{gender_i = gender_j},
    h^(5)(x_i, x_j) = I{office_i = office_j},

    h(x_i, x_j) := [h^(1)(x_i, x_j), . . . , h^(5)(x_i, x_j)]⊤

◮ Resulting ERGM

    P_{θ,β}(Y = y | X = x) = (1/κ(θ, β)) exp{θ_1 S_1(y) + θ_2 AKT_λ(y) + β⊤g(y, x)},

    where g(y, x) = Σ_{i,j} y_ij h(x_i, x_j)

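The covariate statistic g(y, x) is just a sum of h(x_i, x_j) over observed edges. A sketch with hypothetical attribute values (the function name and sample data are mine, not the actual Lazega attributes):

```python
def edge_covariate_stats(edges, attrs):
    """Compute g(y, x) = sum over observed edges of h(x_i, x_j) for the
    five exogenous effects above. `attrs` maps each vertex to a dict
    with keys 'seniority', 'practice', 'gender', 'office'."""
    g = [0.0] * 5
    for i, j in edges:
        xi, xj = attrs[i], attrs[j]
        g[0] += xi['seniority'] + xj['seniority']  # h(1): main seniority effect
        g[1] += xi['practice'] + xj['practice']    # h(2): main practice effect
        g[2] += xi['practice'] == xj['practice']   # h(3): practice homophily
        g[3] += xi['gender'] == xj['gender']       # h(4): gender homophily
        g[4] += xi['office'] == xj['office']       # h(5): office homophily
    return g
```

The homophily terms h^(3)–h^(5) simply count edges whose endpoints match on the attribute, so their β coefficients act on the log-odds of a tie between matching partners.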
SLIDE 63

Model fitting result

◮ Fitting results using the MCMC MLE approach

⇒ Standard errors heuristically obtained via asymptotic theory

◮ Identified factors that may increase the odds of cooperation

    Ex: shared practice, gender, and office location double the odds

◮ Strong evidence for transitivity effects since θ̂_2 ≫ se(θ̂_2)

⇒ Something beyond basic homophily explains such effects

SLIDE 64

Assessing goodness-of-fit

◮ Assess goodness-of-fit to G_obs

◮ Sample from fitted ERGM

◮ Compared distributions of

    ◮ Degree
    ◮ Edge-wise shared partners
    ◮ Geodesic distance

◮ Plots show good fit overall

[Figure: goodness-of-fit boxplots — proportion of nodes vs. degree, proportion of edges vs. edge-wise shared partners, proportion of dyads vs. minimum geodesic distance]

SLIDE 65

Glossary

◮ Network graph model
◮ Random graph models
◮ Configuration model
◮ Matching algorithm
◮ Switching algorithm
◮ Model-based estimation
◮ Assessing significance
◮ Reference distribution
◮ Network motif
◮ Small-world network
◮ Decentralized search
◮ Watts-Strogatz model
◮ Time-evolving network
◮ Network-growth models
◮ Preferential attachment
◮ Barabási-Albert model
◮ Copying models
◮ Exponential family
◮ Exponential random graph models
◮ Configurations
◮ Network statistic
◮ Homogeneity
◮ Markov random graphs
