
Data Science Summer School Part II: Network Science Lecture 2/2 G. Caldarelli, networks.imtlucca.it September 2, 2019

Motivation: Modelling
The art of modelling is based on:
◮ Finding the most important features
◮ Realizing a synthetic system based on these features
◮ Checking whether the model can reproduce the real system
◮ Predicting the future behaviour of the system through the model

Hidden and Evident Hypotheses
Graphs connect
◮ parts of cities across rivers
◮ buildings
◮ offices in the same building
Vertices are stable, and edge creation has a finite, non-negligible cost.

History
The main motivation for creating Random Graph theory was to provide
◮ a benchmark for the connection of various vertices
◮ in the case of connecting different buildings with costly phone lines

Definition
◮ Take a fixed number of vertices $N$
◮ Initially no edge is present
◮ We draw a set of $m$ edges out of the $N(N-1)/2$ available
◮ Every edge is extracted with a fixed probability $p$
This model is known as the Random Graph model [Erdős et al. 1959, Gilbert 1959]. No "particular" vertex can be found.

Common Definition
◮ Take $N$ vertices
◮ For every pair of vertices, draw a link with probability $p$
Expected value of Graph
The total number of edges $m$ is a random variable with expectation value $E(m) = p\,N(N-1)/2$. If $G_0$ is a graph with $N$ nodes and $m$ edges, the probability of obtaining it by this graph construction process is
$$P(G_0) = p^m (1-p)^{N(N-1)/2 - m}$$
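The construction above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library; the function names `gnp_random_graph` and `expected_edges`, and the values of `n`, `p` and `seed`, are assumptions made for the example and do not come from the slides.

```python
import itertools
import random


def gnp_random_graph(n, p, seed=None):
    """Return the edge list of one realization of the random graph.

    Every one of the n(n-1)/2 vertex pairs is visited once and kept
    as an edge with probability p, independently of the others.
    """
    rng = random.Random(seed)
    return [(i, j)
            for i, j in itertools.combinations(range(n), 2)
            if rng.random() < p]


def expected_edges(n, p):
    """Expectation value E(m) = p * n(n-1)/2 from the slide."""
    return p * n * (n - 1) / 2


# Illustrative values (assumed, not from the source).
edges = gnp_random_graph(200, 0.05, seed=1)
print("edges drawn:", len(edges), "expected:", expected_edges(200, 0.05))
```

The number of edges drawn fluctuates from one realization to the next, but it should stay close to $E(m)$ when $N$ is large.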

First use
◮ a benchmark for the connection of various vertices
◮ in the case of connecting different buildings with costly phone lines

Degree Distribution
Similarly, it is possible to determine the degree distribution [Bollobas 1985]. For a vertex to have degree $k$
◮ an edge must be drawn $k$ times out of the $N-1$ possible ones, which has probability $p^k (1-p)^{(N-1)-k}$
◮ this can happen in $\binom{N-1}{k} = \frac{(N-1)!}{(N-1-k)!\,k!}$ combinations
so that $P_k = \binom{N-1}{k}\, p^k (1-p)^{(N-1)-k}$. This distribution is automatically normalized, since
$$\sum_{k=0}^{N-1} P_k = \left(p + (1-p)\right)^{N-1} = 1.$$

Degree Distribution II
This distribution is usually approximated by the Poisson distribution. In the two limits $N \to \infty$ and $p \to 0$ (with $Np$ kept constant and $N - 1 \simeq N$) we have
$$P_k = \frac{N!}{(N-k)!\,k!}\, p^k (1-p)^{N-k} \simeq \frac{(Np)^k e^{-pN}}{k!}.$$
Since the mean value $\langle k \rangle$ of the above distribution is given by $Np$, we can write
$$P_k = \frac{\langle k \rangle^k e^{-\langle k \rangle}}{k!}.$$
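To see how good the Poisson approximation is, one can compare the two expressions numerically. The sketch below is an illustration under stated assumptions: the helper names `binomial_pk`, `poisson_pk` and `max_abs_gap`, the mean degree of 4 and the cutoff `k_max` are all chosen for the example.

```python
import math


def binomial_pk(k, n, p):
    """Exact degree distribution: C(n-1, k) p^k (1-p)^(n-1-k)."""
    return math.comb(n - 1, k) * p**k * (1 - p)**(n - 1 - k)


def poisson_pk(k, mean_k):
    """Poisson approximation: <k>^k exp(-<k>) / k!."""
    return mean_k**k * math.exp(-mean_k) / math.factorial(k)


def max_abs_gap(n, mean_k, k_max=30):
    """Largest absolute difference between the two distributions for k <= k_max.

    p is set to mean_k / n, so that Np stays constant while n grows.
    """
    p = mean_k / n
    return max(abs(binomial_pk(k, n, p) - poisson_pk(k, mean_k))
               for k in range(k_max + 1))


# The gap shrinks as N grows with Np held fixed (illustrative values).
for n in (100, 1000, 10000):
    print(n, max_abs_gap(n, mean_k=4.0))
```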

Degree Distribution III
◮ The above results tell us that a characteristic degree exists.
◮ This corresponds to the mean value $\langle k \rangle = Np$.
◮ Both larger and smaller values are less probable.
◮ In this respect the random graph model does not reproduce complex networks.

Clustering
We can give an estimate of the Clustering Coefficient: for a complete graph it must be 1. If the graph is sparse enough, then two points link to each other with probability $p$.
Expected value
$$E(C) \simeq p = \frac{\langle k \rangle}{N}$$

Diameter
A similar estimate can be given for the average distance $l$ between two vertices. If a graph has average degree $\langle k \rangle$, then
◮ the first neighbours will be $\langle k \rangle$
◮ the second neighbours will be at most $\langle k \rangle^2$
◮ the $n$-th neighbours will be at most $\langle k \rangle^n$
◮ for the diameter $D$, we assume $\langle k \rangle^D$ is of order $N$
Expected values
$$\langle l \rangle \le D \simeq \frac{\log N}{\log \langle k \rangle}$$

Connectedness
◮ If $\langle k \rangle = pN < 1$, a typical graph is composed of isolated trees and its diameter equals the diameter of a tree.
◮ If $\langle k \rangle > 1$, a giant cluster appears. The diameter of the graph equals the diameter of the giant cluster if $\langle k \rangle > 3.5$, and is proportional to $\ln(N) / \ln(\langle k \rangle)$.
◮ If $\langle k \rangle > \ln(N)$, almost every graph is totally connected. The diameters of the graphs having the same $N$ and $\langle k \rangle$ are concentrated on a few values around $\ln(N) / \ln(\langle k \rangle)$.
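The appearance of the giant cluster around $\langle k \rangle = 1$ can be checked with a small simulation. The sketch below samples one random graph and tracks its clusters with a union-find structure. The function name `largest_component_fraction` and the values of `n`, `mean_k` and `seed` are assumptions made for illustration.

```python
import itertools
import random
from collections import Counter


def largest_component_fraction(n, mean_k, seed=None):
    """Fraction of vertices in the largest cluster of one random graph
    with n vertices and edge probability p = mean_k / n."""
    rng = random.Random(seed)
    p = mean_k / n
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j  # merge the two clusters

    sizes = Counter(find(v) for v in range(n))
    return max(sizes.values()) / n


# Below, above, and well above the threshold <k> = 1 (illustrative values).
for mean_k in (0.5, 1.5, 3.0):
    print(mean_k, largest_component_fraction(1000, mean_k, seed=0))
```

For $\langle k \rangle < 1$ the largest cluster should hold only a small fraction of the vertices, while for $\langle k \rangle > 1$ a finite fraction of the graph belongs to it.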

Coloring of a map
The theorem
Given any separation of a plane into contiguous regions, producing a figure called a map, no more than four colors are required to color the regions of the map so that no two adjacent regions have the same color.

Counterexamples
Two regions are called adjacent if they share a common boundary that is not a corner, where corners are the points shared by three or more regions. For example, in the map of the United States of America, Utah and Arizona are adjacent, but Utah and New Mexico, which only share a point that also belongs to Arizona and Colorado, are not.

Graph theory
This problem can be easily visualized with planar graphs. The set of regions of a map can be represented more abstractly as an undirected graph that has a vertex for each region and an edge for every pair of regions that share a boundary segment.

The Percolation model
Percolation
Sites (or bonds) of a lattice are chosen with probability $p$. By varying $p$ we obtain different clusters [Stauffer 2009].
◮ Bond percolation on a 2D lattice ($25 \times 25$).
◮ Two nodes are connected by an edge with probability $p$.
◮ Two realizations: left $p = 0.315$, right $p = 0.525$
At $p = p_c = 0.5$, the bonds form a single cluster. This value is called the percolation threshold.
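A bond percolation experiment of this kind can be sketched as follows. This is an illustrative sketch, not the code used for the slide: the function name `bond_percolation_largest_cluster`, the open boundary conditions and the seed are assumptions, while the lattice size and the two values of $p$ are those quoted above.

```python
import random
from collections import Counter


def bond_percolation_largest_cluster(L, p, seed=None):
    """Size (in sites) of the largest cluster for bond percolation
    on an L x L square lattice with open boundaries (an assumption)."""
    rng = random.Random(seed)
    parent = list(range(L * L))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    for r in range(L):
        for c in range(L):
            site = r * L + c
            # Bond to the right-hand neighbour, open with probability p.
            if c + 1 < L and rng.random() < p:
                union(site, site + 1)
            # Bond to the neighbour below, open with probability p.
            if r + 1 < L and rng.random() < p:
                union(site, site + L)

    sizes = Counter(find(s) for s in range(L * L))
    return max(sizes.values())


# The two values of p quoted on the slide, on a 25 x 25 lattice.
for p in (0.315, 0.525):
    print(p, bond_percolation_largest_cluster(25, p, seed=0))
```

Because the lattice is finite, the size of the largest cluster varies between realizations, especially close to $p_c$.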

The Percolation model
Percolation arises in a variety of systems
◮ coffee (with a percolator)
◮ water injected into rocks to extract oil (invasion percolation)
◮ certain types of fractures (mud cracking)
◮ networks (robustness to random and targeted attacks)
◮ wildfire propagation
◮ epidemic spreading
How is this possible?
Universality
There are properties, shared by a large class of systems, that are independent of the dynamical details of the system. Systems display universality in a scaling limit, when a large number of interacting parts come together.

Percolation and Random Graphs
For $p < p_c = 1/N$
◮ The probability of a giant cluster in a graph, and of an infinite cluster in percolation, is equal to 0.
◮ The clusters of a random graph are trees, while the clusters in percolation have a fractal structure and a perimeter proportional to their volume.
◮ The largest cluster in a random graph is a tree with $\ln(N)$ nodes, while in general for percolation $P_p(|C| = s) \simeq e^{-s/\xi}$, suggesting that the size of the largest cluster scales as $\ln(N)$.

Percolation and Random Graphs
For $p = p_c = 1/N$
◮ A unique giant cluster or an infinite cluster appears.
◮ The size of the giant cluster is $N^{2/3}$, while for infinite-dimensional percolation $P_p(|C| = s) \sim s^{-3/2}$, thus the size of the largest cluster scales as $N^{2/3}$.

Percolation and Random Graphs
For $p > p_c = 1/N$
◮ The size of the giant cluster is $(f(p_c N) - f(pN))\,N$, where $f$ is an exponentially decreasing function with $f(1) = 1$. The size of the infinite cluster is $\propto (p - p_c)\,N$.
◮ The giant cluster has a complex structure containing cycles, while the infinite cluster is no longer fractal, but compact.

Configuration model
◮ Let's start with the degree sequence.
◮ Imagine that each node has edge "stubs" attached to it [Bender et al. 1978, Molloy et al. 1995].
◮ Edges are then assigned by randomly choosing two stubs and drawing an edge between them.

How to build the graph
As we see here, it can happen that we end up with multiple edges.
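The stub-matching procedure can be sketched as below. This is a minimal illustration that assumes the plain version of the procedure, in which self-loops and multiple edges are kept rather than rejected. The names `configuration_model` and `count_defects` and the example degree sequence are hypothetical.

```python
import random
from collections import Counter


def configuration_model(degrees, seed=None):
    """Pair stubs uniformly at random; return the resulting edge list.

    degrees[v] is the prescribed degree of vertex v. Self-loops and
    multiple edges are kept, since the plain procedure can produce them.
    """
    if sum(degrees) % 2:
        raise ValueError("the degree sequence must have an even sum")
    rng = random.Random(seed)
    # One stub per unit of degree: vertex v appears degrees[v] times.
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    # Consecutive stubs in the shuffled list are joined into edges.
    return [tuple(sorted(stubs[i:i + 2])) for i in range(0, len(stubs), 2)]


def count_defects(edges):
    """Return (number of self-loops, number of extra parallel edges)."""
    self_loops = sum(1 for u, v in edges if u == v)
    multiplicity = Counter(e for e in edges if e[0] != e[1])
    extra_parallel = sum(c - 1 for c in multiplicity.values())
    return self_loops, extra_parallel


# Hypothetical degree sequence, used only for illustration.
edges = configuration_model([3, 3, 2, 2, 1, 1], seed=0)
print(edges, count_defects(edges))
```

Every vertex ends up with exactly its prescribed degree, provided a self-loop is counted twice.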

Probability of connections
Let $k_i$, $k_j$ denote the non-zero degrees of two particular vertices $i$, $j$ in a network of $m$ edges. For a particular stub attached to vertex $i$, there are $k_j$ stubs on $j$ it can connect to, out of $2m - 1$ possible ones. Since $i$ has $k_i$ stubs, the probability that $i$ and $j$ are connected is given by
$$\frac{k_i k_j}{2m - 1} \simeq \frac{k_i k_j}{2m}$$

Number of multiple edges
The probability that a second edge appears between $i$, $j$ is
$$\frac{(k_i - 1)(k_j - 1)}{2m}$$
Thus, the probability of both a first and a second edge is
$$\frac{k_i k_j (k_i - 1)(k_j - 1)}{(2m)^2}.$$
We can now obtain the number of multiple edges by summing over all possible pairs.
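The text stops before carrying out this sum. As a sketch of the remaining step, assuming a large network, using $2m = \sum_i k_i = N \langle k \rangle$, and neglecting the $i = j$ terms when the sum over pairs is extended to all $i$, $j$, the expected number of multiple edges $\langle n_{\mathrm{multi}} \rangle$ (a symbol introduced here for convenience) is

```latex
\begin{aligned}
\langle n_{\mathrm{multi}} \rangle
  &\simeq \sum_{i<j} \frac{k_i k_j (k_i - 1)(k_j - 1)}{(2m)^2}
   \simeq \frac{1}{2} \left[ \frac{1}{2m} \sum_i k_i (k_i - 1) \right]^2 \\
  &= \frac{1}{2} \left[ \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle} \right]^2
\end{aligned}
```

Under these assumptions the result depends only on the first two moments of the degree distribution, so it stays finite as $N$ grows whenever $\langle k^2 \rangle$ is finite.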
