AN INTRODUCTION TO NETWORK SCIENCE Nicola Perra - - PowerPoint PPT Presentation



slide-1
SLIDE 1

n.perra@greenwich.ac.uk @net_science

Nicola Perra

AN INTRODUCTION TO NETWORK SCIENCE

slide-2
SLIDE 2

Systems are nothing but the sum of their parts

REDUCTIONISM: DOMINANT APPROACH IN SCIENCE

slide-3
SLIDE 3

NOT ALWAYS A GOOD APPROACH

By studying the interactions of single individuals can we understand the structure of a company?

slide-4
SLIDE 4

NOT ALWAYS A GOOD APPROACH

By studying the interactions of single individuals can we understand the spreading of infectious diseases?

slide-5
SLIDE 5

NOT ALWAYS A GOOD APPROACH

By studying the tweets of single Twitter users can we understand the emergence of social protests?

slide-6
SLIDE 6

NOT ALWAYS A GOOD APPROACH

By studying the properties of single webpages can we build an efficient search engine?

slide-7
SLIDE 7

NOT ALWAYS A GOOD APPROACH

By studying the properties of a single molecule of water can we understand the transition from ice to liquid water?

slide-8
SLIDE 8

MORE IS DIFFERENT!

[...The main fallacy [of] the reductionist hypothesis [is that it] does not by any means imply a “constructionist” one: The ability to reduce everything to simple fundamental laws does not imply the ability to start from those laws and reconstruct the universe. In fact, the more the elementary particle physicists tell us about the nature of the fundamental laws, the less relevance they seem to have to the very real problems of the rest of science, much less to those of society...]

Anderson, P.W., "More is Different", Science, 177(4047), 1972

slide-9
SLIDE 9

COMPLEXITY

Holistic perspective

  • Study systems as a whole
  • Focus shifts to emergent phenomena
slide-10
SLIDE 10

COMPLEX SYSTEMS

Properties:

  • Complex systems are the spontaneous outcome of the interactions among the system's constitutive units
  • They are self-organizing systems. There is no blueprint or global supervision
  • Their behavior cannot be described from the properties of each constitutive unit
slide-11
SLIDE 11

COMPLEX SYSTEMS

Complex DOES NOT mean complicated!

slide-12
SLIDE 12

COMPLEX SYSTEMS REPRESENTATION

Many complex systems can be described as a graph

  • Nodes/vertices describe their constitutive units
  • Links/edges describe the interaction between them

If, after this abstraction, the complex features are still present

  • Complex Networks!
slide-13
SLIDE 13

WHY DO WE CARE?

Complex Networks are ubiquitous!

Biological networks

  • Biochemical networks: molecular-level interactions and mechanisms of control in the cell
  • Example 1) metabolic networks. Nodes are chemicals. Links describe the reactions
  • Example 2) protein-protein interaction networks. Nodes are proteins. Links their interactions

Nature Biotechnology 20, 991 - 997 (2002)

slide-14
SLIDE 14

WHY DO WE CARE?

Biological networks

  • Example 3) gene regulatory networks. Nodes are genes. A directed link from i to j implies that gene i regulates the expression of gene j

  • Example 4) neural networks. Nodes are neurons. Links describe the synapses
slide-15
SLIDE 15

WHY DO WE CARE?

Biological networks

  • Ecological networks. Nodes are species. Links their interactions
  • Example 1) Food webs. Nodes are species. Links describe predator-prey interactions

http://www.uic.edu/classes/bios/bios101/

slide-16
SLIDE 16

WHY DO WE CARE?

Networks of information

  • Data items, connected in some way
  • World Wide Web. Nodes are webpages. Links are the connections between them
  • Citation networks. Nodes are papers (or patents, or legal documents). Links are the citations between them
slide-17
SLIDE 17

WHY DO WE CARE?

Technological Networks

  • Phone networks
  • Internet
  • Power grids
  • Transportation networks
slide-18
SLIDE 18

WHY DO WE CARE?

Social Networks

  • Interviews and questionnaires
  • Data from archival or third-party records
slide-19
SLIDE 19

WHY DO WE CARE?

Social Networks

  • Co-authorship networks
  • Face-to-face networks

http://www.sociopatterns.org/

slide-20
SLIDE 20

NETWORKS REPRESENTATION AND THEIR STATISTICAL FEATURES

slide-21
SLIDE 21

NETWORKS AS GRAPHS

Basic Ingredients

  • basic units: nodes/vertices
  • their interactions: links, edges, connections

N nodes, E links: G(N, E)

slide-22
SLIDE 22

NETWORKS AS GRAPHS

Mathematical representation

  • adjacency matrix

$A_{ij} = \begin{cases} 1 & \text{if there is a connection between } i \text{ and } j \\ 0 & \text{otherwise} \end{cases}$
slide-23
SLIDE 23

UNDIRECTED NETWORKS

Symmetrical connections -> symmetrical adjacency matrix

$A = A^T$

slide-24
SLIDE 24

DIRECTED NETWORKS

Links (arcs) have direction

$A \neq A^T$
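
As a minimal sketch of the adjacency-matrix representation, the snippet below builds the matrix of an undirected and of a directed network from a list of links and compares each with its transpose. The four-node example and the helper names (`adjacency_matrix`, `transpose`) are assumptions made for illustration; they are not part of the slides.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build an n x n adjacency matrix (list of lists) from (i, j) pairs."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        if not directed:
            A[j][i] = 1  # an undirected link is stored in both directions
    return A


def transpose(A):
    """Return the transpose of a square matrix."""
    return [list(row) for row in zip(*A)]


# Hypothetical 4-node example, used only for illustration.
edges = [(0, 1), (0, 2), (2, 3)]
A_undirected = adjacency_matrix(4, edges)
A_directed = adjacency_matrix(4, edges, directed=True)

print(A_undirected == transpose(A_undirected))  # symmetry check for the undirected case
print(A_directed == transpose(A_directed))      # symmetry check for the directed case
```

Nested lists keep the sketch dependency-free; for large, sparse networks an edge list or a sparse-matrix format would be the more usual choice.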

slide-25
SLIDE 25

WEIGHTED NETWORKS

Links are not simply binary

$A_{ij} = \begin{cases} w_{ij} & \text{if } i \text{ and } j \text{ interacted } w_{ij} \text{ times} \\ 0 & \text{otherwise} \end{cases}$

Weights are typically positive, but this is not required (signed networks)

slide-26
SLIDE 26

BIPARTITE NETWORKS

Two types of vertices

Incidence matrix [m, n]

$B_{ij} = \begin{cases} 1 & \text{if } j \text{ belongs to } i \\ 0 & \text{otherwise} \end{cases}$
slide-27
SLIDE 27

PROJECTIONS OF BIPARTITE NETWORKS

[Figure: a bipartite network with node sets {A, B, C, D} and {1, 2, 3, 4, 5}, and its projections]
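
A projection can be sketched from the incidence matrix: two nodes of the same type are linked when they share at least one node of the other type, which corresponds to the off-diagonal entries of B·Bᵀ. The memberships below are hypothetical (the figure's actual links are not recoverable from the text), and `project` is an illustrative helper name.

```python
def project(B):
    """Project a bipartite network onto its row nodes.

    B[i][j] = 1 if column node j belongs to row node i. The weight between
    two row nodes counts the column nodes they share (off-diagonal entries
    of B times its transpose); the diagonal is left at zero.
    """
    m = len(B)
    P = [[0] * m for _ in range(m)]
    for a in range(m):
        for b in range(m):
            if a != b:
                P[a][b] = sum(x * y for x, y in zip(B[a], B[b]))
    return P


# Hypothetical memberships: rows are A-D, columns are 1-5.
B = [
    [1, 1, 0, 0, 0],  # A contains 1 and 2
    [0, 1, 1, 0, 0],  # B contains 2 and 3
    [0, 0, 1, 1, 0],  # C contains 3 and 4
    [0, 0, 0, 1, 1],  # D contains 4 and 5
]
groups = project(B)                                  # network among A-D
members = project([list(col) for col in zip(*B)])    # network among 1-5

for row in groups:
    print(row)
```

Projecting onto the other node type only requires transposing B first, as done for `members`.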

slide-28
SLIDE 28

BASIC MEASURES

Degree

  • number of connections of each node

$k_i = \sum_j A_{ij}$

Strength

  • total number of interactions of each node

$s_i = \sum_j A_{ij}$ (with $A$ the weighted adjacency matrix)

Degree in directed networks

  • in-degree
  • out-degree

$k_i^{OUT} = \sum_j A_{ij}$

$k_i^{IN} = \sum_j A^T_{ij}$
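
The sums above translate directly into row and column sums. The sketch below assumes the convention that a directed link from i to j is stored in row i, column j; both example matrices are hypothetical.

```python
def degrees(A):
    """k_i: number of non-zero entries in row i of A."""
    return [sum(1 for a in row if a != 0) for row in A]


def strengths(A):
    """s_i = sum_j A_ij, with A holding the weights w_ij."""
    return [sum(row) for row in A]


def out_degrees(A):
    """k_i^OUT = sum_j A_ij (row sums of a binary directed matrix)."""
    return [sum(row) for row in A]


def in_degrees(A):
    """k_i^IN = sum_j (A^T)_ij (column sums of a binary directed matrix)."""
    return [sum(col) for col in zip(*A)]


# Hypothetical weighted, undirected network on three nodes.
W = [[0, 3, 1],
     [3, 0, 0],
     [1, 0, 0]]
# Hypothetical directed network with links 0->1, 0->2 and 2->1.
D = [[0, 1, 1],
     [0, 0, 0],
     [0, 1, 0]]

print(degrees(W), strengths(W))
print(out_degrees(D), in_degrees(D))
```

For a weighted matrix the degree counts non-zero entries while the strength sums them, which is why the two helpers differ even though the slide formulas look identical.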

slide-29
SLIDE 29

BASIC MEASURES

Degree

  • what is the sum of all the degrees? Each link is counted twice, once per endpoint

$\sum_i k_i = 2E$

$\langle k \rangle = \frac{1}{N} \sum_i k_i = \frac{2E}{N}$
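
A quick numerical check of the relation between degrees and number of links, on a hypothetical undirected network with N = 4 nodes:

```python
# Hypothetical undirected network, given as a list of links.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
N = 4
E = len(edges)

k = [0] * N
for i, j in edges:
    k[i] += 1  # each link adds one to the degree of both endpoints
    k[j] += 1

average_degree = sum(k) / N
print(k, sum(k), 2 * E, average_degree)
```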

slide-30
SLIDE 30

BASIC MEASURES

Path

  • sequence of nodes between i and j

Path length

  • number of hops between i and j
slide-31
SLIDE 31

BASIC MEASURES

Geodesic Path

  • the path with the shortest path length
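
In an unweighted network, geodesic path lengths from a given node can be computed with a breadth-first search. The sketch below is one standard way to do it, not necessarily the method used for the figures in these slides; the five-node network is hypothetical.

```python
from collections import deque


def shortest_path_lengths(neighbors, source):
    """Breadth-first search: number of hops from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:  # first visit is along a shortest path
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Hypothetical undirected network stored as adjacency lists.
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
dist = shortest_path_lengths(neighbors, 0)
print(dist)
```

Nodes that cannot be reached from the source are simply absent from the returned dictionary.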
slide-32
SLIDE 32

BASIC MEASURES

Local clustering

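  • for any node i, it is the fraction of pairs of its neighbours that are connected to each other

$c_i = \frac{e_i}{k_i (k_i - 1)/2}$

where $e_i$ is the number of links among the neighbours of i

[Figure: two example nodes, one with $c_i = 0$ and one with $c_i = 0.5$]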
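
The local clustering coefficient can be sketched by counting the links among a node's neighbours and dividing by the number of neighbour pairs. Setting the coefficient to 0 for nodes with fewer than two neighbours is a convention assumed here, and the example network is hypothetical.

```python
from itertools import combinations


def local_clustering(neighbors, i):
    """Fraction of pairs of neighbours of i that are themselves connected."""
    k = len(neighbors[i])
    if k < 2:
        return 0.0  # assumed convention: undefined cases are reported as 0
    linked_pairs = sum(
        1 for u, v in combinations(neighbors[i], 2) if v in neighbors[u]
    )
    return linked_pairs / (k * (k - 1) / 2)


# Hypothetical undirected network stored as neighbour sets.
neighbors = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c = {i: local_clustering(neighbors, i) for i in neighbors}
print(c)
```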

slide-33
SLIDE 33

STATISTICAL DESCRIPTION OF NETWORKS MEASURES

In large systems statistical descriptions are necessary

  • distributions

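$\langle x \rangle = \sum_x x P(x)$

$\langle x^n \rangle = \sum_x x^n P(x)$

$x \rightarrow P(x) \equiv \frac{N_x}{N}$

$\sigma^2 = \sum_x (x - \mu)^2 P(x) = \langle x^2 \rangle - \mu^2 \equiv \langle x^2 \rangle - \langle x \rangle^2$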
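
These definitions can be checked on a small sample. The sketch below estimates P(x) as the fraction of observations equal to x and then computes the mean and the variance from the moments; the list of degrees is made up for illustration.

```python
from collections import Counter


def distribution(values):
    """P(x) = N_x / N: fraction of observations equal to x."""
    n = len(values)
    return {x: count / n for x, count in Counter(values).items()}


def moment(P, n=1):
    """n-th moment: sum over x of x**n * P(x)."""
    return sum(x ** n * p for x, p in P.items())


# Hypothetical degree sequence.
observed = [1, 1, 2, 2, 2, 4]
P = distribution(observed)
mean = moment(P, 1)
variance = moment(P, 2) - mean ** 2
print(P, mean, variance)
```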

slide-34
SLIDE 34

DEGREE DISTRIBUTION IN REAL NETWORKS

Far from normal distributions

  • the average is not a good descriptor of the distribution (absence of a characteristic scale)
  • large variance -> large heterogeneity
  • mathematically described by heavy-tailed (sometimes power-law) distributions
slide-35
SLIDE 35

POWER LAWS

Power-laws

  • scale invariance
  • linear in log-log scale
  • divergent moments depending on the exponent

$f(x) = a x^{-\gamma} \rightarrow f(cx) = a c^{-\gamma} x^{-\gamma} \sim x^{-\gamma}$

$f(x) = a x^{-\gamma} \rightarrow \log(f(x)) = \log(a) - \gamma \log(x)$
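
Both properties can be verified numerically. In the sketch below the values a = 2 and γ = 3 are arbitrary choices for illustration: rescaling x by a factor c changes f by the same factor c^(−γ) whatever x is, and the slope between two points in log-log scale equals −γ.

```python
import math


def power_law(x, a=2.0, gamma=3.0):
    """f(x) = a * x**(-gamma); a and gamma are arbitrary illustrative values."""
    return a * x ** (-gamma)


def loglog_slope(x1, x2):
    """Slope of the line through two points of f in log-log scale."""
    rise = math.log(power_law(x2)) - math.log(power_law(x1))
    run = math.log(x2) - math.log(x1)
    return rise / run


# Scale invariance: f(c x) / f(x) does not depend on x.
c = 10.0
ratios = [power_law(c * x) / power_law(x) for x in (1.0, 5.0, 50.0)]
slope = loglog_slope(1.0, 100.0)
print(ratios, slope)
```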

slide-36
SLIDE 36

POWER LAWS

slide-37
SLIDE 37

PATH LENGTH DISTRIBUTION IN REAL NETWORKS

Small-world phenomena

  • even for very large graphs the average path length is very small
  • it scales logarithmically, or even slower, with the network size
  • the path length distribution has a characteristic scale

Science, 301, 2003

https://www.facebook.com/notes/facebook-data-team/anatomy-of-facebook/10150388519243859

slide-38
SLIDE 38

CLUSTERING IN REAL NETWORKS

Average local clustering

Given a value, is it high or low?

  • Null models
  • typically high for social networks, typically low for technological networks
  • still open and debated topic

$\langle C \rangle = \frac{1}{N} \sum_i C_i$

slide-39
SLIDE 39

REAL NETWORKS PROPERTIES

Generally speaking

  • heavy-tailed degree distribution
  • small-world phenomena
  • large clustering (depends on the network type)
slide-40
SLIDE 40

Albert-Barabasi model (1999)

  • based on preferential attachment (rich get richer), also known as the Matthew effect (1968), Gibrat principle (1955), or cumulative advantage (1976)

  • network growth

NETWORKS MODELS

slide-41
SLIDE 41

The model

  • network starts with m0 connected nodes
  • at each time step a new node is added
  • the new node connects to m < m0 existing nodes, selected with probability proportional to their degree

$\Pi(k_i) = \frac{k_i}{\sum_l k_l}$
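
The growth rule above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the m0 initial nodes are connected in a chain, targets are drawn without repetition, and the function name and parameters are hypothetical.

```python
import random


def preferential_attachment(n, m, m0, seed=0):
    """Grow a network to n nodes; return its list of undirected links."""
    if not 1 <= m < m0 <= n:
        raise ValueError("need 1 <= m < m0 <= n")
    rng = random.Random(seed)
    # Assumed initial condition: m0 nodes connected in a chain.
    edges = [(i, i + 1) for i in range(m0 - 1)]
    # Each node appears in `stubs` once per link end, so a uniform draw
    # from `stubs` selects node i with probability k_i / sum_l k_l.
    stubs = [node for edge in edges for node in edge]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:  # m distinct existing nodes
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = preferential_attachment(n=200, m=2, m0=3)
degree = {}
for i, j in edges:
    degree[i] = degree.get(i, 0) + 1
    degree[j] = degree.get(j, 0) + 1
print(len(edges), max(degree.values()))
```

Keeping a list of link ends avoids recomputing the normalisation over all degrees at every step, at the cost of storing two entries per link.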

NETWORKS MODELS

slide-42
SLIDE 42

Albert-Barabasi model (1999)

  • degree distribution

$P(k) = 2 m^2 k^{-3}$

NETWORKS MODELS

slide-43
SLIDE 43

Albert-Barabasi model (1999)

  • clustering

$\langle C \rangle \sim \frac{(\ln N)^2}{N}$

NETWORKS MODELS

slide-44
SLIDE 44

Albert-Barabasi model (1999)

  • path length

$\langle l \rangle \sim \frac{\log N}{\log \log N}$

NETWORKS MODELS

slide-45
SLIDE 45

In summary

  • the model creates scale-free networks
  • small-world phenomena
  • vanishing clustering

NETWORKS MODELS

slide-46
SLIDE 46

@net_science

Nicola Perra

MODELING AND FORECASTING EPIDEMIC EVENTS

slide-47
SLIDE 47

DATA

We are in a unique position in history

  • unprecedented amount of data now available on human activities and interactions

From the “social atom” to “social molecules”

  • dramatic shift in scale
  • new phenomenology (More is different!)

Digital revolution

slide-48
SLIDE 48

DATA

PLoS ONE, 8(4), 2013

slide-49
SLIDE 49

Mapping language use at worldwide scale

PLoS ONE, 8(4), 2013

PROBING SOCIO-DEMOGRAPHIC TRAITS

slide-50
SLIDE 50

PROBING COGNITIVE LIMITS

The social brain hypothesis

  • typical social group size determined by neocortical size
  • measured in various primates, extrapolated for humans: 100-200 (Dunbar’s number)

PLoS ONE, 6(8), 2011

[Figure, panel A: average weight per connection, ω_out, as a function of the out-degree, k_out]

slide-51
SLIDE 51

www.ebolatracking.org

MAPPING THE GLOBAL DISCUSSION DURING EMERGENCIES

slide-52
SLIDE 52

PROBING HUMAN MOBILITY

slide-53
SLIDE 53

Active and passive data collections

  • (Active) participatory platforms
  • (Passive) data harvesting

PROBING HEALTH STATUSES

slide-54
SLIDE 54

DATA ARE NOT ENOUGH! WE NEED MODELS!

Holistic approach necessary --> Complex Systems/Networks

Data Models

slide-55
SLIDE 55

CAN WE FORECAST THE SPREADING OF INFECTIOUS DISEASES?

slide-56
SLIDE 56

GOOD EXAMPLES

Weather Forecasts

slide-57
SLIDE 57

WHY ARE WE ABLE TO FORECAST WEATHER?

  • Global collective effort
  • Large computational resources
  • Huge datasets
  • Deep knowledge of the physical processes

slide-58
SLIDE 58

FOR EPIDEMICS?

  • Global collective effort
  • Large computational resources
  • Huge datasets
  • Deep knowledge of the physical processes

slide-59
SLIDE 59

Within school contact patterns

Human interactions are contact networks

NETWORK THINKING

slide-60
SLIDE 60

Mobility and epidemic spreading

NETWORK THINKING

slide-61
SLIDE 61

Black Death in 1347: a continuous diffusion process

(Murray 1989)

SARS epidemics: a discrete network driven process

(Colizza et al. 2007; Brockmann & Helbing 2013)

NETWORK THINKING

slide-62
SLIDE 62

NETWORKS ARE CENTRAL IN THE ANALYSIS OF CONTAGION PROCESSES

slide-63
SLIDE 63

DISEASES SPREAD IN MULTI-LAYER NETWORKS

slide-64
SLIDE 64

WWW.GLEAMVIZ.ORG

slide-65
SLIDE 65

POPULATION LAYER

Division of the Earth into ~800K cells

Voronoi tessellation

slide-66
SLIDE 66

MOBILITY LAYER

Long distance: 99% of the worldwide air network

Short distance: real data + “gravity law”

slide-67
SLIDE 67

EPIDEMIC LAYER

Any general model: according to the disease under study

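[Figure: SIR compartmental scheme over time. S becomes I at rate β through contact with an infectious individual I; I becomes R at rate µ]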
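
As a point of reference for the compartmental scheme, here is a minimal deterministic SIR sketch in a single, well-mixed population, integrated with a simple Euler step. It is not GLEAM, which couples such dynamics with the population and mobility layers; the parameter values, the step size `dt` and the function name are assumptions chosen for illustration.

```python
def sir(beta, mu, susceptible0, infected0, recovered0, steps, dt=0.1):
    """Euler integration of a well-mixed SIR model.

    S -> I at rate beta * I / N per susceptible, I -> R at rate mu.
    Returns the list of (S, I, R) values, one tuple per time step.
    """
    n = susceptible0 + infected0 + recovered0
    s, i, r = float(susceptible0), float(infected0), float(recovered0)
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n * dt
        new_recoveries = mu * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative values only: beta / mu = 2.5 gives a growing outbreak.
history = sir(beta=0.5, mu=0.2, susceptible0=990, infected0=10,
              recovered0=0, steps=1000)
peak_infected = max(i for _, i, _ in history)
print(round(peak_infected), [round(x) for x in history[-1]])
```

In this simple setting the outbreak grows only when beta / mu exceeds 1, which is the single-population counterpart of the basic reproductive number discussed later in the deck.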

slide-68
SLIDE 68

DATA STRUCTURE

slide-69
SLIDE 69

GLEAM AT WORK

slide-70
SLIDE 70

SHORT TERM PREDICTIONS

Quantification of current risks

slide-71
SLIDE 71

LONG TERM PREDICTIONS

Crucial for vaccination campaigns

Characterisation of the unknown parameters

  • Basic reproductive number, R0
slide-72
SLIDE 72

LONG TERM PREDICTIONS

R0 estimation

Traditional approach: fit the exponential phase

Our approach: Maximum Likelihood on the arrival times

BMC, 7, 45, 2009

slide-73
SLIDE 73

LONG TERM PREDICTIONS

BMC, 7, 45, 2009

slide-74
SLIDE 74

MODEL’S ACCURACY

BMC, 10, 165, 2012

slide-75
SLIDE 75

WHAT ABOUT THE SEASONAL FLU?

slide-76
SLIDE 76

PREDICTING THE SEASONAL FLU

Major public health concern

  • two modeling techniques: fits vs. generative models
slide-77
SLIDE 77

PREDICTING THE SEASONAL FLU

Classic time-series approach

  • The goal is to find a correlation between surveillance data and another (more refined) data source, such as Twitter or Google search queries

  • The parable of Google Flu Trends reveals the issues with this approach
slide-78
SLIDE 78

PREDICTING THE SEASONAL FLU

Generative models

  • Simulate the actual infection process
  • They require a lot of data as “initial conditions”, which are typically not available during the outbreak

slide-79
SLIDE 79

CAN WE MERGE THE TWO?

slide-80
SLIDE 80

MODELING THE SEASONAL FLU

[Figure: seasonal flu forecasting pipeline, panels A to D, in three stages, with sections labelled Input, Mechanistic Modeling, Model Selection, and Output]

  • Input: extracting features of geographical locations, languages, and key words from Twitter data, and ILI trends from surveillance data
  • Mechanistic modeling (GLEAM): parameter space sampling (residual immunity, generation time, R0) and stochastic simulations
  • Model selection and prediction: plot of cases per week showing training, baseline, best estimate, confidence interval, surveillance data, and predictions

slide-81
SLIDE 81

MODELING THE SEASONAL FLU

www.fluoutlook.org

slide-82
SLIDE 82

THANKS TO

  • A. Vespignani
  • D. Mistry
  • K. Sun
  • Q. Zhang
  • C. Cattuto
  • M. Quaggiotto
  • M. Delfino
  • A. Panisson
  • D. Paolotti
  • M. Tizzoni
  • L. Rossi
  • S. Meloni
  • Y. Moreno
  • L. Weng
  • A. Flammini
  • F. Menczer
  • A. Baronchelli
  • M. Starnini
  • B. Goncalves
  • C. Castillo
  • E. Ubaldi
  • F. Ciulla
  • T.S. Lu
  • F. Bonchi
  • L.M. Aiello
  • J. Ratkiewicz
  • M. Martino
  • C. Dunne
  • B. Riberio
  • M.V. Tommasello
  • C. Tessone
  • F. Schweitzer
  • M. Karsai
  • V. Colizza
  • C. Poletto
  • D. Chao
  • H. M. Halloran
  • I. Longini
  • V. Loreto
  • G. Caldarelli
  • A. Chessa
  • R. Pastor-Satorras
  • J. Borge-Holthoefer
  • R. Burioni
  • S. Liu
  • D. Mocanu
  • R. Compton
slide-83
SLIDE 83

Social Phenomena: From Data to Models

Bruno Gonçalves · Nicola Perra, Editors

Computational Social Sciences

Series Editors: Elisa Bertino · Jacob Foster · Nigel Gilbert · Jennifer Golbeck · James A. Kitts · Larry Liebovitch · Sorin A. Matei · Anton Nijholt · Robert Savit · Alessandro Vinciarelli

ISBN 978-3-319-14010-0

This book focuses on the new possibilities and approaches to social modeling currently being made possible by an unprecedented variety of datasets generated by our interactions with modern technologies. This area has witnessed a veritable explosion of activity over the last few years, yielding many interesting and useful results. Our aim is to provide an overview of the state of the art in this area of research, merging an extremely heterogeneous array of datasets and models.

Social Phenomena: From Data to Models is divided into two parts. Part I deals with modeling social behavior under normal conditions: how we live, travel, collaborate and interact with each other in our daily lives. Part II deals with societal behavior under exceptional conditions: protests, armed insurgencies, terrorist attacks, and reactions to infectious diseases.

This book offers an overview of one of the most fertile emerging fields, bringing together practitioners from scientific communities as diverse as social sciences, physics and computer science. We hope to not only provide a unifying framework to understand and characterize social phenomena, but also to help foster the dialogue between researchers working on similar problems from different fields and perspectives.
