Detecting the source of spread in complex networks

Detecting the source of spread in complex networks
Boleslaw Szymanski and Krzysztof Suchecki
RPI, Troy

Plan
● Spreading processes and sources
● Source search in networks
● Pinto-Thiran-Vetterli algorithm
● Beyond basic methods

Spreading processes and sources
Examples: physical substances, infections, waves.
They start small and become widespread.

Spreading processes and sources
Is it possible to identify the source?
If we have full data, it is obviously easy: the point at which the wave/cloud/infection/etc. appeared earliest is the source.
Usually we don't have full data, only partial:
● Limited time (only since a certain point)
● Limited scope (only certain points are known)

Spreading processes and sources
Is it possible to identify the source?
In deterministic spreading (e.g., waves) in space, this is easy. Given D+1 points with arrival times (in D-dimensional space), or just 2 points with direction, we can tell where the source is.
[Figure: observation points labeled with arrival times t=3, t=7, t=8, t=9]
Problems:
● Stochastic/complex dynamics (epidemics)
● Complex space (spreading in atmosphere)
● Spreading in network (epidemics, information)

Source search in networks
A similar “triangulation” approach could be used in a networked environment:
- each observer has a “circle” of radius equal to its time of observation
- the source is where all “circles” intersect
[Figure: network with the source, observers and spreading time marked; nodes labeled with arrival times t=1 and t=2]
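The “circle” intersection can be sketched in a few lines. This is only an illustration under stated assumptions: the spread starts at t=0, every hop takes exactly one time unit, and the six-node graph and the names `bfs_distances` and `triangulate` are hypothetical, not taken from the slides.

```python
from collections import deque

def bfs_distances(adj, start):
    """Hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def triangulate(adj, observations):
    """Nodes lying on every observer's 'circle' (distance == arrival time)."""
    candidates = set(adj)
    for observer, t in observations.items():
        dist = bfs_distances(adj, observer)
        circle = {n for n, d in dist.items() if d == t}
        candidates &= circle
    return candidates

# Hypothetical tree: a-b, a-c, b-d, c-e, d-f (assumed for illustration).
adj = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "e"],
    "d": ["b", "f"],
    "e": ["c"],
    "f": ["d"],
}
# Assumed arrival times at three observers, with the spread starting at t=0.
observations = {"d": 2, "e": 2, "b": 1}
print(triangulate(adj, observations))  # {'a'}
```

With stochastic delays or an unknown start time the circles no longer intersect cleanly, which is why the probabilistic treatment is needed.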

Source search in networks
If the process is stochastic, then the times are random variables and the sharply defined “circles” become blurry distributions.
P(s|t_i): probability of a given node being the source, conditional on observation time t_i at observer i.
[Figure: network with the source, observers and spreading time marked; observed times t_1=3, t_2=4, t_3=5; on the right, the blurred distributions for t_1~3, t_2~4, t_3~5]
Note: on the right, the probabilities from different observers are added up. This is not the overall probability for a given node to be the source:
$$P(s|t_1) + P(s|t_2) + P(s|t_3) \neq P(s|t_1, t_2, t_3)$$

Source search in networks
If we look at all observers together, could we determine the overall probability?
$$P(s|t_1, t_2, t_3) \equiv P(s|t)$$
If we have this, we could determine the most likely source.
[Figure: same network, observed times t_1=3, t_2=4, t_3=5 and blurred distributions for t_1~3, t_2~4, t_3~5]

Source search in networks
Bayes' Theorem:
$$P(s|t) = \frac{P(t|s)\,P(s)}{P(t)}$$
With this, we can calculate P(s|t) if we know:
● P(t|s): the distribution of observed times if the given node were the source
● P(s): usually we know nothing about which node could be the real source, so we assume a uniform 1/N distribution over all nodes
● P(t): we can calculate it as
$$P(t) = \sum_s P(t, s) = \sum_s P(s)\,P(t|s)$$
which we need only for a single value of t (the one that was observed)
In other words: if we can calculate the distribution of times given a source, we can calculate the probability of each node being the source given the observation times. To calculate P(t|s) we need to know something about the spreading process.
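As a small sketch of this Bayes step, the function below turns per-node likelihoods P(t|s) into posteriors P(s|t) under the uniform 1/N prior. The function name `posterior` and the three likelihood values are made up for illustration; working in log space is a numerical convenience, not something the slides prescribe.

```python
import math

def posterior(log_likelihood):
    """Map node -> log P(t|s) to node -> P(s|t), assuming a uniform prior."""
    n = len(log_likelihood)
    log_prior = -math.log(n)  # P(s) = 1/N
    log_joint = {s: ll + log_prior for s, ll in log_likelihood.items()}
    # P(t) = sum_s P(s) P(t|s), computed with the log-sum-exp trick
    peak = max(log_joint.values())
    log_evidence = peak + math.log(
        sum(math.exp(v - peak) for v in log_joint.values())
    )
    return {s: math.exp(v - log_evidence) for s, v in log_joint.items()}

# Hypothetical likelihoods of the observed times for three candidate sources.
log_lik = {"a": math.log(0.20), "b": math.log(0.05), "c": math.log(0.05)}
post = posterior(log_lik)
best = max(post, key=post.get)
print(best, post)
```

Because the prior is uniform, the node with the largest P(t|s) is also the node with the largest P(s|t); the normalization only matters when the full distribution over nodes is wanted.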

Source search in networks
The better our model of the spreading, the more accurately we can calculate P(t|s), and thus P(s|t), and find the source.
● Susceptible-Infected(-Recovered) model: created to describe the spread of infectious diseases, it is one of the models most commonly used to describe complex behavior by reducing it to randomness. A susceptible node (S) is infected by an infected neighbor (I) at the infection rate; an infected node (I) recovers (R) at the recovery rate.
● Diffusion/random walks: could be used to describe a spread that conserves some “mass”, governed by a random movement rate.
● Normally distributed delays on edges, t2 − t1 ~ N(μ,σ): this is not really an accurate model of anything, but unlike the others it allows P(t|s) to be calculated precisely and analytically. It could be used to approximate other models.
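To make the first model concrete, here is a minimal discrete-time Susceptible-Infected simulation (no recovery). It is a sketch under assumptions: time advances in unit steps, each infected node infects each susceptible neighbor with probability `beta` per step, and the names `simulate_si`, `beta`, `seed` and `max_steps` are illustrative rather than taken from the slides.

```python
import random

def simulate_si(adj, source, beta, seed=0, max_steps=10_000):
    """Discrete-time SI spread; returns node -> step at which it was infected."""
    rng = random.Random(seed)
    infected_at = {source: 0}
    step = 0
    while len(infected_at) < len(adj) and step < max_steps:
        step += 1
        newly = []
        for node in list(infected_at):
            for nb in adj[node]:
                # each infected-susceptible contact transmits with probability beta
                if nb not in infected_at and rng.random() < beta:
                    newly.append(nb)
        for nb in newly:
            infected_at.setdefault(nb, step)
    return infected_at

# Hypothetical path graph 0-1-2-3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
deterministic = simulate_si(path, source=0, beta=1.0)  # times equal hop distance
stochastic = simulate_si(path, source=0, beta=0.5, seed=42)
print(deterministic, stochastic)
```

With `beta=1.0` the arrival times reduce to hop distances, the deterministic limit; lowering `beta` makes the times increasingly random.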

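Source search in networks
Assume:
● normal delays on links, t_ij ~ N(μ,σ)
● tree topology ← unfortunately necessary for analytical solution
[Figure: tree with source 0 and links 0-1, 1-2, 0-4, 4-3]
Arrival times are sums of link delays:
$$t_1 = t_{01}, \qquad t_2 = t_{01} + t_{12}, \qquad t_3 = t_{04} + t_{43}$$
Mean (assuming IID delays):
$$\mu_1 = \mu_{01} = \mu, \qquad \mu_2 = \mu_{01} + \mu_{12} = 2\mu, \qquad \mu_3 = \mu_{04} + \mu_{43} = 2\mu$$
Variance:
$$\sigma_1^2 = \sigma_{01}^2 = \sigma^2, \qquad \sigma_2^2 = \sigma_{01}^2 + \sigma_{12}^2 = 2\sigma^2, \qquad \sigma_3^2 = \sigma_{04}^2 + \sigma_{43}^2 = 2\sigma^2$$
A sum of normally distributed variables t_ij is a normally distributed variable t_i:
$$P(t_i) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{(t_i - \mu_i)^2}{2\sigma_i^2}\right)$$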
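The per-node means and variances follow directly from hop counts, as the sketch below shows for the example tree (links 0-1, 1-2, 0-4, 4-3, source 0). The function names and the values μ=1.0, σ=0.5 are assumptions chosen for illustration.

```python
import math
from collections import deque

def hops_from(adj, source):
    """Number of links on the tree path from source to each node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in hops:
                hops[nb] = hops[node] + 1
                queue.append(nb)
    return hops

def arrival_stats(adj, source, mu, sigma):
    """Node -> (mean, variance) of arrival time for IID N(mu, sigma) link delays."""
    return {n: (h * mu, h * sigma ** 2) for n, h in hops_from(adj, source).items()}

def normal_pdf(t, mean, var):
    """Density of N(mean, var) at t; var must be positive."""
    return math.exp(-(t - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Example tree from the slide: source 0, links 0-1, 1-2, 0-4, 4-3.
adj = {0: [1, 4], 1: [0, 2], 2: [1], 3: [4], 4: [0, 3]}
stats = arrival_stats(adj, source=0, mu=1.0, sigma=0.5)
mean2, var2 = stats[2]                   # two links away: 2*mu, 2*sigma^2
density = normal_pdf(2.3, mean2, var2)   # P(t_2 = 2.3) under this model
print(stats, density)
```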

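Source search in networks
Assume:
● normal delays on links, t_ij ~ N(μ,σ)
● tree topology ← unfortunately necessary for analytical solution
Arrival times in the example tree:
$$t_1 = t_{01}, \qquad t_2 = t_{01} + t_{12}, \qquad t_3 = t_{04} + t_{43}$$
Mean: μ = ?
Covariance: Σ = ?
Take all times together: they follow a multivariate normal distribution
$$P(\vec{t}) = \frac{1}{\sqrt{(2\pi)^K |\Sigma|}} \exp\left(-\frac{1}{2} (\vec{t} - \vec{\mu})^T \Sigma^{-1} (\vec{t} - \vec{\mu})\right)$$
where K is the number of observed times.
Note: times may be correlated! For example, t_1 and t_2 share the delay t_01.
[Figure: joint distribution plotted over the axes t_1 and t_2]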
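One way to fill in the mean vector and covariance matrix for absolute arrival times: with independent link delays, the covariance of two arrival times is σ² times the number of links their paths from the source share. The sketch below applies this to the example tree; the function names and μ=σ=1.0 are assumptions for illustration.

```python
from collections import deque

def path_edges(adj, source, target):
    """Set of links on the unique tree path from source to target."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    edges = set()
    node = target
    while parent[node] is not None:
        edges.add(frozenset((node, parent[node])))
        node = parent[node]
    return edges

def mean_and_covariance(adj, source, observers, mu, sigma):
    """Mean vector and covariance matrix of arrival times at the observers."""
    paths = [path_edges(adj, source, o) for o in observers]
    mean = [mu * len(p) for p in paths]
    # shared links contribute shared randomness: Cov(t_i, t_j) = sigma^2 |P_i & P_j|
    cov = [[sigma ** 2 * len(p & q) for q in paths] for p in paths]
    return mean, cov

# Example tree from the slide: source 0, links 0-1, 1-2, 0-4, 4-3.
adj = {0: [1, 4], 1: [0, 2], 2: [1], 3: [4], 4: [0, 3]}
mean, cov = mean_and_covariance(adj, 0, [1, 2, 3], mu=1.0, sigma=1.0)
print(mean)  # [1.0, 2.0, 2.0]
print(cov)   # [[1.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
```

The off-diagonal 1.0 between t_1 and t_2 is the shared link 0-1; t_3 travels through a different branch and is uncorrelated with both.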

Source search in networks
We know how to calculate P(t|s) as a multivariate normal distribution under a few assumptions. We can evaluate P(t|s) at the observed times and calculate P(t).
[Figure: distributions P(t|s) plotted over the axes t_1 and t_2, with the best fit (highest P(t|s)) marked; network with nodes o_1, o_2, o_3 on the right. Note: illustration only, distributions not according to the network shown on the right.]
Given P(t|s), P(s) (a priori) and P(t) (from P(t|s) and P(s)), Bayes' theorem gives
$$P(s|t) = \frac{P(t|s)\,P(s)}{P(t)}$$
The node s with the highest P(s|t) is the one where P(t|s) is highest (the distribution that fits the real data best), because P(s) is uniform and P(t) does not depend on s.

Source search in networks
We know how to calculate P(t|s) as a multivariate normal distribution under a few assumptions. We can evaluate P(t|s) at the observed times and calculate P(t).
[Figure: distributions plotted over the axes t_1 and t_2, with the highest P(s|t) marked; network with nodes o_1, o_2, o_3 on the right. Note: illustration only, distributions not according to the network shown on the right.]
We can also calculate P(s|t) itself, and thus how likely each node is to be the source (the distribution of P(s|t) over nodes).

Pinto-Thiran-Vetterli algorithm
Known:
● Network topology
● Times when spreading arrived at observers
● Mean time it takes to infect along a single link
● Variance of that time
Want to know:
● True source of the spread
Assumes:
● Network is a tree (or approximates as such)
● Normally distributed delays on links
Not known:
● When spread started (not necessarily at t=0)
[Figure: network with the source, observers and spreading time marked; observers labeled with arrival times t=3, t=7, t=8, t=12, t=15, t=17]
P.C. Pinto, P. Thiran, M. Vetterli, “Locating the source of diffusion in large-scale networks”, Physical Review Letters 109, 068702 (2012)

Pinto-Thiran-Vetterli algorithm
Issue: network is not a tree
Solution: make a tree out of it!
Since the spreading process uses the fastest path, this usually means the topologically shortest path. Use Breadth-First Search to make a tree (BFS tree) rooted at the suspected source.
Note: each suspected source may have a different BFS tree, unless the original network is actually a tree.
Which link to take? Shortest paths are not unique, so we have to take one of the trees. Different trees may give different results.
[Figure: network with suspected source s and observers o_0, o_1, o_2]
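A BFS tree rooted at a suspected source can be built as follows. The four-node cycle is a hypothetical example picked to show the non-uniqueness: there are two shortest paths from "s" to "c", and which one ends up in the tree depends only on neighbor ordering. The name `bfs_tree` is illustrative.

```python
from collections import deque

def bfs_tree(adj, root):
    """Parent map of one BFS tree rooted at root (root maps to None)."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in parent:
                parent[nb] = node  # first discovery wins; ties broken by ordering
                queue.append(nb)
    return parent

# Hypothetical 4-cycle s-a-c-b-s: two equally short paths from s to c.
adj = {"s": ["a", "b"], "a": ["s", "c"], "b": ["s", "c"], "c": ["a", "b"]}
tree = bfs_tree(adj, "s")
print(tree["c"])  # a

# Same graph with the neighbors of s listed in the other order.
adj_reordered = dict(adj, s=["b", "a"])
other_tree = bfs_tree(adj_reordered, "s")
print(other_tree["c"])  # b
```

Both trees preserve the hop distance from s to every node, but they route c through different links, which is the source of the “different trees may give different results” caveat.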

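Pinto-Thiran-Vetterli algorithm
Issue: we don't know the “zero” time (when the spread started)
Solution: look at relative times only, using one observer as the reference (e.g. observer 1 becomes 0 (reference), 2→1, 3→2)
[Figure: tree with suspected source s and observers o_0 (reference), o_1, o_2]
Mean: use times relative to the reference
$$\vec{\mu} = \mu \begin{pmatrix} |P_{s1}| - |P_{s0}| \\ |P_{s2}| - |P_{s0}| \end{pmatrix} = \mu \begin{pmatrix} -1 \\ 0 \end{pmatrix}$$
Covariance: use paths anchored at the reference, not at the suspected source
$$\Sigma = \sigma^2 \begin{pmatrix} |P_{01}| & |P_{01} \cap P_{02}| \\ |P_{02} \cap P_{01}| & |P_{02}| \end{pmatrix} = \sigma^2 \begin{pmatrix} 1 & 1 \\ 1 & 4 \end{pmatrix}$$
Here |P_ab| is the number of links on the path from a to b; s is the suspected source and 0, 1, 2 are the observers.
Note: the reference observer also introduces randomness, which is added to or subtracted from the relative results (depending on the situation).
Note: since the correlations are correct for trees only, for non-trees this is only an approximation. Using the closest observer (the one with the smallest time) as the reference minimizes this error for non-tree networks.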
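The relative mean and covariance can be computed and used to score candidate sources as sketched below. Several things here are assumptions: the slide's figure is not recoverable, so the five-node chain o0-o1-s-b-o2 is a hypothetical tree chosen because it reproduces the example values (−1, 0) and ((1, 1), (1, 4)); the function names are illustrative; and candidates are ranked by the full Gaussian log-density rather than by any closed-form estimator from the cited paper.

```python
import math
from collections import deque

def tree_path(adj, a, b):
    """Set of links on the unique tree path from a to b."""
    parent = {a: None}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    edges, node = set(), b
    while parent[node] is not None:
        edges.add(frozenset((node, parent[node])))
        node = parent[node]
    return edges

def relative_mean(adj, s, observers, mu):
    """mu * (|P_sk| - |P_s0|) for every non-reference observer k."""
    ref_len = len(tree_path(adj, s, observers[0]))
    return [mu * (len(tree_path(adj, s, o)) - ref_len) for o in observers[1:]]

def relative_covariance(adj, observers, sigma):
    """sigma^2 * |P_0k & P_0i|, with paths anchored at the reference observer."""
    paths = [tree_path(adj, observers[0], o) for o in observers[1:]]
    return [[sigma ** 2 * len(p & q) for q in paths] for p in paths]

def gaussian_log_density(x, mean, cov):
    """Log multivariate normal density, solved by Gaussian elimination."""
    n = len(x)
    a = [row[:] + [x[i] - mean[i]] for i, row in enumerate(cov)]
    log_det = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        log_det += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    y = [0.0] * n  # back substitution: cov * y = x - mean
    for i in reversed(range(n)):
        y[i] = (a[i][n] - sum(a[i][j] * y[j] for j in range(i + 1, n))) / a[i][i]
    quad = sum((x[i] - mean[i]) * y[i] for i in range(n))
    return -0.5 * (quad + log_det + n * math.log(2 * math.pi))

# Hypothetical chain o0 - o1 - s - b - o2 (assumed, reproduces the slide's numbers).
adj = {"o0": ["o1"], "o1": ["o0", "s"], "s": ["o1", "b"],
       "b": ["s", "o2"], "o2": ["b"]}
observers = ["o0", "o1", "o2"]            # o0 is the reference
mean_s = relative_mean(adj, "s", observers, mu=1.0)
cov = relative_covariance(adj, observers, sigma=1.0)

# Assumed observed arrival times; only differences to the reference are used.
times = {"o0": 12.0, "o1": 11.0, "o2": 12.0}
d = [times[o] - times["o0"] for o in observers[1:]]
scores = {s: gaussian_log_density(d, relative_mean(adj, s, observers, 1.0), cov)
          for s in adj}
best = max(scores, key=scores.get)
print(mean_s, cov, best)
```

On a true tree the covariance does not depend on the candidate, so it is computed once; on a general network each candidate would get its own BFS tree and therefore its own covariance.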

Pinto-Thiran-Vetterli algorithm
Performance of the PTV algorithm:
It only really works when the infection rate is high, i.e. when the so-called propagation ratio is high.
● High propagation ratio: the process is more deterministic.
● Low propagation ratio: the process is more stochastic.
We can't expect to find a needle in a haystack with few measurement points, but the algorithm still performs reasonably well if the process isn't too random.
[Figure: accuracy of the PTV algorithm]
Note: the broken horizontal lines show the accuracy of the naive method, which declares the observer with the lowest time to be the actual source; its accuracy then equals the density of observers.

Beyond basic methods
What can we improve?
● Make it faster (because it's slow: O(N^3) or worse)
● Don't approximate with a tree
● Use a distribution other than normal
● Adapt for directed, weighted networks
● Early estimation of the source using yet-silent observers
Note (color coding of the items above): red = not attempted or done, hard to solve; yellow = only an approximation done; green = done; black = under investigation
