
Analysis of Network Flow Data



  1. Analysis of Network Flow Data
     Gonzalo Mateos
     Dept. of ECE and Goergen Institute for Data Science, University of Rochester
     gmateosb@ece.rochester.edu
     http://www.ece.rochester.edu/~gmateosb/
     April 26, 2016
     Network Science Analytics

  2. Network flows
     Network flows, measurements and statistical analysis
     Gravity models
     Traffic matrix estimation
     Case study: Internet traffic matrix estimation
     Estimation of network flow costs
     Case study: Dynamic delay cartography

  3. Traffic flows
     ◮ Networks often serve as conduits for traffic flows
     Example
       ◮ Commodities and people flow over transportation networks;
       ◮ Data flows over communication networks; and
       ◮ Capital flows over networks of trade relations
     ◮ Flow-related questions on network design, provisioning and routing
       ⇒ Solutions involve tools in optimization and algorithms
     ◮ Our focus: statistical analysis and modeling of network flow data
       ⇒ Regression-based prediction of unknown flow characteristics

  4. Routing matrix
     ◮ Let $G(V, E)$ be a digraph. Flows are directed: origin → destination
       ⇒ Directed edges (arcs) are here referred to as links
       ⇒ The number of flows is $N_f$; typically $N_f = O(N_v^2)$
       ⇒ Flows traverse multiple links en route to their destinations
     ◮ Routing matrix $R \in \{0, 1\}^{N_e \times N_f}$ states incidence of routes with links
       $$r_{e,f} = \begin{cases} 1, & \text{if flow } f \text{ is routed via link } e, \\ 0, & \text{otherwise} \end{cases}$$
     ◮ Flows are assumed to follow a single route from origin to destination

  5. Example: Routing of two flows
     Ex: Consider a digraph with $N_e = 7$ links and $N_f = 2$ active flows
     [Figure: digraph with links $e_1, \dots, e_7$ and the routes of flows $f_1$ and $f_2$]
       $$R = \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}$$
     ◮ Strongly connected digraph: flows can be as many as $N_v (N_v - 1)$
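
The routing matrix definition can be sketched in a few lines of Python. The routes below are an assumption: they are chosen so that the result matches the 0/1 entries listed in the example, read top to bottom as links $e_1$ through $e_7$. The helper name `routing_matrix` is hypothetical.

```python
def routing_matrix(n_links, routes):
    """Return R as a list of rows, with R[e][f] = 1 if flow f is routed via link e."""
    R = [[0] * len(routes) for _ in range(n_links)]
    for f, route in enumerate(routes):
        for e in route:
            # Links are numbered 1..n_links in the text, rows are 0-based here.
            R[e - 1][f] = 1
    return R

# Assumed routes: flow f1 traverses links e1, e3, e7; flow f2 traverses e5, e6.
routes = [[1, 3, 7], [5, 6]]
R = routing_matrix(7, routes)

for e, row in enumerate(R, start=1):
    print(f"e{e}: {row}")
```

Each column of `R` is the indicator vector of one route, which is why the single-route assumption matters: a flow split over several routes would need fractional entries.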

  6. Traffic matrix
     ◮ Central to the study of network flows is the traffic matrix $Z \in \mathbb{R}^{N_v \times N_v}$
     ◮ Entry $z_{ij}$ is the total volume of flow from origin vertex $i$ to destination $j$
     ◮ Ex: net out-flow from $i$ and net in-flow to $j$ are given by
       $$z_{i+} = \sum_j z_{ij} \quad \text{and} \quad z_{+j} = \sum_i z_{ij}$$
     ◮ Link-level aggregate traffic vector $x := [x_1, \dots, x_{N_e}]^T$ is related to $Z$ as
       $$x = Rz, \quad \text{where } z := \mathrm{vec}(Z)$$
       ⇒ Link counts $x_e$ equal the sum of flow volumes routed through $e$
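
A small numerical sketch of the marginals $z_{i+}$, $z_{+j}$ and of the relation $x = Rz$ follows. Everything in it is hypothetical: a 3-vertex directed ring with links 0→1, 1→2, 2→0, made-up volumes, and routes that simply follow the ring. As a simplification, $z$ here stacks only the off-diagonal OD pairs rather than the full $\mathrm{vec}(Z)$.

```python
# Hypothetical traffic matrix: Z[i][j] is the volume from origin i to destination j.
Z = [[0, 4, 1],
     [2, 0, 3],
     [5, 6, 0]]

out_flow = [sum(row) for row in Z]        # z_{i+}: row sums
in_flow = [sum(col) for col in zip(*Z)]   # z_{+j}: column sums

# Assumed routes on the ring; link 0 is 0->1, link 1 is 1->2, link 2 is 2->0.
routes = {
    (0, 1): [0],    (0, 2): [0, 1],
    (1, 2): [1],    (1, 0): [1, 2],
    (2, 0): [2],    (2, 1): [2, 0],
}
flows = sorted(routes)                    # fixed ordering of the OD pairs
z = [Z[i][j] for (i, j) in flows]         # flow volumes in that ordering

# Routing matrix: one row per link, one column per flow.
R = [[1 if e in routes[f] else 0 for f in flows] for e in range(3)]

# Link counts x = R z.
x = [sum(r * v for r, v in zip(row, z)) for row in R]
print(out_flow, in_flow, x)
```

With three links and six flows, many different `z` produce the same `x`, which is the reason traffic matrix estimation from link counts needs a statistical model.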

  7. Flow costs and time dependencies
     ◮ Notion of cost $c$ associated with paths or links is also important
       Ex: generalized socioeconomic cost for transportation analysis
       ⇒ Study choices made by consumers of transportation resources
       Ex: quality of service (QoS) in network traffic analysis
       ⇒ Monitor delays to unveil congestion or anomalies
     ◮ So far we implicitly assumed a static snapshot of the network flows
       ⇒ Flows are dynamic in nature. Time-varying models are more realistic
       ⇒ When appropriate we will denote $x(t)$, $Z(t)$ or $R(t)$
     ◮ Common assumption: treat the routing matrix $R$ as fixed
       ⇒ Routing changes at a slower time scale than flow dynamics

  8. Example: Internet2 traffic matrix
     ◮ Internet2 backbone: $N_f = 110$ flows (8 shown) over a week
       ⇒ Temporal periodicity and “spatial” correlation apparent

  9. Roadmap
     ◮ Roadmap dictated by types of measurement and analysis goal
     ◮ Measure: origin-destination (OD) flow volumes $z_{ij}$ in full
     ◮ Goal: model flows to understand and predict future traffic
       ⇒ Gravity models
     ◮ Measure: link counts $x_e$, flow volumes unavailable
     ◮ Goal: traffic matrix estimation, i.e., predict unobserved OD flows $z_{ij}$
       ⇒ Gaussian and Poisson models, entropy minimization
     ◮ Measure: OD costs $c_{ij}$ for a subset of paths
     ◮ Goal: predict unobserved OD and link costs
       ⇒ Active network tomography and network kriging

  10. Gravity models
     Network flows, measurements and statistical analysis
     Gravity models
     Traffic matrix estimation
     Case study: Internet traffic matrix estimation
     Estimation of network flow costs
     Case study: Dynamic delay cartography

  11. Gravity models
     ◮ Gravity models originate in the social sciences [Stewart ’41]
       ⇒ Describe aggregate level of interactions among populations
     ◮ Ex: geography, economics, sociology, hydrology, computer networks
     ◮ Newton’s law of gravitation for masses $m_1$, $m_2$ separated by $d_{12}$
       $$F_{12} = G \frac{m_1 m_2}{d_{12}^2}$$
     ◮ Gravity models specify that interactions among populations vary:
       ⇒ In direct proportion to the populations’ sizes; and
       ⇒ Inversely with some measure of their separation
     ◮ Intuition: viewing OD flows as “population interactions” makes sense!

  12. Model specification
     ◮ Sets of origins $\mathcal{I}$ and destinations $\mathcal{J}$. Flows $Z_{ij}$ from $i \in \mathcal{I}$ to $j \in \mathcal{J}$
     ◮ Gravity models state that the $Z_{ij}$ are independent, Poisson, with mean
       $$E[Z_{ij}] = h_O(i) \, h_D(j) \, h_S(c_{ij})$$
       ⇒ Origin $h_O(\cdot)$, destination $h_D(\cdot)$, and separation function $h_S(\cdot)$
       ⇒ “Distance” between $i$, $j$ captured by separation attributes $c_{ij}$
     ◮ Ex: Stewart’s theory of demographic gravitation specifies
       $$E[Z_{ij}] = \gamma \, \pi_{O,i} \, \pi_{D,j} \, d_{ij}^{-2}$$
       ⇒ Population sizes measured by $\pi_{O,i}$ and $\pi_{D,j}$, distance by $d_{ij}$
       ⇒ Demographic gravitational constant $\gamma$
     ◮ Unlike Newton’s law, no empirical or theoretical support here
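
As a rough illustration of Stewart’s specification, the sketch below evaluates $E[Z_{ij}] = \gamma \pi_{O,i} \pi_{D,j} d_{ij}^{-2}$ for three hypothetical locations. The populations, distances and the value of $\gamma$ are made up, and the function name `stewart_mean` is an assumption for illustration only.

```python
def stewart_mean(gamma, pop_origin, pop_dest, distance):
    """Mean flow under Stewart's form: gamma * pi_O * pi_D / d^2."""
    return gamma * pop_origin * pop_dest / distance ** 2

# Hypothetical inputs (not from the slides).
gamma = 0.01
populations = [1000.0, 500.0, 200.0]
distances = {(0, 1): 10.0, (0, 2): 20.0, (1, 2): 5.0}

mean_flow = {}
for (i, j), d in distances.items():
    # Same size proxy at origin and destination, so the mean is symmetric in i, j.
    mean_flow[(i, j)] = stewart_mean(gamma, populations[i], populations[j], d)
    mean_flow[(j, i)] = stewart_mean(gamma, populations[j], populations[i], d)

for pair in sorted(mean_flow):
    print(pair, mean_flow[pair])
```

Doubling a distance divides the mean flow by four, which is the inverse-square behavior borrowed from Newton’s law.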

  13. Origin, destination and separation functions
     ◮ Multiple origin, destination and separation functions proposed
       ⇒ Motivated from sociophysics and economic utility theory
     ◮ Ex: power functions for $h_O(i)$ and $h_D(j)$, where for $\alpha, \beta \geq 0$
       $$h_O(i) = (\pi_{O,i})^{\alpha} \quad \text{and} \quad h_D(j) = (\pi_{D,j})^{\beta}$$
     ◮ Ex: power function $h_S(c_{ij}) = c_{ij}^{-\theta}$, $\theta \geq 0$. General exponential form
       $$h_S(c_{ij}) = \exp(\theta^T c_{ij}), \quad \theta, c_{ij} \in \mathbb{R}^K$$
     ◮ Convenient for inference of model parameters, since
       $$\log E[Z_{ij}] = \log \gamma + \alpha \log \pi_{O,i} + \beta \log \pi_{D,j} + \theta^T c_{ij}$$
       ⇒ Log-linear form facilitates the use of standard regression software
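
The log-linear identity can be checked numerically. The sketch below, with hypothetical parameter values and hypothetical function names, computes the mean flow from the power and exponential forms and compares its logarithm with the inner product between a feature row $[1, \log \pi_{O,i}, \log \pi_{D,j}, c_{ij}]$ and the parameter vector $[\log \gamma, \alpha, \beta, \theta]$.

```python
import math

def mean_flow(gamma, alpha, beta, theta, pop_o, pop_d, c):
    """E[Z_ij] = gamma * pi_O^alpha * pi_D^beta * exp(theta^T c_ij)."""
    separation = math.exp(sum(t * ck for t, ck in zip(theta, c)))
    return gamma * pop_o ** alpha * pop_d ** beta * separation

def log_mean_flow(log_gamma, alpha, beta, theta, pop_o, pop_d, c):
    """log E[Z_ij] written as a linear function of the parameters."""
    features = [1.0, math.log(pop_o), math.log(pop_d)] + list(c)
    params = [log_gamma, alpha, beta] + list(theta)
    return sum(x * p for x, p in zip(features, params))

# Hypothetical values, with K = 2 separation attributes.
gamma, alpha, beta, theta = 0.5, 0.8, 0.6, [-0.3, 0.1]
pop_o, pop_d, c = 120.0, 45.0, [2.0, 1.5]

direct = math.log(mean_flow(gamma, alpha, beta, theta, pop_o, pop_d, c))
linear = log_mean_flow(math.log(gamma), alpha, beta, theta, pop_o, pop_d, c)
print(direct, linear)
```

Because the log-mean is linear in the unknowns, each OD pair contributes one row of a design matrix, which is the input format expected by generalized linear model routines.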

  14. Example: Austrian phone-call data
     ◮ Q: Structure of telecommunication interactions among populations?
       ⇒ Planning for government (de)regulation of the sector
       ⇒ Predict influence of technologies in regional development
     ◮ Gravity models used to model telecommunication patterns as flows
     ◮ Data for phone-call traffic among 32 Austrian districts in 1991
       ⇒ $32 \times 31 = 992$ flow measurements $z_{ij}$, $i \neq j = 1, \dots, 32$
       ⇒ Gross regional product (GRP) per region → Size proxy
       ⇒ Road-based distance among regions → Separation proxy

  15. Phone-call data scatterplots
     [Figure: pairwise scatterplots of $Z_{ij}$, $\mathrm{GRP}_i$, $\mathrm{GRP}_j$ and $d_{ij}$]
     ◮ Data (in $\log_{10}$ scale) suggest a gravity model of the form
       $$E[Z_{ij}] = \gamma \, (\pi_{O,i})^{\alpha} (\pi_{D,j})^{\beta} (c_{ij})^{-\theta}$$
       ⇒ $\pi_{O,i} = \mathrm{GRP}_i$, $\pi_{D,j} = \mathrm{GRP}_j$, $c_{ij} = d_{ij}$ is the road-based distance between $i$ and $j$
     ◮ It is typical for flow volumes to vary widely in scale

  16. Inference for gravity models
     ◮ Specified $Z_{ij}$ as independent Poisson RVs, with means $\mu_{ij} = E[Z_{ij}]$
       ⇒ ML for statistical inference in the general gravity model
     ◮ Let $\alpha_i = \log h_O(i)$, $\beta_j = \log h_D(j)$ and $\theta \in \mathbb{R}^K$. Will focus on
       $$\log \mu_{ij} = \alpha_i + \beta_j + \theta^T c_{ij}$$
       ⇒ Log-linear model ∈ class of generalized linear models
     ◮ P. McCullagh and J. Nelder, Generalized Linear Models. CRC, 1989
     ◮ Given flow observations $Z = z$, the Poisson log-likelihood for $\mu$ is
       $$\ell(\mu) = \sum_{i,j \in \mathcal{I} \times \mathcal{J}} \left( z_{ij} \log \mu_{ij} - \mu_{ij} \right)$$
       ⇒ Substitute the gravity model and maximize $\ell(\mu)$ for the MLE
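
A minimal sketch of the Poisson log-likelihood $\ell(\mu)$, up to the additive term that does not depend on $\mu$, is given below. The counts are hypothetical and the function name `poisson_loglik` is an assumption.

```python
import math

def poisson_loglik(z, mu):
    """Sum over all (i, j) of z_ij * log(mu_ij) - mu_ij."""
    total = 0.0
    for z_row, mu_row in zip(z, mu):
        for z_ij, mu_ij in zip(z_row, mu_row):
            total += z_ij * math.log(mu_ij) - mu_ij
    return total

# Hypothetical 2 x 2 table of observed flow volumes.
z = [[12, 3], [5, 20]]

# Means equal to the data (a saturated fit) versus a flat guess with the same total.
ll_saturated = poisson_loglik(z, z)
ll_flat = poisson_loglik(z, [[10.0, 10.0], [10.0, 10.0]])
print(ll_saturated, ll_flat)
```

Each summand is maximized at $\mu_{ij} = z_{ij}$, so no gravity model can exceed the saturated value; the gap to it is one way to judge how much fit the model structure gives up.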

  17. ML parameter estimates
     ◮ MLEs $\hat{\alpha} := \{\hat{\alpha}_i\}_{i \in \mathcal{I}}$, $\hat{\beta} := \{\hat{\beta}_j\}_{j \in \mathcal{J}}$ and $\hat{\theta}$ satisfy
       $$\log \hat{\mu}_{ij} = \hat{\alpha}_i + \hat{\beta}_j + \hat{\theta}^T c_{ij}, \quad i, j \in \mathcal{I} \times \mathcal{J} \quad \Rightarrow \quad \log \hat{\mu} = M \hat{\gamma}$$
     ◮ With $\hat{\gamma} := [\hat{\alpha}^T \ \hat{\beta}^T \ \hat{\theta}^T]^T$ defined, the mean flow estimates $\hat{\mu}_{ij}$ solve
       $$\sum_j \hat{\mu}_{ij} = z_{i+}, \ i \in \mathcal{I} \quad \text{and} \quad \sum_i \hat{\mu}_{ij} = z_{+j}, \ j \in \mathcal{J}$$
       $$\sum_{i,j} c_{ij}(k) \, \hat{\mu}_{ij} = \sum_{i,j} c_{ij}(k) \, z_{ij}, \quad k = 1, \dots, K$$
     ◮ Unique MLE $\hat{\theta}$ under mild conditions, e.g., $\mathrm{rank}(M) = I + J + K - 1$
       ⇒ Values $\hat{\alpha}_i$, $\hat{\beta}_j$ unique only up to a constant
     ◮ A. Sen, “Maximum likelihood estimation of gravity model parameters,” J. Regional Science, vol. 26, pp. 461-474, 1986
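
One way to see the likelihood equations at work is the sketch below, for a single separation attribute ($K = 1$). It is not the procedure from the slides or from Sen's paper: it is an assumed scheme in which, for a fixed $\theta$, row and column scalings are found by iterative proportional fitting so that the margins match, and $\theta$ is then found by bisection on the remaining equation $\sum c_{ij} \hat{\mu}_{ij} = \sum c_{ij} z_{ij}$. The data, the search bracket and all function names are hypothetical.

```python
import math

def ipf(z, sep, iters=500):
    """Scale rows and columns of sep until its margins match those of z."""
    n_i, n_j = len(z), len(z[0])
    row_tot = [sum(row) for row in z]
    col_tot = [sum(col) for col in zip(*z)]
    a, b = [1.0] * n_i, [1.0] * n_j
    for _ in range(iters):
        for i in range(n_i):
            a[i] = row_tot[i] / sum(b[j] * sep[i][j] for j in range(n_j))
        for j in range(n_j):
            b[j] = col_tot[j] / sum(a[i] * sep[i][j] for i in range(n_i))
    return [[a[i] * b[j] * sep[i][j] for j in range(n_j)] for i in range(n_i)]

def cost_gap(z, c, mu):
    """Sum of c_ij * (mu_ij - z_ij); zero at the MLE."""
    return sum(c[i][j] * (mu[i][j] - z[i][j])
               for i in range(len(z)) for j in range(len(z[0])))

def fit_gravity(z, c, lo=-5.0, hi=5.0, tol=1e-10):
    """Bisection on theta; assumes the root lies inside [lo, hi]."""
    def mu_at(theta):
        sep = [[math.exp(theta * c_ij) for c_ij in row] for row in c]
        return ipf(z, sep)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The fitted total cost does not decrease with theta, so a positive
        # gap means theta is too large.
        if cost_gap(z, c, mu_at(mid)) > 0:
            hi = mid
        else:
            lo = mid
    theta = 0.5 * (lo + hi)
    return theta, mu_at(theta)

# Hypothetical flows z and separations c between 3 origins and 3 destinations.
z = [[50, 20, 5], [30, 60, 25], [4, 18, 40]]
c = [[1.0, 2.0, 3.0], [2.0, 1.0, 2.0], [3.0, 2.0, 1.0]]
theta_hat, mu_hat = fit_gravity(z, c)
print(theta_hat)
```

The row and column scalings returned by `ipf` play the role of $\exp(\hat{\alpha}_i)$ and $\exp(\hat{\beta}_j)$; multiplying all of one set by a constant and dividing the other by it leaves $\hat{\mu}$ unchanged, which is the non-uniqueness noted above.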
