Modeling and Characterization of Large-Scale Wi-Fi Traffic in Public Hot-Spots


  1. Modeling and Characterization of Large-Scale Wi-Fi Traffic in Public Hot-Spots
     Amitabha Ghosh*, R. Jana+, V. Ramaswami+, J. Rowland+, N.K. Shankar+
     * Electrical Engineering, Princeton University; + AT&T Labs - Research

  2. Outline
     - Goals
     - An Overview of Data
     - Arrival Count Modeling
     - Connection Duration Modeling
     - Simultaneous Users Modeling
     - Conclusions

  3. Motivation
     - Increasing number of WLAN deployments to meet the growing demand of (mobile) users for wireless access

  4. Goals
     - Study and analyze Wi-Fi traces collected by AT&T in March 2010 in NYC and SF
       - Venues: coffee shops, fast food chains, book stores, hotels, stadiums
     - Data contains:
       - Connection login/logout times
       - Bytes uploaded/downloaded
       - Venue size (small, medium, large), zip codes, …
     - Realistic modeling for capacity planning:
       - Session arrivals
       - Connection duration distribution
       - Simultaneously present customer distribution

  5. Data Collection: Mobile Internet Access using Wi-Fi Hotspots

  6. Data Statistics
     - # of customers: 234,742
     - # of devices: 10
     - # of connections: 1,322,541
     - # of cities: 2 (NYC, SF)
     - # of Wi-Fi venues: 362
     - # of zip codes: 87
     - Trace duration: 4 weeks (3 weeks training, 1 week validation)

  7. Overview: Arrivals
     [Figure: two panels of average number of arrivals vs. time of day in 15-min bins, one over two weekdays and one over two days; panel titles: Coffee Shops, Bookstore/Hotels; legend entries: Tiny, Small, Medium, Large, Weekday, Weekend]
     - Arrival rates vary drastically within the same business type
     - Characteristic peaks in means across all categories within the same business type
     - Significantly different weekday and weekend patterns

  8. Overview: Byte Counts
     - Coffee shops: typically download a few KB
     - Enterprises: typically download a few MB to a few GB
     - Long tails

  9. Overview: Durations
     - CDF of connection durations by business types
     - Complementary distribution function of connection durations (log-log scale) by business types
     - => Long tails

  10. Arrival Count Modeling: Approach
      - Data showed time-dependent arrival rates
      - MMPP (Markov-modulated Poisson process) fails
        - It models arrival counts with periods of constant arrival rate
      - Polynomial curve fitting to the observed mean
        - Poor performance
        - Could not capture the within-day pattern with a small number of terms
      - Standard Poisson regression fails
      - Non-homogeneous Poisson regression with clustering

  11. Arrival Count Modeling
      - Non-stationary Poisson process
        - Time-dependent deterministic arrival rate
      - Divide time into 3-hour bins I: 8 bins per day
      - Divide each bin into 15-min slots J: 12 slots per bin
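
The bin and slot indexing above can be sketched as follows. This is a minimal illustration, assuming login times are given as minutes since midnight and that bins and slots are numbered from 0 (the slides may number them from 1); the function names are hypothetical.

```python
from collections import Counter

SLOT_MINUTES = 15    # slot length J from the slide
SLOTS_PER_BIN = 12   # 12 slots of 15 min = one 3-hour bin
BINS_PER_DAY = 8     # 8 bins of 3 hours = 24 hours


def bin_and_slot(minute_of_day):
    """Map a minute of the day (0..1439) to a 0-based (bin I, slot J) pair."""
    slot_of_day = minute_of_day // SLOT_MINUTES  # 0..95
    return slot_of_day // SLOTS_PER_BIN, slot_of_day % SLOTS_PER_BIN


def arrival_counts(login_minutes):
    """Count arrivals per (bin, slot) for one day of login times."""
    return Counter(bin_and_slot(m) for m in login_minutes)


# Made-up login times (minutes since midnight), for illustration only.
logins = [5, 12, 560, 565, 570, 1439]
counts = arrival_counts(logins)
print(counts)
```

These per-slot counts are the observations that a time-dependent Poisson model would be fitted to.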

  12. Arrival Count Modeling
      - Poisson regression model (GLM)
        - Polynomial-type dependence on bin and slot numbers
      - First term: over-a-day mean behavior
      - Sum terms: differential effects of a specific cluster and of the slots within it
      - Last term: interaction term; the differential effect of slot J does not have to be the same across all clusters
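
To make the GLM idea concrete, here is a minimal sketch of a Poisson regression with a log link, fitted by Newton's method in plain Python. It is not the authors' model: the design matrix below uses only an intercept and cluster indicators (no slot or interaction terms), and the counts are made up. With indicator covariates only, the fitted rate of each cluster equals that cluster's sample mean, which makes the result easy to check.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_poisson_glm(X, y, iters=50):
    """Fit log E[y] = X beta by Newton's method; column 0 must be the intercept."""
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))  # start at the overall mean rate
    for _ in range(iters):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Score vector X'(y - mu) and information matrix X' diag(mu) X.
        grad = [sum((yi - mi) * row[j] for yi, mi, row in zip(y, mu, X))
                for j in range(p)]
        hess = [[sum(mi * row[j] * row[k] for mi, row in zip(mu, X))
                 for k in range(p)] for j in range(p)]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta


# Hypothetical data: 9 slot counts, each slot assigned to cluster 0, 1, or 2.
clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2]
counts = [1, 0, 2, 4, 5, 3, 9, 8, 10]
X = [[1.0, float(c == 1), float(c == 2)] for c in clusters]
beta = fit_poisson_glm(X, counts)
rates = [math.exp(beta[0]), math.exp(beta[0] + beta[1]), math.exp(beta[0] + beta[2])]
print(rates)
```

In practice a statistics package would be used for the fit; the hand-written solver is only there to keep the sketch self-contained.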

  13. Arrival Count Modeling
      - Clustering
        - K-means clustering: cluster time slots into groups such that within each group the average number of arrivals does not differ much
        - Automatic 24-hour wrap-around in clustering
        - Clusters of 15-min time slots over a day
        - Non-contiguous busy slots (35-37, 72-75) map to a common cluster
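
A minimal sketch of the clustering step, assuming each 15-min slot is represented only by its average arrival count, so that k-means runs on one-dimensional values. Because the slot's position in the day is not a feature, slots far apart in time can share a cluster. The initialization rule and the sample values are illustrative assumptions, not taken from the slides.

```python
def kmeans_1d(values, k, iters=100):
    """Plain k-means on scalar values; returns (labels, centers)."""
    ordered = sorted(values)
    # Deterministic start: evenly spaced order statistics of the data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center for every slot mean.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Made-up average arrivals for 12 consecutive slots.
slot_means = [0.5, 0.6, 5.0, 5.2, 0.4, 2.0, 2.1, 5.1, 0.5, 2.2, 4.9, 0.6]
labels, centers = kmeans_1d(slot_means, k=3)
print(labels, centers)
```

Here the busy slots at positions 2, 3, 7, and 10 are not contiguous but end up with the same label.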

  14. Results: Arrivals
      [Figure: average number of arrivals over one weekday, 15-min bins; series: Observed mean arrival rate, Model mean arrival rate]
      - Coffee shops: observed mean arrival rate plotted against the model mean arrival rate; these provide intra-day patterns for a cluster by averaging over its members

  15. Results: Arrivals
      [Figure: number of arrivals over 5 weekdays (Mon to Fri), 15-min bins; series: Observed data, Model mean, 2.5% quantile, 97.5% quantile]
      - Coffee shops: model mean arrival rate along with the 97.5% and 2.5% quantile bands, plotted against 5 days of validation data for an example coffee shop

  16. Session Duration Modeling
      - Model the logarithm of the duration Y as a Phase-Type (PH) random variable X, i.e., X = log Y

  17. PH-Type Distribution
      - Properties of a PH-type random variable
        - Distribution of the time to absorption in a Markov process
        - Dense in the class of all distributions on [0, ∞)
        - Exponentially decaying tail: P(X > x) goes to 0 asymptotically like $e^{-\eta x}$ as $x \to \infty$, where $-\eta$ is the real eigenvalue of largest real part of the transition rate matrix
        - Captures both tails and heads, as opposed to Pareto and Weibull
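
The "time to absorption" definition can be illustrated by simulation. The sketch below draws a PH-type variate by running a continuous-time Markov chain with initial distribution `alpha` and sub-generator `T` until it is absorbed. The order-2 parameters are a hypothetical Erlang-2 example, not the order-5 fit reported in the talk, and `alpha` is assumed to sum to 1.

```python
import math
import random


def sample_ph(alpha, T, rng):
    """Draw one absorption time of a CTMC with initial probs alpha and sub-generator T."""
    n = len(alpha)
    state = rng.choices(range(n), weights=alpha)[0]
    t = 0.0
    while True:
        rate_out = -T[state][state]
        t += rng.expovariate(rate_out)  # exponential holding time in this phase
        to_phases = [T[state][j] if j != state else 0.0 for j in range(n)]
        exit_rate = max(0.0, rate_out - sum(to_phases))  # rate into the absorbing state
        nxt = rng.choices(range(n + 1), weights=to_phases + [exit_rate])[0]
        if nxt == n:  # index n stands for the absorbing state
            return t
        state = nxt


# Hypothetical Erlang-2 example with rate 1 per phase (mean 2), not fitted to any data.
ALPHA = [1.0, 0.0]
T = [[-1.0, 1.0],
     [0.0, -1.0]]

rng = random.Random(0)
log_durations = [sample_ph(ALPHA, T, rng) for _ in range(5)]
durations = [math.exp(x) for x in log_durations]  # back-transform, since X = log Y
print(durations)
```

Exponentiating a PH variate gives a duration with a much heavier tail than the PH variate itself, which is the motivation for modeling the logarithm.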

  18. Results: Duration
      - Phase-type distributions were fit using the EM algorithm
      - A fit of order 5 was found to be adequate
      [Figure: CDF vs. connection duration (min), 0 to 350 min; series: Observed, Model]
      - Coffee shops: CDF of connection durations, model vs. observed data (truncated at 6 hours)

  19. Simultaneous Connections
      - Arrivals: non-homogeneous Poisson process (time-dependent arrival rates)
      - Durations: PH-type distribution
      - Simultaneously present customers: queuing model
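
The three ingredients can be combined in a small simulation. This is a sketch under stated assumptions: every customer is served immediately (an infinite-server queue), the venue is empty at time 0, arrivals are generated by thinning a constant-rate Poisson process, and durations are exponential (a PH distribution of order 1) rather than the fitted log-PH model. The rate function and all numbers are hypothetical.

```python
import math
import random


def simulate_present(t_obs, lam, lam_max, draw_duration, rng):
    """Number of customers present at t_obs in one run that starts empty at time 0."""
    t, present = 0.0, 0
    while True:
        t += rng.expovariate(lam_max)        # candidate arrival at the bounding rate
        if t > t_obs:
            return present
        if rng.random() < lam(t) / lam_max:  # thinning: keep with probability lam(t)/lam_max
            if t + draw_duration(rng) > t_obs:
                present += 1                 # this customer is still connected at t_obs


def lam(t):
    """Hypothetical arrival rate (customers per hour) with a 24-hour cycle."""
    return 4.0 + 3.0 * math.sin(2.0 * math.pi * t / 24.0)


def draw_duration(rng):
    """Hypothetical connection duration in hours: exponential with mean 0.5."""
    return rng.expovariate(2.0)


rng = random.Random(42)
runs = [simulate_present(12.0, lam, 7.0, draw_duration, rng) for _ in range(4000)]
print(sum(runs) / len(runs))
```

Averaging the simulated counts over many runs gives an empirical estimate of the expected number of simultaneously present customers at the chosen time.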

  20. Simultaneous Connections
      - Theorem: The number of busy servers Q(t), i.e., the number of simultaneously present customers, at time t follows a Poisson distribution with mean m(t) given by
        $m(t) = \int_0^t \lambda(u)\,[1 - H(t-u)]\,du$,
        where $\lambda(u)$ is the time-dependent arrival rate and H() is the service time distribution
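
A minimal numerical sketch of how such a mean and its quantile bands could be evaluated. It assumes the standard infinite-server form m(t) = ∫_0^t λ(u) [1 − H(t − u)] du with the system empty at time 0, where λ is the arrival rate and 1 − H is the survival function of the connection duration. The constant rate and exponential duration below are hypothetical and chosen only because the integral then has a closed form to compare against.

```python
import math


def mean_present(t, lam, survival, steps=2000):
    """Midpoint-rule approximation of integral_0^t lam(u) * survival(t - u) du."""
    h = t / steps
    return sum(lam((i + 0.5) * h) * survival(t - (i + 0.5) * h)
               for i in range(steps)) * h


def poisson_quantile(mean, q):
    """Smallest k such that P(Poisson(mean) <= k) >= q."""
    k = 0
    p = math.exp(-mean)  # P(N = 0)
    cdf = p
    while cdf < q:
        k += 1
        p *= mean / k    # P(N = k) from P(N = k - 1)
        cdf += p
    return k


# Hypothetical inputs: 6 arrivals per hour, exponential durations with mean 0.5 hours.
m3 = mean_present(3.0, lambda u: 6.0, lambda x: math.exp(-2.0 * x))
band = (poisson_quantile(m3, 0.025), poisson_quantile(m3, 0.975))
print(m3, band)
```

For a constant rate λ and exponential durations with rate μ, the integral reduces to (λ/μ)(1 − e^{−μt}), which gives a quick sanity check on the numerical routine.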

  21. Simultaneous Connections
      - Novel proof based on a semi-regenerative argument
        - Does not require the system to be empty at some infinite past
        - Simple, transparent, and general
      - Show that the probability generating function G(t) of Q(t), as a function of z, is $E[z^{Q(t)}] = \exp\{-(1-z)\,m(t)\}$

  22. Simultaneous Connections
      - Proof idea: fix t and condition on the first arrival after time u, with u < v < t
        - No arrivals in (u,t]
        - First arrival occurs at some v in (u,t]
      - Q(u,t): number of customers who arrive in (u,t] and are still there at t
      - G(z,u): PGF of Q(u,t)
      - Expected number of arrivals in (0,t]: $\int_0^t \lambda(s)\,ds$ for arrival rate $\lambda(s)$

  23. Simultaneous Connections
      - Solve the integral equation
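
One way the conditioning argument can be written out is sketched below. This is a reconstruction under assumed notation, not necessarily the authors' exact formulation: $\lambda(s)$ is the arrival rate, $H$ the service time distribution, $\Lambda(a,b) = \int_a^b \lambda(s)\,ds$, $t$ is fixed, and $G(z,u)$ is the PGF of the number of customers who arrive in $(u,t]$ and are still present at $t$. The factor $1-(1-z)(1-H(t-v))$ is the PGF of the indicator that a customer arriving at $v$ is still present at $t$.

```latex
% First line: either no arrival in (u,t], or the first arrival is at v in (u,t].
% Second line: differentiate in u after multiplying by exp(-Lambda(0,u)).
% Third line: solution of the resulting linear ODE with G(z,t) = 1.
\begin{align*}
G(z,u) &= e^{-\Lambda(u,t)}
  + \int_u^t \lambda(v)\, e^{-\Lambda(u,v)}
    \bigl[1 - (1-z)\bigl(1 - H(t-v)\bigr)\bigr]\, G(z,v)\, dv, \\
\frac{\partial}{\partial u}\Bigl[e^{-\Lambda(0,u)}\, G(z,u)\Bigr]
  &= -\lambda(u)\bigl[1 - (1-z)\bigl(1 - H(t-u)\bigr)\bigr]\,
     e^{-\Lambda(0,u)}\, G(z,u),
  \qquad G(z,t) = 1, \\
G(z,u) &= \exp\Bigl\{-(1-z)\int_u^t \lambda(v)\,\bigl[1 - H(t-v)\bigr]\, dv\Bigr\}.
\end{align*}
```

Setting $u = 0$ gives the PGF of a Poisson random variable whose mean is $\int_0^t \lambda(v)\,[1 - H(t-v)]\,dv$.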

  24. Results: Simultaneous Connections
      [Figure: number of simultaneously present customers over 5 weekdays (Mon to Fri), 15-min bins; series: Observed data, Model mean m(t), 2.5% quantile, 97.5% quantile]
      - Coffee shops: expected number of simultaneously present customers along with the 97.5% and 2.5% quantile bands, plotted against 5 days of validation data for an example coffee shop

  25. Conclusions
      - Examined salient differences w.r.t. arrival counts, temporal variations, connection durations, and byte counts
      - Modeling
        - Arrival count modeling using statistical clustering and a non-stationary Poisson model under the GLM framework
        - Use of a Phase-Type r.v. to model the logarithm of long-tailed durations
        - Simultaneously present customer modeling using a queuing model
        - New proof, based on a semi-regenerative argument, for the number of busy servers in the queue

  26. Amitabha Ghosh, Rittwik Jana, V. Ramaswami, Jim Rowland, and N. K. Shankaranarayanan, "Modeling and Characterization of Large-Scale Wi-Fi Traffic in Public Hot-Spots," INFOCOM 2011, Shanghai, China, April 2011.
      http://www.princeton.edu/~amitabhg/
      Email: amitabhg@princeton.edu
      Thank you!
