Analysis of wide area user mobility patterns



  1. Analysis of wide area user mobility patterns
     Kevin Simler*, Steven E. Czerwinski†, Anthony Joseph
     UC Berkeley
     2004/12/02
     * Now at MIT
     † Now at Google

  5. Motivation
     • We want to understand user behavior
       • In order to design better systems
       • In order to generate synthetic traces
       • In order to model user behavior
     • How can we capture user presence in the wide area?
       • web, IM, …, e-mail

  6. Why e-mail?
     • E-mail is a widely used service
     • Users typically check e-mail first
     • Berkeley provides IMAP + web front end
     • Any Internet connection → e-mail access
     • E-mail reflects users’ Internet presence

  7. Outline
     • Background
     • Analysis and results
     • User modeling
     • Future work
     • Summary

  8. Trace characteristics
     • 31 days (May 2003)
     • Server from UC Berkeley EECS dept.
     • Regular IMAP plus web front end
     • 1004 active users, primarily:
       • Professors
       • Graduate students
       • Support staff
     • Tracked across different service providers

  9. Building on previous work
     • Wireless campus studies
       • Mobility on a campus
       • Single service provider with homogeneous users
       • Tang & Baker, Kotz & Essien, Balazinska & Castro
     • Metricom WLAN
       • Mobility across/between cities
       • Single service provider with more diverse users
       • Tang & Baker

  10. Trace data
     • Each entry in the trace includes:
       • Timestamp (seconds)
       • Request type (login, close, select, etc.)
       • Username
       • IP address

  11. Preprocessing
     • We want user behavior
     • Trace records client application behavior
       • Outlook, Eudora, Thunderbird, etc.
     • Primary difference:
       • Client polls for new e-mail at regular intervals
       • Fixed period per client, variable across clients

  12. We filter client polling using a Fourier transform
     • Start with the client connections (login to logout) from a single user
     • Use a Fourier transform to identify the polling period p
     • Identify sequences of connections separated by p; remove all but the first connection
     • Clump connections into user sessions, starting a new session after a gap of more than 15 minutes
     • Now we have (roughly) a trace of user behavior
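
The filtering steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: it evaluates the Fourier transform of one user's connection times only at a list of candidate periods (rather than running a full FFT), and the function names, the 0.5 strength threshold, and the 5-second tolerance are all hypothetical. In noise-free data, exact divisors of the true period score as highly as the period itself, so the candidate list needs some care.

```python
import bisect
import cmath
import math

def polling_strength(times, period):
    """Normalized magnitude (0..1) of the Fourier transform of the
    connection impulse train, evaluated at frequency 1/period."""
    total = sum(cmath.exp(-2j * math.pi * t / period) for t in times)
    return abs(total) / len(times)

def find_polling_period(times, candidates, min_strength=0.5):
    """Return the candidate period (seconds) with the strongest peak,
    or None if no candidate exceeds min_strength (assumed threshold)."""
    if not times:
        return None
    best_period, best_strength = None, min_strength
    for period in candidates:
        strength = polling_strength(times, period)
        if strength > best_strength:
            best_period, best_strength = period, strength
    return best_period

def remove_polls(times, period, tolerance=5):
    """Drop every connection that follows another one by roughly one
    polling period, which keeps only the first of each polling run."""
    times = sorted(times)
    kept = []
    for t in times:
        lo = bisect.bisect_left(times, t - period - tolerance)
        hi = bisect.bisect_right(times, t - period + tolerance)
        if lo == hi:  # nothing one period earlier, so not a poll
            kept.append(t)
    return kept

def clump_sessions(times, max_gap=15 * 60):
    """Group connection times into (start, end) sessions; a gap longer
    than max_gap seconds starts a new session."""
    sessions = []
    for t in sorted(times):
        if sessions and t - sessions[-1][1] <= max_gap:
            sessions[-1][1] = t
        else:
            sessions.append([t, t])
    return [tuple(s) for s in sessions]
```

For example, a client polling every 300 seconds plus one unrelated connection should yield a detected period of 300 and leave two connections after filtering.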

  18. Outline
     • Background
     • Trace analysis
       • Defining location
       • Daily mobility
       • Monthly mobility
       • Session activity
     • User modeling
     • Future work
     • Summary

  19. Defining network location
     • Connection used to access the Internet
       • E.g. a dialup ISP, campus wireless network
     • Approximated by a combination of
       • Authoritative DNS server
       • AS number
       • Subnet
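
One way to picture the location approximation above is as a three-part key. The sketch below is only illustrative: the slides do not say how the subnet is delimited, so the /24 prefix is an assumption, and `dns_lookup` and `asn_lookup` are hypothetical callables standing in for whatever DNS and AS mapping data is available.

```python
import ipaddress
from typing import NamedTuple

class Location(NamedTuple):
    dns_server: str   # authoritative DNS server for the address
    as_number: int    # autonomous system number
    subnet: str       # enclosing subnet in CIDR notation

def subnet_of(ip, prefix_len=24):
    """Enclosing subnet of an IP address (a /24 is an assumed default)."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))

def location_of(ip, dns_lookup, asn_lookup):
    """Build a location key from an IP address using caller-supplied
    (hypothetical) lookup functions."""
    return Location(dns_lookup(ip), asn_lookup(ip), subnet_of(ip))
```

Two addresses then count as the same network location when all three fields match.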

  20. How mobile are users each day?
     [Figure: fraction of user-days (y-axis, 0 to 0.6) vs. number of locations (x-axis, 0 to 3)]
     • 50% of user-days involve logging in from only 1 location
     • 15% of user-days involve logging in from 2 locations
     • Upshot: On any given day, users are not highly mobile
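
A distribution like the one above could be computed along these lines. This is a sketch with assumed inputs: sessions arrive as hypothetical `(user, day, location)` tuples, and user-days without any login are counted as 0 locations, which is one reading of the chart's 0 bin.

```python
from collections import Counter, defaultdict

def locations_per_user_day(sessions, users, days):
    """Fraction of user-days on which a user logged in from n distinct
    locations, for each n that occurs (including n = 0)."""
    seen = defaultdict(set)
    for user, day, location in sessions:
        seen[(user, day)].add(location)
    counts = Counter(len(seen.get((u, d), ())) for u in users for d in days)
    total = len(users) * len(days)
    return {n: counts[n] / total for n in sorted(counts)}
```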

  24. How mobile are users in 31 days?
     • How many unique subnets do they visit?
     • How many unique AS #s do they visit?
     Let’s look at a graph…

  25. How mobile are users in 31 days?
     [Figure: cumulative fraction of users (y-axis, 0 to 1) vs. # clusters (x-axis, 0 to 14), with one curve for subnets and one for AS #s]
     • 80% of users log in from 8 or fewer unique subnets
     • 90% of users log in from 3 or fewer unique AS numbers
     • Upshot: Again, most users are not highly mobile
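
The cumulative curves above can be reproduced from per-user counts. The sketch below assumes hypothetical `(user, cluster)` records, where a cluster is either a subnet or an AS number; the function names are illustrative.

```python
from collections import defaultdict

def unique_per_user(records):
    """Number of distinct clusters (subnets or AS numbers) per user."""
    seen = defaultdict(set)
    for user, cluster in records:
        seen[user].add(cluster)
    return {user: len(clusters) for user, clusters in seen.items()}

def cumulative_fraction(values):
    """Sorted (x, fraction of values <= x) pairs, i.e. an empirical CDF."""
    ordered = sorted(values)
    cdf = {}
    for rank, value in enumerate(ordered, start=1):
        cdf[value] = rank / len(ordered)  # last duplicate keeps the full count
    return sorted(cdf.items())
```

Reading the CDF at x = 8 for subnets, or x = 3 for AS numbers, would give figures comparable to those quoted above.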

  29. User activity at a location
     [Figure: fraction of visits (y-axis, 0 to 0.7) vs. # sessions (x-axis: 1, 2, 3, 4+)]
     • 60% of visits to a location result in only 1 session
     • 20% of visits to a location result in exactly 2 sessions
     • Upshot: Users access their e-mail once or twice per visit.
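
Counting sessions per visit needs a definition of a visit, which the slides do not spell out. The sketch below assumes a visit is a maximal run of consecutive sessions from the same location, and groups counts of 4 or more into one bucket to mirror the chart's 4+ bin.

```python
from collections import Counter
from itertools import groupby

def sessions_per_visit(session_locations):
    """Given one user's session locations in time order, return the
    number of sessions in each visit (run of identical locations)."""
    return [sum(1 for _ in run) for _, run in groupby(session_locations)]

def visit_histogram(run_lengths, cap=4):
    """Fraction of visits with n sessions; counts >= cap share one bin."""
    counts = Counter(min(n, cap) for n in run_lengths)
    total = len(run_lengths)
    return {n: counts[n] / total for n in sorted(counts)}
```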

  33. Outline
     • Background
     • Trace analysis
     • User modeling
       • Categorizing users
       • Model structure
       • Training and testing
     • Future work
     • Summary

  34. Categorizing users
     • Based on number of primary locations
     • For a given user, a primary location is:
       • One where the user spends >5% of the time
     • Categories
       • Users with 1 primary location
       • Users with 2 primary locations
       • Users with 3+ primary locations
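
The categorization rule above is simple enough to sketch directly. The input format is an assumption: a hypothetical mapping from each location to the total time (in any consistent unit) one user spent there.

```python
def primary_locations(time_by_location, threshold=0.05):
    """Locations where the user spends more than `threshold` of the time."""
    total = sum(time_by_location.values())
    if total == 0:
        return []
    return sorted(loc for loc, t in time_by_location.items()
                  if t / total > threshold)

def category(time_by_location):
    """User category: 1, 2, or 3 (meaning 3+) primary locations.
    Returns 0 for a user with no recorded time."""
    return min(len(primary_locations(time_by_location)), 3)
```

A user who splits time 60/38/2 between home, office, and a cafe would fall into category 2, since the cafe stays under the 5% threshold.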

  35. Structure of our models
     • One model for each category
     • Two-tiered Markov model
       • High-level states represent user’s location
       • Low-level states represent user’s activity
       • Both MMs are 1st order

  36. Model structure for category 2
     • 2 primary locations + 1 traveling state
     [Figure: state diagram with states primary 1, primary 2, and traveling]
     • High-level (location) states: primary 1, primary 2, traveling
     • Low-level (session) states, i.e. Logged-In and Logged-Out

  39. Training
     • We have all the information
       • Which locations are primary
       • Where the user is, at any time
       • When the user is logged in/out
     • Simple to compute transition probabilities
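
Because every state is observed, training a first-order Markov model reduces to counting transitions. The sketch below assumes the trace has already been turned into a sequence of state labels at some fixed time step (the step size is not given in the slides); it applies equally to the location tier and the logged-in/logged-out tier.

```python
from collections import Counter, defaultdict

def transition_probabilities(states):
    """Maximum-likelihood first-order transition probabilities from an
    observed state sequence: P(next | current) = count / row total."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }
```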

  40. Testing methodology
     • Create synthetic trace
     • Choose metrics to measure a trace
     • Compare real trace with synthetic trace
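
Creating a synthetic trace from a trained first-order model amounts to a random walk over its transition table. This is a minimal sketch for a single tier, assuming a hypothetical table of the form `{state: {next_state: probability}}`; how the two tiers are combined is not specified in the slides.

```python
import random

def generate_states(transitions, start, length, seed=None):
    """Sample a state sequence of the given length from a first-order
    Markov chain described by `transitions`."""
    rng = random.Random(seed)
    state, trace = start, [start]
    for _ in range(length - 1):
        row = transitions[state]
        state = rng.choices(list(row), weights=list(row.values()))[0]
        trace.append(state)
    return trace
```

Passing a fixed `seed` makes the synthetic trace reproducible, which helps when comparing it against the real trace.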

  41. Testing one metric
     • # of sessions between visits to primary
     • Each user visits his primary
       • leaves to visit other locations
       • then comes back to his primary
     • Every time this happens, record the number of other locations
     • There will be a CDF for the entire trace (real or synthetic)
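
The metric above might be extracted as follows. The sketch assumes one user's visits are given as a time-ordered list of location labels with a single known primary, and it counts only excursions that end with a return to the primary; both are interpretations, not details from the slides.

```python
def excursion_lengths(visits, primary):
    """For each departure from `primary` that ends with a return,
    record how many other-location visits happened in between."""
    lengths = []
    away = None  # None until the first visit to the primary location
    for location in visits:
        if location == primary:
            if away:  # a non-empty excursion just ended
                lengths.append(away)
            away = 0
        elif away is not None:
            away += 1
    return lengths
```

Pooling these lengths over all users gives the values from which a CDF can be built for either the real or the synthetic trace.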

  42. Testing results

  43. Outline
     • Background
     • Trace analysis
     • User modeling
     • Future work
     • Summary

  44. Using the results
     • Synthetic traces can help test systems
     • User behavior has implications for design
       • E.g. focus resources on primary locations
     • Model can predict user behavior on the fly
       • E.g. to cache, or not to cache?

  45. As technology changes…
     • Blackberries
       • More physical locations
       • Shorter, more frequent sessions
       • Still, primary locations will be important
     • Wireless LAN hotspots
       • More network locations

  46. Outline
     • Background
     • Trace analysis
     • User modeling
     • Future work
     • Summary

  47. Summary – what we’ve done
     • Obtained a trace from an e-mail server
     • Filtered out client polling
     • Analyzed trace of user behavior
     • Modeled categories of users with tiered MM
     • Generated synthetic traces

  48. Summary – user behavior
     • Most users log in from 1 or 2 locations
     • But a few users are highly mobile
     • Users access e-mail infrequently, but for long periods of time

  49. Thank you
     • Quick clarifying questions?
