Analysis of wide area user mobility patterns

slide-1
SLIDE 1

Analysis of wide area user mobility patterns

Kevin Simler*, Steven E. Czerwinski†, Anthony Joseph
UC Berkeley
2004/12/02

* Now at MIT

† Now at Google

slide-5
SLIDE 5

Motivation

We want to understand user behavior

In order to design better systems
In order to generate synthetic traces
In order to model user behavior

How can we capture user presence in the wide area?

web, IM, …, e-mail

slide-6
SLIDE 6

Why e-mail?

E-mail is a widely used service
User typically checks e-mail first
Berkeley provides IMAP + web front end

Any Internet connection → e-mail access

E-mail reflects users’ Internet presence

slide-7
SLIDE 7

Outline

Background
Analysis and results
User modeling
Future work
Summary

slide-8
SLIDE 8

Trace characteristics

31 days (May 2003)
Server from UC Berkeley EECS dept.

Regular IMAP plus web front-end

1004 active users, primarily:

Professors
Graduate students
Support staff

Tracked across different service providers

slide-9
SLIDE 9

Building on previous work

Wireless Campus Studies

Mobility on a campus
Single service provider with homogeneous users
Tang & Baker, Kotz & Essien, Balazinska & Castro

Metricom WLAN

Mobility across/between cities
Single service provider with more diverse users
Tang & Baker

slide-10
SLIDE 10

Trace data

Each entry in the trace includes:

Timestamp (seconds)
Request type (login, close, select, etc.)
Username
IP address
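A record with these four fields could be represented as a small Python dataclass. This is only a sketch: the slides do not show the actual log format, so the whitespace-separated line layout and the names `TraceEntry` and `parse_entry` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEntry:
    timestamp: int  # seconds
    request: str    # e.g. "login", "close", "select"
    username: str
    ip: str


def parse_entry(line: str) -> TraceEntry:
    # Hypothetical layout: "<timestamp> <request> <username> <ip>"
    timestamp, request, username, ip = line.split()
    return TraceEntry(int(timestamp), request.lower(), username, ip)


# Example line with a made-up user and a documentation-range IP address.
entry = parse_entry("1051772400 LOGIN alice 192.0.2.10")
```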

slide-11
SLIDE 11

Preprocessing

We want user behavior
Trace records client application behavior

Outlook, Eudora, Thunderbird, etc.

Primary difference:

Client polls for new e-mail at regular intervals
Fixed period per client, variable across clients

slide-12
SLIDE 12

We filter client polling using a Fourier transform

Client connections from a single user:

[Figure: timeline of client connections, each marked with its login and logout]

slide-13
SLIDE 13

We filter client polling using a Fourier transform

[Figure: connection timeline with intervals of length p marked]

Use a Fourier transform to identify polling period p.
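One way to sketch this step: bin one user's connection start times into a count signal, take a discrete Fourier transform, and read the period off the strongest non-zero frequency. The function name `polling_period`, the 60-second bin width, and the naive O(n²) DFT are assumptions for illustration, not the authors' implementation.

```python
import cmath


def polling_period(timestamps, bin_seconds=60):
    """Estimate the dominant period (in seconds) of connection start times."""
    start = min(timestamps)
    n = int((max(timestamps) - start) // bin_seconds) + 1
    signal = [0.0] * n
    for t in timestamps:
        signal[int((t - start) // bin_seconds)] += 1.0

    best_k, best_power = 0, 0.0
    # Skip k = 0 (the mean); only frequencies up to n/2 are distinct.
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(signal))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k == 0:
        raise ValueError("no periodic component found")
    return n * bin_seconds / best_k


# Made-up example: a client polling every 300 s for an hour,
# plus one unrelated connection near the end of the hour.
polls = list(range(0, 3600, 300))
period = polling_period(polls + [3599])
```

A regular pulse train also has energy at harmonics of its fundamental frequency, so a production version would need to guard against picking a multiple of the true polling frequency.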

slide-14
SLIDE 14

We filter client polling using a Fourier transform

Identify sequences of connections separated by p.
Remove all but the first connection in each sequence.
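A minimal sketch of this filtering step, assuming a connection counts as a poll when some earlier connection happened one period before it (within a small tolerance). The name `drop_polls` and the 5-second tolerance are assumptions; the slides do not give the matching rule.

```python
from bisect import bisect_left, bisect_right


def drop_polls(times, period, tolerance=5.0):
    """Keep only connections that do not follow another one by ~period."""
    ordered = sorted(times)
    kept = []
    for i, t in enumerate(ordered):
        # Look only at earlier connections (indices 0..i-1).
        lo = bisect_left(ordered, t - period - tolerance, 0, i)
        hi = bisect_right(ordered, t - period + tolerance, 0, i)
        if lo == hi:  # nothing one period back: not part of a polling run
            kept.append(t)
    return kept


# Polls at 0, 300, 600, 900; user-initiated connections at 450 and 2000.
kept = drop_polls([0, 300, 450, 600, 900, 2000], period=300)
```

Under this rule the first connection of each polling run survives, as do connections that do not line up with the polling period.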

slide-15
SLIDE 15

We filter client polling using a Fourier transform

Clump connections into user sessions

A gap of more than 15 minutes separates one session from the next
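The clumping rule can be sketched as a single pass over sorted connection times. The function name `group_sessions` is hypothetical; the 15-minute threshold comes from the slide.

```python
def group_sessions(times, max_gap=15 * 60):
    """Group connection times (seconds) into sessions split by gaps > max_gap."""
    sessions = []
    for t in sorted(times):
        if sessions and t - sessions[-1][-1] <= max_gap:
            sessions[-1].append(t)   # still within the current session
        else:
            sessions.append([t])     # gap exceeded: start a new session
    return sessions


sessions = group_sessions([0, 450, 2000, 2100, 5000])
```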

slide-16
SLIDE 16

We filter client polling using a Fourier transform

[Figure: connection timeline grouped into two user sessions]

slide-17
SLIDE 17

We filter client polling using a Fourier transform

Now we have (roughly) a trace of user behavior

slide-18
SLIDE 18

Outline

Background
Trace analysis

Defining location
Daily mobility
Monthly mobility
Session activity

User modeling
Future work
Summary

slide-19
SLIDE 19

Defining network location

Connection used to access the Internet

E.g. a dialup ISP, campus wireless network

Approximated by a combination of

Authoritative DNS server
AS number
Subnet
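As a rough sketch, a network location could be a tuple of these three attributes derived from the client IP address. The /24 prefix length, the lookup tables passed in as dictionaries, and the names `Location` and `location_of` are all assumptions; the slides do not say how the DNS server or AS number were resolved.

```python
import ipaddress
from typing import NamedTuple, Optional


class Location(NamedTuple):
    dns_server: Optional[str]
    as_number: Optional[int]
    subnet: str


def location_of(ip, dns_by_subnet, asn_by_subnet, prefix_len=24):
    """Map an IP address to a (DNS server, AS number, subnet) location key."""
    subnet = str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))
    return Location(dns_by_subnet.get(subnet), asn_by_subnet.get(subnet), subnet)


# Made-up lookup tables using documentation-range addresses and AS numbers.
dns = {"192.0.2.0/24": "ns.example.net"}
asn = {"192.0.2.0/24": 64500}
loc = location_of("192.0.2.10", dns, asn)
unknown = location_of("198.51.100.7", dns, asn)
```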

slide-20
SLIDE 20

How mobile are users each day?

[Figure: fraction of user-days vs. number of locations (1, 2, 3)]

slide-21
SLIDE 21

How mobile are users each day?

50% of user-days involve logging in from only 1 location

slide-22
SLIDE 22

How mobile are users each day?

15% of user-days involve logging in from 2 locations

slide-23
SLIDE 23

How mobile are users each day?

Upshot: On any given day, users are not highly mobile
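The per-day statistic behind this chart can be sketched as follows: count the distinct locations seen for each (user, day) pair, then report the fraction of user-days at each count. The record layout and the name `locations_per_user_day` are assumptions, and the sample data is made up.

```python
from collections import Counter, defaultdict


def locations_per_user_day(records):
    """records: iterable of (user, day, location). Returns {n_locations: fraction}."""
    seen = defaultdict(set)
    for user, day, location in records:
        seen[(user, day)].add(location)
    counts = Counter(len(locations) for locations in seen.values())
    total = sum(counts.values())
    return {n: c / total for n, c in sorted(counts.items())}


records = [
    ("alice", 1, "home"), ("alice", 1, "campus"),
    ("alice", 2, "campus"),
    ("bob", 1, "campus"), ("bob", 1, "campus"),
]
dist = locations_per_user_day(records)
```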

slide-24
SLIDE 24

How mobile are users in 31 days?

How many unique subnets do they visit?
How many unique AS #s do they visit?
Let’s look at a graph…

slide-25
SLIDE 25

How mobile are users in 31 days?

[Figure: cumulative fraction of users vs. # clusters, with one curve for subnets and one for AS #s]

slide-26
SLIDE 26

How mobile are users in 31 days?

80% of users log in from 8 or fewer unique subnets

slide-27
SLIDE 27

How mobile are users in 31 days?

90% of users log in from 3 or fewer unique AS numbers

slide-28
SLIDE 28

How mobile are users in 31 days?

Upshot: Again, most users are not highly mobile
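A cumulative curve like the one described could be computed as below, for either subnets or AS numbers. The function name `cluster_cdf` and the (user, cluster) record layout are assumptions, and the sample data is made up.

```python
from collections import defaultdict


def cluster_cdf(records):
    """records: iterable of (user, cluster).

    Returns [(k, fraction of users seen in at most k distinct clusters), ...].
    """
    clusters = defaultdict(set)
    for user, cluster in records:
        clusters[user].add(cluster)
    sizes = sorted(len(c) for c in clusters.values())
    total = len(sizes)
    return [(k, sum(1 for s in sizes if s <= k) / total)
            for k in range(1, sizes[-1] + 1)]


records = [
    ("alice", "a"), ("alice", "b"), ("alice", "c"),
    ("bob", "a"),
    ("carol", "a"), ("carol", "b"),
]
cdf = cluster_cdf(records)
```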

slide-29
SLIDE 29

User activity at a location

[Figure: fraction of visits vs. # sessions (1, 2, 3, 4+)]

slide-30
SLIDE 30

User activity at a location

60% of visits to a location result in only 1 session

slide-31
SLIDE 31

User activity at a location

20% of visits to a location result in exactly 2 sessions

slide-32
SLIDE 32

User activity at a location

Upshot: Users access their e-mail once or twice per visit.
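The slides do not define a "visit". The sketch below assumes a visit is a maximal run of consecutive sessions from the same location, and buckets runs of four or more as "4+" to match the chart axis. The name `sessions_per_visit` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import groupby


def sessions_per_visit(session_locations):
    """session_locations: one user's session locations, in time order."""
    # Assumption: a visit is a run of consecutive sessions at one location.
    runs = [sum(1 for _ in group) for _, group in groupby(session_locations)]
    counts = Counter(min(n, 4) for n in runs)
    total = len(runs)
    return {("4+" if n == 4 else str(n)): c / total
            for n, c in sorted(counts.items())}


sessions = ["home", "home", "campus", "home",
            "cafe", "cafe", "cafe", "cafe", "cafe"]
fractions = sessions_per_visit(sessions)
```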

slide-33
SLIDE 33

Outline

Background
Trace analysis
User modeling

Categorizing users
Model structure
Training and testing

Future work
Summary

slide-34
SLIDE 34

Categorizing users

Based on number of primary locations
For a given user, a primary location is:

One where the user spends >5% of the time

Categories

Users with 1 primary location
Users with 2 primary locations
Users with 3+ primary locations
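The categorization rule is simple enough to sketch directly. How "time spent" is measured is not stated on the slide; the example below assumes total session seconds per location, and the names `primary_locations` and `category` are hypothetical.

```python
def primary_locations(time_by_location, threshold=0.05):
    """Locations where the user spends more than `threshold` of total time."""
    total = sum(time_by_location.values())
    return sorted(loc for loc, t in time_by_location.items()
                  if t / total > threshold)


def category(time_by_location):
    """Return "1", "2", or "3+" by number of primary locations."""
    n = len(primary_locations(time_by_location))
    return "3+" if n >= 3 else str(n)


# Made-up example: seconds of session time per location for one user.
usage = {"office": 700, "home": 250, "cafe": 40, "airport": 10}
primaries = primary_locations(usage)
label = category(usage)
```

A user whose time is spread so thinly that no location exceeds 5% would come out as "0" here; the slides do not say how such users were handled.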

slide-35
SLIDE 35

Structure of our models

One model for each category
Two-tiered Markov model

High-level states represent user’s location
Low-level states represent user’s activity

Both MMs are 1st order

slide-36
SLIDE 36

Model structure for category 2

2 primary locations + 1 traveling state

[Diagram: three states labeled primary 1, primary 2, and traveling]

slide-37
SLIDE 37

Model structure for category 2

High-level (location) states: primary 1, primary 2, traveling

slide-38
SLIDE 38

Model structure for category 2

Low-level (session) states, i.e. Logged-In and Logged-Out

slide-39
SLIDE 39

Training

We have all the information

Which locations are primary
Where the user is, at any time
When the user is logged in/out

Simple to compute transition probabilities
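For a first-order Markov model with fully observed states, transition probabilities can be estimated by counting consecutive state pairs and normalizing each row. The sketch below covers one tier only, and what constitutes a single step (a fixed time slot or an event) is an assumption the slides do not settle. The name `transition_probabilities` and the sample sequence are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """Maximum-likelihood first-order transition probabilities from one sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


# Made-up sequence of high-level (location) states for one user.
observed = ["primary 1", "primary 1", "primary 2",
            "traveling", "primary 1", "primary 2"]
probs = transition_probabilities(observed)
```

The same counting could be applied to the low-level Logged-In/Logged-Out sequence within each location state.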

slide-40
SLIDE 40

Testing methodology

Create synthetic trace
Choose metrics to measure a trace
Compare real trace with synthetic trace
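Creating a synthetic trace from a trained first-order model amounts to a random walk over its transition table. The sketch below shows the high-level tier only; the transition probabilities are invented for illustration and are not results from the study, and the name `synthesize` is hypothetical.

```python
import random


def synthesize(transitions, start, steps, seed=0):
    """Sample a state sequence of length steps + 1 from a first-order Markov chain."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    state, trace = start, [start]
    for _ in range(steps):
        row = transitions[state]
        state = rng.choices(list(row), weights=list(row.values()))[0]
        trace.append(state)
    return trace


# Invented probabilities for a category-2 user (two primaries plus traveling).
transitions = {
    "primary 1": {"primary 1": 0.80, "primary 2": 0.15, "traveling": 0.05},
    "primary 2": {"primary 1": 0.20, "primary 2": 0.75, "traveling": 0.05},
    "traveling": {"primary 1": 0.50, "primary 2": 0.30, "traveling": 0.20},
}
trace = synthesize(transitions, "primary 1", steps=1000)
```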

slide-41
SLIDE 41

Testing one metric

# of sessions between visits to primary

Each user visits his primary location, leaves to visit other locations, then comes back to his primary location

Every time this happens, record the number of other locations

There will be a CDF for the entire trace (real or synthetic)
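This metric can be sketched as two small functions: one records how many other-location visits occur between consecutive returns to the primary, the other turns the recorded values into an empirical CDF. Counting every intervening visit (rather than distinct locations) is an assumption, as are the names `other_locations_between_returns` and `empirical_cdf`.

```python
def other_locations_between_returns(visits, primary):
    """visits: one user's visited locations in time order."""
    counts, away = [], None  # away is None until the primary is first seen
    for location in visits:
        if location == primary:
            if away:  # returned after at least one other location
                counts.append(away)
            away = 0
        elif away is not None:
            away += 1
    return counts


def empirical_cdf(samples):
    """Return [(value, fraction of samples <= value), ...] in ascending order."""
    n = len(samples)
    return [(v, sum(1 for s in samples if s <= v) / n)
            for v in sorted(set(samples))]


visits = ["home", "cafe", "home", "lab", "cafe", "home", "home", "lab"]
counts = other_locations_between_returns(visits, primary="home")
cdf = empirical_cdf(counts)
```

Running the same two functions over the real and the synthetic trace would give two CDFs to compare.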

slide-42
SLIDE 42

Testing results

slide-43
SLIDE 43

Outline

Background
Trace analysis
User modeling
Future work
Summary

slide-44
SLIDE 44

Using the results

Synthetic traces can help test systems
User behavior has implications for design

E.g. focus resources on primary locations

Model can predict user behavior on-the-fly

E.g. to cache, or not to cache?

slide-45
SLIDE 45

As technology changes…

Blackberries

More physical locations
Shorter, more frequent sessions
Still, primary locations will be important

Wireless LAN hotspots

More network locations

slide-46
SLIDE 46

Outline

Background
Trace analysis
User modeling
Future work
Summary

slide-47
SLIDE 47

Summary – what we’ve done

Obtained a trace from an e-mail server
Filtered out client polling
Analyzed trace of user behavior
Modeled categories of users with tiered MM
Generated synthetic traces

slide-48
SLIDE 48

Summary – user behavior

Most users log in from 1 or 2 locations
But a few users are highly mobile
Users access e-mail infrequently, but for long periods of time

slide-49
SLIDE 49

Thank you

Quick clarifying questions?