Revisit Behavior in Social Media: The Phoenix-R Model and - - PowerPoint PPT Presentation

SLIDE 1

Revisit Behavior in Social Media: The Phoenix-R Model and Discoveries

Flavio Figueiredo, Yasuko Matsubara, Bruno Ribeiro, Jussara M. Almeida, Christos Faloutsos

Institute for Web Research (InWeb) @ DCC-UFMG
Databases Group @ CMU

SLIDE 2

How should we account for and model information popularity online?

SLIDE 3

How should we account for and model information popularity online?

SLIDE 4

Audience: Unique users

SLIDE 5

Audience vs. Visits

Multiple visits from the same users

SLIDE 6

Measuring both visits and audience (unique users) has its benefits

  • How many users watched my ad?

– Exposure
– Revenue

  • How many times was my ad watched?

– Caching
– Sharding and content provisioning

  • However…

– Understanding and modeling both effects is still an open issue

SLIDE 7

Our Study

  • Understanding and modeling revisit behavior in social media

  • Understanding

– Characterization of millions of user activities
– A user activity: a user played/watched/visited a social media object at a certain time

  • Modeling

– The Phoenix-R model for popularity time series

SLIDE 8

Datasets

| Dataset | User Activities | Description |
|---|---|---|
| MMTweet (Million Musical Tweets) | Little over 1 million | Tweets declaring songs users listen to |
| Twitter | 576 million | Hashtags |
| LastFM | 19 million | Plays on artists and songs |
| YouTube | – | 3 million daily time series |

  • User activity

– (User, Object (song/tweet/video), Timestamp)

  • All of the datasets span months to years

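To make the audience vs. visits distinction concrete, here is a minimal Python sketch over (user, object, timestamp) activity records like those described above. The sample records and the `visit_stats` helper are hypothetical, and the sketch assumes a revisit is any visit after a user's first visit to that object, so revisits = visits minus audience.

```python
from collections import defaultdict

# Hypothetical activity log: (user, object, timestamp) tuples.
activities = [
    ("alice", "song_a", 1), ("alice", "song_a", 2), ("bob", "song_a", 2),
    ("alice", "song_a", 5), ("carol", "song_b", 3), ("bob", "song_b", 4),
]


def visit_stats(records):
    """Return {object: (visits, audience, revisits)} for every object."""
    visits = defaultdict(int)
    users = defaultdict(set)
    for user, obj, _ts in records:
        visits[obj] += 1        # every activity counts as one visit
        users[obj].add(user)    # audience = number of distinct users
    return {
        obj: (visits[obj], len(users[obj]), visits[obj] - len(users[obj]))
        for obj in visits
    }


for obj, (v, a, r) in sorted(visit_stats(activities).items()):
    print(f"{obj}: visits={v} audience={a} revisits={r}")
```

An ad-exposure question ("how many users?") reads the audience column, while a caching question ("how many requests?") reads the visits column.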
SLIDE 9

Discoveries

SLIDE 10

Discoveries

  • Relationships between audience (unique users) and revisits

| Dataset | Median #Revisits / #Audience | Median #Revisits / #Total Visits | % of cases with #Revisits > #Audience |
|---|---|---|---|
| MMTweet | 0.68 | 0.40 | 33% |
| Twitter | 1.70 | 0.62 | 66% |
| LastFM | 25.39 | 0.96 | 100% |
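A minimal sketch of how the three columns above could be computed from per-object counts. The counts and the `revisit_summary` helper are hypothetical; the sketch assumes total visits = audience + revisits (each user's first visit is not a revisit), which is consistent with the table, e.g. 0.68 / (1 + 0.68) ≈ 0.40 for MMTweet.

```python
from statistics import median

# Hypothetical per-object counts: object -> (revisits, audience).
objects = {"obj1": (3, 5), "obj2": (10, 4), "obj3": (0, 2), "obj4": (8, 8)}


def revisit_summary(counts):
    """Median revisits/audience, median revisits/total visits, % with revisits > audience."""
    rev_per_aud = [r / a for r, a in counts.values()]
    # Assumption: total visits = audience + revisits.
    rev_per_total = [r / (r + a) for r, a in counts.values()]
    pct_more_rev = 100 * sum(r > a for r, a in counts.values()) / len(counts)
    return median(rev_per_aud), median(rev_per_total), pct_more_rev


print(revisit_summary(objects))
```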

SLIDE 11

Discoveries on Smaller Time Scales

  • Isolate the effect of users coming back to the datasets after long periods

  • Daily time windows

| Dataset | Median #Revisits / #Audience |
|---|---|
| MMTweet | 0.83 |
| Twitter | 2.50 |
| LastFM | 28.0 |
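One way to restrict revisits to daily windows, sketched in Python under stated assumptions: timestamps are Unix seconds, windows are UTC calendar days (`ts // 86400`), and a revisit is a repeat visit by the same user to the same object within the same day. The records and the `daily_revisit_ratios` helper are hypothetical.

```python
from collections import defaultdict
from statistics import median

SECONDS_PER_DAY = 86_400

# Hypothetical (user, object, unix_timestamp) records.
activities = [
    ("u1", "x", 0), ("u1", "x", 3_600), ("u2", "x", 7_200),
    ("u1", "x", 90_000),  # u1 comes back on day 1: not a revisit within day 0
    ("u3", "y", 100), ("u3", "y", 200), ("u3", "y", 300),
]


def daily_revisit_ratios(records):
    """Revisits/audience for every (object, day) window that has activity.

    Users returning after a long gap start a new window, so they are not
    counted as revisits of the earlier day.
    """
    visits = defaultdict(int)
    users = defaultdict(set)
    for user, obj, ts in records:
        key = (obj, ts // SECONDS_PER_DAY)
        visits[key] += 1
        users[key].add(user)
    return {k: (visits[k] - len(users[k])) / len(users[k]) for k in visits}


ratios = daily_revisit_ratios(activities)
print(median(ratios.values()))  # prints 0.5
```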

SLIDE 12

What we know so far

  • Users revisit the same object

– On some datasets (LastFM and Twitter), most visits come from returning users

  • Revisits are common on small time scales

– The above results still hold
– Complements [Anderson2014]

  • Users abandon content, but it may take a long time

– Preying behavior from [Ribeiro2014]

SLIDE 13

Users eventually stop visiting

Decay in popularity of one of last year's most popular songs

SLIDE 14

Some objects behave like a sum of multiple cascades

Multiple cascade-like (spike) behavior in a very popular song

SLIDE 15

How do we model these time series?

SLIDE 16

The Phoenix-R Model!

SLIDE 17

Phoenix-R Explained

  • Single shock (cascade) model
  • Epidemiology model

SLIDE 18

Single Shock

  • Starting with some Susceptible and Infected individuals
  • The Infected access the content

SLIDE 19

Single Shock

  • At the next time tick, some Infected recover
  • Some Susceptible are infected by the previous Infected
  • We now expect more visits (more Infected)

SLIDE 20

Single Shock Equations
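The equations on this slide did not survive extraction. As a hedged stand-in, here is a generic discrete-time SIR sketch of a single shock that follows the narrative of the previous two slides: infected users visit the object, some recover at each tick, and susceptible users are infected by the current infected. The parameter names (`beta`, `gamma`, `omega`) and the assumption that visits are proportional to the infected population are illustrative, not necessarily the exact Phoenix-R formulation.

```python
def single_shock(s0, i0, beta, gamma, omega, steps):
    """Visits per time tick for one shock (generic discrete-time SIR sketch).

    Assumed update rules, not necessarily the exact Phoenix-R equations:
        new_inf = beta * S * I   susceptibles infected by the current infected
        new_rec = gamma * I      infected users who recover (stop visiting)
        S <- S - new_inf
        I <- I + new_inf - new_rec
        visits  = omega * I      each infected user visits omega times per tick
    """
    s, i = float(s0), float(i0)
    visits = []
    for _ in range(steps):
        visits.append(omega * i)          # the infected access the content
        new_inf = min(beta * s * i, s)    # cannot infect more users than remain
        new_rec = gamma * i
        s -= new_inf
        i += new_inf - new_rec
    return visits


# Hypothetical parameters: 1000 susceptible users, 1 initially infected user.
visits = single_shock(s0=1000, i0=1, beta=0.0005, gamma=0.2, omega=1.0, steps=60)
```

Here `beta * s0 / gamma = 2.5 > 1`, so visits first grow and then decay once susceptible users run out and infected users recover.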

SLIDE 21

Multiple Shocks

  • Simplifying assumption: each shock is a new population (set of users)
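Under the slide's simplifying assumption, a multi-shock series is the sum of independent single shocks, each with its own population and start time. A minimal sketch; the tuple layout `(start, s0, i0, beta, gamma, omega)` and the SIR update inside the hypothetical `sir_visits` helper are assumptions carried over from a generic SIR discretization, not the paper's code.

```python
def sir_visits(s0, i0, beta, gamma, omega, steps):
    """Visits per tick for one shock (generic discrete-time SIR, assumed)."""
    s, i = float(s0), float(i0)
    out = []
    for _ in range(steps):
        out.append(omega * i)
        new_inf = min(beta * s * i, s)
        s -= new_inf
        i += new_inf - gamma * i  # right-hand side uses the previous i
    return out


def multi_shock(shocks, length):
    """Sum of independent shocks; each starts at its own tick with its own users."""
    total = [0.0] * length
    for start, s0, i0, beta, gamma, omega in shocks:
        # A shock starting at `start` contributes only from that tick onward.
        for k, v in enumerate(sir_visits(s0, i0, beta, gamma, omega, length - start)):
            total[start + k] += v
    return total


# Two hypothetical shocks: a large one at day 0 and a smaller one at day 30.
shocks = [(0, 1000, 1, 0.0005, 0.2, 1.0), (30, 500, 1, 0.001, 0.3, 1.0)]
series = multi_shock(shocks, length=90)
```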

SLIDE 22

How many shocks to add?

  • A perfect model (zero error) can be created by

– Letting each access be a single user who immediately recovers
– However, this requires lots of parameters (cost)

  • Solution: use Minimum Description Length (MDL) to trade off model cost against fitting error

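The trade-off behind MDL can be illustrated with a crude two-part cost: bits to store the model's parameters plus bits to encode the residual errors. In this sketch, `BITS_PER_PARAM = 32` and the `log2(1 + |error|)` error code are simplifying assumptions, not the paper's encoding; the point is only that the zero-error model with one parameter per observation can cost more than a small model with a few errors.

```python
import math

# Assumption: each real-valued parameter is stored as a 32-bit float.
BITS_PER_PARAM = 32


def mdl_cost(observed, fitted, n_params):
    """Crude two-part description length in bits: model cost + error cost."""
    model_bits = BITS_PER_PARAM * n_params
    # Proxy for encoding residuals: larger errors need more bits; zero error costs 0 bits.
    error_bits = sum(math.log2(1 + abs(o - f)) for o, f in zip(observed, fitted))
    return model_bits + error_bits


observed = [1, 3, 9, 20, 14, 8, 4, 2, 1, 1]

# "Perfect" model: one parameter per observation, zero error.
perfect = mdl_cost(observed, observed, n_params=len(observed))

# Hypothetical 3-parameter fit with a few small errors.
smooth = [1, 4, 10, 18, 15, 8, 4, 2, 1, 0]
compact = mdl_cost(observed, smooth, n_params=3)

print(perfect, compact)  # the compact model has the lower description length
```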
SLIDE 23

How do we fit a time series?

  • Step 1:

– Identify peaks using wavelets
– Intuitively, each peak is a candidate shock (cascade)
– Linear runtime

  • Step 2:

– Add peaks to the model in decreasing order of height
– If the MDL cost decreases, accept the peak

  • Step 3:

– Stop when the MDL cost stops decreasing
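The three steps can be sketched as a greedy loop. This is a simplified illustration, not the authors' implementation: strict local maxima stand in for the wavelet-based peak detection, a hypothetical exponential template `height * decay**|t - peak|` stands in for a fitted SIR shock (3 parameters per shock assumed), and a crude MDL proxy (32 bits per parameter plus `log2(1 + |error|)` per point) is assumed. Because it rescores the whole series for every candidate, this naive version is not linear-time.

```python
import math

BITS_PER_PARAM = 32  # assumption


def mdl_cost(observed, fitted, n_params):
    """Crude two-part description length in bits (assumed encoding)."""
    error_bits = sum(math.log2(1 + abs(o - f)) for o, f in zip(observed, fitted))
    return BITS_PER_PARAM * n_params + error_bits


def find_peaks(series):
    """Step 1 stand-in: indices of strict local maxima (not wavelets)."""
    return [t for t in range(1, len(series) - 1)
            if series[t - 1] < series[t] > series[t + 1]]


def shock_shape(length, peak, height, decay=0.5):
    """Hypothetical shock template: height * decay**|t - peak|."""
    return [height * decay ** abs(t - peak) for t in range(length)]


def fit(series, params_per_shock=3):
    n = len(series)
    model = [0.0] * n
    shocks = []
    best = mdl_cost(series, model, 0)
    # Step 2: try candidate peaks in decreasing order of height.
    for p in sorted(find_peaks(series), key=lambda t: series[t], reverse=True):
        height = series[p] - model[p]  # residual left at this peak
        if height <= 0:
            continue
        candidate = [m + s for m, s in zip(model, shock_shape(n, p, height))]
        cost = mdl_cost(series, candidate, params_per_shock * (len(shocks) + 1))
        if cost >= best:
            break  # Step 3: MDL stopped decreasing
        model, best = candidate, cost
        shocks.append((p, height))
    return shocks, model, best


# Synthetic series: two exact template shocks at t=10 and t=30.
series = [100_000 * 0.5 ** abs(t - 10) + 50_000 * 0.5 ** abs(t - 30) for t in range(40)]
shocks, model, cost = fit(series)
```

On this synthetic series the loop accepts both peaks; on low-amplitude noise, the first candidate already raises the cost, so no shock is added.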

SLIDE 24

Linear runtime (in time series length) and parameter-free algorithm

Figure: algorithm flowchart (Find Peaks, Adding Shocks, Exit)

SLIDE 25

How good is Phoenix-R?

  • Comparing Phoenix-R with two state-of-the-art alternatives

– RMSE (smaller is better)

  • Phoenix-R is always better than, or as good as, the alternatives

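For reference, RMSE between an observed series and a model's fitted values can be computed as below. The sample series and the two fits are made-up numbers for illustration only, not results from the slides.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error; smaller is better."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


observed = [10, 40, 25, 12, 6]
model_a = [12, 38, 24, 13, 6]   # hypothetical close fit
model_b = [20, 30, 20, 10, 10]  # hypothetical poor fit

print(rmse(observed, model_a), rmse(observed, model_b))
```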
SLIDE 26

How good is Phoenix-R?

  • Comparing Phoenix-R with two state-of-the-art alternatives

– RMSE (smaller is better)

  • Phoenix-R is always better than, or as good as, the alternatives

SLIDE 27

Phoenix-R is also good at forecasting

  • RMSE (smaller is better)
  • Forecasting 1, 7, or 30 days ahead
  • Ties with the alternatives on very linear time series

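A minimal sketch of the h-days-ahead evaluation protocol: hold out the last h points, forecast them from the rest, and score with RMSE. The two forecasters (`last_value`, `linear_trend`) are hypothetical baselines, not Phoenix-R or the alternatives on the slides. On a perfectly linear series the least-squares trend is exact, which suggests one plausible reason why methods tie on very linear time series.

```python
import math


def rmse(observed, predicted):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def last_value(history, horizon):
    """Hypothetical baseline: repeat the last observation."""
    return [history[-1]] * horizon


def linear_trend(history, horizon):
    """Hypothetical baseline: extrapolate a least-squares line."""
    n = len(history)
    mean_t = (n - 1) / 2
    mean_y = sum(history) / n
    var_t = sum((t - mean_t) ** 2 for t in range(n))
    slope = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(history)) / var_t
    intercept = mean_y - slope * mean_t
    return [intercept + slope * (n + k) for k in range(horizon)]


def holdout_rmse(series, horizon, forecaster):
    """Train on all but the last `horizon` points and score the forecast on them."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    train, test = series[:-horizon], series[-horizon:]
    return rmse(test, forecaster(train, horizon))


series = [5 + 2 * t for t in range(60)]  # a perfectly linear daily series
for h in (1, 7, 30):
    print(h, holdout_rmse(series, h, last_value), holdout_rmse(series, h, linear_trend))
```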
SLIDE 28

Phoenix-R is also good at forecasting

  • RMSE (smaller is better)
  • Forecasting 1, 7, or 30 days ahead
  • Ties with the alternatives on very linear time series

SLIDE 29

Examples of Phoenix-R at work

SLIDE 30

Examples of Phoenix-R at work

SLIDE 31

Conclusions

  • Phoenix-R model for revisits and multiple cascades
  • Based on discoveries from real data
  • Scalable fitting algorithm, linear in time series length
  • Useful for predictions