Inter-event Time Distributions in Online Human Behavior

Persistence and Distinctiveness of Inter-event Time Distributions in Online Human Behavior. Jiwan Jeong and Sue Moon, School of Computing, KAIST. In TempWeb ’17 (WWW ’17 Companion), April 3, 2017.


SLIDE 1

Persistence and Distinctiveness of Inter-event Time Distributions in Online Human Behavior

Jiwan Jeong and Sue Moon

School of Computing, KAIST

In TempWeb ’17 (WWW ’17 Companion), April 3, 2017

SLIDE 2

What is inter-event time?

  • Time gap between two consecutive events
  • E.g., earthquake waves, packet arrivals, …
SLIDE 3

Our definition of inter-event time

  • Time gap between two consecutive actions in a service by one person
  • E.g., tweeting, blog posting, email sending, …
  • Simply put:
      • Inter-event time = interval
      • Inter-event time distribution = interval pattern
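
A minimal sketch of this definition, assuming each action carries a timestamp: sort one person's actions on one service and take the gaps between consecutive ones. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime

def inter_event_times(timestamps):
    """Return the gaps, in seconds, between consecutive events of one user."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]

# Hypothetical action timestamps for one user on one service.
actions = [
    datetime(2017, 4, 3, 9, 0, 0),
    datetime(2017, 4, 3, 9, 0, 30),
    datetime(2017, 4, 3, 10, 0, 30),
    datetime(2017, 4, 4, 10, 0, 30),
]
intervals = inter_event_times(actions)
print(intervals)  # [30.0, 3600.0, 86400.0]
```

The list of intervals is the raw material for an interval pattern; n actions yield n - 1 intervals.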
SLIDE 4

Previous studies focused on

  • Characterizing aggregate interval patterns
      • Web re-visit pattern [Adar CHI 2007][Adar CHI 2008]
      • Web browsing pattern [Kumar WWW 2010]
      • Service usage pattern [Halfaker WWW 2015]
  • Finding universal laws among interval patterns
      • Power-law by priority queuing process [Barabasi Nature 2005]
      • Log-normal by non-homogeneous Poisson process [Malmgren PNAS 2008]
SLIDE 5

We focus on individual-level

  • How does an individual’s interval pattern change over time?
  • Does it remain consistent or fluctuate from time to time?
  • How distinctive is it from those of others?
SLIDE 6

Individuals have interval patterns that are persistent over time, but distinctive from others.
SLIDE 7

Tweets by Ellen DeGeneres

[Figure: Twitter timeline with cut marks (✂)]
SLIDE 8

Tweets by Jimmy Fallon
SLIDE 9

Tweets by Sue Moon
SLIDE 10

Tweets by Albert-László Barabási
SLIDE 11

Tweets by Eytan Adar
SLIDE 12

Tweets by Aaron Clauset
SLIDE 13

Tweets by Nicholas Christakis
SLIDE 14

Tweets by Alex Vespignani
SLIDE 15

Tweets by Andrew Ng
SLIDE 16

Tweets by Ed Chi
SLIDE 17

Tweets by Bruno Gonçalves
SLIDE 18

Tweets by Haewoon Kwak
SLIDE 19

Tweets by Carlos Castillo
SLIDE 20

Tweets by Peter Dodds
SLIDE 21

In this work

  • Design a computation framework to quantify interval patterns
  • Show their persistence and distinctiveness
  • Use interval patterns to distinguish one user from others
SLIDE 22

Datasets for this study

  • Wikipedia: 15 years of entire history
  • me2day: 7 years of entire history
  • Twitter: 3000 recent tweets per user
  • Enron: 3 years of email history
SLIDE 23

  • Estimate interval patterns
  • Compare interval patterns
  • Design computation framework
SLIDE 24

  • Estimate interval patterns
  • Compare interval patterns
  • Design computation framework
SLIDE 25

Convert discrete intervals to continuous PDF
SLIDE 26

Gaussian kernel density estimation

For multi-modal distributions, we use Sheather and Jones’ bandwidth [Sheather J R Stat Soc B 1991]
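
A rough sketch of Gaussian kernel density estimation over a set of intervals, using only the standard library. Two assumptions are made here and are not taken from the slides: the density is estimated over log10 of the interval in seconds, and the bandwidth comes from Silverman's rule of thumb, which is a simpler stand-in for the Sheather and Jones selector and tends to oversmooth multi-modal data. Function names and sample values are hypothetical.

```python
import math
import statistics

def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth (a stand-in, NOT the Sheather-Jones selector)."""
    n = len(samples)
    sd = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)

def gaussian_kde(samples, bandwidth):
    """Return a function x -> estimated probability density at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return pdf

# Made-up intervals in seconds; log scale is an assumption of this sketch.
intervals = [20, 25, 30, 45, 60, 3600, 4000, 7200, 86400, 90000]
log_intervals = [math.log10(x) for x in intervals]

bandwidth = silverman_bandwidth(log_intervals)
pdf = gaussian_kde(log_intervals, bandwidth)
print(pdf(1.5), pdf(3.6))  # density near the short and the hour-scale gaps
```

Swapping in a different bandwidth only changes the `bandwidth` argument, so a Sheather and Jones implementation can be plugged in without touching `gaussian_kde`.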

SLIDE 27

Now, we can estimate interval patterns!

SLIDE 28

  • Estimate interval patterns
  • Compare interval patterns
  • Design computation framework
SLIDE 29

Calculate distance between interval patterns
SLIDE 30

Jensen-Shannon distance

  • A metric of the difference between probability density functions
  • Non-negative: $d(x, y) \ge 0$
  • Identity of indiscernibles: $d(x, y) = 0$ iff $x = y$
  • Symmetry: $d(x, y) = d(y, x)$
  • Subadditivity: $d(x, z) \le d(x, y) + d(y, z)$
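
The Jensen-Shannon distance is the square root of the Jensen-Shannon divergence. A small sketch for discretized densities follows; it assumes both patterns have been evaluated on the same grid and normalized to sum to 1, and it uses base-2 logarithms so the distance lies in [0, 1]. The helper names and the example vectors are hypothetical.

```python
import math

def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def js_distance(p, q):
    """Jensen-Shannon distance between two discrete distributions."""
    def kl(a, b):
        # Kullback-Leibler divergence in bits; terms with a_i = 0 contribute 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))

# Made-up densities sampled on a shared grid.
p = normalize([1, 4, 3, 2])
q = normalize([2, 3, 3, 2])
r = normalize([4, 1, 1, 4])

print(js_distance(p, q), js_distance(p, r))
```

Taking the square root matters: the divergence itself does not satisfy the triangle inequality, while the distance does.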
SLIDE 31

Now, we can compare interval patterns!

SLIDE 32

  • Estimate interval patterns
  • Compare interval patterns
  • Design computation framework
SLIDE 33

Define self-distance and reference distance

[Figure: self-distance $d_{\mathrm{self}}$ and reference distance $d_{\mathrm{ref}}$]
SLIDE 34

Experimental settings for longitudinal analysis

  • Select users with 500+ actions on each service
  • Divide each user’s timeline into 10 windows
  • $\binom{10}{2} = 45$ self-distances for each user
  • $10 \times 10 = 100$ reference distances for each pair of users

[Figure: a timeline divided into windows W1, W2, …, W9, W10]
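
The window and pair counts can be reproduced with a short sketch. It assumes windows hold an equal number of intervals and drops any remainder; the slides do not state how window boundaries were chosen, so treat that as an assumption. `split_into_windows` is a hypothetical helper, and the interval lists are placeholders.

```python
from itertools import combinations, product

def split_into_windows(intervals, n_windows=10):
    """Split a chronological list of intervals into equal-sized windows.

    Assumption: equal counts per window; leftover intervals are dropped.
    """
    size = len(intervals) // n_windows
    return [intervals[i * size:(i + 1) * size] for i in range(n_windows)]

# Placeholder interval lists for two hypothetical users.
user_a = split_into_windows(list(range(1, 501)))
user_b = split_into_windows(list(range(1, 521)))

# Self-distances: every unordered pair of windows from the same user.
self_pairs = list(combinations(range(len(user_a)), 2))
# Reference distances: each window of one user against each window of another.
ref_pairs = list(product(range(len(user_a)), range(len(user_b))))

print(len(self_pairs), len(ref_pairs))  # 45 100
```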

SLIDE 35

Persistence & Distinctiveness
SLIDE 36

Persistence and distinctiveness are relative

  • If $d_{\mathrm{self}}$ is small, the pattern is persistent
  • How small should it be?
  • If $d_{\mathrm{self}} < d_{\mathrm{ref}}$, the pattern is persistent [Saramäki PNAS 2014]
  • Furthermore, if $d_{\mathrm{self}} \ll d_{\mathrm{ref}}$, the patterns are distinctive
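
One rough way to check these conditions is to compare summary statistics of the two sets of distances. The sketch below uses medians and an arbitrary ratio for "much smaller"; both choices, the function name, and the sample values are assumptions for illustration, not the criterion used in the study.

```python
from statistics import median

def summarize(d_self, d_ref, ratio=0.5):
    """Compare a user's self-distances with reference distances.

    `ratio` is a hypothetical cutoff for reading "much smaller than".
    """
    m_self, m_ref = median(d_self), median(d_ref)
    return {
        "persistent": m_self < m_ref,
        "distinctive": m_self < ratio * m_ref,
    }

# Made-up distances, only to exercise the function.
result = summarize([0.10, 0.12, 0.15], [0.40, 0.45, 0.50])
print(result)  # {'persistent': True, 'distinctive': True}
```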
SLIDE 37

$d_{\mathrm{self}}$ vs. $d_{\mathrm{ref}}$
SLIDE 38

How long do interval patterns persist?

  • Bin $d_{\mathrm{self}}$ by the time gap between two windows
  • Compare binned $d_{\mathrm{self}}$ with overall $d_{\mathrm{ref}}$

[Figure: two windows $W_i$ and $W_j$ on a timeline]
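
The binning step might look like the sketch below. It assumes each self-distance is stored with the time gap, in days, between its two windows; the bin edges are hypothetical and were picked only so that they produce six groups. The records are made up.

```python
from bisect import bisect_right
from statistics import mean

def bin_by_gap(records, edges):
    """Average self-distance per gap bin.

    records: iterable of (gap_in_days, d_self) pairs.
    edges:   ascending bin boundaries; len(edges) + 1 bins in total.
    """
    bins = {}
    for gap, dist in records:
        bins.setdefault(bisect_right(edges, gap), []).append(dist)
    return {index: mean(values) for index, values in sorted(bins.items())}

# Hypothetical edges (days) giving 6 groups, and made-up records.
edges = [30, 90, 180, 365, 730]
records = [(10, 0.20), (20, 0.30), (100, 0.40), (800, 0.50)]

binned = bin_by_gap(records, edges)
print(binned)  # bin 0: under 30 days, bin 2: 90 to 180 days, bin 5: over 730 days
```

Each binned mean can then be compared with the overall reference distance to see at which gap, if any, the self-distance stops being smaller.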

SLIDE 39

Persistence over time

Binned into 6 groups

SLIDE 40

Persistence over time

SLIDE 41

Persistence over time

SLIDE 42

Do interval patterns persist after long inactivity?

  • Bin $d_{\mathrm{self}}$ by the longest interval between two windows
  • Compare binned $d_{\mathrm{self}}$ with overall $d_{\mathrm{ref}}$

[Figure: windows $W_i$ and $W_j$]
SLIDE 43

Persistence after inactivity

SLIDE 44

Persistence after inactivity

SLIDE 45

Do interval patterns persist through changing daily routine?

  • Bin $d_{\mathrm{self}}$ by the circadian distance between two windows

[Figure: circadian distance between windows $W_i$ and $W_j$, drawn on 24-hour clocks]
SLIDE 46

Persistence through changing daily routine

SLIDE 47

In summary,

  • Individuals have interval signatures that persist over years
  • The signatures persist even after coming back from long inactivity
  • The signatures persist through changing daily routine
SLIDE 48

User Identification Using Interval Signatures

APPLICATION

SLIDE 49

User identification: Problem definition

  • Given two windows each containing 100 intervals
  • Can we determine whether they come from the same user?

[Figure: windows $W_A$ and $W_B$]
SLIDE 50

A very simple identifier

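[Figure: windows $W_A$ and $W_B$]

Calculate the distance d between the two windows

If d < threshold, the windows are from the same user. Else, they are from different users.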
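
A compact sketch of such a threshold identifier. To stay self-contained it replaces the kernel density estimate with a normalized histogram over log10 of the interval in seconds, which is a coarser stand-in; the bin layout, the threshold of 0.3, the function names, and the sample windows are all assumptions made for illustration.

```python
import math

def log_histogram(intervals, n_bins=12, lo=0.0, hi=6.0):
    """Normalized histogram of log10(seconds); a coarse stand-in for a KDE."""
    counts = [0] * n_bins
    for x in intervals:  # intervals must be positive
        pos = (math.log10(x) - lo) / (hi - lo)
        idx = min(max(int(pos * n_bins), 0), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logs) between discrete distributions."""
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return math.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0))

def is_same_user(window_a, window_b, threshold=0.3):
    """Decide 'same user' when the two interval patterns are close enough."""
    d = js_distance(log_histogram(window_a), log_histogram(window_b))
    return d < threshold

# Made-up windows of 100 intervals each (seconds).
w_a = [30] * 50 + [3600] * 50
w_b = [20] * 60 + [4000] * 40     # similar rhythm to w_a
w_c = [86400] * 100               # one action per day

print(is_same_user(w_a, w_b), is_same_user(w_a, w_c))
```

The threshold trades false accepts against false rejects, so in practice it would be tuned on labeled window pairs rather than fixed by hand.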

SLIDE 51

Identification performance (1 − Equal Error Rate)

  • Performance of other behavioral biometrics
      • Keystroke dynamics: ~90% [Peacock IEEE S&P 2004]
      • Mouse dynamics: ~80% [Jorgensen AsiaCCS 2011]
      • Gait: ~80% [Gafurov University of Oslo 2008]

                 Wikipedia   me2day   Twitter   Enron
Consecutive         80%        87%      83%      76%
> 1 year gap        71%        78%      76%      71%
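
The equal error rate is the operating point where the false accept rate (different users judged the same) equals the false reject rate (same user judged different). A minimal sketch that approximates it by sweeping the observed distances as candidate thresholds; the function name and the distances are made up, and the sweep is only one simple way to locate the crossing.

```python
def equal_error_rate(same_user_d, diff_user_d):
    """Approximate the EER for the rule 'same user if d < threshold'.

    Returns (eer, threshold) at the candidate threshold where the false
    accept rate and false reject rate are closest.
    """
    best = None
    for t in sorted(set(same_user_d) | set(diff_user_d)):
        frr = sum(d >= t for d in same_user_d) / len(same_user_d)
        far = sum(d < t for d in diff_user_d) / len(diff_user_d)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

# Made-up distances for same-user and different-user window pairs.
same = [0.10, 0.20, 0.30, 0.40, 0.60]
diff = [0.35, 0.50, 0.70, 0.80, 0.90]

eer, threshold = equal_error_rate(same, diff)
print(eer, threshold, 1 - eer)  # performance is reported as 1 - EER
```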

SLIDE 52

Follow-up questions

  • What do people with similar interval signatures have in common?
  • What can be inferred about users by analyzing interval signatures?
  • How are interval signatures related to other personal characteristics?
SLIDE 53

Interval Signature:

Persistence and Distinctiveness of Inter-event Time Distributions in Online Human Behavior

Q&A

SLIDE 54

Dataset statistics

# of users            Wikipedia   me2day   Twitter   Enron
With >25 actions        521K       587K     921K     937K
With >100 actions       165K       203K     768K     542K
With >500 actions        47K        43K     334K      65K
SLIDE 55

$d_{\mathrm{self}}$ vs. $d_{\mathrm{ref}}$ at different window sizes
SLIDE 56

K-means clustering of interval patterns

SLIDE 57

Joint probability matrix for transition $W_i \to W_{i+1}$