A FEW CHIRPS ABOUT TWITTER

SLIDE 1

A FEW CHIRPS ABOUT TWITTER

Balachander Krishnamurthy – AT&T Labs--Research Phillipa Gill – University of Calgary Martin Arlitt – HP Labs/University of Calgary

SLIDE 2

Outline

• What are micro-content networks?
• Methodology
• Characterization
• Conclusions

SLIDE 3

Micro-content networks

• An average YouTube video is large (about 10 MB)
• Micro-content network messages are very small (typically < 1 KB)
• One-to-many communication is possible
• Often a publish-subscribe system with control over subscribers
• Senders and recipients can choose how to send/receive messages

SLIDE 4

Twitter

• Started Oct. 2006
• Allows users to send short messages (“tweets”)
    • Max length of 140 characters (compatible with SMS)
• Micro-blogging
• Notion of following (friends) and followers (subscribers), with permission
• Used to transmit messages during the 2007 California fires and riots in Kenya

SLIDE 5

Interfacing with Twitter

SLIDE 6

Outline

• What are micro-content networks?
• Methodology
• Characterization
• Conclusions

SLIDE 7

Methodology

• Constrained crawl (67,527 users)
    • Constrained by Twitter API rate limiting
    • Limited to collecting a partial set of each user’s friends
• Metropolized random walk (31,579 users)
    • Used to validate the constrained crawl
    • Previously used for unbiased sampling of peer-to-peer networks [Stutzbach et al. IMC 2006]
• Public timeline data (35,978 users)
    • Timeline of the most recent messages, available on demand
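A minimal sketch of a Metropolized random walk, to show why it can serve as an unbiased check on a crawl. It assumes an undirected graph held in memory as an adjacency dict. The slides do not say how the directed follower graph was treated, so the function name, the `STAR` toy graph, and the data layout are all illustrative assumptions, not the authors' code.

```python
import random


def metropolized_random_walk(neighbors, start, steps, rng=None):
    """Walk an undirected graph so that, in the long run, nodes are visited uniformly.

    neighbors: dict mapping each node to a list of adjacent nodes (assumed symmetric).
    Returns the list of visited nodes, including the start node.
    """
    rng = rng or random.Random()
    current = start
    visited = [current]
    for _ in range(steps):
        candidates = neighbors[current]
        if candidates:
            proposal = rng.choice(candidates)
            # Metropolis-Hastings acceptance: moves toward higher-degree nodes
            # are accepted less often, which cancels the degree bias of a
            # plain random walk.
            accept = min(1.0, len(candidates) / len(neighbors[proposal]))
            if rng.random() < accept:
                current = proposal
        # A rejected proposal counts as another visit to the current node.
        visited.append(current)
    return visited


# Hypothetical toy graph: one hub (node 0) connected to four leaves.
STAR = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
```

On this star graph a plain random walk would sit on the hub about half the time; with the acceptance step, the hub's share of visits settles near one in five, the same as each leaf.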

SLIDE 8

Outline

• What are micro-content networks?
• Methodology
• Characterization
• Conclusions

SLIDE 9

High order results

• Following vs. followers
    • Relationships not always symmetric
• Different classes of users
    • Not all human
• Number of tweets varies significantly
• Geographic patterns vary
    • Few countries dominate

SLIDE 10

Characterization

• User relationships
• Properties of tweets
    • What tools are used to post tweets?
    • When are Twitter users active?
    • How many tweets do users have?
• Other properties of Twitter users
    • UTC offsets in the datasets
    • Geographical spread of Twitter

SLIDE 11

Characterizing user relationships

• “Followers” (people who subscribe to receive your tweets)
• “Following” (people whose tweets you subscribe to)
• Relationships are not necessarily symmetric

SLIDE 12

User relationships

SLIDE 13

User relationships - Broadcasters

• News outlets, radio stations
• No reason to follow anyone
• Post playlists, headlines

SLIDE 14

User relationships - Acquaintances

• Similar number of followers and following
• Along the diagonal
• Green portion is top 1-percentile of tweeters

SLIDE 15

User relationships - Odd

• Some people follow many users (programmatically)
    • Hoping some will follow them back
• Spam, widgets, celebrities (at top)
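The three groups on the preceding slides can be read as regions of the followers-vs-following scatter plot. The sketch below is one rough way to express that reading. The slides give no numeric cut-offs, so the function name and the `ratio` threshold of 10 are hypothetical choices for illustration only.

```python
def classify_user(followers, following, ratio=10.0):
    """Place a user in one of the three groups by follower/following imbalance.

    ratio is an assumed threshold: how lopsided the two counts must be
    before a user leaves the "acquaintance" band around the diagonal.
    """
    # max(..., 1) avoids treating a zero count as an infinite imbalance.
    if followers >= ratio * max(following, 1):
        return "broadcaster"   # many followers, follows almost no one
    if following >= ratio * max(followers, 1):
        return "odd"           # follows many users, few follow back
    return "acquaintance"      # roughly symmetric, near the diagonal
```

A news outlet with 50,000 followers that follows 3 accounts would land in the broadcaster group, while a user with 120 followers who follows 100 accounts stays in the acquaintance band.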

SLIDE 16

Characterizing user tweets

• Where do tweets come from?
• When are people tweeting?
• How many tweets do users have?

SLIDE 17

Where do tweets come from?

Source                 Crawl %   Crawl tweets   Timeline %   Timeline tweets
Web                       61.7         40,163         57.0            20,510
txt (mobile)               7.5          4,901          7.4             2,667
IM                         7.2          4,674          7.5             2,714
Facebook                   1.2            792          0.7               261
custom applications       22.4         14,566         27.3             9,821
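As a quick consistency check, the percentage columns can be recomputed from the tweet counts in the table. The sketch below uses only the counts shown on the slide; the dictionary and function names are illustrative.

```python
# Tweet counts per source, copied from the table on this slide.
CRAWL = {
    "Web": 40163,
    "txt (mobile)": 4901,
    "IM": 4674,
    "Facebook": 792,
    "custom applications": 14566,
}
TIMELINE = {
    "Web": 20510,
    "txt (mobile)": 2667,
    "IM": 2714,
    "Facebook": 261,
    "custom applications": 9821,
}


def source_shares(counts):
    """Return each source's share of all tweets, in percent, to one decimal."""
    total = sum(counts.values())
    return {source: round(100.0 * n / total, 1) for source, n in counts.items()}


if __name__ == "__main__":
    for name, counts in (("Crawl", CRAWL), ("Timeline", TIMELINE)):
        print(name, source_shares(counts))
```

The recomputed shares agree with the slide's percentage columns for both datasets.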

SLIDE 18

When are people tweeting?

• Steady activity during the day, with a drop-off during late-night hours

SLIDE 19

Number of tweets per user

SLIDE 20

Other properties of Twitter users

• UTC offsets
• Geographical spread of users

SLIDE 21

Comparison of UTC offsets of users between datasets

SLIDE 22

Geographical presence of Twitter

SLIDE 23

Summary

• One of the first large characterizations of Twitter
• Diversity of access methods
• Presence of interesting user communities (e.g., broadcasters)
• Distinct properties compared to larger OSNs

SLIDE 24

QUESTIONS?

http://www.readwriteweb.com/archives/cartoon_twitter_dating.php
http://itmanagement.earthweb.com/cnews/article.php/3754291/Tech+Comics:+Twitter+and+140+Characters.htm