SLIDE 1

Growth of the Flickr Social Network

Alan Mislove†‡ Hema Swetha Koppula¶ Krishna Gummadi† Peter Druschel† Bobby Bhattacharjee§

†MPI-SWS ¶IIT Kharagpur ‡Rice University §University of Maryland

WOSN 2008

SLIDE 2

Alan Mislove 08.18.2008 WOSN 2008

Online social networks

  • Popular way to connect, share content
  • Among most visited sites on Web
  • Users: Orkut (60 M), LiveJournal (5 M)
  • Unique opportunity to study dynamics of large, complex social networks

SLIDE 3

Why study social network growth?

  • Online social networks share many structural properties
  • Significant clustering, small diameter, power-law degrees
  • Similar underlying growth processes?
  • Proper understanding of growth can
  • Provide insights into structure
  • Predict future growth
  • Model arbitrary-sized networks
  • Most work to date relies on theoretical models
  • Not known if they predict actual growth

SLIDE 4

This work

  • Use a measurement-driven approach to understand growth
  • Present large-scale measurement of Flickr network growth
  • ~1 M new users, ~10 M new links
  • Look for underlying cause of structural characteristics

  • High symmetry
  • Power-law node degree
  • Significant local clustering

SLIDE 5

Contributions

  • Methodology to collect large-scale network growth data
  • Measured both Flickr and YouTube
  • Make data available to researchers
  • Much larger scale, higher granularity than existing data sets
  • Already in use
  • Initial analysis
  • Examine high-level properties of growth data
  • Test whether data is consistent with existing models

SLIDE 6

Rest of the talk

  • Measuring social network growth
  • Analyzing growth properties
  • Related work

SLIDE 7

Crawling social networks

  • Flickr reluctant to give out data
  • Cannot enumerate user list
  • Instead, performed crawls of user graph
  • Picked known seed user
  • Crawled all of the seed user's friends
  • Added new users to list
  • Continued until all reachable users crawled
  • Effectively performed a breadth-first search (BFS) of graph
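
As a rough illustration of the crawl described on this slide, the sketch below runs a breadth-first search from a seed user. It is not the authors' crawler: `fetch_contacts` is a hypothetical callback standing in for a rate-limited API call, and the toy graph is made-up data.

```python
from collections import deque


def crawl_bfs(seed, fetch_contacts):
    """Breadth-first crawl of a directed contact graph starting at `seed`.

    `fetch_contacts(user)` is a hypothetical callback that returns the users
    `user` links to; a real crawler would wrap a rate-limited API call here.
    Returns the set of discovered users and the set of directed links.
    """
    seen = {seed}
    queue = deque([seed])
    edges = set()
    while queue:
        user = queue.popleft()
        for friend in fetch_contacts(user):
            edges.add((user, friend))
            if friend not in seen:
                # First time we see this user: schedule them for crawling.
                seen.add(friend)
                queue.append(friend)
    return seen, edges


# Toy graph standing in for the real service (illustrative data only).
toy = {"a": ["b", "c"], "b": ["a"], "c": ["d"], "d": [], "e": ["a"]}
users, links = crawl_bfs("a", lambda u: toy.get(u, []))
```

In this toy run, user `e` links to `a` but is not linked from any crawled user, so the sketch never discovers `e`. A crawl that follows only outgoing links can miss such users.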

SLIDE 9

Observing growth

  • Crawls subject to rate-limiting
  • Discovered appropriate rate
  • Crawled using cluster of 58 machines
  • Using Flickr API
  • Result: could complete crawl in 1 day
  • Repeated daily for 3 months
  • Revisited all previously discovered users
  • Looked for new links, users
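
One way to picture the daily re-crawl is as a comparison of consecutive snapshots. The sketch below assumes each crawl is stored as a set of (source, destination) pairs; the function name and the sample data are illustrative and not taken from the study.

```python
def diff_snapshots(previous, current):
    """Compare two daily crawls, each a set of (source, destination) links.

    Returns (added, removed): links present only in the newer crawl and
    links present only in the older crawl.
    """
    added = current - previous
    removed = previous - current
    return added, removed


# Two made-up daily snapshots.
day1 = {("a", "b"), ("b", "a"), ("a", "c")}
day2 = {("a", "b"), ("b", "a"), ("c", "a"), ("c", "d")}
added, removed = diff_snapshots(day1, day2)
```

With one crawl per day, a diff like this can say that a link appeared between two crawls, but not in which order links were created within that day.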

SLIDE 10

How much were we able to crawl?

  • Users don’t necessarily form a single weakly connected component (WCC)
  • Disconnected users
  • Estimate coverage by selecting random users
  • Result: 27% coverage
  • But, disconnected users have very low degree
  • 90% have no outgoing links
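
The coverage estimate can be sketched as a simple sampling experiment: draw users at random and count how many are already in the crawled set. How random users were actually drawn is not stated on the slide, so `sample_user` below is a hypothetical stand-in, and the population is synthetic.

```python
import random


def estimate_coverage(crawled, sample_user, trials, rng):
    """Estimate the fraction of all users that the crawl reached.

    `sample_user(rng)` is a hypothetical callback returning one user drawn
    uniformly at random from the whole population.
    """
    hits = sum(1 for _ in range(trials) if sample_user(rng) in crawled)
    return hits / trials


# Synthetic population in which 27% of the users were crawled.
population = [f"u{i}" for i in range(1000)]
crawled = set(population[:270])
rng = random.Random(0)
estimate = estimate_coverage(crawled, lambda r: r.choice(population), 5000, rng)
```

The estimate converges on the true crawled fraction as the number of trials grows.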

SLIDE 12

Limitations to growth data

  • Newly discovered users may have existing links
  • Don’t know when existing links were created
  • Only count links we observed being created
  • Crawls have resolution of 1 day
  • Can’t tell order of link creation within a day

SLIDE 14

Rest of the talk

  • Measuring social network growth
  • Analyzing growth properties
  • Related work

SLIDE 15

Growth data characteristics

  • Crawled Flickr daily for over 3 months
  • Nov. 2 - Dec. 3, 2006 and Feb. 3 - May 18, 2007
  • Observed ~1 M new users and ~10 M new links
  • Network grew from 17 M to 33 M links
  • Growth rate of 455% per year
  • Link addition dominates removal
  • 2.43:1 ratio (conservative)
  • Focus only on link addition

SLIDE 16

Network growth questions

  • How does growth lead to observed structural properties?
  • Is growth consistent with a known model?
  • Networks have high symmetry
  • What causes symmetric links to form?
  • Networks follow power-laws
  • Which users create and receive new links?
  • Does it happen via preferential attachment?
  • Networks have significant local clustering
  • Much higher than random power-law graphs
  • How do users select new destinations?

SLIDE 17

How quickly do symmetric links form?

  • Over 80% of symmetric links created within 48 hours

[Figure: CDF of the time between establishment of the two halves of a link, in days]
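
A measurement like this could be computed from per-link observation days roughly as follows. This is a sketch under assumptions: `link_days` (a mapping from each directed link to the day it was first observed) is a hypothetical data layout, and the sample values are made up.

```python
def reciprocation_delays(link_days):
    """Delays, in days, between the two halves of each symmetric link.

    `link_days` maps (source, destination) to the day the directed link was
    first observed. Links without a reverse link are ignored.
    """
    delays = []
    for (src, dst), day in link_days.items():
        back = link_days.get((dst, src))
        # `src < dst` makes sure each symmetric pair is counted once.
        if back is not None and src < dst:
            delays.append(abs(day - back))
    return sorted(delays)


def fraction_within(delays, max_days):
    """Fraction of delays that are at most `max_days` (one CDF point)."""
    if not delays:
        return 0.0
    return sum(d <= max_days for d in delays) / len(delays)


# Made-up observations: three symmetric pairs and one one-way link.
observed = {
    ("a", "b"): 3, ("b", "a"): 4,
    ("a", "c"): 1, ("c", "a"): 10,
    ("b", "c"): 5, ("c", "b"): 5,
    ("d", "a"): 2,
}
delays = reciprocation_delays(observed)
share = fraction_within(delays, 2)
```

Evaluating `fraction_within` over a range of `max_days` values gives the points of an empirical CDF.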

SLIDE 18

Reciprocity

  • Users can create link in response to incoming link
  • “Out of courtesy”
  • Known in sociology
  • Flickr emails users about new incoming links
  • Data consistent with reciprocity causing high level of link symmetry

SLIDE 19

Preferential attachment

  • Model for creating power-law networks
  • Known as “cumulative advantage” or “rich get richer”
  • New links go preferentially to nodes with many links
  • For directed networks, we define
  • Preferential creation
  • Preferential reception

SLIDE 20

Is preferential attachment happening?

  • Yes, linear correlation between
  • Links created and outdegree (preferential creation)
  • Links received and indegree (preferential reception)
  • Is this consistent with a known model?
  • Both global and local models have been proposed

[Figure: log-log plots of links received (new links/node/day) vs. indegree and links created (new links/node/day) vs. outdegree]
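
The quantity on the plots' vertical axes (new links per node per day, grouped by degree) could be computed along these lines. The sketch covers preferential reception only; the function name, the input layout, and the sample data are assumptions for illustration.

```python
from collections import Counter, defaultdict


def reception_rate_by_indegree(indegree, new_links, days):
    """Mean new links received per node per day, grouped by indegree.

    `indegree` maps each node to its indegree at the start of the window;
    `new_links` lists the (source, destination) links created in the window.
    """
    received = Counter(dst for _, dst in new_links)
    nodes_at = Counter(indegree.values())  # number of nodes per indegree
    total_at = defaultdict(int)            # links received per indegree
    for node, deg in indegree.items():
        total_at[deg] += received[node]
    return {deg: total_at[deg] / (nodes_at[deg] * days) for deg in nodes_at}


# Made-up data: two low-indegree nodes and one high-indegree node.
indegree = {"a": 1, "b": 1, "c": 10}
new_links = [("x", "a"), ("y", "c"), ("z", "c"), ("x", "c"), ("w", "c")]
rates = reception_rate_by_indegree(indegree, new_links, days=2)
```

Preferential creation can be measured the same way by counting link sources and grouping by outdegree.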

SLIDE 21

Barabasi-Albert (BA) model

  • Well-known model for creating power-law networks
  • Uses global preferential attachment
  • Destination selected using global weighted ranking
  • Is data consistent with such a global process?
  • Look for evidence using distance between source and destination

$$P(x) = \frac{d_x}{\sum_i d_i}$$
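
To make the formula concrete, the sketch below picks a destination with probability proportional to its degree, which is the global selection rule the slide attributes to the BA model. The degree table is made-up data and the function names are illustrative.

```python
import random


def attachment_probabilities(degree):
    """P(x) = d_x / sum_i d_i for every node x in the degree table."""
    total = sum(degree.values())
    return {node: d / total for node, d in degree.items()}


def pick_destination(degree, rng):
    """Draw one destination with probability proportional to its degree."""
    nodes = list(degree)
    weights = [degree[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


# Made-up degree table.
degree = {"a": 1, "b": 3, "c": 6}
probs = attachment_probabilities(degree)

rng = random.Random(42)
picks = [pick_destination(degree, rng) for _ in range(10000)]
share_c = picks.count("c") / len(picks)
```

Because the weights depend only on degree, the draw ignores how far the destination is from the source.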

SLIDE 22

Does proximity matter?

  • New friends much closer than BA model predicts
  • Models which take into account local rules may be more accurate

[Figure: CDF of distance (hops) between source and destination of new links; curves: predicted by BA model, observed]
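
The distance in question is the hop count from source to destination before the new link exists. A minimal sketch of that measurement follows, assuming the pre-existing graph is held as an adjacency mapping; the names and data are illustrative.

```python
from collections import deque


def hop_distance(adj, src, dst):
    """Shortest path length in hops from `src` to `dst`, or None if
    `dst` cannot be reached. `adj` maps each node to its outgoing links."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Made-up graph as it looked before the new links were created.
adj = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
new_links = [("a", "c"), ("a", "d"), ("d", "a")]
distances = [hop_distance(adj, s, d) for s, d in new_links]
```

A new link to a friend of a friend has distance 2, the smallest value possible for a link that does not yet exist.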

SLIDE 23

Implications of network growth

  • Observed growth of a large, complex social network
  • Found multiple growth processes at work
  • Reciprocity leads to high symmetry
  • Preferential attachment leads to power-law degrees
  • Proximity bias leads to local clustering
  • But, data inconsistent with global BA model
  • Future work: Modeling complex network growth
  • Based on local rules
  • Verify consistency of data with other proposed models

SLIDE 24

Related work

  • Growth models
  • Preferential attachment [Science’99]
  • Random walks [Physica A’04]
  • Common neighbors [Phys.Rev.E’01]
  • Small-scale empirical studies
  • Scientific collaboration networks [Phys.Rev.E’01,Euro.Phy.Ltrs’04]
  • Email networks [Science’06]
  • Movie actor networks [J.Stat.Mech.’06]

SLIDE 25

Summary

  • Presented first large-scale study of online social network growth
  • Collected data covering ~1 M new users, ~10 M new links
  • Found high-level growth processes at play
  • Growth via local, rather than global, processes
  • Data sets are available to researchers
  • Many already using data (72 researchers, including sociologists!)
  • Also have growth data for YouTube network

SLIDE 26

Questions?

Data sets available from: http://socialnetworks.mpi-sws.org