http://cs224w.stanford.edu Course website: Course website: - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University

http://cs224w.stanford.edu

slide-2
SLIDE 2

• Course website: http://cs224w.stanford.edu
• Slides will be available online
• Reading material will be posted online:
  • Chapters from the book by Jon Kleinberg and David Easley (Cornell)
  • The whole book is available at: http://www.cs.cornell.edu/home/kleinber/networks-book

9/22/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 2

slide-3
SLIDE 3

• Contact (buddy) list
• Messaging window


slide-4
SLIDE 4

• Observe social and communication phenomena at a planetary scale
• Largest social network analyzed to date

Questions:
• What is the structure of the communication network?


slide-5
SLIDE 5

• Data for June 2006
• Log size: 150Gb/day (compressed)
• Total: 1 month of communication data: 4.5Tb of compressed data
• Activity over June 2006 (30 days):
  • 245 million users logged in
  • 180 million users engaged in conversations
  • 17.5 million new accounts activated
  • More than 30 billion conversations
  • More than 255 billion exchanged messages
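
As a quick sanity check, the totals above can be reproduced from the per-day figures. This is only a back-of-the-envelope sketch: the variable names are illustrative, and it assumes 1 Tb = 1000 Gb and treats the "more than" figures as exact.

```python
# Figures quoted on the slide (June 2006, 30 days).
LOG_PER_DAY_GB = 150      # compressed log volume per day
DAYS = 30
CONVERSATIONS = 30e9      # "more than 30 billion", treated as exact here
MESSAGES = 255e9          # "more than 255 billion", treated as exact here

# Assumption: 1 Tb = 1000 Gb (decimal units).
total_tb = LOG_PER_DAY_GB * DAYS / 1000

# Rough average number of messages exchanged per conversation.
msgs_per_conversation = MESSAGES / CONVERSATIONS

print(f"total log size: {total_tb} Tb")
print(f"messages per conversation: {msgs_per_conversation:.1f}")
```

Under these assumptions the monthly total matches the 4.5Tb on the slide, and a conversation averages roughly 8.5 messages.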


slide-6
SLIDE 6

Activity on a typical day (June 1, 2006):
• 1 billion conversations
• 93 million users log in
• 65 million different users talk (exchange messages)
• 1.5 million invitations for new accounts sent


slide-7
SLIDE 7

Fraction of country’s population on MSN:

  • Iceland: 35%
  • Spain: 28%
  • Netherlands, Canada, Sweden, Norway: 26%
  • France, UK: 18%
  • USA, Brazil: 8%

slide-8
SLIDE 8

Buddy Conversation


slide-9
SLIDE 9

• Buddy graph
  • 240 million people (people that logged in during June ’06)
  • 9.1 billion buddy edges (friendship links)
• Communication graph (take only 2-user conversations)
  • Edge if the users exchanged at least 1 message
  • 180 million people
  • 1.3 billion edges
  • 30 billion conversations
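
The node and edge counts above imply an average degree for each graph. The sketch below applies the standard identity for undirected graphs (average degree = 2 x edges / nodes); the helper name `average_degree` is illustrative and not from the lecture.

```python
def average_degree(num_nodes: float, num_edges: float) -> float:
    """Mean degree of an undirected graph: every edge touches two nodes."""
    return 2 * num_edges / num_nodes

# Node and edge counts as quoted on the slide.
buddy = average_degree(240e6, 9.1e9)
communication = average_degree(180e6, 1.3e9)

print(f"buddy graph: about {buddy:.1f} buddies per person")
print(f"communication graph: about {communication:.1f} contacts per person")
```

On these figures a person has on the order of 76 buddies but exchanged messages with only about 14 of them during the month.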


slide-10
SLIDE 10


slide-11
SLIDE 11

• Remove nodes (in some order) and observe how the network falls apart:
  • Number of edges deleted
  • Size of largest connected component
• Order nodes by:
  • Number of links
  • Total conversations
  • Total conv. duration
  • Messages/conversation
  • Avg. sent, avg. duration
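
A minimal sketch of the node-removal experiment described above, run on a hypothetical five-node toy graph rather than the MSN data. Nodes are removed in decreasing order of degree ("number of links"); any of the other orderings could be substituted by changing the sort key. The function names are illustrative.

```python
from collections import deque

def largest_component_size(adj, removed):
    """Size of the largest connected component, ignoring removed nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best

def removal_curve(adj, order):
    """Remove nodes in `order`; record (edges deleted so far, largest component size)."""
    removed = set()
    edges_deleted = 0
    curve = []
    for node in order:
        # Count only edges whose other endpoint is still present.
        edges_deleted += sum(1 for nbr in adj[node] if nbr not in removed)
        removed.add(node)
        curve.append((edges_deleted, largest_component_size(adj, removed)))
    return curve

# Hypothetical toy graph: hub 0 joined to the path 1-2 and to leaves 3 and 4.
adj = {0: {1, 3, 4}, 1: {0, 2}, 2: {1}, 3: {0}, 4: {0}}
by_degree = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
curve = removal_curve(adj, by_degree)
for step, (deleted, giant) in enumerate(curve, start=1):
    print(step, deleted, giant)
```

Recomputing the components from scratch after each removal is quadratic and only suitable for small graphs; it is used here to keep the sketch short.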


slide-12
SLIDE 12


slide-13
SLIDE 13

Origins of a small-world idea:
• Bacon number:
  • Create a network of Hollywood actors
  • Connect two actors if they co-appeared in a movie
  • Bacon number: number of steps to Kevin Bacon
• As of Dec 2007, the highest (finite) Bacon number reported is 8
• Only approx. 12% of all actors cannot be linked to Bacon
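
Computing Bacon numbers amounts to a breadth-first search from Kevin Bacon in the co-appearance graph. The sketch below uses a hypothetical toy graph; the placeholder actors "Actor A" through "Actor D" are invented for illustration. Actors that BFS never reaches have no finite Bacon number.

```python
from collections import deque

def bacon_numbers(costars, source):
    """Breadth-first search: hop distance from `source` to every reachable actor."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for other in costars.get(actor, ()):
            if other not in dist:
                dist[other] = dist[actor] + 1
                queue.append(other)
    return dist

# Hypothetical co-appearance graph; the "Actor X" names are placeholders.
costars = {
    "Kevin Bacon": {"Actor A", "Actor B"},
    "Actor A": {"Kevin Bacon", "Actor C"},
    "Actor B": {"Kevin Bacon"},
    "Actor C": {"Actor A"},
    "Actor D": set(),  # never co-appeared with anyone: not linked to Bacon
}
numbers = bacon_numbers(costars, "Kevin Bacon")
print(numbers)
```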


slide-14
SLIDE 14

Erdos numbers are small


Hollywood and science are small‐worlds


slide-15
SLIDE 15


slide-16
SLIDE 16

• What is the typical shortest path length between any two people?
  • Experiment on the global friendship network
  • Can’t measure, need to probe explicitly
• The Small-world experiment [Stanley Milgram ’67]
  • Picked 300 people at random
  • Asked them to get a letter to a stockbroker in Boston by passing it through friends
• How many steps does it take?


slide-17
SLIDE 17

Milgram’s small world experiment

• 64 chains completed:
  • 6.2 steps on average, thus “6 degrees of separation”
• Further observations:
  • People who owned stock had shorter paths to the stockbroker than random people: 5.4 vs. 5.7
  • People from the Boston area had even shorter paths: 4.4


slide-18
SLIDE 18

MSN Messenger network: number of steps between pairs of people

Hops  Nodes
0     1
1     10
2     78
3     3,96
4     8,648
5     3,299,252
6     28,395,849
7     79,059,497
8     52,995,778
9     10,321,008
10    1,955,007
11    518,410
12    149,945
13    44,616
14    13,740
15    4,476
16    1,542
17    536
18    167
19    71
20    29
21    16
22    10
23    3
24    2
25    3

  • Avg. path length 6.6
  • 90% of the people can be reached in < 8 hops

slide-19
SLIDE 19

• People use different networks: Boston vs. occupation
• Criticism:
  • Funneling: 31 of 64 chains passed through 1 of 3 people as their final step, so not all links/nodes are equal
  • The starting points and the target were chosen non-randomly
  • People refuse to participate (25% for Milgram)
  • Some sort of social search: people in the experiment follow some strategy (e.g., geographic routing) instead of forwarding the letter to everyone. They are not finding the shortest path.
  • There are not many samples.
  • People might have used extra information resources.


slide-20
SLIDE 20

• What is the structure of a social network?
• How do people behave in those networks, and which mechanisms do they use to route and find information?


slide-21
SLIDE 21

[Dodds-Muhamad-Watts, ’03]

• In 2003 Dodds, Muhamad and Watts performed the experiment using email:
  • 18 targets of various backgrounds
  • 24,000 first steps (~1,500 per target)
  • 65% dropout per step
  • 384 chains completed (1.5%)
  • Avg. chain length = 4.01

PROBLEM: Huge drop-out rate, i.e., longer chains are less likely to complete

[Figure axis: chain length, L]
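
To see why a 65% per-step dropout penalizes long chains so heavily, the sketch below computes the chance that a chain of a given length survives every step. It assumes dropout is independent and identical at each step, which is a simplification and not the analysis performed in the study; `completion_probability` is an illustrative name.

```python
def completion_probability(dropout: float, length: int) -> float:
    """Chance that a chain of `length` steps survives if each step drops out independently."""
    return (1.0 - dropout) ** length

# 65% dropout per step is the figure quoted on the slide.
for steps in range(1, 8):
    print(steps, f"{completion_probability(0.65, steps):.4%}")

p4 = completion_probability(0.65, 4)
p7 = completion_probability(0.65, 7)
```

Under this assumption a 4-step chain completes about 1.5% of the time, while a 7-step chain completes less than 0.1% of the time, so the observed average length is biased toward short chains.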

slide-22
SLIDE 22

• Huge drop-out rate:
  • Longer chains don’t complete
• Correction proposed by Harrison-White. Let:
  • $f_j$ = true (unobserved) fraction of chains that would have length $j$
  • $N$ = total # of starters
  • $N_j$ = # starters who reached the target in $j$ steps
  • Then: $f_j^* := N_j/N$
  • Assume drop-out rate $1-\alpha$ in each step, so $f_j^* = f_j \alpha^j$
  • $\sum_j f_j = 1 \Rightarrow \sum_j f_j^* \alpha^{-j} = 1$
  • Observe $f_j^*$, calculate the average dropout rate $1-\alpha$ and $f_j$
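
The correction above can be sketched numerically: given observed fractions $f_j^*$, find the per-step retention $\alpha$ that makes the corrected fractions sum to 1, then rescale. The bisection solver and the input numbers below are illustrative assumptions, not taken from the lecture or the study.

```python
def estimate_retention(observed, tol=1e-12):
    """Solve sum_j f*_j * alpha**(-j) = 1 for alpha in (0, 1] by bisection.

    `observed` maps chain length j to the observed fraction f*_j = N_j / N.
    """
    def total(alpha):
        return sum(f_star * alpha ** (-j) for j, f_star in observed.items())

    lo, hi = 1e-9, 1.0  # total(lo) is huge; total(1.0) <= 1 since the f*_j sum to at most 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) > 1.0:
            lo = mid  # corrected fractions sum to more than 1: alpha must be larger
        else:
            hi = mid
    return (lo + hi) / 2

def corrected_distribution(observed):
    """Return (alpha, {j: f_j}) where f_j = f*_j * alpha**(-j)."""
    alpha = estimate_retention(observed)
    return alpha, {j: f_star * alpha ** (-j) for j, f_star in observed.items()}

# Hypothetical observed fractions f*_j (not data from the study).
observed = {3: 0.025, 4: 0.03125, 5: 0.009375}
alpha, corrected = corrected_distribution(observed)
print(f"retention alpha = {alpha:.3f}, dropout = {1 - alpha:.3f}")
print({j: round(f, 3) for j, f in corrected.items()})
```

In this made-up example the observed chains peak at length 4 with a small share at length 5, but after correction the share of length-5 chains grows from under 15% of completed chains to 30% of all chains, which shows how the correction shifts weight toward longer paths.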


slide-23
SLIDE 23

• After the correction:
  • Typical path length L=7
• Some not well understood phenomena in social networks:
  • Funneling effect: some of the target’s friends are more likely to be the final step.
    • Conjecture: High reputation/authority
  • Effects of target’s characteristics: structurally, why are high-status targets easier to find?
    • Conjecture: Core-periphery net structure


slide-24
SLIDE 24
  • N… # people assigned to correspond to target
  • Nc… # completed chains
  • r… frac. of people who did not forward
  • L… mean path length

slide-25
SLIDE 25

• Assume each human is connected to 100 other people. So:
  • In step 1 she can reach 100 people
  • In step 2 she can reach 100*100 = 10,000 people
  • In step 3 she can reach 100*100*100 = 1,000,000 people
  • In 5 steps she can reach 10 billion people
• What’s wrong here?
  • Many edges are local (“short”): friend of a friend
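
The naive count above is just repeated multiplication, sketched below. The function name `naive_reach` is illustrative. The calculation assumes no two people ever share a friend, which is exactly the assumption the slide questions.

```python
def naive_reach(friends_per_person: int, steps: int) -> int:
    """People reachable at `steps` hops if friend circles never overlap (unrealistic)."""
    return friends_per_person ** steps

reach = [naive_reach(100, k) for k in range(1, 6)]
for k, people in enumerate(reach, start=1):
    print(f"step {k}: {people:,} people")
```

Because many friendships are local (a friend of a friend is often already a friend), the real number reached at each step is far below this bound.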


slide-26
SLIDE 26

• How can we understand the small-world phenomenon?
• What is a good model?
• Plan:
  • Simplest random graph model [Erdos-Renyi, ’60]
  • The Small-world model [Watts-Strogatz ’98]
  • Models of geographic search in networks
  • Connections to peer-to-peer networks


slide-27
SLIDE 27

• Erdos-Renyi Random Graph model [Erdos-Renyi, ’60]
  • a.k.a. Poisson/Bernoulli random graphs
  • Not a perfect model, but it allows interesting calculations
• Two variants:
  • $G_{n,p}$: undirected graph on $n$ nodes where each edge $(u,v)$ appears i.i.d. with probability $p$.
    • So a graph with $m$ edges appears with prob. $\binom{M}{m} p^m (1-p)^{M-m}$, where $M = n(n-1)/2$ is the max number of edges
  • $G_{n,m}$: undirected graph with $n$ nodes and $m$ edges picked uniformly at random

What kinds of networks does such a model produce?
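
A minimal sketch of the $G_{n,p}$ variant: flip an independent coin for each of the $M = n(n-1)/2$ possible edges. The second helper evaluates the edge-count probability $\binom{M}{m} p^m (1-p)^{M-m}$ and checks that it sums to 1 over all $m$. Function names, the seed, and the parameter values are illustrative assumptions.

```python
import itertools
import math
import random

def sample_gnp(n, p, rng):
    """Edge list of one G(n, p) sample: each unordered pair is kept with probability p."""
    return [(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p]

def edge_count_pmf(n, p, m):
    """Probability that G(n, p) has exactly m edges (fine for small n; large n overflows floats)."""
    M = n * (n - 1) // 2
    return math.comb(M, m) * p ** m * (1 - p) ** (M - m)

rng = random.Random(0)  # fixed seed, arbitrary choice
n, p = 10, 0.3          # hypothetical parameters
M = n * (n - 1) // 2
edges = sample_gnp(n, p, rng)
pmf_total = sum(edge_count_pmf(n, p, m) for m in range(M + 1))

print(f"sampled {len(edges)} of {M} possible edges; expected about {p * M:.1f}")
print(f"pmf sums to {pmf_total:.6f}")
```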

slide-28
SLIDE 28

• What is the expected degree of a node?
  • Let $X_v$ be a random variable measuring the degree of node $v$ (# of incident edges): $E[X_v] = \sum_j j \, P(X_v = j)$
  • Linearity of expectation:
    • For any random variables $Y_1, Y_2, \dots, Y_k$
    • If $Y = Y_1 + Y_2 + \dots + Y_k$, then $E[Y] = \sum_i E[Y_i]$
  • Easier way: decompose $X_v$ as $X_v = X_{v1} + X_{v2} + \dots + X_{vn}$
    • where $X_{vu}$ is a {0,1}-random variable which tells if edge $(v,u)$ exists or not (with $X_{vv} = 0$, since there are no self-loops). So:
      $E[X_v] = \sum_u E[X_{vu}] = p(n-1)$
  • How to think about it:
    • Prob. of node $u$ linking to node $v$ is $p$
    • $u$ can link (flips a coin) to each of the $(n-1)$ remaining nodes
    • Thus, the expected degree of node $u$ is $p(n-1)$
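
The result $E[X_v] = p(n-1)$ can be checked empirically by sampling a few $G_{n,p}$ graphs and averaging node degrees. This is a simulation sketch under arbitrary, hypothetical parameters; the helper name and seed are illustrative.

```python
import itertools
import random

def mean_degree_gnp(n, p, rng):
    """Sample one G(n, p) graph and return its average node degree."""
    degree = [0] * n
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:  # coin flip for edge (u, v)
            degree[u] += 1
            degree[v] += 1
    return sum(degree) / n

rng = random.Random(1)  # fixed seed, arbitrary choice
n, p = 200, 0.05        # hypothetical parameters
trials = 50
empirical = sum(mean_degree_gnp(n, p, rng) for _ in range(trials)) / trials
expected = p * (n - 1)

print(f"empirical mean degree: {empirical:.3f}")
print(f"p(n-1):                {expected:.3f}")
```

The two numbers should agree up to sampling noise, which shrinks as `n` or `trials` grows.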
