Course website: http://cs224w.stanford.edu


  1. CS224W: Social and Information Network Analysis
     Jure Leskovec, Stanford University
     http://cs224w.stanford.edu

  2. Course website: http://cs224w.stanford.edu
     • Slides will be available online
     • Reading material will be posted online:
        • Chapters from the book by Jon Kleinberg and David Easley of Cornell
        • Whole book is available at: http://www.cs.cornell.edu/home/kleinber/networks-book
     9/22/2010, Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

  3. • Contact (buddy) list
     • Messaging window

  4. • Observe social and communication phenomena at a planetary scale
     • Largest social network analyzed to date
     Questions:
     • What is the structure of the communication network?

  5. Data for June 2006
     • Log size: 150 GB/day (compressed)
     • Total: 1 month of communication data, 4.5 TB of compressed data
     Activity over June 2006 (30 days):
     • 245 million users logged in
     • 180 million users engaged in conversations
     • 17.5 million new accounts activated
     • More than 30 billion conversations
     • More than 255 billion exchanged messages

  6. Activity on a typical day (June 1, 2006):
     • 1 billion conversations
     • 93 million users log in
     • 65 million different users talk (exchange messages)
     • 1.5 million invitations for new accounts sent

  7. Fraction of country’s population on MSN:
     • Iceland: 35%
     • Spain: 28%
     • Netherlands, Canada, Sweden, Norway: 26%
     • France, UK: 18%
     • USA, Brazil: 8%

  8. Figure labels: Buddy, Conversation

  9. Buddy graph
     • 240 million people (people who logged in during June ’06)
     • 9.1 billion buddy edges (friendship links)
     Communication graph (take only 2-user conversations)
     • Edge if the users exchanged at least 1 message
     • 180 million people
     • 1.3 billion edges
     • 30 billion conversations
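The communication-graph rule on this slide (keep only two-user conversations, add an edge when at least one message was exchanged) can be sketched in a few lines of Python. The `Conversation` record and its fields are hypothetical, since the slides do not describe the log format, and the toy log is invented purely for illustration.

```python
from collections import namedtuple

# Hypothetical record layout; the slides do not describe the real log format.
Conversation = namedtuple("Conversation", ["participants", "messages"])

def communication_edges(conversations):
    """Return undirected edges between users who exchanged at least one
    message in a two-user conversation."""
    edges = set()
    for conv in conversations:
        users = set(conv.participants)
        # Keep only two-user conversations, as the slide specifies.
        if len(users) != 2:
            continue
        # An edge requires at least one exchanged message.
        if conv.messages < 1:
            continue
        edges.add(frozenset(users))
    return edges

# Invented toy log, for illustration only.
toy_log = [
    Conversation(("ann", "bob"), 12),
    Conversation(("bob", "ann"), 3),         # same pair again: still one edge
    Conversation(("ann", "bob", "cat"), 7),  # three users: skipped
    Conversation(("cat", "dan"), 0),         # no messages: no edge
    Conversation(("bob", "cat"), 1),
]
edges = communication_edges(toy_log)
nodes = set().union(*edges)
print(len(nodes), "nodes,", len(edges), "edges")
```

Storing each edge as a `frozenset` makes the graph undirected and collapses repeated conversations between the same pair into a single edge.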

  11. Remove nodes (in some order) and observe how the network falls apart:
      • Number of edges deleted
      • Size of largest connected component
      Order nodes by:
      • Number of links
      • Total conversations
      • Total conv. duration
      • Messages/conversation
      • Avg. sent, avg. duration
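A minimal sketch of this removal experiment, assuming an undirected graph stored as an adjacency dictionary and using "number of links" (degree) as the ordering. The toy graph and the function names are illustrative assumptions; the slides do not say how the original analysis was implemented.

```python
from collections import deque

def largest_component_size(adj, removed):
    """Size of the largest connected component, ignoring removed nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over the surviving nodes
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best

def removal_curve(adj, order):
    """Remove nodes in the given order; after each removal record
    (nodes removed, edges deleted so far, largest component size)."""
    removed = set()
    edges_deleted = 0
    curve = []
    for node in order:
        # Edges to already-removed neighbours were counted earlier.
        edges_deleted += sum(1 for nbr in adj[node] if nbr not in removed)
        removed.add(node)
        curve.append((len(removed), edges_deleted,
                      largest_component_size(adj, removed)))
    return curve

# Toy graph: a hub (0) joined to a path 1-2-3 and to leaves 4 and 5.
edge_list = [(0, 1), (0, 4), (0, 5), (1, 2), (2, 3)]
adj = {}
for u, v in edge_list:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Order nodes by number of links (degree), highest first.
by_degree = sorted(adj, key=lambda n: len(adj[n]), reverse=True)
curve = removal_curve(adj, by_degree)
for row in curve:
    print(row)
```

Swapping `by_degree` for an ordering by total conversations or conversation duration would reproduce the other variants listed on the slide, given the corresponding per-node statistics.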

  13. Origins of a small-world idea:
      • Bacon number:
         • Create a network of Hollywood actors
         • Connect two actors if they co-appeared in a movie
         • Bacon number: number of steps to Kevin Bacon
      • As of Dec 2007, the highest (finite) Bacon number reported is 8
      • Only approx. 12% of all actors cannot be linked to Bacon
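The Bacon-number construction amounts to a breadth-first search on the co-appearance graph. The sketch below assumes a hypothetical `casts` mapping from film to cast list; the film and actor names are placeholders, not real filmography data.

```python
from collections import deque
from itertools import combinations

def costar_graph(casts):
    """Connect two actors if they co-appeared in at least one film."""
    adj = {}
    for cast in casts.values():
        for actor in cast:
            adj.setdefault(actor, set())
        for a, b in combinations(cast, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def bacon_numbers(adj, source):
    """Breadth-first search: number of steps from `source` to every actor
    that can be reached. Unreachable actors are absent from the result."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for costar in adj[actor]:
            if costar not in dist:
                dist[costar] = dist[actor] + 1
                queue.append(costar)
    return dist

# Placeholder data, for illustration only.
casts = {
    "film_1": ["Kevin Bacon", "actor_a"],
    "film_2": ["actor_a", "actor_b", "actor_c"],
    "film_3": ["actor_d", "actor_e"],
}
adj = costar_graph(casts)
numbers = bacon_numbers(adj, "Kevin Bacon")
unlinked = set(adj) - set(numbers)  # actors with no finite Bacon number
print(numbers)
print(sorted(unlinked))
```

Actors missing from `numbers` correspond to the slide's group that "cannot be linked to Bacon".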

  14. Erdős numbers are small.
      Hollywood and science are small worlds.

  16. What is the typical shortest path length between any two people?
      • Experiment on the global friendship network
         • Can’t measure, need to probe explicitly
      • The small-world experiment [Stanley Milgram ’67]
         • Picked 300 people at random
         • Asked them to get a letter to a stockbroker in Boston by passing it through friends
      • How many steps does it take?

  17. Milgram’s small-world experiment
      • 64 chains completed:
         • 6.2 steps on average, thus “6 degrees of separation”
      • Further observations:
         • People who owned stock had shorter paths to the stockbroker than random people: 5.4 vs. 5.7
         • People from the Boston area had even shorter paths: 4.4

  18. MSN Messenger network: number of steps between pairs of people

      Hops   Nodes
         0   1
         1   10
         2   78
         3   3,96
         4   8,648
         5   3,299,252
         6   28,395,849
         7   79,059,497
         8   52,995,778
         9   10,321,008
        10   1,955,007
        11   518,410
        12   149,945
        13   44,616
        14   13,740
        15   4,476
        16   1,542
        17   536
        18   167
        19   71
        20   29
        21   16
        22   10
        23   3
        24   2
        25   3

      Avg. path length: 6.6
      90% of the people can be reached in < 8 hops

  19. People use different networks: Boston vs. occupation
      Criticism:
      • Funneling:
         • 31 of 64 chains passed through 1 of 3 people as their final step
         • Not all links/nodes are equal
      • Choice of starting points and the target were non-random
      • People refuse to participate (25% for Milgram)
      • Some sort of social search: people in the experiment follow some strategy (e.g., geographic routing) instead of forwarding the letter to everyone. They are not finding the shortest path.
      • There are not many samples.
      • People might have used extra information resources.

  20. • What is the structure of a social network?
      • How do people behave in those networks, and which mechanisms do they use to route and find information?

  21. [Dodds-Muhamad-Watts, ’03]
      In 2003 Dodds, Muhamad and Watts performed the experiment using email:
      • 18 targets of various backgrounds
      • 24,000 first steps (~1,500 per target)
      • 65% dropout per step
      • 384 chains completed (1.5%)
      Avg. chain length = 4.01
      PROBLEM: Huge drop-out rate, i.e., longer chains are less likely to complete
      [Figure: x-axis is chain length, L]
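To see why a per-step dropout penalizes long chains, the sketch below computes the chance that a chain of a given length survives every step. It assumes, as a simplification not stated on the slide, that the 65% dropout applies independently and identically at each step.

```python
dropout = 0.65            # per-step dropout rate quoted on the slide
survive = 1.0 - dropout   # chance that one step forwards the message

def completion_probability(length):
    """Probability that a chain needing `length` steps is never abandoned,
    assuming independent, identical dropout at every step."""
    return survive ** length

for length in range(1, 11):
    print(f"L = {length:2d}: {completion_probability(length):.6f}")
```

Under this assumption the completion probability shrinks geometrically with length, so the completed chains over-represent short paths and the observed average understates the typical path length.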

  22. Huge drop-out rate: longer chains don’t complete.
      Correction proposed by Harrison White. Let:
      • $f_j$ = true (unobserved) fraction of chains that would have length $j$
      • $N$ = total number of starters
      • $N_j$ = number of starters who reached the target in $j$ steps
      Then the observed fraction is $f_j^* := N_j / N$.
      Assume a drop-out rate of $1 - \alpha$ in each step, so $f_j^* = f_j \alpha^j$.
      Since $\sum_j f_j = 1$, it follows that $\sum_j f_j^* \alpha^{-j} = 1$.
      Observe $f_j^*$, then calculate the average drop-out rate $1 - \alpha$ and $f_j$.
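One way to carry out this correction numerically is sketched below. It writes α for the per-step probability that a chain continues (so the drop-out rate is 1 − α), treats the observed fraction of starters whose chains completed in j steps as the true fraction times α^j, and solves the normalization condition (corrected fractions sum to 1) for α by bisection. The choice of bisection, the function names, and the synthetic input are assumptions for illustration, not the method or data used in the lecture.

```python
def estimate_alpha(f_star, lo=1e-3, tol=1e-12):
    """Solve sum_j f_star[j] * alpha**(-j) == 1 for alpha in [lo, 1].

    f_star maps chain length j to the observed fraction of all starters
    whose chain completed in exactly j steps."""
    def total(alpha):
        return sum(frac * alpha ** (-j) for j, frac in f_star.items())

    if total(1.0) > 1.0:
        raise ValueError("observed fractions sum to more than 1")
    if total(lo) <= 1.0:
        raise ValueError("no solution in the search interval")
    hi = 1.0
    # total() decreases as alpha grows, so bisection on [lo, hi] works.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if total(mid) > 1.0:
            lo = mid   # alpha too small: corrected fractions exceed 1
        else:
            hi = mid
    return (lo + hi) / 2.0

def corrected_fractions(f_star, alpha):
    """Recover the length distribution: f_j = f_star_j / alpha**j."""
    return {j: frac * alpha ** (-j) for j, frac in f_star.items()}

# Synthetic check: start from a known distribution and a known alpha.
true_f = {3: 0.2, 4: 0.5, 5: 0.3}
true_alpha = 0.5
observed = {j: frac * true_alpha ** j for j, frac in true_f.items()}

alpha = estimate_alpha(observed)
recovered = corrected_fractions(observed, alpha)
mean_length = sum(j * frac for j, frac in recovered.items())
print(alpha, recovered, mean_length)
```

Because the correction multiplies longer chains by a larger factor, the corrected mean length is always at least as large as the mean over completed chains.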

  23. After the correction:
      • Typical path length L = 7
      Some not well understood phenomena in social networks:
      • Funneling effect: some of the target’s friends are more likely to be the final step.
         • Conjecture: high reputation/authority
      • Effects of the target’s characteristics: structurally, why are high-status targets easier to find?
         • Conjecture: core-periphery network structure

  24. • N: number of people assigned to correspond to the target
      • N_c: number of completed chains
      • r: fraction of people who did not forward
      • L: mean path length

  25. Assume each human is connected to 100 other people. So:
      • In step 1 she can reach 100 people
      • In step 2 she can reach 100*100 = 10,000 people
      • In step 3 she can reach 100*100*100 = 1,000,000 people
      • In 5 steps she can reach 10 billion people
      What’s wrong here?
      • Many edges are local (“short”): friend of a friend
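The back-of-the-envelope count above is just repeated multiplication by the number of friends. A short sketch, assuming the slide's figure of 100 friends per person and a network in which no two paths ever lead to the same person:

```python
friends_per_person = 100   # assumption stated on the slide

def reachable_upper_bound(steps, degree=friends_per_person):
    """Upper bound on people reached at exactly `steps` hops if no two
    paths ever led to the same person (a tree-like network)."""
    return degree ** steps

for steps in range(1, 6):
    print(steps, f"{reachable_upper_bound(steps):,}")
```

The bound is only attained when friends' friend lists never overlap. In a real social network many of a person's friends already know each other, so far fewer new people are reached at each step.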
