http://cs224w.stanford.edu October August 12/3/2013 Jure - PowerPoint PPT Presentation

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu

October August 12/3/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2

• Imagine you want to track the flow of information
• We would like to identify cascades like this:
[Figure: example cascade; node labels: obscure tech story, small tech blog, Engadget, Wired, Slashdot, BBC, NYT, CNN]

[SDM '07]
• Tracking hyperlinks on the blogosphere
[Figure: blogs, blog posts, time-ordered hyperlinks, and the resulting information cascade]
• Identify cascades: graphs induced by a time-ordered propagation of hyperlinks

[SDM '07]
Cascade shapes (ranked by frequency)
The probability of observing a cascade on n nodes follows: $p(n) \sim n^{-2}$
[Figure: count vs. cascade size (number of nodes)]

[SDM '07]
[Figure: number of edges vs. cascade size (number of nodes); effective diameter vs. cascade size]
• Most cascades are trees:
  • The number of edges is smaller than the number of nodes in a cascade
• Diameter increases logarithmically with cascade size

• Cascade sizes follow a heavy-tailed distribution
  • Viral marketing:
    • Books: steep drop-off, power-law exponent -5
    • DVDs: larger cascades, exponent -1.5
  • Blogs:
    • Power-law exponent -2
• What's a good model?
• What role does the underlying social network play?
• Can we make a step towards a more realistic cascade generation (propagation) model?
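To get a feel for what these exponents mean, the sketch below compares how much probability falls on large cascades under p(n) ~ n^(-alpha) for the three exponents quoted above. It is an illustration only: the truncation at 1,000 nodes, the 10-node threshold, and the function names are assumptions, not part of the lecture.

```python
def power_law_pmf(alpha, max_size):
    """Normalized p(n) proportional to n**(-alpha) for n = 1..max_size."""
    weights = [n ** (-alpha) for n in range(1, max_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def tail_mass(pmf, min_size):
    """Probability that a cascade has at least min_size nodes."""
    return sum(pmf[min_size - 1:])


# Exponents quoted on the slide; max_size and the threshold are illustrative.
exponents = {"books": 5.0, "blogs": 2.0, "DVDs": 1.5}
tails = {
    name: tail_mass(power_law_pmf(alpha, max_size=1000), min_size=10)
    for name, alpha in exponents.items()
}

for name, mass in sorted(tails.items(), key=lambda item: item[1]):
    print(f"{name}: P(size >= 10) = {mass:.2e}")
```

A smaller exponent means a heavier tail, so under these assumptions DVD cascades of 10 or more nodes are far more likely than book cascades of the same size.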

1) Randomly pick a blog to infect, add it to the cascade.
2) Infect each in-linked neighbor with probability β.
3) Add infected neighbors to the cascade.
4) Set the node infected in step 1) to uninfected.
[Figure: four panels illustrating steps 1 to 4 on a small network of blogs B1 to B4]
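The four steps above can be sketched as a short simulation. This is a simplified reading of the model, not the authors' implementation: cascade nodes are blogs rather than individual posts, and the `start` and `max_infections` parameters, the function name, and the toy network are all assumptions added for illustration.

```python
import random


def generate_cascade(in_links, beta, rng, start=None, max_infections=10_000):
    """Simulate one cascade on a blog network.

    in_links[b] lists the blogs that link to blog b (b's in-linked neighbors).
    Returns (nodes, edges); an edge (a, b) means blog a was infected by blog b.
    """
    if start is None:
        start = rng.choice(sorted(in_links))   # step 1: pick a random blog
    nodes = {start}
    edges = []
    infected = [start]
    # max_infections is a safety cap (assumption) for networks with cycles.
    while infected and len(edges) < max_infections:
        blog = infected.pop()                  # step 4: blog stops being infected
        for neighbor in in_links.get(blog, []):
            if rng.random() < beta:            # step 2: infect with probability beta
                nodes.add(neighbor)            # step 3: add to the cascade
                edges.append((neighbor, blog))
                infected.append(neighbor)
    return nodes, edges


# Hypothetical toy network: c links to b, and b links to a.
toy_in_links = {"a": ["b"], "b": ["c"], "c": []}
nodes, edges = generate_cascade(toy_in_links, beta=0.5, rng=random.Random(0))
print(sorted(nodes), edges)
```

Because a blog returns to the uninfected state after it has tried to infect its neighbors, it can be infected again later, which is why the sketch does not keep a permanent "visited" set.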

Generative model produces realistic cascades (β = 0.025)
[Figure: count vs. cascade size; count vs. cascade node in-degree; count vs. size of star cascade; count vs. size of chain cascade; most frequent cascades]

• Advantages:
  • Unambiguous, precise and explicit way to trace information flow
  • We obtain both the times as well as the trace (graph) of information flow
• Caveats:
  • Not all links transmit information:
    • Navigational links, templates, ads
  • Many links are missing:
    • Mainstream media sites do not create links
    • Bloggers "forget" to link the source
• (We will later see how to identify networks/cascades just based on what times sites mentioned information)
[Figure: example cascade; node labels: obscure tech story, small tech blog, Engadget, Slashdot, Wired, BBC, NYT, CNN]

[KDD '09]
• Extract textual fragments that travel relatively unchanged through many articles:
  • Look for phrases inside quotes: "…"
  • About 1.25 quotes per document in our data
• Why does it work? Quotes …
  • are integral parts of journalistic practices
  • tend to follow iterations of a story as it evolves
  • are attributed to individuals and have time and location
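A minimal sketch of the "look for phrases inside quotes" step, using a regular expression over straight and curly double quotes. The lowercasing, the whitespace normalization, and the `min_words` filter are assumptions made for this example; the lecture does not specify how quotes were normalized or filtered.

```python
import re

# Matches text between straight or curly double quotes.
QUOTE_PATTERN = re.compile(r'["“]([^"“”]+)["”]')


def extract_quotes(text, min_words=4):
    """Return normalized quoted phrases with at least min_words words."""
    quotes = []
    for match in QUOTE_PATTERN.finditer(text):
        # Lowercase and collapse whitespace so variants compare more easily.
        phrase = " ".join(match.group(1).lower().split())
        if len(phrase.split()) >= min_words:
            quotes.append(phrase)
    return quotes


article = (
    'McCain said "The fundamentals of our economy are strong" '
    'and later replied "no comment".'
)
print(extract_quotes(article))
```

Short quotes such as "no comment" are dropped here because very short fragments are unlikely to identify a single story; the threshold of four words is arbitrary.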

[KDD '09]
Quote: Our opponent is someone who sees America, it seems, as being so imperfect, imperfect enough that he's palling around with terrorists who would target their own country.

[KDD '09]
• Goal: Find mutational variants of a phrase
• Form approximate phrase inclusion graph
  • Shorter phrase is approximately included in a longer one (word edit distance = 1)
[Figure: example phrase nodes BDXCY, ABCDEFGH, BCD, ABCD]
• Objective: In DAG of approx. phrase inclusion, delete min total edge weight s.t. each connected component has a single "sink"
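One possible reading of "approximately included" is that the shorter phrase matches some contiguous window of the longer phrase within a small number of word-level edits. The sketch below implements that reading with a standard approximate-substring dynamic program; the function names and the `max_edits` parameter are assumptions, and the exact criterion used in the original work may differ.

```python
def min_window_edit_distance(short_words, long_words):
    """Smallest word-level edit distance between short_words and any
    contiguous window of long_words (approximate substring matching)."""
    prev = [0] * (len(long_words) + 1)    # a window may start anywhere for free
    for i, word in enumerate(short_words, start=1):
        curr = [i] + [0] * len(long_words)
        for j, other in enumerate(long_words, start=1):
            curr[j] = min(
                prev[j - 1] + (word != other),  # match or substitute a word
                prev[j] + 1,                    # drop a word of the short phrase
                curr[j - 1] + 1,                # absorb an extra word of the long phrase
            )
        prev = curr
    return min(prev)                      # a window may end anywhere


def approx_included(short, long, max_edits=1):
    """True if `short` is strictly shorter and nearly contained in `long`."""
    short_words, long_words = short.split(), long.split()
    if len(short_words) >= len(long_words):
        return False
    return min_window_edit_distance(short_words, long_words) <= max_edits


# Letters stand in for words, as in the slide's example phrases.
print(approx_included("B C D", "A B C D"))
print(approx_included("B C D", "B D X C Y"))
```

Each pair for which `approx_included` holds would become a directed edge from the shorter phrase to the longer one in the phrase inclusion graph.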

Nodes are phrases
[Figure: phrase nodes BDXCY, BCD, ABCDEFGH, ABCD, ABC, ABCEFG, ABCEF, CEFP, CEF, CEFPQR, UVCEXF]

Nodes are phrases
Edges are inclusion relations
[Figure: phrase nodes BDXCY, BCD, ABCDEFGH, ABCD, ABC, ABCEFG, ABCEF, CEFP, CEF, CEFPQR, UVCEXF, connected by inclusion edges]

Nodes are phrases
Edges are inclusion relations
Edges have weights
[Figure: phrase nodes BDXCY, BCD, ABCDEFGH, ABCD, ABC, ABCEFG, ABCEF, CEFP, CEF, CEFPQR, UVCEXF, connected by weighted inclusion edges]

• Objective: In a directed acyclic graph (approx. phrase inclusion), delete the minimum total edge weight such that each connected component has a single "sink" node
[Figure: phrase inclusion graph with nodes BDXCYZ, BCD, ABCDEFGH, ABCD, ABC, ABCEFG, ABCEF, CEFP, CEF, CEFPQR, UVCEXF]

[KDD '09]
• DAG-partitioning is NP-hard but heuristics are effective:
  • Observation: It is enough to know each node's parent to reconstruct the optimal solution
  • Heuristic: Proceed right-to-left and assign a node to the strongest cluster (keep a single edge)
[Figure: weighted phrase inclusion graph (nodes are phrases, edges are inclusion relations, edges have weights) with nodes BDXCY, BCD, ABCDEFGH, ABCD, ABC, ABCEFG, ABXCE, CEFP, CEF, CEFPQR, UVCEXF]
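The heuristic above can be sketched as a greedy pass that starts at the sinks and lets every other phrase join the cluster that receives the largest total weight of its outgoing edges. This is one interpretation of "assign a node to the strongest cluster", not the authors' code; the function name, the tie-breaking rule, and the edge weights in the toy graph are made up for illustration.

```python
from collections import defaultdict


def partition_dag(nodes, edges):
    """Greedy DAG partitioning.

    edges maps (src, dst) -> weight, where src is (approximately) included
    in dst. The graph must be acyclic. Returns {node: sink of its cluster}.
    """
    out_edges = defaultdict(list)
    for (src, dst), weight in edges.items():
        out_edges[src].append((dst, weight))

    cluster = {}

    def assign(node):
        if node in cluster:
            return cluster[node]
        if not out_edges[node]:
            cluster[node] = node          # a sink is the root of its own cluster
            return node
        totals = defaultdict(float)
        for dst, weight in out_edges[node]:
            totals[assign(dst)] += weight  # weight this node sends to each cluster
        # Keep only the strongest cluster; ties go to the alphabetically first sink.
        cluster[node] = max(sorted(totals), key=totals.get)
        return cluster[node]

    for node in nodes:
        assign(node)
    return cluster


# Hypothetical weights on a few of the slide's example phrases.
edges = {
    ("BCD", "ABCD"): 1.0,
    ("ABC", "ABCD"): 2.0,
    ("ABCD", "ABCDEFGH"): 1.0,
    ("ABC", "ABCEFG"): 1.0,
    ("CEF", "ABCEFG"): 1.0,
    ("CEF", "CEFPQR"): 3.0,
}
nodes = sorted({phrase for edge in edges for phrase in edge})
clusters = partition_dag(nodes, edges)
for phrase in nodes:
    print(phrase, "->", clusters[phrase])
```

Recursion is used for brevity; for a large phrase graph an explicit topological order would avoid deep call stacks.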

Quoted text | Volume
the fundamentals of our economy are strong | 3654
the fundamentals of the economy are strong | 988
fundamentals of our economy are strong | 645
fundamentals of the economy are strong | 557
if john mccain hadn't said that the fundamentals of our economy are strong on the day of one of our nation's worst financial crises the claim that he invented the blackberry would have been the most preposterous thing said all week | 224
fundamentals of the economy | 172
the fundamentals of the economy are sound | 119
i promise you we will never put america in this position again we will clean up wall street | 83
the fundamentals of our economy are sound | 81
clean up wall street | 78
our economy i think still the fundamentals of our economy are strong | 75
fundamentals of the economy are sound | 72
the fundamentals of our economy are strong but these are very very difficult times and i promise you we will never put america in this position again | 68
the economy is in crisis | 66
these are very very difficult times | 63
the fundamentals of our economy are strong but these are very very difficult times | 62
do you still think the fundamentals of our economy are strong genius | 62
our economy i think still the fundamentals of our economy are strong but these are very very difficult times | 60
mccain's first response to this crisis was to say that the fundamentals of our economy are strong then he admitted it was a crisis and then he proposed a commission which is just washington-speak for i'll get back to you later | 55
i still believe the fundamentals of our economy are strong | 53
i think still the fundamentals of our economy are strong | 50
cut taxes for 95 percent of all working families | 50
today of all days john mccain's stubborn insistence that the fundamentals of the economy are strong shows that he is |

• Since 2008 we have been collecting nearly all blog posts and news articles:
  • 6 billion documents
  • 20 TB of data
• Solution: Graph stream clustering
  • Phrases arrive in a stream
  • Simultaneously cluster the graph and attach phrases to the graph
  • Dynamically remove completed clusters

Can we extract any interesting temporal variations?
… is periodic, has no trends.
"Bandwidth" of the online media is constant

• Volume over time of the top 50 memes (phrase clusters) by total volume
• More at: http://snap.stanford.edu/nifty

• Media coverage of the current economic crisis
• Main proponents of the debate:
[Figure annotations: speech in Congress; Dept. of Labor release; 60 Minutes interview; top Republican voice ranks only 14th]
