
CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu

- Teams of 2-3 students (1 is also ok)
- Project:
  - Experimental evaluation of algorithms and models on an interesting dataset
  - A theoretical project that considers a model, an algorithm or a network property and derives a rigorous result about it
  - An in-depth critical survey of one of the course topics, relating models, experimental results and underlying social theories and offering a novel perspective on the area

10/11/2010, Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

- Answer the following questions:
  - What is the problem you are solving?
  - What data will you use (how will you get it)?
  - How will you do the project?
    - Which algorithms/techniques/models do you plan to use/develop?
    - Be as specific as you can!
  - How will you evaluate and measure success?
  - What do you expect to submit/accomplish by the end of the quarter?

- The project should contain at least some amount of mathematical analysis, and some experimentation on real or synthetic data
- The result of the project will typically be a 10-page paper, describing the approach, the results, and the related work
- Due at midnight, Oct 18, 2010
  - Upload PDF to http://coursework.stanford.edu
  - TAs will assign group numbers; we will send a link to a GoogleDoc
  - Name your file: <group#>_proposal.pdf

- Wikipedia
- IM buddy graph
- Yahoo Altavista web graph
- Stanford WebBase
- Twitter Data
- Blogs and news data
- Yahoo Music Ratings

- Richly labeled network containing extracted data from Wikipedia (based on infoboxes):
  - Richly labeled network
  - Multiple types of nodes and edges
  - About 2.6 million concepts described by 247 million triples, including abstracts in 14 different languages
  - http://dbpedia.org
- Other OpenLinkedData datasets available at http://esw.w3.org/DataSetRDFDumps

- Networks of positive and negative edges
- Data includes:
  - Trust/distrust edges
  - Also Epinions product reviews and review ratings
- SNAP: http://snap.stanford.edu/data/#signnets
- Trustlet: http://www.trustlet.org/wiki
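A trust/distrust network can be loaded as a list of signed edges. The sketch below assumes a whitespace-separated edge list with one `source target sign` triple per line and `#` comment lines; that layout, the function name `read_signed_edges`, and the sample lines are assumptions for illustration, so check the actual files before relying on it.

```python
from collections import Counter


def read_signed_edges(lines):
    """Yield (source, target, sign) from 'source target sign' lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        src, dst, sign = line.split()[:3]
        yield int(src), int(dst), int(sign)


# Hypothetical sample, not taken from the real datasets.
sample = [
    "# source target sign",
    "0 1 1",
    "0 2 -1",
    "1 2 1",
]

edges = list(read_signed_edges(sample))
sign_counts = Counter(sign for _, _, sign in edges)
print(sign_counts[1], "positive edges,", sign_counts[-1], "negative edges")
```

Counting signs first is a cheap sanity check: it shows how unbalanced the positive and negative edges are before any further analysis.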

- Prosper marketplace: peer-to-peer lending
  - Borrowers ask for loans
  - People then bid (price, interest rate) on loans to fund them
  - Rich social structure around the website
- Data at http://www.prosper.com/tools/DataExport.aspx

- Turiya is a start-up that collects game data from game publishers and processes it to produce business intelligence of value to its clients
- Data collected includes:
  - Players and their attributes
  - Logs of game events
  - Information about virtual items
  - Information about transactions in real money or credits
- Analyses include:
  - Player segmentation
  - Virtual goods recommendations
  - Lifetime value estimation of players
- If you are interested, send us an email!

- What to Wear is a social game played on Facebook
  - Contestants create outfits and submit these to a daily competition, which has a theme, e.g. "an outfit for attending your ex's wedding"
  - Contestants can also vote and comment on other people's submissions
  - You get credit for both participating and judging
  - Items for outfits are either bought from the store or reused from the contestant's closet
  - ~30,000 players/month
- Data about this game includes:
  - Player data
  - Data about previous competitions
  - Fashion items data
  - Data about outfits
  - Many other data (~400 relations in all)
- If you are interested, send us an email!

- Amazon product review data. For each product:
  - Product info: name, salesrank
  - Product categorization
  - All reviews: user, rating, how helpful was the review
- People who bought X also bought Y: a network!
- If you are interested, send us an email!

- Collaboration network of computer scientists
- Each CS publication is included:
  - Author names
  - Title
  - Year
  - Conference, journal name
- Get the data at:
  - http://dblp.uni-trier.de/xml/
  - http://kdl.cs.umass.edu/data/dblp/dblp-info.html
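One way to turn publication records into a collaboration network is to connect every pair of authors who share a paper. The sketch below assumes each record has already been parsed into a dict with an `authors` list; the record layout, the function name `coauthor_edges`, and the sample records are hypothetical and not the DBLP schema.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(publications):
    """Count how often each unordered pair of authors shares a publication."""
    weights = Counter()
    for pub in publications:
        # Deduplicate and sort so each pair always appears in the same order.
        authors = sorted(set(pub["authors"]))
        for a, b in combinations(authors, 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical records for illustration only.
pubs = [
    {"authors": ["A. Author", "B. Author"], "title": "Paper 1", "year": 2008},
    {"authors": ["B. Author", "A. Author", "C. Author"], "title": "Paper 2", "year": 2009},
    {"authors": ["C. Author"], "title": "Paper 3", "year": 2010},
]

weights = coauthor_edges(pubs)
for (a, b), w in sorted(weights.items()):
    print(a, "--", b, "weight", w)
```

The edge weight here is the number of joint papers; single-author papers contribute no edges. Author-name disambiguation is ignored in this sketch.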

- Patents (http://www.nber.org/patents/)
  - Citations between patents
  - For each patent we also know:
    - Time
    - Patent categorization
    - Patent inventor data, …
- Arxiv High-energy Physics:
  - Citation network between papers
  - For each paper we also know:
    - Author names
    - Title and abstract of the paper
    - Year of publication
    - Journal
- Data at: http://snap.stanford.edu/data/#citnets

- ~50 million tweets per month starting in June 2009 (6 months)
- Format:
    T 2009-06-07 02:07:42
    U http://twitter.com/redsoxtweets
    W #redsox Extra Bases: Sox win, 8-1: The Rangers spoiled Jon Lester's perfecto and his shutout.. http://tinyurl.com/pyhgwy
- Two important things:
  - URLs
  - Hash-tags
- Twitter social graph and some profiles: http://an.kaist.ac.kr/traces/WWW2010.html
- If you are interested, send us an email!
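The T/U/W record format above can be read with a few lines of code. This is a minimal sketch that assumes each field sits on its own line, that the one-letter key is separated from its value by whitespace, and that every record starts with a `T` line; the function name `parse_tweets` and the regular expressions are illustrative choices, not part of the dataset.

```python
import re

HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")


def parse_tweets(lines):
    """Group T/U/W lines into one dict per tweet (keys 'T', 'U', 'W')."""
    record = {}
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or parts[0] not in ("T", "U", "W"):
            continue  # skip blank or unrecognized lines
        key, value = parts
        if key == "T" and record:
            yield record  # a new T line starts the next tweet
            record = {}
        record[key] = value
    if record:
        yield record


# The example record from the slide.
sample = [
    "T 2009-06-07 02:07:42",
    "U http://twitter.com/redsoxtweets",
    "W #redsox Extra Bases: Sox win, 8-1: The Rangers spoiled "
    "Jon Lester's perfecto and his shutout.. http://tinyurl.com/pyhgwy",
]

tweets = list(parse_tweets(sample))
for tweet in tweets:
    print(tweet["T"], tweet["U"])
    print("hash-tags:", HASHTAG.findall(tweet["W"]))
    print("urls:", URL.findall(tweet["W"]))
```

The two regular expressions pull out the URLs and hash-tags that the slide singles out as the important parts of each tweet.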

- Inferring links of the who-follows-whom network
- What is the lifecycle of URLs and hash-tags?
- How do hash-tags get adopted?
- Multiple competing hash-tags, which one wins?
- Finding early/influential users?
- Community discovery
- Where/how will the information propagate?
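As a possible starting point for the lifecycle question, one could record when each hash-tag is first and last seen and how often it is used. The sketch below assumes tweets are available as `(timestamp, text)` pairs with timestamps in the `YYYY-MM-DD HH:MM:SS` form; the function name `hashtag_lifecycle` and the sample tweets are hypothetical.

```python
import re
from datetime import datetime

HASHTAG = re.compile(r"#\w+")


def hashtag_lifecycle(tweets):
    """Map each hash-tag to [first_seen, last_seen, number_of_tweets]."""
    stats = {}
    for stamp, text in tweets:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        # Lowercase and deduplicate so a tag counts once per tweet.
        for tag in {t.lower() for t in HASHTAG.findall(text)}:
            if tag not in stats:
                stats[tag] = [when, when, 0]
            entry = stats[tag]
            entry[0] = min(entry[0], when)
            entry[1] = max(entry[1], when)
            entry[2] += 1
    return stats


# Hypothetical tweets for illustration only.
sample_tweets = [
    ("2009-06-07 02:07:42", "#redsox Sox win"),
    ("2009-06-08 10:00:00", "Great game #RedSox #mlb"),
    ("2009-06-09 09:30:00", "#mlb standings"),
]

stats = hashtag_lifecycle(sample_tweets)
for tag, (first, last, count) in sorted(stats.items()):
    print(tag, "first:", first, "last:", last, "tweets:", count)
```

First-seen times per tag also give a simple handle on early adopters: the users behind the earliest tweets of a tag that later becomes popular.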

- More than 1 million news media and blog articles per day since August 2008
- Extracted phrases (quotes) and links
- http://memetracker.org
- Format:
    P http://cnnpoliticalticker.wordpress.com/2008/08/31/mccain-defends-palins-experience-level
    T 2008-09-01 00:00:13
    Q dangerously unprepared to be president
    Q even more dangerously unprepared
    Q understands the challenges that we face
    Q worked and succeeded
    L http://www.cnn.com
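The record format above (P = page URL, T = time, Q = quoted phrase, L = outgoing link) can be parsed into one record per article. This is a minimal sketch that assumes each field is on its own line, the one-letter key is separated from its value by whitespace, and every article starts with a `P` line; the function name `parse_memetracker` and the output dict layout are illustrative choices.

```python
def parse_memetracker(lines):
    """Group P/T/Q/L lines into one dict per article."""
    doc = None
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        if key == "P":
            if doc is not None:
                yield doc  # a new P line starts the next article
            doc = {"url": value, "time": None, "quotes": [], "links": []}
        elif doc is None:
            continue  # ignore fields that appear before the first P line
        elif key == "T":
            doc["time"] = value
        elif key == "Q":
            doc["quotes"].append(value)
        elif key == "L":
            doc["links"].append(value)
    if doc is not None:
        yield doc


# The example record from the slide.
sample = [
    "P http://cnnpoliticalticker.wordpress.com/2008/08/31/"
    "mccain-defends-palins-experience-level",
    "T 2008-09-01 00:00:13",
    "Q dangerously unprepared to be president",
    "Q even more dangerously unprepared",
    "Q understands the challenges that we face",
    "Q worked and succeeded",
    "L http://www.cnn.com",
]

docs = list(parse_memetracker(sample))
for doc in docs:
    print(doc["time"], doc["url"])
    print(len(doc["quotes"]), "quotes,", len(doc["links"]), "links")
```

The `links` lists give the edges of a document-to-document network, while shared entries in `quotes` connect articles that repeat the same phrase.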
