http://cs224w.stanford.edu Teams of 2-3 students (1 is also ok)


  1. CS224W: Social and Information Network Analysis. Jure Leskovec, Stanford University. http://cs224w.stanford.edu

  2. Teams of 2-3 students (1 is also ok)
     - Project:
       - Experimental evaluation of algorithms and models on an interesting dataset
       - A theoretical project that considers a model, an algorithm or a network property and derives a rigorous result about it
       - An in-depth critical survey of one of the course topics, relating models, experimental results and underlying social theories, and offering a novel perspective on the area
     - 10/11/2010, Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

  3. Answer the following questions:
     - What is the problem you are solving?
     - What data will you use (how will you get it)?
     - How will you do the project?
       - Which algorithms/techniques/models do you plan to use/develop?
       - Be as specific as you can!
     - How will you evaluate and measure success?
     - What do you expect to submit/accomplish by the end of the quarter?

  4. The project should contain at least some amount of mathematical analysis, and some experimentation on real or synthetic data.
     - The result of the project will typically be a 10-page paper describing the approach, the results, and the related work.
     - Due at midnight, Oct 18, 2010
     - Upload PDF to http://coursework.stanford.edu
     - TAs will assign group numbers; we will send a link to a GoogleDoc
     - Name your file: <group#>_proposal.pdf

  5. Datasets:
     - Wikipedia
     - IM buddy graph
     - Yahoo Altavista web graph
     - Stanford WebBase
     - Twitter data
     - Blogs and news data
     - Yahoo Music Ratings

  6. Richly labeled network containing extracted data from Wikipedia (based on infoboxes):
     - Richly labeled network
       - Multiple types of nodes and edges
     - About 2.6 million concepts described by 247 million triples, including abstracts in 14 different languages
     - http://dbpedia.org
     - Other OpenLinkedData datasets available at http://esw.w3.org/DataSetRDFDumps

  7. Networks of positive and negative edges
     - Data includes:
       - Trust/distrust edges
       - Also Epinions product reviews and review ratings
     - SNAP: http://snap.stanford.edu/data/#signnets
     - Trustlet: http://www.trustlet.org/wiki

  8. Prosper marketplace: peer-to-peer lending
     - Borrowers ask for loans
     - People then bid (price, interest rate) on loans to fund them
     - Rich social structure around the website
     - Data at http://www.prosper.com/tools/DataExport.aspx

  9. Turiya is a start-up that collects game data from game publishers and processes it to produce business intelligence of value to its clients.
     - Data collected includes:
       - Players and their attributes
       - Logs of game events
       - Information about virtual items
       - Information about transactions in real money or credits
     - Analyses include:
       - Player segmentation
       - Virtual goods recommendations
       - Lifetime value estimation of players
     - If you are interested, send us an email!

  10. What to Wear is a social game played on Facebook.
     - Contestants create outfits and submit them to a daily competition, which has a theme such as "an outfit for attending your ex's wedding"
     - Contestants can also vote and comment on other people's submissions
       - You get credit for both participating and judging
     - Items for outfits are either bought from the store or reused from the contestant's closet
     - ~30,000 players/month
     - Data about this game includes:
       - Player data
       - Data about previous competitions
       - Fashion items data
       - Data about outfits
       - Many other data (~400 relations in all)
     - If you are interested, send us an email!

  11. Amazon product review data. For each product:
     - Product info: name, salesrank
     - Product categorization
     - All reviews:
       - User, rating, how helpful the review was
     - People who bought X also bought Y: a network!
     - If you are interested, send us an email!

  12. Collaboration network of computer scientists
     - Each CS publication is included:
       - Author names
       - Title
       - Year
       - Conference, journal name
     - Get the data at:
       - http://dblp.uni-trier.de/xml/
       - http://kdl.cs.umass.edu/data/dblp/dblp-info.html

  13. Citation networks
     - Patents (http://www.nber.org/patents/)
       - Citations between patents
       - For each patent we also know:
         - Time
         - Patent categorization
         - Patent inventor data, …
     - Arxiv High-energy Physics:
       - Citation network between papers
       - For each paper we also know:
         - Author names
         - Title and abstract of the paper
         - Year of publication
         - Journal
     - Data at: http://snap.stanford.edu/data/#citnets

  14. ~50 million tweets per month starting in June 2009 (6 months)
     - Format:

           T 2009-06-07 02:07:42
           U http://twitter.com/redsoxtweets
           W #redsox Extra Bases: Sox win, 8-1: The Rangers spoiled Jon Lester's perfecto and his shutout.. http://tinyurl.com/pyhgwy

     - Two important things:
       - URLs
       - Hash-tags
     - Twitter social graph and some profiles: http://an.kaist.ac.kr/traces/WWW2010.html
     - If you are interested, send us an email!

  15. Project ideas on the Twitter data:
     - Inferring links of the who-follows-whom network
     - What is the lifecycle of URLs and hash-tags?
     - How do hash-tags get adopted?
     - Multiple competing hash-tags: which one wins?
     - Finding early/influential users
     - Community discovery
     - Where/how will the information propagate?

  16. More than 1 million news media and blog articles per day since August 2008
     - Extracted phrases (quotes) and links
     - http://memetracker.org
     - Format:

           P http://cnnpoliticalticker.wordpress.com/2008/08/31/mccain-defends-palins-experience-level
           T 2008-09-01 00:00:13
           Q dangerously unprepared to be president
           Q even more dangerously unprepared
           Q understands the challenges that we face
           Q worked and succeeded
           L http://www.cnn.com
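
The tweet record format shown on the Twitter data slide can be read with a few lines of Python. This is only a minimal sketch: it assumes each record is three consecutive lines tagged `T` (timestamp), `U` (user URL) and `W` (tweet text), which is an interpretation of the single example on the slide, not a documented specification. The names `parse_tweets`, `HASHTAG_RE`, `URL_RE` and `SAMPLE` are hypothetical and chosen for illustration. It pulls out the two things the slide calls important, URLs and hash-tags.

```python
import re
from typing import Any, Dict, Iterable, Iterator

# Simple patterns for the two items the slide highlights (assumed, not official).
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")

# The example record from the slide.
SAMPLE = [
    "T 2009-06-07 02:07:42",
    "U http://twitter.com/redsoxtweets",
    "W #redsox Extra Bases: Sox win, 8-1: The Rangers spoiled Jon Lester's "
    "perfecto and his shutout.. http://tinyurl.com/pyhgwy",
]


def parse_tweets(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per tweet, assuming T, U, W lines in that order."""
    record: Dict[str, Any] = {}
    for raw in lines:
        parts = raw.strip().split(None, 1)
        if len(parts) != 2 or parts[0] not in ("T", "U", "W"):
            continue  # skip blank lines and anything we do not recognize
        tag, value = parts
        if tag == "T":
            record = {"time": value}  # a T line starts a new record
        elif tag == "U":
            record["user"] = value
        else:  # "W" closes the record
            if "time" in record and "user" in record:
                record["text"] = value
                record["hashtags"] = HASHTAG_RE.findall(value)
                record["urls"] = URL_RE.findall(value)
                yield record
            record = {}


if __name__ == "__main__":
    for tweet in parse_tweets(SAMPLE):
        print(tweet["user"], tweet["hashtags"], tweet["urls"])
```

For the full data one would pass an open file object instead of `SAMPLE`; because the function is a generator, records are produced one at a time rather than all loaded into memory.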
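
The news and blog record format on the Memetracker slide can be turned into a link network in a similar way. The sketch below assumes the one-letter tags mean `P` = page URL, `T` = timestamp, `Q` = extracted phrase (quote) and `L` = outgoing link, and that every `P` line starts a new document; this is inferred from the example on the slide rather than from a format specification. The function names `parse_documents` and `link_edges` and the constant `SAMPLE` are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# The example record from the slide.
SAMPLE = [
    "P http://cnnpoliticalticker.wordpress.com/2008/08/31/"
    "mccain-defends-palins-experience-level",
    "T 2008-09-01 00:00:13",
    "Q dangerously unprepared to be president",
    "Q even more dangerously unprepared",
    "Q understands the challenges that we face",
    "Q worked and succeeded",
    "L http://www.cnn.com",
]


def parse_documents(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per document; a P line is assumed to start a document."""
    doc = None
    for raw in lines:
        parts = raw.strip().split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        tag, value = parts
        if tag == "P":
            if doc is not None:
                yield doc
            doc = {"url": value, "time": None, "quotes": [], "links": []}
        elif doc is None:
            continue  # ignore lines that appear before the first P line
        elif tag == "T":
            doc["time"] = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        elif tag == "Q":
            doc["quotes"].append(value)
        elif tag == "L":
            doc["links"].append(value)
    if doc is not None:
        yield doc


def link_edges(docs: Iterable[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Directed (source page, linked URL) pairs, i.e. a hyperlink edge list."""
    return [(d["url"], target) for d in docs for target in d["links"]]


if __name__ == "__main__":
    documents = list(parse_documents(SAMPLE))
    print(documents[0]["quotes"])
    print(link_edges(documents))
```

The edge list from `link_edges` is one possible starting point for the propagation and community questions raised earlier; grouping documents by shared `Q` phrases would give a different, phrase-based view of the same data.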
