Comparing Recommendation Algorithms for Social Bookmarking
Toine Bogers
Royal School of Library and Information Science, Copenhagen, Denmark
About me
• Ph.D. from Tilburg University
  - Thesis: "Recommender Systems for Social Bookmarking"
  - Promotor: Prof. dr. Antal van den Bosch
• Currently @ RSLIS (Copenhagen, DK)
  - Research assistant on a retrieval fusion project
• Research interests
  - Recommender systems
  - Social bookmarking
  - Expert search
  - Information retrieval
Outline
1. Introduction
2. Collaborative filtering
3. Content-based filtering
4. Recommender systems fusion
5. Conclusions
Social bookmarking
• A way of storing, organizing, and managing bookmarks of web pages, scientific articles, books, etc.
  - All done online
  - Collections can be made public or kept private
  - Users can tag (= label) their items
• Many different websites available (e.g., Delicious, CiteULike, BibSonomy)
Social bookmarking
• Different domains
  - Web pages
  - Scientific articles
  - Books
• Strong growth in popularity
  - Millions of users, items, and tags
  - For example, Delicious:
    - 140,000+ posts/day on average in 2008 (Keller, 2009)
    - 7,000,000+ posts/month in 2008 (Wetzker et al., 2009)
Content overload
• Problems with this growth
  - Content overload
  - Increasing ambiguity
• How can we deal with this?
  - Browsing: can become less effective as content increases!
  - Search
• A possible solution
  - Take a more active role: recommendation
Recommendation tasks
[Figure: overview diagram of recommendation tasks; diagram content lost in extraction]
Item recommendation
• Our focus: item recommendation
  - Identify sets of items that are likely to be of interest to a certain user
    - Return a ranked list of items
    - The 'Find Good Items' task (Herlocker et al., 2004)
  - Based on different information sources
    - Transaction patterns (usage data, purchase information)
      - Explicit ratings
      - Implicit feedback
    - Metadata
    - Tags
Related work
• Work on social bookmarking has mostly focused on
  - Improving the browsing experience (clustering, dealing with ambiguity)
  - Incorporating tags in search algorithms
  - Tag recommendation
• Problems with prior work on item recommendation
  - Different data sets
  - Different evaluation metrics
  - No comparison of algorithms under controlled conditions
  - Data sets hardly ever publicly available
  - No user-based evaluation
Collecting data
• Four data sets from two different domains
  - Web bookmarks
    - Delicious
    - BibSonomy
  - Scientific articles
    - CiteULike
    - BibSonomy
  - In BibSonomy, ~78% of users posted only one type of content (bookmarks or scientific articles)
What did we collect?
• Usage data (illustrated in the sketch below)
  - User-item-tag triples with timestamps
• Metadata
  - Varies with the domain:

                    Web bookmarks               Scientific articles
    Item-intrinsic  TITLE, DESCRIPTION,         TITLE, DESCRIPTION, JOURNAL,
                    TAGS, URL                   AUTHOR, TAGS, URL, etc.
    Item-extrinsic  -                           CHAPTER, DAY, EDITION, YEAR,
                                                INSTITUTION, etc.
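To make the record layout concrete, here is a minimal Python sketch of one post; the class and field names (Post, user, item, tags, timestamp, metadata) are illustrative assumptions, not the actual schema of the crawled data sets.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Post:
    """One social bookmarking post: a user tagging an item at some time.
    All names are illustrative, not the crawl's actual schema."""
    user: str
    item: str                 # e.g., a URL or an article identifier
    tags: set                 # the user's tags for this item
    timestamp: datetime
    metadata: dict = field(default_factory=dict)  # TITLE, AUTHOR, URL, ...

example = Post(user="u42", item="http://example.org/paper.pdf",
               tags={"recommender-systems", "folksonomy"},
               timestamp=datetime(2008, 5, 1))
```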
Filtering
• Why?
  - To reduce noise in our data sets
  - Common procedure in recommender systems research
• How? (a sketch follows below)
  - ≥ 20 items per user
  - ≥ 2 users per item (no hapax legomena items)
  - No untagged posts
• Compared to related work
  - Stricter filtering
  - More realistic
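A minimal sketch of this filtering step, assuming posts arrive as (user, item, tags) triples; the thresholds are the ones quoted on the slide, but the iterate-until-stable loop is an assumption, since removing sparse users can push items below threshold and vice versa.

```python
from collections import Counter

def filter_posts(posts, min_items=20, min_users=2):
    """Keep users with >= min_items items and items with >= min_users
    users (no hapax legomena items), after dropping untagged posts.
    Assumes one post per user-item pair, so post counts equal item
    counts per user and user counts per item."""
    posts = [p for p in posts if p[2]]  # drop untagged posts
    while True:
        user_counts = Counter(user for user, _, _ in posts)
        item_counts = Counter(item for _, item, _ in posts)
        kept = [p for p in posts
                if user_counts[p[0]] >= min_items
                and item_counts[p[1]] >= min_users]
        if len(kept) == len(posts):
            return kept
        posts = kept
```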
Data sets

              Bookmarks              Scientific articles
              Delicious   BibSonomy  CiteULike   BibSonomy
  # users     1,243       192        1,322       167
  # items     152,698     11,165     38,419      12,982
  # tags      42,820      13,233     28,312      5,165
  # posts     238,070     29,096     84,637      29,720
Experimental setup
• Backtesting (sketched below)
  - Withhold randomly selected items from test users
  - Use the remaining material to train the recommender system
  - Success is measured by how well the recommender predicts the user's interest in his/her withheld items
• Details
  - Overall 90%-10% split on users
  - Withhold 10 randomly selected items from each test user
  - Parameter optimization
    - Used 10-fold cross-validation
    - 90-10 splits
    - 10 withheld items
  - Macro-averaging of evaluation scores
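A hedged sketch of this backtesting protocol; the function name, the dict representation, and the fixed seed are illustrative choices rather than the talk's actual implementation.

```python
import random

def backtest_split(user_items, n_withheld=10, test_frac=0.10, seed=1):
    """Pick ~10% of the users as test users and withhold 10 randomly
    selected items from each; everything else is training material.
    `user_items` maps each user to the list of items they posted."""
    rng = random.Random(seed)
    users = list(user_items)
    test_users = set(rng.sample(users, max(1, int(test_frac * len(users)))))
    train, withheld = {}, {}
    for user, items in user_items.items():
        if user in test_users and len(items) > n_withheld:
            held = set(rng.sample(list(items), n_withheld))
            withheld[user] = held
            train[user] = [i for i in items if i not in held]
        else:
            train[user] = list(items)
    return train, withheld
```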
Evaluation
• The 'Find Good Items' task returns a ranked list
  - Need a metric that takes the ranking of the items into account
• Precision-oriented metric: Mean Average Precision (MAP, sketched below)
  - Average Precision (AP) is the average of the precision values at each relevant, retrieved item
  - MAP is AP averaged over all users
  - A "single figure measure of quality across recall levels" (Manning, 2009)
• Tested different metrics
  - All precision-oriented metrics showed the same picture
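A short sketch of the metric as described above; following the usual IR convention, AP divides by the number of withheld (relevant) items, so relevant items that are never retrieved contribute zero, and MAP macro-averages AP over the test users.

```python
def average_precision(ranking, relevant):
    """AP: the average of the precision values at each rank that holds
    a relevant (withheld) item."""
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, withheld):
    """MAP: AP macro-averaged over all test users."""
    return sum(average_precision(rankings[u], withheld[u])
               for u in withheld) / len(withheld)
```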
Collaborative filtering
• Question
  - How can we use the information in the folksonomy (users, items, and tags, connected by usage patterns) to generate better recommendations?
• Collaborative filtering (CF)
  - Attempts to automate "word-of-mouth" recommendations
  - Recommends items based on how like-minded users rated those items
  - Similarity based on
    - Usage data
    - Tagging data
Collaborative filtering
• Model-based CF
  - 'Eager' recommendation algorithms
  - Train a predictive model of the recommendation task
  - Quick to apply when generating recommendations
• Memory-based CF
  - 'Lazy' recommendation algorithms
  - Simply store all patterns in memory
  - Defer the prediction effort until the user requests recommendations
Related work
• Model-based
  - Hybrid PLSA-based approach (Wetzker et al., 2009)
  - Tensor decomposition (Symeonidis et al., 2008)
• Memory-based
  - Tag-aware fusion (Tso-Sutter et al., 2008)
• Graph-based
  - FolkRank (Hotho et al., 2006)
  - Random walk (Clements et al., 2008)
Algorithms
• User-based k-NN algorithm (sketched below)
  - Calculate the similarity between the active user and all other users
  - Determine the top k nearest neighbors
    - I.e., the most similar users
  - Unseen items from the nearest neighbors are scored by the similarity between the neighbor and the active user
• Item-based k-NN algorithm
  - Calculate the similarity between the active user's items and all other items
  - Determine the top k nearest neighbors
    - I.e., the most similar items for each of the active user's items
  - Unseen neighboring items are scored by the similarity between the neighbor and the active user's item
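A minimal sketch of the user-based variant, assuming `profiles` maps each user to the set of items they posted and a similarity function is supplied (cosine on usage profiles follows on the next slide); the item-based variant is symmetric, working over item profiles instead. All names are illustrative.

```python
from collections import defaultdict

def user_based_knn(active_user, profiles, similarity, k=10, n=10):
    """Score each item the active user has not yet seen by summing the
    similarities of the top-k nearest neighbors who posted it, then
    return the n highest-scoring items as the recommendation list."""
    own = profiles[active_user]
    neighbors = sorted((u for u in profiles if u != active_user),
                       key=lambda u: similarity(own, profiles[u]),
                       reverse=True)[:k]
    scores = defaultdict(float)
    for neighbor in neighbors:
        sim = similarity(own, profiles[neighbor])
        for item in profiles[neighbor] - own:  # unseen items only
            scores[item] += sim
    return sorted(scores, key=scores.get, reverse=True)[:n]
```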
Usage data
• Baseline: CF using usage data
• Profile vectors, taken from the user-item matrix UI
  - User profiles: the rows of UI (one vector over items per user)
  - Item profiles: the columns of UI (one vector over users per item)
• No explicit ratings available
  - Only binary information (1 or 0)
  - Or rather: unary!
• Similarity metric
  - Cosine similarity (see the sketch below)
• 10-fold cross-validation to optimize k
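Because the data is unary, the cosine between two profile rows of UI reduces to a set computation: the dot product is the overlap |A ∩ B| and each norm is the square root of the profile size. A toy sketch, pluggable into the k-NN sketch above; the example profiles are hypothetical.

```python
import math

def unary_cosine(a, b):
    """Cosine similarity of two unary profile vectors, given as sets."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

# Hypothetical toy profiles; in the real setup these are rows of UI.
profiles = {"alice": {"i1", "i2", "i3"},
            "bob":   {"i2", "i3", "i4"},
            "carol": {"i4", "i5"}}
print(unary_cosine(profiles["alice"], profiles["bob"]))  # 2/3 ≈ 0.667
# e.g., user_based_knn("alice", profiles, unary_cosine, k=2)
```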
Results (usage data): MAP scores

                        Bookmarks               Scientific articles
                        BibSonomy   Delicious   BibSonomy   CiteULike
  UBCF + usage data     0.0277      0.0046      0.0865      0.0746
  IBCF + usage data     0.0244      0.0027      0.0737      0.0887
Tagging data
• Tags are short topical descriptions of an item (or user)
• Profile vectors, taken from the user-tag matrix UT and the item-tag matrix IT
  - User tag profiles: the rows of UT (one vector over tags per user)
  - Item tag profiles: the rows of IT (one vector over tags per item)
• Similarity metrics (sketched below)
  - Cosine similarity
  - Jaccard overlap
  - Dice's coefficient
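The two set-overlap metrics that join cosine here, sketched under the same set representation as before; the example tag sets are hypothetical, and for cosine on unary tag vectors the unary_cosine sketch above applies unchanged.

```python
def jaccard(a, b):
    """Jaccard overlap of two tag sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a or b else 0.0

def dice(a, b):
    """Dice's coefficient of two tag sets: 2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

python_tags = {"web", "python", "tutorial"}
ruby_tags = {"web", "ruby", "tutorial"}
print(jaccard(python_tags, ruby_tags))  # 2/4 = 0.5
print(dice(python_tags, ruby_tags))     # 4/6 ≈ 0.667
```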
Results (tagging data): MAP scores

                        Bookmarks               Scientific articles
                        BibSonomy   Delicious   BibSonomy   CiteULike
  UBCF + usage data     0.0277      0.0046      0.0865      0.0746
  IBCF + usage data     0.0244      0.0027      0.0737      0.0887
  UBCF + tagging data   0.0102      0.0017      0.0459      0.0449
  IBCF + tagging data   0.0370      0.0101      0.1100      0.0814