 
Comparing Recommendation Algorithms for Social Bookmarking
Toine Bogers
Royal School of Library and Information Science, Copenhagen, Denmark
About me
• Ph.D. from Tilburg University
  - "Recommender Systems for Social Bookmarking"
  - Promotor: Prof. dr. Antal van den Bosch
• Currently @ RSLIS (Copenhagen, DK)
  - Research assistant on retrieval fusion project
• Research interests
  - Recommender systems
  - Social bookmarking
  - Expert search
  - Information retrieval
Outline
1. Introduction
2. Collaborative filtering
3. Content-based filtering
4. Recommender systems fusion
5. Conclusions
Social bookmarking
• Way of storing, organizing, and managing bookmarks of Web pages, scientific articles, books, etc.
  - All done online
  - Can be made public or kept private
  - Allows users to tag (= label) their items
  - Many different websites available
Social bookmarking
• Different domains
  - Web pages
  - Scientific articles
  - Books
• Strong growth in popularity
  - Millions of users, items, and tags
  - For example: Delicious
    - 140,000+ posts/day on average in 2008 (Keller, 2009)
    - 7,000,000+ posts/month in 2008 (Wetzker et al., 2009)
Content overload
• Problems with this growth
  - Content overload
  - Increasing ambiguity
• How can we deal with this?
  - Browsing (can become less effective as content increases!)
  - Search
• A possible solution
  - Take a more active role: recommendation
Recommendation tasks
[Figure: diagram of recommendation tasks; its labels are not recoverable from the extracted text]
Item recommendation
• Our focus: item recommendation
  - Identify sets of items that are likely to be of interest to a certain user
    - Return a ranked list of items
    - 'Find Good Items' task (Herlocker et al., 2004)
  - Based on different information sources
    - Transaction patterns (usage data, purchase information)
      - Explicit ratings
      - Implicit feedback
    - Metadata
    - Tags
Related work
• Work on social bookmarking mostly focused on
  - Improving browsing experience
    - Clustering, dealing with ambiguity
  - Incorporating tags in search algorithms
  - Tag recommendation
• Problems with work on item recommendation
  - Different data sets
  - Different evaluation metrics
  - No comparison of algorithms under controlled conditions
  - Hardly ever publicly available data sets
  - No user-based evaluation
Collecting data
• Four data sets from two different domains
  - Web bookmarks
    - Delicious
    - BibSonomy
  - Scientific articles
    - CiteULike
    - BibSonomy
• Note on BibSonomy: ~78% of users posted only one type of content (bookmarks or scientific articles)
What did we collect?
• Usage data
  - User-item-tag triples with timestamps
• Metadata
  - Varies with the domain

                  Web bookmarks                   Scientific articles
Item-intrinsic    TITLE, DESCRIPTION, TAGS, URL   TITLE, DESCRIPTION, JOURNAL, AUTHOR, TAGS, URL, etc.
Item-extrinsic    -                               CHAPTER, DAY, EDITION, YEAR, INSTITUTION, etc.
Filtering
• Why?
  - To reduce noise in our data sets
  - Common procedure in recommender systems research
• How?
  - ≥ 20 items per user
  - ≥ 2 users per item (no hapax legomena items)
  - No untagged posts
• Compared to related work
  - Stricter filtering
  - More realistic
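The three filtering criteria above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used for the data sets: the function name `filter_posts`, the `(user, item, tags)` tuple layout, and the choice to apply the thresholds in a single pass (counts computed once, on the tagged posts) are all assumptions, since the slides do not say in which order or how often the criteria were applied.

```python
from collections import defaultdict

def filter_posts(posts, min_items_per_user=20, min_users_per_item=2):
    """Filter (user, item, tags) posts; hypothetical helper for illustration.

    Assumption: thresholds are checked in one pass against counts computed
    on the tagged posts. The slides do not specify the exact procedure.
    """
    # Criterion 3: drop untagged posts first.
    tagged = [p for p in posts if p[2]]

    # Count distinct items per user and distinct users per item.
    items_per_user = defaultdict(set)
    users_per_item = defaultdict(set)
    for user, item, _tags in tagged:
        items_per_user[user].add(item)
        users_per_item[item].add(user)

    # Criteria 1 and 2: keep posts whose user and item both meet the thresholds.
    return [
        (user, item, tags)
        for user, item, tags in tagged
        if len(items_per_user[user]) >= min_items_per_user
        and len(users_per_item[item]) >= min_users_per_item
    ]
```

Because removing users can push an item's user count back below the threshold, a stricter variant would repeat the pass until nothing changes; whether that was done here is not stated.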
Data sets
                 Bookmarks                Scientific articles
             Delicious   BibSonomy    CiteULike   BibSonomy
# users          1,243         192        1,322         167
# items        152,698      11,165       38,419      12,982
# tags          42,820      13,233       28,312       5,165
# posts        238,070      29,096       84,637      29,720
Experimental setup
• Backtesting
  - Withhold randomly selected items from test users
  - Use remaining material for training the recommender system
  - Success is predicting the user's interest in his/her withheld items
• Details
  - Overall 90%-10% split on users
  - Withhold 10 randomly selected items of each test user
  - Parameter optimization
    - Used 10-fold cross-validation
    - 90-10 splits
    - 10 withheld items
  - Macro-averaging of evaluation scores
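The backtesting split could be sketched as below. This is only an illustration under stated assumptions: `withhold_items` and its parameters are hypothetical names, the "90%-10% split on users" is read as "10% of users become test users", and the fixed random seed is added purely for reproducibility.

```python
import random

def withhold_items(user_items, test_fraction=0.1, n_withheld=10, seed=0):
    """Split users into train/test and hide items of the test users.

    user_items: dict mapping user -> set of items (hypothetical layout).
    Returns (train, withheld): train maps every user to their visible items,
    withheld maps each test user to the items hidden from the recommender.
    """
    rng = random.Random(seed)
    users = sorted(user_items)
    rng.shuffle(users)
    n_test = max(1, round(len(users) * test_fraction))
    test_users = set(users[:n_test])

    train, withheld = {}, {}
    for user, items in user_items.items():
        if user in test_users:
            # Hide n_withheld randomly chosen items (or all, if fewer exist).
            pool = sorted(items)
            hidden = set(rng.sample(pool, min(n_withheld, len(pool))))
            withheld[user] = hidden
            train[user] = set(items) - hidden
        else:
            train[user] = set(items)
    return train, withheld
```

With the filtering threshold of at least 20 items per user, every test user keeps at least 10 visible items to build a profile from.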
Evaluation
• 'Find Good Items' task returns a ranked list
  - Need a metric that takes the ranking of items into account
• Precision-oriented metric
  - Mean Average Precision (MAP)
    - Average Precision (AP) is the average of the precision values at each relevant, retrieved item
    - MAP is AP averaged over all users
    - "single figure measure of quality across recall levels" (Manning, 2009)
• Tested different metrics
  - All precision-oriented metrics showed the same picture
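As a rough sketch of the metric described above, AP and MAP might be computed like this. The function names are hypothetical, and dividing by the total number of relevant (withheld) items, so that relevant items never retrieved contribute zero, is an assumption based on the common textbook definition rather than something the slides state.

```python
def average_precision(ranked_items, relevant_items):
    """AP for one user: mean of precision@rank at each relevant retrieved item.

    Assumption: the sum is normalized by the total number of relevant items.
    """
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)

def mean_average_precision(rankings, relevant_by_user):
    """MAP: AP averaged over all users (macro-average)."""
    if not rankings:
        return 0.0
    total = sum(
        average_precision(ranked, relevant_by_user.get(user, ()))
        for user, ranked in rankings.items()
    )
    return total / len(rankings)
```

For example, a ranking `["a", "b", "c", "d"]` with withheld items `{"a", "c"}` has precision 1/1 at rank 1 and 2/3 at rank 3, so AP = (1 + 2/3) / 2 ≈ 0.83.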
Collaborative filtering
• Question
  - How can we use the information in the folksonomy to generate better recommendations?
    - Users
    - Items
    - Tags
    [diagram label: usage patterns]
• Collaborative filtering (CF)
  - Attempts to automate "word-of-mouth" recommendations
  - Recommend items based on how like-minded users rated those items
  - Similarity based on
    - Usage data
    - Tagging data
Collaborative filtering
• Model-based CF
  - 'Eager' recommendation algorithms
  - Train a predictive model of the recommendation task
  - Quick to apply to generate recommendations
• Memory-based CF
  - 'Lazy' recommendation algorithms
  - Simply store all patterns in memory
  - Defer prediction effort to when the user requests recommendations
Related work
• Model-based
  - Hybrid PLSA-based approach (Wetzker et al., 2009)
  - Tensor decomposition (Symeonidis et al., 2008)
• Memory-based
  - Tag-aware fusion (Tso-Sutter et al., 2008)
• Graph-based
  - FolkRank (Hotho et al., 2006)
  - Random walk (Clements et al., 2008)
Algorithms
• User-based k-NN algorithm
  - Calculate similarity between the active user and all other users
  - Determine the top k nearest neighbors
    - I.e., the most similar users
  - Unseen items from nearest neighbors are scored by the similarity between the neighbor and the active user
• Item-based k-NN algorithm
  - Calculate similarity between the active user's items and all other items
  - Determine the top k nearest neighbors
    - I.e., the most similar items for each of the active user's items
  - Unseen neighboring items are scored by the similarity between the neighbor and the active user's item
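The two k-NN variants above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation behind the reported results: profiles are treated as plain sets (unary data), cosine similarity is used, neighbors with zero similarity are ignored, and an item's score is the sum of the similarities of the neighbors that suggest it. All function names are hypothetical.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sets viewed as unary (0/1) vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def top_k(sims, k):
    """Keep the k most similar (similarity, name) pairs with similarity > 0."""
    positive = [s for s in sims if s[0] > 0]
    return sorted(positive, key=lambda p: (-p[0], p[1]))[:k]

def rank(scores, top_n):
    """Rank items by descending score, breaking ties by name."""
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

def user_based_knn(user_items, active_user, k=2, top_n=10):
    """Score unseen items by the similarity of the neighbors who posted them."""
    seen = user_items[active_user]
    sims = [(cosine(seen, items), other)
            for other, items in user_items.items() if other != active_user]
    scores = defaultdict(float)
    for sim, neighbor in top_k(sims, k):
        for item in user_items[neighbor] - seen:
            scores[item] += sim  # assumption: sum over neighbors
    return rank(scores, top_n)

def item_based_knn(user_items, active_user, k=2, top_n=10):
    """Score unseen items by their similarity to the active user's items."""
    # Invert the profiles: item -> set of users who posted it.
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    seen = user_items[active_user]
    scores = defaultdict(float)
    for owned in seen:
        sims = [(cosine(item_users[owned], users), other)
                for other, users in item_users.items() if other != owned]
        for sim, other in top_k(sims, k):
            if other not in seen:
                scores[other] += sim  # assumption: sum over the user's items
    return rank(scores, top_n)
```

Both functions return a ranked list of item identifiers, which is the output format the 'Find Good Items' task expects.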
Usage data
• Baseline: CF using usage data
• Profile vectors
  - User profiles
  - Item profiles
  [diagram: user-item matrix UI, with users and items as its two dimensions]
• No explicit ratings available
  - Only binary information (1 or 0)
  - Or rather: unary!
• Similarity metric
  - Cosine similarity
• 10-fold cross-validation to optimize k
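For unary profiles, cosine similarity reduces to a simple set expression, which may make the baseline easier to picture. The notation below is an assumption introduced for illustration: $I_u$ and $I_v$ denote the sets of items posted by users $u$ and $v$, and $\vec{u}$, $\vec{v}$ are the corresponding 0/1 profile vectors.

```latex
\[
\mathrm{sim}_{\cos}(u, v)
  = \frac{\vec{u} \cdot \vec{v}}{\lVert \vec{u} \rVert \, \lVert \vec{v} \rVert}
  = \frac{|I_u \cap I_v|}{\sqrt{|I_u| \, |I_v|}}
\]
```

The dot product of two 0/1 vectors counts the shared items, and each norm is the square root of the profile size. For instance, two users with 3 items each and 2 in common have similarity $2 / \sqrt{9} \approx 0.67$. The same expression applies to item profiles, with sets of users in place of sets of items.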
Results (usage data)
                        Bookmarks               Scientific articles
                        BibSonomy   Delicious   BibSonomy   CiteULike
UBCF + usage data          0.0277      0.0046      0.0865      0.0746
IBCF + usage data          0.0244      0.0027      0.0737      0.0887
Tagging data
• Tags are short topical descriptions of an item (or user)
• Profile vectors
  - User tag profiles
  - Item tag profiles
  [diagram: user-tag matrix UT and item-tag matrix IT]
• Similarity metrics
  - Cosine similarity
  - Jaccard overlap
  - Dice's coefficient
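A small sketch of how tag profiles and the two overlap-based metrics might look. It assumes tag profiles are plain sets of tags (tag frequencies are ignored), which the slides do not confirm; `build_tag_profiles`, `jaccard`, and `dice` are hypothetical names.

```python
from collections import defaultdict

def build_tag_profiles(triples):
    """Build user-tag and item-tag profiles from (user, item, tag) triples.

    Assumption: a profile is the set of tags used, without counts.
    """
    user_tags = defaultdict(set)
    item_tags = defaultdict(set)
    for user, item, tag in triples:
        user_tags[user].add(tag)
        item_tags[item].add(tag)
    return user_tags, item_tags

def jaccard(a, b):
    """Jaccard overlap: shared tags divided by all distinct tags."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dice(a, b):
    """Dice's coefficient: twice the shared tags over the summed profile sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0
```

Either function can replace cosine similarity in a k-NN recommender, since all three take two profiles and return a value between 0 and 1.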
Results (tagging data)
                        Bookmarks               Scientific articles
                        BibSonomy   Delicious   BibSonomy   CiteULike
UBCF + usage data          0.0277      0.0046      0.0865      0.0746
IBCF + usage data          0.0244      0.0027      0.0737      0.0887
UBCF + tagging data        0.0102      0.0017      0.0459      0.0449
IBCF + tagging data        0.0370      0.0101      0.1100      0.0814