
Harnessing Folksonomies for Resource Classification (PhD Thesis)

PhD Thesis by Arkaitz Zubiaga, UNED, July 12th, 2011. Advisors: Raquel Martínez Unanue and Víctor Fresno Fernández.


  1. Harnessing Folksonomies for Resource Classification. PhD Thesis. Arkaitz Zubiaga, UNED, July 12th, 2011. Advisors: Raquel Martínez Unanue, Víctor Fresno Fernández.

  2. Table of Contents: (1) Motivation; (2) Selection of a Classifier; (3) STS & Datasets; (4) Representing the Aggregation of Tags; (5) Tag Distributions on STS; (6) User Behavior on STS; (7) Conclusions & Outlook; (8) Publications.

  3. Motivation: Index

  4-6. Motivation: Resource Classification (title-only slides)

  7. Motivation: Resource Classification
     - Classifying resources is a common task: web pages, books, movies, files, ...
     - Large collections of resources → expensive and effortful to classify manually.
     - The Library of Congress (LoC) reported an average cost of $94.58 for cataloging each book in 2002.
     - Enormous costs and efforts → automatic classification.

  8. Motivation: Resource Classification
     - Representation of resources → self-content.
     - Using the self-content of resources presents some issues:
       - Not always representative enough.
       - Not always accessible (e.g., books).
     - Social tags provided by users → an alternative to solve the problem.

  9. Motivation: Tagging
     - T1, T2, T3 = sets of tags.

  10. Motivation: Social Tagging
      - Aggregation of user annotations → folksonomy.
      - Folksonomy: Folk (People) + Taxis (Classification) + Nomos (Management).

  11. Motivation: Organization of Resources
      - User annotations → own organization of resources.

      A user's tags:

        Tag        # Resources
        research   82
        twitter    28
        web2.0     35
        language   42
        english    64
        ...        ...

  12. Motivation: Example of Bookmarks

        #  User   Resource     Tags
        1  user1  flickr.com   photo, web2.0, social
        2  user2  flickr.com   photography, images
        3  user1  google.com   searchengine
        4  user3  twitter.com  microblogging, twitter

      A bookmark consists of:
      (1) the user u_i ∈ U who annotates,
      (2) the resource r_j ∈ R being annotated,
      (3) the tags T_ij = {t_1, ..., t_n} ∈ T utilized.

      b_ij : u_i × r_j × T_ij
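The bookmark triple above maps naturally onto a small record type. The following is a minimal sketch, not the thesis's implementation: the `Bookmark` class and the two helper functions are hypothetical names, and the counts assume each user bookmarks a given resource at most once.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    """One annotation b_ij: user u_i tags resource r_j with the tag set T_ij."""
    user: str
    resource: str
    tags: frozenset


# The four example bookmarks from the slide.
BOOKMARKS = [
    Bookmark("user1", "flickr.com", frozenset({"photo", "web2.0", "social"})),
    Bookmark("user2", "flickr.com", frozenset({"photography", "images"})),
    Bookmark("user1", "google.com", frozenset({"searchengine"})),
    Bookmark("user3", "twitter.com", frozenset({"microblogging", "twitter"})),
]


def tags_of_user(bookmarks, user):
    """Count, per tag, how many resources the given user annotated with it."""
    counts = Counter()
    for b in bookmarks:
        if b.user == user:
            counts.update(b.tags)
    return counts


def tags_of_resource(bookmarks, resource):
    """Count, per tag, how many users annotated the given resource with it."""
    counts = Counter()
    for b in bookmarks:
        if b.resource == resource:
            counts.update(b.tags)
    return counts
```

Grouping by user gives that user's own organization of resources, while grouping by resource gives the aggregated view of what many users said about it.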

  13. Motivation: Sum of Annotations

      Top tags (79,681 users):

        Tag Rank  Tag          User Count
        1         photos       22,712
        2         flickr       19,046
        3         photography  15,968
        4         photo        15,225
        5         sharing      10,648
        6         images        9,637
        7         web2.0        9,528
        8         community     4,571
        9         social        3,798
        10        pictures      3,115
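To put the user counts in proportion, they can be divided by the number of annotating users. This is a rough sketch that assumes the 79,681 figure is the total number of users who annotated the resource; `user_share` is a hypothetical helper name.

```python
# Top tags and user counts, copied from the slide.
# Assumption: 79,681 is the total number of users who annotated the resource.
TOTAL_USERS = 79_681
TOP_TAGS = {
    "photos": 22_712,
    "flickr": 19_046,
    "photography": 15_968,
    "photo": 15_225,
    "sharing": 10_648,
    "images": 9_637,
    "web2.0": 9_528,
    "community": 4_571,
    "social": 3_798,
    "pictures": 3_115,
}


def user_share(tag_counts, total_users):
    """Return (tag, fraction of users who applied it), highest count first."""
    ranked = sorted(tag_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(tag, count / total_users) for tag, count in ranked]


if __name__ == "__main__":
    for rank, (tag, share) in enumerate(user_share(TOP_TAGS, TOTAL_USERS), start=1):
        print(f"{rank:>2}  {tag:<12} {share:6.1%}")
```

Under that assumption, even the top-ranked tag was applied by well under a third of the users, which is why the aggregate of many annotations, rather than any single bookmark, characterizes the resource.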

  14. Motivation: Tag-based Resource Classification

  15. Motivation: Problem Statement
      How can the annotations provided by users on social tagging systems be exploited to improve the accuracy of a resource classification task?

  16. Motivation: Related Work
      Social tags for information management:
      - Search: Bao et al. (2007) & Heymann et al. (2008).
      - Recommender Systems: Shepitsen et al. (2008) & Li et al. (2008).
      - Enhanced Browsing: Smith (2008).

  17. Motivation: Related Work
      - Classification: Noll and Meinel (2008) → statistical analysis of matches between tags & taxonomies.
        - Tags are useful for broad categorization.
        - Not for narrower categorization.
      - Lack of further research with:
        - Actual classification experiments.
        - Other types of resources.
        - Different representations of social tags.

  18. Selection of a Classifier: Index

  19. Selection of a Classifier: Characteristics of the task
      We have:
      - A large set of resources: some labeled + many unlabeled.
      - A multiclass taxonomy.

      Automated classifiers learn a model from labeled resources. This model is used to classify unlabeled resources afterward.

      2 learning settings:
      - Supervised: only labeled resources are considered for learning.
      - Semi-supervised: unlabeled resources are also taken into account.
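The difference between the two learning settings can be illustrated with a toy example. This sketch deliberately swaps in a nearest-centroid classifier with self-training (pseudo-labeling) because it fits in a few lines; it is not the classifier studied in the thesis, and all function names and data points are made up for illustration.

```python
import math


def centroids(points, labels):
    """Compute the mean vector of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for k, v in enumerate(p):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model, p):
    """Return the label of the nearest class centroid."""
    return min(model, key=lambda y: math.dist(model[y], p))


def fit_supervised(labeled_x, labeled_y):
    """Supervised setting: learn from labeled resources only."""
    return centroids(labeled_x, labeled_y)


def fit_self_training(labeled_x, labeled_y, unlabeled_x, rounds=5):
    """Semi-supervised setting: unlabeled resources also shape the model."""
    model = centroids(labeled_x, labeled_y)
    for _ in range(rounds):
        # Pseudo-label the unlabeled resources with the current model,
        # then refit on labeled plus pseudo-labeled resources.
        pseudo = [predict(model, p) for p in unlabeled_x]
        model = centroids(labeled_x + unlabeled_x, labeled_y + pseudo)
    return model


# Hypothetical data: two labeled resources and five unlabeled ones.
LABELED_X = [[0.0, 0.0], [4.0, 0.0]]
LABELED_Y = ["a", "b"]
UNLABELED_X = [[0.5, 0.0], [1.0, 0.0], [1.5, 0.0], [3.5, 0.0], [4.5, 0.0]]

supervised = fit_supervised(LABELED_X, LABELED_Y)
semi_supervised = fit_self_training(LABELED_X, LABELED_Y, UNLABELED_X)
```

With this data, the unlabeled resources pull the centroid of class `a` to the right, so a borderline point such as `[2.2, 0.0]` is assigned differently by the two models. Whether that shift helps depends on how well the unlabeled data reflects the true classes.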

  20. Selection of a Classifier: Support Vector Machines (SVM)
      - Hyperplane that separates the classes with the largest margin.
      - Use of kernels → changes the dimensionality of the space.
      - Resource/hyperplane margin → classifier's reliability.
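The idea of reading the resource/hyperplane margin as a reliability score can be sketched for the linear case. The hyperplanes below are hand-picked rather than trained, the class names are invented, and combining per-class hyperplanes in a one-vs-rest fashion is an assumption made for illustration, since the slide only states that the margin indicates reliability.

```python
import math


def signed_margin(w, b, x):
    """Signed distance from point x to the hyperplane w·x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def classify_one_vs_rest(hyperplanes, x):
    """Pick the class whose hyperplane leaves x with the largest margin."""
    margins = {label: signed_margin(w, b, x) for label, (w, b) in hyperplanes.items()}
    best = max(margins, key=margins.get)
    return best, margins[best]


# Hypothetical, hand-picked (w, b) pairs: one "class vs. rest" hyperplane each.
HYPERPLANES = {
    "sports": ([1.0, 0.0], -1.0),
    "arts": ([0.0, 1.0], -1.0),
    "science": ([-1.0, -1.0], 0.5),
}

label, margin = classify_one_vs_rest(HYPERPLANES, [3.0, 2.0])
```

A larger positive margin means the resource lies further inside the predicted class's side of the hyperplane, so the prediction can be treated as more reliable than one that falls close to the boundary.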
