
Contextual Suggestion Track TREC Thaer Samar, Alejandro Bellogin, Jimmy Lin, Arjen P. de Vries, Alan Said

Summary  Content-based recommendation  Computes the similarity between documents and users profiles  Classifier (not submitted)  Training data:  + Yelp, tripadvisor, wikitravel, zagat, yahoo-travel, orbitz  - Random sample  Using full ClueWeb12

ClueWeb12  Statistics:  From February to May 2012  5.5 TB (compressed)  27.3 TB (uncompressed)  33,447 WARC files  733,019,372 documents  Hadoop cluster:  90 computing nodes  720 parallel map/reduce tasks

Profiles & ClueWeb12 (pipeline diagram; stages are labelled "local" or "cluster")
● Generate Profiles: attraction files → < userId, descriptions >
● Find Context: WARC files → < (contextId, docId), doc content >
● Generate Dictionary → < term, id >
● Transform Profiles → < userId, {termId, tf} >
● Transform Documents → < (contextId, docId), {<termId, tf>} >
● Generate Desc & Titles → < contextId, docId, desc, title >
● Sim(Document, user) → < userId, contextId, docId, score >
● Generate Ranked list → < userId, contextId, docId, rank, desc, title >

Find Context  Goal: extract relevant documents for each context  How do we measure the relevance?  Exact mention of the context (format: {City, ST}) Kennewick, WA  Exclude non related sentences I am in Kennewick, washing ...  Exclude documents that mention the city of interest but in different states Greenville, NC and Greenville, SC  We found 13,548,982 documents out of 733,019,372 ClueWeb12 documents

Generate profiles  We used the description of attractions rated by the user to generate his profile  Why descriptions not the attraction website  7 urls were found with one-one matching  35 were found considering hostname matches and url variation, .i.e, http(s), www  ratings for the attraction's descriptions and websites were very similar

Documents & profiles representation
● Vector Space Model
● Elements of the vectors are <term, frequency> pairs
● Efficient in terms of:
  ● Size: 918 GB (before), 40 GB (after)
  ● Processing speed
● More complete implementation at https://github.com/lintool/clueweb
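As a rough illustration of this representation, the sketch below builds a < term, id > dictionary and converts text into a sparse {termId: tf} mapping. It is a single-machine toy, assuming whitespace tokenization and lower-casing; it is not the Hadoop implementation linked above, and the function names are hypothetical.

```python
from collections import Counter

def tokenize(text):
    """Naive tokenizer (assumption): lower-case and split on whitespace."""
    return text.lower().split()

def build_dictionary(texts):
    """Assign an integer id to each distinct term, in first-seen order."""
    dictionary = {}
    for text in texts:
        for term in tokenize(text):
            dictionary.setdefault(term, len(dictionary))
    return dictionary

def to_tf_vector(text, dictionary):
    """Return a sparse {termId: tf} vector; out-of-dictionary terms are dropped."""
    counts = Counter(tokenize(text))
    return {dictionary[t]: tf for t, tf in counts.items() if t in dictionary}

if __name__ == "__main__":
    vocab = build_dictionary(["park museum park", "museum cafe"])
    print(vocab)
    print(to_tf_vector("park park cafe zoo", vocab))
```

Storing integer term ids with counts instead of raw text is one plausible reason for the size reduction reported above, since each document shrinks to its distinct terms.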

Similarity  Cosine similarity between profile and document vector space representation

Descriptions and final results join

Results

Analysis  We asked the following questions  Effect of sub-collection creation (context finding)  Effect of similarity function  Rating bias in ClueWeb vs Open Web

Effect of sub-collection creation 1/2
● We re-ran our approach on the sub-collection given by the organizers
● 27% of the documents in the given sub-collection are also in our sub-collection

Effect of sub-collection creation 2/2
● Significant improvement when ignoring the geographical aspect (P@5_g)
● Our method retrieves documents that are relevant for the user but not geographically appropriate
● The given sub-collection is more appropriate for the contexts

Effect of ranking function
● (Low coverage of relevance assessments)
● 5-nearest neighbours outperforms other values of k
● Generating user profiles from descriptions with negative ratings gave the worst results

Archive Web vs Open Web evaluation

Thanks!
