Text REtrieval Conference (TREC)

TREC Tracks
[Figure: timeline of TREC tracks by year, 1992 to 2011, grouped by theme]
• Crowdsourcing
• Personal documents: Blog, Microblog, Spam
• Retrieval in a domain: Chemical IR, Genomics, Medical Records
• Answers, not documents: Novelty, QA, Entity
• Searching corporate repositories: Legal, Enterprise
• Size, efficiency, & web search: Terabyte, Million Query, Web, VLC
• Beyond text: Video, Speech, OCR
• Beyond just English: Cross-language, Chinese, Spanish
• Human-in-the-loop: HARD, Feedback, Interactive, Session
• Streamed text: Filtering, Routing
• Static text: Ad Hoc, Robust

Some lessons learned after 20 years
• People like to do this; an enormous amount of work is done by coordinators and participants
• Getting data has been, and always will be, a problem
• However, having good, freely available test collections is a major contribution of these evaluations
• Two years is about right for a given task
• The first year is an exciting pilot spent getting the evaluation right; the second year has training data for improvements; by the third year it's boring (nothing new is being tried or learned)
• The Cranfield paradigm is surprisingly robust
• The use of pooling for relevance assessment has been shown to result in stable evaluations (at least most of the time)
• The paradigm has been successfully adapted to different media (such as video) and different tasks (such as QA)

But--
• How do we balance “basic” IR research against moving into new and exciting areas?
• Beyond the early years, we are not seeing real improvements; this is discouraging, especially to today’s students
• Is it right to keep adapting Cranfield?
• Does it mean we are blinding ourselves to “the big picture”? Are the tasks we can model the important ones?
• If not Cranfield, then what?
• How do we work in areas that are known to be important but either lack data (desktop search) or are difficult to cleanly define (such as different relevancy criteria like quality, time-dependency, etc.)?

2011 Medical Records Track
• Ad hoc search task
• set of ~100,000 de-identified clinical records assembled by U. of Pittsburgh’s BLULab NLP repository
– assembled into ~17,000 “visits” through a mapping table
• 35 topics developed and judged by physicians enrolled in the OHSU bioinformatics program; modeled after inclusion criteria for clinical studies, e.g. “patients with complicated GERD who receive endoscopy”
• systems return a ranked list of visits
• Evaluation
• judgment sets produced using deep but sparse stratified sampling
• bpref as the main evaluation metric; inferred measures are noisy with the type of sampling used
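The slide names bpref as the main metric without defining it. As a rough illustration, here is a minimal Python sketch of one commonly used formulation, in which each retrieved relevant item is penalised by the number of judged non-relevant items ranked above it, and unjudged items are ignored. The min(R, N) denominator and the cap at R are assumptions of this sketch, not something stated in the slides, and the visit identifiers are made up.

```python
def bpref(ranking, relevant, nonrelevant):
    """Sketch of bpref for one topic.

    ranking     -- list of item ids in ranked order (e.g. visit ids)
    relevant    -- set of ids judged relevant
    nonrelevant -- set of ids judged non-relevant
    Items in neither set are unjudged and are skipped.
    """
    R, N = len(relevant), len(nonrelevant)
    if R == 0:
        return 0.0
    denom = min(R, N)  # assumption: min(R, N) variant of the denominator
    nonrel_seen = 0    # judged non-relevant items seen so far
    total = 0.0
    for item in ranking:
        if item in relevant:
            if denom == 0:
                total += 1.0  # no judged non-relevant items: no penalty
            else:
                # only the first R judged non-relevant items count
                total += 1.0 - min(nonrel_seen, R) / denom
        elif item in nonrelevant:
            nonrel_seen += 1
    return total / R


# Hypothetical example: v3 is unjudged and does not affect the score.
score = bpref(["v1", "v2", "v3", "v4", "v5"],
              relevant={"v1", "v4"},
              nonrelevant={"v2", "v5"})
print(score)
```

Because unjudged items are skipped rather than treated as non-relevant, a measure of this kind suits judgment sets that are known to be incomplete.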

2011 Microblog Track
• Test Collection
• Tweets2011 collection of about 16 million tweets
• 50 topics created by NIST assessors, consisting of [title, triggerTweet] pairs, where title is an English statement of the information need and triggerTweet is a pointer to a tweet in the collection
• Evaluation
• pools of top 30 tweets from submitted runs
• tweet is relevant if it contains relevant information itself or points to relevant information
– Must be in English and NOT a retweet
– Must precede time of triggerTweet
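The pooling step mentioned in the evaluation bullets can be sketched as follows. This is a minimal illustration for a single topic, assuming each run is simply a ranked list of tweet ids; the function name `build_pool` and the run names are hypothetical, and the depth of 30 is taken from the slide.

```python
def build_pool(runs, depth=30):
    """Union of the top-`depth` items from every submitted run for one topic.

    runs -- dict mapping a run name to its ranked list of tweet ids
    """
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])  # only the top `depth` items are judged
    return pool


# Hypothetical runs for one topic, pooled to depth 2 for brevity.
runs = {
    "runA": ["t1", "t2", "t3"],
    "runB": ["t3", "t4", "t5"],
}
print(sorted(build_pool(runs, depth=2)))
```

Only the tweets in the pool are shown to assessors; anything outside it remains unjudged.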

2011 Session Track
• 76 sessions derived from 62 topics
• topics taken from previous tracks and faceted like web topics
• A session (created at U. Sheffield) consists of
• sequence of queries issued to satisfy the information need of the topic; median of two reformulations; 38% with more
• ranked list of (top 10) URLs returned for each query
• set of URLs clicked on plus dwell times for each query
• Four runs comprise a single submission:
• R1: results for last query using no other info
• R2: results for last query, using content of all previous queries in session
• R3: results for last query using content of previous queries plus ranked lists
• R4: results for last query using content of previous queries, ranked lists, and click/dwell time info
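One way to picture the four run conditions is as progressively larger views of the same session. The sketch below is only an illustration: the class and field names (`Interaction`, `Session`, `evidence_for`) are hypothetical and do not reflect the track's actual data format.

```python
from dataclasses import dataclass, field


@dataclass
class Interaction:
    """One earlier query in a session, with what the user saw and clicked."""
    query: str
    ranked_urls: list = field(default_factory=list)  # top-10 URLs returned
    clicks: dict = field(default_factory=dict)       # url -> dwell time (seconds)


@dataclass
class Session:
    topic_id: str
    past: list           # earlier Interaction objects, in order
    current_query: str   # the last query, for which results are returned


def evidence_for(run, session):
    """Return the session information a system may use under run R1..R4."""
    if run not in ("R1", "R2", "R3", "R4"):
        raise ValueError(f"unknown run condition: {run}")
    evidence = {"current_query": session.current_query}
    if run in ("R2", "R3", "R4"):
        evidence["past_queries"] = [i.query for i in session.past]
    if run in ("R3", "R4"):
        evidence["ranked_lists"] = [i.ranked_urls for i in session.past]
    if run == "R4":
        evidence["clicks"] = [i.clicks for i in session.past]
    return evidence


# Hypothetical session with one reformulation.
s = Session(
    topic_id="1",
    past=[Interaction("gerd", ["http://a.example", "http://b.example"],
                      {"http://a.example": 12.5})],
    current_query="gerd endoscopy",
)
print(sorted(evidence_for("R3", s)))
```

Comparing a system's R1 run with its R2 to R4 runs then shows how much each additional kind of session evidence helps.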

2011 Crowdsourcing Track
• Investigate judgments from a crowd
• participants collect assessments for sets of topic-doc pairs; 5 pairs per set
• evaluate quality of crowdsourcing design by quality of the judgments (consensus or matching NIST)
• Given a set of labels for the same [topic, doc] pair, compute a final label
• test data built from crowdsourcing judgments collected from TREC 2010 Relevance Feedback track; evaluate quality of consensus labels as a function of either gold standard [NIST] labels or others’ consensus labels
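The slide does not say how a final label should be computed; that choice was left to participants. As a baseline illustration only, the sketch below uses simple majority voting and scores the result against gold labels. The function names, the tie-breaking rule (smallest label wins), and the sample data are all assumptions of this sketch.

```python
from collections import Counter, defaultdict


def consensus_labels(judgments):
    """Majority-vote label per (topic, doc) pair.

    judgments -- iterable of (topic, doc, label) tuples from crowd workers
    Ties are broken by taking the smallest label (an arbitrary assumption).
    """
    by_pair = defaultdict(list)
    for topic, doc, label in judgments:
        by_pair[(topic, doc)].append(label)

    consensus = {}
    for pair, labels in by_pair.items():
        counts = Counter(labels)
        top = max(counts.values())
        consensus[pair] = min(l for l, c in counts.items() if c == top)
    return consensus


def accuracy(consensus, gold):
    """Fraction of pairs present in both mappings whose labels agree."""
    shared = [pair for pair in consensus if pair in gold]
    if not shared:
        return 0.0
    return sum(consensus[p] == gold[p] for p in shared) / len(shared)


# Hypothetical crowd judgments: 1 = relevant, 0 = not relevant.
judgments = [
    ("t1", "d1", 1), ("t1", "d1", 1), ("t1", "d1", 0),
    ("t1", "d2", 0), ("t1", "d2", 1),
]
labels = consensus_labels(judgments)
print(labels)
print(accuracy(labels, {("t1", "d1"): 1, ("t1", "d2"): 1}))
```

The same `accuracy` function could be applied with another participant's consensus labels in place of the gold labels, which corresponds to the second evaluation option on the slide.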

More TRECs
• TREC 2012 Tracks
• Crowdsourcing, Microblog, Medical Records, Session, Web continuing (Legal had no new data)
• Knowledge Base Acceleration (KBA)
– Update Wikipedia entities based on extraction from streaming data
• Contextual Suggestion
– Given a set of profiles, a set of example suggestions, and a set of contexts, for each profile/context pairing, participants should return a ranked list of 50 proposed suggestions
• TREC 2013?
