Users Information Abstract Sources With the ever incr e - PDF document

Index Structures for Information Filtering Under the V ector Space Mo del � T ak W. Y an and Hector Garcia-Molina Departmen t of Computer Science Stanford Univ ersit y Stanford, CA 94305 Users Information Abstract Sources With the ever incr e asing volumes of ele ctr onic information gener ation, users of information systems ar e facing an information overlo ad. It is desir able to supp ort infor- Information Filtering mation �ltering as a c omplement to tr aditional r etrieval Server(s) me chanism. The numb er of users, and thus pr o�les (r ep- r esenting users' long-term inter ests), hand le d by an information �ltering system is p otential ly huge, and the system has to pr o c ess a c onstant str e am of inc oming information Figure 1: Information Filtering Serv er(s) in a timely fashion. The e�ciency of the �ltering pr o c ess is thus an imp ortant issue. In this p ap er, we study what data structur es and algo- systems can pro vide an information �ltering mec hanism, rithms c an b e use d to e�ciently p erform lar ge-sc ale infor- through whic h a user subscrib es pr o�les , or queries that mation �ltering under the ve ctor sp ac e mo del, a r etrieval are con tin uously ev aluated, to represen t his long-term in- mo del establishe d as b eing e�e ctive. We apply the ide a of terests, and then passiv ely receiv es information �ltered b y the standar d inverte d index to index user pr o�les. We de- the system according to the pro�les. vise an alternative to the standar d inverte d index, in which we, inste ad of indexing every term in a pr o�le, sele ct only Researc h in information �ltering has receiv ed a lot of the signi�c ant ones to index. We evaluate their p erfor- atten tion lately . Ho w ev er, previous w ork has fo cused on manc e and show that the indexing metho ds r e quir e or ders the e�ectiv eness (precision and recall) of the �ltering, and of magnitude fewer I/Os to pr o c ess a do cument than when little has b een done to address the e�ciency (p erformance) no index is use d. We also show that the pr op ose d alterna- asp ect of the problem. W e b eliev e that information �ltering tive p erforms b etter in terms of I/O and CPU pr o c essing is going to b e used on a large scale and hence the e�ciency time in many c ases. issue m ust b e addressed. In this pap er, w e presen t data structure and algorithms to supp ort information �ltering. Wide area information retriev al is no w a realit y; large- scale w orld-wide information �ltering is also foreseeable. 1 In tro duction Consider a p opulation of users and a n um b er of information sources in a net w ork ed information �ltering en viron- Information is increasingl y a v ailable in electronic form. men t. The �ltering can b e done either at the information The n um b er and size of full text do cumen t databases are sources, at the user sites, or at an in termediate information rapidly increasing. Users of suc h database systems are fac- �ltering server (Figure 1). Relying solely on user �ltering ing an information o v erload; it is b ecoming di�cult for is exp ensiv e since net w ork bandwidth is w asted to transmit users to rely solely on traditional retrosp ectiv e searc h and irrelev an t information and a lot of w asteful lo cal pro cess- retriev al mec hanisms to k eep themselv es apprised of new ing is done. Relying on �ltering at the sources themselv es do cumen ts that are relev an t to their in terest. As a com- is also exp ensiv e since users need to replicate their pro�les plemen t to con v en tional searc h mec hanism, information at al l p ossible sources. The information �ltering serv er is a go o d compromise. It collects information from a set of � This researc h w as sp onsored b y the Adv anced Researc h Pro jects Agency (ARP A) of the Departmen t of Defense under sources and routes it to in terested users. Of course, there Gran t No.MD A972-92-J-1029 with the Corp oration for National can b e m ultiple information �ltering serv ers on the net- Researc h Initiativ es (CNRI). The views and conclusions con- w ork, eac h servicing a di�eren t set (ma yb e o v erlapping) of tained in this do cumen t are those of the authors and should not users and information sources. b e in terpreted as necessarily represen ting the o�cial p olicies or In this pap er, w e fo cus on one information �ltering endorsemen t , either expressed or implied, of ARP A, the U.S. serv er and consider what data structure and algorithms Go v ernmen t , or CNRI.

Users Information Abstract Sources With the ever incr e - PDF document

Index Structures for Information Filtering Under the V ector Space Mo del T ak W. Y an and Hector Garcia-Molina Departmen t of Computer Science Stanford Univ ersit y Stanford, CA 94305 Users Information

Fermilab Users Meeting Fermilab Users Meeting Fermilab Users Meeting Fermilab Users

Unix : Name Resolution quick user overview Users and Groups Users and Groups Users

Users Satisfaction Survey & Users & producers dialogue Haitham Zeidan Dissemination and

Canadian Light Source Access Mechanisms Dean Chapman, Science Director International CLS users

USER-CENTRIC SOCIAL MULTIMEDIA COMPUTING FROM USERS,ON USERS,FOR USERS Jitao Sang Institute of

Users and Coverage Initial Consultation Meeting with the State of Nevada Users and Coverage Goals

October 6, 2010 Instagram Over One Billion Users Worldwide USA Users Comprise 19% Owned By

Users Org Business Meeting to review and conduct the business of the Fermilab Users

Networked World 1.3 billion users 700 billion minutes/month 280 million users 80%

2017 Social Media Facts Facebook 1.01 billion monthly active users as of October 23, 2012

Data Landing Page Current State All users can currently do on Users struggle with the terms our

future shock treatment ctrl+s SVN HTML CSS JavaScript elements attributes class names IDs

What Weve Learned from Users Evaluation, session 11 CS6200: Information Retrieval Users vs.

TRAINING PRESENTATION EXPLANATION OF USERS AND ROLES Users: Each user in the TBR Central eSHOP

Retain Your Users for Life Proactive We Live in an Age of Instant Gratification Personalized

Users: Theories of Tech Use September 9, 2010 I214 Prof. Van House Users as Sources of

Market Performance and Planning Forum July 30, 2013 Objective: Enable dialogue on

Aquarius Plus - Potable Water Purifier Martechnic Navigation Aquarius Plus Martechnic

ON THE CUSP: A New Option for Addressing Administrative Burden Aubrey Schoenleben & Sally

Data Assimilation for Stochastic Transport Models Dan Crisan Imperial College London 12TH

Compliance Risks In Human Resources Jennifer Swayze SPHR-CA, GPHR, SHRM SCP, CEBS Employment

USB Power Delivery and Type-C Overview Architecture Product Portfolio Solution USB Type-C

Real-time traffic sign detection Hassan Shojania Agenda Introduction Method [Escalera

THE KALMAN FILTER RAUL ROJAS Abstract. This paper provides a gentle introduction to the Kalman

Sambuz

Useful Links

Newsletter

Mail Us

Users Information Abstract Sources With the ever incr e - PDF document

Index Structures for Information Filtering Under the V ector Space Mo del T ak W. Y an and Hector Garcia-Molina Departmen t of Computer Science Stanford Univ ersit y Stanford, CA 94305 Users Information

Fermilab Users Meeting Fermilab Users Meeting Fermilab Users Meeting Fermilab Users

Unix : Name Resolution quick user overview Users and Groups Users and Groups Users

Users Satisfaction Survey &amp; Users &amp; producers dialogue Haitham Zeidan Dissemination and

Canadian Light Source Access Mechanisms Dean Chapman, Science Director International CLS users

USER-CENTRIC SOCIAL MULTIMEDIA COMPUTING FROM USERS,ON USERS,FOR USERS Jitao Sang Institute of

Users and Coverage Initial Consultation Meeting with the State of Nevada Users and Coverage Goals

October 6, 2010 Instagram Over One Billion Users Worldwide USA Users Comprise 19% Owned By

Users Org Business Meeting to review and conduct the business of the Fermilab Users

Networked World 1.3 billion users 700 billion minutes/month 280 million users 80%

2017 Social Media Facts Facebook 1.01 billion monthly active users as of October 23, 2012

Data Landing Page Current State All users can currently do on Users struggle with the terms our

future shock treatment ctrl+s SVN HTML CSS JavaScript elements attributes class names IDs

What Weve Learned from Users Evaluation, session 11 CS6200: Information Retrieval Users vs.

TRAINING PRESENTATION EXPLANATION OF USERS AND ROLES Users: Each user in the TBR Central eSHOP

Retain Your Users for Life Proactive We Live in an Age of Instant Gratification Personalized

Users: Theories of Tech Use September 9, 2010 I214 Prof. Van House Users as Sources of

Market Performance and Planning Forum July 30, 2013 Objective: Enable dialogue on

Aquarius Plus - Potable Water Purifier Martechnic Navigation Aquarius Plus Martechnic

ON THE CUSP: A New Option for Addressing Administrative Burden Aubrey Schoenleben &amp; Sally

Data Assimilation for Stochastic Transport Models Dan Crisan Imperial College London 12TH

Compliance Risks In Human Resources Jennifer Swayze SPHR-CA, GPHR, SHRM SCP, CEBS Employment

USB Power Delivery and Type-C Overview Architecture Product Portfolio Solution USB Type-C

Real-time traffic sign detection Hassan Shojania Agenda Introduction Method [Escalera

THE KALMAN FILTER RAUL ROJAS Abstract. This paper provides a gentle introduction to the Kalman

Sambuz

Useful Links

Newsletter

Mail Us

Users Satisfaction Survey & Users & producers dialogue Haitham Zeidan Dissemination and

ON THE CUSP: A New Option for Addressing Administrative Burden Aubrey Schoenleben & Sally