
Information Retrieval and Filtering over Self-Organising Digital Libraries



  1. Information Retrieval and Filtering over Self-Organising Digital Libraries
     Paraskevi Raftopoulou (1,2), Euripides G.M. Petrakis (2), Christos Tryfonopoulos (1), and Gerhard Weikum (1)
     (1) Max-Planck Institute for Informatics, Saarbruecken, Germany, http://www.mpi-inf.mpg.de/
     (2) Technical University of Crete, Chania, Greece, http://www.intelligence.tuc.gr/

  2. Paraskevi Raftopoulou, Max-Planck Institute for Informatics & Technical University of Crete, 2 of 32
     Outline
     - Motivating scenario
     - Background
     - iClusterDL
       - Architecture
       - Protocols
     - Experimental evaluation
     - Related work & outlook
     ECDL Conference 2008, Aarhus, Denmark, 14-19 September 2008

  3. Motivating scenario

  4. Motivating scenario
     - Christos needs papers on information retrieval: “I want papers on information retrieval”
     [Figure: Christos poses the request and receives answers.]

  7. Motivating scenario
     - There are lots of DLs out there!
     - Why ask one or a few, when you could ask thousands?
     - Goal: distributed resource sharing
       - Framework to provide IR and IF functionality on top of SONs
       - Integrate DLs, publishers and other networks seamlessly and with minimum effort
       - Speed up query processing

  8. Background information

  9. Background: IR vs IF
     - IR scenario:
       - A user poses a one-time query: “I want papers on information retrieval”.
       - The system returns a list of pointers to matching resources (or the actual resources).
     - IF (also called pub/sub or information dissemination) scenario:
       - A user posts a continuous query to be notified whenever a paper on “information retrieval” is published.
       - The system notifies the subscriber with a pointer to the matching resources (or the actual resources).
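
The difference between the two scenarios can be illustrated with a minimal single-node Python sketch. It assumes that documents and queries are plain sets of keywords, and that a document matches a query when it contains every query term. The names `Service`, `one_time_query`, `subscribe` and `publish` are hypothetical and not part of iClusterDL.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    terms: frozenset


def matches(query_terms, doc):
    # Assumed Boolean model: every query term must occur in the document.
    return set(query_terms) <= doc.terms


@dataclass
class Service:
    documents: list = field(default_factory=list)
    subscriptions: dict = field(default_factory=dict)  # subscriber -> continuous query

    def one_time_query(self, query_terms):
        # IR: evaluated once against the resources already stored, then discarded.
        return [d.doc_id for d in self.documents if matches(query_terms, d)]

    def subscribe(self, subscriber, query_terms):
        # IF: the continuous query is stored and matched against future publications.
        self.subscriptions[subscriber] = set(query_terms)

    def publish(self, doc):
        # Store the new resource and return the subscribers to be notified.
        self.documents.append(doc)
        return [s for s, q in self.subscriptions.items() if matches(q, doc)]


svc = Service()
svc.subscribe("Christos", {"information", "retrieval"})
print(svc.publish(Document("d1", frozenset({"information", "retrieval", "p2p"}))))  # ['Christos']
print(svc.one_time_query({"retrieval"}))  # ['d1']
```

The same matching function serves both cases. What differs is which side is stored: resources for IR, queries for IF.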

  10. Background: SONs
     - Virtually connected peers
     - Routing indices with links to other peers
       - Peers connected to each other are called neighbors
       - Provide semantic (and social) information about peers
     - Self-organising overlay networks
     - Support rich data models and expressive query languages
     [Figure: an overlay network of peers p1 to p8 built on top of the physical network, with the routing index RI4 of peer p4.]

  11. Background: Rewiring strategies
     - Techniques for self-organising peers:
       - abandon old connections and create new ones
       - periodic process
     - Inspired by the ‘small world effect’: reach anybody in a small number of routing hops
     [Figure: peers connected by intra-cluster (short-range) links and inter-cluster (long-range) links.]
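
The link structure behind the small-world effect can be sketched in Python. Each peer keeps a few short-range links to the peers most similar to it and a few random long-range links to the rest. This is an illustrative sketch, not the iClusterDL rewiring protocol. Cosine similarity over term-weight dictionaries and the names `choose_links`, `k_short` and `k_long` are assumptions.

```python
import math
import random


def cosine(a, b):
    # Cosine similarity of two sparse term-weight vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def choose_links(own_interest, candidates, k_short, k_long, rng=random):
    """Split candidate peers (peer id -> interest vector) into link types."""
    ranked = sorted(candidates,
                    key=lambda p: cosine(own_interest, candidates[p]),
                    reverse=True)
    short_range = ranked[:k_short]   # most similar peers: intra-cluster links
    rest = ranked[k_short:]
    # Random peers outside the cluster act as inter-cluster shortcuts.
    long_range = rng.sample(rest, min(k_long, len(rest)))
    return short_range, long_range


peers = {"p1": {"ir": 1.0}, "p2": {"ir": 0.9, "db": 0.1},
         "p3": {"db": 1.0}, "p4": {"vision": 1.0}}
short_range, long_range = choose_links({"ir": 1.0}, peers, 2, 1, random.Random(0))
print(short_range)  # ['p1', 'p2']
```

Short-range links keep similar peers one hop apart. The random long-range links keep paths between clusters short.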

  12. iClusterDL architecture

  13. iClusterDL basics
     - iClusterDL = (i) intelligent + (Cluster) clustering + (DL) digital libraries
     Contributions:
     - Architecture and protocols to support both IR and IF
       - 2-level hierarchical (super-peer) P2P network
       - seamless and easy integration of DLs; scalable
     - Self-organising DLs based on SONs
       - support rich query models
       - benefit from loosely-connected peers

  14. iClusterDL Architecture: Super-peer (SP)
     - Forms the message routing layer
     - Runs a rewiring protocol
     - Serves clients and providers:
       - stores continuous queries
       - stores resource publications
       - answers one-time queries
       - creates notifications
       - stores notifications
     [Figure: super-peers (SP) form the core network; providers (P) and clients (C) attach to them, integrating CiteSeer, Springer DL and ACM DL.]
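
The super-peer duties that concern continuous queries can be sketched in Python: storing the queries, creating notifications and storing notifications. The sketch assumes keyword-set queries with all-terms matching. It also assumes that a super-peer keeps notifications for an offline client until the client reconnects, consistent with the "client online or offline" notification delivery listed among the protocols. The class `SuperPeer` and its methods are hypothetical.

```python
from collections import defaultdict


class SuperPeer:
    """Hypothetical super-peer state for continuous queries and notifications."""

    def __init__(self):
        self.continuous_queries = {}        # client id -> set of query terms
        self.publications = []              # (doc id, set of terms)
        self.online = set()                 # clients currently connected
        self.pending = defaultdict(list)    # notifications stored for offline clients
        self.inbox = defaultdict(list)      # stands in for messages sent to clients

    def subscribe(self, client, terms):
        self.continuous_queries[client] = set(terms)

    def publish(self, doc_id, terms):
        terms = set(terms)
        self.publications.append((doc_id, terms))
        for client, query in self.continuous_queries.items():
            if query <= terms:              # create a notification per matching query
                self._notify(client, doc_id)

    def _notify(self, client, doc_id):
        if client in self.online:
            self.inbox[client].append(doc_id)
        else:
            self.pending[client].append(doc_id)

    def connect(self, client):
        self.online.add(client)
        # Deliver everything stored while the client was offline.
        self.inbox[client].extend(self.pending.pop(client, []))

    def disconnect(self, client):
        self.online.discard(client)


sp = SuperPeer()
sp.subscribe("Christos", ["information", "retrieval"])
sp.publish("d1", ["information", "retrieval"])          # Christos is offline: stored
sp.connect("Christos")
sp.publish("d2", ["retrieval", "information", "p2p"])   # delivered directly
print(sp.inbox["Christos"])  # ['d1', 'd2']
```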

  15. iClusterDL Architecture: Provider (P)
     - Implemented by information sources
     - Used to expose the source's contents
     - Connects to the iClusterDL network through a super-peer
     [Figure: providers attached to super-peers, integrating CiteSeer, Springer DL and ACM DL.]

  16. iClusterDL Architecture: Client (C)
     - Connects to the iClusterDL network through a super-peer
     - Information consumers:
       - pose one-time queries
       - receive answers
       - subscribe to resource publications
       - receive notifications
     [Figure: the network integrating CiteSeer, Springer DL and ACM DL, with "request resource" and "send resource" exchanges.]

  17. iClusterDL Protocols
     - Super-peer join/leave
     - Super-peer rewiring
     - Client join (first time only)
     - Client connect/disconnect
     - Resource publication/indexing/removal/update
     - One-time query processing
     - Continuous query processing
     - Notification delivery (client online or offline)

  18. Super-peer protocols
     - Basic idea: organise super-peers in SONs, so that similar super-peers are clustered together.
     - Two levels of clustering:
       - A provider peer clusters its documents and uses the resulting interests to join the network.
       - A super-peer uses the interests of its providers to identify itself in the network and to find other similar super-peers.
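
The two clustering levels can be sketched in Python. The slides do not name the clustering algorithm, so this sketch assumes a simple single-pass threshold clustering over term-weight vectors. A provider's interests are the centroids of its document clusters, and a super-peer's interests come from clustering its providers' centroids the same way. The functions `cluster`, `provider_interests` and `super_peer_interests` and the threshold `tau` are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    # Term-by-term mean of the vectors.
    c = {}
    for v in vectors:
        for t, w in v.items():
            c[t] = c.get(t, 0.0) + w / len(vectors)
    return c


def cluster(vectors, tau):
    """Single-pass clustering: join the most similar cluster if similarity >= tau."""
    clusters = []
    for v in vectors:
        scored = [(cosine(v, centroid(m)), m) for m in clusters]
        best_sim, best = max(scored, key=lambda x: x[0], default=(0.0, None))
        if best is not None and best_sim >= tau:
            best.append(v)
        else:
            clusters.append([v])
    return [centroid(m) for m in clusters]


def provider_interests(doc_vectors, tau=0.5):
    # Level 1: a provider clusters its own documents.
    return cluster(doc_vectors, tau)


def super_peer_interests(providers, tau=0.5):
    # Level 2: a super-peer clusters the interests of all its providers.
    return cluster([c for interests in providers for c in interests], tau)


docs = [{"ir": 1.0}, {"ir": 0.8, "web": 0.2}, {"db": 1.0}]
p1 = provider_interests(docs)
print(len(p1))  # 2
print(len(super_peer_interests([p1, [{"ir": 1.0}]])))  # 2
```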

  19. Super-peer rewiring
     A super-peer s:
     1. computes its intra-cluster similarity (average similarity with its short-range links);
     2. initiates rewiring if this similarity < threshold θ;
     3. sends a message (msg) with its interest to m neighbors.
     - All super-peers receiving msg append their interest and forward msg to m neighbors.
     - The message is sent back to s when TTL = 0.
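
The rewiring steps can be simulated in Python. The network is an in-memory dictionary, and each super-peer has a single interest vector compared by cosine similarity. The shared `collected` dictionary stands in for the message that accumulates interests and returns to s when TTL = 0. The slide does not say what s does with the collected interests, so the sketch assumes it keeps the k most similar super-peers as its new short-range links. All names are hypothetical.

```python
import math
import random


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def intra_cluster_similarity(net, s):
    # Step 1: average similarity between s and its short-range links.
    links = net[s]["short"]
    if not links:
        return 0.0
    return sum(cosine(net[s]["interest"], net[p]["interest"]) for p in links) / len(links)


def forward(net, peer, ttl, m, rng, collected):
    # Step 3: pass msg to m random neighbours; each receiver appends its interest.
    if ttl == 0:
        return
    neighbours = net[peer]["neighbours"]
    for n in rng.sample(neighbours, min(m, len(neighbours))):
        collected[n] = net[n]["interest"]
        forward(net, n, ttl - 1, m, rng, collected)


def rewire(net, s, theta, m, ttl, k, rng=random):
    # Step 2: rewire only if the intra-cluster similarity is below theta.
    if intra_cluster_similarity(net, s) >= theta:
        return False
    collected = {}
    forward(net, s, ttl, m, rng, collected)
    collected.pop(s, None)
    # Assumption: keep the k most similar super-peers as new short-range links.
    ranked = sorted(collected, key=lambda p: cosine(net[s]["interest"], collected[p]),
                    reverse=True)
    net[s]["short"] = ranked[:k]
    return True


net = {
    "s":    {"interest": {"ir": 1.0}, "short": ["p_db"], "neighbours": ["p_db", "p_ir"]},
    "p_db": {"interest": {"db": 1.0}, "short": [], "neighbours": ["s", "p_ir"]},
    "p_ir": {"interest": {"ir": 1.0, "web": 0.1}, "short": [], "neighbours": ["s", "p_db"]},
}
print(rewire(net, "s", theta=0.5, m=2, ttl=2, k=1))  # True
print(net["s"]["short"])  # ['p_ir']
```

After one round, the intra-cluster similarity of s is above θ, so a second call leaves its links unchanged.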

  20. IR protocols
     - Basic idea: index information in the SON, so that one-time queries meet similar publications.
     - Two levels of indexing:
       - Global (among all super-peers): use a self-organising protocol.
       - Local (at each super-peer): use a local index appropriate for the publication language.
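
For the local level, the slide leaves the index type open. If the publication language is plain keywords, a super-peer could keep an inverted index such as the hypothetical `LocalIndex` below. The index also covers the publication, removal and update operations named among the iClusterDL protocols.

```python
from collections import defaultdict


class LocalIndex:
    """Hypothetical keyword inverted index kept at one super-peer."""

    def __init__(self):
        self.postings = defaultdict(set)   # term -> ids of publications containing it
        self.docs = {}                     # doc id -> set of terms

    def add(self, doc_id, terms):
        terms = set(terms)
        self.docs[doc_id] = terms
        for t in terms:
            self.postings[t].add(doc_id)

    def remove(self, doc_id):
        for t in self.docs.pop(doc_id, set()):
            self.postings[t].discard(doc_id)
            if not self.postings[t]:
                del self.postings[t]       # drop empty posting lists

    def update(self, doc_id, terms):
        self.remove(doc_id)
        self.add(doc_id, terms)

    def search(self, terms):
        # Conjunctive query: intersect the posting lists of all query terms.
        sets = [self.postings.get(t, set()) for t in terms]
        return set.intersection(*sets) if sets else set()


idx = LocalIndex()
idx.add("d1", ["information", "retrieval"])
idx.add("d2", ["information", "filtering"])
print(sorted(idx.search(["information"])))  # ['d1', 'd2']
idx.update("d2", ["databases"])
print(sorted(idx.search(["information"])))  # ['d1']
```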

  21. One-time query processing
     A super-peer s:
     1. compares q against its interests and selects the interest int most similar to q;
     2. if similarity ≥ threshold θ:
        - forwards a message (msg) including q to all its short-range links;
        - sends q to all similar providers stored in its provider table;
     3. if similarity < threshold θ, forwards msg to the m neighbors most similar to q.
     - All super-peers receiving msg repeat the same process.
     - The message is forwarded until TTL = 0.
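
The one-time query steps can be simulated in Python, with the following assumptions:
- similarity is cosine similarity over term-weight dictionaries;
- s reads its neighbours' interests directly, standing in for the information kept in its routing index;
- θ also decides which providers in the provider table count as "similar";
- as on the slide, only the TTL bounds cycles.

`route_query` and the network layout are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def best_sim(q, interests):
    # Step 1: similarity of q to the interest int most similar to it.
    return max((cosine(q, i) for i in interests), default=0.0)


def route_query(net, s, q, theta, m, ttl, reached=None):
    """Return the set of providers that receive one-time query q."""
    if reached is None:
        reached = set()
    if ttl == 0:
        return reached
    node = net[s]
    if best_sim(q, node["interests"]) >= theta:
        # Step 2: send q to similar providers and forward msg to all short-range links.
        reached |= {p for p, vec in node["providers"].items() if cosine(q, vec) >= theta}
        next_hops = node["short"]
    else:
        # Step 3: forward msg to the m neighbours most similar to q.
        next_hops = sorted(node["neighbours"],
                           key=lambda n: best_sim(q, net[n]["interests"]),
                           reverse=True)[:m]
    for n in next_hops:
        route_query(net, n, q, theta, m, ttl - 1, reached)
    return reached


net = {
    "s1": {"interests": [{"ir": 1.0}], "short": ["s2"], "neighbours": ["s2", "s3"],
           "providers": {"P1": {"ir": 1.0}, "P2": {"db": 1.0}}},
    "s2": {"interests": [{"ir": 1.0, "web": 0.2}], "short": ["s1"], "neighbours": ["s1", "s3"],
           "providers": {"P3": {"ir": 0.9, "web": 0.1}}},
    "s3": {"interests": [{"db": 1.0}], "short": [], "neighbours": ["s1", "s2"],
           "providers": {"P4": {"db": 1.0}}},
}
print(sorted(route_query(net, "s1", {"ir": 1.0}, theta=0.5, m=1, ttl=2)))  # ['P1', 'P3']
print(sorted(route_query(net, "s3", {"ir": 1.0}, theta=0.5, m=1, ttl=2)))  # ['P1']
```

A query arriving at a dissimilar super-peer (s3 above) is first steered towards the matching cluster. Once inside that cluster, it spreads over the short-range links.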

  22. Experimental evaluation
