
A Peer-to-Peer Inverted Index Implementation for Word-based Content Search
Nuno Lopes
University of Minho
October 2003

P2P System Characterization
• Scalable up to millions of nodes
• Highly dynamic node membership
• Reduced node uptime: 1 hour on average
• No centralized authority
© 2003 Nuno Lopes, SDDI 2003

1st Generation of P2P Systems: File Sharing Oriented
• Napster: centralized search with P2P file download ⇒ single point of failure
• Gnutella: broadcast-based search ⇒ network overloaded

Searching Model
• Local model: individual peer search. Examples: Gnutella, Pedone’02
• Global model: information is placed on a global (distributed) shared index

2nd Generation of P2P Systems: Distributed Hash Table (DHT) Based
• Examples: Chord, Pastry, others...
• Simple hash table operations on (key, value) pairs
• Efficient routing: O(log N) hops for any peer
• Scalable state information: O(log N) routing entries per peer
• But... incapable of searching
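The (key, value) interface above, and why it does not support searching on its own, can be illustrated with a small sketch. It is not Chord or Pastry: the class `ToyDHT`, the helper `ring_id`, and the peer names are hypothetical, and the full peer list is assumed to be known locally, so the O(log N) routing step is left out. Only the mapping of keys to responsible peers is shown.

```python
import bisect
import hashlib


def ring_id(text: str, bits: int = 32) -> int:
    """Map a string to a position on a 2**bits identifier ring."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class ToyDHT:
    """Toy DHT: each key is stored on the first peer clockwise from its id.

    Assumption: every peer is known locally, so there is no multi-hop
    routing here, only the key-to-peer mapping.
    """

    def __init__(self, peer_names):
        self.peers = sorted((ring_id(name), name) for name in peer_names)
        self.ids = [pid for pid, _ in self.peers]
        self.storage = {name: {} for name in peer_names}

    def responsible_peer(self, key: str) -> str:
        # Successor of the key's id, wrapping around the ring.
        i = bisect.bisect_left(self.ids, ring_id(key)) % len(self.peers)
        return self.peers[i][1]

    def put(self, key: str, value) -> None:
        self.storage[self.responsible_peer(key)][key] = value

    def get(self, key: str):
        return self.storage[self.responsible_peer(key)].get(key)


dht = ToyDHT([f"peer-{i}" for i in range(8)])
dht.put("index", "block-17")
print(dht.get("index"))  # exact-match lookup: prints block-17
print(dht.get("ind"))    # a partial key hashes elsewhere: prints None
```

Because keys are hashed, only exact-match lookups succeed; anything resembling a word search has to be built as a layer on top of `put` and `get`.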

Inverted Index Description
• Association: word ↦ {document location} set
• Document location set is highly dynamic
• Follows Zipf distribution
[Figure: # Documents (y-axis) per word (x-axis: Words)]
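As a minimal illustration of the word-to-location-set association, the sketch below builds an inverted index in memory. The document locations and texts are made up for the example, and splitting on whitespace is an assumed, simplistic tokenization.

```python
from collections import defaultdict

# Hypothetical document locations and contents, for illustration only.
documents = {
    "peer-a/doc1": "peer to peer search",
    "peer-b/doc2": "inverted index search",
    "peer-c/doc3": "peer index",
}

# word -> set of document locations containing that word
index = defaultdict(set)
for location, text in documents.items():
    for word in text.split():
        index[word].add(location)

# Number of documents per word, most frequent words first.
sizes = sorted(((len(locs), word) for word, locs in index.items()), reverse=True)
for count, word in sizes:
    print(word, count)
```

Even in this tiny example a few words appear in several documents while others appear in only one, which is the kind of skew the slide's plot shows at scale.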

Inverted Index API
• INSERT(word, reference)
• REMOVE(word, reference)
• HAS_REF(word, reference): bool
• GET_REF(word): reference
• NEXT_REF(word, reference): reference
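A single-process sketch of this API follows. The semantics of the two iteration calls are an assumption, since the slide only gives signatures: here `get_ref` returns the smallest reference stored for a word and `next_ref` returns the next one in sorted order, with `None` marking the end. The class name `InvertedIndex` is hypothetical.

```python
import bisect


class InvertedIndex:
    """In-memory stand-in for the API; each word keeps a sorted reference list."""

    def __init__(self):
        self._refs = {}  # word -> sorted list of references

    def insert(self, word, reference):
        refs = self._refs.setdefault(word, [])
        i = bisect.bisect_left(refs, reference)
        if i == len(refs) or refs[i] != reference:  # keep set semantics
            refs.insert(i, reference)

    def remove(self, word, reference):
        refs = self._refs.get(word, [])
        i = bisect.bisect_left(refs, reference)
        if i < len(refs) and refs[i] == reference:
            del refs[i]
            if not refs:
                del self._refs[word]

    def has_ref(self, word, reference):
        refs = self._refs.get(word, [])
        i = bisect.bisect_left(refs, reference)
        return i < len(refs) and refs[i] == reference

    def get_ref(self, word):
        # Assumed meaning: first reference for the word, or None.
        refs = self._refs.get(word)
        return refs[0] if refs else None

    def next_ref(self, word, reference):
        # Assumed meaning: first reference strictly after `reference`, or None.
        refs = self._refs.get(word, [])
        i = bisect.bisect_right(refs, reference)
        return refs[i] if i < len(refs) else None


idx = InvertedIndex()
for ref in ["doc3", "doc1", "doc2", "doc1"]:
    idx.insert("peer", ref)

# Iterate over all references for a word with GET_REF / NEXT_REF.
found = []
ref = idx.get_ref("peer")
while ref is not None:
    found.append(ref)
    ref = idx.next_ref("peer", ref)
print(found)
```

Under these assumed semantics a client can enumerate a word's references one at a time without ever transferring the whole set.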

Inverted Index Implementation
Index is split into constant-size blocks, accessed through 2 layers:
• DHT as base platform for block-oriented storage ⇒ unsuitable as a stand-alone implementation
• B+ tree for block management: responsible for the set implementation for each word
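The two layers can be sketched roughly as below. This is a simplification under stated assumptions, not the implementation from the talk: a plain dict stands in for the DHT, the block key format (`word/leaf/n`, `word/root`) and `BLOCK_SIZE` are invented, and the tree is bulk-built with a single root level instead of supporting B+ tree splits and merges.

```python
import bisect

BLOCK_SIZE = 4  # assumed constant block capacity (entries per leaf block)


def build_blocks(word, references, store):
    """Bulk-build a two-level tree for one word into `store`.

    `store` stands in for the DHT: a dict from block key to block content.
    A real B+ tree would add further levels once the root itself overflows.
    Returns the key of the root block.
    """
    refs = sorted(set(references))
    separators, children = [], []
    for n, start in enumerate(range(0, len(refs), BLOCK_SIZE)):
        leaf_key = f"{word}/leaf/{n}"
        chunk = refs[start:start + BLOCK_SIZE]
        store[leaf_key] = {"leaf": True, "refs": chunk}
        separators.append(chunk[0])  # smallest reference in this leaf
        children.append(leaf_key)
    root_key = f"{word}/root"
    store[root_key] = {"leaf": False, "separators": separators, "children": children}
    return root_key


def has_ref(word, reference, store):
    """Descend root -> leaf; returns (found, number of blocks fetched)."""
    root = store[f"{word}/root"]
    if not root["children"]:
        return False, 1
    # Pick the last leaf whose smallest reference is <= the one searched for.
    i = max(bisect.bisect_right(root["separators"], reference) - 1, 0)
    leaf = store[root["children"][i]]
    return reference in leaf["refs"], 2


store = {}
build_blocks("peer", [f"doc{i:02d}" for i in range(10)], store)
print(len(store))                       # 3 leaf blocks + 1 root block
print(has_ref("peer", "doc07", store))  # found after fetching 2 blocks
print(has_ref("peer", "doc99", store))  # not found, still 2 blocks fetched
```

Every lookup for a word passes through that word's root block, while each leaf holds only a bounded number of references.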

Current Simulation Settings
• Only the B+ tree layer is simulated
• Peers store a single block each
• Messages have an atomic cost
• A single client requests index operations on the system
• Data consists of 1000 small documents with 36499 unique words

Initial Simulation Results
• B+ trees make the storage load uniform across peers
• However... root blocks for popular words have high network load
[Figure: Access rate (y-axis) per block (x-axis: Blocks)]

Caching Mechanism
• Clients have a high probability of requesting the same blocks for popular words
• Caching of (non-leaf) blocks reduces the number of accesses
• To avoid stale copies, leaf blocks are never cached
• Higher-level blocks are less likely to be modified and therefore to become stale

Simulation Results (Using Cache)
• The use of a cache mechanism (LRU) distributes the network load more evenly across peers
• Access rates were reduced by a factor of 10
[Figure: Access rate (y-axis) per block (x-axis: Blocks), with cache]
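A client-side LRU cache that holds only non-leaf blocks could look like the sketch below. The class `BlockCache`, the `fetch` callback, the block layout with a `leaf` flag, and the capacity are all assumptions made for illustration; the access counts it prints come from this toy setup and are unrelated to the simulation figures above.

```python
from collections import OrderedDict


class BlockCache:
    """Client-side LRU cache that refuses to hold leaf blocks."""

    def __init__(self, fetch, capacity=2):
        self.fetch = fetch            # function: block key -> block (a remote access)
        self.capacity = capacity      # maximum number of cached blocks
        self.cached = OrderedDict()   # block key -> block, least recently used first
        self.remote_accesses = 0

    def get(self, key):
        if key in self.cached:
            self.cached.move_to_end(key)  # mark as most recently used
            return self.cached[key]
        self.remote_accesses += 1
        block = self.fetch(key)
        if not block["leaf"]:             # leaf blocks are never cached
            self.cached[key] = block
            if len(self.cached) > self.capacity:
                self.cached.popitem(last=False)  # evict least recently used
        return block


# Hypothetical blocks: one root and one leaf for a popular word.
blocks = {"root": {"leaf": False}, "leaf0": {"leaf": True}}
cache = BlockCache(blocks.__getitem__)

for _ in range(5):
    cache.get("root")   # served from the cache after the first request
    cache.get("leaf0")  # always fetched remotely, so it cannot be stale

print(cache.remote_accesses)  # 6 remote accesses instead of 10 without a cache
```

The root block is fetched once and then served locally, so repeated requests for a popular word stop hitting the peer that stores its root.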

Open Questions
• Measurement of DHT as a stand-alone implementation of the inverted index
• Analysis of the block caching mechanism to determine the best cache size for different numbers of peers in the system
• Implementation of a multiple-blocks-per-peer association for studying effective peer load
• Implementation and load measurement of AND and OR search operators
