SLIDE 1

Adaptive Data Propagation in Peer-to-Peer Systems

Thomas Repantis

trep@cs.ucr.edu

CS253-Distributed Systems, Winter 2004 – p.1/18

SLIDE 2

Overview

  • 1. Problem
  • 2. Solution
  • 3. Simulation Results
  • 4. Conclusion

SLIDE 3

Problem Definition

How can we efficiently locate an object in an unstructured peer-to-peer system, when a reference to that object is given? The traditional answer is flooding, which propagates the query hop by hop and has many disadvantages:

  • Messages travel a large number of hops.
  • The processing power of many nodes is wasted.
  • Large amounts of network traffic are produced.
  • The answer is delayed.
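The cost of naive flooding can be illustrated with a small sketch (purely illustrative Python; the graph shape and function names are assumptions, not from the slides):

```python
def flood(graph, start, ttl):
    """Count messages sent by naive flooding with a TTL:
    each node forwards the query to all of its neighbors until the TTL
    expires, with no duplicate suppression, so traffic grows
    multiplicatively with the fan-out."""
    messages = 0
    frontier = [start]
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for neighbor in graph[node]:
                messages += 1          # one message per (node, neighbor) pair
                next_frontier.append(neighbor)
        frontier = next_frontier
    return messages

# Even a 3-node clique sends 6 messages within 2 hops.
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
```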

SLIDE 4

Suggested Solutions

  • Organize nodes according to their interests.
  • Use Bloom filters to summarize the data stored in nodes.
  • Each node examines the content synopses of its neighbors to decide where to propagate a query.
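A counting Bloom filter of the kind used later in the simulation (4-bit counters, 10 slots, 4 hash functions) can be sketched like this. This is an illustrative sketch, not the authors' implementation; the SHA-1-based hashing and all names are assumptions:

```python
import hashlib

class CountingBloomFilter:
    """Counting Bloom filter: counters (instead of single bits) let a
    node remove a keyword when a document leaves. Lookups may yield
    false positives but never false negatives."""

    MAX_COUNT = 15  # a 4-bit counter saturates at 15

    def __init__(self, num_counters=10, num_hashes=4):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _positions(self, item):
        # Derive k positions from salted SHA-1 digests (an assumption;
        # any family of independent hash functions would do).
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % len(self.counters)

    def add(self, item):
        for pos in self._positions(item):
            if self.counters[pos] < self.MAX_COUNT:
                self.counters[pos] += 1

    def remove(self, item):
        for pos in self._positions(item):
            if self.counters[pos] > 0:
                self.counters[pos] -= 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.counters[pos] > 0 for pos in self._positions(item))
```

A node would add each local keyword to its filter and ship the filter to selected neighbors as its content synopsis.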

SLIDE 5

We suggest going even further

Let us propagate the content synopses adaptively, according to parameters like:

  • The number of queries we have received from a peer.
  • The number of local hits the queries of a peer have produced.
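A minimal sketch of the adaptive selection, assuming per-peer query and hit counts are tracked (the function name, data layout, and thresholds are illustrative, not from the slides):

```python
def peers_to_update(stats, min_queries=5, min_hit_rate=0.2):
    """Select the peers that should receive our content synopsis.

    `stats` maps a peer id to (queries_received, local_hits).
    A peer qualifies if it has queried us often enough and its
    queries hit our local content often enough; both thresholds
    are illustrative assumptions."""
    selected = []
    for peer, (queries, hits) in stats.items():
        if queries >= min_queries and hits / queries >= min_hit_rate:
            selected.append(peer)
    return selected
```

For example, a peer that sent 10 queries producing 5 local hits would receive our synopsis, while a peer whose queries never hit would not.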

SLIDE 6

System operation example

  • According to its criteria, C propagates its synopsis (S) only to B.
  • Based on S, B routes the query (Q) only to C.
  • The query hit (QH) is routed back to A.

[Diagram: nodes A–G exchanging synopses (S), queries (Q), and query hits (QH)]
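The routing step in this example can be sketched as follows (illustrative; `neighbor_synopses` and its callable membership-test interface are assumptions):

```python
def route_query(keyword, neighbor_synopses):
    """Forward a query only to neighbors whose content synopsis might
    contain the keyword.

    `neighbor_synopses` maps a neighbor id to a membership-test
    callable (e.g. a Bloom filter lookup). Returns the neighbors to
    forward to; false positives in a synopsis cause harmless extra
    forwards, false negatives cannot occur."""
    return [peer for peer, might_contain in neighbor_synopses.items()
            if might_contain(keyword)]
```

In the example, B holds only C's synopsis S, so its query Q is forwarded to C alone instead of being flooded to every neighbor.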

SLIDE 7

Simulation Parameters

We implemented our protocols on top of the Neurogrid simulator.

  • Counting Bloom filters: 4-bit counters, 10 bits length, 4 hash functions
  • 300 possible documents
  • 400 possible keywords
  • 30 documents per node
  • 1 keyword per document
  • 50 maximum connections per node
  • TTL 7

SLIDE 8

Synopses Hits

Queries found in neighbors’ content synopses.

[Plot: Synopses Hits vs. Number of Nodes (1000–7000), series "SynopsesHitsADP" and "SynopsesHitsBF"]

SLIDE 9

Synopses Misses

Queries not found in neighbors’ content synopses.

[Plot: Synopses Misses vs. Number of Nodes (1000–7000), series "SynopsesMissesADP" and "SynopsesMissesBF"]

SLIDE 10

False Positives

Queries falsely propagated.

[Plot: False Positives vs. Number of Nodes (1000–7000), series "FalsePositivesADP" and "FalsePositivesBF"]

SLIDE 11

Average Number of Matches

Number of matching documents found for a search.

[Plot: Average Number of Matches vs. Number of Nodes (1000–7000), series "MatchesAvgADP", "MatchesAvgBF", and "MatchesAvgGNT"]

SLIDE 12

Average Number of Message Transfers

Number of messages sent during a search.

[Plot: Average Number of Message Transfers vs. Number of Nodes (1000–7000), series "MsgTransfersAvgADP", "MsgTransfersAvgBF", and "MsgTransfersAvgGNT"]

SLIDE 13

Average Number of Nodes Reached

Number of nodes reached during a search.

[Plot: Average Number of Nodes Reached vs. Number of Nodes (1000–7000), series "NodesReachedAvgADP", "NodesReachedAvgBF", and "NodesReachedAvgGNT"]

SLIDE 14

Average TTL of First Match

The TTL of the first message that found the first match (i.e., how many hops were taken before the first hit).

[Plot: Average TTL of First Match vs. Number of Nodes (1000–7000), series "TTLavgADP", "TTLavgBF", and "TTLavgGNT"]

SLIDE 15

Average Recall

Proportion of all possible matches that was actually discovered.

[Plot: Average Recall vs. Number of Nodes (1000–7000), series "RecallAvgADP", "RecallAvgBF", and "RecallAvgGNT"]

SLIDE 16

Average Recall Efficiency

Average Recall divided by the number of messages transferred.

[Plot: Average Recall Efficiency vs. Number of Nodes (1000–7000), series "RecallEffAvgADP", "RecallEffAvgBF", and "RecallEffAvgGNT"]

SLIDE 17

Conclusions

By propagating content synopses to peers that are selected adaptively we get:

  • Faster search and retrieval.
  • Less bandwidth wasted.
  • Less processing power wasted.

However, the recall of flooding is not reached. Room for future work:

  • Other parameters
  • Further propagation
  • Pulling instead of pushing

SLIDE 18

Thank you!

Questions/comments?
