
Introduction Related work Content based routing of path queries Conclusion

Path Query Routing in Unstructured Peer-to-Peer Networks

Nicolas Bonnel, Gildas Ménier, Pierre-François Marteau
Laboratoire Valoria - Université de Bretagne Sud
August 29, 2007


1. Introduction
  • Context
  • Overview
2. Related work
  • P2P architecture
  • Bloom filters
3. Content based routing of path queries
  • Multi Level EDBF
  • Clustering
  • Preliminary results
4. Conclusion


Context

Context
  • Indexing very large databases
  • Semi-structured information (XML)
  • Need to index the structure of documents
  • Need to answer approximate queries
Queries
  • Exact: article/title
  • 1 unknown element: article/?/paragraph
  • 0 or more unknown elements: */paragraph
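
The three query forms above can be illustrated with a small matcher. This is a minimal sketch, not the authors' implementation: the function name `matches` is hypothetical, and it assumes that a query must match the whole path from its first to its last element.

```python
def matches(query: str, path: str) -> bool:
    """Return True if an XML path matches a path query.

    '?' stands for exactly one unknown element,
    '*' stands for zero or more unknown elements.
    Assumption: the query must cover the whole path.
    """
    q = query.split("/")
    p = path.split("/")

    def rec(i: int, j: int) -> bool:
        # i indexes the query, j indexes the path.
        if i == len(q):
            return j == len(p)
        if q[i] == "*":
            # '*' either matches nothing, or swallows one element and stays active.
            return rec(i + 1, j) or (j < len(p) and rec(i, j + 1))
        if j == len(p):
            return False
        return (q[i] == "?" or q[i] == p[j]) and rec(i + 1, j + 1)

    return rec(0, 0)
```

For instance, `matches("article/?/paragraph", "article/section/paragraph")` holds, while the same query does not match `article/paragraph` because `?` requires exactly one element.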


Architecture

Overview
Distributed XML database
  • The system constrains the location and replication of data
Resource scavenging
  • Allows the use of more computers, at low cost
  • Ex: SETI
Peer-to-peer architecture
  • Fault tolerance
  • Scalability


Structured P2P network

Chord, CAN, Tapestry, ...
Advantage
  • Easy to retrieve rare items
Limitations
  • Approximate and range queries are very costly
  • Load balancing problems


Unstructured P2P network

Gnutella [Clip2, 2002], ...
Advantages
  • Highly replicated items can be retrieved at a low cost
  • Can control data placement
Limitation
  • Very costly to retrieve rare items


Bloom Filters [Bloom, 1970]

Definition
  • A: array of m bits
  • h_i, 0 <= i < k: k hash functions
  • insert(x): ∀i: A[h_i(x)] = 1
  • query(x): true if ∀i: A[h_i(x)] == 1
False positives
  • False positives are possible, but false negatives are not
  • Probability of a false positive after n insertions: $\left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}$
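
The definition above can be sketched in a few lines of Python. This is an illustrative sketch only: the slides do not say which hash functions are used, so deriving the k positions from SHA-256 salted with the index i is an assumption, as are the names `BloomFilter` and `false_positive_probability`.

```python
import hashlib


class BloomFilter:
    """Array of m bits with k hash functions, as defined on the slide."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m

    def positions(self, x: str) -> list[int]:
        # Assumption: h_i(x) is SHA-256 of "i:x", reduced modulo m.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{x}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def insert(self, x: str) -> None:
        # insert(x): set A[h_i(x)] = 1 for every i.
        for pos in self.positions(x):
            self.bits[pos] = 1

    def query(self, x: str) -> bool:
        # query(x): true only if every A[h_i(x)] is set.
        return all(self.bits[pos] == 1 for pos in self.positions(x))


def false_positive_probability(m: int, k: int, n: int) -> float:
    """Evaluate (1 - (1 - 1/m)^(k n))^k for n inserted elements."""
    return (1 - (1 - 1 / m) ** (k * n)) ** k
```

An inserted element is always reported as present, which is the "no false negatives" property; `false_positive_probability` only evaluates the formula and grows with n for fixed m and k.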


Exponentially Decaying Bloom Filter [Kumar, 2005]

EDBF
  • $\theta(x) = |\{i : A[h_i(x)] = 1\}|$, the number of the k bits of x that are set
  • Can be used to encode stochastic routing tables
  • $\theta(x)/k$: probability of finding x along a specific link
  • At n hops from the element: $\theta(x)/k = 1/d^{n}$
Update
  • Copy the filter of each neighbor
  • Bits of the copy are set to 0 with probability (1 − 1/d)
  • OR with the local filter
  • Ex: propagation with a decay d = 2
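
The counting function and the update rule above can be sketched as follows. This is a hedged illustration rather than the scheme of [Kumar, 2005] in full: the helper names `positions`, `theta`, `decay`, and `advertise` are hypothetical, the SHA-256 based hash functions are an assumption, and filters are plain lists of 0/1 values.

```python
import hashlib
import random


def positions(x: str, m: int, k: int) -> list[int]:
    # Assumption: h_i(x) is SHA-256 of "i:x", reduced modulo m.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{x}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


def theta(bits: list[int], x: str, k: int) -> int:
    """Count how many of the k hash positions of x are set in the filter."""
    return sum(bits[pos] for pos in positions(x, len(bits), k))


def decay(bits: list[int], d: float, rng: random.Random) -> list[int]:
    """Copy a filter, clearing each bit with probability 1 - 1/d."""
    return [b if rng.random() < 1 / d else 0 for b in bits]


def advertise(local: list[int], neighbor_filters: list[list[int]],
              d: float, rng: random.Random) -> list[int]:
    """OR the local filter with a decayed copy of each neighbor filter."""
    result = list(local)
    for bits in neighbor_filters:
        for pos, bit in enumerate(decay(bits, d, rng)):
            result[pos] |= bit
    return result
```

With d = 2, each hop clears about half of the bits of an element, so `theta(bits, x, k) / k` is expected to be close to 1/2 one hop away and 1/4 two hops away, up to random variation and collisions with other elements.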


Multi Level Bloom Filter [Koloniari, 2004]

Depth Bloom Filter
  • Set of Bloom filters
  • Each XML path of length i is stored in DBF_i
Breadth Bloom Filter
  • Set of Bloom filters
  • Each element at level i is stored in BBF_i
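
To make the two filter families concrete, the sketch below builds both from a small XML tree. Several points are assumptions made for illustration: Python sets stand in for the Bloom filters, the tree is a nested `(tag, children)` tuple, levels are numbered from 1 at the root, and "length i" is taken to mean a path of i elements. The function name `build_filters` is hypothetical.

```python
from collections import defaultdict


def build_filters(tree):
    """Return (bbf, dbf) for a tree given as (tag, [children]).

    bbf[i] holds the elements found at level i (root = level 1).
    dbf[i] holds every downward path made of i elements.
    Sets are used as stand-ins for Bloom filters.
    """
    bbf = defaultdict(set)
    dbf = defaultdict(set)

    def walk(node, ancestors):
        tag, children = node
        chain = ancestors + [tag]
        bbf[len(chain)].add(tag)
        # Record every downward path that ends at this node.
        for start in range(len(chain)):
            sub = chain[start:]
            dbf[len(sub)].add("/".join(sub))
        for child in children:
            walk(child, chain)

    walk(tree, [])
    return dict(bbf), dict(dbf)


doc = ("article", [("title", []), ("section", [("paragraph", [])])])
bbf, dbf = build_filters(doc)
```

Here `bbf[2]` contains `title` and `section`, and `dbf[2]` contains `article/title`, `article/section`, and `section/paragraph`.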


Multi Level EDBF

Breadth EDBF
  • Exponentially decaying version of BBF
  • Additional filter (RBBF) to store elements in reverse order
Querying
  • Product of probabilities
  • BBF can answer E/* queries
  • RBBF can answer */E queries
  • Both filters together can answer E/*/E queries
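
One way to read the "product of probabilities" rule is sketched below. The slides do not give the exact scoring procedure, so this is an assumption-laden illustration: `link_score` is a hypothetical name, each filter level is replaced by a dictionary mapping an element to its θ/k value for one link, the query contains at most one `*`, and a `?` contributes a neutral factor of 1.

```python
def link_score(query: str, bbf: dict, rbbf: dict) -> float:
    """Score one link for a path query as a product of per-element probabilities.

    bbf[i][e]  : stand-in for theta(e)/k in the level-i breadth filter.
    rbbf[i][e] : same for the reverse filter, level 1 being the last element.
    """
    parts = query.split("/")
    if "*" in parts:
        star = parts.index("*")
        head, tail = parts[:star], parts[star + 1:]
    else:
        head, tail = parts, []

    score = 1.0
    # Elements before '*' are checked against BBF, counting from the root.
    for level, element in enumerate(head, start=1):
        if element == "?":
            continue
        score *= bbf.get(level, {}).get(element, 0.0)
    # Elements after '*' are checked against RBBF, counting from the end.
    for level, element in enumerate(reversed(tail), start=1):
        if element == "?":
            continue
        score *= rbbf.get(level, {}).get(element, 0.0)
    return score
```

With `bbf = {1: {"article": 0.5}}` and `rbbf = {1: {"paragraph": 0.25}}`, the query `article/*/paragraph` combines both filters and scores 0.5 × 0.25 = 0.125 for that link.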


Data clustering

Agent
  • An agent carries an indexed XML path chosen at random
  • Moves randomly on the network
  • If it finds a better node, it moves the indexed XML path to this node
  • Comparison function: number of the XML path's elements found in the filters of the node and its neighborhood
Example
  • BBF1 contains A
  • BBF2 contains B
  • RBBF1 contains C
  • RBBF2 contains B
  • Path A/B/C has a score of 4
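
The comparison function in the example above can be sketched as a simple count. This is an illustration under stated assumptions: `path_score` is a hypothetical name, sets stand in for the filters, and a single filter set is scored (how the node's filters and those of its neighborhood are combined is not specified on the slide).

```python
def path_score(path: str, bbf: dict, rbbf: dict) -> int:
    """Count how many elements of a path are found at the expected filter level.

    bbf[i]  : elements seen at level i from the root.
    rbbf[i] : elements seen at level i from the end of the path.
    """
    elements = path.split("/")
    score = 0
    for level, element in enumerate(elements, start=1):
        if element in bbf.get(level, set()):
            score += 1
    for level, element in enumerate(reversed(elements), start=1):
        if element in rbbf.get(level, set()):
            score += 1
    return score


# Filter contents taken from the slide's example.
bbf = {1: {"A"}, 2: {"B"}}
rbbf = {1: {"C"}, 2: {"B"}}
score = path_score("A/B/C", bbf, rbbf)
```

The path A/B/C matches BBF1, BBF2, RBBF1, and RBBF2, which gives the score of 4 stated in the example. An agent would compare this value between the current node and the node that holds the path.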


Experimental settings

Network topology
  • 200 nodes
  • Random graph
  • Node degree between 3 and 8
Settings
  • 260,000 XML documents from Wikipedia (1.5 GByte)
  • Filter size: 8192 (2^13), 3 filters per set (BBF1, BBF2, BBF3, RBBF1, RBBF2, RBBF3)
  • Number of hash functions: 32
  • 1000 queries generated at random


Preliminary results

[Figure. Axis labels: Filters occupation (%), Elapsed time (h), Paths indexed per node, Paths moved per node in 1 h]


Preliminary results

[Figure: Queries answered (%) versus hop count limit. Curves: SQR, no unknown element; RW, no unknown element; SQR, 1 unknown element; RW, 1 unknown element]

2-element queries
  • No unknown element: article/title, section/paragraph, ...
  • 1 unknown element: ?/abstract, article/?, ...


Preliminary results

[Figure: Queries answered (%) versus hop count limit. Curves: SQR, no unknown element; RW, no unknown element; SQR, 1 unknown element; RW, 1 unknown element]

3-element queries
  • No unknown element: article/section/paragraph, ...
  • 1 unknown element: article/?/paragraph, ?/section/paragraph, ...


Preliminary results

[Figure: Queries answered (%) versus hop count limit. Curves: SQR, RW]

Ancestor-descendant queries
  • article/*/paragraph, article/*/abstract


Conclusion

Contribution
  • Routing of approximate XML path queries
  • Data clustering of path indexes
Experiments
  • Improved routing performance compared to random walk
  • Good performance with rare elements
Future work
  • Larger network
  • Information replication
  • Take element attributes into account


Acknowledgements

This research was supported by Region Bretagne.


References

Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970.

Clip2. The Gnutella protocol specification v0.4, 2002.

G. Koloniari and E. Pitoura. Content-based routing of path queries in peer-to-peer systems. In Proceedings of the EDBT'04 International Conference, Heraklion, Crete, Greece, 2004.

Abhishek Kumar, Jun Xu, and Ellen W. Zegura. Efficient and scalable query routing for unstructured peer-to-peer networks. In Proc. of IEEE Infocom, 2005.
