Project AutoMate

Squid: Decentralized Discovery Service

  • C. Schmidt, The AutoMate Group

The Applied Software Systems Laboratory Rutgers, The State University of New Jersey http://automate.rutgers.edu

CAIP Autonomic Computing Workshop, June 2003

Outline

  • Introduction
  • Related Work
  • Design
  • Evaluation
  • Ongoing work

Motivation

  • The need for information discovery in large, decentralized, distributed resource sharing environments, in the absence of global knowledge of naming conventions
  • Examples:
    – P2P Document Sharing Systems
    – Grid Resource Discovery
    – Web Service Discovery
    – Collaboration

Overview

  • Squid is a Peer-to-Peer (P2P) indexing and information discovery system
  • Supports decentralized information discovery in AutoMate
  • Supports complex queries containing partial keywords, wildcards and range queries
  • Guarantees that all existing data elements matching a query will be found with bounded cost in terms of the number of messages and nodes involved

Related Work

Information Discovery P2P Systems

  • Unstructured (Gnutella-like)
    – Unstructured overlay network, use flooding
  • Hybrid (Napster)
    – Unstructured overlay network, use centralized directories for search
  • Data-lookup (CAN, Chord, Pastry, etc.)
    – Structured overlay, Internet-scale DHT
  • Structured keyword search
    – Structured overlay, extend data-lookup protocols
    – Examples:
      • Distributed Inverted Indices
      • Space Filling Curve

Design - Overview

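[Figure: a document described by keywords (kw1, kw2, …, kwD) is a point in the D-dimensional keyword space; the SFC maps this space to a 1-dimensional index space, which is distributed over the peers (P1, P2, …, Pk, …).]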

The keyword space

  • Documents have assigned keywords

[Figure, left: 2-dimensional keyword space for a P2P sharing system, showing a document placed by the keywords "computer" and "network".]

[Figure, right: 3-dimensional keyword space for storing computational resources, using the attributes: storage space, base bandwidth and cost.]

Hilbert Space-Filling Curve (SFC)

  • f: N^d → N, recursive generation
  • Properties:
    – Digital causality
    – Locality preserving
    – Clustering

[Figure: successive refinements of the Hilbert curve in two dimensions, with cells labeled by 2-bit indices (00 to 11) and then 4-bit indices (0000 to 1111).]
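As a rough illustration of the mapping f for d = 2, the sketch below uses the widely known bitwise construction of the Hilbert curve. The function name `hilbert_index` and the `order` parameter (bits per coordinate) are assumptions made for this example; the slides do not show Squid's own implementation, and the orientation of the curve in the figure may differ.

```python
def hilbert_index(order, x, y):
    """Map cell (x, y) of a 2^order x 2^order grid to its position on the curve."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)   # which quadrant at this level
        if ry == 0:                    # rotate/reflect so the sub-curve lines up
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


# First-order curve: four cells visited in a "U" shape.
for cell in [(0, 0), (0, 1), (1, 1), (1, 0)]:
    print(cell, format(hilbert_index(1, *cell), "02b"))
```

Two of the listed properties can be checked directly on this sketch: consecutive index values always belong to neighboring cells (locality), and the index of a cell at a coarser level is a prefix of the indices of the cells it contains (digital causality).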

Using SFC to generate the index space

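  • The d-dimensional keyword space is mapped to a 1-dimensional index space using the SFC

[Figure: the SFC traversing the 2-dimensional keyword space that contains the document with keywords "computer" and "network".]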

The overlay network

  • Use Chord as the overlay network

[Figure: overlay network with 5 nodes and an identifier space from 0 to 64; node identifiers visible in the figure: 13, 29, 40, 51.]

Each node stores the keys that map to the segment of the curve between itself and the predecessor node. Cost to look up data: O(log₂ N), where N is the number of nodes.
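The ownership rule can be sketched as a successor lookup on a sorted list of node identifiers. This is only an illustration with global knowledge of the ring: real Chord nodes reach the same answer through finger tables in a logarithmic number of hops. The function name `successor` is an assumption, and only the four identifiers visible on the slide are used.

```python
from bisect import bisect_left


def successor(node_ids, key):
    """Return the node responsible for `key`: the first ID >= key, wrapping around."""
    ids = sorted(node_ids)
    i = bisect_left(ids, key)
    return ids[i % len(ids)]


nodes = [13, 29, 40, 51]  # identifiers visible on the slide
for key in (5, 13, 14, 45, 60):
    print(key, "->", successor(nodes, key))
# 5 -> 13, 13 -> 13, 14 -> 29, 45 -> 51, 60 -> 13 (wraps around the ring)
```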

The Query Engine

  • Query: combination of keywords, partial keywords, wildcards, ranges
  • Examples:
    – (computer, network)
    – (computer, net*)
    – (comp*, *)
    – (256-512MB, *, 10Mbps-*), over the attributes (memory, cost, base bandwidth)
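One way to see why partial keywords and wildcards fit this scheme is that each query term can be turned into an interval along its axis of the keyword space. The sketch below assumes a hypothetical encoding (lowercase letters read as base-27 digits, truncated or padded to `WIDTH` characters); the slides do not specify how Squid encodes keywords.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET) + 1   # digit 0 is reserved for padding
WIDTH = 4                  # hypothetical: characters kept per keyword


def encode(word):
    """Keyword -> integer coordinate (letters a-z only, first WIDTH characters)."""
    digits = [ALPHABET.index(c) + 1 for c in word.lower()[:WIDTH]]
    digits += [0] * (WIDTH - len(digits))
    value = 0
    for d in digits:
        value = value * BASE + d
    return value


def term_range(term):
    """Query term -> inclusive coordinate interval (lo, hi) on one axis."""
    if term == "*":
        return 0, BASE ** WIDTH - 1
    if term.endswith("*"):
        prefix = term[:-1][:WIDTH]
        lo = encode(prefix)
        return lo, lo + BASE ** (WIDTH - len(prefix)) - 1
    value = encode(term)
    return value, value


# The query (co*, *) becomes a rectangle in the 2-dimensional keyword space.
print([term_range(t) for t in ("co*", "*")])
```

Because keywords sharing a prefix get adjacent coordinates under such an encoding, a partial keyword becomes one contiguous interval, and a whole query becomes a rectangular region that the SFC then maps to clusters.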

Query Processing

  • Step 1: Translate the query to relevant clusters on the SFC-based index space
  • Step 2: Query the appropriate nodes in the overlay

Example: for the query (computer, *) on the overlay with nodes 13, 29, 40 and 51, the nodes 13 and 29 are queried.
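The two steps can be sketched by brute force on a small grid: enumerate the cells matched by the query, map them to curve indices, merge consecutive indices into clusters, and look up the nodes that own them. The helper names, the 8 x 8 grid and the rectangular query format are assumptions for this illustration; the node IDs are the ones visible on the slide. The query used here fixes the first coordinate to 3 (binary 011) and leaves the second free, like the query (011, *) in the illustration slides.

```python
from bisect import bisect_left


def hilbert_index(order, x, y):
    """Position of cell (x, y) on a 2-D Hilbert curve with `order` bits per axis."""
    n, d, s = 1 << order, 0, (1 << order) >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def clusters(order, x_range, y_range):
    """Step 1: index values of all matching cells, merged into consecutive runs."""
    idx = sorted(hilbert_index(order, x, y)
                 for x in range(x_range[0], x_range[1] + 1)
                 for y in range(y_range[0], y_range[1] + 1))
    runs = []
    for d in idx:
        if runs and d == runs[-1][1] + 1:
            runs[-1][1] = d
        else:
            runs.append([d, d])
    return [tuple(r) for r in runs]


def responsible_nodes(runs, node_ids):
    """Step 2: every node whose segment of the curve overlaps some cluster."""
    ids = sorted(node_ids)
    hit = set()
    for lo, hi in runs:
        hit.update(n for n in ids if lo <= n < hi)
        hit.add(ids[bisect_left(ids, hi) % len(ids)])
    return sorted(hit)


runs = clusters(3, (3, 3), (0, 7))
print(runs)                                        # [(5, 6), (9, 10), (26, 28), (31, 31)]
print(responsible_nodes(runs, [13, 29, 40, 51]))   # [13, 29, 40]
```

Enumerating every cell is only workable for a toy grid; it is shown here to make the meaning of "cluster" concrete, not as the way a real system would generate them.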

Query optimization

  • Not all clusters that are generated for a query exist in the network => optimize!
  • SFC generation is recursive => cluster generation is recursive => the process of cluster generation can be viewed as a tree
  • Optimization: embed the tree into the overlay, and prune nodes during the construction phase
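A minimal sketch of the idea, under one possible reading of the pruning rule: a cluster stops being refined as soon as its whole index range is owned by a single overlay node, since that node can resolve the rest of the query locally. The function `refine`, the rectangular query format and this stopping rule are assumptions for illustration; the slides do not give the exact criterion Squid uses.

```python
from bisect import bisect_left


def hilbert_index(order, x, y):
    """Position of cell (x, y) on a 2-D Hilbert curve with `order` bits per axis."""
    n, d, s = 1 << order, 0, (1 << order) >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def refine(order, level, cx, cy, query, node_ids, out):
    """Depth-first construction of the cluster tree for a rectangular query.

    (cx, cy) is a cell of the 2^level x 2^level grid; `query` is
    (x_lo, x_hi, y_lo, y_hi) on the finest grid; `node_ids` must be sorted.
    """
    span = order - level
    x0, y0 = cx << span, cy << span
    x1, y1 = x0 + (1 << span) - 1, y0 + (1 << span) - 1
    x_lo, x_hi, y_lo, y_hi = query
    if x1 < x_lo or x0 > x_hi or y1 < y_lo or y0 > y_hi:
        return                                  # cell misses the query region
    prefix = hilbert_index(level, cx, cy)       # index prefix of this cluster
    lo = prefix << (2 * span)
    hi = lo + (1 << (2 * span)) - 1
    single_owner = bisect_left(node_ids, lo) == bisect_left(node_ids, hi)
    if level == order or single_owner:
        out.append((lo, hi))                    # prune: stop refining here
        return
    for dx in (0, 1):
        for dy in (0, 1):
            refine(order, level + 1, 2 * cx + dx, 2 * cy + dy,
                   query, node_ids, out)


segments = []
refine(3, 0, 0, 0, (3, 3, 0, 7), [13, 29, 40, 51], segments)
print(sorted(segments))
```

A pruned segment may cover more cells than the query matches, so in this sketch the owning node is assumed to filter its stored keys against the original query before answering.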

Query optimization – illustration

Solve query: (011, *)

[Figure: the Hilbert curve refined over a 2-dimensional space with 3-bit coordinates (000 to 111) on each axis, and the cluster refinement tree for the query.]

Clusters in the refinement tree:
    – Level 1: 00, 01
    – Level 2: 0001, 0010; 0110, 0111
    – Level 3: 000101, 000110; 001001, 001010; 011010, 011011, 011100; 011111

Query optimization – illustration

Embed the leftmost tree path (solid arrows) and the rightmost path (dashed arrows) onto the overlay network topology.

[Figure: the refinement tree for the query (011, *) embedded onto an overlay with node identifiers 000000, 000100, 001001, 001111, 011110 and 111000; the tree nodes 00, 01, 0001 and 0110 are shown along the embedded paths.]

Load balancing

  • Load balancing at node join:
    – generate more than one ID for the new node, send join requests in the network and join with the ID that places the node in the most crowded part of the network
  • Load balancing at runtime:
    – run a local load balancing algorithm between neighbors (from time to time), and redistribute the load
    – use virtual nodes that can migrate to less loaded physical nodes
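The join-time technique can be sketched as follows, reading "most crowded" as "the node currently responsible for the candidate ID stores the most keys". That reading, the function names `owner` and `choose_join_id`, and the sample keys are assumptions for this example; in practice the candidate IDs would be generated by the joining node and the loads reported back by the contacted nodes.

```python
from bisect import bisect_left


def owner(ids, key):
    """Node responsible for `key` on a sorted ring of node IDs."""
    return ids[bisect_left(ids, key) % len(ids)]


def choose_join_id(node_ids, keys, candidates):
    """Pick the candidate ID that falls in the most heavily loaded segment."""
    ids = sorted(node_ids)
    load = {n: 0 for n in ids}
    for k in keys:
        load[owner(ids, k)] += 1
    return max(candidates, key=lambda c: load[owner(ids, c)])


nodes = [13, 29, 40, 51]
keys = [5, 30, 31, 32, 33, 34, 45]        # hypothetical stored keys
print(choose_join_id(nodes, keys, [10, 35, 45]))   # 35: node 40 holds five keys
```

Joining at 35 would split the crowded segment (29, 40] between the new node and node 40, which is the effect the technique aims for.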

Experimental evaluation

  • 1000 to 5400 nodes
  • Up to 10^6 keys (unique keyword combinations)
  • Metrics:
    – Number of routing nodes
    – Number of processing nodes
    – Number of data nodes
    – Number of messages
  • Query types:
    – Q1: (computer, *), (comp*, *, *)
    – Q2: (comp*, net*), (computer, network, *)
    – Q3: range queries

2D keyword space – Q1 and Q2 queries

  • System size increases from 1000 to 5400 nodes, keys from 2*10^5 to 10^6

3D keyword space – Q1 and Q2 queries

3D keyword space – range queries

Load balancing

[Plot: The distribution of the keys in the index space. The index space was partitioned into 5000 intervals. X-axis: the index space (intervals); Y-axis: number of keys per interval.]

[Plot: The distribution of the keys when using only the load balancing at node join technique. X-axis: nodes in the system; Y-axis: number of keys.]

[Plot: The distribution of the keys when using both the load balancing at node join technique and the local load balancing. X-axis: nodes in the system; Y-axis: number of keys.]

Ongoing work

  • Tests with a 5-dimensional keyword space
  • Develop new methods to further prune the clusters that do not exist in the network
  • Implement the actual system on top of the Chord lookup system

Future work

  • Ranking
  • New overlay topology
  • Replication and caching
