
Peer-to-peer systems and overlay networks - PowerPoint PPT Presentation



  1. Complex Adaptive Systems
C.d.L. Informatica – Università di Bologna
Peer-to-peer systems and overlay networks
Fabio Picconi
Dipartimento di Scienze dell'Informazione

Outline
• Introduction to P2P systems
• Common topologies
• Data location
• Churn
• Newscast algorithm
• Security

Peer-to-peer vs. client-server
[Figure: a client-server network next to a peer-to-peer network]
client-server
• Server well connected to the "center" of the Internet
• Servers carry out critical tasks
• Clients only talk to the server
peer-to-peer
• Only nodes located on the "periphery" of the Internet
• Tasks distributed across all nodes
• Clients talk to other clients

Example – Video sharing
Client-server: YouTube
[Figure: one uploader and several downloaders, all connected to a central server]
Advantages
• Client can disconnect after upload
• Uploader needs little bandwidth
• Other users can find the file easily (just use search on server webpage)
Disadvantages
• Server may not accept the file, or may remove it later (according to content policy)
• Whole system depends on the server (what if it is shut down like Napster?)
• Server storage and bandwidth are expensive!

  2. Example – Video sharing
Peer-to-peer: BitTorrent
[Figure: one seeder and several downloaders exchanging data with each other]
Advantages
• Does not depend on a central server
• Bandwidth shared across nodes (downloaders also act as uploaders)
• High scalability, low cost
Disadvantages
• Seeder must remain on-line to guarantee file availability
• Content is more difficult to find (downloaders must find .torrent file)
• Freeloaders cheat in order to download without uploading

Comparison: P2P vs. client-server
Client-server
• Asymmetric: clients and servers carry out different tasks
• Global knowledge: servers have a global view of the network
• Centralization: communications and management are centralized
• Single point of failure: a server failure brings down the system
• Limited scalability: servers easily overloaded
• Expensive: server storage and bandwidth capacity is not cheap
Peer-to-peer
• Symmetric: each node carries out the same tasks
• Local knowledge: nodes only know a small set of other nodes
• Decentralization: nodes must self-organize in a decentralized way
• Robustness: several nodes may fail with little or no impact
• High scalability: high aggregate capacity, load distribution
• Low-cost: storage and bandwidth are contributed by users

Characterizing peer-to-peer systems
The main characteristics of P2P systems are:
• decentralization (i.e., no central server)
• self-organization (e.g., adding new nodes and removing disconnected ones)
• symmetric communications (e.g., peers act as clients and servers)
• scalability (thanks to high aggregate capacity and load distribution)
• shared ownership (i.e., storage and bandwidth are contributed by peers)
• overlay construction and routing (i.e., nodes form a logical network on top of the underlying IP network)
[Figure: a message from one peer to another is sent through the underlying IP network]

P2P environment
P2P systems are deployed in a challenging environment:
• High latency and low bandwidth between nodes
  - a high hop count will result in a high end-to-end latency
  - transferring large files may take a long time
• Churn
  - nodes may disconnect temporarily
  - new nodes are constantly joining the system, while others leave the overlay permanently
• Security
  - P2P clients run on machines under full control of their users
  - data sent to other nodes may be erased, corrupted, disclosed, etc.
  - malicious users may try to bring down the system (e.g., routing attack)
• Selfishness
  - users may run hacked P2P clients in order to avoid contributing resources

  3. Problems
Some of the problems that a P2P systems designer must face:
• Overlay construction and maintenance
  - maintain a given overlay topology (e.g., random, two-level, ring, etc.)
• Data location
  - locate a given data object among a large number of nodes
• Data dissemination
  - propagate data in an efficient and robust manner
• Per-node state
  - keep the amount of state per node small
• Tolerance to churn
  - maintain system invariants (e.g., topology, data location, data availability) despite node arrivals and departures

Topology
Some common topologies:
• Flat unstructured: a node can connect to any other node
  - only constraint: maximum degree d_max
  - fast join procedure
  - usually very tolerant to churn
  - good for data dissemination, bad for location
• Two-level unstructured: nodes connect to a supernode
  - supernodes form a small overlay
  - used for indexing and forwarding
  - large state and high load on supernodes
• Flat structured: constraints based on node ids
  - allows for efficient data location
  - constraints require long join and leave procedures
  - less robust in high-churn environments

Data location - Flooding
Problem: find the set of nodes S that store a copy of object O
Solutions:
(1) Flooding: send a search message to all nodes [first Gnutella protocol]
• A search message contains either keywords or an object id
Advantages:
  - simplicity
  - no topology constraints
Disadvantages:
  - high network overhead (huge traffic generated by each search request)
  - flooding stopped by TTL (which produces search horizon)
  - only applicable to small number of nodes

Data location - Flooding
(1) Flooding (cont.)
Flooding in a flat unstructured network:
[Figure: a search spreading from the source node, with the search horizon for TTL = 2 marked]
Objects that lie outside of the horizon are not found
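The TTL-limited flooding described on the flooding slides can be sketched roughly as follows. This is a minimal illustration, not the Gnutella protocol itself: the function name `flood_search`, the adjacency-list graph `GRAPH` and the `STORED` map are all hypothetical, and the sketch assumes every node forwards a request to all of its neighbours, with duplicates dropped by the receiver.

```python
from collections import deque

# Hypothetical overlay (adjacency lists) and object placement, for illustration only.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}
STORED = {"D": {"obj"}, "E": {"obj"}}


def flood_search(graph, stored, source, obj, ttl):
    """Flood a search for obj from source; return (holders found, messages sent)."""
    visited = {source}
    queue = deque([(source, ttl)])
    found = set()
    messages = 0
    while queue:
        node, remaining = queue.popleft()
        if obj in stored.get(node, ()):
            found.add(node)
        if remaining == 0:
            continue  # search horizon reached: do not forward any further
        for neighbor in graph[node]:
            messages += 1  # every forwarded copy costs one message
            if neighbor not in visited:  # receivers drop duplicate requests
                visited.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return found, messages


holders, cost = flood_search(GRAPH, STORED, "A", "obj", ttl=2)
print(holders, cost)
```

With TTL = 2 the copy on node E lies outside the horizon and is missed, while raising the TTL to 3 finds it at the cost of more messages, which is the trade-off the slides point out.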

  4. Data location - Superpeers
(2) Two-level overlay: use superpeers to index the locations of an object [eMule, Gnutella 2, BitTorrent]
• Each node connects to a superpeer and advertises the list of objects it stores
• Search requests are sent to the superpeer, which forwards them to other superpeers
Advantages:
  - highly scalable
Disadvantages:
  - superpeers must be reliable, powerful and well connected to the Internet (expensive)
  - superpeers must maintain large state
  - the system relies on a small number of superpeers

Data location - Superpeers
(2) Two-level overlay (cont.)
[Figure: a node sends a request to its superpeer and receives a response pointing to the node that stores obj]
• A two-level overlay is a partially centralized system
• In some systems superpeers do not connect to each other (e.g., BitTorrent)

Data location - KBR
(3) Structured networks: use a routing algorithm that implements Key-Based Routing [Overnet, Kad, BitTorrent trackerless]
Key-Based Routing (also known as Distributed Hash Tables, or DHTs) works as follows:
• each node is given a unique node identifier, or nodeid
• given a key k, the node whose nodeid is numerically closest to k among all nodes in the network is known as the root of key k
• given a routing key k, a KBR algorithm can route a message to the root of k in a small number of hops, usually O(log N)
• the location of an object with id objectid is tracked by the root of k = objectid
• thus, one can find the location of an object by routing a message to the root of k = objectid and querying the root for the location of the object

Data location - KBR (cont.)
Key-Based Routing [Pastry]
route(k=8955,msg)
Source node id: 04F2
k = object id: 8955
[Figure: overlay address space [0000,FFFF] with nodes 04F2, E25A, C52A, 3A79, AC78, 5230, 620F, 8821, 85E0, 8909, 8954, 8957; obj 8955 is stored on nodes 620F and C52A]

Hop #   Hop id   Shared prefix length
0       04F2     0
1       85E0     1
2       8909     2
3       8957     3
4       8954     3   (root of k)

Object 8955 is tracked by node 8954, which knows of two copies stored at nodes 620F and C52A
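The Pastry routing example on the slides can be checked with a short sketch. The helper names `shared_prefix_len` and `root_of` are hypothetical, the node list is read off the slide's figure, and "numerically closest" is taken here as plain absolute difference between hexadecimal ids, which is a simplifying assumption (it ignores any wrap-around of the id space).

```python
# Node ids as they appear in the slide's figure (4 hexadecimal digits each).
NODES = ["04F2", "E25A", "C52A", "3A79", "AC78", "5230",
         "620F", "8821", "85E0", "8909", "8954", "8957"]

# Hops listed in the slide's table for route(k=8955, msg).
ROUTE = ["04F2", "85E0", "8909", "8957", "8954"]


def shared_prefix_len(a, b):
    """Number of leading digits that two ids have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def root_of(key, nodeids):
    """Node whose id is numerically closest to key (assumption: no wrap-around)."""
    k = int(key, 16)
    return min(nodeids, key=lambda n: abs(int(n, 16) - k))


for hop, nodeid in enumerate(ROUTE):
    print(hop, nodeid, shared_prefix_len(nodeid, "8955"))
print("root of 8955:", root_of("8955", NODES))
```

Under these assumptions the shared prefix never gets shorter from one hop to the next, and the last hop (8954) is the root because it is numerically closer to 8955 than 8957 is, even though both share a 3-digit prefix with the key.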

  5. Data location - KBR (cont.)
Routing table for node 4F28 [Pastry]
Node id: 4F28

Routing table (used to find next hop with longer shared prefix)
02A3   19BA   2F34   …   E129   F0A4
409A   413C   4288   …   4E01   N/A
4F04   4F1B   N/A    …   4FF5
4F21   ...

Leaf Set (used to find the nodeid closest to a key that is close to the local nodeid)
4F04   4F1B   4F21   -   4F30   4F55   4FF5

• In this example the routing table size is 4 x 15 = 60 entries, for a maximum network size of N = 65536 nodes.
• The average route length in this case is 4 hops.

Data location - KBR (cont.)
(3) Structured networks (cont.)
Advantages:
  - completely decentralized (no need for superpeers)
  - routing algorithm achieves low hop count for large network sizes
Disadvantages:
  - each object must be tracked by a different node
  - objects are tracked by unreliable nodes (i.e., which may disconnect)
  - keyword-based searches are more difficult to implement than with superpeers (because objects are located by their objectid)
  - the overlay must be structured according to a given topology in order to achieve a low hop count
  - routing tables must be updated every time a node joins or leaves the overlay

Data location - Loosely structured overlays
(4) Loosely structured networks: use hints on the location of objects [Freenet]
• Nodes locate objects by sending search requests containing the object id
• Requests are propagated using a technique similar to flooding
• Objects with similar identifiers are grouped on the same nodes
[Figure: nodes A to F holding objects AE5J, 5B20 and AF02; a request for AE5J travels from A, and the response leaves hints such as "AE5J: B" and "AE5J: D" on the way back]

Data location - Loosely structured overlays
(4) Loosely structured networks (cont.)
• A search response leaves routing hints on the path back to the source
• Hints are used when propagating future requests for similar object ids
[Figure: a later request for AF02 is forwarded using the hints left for AE5J; hint tables shown include "AE5J: B", "AE5J: D", "5B20: E" and "AE5J: F"]
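The hint mechanism of loosely structured overlays can be sketched as below. This is only an illustration under stated assumptions, not Freenet's actual algorithm: the functions `similarity`, `leave_hints` and `next_hop` are hypothetical, "similar" object ids are assumed to mean ids with a common prefix, and the path A, B, D used in the example is a guess at what the slide's figure depicts.

```python
def similarity(a, b):
    """Hypothetical closeness metric: length of the common prefix of two ids."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def leave_hints(tables, path, objid):
    """Record hints as the response travels back along path = [source, ..., holder].

    Each node on the path learns which neighbour leads toward the holder.
    """
    for node, toward_holder in zip(path, path[1:]):
        tables.setdefault(node, {})[objid] = toward_holder


def next_hop(hints, objid):
    """Neighbour hinted for the most similar known id, or None if no hint helps."""
    best = max(hints, key=lambda h: similarity(h, objid), default=None)
    if best is None or similarity(best, objid) == 0:
        return None  # caller would fall back to flooding-style propagation
    return hints[best]


tables = {}
leave_hints(tables, ["A", "B", "D"], "AE5J")  # assumed path of the first request
print(tables)
print(next_hop(tables["A"], "AF02"))  # reuses the AE5J hint for a similar id
```

In this sketch a later request for AF02 at node A follows the hint left for AE5J because the two ids share a prefix, whereas a request for an unrelated id such as 5B20 finds no useful hint and would have to be propagated as in flooding.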
