Peer-to-peer systems and Data location overlay networks Churn - - PowerPoint PPT Presentation

peer to peer systems and
SMART_READER_LITE
LIVE PREVIEW

Peer-to-peer systems and Data location overlay networks Churn - - PowerPoint PPT Presentation

Outline Complex Adaptive Systems C.d.L. Informatica Universit di Bologna Introduction to P2P systems Common topologies Peer-to-peer systems and Data location overlay networks Churn Newscast algorithm Fabio


slide-1
SLIDE 1

1

Complex Adaptive Systems

C.d.L. Informatica – Università di Bologna

Peer-to-peer systems and

  • verlay networks

Fabio Picconi Dipartimento di Scienze dell’Informazione 2

  • Introduction to P2P systems
  • Common topologies
  • Data location
  • Churn
  • Newscast algorithm
  • Security

Outline

3

Peer-to-peer vs. client-server

client-server

  • Server well connected to the

“center” of the Internet

  • Servers carries out critical tasks
  • Clients only talk to server
  • Only nodes located on the

“periphery” of the Internet

  • Tasks distributed across all nodes
  • Clients talk to other clients

peer-to-peer

4

Example – Video sharing

Client-server: YouTube

client-server

Advantages

  • Client can disconnect after upload
  • Uploader needs little bandwidth
  • Other users can find the file easily

(just use search on server webpage)

Disadvantages

  • Server may not accept file or

remove it later (according to content policy)

  • Whole system depends on the server

(what if shut down like Napster?)

  • Server storage and bandwidth

are expensive!

uploader downloader downloader downloader downloader downloader

slide-2
SLIDE 2

5

Example – Video sharing

Peer-to-peer: BitTorrent

peer-to-peer

Advantages

  • Does not depend on a central server
  • Bandwidth shared across nodes

(downloaders also act as uploaders)

  • High scalability, low cost

Disadvantages

  • Seeder must remain on-line to

guarantee file availability

  • Content is more difficult to find

(downloaders must find .torrent file)

  • Freeloaders cheat in order to

download without uploading

seeder downloader downloader downloader downloader downloader

6

Comparison: P2P vs. client-server

Client-server

  • Asymmetric: client and servers

carry out different tasks

  • Global knowledge: servers have

a global view of the network

  • Centralization: communications

and management are centralized

  • Single point of failure: a server

failure brings down the system

  • Limited scalability: servers

easily overloaded

  • Expensive: server storage and

bandwidth capacity is not cheap

Peer-to-peer

  • Symmetric: each node carries out

the same tasks

  • Local knowledge: nodes only

know a small set of other nodes

  • Decentralization: nodes must self-
  • rganize in a decentralized way
  • Robustness: several nodes may fail

with little or no impact

  • High scalability: high aggregate

capacity, load distribution

  • Low-cost: storage and bandwidth

are contributed by users

7

Characterizing peer-to-peer systems

The main characteristics of P2P systems are:

  • decentralization (i.e., no central server)
  • self-organization (e.g., adding new nodes and removing disconnected ones)
  • symmetric communications (e.g., peers act as clients and servers)
  • scalability (thanks to high aggregate capacity and load distribution)
  • shared ownership (i.e., storage and bandwidth are contributed by peers)
  • overlay construction and routing (i.e., nodes form a logical network on

top of the underlying IP network)

a message from one peer to another is sent through the underlying IP network

8

P2P environment

P2P systems are deployed in a challenging environment:

  • High latency and low bandwidth between nodes
  • a high hop count will result in a high end-to-end latency
  • transferring large files may take a long time
  • Churn
  • nodes may disconnect temporarily
  • new nodes are constantly joining the system, while others leave the
  • verlay permanently
  • Security
  • P2P clients run on machines under full control of their users
  • data sent to other nodes may be erased, corrupted, disclosed, etc.
  • malicious users may try to bring down the system (e.g., routing attack)
  • Selfishness
  • users may run hacked P2P clients in order to avoid contributing resources
slide-3
SLIDE 3

9

Problems

Some of the problems that a P2P systems designer must face:

  • Overlay construction and maintenance
  • maintain a given overlay topology (e.g., random, two-level, ring, etc.)
  • Data location
  • locate a given data object among a large number of nodes
  • Data dissemination
  • propagate data in an efficient and robust manner
  • Per-node state
  • keep the amount of state per node small
  • Tolerance to churn
  • maintain system invariants (e.g., topology, data location, data availability)

despite node arrivals and departures

10

Topology

Some common topologies:

  • Flat unstructured: a node can connect to any other node
  • only constraint: maximum degree dmax
  • fast join procedure
  • usually very tolerant to churn
  • good for data dissemination, bad for location
  • Two-level unstructured: nodes connect to a supernode
  • supernodes form a small overlay
  • used for indexing and forwarding
  • large state and high load on supernodes
  • Flat structured: constraints based on node ids
  • allows for efficient data location
  • constraints require long join and leave procedures
  • less robust in high-churn environments

11

Data location - Flooding

Problem: find the set of nodes S that store a copy of object O Solutions:

(1) Flooding: send a search message to all nodes [first Gnutella protocol]

  • A search message contains either keywords or an object id

Advantages:

  • simplicity
  • no topology constraints

Disadvantages:

  • high network overhead (huge traffic generated by each search request)
  • flooding stopped by TTL (which produces search horizon)
  • only applicable to small number of nodes

12

Data location - Flooding

(1) Flooding (cont.) Flooding in a flat unstructured network:

search horizon for TTL = 2

  • bj

Objects that lie outside of the horizon are not found

search

slide-4
SLIDE 4

13

Data location - Superpeers

(2) Two-level overlay: use superpeers to index the locations of an object [eMule, Gnutella 2, BitTorrent]

  • Each node connects to a superpeer and advertises the list of objects it stores
  • Search requests are sent to the superpeer, which forwards them to other

superpeers Advantages:

  • highly scalable

Disadvantages:

  • superpeers must be realiable, powerful and well connected to the

Internet (expensive)

  • superpeers must maintain large state
  • the system relies on a small number of superpeers

14

Data location - Superpeers

(2) Two-level overlay (cont.)

  • A two-level overlay is a partially centralized system
  • In some systems superpeers do not connect to each other (e.g., BitTorrent)

response request

  • bj

15

Data location - KBR

(3) Structured networks: use a routing algorithm that implements Key-Based Routing [Overnet, Kad, BitTorrent trackerless] Key-Based Routing (also known as Distributed Hash Tables, or DHTs) works as follows:

  • each node is given a unique node identifier, or nodeid
  • given a key k, the node whose nodeid is numerically closest to k

among all nodes in the network is known as the root of key k

  • given a routing key k, a KBR algorithm can route a message to the

root of k in a small number of hops, usually O(log N)

  • the location of an object with id objectid is tracked by the root of

k = objectid

  • thus, one can find the location of an object by routing a message to the

root of k = objectid and querying the root for the location of the object

16

Data location - KBR (cont.)

Key-Based Routing [Pastry]

Source node id: 04F2 k = object id: 8955 Hop # Hop id Shared prefix length 04F2 1 85E0 1 2 8909 2 3 8957 3 4 8954 3 (root of k)

04F2 3A79 5230 8909 8954 8957 AC78 C52A E25A route(k=8955,msg)

  • verlay address space

[0000,FFFF]

620F 85E0

  • bj8955

stored on nodes 620F,C52A 8821

  • bj

8955

  • bj

8955

Object 8955 is tracked by node 8954, which knows of two copies stored at nodes 620F and C52A

slide-5
SLIDE 5

17

Data location - KBR (cont.)

Routing table for node 4F28 [Pastry]

Routing table

02A3 409A 4F04 19BA 413C 4F1B 4F21 2F34 4288 N/A E129 4E01 F0A4 N/A 4FF5 … … … ...

Leaf Set

4F04 4F1B 4F21

  • 4F30

4F55 4FF5

Node id: 4F28

used to find next hop with longer shared prefix used to find the nodeid closest to a key that is close to the local nodeid

  • In this example the routing table size is 4 x 15 = 60 entries, for a

maximum network size of N = 65536 nodes.

  • The average route length in this case is 4 hops.

18

Data location - KBR (cont.)

(3) Structured networks (cont.) Advantages:

  • completely decentralized (no need for superpeers)
  • routing algorithm achieves low hop count for large network sizes

Disadvantages:

  • each object must be tracked by a different node
  • objects are tracked by unreliable nodes (i.e., which may disconnect)
  • keyword-based searches are more difficult to implement than

with superpeers (because objects are located by their objectid)

  • the overlay must be structured according to a given topology

in order to achieve a low hop count

  • routing tables must be updated every time a node joins or leaves the
  • verlay

19

Data location - Loosely structured overlays

(4) Loosely structured networks: use hints on the location of objects [Freenet]

  • Nodes locate objects by sending search requests containing the object id
  • Requests are propagated using a technique similar to flooding
  • Objects with similar identifiers are grouped on the same nodes

AE5J A B C D E F 5B20 request for AE5J response AF02

20

Data location - Loosely structured overlays

(4) Loosely structured networks (cont.)

  • A search response leaves routing hints on the path back to the source
  • Hints are used when propagating future requests for similar object ids

AE5J A B C D E F Hints AE5J: D 5B20: E 5B20 Hints AE5J: F Hints AE5J: B AF02 request for AF02

slide-6
SLIDE 6

21

Data location - Loosely structured overlays

(4) Loosely structured networks (cont.) Advantages:

  • no topology constraints, flat architecture
  • searches are more efficient than with plain flooding

Disadvantages:

  • does not support keyword-based searches
  • search requests have a TTL
  • contrary to structured overlays, loosely structured overlay do not

guarantee a low number of hops, nor that the object will be found

22

The location schemes described previously can be classified according to:

  • degree of structure
  • decentralization

Freenet

  • loosely

structured Kad Gnutella decentralized

  • eMule

Gnutella 2 BitTorrent partially centralized structured unstructured

Data location - Summary

23

Churn (node arrivals and departures) can have several effects on a P2P system:

  • data objects may be become unavailable if all replicas disconnect
  • routing tables may become inconsistent

(e.g., entries may point to nodes which have disconnected)

  • the overlay may become partitioned if several nodes suddenly disconnect:

Effects of churn

24

Node arrivals and departures must not disrupt the normal behavior of the system

  • system invariants must be maintained
  • connected overlay (i.e., no partitions), low average path length
  • data objects accessible from anywhere in the network
  • two types of churn tolerance:
  • dynamic repair: ability to react to changes in the overlay to maintain

system invariants (e.g., heal partitions)

  • static resilience: ability to continue operating correctly before adaptation
  • ccurs (e.g., route messages through alternate paths)

Churn tolerance

slide-7
SLIDE 7

25

An simple way to prevent partitions is to increase the node degree

Churn – Preventing partitions

Ring partitions can be avoided by keeping a list of successor nodes

26

Percentage of failed paths for various numbers of successor nodes

Churn – Static resilience (Ring topology)

1 successor 16 successors 48 successors

More successors reduce the chances of “opening” the ring

27

Percentage of failed paths for various overlay topologies

Churn – Static resilience (various topologies)

Some topologies provide more alternate paths between nodes than others

28

  • Dynamically update routing information to adapt to overlay changes
  • Two types of repair algorithms:
  • reactive: start maintenance procedure immediately after detection
  • periodic: execute maintenance procedure periodically

A reactive algorithm can bring down the system instead of repairing it:

  • Each node disconnection triggers a maintenance procedure on a set of nodes
  • Over a given churn rate, the maintenance traffic congests the network
  • The network congestion leads to nodes being considered as disconnected
  • This triggers even more maintenance procedures (positive feedback),

eventually bringing down the network This effect may be avoided using a periodic maintenance algorithm

Churn – Dynamic repair (Ring topology)

slide-8
SLIDE 8

29

Churn is an important issue in P2P overlays:

  • Data may become unavailable, and routing information outdated
  • Static resilience
  • depends on the topology (i.e., the number of alternate paths)
  • increases with the average node degree
  • Dynamic repair protocols must be carefully designed
  • reactive protocols are usually faster
  • periodic protocols can handle higher churn rates

Churn – Summary

30

Newscast

  • Peer-to-peer protocol that creates and maintains an unstructured overlay
  • Highly resilient to churn
  • Can be used to propagate information
  • Extremely simple design based on information gossip:
  • Each node only knows about a small set of other knows (view)
  • Each node periodically picks a random node from this set
  • Both nodes exchange their views and update them
  • The random view exchange makes the algorithm very robust to failures and

changes in the overlay

31

Newscast - View exchange

  • Each node maintains a view v containing c entries, where

entry = {node address, timestamp}

  • Each node executes the following code every T seconds:
  • 1. select random entry r from local view
  • 2. send local view plus an entry with the local address to node r
  • 3. retrieve view of node r and merge it with local view
  • 4. keep the c entries with the most recent timestamps
  • The view of a node changes on each round

32

Newscast - View exchange (cont.)

Example view exchange initiated by node A

B,10 D,5 E,3 X,8 S,12 W,2 view of node A (c = 6) D A E X B W S

slide-9
SLIDE 9

33

Newscast - View exchange (cont.)

A,15 B,10 D,5 E,3 X,8 S,12 W,2 E,15 J,14 C,9 D,8 H,10 L,14 Z,2 view of node E A E

  • 1. Select random node

B,10 D,5 E,3 X,8 S,12 W,2 view of node A selected node

  • 2. Exchange views (plus local entry) with selected node

view of node A

34

Newscast - View exchange (cont.)

E,15 J,14 L,14 S,12 H,10 B,10 C,9 X,8 D,8 D,5 E,3 W,2 Z,2

  • 3. Merge views, and order

by timestamp

merge result

  • 4. Keep c most recent entries

E,15 J,14 L,14 S,12 H,10 B,10 new view

  • f node A

(c = 6)

35

Newscast - Average path length

  • Newscast overlays have a low average path length, i.e., O(log N)

(average number of hops between any two nodes)

36

Newscast - Static resilience

  • The overlay shows high static resilience
slide-10
SLIDE 10

37

Newscast - Summary

  • Simple peer-to-peer algorithm
  • Nodes use only local information
  • Periodic peer-wise data exchanges
  • Emergent properties:
  • low average path length
  • resistance to high churn

38

Security

Security in peer-to-peer systems is hard to enforce:

  • Users have full control on their computers
  • Modified clients may not follow the standard protocol
  • Communications may be eavesdropped
  • Data may be corrupted
  • Private data stored on remote computers may be disclosed

39

Security - Weak identities

  • The user may leave the system and rejoin it with a new identity (i.e, user id)
  • If an attack is detected, the attacker can reenter the system with a new id
  • An attacker may create a large number of false identities (Sybil attack)

S2 A S3 S4 S1 S6 S5

Example of Sybil Attack:

  • Nodes S1 to S6 are actually 6

instances of the P2P client running on the same machine

  • The attacker can intercept all

traffic coming from or going to node A

40

Security - Strong identities

  • The user cannot change its identity
  • Solution: use a centralized, trusted Certification Authority (CA)
  • Each new user must obtain an identity certificate:

certificate = { user id, IP address, user’s public key, signatureCA }

  • The certificate is digitally signed by the CA, whose public key is

known by all users

  • A certificate cannot be forged (would require the CA’s private key)
  • To prove his identity, a user signs a message with his private key,

and attaches the corresponding certificate signed by the CA

  • Strong identities prevent Sybil Attacks
  • If an attacker is caught, it cannot easily rejoin the system
slide-11
SLIDE 11

41

Security – Weak vs. strong identities

  • Strong identities requires a centralized CA
  • new nodes must contact the CA before joining the network:
  • the CA response may be slow (a few days)
  • if the CA is unavailable, new nodes cannot join the system
  • the security of the system depends on the CA:
  • the CA must correctly verify the identity of the requester
  • the CA’s private key must be kept secret
  • Many P2P systems use weak identities
  • IP address already gives some identity information
  • Some systems ensure anonymity (e.g., FreeNet)

42

The end

You can get these slides from http://www.cs.unibo.it/~picconi/slides For any questions, mail me to: picconi@cs.unibo.it