Distributed File Systems: An Overview of Peer-to-Peer Architectures

Distributed File Systems
• Data is distributed among many sources
  – Ex. distributed database systems
  – Frequently utilize a centralized lookup server for addressing
• Completely distributed approach
  – No centralized services or information

Peer-to-Peer Systems
• What is a P2P system?
  – A network of nodes with equivalent capabilities and responsibilities
• Common examples
  – Gnutella
  – Freenet

Peer-to-Peer Systems (cont.)
• P2P is a convenient paradigm that can serve various applications (file sharing being the most popular).
• Problem with P2P systems: locating the particular node that stores the requested data

Node Location: Napster
• Napster uses a central index to search for nodes holding the desired file.
• If the central server goes down, service is lost for all users.
• NOT true P2P.

Node Location: Gnutella
• Search algorithm: random
• Gnutella broadcasts file requests to other nodes.
• Broadcasting scales extremely poorly, so requests cannot be broadcast to all nodes.
• As a result, a Gnutella search may fail even though the file exists in the system.
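To make the Gnutella failure mode concrete, here is a minimal sketch of a hop-limited broadcast. It assumes the query carries a hop limit (`ttl`) and runs over a toy six-node line topology; `flood_search`, `graph`, and `has_file` are illustrative names, not part of any real Gnutella client.

```python
from collections import deque

def flood_search(graph, start, has_file, ttl):
    """Flood a query outward from `start`, at most `ttl` hops.

    Returns the nodes that were reached and hold the file.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    hits = []
    while frontier:
        node, hops = frontier.popleft()
        if node in has_file:
            hits.append(node)
        if hops == ttl:
            continue  # hop limit reached: do not forward further
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return hits

# Hypothetical topology: a line 0-1-2-3-4-5, with the file stored at node 5.
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}

print(flood_search(graph, 0, {5}, ttl=3))  # query dies before reaching node 5
print(flood_search(graph, 0, {5}, ttl=5))  # query reaches node 5
```

With the smaller hop limit the query never reaches node 5, so the search fails even though the file exists.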

Node Location: Freenet
• Search algorithm: random
• In Freenet, no particular node is responsible for a file; instead, searches look for cached copies of the file.
• Problems with Freenet: once again, existing files are not guaranteed to be retrieved, and there is no bound on search cost.

Problems
• Random search algorithms:
  – Work poorly for uncommon data
  – Have no theoretical bound on the overhead required
• How can determinism be introduced without centralized mechanisms?
  – Answer: hash functions

Hash Functions
• A hash function is (usually) a one-way function that always produces the same “key” for the same input.
• The hash functions we consider distribute their outputs uniformly over the keyspace.

Hash-Based P2P Systems
• Chord (Pastry is very similar)
  – Utilizes a ring-based overlay
• CAN
  – Utilizes a hyperspace overlay model

What Chord Can Contribute
• Chord is truly distributed: no node is more important than another.
• If the requested object exists in the system, it will definitely be found.
• Chord gives performance bounds.

Consistent Hashing
• Using consistent hashing (with SHA-1 as the base hash function), Chord maps a given key to a particular node.
• Requests for the object with a particular key are easily forwarded to the correct node.
• Consistent hashing helps provide natural load balancing.
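As a rough illustration of the hashing step, the sketch below derives an m-bit identifier from a name with SHA-1. The function name `chord_id`, the small identifier width `M = 8`, and the sample names are assumptions made for the example; a real deployment would use a much wider identifier space.

```python
import hashlib

M = 8  # identifier bits; deliberately small so the values are easy to read

def chord_id(name: str, m: int = M) -> int:
    """Hash a node address or file key onto the identifier circle [0, 2^m)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

# Node addresses and file keys share one identifier space (sample names are made up).
for name in ("10.0.0.1:4000", "10.0.0.2:4000", "song.mp3"):
    print(name, "->", chord_id(name))
```

Because the same input always yields the same identifier, any node can compute where a key belongs without consulting a central index.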

Mapping of Objects
[Figure: identifier circle; the labels recoverable from the slide are 0, 6, 2, 5, 3]
• Node identifiers are arranged in a circle. Each key is assigned to its successor: the first node whose identifier equals or follows the key on the circle.

Routing
• If every node knows its successor node, requests can be propagated along the ring.
• Problem: this may require traversing all N nodes to find the object.
• Solution: use finger tables to optimize routing.
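The key-to-node rule and the cost of successor-only routing can be sketched as follows. The ring with nodes 0, 1, and 3 in a 3-bit identifier space is a hypothetical example (it is not read off the slide's figure), and `successor` and `linear_lookup_hops` are illustrative helper names.

```python
from bisect import bisect_left

def successor(node_ids, key, m):
    """Return the first node identifier that equals or follows `key` on the circle."""
    ring = sorted(node_ids)
    key %= 2 ** m
    return ring[bisect_left(ring, key) % len(ring)]  # wrap past the largest id

def linear_lookup_hops(node_ids, start, key, m):
    """Count successor-pointer hops from `start` to the node responsible for `key`."""
    ring = sorted(node_ids)
    target = successor(ring, key, m)
    i, hops = ring.index(start), 0
    while ring[i] != target:
        i = (i + 1) % len(ring)
        hops += 1
    return hops

nodes = [0, 1, 3]  # hypothetical ring, m = 3
print([successor(nodes, k, 3) for k in (1, 2, 6)])  # prints [1, 3, 0]
print(linear_lookup_hops(nodes, start=1, key=0, m=3))  # prints 2
```

The second call walks nearly the whole ring, which is the O(N) worst case that finger tables are meant to avoid.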

Finger Tables
• Given m bits in the key/node identifiers, every node keeps a routing table with m entries (each entry holds a node identifier, IP address, and port number).
• The i-th entry in the table at node n contains the identity of the first node that succeeds n by at least $2^{i-1}$ on the identifier circle.
• If s is the i-th finger of node n, then $s = \mathrm{successor}(n + 2^{i-1})$ (arithmetic modulo $2^m$), and it is denoted by n.finger[i].node.

Finger Tables (cont.)
• The first entry of the finger table is the successor of n.
• Subsequent entries are spaced farther and farther apart.
• If a node doesn’t know the successor of a key k, it passes the request on to the node in its finger table whose ID most closely precedes k. Each such hop covers at least half of the remaining distance to the responsible node.
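The finger-table definition can be checked with a short computation. This is a sketch under the same assumptions as a toy ring: nodes 0, 1, and 3 with m = 3 are hypothetical, and `finger_table` is an illustrative name.

```python
from bisect import bisect_left

def successor(node_ids, key, m):
    """First node identifier that equals or follows `key` on the circle."""
    ring = sorted(node_ids)
    return ring[bisect_left(ring, key % (2 ** m)) % len(ring)]

def finger_table(n, node_ids, m):
    """Return (i, start, node) rows, where start = n + 2^(i-1) mod 2^m."""
    rows = []
    for i in range(1, m + 1):
        start = (n + 2 ** (i - 1)) % (2 ** m)
        rows.append((i, start, successor(node_ids, start, m)))
    return rows

nodes = [0, 1, 3]  # hypothetical ring, m = 3
for row in finger_table(0, nodes, 3):
    print(row)
# prints (1, 1, 1), (2, 2, 3), (3, 4, 0) on separate lines
```

Row 1 is node 0's immediate successor, and the starts 1, 2, 4 double each time, which is what lets a lookup skip large parts of the ring.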

Finding Successors
• To find the successor of a key, first find the key's predecessor node; the answer is that node's successor, which is the first entry of its finger table.

  n.find_successor(key)
    n' = find_predecessor(key);
    return n'.successor;

Finding Predecessors (1)
• To find the predecessor of a key, the request is passed along to the closest preceding node until the key falls within the interval between a node and its successor.

  n.find_predecessor(key)
    n' = n;
    while (key ∉ (n', n'.successor])
      n' = n'.closest_preceding_finger(key);
    return n';

Finding Predecessors (2)

  n.closest_preceding_finger(key)
    for i = m downto 1
      if (finger[i].node ∈ (n, key))
        return finger[i].node;
    return n;

Performance
• Since the distance to the successor of a key is at least halved with each step, the number of steps required for a lookup is bounded by O(log N) with high probability.
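The three lookup routines can be exercised on a small static ring. The sketch below is a single-process simulation, not a networked implementation: the node identifiers (a ten-node ring with m = 6), the helpers `in_open`, `in_half_open`, and `build_ring`, and the hop counter are all assumptions added for illustration.

```python
from bisect import bisect_left

def in_open(x, a, b):
    """True if x lies in the ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)

def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)

class Node:
    def __init__(self, ident):
        self.id = ident
        self.finger = []  # finger[0] is the successor

    @property
    def successor(self):
        return self.finger[0]

    def closest_preceding_finger(self, key):
        for f in reversed(self.finger):  # i = m downto 1
            if in_open(f.id, self.id, key):
                return f
        return self

    def find_predecessor(self, key):
        node, hops = self, 0
        while not in_half_open(key, node.id, node.successor.id):
            node = node.closest_preceding_finger(key)
            hops += 1  # one forwarded request per iteration
        return node, hops

    def find_successor(self, key):
        pred, hops = self.find_predecessor(key)
        return pred.successor, hops

def build_ring(ids, m):
    """Create nodes with fully correct finger tables (no joins simulated)."""
    ring = sorted(ids)
    nodes = {i: Node(i) for i in ring}
    for n in nodes.values():
        for k in range(m):
            start = (n.id + 2 ** k) % (2 ** m)
            n.finger.append(nodes[ring[bisect_left(ring, start) % len(ring)]])
    return nodes

nodes = build_ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56], m=6)
owner, hops = nodes[8].find_successor(54)
print(owner.id, hops)  # prints "56 2"
```

Starting at node 8, the request for key 54 is forwarded to node 42 and then node 51, whose successor 56 is responsible for the key: two forwards on a ten-node ring.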

Node Joins: Invariants
• Every node maintains the correct successor.
• For every key k, node successor(k) is responsible for k.

Node Joins: Process
• Initialize the predecessor and fingers of the new node n (to simplify joins and leaves, every node also keeps a pointer to its predecessor).
• Update the fingers and predecessors of existing nodes.
• Notify the higher-layer software so that the state associated with keys is transferred appropriately (note that n is the only node to which any transfers occur).

Concurrent Operations
• Aggressively maintaining the finger tables of all nodes is difficult with concurrent joins.
• When a single node joins, very few finger table entries need to be modified.
• To handle concurrent joins, use a stabilization protocol instead of the aggressive correctness protocol.

Stabilization Pseudocode
• Node n knows of some other node n' when joining.

  n.join(n')
    predecessor = nil;
    successor = n'.find_successor(n);

Pseudocode (cont.)
• Successors are verified periodically.

  n.stabilize()
    x = successor.predecessor;
    if (x ∈ (n, successor))
      successor = x;
    successor.notify(n);

  n.notify(n')
    if (predecessor is nil or n' ∈ (predecessor, n))
      predecessor = n';

Pseudocode (cont.)
• Finger tables are refreshed periodically.

  n.fix_fingers()
    i = random index > 1 into finger[];
    finger[i].node = find_successor(finger[i].start);
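The join, stabilize, and notify routines can be run as a small in-process simulation to watch the ring repair itself. This sketch leaves out finger tables (lookups follow successor pointers only) and runs stabilization in fixed rounds rather than on timers; the identifiers 0, 3, 1, 6 and the helper `ring_order` are assumptions for the example.

```python
def in_open(x, a, b):
    """True if x lies in the ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)

def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self  # a lone node is its own successor
        self.predecessor = None

    def find_successor(self, key):
        node = self
        while not in_half_open(key, node.id, node.successor.id):
            node = node.successor  # successor-only routing, no fingers
        return node.successor

    def join(self, known):
        self.predecessor = None
        self.successor = known.find_successor(self.id)

    def stabilize(self):
        x = self.successor.predecessor
        if x is not None and in_open(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, other):
        if self.predecessor is None or in_open(other.id, self.predecessor.id, self.id):
            self.predecessor = other

def ring_order(start):
    """Follow successor pointers once around the ring."""
    ids, node = [start.id], start.successor
    while node is not start:
        ids.append(node.id)
        node = node.successor
    return ids

first = Node(0)
members = [first]
for ident in (3, 1, 6):
    new = Node(ident)
    new.join(first)
    members.append(new)
    for _ in range(len(members)):  # a few stabilization rounds
        for node in members:
            node.stabilize()

print(ring_order(first))  # prints [0, 1, 3, 6]
```

A joining node only sets its own successor; the existing nodes learn about it through later `stabilize` and `notify` calls, which is why the rounds are needed before the ring is consistent.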

Node Failures
• For failure recovery to work, queries must still succeed while the system is stabilizing.
• For queries to succeed, nodes must have correct knowledge of their successors.
• Every node therefore keeps a list of its r nearest successors.

Quick Note on Pastry
• The Chord and Pastry designs are largely similar. The notable difference is that Pastry takes network locality into account while Chord does not.

CAN
• Uses a d-dimensional hyperspace to distribute files
• A node is assigned to a point in the hyperspace using d hash functions
• The hyperspace is divided into partitions according to node placement (placement should be uniformly distributed because of the hash functions)

CAN
[Figure: two-dimensional coordinate space with corners (0, 0), (4, 0), (0, 4), and (4, 4); one label in the space reads 7]
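A minimal sketch of the CAN mapping, assuming one salted SHA-1 hash per dimension stands in for the d hash functions and a 4 × 4 space split into four equal zones. The names `can_point`, `zone_owner`, and the zone owners A to D are hypothetical.

```python
import hashlib

def can_point(name, d=2, side=4.0):
    """Map a name to a point in [0, side)^d, using one salted hash per axis."""
    coords = []
    for axis in range(d):
        digest = hashlib.sha1(f"{axis}:{name}".encode("utf-8")).digest()
        fraction = int.from_bytes(digest[:4], "big") / 2 ** 32  # in [0, 1)
        coords.append(fraction * side)
    return tuple(coords)

def zone_owner(point, zones):
    """Return the owner whose zone (low corner inclusive, high exclusive) contains the point."""
    for owner, (low, high) in zones.items():
        if all(lo <= x < hi for lo, x, hi in zip(low, point, high)):
            return owner
    return None

# Hypothetical partition of the 4 x 4 space into four equal zones.
zones = {
    "A": ((0.0, 0.0), (2.0, 2.0)),
    "B": ((2.0, 0.0), (4.0, 2.0)),
    "C": ((0.0, 2.0), (2.0, 4.0)),
    "D": ((2.0, 2.0), (4.0, 4.0)),
}

point = can_point("song.mp3")
print(point, zone_owner(point, zones))
```

Whichever node owns the zone containing the point is responsible for the item, so locating data reduces to routing toward a coordinate.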
