CS 3700
Networks and Distributed Systems
Overlay Networks (P2P DHT via KBR FTW)
Revised 10/26/2016
Outline
❑ Consistent Hashing
❑ Structured Overlays / DHTs
Key/Value Storage Service
❑ Imagine a simple service that stores key/value pairs
   ■ Similar to memcached or redis
   ■ put(“christo”, “abc…”), then get(“christo”) returns “abc…”
❑ One server is probably fine as long as total pairs < 1M
❑ How do we scale the service as the number of pairs grows?
   ■ Add more servers and distribute the data across them
❑ Problem: how do you map keys to servers?
   ■ <“key1”, “value1”>, <“key2”, “value2”>, <“key3”, “value3”>
❑ Keep in mind, the number of servers may change (e.g. we could add a new server, or a server could crash)
❑ hash(key) % n → array index
   ■ Array (length = n) stores the pairs: <“key1”, “value1”>, <“key2”, “value2”>, <“key3”, “value3”>
❑ hash(str) % n → array index
   ■ Array (length = n) stores the IP addresses of nodes A, B, C, and D
   ■ Keys k1, k2, k3 hash to array indexes, which name the responsible nodes
❑ Problem: adding node E grows the array (length = n + 1), so nearly every key now hashes to a different node and must be moved
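To see how badly modulo placement behaves when n changes, here is a minimal sketch (my own illustration, not from the slides) that counts how many keys relocate when a fifth server is added:

    import hashlib

    def place(key: str, n: int) -> int:
        """Map a key to one of n servers with simple modulo hashing."""
        digest = hashlib.sha1(key.encode()).hexdigest()
        return int(digest, 16) % n

    keys = [f"key{i}" for i in range(1000)]

    # Compare placement with 4 servers vs. 5 servers.
    moved = sum(place(k, 4) != place(k, 5) for k in keys)
    print(f"{moved} of {len(keys)} keys moved")  # roughly 80% relocate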
❑ Alternative hashing algorithm with many beneficial characteristics:
   1. Deterministic (just like normal hashing algorithms)
   2. Balanced: given n servers, each server should get roughly 1/n of the keys
   3. Locality sensitive: if a server is added, only 1/(n+1) of the keys need to be moved
❑ Conceptually simple:
   ■ Imagine a circular number line from 0 → 1
   ■ Place the servers at random locations on the number line
   ■ Hash each key and place it at the next server on the number line
   ■ Move around the circle clockwise to find the next server
❑ Example: (hash(str) % 256) / 256 → ring location
   ■ Hashing the strings “server A” … “server D” places servers A, B, C, and D at locations on the 0 → 1 ring
   ■ Keys k1, k2, k3 hash to ring locations; each is stored at the next server clockwise
   ■ Adding server E claims only the keys between E and its predecessor; everything else stays put
❑ In practice, there is no need to implement a literal number line
   ■ Store a list of servers, sorted by their hash (floats from 0 → 1)
   ■ To put() or get() a pair, hash the key and search the list for the first server where hash(server) >= hash(key)
   ■ O(log n) search time if we keep the servers sorted (binary search over a sorted array, or a balanced tree)
   ■ O(log n) time to insert a new server into the list
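As a concrete sketch of this sorted-list approach (a minimal illustration with my own naming, not code from the course):

    import bisect
    import hashlib

    def ring_hash(s: str) -> float:
        """Hash a string to a float in [0, 1), mirroring the slides' number line."""
        return int(hashlib.sha1(s.encode()).hexdigest(), 16) / 2**160

    class ConsistentHashRing:
        def __init__(self, servers):
            # Sorted list of (hash, server) pairs; binary search gives O(log n) lookup.
            self.ring = sorted((ring_hash(s), s) for s in servers)

        def server_for(self, key: str) -> str:
            h = ring_hash(key)
            # First server whose hash is >= hash(key); wrap around past 1.0.
            i = bisect.bisect_left(self.ring, (h,))
            return self.ring[i % len(self.ring)][1]

        def add_server(self, server: str):
            bisect.insort(self.ring, (ring_hash(server), server))

    ring = ConsistentHashRing(["serverA", "serverB", "serverC", "serverD"])
    print(ring.server_for("christo"))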
❑ Problem: hashing may not result in perfect balance (1/n items per server)
❑ Solution: balance the load by hashing each server multiple times
   ■ consistent_hash(“serverA_1”) = …, consistent_hash(“serverA_2”) = …, consistent_hash(“serverA_3”) = …
   ■ Each server now owns several small arcs of the ring instead of one large one
❑ Problem: if a server fails, data may be lost
❑ Solution: replicate key/value pairs on multiple servers
   ■ e.g. consistent_hash(“key1”) = 0.4, so store k1 on its owner and on the next server(s) clockwise
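Continuing the hypothetical ConsistentHashRing sketch above, virtual server positions and replication might look like this (again my own illustration):

    class VirtualNodeRing(ConsistentHashRing):
        """Each physical server appears at several ring positions."""

        def __init__(self, servers, vnodes=3, replicas=2):
            self.replicas = replicas
            # Hash "serverA_1", "serverA_2", ... so each server owns several arcs.
            points = [(ring_hash(f"{s}_{i}"), s)
                      for s in servers for i in range(1, vnodes + 1)]
            self.ring = sorted(points)

        def servers_for(self, key):
            """Walk clockwise from hash(key), collecting `replicas` distinct servers."""
            h = ring_hash(key)
            i = bisect.bisect_left(self.ring, (h,))
            owners = []
            while len(owners) < self.replicas:
                server = self.ring[i % len(self.ring)][1]
                if server not in owners:   # skip extra positions of servers already chosen
                    owners.append(server)
                i += 1
            return owners

    ring = VirtualNodeRing(["serverA", "serverB", "serverC"])
    print(ring.servers_for("key1"))   # two distinct servers holding replicas of k1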
11
Consistent hashing is a simple, powerful tool for building distributed systems
Provides consistent, deterministic mapping between names and servers Often called locality sensitive hashing
■ Ideal algorithm for systems that need to scale up or down gracefully Many, many systems use consistent hashing
CDNs Databases: memcached, redis, Voldemort, Dynamo, Cassandra, etc. Overlay networks (more on this coming up…)
Outline
❑ Consistent Hashing
❑ Structured Overlays / DHTs
❑ Layering hides low level details from higher layers
❑ IP is a logical, point-to-point overlay
   ■ [Figure: protocol stacks on Host 1, a router, and Host 2]
❑ IP provides best-effort, point-to-point datagram service
❑ Maybe you want additional features not supported by IP, or even by TCP:
   ■ Multicast
   ■ Security
   ■ Reliable, performance-based routing
   ■ Content addressing, reliable data storage
❑ Idea: overlay an additional routing layer on top of IP that adds these features
❑ VPNs encapsulate IP packets over an IP network
   ■ Two private networks (hosts 34.67.0.1–34.67.0.4) are joined across the public Internet by gateways 74.11.0.1 and 74.11.0.2
   ■ A packet with Dest: 34.67.0.4 is wrapped in an outer header with Dest: 74.11.0.2, carried across the Internet, then unwrapped by the gateway and delivered on the private network
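A tiny sketch of the encapsulation idea (plain Python objects standing in for real IP headers; the addresses come from the figure above):

    from dataclasses import dataclass

    @dataclass
    class Packet:
        dest: str
        payload: object   # an inner Packet when tunneled, else application data

    # Inner packet, addressed within the private network...
    inner = Packet(dest="34.67.0.4", payload=b"hello")
    # ...wrapped in an outer packet addressed to the remote VPN gateway.
    outer = Packet(dest="74.11.0.2", payload=inner)

    # The gateway strips the outer header and forwards the inner packet.
    delivered = outer.payload
    assert delivered.dest == "34.67.0.4"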
❑ [Figure: the same Host 1 / router / Host 2 stacks, now with an overlay layer added above the network layer on each end host — labeled “VPN Network” in the VPN case, “P2P Overlay” for peer-to-peer networks]
❑ Function:
   ■ Provide natural, resilient routes based on keys
   ■ Enable new classes of P2P applications
❑ Key challenges:
   ■ Routing table overhead
   ■ Performance penalty vs. IP
   ■ [Figure: stack with an overlay “Network” layer between Application and Transport]
❑ Problems with unstructured (flooding-based) search:
   ■ Redundancy
   ■ Traffic overhead
   ■ What if the file is rare or far away?
❑ Without structure, it is difficult to search
   ■ Any file can be on any machine
   ■ Centralization can solve this (i.e. Napster), but we know how that ends
❑ How do you build a P2P network with structure?
   1. Give every machine and object a unique name
   2. Map from objects → machines
      ■ Looking for object A? Map(A) → X, talk to machine X
      ■ Looking for object B? Map(B) → Y, talk to machine Y
❑ Is this starting to sound familiar?
❑ A P2P file-sharing network
   ■ Peers choose random IDs on the 0 → 1 ring
   ■ Locate files by hashing their names
   ■ e.g. hash(“GoT_s03e04.mkv”) = 0.314, so the file lives at the next peer clockwise, ID 0.322
❑ Problems?
   ■ How do you know the IP addresses of arbitrary peers?
   ■ There may be millions of peers
   ■ Peers come and go at random (churn)
❑ Every machine chooses a unique, random ID
   ■ Used for routing and object location, instead of IP addresses
❑ Deterministic Key → Node mapping
   ■ Consistent hashing
   ■ Allows peer rendezvous using a common name
❑ Key-based routing (KBR)
   ■ Scalable to a network of any size N
   ■ Each node needs to know the IPs of only b * log_b(N) other nodes
   ■ Much better scalability than OSPF/RIP/BGP
   ■ Routing from node A → B takes at most log_b(N) hops
❑ Node IDs and keys come from a randomized namespace
❑ Incrementally route towards the destination ID
❑ Each node knows a small number of IDs + IPs
   ■ Each node has a routing table; forward to the neighbor with the longest prefix match
   ■ Example: a message To: ABCD is forwarded A930 → AB5F → ABC0 → ABCE, matching one more digit at each hop
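A minimal sketch of the forwarding decision (my own helper names; IDs are hex strings as in the example above):

    def prefix_len(a: str, b: str) -> int:
        """Count the leading hex digits shared by two IDs."""
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    def next_hop(key: str, neighbors: list) -> str:
        """Pick the neighbor sharing the longest prefix with the key,
        breaking ties by numeric closeness to the key."""
        return max(neighbors,
                   key=lambda n: (prefix_len(n, key),
                                  -abs(int(n, 16) - int(key, 16))))

    # Reproducing one hop from the slide: node A930 forwards a message for ABCD.
    print(next_hop("ABCD", ["AB5F", "9000", "B111"]))   # -> AB5F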
❑ Structured overlay APIs
   ■ route(key, msg): route msg to the node responsible for key
      ■ Just like sending a packet to an IP address
❑ Distributed hash table (DHT) functionality
   ■ put(key, value): store value at the node responsible for key
   ■ get(key): retrieve the stored value for key from that node
❑ Key questions:
   ■ Node ID space: what does it represent?
   ■ How do you route within the ID space?
   ■ How big are the routing tables?
   ■ How many hops to a destination (in the worst case)?
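To make the API concrete, here is a toy single-process stand-in (entirely my construction; a real overlay routes these messages between machines, and route() is normally asynchronous rather than returning a reply):

    import hashlib

    class ToyOverlay:
        """Single-process stand-in for a structured overlay."""

        def __init__(self, node_ids):
            self.nodes = {n: {} for n in node_ids}   # each node's local key/value store

        def owner(self, key):
            # Stand-in for key-based routing: the node numerically closest to hash(key).
            h = int(hashlib.sha1(key.encode()).hexdigest(), 16) % 2**16
            return min(self.nodes, key=lambda n: abs(int(n, 16) - h))

        def route(self, key, msg):
            store = self.nodes[self.owner(key)]
            if msg["op"] == "put":
                store[key] = msg["value"]
            elif msg["op"] == "get":
                return {"value": store.get(key)}

    def put(overlay, key, value):
        overlay.route(key, {"op": "put", "value": value})

    def get(overlay, key):
        return overlay.route(key, {"op": "get"})["value"]

    overlay = ToyOverlay(["1000", "0100", "1110"])
    put(overlay, "christo", b"abc...")
    print(get(overlay, "christo"))   # b'abc...'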
❑ Node IDs are numbers on a ring
   ■ 160-bit circular ID space
   ■ Node IDs chosen at random
❑ Messages for key X are routed to the live node with the longest prefix match to X
❑ Incremental prefix routing
   ■ To: 1110 matches 1XXX → 11XX → 111X → 1110
   ■ [Figure: ring of nodes 0010, 0100, 0110, 1000, 1010, 1100, 1110, with 1111|0 at the wrap-around point]
❑ Routing example: a message To: 1110 hops across the ring through nodes with progressively longer matching prefixes (0010 → 1010 → 1100 → 1111|0 → 1110)
❑ Definitions:
   ■ N is the size of the network
   ■ b is the base of the node IDs
   ■ d is the number of digits in node IDs
   ■ b^d = N
❑ If N is large, then a naïve routing table is going to be huge
   ■ Assume a flat naming space (kind of like MAC addresses)
   ■ A client knows its own ID
   ■ To send to any other node, it would need to know N − 1 other IP addresses
   ■ Suppose N = 1 billion :(
❑ Incremental prefix routing (definitions as above: b^d = N)
❑ How many neighbors at each prefix digit? b − 1
❑ How big is the routing table?
   ■ Total size: b * d
   ■ Or, equivalently: b * log_b(N)
❑ log_b(N) hops to any destination
   ■ [Figure: node 1010’s routing table picked out around the ring — one neighbor per shared-prefix length: 0011, 1110, 1000, 1011]
❑ Definitions (as before): N is the network size, b the ID base, d the digits per ID, b^d = N
❑ Routing table size is b * d:
   ■ b^d = N
   ■ d * log b = log N
   ■ d = log N / log b
   ■ d = log_b(N)
❑ Thus, the routing table has size b * log_b(N): it grows logarithmically with the size of the network
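Plugging in numbers (my example, not the slides’): a billion-node network with hexadecimal IDs needs only a tiny table:

    import math

    N, b = 10**9, 16
    d = math.ceil(math.log(N, b))   # log_16(10^9) ≈ 7.47, so d = 8 digits per ID
    table_size = b * d              # 128 routing table entries
    hops = d                        # at most 8 hops to any destination
    print(d, table_size, hops)      # compare: a flat table needs N - 1 entries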
❑ Example routing table: hexadecimal IDs (base-16), node ID = 65a1fc4
   ■ d rows (d = length of the node ID); Row 0, Row 1, Row 2, Row 3, …
   ■ Each x in the table is the IP address of a peer
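A sketch of how such a table could be grouped into rows (reusing prefix_len from the earlier sketch; simplified to one candidate per cell):

    def routing_rows(my_id, peers):
        """Row i holds peers sharing the first i digits with my_id,
        indexed by their first differing digit (Pastry-style, simplified)."""
        rows = {}
        for p in peers:
            if p == my_id:
                continue
            i = prefix_len(my_id, p)               # shared-prefix length
            rows.setdefault(i, {})[p[i]] = p       # keyed by the differing digit
        return rows

    peers = ["65a2aaa", "65a1f00", "6fedcba", "1234567"]
    print(routing_rows("65a1fc4", peers))
    # {3: {'2': '65a2aaa'}, 5: {'0': '65a1f00'}, 1: {'f': '6fedcba'}, 0: {'1': '1234567'}}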
❑ Each node has a routing table
❑ Routing table size: b * d, or equivalently b * log_b(N)
❑ Hops to any destination: log_b(N)
   ■ [Figure: the To: 1110 routing example on the ring, as before]
❑ Each node keeps an additional table of its numerically closest neighbors: the L/2 with larger IDs and the L/2 with smaller IDs (its leaf set)
❑ Uses:
   ■ Alternate routes
   ■ Fault detection (keep-alive)
   ■ Replication of data
❑ Joining the ring:
   1. Pick a new ID X (e.g. 0011)
   2. Contact an arbitrary bootstrap node
   3. Route a message to X, discover the current owner
   4. Add the new node to the ring
   5. Download routes from new neighbors, update leaf sets
❑ Leaf set members exchange periodic keep-alive messages
   ■ Handles local failures
❑ Leaf set repair: request the leaf set from the farthest node in the set
❑ Routing table repair:
   ■ Get the table from peers in row 0, then row 1, …
   ■ Periodic and lazy
❑ Mappings are deterministic in consistent hashing
   ■ Nodes can leave
   ■ Nodes can enter
   ■ Most data does not move
❑ Only local changes impact data placement
❑ Data is replicated among the leaf set
   ■ [Figure: routing a message To: 1101 on the ring as nodes come and go]
❑ High level advantages
   ■ Completely decentralized
   ■ Self-organizing
   ■ Scalable and (relatively) robust
❑ Applications
   ■ Reliable distributed storage: OceanStore (FAST’03), Mnemosyne (IPTPS’02)
   ■ Resilient anonymous communication: Cashmere (NSDI’05)
   ■ Consistent state management: Dynamo (SOSP’07)
   ■ Many, many others: multicast, spam filtering, reliable routing, email services, even distributed mutexes
❑ Example: trackerless BitTorrent
   ■ Torrent hash: 1101, so the DHT node closest to 1101 plays the role of the tracker
   ■ The initial seed registers under key 1101; leechers look up the same key to find the swarm
   ■ [Figure: ring with the tracker node, the initial seed, leechers, and the swarm]
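Continuing the toy put/get sketch from earlier (hypothetical; real BitTorrent DHTs use dedicated announce_peer/get_peers messages and store lists of peers, not a single value):

    # The initial seed registers its address under the torrent's hash...
    infohash = "1101"
    put(overlay, infohash, b"198.51.100.7:6881")

    # ...and a leecher finds the swarm by looking up the same key.
    print(get(overlay, infohash))   # b'198.51.100.7:6881'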