Algorithms and Methods for Distributed Storage Networks 10

Algorithms and Methods for Distributed Storage Networks
10 Distributed Heterogeneous Hash Tables
Christian Schindelhauer
Albert-Ludwigs-Universität Freiburg, Institut für Informatik, Rechnernetze und Telematik
Winter semester 2007/08

Literature
‣ André Brinkmann, Kay Salzwedel, Christian Scheideler, Compact, Adaptive Placement Schemes for Non-Uniform Capacities, 14th ACM Symposium on Parallelism in Algorithms and Architectures 2002 (SPAA 2002)
‣ Christian Schindelhauer, Gunnar Schomaker, Weighted Distributed Hash Tables, 17th ACM Symposium on Parallelism in Algorithms and Architectures 2005 (SPAA 2005)
‣ Christian Schindelhauer, Gunnar Schomaker, SAN Optimal Multi Parameter Access Scheme, ICN 2006, International Conference on Networking, Mauritius, April 23-26, 2006

Rechnernetze und Telematik, Distributed Storage Networks, Albert-Ludwigs-Universität Freiburg, Winter 2008/09, Christian Schindelhauer

The Uniform Problem
‣ Given
  • a dynamic set of n nodes $V = \{v_1, \ldots, v_n\}$
  • data elements $X = \{x_1, \ldots, x_m\}$
‣ Find
  • a mapping $f_V : X \to V$
‣ With the following properties
  • The mapping is simple
    - $f_V(x)$ can be computed using $V$ and $x$
    - without the knowledge of $X \setminus \{x\}$
  • Fairness: for all $u, v \in V$
    - $|f_V^{-1}(u)| \approx |f_V^{-1}(v)|$
  • Monotonicity: Let $V \subset W$
    - For all $v \in V$: $f_V^{-1}(v) \supseteq f_W^{-1}(v)$
‣ where $f_V^{-1}(v) := \{x \in X : f_V(x) = v\}$

Distributed Hash Tables: THE Solution for the Uniform Case
‣ “Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web”
  • David Karger, Eric Lehman, Tom Leighton, Matthew Levine, Daniel Lewin, Rina Panigrahy, STOC 1997
  • Present a simple solution
‣ Distributed Hash Table
  • Choose a space $M = [0, 1)$
  • Map nodes v to M via a hash function
    - $h : V \to M$
  • Map documents, like the servers, to the same interval
    - $h : X \to M$
  • Assign a document to the server which minimizes the distance in the interval
    - $f_V(x) = \arg\min_{v \in V} \, (h(x) - h(v)) \bmod 1$
    - where $x \bmod 1 := x - \lfloor x \rfloor$
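The assignment rule on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions not made in the slides: SHA-256 stands in for the random hash function h, nodes and documents are identified by strings, and the names `h` and `assign` are hypothetical.

```python
import hashlib

def h(key: str) -> float:
    """Map a string to a point in M = [0, 1).

    Assumption: SHA-256 plays the role of the 'perfect random' hash function.
    Only the top 53 bits are used so the result is an exact float below 1.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def assign(x: str, nodes: list[str]) -> str:
    """f_V(x): the node v that minimizes (h(x) - h(v)) mod 1."""
    hx = h(x)
    return min(nodes, key=lambda v: (hx - h(v)) % 1.0)

V = ["s1", "s2", "s3"]
print(assign("document-42", V))
```

Because each node owns the arc that starts at its own hash value, adding a node to V can only move documents onto the new node; no document moves between old nodes.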

The Performance of Distributed Hash Tables
‣ Theorem
  • Data elements are mapped to node i with probability $p_i = 1/|V|$ if the hash functions behave like perfect random experiments
‣ Balls into bins problem
  • Expected ratio $\max(p_i)/\min(p_i) = \Omega(\log n)$
‣ Solutions:
  • Use $O(\log n)$ copies of a node
  • Principle of multiple choices
    - check at some $O(\log n)$ positions and choose the largest empty interval for placing a node
  • Cuckoo hashing
    - every node chooses between two possible positions
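The first solution, placing several copies of each node, can be explored with a small sketch. The following Python code computes the fraction of M = [0, 1) that each node owns and the resulting ratio max(p_i)/min(p_i). It assumes SHA-256 as the hash function and a `node#copy` naming scheme for the copies; the names `shares` and `imbalance` are hypothetical and not taken from the slides.

```python
import hashlib
import math

def h(key: str) -> float:
    """Map a string to a point in [0, 1) (SHA-256 is an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def shares(nodes: list[str], copies: int) -> dict[str, float]:
    """Fraction p_v of the hash space owned by each node.

    Every node is hashed `copies` times; each copy owns the arc from its
    own position up to the next position on the ring.
    """
    points = sorted((h(f"{v}#{c}"), v) for v in nodes for c in range(copies))
    share = {v: 0.0 for v in nodes}
    for i, (pos, v) in enumerate(points):
        nxt = points[(i + 1) % len(points)][0]
        gap = (nxt - pos) % 1.0 if len(points) > 1 else 1.0
        share[v] += gap
    return share

def imbalance(share: dict[str, float]) -> float:
    """Ratio max(p_i) / min(p_i) of the largest to the smallest share."""
    return max(share.values()) / min(share.values())

nodes = [f"node{i}" for i in range(64)]
copies = math.ceil(math.log2(len(nodes)))
print("1 copy per node:", imbalance(shares(nodes, 1)))
print(copies, "copies per node:", imbalance(shares(nodes, copies)))
```

The printed ratios depend on the hash values, so the sketch is meant for experimenting with the number of copies rather than for demonstrating a fixed bound.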

The Heterogeneous Case
‣ Given
  • a dynamic set of n nodes $V = \{v_1, \ldots, v_n\}$
  • dynamic weights $w : V \to \mathbb{R}^+$
  • a dynamic set of data elements $X = \{x_1, \ldots, x_m\}$
‣ Find a mapping $f_{w,V} : X \to V$
‣ With the following properties
  • The mapping is simple
    - $f_{w,V}(x)$ can be computed using $V$, $x$, $w$ without the knowledge of $X \setminus \{x\}$
  • Fairness: for all $u, v \in V$:
    - $|f_{w,V}^{-1}(u)| / w(u) \approx |f_{w,V}^{-1}(v)| / w(v)$
  • Consistency:
    - Let $V \subset W$: for all $v \in V$:
      ✴ $f_{w,V}^{-1}(v) \supseteq f_{w,W}^{-1}(v)$
    - Let $w(v) = w'(v)$ for all $v \in V \setminus \{u\}$ and $w'(u) > w(u)$:
      ✴ for all $v \in V \setminus \{u\}$: $f_{w,V}^{-1}(v) \supseteq f_{w',V}^{-1}(v)$, and $f_{w,V}^{-1}(u) \subseteq f_{w',V}^{-1}(u)$
‣ where $f_{w,V}^{-1}(v) := \{x \in X : f_{w,V}(x) = v\}$
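The fairness and consistency conditions above can be written down as simple checks on concrete assignments. The Python sketch below is only an illustration: a mapping is represented as a dictionary from data item to node, and all function names are hypothetical.

```python
from collections import Counter

def load_per_weight(mapping: dict[str, str], weights: dict[str, float]) -> dict[str, float]:
    """|f^{-1}(v)| / w(v) for every node v; fairness asks these to be roughly equal."""
    counts = Counter(mapping.values())
    return {v: counts[v] / w for v, w in weights.items()}

def preimage(mapping: dict[str, str], v: str) -> set[str]:
    """f^{-1}(v): all data items assigned to node v."""
    return {x for x, node in mapping.items() if node == v}

def consistent_on_join(old: dict[str, str], new: dict[str, str], old_nodes: list[str]) -> bool:
    """V subset of W: every old node may only lose items when nodes are added."""
    return all(preimage(old, v) >= preimage(new, v) for v in old_nodes)

def consistent_on_weight_increase(old: dict[str, str], new: dict[str, str],
                                  nodes: list[str], u: str) -> bool:
    """w'(u) > w(u): u may only gain items, every other node may only lose items."""
    others_ok = all(preimage(old, v) >= preimage(new, v) for v in nodes if v != u)
    return others_ok and preimage(old, u) <= preimage(new, u)

before = {"x1": "a", "x2": "a", "x3": "b"}
after_join = {"x1": "a", "x2": "c", "x3": "b"}   # node c joined and took x2
print(load_per_weight(before, {"a": 2.0, "b": 1.0}))
print(consistent_on_join(before, after_join, ["a", "b"]))
```

These checks only test a given pair of assignments; they do not construct a mapping that satisfies the properties.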

Some Application Areas
‣ Proxy Caching
  • Relieving hot spots in the Internet
‣ Mobile Ad Hoc Networks
  • Relating ID and routing information
‣ Peer-to-Peer Networks
  • Finding the index data efficiently
‣ Storage Area Networks
  • Distributing the data on a set of servers

Application: Peer-to-Peer Networks
‣ Peer-to-Peer Network:
  • decentralized overlay network delivering services over the Internet
  • no client-server structure
    - example: Gnutella
‣ Problem: Lookup in first-generation networks is very slow
‣ Solution:
  • Use an efficient data structure for the links and
  • map the keys to a hash space
‣ Examples:
  • CAN
    - maps keys to a d-dimensional array
    - builds a toroidal connection network, where each peer is assigned to rectangular areas
  • Chord
    - maps keys and peers to a ring via DHT
    - establishes binary-search-like pointers on the ring

Application: Storage Area Networks (SAN)
‣ Distribute data over a set of hard disks (like RAID)
  • Nodes = hard disks
  • Data items = blocks
‣ Problem
  • Place copies of blocks for redundancy
  • If a hard disk fails, other hard disks carry the information
  • Add or remove hard disks without unnecessary data movement
  • Hard disks may have different sizes

SAN Architecture
‣ Avoid server-based architectures
  • Assignment of data is not flexible enough
  • High local storage concentration (for LAN traffic reduction)
  • Low availability of free capacity
‣ Basic SAN concept
  • Combine all available disks into a single virtual one
  • Server-independent existence of storage

Challenges in SAN
‣ Heterogeneity
  • hard disks typically differ in capacity and speed
‣ Popularity
  • some data is popular and other data is not (e.g. movies, music)
  • the popularity rank varies over time
‣ Consistency
  • the system changes by adding or replacing/moving
  • preserving a fair share rate
  • only necessary data replacements should be performed
‣ Availability
  • hard disks may fail, but data should not!
‣ Performance

Traditional Virtualization in SAN
‣ waterproof definitions

Deterministic Uniform SAN Strategies
‣ DRAID
  • distributed cluster network for uniform storage nodes
  • uses RAID: striping/mirroring and Reed-Solomon encoding
  • organized in matrix rows => scalability only in groups of column size
‣ Good old stuff
  • RAID 0, 1, 4, 5, 6 (striping, mirroring, XOR, distributed XOR, XOR + Reed-Solomon)
‣ Problems:
  • scalability and availability are hard to combine
  • re-striping (time is money), huge offset tables (lookup is expensive)
  • storage concatenation without load balancing (disks remain full)
  • only storage nodes with uniform capacities are allowed
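The XOR redundancy mentioned for RAID 4 and RAID 5 can be illustrated with a short Python sketch. It computes a parity block as the bytewise XOR of equally sized data blocks and rebuilds one lost block from the survivors. The block contents and the helper name `xor_blocks` are hypothetical, and the sketch ignores striping layout and the Reed-Solomon part of RAID 6.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, one per (hypothetical) disk, plus one parity block.
data = [b"disk", b"hash", b"node"]
parity = xor_blocks(data)

# Disk 1 fails: XOR of the surviving data blocks and the parity restores it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

The same XOR works for any single lost block, which is why one parity block tolerates exactly one disk failure in this scheme.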

The Heterogeneous Case
‣ Given
  • a dynamic set of n nodes $V = \{v_1, \ldots, v_n\}$
  • dynamic weights $w : V \to \mathbb{R}^+$
  • a dynamic set of data elements $X = \{x_1, \ldots, x_m\}$
‣ Find a mapping $f_{w,V} : X \to V$
‣ With the following properties
  • The mapping is simple
    - $f_{w,V}(x)$ can be computed using $V$, $x$, $w$
    - without the knowledge of $X \setminus \{x\}$
  • Fairness: for all $u, v \in V$:
    - $|f_{w,V}^{-1}(u)| / w(u) \approx |f_{w,V}^{-1}(v)| / w(v)$
  • Consistency:
    - minimal replacements to preserve the data distribution
‣ where $f_{w,V}^{-1}(v) := \{x \in X : f_{w,V}(x) = v\}$
[Figure: mapping from data D to servers s_1, ..., s_n]

The Naive Approach to DHT
[Figure: nodes with small (~0.1), normal (~1), and huge (~1000) shares]

SIEVE: Interval-based consistent hashing
‣ Interval-based approach
  • Brinkmann, Salzwedel, and Scheideler, SPAA 2000
‣ Map nodes to random intervals (via hash function)
  • interval length proportional to weight
‣ Map data items to random positions (via hash function)
‣ Two problems
  • What to do if intervals overlap?
  • What to do if the union of the intervals does not cover the hash space M?
[Figure: intervals for small (~0.1), normal (~1), and huge (~1000) shares, with an empty region and an overlap marked]
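The two problems named on this slide become visible in a small experiment. The Python sketch below places each node on an interval that starts at its hash value and has a length proportional to its weight, then lists every node whose interval contains the position of a data item. It does not implement how SIEVE resolves the two cases; SHA-256, the scale factor `stretch`, and the function names are assumptions made for illustration.

```python
import hashlib

def h(key: str) -> float:
    """Map a string to a point in M = [0, 1) (SHA-256 is an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def covering_nodes(x: str, weights: dict[str, float], stretch: float) -> list[str]:
    """Nodes v whose interval [h(v), h(v) + stretch * w(v)) contains h(x).

    Intervals wrap around at 1 and are capped at the full length of M.
    An empty result is the 'empty' case, more than one node the 'overlap' case.
    """
    hx = h(x)
    hits = []
    for v, w in weights.items():
        length = min(1.0, stretch * w)
        if (hx - h(v)) % 1.0 < length:
            hits.append(v)
    return hits

weights = {"small": 0.1, "normal": 1.0, "huge": 1000.0}
for item in ["block-1", "block-2", "block-3"]:
    print(item, covering_nodes(item, weights, stretch=0.0005))
```

Varying `stretch` shows the trade-off: short intervals leave more of M uncovered, long intervals produce more overlaps.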
