
Algorithms and Methods for Distributed Storage Networks 10



  1. Algorithms and Methods for Distributed Storage Networks
     10 Distributed Heterogeneous Hash Tables
     Christian Schindelhauer
     Albert-Ludwigs-Universität Freiburg, Institut für Informatik, Rechnernetze und Telematik
     Winter semester 2007/08

  2. Literature
     ‣ André Brinkmann, Kay Salzwedel, Christian Scheideler, Compact, Adaptive Placement Schemes for Non-Uniform Capacities, 14th ACM Symposium on Parallelism in Algorithms and Architectures 2002 (SPAA 2002)
     ‣ Christian Schindelhauer, Gunnar Schomaker, Weighted Distributed Hash Tables, 17th ACM Symposium on Parallelism in Algorithms and Architectures 2005 (SPAA 2005)
     ‣ Christian Schindelhauer, Gunnar Schomaker, SAN Optimal Multi Parameter Access Scheme, ICN 2006, International Conference on Networking, Mauritius, April 23-26, 2006
     Rechnernetze und Telematik Distributed Storage Networks 2 Albert-Ludwigs-Universität Freiburg Winter 2008/09 Christian Schindelhauer

  3. The Uniform Problem
     ‣ Given
       • a dynamic set of n nodes V = {v_1, ..., v_n}
       • data elements X = {x_1, ..., x_m}
     ‣ Find
       • a mapping f_V : X → V
     ‣ With the following properties
       • The mapping is simple
         - f_V(x) can be computed using V and x
         - without the knowledge of X\{x}
       • Fairness: for all u, v ∈ V
         - |f_V^{-1}(u)| ≈ |f_V^{-1}(v)|
       • Monotony: Let V ⊂ W
         - For all v ∈ V: f_V^{-1}(v) ⊇ f_W^{-1}(v)
     ‣ where f_V^{-1}(v) := {x ∈ X : f_V(x) = v}
     Figure: mapping f from data items X to nodes V

  4. Distributed Hash Tables
     THE solution for the uniform case
     ‣ "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web",
       • David Karger, Eric Lehman, Tom Leighton, Matthew Levine, Daniel Lewin, Rina Panigrahy, STOC 1997
       • Present a simple solution
     ‣ Distributed Hash Table
       • Choose a space M = [0,1[
       • Map nodes v to M via hash function
         - h : V → M
       • Map documents and servers to an interval
         - h : X → M
       • Assign a document to the server which minimizes the distance in the interval
       • f_V(x) = argmin_{v ∈ V} ((h(x) - h(v)) mod 1)
         - where x mod 1 := x - ⌊x⌋
     Figure: data items X and nodes V are mapped by the hash function into M; each data item is assigned to a node
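The assignment rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: SHA-256 is an assumed stand-in for the hash function h, and the names `h`, `assign`, `disk-a` and `block-0` are hypothetical.

```python
import hashlib


def h(key: str) -> float:
    """Map a string to M = [0, 1). SHA-256 is an assumed stand-in for h."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep the top 53 bits so the quotient is an exact float strictly below 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def assign(x: str, nodes: list[str]) -> str:
    """f_V(x): the node v in V that minimizes (h(x) - h(v)) mod 1."""
    return min(nodes, key=lambda v: (h(x) - h(v)) % 1.0)


if __name__ == "__main__":
    nodes = ["disk-a", "disk-b", "disk-c"]
    items = [f"block-{i}" for i in range(1000)]
    before = {x: assign(x, nodes) for x in items}
    after = {x: assign(x, nodes + ["disk-d"]) for x in items}
    moved = [x for x in items if before[x] != after[x]]
    # Monotony: every item that moved now lives on the newly added node.
    print(len(moved), all(after[x] == "disk-d" for x in moved))
```

Because the minimum over V plus one new node is either the old minimum or the new node, adding a node can only pull items onto the new node, which is the monotony property of the uniform problem.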

  5. The Performance of Distributed Hash Tables
     ‣ Theorem
       • Data elements are mapped to node i with probability p_i = 1/|V|, if the hash functions behave like perfect random experiments
     ‣ Balls into bins problem
       • Expected ratio max(p_i)/min(p_i) = Ω(log n)
     ‣ Solutions:
       • Use O(log n) copies of a node
       – Principle of multiple choices
         - check at some O(log n) positions and choose the largest empty interval for placing a node
       – Cuckoo hashing
         - every node chooses among two possible positions
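The first solution (several copies of each node) can be explored with a small sketch that measures how much of [0,1[ each node owns. The helper names `shares` and `imbalance`, the `node#copy` naming of copies, and the use of SHA-256 are assumptions made for illustration only.

```python
import hashlib
import math


def h(key: str) -> float:
    """Map a string to [0, 1). SHA-256 is an assumed stand-in for h."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def shares(nodes, copies=1):
    """Fraction of [0, 1) owned by each node when it is hashed `copies` times.

    A point owns the arc from its own position up to the next point,
    matching f_V(x) = argmin (h(x) - h(v)) mod 1.
    """
    points = sorted((h(f"{v}#{c}"), v) for v in nodes for c in range(copies))
    share = {v: 0.0 for v in nodes}
    if len(points) == 1:
        share[points[0][1]] = 1.0  # a single point owns the whole ring
        return share
    for i, (pos, v) in enumerate(points):
        nxt = points[(i + 1) % len(points)][0]
        share[v] += (nxt - pos) % 1.0
    return share


def imbalance(share):
    """Ratio max(p_i) / min(p_i) over all nodes."""
    return max(share.values()) / min(share.values())


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(64)]
    copies = math.ceil(math.log2(len(nodes)))
    print("1 copy:      ", imbalance(shares(nodes)))
    print(copies, "copies:", imbalance(shares(nodes, copies)))
```

The printed ratios depend on the hash values, so no particular numbers are claimed here; the sketch only makes the quantity max(p_i)/min(p_i) from the slide concrete.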

  6. The Heterogeneous Case
     ‣ Given
       • a dynamic set of n nodes V = {v_1, ..., v_n}
       • dynamic weights w : V → R+
       • dynamic set of data elements X = {x_1, ..., x_m}
     ‣ Find a mapping f_{w,V} : X → V
     ‣ With the following properties
       • The mapping is simple
         - f_{w,V}(x) can be computed using V, x, w without the knowledge of X\{x}
       • Fairness: for all u, v ∈ V:
         - |f_{w,V}^{-1}(u)|/w(u) ≈ |f_{w,V}^{-1}(v)|/w(v)
       • Consistency:
         - Let V ⊂ W: For all v ∈ V:
           ✴ f_{w,V}^{-1}(v) ⊇ f_{w,W}^{-1}(v)
         - Let for all v ∈ V\{u}: w(v) = w'(v) and w'(u) > w(u):
           ✴ for all v ∈ V\{u}: f_{w,V}^{-1}(v) ⊇ f_{w',V}^{-1}(v) and f_{w,V}^{-1}(u) ⊆ f_{w',V}^{-1}(u)
     ‣ where f_{w,V}^{-1}(v) := {x ∈ X : f_{w,V}(x) = v}
     Figure: mapping f from data items X to nodes V with weights w
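The fairness and first consistency conditions can be read as checks on a concrete placement. The following sketch assumes a placement is given as a dictionary from data item to node; the function names `normalized_loads` and `is_consistent_on_insert` are hypothetical and only restate the definitions above.

```python
from collections import Counter


def normalized_loads(assignment, weights):
    """|f^{-1}(v)| / w(v) for every node v; fairness asks these to be roughly equal."""
    counts = Counter(assignment.values())
    return {v: counts.get(v, 0) / w for v, w in weights.items()}


def is_consistent_on_insert(before, after, old_nodes):
    """First consistency condition for V ⊂ W.

    Every old node's preimage may only shrink, so an item either stays
    where it was or moves to a node that was not in V.
    """
    return all(after[x] == before[x] or after[x] not in old_nodes for x in before)


if __name__ == "__main__":
    weights = {"a": 2.0, "b": 1.0}
    placement = {"x1": "a", "x2": "a", "x3": "b"}
    print(normalized_loads(placement, weights))
```

In the usage example node a has twice the weight of node b and holds twice as many items, so both normalized loads coincide.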

  7. Some Application Areas
     ‣ Proxy Caching
       • Relieving hot spots in the Internet
     ‣ Mobile Ad Hoc Networks
       • Relating ID and routing information
     ‣ Peer-to-Peer Networks
       • Finding the index data efficiently
     ‣ Storage Area Networks
       • Distributing the data on a set of servers

  8. Application Peer-to-Peer Networks
     ‣ Peer-to-Peer Network:
       • decentralized overlay network delivering services over the Internet
       • no client-server structure
         - example: Gnutella
     ‣ Problem: Lookup in first generation networks is very slow
     ‣ Solution:
       • Use an efficient data structure for the links and
       • map the keys to a hash space
     ‣ Examples:
       – CAN
         - maps keys to a d-dimensional array
         - builds a toroidal connection network, where each peer is assigned to rectangular areas
       – Chord
         - maps keys and peers to a ring via DHT
         - establishes binary search like pointers on the ring

  9. Application Storage Area Networks (SAN)
     ‣ Distribute data over a set of hard disks (like RAID)
       • Nodes = hard disks
       • Data items = blocks
     ‣ Problem
       • Place copies of blocks for redundancy
       • If a hard disk fails, other hard disks carry the information
       • Add or remove hard disks without unnecessary data movement
       • Hard disks may have different sizes

  10. SAN Architecture
      ‣ Avoid server based architectures
        • Assignment of data is not flexible enough
        • High local storage concentration (for LAN traffic reduction)
        • Low availability of free capacity
      ‣ Basic SAN concept
        • Combine all available disks into a single virtual one
        • Server independent existence of storage

  11. Challenges in SAN
      ‣ Heterogeneity
        • hard disks typically differ in capacity and speed
      ‣ Popularity
        • some data is popular and other data is not (e.g. movies, music :-)
        • their popularity rank varies over time
      ‣ Consistency
        • system changes by adding or re-placing/moving
        • preserving a fair share rate
        • only necessary data replacements must be done
      ‣ Availability
        • hard disks may fail, but data should not!
      ‣ Performance

  12. Traditional Virtualization in SAN
      waterproof definitions

  13. Deterministic Uniform SAN Strategies
      ‣ DRAID
        • distributed cluster network for uniform storage nodes
        • uses RAID: striping/mirroring and Reed-Solomon encoding
        • organized in matrix rows => scalability only in groups of column size
      ‣ Good old stuff
        • RAID 0, I, IV, V, VI (striping, mirroring, XOR, distributed XOR, XOR + Reed-Solomon)
      ‣ Problems:
        • scalability and availability are hard to combine
        • Re-striping (time is money), huge offset tables (lookup is expensive),
        • storage concatenation without load balancing (disks remain full)
        • Only storage nodes with uniform capacities are allowed

  14. The Heterogeneous Case
      ‣ Given
        • a dynamic set of n nodes V = {v_1, ..., v_n}
        • dynamic weights w : V → R+
        • dynamic set of data elements X = {x_1, ..., x_m}
      ‣ Find a mapping f_{w,V} : X → V
      ‣ With the following properties
        • The mapping is simple
          - f_{w,V}(x) can be computed using V, x, w
          - without the knowledge of X\{x}
        • Fairness: for all u, v ∈ V:
          - |f_{w,V}^{-1}(u)|/w(u) ≈ |f_{w,V}^{-1}(v)|/w(v)
        • Consistency:
          - minimal replacements to preserve the data distribution
      ‣ where f_{w,V}^{-1}(v) := {x ∈ X : f_{w,V}(x) = v}
      Figure: mapping f_{w,s} : D → S from data D to storage nodes S = {s_1, s_2, ..., s_{n-1}, s_n}

  15. The Naive Approach to DHT
      Figure: nodes with share Small (~0.1), Normal (~1), and Huge (~1000)

  16. SIEVE: Interval based consistent hashing
      ‣ Interval based approach
        • Brinkmann, Salzwedel, and Scheideler, SPAA 2000
      ‣ Map nodes to random intervals (via hash function)
        • interval length proportional to weight
      ‣ Map data items to random positions (via hash function)
      ‣ Two problems
        • What to do if intervals overlap?
        • What to do if the union of the intervals does not cover the hash space M?
      Figure: share intervals for Small (~0.1), Normal (~1), and Huge (~1000) nodes, with "empty" and "overlap" regions marked
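The two problems named on this slide can be made visible with a small sketch that places one interval per node and lists every node whose interval contains a data item's position. The sketch does not resolve overlaps or gaps, since the slide leaves both questions open. SHA-256 as hash function, the `stretch` parameter, and the names `intervals` and `candidates` are assumptions for illustration.

```python
import hashlib


def h(key: str) -> float:
    """Map a string to M = [0, 1). SHA-256 is an assumed stand-in for h."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def intervals(weights, stretch=1.0):
    """Give each node a (start, length) interval on the ring [0, 1).

    The start is the node's hash; the length is proportional to its weight,
    scaled by the hypothetical factor `stretch` and capped at 1.
    """
    total = sum(weights.values())
    return {v: (h(v), min(1.0, stretch * w / total)) for v, w in weights.items()}


def candidates(x, ivals):
    """All nodes whose interval contains the position of data item x."""
    p = h("item:" + x)
    return [v for v, (start, length) in ivals.items() if (p - start) % 1.0 < length]


if __name__ == "__main__":
    ivals = intervals({"small": 0.1, "normal": 1.0, "huge": 1000.0})
    hits = [len(candidates(f"block-{i}", ivals)) for i in range(1000)]
    print("empty:  ", sum(1 for k in hits if k == 0))
    print("overlap:", sum(1 for k in hits if k > 1))
```

An empty candidate list corresponds to the "empty" region of the figure and a list with more than one node to the "overlap" region; a larger `stretch` trades the first problem for the second.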
