Algorithms and Methods for Distributed Storage Networks 8: Storage Virtualization and DHT

  1. Algorithms and Methods for Distributed Storage Networks 8 Storage Virtualization and DHT Christian Schindelhauer Albert-Ludwigs-Universität Freiburg Institut für Informatik Rechnernetze und Telematik Wintersemester 2007/08

  2. Overview
     ‣ Concept of Virtualization
     ‣ Storage Area Networks
       • Principles
       • Optimization
     ‣ Distributed File Systems
       • Without virtualization, e.g. Network File Systems
       • With virtualization, e.g. Google File System
     ‣ Distributed Wide Area Storage Networks
       • Distributed Hash Tables
       • Peer-to-Peer Storage
     Rechnernetze und Telematik Algorithms Theory 2 Albert-Ludwigs-Universität Freiburg Winter 2008/09 Christian Schindelhauer

  3. Concept of Virtualization
     ‣ Principle
       • A virtual storage layer handles all application accesses to the file system
       • The virtual disk partitions files into blocks and stores the blocks across several (physical) hard disks
       • Control mechanisms allow redundancy and failure repair
     ‣ Control
       • The virtualization server assigns data, e.g. blocks of files, to hard disks (address space remapping)
       • It controls the replication and redundancy strategy
       • It adds and removes storage devices
     [Figure: File, Virtual Disk, Hard Disks]
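
The address space remapping described on this slide can be illustrated with a minimal sketch. It is not taken from the lecture: the class name `VirtualDisk`, the round-robin placement rule, and the default of two replicas are all assumptions chosen only to make the idea concrete.

```python
class VirtualDisk:
    """Toy virtualization layer (hypothetical): remaps logical blocks to disks."""

    def __init__(self, num_disks, replicas=2):
        if not 1 <= replicas <= num_disks:
            raise ValueError("replicas must be between 1 and num_disks")
        self.num_disks = num_disks
        self.replicas = replicas
        # One dict per physical disk: logical block number -> stored data.
        self.disks = [dict() for _ in range(num_disks)]

    def placements(self, block_no):
        """Assumed placement rule: round-robin start disk plus its successors."""
        first = block_no % self.num_disks
        return [(first + r) % self.num_disks for r in range(self.replicas)]

    def write(self, block_no, data):
        # Every replica receives a copy of the block.
        for disk in self.placements(block_no):
            self.disks[disk][block_no] = data

    def read(self, block_no, failed=()):
        # Fall back to the next replica if a disk is marked as failed.
        for disk in self.placements(block_no):
            if disk not in failed and block_no in self.disks[disk]:
                return self.disks[disk][block_no]
        raise IOError(f"block {block_no} is unavailable")


vd = VirtualDisk(num_disks=4, replicas=2)
vd.write(5, b"payload")
print(vd.placements(5))          # disks holding block 5
print(vd.read(5, failed={1}))    # still readable from the second replica
```

The application only ever names logical block 5; which physical disks hold it, and which replica answers after a failure, is decided entirely inside the virtualization layer.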

  4. Storage Virtualization
     ‣ Capabilities
       • Replication
       • Pooling
       • Disk Management
     ‣ Advantages
       • Data migration
       • Higher availability
       • Simple maintenance
       • Scalability
     ‣ Disadvantages
       • Un-installing is time consuming
       • Compatibility and interoperability
       • Complexity of the system
     ‣ Classic Implementation
       • Host-based
         - Logical Volume Management
         - File Systems, e.g. NFS
       • Storage devices based
         - RAID
       • Network based
         - Storage Area Network
     ‣ New approaches
       • Distributed Wide Area Storage Networks
       • Distributed Hash Tables
       • Peer-to-Peer Storage

  5. Storage Area Networks
     ‣ Virtual Block Devices
       • without file system
       • connects hard disks
     ‣ Advantages
       • simpler storage administration
       • more flexible
       • servers can boot from the SAN
       • effective disaster recovery
       • allows storage replication
     ‣ Compatibility problems
       • between hard disks and virtualization server

  6. SAN Networking
     ‣ Networking
       • FCP (Fibre Channel Protocol)
         - SCSI over Fibre Channel
       • iSCSI (SCSI over TCP/IP)
       • HyperSCSI (SCSI over Ethernet)
       • ATA over Ethernet
       • Fibre Channel over Ethernet
       • iSCSI over InfiniBand
       • FCP over IP
     Source: http://en.wikipedia.org/wiki/Storage_area_network

  7. SAN File Systems
     ‣ File system for concurrent read and write operations by multiple computers
       • without conventional file locking
       • concurrent direct access to blocks by servers
     ‣ Examples
       • Veritas Cluster File System
       • Xsan
       • Global File System
       • Oracle Cluster File System
       • VMware VMFS
       • IBM General Parallel File System

  8. Distributed File Systems (without Virtualization)
     ‣ Also known as network file systems
     ‣ Support sharing of files, tapes, printers, etc.
     ‣ Allow multiple client processes on multiple hosts to read and write the same files
       • concurrency control or locking mechanisms are necessary
     ‣ Examples
       • Network File System (NFS)
       • Server Message Block (SMB), Samba
       • Apple Filing Protocol (AFP)
       • Amazon Simple Storage Service (S3)

  9. Distributed File Systems with Virtualization
     ‣ Example: Google File System
     ‣ File system on top of other file systems with built-in virtualization
       • System built from cheap standard components (with high failure rates)
       • Few large files
       • Only operations: read, create, append, delete
         - concurrent appends and reads must be handled
       • High bandwidth important
     ‣ Replication strategy
       • chunk replication
       • master replication
     [Figure: GFS architecture. The application's GFS client sends (file name, chunk index) to the GFS master, which holds the file namespace (e.g. /foo/bar, chunk 2ef0) and answers with (chunk handle, chunk locations). The client then sends (chunk handle, byte range) to a GFS chunkserver, which stores chunk data in a Linux file system. The master exchanges instructions and chunkserver state with the chunkservers.]
     [Figure 2: Write Control and Data Flow, between Client, Master, Primary Replica, Secondary Replica A and Secondary Replica B]
     Source: The Google File System, Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
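
The lookup shown in the GFS architecture figure, where the client turns a byte range into (file name, chunk index) requests before contacting the master, can be sketched as follows. The 64 MB chunk size is the value given in the GFS paper; the function names, the in-memory `master_table`, and the chunkserver names are hypothetical stand-ins for illustration, not GFS interfaces.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed chunk size of 64 MB, as in the GFS paper


def chunk_requests(offset, length):
    """Split a byte range of a file into (chunk index, start, end) pieces.

    start and end are byte positions inside the chunk, end exclusive.
    """
    pieces = []
    end = offset + length
    while offset < end:
        index = offset // CHUNK_SIZE          # which chunk the offset falls in
        start = offset % CHUNK_SIZE           # position inside that chunk
        n = min(end - offset, CHUNK_SIZE - start)
        pieces.append((index, start, start + n))
        offset += n
    return pieces


# Hypothetical master state: (file name, chunk index) -> (handle, locations).
master_table = {
    ("/foo/bar", 0): ("2ef0", ["chunkserver-a", "chunkserver-b", "chunkserver-c"]),
}


def locate(file_name, chunk_index):
    """Stand-in for the control message the client sends to the master."""
    return master_table[(file_name, chunk_index)]


for index, start, stop in chunk_requests(offset=100, length=4096):
    handle, locations = locate("/foo/bar", index)
    print(handle, locations[0], (start, stop))
```

Only the small control message goes through the master; the data itself would then be requested from one of the returned chunkservers using the chunk handle and byte range.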

  10. Distributed Wide Area Storage Networks
     ‣ Distributed Hash Tables
       • Relieving hot spots in the Internet
       • Caching strategies for web servers
     ‣ Peer-to-Peer Networks
       • Distributed file lookup and download in overlay networks
       • Most (or the best) of them use DHTs

  11. WWW Load Balancing
     ‣ Web surfing:
       • Web servers offer web pages
       • Web clients request web pages
     ‣ Most of the time these requests are independent
     ‣ Requests use resources of the web servers
       • bandwidth
       • computation time
     [Figure: clients Arne, Christian and Stefan requesting pages from www.apple.de, www.uni-freiburg.de and www.google.com]

  12. Load
     ‣ Some web servers always have high load
       • for permanently high loads, servers must be sufficiently powerful
     ‣ Some suffer under high fluctuations
       • e.g. special events:
         - jpl.nasa.gov (Mars mission)
         - cnn.com (terrorist attack)
       • Extending the servers for the worst case is not reasonable
       • Serving the requests is nevertheless desired
     [Figure: load of www.google.com on Monday, Tuesday and Wednesday]

  13. Load Balancing in the WWW
     ‣ Fluctuations target some servers
     ‣ (Commercial) solution
       • Service providers offer exchange servers
       • Many requests will be distributed among these servers
     ‣ But how?
     [Figure: load of servers A and B on Monday, Tuesday and Wednesday]

  14. Literature
     ‣ Leighton, Lewin, et al. STOC 97
       • Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
     ‣ Used by Akamai (founded 1997)

  15. Start Situation
     ‣ Without load balancing
     ‣ Advantage
       • simple
     ‣ Disadvantage
       • servers must be designed for worst case situations
     [Figure: Web-Clients send requests to the Web-Server, which returns web pages]

  16. Site Caching
     ‣ The whole web site is copied to different web caches
     ‣ Browsers send their requests to the web server
     ‣ The web server redirects requests to a web cache
     ‣ The web cache delivers the web pages
     ‣ Advantage:
       • good load balancing
     ‣ Disadvantage:
       • bottleneck: redirect
       • large overhead for complete web-site replication
     [Figure: Web-Clients, Web-Server, redirect, Web-Cache]

  17. Proxy Caching
     ‣ Each web page is distributed to a few web caches
     ‣ Only the first request is sent to the web server
     ‣ Links refer to pages in the web cache
     ‣ Then the web client surfs within the web cache
     ‣ Advantage:
       • No bottleneck
     ‣ Disadvantages:
       • Load balancing only implicit
       • High requirements for placements
     [Figure: Web-Client, Web-Server, Web-Cache, with redirect, link and request in steps 1 to 4]

  18. Requirements
     ‣ Balance
       • fair balancing of web pages
     ‣ Dynamics
       • efficient insert and delete of web-cache servers and files
     ‣ Views
       • Web clients "see" different sets of web caches

  19. Hash Functions
     ‣ Set of Items:
     ‣ Set of Buckets:
     ‣ Example:
     [Figure: Items mapped to Buckets]
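
As a rough illustration of a hash function from items to buckets, and of why a plain modulo scheme fits the "Dynamics" requirement poorly, consider the sketch below. The slide's own example did not survive extraction, so this is an assumed stand-in: SHA-1 from Python's `hashlib` serves as the hash, and the page names and bucket counts are made up.

```python
import hashlib


def h(item):
    """Hash an item name to a large integer (SHA-1 is an arbitrary choice)."""
    digest = hashlib.sha1(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def bucket_of(item, num_buckets):
    """Classic hashing: the bucket is the hash value modulo the bucket count."""
    return h(item) % num_buckets


items = [f"page{i}.html" for i in range(1000)]  # made-up web pages

# Adding a fifth cache changes the modulus, so most items change buckets.
moved = sum(bucket_of(x, 4) != bucket_of(x, 5) for x in items)
print(f"{moved} of {len(items)} items change bucket when going from 4 to 5 buckets")
```

With a well-mixing hash, roughly four out of five items are expected to move here, which means that almost the whole cache content would have to be reshuffled whenever a cache server is inserted or deleted.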

  20. Ranged Hash Functions
     ‣ Given:
       • Items, Number
       • Caches (Buckets), Bucket set:
       • Views
     ‣ Ranged Hash Function:
       •
       • Prerequisite: for all views
     [Figure: Buckets, View, Items]
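
The formulas on this slide were lost, so the following sketch relies on the definition from the consistent hashing paper cited on the Literature slide: a ranged hash function takes a view (a subset of the buckets) together with an item and returns a bucket from that view. The code shows one common way to realize this, with items and caches hashed onto a ring. The function names, the use of SHA-1, and the 20 ring points per cache are assumptions, not details from the lecture.

```python
import bisect
import hashlib


def point(key):
    """Map a string to a position in [0, 1) on the ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def ranged_hash(view, item, copies=20):
    """Return the cache in `view` responsible for `item`.

    Each cache is placed at `copies` ring positions; the item goes to the
    cache owning the first position at or after the item's own position.
    """
    if not view:
        raise ValueError("view must contain at least one cache")
    ring = sorted((point(f"{cache}#{c}"), cache) for cache in view for c in range(copies))
    pos = bisect.bisect_left(ring, (point(item), ""))
    return ring[pos % len(ring)][1]  # wrap around at the end of the ring


view = {"cache-a", "cache-b", "cache-c", "cache-d"}
smaller_view = view - {"cache-d"}
pages = [f"page{i}.html" for i in range(1000)]

# Only pages that were stored on the removed cache get a new location.
moved = sum(ranged_hash(view, p) != ranged_hash(smaller_view, p) for p in pages)
print(f"{moved} of {len(pages)} pages move when cache-d leaves the view")
```

Two properties can be checked directly: the result always lies in the given view, and removing a cache relocates only the pages that cache was holding. Rebuilding the ring on every call keeps the sketch short; a real cache would keep the sorted ring and update it incrementally.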
