
Petal and Frangipani - PowerPoint PPT Presentation

Petal and Frangipani Petal and Frangipani Petal/Frangipani Petal/Frangipani NFS NFS NAS NAS Frangipani Frangipani SAN SAN Pet al Pet al Petal/Frangipani Petal/Frangipani Unt rust ed NFS NFS OS-agnost ic FS


  1. Petal and Frangipani

  2. Petal/Frangipani: [Diagram: NFS clients over a "NAS" interface vs. Frangipani over a "SAN" interface provided by Petal]

  3. Petal/Frangipani. Frangipani: untrusted NFS clients, OS-agnostic FS semantics, sharing/coordination. Petal: disk aggregation ("bricks"), filesystem-agnostic, recovery and reconfiguration, load balancing, chained declustering, snapshots; does not control sharing. Each "cloud" may resize or reconfigure independently. What indirection is required to make this happen, and where is it?

  4. Remaining Slides. The following slides have been borrowed from the Petal and Frangipani presentations, which were available on the Web until Compaq SRC dissolved. This material is owned by Ed Lee, Chandu Thekkath, and the other authors of the work. The Frangipani material is still available through Chandu Thekkath's site at www.thekkath.org. For CPS 212, several issues are important:
     • Understand the role of each layer in the previous slides, and the strengths and limitations of each layer as a basis for innovating behind its interface (NAS/SAN).
     • Understand the concepts of virtual disks and a cluster file system embodied in Petal and Frangipani.
     • Understand the similarities/differences between Petal and the other reconfigurable cluster service work we have studied: DDS and Porcupine.
     • Understand how the features of Petal simplify the design of a scalable cluster file system (Frangipani) above it.
     • Understand the nature, purpose, and role of the three key design elements added for Frangipani: leased locks, a write-ownership consistent caching protocol, and server logging for recovery.

  5. Petal: Distributed Virtual Disks. Edward K. Lee and Chandramohan A. Thekkath, Systems Research Center, Digital Equipment Corporation. 10/24/2002

  6. Logical System View: [Diagram: AdvFS, NTFS, PC FS, and UFS hosts connected by a scalable network to Petal, which exports virtual disks /dev/vdisk1 through /dev/vdisk5]

  7. Physical System View: [Diagram: a parallel database or cluster file system accesses /dev/shared1 over a scalable network served by four Petal servers]

  8. Virtual Disks: Each virtual disk provides a 2^64-byte address space. Disks are created and destroyed on demand, and disk storage is allocated on demand. Snapshots via copy-on-write. Online incremental reconfiguration.
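The copy-on-write snapshot idea on this slide can be sketched as follows. This is a toy illustration, not Petal's implementation: the class and method names are invented, and a real snapshot would freeze on-disk translation maps rather than an in-memory dict.

```python
# Toy sketch of snapshot-by-copy-on-write (illustrative names, not Petal's API).
class VirtualDisk:
    def __init__(self):
        self.blocks = {}        # sparse map: storage allocated on demand
        self.snapshots = []     # each snapshot is a frozen block map

    def write(self, offset, data):
        self.blocks[offset] = data

    def read(self, offset):
        return self.blocks.get(offset, b"\x00")  # unallocated reads as zero

    def snapshot(self):
        # Copy only the (small) block map; block contents are shared.
        snap = dict(self.blocks)
        self.snapshots.append(snap)
        return snap

vd = VirtualDisk()
vd.write(0, b"v1")
snap = vd.snapshot()
vd.write(0, b"v2")              # later writes do not disturb the snapshot
assert snap[0] == b"v1" and vd.read(0) == b"v2"
```

The key property, as in Petal, is that taking a snapshot is cheap (no block data is copied) and subsequent writes go to new locations.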

  9. Virtual to Physical Translation: (vdiskID, offset) → (server, disk, diskOffset). [Diagram: the virtual disk directory maps vdiskID to a GMap; the GMap selects one of Server 0 through Server 3, whose PMap yields (disk, diskOffset)]
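The two-level lookup on this slide can be sketched as a pair of functions. The placement policy below (round-robin by block, hash-based disk choice) is a stand-in I chose for illustration; Petal's real GMap and PMap are persistent, replicated data structures.

```python
# Illustrative two-level translation: GMap picks the server, the
# server's PMap picks (disk, diskOffset). Policies here are invented.
NUM_SERVERS = 4
BLOCK = 64 * 1024           # assumed translation granularity

def gmap(vdisk_id, offset):
    # Global map: deterministically pick the serving node for this block.
    return (offset // BLOCK) % NUM_SERVERS

def pmap(server, vdisk_id, offset):
    # Per-server physical map: translate to (disk, diskOffset).
    disk = hash((vdisk_id, offset)) % 8   # toy placement across 8 disks
    return disk, offset % (4 * 2**30)

def translate(vdisk_id, offset):
    server = gmap(vdisk_id, offset)
    disk, disk_off = pmap(server, vdisk_id, offset)
    return server, disk, disk_off

server, disk, disk_off = translate(5, 3 * BLOCK)
assert server == 3            # block 3 lands on server 3 under this policy
```

The indirection through the GMap is what lets Petal move data between servers during reconfiguration without clients noticing.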

  10. Global State Management: Based on Leslie Lamport's Paxos algorithm. Global state is replicated across all servers and remains consistent in the face of server and network failures. A majority is needed to update global state. Any server can be added or removed in the presence of failed servers.
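The "majority is needed" rule rests on quorum intersection, which the snippet below checks by brute force. This illustrates the property Paxos relies on; it is not Paxos itself.

```python
# Quorum intersection: any two majorities of the same server set overlap,
# so a new majority always contains a server that saw the last update.
from itertools import combinations

servers = range(5)
majorities = [set(c) for k in (3, 4, 5) for c in combinations(servers, k)]

# Every pair of majorities shares at least one server.
assert all(a & b for a in majorities for b in majorities)
```

This is why an update acknowledged by a majority survives any minority of failed servers.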

  11. Fault-Tolerant Global Operations: Create/delete virtual disks. Snapshot virtual disks. Add/remove servers. Reconfigure virtual disks.

  12. Data Placement & Redundancy: Supports non-redundant and chained-declustered virtual disks; parity can be supported if desired. Chained declustering tolerates any single component failure and many common multiple failures. Throughput scales linearly with additional servers and degrades gracefully with failures.

  13. Chained Declustering:
      Server0  Server1  Server2  Server3
      D0       D1       D2       D3
      D3       D0       D1       D2
      D4       D5       D6       D7
      D7       D4       D5       D6

  14. Chained Declustering: (repeats the previous slide's placement table)
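The placement pattern in the table above can be written as a small function: block i's primary copy lives on server i mod N, and its secondary copy on the next server in the chain. The function names are mine, not Petal's.

```python
# Chained-declustered placement: each block's two copies sit on
# neighboring servers in a logical chain.
N = 4  # number of servers, matching the slide

def replicas(block):
    primary = block % N
    secondary = (primary + 1) % N
    return primary, secondary

def readable_after_failure(block, failed):
    # Any single server failure leaves at least one live copy of every block.
    return any(s != failed for s in replicas(block))

# Reproduce the slide's rows: D0..D3 primaries on servers 0..3,
# secondaries shifted by one position.
assert [replicas(b) for b in range(4)] == [(0, 1), (1, 2), (2, 3), (3, 0)]
assert all(readable_after_failure(b, failed=2) for b in range(8))
```

Because each server shares its two copies with two different neighbors, read load shed by a failed server can be spread around the chain rather than doubling on one mirror, which is why throughput degrades gracefully.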

  15. The Prototype: Digital ATM network (155 Mbit/s per link). 8 AlphaStation Model 600 (333 MHz Alpha running Digital Unix). 72 RZ29 disks (4.3 GB, 3.5 inch, fast SCSI at 10 MB/s; 9 ms avg. seek, 6 MB/s sustained transfer rate). Unix kernel device driver; user-level Petal servers.

  16. The Prototype: [Diagram: clients src-ss1 through src-ss8, each mounting /dev/vdisk1, connected over the Digital ATM Network (AN2) to servers petal1 through petal8]

  17. Throughput Scaling: [Plot: throughput scale-up vs. number of servers (0 to 8) for 512B, 8KB, and 64KB reads and writes, compared against linear scale-up]

  18. Virtual Disk Reconfiguration: [Plot: throughput in MB/s vs. elapsed time in minutes while reconfiguring from 6 to 8 servers; virtual disk with 1 GB of allocated storage, 8KB reads & writes]

  19. Frangipani: A Scalable Distributed File System. C. A. Thekkath, T. Mann, and E. K. Lee, Systems Research Center, Digital Equipment Corporation

  20. Why Not an Old File System on Petal? Traditional file systems (e.g., UFS, AdvFS) cannot share a block device, and the machine that runs the file system can become a bottleneck.

  21. Frangipani: Behaves like a local file system: multiple machines cooperatively manage a Petal disk, and users on any machine see a consistent view of data. Exhibits good performance, scaling, and load balancing. Easy to administer.

  22. Ease of Administration: Frangipani machines are modular and can be added and deleted transparently. A common free space pool means users don't have to be moved. Automatic recovery from crashes. Consistent backup without halting the system.

  23. Components of Frangipani: A file system core that implements the Digital Unix vnode interface, uses the Digital Unix Unified Buffer Cache, and exploits Petal's large virtual space. Locks with leases. A write-ahead redo log.

  24. Locks: Multiple reader/single writer. Locks are moderately coarse-grained: each protects an entire file or directory. Dirty data is written to disk before a lock is given to another machine. Each machine aggressively caches locks and uses lease timeouts for lock recovery.
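A leased multiple-reader/single-writer lock along the lines of this slide can be sketched as below. This is a simplification I wrote for illustration: the 30-second lease, the class name, and the in-memory state are all assumptions, and the real lock service is a separate distributed component.

```python
# Minimal sketch of a leased reader/writer lock (illustrative only).
import time

LEASE = 30.0  # assumed lease length in seconds

class LeasedLock:
    def __init__(self):
        self.writer = None
        self.readers = set()
        self.expiry = 0.0          # when the current holders' lease ends

    def _expired(self, now):
        return now >= self.expiry

    def acquire_write(self, holder, now=None):
        now = time.monotonic() if now is None else now
        if self.writer not in (None, holder) and not self._expired(now):
            return False           # held elsewhere: wait or revoke
        if self.readers and not self._expired(now):
            return False
        self.writer, self.readers = holder, set()
        self.expiry = now + LEASE
        return True

    def acquire_read(self, holder, now=None):
        now = time.monotonic() if now is None else now
        if self.writer is not None and not self._expired(now):
            return False
        self.writer = None
        self.readers.add(holder)
        self.expiry = now + LEASE
        return True

lock = LeasedLock()
assert lock.acquire_write("A", now=0.0)
assert not lock.acquire_read("B", now=10.0)  # A's lease still valid
assert lock.acquire_read("B", now=40.0)      # A's lease expired
```

The lease timeout is what makes recovery possible: if a lock holder crashes without releasing, its lock simply expires instead of being held forever.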

  25. Logging: Frangipani uses a write-ahead redo log for metadata; log records are kept on Petal. Data is written to Petal on sync, fsync, or every 30 seconds, and on lock revocation or when the log wraps. Each machine has a separate log, which reduces contention and allows independent recovery.
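The write-ahead discipline on this slide can be shown in a few lines: append the redo record first, apply the update second, and replay the log to repair a crash. The record format and names below are invented for illustration; real Frangipani log records describe on-disk metadata changes and live on Petal.

```python
# Toy write-ahead redo logging and replay (illustrative record format).
class Server:
    def __init__(self):
        self.log = []            # would live on Petal, not in memory
        self.metadata = {}

    def update(self, key, value):
        self.log.append((key, value))   # 1. append the redo record first
        self.metadata[key] = value      # 2. then apply the change in place

def recover(log):
    # Any machine can replay another machine's log, since logs are on Petal.
    metadata = {}
    for key, value in log:
        metadata[key] = value           # redo records are safely re-applied
    return metadata

s = Server()
s.update("inode7", "len=4096")
s.update("inode7", "len=8192")
assert recover(s.log) == {"inode7": "len=8192"}
```

Keeping one log per machine, as the slide notes, means machines never contend on a shared log tail and a crashed machine's log can be replayed without touching the others.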

  26. Recovery: Recovery is initiated by the lock service and can be carried out on any machine, since the log is distributed and available via Petal.
