

  1. The Google File System - Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. Presented by Farnaz Farahanipad (1001134035)

  2. Overview • Introduction • Design overview • GFS structure • System interaction • Master operation • Questions and answers • Conclusion

  3. Introduction: Distributed file systems • What is a distributed file system? • A DFS is any file system that allows access to files from multiple hosts over a computer network.

  4. Introduction • Why not use an existing file system? • Bottleneck problems • Balancing issues • Different workload and design properties • GFS is designed for Google's applications and workloads

  5. Google File System • GFS is a scalable distributed file system for large, data-intensive applications. • GFS has a master-slave architecture. • It shares many goals with previous distributed file systems, such as performance, scalability, reliability, and availability.

  6. GFS design assumptions • High component failure rates • "Modest" number of huge files – just a few million large files • Files are write-once and mostly appended to • Large streaming reads • High sustained throughput favored over low latency

  7. GFS design decisions • Files are stored as chunks • Fixed chunk size of 64 MB • Single master • Simple centralized management • Reliability through replication • Each chunk is replicated across 3 or more chunk servers • No data caching • Little benefit due to the large size of data sets • Familiar interface, but a customized API • Suited to Google's applications • Adds snapshot and record-append operations

  8. GFS architecture

  9. Client • Interacts with the master for metadata operations: • The client translates a byte offset in the file into a chunk index within the file • Sends the master a request with the file name and chunk index • Caches the reply, using the file name and chunk index as the key • Interacts with chunk servers directly for read/write operations
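A minimal sketch of this client-side bookkeeping, assuming a fixed 64 MB chunk size and a cache keyed by (file name, chunk index); the GFSClient class and master.lookup call are illustrative names, not GFS's actual interfaces.

    # Sketch: translate a byte offset into a chunk index, cache the master's reply,
    # and contact the master only on a cache miss.
    CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB fixed chunk size

    class GFSClient:
        def __init__(self, master):
            self.master = master      # hypothetical handle for metadata RPCs
            self.cache = {}           # (filename, chunk_index) -> chunk locations

        def locate(self, filename, offset):
            chunk_index = offset // CHUNK_SIZE          # offset -> chunk index
            key = (filename, chunk_index)
            if key not in self.cache:                   # ask the master only on a miss
                self.cache[key] = self.master.lookup(filename, chunk_index)
            return self.cache[key]                      # reads/writes then go to chunk servers directly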

  10. Chunk servers • Chunk servers are the workers of GFS. • Responsible for storing 64-MB file chunks. • Each chunk replica is stored on a chunk server and is extended only as needed.

  11. Why a large chunk size? • The size of the metadata is reduced • Involvement of the master is reduced • Network overhead is reduced • Lazy space allocation avoids internal fragmentation

  12. Reliability • What if a chunk server goes down? • GFS copies every chunk multiple times and stores the replicas on different chunk servers.

  13. Single-master weakness: single point of failure • What if the master goes down? • GFS solution: a shadow master (provides read-only access while the primary master is down)

  14. Single-master weakness: scalability bottleneck • How is the bottleneck problem solved? • GFS solution: minimize master involvement • Never move data through the master; use it only for metadata • Large chunk size means less metadata • Data mutations are carried out by the chunk servers

  15. Master • The master maintains all file system metadata. • Periodically communicates with chunk servers • Gives instructions, collects state • Chunk creation, re-replication, rebalancing • Garbage collection • Simpler and more reliable • Lazily garbage-collects hidden files

  16. Master: metadata • Global metadata is stored on the master • File and chunk namespaces • Mapping from files to chunks • Locations of each chunk's replicas • All in memory (about 64 bytes per chunk) • Fast • Easy to access
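A back-of-envelope check of why this fits in one machine's memory, using the 64 MB chunk size and the roughly 64 bytes of metadata per chunk quoted above; the 1 PB total is an assumed example, not a figure from the slides.

    # Rough estimate of the master's in-memory metadata footprint.
    chunk_size = 64 * 2**20            # 64 MB per chunk
    metadata_per_chunk = 64            # ~64 bytes of master metadata per chunk
    stored = 1 * 2**50                 # hypothetical 1 PB of file data
    chunks = stored // chunk_size      # ~16.8 million chunks
    print(chunks * metadata_per_chunk / 2**30, "GiB")   # ~1 GiB of in-memory metadata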

  17. Master: operation log • The operation log contains a historical record of critical metadata changes. • Defines the order of concurrent operations • Critical, so it is replicated to multiple remote machines • The master responds to a client only after the change is logged both locally and remotely

  18. Master: operation log • The master checkpoints its state whenever the log grows beyond a certain size. • Fast recovery by loading the latest checkpoint and replaying the log records after it • Recovery needs only the latest checkpoint and subsequent log files, so older ones can be deleted freely.
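A minimal sketch of this logging discipline, assuming hypothetical local_log, remote_logs, and state objects and an illustrative checkpoint threshold; it only shows the ordering described above (log locally and remotely before applying, checkpoint when the log grows too large), not GFS's real implementation.

    # Sketch: acknowledge a metadata change only after it is durable locally and remotely;
    # checkpoint when the log exceeds a size threshold so older log files can be discarded.
    CHECKPOINT_THRESHOLD = 10_000      # illustrative record limit

    class OperationLog:
        def __init__(self, local_log, remote_logs, state):
            self.local_log, self.remote_logs, self.state = local_log, remote_logs, state

        def apply(self, record):
            self.local_log.append(record)        # flush the record locally first
            for remote in self.remote_logs:      # then replicate it to remote machines
                remote.append(record)
            self.state.mutate(record)            # only now apply the change and reply to the client
            if len(self.local_log) > CHECKPOINT_THRESHOLD:
                self.state.checkpoint()          # snapshot the state; earlier log records become disposable
                self.local_log.clear()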

  19. Why is it important to keep a log of the master's metadata changes? • Using a log allows the master state to be updated simply, reliably, and without risking inconsistencies in the event of a master crash.

  20. Master: keeping chunk servers and the master synchronized • Done by exchanging heartbeat messages
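A sketch of what such a heartbeat exchange might look like; the interval, the report fields, and the heartbeat/execute calls are assumptions for illustration only.

    # Sketch: each chunk server periodically reports the chunks it holds,
    # and the master piggybacks instructions on its reply.
    import time

    def heartbeat_loop(chunkserver, master, interval_s=10):
        while True:
            report = {"server": chunkserver.id, "chunks": chunkserver.chunk_handles()}
            instructions = master.heartbeat(report)    # hypothetical RPC to the master
            chunkserver.execute(instructions)          # e.g., delete replicas the master no longer knows
            time.sleep(interval_s)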

  21. System interaction: lease and mutation order • A lease is a grant of ownership or control for a limited time. • The owner/holder can renew or extend the lease. • If the owner fails, the lease expires and is free again.

  22. System interaction: lease and mutation order • A mutation is an operation that changes the contents or metadata of a chunk, such as a write or an append operation. • Each mutation is performed at all of the chunk's replicas.

  23. System interaction: lease and mutation order • Leases are used to maintain consistent mutation order across replicas.
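A small sketch of the lease bookkeeping implied by the last three slides, assuming an expiration timestamp per lease; the Lease class is illustrative, and the 60-second timeout is the initial value used in the GFS paper.

    # Sketch: one replica (the "primary") holds the lease and picks the mutation order.
    # If the holder does not renew before the timeout, the lease simply expires.
    import time

    LEASE_TIMEOUT_S = 60               # initial lease timeout from the paper

    class Lease:
        def __init__(self, primary):
            self.primary = primary
            self.expires = time.time() + LEASE_TIMEOUT_S

        def renew(self):
            self.expires = time.time() + LEASE_TIMEOUT_S

        def valid(self):
            return time.time() < self.expires   # an expired lease is free to grant again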

  24. System interaction: data flow • To avoid network bottlenecks and high-latency links, each machine forwards the data to the closest machine • Latency is minimized by pipelining the data transfer over TCP connections • Data flow is decoupled from control flow

  25. System interaction: data flow • Ideal time to transfer B bytes to R replicas, with no network congestion: t = B/T + RL, where T is the network throughput and L is the latency between two machines • With B = 1 MB, T = 100 Mbps, and L ≈ 1 ms, t ≈ 80 ms
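A quick numeric check of that figure; the replication factor R = 3 is an assumed typical value, not stated on the slide.

    # Ideal pipelined transfer time t = B/T + R*L.
    B = 8e6            # 1 MB expressed in bits
    T = 100e6          # 100 Mbps link throughput
    L = 1e-3           # ~1 ms per-hop latency
    R = 3              # assumed replication factor
    t = B / T + R * L
    print(round(t * 1000), "ms")   # ~83 ms, roughly the 80 ms quoted above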

  26. Master operation • Namespace management and locking • Replica placement • Creation, Re-replication, Rebalancing

  27. Master operation: namespace management and locking • Multiple operations are allowed to be active in the master, using locks to ensure proper serialization. • Recall that GFS does not have a per-directory data structure. • It only stores the mapping from files to chunks. • So GFS logically represents its namespace as a lookup table mapping full pathnames to metadata. • Read/write locks on each node in the namespace tree ensure serialization. • Each master operation acquires a set of locks before it runs.

  28. Master operation: namespace management and locking • If an operation involves /d1/d2/.../dn/leaf, it acquires: • Read locks on the directory names /d1, /d1/d2, ..., /d1/d2/.../dn • Either a read lock or a write lock on the full pathname /d1/d2/.../dn/leaf

  29. Master operation: namespace management and locking • How does this locking mechanism prevent a file /home/user/foo from being created while /home/user is being snapshotted to /save/user? • Snapshot operation: read locks on /home and /save, write locks on /home/user and /save/user • File creation: read locks on /home and /home/user, write lock on /home/user/foo • The snapshot's write lock on /home/user conflicts with the read lock the creation needs, so the creation must wait

  30. Master operation: Namespace management and locking

  31. Master operation: namespace management and locking • Creating a new file under a directory, e.g. create /dir/file3, /dir/file4, ... • Concurrent mutations in the same directory are allowed • Key: only a read lock is taken on the directory /dir • By write-locking the new file's pathname, the file is locked before it is created • This prevents two clients from creating a file with the same name simultaneously (see the sketch below)
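A sketch of this pathname locking scheme under the stated rules (read locks on every proper prefix, read or write lock on the full path); the prefixes/locks_for helpers are illustrative names, not GFS code.

    # Sketch: compute the lock set for an operation on a full pathname.
    def prefixes(path):
        parts = path.strip("/").split("/")
        return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]

    def locks_for(path, write_leaf):
        return ([(p, "read") for p in prefixes(path)] +
                [(path, "write" if write_leaf else "read")])

    # The snapshot write-locks /home/user and /save/user, while creating /home/user/foo
    # needs a read lock on /home/user, so the creation waits for the snapshot.
    snapshot_locks = locks_for("/home/user", write_leaf=True) + locks_for("/save/user", write_leaf=True)
    create_locks = locks_for("/home/user/foo", write_leaf=True)
    print(snapshot_locks)
    print(create_locks)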

  32. Master operation: replica placement • Serves two purposes: • Maximize data reliability and availability • Maximize network bandwidth utilization • Spread chunk replicas across racks: • Ensures chunk survivability even if an entire rack fails • Exploits the aggregate read bandwidth of multiple racks • Trade-off: write traffic has to flow through multiple racks

  33. Master operation: creation, re-replication, rebalancing • Chunk replicas are created for three reasons: • Chunk creation • Chunk re-replication • Rebalancing

  34. Master operation: creation, re-replication, rebalancing • New chunks are created on chunk servers • The master decides which chunk servers to use for chunk creation: • Put new replicas on chunk servers with below-average disk space utilization • Limit the number of recent creations on each chunk server (creation itself is cheap, but it predicts imminent heavy write traffic) • Spread replicas of a chunk across different racks
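An illustrative sketch of how those placement heuristics could be combined; the server attributes (disk_utilization, recent_creations, rack) and the simple two-pass selection are assumptions, not the paper's actual policy.

    # Sketch: rank chunk servers by disk utilization and recent creations,
    # then prefer servers on racks that do not already hold a chosen replica.
    def choose_servers(servers, replicas_needed=3):
        ranked = sorted(servers, key=lambda s: (s.disk_utilization, s.recent_creations))
        chosen, racks = [], set()
        for s in ranked:                       # first pass: at most one replica per rack
            if s.rack not in racks:
                chosen.append(s)
                racks.add(s.rack)
            if len(chosen) == replicas_needed:
                return chosen
        for s in ranked:                       # fall back if there are too few racks
            if s not in chosen:
                chosen.append(s)
            if len(chosen) == replicas_needed:
                break
        return chosen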

  35. Master operation: creation, re-replication, rebalancing • The master re-replicates a chunk as soon as the number of available replicas falls below a user-specified goal. • Each chunk that needs to be re-replicated is prioritized based on several factors. • The master picks the highest-priority chunk and "clones" it by instructing a chunk server to copy the chunk data directly from an existing replica. • Additionally, each chunk server limits the amount of bandwidth it spends on replication by throttling its read requests to the source chunk server.

  36. Master operation: creation, re-replication, rebalancing • The master rebalances replicas periodically • It examines the current replica distribution and moves replicas for better disk space and load balancing • The master gradually fills up a new chunk server rather than instantly swamping it with new chunks and the heavy write traffic that comes with them • The master must also choose which existing replica to remove • It prefers to remove replicas on chunk servers with below-average free space, so as to equalize disk space usage

  37. Master operation: garbage collection • A replica that is not known to the master is garbage • The master logs a deletion immediately, like other changes, but does not reclaim the resources right away • The file is renamed to a hidden name that includes the deletion timestamp • During the master's regular namespace scan, hidden files that have existed for more than 3 days are removed • After a hidden file is removed from the namespace, its in-memory metadata is erased, and its chunk replicas become garbage to be reclaimed by the chunk servers
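A sketch of this lazy deletion, assuming a dict-based namespace and a ".deleted.<timestamp>" hidden-name format invented for illustration; only the rename-then-scan behavior described above is modeled.

    # Sketch: deletion renames the file to a hidden name carrying the deletion timestamp;
    # a later scan removes hidden entries older than the 3-day grace period.
    import time

    GRACE_PERIOD_S = 3 * 24 * 3600     # hidden files older than 3 days are reclaimed

    def delete(namespace, path):
        hidden = f"{path}.deleted.{int(time.time())}"   # hypothetical hidden-name format
        namespace[hidden] = namespace.pop(path)

    def scan(namespace, now=None):
        now = now or time.time()
        for name in list(namespace):
            if ".deleted." in name and now - int(name.rsplit(".", 1)[1]) > GRACE_PERIOD_S:
                del namespace[name]     # drop in-memory metadata; orphaned chunk replicas
                                        # are later erased by the chunk servers themselves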
