Google File System

[Google File System] Shrideep Pallickara, Computer Science

CS555: Distributed Systems [Fall 2019], Dept. of Computer Science, Colorado State University. CS 555: Distributed Systems [Google File System]. Shrideep Pallickara, Computer Science, Colorado State University.


  1. CS 555: Distributed Systems [Google File System]. Professor: Shrideep Pallickara, Computer Science, Colorado State University. Fall 2019, November 19, 2019. Slides created by Shrideep Pallickara.

     Frequently asked questions from the previous class survey
     - Which is better: GFS or Dynamo?

  2. Topics covered in this lecture
     - Google File System
       - Metadata management
       - Managing mutations

     All system metadata is managed by the Master and stored in main memory
     ① File and chunk namespaces
     ② Mapping from files to chunks
     ③ Location of chunks
     The Master logs mutations to ① and ② into a permanent log; chunk locations (③) are not logged.

  3. Why have a single Master?
     - Vastly simplifies design
     - Easy to use global knowledge to reason about
       - Chunk placements
       - Replication decisions

     Communications with the chunk servers
     - Periodic communications using heartbeats
       - Instructions to the chunk server
       - Collect/retrieve state from the chunk server

  4. Chunk size
     - This is fixed at 64 MB
       - Much larger than typical filesystem block sizes (512 bytes)
     - Lazy space allocation
       - Each chunk is stored as a plain Linux file
       - Extended only as needed

     But why this big?
     - Reduces client interaction with the master
       - Client can cache chunk location info for a multi-TB working set
     - Reduces network overhead
       - With a large chunk, a client is likely to perform many operations on the same chunk
       - It can do so over a persistent connection to the chunk server
     - Reduces size of metadata stored in the master
       - 64 bytes of metadata per 64 MB chunk
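The chunk arithmetic above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the step of mapping a byte offset to a chunk index is an assumption about how a client would use the fixed chunk size, not something stated on the slides.

```python
CHUNK_SIZE = 64 * 2**20   # 64 MB, the fixed chunk size from the slide
METADATA_PER_CHUNK = 64   # bytes of master metadata per chunk, from the slide


def chunk_index(byte_offset: int) -> int:
    """Return the index of the chunk that contains byte_offset (assumed mapping)."""
    return byte_offset // CHUNK_SIZE


def chunks_needed(file_size: int) -> int:
    """Return how many chunks a file of file_size bytes occupies."""
    return -(-file_size // CHUNK_SIZE)  # ceiling division


def master_metadata_bytes(file_size: int) -> int:
    """Estimate the master memory used for one file's chunk metadata."""
    return chunks_needed(file_size) * METADATA_PER_CHUNK
```

Under these numbers, a 1 TB file spans 16,384 chunks and costs the master about 1 MB of metadata, which is why a client can cache location information for a very large working set.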

  5. Why keep the entire metadata in memory?
     - Speed
     - Master can scan its state in the background
       - Implement chunk garbage collection
       - Re-replicate if there are failures
       - Chunk migration to balance load and space
     - Add extra memory to increase file system size

     Size of the file system with 1 TB of RAM: assume file sizes are exact multiples of chunk sizes
     - Number of entries = $2^{40} / 2^{6} = 2^{34}$
     - Maximum size of the file system = number of entries x chunk size = $\frac{2^{40}}{2^{6}} \times 2^{6} \times 2^{20} = 2^{60}$ bytes = 1 EB
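The capacity estimate can be checked with integer arithmetic. The sketch below uses the slide's figures (64 bytes of metadata per chunk, 64 MB chunks); the function name is hypothetical.

```python
ENTRY_BYTES = 2**6           # 64 bytes of metadata per chunk
CHUNK_BYTES = 2**6 * 2**20   # 64 MB per chunk


def max_file_system_bytes(ram_bytes: int) -> int:
    """Largest file system the master can describe with ram_bytes of metadata."""
    entries = ram_bytes // ENTRY_BYTES   # one metadata entry per chunk
    return entries * CHUNK_BYTES


one_tb = 2**40
print(max_file_system_bytes(one_tb) == 2**60)  # True: 1 TB of RAM covers 1 EB
```

The ratio is fixed at one million to one (64 MB of data per 64 bytes of metadata), so doubling the master's memory doubles the addressable file system size.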

  6. Tracking the chunk servers
     - Master does not keep a persistent copy of the location of chunk servers
     - List maintained via heartbeats
       - Allows list to be in sync with reality despite failures
       - Chunk server has final word on chunks it holds

     Caching at the client/chunk servers
     - Clients do not cache file data
       - At client the working set may be too large
       - Simplify client; eliminate cache-coherence problems
     - Chunk servers do not cache file data either
       - Chunks are stored as local files
       - Linux's buffer cache already keeps frequently accessed data in memory
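A minimal sketch of heartbeat-driven location tracking follows. It is not the GFS implementation: the class, its method names, and the 30-second staleness threshold are assumptions made for illustration. The point it shows is that each heartbeat replaces the master's view of a server, so the chunk server has the final word.

```python
import time


class ChunkLocationTable:
    """In-memory map of which chunk servers currently report which chunks."""

    def __init__(self, timeout_s: float = 30.0):
        self.timeout_s = timeout_s   # assumed staleness threshold, not from the slides
        self.last_seen = {}          # server id -> time of last heartbeat
        self.holdings = {}           # server id -> set of chunk handles

    def heartbeat(self, server, chunks, now=None):
        """Record a heartbeat; the server's report replaces the old view."""
        now = time.monotonic() if now is None else now
        self.last_seen[server] = now
        self.holdings[server] = set(chunks)

    def locations(self, chunk, now=None):
        """Return the live servers that most recently reported holding chunk."""
        now = time.monotonic() if now is None else now
        return sorted(
            server
            for server, held in self.holdings.items()
            if chunk in held and now - self.last_seen[server] <= self.timeout_s
        )
```

Because nothing here is persisted, a restarted master would rebuild the table simply by waiting for the next round of heartbeats.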

  7. MANAGING MUTATIONS: handling writes and appends to a file

     Mutations
     - Mutation changes the content or metadata of a chunk
       - Write
       - Append
     - Each mutation is performed at all chunk replicas

  8. GFS uses leases to maintain consistent mutation order across replicas
     - Master grants lease to one of the replicas
       - PRIMARY
     - Primary picks serial order
       - For all mutations to the chunk
       - Other replicas follow this order when applying mutations

     Lease mechanism designed to minimize communications with the master
     - Lease has initial timeout of 60 seconds
     - As long as chunk is being mutated
       - Primary can request and receive extensions
     - Extension requests/grants piggybacked over heartbeat messages
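The lease and serial-order ideas can be sketched as two small classes. This is an illustration under stated assumptions: the class and method names are hypothetical, time is passed in explicitly as a number of seconds, and the heartbeat transport that carries extension requests is not modeled.

```python
LEASE_SECONDS = 60  # initial lease timeout from the slide


class Lease:
    """A time-limited grant that makes one replica the primary for a chunk."""

    def __init__(self, primary, granted_at):
        self.primary = primary
        self.expires_at = granted_at + LEASE_SECONDS

    def is_valid(self, now):
        return now < self.expires_at

    def extend(self, now):
        """Extend a still-valid lease (stands in for a piggybacked grant)."""
        if not self.is_valid(now):
            return False
        self.expires_at = now + LEASE_SECONDS
        return True


class Primary:
    """The lease holder assigns consecutive serial numbers to mutations."""

    def __init__(self):
        self.next_serial = 0

    def order(self, mutation):
        serial = self.next_serial
        self.next_serial += 1
        return serial, mutation
```

Every replica that applies mutations in increasing serial number ends up with the same order, without the master being involved in individual writes.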

  9. Revocation and transfer of leases
     - Master may revoke a lease before it expires
     - If communications lost with primary
       - Master can safely give lease to another replica
       - ONLY AFTER the lease period for old primary elapses

     How a write is actually performed
     [Figure: interactions among the Client, the Master, the Primary Replica, Secondary Replica A, and Secondary Replica B]
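The transfer rule can be sketched from the master's side. The class below is hypothetical, and `revoke` assumes the old primary is reachable and has acknowledged the revocation; when the primary is unreachable, the only safe path in this sketch is to wait for the lease to expire.

```python
class Master:
    """Tracks at most one lease holder per chunk (illustrative only)."""

    def __init__(self, lease_seconds=60):
        self.lease_seconds = lease_seconds
        self.leases = {}  # chunk handle -> (primary id, expiry time)

    def grant(self, chunk, replica, now):
        """Grant the lease unless another replica may still hold a valid one."""
        holder = self.leases.get(chunk)
        if holder is not None and holder[0] != replica and now < holder[1]:
            return False  # the old primary could still be ordering mutations
        self.leases[chunk] = (replica, now + self.lease_seconds)
        return True

    def revoke(self, chunk):
        """Drop the lease early; assumes the primary acknowledged the revocation."""
        self.leases.pop(chunk, None)
```

Waiting out the old lease guarantees that two replicas never act as primary for the same chunk at the same time, even if the old primary is alive but partitioned from the master.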

  10. Client pushes data to all the replicas (I)
     - Each chunk server stores data in an LRU buffer until
       - Data is used
       - Aged out

     Client pushes data to all the replicas (II)
     - When chunk servers acknowledge receipt of data
       - Client sends a write request to primary
     - Primary assigns consecutive serial numbers to mutations
       - Forwards to replicas
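The two phases above (push data, then request the write) can be sketched as follows. All names are hypothetical, the buffer capacity is an assumed value, and a mutation is simplified to appending bytes to an in-memory chunk; real writes target a byte range and travel over the network.

```python
from collections import OrderedDict


class ChunkServer:
    def __init__(self, buffer_slots=4):
        self.buffer = OrderedDict()        # data id -> bytes, oldest first
        self.buffer_slots = buffer_slots   # assumed capacity, not from the slides
        self.chunk = bytearray()
        self.applied = []                  # serial numbers in the order applied

    def push(self, data_id, data):
        """Phase 1: hold pushed data in the buffer and acknowledge receipt."""
        self.buffer[data_id] = data
        self.buffer.move_to_end(data_id)
        while len(self.buffer) > self.buffer_slots:
            self.buffer.popitem(last=False)  # age out the oldest entry
        return True

    def apply(self, serial, data_id):
        """Phase 2: use the buffered data, which removes it from the buffer."""
        self.chunk += self.buffer.pop(data_id)
        self.applied.append(serial)


class PrimaryServer(ChunkServer):
    def __init__(self, secondaries, **kwargs):
        super().__init__(**kwargs)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        """Assign the next serial number, apply locally, forward to secondaries."""
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id)
        for secondary in self.secondaries:
            secondary.apply(serial, data_id)
        return serial


def client_write(primary, data_id, data):
    """Push data to every replica, then send the write request to the primary."""
    replicas = [primary, *primary.secondaries]
    if all(replica.push(data_id, data) for replica in replicas):
        return primary.write(data_id)
    return None
```

Separating the push from the write request means the primary only has to order small control messages; the bulk data is already sitting in each replica's buffer when the serial number is assigned.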

  11. Data flow is decoupled from the control flow to utilize network efficiently
     - Utilize each machine's network bandwidth
     - Avoid network bottlenecks
     - Avoid high-latency links
     - Leverage network topology
       - Estimate distances from IP addresses

     What if the secondary replicas could not finish the write operation?
     - Client request is considered failed
     - Modified region is inconsistent
       - No attempt to delete this from the chunk
       - Client must handle this inconsistency
     - Client retries the failed mutation
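The client-side retry can be sketched as a small loop. The `attempt_write` callable and the retry limit are hypothetical: the slides say only that the client retries the failed mutation, not how many times.

```python
def write_with_retry(attempt_write, max_attempts=3):
    """Retry a mutation until every replica reports success.

    attempt_write is a hypothetical callable that returns True only if the
    primary and all secondaries applied the mutation. Failed attempts are
    not rolled back, so the region may be inconsistent between attempts.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_write():
            return attempt  # number of attempts it took
    raise RuntimeError(f"mutation failed after {max_attempts} attempts")
```

Since nothing cleans up after a failed attempt, readers of the chunk have to tolerate inconsistent regions left behind by attempts that did not succeed.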
