The Google File System - Presented by: Alexa Leal - PowerPoint PPT Presentation



SLIDE 1

The Google File System

Presented by: Alexa Leal

SLIDE 2

Architecture – the basic idea

Question:

  • 1. GFS does not cache file data. Why does this design choice not lead to performance loss?

SLIDE 3

Single Master

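  • Clients never read or write file data through the master; they contact it only for metadata
  • Metadata is kept in the master's memory
  • The master keeps an operation log that records metadata changes
  • The master communicates with chunkservers through periodic HeartBeat messages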

Question:

  • 1. What’s the benefit of having only a single master? What’s its potential performance risk? How does GFS minimize such a risk?

  • 2. Why is GFS’s master able to keep the metadata in memory?
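
A rough back-of-the-envelope sketch for the second question. It assumes the figure reported in the GFS paper of less than 64 bytes of master metadata per 64 MB chunk; the function name and the 1 PiB example are illustrative only, and per-file namespace metadata is ignored.

```python
CHUNK_SIZE = 64 * 2**20      # 64 MiB per chunk
METADATA_PER_CHUNK = 64      # bytes; upper bound per chunk reported in the GFS paper


def master_metadata_bytes(stored_bytes: int) -> int:
    """Upper-bound estimate of the chunk metadata the master holds in memory."""
    chunks = -(-stored_bytes // CHUNK_SIZE)  # ceiling division
    return chunks * METADATA_PER_CHUNK


one_pib = 2**50
print(master_metadata_bytes(one_pib) / 2**30, "GiB")  # prints: 1.0 GiB
```

Under these assumptions, a petabyte of file data needs on the order of a gigabyte of chunk metadata, which is why the large chunk size makes an in-memory master plausible.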
SLIDE 4

Chunks & Chunkservers

  • Chunks are 64 MB
  • Chunkservers exchange file data directly with clients
  • Chunkservers keep track of their own chunks and report them to the master (HeartBeat)
  • New chunks are allocated using lazy space allocation
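
Because the chunk size is fixed, a client can work out which chunk holds a given byte before it contacts the master. A minimal sketch of that translation, where the function name `locate` is hypothetical:

```python
CHUNK_SIZE = 64 * 2**20  # 64 MiB, as stated on this slide


def locate(byte_offset: int) -> tuple[int, int]:
    """Translate a file byte offset into (chunk index, offset within that chunk)."""
    if byte_offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(byte_offset, CHUNK_SIZE)


print(locate(200 * 2**20))  # prints: (3, 8388608)
```

Byte 200 MiB of a file falls 8 MiB into the fourth chunk (index 3), so the client only needs to ask the master where that one chunk lives.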

Questions:

  • 1. How does GFS collaborate with chunkserver’s local file system to store file chunks? What’s lazy space allocation and what’s its benefit?

  • 2. How does chunkserver communication help improve the system’s performance?
SLIDE 5

Chunk Leases & Mutations

  • A mutation is an operation that changes the contents or metadata of a chunk, such as a write or an append
  • Leases maintain a consistent mutation order across a chunk's replicas; a lease has an initial timeout of 60 seconds
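
The lease idea can be sketched as follows. This is a toy model, not the GFS implementation: the `Lease` class, its fields, and the server name are hypothetical, and lease extension through HeartBeat messages is left out. The replica holding the lease (the primary) assigns serial numbers, and every replica applies mutations in that order.

```python
from dataclasses import dataclass

LEASE_SECONDS = 60  # initial lease timeout from this slide


@dataclass
class Lease:
    primary: str        # replica currently holding the lease (hypothetical name)
    granted_at: float   # time the master granted the lease, in seconds
    next_serial: int = 0

    def is_valid(self, now: float) -> bool:
        return now - self.granted_at < LEASE_SECONDS

    def order(self, mutation: str, now: float) -> tuple[int, str]:
        """The primary assigns the next serial number to a mutation."""
        if not self.is_valid(now):
            raise RuntimeError("lease expired; the master must grant a new one")
        serial = self.next_serial
        self.next_serial += 1
        return serial, mutation


lease = Lease(primary="chunkserver-a", granted_at=0.0)
print(lease.order("write X", now=1.0))
print(lease.order("append Y", now=2.0))
```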

*example of a write

SLIDE 6

Atomic Record Appends

  • The client specifies only the data; GFS chooses the offset at which to append it, then returns that offset to the client
  • An appended record must fit within a single chunk; it cannot span a chunk boundary
  • If an append fails, the client retries the operation
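
A minimal sketch of how the offset might be chosen. It tracks only byte counts, ignores replicas, and collapses the "pad the chunk, then retry on the next chunk" step into one call. The class name and the `chunk_size` parameter are assumptions made for illustration.

```python
CHUNK_SIZE = 64 * 2**20  # 64 MiB


class AppendOnlyFile:
    """Toy model that tracks only how many bytes each chunk has used."""

    def __init__(self, chunk_size: int = CHUNK_SIZE) -> None:
        self.chunk_size = chunk_size
        self.used = [0]  # bytes used per chunk; the last entry is the current chunk

    def record_append(self, data: bytes) -> int:
        """Append data as one unit and return the file offset that was chosen."""
        if len(data) > self.chunk_size:
            raise ValueError("record cannot exceed the chunk size")
        if self.used[-1] + len(data) > self.chunk_size:
            self.used[-1] = self.chunk_size  # pad the rest of the current chunk
            self.used.append(0)              # continue in a fresh chunk
        offset = (len(self.used) - 1) * self.chunk_size + self.used[-1]
        self.used[-1] += len(data)
        return offset


f = AppendOnlyFile(chunk_size=100)  # tiny chunks to make the padding visible
print(f.record_append(b"x" * 60))   # prints: 0
print(f.record_append(b"y" * 60))   # prints: 100
```

The second record does not fit in the 40 bytes left in the first chunk, so it starts at the beginning of the second chunk rather than straddling the boundary.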
SLIDE 7

Snapshot

  • Makes a copy of a file or directory tree almost instantaneously
  • The master duplicates the metadata of the source
  • The snapshot points to the same chunks as the source files (copy-on-write)
  • Used to create branch copies of large data sets
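
The copy-on-write behaviour can be sketched with per-chunk reference counts. This is a toy model under stated assumptions: the `Master` class and its method names are hypothetical, chunk handles are plain integers, and no data is actually copied.

```python
class Master:
    """Toy namespace: file name -> chunk handles, plus per-chunk reference counts."""

    def __init__(self) -> None:
        self.files: dict[str, list[int]] = {}
        self.refcount: dict[int, int] = {}
        self.next_handle = 0

    def _new_chunk(self) -> int:
        handle = self.next_handle
        self.next_handle += 1
        self.refcount[handle] = 1
        return handle

    def create(self, name: str, num_chunks: int) -> None:
        self.files[name] = [self._new_chunk() for _ in range(num_chunks)]

    def snapshot(self, src: str, dst: str) -> None:
        """Duplicate metadata only; both files now point at the same chunks."""
        self.files[dst] = list(self.files[src])
        for handle in self.files[src]:
            self.refcount[handle] += 1

    def write(self, name: str, index: int) -> int:
        """Give the file its own copy of a shared chunk before the first write."""
        handle = self.files[name][index]
        if self.refcount[handle] > 1:
            self.refcount[handle] -= 1
            handle = self._new_chunk()
            self.files[name][index] = handle
        return handle


m = Master()
m.create("data", 2)
m.snapshot("data", "data.snap")
m.write("data", 0)
print(m.files)  # prints: {'data': [2, 1], 'data.snap': [0, 1]}
```

The snapshot itself touches only metadata; a chunk is duplicated later, and only if someone writes to it.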
SLIDE 8

Chunk creation, re-replication, rebalancing

  • Chunk replicas are created for three reasons: chunk creation, re-replication, and rebalancing
  • Creation: the master chooses where to place the initially empty replica
  • The master re-replicates a chunk when the number of available replicas falls below a specified goal
  • The master rebalances replicas periodically
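
A minimal sketch of the re-replication check. It assumes a replication goal of three (the GFS default) and orders chunks by how far they are below the goal, which is only one of the priority factors the paper describes. The function name and chunk handles are hypothetical.

```python
REPLICATION_GOAL = 3  # assumed goal; GFS stores three replicas by default


def rereplication_queue(live_replicas: dict[str, int],
                        goal: int = REPLICATION_GOAL) -> list[str]:
    """Return under-replicated chunk handles, furthest below the goal first."""
    deficit = {c: goal - n for c, n in live_replicas.items() if n < goal}
    # Sort by largest deficit, then by handle for a stable, readable order.
    return sorted(deficit, key=lambda c: (-deficit[c], c))


print(rereplication_queue({"c1": 3, "c2": 1, "c3": 2, "c4": 1}))
# prints: ['c2', 'c4', 'c3']
```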

Questions:

  • 1. What are the criteria for choosing where to place the initially empty replicas?
  • 2. When a new chunkserver is added to the system, the master mostly uses chunk rebalancing to fill it up rather than writing new chunks to it. Why?

SLIDE 9

Garbage Collection

  • Any replica not known to the master is garbage
  • A deleted file is first renamed to a hidden name; the master removes hidden files once they have existed for more than 3 days
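
The delayed reclamation described above can be sketched as a rename followed by a periodic scan. This is a simplified model: the `Namespace` class and the `.deleted.` prefix are hypothetical, and time is passed in explicitly as seconds so the example is deterministic.

```python
HIDDEN_GRACE_SECONDS = 3 * 24 * 60 * 60  # 3 days, per this slide


class Namespace:
    def __init__(self) -> None:
        self.files: dict[str, list[str]] = {}                 # visible name -> chunks
        self.hidden: dict[str, tuple[float, list[str]]] = {}  # hidden name -> (deleted at, chunks)

    def delete(self, name: str, now: float) -> None:
        """Deletion only renames the file to a hidden name stamped with the time."""
        self.hidden[f".deleted.{name}"] = (now, self.files.pop(name))

    def scan(self, now: float) -> set[str]:
        """Drop expired hidden files and return the chunks that are now orphaned."""
        expired = [n for n, (t, _) in self.hidden.items()
                   if now - t > HIDDEN_GRACE_SECONDS]
        candidates: set[str] = set()
        for n in expired:
            candidates.update(self.hidden.pop(n)[1])
        # A chunk still referenced by any remaining file is not garbage.
        live = {c for chunks in self.files.values() for c in chunks}
        live |= {c for _, chunks in self.hidden.values() for c in chunks}
        return candidates - live


ns = Namespace()
ns.files["log"] = ["c1", "c2"]
ns.delete("log", now=0.0)
print(ns.scan(now=24 * 60 * 60))            # prints: set()
print(sorted(ns.scan(now=4 * 24 * 60 * 60)))  # prints: ['c1', 'c2']
```

Until the grace period expires, the file can still be recovered by renaming it back, which is one reason the delayed approach is safer than eager deletion.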

Question:

  • 1. How are files and chunks deleted? What are the advantages of delayed space reclamation (garbage collection) over eager deletion?

SLIDE 10

Stale Replica Detection

  • A chunk replica becomes stale if its chunkserver fails and misses mutations while it is down
  • The master detects stale replicas by comparing chunk version numbers and removes them during regular garbage collection
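
Version-based detection can be sketched as below. It assumes, as the GFS paper describes, that the master bumps the chunk version whenever it grants a new lease and informs the replicas that are reachable at that moment. The class and server names are hypothetical.

```python
class ChunkVersions:
    """Toy model of the version numbers kept for one chunk."""

    def __init__(self, servers: list[str]) -> None:
        self.master_version = 1
        self.replicas = {s: 1 for s in servers}

    def grant_lease(self, up_servers: list[str]) -> None:
        """Bump the version on the master and on every reachable replica."""
        self.master_version += 1
        for s in up_servers:
            self.replicas[s] = self.master_version

    def stale(self) -> list[str]:
        """Replicas whose version is older than the master's are stale."""
        return sorted(s for s, v in self.replicas.items()
                      if v < self.master_version)


cv = ChunkVersions(["cs-a", "cs-b", "cs-c"])
cv.grant_lease(["cs-a", "cs-b"])  # cs-c is down and misses the update
print(cv.stale())                 # prints: ['cs-c']
```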

SLIDE 11

Fault tolerance & Diagnosis

  • High availability through fast recovery and replication
  • Data integrity verified with checksums
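
A minimal sketch of block-level checksumming. The GFS paper describes 64 KB blocks with a 32-bit checksum each; the use of CRC-32 through `zlib.crc32` here is an assumption made for illustration, since the exact checksum function is not named, and both function names are hypothetical.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KiB checksum blocks, as described in the GFS paper


def block_checksums(chunk: bytes) -> list[int]:
    """Compute one 32-bit checksum per block (CRC-32 is an assumed choice)."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk), BLOCK_SIZE)]


def corrupted_blocks(chunk: bytes, expected: list[int]) -> list[int]:
    """Return the indices of blocks whose checksum no longer matches."""
    actual = block_checksums(chunk)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


data = bytes(3 * BLOCK_SIZE)
sums = block_checksums(data)
damaged = bytearray(data)
damaged[BLOCK_SIZE + 5] ^= 0x01  # flip one bit in the second block
print(corrupted_blocks(bytes(damaged), sums))  # prints: [1]
```

Checking per block rather than per chunk means a read only has to verify the blocks it touches, and a single bad block identifies exactly what needs to be restored from another replica.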
SLIDE 12

Conclusion

  • Optimized for huge files that are mostly appended to and then read sequentially

  • Component failures are treated as the norm
  • Fault tolerance by constant monitoring, replication, and recovery
SLIDE 13

Questions?