File System: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung (PowerPoint presentation)

SLIDE 1

File System

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Farnaz Farahanipad 1001134035

SLIDE 2

Overview

  • Introduction
  • Design overview
  • GFS structure
  • System interaction
  • Master operation
  • Questions and answers
  • Conclusion
SLIDE 3

Introduction: Distributed File Systems

  • What is a distributed file system?
  • A DFS is any file system that allows access to files from multiple hosts, shared via a computer network.

SLIDE 4

Introduction:

  • Why not use an existing file system?
  • Bottleneck problems
  • Balancing issues
  • Different workload and design properties
  • GFS is designed for Google apps and workloads

SLIDE 5

Google File System

  • GFS is a scalable distributed file system for large, data-intensive applications.
  • GFS has a master-slave architecture.
  • It shares many of the same goals as previous distributed file systems, such as performance, scalability, reliability, and availability on large networks.

SLIDE 6

GFS design assumptions

  • High component failure rates
  • A “modest” number of huge files
  – Just a few million big files
  • Files are write-once, and mostly appended to
  • Large streaming reads
  • High sustained throughput favored over low latency

SLIDE 7

GFS design decisions

  • Files stored as chunks
  • Fixed chunk size of 64 MB
  • Single master
  • Simple, centralized management
  • Reliability through replication
  • Each chunk is replicated across 3 or more chunk servers
  • No data caching
  • Little benefit, due to the large size of data sets
  • Familiar interface, but a customized API
  • Suited to Google apps
  • Adds snapshot and record-append operations

SLIDE 8

GFS architecture

SLIDE 9

Client

  • Interacts with the master for metadata operations:
  • Translates a byte offset in the file into a chunk index within the file
  • Sends the master a request with the file name and chunk index
  • Caches the reply, using the file name and chunk index as the key
  • Interacts with chunk servers for read/write operations (see the sketch below)

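A rough sketch of this translation (the `find_chunk` call and the cache shape are illustrative assumptions, not the real GFS client API): with a fixed 64 MB chunk size, the chunk index is just the byte offset divided by the chunk size, and the (file name, chunk index) pair doubles as the cache key.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS's fixed 64 MB chunk size

class GFSClient:
    """Sketch of the metadata path only; data I/O goes to chunk servers."""

    def __init__(self, master):
        self.master = master
        # (file_name, chunk_index) -> (chunk_handle, replica_locations)
        self.cache = {}

    def locate(self, file_name, offset):
        # Translate a byte offset within the file into a chunk index.
        chunk_index = offset // CHUNK_SIZE
        key = (file_name, chunk_index)
        if key not in self.cache:
            # One metadata round trip to the master (hypothetical call);
            # file data itself never flows through the master.
            self.cache[key] = self.master.find_chunk(file_name, chunk_index)
        return self.cache[key]
```
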
SLIDE 10

Chunk servers

  • Chunk servers are the workers of GFS.
  • They are responsible for storing the 64 MB file chunks.
  • Each chunk replica is stored on a chunk server and is extended only as needed.

SLIDE 11

Why a large chunk size?

  • The amount of metadata is reduced
  • Involvement of the master is reduced
  • Network overhead is reduced
  • Lazy space allocation avoids internal fragmentation

SLIDE 12

Reliability issues

  • What if a chunk server goes down?
  • GFS copies every chunk multiple times and stores the copies on different chunk servers.

SLIDE 13

Single master weakness: single point of failure

  • What if the master goes down?
  • GFS solution:
  • Shadow masters

SLIDE 14

Single master weakness: scalability bottleneck

  • How is the bottleneck problem solved?
  • GFS solution:
  • Minimize master involvement
  • Never move data through the master; use it only for metadata
  • A large chunk size means less metadata
  • Data mutation is done by the chunk servers

SLIDE 15

Master

  • The master maintains all file system metadata.
  • It periodically communicates with chunk servers
  • Gives instructions, collects state
  • Handles chunk creation, re-replication, and rebalancing
  • Garbage collection
  • Simpler and more reliable
  • Lazily garbage-collects hidden files

SLIDE 16

Master: Metadata

  • Global metadata is stored on the master:
  • File and chunk namespaces
  • Mapping from files to chunks
  • Locations of each chunk's replicas
  • All kept in memory (~64 bytes per chunk)
  • Fast
  • Easy to access

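A quick back-of-the-envelope check of why this fits in memory, using the roughly 64 bytes of metadata per 64 MB chunk quoted above (the petabyte example is illustrative):

```latex
\frac{64\ \mathrm{B\ metadata}}{64\ \mathrm{MB\ data}} \approx 10^{-6}
\quad\Longrightarrow\quad
1\ \mathrm{PB\ of\ data}
\approx \frac{10^{15}\ \mathrm{B}}{64\times 10^{6}\ \mathrm{B}}
\approx 1.6\times 10^{7}\ \mathrm{chunks}
\approx 1\ \mathrm{GB\ of\ metadata}.
```
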
SLIDE 17

Master: Operation log

  • The operation log contains a historical record of critical metadata changes.
  • It defines the order of concurrent operations.
  • It is critical:
  • Replicated to multiple remote machines
  • The master responds to a client only after the log record has been flushed both locally and remotely

SLIDE 18

Master: Operation log

  • The master checkpoints its state whenever the log grows beyond a certain size.
  • Checkpoints allow fast recovery.
  • Recovery needs only the latest checkpoint and subsequent log files, so older ones can be deleted freely (see the sketch below).

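A minimal sketch of the log-plus-checkpoint idea under simplifying assumptions (JSON records, a dict for the metadata; real GFS uses a compact checkpoint format and replicates log records to remote machines before replying):

```python
import json
import os

CHECKPOINT_THRESHOLD = 10_000  # illustrative "certain size" for the log

class MasterLog:
    def __init__(self, log_path, ckpt_path, state):
        self.log_path, self.ckpt_path = log_path, ckpt_path
        self.state = state  # in-memory metadata, e.g. {file: [chunk handles]}
        self.log = open(log_path, "a")
        self.pending = 0

    def apply(self, record):
        # Append and flush the record BEFORE mutating state or replying to
        # the client (GFS also waits for remote log replicas at this point).
        self.log.write(json.dumps(record) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.state.update(record)
        self.pending += 1
        if self.pending >= CHECKPOINT_THRESHOLD:
            self.checkpoint()

    def checkpoint(self):
        # Snapshot the full state; recovery now needs only this checkpoint
        # plus log records written after it, so older files can be deleted.
        with open(self.ckpt_path, "w") as f:
            json.dump(self.state, f)
        self.log.close()
        self.log = open(self.log_path, "w")  # start a fresh log
        self.pending = 0
```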

SLIDE 19

Why is it important to log the master's state?

  • Using a log allows us to update the master state simply, reliably, and without risking inconsistencies in the event of a master crash.

SLIDE 20

Master: Keeping the chunk servers and master synchronized

  • By sending heartbeat messages (sketched below)

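A sketch of what the heartbeat bookkeeping might look like on the master's side (the timeout value and message payload are assumptions; in GFS, the master also piggybacks instructions on its heartbeat replies):

```python
import time

HEARTBEAT_TIMEOUT = 30.0  # illustrative: seconds of silence before a server is presumed dead

class HeartbeatMonitor:
    def __init__(self):
        self.last_seen = {}        # chunkserver id -> time of last heartbeat
        self.chunk_locations = {}  # chunk handle -> set of chunkserver ids

    def on_heartbeat(self, server_id, chunks_held):
        # A heartbeat both proves liveness and refreshes the master's view
        # of which chunks the server currently holds.
        self.last_seen[server_id] = time.time()
        for handle in chunks_held:
            self.chunk_locations.setdefault(handle, set()).add(server_id)

    def dead_servers(self):
        # Servers that missed their heartbeats; their chunks now have fewer
        # live replicas and become candidates for re-replication.
        now = time.time()
        return [s for s, t in self.last_seen.items()
                if now - t > HEARTBEAT_TIMEOUT]
```
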
SLIDE 21

System interaction: leases and mutation order

  • A lease is a grant of ownership or control for a limited time.
  • The owner/holder can renew or extend the lease.
  • If the owner fails, the lease expires and the resource is free again.

SLIDE 22

System interaction: leases and mutation order

  • A mutation is an operation that changes the contents or metadata of a chunk, such as a write or an append operation.
  • Each mutation is performed at all of the chunk's replicas.

SLIDE 23

System interaction: leases and mutation order

  • Leases are used to maintain a consistent mutation order across replicas (see the sketch below).

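A minimal sketch of lease-based ordering (the 60-second initial timeout matches the paper; the class shape is illustrative): the master grants the lease for a chunk to one replica, the primary, and the primary assigns a serial number to every mutation, which all replicas then apply in the same order.

```python
import time

LEASE_TIMEOUT = 60.0  # the paper's initial lease timeout, in seconds

class ChunkLease:
    def __init__(self, primary_id):
        self.primary = primary_id  # chunk server holding the lease
        self.expires = time.time() + LEASE_TIMEOUT
        self.next_serial = 0

    def valid(self):
        return time.time() < self.expires

    def renew(self):
        # Extension requests are piggybacked on heartbeat messages.
        self.expires = time.time() + LEASE_TIMEOUT

    def order(self, mutation):
        # The primary stamps each mutation with a serial number; every
        # replica applies mutations in serial-number order, so all
        # replicas converge to the same chunk contents.
        serial = self.next_serial
        self.next_serial += 1
        return serial, mutation
```

If the primary fails, the master simply waits for the lease to expire before granting it to another replica, so two primaries can never be active at once.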

SLIDE 24

System interaction: data flow

  • Data flow is decoupled from control flow.
  • To avoid network bottlenecks and high-latency links, each machine forwards the data to the closest machine that has not yet received it.
  • Latency is minimized by pipelining the data transfer over TCP connections.

SLIDE 25

System interaction: data flow

Ideal time for transferring B bytes to R replicas, with network throughput T and inter-machine latency L, and no network congestion:

t = B/T + R·L

With B = 1 MB, T = 100 Mbps, and L = 1 ms: t ≈ 80 ms.

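Plugging in the slide's numbers shows where the ~80 ms figure comes from (the R·L pipelining term is negligible for small R):

```latex
\frac{B}{T} = \frac{1\ \mathrm{MB}}{100\ \mathrm{Mbps}}
            = \frac{8\ \mathrm{Mbit}}{100\ \mathrm{Mbit/s}}
            = 80\ \mathrm{ms},
\qquad
t \approx 80\ \mathrm{ms} + R \cdot 1\ \mathrm{ms} \approx 80\ \mathrm{ms}.
```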

SLIDE 26

Master operation

  • Namespace management and locking
  • Replica placement
  • Creation, Re-replication, Rebalancing
SLIDE 27

Master operation: Namespace management and locking

  • Multiple operations are allowed to be active in the master; locking is used to ensure proper serialization.
  • Recall that GFS does not have a per-directory data structure.
  • It only stores the file and chunk mappings.
  • So GFS logically represents its namespace as a lookup table mapping full pathnames to metadata.
  • A read/write lock on each node in the namespace tree ensures serialization.
  • Each master operation acquires a set of locks before it runs (see the sketch below).

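A sketch of the lock-set computation under the stated assumptions (paths as flat strings; the helper name is hypothetical):

```python
def locks_needed(pathname, write):
    """Locks a master operation acquires for /d1/d2/.../dn/leaf:
    read locks on every proper prefix, and a read or write lock
    on the full pathname itself."""
    parts = pathname.strip("/").split("/")
    prefixes = {"/" + "/".join(parts[:i]) for i in range(1, len(parts))}
    if write:
        return prefixes, {pathname}  # (read locks, write locks)
    return prefixes | {pathname}, set()

# Example from the next slides: creating /home/user/foo needs read locks
# on /home and /home/user plus a write lock on /home/user/foo, while a
# snapshot of /home/user holds a write lock on /home/user, so the two
# operations conflict on /home/user and are serialized.
reads, writes = locks_needed("/home/user/foo", write=True)
assert reads == {"/home", "/home/user"} and writes == {"/home/user/foo"}
```
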
SLIDE 28

Master operation: Namespace management and locking

If an operation involves /d1/d2/…/dn/leaf, it acquires:

  • Read locks on the directory names /d1, /d1/d2, …, /d1/d2/…/dn
  • Either a read lock or a write lock on the full pathname /d1/d2/…/dn/leaf

SLIDE 29

Master operation: Namespace management and locking

  • How does this locking mechanism prevent a file /home/user/foo from being created while /home/user is being snapshotted to /save/user?

Snapshot operation: read locks on /home and /save; write locks on /home/user and /save/user.
Creation operation: read locks on /home and /home/user; write lock on /home/user/foo.

  • The two operations conflict on /home/user (read lock vs. write lock), so they are properly serialized.

SLIDE 30

Master operation: Namespace management and locking

SLIDE 31

Master operation: Namespace management and locking

Creating new files under a directory, e.g., create /dir/file3, /dir/file4, ......

Concurrent mutations in the same directory are allowed:

  • Key: only a read lock is taken on the directory.
  • By write-locking the full pathname, the master locks each new file name before the file is created.
  • This prevents two clients from creating a file with the same name simultaneously.

SLIDE 32

Master operation: Replica placement

  • Serves two purposes:
  • Maximize data reliability and availability
  • Maximize network bandwidth utilization
  • Spread chunk replicas across racks:
  • Ensures chunk survivability
  • Exploits the aggregate read bandwidth of multiple racks
  • Trade-off: write traffic has to flow through multiple racks

SLIDE 33

Master operation: Creation, Re-replication, Rebalancing

  • Chunk replicas are created for three reasons:
  • Chunk creation
  • Chunk re-replication
  • Rebalancing

SLIDE 34

Master operation: Creation, Re-replication, Rebalancing

  • New chunks are created on chunk servers.
  • The master has to decide which chunk servers should be used for chunk creation:
  • Put new replicas on chunk servers with below-average disk space utilization.
  • Limit the number of recent creations on each chunk server (creation itself is cheap, but it predicts imminent heavy write traffic).
  • Spread replicas of a chunk across different racks (see the sketch below).

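A sketch of the creation-time placement policy just listed (field names, the scoring order, and the replica-count default are illustrative):

```python
def place_new_replicas(servers, n=3):
    """Pick chunk servers for a new chunk's n replicas.
    servers: list of dicts like {"id": ..., "rack": ...,
             "disk_used_frac": ..., "recent_creations": ...}"""
    avg = sum(s["disk_used_frac"] for s in servers) / len(servers)
    # Prefer servers with below-average disk utilization and few recent
    # creations (creation is cheap, but it predicts heavy write traffic).
    candidates = sorted(
        (s for s in servers if s["disk_used_frac"] <= avg),
        key=lambda s: (s["recent_creations"], s["disk_used_frac"]),
    )
    chosen, racks = [], set()
    for s in candidates:
        if s["rack"] not in racks:  # spread replicas across racks
            chosen.append(s)
            racks.add(s["rack"])
        if len(chosen) == n:
            break
    return chosen  # may return fewer than n if there are too few racks
```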

SLIDE 35

Master operation: Creation, Re-replication, Rebalancing

  • The master re-replicates a chunk as soon as the number of available replicas falls below a user-specified goal.
  • Each chunk that needs to be re-replicated is prioritized based on several factors.
  • The master picks the highest-priority chunk and “clones” it by instructing some chunk server to copy the chunk data directly from an existing replica.
  • Additionally, each chunk server limits the amount of bandwidth it spends on each clone operation by throttling its read requests to the source chunk server.

SLIDE 36

Master operation: Creation, Re-replication, Rebalancing

  • The master rebalances replicas periodically:
  • It examines the current replica distribution and moves replicas for better disk space and load balancing.
  • The master gradually fills up a new chunk server rather than instantly swamping it with new chunks and the heavy write traffic that comes with them.
  • The master must also choose which existing replica to remove.
  • It prefers to remove replicas on chunk servers with below-average free space, so as to equalize disk space usage.

SLIDE 37

Master operation: Garbage collection

  • Any replica not known to the master is garbage.
  • The master logs a deletion immediately, like other changes, but it does not reclaim the resources right away.
  • The file is renamed to a hidden name that includes the deletion timestamp.
  • During the master's regular scan, it removes hidden files that have existed for more than 3 days.
  • After the hidden file is removed from the namespace, its in-memory metadata is erased (sketched below).

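A sketch of the hidden-rename mechanism just described (the three-day window matches the paper's default; the namespace-as-dict layout and name format are assumptions):

```python
import time

GRACE_PERIOD = 3 * 24 * 3600  # hidden files survive for three days

def delete_file(namespace, name):
    # Log the deletion (omitted here), then just rename the file to a
    # hidden name carrying the deletion timestamp. No storage is
    # reclaimed yet, and the file can still be read or undeleted.
    hidden = ".deleted.{}.{}".format(int(time.time()), name)
    namespace[hidden] = namespace.pop(name)

def scan_namespace(namespace):
    # The master's regular scan drops hidden files older than the grace
    # period; erasing the in-memory metadata orphans the chunks, which
    # chunk servers then erase after a later heartbeat exchange.
    now = time.time()
    for n in list(namespace):
        if n.startswith(".deleted."):
            ts = int(n.split(".", 3)[2])
            if now - ts > GRACE_PERIOD:
                del namespace[n]
```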

SLIDE 38

Master operation: Garbage collection

  • Advantages of lazy GC:
  • Simple
  • Reliable in a large-scale distributed system
  • It is done when the master is relatively free
  • Provides a safety net against accidental deletion
  • Replica deletion:
  • GC provides a uniform way to clean up any replicas not known to be useful.

SLIDE 39

Questions and answers

1) “…its design has been driven by key observations of our application workloads and technological environment,…” What are the workload and technology characteristics GFS assumed in its design, and what are their corresponding design choices?

A) Answered in Slides 7 and 8.

SLIDE 40

Questions and answers

2) “…while caching data blocks in the client loses its appeal.” GFS does not cache file data. Why does this design choice not lead to performance loss? What benefit does this choice have?

A) Because reads are large and streaming and the cache size is limited, GFS does not cache file data; cached data would quickly be overwritten for lack of space anyway. The benefit is that more memory space is left available.

SLIDE 41

Questions and answers

3) “Small files must be supported, but we need not optimize for them.” Why?

A) Because it is assumed that files are going to be large, and there are not too many small files to take care of.

SLIDE 42

Questions and answers

4) “Clients interact with the master for metadata operations, but all data-bearing communication goes directly to the chunk servers.” How does this design help improve the system’s performance?

A) Control-flow messages are used to get chunk locations, and data-flow messages are used to access the data, which uses network bandwidth efficiently. It also reduces the load on the master, which improves efficiency.

SLIDE 43

Questions and answers

5) “A GFS cluster consists of a single master…” What’s the benefit of having only a single master? What’s its potential performance risk? How does GFS minimize such a risk?

A) The single-master architecture simplifies the design and enables the master to make sophisticated chunk placement and replication decisions; in other words, it increases flexibility. The risks are that the master can become a bottleneck, so its involvement must be minimized, and that it is a single point of failure. The single point of failure is addressed by shadow masters; the bottleneck issue is mitigated by keeping only metadata on the master.

SLIDE 44

Questions and answers

6) “Each chunk replica is stored as a plain Linux file on a chunk server and is extended only as needed.” How does GFS collaborate with the chunk server's local file system to store file chunks? What’s lazy space allocation and what’s its benefit?

A) GFS tries to keep the data distribution balanced, spreading chunks over different chunk servers. With lazy space allocation, GFS does not physically allocate space for a chunk up front; the underlying Linux file is extended only as data is written, up to the 64 MB limit. This reduces the chance of internal fragmentation.

SLIDE 45

Questions and answers

7) “On the other hand, a large chunk size, even with lazy space allocation, has its disadvantages.” Give an example disadvantage.

A) A small file occupies a single chunk, so a file accessed by many clients at once can become a hot spot. GFS fixed this problem by storing such executables with a higher replication factor. Another solution is to allow clients to read data from other clients in such situations.

8) “One potential concern for this memory-only approach is that the number of chunks and hence the capacity of the whole system is limited by how much memory the master has.” Why is GFS’s master able to keep the metadata in memory?

A) Since GFS uses large 64 MB chunks with only about 64 bytes of metadata each, memory is not a problem (a ratio of 64 B per 64 MB).

SLIDE 46

Questions and answers

9) “We use leases to maintain a consistent mutation order across replicas.” Could you show a scenario where an unexpected result may appear if the lease mechanism is not implemented? Also explain how leases help address the problem.

A) The absence of leases could result in data inconsistency across the replicas. For instance, the mutations might be applied in different orders on different chunk servers.

SLIDE 47

Questions and answers

9) (continued) How leases address the problem:

A) A GFS client sends data to the chunk servers, where it is kept in a buffer. The master designates a primary chunk server, which applies the mutations to its own storage. The primary then sends the order in which it applied the mutations to the other, secondary chunk servers. The secondaries apply the mutations in the same order as the primary, which keeps the replicas consistent.

SLIDE 48

Questions and answers

10) “When the master creates a chunk, it chooses where to place the initially empty replicas.” What are the criteria for choosing where to place the initially empty replicas?

A) First, place new replicas on chunk servers with below-average disk space utilization (this equalizes disk utilization over time). Second, spread the replicas across different racks. Third, to avoid exhausting a chunk server with heavy write traffic, limit the number of recent replica creations on each chunk server.

SLIDE 49

Questions and answers

11) “The master re-replicates a chunk as soon as the number of available replicas falls below a user-specified goal.” When a new chunk server is added to the system, the master mostly uses chunk rebalancing rather than new chunk creation to fill it up. Why?

A) The master rebalances onto a new chunk server gradually, because this ensures the server is not immediately swamped by heavy write traffic, while still balancing the load across chunk servers.

SLIDE 50

Questions and answers

12) “After a file is deleted, GFS does not immediately reclaim the available physical storage. It does so only lazily during regular garbage collection at both the file and chunk levels.” How are files and chunks deleted? What are the advantages of delayed space reclamation (garbage collection) over eager deletion?

A) Mechanism: When a file is deleted by the application, the master logs the deletion immediately. The file is renamed to a hidden name that includes the deletion timestamp. During the master's regular scan, it removes any such hidden files that have existed for more than three days. After a hidden file is removed from the namespace, its in-memory metadata is erased.

SLIDE 51

Questions and answers

12) (continued) Advantages of delayed space reclamation:

  • Simple
  • Reliable in a large-scale distributed system
  • It is done when the master is relatively free; eager deletion could put too much load on the master
  • Provides a safety net against accidental deletion

SLIDE 52

Conclusion

  • GFS demonstrates the qualities essential for large-scale data processing on commodity hardware.
  • Fault tolerance comes from constant monitoring, replication, and fast, automatic recovery.
  • It delivers high aggregate throughput to many concurrent readers and writers by decoupling control flow from data flow.

SLIDE 53

Q & A