The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung SOSP 2003 presented by Kun Suo
Outline
- GFS background, concepts, and key words
- Examples of GFS operations
- Some optimizations in GFS
- Evaluation
GFS is a scalable distributed file system for large distributed data-intensive applications. It runs on inexpensive commodity hardware and provides fault tolerance and high performance to a large number of clients.
Background:
- GFS shares many goals with previous distributed file systems, such as performance, scalability, reliability, and availability
- It is built from inexpensive commodity components that often fail
- Files are huge and are mutated mostly by appending; many clients may concurrently append to the same file
- GFS supports the usual operations to create, delete, open, close, read, and write files
- A single master holds all metadata; chunk location information will be updated periodically through heartbeat messages
- The master also handles chunk lease management, garbage collection, load balance, etc.
- Leases are used to keep a consistent order of mutations for all replicas
(Question) "... driven by key observations of our application workloads and technological environment, ..." What are the workload and technology characteristics GFS assumed in its design, and what are their corresponding design choices?
--> GFS design assumptions and target workload
(1) Component failures are the norm: GFS runs on commodity components that often fail --> constant monitoring, replication, and fast recovery
(2) Files are huge and mutated mostly by appends; many clients concurrently append to the same file --> large chunks, a relaxed consistency model, and atomic record append
(Question) GFS clients and chunkservers do not cache file data. Why does this design choice not lead to performance loss? What benefit does this choice have?
(1) Applications stream through huge files; (2) working sets are too large to cache. Client caches therefore offer little benefit. However, clients still cache metadata for future access. Benefits: (a) it simplifies the design of GFS; (b) it eliminates cache-coherence issues, which are challenging in a distributed system.
(Question) "Small files must be supported, but we need not optimize for them." Why? Large and small files exist in almost every system.
(a) GFS is designed to store millions of large files, each typically 100 MB or larger in size.
(b) The chunkservers storing the chunks of small files may become hot spots if many clients access the same file. In practice, hot spots have not been a major issue because Google's applications mostly read large multi-chunk files sequentially.
(c) Weak support for small files is one of the disadvantages of GFS.
[Example] GFS read operation (Application, Client, Master, Chunkservers)
(1) Application originates the read request (file name, byte range)
(2) Client translates the request to (file name, chunk index) and sends it to the master
(3) Master responds with the chunk handle and replica locations
(4) Client picks a replica location and sends the request (chunk handle, byte range) to that chunkserver
(5) Chunkserver sends the requested data to the client
(6) Client forwards the data to the application
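The six steps above can be sketched as a toy client-side read path. This is a minimal in-memory model with illustrative names (`Master`, `Chunkserver`, `gfs_read` are assumptions for this sketch, not the real GFS API):

```python
import random

CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses 64 MB chunks

class Master:
    """Toy master: maps (file name, chunk index) -> (handle, replicas)."""
    def __init__(self):
        self.chunk_table = {("/logs/web.log", 0): ("handle-17", ["cs1", "cs2", "cs3"])}

    def lookup(self, file_name, chunk_index):
        return self.chunk_table[(file_name, chunk_index)]

class Chunkserver:
    """Toy chunkserver holding chunk contents keyed by handle."""
    def __init__(self):
        self.chunks = {"handle-17": b"x" * 1024}

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

def gfs_read(master, chunkservers, file_name, offset, length):
    # (2) client translates (file name, byte range) to (file name, chunk index)
    chunk_index = offset // CHUNK_SIZE
    # (3) master replies with the chunk handle and replica locations
    handle, replicas = master.lookup(file_name, chunk_index)
    # (4) client picks a replica (the real client picks the closest; here, random)
    server = chunkservers[random.choice(replicas)]
    # (5) client reads (chunk handle, byte range) directly from that chunkserver
    return server.read(handle, offset % CHUNK_SIZE, length)

cs = {name: Chunkserver() for name in ["cs1", "cs2", "cs3"]}
data = gfs_read(Master(), cs, "/logs/web.log", 0, 16)
assert data == b"x" * 16
```

Note that the master is only consulted for metadata; the file data itself never flows through it.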
[Example] GFS write operation (Application, Client, Master, Primary chunk replica, Secondary chunk replicas)
(1) Application originates the write request (file name, byte range, data)
(2) Client translates the request and sends it to the master
(3) Master responds with the chunk handle and (primary and secondary) replica locations
(4) Client pushes the write data to all locations; the data is stored in the chunkservers' internal buffers
(5) Client sends the write command to the primary
(6) Primary determines a serial order for the data instances in its buffer and writes the instances in that order to the chunk
(7) Primary forwards the serial order to the secondaries and tells them to perform the write
(8) Secondaries respond back to the primary
(9) Primary responds back to the client
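Steps (4)-(9) can be sketched as follows: data reaches every replica's buffer first, and only then does the primary assign one serial order that all replicas apply. The `Replica` class and its method names are illustrative, not the real implementation:

```python
class Replica:
    def __init__(self):
        self.buffer = {}   # step (4): pushed data sits in an internal buffer
        self.chunk = b""

    def push_data(self, data_id, data):   # step (4): data flow, any order
        self.buffer[data_id] = data

    def apply(self, order):               # steps (6)-(7): control flow, one order
        for data_id in order:
            self.chunk += self.buffer.pop(data_id)

primary, sec1, sec2 = Replica(), Replica(), Replica()
replicas = [primary, sec1, sec2]

# Two concurrent mutations are pushed to every replica's buffer.
for r in replicas:
    r.push_data("w1", b"AA")
    r.push_data("w2", b"BB")

# Steps (5)-(6): the primary picks a serial order; steps (7)-(8): the
# secondaries perform the writes in exactly that order.
order = ["w2", "w1"]
for r in replicas:
    r.apply(order)

assert primary.chunk == sec1.chunk == sec2.chunk == b"BBAA"
```

The key point is the separation: data may arrive at replicas in any order, but the mutation order is fixed once, by the primary.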
Record append, used heavily by Google's distributed applications: the application specifies only the data, not the offset at which the data is to be written. GFS appends the record atomically at an offset of its own choosing and returns that offset to the client, so record append is atomic and works for concurrent writers. This is the key difference from a traditional write, where the client specifies the offset.
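A minimal sketch of this semantics (class and method names are hypothetical): because the primary, not the client, chooses the offset, two concurrent appenders can never clobber each other's records:

```python
class AppendChunk:
    """Toy model of atomic record append at a primary replica."""
    def __init__(self):
        self.data = bytearray()

    def record_append(self, record):
        offset = len(self.data)   # the primary, not the client, picks the offset
        self.data += record
        return offset             # returned so the client can locate its record

chunk = AppendChunk()
off_a = chunk.record_append(b"producer-A")
off_b = chunk.record_append(b"producer-B")
assert bytes(chunk.data[off_b:off_b + 10]) == b"producer-B"
```

With a traditional write, both clients would have had to agree on offsets out of band (or serialize through a lock) to avoid overwriting each other.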
(Question) "Clients interact with the master for metadata operations, but all data-bearing communication goes directly to the chunkservers." How does this design help improve the system's performance?
The single master is a potential bottleneck; separating control flow from data flow minimizes clients' involvement with the master node in reads and writes.
(Question) What's the benefit of having only a single master? What's its potential performance risk? How does GFS minimize such a risk?
(1) It simplifies the design. (2) It is a potential bottleneck. (3) GFS minimizes clients' involvement with the master in reads and writes: clients cache metadata and exchange data directly with chunkservers.
(Question) "Each chunk replica is stored as a plain Linux file on a chunkserver and is extended only as needed." How does GFS collaborate with the chunkserver's local file system to store file chunks? What's lazy space allocation, and what's its benefit?
GFS is composed of many servers; each server is typically a commodity Linux machine running a user-level server process. A chunk is ultimately stored on the local server as a regular Linux file, with the help of the local file system. Lazy allocation simply means not allocating a resource until it is actually needed. Benefit: lazy space allocation avoids wasting space due to internal fragmentation, perhaps the greatest objection against such a large chunk size.
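The idea can be illustrated with a toy block store (real GFS simply relies on the Linux file system extending files on demand; the `LazyChunk` class here is an assumed model, not the actual mechanism): creating a 64 MB chunk reserves nothing, and a small write allocates only the blocks it touches.

```python
class LazyChunk:
    """Toy lazily-allocated chunk: blocks materialize on first write."""
    BLOCK = 4096

    def __init__(self):
        self.blocks = {}  # block index -> bytearray, allocated only on write

    def write(self, offset, data):
        for i, byte in enumerate(data):
            pos = offset + i
            blk = self.blocks.setdefault(pos // self.BLOCK, bytearray(self.BLOCK))
            blk[pos % self.BLOCK] = byte

    def allocated_bytes(self):
        return len(self.blocks) * self.BLOCK

chunk = LazyChunk()                      # a fresh chunk occupies no space yet
assert chunk.allocated_bytes() == 0
chunk.write(0, b"hello")                 # 5 bytes allocate one 4 KB block,
assert chunk.allocated_bytes() == 4096   # not the full 64 MB chunk
```

This is why a 64 MB chunk size does not waste 64 MB per small file: a chunk holding 1 KB of data occupies roughly 1 KB of disk.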
(Question) "A large chunk size, even with lazy space allocation, has its disadvantages." Give an example disadvantage.
[Example] hot spot for small files: a small file consists of a small number of chunks, perhaps just one. The chunkservers storing those chunks may become hot spots if many clients are accessing the same file. In practice, hot spots did develop when GFS was first used by a batch-queue system: the few chunkservers storing an executable file were overloaded by hundreds of simultaneous requests. This was fixed by storing such executables with a higher replication factor and by making the batch-queue system stagger application start times.
(Question) "... the number of chunks and hence the capacity of the whole system is limited by how much memory the master has." Why is GFS's master able to keep the metadata in memory?
Because the chunk size is large (64 MB), each chunk needs less than 64 bytes of metadata, so the metadata is small enough to fit in the master's memory.
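A quick back-of-the-envelope check of that claim, using the 64 MB chunk size and the paper's sub-64-byte-per-chunk bound (the 1 PB figure is an assumed example):

```python
CHUNK_SIZE = 64 * 2**20      # 64 MB chunks
META_PER_CHUNK = 64          # upper bound: less than 64 bytes per chunk

data = 2**50                 # assume 1 PB of file data in the cluster
chunks = data // CHUNK_SIZE
metadata = chunks * META_PER_CHUNK

assert chunks == 16_777_216          # ~16.8 million chunks
assert metadata == 2**30             # at most 1 GiB of chunk metadata
```

So even a petabyte of file data needs on the order of a gigabyte of master memory for chunk metadata, which comfortably fits in RAM on a single machine.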
(Question) "We use leases to maintain a consistent mutation order across replicas." Could you show a scenario where an unexpected result may appear if the lease mechanism is not implemented? Also explain how leases help address the problem.
Without a lease, two concurrent mutations may reach different replicas in different network orders: one replica applies them in the primary's order while another applies them in a non-primary order, so the replicas diverge. With a lease, the master grants one replica (the primary) the authority to pick a single serial order for all mutations to the chunk, and the secondary replicas simply follow the primary's order, keeping the mutation order consistent across all replicas.
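A minimal sketch of the failure scenario and the fix. Here each mutation overwrites the same byte, so the final state depends entirely on application order (the helper function is illustrative):

```python
def apply_all(order):
    """Apply mutations in the given order; each overwrites the same cell."""
    state = None
    for mutation in order:
        state = mutation
    return state

# Without a lease: each replica applies mutations in its own arrival order.
r1 = apply_all(["A", "B"])   # replica 1 ends holding "B"
r2 = apply_all(["B", "A"])   # replica 2 ends holding "A" -- replicas diverge
assert r1 != r2

# With a lease: the master grants one primary the right to serialize, and
# every replica applies the primary's single order, so all replicas agree.
primary_order = ["A", "B"]
assert apply_all(primary_order) == apply_all(primary_order) == "B"
```

The lease also has a timeout, so if the primary dies, the master can safely grant a new lease after the old one expires.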
(Performance) Record append is used heavily in practice: many clients on different machines can append to the same file concurrently without extra synchronization on the file namespace. Applications cope with possible duplicate records by writing self-validating, self-identifying records, and such a file often serves as a producer-consumer queue or as a merge point for results from many machines.
(Question) "... chooses where to place the initially empty replicas." What are the criteria for choosing where to place the initially empty new replicas?
(1) Place new replicas on chunkservers with below-average disk-space utilization (balance)
(2) Limit the number of "recent" creations on each chunkserver (a new chunk implies imminent heavy write traffic)
(3) Spread replicas of a chunk across racks (reliability)
(Question) "The master re-replicates a chunk as soon as the number of available replicas falls below a user-specified goal." When a new chunkserver is added to the system, why does the master mostly fill it up through chunk rebalancing rather than by placing new chunks on it?
Because of criteria (2) and (3): new chunks bring imminent heavy write traffic (heavy I/O flow would swamp the new server, which is bad), and concentrating them there puts all the eggs in one basket, which is not safe.
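The three placement criteria can be sketched as a simple selection function (a hypothetical simplification; the real master's policy is richer and also weighs network topology):

```python
def choose_replicas(servers, n=3, recent_cap=2):
    """Pick n chunkservers for a new chunk's initially empty replicas."""
    chosen, racks = [], set()
    # Criterion 1: favor servers with below-average disk utilization
    # (sorting ascending by utilization approximates this preference).
    for s in sorted(servers, key=lambda s: s["used"]):
        if len(chosen) == n:
            break
        if s["recent"] >= recent_cap:   # criterion 2: cap "recent" creations
            continue
        if s["rack"] in racks:          # criterion 3: spread across racks
            continue
        chosen.append(s["name"])
        racks.add(s["rack"])
    return chosen

servers = [
    {"name": "cs1", "used": 0.30, "recent": 0, "rack": "r1"},
    {"name": "cs2", "used": 0.35, "recent": 0, "rack": "r2"},
    {"name": "cs3", "used": 0.40, "recent": 5, "rack": "r3"},  # too many recent creations
    {"name": "cs4", "used": 0.45, "recent": 0, "rack": "r1"},  # same rack as cs1
    {"name": "cs5", "used": 0.50, "recent": 0, "rack": "r3"},
]
assert choose_replicas(servers) == ["cs1", "cs2", "cs5"]
```

Note how cs3 is skipped despite low utilization (imminent write load) and cs4 is skipped despite having space (rack diversity wins).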
(Question) "After a file is deleted, GFS does not immediately reclaim the available physical storage. It does so only lazily during regular garbage collection at both the file and chunk levels." How are files and chunks deleted? What are the advantages of delayed space reclamation (garbage collection) over eager deletion?
File: when a file is deleted by the application, the master logs the deletion immediately, but the file is just renamed to a hidden name that includes the deletion timestamp. During the master's regular scan of the file system namespace, it removes any such hidden files that have existed for more than three days, and only then removes the namespace entry, in-memory metadata, etc.
Chunk: in a similar regular scan, the master identifies chunks that are no longer reachable from any file, erases their metadata, and lets chunkservers learn of (and delete) such orphaned replicas through the heartbeat message exchange.
Advantages: (1) simple and reliable for large distributed systems; (2) it merges storage reclamation into the regular background activities of the master, reducing overhead and burden on the master node; (3) it guards against accidental, irreversible deletion.
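The file-level half of this scheme can be sketched in a few lines. The hidden-name convention (`.deleted-<timestamp>-<path>`) is an assumed encoding for the sketch, not GFS's actual naming:

```python
GRACE = 3 * 24 * 3600   # three-day grace period, in seconds
namespace = {"/logs/a": b"data-a", "/logs/b": b"data-b"}

def delete(path, now):
    """Deletion is just a rename to a hidden, timestamped name."""
    hidden = f".deleted-{now}-{path}"
    namespace[hidden] = namespace.pop(path)

def namespace_scan(now):
    """Regular background scan: reclaim hidden files older than the grace period."""
    for name in list(namespace):
        if name.startswith(".deleted-"):
            deleted_at = int(name.split("-")[1])
            if now - deleted_at > GRACE:
                del namespace[name]   # metadata gone; orphaned chunks are
                                      # reclaimed later via heartbeats

delete("/logs/a", now=0)
namespace_scan(now=1000)              # within grace period: still recoverable
assert any(n.startswith(".deleted-") for n in namespace)
namespace_scan(now=GRACE + 1)         # after three days: actually reclaimed
assert list(namespace) == ["/logs/b"]
```

During the grace period an accidental deletion can be undone simply by renaming the hidden file back, which is exactly advantage (3).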
Evaluation setup: a GFS cluster of one master, 16 chunkservers, and 16 clients, each machine connected by 100 Mbps Ethernet to one of two switches (the switches are linked by 1 Gbps Ethernet).
Reads: N clients read from a 320 GB file set simultaneously. Efficiency drops as the number of clients goes up, due to the growing probability of multiple clients reading from the same chunkserver simultaneously.
Writes: N clients write to N distinct files simultaneously. The aggregate write bandwidth is lower than the theoretical limit, largely because each byte must be propagated among multiple replicas.
Record appends: N clients append to a single file simultaneously. Throughput drops as N goes up, due to network congestion caused by the different clients, but in practice this has not been a major issue even with many clients appending to large shared files.
Conclusion: GFS is widely used within Google as the storage platform for research and development as well as large-scale data processing. It is a scalable distributed file system for large distributed data-intensive applications, which runs on inexpensive commodity hardware and provides fault tolerance and high performance to a large number of clients. GFS borrows from previous distributed file systems but has its own innovations and limitations (single-master bottleneck, designed for large files, small-file hot spots, etc.).
Real-world apps and services: a distributed file system from Alibaba serves millions of products, whose descriptions, comments, transactions, etc. are all small files. Its optimization for small files: one chunk contains many small files, located through a hierarchy of indexes (1st-level index through Nth-level index). The system has been open sourced.
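The packing idea above can be sketched as a chunk with an in-chunk index (a hypothetical single-level model; the real system described uses a multi-level index hierarchy):

```python
class PackedChunk:
    """Toy chunk packing many small files, with an index mapping
    file id -> (offset, length) inside the chunk."""
    def __init__(self):
        self.data = bytearray()
        self.index = {}   # 1st-level index: file id -> (offset, length)

    def put(self, file_id, blob):
        self.index[file_id] = (len(self.data), len(blob))
        self.data += blob

    def get(self, file_id):
        offset, length = self.index[file_id]
        return bytes(self.data[offset:offset + length])

chunk = PackedChunk()
chunk.put("desc-001", b"product description")
chunk.put("cmt-001", b"a comment")
assert chunk.get("cmt-001") == b"a comment"
```

Packing amortizes per-file metadata: the master tracks one chunk instead of thousands of tiny files, sidestepping GFS's small-file weakness.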
References
- gfs.ppt
- cmsc818k/Lectures/gfs-hdfs.pdf
- google-file-system-gfs-presentation
- google-file-system-gfs