Google File System
CSE 454
From the paper by Ghemawat, Gobioff & Leung

The Need
• Component failures are the norm
  – A consequence of clustered commodity computing
• Files are huge by traditional standards (many TB)
• Most mutations are appends
  – Not random-access overwrites
• Co-designing apps & file system
• Typical deployment: 1000 nodes & 300 TB

Desiderata
• Must monitor & recover from component failures
• Modest number of large files
• Workload
  – Large streaming reads + small random reads
  – Many large sequential writes
  – Random-access overwrites don't need to be efficient
• Need semantics for concurrent appends
• High sustained bandwidth
  – More important than low latency

Interface
• Familiar
  – Create, delete, open, close, read, write
• Novel
  – Snapshot: low-cost copy
  – Record append: atomicity with multiple concurrent writers

Architecture
• All files stored in fixed-size chunks
  – 64 MB each
  – 64-bit unique handle
  – Triple redundancy (each chunk on three chunkservers)
• One master (metadata only), many clients, many chunkservers (data only)
Architecture (cont.)
• Master stores all metadata
  – Namespace
  – Access-control information
  – Chunk locations
  – 'Lease' management
• Heartbeats between master & chunkservers
• Having one master → global knowledge
  – Allows better placement / replication
  – Simplifies design
• Client
  – GFS client code implements the API
  – Caches only metadata, never file data

Client Read Protocol
• Using the fixed chunk size, client translates filename & byte offset to a chunk index
• Client sends request to master (filename & chunk index)
• Master replies with chunk handle & locations of the chunkserver replicas (including which is 'primary')
• Client caches this info using filename & chunk index as the key
• Client requests data from the nearest chunkserver: "chunk handle & index into chunk"
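The offset-to-chunk translation and the client-side cache can be sketched as follows. This is an illustrative sketch, not GFS code: `locate` and `ask_master` are hypothetical names standing in for the client library and the master RPC.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunk size

def chunk_index(byte_offset: int) -> int:
    # Which chunk of the file holds this byte?
    return byte_offset // CHUNK_SIZE

def chunk_offset(byte_offset: int) -> int:
    # Offset of the byte within that chunk.
    return byte_offset % CHUNK_SIZE

# Hypothetical client cache, keyed by (filename, chunk index) as on the slide.
_cache = {}

def locate(filename, byte_offset, ask_master):
    """Return (chunk_handle, replica_locations), consulting the master
    only on a cache miss. `ask_master` stands in for the real RPC."""
    key = (filename, chunk_index(byte_offset))
    if key not in _cache:
        _cache[key] = ask_master(*key)
    return _cache[key]
```

Because the cache key is (filename, chunk index), repeated reads within the same 64 MB chunk never touch the master again.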
• Often the initial request asks about a sequence of chunks
• No need to talk to the master again about this 64 MB chunk until the cached info expires or the file is reopened

Metadata
• Master stores three types
  – File & chunk namespaces
  – Mapping from files → chunks
  – Locations of chunk replicas
• Stored in memory
• Kept persistent through logging

Consistency Model
• Consistent = all clients see the same data
• Defined = consistent + clients see the full effect of the mutation
• Inconsistent = different clients may see different data
• Key: all replicas must process chunk-mutation requests in the same order
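The ordering requirement above can be sketched in a few lines: one replica (the lease holder, covered on the next slides) picks a serial order and every replica applies mutations in that order, buffering any that arrive early. Class and method names here are illustrative, not the GFS RPC interface.

```python
import itertools

class Replica:
    """Applies mutations strictly in serial-number order,
    buffering any that arrive out of order."""
    def __init__(self):
        self.data = []
        self._pending = {}
        self._next = 1

    def apply(self, serial, mutation):
        self._pending[serial] = mutation
        while self._next in self._pending:
            self.data.append(self._pending.pop(self._next))
            self._next += 1

class Primary(Replica):
    """The lease holder: assigns one serial order to concurrent
    mutations and forwards that order to the secondaries."""
    def __init__(self, secondaries):
        super().__init__()
        self._serial = itertools.count(1)
        self._secondaries = secondaries

    def mutate(self, mutation):
        n = next(self._serial)
        self.apply(n, mutation)
        for s in self._secondaries:
            s.apply(n, mutation)
        return n
```

Since every replica applies the same sequence, their chunk contents stay identical (consistent), whatever order concurrent clients issued their requests in.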
Implications
• Apps must rely on appends, not overwrites
• Must write records that
  – Self-validate
  – Self-identify
• Typical uses
  – Single writer writes file from beginning to end, then renames file (or checkpoints along the way)
  – Many writers concurrently append
    • At-least-once semantics OK
    • Readers deal with padding & duplicates

Leases & Mutation Order
• Objective
  – Ensure data is consistent & defined
  – Minimize load on master
• Master grants a 'lease' to one replica
  – Called the 'primary' chunkserver
• Primary serializes all mutation requests
  – Communicates the order to the replicas

Write Control & Dataflow
[diagram slide: control & data flow for a write]

Atomic Appends
• As in the last slide, but…
• Primary also checks whether the append spills over into a new chunk
  – If so, pads the old chunk to its full extent
  – Tells the secondary chunkservers to do the same
  – Tells the client to retry the append on the next chunk
• Usually works because
  – max(append size) < ¼ chunk size [API rule]
  – (meanwhile other clients may be appending)

Other Issues
• Fast snapshot
• Master operation
  – Namespace management & locking
  – Replica placement & rebalancing
  – Garbage collection (deleted / stale files)
  – Detecting stale replicas

Master Replication
• Master log & checkpoints are replicated
• An outside monitor watches master liveness
  – Starts a new master process as needed
• Shadow masters
  – Provide read access when the primary master is down
  – Lag the state of the true master
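The primary's spill-over check can be sketched as a single decision function. This is an assumption-laden sketch (`plan_append` and its return values are invented names, not GFS internals), but it shows why the ¼-chunk API rule makes the retry terminate: a record that triggers padding always fits in the fresh chunk.

```python
CHUNK_SIZE = 64 * 1024 * 1024
MAX_APPEND = CHUNK_SIZE // 4   # the slide's API rule: append size < 1/4 chunk

def plan_append(bytes_used: int, record_len: int):
    """Spill-over check performed by the primary (sketch).
    Returns ('write', offset) when the record fits in the current chunk,
    or ('pad_and_retry', pad_len): pad the chunk to its full extent on
    primary and secondaries, then have the client retry on the next chunk."""
    if record_len > MAX_APPEND:
        raise ValueError("record exceeds max append size")
    if bytes_used + record_len <= CHUNK_SIZE:
        return ("write", bytes_used)
    return ("pad_and_retry", CHUNK_SIZE - bytes_used)
```

Note the trade-off the slide alludes to: padding wastes at most one record's worth of space per chunk (bounded by the ¼-chunk rule), in exchange for keeping every record wholly inside one chunk.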
[Performance figures: read performance, write performance, record-append performance]