CSE 6350 File and Storage System Infrastructure in Data Centers Supporting Internet-wide Services
Flat Datacenter Storage
Presenter: Saroj Panda
Agenda
FDS Architecture
Data Placement
Answers to Questions
Data is logically stored in blobs. A blob is a byte sequence
named with a 128-bit GUID.
Reads and writes are done in units called tracts.
Each disk is managed by a tract server.
Reads and writes do not go through the metadata server.
The metadata server keeps track of the list of active tract servers in a table called the Tract Locator Table (TLT).
A client application contacts the metadata server when it starts, to get the TLT.
“… stored by any computer can be retrieved by any other. This conceptual simplicity makes it easy to use: computation can happen on any computer, even in parallel, without regard to …” … performance for an I/O-intensive program running on GFS, how does the data placement affect its performance?
Answer:
GFS can be viewed as a centralized file system, as each read/write request first consults the master (metadata server) for chunk locations.
Practically, it is a distributed file system. In GFS the data is distributed as chunks across chunk servers. Different chunks can be processed in parallel at different chunk servers. This distribution of load improves the performance of the system.
“… the datacenter bandwidth shortage.” Show example consequences of relying on locality in a program’s execution. Why does sufficient I/O bandwidth help remove the constraint?
Answer:
Locality constraints (running computation where the data is) can sometimes hinder efficient resource utilization.
Example (stragglers): if data is singly replicated, a single unexpectedly slow machine can delay an entire job, because the faster machines finish their parts and then have to wait for the straggler to complete its part.
With a locality constraint, re-tasking computation to other nodes would be expensive, since it involves large data movements.
With sufficient I/O bandwidth, all of a cluster’s disk bandwidth could be exposed to applications. There would be no distinction between local and remote storage, so re-tasking computation to another node would no longer require expensive data movement.
The FDS metadata server collects a list of the active tract servers, called the Tract Locator Table (TLT).
In a singly replicated system, each TLT entry contains the address of a single tract server. With k-way replication, each entry has k tract servers.
To read or write tract number i from a blob with GUID g, a client first selects an entry in the TLT by computing an index into it called the tract locator.
For a write, the client sends the write to every tract server in that entry. The application is notified only after all replicas have acknowledged the write.
A read goes to a single tract server from the entry.
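A minimal sketch of the lookup described above, assuming the tract locator takes the form (Hash(g) + i) mod TLT_Length given in the FDS paper. The use of SHA-1 here, the function name `tract_locator`, and the four-row example table with server names `ts0` to `ts3` are illustrative assumptions, not part of the slides.

```python
import hashlib
import random


def tract_locator(guid: bytes, tract_number: int, tlt_length: int) -> int:
    """Return the TLT row index for tract `tract_number` of blob `guid`."""
    # Hash only the GUID, then add the tract number, so that consecutive
    # tracts of one blob land on consecutive TLT rows.
    h = int.from_bytes(hashlib.sha1(guid).digest(), "big")
    return (h + tract_number) % tlt_length


# Hypothetical TLT with 2-way replication: each row lists k = 2 tract servers.
tlt = [["ts0", "ts1"], ["ts1", "ts2"], ["ts2", "ts3"], ["ts3", "ts0"]]
guid = bytes(16)  # placeholder 128-bit GUID

rows = [tract_locator(guid, i, len(tlt)) for i in range(4)]

row = tlt[rows[0]]
write_targets = list(row)         # a write goes to every replica in the row
read_target = random.choice(row)  # a read goes to a single replica
```

Because only the GUID is hashed, the first four tracts of this blob touch all four rows once, which spreads a large sequential read or write over every row of the table.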
Tract servers periodically send heartbeat messages to the metadata server.
On a timeout, the metadata server assumes the tract server is dead.
The metadata server invalidates the current TLT by incrementing the version number of each row in which the failed tract server appears.
It randomly picks replacement tract servers to fill the empty slots in the TLT.
It sends the updates to the affected tract servers.
It hands out the new TLT to clients when they query for it.
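The failure-handling steps above can be sketched as follows. This is a simplified illustration, not the FDS implementation: the function name `handle_failure`, the list-of-lists table, and the separate `versions` list are assumptions made for the example, and the data copy to the replacement server is omitted.

```python
import random


def handle_failure(tlt, versions, dead, live_servers, rng=random):
    """Replace `dead` in every TLT row it appears in and bump that row's version.

    Returns the indices of the rows that changed.
    """
    changed = []
    for index, row in enumerate(tlt):
        if dead not in row:
            continue
        versions[index] += 1  # requests tagged with the old version are now stale
        # Pick a random live server that is not already a replica in this row.
        candidates = [s for s in live_servers if s not in row]
        row[row.index(dead)] = rng.choice(candidates)
        changed.append(index)
    return changed


# Hypothetical 3-row table with 2-way replication; server "B" has timed out.
tlt = [["A", "B"], ["B", "C"], ["C", "A"]]
versions = [1, 1, 1]
changed = handle_failure(tlt, versions, "B", ["A", "C", "D"], random.Random(0))
```

Only the rows that contained the failed server get a new version number, so clients holding a cached TLT can keep using the untouched rows.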
“… in units called tracts.” What are blobs and tracts? Are they of constant sizes? What are their respective equivalents in GFS?
Answer:
Blobs are byte sequences identified by a 128-bit globally unique identifier (GUID). The GUID can either be selected by the application or assigned randomly by the system.
Tracts are the units in which data is read from and written to blobs.
A blob can be any length up to the system’s storage capacity, so blobs are not of constant size.
Tracts are sized so that random and sequential accesses have nearly the same throughput. The tract size is fixed for the storage technology used.
A blob is equivalent to a file in GFS, and a tract is equivalent to a chunk.
Answer:
Tracts are sized such that random and sequential access achieve nearly the same throughput.
The tract size is set when the cluster is created, based on the cluster’s storage hardware. For example, if flash were used instead of disks, the tract size could be made far smaller (e.g., 64 kB).
If the tract size is small, a client needs many writes for a large blob, and the process would be slow. If the tract size is large, the number of writes is reduced.
In the authors’ experiments, 8 MB was the tract size that gave nearly the same throughput for random and sequential access.
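A rough back-of-the-envelope model shows why a larger tract brings random access close to sequential throughput on a disk. The 10 ms positioning time and the 125 MB/s sequential rate are assumed illustrative figures, not measurements from the paper, and the function name is hypothetical.

```python
def random_read_efficiency(tract_mb, seek_s=0.010, seq_mb_per_s=125.0):
    """Fraction of sequential throughput achieved when every tract costs one seek.

    seek_s and seq_mb_per_s are assumed, illustrative disk parameters.
    """
    transfer_s = tract_mb / seq_mb_per_s
    return transfer_s / (seek_s + transfer_s)


for size_mb in (0.064, 1.0, 8.0):
    share = random_read_efficiency(size_mb)
    print(f"{size_mb:>6} MB tract: {share:.0%} of sequential throughput")
```

Under these assumptions the seek dominates for a 64 kB tract, while for an 8 MB tract most of the time is spent transferring data. On flash, where the positioning cost is far lower, a much smaller tract gives the same effect.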
Answer:
FDS is a storage system, not a file system. The tract servers directly handle raw disks. Blobs are divided into tracts, which are numbered … abstracts some of the complexity around the messaging layer.
Tract servers and their network protocol are not exposed directly to FDS applications. Instead, these details are hidden in a client library with a narrow and straightforward interface.
This design choice removes the file system overhead of maintaining a hierarchical file system structure and keeping it in memory for efficiency.
“… limited:…” What are the drawbacks of using a centralized metadata server? How does FDS address the issue?
Answer:
A centralized metadata server becomes a single point of failure.
Tract servers locally store their positions in the Tract Locator Table (TLT). The metadata server collects this list from all the active tract servers.
When a client application starts, it contacts the metadata server to get the TLT, which is then cached in the client.
If the metadata server fails, the TLT is reconstructed by collecting the table assignments from each tract server, without losing any information.
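The reconstruction step can be illustrated with a small sketch. It assumes, for illustration only, that each tract server reports the list of TLT row indices it was assigned to; the function name `rebuild_tlt` and the report format are hypothetical.

```python
def rebuild_tlt(reports, tlt_length):
    """Rebuild the TLT from per-server reports of the rows each server holds.

    reports maps a tract server name to the list of row indices it stored locally.
    """
    tlt = [[] for _ in range(tlt_length)]
    for server in sorted(reports):
        for row_index in reports[server]:
            tlt[row_index].append(server)
    return tlt


# Hypothetical table held by the metadata server before it failed.
original = [["ts0", "ts1"], ["ts1", "ts2"], ["ts2", "ts0"]]

# What each tract server remembers locally about its own assignments.
reports = {}
for row_index, row in enumerate(original):
    for server in row:
        reports.setdefault(server, []).append(row_index)

rebuilt = rebuild_tlt(reports, len(original))
```

The rebuilt table holds the same set of servers in every row as the original, which is why the metadata server keeps no state that cannot be recovered from the tract servers.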
Q7. How does FDS locate the tract server that stores a particular tract of a given blob? Why does FDS first identify a tract locator (an index to an entry of the tract locator table) and then look in the entry to find the tract server, rather than directly identifying a tract server using a hash function without having such a table?
Answer:
To locate tract number i of a blob with GUID g, the client applies a hash function to compute an index into the TLT, called the tract locator.
The TLT row at that index lists all the tract servers that hold replicas of the tract.
Directly identifying a tract server with a hash function causes a problem when that tract server goes down: the hash function will keep returning the dead tract server, and the client application cannot retrieve the tract data.
With the tract locator approach, the tracts of the dead tract server are copied from the other replicas to other active tract servers, and the dead tract server’s name is replaced by the new tract servers’ names in the TLT. This allows the client application to retrieve its data from any of the replicas even when a tract server fails.
“… individual tracts in the system.” and in the GFS paper “The master maintains less than 64 bytes of metadata for each 64 MB chunk.” Compare the TLT with GFS’s use of a full chunk-to-chunk-server mapping table in the context of efficiency, scalability, and flexibility. [Hint: “It is not modified by tract reads and writes.” “Its size in a single replicated system is proportional to the number of tract servers in the system…”.]
Answer:
The GFS chunk size is large, which leads to fewer chunks. The metadata server is therefore able to keep the chunk-to-chunk-server mapping and other metadata in memory for efficiency. The mapping is also flexible, since each chunk can be placed on any chunk server and its location is looked up directly from the chunk. When new chunks are written, new chunk-to-chunk-server mappings are added to the metadata, so the table grows with the amount of data.
In FDS the tract size can be small and there can be many tract servers. The metadata server maintains the tract locator table, which is passed to the client when the client application starts. The TLT is not modified by tract reads and writes: a newly written tract maps to an existing row in the TLT through the hash function, so the TLT need not be updated. This keeps the metadata server off the data path, which helps efficiency.
Because the TLT’s size in a singly replicated system is proportional to the number of tract servers rather than the number of tracts, it does not grow as data is written, which helps scalability.
The trade-off is flexibility: the client first reaches a row in the TLT and then the tract server, so the placement of an individual tract is fixed by the hash function, whereas GFS can choose the location of each chunk individually.
As tract servers also keep copies of their positions in the TLT, the TLT can be reconstructed from those copies after a metadata server failure without losing any information.
What is the normal throughput of a hard disk? What’s the throughput of this recovery? How can this be possible?
Answer:
The typical throughput of a hard disk is 100-150 MB/s.
FDS recovered 92 GB lost from a failed disk in 6.2 seconds, a throughput of 92/6.2 = 14.84 GB/s.
This is possible because, when a tract server fails, FDS recovers its tracts by copying them from the replica servers to different tract servers in parallel. In other words, the read/write load of recovery is distributed across many tract servers.
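The arithmetic above can be checked with a short calculation that also estimates how many disks must work in parallel to reach that rate. It assumes 1 GB = 1000 MB and that each disk contributes its full 100-150 MB/s, and it ignores that each recovered tract is both read and written, so the disk counts are only a rough lower bound. The function name is hypothetical.

```python
import math

lost_gb = 92.0
recovery_s = 6.2

# Aggregate recovery throughput in GB/s.
recovery_gb_per_s = lost_gb / recovery_s


def disks_needed(rate_gb_per_s, per_disk_mb_per_s):
    """Minimum number of disks that together sustain the given rate."""
    return math.ceil(rate_gb_per_s * 1000 / per_disk_mb_per_s)


print(f"recovery throughput: {recovery_gb_per_s:.2f} GB/s")
print(f"disks at 150 MB/s: {disks_needed(recovery_gb_per_s, 150)}")
print(f"disks at 100 MB/s: {disks_needed(recovery_gb_per_s, 100)}")
```

A single disk at 100-150 MB/s would need roughly 10 to 15 minutes to move 92 GB, so a 6.2 second recovery only makes sense when on the order of a hundred or more disks share the work.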
https://www.usenix.org/conference/osdi12/technical-sessions/presentation/nightingale
http://paperswelove.org/2015/video/san-francisco-1-15-alex-rasmussen-presents-flat-datacenter-storage/