SLIDE 1

CSE 6350 File and Storage System Infrastructure in Data Centers Supporting Internet-wide Services

Flat Datacenter Storage
Presenter: Saroj Panda

SLIDE 2

Agenda

  • FDS ARCHITECTURE
  • DATA PLACEMENT
  • ANSWER TO QUESTIONS
  • HANDLING FAILURE
  • Q & A
  • REFERENCES
SLIDE 3

FDS Architecture

• Data is logically stored in blobs. A blob is a byte sequence named with a 128-bit GUID.
• Reads and writes are done in units called tracts.
• Each disk is managed by a tract server.
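A minimal sketch of this data model in Python (the `Blob` class and its methods are illustrative stand-ins, not the FDS client API):

```python
import uuid

TRACT_SIZE = 8 * 1024 * 1024  # 8 MB, the tract size used on the paper's disk cluster

class Blob:
    """Toy model: a blob is a byte sequence named by a 128-bit GUID,
    accessed in fixed-size units called tracts."""

    def __init__(self, guid=None):
        # The GUID may be chosen by the application or assigned randomly.
        self.guid = guid if guid is not None else uuid.uuid4().int
        self._tracts = {}  # tract number -> bytes (stand-in for tract servers)

    def write_tract(self, i, data):
        assert len(data) <= TRACT_SIZE
        self._tracts[i] = data

    def read_tract(self, i):
        return self._tracts[i]
```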

SLIDE 4

FDS Architecture Cont…

• Reads and writes do not go through the metadata server.
• The metadata server keeps track of the list of active tract servers in a table called the TLT (Tract Locator Table).
• When a client application starts, it contacts the metadata server to get the TLT.

SLIDE 5
  • Q1. “Consider a centralized file server in a small computer science department. Data stored by any computer can be retrieved by any other. This conceptual simplicity makes it easy to use: computation can happen on any computer, even in parallel, without regard to first putting data in the right place.” Is GFS a centralized file system? To achieve good performance for an I/O-intensive program running on GFS, how does the data placement affect its performance?

Answer:
• GFS can be viewed as a centralized file system because each read/write request first goes through the Master (the metadata server).
• Practically, though, it is a distributed file system.
• In GFS the data is distributed as chunks across chunk servers. Processing of different chunks can be done in parallel at different chunk servers. This distribution of load improves the performance of the system.

SLIDE 6
  • Q2. “The root of this cascade of consequences was the locality constraint, itself rooted in the datacenter bandwidth shortage.” Show example consequences of relying on locality in a program’s execution. Why does a sufficient I/O bandwidth help remove the constraint?

Answer:
• Locality constraints (computing where the data is) can hinder efficient resource utilization.
• Example (stragglers): if data is singly replicated, a single unexpectedly slow machine can hinder an entire job’s timely completion; the faster machines, after finishing their own work, must wait for the straggler to complete its part.
• Under the locality constraint, re-tasking computation to other nodes would be expensive, involving large data movements.
• With sufficient I/O bandwidth, all of a cluster’s disk bandwidth could be exposed to applications, and there would be no distinction between local and remote disks. Re-tasking a straggler’s work to another machine would then not require expensive data movement.

SLIDE 7

DATA PLACEMENT

• The FDS metadata server collects the list of active tract servers into a table called the Tract Locator Table (TLT).
• In a single-replicated system, each TLT entry contains the address of a single tract server. With k-way replication, each entry lists k tract servers.
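A TLT row under k-way replication might look like the following sketch (server names and the exact row layout are invented for illustration; the version field, used during failure handling, is described on a later slide):

```python
# Hypothetical in-memory TLT with 3-way replication: each row lists k = 3
# tract servers plus a version number used to invalidate stale entries.
tlt = [
    {"version": 1, "servers": ["ts-01", "ts-17", "ts-42"]},  # locator 0
    {"version": 1, "servers": ["ts-02", "ts-23", "ts-31"]},  # locator 1
    # ... one row per locator value; size grows with the number of servers
]
```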

SLIDE 8

DATA PLACEMENT

• To read or write tract number i from a blob with GUID g, a client first selects an entry in the TLT by computing an index into it, called the tract locator.
• The client sends a write to every tract server listed in that entry.
• Applications are notified after write acknowledgments arrive from all replicas.
• Reads are served by any single tract server in the entry.
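In the paper the tract locator is computed as (hash(g) + i) mod TLT_length. A minimal sketch of the resulting read/write path, reusing the row layout above (`send` and `fetch` are placeholders for the FDS messaging layer, and SHA-1 is an assumed choice of hash):

```python
import hashlib

def tract_locator(guid_bytes, i, tlt_length):
    # Index of the TLT row for tract i of blob g: (hash(g) + i) mod table length.
    h = int.from_bytes(hashlib.sha1(guid_bytes).digest(), "big")
    return (h + i) % tlt_length

def write_tract(tlt, guid_bytes, i, data, send):
    # Writes go to every replica in the row; the application is notified
    # only after all replicas acknowledge.
    row = tlt[tract_locator(guid_bytes, i, len(tlt))]
    return all(send(server, ("WRITE", guid_bytes, i, data)) for server in row["servers"])

def read_tract(tlt, guid_bytes, i, fetch):
    # Reads need only one replica; any server in the row will do.
    row = tlt[tract_locator(guid_bytes, i, len(tlt))]
    return fetch(row["servers"][0], ("READ", guid_bytes, i))
```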

SLIDE 9

HANDLING FAILURE

• Tract servers periodically send heartbeat messages to the metadata server.
• On timeout, the metadata server assumes the tract server is dead.
• The metadata server invalidates the current TLT by incrementing the version number of each row in which the failed tract server appears.
• It randomly picks replacement servers and fills those places in the TLT.
• It updates the affected servers.
• It hands out the new TLT to clients when queried.
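A sketch of the invalidation step under these rules (illustrative only; the parallel copying of tract data between replicas is omitted):

```python
import random

def handle_failure(tlt, dead, live_servers):
    # Invalidate every row listing the failed server by bumping its version,
    # then fill the vacancy with a randomly chosen live server.
    for row in tlt:
        if dead in row["servers"]:
            row["version"] += 1
            slot = row["servers"].index(dead)
            candidates = [s for s in live_servers if s not in row["servers"]]
            row["servers"][slot] = random.choice(candidates)
```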

SLIDE 10
  • Q3. “In FDS, data is logically stored in blobs. ... Reads from and writes to a blob are done in units called tracts.” What are blobs and tracts? Are they of constant sizes? What are their respective equivalents in GFS?

Answer:
• Blobs are byte sequences identified by 128-bit globally unique identifiers (GUIDs). The GUID can either be selected by the application or assigned randomly by the system.
• Tracts are the units of data read from and written to blobs.
• A blob can be any length up to the system’s storage capacity.
• Tracts are sized so that random and sequential accesses achieve the same throughput.
• The tract size is fixed for the storage technology used.
• A blob is equivalent to a file, and a tract to a chunk, in GFS.

SLIDE 11
  • Q4. “In our cluster, tracts are 8MB”. Why is a tract in FDS sized this large?

Answer:
• Tracts are sized such that random and sequential accesses achieve nearly the same throughput.
• The tract size is set when the cluster is created, based on the cluster’s storage hardware. For example, if flash were used instead of disks, the tract size could be made far smaller (e.g., 64 kB).
• If the tract size is small, the client needs many writes for a large blob, which slows the process. A larger tract size reduces the number of writes.
• In the experiments performed by the authors, 8 MB was the size that gave the same throughput for random and sequential access on their disks.
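A back-of-the-envelope calculation shows the effect; the positioning time and media rate below are assumed typical disk figures, not numbers from the paper:

```python
# Assumed figures: ~12 ms average positioning time, ~115 MB/s sequential
# media rate. Random-access efficiency for one tract-sized transfer is
#   transfer_time / (positioning_time + transfer_time).
SEEK_S = 0.012
RATE_BPS = 115e6
for size in (64e3, 1e6, 8e6):
    t = size / RATE_BPS
    print(f"{size / 1e6:>4.2f} MB tract: {t / (SEEK_S + t):.0%} of sequential throughput")
# -> 0.06 MB: 4%, 1.00 MB: 42%, 8.00 MB: 85%; large tracts amortize the seek.
```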

SLIDE 12
  • Q5. “Tract servers do not use a file system.” Explain this design choice.

Answer:
• FDS is a storage system, not a file system.
• Tract servers directly handle raw disks. Blobs are divided into tracts and numbered sequentially. Client applications communicate with FDS through an API that abstracts some of the complexity of the messaging layer.
• Tract servers and their network protocol are not exposed directly to FDS applications. Instead, these details are hidden in a client library with a narrow and straightforward interface.
• This design choice avoids the file-system overhead of maintaining a hierarchical structure and keeping it in memory for efficiency.

SLIDE 13
  • Q6. “FDS uses a metadata server, but its role during normal operations is simple and limited: …” What are the drawbacks of using a centralized metadata server? How does FDS address this issue?

Answer:
• A centralized metadata server becomes a central point of failure.
• Tract servers locally store their positions in the Tract Locator Table (TLT).
• The metadata server collects this list from all the active tract servers.
• When a client application starts, it contacts the metadata server to get the TLT.
• The TLT is cached at the client.
• In case of a metadata server failure, the TLT is reconstructed by collecting the table assignments from each tract server, without losing any information, as sketched below.
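A sketch of that reconstruction, assuming each tract server can report the (row, slot, version) positions it occupies; the `tlt_positions` method and tract-server object shape are hypothetical:

```python
def rebuild_tlt(tract_servers):
    # Each tract server reports the TLT positions it occupies; the recovering
    # metadata server reassembles the table from those reports.
    rows = {}
    for ts in tract_servers:
        for row_idx, slot, version in ts.tlt_positions():
            row = rows.setdefault(row_idx, {"version": version, "servers": {}})
            row["version"] = max(row["version"], version)
            row["servers"][slot] = ts.name
    return [rows[i] for i in sorted(rows)]
```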

SLIDE 14

  • Q7. How does FDS locate the tract server that stores a particular tract of a given blob? Why does FDS first identify a tract locator (an index to an entry of the tract locator table) and then use that entry to find the tract server, rather than directly identifying a tract server using a hash function without having such a table?

Answer:
• Tract number i from a blob with GUID g is located by computing an index into the TLT, called the tract locator, using a hash function.
• The TLT row found there contains all the tract server replicas for that tract.
• Directly identifying a tract server with a hash function causes problems when that tract server goes down: the hash function would keep returning the dead server, and the client application could not retrieve the tract data.
• With the tract locator approach, the tracts of the dead server are copied from the other replicas to active tract servers, and the dead server’s name is replaced by new tract server names in the TLT. This allows the client application to retrieve its data from any of the replicas even when a tract server fails.

SLIDE 15
  • Q8. “To be clear, the TLT does not contain complete information about the location of individual tracts in the system.” and, in the GFS paper, “The master maintains less than 64 bytes of metadata for each 64 MB chunk.” Compare the TLT with GFS’s use of a full chunk-to-chunk-server mapping table in the context of efficiency, scalability, and flexibility. [Hint: “It is not modified by tract reads and writes.” “Its size in a single replicated system is proportional to the number of tract servers in the system…”.]

Answer:
• GFS’s chunk size is large, leading to fewer chunks. The master can therefore keep the chunk-to-chunk-server mapping and other metadata in memory, which is efficient and scalable, and flexible because a chunk’s server can be looked up directly from the chunk. When new chunks are written, new chunk-to-chunk-server mappings are added at the master. Even so, the single master can become a potential bottleneck.
• In FDS the tract size can be small and there can be many tract servers. The metadata server maintains the tract locator table, which it hands to a client when the client application first starts. If a new tract maps to an existing TLT row through the hash function, the TLT need not be updated. A client first reaches a row in the TLT and only then the tract server, so FDS is less flexible than GFS. Because tracts are small, the TLT grows faster in a single-replicated system, making it less scalable than GFS. However, since tract servers also keep copies of their positions in the TLT, the table can be reconstructed from those copies after a metadata server failure without losing any information, making the system more resilient than GFS.

SLIDE 16
  • Q9. “In our 1,000 disk cluster, FDS recovers 92GB lost from a failed disk in 6.2 seconds.” What is the normal throughput of a hard disk? What is the throughput of this recovery? How can this be possible?

Answer:
• The typical throughput of a hard disk is 100-150 MB/s.
• The throughput of recovering 92 GB in 6.2 seconds is 92 / 6.2 ≈ 14.84 GB/s.
• This is possible because, when a tract server fails, FDS recovers its tracts by copying them from the replica servers to many different tract servers in parallel; in other words, it distributes the recovery read/write load across multiple tract servers.
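The arithmetic, using the paper’s numbers:

```python
# The paper's numbers: 92 GB recovered in 6.2 s on a 1,000-disk cluster.
lost_gb, seconds, disks = 92, 6.2, 1000
aggregate_gbps = lost_gb / seconds             # ~14.8 GB/s cluster-wide
per_disk_mbps = aggregate_gbps * 1000 / disks  # ~15 MB/s per disk
print(f"{aggregate_gbps:.1f} GB/s aggregate, ~{per_disk_mbps:.0f} MB/s per disk")
# ~15 MB/s per disk is far below a single disk's 100-150 MB/s, so spreading
# the recovery across the whole cluster easily sustains this rate.
```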

SLIDE 17

Q & A

SLIDE 18

References

https://www.usenix.org/conference/osdi12/technical-sessions/presentation/nightingale
http://paperswelove.org/2015/video/san-francisco-1-15-alex-rasmussen-presents-flat-datacenter-storage/