SLIDE 1

Flat Datacentre Storage

Sumit Mokashi

SLIDE 2

SLIDE 3
  • Why is the storage described as a “flat” one?
  • “In FDS, data is logically stored in blobs. ... Reads from and writes to a blob are done in units called tracts.” What are blobs and tracts? Are they of constant sizes?
SLIDE 4
  • For storage systems that use just a hash function to eliminate metadata tables: H(GUID, tract #) → disk IDs (0, …, 9999)
  • DHT, consistent hashing
  • For FDS, using a hash function plus the TLT: H(GUID, tract #) → index into the TLT
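The two addressing schemes can be sketched as follows; the hash function, table contents, and server names are illustrative assumptions, not code from the paper.

```python
import hashlib

NUM_DISKS = 10000  # illustrative cluster size


def h(guid: str, tract: int) -> int:
    """Deterministic hash of (blob GUID, tract number)."""
    digest = hashlib.sha256(f"{guid}:{tract}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


# Scheme 1: hash directly to a disk ID, with no metadata table at all.
def locate_direct(guid: str, tract: int) -> int:
    return h(guid, tract) % NUM_DISKS  # disk ID in 0..9999


# Scheme 2 (FDS-style): hash to an *index* into the tract locator table
# (TLT); the TLT entry lists the tractservers holding that tract's replicas.
tlt = [["ts-a", "ts-b"], ["ts-c", "ts-d"], ["ts-e", "ts-f"]]  # toy TLT


def locate_fds(guid: str, tract: int) -> list[str]:
    return tlt[h(guid, tract) % len(tlt)]
```

The extra level of indirection is the key difference: clients still compute a plain hash, but the metadata server can repair a failure or rebalance load by rewriting TLT entries, without rehashing or moving unrelated data.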

SLIDE 5
  • “In our cluster, tracts are 8MB”. Why is a tract in FDS sized this large?
  • “Tractservers do not use a file system.” Explain this design choice.
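One way to reason about the 8MB tract size is seek amortization. The disk numbers below are typical assumed figures, not measurements from the paper:

```python
SEEK_MS = 10.0          # assumed average seek + rotational delay, ms
BANDWIDTH_MBPS = 128.0  # assumed sequential transfer rate, MB/s


def efficiency(tract_mb: float) -> float:
    """Fraction of disk time spent transferring data rather than seeking."""
    transfer_ms = tract_mb / BANDWIDTH_MBPS * 1000
    return transfer_ms / (transfer_ms + SEEK_MS)


for size in (0.064, 1, 8, 64):
    print(f"{size} MB tract: {efficiency(size):.0%} of disk time doing useful I/O")
```

With these assumptions an 8MB tract keeps the disk transferring data more than 80% of the time even under random access, while staying small enough to spread a blob across many disks; tiny tracts would waste most of the disk's time on seeks.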
SLIDE 6

SLIDE 7

SLIDE 8
  • “FDS uses a metadata server, but its role during normal operations is simple and limited:…” What are potential drawbacks of using a centralized metadata server? How does FDS address the issue?
  • How does FDS locate the tractserver that stores a particular tract of a given blob? Why does FDS first compute a tract locator (an index to an entry of the tract locator table) and then use that entry to find the tractserver, rather than directly identifying a tractserver with a hash function, without such a table?
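A minimal sketch of why the indirection through the table pays off when a tractserver dies (the table contents and helper are illustrative, not the paper's code): the metadata server rewrites only the affected rows, and every client's hash-to-index computation stays valid.

```python
# Toy TLT: each row lists the tractservers for tracts hashing to that row.
tlt = [["ts-a", "ts-b"], ["ts-b", "ts-c"], ["ts-c", "ts-a"]]


def handle_failure(tlt: list[list[str]], dead: str, spares: list[str]) -> int:
    """Replace a dead tractserver in every TLT row that names it.

    Returns the number of rows rewritten; rows not naming the dead
    server, and all client-side hash computations, are untouched.
    """
    rewritten = 0
    for row in tlt:
        if dead in row:
            row[row.index(dead)] = spares[rewritten % len(spares)]
            rewritten += 1
    return rewritten


# With a direct hash-to-server scheme there is no table to patch:
# removing one server changes the modulus, so nearly every tract's
# computed location changes and data must move even between healthy servers.
```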

SLIDE 9

SLIDE 10
  • “To be clear, the TLT does not contain complete information about the location of individual tracts in the system,” and, in the GFS paper, “The master maintains less than 64 bytes of metadata for each 64 MB chunk.” Compare the TLT with GFS’s use of a full chunk-to-chunkserver mapping table in the context of efficiency, scalability, and flexibility. [Hint: “It is not modified by tract reads and writes.” “Its size in a single-replicated system is proportional to the number of tractservers in the system…”]
  • “In our 1,000 disk cluster, FDS recovers 92GB lost from a failed disk in 6.2 seconds.” What is the normal throughput of a hard disk? What is the throughput of this recovery? How can this be possible? [Hint: Describe the procedure of recovering from a dead tractserver to answer this question. See Figure 2 and read Section 3.3.]
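The recovery figures quoted above can be sanity-checked with some arithmetic (the single-disk throughput is an assumed typical figure, not from the paper):

```python
LOST_GB = 92        # data lost from the failed disk (from the paper)
RECOVERY_S = 6.2    # recovery time (from the paper)
DISK_MBPS = 130     # assumed sequential throughput of one hard disk, MB/s

# Aggregate rate at which the cluster rebuilt the lost data.
aggregate_mbps = LOST_GB * 1024 / RECOVERY_S

print(f"Aggregate recovery rate: {aggregate_mbps:,.0f} MB/s")
print(f"Equivalent to roughly {aggregate_mbps / DISK_MBPS:.0f} disks at full speed")
```

The aggregate rate is two orders of magnitude beyond any single disk, which is only possible because replicas of the failed disk's tracts are scattered across the cluster, so during recovery hundreds of disks each read and rewrite a small share in parallel (Section 3.3).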

SLIDE 11

SLIDE 12

SLIDE 13

SLIDE 14

References:

  • https://www.usenix.org/system/files/conference/osdi12/osdi12-final-75.pdf
  • http://ranger.uta.edu/~sjiang/CSE6350-spring-19/lecture-6.pdf