

  1. Can a file system virtualize processors?
     Lex Stein, Microsoft Research Asia
     David Holland, Harvard University
     Margo Seltzer, Harvard University
     Zheng Zhang, Microsoft Research Asia

  2. A mystery: what is happening to José's program?
     [Chart: ideal vs. realized throughput over months. Ideal throughput is 22; realized throughput falls from 12 to 10 to 6 as utilization drops from 100% to 75%, 70%, and 42%.]

  3. Let's look at the program
     ● An iterative solution to the 1D wave equation:
       A(i,t+1) = 2.0 * A(i,t) - A(i,t-1) + c * (A(i-1,t) - 2.0 * A(i,t) + A(i+1,t))
     [Diagram: the wave array partitioned across four processors]
     ● The slow processors are holding the fast processors back
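     For concreteness, a minimal C sketch of one synchronous timestep of this stencil; the function name, the boundary handling, and the constant c are illustrative assumptions, not part of the deck.

       /* One timestep of the 1D wave-equation stencil above:
        * a_prev = A(., t-1), a_cur = A(., t), a_next = A(., t+1). */
       static void wave_step(const double *a_prev, const double *a_cur,
                             double *a_next, int n, double c)
       {
           for (int i = 1; i < n - 1; i++) {          /* interior points only */
               a_next[i] = 2.0 * a_cur[i] - a_prev[i]
                         + c * (a_cur[i - 1] - 2.0 * a_cur[i] + a_cur[i + 1]);
           }
           a_next[0] = a_cur[0];                      /* fixed boundaries (assumed) */
           a_next[n - 1] = a_cur[n - 1];
       }

     Run synchronously across partitions, every processor must finish timestep t before any can start timestep t+1; that coupling is exactly what the rest of the deck removes.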

  4. The problem: ungraceful degradation
     processor heterogeneity + synchronization = performance cliff
     [Chart: throughput of the synchronous program versus the percentage of slow processors (0-100%), showing a sharp drop as soon as some processors are slow.]

  5. Abstracting away processor heterogeneity
     How can we write and run programs to:
     ● use heterogeneous processors efficiently?
     ● without knowing the details of the machine?
     The Desynchronizing File System:
     ● write: a programming model
     ● run: a runtime system (DesyncFS)

  6. Return to the wave equation
     [Diagram: the wave array partitioned across four processors]
     What if we designed a system that:
     – Allows the fast to charge ahead
     – Actively moves data from the fast to the slow
     – Transparently adjusts partitions to shift work away from the slow

  7. Design: data and execution
     1. Data model: how is application data structured?
     2. Execution model: how is data computed?

  8. Design: DesyncFS data model
     ● A block is an application data container of a fixed number of bytes. Blocks can have any size, including zero
     ● A file is an N-dimensional, block-addressable space. N ≥ 3: 1 dimension for file ID, 1 for versions, and at least 1 for data
       – Example: a 5D file containing 3D data (file ID, versions, data): ([0] [0 1000] [0 3] [0 3] [0 3])
       – An example block address: ([0] [100] [1] [3] [2])
     ● A chunk is a contiguous N-dimensional rectangular set of blocks
       – An example chunk: ([0] [98 100] [0 1] [0 3] [1 2])
       – This chunk has 3 * 2 * 4 * 2 = 48 blocks, 3 versions, and 2 * 4 * 2 = 16 blocks per version
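     A minimal C sketch of what the addressing types could look like; the deck treats blockaddr and chunkdesc as opaque, so the field names and the fixed-size coordinate arrays here are assumptions made purely for illustration.

       #define DESYNCFS_MAX_DIMS 8    /* assumed bound on file dimensionality */

       /* A block address: one coordinate per dimension, e.g. ([0] [100] [1] [3] [2]). */
       typedef struct blockaddr {
           int ndims;                          /* N: file ID + versions + data dims */
           int coord[DESYNCFS_MAX_DIMS];
       } blockaddr;

       /* A chunk: a contiguous N-dimensional rectangle of blocks,
        * e.g. ([0] [98 100] [0 1] [0 3] [1 2]). */
       typedef struct chunkdesc {
           int ndims;
           int lo[DESYNCFS_MAX_DIMS];          /* inclusive lower corner */
           int hi[DESYNCFS_MAX_DIMS];          /* inclusive upper corner */
       } chunkdesc;

       /* Blocks in a chunk: the product of per-dimension extents
        * (3 * 2 * 4 * 2 = 48 for the example chunk above). */
       static long chunk_nblocks(const chunkdesc *c)
       {
           long n = 1;
           for (int d = 0; d < c->ndims; d++)
               n *= (long)(c->hi[d] - c->lo[d] + 1);
           return n;
       }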

  9. Design: DesyncFS data model (diagram)
     [Diagram: an example 3D file with file ID 1, described by ([1] [0 3] [0 2]); one axis for versions (0 to 3) and one for block IDs (0 to 2).
      Example chunk regions: ([1] [0 3] [0]), ([1] [0 3] [1 2]), and ([1] [2] [0 2]); the last is a special kind of chunk, a version slice of file 1 at version 2.
      Block Y has address ([1] [1] [2]), i.e. version 1, block ID 2.]

  10. Design: data and execution
      1. Data model: how is application data structured?
      2. Execution model: how is data computed?

  11. Design: DesyncFS execution model
      ● An application defines a compute function: it maps 0 or more existing blocks to 1 or more new blocks
      ● This function is stateless. All state is stored in blocks
      ● Blocks are immutable
      ● Computation is achieved by generating new blocks

  12. Design: DesyncFS execution model (high level)
      ● The file system, not the application, controls execution
      ● The application provides constraints on the execution order
        – Dependencies (correctness)
        – Hints (performance)
        – Example: Y = F(X0, X1)
      [Diagram: traditional control flow (the app asks the FS to get [X0, X1], then computes F and produces Y) versus DesyncFS control flow (the FS tells the app "do [Y]"; the app asks the FS to get [X0, X1], computes F, and produces Y).]

  13. Design: DesyncFS execution model
      ● Programs do not specify the exact schedule of block computation; instead, they constrain the execution schedule by providing dependency information:
        – File system: I am considering block Y; what do I need to compute it?
        – Application: You need blocks A, B, and C
      ● Programs express a preference among the correct execution schedules by hinting a good execution ordering:
        – File system: Which of blocks X, Y, Z should I consider first?
        – Application: Try block Y, then ask me again

  14. Design: DesyncFS execution model (detailed view)
      [Sequence diagrams: traditional approach vs. DesyncFS.
       Traditional: the application reads [X0, X1] from the file system, receives X0, X1, computes Y = F(X0, X1), and writes Y.
       DesyncFS: the file system asks the application get-prereqs [Y]; the application replies prereqs [Y] = [X0, X1]; the file system checks [X0, X1] and then tells the application compute [Y]; the application reads [X0, X1], computes Y = F(X0, X1), and writes Y.]
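      A hypothetical C sketch of the DesyncFS side of this exchange, using the callback and system-call names introduced on the next two slides; the baddrslist layout, the meaning of appIterNext's return value, and the requeue-on-miss behavior are assumptions, and a real runtime would also fetch missing blocks from other nodes.

        /* Assumed layout; the deck does not define baddrslist. */
        typedef struct baddrslist {
            int n;
            blockaddr addr[16];
        } baddrslist;

        /* Drive computation over one chunk: ask the application which block to
         * consider next (hint), what that block needs (dependencies), check that
         * the dependencies exist, and only then ask it to compute the block.
         * Prototypes are as declared on slides 16 and 17. */
        static void run_chunk(const chunkdesc *file, const chunkdesc *chunk)
        {
            void *iter = appIterInit(chunk);        /* "which block should I consider first?" */
            blockaddr y;

            while (appIterNext(iter, &y)) {         /* assumed: nonzero while blocks remain */
                baddrslist deps;
                appDepList(&y, file, &deps, 0);     /* "what do I need to compute Y?" */

                int ready = 1;
                for (int i = 0; i < deps.n; i++)
                    if (!desyncfsExists(&deps.addr[i]))
                        ready = 0;                  /* check [X0, X1] */

                if (ready)
                    appCompute(&y, file);           /* compute [Y]: the app reads deps, writes Y */
                /* else: a real runtime would requeue Y or pull its dependencies
                 * from another node before retrying (not shown) */
            }
            appIterDone(iter);
        }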

  15. Design: three models (summary)
      1. Data model: how is application data structured?
      2. Execution model: how is control flow structured?
      [Diagram: the application and DesyncFS interact through application callbacks (computation and execution) and DesyncFS system calls (data reads and writes).]

  16. Design: DesyncFS application callbacks
      // Computation: the means to compute any block
      void appCompute(const blockaddr *block_address, const chunkdesc *file);

      // Dependencies: the blocks that must exist to compute a block
      void appDepList(const blockaddr *block_address, const chunkdesc *file,
                      baddrslist *dep_list, int dir);

      // Iteration: hints to execute through a chunk
      void *appIterInit(const chunkdesc *chunk);
      int appIterNext(void *iter, blockaddr *block_address);
      void appIterDone(void *iter);
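      As an illustration of the dependency callback, here is how the wave-equation example might express its dependencies under the 3D addressing scheme of slide 9 ([file ID] [version] [block ID]); the treatment of the dir argument, the one-block halo, and the assumed baddrslist layout from the earlier sketch are all guesses, not the deck's code.

        /* Dependencies of block ([f] [v] [i]): the same partition and its spatial
         * neighbours at version v-1, plus the same partition at version v-2. */
        void appDepList(const blockaddr *block_address, const chunkdesc *file,
                        baddrslist *dep_list, int dir)
        {
            (void)dir;                              /* meaning of dir is not shown in the deck */
            int f = block_address->coord[0];
            int v = block_address->coord[1];        /* version (timestep) */
            int i = block_address->coord[2];        /* block ID (spatial partition) */
            dep_list->n = 0;

            if (v == 0)                             /* initial condition: nothing needed */
                return;

            for (int j = i - 1; j <= i + 1; j++) {  /* halo exchange with neighbours */
                if (j < file->lo[2] || j > file->hi[2])
                    continue;                       /* skip neighbours outside the file */
                blockaddr d = { .ndims = 3, .coord = { f, v - 1, j } };
                dep_list->addr[dep_list->n++] = d;
            }
            if (v >= 2) {                           /* the A(i, t-1) term */
                blockaddr d = { .ndims = 3, .coord = { f, v - 2, i } };
                dep_list->addr[dep_list->n++] = d;
            }
        }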

  17. Design: DesyncFS system calls (summary)
      typedef void *rd_handle;

      int desyncfsExists(const blockaddr *block_address);
      rd_handle desyncfsRead(const blockaddr *block_address, const void **datap, int *lenp);
      void desyncfsWrite(const blockaddr *block_address, void *data, int len);
      void desyncfsFree(rd_handle dp);
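      To close the loop, a sketch of the wave-equation compute callback written against these calls; the block payload (a bare array of doubles per partition), the constant WAVE_C, the assumption that v >= 2, the omission of neighbour halo cells, and the transfer of buffer ownership to the file system are all illustrative assumptions, not the deck's code.

        #include <stdlib.h>

        #define WAVE_C 0.1    /* assumed wave constant c */

        void appCompute(const blockaddr *block_address, const chunkdesc *file)
        {
            (void)file;
            int f = block_address->coord[0];
            int v = block_address->coord[1];        /* version being produced (assume v >= 2) */
            int i = block_address->coord[2];        /* spatial partition */

            /* Read A(i, t) and A(i, t-1): the same partition at the two previous versions. */
            blockaddr cur  = { .ndims = 3, .coord = { f, v - 1, i } };
            blockaddr prev = { .ndims = 3, .coord = { f, v - 2, i } };
            const void *curp, *prevp;
            int curlen, prevlen;
            rd_handle hc = desyncfsRead(&cur,  &curp,  &curlen);
            rd_handle hp = desyncfsRead(&prev, &prevp, &prevlen);
            (void)prevlen;                          /* assumed equal to curlen */

            int n = curlen / (int)sizeof(double);
            const double *a_cur  = curp;
            const double *a_prev = prevp;
            double *a_next = malloc(curlen);
            if (!a_next) {
                desyncfsFree(hc);
                desyncfsFree(hp);
                return;
            }

            for (int k = 1; k < n - 1; k++)         /* interior points; halos omitted */
                a_next[k] = 2.0 * a_cur[k] - a_prev[k]
                          + WAVE_C * (a_cur[k - 1] - 2.0 * a_cur[k] + a_cur[k + 1]);
            a_next[0] = a_cur[0];
            a_next[n - 1] = a_cur[n - 1];

            desyncfsWrite(block_address, a_next, curlen);  /* publish the new immutable block;
                                                              ownership assumed to pass to the FS */
            desyncfsFree(hc);
            desyncfsFree(hp);
        }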

  18. Implementation: high-level architecture
      [Diagram: a map component hands out chunk assignments to the nodes; each node runs a bserv and a bproc, and all nodes are connected through a global block sharing space.]

  19. Design: dynamic adaptation
      ● Load balancing algorithms have 3 components:
        – transfer policy: under what conditions should tasks be moved?
        – placement policy: if a task is to be moved, to where should it move?
        – information policy: how is load information made available to the placement policy?
      ● DesyncFS provides the information: block request hits and misses per chunk
      ● Lazy chunking: map does not send all chunks at the beginning of computation; it waits to see how the processors do on some initial chunks
      ● Lazy chunking is transparent to the application
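      The deck names the three load-balancing components and says DesyncFS supplies per-chunk hit and miss counts as its information policy, but it does not spell out the transfer or placement policies. The C sketch below is therefore only one possible placement heuristic (assign the next lazily created chunk to the node making the fastest progress on its current chunks); it is not DesyncFS's actual policy, and the struct layout is invented.

        /* Per-node progress summary that map could maintain (invented). */
        struct node_progress {
            int  node_id;
            long blocks_done;        /* blocks of its assigned chunks already produced */
            long blocks_assigned;    /* total blocks in its assigned chunks */
        };

        /* Lazy chunking, placement side: give the next chunk to the node that
         * has completed the largest fraction of its current work. */
        static int place_next_chunk(const struct node_progress *nodes, int nnodes)
        {
            int best = 0;
            double best_frac = -1.0;
            for (int i = 0; i < nnodes; i++) {
                double frac = nodes[i].blocks_assigned
                    ? (double)nodes[i].blocks_done / (double)nodes[i].blocks_assigned
                    : 1.0;
                if (frac > best_frac) {
                    best_frac = frac;
                    best = i;
                }
            }
            return nodes[best].node_id;   /* faster nodes end up with more chunks */
        }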

  20. Evaluation: summary
      ● Experiments on a small cluster of 400 nodes, using up to 100 nodes
      ● Compared DesyncFS against OpenMPI
      ● Jacobi solver and integer sort benchmarks:
        – overhead of 10-15% of throughput on homogeneous processors
        – dependency-based prefetching gives DesyncFS better performance on heterogeneous processors, even when limited to homogeneous chunks
        – dynamic adaptation can take DesyncFS closer to average throughput (rather than minimum)

  21. Questions? Please contact me: stein@eecs.harvard.edu
