Can a file system virtualize processors?

SLIDE 1

Lex Stein, Microsoft Research Asia David Holland, Harvard University Margo Seltzer, Harvard University Zheng Zhang, Microsoft Research Asia

Can a file system virtualize processors?

SLIDE 2

DesyncFS – Lex Stein

A mystery: what is happening to José's program?

[Chart: ideal versus realized throughput, and utilization, by month]

SLIDE 3

Let's look at the program

  • An iterative solution to the 1D wave equation:

A(i,t+1) = (2.0 * A(i,t)) - A(i,t-1) + (c * (A(i-1,t) - (2.0 * A(i,t)) + A(i+1,t)))

  • The slow processors are holding the fast processors back

[Diagram with four numbered parts: 1 2 3 4]
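
The update rule above reads three neighboring points at time t to produce one point at time t+1, so a processor that owns a range of points cannot start step t+1 until its neighbors have finished step t. A minimal sketch of one such step follows. The function name wave_step and the fixed end points are assumptions made for illustration; the slides do not show José's code.

```c
#include <stddef.h>

/* One synchronous time step of the slide's update rule.
 * prev = A(.,t-1), curr = A(.,t), next = A(.,t+1); n = number of points.
 * Assumption: the two end points are held fixed (copied from curr). */
void wave_step(const double *prev, const double *curr, double *next,
               size_t n, double c)
{
    if (n == 0) return;
    next[0] = curr[0];
    next[n - 1] = curr[n - 1];
    for (size_t i = 1; i + 1 < n; i++) {
        next[i] = (2.0 * curr[i]) - prev[i]
                + (c * (curr[i - 1] - (2.0 * curr[i]) + curr[i + 1]));
    }
}
```

When the points are split across processors, the reads of curr[i - 1] and curr[i + 1] at a partition edge are the cross-processor dependencies that force every step to wait for the slowest partition.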

SLIDE 4

The problem: ungraceful degradation

processor heterogeneity + synchronization = performance cliff

[Chart: throughput versus slow processors (%), synchronous execution]
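
One way to see where the cliff comes from is a toy model, which is an assumption of this sketch rather than something taken from the slides: every processor owns an equal share of the work and all processors meet at a barrier each step, so the whole machine advances at the pace of its slowest member. The function names are hypothetical.

```c
#include <stddef.h>

/* Toy model (an assumption, not from the slides): equal work per processor
 * and a barrier every step, so all processors run at the slowest one's pace.
 * speeds[i] is the work rate of processor i; n must be greater than 0. */
double synchronous_throughput(const double *speeds, size_t n)
{
    double slowest = speeds[0];
    for (size_t i = 1; i < n; i++)
        if (speeds[i] < slowest) slowest = speeds[i];
    return slowest * (double)n;
}

/* Upper bound if work could be split perfectly in proportion to speed. */
double ideal_throughput(const double *speeds, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += speeds[i];
    return sum;
}
```

Under this model, four processors with rates 1, 1, 1, and 0.5 deliver a synchronous throughput of 2.0 against an ideal of 3.5: a single slow processor costs far more than its own shortfall.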

SLIDE 5

Abstracting away processor heterogeneity

How can we write and run programs to:

  • use heterogeneous processors efficiently?
  • without knowing the details of the machine?

write: a programming model
run: a runtime system

Desynchronizing File System (DesyncFS)

SLIDE 6

Return to the wave equation

What if we designed a system that:

– Allows the fast to charge ahead
– Actively moves data from the fast to the slow
– Transparently adjusts partitions to shift work from the slow

[Diagram with four numbered parts: 1 2 3 4]

SLIDE 7

Design: data and execution

  • 1. Data model: how is application data structured?
  • 2. Execution model: how is data computed?

SLIDE 8

Design: DesyncFS data model

  • A block is an application data container of a fixed number of bytes. Blocks can have any size, including zero
  • A file is an N-dimensional, block-addressable space. N ≥ 3: 1 dimension for file ID, 1 for versions, and at least 1 for data

– Example: a 5D file containing 3D data: ([0] [0 1000] [0 3] [0 3] [0 3]), where the dimensions are, in order, file ID, versions, and data
– An example block address: ([0] [100] [1] [3] [2])

  • A chunk is a contiguous N-dimensional rectangular set of blocks

– An example chunk: ([0] [98 100] [0 1] [0 3] [1 2])
– This chunk has 3 * 2 * 4 * 2 = 48 blocks, 3 versions, and 2 * 4 * 2 = 16 blocks per version
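
The block counts for the example chunk follow from multiplying the lengths of its inclusive ranges. A minimal sketch of that arithmetic, assuming a hypothetical representation with one [lo, hi] pair per dimension; the type range and both function names are illustrative and are not DesyncFS definitions.

```c
#include <stddef.h>

/* Hypothetical representation: one inclusive [lo, hi] range per dimension,
 * ordered file ID, version, then data dimensions. */
typedef struct { long lo, hi; } range;

/* Number of blocks in a chunk = product of the range lengths. */
long chunk_block_count(const range *chunk, size_t ndims)
{
    long count = 1;
    for (size_t d = 0; d < ndims; d++)
        count *= chunk[d].hi - chunk[d].lo + 1;
    return count;
}

/* Nonzero if the block address lies inside the chunk in every dimension. */
int chunk_contains(const range *chunk, const long *addr, size_t ndims)
{
    for (size_t d = 0; d < ndims; d++)
        if (addr[d] < chunk[d].lo || addr[d] > chunk[d].hi) return 0;
    return 1;
}
```

For the chunk ([0] [98 100] [0 1] [0 3] [1 2]), chunk_block_count over all five dimensions gives 48, and over the three data dimensions alone gives the 16 blocks per version.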

SLIDE 9

Design: DesyncFS data model (diagram)

[Diagram: an example 3D file with file ID == 1; axes: versions and block IDs; three chunks and a block Y are marked]

This file (ID 1) is described by: ([1] [0 3] [0 2])
Block Y has address: ([1] [1] [2])
Block Y has version 1
The three chunks have regions: ([1] [0 3] [0]), ([1] [0 3] [1 2]), and ([1] [2] [0 2])
The chunk ([1] [2] [0 2]) is a special kind of chunk, a version slice of file 1 at version 2

SLIDE 10

Design: data and execution

  • 1. Data model: how is application data structured?
  • 2. Execution model: how is data computed?

SLIDE 11

Design: DesyncFS execution model

  • An application defines a compute function:

0 or more existing blocks → compute → 1 or more new blocks

  • This function is stateless. All state is stored in blocks
  • Blocks are immutable
  • Computation is achieved by generating new blocks

SLIDE 12

Design: DesyncFS execution model (high level)

  • The file system, not the application, controls execution
  • The application provides constraints on the execution order

– Dependencies (correctness)
– Hints (performance)

Example: Y = F (X0, X1)

[Diagram: traditional control flow, in which the App calls the FS: get [X0, X1], then F produces Y]
[Diagram: DesyncFS control flow, in which the FS calls the App: do [Y], then get [X0, X1], then F produces Y]

SLIDE 13

Design: DesyncFS execution model

  • Programs do not specify the exact schedule of block computation; instead they constrain the actual execution schedule by providing dependency information:

– File system: I am considering block Y, what do I need to compute it?
– Application: You need blocks A, B, and C

  • Programs express preference among a correct set of execution schedules by hinting a good execution ordering:

– File system: Which of blocks X, Y, Z should I consider first?
– Application: Try block Y, then ask me again
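
The two dialogues above can be sketched as a loop on the file-system side that asks the application for dependencies and for an ordering hint. Everything here is hypothetical: blocks are small integers, and the names fs_run, deps_fn, hint_fn, and compute_fn are invented for illustration; the slides do not show how DesyncFS implements this.

```c
#include <stddef.h>

#define MAX_DEPS 4

/* Hypothetical application callbacks for this sketch. */
typedef size_t (*deps_fn)(int block, int deps[MAX_DEPS]); /* returns count */
typedef int (*hint_fn)(const int *ready, size_t nready);  /* picks one */
typedef void (*compute_fn)(int block);

/* File-system side: find blocks whose dependencies all exist, let the
 * application hint which to run first, then compute it. exists[b] != 0
 * means block b already exists. Returns how many blocks were computed and
 * records their order in schedule[]; returns early if nothing is runnable. */
size_t fs_run(int nblocks, int *exists, deps_fn deps, hint_fn hint,
              compute_fn compute, int *schedule)
{
    size_t done = 0;
    for (;;) {
        int ready[64];
        size_t nready = 0;
        for (int b = 0; b < nblocks && nready < 64; b++) {
            if (exists[b]) continue;
            int d[MAX_DEPS];
            size_t nd = deps(b, d);
            int ok = 1;
            for (size_t k = 0; k < nd; k++)
                if (!exists[d[k]]) ok = 0;
            if (ok) ready[nready++] = b;
        }
        if (nready == 0) return done;    /* finished, or nothing runnable */
        int pick = hint(ready, nready);  /* preference among correct choices */
        compute(pick);
        exists[pick] = 1;
        schedule[done++] = pick;
    }
}

/* Example application: blocks 0 and 1 are X0 and X1, block 2 is
 * Y = F(X0, X1), and block 3 is Z, which depends on Y. */
double value[4] = {2.0, 3.0, 0.0, 0.0};

size_t example_deps(int block, int deps[MAX_DEPS])
{
    if (block == 2) { deps[0] = 0; deps[1] = 1; return 2; }
    if (block == 3) { deps[0] = 2; return 1; }
    return 0;
}

int example_hint(const int *ready, size_t nready)
{
    (void)nready;
    return ready[0];  /* simplest possible hint: take the first candidate */
}

void example_compute(int block)
{
    if (block == 2) value[2] = value[0] + value[1];
    if (block == 3) value[3] = 2.0 * value[2];
}
```

The dependency callback decides which schedules are correct; the hint only chooses among them, so a poor hint can cost performance but not correctness.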

SLIDE 14

Design: DesyncFS execution model (detailed view)

[Diagram: message sequences for computing Y = F (X0, X1)]

traditional approach (Application, then File system):
compute [Y]; prereqs [Y]; read [X0, X1]; X0, X1; Y = F (X0, X1); write Y

DesyncFS (File system, then Application):
get-prereqs [Y]; prereqs [Y]; [X0, X1]; check [X0, X1]; compute [Y]; read [X0, X1]; X0, X1; Y = F (X0, X1); write Y

SLIDE 15

Design: three models (summary)

  • 1. Data model: how is application data structured?
  • 2. Execution model: how is control flow structured?

[Diagram labels: computation, execution, application callbacks; data reads and writes, DesyncFS system calls]

SLIDE 16

Design: DesyncFS application callbacks

// Computation: the means to compute any block
void appCompute (const blockaddr *block_address, const chunkdesc *file);

// Dependencies: the blocks that must exist to compute a block
void appDepList (const blockaddr *block_address, const chunkdesc *file, baddrslist *dep_list, int dir);

// Iteration: hints to execute through a chunk
void *appIterInit (const chunkdesc *chunk);
int appIterNext (void *iter, blockaddr *block_address);
void appIterDone (void *iter);
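
As an illustration of the iteration callbacks, the sketch below walks every block address in a chunk, lowest version first. The slides do not show the definitions of blockaddr or chunkdesc, so the layouts here are hypothetical, as is the convention that appIterNext returns 1 while addresses remain and 0 when the chunk is exhausted.

```c
#include <stdlib.h>

#define NDIMS 3  /* hypothetical: file ID, version, block ID */

/* Hypothetical layouts; the slides do not show the real definitions. */
typedef struct { long coord[NDIMS]; } blockaddr;
typedef struct { long lo[NDIMS], hi[NDIMS]; } chunkdesc;  /* inclusive */

typedef struct { chunkdesc chunk; blockaddr next; int done; } iter_state;

void *appIterInit(const chunkdesc *chunk)
{
    iter_state *it = malloc(sizeof *it);
    if (it == NULL) return NULL;
    it->chunk = *chunk;
    it->done = 0;
    for (int d = 0; d < NDIMS; d++) {
        it->next.coord[d] = chunk->lo[d];
        if (chunk->lo[d] > chunk->hi[d]) it->done = 1;  /* empty chunk */
    }
    return it;
}

/* Returns 1 and fills *block_address, or 0 when the chunk is exhausted
 * (this return convention is an assumption). */
int appIterNext(void *iter, blockaddr *block_address)
{
    iter_state *it = iter;
    if (it->done) return 0;
    *block_address = it->next;
    /* Advance like an odometer, last dimension fastest. */
    int d = NDIMS - 1;
    while (d >= 0 && it->next.coord[d] == it->chunk.hi[d]) {
        it->next.coord[d] = it->chunk.lo[d];
        d--;
    }
    if (d < 0) it->done = 1; else it->next.coord[d]++;
    return 1;
}

void appIterDone(void *iter) { free(iter); }
```

Visiting all block IDs of one version before moving to the next is one plausible hint for a computation where each version is built from the previous one; an application with a different dependency shape would choose a different order.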

SLIDE 17

Design: DesyncFS system calls (summary)

typedef void *rd_handle;

int desyncfsExists (const blockaddr *block_address);
rd_handle desyncfsRead (const blockaddr *block_address, const void **datap, int *lenp);
void desyncfsWrite (const blockaddr *block_address, void *data, int len);
void desyncfsFree (rd_handle dp);
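
To show how a compute step might use these calls, the sketch below pairs them with an in-memory stand-in so that it compiles on its own. The signatures are copied from the slide; the blockaddr layout, the table, the helper find, and the function compute_sum are hypothetical. How the real system treats a write to an existing block is not stated on the slides; this stand-in ignores it, in keeping with blocks being immutable.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical address layout and in-memory stand-in for the slide's calls. */
typedef struct { long file, version, id; } blockaddr;
typedef void *rd_handle;

#define MAX_BLOCKS 64
static struct { blockaddr addr; void *data; int len; } table[MAX_BLOCKS];
static int nblocks = 0;

static int find(const blockaddr *a)
{
    for (int i = 0; i < nblocks; i++)
        if (table[i].addr.file == a->file &&
            table[i].addr.version == a->version &&
            table[i].addr.id == a->id) return i;
    return -1;
}

int desyncfsExists(const blockaddr *block_address)
{
    return find(block_address) >= 0;
}

/* Assumption: a write to an address that already exists is ignored. */
void desyncfsWrite(const blockaddr *block_address, void *data, int len)
{
    if (find(block_address) >= 0 || nblocks == MAX_BLOCKS) return;
    void *copy = malloc(len > 0 ? (size_t)len : 1);
    if (copy == NULL) return;
    if (len > 0) memcpy(copy, data, (size_t)len);
    table[nblocks].addr = *block_address;
    table[nblocks].data = copy;
    table[nblocks].len = len;
    nblocks++;
}

rd_handle desyncfsRead(const blockaddr *block_address, const void **datap,
                       int *lenp)
{
    int i = find(block_address);
    if (i < 0) return NULL;
    *datap = table[i].data;
    *lenp = table[i].len;
    return table[i].data;  /* in this stand-in the handle is the buffer */
}

void desyncfsFree(rd_handle dp) { (void)dp; /* the table keeps ownership */ }

/* Y = F(X0, X1): read two existing blocks, write one new block. */
void compute_sum(const blockaddr *x0, const blockaddr *x1, const blockaddr *y)
{
    const void *p0, *p1;
    int n0, n1;
    rd_handle h0 = desyncfsRead(x0, &p0, &n0);
    rd_handle h1 = desyncfsRead(x1, &p1, &n1);
    if (h0 != NULL && h1 != NULL && n0 == (int)sizeof(double) && n1 == n0) {
        double sum = *(const double *)p0 + *(const double *)p1;
        desyncfsWrite(y, &sum, (int)sizeof sum);
    }
    if (h0 != NULL) desyncfsFree(h0);
    if (h1 != NULL) desyncfsFree(h1);
}
```

compute_sum keeps no state between calls: everything it needs arrives through reads, and its only effect is one new block.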

SLIDE 18

Implementation: high-level architecture

[Diagram: map; nodes, each with a bserv and a bproc; labels: chunk assignments, global block sharing space]

SLIDE 19

Design: dynamic adaptation

  • Load balancing algorithms have 3 components:

– transfer policy: under what conditions should tasks be moved?
– placement policy: if a task is to be moved, to where should it move?
– information policy: how is load information made available to the placement policy?

  • DesyncFS provides the information: block request hits and misses per chunk
  • Lazy chunking: map does not send all chunks at the beginning of computation; it waits to see how the processors do on some initial chunks
  • Lazy chunking is transparent to the application
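
The benefit of holding chunks back can be illustrated with a toy comparison. This is a simplification and an assumption of the sketch, not the policy DesyncFS uses: here the next chunk simply goes to whichever node becomes free first, and each node has a fixed cost per chunk. Both function names are hypothetical.

```c
#include <stddef.h>

/* Static split: every node gets the same number of chunks up front.
 * cost[i] is node i's time per chunk. Assumes nchunks % nnodes == 0. */
double static_makespan(const double *cost, size_t nnodes, size_t nchunks)
{
    double worst = 0.0;
    for (size_t i = 0; i < nnodes; i++) {
        double t = cost[i] * (double)(nchunks / nnodes);
        if (t > worst) worst = t;
    }
    return worst;
}

/* On-demand handout: the next chunk goes to whichever node becomes free
 * first (lowest index wins ties). Assumes nnodes <= 64. */
double on_demand_makespan(const double *cost, size_t nnodes, size_t nchunks)
{
    double free_at[64] = {0.0};
    for (size_t c = 0; c < nchunks; c++) {
        size_t best = 0;
        for (size_t i = 1; i < nnodes; i++)
            if (free_at[i] < free_at[best]) best = i;
        free_at[best] += cost[best];
    }
    double worst = 0.0;
    for (size_t i = 0; i < nnodes; i++)
        if (free_at[i] > worst) worst = free_at[i];
    return worst;
}
```

With two nodes costing 1 and 3 time units per chunk and 8 chunks in total, the static split finishes at time 12, while the on-demand handout finishes at time 6 because the fast node ends up taking six of the eight chunks.
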

SLIDE 20

Evaluation: summary

  • Experiments on a small cluster of 400 nodes, using up to 100 nodes
  • Compared DesyncFS against OpenMPI
  • Jacobi solver and integer sort benchmark:

– overhead of 10-15% of throughput on homogeneous processors
– dependency-based prefetching gives DesyncFS better performance on heterogeneous processors even when limited by homogeneous chunks
– dynamic adaptation can take DesyncFS closer to average throughput (rather than minimum)

SLIDE 21

Questions?

Please contact me: stein@eecs.harvard.edu