Storage as a First Class Citizen in HPC Environments. James S. - - PowerPoint PPT Presentation


Storage as a First Class Citizen in HPC Environments. James S. Plank, University of Tennessee. CCGSC, September 9, 2010.


slide-1
SLIDE 1

Storage as a First Class Citizen in HPC Environments. James S. Plank University of Tennessee CCGSC September 9, 2010

slide-2
SLIDE 2

A Personal Historical Perspective

Me – Erasure codes Y'all – HPC

slide-3
SLIDE 3

Jim - 1987

A Personal Historical Perspective

slide-4
SLIDE 4

Jim - 1987

A Personal Historical Perspective

Gelernter

LINDA: Parallel computing with a “tuple space.”

[Diagram: Tuple Space holding processing tuples and data tuples.]

slide-5
SLIDE 5

Jim - 1987

A Personal Historical Perspective

Gelernter

LINDA: Parallel computing with a “tuple space.”

“Linda processes aspire to know as little about each other as possible. They never interact directly with each other; they only deal with tuple space.”
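
To make the quoted idea concrete, here is a minimal sketch of a tuple space in Python. It is an illustration only, not Gelernter's implementation: the class name `TupleSpace`, the use of `None` as a wildcard, and the `timeout` parameter are assumptions made for this example. Only data tuples are modeled; Linda's processing tuples (`eval`) are left out.

```python
import threading


class TupleSpace:
    """Toy in-process tuple space (hypothetical sketch, not the Linda runtime)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        """Deposit a data tuple and wake up any waiting readers."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    @staticmethod
    def _matches(pattern, tup):
        # Assumption for this sketch: None in a pattern matches any value.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def _find(self, pattern):
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def rd(self, *pattern, timeout=None):
        """Block until a tuple matches, then return it without removing it."""
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout
            )
            return self._find(pattern) if found else None

    def in_(self, *pattern, timeout=None):
        """Block until a tuple matches, then remove and return it."""
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout
            )
            if not found:
                return None
            tup = self._find(pattern)
            self._tuples.remove(tup)
            return tup


if __name__ == "__main__":
    space = TupleSpace()

    def worker():
        # The worker never learns who produced the task.
        _, n = space.in_("task", None)
        space.out("result", n, n * n)

    thread = threading.Thread(target=worker)
    thread.start()
    space.out("task", 7)
    print(space.in_("result", 7, None))
    thread.join()
```

The producer and the worker share nothing but the space itself, which is the decoupling the quote describes.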
slide-6
SLIDE 6

Jim - 1988

A Personal Historical Perspective

slide-7
SLIDE 7

A Personal Historical Perspective

Naughton

SSLS: Shared Single Level Store Gigantic shared, persistent address space

Jim - 1988

slide-8
SLIDE 8

A Personal Historical Perspective

Naughton

SSLS: Shared Single Level Store Gigantic shared, persistent address space

Jim - 1988

slide-9
SLIDE 9

Jim - 1989

A Personal Historical Perspective

Li

SVM: Shared Virtual Memory Gigantic shared, persistent address space

Really big

slide-10
SLIDE 10

Jim - 1989

A Personal Historical Perspective

Li

SVM: Shared Virtual Memory Gigantic shared, persistent address space

Really big

slide-11
SLIDE 11

Jim - 1990

A Personal Historical Perspective

Grand Fromage

HeNCE: Heterogeneous Network Computing Environment.

Functional Dataflow DAG Processing System

slide-12
SLIDE 12

Jim - 1991-98

A Personal Historical Perspective

slide-13
SLIDE 13

Jim - 1991-98

A Personal Historical Perspective

  • Mr. Checkpointing:
slide-14
SLIDE 14

Jim - 1991-98

A Personal Historical Perspective What did I learn:

There are two major difficulties with checkpointing:

  • 1. Fighting the OS / Getting it to work.
  • 2. Mitigating the overhead of getting all those bytes to disk.

Everything else (synchronization, consistency, Lamport time, etc, etc) is in the noise.

slide-15
SLIDE 15

Jim - 1991-98

A Personal Historical Perspective

  • 1. Getting it to work.
  • 2. Mitigating the overhead of getting all those bytes to disk.
  • 3. Synchronization, consistency, Lamport time, etc, etc.

Where's the research?

slide-16
SLIDE 16

Jim - 1991-98

A Personal Historical Perspective

[Figure: Cost of Reliability. Overhead (0% to 100%) against MTBF (0.3 to 19.2), one series per checkpoint interval: hourly, every 2 hours, every 6 hours, daily.]

[Elnozahy/Plank 2004]
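
As a rough way to see why overhead depends on both the checkpoint interval and the MTBF, the sketch below uses the classic first-order approximation (Young's formula), not the model behind the figure above. The overhead is taken as the time spent writing checkpoints plus the expected rework after a failure. The function names and the example checkpoint cost are assumptions for illustration, and the approximation only holds when the interval is much shorter than the MTBF.

```python
import math


def overhead(interval_h, checkpoint_cost_h, mtbf_h):
    """First-order overhead fraction: checkpoint writes plus expected rework.

    interval_h        -- time between checkpoints, in hours
    checkpoint_cost_h -- time to write one checkpoint, in hours (assumed value)
    mtbf_h            -- mean time between failures, in hours
    """
    write_fraction = checkpoint_cost_h / interval_h
    # On average, half an interval of work is lost per failure.
    rework_fraction = interval_h / (2.0 * mtbf_h)
    return write_fraction + rework_fraction


def young_interval(checkpoint_cost_h, mtbf_h):
    """Young's approximation of the interval that minimizes the overhead."""
    return math.sqrt(2.0 * checkpoint_cost_h * mtbf_h)


if __name__ == "__main__":
    cost_h = 0.1   # hypothetical: 6 minutes to write a checkpoint
    mtbf_h = 20.0  # hypothetical MTBF
    for interval_h in (1.0, 2.0, 6.0):
        pct = overhead(interval_h, cost_h, mtbf_h) * 100.0
        print(f"every {interval_h:g} h: {pct:.1f}% overhead")
    print(f"Young interval: {young_interval(cost_h, mtbf_h):g} h")
```

Under these assumed numbers, checkpointing too often wastes time writing and checkpointing too rarely wastes time redoing lost work, so the overhead has a minimum in between.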

slide-17
SLIDE 17

Jim – 1999

A Personal Historical Perspective

slide-18
SLIDE 18

Jim – 1999

A Personal Historical Perspective G-Commerce: Brief Foray into Grid Computing

slide-19
SLIDE 19

Jim – 1999 - 2005

A Personal Historical Perspective IBP: Internet Backplane Protocol (Logistical Networking)

w/ Micah Beck

Client malloc()

slide-20
SLIDE 20

Jim – 1999 - 2005

A Personal Historical Perspective

IBP: Internet Backplane Protocol (Logistical Networking)

w/ Micah Beck

Client malloc()

  • Best effort
  • Time limited
  • Location specific
  • Which supported third-party transfers.

Client
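
The properties listed on this slide can be sketched as a toy storage "depot" in Python. This is not the IBP API: `Depot`, `allocate`, `store`, `load`, and `third_party_copy` are hypothetical names invented for this example. An allocation may be refused (best effort), disappears when its lease expires (time limited), lives on a named depot (location specific), and can be copied depot to depot without the bytes passing through the client.

```python
import secrets
import time
from dataclasses import dataclass, field


@dataclass
class Allocation:
    size: int
    expires_at: float
    data: bytearray = field(default_factory=bytearray)


class Depot:
    """Hypothetical storage depot; a sketch, not the real IBP interface."""

    def __init__(self, name, capacity, clock=time.monotonic):
        self.name = name          # location specific: clients pick a depot
        self.capacity = capacity
        self._clock = clock
        self._allocs = {}

    def _reap(self):
        # Time limited: expired allocations simply vanish.
        now = self._clock()
        for cap in [c for c, a in self._allocs.items() if a.expires_at <= now]:
            del self._allocs[cap]

    def used(self):
        self._reap()
        return sum(a.size for a in self._allocs.values())

    def allocate(self, size, duration):
        """Return a capability string, or None if refused (best effort)."""
        if size > self.capacity - self.used():
            return None
        cap = secrets.token_hex(8)
        self._allocs[cap] = Allocation(size, self._clock() + duration)
        return cap

    def store(self, cap, payload):
        """Append bytes to an allocation; False if expired or too large."""
        self._reap()
        alloc = self._allocs.get(cap)
        if alloc is None or len(alloc.data) + len(payload) > alloc.size:
            return False
        alloc.data.extend(payload)
        return True

    def load(self, cap):
        """Return the stored bytes, or None if the allocation is gone."""
        self._reap()
        alloc = self._allocs.get(cap)
        return None if alloc is None else bytes(alloc.data)


def third_party_copy(src, src_cap, dst, dst_cap):
    """Move data between depots; the requesting client never holds the bytes."""
    data = src.load(src_cap)
    return data is not None and dst.store(dst_cap, data)
```

A client would call `allocate` on two depots, `store` into the first, and then ask for a `third_party_copy` into the second, checking every return value because any step may fail.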

slide-21
SLIDE 21

Jim – 1999 - 2005

A Personal Historical Perspective IBP gave data a place to “live” on the network, perhaps moving from site to site.

eXnode

slide-22
SLIDE 22

Jim – 2005 - ???

A Personal Historical Perspective Into the land of erasure coding. I won't bore you with it.

slide-23
SLIDE 23

Jim – 2010

A Personal Historical Perspective But there's more...

slide-24
SLIDE 24

Jim – 2010

A Personal Historical Perspective 2010 Meeting on Staging for HPC

[Diagram: The Big Iron and The Disks, labeled “Staging”.]

slide-25
SLIDE 25

Jim – 2010

A Personal Historical Perspective

[Diagram: The Big Iron and The Disks, annotated with Pre Processing, Caching, Checkpointing, Alternative Representations, Code Coupling, Post Processing.]

Oh my

slide-26
SLIDE 26

What do we make of all this?

slide-27
SLIDE 27

What do we make of all this?

  • 1. Checkpointing Sucks.
  • Slow
  • Inelegant
  • Swamps disks and networks to store gigantic files that are almost never read.

  • Enables you to perform “bad fault-tolerance.”
  • Is a manifestation that something is wrong.
slide-28
SLIDE 28

What do we make of all this?

  • 2. Band-Aids Are Only Temporary Solutions
  • Non-reusable
  • Cover the wounds but don't address the root cause
  • Are a manifestation that something is wrong.
slide-29
SLIDE 29

What do we make of all this?

  • 3. Saving State Sure is Attractive
  • Lets you reason about programs
  • (In theory) lets you balance load
  • Allows fault tolerance to fall out naturally
  • However, it's really difficult to do.
  • This is why the MPI model throws it in the trash can.
slide-30
SLIDE 30

What do we make of all this?

  • 4. I Still Think IBP is Pretty Cool & That There Are Lessons To Be Learned From It
  • Why do we constrain our view of storage as either the file or the memory segment?
  • Why is storage either permanent or limited by program lifetime?

  • Why do we jettison best-effort storage resources?
  • Why don't we manage the location of storage?
slide-31
SLIDE 31

What do we make of all this? Why are storage and processing not equal first-class citizens in HPC?

slide-32
SLIDE 32

When I Close My Eyes and Dream ......

The Big Iron looks like this. And these guys: are promoted to first class citizens.

And program state is represented explicitly in here!

slide-33
SLIDE 33

When I Close My Eyes and Dream ......

And these guys compose seamlessly. Over extremely wide areas ....

And program state is represented explicitly in here!

slide-34
SLIDE 34

When I Close My Eyes and Dream ......

And the Eagles win the Super Bowl... Every Year... And I retire to that mansion in Capri...

slide-35
SLIDE 35

Me – Erasure codes Y'all – HPC

And then I wake up and go back to studying erasure codes.

slide-36
SLIDE 36

Storage as a First Class Citizen in HPC Environments. James S. Plank University of Tennessee CCGSC September 9, 2010