

SLIDE 1

Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

A Foundation for Automated Placement of Data

Douglass Otstott, Sean Williams, Latchesar Ionkov, Michael Lang, Ming Zhao

LA-UR-17-22686

SLIDE 2

Los Alamos National Laboratory

Memory and Storage are Converging

10/22/2019

  • Persistent storage on the memory bus (NVDIMMs)
  • Remote memory (GenZ)
  • Which memory bus? (DRAM, HBM, GPU memory, …)


SLIDE 3

Data Layouts are Different

[Figure: the same M×N dataset in two layouts. In memory, each of the M rows is one record holding all fields (pressure=5.1, temp=33.1, density=0.4, …); in storage, each field (pressure, temperature, density) is written as its own contiguous M×N array, row by row.]
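The two layouts in the figure are the classic array-of-structs (memory) versus struct-of-arrays (storage) split. A minimal C sketch of the difference, using the field names from the slide; the transpose function is an illustrative assumption about what such a layout transformation does, not the paper's code:

```c
#include <stddef.h>

#define M 3  /* number of rows; illustrative */

/* Memory layout: one record per point (array of structs). */
struct record { double pressure, temp, density; };

/* Storage layout: each field contiguous (struct of arrays). */
struct columns { double pressure[M], temp[M], density[M]; };

/* Transform the memory layout into the storage layout by
   copying each field into its own contiguous array. */
static void to_columns(const struct record rows[M], struct columns *cols)
{
    for (size_t i = 0; i < M; i++) {
        cols->pressure[i] = rows[i].pressure;
        cols->temp[i]     = rows[i].temp;
        cols->density[i]  = rows[i].density;
    }
}
```

Reading one field across all rows touches contiguous memory in the column layout but strided memory in the record layout, which is why the two sides of the figure differ.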

SLIDE 4

Data Sharing

  • Less distinction between memory and storage means more confusion
  • More complicated workloads bring many options
  • in situ, in transit, …
  • No generic way to share in-memory data between applications
  • ad hoc sharing
  • in-memory file systems
  • What data format do the data producer and data consumer agree on?


SLIDE 5

Need for Data Management Service

  • Handles all data that an application shares
  • Moves data between the many memory and storage layers
  • Allows data layout transformations
  • This work
  • describes the foundations for building such a service
  • allows data movement and transformation
  • does not include support for global data optimizations


SLIDE 6

Components

  • Name server
  • handles metadata
  • global
  • Runtime
  • runs on every node
  • handles local data
  • talks to runtimes on other nodes
  • Global/Local placement services (not included)

  • optimize data locality and format
  • Application (not included)

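The name server's role in this component split can be pictured as a global lookup table mapping (dataset, fragment, version) to the node whose runtime holds the data. A hypothetical sketch; the entry layout and function name are assumptions, not the actual metadata schema:

```c
#include <string.h>
#include <stddef.h>

/* One metadata record held by the global name server (assumed shape). */
struct ns_entry {
    const char *dataset;   /* object name */
    const char *fragment;  /* fragment name */
    unsigned    version;   /* published version */
    int         node;      /* rank of the runtime holding the data */
};

/* Resolve a fragment version to the node that holds it;
   returns -1 if no runtime has published it yet. */
static int ns_lookup(const struct ns_entry *tab, size_t n,
                     const char *dataset, const char *fragment,
                     unsigned version)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].version == version &&
            strcmp(tab[i].dataset, dataset) == 0 &&
            strcmp(tab[i].fragment, fragment) == 0)
            return tab[i].node;
    return -1;
}
```

Under this picture, per-node runtimes do the data movement and the name server only answers "who has what, at which version".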

SLIDE 7

Data Model

  • Dataset
  • types
  • primitive types (integer, floating point, string)
  • structs
  • (multidimensional) arrays
  • variables
  • Fragments
  • subsets of a dataset
  • types - based on dataset types
  • variables - based on dataset variables
  • Versions
  • provide consistent view of distributed dataset

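The dataset/fragment/version hierarchy above can be sketched as C types. These declarations are illustrative assumptions about the model's shape, not the actual implementation:

```c
#include <stddef.h>

/* A dataset's type system: primitives, structs, and
   (multidimensional) arrays. */
enum type_kind { T_PRIMITIVE, T_STRUCT, T_ARRAY };

struct type {
    enum type_kind kind;
    /* plus: primitive tag, struct field list, or element type + dims */
};

/* A named, typed variable as declared in a dataset. */
struct variable {
    const char  *name;
    struct type *type;
};

struct dataset {
    const char      *name;
    struct variable *vars;
    size_t           nvars;
};

/* A fragment is a subset of a dataset: its types and variables
   derive from the parent's, and a version number gives a
   consistent view of the distributed whole. */
struct fragment {
    struct dataset  *parent;
    struct variable *vars;
    size_t           nvars;
    unsigned         version;
};
```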

SLIDE 8

Declarative Data Language & Transformations

  • For the computers: transformation rules that convert data between the dataset and its subsets

[Figure: transformation rules mapping fields a, b, c of fragment p to the default, pa, and pba fragments for the viz consumer; each rule copies a field from its source offset (S 0000/0004/0008) to its destination offset (T 0000/0004).]

  • For the user: define the abstract dataset and subsets

fragment dataset {
    var p struct { a, b, c float64 }
}
fragment default {
    var p = p
}
fragment viz {
    var pa { a } = p
    var pba { b, a } = p
}
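As a concrete reading of the viz rules, `pba { b, a } = p` both projects and reorders fields of p. A C sketch under assumed struct layouts; the runtime's generated transformation would perform the equivalent copies:

```c
#include <stddef.h>

/* The dataset variable p: struct { a, b, c float64 }. */
struct p_rec   { double a, b, c; };
/* The viz fragment pba { b, a } = p: fields b and a, reordered. */
struct pba_rec { double b, a; };

/* Apply the transformation rule: copy each record's b and a
   fields into the reordered subset layout, dropping c. */
static void p_to_pba(const struct p_rec *p, struct pba_rec *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i].b = p[i].b;
        out[i].a = p[i].a;
    }
}
```

The `pa { a }` rule is the same pattern with a single field, which is why one declarative language can describe both projections and reorderings.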

SLIDE 9
  • API
  • create object
  • name
  • dataset description
  • attach fragment
  • dataset name
  • fragment description
  • version
  • publish fragment
  • data pointer
  • version

  • Operations
  • create object: the object is registered in the name server
  • attach fragment: the runtime finds the locations of the fragments that contain the relevant data and version, brings the data, and transforms it to the required format
  • publish fragment: the runtime registers the fragment version in the name server and keeps a copy of the data in memory or local storage
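The producer-side flow implied by this API, creating the object, attaching a fragment at a version, then publishing the data, might look like the following. The function names follow the slides' wording, but the signatures are assumptions, and the stub bodies stand in for the real name-server and runtime calls:

```c
#include <stdio.h>

/* Stub: register the object and its dataset description
   in the global name server. */
static int create_object(const char *name, const char *dataset_desc)
{
    (void)dataset_desc;
    printf("create_object(%s)\n", name);
    return 0;
}

/* Stub: attach a fragment of the named dataset at a version;
   the runtime would locate the fragments holding the data
   and transform them to the required format. */
static int attach(const char *dataset, const char *frag_desc,
                  unsigned version)
{
    (void)frag_desc;
    printf("attach(%s, v%u)\n", dataset, version);
    return 0;
}

/* Stub: publish the fragment's data; the runtime would register
   the version in the name server and keep a copy in memory or
   local storage. */
static int publish(const void *data, unsigned version)
{
    (void)data;
    printf("publish(v%u)\n", version);
    return 0;
}
```

A producer would call the three in order; a consumer on another node would call attach with the same name and version to pull the published data.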

SLIDE 10

[Figure: dataset A distributed as fragments F11, F12, F13, F21, F22, F31.]

  • Can be used for communication between ranks
  • A fragment can have read-only and read-write parts of complex geometry

SLIDE 11

Results

  • Synthetic benchmark
  • Evaluates the overhead of the operations
  • Single name server
  • 16 ranks per node

[Chart: operations per second (20,000–140,000) vs. ranks (16–2048) for the create_object, attach, and publish operations.]

SLIDE 12

Results: SNAP checkpoint

  • Original SNAP (no checkpoints) vs. adding the checkpoint code
  • Evaluate the overhead

[Chart: time in seconds (20–100) vs. ranks (16–2048) for RT/NS SNAP and original SNAP.]

SLIDE 13

[Chart: time in seconds (500–3500) vs. ranks (16–2048) for RT/NS SNAP and MPI-IO SNAP.]

SLIDE 14

[Chart: time in seconds (20–120) vs. ranks (4–2048) for N to N restart, N to N over 2 restart, and N to N over 4 restart.]

SLIDE 15

Results: VPIC

[Chart: percent overhead (10–90) vs. ranks (16–1024) for VPIC I/O, RT/NS I/O, and RT/NS with no I/O.]

SLIDE 16

Conclusions

  • Scalable data service
  • Easy-to-use API
  • Future
  • Integration with data placement services
  • Additional applications (E3SM)
  • Scalable name server
