Enabling the convergence of HPC and Data Analytics in highly distributed computing infrastructures
Rosa M Badia
1-2 July 2019 Yale: 80 in 2019, Barcelona
What was I doing when I first met Yale?
Challenges in highly distributed
[Figure: the computing continuum. Labels: sensors, instruments, actuators, edge devices, fog devices, cloud, HPC / exascale computing; "AI everywhere"]
and data transfer decisions
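import time
from pycompss.api.task import task
from pycompss.api.parameter import INOUT
from pycompss.api.api import compss_barrier

# Each call updates block c in place, so c is declared INOUT
@task(c=INOUT)
def multiply(a, b, c):
    c += a*b

initialize_variables()
startMulTime = time.time()
for i in range(MSIZE):
    for j in range(MSIZE):
        for k in range(MSIZE):
            multiply(A[i][k], B[k][j], C[i][j])
# Wait for all submitted tasks before stopping the timer
compss_barrier()
mulTime = time.time() - startMulTime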
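The loop above submits one task per (i, j, k) triple, and the INOUT direction on c is what tells the runtime which of those tasks must wait for each other. The following is a minimal, pure-Python sketch of that idea, not the COMPSs runtime: the function name build_dependency_graph and the bookkeeping dictionaries are hypothetical, and only the read-then-update dependency on each C block is modelled.

```python
from collections import defaultdict

MSIZE = 2  # blocks per matrix dimension (small illustrative value)


def build_dependency_graph(msize):
    """Replay the triple loop and record which task must precede which.

    Each multiply(A[i][k], B[k][j], C[i][j]) call updates C[i][j] in place
    (INOUT), so it depends on the previous task that updated the same block.
    """
    last_writer = {}                # (i, j) -> id of last task that updated C[i][j]
    depends_on = defaultdict(list)  # task id -> list of predecessor task ids
    tasks = []                      # task id -> (i, j, k)
    for i in range(msize):
        for j in range(msize):
            for k in range(msize):
                task_id = len(tasks)
                tasks.append((i, j, k))
                if (i, j) in last_writer:
                    depends_on[task_id].append(last_writer[(i, j)])
                last_writer[(i, j)] = task_id
    return tasks, dict(depends_on)


tasks, depends_on = build_dependency_graph(MSIZE)
# Every task without a predecessor starts its own chain of updates.
independent_chains = len(tasks) - sum(len(p) for p in depends_on.values())
print(f"{len(tasks)} tasks, {independent_chains} independent chains")
```

In this toy model the tasks that update the same C block form a chain, while chains for different blocks share no dependency and could run in parallel.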
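from pycompss.api.constraint import constraint
from pycompss.api.task import task
from pycompss.api.parameter import INOUT

@constraint(MemorySize=1.0, ProcessorType="ARM")
@task(c=INOUT)
def myfunc_in_the_edge(a, b, c):
    ...

@constraint(MemorySize=6.0, ProcessorPerformance="5000")
@task(c=INOUT)
def myfunc(a, b, c):
    ...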
(polymorphism)
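from pycompss.api.implement import implement
from pycompss.api.constraint import constraint
from pycompss.api.task import task
from pycompss.api.parameter import INOUT

# Alternative version of myfunc for constrained edge devices
@implement(source_class="myclass", method="myfunc")
@constraint(MemorySize=1.0, ProcessorType="ARM")
@task(c=INOUT)
def myfunc_in_the_edge(a, b, c):
    ...

@constraint(MemorySize=6.0, ProcessorPerformance="5000")
@task(c=INOUT)
def myfunc(a, b, c):
    ...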
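The snippet above declares two versions of the same task with different resource constraints. How a scheduler might choose between them can be sketched as follows. This is a simplified illustration only: Resource, Implementation, fits and select_implementation are hypothetical names, the first-match policy is an assumption rather than the actual COMPSs scheduling policy, and ProcessorPerformance is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """Hypothetical description of a device; field names are illustrative."""
    name: str
    memory_gb: float
    processor_type: str


@dataclass
class Implementation:
    """One version of a task plus the minimum resources it asks for."""
    func_name: str
    min_memory_gb: float
    processor_type: Optional[str] = None  # None means any processor type


def fits(impl, res):
    """True when the resource satisfies every constraint of the implementation."""
    if res.memory_gb < impl.min_memory_gb:
        return False
    return impl.processor_type is None or impl.processor_type == res.processor_type


def select_implementation(impls, res):
    """Return the first listed implementation the resource can host, else None."""
    for impl in impls:
        if fits(impl, res):
            return impl
    return None


# Mirrors the two versions on the slide: a general one and an edge one.
impls = [
    Implementation("myfunc", min_memory_gb=6.0),
    Implementation("myfunc_in_the_edge", min_memory_gb=1.0, processor_type="ARM"),
]
edge_device = Resource("raspberry-pi", memory_gb=2.0, processor_type="ARM")
chosen = select_implementation(impls, edge_device)
print(chosen.func_name)
```

With these assumed values the edge device cannot host the 6 GB version, so the sketch falls back to the edge version; a device meeting neither set of constraints gets no implementation at all.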
@constraint(computingUnits="248")
@mpi(runner="mpirun", computingNodes="16", ...)
@task(returns=int, stdOutFile=FILE_OUT_STDOUT, ...)
def nems(stdOutFile, stdErrFile):
    pass
management
fails in an unstable device
@task(file_path=FILE_INOUT, on_failure='CANCEL_SUCCESSORS')
def task(file_path):
    ...
    if cond:
        raise Exception()
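The on_failure='CANCEL_SUCCESSORS' option above can be illustrated with a toy executor. The sketch assumes the option means that tasks depending on a failed task are skipped while unrelated tasks still run. run_workflow and the task names are hypothetical, and the tasks are assumed to be listed in a valid dependency order.

```python
def run_workflow(tasks, successors):
    """Run tasks in the given (already dependency-sorted) order.

    tasks: dict mapping task name -> zero-argument callable
    successors: dict mapping task name -> names of tasks that depend on it
    Returns a dict mapping task name -> "DONE", "FAILED" or "CANCELLED".
    """
    status = {}
    for name, func in tasks.items():
        if status.get(name) == "CANCELLED":
            continue  # a predecessor failed earlier, so skip this task
        try:
            func()
            status[name] = "DONE"
        except Exception:
            status[name] = "FAILED"
            # Cancel every task reachable from the failed one.
            pending = list(successors.get(name, []))
            while pending:
                nxt = pending.pop()
                if nxt not in status:
                    status[nxt] = "CANCELLED"
                    pending.extend(successors.get(nxt, []))
    return status


def ok():
    pass


def unstable():
    raise Exception("device lost")  # stands in for a failure on an unstable device


tasks = {"read": ok, "filter": unstable, "plot": ok, "stats": ok}
successors = {"filter": ["plot"]}
print(run_workflow(tasks, successors))
```

Here "plot" is cancelled because it depends on the failed "filter" task, while "stats" has no such dependency and still completes.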
a = SampleClass()
a.make_persistent()
print(a.func(3, 4))
a.mytask()
compss_barrier()
infrastructure depending on the actual workload
systems
effective use of resources
power is a constraint
[Figure: elastic COMPSs execution under SLURM. Initial SLURM Job X: a master node (main app, COMPSs runtime, SLURM connector) and compute nodes A, B and C, each running a COMPSs worker that executes tasks of the COMPSs app. Labelled steps: "Request for a new node" sent to the SLURM manager; "SLURM creates the new job" (SLURM Job Y, compute node N with a COMPSs worker); "Update original job", giving the Expanded SLURM Job X.]
Easy to use interface for interactivity
component?
server
TensorFlow + PyCOMPSs + dataClay
an independent distributed data representation
PyCOMPSs parallelism hidden
dislib.bsc.es
execution, agents interact between them
and ELASTIC
and actuators
resilience and energy efficiency
www.bsc.es
[Figure: use-case illustration with four numbered steps. Labels: "App install, topics setting"; example notifications "Flight AZ626 now boarding, gate B32" and "Sardinian handcraft"]
Layer 0, cloud: OpenStack
Layer 1, fog aggregator: Nuvla Box
Layer 2, fog: Laptop
Layer 3, IoT layer: Raspberry Pi, smartphones
Grant Agreement No 730929
[Figure: Compute Continuum. Labels: V2V, V2I, V2C, Cloud Computing, Car Computing Resources, City Computing Resources, Edge Com]
allows programmers to express concepts in fewer lines of code
scientific and numeric
machine learning (TensorFlow, PyTorch, Dask, scikit-learn, ...)
programming languages
* From python.org
[Figure: programming languages across HPC and HPDA. Labels: Fortran, C/C++, Python, R, Scala, SQL, Java, Julia]