Enabling the convergence of HPC and Data Analytics in highly - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Enabling the convergence of HPC and Data Analytics in highly distributed computing infrastructures

Rosa M Badia

Yale: 80 in 2019, Barcelona, 1-2 July 2019

slide-2
SLIDE 2

What was I doing when I first met Yale?

slide-3
SLIDE 3

Challenges in highly distributed infrastructures

  • Resources that appear and disappear
  • How to dynamically add/remove nodes to the infrastructure
  • Heterogeneity
  • Different HW characteristics (performance, memory, etc)
  • Different architectures -> compilation issues
  • Network
  • Different types of networks
  • Instability
  • Trust and Security
  • Power constraints from the devices in the edge

[Figure: sensors, instruments, actuators, edge devices, fog devices, cloud, HPC/exascale computing; "AI everywhere"]

slide-4
SLIDE 4

Data and storage challenge

  • Sensors and instruments as sources of large amounts of heterogeneous data
  • Control of edge devices and remote access to sensor data
  • Edge devices typically have SD cards, much slower than SSD
  • Compute and store close to the sensors
  • To avoid data transfers
  • For privacy/security aspects
  • New data storage abstractions that enable access from the different devices
  • Object store versus file system?
  • Data reduction/lossy compression
  • Task flow versus data flow: data streaming
  • Metadata and traceability
slide-5
SLIDE 5

[Figure: task dependency graph with createBlockTask, qrTask and transposeBlockTask tasks]

Orchestration challenges

  • How should workflows be described in such an environment? Which is the right interface?
  • Focus:
  • Integration of computational workloads with machine learning and data analytics
  • Intelligent runtime that can make scheduling, allocation, data-transfer, and other decisions

slide-6
SLIDE 6

Programming with PyCOMPSs/COMPSs

  • Sequential programming, parallel execution
  • General purpose programming language + annotations/hints
  • To identify tasks and directionality of data
  • Task based: task is the unit of work
  • Builds a task graph at runtime that expresses potential concurrency
  • Exploitation of parallelism
  • … and of parallelism created later on
  • Simple linear address space
  • Agnostic of computing platform
  • Runtime takes all scheduling and data transfer decisions

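    import time

    from pycompss.api.api import compss_barrier
    from pycompss.api.parameter import INOUT
    from pycompss.api.task import task

    @task(c=INOUT)  # c is read and updated by the task
    def multiply(a, b, c):
        c += a * b

    # initialize_variables(), MSIZE, A, B and C are defined elsewhere
    initialize_variables()
    startMulTime = time.time()
    for i in range(MSIZE):
        for j in range(MSIZE):
            for k in range(MSIZE):
                multiply(A[i][k], B[k][j], C[i][j])  # asynchronous task call
    compss_barrier()  # wait for all tasks to finish
    mulTime = time.time() - startMulTime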
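To make the "task graph at runtime" idea concrete, here is a minimal sketch of how dependencies could be derived from the directionality of each parameter. It uses only the standard library and is not the COMPSs runtime. The function `build_task_graph`, the direction labels and the block names (`A00`, `C00`, ...) are hypothetical names chosen for illustration.

```python
from collections import defaultdict

IN, INOUT = "IN", "INOUT"  # hypothetical direction labels for this sketch


def build_task_graph(calls):
    """Derive dependency edges from call order and data directions.

    `calls` is a list of (task_name, {datum: direction}) in program order.
    Returns a dict mapping each call index to the set of call indices it
    must wait for.
    """
    last_writer = {}                          # datum -> last call that wrote it
    readers_since_write = defaultdict(list)   # datum -> readers after that write
    deps = {}
    for idx, (_name, params) in enumerate(calls):
        deps[idx] = set()
        for datum, direction in params.items():
            # Any access must wait for the previous writer of the datum.
            if datum in last_writer:
                deps[idx].add(last_writer[datum])
            if direction == INOUT:
                # A writer must also wait for earlier readers of the old value.
                deps[idx].update(readers_since_write[datum])
                last_writer[datum] = idx
                readers_since_write[datum] = []
            else:
                readers_since_write[datum].append(idx)
    return deps


# Two multiply calls that update the same C block are chained; a call on a
# different C block has no dependency and could run concurrently.
calls = [
    ("multiply", {"A00": IN, "B00": IN, "C00": INOUT}),
    ("multiply", {"A01": IN, "B10": IN, "C00": INOUT}),
    ("multiply", {"A00": IN, "B01": IN, "C01": INOUT}),
]
graph = build_task_graph(calls)
```

In this sketch only the second call depends on another one (the first), which is the kind of concurrency a task graph exposes in the matrix multiplication loop.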

slide-7
SLIDE 7

Other decorators: Tasks’ constraints

  • Constraints allow defining the HW or SW features required to execute a task
  • Runtime performs the match-making between the task and the computing nodes
  • Support for multi-core tasks and for tasks with memory constraints
  • Support for heterogeneity on the devices in the platform

    @constraint(MemorySize=1.0, ProcessorType="ARM")
    @task(c=INOUT)
    def myfunc_in_the_edge(a, b, c):
        ...

    @constraint(MemorySize=6.0, ProcessorPerformance="5000")
    @task(c=INOUT)
    def myfunc(a, b, c):
        ...

slide-8
SLIDE 8

Other decorators: Tasks’ constraints and versions

  • Constraints allow defining the HW or SW features required to execute a task
  • Runtime performs the match-making between the task and the computing nodes
  • Support for multi-core tasks and for tasks with memory constraints
  • Support for heterogeneity on the devices in the platform
  • Versions: mechanism to support multiple implementations of a given behavior (polymorphism)
  • Runtime chooses to execute the task on the most appropriate device in the platform

    @implement(source_class="myclass", method="myfunc")
    @constraint(MemorySize=1.0, ProcessorType="ARM")
    @task(c=INOUT)
    def myfunc_in_the_edge(a, b, c):
        ...

    @constraint(MemorySize=6.0, ProcessorPerformance="5000")
    @task(c=INOUT)
    def myfunc(a, b, c):
        ...
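The match-making between constraints, versions and devices can be illustrated with a small standard-library sketch. This is not how the COMPSs runtime is implemented. The helpers `fits` and `select_version`, the device descriptions, and the policy of preferring the version with the largest memory requirement are all assumptions made for the example.

```python
def fits(device, constraints):
    """Numeric constraints are minimums; other values must match exactly."""
    for key, required in constraints.items():
        have = device.get(key)
        if have is None:
            return False
        if isinstance(required, (int, float)):
            if have < required:
                return False
        elif have != required:
            return False
    return True


def select_version(versions, device):
    """Pick the most demanding version that still fits the device.

    `versions` maps a function to its constraints. Preferring the version
    with the largest memory requirement is an arbitrary policy chosen for
    this sketch.
    """
    usable = [(c.get("MemorySize", 0.0), f)
              for f, c in versions.items() if fits(device, c)]
    if not usable:
        raise LookupError("no version of the task fits this device")
    return max(usable, key=lambda pair: pair[0])[1]


def myfunc_in_the_edge(a, b):
    return a + b  # stand-in for a lightweight implementation


def myfunc(a, b):
    return a + b  # stand-in for the full implementation


versions = {
    myfunc_in_the_edge: {"MemorySize": 1.0, "ProcessorType": "ARM"},
    myfunc: {"MemorySize": 6.0},
}

# Hypothetical devices (names and values are made up).
edge_device = {"MemorySize": 2.0, "ProcessorType": "ARM"}
server = {"MemorySize": 64.0, "ProcessorType": "x86_64"}
```

With these inputs, `select_version(versions, edge_device)` returns the edge version and `select_version(versions, server)` returns the full one.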

slide-9
SLIDE 9

Other decorators: linking with other programming models

  • A task can be more than a sequential function
  • A task in PyCOMPSs can be sequential, multicore or multi-node
  • External binary invocation: wrapper function generated automatically
  • Support for alternative programming models: MPI and OmpSs
  • Additional decorators:
  • @binary(binary="app.bin")
  • @ompss(binary="ompssApp.bin")
  • @mpi(binary="mpiApp.bin", runner="mpirun", computingNodes=8)
  • Can be combined with the @constraint and @implement decorators

    @constraint(computingUnits="248")
    @mpi(runner="mpirun", computingNodes="16", ...)
    @task(returns=int, stdOutFile=FILE_OUT_STDOUT, ...)
    def nems(stdOutFile, stdErrFile):
        pass
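As an illustration of what an automatically generated wrapper for an external binary might look like, the sketch below defines a hypothetical `binary` decorator with the standard library only. It is not the PyCOMPSs decorator. The current Python interpreter stands in for "app.bin" so that the example runs anywhere, and `run_app` is a made-up name.

```python
import functools
import subprocess
import sys


def binary(binary):
    """Hypothetical stand-in: turn a function into a wrapper around an
    external executable (not the PyCOMPSs implementation)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # The Python body is never executed; the call arguments become
            # the command line of the external program.
            cmd = [binary] + [str(a) for a in args]
            done = subprocess.run(cmd, capture_output=True, text=True, check=True)
            return done.stdout
        return wrapper
    return decorator


# The current interpreter plays the role of the external application.
@binary(binary=sys.executable)
def run_app(flag, code):
    pass


output = run_app("-c", "print(6 * 7)")
```

The empty function body mirrors the `pass` in the slide's example: the decorated function only declares the interface, and the work happens in the external program.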


slide-10
SLIDE 10

Failure management

  • Default behaviour till now:
  • On task failure, retry the execution a number of times
  • If failure persists, close the application safely
  • New interface that enables the programmer to give hints about failure management
  • Options: RETRY, CANCEL_SUCCESSORS, FAIL, IGNORE
  • Implications on file management:
  • E.g., on IGNORE, output files are generated empty
  • Offers the possibility of task speculation on the execution of applications
  • Possibility of ignoring part of the execution of the workflow, for example if a task fails in an unstable device

    @task(file_path=FILE_INOUT, on_failure='CANCEL_SUCCESSORS')
    def task(file_path):
        ...
        if cond:
            raise Exception()
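One possible reading of the four options, for a simple chain of tasks in which each task depends on the previous one, is sketched below. The function `run_chain`, its status strings and the retry count are assumptions for illustration; the real runtime works on a full task graph and also handles the output files.

```python
def run_chain(tasks, max_retries=2):
    """Run tasks in order, where each task depends on the previous one.

    `tasks` is a list of (name, callable, on_failure) tuples. Returns a dict
    mapping task names to "done", "ignored", "failed" or "cancelled".
    """
    status = {}
    cancelled = False
    for name, func, on_failure in tasks:
        if cancelled:
            status[name] = "cancelled"
            continue
        attempts = 1 + (max_retries if on_failure == "RETRY" else 0)
        for attempt in range(attempts):
            try:
                func()
                status[name] = "done"
                break
            except Exception:
                if attempt + 1 < attempts:
                    continue  # RETRY: try again
                if on_failure == "IGNORE":
                    status[name] = "ignored"  # successors still run
                elif on_failure == "CANCEL_SUCCESSORS":
                    status[name] = "failed"
                    cancelled = True  # skip everything downstream
                else:
                    # FAIL, or RETRY with no attempts left: stop the run.
                    raise
    return status


def ok():
    pass


def boom():
    raise RuntimeError("unstable device")


result = run_chain([
    ("acquire", ok, "FAIL"),
    ("filter", boom, "CANCEL_SUCCESSORS"),
    ("store", ok, "FAIL"),
])
```

Here the failure of `filter` cancels `store` without stopping the whole run, while a task marked IGNORE would let its successors continue.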

slide-11
SLIDE 11

Integration with persistent memory

  • The programmer may decide to make specific objects in the code persistent
  • Persistent objects are managed the same way as regular objects
  • Tasks can operate with them
  • Objects can be accessed/shared transparently in a distributed computing platform

    a = SampleClass()
    a.make_persistent()
    print(a.func(3, 4))
    a.mytask()
    compss_barrier()
    o = a.another_object
slide-12
SLIDE 12

Support for elasticity

  • Possibility to adapt the computing infrastructure depending on the actual workload
  • Now also for SLURM managed systems
  • Feature that contributes to a more effective use of resources
  • Is very relevant in the edge, where power is a constraint
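A minimal sketch of an elasticity policy follows: it decides how many nodes to request or release from the number of pending tasks. The function `scaling_decision` and all of its parameters and default values are hypothetical; the slides do not describe the policy that the runtime actually applies.

```python
def scaling_decision(pending_tasks, running_nodes,
                     tasks_per_node=4, min_nodes=1, max_nodes=16):
    """Return how many nodes to add (positive) or release (negative)."""
    # Ceiling division: nodes needed to host all pending tasks.
    wanted = -(-pending_tasks // tasks_per_node)
    # Keep the target within the allowed size of the infrastructure.
    wanted = max(min_nodes, min(max_nodes, wanted))
    return wanted - running_nodes
```

For example, 20 pending tasks on 2 nodes yields a request for 3 more nodes, while an empty queue on 3 nodes releases 2 of them, which matters where power is a constraint.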

[Figure: elasticity on a SLURM cluster. The initial SLURM job X holds the master node (main app, COMPSs runtime, SLURM connector) and compute nodes A, B and C, each with a COMPSs worker running tasks. The connector sends a request for a new node to the SLURM manager; SLURM creates a new job Y with compute node N and its COMPSs worker, and the original job is updated (expanded SLURM job X).]

slide-13
SLIDE 13

Support for interactivity

  • Jupyter notebooks: easy-to-use interface for interactivity
  • Where to map every component?
  • Everything local
  • Prototyping and demos
  • Running notebook and COMPSs runtime locally
  • Some tasks can be executed locally
  • Some tasks can run remotely
  • Data acquisition in edge devices
  • Remote execution of compute-intensive tasks in large clusters
  • Run the browser on the laptop, and the notebook server and COMPSs runtime on a remote server
  • Enables the interactive execution of large computational workflows
  • Issue with large HPC systems if the login node does not offer remote connection
  • Smoother integration if JupyterHub is available
slide-14
SLIDE 14

Integration with Machine Learning

  • Thanks to the Python interface, the integration with ML packages is smooth:
  • TensorFlow, PyTorch, ...
  • Tiramisu: transfer learning framework (TensorFlow + PyCOMPSs + dataClay)
  • dislib: collection of machine learning algorithms developed on top of PyCOMPSs
  • Unified interface, inspired by scikit-learn (fit-predict)
  • Unified data acquisition methods, using an independent distributed data representation
  • Parallelism transparent to the user: PyCOMPSs parallelism hidden
  • Open source, available to the community

dislib.bsc.es

slide-15
SLIDE 15

COMPSs in a fog-to-cloud architecture

  • Decentralized approach to deal with large amounts of data
  • New COMPSs runtime handles distribution, parallelism and heterogeneity
  • Runtime deployed as a microservice in an agent:
  • Agents are independent, can act as master or worker in an application execution, and interact with each other

  • Hierarchical structure
  • Data managed by dataClay, in a federated mode
  • Support for data recovery when fog nodes disappear
  • Fog-to-fog and Fog-to-cloud
  • Developed in mF2C, used in CLASS and ELASTIC

slide-16
SLIDE 16

Going beyond: what is missing

  • Programming interfaces:
  • Explore graphical or higher-level interfaces to describe the workflows
  • How to better integrate the compute and data flows
  • Integrate metadata, enable data traceability
  • Streaming
  • Better support for interactivity, data-steering
  • Add more intelligence to the runtime
  • Support for mapping sensors and actuators
  • Not only performance aspects, but also resilience and energy efficiency
  • Use of machine learning

[Figure: sensors, instruments, actuators, edge devices, HPC/exascale computing, AI]

slide-17
SLIDE 17

Further Information

  • Project page: http://www.bsc.es/compss
  • Documentation
  • Virtual Appliance for testing & sample applications
  • Tutorials
  • Source Code: https://github.com/bsc-wdc/compss
  • Docker Image: https://hub.docker.com/r/compss/compss-ubuntu16/
  • Applications: https://github.com/bsc-wdc/apps, https://github.com/bsc-wdc/dislib


slide-18
SLIDE 18


Projects where COMPSs is involved

ELASTIC

slide-19
SLIDE 19

www.bsc.es

Thanks!

slide-20
SLIDE 20

Challenges we are facing

  • Complex infrastructures
  • Large number of nodes
  • Nodes that appear and disappear
  • Heterogeneous
  • Other relevant aspects: security and trust, power, ...
  • Large amount of heterogeneous data from multiple sources
  • New storage technologies with different capabilities
  • Need to orchestrate complex applications in such complex environment

[Figure: sensors, instruments, actuators, edge devices, fog devices, cloud, HPC/exascale computing]

slide-21
SLIDE 21

mF2C - Smart Fog Hub System

  • Indoor navigation and recommender solution at the Cagliari airport

[Figure: example notifications ("Flight AZ626 now boarding, gate B32"; "Sardinian handcraft") and numbered steps 1-4, including "App install, topics setting"]

  • Layer 0, cloud: OpenStack
  • Layer 1, fog aggregator: Nuvla Box
  • Layer 2, fog: Laptop
  • Layer 3, IoT layer: Raspberry Pi, smartphones

Grant Agreement No 730929

slide-22
SLIDE 22

Other use cases

  • Intelligent traffic management
  • Advanced driving assistance systems
  • Next Generation Autonomous Positioning (NGAP)
  • Advanced Driving Assistant System (ADAS) (obstacle detection)
  • Predictive maintenance

[Figure: compute continuum with V2V, V2I and V2C links; cloud computing, car computing resources, city computing resources, edge computing]
slide-23
SLIDE 23

Why Python?

Python is powerful... and fast; plays well with others; runs everywhere; is friendly & easy to learn; is Open.*

  • Emphasizes code readability; its syntax allows programmers to express concepts in fewer lines of code
  • Large community using it, including scientific and numeric
  • Large number of software modules available
  • Very well integrated with data analytics and machine learning (TensorFlow, PyTorch, Dask, scikit-learn, ...)
  • Intersection with HPC and data analytics programming languages

* From python.org

[Figure: HPC and HPDA programming languages: Fortran, C/C++, Python, R, Scala, SQL, Java, Julia]