A Scalable Tools Communication Infrastructure Darius Buntinas, - - PowerPoint PPT Presentation



SLIDE 1

A Scalable Tools Communication Infrastructure

Darius Buntinas, George Bosilca, Richard L. Graham, Geoffroy Vallée and Gregory R. Watson

SLIDE 2

Motivation

• Not many tools exist for HPC application developers
  – Standalone
  – Domain-, application-, problem- and/or site-specific
  – Not scalable
  – Not interoperable with other tools
• Tool infrastructure is reinvented each time
  – Process launch
  – Process management
  – Communication
• Upcoming ultrascale systems have greater demands
  – Scalability
  – Robustness
• Common, portable infrastructure services will be essential to enable
  – More extensive tool capabilities
  – New types of analysis tools

SLIDE 3

Scalable Tool Communications Infrastructure (STCI)

• STCI collaboration was formed to address tool infrastructure needs at the ultrascale
  – System architecture independent API
  – Implementation design guided by ultrascale and multi-tool requirements
• STCI capabilities
  – Multicast/reduction-style network
    • Scalable communication between tool UI and data sources/sinks
  – Aggregate and point-to-point communication
  – Scalable system resource management
  – Tool lifecycle management
• Tool use cases
  – Interactive tool
  – Instrumented code

SLIDE 4

Use Cases: Interactive Tool

[Figure: Front End and Compute Resource]

SLIDE 5

Use Cases: Interactive Tool

[Figure: Compute Resource and Front End]

SLIDE 6

Use Cases: Instrumented Code

[Figure: Compute Resource and Front End]

SLIDE 7

Use Cases: Instrumented Code

[Figure: Compute Resource and Front End]

SLIDE 8

STCI Tool Model

• Monolithic tools are no longer feasible
  – Scalable tools comprise cooperating parts
• Tool model
  – Tool front-end
    • Typically interacts with the user, e.g., GUI
  – Tool agent(s)
    • Interact with application processes, e.g., debugger, profiler
  – Tool junction(s)
    • Aggregate, filter, modify, transform data sent between FE and agents
• Tool developer will implement these parts
• STCI will manage interaction between them
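The three roles in the tool model can be pictured as a small pipeline. The sketch below is only an illustration of the division of labor, not the STCI API: the function names, the message shape, and the fake `stack_depth` sample are all assumptions made up for this example.

```python
from typing import Dict, Iterable, List, Union

# Hypothetical message type; STCI's real message format is not described here.
Message = Dict[str, Union[int, List[int]]]


def agent(rank: int) -> Message:
    """Tool agent: collects data from one application process (faked here)."""
    return {"rank": rank, "stack_depth": 3 + rank % 2}


def junction(messages: Iterable[Message]) -> Message:
    """Tool junction: aggregates many agent messages into one summary."""
    msgs = list(messages)
    return {
        "ranks": sorted(m["rank"] for m in msgs),
        "max_stack_depth": max(m["stack_depth"] for m in msgs),
    }


def front_end(summary: Message) -> str:
    """Tool front end: presents the aggregated data to the user."""
    count = len(summary["ranks"])
    return f"{count} processes, max stack depth {summary['max_stack_depth']}"


# In this toy version the call chain itself plays the role of the
# infrastructure that moves data between the three parts.
report = front_end(junction(agent(r) for r in range(4)))
print(report)
```

In a real deployment the tool developer would supply the three parts and the infrastructure would carry messages between them across nodes; here a plain function call stands in for that transport.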

SLIDE 9

Architecture: Operation

[Figure: a front end on a laptop, linked with the STCI lib, connected by streams to junctions (J), agents (A), and applications (App) on infrastructure nodes and compute nodes. Legend: streams; user-supplied component; A = agent; PI = plug-in; physical node; IN = infrastructure node; CN = compute node; STCI component.]

SLIDE 10

Architecture: API

[Figure: the Scalable Tools Communication Infrastructure with its Front End API, Agent API, and Junction API, shown alongside the tool front end, tool agents, tool junctions, user application, and operating system. Legend: STCI components; tool components; external components.]

SLIDE 11

Services Provided by STCI

• STCI provides services related to
  – Execution contexts
  – Sessions
  – Communication
  – Persistence
  – Security

SLIDE 12

Execution Contexts

• Bootstrapping
  – Managing infrastructure lifecycle
    • Installation and deployment of STCI
  – Managing tool lifecycle
• Execution context management
  – Starting/killing processes
  – Monitoring
  – Reacting to changes (e.g., process dies)
• Resource management
  – E.g., allocate locations (aka nodes)

SLIDE 13

Sessions

• All tool activities are performed within a session
• A session consists of
  – Resource allocation (e.g., CPUs, network adapters)
  – Set of tool agents and junctions
  – Description of how agents and junctions are mapped onto resources
  – One or more streams
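The four parts of a session listed above can be written down as a small record with a consistency check. This is a sketch under assumptions: the class names `Session` and `Stream`, the field names, and the `validate` method are hypothetical and are not taken from STCI.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stream:
    name: str
    members: List[str]  # components the stream connects; "FE" is the front end


@dataclass
class Session:
    resources: List[str]       # allocated resources, e.g. node names
    components: List[str]      # tool agents and junctions
    placement: Dict[str, str]  # component -> resource it runs on
    streams: List[Stream] = field(default_factory=list)

    def validate(self) -> None:
        """Check that the four parts of the session agree with each other."""
        if not self.streams:
            raise ValueError("a session needs at least one stream")
        for comp in self.components:
            if self.placement.get(comp) not in self.resources:
                raise ValueError(f"{comp} is not placed on an allocated resource")
        for stream in self.streams:
            unknown = set(stream.members) - set(self.components) - {"FE"}
            if unknown:
                raise ValueError(
                    f"stream {stream.name} uses unknown components {sorted(unknown)}"
                )


session = Session(
    resources=["r0", "r1", "r2"],
    components=["j0", "a0", "a1"],
    placement={"j0": "r0", "a0": "r1", "a1": "r2"},
    streams=[Stream("control", ["FE", "j0", "a0", "a1"])],
)
session.validate()  # raises ValueError if the description is inconsistent
```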

SLIDE 14

Streams

• A stream connects the FE to one or more Agents
  – Possibly through junctions
• Depending on the junctions, a stream can
  – Broadcast, gather, scatter, reduce, etc.
  – Modify, filter messages
  – Route messages
• Streams can be expanded/contracted
  – Minimize effect on communication
  – Don’t require stop and flush

[Figure: example stream with a front end (FE), three junctions (J), and four agents (A)]
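To make the broadcast and reduce behavior of a stream concrete, here is a toy tree with one front end, three junctions, and four agents. The tree shape, the node names, and both functions are assumptions chosen for illustration; they do not reflect how STCI implements streams.

```python
from typing import Callable, Dict, List

# Hypothetical stream topology: FE at the root, junctions inside, agents at the leaves.
CHILDREN: Dict[str, List[str]] = {
    "FE": ["j0"],
    "j0": ["j1", "j2"],
    "j1": ["a0", "a1"],
    "j2": ["a2", "a3"],
}


def broadcast(node: str, message: str, delivered: Dict[str, str]) -> None:
    """Send a message downstream: junctions forward it, agents record it."""
    kids = CHILDREN.get(node, [])
    if not kids:  # a leaf is an agent
        delivered[node] = message
    for kid in kids:
        broadcast(kid, message, delivered)


def reduce_up(node: str, samples: Dict[str, int],
              op: Callable[[int, int], int]) -> int:
    """Combine agent values upstream: each junction reduces its children."""
    kids = CHILDREN.get(node, [])
    if not kids:
        return samples[node]
    result = reduce_up(kids[0], samples, op)
    for kid in kids[1:]:
        result = op(result, reduce_up(kid, samples, op))
    return result


delivered: Dict[str, str] = {}
broadcast("FE", "start-sampling", delivered)

samples = {"a0": 5, "a1": 7, "a2": 1, "a3": 9}
total = reduce_up("FE", samples, lambda x, y: x + y)
peak = reduce_up("FE", samples, max)
```

Because each junction combines only its own children, the front end receives a single value no matter how many agents sit below it, which is the point of a reduction-style network.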

SLIDE 15

Streams (cont’d)

• Formed by mapping topology onto resources
• Topology
  – Predefined, e.g., binary tree
  – Tool defined
• Mapping
  – Automatic
  – Tool defined
    • Specific resource
      – e.g., put junction “X” on node “c562”
    • Class
      – e.g., put junction “X” on any “I/O node” and an agent “Y” on any “compute node”

SLIDE 16

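[Figure: a stream is formed by mapping a topology onto resources. Topology: FE, junctions j0 to j2, agents a0 to a3. Resources: r0 to r8. Resulting stream: j0 on r0, j1 on r1, j2 on r2, a0 on r3, a1 on r4, a2 on r5, a3 on r6.]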
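The mapping options (automatic, a specific resource, or a resource class) can be sketched as a small placement function. Everything here is an assumption for illustration: `map_topology`, its parameters, the resource names, the "io" and "compute" class labels, and the rule that each component gets its own resource are not taken from STCI.

```python
from typing import Dict, List, Optional


def map_topology(
    components: List[str],
    resources: Dict[str, str],              # resource name -> class label
    pinned: Optional[Dict[str, str]] = None,   # component -> specific resource
    classes: Optional[Dict[str, str]] = None,  # component -> required class
) -> Dict[str, str]:
    """Place each component on a distinct resource (hypothetical policy)."""
    pinned = pinned or {}
    classes = classes or {}
    free = list(resources)  # keeps the inventory order
    placement: Dict[str, str] = {}

    # Specific-resource requests go first so nothing else can take their node.
    for comp in components:
        if comp in pinned:
            if pinned[comp] not in free:
                raise ValueError(f"{pinned[comp]} is unavailable for {comp}")
            free.remove(pinned[comp])
            placement[comp] = pinned[comp]

    # Class-constrained components next, then the fully automatic ones.
    rest = sorted((c for c in components if c not in placement),
                  key=lambda c: c not in classes)
    for comp in rest:
        wanted = classes.get(comp)
        match = next(
            (r for r in free if wanted is None or resources[r] == wanted), None
        )
        if match is None:
            raise ValueError(f"no free resource for {comp}")
        free.remove(match)
        placement[comp] = match
    return placement


inventory = {"c561": "io", "c562": "io", "n0": "compute", "n1": "compute"}
placement = map_topology(
    ["j0", "j1", "a0", "a1"],
    inventory,
    pinned={"j0": "c562"},
    classes={"j1": "io", "a0": "compute", "a1": "compute"},
)
```

The greedy order (pinned, then class-constrained, then unconstrained) is a design choice for this sketch; it keeps unconstrained components from consuming a resource that a constrained one needs.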

SLIDE 17

Communications

• All communication is performed over a stream
• Active messages
• Stream parameters
  – Message ordering
  – Reliability
• Flow control
  – Pause and buffer
  – Pause and drop
  – Flush or quiesce a stream
• Group communication: Bcast, reduce, etc.
  – Can be implemented by tool using junctions
  – STCI provides built-in group communication streams
• Datatypes
  – Describe data layout and basic datatypes
  – Non-contiguous data
  – Heterogeneous system support
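The difference between the two pause policies is easiest to see side by side. The class below is a toy model under assumptions: `FlowControlledStream`, its method names, and the choice to flush buffered messages on resume are invented for this example and say nothing about STCI's actual flow-control interface.

```python
from collections import deque
from typing import Deque, List


class FlowControlledStream:
    """Toy stream endpoint with 'buffer' and 'drop' pause policies (hypothetical)."""

    def __init__(self, policy: str) -> None:
        if policy not in ("buffer", "drop"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.paused = False
        self.pending: Deque[str] = deque()  # messages held while paused
        self.delivered: List[str] = []
        self.dropped = 0

    def send(self, msg: str) -> None:
        if not self.paused:
            self.delivered.append(msg)
        elif self.policy == "buffer":
            self.pending.append(msg)   # pause and buffer
        else:
            self.dropped += 1          # pause and drop

    def pause(self) -> None:
        self.paused = True

    def flush(self) -> None:
        """Deliver everything that was buffered, in arrival order."""
        while self.pending:
            self.delivered.append(self.pending.popleft())

    def resume(self) -> None:
        self.paused = False
        self.flush()


def run(policy: str) -> FlowControlledStream:
    stream = FlowControlledStream(policy)
    stream.send("m1")
    stream.pause()
    stream.send("m2")
    stream.send("m3")
    stream.resume()
    stream.send("m4")
    return stream


buffered = run("buffer")
lossy = run("drop")
```

With buffering, all four messages arrive in order after the pause ends; with dropping, the two messages sent during the pause are lost and only counted.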

SLIDE 18

Persistence

• Persistent state is maintained by STCI
  – State of the infrastructure
    • Location of infrastructure components
  – Active sessions
    • Allocated resources
  – Policy & security
• Facilities for front-end disconnect and reconnect
  – Where to reconnect
• Cleanup when sessions exit or abort

SLIDE 19

Security

• Security services manage and control interaction between entities
  – Users, tools, applications, system resources
  – According to policies of a single security domain
• Services
  – Session authentication
    • Tool provides credentials to create or reconnect to a session
  – Service authorization
    • Tool will not have access to any greater privilege than the user would be allowed
• Keep as simple as possible: avoid conflicting with existing security mechanisms

SLIDE 20

Conclusion

• Developing efficient scalable tools has always been a challenge
  – Exascale systems make this even harder
• Existing tools are often
  – Architecture specific
  – Problem domain specific
  – Application specific
• Tools often have to re-invent the wheel
• STCI provides a standard HPC tool infrastructure
  – Scalability
  – Efficiency
  – Portability
  – Interoperability

SLIDE 21

For More Information

• STCI website
  – http://www.scalable-tools.org
• Email me
  – buntinas@mcs.anl.gov