A Lightweight Library for Building Scalable Tools

Emily R. Jacobson, Michael J. Brim, Barton P. Miller
Paradyn Project, University of Wisconsin
jacobson@cs.wisc.edu
June 6, 2010
Para 2010: State of the Art in Scientific and Parallel Computing


SLIDE 1

A Lightweight Library for Building Scalable Tools

Emily R. Jacobson, Michael J. Brim, Barton P. Miller
Paradyn Project, University of Wisconsin
jacobson@cs.wisc.edu
June 6, 2010
Para 2010: State of the Art in Scientific and Parallel Computing

SLIDE 2

MRNet Motivation

Example Tool: Performance Analysis

[Diagram: front-end (FE) connected directly to many back-ends (BE)]

SLIDE 3

MRNet Motivation

Example Tool: Performance Analysis
As scale increases to 100,000s of nodes, the front-end becomes a bottleneck.

[Diagram: front-end (FE) connected directly to many back-ends (BE)]

SLIDE 4

MRNet Goals

  • Provide infrastructure for building tools that scale to the largest computing platforms
  • Support scalability for command, computation, and data collection

SLIDE 5

MRNet Features

  • Scalable – handles 100,000s of nodes
  • Multi-platform – Cray XT, IBM BlueGene, Linux clusters, AIX, Solaris, Windows
  • Reliable – automatic fault recovery
  • Flexible – targets a wide variety of tools, applications, and architectures
  • Customizable – easily extended to new algorithms and requirements
  • Open source
SLIDE 6

TBŌN Model

[Diagram: front-end (FE) connected directly to back-ends (BE)]

SLIDE 7

TBŌN Model

MRNet makes use of a software tree-based overlay network (TBŌN).

[Diagram: FE at the root, tree of communication processes (CP), BEs at the leaves]

SLIDE 8

TBŌN Model

TBŌNs provide:

  • Scalable multicast
  • Scalable gather
  • Scalable data aggregation

[Diagram: FE at the root, tree of communication processes (CP), BEs at the leaves]

SLIDE 9

TBŌN Model

[Diagram: application front-end (FE) at the root, tree of communication processes (CP), application back-ends (BE) at the leaves]

SLIDE 10

MRNet: An Easy-to-use TBŌN

[Diagram: FE at the root, tree of CPs, BEs at the leaves]

Logical channels called streams connect the front-end to the back-ends. Data is sent along a stream in a packet.

SLIDE 11

MRNet: An Easy-to-use TBŌN

Easily multicast messages to back-end nodes.

[Diagram: messages flowing from the FE down the tree to the BEs]

SLIDE 12

MRNet: An Easy-to-use TBŌN

Easily multicast messages to back-end nodes and aggregate data as it is sent to the front-end.

[Diagram: messages flowing down the tree to the BEs, aggregated data flowing back up to the FE]

SLIDE 13

MRNet: An Easy-to-use TBŌN

[Diagram: application-level packets passing through packet filters at the tree processes]

SLIDE 14

MRNet Filters

[Diagram: inside a tree process, packets pass through packet batching/unbatching, a synchronization filter, and a transformation filter]

SLIDE 15

MRNet Components

[Diagram: Tool Front-end (libmrnet) at the root; CPs running filters at the interior nodes; Tool Back-Ends (libmrnet) at the leaves. The CPs, filters, and libmrnet are provided by MRNet; the front-end and back-end tool code is user written.]

SLIDE 16

Example MRNet Tool

Performance Tool: gather load information from back-ends.

[Diagram: FE broadcasts num_vals and delay down the tree to the BEs]

SLIDE 17

Example MRNet Tool

Performance Tool: gather load information from back-ends.

[Diagram: num_vals and delay flow down to the BEs; each BE reports cur_load; avg filters at the CPs and FE combine them into avg_load]


SLIDE 19

Example Frontend Code

front_end_main(int argc, char ** argv)
{
    Network * net = Network::CreateNetworkFE(topo_file, bckend_exe, &dummy_argv);
    int filter_id = net->load_FilterFunc(so_file, "LoadAvg");
    Communicator * comm = net->get_BroadcastCommunicator();
    Stream * strm = net->new_Stream(comm, filter_id, SFILTER_WAITFORALL);

    int tag = PROT_SUM;
    strm->send(tag, "%d %d", num_vals, delay);

    for (i = 0; i < num_vals; i++) {
        strm->recv(&tag, pkt);
        pkt->unpack("%d", &recv_val);
    }

    strm->send(PROT_EXIT, "");
    delete net;
}
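The topo_file passed to CreateNetworkFE names a text file describing the process tree. As a rough illustration only (the hostnames are placeholders and the exact grammar should be taken from the MRNet manual, not from this sketch), a front-end with two communication processes and four back-ends might be described like:

```
host0:0 => host1:0 host2:0 ;
host1:0 => host3:0 host4:0 ;
host2:0 => host5:0 host6:0 ;
```

Each line lists a parent process (host:rank) followed by its children; CreateNetworkFE launches the tree from this description.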

SLIDE 20

Example Frontend Code

Network::CreateNetworkFE creates a new instance of a Network.

SLIDE 21

Example Frontend Code

get_BroadcastCommunicator gets the broadcast communicator, and new_Stream creates a Stream that uses this communicator.

SLIDE 22

Example Frontend Code

strm->send sends num_vals and delay to the BEs.

SLIDE 23

Example Frontend Code

strm->recv and pkt->unpack receive and unpack a Packet with a single int.

SLIDE 24

Example Frontend Code

Sending PROT_EXIT and deleting the Network tears the Network down.

SLIDE 25

Example Filter Code

const char * LoadAvg_format_string = "%d";

void LoadAvg(std::vector<PacketPtr> & pkts_in,
             std::vector<PacketPtr> & pkts_out,
             std::vector<PacketPtr> &, /* packets_out_reverse */
             void **,                  /* client data */
             PacketPtr &)              /* params */
{
    int avg = 0;
    for (unsigned int i = 0; i < pkts_in.size(); i++) {
        PacketPtr cur_packet = pkts_in[i];
        int val;
        cur_packet->unpack("%d", &val);
        avg += val;
    }
    avg = avg / pkts_in.size();

    PacketPtr new_pkt(new Packet(pkts_in[0]->get_StreamId(),
                                 pkts_in[0]->get_Tag(), "%d", avg));
    pkts_out.push_back(new_pkt);
}

SLIDE 26

Example Filter Code

LoadAvg_format_string declares the format of data expected by the filter.

SLIDE 27

Example Filter Code

The filter uses the generic filter function signature.

SLIDE 28

Example Filter Code

The loop aggregates the incoming packets.

SLIDE 29

Example Filter Code

The filter creates a new outgoing Packet and pushes it onto pkts_out.

SLIDE 30

Example Backend Code

back_end_main(int argc, char ** argv)
{
    Network * net = Network::CreateNetworkBE(argc, argv);
    Stream * strm;
    PacketPtr pkt;

    net->recv(&tag, pkt, &strm);
    pkt->unpack("%d %d", &n, &delay);

    for (i = 0; i < n; i++) {
        strm->send(tag, "%d", get_load());
        sleep(delay);
    }
}

SLIDE 31

Example Backend Code

Network::CreateNetworkBE creates a new instance of a Network.

SLIDE 32

Example Backend Code

The Stream and PacketPtr declarations set up some necessary variables.

SLIDE 33

Example Backend Code

net->recv does an anonymous network receive.

SLIDE 34

Example Backend Code

pkt->unpack unpacks a Packet containing two ints.

SLIDE 35

Example Backend Code

strm->send sends the current load up the stream; the back-end then sleeps for the specified time.

SLIDE 36

Using the Lightweight Backend API

  • For those already using MRNet, learning to use the lightweight API is easy

int Stream::send(int tag, char * format_string, ...);
int Stream_send(Stream_t * stream, int tag, char * format_string, ...);


SLIDE 38

Example Backend Code

Standard (C++) back-end:

back_end_main(int argc, char ** argv)
{
    Network * net = Network::CreateNetworkBE(argc, argv);
    Stream * strm;
    PacketPtr pkt;

    net->recv(&tag, pkt, &strm);
    pkt->unpack("%d %d", &n, &delay);

    for (i = 0; i < n; i++) {
        strm->send(tag, "%d", get_load());
        sleep(delay);
    }
}

Lightweight (C) back-end:

back_end_main(int argc, char ** argv)
{
    Network_t * net = Network_CreateNetworkBE(argc, argv);
    Stream_t * strm;
    Packet_t * pkt;

    Network_recv(net, &tag, pkt, &strm);
    Packet_unpack(pkt, "%d %d", &n, &delay);

    for (i = 0; i < n; i++) {
        Stream_send(strm, tag, "%d", get_load());
        sleep(delay);
    }
}

SLIDE 39

Example Backend Code

Type changes between the C++ and lightweight C APIs:

    Network *  →  Network_t *
    Stream *   →  Stream_t *
    PacketPtr  →  Packet_t *

SLIDE 40

Example Backend Code

Call changes between the C++ and lightweight C APIs:

    Network::CreateNetworkBE(argc, argv)  →  Network_CreateNetworkBE(argc, argv)
    net->recv(&tag, pkt, &strm)           →  Network_recv(net, &tag, pkt, &strm)
    pkt->unpack("%d %d", &n, &delay)      →  Packet_unpack(pkt, "%d %d", &n, &delay)
    strm->send(tag, "%d", get_load())     →  Stream_send(strm, tag, "%d", get_load())

SLIDE 41

New Lightweight Backend Library

  • Can embed the C library in application processes
  • Lightweight MRNet BE can run on reduced node kernels (such as BlueGene compute nodes)
  • Avoids the need to deal with threading in the application or tool

SLIDE 42

Lightweight MRNet

  • Standard MRNet back-end as part of tool
    • C++ library
    • Multi-threaded – dedicated thread receives data
    • Filtering at back-ends
  • Lightweight back-end as part of C-based tool or application
    • C library
    • Single-threaded
    • No filtering at back-end

SLIDE 43

Some MRNet Users

  • Stack Trace Analysis Tool (STAT, LLNL)
  • Cray Application Termination Processing (ATP, Cray)
  • TotalView using TBON-FS (TotalView and Univ. Wisconsin)
  • TAU over MRNet performance tool (ToM, Univ. Oregon)
  • Open|SpeedShop, Component Based Tool Framework (CBTF, Krell Institute)
  • Paradyn Performance Tool (Univ. Wisconsin)
  • Group File Operations & TBON-FS (Univ. Wisconsin)
  • Clustering algorithms (Mean shift, Univ. Wisconsin Vision Group)
  • On-line detection of large scale application structure (BSC)

SLIDE 44

Conclusions

  • MRNet provides infrastructure for scalable communication and computation
  • Runs full-scale on large systems, for example:
    • Jaguar – 216K processes
    • BlueGene/L – 208K processes
  • Used by a wide variety of tools
  • Handles the hard work of tool scaling
  • Provides fault tolerance support
  • MRNet can be integrated with tools written in either C or C++

SLIDE 45

Questions?

MRNet 3.0 Available Soon!
http://www.paradyn.org/html/downloads.html

Manuals and Publications Available:
http://www.paradyn.org/mrnet

Additional Questions? jacobson@cs.wisc.edu