SLIDE 1

Porting Charm++ to a New System

Writing a Machine Layer

Sayantan Chakravorty

5/01/2008, Parallel Programming Laboratory

SLIDE 2

Why have a Machine Layer?

[Diagram: the software stack. User Code (.ci, .C, .h) sits on Charm++ (load balancing, virtualization), which sits on Converse (scheduler, memory management, message delivery, timers), which sits on the Machine Layer.]

SLIDE 3

Where is the Machine Layer?

  • Code exists in charm/src/arch/<Layer Name>
  • Files needed for a machine layer

– machine.c: contains the C code
– conv-mach.sh: defines environment variables
– conv-mach.h: defines macros that select the version of machine.c
– Many variants can be produced from the same machine.c by varying conv-mach-<option>.*

  • 132 versions based on only 18 machine.c files


SLIDE 4

What does a Machine Layer do?


[Diagram: a front end launches several node processes; each process runs ConverseInit, sends messages with CmiSyncSendFn / CmiSyncBroadcastFn, and finishes with ConverseExit or CmiAbort.]

SLIDE 5

Different kinds of Machine Layers

  • Differentiated by startup method
    – A lower-level library / run time does startup
      • MPI: mpirun is the frontend (cray, sol, bluegenep)
      • VMI: vmirun is the frontend (amd64, ia64)
      • ELAN: prun is the frontend (axp, ia64)
    – Charm run time does startup itself
      • Network based (net): charmrun is the frontend (amd64, ia64, ppc)
        – Infiniband, Ethernet, Myrinet


SLIDE 6

Net Layer: Why?

  • Why do we need a startup in the Charm RTS?
    – We use a low-level interconnect API; no startup is provided
  • Why use a low-level API?
    – Faster
      • Lower overheads
      • We can design for a message-driven system
    – More flexible
      • Can implement functionality with exactly the semantics needed


SLIDE 7

Net Layer: What?

  • Code base for implementing a machine layer on top of a low-level interconnect API


[Diagram: the common net layer pieces (charmrun, ConverseInit, CmiSyncSendFn, CmiSyncBroadcastFn, ConverseExit, CmiAbort) call down into the interconnect-specific hooks: req_client_connect, node_addresses_obtain, CmiMachineInit, DeliverViaNetwork, CommunicationServer, CmiMachineExit.]

SLIDE 8

Net Layer: Startup


charmrun.c

main(){
  // read the node file
  nodetab_init();
  // fire off compute node processes
  start_nodes_rsh();
  // wait for all nodes to reply,
  // then send nodes their node table
  req_client_connect();
  // poll for requests
  while (1) req_poll();
}

machine.c

ConverseInit(){
  // open a socket to charmrun
  skt_connect(..);
  // initialize the interconnect
  CmiMachineInit();
  // send my node data, get back the node table
  node_addresses_obtain(..);
  // start the Charm++ user code
  ConverseRunPE();
}

[Diagram: each node process sends its node data to charmrun and receives the assembled node table in return.]

SLIDE 9

Net Layer: Sending messages


CmiSyncSendFn(int proc, int size, char *msg){
  // common function for send
  CmiGeneralSend(proc, size, 'S', msg);
}

CmiGeneralSend(int proc, int size, int freemode, char *data){
  OutgoingMsg ogm = PrepareOutgoing(cs, pe, size, freemode, data);
  DeliverOutgoingMessage(ogm);
  // check for incoming messages and completed sends
  CommunicationServer();
}

DeliverOutgoingMessage(OutgoingMsg ogm){
  // send the message on the interconnect
  DeliverViaNetwork(ogm, ..);
}

SLIDE 10

Net Layer: Exit


ConverseExit(){
  // shut down the interconnect cleanly
  CmiMachineExit();
  // shut down Converse
  ConverseCommonExit();
  // inform charmrun that this process is done
  ctrl_sendone_locking("ending", NULL, 0, NULL, 0);
}

SLIDE 11

Net Layer: Receiving Messages

  • No mention of receiving messages
  • Result of message driven paradigm

– No explicit Receive calls

  • Receive starts in CommunicationServer

– Interconnect-specific code collects the received message
– Calls CmiPushPE to hand the message over (see the sketch below)
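To make that handover concrete, here is a minimal sketch of the last step, assuming the message has already been read into a CmiAlloc'd buffer; the rank-0 destination is a simplification (the real net layer derives the destination PE from the dgram header):

// Sketch: hand a fully received message to Converse.
// `buf` is assumed to be a CmiAlloc'd buffer holding the complete message.
static void si_handover_to_converse(void *buf)
{
  int destRank = 0;         /* assumption: one PE per node, so rank 0 */
  CmiPushPE(destRank, buf); /* enqueue on that PE's queue; the Converse
                               scheduler picks it up and runs the handler */
}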


SLIDE 12

Let’s write a Net based Machine Layer


SLIDE 13

A Simple Interconnect

  • Let’s make up an interconnect

– Simple

  • Each node has a port
  • Other Nodes send it messages on that port
  • A node reads its port for incoming messages
  • Messages are received atomically

– Reliable
– Does flow control itself


SLIDE 14

The Simple Interconnect API

  • Initialization

– void si_init()
– int si_open()
– NodeID si_getid()

  • Send a message

– int si_write(NodeID node, int port, int size, char *msg)

  • Receive a message

– int si_read(int port, int size, char *buf)

  • Exit

– int si_close(int port)
– void si_done()

(The full API is collected into a header sketch below.)
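A hypothetical si.h for this made-up interconnect could look like the following; the NodeID typedef and the return-value conventions are assumptions, everything else simply mirrors the list above:

/* si.h - hypothetical header for the made-up Simple Interconnect */
#ifndef SI_H
#define SI_H

typedef unsigned int NodeID;  /* assumption: opaque node identifier */

/* Initialization */
void   si_init(void);    /* bring up the interconnect */
int    si_open(void);    /* open this node's port, return its number */
NodeID si_getid(void);   /* this node's NodeID */

/* Messaging: atomic, reliable, flow controlled by the interconnect */
int si_write(NodeID node, int port, int size, char *msg);
int si_read(int port, int size, char *buf);  /* assumption: returns bytes read, 0 if none pending */

/* Exit */
int  si_close(int port);
void si_done(void);

#endif /* SI_H */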


SLIDE 15

Let’s start

  • Net layer based implementation for SI


conv-mach-si.h

#undef CMK_USE_SI
#define CMK_USE_SI 1

// Polling based net layer
#undef CMK_NETPOLL
#define CMK_NETPOLL 1

conv-mach-si.sh

CMK_INCDIR="-I/opt/si/include"
CMK_LIBDIR="-L/opt/si/lib"
CMK_LIBS="$CMK_LIBS -lsi"

SLIDE 16

Net based SI Layer


machine-si.c

#include "si.h"

// Interconnect-specific hooks implemented in this file:
//   CmiMachineInit
//   DeliverViaNetwork
//   CommunicationServer
//   CmiMachineExit

machine.c

// Message delivery
#include "machine-dgram.c"

machine-dgram.c

#if CMK_USE_GM
  #include "machine-gm.c"
#elif CMK_USE_SI
  #include "machine-si.c"
#elif ...

SLIDE 17

Initialization


charmrun.c

void req_client_connect(){
  // collect node data from every client
  for(i=0; i<nClients; i++){
    ChMessage_recv(req_clients[i], &msg);
    ChSingleNodeInfo *m = msg->data;
#ifdef CMK_USE_SI
    nodetab[m->PE].nodeID = m->info.nodeID;
    nodetab[m->PE].port   = m->info.port;
#endif
  }
  // send node data to all
  for(i=0; i<nClients; i++){
    // send nodetab on req_clients[i]
  }
}

machine.c

static OtherNode nodes;

void node_addresses_obtain(){
  ChSingleNodeinfo me;
#ifdef CMK_USE_SI
  me.info.nodeID = si_nodeID;
  me.info.port   = si_port;
#endif
  // send node data to charmrun
  ctrl_sendone_nolock("initnode", &me, sizeof(me), NULL, 0);
  // receive and store the node table
  ChMessage_recv(charmrun_fd, &tab);
  for(i=0; i<Cmi_num_nodes; i++){
    nodes[i].nodeID = tab->data[i].nodeID;
    nodes[i].port   = tab->data[i].port;
  }
}

machine-si.c

NodeID si_nodeID;
int si_port;

CmiMachineInit(){
  si_init();
  si_port   = si_open();
  si_nodeID = si_getid();
}

SLIDE 18

Messaging: Design

  • Small header with every message

– Contains the size of the message
– Contains the source NodeID (not strictly necessary)

  • Read the header

– Allocate a buffer for the incoming message
– Read the message into the buffer
– Send it up to Converse


SLIDE 19

Messaging: Code


machine-si.c

typedef struct {
  unsigned int size;
  NodeID nodeID;
} si_header;

void DeliverViaNetwork(OutgoingMsg ogm, int dest, ...){
  DgramHeaderMake(ogm->data, ...);
  si_header hdr;
  hdr.nodeID = si_nodeID;
  hdr.size   = ogm->size;
  OtherNode n = nodes[dest];
  // send the header, then the message body
  if(!si_write(n.nodeID, n.port, sizeof(hdr), (char *)&hdr)){ /* error */ }
  if(!si_write(n.nodeID, n.port, hdr.size, ogm->data)){ /* error */ }
}

machine-si.c

void CommunicationServer(){
  si_header hdr;
  // drain every message waiting on our port
  while(si_read(si_port, sizeof(hdr), (char *)&hdr) != 0){
    void *buf = CmiAlloc(hdr.size);
    int readSize, readTotal = 0;
    while(readTotal < hdr.size){
      if((readSize = si_read(si_port, hdr.size - readTotal,
                             (char *)buf + readTotal)) < 0){ /* error */ }
      readTotal += readSize;
    }
    // handover to Converse
  }
}

SLIDE 20

Exit


machine-si.c

NodeID si_nodeID;
int si_port;

CmiMachineExit(){
  si_close(si_port);
  si_done();
}

SLIDE 21

More complex Layers

  • Receive buffers need to be posted

– Packetization

  • Unreliable interconnect

– Error and drop detection
– Packetization
– Retransmission

  • Interconnect requires memory to be registered

– CmiAlloc implementation (see the sketch below)
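For the registered-memory case, a purely conceptual sketch of what the allocation side could look like; si_register_memory and si_deregister_memory are invented names for whatever pinning calls such an interconnect would provide, and a real layer would hook this into CmiAlloc/CmiFree rather than expose separate functions:

#include <stdlib.h>

/* Conceptual sketch only: message buffers must live in memory the
 * interconnect has registered (pinned) so it can DMA into them. */
void *si_alloc_registered(int size)
{
  void *buf = malloc(size);
  if (buf != NULL)
    si_register_memory(buf, size);   /* hypothetical pinning call */
  return buf;
}

void si_free_registered(void *buf, int size)
{
  si_deregister_memory(buf, size);   /* hypothetical unpin call */
  free(buf);
}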


SLIDE 22

Thank You
