
Case Studies in Asynchronous, Message-Driven Shared Memory



  1. Case Studies in Asynchronous, Message-Driven Shared Memory Programming
     Pritish Jetley, Parallel Programming Laboratory
     pjetley2@illinois.edu
     04/25/11

  2. Outline
     ● Shared memory programming today
     ● Charm++ on multicore systems
     ● Shared memory (SM) programming in Charm++
     ● Case studies
       ● Barnes-Hut (SPLASH)
       ● SAH-based kd-tree construction

  3. SM programming today
     ● Fork-join
     ● Amorphous, thread-based (pthreads)
     ● Data parallelism-centric (OpenMP)
     ● Tasks (TBB, Cilk)
     ● Message-driven execution (Charm++)

  4. Fork-join model
     - Forced synchrony
     - Low-level
     - Mutex
     - Grainsize control
     + Simple to program (?)
     + Global view of control
     + Natural fit for certain problems

  5. Charm++ on multicore systems
     ● Decompose the algorithm into objects that encapsulate its natural elements
     ● Objects present reactive interfaces
     ● Control flows through asynchronous entry method invocations
     ● Data flows through pointer exchange

  6. SM programming with Charm++ and MDE
     - No global view of control
     - MDE is low-level
     - Charm++ has no faults whatsoever
     + Natural decomposition
     + Dependencies = messages
     + Asynchrony
     + Dynamic load balancing
     + Task prioritization

  7. Performance and productivity studies
     ● How easy (or hard) is it to write SM programs in Charm++?
     ● Can we expect improvements in performance?
     ● Are there abstractions that would improve programmability in Charm++?

  8. Comparison points
     ● SPLASH2 Barnes-Hut benchmark
       ● Study evolution of self-gravitating systems
       ● Tree-based code
       ● Uses pthreads
     ● SAH-based kd-tree construction
       ● High-performance ray tracing
       ● Nested parallelism
       ● Uses TBB

  9. SPLASH Barnes-Hut
     ● Domain decomposition and tree building
       ● Partition space into compact, disjoint regions containing approximately equal numbers of particles
       ● Regions arranged in an octree
       ● Independent subtrees: task parallel
       ● Shuffle particles into child bins: data parallel
     ● Force calculation
       ● Objects own non-intersecting sets of particles, and calculate forces on them

  10. Decomposition
      ● Recursively divide a partition into quadrants if it contains more than τ particles
      [Figure: example decomposition with τ = 3]
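The threshold rule on this slide can be sketched as a recursive quadtree subdivision. This is a minimal illustration under stated assumptions, not the SPLASH2 code: `Point`, `QuadNode`, `quadrantOf`, `subdivide`, `countLeaves`, and the `maxDepth` guard are hypothetical names introduced here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical types for illustration; not taken from the SPLASH2 code.
struct Point { double x, y; };

struct QuadNode {
    double cx, cy, half;                 // square centre and half-width
    std::vector<Point> points;           // filled only in leaves
    std::array<std::unique_ptr<QuadNode>, 4> child;
    bool isLeaf() const { return !child[0]; }
};

// Quadrant index: bit 0 set = east half, bit 1 set = north half.
inline int quadrantOf(const QuadNode& n, const Point& p) {
    return (p.x >= n.cx ? 1 : 0) | (p.y >= n.cy ? 2 : 0);
}

// Split a node while it holds more than tau points.
// maxDepth (an assumption) stops the recursion on coincident points.
void subdivide(QuadNode& n, std::size_t tau, int maxDepth = 32) {
    assert(tau > 0);
    if (n.points.size() <= tau || maxDepth == 0) return;
    const double h = n.half / 2;
    for (int q = 0; q < 4; ++q) {
        n.child[q] = std::make_unique<QuadNode>();
        n.child[q]->cx = n.cx + ((q & 1) ? h : -h);
        n.child[q]->cy = n.cy + ((q & 2) ? h : -h);
        n.child[q]->half = h;
    }
    // Shuffle the points into the four child bins.
    for (const Point& p : n.points)
        n.child[quadrantOf(n, p)]->points.push_back(p);
    n.points.clear();
    // The four subtrees are independent of one another.
    for (auto& c : n.child) subdivide(*c, tau, maxDepth - 1);
}

// Count leaves, e.g. to inspect the resulting decomposition.
std::size_t countLeaves(const QuadNode& n) {
    if (n.isLeaf()) return 1;
    std::size_t total = 0;
    for (const auto& c : n.child) total += countLeaves(*c);
    return total;
}
```

The final loop in `subdivide` is where task parallelism would enter, since each child subtree can be built without touching its siblings.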

  11. Domain decomposition
      [Figure labels: node task N; first particle; last particle; combined messages; child tasks]

  12. Decomposition with pthreads
      void decompose(){
        for(int I = 0; I < myNP; I++){
          Particle *p = myParticles[I];
          Cell *cell = g_root;
          while(1){
            cell->LOCK();
            if(!cell->isLeaf()){
              // Internal cell: descend, then release the parent.
              Cell *save = cell;
              int which = cell->which(p->key);
              cell = cell->child(which);
              save->UNLOCK();
            }
            else{
              // Leaf: insert the particle and split if needed.
              cell->particles.add(p);
              cell->split();
              cell->UNLOCK();
              break;
            }
          }
        }
      }
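The per-cell locking pattern above can be tried out in standard C++ with `std::mutex` in place of the slide's `LOCK()`/`UNLOCK()` calls. The sketch below is a simplified 1-D analogue, not the benchmark code: the `Cell` layout, `insert`, and `kLeafCapacity` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical 1-D analogue of the slide's Cell; names are illustrative.
struct Cell {
    double lo, hi;                        // half-open interval [lo, hi)
    std::mutex m;                         // guards keys and kids
    std::vector<double> keys;             // held only while this is a leaf
    std::unique_ptr<Cell> kids[2];
    Cell(double l, double h) : lo(l), hi(h) {}
    bool isLeaf() const { return !kids[0]; }
    int which(double k) const { return k < 0.5 * (lo + hi) ? 0 : 1; }
};

constexpr std::size_t kLeafCapacity = 2;  // assumed split threshold

// Insert one key while holding at most one cell lock at a time.
void insert(Cell* root, double key) {
    assert(root != nullptr);
    Cell* cell = root;
    for (;;) {
        std::unique_lock<std::mutex> guard(cell->m);
        if (!cell->isLeaf()) {
            // Internal cell: pick the child; the guard releases the
            // parent's lock when this iteration ends.
            cell = cell->kids[cell->which(key)].get();
            continue;
        }
        cell->keys.push_back(key);
        if (cell->keys.size() > kLeafCapacity) {
            // Split once; an overfull child splits on its next insert.
            const double mid = 0.5 * (cell->lo + cell->hi);
            auto left = std::make_unique<Cell>(cell->lo, mid);
            auto right = std::make_unique<Cell>(mid, cell->hi);
            for (double k : cell->keys)
                (k < mid ? left : right)->keys.push_back(k);
            cell->keys.clear();
            cell->kids[0] = std::move(left);
            cell->kids[1] = std::move(right);
        }
        return;
    }
}
```

Every insertion starts by locking the root, so threads contend near the top of the tree even when they end up in different leaves.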

  13. Decomposition with Charm++
      void TreePiece::decompose(){
        for(int I = 0; I < myNP; I++){
          Particle *p = myParticles[I];
          int which = g_root->whichChild(p->key);
          bufferParticle(which,p);
          if(outParticles[which].size() > THRESH){
            flushParticles(which);
          }
        }
        flushAllParticles();
      }

      void TreePiece::flushParticles(int I){
        treePieceProxy[I].recvParticles(buffered[I], buffered[I].size());
      }

      void TreePiece::recvParticles(Particle *ptr, int np){
        if(myRoot->isLeaf()){
          myRoot->addParticles(ptr,np);
          if(myRoot->split()){
            forwardParticlesToChildren(myRoot->particles);
          }
        }
        else{
          forwardParticlesToChildren(ptr,np);
        }
      }

      void TreePiece::forwardParticlesToChildren(/* parameter list cut off on the slide */){
        for(int I = 0; I < NUM_CHILDREN; I++){
          treePieceProxy[childIndex[I]].recvParticles(childParticles[I], childParticles[I].size());
        }
      }
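The buffer-then-flush pattern in `decompose()` (collect particles per destination, send a combined message once a buffer exceeds `THRESH`, then flush the remainder) can be sketched in plain C++. `SendBuffer` and its `Sink` callback are hypothetical helpers, not part of the Charm++ API; in the slide's code the sink would correspond to an entry method invocation such as `recvParticles`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper, not a Charm++ API: batches items per destination.
template <typename T>
class SendBuffer {
public:
    using Sink = std::function<void(int dest, std::vector<T> batch)>;

    SendBuffer(int numDests, std::size_t threshold, Sink sink)
        : buffers_(static_cast<std::size_t>(numDests)),
          threshold_(threshold), sink_(std::move(sink)) {
        assert(numDests > 0);
    }

    // Queue one item; send the batch once it exceeds the threshold,
    // mirroring the "size() > THRESH" test on the slide.
    void add(int dest, T item) {
        buffers_[dest].push_back(std::move(item));
        if (buffers_[dest].size() > threshold_) flush(dest);
    }

    // Hand the pending batch for one destination to the sink.
    void flush(int dest) {
        if (buffers_[dest].empty()) return;
        std::vector<T> batch;
        batch.swap(buffers_[dest]);
        sink_(dest, std::move(batch));
    }

    // Send whatever is left in every buffer.
    void flushAll() {
        for (int d = 0; d < static_cast<int>(buffers_.size()); ++d) flush(d);
    }

private:
    std::vector<std::vector<T>> buffers_;
    std::size_t threshold_;
    Sink sink_;
};
```

A larger threshold means fewer, bigger messages, while a smaller one lets the receivers start working sooner.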

  14. Tree traversal
      Traverse(Leaf b, Node n){
        if(IsLeaf(n)){
          LeafForces(b,n);
        }
        else if(Side(n)/|r(n)-r(b)| < Theta_T){
          CellForces(b,n);
        }
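The opening test in the traversal (treat a cell as a single mass when its side divided by its distance is below a threshold) is the standard Barnes-Hut criterion. The following is a minimal 2-D sketch with G = 1 and no softening. `Vec2`, `Node`, `pointAccel`, and `traverse` are assumed names, each leaf is assumed to be summarized by a single point mass, and descending into the children of a cell that fails the test is standard Barnes-Hut behaviour rather than something shown on the slide.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <vector>

// Hypothetical minimal types; 2-D, G = 1, no softening.
struct Vec2 { double x, y; };

struct Node {
    double side = 0;                  // edge length of the cell
    double mass = 0;                  // total mass inside the cell
    Vec2 com{0.0, 0.0};               // centre of mass
    std::vector<std::unique_ptr<Node>> children;   // empty for a leaf
};

// Acceleration at position p due to a point mass m located at q.
inline Vec2 pointAccel(const Vec2& p, const Vec2& q, double m) {
    const double dx = q.x - p.x, dy = q.y - p.y;
    const double r2 = dx * dx + dy * dy;
    if (r2 == 0.0) return Vec2{0.0, 0.0};      // skip self-interaction
    const double inv = m / (r2 * std::sqrt(r2));
    return Vec2{dx * inv, dy * inv};
}

// Accumulate the acceleration at p. A cell is used as a single mass
// when it is a leaf or when side / distance < theta; otherwise it is
// opened and its children are visited.
void traverse(const Vec2& p, const Node& n, double theta,
              Vec2& acc, int& interactions) {
    assert(theta >= 0.0);
    const double d = std::hypot(n.com.x - p.x, n.com.y - p.y);
    if (n.children.empty() || n.side < theta * d) {
        const Vec2 a = pointAccel(p, n.com, n.mass);
        acc.x += a.x;
        acc.y += a.y;
        ++interactions;
        return;
    }
    for (const auto& c : n.children) traverse(p, *c, theta, acc, interactions);
}
```

Smaller values of `theta` open more cells, which raises both the accuracy and the number of interactions.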

  15. Fewer barriers
      [Figures: 100k.1.comparison.eps, 10k.1.comparison.eps]

  16. Performance profile

  17. Performance profile

  18. More results
      [Figures: 100k.2.comparison.eps, 10k.2.comparison.eps]

  19. SAH-based kd-trees
      ● Used to efficiently render complex graphical scenes
      ● Task-parallel construction of independent subtrees (dynamically created chares)
      ● Data-parallel calculation of node split point (chare arrays)

  20. Binary Space Partitioning
      ● SAH decides position of partition based on triangle distribution and partition surface area
      [Figure labels: partitioning plane; extents]
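For reference, the commonly used form of the surface area heuristic (SAH) weighs the triangle count on each side of a candidate plane by that side's surface area relative to the parent box. The sketch below assumes that textbook formulation. `Box`, `surfaceArea`, `sahCost`, and the cost constants `cTraverse` and `cIntersect` are illustrative names and defaults, not values from the presentation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical axis-aligned bounding box: extents along x, y, z.
struct Box { double lo[3], hi[3]; };

inline double surfaceArea(const Box& b) {
    const double dx = b.hi[0] - b.lo[0];
    const double dy = b.hi[1] - b.lo[1];
    const double dz = b.hi[2] - b.lo[2];
    return 2.0 * (dx * dy + dy * dz + dz * dx);
}

// Textbook SAH cost of splitting `parent` at `pos` along `axis`:
//   cTraverse + cIntersect * (SA(L)/SA(P) * nLeft + SA(R)/SA(P) * nRight)
// The cost constants are assumed defaults.
double sahCost(const Box& parent, int axis, double pos,
               std::size_t nLeft, std::size_t nRight,
               double cTraverse = 1.0, double cIntersect = 1.0) {
    assert(axis >= 0 && axis < 3);
    assert(parent.lo[axis] <= pos && pos <= parent.hi[axis]);
    Box left = parent, right = parent;
    left.hi[axis] = pos;                 // left child ends at the plane
    right.lo[axis] = pos;                // right child starts at the plane
    const double sp = surfaceArea(parent);
    assert(sp > 0.0);
    return cTraverse + cIntersect *
           (surfaceArea(left) / sp * static_cast<double>(nLeft) +
            surfaceArea(right) / sp * static_cast<double>(nRight));
}
```

A builder would evaluate `sahCost` for each candidate plane and keep the minimum, which is why it needs the triangle counts on both sides of every candidate.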

  21. kd-tree construction
      [Figure labels: node task N; first triangle; last triangle; particle chare array P; child tasks]

  22. Charm++ pseudocode
      ● Use SDAG to sequence events in parallel scan
      entry void Worker::scanTriangleCounts(ActivationRec ar, NodeTaskID N){
        dist = W >> 1;
        while(dist > 0){
          if(thisIdx < dist){
            ScanMsg m;
            m.NL = myNL;
            m.NR = ar.nTris-myNR;
            RefNum(m) = dist;
            workers[thisIdx+dist].recvNeighborCounts(m);
          }
          // Wait for the message tagged with the current distance.
          when recvNeighborCounts[dist](ScanMsg m1){
            myNL += m1.NL;
            myNR -= m1.NR;
            dist >>= 1;
          }
        }
        Plane bestPlane = calculateSAH();
        reduce(bestPlane, N, NodeTask::getBestPlanes);
      }
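To make the idea of a distance-stepped scan concrete, here is a sequential simulation of a standard Hillis-Steele inclusive prefix sum. Note that this is a swapped-in textbook formulation with doubling distances, not a transcription of the SDAG pseudocode above, which steps through halving distances. `inclusiveScan` and `rightCounts` are hypothetical names. The second helper mirrors the slide's use of a total count minus a left count to obtain the right count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive prefix sum in about log2(n) rounds (Hillis-Steele).
// Within a round every index can be updated independently, so each
// index plays the role of one worker.
std::vector<long> inclusiveScan(std::vector<long> v) {
    std::vector<long> next(v.size());
    for (std::size_t dist = 1; dist < v.size(); dist <<= 1) {
        for (std::size_t i = 0; i < v.size(); ++i)
            next[i] = (i >= dist) ? v[i] + v[i - dist] : v[i];
        v.swap(next);   // end of round: all workers now see the new values
    }
    return v;
}

// Given scanned left counts, derive right counts as total minus left.
std::vector<long> rightCounts(const std::vector<long>& left) {
    assert(!left.empty());
    std::vector<long> right(left.size());
    for (std::size_t i = 0; i < left.size(); ++i)
        right[i] = left.back() - left[i];
    return right;
}
```

In a message-driven setting, the per-round swap would be replaced by each worker waiting only for the one message it needs for that round, which is the role the `when` clause plays in the pseudocode.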

  23. Charm++ implementation
      ● One chare for each node of kd-tree (orchestrator)
      ● For data-parallel operations, orchestrator either
        ● Fires new chares (dynamic load balance)
        ● Uses chare array (low overhead of use)
      ● Several optimizations in place
        ● Prioritization
        ● Array-level mcasts/reductions
        ● Manual “smearing” of tasks at top level
        ● Use of chunked arrays
          – Reduces false sharing
          – Reduces amount of coordination communication

  24. Results
      [Figures: bunny.eps, fairy.eps, angel.eps, happy.eps]

  25. Performance profile
