Offload Mode Case Study
James Briggs, COSMOS DiRAC
April 28, 2015

Outline: Case Study: Modal2d; Surveying the Code; Making it Offloadable; Xeon Phi Performance

Case Study: Modal2d
MODAL is an early universe simulation and analysis code used to probe the Cosmic Microwave Background (CMB).
It analyses higher-order correlation functions beyond the power spectrum.
It uses a novel algorithm for efficient mode expansion to measure and reconstruct the CMB bispectrum for the first time.
It is a fast and efficient way to probe cosmological data for hints of new physics in the early universe.
[Figure: Bispectrum of CMB. Source: Planck 2013 results. XXIV. Constraints on primordial non-Gaussianity.]

Surveying the Code
The original code is pure C and parallelised with MPI only.
The code was already vectorised on Xeon to great success, and there is enough potential parallelism for threads ⇒ great Xeon Phi potential?
Library dependencies (GSL, iniparser, FFTW) are used for initialisation and I/O, outside of the main loop.
Compiling for native mode with -mmic is tedious because the external libraries need to be compiled for Xeon Phi too.
It is likely less tedious to test Xeon Phi with offload than with native mode.

Pseudo-code
We want to offload the computationally most expensive part. Pseudo-code for the main loop:

    MPI for n in primordial_modes:
        MPI for m in late_modes:
            y = double[xsize]
            for i in range(0, xsize):
                y[i] += x[i] * x[i] * gamma_pt(n, m, i)
            gamma[n][m] = gsl_integrate(x[], y[])
    MPI_Reduce(gamma[][])

Output = gamma[][].
The n and m loops are decomposed over MPI tasks. Typical size O(1000).
The gamma_pt routine has a lot of work and is well vectorised.

Making it Offloadable (1/3)

    MPI for n in primordial_modes:
        MPI for m in late_modes:
            y = double[xsize]
            for i in range(0, xsize):
                y[i] += x[i] * x[i] * gamma_pt(n, m, i)
            gamma[n][m] = gsl_integrate(x[], y[])
    MPI_Reduce(gamma[][])

The integration has a GSL dependency. It is negligible in the profile ⇒ write my own integration routine and remove the dependency.

Making it Offloadable (2/3)

    MPI for n in primordial_modes:
        MPI for m in late_modes:
            y = double[xsize]
            for i in range(0, xsize):
                y[i] += x[i] * x[i] * gamma_pt(n, m, i)
            gamma[n][m] = my_integrate(x[], y[])
    MPI_Reduce(gamma[][])

The GSL integration call is replaced by my_integrate, which removes the GSL dependency from the main loop.
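The slides do not say which quadrature rule replaced the GSL call. As a minimal sketch, assuming a composite trapezoidal rule over the tabulated points is accurate enough, a dependency-free my_integrate could look like this (the signature with an explicit length n is an assumption):

```c
#include <stddef.h>

/* Composite trapezoidal rule over tabulated points (x[i], y[i]).
 * Assumes x is sorted in ascending order; the spacing may be non-uniform.
 * Returns 0.0 when fewer than two points are given. */
double my_integrate(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 1; i < n; ++i) {
        /* Area of the trapezoid between samples i-1 and i. */
        sum += 0.5 * (x[i] - x[i - 1]) * (y[i] + y[i - 1]);
    }
    return sum;
}
```

A routine like this uses no library calls, so it needs nothing extra to be built for the coprocessor.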

Making it Offloadable (3/3)
Add an offload pragma before the main loop...

    #pragma offload target(mic:0) \
        inout(gamma : length(N * M) ALLOC FREE) \
        in(primordial_modes, late_modes, mpi_vars)
    MPI for n in primordial_modes:
        MPI for m in late_modes:
            y[0:xsize] = 0.0;
            for i in range(0, xsize):
                y[i] += x[i] * x[i] * gamma_pt(n, m, i);
            gamma[n][m] = my_integrate(x[], y[]);
    // end offload region
    MPI_Reduce(gamma[][]);

Done? Nope. Just starting!

Tracking Down the Offloadables (1/3)
Doesn't compile: missing symbols.
Need to track down all the functions and global variables used in the main loop and declare them offloadable:

    __attribute__((target(mic))) double gamma_pt(int n, int m, int i);

This part can be fiddly. Help:
Missing symbols will be found at compile time.
ctags with Vim or Emacs is very useful for chasing down dependencies.
An IDE could also have useful tools to help do this.
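When many declarations need the attribute, the Intel compiler also accepts a push/pop pragma pair that marks everything in between as offloadable. The following is a sketch only: the global `scale` and the body of gamma_pt are placeholders, not the MODAL code, and the `__INTEL_OFFLOAD` guard is there so that other compilers build a host-only version.

```c
/* Mark a group of declarations as offloadable in one go.
 * __INTEL_OFFLOAD is defined by the Intel compiler when offload is enabled;
 * other compilers skip the pragmas and build host-only code. */
#ifdef __INTEL_OFFLOAD
#pragma offload_attribute(push, target(mic))
#endif

/* Hypothetical read-only global used by the kernel. */
static double scale = 2.0;

/* Placeholder body; the real gamma_pt is not shown in the slides. */
double gamma_pt(int n, int m, int i)
{
    return scale * (double)(n + m + i);
}

#ifdef __INTEL_OFFLOAD
#pragma offload_attribute(pop)
#endif
```

This is equivalent to attaching `__attribute__((target(mic)))` to each declaration individually, which can be easier to maintain when whole headers need to be visible on the coprocessor.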

Tracking Down the Offloadables (2/3)
The code now compiles, but the result is garbage!
Declaring offloadable is only half the battle.
The code has a lot of read-only global variables.
Declaring variables offloadable just means that their symbols are visible on the MIC side. The data isn't necessarily also there.

Tracking Down the Offloadables (3/3)
Need to track down the required global variables, and do a #pragma offload_transfer when their values are set.
The Allinea DDT offload debugger is useful for finding uninitialised variables on the offload side.
Now done :-).
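The pattern described above can be sketched as follows. The global `weights` and the helper names are hypothetical, chosen only for illustration. Because `weights` is referenced inside a called function rather than lexically inside the offload region, its values are pushed explicitly with `offload_transfer` right after they are set. The `OFFLOADABLE` macro and the `__INTEL_OFFLOAD` guards are assumptions that let the same file build host-only with other compilers.

```c
#ifdef __INTEL_OFFLOAD
#define OFFLOADABLE __attribute__((target(mic)))
#else
#define OFFLOADABLE            /* host-only build: expands to nothing */
#endif

#define NWEIGHTS 4

/* Hypothetical read-only global, filled in at run time. */
OFFLOADABLE double weights[NWEIGHTS];

/* Runs on the coprocessor when called from inside an offload region. */
OFFLOADABLE double weight_total(void)
{
    double s = 0.0;
    for (int i = 0; i < NWEIGHTS; ++i)
        s += weights[i];
    return s;
}

void init_weights(void)
{
    for (int i = 0; i < NWEIGHTS; ++i)
        weights[i] = 1.0 + i;
    /* Copy the freshly set values to the coprocessor's copy of the symbol. */
#ifdef __INTEL_OFFLOAD
#pragma offload_transfer target(mic:0) in(weights)
#endif
}

double offloaded_total(void)
{
    double total = 0.0;
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic:0) out(total)
#endif
    {
        total = weight_total();
    }
    return total;
}
```

Without the transfer in `init_weights`, the coprocessor copy of `weights` would still hold its initial zeros, which is the kind of silent wrong answer described on the previous slide.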

Aside: Multi-dimensional Arrays
The main loop reads several multi-dimensional arrays.
These are implemented as arrays-of-pointers. Offload data transfers in LEO (Intel's Language Extensions for Offload) won't transfer these properly, because only the pointers are copied, not the rows they point to.
Work-around: transfer them flat, then rebuild / reinterpret the dimensions on the 'other side'.
C one-liner to reinterpret a flat array (basis_flat) as 2-dimensional (basis):

    double (*restrict basis)[lsizepad] = (double (*restrict)[lsizepad]) basis_flat;
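A minimal host-side sketch of the work-around, assuming row-major storage with each row padded to a fixed stride. The function names and the `stride` parameter are illustrative, not taken from the MODAL source; the cast relies on C99 variable-length array types, as the one-liner above does.

```c
#include <stdlib.h>
#include <string.h>

/* Copy an array-of-pointers matrix (nrows x ncols) into one contiguous
 * buffer, padding each row with zeros up to `stride` doubles.
 * Returns NULL on allocation failure; the caller frees the result. */
double *flatten_rows(double *const *rows, size_t nrows, size_t ncols,
                     size_t stride)
{
    double *flat = calloc(nrows * stride, sizeof *flat);
    if (flat == NULL)
        return NULL;
    for (size_t r = 0; r < nrows; ++r)
        memcpy(flat + r * stride, rows[r], ncols * sizeof *flat);
    return flat;
}

/* Read element (r, c) by viewing the flat buffer as rows of `stride` doubles. */
double flat_at(double *basis_flat, size_t stride, size_t r, size_t c)
{
    double (*restrict basis)[stride] = (double (*restrict)[stride])basis_flat;
    return basis[r][c];
}
```

The flat buffer is a single contiguous block, so it can be named in one `in(...)` clause with an explicit length, and the reinterpreting cast costs nothing at run time.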

Xeon Phi Performance
After offloading, threads were added via OpenMP over the n and m loops. This makes the code an MPI/OpenMP hybrid.
Each MPI rank offloads to its own card and uses all the cores.
With vectorisation enabled in the main loop, test case:
    2 × SandyBridge = 167s (2.7× faster than the original).
    1 × Xeon Phi = 75s (6.0× faster than the original).
    1 × Xeon Phi is 2.23× faster than 2 × SandyBridge (167s / 75s).
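The threading step is not shown in the slides. One way it might look, assuming the (n, m) iterations are independent and gamma is stored flat with N * M entries, is an OpenMP `parallel for` with the two loops collapsed. The gamma_pt body here is a placeholder, and a plain sum stands in for the integration step.

```c
/* Placeholder integrand; the real gamma_pt is not shown in the slides. */
static double gamma_pt(int n, int m, int i)
{
    (void)i;
    return 1.0 + n + m;
}

/* Fill gamma[n * M + m] for every (n, m) pair, threading over both loops.
 * Built without OpenMP support, the pragma is ignored and the loop runs serially. */
void fill_gamma(double *gamma, int N, int M, const double *x, int xsize)
{
    #pragma omp parallel for collapse(2) schedule(static)
    for (int n = 0; n < N; ++n) {
        for (int m = 0; m < M; ++m) {
            double acc = 0.0;   /* declared inside the loop, so private per iteration */
            for (int i = 0; i < xsize; ++i)
                acc += x[i] * x[i] * gamma_pt(n, m, i);
            gamma[n * M + m] = acc;
        }
    }
}
```

Collapsing the two loops gives the runtime N * M iterations to distribute, which helps when either loop alone is too short to keep all the coprocessor's hardware threads busy.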
