XEON PHI BASICS
Adrian Jackson
adrianj@epcc.ed.ac.uk @adrianjhpc
Reusing this material
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_US
This means you are free to copy and redistribute the material and adapt and build on the material under the following terms: You must give appropriate credit, provide a link to the license and indicate if changes were made. If you adapt or build on the material you must distribute your work under the same license as the original. Note that presentations may contain images owned by others. Please seek their permission before reusing these images.
Programming models
[Diagram: Host, Main Memory; link labelled ssh (PCIe)]
Xeon Phi via ssh
“as usual”
int main() { do_stuff(); }
Pros:
Cons:
serial regions and ‘complex codes’
[Diagram: Host, Main Memory, Coprocessor; link labelled ssh (PCIe)]
Xeon Phi
where execution continues
do_stuff(){ … }
int main() { … #pragma offload do_stuff() … }
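The offload pattern in the diagram can be sketched in C as below. This is a minimal sketch, not the presenter's code. It assumes the Intel compiler's legacy offload extension, where the directive takes a target(mic) clause and Intel compilers define __INTEL_OFFLOAD when offload is enabled. The names OFFLOAD_FN and run_offloaded, and the sum-of-squares body of do_stuff, are hypothetical. With any other compiler the guarded pragma is skipped and the block simply runs on the host.

```c
#define N 1000

/* Assumption: __INTEL_OFFLOAD is defined by Intel compilers when offload is
   enabled. In that case do_stuff must also be compiled for the coprocessor;
   elsewhere the marker expands to nothing. */
#ifdef __INTEL_OFFLOAD
#define OFFLOAD_FN __attribute__((target(mic)))
#else
#define OFFLOAD_FN
#endif

/* Hypothetical workload: sum of squares of n values. */
OFFLOAD_FN double do_stuff(const double *a, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += a[i] * a[i];
    return sum;
}

/* Fills an array on the host, then calls do_stuff inside an offload region. */
double run_offloaded(void)
{
    double a[N];
    double result = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = 1.0;

    /* in(a): copy the array to the coprocessor before the region starts.
       out(result): copy the scalar back to the host when the region ends. */
#ifdef __INTEL_OFFLOAD
    #pragma offload target(mic) in(a) out(result)
#endif
    {
        result = do_stuff(a, N);
    }
    return result;
}
```

Calling run_offloaded() from main returns the sum on the host in both cases; only the place where do_stuff executes differs.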
Pros:
CPU cores
executed efficiently on Xeon Phi
Phi memory
Cons:
Xeon Phi via (slow) PCIe Bus
CPU/XeonPhi (idle time)
[Diagram: Host, Main Memory, Coprocessor; link labelled ssh (PCIe)]
for MPI to use
int main() { … do_stuff() … }
MPI_RANK=0…15 MPI_RANK=16…255
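The rank ranges in the diagram can be read as one MPI job spanning both devices. The sketch below only illustrates that layout and makes no MPI calls. It assumes the first range (0 to 15) belongs to the host and the second (16 to 255) to the coprocessor, which the diagram suggests but does not state. HOST_RANKS, TOTAL_RANKS, rank_on_coprocessor and build_target are hypothetical names. In a real run the MPI launcher decides where each rank is placed. The __MIC__ macro is the one Intel compilers define when building for the coprocessor.

```c
/* Assumed layout: host ranks are numbered first (0..15), coprocessor ranks
   follow (16..255). Both values are illustrative. */
enum { HOST_RANKS = 16, TOTAL_RANKS = 256 };

/* Returns 1 if the rank falls in the assumed coprocessor range, 0 otherwise. */
int rank_on_coprocessor(int rank)
{
    return rank >= HOST_RANKS && rank < TOTAL_RANKS;
}

/* The same source is built once per device; this reports which build runs. */
const char *build_target(void)
{
#ifdef __MIC__
    return "coprocessor";
#else
    return "host";
#endif
}
```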
Pros:
memory copies
executed efficiently on Xeon Phi
memory
Cons:
and Xeon Phi
Parallelisation
Xeon Phi
OpenMP multithreading
Image from Colfax training material
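As a minimal sketch of OpenMP multithreading (the function name parallel_sum and the integer workload are hypothetical, chosen only to keep the result easy to check): the loop iterations are divided among the threads, and the reduction clause combines each thread's partial sum. Built without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```c
/* Adds up 0 + 1 + ... + (n - 1), splitting the iterations across threads. */
long parallel_sum(long n)
{
    long sum = 0;

    /* Each thread accumulates a private partial sum; OpenMP adds the
       partial sums together when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += i;

    return sum;
}
```

The number of threads is normally set outside the program, for example with the standard OMP_NUM_THREADS environment variable.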
Compilers & Tools
(but mainly Intel)
Parallel Studio XE
profiler)
library)
profiler)
for DDT & Map)
(Intel Manycore Platform Software Stack)
…
For more details:
http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/xeon-phi-software-configuration-users-guide.pdf
https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-E1EC94AE-A13D-463E-B3C3-6D7A7205F5A1.htm
…
proc | tail -n 3 …
Performance Considerations
dependent on vector units.
vector units → Code optimised for Intel Xeon will run faster on Intel Xeon Phi
use AVX-512 vector units → Code
also run faster on Intel Xeon Phi KNL
*(KNC and KNL are not binary compatible)
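To make the dependence on vector units concrete, here is a minimal sketch of a loop written so the compiler can vectorise it. The function names saxpy and lanes are hypothetical, and the standard OpenMP simd directive is used here as one portable way to request vectorisation; the slides do not prescribe it. A 512-bit vector register holds 16 single-precision or 8 double-precision values, so a loop that stays scalar uses only a fraction of each core's arithmetic capability.

```c
/* y[i] = a * x[i] + y[i]. The simd directive asserts that iterations are
   independent (x and y must not overlap), so several elements can be
   processed per instruction. */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma omp simd
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Number of elements of a given size that fit in one vector register. */
int lanes(int vector_bits, int element_bytes)
{
    return vector_bits / (8 * element_bytes);
}
```

Here lanes(512, 4) gives the 16 floats and lanes(512, 8) the 8 doubles mentioned above.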
interconnect
poor settings.
e.g. 240 threads competing for the use of 30 cores, while 30 other cores are idle.
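The effect of thread placement can be illustrated with a simplified model. This is a sketch under stated assumptions, not the runtime's actual algorithm: it assumes 60 cores with 4 hardware threads each (consistent with the numbers in the example above), and the names placement, core_for_thread and cores_in_use are hypothetical. COMPACT fills one core completely before moving to the next; SCATTER deals threads out to cores in round-robin order. These loosely mirror the compact and scatter settings of KMP_AFFINITY.

```c
enum placement { COMPACT, SCATTER };

/* Simplified model of which core a thread lands on. */
int core_for_thread(int tid, int ncores, int threads_per_core, enum placement p)
{
    if (p == COMPACT)
        return (tid / threads_per_core) % ncores; /* fill a core, then move on */
    return tid % ncores;                          /* round-robin over the cores */
}

/* Counts how many distinct cores receive at least one of nthreads threads. */
int cores_in_use(int nthreads, int ncores, int threads_per_core, enum placement p)
{
    int used[512] = {0}; /* this sketch assumes ncores <= 512 */
    int count = 0;

    for (int t = 0; t < nthreads; t++) {
        int c = core_for_thread(t, ncores, threads_per_core, p);
        if (!used[c]) {
            used[c] = 1;
            count++;
        }
    }
    return count;
}
```

In this model, 120 threads placed compactly occupy 30 of the 60 cores and leave the other 30 idle, while the same 120 threads scattered touch all 60. In practice the placement is selected through KMP_AFFINITY rather than computed by hand.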
and adapt your algorithm to exploit as many and as much as possible
Summary
variables: micinfo, KMP_AFFINITY