All the things you need to know about Intel MPI Library

Jerome Vienne, viennej@tacc.utexas.edu. Texas Advanced Computing Center, The University of Texas at Austin, Austin, TX. November 12th, 2016


  1. All the things you need to know about Intel MPI Library. Jerome Vienne, viennej@tacc.utexas.edu. Texas Advanced Computing Center, The University of Texas at Austin, Austin, TX. November 12th, 2016

  2. A Heterogeneous Environment. MPI performance depends on many factors: ▶ CPUs (number of cores, cache sizes, frequency) ▶ Memory (amount, frequency) ▶ Network speed (10, 20, 40 … Gbit/s) ▶ Size of the job ▶ Type of code: hybrid (e.g. OpenMP+MPI) or pure MPI. MPI libraries have to make choices: ▶ Why? Because the number of combinations is too large. ▶ Are these choices optimal for my application? Not necessarily. ▶ Can we change them? Yes, this is why we are here.

  4. Aim of this talk ▶ ”How to tune MPI” cannot be found easily in books. ▶ Show that MPI libraries are not black boxes. ▶ Describe concepts that are common across MPI libraries. ▶ Understand the differences between MPI libraries. ▶ Provide some useful commands for Intel MPI. ▶ Result: help you reduce the time and memory footprint of your MPI application.

  5. Before we start: Warnings! ▶ Talk based on Intel MPI (few references to MVAPICH2 and OpenMPI). ▶ All experiments were done on the Stampede supercomputer at TACC. ▶ Tuning options are specific to an MPI library! But the concepts are common. ▶ Options can have counter-effects! ▶ MPI libraries have a lot of options for tuning; we will only cover the most important ones. ▶ Tuning could be time consuming, but in the long term it might be worth it.

  6. Plan • Basic Tuning: The Choice of the Benchmark, Profiling, Hostfile, Process Placement, To conclude • Intermediate Tuning: Inter-node Point-to-Point Optimization, Intra-node Point-to-Point Optimization, Collective Tuning, To conclude • Conclusion

  9. The Choice of Benchmarks. Different MPI library = tuning based on different benchmarks: ▶ Intel MPI: Intel MPI Benchmarks (IMB) ▶ MVAPICH2: OSU Micro-Benchmarks (OMB). IMB or OMB, which one is the best to use? ▶ Both are communication intensive without computation ▶ It depends on your application ▶ The best benchmark is your application! But… let’s take a look at them in detail!

  12. Intel MPI Benchmarks (IMB): Details (IMB-MPI1) ▶ Originally known as Pallas MPI Benchmarks (PMB) ▶ Supports point-to-point and collective operations ▶ One program with a lot of options for classical MPI functions ▶ The root changes after each iteration for collectives

  13. Intel MPI Benchmarks (IMB): Intel MPI vs MVAPICH2 using IMB Bcast with 256 cores. [Plot: Time (us) versus Message Size (Bytes), 4 bytes to 4M, for MVAPICH2 2.2 and Intel MPI 2017]

  14. OSU Micro-Benchmarks (OMB): Details ▶ Very simple to use ▶ Supports point-to-point and collective operations ▶ Multiple programs with simple options ▶ Keeps the same root during all iterations and uses a barrier

  15. OSU Micro-Benchmarks (OMB): Intel MPI vs MVAPICH2 using OMB Bcast with 256 cores. [Plot: Time (us) versus Message Size (Bytes), 4 bytes to 1M, for MVAPICH2 2.2 and Intel MPI 2017]

  16. OSU Micro-Benchmarks (OMB): Tuned Intel MPI vs MVAPICH2 using OMB Bcast with 256 cores. [Plot: Time (us) versus Message Size (Bytes), 4 bytes to 1M, for MVAPICH2 2.2 and Intel MPI 2017]

  17. Benchmarks: What you need to know. To summarize: ▶ Don’t trust them! ▶ They have different behaviors: so, KNOW your benchmark! ▶ They don’t necessarily give you the best results by default. ▶ Be sure that you tune things correctly if you want to compare two MPI libraries. ▶ Collective tuning for a particular benchmark/application could be painful, we will see it later :)
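As an illustration of the comparison discussed on the benchmark slides, here is a minimal sketch of how the two Bcast benchmarks might be launched at the same scale. The binary locations (./IMB-MPI1, ./osu_bcast) and the generic mpirun launcher are assumptions, not taken from the slides. The sketch only prints the commands, so it can be read without an MPI installation.

```shell
#!/bin/sh
# Number of MPI ranks, matching the 256-core plots on the slides.
NP=256

# IMB: one binary, benchmark selected by name.
# -npmin makes IMB run only with NP ranks instead of 2, 4, 8, ... up to NP.
# The path ./IMB-MPI1 is a placeholder (assumption).
imb_cmd="mpirun -np $NP ./IMB-MPI1 -npmin $NP Bcast"

# OMB: one binary per operation. The path ./osu_bcast is a placeholder (assumption).
omb_cmd="mpirun -np $NP ./osu_bcast"

# Print the two launch lines; run them by hand on a system with MPI loaded.
printf '%s\n' "$imb_cmd" "$omb_cmd"
```

Both lines use library defaults, so any difference between the two outputs reflects the benchmark design (changing root versus fixed root with a barrier) as well as the library itself.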

  19. To know what you need to tune first: Why is MPI profiling important? ▶ To identify which MPI functions are used, you have two choices: ▶ Look at the code ▶ Profile your application ▶ Profiling provides you all the information regarding MPI communications (size, time spent, functions called, etc.) ▶ Could be integrated in the MPI library (e.g. Intel MPI) ▶ A lot of tools can help you profile your application (TAU, Scalasca, IPM, mpiP, …)

  20. How to profile? With Intel MPI at runtime: mpiexec -genv I_MPI_STATS=ipm I_MPI_STATS_FILE=myprofile.txt …. Tools: ▶ MPI Performance Snapshots (MPS) ▶ Intel Trace Analyzer and Collector (ITAC)
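The same two settings can also be exported in the shell before launching, which is sketched below. The variable names are the ones on the slide and belong to the Intel MPI release used in the talk, so they may not exist in other releases. The rank count and ./a.out are placeholders (assumptions), and the launch line is only printed.

```shell
#!/bin/sh
# Settings from the slide: IPM-style summary written to myprofile.txt.
export I_MPI_STATS=ipm
export I_MPI_STATS_FILE=myprofile.txt

# Placeholder launch line (assumption): 4 ranks of ./a.out.
cmd="mpiexec -n 4 ./a.out"

# Show what would be run, together with the profiling environment.
echo "I_MPI_STATS=$I_MPI_STATS I_MPI_STATS_FILE=$I_MPI_STATS_FILE $cmd"

# On a system with Intel MPI loaded, uncomment to actually run it:
# $cmd
```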

  22. Impact of the hostfile ▶ The hostfile provides the list of nodes that will be used ▶ Depending on the MPI library, the same hostfile could lead to different results! Example of command: mpirun -np 4 -hostfile host ./a.out
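A quick way to see how a given library interprets a hostfile is to launch `hostname` instead of the application and count the ranks per node. The sketch below writes a two-node hostfile and prints such a check. The node names are hypothetical, and the check is printed rather than executed so that it does not require a cluster.

```shell
#!/bin/sh
# Write a hostfile with two hypothetical node names (assumption).
cat > host <<'EOF'
node1
node2
EOF

# Same launch as on the slide, but with `hostname` as the program:
# each rank prints the node it runs on.
launch="mpirun -np 4 -hostfile host hostname"

# Counting identical lines gives the number of ranks placed on each node.
echo "$launch | sort | uniq -c"
```

Running the printed pipeline with two different MPI libraries shows directly whether the four ranks are spread across both nodes or packed onto the first one.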

  23. A Quick Performance Example: NAS SP-MZ on Stampede. 2 nodes, 2 MPI tasks/node with 8 OpenMP threads. Hostfile: node1 node2. Command: mpirun -np 4 -hostfile host ./sp-mz.C.4 ▶ MVAPICH2: Default: 176 sec. Correct Hostfile: 176 sec. + Process Placement: 19 sec. ▶ Intel MPI: Default: 51 sec. Correct Hostfile/Command: 19 sec.
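The slide does not show the corrected command, so the following is only one plausible Intel MPI launch for this layout, not the author's. It assumes the Intel MPI options -ppn (ranks per node) and I_MPI_PIN_DOMAIN=omp (one pinning domain per rank, sized by OMP_NUM_THREADS). The command is printed rather than executed.

```shell
#!/bin/sh
# Layout from the slide: 2 nodes, 2 MPI tasks per node, 8 OpenMP threads per task.
NODES=2
TASKS_PER_NODE=2
export OMP_NUM_THREADS=8

# Assumed Intel MPI pinning setting: each rank gets OMP_NUM_THREADS cores.
export I_MPI_PIN_DOMAIN=omp

# Total number of ranks across all nodes.
NP=$((NODES * TASKS_PER_NODE))

# Assumed launch line: -ppn fixes the number of ranks placed on each node.
cmd="mpirun -np $NP -ppn $TASKS_PER_NODE -hostfile host ./sp-mz.C.4"
echo "$cmd"
```

Other MPI libraries use different option and variable names for the same two ideas (ranks per node, cores per rank), so the concepts carry over but the exact command does not.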
