
Power/Performance Issues on Interconnection Network (IN)



  1. NsimPower: Interconnect Simulator for Power and Performance Prediction
Koji Inoue, Kyushu University, Japan

Power/Performance Issues on Interconnection Network (IN)
• Is interconnect power a problem?
  – Roughly 10 to 30% of total power
  – Increase in the number of computing nodes
  – High-bandwidth/low-latency requirements for strong scaling
• Toward power/energy-efficient supercomputing
  – Need to consider the computing nodes, memory, and interconnection network at the same time!
  – Optimize bandwidth, latency, and energy efficiency from the viewpoint of interconnects

  2. Why Do We Need Interconnection Network Simulators?
• For system designers
  – Design space exploration for high-performance, power-efficient large-scale supercomputers
  – Detailed analysis of hardware (e.g., buffer size) and software (e.g., all-to-all algorithm) design parameters
• For application users
  – Understand the execution behavior of their own programs
  – Use that understanding to optimize their programs

WHAT IS NSIM?

  3. NSIM: Execution-Driven Interconnection Network Simulator
[Figure: NSIM overview diagram; inputs include the program, the configuration, and a rank map]

NSIM Execution Image
[Figure: NSIM execution image, showing the processes and nodes of the target system simulated by NSIM processes that run in an MPI environment on the cores of the host system]

  4. Comparison with Other Simulators
[Table: comparison of NSIM with other interconnect simulators by developer, simulation type (execution-driven or trace-driven), parallel simulation method, execution platform, and simulation target size]

Accuracy
[Figure: accuracy evaluation (InfiniBand, fat-tree, Random Ring, 2 MB messages)]
• Other evaluation
  – BlueGene/L (IBM)
  – Kei-Supercomputer (RIKEN/Fujitsu)
  – FX10 (Fujitsu)

  5. Simulation Performance ~The Case for Bruck's All-to-All~
[Figure: simulator execution time (log scale, 1/3600 s to 60 hours) versus 3D-torus size (XxYxZ, from 2x2x2 to 64x32x32) for 4 B and 1024 B messages, NSIM vs. BigNetSim]

EXTENSION FOR POWER-PERFORMANCE ANALYSIS
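The slide names Bruck's all-to-all as the benchmark but does not describe it. To illustrate the communication pattern being simulated, here is a minimal single-process sketch of the standard Bruck index algorithm. It has three phases: a local rotation, ⌈log2 P⌉ forwarding rounds, and an inverse rotation. `bruck_alltoall` is a hypothetical helper for illustration. It is not part of NSIM or any MPI library.

```python
def bruck_alltoall(send):
    """Simulate Bruck's all-to-all on P ranks inside one process.

    send[r][d] is the block that rank r must deliver to rank d.
    Returns (recv, schedule): recv[d][r] == send[r][d], and schedule lists
    (hop distance, forwarded block positions) for each round.
    """
    P = len(send)
    # Phase 1: local rotation, so position j on rank r holds the block
    # destined for rank (r + j) % P.
    buf = [[send[r][(r + j) % P] for j in range(P)] for r in range(P)]

    schedule = []
    rounds = (P - 1).bit_length()  # ceil(log2 P) rounds
    for k in range(rounds):
        hop = 1 << k
        # Blocks whose position has bit k set travel `hop` ranks forward.
        positions = [j for j in range(P) if j & hop]
        schedule.append((hop, positions))
        nxt = [row[:] for row in buf]
        for r in range(P):
            dst = (r + hop) % P
            for j in positions:
                nxt[dst][j] = buf[r][j]
        buf = nxt

    # Phase 3: position j on rank r now holds the block that started on
    # rank (r - j) % P; reindex the blocks by source rank.
    recv = [[None] * P for _ in range(P)]
    for r in range(P):
        for j in range(P):
            recv[r][(r - j) % P] = buf[r][j]
    return recv, schedule


if __name__ == "__main__":
    P = 8
    send = [[f"{r}->{d}" for d in range(P)] for r in range(P)]
    recv, schedule = bruck_alltoall(send)
    for hop, positions in schedule:
        print(f"send to rank+{hop}: blocks at positions {positions}")
```

In each round, every rank sends one message that carries about half of its blocks. The algorithm therefore needs only ⌈log2 P⌉ message start-ups, at the cost of forwarding some blocks several times. This trade-off is why it is usually chosen for short messages.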

  6. Overview of NsimPower
• Extended NSIM (supports Low-Power Idle mode)
• Boxfish (LLNL) for visualization
[Figure: NsimPower overview. NSIM takes power parameters and a sleep-time threshold as additional inputs and produces a power profile, which is visualized with Boxfish]
[Figure: PHY Low-Power Idle: PHY power over time as it moves from Active mode through Sleep into Low Power Idle and back through Wakeup to Active mode]

Chunk-based Power Modeling
The power of router j in chunk i is

$$P_{ij} = \sum_{k=1}^{N_{link}} P_{ACT} + P_{BASE}$$

where N_link is the number of links connected to router j in chunk i, P_ACT is the average active link power of router j, and P_BASE is the average static power of router j.
[Figure: power (W) over time t, divided into chunks with chunk IDs 1, 2, and so on]
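The chunk-based model evaluates P_ij = Σ_{k=1}^{N_link} P_ACT + P_BASE once for every chunk i and router j. A minimal sketch follows, assuming the per-chunk link counts are already known. The helpers `chunk_power_base` and `energy_joules` are hypothetical. The energy helper is an illustrative addition that is not taken from the slides: it simply multiplies each chunk's power by the chunk length.

```python
def chunk_power_base(n_links, p_act, p_base):
    """Chunk-based power without LPI: P_ij = sum_{k=1}^{N_link} P_ACT + P_BASE.

    n_links[i][j]: number of links connected to router j in chunk i.
    p_act[j]:      average active power per link of router j, in W.
    p_base[j]:     average static power of router j, in W.
    Returns power[i][j] in W.
    """
    return [
        [n * p_act[j] + p_base[j] for j, n in enumerate(row)]
        for row in n_links
    ]


def energy_joules(power, chunk_len_s):
    """Total energy, assuming each P_ij stays constant for the whole chunk."""
    return sum(sum(row) for row in power) * chunk_len_s


if __name__ == "__main__":
    # Two chunks, two routers; router 0 has only 4 connected links in chunk 1.
    power = chunk_power_base([[6, 6], [4, 6]], p_act=[1.0, 1.0], p_base=[10.0, 10.0])
    print(power, energy_joules(power, chunk_len_s=1e-3))
```

Because the sum has identical terms, it reduces to N_link × P_ACT. The explicit per-link form becomes useful once each link carries its own LPI rate.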

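  7. Supporting Low-Power Idle (LPI) Technology
[Figure: link power over time. While traffic flows, the link is in ACTIVE mode and consumes P_ACT on top of the static power P_BASE. After the LPI threshold (timeout) passes with no traffic, the link transitions to LPI mode and consumes P_LPI. When traffic resumes, it transitions back to ACTIVE mode, which incurs a latency penalty]

Power Model Supporting Low-Power Idle Operations

$$P_{ij} = \sum_{k=1}^{N_{link}} \left\{ P_{ACT} \times \left(1 - R_{LPI\text{-}k}\right) + P_{LPI} \times R_{LPI\text{-}k} \right\} + P_{BASE}$$

Here P_ij is the power of router j in chunk i, and N_link is the number of links connected to router j in chunk i. R_LPI-k is the LPI rate of link k in chunk i, that is, the fraction of the chunk the link spends in LPI mode. P_ACT and P_LPI are the average active and idle link power of router j, and P_BASE is the average static power of router j.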
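To apply the LPI model, R_LPI-k has to be derived from link activity. A minimal sketch follows, assuming:
- sleep and wake-up transitions take zero time;
- a link enters LPI mode once it has been idle for the LPI threshold;
- an initially idle link starts its idle timer at t = 0.

The busy-interval input format and the helpers `lpi_time`, `lpi_rates`, and `router_power_lpi` are hypothetical. They are not NsimPower's actual interface.

```python
import math


def lpi_time(busy, threshold, start, end):
    """Time a link spends in LPI mode within [start, end).

    busy: sorted, non-overlapping (begin, finish) intervals when the link
    carries traffic. Assumes zero sleep/wake-up transition times.
    """
    total = 0.0
    idle_from = 0.0
    for begin, finish in list(busy) + [(math.inf, math.inf)]:
        # Idle gap [idle_from, begin): the link sleeps from idle_from + threshold.
        lo = max(idle_from + threshold, start)
        hi = min(begin, end)
        if hi > lo:
            total += hi - lo
        idle_from = finish
        if idle_from >= end:
            break
    return total


def lpi_rates(busy, threshold, chunk_len, n_chunks):
    """R_LPI for each chunk: fraction of the chunk the link spends in LPI mode."""
    return [
        lpi_time(busy, threshold, i * chunk_len, (i + 1) * chunk_len) / chunk_len
        for i in range(n_chunks)
    ]


def router_power_lpi(rates, p_act, p_lpi, p_base):
    """P_ij = sum_k {P_ACT * (1 - R_LPI-k) + P_LPI * R_LPI-k} + P_BASE."""
    return sum(p_act * (1 - r) + p_lpi * r for r in rates) + p_base


if __name__ == "__main__":
    busy = [(0, 10), (50, 60)]  # one link, times in ns
    print(lpi_rates(busy, threshold=0, chunk_len=50, n_chunks=2))
```

With the ideal settings of the case study (zero threshold and zero transition times), the LPI rate reduces to the idle fraction of each chunk. A non-zero threshold shortens each LPI period by the timeout.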

  8. CASE STUDY
• Ave. base power per router: 17.80 W (1.0x, 0.25x)
• Ave. power in ACTIVE mode: 1.02 W per link
• Ave. power in LPI mode: 0.10 W per link
• Wake-up transition time: 0 ns (ideal case)
• Sleep transition time: 0 ns (ideal case)
• LPI threshold: 0 μs (ideal case)
• Chunk length: 50,000 ns
• Topology: 3D torus (8x8x8)
• Link bandwidth: 5 GB/s
• Packet size: 2,048 B
• Communication: all-to-all (simple spread)
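The case-study parameters bound the per-router power a router can draw. A short calculation sketch follows, with these assumptions:
- each router in the 8x8x8 torus has 6 links (±X, ±Y, ±Z);
- the 0.25x case scales P_BASE by 0.25;
- 1 GB = 10^9 bytes.

The constant and function names are hypothetical. The figures are idealized all-ACTIVE and all-LPI bounds, not results reported in the slides.

```python
P_BASE = 17.80        # W per router at 1.0x (the study also uses 0.25x)
P_ACT = 1.02          # W per link in ACTIVE mode
P_LPI = 0.10          # W per link in LPI mode
LINKS_PER_ROUTER = 6  # assumption: +/-X, +/-Y, +/-Z links of an 8x8x8 3D torus
LINK_BW = 5e9         # bytes/s, assuming 1 GB = 10^9 bytes
PACKET_BYTES = 2048
CHUNK_NS = 50_000


def router_power(base_scale, lpi_rate):
    """Per-router power when every link has the same LPI rate."""
    per_link = P_ACT * (1 - lpi_rate) + P_LPI * lpi_rate
    return base_scale * P_BASE + LINKS_PER_ROUTER * per_link


def packet_time_ns():
    """Serialization time of one packet on one link, in ns."""
    return PACKET_BYTES / LINK_BW * 1e9


if __name__ == "__main__":
    for scale in (1.0, 0.25):
        active, idle = router_power(scale, 0.0), router_power(scale, 1.0)
        print(f"{scale}x base: {active:.2f} W all ACTIVE, {idle:.2f} W all LPI "
              f"({100 * (active - idle) / active:.1f}% lower)")
    t = packet_time_ns()
    print(f"{t:.1f} ns per packet, about {CHUNK_NS / t:.0f} packets per link per chunk")
```

Under these assumptions, fully idle links lower per-router power from 23.92 W to 18.40 W at 1.0x base power and from 10.57 W to 5.05 W at 0.25x. The lower the static power, the larger the relative benefit of LPI.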
