SLIDE 1

NsimPower: Interconnect Simulator for Power and Performance Prediction

  • Koji Inoue, Kyushu University, Japan

Power/Performance Issues on Interconnection Network (IN)

  • Is interconnect power a problem?

– Roughly 10 to 30% of total power
– Increase in the number of computing nodes
– High-bandwidth/low-latency requirements for strong scaling

  • Toward power/energy-efficient supercomputing

– Need to consider computing node, memory, and interconnection network at the same time!
– Bandwidth, latency, and energy-efficiency optimization from the viewpoint of interconnects

slide-2
SLIDE 2

Why Do We Need Interconnection Network Simulators?

  • For system designers

– Design space exploration for high-performance, power-efficient, large-scale supercomputers
– Detailed analysis of hardware (e.g. buffer size) and software (e.g. all-to-all algorithm) design parameters

  • For application users

– Understand the execution behavior of their own programs
– Can be exploited for program optimizations

WHAT IS NSIM?

SLIDE 3

NSIM: Execution Driven Interconnection Network Simulator

[Figure: NSIM block diagram; labels not recoverable from the extraction]

NSIM Execution Image

[Figure: NSIM execution image; labels not recoverable from the extraction]
SLIDE 4

Comparison with Other Simulators

[Table: comparison of NSIM with other interconnection network simulators; contents not recoverable from the extraction]

Accuracy

  • Other evaluation

– BlueGene/L (IBM)
– Kei-Supercomputer (RIKEN/Fujitsu)
– FX10 (Fujitsu)

[Figure: accuracy evaluation; labels not recoverable from the extraction]
SLIDE 5

Simulation Performance

~The Case for Bruck’s All-to-All~

[Figure: simulator execution time (log scale, from 1/3600 s to 60 hours) versus node size of 3D-Torus (XxYxZ), from 2x2x2 up to 64x32x32. Series: 4B (NSIM), 1024B (NSIM), 4B (BigNetSim), 1024B (BigNetSim)]

EXTENSION FOR POWER-PERFORMANCE ANALYSIS
SLIDE 6

Overview of NsimPower

  • Extended NSIM (supports low-power idle mode)
  • Low Power Idle mode for PHYs
  • Boxfish for visualization (LLNL)

[Figure: NsimPower block diagram; labels not recoverable from the extraction]

[Figure: PHY power over time in Low Power Idle mode, passing through Active, Sleep, Wakeup, and Active]

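Chunk-based Power Modeling

$$P_{ij} = \sum_{k=1}^{N_{link}} \{ P_{ACT} \} + P_{BASE}$$

  • $P_{ij}$: power of router-j in chunk-i
  • $P_{ACT}$: ave. active link power of router-j
  • $N_{link}$: # of links connected to router-j
  • $P_{BASE}$: ave. static power of router-j

[Figure: power [W] versus time t, with the time axis divided into chunks (chunk-id 1, 2, ...)]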
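The chunk-based model above can be sketched in a few lines of Python. This is a minimal illustration, not NsimPower code: the function names are hypothetical, the numeric values are borrowed from the case study later in the deck, and the six links per router assume a 3D torus. The energy-per-chunk step (power times chunk length) is added here for illustration and does not appear on the slide.

```python
def router_power(p_act_w, p_base_w, n_link):
    """P_ij = sum over the router's links of P_ACT, plus P_BASE (watts).

    Hypothetical helper: every link is charged the average active power.
    """
    return sum(p_act_w for _ in range(n_link)) + p_base_w


def chunk_energy_j(power_w, chunk_length_ns):
    """Energy in joules drawn during one chunk at constant power."""
    return power_w * chunk_length_ns * 1e-9


# Illustrative values taken from the case study slide (assumed applicable here).
p = router_power(p_act_w=1.02, p_base_w=17.80, n_link=6)
print(f"router power: {p:.2f} W")
print(f"energy per 50,000 ns chunk: {chunk_energy_j(p, 50_000) * 1e3:.3f} mJ")
```

Because no link ever leaves the active state in this form of the model, every chunk of a given router yields the same power; the chunk index only starts to matter once per-chunk LPI rates are introduced.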

SLIDE 7

Supporting Low-Power Idle (LPI) Technology

[Figure: timeline of link traffic and power consumption over time t. Labels: ACTIVE mode at active power P_ACT; no traffic until the LPI threshold (timeout) expires; mode transition to LPI mode at P_LPI; mode transition back to ACTIVE mode, with a mode-transition latency penalty; static power P_BASE]
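One way to read the timeline is that a link stays in ACTIVE mode for the LPI threshold after its last packet and only then drops to LPI mode. The sketch below computes the fraction of a chunk spent in LPI under that reading. It is an assumption-laden illustration, not the simulator's accounting: the function name and the interval representation are hypothetical, sleep and wake-up transition times are ignored, and the idle timer is assumed to start at the beginning of the chunk.

```python
def lpi_rate(busy, chunk_len, threshold):
    """Fraction of a chunk that a link spends in LPI mode.

    busy: sorted, non-overlapping (start, end) intervals inside [0, chunk_len]
          during which the link carries traffic (hypothetical representation).
    threshold: idle time (same unit) the link waits before entering LPI.
    Transition latencies are ignored (assumed zero).
    """
    lpi_time = 0.0
    idle_start = 0.0  # assumed: idle timer starts at the chunk boundary
    for start, end in busy:
        gap = start - idle_start
        lpi_time += max(0.0, gap - threshold)  # only the part past the timeout
        idle_start = end
    # Idle tail between the last packet and the end of the chunk.
    lpi_time += max(0.0, chunk_len - idle_start - threshold)
    return lpi_time / chunk_len


busy = [(0, 10_000), (30_000, 35_000)]  # ns, illustrative traffic
print(lpi_rate(busy, chunk_len=50_000, threshold=5_000))  # 0.5
print(lpi_rate(busy, chunk_len=50_000, threshold=0))      # 0.7
```

A larger threshold lowers the LPI rate because more of each idle gap is spent waiting in ACTIVE mode.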

Power Model Supporting Low-Power Idle Operations

$$P_{ij} = \sum_{k=1}^{N_{link}} \{ P_{ACT} \times (1 - R_{LPI-k}) + P_{LPI} \times R_{LPI-k} \} + P_{BASE}$$

  • $P_{ij}$: power of router-j in chunk-i
  • $P_{ACT}$: ave. link active power of router-j
  • $P_{LPI}$: ave. link idle power of router-j
  • $R_{LPI-k}$: LPI rate of link-k in chunk-i
  • $N_{link}$: # of links connected to router-j
  • $P_{BASE}$: ave. static power of router-j
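The LPI-aware model weights each link's power by the share of the chunk it spends in LPI mode. A minimal Python sketch follows; the function name is hypothetical and the numeric values are borrowed from the case study slide purely for illustration.

```python
def router_power_lpi(p_act_w, p_lpi_w, p_base_w, lpi_rates):
    """P_ij for one router in one chunk (watts).

    lpi_rates: one R_LPI-k value in [0, 1] per link connected to the router.
    Hypothetical helper written directly from the slide's formula.
    """
    link_power = sum(p_act_w * (1.0 - r) + p_lpi_w * r for r in lpi_rates)
    return link_power + p_base_w


# Six links per router is an assumption (3D torus); powers are from the case study.
for rate in (0.0, 0.5, 1.0):
    p = router_power_lpi(1.02, 0.10, 17.80, [rate] * 6)
    print(f"R_LPI = {rate:.1f}: {p:.2f} W")
```

With every rate at 0 the expression reduces to the model without LPI, and with every rate at 1 the links contribute only their idle power, so the static term $P_{BASE}$ bounds the achievable saving.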

SLIDE 8

CASE STUDY

Case Study

  • Ave. base power per router: 17.80 W (1.0x, 0.25x)
  • Ave. power on ACTIVE mode per link: 1.02 W
  • Ave. power on LPI mode per link: 0.10 W
  • Wake-up transition time: 0 ns -> ideal case
  • Sleep transition time: 0 ns -> ideal case
  • LPI threshold: 0 μs -> ideal case
  • Chunk length: 50,000 ns
  • Topology: 3D torus (8x8x8)
  • Link bandwidth: 5 GB/s
  • Packet size: 2,048 B
  • Communication: all-to-all (simple spread)
SLIDE 9

Potential of LPI Optimization

  • PBASE = 17.8 W: 12.25 kW w/o LPI, 10.36 kW w/ LPI (-15.4%)
  • PBASE = 4.5 W (0.25x): 5.41 kW w/o LPI, 3.58 kW w/ LPI (-33.8%)
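As a sanity check on the w/o LPI figures, the sketch below sums the router power model over an 8x8x8 3D torus using the case-study parameters. It assumes that each router is charged for all six of its torus links and that every link stays active; the function names are hypothetical. Under those assumptions the totals come out at about 12.25 kW and 5.41 kW, which is in line with the reported values, although the slides do not confirm that links are counted this way.

```python
from itertools import product


def torus_neighbors(node, dims):
    """Distinct neighbors of a node in a torus with wraparound links."""
    nbrs = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(node)
            n[axis] = (n[axis] + step) % size
            nbrs.add(tuple(n))
    nbrs.discard(node)  # only matters for dimensions of size 1
    return nbrs


def total_power_kw(dims, p_act_w, p_base_w):
    """System power in kW with all links active (no LPI)."""
    total = 0.0
    for node in product(*(range(s) for s in dims)):
        n_link = len(torus_neighbors(node, dims))
        total += n_link * p_act_w + p_base_w
    return total / 1000.0


full = total_power_kw((8, 8, 8), p_act_w=1.02, p_base_w=17.80)
quarter = total_power_kw((8, 8, 8), p_act_w=1.02, p_base_w=17.80 * 0.25)
print(f"{full:.2f} kW, {quarter:.2f} kW")  # 12.25 kW, 5.41 kW
```

The link term is the same in both cases (about 3.13 kW), so cutting the base power to 0.25x raises the share of total power that LPI can act on, which matches the larger relative saving reported for the 0.25x case.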

Conclusions

  • Summary

– NsimPower: large-scale interconnection network simulator for power/performance analysis
– Japan/US collaborative work

  • Ongoing work

– Verify the accuracy of power estimation
– Apply to large-scale power-performance analysis
– Extend to system-wide power-performance prediction