Low-Overhead LogGP Parameter Assessment for Modern Interconnection Networks

SLIDE 1

Low-Overhead LogGP Parameter Assessment for Modern Interconnection Networks

  • T. Hoefler, A. Lichei, W. Rehm

Open Systems Lab, Indiana University, Bloomington, USA
Computer Architecture Group, Technical University of Chemnitz, Chemnitz, Germany

IPDPS’07 - PMEO-PDS’07 Workshop

Long Beach, CA, USA

30th March 2007

  • T. Hoefler, A. Lichei, W. Rehm

LogGP Parameter Assessment

SLIDE 2

Introduction

Network performance prediction is important to:

  • assess the runtime of parallel algorithms
  • optimize communication patterns (e.g., collectives)
  • schedule messages at runtime (heterogeneous interfaces)

Our approach

We propose a contention-free LogGP parameter assessment method to be used in (changing) runtime environments.

SLIDE 4

The LogGP Model

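[Figure: LogGP timing diagram. Axes: level (CPU, Network) versus time. Labels: os at the sender, L in the network, or at the receiver, and g, G at both sender and receiver.]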

SLIDE 5

The Tools to Measure

no central clock → measurements on one host only

[Figure: two client/server message-timing diagrams, annotated with the gap g between successive messages and the latency L of each transfer.]

SLIDE 6

Related Work

Culler et al. / Iannello et al. (differentiate between os and or)

  • os: issue a small number (n) of sends and divide the elapsed time by n
  • or: delay between messages (larger than the RTT), subtract os
  • g: flood the network
  • L: RTT/2 − or − os (third-order errors)

Kielmann et al. (change the model to pLogP)

  • os: time for a single send
  • or: time to copy the message from the receive buffer
  • g: flood the network (if accurate)
  • L: (RTT(0) − 2g(0))/2 (second-order errors)

SLIDE 8

Related Work

Bell et al. (differentiate between os and or)

  • os: uses a delay d between message sends; adjust d until d + os = g + (s − 1)G (multiple measurements) ⇒ os = g + (s − 1)G − d (second-order errors)
  • or: similar to Culler et al.
  • g: flood the network (similar to Kielmann et al.)
  • L: not measured (network effects)
  • EEL: end-to-end latency (RTT)

SLIDE 9

Our Approach

Definition of PRTT(n, d, s): parametrized round-trip time

  • n: number of successive messages
  • d: delay between messages
  • s: message size

PRTT(n, d, s) and LogGP

PRTT(1, 0, s) = 2 · (L + 2o + (s − 1)G)
Gall = g + (s − 1)G
PRTT(n, 0, s) = 2 · (L + 2o + (s − 1)G) + (n − 1) · Gall
PRTT(n, 0, s) = PRTT(1, 0, s) + (n − 1) · Gall
PRTT(n, d, s) = PRTT(1, 0, s) + (n − 1) · max{o + d, Gall}
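The PRTT equations on this slide can be evaluated directly. The following is a minimal sketch of the model only (not of the measurement itself); the function names `g_all` and `prtt` are illustrative and do not come from the slides, and all parameters are assumed to share one time unit.

```python
def g_all(g, G, s):
    """Per-message gap for an s-byte message: Gall = g + (s - 1) * G."""
    return g + (s - 1) * G


def prtt(n, d, s, L, o, g, G):
    """Model value of PRTT(n, d, s) for given LogGP parameters.

    n: number of successive messages, d: delay between messages,
    s: message size in bytes.
    """
    single = 2 * (L + 2 * o + (s - 1) * G)  # PRTT(1, 0, s)
    # Each additional message costs the larger of (o + d) and Gall.
    return single + (n - 1) * max(o + d, g_all(g, G, s))
```

With d = 0 and o < Gall, the last term reduces to (n − 1) · Gall, which matches the PRTT(n, 0, s) line above.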

SLIDE 11

Assessment of the Overhead

Assessing o

(PRTT(n, d, s) − PRTT(1, 0, s)) / (n − 1) = max{o + d, Gall}

we chose d > Gall, so that

(PRTT(n, d, s) − PRTT(1, 0, s)) / (n − 1) = o + d ⇒ o = (PRTT(n, d, s) − PRTT(1, 0, s)) / (n − 1) − d

we chose d = PRTT(1, 0, s) (fall back to d = PRTT(2, 0, s) for high gaps)

[Figure: client/server timing diagram of PRTT(1, 0, s) and PRTT(n, d, s), annotated with o, L, (s − 1)G, d, and 2d + 2o.]
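Solving the relation on this slide for o gives a one-line estimator. The sketch below assumes the two PRTT values have already been measured and that the delay d was chosen larger than Gall; the function name `estimate_overhead` is illustrative, not taken from the slides.

```python
def estimate_overhead(prtt_1_0, prtt_n_d, n, d):
    """Estimate o from PRTT(1, 0, s) and PRTT(n, d, s).

    Valid only if d > Gall, so that max{o + d, Gall} = o + d.
    """
    if n < 2:
        raise ValueError("need at least two messages (n >= 2)")
    return (prtt_n_d - prtt_1_0) / (n - 1) - d
```

For example, with a measured PRTT(1, 0, s) of 14 and d set to that value, a PRTT(4, 14, s) of 59 yields o = (59 − 14) / 3 − 14 = 1.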

SLIDE 12

Assessment of the Gaps

Assessing g and G

g + (s − 1)G = (PRTT(n, 0, s) − PRTT(1, 0, s)) / (n − 1)

  • polynomial of degree 1 (linear function of s) ⇒ measure PRTT(n, 0, s) and PRTT(1, 0, s) for different s
  • more measurement values increase accuracy (⇒ least-squares linear fit)

Detecting Protocol Changes

  • communication subsystems use data-size-dependent protocols
  • different protocols have different parameters
  • automatic detection is possible via changes in the mean least-squares deviation
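The linear fit and the protocol-change check can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure implemented by the authors: the names `fit_gaps` and `find_protocol_change`, the `min_points` window, and the fixed `threshold` on the mean squared deviation are all hypothetical choices made for this example.

```python
def fit_gaps(sizes, per_message_times):
    """Least-squares fit of t = g + G * (s - 1).

    per_message_times holds (PRTT(n,0,s) - PRTT(1,0,s)) / (n - 1) for each s.
    Needs at least two distinct sizes.
    Returns (g, G, mean squared deviation from the fitted line).
    """
    xs = [s - 1 for s in sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_t = sum(per_message_times) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t)
              for x, t in zip(xs, per_message_times))
    G = sxt / sxx            # slope: gap per byte
    g = mean_t - G * mean_x  # intercept: gap at s = 1
    deviation = sum((t - (g + G * x)) ** 2
                    for x, t in zip(xs, per_message_times)) / count
    return g, G, deviation


def find_protocol_change(sizes, per_message_times, min_points=3, threshold=1.0):
    """Return the first size whose inclusion pushes the mean squared
    deviation above threshold (hypothetical criterion), or None."""
    for end in range(min_points + 1, len(sizes) + 1):
        _, _, deviation = fit_gaps(sizes[:end], per_message_times[:end])
        if deviation > threshold:
            return sizes[end - 1]
    return None
```

Once a change is reported at some size, the data can be split there and each interval fitted separately, giving one (g, G) pair per protocol interval.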

SLIDE 14

Assessment of the Gaps - Example Fit

[Figure: example fit. x-axis: data size in bytes (s); y-axis: time in microseconds. Series: (PRTT(n,0,s)-PRTT(1,0,s))/(n-1) and the linear fit (g=4.408, G=0.00099).]

SLIDE 15

Results

Netgauge

  • support for multiple low-level interfaces, e.g., TCP, UDP, IB, GM, SCI, MPI ⇒ low-level and MPI measurements (library overhead)
  • different communication patterns (different measurements)
  • implemented new pattern: loggp

MPI Benchmark Environment

  • MPICH2 1.0.3 for Gigabit Ethernet
  • NMPI for SCI
  • Open MPI 1.1 for InfiniBand™/OFED
  • Open MPI 1.1 for Myrinet/GM

SLIDE 17

TCP/IP over Gigabit Ethernet

[Figure: time in microseconds versus data size in bytes (s). Series: MPICH2 G*s+g, MPICH2 o, TCP G*s+g, TCP o.]

SLIDE 18

InfiniBand™/OFED

[Figure: time in microseconds versus data size in bytes (s). Series: Open MPI G*s+g, Open MPI o, OpenIB G*s+g, OpenIB o.]

SLIDE 19

Myrinet/GM

[Figure: time in microseconds versus data size in bytes (s). Series: Open MPI G*s+g, Open MPI o, Myrinet/GM G*s+g, Myrinet/GM o.]

SLIDE 20

SCI (MPI only)

[Figure: time in microseconds versus data size in bytes (s). Series: NMPI G*s+g, NMPI o.]

SLIDE 21

Numeric Results

Transport   Protocol interval   L (µs)   o(1) (µs)   g (µs)   G (µs/byte)
TCP         1 ≤ s               45.74    3.46        0.915    0.00849
SCI         1 ≤ s < 12289       5.48     6.10        7.78     0.0045
SCI         12289 ≤ s           5.48     6.10        13.34    0.0037
OFED        1 ≤ s < 12289       5.96     4.72        5.14     0.00073
OFED        12289 ≤ s           5.96     4.72        21.39    0.00103
GM          1 ≤ s < 32769       10.53    1.27        9.44     0.0092
GM          32769 ≤ s           10.53    1.27        52.01    0.0042
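As an illustration of how such per-interval parameters might be used, the sketch below looks up the OFED values for a given message size and evaluates PRTT(1, 0, s)/2 = L + 2o + (s − 1)G. The names `OFED_INTERVALS`, `params_for`, and `half_prtt` are hypothetical, and treating the tabulated o(1) as constant for all sizes is an assumption made only for this example.

```python
# (lower bound, upper bound or None, parameters), values from the OFED rows.
OFED_INTERVALS = [
    (1, 12289, {"L": 5.96, "o": 4.72, "g": 5.14, "G": 0.00073}),
    (12289, None, {"L": 5.96, "o": 4.72, "g": 21.39, "G": 0.00103}),
]


def params_for(size, intervals):
    """Return the parameter set whose protocol interval contains size."""
    for lower, upper, params in intervals:
        if size >= lower and (upper is None or size < upper):
            return params
    raise ValueError("no protocol interval covers size %d" % size)


def half_prtt(size, intervals):
    """Model value of PRTT(1, 0, s) / 2 in microseconds.

    Assumes o is the constant o(1) from the table for every size.
    """
    p = params_for(size, intervals)
    return p["L"] + 2 * p["o"] + (size - 1) * p["G"]
```

A call such as `half_prtt(4096, OFED_INTERVALS)` uses the first interval, while any size of 12289 bytes or more switches to the second parameter set.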

SLIDE 22

Conclusions and Future Work

Conclusions

  • new measurement technique for LogGP parameters
  • congestion-free
  • low overhead
  • accurate (no higher-order errors, multiple data points)
  • detects protocol changes automatically

Future Work

  • apply the scheme to heterogeneous NIC scheduling
  • analyze communication schemes
  • measure non-blocking MPI communication
