SLIDE 1

The imagination driving Australia’s ICT future.

COMPREHENSIVE THROUGHPUT EVALUATION OF LANS IN CLUSTERS OF PCS WITH SWITCHBENCH

or

How to Bring Your Switch to Its Knees

Felix Rauch
National ICT Australia
felix.rauch@nicta.com.au

SLIDE 2

CLUSTERS OF PCS

Harness the power of many compute nodes coupled together.

[Images: rack-mounted compute cluster; network of workstations]

Successful because:

  • Commodity off-the-shelf components (PCs, LAN)
  • Often do-it-yourself approach
  • Cost-effective high-performance computing

SWITCHBENCH — HOW TO BRING YOUR SWITCH TO ITS KNEES 2

SLIDE 3

UNDERSTANDING PERFORMANCE IN CLUSTERS OF COMMODITY PCS

[Diagram: eight PC nodes]

Switchbench measures the overall network performance.

SLIDE 7

OVERVIEW

  • Introduction
  • Network performance
  • Evaluation principles
  • Switchbench microbenchmarks with evaluation examples
  • Conclusions

SLIDE 10

NETWORK PERFORMANCE IN CLUSTERS OF PCS

Supercomputers:

  • Balanced
  • Full bisection
  • Remote deposit

➜ Built by design

Commodity Clusters:

  • Cheap (commodity) parts
  • One-size-fits-all (LAN)
  • Sometimes hacks to improve performance

➜ Built by shopping

Problems when choosing commodity components (they are all different!):

  • make sure products adhere to specifications (not all do!)
  • know performance characteristics (they differ widely!)

➜ Need benchmark tools for comprehensive evaluation.

SLIDE 11

RELATED WORK: PERFORMANCE EVALUATION IN CLUSTERS

Analytic models:

  • LogP (Culler 1993)
  • LogGP (Alexandrov 1995)

Overall benchmark for parallel machines:

  • High-Performance Linpack (Dongarra 1979)

Point-to-point network benchmarks:

  • Netperf (Jones)
  • NetPIPE (Turner)
  • TTCP (PCAUSA)

Distributed network benchmark framework:

  • IPbench (Wienand 2004)

SLIDE 12

BANDWIDTH VS. LATENCY

How to evaluate networks / switches? Latency vs. bandwidth:

  • Latency is mostly “given by nature”. It is addressed with latency-hiding techniques.
  • One can purchase (additional) bandwidth. There are more interesting cost/performance tradeoffs for additional bandwidth than for lower latency.

➜ Focus on bandwidth

How to measure the bandwidth of entire networks?

SLIDE 13

NETWORK LIMITATIONS

Three main limitations:

  • End nodes
    – Hardware: network interface controller, CPU, memory, I/O bus.
    – Software: communication protocol stack.
  • Switches
    – Processing limit (number of packets per second).
    – Internal bandwidth limitation.
  • Bisection bandwidth
    – Network architecture (topology).

SLIDE 18

FULL BISECTION BANDWIDTH

A network with N nodes has full bisection bandwidth if the sum of the link bandwidths between any two halves of the network is N/2 times the bandwidth of a single link.

⇔ Nodes of any two halves can communicate at full speed with each other.

Important for programs with global communication patterns.

Important communication pattern requiring full bisection:

  • All-to-all personalised communication (AAPC): every node exchanges some data with every other node.
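The N/2 rule above can be turned into a small check. The following is a minimal sketch, not part of Switchbench: the function names are hypothetical, and the 12.5 MByte/s per-link figure is simply the nominal Fast Ethernet rate per direction, used here as an assumption.

```python
def required_bisection_bandwidth(num_nodes: int, link_bandwidth: float) -> float:
    """Bandwidth that must cross an even split of the network: N/2 links' worth."""
    if num_nodes < 2 or num_nodes % 2 != 0:
        raise ValueError("this sketch expects an even number of nodes >= 2")
    return (num_nodes // 2) * link_bandwidth


def has_full_bisection(num_nodes: int, link_bandwidth: float,
                       measured_cut_bandwidth: float) -> bool:
    """True if the bandwidth measured across a cut reaches the N/2 * link target."""
    return measured_cut_bandwidth >= required_bisection_bandwidth(num_nodes, link_bandwidth)


# Hypothetical example: 16 nodes, 12.5 MByte/s per link and direction (assumed).
print(required_bisection_bandwidth(16, 12.5))
print(has_full_bisection(16, 12.5, 60.0))
```

A cut that delivers less than the required value means that at least one pair of halves cannot communicate at full speed.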

SLIDE 19

IMPLEMENTATION

  • Based on earlier work done at ETH Zurich, together with C. Kurmann & T. Stricker.
  • GNU General Public License.
  • Core functionality in two small C programs.
  • Shell scripts support:
    – starting programs on many nodes (by ssh)
    – specifying node ranges
    – reordering virtual node numbers to match the physical layout

  • Results in human-readable text file.
  • Implemented and tested on GNU/Linux.

SLIDE 21

BENCHMARK: DAISY CHAIN

Virtual TCP daisy chain through an increasing number of nodes.

[Diagram: eight PC nodes]

  ✓ Next-neighbour communication
  ✗ Bisection bandwidth not tested
  ✓ Full-speed duplex connections on all ports
  ✓ Limited by switch performance
  ✓ Increase load to find switch’s limit

Result: Bandwidth of TCP chain.

Taken from the Dolly partition-casting tool (disk cloning):

  • Successfully used to install large clusters
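As an illustration of the chain layout, the sketch below lists the hops of a daisy chain and an idealised upper bound on the traffic the switch has to carry. It is not the Switchbench code: the node names are hypothetical, and treating the aggregate as "number of hops times link bandwidth" is an assumption made for this example.

```python
def daisy_chain_hops(nodes):
    """Return the (sender, receiver) hops of a chain visiting the nodes in order."""
    return list(zip(nodes, nodes[1:]))


def ideal_aggregate_bandwidth(num_hops, link_bandwidth):
    """Upper bound if every hop streams at full link speed (assumed metric)."""
    return num_hops * link_bandwidth


nodes = [f"node{i:02d}" for i in range(16)]  # hypothetical node names
hops = daisy_chain_hops(nodes)
print(len(hops), hops[0], hops[-1])
# 12.5 MByte/s is the nominal Fast Ethernet rate per direction (assumption).
print(ideal_aggregate_bandwidth(len(hops), 12.5))
```

Lengthening the chain by one node adds one more stream through the switch, which is how the load is increased step by step.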

SLIDE 22

DAISY-CHAIN BENCHMARK: EXAMPLE EVALUATION PLATFORM

Cluster with 16 nodes, each with:

  • 2 Intel Pentium III processors, 1 GHz
  • 512 MByte RAM
  • Intel Ethernet Pro 100 Fast Ethernet adapter
  • Packet Engines G-NIC II Gigabit Ethernet adapter

Experiments to compare performance characteristics of 3 different switches:

  • Cisco 2900 XL Fast Ethernet switch (24 ports)
  • ATI FS724i Fast Ethernet switch (24 ports)
  • Cabletron SSR8600 Gigabit Ethernet switch (16 ports configured)

SLIDE 23

DAISY-CHAIN BENCHMARK: EXAMPLE EVALUATION

[Plot: aggregate switching capacity [MByte/s] versus number of client nodes (1 to 15). Series: ATI FS724i Fast Ethernet, 24 ports (left scale); Cisco 2900 XL Fast Ethernet, 24 ports (left scale); SSR8600 Gigabit Ethernet, 16 ports (right scale).]

SLIDE 28

BENCHMARK: PAIRWISE STREAMING

Any duplex communication pattern for increasing number of nodes.

[Diagram: eight PC nodes]

  ✓ Great for debugging networks and switches
  ✗ Less automated
  ✓ Any pattern
  ✗ Cannot compare results

Result: Bandwidth of pairwise connections.

Successfully identified critical bottlenecks in commercial switches.
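One possible pattern for a pairwise test is sketched below: pairs that all cross the same split of the node list, so that adding pairs increases the load on that cut. The pairing rule and the function name are assumptions chosen for illustration; the benchmark itself accepts any pattern.

```python
def pairs_across_halves(nodes, num_pairs):
    """Pair the first num_pairs nodes of the lower half with their peers in the upper half."""
    half = len(nodes) // 2
    if not 0 < num_pairs <= half:
        raise ValueError("num_pairs must be between 1 and len(nodes) // 2")
    return [(nodes[i], nodes[half + i]) for i in range(num_pairs)]


nodes = list(range(8))  # hypothetical node numbers
for k in range(1, 5):
    print(k, pairs_across_halves(nodes, k))
```

If the node list is ordered to match the physical layout (for example, one half per switch module), a drop in per-pair bandwidth as pairs are added points at the link between the two halves.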

SLIDE 29

EXAMPLE EVALUATION PLATFORM

ETH “Xibalba” cluster with 128 nodes:

  • 1–2 Intel Pentium III processors, 1 GHz
  • 256 MByte RAM per processor
  • 2 Intel-based Fast Ethernet adapters
  • Myrinet Gbit/s adapters (on part of the nodes)

Network infrastructure:

  • Enterasys Matrix E7 Fast Ethernet switch (mid range)
  • Enterasys X-pedition ER16 Fast Ethernet switch
  • 8 Enterasys Horizon VH-2402 Fast Ethernet switches
  • Myricom M3-E64 Gbit/s Myrinet switch

SLIDE 30

EVALUATION WITH PAIRWISE STREAMING

Detailed measurement to find limiting bisections on Matrix E7 switch.

[Figure: Matrix E7 with 48-port switch modules (6H302-48); plots of transfer rate [MByte/s] versus communication pairs.]

Pairwise tests show severe inter-module bottleneck.

SLIDE 31

BENCHMARK: ALL-TO-ALL

Congestion-controlled all-to-all personalised communication (AAPC):

  • Requires full bisection bandwidth
  • Use phases to avoid congestion

parallel algorithm all-to-all
    for i = 1 to n − 1 do
        concurrently send data to node (n_self + i) mod n
            and receive data from node (n_self − i) mod n
        wait for barrier

➜ Communication with increasing distance.
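The phase structure of the algorithm above can be sketched as a schedule generator. This is a minimal illustration, not the Switchbench implementation: it only computes who talks to whom in each phase, and the actual data transfer and the barrier between phases are left out.

```python
def aapc_schedule(n):
    """Phases of congestion-controlled all-to-all for n nodes.

    Phase i (1 <= i <= n - 1) is a list of (sender, receiver) pairs in which
    node s sends to (s + i) mod n, so it receives from (s - i) mod n.
    """
    return [[(s, (s + i) % n) for s in range(n)] for i in range(1, n)]


schedule = aapc_schedule(8)
for phase_no, phase in enumerate(schedule, start=1):
    # Each node appears exactly once as sender and once as receiver per phase.
    assert sorted(r for _, r in phase) == list(range(8))
    print(phase_no, phase[:3], "...")
```

Because every phase is a permutation of the nodes, no receiver is targeted by two senders at once, and over the n − 1 phases every ordered pair of distinct nodes is covered exactly once.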

SLIDE 32

BENCHMARK: CONGESTION-CONTROLLED AAPC

[Diagram: eight PC nodes, phase 1]

One-number result: Average (CHECK THIS!) throughput

SLIDE 33

BENCHMARK: CONGESTION-CONTROLLED AAPC

[Diagram: eight PC nodes, phase 2]

One-number result: Average (CHECK THIS!) throughput

SLIDE 34

BENCHMARK: CONGESTION-CONTROLLED AAPC

[Diagram: eight PC nodes, phase 4]

One-number result: Average (CHECK THIS!) throughput

SLIDE 35

BENCHMARK: CONGESTION-CONTROLLED AAPC

[Diagram: eight PC nodes, phase 1]

  ✓ Automatic
  ✓ Comprehensively tests all communication distances
  ✓ More realistic communication pattern

  • Simple result: Bandwidth for whole run
  • More detailed results: Bandwidth for each phase

SLIDE 36

ALL-TO-ALL BENCHMARK: EXAMPLE EVALUATION PLATFORM

ETH “Xibalba” cluster with 128 nodes:

  • 1–2 Intel Pentium III processors, 1 GHz
  • 256 MByte RAM per processor
  • 2 Intel-based Fast Ethernet adapters
  • Myrinet Gbit/s adapters (only 32 nodes)

Network infrastructure:

  • Enterasys Matrix E7 Fast Ethernet switch (mid range)
  • Enterasys X-pedition ER16 Fast Ethernet switch (high end)
  • 8 Enterasys Horizon VH-2402 Fast Ethernet switches (cheap DIY)
  • Myricom M3-E64 Gbit/s Myrinet switch (Gbit/s class)

SLIDE 37

EVALUATION WITH ALL-TO-ALL: EXECUTION TIMES

Execution times of AAPC benchmark on different networks (60 CPUs):

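[Bar chart: AAPC execution time [s] for the maintenance network, the Matrix E7 switch and the X-pedition ER16 switch. Bar labels in extraction order: 610 s (4.2 MB/s), 249 s (10.3 MB/s), 830 s (3.1 MB/s); the assignment of bars to networks is not recoverable from the extracted text.]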
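The three time and bandwidth labels on the chart can be cross-checked with simple arithmetic. The sketch below multiplies each execution time by its bandwidth; reading the product as a data volume, and the conclusion that the same volume was moved in each run, are interpretations made here rather than statements from the slides.

```python
# (execution time in s, bandwidth in MB/s) as printed on the bars of the chart
bars = [(610, 4.2), (249, 10.3), (830, 3.1)]

volumes = [seconds * mb_per_s for seconds, mb_per_s in bars]
for (seconds, mb_per_s), volume in zip(bars, volumes):
    print(f"{seconds} s x {mb_per_s} MB/s = {volume:.0f} MB")

# Relative difference between the largest and smallest product.
spread = (max(volumes) - min(volumes)) / min(volumes)
print(f"relative spread: {spread:.2%}")
```

The products agree to within about one percent, which is consistent with the bandwidth figures having been derived from the execution times for a fixed amount of data.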

SLIDE 38

EVALUATION WITH ALL-TO-ALL: PHASES

Minimal bandwidth for each phase:

[Plot: minimal bandwidth in phase [MByte/s] versus communication phase # (offset). Series: X-pedition ER16, Matrix E7, maintenance network.]

SLIDE 39

EVALUATION WITH ALL-TO-ALL: PHASES

Minimal bandwidth for each phase:

[Plot: minimal bandwidth in phase [MByte/s] versus communication phase # (offset). Series: Myrinet (Gbit/s).]

SLIDE 40

CONCLUSIONS

Switchbench is a set of three microbenchmarks for measuring and debugging networks and switches.

Switchbench found:

  • significant differences and variations in switch performance
  • some data sheets are plain wrong!

➜ FREE switch upgrade from the manufacturer

Switchbench is useful to:

  • better understand performance
  • better adapt applications to existing networks in clusters

Future work: Complete automatic performance characterisation.

Switchbench is a valuable tool to evaluate network performance.

SLIDE 41

QUESTIONS?

Switchbench download page: http://www.ertos.nicta.com.au/Software/

Embedded, Real-Time and Operating Systems (ERTOS) research program, National ICT Australia (NICTA)

SLIDE 42

APPLICATION BENCHMARK: HIGH-PERFORMANCE LINPACK (HPL)

Popular benchmark for supercomputers and clusters

[Plot: HPL performance [GFlops] versus number of CPUs (16, 24, 32, 64). Series: Ethernet maint. net, Ethernet Matrix E7, Ethernet X-pedition ER16, Myrinet (shared Myrinet).]

SLIDE 43

APPLICATION BENCHMARK: QTPLAN LARGE-SCALE TRAFFIC SIMULATION

[Bar charts: execution time [s] on Maint. net, Matrix E7, X-ped. ER16 and Shared Myrinet, one chart for 50’000 cars and one for 990’000 cars.]
