SLIDE 1

CHiC 2007 Frank Mietke Introduction The CHiC Project Benchmarks Summary

Design and Evaluation of a 2048 Core Cluster System

Frank Mietke, Torsten Höfler, Torsten Mehlan and Wolfgang Rehm

Computer Architecture Group Department of Computer Science Chemnitz University of Technology

December 12, 2007

SLIDE 2

Outline

1  Introduction
2  The CHiC Project
3  Benchmarks
4  Summary

SLIDE 3

Supercomputing in General

  • Clusters are dominant (81.2% of the Top500)
  • Power consumption is problematic (cf. Green500)

SLIDE 4

Supercomputing at Chemnitz

  • Since 1994
  • Growing user community
  • Parsytec – 20 GFlop/s
  • CLiC – 221.6 GFlop/s

SLIDE 5

Cluster Design

[Figure: cluster topology. 512 diskless compute nodes and 12 graphics nodes (with HDD) attach to the InfiniBand fabric (max. 8 cables per link); the login node (with HDD), management node (with HDD), diskless I/O nodes, and the storage complex are also connected. Redundant GigaBit-Ethernet links (2 cables each, 6 cables each to the gateway) provide campus network access.]

SLIDE 6

Network Design

[Figure: network topology. Two Cisco 6500 switches with firewall modules connect to the campus network (12 cables each); two 288-port InfiniBand switches (6 cables each) form the fabric core, complemented by 24-port InfiniBand switches, an InfiniBand/GbE gateway, and an access GbE module with 6 ports.]

SLIDE 7

Storage Design

[Figure: storage architecture. The MDS and OSS nodes (IBM x3455) attach to the InfiniBand fabric and connect via 5× SAS to the RAID controllers of the storage complex.]

SLIDE 8

The CHiC – Top500

  • Rank 80 (Nov. 2006 – unofficial)
  • Rank 117 (Jun. 2007)
  • Rank 237 (Nov. 2007)
  • CHiC – 8.21 TFlop/s

SLIDE 9

But we provide more ...

  • 12+ TFlop/s (single precision)
  • www.gpgpu.org

SLIDE 10

Experiences

Hardware:
  • Very good hardware reliability (so far)
  • IB-Eth gateway or fabric inconsistencies (under load)
  • Complex IB fabric (3-, 5-, 7-stage CLOS)
  • RAID controllers in the storage hardware (configuration issues)

Software:
  • Lustre-1.6b7 and Lustre-1.6.3 (bugs)
  • OFED-1.1 and IPoIB failover
  • MPI start-up (failed processes and scalability)
  • TORQUE and ulimit values

SLIDE 11

STREAM – Triad

a[i] = b[i] + q · c[i]

balance = peak floating-point ops/s / sustained memory ops/s

STREAM Triad results (compiler: pathscale-3.0; bandwidth in MB/s):

Threads  DIMMs | Opteron BW  Balance | Woodcrest BW  Balance
1 T      2 Ds  |   5655.7      7.3   |    3672.8      17.4
1 T      4 Ds  |   5572.9      7.4   |    3896.4      16.4
1 T      8 Ds  |   5769.8      7.2   |    3959.6      16.2
2 Ts     2 Ds  |   6056.0     13.7   |    3967.9      32.2
2 Ts     4 Ds  |   6114.7     13.6   |    5061.7      25.3
2 Ts     8 Ds  |   6520.9     12.7   |    5876.6      21.8
4 Ts     2 Ds  |   5025.1     33.1   |    3949.3      64.8
4 Ts     4 Ds  |  11527.4     14.4   |    5111.2      50.1
4 Ts     8 Ds  |  12796.4     13.0   |    5653.6      45.3
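The triad kernel and the balance metric can be sketched in a few lines. A minimal sketch, assuming an 8-byte (double-precision) word size and a hypothetical per-core Opteron peak of 5.2 GFlop/s (2.6 GHz × 2 flops/cycle); neither value is stated on the slide:

```python
def stream_triad(a, b, c, q):
    """STREAM triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]

def balance(peak_flops, bandwidth_mb_s, word_bytes=8):
    """Machine balance = peak floating-point ops/s / sustained memory ops/s,
    where sustained memory ops/s = measured bandwidth / word size."""
    mem_ops_per_s = bandwidth_mb_s * 1e6 / word_bytes
    return peak_flops / mem_ops_per_s

# With the assumed 4 x 5.2 GFlop/s peak, the measured 5025.1 MB/s for
# 4 threads / 2 DIMMs on the Opteron yields the balance shown in the table:
print(round(balance(4 * 5.2e9, 5025.1), 1))  # 33.1
```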

SLIDE 12

HPL

8.21 TFlop/s measured on 2080 cores (76% of peak)

[Figure: HPL results for 4 nodes (16 cores) – floating-point performance between 58 and 72 GFlop/s across the P×Q grids 4×4, 2×8, and 1×16 for the configurations OpenIB_4DIMMS, OpenIB_8DIMMS, and TCP_4DIMMS.]
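The 76% efficiency can be cross-checked against a theoretical peak. A rough sketch, assuming 2.6 GHz Opteron cores with 2 double-precision flops per cycle (both values are assumptions, not stated on the slide):

```python
# Hypothetical cross-check of the quoted HPL efficiency.
cores = 2080              # cores used in the measurement (from the slide)
clock_hz = 2.6e9          # assumed Opteron clock rate
flops_per_cycle = 2       # assumed double-precision flops per cycle
peak_flops = cores * clock_hz * flops_per_cycle   # theoretical peak
measured_flops = 8.21e12                          # HPL result from the slide
efficiency = measured_flops / peak_flops
print(f"peak ~{peak_flops / 1e12:.2f} TFlop/s, efficiency ~{efficiency:.0%}")
```

Under these assumptions the peak is about 10.8 TFlop/s, consistent with the quoted 76%.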

SLIDE 13

IOR

  • 20 Object Storage Targets (RAID-5, 8 HDDs each)
  • 3.2 GiB/s write performance
  • 2.6 GiB/s read performance

[Figure: IOR results for 1 MB transfer size – aggregate I/O throughput (MiB/s) vs. number of nodes (20–160), with separate READ and WRITE plots for the fpp_1OST, fpp_20OST, seg_20OST, and str_20OST access patterns.]
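For a rough sense of per-target load, the aggregate write bandwidth can be divided across the 20 OSTs (an idealized even split, not a measured per-OST figure):

```python
aggregate_write_gib_s = 3.2   # measured aggregate write bandwidth (GiB/s)
num_osts = 20                 # object storage targets from the slide
per_ost_mib_s = aggregate_write_gib_s * 1024 / num_osts
print(f"~{per_ost_mib_s:.0f} MiB/s per OST")  # ~164 MiB/s
```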

SLIDE 14

Latest IOZone Results

  • 18 Object Storage Targets (RAID-6)
    – 9 RAID-6 sets with 10 HDDs
    – 9 RAID-6 sets with 6 HDDs
  • 120 clients (Lustre-1.6.3), 5 GB data file each
  • 3.7 GiB/s read performance
  • 3.2 GiB/s write performance

SLIDE 15

Application Benchmarks

ABINIT (time in s):
  • AMD cluster: 1,384.6
  • Intel cluster: 1,454.2

NAMD:
[Figure: NAMD running time in s vs. number of cores (16, 32, 64) for 16 nodes, comparing the Opteron and Woodcrest systems.]

SLIDE 16

Summary

  • Extremely good price/performance ratio
  • Achieved ambitious project deadlines (with compromises)
  • Self-design vs. self-made
  • Performance numbers of Intel/AMD processors (memory bandwidth more important for us)
  • Lustre failover configuration expensive (backup strategy)

SLIDE 17

Thank You! Any Questions?

SLIDE 18

Backup Slides

SLIDE 19

Software–Environment

  • Scientific Linux 4.4 / 5.0
  • OpenFabrics Enterprise Edition (OFED) 1.2
  • Lustre 1.6.3 –> Lustre 1.6.4
  • Open MPI 1.2.4, MVAPICH-1.0beta and MVAPICH2-1.0.1
  • GNU Compiler 3.4.6 and 4.2.2, and EKOPath Compiler 3.1
  • TORQUE 2.1.8 and Maui 3.2.6p13
  • Nagios 2.9
  • xCAT 1.2.0 and Warewulf 2.6

SLIDE 20

Cluster Installation

  • 1 month deployment
  • 21.6 tons of material (racks + components)
  • 4,200 nuts and 4,600 screws required
  • 4,900 cables with 9,800 connectors (8 km total length)
  • 300 man-days of effort