SLIDE 1

Cluster Computing

Interconnect Technologies for Clusters

SLIDE 2

Interconnect approaches

  • WAN
    – ’Infinite’ distance
  • LAN
    – A few kilometers
  • SAN
    – A few meters
  • Backplane
    – Not scalable

SLIDE 3

Physical Cluster Interconnects

  • FastEther
  • Gigabit Ethernet
  • 10 Gigabit Ethernet
  • ATM
  • cLAN
  • Myrinet
  • Memory Channel
  • SCI
  • Atoll
  • ServerNet

SLIDE 4

Switch technologies

  • Switch design
    – Fully interconnected (crossbar)
    – Omega network
  • Packet handling
    – Store-and-forward
    – Cut-through (worm-hole) routing

SLIDE 5

Implications of switch technologies

  • Switch design
    – Affects the constant factor associated with routing (see the sketch below)
  • Packet handling
    – Affects the overall routing latency in a major way
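
A minimal sketch, not from the slides, of how the two switch designs trade hardware cost against stages traversed: an N-port crossbar needs N*N crosspoints but only one switching stage per path, while an Omega network needs (N/2)*log2(N) two-by-two elements and log2(N) stages. These are the standard textbook counts, assumed here for illustration.

```python
import math

# Fully interconnected (crossbar): every input wired to every output,
# so an N-port switch needs N*N crosspoints but a packet crosses one stage.
# Omega network: log2(N) stages of 2x2 elements, (N/2)*log2(N) elements total,
# so hardware grows slowly but a packet crosses log2(N) stages.
def crossbar(ports):
    return ports * ports, 1                  # (crosspoints, stages per path)

def omega(ports):
    stages = int(math.log2(ports))
    return (ports // 2) * stages, stages     # (2x2 elements, stages per path)

for n in (8, 64, 512):
    (xc, xs), (oc, ostages) = crossbar(n), omega(n)
    print(f"{n:3d} ports: crossbar {xc:6d} crosspoints / {xs} stage, "
          f"omega {oc:5d} elements / {ostages} stages")
```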

SLIDE 6

Store-and-forward vs. worm-hole: one step

  • T(n) = Overhead + Channel Time + Routing Delay per hop
  • Cut-through:        T(1) = Overhead + Channel Time + Routing Delay
  • Store-and-forward:  T(1) = Overhead + Channel Time + Routing Delay
  • For a single step the two are identical

SLIDE 7

Store-and-forward vs. worm-hole: ten steps

  • T(n) = Overhead + Channel Time + Routing Delay per hop
  • Cut-through:        T(10) = Overhead + Channel Time + 10 × Routing Delay
  • Store-and-forward:  T(10) = Overhead + 10 × (Channel Time + Routing Delay)
  • (Worked comparison in the sketch below)
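
A minimal sketch, not from the slides, plugging assumed numbers (1 KB packet, 1 Gbit/s links, 1 us software overhead, 0.1 us routing delay per switch) into the two models above to show how the gap grows with the number of hops.

```python
# Assumed example numbers, not taken from the slides.
OVERHEAD_US = 1.0
CHANNEL_US  = 8.0   # 1 KB at 1 Gbit/s: 8192 bits / 1000 bits per us, roughly 8 us
ROUTE_US    = 0.1

def store_and_forward(hops):
    # the whole packet is received and retransmitted at every switch
    return OVERHEAD_US + hops * (CHANNEL_US + ROUTE_US)

def cut_through(hops):
    # the header is routed onward while the body is still arriving
    return OVERHEAD_US + CHANNEL_US + hops * ROUTE_US

for hops in (1, 10):
    print(f"{hops:2d} hops: store-and-forward {store_and_forward(hops):5.1f} us, "
          f"cut-through {cut_through(hops):5.1f} us")
```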

SLIDE 8

FastEther

  • 100 Mbit/sec
  + Generally supported
  + Extremely cheap
  – Limited bandwidth
  – Not really that standard
  – Not all implementations support zero-copy protocols

SLIDE 9

Gigabit Ethernet

  • Ethernet is hype-only at this stage
  • Bandwidth really is 1 Gb/sec
  • Latency is only slightly improved
    – Down to 20 us, from 22 us at 100 Mbit
  • Current standard
    – But NICs are as different as with FE

SLIDE 10

10 Gigabit Ethernet

  • Target applications not really defined
    – But clusters are not the most likely customers
    – Perhaps as a backbone for large clusters
  • Optical interconnects only
    – Copper currently being proposed

SLIDE 11

ATM

  • Used to be the holy grail in cluster computing
  • Turns out to be poorly suited for clusters
    – High price
    – Tiny packets (cells)
    – Designed for throughput, not reliability

SLIDE 12

cLAN

  • Virtual Interface Architecture (VIA)
  • An API standard, not a HW standard
  • 1.2 Gbit/sec

SLIDE 13

Myrinet

  • Long-time ’de facto’ standard
  • LAN and SAN architectures
  • Switch-based
  • Extremely programmable

SLIDE 14

Myrinet

  • Very high bandwidth
    – 0.64 Gb + 0.64 Gb in gen 1 (1994)
    – 1.28 Gb + 1.28 Gb in gen 2 (1997)
    – 2.0 Gb + 2.0 Gb in gen 3 (2000)
    – (10.0 Gb + 10 Gb in gen 4 (2005), over 10G Ethernet PHY)
  • 18-bit parallel wires
  • Error rate of about 1 bit per 24 hours
  • Very limited physical distance

SLIDE 15

Myrinet Interface

  • Hosts a fast RISC processor
    – 132 MHz in the newest version
  • Large onboard memory
    – 2, 4, or 8 MB in the newest version
  • Memory is used for both send and receive buffers and runs at CPU speed
    – 7.5 ns cycle time in the newest version

SLIDE 16

Myrinet-switch

  • Worm-hole routed
    – 5 ns route time
  • Process-to-process latency
    – 9 us (133 MHz LANai)
    – 7 us (200 MHz LANai)

SLIDE 17

Myrinet

SLIDE 18

Myrinet Prices

  • PCI/SAN interface
    – $495, $595, $795
  • SAN switch
    – 8 port: $4,050
    – 16 port: $5,625
    – 128 port: $51,200
  • 10 ft. cable: $75 (per-node cost sketch below)
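
Not part of the slide: a rough per-node cost calculation from the list prices above, assuming the cheapest NIC, one 10 ft. cable per node, and the switch price divided evenly across its ports.

```python
# Rough per-node cost: cheapest NIC ($495) + 10 ft. cable ($75) + an even
# share of the switch price. Assumes a fully populated switch, no spares.
switches = {8: 4050, 16: 5625, 128: 51200}   # ports -> switch price (USD)
NIC, CABLE = 495, 75

for ports, switch_price in switches.items():
    per_node = NIC + CABLE + switch_price / ports
    print(f"{ports:3d}-node cluster: about ${per_node:,.0f} per node")
```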

SLIDE 19

Memory Channel

  • Digital Equipment Corporation product
  • Raw performance:
    – Latency: 2.9 us
    – Bandwidth: 64 MB/s
  • MPI performance (half-power sketch below):
    – Latency: 7 us
    – Bandwidth: 61 MB/s
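
A small illustrative calculation, not on the slide: with a simple linear cost model T(m) = latency + m / bandwidth, the half-power message size n_1/2 = latency * bandwidth is the size at which half of the peak bandwidth is reached. Plugging in the numbers above:

```python
# Half-power point n_1/2 = latency * bandwidth for the linear cost model
# T(m) = latency + m / bandwidth (numbers taken from the slide above).
def half_power_bytes(latency_us, bandwidth_mb_s):
    return latency_us * 1e-6 * bandwidth_mb_s * 1e6   # bytes

print("raw:", round(half_power_bytes(2.9, 64)), "bytes")   # about 186 bytes
print("MPI:", round(half_power_bytes(7.0, 61)), "bytes")   # about 427 bytes
```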

SLIDE 20

Memory Channel

SLIDE 21

Memory Channel

SLIDE 22

SCI

  • Scalable Coherent Interface
  • IEEE standard
  • Not widely implemented
  • Coherency protocol is very complex
    – 29 stable states
    – An enormous number of transient states

SLIDE 23

SCI

SLIDE 24

SCI Coherency

  • States (at the home memory node)
    – Home: no remote cache in the system contains a copy of the block
    – Fresh: one or more remote caches may have a read-only copy, and the copy in memory is valid
    – Gone: another remote cache contains a writable copy; there is no valid copy on the local node

SLIDE 25

SCI Coherency

  • State is named by two components
    – Position in the sharing list: ONLY, HEAD, TAIL, or MID
    – Data state:
        Dirty: modified and writable
        Clean: unmodified (same as memory) but writable
        Fresh: data may be read, but not written until memory is informed
        Copy: unmodified and readable

SLIDE 26

SCI Coherency

  • List construction: adding a new node (sharer) to the head of a list
  • Rollout: removing a node from a sharing list, which requires that the node communicate with its upstream and downstream neighbors, informing them of their new neighbors so they can update their pointers
  • Purging (invalidation): the node at the head may purge or invalidate all other nodes, resulting in a single-element list; only the head node can issue a purge (see the sketch below)
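
The three operations map naturally onto a doubly linked list. Below is a toy sketch, not from the slides: it models only the pointer updates of list construction, rollout, and purging, and omits SCI's transient states, the home memory directory, and the actual message exchanges. The names (SharingList, Node, construct, rollout, purge) are invented for illustration.

```python
# Toy model of an SCI-style sharing list: a doubly linked list of caching
# nodes, with new sharers always attached at the head.
class Node:
    def __init__(self, name):
        self.name = name
        self.up = None     # neighbour toward the head (upstream)
        self.down = None   # neighbour toward the tail (downstream)

class SharingList:
    def __init__(self):
        self.head = None

    def construct(self, node):
        """List construction: a new sharer is prepended and becomes the head."""
        node.down, node.up = self.head, None
        if self.head is not None:
            self.head.up = node
        self.head = node

    def rollout(self, node):
        """Rollout: the leaving node has both neighbours re-link around it."""
        if node.up is not None:
            node.up.down = node.down
        else:
            self.head = node.down
        if node.down is not None:
            node.down.up = node.up
        node.up = node.down = None

    def purge(self, node):
        """Purging (invalidation): only the head may invalidate all other sharers."""
        if node is not self.head:
            raise ValueError("only the head node can issue a purge")
        node.down = None          # the head remains as a single-element list

# Example: build C -> B -> A, roll out B, then purge from the head.
lst, a, b, c = SharingList(), Node("A"), Node("B"), Node("C")
for n in (a, b, c):
    lst.construct(n)
lst.rollout(b)
lst.purge(c)
```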

SLIDE 27

Atoll

  • University research project
  • Should be very fast and very cheap
  • Keeps coming ’very soon now’
  • I have stopped waiting

SLIDE 28

Atoll

  • Grid architecture
  • 250 MB/sec bidirectional links
    – 9 bits wide
    – 250 MHz clock

SLIDE 29

Atoll

SLIDE 30

Atoll

SLIDE 31

Atoll

SLIDE 32

ServerNet-II

  • Supports 64-bit, 66 MHz PCI
  • Bidirectional links
    – 1.25 + 1.25 Gbit/sec
  • VIA compatible

SLIDE 33

ServerNet-II

SLIDE 34

ServerNet-II

SLIDE 35

InfiniBand

  • New standard
  • An extension of PCI-X
    – 1x = 2.5 Gbps
    – 4x = 10 Gbps (current standard)
    – 12x = 30 Gbps
    (signalling vs. usable data rate sketch below)
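
Not on the slide: the figures above are raw signalling rates. Original InfiniBand links use 8b/10b encoding, so roughly 80% of the signalling rate is usable data. A small sketch:

```python
# InfiniBand SDR signalling rates per link width; with 8b/10b encoding only
# 8 of every 10 bits carry payload, so usable data rate = 0.8 * signalling.
LANE_GBPS = 2.5
for width in (1, 4, 12):
    signalling = width * LANE_GBPS
    data = signalling * 8 / 10
    print(f"{width:2d}x: {signalling:5.1f} Gbps signalling, {data:5.1f} Gbps data")
```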

SLIDE 36

InfiniBand Price / Performance

                           InfiniBand      10GigE     GigE        Myrinet D   Myrinet E
                           (PCI-Express)
Data Bandwidth             950 MB/s        900 MB/s   100 MB/s    245 MB/s    495 MB/s
(Large Messages)
MPI Latency                5 us            50 us      50 us       6.5 us      5.7 us
(Small Messages)
HCA Cost (Street Price)    $550            $2K-$5K    Free        $535        $880
Switch Port                $250            $2K-$6K    $100-$300   $400        $400
Cable Cost (3m Street)     $100            $100       $25         $175        $175

  * Myrinet pricing data from Myricom web site (Dec 2004)
  ** InfiniBand pricing data based on Topspin avg. sales price (Dec 2004)
  *** Myrinet, GigE, and IB performance data from June 2004 OSU study
  Note: MPI latency is processor to processor; switch latency is less.
  (Cost per MB/s sketch below.)
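
A small sketch, not part of the original slide, folding the table into a rough cost per MB/s of large-message bandwidth (HCA + switch port + cable). Price ranges are taken at their midpoints and the "free" GigE NIC as $0; those midpoint choices are my assumption.

```python
# Rough $ per MB/s using the table above: (HCA + switch port + cable) / bandwidth.
tech = {
    #                 MB/s   HCA    port   cable  (prices in USD)
    "InfiniBand":   ( 950,   550,    250,   100),
    "10GigE":       ( 900,  3500,   4000,   100),
    "GigE":         ( 100,     0,    200,    25),
    "Myrinet D":    ( 245,   535,    400,   175),
    "Myrinet E":    ( 495,   880,    400,   175),
}
for name, (mb_s, hca, port, cable) in tech.items():
    print(f"{name:12s} about ${(hca + port + cable) / mb_s:6.2f} per MB/s")
```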

SLIDE 37

InfiniBand Cabling

  • CX4 copper (up to 15 m)
  • Flexible 30-gauge copper (up to 3 m)
  • Fiber optics (up to 150 m)

SLIDE 38

The InfiniBand Driver Architecture

[Diagram: InfiniBand driver stack, showing user- and kernel-level paths (BSD sockets, SDP, TCP/IP over IPoIB, uDAPL, NFS-RDMA, file system and SCSI via SRP/FCP) layered over the VERBS interface and the InfiniBand HCA, connecting through InfiniBand, Ethernet, and FC switches and gateways to LAN/WAN and SAN.]