Lecture 20: Networks and Communication, Abhinav Bhatele, Department of Computer Science

SLIDE 1

Lecture 20: Networks and Communication

Abhinav Bhatele, Department of Computer Science

Introduction to Parallel Computing (CMSC498X / CMSC818X)

SLIDE 2

Abhinav Bhatele (CMSC498X/CMSC818X)

Announcements

  • Assignment 3 posted online
      • Only for 818X students
      • Due on November 23
  • Quiz 2: November 12

SLIDE 3

High-speed interconnection networks

  • Supercomputers and HPC clusters are typically connected by low-latency, high-bandwidth networks
  • The connections between nodes form different topologies
  • Popular topologies:
      • Fat-tree: introduced by Charles Leiserson in 1985
      • Mesh and torus networks
      • Dragonfly networks

SLIDE 4

Network components

  • Network interface controller or card (NIC)
  • Router or switch
  • Network cables: copper or optical

SLIDE 5

N-dimensional mesh / torus networks

  • Each switch has a small number of nodes connected to it (typically 1)
  • Each switch has direct links to 2n switches, where n is the number of dimensions
  • Torus = mesh with wraparound links
  • Examples: IBM Blue Gene, Cray X* machines
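
The neighbor rule on this slide can be sketched in a few lines. This is an illustrative sketch, not code from the lecture: the function name `neighbors` and the coordinate-tuple representation are assumptions, and it assumes every dimension has at least 3 switches so that the two wraparound neighbors are distinct.

```python
def neighbors(coord, dims, wraparound):
    """Return the switches directly linked to `coord` in a mesh or torus.

    coord: tuple of per-dimension indices; dims: tuple of per-dimension sizes.
    """
    result = []
    for axis, size in enumerate(dims):
        for step in (-1, +1):
            pos = coord[axis] + step
            if wraparound:
                pos %= size              # torus: edges wrap to the other side
            elif not 0 <= pos < size:
                continue                 # mesh: no link past the boundary
            result.append(coord[:axis] + (pos,) + coord[axis + 1:])
    return result


dims = (4, 4, 4)      # hypothetical 3D network, n = 3
corner = (0, 0, 0)
print(len(neighbors(corner, dims, wraparound=True)))    # 6, i.e. 2n
print(len(neighbors(corner, dims, wraparound=False)))   # 3
```

In the torus every switch has 2n neighbors; in the mesh only interior switches do, and boundary switches have fewer.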

SLIDES 6-14

Fat-tree network

  • Router radix = k, Number of nodes on each router = k/2
  • A pod is a group of k/2 switches, Max. number of pods = k

[Figure, built up over several slides: compute nodes at the bottom, with switch levels 1, 2, and 3 above them]
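
The two bullets above are enough to estimate how large such a network can get. The sketch below is an assumption-laden illustration: it reads the "k/2 switches" in a pod as the level-1 switches that compute nodes attach to, and the function name `fat_tree_sizes` is hypothetical.

```python
def fat_tree_sizes(k):
    """Sizes of a fat-tree built from radix-k routers (k must be even)."""
    if k % 2:
        raise ValueError("router radix k must be even")
    nodes_per_router = k // 2    # from the slide
    pods = k                     # from the slide (maximum number of pods)
    level1_per_pod = k // 2      # assumption: the pod's k/2 switches host the nodes
    return {
        "pods": pods,
        "level1_switches": pods * level1_per_pod,
        "compute_nodes": pods * level1_per_pod * nodes_per_router,  # k^3 / 4
    }


print(fat_tree_sizes(4))
# {'pods': 4, 'level1_switches': 8, 'compute_nodes': 16}
```

Under these assumptions the node count grows as k^3 / 4, so radix-48 routers would support up to 27,648 compute nodes.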

SLIDE 15

Dragonfly network

  • Two-level hierarchical network using high-radix routers
  • Low network diameter

[Figure: one supernode in the PERCS topology, with links labeled LL, LR, and D]

SLIDES 16-20

Life-cycle of a message

  • Sources: message origin points; destination, frequency, size, etc. determined by the application (1 microsecond to 10s of seconds)
  • NIC: packetization and injection; delay: 100s of ns
  • Routers/switches: path-finding delay ~100 ns; temporary storage in buffers
  • Links: congestion points; traversal time: 1-50 ns
  • Routers/switches and NIC on the destination side
  • Destination: message destination points; application dependent (1 microsecond to 10s of seconds)
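
To see how these per-stage delays add up, here is a rough back-of-the-envelope model. It is only a sketch: the function name and the default values are illustrative picks from the ranges on the slide, and it ignores congestion, buffering delays, and the application-side time at the source and destination.

```python
def message_latency_ns(hops, injection_ns=300.0, router_ns=100.0, link_ns=25.0):
    """Rough one-way network latency of one packet on an uncongested path.

    hops: number of routers/switches traversed between the two NICs.
    """
    links = hops + 1    # NIC -> router, router -> router, ..., router -> NIC
    return injection_ns + hops * router_ns + links * link_ns


for hops in (1, 3, 6):
    print(hops, message_latency_ns(hops))
# 1 450.0
# 3 700.0
# 6 1075.0
```

With these numbers the network portion stays near a microsecond even at 6 hops, which is why low-diameter topologies and avoiding congestion on links matter more than raw link speed.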

SLIDES 21-26

Congestion due to network sharing

  • Sharing refers to network flows of different programs using the same hardware resources: links, switches
  • When multiple programs communicate on the network, they all suffer from congestion on shared links

[Figure legend: Program A, Program B, Switch/router]
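
One way to make "sharing" concrete is to list the links that carry traffic from more than one program. The flows, node names, and switch names below are made up for illustration; a real system would obtain paths from the routing tables or hardware counters.

```python
from collections import defaultdict

# Hypothetical flows: (program, bytes, path as a list of directed links).
flows = [
    ("A", 4000, [("n0", "s0"), ("s0", "s1"), ("s1", "n2")]),
    ("A", 1000, [("n1", "s0"), ("s0", "n0")]),
    ("B", 6000, [("n3", "s0"), ("s0", "s1"), ("s1", "n4")]),
]


def link_usage(flows):
    """Map each link to {program: bytes sent over it}."""
    usage = defaultdict(lambda: defaultdict(int))
    for program, nbytes, path in flows:
        for link in path:
            usage[link][program] += nbytes
    return usage


def shared_links(flows):
    """Links that carry traffic from more than one program."""
    return {link: dict(progs)
            for link, progs in link_usage(flows).items()
            if len(progs) > 1}


print(shared_links(flows))
# {('s0', 's1'): {'A': 4000, 'B': 6000}}
```

Here only the switch-to-switch link s0 -> s1 is shared, so that is where programs A and B would slow each other down.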

SLIDE 27

Routing algorithm

  • Decides how a packet is routed between a source and destination switch
  • Static routing: each router is pre-programmed with a routing table
      • The table can be changed at boot time
  • Dynamic routing: routing can change at runtime
      • Adaptive routing: adapts to network congestion
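
The difference between static and adaptive routing can be shown from the point of view of a single router. This is a toy sketch with hypothetical tables and port numbers; real routers implement these decisions in hardware and use richer congestion signals than a single queue depth.

```python
def static_route(table, dst):
    """Static routing: the output port is fixed by a pre-programmed table."""
    return table[dst]


def adaptive_route(candidates, queue_depth, dst):
    """Adaptive routing: among the ports that lead to dst, pick the least busy."""
    return min(candidates[dst], key=lambda port: queue_depth[port])


table = {"X": 0, "Y": 1}                   # destination switch -> port
candidates = {"X": [0, 2], "Y": [1, 2]}    # destination switch -> usable ports
queue_depth = {0: 7, 1: 0, 2: 3}           # packets waiting at each port

print(static_route(table, "X"))                        # 0
print(adaptive_route(candidates, queue_depth, "X"))    # 2
```

The static router keeps sending to port 0 even though 7 packets are queued there, while the adaptive one moves to port 2.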

SLIDE 28

Different approaches to mitigating congestion

  • Network topology-aware node allocation
  • Congestion or network flow-aware adaptive routing
  • Within a job: network topology-aware mapping of processes or chares to allocated nodes
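
For the third approach, mapping within a job, one simple way to compare two mappings is to weight each message by the number of hops it travels. The scoring function below is an assumption made for illustration (the slide does not prescribe a metric), and the 2x2 mesh allocation and ring traffic are hypothetical.

```python
def hops(a, b):
    """Manhattan distance between two nodes of a 2D mesh (no wraparound)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mapping_cost(traffic, mapping):
    """Sum of bytes x hops over all communicating process pairs."""
    return sum(nbytes * hops(mapping[p], mapping[q])
               for (p, q), nbytes in traffic.items())


# Hypothetical job: four processes communicating in a ring on a 2x2 allocation.
traffic = {(0, 1): 100, (1, 2): 100, (2, 3): 100, (3, 0): 100}
naive = {0: (0, 0), 1: (1, 1), 2: (0, 1), 3: (1, 0)}
aware = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}

print(mapping_cost(traffic, naive), mapping_cost(traffic, aware))   # 600 400
```

The topology-aware mapping places ring neighbors on adjacent nodes, so every message travels one hop and the total drops from 600 to 400.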

SLIDES 29-37

Topology-aware node allocation

Solution: allocate nodes in a manner that prevents sharing of links by multiple jobs while maintaining high utilization

SLIDES 38-46

AFAR: adaptive flow aware routing

Given: traffic for each pair of nodes in the system and the current routing

  • 1. Calculate current load (network traffic) on all links in system
  • 2. Find link with maximum load
  • 3. If maximum > threshold, re-route one flow crossing that link to an under-utilized link
  • 4. Repeat from 1. using new routing

Solution: dynamically re-route traffic to alleviate hot-spots

[Figure: example network with nodes A, B, C, D, E, F]
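
The four steps above can be sketched as a greedy loop. This is not the actual AFAR implementation: the per-flow list of candidate paths, the rule for choosing which flow to move, the iteration cap, and the example traffic are all assumptions made to keep the sketch small.

```python
from collections import defaultdict


def link_loads(flows, routing):
    """Step 1: total traffic on every link under the current routing."""
    loads = defaultdict(float)
    for fid, flow in flows.items():
        for link in flow["paths"][routing[fid]]:
            loads[link] += flow["bytes"]
    return loads


def afar_sketch(flows, routing, threshold, max_iters=100):
    """Greedy re-routing loop; returns a new {flow id: path index} mapping."""
    routing = dict(routing)
    for _ in range(max_iters):                      # step 4: repeat
        loads = link_loads(flows, routing)
        if not loads:
            break
        hot = max(loads, key=loads.get)             # step 2: most loaded link
        if loads[hot] <= threshold:                 # step 3: only act above threshold
            break
        best = None   # (worst load on the new path, flow id, path index)
        for fid, flow in flows.items():
            current = flow["paths"][routing[fid]]
            if hot not in current:
                continue
            for idx, path in enumerate(flow["paths"]):
                if idx == routing[fid] or hot in path:
                    continue
                # Load each link of the alternative would see after the move.
                worst = max(loads.get(l, 0.0) + (0 if l in current else flow["bytes"])
                            for l in path)
                if worst < loads[hot] and (best is None or worst < best[0]):
                    best = (worst, fid, idx)
        if best is None:
            break                                   # no move relieves the hot link
        routing[best[1]] = best[2]
    return routing


# Hypothetical traffic on the A-F example network: both flows start on link A->B.
flows = {
    "f1": {"bytes": 5, "paths": [[("A", "B"), ("B", "C")], [("A", "D"), ("D", "C")]]},
    "f2": {"bytes": 4, "paths": [[("A", "B"), ("B", "E")], [("A", "F"), ("F", "E")]]},
}
new_routing = afar_sketch(flows, {"f1": 0, "f2": 0}, threshold=6)
print(new_routing)                                        # {'f1': 0, 'f2': 1}
print(max(link_loads(flows, new_routing).values()))       # 5.0
```

Link A->B initially carries 9 units, above the threshold of 6, so the sketch moves f2 to its alternative path and the maximum link load drops to 5.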

SLIDE 47

Abhinav Bhatele
5218 Brendan Iribe Center (IRB) / College Park, MD 20742
phone: 301.405.4507 / e-mail: bhatele@cs.umd.edu