Network Layer (Routing) Recap: Why do we need a Network layer? - PowerPoint PPT Presentation


slide-1
SLIDE 1

Network Layer (Routing)

slide-2
SLIDE 2

Recap: Why do we need a Network layer?

  • Internetworking
  • Need to connect different link layer networks
  • Addressing
  • Need a globally unique way to “address” hosts
  • Routing and forwarding
  • Need to find and traverse paths between hosts

CSE 461 University of Washington 2

Now this

slide-3
SLIDE 3

Recap: Routing versus Forwarding

  • Forwarding is the process of sending a packet on its way
  • Routing is the process of deciding in which direction to send traffic

CSE 461 University of Washington 3


slide-4
SLIDE 4

Overview of Internet Routing and Forwarding

  • Hosts on same network have IPs in the same IP prefix
  • Hosts send off-network traffic to the gateway router
  • Routers discover routes to different prefixes (routing)
  • Routers use longest prefix matching to send packets to the right next hop (forwarding)

CSE 461 University of Washington 7

slide-5
SLIDE 5

Longest Prefix Matching

  • Prefixes in the forwarding table can overlap
  • Longest prefix matching forwarding rule:
  • For each packet, find the longest prefix that contains the destination address, i.e., the most specific entry
  • Forward the packet to the next hop router for that prefix

CSE 461 University of Washington 8

  Prefix          Next Hop
  0.0.0.0/0       A
  192.24.0.0/19   B
  192.24.12.0/22  C

slide-6
SLIDE 6

Longest Prefix Matching (2)

CSE 461 University of Washington 9

  Prefix          Next Hop
  192.24.0.0/19   D   (192.24.0.0 – 192.24.31.255)
  192.24.12.0/22  B   (192.24.12.0 – 192.24.15.255)

Which next hop for each IP address?

  192.24.6.0   → ?
  192.24.14.32 → ?
  192.24.54.0  → ?

(The /22 is the more specific entry.)
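The rule above can be sketched with Python's standard-library `ipaddress` module. This is a simplified linear scan over the table on this slide (real routers use tries or TCAM hardware instead), so treat it as an illustration of the rule, not a router implementation:

```python
import ipaddress

# Forwarding table from the slide: two overlapping prefixes
TABLE = [
    (ipaddress.ip_network("192.24.0.0/19"), "D"),
    (ipaddress.ip_network("192.24.12.0/22"), "B"),
]

def lookup(dest):
    """Longest prefix match: among entries containing dest, pick the most specific."""
    dest = ipaddress.ip_address(dest)
    matches = [(net, hop) for net, hop in TABLE if dest in net]
    if not matches:
        return None  # no route; a real table would fall back to a default (/0) entry
    # longest prefix = most specific = largest prefix length
    return max(matches, key=lambda m: m[0].prefixlen)[1]
```

Running the three addresses from the slide: 192.24.6.0 matches only the /19, so it goes to D; 192.24.14.32 matches both entries and the more specific /22 wins, so it goes to B; 192.24.54.0 matches neither prefix and would need a default route.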

slide-7
SLIDE 7

Flexibility of Longest Prefix Matching

  • Can provide default behavior, with less-specific prefixes
  • Send traffic going outside an organization to a border router (gateway)
  • Can special-case behavior, with more-specific prefixes
  • For performance, economics, security, …

CSE 461 University of Washington 10

slide-8
SLIDE 8

Performance of Longest Prefix Matching

  • Uses hierarchy for a compact table
  • Relies on use of large prefixes
  • Lookup is more complex than a simple table lookup
  • Used to be a concern for fast routers
  • Not an issue in practice these days

CSE 461 University of Washington 11

slide-9
SLIDE 9

Goals of Routing Algorithms

  • We want several properties of any routing scheme:

CSE 461 University of Washington 12

  Property          Meaning
  Correctness       Finds paths that work
  Efficient paths   Uses network bandwidth well
  Fair paths        Doesn’t starve any nodes
  Fast convergence  Recovers quickly after changes
  Scalability       Works well as network grows large

slide-10
SLIDE 10

Rules of Fully Distributed Routing

  • All nodes are alike; no controller
  • Nodes learn by exchanging messages with neighbors
  • Nodes operate concurrently
  • There may be node/link/message failures

CSE 461 University of Washington 13

Who’s there?

slide-11
SLIDE 11

Simple routing that obeys the rules

  • Send out routes for hosts you have paths to
  • And the routes they’ve sent you

CSE 461 University of Washington 14

[Diagram: routers exchange reachability messages, e.g., E advertises paths to A, B, and E]

  • This works
  • All routers find a path to all hosts
  • But scales poorly!
slide-12
SLIDE 12

CSE 461 University of Washington 15

Internet Growth

  • Over a billion Internet hosts and growing …

slide-13
SLIDE 13

Impact of Routing Growth

  • 1. Forwarding tables grow
  • Larger router memories, may increase lookup time
  • 2. Routing messages grow
  • Need to keep all nodes informed of larger topology
  • 3. Routing computation grows
  • Shortest path calculations grow faster than the network

CSE 461 University of Washington 16

slide-14
SLIDE 14

Techniques to Scale Routing

  • First: Network hierarchy
  • Route to network regions
  • Next: IP prefix aggregation
  • Combine, and split, prefixes

CSE 461 University of Washington 17

slide-15
SLIDE 15

Scaling Idea 1: Hierarchical Routing

slide-16
SLIDE 16

Idea

  • Scale routing using hierarchy with regions
  • Route to regions, not individual nodes

CSE 461 University of Washington 19

[Diagram: a node forwards “To the West!”; the destination lies in the West region]

slide-17
SLIDE 17

Hierarchical Routing

  • Introduce a larger routing unit
  • IP prefix (hosts) ← as before
  • Region, e.g., ISP network
  • Route first to the region, then to the IP prefix within the region
  • Hide details within a region from outside of the region

CSE 461 University of Washington 20

slide-18
SLIDE 18

Hierarchical Routing (2)

CSE 461 University of Washington 21

slide-19
SLIDE 19

Hierarchical Routing (3)

CSE 461 University of Washington 22

slide-20
SLIDE 20

Hierarchical Routing (4)

  • Penalty is longer paths

CSE 461 University of Washington 23

1C is best route to region 5, except for destination 5C

slide-21
SLIDE 21

Observations

  • Outside a region, nodes have one route to all hosts within the region
  • This gives savings in table size, messages and computation
  • However, each node may have a different route to an outside region
  • Routing decisions are still made by individual nodes; there is no single decision made by a region

CSE 461 University of Washington 24

slide-22
SLIDE 22

Scaling Idea 2: IP Prefix Aggregation and Subnets

slide-23
SLIDE 23

Idea

  • Scale routing by adjusting the size of IP prefixes
  • Split (subnets) and join (aggregation)

CSE 461 University of Washington 26

[Diagram: a region advertises itself as one IP /16 (“I’m the whole region”); internally it contains prefixes IP1 /19, IP2 /18, and IP3 /17]

slide-24
SLIDE 24

Recall

  • IP addresses are allocated in blocks called IP prefixes, e.g., 18.31.0.0/16
  • Hosts on one network in same prefix
  • “/N” prefix has the first N bits fixed and contains 2^(32−N) addresses
  • E.g., a “/24” has 256 addresses
  • Routers keep track of prefix lengths
  • Use it as part of longest prefix matching

27

Routers can change prefix lengths without affecting hosts

slide-25
SLIDE 25

Prefixes and Hierarchy

  • IP prefixes help to scale routing, but can go further
  • Use a less specific (larger) IP prefix as a name for a region

CSE 461 University of Washington 28

[Diagram: a region advertises itself as one IP /16 (“I’m the whole region”); internally it contains prefixes IP1 /19, IP2 /18, and IP3 /17]

slide-26
SLIDE 26

Subnets and Aggregation

  • Two use cases for adjusting the size of IP prefixes; both reduce routing table size
  • 1. Subnets
  • Internally split one large prefix into multiple smaller ones
  • 2. Aggregation
  • Join multiple smaller prefixes into one large prefix

CSE 461 University of Washington 29
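Both operations can be tried directly with Python's standard-library `ipaddress` module; the /16 block below is an illustrative choice, not one from the slides:

```python
import ipaddress

# Subnetting: split one large prefix into smaller ones (done inside a network).
block = ipaddress.ip_network("192.168.0.0/16")    # hypothetical allocation
subnets = list(block.subnets(prefixlen_diff=2))   # four /18 subnets

# Aggregation: join adjacent smaller prefixes back into one large prefix.
rejoined = list(ipaddress.collapse_addresses(subnets))
```

The rest of the Internet only ever needs the single rejoined /16, no matter how the subnets are carved up internally.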

slide-27
SLIDE 27

Subnets

  • Internally split up one IP prefix

[Diagram: a company splits its 32K-address block internally into 16K, 8K, and 4K subnets; one prefix is sent to the rest of the Internet]

slide-28
SLIDE 28

Aggregation

  • Externally join multiple separate IP prefixes

[Diagram: an ISP joins customers’ separate prefixes; one prefix is sent to the rest of the Internet]

slide-29
SLIDE 29

Routing Process

  • 1. Ship these prefixes or regions around to nearby routers
  • 2. Receive multiple prefixes and the paths by which you got them
  • 3. Build a global routing table
slide-30
SLIDE 30

CSE 461 University of Washington 33

Internet Routing Growth

Source: bgp.potaroo.net

slide-31
SLIDE 31

Finding “Best” Paths

slide-32
SLIDE 32

CSE 461 University of Washington 35

What are “Best” paths anyhow?

  • Many possibilities:
  • Latency, avoid circuitous paths
  • Bandwidth, avoid slow links
  • Money, avoid expensive links
  • Hops, to reduce switching
  • But these only consider topology
  • Ignore workload, e.g., hotspots

slide-33
SLIDE 33

Shortest Paths

We’ll approximate “best” by a cost function that captures the factors

  • Often called “least cost” or “shortest”
  • 1. Assign each link a cost (distance)
  • 2. Define best path between each pair of nodes as the path that has the least total cost
  • 3. Pick randomly to break any ties

CSE 461 University of Washington 36

slide-34
SLIDE 34

CSE 461 University of Washington 37

Shortest Paths (2)

  • Find the shortest path A → E
  • All links are bidirectional, with equal costs in each direction
  • Can extend model to unequal costs if needed

[Graph: nodes A–H with link costs, e.g., AB = 4, BC = 2, CE = 1, AE = 10]

slide-35
SLIDE 35

CSE 461 University of Washington 38

Shortest Paths (3)

  • ABCE is a shortest path
  • cost(ABCE) = 4 + 2 + 1 = 7
  • It is shorter than:
  • cost(ABE) = 8
  • cost(ABFE) = 9
  • cost(AE) = 10
  • cost(ABCDE) = 10


slide-36
SLIDE 36

CSE 461 University of Washington 39

Shortest Paths (4)

  • Optimality property:
  • Subpaths of shortest paths are

also shortest paths

  • ABCE is a shortest path → so are ABC, AB, BCE, BC, CE


slide-37
SLIDE 37

CSE 461 University of Washington 40

Sink Trees

  • Sink tree for a destination is the union of all shortest paths towards the destination
  • Similarly source tree
  • Find the sink tree for E


slide-38
SLIDE 38

CSE 461 University of Washington 41

Sink Trees (2)

  • Implications:
  • Only need to use destination to follow shortest paths
  • Each node only needs to send to the next hop
  • Forwarding table at a node lists next hop for each destination
  • Routing table may know more


slide-39
SLIDE 39

Distance Vector Routing

slide-40
SLIDE 40

Distance Vector Routing

  • Simple, early routing approach
  • Used in ARPANET, and RIP
  • One of two main approaches to routing
  • Distributed version of Bellman-Ford
  • Works, but very slow convergence after some failures
  • Link-state algorithms are now typically used in practice
  • More involved, better behavior

CSE 461 University of Washington 43

slide-41
SLIDE 41

Distance Vector Setting

Each node computes its forwarding table in a distributed setting:

1. Nodes know only the cost to their neighbors; not topology
2. Nodes can talk only to their neighbors using messages
3. All nodes run the same algorithm concurrently
4. Nodes and links may fail, messages may be lost

CSE 461 University of Washington 44

slide-42
SLIDE 42

Distance Vector Algorithm

Each node maintains a vector of (distance, next hop) to all destinations

1. Initialize vector with 0 (zero) cost to self, ∞ (infinity) to other destinations
2. Periodically send vector to neighbors
3. Update vector for each destination by selecting the shortest distance heard, after adding cost of neighbor link
4. Use the best neighbor for forwarding

CSE 461 University of Washington 45
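The four steps above can be sketched as one synchronous exchange round in Python, run on a hypothetical three-node chain A–B–C (link costs 1 and 2); real DV implementations exchange vectors asynchronously and also track next hops:

```python
INF = float("inf")

def init_vectors(graph):
    """Step 1: cost 0 to self, infinity to every other destination."""
    return {n: {d: (0 if d == n else INF) for d in graph} for n in graph}

def dv_round(vectors, graph):
    """Steps 2-3: every node hears each neighbor's vector and keeps the
    shortest distance heard, after adding the cost of the neighbor link."""
    new = {n: dict(v) for n, v in vectors.items()}
    for n in graph:
        for nbr, link in graph[n].items():
            for dest in graph:
                if link + vectors[nbr][dest] < new[n][dest]:
                    new[n][dest] = link + vectors[nbr][dest]
    return new

# Hypothetical chain: A --1-- B --2-- C
graph = {"A": {"B": 1}, "B": {"A": 1, "C": 2}, "C": {"B": 2}}
v = init_vectors(graph)
v = dv_round(v, graph)  # learn 1-hop routes
v = dv_round(v, graph)  # learn 2-hop routes: A now reaches C at cost 1 + 2
```

News travels one hop per exchange: after the first round A only knows B; only after the second round does A learn a route to C.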

slide-43
SLIDE 43

Distance Vector (2)

  • Consider from the point of view of node A
  • Can only talk to nodes B and E

CSE 461 University of Washington 46


A’s initial vector:

  To  Cost
  A   0
  B   ∞
  C   ∞
  D   ∞
  E   ∞
  F   ∞
  G   ∞
  H   ∞

slide-44
SLIDE 44

Distance Vector (3)

  • First exchange with B, E; learn best 1-hop routes

47


A’s vector after the first exchange:

  To  A’s Cost  A’s Next
  B   4         B
  E   10        E
  (all other destinations still ∞)

Learned better routes to B and E.

slide-45
SLIDE 45

Distance Vector (4)

  • Second exchange; learn best 2-hop routes

CSE 461 University of Washington 48


A’s vector after the second exchange:

  To  A’s Cost  A’s Next
  B   4         B
  C   6         B
  D   12        E
  E   8         B
  F   7         B
  G   7         B
  H   ∞         –

slide-46
SLIDE 46

Distance Vector (4)

  • Third exchange; learn best 3-hop routes

CSE 461 University of Washington 49


A’s vector after the third exchange:

  To  A’s Cost  A’s Next
  B   4         B
  C   6         B
  D   8         B
  E   7         B
  F   7         B
  G   7         B
  H   9         B

slide-47
SLIDE 47

Distance Vector (5)

  • Subsequent exchanges; converged

CSE 461 University of Washington 50


A’s vector after convergence (no further changes):

  To  A’s Cost  A’s Next
  B   4         B
  C   6         B
  D   8         B
  E   7         B
  F   7         B
  G   7         B
  H   9         B

slide-48
SLIDE 48

Distance Vector Dynamics

  • Adding routes:
  • News travels one hop per exchange
  • Removing routes:
  • When a node fails, no more exchanges, other nodes forget

CSE 461 University of Washington 51

Problem?

slide-49
SLIDE 49

Count to Infinity: Problem

  • Good news travels quickly, bad news slowly

CSE 461 University of Washington 52

[Diagram: desired convergence vs. the “count to infinity” scenario after link X fails]

slide-50
SLIDE 50

Count to Infinity: Heuristics

  • “Split horizon”
  • Don’t send route back to where you learned it from.
  • Poison reverse
  • Send “infinity” when you notice a disconnect

CSE 461 University of Washington 53


slide-51
SLIDE 51

Count to Infinity: Heuristics (2)

  • Neither split horizon nor poison reverse is very effective in practice
  • Link state is now favored except when resource-limited

CSE 461 University of Washington 54

slide-52
SLIDE 52

RIP (Routing Information Protocol)

  • DV protocol with hop count as metric
  • Infinity is 16 hops; limits network size
  • Includes split horizon, poison reverse
  • Routers send vectors every 30 seconds
  • Runs on top of UDP
  • Time-out in 180 secs to detect failures
  • RIPv1 specified in RFC1058 (1988)

CSE 461 University of Washington 55

slide-53
SLIDE 53

Link-State Routing

slide-54
SLIDE 54

Link-State Routing

  • Second broad class of routing algorithms
  • More computation than DV but better dynamics
  • Widely used in practice
  • Used in Internet/ARPANET from 1979
  • Modern networks use OSPF (L3) and IS-IS (L2)

CSE 461 University of Washington 57

slide-55
SLIDE 55

Link-State Setting

Same distributed setting as for distance vector:

1. Nodes know only the cost to their neighbors; not topology
2. Nodes can talk only to their neighbors using messages
3. All nodes run the same algorithm concurrently
4. Nodes/links may fail, messages may be lost

CSE 461 University of Washington 58

slide-56
SLIDE 56

Link-State Algorithm

Proceeds in two phases:

  • 1. Nodes flood topology with link state packets
  • Each node learns full topology
  • 2. Each node computes its own forwarding table
  • By running Dijkstra (or equivalent)

CSE 461 University of Washington 59

slide-57
SLIDE 57

Part 1: Flood Routing

slide-58
SLIDE 58

Flooding

  • Rule used at each node:
  • Send an incoming message on to all other neighbors
  • Remember the message so that it is only flooded once

CSE 461 University of Washington 61

slide-59
SLIDE 59

Flooding (2)

  • Consider a flood from A; first reaches B via AB, E via AE

CSE 461 University of Washington 62


slide-60
SLIDE 60

Flooding (3)

  • Next B floods BC, BE, BF, BG, and E floods EB, EC, ED, EF

CSE 461 University of Washington 63


E and B send to each other

slide-61
SLIDE 61

Flooding (4)

  • C floods CD, CH; D floods DC; F floods FG; G floods GF

64


F gets another copy

slide-62
SLIDE 62

Flooding (5)

  • H has no-one to flood … and we’re done

CSE 461 University of Washington 65


Each link carries the message in at least one direction

slide-63
SLIDE 63

Flooding Details

  • Remember message (to stop flood) using source and sequence number
  • So next message (with higher sequence) will go through
  • To make flooding reliable, use ARQ
  • So receiver acknowledges, and sender resends if needed

CSE 461 University of Washington 66

Problem?
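The flood-once rule with the source/sequence-number check might look like this sketch (link and node names are illustrative; the ARQ reliability layer is omitted):

```python
def make_flooder():
    """Per-router flooding state: remember the highest sequence number
    seen from each source, and forward a message only the first time."""
    highest_seen = {}

    def handle(source, seq, arrival_link, all_links):
        if seq <= highest_seen.get(source, -1):
            return []  # duplicate (or older) flood: drop it
        highest_seen[source] = seq
        # forward on every link except the one the message arrived on
        return [link for link in all_links if link != arrival_link]

    return handle
```

Usage: the first copy of (source A, seq 1) is forwarded on all other links, a second copy arriving later is dropped, and a later message with seq 2 goes through again.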

slide-64
SLIDE 64

Flooding Problem

  • F receives the same message multiple times

CSE 461 University of Washington 67


E and B send to each other too

slide-65
SLIDE 65

Part 2: Dijkstra’s Algorithm

slide-66
SLIDE 66

CSE 461 University of Washington 69

Edsger W. Dijkstra (1930-2002)

  • Famous computer scientist
  • Programming languages
  • Distributed algorithms
  • Program verification
  • Dijkstra’s algorithm, 1959
  • Single-source shortest paths, given network with non-negative link costs

By Hamilton Richards, CC-BY-SA-3.0, via Wikimedia Commons

slide-67
SLIDE 67

Dijkstra’s Algorithm

Algorithm:

  • Mark all nodes tentative; set distance to 0 (zero) for the source and ∞ (infinity) for all other nodes
  • While tentative nodes remain:
  • Extract N, a node with lowest distance
  • Add link to N to the shortest path tree
  • Relax the distances of neighbors of N by lowering any better distance estimates

CSE 461 University of Washington 70
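The steps above translate directly to Python, with a binary heap (`heapq`) standing in for “extract a tentative node with the lowest distance.” The link costs below are reconstructed from the worked examples on nearby slides (e.g., cost(ABCE) = 4 + 2 + 1 = 7), so treat the graph as an inferred approximation of the lecture’s figure:

```python
import heapq

# Nodes A-H; link costs inferred from the slides' worked path costs.
GRAPH = {
    "A": {"B": 4, "E": 10},
    "B": {"A": 4, "C": 2, "E": 4, "F": 3, "G": 3},
    "C": {"B": 2, "D": 2, "E": 1, "H": 3},
    "D": {"C": 2, "E": 2},
    "E": {"A": 10, "B": 4, "C": 1, "D": 2, "F": 2},
    "F": {"B": 3, "E": 2, "G": 4},
    "G": {"B": 3, "F": 4},
    "H": {"C": 3},
}

def dijkstra(graph, source):
    """Return (distance, previous-node) maps for shortest paths from source."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; node was already finalized cheaper
        for nbr, cost in graph[node].items():
            nd = d + cost  # relax: can we improve nbr's distance through node?
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return dist, prev
```

From A this yields the distances worked out on the following slides: B = 4, C = 6, E = 7, F = 7, G = 7, D = 8, H = 9, with E reached via C (the path ABCE).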

slide-68
SLIDE 68

Dijkstra’s Algorithm (2)

  • Initialization

CSE 461 University of Washington 71

[Graph: all nodes marked tentative at distance ∞; A, the source, is at distance 0]

We’ll compute shortest paths from A

slide-69
SLIDE 69

Dijkstra’s Algorithm (3)

  • Relax around A

CSE 461 University of Washington 72

[Graph: after relaxing around A, B = 4 and E = 10; all other nodes remain ∞]

slide-70
SLIDE 70

Dijkstra’s Algorithm (4)

  • Relax around B

CSE 461 University of Washington 73

[Graph: after relaxing around B: C = 6, F = 7, G = 7; E’s distance fell from 10 to 8]

slide-71
SLIDE 71

Dijkstra’s Algorithm (5)

  • Relax around C

CSE 461 University of Washington 74

[Graph: after relaxing around C: D = 8, H = 9; E’s distance fell again, from 8 to 7]

slide-72
SLIDE 72

Dijkstra’s Algorithm (6)

  • Relax around G (say)

CSE 461 University of Washington 75

[Graph: relaxing around G lowers nothing; distances didn’t fall]

slide-73
SLIDE 73

Dijkstra’s Algorithm (7)

  • Relax around F (say)

CSE 461 University of Washington 76

[Graph: relaxing around F has no effect]

slide-74
SLIDE 74

Dijkstra’s Algorithm (8)

  • Relax around E

CSE 461 University of Washington 77

[Graph: distances now final: B = 4, C = 6, E = 7, F = 7, G = 7, D = 8, H = 9]

slide-75
SLIDE 75

Dijkstra’s Algorithm (9)

  • Relax around D

CSE 461 University of Washington 78


slide-76
SLIDE 76

Dijkstra’s Algorithm (10)

  • Finally, H … done

CSE 461 University of Washington 79


slide-77
SLIDE 77

Dijkstra Comments

  • Finds shortest paths in order of increasing distance from source
  • Leverages optimality property
  • Runtime depends on cost of extracting min-cost node
  • Superlinear in network size (grows fast)
  • Using Fibonacci heaps the complexity is O(|E| + |V| log |V|)

  • Gives complete source/sink tree
  • More than needed for forwarding!
  • But requires complete topology

CSE 461 University of Washington 80

slide-78
SLIDE 78

Bringing it all together…

slide-79
SLIDE 79

CSE 461 University of Washington 82

Phase 1: Topology Dissemination

  • Each node floods a link state packet (LSP) that describes its portion of the topology

Node E’s LSP (flooded to A, B, C, D, and F):

  Seq. #
  Neighbor  Cost
  A         10
  B         4
  C         1
  D         2
  F         2

slide-80
SLIDE 80

Phase 2: Route Computation

  • Each node has full topology
  • By combining all LSPs
  • Each node simply runs Dijkstra
  • Replicated computation, but finds required routes directly
  • Compile forwarding table from sink/source tree
  • That’s it folks!

CSE 461 University of Washington 83

slide-81
SLIDE 81

Forwarding Table

CSE 461 University of Washington 84

E’s Forwarding Table (compiled from E’s source tree, via Dijkstra):

  To  Next
  A   C
  B   C
  C   C
  D   D
  E   –
  F   F
  G   F
  H   C

slide-82
SLIDE 82

Handling Changes

  • On change, flood updated LSPs, re-compute routes
  • E.g., nodes adjacent to failed link or node initiate

CSE 461 University of Washington 85

[Diagram: the link between B and F fails; B and F each flood an updated LSP with a new sequence number that omits the failed link]

slide-83
SLIDE 83

Handling Changes (2)

  • Link failure
  • Both nodes notice, send updated LSPs
  • Link is removed from topology
  • Node failure
  • All neighbors notice a link has failed
  • Failed node can’t update its own LSP
  • But it is OK: all links to node removed

CSE 461 University of Washington 86

slide-84
SLIDE 84

Handling Changes (3)

  • Addition of a link or node
  • Add LSP of new node to topology
  • Old LSPs are updated with new link
  • Additions are the easy case …

CSE 461 University of Washington 87

slide-85
SLIDE 85

Link-State Complications

  • Things that can go wrong:
  • Seq. number reaches max, or is corrupted
  • Node crashes and loses seq. number
  • Network partitions then heals
  • Strategy:
  • Include age on LSPs and forget old information that is not refreshed

  • Much of the complexity is due to handling corner cases

CSE 461 University of Washington 88

slide-86
SLIDE 86

DV/LS Comparison

CSE 461 University of Washington 89

  Goal              Distance Vector              Link-State
  Correctness       Distributed Bellman-Ford     Replicated Dijkstra
  Efficient paths   Approx. with shortest paths  Approx. with shortest paths
  Fair paths        Approx. with shortest paths  Approx. with shortest paths
  Fast convergence  Slow – many exchanges        Fast – flood and compute
  Scalability       Excellent – storage/compute  Moderate – storage/compute

slide-87
SLIDE 87

IS-IS and OSPF Protocols

  • Widely used in large enterprise and ISP networks
  • IS-IS = Intermediate System to Intermediate System
  • OSPF = Open Shortest Path First
  • Link-state protocol with many added features
  • E.g., “Areas” for scalability

CSE 461 University of Washington 90

slide-88
SLIDE 88

Equal-Cost Multi-Path Routing

slide-89
SLIDE 89

Multipath Routing

  • Allow multiple routing paths from node to destination to be used at once
  • Topology has them for redundancy
  • Using them can improve performance
  • Questions:
  • How do we find multiple paths?
  • How do we send traffic along them?

CSE 461 University of Washington 92

slide-90
SLIDE 90

CSE 461 University of Washington 93

Equal-Cost Multipath Routes

  • One form of multipath routing
  • Extends shortest path model by keeping the set of paths if there are ties
  • Consider A → E
  • ABE = 4 + 4 = 8
  • ABCE = 4 + 2 + 2 = 8
  • ABCDE = 4 + 2 + 1 + 1 = 8
  • Use them all!


slide-91
SLIDE 91

Source “Trees”

  • With ECMP, source/sink “tree” is a directed acyclic graph (DAG)
  • Each node has set of next hops
  • Still a compact representation

CSE 461 University of Washington 94

[Diagram: a tree (single next hop per node) vs. a DAG (set of next hops)]

slide-92
SLIDE 92

CSE 461 University of Washington 95

Source “Trees” (2)

  • Find the source “tree” for E
  • Procedure is Dijkstra, simply remember set of next hops
  • Compile forwarding table similarly, may have set of next hops
  • Straightforward to extend DV too
  • Just remember set of neighbors


slide-93
SLIDE 93

Source “Trees” (3)

CSE 461 University of Washington 96

E’s Forwarding Table (from E’s source “tree”):

  Node  Next hops
  A     B, C, D
  B     B, C, D
  C     C, D
  D     D
  E     –
  F     F
  G     F
  H     C, D   ← new for ECMP

slide-94
SLIDE 94

Forwarding with ECMP

  • Could randomly pick a next hop for each packet

based on destination

  • Balances load, but adds jitter
  • Instead, try to send packets from a given

source/destination pair on the same path

  • Source/destination pair is called a flow
  • Map flow identifier to single next hop
  • No jitter within flow, but less balanced

CSE 461 University of Washington 97
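Mapping a flow identifier to a single next hop is usually done with a hash. A minimal sketch (hashing only the source/destination pair, whereas real routers typically hash the full 5-tuple; names are illustrative):

```python
import hashlib

def pick_next_hop(src, dst, next_hops):
    """Pick one of the equal-cost next hops, deterministically per flow,
    so every packet of a flow follows the same path (no intra-flow jitter)."""
    flow_id = f"{src}->{dst}".encode()
    digest = int(hashlib.sha256(flow_id).hexdigest(), 16)
    return sorted(next_hops)[digest % len(next_hops)]
```

Every call for the same flow returns the same member of the next-hop set, while different flows may hash to different hops, spreading load across the equal-cost paths.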

slide-95
SLIDE 95

Forwarding with ECMP (2)

CSE 461 University of Washington 98

E’s Forwarding Choices (multipath routes from F/E to C/H):

  Flow    Possible next hops  Example choice
  F → H   C, D                D
  F → C   C, D                D
  E → H   C, D                C
  E → C   C, D                C

Use both paths to get to one destination

slide-96
SLIDE 96

Border Gateway Protocol (BGP)

slide-97
SLIDE 97

Structure of the Internet

  • Networks (ISPs, CDNs, etc.) group hosts with IP prefixes
  • Networks are richly interconnected, often using IXPs

[Diagram: ISP A (prefixes A1, A2), ISP B (prefix B1), CDN C (prefix C1), CDN D (prefix D1), Net E (prefixes E1, E2), and Net F (prefix F1), richly interconnected through IXPs]

slide-98
SLIDE 98

Internet-wide Routing Issues

  • Two problems beyond routing within a network
  • 1. Scaling to very large networks
  • Techniques of IP prefixes, hierarchy, prefix aggregation
  • 2. Incorporating policy decisions
  • Letting different parties choose their routes to suit their own needs

CSE 461 University of Washington 101

Yikes!

slide-99
SLIDE 99

CSE 461 University of Washington 102

Effects of Independent Parties

  • Each party selects routes to suit its own interests
  • e.g., shortest path within the ISP
  • What path will be chosen for A2 → B1 and B1 → A2?
  • What is the best path?

[Diagram: ISP A (prefixes A1, A2) interconnected with ISP B (prefixes B1, B2)]

slide-100
SLIDE 100

CSE 461 University of Washington 103

Effects of Independent Parties (2)

  • Selected paths are longer than overall shortest path
  • And asymmetric too!
  • Consequence of independent goals and decisions, not hierarchy


slide-101
SLIDE 101

Routing Policies

  • Capture the goals of different parties
  • Could be anything
  • E.g., Internet2 only carries non-commercial traffic
  • Common policies we’ll look at:
  • ISPs give TRANSIT service to customers
  • ISPs give PEER service to each other

CSE 461 University of Washington 104

slide-102
SLIDE 102

CSE 461 University of Washington 105

Routing Policies – Transit

  • One party (customer) gets TRANSIT service from another party (ISP)
  • ISP accepts traffic for customer from the rest of Internet
  • ISP sends traffic from customer to the rest of Internet
  • Customer pays ISP for the privilege

[Diagram: an ISP carries traffic between its customers (Customer 1, Customer 2) and the rest of the Internet; a non-customer is not served]

slide-103
SLIDE 103

CSE 461 University of Washington 106

Routing Policies – Peer

  • Both parties (ISPs in this example) get PEER service from each other
  • Each ISP accepts traffic from the other ISP only for their customers
  • ISPs do not carry traffic to the rest of the Internet for each other
  • ISPs don’t pay each other

[Diagram: ISP A (customers A1, A2) peers with ISP B (customers B1, B2)]

slide-104
SLIDE 104

Routing with BGP

  • iBGP is for internal routing
  • eBGP is interdomain routing for the Internet
  • Path vector, a kind of distance vector

107

[Diagram: ISP B announces “Prefix F1 via ISP B, Net F at IXP” to ISP A; ISP A holds prefixes A1 and A2, Net F holds prefix F1]

slide-105
SLIDE 105

Routing with BGP (2)

  • Parties like ISPs are called ASes (Autonomous Systems)
  • AS numbers are unique identifiers
  • ASes configure their internal BGP routes
  • External routes go through complicated filters
  • Intra-AS BGP routers communicate to keep consistent routing information

CSE 461 University of Washington 108

slide-106
SLIDE 106

Routing with BGP (3)

  • Border routers of ASes announce BGP routes
  • Route announcements have IP prefix, path vector, next hop
  • Path vector is the list of ASes on the way to the prefix
  • The list is used to detect loops
  • Route announcements move in the opposite direction to traffic

CSE 461 University of Washington 109
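The loop detection that the path vector enables can be sketched as follows (tuple layout and names are illustrative; real BGP carries these as the AS_PATH and NEXT_HOP attributes, and import policy is far richer than a single check):

```python
def process_announcement(my_asn, announcement):
    """Accept a BGP-style announcement unless our AS number already appears
    in the path vector (which would mean a routing loop); on acceptance,
    prepend our AS number so downstream neighbors can run the same check."""
    prefix, as_path, next_hop = announcement
    if my_asn in as_path:
        return None  # loop detected: drop the route
    return (prefix, [my_asn] + as_path, next_hop)
```

For example, AS 7 accepts a route whose path is [2, 3] and re-announces it as [7, 2, 3]; if that announcement ever came back around to AS 3, AS 3 would see itself in the path and discard it.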

slide-107
SLIDE 107

Routing with BGP (4)

CSE 461 University of Washington 110


slide-108
SLIDE 108

Routing with BGP (5)

Policy is implemented in two ways:

  • 1. Border routers of ISP announce paths only to other parties who may use those paths
  • Filter out paths others can’t use
  • 2. Border routers select the best path of the ones they hear in any way (not necessarily shortest)

CSE 461 University of Washington 111

slide-109
SLIDE 109

Routing with BGP (6)

  • TRANSIT: AS1 says [B, (AS1, AS3)], [C, (AS1, AS4)] to AS2

CSE 461 University of Washington 112

slide-110
SLIDE 110

Routing with BGP (7)

  • CUSTOMER (other side of TRANSIT): AS2 says [A, (AS2)] to AS1

CSE 461 University of Washington 113

slide-111
SLIDE 111

Routing with BGP (8)

  • PEER: AS2 says [A, (AS2)] to AS3, AS3 says [B, (AS3)] to AS2

CSE 461 University of Washington 114

slide-112
SLIDE 112

Routing with BGP (9)

  • AS2 has two routes to B (AS1, AS3) and chooses AS3 (Free!)

CSE 461 University of Washington 115

slide-113
SLIDE 113
slide-114
SLIDE 114
slide-115
SLIDE 115

BGP Thoughts

  • Much more beyond basics to explore!
  • Policy is a substantial factor
  • Can independent decisions be sensible overall?
  • Other important factors:
  • Convergence effects
  • How well it scales
  • Integration with intradomain routing
  • And more …

CSE 461 University of Washington 118

slide-116
SLIDE 116

BGP “bad gadget”: Non-convergence

Each node’s route preferences (most preferred first):

  [3, 0] > [0] > [3, 1, 0]
  [2, 0] > [0] > [2, 3, 0]
  [1, 0] > [0] > [1, 2, 0]

slide-117
SLIDE 117

BGP slow convergence

[Diagram: nodes 1–4 route to 0 via node 1; when the link to 0 fails (x), nodes 2, 3, and 4 still hold and advertise stale paths such as [2, 1, 0], [3, 1, 0], and [4, 1, 0]]

slide-118
SLIDE 118

BGP slow convergence

[Diagram: the stale paths are withdrawn one exchange at a time, with nodes re-announcing paths like [2, 1, 0], [3, 1, 0], and [4, 1, 0] to each other]

slide-119
SLIDE 119

BGP slow convergence

[Diagram: still longer stale paths appear, e.g., [3, 4, 1, 0], [2, 3, 1, 0], [4, 2, 1, 0], before the routes are finally withdrawn]

slide-120
SLIDE 120

Cellular Routing

slide-121
SLIDE 121

Addressing in Cellular

  • Everyone has a unique physical identifier: the SIM card
  • IMSI: International Mobile Subscriber Identity
  • Has an associated mobile provider
  • The phone number is not present; it is known as the “MSISDN”
slide-122
SLIDE 122

Cellular Core Networks

slide-123
SLIDE 123

In-network routing

  • 1. User dials phone number
  • 2. Number is “looked up” in some database
  • 3. If local, we get the associated IMSI
  • 4. Check that sender can send and receiver can receive
  • 5. Look up the tower group of the IMSI’s last registration
  • 6. Page the receiver
  • 7. Bill them both
slide-124
SLIDE 124

Out-of-network Routing

  • Signaling System No. 7 (SS7)
  • Performs number translation, local number portability, prepaid billing, Short Message Service (SMS), roaming, and other functions
  • Either directly connected or connected through aggregators such as Cybase
  • Business vs. Protocols