Switching
An Engineering Approach to Computer Networking

What is it all about?
How do we move traffic from one part of the network to another?
Connect end-systems to switches, and switches to each other
Data arriving at an input port of a switch have to be moved to one or more of the output ports
Telephone switches: switch samples
Datagram routers: switch datagrams
ATM switches: switch ATM cells
Packet vs. circuit switches: packets have headers and samples don't
Connectionless vs. connection-oriented:
connection-oriented switches need a call setup; the setup is handled in the control plane by a switch controller
connectionless switches deal with self-contained datagrams

                 Connectionless (router)   Connection-oriented (switching system)
Packet switch    Internet router           ATM switching system
Circuit switch   --                        Telephone switching system
Participate in routing algorithms, to build routing tables
Resolve contention for output trunks: scheduling
Admission control, to guarantee resources to certain streams
We'll discuss these later; here we focus on pure data movement
Capacity of a switch is the maximum rate at which it can move information, assuming all data paths are simultaneously active
Primary goal: maximize capacity, subject to cost and reliability constraints
A circuit switch must reject a call if it can't find a path for samples from input to output
goal: minimize call blocking
A packet switch must reject a packet if it can't find a buffer to store it while awaiting access to the output trunk
goal: minimize packet loss
Don't reorder packets
Circuit switching
Packet switching
Switch generations
Switch fabrics
Buffer placement
Multicast switches
Moving 8-bit samples from an input port to an output port
Recall that samples have no headers
Destination of a sample depends on the time at which it arrives at the switch
actually, its relative order within a frame
We'll first study something simpler than a switch: a multiplexor
Most trunks time-division multiplex voice samples
At a central office, the trunk is demultiplexed and distributed to active circuits
Synchronous multiplexor:
N input lines
output runs N times as fast as input
Demultiplexor:
one input line and N outputs that run N times slower
samples are placed in output buffers in round-robin order
Neither multiplexor nor demultiplexor needs addressing information (why?)
Can cascade multiplexors
need a standard
example: DS hierarchy in the US and Japan
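A toy model of the synchronous multiplexor/demultiplexor pair makes the "no addressing needed" point concrete: position within the frame is the only address. This is a sketch; real frames carry 8-bit samples plus framing bits, which are ignored here.

```python
def multiplex(lines):
    # One frame carries one sample from each input line, in fixed order;
    # the output must therefore run N times as fast as any input.
    return [s for frame in zip(*lines) for s in frame]

def demultiplex(stream, n):
    # Round-robin by position: slot i of each frame belongs to circuit i,
    # so no addressing information is needed.
    lines = [[] for _ in range(n)]
    for i, s in enumerate(stream):
        lines[i % n].append(s)
    return lines
```

A demultiplexor fed by a multiplexor recovers the original streams exactly, purely from slot positions.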
Inverse multiplexing takes a high bit-rate stream and scatters it across multiple trunks
At the other end, combines the multiple streams
resequencing to accommodate variation in delays
Allows high-speed virtual links using existing technology
A switch that can handle N calls has N logical inputs and N logical outputs
N up to 200,000
In practice, input trunks are multiplexed
example: a DS3 trunk carries 672 simultaneous calls
Multiplexed trunks carry frames = sets of samples
Goal: extract samples from the frame and, depending on position in the frame, switch to an output
each incoming sample has to get to the right output line and the right slot in the output frame
demultiplex, switch, multiplex
Blocking: can't find a path from input to output
Internal blocking: a slot in the output frame exists, but no path to it
Output blocking: no slot in the output frame is available
Output blocking is reduced in transit switches
need to put a sample in one of several slots going to the desired next hop
Key idea: when demultiplexing, position in the frame determines the output trunk
Time division switching interchanges sample position within a frame: time slot interchange (TSI)
Limit is the time taken to read and write memory
For 120,000 circuits
need to read and write each sample's memory slot once every 125 microseconds
each operation takes around 0.5 ns => impossible with current technology
Need to look to other techniques
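The memory-speed limit can be checked with a little arithmetic: one write and one read per circuit must fit into each 125 µs voice frame.

```python
def tsi_op_time_ns(n_circuits, frame_us=125.0):
    # Every sample is written into frame memory and later read out:
    # 2 * n_circuits memory operations per 125 us frame.
    return frame_us * 1000.0 / (2 * n_circuits)
```

For 120,000 circuits this gives about 0.52 ns per memory operation, which is the slide's "impossible with current technology" figure.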
Each sample takes a different path through the switch, depending on its destination
Crossbar: the simplest possible space-division switch
crosspoints can be turned on or off
For multiplexed inputs, need a switching schedule (why?)
Internally nonblocking
but needs N² crosspoints
time taken to set each crosspoint grows quadratically
vulnerable to single faults (why?)
In a crossbar, during each switching time only one crosspoint per row or column is active
Can save crosspoints if a crosspoint can attach to more than one input line
This is done in a multistage crossbar
Need to rearrange connections every switching time
Can suffer internal blocking
unless there are a sufficient number of second-level stages
Number of crosspoints < N²
Finding a path from input to output requires a depth-first search
Scales better than a crossbar, but still not too well
a 120,000-call switch needs ~250 million crosspoints
Time-space switching: precede each input trunk in a crossbar with a TSI
delay samples so that they arrive at the right time for the space-division switch's schedule
Time-space-time switching: allowed to flip samples on both the input and output trunks
gives more flexibility => lowers call blocking probability
In a circuit switch, the path of a sample is determined at the time of connection establishment
no need for a sample header--position in frame is enough
In a packet switch, packets carry a destination field
need to look up the destination port on the fly
Datagram: lookup based on entire destination address
Cell: lookup based on VCI
Other than that, very similar
Repeaters: at the physical level
Bridges: at the datalink level, based on MAC addresses (L2)
discover attached stations by listening
Routers: at the network level (L3)
participate in routing protocols
Application-level gateways: at the application level (L7)
treat the entire network as a single hop
e.g. mail gateways and transcoders
Gain functionality at the expense of forwarding speed
for best performance, push functionality as low as possible
Look up the output port based on the destination address
Easy for VCI: just use a table
Harder for datagrams:
need to find the longest prefix match
e.g. packet with address 128.32.1.20
entries: (128.32.*, 3), (128.32.1.*, 4), (128.32.1.20, 2)
A standard solution: trie
Two ways to improve performance:
cache recently used addresses in a CAM
move common entries up to a higher level (match longer strings)
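A minimal longest-prefix-match sketch over the example table; a linear scan stands in for the trie walk (a real router descends the trie, remembering the last matching entry), and the `*` wildcard syntax is taken from the slide.

```python
def longest_prefix_match(table, addr):
    # Keep the entry with the longest matching prefix seen so far.
    best_len, best_port = -1, None
    for prefix, port in table:
        p = prefix.rstrip('*').rstrip('.')  # "128.32.*" -> "128.32"
        if addr == p or addr.startswith(p + '.'):
            if len(p) > best_len:
                best_len, best_port = len(p), port
    return best_port
```

With the slide's table, a packet for 128.32.1.20 matches all three entries but is forwarded on port 2, the most specific one.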
Can have both internal and output blocking
Internal: no path to output
Output: trunk unavailable
Unlike a circuit switch, cannot predict if packets will block (why?)
If a packet is blocked, must either buffer or drop it
Ways to deal with blocking:
Overprovisioning: internal links much faster than inputs
Buffers: at input or output
Backpressure: if the switch fabric doesn't have buffers, prevent a packet from entering until a path is available
Parallel switch fabrics: increases effective switching capacity
Different trade-offs between cost and performance
Represent evolution in switching capacity, rather than in technology
with the same technology, a later-generation switch achieves greater capacity, but at greater cost
All three generations are represented in current products
First generation: most Ethernet switches and cheap packet routers
bottleneck can be the CPU, the host adaptor, or the I/O bus, depending on the design
Example: first generation router built with a 133 MHz Pentium
mean packet size 500 bytes
interrupt takes 10 microseconds, word access takes 50 ns
per-packet processing takes 200 instructions = 1.504 µs
Copy loop:
register <- memory[read_ptr]
memory[write_ptr] <- register
read_ptr <- read_ptr + 4
write_ptr <- write_ptr + 4
counter <- counter - 1
if (counter not 0) branch to top of loop
4 instructions + 2 memory accesses = 130.08 ns per iteration
Copying the packet takes 500/4 * 130.08 ns = 16.26 µs; interrupt 10 µs
Total time = 27.764 µs => speed is 144.1 Mbps
Amortized interrupt cost balanced by routing protocol cost
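The arithmetic above checks out; a quick recomputation, assuming one instruction per 133 MHz clock cycle as the slide implicitly does:

```python
CLOCK_HZ = 133e6     # 133 MHz Pentium, assume 1 instruction per cycle
WORD_NS = 50e-9      # word access time
INTERRUPT = 10e-6    # per-packet interrupt overhead

per_packet = 200 / CLOCK_HZ                # 200 instructions ~ 1.504 us
copy_iter = 4 / CLOCK_HZ + 2 * WORD_NS     # 4 instructions + 2 memory accesses
copy = (500 // 4) * copy_iter              # 125 word copies for 500 bytes
total = INTERRUPT + per_packet + copy      # ~ 27.764 us per packet
mbps = 500 * 8 / total / 1e6               # ~ 144.1 Mbps
```

Note that the copy dominates: 16.26 µs of the 27.764 µs budget is spent moving bytes, which is why later generations move copying off the CPU.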
Second generation: port mapping intelligence in the line cards
an ATM switch guarantees a hit in the lookup cache
Ipsilon IP switching:
assume an underlying ATM network
by default, assemble packets
if a flow is detected, ask upstream to send on a particular VCI, and install an entry in the port mapper => implicit signaling
Bottleneck in a second generation switch is the bus (or ring)
Third generation switch provides parallel paths (fabric)
Features:
self-routing fabric
output buffer is a point of contention
unless we arbitrate access to the fabric
potential for unlimited scaling, as long as we can resolve contention for the output buffer
Switch fabric: transfers data from input to output, ignoring scheduling and buffering
usually consists of links and switching elements
Crossbar: the simplest switch fabric
think of it as 2N buses in parallel
used here for packet routing: a crosspoint is left open long enough to transfer a packet from an input to an output
for fixed-size packets and a known arrival pattern, can compute the schedule in advance
otherwise, need to compute a schedule on the fly (what does the schedule depend on?)
What happens if packets at two inputs both want to go to the same output?
can defer one at an input buffer
or, buffer crosspoints
Broadcast fabric: packets are tagged with the output port #
each output matches tags
need to match N addresses in parallel at each output
useful only for small switches, or as a stage in a large switch
Can build complicated fabrics from a simple switching element
Routing rule: if 0, send packet to the upper output, else to the lower
If both packets want the same output, buffer or drop one
An NxN switch built from bxb elements has log_b N stages, with N/b elements per stage
Fabric is self-routing
Recursive
Can be synchronous or asynchronous
Regular and suitable for VLSI implementation
Banyan: the simplest self-routing recursive fabric
(why does it work?)
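The self-routing rule can be sketched directly, assuming 2x2 elements and destination bits consumed most-significant-bit first, one bit per stage:

```python
def banyan_path(dest_port, n_stages):
    # Stage i looks only at bit i of the destination port number:
    # 0 -> upper output, 1 -> lower output. No tables are consulted;
    # the destination tag alone steers the packet.
    bits = format(dest_port, '0{}b'.format(n_stages))
    return ['upper' if b == '0' else 'lower' for b in bits]
```

So a cell tagged for output 2 (= 010) in an 8-port banyan goes upper, lower, upper, no matter which input it entered on; that is why the fabric works without per-element state.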
What if two packets both want to go to the same output?
=> output blocking
Can avoid with a buffered banyan switch
but this is too expensive
hard to achieve zero loss even with buffers
Instead, can check if the path is available before sending the packet
three-phase scheme:
send requests
inform winners
send packets
Or, use several banyan fabrics in parallel
intentionally misroute and tag one of a colliding pair
divert tagged packets to a second banyan, and so on for k stages
expensive
can reorder packets
output buffers have to run k times faster than the input
Can avoid blocking by choosing the order in which packets appear at the input ports
If we can
present packets at inputs sorted by output
remove duplicates
remove gaps
precede the banyan with a perfect shuffle stage
then there is no internal blocking
For example, [X, 010, 010, X, 011, X, X, X]
-(sort)-> [010, 010, 011, X, X, X, X, X]
-(remove dups)-> [010, 011, X, X, X, X, X, X]
-(shuffle)-> [010, X, 011, X, X, X, X, X]
Need sort, shuffle, and trap networks
Build sorters from merge networks
assume we can merge two sorted lists
sort pairwise, merge, recurse
What about trapped duplicates?
recirculate to the beginning
or run the output of the trap to multiple banyans (dilation)
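The "sort pairwise, merge, recurse" construction is Batcher's odd-even mergesort; a software rendering of the compare-exchange network (power-of-two input sizes assumed; each `compare_swap` is one hardware comparator):

```python
def compare_swap(a, i, j):
    # One compare-exchange element: smaller value to the upper wire.
    if a[i] > a[j]:
        a[i], a[j] = a[j], a[i]

def oddeven_merge(a, lo, n, r):
    # Merge two sorted interleaved halves of a[lo:lo+n], elements r apart.
    step = r * 2
    if step < n:
        oddeven_merge(a, lo, n, step)
        oddeven_merge(a, lo + r, n, step)
        for i in range(lo + r, lo + n - r, step):
            compare_swap(a, i, i + r)
    else:
        compare_swap(a, lo, lo + r)

def oddeven_sort(a, lo=0, n=None):
    # Sort pairwise, merge, recurse (n must be a power of two).
    if n is None:
        n = len(a)
    if n > 1:
        m = n // 2
        oddeven_sort(a, lo, m)
        oddeven_sort(a, lo + m, m)
        oddeven_merge(a, lo, n, 1)
```

The comparator pattern is fixed in advance and independent of the data, which is exactly what makes it realizable as a hardware sorting network in front of the banyan.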
A major motivation for the small fixed packet size in ATM is the ease of building large parallel fabrics
In general, smaller size => more per-packet overhead, but more preemption points/sec
at high speeds, overhead dominates!
Fixed-size packets help build a synchronous switch
but we could fragment at entry and reassemble at exit
or build an asynchronous fabric
Thus, variable size doesn't hurt too much
Maybe Internet routers can be almost as cost-effective as ATM switches
All packet switches need buffers to match the input rate to the service rate
or cause heavy packet losses
Where should we place buffers?
input
in the fabric
output
shared
Input buffering: no speedup in buffers or trunks (unlike an output-queued switch)
Needs an arbiter
Problem: head-of-line blocking
with randomly distributed packets, utilization is at most 58.6%
worse with hot spots
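The 58.6% figure (2 - sqrt(2), in the large-N limit) can be approximated by simulating a saturated input-queued switch: every input always has a head-of-line packet with a uniformly random destination, and each output serves at most one contender per slot. This is a sketch; finite N gives somewhat higher utilization than the limit.

```python
import random

def hol_throughput(n_ports=16, slots=5000, seed=1):
    # hol[i] = destination of the packet at the head of input i's FIFO.
    rng = random.Random(seed)
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    served = 0
    for _ in range(slots):
        contenders = {}
        for i, d in enumerate(hol):
            contenders.setdefault(d, []).append(i)
        for d, inputs in contenders.items():
            # Output d serves one contender; the losers' HOL packets
            # keep blocking everything queued behind them.
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # next packet in queue
            served += 1
    return served / (n_ports * slots)
```

Running this for 16 ports yields a per-port utilization near 0.6 rather than 1.0, which is the head-of-line blocking penalty.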
Can fix head-of-line blocking with per-output queues at the inputs
The arbiter must choose one of the input ports for each output port
How to select? Parallel Iterative Matching:
inputs tell the arbiter which outputs they are interested in
each output selects one of the inputs
some inputs may get more than one grant, others may get none
if >1 grant, the input picks one at random, and tells the outputs
losing inputs and outputs try again
Used in the DEC Autonet 2 switch
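One request/grant/accept round of the scheme above might look like this sketch (the real arbiter runs several such iterations per slot, in parallel hardware):

```python
import random

def pim_iteration(requests, rng):
    # requests: dict input -> set of outputs with queued packets.
    # Grant phase: each requested output picks one requester at random.
    requesters = {}
    for inp, outs in requests.items():
        for out in outs:
            requesters.setdefault(out, []).append(inp)
    granted = {out: rng.choice(inps) for out, inps in requesters.items()}
    # Accept phase: an input holding several grants picks one at random.
    grants_by_input = {}
    for out, inp in granted.items():
        grants_by_input.setdefault(inp, []).append(out)
    return {inp: rng.choice(outs) for inp, outs in grants_by_input.items()}
```

The result is a partial matching (each input and output appears at most once); unmatched ports simply try again in the next iteration.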
Output buffering: doesn't suffer from head-of-line blocking
But the output buffers need to run much faster than trunk speed (why?)
Can reduce some of the cost by using the knockout principle
it is unlikely that all N inputs will have packets for the same output
drop extra packets, fairly distributing losses among inputs
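The "unlikely" claim can be quantified: if each of N inputs independently sends to a given output with probability 1/N, the number of packets arriving there is Binomial(N, 1/N), and the chance that more than k collide falls off very fast in k.

```python
from math import comb

def p_more_than(k, n):
    # P(X > k) for X ~ Binomial(n, 1/n): probability that more than k
    # of the n inputs address one given output in the same slot.
    p = 1.0 / n
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k + 1, n + 1))
```

For n = 16, more than 8 simultaneous packets for one output already has probability below one in a million, so a knockout fabric that delivers at most k packets per output per slot loses very little.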
Shared memory: route only the header to the output port
Bottleneck is the time taken to read and write multiported memory
Doesn't scale to large switches
But can form an element in a multistage switch
Reduces read/write cost by doing wide reads and writes
a 1.2 Gbps switch for $50 parts cost
Buffered fabric: buffers in each switch element
Pros:
speedup is only as much as the fan-in
hardware backpressure reduces buffer requirements
Cons:
costly (unless using single-chip switches)
scheduling is hard
Buffers at more than one point
becomes hard to analyze and manage
but common in practice
Multicast: useful to do this in hardware
Assume the port mapper knows the list of outputs
The incoming packet must be copied to these output ports
Two subproblems:
generating and distributing the copies
VCI translation for the copies
Copying can be either implicit or explicit
Implicit:
suitable for bus-based, ring-based, crossbar, or broadcast switches
multiple outputs are enabled after placing the packet on a shared bus
used in the Paris and Datapath switches
Explicit:
need to copy the packet at switch elements
use a copy network
place the # of copies in a tag
an element copies to both outputs and decrements the count on one of them
collect copies at outputs
Both schemes increase blocking probability
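The copy-network tag idea can be sketched as a recursion. This is one variant of the count-splitting rule (each element divides the requested count between its two outputs until single copies emerge); the slide's decrement-on-one-output scheme differs only in how the count is divided at each element.

```python
def expand(count):
    # A 2x2 element receiving a tag with count > 1 sends the packet to
    # both outputs, splitting the remaining count between them.
    # A count of 1 means exactly one copy emerges here.
    if count == 1:
        return ['copy']
    return expand((count + 1) // 2) + expand(count // 2)
```

However the count is divided, the invariant is the same: the counts on an element's two outputs sum to the count on its input, so exactly the requested number of copies emerge.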
Normally, in-VCI to out-VCI translation can be done either at input or output
With multicasting, translation is easier at the output port (why?)
Use separate port mapping and translation tables
the input maps a VCI to a set of output ports
each output port swaps the VCI
Need to do two lookups per packet
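The two-lookup structure can be sketched with hypothetical table entries (the VCIs 7, 42, 99 and ports 1, 3 below are made-up values for illustration):

```python
# Lookup #1 (at the input): in-VCI -> set of output ports.
port_map = {7: {1, 3}}
# Lookup #2 (at each output): per-port in-VCI -> out-VCI.
vci_swap = {1: {7: 42}, 3: {7: 99}}

def forward(in_vci):
    # Fan out at the input, then swap the VCI independently at each
    # output port; this is why output-side translation is easier.
    return [(port, vci_swap[port][in_vci])
            for port in sorted(port_map[in_vci])]
```

Keeping the swap table at the output lets each copy carry a different out-VCI without the input having to know all of them.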