A Benes Packet Network. Longbo Huang & Jean Walrand, EECS @ UC Berkeley. PowerPoint PPT Presentation.



SLIDE 1

A Benes Packet Network

Longbo Huang & Jean Walrand EECS @ UC Berkeley

SLIDE 2

Longbo Huang Institute for Interdisciplinary Information Sciences (IIIS) Tsinghua University

SLIDE 3

Data centers are important computing resources

Google data centers within US

Src: http://royal.pingdom.com/2008/04/11/map-of-all-google-data-center-locations/

Provide most of our computing services

  • Web service: Facebook, Email
  • Information processing: MapReduce
  • Data storage: Flickr, Google Drive
SLIDE 4

SLIDE 5

We focus on data center networking!

SLIDE 6

The data center networking problem

Networking is the foundation of data centers’ functionality

  • Hundreds of thousands of interconnected servers
  • Dynamic traffic flowing among servers
  • Large volume of data requiring small latency
  • Traffic statistical info may be hard to obtain
SLIDE 7

The data center networking problem

Questions:

  • How to connect the servers?
  • How to route traffic to achieve best rate allocation?
  • How to ensure small delay?
  • How to adapt to traffic changes?

SLIDE 8

Benes Network + Utility Optimization + Backpressure

Benes Network:

  • High throughput
  • Small delay (logarithmic in network size)
  • Connecting 2N servers with O(N log N) switch modules

Backpressure:

  • Throughput optimal
  • Robust to system dynamics
  • Requires no statistical info

Flow Utility Maximization

  • Ensure best allocation of resources
SLIDE 9

Benes Network

Building a 2nx2n Benes network
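The recursive construction can be sketched in Python to check the O(N log N) switch-count claim (an illustrative sketch; `benes_switches` is a hypothetical name, and N here denotes the number of inputs/outputs):

```python
def benes_switches(N):
    """Count the 2x2 switch modules in an N x N Benes network (N a power of 2).

    Recursive construction: an input column and an output column of N/2
    switches wrapped around two (N/2 x N/2) Benes sub-networks; the base
    case is a single 2x2 switch.
    """
    if N == 2:
        return 1
    return 2 * (N // 2) + 2 * benes_switches(N // 2)

# Closed form: (2*log2(N) - 1) columns of N/2 switches each, i.e. O(N log N).
for k in range(1, 11):
    N = 2 ** k
    assert benes_switches(N) == (N // 2) * (2 * k - 1)

print(benes_switches(16))  # 56
```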

SLIDE 10

Benes Network

Routing circuits:

1 → 3 (UP)   2 → 1 (DOWN)   3 → 2 (UP)   4 → 4 (DOWN)

SLIDE 11

Benes Network

Routing circuits:

1 → 4, 2 → 2n, 3 → 1, 4 → 2n − 1, …, 2n − 1 → 2, 2n → 3

Non-blocking for circuits ⇒ full throughput for packets

SLIDE 12

Benes Network Flow Utility Maximization

  • Random arrivals Asd(t)
  • Flow control: admit Rsd(t) ∈ [0, Asd(t)]
  • Each (s, d) flow has utility Usd(rsd)
  • Each link has capacity 1 packet/slot

The flow utility maximization problem:
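The problem statement on the slide was an equation image; in the notation of the bullets above it presumably takes the standard form (a reconstruction, not the paper's exact statement):

```latex
\max_{\{r_{sd}\}} \;\sum_{(s,d)} U_{sd}(r_{sd})
\quad \text{s.t.} \quad
0 \le r_{sd} \le \lambda_{sd} \;\;\forall (s,d),
\qquad \{r_{sd}\} \in \Lambda,
```

where λsd = E{Asd(t)} is the mean arrival rate of flow (s, d) and Λ denotes the network's capacity region.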

SLIDE 13

Benes Network Flow Utility Maximization

Backpressure can be directly applied; however, each node then needs 2n queues, one per destination.

SLIDE 14

Grouped-Backpressure (G-BP)

The idea:

  • Divide traffic into two groups
  • Perform routing & scheduling on the mixed traffic
  • Rely on Backpressure & symmetry for stability

Key components

  • 1. A fictitious reference system for control
  • 2. A special queueing structure
  • 3. An admission & regulation mechanism
  • 4. Dynamic scheduling
SLIDE 15

G-BP Component 1 - Reference System

These nodes remain the same

SLIDE 16

G-BP Component 2 – Queueing Structure

  • Each switch node in columns 1 to n-1 maintains 4 queues (same for both systems)

SLIDE 17

G-BP Component 2 – Queueing Structure

  • Each input server in column 0 maintains 2 queues (same for both systems)

SLIDE 18

G-BP Component 2 – Queueing Structure

  • Each node in column n maintains 2 queues for D1 and D2 (also in the physical system)

SLIDE 19

G-BP Component 2 – Queueing Structure

  • Each node in columns n to 2n-1 maintains 2 queues (only in the physical system)

SLIDE 20

G-BP Component 3 – Admission & Regulation

Admission queue at input:
Regulation queue at output:

SLIDE 21

G-BP Component 3 – Admission & Regulation

Input server admits packets

Admission decisions at input:

  • Update γsd(t):
  • Admit packets:

(up flow to d in D1)

Note: qd(t) is “idealized”. In practice:

  • delayed arrivals at d
  • delayed feedback to s
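The γsd(t) update on the slide was an equation image and did not survive extraction. As an illustration only, assuming the utility Usd(r) = log(1+r) used in the later simulations and the usual drift-plus-penalty form, the admission choice might look like the following sketch (the function name and `gamma_max` are hypothetical):

```python
def admission_gamma(V, H_sd, q_d, gamma_max=5.0):
    """Hypothetical sketch of a drift-plus-penalty admission decision.

    Assuming U(r) = log(1+r), gamma solves
        max_{0 <= g <= gamma_max}  V*log(1+g) - g*(H_sd + q_d),
    whose solution is g = V/(H_sd + q_d) - 1, clipped to [0, gamma_max].
    H_sd: source (admission-queue) congestion; q_d: destination congestion,
    fed back to the source (with delay, in practice).
    """
    congestion = H_sd + q_d
    if congestion <= 0:
        return gamma_max          # no congestion signal: admit at full rate
    return min(gamma_max, max(0.0, V / congestion - 1.0))

print(admission_gamma(V=100, H_sd=10, q_d=5))    # 5.0  (low congestion: clipped at max)
print(admission_gamma(V=100, H_sd=20, q_d=5))    # 3.0
print(admission_gamma(V=100, H_sd=80, q_d=120))  # 0.0  (high congestion: admit nothing)
```

Larger V weights utility over backlog, so more traffic is admitted; heavy source or destination congestion drives the admitted rate to zero.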
SLIDE 22

G-BP Component 3 – Admission & Regulation

(Figure labels: the need to admit; source congestion; destination congestion, passed to source)
Input server admits packets


SLIDE 23

G-BP Component 3 – Admission & Regulation

Input server rejects packets

Admission decisions at input:

  • Update γsd(t):
  • Admit packets:

(low flow to d in D2)

SLIDE 24


G-BP Component 3 – Admission & Regulation

(Figure labels: source congestion; destination congestion, passed to source; the need to admit)
Input server rejects packets

SLIDE 25

Grouped-Backpressure

Admission control

SLIDE 26

G-BP Component 4 – Dynamic Scheduling

Which flow to serve over this link?

SLIDE 27

G-BP Component 4 – Dynamic Scheduling

Define flow weights:

SLIDE 28

G-BP Component 4 – Dynamic Scheduling

Define flow weights:

SLIDE 29

G-BP Component 4 – Dynamic Scheduling

  • If W1U>W2U & W1U>0, send 1U packets over link [m, m’]
  • At m’, randomly put the arrival into 1U or 1L
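The per-link rule in the bullets above can be sketched as follows (the return labels and the tie-breaking case are assumptions; the slide specifies only the strict-inequality cases):

```python
import random

def schedule_link(W1U, W2U):
    """Sketch of the per-link max-weight scheduling rule.

    Serve group-1 upper-queue packets if W1U > W2U and W1U > 0;
    serve group-2 if W2U > W1U and W2U > 0; the tie-break for equal
    positive weights is an assumption here.  Returns '1U', '2U', or
    None (idle).
    """
    if W1U > W2U and W1U > 0:
        return '1U'
    if W2U > W1U and W2U > 0:
        return '2U'
    if W1U == W2U and W1U > 0:
        return random.choice(['1U', '2U'])  # assumed tie-break
    return None

# After transmission, the receiving node m' places the arriving packet
# randomly into the corresponding upper or lower queue.
print(schedule_link(3, 1))   # 1U
print(schedule_link(-1, 2))  # 2U
print(schedule_link(0, 0))   # None
```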
SLIDE 30

G-BP Component 4 – Dynamic Scheduling

  • If W1U<W2U & W2U>0, send 2U packets over link [m, m’]
  • At m’, randomly put the arrival into 2U or 2L
SLIDE 31

G-BP Component 4 – Dynamic Scheduling

  • If queue is not empty, transmit packet
  • Else remain idle
SLIDE 32

Grouped-Backpressure

Admission control → G-Backpressure based on the fictitious system

SLIDE 33

G-BP Component 4 – Dynamic Scheduling

  • If queue is not empty, transmit packet
  • Place packets into corresponding queues
SLIDE 34

Grouped-Backpressure

Admission control → G-Backpressure based on the fictitious system → Free-flow forwarding

SLIDE 35

Grouped-Backpressure – Performance

Theorem: Under the G-BP* algorithm, (i) both physical & fictitious networks are stable, and (ii) we achieve:

* This is the idealized algorithm ….

SLIDE 36

Grouped-Backpressure – Performance

Remarks:

  • No statistical info is needed
  • Distributed hop-by-hop routing & scheduling
  • Four queues per node (BP needs 2n)

Theorem: Under the G-BP algorithm, (i) both physical & fictitious networks are stable, and (ii) we achieve:
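The achieved bound in the theorem was an equation image; for drift-plus-penalty algorithms of this kind it typically takes the following utility-delay tradeoff form (a hedged reconstruction, not the paper's exact statement):

```latex
\sum_{(s,d)} U_{sd}(\bar{r}_{sd}) \;\ge\; \sum_{(s,d)} U_{sd}(r^{*}_{sd}) - O(1/V),
\qquad
\limsup_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1}\sum_{i}\mathbb{E}\{Q_{i}(t)\} = O(V),
```

with V > 0 the algorithm parameter and r* the optimal solution of the flow utility maximization problem.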

SLIDE 37

Grouped-Backpressure – Analysis Idea

  • Update γi(t)
  • Admit packets:
  • If Hi(t)>Qi(t)+q(t), admit arrivals
  • Else, do not admit

(Figure: toy example; N1 runs BP, N2 runs free-flow forwarding; each link has rate 0.5, one link has rate 1)

SLIDE 38

Grouped-Backpressure – Analysis Idea

  • Update γi(t)
  • Admit packets:
  • If Hi(t)>Qi(t)+q(t), admit arrivals
  • Else, do not admit

H1(t), H2(t) are bounded ⇒ q(t) is bounded

(Figure: same toy example; N1: BP, N2: FF)

SLIDE 39

Grouped-Backpressure – Analysis Idea

H1(t), H2(t) are bounded ⇒ q(t) is bounded
Rates into Q5(t), Q6(t) are (1-ε)/2 < 0.5 ⇒ Q5(t), Q6(t) stable

(Figure: same toy example; N1: BP, N2: FF)

SLIDE 40

Grouped-Backpressure – Analysis Idea

Q5(t), Q6(t) stable; Q1(t) through Q4(t) stable by Backpressure ⇒ network stability

(Figure: same toy example with one link of rate 1-ε; N1: BP, N2: FF)

SLIDE 41

Grouped-Backpressure – Intuition

The flow optimization problem → (due to the random arrivals) the augmented & relaxed flow optimization problem → (taking the dual decomposition) the dual form

SLIDE 42

Grouped-Backpressure – Intuition

The flow optimization problem → (due to the random arrivals) the augmented & relaxed flow optimization problem → (taking the dual decomposition) the dual form

The dual decomposition separates into an admission-queue part and a data-queue part.
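The equations on this slide were images; a hedged sketch of the dual decomposition described here (the multipliers μsd and their pairing with constraints are assumptions): relaxing the constraints with multipliers μsd, the dual function separates as

```latex
g(\{\mu_{sd}\}) \;=\; \sum_{(s,d)} \max_{\gamma_{sd}\ge 0}
\Big[ V\,U_{sd}(\gamma_{sd}) - \mu_{sd}\,\gamma_{sd} \Big]
\;+\; \max_{\{r_{sd}\}\in\Lambda} \sum_{(s,d)} \mu_{sd}\, r_{sd},
```

where the multipliers are played by queue backlogs: the first subproblem gives the admission (flow-control) decision at the admission queue, and the second is a max-weight routing/scheduling problem over the data queues.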

SLIDE 43

Grouped-Backpressure – Proof Steps

Step 1 - Define a Lyapunov function
Step 2 - Compute a Lyapunov drift Δ(t) = E{ L(t+1) − L(t) | X(t) }
Step 3 - Plug in the optimal solution of the relaxed problem, γε* = rε*
Step 4 - Do a telescoping sum
Step 5 - Conclude that H(t) is stable

SLIDE 44

Grouped-Backpressure – Simulation*

Setting: 16x16 Benes network, ε=0.01, utility=log(1+r)

* This is the idealized algorithm ….

SLIDE 45

Grouped-Backpressure – Simulation

Setting: 16x16 Benes network, ε=0.01, utility=log(1+r)

Note: for 1 Gbps links and 500-byte packets, the plotted delays are 0.5ms, 1ms, and 1.8ms.

SLIDE 46

Grouped-Backpressure – Simulation

Delay versus network size – logarithmic growth

V=20, ε=0.01. Delay reduced by “biasing” BP.

SLIDE 47

Assume each packet is 500 bytes and each link runs at 1 Gbit/second; then each slot is 4 microseconds. (Figure: delays range from about 0.26ms to 1ms across network sizes.)
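The slot-length arithmetic can be checked directly (a quick sketch):

```python
# A 500-byte packet on a 1 Gbit/s link takes 4 microseconds to transmit,
# which is the duration of one time slot in the simulation.
packet_bits = 500 * 8            # 500 bytes
link_rate_bps = 1e9              # 1 Gbit/s
slot_seconds = packet_bits / link_rate_bps
print(slot_seconds)              # 4e-06 seconds, i.e. 4 microseconds
```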

Grouped-Backpressure – Simulation

Delay versus network size – logarithmic growth

V=20, ε=0.01

SLIDE 48

Setting: 16x16 Benes network, ε=0.01, utility=wlog(1+r)

Grouped-Backpressure – Simulation

Adaptation to traffic changes: at time 5, the weights wsd change

SLIDE 49

Summary

  • Using the Benes network and Backpressure for data center networking

  • Scalable: built with basic switch modules
  • Simple: four queues per node
  • Small delay: logarithmic in network size
  • High throughput: supports all rates in capacity region
  • Distributed: hop-by-hop routing and scheduling
  • Future research: Implementation issues
SLIDE 50


Thank you very much!

More info: www.eecs.berkeley.edu/~huang