

SLIDE 1

Neural Packet Routing

Shihan Xiao, Haiyan Mao, Bo Wu, Wenjie Liu, Fenglin Li
Network Technology Lab, Huawei Technologies Co., Ltd., Beijing, China


SLIDE 2

Motivation

Today’s distributed routing protocols

  • Advantage

– Connectivity guarantee

  • Disadvantage

– 1. Difficult to extend to satisfy flexible optimization goals
– 2. Large time and human costs to design and tune the configurations to achieve the optimal

Future network expectations

  • Flexible optimization goals beyond connectivity guarantee

– 5G applications desire the lowest end-to-end delay
– Industrial network applications require deterministic end-to-end delay

  • Less human cost to achieve the optimal

– Future networks are expected to be highly automated, with less and less human cost

SLIDE 3

Motivation

Can we achieve flexible and automated optimal protocol design at the same time?

SLIDE 4

Motivation

  • Deep learning is one potential way to achieve both flexible and automated optimality in distributed routing

– Deep learning in multi-agent games surpasses human performance [DeepMind, 2018]
– Line-rate neural network inference in future switches [Swamy et al., 2020]

Figure sources: https://arxiv.org/abs/1807.01281 [DeepMind, 2018]; https://arxiv.org/abs/2002.08987 [Swamy et al., 2020]

SLIDE 5

Motivation

  • Deep learning is a good start
  • A simple learning-based distributed routing framework

– Key idea: train a deep neural network at each node (router/switch) to compute the forwarding port for each packet

[Diagram: Packet ID → Neural Network → Forwarding port]
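The per-node key idea above can be sketched in a few lines of pure Python. The class and parameter names are illustrative (not from the paper), the one-hot input encoding is an assumption, and the tiny two-layer network is left untrained with seeded random weights, so only the interface is meaningful:

```python
import random

def one_hot(index, size):
    """Encode a packet's destination ID as a one-hot input vector."""
    v = [0.0] * size
    v[index] = 1.0
    return v

class ForwardingNN:
    """Toy per-node forwarding network: destination ID -> forwarding port.

    Weights are random here; in practice they would be trained,
    e.g. by supervised learning against shortest-path labels.
    """
    def __init__(self, num_dests, num_ports, hidden=16, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(num_dests)] for _ in range(hidden)]
        self.w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(num_ports)]

    def forward_port(self, dest_id):
        x = one_hot(dest_id, len(self.w1[0]))
        # ReLU hidden layer, then linear output scores, one per port
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        scores = [sum(w * hi for w, hi in zip(row, h)) for row in self.w2]
        return scores.index(max(scores))  # highest-scoring port wins

nn = ForwardingNN(num_dests=8, num_ports=4)
port = nn.forward_port(dest_id=3)
```

The point is only the per-node mapping from packet ID to output port; nothing here yet guards against the NN choosing a wrong port, which is exactly the "learning safety" problem the next slides raise.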

SLIDE 6

Motivation

Question about “learning safety”: what will happen if the neural network makes mistakes?

SLIDE 7

Motivation

  • Deep learning is a good start, but there is still a reality “gap”

Persistent routing loops generated by NN errors!

[Figure: on an 8-node topology, the correct shortest path vs. routing loops computed by the NN]

Simulation of shortest-path supervised learning with 97.5% training accuracy

SLIDE 8

Motivation

The inference error in deep learning is unavoidable. Can we still achieve a reliability guarantee while keeping the advantages of deep learning?

SLIDE 9

Solution: Neural Guided Routing (NGR)

  • Overview of the NGR design:

– 1. A reliable distributed routing framework
– 2. Combine deep learning into the framework
– 3. Handle topology changes

SLIDE 10

Solution: Neural Guided Routing (NGR)

  • A reliable distributed routing framework

– We define a routing path as reliable if it reaches the destination without any persistent loops/blackholes

  • Desired properties of the routing framework

– 1. Controllable:

  • It has parameters W that directly control the routing path for each packet

– 2. Optimality capacity:

  • Any reliable routing path can be implemented by setting a proper W

– 3. Error tolerance and reliability guarantee:

  • It always generates a reliable routing path no matter what errors happen in setting W

SLIDE 11

Solution: Neural Guided Routing (NGR)

  • The challenge in finding such a routing framework

[Diagram: Packet ID → Neural Network → Forwarding port]

Solution 1: direct port computing — 1. Controllable; 2. Optimality capacity; 3. Error tolerance and reliability guarantee

SLIDE 12

Solution: Neural Guided Routing (NGR)

  • The challenge in finding such a routing framework

Solution 1: direct port computing — 1. Controllable; 2. Optimality capacity; 3. Error tolerance and reliability guarantee
Solution 2: triangle-constraint routing — 1. Controllable; 2. Optimality capacity; 3. Error tolerance and reliability guarantee

Triangle constraint: only use neighboring nodes that are closer to the destination as the next hop

[Figure: triangle-constraint routing vs. optimal routing — the optimality gap]
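The triangle constraint can be sketched as a next-hop filter. The function name, the toy chain topology, and the precomputed hop-distance table `dist` are illustrative assumptions, not artifacts from the paper:

```python
def triangle_next_hops(neighbors, dist, node, dest):
    """Triangle constraint: keep only neighbors strictly closer to dest."""
    return [nb for nb in neighbors[node] if dist[nb][dest] < dist[node][dest]]

# Toy chain B - A - C - D, with hop distances to destination D
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
dist = {"A": {"D": 2}, "B": {"D": 3}, "C": {"D": 1}, "D": {"D": 0}}
print(triangle_next_hops(neighbors, dist, "A", "D"))  # ['C']
```

The optimality gap on the slide arises because this filter can never select a neighbor that temporarily moves away from the destination, even when such a detour lies on the path that is optimal for the chosen objective.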

SLIDE 13

Solution: Neural Guided Routing (NGR)

Key question: can we find a framework satisfying all the desired properties?

SLIDE 14

Solution: Neural Guided Routing (NGR)

  • NGR proposes a new routing framework, S-LRR, following link reversal theory

– Key idea: assign a value to each node, with link directions defined from the higher-value node to the lower-value node; add the updated node value to the packet head to implement the link reversal

  • Forwarding rule 1: the next-hop node can only be selected from lower-value neighboring nodes; when there are multiple choices of next-hop nodes, select the one with the lowest value

Workflow example: a packet with destination D arrives at node A

[Figure: chain B–A–C–D with ValueA=3, ValueB=2, ValueC=1, ValueD=0]

Next-hop node is C
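Forwarding rule 1 can be sketched as follows, reusing the slide's B–A–C–D example (function and variable names are illustrative):

```python
def rule1_next_hop(neighbors, value, node):
    """Forwarding rule 1: among lower-value neighbors, pick the lowest value.

    Returns None when the node is a sink (no lower-value neighbor),
    which is what triggers the link reversal of forwarding rule 2.
    """
    lower = [nb for nb in neighbors[node] if value[nb] < value[node]]
    return min(lower, key=lambda nb: value[nb]) if lower else None

# Slide example: packet for D arrives at A (ValueA=3, ValueB=2, ValueC=1, ValueD=0)
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
value = {"A": 3, "B": 2, "C": 1, "D": 0}
print(rule1_next_hop(neighbors, value, "A"))  # 'C'
```

Both B (value 2) and C (value 1) are lower-valued than A (value 3), so the lowest-value candidate C is selected, matching the slide's "next-hop node is C".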

SLIDE 15

Solution: Neural Guided Routing (NGR)

  • Forwarding rule 2: if the current node is a sink node (i.e., it has no out-links), perform the link reversal operation:

– change the current node’s value to new Value = max{neighboring node values} + 1
– add the key-value pair {current node index, new Value} to the packet head
– the next-hop node extracts the packet head to get the newest node value

[Figure: chain B–A–C–D with vA=3, vB=2, vC=1, vD=0; sink node C performs a reversal, ValueC = max{ValueA, ValueB} + 1 = 4, and S-LRR adds {C, 4} to the packet head]
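A minimal sketch of the reversal operation, assuming C has become a sink whose remaining neighbors are A and B, as in the slide's example (names are illustrative):

```python
def link_reversal(neighbors, value, node):
    """Forwarding rule 2: a sink raises its value above all its neighbors.

    Returns the {node: new_value} pair that S-LRR places in the packet
    head so that downstream nodes learn the updated value.
    """
    value[node] = max(value[nb] for nb in neighbors[node]) + 1
    return {node: value[node]}

# Slide example: sink C with neighbors A (value 3) and B (value 2)
neighbors = {"C": ["A", "B"]}
value = {"A": 3, "B": 2, "C": 1}
head_entry = link_reversal(neighbors, value, "C")
print(head_entry)  # {'C': 4}
```

After the reversal, every link incident to C points outward again (C now has the locally highest value), so the packet can leave the dead end it was stuck in.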

SLIDE 16

Solution: Neural Guided Routing (NGR)

Repeat the following rules until the packet reaches the destination (guaranteed by link reversal theory):

  • Forwarding rule 1: the next-hop node can only be selected from lower-value neighboring nodes; when there are multiple choices of next-hop nodes, select the one with the lowest value
  • Forwarding rule 2: if the current node is a sink node, perform the link reversal operation

[Figure: chain B–A–C–D with vA=3, vB=2, vC=1, vD=0; after C’s reversal (ValueC=4), S-LRR adds {C, 4} to the packet head]
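Putting both rules together, the repeat-until-delivery loop can be simulated as below. The square topology with a failed C–D link is a hypothetical scenario chosen so that exactly one reversal is needed; it is not the paper's own example:

```python
def slrr_route(neighbors, value, src, dest, max_hops=50):
    """Simulate S-LRR forwarding: rule 1 (lowest lower-value neighbor)
    plus rule 2 (link reversal at sinks). Returns (path, packet_head)."""
    value = dict(value)            # local working copy of node values
    node, path, head = src, [src], {}
    while node != dest:
        lower = [nb for nb in neighbors[node] if value[nb] < value[node]]
        if lower:                                        # rule 1
            node = min(lower, key=lambda nb: value[nb])
            path.append(node)
        else:                                            # rule 2: link reversal
            value[node] = max(value[nb] for nb in neighbors[node]) + 1
            head[node] = value[node]                     # carried in the packet head
        if len(path) > max_hops:
            raise RuntimeError("hop budget exceeded")
    return path, head

# Square topology A-B, A-C, B-D, C-D, with the C-D link failed
neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
value = {"A": 3, "B": 2, "C": 1, "D": 0}
path, head = slrr_route(neighbors, value, "A", "D")
print(path)  # ['A', 'C', 'A', 'B', 'D']
print(head)  # {'C': 4}
```

The packet first walks into the dead end at C (whose stale value still points toward the failed link), C reverses to value 4, and the packet backtracks through A and B to reach D. This is the error-tolerance property: even "wrong" values only delay delivery, never prevent it.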

SLIDE 17

Solution: Neural Guided Routing (NGR)

  • Combining deep learning into the routing framework

– Key idea: the final routing path is controlled by the node values
– NGR trains a neural network at each node to compute the node values, so that the neural network can learn to optimize the routing path directly

[Diagram — Computation Module (Neural-Guided Forwarding): Packet ID → Forwarding NN → value vector → S-LRR Algorithm → Forwarding Port]

SLIDE 18

Solution: Neural Guided Routing (NGR)

The above framework is
1) Controllable, based on the node values
2) Error-tolerant, with a reliability guarantee based on link reversal theory
But what about its optimality capacity?

SLIDE 19

Solution: Neural Guided Routing (NGR)

  • Combining deep learning into the routing framework

– To achieve the optimality capacity of the combined deep-learning framework, a fine-grained patch is required

  • Each node is assigned two values, termed the Prime value and the Secondary value

  • Forwarding rule 1: the next-hop node can only be selected from lower-value neighboring nodes; when there are multiple choices of next-hop nodes, select the one with the lowest value
  • Forwarding rule 2: if the current node is a sink node, perform the link reversal operation

Prime value: decides the feasible set of next-hop nodes
Secondary value: decides the final next-hop selection
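A sketch of the two-value selection: the Prime value plays the role the single value played before (feasible set plus reversal trigger), while a hypothetical Secondary value, standing in for the forwarding NN's learned output, makes the final choice:

```python
def neural_guided_next_hop(neighbors, prime, secondary, node):
    """Two-value selection: the Prime value defines the feasible next-hop
    set (lower Prime than the current node); the Secondary value picks
    the final next hop among the feasible candidates."""
    feasible = [nb for nb in neighbors[node] if prime[nb] < prime[node]]
    if not feasible:
        return None  # sink: fall back to the link reversal of rule 2
    return min(feasible, key=lambda nb: secondary[nb])

# Hypothetical example: both B and C are feasible for A by Prime value,
# but the learned Secondary value steers the packet to B instead of C
neighbors = {"A": ["B", "C"]}
prime = {"A": 3, "B": 2, "C": 1}
secondary = {"B": 0.2, "C": 0.9}
print(neural_guided_next_hop(neighbors, prime, secondary, "A"))  # 'B'
```

This separation is what restores optimality capacity: the Prime value preserves the reliability guarantee, while the Secondary value gives the neural network freedom to select any next hop inside the guaranteed-safe feasible set.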

SLIDE 20

Solution: Neural Guided Routing (NGR)

We prove that the above framework achieves the reliability guarantee while keeping the optimality capacity of deep learning.

SLIDE 21

Solution: Neural Guided Routing (NGR)

  • Handling topology changes

– NGR uses a Graph Neural Network (GNN) to handle topology changes
– When the topology changes (e.g., link failures):

  • Step 1. Each node sends its local hidden vector to its neighboring nodes
  • Step 2. Each node aggregates the hidden vectors received from neighboring nodes to update its local hidden vector
  • Step 3. Go to Step 1 until the local hidden vector no longer changes (or a pre-defined number of steps is reached)

– In this way, the new topology information is embedded into the node feature vector
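Steps 1–3 above can be sketched as a simple synchronous message-passing loop. Mean aggregation is used here as an illustrative stand-in for the GRU aggregator NGR actually uses, and the toy line topology and feature vectors are assumptions:

```python
def gnn_embed(neighbors, features, steps=10):
    """Iterative neighbor aggregation: each round, every node averages
    its neighbors' hidden vectors together with its own (step 1: send,
    step 2: aggregate), repeating until the vectors stop changing or
    the step budget runs out (step 3)."""
    hidden = {n: list(f) for n, f in features.items()}
    for _ in range(steps):
        new = {}
        for n in hidden:
            msgs = [hidden[nb] for nb in neighbors[n]] + [hidden[n]]
            new[n] = [sum(col) / len(msgs) for col in zip(*msgs)]
        if all(new[n] == hidden[n] for n in hidden):  # converged
            break
        hidden = new
    return hidden

# Toy 3-node line A-B-C; after a few rounds every node's hidden vector
# mixes in information originating elsewhere in the (new) topology
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.0, 0.0]}
emb = gnn_embed(neighbors, features, steps=3)
```

Note how C, two hops from A, ends up with a nonzero first component after the loop: A's feature has diffused across the graph, which is exactly the "topology embedded into the node feature vector" effect described above.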

SLIDE 22

Solution: Neural Guided Routing (NGR)

  • Putting all the pieces together

[Diagram — per-node architecture. Communication Module (GNN): feature vectors from neighboring nodes → GNN Module → feature vector (update). Computation Module (Neural-Guided Forwarding): Packet ID + feature vector → Forwarding NN → value vector → S-LRR Algorithm → Forwarding Port]
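The per-node wiring in the diagram can be sketched as a class that connects the two modules. The three injected callables are hypothetical stand-ins for the trained GNN aggregator, the forwarding NN, and the S-LRR selection logic; the trivial lambdas below only demonstrate the data flow:

```python
class NGRNode:
    """Sketch of one NGR node: a Communication Module (GNN) keeps a
    topology feature vector up to date, and a Computation Module turns
    (packet ID, feature vector) into node values that the S-LRR
    algorithm converts into a forwarding port."""

    def __init__(self, gnn_aggregate, forwarding_nn, slrr_select):
        self.feature = [0.0]
        self.gnn_aggregate = gnn_aggregate   # neighbor msgs -> feature vector
        self.forwarding_nn = forwarding_nn   # (packet_id, feature) -> node values
        self.slrr_select = slrr_select       # node values -> forwarding port

    def on_topology_message(self, neighbor_features):
        # Communication Module: embed the new topology into the feature vector
        self.feature = self.gnn_aggregate(neighbor_features, self.feature)

    def forward(self, packet_id):
        # Computation Module: NN-computed values feed the reliable S-LRR rule
        values = self.forwarding_nn(packet_id, self.feature)
        return self.slrr_select(values)

# Wiring demo with trivial stand-ins for the learned components
node = NGRNode(
    gnn_aggregate=lambda msgs, own: [sum(v[0] for v in msgs + [own]) / (len(msgs) + 1)],
    forwarding_nn=lambda pid, feat: {0: 2.0, 1: 0.5 + feat[0], 2: 1.0},
    slrr_select=lambda values: min(values, key=values.get),
)
node.on_topology_message([[1.0], [3.0]])
port = node.forward(packet_id=7)
```

The key design point this wiring illustrates: the neural components only produce values, never ports directly, so any NN error is filtered through the S-LRR layer that carries the reliability guarantee.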

SLIDE 23

Evaluation

  • Topology

– Internet Topology Zoo topologies [Knight et al., 2011]

  • Max node count: 9, max link count: 20

  • NGR parameters

– Forwarding NN:

  • Input: one-hot vector of the node index and the topology feature vector
  • Three hidden layers
  • Output: the node value

– GNN:

  • Aggregation: GRU with an output feature vector of dimension 200

  • Compared solutions

– Non-learning-based:

  • Shortest-path routing (SPF)
  • Triangle-constraint-based routing (PEFT)

– Learning-based:

  • GQNN [Geyer et al., 2018]

  • Tasks

– Load balancing
– Shortest-path routing

Figure source: http://www.topology-zoo.org/

SLIDE 24

Evaluation

NGR shows near-optimal performance under different traffic demands.

SLIDE 25

Evaluation

  • 1. SPF produces more than twice as many blackholes as GQNN and NGR
  • 2. GQNN generates routing loops, due to neural network errors, as more link failures happen
  • 3. NGR generates no loops or blackholes in any case, at the cost of slightly longer paths

Simulation of routing reliability when links fail

slide-26
SLIDE 26

26

Evaluation

  • 1. NGR achieves a small path length expansion ratio below 6% under all the cases
  • 2. The link reversal overhead in NGR keeps small and increases slowly with more link failures

Simulation of packet-head overheads and path lengths of NGR when link fails

SLIDE 27

Conclusion

  • NGR: reliable deep-learning-based distributed routing

– Reliable: connectivity guarantee based on link reversal theory
– Flexible: supports flexible, customized optimization goals
– Optimal: retains the capacity to achieve optimality with respect to the given optimization goal

  • Future work

– Reinforcement learning to learn more complex optimization objectives (e.g., delay)
– Scalability and generalization for large-scale network topologies
– Line-rate neural network inference in network devices

SLIDE 28

References

  • [Swamy et al., 2020] Swamy, T., Rucker, A., Shahbaz, M., and Olukotun, K. Taurus: An Intelligent Data Plane. https://arxiv.org/abs/2002.08987
  • [DeepMind, 2018] Jaderberg, M., Czarnecki, W.M., Dunning, I., et al. Human-level performance in first-person multiplayer games with population-based deep reinforcement learning. https://arxiv.org/abs/1807.01281
  • [Geyer et al., 2018] Geyer, F. and Carle, G. Learning and Generating Distributed Routing Protocols Using Graph-Based Deep Learning. Proceedings of the 2018 Workshop on Big Data Analytics and Machine Learning for Data Communication Networks.
  • [Liu et al., 2013] Liu, J., Panda, A., Singla, A., Godfrey, B., Schapira, M., and Shenker, S. Ensuring Connectivity via Data Plane Mechanisms. NSDI 2013.
  • [Knight et al., 2011] Knight, S., Nguyen, H.X., Falkner, N., Bowden, R., and Roughan, M. The Internet Topology Zoo. IEEE J. Sel. Areas Commun., 29(9):1765–1775.
