Neural Packet Routing
Shihan Xiao, Haiyan Mao, Bo Wu, Wenjie Liu, Fenglin Li Network Technology Lab, Huawei Technologies Co., Ltd., Beijing, China
Motivation

Today's distributed routing protocols
– Connectivity guarantee
– 1. Difficult to be extended to satisfy flexible optimization goals
– 2. Big time and human costs to design and tune the configurations to achieve the connectivity guarantee

Future network expectations
– Flexible: 5G applications desire the lowest end-to-end delay; industrial network applications require deterministic end-to-end delay
– Automated: the future network is expected to be highly automated, with less and less human cost
Can we achieve flexible and automated optimal protocol design at the same time?
Flexible and automated optimality of distributed routing
– Deep learning in multi-agent games surpasses human performance [DeepMind, 2018] (figure source: https://arxiv.org/abs/1807.01281)
– Line-rate neural network inference in future switches [Swamy et al., 2020] (figure source: https://arxiv.org/abs/2002.08987)
A neural network at each node computes the forwarding port for each packet
– Neural Network: Packet ID → Forwarding port

Question about the "learning safety": what will happen if the neural network makes mistakes?
Persistent routing loops generated by NN error!
[Figure: on an 8-node topology (nodes 1–8), the correct shortest path versus the routing loops computed by the NN]
Simulation of shortest-path supervised learning with 97.5% training accuracy
The inference error in deep learning is unavoidable. Can we still achieve reliability guarantee while keeping the advantages of deep learning?
– 1. A reliable distributed routing framework
– 2. Combine deep learning into the framework
– 3. Handle the topology changes
– We define a routing path as reliable if it reaches the destination without any persistent loops/blackholes
– 1. Controllable:
– 2. Optimality capacity:
– 3. Error-tolerant and reliability guarantee:
Neural Network: Packet ID → Forwarding port

Solution 1: a direct port computing
– 1. Controllable
– 2. Optimality capacity
– 3. Error-tolerant and reliability guarantee

Solution 2: triangle-constraint routing
– 1. Controllable
– 2. Optimality capacity
– 3. Error-tolerant and reliability guarantee
– Triangle constraint: only use neighboring nodes that are closer to the destination as the next hop
– An optimality gap remains between triangle-constraint routing and optimal routing
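As an illustration, the triangle constraint fits in a few lines of Python (the function and data names are ours, not from the talk): a neighbor is a feasible next hop only if it is strictly closer to the destination.

```python
def triangle_next_hops(u, d, neighbors, dist):
    """Neighbors of u satisfying the triangle constraint toward destination d:
    only nodes strictly closer to d than u are feasible next hops."""
    return [n for n in neighbors[u] if dist[(n, d)] < dist[(u, d)]]

# Toy line topology 1 - 2 - 3 with hop-count distances, destination 3.
neighbors = {1: [2], 2: [1, 3], 3: [2]}
dist = {(1, 3): 2, (2, 3): 1, (3, 3): 0}
print(triangle_next_hops(2, 3, neighbors, dist))  # [3]
```

Restricting the feasible set to strictly closer neighbors rules out loops, but it can also exclude detour paths that an optimal scheme would use, which is the optimality gap noted above.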
Key Question: Can we find a framework satisfying all the desired properties?
– Key idea: assign a value to each node; link directions are defined from the high-value node to the low-value node; add the updated node value into the packet head to implement the link reversal (S-LRR)

Workflow example: a packet with destination D arrives at node A, with vA=3, vB=2, vC=1, vD=0
– When there are multiple choices of next-hop nodes, select the one with the lowest value: at node A, the next-hop node is C
– When a node has no lower-value neighbor, it performs a link reversal: ValueC = max{ValueA, ValueB} + 1 = 4, and S-LRR adds {C, 4} to the packet head
– Repeat using the above rules until the packet reaches the destination (guaranteed by the link reversal theory)
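The workflow above can be sketched as a single-packet simulation. This is our own minimal Python rendering of the idea, not the paper's implementation; the topology is an assumption chosen so that C's reversal (max{vA, vB} + 1 = 4, as on the slide) is triggered, i.e., we assume C's link toward D has failed.

```python
def slrr_forward(src, dst, neighbors, value):
    """Value-based forwarding with source-driven link reversal (sketch).
    `value` maps each node to its value; the destination has the lowest (0).
    Reversed values travel in the packet head (`head`) instead of being
    written back into the nodes. Terminates on connected graphs, per the
    link reversal theory."""
    head = {}                               # {node: reversed value} in the packet head
    v = lambda n: head.get(n, value[n])     # packet-head value overrides the stored one
    path, cur = [src], src
    while cur != dst:
        lower = [n for n in neighbors[cur] if v(n) < v(cur)]
        if not lower:
            # Link reversal: raise this node's value above all its neighbors
            # and record the update in the packet head.
            head[cur] = max(v(n) for n in neighbors[cur]) + 1
            continue
        cur = min(lower, key=v)             # rule: pick the lowest-value neighbor
        path.append(cur)
    return path, head

# Slide's values vA=3, vB=2, vC=1, vD=0; assumed topology with link C-D failed.
neighbors = {'A': ['B', 'C'], 'B': ['A', 'C', 'D'],
             'C': ['A', 'B'], 'D': ['B']}
value = {'A': 3, 'B': 2, 'C': 1, 'D': 0}
path, head = slrr_forward('A', 'D', neighbors, value)
print(path, head)  # ['A', 'C', 'B', 'D'] {'C': 4}
```

The packet first follows the values to C, discovers C has no lower-value neighbor, reverses C to 4 inside the packet head, and then reaches D via B, without any node ever being reprogrammed.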
– Key idea: the final routing path is controlled by the node values
– NGR trains a neural network at each node to compute the node values, so that the neural network can learn to optimize the routing path directly

Computation Module: Neural-Guided Forwarding
– Packet ID → Forwarding NN → Value vector → S-LRR Algorithm → Forwarding Port
The above framework is 1) controllable, based on the node values, and 2) error-tolerant with a reliability guarantee, based on the link reversal theory. But what about its optimality capacity?
– To achieve the optimality capacity of the combined deep-learning framework, a fine-grained patch is required: each node holds a prime value and a secondary value
– Prime value: decides the feasible set of next-hop nodes; the next hop is selected from the lower-value neighboring nodes, and the prime values are updated by the link reversal operation
– Secondary value: decides the final next-hop selection; when there are multiple choices of next-hop nodes, select the one with the lowest secondary value
We prove that the above framework can achieve the reliability guarantee while keeping the optimality capacity of deep learning
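A minimal sketch of this two-value selection (our own illustrative names; in NGR the secondary values would come from the learned forwarding NN):

```python
def next_hop(u, neighbors, prime, secondary):
    """Prime value: defines the feasible next-hop set (only lower-prime
    neighbors, preserving the link-reversal reliability guarantee).
    Secondary value: picks within the feasible set, which is the knob
    the neural network is free to optimize."""
    feasible = [n for n in neighbors[u] if prime[n] < prime[u]]
    if not feasible:
        return None            # a link reversal on the prime value would follow
    return min(feasible, key=lambda n: secondary[n])

# Node A has three neighbors; B and C are feasible (lower prime value),
# and the (learned) secondary value prefers C.
neighbors = {'A': ['B', 'C', 'E']}
prime = {'A': 3, 'B': 2, 'C': 2, 'E': 4}
secondary = {'B': 0.7, 'C': 0.2, 'E': 0.1}
print(next_hop('A', neighbors, prime, secondary))  # C
```

Note that even though E has the lowest secondary value, the prime value excludes it from the feasible set, so an NN error can at worst pick a suboptimal feasible hop, never break reliability.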
– NGR uses a Graph Neural Network (GNN) to handle topology changes
– When the topology changes (e.g., link failures), each node exchanges feature vectors with its neighboring nodes
– Each node aggregates the feature vectors from its neighboring nodes to update its local hidden vector
– The exchange repeats until the hidden vectors no longer change (or reach a pre-defined number of steps)
– In this way, the new topology information is embedded into the node feature vector
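The exchange-and-aggregate loop can be sketched as plain synchronous message passing. This is an illustrative stand-in with names of our own: a real GNN would use learned aggregation weights rather than this unweighted averaging.

```python
def gnn_embed(neighbors, feature, max_steps=3):
    """Each round, every node averages its own input feature with its
    neighbors' current hidden vectors; the loop stops when the hidden
    vectors no longer change or after `max_steps` rounds, so the
    (possibly changed) topology is embedded into each node's vector."""
    hidden = {u: list(f) for u, f in feature.items()}
    for _ in range(max_steps):
        new = {}
        for u in neighbors:
            msgs = [hidden[n] for n in neighbors[u]] + [feature[u]]
            new[u] = [sum(col) / len(msgs) for col in zip(*msgs)]
        if new == hidden:      # converged: vectors no longer change
            break
        hidden = new
    return hidden

# Two connected nodes with 1-D input features.
neighbors = {1: [2], 2: [1]}
feature = {1: [1.0], 2: [3.0]}
print(gnn_embed(neighbors, feature))  # {1: [1.75], 2: [2.25]}
```

If a link fails, it simply disappears from `neighbors`, and the next rounds of exchange propagate that change into every reachable node's vector.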
Communication Module: GNN
– GNN Module: exchanges feature vectors with neighboring nodes to update the local Feature Vector
Computation Module: Neural-Guided Forwarding
– Packet ID + Feature Vector → Forwarding NN → Value vector → S-LRR Algorithm → Forwarding Port
Evaluation setup
– Topologies: Internet Topology Zoo [Knight et al., 2011] (figure source: http://www.topology-zoo.org/)
– Forwarding NN: input includes the feature vector
– GNN:
– Baselines:
– Non-learning based:
– Learning based:
– Optimization goals: load balancing; shortest-path routing
NGR shows near-optimal performance under different traffic demands
Simulation of routing reliability when a link fails
Simulation of the packet-head overheads and path lengths of NGR when a link fails
– Reliable: connectivity guarantee based on the link reversal theory
– Flexible: supports flexible, customized optimization goals
– Optimal: attains the capacity to achieve optimality with respect to the given optimization goal
– Reinforcement learning to learn more complex optimization objectives (e.g., delay)
– Scalability and generalization for large-scale network topologies
– Line-rate neural network inference in network devices
References
– Swamy et al. Taurus: An Intelligent Data Plane. https://arxiv.org/abs/2002.08987
– DeepMind. Human-level performance in first-person multiplayer games with population-based deep reinforcement learning. https://arxiv.org/abs/1807.01281
– Learning and Generating Distributed Routing Protocols Using Graph-Based Deep Learning. Proceedings of the 2018 Workshop on Big Data Analytics and Machine Learning for Data Communication Networks, 2018.
– Ensuring Connectivity via Data Plane Mechanisms. NSDI 2013.
– Knight et al. The Internet Topology Zoo. IEEE J. Sel. Areas Commun., 29(9):1765–1775, 2011.