The Next Generation Lossless Network
in the Data Center
BrightTalk, Data Center Transformation 3.0, January 2019 Paul Congdon, PhD
IEEE 802 Industry Connections Report
Page 3
Disclaimer
⚫ All speakers presenting information on IEEE standards speak as individuals, and their views should be considered the personal views of that individual rather than the formal position, explanation, or interpretation of the IEEE.
Page 4
Acknowledgements
⚫ The initial technical contribution and sponsorship for this work was provided by Huawei Technologies Co., Ltd.
⚫ This presentation summarizes work from Nendica, the IEEE 802 "Network Enhancements for the Next Decade" Industry Connections Activity, organized under the IEEE 802.1 Working Group: https://1.ieee802.org/802-nendica/
⚫ Report freely available at: https://ieeexplore.ieee.org/document/8462819
Page 5
Our Digital Lives are driving Innovation in the DC
⚫ Interactive Speech Recognition
⚫ Interactive Image Recognition
⚫ Human/Machine Interaction
⚫ Autonomous Driving
Page 6
Critical Use Case – Online Data Intensive Services (OLDI)
⚫ OLDI services have strict deadlines and run in parallel on 1000s of servers. Aggregators fan a request out to workers and collect their results, and the convergence of those results creates the incast phenomenon.
[Diagram: a request fans out from a root aggregator (deadline = 250 ms) through mid-level aggregators (deadline = 50 ms) to workers (deadline = 10 ms); results converge back up the tree.]
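The incast effect above can be sketched with a toy model: many workers reply to an aggregator at once, and whatever exceeds the switch egress buffer (plus what the port drains during the burst) is dropped. All constants here are illustrative assumptions, not measured values.

```python
# Toy model of incast: N workers reply simultaneously into one switch
# egress port with a fixed buffer. Replies beyond what the buffer can
# hold (plus what the port drains during the burst) are dropped.
def incast_drops(num_workers, reply_bytes, buffer_bytes, drained_bytes):
    arrived = num_workers * reply_bytes
    # bytes that neither fit in the buffer nor drain during the burst
    overflow = max(0, arrived - drained_bytes - buffer_bytes)
    return overflow // reply_bytes  # whole replies lost

# 100 workers x 10 KB replies into a 256 KB buffer, 64 KB drained
print(incast_drops(100, 10_000, 256_000, 64_000))  # 68 replies lost
```

Each lost reply forces a retransmission timeout, which is how a 10 ms worker deadline gets blown.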
Page 7
Critical Use Case – Deep Learning
⚫ Distributed applications, such as AI training, depend on a low-latency, high-throughput network for performance.
[Diagram: per-rank training timeline over a partitioned dataset (feed data, training, MPI Allreduce of weights, send weights); chart of computing time, network time, and overall time versus number of computing nodes, with a "sweet spot" where overall elapsed time is minimized.]
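The "sweet spot" chart can be reproduced with a toy scaling model: computing time shrinks as nodes are added, while allreduce network time grows, so overall time bottoms out at an intermediate node count. The constants are illustrative assumptions, not measurements.

```python
# Toy scaling model behind the "sweet spot" chart: compute time
# divides across nodes while allreduce network time grows with the
# node count, so overall time is minimized somewhere in between.
def overall_time(nodes, compute=1000.0, per_node_net=2.0):
    computing = compute / nodes       # assume perfect parallel speedup
    network = per_node_net * nodes    # allreduce cost grows with nodes
    return computing + network

times = {n: overall_time(n) for n in (4, 8, 16, 32, 64, 128)}
sweet_spot = min(times, key=times.get)
print(sweet_spot)  # 16: adding more nodes past this point is net slower
```

Shrinking the network time (the lossless-network goal) moves the sweet spot toward larger clusters.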
Page 8
Critical Use Case – NVMe Over Fabrics
⚫ High-performance storage technologies, such as NVMe over Fabrics, use RDMA and run over converged network infrastructure. Low latency and zero packet loss are important success factors.
Page 9
Critical Use Case – Cloudification of the Central Office
[Diagram: a traditional central office built from dedicated appliances (firewall, VPN, BRAS, base-band units, IP telephony, CDN, high-speed storage, DPI) versus a cloudified central office serving the same subscribers with Network Function Virtualization, orchestration, and standard Ethernet switches.]
⚫ Internet traffic is driving infrastructure investment. To replace dedicated equipment, SDN and NFV must run on a low-latency and highly available network infrastructure.
Page 10
We are dealing with massive amounts of data and computing
Requirements:
⚫ High-Speed Network, Storage, Neural Network, Cloud Infrastructure
⚫ Divide and Conquer, Real-Time, Natural Human/Machine Response
Page 11
Congestion Creates the Problems
⚫ Massive data, massive compute, and massive messaging drive network congestion, which causes packet loss, latency loss, and throughput loss.
⚫ Parallelism can create congestion, which leads to loss, making end-users unhappy.
Page 12
⚫ The impact of congestion on network performance can be very serious.
The Impact of Congestion in a Lossless Network
⚫ As shown by Pedro J. Garcia et al. (IEEE Micro 2006) [1], injecting hot-spot traffic makes network performance degrade dramatically after congestion appears: throughput diminishes by 70% and latency increases by orders of magnitude.
[Charts: network throughput versus generated traffic, and average packet latency, before and during hot-spot traffic injection.]
[1] Garcia, Pedro Javier, et al. "Efficient, scalable congestion management for interconnection networks." IEEE Micro 26.5 (2006): 52-66.
Page 13
Dealing with Congestion Today
⚫ ECMP (Equal-Cost Multipath Routing) spreads flows across the parallel paths of the fabric.
⚫ Explicit Congestion Notification (ECN) + Priority-based Flow Control (PFC): the congested switch marks packets with ECN, the receiver returns congestion feedback to the sender, and PFC pauses the upstream link hop-by-hop to avoid loss.
[Diagram: a leaf-spine fabric using ECMP; a congestion point triggers ECN marking, end-to-end ECN congestion feedback, and PFC toward the upstream hop.]
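ECMP's key property is that all packets of one flow take the same path, so no reordering occurs. A minimal sketch of flow-hash next-hop selection (the hash function here is an assumption; real switches use vendor-specific hashes over the 5-tuple):

```python
# Minimal sketch of ECMP next-hop selection by 5-tuple hashing.
import zlib

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, links):
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return links[zlib.crc32(key) % len(links)]

links = ["spine0", "spine1", "spine2", "spine3"]
# Two packets of the same flow always hash to the same uplink:
a = ecmp_next_hop("10.0.0.1", "10.0.1.1", 5000, 80, 6, links)
b = ecmp_next_hop("10.0.0.1", "10.0.1.1", 5000, 80, 6, links)
print(a == b)  # True: per-flow path consistency, no reordering
```

The flip side of this per-flow consistency is the collision problem on the next slide.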
Page 14
Ongoing challenges with congestion
⚫ ECMP collisions: flow-level hashing can place multiple large flows on the same link, leaving one 40G uplink carrying 30G + 30G while another carries only 15G.
⚫ ECN control loop delay: congestion feedback takes a full round trip, so queues build up before senders slow down.
⚫ PFC head-of-line blocking (HOLB): pausing an entire priority punishes victim flows that share the queue with the congested flow.
[Diagram: leaf-spine fabric showing ECMP hash collisions on 40G links (30G + 30G versus 15G), and a congestion point generating PFC, ECN marks, ECN congestion feedback, and head-of-line blocking.]
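The collision problem can be illustrated with the same hashing sketch: link assignment depends only on hash luck, not on load, so several elephant flows can land on one uplink. The flows and hash function below are illustrative assumptions.

```python
# Illustrates why ECMP can overload one uplink: placement is by hash,
# with no awareness of how much traffic each flow carries.
import zlib

def pick_link(flow, links):
    return links[zlib.crc32(flow.encode()) % len(links)]

links = ["uplink0", "uplink1"]       # two 40G uplinks
flows = ["10.0.0.1->10.0.1.1:80", "10.0.0.2->10.0.1.2:80",
         "10.0.0.3->10.0.1.3:80", "10.0.0.4->10.0.1.4:80"]
load = {l: 0 for l in links}
for f in flows:
    load[pick_link(f, links)] += 30  # each elephant flow offers 30G
print(load)  # 120G total; hash luck decides the split across uplinks
```

With an unlucky hash split, one 40G uplink is offered more than it can carry while the other sits half idle, exactly the 30G/15G imbalance in the diagram.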
Page 15
Potential New Lossless Technologies for the Data Center
Goal = No Loss:
⚫ No Packet Loss
⚫ No Latency Loss
⚫ No Throughput Loss
Solutions:
⚫ Virtual Input Queuing (VIQ)
⚫ Dynamic Virtual Lanes (DVL)
⚫ Load-Aware Packet Spraying (LPS)
⚫ Push & Pull Hybrid Scheduling (PPH)
Page 16
VIQ (Virtual Input Queues): Resolve Internal Packet Loss
⚫ Incast congestion can cause internal packet loss: each ingress queue counter stays below the PFC threshold, so no PFC pause frame is sent upstream and packets keep arriving, while the egress queue backlogs because of the convergence effect. Without egress-ingress coordination, packets are lost inside the switch.
⚫ VIQ can be viewed as assigning, at each egress port, a dedicated virtual queue for every ingress port. Buffer memory changes from shared to virtually dedicated per ingress port, so every ingress port gets fair scheduling and application tail latency can be controlled effectively.
[Diagram: ingress queue counters below the PFC threshold while the shared egress queue overflows; with coordinated egress-ingress queuing, the loss is avoided.]
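A minimal sketch of the VIQ idea, under simplifying assumptions (single switch, per-ingress packet budget standing in for real buffer accounting): the egress port keeps one virtual queue per ingress port, admits packets only while that queue has room, and back-pressures the ingress instead of dropping internally.

```python
# Sketch of egress-ingress coordination (the VIQ idea): egress tracks
# demand per ingress port and back-pressures instead of dropping.
from collections import deque

class EgressPort:
    def __init__(self, ingress_ports, budget_per_ingress):
        # one virtual input queue per ingress port
        self.viq = {p: deque() for p in ingress_ports}
        self.budget = budget_per_ingress

    def admit(self, ingress, pkt):
        q = self.viq[ingress]
        if len(q) >= self.budget:
            return False  # pause this ingress: no internal drop
        q.append(pkt)
        return True

    def schedule(self):
        # round-robin across ingress queues = fair scheduling
        for q in self.viq.values():
            if q:
                yield q.popleft()

eg = EgressPort(["in0", "in1", "in2"], budget_per_ingress=2)
accepted = sum(eg.admit("in0", f"p{i}") for i in range(4))
print(accepted)  # 2: the rest are back-pressured, not dropped
```

The round-robin scheduler is what bounds any one ingress port's share, which is how tail latency stays controlled.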
Page 17
DVL (Dynamic Virtual Lanes)
⚫ Identify the flows causing congestion and isolate them locally in a dedicated queue.
⚫ When the congested queue fills, send a congestion isolation packet (CIP) so the upstream switch isolates the flow too, eliminating head-of-line blocking.
⚫ If the congested queue continues to fill, invoke PFC for lossless operation.
[Diagram: downstream and upstream switches with ingress ports (virtual queues) and egress ports; congested flows are moved into a separate queue away from non-congested flows.]
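The isolation step can be sketched as two queues per port: once a flow is identified as a congestor, its packets are diverted to the "congested" queue, and the normal queue drains first so victim flows are never stuck behind it. This is a simplified model; the identification trigger and CIP signaling are not modeled.

```python
# Sketch of Dynamic Virtual Lanes: divert an identified congesting
# flow into a separate queue so innocent flows are not blocked.
from collections import deque

class DvlPort:
    def __init__(self):
        self.normal = deque()
        self.congested = deque()
        self.isolated = set()

    def isolate(self, flow):
        # triggered when the flow is identified as causing congestion;
        # a CIP to the upstream switch is not modeled here
        self.isolated.add(flow)

    def enqueue(self, flow, pkt):
        (self.congested if flow in self.isolated else self.normal).append(pkt)

    def dequeue(self):
        # normal traffic drains first: no head-of-line blocking
        if self.normal:
            return self.normal.popleft()
        return self.congested.popleft() if self.congested else None

port = DvlPort()
port.isolate("elephant")
port.enqueue("elephant", "e1")
port.enqueue("mouse", "m1")
first = port.dequeue()
print(first)  # m1: the victim flow is served ahead of the congestor
```

If the congested queue keeps growing anyway, PFC is invoked only for that lane, which is the "fine-grained" improvement over pausing a whole priority.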
Page 18
LPS (Load-Aware Packet Spraying)
LPS = Packet Spraying + Endpoint Reordering + Load-Aware
Load Balancing Design Space:
⚫ Framework: Centralized (e.g., Hedera, B4, SWAN; slow to react for data centers) vs. Distributed
⚫ State: Stateless or Local (e.g., ECMP, Flare, LocalFlow; poor handling of asymmetric traffic) vs. Global
⚫ Granularity: Flow, Flowlet, Flowcell, or Packet (packet granularity may require re-ordering)
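The LPS combination above can be sketched in a few lines: every packet independently takes the currently least-loaded uplink (the load-aware part), and the endpoint restores sequence order (the reordering part). The load metric and data structures are assumptions for the sketch.

```python
# Sketch of load-aware packet spraying with endpoint reordering.
import heapq

class Sprayer:
    def __init__(self, links):
        # min-heap of (queued_bytes, link): cheapest link on top
        self.heap = [(0, l) for l in sorted(links)]

    def send(self, pkt_bytes):
        load, link = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (load + pkt_bytes, link))
        return link

def reorder(received):
    # endpoint puts (seq, packet) pairs back in sequence order
    return [p for _, p in sorted(received)]

s = Sprayer(["up0", "up1"])
path = [s.send(1500) for _ in range(4)]
print(sorted(set(path)))  # ['up0', 'up1']: both uplinks carry traffic
print(reorder([(2, "b"), (1, "a"), (3, "c")]))  # ['a', 'b', 'c']
```

Spraying at packet granularity removes the elephant-collision problem of per-flow ECMP, at the price of the reordering step at the endpoint.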
Page 19
PPH (Push & Pull Hybrid Scheduling)
PPH = Congestion-aware traffic scheduling
⚫ Push when load is light; pull when load is high.
⚫ Light load: all traffic is pushed immediately, minimizing latency. Light congestion: open pull for part of the congested path. Heavy load: all traffic is pulled, reducing queuing delay and improving throughput.
[Diagram: leaf-spine fabric; pushed data takes the short-RTT direct path, while pulled data follows a request/grant exchange between source and destination over a longer RTT.]
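The push/pull decision reduces to a load-dependent message sequence: one-way data when the path is lightly loaded, a request/grant handshake before data when it is congested. The threshold value and message names below are assumptions for the sketch.

```python
# Sketch of push/pull hybrid scheduling: the message sequence a
# source uses depends on observed path load (0.0 = idle, 1.0 = full).
def schedule(load, threshold=0.7):
    if load < threshold:
        return ["DATA"]                  # push: one-way, lowest latency
    return ["REQUEST", "GRANT", "DATA"]  # pull: destination paces arrivals

print(schedule(0.2))  # ['DATA']
print(schedule(0.9))  # ['REQUEST', 'GRANT', 'DATA']
```

The pull handshake costs an extra RTT, which is why it is only worth paying once queuing delay under push would exceed it.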
Page 20
Innovation for the Lossless Network: Congestion Impact and Mitigation

Dynamic Virtual Lanes: Isolate Congestion
⚫ Impact: Priority-based Flow Control is coarse-grained; victim flows are hurt by the congested flows.
⚫ Mitigation: move congested flows out of the way, eliminating head-of-line blocking and allowing time for end-to-end congestion control.

Push & Pull Hybrid Scheduling: Schedule Appropriately
⚫ Impact: unscheduled, network-resource-unaware many-to-one traffic causes incast packet loss.
⚫ Mitigation: scheduling decisions integrate information from the source, the network, and the destination.

Load-Aware Packet Spraying: Spread the Load
⚫ Impact: unbalanced load sharing; elephant flow collisions block mice flows.
⚫ Mitigation: load-balance flows at a finer granularity, with load awareness to avoid collisions.

Virtual Input Queues: Coordinated Resources
⚫ Impact: ingress thresholds are unrelated to egress buffer availability; incast causes internal packet loss.
⚫ Mitigation: coordinate egress availability with ingress demand to avoid internal switch packet loss.
Page 21