

SLIDE 1

Loom: Flexible and Efficient NIC Packet Scheduling

Brent Stephens, Aditya Akella, Mike Swift

NSDI 2019

SLIDE 2

Loom is a new Network Interface Card (NIC) design that offloads all per-flow scheduling decisions out of the OS and into the NIC.

  • Why is packet scheduling important?
  • What is wrong with current NICs?
  • Why should all packet scheduling be offloaded to the NIC?

SLIDE 3

Why is packet scheduling important?

SLIDE 4

Colocation (Application and Tenant) is Important for Infrastructure Efficiency

CPU Isolation Policy:
  • Tenant 1: Memcached: 3 cores, Spark: 1 core
  • Tenant 2: Spark: 4 cores

SLIDE 5

Network Performance Goals

Different applications have different network performance goals:

  • Low latency
  • High throughput

SLIDE 6

Network Policies

Network operators must specify and enforce a network isolation policy.

  • Enforcing a network isolation policy requires scheduling

Pseudocode:

Tenant_1.Memcached -> Pri_1:high
Tenant_1.Spark -> Pri_1:low
Pri_1 -> RL_WAN(Dst == WAN: 15Gbps)
Pri_1 -> RL_None(Dst != WAN: No Limit)
RL_WAN -> FIFO_1; RL_None -> FIFO_1
FIFO_1 -> Fair_1:w1
Tenant_2.Spark -> Fair_1:w1
Fair_1 -> Wire

[Figure: the corresponding policy DAG — VM traffic feeds Pri_1, which feeds the RL_WAN and RL_None rate limiters, which feed FIFO_1, which joins Tenant 2's traffic at Fair_1 before reaching the wire]

SLIDE 9

What is wrong with current NICs?

SLIDE 10

Single Queue Packet Scheduling Limitations

  • Single-core throughput is limited (although high with Eiffel), especially with very small packets
  • Energy-efficient architectures may prioritize scalability over single-core performance
  • Software scheduling consumes CPU
  • Core-to-core communication increases latency

[Figure: a single-queue (SQ) design — all applications feed one software queue before the NIC]

SQ struggles to drive line-rate!

SLIDE 11

Multi-Queue NIC Background and Limitations

  • Multi-queue NICs enable parallelism: throughput can be scaled across many tens of cores
  • Multi-queue NICs have a packet scheduler that chooses which queue to send packets from
  • The one-queue-per-core multi-queue model (MQ) attempts to enforce the policy at every core independently
  • This is the best possible without inter-core coordination, but it is not effective

[Figure: a multi-queue (MQ) design — each core has its own queue to the NIC]

MQ struggles to enforce policies!
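
To see why independent per-core enforcement breaks down, consider a toy model (my own illustration, not from the paper): the policy is an equal share per tenant, each core is locally fair, and the NIC simply round-robins across per-core queues. A tenant whose flows happen to land on more cores then gets more bandwidth:

```python
# Toy model: an MQ NIC services one queue per core, round-robin.
# Each core schedules its local traffic "fairly", but the global
# shares depend on how tenants are spread across cores.
queues = {
    "core0": ["A"],   # tenant A's flows pinned to core 0
    "core1": ["A"],   # ...and to core 1
    "core2": ["B"],   # tenant B only runs on core 2
}

sent = {"A": 0, "B": 0}
for _ in range(3000):                 # NIC round-robins across queues
    for core, tenants in queues.items():
        sent[tenants[0]] += 1         # one packet per queue per round

total = sum(sent.values())
for tenant, n in sent.items():
    print(f"tenant {tenant}: {100 * n / total:.0f}% of link")  # A: 67%, B: 33%
```

Even though every core is locally fair, tenant A receives twice tenant B's bandwidth, violating the equal-share policy.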

SLIDE 12

MQ Scheduler Problems

Naïve NIC packet scheduling prevents colocation! It leads to:

  • High latency
  • Unfair and variable throughput

[Figure: packets from competing applications interleaved over time by the NIC packet scheduler]

SLIDE 14

Why should all packet scheduling be offloaded to the NIC?

SLIDE 15

Where to divide labor between the OS and NIC?

Option 1: Single Queue (SQ)
  • Enforce the entire policy in software
  • Low throughput / high CPU utilization

Option 2: Multi Queue (MQ)
  • Every core independently enforces the policy on local traffic
  • Cannot ensure policies are enforced

Option 3: Loom
  • Every flow uses its own queue
  • All policy enforcement is offloaded to the NIC
  • Precise policy + low CPU

[Figure: the example policy DAG (Fair_1, Pri_1, FIFO_1, RL_WAN, RL_None) split between the CPU and the NIC under each option]

SLIDE 19

Loom is a new NIC design that moves all per-flow scheduling decisions out of the OS and into the NIC.

Loom uses a queue per flow and offloads all packet scheduling to the NIC.

SLIDE 20

Core Problem:

It is not currently possible to offload all packet scheduling because NIC packet schedulers are inflexible and configuring them is inefficient.

NIC packet schedulers are currently standing in the way of performance isolation!

SLIDE 22

Outline

Intro: Loom is a new NIC design that moves all per-flow scheduling decisions out of the OS and into the NIC

Contributions:
  • 1. Specification: A new network policy abstraction: restricted directed acyclic graphs (DAGs)
  • 2. Enforcement: A new programmable packet scheduling hierarchy designed for NICs
  • 3. Updating: A new expressive and efficient OS/NIC interface

Implementation and Evaluation: BESS prototype and CloudLab

SLIDE 24

What scheduling policies are needed for performance isolation? How should policies be specified?

SLIDE 25

Solution: Loom Policy DAG

Two types of nodes:

  • Scheduling nodes: Work-conserving policies for sharing the local link bandwidth
  • Shaping nodes: Rate-limiting policies for sharing the network core (WAN and DCN)

Programmability: Every node is programmable with a custom enqueue and dequeue function.

[Figure: the example policy DAG with shaping nodes (RL_WAN, RL_None) and scheduling nodes (Pri_1, FIFO_1, Fair_1) labeled]

Loom can express policies that cannot be expressed with either Linux Traffic Control (Qdisc) or with Domino (PIFO)! Important systems like BwE (sharing the WAN) and EyeQ (sharing the DCN) require Loom's policy DAG!
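
A minimal Python sketch of the two node types (class and method names are mine, not Loom's API; a software illustration, not the hardware design). A scheduling node is modeled as a PIFO, a priority queue that dequeues by smallest rank; a shaping node stamps packets with a wall-clock release time instead of ordering them:

```python
import heapq

class SchedulingNode:
    """Work-conserving node, modeled as a PIFO: dequeue returns the
    packet with the smallest rank. The rank chosen at enqueue time is
    what makes a node strict-priority, weighted-fair, FIFO, etc."""
    def __init__(self):
        self._pifo = []   # heap of (rank, seq, packet)
        self._seq = 0     # tie-breaker: FIFO among equal ranks

    def enqueue(self, pkt, rank):
        heapq.heappush(self._pifo, (rank, self._seq, pkt))
        self._seq += 1

    def dequeue(self):
        return heapq.heappop(self._pifo)[2] if self._pifo else None

class ShapingNode:
    """Rate-limiting node: instead of ordering packets, it assigns each
    one the wall-clock time at which it may be released."""
    def __init__(self, rate_bps):
        self.rate_bps = rate_bps
        self._next_free = 0.0

    def release_time(self, pkt_bytes, now):
        t = max(now, self._next_free)
        self._next_free = t + pkt_bytes * 8 / self.rate_bps
        return t

# Example: the Pri_1 node from the pseudocode as a strict-priority PIFO.
pri_1 = SchedulingNode()
pri_1.enqueue({"app": "Spark"}, rank=1)       # Tenant_1.Spark -> Pri_1:low
pri_1.enqueue({"app": "Memcached"}, rank=0)   # Tenant_1.Memcached -> Pri_1:high
print(pri_1.dequeue()["app"])                 # Memcached
```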

SLIDE 26

Types of Loom Scheduling Policies:

Scheduling:
  • All of the flows from competing Spark jobs J1 and J2 in VM1 fairly share network bandwidth (group by source)

Shaping:
  • All of the flows from VM1 to VM2 are rate limited to 50Gbps (group by destination)

Because scheduling and shaping policies may aggregate flows differently, they cannot be expressed as a tree!

SLIDE 29

Loom: Policy Abstraction

Policies are expressed as restricted directed acyclic graphs (DAGs).

[Figure: four example graphs (a)-(d) of shaping and scheduling nodes; (a) and (c) are allowed, (b) and (d) are not]

DAG restriction: Scheduling nodes form a tree when the shaping nodes are removed.

(b) and (d) are prevented because they allow parents to reorder packets that were already ordered by a child node.
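
The restriction can be checked mechanically. Below is a small sketch (helper names are mine; it checks only the single-parent condition, assuming the graph is already acyclic): splice shaping nodes out of the graph by following edges through them, then require that every scheduling node feeds at most one scheduling parent.

```python
def scheduling_parents(node, out_edges, is_shaping):
    """Scheduling nodes reachable from `node` through zero or more
    shaping nodes, i.e. its parents once shaping nodes are spliced out."""
    found, stack, seen = set(), list(out_edges.get(node, [])), set()
    while stack:
        p = stack.pop()
        if p in seen:
            continue
        seen.add(p)
        if is_shaping.get(p, False):
            stack.extend(out_edges.get(p, []))   # pass through shaping node
        else:
            found.add(p)
    return found

def satisfies_restriction(nodes, edges, is_shaping):
    """Loom's DAG restriction (in-degree part): with shaping nodes
    removed, every scheduling node has at most one scheduling parent."""
    out_edges = {}
    for child, parent in edges:   # packets flow child -> parent -> Wire
        out_edges.setdefault(child, []).append(parent)
    return all(
        len(scheduling_parents(n, out_edges, is_shaping)) <= 1
        for n in nodes
        if not is_shaping.get(n, False)
    )

# Case (b) from the slide: one child scheduled by two parents -> rejected.
print(satisfies_restriction(
    nodes=["Child", "P1", "P2"],
    edges=[("Child", "P1"), ("Child", "P2")],
    is_shaping={},
))  # False
```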

SLIDE 31

Outline

Contributions:
  • 1. Specification: A new network policy abstraction: restricted directed acyclic graphs (DAGs)
  • 2. Enforcement: A new programmable packet scheduling hierarchy designed for NICs
  • 3. Updating: A new expressive and efficient OS/NIC interface

SLIDE 32

How do we build a NIC that can enforce Loom's new DAG abstraction?

SLIDE 33

Loom Enforcement Challenge

No existing hardware scheduler can efficiently enforce Loom Policy DAGs.

                      Scheduling queues   Shaping queues
  Domino PIFO Block         1x                 1x
  New PIFO Block?           1x                 Nx

Requiring separate shaping queues for every shaping traffic class would be prohibitive!

SLIDE 34

Insight: All shaping can be done with a single queue because all shaping can use wall clock time as a rank

SLIDE 35

Loom Enforcement

In Loom, scheduling and shaping queues are separate:

1. All traffic is first placed only in scheduling queues
2. If a packet is dequeued before its shaping time, it is placed in a global shaping queue
3. After shaping, the packet is placed back in the scheduling queues

[Figure: flows F1-F3 moving between scheduling queues for the classes Mem (25Gbps rate limit), Mem (no limit), and Spark (no limit) under Pri, and the single global shaping queue]
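
A minimal software sketch of this three-step flow (my illustration, not Loom's hardware implementation). Scheduling queues form a PIFO ordered by rank; the single global shaping queue is ordered by wall-clock release time, which is exactly the insight from the previous slide:

```python
import heapq
import time

class LoomQueues:
    """Sketch: scheduling PIFO (min-heap on rank) plus one global
    shaping queue keyed by each packet's wall-clock release time."""

    def __init__(self):
        self._sched = []   # (rank, seq, pkt)
        self._shape = []   # (release_time, seq, pkt)
        self._seq = 0

    def enqueue(self, pkt, rank):
        # Step 1: all traffic first enters a scheduling queue.
        pkt["rank"] = rank
        heapq.heappush(self._sched, (rank, self._seq, pkt))
        self._seq += 1

    def dequeue(self, now=None):
        now = time.monotonic() if now is None else now
        # Step 3: packets whose shaping time has passed re-enter scheduling.
        while self._shape and self._shape[0][0] <= now:
            _, _, pkt = heapq.heappop(self._shape)
            self.enqueue(pkt, pkt["rank"])
        while self._sched:
            _, _, pkt = heapq.heappop(self._sched)
            if pkt.get("shaping_time", 0.0) > now:
                # Step 2: dequeued before its shaping time -> park it in
                # the single shaping queue (rank = wall-clock time).
                heapq.heappush(self._shape, (pkt["shaping_time"], self._seq, pkt))
                self._seq += 1
            else:
                return pkt
        return None

q = LoomQueues()
q.enqueue({"flow": "F1", "shaping_time": 5.0}, rank=0)  # rate-limited class
q.enqueue({"flow": "F2"}, rank=1)                       # unshaped class
print(q.dequeue(now=1.0)["flow"])   # F2 (F1 parked until t=5.0)
print(q.dequeue(now=6.0)["flow"])   # F1 (released after shaping)
```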

SLIDE 44

Outline

Contributions:
  • 1. Specification: A new network policy abstraction: restricted directed acyclic graphs (DAGs)
  • 2. Enforcement: A new programmable packet scheduling hierarchy designed for NICs
  • 3. Updating: A new expressive and efficient OS/NIC interface

SLIDE 45

PCIe Limitations

NIC doorbell and update limitations:¹

  • Latency: 120-900ns
  • Throughput: ~3Mops (Intel XL710 40Gbps)

Loom Goal: Less than 1Mops @ 100Gbps

[Figure: per-queue doorbells (DB1-DB4, DB_F) crossing PCIe between the app core and the NIC's PCIe engine]

¹ PSPAT: Software Packet Scheduling at Hardware Speed. Luigi Rizzo, Paolo Valente, Giuseppe Lettieri, Vincenzo Maffione. Univ. di Pisa and Univ. di Modena e Reggio Emilia.

SLIDE 50

Loom Efficient Interface Challenges

Insufficient data:
  • Before reading any packet data (headers), the NIC must schedule DMA reads for a queue

Too many PCIe writes:
  • In the worst case (every packet is from a new flow), the OS must generate 2 PCIe writes per packet
  • 2 writes per 1500B packet at 100Gbps = 16.6 Mops!
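
The 16.6 Mops figure is straightforward arithmetic; the short check below just re-derives it (Python prints 16.7 because the exact value is 16.67):

```python
LINE_RATE_BPS = 100e9    # 100 Gbps
PKT_BYTES = 1500
WRITES_PER_PKT = 2       # worst case per the slide: every packet is a new flow

pps = LINE_RATE_BPS / (PKT_BYTES * 8)   # ~8.33M packets per second
writes = WRITES_PER_PKT * pps           # ~16.7M PCIe writes per second
print(f"{pps / 1e6:.1f} Mpps -> {writes / 1e6:.1f} Mops of PCIe writes")
```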

SLIDE 51

Loom Design

Loom introduces a new efficient OS/NIC interface that reduces the number of PCIe writes through batched updates and inline metadata

SLIDE 52

Batched Doorbells

Using on-NIC Doorbell FIFOs allows for updates to different queues (flows) to be batched.

Per-core FIFOs still enable parallelism.

[Figure: each app core writes batched doorbells into a per-core Doorbell FIFO on the NIC over PCIe]
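
As a rough software analogy (names invented for the sketch; the real mechanism is a PCIe write into NIC memory), a per-core doorbell FIFO lets one write publish updates for many queues, instead of one doorbell write per queue:

```python
from collections import deque

class DoorbellFIFO:
    """Software analogy of a per-core on-NIC doorbell FIFO.

    Instead of one PCIe write per queue update, the driver appends
    (queue_id, tail_index) entries locally and flushes the whole batch
    with a single write; the NIC drains the entries in order."""
    def __init__(self):
        self.pending = []          # updates accumulated by the core
        self.fifo = deque()        # entries visible to the NIC

    def ring(self, queue_id, tail):
        self.pending.append((queue_id, tail))

    def flush(self):
        # One "PCIe write" publishes every pending update at once.
        self.fifo.extend(self.pending)
        n, self.pending = len(self.pending), []
        return n                   # updates delivered by this write

db = DoorbellFIFO()
for q in range(8):                 # eight new flows, eight queue updates
    db.ring(queue_id=q, tail=1)
print(db.flush(), "queue updates in a single doorbell write")
```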

SLIDE 59

Inline Metadata

Scheduling metadata (traffic class and scheduling updates) is inlined to reduce PCIe writes.

Descriptor inlining allows for scheduling before reading packet data.

[Figure: descriptors with inlined metadata flowing from memory queues (Q1-Q5, Q_F) through the DMA engine to the NIC's PIFOs and the wire]
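
To illustrate descriptor inlining (the field layout below is invented for the sketch, not Loom's actual descriptor format): because the scheduling inputs travel in the descriptor itself, the NIC can rank a packet in its PIFOs before DMA-reading any packet bytes.

```python
import heapq
from dataclasses import dataclass

@dataclass
class TxDescriptor:
    """Sketch: a TX descriptor carrying inlined scheduling metadata."""
    buf_addr: int         # host-memory address of the packet bytes
    buf_len: int
    traffic_class: int    # inlined: policy-DAG class for this packet

pifo = []                 # NIC-side PIFO: (rank, seq, descriptor)
seq = 0

def nic_enqueue(desc):
    """Rank the packet from inlined fields alone -- no DMA read of the
    packet data is needed before scheduling (here: rank = class)."""
    global seq
    heapq.heappush(pifo, (desc.traffic_class, seq, desc))
    seq += 1

nic_enqueue(TxDescriptor(buf_addr=0x1000, buf_len=1500, traffic_class=2))
nic_enqueue(TxDescriptor(buf_addr=0x2000, buf_len=64, traffic_class=0))
print(heapq.heappop(pifo)[2].traffic_class)   # 0: scheduled before any DMA
```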

SLIDE 62

Outline

Contributions:
  • 1. A new network policy abstraction: restricted directed acyclic graphs (DAGs)
  • 2. A new programmable packet scheduling hierarchy designed for NICs
  • 3. A new expressive and efficient OS/NIC interface

Evaluation:
  • Implementation and Evaluation: BESS prototype and CloudLab

SLIDE 63

Loom Implementation

Software prototype of Loom in Linux on the Berkeley Extensible Software Switch (BESS)¹

http://github.com/bestephe/loom

The C++ PIFO² implementation is used for scheduling; 10Gbps and 40Gbps CloudLab evaluation.

² Programmable Packet Scheduling at Line Rate. Anirudh Sivaraman, Suvinay Subramanian, Mohammad Alizadeh, Sharad Chole, Shang-Tse Chuang, Anurag Agrawal, Hari Balakrishnan, Tom Edsall, Sachin Katti, Nick McKeown. MIT CSAIL, Barefoot Networks, Cisco Systems, Stanford University.

SLIDE 64

Loom Evaluation

Can Loom drive line rate? Can Loom enforce network policies?
  • Experiment: Microbenchmarks with iPerf

Can Loom isolate real applications?
  • Experiment: CloudLab experiments with memcached and Spark

How effective is Loom's efficient OS/NIC interface?
  • Experiment: Analysis of PCIe writes in Linux (QPF) versus Loom

SLIDE 65

Loom 40Gbps Evaluation

Setup:
  • Every 2s a new tenant starts or stops
  • Each tenant i starts 4^i flows (4-256 total flows)

Policy: All tenants (T1-T4) should receive an equal share (Fair).

[Plots: per-tenant throughput (Gbps) vs. time (s) for T1-T4 under SQ, MQ, and Loom]

Loom can drive line-rate and isolate competing tenants and flows.

SLIDE 67

Application Performance: Fairness

Spark vs. Spark: two bandwidth-hungry jobs.

Policy: Bandwidth is fairly shared between Spark jobs (Fair).

[Plots: throughput (Gbps) of Job1 and Job2 over time (s) under Linux and Loom]

Loom can ensure competing jobs share bandwidth even if they have different numbers of flows.

SLIDE 69

Application Performance: Latency

Memcached (latency sensitive) vs. Spark (bandwidth hungry).

Setup: Linux software packet scheduling (Qdisc) is configured to prioritize memcached traffic over Spark traffic (Pri).

[Plot: 90th percentile latency (us) for Loom vs. Linux (MQ)]

MQ cannot isolate latency-sensitive applications!

slide-71
SLIDE 71

Loom Interface Evaluation

111

Line-rate Existing approaches: PCIe Writes per second Loom: PCIe Writes per second 10 Gbps 833K 19K 40 Gbps 3.3M 76K 100 Gbps 8.3M 191K

Worse case scenario: Packets are sent in 64KB batches and each packet is from a different flow

slide-72
SLIDE 72

Loom Interface Evaluation

112

Line-rate Existing approaches: PCIe Writes per second Loom: PCIe Writes per second 10 Gbps 833K 19K 40 Gbps 3.3M 76K 100 Gbps 8.3M 191K

Worse case scenario: Packets are sent in 64KB batches and each packet is from a different flow

Loom Goal: Less than 1Mops @ 100Gbps
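
These figures are consistent with one PCIe write per 1500B packet for existing approaches versus one write per 64KB batch for Loom; the quick check below is my reconstruction of that arithmetic, not taken from the paper:

```python
def writes_per_sec(rate_gbps, bytes_per_write):
    # One doorbell write per `bytes_per_write` bytes on the wire.
    return rate_gbps * 1e9 / (bytes_per_write * 8)

for gbps in (10, 40, 100):
    existing = writes_per_sec(gbps, 1500)       # per-packet doorbells
    loom = writes_per_sec(gbps, 64 * 1024)      # one doorbell per 64KB batch
    print(f"{gbps:>3} Gbps: existing ~{existing/1e3:.0f}K/s, Loom ~{loom/1e3:.0f}K/s")
```

This reproduces the table (833K/19K at 10 Gbps, 3333K/76K at 40 Gbps, 8333K/191K at 100 Gbps), putting Loom well under the 1Mops goal.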

SLIDE 73

Conclusion

  • Current NICs cannot ensure that competing applications are isolated
  • Loom is a new NIC design that completely offloads all packet scheduling to the NIC with low CPU overhead
  • Loom's benefits translate into reductions in latency, increases in throughput, and improvements in fairness

SLIDE 74

Related Work (Eiffel)

  • Eiffel: NIC scheduling does not eliminate the need for software scheduling
  • Loom and Eiffel can be used together
  • Bucketed priority queues could be used to build efficient PIFOs