SLIDE 1

Linux Kernel Issues in End Host Systems

Wenji Wu, Matt Crawford

US-LHC End-to-End Networking Meeting
Fermi National Accelerator Lab, 2006
wenji@fnal.gov; crawdad@fnal.gov

SLIDE 2

Topics

  • Background
  • Linux 2.6 Characteristics
  • Kernel Memory Layout vs. Packet Receiving
  • Kernel Preemptivity vs. Linux TCP Performance
  • Interactivity vs. Fairness in Networked Linux Systems

SLIDE 3

Background

What, where, and how are the bottlenecks of network applications: the networks, or the network end systems?

Linux is widely used in the HEP community.

SLIDE 4

Linux 2.6 Characteristics

  • Preemptible kernel
  • O(1) scheduler
  • Improved interactivity, more responsive
  • Improved fairness
  • Improved scalability

SLIDE 5

Kernel Memory Layout vs. Packet Receiving

SLIDE 6

Linux Networking subsystem: Packet Receiving Process

  • Stage 1: NIC & Device Driver
    - The packet is transferred from the network interface card to the ring buffer
  • Stage 2: Kernel Protocol Stack
    - The packet is transferred from the ring buffer to a socket receive buffer
  • Stage 3: Data Receiving Process
    - The packet is copied from the socket receive buffer to the application

[Figure: the packet receiving path - traffic source → NIC hardware → (DMA) → ring buffer → (SoftIRQ: IP and TCP/UDP processing in the kernel protocol stack) → socket receive buffer → (sys_call, process scheduler) → data receiving process → traffic sink]
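
For concreteness, here is a minimal userspace sketch (not from the slides) of the Stage 3 side: a traffic-sink receiver whose recv() calls drive the copy from the socket receive buffer into application memory. The port number is an arbitrary assumption (iperf's default).

    /* Minimal traffic sink: each recv() copies queued data out of the
     * socket receive buffer; if the application falls behind, that buffer
     * fills and TCP's advertised window closes. Error checking omitted. */
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void)
    {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(5001);           /* assumed port (iperf default) */

        bind(lfd, (struct sockaddr *)&addr, sizeof addr);
        listen(lfd, 1);
        int cfd = accept(lfd, NULL, NULL);

        char buf[65536];
        while (recv(cfd, buf, sizeof buf, 0) > 0)
            ;                                   /* discard: act as traffic sink */

        close(cfd);
        close(lfd);
        return 0;
    }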

SLIDE 7

Experiment Settings

[Figure: Fermi test network - Sender --1 G-- Cisco 6509 --10 G-- Cisco 6509 --1 G-- Receiver]

Sender & Receiver Features

                   Sender                             Receiver
    CPU            Two Intel Xeon CPUs (3.0 GHz)      One Intel Pentium II CPU (350 MHz)
    System Memory  3829 MB                            256 MB
    NIC            Tigon, 64-bit PCI slot at 66 MHz,  Syskonnect, 32-bit PCI slot at 33 MHz,
                   1 Gbps, twisted pair               1 Gbps, twisted pair

  • Run iperf to send data in one direction between the two computer systems
  • We have added instrumentation within the Linux packet receiving path
  • The Linux kernel is compiled as background system load by running make -j n
  • The receive buffer size is set to 40 MB

SLIDE 8

Receive ring buffer

  • The total number of packet descriptors in the NIC's reception ring buffer is 384.
  • The receive ring buffer can run out of packet descriptors: a performance bottleneck!

[Figure: instrumented trace of the receive ring running out of packet descriptors]
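
A hedged aside (not in the original deck): the ring's descriptor counts can be read from userspace with the ETHTOOL_GRINGPARAM ioctl, the same numbers ethtool -g reports; "eth0" is an assumed interface name.

    /* Query the NIC's RX ring size via the ethtool ioctl interface. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <linux/ethtool.h>
    #include <linux/sockios.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct ethtool_ringparam ring = { .cmd = ETHTOOL_GRINGPARAM };
        struct ifreq ifr;

        memset(&ifr, 0, sizeof ifr);
        strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);   /* assumed interface */
        ifr.ifr_data = (char *)&ring;

        if (ioctl(fd, SIOCETHTOOL, &ifr) == 0)
            printf("rx descriptors: %u configured, %u hardware max\n",
                   ring.rx_pending, ring.rx_max_pending);
        return 0;
    }

A larger ring (where the hardware allows it) postpones the point at which descriptors run out under bursty load.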

SLIDE 9

Various TCP Receive Buffer Queues

[Figure: occupancy of the various TCP receive buffer queues over time, with zoomed-in views, at background load 0 and background load 10]

What do the results mean?

The receive buffer size is set to 40 MB.

SLIDE 10

How to configure socket receive buffer size?

  • We usually configure the socket receive buffer to the bandwidth-delay product (BDP).
  • In the real world, system administrators often configure /proc/sys/net/ipv4/tcp_rmem high to accommodate high-BDP connections.
  • What could be wrong? (A sketch of the BDP sizing follows.)
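
A minimal sketch (not from the slides) of sizing the receive buffer to the BDP with setsockopt(). The 1 Gbps rate and 100 ms RTT are assumed example numbers; the kernel caps the request at net.core.rmem_max, doubles it internally for bookkeeping overhead, and on later 2.6 kernels an explicit SO_RCVBUF disables receive-buffer autotuning.

    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void)
    {
        double rtt  = 0.100;                 /* assumed 100 ms round-trip time */
        double rate = 1e9;                   /* assumed 1 Gbps bottleneck      */
        int bdp = (int)(rate * rtt / 8.0);   /* 12.5 MB of data in flight      */

        int s = socket(AF_INET, SOCK_STREAM, 0);
        setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bdp, sizeof bdp);

        int granted; socklen_t len = sizeof granted;
        getsockopt(s, SOL_SOCKET, SO_RCVBUF, &granted, &len);
        printf("requested %d bytes, kernel granted %d\n", bdp, granted);
        return 0;
    }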

SLIDE 11

Linux Virtual Address Layout

[Figure: the 3G/1G partition - 3 GB of user space and 1 GB of kernel space, both within the scope of a single process' page table]

This is the way Linux partitions a 32-bit address space: one page table covers the user and kernel address spaces at the same time.

Advantage:
  • Incurs no extra overhead (no TLB flushing) for system calls.

Disadvantage:
  • With 64 GB of RAM, mem_map alone takes up 512 MB of memory from lowmem (ZONE_NORMAL).
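
The 512 MB figure checks out with back-of-envelope arithmetic, assuming 4 KB pages and the roughly 32-byte struct page of 2.6-era 32-bit kernels:

    64 GB / 4 KB per page              = 16 M page frames
    16 M frames x 32 B per struct page = 512 MB of mem_map in lowmem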

SLIDE 12

Partition of Physical Memory (Zone)

This figure shows the partition of physical memory and its mapping to virtual addresses in the 3G/1G layout.

[Figure: ZONE_DMA (0-16 MB) and ZONE_NORMAL (16-896 MB) are direct-mapped into kernel virtual addresses starting at 0xC0000000; ZONE_HIGHMEM (896 MB to the end of memory) is mapped indirectly, through the vmalloc and kmap areas (0xF8000000-0xFFFFFFFF) and the kernel page table]

SLIDE 13

Kernel Preemptivity vs. Linux TCP Performance

SLIDE 14

Preemptivity vs. Linux TCP Performance Experiment Settings

(Testbed, hardware, and configuration are identical to Slide 7: iperf sends data in one direction, the Linux packet receiving path is instrumented, the kernel is compiled as background system load with make -j n, and the receive buffer size is set to 40 MB.)

SLIDE 15

[Figure: tcptrace time-sequence diagram from the sender side, at background load 10]

What, Why, and How?

SLIDE 16

Kernel Protocol Stack – TCP

TCP Processing - Interrupt context (softirq):

  • Packets are DMA'd from the NIC hardware into the ring buffer, go through IP processing, and reach tcp_v4_do_rcv().
  • If the socket is locked, the packet is placed on the backlog queue; otherwise, if a receiving task exists, it is placed on the prequeue.
  • On the fast path, in-sequence data is copied directly to the user iovec; on the slow path, in-sequence data goes to the receive queue and out-of-sequence data to the out-of-sequence queue.

TCP Processing - Process context (tcp_recvmsg(), entered from the application through a sys_call):

  • Data is copied from the receive queue to the user-space iov until the receive queue is empty.
  • If the prequeue is not empty, it is processed by tcp_prequeue_process(); the backlog is processed by sk_backlog_rcv() when release_sock() is called; otherwise tcp_recvmsg() returns or sleeps in sk_wait_data().

Except in the case of prequeue overflow, the Prequeue and Backlog queues are processed within the process context!

[Figure: flowchart of the TCP receive path across the interrupt and process contexts]
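
To make the draining order concrete, here is a toy userspace simulation (a sketch, not the kernel source); the three queues are modeled as plain counters.

    #include <stdio.h>

    /* Toy model of the three receive-side queues */
    struct sock_sim {
        int receive_queue;   /* in-sequence data, ready to copy      */
        int prequeue;        /* deferred to the receiving process    */
        int backlog;         /* arrived while the socket was locked  */
    };

    static void drain(const char *name, int *q)
    {
        while (*q > 0) {
            printf("copy one segment from the %s to the user iovec\n", name);
            (*q)--;
        }
    }

    /* The order mirrors the process-context path above: receive queue
     * first, then the prequeue (tcp_prequeue_process), then the backlog
     * (release_sock -> sk_backlog_rcv). */
    static void tcp_recvmsg_sim(struct sock_sim *sk)
    {
        drain("receive queue", &sk->receive_queue);
        drain("prequeue", &sk->prequeue);
        drain("backlog", &sk->backlog);
    }

    int main(void)
    {
        struct sock_sim sk = { 2, 1, 1 };
        tcp_recvmsg_sim(&sk);
        return 0;
    }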

SLIDE 17

Linux Scheduling Mechanism

[Figure: the O(1) runqueue - an active priority array and an expired priority array, each with 140 priority levels. The CPU runs the highest-priority task in the active array (here Task 1 at priority 3, ahead of Tasks 2 and 3 at priority 139). When a task's time slice runs out, its priority and time slice are recalculated and the task is moved to the expired array; when the active array is empty, the two arrays are swapped.]
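
A toy C sketch of the two-array mechanism (simplified; not kernel code). Each priority level is modeled as a counter, and the O(1) property comes from swapping two array pointers.

    #include <stdio.h>

    #define NPRIO 140            /* priorities 0 (highest) .. 139 (lowest) */

    /* Toy priority array: one counter per priority level instead of a
     * linked list of tasks. */
    struct prio_array {
        int count[NPRIO];
        int nr_running;
    };

    /* In the real kernel this is a find-first-set over a priority bitmap,
     * which is what makes picking the next task O(1). */
    static int pick_next(const struct prio_array *a)
    {
        for (int p = 0; p < NPRIO; p++)
            if (a->count[p] > 0)
                return p;
        return -1;
    }

    int main(void)
    {
        struct prio_array arr0 = {{0}, 0}, arr1 = {{0}, 0};
        struct prio_array *active = &arr0, *expired = &arr1;

        active->count[3] = 1;   active->nr_running += 1;  /* Task 1, prio 3 */
        active->count[139] = 2; active->nr_running += 2;  /* Tasks 2 and 3  */

        for (int round = 0; round < 2; round++) {
            while (active->nr_running > 0) {
                int p = pick_next(active);
                printf("run task at priority %d until its time slice expires\n", p);
                active->count[p]--;  active->nr_running--;
                /* recalculate priority and time slice, requeue on expired
                 * (toy model: priority unchanged) */
                expired->count[p]++; expired->nr_running++;
            }
            /* the O(1) array swap: exchange two pointers */
            struct prio_array *tmp = active; active = expired; expired = tmp;
            printf("-- active and expired arrays swapped --\n");
        }
        return 0;
    }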

SLIDE 18

Interactivity vs. Fairness in Networked Linux Systems

SLIDE 19

Interactivity vs. Fairness Experiment Settings

[Figure: Fermi test network - fast and slow senders on 1 G links, through two Cisco 6509s joined by a 10 G link, to the receiver on a 1 G link]

Sender & Receiver Features

                   Fast Sender                 Slow Sender                  Receiver
    CPU            Two Intel Xeon CPUs         One Intel Pentium IV CPU     One Intel Pentium III CPU
                   (3.0 GHz)                   (2.8 GHz)                    (1 GHz)
    System Memory  3829 MB                     512 MB                       512 MB
    NIC            Syskonnect, 32-bit PCI      Intel PRO/1000, 32-bit PCI   3Com 3C996B-T, 32-bit PCI
                   slot at 33 MHz, 1 Gbps,     slot at 33 MHz, 1 Gbps,      slot at 33 MHz, 1 Gbps,
                   twisted pair                twisted pair                 twisted pair

  • Run iperf to send data in one direction between the two computer systems
  • We have added instrumentation within the Linux kernel
  • The Linux kernel is compiled as background system load by running make -j n
  • The receive buffer size is set to 40 MB

SLIDE 20

What? Why? How?

    Load   Slow Sender                  Fast Sender
           Throughput    CPU Share      Throughput    CPU Share
    BL0    436 Mbps      78.489%        464 Mbps      99.228%
    BL1    443 Mbps      81.573%        241 Mbps      49.995%
    BL2    438 Mbps      80.613%        159 Mbps      34.246%
    BL4    430 Mbps      79.217%        97.0 Mbps     20.859%
    BL8    440 Mbps      81.093%        74.2 Mbps     15.375%

SLIDE 21

Linux Scheduling Mechanism

[Figure: the O(1) runqueue with active and expired priority arrays, repeated from Slide 17]

SLIDE 22

Network applications vs. interactivity

A sleep_avg is stored for each process: a process is credited for its sleep time and penalized for its runtime. A process with a high sleep_avg is considered interactive; one with a low sleep_avg is non-interactive.

Network packets arrive at the receiver independently and discretely, so a "relatively fast" non-interactive network process may frequently sleep to wait for packets. Though each sleep lasts only a short time, these wait-for-packet sleeps occur often, more than enough to earn interactive status.

The current Linux interactivity mechanism therefore makes it possible for a non-interactive network process to consume a high CPU share and at the same time be incorrectly categorized as "interactive."
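
A rough, simplified sketch of the sleep_avg heuristic (the constants and scaling below are illustrative stand-ins, not the real macros in kernel/sched.c). It shows how frequent short sleeps can keep a CPU-hungry receiver looking "interactive":

    #include <stdio.h>

    #define MAX_BONUS      10
    #define MAX_SLEEP_AVG  1000              /* ms; simplified stand-in */

    /* bonus grows with sleep_avg, echoing the 2.6 CURRENT_BONUS macro */
    static int bonus(int sleep_avg)
    {
        return sleep_avg * MAX_BONUS / MAX_SLEEP_AVG;
    }

    /* dynamic priority = static priority - bonus + MAX_BONUS/2, clamped
     * to the normal-task range 100..139 (lower number = higher priority) */
    static int effective_prio(int static_prio, int sleep_avg)
    {
        int prio = static_prio - bonus(sleep_avg) + MAX_BONUS / 2;
        if (prio < 100) prio = 100;
        if (prio > 139) prio = 139;
        return prio;
    }

    int main(void)
    {
        int sleep_avg = 0;

        /* 1000 cycles of "run 2 ms, then sleep 1 ms waiting for a packet":
         * two thirds of the CPU, yet the (scaled) sleep credit keeps
         * sleep_avg pinned near its maximum. */
        for (int i = 0; i < 1000; i++) {
            sleep_avg -= 2;                  /* penalized for runtime   */
            if (sleep_avg < 0) sleep_avg = 0;
            sleep_avg += 1 * 5;              /* credited for sleeping   */
            if (sleep_avg > MAX_SLEEP_AVG) sleep_avg = MAX_SLEEP_AVG;
        }
        printf("sleep_avg=%d -> dynamic prio %d (static prio 120)\n",
               sleep_avg, effective_prio(120, sleep_avg));
        return 0;
    }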

SLIDE 23

[Figure: runqueue snapshots for the slow-sender and fast-sender cases, based on the scheduling diagram from Slide 17]

With the slow sender, iperf on the receiver is always categorized as interactive. With the fast sender, iperf on the receiver is categorized as non-interactive most of the time.

SLIDE 24

Contacts

Wenji Wu, wenji@fnal.gov
Matt Crawford, crawdad@fnal.gov

Wide Area Systems, Fermilab, 2006
