Processing of hardware interrupts in Linux, Petr Holášek, Red Hat


SLIDE 1

Processing of hardware interrupts in Linux

Petr Holášek, Red Hat August 17, 2015

SLIDE 2

HW and kernel

SLIDE 3

Interrupt

  • Hardware interrupt vs softIRQ
  • IRQ: Interrupt ReQuest from hardware
  • Represented in the system as an interrupt vector
  • Pin-based vs MSI(-X)
SLIDE 4

Pin-based IRQ

  • Triggered by an electrical signal
  • Pin can be shared
  • Possible race condition
SLIDE 5

MSI(-X)

  • Message Signaled Interrupts
  • Introduced with PCI 2.2
  • Triggered after write to an address
  • Improved version called MSI-X
SLIDE 6

Interrupt controller

  • APIC
  • LAPIC (local APIC) - at CPU
  • IOAPIC (I/O APIC) - on the device side (in the chipset)
  • Using system bus
  • APIC bus deprecated
SLIDE 7

Interrupt handler

  • Handles received interrupt
  • Need for speed
  • Most of the work deferred
  • Using tasklets or workqueues
SLIDE 8

Interrupt handler

  • Multiple CPUs cannot run one interrupt handler in parallel
  • Only one interrupt handler runs on a CPU at a time
  • CPUs can take turns running the handler
SLIDE 9

userspace

SLIDE 10

Kernel interfaces

  • The only information visible to the user
  • /proc/interrupts
  • /proc/irq/<irqnum>/...
  • /sys/devices/…/irq
  • /proc/stat
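
As a rough illustration of what these interfaces expose, the sketch below parses text in the layout of /proc/interrupts into per-CPU counters. The function name `parse_interrupts` and the `SAMPLE` text are made up for this example; real files have more columns and rows.

```python
from typing import Dict, List

# Hypothetical excerpt in the layout of /proc/interrupts (two CPUs).
SAMPLE = """\
           CPU0       CPU1
  0:         42          0   IO-APIC   2-edge      timer
 24:       1000     250000   PCI-MSI 524288-edge   eth0
NMI:          3          4   Non-maskable interrupts
ERR:          0
"""


def parse_interrupts(text: str) -> Dict[str, List[int]]:
    """Map each IRQ label to its list of per-CPU interrupt counts."""
    lines = text.splitlines()
    # The header row names the CPUs that have a column: "CPU0 CPU1 ...".
    ncpus = len(lines[0].split())
    counts: Dict[str, List[int]] = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        per_cpu: List[int] = []
        # Take the leading numeric columns only; summary rows such as ERR
        # carry a single total, and the trailing text is a description.
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break
            per_cpu.append(int(field))
        counts[label.strip()] = per_cpu
    return counts


print(parse_interrupts(SAMPLE)["24"])  # [1000, 250000]
```

On a Linux machine the same function can be fed the contents of /proc/interrupts read as text.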
SLIDE 11

Interrupt affinity

  • Mask of possibly receiving processors
  • /proc/irq/X/smp_affinity
  • Hexadecimal mask (smp_affinity) or CPU list (smp_affinity_list)
  • The mask alone says little about which CPU actually handles the interrupt
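
To make the mask format concrete, here is a small sketch that converts between the hexadecimal form and a list of CPU numbers. The helper names `mask_to_cpus` and `cpus_to_mask` are invented for this example; it assumes the usual layout where long masks are printed as comma-separated 32-bit groups, most significant group first.

```python
from typing import Iterable, List


def mask_to_cpus(mask: str) -> List[int]:
    """Decode a hexadecimal affinity mask such as "00000001,0000000f"."""
    # Commas only separate 32-bit groups, so they can be dropped before parsing.
    value = int(mask.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def cpus_to_mask(cpus: Iterable[int]) -> str:
    """Encode CPU numbers as a hexadecimal mask in 32-bit comma groups."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    digits = f"{value:x}"
    # Pad to a multiple of 8 hex digits so every group is a full 32 bits.
    width = max(8, -(-len(digits) // 8) * 8)
    digits = digits.zfill(width)
    return ",".join(digits[i:i + 8] for i in range(0, width, 8))


print(mask_to_cpus("00000001,0000000f"))  # [0, 1, 2, 3, 32]
print(cpus_to_mask([2, 3]))               # 0000000c
```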
SLIDE 12

Interrupt distribution

  • Should be done on multiprocessor systems
  • Storage devices, NICs
  • Risk of CPU overload or cache misses
SLIDE 13

Hardware topology

  • NUMA node
  • Package
  • Cache domain – L2 or L3
  • CPU
  • numactl tool
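
Besides numactl, the topology is also visible through sysfs files such as /sys/devices/system/node/node0/cpulist, which hold CPU ranges like "0-3,8-11". The sketch below, with the hypothetical helper `parse_cpulist`, expands that range notation; the same notation is used by smp_affinity_list.

```python
from typing import List


def parse_cpulist(text: str) -> List[int]:
    """Expand a kernel CPU list such as "0-3,8-11" into CPU numbers."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            # An empty list is possible, e.g. for a node without CPUs.
            continue
        first, sep, last = part.partition("-")
        if sep:
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(first))
    return cpus


print(parse_cpulist("0-3,8-11"))  # [0, 1, 2, 3, 8, 9, 10, 11]
```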
SLIDE 14

Optimal affinity layout

  • Identify and group all high-volume interrupts
  • Move them to unique single CPUs
  • Spread out lower-volume interrupts among other CPUs
  • Do it within the device NUMA node
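
One possible reading of these steps, as a minimal sketch: the function `plan_affinity`, the rate threshold, and the round-robin spreading are all assumptions made for illustration, not something the slides prescribe.

```python
from typing import Dict, List


def plan_affinity(rates: Dict[str, float], node_cpus: List[int],
                  high_threshold: float) -> Dict[str, List[int]]:
    """Assign each IRQ a CPU taken from the device's NUMA node."""
    if not node_cpus:
        raise ValueError("the device's NUMA node has no CPUs to pin to")
    # Busiest first; ties broken by name so the plan is deterministic.
    ordered = sorted(rates, key=lambda irq: (-rates[irq], irq))
    high = [irq for irq in ordered if rates[irq] >= high_threshold]
    low = [irq for irq in ordered if rates[irq] < high_threshold]

    # One dedicated CPU per high-volume IRQ, as far as CPUs are available.
    dedicated = node_cpus[:len(high)]
    # CPUs left for everything else; if none remain, share the whole node.
    shared = node_cpus[len(high):] or node_cpus

    plan: Dict[str, List[int]] = {}
    for irq, cpu in zip(high, dedicated):
        plan[irq] = [cpu]
    # High-volume IRQs that got no CPU of their own join the shared pool.
    rest = high[len(dedicated):] + low
    for index, irq in enumerate(rest):
        plan[irq] = [shared[index % len(shared)]]
    return plan


# Hypothetical interrupt rates (interrupts per second) for one NUMA node.
rates = {"eth0-rx": 90000, "nvme0q1": 60000, "timer": 300, "usb": 12, "ahci": 5}
print(plan_affinity(rates, node_cpus=[0, 1, 2, 3], high_threshold=10000))
```

Here the two busy IRQs get CPUs 0 and 1 to themselves, and the three quiet ones alternate between CPUs 2 and 3.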
SLIDE 15

irqbalance

SLIDE 16

Irqbalance

  • Interrupt load-balancing daemon
  • Can improve performance and save power
  • https://github.com/Irqbalance/irqbalance
  • Support for NUMA
SLIDE 17

Irqbalance basics

  • Balancing of interrupts is a complex task
  • Periodic review of the system
  • Affinity management among heterogeneous systems

SLIDE 18

Irqbalance basics 2

  • Don't migrate interrupt out of home NUMA node
  • CPU load - time spent in interrupt and softIRQ context
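
The load definition above can be approximated from two snapshots of /proc/stat, where the per-CPU rows list ticks in the order user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice. The sketch below is only an illustration of that definition; the names `irq_time` and `irq_load` are invented here, and how irqbalance itself computes the load is not shown in the slides.

```python
from typing import Dict, Tuple


def irq_time(stat_text: str) -> Dict[str, Tuple[int, int]]:
    """Return {cpuN: (irq + softirq ticks, total ticks)} from /proc/stat text."""
    result: Dict[str, Tuple[int, int]] = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep per-CPU rows (cpu0, cpu1, ...) and skip the aggregate "cpu" row.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(field) for field in fields[1:]]
        # Guest time is already counted in user time, so the total stops at steal.
        result[fields[0]] = (ticks[5] + ticks[6], sum(ticks[:8]))
    return result


def irq_load(before: str, after: str) -> Dict[str, float]:
    """Fraction of elapsed time each CPU spent in irq and softirq context."""
    old, new = irq_time(before), irq_time(after)
    load: Dict[str, float] = {}
    for cpu, (busy, total) in new.items():
        if cpu not in old:
            continue
        elapsed = total - old[cpu][1]
        load[cpu] = (busy - old[cpu][0]) / elapsed if elapsed > 0 else 0.0
    return load
```

Calling `irq_load` with the text of /proc/stat read a few seconds apart gives one value between 0 and 1 per CPU.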

SLIDE 19

Irqbalance algorithm 1

  • Parse all available interfaces
  • Evaluate overloaded processors
  • Evaluate the busiest IRQs
  • Rebalance IRQ on processors
SLIDE 20

Irqbalance algorithm 2

  • Set new smp_affinity values
  • Wait some time and repeat
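
The loop outlined on these two slides can be sketched in a heavily simplified form. This is not irqbalance's actual implementation: the functions `irq_deltas` and `rebalance` are hypothetical, the placement is a plain greedy heuristic (busiest IRQ first, always onto the least-loaded CPU), and NUMA locality is ignored.

```python
from typing import Dict, List


def irq_deltas(before: Dict[str, List[int]],
               after: Dict[str, List[int]]) -> Dict[str, int]:
    """Interrupts raised per IRQ between two snapshots of per-CPU counters."""
    return {irq: sum(counts) - sum(before.get(irq, []))
            for irq, counts in after.items()}


def rebalance(deltas: Dict[str, int], cpus: List[int]) -> Dict[str, int]:
    """Greedy pass: place the busiest IRQs first on the least-loaded CPU."""
    load = {cpu: 0 for cpu in cpus}
    placement: Dict[str, int] = {}
    for irq in sorted(deltas, key=lambda name: (-deltas[name], name)):
        # Ties go to the lowest CPU number to keep the result deterministic.
        cpu = min(cpus, key=lambda candidate: (load[candidate], candidate))
        placement[irq] = cpu
        load[cpu] += deltas[irq]
    return placement


# Hypothetical counters taken one interval apart on a two-CPU machine.
before = {"24": [100, 0], "25": [0, 50], "26": [10, 10], "27": [0, 0]}
after = {"24": [1000, 0], "25": [0, 550], "26": [210, 210], "27": [50, 50]}
print(rebalance(irq_deltas(before, after), cpus=[0, 1]))
```

A daemon would write each chosen CPU to the matching smp_affinity file, sleep for a while, and then repeat with fresh counters.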
SLIDE 21

Irqbalance options

  • Can respect affinity_hint set by driver
  • Can ignore selected IRQs
  • Can ignore isolated CPUs
SLIDE 22

Alternatives to irqbalance

SLIDE 23

“Premature optimization is the root of all evil.” Donald E. Knuth

SLIDE 24

Manual pinning

  • Recent irqbalance 1.x addresses most of the discovered bugs
  • But sometimes manual pinning is still better
  • Real-time, HPC
SLIDE 25

Rules of manual pinning

  • Don't set affinity mask to all CPUs
  • Move affinity to device rather than to process
  • Let the scheduler do its work
  • Consider faulty hardware
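
A minimal sketch of manual pinning that follows the first rule: it builds the write to /proc/irq/<irqnum>/smp_affinity_list and refuses a request that covers every online CPU. The helper names `affinity_request` and `apply_request`, and the `online_cpus` parameter, are assumptions made for this example.

```python
from pathlib import PurePosixPath
from typing import Iterable, Tuple


def affinity_request(irq: int, cpus: Iterable[int], online_cpus: Iterable[int],
                     proc_root: str = "/proc") -> Tuple[str, str]:
    """Return (file path, CPU list string) for pinning one IRQ."""
    wanted = sorted(set(cpus))
    online = set(online_cpus)
    if not wanted:
        raise ValueError("no CPUs given")
    if not set(wanted) <= online:
        raise ValueError("some requested CPUs are not online")
    # Rule from the slide: do not set the affinity to all CPUs.
    if set(wanted) == online and len(online) > 1:
        raise ValueError("refusing to spread the IRQ over all CPUs")
    path = PurePosixPath(proc_root) / "irq" / str(irq) / "smp_affinity_list"
    return str(path), ",".join(str(cpu) for cpu in wanted)


def apply_request(path: str, value: str) -> None:
    """Write the CPU list; this needs root, and the kernel may reject it."""
    with open(path, "w") as handle:
        handle.write(value + "\n")


print(affinity_request(24, [3, 2], online_cpus=range(4)))
# ('/proc/irq/24/smp_affinity_list', '2,3')
```

If irqbalance is running, it may later overwrite a manually written value unless that IRQ is excluded from balancing.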
SLIDE 26

Kernel IRQ balancing

  • Dropped by commit 8b8e8c in 2008
  • Return is not planned so far
  • Interrupt locality ideas
SLIDE 27

Give irqbalance a second chance

  • Explore recent version
  • Some new features are coming soon
  • Try to compare manual pinning and irqbalance