Dingo: Taming Device Drivers



slide-1
SLIDE 1

Leonid Ryzhyk Peter Chubb Ihor Kuz Gernot Heiser UNSW, NICTA, Open Kernel Labs (Australia)

Dingo: Taming Device Drivers

slide-2
SLIDE 2

The problem with drivers

  • 70% of OS crashes are caused by device drivers
  • Drivers contain 1.5x-7x more bugs per line of code than the rest of the kernel

1 Ganapathi et al. Windows XP kernel crash analysis, 2006
2 Chou et al. An Empirical study of operating system errors, 2001

slide-3
SLIDE 3

Previous approaches

Dealing with faulty drivers

Runtime isolation: Mach, L4, Nooks, MINIX, XFI, SafeDrive, etc.

  • Performance overhead
  • Transparent recovery is hard

Static analysis: SLAM, MC, Singularity, etc.

  • Detects a limited subset of bugs
slide-4
SLIDE 4

The Dingo approach

Can we develop drivers that contain fewer bugs in the first place?

Localise complexity in driver development

  • Many driver bugs are provoked by the complexity of the OS interface
  • Reduce bugs by improving the design of this interface

slide-5
SLIDE 5

Dingo for Linux

[Diagram labels: Dingo drivers, Dingo runtime, native Linux drivers]

slide-6
SLIDE 6

A study of driver bugs

slide-7
SLIDE 7

A study of Linux driver bugs

Driver                                      #loc  #bugs

USB
  RTL8150 USB-to-Ethernet adapter            827     16
  EL1210a USB-to-Ethernet adapter            710      2
  KL5kusb101 USB-to-Ethernet adapter         925     15
  Generic USB network driver                1028     45
  USB hub                                   2234     67
  USB-to-serial converter                    989     50
  USB mass storage                           803     23
Firewire
  IEEE1394 Ethernet controller              1413     22
  SBP-2 transport protocol                  1713     46
PCI
  Mellanox InfiniHost InfiniBand adapter   11718    123
  BNX2 Ethernet adapter                     5412     51
  i810 frame buffer                         2920     16
  CMI8338 audio                             2660     22

Total                                               498

slide-8
SLIDE 8

A study of Linux driver bugs

[Diagram: a driver interacts with the OS through the OS protocol and with the device through the device protocol]

slide-9
SLIDE 9

A study of Linux driver bugs

[Diagram: a driver interacts with the OS through the OS protocol and with the device through the device protocol]

Device protocol violation examples:

  • Issuing a command to an uninitialised device
  • Writing an invalid register value
  • Incorrectly managing DMA descriptors

slide-10
SLIDE 10

Device protocol violations

  • Device protocol violations: 38%

slide-11
SLIDE 11

OS protocol violations

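[Diagram: a driver interacts with the OS through the OS protocol and with the device through the device protocol]

Example: Mellanox InfiniHost controller driver (states shown: RESET, READY)

    if (cur_state == IB_RESET && new_state == IB_RESET) {
        return 0;
    }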

slide-12
SLIDE 12

OS protocol violations

  • Device protocol violations: 38%
  • OS protocol violations: 20%

slide-13
SLIDE 13

Concurrency errors

[Bar chart (axis ticks 5 to 35), one bar per category: race in config functions, race in hot unplug handler, deadlock in an atomic context, race in the data path, race in PM functions, uninitialised lock, imbalanced locks, other]

slide-16
SLIDE 16

Concurrency errors

  • Device protocol violations: 38%
  • OS protocol violations: 20%
  • Concurrency errors: 19%

slide-17
SLIDE 17

Generic errors

  • Device protocol violations: 38%
  • OS protocol violations: 20%
  • Concurrency errors: 19%
  • Generic errors: 23%

slide-18
SLIDE 18

Dealing with concurrency bugs

slide-19
SLIDE 19

Dealing with concurrency bugs

[Diagram: with threads, request1, request2 and irq enter the driver concurrently]

slide-20
SLIDE 20

Dealing with concurrency bugs

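[Diagram, two panels. Threads: request1, request2 and irq enter the driver concurrently. Dingo: request1, request2 and irq are delivered to the driver as a sequence of events (evt1, evt2, evt3).]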
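The contrast between the two panels can be sketched in C. This is only an illustration of serialised event delivery, not the Dingo runtime: `event_queue`, `post_event`, `handle_event` and `dispatch_all` are hypothetical names. Requests and interrupts are queued and handed to the driver one at a time, so handlers never overlap and driver state needs no lock. In a real kernel the queue itself would have to be protected, since requests and interrupts arrive from different contexts; that part is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds, mirroring the labels on the slide. */
enum event_kind { EVT_REQUEST1, EVT_REQUEST2, EVT_IRQ };

#define QUEUE_CAP 16

/* Fixed-size FIFO of pending events (single-threaded sketch). */
struct event_queue {
    enum event_kind items[QUEUE_CAP];
    size_t head;   /* index of the oldest pending event */
    size_t count;  /* number of pending events */
};

/* Driver state touched by every handler. No lock is needed because
 * handlers run strictly one after another. */
struct driver_state {
    int requests_handled;
    int irqs_handled;
};

/* Append an event; returns 0 on success, -1 if the queue is full. */
static int post_event(struct event_queue *q, enum event_kind e)
{
    if (q->count == QUEUE_CAP)
        return -1;
    q->items[(q->head + q->count) % QUEUE_CAP] = e;
    q->count++;
    return 0;
}

/* Run one handler to completion for the given event. */
static void handle_event(struct driver_state *d, enum event_kind e)
{
    switch (e) {
    case EVT_REQUEST1:
    case EVT_REQUEST2:
        d->requests_handled++;
        break;
    case EVT_IRQ:
        d->irqs_handled++;
        break;
    }
}

/* Deliver all pending events in FIFO order; returns how many ran. */
static int dispatch_all(struct event_queue *q, struct driver_state *d)
{
    int delivered = 0;
    while (q->count > 0) {
        enum event_kind e = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        handle_event(d, e);
        delivered++;
    }
    assert(q->count == 0);
    return delivered;
}
```

A caller would zero-initialise an `event_queue` and a `driver_state`, post events as they arrive, and call `dispatch_all` from a single context.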

slide-21
SLIDE 21

Writing non-blocking drivers

Linux:

    int probe ()
    {
        ...
        write_config_reg ();
        msleep(20);
        read_status_reg ();
        ...
    }

Dingo:

    void probe ()
    {
        ...
        write_config_reg ();
        timeout(20, probe2);
    }

    void probe2 ()
    {
        read_status_reg ();
        ...
    }
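Splitting `probe` in two works because `timeout` returns immediately and the continuation is called later, once the delay has elapsed. The following is a minimal user-space sketch of that mechanism under a simulated clock. Everything except the names `probe`, `probe2`, `timeout`, `write_config_reg` and `read_status_reg` is hypothetical (`timer_slot`, `advance_time`, the `trace` buffer), and it stands in for whatever the Dingo runtime actually provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*continuation_fn)(void);

/* One pending timer (this sketch supports a single outstanding timeout). */
static struct {
    continuation_fn fn;   /* NULL when no timer is armed */
    unsigned remaining;   /* simulated milliseconds left */
} timer_slot;

/* Records the order of operations so the control flow can be inspected. */
static char trace[64];

static void log_step(const char *s)
{
    strncat(trace, s, sizeof(trace) - strlen(trace) - 1);
}

/* Stand-ins for the register accesses named on the slide. */
static void write_config_reg(void) { log_step("W"); }
static void read_status_reg(void)  { log_step("R"); }

/* Arm the timer and return immediately; never blocks the caller. */
static void timeout(unsigned ms, continuation_fn fn)
{
    assert(fn != NULL);
    timer_slot.fn = fn;
    timer_slot.remaining = ms;
}

/* Second half of probe: runs only after the timeout fires. */
static void probe2(void)
{
    read_status_reg();
}

/* First half of probe: starts the operation and returns to the caller. */
static void probe(void)
{
    write_config_reg();
    timeout(20, probe2);
}

/* Advance the simulated clock; fire the continuation once it expires. */
static void advance_time(unsigned ms)
{
    if (timer_slot.fn == NULL)
        return;
    if (ms < timer_slot.remaining) {
        timer_slot.remaining -= ms;
        return;
    }
    continuation_fn fn = timer_slot.fn;
    timer_slot.fn = NULL;
    timer_slot.remaining = 0;
    fn();
}
```

Calling `probe()` followed by `advance_time(10)` twice leaves `trace` holding `W` after the first step and `WR` after the second, which is the ordering that `msleep(20)` enforces in the blocking version.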

slide-22
SLIDE 22

Writing non-blocking drivers

Linux:

    int probe ()
    {
        ...
        write_config_reg ();
        msleep(20);
        read_status_reg ();
        ...
    }

Dingo:

    void probe ()
    {
        simple_evt notif;
        ...
        write_config_reg ();
        CALL (timeout(20), notif);
        read_status_reg ();
        ...
    }

slide-23
SLIDE 23

Performance of the AX88772 USB-to-Ethernet adapter driver

[Two plots, Linux vs. Dingo: CPU utilisation (%) and round-trip time (μsec), each against the number of connections (1 to 32)]

Evaluation platform: 4 x 2GHz Itanium II (SMT, 2 threads per core)

slide-24
SLIDE 24

Impact of serialisation on performance

Special case: drivers for very-high-performance devices

  • Examples: 10Gb Ethernet, InfiniBand
  • For such drivers, serialisation affects performance on multiprocessors

Solution: Re-introduce multithreading at the data path

  • Avoid concurrency bugs at the control path, while maintaining high performance at the data path

slide-25
SLIDE 25

Performance of the Mellanox InfiniBand adapter driver

[Two plots comparing Linux, Dingo (serialised) and Dingo (multithreaded): throughput (Mb/s) and CPU utilisation (%), each against the number of connections (2 to 32)]

slide-26
SLIDE 26

Dealing with OS protocol violations

slide-27
SLIDE 27

Modeling driver protocols with state machines

[State machine diagram. States: init, start, running, stop, unplugged. Transition labels: ?start, !startComplete, ?stop, !stopComplete (twice), ?unplugged (three times)]

? - incoming call from the OS
! - outgoing call to the OS
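A protocol of this kind maps naturally onto a transition table. The sketch below is an assumption-heavy illustration, not Dingo's specification language. The state and message names come from the slide, but the exact arcs cannot be recovered from the flattened diagram, so the table is one plausible reading (in particular, the target of the second `!stopComplete` arc is a guess). The identifiers `protocol`, `step` and `first_violation` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* States named after the slide, plus an explicit violation state. */
enum state {
    ST_INIT, ST_START, ST_RUNNING, ST_STOP, ST_UNPLUGGED, ST_VIOLATION
};

/* IN_* are calls from the OS (?), OUT_* are calls to the OS (!). */
enum message {
    IN_START, OUT_START_COMPLETE, IN_STOP, OUT_STOP_COMPLETE, IN_UNPLUGGED
};

struct transition {
    enum state from;
    enum message msg;
    enum state to;
};

/* Assumed reading of the diagram: one arc per label occurrence. */
static const struct transition protocol[] = {
    { ST_INIT,      IN_START,           ST_START     },
    { ST_START,     OUT_START_COMPLETE, ST_RUNNING   },
    { ST_RUNNING,   IN_STOP,            ST_STOP      },
    { ST_STOP,      OUT_STOP_COMPLETE,  ST_INIT      },
    { ST_START,     IN_UNPLUGGED,       ST_UNPLUGGED },
    { ST_RUNNING,   IN_UNPLUGGED,       ST_UNPLUGGED },
    { ST_STOP,      IN_UNPLUGGED,       ST_UNPLUGGED },
    { ST_UNPLUGGED, OUT_STOP_COMPLETE,  ST_INIT      }, /* assumed target */
};

/* Advance the machine; a message with no matching arc is a violation. */
static enum state step(enum state s, enum message m)
{
    size_t n = sizeof(protocol) / sizeof(protocol[0]);
    for (size_t i = 0; i < n; i++) {
        if (protocol[i].from == s && protocol[i].msg == m)
            return protocol[i].to;
    }
    return ST_VIOLATION;
}

/* Replay a message trace from init. Returns the index of the first
 * message that violates the protocol, or -1 if the trace conforms. */
static int first_violation(const enum message *trace, size_t len)
{
    assert(trace != NULL || len == 0);
    enum state s = ST_INIT;
    for (size_t i = 0; i < len; i++) {
        s = step(s, trace[i]);
        if (s == ST_VIOLATION)
            return (int)i;
    }
    return -1;
}
```

Under this table the trace `?start, !startComplete, ?stop, !stopComplete` conforms, while `?start, ?stop` is rejected at its second message because the driver has not yet reported `!startComplete`.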

slide-28
SLIDE 28

Ethernet controller protocol fragment

[State machine diagram. States: disabled, enable, enabled, disable, txq_stalled, txq_running, rx. Transition labels: ?enable, !enableComplete, ?disable, !disableComplete, !txStopQueue, !txStartQueue, ?transmit, ?receive, ?suspend, ...]

slide-29
SLIDE 29

Other features of the language

Other features of the specification language:

  • Timeouts
  • Protocol variables
  • Dynamic protocol spawning
  • etc.
slide-31
SLIDE 31

Runtime failure detection

[Diagram: the driver and its OS protocol]

slide-32
SLIDE 32

Runtime failure detection

[Diagram: the EthernetController protocol state machine (SM) is placed on the driver's OS protocol interface]
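One way to picture runtime failure detection is a monitor that sees every call crossing the driver boundary and checks it against the protocol state. The sketch below covers only the enable/disable part of the Ethernet controller fragment. The `monitor` struct and the `on_*` hooks are hypothetical, and the slides do not show how Dingo actually intercepts calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* States from the enable/disable part of the fragment. ETH_ENABLE and
 * ETH_DISABLE are the intermediate states while a request is pending. */
enum eth_state { ETH_DISABLED, ETH_ENABLE, ETH_ENABLED, ETH_DISABLE };

struct monitor {
    enum eth_state state;
    int violations;   /* number of calls that broke the protocol */
};

/* Move to `next` if the monitor is in `expected`; otherwise count a
 * violation and leave the state alone. Returns true if the call is legal. */
static bool expect(struct monitor *m, enum eth_state expected,
                   enum eth_state next)
{
    assert(m != NULL);
    if (m->state != expected) {
        m->violations++;
        return false;
    }
    m->state = next;
    return true;
}

/* OS -> driver: ?enable */
static bool on_enable(struct monitor *m)
{
    return expect(m, ETH_DISABLED, ETH_ENABLE);
}

/* driver -> OS: !enableComplete */
static bool on_enable_complete(struct monitor *m)
{
    return expect(m, ETH_ENABLE, ETH_ENABLED);
}

/* OS -> driver: ?disable */
static bool on_disable(struct monitor *m)
{
    return expect(m, ETH_ENABLED, ETH_DISABLE);
}

/* driver -> OS: !disableComplete */
static bool on_disable_complete(struct monitor *m)
{
    return expect(m, ETH_DISABLE, ETH_DISABLED);
}
```

A driver that reports `!enableComplete` twice would be caught on the second call: the hook returns false and `violations` is incremented, while the recorded state stays `ETH_ENABLED`.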

slide-33
SLIDE 33

Evaluation

slide-34
SLIDE 34

Evaluation

How effective is Dingo in reducing driver bugs?

  • Evaluation methodology: artificially injected 61 bugs found in similar Linux drivers into Dingo drivers

slide-35
SLIDE 35

Evaluation

How effective is Dingo in reducing driver bugs?

  • Evaluation methodology: artificially injected 61 bugs found in similar Linux drivers into Dingo drivers

Results:

  • Bugs eliminated by design: 59%
  • Reduced likelihood: 21%
  • Unchanged likelihood: 20%

slide-36
SLIDE 36

Summary

  • 40% of driver bugs are caused by the complexity of the OS interface
  • Dingo reduces bugs through an improved design of this interface
  • These improvements are implemented in an existing operating system without sacrificing performance