SLIDE 1
Leonid Ryzhyk, Peter Chubb, Ihor Kuz, Gernot Heiser
UNSW, NICTA, Open Kernel Labs (Australia)
Dingo: Taming Device Drivers
SLIDE 2 The problem with drivers
1 Ganapathi et al. Windows XP kernel crash analysis, 2006
2 Chou et al. An Empirical study of operating system errors, 2001
- 70% of OS crashes are caused by device drivers
- Drivers contain 1.5x to 7x as many bugs per line of code as the rest of the kernel
SLIDE 3 Previous approaches
Dealing with faulty drivers
Runtime isolation: Mach, L4, Nooks, MINIX, XFI, SafeDrive, etc.
- Transparent recovery is hard
Static analysis: SLAM, MC, Singularity, etc.
- Detects a limited subset of bugs
SLIDE 4 The Dingo approach
Can we develop drivers that contain fewer bugs in the first place?
Localise complexity in driver development
- Many driver bugs are provoked by the complexity of the OS interface
- Reduce bugs by improving the design of this interface
SLIDE 5
Dingo for Linux
[Diagram: Dingo drivers run on the Dingo runtime, alongside native Linux drivers]
SLIDE 6
A study of driver bugs
SLIDE 7
A study of Linux driver bugs
Driver                                      #loc   #bugs
USB
  RTL8150 USB-to-Ethernet adapter            827      16
  EL1210a USB-to-Ethernet adapter            710       2
  KL5kusb101 USB-to-Ethernet adapter         925      15
  Generic USB network driver                1028      45
  USB hub                                   2234      67
  USB-to-serial converter                    989      50
  USB mass storage                           803      23
Firewire
  IEEE1394 Ethernet controller              1413      22
  SBP-2 transport protocol                  1713      46
PCI
  Mellanox InfiniHost InfiniBand adapter   11718     123
  BNX2 Ethernet adapter                     5412      51
  i810 frame buffer                         2920      16
  CMI8338 audio                             2660      22
Total                                                498
SLIDE 8
A study of Linux driver bugs
[Diagram: a driver interacts with the OS through the OS protocol and with the device through the device protocol]
SLIDE 9 A study of Linux driver bugs
Device protocol violation examples:
- Issuing a command to an uninitialised device
- Writing an invalid register value
- Incorrectly managing DMA descriptors
SLIDE 10
Device protocol violations: 38%
SLIDE 11
OS protocol violations
Example: Mellanox InfiniHost controller driver (states: RESET, READY)
    if (cur_state == IB_RESET && new_state == IB_RESET) {
        return 0;
    }
SLIDE 12
Device protocol violations: 38%
OS protocol violations: 20%
SLIDE 13 Concurrency errors
[Bar chart of concurrency errors by category (axis scale 5 to 35):]
- Race in config functions
- Race in hot unplug handler
- Deadlock in an atomic context
- Race in the data path
- Race in PM functions
- Uninitialised lock
- Imbalanced locks
- Other
SLIDE 14 Concurrency errors
SLIDE 15 Concurrency errors
SLIDE 16
Device protocol violations: 38%
OS protocol violations: 20%
Concurrency errors: 19%
SLIDE 17
Device protocol violations: 38%
OS protocol violations: 20%
Concurrency errors: 19%
Generic errors: 23%
SLIDE 18
Dealing with concurrency bugs
SLIDE 19
Dealing with concurrency bugs
[Diagram, Threads: request1, request2 and irq invoke the driver concurrently]
SLIDE 20
Dealing with concurrency bugs
[Diagram, Threads: request1, request2 and irq invoke the driver concurrently]
[Diagram, Dingo (Events): request1, request2 and irq are delivered to the driver as events evt1, evt2, evt3]
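The difference between the two models can be illustrated with a small event queue. This is a minimal single-threaded sketch, not the Dingo runtime: the names `evt_queue`, `evt_post` and `evt_dispatch_all` are hypothetical, and a real runtime would also have to protect the queue itself against concurrent producers. What it shows is the invariant that matters to the driver author: a handler always runs to completion before the next event is delivered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds, named after the labels in the diagram. */
enum evt_kind { EVT_REQUEST1, EVT_REQUEST2, EVT_IRQ };

#define QUEUE_CAP 16

struct evt_queue {
    enum evt_kind items[QUEUE_CAP];
    size_t head;   /* index of the next event to deliver */
    size_t count;  /* number of queued events */
};

struct driver {
    int in_handler;  /* 1 while a handler is running */
    int handled;     /* total number of events handled */
};

/* Enqueue an event. Returns 0 on success, -1 if the queue is full. */
int evt_post(struct evt_queue *q, enum evt_kind k)
{
    if (q->count == QUEUE_CAP)
        return -1;
    q->items[(q->head + q->count) % QUEUE_CAP] = k;
    q->count++;
    return 0;
}

/* The driver's handler: runs to completion and is never re-entered. */
static void driver_handle(struct driver *d, enum evt_kind k)
{
    assert(!d->in_handler);  /* the serialisation invariant */
    d->in_handler = 1;
    (void)k;                 /* a real driver would switch on k here */
    d->handled++;
    d->in_handler = 0;
}

/* Deliver queued events one at a time, in FIFO order.
   Returns the number of events delivered. */
int evt_dispatch_all(struct evt_queue *q, struct driver *d)
{
    int n = 0;
    while (q->count > 0) {
        enum evt_kind k = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        driver_handle(d, k);
        n++;
    }
    return n;
}
```

Because handlers never overlap, the driver state touched inside `driver_handle` needs no locks in this sketch.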
SLIDE 21
Writing non-blocking drivers
Linux:
    int probe()
    {
        ...
        write_config_reg();
        msleep(20);
        read_status_reg();
        ...
    }

Dingo:
    void probe()
    {
        ...
        write_config_reg();
        timeout(20, probe2);
    }

    void probe2()
    {
        read_status_reg();
        ...
    }
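The callback style can be exercised outside a kernel with a stand-in timer. In this sketch, `timeout` is a hypothetical single-slot replacement for the runtime call of the same name on the slide, `fire_timer` simulates the runtime delivering the timer event, and the two register helpers only set flags. None of this is the real Dingo API.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*timer_cb)(void);

/* Hypothetical single-slot timer standing in for the runtime's timeout(). */
static timer_cb pending_cb = NULL;

/* Flags that record what the simulated driver has done so far. */
int config_written = 0;
int status_read = 0;

/* Register a callback; returns immediately instead of sleeping. */
void timeout(unsigned ms, timer_cb cb)
{
    (void)ms;  /* a real runtime would arm a timer for ms milliseconds */
    pending_cb = cb;
}

/* Simulate the runtime delivering the timer event.
   Returns 1 if a callback ran, 0 if none was pending. */
int fire_timer(void)
{
    timer_cb cb = pending_cb;
    if (cb == NULL)
        return 0;
    pending_cb = NULL;
    cb();
    return 1;
}

static void write_config_reg(void) { config_written = 1; }
static void read_status_reg(void)  { status_read = 1; }

/* Second half of probe: runs when the timer event arrives. */
void probe2(void)
{
    read_status_reg();
}

/* First half of probe: starts the operation and returns without blocking. */
void probe(void)
{
    write_config_reg();
    timeout(20, probe2);
}
```

Calling `probe()` writes the configuration register and returns at once; the status register is read only after `fire_timer()` runs the continuation. The cost of this style is that one logical operation is split across two functions.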
SLIDE 22
Writing non-blocking drivers
Linux:
    int probe()
    {
        ...
        write_config_reg();
        msleep(20);
        read_status_reg();
        ...
    }

Dingo:
    void probe()
    {
        simple_evt notif;
        ...
        write_config_reg();
        CALL(timeout(20), notif);
        read_status_reg();
        ...
    }
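The slides do not show how `CALL` is implemented. One well-known way to get the same sequential look in plain C is the protothread-style switch trick, sketched below as a swapped-in technique rather than Dingo's actual mechanism. The macros `CO_BEGIN`, `CO_AWAIT`, `CO_END` and the type `probe_ctx` are hypothetical, and the register accesses are replaced by flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical coroutine context: the resume point plus any state
   that must survive a suspension (locals are not preserved). */
struct probe_ctx {
    int line;            /* 0 = not started, -1 = finished */
    int config_written;
    int status_read;
};

/* Protothread-style macros: the function returns at CO_AWAIT and,
   when called again, the switch jumps back to the saved line. */
#define CO_BEGIN(c)  switch ((c)->line) { case 0:
#define CO_AWAIT(c)  do { (c)->line = __LINE__; return; case __LINE__:; } while (0)
#define CO_END(c)    } (c)->line = -1

/* Reads sequentially, but suspends in the middle without blocking. */
void probe(struct probe_ctx *c)
{
    assert(c != NULL);
    CO_BEGIN(c);
    c->config_written = 1;  /* stands in for write_config_reg() */
    CO_AWAIT(c);            /* stands in for CALL(timeout(20), notif) */
    c->status_read = 1;     /* stands in for read_status_reg() */
    CO_END(c);
}
```

The first call to `probe` runs up to the suspension point and returns; the second call, which a runtime would make when the timeout event fires, resumes after it. Unlike the split `probe`/`probe2` version, the control flow stays in one function.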
SLIDE 23
Performance of the AX88772 USB-to-Ethernet adapter driver
[Charts: CPU utilisation (%) and round-trip time (μsec) versus number of connections (1 to 32), Linux vs. Dingo]
Evaluation platform: 4 x 2GHz Itanium II (SMT, 2 threads per core)
SLIDE 24 Impact of serialisation on performance
Special case: drivers for very-high-performance devices
- Examples: 10Gb Ethernet, InfiniBand
- For such drivers, serialisation affects performance on multiprocessors
Solution: re-introduce multithreading at the data path
- Avoid concurrency bugs at the control path, while maintaining high performance at the data path
SLIDE 25
Performance of the Mellanox InfiniBand adapter driver
[Charts: throughput (Mb/s) and CPU utilisation (%) versus number of connections (2 to 32), for Linux, Dingo (serialised) and Dingo (multithreaded)]
SLIDE 26
Dealing with OS protocol violations
SLIDE 27
Modeling driver protocols with state machines
[State machine diagram. States: init, start, running, stop, unplugged. Transition labels: ?start, !startComplete, ?stop, !stopComplete, ?unplugged]
? - incoming call from the OS
! - outgoing call to the OS
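A protocol of this kind maps naturally onto a transition function. The sketch below is one plausible reading of the diagram, not a transcription: the flattened text preserves the state and message names but not the edges, so every edge here is an assumption. The terminal state `ST_DONE` and the error state `ST_ERROR` are hypothetical additions.

```c
#include <assert.h>
#include <stddef.h>

/* States from the slide, plus two hypothetical ones (ST_DONE, ST_ERROR). */
enum state { ST_INIT, ST_START, ST_RUNNING, ST_STOP, ST_UNPLUGGED,
             ST_DONE, ST_ERROR };

/* IN_* are "?" messages (OS to driver); OUT_* are "!" (driver to OS). */
enum msg { IN_START, OUT_START_COMPLETE, IN_STOP, OUT_STOP_COMPLETE,
           IN_UNPLUGGED };

/* Assumed edges; any message not listed for a state is a violation. */
enum state next_state(enum state s, enum msg m)
{
    switch (s) {
    case ST_INIT:
        if (m == IN_START) return ST_START;
        break;
    case ST_START:
        if (m == OUT_START_COMPLETE) return ST_RUNNING;
        if (m == IN_UNPLUGGED) return ST_UNPLUGGED;
        break;
    case ST_RUNNING:
        if (m == IN_STOP) return ST_STOP;
        if (m == IN_UNPLUGGED) return ST_UNPLUGGED;
        break;
    case ST_STOP:
        if (m == OUT_STOP_COMPLETE) return ST_DONE;
        if (m == IN_UNPLUGGED) return ST_UNPLUGGED;
        break;
    case ST_UNPLUGGED:
        if (m == OUT_STOP_COMPLETE) return ST_DONE;
        break;
    default:
        break;
    }
    return ST_ERROR;
}

/* Run a message trace from ST_INIT and return the final state. */
enum state run_trace(const enum msg *trace, size_t n)
{
    enum state s = ST_INIT;
    assert(trace != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s = next_state(s, trace[i]);
    return s;
}
```

A trace such as `?start, !startComplete, ?stop, !stopComplete` ends in `ST_DONE`, while `?stop` in the initial state ends in `ST_ERROR`.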
SLIDE 28
Ethernet controller protocol fragment
[State machine diagram. State labels: disabled, enable, enabled, disable, txq_stalled, txq_running, rx. Transition labels: ?enable, !enableComplete, ?disable, !disableComplete, !txStopQueue, !txStartQueue, ?transmit, ?receive, ?suspend, ...]
SLIDE 29 Other features of the language
Other features of the specification language:
- Timeouts
- Protocol variables
- Dynamic protocol spawning
- etc.
SLIDE 30
Ethernet controller protocol fragment
SLIDE 31
Runtime failure detection
[Diagram: the driver and its OS protocol]
SLIDE 32 Runtime failure detection
[Diagram: the EthernetController protocol state machine (SM) is attached to the OS protocol side of the driver]
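Runtime checking against a protocol state machine can be sketched as a thin layer between the driver and the OS. The example uses only the transmit-queue part of the Ethernet fragment and assumes that `!txStopQueue` is legal only in `txq_running`, that `!txStartQueue` is legal only in `txq_stalled`, and that the queue starts in `txq_running`. The names `monitor`, `rt_tx_stop_queue` and `rt_tx_start_queue` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum txq_state { TXQ_RUNNING, TXQ_STALLED };

struct monitor {
    enum txq_state state;  /* current state of the protocol fragment */
    int failed;            /* latched once a violation is observed */
    int forwarded;         /* number of calls passed on to the OS */
};

/* Record a violation and refuse to forward the call. */
static int violation(struct monitor *m)
{
    m->failed = 1;
    return -1;
}

/* Driver-side wrapper for !txStopQueue. Returns 0 if forwarded, -1 otherwise. */
int rt_tx_stop_queue(struct monitor *m)
{
    assert(m != NULL);
    if (m->failed || m->state != TXQ_RUNNING)
        return violation(m);
    m->state = TXQ_STALLED;
    m->forwarded++;  /* a real runtime would call into the OS here */
    return 0;
}

/* Driver-side wrapper for !txStartQueue. Returns 0 if forwarded, -1 otherwise. */
int rt_tx_start_queue(struct monitor *m)
{
    assert(m != NULL);
    if (m->failed || m->state != TXQ_STALLED)
        return violation(m);
    m->state = TXQ_RUNNING;
    m->forwarded++;  /* a real runtime would call into the OS here */
    return 0;
}
```

Once a violation is latched, every later call is rejected, so a misbehaving driver is stopped at the interface instead of corrupting OS state.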
SLIDE 33
Evaluation
SLIDE 34 Evaluation
How effective is Dingo in reducing driver bugs?
- Evaluation methodology: 61 bugs found in similar Linux drivers were artificially injected into Dingo drivers
SLIDE 35 Evaluation
How effective is Dingo in reducing driver bugs?
- Evaluation methodology: 61 bugs found in similar Linux drivers were artificially injected into Dingo drivers
Bugs eliminated by design: 59%
Reduced likelihood: 21%
Unchanged likelihood: 20%
SLIDE 36 Summary
- 40% of driver bugs are caused by the complexity of the OS interface
- Dingo reduces bugs through an improved design of this interface
- These improvements are implemented in an existing operating system without sacrificing performance