The Ephemeral Smoking Gun

SLIDE 1

The Ephemeral Smoking Gun

Using ftrace and kgdb to resolve a pthread “deadlock”
Brad Mouring, LabVIEW Real-Time, National Instruments

SLIDE 2

The Intro – Who am I

  • Work at National Instruments in the RTOS R&D group
  • Multiple product lines use the RTOS
  • NI switched to Linux 2-3 years ago
  • Single-mode RTOS ⇒ dual-mode RTOS
  • Functionality and support that comes with it
  • Mindset within the company about FOSS
    – Work with maintainers, minimize out-of-tree code

The Ephemeral Smoking Gun Brad Mouring

SLIDE 3

The Setup – Crashing Application

  • A user-mode application crashed after a few hours of running
  • The clincher: a new issue from existing code
  • The same application ran continually without issue on the older, single-mode RTOS

SLIDE 4

Initial Investigation

  • Configured to provide a core file on crash
  • Checking the core file fingered a SIGABRT
    • Normally used for assert() and critical errors
    • Coming from glibc, __pthread_mutex_lock_full()
  • To enable core files: ulimit -c ${blocks}
    • May need to edit /etc/security/limits.conf
    • Can set in /etc/profile(.d/*)
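The ulimit -c step can also be done from inside the application before it does any real work. A minimal C sketch, assuming the process only needs to raise its soft core-size limit up to the existing hard limit (raising the hard limit itself needs privilege or a limits.conf entry); enable_core_dumps() is a hypothetical helper, not part of the application described here:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_CORE limit to the hard
 * limit, an in-process counterpart of `ulimit -c` in the launching shell.
 * Returns 0 on success, -1 on failure. */
static int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }
    /* Unprivileged processes may raise the soft limit up to the hard
     * limit; raising the hard limit needs privilege or limits.conf. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```

Where the core file lands is still governed by the system's /proc/sys/kernel/core_pattern setting.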

SLIDE 5

Digging In Further

  • Reproduced the issue while watching stderr
  • “pthread_mutex_lock.c:309: Assertion `...' failed.”
  • Points me to a file and line number
  • The assertion checks the return value of a futex syscall
    – Checking for a reported deadlock on certain lock types

SLIDE 6

Background: Pthread_mutexes

  • Mutexes are used to protect a few different application execution-system state structures
  • The application uses pthread_mutex_t's configured to be priority-inheriting
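For reference, a priority-inheriting mutex like these is created through the POSIX mutex-attribute API. A minimal sketch; init_pi_mutex() is a hypothetical helper, and its type parameter is there only to show that an error-checking PI mutex reports EDEADLK when its owner tries to lock it a second time (a trivial self-deadlock, not the bug chased in this talk):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initialise m as a priority-inheriting mutex of the
 * given type (e.g. PTHREAD_MUTEX_NORMAL or PTHREAD_MUTEX_ERRORCHECK).
 * Returns 0 or an error number, like the pthread calls it wraps. */
static int init_pi_mutex(pthread_mutex_t *m, int type)
{
    pthread_mutexattr_t attr;
    int err;

    assert(m != NULL);
    err = pthread_mutexattr_init(&attr);
    if (err)
        return err;
    /* PTHREAD_PRIO_INHERIT: the owner is boosted to the priority of the
     * highest-priority thread blocked on the mutex. */
    err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (!err)
        err = pthread_mutexattr_settype(&attr, type);
    if (!err)
        err = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return err;
}
```

Usage: after init_pi_mutex(&m, PTHREAD_MUTEX_ERRORCHECK), a first pthread_mutex_lock(&m) returns 0 and a second one from the same thread returns EDEADLK.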

SLIDE 7

Background: Priority Inversion

Tasks: A (prio 10), B (prio 11), C (prio 90)

  • A is running on the processor

SLIDE 8

Background: Priority Inversion

Tasks: A (prio 10), B (prio 11), C (prio 90)

  • A is running on the processor
  • A takes mutex M

SLIDE 9

Background: Priority Inversion

Tasks: A (prio 10), B (prio 11), C (prio 90)

  • A is running on the processor
  • A takes mutex M
  • B becomes runnable, is scheduled in

SLIDE 10

Background: Priority Inversion

Tasks: A (prio 10), B (prio 11), C (prio 90)

  • A is running on the processor
  • A takes mutex M
  • B becomes runnable, is scheduled in
  • C becomes runnable, is scheduled in

SLIDE 11

Background: Priority Inversion

Tasks: A (prio 10), B (prio 11), C (prio 90)

  • A is running on the processor
  • A takes mutex M
  • B becomes runnable, is scheduled in
  • C becomes runnable, is scheduled in
  • C blocks on mutex M

SLIDE 12

Background: Priority Inversion

Tasks: A (prio 10), B (prio 11), C (prio 90)

  • A is running on the processor
  • A takes mutex M
  • B becomes runnable, is scheduled in
  • C becomes runnable, is scheduled in
  • C blocks on mutex M
  • B is scheduled in (prio 11), blocking C (prio 90) from running!

SLIDE 13

Background: Priority Inheritance – A Solution

Tasks: A (prio 10), B (prio 11), C (prio 90)

  • A is running on the processor
  • A takes mutex M
  • B becomes runnable, is scheduled in
  • C becomes runnable, is scheduled in
  • C blocks on mutex M

SLIDE 14

Background: Priority Inheritance – A Solution

Tasks: A (prio 90), B (prio 11), C (prio 90)

  • A is running on the processor
  • A takes mutex M
  • B becomes runnable, is scheduled in
  • C becomes runnable, is scheduled in
  • C blocks on mutex M
  • A receives C's priority, finishes with mutex M, releases M

SLIDE 15

Background: Priority Inheritance – A Solution

Tasks: A (prio 10), B (prio 11), C (prio 90)

  • A is running on the processor
  • A takes mutex M
  • B becomes runnable, is scheduled in
  • C becomes runnable, is scheduled in
  • C blocks on mutex M
  • A receives C's priority, finishes with mutex M, releases M
  • A receives its previous priority, C is scheduled in
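The inversion and inheritance sequences above come down to one scheduling rule. A toy C model of it (a sketch, not the kernel scheduler; the task layout and pick_next() are hypothetical): the highest-priority runnable task runs, and with inheritance the owner of M is evaluated at the priority of its highest-priority waiter.

```c
#include <assert.h>
#include <stdbool.h>

enum { TASK_A, TASK_B, TASK_C, NTASKS };

struct task {
    int base_prio;    /* higher number = higher priority, as on the slides */
    bool runnable;    /* false while the task is blocked */
    bool waits_on_m;  /* true while the task is blocked on mutex M */
};

/* Return the index of the runnable task with the highest effective
 * priority.  m_owner is the index of M's owner, or -1 if M is free.
 * With inherit set, M's owner runs at the priority of its top waiter
 * whenever that is above its own. */
static int pick_next(const struct task t[NTASKS], int m_owner, bool inherit)
{
    int best = -1, best_prio = -1;

    for (int i = 0; i < NTASKS; i++) {
        if (!t[i].runnable)
            continue;
        int prio = t[i].base_prio;
        if (inherit && i == m_owner) {
            for (int w = 0; w < NTASKS; w++)
                if (t[w].waits_on_m && t[w].base_prio > prio)
                    prio = t[w].base_prio;
        }
        if (prio > best_prio) {
            best_prio = prio;
            best = i;
        }
    }
    assert(best >= 0 && "at least one task must be runnable");
    return best;
}
```

In the state where C has blocked on M (A owns M, B runnable), pick_next(t, TASK_A, false) selects B, the inversion from slide 12, while pick_next(t, TASK_A, true) selects A. Once A releases M and C is runnable again, pick_next(t, -1, true) selects C.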

SLIDE 16

Background: Pthread_mutexes and Futexes

  • pthread mutexes use futexes when contended
    • An uncontended lock stays in userspace (cmpxchg)
    • A contended lock uses the kernel's sys_futex call
      – Creates a queue of tasks to wake when the holder releases the lock (FUTEX_LOCK_PI)
      – Sits atop the rtmutex code within the kernel
    • On release, the previous holder notes that there are waiters and wakes one or more (FUTEX_UNLOCK_PI)
  • The underlying rt_mutex subsystem provides some nice features (e.g., deadlock detection)
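The fast-path/slow-path split can be sketched directly against the raw futex interface. This is a simplified illustration under stated assumptions, not glibc's __pthread_mutex_lock_full(): Linux only, non-recursive and non-robust, hypothetical helpers pi_lock(), pi_unlock() and run_counter(), and any futex error (EDEADLK included) treated as fatal. The futex word holds 0 when free or the owner's TID; the kernel adds the FUTEX_WAITERS bit once a task has blocked on it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Futex word: 0 when free, else owner TID (| FUTEX_WAITERS if contended). */
typedef _Atomic uint32_t pi_futex_t;

static uint32_t my_tid(void)
{
    return (uint32_t)syscall(SYS_gettid);
}

static void pi_lock(pi_futex_t *f)
{
    uint32_t expected = 0;

    /* Fast path: uncontended, cmpxchg 0 -> TID entirely in userspace. */
    if (atomic_compare_exchange_strong(f, &expected, my_tid()))
        return;
    /* Slow path: the kernel queues us on an rt_mutex, boosts the owner,
     * and returns once we own the lock. */
    if (syscall(SYS_futex, f, FUTEX_LOCK_PI, 0, NULL, NULL, 0) != 0) {
        perror("FUTEX_LOCK_PI");   /* e.g. EDEADLK */
        abort();
    }
}

static void pi_unlock(pi_futex_t *f)
{
    uint32_t expected = my_tid();

    /* Fast path: no FUTEX_WAITERS bit, so cmpxchg TID -> 0 succeeds. */
    if (atomic_compare_exchange_strong(f, &expected, 0))
        return;
    /* Waiters exist: let the kernel hand the lock to the top waiter. */
    if (syscall(SYS_futex, f, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0) != 0) {
        perror("FUTEX_UNLOCK_PI");
        abort();
    }
}

/* Tiny harness: threads increment a shared counter under the lock. */
struct counter {
    pi_futex_t lock;
    long value;
    int iters;
};

static void *worker(void *arg)
{
    struct counter *c = arg;

    for (int i = 0; i < c->iters; i++) {
        pi_lock(&c->lock);
        c->value++;
        pi_unlock(&c->lock);
    }
    return NULL;
}

static long run_counter(int nthreads, int iters)
{
    pthread_t th[8];
    struct counter c = { .lock = 0, .value = 0, .iters = iters };

    assert(nthreads > 0 && nthreads <= 8);
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&th[i], NULL, worker, &c) != 0)
            abort();
    for (int i = 0; i < nthreads; i++)
        pthread_join(th[i], NULL);
    return c.value;
}
```

run_counter(4, 10000) should return 40000; any contention along the way goes through FUTEX_LOCK_PI and FUTEX_UNLOCK_PI.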

SLIDE 17

Background: RT Mutexes

  • RT mutexes were designed in the linux-rt tree
    • Used to silently replace normal spinlocks
  • Sold to mainline as a solution for priority inversion through futexes
  • Priority inheritance is attained by tracking:
    • The tasks blocked on a mutex (sorted by priority)
    • The task that owns a mutex

SLIDE 18

Background: RT Mutexes Visually

  • These relationships allow for priority inheritance

[Diagram: a chain of tasks and mutexes linked by "blocked-on" and "owns" relationships]

SLIDE 19

Background: RT Mutexes Visually

  • These relationships allow for priority inheritance
  • Also handy for checking for deadlocks

[Diagram: tasks and mutexes linked by "blocked-on" and "owns" relationships, illustrating a deadlock check]
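A toy model of the two relationships, using hypothetical types task_model and rt_mutex_model (the kernel's struct rt_mutex and its waiter tree are considerably more involved). Blocking walks the owns/blocked-on chain to boost each owner, and refuses to block if the walk leads back to the blocking task. Priorities follow the slides: a higher number means a higher priority.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8
#define MAX_DEPTH 32

struct rt_mutex_model;

struct task_model {
    int base_prio;                      /* higher number = higher priority */
    int prio;                           /* effective, possibly boosted */
    struct rt_mutex_model *blocked_on;  /* "blocked-on" edge; NULL if not blocked */
};

struct rt_mutex_model {
    struct task_model *owner;                 /* "owns" edge; NULL if free */
    struct task_model *waiters[MAX_WAITERS];  /* tasks blocked on this mutex */
    int nwaiters;
};

static int top_waiter_prio(const struct rt_mutex_model *m)
{
    int top = -1;

    for (int i = 0; i < m->nwaiters; i++)
        if (m->waiters[i]->prio > top)
            top = m->waiters[i]->prio;
    return top;
}

/* Follow owns/blocked-on edges from lock.  Returns 1 if the chain comes
 * back to task (blocking would deadlock) or is implausibly long. */
static int would_deadlock(const struct task_model *task,
                          const struct rt_mutex_model *lock)
{
    for (int depth = 0; lock != NULL; depth++) {
        const struct task_model *owner = lock->owner;

        if (owner == NULL)
            return 0;            /* free mutex ends the chain */
        if (owner == task || depth >= MAX_DEPTH)
            return 1;
        lock = owner->blocked_on;
    }
    return 0;                    /* last owner is not blocked */
}

/* Block waiter on lock and boost every owner along the chain to at least
 * the priority of the top waiter on the mutex it owns.
 * Returns -1 (think EDEADLK) instead of blocking if that would deadlock. */
static int block_on(struct task_model *waiter, struct rt_mutex_model *lock)
{
    if (would_deadlock(waiter, lock))
        return -1;
    assert(lock->nwaiters < MAX_WAITERS);
    waiter->blocked_on = lock;
    lock->waiters[lock->nwaiters++] = waiter;

    for (struct rt_mutex_model *m = lock; m != NULL && m->owner != NULL;
         m = m->owner->blocked_on) {
        int boost = top_waiter_prio(m);

        if (boost > m->owner->prio)
            m->owner->prio = boost;   /* priority inheritance */
    }
    return 0;
}
```

For example, if B (prio 11) blocks on M2 owned by A, and C (prio 90) blocks on M1 owned by B, both B and A end up at 90. A then trying to take M1 would close the loop A → M1 → B → M2 → A, so block_on() refuses with -1.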

SLIDE 20

How to debug, and where?

  • EDEADLK is returned in a few locations, including a few in the futex/mutex/rtmutex code
  • Place a kgdb_breakpoint() at these sites
  • Build a kernel with kgdb enabled
  • Reproduce the issue
  • Troubleshoot from there

SLIDE 21

Background: How to enable KGDB

  • Configure the kernel:
    • CONFIG_DEBUG_INFO
    • CONFIG_KGDB
    • CONFIG_KGDB_<method to connect>
      – e.g., CONFIG_KGDB_SERIAL_CONSOLE
    • CONFIG_KGDB_KDB (optional)

SLIDE 22

Background: Connecting to a KGDB target

  • You have a few options:
    • Serial port (null-modem connection)
    • Over Ethernet (kgdboe) with out-of-tree source¹
  • Set module parameters on boot, on module load, or thereafter through sysfs
    • Port and baud rate

¹http://sysprogs.com/VisualKernel/kgdboe/
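Of the three ways to set the serial connection, the runtime sysfs route amounts to writing "port,baud" into the kgdboc module parameter file, the same as running echo ttyS0,115200 > /sys/module/kgdboc/parameters/kgdboc as root. A minimal C sketch; write_attr() and set_kgdboc() are hypothetical helpers, and ttyS0 at 115200 baud is only an example value:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write value to an attribute file, like `echo -n value > path`.
 * Returns 0 on success, -1 on any error. */
static int write_attr(const char *path, const char *value)
{
    size_t len = strlen(value);
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;
    ssize_t n = write(fd, value, len);
    int rc = (n >= 0 && (size_t)n == len) ? 0 : -1;

    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical helper: point kgdboc at "port,baud" at runtime (root only,
 * kernel built with kgdboc). */
int set_kgdboc(const char *port_and_baud)
{
    return write_attr("/sys/module/kgdboc/parameters/kgdboc", port_and_baud);
}
```

The boot-time equivalent is kgdboc=ttyS0,115200 on the kernel command line.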

SLIDE 23

Tips for using kgdb/gdb

  • Find (or write) useful user-defined cmds
    • Sequences you use frequently
    • Pop cmds and settings in your ~/.gdbinit
  • Graphical frontends are available
  • Excellent resources online
    • https://sourceware.org/gdb/current/onlinedocs/gdb/

SLIDE 24

KGDB leads to a dead end

  • EDEADLK came from the rtmutex priority-chain walking code (rt_mutex_adjust_prio_chain)
  • The chain-walking code seemed to think that we had a loop in the chain
  • Walking the chain manually in gdb from the original mutex, we reach a mutex that has no owner
  • We were supposed to loop back around to the original mutex, as that's the current state of the pointers within the chain-walking function

… and that's not necessarily a bad thing.

SLIDE 25

State of the Priority Chain at EDEADLK

[Diagram: tasks A, B, C, D and mutexes M1, M2, M3 linked by "blocked-on" and "owns" edges; labels mark the iterator used to walk the priority chain, orig_lock, and the task about to get EDEADLK]

SLIDE 26

A Few Clues From the Scene of the Crime

  • Mutex M2 recently had an owner but doesn't currently
  • There are two tasks (A, B) blocked on mutex M1
  • The checks that run while walking the chain find nothing odd to complain about until the deadlock is detected

SLIDE 27

Re-ftrace-ing my steps

  • A picture of the events leading up to the detected deadlock may shed some light on what's going on
  • Ftrace and a set of tracers were already enabled in our kernel
  • Insert some strategic trace_printk()s
  • Add a SIGABRT handler to the app to stop tracing
  • Reproduce the issue, use trace-cmd extract
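The SIGABRT handler only has to freeze the trace buffer before the process dies. A sketch, assuming tracefs is reachable at /sys/kernel/tracing (older systems use /sys/kernel/debug/tracing) and the process is allowed to write tracing_on (normally root only); stop_tracing() and install_trace_stop_handler() are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* tracefs control file; older systems expose it as
 * /sys/kernel/debug/tracing/tracing_on instead. */
static const char *tracing_on_path = "/sys/kernel/tracing/tracing_on";

/* SIGABRT handler: stop ftrace recording so the ring buffer keeps the
 * events that led up to the abort.  Uses only async-signal-safe calls. */
static void stop_tracing(int sig)
{
    (void)sig;
    int fd = open(tracing_on_path, O_WRONLY);

    if (fd < 0)
        return;                 /* no tracefs access: nothing to do */
    ssize_t n = write(fd, "0", 1);
    (void)n;                    /* no safe way to report failure here */
    close(fd);
}

/* Install the handler.  Once it returns, abort() still terminates the
 * process (with a core file if enabled). */
static int install_trace_stop_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = stop_tracing;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGABRT, &sa, NULL);
}
```

Because the handler returns, the core file from the earlier investigation is unaffected, and the frozen buffer can then be pulled with trace-cmd extract.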

SLIDE 28

kernelshark comes into the picture

SLIDE 29

kernelshark comes into the picture

  • Pulling the dump into kernelshark for a closer look, we notice a few interesting points
  • Task 'B' (which received EDEADLK) was scheduled out between attempting to take the mutex and reporting EDEADLK
  • There is quite a bit of mutex activity while B is out
  • We can begin to form a narrative of what happens leading up to the reported deadlock

SLIDE 30

Re-ftrace-ing my steps

[Diagram: tasks A, B, C and mutexes M1, M2 linked by "blocked-on" and "owns" edges; orig_lock and the task iterator are marked]

  • B blocks on M1
  • M1 is held by C
  • C is blocked on M2
  • M2 is held by A

SLIDE 31

Re-ftrace-ing my steps

[Diagram: tasks A, B, C and mutexes M1, M2 linked by "blocked-on" and "owns" edges; orig_lock and the task iterator are marked]

  • B blocks on M1
  • M1 is held by C
  • C is blocked on M2
  • M2 is held by A
  • B begins walking the prio chain

SLIDE 32

Re-ftrace-ing my steps

[Diagram: tasks A, B, C and mutexes M1, M2 linked by "blocked-on" and "owns" edges; B's orig_lock and task iterator are marked]

  • B blocks on M1
  • M1 is held by C
  • C is blocked on M2
  • M2 is held by A
  • B begins walking the prio chain... PREEMPT!

SLIDE 33

Re-ftrace-ing my steps

[Diagram: tasks A, B, C and mutexes M1, M2 linked by "blocked-on" and "owns" edges; B's orig_lock and task iterator are marked]

  • A is scheduled in, releases M2

SLIDE 34

Re-ftrace-ing my steps

[Diagram: tasks A, B, C and mutexes M1, M2 linked by "blocked-on" and "owns" edges; B's orig_lock and task iterator are marked]

  • A is scheduled in, releases M2
  • A takes M3 (uncontended) in userspace
  • A blocks on M1

SLIDE 35

Re-ftrace-ing my steps

  • D is scheduled in, blocks on M3 (creating an rtmutex)

[Diagram: tasks A, B, C, D and mutexes M1, M2, M3 linked by "blocked-on" and "owns" edges; B's orig_lock and task iterator are marked]

SLIDE 36

Re-ftrace-ing my steps

  • B is scheduled back in, continues its walk of the prio chain

[Diagram: tasks A, B, C, D and mutexes M1, M2, M3 linked by "blocked-on" and "owns" edges; orig_lock and the task iterator are marked]

SLIDE 37

Re-ftrace-ing my steps

  • B is scheduled back in, continues its walk of the prio chain

[Diagram: tasks A, B, C, D and mutexes M1, M2, M3 linked by "blocked-on" and "owns" edges; orig_lock and the task iterator are marked]

  • A blocks on the same mutex as B!

SLIDE 38

Re-ftrace-ing my steps

  • B is scheduled back in, continues its walk of the prio chain

[Diagram: tasks A, B, C, D and mutexes M1, M2, M3 linked by "blocked-on" and "owns" edges; orig_lock and the task iterator are marked]

  • A blocks on the same mutex as B!

(False) Deadlock detected!
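The race on slides 30-38 can be replayed in a few lines of C. This is a deliberately simplified model with hypothetical names; the real walk in rt_mutex_adjust_prio_chain() tracks much more state. The walker compares each lock it reaches against orig_lock, and a callback run at a chosen step stands in for the window in which B is scheduled out and the chain changes underneath it.

```c
#include <assert.h>
#include <stddef.h>

struct mtx;
struct tsk { struct mtx *blocked_on; };  /* "blocked-on" edge */
struct mtx { struct tsk *owner; };       /* "owns" edge */

/* The tasks and mutexes from the slides. */
static struct tsk A, B, C, D;
static struct mtx M1, M2, M3;

/* Slide 30: B blocks on M1, M1 is held by C, C is blocked on M2,
 * M2 is held by A. */
static void reset(void)
{
    A = (struct tsk){ NULL };
    B = (struct tsk){ &M1 };
    C = (struct tsk){ &M2 };
    D = (struct tsk){ NULL };
    M1 = (struct mtx){ &C };
    M2 = (struct mtx){ &A };
    M3 = (struct mtx){ NULL };
}

/* Slides 33-35, while B is scheduled out. */
static void chain_changes(void)
{
    /* A runs and releases M2; assumption: M2 passes to its waiter C. */
    M2.owner = &C;
    C.blocked_on = NULL;
    M3.owner = &A;          /* A takes M3 uncontended in userspace */
    A.blocked_on = &M1;     /* A blocks on M1, the same mutex as B */
    D.blocked_on = &M3;     /* D blocks on M3 */
}

/* Simplified walk: start at orig_lock's owner, follow blocked-on/owns
 * edges, and report a deadlock if the walk arrives back at orig_lock.
 * chain_changes() runs just before step preempt_at (-1: never). */
static int walk_reports_deadlock(struct mtx *orig_lock, int preempt_at)
{
    struct tsk *task = orig_lock->owner;

    for (int step = 0; task != NULL && step < 16; step++) {
        if (step == preempt_at)
            chain_changes();
        struct mtx *lock = task->blocked_on;

        if (lock == NULL)
            return 0;       /* chain ends at a task that is not blocked */
        if (lock == orig_lock)
            return 1;       /* "loop": the walker would return EDEADLK */
        task = lock->owner;
    }
    return 0;
}
```

With preempt_at set to 1, the walker has already stepped from C to A when the chain changes, so it next finds A blocked on M1 and reports a loop. Neither consistent snapshot contains a cycle: walking the changed state from scratch stops at C, which is no longer blocked.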

SLIDE 39

Takin' it to the Streets

  • Came to the linux-rt-users mailing list
  • Had a writeup of my findings and a preliminary patch
  • tglx saw the issue at hand, didn't like my patch, and proposed a different fix
  • Result: the issue got fixed, and I learned about working with the mailing lists
  • Moral: Don't be afraid to engage the community, but be ready for feedback

SLIDE 40

Conclusions

  • There are some great tools (and online documentation) for solving kernel issues
  • I've only covered two; there are many more:
    • Lockdep checking
    • RCU diagnostics
    • kdump kernel(s)
    • KDB
    • Vendor tools

SLIDE 41

Questions? Comments? Thanks!
