Coccinelle: 10 Years of Automated Evolution in the Linux Kernel



slide-1
SLIDE 1

Coccinelle: 10 Years of Automated Evolution in the Linux Kernel

Julia Lawall (Inria-Whisper team, Julia.Lawall@inria.fr) March 2, 2020


slide-2
SLIDE 2

Our focus: The Linux kernel

  • Open source OS kernel, developed by Linus Torvalds
  • First released in 1991
  • Version 1.0.0 released in 1994
  • Today used in the top 500 supercomputers, billions of smartphones (Android), battleships, stock exchanges, …

slide-3
SLIDE 3

Some history

First release in 1991.

  • v1.0 in 1994: 121 KLOC, v2.0 in 1996: 500 KLOC

Recent evolution:

[Figure: three plots. Kernel size in million LOC (axis 5 to 20) per year, 2006 to 2020; number of contributors (axis 500 to 2,000) per year, 2006 to 2020; and a log-scale plot relating contributions and contributors.]

slide-4
SLIDE 4

Key challenge

As software grows, how to ensure its continued maintenance?

  • Updating interfaces is easy.

    Make functions and data structures:

    – More efficient
    – Easier to use correctly
    – Better adapted to their usage context

  • Updating the uses of interfaces gets harder as the software grows.

    – More time consuming
    – More error prone
    – Need to communicate new coding strategies to all developers

Developers may hesitate to make needed changes.

slide-7
SLIDE 7

Example change: init_timer → setup_timer

Initializing a timer requires:

  • The callback function to run when the timer expires
  • The data that should be passed to that callback function

Original initialization strategy (present in Linux v1.2.0, 1995):

init_timer(&ns_timer);
ns_timer.data = 0UL;
ns_timer.function = ns_poll;

slide-9
SLIDE 9

Example change: init_timer → setup_timer

Replacement initialization strategy (introduced in Linux v2.6.15, Jan. 2006):

setup_timer(&ns_timer, ns_poll, 0UL);

Advantages:

  • More concise
  • More uniform
  • More secure
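
The difference between the two strategies can be sketched in plain userspace C. This is only an illustration: the struct timer_list, init_timer, and setup_timer below are simplified mock stand-ins written for this example, not the kernel's definitions, and same_state is a hypothetical helper that checks both styles leave the timer in the same state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct timer_list;
 * only the fields mentioned on the slides are modeled. */
struct timer_list {
    unsigned long expires;
    void (*function)(unsigned long);
    unsigned long data;
    int initialized;                /* mock-only bookkeeping */
};

/* Mock of the older API: prepares the timer, but the callback and its
 * argument must be filled in by the caller afterwards. */
static void init_timer(struct timer_list *t)
{
    assert(t != NULL);
    t->expires = 0;
    t->initialized = 1;
}

/* Mock of the newer API: callback and argument are set in the same call. */
static void setup_timer(struct timer_list *t,
                        void (*fn)(unsigned long), unsigned long data)
{
    init_timer(t);
    t->function = fn;
    t->data = data;
}

/* Placeholder callback; a real one would poll the device. */
static void ns_poll(unsigned long arg)
{
    (void)arg;
}

/* Old style: three statements that must be kept consistent by hand. */
static struct timer_list old_style(void)
{
    struct timer_list ns_timer = {0};
    init_timer(&ns_timer);
    ns_timer.data = 0UL;
    ns_timer.function = ns_poll;
    return ns_timer;
}

/* New style: a single call carries all three pieces of information. */
static struct timer_list new_style(void)
{
    struct timer_list ns_timer = {0};
    setup_timer(&ns_timer, ns_poll, 0UL);
    return ns_timer;
}

/* Returns 1 when both styles leave the timer in the same state. */
int same_state(void)
{
    struct timer_list a = old_style();
    struct timer_list b = new_style();
    return a.function == b.function && a.data == b.data &&
           a.expires == b.expires && a.initialized == b.initialized;
}
```

In this mock, same_state() returns 1: the single-call form cannot leave the function or data field unset, which is one way to read "more uniform".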


slide-10
SLIDE 10

Example change: init_timer → setup_timer

[Figure: number of call sites (axis 200 to 600) of init_timer and setup_timer across kernel versions, from v2.6.15 (Jan 2006) through v3.0 (Jul 2011) and v4.0 (Apr 2015) to v4.14 (Nov 2017).]

slide-11
SLIDE 11

Example bug: missing of_node_puts

Device node structures are reference counted:

  • of_node_get to access the structure.
  • of_node_put to let go of the structure.

Iterators, e.g., for_each_child_of_node, put one value and get another.

  • Explicit put needed on break, return, goto out of the loop.
  • Often forgotten.
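
The bug pattern can be reproduced in a small userspace sketch. Everything here is a mock written for illustration: struct node, node_get, node_put, and the for_each_child macro are hypothetical stand-ins for the kernel's device node, of_node_get, of_node_put, and for_each_child_of_node, and the array-based iteration is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a reference-counted device node. */
struct node {
    int refcount;
    int id;
};

static struct node *node_get(struct node *n) { if (n) n->refcount++; return n; }
static void node_put(struct node *n) { if (n) n->refcount--; }

/* Mock iterator step: puts the previous child and gets the next one. */
static struct node *next_child(struct node *kids, size_t n, struct node *prev)
{
    assert(kids != NULL);
    size_t next = prev ? (size_t)(prev - kids) + 1 : 0;
    node_put(prev);
    return next < n ? node_get(&kids[next]) : NULL;
}

#define for_each_child(kids, n, child) \
    for (child = next_child(kids, n, NULL); child != NULL; \
         child = next_child(kids, n, child))

/* Buggy: leaves the loop early while still holding a reference. */
int find_leaky(struct node *kids, size_t n, int wanted)
{
    struct node *child;
    for_each_child(kids, n, child) {
        if (child->id == wanted)
            return 1;               /* missing node_put(child) */
    }
    return 0;
}

/* Fixed: explicit put before jumping out of the loop. */
int find_fixed(struct node *kids, size_t n, int wanted)
{
    struct node *child;
    for_each_child(kids, n, child) {
        if (child->id == wanted) {
            node_put(child);
            return 1;
        }
    }
    return 0;
}

/* Sum of outstanding references; 0 means nothing leaked. */
int outstanding(const struct node *kids, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += kids[i].refcount;
    return total;
}
```

When the loop runs to completion, the iterator itself releases the last child, so no put is needed. Only the early exit in find_leaky leaves a reference behind, which is why the omission is easy to miss in review.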


slide-12
SLIDE 12

Example bug: missing of_node_puts

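[Figure: number of jump sites (axis 50 to 250) where the of_node_put is missing versus present, across kernel versions from v2.6.17 (Jun 2006) through v3.0 (Jul 2011) and v4.0 (Apr 2015) to v5.5 (Jan 2020).]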

slide-13
SLIDE 13

Assessment

  • Changes may involve scattered code fragments and data and control flow relationships between them.
    – Grep insufficient to find the problem.
  • Changes may be widely scattered across the code base.
    – Tedious and time-consuming to find all occurrences.
  • Changes may come in many variants.
    – Hard to anticipate; some variants may be overlooked.
  • Developers are unaware of changes that affect their code.
    – New code can be introduced using the old coding strategy.

slide-17
SLIDE 17

Coccinelle to the rescue!


slide-18
SLIDE 18

What is Coccinelle?

  • Pattern-based tool for matching and transforming C code
  • Under development since 2005. Open source since 2008.
  • Allows code changes to be expressed using patch-like code patterns (semantic patches).
  • Goal: Automate large-scale changes in a way that fits with the habits of the Linux kernel developer.

slide-19
SLIDE 19

Starting point: a patch

--- a/drivers/atm/nicstar.c
+++ b/drivers/atm/nicstar.c
@@ -287,4 +287,2 @@
- init_timer(&ns_timer);
+ setup_timer(&ns_timer, ns_poll, 0UL);
  ns_timer.expires = jiffies + NS_POLL_PERIOD;
- ns_timer.data = 0UL;
- ns_timer.function = ns_poll;

slide-20
SLIDE 20

Semantic patches

  • Like patches, but independent of irrelevant details (line numbers, spacing, variable names, etc.)
  • Derived from code, with abstraction.

slide-21
SLIDE 21

Example: Creating an init_timer → setup_timer semantic patch

A patch: derived from drivers/atm/nicstar.c

- init_timer(&ns_timer);
+ setup_timer(&ns_timer, ns_poll, 0UL);
  ns_timer.expires = jiffies + NS_POLL_PERIOD;
- ns_timer.data = 0UL;
- ns_timer.function = ns_poll;

slide-22
SLIDE 22

Example: Creating an init_timer → setup_timer semantic patch

Remove irrelevant code:

- init_timer(&ns_timer);
+ setup_timer(&ns_timer, ns_poll, 0UL);
  ...
- ns_timer.data = 0UL;
- ns_timer.function = ns_poll;

slide-23
SLIDE 23

Example: Creating an init_timer → setup_timer semantic patch

Abstract over subterms:

@@
expression timer, fn_arg, data_arg;
@@

- init_timer(&timer);
+ setup_timer(&timer, fn_arg, data_arg);
  ...
- timer.data = data_arg;
- timer.function = fn_arg;

slide-24
SLIDE 24

Example: Creating an init_timer → setup_timer semantic patch

Generalize a little more:

@@
expression timer, fn_arg, data_arg;
@@

- init_timer(&timer);
+ setup_timer(&timer, fn_arg, data_arg);
  ...
- timer.data = data_arg;
  ...
- timer.function = fn_arg;

slide-25
SLIDE 25

Results

Dataset: 598 Linux kernel files containing init_timer calls, taken from different versions.

  • 828 calls.
  • Our semantic patch updates 308 of them.

Untreated example: drivers/tty/n_gsm.c:

init_timer(&dlci->t1);
dlci->t1.function = gsm_dlci_t1;
dlci->t1.data = (unsigned long)dlci;

slide-27
SLIDE 27

Example: Creating an init_timer → setup_timer semantic patch

Extended semantic patch:

@@
expression timer, fn_arg, data_arg;
@@

- init_timer(&timer);
+ setup_timer(&timer, fn_arg, data_arg);
  ...
- timer.data = data_arg;
  ...
- timer.function = fn_arg;

@@
expression timer, fn_arg, data_arg;
@@

- init_timer(&timer);
+ setup_timer(&timer, fn_arg, data_arg);
  ...
- timer.function = fn_arg;
  ...
- timer.data = data_arg;

Covers 656/828 calls.

slide-29
SLIDE 29

Example: Creating an init_timer → setup_timer semantic patch

Remaining issues

  • Some code initializes the function and data before calling init_timer.
  • Some timers have no data initialization, default to 0.
  • Coccinelle sometimes times out.

Complete semantic patch

  • 6 rules, 68 lines of code.
  • Covers 808/828 calls.
  • TODO: Some timers have no local function or data initialization.


slide-30
SLIDE 30

Semantic patch example

@@
expression root,e;
local idexpression child;
iterator name for_each_child_of_node;
@@

 for_each_child_of_node(root, child) {
   ... when != of_node_put(child)
       when != e = child
+  of_node_put(child);
?  break;
   ...
 }
 ... when != child

Used in the big v5.4 cleanup.

slide-31
SLIDE 31

Assessment

  • Changes may involve scattered code fragments and data and control flow relationships between them.
    – ... connects related fragments over control-flow paths.
  • Changes may be widely scattered across the code base.
    – Coccinelle finds and updates all relevant code automatically.
  • Changes may come in many variants.
    – Semantic patches are easily adapted to new variants.
  • Developers are unaware of changes that affect their code.
    – Semantic patches in commit logs document changes.
    – Semantic patches can be collected in a library and checked during continuous integration.

slide-35
SLIDE 35

Impact: Patches in the Linux kernel

Over 7700 Linux kernel commits up to Linux v5.5 (Jan 2020).

[Figure: number of commits per year, 2007 to 2019 (axis 200 to 400), by author category: Coccinelle developers, Outreachy interns, Kernel maintainers, Dedicated user, Others.]

slide-36
SLIDE 36

Impact: Cleanup vs. bug fix changes among maintainer patches using Coccinelle

[Figure: number of patches per year, 2008 to 2017 (axis 100 to 300), split into cleanups and bug fixes.]

slide-37
SLIDE 37

Impact: Maintainer use examples

  • TTY. Remove an unused function argument.
    – 11 affected files.
  • DRM. Eliminate a redundant field in a data structure.
    – 54 affected files.
  • Interrupts. Prepare to remove the irq argument from interrupt handlers, and then remove that argument.
    – 188 affected files.

slide-38
SLIDE 38

Impact: 0-day reports mentioning Coccinelle per year

[Figure: two charts per year, 2013 to 2017: number of reports with patches (axis 200 to 400) and number of reports with message only (axis 100 to 200), broken down by category: api, free, iterators, locks, null, tests, misc.]

slide-39
SLIDE 39

Conclusion

  • Coccinelle: brings automatic matching and transformation to the systems software developer.
    – Enables needed evolution, independent of the amount of affected code.
  • Success: Almost 8000 commits in the Linux kernel based on Coccinelle.
  • Future work: Automatic generation of semantic patches from examples.
  • Beyond C: Some support for C++, a variant for Java (Coccinelle4J)
  • Probably, everyone in this room uses some Coccinelle-modified code!

http://coccinelle.lip6.fr/
https://github.com/coccinelle/coccinelle
https://github.com/kanghj/coccinelle/tree/java
