SLIDE 1

Compressing Strings of the Kernel

Wolfram Sang

Consultant

21.8.2014, LinuxCon14

Wolfram Sang (wsa@the-dreams.de) Compressing Strings of the Kernel 21.8.2014, LinuxCon14 1 / 36

SLIDE 2

The origin: CEWG project [1]

From the proposal:

Attempts have been made in the past to compress printk messages to save kernel runtime footprint. There is an option to disable all printks, but many embedded developers do not use it, even when they find the space savings attractive, because they still would like to see kernel debug messages. … Timothy Miller did some work on this in 2003 …

[1] http://elinux.org/Compressed_printk_messages

SLIDE 3

Timothy’s approach [2]

1. Compile kernel and keep .i-files
2. Filter them for printk strings
3. Compress those strings using tokenization
4. Create copies of the source files
5. There, replace strings with tokens
6. Compile again

[2] http://lwn.net/Articles/28935/

SLIDE 4

Further notes

- no code was made public, only the description, a codebook and some results
- not even sure unpacking at printk time was ever implemented
- allyesconfig was used for the tests
- based on 2.4.20 and 2.5.68

SLIDE 5

Author asks the golden question, too

Timothy Miller [3]:

"So, I ask... is this a useful savings? Is there any chance anyone would bother to increase their compile time by a factor of 5 in order to shave off 4% or 100k bytes?"

[3] https://lkml.org/lkml/2003/6/6/207

SLIDE 7

The Graph!

[Figure: bar chart comparing compile time and size, before and after; axis scale 1 to 5]

SLIDE 8

Three problems identified

1. Extract printk format strings
2. Compress printk format strings
3. Replace printk format strings

SLIDE 9

Extracting

Problem: Find all printk-strings

- There are lots of functions/defines embedding printk/vprintk_emit
- They are nested in all ways you can think of
- Moving target, there will be more
  - <new_subsys>_dev_err, …

SLIDE 10

Extracting: Options I

Scan the source files

- needs to know all printk-emitting functions
- misses merging of literals
- needs to handle all ways of string concatenation

SLIDE 11

Extracting: Options II

printk strings to own section

- scales a bit better (only base functions need to be converted)
- no knowledge where strings came from
- needs changes to core functions

SLIDE 12

Extracting: own section

+#define __printk(fmt, args...) \
+do { \
+	if (__builtin_constant_p(fmt)) { \
+		static const __attribute__((section("__printk"))) \
+			char __f[] = fmt; \
+		printk(__f, ##args); \
+	} else \
+		printk(fmt, ##args); \
+} while (0)
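
The same idea can be tried outside the kernel. The following is a userspace sketch, not the patch from the talk: it assumes GCC or Clang on an ELF target linked with GNU ld or lld, which provide `__start_<name>` and `__stop_<name>` symbols for sections whose name is a valid C identifier. The section name `fmtstrings`, the macro `log_msg()` and the helper functions are made up for illustration, and the macro only accepts string literals.

```c
#include <stdio.h>

/*
 * Userspace sketch, not kernel code. "fmtstrings", log_msg() and the
 * helpers below are hypothetical names. Requires an ELF toolchain that
 * generates __start_<section>/__stop_<section> symbols.
 */
#define log_msg(fmt, ...)                                          \
    do {                                                           \
        static const char fmt_[]                                   \
            __attribute__((section("fmtstrings"), used)) = fmt;    \
        printf(fmt_, ##__VA_ARGS__);                               \
    } while (0)

extern const char __start_fmtstrings[];
extern const char __stop_fmtstrings[];

/* Two call sites, so two format strings land in the section. */
void report(int irq, const char *name)
{
    log_msg("irq %d: probe failed\n", irq);
    log_msg("%s: device ready\n", name);
}

/* Bytes occupied by all collected format strings, padding included. */
size_t fmt_section_size(void)
{
    return (size_t)(__stop_fmtstrings - __start_fmtstrings);
}

/* Count the strings; NUL bytes between them may be alignment padding. */
size_t fmt_section_count(void)
{
    size_t n = 0;
    const char *p = __start_fmtstrings;

    while (p < __stop_fmtstrings) {
        if (*p == '\0') {
            p++;
            continue;
        }
        n++;
        while (p < __stop_fmtstrings && *p != '\0')
            p++;
    }
    return n;
}
```

Once the strings sit in one section, a post-processing step can measure or rewrite them without knowing which function they came from, which is both the benefit and the drawback named on the previous slide.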

SLIDE 13

Extracting: own section II

#define pr_emerg(fmt, ...) \
-	printk(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
+	__printk(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
#define pr_alert(fmt, ...) \
-	printk(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
+	__printk(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
...

BTW don’t redefine printk. Really, don’t!

SLIDE 14

Extracting: own section III

Author: Wolfram Sang <wsa@the-dreams.de>

    WIP: move printk strings to a special section

    Only pr_*, dev_*, BUG, and WARN are supported. Kernel doesn't fully
    build yet.

    Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

 drivers/base/core.c               | 36 ++++-------------------------
 include/asm-generic/bug.h         | 19 ++++++++++++----
 include/asm-generic/vmlinux.lds.h |  5 ++++
 include/linux/device.h            | 48 ++++++++++++++++++++++++++-------------
 include/linux/printk.h            | 27 +++++++++++++++-------
 5 files changed, 75 insertions(+), 60 deletions(-)

SLIDE 15

Upstreaming forecast

SLIDE 17

Compressing

Problem: Algorithm

- lots of small strings
- should be instantly available
  - not somewhere in the middle of packed data
- no significant overhead

Conclusion

- no sliding window algos (LZ and friends)
- no variable length encoding (Huffman and friends)
- no frequency based compression (stats3)
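
What "instantly available" can look like is sketched below, assuming a fixed codebook and pure 7-bit ASCII messages so that byte values from 0x80 upwards are free to act as tokens. The codebook entries, the packed messages and `unpack_message()` are invented for illustration; they are not taken from the talk or from Timothy Miller's codebook.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical codebook: byte value 0x80 + i stands for entry i. */
static const char *const codebook[] = {
    "error",          /* 0x80 */
    "failed",         /* 0x81 */
    " sensor ",       /* 0x82 */
    "initialization", /* 0x83 */
};
#define CODEBOOK_LEN (sizeof(codebook) / sizeof(codebook[0]))

/* Every message is packed on its own, so each can be expanded directly. */
static const char *const packed[] = {
    "OV9650\x82\x83 \x81\n",
    "UBI \x80: bad alignment\n",
};
#define PACKED_LEN (sizeof(packed) / sizeof(packed[0]))

/* Expand message idx into out (capacity cap); returns length or -1. */
int unpack_message(size_t idx, char *out, size_t cap)
{
    const unsigned char *p;
    size_t n = 0;

    if (idx >= PACKED_LEN || cap == 0)
        return -1;

    for (p = (const unsigned char *)packed[idx]; *p; p++) {
        if (*p >= 0x80) {
            /* Token byte: copy the codebook entry it refers to. */
            size_t code = (size_t)(*p - 0x80);
            size_t len;

            if (code >= CODEBOOK_LEN)
                return -1;
            len = strlen(codebook[code]);
            if (n + len >= cap)
                return -1;
            memcpy(out + n, codebook[code], len);
            n += len;
        } else {
            /* Literal ASCII byte. */
            if (n + 1 >= cap)
                return -1;
            out[n++] = (char)*p;
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

Expanding one message touches only that message and the codebook; no other packed data has to be read first, which is the property the sliding window and variable length schemes lack.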

SLIDE 19

Compressing II

- tokenization is actually a good option
- BytePairEncoding works, too
- both achieve ≈ 50% of compression
- still ≈ 4% of the kernel size gained

Problem: UTF8

Both approaches need 'empty' symbols which might collide with UTF8 encoding.
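
A compact byte pair encoding sketch follows. It is a textbook variant written for this note, not the code used for the measurements above. It assumes the input is a concatenation of NUL-terminated 7-bit ASCII strings, so byte values 0x80..0xFF serve as the 'empty' symbols; UTF-8 text would already use those values, which is exactly the collision described on the slide. All names (`bpe_table`, `bpe_compress()`, `bpe_decode()`) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define FIRST_CODE 0x80 /* assumption: input is 7-bit ASCII */
#define MAX_CODES  128

struct bpe_table {
    unsigned char left[MAX_CODES];  /* first byte of the replaced pair */
    unsigned char right[MAX_CODES]; /* second byte of the replaced pair */
    int ncodes;
};

/* Replace the most frequent adjacent pair by a new code; returns new length. */
static size_t bpe_merge_once(unsigned char *buf, size_t len, struct bpe_table *t)
{
    static unsigned int count[256][256];
    unsigned int best = 1; /* a pair must occur at least twice */
    unsigned char bl = 0, br = 0, code;
    size_t i, out;

    if (t->ncodes >= MAX_CODES)
        return len;

    memset(count, 0, sizeof(count));
    for (i = 0; i + 1 < len; i++) {
        unsigned char a = buf[i], b = buf[i + 1];

        if (a == 0 || b == 0)
            continue; /* never merge across a string terminator */
        if (++count[a][b] > best) {
            best = count[a][b];
            bl = a;
            br = b;
        }
    }
    if (best < 2)
        return len;

    t->left[t->ncodes] = bl;
    t->right[t->ncodes] = br;
    code = (unsigned char)(FIRST_CODE + t->ncodes++);

    for (i = 0, out = 0; i < len;) {
        if (i + 1 < len && buf[i] == bl && buf[i + 1] == br) {
            buf[out++] = code;
            i += 2;
        } else {
            buf[out++] = buf[i++];
        }
    }
    return out;
}

/* Compress buf in place until no pair repeats; returns the packed length. */
size_t bpe_compress(unsigned char *buf, size_t len, struct bpe_table *t)
{
    t->ncodes = 0;
    for (;;) {
        size_t n = bpe_merge_once(buf, len, t);

        if (n == len)
            return len;
        len = n;
    }
}

/* Expand one byte (code or literal) into out; returns bytes written. */
static size_t bpe_expand(const struct bpe_table *t, unsigned char c, char *out)
{
    size_t n;

    if (c < FIRST_CODE) {
        out[0] = (char)c;
        return 1;
    }
    n = bpe_expand(t, t->left[c - FIRST_CODE], out);
    return n + bpe_expand(t, t->right[c - FIRST_CODE], out + n);
}

/* Decode one NUL-terminated packed string; out must be large enough. */
size_t bpe_decode(const struct bpe_table *t, const unsigned char *s, char *out)
{
    size_t n = 0;

    while (*s)
        n += bpe_expand(t, *s++, out + n);
    out[n] = '\0';
    return n;
}
```

Because pairs never span a terminator, each packed string stays individually decodable with nothing but the pair table, which plays the role of the codebook.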

SLIDE 21

Upstreaming forecast

SLIDE 23

Compressing III

Problem: Codebook

- allyesconfig is unrealistic for tiny systems
- smaller kernels, smaller pool for codes
- what about modules?
  - share codebook from kernel → tied to that build
  - own codebook → overhead eats gain

Brainstorming

- predefined codebook?
- could save second kernel compile, too

SLIDE 24

Replacing

Scanning

- run source files through filter before compiling

Own section

- Work on the section directly?

SLIDE 25

Upstreaming forecast

SLIDE 27

Further issues

- printk strings are only a subset
- devicetree uses a lot of strings!
  - which should be easier to tackle since they are accessed via of_* functions
- address this problem at a higher level
  - all strings? all .rodata?

SLIDE 28

Summary

SLIDE 30

This quote still makes sense

From: Managing Gigabytes [4]

"We find ourselves in the midst of a practically important and intellectually fascinating convergence between the desire for more and better compression and the need to learn about what 'structure' there is in data."

[4] Witten/Moffat/Bell, 1st edition, p. 385

SLIDE 31

Analyzing the data

Observations from 3.16-rc5:

- x86-64 allyesconfig
- arm-cortexa8 customer kernel
  - maybe a bit biased for device drivers

SLIDE 32

Print from central locations

Proposal

- strings should be emitted from as centralized locations as possible
- bonus: consistent messages

Examples

- OOM error message removal
- devm_ioremap_resource()

SLIDE 33

Print from central locations II

Possibilities

- have basic functions not printing messages
- have a convenience function suitable for most cases printing error messages
- devm_get_*_optional are also good candidates
- Removing error strings for devm_clk_get saved 20K instantly
- lots of other candidates
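
The split between a silent basic function and a printing convenience function might look like the userspace sketch below. It is only an illustration of the pattern; `res_find()`, `res_get()` and the resource table are hypothetical and do not correspond to any kernel API.

```c
#include <stdio.h>
#include <string.h>

struct resource {
    const char *name;
    int value;
};

/* Hypothetical lookup table standing in for clocks, IRQs, ... */
static const struct resource table[] = {
    { "clk", 24000000 },
    { "irq", 42 },
};

static unsigned int reported; /* counts messages, for demonstration only */

/* Basic function: signals failure via the return value, prints nothing. */
const struct resource *res_find(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Convenience function: the single place that holds the error string. */
const struct resource *res_get(const char *user, const char *name)
{
    const struct resource *r = res_find(name);

    if (!r) {
        fprintf(stderr, "%s: failed to get resource '%s'\n", user, name);
        reported++;
    }
    return r;
}

unsigned int res_errors_reported(void)
{
    return reported;
}
```

Callers that treat a missing resource as normal use `res_find()`; everyone else uses `res_get()` and carries no error string of their own, so the message exists once and reads the same everywhere.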

SLIDE 34

Prefixes

Observation

- usually done by a literal prefixing the format string
- creates unique strings
- …which have redundancy in them
- use dev_* and friends if possible

$ strings ubi.o
...
3UBI error: %s: negative values
3UBI error: %s: bad alignment
3UBI error: %s: alignment is not multiple of min I/O
...

SLIDE 36

Prefixes: Dead simple tinyfication

How pr_fmt gets applied:

#define pr_alert(fmt, ...) \
	printk(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)

How to simplify it:

-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#define pr_fmt(fmt) "%s" fmt, KBUILD_MODNAME ": "

That saved ≈ 15% (or 250 bytes) for sn9c20x. Applied to all 900 instances in the kernel, it saved about 30K (or 4‰). So it pays off only in some places.
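
The effect of the changed macro can be checked in userspace. The sketch below assumes `snprintf` in place of `printk` and a fixed `MODNAME` in place of `KBUILD_MODNAME`; the macro and function names are invented for this example, and `##__VA_ARGS__` is the GNU extension the kernel macros use as well.

```c
#include <stdio.h>
#include <string.h>

#define MODNAME "sn9c20x" /* stand-in for KBUILD_MODNAME */

/* Old style: the prefix is pasted into every format string. */
#define FMT_OLD(fmt) MODNAME ": " fmt
/* New style: the format gains only "%s"; the prefix is one shared argument. */
#define FMT_NEW(fmt) "%s" fmt, MODNAME ": "

#define msg_old(buf, fmt, ...) \
    snprintf(buf, sizeof(buf), FMT_OLD(fmt), ##__VA_ARGS__)
#define msg_new(buf, fmt, ...) \
    snprintf(buf, sizeof(buf), FMT_NEW(fmt), ##__VA_ARGS__)

/* Format the same message both ways; returns 1 if the text is identical. */
int same_output(int reg)
{
    char a[64], b[64];

    msg_old(a, "bad register 0x%02x\n", reg);
    msg_new(b, "bad register 0x%02x\n", reg);
    return strcmp(a, b) == 0;
}

/* Bytes of format string stored for this one message, old vs. new. */
size_t old_fmt_size(void)
{
    return sizeof(FMT_OLD("bad register 0x%02x\n"));
}

size_t new_fmt_size(void)
{
    return sizeof("%s" "bad register 0x%02x\n");
}
```

Each message shrinks by the prefix length minus the two bytes of "%s", while the prefix itself is stored once, provided the toolchain merges identical literals. Every call site passes one more argument, though, which presumably is part of why the gain varies from place to place.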

SLIDE 37

Prefixes: More easy stuff

/* UBI error messages */
-#define ubi_err(fmt, ...) pr_err("UBI error: %s: " fmt "\n", \
-	__func__, ##__VA_ARGS__)
+#define ubi_err(fmt, ...) printk("%s%s: " fmt "\n", \
+	KERN_ERR "UBI error: ", __func__, ##__VA_ARGS__)

That saved ≈ 15% (or 2.5K). UBIFS, JFFS2 and the SCSI layer also seem to be promising candidates.

SLIDE 38

Copy’n’paste

switch (sd->sensor) {
case SENSOR_OV9650:
	ov9650_init_sensor(gspca_dev);
	if (gspca_dev->usb_err < 0)
		break;
	pr_info("OV9650 sensor detected\n");
	break;
case SENSOR_OV9655:
	ov9655_init_sensor(gspca_dev);
	if (gspca_dev->usb_err < 0)
		break;
	pr_info("OV9655 sensor detected\n");
	break;
case SENSOR_SOI968:
	soi968_init_sensor(gspca_dev);
	if (gspca_dev->usb_err < 0)
		break;
	pr_info("SOI968 sensor detected\n");
	break;
/* ... 7 more ... */

SLIDE 39

Copy’n’paste II

And in the init functions:

if (gspca_dev->usb_err < 0)
	pr_err("OV9650 sensor initialization failed\n");
...
if (gspca_dev->usb_err < 0)
	pr_err("OV9655 sensor initialization failed\n");
...
if (gspca_dev->usb_err < 0)
	pr_err("SOI968 sensor initialization failed\n");
...
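
One possible shape for such a cleanup is a table of sensors plus two shared format strings. The sketch below is a standalone userspace illustration, not a patch for the driver: the table, the stub init functions and `sensor_probe()` are hypothetical, and the message is written into a buffer instead of going through `pr_info`/`pr_err`.

```c
#include <stdio.h>
#include <stddef.h>

enum sensor_id { SENSOR_OV9650, SENSOR_OV9655, SENSOR_SOI968, SENSOR_COUNT };

struct sensor_desc {
    const char *name;
    int (*init)(void); /* returns 0 on success, negative on error */
};

/* Stub init functions standing in for the real per-sensor setup. */
static int init_ok(void)   { return 0; }
static int init_fail(void) { return -5; }

/* Hypothetical table: one short name per sensor instead of two full strings. */
static const struct sensor_desc sensors[SENSOR_COUNT] = {
    [SENSOR_OV9650] = { "OV9650", init_ok },
    [SENSOR_OV9655] = { "OV9655", init_fail },
    [SENSOR_SOI968] = { "SOI968", init_ok },
};

/* Two format strings serve every sensor; the message is written to buf. */
int sensor_probe(enum sensor_id id, char *buf, size_t cap)
{
    const struct sensor_desc *s;
    int err;

    if ((unsigned int)id >= SENSOR_COUNT)
        return -1;

    s = &sensors[id];
    err = s->init();
    if (err < 0)
        snprintf(buf, cap, "%s sensor initialization failed\n", s->name);
    else
        snprintf(buf, cap, "%s sensor detected\n", s->name);
    return err;
}
```

With ten sensors this replaces twenty near-identical strings by ten short names and two formats, and the switch statement collapses into a single lookup.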

SLIDE 40

Copy’n’paste: A little love <3

- proper cleanups are most sustainable
- will not only remove strings but code as well
- somebody up to it? I donate the camera :)

SLIDE 41

Layer 8

Be aware when adding strings

- be precise with printk level
  - adds possibility to compile out based on level
- be conservative with non debug strings
  - we need rules of thumb here…
- try to be consistent with the strings you create
- strings cost (a little), so they should be worth it
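
Why a precise level matters for compiling messages out can be shown with a small userspace sketch. The level constants loosely mirror the kernel's numbering, but `MAX_LOG_LEVEL`, `log_at()` and `log_demo()` are hypothetical names chosen for this example.

```c
#include <stdio.h>

/* Hypothetical levels; lower numbers are more severe. */
#define LVL_ERR   3
#define LVL_INFO  6
#define LVL_DEBUG 7

/* Build-time cutoff: messages above this level are never emitted. */
#ifndef MAX_LOG_LEVEL
#define MAX_LOG_LEVEL LVL_INFO
#endif

static int emitted; /* counts emitted messages, for demonstration only */

/*
 * The condition is a compile-time constant, so an optimizing compiler can
 * drop the whole branch, including the format string, for excluded levels.
 */
#define log_at(level, fmt, ...)                      \
    do {                                             \
        if ((level) <= MAX_LOG_LEVEL) {              \
            emitted++;                               \
            fprintf(stderr, fmt, ##__VA_ARGS__);     \
        }                                            \
    } while (0)

/* Emits three messages at different levels; returns how many got through. */
int log_demo(void)
{
    emitted = 0;
    log_at(LVL_ERR, "probe failed: %d\n", -5);
    log_at(LVL_INFO, "device ready\n");
    log_at(LVL_DEBUG, "register dump: %#x\n", 0x1f); /* above the cutoff */
    return emitted;
}
```

A message tagged with a level that is too severe survives every cutoff, so sloppy levels directly limit how much a size-constrained build can strip.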

SLIDE 42

Conclusion I

Compressed printk-strings, only for corner-cases

- the price for compressed printk-strings is still high
- lots of side effects and increase of build time
- looking at printk-strings is not enough

SLIDE 43

Conclusion II

General improvements, benefit for all

- there is quite some potential to simply reduce # of strings, especially through centralization
- …which mostly makes them more consistent, too
- should be the first goal before stepping further

SLIDE 44

The End

Thank you for your attention!

Questions? Comments?

- right now
- anytime at this conference
- wsa@the-dreams.de
