Swapping and embedded: compression relieves the pressure? Vitaly - PowerPoint PPT Presentation

Swapping and embedded: compression relieves the pressure? Vitaly Wool Embedded Linux Conference 2016

Intro> Swapping (Paging) ● Paging: [OS capability of] using secondary storage to store and retrieve data – With RAM being primary – Storing and retrieving happens on a per-page basis ● Page – Uniform-size storage block, usually of size 2^n – Corresponds to a single record in the page table ● Paging is only possible with virtual memory (VM) enabled
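Because the page size is a power of two, splitting an address into a page number and an offset within the page is a shift and a mask. The sketch below assumes 4 KiB pages (2^12 bytes), which is common but not stated on the slide; the function name is made up for illustration.

```python
PAGE_SHIFT = 12               # assumption: 4 KiB pages, i.e. 2**12 bytes
PAGE_SIZE = 1 << PAGE_SHIFT   # 4096
PAGE_MASK = PAGE_SIZE - 1     # low 12 bits select the byte inside the page


def split_address(vaddr):
    """Return (page number, offset within the page) for a virtual address."""
    return vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK


page_number, offset = split_address(0x12345)
print(page_number, hex(offset))
```

The page number is what indexes the page table record mentioned above; the offset is carried over unchanged.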

Intro> Swapping

Intro> Embedded device objectives ● [very] limited RAM ● [relatively] slow storage – Using swap will hurt performance ● [relatively] small storage – There is hardly room for a large swap area ● Flash chip used as storage – Swap on flash wears it out fast

Intro> Swapping in Embedded ● Should be applicable – Constrained RAM ● But sometimes it isn't – Constrained storage ● May have adverse effects – Faster wear-out of flash storage – Longer delays if the storage device is slow ● There has to be a way out...

Smarter swapping> Swapping optimization: zswap ● zswap: compressed write-back cache for swapped pages – Write operation completion is signaled on write-to-cache completion ● Compresses swapped-out pages and moves them into a pool – This pool is dynamically allocated in RAM ● Configurable parameters – Pool size – Compression algorithm
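As a rough sketch of how the two configurable parameters could be set: on mainline kernels zswap exposes module parameters such as compressor, max_pool_percent and enabled under /sys/module/zswap/parameters. The helper name and the example values below are assumptions, and the demo writes into a scratch directory rather than the real sysfs tree, which needs root.

```python
import tempfile
from pathlib import Path

# Real location on a running system (writing there needs root):
ZSWAP_PARAMS = "/sys/module/zswap/parameters"


def configure_zswap(params_dir, compressor="lzo", max_pool_percent=20):
    """Write zswap settings as files under params_dir, enabling zswap last."""
    settings = {
        "compressor": compressor,                    # compression algorithm
        "max_pool_percent": str(max_pool_percent),   # pool size cap, % of RAM
        "enabled": "Y",
    }
    for name, value in settings.items():
        (Path(params_dir) / name).write_text(value)
    return settings


# Dry run against a scratch directory instead of the real sysfs tree.
with tempfile.TemporaryDirectory() as scratch:
    applied = configure_zswap(scratch, compressor="lz4", max_pool_percent=25)
    written = {p.name: p.read_text() for p in Path(scratch).iterdir()}

print(written)
```

On a real device the same call would be made with ZSWAP_PARAMS as the first argument; which compressors are available depends on the kernel configuration.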

Smarter swapping> zswap backend: zbud ● zbud: special purpose memory allocator – allocation is always per-page ● Stores up to 2 compressed pages per page – One bound to the beginning, one to the end – The in-page pages are called “buddies” ● Key characteristics – Simplicity and stability ● zbud is the allocator backend for zswap
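The buddy scheme can be illustrated with a toy model. This is a simplified sketch, not the kernel's zbud code: it ignores the in-page header and chunk rounding, uses plain first-fit, and assumes 4 KiB pages. All class and method names are made up.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


class ZbudPage:
    """One physical page holding at most two compressed objects."""

    def __init__(self):
        self.first = 0  # bytes used by the buddy bound to the page start
        self.last = 0   # bytes used by the buddy bound to the page end

    def try_store(self, size):
        """Place an object in a free buddy slot; return the slot name or None."""
        if size > PAGE_SIZE - self.first - self.last:
            return None
        if self.first == 0:
            self.first = size
            return "first"
        if self.last == 0:
            self.last = size
            return "last"
        return None  # both buddies taken, even if bytes remain


class ZbudPool:
    def __init__(self):
        self.pages = []

    def store(self, size):
        """Return (page index, slot) for a compressed object of `size` bytes."""
        if not 0 < size <= PAGE_SIZE:
            raise ValueError("object does not fit in a page")
        for index, page in enumerate(self.pages):
            slot = page.try_store(size)
            if slot:
                return index, slot
        page = ZbudPage()
        self.pages.append(page)
        return len(self.pages) - 1, page.try_store(size)


pool = ZbudPool()
placements = [pool.store(size) for size in (1500, 2000, 1000)]
print(placements, len(pool.pages))
```

The third object opens a new page even though it is small, because the first page already holds two buddies. That cap is what keeps the model simple and bounds the achievable compression ratio.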

Smarter swapping> RAM as a swap storage ● Compression required – No gain otherwise – But increases CPU load ● Implementation of a [virtual] block device required ● Careful memory management is required – Should not use high-order page allocations

Smarter swapping> ZRAM ● Block device for compressed data storage in RAM – Compression algorithm is configurable – Default algorithm is LZO – LZ4 is the one mostly used in practice ● Usually deployed as a self-contained swap device – The size is specified at runtime (via sysfs) – Configuration is the same otherwise
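A minimal sketch of the runtime configuration, assuming the usual zram sysfs attributes comp_algorithm and disksize under /sys/block/zram0. Reading comp_algorithm lists the available algorithms with the active one in square brackets. The helper names and the 64M size are illustrative, and the demo writes to a scratch directory rather than a real device.

```python
import tempfile
from pathlib import Path


def parse_comp_algorithm(text):
    """Parse sysfs text such as 'lzo [lz4]' into (available, selected)."""
    available, selected = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        available.append(token)
    return available, selected


def configure_zram(dev_dir, algorithm, disksize):
    """Set the algorithm first, then the size, as files under dev_dir."""
    dev = Path(dev_dir)
    (dev / "comp_algorithm").write_text(algorithm)
    (dev / "disksize").write_text(disksize)


# Dry run; on a device dev_dir would be /sys/block/zram0 (root needed),
# followed by `mkswap /dev/zram0` and `swapon /dev/zram0`.
with tempfile.TemporaryDirectory() as scratch:
    configure_zram(scratch, "lz4", "64M")
    zram_files = {p.name: p.read_text() for p in Path(scratch).iterdir()}

print(parse_comp_algorithm("lzo [lz4]"), zram_files)
```

The algorithm is written before the size because the device is initialized once disksize is set.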

Smarter swapping> ZRAM vs Flash swap ● Compared on Carambola (MIPS 24Kc) – Details on the configuration will follow ● Standard I/O measurement tools – 'fio' with 'tiobench' script ● Results – Average read speed: 730 vs 699 (kb/s) – Average write speed: 180.5 vs 172 (kb/s) ● Difference is larger where RAM is faster

Smarter swapping> zsmalloc: ZRAM backend ● Special purpose pool-based memory allocator ● Packs objects into a set of non-contiguous pages – ZRAM calls into zsmalloc to allocate space for compressed data – Compressed data is stored in scattered pages within the pool

z--- in detail> zsmalloc and zbud compared

                         zsmalloc          zbud
Compression ratio        High (3x – 4x)    Medium/Low (1.8x – 2x)
CPU utilization          Medium/High       Medium
Internal fragmentation   yes               no
Latencies                Medium/Low        Low
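The gap in compression ratio follows from the packing rules. The sketch below counts pages for a two-objects-per-page scheme (a simplified stand-in for zbud) against an idealized dense packing in which objects sit back to back and may span pages. The dense figure is a lower bound used for illustration, not zsmalloc's actual size-class algorithm; 4 KiB pages are assumed.

```python
import math

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def zbud_pages(sizes):
    """Pages used when at most two objects may share a page (first-fit)."""
    pages = []  # each entry lists the object sizes stored in that page
    for size in sizes:
        for objs in pages:
            if len(objs) < 2 and sum(objs) + size <= PAGE_SIZE:
                objs.append(size)
                break
        else:
            pages.append([size])
    return len(pages)


def dense_pages(sizes):
    """Idealized bound: objects packed back to back across page borders."""
    return math.ceil(sum(sizes) / PAGE_SIZE)


# Eight swapped-out pages that each compress 4:1 (1024 bytes).
sizes = [1024] * 8
zbud_ratio = len(sizes) / zbud_pages(sizes)
dense_ratio = len(sizes) / dense_pages(sizes)
print(zbud_ratio, dense_ratio)
```

With two objects per page the effective ratio cannot exceed 2x however well the data compresses, which matches the 1.8x to 2x range in the table.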

z--- in detail> zpool: a unified API ● Common API for compressed memory storage ● Any memory allocator can implement zpool API – And register in zpool ● 2 main zpool users – zbud – zsmalloc
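The idea of a common API that allocators implement and register against can be sketched as a small registry. This is a Python analogy only: the kernel's zpool is a C interface, and every name below (PoolBackend, register_backend, create_pool, CountingBackend) is hypothetical and does not mirror its signatures.

```python
from abc import ABC, abstractmethod


class PoolBackend(ABC):
    """Hypothetical interface every allocator backend has to implement."""

    @abstractmethod
    def malloc(self, size):
        """Reserve `size` bytes and return an opaque handle."""

    @abstractmethod
    def free(self, handle):
        """Release the allocation behind `handle`."""


_REGISTRY = {}


def register_backend(name, backend_cls):
    """Make a backend selectable by name."""
    _REGISTRY[name] = backend_cls


def create_pool(name):
    """Instantiate whichever backend was registered under `name`."""
    return _REGISTRY[name]()


class CountingBackend(PoolBackend):
    """Toy backend: hands out integer handles and tracks sizes."""

    def __init__(self):
        self._sizes = {}
        self._next = 1

    def malloc(self, size):
        handle = self._next
        self._next += 1
        self._sizes[handle] = size
        return handle

    def free(self, handle):
        del self._sizes[handle]

    def total_size(self):
        return sum(self._sizes.values())


register_backend("toy", CountingBackend)
pool = create_pool("toy")
handle = pool.malloc(1500)
print(pool.total_size())
```

A caller that only uses create_pool, malloc and free does not need to know which allocator sits behind the name, which is the property that makes a consumer backend-independent.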

z--- in detail> zswap uses zpool API! ● zswap is now backend-independent – As long as the backend implements zpool API ● zswap can use zsmalloc – Better compression ratio – Less disk/flash utilization

ZRAM moving forward> What if ZRAM used zbud? ● Persistent storage is not used anyway – Compression ratio may not be the key ● No performance degradation over time ● Less dependency on memory subsystem ● CPU utilization may get lower ● Throughput may get higher ● Latencies may get lower

ZRAM moving forward> Why can't ZRAM use zbud? ● zbud can't handle PAGE_SIZE allocations – Uses a small part of the page for an internal structure ● Called struct zbud_header – Easy to fix: it can go to struct page ● ZRAM doesn't use zpool API – zsmalloc API fits zpool API nicely – Easy to fix: just implement it
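The PAGE_SIZE limitation is simple arithmetic: whatever the in-page header occupies is no longer available for data. The sketch below assumes 4 KiB pages and a 64-byte header purely for illustration; the real size of struct zbud_header is not given on the slide, and the function name is made up.

```python
PAGE_SIZE = 4096
HEADER_SIZE = 64  # assumption for illustration, not the real header size


def fits_in_zbud_page(size, header_in_page=True):
    """Can an object of `size` bytes be stored in a single page?"""
    usable = PAGE_SIZE - (HEADER_SIZE if header_in_page else 0)
    return 0 < size <= usable


# With the header inside the page, a full-page object cannot fit.
with_header = fits_in_zbud_page(PAGE_SIZE)
# If the bookkeeping lives elsewhere, the whole page is usable.
without_header = fits_in_zbud_page(PAGE_SIZE, header_in_page=False)
print(with_header, without_header)
```

Moving the bookkeeping out of the page, as the slide proposes, frees the full page for data.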

ZRAM moving forward> Allow ZRAM to use zbud ● An initiative taken by the author – Allow PAGE_SIZE allocations in zbud – Make ZRAM use zpool ● Two mainlining attempts ● https://lkml.org/lkml/2015/9/14/356 [1] ● https://lkml.org/lkml/2015/9/22/220 [2] – Faced strong opposition from ZRAM authors – Vendor neutrality questionable ● More attempts to come

Measurements> Prerequisites ● Use fio for performance measurement – Written by Jens Axboe – Flexible and versatile ● EXT4 file system on /dev/zram0 – 50% full ● A flavor of fio 'enospc' script – Adapted for smaller block device (zram) ● 40 iterations per z--- backend (zbud/zsmalloc)

Measurements> Test device 1 ● Sony Xperia Z2 – MSM8974 CPU ● 2.3 GHz Quad-Core Krait™ – 3 GB RAM ● CyanogenMod build as of Jan 15, 2016 (12.1) – A flavor of Android 5.1.1 – Custom 3.10-based kernel

Measurements> ZRAM performance: Android [Chart: zsmalloc vs zbud] Outcome: zbud clearly outperforms

Measurements> ZRAM latency: Android [Chart: zsmalloc vs zbud] Outcome: zbud outperforms again

Measurements> ZRAM performance: Android Okay, what happens in the long run: does zbud remain superior to zsmalloc?

Measurements> ZRAM performance: Android [Chart: zsmalloc vs zbud] Outcome: yes it does.

Measurements> Test device 2 ● Intel Minnowboard Max EVB – 64-bit Atom™ CPU E3815 @ 1.46GHz – DDR3 2 GB RAM – Storage 4 GB eMMC ● Debian 8.4 64-bit – Custom 4.3-based kernel

Measurements> ZRAM performance: x86_64 [Chart: zsmalloc vs zbud] Outcome: obvious.

Measurements> ZRAM latency: x86_64 [Chart: zsmalloc vs zbud] Outcome: zbud is better again.

Measurements> Test device 3 ● Carambola 2 – MIPS32 24Kc – Qualcomm/Atheros AR9331 SoC – 400 MHz CPU – 64 MB DDR2 RAM – Storage 512 MB NAND flash ● OpenWRT – Git as of Jan 15, 2016 – Custom 4.3-based kernel

Measurements> ZRAM performance: MIPS32 [Chart: zsmalloc vs zbud] Outcome: roughly equal.

Measurements> ZRAM latency: MIPS32 [Chart: zsmalloc vs zbud] Outcome: more stability with zbud.

Wrap-Up ● Compressed RAM swap is a generous idea – Many systems can benefit from it ● Two implementations mainlined – Zswap: mostly targeting big systems – ZRAM: mostly for embedded / small systems ● Each has its own backend – Zswap uses zbud – ZRAM uses zsmalloc

Conclusions ● Compressed RAM swap is the way out for embedded systems ● ZRAM over zbud is a good match for non-compression-ratio-demanding cases – Lower latencies – Higher throughput – Minimal aging ● Having options is good

swapping completed. Questions? mailto: vitalywool@gmail.com
